About the Role
Senior Deployment Engineer (RAG / LLM Systems)
Company: BHO Tech
Location: Bay Area (Hybrid)
Compensation: $170K–$240K + bonus + equity (DOE)
About BHO Tech
BHO Tech partners with high-growth AI startups to build world-class engineering teams. Our clients move fast and value engineers who take ownership and ship.
Role
We are looking for a Senior Deployment Engineer to deploy, maintain, and scale RAG pipelines and LLM-powered applications in production. You’ll ensure models, embeddings, vector databases, and backend systems run reliably, efficiently, and securely.
What You’ll Do
• Deploy and operate RAG pipelines and LLM-based services in production
• Manage vector databases (Pinecone / Weaviate / Chroma / Milvus)
• Optimize latency, throughput, and memory footprint
• Work closely with ML and backend teams to push new features live
• Create deployment tooling, rollout strategies, and fallback systems
• Monitor performance, debug production issues, and automate reliability workflows
What We’re Looking For
• Strong experience deploying Python-based services or microservices
• Familiarity with LLM pipelines, embeddings, or data retrieval flows
• Hands-on experience with AWS / GCP and containerization (Docker / Kubernetes)
• Solid understanding of CI/CD pipelines and environment automation
• Ability to move fast, communicate clearly, and own deployments end-to-end
Bonus
• Experience with GPU scheduling / Triton / inference optimization
• Background in SRE, DevOps, or Production Engineering
• Prior startup / 0→1 product experience
Interested?
Send your résumé to krisyoung@bhotech.co and I will forward you the full details.
⸻
Best regards,
Kris Young
Director | BHO Tech Inc.