vLLM Jobs - April 2026

Freight Brokerage

Posted 2 weeks, 4 days ago

We're a freight brokerage in NJ building a self-hosted, AI-first platform to replace our legacy CRM/TMS. The AI compute infrastructure and data foundation are already in place. We need the architect who turns it into a production operating system. T…

Roles
Founding Backend Architect (Python/PostgreSQL)

Tech Stack
email processing, Postgres, document ingestion, FastAPI, Python, Linux, event-driven pipelines, vLLM, REST APIs, AI, Microsoft Graph API, CRM migration, LLM inference, Neo4j, Docker

Locations

NJ, Remote (US, EST hours)

Nuance Labs

Posted 1 month, 3 weeks ago

Nuance Labs is building a human foundation model that understands and expresses human emotion in real-time across speech, facial expression, and body language. Small, fast-moving founding team (MIT, UW, Oxford) valuing in-person collaboration at Sea…

Roles
Systems Engineer (Real-Time Engine), Machine Learning Infra Engineer, Machine Learning Research Engineer

Tech Stack
Asyncio, Dagster, Kubernetes, Ray, Python, GPU serving, Rust, TensorRT, vLLM

Locations

Seattle, WA

NVIDIA

Posted 5 months, 3 weeks ago

NVIDIA | vLLM + SGLang | Deep Learning Inference | Remote (North America preferred) Hi everyone — I’m Akbar, Senior Manager of Deep Learning Inference Software at NVIDIA. I lead our engineering efforts around vLLM and SGLang, two of the most widely …

Roles
Engineering Manager, DL Performance Software Engineer - LLM Inference, Deep Learning Inference, Inference, Senior Deep Learning Software Engineer

Tech Stack
compiler/runtime, kernel fusion, runtime optimizations, vLLM, scheduling optimizations, SGLang, Blackwell, continuous integration, GPUs, LLM inference, distributed serving, Hopper

Locations

Remote (North America preferred), Santa Clara, CA

iGent AI

Posted 5 months, 3 weeks ago

Building coding agent systems and an agentic cloud. Small senior team (ex-DeepMind, OpenAI, Microsoft Research, Amazon, Cambridge University; multiple PhDs). Work includes distributed systems, OS/sandboxing, ML and LLM inference/post-training, long-…

Roles
Full-Stack, Backend / Eng Lead, LLM inference & post-training, Agent Infrastructure, DevRel & GTM

Tech Stack
ML, OS, sandboxing, OSS/community tooling, long context, performance optimization, vLLM, backend, full-stack, LLM inference, orchestration, observability, post-training, distributed systems, filesystems, cloud sandboxes, RL/RLVR

Locations

London, UK

Edgeless Systems

Posted 9 months, 3 weeks ago

We're building an AI inference service leveraging confidential computing to ensure that prompts remain encrypted end-to-end. Our core engineering stack includes Go, Kubernetes, gRPC, and vLLM, with some web development using NextJS and Svelte. Most …

Roles
Software Engineer

Tech Stack
Kubernetes, Go, vLLM, Svelte, Next.js, gRPC

Locations

Germany, Remote

Edgeless Systems

Posted 11 months, 3 weeks ago

We are a team of ~20 people, building cutting-edge open-source tools for confidential computing and a 'confidential GenAI' service on top of those. Our products span unusually far across the tech stack, starting at measured boot, through Kubernetes …

Roles
Software Engineer

Tech Stack
NixOS, AMD SEV-SNP, Kubernetes, Rust, Intel TDX, Nix, Go, vLLM, Terraform, JavaScript

Locations

Remote (EU), Berlin, Bochum