Apply on Kit Empleo: kitempleo.com.co/empleo/1an4o0
About Us:
We are a stealth-mode startup building next-generation infrastructure for the AI industry. Our team has decades of experience in software, systems, and deep tech. We are working on a new kind of AI runtime that pushes the boundaries of performance and flexibility, making advanced models portable, efficient, and customizable for real-world deployment.
If you want to be part of a small, fast-moving team shaping the future of applied AI systems, this is your opportunity.
Role:
We are looking for a C++ Engineer with a strong systems and GPU programming background to help extend and optimize an open-source AI inference runtime. You will work on the low-level internals of large language model serving, focusing on:
- Dynamic adapter integration (e.g., LoRA/QLoRA)
- Incremental model update mechanisms
- Multi-session inference caching and scheduling
- GPU performance improvements (Tensor Cores, CUDA/ROCm)
This is a hands-on role: you will be designing, coding, profiling, and iterating on high-performance inference code that runs directly on CPUs and GPUs.
Responsibilities:
- Implement support for runtime adapter loading (LoRA), enabling models to be customized on the fly without retraining or model merges.
- Design and implement mechanisms for incremental model deltas, allowing models to be extended and updated efficiently.
- Extend the runtime to handle multi-session execution, with isolation and caching strategies for concurrent users.
- Optimize core math kernels and memory layouts to improve inference performance on CPU and GPU backends.
- Collaborate with backend and infrastructure engineers to integrate your work into APIs and orchestration layers.
- Write benchmarks, unit tests, and profiling tools to ensure correctness and measure performance gains.
- Contribute to system architecture discussions and help define the roadmap for future runtime features.
Requirements:
- Strong proficiency in modern C++ (C++14/17/20) and systems programming.
- Solid understanding o
📌 C++ Engineer, AI Runtime (Bogotá)
🏢 Baasi
📍 Bogotá