Apr 19
|
Framework Ventures
|
Bogotá
Apply on Kit Empleo: kitempleo.com.co/empleo/18dml6
About the role

You will own the inference backbone behind QVAC's local AI stack: the C++ systems layer that makes models run fast, reliably, and predictably on real user hardware. The role is centered on engineering quality at the runtime level, including startup behavior, memory pressure, throughput/latency balance, and long-session stability. You will define and evolve the core abstractions that inference features depend on, so new capabilities can be added without sacrificing performance or maintainability. This is a role for someone who enjoys low-level problem solving, clear technical ownership, and building infrastructure that other teams trust in production. Your work directly enables private, on-device AI experiences and helps set the technical foundation for QVAC's next generation of peer-to-peer AI products.

About the job

You'll work on the C++ layer that powers local AI, porting and enhancing inference engines such as llama.cpp and ONNX to run efficiently on edge devices. Your focus is on the runtime: making models load faster, run leaner, and perform well across different hardware. You'll ensure that the inference layer is stable, optimised, and ready for integration with the rest of the stack. This role is for engineers who want to work close to the metal, enabling private and fast on-device AI without relying on cloud infrastructure.

Responsibilities

Deploy machine learning models to edge devices using llama.cpp, ggml, and ONNX.
Collaborate closely with researchers to assist in coding, training, and transitioning models from research to production environments.
Integrate AI features into existing products, enriching them with the latest advancements in machine learning.

Qualifications

Excellent programming skills in C++; experience in JavaScript is a bonus.
Strong experience with the llama.cpp and ggml inference engines, facilitating the deployment of models to specific GPU architectures.
Good understanding
📌 AI Inference Engineer QVAC 100% remote Worldwide (Bogotá)