AI Evaluation Engineer (Medellín)

28 May | Gramian Consulting | Medellín

Gramian Consultancy is a boutique consultancy specializing in IT professional services and engineering talent solutions. With a strong background in software engineering and leadership, we help companies build high-performing teams by matching them with professionals who truly fit their needs.
Role Overview
We are looking for an AI Evaluation Engineer with a strong research background to design and evaluate complex, multi-agent tasks used to benchmark next-generation AI systems. In this role, you will work at the intersection of research, data structuring, and AI evaluation, building high-quality tasks that require deep document understanding, structured reasoning, and multi-step synthesis. You will create datasets and evaluation frameworks that test whether AI agents can truly read, reason, and extract knowledge from large-scale unstructured data.
This is a high-precision, detail-oriented role requiring strong analytical thinking, structured problem decomposition, and the ability to translate research content into measurable evaluation tasks.
Commitments Required:
- 8 hours per day, with a 4-hour overlap with PST
- Employment type: Contractor assignment (no medical/paid leave)
- Duration of contract: 5+ weeks
- Location: Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Indonesia, Kenya, Nigeria, Turkey, Vietnam
- Interview: take-home assessment (60 min)
Responsibilities
- Build multi-agent benchmark tasks that require reading, analyzing, and synthesizing large document collections
- Curate real-world research corpora — academic papers, case studies, technical reports — and design questions that require comprehensive analysis
- Write structured ground-truth oracles (JSON) with specific, verifiable answers that prove the agent actually read the source material
- Design LLM judge prompts that evaluate agent output field-by-field against the oracle
- Create decomposition guides that split research across multiple parallel sub-agents (one per document, one per domain, the
