Apply on Kit Empleo: kitempleo.com.co/empleo/1ami8b
Why this role
Forge’s core promise is safe sandboxed agents + real‑time cost governance + auditable execution. You’ll build the production foundations that make that promise true: secure ephemeral runtimes, token→USD metering, end‑to‑end observability, and safe deploy/rollback.
What you’ll own
- Ephemeral execution environments for agents/tools (containers / Firecracker / WASM), with CPU/mem/disk/network quotas, secrets brokering, and isolation hardening
- Cost governance infrastructure: accurate token→USD accounting, per‑tenant budgets, anomaly detection, enforcement hooks (throttle/downshift/queue)
- Release engineering: CI/CD, canaries/blue‑green, rollbacks, feature flags, backups/DR, incident response
Success metrics
- p95 latency and $/task visible for core workflows
- Less than 1% budget overruns, with automated detection and enforcement
- SLOs + alerts in place; MTTR improving with each incident
Requirements (must‑have)
- 3–8+ years in platform/SRE/systems engineering; you’ve owned production services
- Strong with IaC (Terraform) and cloud networking/security fundamentals (IAM, secrets, TLS)
- Comfortable with container orchestration (Kubernetes/EKS or equivalent)
- Proven experience implementing observability (OTel + metrics/logs/traces) and on‑call/incident practices
Nice to have
- Experience with FinOps in usage‑metered systems (LLMs, APIs, multi‑tenant platforms)
Language Skills
- Strong conversational English (C1/C2) for team interactions.
- Professional proficiency in English for documentation, code reviews, and cross‑functional collaboration.
Soft Skills
- Curiosity and eagerness to learn across the tech stack.
- Strong problem‑solving and debugging skills.
- Ability to work collaboratively.