Design partner program open
Your task deserves its own model.
EkaFinch · The AI engineer for specialist agents
Describe your task in plain English. EkaFinch generates the data, trains a small specialist model, and ships it as a deployable agent — automatically.
Named after Darwin’s finches — each shaped to its niche.
$ eka-finch engineer create \
"Resolve refund and return tickets" \
--autopilot --budget runs=4
[1/6] Compiling task bundle ✓
[2/6] Generating synthetic data ✓
[3/6] Training specialist model ✓
[4/6] Evaluating against baseline ✓
[5/6] Comparing candidates ✓
[6/6] Promoting champion ✓
Why we built this
Generic LLMs are remarkable at being passable at everything. They’re rarely the best tool for any one job — too expensive at scale, too slow in production, too generic to be reliable. Building a specialist used to mean an ML team and a quarter’s work. EkaFinch makes it a single command.
01 · Cost
Generic models are expensive at scale. Every call routes to a giant frontier model whether the task needed it or not. Costs compound across millions of low-stakes interactions.
02 · Latency
Latency budgets break the moment users feel the wait. Voice agents, real-time tools, and embedded surfaces need responses in milliseconds, not seconds.
03 · Reliability
Prompt engineering hits a ceiling. Edge cases pile up, hand-written rules sprawl, and nobody can prove the agent is actually getting better.
Generic frontier model: $4,000 per month · 1,400 ms p95
EkaFinch specialist: $150 per month · 220 ms p95
~27× cheaper · ~6× faster
Illustrative. Real savings vary with traffic pattern, backbone choice, and infra utilization.
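The headline multiples fall straight out of the two quotes above; a quick back-of-envelope check:

```python
# Back-of-envelope check of the illustrative numbers above.
frontier_cost, specialist_cost = 4_000, 150   # $ per month
frontier_p95, specialist_p95 = 1_400, 220     # ms

print(f"~{frontier_cost / specialist_cost:.0f}x cheaper")  # ~27x
print(f"~{frontier_p95 / specialist_p95:.0f}x faster")     # ~6x
```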
How it works
Four simple stages. One promoted agent_pack at the end. The same flow works for support triage, intake, voice, or code — any narrow task with a verifier.
01 / Describe
No ML expertise required. Tell EkaFinch what the agent should do, and optionally hand over tools, an environment, or example scenarios from your own logs.
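In SDK terms that handoff could look something like the sketch below; everything beyond the task string and the autopilot flag (the tools and examples parameters in particular) is an assumed shape for illustration, not confirmed EkaFinch API:

```python
from eka_finch import client

# Hypothetical sketch: `tools` and `examples` are assumed parameter
# names, not confirmed SDK signature.
program = client.engineer.create(
    "Resolve refund and return tickets",
    tools=["lookup_order", "issue_refund"],      # assumed tool handles
    examples="logs/refund_scenarios.jsonl",      # assumed seed scenarios
    autopilot=False,                             # review every step
)
```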
02 / Generate
EkaFinch uses a strong teacher model to produce labelled data shaped to the task. Quality gates and de-duplication run before any training starts — nothing weak makes it into the run.
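The gating pattern itself is standard distillation hygiene; here is a toy sketch of the idea (not EkaFinch internals), assuming a teacher with a generate method and a boolean verifier:

```python
def build_dataset(teacher, verifier, task_spec, n=1_000):
    """Toy sketch: generate candidates, gate on quality, de-duplicate."""
    seen, kept = set(), []
    while len(kept) < n:
        example = teacher.generate(task_spec)   # candidate (input, label) pair
        if not verifier(example):               # quality gate: drop weak examples
            continue
        key = example["input"]
        if key in seen:                         # de-dup before training starts
            continue
        seen.add(key)
        kept.append(example)
    return kept
```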
03 / Train
Choose your size and your budget. EkaFinch fine-tunes a small student to match the teacher on your task — cheaper to run, faster to respond, easier to audit.
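Mechanically, this stage is ordinary supervised fine-tuning on the gated data; a toy step, assuming a Hugging-Face-style causal LM (a sketch of the general pattern, not EkaFinch’s trainer):

```python
def sft_step(student, batch, optimizer):
    """Toy sketch: one supervised fine-tuning step on teacher-labelled
    examples, assuming a model whose forward pass returns a loss."""
    out = student(input_ids=batch["input_ids"], labels=batch["labels"])
    out.loss.backward()       # gradient on the distilled task data
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```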
04 / Ship
The output is a single typed bundle: model, tools, environment contract, eval suite, traces, and the evidence that justified promotion. Drop it into your runtime and ship.
Alternatives
Here’s how EkaFinch sits next to the paths most teams already consider. We’re honest about where others still win.
| Approach | Cost at scale | Latency | Control | Reviewable | Time to ship |
|---|---|---|---|---|---|
| Prompt it harder | · | · | · | · | ✓ |
| RAG + prompts | · | · | ~ | ~ | ✓ |
| Frontier fine-tune API | ~ | ~ | ~ | · | ✓ |
| In-house ML team | ✓ | ✓ | ✓ | ~ | · |
| EkaFinch | ✓ | ✓ | ✓ | ✓ | ✓ |

✓ = strong, ~ = partial, · = weak
Best fit: narrow tasks with automatable verifiers — support triage, intake, structured generation, tool-use loops. For open-ended creative writing or general chat, a frontier model is still the right call.
The engineer
Once you describe the task, the EkaFinch engineer takes over. It compiles the brief, generates training data, trains a specialist student, evaluates against a frozen eval, compares candidates, and promotes a verified champion. Review every step, or hand it the keys.
Review mode
Each action stops with a written rationale and a proposed change. You approve or revise before the next experiment runs — ideal for high-stakes domains.
Autopilot mode
EkaFinch runs experiments until a champion meets your gates — or budgets cleanly stop the program and write a report you can act on.
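To make the approval gate concrete, here is a hypothetical review-mode loop; done, next_action, approve, and revise are assumed names for illustration only:

```python
from eka_finch import client

program = client.engineer.create("Resolve refund tickets", autopilot=False)

# Hypothetical API: `done`, `next_action`, `approve`, and `revise` are
# assumed names, not confirmed SDK surface.
while not program.done:
    action = program.next_action()      # blocks until a rationale is written
    print(action.rationale, action.proposed_change)
    action.approve()                    # or action.revise(...)
```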
What you ship
Not a black-box endpoint. A reviewable bundle with the model, the tool contracts, the eval suite, and the evidence that justifies promotion — all hash-pinned.
name:
version:
model:
  base: qwen3-1.7b
  hash: 0x9af2…
  latency_p95_ms: 220
tools:
  count: 7
  contract: tools.json
verifier: verifier.yaml
eval_suite:
  scenarios: 128
  task_success:
lineage: lineage.json
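One way a runtime might consume that bundle; the loader below is a hypothetical sketch of what hash-pinning buys you (file names and manifest layout are assumptions), not a shipped EkaFinch API:

```python
import hashlib
import json
import pathlib

def load_agent_pack(path):
    """Hypothetical loader: refuse to serve a pack whose weights do not
    match the hash pinned in its manifest. Layout is assumed."""
    pack = pathlib.Path(path)
    manifest = json.loads((pack / "lineage.json").read_text())
    weights = (pack / "model.bin").read_bytes()   # assumed filename
    digest = hashlib.sha256(weights).hexdigest()
    if digest != manifest["model"]["hash"]:
        raise ValueError("agent_pack weights do not match pinned hash")
    return manifest
```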
Use it
Same engineer, three surfaces. Run it locally for prototyping, embed it in your stack, or call the hosted service when you’re ready to scale.
CLI
$ eka-finch engineer create \
"Resolve refund tickets" \
--autopilot \
--budget runs=4
The same Typer CLI everyone on your team already runs. Single command from prompt to champion.
Python SDK
from eka_finch import client

program = client.engineer.create(
    "Resolve refund tickets",
    autopilot=True,
)
program.wait()
program.champion.deploy()
Embed inside your existing pipelines, notebooks, and tests. Async client also available.
Service API
POST /v1/engineer/programs
Authorization: Bearer $KEY

{
  "task": "Resolve refunds",
  "mode": "autopilot",
  "budget": { "runs": 4 }
}
Run on hosted EkaFinch with auth, quotas, artifact storage, and run lineage built in.
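The same request from Python, using only the endpoint and payload shown above (the base URL is a placeholder):

```python
import os
import requests

resp = requests.post(
    "https://api.eka-finch.example/v1/engineer/programs",  # placeholder host
    headers={"Authorization": f"Bearer {os.environ['KEY']}"},
    json={"task": "Resolve refunds", "mode": "autopilot", "budget": {"runs": 4}},
)
resp.raise_for_status()
print(resp.json())
```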
FAQ
If your question isn’t here, the fastest path is the Talk-to-us link below.
How do you keep synthetic training data honest?
A teacher model generates candidate examples. A verifier filters anything that doesn’t meet the task spec, and training uses a frozen test set the student never sees.
The same verifier that gates promotion is the one you run in production — so the bar the student cleared is the bar you’re shipping.
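The frozen-eval discipline fits in a few lines; a toy sketch (the helper name and split ratio are illustrative):

```python
import random

def freeze_split(examples, test_frac=0.2, seed=0):
    """Toy sketch: carve out the frozen eval once, before any training.
    The student only ever sees `train`; `frozen_eval` gates promotion."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * test_frac)
    return shuffled[cut:], shuffled[:cut]   # (train, frozen_eval)
```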
Where does our data live?
EkaFinch runs in your environment — local, your cloud account, or hosted EkaFinch in a dedicated tenant. Prompts, traces, weights, and evals never leave your infrastructure.
Nothing is used to improve EkaFinch itself without explicit consent.
Will a small specialist be worse than a frontier model?
On a narrow, well-defined task with focused data, usually no. You typically trade ~1–2 points of task quality for 10–30× cost and 5–10× latency wins.
The comparison report makes the trade-off explicit so you decide per deployment — not on vibes.
How do we deploy the result?
The agent_pack is a typed bundle: model weights, tool contracts, environment profile, eval suite, traces. Drop it into any inference stack — vLLM, TGI, Modal, Bedrock, or your own runtime — and call it.
The same pack powers the CLI, Python SDK, and hosted service. No forks.
What happens when the task changes?
Re-run the engineer. New lineage, new candidates, new comparison report. Promotion only fires if the new student beats the current champion on your frozen eval — so the program always ships forward.
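The promotion rule stated as code, with assumed score fields (illustrative only):

```python
def should_promote(candidate, champion, frozen_eval):
    """Toy sketch: a new student ships only if it beats the current
    champion on the same frozen eval."""
    return candidate.score(frozen_eval) > champion.score(frozen_eval)
```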
How much does it cost?
Usage-based: compute for data generation, training, and evaluation. A typical first-champion run for a narrow task lands under $100 in compute.
Team plans are priced per program — talk to us for details.
Build a specialist
Bring the brief, the boundary, and a few example scenarios. EkaFinch returns a verified specialist with the evidence that justifies it.