Design partner program open

Your task deserves its own model.

EkaFinch · The AI engineer for specialist agents

One model can’t do every job. EkaFinch trains the one that does yours.

Describe your task in plain English. EkaFinch generates the data, trains a small specialist model, and ships it as a deployable agent — automatically.

Named after Darwin’s finches — each shaped to its niche.

Why we built this

Generic LLMs are remarkable at being passable at everything. They’re rarely the best tool for any one job — too expensive at scale, too slow in production, too generic to be reliable. Building a specialist used to mean an ML team and a quarter’s worth of work. EkaFinch makes it a single command.

01 · Cost

Generic models are expensive at scale.

Every call routes to a giant frontier model whether the task needed it or not. Costs compound across millions of low-stakes interactions.

02 · Latency

Big models are slow in production.

Latency budgets break the moment users feel the wait. Voice agents, real-time tools, and embedded surfaces need responses in milliseconds, not seconds.

03 · Reliability

One-size-fits-all isn’t reliable.

Prompt engineering hits a ceiling. Edge cases pile up, hand-written rules sprawl, and nobody can prove the agent is actually getting better.

Today · Frontier model

$4,000 per month · 1,400 ms p95

GPT-5 · 1M conversations
~800 input + 200 output tokens per call

After · EkaFinch specialist

$150 per month · 220 ms p95

Qwen3-1.7B on Modal A10G
promoted agent_pack c03

~27× cheaper · ~6× faster

Illustrative. Real savings vary with traffic pattern, backbone choice, and infra utilization.

How it works

From task description to specialist agent in hours.

Four simple stages. One promoted agent_pack at the end. The same flow works for support triage, intake, voice, or code — any narrow task with a verifier.

01 / Describe

A plain-English task brief.

No ML expertise required. Tell EkaFinch what the agent should do, and optionally hand over tools, an environment, or example scenarios from your own logs.

  • Plain text, JSON, or your existing job folder
  • Optional tools (OpenAPI, MCP) and environment profile
  • Optional eval scenarios from your own production traffic
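
To make that concrete, here is a hedged sketch of what a brief could look like through the Python SDK shown further down this page. The tools and scenarios parameters are illustrative assumptions, not confirmed API:

from eka_finch import client

# Hypothetical sketch: a task brief with optional tools and eval scenarios.
# The tools= and scenarios= parameter names are assumptions for illustration.
program = client.engineer.create(
    "Triage inbound refund tickets: approve, deny, or escalate, "
    "and draft a first reply.",
    tools="openapi/refunds.yaml",           # optional OpenAPI tool spec
    scenarios="logs/sample_tickets.jsonl",  # optional scenarios from production logs
)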

02 / Generate

High-quality synthetic data, on tap.

EkaFinch uses a strong teacher model to produce labelled data shaped to the task. Quality gates and de-duplication run before any training starts — nothing weak makes it into the run.

  • Teacher-driven generation with quality guardrails
  • Frozen test set for honest, reproducible evaluation
  • Targeted regeneration where the model struggles most
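
Conceptually, the gate behaves like the sketch below; this is a simplified illustration, not EkaFinch’s actual pipeline. Candidates that fail the verifier or duplicate an earlier example never reach training:

import hashlib

def quality_gate(candidates, verifier, seen_hashes):
    """Keep candidates that pass the task verifier and are not duplicates.
    Simplified illustration only, not the real EkaFinch internals."""
    kept = []
    for example in candidates:
        # De-duplicate on a normalized hash of the generated output.
        digest = hashlib.sha256(example["output"].strip().lower().encode()).hexdigest()
        if digest in seen_hashes:
            continue
        # Quality gate: the verifier must accept the example.
        if not verifier(example):
            continue
        seen_hashes.add(digest)
        kept.append(example)
    return kept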

03 / Train

A small student that learns the job.

Choose your size and your budget. EkaFinch fine-tunes a small student to match the teacher on your task — cheaper to run, faster to respond, easier to audit.

  • Qwen, Llama, or Phi backbones — you choose
  • Modern post-training stack: SFT, OPD (On Policy Distillation), and GRPO
  • Reproducible runs with full hash-pinned lineage
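
For illustration, choosing a backbone and training methods might look like this through the SDK; the backbone, methods, and budget parameters are assumptions, not confirmed API:

from eka_finch import client

# Hypothetical sketch -- parameter names are illustrative assumptions.
program = client.engineer.create(
    "Resolve refund tickets",
    backbone="qwen3-1.7b",           # Qwen, Llama, or Phi
    methods=["sft", "opd", "grpo"],  # the post-training stack named above
    budget={"runs": 4},
)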

04 / Ship

A deployable agent_pack.

The output is a single typed bundle: model, tools, environment contract, eval suite, traces, and the evidence that justified promotion. Drop it into your runtime and ship.

  • One typed manifest, fully reviewable
  • Hash-pinned model weights and tool schemas
  • Full lineage from prompt to promotion

Alternatives

Not your only option — just your best one for narrow, high-volume tasks.

Here’s how EkaFinch sits next to the paths most teams already consider. We’re honest about where others still win.

Approaches compared on cost at scale, latency, control, reviewability, and time to ship:

  • Prompt it harder
  • RAG + prompts
  • Frontier fine-tune API
  • In-house ML team
  • EkaFinch

Best fit: narrow tasks with automatable verifiers — support triage, intake, structured generation, tool-use loops. For open-ended creative writing or general chat, a frontier model is still the right call.

The engineer

An AI engineer that runs the whole loop.

Once you describe the task, the EkaFinch engineer takes over. It compiles the brief, generates training data, trains a specialist student, evaluates against a frozen eval, compares candidates, and promotes a verified champion. Review every step, or hand it the keys.

  01 Compile · Bundle the task
  02 Generate · Synthetic data
  03 Train · Specialist student
  04 Evaluate · Frozen eval
  05 Compare · Candidates
  06 Promote · Champion ready

Review mode

Approve every step.

Each action stops with a written rationale and a proposed change. You approve or revise before the next experiment runs — ideal for high-stakes domains.
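
As a sketch under assumed names (the steps(), approve(), and revise() methods are hypothetical, not confirmed API), a review loop could look like:

from eka_finch import client

# Hypothetical sketch: approve or revise each proposed step before it runs.
program = client.engineer.create("Resolve refund tickets", autopilot=False)

for step in program.steps():       # assumed iterator over proposed actions
    print(step.rationale)          # the written rationale for the proposed change
    if input("approve? [y/N] ") == "y":
        step.approve()
    else:
        step.revise(feedback="tighten the refund-policy checks first")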

Autopilot mode

Hand it a budget. Walk away.

EkaFinch runs experiments until a champion meets your gates — or the budget cleanly stops the program and writes a report you can act on.

What you ship

An agent you can review, audit, and deploy.

Not a black-box endpoint. A reviewable bundle with the model, the tool contracts, the eval suite, and the evidence that justifies promotion — all hash-pinned.

agent_pack.yaml · Champion c03

name: "refund-and-return-specialist"
version: "2026.04.27-c03"
model:
  base: qwen3-1.7b
  hash: 0x9af2…
  latency_p95_ms: 220
tools:
  count: 7
  contract: tools.json
verifier: verifier.yaml
eval_suite:
  scenarios: 128
  task_success: 68%
lineage: lineage.json
Model · The trained student, with weights, base lineage, and inference latency captured at promotion time.

Tools · Typed tool contracts in a single manifest. Frozen at promotion so behavior cannot drift after deploy.

Verifier · The exact checks the agent must pass in production — the same ones that gated promotion.

Evidence · Traces, scorecards, and a base-to-champion comparison report. Everything you need for a credible review.
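
Because everything is hash-pinned, a reviewer can check shipped weights against the manifest above. A minimal sketch, assuming the weights live at weights.bin inside the pack (that filename is an assumption):

import hashlib
import yaml

def verify_pack(pack_dir: str) -> dict:
    """Illustrative check that shipped weights match the hash pinned at
    promotion. The weights.bin filename is an assumption for this sketch."""
    with open(f"{pack_dir}/agent_pack.yaml") as f:
        manifest = yaml.safe_load(f)
    with open(f"{pack_dir}/weights.bin", "rb") as f:
        digest = "0x" + hashlib.sha256(f.read()).hexdigest()
    pinned = manifest["model"]["hash"].rstrip("…")  # manifest shows a truncated hash
    assert digest.startswith(pinned), "weights do not match the promoted hash"
    return manifest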

Use it

From your terminal, your code, or your service.

Same engineer, three surfaces. Run it locally for prototyping, embed it in your stack, or call the hosted service when you’re ready to scale.

CLI

$ eka-finch engineer create \
    "Resolve refund tickets" \
    --autopilot \
    --budget runs=4

The same Typer CLI everyone on your team already runs. Single command from prompt to champion.

Python SDK

from eka_finch import client

program = client.engineer.create(
    "Resolve refund tickets",
    autopilot=True,
)
program.wait()
program.champion.deploy()

Embed inside your existing pipelines, notebooks, and tests. Async client also available.
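
A minimal async sketch, assuming an AsyncClient export (that name is an assumption):

import asyncio

from eka_finch import AsyncClient  # assumed name for the async client

async def main():
    client = AsyncClient()
    program = await client.engineer.create(
        "Resolve refund tickets",
        autopilot=True,
    )
    await program.wait()
    await program.champion.deploy()

asyncio.run(main())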

Service API

POST /v1/engineer/programs
Authorization: Bearer $KEY

{
  "task": "Resolve refunds",
  "mode": "autopilot",
  "budget": { "runs": 4 }
}

Run on hosted EkaFinch with auth, quotas, artifact storage, and run lineage built in.

Trust & data

  • Your environment. Prompts, traces, and weights stay where you put them — local, your cloud, or a dedicated tenant.
  • Your data. EkaFinch never trains on your private data without explicit consent.
  • Your promotion bar. Every artifact is hash-pinned at promotion — what you review is exactly what ships.
  • Built in the EU. Eka Labs is a Düsseldorf GmbH; EkaFinch runs in your environment under GDPR-aligned defaults — your data does not leave your infrastructure unless you choose otherwise.

FAQ

Answers to what you’re probably thinking.

If your question isn’t here, the fastest path is the Talk-to-us link below.

How is the synthetic data actually any good?

A teacher model generates candidate examples. A verifier filters out anything that doesn’t meet the task spec, and evaluation runs against a frozen test set the student never sees during training.

The same verifier that gates promotion is the one you run in production — so the bar the student cleared is the bar you’re shipping.

What happens to my data?

EkaFinch runs in your environment — local, your cloud account, or hosted EkaFinch in a dedicated tenant. Prompts, traces, weights, and evals never leave your infrastructure.

Nothing is used to improve EkaFinch itself without explicit consent.

Won’t a small model be worse than GPT-5?

On a narrow, well-defined task with focused data, usually no. You typically trade ~1–2 points of task quality for 10–30× cost and 5–10× latency wins.

The comparison report makes the trade-off explicit so you decide per deployment — not on vibes.

How do I deploy the shipped agent?

The agent_pack is a typed bundle: model weights, tool contracts, environment profile, eval suite, traces. Drop it into any inference stack — vLLM, TGI, Modal, Bedrock, or your own runtime — and call it.

The same pack powers the CLI, Python SDK, and hosted service. No forks.
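
For example, serving the promoted student with vLLM could look like this minimal sketch; the weights path inside the pack is an assumption:

from vllm import LLM, SamplingParams

# Minimal sketch: load the pack's weights into vLLM and run one request.
# The agent_pack/weights path is an assumption for illustration.
llm = LLM(model="agent_pack/weights")
params = SamplingParams(max_tokens=200, temperature=0.2)
outputs = llm.generate(["Customer requests a refund for order #1042."], params)
print(outputs[0].outputs[0].text)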

What if my task changes?

Re-run the engineer. New lineage, new candidates, new comparison report. Promotion only fires if the new student beats the current champion on your frozen eval — so the program always ships forward.

How much does it cost?

Usage-based: compute for data generation, training, and evaluation. A typical first-champion run for a narrow task lands under $100 in compute.

Team plans are priced per program — talk to us for details.

Build a specialist

Have a task that deserves its own model?

Bring the brief, the boundary, and a few example scenarios. EkaFinch returns a verified specialist with the evidence that justifies it.

Talk to us