Sakana Fugu: The Model Router Hiding Inside One API

📋 Table of Contents

What Sakana Fugu is
Why it matters
What it replaces
Fugu vs Fugu Ultra
Benchmarks and source claims
Pricing
Setup notes
What beginners get wrong
Where it breaks
Quick reference

What Sakana Fugu Is

Sakana Fugu is a multi-agent orchestration system wrapped as one model API.

That is the important part.

You do not manually call five different models.
You do not build your own routing layer from scratch.
You do not hard-code “use Model A for coding, Model B for reasoning, Model C for long context.”

You send a request to Fugu.

Fugu coordinates the model pool behind the scenes.

Verified: Sakana describes Fugu as a system that dynamically orchestrates “the world’s best models” through a single API.

The clean way to think about it:

❝

Fugu is not the worker.
Fugu is the operator deciding which workers should touch the job.

That is why this launch matters.

The model market is crowded. The routing layer is still early.

2. Why It Matters

The beginner question is:

❝

“Which model is best?”

The better builder question is:

❝

“Which system knows which model to use for each part of the task?”

That is the real shift.

Modern AI work is no longer one prompt, one answer.

Real workflows look like this:

Plan the task
Search or inspect context
Write code
Check edge cases
Verify output
Rewrite
Run again
Decide when the result is good enough

One model can do all of that badly.

A multi-agent setup can split the job.

Sakana’s pitch is that Fugu learns how to assemble and coordinate those agents instead of relying on hand-designed workflows.

That is the unlock.

Not more prompts.

Better delegation.

3. What It Replaces

Fugu can replace part of your manual model-selection workflow.

Before Fugu

You decide:

Which model to use
When to switch models
When to ask for verification
Which model handles code
Which model handles reasoning
Which provider fits the task
How to keep cost under control

With Fugu

You call one API.

Fugu handles model selection and switching.

Verified: Sakana says Fugu provides access to a coordinated pool of specialized models through one API and handles model selection and switching for each task.

What it does not replace:

Clear task design
Good prompts
Evaluation
Human review
Security boundaries
Cost monitoring
Compliance checks

Wrong assumption: orchestration means you can stop thinking.

No.

It means the model stack gets more automated, but your workflow still needs guardrails.

4. Fugu vs Fugu Ultra

Sakana offers two models:

Model	Best For	Tradeoff
Fugu	Everyday coding, code review, interactive work, chatbots	Balanced quality and latency
Fugu Ultra	Harder multi-step reasoning, research, paper reproduction, Kaggle, cybersecurity, patent analysis	Higher quality, slower response

Verified: Sakana says Fugu balances latency and quality, while Fugu Ultra prioritizes answer quality on complex multi-step work.

Use Fugu when speed matters.

Use Fugu Ultra when getting the answer right matters more than waiting.

That is the practical split.

5. Benchmarks and Source Claims

Sakana makes aggressive benchmark claims.

Their table shows Fugu and Fugu Ultra compared against Opus 4.8, Gemini 3.1 Pro, and GPT 5.5 across tasks like:

SWE Bench Pro
TerminalBench 2.1
LiveCodeBench
LiveCodeBench Pro
Humanity’s Last Exam
GPQA-D
SciCode
Long Context Reasoning
MRCRv2

Source claim: Sakana says Fugu models surpass publicly accessible frontier models and sit close to Fable 5 and Mythos Preview across engineering, scientific, and reasoning benchmarks.

Important:

Sakana says Fable 5 and Mythos Preview are not in Fugu’s agent pool because they are not publicly accessible.

That matters because the claim is not:

❝

“Fugu is secretly using Fable or Mythos.”

The claim is:

❝

“Fugu’s orchestration system can compete with them without using them.”

That is a very different statement.

Example benchmark numbers from Sakana

Benchmark	Fugu	Fugu Ultra
SWE Bench Pro	59.0	73.7
TerminalBench 2.1	80.2	82.1
LiveCodeBench	92.9	93.2
LiveCodeBench Pro	87.8	90.8
Humanity’s Last Exam	47.2	50.0
GPQA-D	95.5	95.5

Source claim: These numbers are from Sakana’s own benchmark table, not independent testing.

Use the numbers carefully.

Strong hook.

Not final proof.

6. Pricing

Sakana offers subscription and pay-as-you-go pricing.

Subscription plans

Plan	Price	Usage Positioning
Standard	$20/month	Lightweight daily usage
Pro	$100/month	10x Standard usage
Max	$200/month	20x Standard usage

Verified: Every subscription tier includes both Fugu and Fugu Ultra.

Fugu Ultra token pricing

For fugu-ultra-20260615:

Token Type	Price per 1M tokens
Input	$5
Output	$30
Cached input	$0.50

For context above 272K tokens:

Token Type	Price per 1M tokens
Input	$10
Output	$45
Cached input	$1.00

Verified: These prices are listed on Sakana’s Fugu pricing page.

Pricing twist

Sakana says Fugu does not stack model fees when multiple agents are active.

Instead, you pay one rate based on the top-tier model involved.

That is important.

Multi-agent systems can get expensive fast when every agent call adds another provider bill.

Fugu’s pitch is that the pricing does not multiply just because more agents are active.

Still, watch your token usage.

More orchestration can mean more tokens.

More tokens still means more cost.

7. Setup Notes

Sakana says Fugu is available through an OpenAI-compatible API.

That means you should be able to point an existing client or coding harness at the Fugu endpoint using an API key, without an SDK migration.

Verified setup concept

Use existing OpenAI-compatible client
Set base URL to Sakana Fugu endpoint
Use Sakana API key
Choose model: Fugu or Fugu Ultra
Send requests through the same client pattern

Not verified

Exact base URL
Exact API key environment variable name
Exact model string for regular Fugu
Full curl command
SDK examples

Sakana’s public page confirms OpenAI-compatible access, but the exact command template was not visible in the accessible page.

Do not invent it.

8. What Beginners Get Wrong

Wrong assumption: this is just another model

No.

This is closer to a model manager.

Fugu coordinates multiple agents. The orchestration is the product.

Wrong assumption: benchmarks mean production-ready

No.

Benchmarks are controlled tests. Production workloads involve messy inputs, latency limits, budget limits, compliance rules, and failure recovery.

Wrong assumption: Fugu Ultra is always the default

No.

Fugu Ultra is for harder work where quality matters more than speed.

For normal interactive work, Fugu is probably the cleaner first test.

Wrong assumption: you can audit every model choice

No.

Sakana says the specific models selected and coordination method are proprietary and not exposed.

That can be a dealbreaker for teams that need routing transparency.

Wrong assumption: EU users can use it now

No.

Sakana says Fugu is not available in the EU/EEA right now while it works toward GDPR and EU-specific compliance.

For a Berlin-based user, this matters immediately.

9. Where It Breaks

1. Transparency

You cannot see the exact underlying model route.

That makes debugging harder.

If the answer fails, you may not know whether the issue came from planning, execution, verification, or a specific model in the pool.

2. Compliance

Fugu lets users opt out of specific models for the regular Fugu model, but Fugu Ultra uses the full fixed agent pool.

If your company has provider restrictions, check this before building around Ultra.

3. Latency

More agents usually means more coordination.

More coordination usually means slower responses.

Sakana directly says Fugu Ultra prioritizes answer quality at the cost of response time.

4. Cost surprises

The pricing looks clean, but long-running agentic jobs can burn tokens quickly.

Especially with:

Code review
Research
Patent analysis
Literature review
Security assessment
Large context tasks

5. Availability

EU/EEA access is blocked right now.

That alone makes it unusable for a chunk of builders until compliance changes.

10. Recommended Workflow

Use Fugu like an evaluation layer, not blind infrastructure.

Step 1: Start with low-risk tasks

Good first tests:

Code review on non-sensitive repos
Drafting technical analysis
Comparing papers
Finding edge cases
Reviewing architecture proposals
Turning messy notes into structured plans

Avoid first tests like:

Production security decisions
Legal analysis
Financial trading decisions
Sensitive customer data
Unreviewed code execution

Step 2: Compare against your current model stack

Run the same task through:

Your current best model
Fugu
Fugu Ultra

Score them on:

Accuracy
Completeness
Latency
Cost
Failure mode
Formatting
Usefulness of reasoning
Human edits required

Step 3: Use Fugu Ultra only where it earns the wait

Do not waste Ultra on simple tasks.

Use it for:

Multi-file code reasoning
Research synthesis
Paper reproduction
Complex debugging
Security reports
Patent landscape analysis
Long-horizon planning

Step 4: Keep human review in the loop

Fugu is still AI output.

Treat it like a powerful analyst, not a final authority.

By The AI Leverage - Learn and master AI daily