📋 Table of Contents
What Sakana Fugu is
Why it matters
What it replaces
Fugu vs Fugu Ultra
Benchmarks and source claims
Pricing
Setup notes
What beginners get wrong
Where it breaks
Quick reference
What Sakana Fugu Is
Sakana Fugu is a multi-agent orchestration system wrapped as one model API.
That is the important part.
You do not manually call five different models.
You do not build your own routing layer from scratch.
You do not hard-code “use Model A for coding, Model B for reasoning, Model C for long context.”
You send a request to Fugu.
Fugu coordinates the model pool behind the scenes.
Verified: Sakana describes Fugu as a system that dynamically orchestrates “the world’s best models” through a single API.
The clean way to think about it:
Fugu is not the worker.
Fugu is the operator deciding which workers should touch the job.
That is why this launch matters.
The model market is crowded. The routing layer is still early.
2. Why It Matters
The beginner question is:
“Which model is best?”
The better builder question is:
“Which system knows which model to use for each part of the task?”
That is the real shift.
Modern AI work is no longer one prompt, one answer.
Real workflows look like this:
Plan the task
Search or inspect context
Write code
Check edge cases
Verify output
Rewrite
Run again
Decide when the result is good enough
One model can do all of that badly.
A multi-agent setup can split the job.
Sakana’s pitch is that Fugu learns how to assemble and coordinate those agents instead of relying on hand-designed workflows.
That is the unlock.
Not more prompts.
Better delegation.
3. What It Replaces
Fugu can replace part of your manual model-selection workflow.
Before Fugu
You decide:
Which model to use
When to switch models
When to ask for verification
Which model handles code
Which model handles reasoning
Which provider fits the task
How to keep cost under control
With Fugu
You call one API.
Fugu handles model selection and switching.
Verified: Sakana says Fugu provides access to a coordinated pool of specialized models through one API and handles model selection and switching for each task.
What it does not replace:
Clear task design
Good prompts
Evaluation
Human review
Security boundaries
Cost monitoring
Compliance checks
Wrong assumption: orchestration means you can stop thinking.
No.
It means the model stack gets more automated, but your workflow still needs guardrails.
4. Fugu vs Fugu Ultra
Sakana offers two models:
Model | Best For | Tradeoff |
|---|---|---|
Fugu | Everyday coding, code review, interactive work, chatbots | Balanced quality and latency |
Fugu Ultra | Harder multi-step reasoning, research, paper reproduction, Kaggle, cybersecurity, patent analysis | Higher quality, slower response |
Verified: Sakana says Fugu balances latency and quality, while Fugu Ultra prioritizes answer quality on complex multi-step work.
Use Fugu when speed matters.
Use Fugu Ultra when getting the answer right matters more than waiting.
That is the practical split.
5. Benchmarks and Source Claims
Sakana makes aggressive benchmark claims.
Their table shows Fugu and Fugu Ultra compared against Opus 4.8, Gemini 3.1 Pro, and GPT 5.5 across tasks like:
SWE Bench Pro
TerminalBench 2.1
LiveCodeBench
LiveCodeBench Pro
Humanity’s Last Exam
GPQA-D
SciCode
Long Context Reasoning
MRCRv2
Source claim: Sakana says Fugu models surpass publicly accessible frontier models and sit close to Fable 5 and Mythos Preview across engineering, scientific, and reasoning benchmarks.
Important:
Sakana says Fable 5 and Mythos Preview are not in Fugu’s agent pool because they are not publicly accessible.
That matters because the claim is not:
“Fugu is secretly using Fable or Mythos.”
The claim is:
“Fugu’s orchestration system can compete with them without using them.”
That is a very different statement.
Example benchmark numbers from Sakana
Benchmark | Fugu | Fugu Ultra |
|---|---|---|
SWE Bench Pro | 59.0 | 73.7 |
TerminalBench 2.1 | 80.2 | 82.1 |
LiveCodeBench | 92.9 | 93.2 |
LiveCodeBench Pro | 87.8 | 90.8 |
Humanity’s Last Exam | 47.2 | 50.0 |
GPQA-D | 95.5 | 95.5 |
Source claim: These numbers are from Sakana’s own benchmark table, not independent testing.
Use the numbers carefully.
Strong hook.
Not final proof.
6. Pricing
Sakana offers subscription and pay-as-you-go pricing.
Subscription plans
Plan | Price | Usage Positioning |
|---|---|---|
Standard | $20/month | Lightweight daily usage |
Pro | $100/month | 10x Standard usage |
Max | $200/month | 20x Standard usage |
Verified: Every subscription tier includes both Fugu and Fugu Ultra.
Fugu Ultra token pricing
For fugu-ultra-20260615:
Token Type | Price per 1M tokens |
|---|---|
Input | $5 |
Output | $30 |
Cached input | $0.50 |
For context above 272K tokens:
Token Type | Price per 1M tokens |
|---|---|
Input | $10 |
Output | $45 |
Cached input | $1.00 |
Verified: These prices are listed on Sakana’s Fugu pricing page.
Pricing twist
Sakana says Fugu does not stack model fees when multiple agents are active.
Instead, you pay one rate based on the top-tier model involved.
That is important.
Multi-agent systems can get expensive fast when every agent call adds another provider bill.
Fugu’s pitch is that the pricing does not multiply just because more agents are active.
Still, watch your token usage.
More orchestration can mean more tokens.
More tokens still means more cost.
7. Setup Notes
Sakana says Fugu is available through an OpenAI-compatible API.
That means you should be able to point an existing client or coding harness at the Fugu endpoint using an API key, without an SDK migration.
Verified setup concept
Use existing OpenAI-compatible client
Set base URL to Sakana Fugu endpoint
Use Sakana API key
Choose model: Fugu or Fugu Ultra
Send requests through the same client patternNot verified
Exact base URL
Exact API key environment variable name
Exact model string for regular Fugu
Full curl command
SDK examples
Sakana’s public page confirms OpenAI-compatible access, but the exact command template was not visible in the accessible page.
Do not invent it.
8. What Beginners Get Wrong
Wrong assumption: this is just another model
No.
This is closer to a model manager.
Fugu coordinates multiple agents. The orchestration is the product.
Wrong assumption: benchmarks mean production-ready
No.
Benchmarks are controlled tests. Production workloads involve messy inputs, latency limits, budget limits, compliance rules, and failure recovery.
Wrong assumption: Fugu Ultra is always the default
No.
Fugu Ultra is for harder work where quality matters more than speed.
For normal interactive work, Fugu is probably the cleaner first test.
Wrong assumption: you can audit every model choice
No.
Sakana says the specific models selected and coordination method are proprietary and not exposed.
That can be a dealbreaker for teams that need routing transparency.
Wrong assumption: EU users can use it now
No.
Sakana says Fugu is not available in the EU/EEA right now while it works toward GDPR and EU-specific compliance.
For a Berlin-based user, this matters immediately.
9. Where It Breaks
1. Transparency
You cannot see the exact underlying model route.
That makes debugging harder.
If the answer fails, you may not know whether the issue came from planning, execution, verification, or a specific model in the pool.
2. Compliance
Fugu lets users opt out of specific models for the regular Fugu model, but Fugu Ultra uses the full fixed agent pool.
If your company has provider restrictions, check this before building around Ultra.
3. Latency
More agents usually means more coordination.
More coordination usually means slower responses.
Sakana directly says Fugu Ultra prioritizes answer quality at the cost of response time.
4. Cost surprises
The pricing looks clean, but long-running agentic jobs can burn tokens quickly.
Especially with:
Code review
Research
Patent analysis
Literature review
Security assessment
Large context tasks
5. Availability
EU/EEA access is blocked right now.
That alone makes it unusable for a chunk of builders until compliance changes.
10. Recommended Workflow
Use Fugu like an evaluation layer, not blind infrastructure.
Step 1: Start with low-risk tasks
Good first tests:
Code review on non-sensitive repos
Drafting technical analysis
Comparing papers
Finding edge cases
Reviewing architecture proposals
Turning messy notes into structured plans
Avoid first tests like:
Production security decisions
Legal analysis
Financial trading decisions
Sensitive customer data
Unreviewed code execution
Step 2: Compare against your current model stack
Run the same task through:
Your current best model
Fugu
Fugu Ultra
Score them on:
Accuracy
Completeness
Latency
Cost
Failure mode
Formatting
Usefulness of reasoning
Human edits required
Step 3: Use Fugu Ultra only where it earns the wait
Do not waste Ultra on simple tasks.
Use it for:
Multi-file code reasoning
Research synthesis
Paper reproduction
Complex debugging
Security reports
Patent landscape analysis
Long-horizon planning
Step 4: Keep human review in the loop
Fugu is still AI output.
Treat it like a powerful analyst, not a final authority.
By The AI Leverage - Learn and master AI daily

