📋 Table of Contents

  1. What Sakana Fugu is

  2. Why it matters

  3. What it replaces

  4. Fugu vs Fugu Ultra

  5. Benchmarks and source claims

  6. Pricing

  7. Setup notes

  8. What beginners get wrong

  9. Where it breaks

  10. Quick reference

What Sakana Fugu Is

Sakana Fugu is a multi-agent orchestration system wrapped as one model API.

That is the important part.

You do not manually call five different models.
You do not build your own routing layer from scratch.
You do not hard-code “use Model A for coding, Model B for reasoning, Model C for long context.”

You send a request to Fugu.

Fugu coordinates the model pool behind the scenes.

Verified: Sakana describes Fugu as a system that dynamically orchestrates “the world’s best models” through a single API.

The clean way to think about it:

Fugu is not the worker.
Fugu is the operator deciding which workers should touch the job.

That is why this launch matters.

The model market is crowded. The routing layer is still early.

2. Why It Matters

The beginner question is:

“Which model is best?”

The better builder question is:

“Which system knows which model to use for each part of the task?”

That is the real shift.

Modern AI work is no longer one prompt, one answer.

Real workflows look like this:

  • Plan the task

  • Search or inspect context

  • Write code

  • Check edge cases

  • Verify output

  • Rewrite

  • Run again

  • Decide when the result is good enough

One model can do all of that badly.

A multi-agent setup can split the job.

Sakana’s pitch is that Fugu learns how to assemble and coordinate those agents instead of relying on hand-designed workflows.

That is the unlock.

Not more prompts.

Better delegation.

3. What It Replaces

Fugu can replace part of your manual model-selection workflow.

Before Fugu

You decide:

  • Which model to use

  • When to switch models

  • When to ask for verification

  • Which model handles code

  • Which model handles reasoning

  • Which provider fits the task

  • How to keep cost under control

With Fugu

You call one API.

Fugu handles model selection and switching.

Verified: Sakana says Fugu provides access to a coordinated pool of specialized models through one API and handles model selection and switching for each task.

What it does not replace:

  • Clear task design

  • Good prompts

  • Evaluation

  • Human review

  • Security boundaries

  • Cost monitoring

  • Compliance checks

Wrong assumption: orchestration means you can stop thinking.

No.

It means the model stack gets more automated, but your workflow still needs guardrails.

4. Fugu vs Fugu Ultra

Sakana offers two models:

Model

Best For

Tradeoff

Fugu

Everyday coding, code review, interactive work, chatbots

Balanced quality and latency

Fugu Ultra

Harder multi-step reasoning, research, paper reproduction, Kaggle, cybersecurity, patent analysis

Higher quality, slower response

Verified: Sakana says Fugu balances latency and quality, while Fugu Ultra prioritizes answer quality on complex multi-step work.

Use Fugu when speed matters.

Use Fugu Ultra when getting the answer right matters more than waiting.

That is the practical split.

5. Benchmarks and Source Claims

Sakana makes aggressive benchmark claims.

Their table shows Fugu and Fugu Ultra compared against Opus 4.8, Gemini 3.1 Pro, and GPT 5.5 across tasks like:

  • SWE Bench Pro

  • TerminalBench 2.1

  • LiveCodeBench

  • LiveCodeBench Pro

  • Humanity’s Last Exam

  • GPQA-D

  • SciCode

  • Long Context Reasoning

  • MRCRv2

Source claim: Sakana says Fugu models surpass publicly accessible frontier models and sit close to Fable 5 and Mythos Preview across engineering, scientific, and reasoning benchmarks.

Important:

Sakana says Fable 5 and Mythos Preview are not in Fugu’s agent pool because they are not publicly accessible.

That matters because the claim is not:

“Fugu is secretly using Fable or Mythos.”

The claim is:

“Fugu’s orchestration system can compete with them without using them.”

That is a very different statement.

Example benchmark numbers from Sakana

Benchmark

Fugu

Fugu Ultra

SWE Bench Pro

59.0

73.7

TerminalBench 2.1

80.2

82.1

LiveCodeBench

92.9

93.2

LiveCodeBench Pro

87.8

90.8

Humanity’s Last Exam

47.2

50.0

GPQA-D

95.5

95.5

Source claim: These numbers are from Sakana’s own benchmark table, not independent testing.

Use the numbers carefully.

Strong hook.

Not final proof.

6. Pricing

Sakana offers subscription and pay-as-you-go pricing.

Subscription plans

Plan

Price

Usage Positioning

Standard

$20/month

Lightweight daily usage

Pro

$100/month

10x Standard usage

Max

$200/month

20x Standard usage

Verified: Every subscription tier includes both Fugu and Fugu Ultra.

Fugu Ultra token pricing

For fugu-ultra-20260615:

Token Type

Price per 1M tokens

Input

$5

Output

$30

Cached input

$0.50

For context above 272K tokens:

Token Type

Price per 1M tokens

Input

$10

Output

$45

Cached input

$1.00

Verified: These prices are listed on Sakana’s Fugu pricing page.

Pricing twist

Sakana says Fugu does not stack model fees when multiple agents are active.

Instead, you pay one rate based on the top-tier model involved.

That is important.

Multi-agent systems can get expensive fast when every agent call adds another provider bill.

Fugu’s pitch is that the pricing does not multiply just because more agents are active.

Still, watch your token usage.

More orchestration can mean more tokens.

More tokens still means more cost.

7. Setup Notes

Sakana says Fugu is available through an OpenAI-compatible API.

That means you should be able to point an existing client or coding harness at the Fugu endpoint using an API key, without an SDK migration.

Verified setup concept

Use existing OpenAI-compatible client
Set base URL to Sakana Fugu endpoint
Use Sakana API key
Choose model: Fugu or Fugu Ultra
Send requests through the same client pattern

Not verified

  • Exact base URL

  • Exact API key environment variable name

  • Exact model string for regular Fugu

  • Full curl command

  • SDK examples

Sakana’s public page confirms OpenAI-compatible access, but the exact command template was not visible in the accessible page.

Do not invent it.

8. What Beginners Get Wrong

Wrong assumption: this is just another model

No.

This is closer to a model manager.

Fugu coordinates multiple agents. The orchestration is the product.

Wrong assumption: benchmarks mean production-ready

No.

Benchmarks are controlled tests. Production workloads involve messy inputs, latency limits, budget limits, compliance rules, and failure recovery.

Wrong assumption: Fugu Ultra is always the default

No.

Fugu Ultra is for harder work where quality matters more than speed.

For normal interactive work, Fugu is probably the cleaner first test.

Wrong assumption: you can audit every model choice

No.

Sakana says the specific models selected and coordination method are proprietary and not exposed.

That can be a dealbreaker for teams that need routing transparency.

Wrong assumption: EU users can use it now

No.

Sakana says Fugu is not available in the EU/EEA right now while it works toward GDPR and EU-specific compliance.

For a Berlin-based user, this matters immediately.

9. Where It Breaks

1. Transparency

You cannot see the exact underlying model route.

That makes debugging harder.

If the answer fails, you may not know whether the issue came from planning, execution, verification, or a specific model in the pool.

2. Compliance

Fugu lets users opt out of specific models for the regular Fugu model, but Fugu Ultra uses the full fixed agent pool.

If your company has provider restrictions, check this before building around Ultra.

3. Latency

More agents usually means more coordination.

More coordination usually means slower responses.

Sakana directly says Fugu Ultra prioritizes answer quality at the cost of response time.

4. Cost surprises

The pricing looks clean, but long-running agentic jobs can burn tokens quickly.

Especially with:

  • Code review

  • Research

  • Patent analysis

  • Literature review

  • Security assessment

  • Large context tasks

5. Availability

EU/EEA access is blocked right now.

That alone makes it unusable for a chunk of builders until compliance changes.

Use Fugu like an evaluation layer, not blind infrastructure.

Step 1: Start with low-risk tasks

Good first tests:

  • Code review on non-sensitive repos

  • Drafting technical analysis

  • Comparing papers

  • Finding edge cases

  • Reviewing architecture proposals

  • Turning messy notes into structured plans

Avoid first tests like:

  • Production security decisions

  • Legal analysis

  • Financial trading decisions

  • Sensitive customer data

  • Unreviewed code execution

Step 2: Compare against your current model stack

Run the same task through:

  • Your current best model

  • Fugu

  • Fugu Ultra

Score them on:

  • Accuracy

  • Completeness

  • Latency

  • Cost

  • Failure mode

  • Formatting

  • Usefulness of reasoning

  • Human edits required

Step 3: Use Fugu Ultra only where it earns the wait

Do not waste Ultra on simple tasks.

Use it for:

  • Multi-file code reasoning

  • Research synthesis

  • Paper reproduction

  • Complex debugging

  • Security reports

  • Patent landscape analysis

  • Long-horizon planning

Step 4: Keep human review in the loop

Fugu is still AI output.

Treat it like a powerful analyst, not a final authority.

By The AI Leverage - Learn and master AI daily

Keep Reading