rapind
ANTHROPIC_MODEL=deepseek-v4-pro[1m] ANTHROPIC_SUBAGENT_MODEL=deepseek-v4-flash
This is what I’ve been using for non-confidential projects for about a week now (soon after v4 came out). I honestly can’t tell the difference, but I’m not doing anything crazy with it either.
Worth noting that I don’t think DeepSeek’s API lets you opt out of training. Once this is up on other providers though… (OpenRouter is just proxying to DeepSeek atm)
lhl
For those that don't want their data trained on, OpenRouter allows you to have account-wide or per-request routing with either provider.data_collection: "deny" or zdr: true (zero data retention).
Also, you can use HuggingFace Inference for DeepSeek V4 or Kimi K2.6, both of which work quite well and route through providers that you can enable/disable (like Together AI, DeepInfra, etc) - you'll have to check their policies but I think most of those commercial inference providers claim to not train on your data either.
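Per-request, that looks something like this (a sketch based on OpenRouter's provider-preferences options; these are the field names I believe OpenRouter uses, so double-check their docs):
----
# Only route to providers that don't collect/train on prompts,
# and that offer zero data retention.
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v4-pro",
    "provider": { "data_collection": "deny", "zdr": true },
    "messages": [{ "role": "user", "content": "hello" }]
  }'
----
The account-wide version is a setting in your OpenRouter privacy preferences.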
jorvi
That doesn't work; if you do that, it will mark DeepSeek's models with a warning symbol along with the error "paid model training violation".
miroljub
I wonder why the question of data security and training comes up so often with DeepSeek, Kimi, and GLM, and never with Anthropic, OpenAI, and Google models.
Why is that?
IIRC, US data protection law protects the data of US citizens only; foreigners' data is not protected, and the companies are not even allowed to disclose when they collect it.
ricardobeat
ANTHROPIC_SUBAGENT_MODEL is not a valid setting, should be CLAUDE_CODE_SUBAGENT_MODEL.
rapind
This is correct. Sorry, I was using my phone to post. Here's what my bash alias looks like, verbatim (.bashrc / .zshrc). The DEEPSEEK_API_KEY var is set up separately (so claude doesn't see it):
----
alias clauded='ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic ANTHROPIC_AUTH_TOKEN=$DEEPSEEK_API_KEY ANTHROPIC_MODEL=deepseek-v4-pro[1m] ANTHROPIC_DEFAULT_OPUS_MODEL=deepseek-v4-pro[1m] ANTHROPIC_DEFAULT_SONNET_MODEL=deepseek-v4-pro[1m] ANTHROPIC_DEFAULT_HAIKU_MODEL=deepseek-v4-flash CLAUDE_CODE_SUBAGENT_MODEL=deepseek-v4-flash CLAUDE_CODE_EFFORT_LEVEL=max claude'
----
I doubt the opus, sonnet, and haiku model args actually matter, so you can probably omit them if you want.
I run this on a VPS that has no other credentials or project access so I can give it the skip permissions arg.
maxgashkov
As of now, OpenRouter offers multiple providers for DeepSeek with ZDR (not sure if they respect it but still).
vidarh
At several times the price of DeepSeek, though, so it's a tradeoff... Even then Pro is still cheaper than Haiku.
tariky
I wanted to try this. To bring back opus and sonnet, do I just reset those envs?
snqb
Yes, this is pretty much just rerouting Claude to call DeepSeek's Anthropic-compatible endpoints instead of its own defaults. Once the vars are removed, it'll work just like before.
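Concretely: if you used the alias from earlier in the thread, a plain `claude` is already untouched. If you exported the vars in your shell instead, unsetting them (a sketch, using the var names from that alias) restores the defaults:
----
unset ANTHROPIC_BASE_URL ANTHROPIC_AUTH_TOKEN ANTHROPIC_MODEL \
      ANTHROPIC_DEFAULT_OPUS_MODEL ANTHROPIC_DEFAULT_SONNET_MODEL \
      ANTHROPIC_DEFAULT_HAIKU_MODEL CLAUDE_CODE_SUBAGENT_MODEL \
      CLAUDE_CODE_EFFORT_LEVEL
----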
ianmurrays
Correct.
varenc
The more interesting part of deepclaude is the local proxy it runs to switch models mid-session and do combined cost tracking. Though these features seem quite buried in the LLM-generated readme. Looking at the history, it appears they were added later, and the readme wasn't restructured to highlight this.
Also, the author checked in their apparently effective social media advertising plan: https://github.com/aattaran/deepclaude/commit/a90a399682defc... (which seems to be working)
yard2010
How come such slop is allowed here, what value do these vibe coded zero shot "projects" add? Why not just post the prompt?
throwatdem12311
Seriously. When I first looked, this project's first commit had been pushed two hours prior. Projects should be at least 3 months old or automatically removed.
woctordho
For the same reason that GitHub has a releases page for uploading binaries.
jpadkins
Is the value in the working outputs or the inputs? A prompt alone would not let you recreate this project.
fragmede
Convenience? Am I supposed to take the prompt and use my own tokens on it? Why should I have to do that?
otabdeveloper4
Recruiters used to use the candidate's Github "sources" page for evaluating candidates as a kind of proof-of-work.
jimmypk
[flagged]
aaurelions
It seems like any project that makes fun of Claude is bound to reach the top spot on Hacker News. Even if it’s just a project consisting of four lines of code.
oblio
You're just mean. I count 6 lines of code!
ihsw
[dead]
spirit23
So I created https://getaivo.dev, where you can use any model in the coding agent directly. Just `aivo claude -m deepseek-v4-pro`
Tanxsinxlnx
Does it support the AWS Bedrock provider? And can I use any model with this?
spirit23
Ah, for AWS Bedrock, just use `aivo keys add` to add the base URL and API key, and everything is ready. Run `aivo models` to see the available models.
spirit23
Currently no, but it can be added
KronisLV
Wonder if there's a way to launch the desktop Claude app like that, especially on Windows, not just the Claude Code TUI/CLI. Might not be possible and you'd just have to use --remote as a workaround.
btbuildem
This in essence is what allows one to use any model with CC -- including local.
neutrinobro
I know. I'm struggling to understand how this is a GitHub repo/HN article. I've been using claude-code with a llama.cpp server and a dummy API key, and all that's required is to define 2 environment variables to point claude at the local endpoint. Am I missing something?
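For anyone curious, the whole "setup" is roughly this (a sketch; it assumes your local server exposes an Anthropic-compatible endpoint on port 8080, llama.cpp's default, so adjust as needed):
----
# Point Claude Code at the local endpoint; the token just needs to be non-empty.
ANTHROPIC_BASE_URL=http://localhost:8080 \
ANTHROPIC_AUTH_TOKEN=dummy \
claude
----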
port11
DeepClaude doesn't support MCP tool use; does your solution work with MCP tools such as Serena?
nadermx
The AI wars have begun
heisenbit
And they are enticing human agents to further their agendas using techniques learned from the white mice.
stingraycharles
This has been possible since the beginning.
vitaflo
I'm not exactly sure what the point of this is. Deepseek already has instructions to use its API with many CLI's including Claude Code directly:
https://api-docs.deepseek.com/quick_start/agent_integrations...
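The gist of those instructions (reconstructed from the alias shared elsewhere in this thread; check the linked docs for the current model names):
----
export ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic
export ANTHROPIC_AUTH_TOKEN=$DEEPSEEK_API_KEY
export ANTHROPIC_MODEL=deepseek-v4-pro
claude
----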
varenc
The readme absolutely buries the features that are actually non-trivial: It runs a proxy to switch models mid-session, and does combined cost tracking between Anthropic and other models you might be using. The LLM that wrote the readme never updated the general project description to highlight these features.
Also the author checked in their advertising plan: https://github.com/aattaran/deepclaude/commit/a90a399682defc...
eloisant
Also, Claude Code is one of the worst CLIs; the only good thing about it is that it's the default for Claude.
I don't see why anyone would want to use it for any other model than Claude instead of OpenCode or Pi.
2ndorderthought
There probably isn't a point. Someone didn't understand something, didn't research it, so they 1 shotted their first thought and sent it to the front page of HN and all of their socials. It's the future bruh
georgeburdell
I embrace it at this point. It ends all the shilling of vibe coded tools at work that I have endured over the past year. Everyone can now make their own tools with zero obligation to coordinate beyond shared hardware resources
altmanaltman
To be fair, HN sent it to the front page, not the user. With the rest I agree.
sumeno
A project that obviously bought stars on GitHub probably bought upvotes on HN too
dev_hugepages
And now, because we all upvoted and commented on it, the new user's vibe-coded slop is on the front page.
2ndorderthought
Same place same time tomorrow?
croes
From vibe coders for vibe coders
2ndorderthought
I don't always copy paste vibe coded project readme mds into Claude code and ask them to rewrite it but when I do... actually that's all I do now because my goal in life is to make wealthy overvalued companies wealthier.
incrudible
Anthropic is the opposite of wealthy, the more you use their service, the more money they lose. Unless you think your precious MDs being used for training data is gonna make them rich eventually.
crooked-v
I'm curious how well it actually works. I tried Deepseek with Hermes and Opencode and it seemed extremely bad about using some of the basic tools given, like the Hermes holographic memory tools, even with system prompt instructions strongly pointing them out.
_345
I've been experimenting with Hermes, and I'm convinced Hermes is also just bad. As a harness, it has got to be doing something to lobotomize these models; even GPT-5.4 performs badly in Hermes vs just using it in Codex.
ttoinou
I thought the tool format wasn't exactly the same? So plugging any AI into Claude Code requires a format conversion.
selcuka
DeepSeek has a dedicated Anthropic-compatible endpoint [1].
miroljub
This one still lacks some features; they still recommend using their OpenAI-compatible endpoint.
But I guess Anthropic is just not capable of implementing an OpenAI-compatible client in Claude Code.
ricardobeat
Many of them expose “anthropic-compatible” APIs for this very purpose.
faangguyindia
Qwen also offers an OpenAI-compatible endpoint.
TacticalCoder
It's really getting a lot of upvotes, almost as if people were feeling locked in and wanted a way out, but...
Why would you keep using CC CLI if you want to use the much cheaper DeepSeek v4 models (Flash and Pro): isn't it the opportunity to kiss CC CLI goodbye and use something not controlled by Anthropic?
Anyone here successfully moved from CC CLI to a fully open-source project? I'm asking this as a Claude Code CLI (Sonnet/Opus) user. My "stack" is all open-source: from Linux to Emacs to what-have-you. I'd rather also have open-weight models and a fully open-source (not controlled by a single company) AI CLI.
Any suggestion for something that works well? (by "well" I mean "as well as Claude Code CLI", which is not a panacea so my bar ain't the end of the world either).
justech
If you're looking for Claude Code alternatives, I would first suggest looking into pi.dev or opencode for your harness. And then for models, you can choose from OpenCode Go (IMO the most cost-effective at the moment), OpenRouter, or direct from DeepSeek. Better yet, go the Kimi route IMO and just buy a subscription from kimi.com.
wolttam
I’m going to throw my harness in the ring: https://codeberg.org/mlow/lmcli
taocoyote
Looks interesting. Does it offer anything special that pi.dev or opencode does not?
wolttam
Probably not, `lmcli` is very lean. I would consider it a slightly lower-level tool than either pi.dev or opencode. E.g. there is no built-in coding agent, but it's easy to build one up in the config with your own prompt (or use the example).
It's proven useful for me, and I figure others might appreciate how light of a shim it is between you and the models.
Aeroi
Agreed. OpenCode is a strong base, and with a couple of modifications it can become a very effective harness. For my side project mouse.dev, I've been combining parts from OpenCode, Claude Code, and Hermes to build a cloud agent architecture that works well from mobile.
ryanlitalien
Kudos! Cool idea, I'm on the same path you are, yet you're just one step ahead. For mouse.dev, what are you using for the cloud agent sandbox piece? I haven't moved my agents to the cloud yet (for on the go mobile enablement). Would Islo be a competitor to mouse?
https://www.incredibuild.com/blog/why-we-built-islo-ai-codin...
Aeroi
Cool! I've mostly been building toward what coding from an iPhone can look like. The cloud agent sandbox portion is definitely not polished yet but is working well so far. I looked at Daytona, E2B, Modal, etc., but decided to roll my own with Fly.io, with a TTL on agent create. Mouse uses per-thread sandboxes (not shared-container multi-workspace) and then Postgres for agent history, etc.
I'll have to look more at Islo. I definitely think it's a growing space with a lot of opportunity for those that participate and solve problems.
CharlesW
> OpenCode is a strong base, and with a couple modifications it can become a very effective harness.
I personally didn't find it to be competitive with Claude Code as a harness. Can I ask how you modified it to perform better?
eloisant
What issues do you have with OpenCode?
Personally I use it for the TUI, it's way better than Claude Code's one.
Aeroi
I haven't run formal evals, but I improved the experience for my own needs and it feels noticeably better with these modifications:
- Claude-style subagents
- an MCP layer for higher-level tools
- Cursor-style control plane modes like Ask, Plan, Debug, and Build
The MCP layer lets the harness use things like GitHub file/code read, PR creation, web search/fetch, structured user questions, plan-mode switching, user skills, and subagents.
So the improvement is mostly from better UI/UX orchestration and tool access. There are some things from Hermes that are interesting as well.
Most of my focus has been on applying this stack to sandboxed cloud agents so you can properly code and work from mobile devices.
I can't definitively say that the stack is better or worse than Claude code, more just tuned for my use case I guess.
adobrawy
I'm a Claude Code Web fan and a rather heavy user. So I was interested in your product. However, I couldn't find an answer on the website. What parts did you find so good that you ported them?
Aeroi
Nothing groundbreaking, but I'll do a blog writeup on the architecture if it would be helpful for people. My focus has been on mobile.
The main pieces I've integrated for mouse.dev, inspired by claude/cursor, were plan mode, agent questions, subagents, pre/post hooks, context compaction, repo-local skills, and permission modes. So mostly tools like enter_plan_mode, ask_user_question, and spawn_subagent, plus .mouse/skills and .mouse/plans.
One nice feature is continuity. If you’re working on desktop and save a plan to .mouse/plans, you can pick it up later on mobile with cloud agents, or do the reverse. You can plan something from your phone, then when you’re back at your desk, review it/build it. That was my initial goal with this project because I've found the plan act loop so helpful.
Mouse Cloud Agents is mostly an OpenCode-based harness, but everything routes through our MCP/event system so it’s mobile-first and provider-agnostic.
I intentionally skipped a lot of IDE and Claude Code style desktop features. The bet is that this new style of coding is becoming less “edit files in an IDE” and more “steer a capable coding chatbot”.
Would love to hear from anyone reading that's iterating on harness architecture, it's been really fun to work on.
mgoetzke
I liked pi.dev, but why is registering endpoints and models not as simple as possible? Or am I missing something? I always have to fiddle with the config file.
miroljub
Editing config files is not necessary. Just do /login from your session, choose your provider, and there you go.
aaurelions
Another very cost-effective option is Ollama Cloud. In a month of use, I only hit the 5-hour limit once, when I ran 8 agents simultaneously for 2 hours.
tomw1808
For me it's unbearably slow, especially with deepseek v4 pro. Is that just me? I literally signed up and canceled again, because for one prompt I waited around 5 minutes to get 600 tokens back (via ollama launch claude --mode ...)
kopirgan
On which tier?
postatic
Definitely worth it. I have Ollama Cloud, OpenCode, and Hermes running to test them all out; working great so far.
cpursley
How does the kimi subscription compare to Codex and Claude Code in terms of how much mileage you get for the pricing? I mean, I see the prices, but I'm not sure how much usage that buys.
bakugo
> I would first suggest looking into pi.dev
Looked into this one. Thought it was suspicious that it only had 7 open issues on github. Turns out they have a bot that auto-closes every single issue just because.
I honestly have no words.
mikeocool
Their process is outlined here: https://github.com/badlogic/pi-mono/blob/main/CONTRIBUTING.m...
> Maintainers review auto-closed issues daily and reopen worthwhile ones. Issues that do not meet the quality bar below will not be reopened or receive a reply.
Seems like not an unreasonable way to deal with the problem of large numbers of low quality issues being submitted.
oefrha
If that process actually happens, then there's absolutely no reason not to have the reviewing maintainer close it after review instead. The only reasonable conclusion is that the documented process is aspirational at best and vibed itself at worst.
cromka
Sounds like a perfect way to agitate the community, going against the established culture like that.
altmanaltman
But how is it any different from keeping them open?
Like if they are going to sort through all the issues eventually (like they claim), why not just close the ones that are not worthy when they get to them instead of closing all by default?
Is it just so that the project doesn't have open issues on its GitHub page? But they are open issues in reality, because the maintainer will eventually go through them?
Nothing is "unreasonable" in the sense that an open source project should have the right to do what it wants with its rules, but it's definitely a weird stance.
__cayenne__
The maintainer, Mario, sometimes declares the repo is on an “issue holiday” where issues are auto closed. This particular holiday is because there is a big refactor coming up. In non holiday periods issues can be reported as normal.
skeledrew
They have a pretty decent explanation.
https://github.com/badlogic/pi-mono/blob/main/CONTRIBUTING.m...
DetroitThrow
"Decent" is doing some work. This is going beyond any norms I've encountered in OSS to close issues by default via a LLM or an "issue holiday".
LPisGood
The idea is for it to be extremely minimal, which strikes me as a very opinionated stance, and not one I agree with.
justinhj
It's a very interesting project. Many popular open source projects are inundated with poor-quality issues and PRs, hence the defences they are starting to erect.
DeathArrow
>If you're looking for Claude Code alternatives, I would first suggest looking into pi.dev or opencode for your harness.
While those are nice, Claude Code has the largest number of plugins and skills I want to use.
wizhi
Aren't skills just literal plaintext files? Why not just copy them?
DeathArrow
Yes, they are .md files but they can rely on builtin behaviors in the harness or on plugins.
rsanek
>DeepSeek V4 Pro scores 96.4% on LiveCodeBench and costs $0.87/M output tokens
This is a heavily subsidized price and will only last until the end of the month: "The deepseek-v4-pro model is currently offered at a 75% discount, extended until 2026/05/31 15:59 UTC." [0]
The "supported backends" table is also deceiving -- while OpenRouter's server's may be in the US, the only way to get the $0.44/$0.87 pricing is to pass through to the DeepSeek API, which of course is China-based. [1]
I do think the model is quite good, I myself use it through Ollama Cloud for simple tasks. But I think some folks have bought in a little too much to the marketing hype around it.
[0] https://api-docs.deepseek.com/quick_start/pricing [1] https://openrouter.ai/deepseek/deepseek-v4-pro/providers
FooBarWidget
They expect inference prices to structurally drop once they receive their big batch of Huawei Ascend chips by the second half of the year.
syntex
Not sure you can replace Claude with DeepSeek V4 that easily and get the same results.
From what I see while building my own agentic system in Elixir, the problem is in training for your specific harness/contracts. Claude/GPT-style models seem to be trained around very specific contracts used by the harness like tool call formats, planning structure, patching, reading files, recovering from errors, and knowing when to stop.
In practice, you either need a very strong general model that can infer and follow those contracts (expensive), or a weaker model that has been fine-tuned / trained specifically on your own agent contracts. Otherwise, the whole thing becomes flaky very quickly. And I suspect with DeepSeek V4 you may end up in the latter situation.
vidarh
There are certainly quirks, but identifying and conforming to those quirks is not that complex. E.g. I had Kimi "fix" my harness to work better with Kimi by pointing it at the (open source) kimi-cli + web search and telling it to figure out which differences might matter (it made compaction more aggressive, and worked around some known looping issues by triggering compaction if it spotted looping tool calls). Largely, addressing the quirks tends to harden the harness for other models too. But, yeah, it is more work to make the smaller models work with instead of against the harness.
dandaka
I hope they collaborate with open source harness providers (Pi, Opencode) and train models with those, so the next generations will have better integration and better overall quality.
cpursley
I'd love to learn more about the system you're building out in Elixir, and your learnings, if any of it is public.
syntex
It's semi-public, but I'll probably publish it soon once it's less embarrassing.
It's an Elixir agent runtime with a thin Go TUI (Bubble Tea). I'm building it mostly to explore agent orchestration: planner/workers/finalizer flows, local file/code-edit tools, MCP tools, permission gates, run context, compaction, and eventually larger swarms. Erlang/Elixir is interesting for this because the actor/supervision model maps pretty naturally to lots of isolated agents and long-running supervised tasks.
As I said, the main lesson so far is that everything around contracts is much more fragile than I expected unless you use a very strong model. Planners return Markdown instead of JSON, tools get called with subtly wrong args, subagents repeat broken tool calls, finalizers lie about success after workers failed, and various permissions may be interpreted by agents in unexpected ways.
I also started with too many modes too early instead of making the agentic path extremely solid. That made me understand better why these codebases become huge: there are endless corner cases if you want a harness to work across models, providers, tools...
Stronger models hide a lot of harness weakness, and weaker models expose it. Making weaker models good enough requires a surprising amount of contract hardening. But that hardening tends to make the system better for stronger models too.
Also, the Elixir HTTP stack was causing a lot of problems (I eventually needed to switch to gun).
cpursley
Thank you for the writeup, integration with a TUI sounds great. Have you played with Jido (it's built on ReqLLM)? OpenAI also has an interesting Elixir orchestration project (surprisingly).
mihailupu
[dead]
o10449366
Idk, my recent experience with Claude is that 4.7 barely knows how to use basic bash tools - how to properly check when programs have finished running, even basic stuff like how to run pytest suites and read the failed tests from the output without re-running the suite to specifically look for them. It's shockingly dumb for all of the tooling they've built into Claude Code (the useless Monitoring tool that blocks bash polling/sleeping that actually works, etc.).
I finally got fed up and started using GPT 5.5 the past 4 days, and it's a breath of fresh air despite feeling much more minimal. With Claude I had to write so many hooks to enforce behaviors it wouldn't remember and lacked common sense on. GPT 5.5 does a much better job with things like knowing the AWS CDK CLI can hang on long CloudFormation deployments and that it should actively check the deployment status using the CloudFormation API rather than hanging for 30+ minutes, and it does this all without asking.
Maybe there's better tooling built into Codex too, but at least on the surface level it seems like how smart the model is makes a significant difference because Claude has more tools than I can count and still struggles to use "grep".
Edit: Like just now. I can't tell you how many times a day I see this sequence:
"Sorry, I'll run in parallel"
"Error editing file"
"File must be read first"
Repeat 10x for the 10 subagents Claude spawned and then it gets stuck until you press escape and it says "You rejected the parallel agents. Running directly now"
rirze
I’m finding great success having Claude design and review code but having codex actually implement it.
dalekkskaro
[flagged]
connorwhitlock
[dead]
iosjunkie
Get comfortable with Deepseek's privacy policy for using this for anything serious.
"To improve and develop the Services and to train and improve our technology, such as our machine learning models and algorithms. Including by monitoring interactions and usage across your devices, analyzing how people are using it, and training and improving our technology."
https://cdn.deepseek.com/policies/en-US/deepseek-privacy-pol...
isege
> Claude Code is the best autonomous coding agent.
If you look at the terminal-bench@2.0 leaderboard, you'll quickly see it's actually one of the weakest agentic harnesses. Anthropic's own models score lower with Claude Code than with virtually any other harness.
So it's quite the opposite. Claude Code is arguably the worst harness to run models with.
DaanDL
Okay, but not all results on there are valid; ForgeCode, for instance, has been caught cheating in the past:
https://debugml.github.io/cheating-agents/#sneaking-the-answ...
andxor
Then the benchmarks are wrong.
cpursley
Those benches are completely and totally meaningless when it comes down to real world work tasks, and everyone knows it.
l5870uoo9y
> DeepSeek V4 Pro scores 96.4% on LiveCodeBench and costs $0.87/M output tokens.
Yes, and this is a temporary discount; the price increases to 3.48 USD on 2026/05/31 15:59 UTC.
TheServitor
It's surprisingly easy to hit $200 worth of tokens even at ~$1/M tokens, though. No matter how many times I do the math, the coding plans are the better value.
ojr
I don't think it's surprisingly easy at all unless you are running multiple background tasks overnight. I think Gemini will eventually be the default agentic coding agent when it comes to price and efficiency, because it's backed by sound money.
I don't use the Claude Code harness; using grep instead of a combination with vector search is super expensive, and I'm not sure how their read-file implementation behaves. For example, I built my own harness that restricts reads and writes in a token-efficient manner. Building your own harness will always be the cheapest option in the long run.
My own harness, a minimalistic GUI, gets the job done, nothing too fancy: https://slidebits.com/isogen
_345
If you're okay with sonnet-level performance, this sounds like a straight upgrade. But I find that sonnet messes up too much for it to be worth cost-optimizing down to it or another sonnet-level model. Glad to have this as an option though.
2ndorderthought
A lot of people are having good experiences doing things like using opus for designing and using locally hosted qwen3.6 for implementation.
I could see a serious cost reduction story by using opus for design and deepseek for implementation.
Personally I would avoid anthropic entirely. But I get why people don't.
girvo
Like me: that’s what I do. Either Opus 4.7 or GLM 5.1 for planning, write it out to a markdown file, then farm it out to Qwen 3.6 27B on my DGX Spark-alike using Pi. Works amusingly well all things considered.
brianjking
How are you interacting with GLM 5.1? Via the Claude Code harness? I really wish they'd release a fully multimodal model already.
2ndorderthought
How is GLM 5.1? I haven't tried it yet but have been meaning to.
aftbit
What hardware are you using to power this?
chrsw
I keep re-learning this lesson: I chug along with a lesser model then throw a problem at it that's too complex. Then I try different models until I give up and bring in Opus 4.6 to clean up.
energy123
It's not even that much cheaper: according to Artificial Analysis, GPT 5.5 is only about 2x more expensive per task than DeepSeek v4 Pro once you adjust for its lower token usage (it charges more per token, but spends fewer tokens per task). Doesn't seem worth it to me.
cpursley
Are we talking pay-as-you-go API, or plans?
brianwawok
And I keep using Opus to, like, make git commits. I really just need a smart router that is actually smart, vs having to micromanage the model choice.
sterlind
the problem is managing the contexts. your session might fit in Opus, but will that smaller model you dispatch the git commit to fit? even so, will it eat too much on prefill? do you keep compactions around for this, or RAG before dispatch or something? how do you button back up the response?
all doable but all vaguely squishy and nuanced problems operationally. kinda like harness design in general.
maxdo
This is the problem: you need the best model, not just a good one, for:
- good architecture, which requires reading specs, code, etc. (reads like: lots of tokens in/out)
- bug fixing: same, plus logs, e.g. Datadog
Once you've found the path, patches are trivial and the savings are tiny unless you're doing refactoring/cleanup.
Testing gets more and more complicated. Take a look at OpenCode Go, and you see this:
> Includes GLM-5.1, GLM-5, Kimi K2.5, Kimi K2.6, MiMo-V2-Pro, MiMo-V2-Omni, MiMo-V2.5-Pro, MiMo-V2.5, Qwen3.5 Plus, Qwen3.6 Plus, MiniMax M2.5, MiniMax M2.7, DeepSeek V4 Pro, and DeepSeek V4 Flash
and now you're on your own with the bugs all of these models can produce at scale. Am I missing anything in this picture? What is the real use of cheaper models?
JSR_FDED
I'd argue that you need the model that's good enough, not the best.
Culonavirus
We're not yet at a point of saturation where all the frontier models are of somewhat comparable "intelligence" and we can decide which to use based on other factors (speed, effective context window, etc.), so I honestly don't see why you (as a company or an employee) would not use the best available model with the highest (or at least second-highest) thinking effort. The fees are not exactly cheap, but not that expensive either.
nyssos
Agreed that we're not at saturation, but we don't have a canonical "best" either. For example ChatGPT 5.5 + Codex is, in my experience, vastly superior to Opus 4.7 + Claude Code at sufficiently well-specified Haskell, but equally vastly inferior at correctly inferring my intent. Deepseek may well have its own niche, though I haven't used it enough to guess what it might be.
willio58
I don't find this with sonnet at all. As long as I have a solid Claude.md, periodically review the output, and enforce good code practices via basic CI gates, I've rarely found myself having to switch to opus.
2ndorderthought
You might be surprised, then, at how well cheaper models solve your problems.
mohsen1
This has been my experience working on tsz.dev. Only Opus 4.7 and GPT 5.5 can really be productive for the remaining test cases.
sbinnee
After some time replacing gemini 3 flash preview with deepseek v4 flash as a chat model, the biggest difference is the auto reasoning effort. Gemini flash is super fast and perfect for a chat model, but when I need some thought experiments with a handful of constraints, it struggles a bit and I switch to sonnet. With deepseek v4 flash, though, it can do long, complex reasoning and it often gets things right. Generating a lot of reasoning tokens means it takes a lot of time, of course. But I am happy to find a cheaper model and excited to try something other than gemini flash. Gemini flash has been so good that I was locked in on it for a while.
izietto
Just want to say that I faced this very problem last week. I discovered the OpenCode agent and it works great with DeepSeek and other models. Try it out, guys.
jedisct1
Try https://swival.dev - Works perfectly with DeepSeek and Qwen.
column
Pi will blow your mind :)
aucisson_masque
No MCP.
No sub-agents. There are many ways to do this: spawn Pi instances via tmux, or build your own with extensions, or install a package that does it your way.
No permission popups. Run in a container, or build your own confirmation flow with extensions inline with your environment and security requirements.
No plan mode. Write plans to files, or build it with extensions, or install a package.
No built-in to-dos. Use a TODO.md file, or build your own with extensions.
No background bash. Use tmux. Full observability, direct interaction.
eloisant
I've tried both, and I prefer OpenCode. I get that Pi is more customizable, but I prefer a nice, complete out-of-the-box experience, and that's what OpenCode provides.