kristjansson 3 hours ago [-]
> The phrase "frontier model" is starting to mean two things. One is a checkpoint. The other is a system boundary.
LLM-isms aside, I don't think we want this to be the case? An LLM, for all its complexity, is something that can be reasoned about. It's picking the next token until it hits an EOS. The semantics imposed on those tokens (reasoning, tool call, etc.) are up to the user('s harness) to decide and act on. The more that's pushed behind the facade, the harder it is to achieve sufficient understanding of the model's behavior s.t. one can compose it into larger abstractions. Perhaps the performance (and the adherence to an interface/contract) compensates? But swapping from Opus or 5.5 to this or Fugu seems like a much bigger change than swapping between different 'base' models.
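To make the "next token until EOS, semantics in the harness" point concrete, here is a toy sketch. Everything in it is hypothetical: `EOS`, the `<tool>` marker, `scripted_model`, `decode`, and `interpret` are made-up names, and the "model" just replays a fixed script rather than being a real LLM or any provider's API.

```python
from typing import Callable, Dict, List

EOS = "<eos>"  # hypothetical end-of-sequence marker


def scripted_model(script: List[str]) -> Callable[[List[str]], str]:
    """Toy stand-in for an LLM: replays a fixed script, then emits EOS."""
    it = iter(script)

    def next_token(context: List[str]) -> str:
        # A real model would condition on `context`; this toy ignores it.
        return next(it, EOS)

    return next_token


def decode(next_token: Callable[[List[str]], str],
           prompt: List[str],
           max_tokens: int = 64) -> List[str]:
    """Ask for one token at a time until EOS (or the token budget runs out)."""
    out: List[str] = []
    for _ in range(max_tokens):
        tok = next_token(prompt + out)
        if tok == EOS:
            break
        out.append(tok)
    return out


def interpret(tokens: List[str]) -> Dict[str, object]:
    """Harness-side semantics: the model only emitted tokens; the harness decides what they mean."""
    if tokens and tokens[0] == "<tool>":
        name = tokens[1] if len(tokens) > 1 else ""
        return {"kind": "tool_call", "name": name, "args": tokens[2:]}
    return {"kind": "text", "text": " ".join(tokens)}


tokens = decode(scripted_model(["<tool>", "search", "fugu"]), ["find", "it"])
print(interpret(tokens))
```

In this shape the decode loop and the interpretation step are both visible to the caller. Once both sit behind a provider's facade, only the final result is.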
plaguuuuuu 11 minutes ago [-]
They're applying misdirection so that we use their secret-sauce agentic framework, but like a black box and without seeing any of the internal reasoning patterns, cause that would give it away.
That's a deal-breaker for me. I need as much observability and control over my development workflow as possible; that's part of my secret sauce.
mohsen1 14 minutes ago [-]
This seems to be a new trend. Noticed it with GPT "ultra" in their announcement[1]. I'm with you: a large language model and a system of many language models working together are not the same thing.
I might be wrong, but I strongly suspect that Fable 5 is already something in this shape, considering its long time to first token while having normal throughput.
dantodor 7 minutes ago [-]
sakana fugu landed sooo loudly ... I canceled my test subscription in two days.
meander_water 2 hours ago [-]
I thought all model providers are doing this under the hood anyway in their UI?
They certainly seem to when A/B testing different models, and Fable routes to Opus 4.8 when guardrails fail.
Everyone has been saying it’s all about the harness. This is an obvious result of that.
I think an optimal solution would be to have more seamless integration between harness and router roles, since each is only half the picture.
jerpint 3 hours ago [-]
Solutions like these are really cementing the view that LLMs are becoming a commodity
chatmasta 14 minutes ago [-]
Looks nice (slop article aside), but why is VSR Hybrid only benchmarked on Humanity’s Last Exam and not the other two benchmarks (LiveCodeBench and GPQA-Diamond)? Is this an oversight or are the results too terrible to show?
droidjj 4 hours ago [-]
Can we please stop submitting fully AI-generated text to HN?
tensegrist 3 hours ago [-]
at least 50% of the front page would disappear if this were enforced
jghn 3 hours ago [-]
Don’t threaten me with a good time
folkrav 3 hours ago [-]
I'd be perfectly okay with that.
Escapade5160 2 hours ago [-]
So be it.
alchemist1e9 4 hours ago [-]
This should help with better utilizing a heterogeneous collection of inference hardware.
[1] https://news.ycombinator.com/item?id=48689338
Also, openrouter recently released a fusion router - https://openrouter.ai/blog/announcements/fusion-beats-fronti...