Every week our pipeline scrapes the model catalogs and vendor blogs, then judges each item against one question: would this actually improve something we run in production? Verdicts below. Watch = interesting but unproven claims · Adopt = earned a place · Ignore = noise · Product Input = a feature idea, not a factory change.
W27 review: 30 items processed across two pipelines (inbox: 15, model-watch: 15). No items were promoted to Immediate Factory Upgrade or Factory Candidate this week. Two items were Ignored, one was classified Product Input (routed to lengua), and all others were held at Watch.
The batch is dominated by two themes. First, new frontier model releases (Claude Fable 5, Gemini 3, Gemini 3.5 Flash, Gemini 3 Flash, Opus 4.8, Claude Managed Agents) whose pricing, API, and maturity claims have not yet been verified against official documentation and are therefore held in the verify-before-promoting lane. Second, a wave of OpenAI pricing additions (web_search tool cost added to gpt-5.1, gpt-5.1-codex, gpt-5-nano, o1, o1-pro) plus several deprecations of non-incumbent models (Baidu, Alibaba, DeepSeek, Qwen variants), which need only a quick grep of our codebases to close.
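The "quick grep" for the deprecated models could look roughly like the sketch below. The model IDs and the list of file suffixes are placeholders chosen for illustration, not the actual identifiers from this week's deprecation notices; substitute the real IDs before running it against a repository.

```python
from pathlib import Path

# Hypothetical placeholders: replace with the exact IDs from the deprecation notices.
DEPRECATED_MODEL_IDS = ["example-deprecated-model-a", "example-deprecated-model-b"]

# Assumed set of file types worth scanning; adjust per repository.
SCAN_SUFFIXES = {".py", ".ts", ".json", ".yaml", ".yml", ".toml"}


def find_model_references(root, model_ids=DEPRECATED_MODEL_IDS):
    """Return {model_id: [(path, line_number), ...]} for every hit under root."""
    hits = {model_id: [] for model_id in model_ids}
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in SCAN_SUFFIXES:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip files that cannot be read as UTF-8 text
        for line_number, line in enumerate(text.splitlines(), start=1):
            for model_id in model_ids:
                if model_id in line:
                    hits[model_id].append((str(path), line_number))
    return hits
```

An item can be closed when every list in the returned dictionary is empty; any hit points at the file and line that still references a deprecated model.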
Claude Fable 5 introduces a new model ID (claude-fable-5), revised pricing, a 30-day data-retention mandate, and silent RSI suppression. Together these affect routing logic, compliance posture, and output trust in any app using the Anthropic SDK.
Why this verdict: High-signal item covering a live model with pricing and policy changes that touch routing decisions. Held this week because specific claims (pricing, 30-day retention mandate, RSI suppression details) are unverified specs from an aggregator.
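The verify-before-promoting hold applied here can be pictured as a simple gate over an item's claims. The following is a minimal sketch, not the pipeline's actual code: the `Claim` and `Item` types and both function names are hypothetical, and the gate covers only claim verification, not the other promotion checks.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    source: str             # e.g. "aggregator", "vendor blog", "official docs"
    verified: bool = False  # set to True only after checking official documentation


@dataclass
class Item:
    title: str
    claims: list = field(default_factory=list)


def blocking_claims(item):
    """Return the claims that still need verification."""
    return [claim for claim in item.claims if not claim.verified]


def passes_verification_gate(item):
    """An item clears the gate only when no claim is left unverified."""
    return not blocking_claims(item)


fable = Item(
    title="Claude Fable 5",
    claims=[
        Claim("revised pricing", "aggregator"),
        Claim("30-day data-retention mandate", "aggregator"),
        Claim("RSI suppression details", "aggregator"),
    ],
)
```

With all three claims unverified, `passes_verification_gate(fable)` is false and the item stays at Watch; clearing the gate makes it eligible for further review rather than promoting it automatically.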
Opus 4.8 adds mid-conversation system message injection and lowers the prompt-cache minimum to 1024 tokens — both reduce cost and improve context-management flexibility for long agentic sessions.
Why this verdict: Directly actionable if confirmed. Held because the 1024-token cache minimum and mid-session system message injection are specific spec claims from Simon Willison, not official docs, and require verification.
Claude Managed Agents (multi-agent orchestration primitives: Dreaming, Outcomes, Routines) represent Anthropic's native agent-scheduling layer — if GA, they could replace custom orchestration code in shepherd and reduce maintenance surface.
Why this verdict: High potential if GA; held because specific feature availability and API names are unverified from a live-blog source. Watch pending docs confirmation.
Gemini 3.5 Flash is a faster/cheaper replacement for the current Gemini reviewer in ai-practice-watch's Three Wise Men panel; upgrading the model ID would directly change review quality and cost.
Why this verdict: Deferred from last week. Directly actionable for ai-practice-watch Gemini reviewer. Held because specific model ID, pricing, and context window are unverified claims from vendor blog posts.
Fable 5's autonomous agentic behavior can run up large per-session costs without user-visible checkpoints. The mechanism is unbounded agent execution without budget guardrails, which is both a correctness and a cost risk for any orchestration layer.
Why this verdict: Actionable concern about agent cost runaway. Held at Watch because the specific cost-per-session figure is from an aggregator and requires confirmation; also, the rule-of-two gate requires confirming shepherd actually lacks a budget cap before promoting.
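For reference, a budget guardrail of the kind discussed above can be very small. The sketch below is illustrative only: `BudgetGuard`, `BudgetExceeded`, and `run_session` are hypothetical names, it assumes each agent step can report its own cost in USD after it runs, and it says nothing about how shepherd is actually structured or whether it already has a cap.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a session's cumulative spend goes over its cap."""


class BudgetGuard:
    def __init__(self, cap_usd):
        if cap_usd <= 0:
            raise ValueError("cap_usd must be positive")
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd):
        """Record the cost of one finished step, then check the cap."""
        self.spent_usd += cost_usd
        if self.spent_usd > self.cap_usd:
            raise BudgetExceeded(
                f"spent {self.spent_usd:.2f} USD, cap is {self.cap_usd:.2f} USD"
            )


def run_session(steps, guard):
    """Run step callables in order; each returns its cost in USD.

    Returns (steps_completed_within_budget, halted_by_budget). Because cost is
    only known after a step runs, the overrun is bounded by one step's cost.
    """
    completed = 0
    for step in steps:
        try:
            guard.charge(step())
        except BudgetExceeded:
            return completed, True
        completed += 1
    return completed, False
```

Charging after each step rather than before keeps the sketch independent of any cost estimator; an orchestrator that can predict step cost could check the cap before executing instead.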
Gemini 3 is the new top-tier Gemini model; as a potential replacement for the Gemini reviewer in ai-practice-watch's Three Wise Men panel, it changes the quality ceiling for the panel's Gemini seat.
Why this verdict: Deferred from last week. Overlaps with the Gemini 3.5 Flash item. Lower priority because Gemini 3.5 Flash is a more cost-efficient upgrade path and should be evaluated first.
Simon Willison's independent impressions of Claude Fable 5 corroborate the new refusal/fallback API option and pricing, making this a secondary source for the routing and API-design decisions already flagged in the Claude Fable 5 item.
Why this verdict: Useful corroboration but largely redundant with the primary Claude Fable 5 item. Held pending the same verification as that item. Deduplicated; Watch only.
Gemini 2.5 Flash/Pro updates plus native MCP support in the Gemini API expand both the reviewer model options and the tool-calling surface area for ai-practice-watch's Gemini reviewer.
Why this verdict: Deferred from last week. MCP support is the novel capability but is superseded in priority by newer Gemini 3/3.5 items. Watch.
Gemini 3 Flash is a speed/cost-optimized variant of Gemini 3; potentially a drop-in for the Gemini reviewer at lower cost than Gemini 3 Pro.
Why this verdict: Deferred from last week. Useful cost data point for Gemini reviewer selection but subordinate to the 3.5 Flash evaluation. Held pending price verification.
Gemini Live API's real-time speech-to-speech translation (70+ languages, public preview) enables voice-native language learning interactions directly applicable to lengua's core use case without new self-hosted infrastructure.
Why this verdict: Deferred from last week. Clear Product Input for lengua; routed to the lengua backlog for evaluation. Adoption is held pending API maturity verification, but the Product Input classification stands.
Gemini 2.5 Computer Use enables browser/UI automation from the Gemini API — a hosted vision-action loop that drives web UIs without a local browser automation stack.
Why this verdict: Deferred from last week. Potentially useful but likely preview-maturity (maturity floor concern). No named product blocker justifying immediate action. Watch.