
As a Claude-Max 20X user, I've been pushing hard for about 4 months with over 1,500 commits (developing a multi-agent orchestration environment based on Opus).
As usual, during my early morning session, I saw a notification on Claude Code Desktop: "OPUS 4.7 RELEASE! Try it~".
Wait, 4.6 was released barely 2 months ago, and the initial instability of the 1M context window is only just settling down. Suddenly, 4.7??
Still, given Anthropic's pattern of quickly addressing model issues, I figured there must be a reason... so I applied it immediately.
Within about 15 minutes it felt completely different from previous versions of Opus. It became much more verbose. My first feedback was, "You smell like Gemini?"
This is not meant to disparage Google's Gemini. But relatively speaking, Claude agents have always kept a concise, summary-oriented response style compared to Gemini. Was it just a change in response pattern? An expression of context confidence? As time went on, my minor doubts grew into the realization that this was a serious structural change.
In fact, for the first day or two with Opus 4.7 I kept running it in combination with the existing 4.6 1M model, tentatively withholding complaints and attributing the oddities to initial-release instability and filtering processes.
After watching the situation for more than 10 days, I had no choice but to conclude that this is a problem not only of the model's performance but also of its attitude.
What I considered Claude's greatest advantage over other models was never just benchmark scores. My judgment was that it was the only model in its class capable of inter-agent collaboration, because of its superior adherence to project context. Compared to Gemini or GPT, which offer larger resource quotas such as bigger context windows, Claude has a smaller single-session capacity. But how faithfully the agent itself adheres to user-defined context systems such as claude.md / memory.md / rules / skills while executing a project is the most important factor, and it is the core of teamwork.
As someone who doesn't use GPT much, I usually compare against Google's Gemini. Gemini advertises a 2M-token session context window as a selling point, but it has historically tended to ignore user instructions in favor of complying with Google's safety guards or system prompts. Claude models, including Opus, by contrast strongly enforce user settings such as claude.md, which is automatically and forcibly loaded at the start of every session.

I have built and operate a system in which multiple agents (Opus CLI instances, an orchestrator, Claude Code Desktop, and others) are assigned roles and collaborate on coding and other tasks, governed by a hierarchy of constitution / laws / detailed implementations / skills. Cracks began to appear in this system. Version 4.7 started downgrading claude.md (which I refer to internally as the constitution) from a mandatory rule set to a mere reference. Every turn, it forgets the rules essential for collaboration: session log rules, agent instructions, the reporting system. When I point this out, it creates a document called feedback.md on its own initiative, a file that exists nowhere in my rules, and promises to abide by it in the next turn and the next session, only to repeat the same mistake on the very next turn. That is only natural: a model that cannot keep claude.md, which is loaded as the main file, will certainly not keep self-authored reflections written as feedback, and it will make the same error 100% of the time.
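To make the kind of "constitution" I mean concrete, here is a minimal sketch of what mandatory rules in a claude.md might look like. The file names and rule wording below are hypothetical illustrations for readers unfamiliar with this setup, not my actual configuration:

```markdown
# claude.md: constitution (hypothetical sketch)

## Mandatory, every turn
- Append an entry to logs/session.log before ending the turn.
- Report task status to the orchestrator in the agreed report format.
- Never create new rule files (e.g. feedback.md); all rules live here.

## Collaboration
- Follow the role assignments defined in agents.md.
- Cross-validate outputs with the designated reviewer agent before committing.
```

The point is that every rule here is framed as mandatory, not advisory; the failure I am describing is precisely that 4.7 treats such a file as optional reference material.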
I'm very suspicious. Is 4.7 really OPUS? The main issue with the 4.5~4.6 releases was the evolution of the context window from 200K to 1M.
I don't pay much attention to fine-grained benchmark figures. What matters is whether the model completes a given task to the end, accurately records its processing details, reports its work truthfully based on facts, and whether communication and cross-validation between agents function properly. On these criteria, I can now say for sure about Opus 4.7:
Failure!!