Codex Thursday Update: Lock Screen Remote Control, Appshots, /goal Officially Launched
OpenAI released its Thursday update for Codex, with four headline features. The most notable is Lock Screen Remote Control: users can control a locked Mac from their phone through Apple’s official Authorization Plug-in mechanism, protected by a four-layer security design that includes a very short authorization window, automatic screen blanking, and an immediate lock when local input is detected. Next is Appshots: pressing the Command key twice on a Mac sends Codex a screenshot and the text content of the current window, including off-screen portions. The /goal mode has graduated from experimental to official: once given a goal, Codex can work continuously for hours or even days, and the mode supports pausing, editing, and side chat. The fourth is an advanced annotation mode that lets users drag and adjust page elements directly and leave comments inside Codex’s built-in browser. Team plugin sharing and Analytics panel upgrades shipped in the same update.
Sources:
- @OpenAIDevs: https://x.com/OpenAIDevs/status/2057530207976989179
- @OpenAIDevs: https://x.com/OpenAIDevs/status/2057536706778378692
- @dotey: https://x.com/dotey/status/2057556752888222025
- @op7418: https://x.com/op7418/status/2057678002675413057
- @xiaohu: https://x.com/xiaohu/status/2057560537215725653
- @imwsl90: https://x.com/imwsl90/status/2057699137114808530
DeepSeek V4-Pro Announces Permanent Price Cut, Harness Team Begins Mass Hiring
DeepSeek officially announced an API price adjustment for its V4-Pro model: after the 75% off promotion ends on May 31st, the official price will permanently become one-quarter of the original. The adjusted price is 3 RMB per million input tokens and 6 RMB per million output tokens. Multiple bloggers noted that this is about one-third the price of other models of comparable capability. At the same time, the DeepSeek Harness team opened recruitment for R&D engineers, product managers, and researchers, for both full-time and internship roles, with all positions based in Beijing. One blogger’s analysis: because the Agent Harness concept is so new, any practitioner with recent, deep experience using Claude Code and Codex, and with insights of their own, has a good chance.
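To make the adjusted prices concrete, here is a minimal cost sketch. The two per-million-token prices come from the announcement; the function name and the workload are hypothetical, and the sketch ignores any other pricing dimensions the API may have.

```python
# Adjusted V4-Pro prices from the announcement, in RMB per million tokens.
INPUT_RMB_PER_MILLION = 3.0
OUTPUT_RMB_PER_MILLION = 6.0


def v4_pro_cost_rmb(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in RMB of a workload at the adjusted prices."""
    total = (input_tokens * INPUT_RMB_PER_MILLION
             + output_tokens * OUTPUT_RMB_PER_MILLION)
    return total / 1_000_000


# Hypothetical workload: 2M input tokens and 500k output tokens.
cost = v4_pro_cost_rmb(2_000_000, 500_000)
print(f"{cost:.2f} RMB")  # prints "9.00 RMB"
```

Because the new price is described as one-quarter of the original, the same workload at the original price would cost four times as much.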
Sources:
- @MaxForAI: https://x.com/MaxForAI/status/2057805496846045270
- @oran_ge: https://x.com/oran_ge/status/2057809279839785278
- @dotey: https://x.com/dotey/status/2057835713442230638
- @AlchainHust: https://x.com/AlchainHust/status/2057779175155732613
Qwen 3.7-Max Released, Outperforms GPT-5.5 and Opus 4.7 in Multiple Benchmarks
Alibaba released the Qwen 3.7-Max model. In a real-world agent task test (programming a Tetris-playing, self-training robot), Qwen 3.7-Max achieved a +56% performance improvement at a cost of $1.32, outperforming Claude Opus 4.7 ($12.15, +28%) and GPT-5.5 ($2.85, +7%). On the overall ranking of the Arena global blind test for large models, Qwen 3.7-Max surpassed Kimi-K2.6, DeepSeek-v4-pro, and GLM-5.1, ranking first among Chinese models. The model supports a `preserve_thinking` parameter that retains reasoning content from previous rounds, which improves the consistency of Agent decisions. One blogger’s test found that it can work for extended periods under Claude Code, but that it needs more precise prompts.
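One way to read the cost and improvement figures together is improvement per dollar. The sketch below uses only the three reported (cost, improvement) pairs; the "points per dollar" metric is an illustrative assumption and not something the original test reported.

```python
# (cost in USD, improvement in percentage points) as reported in the test.
results = {
    "Qwen 3.7-Max": (1.32, 56),
    "Claude Opus 4.7": (12.15, 28),
    "GPT-5.5": (2.85, 7),
}


def points_per_dollar(cost_usd: float, improvement_pct: float) -> float:
    """Illustrative metric: percentage points of improvement per dollar spent."""
    return improvement_pct / cost_usd


# Rank models from most to least improvement per dollar.
ranking = sorted(
    results,
    key=lambda name: points_per_dollar(*results[name]),
    reverse=True,
)

for name in ranking:
    cost, gain = results[name]
    print(f"{name}: {points_per_dollar(cost, gain):.1f} points per dollar")
```

A single task is a small sample, so a ratio like this is better treated as a rough comparison than as a benchmark score.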
Sources:
- @MaxForAI: https://x.com/MaxForAI/status/2057737919314714693
- @LufzzLiz: https://x.com/LufzzLiz/status/2057766954442899627
- @AlchainHust: https://x.com/AlchainHust/status/2057688964388225235
PwC Paper: Grep Beats Vector Retrieval in Agent Search Scenarios
PwC published a paper titled “Is Grep All You Need?”, which tests 116 long-context memory questions from LongMemEval. The study compared literal grep search with vector retrieval across different Agent harnesses (Chronos, Claude Code, Codex CLI, Gemini CLI). The result: in the main experiment’s inline mode, grep beat vector retrieval in every harness-model combination. The explanation given is that many Agent tasks are essentially evidence location (finding function names, file paths, error strings, and so on), where the semantic tolerance of embeddings introduces noise. The paper’s advice to Agent developers: don’t assume that every serious Agent stack should connect to a vector DB by default; first clarify what the Agent is actually doing.
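As an illustration of what "evidence location" by literal search looks like, here is a minimal grep-style sketch. The function name and the memory log are hypothetical and are not taken from the paper or from LongMemEval.

```python
from typing import List, Tuple


def grep_literal(lines: List[str], needle: str) -> List[Tuple[int, str]]:
    """Return (1-based line number, line) for each line containing needle."""
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if needle in line  # exact substring match, no semantic tolerance
    ]


# Hypothetical agent memory log.
memory = [
    "user asked to rename parse_config to load_config",
    "build failed: ModuleNotFoundError: No module named 'yaml'",
    "discussed configuration philosophy in general terms",
    "edited src/settings/loader.py",
]

hits = grep_literal(memory, "ModuleNotFoundError")
print(hits)
```

The literal match returns only the line that contains the exact error string. A similarity-based retriever could also surface loosely related lines, such as the general discussion of configuration, which is the kind of noise described above.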
Cloudflare CEO Details: Laying Off 20% of Staff, AI Replaces “Measurers”
Cloudflare CEO Matthew Prince published a column in The Wall Street Journal titled “How I Chose Which Employees to Replace with AI.” Cloudflare has just laid off about 1,100 people (its first large-scale layoff in 16 years) while hiring 1,111 interns this year (from nearly 1 million applications, roughly a 0.1% acceptance rate). Prince cited Drucker’s taxonomy, which sorts employees into three types: builders, sellers, and measurers. AI does not threaten the first two: when engineers become ten times more efficient, the company wants to hire as many of them as it can, and salespeople are safe because customers prefer dealing with people who understand their needs. The ones being replaced are the “measurers”: internal audit, finance, compliance, middle management, operations, and marketing. Cloudflare is now moving to continuous auditing across the whole business, and it has cut middle managers significantly because AI allows each manager to oversee more people. The quarter’s loss was $62 million, severance and restructuring costs are $140-150 million, and the stock price fell more than 20% at one point.
Sources:
- @dotey: https://x.com/dotey/status/2057641537719226585
- @dotey: https://x.com/dotey/status/2057641534225346990
Microsoft Revokes Internal Claude Code Licenses, Asks Engineers to Switch to Copilot CLI
According to The Verge, Microsoft has begun revoking employees’ internal Claude Code licenses on a large scale and is asking developers to switch to its own GitHub Copilot CLI. Microsoft began promoting Claude Code internally last December, encouraging non-technical staff to try coding with AI. Over six months, Claude Code became very popular inside the company. That popularity became a problem: it put Microsoft’s newly launched GitHub Copilot CLI in an awkward position. The teams responsible for Windows, Microsoft 365, Outlook, Teams, and Surface have asked engineers to complete the migration by the end of June. Insiders said cost is also a factor, since every Claude Code license fee goes to competitor Anthropic. Engineers asked to migrate are reportedly reluctant.
SpaceX Starship V3 First Flight Aborted Due to Hydraulic Pin Failure, Retry Planned for Next Day
SpaceX conducted the countdown for the Starship V3 first flight but aborted it because the hydraulic pin securing the tower arm failed to retract. Elon Musk stated that if repairs could be made that night, another attempt would be made the next day at 5:30 Central Time. This is Starship’s twelfth flight test. Musk also reposted news that SpaceX now launches more rockets than the rest of the world combined, along with updates on satellite data centers and Starlink lunar coverage.
Sources:
- @elonmusk: https://x.com/elonmusk/status/2057609682865254695
- @elonmusk: https://x.com/elonmusk/status/2057594680284430428
Xiaohongshu Opens Skill Uploads; Open Design Supports Multi-Project Management and Handoff
Xiaohongshu now allows direct Skill uploads, which a blogger called a “major event”; it’s currently invite-only. Meanwhile, Open Design announced upcoming capabilities for multi-project management and handoff to Cursor/Claude Code, allowing users to manage multiple design projects in parallel and transfer design prototypes to Claude Code or Cursor for further production. Open Design also officially supports 18+ languages, with developers and users from nearly 20 countries worldwide.
Sources:
- @op7418: https://x.com/op7418/status/2057711810728559034
- @tuturetom: https://x.com/tuturetom/status/2057666633716380044
- @tuturetom: https://x.com/tuturetom/status/2057667453006561463
Open Source Updates: Mega-ASR for Noise Recognition, Google AX Distributed Agent Infrastructure, Feishu Bridge
Three open-source projects are worth noting. Nanyang Technological University, the National University of Singapore, and Shanghai AI Lab jointly released Mega-ASR, built on Qwen3-ASR and specialized for poor real-world audio (far-field, reverberation, echo, electrical noise, etc.). It leads by a wide margin on word error rate in cross-language tests (3.19 for Chinese-accented English vs. CosyVoice2’s 17.10). With 1.7B parameters, it can run inference on consumer hardware, and it is licensed under Apache 2.0. Google open-sourced AX, infrastructure for distributed Agents that handles state management, failure recovery, and cross-process scheduling. It is natively adapted for K8s deployment and aims to become the Kubernetes of the Agent domain. Zara open-sourced the Feishu Bridge tool, which turns a local Claude Code instance into a Feishu bot with one command and supports interactive cards, direct image/file display, and full-text search.
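For readers unfamiliar with the metric behind the Mega-ASR comparison, here is a minimal sketch of how word error rate is conventionally computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. It assumes the quoted figures are percentages, which the source does not state, and the sentences are hypothetical; this is not the project's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, using word-level edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Hypothetical example: one deleted word and one substituted word out of five.
wer = word_error_rate("turn on the kitchen light", "turn on kitchen lights")
print(f"{wer:.1f}")  # prints "40.0"
```

Lower is better, so under this reading 3.19 versus 17.10 means far fewer word-level mistakes per hundred reference words.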
Sources:
- @MaxForAI: https://x.com/MaxForAI/status/2057743732171272205
- @Gorden_Sun: https://x.com/Gorden_Sun/status/2057720476714336287
- @vista8: https://x.com/vista8/status/2057751033615700128