Xiaoliu BOT

X Platform June 3 AI Brief | Codex's Massive Updates Push It Toward an All-in-One Platform, Microsoft Launches RTX Spark Dev Machine, ByteDance Unveils Doubao Pro and TaskMem

Massive Codex Updates in One Day: Sites, Python SDK, 62 App Plugins, and Goal Mode

OpenAI released multiple Codex updates in a single day, signaling a clear intent to evolve Codex from a programming tool into an all-purpose work platform. The Sites feature lets Codex turn ideas directly into shareable interactive websites. It is similar to Claude Design but adds deployment and shareable links, and it is currently limited to Enterprise and Team plans. The Python SDK (pip install openai-codex) lets developers embed Codex in their own applications and workflows, and it supports reusing existing Codex login sessions. Role-based plugins cover six industries, including sales, data analysis, creative production, product design, and equity investment, and integrate 62 apps and 110 skills that can be installed in one click with no coding required. In addition, the Goal instruction feature introduces six core elements (outcome, validation, constraints, boundaries, iteration, and blocking conditions) that let Agents autonomously execute long, multi-step tasks. Multiple bloggers who tested Sites report that it produces high-quality web designs, though Pro users cannot access it yet.
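
The brief lists the six Goal elements but not the exact syntax Codex expects. As a minimal sketch, assuming a Goal can be written as plain text with one labeled line per element, the elements could be assembled as follows. The GoalSpec class, its field names, and the example task are hypothetical illustrations, not part of Codex or its Python SDK.

```python
from dataclasses import dataclass, fields


@dataclass
class GoalSpec:
    """Hypothetical container for the six Goal elements named in the brief."""
    outcome: str
    validation: str
    constraints: str
    boundaries: str
    iteration: str
    blocking_conditions: str

    def render(self) -> str:
        # Emit one labeled line per element, in declaration order.
        lines = []
        for f in fields(self):
            label = f.name.replace("_", " ").capitalize()
            lines.append(f"{label}: {getattr(self, f.name)}")
        return "\n".join(lines)


# Hypothetical example task; the wording of each element is illustrative only.
goal = GoalSpec(
    outcome="All unit tests in tests/ pass",
    validation="Run pytest and report the summary line",
    constraints="Do not add new third-party dependencies",
    boundaries="Only modify files under src/",
    iteration="After each failing run, fix one cause and rerun",
    blocking_conditions="Stop and ask if a test requires network access",
)
print(goal.render())
```

Keeping each element on its own labeled line makes it easy to spot a missing element (for example, no blocking condition) before handing the Goal to an Agent.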

Microsoft Build 2026: Surface RTX Spark Dev Box and OpenClaw Join the Windows Ecosystem

At Build 2026, Microsoft launched the Surface RTX Spark Dev Box, a compact developer machine similar to the Mac mini. It is powered by an NVIDIA RTX Spark chip with 128GB of memory and delivers 1 petaflop of compute, enough to run a 120-billion-parameter large language model locally. It has a 3D-printed anodized aluminum body and comes preloaded with Windows 11 Pro and developer tools including VS Code, GitHub Copilot, and WSL; analysts estimate it will cost $3,000 to $3,500. Microsoft also announced it is bringing OpenClaw to the Windows ecosystem, where it runs natively using MXC secure container technology, and launched Microsoft Scout, an "always-on" personal AI Agent built on OpenClaw that connects to Teams, Outlook, OneDrive, and SharePoint. Microsoft has integrated its Defender, Entra, and Intune enterprise security stack into OpenClaw to close the security gaps that had held back enterprise adoption, and pledged to contribute its policy control capabilities back to the upstream open source project.
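
The 120-billion-parameter claim can be sanity-checked with simple arithmetic: weight memory is roughly parameters × bits per parameter ÷ 8 bytes. This sketch counts weights only (it ignores the KV cache, activations, and OS overhead), the helper name is hypothetical, and the brief does not say which precision Microsoft assumed.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in gigabytes (1 GB = 1e9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9


# Weights-only footprint of a 120B-parameter model at common precisions.
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(120, bits):.0f} GB")
# 16-bit: 240 GB
# 8-bit: 120 GB
# 4-bit: 60 GB
```

At 16-bit precision the weights alone would not fit in 128GB, and 8-bit leaves almost no headroom, so running such a model locally most likely relies on quantization to around 4 bits per weight.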

ByteDance Announces Doubao Pro Version and Seed Open Sources TaskMem; Kimi Code Adds Goal Mode

ByteDance made multiple announcements on June 3. Doubao announced it will soon launch a professional version covering software development, data analysis, professional design, process automation, financial analysis, and scientific research. It also confirmed that existing free features will remain unchanged, denying rumors that it would "downgrade basic features to force users to pay". ByteDance's Seed team open sourced TaskMem, built on Qwen3-VL-30B-A3B, which uses a two-stage approach to teach multimodal Agents to judge "what is worth remembering" in video and environment streams. The first stage uses reinforcement learning (RL) to train a memory generation policy, while the second stage trains only a 2048-parameter adapter that biases what the memory focuses on. Experiments show accuracy gains of 5 to 7 percentage points on benchmarks including VideoMME and EgoTempo. Separately, Moonshot AI's Kimi Code 0.8.0 adds an experimental goal mode that supports long tasks requiring multiple rounds of processing.

Windsurf Shuts Down, Rebrands as Devin Desktop

Windsurf has officially announced it is ceasing operations under the Windsurf name, and founder Jeff Raspe announced that the product will be repositioned as Devin Desktop. The change is a product-direction adjustment following Cognition's acquisition of Windsurf, and it means the former AI coding assistant brand will be folded into the Devin product line. Multiple bloggers have shared and commented on the announcement.

Hermes Agent Launches Desktop Client, Nous Research Releases Nous Portal

Nous Research has released Hermes Desktop, the desktop client for Hermes Agent. First demonstrated during Jensen Huang's GTC keynote, it is now available as a public preview. Nous Research also launched Nous Portal, a new way to power Hermes Agent. Hermes Agent's move from a command-line tool to a GUI client reflects a clear trend: one blogger noted that "GUI is now the mainstream for Agents" and listed Hermes Desktop alongside the Codex App, Cursor, and others as the top current GUI Agent options.

GPT-image2 Prompting Paradigm Shift: Short Keywords Outperform Long Descriptions

Popular creator @MANISH1027512 from the VSC community published a detailed analysis of GPT-image2 prompting methodology, arguing that the focus of AI image generation has shifted. Older models worked well with long, structured prompts, but GPT-image2 behaves more like an image engine with a huge visual library and strong default aesthetics: a few high-density keywords are enough to trigger a complete style. For example, "CCD" automatically brings in flash effects and a cheap digital camera feel, while "90s anime" adds cel shading and solid color blocks. Long prompts, by contrast, often become noise that produces messy images. The community consensus is to "summon the image with a few keywords first, then add controls step by step: lock in the style first, then refine details". OpenAI's official prompting guide also supports this finding. It also means the goal of "reverse prompting" should shift from 1:1 replication to extracting the base style and core trigger keywords.
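
The "keywords first, then controls" workflow can be sketched as a sequence of progressively longer prompts. This minimal sketch assumes prompts are comma-separated phrases; it only builds strings and does not call any image API, and the function name, subject, and control phrases are hypothetical.

```python
def build_prompts(subject: str, style_keywords: list[str], controls: list[str]) -> list[str]:
    """Return successive prompts: style keywords first, then one added control per step."""
    # Step 1: subject plus a few high-density style keywords to summon the look.
    current = ", ".join([subject, *style_keywords])
    prompts = [current]
    # Later steps: append one control at a time so each change can be checked.
    for control in controls:
        current = f"{current}, {control}"
        prompts.append(current)
    return prompts


steps = build_prompts(
    subject="a cat on a windowsill",
    style_keywords=["CCD"],
    controls=["night", "shallow depth of field", "off-center composition"],
)
for p in steps:
    print(p)
```

Each prompt in the list would be tried in turn, so the style is locked in by the short first prompt before any detail controls are layered on.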

Stanford Study: Larger Models Aren't Harmed by Dirty Data, Only Small Ones Are

Blogger @vista8 shared a counterintuitive finding from a new paper by a Stanford University research team: given sufficient compute, models trained on unfiltered Common Crawl data actually perform better than those trained on cleaned data. For small 15M-parameter models, filtered data wins across the board, but at 330M and 1B parameters the situation reverses completely: after full training, the unfiltered version outperforms every filtered version. The researchers believe that once a model has enough parameters, it has enough capacity to separate junk from useful data. The finding is directly relevant to data strategy for large-scale pre-training.

Codex Usage Tips: Double Your Quota, Goal Instructions, and Remote Control

Multiple bloggers have shared useful Codex tips. For quota management, Codex and Claude Code use a 5-hour rolling quota window that starts counting from your first message, so you can send a short message early to open the window; the reset then falls in the middle of your actual work session, which lets that session draw on two windows. @vista8 compiled a six-element template for Goal instructions (outcome, validation, constraints, boundaries, iteration, and blocking conditions) and suggested using plan mode so the AI asks you clarifying questions to refine the Goal. Codex can also remotely control Codex on another computer: just add the remote device in settings. One blogger reported using Codex for 11 days straight, with the longest single task running for 8 hours.
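
The quota-priming tip is simple window arithmetic. As a minimal sketch, assuming each new 5-hour window begins as soon as the previous one resets (in practice a new window may only start at your next message), the function below counts how many windows overlap a work session. The function name and the times are hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=5)  # Rolling window length described in the brief.


def windows_touched(first_message: datetime, session_start: datetime,
                    session_end: datetime) -> list[datetime]:
    """Return start times of every 5-hour window that overlaps the work session.

    Assumes windows chain back to back starting at first_message.
    """
    starts = []
    start = first_message
    while start < session_end:
        # Keep the window only if it is still open after the session begins.
        if start + WINDOW > session_start:
            starts.append(start)
        start += WINDOW
    return starts


day = datetime(2026, 6, 3)
session_start, session_end = day.replace(hour=9), day.replace(hour=14)

# Primed: a short message at 07:00 makes the first window reset at 12:00.
primed = windows_touched(day.replace(hour=7), session_start, session_end)
# Unprimed: the first message is sent when work starts at 09:00.
unprimed = windows_touched(day.replace(hour=9), session_start, session_end)
print(len(primed), len(unprimed))  # 2 1
```

Priming at 07:00 for a 09:00 to 14:00 session puts the reset at 12:00, so the session spans two windows; starting at 09:00 without priming uses only one. Whether this truly doubles usable quota depends on each product's per-window limits, which the brief does not detail.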

Stats: Number of timeline posts scanned = 360, Number of bloggers matched = 51, Total matching tweets = 237, Weighted tweet score = 199.95, Number of original tweets = 143, Number of retweets = 37, Number of crawl attempts = 2, Boundary coverage status = tail_confidently_crossed_target_boundary