X Platform April 30th AI Newsletter: Codex Autonomously Develops Game, GPT-5.5-Cyber Release & Language Bug Investigation

Codex Autonomously Creates a Slay the Spire-like Roguelike Game

Several bloggers reported that OpenAI Codex built a runnable Slay the Spire-like roguelike game for @op7418 without human intervention. According to @op7418, the only requirement given was to “make something similar to Slay the Spire, Chinese style,” and Codex handled everything from code to assets on its own. The game is already playable; the assets were generated mainly with GPT-Image, while audio and presentation are still being iterated on. The installer and source code are expected to be released as open source that evening or the next day. After a small group discussion, @vista8 also tried the game and described it as “addictive.”

Source:

@op7418: https://x.com/op7418/status/2049698879181144235
@op7418: https://x.com/op7418/status/2049776147618320816
@op7418: https://x.com/op7418/status/2049797114079985713
@vista8: https://x.com/vista8/status/2049786107164774470

OpenAI GPT-5.5-Cyber and the “Goblin” Language Bug

Sam Altman (@sama) announced that OpenAI will roll out GPT-5.5-Cyber, a frontier cybersecurity model, to key cybersecurity defenders in the coming days, and plans to work with the industry ecosystem and governments to establish a trusted access mechanism. On the same day, OpenAI published a technical blog post (which @sama called the “goblinblog”) investigating an issue the community had noticed: since the launch of GPT-5.1, the model has grown increasingly fond of the words “goblin” and “gremlin.” The investigation traced the root cause to the training of ChatGPT’s “Nerdy” personality, where the reward model inadvertently gave higher scores to responses containing fantasy-creature metaphors, so the model learned “mention goblins = high score” as a shortcut. The Nerdy personality accounted for only 2.5% of all conversations but contributed 66.7% of goblin occurrences; from GPT-5.2 to GPT-5.4, the goblin occurrence rate under this personality rose by 3881%, and the habit also generalized to non-Nerdy conversations. OpenAI removed the Nerdy personality in March and filtered the training data, but GPT-5.5 training had already begun before the root cause was found, so that model still carries the habit. The blog post also explains how to remove the suppression instructions in Codex.
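
To put the quoted percentages in perspective, here is a small back-of-the-envelope calculation. It is only a sketch: it uses the three figures from the summary above and assumes that goblin occurrences and conversations are counted over the same population. The function names are made up for illustration and do not come from the OpenAI post.

```python
# Figures quoted in the summary above.
nerdy_share_of_conversations = 0.025  # 2.5% of all conversations
nerdy_share_of_goblins = 0.667        # 66.7% of goblin occurrences
surge_percent = 3881                  # rise from GPT-5.2 to GPT-5.4


def overrepresentation(share_of_hits: float, share_of_traffic: float) -> float:
    """How many times more hits a segment produces than its traffic share predicts."""
    return share_of_hits / share_of_traffic


def percent_increase_to_multiplier(percent: float) -> float:
    """Convert 'rose by N%' into a plain multiplier (new = old * multiplier)."""
    return 1 + percent / 100


# Nerdy conversations versus what their 2.5% traffic share would predict.
nerdy_factor = overrepresentation(nerdy_share_of_goblins, nerdy_share_of_conversations)

# The same ratio for everything that is not Nerdy.
other_factor = overrepresentation(
    1 - nerdy_share_of_goblins, 1 - nerdy_share_of_conversations
)

# Per-conversation goblin rate of Nerdy relative to non-Nerdy (assumption:
# occurrences scale with conversation counts).
relative_rate = nerdy_factor / other_factor

surge_multiplier = percent_increase_to_multiplier(surge_percent)

print(f"Nerdy over-representation: {nerdy_factor:.1f}x")
print(f"Nerdy vs. non-Nerdy rate:  {relative_rate:.1f}x")
print(f"A {surge_percent}% rise is a {surge_multiplier:.2f}x multiplier")
```

Under these assumptions, a personality with 2.5% of the traffic produces roughly 27 times its expected share of goblins, and a 3881% rise means the rate ended up at about 40 times its starting value.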

Source:

@sama: https://x.com/sama/status/2049712078836170843
@sama: https://x.com/sama/status/2049692014586048973
@dotey: https://x.com/dotey/status/2049722598758522963
@dotey: https://x.com/dotey/status/2049698879197651087

OpenAI GPT-5.5 Official Prompt Guide: Shorter is Better

A blogger noted that @dotey compiled a detailed summary of the official prompt guide OpenAI released alongside GPT-5.5. The core change: instead of hand-holding the model through each step of “how to do it,” you should clearly describe only “what you want,” the success criteria, and the constraints, and let the model plan its own path. The earlier “babysitter-style” long prompts actually restricted the model’s search space. The guide also covers two-layer personality settings (one for tone and style, one for behavior, both recommended to be brief), a “retrieval budget” mechanism (explicitly telling the model when to stop searching, to save tokens and cost), the design of a short opening statement before the full response (so responses feel faster to users), and writing norms that distinguish factual content from creative expression. According to the summary, the guide marks a paradigm shift from the “prompt engineering” era of GPT-4 to a “less talk, more action” approach.
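
As a rough illustration of the “what you want, success criteria, constraints” structure described above, the sketch below assembles a short outcome-oriented prompt. Everything in it is hypothetical: the `OutcomePrompt` class, its field names, and the wording of the retrieval-budget line are invented for this example and are not taken from OpenAI’s guide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OutcomePrompt:
    """Hypothetical container for an outcome-oriented prompt."""

    goal: str
    success_criteria: List[str] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)
    # Hypothetical "retrieval budget": when the model should stop searching.
    max_searches: Optional[int] = None

    def render(self) -> str:
        """Build the prompt text: state the outcome, not the steps."""
        lines = [f"Goal: {self.goal}"]
        if self.success_criteria:
            lines.append("Success criteria:")
            lines.extend(f"- {item}" for item in self.success_criteria)
        if self.constraints:
            lines.append("Constraints:")
            lines.extend(f"- {item}" for item in self.constraints)
        if self.max_searches is not None:
            lines.append(
                f"Retrieval budget: stop after {self.max_searches} searches "
                "and answer with what you have."
            )
        return "\n".join(lines)


prompt_text = OutcomePrompt(
    goal="Summarize this week's release notes for end users.",
    success_criteria=["Under 150 words", "Mentions every breaking change"],
    constraints=["No internal ticket numbers"],
    max_searches=3,
).render()

print(prompt_text)
```

The point of the shape is what is absent: there is no step-by-step procedure, only the target, how success is judged, the limits, and a stopping rule for retrieval.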

Source:

@dotey: https://x.com/dotey/status/2049624930887614648

Hermes Agent Intensive Updates: Curator/LM Studio/ComfyUI/pretext

The open-source project Hermes Agent (@NousResearch) announced several new capabilities yesterday. Curator: to address the skill-file bloat caused by the Agent’s self-evolution mechanism, Curator runs automatically once a week by default, tallies each skill’s usage frequency and last update time, merges overlapping skills, cleans up skills that have gone unused for a long time, and downgrades overly specific small skills to templates or scripts; built-in, externally installed, and user-pinned skills are not affected. LM Studio native integration: LM Studio is described as currently the most popular tool for running open-source LLMs locally; Hermes Agent now runs natively on LM Studio, discovers models automatically, loads them on demand, and selects an appropriate context size and reasoning level. ComfyUI integration: Hermes Agent can install, start, manage, and run complex ComfyUI workflows on demand for flexible, composable media generation. pretext integration: adds precise, DOM-free text layout, suited to web design, creative browser works, text wrapping, geometric text games, and dynamic fonts, and intended for use together with skills such as Frontend Design and Web Artifacts Builder. In addition, @dotey wrote an in-depth long-form article comparing Hermes Agent’s memory system with OpenClaw’s design, pointing out that Hermes achieves a cache-first memory architecture through four mechanisms: frozen MEMORY.md/USER.md snapshots, SQLite session_search, procedural-memory-style Skills, and an optional Honcho user-modeling layer.
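
The cleanup step of Curator, as described above, can be pictured with a small sketch. This is not Hermes Agent’s code: the `Skill` fields, the 90-day threshold, and the `curate` function are assumptions made for illustration, and the merging and downgrading steps are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class Skill:
    name: str
    recent_uses: int          # how often the skill was used recently
    last_updated: datetime
    protected: bool = False   # built-in, externally installed, or user-pinned


def curate(
    skills: List[Skill],
    now: datetime,
    stale_after: timedelta = timedelta(days=90),  # assumed threshold
) -> Tuple[List[Skill], List[Skill]]:
    """Split skills into (keep, archive) based on usage and update time."""
    keep: List[Skill] = []
    archive: List[Skill] = []
    for skill in skills:
        is_stale = skill.recent_uses == 0 and now - skill.last_updated > stale_after
        # Protected skills are never touched, matching the description above.
        if skill.protected or not is_stale:
            keep.append(skill)
        else:
            archive.append(skill)
    return keep, archive


now = datetime(2025, 6, 1)
skills = [
    Skill("pinned-old", 0, datetime(2024, 1, 1), protected=True),
    Skill("stale", 0, datetime(2024, 1, 1)),
    Skill("active", 12, datetime(2024, 1, 1)),
    Skill("fresh-unused", 0, datetime(2025, 5, 20)),
]
keep, archive = curate(skills, now)
print("keep:", [s.name for s in keep])
print("archive:", [s.name for s in archive])
```

Only the unused, long-untouched, unprotected skill ends up in the archive list; a weekly scheduler would simply call a pass like this on a timer.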

Source:

@NousResearch: https://x.com/NousResearch/status/2049877708906045499
@NousResearch: https://x.com/NousResearch/status/2049584595465572752
@NousResearch: https://x.com/NousResearch/status/2049629510123864538
@dotey: https://x.com/dotey/status/2049735038560842186
@dotey: https://x.com/dotey/status/2049534755729707205

Cursor Opens Agent SDK Public Beta

Both @dotey and @cellinlab reported that Cursor has opened the public beta of its official TypeScript SDK (@cursor/sdk), which packages the Agent runtime behind its own editor, CLI, and web version for external developers. The Agent can run locally or in an isolated cloud virtual machine. In cloud mode, each Agent gets a dedicated sandbox, a clone of the code repository, and a complete development environment; it can resume tasks after going offline and can open PRs directly. The SDK is not locked to one model provider: it supports OpenAI, Anthropic, Google, and Cursor’s own Composer 2. It exposes capabilities such as codebase indexing, semantic search, instant grep, MCP tool integration, Agent hooks, and sub-Agent decomposition. @cellinlab asked, “does this mean Cursor is also open-source?”

Source:

@dotey: https://x.com/dotey/status/2049541257756811517
@cellinlab: https://x.com/cellinlab/status/2049520494299562470

DeepSeek Multimodal Paper ‘Thinking with Visual Primitives’ Published

@op7418 reported that DeepSeek has published its multimodal large language model paper, ‘Thinking with Visual Primitives.’ The base model is DeepSeek-V4-Flash, a MoE architecture with 284B total parameters and 13B active parameters. The in-house DeepSeek-ViT visual encoder uses 14×14 patches, which are spatially compressed 3×3 before being fed to the LLM. When answering, the model does not only reason in text; it also “thinks visually” through visual primitives such as drawing boxes and marking points. At an extremely low token cost, it is said to match GPT-5.4, Claude, and Gemini on multiple frontier benchmarks and to surpass them on some. @op7418 later posted the paper link.
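
The numbers above translate into a quick token-count estimate. The sketch below assumes that “compressed 3×3” means each 3×3 block of neighboring patches becomes one token for the LLM, and that partial patches are dropped while partial blocks are padded; both are assumptions, not details from the paper. The 1008×1008 input size is just a convenient example.

```python
import math
from typing import Tuple

PATCH_SIZE = 14        # pixels per patch side (from the summary)
COMPRESSION = 3        # 3x3 spatial compression (from the summary)
TOTAL_PARAMS_B = 284   # total parameters, in billions
ACTIVE_PARAMS_B = 13   # active parameters per token, in billions


def visual_token_count(height_px: int, width_px: int) -> Tuple[int, int]:
    """Return (patches, tokens_after_compression) for one image.

    Assumption: leftover pixels that do not fill a patch are dropped, and
    leftover patches that do not fill a 3x3 block are padded to a full block.
    """
    patches_h = height_px // PATCH_SIZE
    patches_w = width_px // PATCH_SIZE
    tokens_h = math.ceil(patches_h / COMPRESSION)
    tokens_w = math.ceil(patches_w / COMPRESSION)
    return patches_h * patches_w, tokens_h * tokens_w


patches, tokens = visual_token_count(1008, 1008)
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print(f"1008x1008 image: {patches} patches -> {tokens} tokens")
print(f"Active parameters per token: {active_fraction:.1%} of the model")
```

For this example image, 72×72 patches shrink to 24×24 tokens, a ninefold reduction, and only about 4.6% of the model’s parameters are active for any given token.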

Source:

@op7418: https://x.com/op7418/status/2049823491017592924
@op7418: https://x.com/op7418/status/2049827916540944502

AI Agent Memory Solutions: Beads and Karpathy’s Views

@vista8 recommended Beads, an open-source project (22.6k GitHub stars) that targets the AI Agent “amnesia” problem. Beads is built on Dolt, a SQL database that works “like Git,” with support for branching, merging, version rollback, and cell-level merge. It uses hash IDs to avoid write conflicts when multiple agents write concurrently, allows the full task history to be traced back, and supports remote synchronization. Its context-compression design includes a “semantic memory decay” mechanism that compresses closed tasks into summaries to save context-window space. Meanwhile, @dotey shared key points from Karpathy’s latest interview, noting that Vibe Coding is just the beginning and that what truly matters is Agentic Engineering.
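
Two of the ideas above, hash IDs and compressing closed tasks, can be shown in a few lines. This is a generic sketch and not Beads’ implementation: the ID format, the fields being hashed, and the truncation used in place of a real summarizer are all assumptions for illustration.

```python
import hashlib
from dataclasses import dataclass


def task_id(agent: str, title: str, created_at: str, length: int = 8) -> str:
    """Derive an ID from the task's own content instead of a shared counter.

    Two agents can create tasks at the same moment without coordinating,
    because neither needs to ask "what is the next free number?".
    """
    payload = "\n".join([agent, title, created_at]).encode("utf-8")
    return "task-" + hashlib.sha256(payload).hexdigest()[:length]


@dataclass
class Task:
    id: str
    title: str
    notes: str
    closed: bool = False


def compact(task: Task, max_chars: int = 60) -> str:
    """Return the text an agent would load into its context for this task."""
    if not task.closed:
        return f"{task.title}\n{task.notes}"
    # Placeholder for a real summarizer: closed tasks keep only a short digest.
    return f"[closed] {task.title}: {task.notes[:max_chars]}"


id_a = task_id("agent-a", "Fix login bug", "2025-06-01T10:00:00Z")
id_b = task_id("agent-b", "Add dark mode", "2025-06-01T10:00:00Z")

done = Task(id_a, "Fix login bug", "Root cause was an expired token. " * 10, closed=True)

print(id_a, id_b)
print(compact(done))
```

With a sequential counter, both agents above would have claimed the same next number; with content-derived IDs they do not need to coordinate at all. The compaction step trades detail for window space only once a task is closed.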

Source:

@vista8: https://x.com/vista8/status/2049651974317191464
@dotey: https://x.com/dotey/status/2049617833370202182

WeChat Channels Adds Link Copying Support

@cellinlab noticed that WeChat Channels now supports copying links and shared a test link. The link plays inside the WeChat ecosystem but still does not appear to support playback outside it.

Source:

@cellinlab: https://x.com/cellinlab/status/2049864977335664934
@cellinlab: https://x.com/cellinlab/status/2049865227714597178

Scan Statistics

Timeline Lines Scanned: 240
Bloggers Matched: 22
Tweets Matched: 119
Weighted Tweet Score: 95.45
Original Tweets: 57
Retweets: 23
Fetch Attempts: 1
Boundary Coverage Status: tail_confidently_crossed_target_boundary (Following timeline window confirmed to have crossed yesterday’s boundary)