Why Workflow Tools Like ComfyUI Won't Be Replaced by GPT-4o Anytime Soon

I came across a question on Zhihu: now that GPT-4o’s image generation has taken off, some people say Photoshop will be replaced, and others say ComfyUI will be.

This kind of talk is essentially the same as saying that Manus, an autonomous AI agent, will replace Coze and its hand-built workflows.

As large models grow more capable, they will inevitably absorb the functions that workflows provide and the gains that prompts deliver. Anyone paying attention can see this.

What is the core of a workflow?

The core of a workflow is that it weaves determinism into the process.

Nodes and plugins are not the core; once a large model’s coding ability is strong enough, it can build those itself.

What a workflow provides in the face of large models is determinism.

What exactly does a large model’s output represent?

It’s probability.

When you ask a large model to draw a sky, even if you want a clear blue sky, it will not give you a blue sky every single time.

The sky is dark at night, red at dusk, and gray on rainy days, and all of these possibilities are part of what the model has learned. So at best you get a blue sky most of the time, with no 100% guarantee.

Complex tasks are even harder: with many possibilities intertwined, you cannot guarantee that the model will produce consistent output for the same task.
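To make the probability point concrete, here is a toy simulation in Python. It is only an illustration under made-up assumptions: the four sky types and their weights are invented, and a real image model has no lookup table like this. Even with a 70% chance of a blue sky on every draw, a run of 1,000 generations will almost certainly include dusk, night, and rain.

```python
import random
from collections import Counter

# Hypothetical probabilities for what "a sky" can come out as.
# A real model exposes nothing like this; the numbers are only illustrative.
SKY_DISTRIBUTION = {
    "clear blue": 0.70,
    "red dusk": 0.10,
    "dark night": 0.10,
    "gray rain": 0.10,
}


def sample_sky(rng: random.Random) -> str:
    """Draw one 'generation' from the toy distribution."""
    outcomes = list(SKY_DISTRIBUTION)
    weights = list(SKY_DISTRIBUTION.values())
    return rng.choices(outcomes, weights=weights, k=1)[0]


def run(n: int, seed: int) -> Counter:
    """Ask for 'a blue sky' n times and tally what actually came out."""
    rng = random.Random(seed)
    return Counter(sample_sky(rng) for _ in range(n))


if __name__ == "__main__":
    counts = run(1000, seed=0)
    for sky, count in counts.most_common():
        print(f"{sky:>10}: {count}")
```

The same seed reproduces the same tally, but changing the seed, much like asking the model again, shuffles the counts. What never happens is 1,000 blue skies out of 1,000.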

And in production, output you cannot predict is often output you cannot use.

That is why I tell my colleagues to use formulas instead of AI fields whenever possible when building tables.

Formulas are also more energy-efficient and greener.

The same logic applies to workflows.

With a workflow, you lay out the steps yourself: first this, then that; load the checkpoint first, then the LoRA. That fixed order keeps each step from going astray.

For example, to redraw a face, your workflow must first extract the face and then redraw it. Even if a single node packages both functions, it still extracts the face before redrawing. That’s how face redrawing works, right? Otherwise, you would just be redrawing the entire image.
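The extract-then-redraw order can be sketched in a few lines of Python. This is a minimal sketch, not how ComfyUI implements it: images are toy grids of grayscale integers, the face box is assumed to come from some detector, and `redraw` is a hypothetical stand-in for the generative model. The point is structural: the model only ever sees the cut-out region, and pasting the result back copies every other pixel unchanged.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]          # toy grayscale image: rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def extract_region(img: Image, box: Box) -> Image:
    """Step 1: cut out only the target region (e.g. the face)."""
    top, left, bottom, right = box
    return [row[left:right] for row in img[top:bottom]]


def paste_region(img: Image, patch: Image, box: Box) -> Image:
    """Step 3: write the patch back; every pixel outside the box is copied as-is."""
    top, left, _, _ = box
    out = [row[:] for row in img]  # copy so the original image is untouched
    for dy, patch_row in enumerate(patch):
        for dx, value in enumerate(patch_row):
            out[top + dy][left + dx] = value
    return out


def redraw_face(img: Image, box: Box, redraw: Callable[[Image], Image]) -> Image:
    """Extract, then redraw, then composite: the workflow fixes this order."""
    face = extract_region(img, box)
    new_face = redraw(face)  # hypothetical model; assumed to keep the patch size
    return paste_region(img, new_face, box)


if __name__ == "__main__":
    image = [[10] * 6 for _ in range(6)]  # a flat gray 6x6 "photo"
    face_box = (1, 1, 4, 4)               # pretend a detector found the face here
    fake_redraw = lambda patch: [[99] * len(row) for row in patch]
    for row in redraw_face(image, face_box, fake_redraw):
        print(row)
```

Real inpainting workflows often use a soft mask instead of a hard rectangle and composite the result back onto the original, so pixels outside the mask stay as they were.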

As it happens, redrawing the entire image is exactly what GPT-4o does.

I sent it this image, which I found on Baidu:

I asked it to remove the girl’s tattoo, and it returned this image.

The tattoo was indeed removed.

But she is no longer ‘this girl’.

Side by side, the redrawn image looks really, really similar.

But it isn’t only the girl that has changed.

The bag on the table became a hat, the faceted glass in her hand became a round cup, her shoes changed, her hand position and sitting posture changed, and even the curtain texture changed…

Every part of the image has changed.

Where is the determinism?

If this girl were my client, paying me to remove her tattoo, would she accept an image like this?

Here’s another example.

When Robert Downey Jr. officially announced his role as Doctor Doom, he wore a mask.

Here is a photo of him in the mask:

Now let’s remove the mask with a ComfyUI workflow:

First isolate the mask region, then remove the mask.

The mask is removed, and everything else remains unchanged.

Now let’s have 4o do it:

The details are richer and more textured, and he even looks younger. How wonderful.

But look closely: his eye color has changed, his hand has dropped, the robe has become a hoodie, and the thing around his neck now looks like a stethoscope.

This is still clearly Robert Downey Jr.; you could post it on Twitter or other social media without issue, but you cannot use it for serious work. In a hoodie, you could call him Iron Man, but not Doctor Doom.

So what does a workflow give you? Precise targeting.

If you want to change A, change A, and don’t touch B, C, or D, not even with a small probability.
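‘Change A and leave B, C, and D alone’ is also something you can check. The Python sketch below, again on toy integer grids with a hypothetical rectangular target box, counts how many pixels changed outside the region you meant to edit. A targeted workflow edit should score zero, while a whole-image redraw would likely register changes almost everywhere. The `tolerance` parameter is an assumption for real photos, where compression noise can flip a few pixels even when nothing was meant to change.

```python
from typing import List, Tuple

Image = List[List[int]]          # toy grayscale image: rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def changes_outside(before: Image, after: Image, box: Box) -> int:
    """Count pixels that differ between two same-sized images outside the target box."""
    if len(before) != len(after) or any(len(b) != len(a) for b, a in zip(before, after)):
        raise ValueError("images must have the same dimensions")
    top, left, bottom, right = box
    changed = 0
    for y, (row_b, row_a) in enumerate(zip(before, after)):
        for x, (pb, pa) in enumerate(zip(row_b, row_a)):
            inside = top <= y < bottom and left <= x < right
            if not inside and pb != pa:
                changed += 1
    return changed


def edit_is_targeted(before: Image, after: Image, box: Box, tolerance: int = 0) -> bool:
    """True if at most `tolerance` pixels changed outside the intended region."""
    return changes_outside(before, after, box) <= tolerance


if __name__ == "__main__":
    original = [[0] * 4 for _ in range(4)]
    targeted = [row[:] for row in original]
    targeted[1][1] = 5                        # only "A" was changed
    redrawn = [[1] * 4 for _ in range(4)]     # everything was regenerated
    print(changes_outside(original, targeted, (1, 1, 2, 2)))
    print(changes_outside(original, redrawn, (1, 1, 2, 2)))
```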

The same goes for Manus and Coze.

Can AI automatically do what our painstakingly built workflows do? Yes, it can.

But is Manus free of mistakes? Of course not; failure cases are everywhere.

As long as AI cannot reliably complete sufficiently complex tasks on its own, humans will need to design workflows for it (first this, then that) to improve its accuracy.

That said, there’s no denying that large models are internalizing many capabilities.

Take prompts, for instance.

When ChatGPT first came out, giving the model a role, such as ‘You are an expert in the XXX field’, was usually very effective.

The difference in results with and without role definition was significant.

Structured prompts also produced good results.

So at the time, everyone thought prompts were crucial, a field of study in their own right, and that Prompt Engineer would become a widespread job.

But things changed: many models began performing well even without a role. Why? Because model developers aren’t stupid. They saw how effective prompts were, so why not train that behavior directly into the model and let it work out the appropriate role itself?

Later, reasoning models emerged.

When DeepSeek-R1 became wildly popular, people realized they no longer needed complex, structured prompts to get decent responses. You didn’t have to become a Prompt Engineer to enjoy using AI.

Of course, prompts still matter; a good prompt still gets far better results than a casually tossed-off question. But at least for simple tasks, you no longer need a dedicated prompt engineer. AI can reason on its own, understand what the user wants, and respond appropriately.

Prompt Engineers won’t disappear, but since simple tasks no longer need them, the bar will certainly be higher, the job will demand more skills, and it won’t be as common.

Let me repeat: large models are internalizing many capabilities that are currently attached to them as external tools.

Technology moves forward, not backward.

AI is ultimately heading toward AGI; that is the goal everyone knows.

So the uncertainty that workflows exist to reduce will keep shrinking. That much is an undeniable inference.

As large models grow more capable, they are steadily squeezing this space.

4o doesn’t support local (region-only) redrawing yet, but it might soon.

Last year, AI image models couldn’t render Chinese text; now they do quite well, don’t they?

But saying it will eliminate workflows is still premature.

If AGI is the 100% mark, then until AGI arrives, AI will naturally fall short of 100%.

So there will still be a need for manually crafted workflows.

Learning these tools is definitely not in vain. The crucial thing is not to think in absolutes: what gets eliminated matters far less than embracing new things and integrating them.