Testing Popular AI Tools for Image Creation: ChatGPT, Gemini, Copilot & Claude
- Gio Iacobucci
- Oct 3
- 8 min read

Artificial intelligence has made image creation faster and more accessible than ever. But not all platforms perform the same. To test how the leading models compare, I ran the same prompt, followed by the same series of edits, across four of the biggest names in AI:
ChatGPT allows you to generate images directly in the interface with fast processing, usually seconds per request. It supports quick edits with good consistency, though the free version limits you to five generations.
Gemini, specifically Google’s AI Studio, does generate images, though it takes slightly longer than ChatGPT. However, its value comes from the fact that its outputs often look noticeably different, offering unique interpretations of the same idea.
Copilot produces images too, but it lags behind the others. The results are similar in quality to ChatGPT’s, yet the process is much slower, which makes editing more frustrating.
Claude does not generate images at all, but it serves as a strong companion tool. It can refine prompts, brainstorm creative directions, and help plan workflows before you send them to dedicated visual models.
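As an illustration of that companion role, here is a minimal sketch of asking Claude to refine a rough idea into a polished image prompt via the Anthropic Python SDK. The model name and the instruction wording are illustrative assumptions, not something I used in the experiment below:

```python
# Minimal sketch: use Claude to turn a rough idea into a detailed image prompt.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

rough_idea = "man in front of a building with a bulldog named Dozer"

message = client.messages.create(
    model="claude-3-5-sonnet-latest",  # illustrative model name
    max_tokens=300,
    messages=[{
        "role": "user",
        "content": (
            "Rewrite this rough idea as a detailed, unambiguous prompt "
            f"for an image-generation model: {rough_idea}"
        ),
    }],
)

# The refined prompt, ready to paste into a dedicated image tool.
print(message.content[0].text)
```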
My Experiment: The Man & Dog Test
In this experiment I tested ChatGPT, Gemini, and Copilot against each other by giving each model the same initial prompt, followed by four consecutive edits, to see how closely each stuck to the prompts given and how well each could edit its own results.
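For readers who would rather script a comparison like this than click through each chat interface, here is a minimal sketch using the OpenAI Python SDK. Treat it as an assumption-laden sidebar: the model name and image size are placeholders, and a one-off API call does not reproduce the conversational editing that the chat interface provides.

```python
# Minimal sketch: generate the initial test image programmatically.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

initial_prompt = (
    "Generate an image of a man standing in front of a tall building, "
    "looking happy, front facing shot, with no one in the background. "
    "Next to this man have an English Bulldog with the name Dozer on his "
    "collar, have the dog sitting down."
)

result = client.images.generate(
    model="dall-e-3",   # illustrative; use whichever image model you have access to
    prompt=initial_prompt,
    n=1,
    size="1024x1024",
)

print(result.data[0].url)  # link to the generated image
```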
Initial Prompt
Here's what I asked each platform to create:
“Generate an image of a man standing in front of a tall building, looking happy, front facing shot, with no one in the background. Next to this man have an English Bulldog with the name Dozer on his collar, have the dog sitting down.”
ChatGPT Result

Copilot Result

Gemini Result

Round 1 Analysis
In round one, ChatGPT and Copilot generated very similar results, whereas Gemini generated a noticeably different image that still stayed true to the prompt for the most part.
The one evident flaw in Gemini’s result is that it seems to have attempted to render the dog tag at a realistic size, but in doing so it did not actually generate the dog’s name; it produced something merely resembling letters.
The First Edit Prompt
After viewing the above results, I re-prompted with:
“Change the dog to a German Shepherd with the name Dakota”
ChatGPT Result

Copilot Result

Gemini Result

Edit 1 Analysis
After the first edit, ChatGPT and Copilot strayed a little from looking exactly alike. ChatGPT did its job for the most part and changed only what the prompt asked for, whereas Copilot almost completely regenerated the image when all that was asked of it was to change the dog.
Gemini did its job almost perfectly, except, once again, for the illegible name tag.
Edit 2 Prompt
It was time to start adding further modifications to see how each tool handled them. First up:
“Give the man a blue baseball cap”
ChatGPT Result

Gemini Result

Edit 2 Analysis
The results of the second edit were intriguing. First and foremost, you’ll notice the lack of a Copilot result. This is because Copilot struggles badly with image edits: it either takes an extreme length of time to generate the edit or gives no result at all.
From here on out I tested only ChatGPT and Gemini, as they were the only models of the big four worth continuing with at this stage of the experiment.
With that said, both models handled the second edit well, staying right on track and editing only the man to give him a blue hat.
One minor thing I did notice is that ChatGPT’s result is beginning to do something similar to Gemini with the lettering on the dog tag: it is starting to make up letters in the name. This was puzzling, as I hadn’t prompted anything to do with the dog, much less the dog tag.
Edit 3 Prompt
Now, let's adjust the surroundings of the man and the dog:
“Change the weather outside to raining”
ChatGPT Result

Gemini Result

Edit 3 Analysis
This was a round where the models diverged. ChatGPT captured the overall rainy environment more convincingly, but if you look closely, it is only raining behind the man and the dog. Along with that, it took the dog tag and changed it almost completely.
The name sits too far to the right in a completely different font, and the tag is no longer connected to the dog’s collar at all. It almost seems as though the model recognized it had strayed from actual letters in the previous edit and tried to correct itself without being prompted, in turn making it arguably worse.
Gemini changed only what was asked of it, making the setting rainy, although the rain is a little difficult to see unless you look closely. Other than that, Gemini’s output was solid.
Edit 4 Prompt
I know I'm not happy when I am outside walking the dog and it starts to rain. But the above images do not reflect that very human reaction. Let's adjust:
“Make the man have an appropriate reaction to the rain”
ChatGPT Result

Gemini Result

Edit 4 Analysis
For the final round I wanted to let each model decide how a human would react to rain by keeping the final prompt fairly vague, and it produced some interesting results. First off, I think it’s fair to say that ChatGPT gave quite an outlandish take on what a person would do in the rain.
The man with his hands at his sides, screaming, wasn’t the result I had in my head. Still, it gave me what I wanted from this round, since I was testing each model’s ability to create a human reaction. ChatGPT also seems to have a strange fixation on perfecting the dog tag even when not prompted, as it was once again altered.
This round’s tag is arguably better than the previous round’s, but the model is seemingly making these changes of its own accord.
Gemini’s generated reaction was definitely better but still doesn’t look quite human. The man seems to be trying to cover his head, which would be a reasonable general reaction to rain, but he is holding the sides of his head, giving the impression that he has a headache rather than that he is reacting to the rain.
Another odd thing Gemini generated this round was a watch on the man’s wrist, where in previous generations he had none. That said, an argument could be made that the model assumed he was wearing a watch from the start, hidden because the man’s suit sleeves run all the way down to his wrists.
Experiment Model Ranking for Accuracy and Reliability
- Gemini: Most accurate at sticking to prompt and edit intent across tests, though still weak at rendering small details like lettering.
- ChatGPT: Strong at generating and editing, but often alters unrelated details and over-interprets vague instructions.
- Copilot: Solid at initial generation, but weak at applying edits to it; dropped from later testing.
- Claude: Could not take part in the experiment itself, but can be used to refine prompts for image generation.
| Platform | Speed | Creativity | Precision | Overall Use |
| --- | --- | --- | --- | --- |
| ChatGPT | Fastest | Moderate | Good, but alters extras | Strong for speed and reliable edits. Best for campaigns needing quick visuals. |
| Gemini | Slightly slower | Most unique | Weak on text and logo accuracy | Best for creative variety and early-stage ideas. Needs manual cleanup. |
| Copilot | Slow | Limited | Inconsistent | Not suitable for high-value workflows. |
| Claude | Fast text support | Great for brainstorming | N/A (no visuals) | Best as a companion tool to refine prompts and workflows. |
What This Means for Your Business
When evaluating AI image tools for marketing and communication, focus on four factors: speed, creativity, precision, and cost.
- ChatGPT: Fastest and most consistent with edits. Best for quick visuals, A/B testing, and campaign mockups. Risk: may change small details without being asked.
- Gemini: Strongest for creative variety and unique looks. Best for brainstorming and early-stage exploration. Risk: struggles with text, logos, and small objects.
- Copilot: Slow and unreliable for edits. Best only if your business is locked into Microsoft’s ecosystem.
- Claude: No visuals, but excellent for refining prompts and planning workflows. Best as a companion tool.
These tradeoffs shape how each tool fits into workflows. The decision guide below summarizes which platform aligns with different business priorities.
| Business Priority | Recommended Tool | Why It Fits | Risks to Watch |
| --- | --- | --- | --- |
| Speed and fast turnaround | ChatGPT | Generates and edits images in seconds. Best for campaigns needing quick visuals, A/B tests, or rapid mockups. | May alter small details without being asked, risking inconsistency across assets. |
| Unique creative variety | Gemini | Produces the most diverse and original outputs. Strong for early-stage brainstorming, idea boards, or exploratory design. | Struggles with small text, logos, and fine detail. Requires manual correction. |
| Tight Microsoft integration | Copilot | Already included in Microsoft’s ecosystem; convenient for basic visual needs. | Too slow and unreliable for high-value campaigns. Not recommended for brand-critical work. |
| Prompt refinement and planning | Claude | Helps shape better prompts, brainstorm concepts, and improve workflows before image generation. | No image output, only text. Must be paired with another tool. |
Strategic Takeaways
AI image tools save time and expand creative options, but none deliver flawless results. The decision guide above shows which tool fits which business need.
- Use ChatGPT when speed and reliability matter for campaign timelines.
- Use Gemini when you need variety and are exploring creative directions.
- Keep Claude in your stack to improve prompt quality and workflow planning.
- Avoid Copilot for brand-critical work. It slows execution and risks inconsistent results.
Human oversight remains essential. Treat these platforms as accelerators, not replacements. Always review outputs for brand alignment, accuracy, and consistency before publishing.
Overall
Across a series of image generation and editing prompts, ChatGPT, Copilot, and Gemini were evaluated for accuracy, editing consistency, and detail fidelity. ChatGPT produced fast, consistent edits and handled most changes effectively. Gemini offered creative variety but struggled with text and small objects.
Copilot was slow and unreliable for edits, while Claude provided no visuals but excelled at refining prompts and planning workflows.
For business use, focus on speed, creativity, precision, and cost. ChatGPT is best for fast visuals, A/B testing, and campaign mockups, though small details may shift. Gemini supports brainstorming and early-stage exploration but struggles with logos and fine text. Copilot is only useful if tied to Microsoft’s ecosystem. Claude complements other tools by improving prompt quality and workflow planning.
AI image tools accelerate creative work but require human oversight. Use ChatGPT for speed, Gemini for variety, and Claude for planning. Avoid Copilot for brand-critical projects. Always be sure to review outputs to ensure alignment with brand standards, accuracy, and consistency.

About the Author
Gio Iacobucci is a Junior at Baldwin Wallace University, studying Digital Marketing with a minor in Communications. He is passionate about effective communication and public speaking, with interests in persuasion and AI. Outside of academics, Gio enjoys weightlifting and cooking. His career goal is to excel in digital marketing and AI through compelling storytelling and strategy.
Connect with Gio Iacobucci on LinkedIn.

