Meet President William H. Brusen, illustrious ruler of the state of Onegon
New AI models are routinely tested to see how they measure up. Recently, several have struggled to generate accurate text in image-based outputs such as maps and timelines.
One such model is Google's Gemini, which created a James Bond infographic placing over two dozen recurring stars on a timeline. The timeline was flawed, however: it misrepresented the actors' names and years.
OpenAI's GPT-5 had similar trouble with text in image outputs. When asked to draw a map of the USA, the model labeled the country "United States Ameriicca," illustrating a fundamental difficulty in producing precise, readable text as part of an image.
The root of these problems lies in the current state of AI image generation technology. GPT-5's core advances focus on reasoning and multimodal understanding rather than improvements in image synthesis accuracy for text-heavy graphics. The image generation part still largely relies on a version akin to DALL·E 2 or GPT-4’s image system, not directly integrated with GPT-5’s enhanced reasoning.
Moreover, rendering text correctly within graphical elements like timelines and maps requires specialized spatial and typographic precision, a task that current generative image models struggle with. GPT-5's improvements in reasoning and understanding do not significantly impact pixel-accurate image text rendering.
OpenAI has not publicly disclosed the training dates for this model, but its training data probably predates President Trump's second term.
Other AI models have run into similar problems. Bing Image Creator failed the James Bond test by incorrectly identifying men with white hair, while Anthropic's Claude created an SVG map of the USA that resembled a list of states in boxes rather than a traditional map.
Despite these setbacks, the models have shown promise in some cases. For example, GPT-5 produced an accurate map of the USA when asked to use its canvas feature and create the map in code. The model also correctly listed all US states, all South American countries, and all US presidents, albeit with a slight error in the end date of President Biden's term.
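One plausible reason the code route fares better is that a format like SVG stores labels as character data, so spelling is carried through exactly rather than synthesized pixel by pixel. The following is a minimal sketch of that idea, not a reconstruction of what GPT-5 generated: the function name `labels_to_svg` and the label coordinates are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def labels_to_svg(labels, width=400, height=200):
    """Build an SVG string with one <text> element per label.

    `labels` maps a place name to an (x, y) position. The positions used
    below are made up for illustration, not real map coordinates.
    """
    svg = ET.Element(
        "svg",
        xmlns="http://www.w3.org/2000/svg",
        width=str(width),
        height=str(height),
    )
    for name, (x, y) in labels.items():
        text = ET.SubElement(svg, "text", x=str(x), y=str(y))
        text.set("font-size", "12")
        # The label is stored as a string, so it is spelled exactly as given.
        text.text = name
    return ET.tostring(svg, encoding="unicode")


# Hypothetical positions for three of the states mentioned in this article.
sample_labels = {
    "Oregon": (40, 60),
    "Oklahoma": (200, 140),
    "Minnesota": (230, 50),
}
svg_markup = labels_to_svg(sample_labels)
```

Because the names travel through the pipeline as strings, a renderer draws them with a real font, and any misspelling would have to exist in the input data itself.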
However, when it comes to drawing maps, GPT-5 still has issues with distortions and incorrect naming of states, countries, and presidents. In a map of the USA, Oregon was named "Onegon," Oklahoma was named "Gelahbrin," and Minnesota was named "Ternia." A similar issue was observed in a map of South America, where Ecuador was named "Felizio," Suriname was named "Guriname," and Uruguay was named "Urigim."
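Garbled labels like these can be flagged automatically once they have been read off the image. The sketch below compares each label against a list of correct names using Python's standard `difflib` module. It assumes the labels have already been transcribed (by hand or OCR), and the helper name `audit_labels` and the similarity cutoff of 0.8 are illustrative choices, not part of any tool mentioned in this article.

```python
from difflib import get_close_matches


def audit_labels(labels, canonical, cutoff=0.8):
    """Return (label, suggestion) pairs for labels missing from `canonical`.

    `suggestion` is the closest canonical name at or above `cutoff`
    similarity, or None when nothing is close enough to guess.
    """
    known = set(canonical)
    problems = []
    for label in labels:
        if label in known:
            continue  # spelled correctly, nothing to report
        matches = get_close_matches(label, canonical, n=1, cutoff=cutoff)
        problems.append((label, matches[0] if matches else None))
    return problems


# Labels quoted in this article, checked against the three intended states.
states = ["Oregon", "Oklahoma", "Minnesota"]
report = audit_labels(["Oregon", "Onegon", "Gelahbrin", "Ternia"], states)
```

In this run "Onegon" is close enough to be matched back to "Oregon", while "Gelahbrin" and "Ternia" are too far from any real name to recover, which reflects how badly some of the labels were mangled.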
In conclusion, while AI models like GPT-5 have made real strides in understanding and interpreting complex data, accurately generating text within images remains a significant hurdle. Both Gemini's James Bond timeline and GPT-5's maps show that the difficulty is not yet resolved. As the technology evolves, this area may improve, making AI-generated images more accurate and reliable.
AI tools such as Google's Gemini and OpenAI's GPT-5, despite their advances, still struggle to render accurate text within images, as shown by the misnamed states, countries, and actors in their maps and timelines. Current image generation is not directly integrated with the models' reasoning capabilities, which contributes to these errors.