Earlier this month, Google announced the release of Gemini, what it considers its most powerful AI model yet. It integrated Gemini immediately into its flagship generative AI chatbot, Bard, in hopes of steering more users away from its biggest competitor, OpenAI’s ChatGPT.
ChatGPT and the new Gemini-powered Bard are similar products. Gemini Pro is most comparable to GPT-4, available in the subscription-based ChatGPT Plus. So we decided to test the two chatbots to see just how they stack up — in accuracy, speed, and overall helpfulness.
Gemini versus ChatGPT: the basics
ChatGPT Plus and Gemini Pro are both very advanced chatbots based on large language models. They’re the latest and greatest options from their respective companies, promising to be faster and better at responding to queries than their predecessors. Most importantly, both are trained on recent information rather than only knowing what was on the internet until 2021. They’re also fairly simple to use as standalone products, in contrast to something like X’s new Grok bot, deployed as an extra on ex-Twitter.
The two are not exactly equal, however. For one thing, Bard is free — while the GPT-4-powered ChatGPT Plus costs $20 per month to access. For another, Bard powered by Gemini Pro does not have the multimodal capabilities of ChatGPT Plus. Multimodal language models can take a text prompt and respond with another medium, like a photo or a video. Gemini and Bard will eventually do that, but only with the bigger version of Gemini, called Ultra, which Google has yet to release. Bard will occasionally spit out graphical results, but by that, I mean it literally makes graphs.
On the other hand, Bard also provides a way to check other draft answers, a feature that doesn’t exist within ChatGPT.
One of the difficulties with testing chatbots is that the responses can vary significantly when you rerun the same prompts multiple times. I’ve mentioned any sizable variations I encountered in my descriptions. For fairness, I delivered the same initial prompts to each bot, starting with simple requests and following up with more complex ones when necessary.
One overall difference was that Bard tends to be slower than ChatGPT. It usually took between five and six seconds to “think” before it started writing, while ChatGPT took one to three seconds before starting to deliver its results. (The total delivery time for both depends on what information was requested — more complicated prompts tend to produce longer answers that take more time to finish filling out.) This speed difference persisted across my home and office Wi-Fi over the several days I spent playing around with both apps.
Both OpenAI and Google placed some limitations on the types of answers the chatbots can give. Through a process called red teaming — where developers test content and safety policies by repeatedly attempting to break the rules — AI companies build out guardrails against violating copyright protections or providing racist, harmful answers. I encountered Google’s restrictions more often, overall, than I did ChatGPT’s.
“Give me a chocolate cake recipe”
I asked both platforms to give me a chocolate cake recipe. This was one of the prompts The Verge used in a comparison of Bing, ChatGPT, and Bard earlier this year, and recipes are a popular search topic across the web — AI chatbots are no exception.
As a baker, I generally understand what makes for a good cake recipe. But for comparison, I double-checked with a trusted non-AI source: Claire Saffitz’s cookbook Dessert Person. Saffitz’s version is admittedly a little bit fancier, but it’s comparable to both Bard’s and ChatGPT’s offerings.
That said, there were a couple of complications. I was dubious about ChatGPT’s version of the cake, which called for boiling water, as coffee is more common in chocolate cake recipes. Bard’s, meanwhile, appeared to closely copy a recipe from the blog Sally’s Baking Addiction… but with the seemingly random change of doubling the eggs.
There was only one way to figure out if this worked: baking Gemini’s and ChatGPT’s (and Sally’s as a control) cakes. The results? Both cakes were functional — but not Claire Saffitz good. The Gemini cake was a bit gummy — a friend described it as “like a rice cake” — but the most moist of the three cakes. I did not like it at all, but my editor thought it was pretty good. ChatGPT’s cake was dense, smooth, chocolaty, and what I would call a perfect breakfast cake: not too sweet, and heavy enough to satisfy you.
“I want to learn more about tea”
When I started testing the chatbots for this story, there was a random discussion in The Verge’s Slack chat about tea and coffee. Someone mentioned that Bard gave them a list of books to read on tea, so I took things one step further and asked both chatbots for direct information about the beverage, along with some book recs.
Both results told me the basics of tea, including its origins and types, health benefits, and a list of bullet points about how to brew it. Bard gave me links to articles to learn more about tea, while ChatGPT gave a more extensive answer, with nine categories focused on the cultural significance of the beverage in different countries, global production, brewing techniques, and the origin of tea. When I repeated the prompt, this changed moderately: instead of a longer result, ChatGPT condensed it into a six-point list with one or two sentences on each of the categories.
I’ve seen lots of reports of chatbots hallucinating book citations or recommendations, often in the form of confused librarians being asked to find nonexistent books. In this case, at least, all the books recommended to me were real. They included The Tea Enthusiast’s Handbook and an illustrated version of the classic Japanese memoir The Book of Tea. However, Bard said Infused: Adventures in Tea was written by Jane Pettigrew, when the Amazon link it provided shows the book’s author is Henrietta Lovell.
“What does ‘Sonnet 116’ mean?”
Students began using ChatGPT when it went public in November 2022, encouraging a flurry of startups working on ways to help kids study. I prompted both Bard and ChatGPT to tell me what William Shakespeare’s “Sonnet 116” means, hoping to get at least a short summary of its themes.
Bard did exactly what I asked and gave me a quick summary of the sonnet’s themes of constancy and the timelessness of love, and it even wrote down a few key lines and their meaning. ChatGPT provided a more extensive breakdown, going quatrain by quatrain. However, when I ran the prompt again, ChatGPT reverted to the same basic analysis as Bard, with a few more themes thrown in.
Generally, I find a more detailed explanation of themes more helpful, so ChatGPT’s first iteration was better. But if I were cramming for an exam? You bet I’m taking Bard’s answer because it’s so much shorter to read.
“Write a bio of reporter Emilia David”
I promise this prompt was not due to any level of self-absorption on my part, but people often use conversational AI chatbots to help write a quick resume or biography. I’d hoped that both platforms would at least know that I started writing for The Verge this year.
ChatGPT clearly trawled my website, even going as far as repeating the same verbiage I’d written on my “About Me” page. It also took information from an article previously written about me and from what I can guess was a cursory look at my author pages at the different publications I’ve worked for, including The Verge. It should be noted that The Verge’s parent company, Vox Media, has blocked OpenAI’s web crawler.
Bard, by contrast, failed entirely. It told me it did “not have enough information about that person to help with your request.” I’m not sure whether to be offended or simply confused about why the model didn’t pull from my yearslong internet presence as a reporter.
“Draw a picture of a magnificent horse frolicking in a field of daisies at sunrise”
Magnificent horse in a field of daisies at sunrise. Image: ChatGPT
Since ChatGPT has integrated text-to-image capabilities, it generated a photorealistic image of a “magnificent horse frolicking in a field at sunrise.” Very calming.
Although the Gemini Pro model offers multimodal prompting, that feature is not yet available on Bard. So it’s not surprising that it told me it could not fulfill my prompt. However, I did try a different prompt, and well…
Can you draw me the sun?
Bard trolls me. Image: Bard
F-you Bard.
But thank you, ChatGPT, for drawing a fairly ominous, radiant sun.
“What are the lyrics to Taylor Swift’s ‘Ivy’?”
Bard refused to answer the question, saying it had no information about that person. I’m guessing the model believed “Ivy” was a person rather than a song since, when prompted for Swift’s bio, it provided one without question. (It did falsely attribute “See You Again,” the Wiz Khalifa song featuring Charlie Puth, to Swift, however, and it got the release year wrong for her album rerecordings.)
I asked Bard the same question a few days later, and this time, it gave me wonderfully wrong lyrics that somehow evoke the same imagery as the song. This is not the chorus of “Ivy,” but you could have fooled me:
I’m your ivy, twining ‘round your evergreen
You’re my anchor, holding me safe from the keen
Bitter wind that chills my bones to the marrow
But you, you’re my shelter from the storm
ChatGPT, on the other hand, took the prompt and ran with it. I only asked for lyrics, but alongside them, it gave me a dissertation on the song. “The lyrics showcase Swift’s poetic and evocative writing style, blending imagery and emotion in a way that has become a hallmark of her songwriting,” it effused.
Okay, it included an outro that isn’t present in the song, but otherwise, I was impressed — and surprised. Services that reprint lyrics tend to cut deals with licensing houses and highlight copyright information when they deliver them, something ChatGPT didn’t do. Universal Music Group, which incidentally owns Swift’s record label, sued rival AI company Anthropic and its chatbot Claude 2 for allegedly distributing copyrighted lyrics without licensing. Normally, ChatGPT cuts off lyrics and says it can’t display