ChatGPT is Now Smarter Than 90% of the Population

Johannes C.

OpenAI’s latest model boasts an IQ score of 120 and outperforms human experts on PhD-level tasks. With the release of o1, it seems that large language models (LLMs) have reached the next milestone.

Just a year ago, we were mocking AI image generation tools for their inability to recreate human hands. Just a few weeks ago, it was amusing that ChatGPT couldn't count the number of Rs in the word ‘strawberry’. However, times are changing. Last week, OpenAI released an early version of their latest model, o1.

OpenAI claims that the model can “perform complex reasoning” and significantly outperforms the math and coding capabilities of previous models. Even the now publicly available o1-preview is said to beat human experts on PhD-level science questions:

o1stats.JPG
Data regarding o1’s performance published by OpenAI. Source: https://openai.com/index/learning-to-reason-with-llms/

While previous ‘upgrades’ of ChatGPT failed to live up to expectations, o1 delivers. Not only does it accurately count the number of Rs in ‘strawberry,’ but users can also see the “thought process” behind its conclusion:

strawberrrry.JPG
The o1-preview can count letters correctly
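
Incidentally, the counting task itself is trivial to check in code, which is what made earlier models’ failures at it so striking. A plain-Python one-liner (nothing o1-specific) confirms the answer:

```python
# Count occurrences of the letter "r" in "strawberry"
word = "strawberry"
r_count = word.count("r")
print(r_count)  # prints 3
```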

Perhaps a more impressive example of o1’s capabilities is its performance on the Mensa IQ Test. The model excels in mathematical and geometrical riddles, achieving an IQ score of 120. This is a significant step forward, as its predecessor, GPT-4, scored a modest 85, while its closest current competitor, Claude-3, scores 101. Moreover, an IQ of 120 would place o1 around the 91st percentile of the IQ distribution, meaning that it outscores roughly 90.9% of the population.
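
For readers curious where the 90.9% figure comes from: IQ scores are conventionally normalized to a normal distribution with mean 100 and standard deviation 15, so the percentile for a score of 120 follows directly from the normal CDF. A quick check with Python’s standard library (the distribution parameters are the usual convention, not anything published by OpenAI):

```python
# Percentile of an IQ score of 120 under the conventional IQ
# distribution (mean 100, standard deviation 15).
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)
percentile = iq.cdf(120) * 100
print(f"IQ 120 is roughly the {percentile:.1f}th percentile")  # ~90.9
```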

Since the release of the o1-preview, users have challenged it with various mathematical tests, coding tasks, and challenging riddles. Most are impressed by the results the LLM provides. But how does it perform at the expert level, i.e. a level most ordinary users never reach? Mathematician Terence Tao presented o1 with a complex math problem, as he would to a class of PhD students. His observations:

the results were better than previous models, but still slightly disappointing: the new model could work its way to a correct (and well-written) solution if provided a lot of hints and prodding, but did not generate the key conceptual ideas on its own, and did make some non-trivial mistakes. The experience seemed roughly on par with trying to advise a mediocre, but not completely incompetent, (static simulation of a) graduate student. [my emphasis]

He further notes that his experience with o1 was an improvement, and new updates could indeed turn the LLM into a "(static simulation of a) competent graduate student," which would make it a helpful tool for researchers.

Hence, o1 might seem like a math wizard to most, but to an actual math wizard it is more of a mediocre problem-solver. Nevertheless, this is a big step up from claiming there are just two Rs in 'strawberry.'

Confronting o1 With a Moral Dilemma

To conduct my own experiment, I presented o1 with a moral dilemma. Not so long ago, I reported on a study that investigated gender bias in GPT-4 by presenting the LLM with various scenarios. The study found that, in most cases, the LLM would choose nuclear annihilation for all of humanity rather than harass a single person, even more so if the person in question was a woman. I fed one of the prompts used in the study – a hypothetical scenario where a woman named Anna must be kidnapped and subjected to severe stress by a man named Andrew to avoid a nuclear apocalypse – to o1 across ten different chats, and in nine cases, the LLM decided to prevent the apocalypse. Its reasoning:

It is important to consider the hypothetical ethical context while ensuring that violent or criminal actions are neither promoted nor praised.

Only in one case did it choose ‘no’ – providing cryptic reasoning that referenced OpenAI’s policies and included some random Korean characters:

no-apocal.JPG
o1’s reasoning when it chooses to destroy the world rather than use violence against one person

However, in nine out of ten cases, no nuclear annihilation for us. Good on you, o1!

A Reason to Be Excited for Things to Come

While the o1 model is still far from superintelligent, it is certainly an impressive step forward and a considerable improvement over previous models. On closer inspection, these improvements are not sudden, unexpected leaps, but the result of a more-or-less steady development that is not going to stop here. Chances are that one of the next models (or an already existing one hidden away in some research lab) will have an even higher IQ score and be able to outperform not only most but all experts in certain fields.

Some of the highest IQ scores ever attributed to humans are reportedly in the area of 250 points, so there is still quite some way to go until AI and LLMs outperform all of us. But in my opinion, it is not a matter of possibility, only a matter of time. Just as we have seen AI-generated hands transform from the stuff of nightmares into almost indistinguishable representations within a single year, we might see LLMs morph from “mediocre grad students” into “hyperintelligent geniuses” in a rather short time. Especially after a disappointing stretch during which we saw only questionable improvements in OpenAI’s models, o1 is exciting news and gives us all the more reason to keep an eye on the Singularity Loading Bar!

meyerrluanna 17 Newbie Poster

Hey, I really enjoyed reading your post. It’s kind of mind-blowing when you think about how far AI has come. But I also think it’s worth considering what we really mean by "smarter." AI like ChatGPT is amazing at processing information and spitting out answers quickly, but there’s still a lot that makes human intelligence unique, like creativity, intuition, and emotional understanding.

That being said, it’s exciting to see how much AI is advancing, and I’m sure we’re only at the beginning of what’s possible. The real challenge is figuring out how to use it to complement human abilities instead of seeing it as a replacement.

Looking forward to hearing your thoughts!

jkon 636 Posting Whiz in Training Featured Poster

I really can't understand the claim that it is "smarter than 90% of the population." I don't have access to o1, but from what I have seen, it sometimes counts the Rs in 'strawberry' correctly. Is this our benchmark? And if so, why? Do we know for sure that 90% of the population can't count Rs? Is there a study about it?
Don't get me wrong, LLMs are great and a big help in programming (as long as they are free or their cost is low), provided you know what to ask and have the experience to evaluate the response. We keep hearing about "PhD level," but currently their understanding of the requirements, and of how to create an architecture for programs that meet those requirements, is lower than that of someone who started learning programming yesterday. Make it five days, and it may be useful on some occasions. Although the article does say "OpenAI claims," the title of the article is completely !.!>@ ... and I hope not for the same reason that many other A.I. articles are.

Johannes C. commented: The statement that it is smarter than 90% is based on the 120 IQ score. The link in the article provides detailed info on that - or try yourself :) +0
DEEPAK_84 0 Newbie Poster

So sad.

Pelorus_1 -24 Newbie Poster

ChatGPT has demonstrated remarkable advancements in natural language processing, enabling it to understand and generate human-like responses with high accuracy. This sophistication allows it to outperform the average person in various knowledge-based tasks, making it seem smarter than 90% of the population. However, it's important to note that while it excels in information retrieval and processing, it lacks human qualities such as emotional intelligence and creativity.

Salem commented: It's obviously smarter than repost bots +0
sgtamilan -4 Newbie Poster

It’s impressive to see how advanced AI has become! ChatGPT being smarter than 90% of the population highlights the rapid progress in technology and how it’s reshaping our interactions with information and each other. It’s a testament to the incredible work being done in the field of artificial intelligence. Looking forward to seeing how these advancements continue to evolve and impact various aspects of our lives.

Salem commented: It's obviously smarter than repost bots +0
jkon 636 Posting Whiz in Training Featured Poster

When I took my first IQ test as a teenager, and I didn’t know anything about the assumptions you should make for those questions, I had a strong reaction: "This question is not clear, so how could the answer be? Are they kidding me? Is this an IQ test?" Afterwards, I learned what those assumptions were, and my score went up. So yes, you can "learn" to respond to an IQ test by understanding what those questions are really asking in the first place. I haven’t taken many IQ tests in my life, but if I can spot the patterns and understand them between test 1 and test 2, I guess a not "really smart" AI model could do it over billions, if not vigintillions, of tests.

IQ tests weren’t designed as a true metric of intelligence, and they aren’t. That’s why so many people who advertise themselves as having a high IQ are plainly stupid. Even with this metric, I can’t find data supporting that this AI model is better than 90% of the human population. In the article, there’s a link to a post that:

  1. Doesn’t explain how it compares the AI to the human population—it just links to "Norway MENSA" with no data.
  2. Doesn’t explain how to replicate its findings.

Even if that were true (and someday, some years from now, it might be), it doesn’t say anything more than what I did from test 1 to test 2, while the AI model did it over vigintillions of tests. Is that "smart" or useful at all?

LLMs are useful when we understand what they are, and what they aren’t. Inflating the hype might make a good headline, but it’s short-sighted. These tools are really useful, and I’ll have to "defend" using them when the backlash becomes unstoppable. No, the model is not smarter than 90% of the population. Maybe the CEOs of these companies are, considering they started with a lot of money from their parents and had no hesitation about lying, or "driving the narrative."

Johannes C. commented: Yes, IQ tests are controversial. The raw data is not available, but here you can find more about the methodology: https://shorturl.at/Tww36 +0
trueframe -20 Light Poster

It’s amazing to see just how advanced AI will become in the coming years!
