Is OpenAI Outdated? Here are 9 Things Claude 3 Can Do That GPT-4 Can’t

Since its launch in late 2022, ChatGPT has often been imitated, but never equaled. At least, that was the case until now.

With the launch of Claude 3 Opus, Anthropic announced that its model had outperformed OpenAI’s GPT-4 on several benchmarks. And now, the superiority of this new AI seems to be confirmed in practice.

Early testers are reporting impressive use cases for Claude 3, including tasks they could not get ChatGPT to complete. Check out nine examples, and don’t hesitate to scroll through the Twitter threads for more details!

AI That Can Assist Experienced Coders

For example, Israeli user Yam Peleg thinks that Claude 3 has crossed the “power user” threshold. He believes this is the first time an AI has been able to help power users by completing complex tasks faster than they can themselves.

By his own account, he had used GPT-4 only for brainstorming ideas, learning new topics, summarizing long texts, and other easy tasks.

On the other hand, he had never been able to use it for coding tasks. Each attempt took him longer than writing the code himself.

In his eyes, ChatGPT is therefore very good for beginners in a field, but not for users experienced with a given framework or programming language.

Such experts are consistently faster and more accurate than the model, and better at avoiding bugs and writing short, simple code.

However, since the launch of Claude 3, he has found that many experienced coders are using this new AI for real-world tasks!

Claude 3 Creates a Fuzzer to Test Software

For his part, Brendan Dolan-Gavitt, a professor at New York University, gave Claude 3 the source code of a C library for decoding GIFs that he found on GitHub.

He then asked it to write a Python function to generate random GIFs. The resulting generator achieved 92% line coverage in the decoder and found 4 memory safety bugs!

For comparison, the expert explains that he wrote his own Python GIF generator by hand a few months ago. That effort took him about an hour of reading the code to find the same bugs as Claude’s generator…

You can find this Claude-written fuzzer on GitHub, along with the program it analyzes, an explanation, and a Makefile.
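To give a sense of what such a generator might look like, here is a minimal sketch. It is not the code Claude wrote, nor Dolan-Gavitt's own: the function name `random_gif` and its parameters are hypothetical. It assumes the standard GIF89a layout (header, logical screen descriptor, global color table, image blocks, trailer) and fills the image data with random bytes rather than valid LZW streams, which is usually enough to exercise a decoder's error paths.

```python
import random
import struct


def random_gif(rng: random.Random, max_dim: int = 64, max_frames: int = 3) -> bytes:
    """Build one structurally plausible GIF with random fields and random image data."""
    out = bytearray(b"GIF89a")

    # Logical Screen Descriptor: width, height, packed flags, background index, aspect ratio.
    width, height = rng.randint(1, max_dim), rng.randint(1, max_dim)
    table_bits = rng.randint(0, 7)      # global color table holds 2**(table_bits + 1) colors
    packed = 0x80 | table_bits          # bit 7 set: a global color table follows
    out += struct.pack("<HHBBB", width, height, packed, rng.randint(0, 255), 0)

    # Global Color Table: 3 bytes (RGB) per entry.
    out += bytes(rng.randrange(256) for _ in range(3 * 2 ** (table_bits + 1)))

    for _ in range(rng.randint(1, max_frames)):
        # Image Descriptor: separator 0x2C, left, top, width, height, flags (no local table).
        out += struct.pack(
            "<BHHHHB", 0x2C, 0, 0, rng.randint(1, width), rng.randint(1, height), 0
        )
        out.append(rng.randint(2, 8))   # LZW minimum code size
        # Image data as length-prefixed sub-blocks of random bytes (deliberately not valid LZW).
        for _ in range(rng.randint(1, 4)):
            size = rng.randint(1, 255)
            out.append(size)
            out += bytes(rng.randrange(256) for _ in range(size))
        out.append(0x00)                # sub-block terminator

    out.append(0x3B)                    # trailer
    return bytes(out)
```

Seeding the generator, for example with `random_gif(random.Random(42))`, makes every test file reproducible, so any input that crashes the decoder can be regenerated later.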

When AI Solves Engineering Problems

HyperWriteAI CEO Matt Shumer says he has written a prompt for Claude 3 to make engineering decisions.

This prompt involves placing the chatbot in the role of an engineer who is experienced in solving complex problems across various disciplines and able to give valuable advice.

The prompt even specifies the format the advice should take: first an overview of the problem and the challenges to be addressed, then the different solutions and their benefits.
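The structure described above (a role, then a required answer format) can be sketched as a small template. This is not Shumer's actual prompt, which is in his tweet: the wording of `ROLE` and `SECTIONS` is paraphrased from the description here, and the function name `build_engineering_prompt` is hypothetical.

```python
# Role wording is an assumption paraphrased from the article, not the original prompt.
ROLE = (
    "You are an engineer experienced in solving complex problems "
    "across various disciplines."
)

# Response sections, in the order the article describes them.
SECTIONS = [
    "Overview of the problem and the challenges to be addressed",
    "Possible solutions and the benefits of each",
]


def build_engineering_prompt(problem: str) -> str:
    """Assemble a role-plus-format prompt for one engineering problem."""
    numbered = "\n".join(f"{i}. {title}" for i, title in enumerate(SECTIONS, start=1))
    return (
        f"{ROLE}\n\n"
        f"Structure your answer as follows:\n{numbered}\n\n"
        f"Problem: {problem.strip()}"
    )
```

The returned string can be pasted into any chatbot. Keeping the role and the section list as separate constants makes it easy to reuse the same template with a different persona.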

You can copy the prompt directly from the tweet and use it as you please. Shumer adds that he originally wrote it for GPT-4 a few months ago, and that it works much better with Claude.

Creating Animations for Math Theorems

Professor Alvaro Cintas, who specializes in AI and cybersecurity, asked Claude 3 to generate an animation for the Pythagorean theorem.

As a prompt, he asked the AI, “Write manim code to animate an explanation of the Pythagorean theorem. Think step by step with code it and provide me with the full code.”

The code wasn’t perfect on the first try, but fixing a few parts of the Python code took only a few minutes and gave a very satisfying result!

Translating Ancient Artifacts

AI expert Min Choi used Claude to try to decipher the Phaistos Disk: a well-known ancient artifact discovered in 1908 by archaeologist Luigi Pernier in the Minoan palace of Phaistos.

It is a disk 15 centimeters in diameter, covered with a spiral of symbols. In total, there are 45 different signs.

In the century and more since its discovery, many people have tried to decipher it, but no one has managed to prove their theory. Is it a religious document? A calendar? A game? A musical score? The mystery remains.

First of all, Min Choi provided Claude with all the available information about the disk. In particular, he gave it the Wikipedia page and a scientific article, to teach it how to apply software engineering principles to translating the symbols.

Initially reluctant for fear of making a mistake, Claude finally agreed to provide speculative translations. According to Claude, the symbols could evoke a goddess offering protection to the city or the palace.

They would describe how the people bring offerings and perform sacred rituals in her honor, in order to receive victory and prosperity.

The other side of the disk would depict how the commander leads the warriors in a great battle, and how the army returns home after victory to receive the blessing of the gods.

Beyond this simple translation, Claude also suggests who or what the terms divinity, commander, people, kingdom and battle might refer to.

Modestly, the AI claims only 5% confidence in its translation. Yet the result is similar to the interpretations proposed by human archaeologists over the years.

In order to translate the symbols, it explains that it used the techniques of pattern recognition, contextual analysis, comparative analysis, linguistic knowledge, iterative refinement, as well as literary and mythological archetypes.

Vastly Better Than GPT on Arithmetic

A former Amazon and Microsoft employee, AI expert Vaibhav Kumar wanted to test the arithmetic capabilities of Claude 3 Opus and compare them with those of GPT.

So he designed an experiment, and admits to being very surprised by the results. And for good reason: Opus turns out to be much better than GPT with numbers.

To conduct his test, he used a chain-of-thought prompt combined with the persona of a calculator that avoids scientific notation.

He also used a dataset consisting of 10 different samples for each combination of numbers and digits used.
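A test set of this shape might be built as follows. This is only a sketch under stated assumptions, not Kumar's code: it reads "each combination" as each pair of operand digit counts, caps operands at five digits, and uses hypothetical names (`build_dataset`, `accuracy`). The `answer_fn` argument stands in for whatever parses a model's reply into a number.

```python
import itertools
import random


def random_with_digits(rng: random.Random, n_digits: int) -> int:
    """Return a random integer with exactly n_digits decimal digits."""
    low = 10 ** (n_digits - 1) if n_digits > 1 else 0
    return rng.randint(low, 10 ** n_digits - 1)


def build_dataset(max_digits: int = 5, samples_per_combo: int = 10, seed: int = 0):
    """Create addition problems for every pair of operand digit counts."""
    rng = random.Random(seed)
    dataset = []
    for d1, d2 in itertools.product(range(1, max_digits + 1), repeat=2):
        for _ in range(samples_per_combo):
            a, b = random_with_digits(rng, d1), random_with_digits(rng, d2)
            dataset.append({"digits": (d1, d2), "a": a, "b": b, "sum": a + b})
    return dataset


def accuracy(dataset, answer_fn) -> float:
    """Fraction of samples for which answer_fn(a, b) equals the true sum."""
    correct = sum(answer_fn(s["a"], s["b"]) == s["sum"] for s in dataset)
    return correct / len(dataset)
```

Grouping the samples by their `digits` key then gives a per-difficulty score, which is how a drop in accuracy on longer operands would show up.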

In the addition test, Opus got 100% correct answers while GPT-4 started making mistakes as the exercises became more complex. GPT-3.5, on the other hand, was completely lost.

According to Kumar, the explanation probably lies in the chain-of-thought format, which lets Opus perform additions the same way humans do.
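The human method in question is presumably schoolbook column addition: work from the rightmost digit, write one digit, carry the rest. The sketch below spells out those steps in Python. It only illustrates the procedure a chain of thought could follow and says nothing about how the model computes internally. The function name `column_addition` is hypothetical.

```python
def column_addition(a: int, b: int):
    """Add two non-negative integers digit by digit, recording each carry step."""
    xs, ys = str(a)[::-1], str(b)[::-1]   # least significant digit first
    steps, digits, carry = [], [], 0
    for i in range(max(len(xs), len(ys))):
        da = int(xs[i]) if i < len(xs) else 0
        db = int(ys[i]) if i < len(ys) else 0
        total = da + db + carry
        steps.append(
            f"{da} + {db} + carry {carry} = {total}: "
            f"write {total % 10}, carry {total // 10}"
        )
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        digits.append(str(carry))          # a final carry becomes the leading digit
    return int("".join(reversed(digits))), steps
```

For `column_addition(478, 256)`, the first recorded step is `8 + 6 + carry 0 = 14: write 4, carry 1`, and the result is 734 after three steps.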

When it comes to multiplication, all models struggle, but Opus scores far ahead of GPT-4. It is the only AI that doesn’t score 0% on five-digit multiplications. Again, it tries to multiply the way humans do and uses shortcuts.

The same goes for subtraction. Even though Opus makes mistakes, it is still much better than GPT, especially on the most difficult operations.

Deciphering Ikea User Manuals

Understanding Ikea manuals isn’t always easy, but Claude 3 does a great job with its visual reasoning skills. That’s what data scientist @gabchuayz found.

He gave the AI an instruction manual and asked it to play the role of an assistant. Its task was to write out the steps as clearly as possible.

Claude understands Ikea’s diagrams very well and manages to transcribe them into simple, detailed written instructions.

For each step, it indicates the parts to be assembled and the tools to be used, describing the action to be performed. A feature that could prove to be very useful if you don’t understand the instructions for use!

Turning a Simple Idea Into a Real Business

HyperWriteAI CEO Matt Shumer has also created a prompt that lets Claude 3 turn an idea you have in mind into a functioning business that generates revenue.

Just state your idea, and the AI will give you step-by-step instructions to turn it into a business.

The first step is to ask it to play the role of an experienced entrepreneur, able to identify untapped opportunities and turn innovative ideas into a business.

The next step is to give it the task of analyzing a service or product idea, and providing advice on how to turn it into a successful startup.

To achieve this, Claude needs to conduct market research, identify risks and challenges, and provide advice on how to go from idea to revenue.

It then provides its response in a clear format, including strategies for acquiring early customers and driving growth. The AI also suggests possible monetization models.

This prompt can be a great way to get your business projects off the ground while reducing the effort required!

Detecting Security Vulnerabilities

Anthropic’s Director of Cybersecurity, Jason D. Clinton, is himself impressed by Claude 3 Opus’ ability to detect vulnerabilities autonomously.

As he explains, the AI is able to read source code and identify even the most complex vulnerabilities that could be exploited by cybercriminals.

In the demo he shares, all he had to do was ask Claude to play the role of a cyber defense assistant and look for a vulnerability.

From this simple prompt, Opus was able to identify a flaw in the Android mobile OS.

Yet this flaw was only discovered a month after the end of the model’s training, so it could not have been in its training data.

Its analysis is much more comprehensive and nuanced than that of existing code-scanning tools.

Now, cybersecurity professionals will be able to ask AI to analyze the code to find any issues before it’s too late.

However, Clinton also admits that he doesn’t yet know how hackers will, in turn, use artificial intelligence to find weaknesses or mount devastating cyberattacks.

As you can see from these examples, Claude 3 Opus is the new champion of chatbots. For more information, check out Magloire’s article on how this AI outperformed GPT-4 on the Chatbot Arena.

Unfortunately, at the moment, Claude 3 is not available in Europe. In any case, OpenAI plans to launch GPT-5 in 2024, and its counterattack is likely to hit hard… keep following us to stay up to date with the rapid evolution of AI!

Mohamed SAKHRI