OpenAI GPT-4o ranked as best AI model for writing Solidity smart contract code by IQ

SolidityBench Leaderboard Debuts
SolidityBench, created by IQ, is the first leaderboard designed to evaluate large language models (LLMs) on Solidity code generation. Now available on Hugging Face, it introduces two benchmarks, NaïveJudge and HumanEval for Solidity, to assess and rank AI models on generating smart contract code.

Refining AI Models for Solidity

Developed under IQ’s BrainDAO, SolidityBench supports the upcoming IQ Code suite, helping refine IQ’s EVMind LLMs and benchmark them against community-created models. IQ Code focuses on AI models that efficiently generate and audit smart contracts, addressing the growing demand for secure blockchain applications.

NaïveJudge: A New Benchmark

According to IQ, NaïveJudge tasks LLMs with creating smart contracts based on detailed specifications. These specifications are often derived from audited OpenZeppelin contracts, which serve as the standard for efficiency and accuracy. The generated code is then compared against a reference. Evaluators assess it on functional completeness, adherence to best Solidity practices, security, and optimization.

Advanced LLMs for Code Evaluation

The evaluation process uses advanced LLMs like OpenAI’s GPT-4 and Claude 3.5 Sonnet. These models assess the code based on strict criteria, including functionality, error management, syntax, and overall structure. In addition, optimization aspects such as gas efficiency and storage management are evaluated. Scores range from 0 to 100, offering a comprehensive assessment of the code’s quality.
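The article lists the judged criteria but not how the per-criterion scores are combined into the final 0–100 result. As a rough illustration only, a weighted aggregation along these lines is a common way to merge LLM-judge rubric scores; the criterion names and weights below are hypothetical, not taken from SolidityBench:

```python
def aggregate_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-criterion judge scores (each 0-100) into one weighted
    overall score. Weights are normalized so they need not sum to 1."""
    total_weight = sum(weights.values())
    return sum(scores[c] * weights[c] for c in weights) / total_weight


# Hypothetical example: criteria and weights are illustrative, not IQ's rubric.
overall = aggregate_score(
    {"functionality": 90.0, "security": 70.0, "gas_efficiency": 60.0},
    {"functionality": 0.5, "security": 0.3, "gas_efficiency": 0.2},
)
print(overall)  # 78.0
```

In practice, the judge models (GPT-4, Claude 3.5 Sonnet) would produce the per-criterion scores, and an aggregation step like this would yield the single leaderboard number.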

AI Models for Smart Contracts

Top Performers

OpenAI’s GPT-4o model performed the best, scoring 80.05 overall with a NaïveJudge score of 72.18 and an 80% pass@1 rate on HumanEval for Solidity. OpenAI’s newer reasoning models, o1-preview and o1-mini, followed with 77.61 and 75.08, respectively.

Competitive Models from Anthropic and XAI

Anthropic’s Claude 3.5 Sonnet and xAI’s Grok-2 delivered competitive results, with overall scores close to 74. Meanwhile, Nvidia’s Llama-3.1-Nemotron-70B ranked the lowest among the top 10, scoring 52.54.

HumanEval for Solidity

HumanEval for Solidity adapts OpenAI’s original Python-based benchmark to Solidity. It includes 25 tasks, each with tests compatible with Hardhat, a popular Ethereum development environment. These tasks ensure accurate testing and compilation of generated code. Moreover, metrics such as pass@1 and pass@3 offer valuable insights into precision and problem-solving abilities.
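The article does not specify how SolidityBench computes pass@1 and pass@3, but the original HumanEval paper defines a standard unbiased estimator for pass@k, sketched below for illustration; whether SolidityBench uses this exact estimator is an assumption:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the original HumanEval benchmark.

    n: total samples generated for a task
    c: number of samples that passed the task's tests
    k: sampling budget being scored (e.g., 1 or 3)
    """
    if n - c < k:
        # Too few failures to fill a k-sample draw without a pass.
        return 1.0
    # Probability that at least one of k random samples passes.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 10 samples per task, 3 passed -> pass@1 is the plain pass rate.
print(pass_at_k(10, 3, 1))  # 0.3
```

Under this metric, pass@1 reflects how often a model solves a task on the first attempt, while pass@3 credits it for solving the task within three tries.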

AI Models Driving Smart Contract Development

Advancing AI-Assisted Contract Creation

By introducing these benchmarks, SolidityBench advances AI-assisted smart contract generation. Consequently, it promotes the creation of more reliable AI models and provides developers with key insights into the current state of AI in Solidity development.

Setting New Standards for AI Models

SolidityBench not only refines the IQ Code suite by improving EVMind LLMs, but also sets new standards for AI-assisted smart contract development across the blockchain ecosystem. As the demand for secure smart contracts grows, SolidityBench plays a crucial role in meeting this need.

Exploring SolidityBench

Developers, researchers, and AI enthusiasts are invited to contribute to SolidityBench. It seeks to refine AI models, promote best practices, and drive the advancement of decentralized applications.

Visit Hugging Face to explore the SolidityBench leaderboard and benchmark Solidity generation models.

The post OpenAI GPT 4o ranked as best AI model for writing Solidity smart contract code by IQ appeared first on CryptoSlate.
