Date: 2026-03-01

OpenAI has introduced GPT-5.4, a new foundation model designed for professional and complex knowledge work. According to the company, GPT-5.4 is its “most capable and efficient frontier model” so far. The model is available in three variants: a standard version, a reasoning-focused version called GPT-5.4 Thinking, and a high-performance version called GPT-5.4 Pro.

The release reflects a wider trend in the AI industry where companies are building models aimed at professional workflows such as coding, financial analysis and research.

Three versions designed for different AI tasks

OpenAI said the new model family includes different versions optimised for different use cases.

GPT-5.4 (Standard): general-purpose model for most applications

GPT-5.4 Thinking: designed for multi-step reasoning and complex analysis

GPT-5.4 Pro: built for higher performance and professional workloads

According to the company, the new model can generate complex outputs such as financial models, presentation slides and legal analysis.

Brendan Foody, chief executive of Mercor, said in a statement that GPT-5.4 performed strongly on professional benchmarks.

“GPT-5.4 excels at creating long-horizon deliverables such as slide decks, financial models and legal analysis,” he said.

Context window of up to 1 million tokens

One of the major upgrades is the model’s context window. The API version of GPT-5.4 supports up to 1 million tokens, allowing the model to analyse extremely large documents, datasets or conversations in a single request.

Large context windows are important for enterprise users who need AI tools to process research papers, code repositories or long reports.

OpenAI also said the model uses tokens more efficiently than earlier versions. This means the system can complete similar tasks while consuming fewer tokens, which may reduce costs for developers.
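To put the 1 million token figure in practical terms, the sketch below estimates whether a batch of documents would fit in a single request. It is a rough illustration only: the four-characters-per-token ratio is an assumed rule of thumb for English text, not a figure from OpenAI, and the function names are hypothetical. A real tokenizer would be needed for exact counts.

```python
# Rough check of whether a set of texts fits in a 1-million-token context window.
# ASSUMPTION: about 4 characters per token is only a rule of thumb for English
# text; it is not an official value and varies by language and content.

CONTEXT_WINDOW_TOKENS = 1_000_000  # figure reported for the GPT-5.4 API
CHARS_PER_TOKEN = 4                # assumed heuristic


def estimate_tokens(text: str) -> int:
    """Estimate the token count of a text from its character length."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(texts: list[str], reserved_for_output: int = 0) -> bool:
    """Return True if the estimated total stays within the context window.

    reserved_for_output is a hypothetical allowance for the model's reply.
    """
    total = sum(estimate_tokens(t) for t in texts)
    return total + reserved_for_output <= CONTEXT_WINDOW_TOKENS
```

Under this assumption, a request could hold roughly 4 million characters of input, less whatever is set aside for the response.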

Improved benchmark results and reasoning ability

OpenAI reported strong benchmark results for GPT-5.4 across several tests used to evaluate AI systems.

According to the company:

GPT-5.4 achieved 83 per cent on OpenAI’s GDPval benchmark, which tests knowledge-work tasks.

It recorded top scores in OSWorld-Verified and WebArena Verified, benchmarks designed to evaluate computer-use abilities.

The model also performed well on Mercor’s APEX-Agents benchmark, which tests professional skills in fields such as law and finance.

These benchmarks aim to measure how well AI systems perform complex multi-step tasks rather than simple question-answering.

Fewer factual errors and hallucinations

Reducing hallucinations, in which AI produces incorrect or fabricated information, remains a major focus for AI developers.

OpenAI said GPT-5.4 shows measurable improvements compared with earlier versions.

According to the company’s internal evaluation:

The model is 33 per cent less likely to make errors in individual claims compared with GPT-5.2.

Overall responses are 18 per cent less likely to contain factual mistakes.

These improvements are important for professional environments where accuracy is critical.

New ‘Tool Search’ system for developers

OpenAI also introduced a new system called Tool Search for developers using the API version of GPT-5.4. Previously, models needed detailed system prompts describing every available tool. As the number of tools increased, this process consumed large numbers of tokens. The new Tool Search system allows the model to look up tool definitions when needed. According to OpenAI, this reduces token usage and makes requests faster and cheaper.
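The difference between the two approaches can be illustrated with a small sketch. This is not OpenAI’s Tool Search API, whose interface is not described here. The tool registry, the function names and the keyword matching are all hypothetical, chosen only to show why looking up definitions on demand sends less text than describing every tool up front.

```python
# Illustrative only: this is NOT OpenAI's Tool Search API. Every name here
# (TOOLS, upfront_prompt, search_tools) is hypothetical.

TOOLS = {
    "get_stock_price": "Return the latest price for a ticker symbol.",
    "convert_currency": "Convert an amount between two currencies.",
    "summarise_contract": "Summarise the key clauses of a legal contract.",
    "run_sql_query": "Run a read-only SQL query against the data warehouse.",
}


def upfront_prompt(tools: dict[str, str]) -> str:
    """Older pattern: describe every given tool in the prompt text."""
    return "\n".join(f"{name}: {desc}" for name, desc in tools.items())


def search_tools(tools: dict[str, str], query: str) -> dict[str, str]:
    """On-demand pattern: return only definitions matching a keyword query."""
    words = query.lower().split()
    return {
        name: desc
        for name, desc in tools.items()
        if any(w in name.lower() or w in desc.lower() for w in words)
    }


full = upfront_prompt(TOOLS)                              # all four tools
needed = upfront_prompt(search_tools(TOOLS, "currency"))  # matching tools only
print(len(full), len(needed))  # the on-demand text is the shorter of the two
```

With only four tools the saving is small, but the up-front text grows with every tool added, while the on-demand text grows only with the number of tools a task actually needs.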

Safety testing and chain-of-thought monitoring

The company also carried out new safety tests focused on chain-of-thought reasoning, which is the explanation models generate when solving complex problems. AI researchers have raised concerns that models might misrepresent their reasoning. OpenAI said its testing suggests deception is less likely in the Thinking version of GPT-5.4. According to the company, the results indicate that monitoring chain-of-thought reasoning remains an effective method for evaluating model behaviour.

What GPT-5.4 means for the AI industry