
A new study shows that today's reasoning models can pass the grueling Chartered Financial Analyst (CFA) exams. Gemini 3.0 Pro set a record with a score of 97.6 percent at Level I.


The Chartered Financial Analyst (CFA) certification is widely considered one of finance's toughest qualifications. The three-stage exam tests progressively complex skills, ranging from fundamental knowledge to application, analysis, and complex portfolio construction.

In 2023, the leading language models of the time could already answer some CFA exam questions, but performance was mixed: ChatGPT (GPT-3.5) failed Levels I and II, and GPT-4 passed Level I but failed Level II. GPT-4o, operating as a pure language model, eventually passed all three levels.

A new study from researchers at Columbia University, Rensselaer Polytechnic Institute, and the University of North Carolina shows that the current generation of reasoning models passes all three levels, sometimes with near-perfect scores.


Researchers put six reasoning models through 980 exam questions: three Level I exams with 540 multiple-choice questions, two Level II exams with 176 case-based questions, and three Level III exams with 264 questions, including open-answer formats. The result: Gemini 3.0 Pro, Gemini 2.5 Pro, GPT-5, Grok 4, Claude Opus 4.1, and DeepSeek-V3.1 passed every level based on established criteria.

Gemini and GPT-5 lead the pack

Gemini 3.0 Pro hit a record 97.6 percent on Level I, the foundational test consisting of independent multiple-choice questions. GPT-5 followed at 96.1 percent, with Gemini 2.5 Pro at 95.7 percent. Even the weakest model tested, DeepSeek-V3.1, scored 90.9 percent.

GPT-5 took the lead on Level II, which tests application and analysis through case studies, scoring 94.3 percent. Gemini 3.0 Pro reached 93.2 percent and Gemini 2.5 Pro 92.6 percent. The researchers noted that models achieved "nearly perfect results" here. Ethics, however, proved to be a stumbling block: the researchers reported relative error rates of 17 to 21 percent at Level II, even for the top-performing models.

On Level III—the most complex stage combining multiple-choice with open responses—Gemini 2.5 Pro performed best on multiple-choice questions at 86.4 percent. However, Gemini 3.0 Pro dominated the constructed responses with 92.0 percent, a significant jump from its predecessor's 82.8 percent.

Level                              Best model       Result
Level I (multiple choice)          Gemini 3.0 Pro   97.6%
Level II (multiple choice)         GPT-5            94.3%
Level III (multiple choice)        Gemini 2.5 Pro   86.4%
Level III (constructed responses)  Gemini 3.0 Pro   92.0%
Overall ranking                    Gemini 3.0 Pro   1st place

The study used mock CFA exams: official CFA Institute Practice Pack material for Levels I and II, and third-party AnalystPrep mock exams for Level III, the latter chosen to maintain comparability with previous research.


An o4-mini model automated the grading of open answers. The study notes this introduces measurement errors and a possible "verbosity bias" where detailed answers get higher scores. Consequently, the results serve as model-based approximations.
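The article does not reproduce the study's grading setup, but automated grading of this kind typically means prompting the judge model with the question, a reference solution, and the candidate answer. The sketch below, assuming the OpenAI Python SDK and an invented rubric and function name, shows the general pattern; it is an illustration, not the study's actual pipeline.

```python
from openai import OpenAI

client = OpenAI()

# Illustrative rubric; the study's actual grading prompt is not public here.
RUBRIC = (
    "You are grading a CFA Level III constructed-response answer. "
    "Score it from 0 to 100 against the reference solution. Judge substance, "
    "not length (to limit verbosity bias). Reply with the integer score only."
)

def grade_open_answer(question: str, reference: str, answer: str) -> int:
    """Ask the judge model (here o4-mini) for a 0-100 score."""
    completion = client.chat.completions.create(
        model="o4-mini",
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": (
                f"Question:\n{question}\n\n"
                f"Reference solution:\n{reference}\n\n"
                f"Candidate answer:\n{answer}"
            )},
        ],
    )
    return int(completion.choices[0].message.content.strip())
```

Even with an anti-verbosity instruction like the one above, the study's point stands: a second model's judgment is itself a source of measurement error.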

Pass thresholds were drawn from previous work: Level I requires at least 60 percent per topic and 70 percent overall. Level II needs at least 50 percent per topic and 60 percent overall. Level III requires an average of at least 63 percent across multiple-choice and constructed-response sections.
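To make these rules concrete, here is a minimal Python sketch of the pass check; the function and parameter names are illustrative, not taken from the study.

```python
def passes_level(level: str, overall: float,
                 topic_scores: tuple[float, ...] = (),
                 constructed: float | None = None) -> bool:
    """Pass rules as described in the study (all scores in percent).

    Level I:   every topic >= 60 and overall >= 70.
    Level II:  every topic >= 50 and overall >= 60.
    Level III: mean of the multiple-choice ('overall') and
               constructed-response scores >= 63.
    """
    if level == "I":
        return min(topic_scores) >= 60 and overall >= 70
    if level == "II":
        return min(topic_scores) >= 50 and overall >= 60
    if level == "III":
        if constructed is None:
            raise ValueError("Level III needs a constructed-response score")
        return (overall + constructed) / 2 >= 63
    raise ValueError(f"unknown level: {level!r}")

# Example with figures from the article: Gemini 2.5 Pro at Level III scored
# 86.4 percent on multiple choice and 82.8 percent on constructed responses,
# averaging 84.6 -- comfortably above the 63 percent threshold.
print(passes_level("III", 86.4, constructed=82.8))  # True
```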

Passing a test doesn't mean doing the job

The researchers say the results suggest "reasoning models surpass the expertise required of entry-level to mid-level financial analysts and may achieve senior-level financial analyst proficiency in the future." While LLMs had already mastered the "codified knowledge" of Levels I and II, the latest generation is now developing the complex synthesis skills required for Level III.

The usual caveats apply. Benchmarks—especially multiple-choice formats—only hint at performance and potential economic impact. Passing a test doesn't mean a model can handle the daily grind of a financial analyst, which involves client meetings, assessing market sentiment, and making decisions with incomplete information.


The study also notes that models still struggle most with ethical questions, which often require contextual understanding and judgment. Exams test isolated knowledge, not the ability to apply it in complex, changing real-world situations.

The researchers also can't rule out data contamination. Although they used current, paid materials, questions might have leaked into training data through paraphrased content in public datasets. This means there is a chance the models simply knew the answers rather than reasoning through them.

Still, the leap from "failed" to "almost perfect" in just two years highlights the rapid advance of AI in specialized domains. For the financial sector, the question, it seems, is no longer whether AI can master the material, but how to integrate that knowledge into actual workflows.

Summary
  • Six reasoning models were tested on the CFA exam, a notoriously difficult certification for finance professionals, and all passed all three levels.
  • Gemini 3.0 Pro led the pack on Level I with a score of 97.6 percent, while GPT-5 came out on top for Level II, scoring 94.3 percent.
  • Despite their strong overall performance, the models consistently stumbled on ethics questions, with error rates between 17 and 21 percent, and researchers note it's unclear whether some exam material appeared in the models' training data.