Published: April 16, 2026
Updated: April 18, 2026
Written by: Lusiana Liu

Corali-Lang Benchmark: Detailed Analysis

GPT-5.4 NANO

GPT-5.4 nano’s answers are somewhat funny: its errors go beyond contextual misunderstandings to include wrong patterns and numerous wrong word choices, which makes the answers look like guesswork. Only two questions were consistently answered correctly, about 10% of the data.

This result shows that although I initially thought the data I created was too easy, it is still quite challenging for a small model.

Categories: AI Benchmark

Tags: AI Platform, Claude Haiku 4.5, Claude Opus 4.6, Claude Sonnet 4.6, DeepSeek V3.2, Gemini 3 Flash Preview, Gemini 3.1 Flash-Lite Preview, Gemini 3.1 Pro Preview, Gemma 4 26B A4B, Gemma 4 31B, GLM-5, GPT-5.4, GPT-5.4 mini, GPT-5.4 nano, Kaggle, Kaggle Benchmark, Kaggle Competitions, Qwen 3 Next 80B Instruct, Qwen 3 Next 80B Thinking
