Corali-Lang Benchmark: Detailed Analysis

GEMINI 3 FLASH PREVIEW

On the first run, the notebook tested this model, and it answered all 20 questions correctly. That made me doubt my data: maybe it was simply too easy for LLMs. But after I ran the same data on 14 other models, it turned out that not all of them succeeded, and some did not even reach a score of 50% (10 out of 20).
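To make the scoring arithmetic concrete, here is a minimal sketch of how a raw count of correct answers maps to a percentage and to the 50% line mentioned above. The function names are hypothetical and not taken from the notebook; the only figure from the post is the total of 20 questions.

```python
TOTAL_QUESTIONS = 20  # from the post: the benchmark has 20 questions


def score_percent(correct: int, total: int = TOTAL_QUESTIONS) -> float:
    """Return the share of correct answers as a percentage."""
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return 100.0 * correct / total


def reaches_half(correct: int, total: int = TOTAL_QUESTIONS) -> bool:
    """True when the score is at least 50% (hypothetical helper)."""
    return score_percent(correct, total) >= 50.0


print(score_percent(20))  # 100.0 (a perfect run)
print(score_percent(10))  # 50.0 (exactly on the 50% line)
```

Under this sketch, a model with 9 correct answers scores 45% and falls below the line.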

From these results, we can conclude that Gemini 3 Flash Preview
– can understand long contexts and is unfazed by the misleading descriptions of Coral’s fashion,
– can learn a new language, Corali, without being fixated on English,
– can absorb data from narratives, not just from explicit rules,
– can read Corali language patterns well and apply them to other English words,
– can understand context well, knowing who is speaking and which language is being used according to the narrative data.

For story questions 4 and 5, which have multiple answer choices, the model’s results were impressive. On question 5, the model consistently answered “Checkad,” which is the most natural and correct answer according to Corali language rules. On question 4, its answers were not exactly the same from trial to trial, but each one was still correct within the context, with no confusion about who was speaking, who was tired, and who was fighting. This demonstrates the model’s high capability: the answers stayed consistent across five trials, which points to skill rather than luck.
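The "consistent, not luck" point can be checked with a small sketch like the one below. It assumes each question's five trial answers are collected into a list of strings. The `consistency` helper is hypothetical, and the only data used is what the post reports for question 5.

```python
from collections import Counter


def consistency(answers: list[str]) -> tuple[str, float]:
    """Return the most common answer and the fraction of trials that gave it."""
    if not answers:
        raise ValueError("need at least one trial")
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer, count / len(answers)


# Question 5: the post reports "Checkad" on every one of the five trials.
q5_trials = ["Checkad"] * 5
print(consistency(q5_trials))  # ('Checkad', 1.0)
```

This is exact string matching, so it suits question 5. For question 4, where the wording differed between trials but the meaning was still correct, exact matching would understate consistency, and a manual check of each answer is the safer approach.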

Hello, I’m Lusiana!

Welcome to my learning adventure!

I love learning new things, and right now I’m interested in artificial intelligence (AI).

The Coralab is my “imaginary” laboratory. I’ll be posting here about the things I learn about artificial intelligence (AI).

