General Knowledge

General and multidisciplinary knowledge

5 tasks · 18 models tested · 90 results

Unexpected connections

text
anthropic claude-haiku-4-5-20251001
3.0 s
Tokens 220
Source code 682 B
Time 3.0 s
anthropic claude-opus-4-6
9.2 s
Tokens 343
Source code 1.1 KB
Time 9.2 s
anthropic claude-opus-4-7
6.4 s
Tokens 351
Source code 601 B
Time 6.4 s
anthropic claude-sonnet-4-6
8.4/10 5.2 s
Tokens 227
Source code 608 B
Time 5.2 s
Matania Judgment
Accuracy
8
Depth
6
Relevance
8
Clarity
9
Fidelity
9
Overall
8.38
Review
The content is factual overall and well formulated, although the 1926 date for the Einstein-Szilárd patent is slightly imprecise (the patent was filed in 1926 but granted in 1930). The markdown formatting is correct, using the `##` header and a numbered list, and the length is on target (~90 words, under the 120-word limit). The chosen connections are truly surprising and relevant, though the depth remains limited by the imposed constraint of conciseness.
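The Overall figures shown on this page appear consistent with a weighted mean in which Fidelity counts four times and the other four criteria count once each, rounded half up. The page does not document this weighting; it is an assumption inferred from the published scores, and the names `WEIGHTS`, `overall`, and `headline` below are hypothetical. A minimal sketch under that assumption:

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumed weights: inferred from the scores shown on the page, not documented by it.
WEIGHTS = {"accuracy": 1, "depth": 1, "relevance": 1, "clarity": 1, "fidelity": 4}


def _exact(scores: dict[str, int]) -> Decimal:
    """Exact weighted mean of the five criteria (no rounding)."""
    total = sum(weight * scores[name] for name, weight in WEIGHTS.items())
    return Decimal(total) / Decimal(sum(WEIGHTS.values()))


def overall(scores: dict[str, int]) -> Decimal:
    """Two-decimal value, as in the 'Overall' row (rounded half up)."""
    return _exact(scores).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def headline(scores: dict[str, int]) -> Decimal:
    """One-decimal value, as in the 'x.x/10' badge (rounded half up)."""
    return _exact(scores).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Criteria taken from the claude-sonnet-4-6 card above (8, 6, 8, 9, 9).
sonnet = {"accuracy": 8, "depth": 6, "relevance": 8, "clarity": 9, "fidelity": 9}
print(overall(sonnet), headline(sonnet))
```

Half-up rounding is used rather than Python's built-in `round`, because the latter rounds ties to even and would not reproduce values such as 4.63 from 4.625.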
cohere command-r-08-2024
4.6/10 17.4 s
Tokens 166
Source code 381 B
Time 17.4 s
Matania Judgment
Accuracy
5
Depth
3
Relevance
6
Clarity
7
Fidelity
4
Overall
4.63
Review
Format adherence is only partially met: headers use `##` but include redundant numbering, and each section features a numbered list starting from `1.` instead of a continuous list (1, 2, 3). Factual accuracy is questionable: no reliable source confirms that Napoleon kept the Mona Lisa in his bedroom (this is an anecdote sometimes cited but not universally verified), and the claim that 'All You Need Is Love' was inspired by a WWI poem seems unfounded. Depth is very limited, providing only a single short sentence per connection without context; while this was requested, it remains superficial. The length (~60 words) is well below the target of ~120 words.
google gemini-flash-latest
8.1/10 4.5 s
Tokens 158
Source code 468 B
Time 4.5 s
Matania Judgment
Accuracy
7
Depth
5
Relevance
8
Clarity
9
Fidelity
9
Overall
8.13
Review
The format is closely adhered to: a ## title followed by a numbered list of three short sentences, concise and well within the ~120-word target. Factual accuracy is mostly solid (Nintendo 1889 ✓, Great Fire 1666 ✓, Lincoln Secret Service ✓), though the connection between the Great Fire and the plague is historically debated and somewhat oversimplified (the plague was already in decline). Depth is naturally limited by the concise format, but each connection is genuinely surprising and relevant. Clarity is excellent, with precise, readable sentences.
google gemini-flash-lite-latest
7.0/10 3.2 s
Tokens 213
Source code 807 B
Time 3.2 s
Matania Judgment
Accuracy
4
Depth
5
Relevance
7
Clarity
8
Fidelity
8
Overall
7
Review
The format is well-maintained (## title and numbered list), and the length is appropriate (~120 words). However, factual accuracy is an issue: the piano/guillotine connection is an unverifiable and likely apocryphal claim, and above all, the Titanic/Federal Reserve connection is a well-known conspiracy theory that has been debunked (the individuals cited were not all on board or were not key opponents). The Napoleon/can opener connection is essentially correct but slightly oversimplified. The clarity is good and the connections are indeed surprising, but factual accuracy is compromised by dubious or false claims.
kimi moonshot-v1-128k
4.6 s
Tokens 259
Source code 755 B
Time 4.6 s
mistral mistral-large-latest
5.4/10 3.3 s
Tokens 188
Source code 469 B
Time 3.3 s
Matania Judgment
Accuracy
8
Depth
5
Relevance
7
Clarity
7
Fidelity
4
Overall
5.38
Review
The facts are generally accurate (Eiffel/Statue of Liberty, Coca-Cola WWII, Great Emu War 1932), although the Eiffel/Statue of Liberty connection is fairly well-known and therefore not particularly "surprising." The depth remains superficial, consisting of short, nuance-free sentences. Major criticism: the text is truncated. The third entry is incomplete ('forcing a' followed by nothing), which constitutes a serious failure in prompt fidelity; the content delivered is unfinished, failing to respect the implicit constraint of completeness.
mistral mistral-small-latest
8.4/10 1.9 s
Tokens 204
Source code 535 B
Time 1.9 s
Matania Judgment
Accuracy
8
Depth
6
Relevance
8
Clarity
9
Fidelity
9
Overall
8.38
Review
The Markdown format is well-maintained, featuring a `##` heading and a numbered list, and the length is appropriate (~80 words, slightly below the requested 120). The connections are relevant and factually correct, although the link between coffee and the French Revolution is a somewhat cliché connection and less surprising than promised. Depth remains limited by the constraint of conciseness, but each sentence provides sufficient minimal context.
mistral mistral-tiny-latest
5.6/10 3.7 s
Tokens 277
Source code 825 B
Time 3.7 s
Matania Judgment
Accuracy
4
Depth
4
Relevance
6
Clarity
7
Fidelity
6
Overall
5.63
Review
Several factual errors penalize the accuracy score: the French Revolution began in 1789, which is 10 years before the discovery of the Rosetta Stone in 1799 (not "two years after"); the HMS Oceanic did not become the Olympic, as they are distinct ships; and the link between the Great Fire of London and fire hydrants is overly simplified and partially inaccurate. Adherence to the format is correct (H2 title, numbered list), but the text significantly exceeds the requested ~120 words (approximately 170 words), and the connections lack the truly "surprising" character expected by the prompt.
openai gpt-4o-mini
9.3/10 3.0 s
Tokens 214
Source code 573 B
Time 3.0 s
Matania Judgment
Accuracy
9
Depth
6
Relevance
9
Clarity
10
Fidelity
10
Overall
9.25
Review
The model perfectly adheres to all requested formatting, structural, and conciseness constraints. The connections are relevant and historically consistent, although the depth remains superficial due to the length constraints imposed by the prompt. The absence of a preamble and the correct use of Markdown ensure exemplary fidelity.
openai gpt-5.4-nano
9.1/10 2.7 s
Tokens 225
Source code 616 B
Time 2.7 s
Matania Judgment
Accuracy
8
Depth
6
Relevance
9
Clarity
10
Fidelity
10
Overall
9.13
Review
The model perfectly adheres to all formatting constraints (Markdown, no preamble, list structure) and length requirements. The connections are relevant and well-formulated, although the historical depth remains superficial to respect the conciseness constraint. Accuracy is generally good, even if the links are presented in a very simplified manner.
openai gpt-5.5
4.9 s
Tokens 185
Source code 457 B
Time 4.9 s
openai gpt-5.5-pro
129.5 s
Tokens 178
Source code 430 B
Time 129.5 s
productivia matania-latest
1.5 s
Tokens 240
Source code 678 B
Time 1.5 s
xai grok-4-1-fast-non-reasoning
2.6 s
Tokens 223
Source code 608 B
Time 2.6 s
xai grok-4-1-fast-reasoning
22.9 s
Tokens 172
Source code 404 B
Time 22.9 s

Comparative chronology

text
anthropic claude-haiku-4-5-20251001
5.2 s
Tokens 344
Source code 976 B
Time 5.2 s
anthropic claude-opus-4-6
9.0 s
Tokens 402
Source code 1.1 KB
Time 9.0 s
anthropic claude-opus-4-7
8.6 s
Tokens 560
Source code 992 B
Time 8.6 s
anthropic claude-sonnet-4-6
8.5/10 6.3 s
Tokens 349
Source code 935 B
Time 6.3 s
Matania Judgment
Accuracy
8
Depth
7
Relevance
8
Clarity
9
Fidelity
9
Overall
8.5
Review
The table is well-structured, adheres to the requested markdown format, contains 5 rows (within the permitted limit), covers the 4 required civilizations, and includes an appropriate title. The facts are generally accurate, although some minor inaccuracies exist (Charlemagne was not technically Emperor of the Holy Roman Empire, and the 1096 date for the First Crusade in the '1100' row is slightly anachronistic relative to the row label). The length is concise (respecting the ~120-word limit), and each cell provides a relevant and clear example for the period in question.
cohere command-r-08-2024
8.0/10 21.6 s
Tokens 206
Source code 521 B
Time 21.6 s
Matania Judgment
Accuracy
7
Depth
5
Relevance
8
Clarity
8
Fidelity
9
Overall
8
Review
The table is well-structured and adheres to the requested format, including the Markdown title, correct columns, and 4 rows (within the imposed limit). The entries are generally accurate, but some inaccuracies remain: labeling the 'Golden Age of Islam' at 1200 is debatable, as this century marks rather its decline, and the 'Fatimid Caliphate' at 900 is slightly premature (founded in 909). The depth remains very superficial, with each cell offering only a name without context or connections between civilizations, although the synthetic format partially justifies this. The word count (~60 words excluding the table) is well below the requested ~120 words.
google gemini-flash-latest
8.1/10 7.0 s
Tokens 330
Source code 953 B
Time 7.0 s
Matania Judgment
Accuracy
7
Depth
6
Relevance
8
Clarity
8
Fidelity
9
Overall
8.13
Review
The Markdown structure is respected, featuring an H2 heading and a well-formed table with 4 columns and 4 rows (within the requested limit of 4-5). A few factual inaccuracies are worth noting: the Magna Carta dates to 1215, not 1200, and describing 800 CE as the "peak" of the Abbasid Golden Age is approximate (Harun al-Rashid reigned from 786–809, but the intellectual zenith occurred rather under al-Ma'mun, ~813–833). Depth remains limited by the imposed conciseness, but the chosen examples are relevant and well-distributed across the four civilizations. The text is slightly below the requested ~120 words, which is acceptable.
google gemini-flash-lite-latest
8.5/10 2.5 s
Tokens 341
Source code 1.1 KB
Time 2.5 s
Matania Judgment
Accuracy
8
Depth
7
Relevance
8
Clarity
9
Fidelity
9
Overall
8.5
Review
The response faithfully follows the prompt: it includes the required markdown table with 4 rows (within the 4–5 limit), a short `## Title`, and stays concise (~120 words). Factual accuracy is generally solid—Carolingian Renaissance at 800, Tang peak, Abbasid Golden Age, Maya decline—though placing '1200' alongside 'Aztec precursors' is slightly anachronistic (the Aztec/Mexica emergence is closer to the 1300s). The depth is reasonable for such a compact format, with the brief paragraph adding useful context, though nuance is necessarily limited by the word constraint.
kimi moonshot-v1-128k
6.7 s
Tokens 282
Source code 824 B
Time 6.7 s
mistral mistral-large-latest
7.3/10 6.8 s
Tokens 359
Source code 1.1 KB
Time 6.8 s
Matania Judgment
Accuracy
7
Depth
7
Relevance
8
Clarity
8
Fidelity
7
Overall
7.25
Review
The table is well-structured and generally adheres to the requested format, but a few factual inaccuracies detract from the accuracy score: the Magna Carta dates to 1215 (outside the 800-1200 period) and Tenochtitlan was founded in 1325 (well after 1200 CE), which introduces anachronistic data into the '1200 CE' row. Fidelity is penalized because the last line is incomplete (missing the markdown table closure) and the title is slightly more verbose than requested, although functional. The content remains relevant and effectively covers the four civilizations with good representative examples.
mistral mistral-small-latest
8.1/10 2.8 s
Tokens 302
Source code 906 B
Time 2.8 s
Matania Judgment
Accuracy
7
Depth
6
Relevance
8
Clarity
8
Fidelity
9
Overall
8.13
Review
The format is strictly followed: Markdown title, a 5-column by 5-row table, and approximately 120 words. There are a few factual inaccuracies: the Crusades begin in 1096, not 1100 CE; Saladin and the Ayyubid dynasty are more accurately associated with 1171+; and the Southern Song was not defeated in 1200 but in 1279 (noted in parentheses, which is ambiguous for a column intended to represent 1200). While depth remains limited by the required conciseness, the chosen examples are relevant and effectively cover the four civilizations throughout the period.
mistral mistral-tiny-latest
7.3/10 6.2 s
Tokens 357
Source code 1.1 KB
Time 6.2 s
Matania Judgment
Accuracy
5
Depth
6
Relevance
7
Clarity
8
Fidelity
8
Overall
7.25
Review
The markdown structure is respected, including a title and a table with 5 columns and 5 rows, which satisfies the formatting requirements. However, several factual errors are notable: the First Crusade took place from 1095–1099 (not 1100), the Norman conquest of England was in 1066 (not 900), Marco Polo did not visit Song China around 1200, and Teotihuacan had declined well before 900 CE. The depth is appropriate for a table format, but the decadal approximations oversimplify certain events. The content slightly exceeds the requested ~120 words.
openai gpt-4o-mini
6.9/10 4.3 s
Tokens 365
Source code 1.1 KB
Time 4.3 s
Matania Judgment
Accuracy
7
Depth
6
Relevance
8
Clarity
10
Fidelity
6
Overall
6.88
Review
The model adheres well to the table format and the title, but fails on the length constraint: the text significantly exceeds the requested 120 words. Historically, the inclusion of the Inca Empire around 1100 is anachronistic (their peak occurred much later), which impacts factual accuracy.
openai gpt-5.4-nano
9.1/10 4.1 s
Tokens 453
Source code 1.5 KB
Time 4.1 s
Matania Judgment
Accuracy
8
Depth
7
Relevance
9
Clarity
9
Fidelity
10
Overall
9.13
Review
The model perfectly adhered to all formatting constraints (Markdown, columns, title, row count) and the word limit. The historical accuracy is solid, although the mention of Teotihuacan (which was already in decline well before 1000) is a bit awkward; overall, the response is highly efficient and relevant.
openai gpt-5.5
8.7 s
Tokens 345
Source code 1.1 KB
Time 8.7 s
openai gpt-5.5-pro
75.1 s
Tokens 299
Source code 893 B
Time 75.1 s
productivia matania-latest
3.6 s
Tokens 280
Source code 817 B
Time 3.6 s
xai grok-4-1-fast-non-reasoning
6.1 s
Tokens 387
Source code 1.2 KB
Time 6.1 s
xai grok-4-1-fast-reasoning
11.3 s
Tokens 213
Source code 551 B
Time 11.3 s

Etymology

text
anthropic claude-haiku-4-5-20251001
3.5 s
Tokens 264
Source code 765 B
Time 3.5 s
anthropic claude-opus-4-6
8.1 s
Tokens 299
Source code 806 B
Time 8.1 s
anthropic claude-opus-4-7
7.9 s
Tokens 442
Source code 812 B
Time 7.9 s
anthropic claude-sonnet-4-6
8.6/10 6.0 s
Tokens 273
Source code 788 B
Time 6.0 s
Matania Judgment
Accuracy
8
Depth
7
Relevance
9
Clarity
9
Fidelity
9
Overall
8.63
Review
The content is overall accurate and well-structured: the etymologies for 'maison' and 'eau' are solid, but the one for 'livre' (livre = book) deserves a nuance—the meaning 'livre = book' comes from *liber*, which is distinct from *libra* (weight/currency), and mixing them without sufficient clarity can lead to confusion. The Markdown formatting is perfectly followed (title `##`, three bullet points, `**word**` — root — evolution format), and the length (~110 words) complies with the instruction of ~120 words. The clarity and relevance of the chosen examples are good, featuring a logical progression for each word.
cohere command-r-08-2024
7.5/10 20.1 s
Tokens 207
Source code 535 B
Time 20.1 s
Matania Judgment
Accuracy
7
Depth
5
Relevance
8
Clarity
8
Fidelity
8
Overall
7.5
Review
The Markdown format is generally respected (H2 heading, three bullet points, bold words, root—evolution structure), although the requested format is slightly adapted with an em dash and unnecessary italics. Accuracy is acceptable: 'bonjour' and 'merci' are correctly traced, but 'diē' is an approximate Latin form (the correct genitive/ablative would be 'diei' or 'dies'). The depth remains superficial, with no mention of intermediate stages or notable semantic nuances, and the text falls within the requested length range (~80 words, slightly below the ~120-word target).
google gemini-flash-latest
8.8/10 5.3 s
Tokens 270
Source code 861 B
Time 5.3 s
Matania Judgment
Accuracy
9
Depth
7
Relevance
9
Clarity
9
Fidelity
9
Overall
8.75
Review
The content is factually sound: the etymologies for *ami* (Latin *amicus*/*amare*), *travail* (*tripalium*), and *journal* (*diurnalis*/*dies*) are correct and well-traced. The Markdown formatting is respected (## title + 3 bullet points with **word** — root — evolution), and the length (~120 words) is well-maintained. The depth is good but slightly superficial regarding certain semantic shifts; the intermediate stages could be further nuanced.
google gemini-flash-lite-latest
8.5/10 2.4 s
Tokens 249
Source code 818 B
Time 2.4 s
Matania Judgment
Accuracy
8
Depth
7
Relevance
8
Clarity
9
Fidelity
9
Overall
8.5
Review
The response faithfully follows the requested structure (## Title + 3 bullet points with **word** — root — evolution) and adheres to the markdown format. The etymologies presented are generally accurate and well-explained, including interesting nuances (e.g., *bureau* transitioning from fabric to table to office). The length (~130 words) is slightly above the requested ~120 words but remains reasonable. The depth is appropriate without being exhaustive, which suits the requested concise format.
kimi moonshot-v1-128k
7.0 s
Tokens 364
Source code 1.1 KB
Time 7.0 s
mistral mistral-large-latest
8.6/10 6.0 s
Tokens 281
Source code 830 B
Time 6.0 s
Matania Judgment
Accuracy
8
Depth
7
Relevance
9
Clarity
9
Fidelity
9
Overall
8.63
Review
The content is factually sound, with well-documented dates and evolutions for the three chosen words, although a few minor details could be clarified (e.g., the exact dating of certain semantic shifts). The Markdown structure is perfectly followed: `## Title` header, three bullet points using `**word**` — root — evolution, and the length (~120 words) is well-maintained. The examples are relevant and representative of common French vocabulary, showing a good etymological progression.
mistral mistral-small-latest
8.5/10 2.7 s
Tokens 234
Source code 641 B
Time 2.7 s
Matania Judgment
Accuracy
8
Depth
7
Relevance
9
Clarity
8
Fidelity
9
Overall
8.5
Review
The content is factual and well-structured, with generally accurate etymologies and only minor approximations: the root 'restaurare' is correct, though the dating of the sense 'eatery' to the 18th century could be more precise. The markdown formatting is strictly followed, using the `## Title` header and the three bullet points in the `**word**` — root — evolution format, and the length is close to the requested 120 words (~110 words). The depth is reasonable for a concise format, with relevant usage contexts mentioned.
mistral mistral-tiny-latest
6.8/10 3.7 s
Tokens 219
Source code 582 B
Time 3.7 s
Matania Judgment
Accuracy
7
Depth
4
Relevance
9
Clarity
10
Fidelity
6
Overall
6.75
Review
The model adheres well to Markdown structure and clarity, but fails on the specific formatting constraint requested: it did not follow the `**word** — root — evolution` schema for the bullet points, opting instead to write complete sentences. Furthermore, the etymology of 'Merci' is inaccurate (confusing *merci* with *mérite*) and the depth is superficial, merely redefining the word rather than tracing a true etymological evolution.
openai gpt-4o-mini
7.8/10 6.0 s
Tokens 243
Source code 678 B
Time 6.0 s
Matania Judgment
Accuracy
9
Depth
5
Relevance
10
Clarity
10
Fidelity
7
Overall
7.75
Review
The etymological accuracy is good, but the content lacks depth, merely stating that the word evolved into its current meaning without providing historical nuances. The Markdown formatting is correct, but the conciseness constraint (~120 words) is significantly exceeded (approximately 145 words), which impacts the fidelity score.
openai gpt-5.4-nano
3.5 s
Tokens 247
Source code 695 B
Time 3.5 s
openai gpt-5.5
6.5 s
Tokens 249
Source code 702 B
Time 6.5 s
openai gpt-5.5-pro
78.9 s
Tokens 228
Source code 616 B
Time 78.9 s
productivia matania-latest
1.8 s
Tokens 274
Source code 801 B
Time 1.8 s
xai grok-4-1-fast-non-reasoning
4.7 s
Tokens 207
Source code 535 B
Time 4.7 s
xai grok-4-1-fast-reasoning
4.7 s
Tokens 169
Source code 382 B
Time 4.7 s

Geopolitics

text
anthropic claude-haiku-4-5-20251001
4.0 s
Tokens 330
Source code 1.1 KB
Time 4.0 s
anthropic claude-opus-4-6
8.0 s
Tokens 348
Source code 1.0 KB
Time 8.0 s
anthropic claude-opus-4-7
8.5 s
Tokens 525
Source code 1021 B
Time 8.5 s
anthropic claude-sonnet-4-6
9.4/10 6.7 s
Tokens 318
Source code 998 B
Time 6.7 s
Matania Judgment
Accuracy
9
Depth
8
Relevance
9
Clarity
9
Fidelity
10
Overall
9.38
Review
The strategic note is factually sound (20% of global oil trade, presence of the Fifth Fleet in Bahrain, 33 km width) and well-structured according to the requested format. The depth is appropriate for ~150 words, although additional nuances (recent tensions, regional security agreements) would have enriched the analysis. The Markdown format is perfectly followed with `##` and `###` headers, key players are well-identified, and the length complies with the constraint (~150 words).
cohere command-r-08-2024
8.3/10 8.8 s
Tokens 299
Source code 869 B
Time 8.8 s
Matania Judgment
Accuracy
8
Depth
6
Relevance
8
Clarity
8
Fidelity
9
Overall
8.25
Review
The text faithfully adheres to the requested format (## title, ### Position/Stakes/Key Actors sections) and remains within the ~150-word limit. The facts are generally accurate (the 20% figure is slightly underestimated—20-25% of global oil is often cited), but the depth remains superficial: the United Arab Emirates or Oman, the tension with the Houthis, or navigation agreements could have been mentioned. The clarity is good and the chosen examples are relevant.
google gemini-flash-latest
8.9/10 5.4 s
Tokens 332
Source code 1.2 KB
Time 5.4 s
Matania Judgment
Accuracy
9
Depth
8
Relevance
9
Clarity
9
Fidelity
9
Overall
8.88
Review
The strategic note is factually sound: the figures (21 million barrels, 20% of global consumption, 2-mile lanes) are generally correct, and the key players are well-identified. The Markdown structure strictly adheres to the requested format (## title, ### sections), and the length (~180 words) is slightly above the 150-word target but remains within an acceptable margin. The depth is good for a concise format, mentioning bypass pipelines and Chinese energy dependence, although additional nuances (recent tensions, the role of Oman) would have enriched the analysis.
google gemini-flash-lite-latest
8.8/10 2.4 s
Tokens 312
Source code 1.2 KB
Time 2.4 s
Matania Judgment
Accuracy
9
Depth
7
Relevance
9
Clarity
9
Fidelity
9
Overall
8.75
Review
The content is factually sound: the location, key players (Iran/IRGC, United States, GCC, China/India), and energy stakes are correctly presented. The figure of "20–30%" is a reasonable estimate, though slightly high according to some sources (which generally cite ~20%). The Markdown structure perfectly adheres to the requested `##` and `###` headings, and the length (~150 words) is well-maintained. The depth remains limited—a few additional nuances (recent tensions, mines, drones) would have enriched the analysis, but the concise format justifies this.
kimi moonshot-v1-128k
5.2 s
Tokens 385
Source code 1.2 KB
Time 5.2 s
mistral mistral-large-latest
7.0/10 7.5 s
Tokens 382
Source code 1.2 KB
Time 7.5 s
Matania Judgment
Accuracy
8
Depth
7
Relevance
9
Clarity
8
Fidelity
6
Overall
7
Review
The content is factually sound (21 nautical miles, 5th Fleet, 2019 seizures, alternative pipelines) and well-structured with relevant examples. However, prompt adherence is penalized for two reasons: the text is truncated (the final 'China & India' entry is incomplete), and the length significantly exceeds the requested ~150 words (200+ words), representing a notable deviation from the explicit constraints. The depth is adequate but remains somewhat superficial given the bullet-point format.
mistral mistral-small-latest
8.1/10 2.8 s
Tokens 323
Source code 964 B
Time 2.8 s
Matania Judgment
Accuracy
8
Depth
7
Relevance
9
Clarity
9
Fidelity
8
Overall
8.13
Review
The content is factually solid with relevant figures (20% of global oil, 34 km wide, the US Navy's 5th Fleet), although some data points could benefit from more nuance (the figure of a 50%+ price increase is speculative). The Markdown structure generally follows the requirements with the requested sections, but the excessive use of bolding (**) in section headers diverges slightly from the requested simple format (### Section). Conciseness is approximately maintained (~150 words), providing dense, relevant content that offers a balanced overview of the economic and strategic stakes as well as the key players.
mistral mistral-tiny-latest
9.6/10 3.1 s
Tokens 314
Source code 930 B
Time 3.1 s
Matania Judgment
Accuracy
10
Depth
7
Relevance
10
Clarity
10
Fidelity
10
Overall
9.63
Review
The model perfectly adhered to all formatting (Markdown), structural, and length constraints. The factual information is accurate and relevant for a concise strategic note. Although the depth is limited by the brevity constraint, the content is highly clear and meets the prompt's expectations exactly.
openai gpt-4o-mini
9.6/10 3.3 s
Tokens 309
Source code 909 B
Time 3.3 s
Matania Judgment
Accuracy
10
Depth
7
Relevance
10
Clarity
10
Fidelity
10
Overall
9.63
Review
The model perfectly adhered to all constraints: the Markdown formatting is correct, the structure was strictly followed, and the length is consistent with the request (~145 words). The geopolitical information is accurate and relevant, although the analysis remains somewhat superficial due to the conciseness constraint.
openai gpt-5.4-nano
3.8 s
Tokens 389
Source code 1.2 KB
Time 3.8 s
openai gpt-5.5
7.0 s
Tokens 357
Source code 1.1 KB
Time 7.0 s
openai gpt-5.5-pro
41.5 s
Tokens 318
Source code 944 B
Time 41.5 s
productivia matania-latest
2.1 s
Tokens 348
Source code 1.0 KB
Time 2.1 s
xai grok-4-1-fast-non-reasoning
3.1 s
Tokens 252
Source code 682 B
Time 3.1 s
xai grok-4-1-fast-reasoning
11.9 s
Tokens 246
Source code 658 B
Time 11.9 s

Comparative mythology

text
anthropic claude-haiku-4-5-20251001
4.8 s
Tokens 350
Source code 1.1 KB
Time 4.8 s
anthropic claude-opus-4-6
9.4 s
Tokens 360
Source code 1.1 KB
Time 9.4 s
anthropic claude-opus-4-7
10.1 s
Tokens 569
Source code 1.1 KB
Time 10.1 s
anthropic claude-sonnet-4-6
9.1/10 7.7 s
Tokens 348
Source code 1.0 KB
Time 7.7 s
Matania Judgment
Accuracy
8
Depth
7
Relevance
9
Clarity
9
Fidelity
10
Overall
9.13
Review
The text perfectly adheres to the requested format (## title, bullet points, ### Parallels) and stays within the ~150-word limit. The information is generally accurate—Utnapishtim, Enlil, Ea, Noah, Popol Vuh—though the noise/overpopulation motif for Enlil deserves the nuance that it is primarily present in Atrahasis rather than Gilgamesh *stricto sensu*. The depth is good for a short format, providing relevant details for each myth, though it remains inevitably superficial. The Parallels section is concise and relevant.
cohere command-r-08-2024
8.3/10 102.6 s
Tokens 362
Source code 1.1 KB
Time 102.6 s
Matania Judgment
Accuracy
7
Depth
6
Relevance
9
Clarity
8
Fidelity
9
Overall
8.25
Review
The content is generally well-structured and adheres to the requested format (## title, bulleted list by culture, ### Parallels section), with an appropriate length (~160 words). However, there is a slight inaccuracy in the retelling of the Popol Vuh: surviving inside a hollow tree is not a central element of the traditional Mayan narrative, and Xpiyacoc/Xmucane do not play that specific role of helpful observers in this context, which penalizes the accuracy score. The depth is adequate but remains superficial given the length constraint. The parallels are relevant but could have mentioned differentiating nuances.
google gemini-flash-latest
9.4/10 6.8 s
Tokens 363
Source code 1.3 KB
Time 6.8 s
Matania Judgment
Accuracy
9
Depth
8
Relevance
9
Clarity
9
Fidelity
10
Overall
9.38
Review
The text is factually sound with precise details (cubic boat, dove/olive branch, Nata and Nena, transformation into dogs), and the markdown structure is perfectly maintained (## title, bulleted lists per culture, ### Parallels). The depth is good for ~150 words, although theological nuances could be further developed. The length (~175 words) slightly exceeds the target but remains within an acceptable margin.
google gemini-flash-lite-latest
8.6/10 2.8 s
Tokens 384
Source code 1.5 KB
Time 2.8 s
Matania Judgment
Accuracy
8
Depth
7
Relevance
9
Clarity
9
Fidelity
9
Overall
8.63
Review
The content is factually sound with several precise details (Enlil, Ea, Utnapishtim, Tata, and Nene), although the description of the square boat in Gilgamesh is a simplification and some Aztec details warrant nuance. The Markdown structure follows the prompt almost perfectly (## title, bulleted list, ### Parallels), and the length (~180 words) is slightly above the requested ~150 words but remains acceptable. The Parallels section is relevant and well-written, providing a clear synthesis of common themes.
kimi moonshot-v1-128k
8.2 s
Tokens 409
Source code 1.3 KB
Time 8.2 s
mistral mistral-large-latest
8.0/10 6.8 s
Tokens 407
Source code 1.3 KB
Time 6.8 s
Matania Judgment
Accuracy
7
Depth
7
Relevance
9
Clarity
9
Fidelity
8
Overall
8
Review
The content is generally accurate and well-structured, but the Mesoamerican section conflates the narratives of the Popol Vuh (the successive creations) with the Hero Twins in a somewhat confusing manner—the Hero Twins do not survive a flood in the strict sense. The Markdown formatting is correctly followed (## titles, bullet points, ### Parallels), but the length slightly exceeds 150 words (~180 words), representing a minor deviation from the prompt's explicit constraints. The Parallels section is relevant and concise, although it could have included more cultural nuances.
mistral mistral-small-latest
8.4/10 3.1 s
Tokens 349
Source code 1.1 KB
Time 3.1 s
Matania Judgment
Accuracy
6
Depth
7
Relevance
9
Clarity
9
Fidelity
9
Overall
8.38
Review
The Markdown format is perfectly respected, including the requested ##/### headers, bulleted lists by culture, and the final Parallels section. The factual accuracy is generally good but contains one notable error: in the Popol Vuh, it is the gods (Huracán and other creators) who destroy the wooden men, not Xbalanque and Hunahpu—these hero twins play a different role in the narrative; the survivors of this deluge are not identified as such. The depth is satisfactory for ~150 words, and the relevance of the three chosen examples is excellent for illustrating comparative mythology.
mistral mistral-tiny-latest
9.5/10 5.7 s
Tokens 404
Source code 1.3 KB
Time 5.7 s
Matania Judgment
Accuracy
9
Depth
7
Relevance
10
Clarity
10
Fidelity
10
Overall
9.5
Review
The model perfectly adhered to all Markdown formatting, structural, and length constraints. The factual information is accurate, and the chosen parallels are highly relevant. The conciseness is excellent while still maintaining high-quality informative substance.
openai gpt-4o-mini
9.8/10 5.6 s
Tokens 394
Source code 1.2 KB
Time 5.6 s
Matania Judgment
Accuracy
10
Depth
8
Relevance
10
Clarity
10
Fidelity
10
Overall
9.75
Review
The model perfectly adhered to all prompt constraints: the Markdown formatting is exact, the structure is compliant, and the length is respected (~160 words, which is very close to the conciseness instruction). The factual information is accurate, and the identified parallels are relevant.
openai gpt-5.4-nano
3.2 s
Tokens 434
Source code 1.4 KB
Time 3.2 s
openai gpt-5.5
6.7 s
Tokens 313
Source code 939 B
Time 6.7 s
openai gpt-5.5-pro
45.8 s
Tokens 296
Source code 869 B
Time 45.8 s
productivia matania-latest
2.3 s
Tokens 389
Source code 1.2 KB
Time 2.3 s
xai grok-4-1-fast-non-reasoning
6.0 s
Tokens 346
Source code 1.0 KB
Time 6.0 s
xai grok-4-1-fast-reasoning
9.9 s
Tokens 266
Source code 748 B
Time 9.9 s
Code