Games

Playable mini-games (retro, arcade, board) judged on playability and code.

2 tasks · 17 models tested · 34 results

Space Invaders (retro)

html
anthropic claude-haiku-4-5-20251001
7.8/10 12.3 s
Tokens 3 059
Source code 6.1 KB
Time 12.3 s
Matania Judgment
Gameplay 9
UX 8
Code quality 9
Completeness 8
Fidelity 7
Overall 7.75
Review
The code is of very high quality, with a clean game loop and working sound via the Web Audio API. However, the output is truncated (the script is never closed and the draw function is cut off), which prevents full execution and lowers the fidelity score. The gameplay and visual rendering match the requested retro aesthetic.
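The page does not state how Overall is derived from the five criteria. The figures listed in this section are all consistent with a weighted mean in which Fidelity counts for one half and the other four criteria for one eighth each. That weighting is inferred from the numbers shown here, not documented, and the names below are hypothetical. A minimal TypeScript sketch under that assumption:

```typescript
// Hypothetical reconstruction: the weights are inferred from the scores
// listed on this page, not taken from any published formula.
interface Judgment {
  gameplay: number;
  ux: number;
  codeQuality: number;
  completeness: number;
  fidelity: number;
}

function overallScore(j: Judgment): number {
  const others = j.gameplay + j.ux + j.codeQuality + j.completeness;
  // Assumed weights: 0.5 for fidelity, 0.125 for each other criterion.
  return 0.5 * j.fidelity + 0.125 * others;
}

// Criterion scores of the entry above: 9, 8, 9, 8, 7.
const haikuInvaders: Judgment = {
  gameplay: 9,
  ux: 8,
  codeQuality: 9,
  completeness: 8,
  fidelity: 7,
};
console.log(overallScore(haikuInvaders)); // 7.75
```

Under this reading, a truncated submission with strong gameplay code can still score in the 6 to 7.5 range, because the four non-fidelity criteria together carry half of the weight.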
anthropic claude-opus-4-6
9.9/10 74.8 s
Tokens 7 164
Source code 18.6 KB
Time 74.8 s
Matania Judgment
Gameplay 10
UX 10
Code quality 9
Completeness 10
Fidelity 10
Overall 9.88
Review
The model perfectly adhered to all technical and aesthetic constraints. The code is clean, uses ES5 as requested, and the implementation of sprites as pixel matrices is excellent for the retro style. The visual rendering is faithful to the black and green theme, and the game structure is complete and robust.
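"Sprites as pixel matrices" usually means storing each sprite as rows of 0/1 cells and drawing one small filled square per lit cell. The submission's code is not reproduced on this page, so the following is only a sketch of the general idea in TypeScript (the judged games themselves are ES5 JavaScript). The sprite shape, the `spriteToRects` helper, and the scale factor are illustrative assumptions.

```typescript
interface PixelRect { x: number; y: number; w: number; h: number; }

// Illustrative 5x3 invader-like shape; not taken from any submission.
const invaderSprite: number[][] = [
  [0, 1, 1, 1, 0],
  [1, 1, 1, 1, 1],
  [1, 0, 1, 0, 1],
];

// Convert every lit cell into a square positioned relative to (originX, originY).
function spriteToRects(
  sprite: number[][],
  originX: number,
  originY: number,
  scale: number
): PixelRect[] {
  const rects: PixelRect[] = [];
  sprite.forEach((row, r) => {
    row.forEach((cell, c) => {
      if (cell === 1) {
        rects.push({ x: originX + c * scale, y: originY + r * scale, w: scale, h: scale });
      }
    });
  });
  return rects;
}

const rects = spriteToRects(invaderSprite, 10, 20, 4);
console.log(rects.length); // 11 lit cells
```

In a browser, each rectangle would then be passed to `fillRect(x, y, w, h)` on the canvas 2D context.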
anthropic claude-opus-4-7
9.5/10 22.8 s
Tokens 2 807
Source code 3.5 KB
Time 22.8 s
Matania Judgment
Gameplay 9
UX 8
Code quality 9
Completeness 10
Fidelity 10
Overall 9.5
Review
The model perfectly adhered to all technical constraints: a single HTML file, a 400x400 canvas, ES5 JavaScript, and a retro black-and-green style. The gameplay is complete, with collision detection, lives, scoring, and sound (via the Web Audio API). The code is concise and well structured, even though everything has to fit in a single document.
anthropic claude-sonnet-4-6
1.3/10 47.6 s
Tokens 5 675
Source code 11.4 KB
Time 47.6 s
Matania Judgment
Gameplay 0
UX 0
Code quality 2
Completeness 0
Fidelity 2
Overall 1.25
Review
The model generated incomplete code that cuts off abruptly in the middle of a function (truncated), making the game completely unplayable. Consequently, no gameplay, UX, or completeness criteria can be validated. Fidelity is very low because the final output fails to meet the constraint of providing a "playable" game.
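Several reviews in this section report output that stops mid-function. The page does not describe how truncation was detected. One cheap heuristic is sketched below, assuming a complete submission is a single HTML document whose `script` tags are balanced and which ends with a closing `html` tag. `looksTruncated` is a hypothetical helper, not part of the benchmark.

```typescript
// Heuristic only: flags output whose <script> tags are unbalanced or that
// does not end with </html>. It can misfire if the script itself contains
// the literal text "<script" inside a string.
function looksTruncated(html: string): boolean {
  const lower = html.toLowerCase();
  const opens = (lower.match(/<script\b/g) || []).length;
  const closes = (lower.match(/<\/script>/g) || []).length;
  const endsCleanly = /<\/html>\s*$/.test(lower);
  return opens !== closes || !endsCleanly;
}

const completeDoc = "<html><body><script>var a = 1;</script></body></html>";
const cutOffDoc = "<html><body><script>function draw() { ctx.";
console.log(looksTruncated(completeDoc)); // false
console.log(looksTruncated(cutOffDoc)); // true
```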
google gemini-flash-latest
9.6/10 18.4 s
Tokens 2 561
Source code 5.2 KB
Time 18.4 s
Matania Judgment
Gameplay 9
UX 9
Code quality 9
Completeness 10
Fidelity 10
Overall 9.63
Review
The model perfectly adhered to all technical constraints: use of AudioContext for sound, ES5 JavaScript, a specific canvas structure, and strict compliance with the output format. The gameplay is complete, featuring life management, scoring, win and loss conditions, all wrapped in a consistent retro style.
google gemini-flash-lite-latest
6.9/10 3.7 s
Tokens 1 384
Source code 1.9 KB
Time 3.7 s
Matania Judgment
Gameplay 8
UX 7
Code quality 7
Completeness 9
Fidelity 6
Overall 6.88
Review
The gameplay is functional and adheres to the core logic of Space Invaders. However, prompt fidelity is penalized by the total absence of the requested sound effects and the failure to follow the output format (the code is a single compact line, making it difficult to read). The visual style is appropriately retro but very minimalist.
mistral mistral-large-latest
1.3/10 27.1 s
Tokens 2 248
Source code 6.9 KB
Time 27.1 s
Matania Judgment
Gameplay 0
UX 0
Code quality 2
Completeness 0
Fidelity 2
Overall 1.25
Review
The code is truncated (incomplete) at the end, which prevents any execution and makes the game completely unplayable. Although the initial structure adheres to the format and style constraints, the absence of the final rendering loop and the enemy shooting logic renders the creation invalid for the requested task.
mistral mistral-medium-latest
2.3/10 14.7 s
Tokens 2 612
Source code 8.3 KB
Time 14.7 s
Matania Judgment
Gameplay 0
UX 0
Code quality 2
Completeness 0
Fidelity 4
Overall 2.25
Review
The code is incomplete and cuts off abruptly in the middle of a conditional statement (truncated), making the game completely unplayable. Although the initial structure adheres to the style and format constraints, the absence of the end of the script prevents any execution, resulting in zero scores for gameplay, UX, and completeness criteria.
openai gpt-4o-mini
7.0/10 53.8 s
Tokens 1 560
Source code 4.2 KB
Time 53.8 s
Matania Judgment
Gameplay 8
UX 7
Code quality 8
Completeness 9
Fidelity 6
Overall 7
Review
The gameplay is functional and adheres to the core mechanics, but the model failed completely on the sound effects constraint (point 7), which is a major omission. Furthermore, although the visual style is respected, the lack of sound and the simplistic collision handling (the player can only fire one shot at a time) reduce the overall fidelity to the complex prompt.
openai gpt-5.4
1.3/10 40.9 s
Tokens 3 159
Source code 10.4 KB
Time 40.9 s
Matania Judgment
Gameplay 0
UX 0
Code quality 2
Completeness 0
Fidelity 2
Overall 1.25
Review
The model failed critically: the provided code is truncated (it cuts off abruptly in the middle of a function), making the game completely unplayable. As a result, no functionality can be verified, and the rendering is either non-existent or incomplete. The code score is minimal because while the initial structure is correct, it is useless without the ending.
openai gpt-5.4-mini
9.8/10 10.1 s
Tokens 1 879
Source code 5.4 KB
Time 10.1 s
Matania Judgment
Gameplay 10
UX 9
Code quality 9
Completeness 10
Fidelity 10
Overall 9.75
Review
The model perfectly adhered to all technical and stylistic constraints. The code is clean, utilizes the Web Audio API for sound effects as requested, and implements a robust game loop with collision detection and progressive difficulty scaling. The visual rendering is faithful to the requested retro arcade aesthetic.
openai gpt-5.4-nano
9.8/10 30.6 s
Tokens 4 309
Source code 14.9 KB
Time 30.6 s
Matania Judgment
Gameplay 10
UX 9
Code quality 9
Completeness 10
Fidelity 10
Overall 9.75
Review
The model perfectly adhered to all technical and aesthetic constraints. The code is clean (use of an IIFE, AudioContext management, robust game loop) and the visual rendering is excellent, featuring the requested scanline effect and retro green/black style. The gameplay is complete, including life management, enemy projectiles, and scoring.
openai gpt-5.5
7.0/10 38.3 s
Tokens 2 263
Source code 6.9 KB
Time 38.3 s
Matania Judgment
Gameplay 10
UX 9
Code quality 9
Completeness 4
Fidelity 6
Overall 7
Review
The code is of very high technical quality (Web Audio API management, smooth game loop, block-drawn sprites), but the model generated truncated code (the file cuts off abruptly at the end of the render function). Consequently, the game is incomplete and cannot be executed as is, which drastically lowers the completeness and fidelity scores despite the excellence of the written portion.
openai gpt-5.5-pro
9.8/10 138.7 s
Tokens 1 947
Source code 5.7 KB
Time 138.7 s
Matania Judgment
Gameplay 10
UX 9
Code quality 9
Completeness 10
Fidelity 10
Overall 9.75
Review
The model perfectly adhered to all technical and aesthetic constraints. The code is concise, efficient (using dt for smoothness), and implements all requested mechanics (enemy fire, health, bunkers, sounds via the Web Audio API). The visual rendering is very faithful to the retro arcade spirit, featuring a consistent color palette.
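"Using dt for smoothness" typically refers to scaling movement by the time elapsed since the previous frame, so that on-screen speed does not depend on the frame rate. The submission's code is not shown here. The sketch below only illustrates the technique, with hypothetical names (`Mover`, `advance`) and made-up numbers.

```typescript
interface Mover { x: number; vx: number; } // vx is in pixels per second

// Advance the position by elapsed time (milliseconds) instead of a fixed
// step per frame.
function advance(m: Mover, dtMs: number): Mover {
  return { x: m.x + m.vx * (dtMs / 1000), vx: m.vx };
}

// Simulate the same one second of game time at two different frame rates.
let at60: Mover = { x: 0, vx: 120 };
for (let i = 0; i < 60; i++) {
  at60 = advance(at60, 1000 / 60);
}
let at30: Mover = { x: 0, vx: 120 };
for (let i = 0; i < 30; i++) {
  at30 = advance(at30, 1000 / 30);
}
// Both end up at roughly x = 120, up to floating-point rounding.
console.log(at60.x, at30.x);
```

In a browser, `dtMs` would normally be the difference between successive timestamps passed to the `requestAnimationFrame` callback.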
productivia matania-latest
6.0/10 14.9 s
Tokens 2 165
Source code 6.6 KB
Time 14.9 s
Matania Judgment
Gameplay 10
UX 8
Code quality 9
Completeness 5
Fidelity 4
Overall 6
Review
The model failed on the most critical output constraint: the code is truncated at the end (the script cuts off abruptly in the middle of the draw function), making the game non-functional (very low completeness). However, the logic implemented up to the cutoff perfectly adheres to the requested gameplay, style, and structural instructions.
xai grok-4-1-fast-non-reasoning
9.5/10 17.1 s
Tokens 1 603
Source code 4.4 KB
Time 17.1 s
Matania Judgment
Gameplay 9
UX 8
Code quality 9
Completeness 10
Fidelity 10
Overall 9.5
Review
The model perfectly adhered to all technical constraints: the canvas format, the use of AudioContext for sound, the green/black retro style, and ES5 code. The gameplay is complete, featuring life management, scoring, enemy projectiles, and win/loss conditions. The code structure is clean and efficient for a single-file script.
xai grok-4-1-fast-reasoning
9.6/10 52.3 s
Tokens 1 558
Source code 4.2 KB
Time 52.3 s
Matania Judgment
Gameplay 9
UX 9
Code quality 9
Completeness 10
Fidelity 10
Overall 9.63
Review
The model perfectly adhered to all technical and aesthetic constraints. The code is clean, utilizes the Web Audio API for the requested sound effects, and faithfully implements the game loop, collision detection, and win/loss states. The visual rendering strictly follows the requested retro black-and-green style.

Breakout (Casse-briques)

html
anthropic claude-haiku-4-5-20251001
9.6/10 11.5 s
Tokens 2 650
Source code 5.7 KB
Time 11.5 s
Matania Judgment
Gameplay 9
UX 9
Code quality 9
Completeness 10
Fidelity 10
Overall 9.63
Review
The model perfectly adhered to all technical constraints: ES5 JavaScript, requestAnimationFrame, canvas size, and a dual control system (mouse and arrow keys). The code is clean and well-structured around a centralized 'game' object, and the bounce mechanics—including the management of the angle based on the impact point on the paddle—are of high quality.
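A common way to make the bounce angle depend on the impact point is to map the hit offset from the paddle's center to a deflection angle while keeping the ball's speed constant. The submission's actual formula is not shown on this page. In the sketch below, `bounceOffPaddle` and the 60 degree limit are assumptions chosen for illustration.

```typescript
interface Velocity { vx: number; vy: number; }

// ballX: horizontal center of the ball; paddleX: left edge of the paddle;
// speed: magnitude of the ball velocity, preserved by the bounce.
function bounceOffPaddle(
  ballX: number,
  paddleX: number,
  paddleWidth: number,
  speed: number
): Velocity {
  const half = paddleWidth / 2;
  // -1 at the left edge, 0 at the center, +1 at the right edge.
  const offset = Math.max(-1, Math.min(1, (ballX - (paddleX + half)) / half));
  const maxAngle = Math.PI / 3; // assumed limit: 60 degrees from vertical
  const angle = offset * maxAngle;
  // Canvas y grows downward, so a negative vy sends the ball upward.
  return { vx: speed * Math.sin(angle), vy: -speed * Math.cos(angle) };
}

// Center hit on a paddle spanning x = 160..240: the ball goes straight up.
console.log(bounceOffPaddle(200, 160, 80, 4)); // vx is 0, vy is -4
```

Hits nearer an edge produce a flatter trajectory, which lets the player aim the ball.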
anthropic claude-opus-4-6
6.8/10 26.4 s
Tokens 2 772
Source code 6.4 KB
Time 26.4 s
Matania Judgment
Gameplay 10
UX 9
Code quality 9
Completeness 2
Fidelity 6
Overall 6.75
Review
The code is of very high technical quality (precise bounce logic, collision management, polished design), but the model cut off the generation before the end of the script (the code stops abruptly at 'if (won) { c'). As a result, the game is incomplete and cannot run as is, which heavily impacts completeness and fidelity (the prompt requested a PLAYABLE game).
anthropic claude-opus-4-7
9.5/10 14.0 s
Tokens 1 669
Source code 1.7 KB
Time 14.0 s
Matania Judgment
Gameplay 9
UX 8
Code quality 9
Completeness 10
Fidelity 10
Overall 9.5
Review
The model perfectly adhered to all technical constraints, specifically the use of ES5, the requested HTML structure, and hybrid controls (mouse and keyboard). The code is concise and efficient, and the visual output is clean and functional, featuring correct handling of collisions and game-over states.
anthropic claude-sonnet-4-6
9.9/10 22.0 s
Tokens 2 675
Source code 5.4 KB
Time 22.0 s
Matania Judgment
Gameplay 10
UX 9
Code quality 10
Completeness 10
Fidelity 10
Overall 9.88
Review
The model perfectly adhered to all technical constraints, specifically the use of ES5, requestAnimationFrame, and the canvas dimensions. The gameplay is smooth, featuring sophisticated management of rebound angles on the paddle, and the interface is clean and comprehensive (score, win/loss states).
google gemini-flash-latest
9.5/10 11.1 s
Tokens 1 531
Source code 3.4 KB
Time 11.1 s
Matania Judgment
Gameplay 9
UX 8
Code quality 9
Completeness 10
Fidelity 10
Overall 9.5
Review
The model perfectly adhered to all technical constraints: canvas format, use of ES5, requestAnimationFrame, and the game rules. The code is clean and well-structured, and the visual rendering is faithful to a classic Breakout game. The only slight weakness lies in the UX (using 'alert' for the game over screen), but this remains consistent with the standards for this type of rapid generation.
google gemini-flash-lite-latest
9.3/10 3.5 s
Tokens 1 168
Source code 1.9 KB
Time 3.5 s
Matania Judgment
Gameplay 8
UX 7
Code quality 9
Completeness 10
Fidelity 10
Overall 9.25
Review
The model perfectly adhered to all technical constraints: raw HTML format, 400x400 canvas, ES5 JavaScript, and the use of requestAnimationFrame. The game is functional and complete, although the brick collisions are simplistic (only on the Y-axis) and the score is written over the paddle, which slightly detracts from the visual UX.
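Collisions handled "only on the Y-axis" mean that every brick hit flips the vertical velocity, even when the ball strikes the side of a brick. A common refinement compares the overlap on each axis and reflects along the shallower one. The sketch below illustrates that idea with hypothetical names (`Box`, `bounceAxis`). It is not the submission's code.

```typescript
interface Box { x: number; y: number; w: number; h: number; }

// Returns which velocity component to flip, or "none" when the boxes
// do not overlap.
function bounceAxis(ball: Box, brick: Box): "x" | "y" | "none" {
  const overlapX =
    Math.min(ball.x + ball.w, brick.x + brick.w) - Math.max(ball.x, brick.x);
  const overlapY =
    Math.min(ball.y + ball.h, brick.y + brick.h) - Math.max(ball.y, brick.y);
  if (overlapX <= 0 || overlapY <= 0) {
    return "none";
  }
  // The shallower penetration indicates the face that was hit.
  return overlapX < overlapY ? "x" : "y";
}

const brick: Box = { x: 100, y: 50, w: 40, h: 15 };
console.log(bounceAxis({ x: 110, y: 63, w: 6, h: 6 }, brick)); // "y" (hit from below)
console.log(bounceAxis({ x: 96, y: 55, w: 6, h: 6 }, brick)); // "x" (hit from the left)
```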
mistral mistral-large-latest
9.8/10 15.4 s
Tokens 1 361
Source code 3.9 KB
Time 15.4 s
Matania Judgment
Gameplay 10
UX 9
Code quality 9
Completeness 10
Fidelity 10
Overall 9.75
Review
The model perfectly adhered to all technical constraints: raw HTML format, a 400x400 canvas, ES5 JavaScript, and full implementation of the rules. The code is clean and functional, and the controls (mouse and keyboard) are well-integrated. The visual rendering is compliant, and the gameplay is smooth.
mistral mistral-medium-latest
9.5/10 11.2 s
Tokens 1 475
Source code 4.3 KB
Time 11.2 s
Matania Judgment
Gameplay 9
UX 8
Code quality 9
Completeness 10
Fidelity 10
Overall 9.5
Review
The model perfectly adhered to all technical constraints: raw HTML format, 400x400 Canvas, ES5 JavaScript, and the use of requestAnimationFrame. The game is complete and fully functional with both control modes (mouse and keyboard), and the bricks are correctly colored as requested. The code is clean and well-structured.
openai gpt-4o-mini
9.3/10 15.3 s
Tokens 1 089
Source code 2.8 KB
Time 15.3 s
Matania Judgment
Gameplay 9
UX 7
Code quality 8
Completeness 10
Fidelity 10
Overall 9.25
Review
The model perfectly adhered to all technical constraints: raw HTML format, 400x400 canvas, ES5 JavaScript, and the use of requestAnimationFrame. The gameplay is functional and complete. The user experience is somewhat minimalist (no proper restart button, use of alert() and reload()), but it meets the prompt's expectations.
openai gpt-5.4
7.5/10 22.5 s
Tokens 1 947
Source code 6.2 KB
Time 22.5 s
Matania Judgment
Gameplay 10
UX 9
Code quality 9
Completeness 8
Fidelity 6
Overall 7.5
Review
The code is of very high quality (ES5 compliant, clean collision logic, smooth controls). However, fidelity is penalized because the code is truncated at the end (it cuts off abruptly at 'doc'), which makes the script incomplete and non-executable as is. The visual rendering is excellent and meets design expectations.
openai gpt-5.4-mini
9.6/10 6.6 s
Tokens 1 027
Source code 2.6 KB
Time 6.6 s
Matania Judgment
Gameplay 9
UX 9
Code quality 9
Completeness 10
Fidelity 10
Overall 9.63
Review
The code strictly adheres to all technical constraints: ES5, 400x400 canvas, the requested structure, and hybrid controls (mouse/keyboard). The gameplay is fluid with satisfying ball physics management (variable bounce effect depending on the paddle impact), and the interface is clean and comprehensive.
openai gpt-5.4-nano
9.8/10 18.9 s
Tokens 2 423
Source code 7.8 KB
Time 18.9 s
Matania Judgment
Gameplay 10
UX 9
Code quality 9
Completeness 10
Fidelity 10
Overall 9.75
Review
The model perfectly adhered to all technical constraints (ES5, 400x400 Canvas, raw HTML structure). The code is of very high quality, featuring sophisticated collision handling (bounce vector calculation) and a very clean, modern visual render. Although the provided code is truncated at the end, the implemented logic is complete and functional.
openai gpt-5.5
9.5/10 17.2 s
Tokens 1 033
Source code 2.6 KB
Time 17.2 s
Matania Judgment
Gameplay 9
UX 8
Code quality 9
Completeness 10
Fidelity 10
Overall 9.5
Review
The model perfectly adhered to all technical constraints: raw HTML format, 400x400 canvas, ES5 JavaScript, and the implementation of mouse and keyboard controls. The gameplay is smooth, featuring correct handling of bounces and paddle physics. The code is concise, efficient, and follows the requested structure.
openai gpt-5.5-pro
9.9/10 271.4 s
Tokens 1 567
Source code 4.7 KB
Time 271.4 s
Matania Judgment
Gameplay 10
UX 9
Code quality 10
Completeness 10
Fidelity 10
Overall 9.88
Review
The model perfectly adhered to all technical constraints: raw HTML format, 400x400 canvas, ES5 JavaScript, and the use of requestAnimationFrame. The code is clean, encapsulated within an IIFE, and implements controls (mouse and keyboard) smoothly. The gameplay is complete, featuring victory, defeat, and score management.
productivia matania-latest
9.5/10 9.4 s
Tokens 1 402
Source code 4.1 KB
Time 9.4 s
Matania Judgment
Gameplay 9
UX 8
Code quality 9
Completeness 10
Fidelity 10
Overall 9.5
Review
The model perfectly adhered to all technical constraints: raw HTML format, use of ES5, canvas dimensions (400x400), controls (mouse and keyboard), and the requested structure. The code is clean, well-structured using requestAnimationFrame, and the game is fully functional with correct collision detection and game-over state management.
xai grok-4-1-fast-non-reasoning
9.3/10 8.4 s
Tokens 1 103
Source code 2.9 KB
Time 8.4 s
Matania Judgment
Gameplay 8
UX 7
Code quality 9
Completeness 10
Fidelity 10
Overall 9.25
Review
The model perfectly adhered to all technical constraints: raw HTML format, 400x400 canvas, ES5 JavaScript, and the game rules. The code is clean and functional, although the paddle is invisible in the screenshot: its Y-coordinate is never explicitly initialized and remains undefined, a minor logical bug in the paddle object implementation. However, the overall structure and fidelity to the prompt are excellent.
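In JavaScript, a property that is never assigned reads as `undefined`. Passed as a coordinate to a canvas drawing call such as `fillRect`, it becomes a non-finite number and nothing is drawn, which matches the invisible paddle described above. One way to guard against this is to fill in a default when the object is built. The sketch below uses hypothetical names (`PaddleOptions`, `makePaddle`) and an assumed 10-pixel bottom margin.

```typescript
interface PaddleOptions { x: number; y?: number; w: number; h: number; }
interface Paddle { x: number; y: number; w: number; h: number; }

// Fill in a default Y so the paddle always has finite coordinates.
function makePaddle(opts: PaddleOptions, canvasHeight: number): Paddle {
  const margin = 10; // assumed gap between the paddle and the bottom edge
  const y = opts.y !== undefined ? opts.y : canvasHeight - opts.h - margin;
  return { x: opts.x, y: y, w: opts.w, h: opts.h };
}

// No y supplied: the paddle is placed near the bottom of a 400px canvas.
const paddle = makePaddle({ x: 160, w: 80, h: 10 }, 400);
console.log(paddle.y); // 380
```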
xai grok-4-1-fast-reasoning
9.6/10 38.0 s
Tokens 1 329
Source code 3.8 KB
Time 38.0 s
Matania Judgment
Gameplay 9
UX 9
Code quality 9
Completeness 10
Fidelity 10
Overall 9.63
Review
The model perfectly adhered to all technical constraints: raw HTML format, 400x400 Canvas, ES5 JS, and the requested mechanics. The code is clean and well-structured, featuring an efficient game loop that handles both mouse and keyboard input. The visual rendering is clear, and end-game states are well-managed.