Games

Playable mini-games (retro, arcade, board) judged on playability and code.

2 tasks · 17 models tested · 34 results

Space Invaders (retro)

html

anthropic claude-haiku-4-5-20251001

7.8/10 12.3 s

Tokens 3 059

Source code 6.1 KB

Time 12.3 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

7.75

Review

The code is of very high quality, with a clean game loop and working sound via the Web Audio API. However, the output is cut off before the end (the script is never closed and the draw function is truncated), which prevents full execution and lowers the fidelity score. The gameplay logic and visual rendering that are present respect the requested retro aesthetic.

anthropic claude-opus-4-6

9.9/10 74.8 s

Tokens 7 164

Source code 18.6 KB

Time 74.8 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

9.88

Review

The model perfectly adhered to all technical and aesthetic constraints. The code is clean, uses ES5 as requested, and the implementation of sprites as pixel matrices is excellent for the retro style. The visual rendering is faithful to the black and green theme, and the game structure is complete and robust.
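The "sprites as pixel matrices" approach praised here can be illustrated with a minimal ES5 sketch. The sprite shape, the `drawSprite` name, and the scale parameter are assumptions made for illustration; they are not taken from the model's actual output.

```javascript
// Hypothetical 8x5 invader sprite: 1 = lit pixel, 0 = transparent.
var INVADER = [
  [0, 0, 1, 0, 0, 1, 0, 0],
  [0, 1, 1, 1, 1, 1, 1, 0],
  [1, 1, 0, 1, 1, 0, 1, 1],
  [1, 1, 1, 1, 1, 1, 1, 1],
  [0, 1, 0, 0, 0, 0, 1, 0]
];

// Draw a sprite matrix at (x, y), one fillRect per lit cell.
// `ctx` is any object exposing fillStyle and fillRect (a 2D canvas context).
function drawSprite(ctx, sprite, x, y, scale, color) {
  var drawn = 0;
  ctx.fillStyle = color;
  for (var row = 0; row < sprite.length; row++) {
    for (var col = 0; col < sprite[row].length; col++) {
      if (sprite[row][col]) {
        ctx.fillRect(x + col * scale, y + row * scale, scale, scale);
        drawn++;
      }
    }
  }
  return drawn; // number of cells drawn, handy for testing
}
```

In a browser this would be called with `canvas.getContext('2d')`; keeping the sprite as data makes it easy to add animation frames by swapping matrices.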

anthropic claude-opus-4-7

9.5/10 22.8 s

Tokens 2 807

Source code 3.5 KB

Time 22.8 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

9.5

Review

The model perfectly adhered to all technical constraints: a single HTML format, a 400x400 Canvas, ES5 JavaScript, and a retro black-and-green style. The gameplay is complete, featuring collision detection, lives, scoring, and sound (via the Web Audio API). The code is concise and well-structured, despite the requirement of being contained within a single document.
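For readers unfamiliar with sound via the Web Audio API, a short sketch of a retro beep follows. The `beep` helper, the square waveform, and the volume values are illustrative assumptions; only the `AudioContext` calls themselves are standard browser API.

```javascript
// Play a short square-wave beep. `audioCtx` is expected to be an AudioContext,
// for example: new (window.AudioContext || window.webkitAudioContext)().
function beep(audioCtx, frequency, duration) {
  var now = audioCtx.currentTime;
  var osc = audioCtx.createOscillator();
  var gain = audioCtx.createGain();

  osc.type = 'square';
  osc.frequency.value = frequency;

  // Start at a modest volume and fade out to avoid an audible click.
  gain.gain.setValueAtTime(0.1, now);
  gain.gain.exponentialRampToValueAtTime(0.001, now + duration);

  osc.connect(gain);
  gain.connect(audioCtx.destination);
  osc.start(now);
  osc.stop(now + duration);
  return osc;
}
```

Browsers generally keep an `AudioContext` suspended until a user gesture, so a game would typically create or resume it on the first key press.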

anthropic claude-sonnet-4-6

1.3/10 47.6 s

Tokens 5 675

Source code 11.4 KB

Time 47.6 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

1.25

Review

The model generated incomplete code that cuts off abruptly in the middle of a function (truncated), making the game completely unplayable. Consequently, no gameplay, UX, or completeness criteria can be validated. Fidelity is very low because the final output fails to meet the constraint of providing a "playable" game.
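Truncation of this kind can often be caught before a game is ever opened in a browser. The sketch below is a rough heuristic only; `looksTruncated` is a hypothetical helper and is not the check this benchmark uses.

```javascript
// Heuristic: does this generated HTML look cut off?
// Purely illustrative; not the benchmark's actual validation.
function looksTruncated(html) {
  var lower = html.toLowerCase();
  var opens = (lower.match(/<script\b/g) || []).length;
  var closes = (lower.match(/<\/script>/g) || []).length;
  if (opens !== closes) { return true; }

  // Crude brace balance over the whole document. It ignores braces inside
  // strings and comments, so treat the result as a hint, not a verdict.
  var depth = 0;
  for (var i = 0; i < html.length; i++) {
    var ch = html.charAt(i);
    if (ch === '{') { depth++; }
    if (ch === '}') { depth--; }
  }
  return depth !== 0;
}
```

A stricter check would parse the script with a real JavaScript parser; the counting approach is shown only because it needs no dependencies.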

google gemini-flash-latest

9.6/10 18.4 s

Tokens 2 561

Source code 5.2 KB

Time 18.4 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all technical constraints: use of AudioContext for sound, ES5 JavaScript, a specific canvas structure, and strict compliance with the output format. The gameplay is complete, featuring life management, scoring, win and loss conditions, all wrapped in a consistent retro style.

google gemini-flash-lite-latest

6.9/10 3.7 s

Tokens 1 384

Source code 1.9 KB

Time 3.7 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

6.88

Review

The gameplay is functional and adheres to the core logic of Space Invaders. However, prompt fidelity is penalized by the total absence of the requested sound effects and the failure to follow the output format (the code is a single compact line, making it difficult to read). The visual style is appropriately retro but very minimalist.

mistral mistral-large-latest

1.3/10 27.1 s

Tokens 2 248

Source code 6.9 KB

Time 27.1 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

1.25

Review

The code is truncated (incomplete) at the end, which prevents any execution and makes the game completely unplayable. Although the initial structure adheres to the format and style constraints, the absence of the final rendering loop and the enemy shooting logic renders the creation invalid for the requested task.

mistral mistral-medium-latest

2.3/10 14.7 s

Tokens 2 612

Source code 8.3 KB

Time 14.7 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

2.25

Review

The code is incomplete and cuts off abruptly in the middle of a conditional statement (truncated), making the game completely unplayable. Although the initial structure adheres to the style and format constraints, the absence of the end of the script prevents any execution, resulting in zero scores for gameplay, UX, and completeness criteria.

openai gpt-4o-mini

7.0/10 53.8 s

Tokens 1 560

Source code 4.2 KB

Time 53.8 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

7.0

Review

The gameplay is functional and adheres to the core mechanics, but the model omitted the sound effects entirely (constraint 7 of the prompt), which is a major omission. Although the visual style is respected, the missing sound and the simplistic collision and firing logic (the player can only fire one shot at a time) reduce overall fidelity to the prompt.
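The one-shot-at-a-time behaviour mentioned in this review usually comes down to a single guard in the firing code. A minimal sketch, assuming a hypothetical `bullets` array and a `maxBullets` limit (neither name comes from the reviewed code):

```javascript
// Try to fire: allowed only while fewer than maxBullets are in flight.
// Returns true if a bullet was added.
function tryFire(bullets, playerX, playerY, maxBullets) {
  if (bullets.length >= maxBullets) { return false; }
  bullets.push({ x: playerX, y: playerY, vy: -300 }); // vy in px/s, assumed
  return true;
}
```

With `maxBullets` set to 1 this reproduces the single-shot limit; raising it allows several shots on screen, provided spent bullets are removed from the array.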

openai gpt-5.4

1.3/10 40.9 s

Tokens 3 159

Source code 10.4 KB

Time 40.9 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

1.25

Review

The model failed critically: the provided code is truncated (it cuts off abruptly in the middle of a function), making the game completely unplayable. As a result, no functionality can be verified, and the rendering is either non-existent or incomplete. The code score is minimal because while the initial structure is correct, it is useless without the ending.

openai gpt-5.4-mini

9.8/10 10.1 s

Tokens 1 879

Source code 5.4 KB

Time 10.1 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all technical and stylistic constraints. The code is clean, utilizes the Web Audio API for sound effects as requested, and implements a robust game loop with collision detection and progressive difficulty scaling. The visual rendering is faithful to the requested retro arcade aesthetic.
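Progressive difficulty in a Space Invaders clone is commonly tied to how many invaders remain. The sketch below shows one possible mapping; the function name and the linear interpolation are assumptions, since the review does not say how the model scaled difficulty.

```javascript
// Time between invader steps, in milliseconds.
// Fewer surviving invaders means a shorter interval, so the formation speeds up.
function stepIntervalMs(aliveCount, totalCount, slowestMs, fastestMs) {
  var ratio = totalCount > 0 ? aliveCount / totalCount : 0;
  return fastestMs + (slowestMs - fastestMs) * ratio;
}
```

For example, with a 600 ms slowest step and a 60 ms fastest step, a full formation moves every 600 ms and the last survivors approach 60 ms.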

openai gpt-5.4-nano

9.8/10 30.6 s

Tokens 4 309

Source code 14.9 KB

Time 30.6 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all technical and aesthetic constraints. The code is clean (use of an IIFE, AudioContext management, robust game loop) and the visual rendering is excellent, featuring the requested scanline effect and retro green/black style. The gameplay is complete, including life management, enemy projectiles, and scoring.
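A scanline effect like the one noted here is typically a cheap overlay drawn after the scene. The following is a sketch under assumed values (line spacing and opacity are illustrative, and `drawScanlines` is a hypothetical name):

```javascript
// Overlay dark horizontal lines every `spacing` pixels to suggest a CRT.
// `ctx` is any object exposing fillStyle and fillRect (a 2D canvas context).
function drawScanlines(ctx, width, height, spacing) {
  var count = 0;
  ctx.fillStyle = 'rgba(0, 0, 0, 0.25)';
  for (var y = 0; y < height; y += spacing) {
    ctx.fillRect(0, y, width, 1);
    count++;
  }
  return count; // number of lines drawn
}
```

Calling it last in the render function keeps the lines on top of sprites and text.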

openai gpt-5.5

7.0/10 38.3 s

Tokens 2 263

Source code 6.9 KB

Time 38.3 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

7.0

Review

The code is of very high technical quality (Web Audio API management, smooth game loop, block-drawn sprites), but the model generated truncated code (the file cuts off abruptly at the end of the render function). Consequently, the game is incomplete and cannot be executed as is, which drastically lowers the completeness and fidelity scores despite the excellence of the written portion.

openai gpt-5.5-pro

9.8/10 138.7 s

Tokens 1 947

Source code 5.7 KB

Time 138.7 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all technical and aesthetic constraints. The code is concise, efficient (using dt for smoothness), and implements all requested mechanics (enemy fire, health, bunkers, sounds via the Web Audio API). The visual rendering is very faithful to the retro arcade spirit, featuring a consistent color palette.
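"Using dt for smoothness" refers to scaling movement by the elapsed time between frames, so speed does not depend on the display's refresh rate. A minimal sketch, with the clamp value and helper names chosen for illustration:

```javascript
// Advance a moving object by velocity * elapsed time (velocities in px/s).
// dt is clamped so a long pause (for example a background tab) cannot
// teleport objects through walls.
function advance(obj, dtSeconds) {
  var dt = Math.min(dtSeconds, 0.05);
  obj.x += obj.vx * dt;
  obj.y += obj.vy * dt;
  return obj;
}

// Convert two requestAnimationFrame timestamps (milliseconds) to seconds.
function frameDelta(previousMs, currentMs) {
  return (currentMs - previousMs) / 1000;
}
```

In a browser loop, the callback passed to `requestAnimationFrame` receives the current timestamp, which would be fed to `frameDelta` together with the previous frame's value.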

productivia matania-latest

6.0/10 14.9 s

Tokens 2 165

Source code 6.6 KB

Time 14.9 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

6.0

Review

The model failed on the most critical output constraint: the code is truncated at the end (the script cuts off abruptly in the middle of the draw function), making the game non-functional (very low completeness). However, the logic implemented up to the cutoff perfectly adheres to the requested gameplay, style, and structural instructions.

xai grok-4-1-fast-non-reasoning

9.5/10 17.1 s

Tokens 1 603

Source code 4.4 KB

Time 17.1 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

9.5

Review

The model perfectly adhered to all technical constraints: the canvas format, the use of AudioContext for sound, the green/black retro style, and ES5 code. The gameplay is complete, featuring life management, scoring, enemy projectiles, and win/loss conditions. The code structure is clean and efficient for a single-file script.

xai grok-4-1-fast-reasoning

9.6/10 52.3 s

Tokens 1 558

Source code 4.2 KB

Time 52.3 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all technical and aesthetic constraints. The code is clean, utilizes the Web Audio API for the requested sound effects, and faithfully implements the game loop, collision detection, and win/loss states. The visual rendering strictly follows the requested retro black-and-green style.

Breakout (Casse-briques)

html

anthropic claude-haiku-4-5-20251001

9.6/10 11.5 s

Tokens 2 650

Source code 5.7 KB

Time 11.5 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all technical constraints: ES5 JavaScript, requestAnimationFrame, canvas size, and a dual control system (mouse and arrow keys). The code is clean and well-structured around a centralized 'game' object, and the bounce mechanics—including the management of the angle based on the impact point on the paddle—are of high quality.
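Managing the bounce angle from the impact point on the paddle is a common Breakout technique. The sketch below shows one way it might look; the function name, the `maxAngle` parameter, and the sine/cosine mapping are assumptions rather than the model's actual code.

```javascript
// Compute the ball velocity after it hits the paddle.
// hit = -1 at the paddle's left edge, 0 at the center, +1 at the right edge.
function bounceOffPaddle(ballX, paddleX, paddleW, speed, maxAngle) {
  var center = paddleX + paddleW / 2;
  var hit = (ballX - center) / (paddleW / 2);
  if (hit < -1) { hit = -1; }
  if (hit > 1) { hit = 1; }
  var angle = hit * maxAngle;        // radians away from straight up
  return {
    vx: speed * Math.sin(angle),
    vy: -speed * Math.cos(angle)     // negative y is up on a canvas
  };
}
```

Because the result is built from a single angle, the ball's overall speed stays constant while the player controls direction by where the ball lands on the paddle.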

anthropic claude-opus-4-6

6.8/10 26.4 s

Tokens 2 772

Source code 6.4 KB

Time 26.4 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

6.75

Review

The code is of very high technical quality (precise bounce logic, collision management, polished design), but the generation was cut off before the end of the script (the code stops abruptly at 'if (won) { c'). As a result, the game is incomplete and cannot run as is, which heavily impacts completeness and fidelity (the prompt requested a PLAYABLE game).

anthropic claude-opus-4-7

9.5/10 14.0 s

Tokens 1 669

Source code 1.7 KB

Time 14.0 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

9.5

Review

The model perfectly adhered to all technical constraints, specifically the use of ES5, the requested HTML structure, and hybrid controls (mouse and keyboard). The code is concise and efficient, and the visual output is clean and functional, featuring correct handling of collisions and game-over states.
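Hybrid mouse and keyboard control mostly amounts to two ways of producing a clamped paddle position. A sketch under assumed names (`paddleXFromMouse`, `paddleXFromKeys`, and the `keys` flag object are all hypothetical):

```javascript
function clamp(value, min, max) {
  return Math.max(min, Math.min(max, value));
}

// Mouse: center the paddle under the pointer.
// canvasLeft would come from canvas.getBoundingClientRect().left.
function paddleXFromMouse(clientX, canvasLeft, paddleW, canvasW) {
  return clamp(clientX - canvasLeft - paddleW / 2, 0, canvasW - paddleW);
}

// Keyboard: move by a fixed step per frame while an arrow key is held.
// `keys` is a flag object such as { left: true, right: false }.
function paddleXFromKeys(paddleX, keys, step, paddleW, canvasW) {
  var dx = (keys.right ? step : 0) - (keys.left ? step : 0);
  return clamp(paddleX + dx, 0, canvasW - paddleW);
}
```

The flags would be set and cleared in `keydown` and `keyup` handlers, and the mouse function called from a `mousemove` handler.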

anthropic claude-sonnet-4-6

9.9/10 22.0 s

Tokens 2 675

Source code 5.4 KB

Time 22.0 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

9.88

Review

The model perfectly adhered to all technical constraints, specifically the use of ES5, requestAnimationFrame, and the canvas dimensions. The gameplay is smooth, featuring sophisticated management of rebound angles on the paddle, and the interface is clean and comprehensive (score, win/loss states).

google gemini-flash-latest

9.5/10 11.1 s

Tokens 1 531

Source code 3.4 KB

Time 11.1 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

9.5

Review

The model perfectly adhered to all technical constraints: canvas format, use of ES5, requestAnimationFrame, and the game rules. The code is clean and well-structured, and the visual rendering is faithful to a classic Breakout game. The only slight weakness is the UX (the game-over screen uses 'alert'), which is acceptable for a quick single-file generation.

google gemini-flash-lite-latest

9.3/10 3.5 s

Tokens 1 168

Source code 1.9 KB

Time 3.5 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

9.25

Review

The model perfectly adhered to all technical constraints: raw HTML format, 400x400 canvas, ES5 JavaScript, and the use of requestAnimationFrame. The game is functional and complete, although the brick collisions are simplistic (only on the Y-axis) and the score is written over the paddle, which slightly detracts from the visual UX.
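The Y-axis-only brick collision noted here can be extended to both axes with a small overlap comparison. This is a sketch of one common approach, not the reviewed code; the object shapes (`ball.r`, `brick.w`, `brick.h`) are assumptions.

```javascript
// Axis-aligned overlap test between the ball's bounding box and a brick.
// Returns 'x' or 'y' for the axis to reflect on, or null if there is no hit.
function brickHitAxis(ball, brick) {
  var overlapX = Math.min(ball.x + ball.r, brick.x + brick.w) -
                 Math.max(ball.x - ball.r, brick.x);
  var overlapY = Math.min(ball.y + ball.r, brick.y + brick.h) -
                 Math.max(ball.y - ball.r, brick.y);
  if (overlapX <= 0 || overlapY <= 0) { return null; }
  // The shallower penetration is most likely the side that was struck.
  return overlapX < overlapY ? 'x' : 'y';
}
```

A caller would flip `vx` when the result is 'x' and `vy` when it is 'y', so side hits no longer bounce as if they struck the top or bottom.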

mistral mistral-large-latest

9.8/10 15.4 s

Tokens 1 361

Source code 3.9 KB

Time 15.4 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all technical constraints: raw HTML format, a 400x400 canvas, ES5 JavaScript, and full implementation of the rules. The code is clean and functional, and the controls (mouse and keyboard) are well-integrated. The visual rendering is compliant, and the gameplay is smooth.

mistral mistral-medium-latest

9.5/10 11.2 s

Tokens 1 475

Source code 4.3 KB

Time 11.2 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

9.5

Review

The model perfectly adhered to all technical constraints: raw HTML format, 400x400 Canvas, ES5 JavaScript, and the use of requestAnimationFrame. The game is complete and fully functional with both control modes (mouse and keyboard), and the bricks are correctly colored as requested. The code is clean and well-structured.

openai gpt-4o-mini

9.3/10 15.3 s

Tokens 1 089

Source code 2.8 KB

Time 15.3 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

9.25

Review

The model perfectly adhered to all technical constraints: raw HTML format, 400x400 canvas, ES5 JavaScript, and the use of requestAnimationFrame. The gameplay is functional and complete. The user experience is somewhat minimalist (no proper restart button, use of alert() and reload()), but it meets the prompt's expectations.

openai gpt-5.4

7.5/10 22.5 s

Tokens 1 947

Source code 6.2 KB

Time 22.5 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

7.5

Review

The code is of very high quality (ES5 compliant, clean collision logic, smooth controls). However, fidelity is penalized because the code is truncated at the end (it cuts off abruptly at 'doc'), which makes the script incomplete and non-executable as is. The visual rendering is excellent and meets design expectations.

openai gpt-5.4-mini

9.6/10 6.6 s

Tokens 1 027

Source code 2.6 KB

Time 6.6 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

9.63

Review

The code strictly adheres to all technical constraints: ES5, 400x400 canvas, the requested structure, and hybrid controls (mouse/keyboard). The gameplay is fluid with satisfying ball physics management (variable bounce effect depending on the paddle impact), and the interface is clean and comprehensive.

openai gpt-5.4-nano

9.8/10 18.9 s

Tokens 2 423

Source code 7.8 KB

Time 18.9 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all technical constraints (ES5, 400x400 Canvas, raw HTML structure). The code is of very high quality, featuring sophisticated collision handling (bounce vector calculation) and a very clean, modern visual render. Although the provided code is truncated at the end, the implemented logic is complete and functional.
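The review does not detail the bounce vector calculation. One standard form, assumed here for illustration, reflects the velocity about the unit normal of the surface that was hit: v' = v - 2(v·n)n.

```javascript
// Reflect velocity (vx, vy) about a unit surface normal (nx, ny).
function reflect(vx, vy, nx, ny) {
  var dot = vx * nx + vy * ny;
  return { vx: vx - 2 * dot * nx, vy: vy - 2 * dot * ny };
}
```

For a ball moving down and to the right at (3, 4) that hits a surface whose normal points up, (0, -1), the result is (3, -4): the horizontal component is kept and the vertical one is flipped.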

openai gpt-5.5

9.5/10 17.2 s

Tokens 1 033

Source code 2.6 KB

Time 17.2 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

9.5

Review

The model perfectly adhered to all technical constraints: raw HTML format, 400x400 canvas, ES5 JavaScript, and the implementation of mouse and keyboard controls. The gameplay is smooth, featuring correct handling of bounces and paddle physics. The code is concise, efficient, and follows the requested structure.

openai gpt-5.5-pro

9.9/10 271.4 s

Tokens 1 567

Source code 4.7 KB

Time 271.4 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

9.88

Review

The model perfectly adhered to all technical constraints: raw HTML format, 400x400 canvas, ES5 JavaScript, and the use of requestAnimationFrame. The code is clean, encapsulated within an IIFE, and implements controls (mouse and keyboard) smoothly. The gameplay is complete, featuring victory, defeat, and score management.
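Encapsulation in an IIFE keeps game state out of the global scope, which matters in ES5 where there are no modules or block-scoped declarations. A minimal sketch with hypothetical names (`Game`, `addPoints`, `finish`):

```javascript
// Everything declared inside the function is private to the closure;
// only the returned object is visible to the rest of the page.
var Game = (function () {
  var score = 0;
  var state = 'playing'; // 'playing' | 'won' | 'lost'

  function addPoints(n) { score += n; }
  function finish(won) { state = won ? 'won' : 'lost'; }

  return {
    addPoints: addPoints,
    finish: finish,
    getScore: function () { return score; },
    getState: function () { return state; }
  };
}());
```

Many single-file games skip the returned object entirely and simply start the loop inside the IIFE; exposing a small API as above mainly helps with testing.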

productivia matania-latest

9.5/10 9.4 s

Tokens 1 402

Source code 4.1 KB

Time 9.4 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

9.5

Review

The model perfectly adhered to all technical constraints: raw HTML format, use of ES5, canvas dimensions (400x400), controls (mouse and keyboard), and the requested structure. The code is clean, well-structured using requestAnimationFrame, and the game is fully functional with correct collision detection and game-over state management.

xai grok-4-1-fast-non-reasoning

9.3/10 8.4 s

Tokens 1 103

Source code 2.9 KB

Time 8.4 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

9.25

Review

The model perfectly adhered to all technical constraints: raw HTML format, 400x400 canvas, ES5 JavaScript, and the game rules. The code is clean, but the paddle is invisible in the screenshot: its Y-coordinate is never explicitly initialized (it remains undefined), a minor logic bug in the paddle object. The overall structure and fidelity to the prompt are nonetheless excellent.
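An uninitialized coordinate is easy to miss because canvas rectangle calls draw nothing, without raising an error, when an argument is not a finite number (an undefined value converts to NaN). A sketch of an explicit initializer plus a guard, with sizes and names chosen for illustration:

```javascript
// Build a paddle with every coordinate set explicitly.
function makePaddle(canvasW, canvasH) {
  var w = 60, h = 10;            // illustrative sizes
  return {
    w: w,
    h: h,
    x: (canvasW - w) / 2,
    y: canvasH - h - 10          // the field the review says was left unset
  };
}

// Guard worth running before drawing: all four values must be finite.
function isDrawable(rect) {
  return isFinite(rect.x) && isFinite(rect.y) &&
         isFinite(rect.w) && isFinite(rect.h);
}
```

Logging a warning when `isDrawable` returns false during development would surface this class of bug immediately instead of leaving an empty spot on screen.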

xai grok-4-1-fast-reasoning

9.6/10 38.0 s

Tokens 1 329

Source code 3.8 KB

Time 38.0 s

Matania Judgment

Gameplay

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all technical constraints: raw HTML format, 400x400 Canvas, ES5 JS, and the requested mechanics. The code is clean and well-structured, featuring an efficient game loop that handles both mouse and keyboard input. The visual rendering is clear, and end-game states are well-managed.