Utilities

Small interactive tools judged on usability and code quality.

4 tasks · 19 models tested · 76 results

Calculator

html

anthropic claude-haiku-4-5-20251001

9.4/10 11.7 s

Tokens 2 327

Source code 5.1 KB

Time 11.7 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.38

Review

The model perfectly adhered to all prompt constraints, specifically regarding size, output format, and functionality. The aesthetics are modern and polished, featuring smooth transition effects. The code is clean and the calculation logic is robust, although using a switch statement instead of eval() (which was permitted) is technically cleaner.
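
As an illustration of the switch-based approach the review mentions, here is a minimal sketch. The function name `applyOperator` and the error messages are hypothetical and are not taken from the model's submission.

```javascript
// Hypothetical helper: applies one binary operator without calling eval().
function applyOperator(left, operator, right) {
  switch (operator) {
    case "+":
      return left + right;
    case "-":
      return left - right;
    case "*":
      return left * right;
    case "/":
      // Surface division by zero as an error the UI can display.
      if (right === 0) throw new Error("Division by zero");
      return left / right;
    default:
      throw new Error("Unknown operator: " + operator);
  }
}
```

Because only the listed operators are accepted, arbitrary input can never be executed as code, which is the usual argument for preferring this over eval().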

anthropic claude-opus-4-6

6.9/10 32.7 s

Tokens 3 255

Source code 7.3 KB

Time 32.7 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

6.88

Review

The visual rendering is excellent—modern and perfectly adhering to design and size constraints. However, the code is truncated at the end (the `calculate()` function is incomplete), which makes the calculator totally non-functional for the final calculation. Fidelity is penalized by this abrupt interruption of the source code.
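
A truncation like the one described here can be flagged with a rough heuristic. The sketch below is an assumption about how such a check might look, not the benchmark's actual method, and `looksTruncated` is a hypothetical name.

```javascript
// Heuristic only: flags output that does not end with </html>
// or that opens more <script> tags than it closes.
function looksTruncated(html) {
  const text = html.trim().toLowerCase();
  const opened = (text.match(/<script\b/g) || []).length;
  const closed = (text.match(/<\/script>/g) || []).length;
  return !text.endsWith("</html>") || opened !== closed;
}
```

A check of this kind would miss a file that is cut off and then happens to end with a closing tag, so it complements a real parse rather than replacing one.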

anthropic claude-opus-4-7

9.5/10 14.1 s

Tokens 1 880

Source code 2.7 KB

Time 14.1 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.5

Review

The model perfectly adhered to all prompt constraints, including the target size, output format, and requested features. The aesthetics are polished, featuring functional hover/active effects and a clear visual hierarchy for operators. The code is clean—even though the use of eval() was permitted by the prompt—and error handling is implemented.

anthropic claude-sonnet-4-6

6.0/10 23.6 s

Tokens 2 760

Source code 6.3 KB

Time 23.6 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

6.0

Review

The visual rendering is excellent and perfectly adheres to the requested aesthetic. However, the code is incomplete (truncated at the end of the calculate function), which renders the calculator completely non-functional for the user. Fidelity is penalized by this major cutoff, which prevents validation of the completeness of the requested features.

google gemini-flash-latest

9.8/10 26.1 s

Tokens 1 537

Source code 4.4 KB

Time 26.1 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all constraints: the standalone HTML format, the 400x400 size, the requested features, and the iOS-inspired aesthetic. The code is clean, handles font scaling well based on the number's length, and the design implementation is exemplary.
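
Font scaling by number length could be sketched as follows. The thresholds (48 px maximum, 20 px minimum, 8 characters at full size) are illustrative assumptions, not values read from the submission.

```javascript
// Hypothetical thresholds: shrink the display font as the number gets longer.
function displayFontSize(text, maxPx = 48, minPx = 20, fitChars = 8) {
  if (text.length <= fitChars) return maxPx;
  // Scale inversely with the length, never going below minPx.
  const scaled = Math.floor((maxPx * fitChars) / text.length);
  return Math.max(minPx, scaled);
}
```

In a browser the result would typically be assigned to the display element's `style.fontSize` after each key press.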

google gemini-flash-lite-latest

9.0/10 3.6 s

Tokens 1 156

Source code 2.3 KB

Time 3.6 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.0

Review

The model follows almost all instructions, particularly regarding the requested size, aesthetics, and functionality. The code is clean and functional, although the opening <html> tag is missing and the HTML structure is slightly incomplete (an orphaned </head> tag), which does not affect the rendering. The aesthetics are polished, featuring effective hover effects.

mistral mistral-large-latest

7.5/10 16.8 s

Tokens 1 430

Source code 4.2 KB

Time 16.8 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

7.5

Review

The model successfully adhered to the requested aesthetics and functionalities, including the +/- button. However, its fidelity is penalized by two major errors: the HTML code is malformed (missing or misplaced closing tags such as </div> and </html>) and the size constraint is technically exceeded because the container is 400px but the body uses flexbox without limiting overflow, even though the visual rendering is correct.

mistral mistral-medium-latest

4.5/10 18.7 s

Tokens 1 939

Source code 6.0 KB

Time 18.7 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

4.5

Review

The model failed on the most critical constraint: the code is truncated at the end, rendering the calculator completely non-functional (the `toggleSign` function is not closed and the script is incomplete). Furthermore, the HTML code is malformed (a `</head>` tag is present even though no `<head>` tag was opened). Although the aesthetics are successful, the inability to execute the code drastically reduces the fidelity and completeness scores.

mistral mistral-small-latest

8.5/10 5.1 s

Tokens 1 113

Source code 2.9 KB

Time 5.1 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

8.5

Review

The model respected almost all constraints, including the target size and aesthetics. However, the code contains a minor structural error (a closing </head> tag without an opening <head> tag), though this does not prevent it from functioning. The +/- functionality is present, which is a good point.

openai gpt-4o-mini

2.5/10 13.6 s

Tokens 1 366

Source code 3.7 KB

Time 13.6 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

2.5

Review

The code is severely corrupted: many CSS values are empty or malformed (margin, box-shadow, rgba), which prevents correct rendering. Furthermore, the model failed to follow the instruction to output 'ONLY raw HTML code' as it included orphan tags and an incomplete structure. While functionality is partially present, the execution is flawed due to CSS syntax errors.

openai gpt-5.4

1.3/10 20.0 s

Tokens 1 910

Source code 7.2 KB

Time 20.0 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

1.25

Review

The model failed catastrophically on almost every count. The source code is truncated (incomplete), contains numerous CSS syntax errors (empty values such as 'margin: ;' or 'background: linear-gradient(180deg, #111827 %') that prevent proper rendering, and JS errors. The visual result is completely broken, and the code fails to follow the instruction to output a functional, standalone HTML file.
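
Empty declarations such as 'margin: ;' can be detected mechanically. The following is a minimal sketch under the assumption that a simple regular expression is enough; `findEmptyDeclarations` is a hypothetical helper and not part of the benchmark.

```javascript
// Returns the CSS property names whose value is empty, e.g. "margin: ;".
function findEmptyDeclarations(css) {
  // A property name, a colon, optional whitespace, then directly ";" or "}".
  const pattern = /([a-z-]+)\s*:\s*(?=;|\})/gi;
  const names = [];
  let match;
  while ((match = pattern.exec(css)) !== null) {
    names.push(match[1]);
  }
  return names;
}
```

This only catches fully empty values. Partially missing ones, such as an rgba() call with empty arguments, would need a real CSS parser.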

openai gpt-5.4-mini

1.3/10 7.4 s

Tokens 1 739

Source code 5.8 KB

Time 7.4 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

1.25

Review

The code is completely corrupted: numerous hexadecimal values, numbers, and function arguments are either missing or malformed (e.g., '#1b223', 'rgba(,,,.12)', 'match[]'), rendering the CSS and JS invalid. The visual rendering is likely broken, and the calculator is non-functional. The model failed on almost every technical and syntax constraint.

openai gpt-5.4-nano

0.1/10 22.8 s

Tokens 3 055

Source code 10.3 KB

Time 22.8 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

0.13

Review

The code is completely corrupted and incomplete: it contains numerous empty values (e.g., margin: ;), CSS syntax errors (rgba with empty commas), and the script cuts off abruptly in the middle of a function. As it stands, the code is unusable and cannot produce any functional or correct visual output.

openai gpt-5.4-pro

4.9/10 164.6 s

Tokens 1 370

Source code 3.8 KB

Time 164.6 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

4.88

Review

The model failed the completeness and fidelity constraints because the source code is truncated (the script cuts off abruptly in the middle of a function), rendering the calculator completely non-functional. Although the visual rendering (CSS) and HTML structure are excellent and adhere to the aesthetic guidelines, the absence of the final JavaScript logic prevents any actual use.

openai gpt-5.5

0.5/10 19.5 s

Tokens 1 610

Source code 4.7 KB

Time 19.5 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

0.5

Review

The code is completely corrupted and unusable: it contains numerous critical syntax errors (empty CSS properties, incomplete color values like 'rgba(,,,.45)', and truncated JavaScript expressions like 'i>=&&'). In its current state, the code cannot even execute properly, rendering the calculator non-functional and the visual rendering likely broken or incomplete.

openai gpt-5.5-pro

6.3/10 140.7 s

Tokens 1 812

Source code 5.5 KB

Time 140.7 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

6.25

Review

The visual rendering is excellent, looking very professional and perfectly adhering to the design and size constraints. However, the source code is truncated (it cuts off abruptly in the middle of a JS function), which makes the calculator completely non-functional (failure of completeness). Fidelity is penalized because the model failed to provide a complete and usable file, despite aesthetically following the instructions.

productivia matania-latest

9.5/10 9.6 s

Tokens 1 786

Source code 5.3 KB

Time 9.6 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.5

Review

The model perfectly adhered to all prompt constraints, including the target size of 400x400 and the inclusion of all specific buttons. The rendering is clean and modern, and the logic implementation (including the +/- sign and the order of operations via eval) is robust for a benchmark. The code is well-structured and self-contained.

xai grok-4-1-fast-non-reasoning

1.3/10 15.0 s

Tokens 1 533

Source code 4.6 KB

Time 15.0 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

1.25

Review

The code is completely defective and unusable: it contains numerous critical syntax errors (empty RGBA color values, incomplete CSS properties, invalid regex) that prevent rendering and execution. The calculator is functionally incomplete (the '0' button is empty, calculations fail) and fails to meet basic constraints due to these massive generation errors.

xai grok-4-1-fast-reasoning

1.3/10 82.2 s

Tokens 1 709

Source code 5.2 KB

Time 82.2 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

1.25

Review

The code is completely non-functional and syntactically broken. It contains numerous critical errors (empty RGBA color values like 'rgba(,,,.4)', undefined variables, and JS syntax errors within loops and calculations) that prevent rendering and execution. The visual output is broken, and most of the requested features are inaccessible due to these massive typos.

To-do list

html

anthropic claude-haiku-4-5-20251001

3.8/10 9.0 s

Tokens 2 211

Source code 6.4 KB

Time 9.0 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

3.75

Review

The model failed critically by providing truncated code (the script cuts off abruptly at 'va'). Consequently, the application is completely inoperable: tasks cannot be added, and the update logic is incomplete. Although the CSS styling and initial HTML structure are correct and adhere to the visual constraints, the lack of functional code renders the result useless.

anthropic claude-opus-4-6

5.8/10 27.0 s

Tokens 2 833

Source code 5.9 KB

Time 27.0 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

5.75

Review

The model produced visually impressive code with a modern design and smooth animations. However, the code is truncated at the end (missing the closing of the default list and the script/html tags), which renders the application non-functional in its current state. Fidelity is heavily impacted by this major technical error of incomplete generation.

anthropic claude-opus-4-7

9.5/10 12.4 s

Tokens 1 902

Source code 2.8 KB

Time 12.4 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.5

Review

The model perfectly adhered to all technical and functional constraints, including the 400x400 dimensions and the French pre-filling. The output is clean, the interface is intuitive, and the code is well-structured despite the use of 'var' (which is more outdated than 'let/const').

anthropic claude-sonnet-4-6

9.5/10 18.9 s

Tokens 2 399

Source code 5.2 KB

Time 18.9 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.5

Review

The model perfectly adhered to all constraints, including dimensions, French pre-filling, and the output format. The final result is clean and modern, and the user experience is smooth, featuring correct handling of the counter and task addition. The code is well-structured, although using an IIFE inside a for loop is a somewhat dated approach for managing index scoping.
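
The scoping point can be illustrated by comparing the two patterns side by side. This is a generic sketch with hypothetical function names, not the submission's code.

```javascript
// Older pattern: an IIFE captures the current value of i for each handler.
function makeHandlersWithIife(count) {
  var handlers = [];
  for (var i = 0; i < count; i++) {
    (function (index) {
      handlers.push(function () {
        return index;
      });
    })(i);
  }
  return handlers;
}

// Modern pattern: `let` creates a fresh binding of i for every iteration.
function makeHandlersWithLet(count) {
  const handlers = [];
  for (let i = 0; i < count; i++) {
    handlers.push(() => i);
  }
  return handlers;
}
```

Both versions give each handler its own index. The `let` form simply does so without the extra function wrapper.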

google gemini-flash-latest

9.6/10 16.0 s

Tokens 1 711

Source code 3.6 KB

Time 16.0 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all technical and functional constraints, including the zone dimensions, the French pre-filling, and the code's autonomy. The output is clean, the interface is intuitive, and the code is well-structured with proper state management (counter). Only one minor HTML structural detail is visible (an orphaned </head> tag at the beginning), but this does not affect the functionality.

google gemini-flash-lite-latest

8.1/10 6.1 s

Tokens 934

Source code 1.6 KB

Time 6.1 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

8.13

Review

The model respects almost all constraints, including pre-filling and dimensions. However, there is a major logical error in the toggle() function, which uses a global span selector instead of targeting the current row's element, breaking the correct interactivity. The code is functional but lacks robustness (use of var, direct DOM manipulation via innerHTML).
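
The fix for this kind of bug is to query inside the clicked row rather than the whole document. The sketch below assumes a hypothetical markup of one `<li>` per task containing a checkbox and a `<span>`, and a hypothetical `done` class. Neither is taken from the submission.

```javascript
// Hypothetical markup: <li><input type="checkbox"><span>Task</span></li>
function toggleRow(row) {
  // Search within the row only, so that just this task's label changes.
  const label = row.querySelector("span");
  label.classList.toggle("done");
}

// Wiring sketch for a browser (not executed here):
// checkbox.addEventListener("change", (event) => {
//   toggleRow(event.target.closest("li"));
// });
```

A global `document.querySelector("span")` would always return the first span on the page, which matches the behavior the review describes.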

mistral mistral-large-latest

9.5/10 27.2 s

Tokens 1 512

Source code 4.6 KB

Time 27.2 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.5

Review

The model perfectly adhered to all technical constraints, including the 400x400 format, the French pre-filling, and the file's autonomy. The code is clean and well-structured, and the interface is intuitive, providing immediate visual feedback (strikethrough text). The remaining-task counter is correctly implemented for both static and dynamic tasks.

mistral mistral-medium-latest

9.0/10 12.6 s

Tokens 1 885

Source code 5.9 KB

Time 12.6 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.0

Review

The model adheres almost perfectly to all constraints, including dimensions and pre-filling. The code is clean and functional, although it lacks the <html> opening tag and the <head> opening to be a fully valid HTML document, which is a minor issue for a raw snippet.

mistral mistral-small-latest

9.6/10 5.6 s

Tokens 1 202

Source code 3.3 KB

Time 5.6 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all technical constraints, specifically the 400x400 format, code autonomy, and the absence of superfluous text. The output is clean and functional, and the code is well-structured with correct state management (counter and strikethrough). Prompt fidelity is exemplary.

openai gpt-4o-mini

6.4/10 10.0 s

Tokens 912

Source code 2.1 KB

Time 10.0 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

6.38

Review

The model respects almost all the requested features, including pre-filling and the counter. However, fidelity is penalized by malformed HTML code (misplaced </head> and <body> tags due to the absence of a <head> section) and CSS containing empty properties (margin:; padding:;), which impairs the visual rendering. The aesthetic aspect is extremely minimalist and lacks proper structure.

openai gpt-5.4

1.4/10 30.9 s

Tokens 1 658

Source code 5.0 KB

Time 30.9 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

1.38

Review

The code is completely corrupted: numerous CSS and JavaScript values are empty (e.g., 'margin: ;', 'remaining = ;'), preventing proper rendering and execution. The model failed to generate numerical values and essential properties, leaving the application non-functional and visually broken.

openai gpt-5.4-mini

4.1/10 7.9 s

Tokens 1 320

Source code 3.7 KB

Time 7.9 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

4.13

Review

The code is completely unusable because it contains severe syntax errors (missing values after '=' signs, such as 'margin: ;' or 'remaining = ;', and incomplete 'for' loops), which prevents execution. Although the design intent is correct, the model generated broken code that fails to meet the constraint of providing functional code. Fidelity is very low as the application cannot start.

openai gpt-5.4-nano

4.4/10 13.1 s

Tokens 1 770

Source code 5.4 KB

Time 13.1 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

4.38

Review

The code is severely corrupted: numerous CSS attributes (margin, padding) and crucial JavaScript values (for loops, array indices) are empty or missing, preventing the script from executing correctly (notably the pre-filling functionality). Although the visual appearance seems correct in the screenshot, the source code structure is syntactically invalid in several critical areas, making the application unstable or non-functional depending on the environment.

openai gpt-5.4-pro

9.6/10 218.1 s

Tokens 1 691

Source code 5.1 KB

Time 218.1 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all constraints, including dimensions, French pre-filling, and code autonomy. The output is clean and modern, providing a smooth user experience with correct handling of the counter and states. The code is well-structured and utilizes efficient event delegation.

openai gpt-5.5

3.8/10 16.0 s

Tokens 1 446

Source code 4.1 KB

Time 16.0 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

3.75

Review

The code is severely defective: it contains numerous empty CSS properties (e.g., 'margin: ;', 'border: ;') and uninitialized JS variables (e.g., 'var remaining = ;', 'i = ;'), making the code syntactically invalid and broken. Although the logical structure appears to follow the requested functionality, the absence of values in the code prevents correct and stable rendering, thereby failing in terms of technical fidelity and quality.

openai gpt-5.5-pro

9.6/10 104.6 s

Tokens 1 290

Source code 3.6 KB

Time 104.6 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all constraints, including the 400x400 dimension, the French pre-filling, and the output format. The code is clean, well-structured with an IIFE, and the visual rendering is both modern and functional.

productivia matania-latest

9.5/10 9.1 s

Tokens 1 602

Source code 4.9 KB

Time 9.1 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.5

Review

The model perfectly adhered to all technical and functional constraints, including dimensions, French pre-filling, and the output format. The interface is clean and functional, and the code is well-structured for pure JS. The visual rendering is faithful to the look of a standalone mini-app.

xai grok-4-1-fast-non-reasoning

1.4/10 7.3 s

Tokens 1 178

Source code 3.3 KB

Time 7.3 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

1.38

Review

The code is completely unusable because it contains multiple critical syntax errors (empty values in CSS properties, uninitialized variables in the JS such as 'remaining = ;' or 'i = ;'). These errors prevent correct rendering and logic execution, making the application non-functional. The model failed on almost all technical reliability constraints.

xai grok-4-1-fast-reasoning

8.9/10 15.1 s

Tokens 928

Source code 2.3 KB

Time 15.1 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

8.88

Review

The model perfectly adhered to all technical and functional constraints, including the output format, dimensions, pre-filling, and language. The code is functional and clean, although very minimalist in terms of CSS. The user experience is smooth and meets expectations for an application of this scale.

Unit converter

html

anthropic claude-haiku-4-5-20251001

9.5/10 7.0 s

Tokens 1 867

Source code 4.4 KB

Time 7.0 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.5

Review

The model perfectly adhered to all constraints, including the output format (raw HTML), the 400x400 area, and the conversion formulas. The aesthetics are polished, featuring a modern gradient and distinct color codes for each unit. The real-time update logic is smooth, and the rounding is correctly implemented.

anthropic claude-opus-4-6

9.8/10 20.8 s

Tokens 2 287

Source code 4.6 KB

Time 20.8 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all constraints, including the area size (400x400) and the absence of superfluous text around the code. The aesthetics are modern and professional, and the real-time conversion logic with rounding to 2 decimal places is implemented without error. The code is clean and well-structured, and the user experience is seamless thanks to the polished design.
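
Rounding to 2 decimal places is commonly done as in the sketch below. The kilometre/mile pair is a hypothetical example chosen for illustration; the units actually required by the benchmark prompt are not stated on this page.

```javascript
// Rounds a number to 2 decimal places.
function roundTo2(value) {
  return Math.round(value * 100) / 100;
}

// Hypothetical unit pair (assumption): kilometres and miles.
const KM_PER_MILE = 1.609344;

function kmToMiles(km) {
  return roundTo2(km / KM_PER_MILE);
}

function milesToKm(miles) {
  return roundTo2(miles * KM_PER_MILE);
}
```

In a live converter these functions would run inside an `input` event handler so that the other field updates on every keystroke.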

anthropic claude-opus-4-7

9.0/10 11.1 s

Tokens 1 805

Source code 2.4 KB

Time 11.1 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.0

Review

The model adheres almost perfectly to all constraints, including the 400x400 dimension and real-time updates. The design is modern and very clean. The only slight drawback is regarding fidelity: the source code is malformed (extra or misplaced closing tags like </div> and </body>, and the absence of the <html> and <head> opening tags), which makes the HTML technically invalid even though it remains functional in a browser.

anthropic claude-sonnet-4-6

9.9/10 18.4 s

Tokens 2 067

Source code 4.0 KB

Time 18.4 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.88

Review

The model perfectly adhered to all constraints, including the zone dimensions and conversion formulas. The output is aesthetically superior, featuring a modern design (glassmorphism), excellent readability, and fluid interactivity that meets all expectations.

google gemini-flash-latest

8.0/10 13.3 s

Tokens 1 454

Source code 2.5 KB

Time 13.3 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

8.0

Review

The tool is perfectly functional and aesthetically very well-executed, accurately respecting both formulas and visual aspects. However, fidelity is penalized because the model forgot to include the opening <html> and <body> tags (it starts with <style> and ends with </html>), which makes the code technically incomplete for a 'standalone HTML' file, even if browsers correct it automatically.

google gemini-flash-lite-latest

9.5/10 4.1 s

Tokens 1 045

Source code 1.5 KB

Time 4.1 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.5

Review

The model perfectly adhered to all technical constraints, including the 400x400 area, rounding to 2 decimal places, and real-time updates. The code is clean; although the absence of opening <html> and <body> tags is technically an omission of full structure, the standalone HTML rendering works perfectly. The aesthetics are sober and efficient, meeting all expectations.

mistral mistral-large-latest

10.1 s

Tokens 928

Source code 2.2 KB

Time 10.1 s

mistral mistral-medium-latest

9.5/10 16.4 s

Tokens 1 632

Source code 4.9 KB

Time 16.4 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.5

Review

The model perfectly adheres to all prompt constraints, including dimensions, formulas, and the 'raw HTML' requirement. The design is clean, and the interaction is fluid and instantaneous. Only a minor HTML syntax error was noted (improperly closed </div> tags as </div >), but this does not affect the rendering or functionality.

mistral mistral-small-latest

9.5/10 4.0 s

Tokens 879

Source code 2.0 KB

Time 4.0 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.5

Review

The model perfectly adhered to all constraints, including the mathematical formulas, rounding, and output format (raw HTML only). The interface is clean, centered, and respects the 400x400 space. The code is concise, and the bidirectional update logic is seamless.

openai gpt-4o-mini

7.5/10 11.5 s

Tokens 907

Source code 2.0 KB

Time 11.5 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

7.5

Review

The code is functional and adheres to all conversion and rounding rules. However, the HTML structure is malformed (</head> and </html> tags are present without opening tags, margin: is empty), which detracts from the code quality. The aesthetics are very basic, and while the 400x400 area constraint is partially met via CSS, the visual rendering lacks polish (irregular spacing).

openai gpt-5.4

5.1/10 14.3 s

Tokens 1 318

Source code 3.6 KB

Time 14.3 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

5.13

Review

The model failed on a major technical constraint: the source code contains numerous CSS syntax errors (empty properties such as 'margin: ;' or 'padding: ;') that break the rendering. Although the visual output appears correct in the screenshot (likely due to browser tolerance), the code is invalid. Furthermore, the <html> and <head> opening tags are missing, which violates the request for standalone, clean HTML.

openai gpt-5.4-mini

8.8/10 6.5 s

Tokens 1 374

Source code 3.8 KB

Time 6.5 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

8.75

Review

The model adheres to almost all constraints, including the formulas and the live update. However, the source code contains CSS syntax errors (empty values for margin and shadow) that could affect rendering depending on the browser, even though the visual result remains clean. The 'ONLY raw HTML code' output constraint is respected.

openai gpt-5.4-nano

5.8/10 11.7 s

Tokens 1 661

Source code 4.9 KB

Time 11.7 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

5.75

Review

The code is severely corrupted: it contains numerous invalid CSS properties (margin:;, padding:;, border:;) and missing values within style functions, which breaks both the visual rendering and the structure. Although the JavaScript conversion logic is functional and follows the formulas, the failure to adhere to basic syntax and the absence of a proper <head> tag undermine the overall quality. Fidelity is penalized by these major syntax errors that prevent a clean render.

openai gpt-5.4-pro

9.6/10 330.4 s

Tokens 1 506

Source code 4.3 KB

Time 330.4 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all constraints, including the zone dimensions and mathematical formulas. The design is modern and clean, and the user experience is seamless thanks to the real-time updates. The code is well-structured and self-contained as requested.

openai gpt-5.5

7.6/10 13.5 s

Tokens 1 222

Source code 3.2 KB

Time 13.5 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

7.63

Review

The converter is functional, and the interface is clean and intuitive. However, the source code contains critical syntax errors (empty CSS properties such as 'margin: ;' and 'padding: ;') that may affect rendering depending on the browser. Additionally, the HTML structure is malformed (a badly formed closing </div> tag and a </head> tag without an opening <head> tag), which undermines the code quality despite meeting the requested functionality.

openai gpt-5.5-pro

9.8/10 142.7 s

Tokens 1 203

Source code 3.2 KB

Time 142.7 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all technical and aesthetic constraints. The interface is clean, centered, and respects the 400x400 format. The conversion logic is fluid and precise, and the 'live' mode implementation is robust thanks to the update state management used to prevent infinite loops.
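
An update guard of the kind the review credits might look like the sketch below. `createField` and `linkFields` are hypothetical stand-ins written for illustration; the submission's own state handling is not shown on this page.

```javascript
// Minimal stand-in for an input field that notifies listeners on every change.
function createField(initial) {
  const listeners = [];
  let value = initial;
  return {
    get: () => value,
    set(next) {
      value = next;
      listeners.forEach((fn) => fn(next));
    },
    onChange(fn) {
      listeners.push(fn);
    },
  };
}

// Links two fields so that editing either one updates the other exactly once.
function linkFields(a, b, aToB, bToA) {
  let updating = false; // guard flag: true while a programmatic update runs
  a.onChange((v) => {
    if (updating) return;
    updating = true;
    try {
      b.set(aToB(v));
    } finally {
      updating = false;
    }
  });
  b.onChange((v) => {
    if (updating) return;
    updating = true;
    try {
      a.set(bToA(v));
    } finally {
      updating = false;
    }
  });
}
```

Without the `updating` flag, each programmatic `set` would trigger the opposite listener and the two fields would keep updating each other indefinitely.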

productivia matania-latest

9.0/10 6.7 s

Tokens 1 181

Source code 3.1 KB

Time 6.7 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.0

Review

The model respects almost all constraints, including the 400x400 area and the live update. The aesthetics are clean and functional. However, the source code is malformed as the <html> and <head> opening tags are missing, which is a structural error for a 'standalone HTML', even though the rendering is correct.

xai grok-4-1-fast-non-reasoning

2.4/10 5.4 s

Tokens 907

Source code 2.1 KB

Time 5.4 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

2.38

Review

The code is severely corrupted: numerous CSS and JavaScript values are empty or malformed (e.g., margin:;, rgba(,,,.2), || ;), resulting in a broken visual render and a completely inoperative script. The conversion logic is also flawed, as it relies on a poorly managed 'c' variable within the updateAll function, preventing the requested live updates. The model failed on almost all technical and functional constraints.

xai grok-4-1-fast-reasoning

1.9/10 36.0 s

Tokens 997

Source code 2.5 KB

Time 36.0 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

1.88

Review

The code is completely corrupted: numerous CSS values are empty or malformed (e.g., 'margin: ;', 'background: #ff2f5;', 'rgba(,,,.12)'), which prevents correct rendering. The fidelity is catastrophic because the model failed to respect basic language syntax, rendering the HTML/CSS file invalid and visually non-functional despite a coherent JS logic.

Palette generator

html

anthropic claude-haiku-4-5-20251001

9.6/10 8.9 s

Tokens 2 202

Source code 4.2 KB

Time 8.9 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all constraints, including the 400x400 area and the color harmony logic. The code is clean and well-structured, and the user experience is seamless, providing effective visual feedback during copying. The only minor issue to note is the HTML structure (orphaned </head> and <body> tags because the <head> was not explicitly opened), but this does not affect the rendering.

anthropic claude-opus-4-6

6.8/10 34.7 s

Tokens 3 488

Source code 6.5 KB

Time 34.7 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

6.75

Review

The visual rendering is excellent and the UX is very polished with smooth animations. However, the code is incomplete (truncated at the end of the renderPalette function), which prevents the actual execution of the copy and regeneration features. Fidelity is penalized because the model failed to provide the full requested code, rendering the tool partially inoperative.

anthropic claude-opus-4-7

9.6/10 17.3 s

Tokens 2 458

Source code 3.9 KB

Time 17.3 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all constraints, including the zone dimensions and the coherent palette generation mode. The rendering is aesthetically pleasing with good contrast for the hex codes, and the implementation of the feedback (toast) and copy functionality is smooth and functional.

anthropic claude-sonnet-4-6

5.6/10 31.0 s

Tokens 3 152

Source code 6.8 KB

Time 31.0 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

5.63

Review

The model produced incomplete code (truncated at the end of the script), which prevents the actual execution of all functionalities, such as the rendering loop or the click-to-copy event. Although the visual aesthetics and color generation logic are excellent, the abrupt cutoff in the code is a major failure in technical fidelity.

google gemini-flash-latest

9.5/10 8.5 s

Tokens 1 626

Source code 3.3 KB

Time 8.5 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.5

Review

The model adhered to nearly all technical constraints, including the 400x400 area and the coherent color generation algorithm. The interface is clean, the copy feedback is clearly present, and the code is well-structured. The one deviation is that it uses the outdated execCommand copying method instead of the requested modern API, although copying works perfectly.
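The "modern API" referred to here is presumably the asynchronous Clipboard API (navigator.clipboard.writeText); the prompt itself is not reproduced in this listing, so that reading is an assumption. A minimal sketch of a copy helper built on it follows. The name copyText and the injectable clipboard parameter are hypothetical, introduced only so the helper can be exercised outside a browser.

```javascript
// Sketch only: copy a hex code with the async Clipboard API.
// `copyText` and its `clipboard` parameter are illustrative names,
// not taken from any of the reviewed submissions.
async function copyText(text, clipboard = globalThis.navigator?.clipboard) {
  // navigator.clipboard is typically only exposed in secure contexts,
  // so the helper reports failure instead of throwing when it is absent.
  if (clipboard && typeof clipboard.writeText === "function") {
    await clipboard.writeText(text);
    return true;
  }
  return false;
}
```

A caller would typically show the "Copied!" feedback only when the returned promise resolves to true, and could fall back to another copy method otherwise.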

google gemini-flash-lite-latest

8.9/10 3.5 s

Tokens 1 138

Source code 1.9 KB

Time 3.5 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

8.88

Review

The model respects almost all constraints, including the color consistency logic via HSL. The rendering is clean and the UX is fluid with the copy feedback. Two minor drawbacks: the 400x400 area constraint is respected for the palette but not for the global container, and the HTML code is slightly malformed (orphaned </head> tag).

mistral mistral-large-latest

9.6/10 15.5 s

Tokens 1 236

Source code 3.3 KB

Time 15.5 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all technical and functional constraints of the prompt. The color generation algorithm ensures true visual harmony, and the implementation of copy-to-clipboard with feedback is seamless. The code is clean, self-contained, and strictly follows the requested output format.

mistral mistral-medium-latest

5.5/10 14.0 s

Tokens 2 176

Source code 7.0 KB

Time 14.0 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

5.5

Review

The model failed on the major technical constraint: the code is truncated at the end (the `copyColor` function is incomplete), which renders the application non-functional (the copy button and generation logic are broken). Although the visual rendering is clean and adheres to the aesthetic guidelines, the absence of complete code for a code generation task is a critical fidelity error.

mistral mistral-small-latest

8.3/10 5.6 s

Tokens 1 036

Source code 2.5 KB

Time 5.6 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

8.25

Review

The model follows almost all instructions, including the harmony logic and the copy feedback. However, there is a major technical error in the `rgbToHex` function, which treats an HSL string as if it were RGB (by matching on its digits); this risks producing incorrect hex codes or crashing the script. Additionally, while the '400x400' zone constraint is visually respected, the code contains an orphaned `</head>` closing tag with no opening `<head>` tag.
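For reference, an HSL color needs an actual HSL-to-RGB conversion before it can be written as a hex code; reading the three numbers of an `hsl(...)` string as red, green and blue gives a different color. The sketch below uses the standard HSL-to-RGB formula. The function names hslToHex, parseHsl and hslStringToHex are hypothetical and are not taken from the reviewed submission, and the parser only accepts the simple comma-separated `hsl(h, s%, l%)` form.

```javascript
// Sketch only: convert HSL components to a "#rrggbb" string.
// h is in degrees (any value, wrapped to [0, 360)); s and l are percentages.
function hslToHex(h, s, l) {
  const hue = ((h % 360) + 360) % 360;
  const sat = s / 100;
  const light = l / 100;
  const a = sat * Math.min(light, 1 - light);
  const channel = (n) => {
    const k = (n + hue / 30) % 12;
    const value = light - a * Math.max(-1, Math.min(k - 3, 9 - k, 1));
    return Math.round(255 * value).toString(16).padStart(2, "0");
  };
  // Offsets 0, 8 and 4 select the red, green and blue channels.
  return `#${channel(0)}${channel(8)}${channel(4)}`;
}

// Parse "hsl(210, 60%, 50%)" into [h, s, l]; returns null for anything else.
function parseHsl(str) {
  const m = /^hsl\(\s*(-?[\d.]+)\s*,\s*([\d.]+)%\s*,\s*([\d.]+)%\s*\)$/.exec(str);
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

// Convert an hsl(...) string to hex, or null if the string is not HSL.
function hslStringToHex(str) {
  const parts = parseHsl(str);
  return parts ? hslToHex(parts[0], parts[1], parts[2]) : null;
}
```

Returning null for unrecognized input, rather than guessing, avoids the silent wrong-color failure mode described in the review.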

openai gpt-4o-mini

6.9/10 14.3 s

Tokens 1 151

Source code 2.8 KB

Time 14.3 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

6.88

Review

The model respects most of the functionalities (copy, regeneration, hex display), but fails on the 400x400 zone constraint by using a full-screen layout (100vh). The color generation algorithm is very basic and does not always guarantee good readability of white text on light colors, although the code is functional and clean.

openai gpt-5.4

9.6/10 22.8 s

Tokens 1 815

Source code 5.5 KB

Time 22.8 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all constraints, including the 400x400 area and the color harmony logic (analogous or complementary). The implementation of the copy-to-clipboard feature with visual feedback is robust, and the code is clean, well-structured, and self-contained.
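As an illustration of what "analogous or complementary" harmony logic can look like, here is a minimal sketch that derives a set of hues from one base hue. It is not the reviewed model's code: the function name harmonyHues, the 30-degree analogous step and the default of five colors are all assumptions made for the example.

```javascript
// Sketch only: derive palette hues (in degrees) from a base hue.
// mode "complementary" alternates between the base hue and its opposite;
// any other mode is treated as "analogous" (neighbouring hues).
function harmonyHues(baseHue, mode, count = 5) {
  const wrap = (h) => ((h % 360) + 360) % 360;
  if (mode === "complementary") {
    return Array.from({ length: count }, (_, i) => wrap(baseHue + (i % 2) * 180));
  }
  const step = 30; // assumed spacing between analogous hues
  const start = baseHue - (step * (count - 1)) / 2;
  return Array.from({ length: count }, (_, i) => wrap(start + i * step));
}
```

In a real generator the base hue would be picked at random, and saturation and lightness would be varied per swatch; both are left out here to keep the sketch deterministic.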

openai gpt-5.4-mini

9.6/10 11.3 s

Tokens 1 889

Source code 5.7 KB

Time 11.3 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model strictly adhered to all constraints, including the 400x400 area and the raw output format. The color generation algorithm (analogous or complementary) is well-implemented, providing genuine visual coherence. The interface is clean and modern, and the copy experience with the 'Copied!' feedback is smooth and intuitive.

openai gpt-5.4-nano

4.3/10 21.2 s

Tokens 2 886

Source code 9.6 KB

Time 21.2 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

4.25

Review

The model failed critically on completeness and fidelity because the code is truncated (it cuts off abruptly in the middle of a loop within the JavaScript function). Although the CSS styling and HTML structure are excellent and promising for the visual rendering, the application is completely non-functional because the script is incomplete.

openai gpt-5.4-pro

4.8/10 229.2 s

Tokens 2 661

Source code 8.8 KB

Time 229.2 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

4.75

Review

The model generated code of very high visual and logical quality, but it was abruptly cut off before the end of the script (the code stops in the middle of a loop). As a result, the generation and copy functionality cannot work, leading to extremely low completeness and fidelity scores despite the clear intention to follow all rules.

openai gpt-5.5

9.6/10 18.4 s

Tokens 1 675

Source code 4.9 KB

Time 18.4 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all constraints, including the 400x400 area and the coherent palette generation algorithm. The interface is aesthetically pleasing, the code is clean, and the user experience (copy feedback, hover effects) is excellent.

openai gpt-5.5-pro

6.8/10 240.6 s

Tokens 2 404

Source code 7.8 KB

Time 240.6 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

6.75

Review

The visual rendering and UX are excellent, featuring a modern design and well-thought-out interactions. However, the code is truncated at the end of the script (the file cuts off abruptly in the middle of a function), making the application completely inoperable: the regeneration button and the copy function do not work. Fidelity is penalized by this major technical break, which prevents the task from being completed.

productivia matania-latest

9.5/10 13.1 s

Tokens 1 821

Source code 5.5 KB

Time 13.1 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.5

Review

The model perfectly adhered to all prompt constraints, including the 400x400 area and the coherent palette generation algorithm. The implementation of text contrast (YIQ) for hex code readability is an excellent technical addition. The code is clean, standalone, and functional.
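The YIQ technique mentioned here picks dark or light text from the perceived brightness of the background color. A minimal sketch follows; the function name readableTextColor and the threshold of 128 are assumptions (128 is a commonly used cut-off, not a value confirmed for this submission), and the input is assumed to be a six-digit hex code.

```javascript
// Sketch only: choose black or white text for a "#rrggbb" background
// using the YIQ luma weighting (0.299 R + 0.587 G + 0.114 B).
function readableTextColor(hex) {
  const value = hex.replace("#", "");
  const r = parseInt(value.slice(0, 2), 16);
  const g = parseInt(value.slice(2, 4), 16);
  const b = parseInt(value.slice(4, 6), 16);
  const yiq = (r * 299 + g * 587 + b * 114) / 1000;
  // Assumed threshold: bright backgrounds get black text, dark ones white.
  return yiq >= 128 ? "#000000" : "#ffffff";
}
```

For example, a yellow swatch (#ffff00) gets black text and a pure blue swatch (#0000ff) gets white text, which keeps the hex label legible on both.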

xai grok-4-1-fast-non-reasoning

7.4/10 12.6 s

Tokens 1 504

Source code 4.4 KB

Time 12.6 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

7.38

Review

The code is excellent, clean, and functional, featuring very good management of color harmony and user feedback (the 'Copied!' animation). However, the model failed on a major technical constraint in the prompt: the area must be 400x400, yet the rendering uses a flexible layout that occupies the full window height (100vh) without any size limits.

xai grok-4-1-fast-reasoning

9.6/10 48.8 s

Tokens 1 666

Source code 5.0 KB

Time 48.8 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all constraints, including the 400x400 area and the coherent color generation algorithm (analogous and complementary). The interface is clean, the visual copy feedback is well-implemented, and the text readability management (contrast) is an excellent, unrequested addition.