Visuals

Static visuals (SVG/Canvas) judged on rendering quality and code.

5 tasks · 19 models tested · 95 results

Sheep SVG

svg

anthropic claude-haiku-4-5-20251001

5.9/10 7.2 s

Tokens 1,674 · Source code 2.7 KB · Time 7.2 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 5.88

Review

The visual rendering is excellent, very cute, and perfectly adheres to the aesthetic guidelines of simple shapes and solid colors. However, the model completely failed the strict formatting constraints: it wrapped the SVG in Markdown code blocks (```) even though it was explicitly asked to output ONLY raw SVG code.
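Most entries in this task lose points on the same raw-output rule, which can be checked mechanically. The Python sketch below assumes the rule reduces to two conditions: the response starts with `<svg` and contains no Markdown fence. The function name `follows_raw_output_rule` is hypothetical, and this is not the judge's actual implementation.

```python
def follows_raw_output_rule(response: str, root_tag: str = "svg") -> bool:
    """Return True if the response is bare markup: it starts with
    <root_tag and contains no ``` Markdown fence anywhere."""
    # Assumption: leading whitespace is tolerated before the opening tag.
    text = response.lstrip()
    starts_ok = text.startswith("<" + root_tag)
    has_fence = "```" in response
    return starts_ok and not has_fence


fenced = "```svg\n<svg xmlns=\"http://www.w3.org/2000/svg\"></svg>\n```"
raw = "<svg xmlns=\"http://www.w3.org/2000/svg\"></svg>"
print(follows_raw_output_rule(fenced))  # False
print(follows_raw_output_rule(raw))     # True
```

Passing `root_tag="canvas"` applies the same check to the Sheep Canvas task, whose HTML responses were expected to start with `<canvas`.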

anthropic claude-opus-4-6

5.9/10 18.7 s

Tokens 1,983 · Source code 3.3 KB · Time 18.7 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 5.88

Review

The visual rendering is excellent and the code is very clean, but the model completely failed on the strict formatting constraints. It included Markdown code blocks (```) despite being explicitly forbidden from doing so, which violates the instruction 'Output ONLY raw SVG code'.

anthropic claude-opus-4-7

5.0/10 14.5 s

Tokens 2,186 · Source code 2.5 KB · Time 14.5 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity)

Review

The visual rendering is very successful, cute, and faithful to the requested shapes. However, the model completely failed on the strict formatting constraints: it included Markdown code blocks (```) even though the prompt explicitly required 'ONLY raw SVG code' without any explanation or markdown, which is a critical error for an automation task.

anthropic claude-sonnet-4-6

5.9/10 16.0 s

Tokens 2,001 · Source code 3.3 KB · Time 16.0 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 5.88

Review

The visual rendering is excellent, cute, and adheres perfectly to the artistic guidelines. However, the model failed on critical formatting constraints: it included Markdown code blocks (```) even though the prompt explicitly demanded raw SVG code without any explanation or markdown, which is a major fidelity error.

google gemini-flash-latest

4.9/10 17.4 s

Tokens 1,257 · Source code 2.0 KB · Time 17.4 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 4.88

Review

The model failed completely on the strict formatting constraints: it included Markdown code blocks (```) and comments, even though the prompt explicitly required the exclusion of all explanations and markdown so that the response would begin directly with <svg. Visually, the sheep is cute and well-constructed, but the violation of the output rules is major.

google gemini-flash-lite-latest

4.9/10 3.0 s

Tokens 938 · Source code 893 B · Time 3.0 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 4.88

Review

The model failed completely on a critical formatting constraint: it included Markdown code blocks (```) despite being explicitly forbidden from doing so. Although the drawing is correct and adheres to the design guidelines, breaking the "Output ONLY raw SVG code" rule and the rule that the response must begin with <svg is a major error for a pure code generation task.

mistral mistral-large-latest

4.5/10 8.4 s

Tokens 703 · Source code 1.1 KB · Time 8.4 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 4.5

Review

The model failed on the most critical formatting constraint: it included Markdown code block tags (```), which contradicts the rule 'Output ONLY raw SVG code'. Although the drawing is correct and uses the requested shapes, failing the raw-output instruction heavily penalizes the fidelity score.

mistral mistral-medium-latest

4.3/10 7.2 s

Tokens 921 · Source code 2.0 KB · Time 7.2 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 4.25

Review

The model failed on the most critical formatting constraints: it included Markdown code blocks (```) despite instructions to exclude them, and it added an orange background that was not requested. Although the SVG code is clean and the drawing is cute, breaking the output rules (no markdown) and adding an unrequested element heavily penalize its fidelity score.

mistral mistral-small-latest

1.6/10 4.4 s

Tokens 734 · Source code 1.3 KB · Time 4.4 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 1.63

Review

The model failed on almost all strict constraints: it included Markdown code blocks, failed to close the SVG tag properly (truncated code), and the visual rendering is inconsistent (a strange red circle and a structure that does not resemble a sheep). Fidelity is extremely low because the formatting and output structure instructions were not followed.
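A truncated SVG like this one can be caught with a parser before any rendering. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the output is meant to be standalone XML with an `<svg>` root; `is_well_formed_svg` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def is_well_formed_svg(source: str) -> bool:
    """Return True if source parses as XML and its root element is <svg>."""
    try:
        root = ET.fromstring(source)
    except ET.ParseError:
        # Truncated output (for example a missing </svg>) fails here,
        # as does a response wrapped in a Markdown fence.
        return False
    # ElementTree reports namespaced tags as "{namespace}local".
    return root.tag.split("}")[-1] == "svg"


complete = '<svg xmlns="http://www.w3.org/2000/svg"><circle r="5"/></svg>'
truncated = '<svg xmlns="http://www.w3.org/2000/svg"><circle r="5"/>'
print(is_well_formed_svg(complete))   # True
print(is_well_formed_svg(truncated))  # False
```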

openai gpt-4o-mini

4.8/10 5.4 s

Tokens 586 · Source code 709 B · Time 5.4 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 4.75

Review

The model failed on the most critical formatting constraint: it included markdown tags (```), which violates the rule 'Output ONLY raw SVG code'. Although the drawing is clean and respects the colors and shapes, the inclusion of unsolicited text makes the code unusable as-is for an automated process.
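When the markup itself is usable, the fence can be stripped before the output reaches an automated pipeline. A minimal Python sketch, assuming the whole response is a single fenced block with an optional language tag; responses with prose before or after the fence are returned unchanged apart from trimmed whitespace. `strip_code_fence` is a hypothetical name.

```python
import re

# Matches a response that is exactly one Markdown fence, e.g. ```svg ... ```,
# with an optional language tag after the opening backticks.
_FENCE = re.compile(r"^\s*```[\w-]*\s*\n(.*?)\n\s*```\s*$", re.DOTALL)


def strip_code_fence(response: str) -> str:
    """Return the fenced body if the whole response is one fenced block,
    otherwise return the response with outer whitespace trimmed."""
    match = _FENCE.match(response)
    return match.group(1).strip() if match else response.strip()


reply = "```svg\n<svg xmlns=\"http://www.w3.org/2000/svg\"></svg>\n```"
print(strip_code_fence(reply))  # <svg xmlns="http://www.w3.org/2000/svg"></svg>
```

Stripping recovers usable markup, but the response still broke the instruction, which is what the fidelity score measures.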

openai gpt-5.4

5.0/10 15.3 s

Tokens 1,072 · Source code 2.5 KB · Time 15.3 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity)

Review

The visual rendering is very successful and cute, closely adhering to the requested aesthetic. However, the model failed completely on the strict formatting constraints: it included Markdown code blocks (```) and did not follow the 'RAW code ONLY' output rule, which is a major error given how heavily the fidelity metric is weighted.

openai gpt-5.4-mini

5.0/10 8.2 s

Tokens 1,419 · Source code 2.1 KB · Time 8.2 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity)

Review

The visual rendering is successful and aesthetic, adhering well to the shape and color instructions. However, the model failed completely on the strict formatting constraints: it included Markdown code blocks (```) and did not start its response directly with <svg, which violates the requested output rules.

openai gpt-5.4-nano

4.9/10 10.2 s

Tokens 1,239 · Source code 2.7 KB · Time 10.2 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 4.88

Review

The model failed on a critical formatting constraint: it included Markdown code block tags (```) even though the prompt explicitly required 'ONLY raw SVG code'. While the drawing is aesthetically pleasing and adheres to the design guidelines, the inclusion of unrequested text renders the SVG file invalid as a raw file.

openai gpt-5.4-pro

5.0/10 336.2 s

Tokens 1,208 · Source code 3.1 KB · Time 336.2 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity)

Review

The model failed completely on the strict formatting constraints: it included markdown code blocks (```) even though the prompt demanded the absolute exclusion of all text or markdown, and the response was supposed to start directly with <svg. Visually, the sheep is cute and well-composed, but the violation of the output rules is critical for an automated task.

openai gpt-5.5

5.0/10 15.2 s

Tokens 959 · Source code 2.5 KB · Time 15.2 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity)

Review

The visual output is successful and the code is clean, but the model completely failed on the strict formatting constraints. It included Markdown code blocks (```) even though the prompt required 'ONLY raw SVG code' without any explanation or markdown, which is a critical failure in terms of instruction following.

openai gpt-5.5-pro

6.9/10 179.4 s

Tokens 1,372 · Source code 3.8 KB · Time 179.4 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 6.88

Review

The visual rendering is excellent, very cute, and perfectly adheres to the artistic guidelines (simple shapes, solid colors, fluffy appearance). However, the model failed on a critical formatting constraint: it included Markdown code blocks (```) even though the prompt explicitly required 'ONLY raw SVG code' and specified that the response must begin with <svg.

productivia matania-latest

5.0/10 6.0 s

Tokens 844 · Source code 1.7 KB · Time 6.0 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity)

Review

The visual rendering is successful and respects the requested aesthetic, but the model failed completely on the strict formatting constraints. The inclusion of Markdown code blocks (```) and comments directly violates rules 6 and 7 of the prompt, justifying a very low fidelity score.

xai grok-4-1-fast-non-reasoning

4.9/10 7.4 s

Tokens 904 · Source code 1.9 KB · Time 7.4 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 4.88

Review

The model failed on a critical formatting constraint: it included Markdown code blocks (```) despite being explicitly instructed not to. Although the SVG drawing is of good quality and adheres to the aesthetic guidelines, violating the raw SVG output rule is a major failure for a prompt of this type.

xai grok-4-1-fast-reasoning

4.8/10 30.0 s

Tokens 1,016 · Source code 2.4 KB · Time 30.0 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 4.75

Review

The model failed on the most critical formatting constraint: it included Markdown code blocks (```), which violates the rule 'Output ONLY raw SVG code'. Although the drawing is cute and correctly uses the requested shapes, the inclusion of superfluous text renders the SVG file invalid if used as is.

Sheep Canvas

html

anthropic claude-haiku-4-5-20251001

4.5/10 10.3 s

Tokens 1,978 · Source code 3.0 KB · Time 10.3 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 4.5

Review

The model failed completely on the most critical formatting constraint: it included Markdown code blocks (```), which was explicitly forbidden by rule 6. From a technical standpoint, the code correctly adheres to ES5 and Canvas methods, but the visual rendering is very rudimentary (a simple accumulation of circles) and does not harmoniously fill the 400x400 space.

anthropic claude-opus-4-6

4.0/10 40.7 s

Tokens 3,878 · Source code 7.0 KB · Time 40.7 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity)

Review

The model failed on the most critical output constraint: the code is truncated at the end (incomplete syntax), which prevents the script from executing. Although the drawing is aesthetically successful and adheres to the style guidelines (ES5, colors), the inability to provide complete code and the inclusion of markdown code blocks (contrary to the 'No ``` markdown' instruction) heavily penalize both fidelity and completeness.

anthropic claude-opus-4-7

5.0/10 19.4 s

Tokens 2,608 · Source code 2.9 KB · Time 19.4 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity)

Review

The visual rendering is very successful and cute, adhering well to the requested aesthetic. However, the model completely failed on the strict formatting constraints: it included Markdown code blocks (```) even though the prompt demanded 'ONLY raw HTML code' and 'No markdown'. This major violation of the output instruction (fidelity) causes the overall score to plummet despite the quality of the drawing.

anthropic claude-sonnet-4-6

8.1/10 25.0 s

Tokens 3,078 · Source code 5.8 KB · Time 25.0 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 8.13

Review

The visual rendering is excellent, very cute, and well-composed. However, the model broke two strict fidelity constraints: it included Markdown (```) although this was forbidden, and it used the `roundRect` method, which was not in the list of authorized methods (point 4). `roundRect` is a relatively recent Canvas API rather than an ES5 language feature, so it may be unavailable in older browsers.
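A call such as `roundRect` that falls outside the authorized method list can be flagged by scanning for context method calls against an allowlist. The prompt's actual list (point 4) is not reproduced on this page, so `ALLOWED` below is a hypothetical stand-in, and the sketch assumes the 2D context variable is named `ctx`. A regex scan like this is a heuristic, not a JavaScript parser.

```python
import re

# Hypothetical allowlist standing in for the prompt's "point 4";
# the real list is not shown on this page.
ALLOWED = {"beginPath", "arc", "ellipse", "fill", "stroke", "fillRect",
           "moveTo", "lineTo", "closePath", "quadraticCurveTo", "bezierCurveTo"}

# Matches method calls such as ctx.arc( or ctx.roundRect( on a context named ctx.
_CALL = re.compile(r"\bctx\.(\w+)\s*\(")


def disallowed_calls(js_source: str) -> set[str]:
    """Return the ctx methods called in js_source that are not in ALLOWED."""
    return {name for name in _CALL.findall(js_source) if name not in ALLOWED}


code = "ctx.beginPath(); ctx.roundRect(10, 10, 80, 40, 8); ctx.fill();"
print(disallowed_calls(code))  # {'roundRect'}
```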

google gemini-flash-latest

5.8/10 22.0 s

Tokens 1,209 · Source code 1.4 KB · Time 22.0 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 5.75

Review

The model failed on a critical formatting constraint: it included Markdown code blocks (```) even though the prompt explicitly required 'ONLY raw HTML code'. Although the code is clean and adheres to the ES5 standard and the requested Canvas methods, the visual rendering is quite rudimentary and does not fill the space harmoniously (the sheep is merely an accumulation of overlapping circles).

google gemini-flash-lite-latest

4.9/10 3.2 s

Tokens 1,106 · Source code 1.2 KB · Time 3.2 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 4.88

Review

The model failed completely on the formatting constraint: it included Markdown code blocks (```) even though the prompt explicitly demanded ONLY raw HTML code, without any explanation or markdown. From a technical standpoint, the code is clean and adheres well to ES5 constraints and Canvas methods, but the structural error is a major violation of the strict instructions.

mistral mistral-large-latest

5.4/10 12.1 s

Tokens 917 · Source code 2.3 KB · Time 12.1 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 5.38

Review

The model failed on a critical formatting constraint: it included Markdown code blocks (```) when the prompt explicitly required 'RAW HTML code ONLY'. From a technical standpoint, the code perfectly adheres to the requested ES5 constraints and Canvas methods, and the rendering is cute, although the random texture is a bit messy.

mistral mistral-medium-latest

4.4/10 9.1 s

Tokens 1,066 · Source code 2.6 KB · Time 9.1 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 4.38

Review

The model failed on a critical formatting constraint: it included markdown code blocks (```) despite being explicitly instructed to output ONLY raw HTML code. Visually, the rendering is very rudimentary and looks like nothing more than a cluster of ellipses, lacking any real "cute sheep" aesthetic. However, the code strictly adheres to the technical constraints of ES5 JavaScript and the requested use of Canvas methods.

mistral mistral-small-latest

4.5/10 5.4 s

Tokens 852 · Source code 1.7 KB · Time 5.4 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 4.5

Review

The model failed completely on the most critical formatting constraint: it included markdown tags (```) and extra text, even though the prompt demanded 'ONLY raw HTML code' starting with <canvas. Although the code itself is of good quality and adheres to ES5 constraints, the inclusion of code blocks makes the output non-compliant with the strict output instructions.

openai gpt-4o-mini

4.0/10 5.5 s

Tokens 636 · Source code 896 B · Time 5.5 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity)

Review

The model failed on the most critical formatting constraint: it included Markdown code block tags (```) despite being strictly forbidden from doing so. Additionally, the drawing is poorly proportioned because the legs are drawn outside the canvas boundaries (y-coordinates of 370 with a height of 80, exceeding 400), and the colors do not perfectly adhere to the grey/dark face instruction. The code does, however, respect the ES5 constraints and the requested Canvas methods.
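The out-of-bounds legs follow from simple arithmetic: a rectangle that starts at y = 370 and is 80 units tall ends at y = 450, past the bottom edge of the 400x400 canvas. A minimal Python sketch of that bounds check, assuming the legs are drawn with `fillRect` and a positive width and height; the x position and width in the example are hypothetical, since the review gives only the y-coordinate and height.

```python
CANVAS_W, CANVAS_H = 400, 400  # canvas size stated in the task


def rect_fits(x: float, y: float, w: float, h: float) -> bool:
    """True if fillRect(x, y, w, h) with positive w and h stays on the canvas."""
    return x >= 0 and y >= 0 and x + w <= CANVAS_W and y + h <= CANVAS_H


# Review's numbers: top at y = 370, height 80, so the bottom edge is at 450.
print(370 + 80)                     # 450
# Hypothetical x = 150 and width = 20; only y and height come from the review.
print(rect_fits(150, 370, 20, 80))  # False
```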

openai gpt-5.4

5.0/10 26.6 s

Tokens 1,748 · Source code 5.2 KB · Time 26.6 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity)

Review

The model failed on the most critical formatting constraints: it included markdown code blocks (```) and failed to follow the instruction to output ONLY raw code, which makes the code invalid as-is. However, from a technical standpoint, the code perfectly adheres to the requested ES5 constraints and Canvas methods, and the visual rendering is a cute, well-composed sheep.

openai gpt-5.4-mini

4.9/10 8.8 s

Tokens 1,031 · Source code 2.4 KB · Time 8.8 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 4.88

Review

The model failed on the most critical formatting constraint: it included Markdown code block tags (```) even though the prompt explicitly required 'ONLY raw HTML code' and 'No ``` markdown'. While the code is technically correct and adheres to ES5 constraints, the violation of the output structure is major.

openai gpt-5.4-nano

6.0/10 10.4 s

Tokens 1,577 · Source code 4.6 KB · Time 10.4 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity)

Review

The visual rendering is successful and cute, making good use of curves to achieve a fluffy look. However, the model failed on several critical formatting constraints: it included Markdown code blocks (```) even though the prompt required 'ONLY raw HTML code' and 'No markdown', and it wrapped the drawing code in a drawSheep() function that was not part of the exact requested structure. Consequently, fidelity is heavily penalized despite the quality of the drawing.

openai gpt-5.4-pro

5.0/10 278.0 s

Tokens 1,620 · Source code 4.7 KB · Time 278.0 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity)

Review

The model failed critically on formatting constraints: it included markdown code blocks (```) and began with introductory text, violating the strict output rules. However, the JavaScript code is high quality, adheres perfectly to ES5 constraints, and produces a cute, well-composed drawing of a sheep on the canvas.

openai gpt-5.5

5.1/10 40.0 s

Tokens 1,637 · Source code 4.8 KB · Time 40.0 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 5.13

Review

The visual rendering is excellent and very cute, staying true to the requested aesthetic. However, the model completely failed on the strict formatting constraints: it included Markdown code blocks (```) even though the prompt explicitly demanded 'ONLY raw HTML code' and 'No ``` markdown'. This direct violation of the structural and output rules significantly lowers the fidelity score.

openai gpt-5.5-pro

5.5/10 405.1 s

Tokens 1,685 · Source code 5.0 KB · Time 405.1 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 5.5

Review

The code is technically excellent, scrupulously adhering to the ES5 constraints and the requested Canvas methods. However, the fidelity is very poor: the model included Markdown code blocks (```) even though the prompt explicitly required 'ONLY raw HTML code without any markdown,' which is a critical failure for this type of task. Furthermore, the drawing is off-center and appears incomplete or poorly proportioned relative to the 400x400 canvas.

productivia matania-latest

5.9/10 5.9 s

Tokens 810 · Source code 1.6 KB · Time 5.9 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 5.88

Review

The model failed on several critical formatting constraints: it included Markdown code blocks (```) when the prompt explicitly required raw HTML only, and it added an unsolicited background. However, the code perfectly adheres to the ES5 technical constraints and the use of Canvas methods, producing a visually satisfying sheep.

xai grok-4-1-fast-non-reasoning

4.8/10 11.8 s

Tokens 877 · Source code 1.8 KB · Time 11.8 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 4.75

Review

The model failed on a critical formatting constraint: it included Markdown code block tags (```) even though the prompt explicitly required 'RAW HTML code ONLY' and 'No ``` markdown'. Although the code itself is of high quality and adheres to ES5 constraints, this extra text makes the output invalid for direct integration, which heavily lowers the fidelity score.

xai grok-4-1-fast-reasoning

5.9/10 85.5 s

Tokens 1,145 · Source code 2.9 KB · Time 85.5 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 5.88

Review

The model failed on a critical formatting constraint: it included markdown code blocks (```) when the prompt explicitly required 'ONLY raw HTML code'. On a technical level, the code adheres to the requested ES5 standards and Canvas methods, and the rendering is a correct interpretation of a sheep, although the aesthetics remain very geometric.

Landscape SVG

svg

anthropic claude-haiku-4-5-20251001

9.0/10 3.4 s

Tokens 850 · Source code 1.0 KB · Time 3.4 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity)

Review

The model perfectly adhered to all technical constraints: exact SVG format, no gradients, and output without superfluous text. All requested elements (hill, sun, two clouds, path, tree) are present and well-positioned. The rendering is clean and faithful, although the UX is neutral as it is a static image.
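The no-gradient constraint cited throughout these Landscape SVG reviews can be verified by looking for gradient elements in the parsed document. A minimal Python sketch with `xml.etree.ElementTree`, assuming gradients would appear as `<linearGradient>` or `<radialGradient>` elements; it checks element names only. `uses_gradient` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

GRADIENT_TAGS = {"linearGradient", "radialGradient"}


def uses_gradient(svg_source: str) -> bool:
    """True if the SVG contains a gradient element, namespaced or not."""
    root = ET.fromstring(svg_source)
    # root.iter() walks the root and every descendant element.
    return any(el.tag.split("}")[-1] in GRADIENT_TAGS for el in root.iter())


flat = ('<svg xmlns="http://www.w3.org/2000/svg" width="400" height="400">'
        '<rect width="400" height="400" fill="#87CEEB"/></svg>')
shaded = ('<svg xmlns="http://www.w3.org/2000/svg"><defs><linearGradient id="g"/>'
          '</defs><rect fill="url(#g)" width="10" height="10"/></svg>')
print(uses_gradient(flat))    # False
print(uses_gradient(shaded))  # True
```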

anthropic claude-opus-4-6

9.8/10 17.5 s

Tokens 1,671 · Source code 2.8 KB · Time 17.5 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 9.75

Review

The model perfectly adhered to all technical constraints, specifically the absence of gradients and the raw output format. The composition is balanced, aesthetic, and includes all requested elements (hill, sun, clouds, path, tree) with beautiful attention to detail, such as the flowers and bushes.

anthropic claude-opus-4-7

9.1/10 6.6 s

Tokens 1,085 · Source code 879 B · Time 6.6 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 9.13

Review

The model strictly adheres to all technical constraints: first-line format, absence of gradients, raw output without text, and the requested content. The visual rendering is clean and balanced, albeit very minimalist. The UX score is neutral as it is a static image without interaction.

anthropic claude-sonnet-4-6

9.8/10 13.9 s

Tokens 1,514 · Source code 2.4 KB · Time 13.9 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 9.75

Review

The model perfectly adhered to all technical constraints, specifically the absence of gradients and the raw output format. The composition is balanced, aesthetic, and includes all requested elements (hill, sun, clouds, path, tree), with extra attention paid to details such as the flowers. The code is clean, structured, and easy to read.

google gemini-flash-latest

9.9/10 13.5 s

Tokens 783 · Source code 847 B · Time 13.5 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 9.88

Review

The model perfectly adhered to all technical constraints, particularly the first-line format and the absence of superfluous text. The visual output is clean, balanced, and uses simple geometric shapes as requested (solid colors, no gradients). All elements from the prompt (hill, sun, two clouds, path, and tree) are present and well-positioned.

google gemini-flash-lite-latest

9.0/10 3.0 s

Tokens 679 · Source code 577 B · Time 3.0 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity)

Review

The model perfectly adhered to all technical constraints, specifically the raw output format, the absence of gradients, and the requested SVG structure. All landscape elements (hill, sun, clouds, path, tree) are present and well-positioned. The code is clean and concise.

mistral mistral-large-latest

9.6/10 9.7 s

Tokens 593 · Source code 1017 B · Time 9.7 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 9.63

Review

The model perfectly adhered to all technical constraints, particularly the first-line format and the absence of explanatory text. The visual output is clean, featuring simple shapes and solid colors as requested, creating a coherent and balanced landscape.

mistral mistral-medium-latest

8.9/10 6.6 s

Tokens 550 · Source code 846 B · Time 6.6 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 8.88

Review

The model perfectly adhered to all technical constraints, particularly the strict output format and the absence of gradients. All requested elements (hill, sun, two clouds, path, tree) are present. The visual rendering is simple yet effective and well-structured.

mistral mistral-small-latest

8.8/10 5.5 s

Tokens 664 · Source code 1.3 KB · Time 5.5 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 8.75

Review

The model perfectly adheres to all technical constraints (format, absence of text, solid colors, starting tag). All requested elements (hill, sun, two clouds, path, tree) are present. The visual rendering is simple yet functional, although the tree is very minimalist and the path's geometric structure is a bit strange.

openai gpt-4o-mini

9.0/10 4.4 s

Tokens 464 · Source code 616 B · Time 4.4 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity)

Review

The model perfectly adhered to all technical constraints, specifically the absence of gradients, the first-line format, and the lack of explanatory text. All requested elements (hill, sun, clouds, path, tree) are present and well-positioned. The code is clean, compact, and functional.

openai gpt-5.4

9.0/10 5.9 s

Tokens 560 · Source code 883 B · Time 5.9 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity)

Review

The model perfectly adhered to all technical constraints: the SVG format is correct, there is no superfluous text, no gradients, and all requested elements are present. The visual rendering is clean and well-composed, although the lack of interaction (which is natural for a static SVG) limits the UX score.

openai gpt-5.4-mini

9.9/10 5.0 s

Tokens 708 · Source code 1.0 KB · Time 5.0 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 9.88

Review

The model perfectly adhered to all technical constraints, particularly the first-line format and the absence of explanatory text. The visual output is clean, uses solid colors as requested, and the composition is well-balanced with all required elements (hill, sun, two clouds, path, and tree).

openai gpt-5.4-nano

9.8/10 8.0 s

Tokens 902 · Source code 1.4 KB · Time 8.0 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 9.75

Review

The model perfectly adhered to all technical constraints, specifically the absence of gradients and the formatting of the first line. The composition is balanced and aesthetic, and all requested elements (hill, sun, clouds, path, tree) are present and well-proportioned.

openai gpt-5.4-pro

9.0/10 240.1 s

Tokens 597 · Source code 1.0 KB · Time 240.1 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity)

Review

The model strictly adheres to all technical constraints: exact SVG format, absence of gradients, and raw output without text. The composition is balanced, and all requested elements (hill, sun, clouds, path, tree) are present and visually well-rendered. The UX score is neutral as it is a static image with no possible interaction.

openai gpt-5.5

9.0/10 11.4 s

Tokens 677 · Source code 1.4 KB · Time 11.4 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity)

Review

The model perfectly adhered to all technical constraints: the SVG format is correct, there are no explanations present, the colors are solid, and all requested elements (hill, sun, two clouds, path, tree) are included. The visual rendering is clean and well-composed, although the UX is neutral since it is a static image.

openai gpt-5.5-pro

9.9/10 112.6 s

Tokens 605 · Source code 1.0 KB · Time 112.6 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 9.88

Review

The model perfectly adhered to all technical constraints, specifically the absence of gradients and the format of the first line. The composition is balanced and aesthetic, and all requested elements (hill, sun, two clouds, path, tree) are present and well-rendered in pure SVG.

productivia matania-latest

9.6/10 4.1 s

Tokens 605 · Source code 1.0 KB · Time 4.1 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 9.63

Review

The model perfectly adhered to all technical constraints, specifically the first-line format, the absence of gradients, and the absence of explanatory text. The landscape is complete (hills, sun, clouds, path, tree) and the composition is balanced. The code is clean, well-structured, and uses simple SVG shapes as requested.

xai grok-4-1-fast-non-reasoning

8.1/10 4.2 s

Tokens 560 · Source code 884 B · Time 4.2 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 8.13

Review

The model adheres almost perfectly to all technical constraints, notably the absence of gradients and the raw output format. The visual rendering is clean, although the composition is a bit cluttered due to the presence of four clouds instead of the requested two, which slightly affects fidelity and completeness.

xai grok-4-1-fast-reasoning

9.6/10 16.7 s

Tokens 576 · Source code 949 B · Time 16.7 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 9.63

Review

The model strictly adheres to all technical constraints: the exact SVG format, no gradients, and no textual explanations. All requested elements (hill, sun, clouds, path, tree) are present and well-positioned. The code is clean, and the visual composition is balanced.

Dashboard SVG

svg

anthropic claude-haiku-4-5-20251001

9.3/10 5.0 s

Tokens 1,181 · Source code 1.7 KB · Time 5.0 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 9.25

Review

The model perfectly adhered to all technical constraints, specifically the strict SVG format, the absence of gradients, and the exclusive use of <text>. The visual hierarchy is respected with a dominant temperature, although the sun's composition is slightly unbalanced in relation to the rest of the dashboard.
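The "exclusive use of <text>" constraint is read here as: every label is an SVG `<text>` element, and none is HTML embedded through `<foreignObject>`. That reading is an assumption, since the prompt itself is not shown on this page. Under it, a minimal Python check with `xml.etree.ElementTree` could look like this; `text_only_labels` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def text_only_labels(svg_source: str) -> bool:
    """True if the SVG has at least one <text> element and no <foreignObject>."""
    root = ET.fromstring(svg_source)
    # Strip any "{namespace}" prefix to compare local element names.
    local = [el.tag.split("}")[-1] for el in root.iter()]
    return "text" in local and "foreignObject" not in local


svg_text = ('<svg xmlns="http://www.w3.org/2000/svg">'
            '<text x="20" y="80" font-size="64">21</text></svg>')
svg_html = ('<svg xmlns="http://www.w3.org/2000/svg"><foreignObject width="100" height="40">'
            '<div xmlns="http://www.w3.org/1999/xhtml">21</div></foreignObject></svg>')
print(text_only_labels(svg_text))  # True
print(text_only_labels(svg_html))  # False
```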

anthropic claude-opus-4-6

9.6/10 16.5 s

Tokens 1,749 · Source code 3.3 KB · Time 16.5 s

Matania Judgment (Rendering, Code quality, Completeness, Fidelity): Overall 9.63

Review

The model perfectly adhered to all technical constraints, including the strict output format and the absence of gradients. The visual rendering is clean, the hierarchy is respected, and the code is very well-structured, featuring the pertinent use of clipPath for the footer.

anthropic claude-opus-4-7

9.8/10 10.3 s

Tokens 1 707

Source code 2.0 KB

Time 10.3 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all technical constraints, specifically the exclusive use of <text>, the absence of gradients, and the mandated SVG format. The visual hierarchy is excellent, with the temperature as the dominant element, and the aesthetic is clean and modern despite the simplicity of the shapes.

anthropic claude-sonnet-4-6

9.6/10 13.5 s

Tokens 1 667

Source code 3.0 KB

Time 13.5 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all technical and aesthetic constraints of the prompt. The code is clean and structured, and the visual output scrupulously respects the requested hierarchy (dominant temperature, sun icon, forecasts, and footer). The use of solid colors and the <text> element complies with the strict rules imposed.

google gemini-flash-latest

9.6/10 11.0 s

Tokens 1 393

Source code 2.1 KB

Time 11.0 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all technical constraints, specifically the absence of gradients and the required SVG format. The visual hierarchy is excellent, with a dominant temperature reading and a balanced composition. The code is clean, well-structured, and addresses the prompt point by point.

google gemini-flash-lite-latest

9.6/10 2.8 s

Tokens 895

Source code 1.1 KB

Time 2.8 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all technical and stylistic constraints. The code is clean, strictly follows the requested format (no superfluous text), and the visual hierarchy is excellent, with a dominant temperature reading and a balanced composition. All requested elements (sun, forecasts, footer) are present and comply with the solid color rules.

mistral mistral-large-latest

9.6/10 10.7 s

Tokens 794

Source code 1.7 KB

Time 10.7 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all technical and content constraints. The code is clean, structured, and respects the prohibition of gradients. The visual output is clear, maintaining a proper hierarchy and a polished aesthetic for pure SVG.

mistral mistral-medium-latest

9.6/10 10.5 s

Tokens 739

Source code 1.4 KB

Time 10.5 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all technical and content constraints. The code is clean, well-structured, and strictly respects the prohibition of gradients. The visual output is clear, maintaining a proper hierarchy and an effective minimalist aesthetic.

mistral mistral-small-latest

7.8/10 3.5 s

Tokens 676

Source code 1.2 KB

Time 3.5 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

7.75

Review

The model perfectly adheres to the content and style constraints (solid colors, no gradients, visual hierarchy). However, there is a major syntax error in the source code: the closing tag is </svg (the '>' is missing), which prevents valid rendering in many environments. The visual composition is correct, but the sun icon is very rudimentary.

openai gpt-4o-mini

8.6/10 10.9 s

Tokens 593

Source code 977 B

Time 10.9 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

8.63

Review

The model strictly adheres to all technical constraints and the requested content. The code is clean and minimalist. However, the visual output is very rudimentary and lacks real composition (the elements are simply stacked vertically), although it fully accomplishes the requested 'mini dashboard.'

openai gpt-5.4

9.8/10 16.2 s

Tokens 1 369

Source code 3.9 KB

Time 16.2 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all technical and aesthetic constraints. The code is clean, well-structured, and strictly follows the instruction to avoid using gradients. The visual output is professional, featuring a clear hierarchy and excellent readability.

openai gpt-5.4-mini

9.8/10 15.8 s

Tokens 1 260

Source code 3.5 KB

Time 15.8 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all technical and creative constraints. The code is clean, uses only SVG shapes and text as requested, and the visual hierarchy is excellent, with a dominant, highly legible temperature reading. The aesthetic is modern and minimalist and respects the prohibition of gradients.

openai gpt-5.4-nano

9.5/10 8.6 s

Tokens 1 155

Source code 3.4 KB

Time 8.6 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.5

Review

The model perfectly adhered to all technical and visual constraints. The code is clean and well-structured, using transform groups, and the rendering strictly follows the requested hierarchy (dominant temperature, sun icon, forecasts, and footer). The use of solid colors and the <text> tag complies with the prompt's strict rules.

openai gpt-5.4-pro

9.5/10 358.3 s

Tokens 1 801

Source code 5.6 KB

Time 358.3 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.5

Review

The model perfectly adhered to all technical constraints, including the strict SVG format, the absence of gradients, and the use of <text> tags. The visual rendering is clean, featuring a clear hierarchy and a modern aesthetic that exceeds the prompt's minimum expectations.

openai gpt-5.5

9.8/10 13.7 s

Tokens 1 100

Source code 2.9 KB

Time 13.7 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all technical constraints, specifically the absence of gradients and the exclusive use of <text>. The visual hierarchy is excellent, with the temperature reading as the dominant element and a very clean, professional layout.

openai gpt-5.5-pro

9.8/10 96.2 s

Tokens 1 090

Source code 2.8 KB

Time 96.2 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all technical and aesthetic constraints. The visual hierarchy is excellent, the use of solid colors is consistent with the prompt, and the SVG code is clean, structured, and free of any gradients or filters. The final output is professional and complete.

productivia matania-latest

9.6/10 5.6 s

Tokens 718

Source code 1.7 KB

Time 5.6 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all technical constraints, specifically the absence of gradients, the use of <text>, and the strict output format. The visual output is clean, featuring a clear hierarchy and a balanced composition that matches the request exactly.

xai grok-4-1-fast-non-reasoning

9.3/10 6.7 s

Tokens 908

Source code 2.1 KB

Time 6.7 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.25

Review

The model strictly adheres to all technical and content constraints, including the required SVG format, solid colors, and visual hierarchy. The output is clean and legible, although the choice of pink for the background color is a bit visually aggressive. The code is perfectly structured and meets all expectations.

xai grok-4-1-fast-reasoning

9.5/10 25.7 s

Tokens 967

Source code 2.4 KB

Time 25.7 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.5

Review

The model perfectly adhered to all technical constraints, specifically the absence of gradients and the use of solid colors. The visual hierarchy is excellent, with a dominant temperature reading, and the SVG output is clean and well-structured. All requested data (temperature, icon, forecast, and footer) are present and correctly arranged.

Mandala Canvas

html

anthropic claude-haiku-4-5-20251001

9.0/10 5.8 s

Tokens 1 403

Source code 2.3 KB

Time 5.8 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

Review

The model perfectly adheres to all technical constraints: the imposed HTML structure, the use of ES5 JavaScript, the color limitations, and compliance with the raw output format. The visual rendering is aesthetic and well-centered, with a radius exceeding 180px. The UX is neutral, as it is a static creation without any requested interaction.

anthropic claude-opus-4-6

9.9/10 28.3 s

Tokens 2 570

Source code 5.4 KB

Time 28.3 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.88

Review

The model perfectly adhered to all technical constraints: strict HTML structure, use of ES5 JavaScript (var), and compliance with dimensions. The mandala is visually rich and complex, harmoniously utilizing the four authorized colors. The code is clean, modular, and very well-structured.

anthropic claude-opus-4-7

9.9/10 15.5 s

Tokens 2 170

Source code 2.5 KB

Time 15.5 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.88

Review

The model perfectly adhered to all technical constraints, specifically the use of ES5, the specific HTML structure, and the output format. The visual rendering is aesthetic, featuring a harmonious color palette and a complex radial symmetry that effectively fills the canvas.

anthropic claude-sonnet-4-6

9.1/10 17.9 s

Tokens 2 025

Source code 3.6 KB

Time 17.9 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.13

Review

The model perfectly adhered to all technical constraints: use of ES5, the mandated HTML structure, compliance with the output format, and the specified number of colors. The visual rendering is magnificent, complex, and fills the canvas well with successful radial symmetry. Note: the UX is neutral because this is a static image with no requested interaction; it receives an average score by default for lack of any dynamic behavior.

google gemini-flash-latest

9.0/10 13.1 s

Tokens 1 098

Source code 1.7 KB

Time 13.1 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

Review

The model strictly adheres to all technical constraints: the mandated HTML structure, the use of ES5, the number of segments, and the mandala radius. The visual output is clean and aesthetic, featuring a coherent color palette. The UX is rated 5 because it is a static image without interaction; while this complies with the prompt, it limits the user experience.

google gemini-flash-lite-latest

8.6/10 2.7 s

Tokens 913

Source code 1.3 KB

Time 2.7 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

8.63

Review

The model perfectly adhered to all strict technical constraints (structure, ES5, output format, dimensions, colors). The visual rendering is clean and symmetrical, although the design is relatively simple for a mandala. The lack of interactivity is expected since nothing was requested, which explains the neutral UX score.

mistral mistral-large-latest

5.5/10 15.5 s

Tokens 890

Source code 2.0 KB

Time 15.5 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

5.5

Review

The model failed on a critical technical constraint: the color '#d7327' is invalid (it is missing a hexadecimal character), which could break the rendering or lead to unpredictable behavior. Although the code structure adheres to ES5 and the requested format, the visual output is mathematically confusing and fails to produce a coherent, aesthetic mandala. Fidelity is penalized by this syntax error in the requested color data.
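A palette entry such as '#d7327', which has only five hex digits, can be caught before anything is drawn. CSS hex notation, which canvas `fillStyle` and `strokeStyle` accept, allows 3, 4, 6 or 8 hex digits after '#'. In a canvas context, an unparsable string assigned to those properties is ignored, so the previously set color stays in effect. Below is a minimal validator sketch. The function names are hypothetical, and the other palette entries in the usage check are illustrative, not taken from the model's output.

```python
import re

# CSS hex color notation: '#' followed by exactly 3, 4, 6 or 8 hex digits.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})")


def is_valid_hex_color(value: str) -> bool:
    """True if the whole string is a well-formed hex color."""
    return HEX_COLOR.fullmatch(value) is not None


def invalid_colors(palette: list[str]) -> list[str]:
    """Return the palette entries that are not valid hex colors."""
    return [c for c in palette if not is_valid_hex_color(c)]
```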

mistral mistral-medium-latest

8.9/10 9.3 s

Tokens 876

Source code 2.0 KB

Time 9.3 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

8.88

Review

The model strictly adheres to all technical constraints: HTML structure, ES5 syntax, no superfluous text, and the use of 12 symmetry segments. The visual rendering is clean and harmonious, although the geometric complexity remains relatively simple. The lack of interactivity is expected since the prompt did not require it, which explains the neutral UX score.

mistral mistral-small-latest

6.3/10 3.8 s

Tokens 777

Source code 1.6 KB

Time 3.8 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

6.25

Review

The model strictly adheres to the technical constraints (ES5, HTML structure, canvas ID) and formatting instructions. However, the drawing algorithm is logically flawed: it draws complete circles within a rotation loop, which unnecessarily overlaps entire layers at each segment, resulting in a flat rendering that lacks radial symmetry (resembling simple concentric circles rather than a complex mandala). Fidelity is penalized by this geometric design flaw despite the correct adherence to syntax rules.
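The flaw described here follows from basic geometry. Rotating the context about the canvas center leaves any circle centered on that point unchanged, so each pass of the loop redraws the same circle. Radial symmetry instead needs a motif offset from the center and placed at angles 2πk/n for k = 0 to n - 1. Below is a minimal Python sketch of that placement. It assumes the loop rotates about the canvas center; the function names and the 200 px center are hypothetical.

```python
import math


def rotate_about(px, py, cx, cy, theta):
    """Rotate the point (px, py) by theta radians about (cx, cy)."""
    dx, dy = px - cx, py - cy
    return (cx + dx * math.cos(theta) - dy * math.sin(theta),
            cy + dx * math.sin(theta) + dy * math.cos(theta))


def segment_centers(cx, cy, offset, n):
    """Centers of n copies of a motif placed `offset` px from (cx, cy)."""
    return [rotate_about(cx + offset, cy, cx, cy, 2 * math.pi * k / n)
            for k in range(n)]


# The rotation center maps to itself, so a circle centered there never moves.
print(rotate_about(200, 200, 200, 200, math.pi / 6))  # (200.0, 200.0)
# An offset motif lands in n distinct places around the center.
print(len({(round(x, 6), round(y, 6)) for x, y in segment_centers(200, 200, 120, 12)}))  # 12
```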

openai gpt-4o-mini

8.3/10 8.8 s

Tokens 595

Source code 881 B

Time 8.8 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

8.25

Review

The model strictly adheres to all technical constraints (ES5, exact structure, output format, dimensions). However, the visual rendering is very poor: the algorithm draws overlapping circular arcs in a disorganized manner instead of building a harmonious geometric mandala, so the result looks like colored blotches rather than complex symmetrical patterns.

openai gpt-5.4

9.1/10 14.9 s

Tokens 1 187

Source code 3.2 KB

Time 14.9 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.13

Review

The model perfectly adhered to all technical constraints: the imposed HTML structure, strict use of ES5 (var), compliance with the output format, and the required levels of symmetry. The visual rendering is aesthetic, featuring a harmonious color palette and a composition that fills the canvas well. The UX is neutral since it is a static image, but the result meets the expectations of a geometric mandala.

openai gpt-5.4-mini

9.1/10 10.7 s

Tokens 1 220

Source code 3.3 KB

Time 10.7 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.13

Review

The model perfectly adhered to all technical constraints: use of ES5, the imposed code structure, multiple symmetries (8, 12, and 16 segments), and compliance with the minimum radius. The visual output is aesthetic and harmonious. The UX is rated 5 because it is a static image without interaction, which complies with the request but limits the user experience.

openai gpt-5.4-nano

9.4/10 14.7 s

Tokens 1 625

Source code 4.9 KB

Time 14.7 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.38

Review

The model perfectly adhered to all technical constraints: exact HTML structure, use of ES5 JavaScript, compliance with the minimum radius, and color limitations. The visual output is complex, aesthetic, and makes judicious use of the requested radial symmetry. The code is clean, well-structured, and functional.

openai gpt-5.4-pro

9.8/10 252.5 s

Tokens 1 331

Source code 3.7 KB

Time 252.5 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all technical constraints: the use of ES5 (var), the exact canvas structure, and the raw output format. The rendering is aesthetically superb, featuring complex radial symmetry and a harmonious color palette that perfectly fills the space.

openai gpt-5.5

9.8/10 22.3 s

Tokens 1 165

Source code 3.1 KB

Time 22.3 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all technical constraints, particularly the use of ES5 JavaScript and the requested strict structure. The visual output is magnificent and complex, utilizing a highly harmonious color palette that respects the 4-hue limit. The mandala fills the canvas perfectly with sophisticated radial symmetry.

openai gpt-5.5-pro

9.9/10 305.3 s

Tokens 1 183

Source code 3.2 KB

Time 305.3 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.88

Review

The model perfectly adhered to all technical constraints, specifically the use of ES5, the required HTML structure, and the absence of superfluous text. The visual output is aesthetic, featuring a harmonious color palette and a complex radial symmetry that effectively fills the canvas. The code is clean, modular, and very well-structured.

productivia matania-latest

9.9/10 5.8 s

Tokens 903

Source code 2.2 KB

Time 5.8 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

9.88

Review

The model perfectly adhered to all technical constraints, specifically the use of ES5 JavaScript and the required HTML structure. The mandala is aesthetically successful, featuring three distinct levels of radial symmetry (12, 16, and 8 segments) and a harmonious color palette. The visual rendering is clean, centered, and fills the designated space well.

xai grok-4-1-fast-non-reasoning

9.0/10 4.8 s

Tokens 954

Source code 2.3 KB

Time 4.8 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

Review

The model scrupulously adheres to all technical constraints: the imposed HTML structure, the use of ES5 JavaScript, and the raw output format. The visual rendering is aesthetic, featuring complex radial symmetry and harmonious colors. The UX is neutral since it is a static image, which is consistent with the request to generate a mandala.

xai grok-4-1-fast-reasoning

9.0/10 23.3 s

Tokens 785

Source code 1.6 KB

Time 23.3 s

Matania Judgment

Rendering

Code quality

Completeness

Fidelity

Overall

Review

The model strictly adheres to all technical constraints: exact HTML structure, use of ES5, raw output format, and mandala dimensions (the 195px radius meets the required minimum). The visual rendering is aesthetic and well-structured, incorporating the requested symmetry levels. The UX is neutral since it is a static image, but the execution meets all expectations.