Presentations

Navigable multi-slide HTML presentations, judged on layout and navigation.

5 tasks · 19 models tested · 95 results

Startup pitch

html

anthropic claude-haiku-4-5-20251001

9.8/10 12.4 s

Tokens 2 669

Source code 5.6 KB

Time 12.4 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all technical constraints, specifically the container size (400x400), keyboard and button navigation, and the position indicator. The visual rendering is professional, featuring excellent typographic hierarchy and a modern, clean design.

anthropic claude-opus-4-6

9.8/10 55.3 s

Tokens 5 528

Source code 13.6 KB

Time 55.3 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all technical constraints, specifically the 400x400 container and the complete navigation system (keyboard + buttons + indicator). The visual rendering is exceptional for pure HTML/CSS, featuring polished typographic hierarchy and modern gradients that capture the aesthetic of a startup pitch.

anthropic claude-opus-4-7

9.8/10 22.3 s

Tokens 2 609

Source code 4.2 KB

Time 22.3 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all constraints, including the specific container dimensions (400x400) and the complex navigation mechanisms. The visual rendering is professional, featuring excellent typographic hierarchy and a modern design. The code is concise, efficient, and fully functional.

anthropic claude-sonnet-4-6

9.8/10 46.6 s

Tokens 5 219

Source code 12.4 KB

Time 46.6 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all technical constraints, specifically the 400x400 container and the complex navigation (keyboard + buttons + indicator). The visual rendering is high quality, featuring polished typography and a clear hierarchy typical of a professional pitch. The code is clean and well-structured, although truncated in the provided snippet (which is normal for a code test).

google gemini-flash-latest

9.6/10 10.2 s

Tokens 2 025

Source code 4.2 KB

Time 10.2 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all prompt constraints, specifically the 400x400 format, keyboard and button navigation, and the exact number of slides. The visual output is clean and professional with a clear hierarchy, although the design remains quite minimalist.

google gemini-flash-lite-latest

9.3/10 3.6 s

Tokens 1 042

Source code 2.1 KB

Time 3.6 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.25

Review

The model perfectly adhered to all technical and functional constraints of the prompt, including keyboard navigation, buttons, the position indicator, and the 400x400 container. The code is clean and efficient. The visual output is functional but very minimalist, lacking the true typographic and design "polish" required for a professional pitch, although it does respect the structural requirements.

mistral mistral-large-latest

9.6/10 25.6 s

Tokens 1 775

Source code 5.6 KB

Time 25.6 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all technical constraints, including the container size (400x400), keyboard and button navigation, and the indicator format. The code is clean and semantic, and the visual rendering is professional despite the small size of the presentation area.

mistral mistral-medium-latest

9.1/10 36.6 s

Tokens 4 260

Source code 15.1 KB

Time 36.6 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.13

Review

The model perfectly adhered to all technical constraints: the container is indeed 400x400, arrow and button navigation is implemented with a position indicator, and the design is clean. The code is slightly truncated at the end of the provided block, but the logical structure is excellent and aligns with the requested pitch.

mistral mistral-small-latest

9.6/10 41.5 s

Tokens 1 223

Source code 3.3 KB

Time 41.5 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all technical constraints: the container is indeed 400x400, both keyboard and button navigation are functional, and the position indicator is present. The visual rendering is clean, minimalist, and professional, featuring a clear typographic hierarchy despite the simplicity of the code.

openai gpt-4o-mini

8.8/10 19.2 s

Tokens 1 204

Source code 3.3 KB

Time 19.2 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

8.75

Review

The model perfectly adheres to all technical and functional constraints (navigation, number of slides, 400x400 container, indicator). However, the visual output is extremely minimalist and lacks the requested "polished typography" and "clear visual hierarchy," settling instead for a very basic, raw HTML style.

openai gpt-5.4

6.9/10 39.8 s

Tokens 2 923

Source code 9.9 KB

Time 39.8 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

6.88

Review

The visual rendering is excellent, featuring polished typography and a clear hierarchy that perfectly respects the 400x400 container. However, the code is truncated (incomplete), which prevents the presentation from reaching the requested 5 slides and makes it impossible to navigate through the entire content. Fidelity is heavily penalized by this abrupt cutoff of the source code.

openai gpt-5.4-mini

2.4/10 13.5 s

Tokens 2 535

Source code 8.4 KB

Time 13.5 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

2.38

Review

The model failed critically: the source code is truncated (it cuts off abruptly in the middle of a tag), making the HTML file invalid and non-functional. Consequently, navigation, the completeness of the 5 slides, and the code structure are either entirely missing or broken. Only the partial visual rendering of the first slide is acceptable.
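
The truncation failure described in this review can be checked mechanically. The following is a minimal sketch, assuming the generated file is available as a string. The names `OpenTagTracker` and `truncation_report` are hypothetical and are not part of the benchmark's tooling, and the mid-tag check is only a heuristic that may misfire on a bare `<` inside script text.

```python
from html.parser import HTMLParser

# Void elements never take a closing tag, so they are not tracked.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class OpenTagTracker(HTMLParser):
    """Hypothetical helper: keeps a stack of tags opened but not yet closed."""

    def __init__(self):
        super().__init__()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching opener, if there is one.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def truncation_report(source: str) -> dict:
    """Return two simple signals that an HTML string was cut off."""
    tracker = OpenTagTracker()
    tracker.feed(source)
    return {
        # Heuristic: the last "<" comes after the last ">".
        "cut_mid_tag": source.rfind("<") > source.rfind(">"),
        # Elements that were opened and never closed.
        "unclosed_tags": list(tracker.open_tags),
    }


complete = "<html><body><div><p>Hi</p><br></div></body></html>"
cut_off = '<html><body><div class="slide"><h1>Title</h1><div cla'
print(truncation_report(complete))
print(truncation_report(cut_off))
```

A non-empty `unclosed_tags` list that includes `html` or `script` matches the failure mode the judge describes: the closing tags and the navigation script never arrived.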

openai gpt-5.4-nano

9.8/10 46.4 s

Tokens 5 258

Source code 19.1 KB

Time 46.4 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all technical constraints, including the 400x400 container, arrow and button navigation, and the position indicator. The visual rendering is extremely professional, featuring excellent typographic hierarchy and a very polished dark mode design. The code is clean, well-structured, and utilizes smooth transitions.

openai gpt-5.4-pro

9.8/10 173.0 s

Tokens 3 421

Source code 11.9 KB

Time 173.0 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all constraints, including the specific 400x400 container size and the navigation requirements (arrows, buttons, and indicator). The visual hierarchy and typography are professional and polished, looking like a real startup pitch. The code is well-structured and highly efficient for a self-contained file.

openai gpt-5.5

8.4/10 26.6 s

Tokens 1 976

Source code 6.3 KB

Time 26.6 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

8.38

Review

The visual rendering is excellent, featuring polished typography and a clear hierarchy that respects the 400x400 container perfectly. The navigation (buttons, indicators, arrows) complies with the instructions. However, the code quality score is severely impacted because the script is truncated at the end (incomplete code), which would render the navigation functionality inactive in a real-world environment despite the correct HTML structure.

openai gpt-5.5-pro

2.6/10 309.3 s

Tokens 2 454

Source code 8.1 KB

Time 309.3 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

2.63

Review

The model failed critically by providing truncated code that stops midway through the 4th slide, leaving the presentation incomplete and non-functional. Navigation is impossible because the JavaScript code and the closing HTML tags are missing, and the requested number of slides (5) was not met. While the visual rendering of the first few slides is high quality, the structural failure and failure to meet completeness constraints heavily penalize the score.

productivia matania-latest

9.8/10 10.1 s

Tokens 1 427

Source code 4.2 KB

Time 10.1 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all technical constraints, including the 400x400 container, keyboard and button navigation, and the position indicator. The design is sleek, professional, and respects a clear visual hierarchy. The code is clean, well-structured, and self-contained.

xai grok-4-1-fast-non-reasoning

9.8/10 8.8 s

Tokens 1 732

Source code 5.4 KB

Time 8.8 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all technical and content constraints. The design is modern, featuring excellent visual hierarchy and polished typography that integrates well within the 400x400 container. The navigation is comprehensive (buttons, indicator, directional keys) and the code is clean and efficient.

xai grok-4-1-fast-reasoning

9.8/10 20.5 s

Tokens 1 481

Source code 4.4 KB

Time 20.5 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all technical and structural constraints. Navigation is complete (keyboard, buttons, indicator), the container respects the 400x400 dimensions, and the design is clean with excellent visual hierarchy. The code is clean, efficient, and free of unnecessary duplication.

Recursion course

html

anthropic claude-haiku-4-5-20251001

9.8/10 12.6 s

Tokens 2 429

Source code 5.7 KB

Time 12.6 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all constraints, including the 400x400 format and keyboard navigation. The output is clean and educational, and the source code is well-structured with effective manual syntax highlighting. The position indicator and buttons function exactly as requested.

anthropic claude-opus-4-6

9.6/10 34.4 s

Tokens 2 986

Source code 6.5 KB

Time 34.4 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model followed all instructions perfectly, including the strict 400x400 constraint and the specific slide topics. The visual design is professional, featuring excellent syntax highlighting and a clear hierarchy. Note: Although the code snippet provided in the prompt was truncated at the end, the logic and structure are complete and functional based on the visible implementation.

anthropic claude-opus-4-7

9.8/10 19.7 s

Tokens 2 432

Source code 3.9 KB

Time 19.7 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model followed every instruction perfectly. The presentation sits inside a 400x400 container, includes the 4 requested slides, implements keyboard and button navigation, and offers a well-styled code block with syntax highlighting. The visual design is clean, educational, and professional.

anthropic claude-sonnet-4-6

9.8/10 28.5 s

Tokens 2 964

Source code 6.5 KB

Time 28.5 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly followed all constraints, including the strict 400x400 size limit and the specific slide content requirements. The visual design is professional, featuring excellent typography and a well-implemented syntax-highlighted code block. Navigation is complete, including buttons, keyboard support (implied in the logic), and the requested 'X/4' indicator.

google gemini-flash-latest

9.8/10 10.2 s

Tokens 1 606

Source code 3.7 KB

Time 10.2 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all constraints, including the 400x400 size, keyboard navigation, and slide format. The code is clean and well-structured, and the visual output is clear, featuring effective syntax highlighting for the code block.

google gemini-flash-lite-latest

9.4/10 3.6 s

Tokens 936

Source code 1.7 KB

Time 3.6 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.38

Review

The model followed all instructions perfectly, including the specific slide topics, the 400x400 constraint, and the navigation requirements (keyboard + buttons + indicator). The code is concise and functional, though the visual styling is extremely minimal/brutalist due to the small container size.

mistral mistral-large-latest

9.6/10 12.1 s

Tokens 1 000

Source code 2.5 KB

Time 12.1 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model followed all instructions perfectly, including the specific constraints on slide content, navigation methods, and dimensions. The code is clean and functional, and the visual rendering is polished and professional despite the minimal design. The syntax highlighting for the code block was implemented exactly as requested.

mistral mistral-medium-latest

5.3/10 16.2 s

Tokens 2 466

Source code 8.2 KB

Time 16.2 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

5.25

Review

The model failed the completeness and fidelity constraints because the source code is truncated at the end (the file cuts off abruptly in the middle of slide 4), rendering the presentation incomplete. Although the visual rendering of the initial slides is clean and adheres to the size and style constraints, the inability to display the final slide and the absence of a functional JavaScript script (either not visible or not included in the snippet) for keyboard/button navigation heavily penalize the score.

mistral mistral-small-latest

9.1/10 7.4 s

Tokens 1 367

Source code 3.9 KB

Time 7.4 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.13

Review

The model perfectly adhered to all prompt constraints, including the number of slides, the 400x400 format, keyboard navigation, and the educational content. The visual output is clean and effective. However, the code quality is penalized by unnecessary duplication of control elements (buttons with different IDs for each slide instead of a single controller), which makes maintenance difficult.
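
The "single controller" the judge has in mind can be sketched independently of the markup. The example below is a language-agnostic model of the navigation state, written in Python as an assumption for illustration: the class name `SlideController` is hypothetical, and clamping at the first and last slide (rather than wrapping around) is a design choice the prompt may not have specified. The key names `ArrowLeft` and `ArrowRight` are the standard `KeyboardEvent.key` values a browser would report.

```python
class SlideController:
    """Hypothetical single source of truth for the current slide."""

    def __init__(self, slide_count: int):
        if slide_count < 1:
            raise ValueError("a deck needs at least one slide")
        self.slide_count = slide_count
        self.index = 0  # zero-based position of the visible slide

    def go(self, step: int) -> int:
        # Clamp so that Previous on slide 1 and Next on the last slide do nothing.
        self.index = max(0, min(self.slide_count - 1, self.index + step))
        return self.index

    def on_key(self, key: str) -> int:
        # Buttons and keyboard share the same go() method.
        if key == "ArrowRight":
            return self.go(1)
        if key == "ArrowLeft":
            return self.go(-1)
        return self.index

    @property
    def indicator(self) -> str:
        # Position indicator in the "X/N" form mentioned in the reviews.
        return f"{self.index + 1}/{self.slide_count}"


deck = SlideController(4)
deck.on_key("ArrowRight")
print(deck.indicator)
```

With one controller, a single pair of Previous/Next buttons can call `go(-1)` and `go(1)`, so no per-slide buttons with distinct IDs are needed.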

openai gpt-4o-mini

9.4/10 10.9 s

Tokens 1 176

Source code 3.0 KB

Time 10.9 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.38

Review

The model followed all instructions perfectly, including the specific constraints on dimensions, slide content, and navigation features. The code is functional, and the syntax highlighting for the factorial example is implemented as requested. The HTML structure is slightly malformed (missing opening tags like `<head>` and `<html>`, and using `</div` instead of `</div>`), but it remains a self-contained and working file.

openai gpt-5.4

9.8/10 20.0 s

Tokens 1 580

Source code 4.6 KB

Time 20.0 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly followed all instructions, including the strict 400x400 constraint and the specific slide content. The code syntax highlighting is manually implemented via spans, which is a clever way to keep the file self-contained, and the navigation (keyboard, buttons, and counter) is fully functional and robust.
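
To make the "manual spans" technique concrete, here is a minimal sketch of how span-based highlighting markup could be produced ahead of time, so that the final HTML needs no external highlighting library. The function name `highlight`, the keyword list, and the CSS class names `kw` and `num` are assumptions for illustration. They are not taken from any model's output.

```python
import html
import re

# Illustrative keyword set; a real slide would list whatever its snippet uses.
KEYWORDS = ("def", "return", "if", "else")
TOKEN = re.compile(r"\b(?:%s)\b|\b\d+\b" % "|".join(KEYWORDS))


def highlight(code: str) -> str:
    """Wrap keywords and integer literals in <span> tags, escaping the rest."""
    parts = []
    last = 0
    for match in TOKEN.finditer(code):
        # Escape the text between tokens so <, > and & cannot break the markup.
        parts.append(html.escape(code[last:match.start()]))
        token = match.group(0)
        css_class = "num" if token.isdigit() else "kw"
        parts.append(f'<span class="{css_class}">{token}</span>')
        last = match.end()
    parts.append(html.escape(code[last:]))
    return "".join(parts)


print(highlight("if n <= 1: return 1"))
```

The resulting string can be placed inside a `<pre>` block, with two CSS rules for `.kw` and `.num` supplying the colors.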

openai gpt-5.4-mini

9.8/10 10.4 s

Tokens 1 489

Source code 4.3 KB

Time 10.4 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model followed every instruction perfectly, including the strict 400x400 constraint and the specific slide content. The syntax highlighting is manually implemented via spans, which is a clever way to create a self-contained file, and the navigation (buttons, keyboard, and indicator) works flawlessly. The visual design is clean, professional, and fits the pedagogical purpose.

openai gpt-5.4-nano

9.5/10 16.2 s

Tokens 2 007

Source code 6.3 KB

Time 16.2 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.5

Review

The model followed all instructions perfectly, including the specific 400x400 constraint, the 4-slide structure, and the syntax highlighting. The visual design is professional and clean, and it suits the educational theme well. Note that the provided source code appears to be truncated at the end of the script, but the functional navigation logic is clearly implemented and visible in the structure.

openai gpt-5.4-pro

9.8/10 275.7 s

Tokens 1 753

Source code 5.3 KB

Time 275.7 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all constraints, including the 400x400 size and keyboard navigation. The visual rendering is clean and professional, and the syntax highlighting is manually implemented to maintain the 'self-contained' aspect. The pedagogical structure is clear and concise.

openai gpt-5.5

9.8/10 15.5 s

Tokens 1 364

Source code 3.9 KB

Time 15.5 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all constraints, including the 400x400 size, keyboard navigation, and the position indicator. The source code is clean and semantic, and the visual rendering is professional, featuring well-executed manual syntax highlighting.

openai gpt-5.5-pro

9.8/10 203.5 s

Tokens 1 667

Source code 5.0 KB

Time 203.5 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all constraints, including the strict 400x400 format and the complex navigation. The design is modern, educational, and very clean, featuring well-implemented syntax highlighting. Both keyboard and button navigation work exactly as requested.

productivia matania-latest

9.8/10 7.6 s

Tokens 1 168

Source code 3.4 KB

Time 7.6 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model followed every single instruction perfectly, including the strict 400x400 size constraint and the specific slide content. The syntax highlighting is well-implemented via manual spans, and the navigation (buttons, keyboard, and counter) works exactly as requested. The visual hierarchy and dark theme are professional and clean.

xai grok-4-1-fast-non-reasoning

9.8/10 21.2 s

Tokens 1 147

Source code 3.1 KB

Time 21.2 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all constraints, including the container size (360x360 to respect the 400x400 limit), keyboard/button navigation, and the position indicator. The code is clean, semantic, and the syntax highlighting is efficiently implemented manually. The visual rendering is professional, readable, and educational.

xai grok-4-1-fast-reasoning

9.8/10 63.4 s

Tokens 1 364

Source code 3.9 KB

Time 63.4 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all constraints, including the 400x400 dimension, the number of slides, and the navigation features (keyboard and buttons). The code is clean, the visual output is neat and educational, and the syntax highlighting is correctly implemented manually as requested.

Quarterly report

html

anthropic claude-haiku-4-5-20251001

9.5/10 18.1 s

Tokens 4 362

Source code 14.1 KB

Time 18.1 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.5

Review

The model perfectly adhered to all technical constraints, including the 400x400 format and the inclusion of a CSS bar chart. The output is professional, clean, and respects the requested business aesthetic with a consistent color palette. The code structure is semantic and well-organized.
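
A CSS bar chart of the kind this task asks for needs no canvas or chart library: each bar is a `div` whose height is proportional to its value. The sketch below generates such markup. The function name `css_bar_chart`, the inline styles, and the sample quarterly figures are all assumptions for illustration, and the function expects at least one positive value.

```python
import html


def css_bar_chart(values: dict, max_height_px: int = 120) -> str:
    """Return HTML for a bar chart built only from styled <div> elements."""
    peak = max(values.values())
    if peak <= 0:
        raise ValueError("at least one value must be positive")
    bars = []
    for label, value in values.items():
        # Scale each bar relative to the tallest one.
        height = round(value / peak * max_height_px)
        bars.append(
            f'<div class="bar" title="{html.escape(label)}" '
            f'style="width:40px;height:{height}px;background:#3366cc"></div>'
        )
    # Flexbox with align-items:flex-end makes the bars grow up from a baseline.
    return (
        '<div class="chart" style="display:flex;align-items:flex-end;gap:8px">'
        + "".join(bars)
        + "</div>"
    )


# Made-up figures, used only to exercise the function.
print(css_bar_chart({"Q1": 50, "Q2": 100, "Q3": 75}))
```

Keeping the tallest bar at a fixed pixel height is one way to guarantee the chart fits inside the 400x400 container regardless of the data.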

anthropic claude-opus-4-6

9.6/10 48.3 s

Tokens 4 959

Source code 12.0 KB

Time 48.3 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all prompt constraints, including the 400x400 size and the professional business style. The visual output is excellent, featuring clear hierarchy, a consistent color palette, and the successful implementation of the CSS bar chart. The navigation and position indicators are both functional and elegant.

anthropic claude-opus-4-7

8.3/10 32.2 s

Tokens 3 634

Source code 6.4 KB

Time 32.2 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

8.25

Review

The visual rendering is excellent, perfectly adhering to the professional business aesthetic and the dimension constraints (400x400). However, the source code is truncated at the end (the JavaScript script is incomplete), which prevents the navigation from actually working, thereby impacting technical completeness. Prompt fidelity is very high, respecting all requested content and structural constraints.

anthropic claude-sonnet-4-6

9.6/10 53.2 s

Tokens 6 297

Source code 15.2 KB

Time 53.2 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all constraints, including the restricted 400x400 size and the creation of the chart using CSS. The output is extremely professional, featuring a consistent color palette and clear visual hierarchy. The navigation is functional, and the full set of 5 requested slides is present and well-structured.

google gemini-flash-latest

9.8/10 11.1 s

Tokens 2 172

Source code 5.7 KB

Time 11.1 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all constraints, including the area size (400x400), the number of slides, and the inclusion of the CSS chart. The design is professional, sleek, and respects the requested business aesthetic. The code is clean, functional, and self-contained.

google gemini-flash-lite-latest

8.4/10 3.7 s

Tokens 992

Source code 1.8 KB

Time 3.7 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

8.38

Review

The model adheres to almost all constraints, notably the 400x400 format and the inclusion of a CSS bar chart. However, the 'Regions' slide contains only 2 zones instead of the requested 4, which impacts fidelity. The aesthetics are very basic and lack the expected 'professional' touch, but the code is functional and compact.

mistral mistral-large-latest

2.8/10 38.8 s

Tokens 2 505

Source code 8.4 KB

Time 38.8 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

2.75

Review

The model failed critically by generating truncated code (the file cuts off abruptly in the middle of the HTML), rendering the presentation non-functional and incomplete. Navigation and the final slides are entirely missing from the provided code, and the 'ONLY the raw HTML' output constraint is not met because the code is incomplete. While the partial visual rendering is correct, the inability to use the object causes all functionality scores to plummet.

mistral mistral-medium-latest

5.5/10 31.9 s

Tokens 3 907

Source code 13.8 KB

Time 31.9 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

5.5

Review

The model failed significantly on completeness and fidelity, as the provided code is truncated and incomplete, preventing the full presentation from being rendered. While the code structure and visual style (CSS) are professional and follow aesthetic principles, the output is a partial snippet rather than a functional 5-slide presentation. The 'fidelity' score is heavily penalized because the core requirement of a self-contained 5-slide HTML document was not met due to the cutoff.

mistral mistral-small-latest

9.4/10 11.4 s

Tokens 1 855

Source code 5.8 KB

Time 11.4 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.38

Review

The model perfectly adhered to all prompt constraints, including the 400x400 format, the number of slides, the CSS chart, and the arrow navigation. The code is clean and functional. The visual output is professional, though it feels a bit cramped within the restricted 400px space, which slightly undermines the visual hierarchy of the KPI slide.

openai gpt-4o-mini

8.9/10 17.2 s

Tokens 1 377

Source code 3.9 KB

Time 17.2 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

8.88

Review

The model strictly adheres to all structural and functional constraints of the prompt (5 slides, CSS charts, navigation, 400x400 format). The visual output is very basic and lacks a true "professional" aesthetic (standard typography, lack of spacing), but the technical implementation is perfectly compliant with the requirements.

openai gpt-5.4

3.1/10 44.1 s

Tokens 2 928

Source code 9.9 KB

Time 44.1 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

3.13

Review

The model failed critically to provide complete code: the file is truncated halfway through the second slide, rendering the presentation unusable. Consequently, navigation and completeness scores are near zero. Although the partial visual output is high quality and adheres to the requested aesthetic, the inability to complete the code constitutes a major failure in both fidelity and quality.

openai gpt-5.4-mini

3.4/10 16.2 s

Tokens 2 629

Source code 8.7 KB

Time 16.2 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

3.38

Review

The model failed critically on completeness and fidelity: the code is truncated (it cuts off abruptly in the middle of the 4th slide) and the JavaScript required for navigation is entirely missing. Although the visual aesthetics (CSS) are excellent and respect the design constraints, the absence of functional logic and of the 5th slide renders the result unusable.

openai gpt-5.4-nano

9.6/10 58.5 s

Tokens 5 564

Source code 20.2 KB

Time 58.5 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all the technical and aesthetic constraints of the prompt. The visual output is extremely professional, respecting the neutral palette with a consistent accent color, and the slide structure (KPIs, CSS bar charts, regions) is impeccable. The code is clean, semantic, and perfectly adapted to the 400x400 container.

openai gpt-5.4-pro

4.6/10 308.8 s

Tokens 3 193

Source code 11.0 KB

Time 308.8 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

4.63

Review

The model generated extremely high-quality code in terms of design and structure, but the output is incomplete because the source code cuts off abruptly halfway through the second slide. Consequently, it is impossible to verify the presence of the 5 slides, the CSS chart, or the conclusion, which drastically reduces both fidelity and completeness.

openai gpt-5.5

3.4/10 31.3 s

Tokens 2 602

Source code 8.7 KB

Time 31.3 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

3.38

Review

The model failed critically on completeness and fidelity: the code is truncated (it cuts off abruptly in the middle of the HTML), rendering the presentation non-functional and incomplete (missing slides, non-existent navigation). Although the visual aesthetics and CSS are of high quality, the inability to use the final product due to an interrupted generation results in a very low score.

openai gpt-5.5-pro

7.0/10 384.1 s

Tokens 2 760

Source code 9.3 KB

Time 384.1 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

Review

The visual rendering is excellent, featuring a professional aesthetic and a clear hierarchy that perfectly adheres to the requested palette. However, the code is truncated (incomplete), which prevents the remaining slides from being generated and causes the navigation/indicator to display incorrectly. The fidelity score is penalized by this abrupt interruption, which renders the final product non-functional.

productivia matania-latest

9.8/10 12.0 s

Tokens 1 655

Source code 5.4 KB

Time 12.0 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all prompt constraints, including the 400x400 format, the number of slides, and the CSS chart. The output is professional and clean, and the navigation (buttons + indicator + arrow keys) is fully functional. The code is concise and well-structured.

xai grok-4-1-fast-non-reasoning

9.8/10 14.1 s

Tokens 1 519

Source code 4.6 KB

Time 14.1 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all constraints, including the 400x400 format, the number of slides, the CSS chart, and the navigation. The output is professional and clean, and the code implementation is both concise and functional.

xai grok-4-1-fast-reasoning

6.5/10 57.0 s

Tokens 2 307

Source code 7.7 KB

Time 57.0 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

6.5

Review

The model produced code of very high visual and structural quality, perfectly adhering to the requested aesthetics and size constraints. However, the code is truncated (incomplete) at the end of the fourth slide, which prevents the fifth slide from existing and breaks the navigation functionality. The fidelity score is heavily impacted by this technical cutoff, which renders the final product unusable.

Travel journal

html

anthropic claude-haiku-4-5-20251001

3.0/10 17.1 s

anthropic claude-haiku-4-5-20251001

Tokens 3 613

Source code 10.8 KB

Time 17.1 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

3.0

Review

The model failed critically on almost all constraints. The code is truncated (incomplete), which prevents the navigation from working and renders the presentation unusable. Although the SVGs are created and the styling is clean, the failure to respect the complete output structure and the absence of JavaScript logic for the slides (due to the code being cut off) cause the fidelity and completeness scores to plummet.

anthropic claude-opus-4-6

7.8/10 53.3 s

anthropic claude-opus-4-6

Tokens 4 015

Source code 8.7 KB

Time 53.3 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

7.75

Review

The visual rendering is excellent, featuring elegant typography and original SVGs that align well with the theme. However, the code is truncated at the end (the fourth slide is incomplete and the closing tag is missing), which leaves the work incomplete and lowers the fidelity score.

anthropic claude-opus-4-7

9.8/10 28.3 s

anthropic claude-opus-4-7

Tokens 3 040

Source code 4.8 KB

Time 28.3 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all constraints, including the 400x400 format and the exclusion of external images by using high-quality embedded SVGs. The visual rendering is aesthetic, featuring a consistent color palette and typography suited to the travel journal theme. Navigation is smooth and comprehensive (arrows and indicators included).

anthropic claude-sonnet-4-6

5.5/10 63.3 s

anthropic claude-sonnet-4-6

Tokens 5 151

Source code 12.2 KB

Time 63.3 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

5.5

Review

The model produced a magnificent visual output with very polished SVGs and a consistent travel journal aesthetic. However, the code is truncated (incomplete) halfway through the second slide, which prevents the navigation from working and stops slides 3 and 4 from displaying. Fidelity is heavily penalized by this cutoff, which renders the final product unusable.

google gemini-flash-latest

9.8/10 20.2 s

google gemini-flash-latest

Tokens 2 052

Source code 4.3 KB

Time 20.2 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all constraints, including the container size (400x400), the absence of external images, and the use of inline SVGs. The visual rendering is elegant, successfully achieving a travel journal aesthetic, and the navigation is functional and complete. The code is clean, concise, and well-structured.

google gemini-flash-lite-latest

8.9/10 6.4 s

google gemini-flash-lite-latest

Tokens 1 197

Source code 2.3 KB

Time 6.4 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

8.88

Review

The model strictly adheres to all prompt constraints: 4 slides, 400x400 format, exclusive use of inline SVG, and no external images. The code is clean and functional, although the visual output is very minimalist and lacks aesthetic refinement (raw typography, basic layout). Arrow navigation and the position indicator are both present and fully operational.

mistral mistral-large-latest

6.5/10 34.4 s

mistral mistral-large-latest

Tokens 1 849

Source code 5.9 KB

Time 34.4 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

6.5

Review

The model respected most of the visual and thematic constraints, but the code is truncated at the end (the script is not closed), which prevents flawless execution and lowers the code quality score. Fidelity is also penalized by this abrupt cutoff, although the visual output appears to adhere to the 400x400 format and to use inline SVGs.

mistral mistral-medium-latest

5.5/10 16.2 s

mistral mistral-medium-latest

Tokens 2 184

Source code 7.2 KB

Time 16.2 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

5.5

Review

The model failed to provide complete code: the output cuts off abruptly in the middle of the HTML, so the JavaScript is missing and the navigation does not work. Although the visual design and SVGs adhere to the style guidelines, the presentation cannot be used in this incomplete state, which heavily penalizes both fidelity and completeness.

mistral mistral-small-latest

9.3/10 11.9 s

mistral mistral-small-latest

Tokens 1 620

Source code 5.0 KB

Time 11.9 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.25

Review

The model perfectly adhered to all the strict prompt constraints, specifically the number of slides, the exclusive use of inline SVG, and the 400x400 format. The navigation is comprehensive, featuring arrows, indicators, and keyboard support. The visual rendering is simple yet functional, although the SVG illustrations are very basic.

openai gpt-4o-mini

8.3/10 17.1 s

openai gpt-4o-mini

Tokens 1 247

Source code 3.5 KB

Time 17.1 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

8.25

Review

The model adheres to almost all constraints, specifically the slide count, the use of inline SVGs, and the absence of external images. However, the visual rendering is extremely rudimentary, and the SVGs consist of very basic geometric shapes that only vaguely represent the destinations. The code structure is correct although incomplete (the <head> tag is improperly opened), but the navigation works perfectly.

openai gpt-5.4

7.3/10 35.6 s

openai gpt-5.4

Tokens 2 195

Source code 7.2 KB

Time 35.6 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

7.25

Review

The model followed almost all instructions, creating beautiful inline SVG illustrations and a cohesive aesthetic. However, the provided code is truncated (cut off at the end), which prevents the JavaScript from functioning and the HTML from being valid/complete. While the visual design is excellent, the lack of functional navigation due to the incomplete code significantly impacts the completeness and fidelity scores.

openai gpt-5.4-mini

5.5/10 25.1 s

openai gpt-5.4-mini

Tokens 3 212

Source code 11.1 KB

Time 25.1 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

5.5

Review

The model failed the completeness and fidelity constraints: the source code is truncated at the end of the second slide, preventing the inclusion of the requested 4 slides and making navigation impossible beyond slide 2. Although the visual rendering and SVGs are of good quality, the inability to execute the entire presentation is a major flaw.

openai gpt-5.4-nano

9.8/10 53.1 s

openai gpt-5.4-nano

Tokens 4 198

Source code 15.0 KB

Time 53.1 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all constraints, including the strict 400x400 dimensions and the exclusion of external images through highly creative inline SVGs. The navigation is complete, featuring arrows and position indicators, and the design is aesthetic and consistent with the travel journal theme.

openai gpt-5.4-pro

2.6/10 284.6 s

openai gpt-5.4-pro

Tokens 2 823

Source code 9.6 KB

Time 284.6 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

2.63

Review

The model failed critically on code generation: the file is truncated (the code cuts off abruptly in the middle of an SVG tag), rendering the presentation incomplete and non-functional. Consequently, the navigation and all 4 slides cannot be verified or used. Although the visual style suggested by the initial code is aesthetic, the inability to provide complete code is a major flaw.

openai gpt-5.5

9.8/10 26.6 s

openai gpt-5.5

Tokens 1 654

Source code 5.1 KB

Time 26.6 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all constraints, including the strict 400x400 dimensions and the absence of external images by using high-quality embedded SVGs. Navigation is complete (arrows and indicators), and the design respects a travel journal aesthetic. The code is clean, well-structured, and fully functional.

openai gpt-5.5-pro

6.0/10 157.7 s

openai gpt-5.5-pro

Tokens 1 849

Source code 5.8 KB

Time 157.7 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

6.0

Review

The model produced a magnificent visual rendering that is highly consistent with the 'travel journal' theme (colors, typography, original SVGs). However, the code is truncated at the end (the 4th slide is incomplete and the HTML is not closed), which breaks navigation and leaves the application incomplete. Fidelity is heavily penalized by this cutoff, which renders the final product non-functional.

productivia matania-latest

9.8/10 10.8 s

productivia matania-latest

Tokens 1 389

Source code 4.1 KB

Time 10.8 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all constraints, including the 400x400 format, the absence of external images, and the use of inline SVGs. The design is consistent with the travel journal theme, the navigation is functional and smooth, and the code is clean and well-structured.

xai grok-4-1-fast-non-reasoning

9.5/10 11.4 s

xai grok-4-1-fast-non-reasoning

Tokens 1 368

Source code 4.1 KB

Time 11.4 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.5

Review

The model perfectly adhered to all prompt constraints, including the absence of external images, the use of inline SVGs, and the 400x400 format. The code is clean and functional, and the navigation is intuitive. The visual rendering is simple yet effective and respects the required structure for each slide.

xai grok-4-1-fast-reasoning

9.3/10 36.2 s

xai grok-4-1-fast-reasoning

Tokens 1 882

Source code 6.1 KB

Time 36.2 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.25

Review

The model strictly adheres to all prompt constraints: 4 slides, inline SVG, no external images, and a 400x400 format. The visual output is elegant, featuring beautiful serif typography and consistent SVG illustrations. However, the source code is truncated at the end (the script is cut off), which prevents perfect execution if used as is; nevertheless, for the purpose of evaluating the requested rendering, the objective has been met.

Step-by-step tutorial

html

anthropic claude-haiku-4-5-20251001

9.6/10 10.3 s

anthropic claude-haiku-4-5-20251001

Tokens 2 145

Source code 4.7 KB

Time 10.3 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model followed all instructions perfectly, including the specific constraints like the 400x400 fit, keyboard navigation, and progress bar. The visual aesthetic is warm and professional, and the code is clean, although it contains a minor HTML syntax error (unclosed <div> tags in the body) which doesn't affect the final rendering.

anthropic claude-opus-4-6

8.4/10 30.3 s

anthropic claude-opus-4-6

Tokens 2 668

Source code 6.2 KB

Time 30.3 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

8.38

Review

The visual rendering is excellent and perfectly adheres to the requested 'warm' aesthetic with a beautiful typographic hierarchy. However, the source code is truncated at the end (the `updateControls` function is not closed and the script is incomplete), which prevents navigation from actually working in a real-world environment. While fidelity to the prompt is very high regarding design and structure, technical completeness is penalized by this abrupt code cutoff.

anthropic claude-opus-4-7

9.8/10 34.6 s

anthropic claude-opus-4-7

Tokens 2 943

Source code 4.6 KB

Time 34.6 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all constraints, including the 400x400 format, inline SVG icons, and hybrid navigation (keyboard/buttons). The design is warm and consistent with the subject matter, and both the progress bar and the step indicator function without error.

anthropic claude-sonnet-4-6

6.1/10 41.5 s

anthropic claude-sonnet-4-6

Tokens 4 458

Source code 9.8 KB

Time 41.5 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

6.13

Review

The model failed on completeness and fidelity because the code is truncated (it cuts off abruptly in the middle of slide 3), which prevents it from delivering the 5 requested slides. However, the visual aesthetics are excellent, the adherence to dimensions (400x400) is perfect, and the integrated SVGs are of high quality.

google gemini-flash-latest

9.8/10 17.2 s

google gemini-flash-latest

Tokens 2 484

Source code 5.0 KB

Time 17.2 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all prompt constraints, including the 400x400 format and the navigation functionalities. The aesthetics are warm and consistent with the subject matter, the SVGs are cleanly integrated, and the progression logic is impeccable. The code is clean, although the absence of a complete <head> tag (even though <style> is present) is a minor technical imperfection that has no impact on the final rendering.

google gemini-flash-lite-latest

9.8/10 4.0 s

google gemini-flash-lite-latest

Tokens 1 238

Source code 2.6 KB

Time 4.0 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all constraints, including the 400x400 dimensions and hybrid navigation (keyboard/buttons). The code is clean and efficient, and the requested warm aesthetic is well-rendered visually. The progress bar and step indicator function smoothly and accurately.

mistral mistral-large-latest

9.8/10 17.8 s

mistral mistral-large-latest

Tokens 1 410

Source code 4.2 KB

Time 17.8 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all constraints, including the strict 400x400 dimensions and hybrid navigation (keyboard/buttons). The aesthetics are warm and consistent with the subject matter, and the code is clean, semantic, and functional.

mistral mistral-medium-latest

3.0/10 32.5 s

mistral mistral-medium-latest

Tokens 3 075

Source code 10.6 KB

Time 32.5 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

3.0

Review

The model failed on almost all structural constraints. It generated only 4 slides instead of 5, the code is truncated (missing closing tags, no JavaScript for navigation, no progress bar logic), and the 'ONLY raw HTML' output constraint was partially ignored due to an incomplete code structure. While the visual rendering is correct, the functionality is non-existent because the code is unfinished.

mistral mistral-small-latest

9.6/10 6.9 s

mistral mistral-small-latest

Tokens 1 222

Source code 3.4 KB

Time 6.9 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all prompt constraints, including the 400x400 size, the progress bar, the text indicator, and keyboard navigation. The rendering is aesthetic, warm, and meets expectations. The code is clean and functional, although an opening <html> tag is missing and the body closing tag is malformed, which is negligible for a self-contained component.

openai gpt-4o-mini

8.6/10 10.6 s

openai gpt-4o-mini

Tokens 1 165

Source code 3.0 KB

Time 10.6 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

8.63

Review

The model adheres to almost all constraints, including keyboard navigation, the progress bar, and the content format. The visual rendering is very basic and does not strictly follow the sizing constraint (400x400) as it uses a full-screen display (100vh), but the functional structure is excellent.

openai gpt-5.4

9.6/10 34.3 s

openai gpt-5.4

Tokens 2 423

Source code 8.0 KB

Time 34.3 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.63

Review

The model perfectly adhered to all constraints, including the strict 400x400 size, the 5-step structure, and the warm aesthetic. The visual design is consistent, the SVG icons are well integrated, and all navigation requirements (keyboard, buttons, progress bar, indicator) are correctly implemented. The code is clean and self-contained.

openai gpt-5.4-mini

7.5/10 17.4 s

openai gpt-5.4-mini

Tokens 2 084

Source code 6.6 KB

Time 17.4 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

7.5

Review

The visual rendering is excellent, perfectly adhering to the 'warm' aesthetic and dimensional constraints. However, the source code is truncated at the end (the script is incomplete), which prevents the navigation functionality from actually working in a real-world environment, even though the HTML structure is complete. The fidelity score is impacted by this major technical cutoff, which renders the 'presentation' object non-functional.

openai gpt-5.4-nano

9.8/10 29.4 s

openai gpt-5.4-nano

Tokens 3 201

Source code 11.0 KB

Time 29.4 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all constraints, including the strict 400x400 dimensions and the use of inline SVG. The aesthetics are warm and professional, featuring smooth navigation (buttons, indicator, and progress bar) and a very clean code structure.

openai gpt-5.4-pro

6.1/10 223.0 s

openai gpt-5.4-pro

Tokens 1 647

Source code 4.9 KB

Time 223.0 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

6.13

Review

The visual rendering is excellent, featuring a warm aesthetic and polished typography that respects the requested atmosphere. However, the code is truncated, which breaks navigation and leaves some slides missing, resulting in a very low fidelity score despite the quality of the initial implementation.

openai gpt-5.5

9.8/10 29.9 s

openai gpt-5.5

Tokens 1 466

Source code 4.3 KB

Time 29.9 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all prompt constraints, including the dimensions (400x400), the number of slides, and the navigation functionalities. The output is aesthetically very successful, featuring a warm color palette and consistent typography. The code is clean, compact, and uses inline SVGs for the background as requested.

openai gpt-5.5-pro

5.1/10 149.0 s

openai gpt-5.5-pro

Tokens 2 468

Source code 8.2 KB

Time 149.0 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

5.13

Review

The model failed critically on completeness and fidelity because the code is truncated (cutting off in the middle of the SVG on slide 5), which prevents the application from functioning fully. However, the generated portion is aesthetically excellent, perfectly adhering to the requested 'warm' design constraints, dimensions, and navigation features.

productivia matania-latest

9.8/10 11.2 s

productivia matania-latest

Tokens 1 418

Source code 5.7 KB

Time 11.2 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all constraints, including the 400x400 size and the use of inline SVG. The aesthetics are warm and consistent with the subject matter, and the navigation (keyboard, buttons, progress bar) is complete and functional.

xai grok-4-1-fast-non-reasoning

8.8/10 25.8 s

xai grok-4-1-fast-non-reasoning

Tokens 1 234

Source code 3.5 KB

Time 25.8 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

8.75

Review

The visual rendering is excellent, respecting both the warm aesthetic and the dimensional constraints. Navigation is comprehensive (buttons, keyboard, indicator, progress bar). However, fidelity is slightly compromised because the model included the 'Step X/5' indicator in both the H1 title of each slide and in a separate indicator, creating an unrequested redundancy; additionally, the presentation's main title is missing from the rendering, replaced by the individual slide titles.

xai grok-4-1-fast-reasoning

9.8/10 97.1 s

xai grok-4-1-fast-reasoning

Tokens 1 250

Source code 3.4 KB

Time 97.1 s

Matania Judgment

Rendering

Navigation

Code quality

Completeness

Fidelity

Overall

9.75

Review

The model perfectly adhered to all constraints, including the strict 400x400 format and the implementation of hybrid navigation (keyboard/buttons). The design is warm and consistent with the subject matter, and both the progress bar and step indicator function smoothly. The code is clean and well-structured.
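
For readers unfamiliar with what the reviews in this task mean by hybrid navigation, a step indicator, and a progress bar, the following is a minimal sketch of the state logic only, written in TypeScript. It is not taken from any tested model's output: the names `DeckState`, `NavInput`, `navigate`, `indicatorText`, and `progressPercent` are hypothetical, and the sketch assumes a five-slide deck where arrow keys and on-screen buttons share one handler.

```typescript
// Hypothetical names throughout; this is an illustration, not any model's code.
interface DeckState {
  index: number; // zero-based index of the visible slide
  total: number; // number of slides in the deck
}

// Keyboard keys and on-screen buttons are mapped to the same input type.
type NavInput = "ArrowLeft" | "ArrowRight" | "prev" | "next";

// Move one slide back or forward, clamping at both ends of the deck.
function navigate(state: DeckState, input: NavInput): DeckState {
  const delta = input === "ArrowRight" || input === "next" ? 1 : -1;
  const index = Math.min(state.total - 1, Math.max(0, state.index + delta));
  return { ...state, index };
}

// Text for the position indicator, in the "Step X/5" style the reviews mention.
function indicatorText(state: DeckState): string {
  return `Step ${state.index + 1}/${state.total}`;
}

// Progress bar fill, as a percentage of the deck reached so far.
function progressPercent(state: DeckState): number {
  return ((state.index + 1) * 100) / state.total;
}

let state: DeckState = { index: 0, total: 5 };
state = navigate(state, "ArrowLeft"); // already on the first slide, so it stays there
state = navigate(state, "next");      // advances to the second slide
console.log(indicatorText(state), progressPercent(state));
```

In a real page, a `keydown` listener and the button click handlers would both call `navigate` and then update the DOM from the returned state. Keeping the state pure in this way is one design choice among several, and it makes the clamping behaviour easy to check without a browser.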