Biology

Anatomy, physiology, medical and veterinary

10 tasks · 11 models tested · 110 results

Human heart

image

google gemini-2.5-flash-image

4.8/10 6.4 s

Cost < 0.01 $

Resolution 1344 x 768

Time 6.4 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.75

Review

The image is visually impressive and renders the anatomical cross-section with a convincing medical look. However, it fails heavily on text: the labels consist of illegible characters and meaningless pseudo-words, which contradicts the requirement for English labels. Fidelity is therefore significantly reduced by the model's inability to generate coherent and informative text.

google gemini-3-pro-image-preview

4.8/10 23.7 s

Cost < 0.01 $

Resolution 2752 x 1536

Time 23.7 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.75

Review

The image is aesthetically pleasing and visually sound in its anatomy, but it fails on a functional level. The text is nearly illegible and composed of incoherent characters (gibberish), which directly contradicts the request for precise labels. Fidelity is heavily impacted by the model's inability to generate intelligible and accurate medical text.

google imagen-4.0-fast-generate-001

4.8/10 4.9 s

Cost 0.02 $

Resolution 1408 x 768

Time 4.9 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.75

Review

The image presents a convincing medical aesthetic and good visual quality, but fails heavily on textual accuracy. The labels consist of illegible characters or incoherent pseudo-words, which directly violates the prompt's requirement for explicit English labels. Fidelity is significantly impacted by this inability to generate functional and anatomically correct text.

google imagen-4.0-generate-001

4.8/10 8.1 s

Cost 0.04 $

Resolution 1408 x 768

Time 8.1 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.75

Review

The image possesses convincing aesthetic and medical quality, but fails heavily on textual accuracy. The labels consist of incoherent and illegible characters (gibberish), which directly contradicts the instruction 'all chambers and valves labeled'. Fidelity is therefore significantly impacted by the model's inability to generate correct anatomical text.

google imagen-4.0-ultra-generate-001

4.9/10 14.1 s

Cost 0.08 $

Resolution 1408 x 768

Time 14.1 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.88

Review

The image presents impressive visual and anatomical quality, but fails heavily on textual constraints. Although the rendering is aesthetically pleasing, the labels are illegible or composed of incoherent characters, which directly contravenes the explicit request for precise English captions. Fidelity is therefore heavily penalized by the model's inability to generate functional anatomical text.

openai chatgpt-image-latest

4.9/10 45.8 s

Cost 0.22 $

Resolution 1536 x 1024

Time 45.8 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.88

Review

The image is visually high-quality with a convincing medical aesthetic, but it fails on a crucial functional level. Although labels are present, the text is largely illegible or composed of incoherent characters (textual hallucinations), making the image useless for medical purposes. Fidelity is penalized because the prompt explicitly required named chambers and valves, a requirement that was not met in any usable way.

segmind ideogram-3

6.3/10 12.9 s

Cost 0.04 $

Resolution 1344 x 768

Time 12.9 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

6.25

Review

The image is aesthetically very successful, with impressive medical rendering quality. However, fidelity is compromised because, although labels are present, they are often illegible, inconsistent, or fail to correctly name all chambers and valves with anatomical precision. While text is present, it fails to meet the scientific accuracy required by the prompt.

segmind seedream-4.5

4.5/10 16.0 s

Cost 0.04 $

Resolution 2560 x 1472

Time 16.0 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.5

Review

The image respects the medical style and the anatomical cross-section, but fails heavily on textual precision. The labels consist of illegible characters and gibberish, which contradicts the requirement for English labels. Fidelity is therefore significantly impacted by the model's inability to generate functional and accurate text.

segmind seedream-v5-lite

3.9/10 42.7 s

Cost 0.04 $

Resolution 2848 x 1600

Time 42.7 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

3.88

Review

The image fails heavily on textual accuracy, displaying incoherent and illegible characters instead of actual anatomical labels. While the visual structure evokes a heart, the prompt explicitly required all chambers and valves to be named in English, which was not respected.

xai grok-imagine-image

4.1/10 8.8 s

Cost 0.02 $

Resolution 2752 x 1504

Time 8.8 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.13

Review

The image is aesthetically successful and adheres to the requested medical style, but it fails almost entirely on the textual instructions. Although labels are present, the text consists of incoherent and illegible characters (gibberish), so the "labeled" and "text in English" requirements of the prompt are not met. Consequently, fidelity is very low despite good overall visual quality.

xai grok-imagine-image-pro

4.8/10 17.1 s

Cost 0.07 $

Resolution 2816 x 1536

Time 17.1 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.75

Review

The image is aesthetically pleasing and possesses good technical quality, but it fails on a functional level. The generated text is unreadable gibberish, making the labeling of the chambers and valves impossible. Fidelity is heavily penalized because one of the prompt's major constraints (text in English / labeled) was not met.
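Each card above shows a one-decimal headline score (for example 4.8/10) next to a two-decimal Overall value (4.75). The following is a minimal sketch, assuming the headline is simply the Overall value rounded half up to one decimal; the page does not state its rounding rule, and the function name `headline_score` is hypothetical. The pairs are copied from the Human heart cards.

```python
from decimal import Decimal, ROUND_HALF_UP


def headline_score(overall: str) -> Decimal:
    """Round an Overall value to one decimal place, with halves rounding up.

    Assumption: this mirrors how the page derives the x/10 headline score.
    """
    return Decimal(overall).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# (Overall, headline score) pairs as they appear on the Human heart cards.
pairs = [
    ("4.75", "4.8"),
    ("4.88", "4.9"),
    ("6.25", "6.3"),
    ("4.5", "4.5"),
    ("3.88", "3.9"),
    ("4.13", "4.1"),
]

for overall, shown in pairs:
    # Every listed pair is consistent with half-up rounding.
    assert headline_score(overall) == Decimal(shown)
```

`Decimal` is used rather than the built-in `round` because `round` on floats rounds exact halves to the nearest even digit, which would not reproduce the 6.25 to 6.3 pair shown for segmind ideogram-3.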

Respiratory system

image

google gemini-2.5-flash-image

4.9/10 8.0 s

Cost < 0.01 $

Resolution 1344 x 768

Time 8.0 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.88

Review

The image is aesthetically pleasing and provides a clean medical illustration, but it fails on the crucial aspect of textual accuracy. The generated text consists of incoherent characters and gibberish that do not correspond to any real anatomical terms, directly contradicting the "labeled anatomy" instruction. Consequently, fidelity is heavily penalized despite the good overall visual quality.

google gemini-3-pro-image-preview

4.9/10 26.1 s

Cost < 0.01 $

Resolution 2752 x 1536

Time 26.1 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.88

Review

The image is aesthetically successful with a clear medical composition, but it fails on the crucial aspect of labeling. The generated text is completely incoherent and resembles visual gibberish, making the illustration useless for educational purposes. Although the subject matter is respected, the promise of "labeled anatomy" is not functionally fulfilled.

google imagen-4.0-fast-generate-001

4.8/10 4.5 s

Cost 0.02 $

Resolution 1408 x 768

Time 4.5 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.75

Review

The image is aesthetically successful with a high-quality medical rendering, but it fails almost entirely on textual constraints. Although the respiratory system is represented, the labels consist of illegible characters and gibberish, which directly violates the prompt's 'labeled anatomy' and 'Text in English' requirements. Consequently, fidelity is heavily penalized despite the good visual quality.

google imagen-4.0-generate-001

5.8/10 8.5 s

Cost 0.04 $

Resolution 1408 x 768

Time 8.5 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

5.75

Review

The image is aesthetically very successful, featuring professional medical-grade rendering quality and a balanced composition. However, the model fails significantly on textual accuracy: the labels consist of illegible characters and gibberish, which directly violates the explicit "labeled anatomy" instruction. Fidelity is therefore penalized by the inability to provide coherent textual annotations.

google imagen-4.0-ultra-generate-001

5.8/10 12.6 s

Cost 0.08 $

Resolution 1408 x 768

Time 12.6 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

5.75

Review

The image is aesthetically very successful, featuring impressive medical rendering quality and a clear composition. However, the model fails heavily on textual accuracy: the labels consist of incoherent characters and illegible pseudo-text, which contradicts the requirement for 'labeled anatomy.' Fidelity is therefore penalized, as the informative aspect (labels) is non-existent despite the visual quality.

openai chatgpt-image-latest

5.8/10 41.6 s

Cost 0.22 $

Resolution 1536 x 1024

Time 41.6 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

5.75

Review

The illustration is aesthetically pleasing and technically high-quality, featuring a clear composition. However, the model fails significantly on textual accuracy, generating illegible and incoherent gibberish instead of actual anatomical labels. Fidelity is penalized because, although label placeholders are structurally present, the unreadable medical terms render the image useless for its intended purpose.

segmind ideogram-3

9.1/10 13.7 s

Cost 0.04 $

Resolution 1344 x 768

Time 13.7 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

9.13

Review

The image is a high-quality medical illustration, very sharp and aesthetically balanced. The prompt is perfectly followed, including the complete anatomy and the requested English text labels. The text is legible and correctly spelled, which is a major achievement for an image generation model.

segmind seedream-4.5

4.9/10 13.7 s

Cost 0.04 $

Resolution 2560 x 1472

Time 13.7 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.88

Review

The image is aesthetically successful and resembles a genuine medical illustration, but it fails heavily on textual constraints. The generated text is completely incoherent and consists of gibberish, which directly violates the prompt for precise anatomical labels. Consequently, fidelity is low despite acceptable visual quality.

segmind seedream-v5-lite

4.4/10 39.0 s

Cost 0.04 $

Resolution 2848 x 1600

Time 39.0 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.38

Review

The image presents a decent medical aesthetic but fails heavily on textual accuracy, displaying incoherent and unreadable gibberish instead of actual anatomical labels. Although the subject is identifiable, the failure to follow the 'labeled anatomy' instruction and the presence of erroneous text significantly penalize both fidelity and precision.

xai grok-imagine-image

4.9/10 7.0 s

Cost 0.02 $

Resolution 2752 x 1504

Time 7.0 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.88

Review

The image is aesthetically successful and resembles a genuine medical illustration, but it fails heavily on textual constraints. The text is completely illegible and composed of incoherent characters (gibberish), which contradicts the explicit requirement for anatomical labels in English. Consequently, fidelity is heavily penalized despite the good overall visual quality.

xai grok-imagine-image-pro

5.4/10 13.9 s

Cost 0.07 $

Resolution 2816 x 1536

Time 13.9 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

5.38

Review

The image possesses a clean medical aesthetic and good composition, but it fails on the crucial aspect of labeling. The generated text is unreadable gibberish with no medical meaning, which directly contradicts the request for "labeled anatomy." Fidelity is therefore heavily penalized, as the essential informative aspect of the illustration is missing.

Brain and regions

image

google gemini-2.5-flash-image

5.3/10 6.0 s

Cost < 0.01 $

Resolution 1344 x 768

Time 6.0 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

5.25

Review

The image is aesthetically very successful, boasting excellent technical quality and a clear composition. However, it fails heavily on textual constraints: the labels are completely illegible, inconsistent, or composed of abstract characters (textual hallucinations), which directly contradicts the requirement for English labels.

google gemini-3-pro-image-preview

5.6/10 26.8 s

Cost < 0.01 $

Resolution 2752 x 1536

Time 26.8 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

5.63

Review

The image is aesthetically successful, featuring high rendering quality and a balanced composition. However, fidelity is significantly penalized by the model's inability to generate legible and coherent text (labels), which explicitly contradicts the prompt requesting English labels. The scientific aspect is compromised by the illegible text, which resembles visual gibberish.

google imagen-4.0-fast-generate-001

5.3/10 4.2 s

Cost 0.02 $

Resolution 1408 x 768

Time 4.2 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

5.25

Review

The image is aesthetically very successful, featuring excellent rendering quality and a clear composition. However, the model fails heavily on textual accuracy, generating incoherent and illegible text instead of actual labels. Fidelity is penalized because one of the explicit instructions ("labels") is not functionally respected, even though the visual aspect is present.

google imagen-4.0-generate-001

6.3/10 8.8 s

Cost 0.04 $

Resolution 1408 x 768

Time 8.8 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

6.25

Review

The image is aesthetically very successful, featuring excellent technical quality and a clear composition. However, the model fails heavily on textual accuracy, displaying inconsistent and unreadable characters (gibberish) instead of actual labels. Fidelity is penalized because while the "label" aspect is visually present, the textual content is entirely incorrect.

google imagen-4.0-ultra-generate-001

5.1/10 12.4 s

Cost 0.08 $

Resolution 1408 x 768

Time 12.4 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

5.13

Review

The image is aesthetically successful, featuring high rendering quality and a clear composition. However, the model fails significantly on textual accuracy, displaying incoherent and illegible text that does not correspond to actual neuroscientific terms. Fidelity is penalized because the prompt explicitly required labels, which were not functionally respected.

openai chatgpt-image-latest

5.4/10 43.2 s

Cost 0.22 $

Resolution 1536 x 1024

Time 43.2 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

5.38

Review

The overall aesthetics and composition of the illustration are of high quality, reminiscent of a neuroscience textbook. However, the model fails heavily on textual accuracy, generating illegible labels or strings of incoherent characters (gibberish), which violates the explicit request for English labels. Fidelity is penalized by this inability to provide functional and readable text.

segmind ideogram-3

9.1/10 12.8 s

Cost 0.04 $

Resolution 1344 x 768

Time 12.8 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

9.13

Review

The image is of excellent technical quality, featuring remarkable sharpness and a professional aesthetic characteristic of medical illustrations. The model perfectly adheres to all prompt instructions, including color coding, labeling, and the use of English. The text is perfectly legible and free of spelling errors, which is a notable achievement for image generation.

segmind seedream-4.5

5.4/10 13.8 s

Cost 0.04 $

Resolution 2560 x 1472

Time 13.8 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

5.38

Review

The visual aesthetics and color separation are well-executed, offering a high-quality illustration. However, the model fails heavily on textual accuracy, generating illegible and incoherent characters instead of actual labels. Fidelity is penalized as the explicit instruction to provide text in English is not functionally respected.

segmind seedream-v5-lite

4.8/10 37.9 s

Cost 0.04 $

Resolution 2848 x 1600

Time 37.9 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.75

Review

The image presents a clean medical aesthetic and good composition, but fails heavily on textual accuracy. The text is completely illegible and consists of visual gibberish, which contradicts the explicit instruction to provide labels. Fidelity is penalized because the functional and informative aspect promised by the prompt is sacrificed in favor of mere aesthetics.

xai grok-imagine-image

5.4/10 6.0 s

Cost 0.02 $

Resolution 2752 x 1504

Time 6.0 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

5.38

Review

The image is aesthetically successful, featuring high-quality rendering and careful lighting. However, the model fails significantly on textual accuracy, generating incoherent and illegible text that fails to serve its purpose as 'labels.' Fidelity is penalized here, as the informative aspect (readable labels) is essential for a neuroscience illustration.

xai grok-imagine-image-pro

6.1/10 15.0 s

Cost 0.07 $

Resolution 2816 x 1536

Time 15.0 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

6.13

Review

The visual aesthetics and illustration quality are excellent, providing a clean and professional medical rendering. However, fidelity is penalized by the AI's inability to generate legible and coherent text, producing abstract characters instead of actual English labels as requested. While the concept of color-coded regions is well-maintained, the informative aspect (labels) is a technical failure.

Full skeleton

image

google gemini-2.5-flash-image

3.4/10 7.5 s

Cost < 0.01 $

Resolution 1344 x 768

Time 7.5 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

3.38

Review

The image fails almost entirely to meet the prompt's critical constraints. While the anatomical rendering is visually acceptable, the model cannot generate legible or coherent text, and accurately listing and labeling all 206 bones appears to be beyond current image-generation models. Fidelity is extremely low, as the labels consist of nonsensical, gibberish text lacking any medical meaning.

google gemini-3-pro-image-preview

4.1/10 26.1 s

Cost < 0.01 $

Resolution 2752 x 1536

Time 26.1 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.13

Review

The image fails critically on the requested task: it is technically impossible for a current image model to generate 206 legible and accurate text labels. The text consists of illegible gibberish, rendering the labeling function non-existent. Although the visual composition of the skeletons is correct, fidelity is very low because the primary labeling instruction was not followed.

google imagen-4.0-fast-generate-001

3.6/10 4.3 s

Cost 0.02 $

Resolution 1408 x 768

Time 4.3 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

3.63

Review

The image fails massively on the primary constraint: the labeling of the 206 bones. Although the anterior and posterior views are present with good visual quality, the text is completely incoherent, consisting of abstract and illegible characters that do not constitute actual anatomical labels. Fidelity is extremely low because the model fails the requested labeling task.

google imagen-4.0-generate-001

3.6/10 7.8 s

Cost 0.04 $

Resolution 1408 x 768

Time 7.8 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

3.63

Review

The image fails heavily on the primary task: it is technically impossible for a current image generation model to correctly and legibly label the 206 requested bones. The text consists of illegible pseudo-characters (gibberish), resulting in zero textual accuracy and very low prompt fidelity, despite a correct anatomical aesthetic.

google imagen-4.0-ultra-generate-001

3.6/10 10.9 s

Cost 0.08 $

Resolution 1408 x 768

Time 10.9 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

3.63

Review

The image fails significantly on the main labeling instruction: although the visual aspect is clean, the text consists of illegible scribbles that do not correspond to any real words. Furthermore, it is technically impossible for a generative AI to accurately and legibly display all 206 bones individually in a single image, resulting in very low prompt fidelity.

openai chatgpt-image-latest

3.6/10 51.1 s

Cost 0.22 $

Resolution 1536 x 1024

Time 51.1 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

3.63

Review

The image fails massively on the main prompt: it is impossible for a current image generation model to correctly label all 206 bones without producing incoherent text or artifacts. Although the visual composition is aesthetic and resembles a medical diagram, the fidelity is very low because the labels are illegible and the requested number of elements is not respected at all.

segmind ideogram-3

4.8/10 12.8 s

Cost 0.04 $

Resolution 1344 x 768

Time 12.8 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.75

Review

The image is aesthetically pleasing with good composition, but it fails heavily on the technical constraints of the prompt. Although the anterior and posterior views are present, it is impossible to display all 206 bones with legible and precise labels; the generated text is mostly unreadable gibberish, which directly contradicts the request for English labels.

segmind seedream-4.5

3.9/10 16.0 s

Cost 0.04 $

Resolution 2560 x 1472

Time 16.0 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

3.88

Review

The image fails massively on the primary task: the labels are unreadable, and the model is absolutely unable to list the 206 requested bones, producing illegible and incoherent text instead. Although the basic anatomical structure is present, prompt fidelity is very low as the textual and precision constraints are not met.

segmind seedream-v5-lite

2.6/10 52.4 s

Cost 0.04 $

Resolution 2848 x 1600

Time 52.4 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

2.63

Review

The image almost entirely fails to respect the textual and content constraints. Although the overall anatomical structure is suggested, the model is incapable of generating the 206 requested labels, instead producing illegible and incoherent gibberish. Fidelity is extremely low, as the promise of a complete, labeled educational diagram is not met.

xai grok-imagine-image

3.6/10 7.8 s

Cost 0.02 $

Resolution 2752 x 1504

Time 7.8 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

3.63

Review

The image is aesthetically pleasing and well-composed, but it fails critically on text instructions. Although the anatomical views are present, the model is unable to generate legible or accurate labels, and correctly listing all 206 bones with coherent text in a single image appears to be beyond current image-generation models. Consequently, fidelity is very low because the primary constraint (complete labeling) is not met.

xai grok-imagine-image-pro

4.1/10 10.7 s

Cost 0.07 $

Resolution 2816 x 1536

Time 10.7 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.13

Review

The image is visually clean and well-composed, but it fails heavily on textual constraints. Although both anterior and posterior views are present, the text is completely illegible and consists of gibberish, making the labeling of the 206 bones impossible. Fidelity is very low because the primary function of the image (a labeled diagram) is not fulfilled.

Eye anatomy

image

google gemini-2.5-flash-image

4.9/10 9.0 s

Cost < 0.01 $

Resolution 1344 x 768

Time 9.0 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.88

Review

The image is aesthetically pleasing and well-composed, but it fails heavily on the scientific and textual aspects. The labels consist of incoherent text and abstract characters (gibberish), making the diagram unusable for learning purposes. Fidelity is low, as the requirement for "labeled structures" is not met in an intelligible manner.

google gemini-3-pro-image-preview

4.9/10 26.2 s

Cost < 0.01 $

Resolution 2752 x 1536

Time 26.2 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.88

Review

The image presents a high-quality medical aesthetic with clear visual rendering, but it fails heavily on the informative aspect. The generated text consists of incoherent characters and pseudo-words that do not correspond to any real anatomical structure, making the diagram unusable for learning purposes. Prompt fidelity is therefore very low, as the 'labeled' constraint is not functionally respected.

google imagen-4.0-fast-generate-001

4.5/10 4.5 s

Cost 0.02 $

Resolution 1408 x 768

Time 4.5 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.5

Review

Although the image visually resembles an anatomical cross-section, it fails heavily on the textual aspect: the labels consist of incoherent characters and illegible pseudo-text, failing to follow the "all structures labeled" instruction. Consequently, fidelity is very low because the image's informative purpose is not met.

google imagen-4.0-generate-001

4.6/10 8.9 s

Cost 0.04 $

Resolution 1408 x 768

Time 8.9 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.63

Review

The image presents a convincing medical aesthetic and good resolution, but it fails on the crucial point of textual accuracy. The labels are illegible scribbles rather than English text, which directly contradicts the explicit prompt instruction. Consequently, fidelity is very low despite a successful visual composition.

google imagen-4.0-ultra-generate-001

4.8/10 13.6 s

Cost 0.08 $

Resolution 1408 x 768

Time 13.6 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.75

Review

The image presents a high-quality medical aesthetic with a clear composition, but it fails heavily on textual constraints. Although the prompt requires structures labeled in English, the generated text consists of incoherent characters and unreadable pseudo-words, making the image useless for educational purposes. Fidelity is therefore very low, as the functional aspect (labels) is not met.

openai chatgpt-image-latest

4.9/10 44.4 s

Cost 0.22 $

Resolution 1536 x 1024

Time 44.4 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.88

Review

The image is aesthetically successful with high-quality anatomical rendering, but it fails on the functional aspect of the request. The generated text is largely illegible or consists of incoherent characters (gibberish), making the labels unusable for an educational diagram. Fidelity is heavily impacted because the prompt explicitly required labeled structures, which was not intelligibly respected.

segmind ideogram-3

6.4/10 13.1 s

Cost 0.04 $

Resolution 1344 x 768

Time 13.1 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

6.38

Review

The image is aesthetically superb, featuring professional medical-grade technical quality and composition. However, the model fails heavily on textual accuracy: the labels consist of illegible pseudo-text or incoherent words, which directly contradicts the prompt requirement ("all structures labeled"). Fidelity is therefore penalized, as the essential informative aspect of the request has not been met.

segmind seedream-4.5

4.6/10 16.1 s

Cost 0.04 $

Resolution 2560 x 1472

Time 16.1 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.63

Review

The image presents a convincing medical aesthetic and good overall composition, but it fails heavily on the requested functional aspect. The text is completely illegible and consists of gibberish, making the labeling of structures impossible. Fidelity is heavily penalized because the prompt explicitly required labeled structures, which is the primary purpose of an anatomical diagram.

segmind seedream-v5-lite

4.4/10 34.2 s

Cost 0.04 $

Resolution 2848 x 1600

Time 34.2 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.38

Review

The image generally respects the requested anatomical structure, but fails significantly on textual precision: the labels consist of illegible and incoherent characters, which contradicts the requirement for English text. Fidelity is penalized because the informative and educational aspect (essential for a labeled anatomical cross-section) is compromised by the model's inability to generate readable text.

xai grok-imagine-image

4.8/10 6.3 s

Cost 0.02 $

Resolution 2752 x 1504

Time 6.3 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.75

Review

The image is aesthetically successful and presents a credible anatomical structure, but it fails heavily on the requested functional aspect. The text is almost illegible or composed of incoherent characters (gibberish), making labeling impossible. Fidelity is low because the primary constraint, "with all structures labeled," is not met in an actionable way.

xai grok-imagine-image-pro

4.9/10 16.0 s

xai grok-imagine-image-pro

Cost 0.07 $

Resolution 2816 x 1536

Time 16.0 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.88

Review

Although the image is aesthetically successful and resembles a high-quality medical illustration, it fails on a crucial functional aspect: the text is completely incoherent and composed of unreadable gibberish. Fidelity is heavily impacted because the prompt explicitly required labeled structures, which was not respected here.

Digestive system

image

google gemini-2.5-flash-image

5.4/10 6.9 s

google gemini-2.5-flash-image

Cost < 0.01 $

Resolution 1344 x 768

Time 6.9 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

5.38

Review

The image is aesthetically successful with good anatomical composition, but it fails heavily on the textual aspect. Although the prompt requests a labeled illustration, the generated text consists of incoherent and unreadable gibberish, which renders the educational aspect useless and contradicts the requirement for textual precision.

google gemini-3-pro-image-preview

5.4/10 25.7 s

google gemini-3-pro-image-preview

Cost < 0.01 $

Resolution 2752 x 1536

Time 25.7 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

5.38

Review

The illustration is visually clean and well-composed, but it fails on the crucial aspect of textual accuracy. The labels consist of illegible characters or gibberish, making the image useless for educational purposes despite adhering to the requested structure.

google imagen-4.0-fast-generate-001

4.9/10 4.5 s

google imagen-4.0-fast-generate-001

Cost 0.02 $

Resolution 1408 x 768

Time 4.5 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.88

Review

The image presents a clean medical aesthetic and good overall composition, but fails heavily on textual accuracy. The labels consist of illegible characters and pseudo-words, which directly contradicts the 'labeled' and 'Text in English' instructions in the prompt. Fidelity is significantly impacted by the model's inability to generate coherent and functional text for an anatomical illustration.

google imagen-4.0-generate-001

4.9/10 9.6 s

google imagen-4.0-generate-001

Cost 0.04 $

Resolution 1408 x 768

Time 9.6 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.88

Review

The image is aesthetically pleasing and well-composed, but it fails on a functional level. Although the system is represented, the text is completely incoherent, consisting of abstract and illegible characters, which contradicts the explicit 'labeled' instruction. Fidelity is heavily penalized by the model's inability to generate usable text labels.

google imagen-4.0-ultra-generate-001

4.9/10 10.4 s

google imagen-4.0-ultra-generate-001

Cost 0.08 $

Resolution 1408 x 768

Time 10.4 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.88

Review

The image is aesthetically successful with a good medical composition, but it fails on the two critical points of the prompt: labeling and textual accuracy. The generated text consists of incoherent, illegible characters (gibberish), which renders the illustration useless for an educational task despite the visual quality of the diagram.

openai chatgpt-image-latest

5.4/10 42.3 s

openai chatgpt-image-latest

Cost 0.22 $

Resolution 1536 x 1024

Time 42.3 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

5.38

Review

The illustration is aesthetically pleasing and well-composed, but it fails on the crucial aspect of textual accuracy. The text is largely illegible or composed of incoherent characters, which contradicts the explicit request for English labels. Fidelity is penalized because the image is a generic anatomical illustration rather than a correctly labeled educational diagram.

segmind ideogram-3

9.1/10 12.2 s

segmind ideogram-3

Cost 0.04 $

Resolution 1344 x 768

Time 12.2 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

9.13

Review

The image is an excellent medical illustration, very sharp and aesthetically balanced. The model perfectly adheres to all prompt instructions, including the educational aspect, the complete system, and the accuracy of the English text labels.

segmind seedream-4.5

4.9/10 21.8 s

segmind seedream-4.5

Cost 0.04 $

Resolution 2560 x 1472

Time 21.8 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.88

Review

The image is aesthetically pleasing with good visual clarity, but it fails heavily on text instructions. The text is a mixture of incoherent characters and gibberish, making labeling impossible and non-functional. Fidelity is severely impacted as the prompt explicitly required labels in English, which was not respected.

segmind seedream-v5-lite

4.4/10 38.6 s

segmind seedream-v5-lite

Cost 0.04 $

Resolution 2848 x 1600

Time 38.6 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.38

Review

The image fails on its most critical point: textual accuracy. Although the anatomical illustration is generally correct in its structure, the text is completely illegible, consisting of incoherent characters and gibberish, which contradicts the explicit requirements for being 'labeled' and having 'Text in English'. Fidelity is therefore heavily impacted by the model's inability to generate intelligible text.

xai grok-imagine-image

4.9/10 7.3 s

xai grok-imagine-image

Cost 0.02 $

Resolution 2752 x 1504

Time 7.3 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.88

Review

The image is visually appealing and well-composed, but it fails heavily on textual accuracy, a classic issue with image generation models. Although the digestive system is represented, the labels consist of incoherent and illegible gibberish, which directly contradicts the 'labeled' and 'Text in English' instructions. Fidelity is therefore heavily penalized, as the requested educational and textual aspects are not met.

xai grok-imagine-image-pro

4.9/10 15.0 s

xai grok-imagine-image-pro

Cost 0.07 $

Resolution 2816 x 1536

Time 15.0 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.88

Review

The image is aesthetically pleasing and well-composed, but it fails heavily on textual constraints. Although the system is represented, the text is completely illegible or consists of incoherent characters (gibberish), which violates the explicit request for labeling and English text. Fidelity is therefore heavily penalized by the model's inability to generate functional annotations.

Muscular system

image

google gemini-2.5-flash-image

4.8/10 13.0 s

google gemini-2.5-flash-image

Cost < 0.01 $

Resolution 1344 x 768

Time 13.0 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.75

Review

The image presents good aesthetic quality and correct framing for an anatomical plate, but it fails regarding textual aspects. The text is completely illegible and consists of gibberish, which contradicts the explicit requirement for muscle labeling. Fidelity is heavily impacted because the essential informative aspect requested by the prompt is not met.

google gemini-3-pro-image-preview

4.9/10 26.8 s

google gemini-3-pro-image-preview

Cost < 0.01 $

Resolution 2752 x 1536

Time 26.8 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.88

Review

Although the image is aesthetically pleasing and adheres to the request for anterior and posterior views, it fails significantly on textual accuracy. The labels are illegible scribbles and do not constitute real English, which directly contradicts the explicit prompt constraint. Fidelity is therefore heavily impacted by the model's inability to generate coherent text for an anatomical diagram.

google imagen-4.0-fast-generate-001

4.6/10 4.6 s

google imagen-4.0-fast-generate-001

Cost 0.02 $

Resolution 1408 x 768

Time 4.6 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.63

Review

The image respects the requested visual structure (anterior and posterior views), but fails heavily on the textual aspect. The generated text is unreadable gibberish and does not constitute actual English labels, which directly contravenes an explicit instruction in the prompt. Consequently, fidelity is heavily penalized despite a good overall aesthetic quality.

google imagen-4.0-generate-001

4.6/10 8.3 s

google imagen-4.0-generate-001

Cost 0.04 $

Resolution 1408 x 768

Time 8.3 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.63

Review

The image respects the request for anterior and posterior views but fails critically on the textual aspect: the labels are illegible scribbles and do not constitute real English. Fidelity is heavily penalized because the explicit instruction to name the main muscles is not met, turning the image into a simple anatomical illustration without the requested captions.

google imagen-4.0-ultra-generate-001

4.8/10 12.2 s

google imagen-4.0-ultra-generate-001

Cost 0.08 $

Resolution 1408 x 768

Time 12.2 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.75

Review

The image respects the requested anatomical structure (anterior and posterior views), but fails heavily on the text. The labels are illegible, consisting of incoherent characters and pseudo-letters that do not constitute real English. Fidelity is therefore heavily penalized as the 'main muscles labeled in English' constraint is not met.

openai chatgpt-image-latest

4.8/10 45.7 s

openai chatgpt-image-latest

Cost 0.21 $

Resolution 1536 x 1024

Time 45.7 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.75

Review

The image is aesthetically pleasing and demonstrates good technical quality, but it fails on the requested functional aspects. The text is nearly illegible or consists of incoherent characters (textual hallucinations), rendering the "labeled" aspect useless. Furthermore, the image does not show the anterior and posterior views as clearly separate figures, thereby failing to meet a crucial part of the instructions.

segmind ideogram-3

6.4/10 14.1 s

segmind ideogram-3

Cost 0.04 $

Resolution 1344 x 768

Time 14.1 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

6.38

Review

The image is visually impressive, featuring excellent rendering quality and a balanced composition. However, fidelity is compromised by the model's inability to generate legible and anatomically correct text (very low text accuracy), turning labels into illegible pseudo-characters. Although both views (anterior and posterior) are present, the educational aspect promised by the prompt is undermined by the typographic failure.

segmind seedream-4.5

4.8/10 15.6 s

segmind seedream-4.5

Cost 0.04 $

Resolution 2560 x 1472

Time 15.6 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.75

Review

The image is aesthetically pleasing with good overall anatomical quality, but it fails heavily on the text instructions. The labels are illegible, inconsistent, or composed of nonsensical characters, which directly contradicts the prompt's requirement of 'main muscles labeled in English.' Consequently, fidelity is very low despite a beautiful visual composition.

segmind seedream-v5-lite

2.5/10 45.4 s

segmind seedream-v5-lite

Cost 0.04 $

Resolution 2848 x 1600

Time 45.4 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

2.5

Review

The image fails almost completely on textual accuracy, displaying incoherent and illegible text that does not correspond to any real anatomical terms. Although the overall anatomical structure is suggested, the model fails to follow the requested labeling instructions, making the image useless for educational purposes.

xai grok-imagine-image

4.6/10 8.7 s

xai grok-imagine-image

Cost 0.02 $

Resolution 2752 x 1504

Time 8.7 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.63

Review

Although the image is aesthetically successful and respects the requested anatomical view, it fails almost completely on textual accuracy. The labels are abstract, illegible shapes rather than actual English text, which contradicts the explicit instruction "main muscles labeled in English." Fidelity is therefore heavily penalized as the crucial informative aspect is missing.

xai grok-imagine-image-pro

4.8/10 19.5 s

xai grok-imagine-image-pro

Cost 0.07 $

Resolution 2816 x 1536

Time 19.5 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.75

Review

The image presents good aesthetic quality and correct framing for an anatomical plate, but fails heavily on textual precision. The labels are illegible, contain inconsistent characters, and do not fulfill the requested labeling function. Although the anterior and posterior views are suggested, the inability to provide correct text renders the image useless for its intended scientific purpose.

Canine anatomy

image

google gemini-2.5-flash-image

6.3/10 7.5 s

google gemini-2.5-flash-image

Cost < 0.01 $

Resolution 1344 x 768

Time 7.5 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

6.25

Review

The image is aesthetically very successful, with rendering quality and composition worthy of a veterinary manual. However, the text is total gibberish, which severely undermines the scientific accuracy and the fidelity to the prompt requesting English text. While the structure of the cross-section is well-maintained, the lack of legible labels renders the illustration unusable for its primary purpose.

google gemini-3-pro-image-preview

6.8/10 29.0 s

google gemini-3-pro-image-preview

Cost < 0.01 $

Resolution 2752 x 1536

Time 29.0 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

6.75

Review

The image is visually impressive, boasting excellent technical quality and a careful composition typical of a medical illustration. However, its fidelity is undermined by the model's inability to generate legible and coherent English text, producing abstract glyphs instead of actual anatomical labels. While the visual aspect of the cross-section is successful, the informative (textual) aspect is a complete failure.

google imagen-4.0-fast-generate-001

6.4/10 4.2 s

google imagen-4.0-fast-generate-001

Cost 0.02 $

Resolution 1408 x 768

Time 4.2 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

6.38

Review

The image accurately respects the subject of canine anatomy and the requested cross-section view. However, the textual precision is very poor, featuring illegible and incoherent characters that do not constitute actual English. Fidelity is impacted by the failure to meet the explicit constraint 'Text in English'.

google imagen-4.0-generate-001

5.9/10 8.0 s

google imagen-4.0-generate-001

Cost 0.04 $

Resolution 1408 x 768

Time 8.0 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

5.88

Review

The image is technically of high quality, featuring a convincing anatomical plate aesthetic. However, the model fails heavily on text accuracy, displaying incoherent and illegible characters instead of the requested English. Fidelity is penalized by this major flaw in text rendering, even though the cross-section view and the subject matter are respected.

google imagen-4.0-ultra-generate-001

6.3/10 11.2 s

google imagen-4.0-ultra-generate-001

Cost 0.08 $

Resolution 1408 x 768

Time 11.2 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

6.25

Review

The image is technically excellent, with a very successful medical textbook aesthetic. However, it fails on the textual aspect: the labels consist of illegible characters and pseudo-text, which violates the explicit request for English text. Fidelity is penalized by this inability to generate coherent annotations despite a convincing anatomical composition.

openai chatgpt-image-latest

6.4/10 44.2 s

openai chatgpt-image-latest

Cost 0.22 $

Resolution 1536 x 1024

Time 44.2 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

6.38

Review

The image accurately captures the subject of canine anatomy in cross-section with a veterinary manual aesthetic. However, textual precision is very poor, featuring illegible or nonexistent characters (text hallucinations), which undermines the requested pedagogical function. While fidelity to the subject is correct, it is penalized by the failure to meet the explicit textual constraint.

segmind ideogram-3

8.3/10 13.7 s

segmind ideogram-3

Cost 0.04 $

Resolution 1344 x 768

Time 13.7 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

8.25

Review

The image adheres very well to the subject of veterinary anatomy, featuring a clear cross-section view and a high-quality scientific plate aesthetic. Prompt fidelity is excellent, although text accuracy is slightly diminished by characters or labels that are sometimes difficult to read or slightly inconsistent, which is common with this type of generation.

segmind seedream-4.5

6.4/10 21.6 s

segmind seedream-4.5

Cost 0.04 $

Resolution 2560 x 1472

Time 21.6 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

6.38

Review

The image adheres well to the composition of a veterinary anatomical illustration with a coherent cross-section view. However, the textual precision is very poor, featuring illegible characters and nonsensical text that fails to follow the request for English labeling. Fidelity is impacted by the model's inability to generate intelligible text despite explicit instructions.

segmind seedream-v5-lite

4.1/10 33.0 s

segmind seedream-v5-lite

Cost 0.04 $

Resolution 2848 x 1600

Time 33.0 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.13

Review

The image fails in terms of scientific and textual accuracy: the organs are poorly defined and the text is completely illegible or composed of incoherent characters, which contradicts the request for a veterinary illustration. Although the general aesthetic evokes an anatomical plate, prompt fidelity is low because the text is not in understandable English and the anatomical structure is incorrect.

xai grok-imagine-image

6.4/10 6.1 s

xai grok-imagine-image

Cost 0.02 $

Resolution 2752 x 1504

Time 6.1 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

6.38

Review

The image adheres well to the subject of canine anatomy and the requested cross-section view. However, the text precision is very poor, featuring illegible or incoherent characters that fail to follow the "Text in English" instruction. While the visual quality is good, the scientific aspect is undermined by the model's inability to generate correct text labels.

xai grok-imagine-image-pro

6.6/10 14.4 s

xai grok-imagine-image-pro

Cost 0.07 $

Resolution 2816 x 1536

Time 14.4 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

6.63

Review

The image accurately respects the subject of canine anatomy and the requested cross-section view with a clean medical aesthetic. However, fidelity is penalized by the AI's inability to generate legible and coherent English text, producing unreadable glyphs instead of labels. The technical quality is overall good, but the text is a total failure relative to the explicit instruction.

Equine locomotion

image

google gemini-2.5-flash-image

4.9/10 6.2 s

google gemini-2.5-flash-image

Cost < 0.01 $

Resolution 1344 x 768

Time 6.2 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.88

Review

The image fails to respect the textual and conceptual constraints of the prompt. Although the aesthetics are correct, the model fails to distinctly represent the four gaits in a sequential manner, and the generated text is incoherent or nonsensical, which heavily impacts fidelity and textual precision.

google gemini-3-pro-image-preview

4.9/10 26.7 s

google gemini-3-pro-image-preview

Cost < 0.01 $

Resolution 2752 x 1536

Time 26.7 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.88

Review

The image fails to respect both textual and conceptual constraints. Although the aesthetics are acceptable, the model fails to distinctly illustrate the four requested gaits in a sequential manner and produces incoherent or missing text where it was explicitly required. Prompt fidelity is low because the pedagogical demonstration structure is not maintained.

google imagen-4.0-fast-generate-001

4.9/10 5.1 s

google imagen-4.0-fast-generate-001

Cost 0.02 $

Resolution 1408 x 768

Time 5.1 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.88

Review

The image fails to respect the textual and conceptual constraints of the prompt. Although the aesthetics are correct, the model fails to display legible English text and does not clearly present the four distinct gaits in a sequential manner as requested.

google imagen-4.0-generate-001

4.9/10 9.0 s

google imagen-4.0-generate-001

Cost 0.04 $

Resolution 1408 x 768

Time 9.0 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.88

Review

The image fails to respect both textual and conceptual constraints: the text is incoherent and does not correctly name the gaits, and the image does not clearly present four distinct sequential phases of the four gaits (walk, trot, canter, slow gait). Although the aesthetic quality is good, prompt fidelity is mediocre due to the lack of an explicit sequential structure and the failure of the text.

google imagen-4.0-ultra-generate-001

4.9/10 14.1 s

google imagen-4.0-ultra-generate-001

Cost 0.08 $

Resolution 1408 x 768

Time 14.1 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.88

Review

The image fails to respect both textual and structural constraints. Although the aesthetics are of good quality, the model fails to generate legible English text and does not clearly present the four gaits in a sequential manner as requested. Consequently, prompt fidelity is very low despite the high overall visual quality.

openai chatgpt-image-latest

4.9/10 50.3 s

openai chatgpt-image-latest

Cost 0.22 $

Resolution 1536 x 1024

Time 50.3 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.88

Review

The image fails on the crucial point of fidelity: it does not present the four requested gaits, but instead seems to show a repetitive or incomplete movement sequence. Furthermore, the generated text is incoherent and fails to meet the requirement for textual clarity in English, although the aesthetic quality of the illustration is correct.

segmind ideogram-3

8.8/10 14.1 s

segmind ideogram-3

Cost 0.04 $

Resolution 1344 x 768

Time 14.1 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

8.75

Review

The image is aesthetically superb and perfectly adheres to the request for sequences illustrating gaits. The text is seamlessly integrated and correctly spelled, which is a strength of this model. The composition is balanced, although the "sequential" aspect is handled in an artistic rather than strictly technical manner.

segmind seedream-4.5

4.9/10 15.1 s

segmind seedream-4.5

Cost 0.04 $

Resolution 2560 x 1472

Time 15.1 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.88

Review

The image is aesthetically pleasing and demonstrates high artistic quality, but it fails to meet the structural and textual constraints. The model fails to distinctly illustrate the four gaits in a sequential manner, and the generated text is illegible or incoherent, which seriously undermines prompt fidelity.

segmind seedream-v5-lite

4.4/10 33.3 s

segmind seedream-v5-lite

Cost 0.04 $

Resolution 2848 x 1600

Time 33.3 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.38

Review

The image fails significantly on fidelity and textual accuracy: the text is completely illegible and does not meet the prompt's requirement for clear English text. Furthermore, the image does not present the four requested gaits in a sequential and distinct manner, but rather resembles a confused composition of horses in motion.

xai grok-imagine-image

4.9/10 6.0 s

xai grok-imagine-image

Cost 0.02 $

Resolution 2752 x 1504

Time 6.0 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.88

Review

The image fails to respect the textual and structural constraints of the prompt. Although the aesthetics are correct, the model fails to display legible English text and does not clearly present the four gaits in a sequential and distinct manner as requested. Fidelity is heavily penalized by the lack of pedagogical clarity and the presence of incoherent text.

xai grok-imagine-image-pro

4.9/10 14.8 s

xai grok-imagine-image-pro

Cost 0.07 $

Resolution 2816 x 1536

Time 14.8 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

4.88

Review

The image is technically successful with a beautiful aesthetic, but it fails on textual and structural instructions. The model failed to integrate legible English text and does not clearly present the four gaits in a sequential and distinct manner as requested. Fidelity is heavily penalized by the absence of the textual element and the lack of precision in the sequence of gaits.

Animal vs plant cell

image

google gemini-2.5-flash-image

5.9/10 7.1 s

google gemini-2.5-flash-image

Cost < 0.01 $

Resolution 1344 x 768

Time 7.1 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

5.88

Review

The image respects the requested side-by-side comparison structure but fails heavily on textual accuracy, featuring illegible or inconsistent labels (text hallucinations). While the visual quality and composition are satisfactory for an educational diagram, the inability to generate correct English text significantly degrades its fidelity to the intended use.

google gemini-3-pro-image-preview

6.4/10 27.2 s

google gemini-3-pro-image-preview

Cost < 0.01 $

Resolution 2752 x 1536

Time 27.2 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

6.38

Review

The image is visually stunning, featuring excellent composition and great technical sharpness. However, fidelity is penalized by the presence of incoherent text and the "gibberish" typical of image generation models, which renders the labels unreadable despite the explicit request for English text. While the comparative structure is maintained, the educational value is compromised by the textual inaccuracies.

google imagen-4.0-fast-generate-001

5.4/10 4.7 s

google imagen-4.0-fast-generate-001

Cost 0.02 $

Resolution 1408 x 768

Time 4.7 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

5.38

Review

The image respects the side-by-side comparison structure and the visual quality is satisfactory. However, fidelity is heavily penalized because the text is completely illegible and incoherent (character hallucinations), meaning the "with labels" and "Text in English" instructions are not functionally met.

google imagen-4.0-generate-001

6.3/10 10.7 s

google imagen-4.0-generate-001

Cost 0.04 $

Resolution 1408 x 768

Time 10.7 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

6.25

Review

The image is visually impressive, featuring excellent rendering quality and a balanced composition. However, fidelity is compromised because the text is nearly illegible or consists of incoherent characters (textual hallucinations), failing the explicit request for readable English labels.

google imagen-4.0-ultra-generate-001

6.3/10 16.8 s

google imagen-4.0-ultra-generate-001

Cost 0.08 $

Resolution 1408 x 768

Time 16.8 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

6.25

Review

The image is visually stunning, featuring excellent rendering quality and a balanced composition. However, it fails heavily on text accuracy: the labels consist of illegible or incoherent characters, undermining the requested educational function. Fidelity is impacted because, although a side-by-side comparison is present, the "with labels" instruction is not functionally met due to the model's inability to generate readable text.

openai chatgpt-image-latest

6.4/10 43.8 s

openai chatgpt-image-latest

Cost 0.22 $

Resolution 1536 x 1024

Time 43.8 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

6.38

Review

The image is aesthetically very successful, featuring a clear composition and high visual quality. However, fidelity is penalized by the model's inability to generate legible and correct text (textual hallucinations), which is crucial for a task requiring labels. Although the side-by-side comparison concept is respected, the informative aspect is compromised by the inaccurate text.

segmind ideogram-3

8.0/10 13.5 s

segmind ideogram-3

Cost 0.04 $

Resolution 1344 x 768

Time 13.5 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

8.0

Review

The image perfectly adheres to the requested structure (side-by-side comparison, cross-sections) and the aesthetic quality is high. However, the textual accuracy is flawed as the labels are either illegible or composed of incoherent characters, which is a major defect for an educational diagram.

segmind seedream-4.5

5.5/10 17.7 s

segmind seedream-4.5

Cost 0.04 $

Resolution 2560 x 1472

Time 17.7 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

5.5

Review

The image respects the requested side-by-side comparison structure, but fails heavily on textual accuracy, producing illegible and incoherent text instead of actual labels. The visual quality is acceptable but lacks the scientific clarity expected for an educational diagram.

segmind seedream-v5-lite

5.5/10 37.7 s

segmind seedream-v5-lite

Cost 0.04 $

Resolution 2848 x 1600

Time 37.7 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

5.5

Review

The image adheres to the requested side-by-side comparison structure, but fails critically on text accuracy, which consists of incoherent and illegible characters. Although the visual composition clearly distinguishes between the two cell types, the inability to generate usable text labels significantly degrades the educational value of the creation.

xai grok-imagine-image

5.9/10 6.7 s

xai grok-imagine-image

Cost 0.02 $

Resolution 2752 x 1504

Time 6.7 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

5.88

Review

The image adheres to the side-by-side comparison structure and the visual quality is good, but the model fails heavily on text accuracy. The labels consist of inconsistent and illegible characters (gibberish), which renders the educational aspect useless despite the aesthetic composition.

xai grok-imagine-image-pro

6.3/10 15.6 s

xai grok-imagine-image-pro

Cost 0.07 $

Resolution 2816 x 1536

Time 15.6 s

Matania Judgment

Quality

Composition

Creativity

Text accuracy

Fidelity

Overall

6.25

Review

The image is visually impressive, featuring excellent technical quality and a balanced composition. However, it fails in terms of textual accuracy; the text is either completely illegible or composed of gibberish, which is a major flaw for a task requiring labels. Fidelity is penalized because, although the comparative structure is present, the essential informative aspect provided by the text is not fulfilled.