Use image inputs

Test vision-enabled models by uploading images and pairing them with text prompts to build multimodal test cases.
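As a rough illustration of pairing an image with a text prompt, the sketch below packages both into a single test-case record. The `MultimodalCase` class, the `build_case` helper, and the choice of a base64 data URL are assumptions made for this example, not part of any documented API; the format your tool expects may differ.

```python
import base64
import mimetypes
from dataclasses import dataclass


@dataclass
class MultimodalCase:
    """Hypothetical test-case record: one text prompt plus one image."""
    prompt: str
    image_data_url: str


def to_data_url(image_bytes: bytes, filename: str) -> str:
    """Encode raw image bytes as a base64 data URL.

    The MIME type is guessed from the filename extension only;
    the bytes themselves are not inspected.
    """
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image type: {filename}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_case(prompt: str, image_bytes: bytes, filename: str) -> MultimodalCase:
    """Combine a text prompt and an image into one multimodal test case."""
    return MultimodalCase(
        prompt=prompt,
        image_data_url=to_data_url(image_bytes, filename),
    )
```

In practice the bytes would come from the uploaded file, for example `open(path, "rb").read()`, and the resulting record would be handed to whatever client sends requests to the model.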

Control image preprocessing (resize, crop) and annotate each image with its expected output so evaluators have a reference to compare against.
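The resize and crop settings come down to simple geometry. The sketch below computes a centered crop box and a downscaled size, then stores them alongside an expected output. All names here (`center_crop_box`, `fit_within`, `AnnotatedImage`) are hypothetical, and the code only plans the operations; applying them to pixels would need an imaging library.

```python
from dataclasses import dataclass


def center_crop_box(width: int, height: int, aspect: float) -> tuple:
    """Return (left, top, right, bottom) of the largest centered box
    whose width / height ratio equals `aspect`."""
    if width <= 0 or height <= 0 or aspect <= 0:
        raise ValueError("width, height and aspect must be positive")
    if width / height > aspect:
        # Image is too wide for the target aspect: trim the sides.
        new_w = round(height * aspect)
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    # Image is too tall (or already matches): trim top and bottom.
    new_h = round(width / aspect)
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)


def fit_within(width: int, height: int, max_side: int) -> tuple:
    """Scale (width, height) down so the longer side is at most `max_side`,
    preserving aspect ratio. Images that already fit are not upscaled."""
    scale = min(1.0, max_side / max(width, height))
    return (max(1, round(width * scale)), max(1, round(height * scale)))


@dataclass
class AnnotatedImage:
    """Hypothetical record tying preprocessing settings to an expected output."""
    crop_box: tuple
    target_size: tuple
    expected_output: str


def plan(width: int, height: int, aspect: float, max_side: int, expected: str) -> AnnotatedImage:
    """Crop first, then resize the cropped region, and attach the annotation."""
    box = center_crop_box(width, height, aspect)
    cropped_w, cropped_h = box[2] - box[0], box[3] - box[1]
    return AnnotatedImage(box, fit_within(cropped_w, cropped_h, max_side), expected)
```

Cropping before resizing keeps the most detail in the region the model actually sees; reversing the order is also reasonable when the crop is defined in output pixels.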

Use saved views to reproduce tests and share results across the team.

Monitor latency and cost separately for multimodal runs to optimize both response time and budget.
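One way to keep these numbers apart is to report latency and cost as distinct metrics and to split runs by whether they included an image. The sketch below assumes each run is a dictionary with hypothetical `has_image`, `latency_ms`, and `cost_usd` fields; the field names and the text-only comparison group are illustrative assumptions, not a documented export format.

```python
from collections import defaultdict
from statistics import mean


def summarize(runs: list) -> dict:
    """Group runs into 'multimodal' and 'text_only' and report
    mean latency and total cost for each group separately."""
    grouped = defaultdict(list)
    for run in runs:
        kind = "multimodal" if run["has_image"] else "text_only"
        grouped[kind].append(run)

    summary = {}
    for kind, items in grouped.items():
        summary[kind] = {
            "runs": len(items),
            "mean_latency_ms": mean(r["latency_ms"] for r in items),
            "total_cost_usd": sum(r["cost_usd"] for r in items),
        }
    return summary
```

Keeping the two metrics in separate fields makes it easier to see when a preprocessing change (for example a smaller resize target) lowers cost without improving latency, or the reverse.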