Gemini Vision Skill

This skill has been merged into AI Multimodal, which provides comprehensive multimedia capabilities.

Redirecting to AI Multimodal

Image analysis - Captioning, OCR, object detection, visual Q&A, segmentation
Audio processing - Transcription and summarization of audio up to 9.5 hours long
Video understanding - Scene detection and temporal analysis of video up to 6 hours long
Document extraction - PDF tables, forms, charts, diagrams
Image generation - Text-to-image with Imagen 4
Video generation - Text-to-video with Veo 3

Quick Examples

"Analyze this product image and extract name, color, condition"
"Extract text from invoice.jpg and return as JSON"
"Compare these before/after photos and list differences"
"Detect all objects in image with bounding boxes"
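
As a rough illustration of what a prompt such as "Extract text from invoice.jpg and return as JSON" might turn into, the sketch below builds a request body with an inline base64 image and a text prompt. It assumes the payload shape of the public Gemini REST generateContent endpoint (contents, parts, inline_data, generationConfig.responseMimeType). The helper name build_image_request and its want_json flag are hypothetical, and AI Multimodal may construct its requests differently. No network call is made.

```python
import base64
import json
import mimetypes


def build_image_request(image_bytes: bytes, filename: str, prompt: str,
                        want_json: bool = False) -> dict:
    """Build a request body for an image + text prompt.

    Hypothetical helper: the field names below are assumed to follow the
    Gemini REST generateContent format and are not taken from this skill.
    """
    # Guess the MIME type from the file extension (e.g. .jpg -> image/jpeg).
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"not a recognized image file: {filename}")

    request = {
        "contents": [{
            "parts": [
                # Image bytes are sent inline as base64 text.
                {"inline_data": {
                    "mime_type": mime_type,
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
                # The natural-language instruction, as in the examples above.
                {"text": prompt},
            ]
        }]
    }
    if want_json:
        # Assumed option asking the model to reply with JSON only.
        request["generationConfig"] = {"responseMimeType": "application/json"}
    return request


if __name__ == "__main__":
    # Placeholder bytes stand in for the contents of invoice.jpg.
    placeholder = b"\xff\xd8\xff\xe0 placeholder image bytes"
    body = build_image_request(
        placeholder,
        "invoice.jpg",
        "Extract text from this invoice and return as JSON",
        want_json=True,
    )
    print(json.dumps(body, indent=2))
```

The same body shape would serve the other examples by changing only the prompt text. Comparing before/after photos would presumably add a second inline_data part, though that is not shown here.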

→ Go to AI Multimodal for full documentation.

Key Takeaway

Use AI Multimodal for all Gemini-powered image, audio, video, and document processing.