Cohere Launches Command A Vision for Diverse Visual AI
Cohere has introduced Command A Vision, a new model built to process a wide range of visual data, including images, diagrams, and PDF documents. The release expands the range of data formats that Cohere’s models can handle in AI applications.
The company states that Command A Vision outperforms several leading models, including GPT-4.1, Llama 4 Maverick, Pixtral Large, and Mistral Medium 3, on standard vision benchmarks.
A key feature of the model is its Optical Character Recognition (OCR) capability, which not only recognizes text but also interprets the structural layout of documents such as invoices and forms. This lets it extract data and return it as structured JSON, streamlining document processing for businesses.
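To illustrate what consuming such structured output might look like, here is a minimal Python sketch that parses a JSON string and checks it against a small set of expected invoice fields. The field names, the `parse_invoice` helper, and the sample string are all hypothetical and made up for illustration; the article does not specify the schema Command A Vision actually returns.

```python
import json

# Hypothetical invoice fields and their expected types (illustration only;
# not a schema documented by Cohere).
REQUIRED_FIELDS = {
    "invoice_number": str,
    "date": str,
    "total": (int, float),
    "line_items": list,
}


def parse_invoice(raw: str) -> dict:
    """Parse a JSON string and check that the expected fields are present."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], expected_type):
            raise ValueError(f"wrong type for field: {name}")
    return data


# Made-up sample standing in for a model response.
sample = (
    '{"invoice_number": "INV-001", "date": "2025-01-15", "total": 129.5, '
    '"line_items": [{"description": "Widget", "quantity": 2, "unit_price": 64.75}]}'
)

invoice = parse_invoice(sample)
print(invoice["invoice_number"], invoice["total"])
```

Validating the output before passing it downstream is one way to catch a missing or mistyped field early, rather than assuming every extraction is complete.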
Beyond document processing, Command A Vision can also analyze real-world images. For instance, according to Cohere, it can identify potential hazards or critical elements in industrial settings.
Command A Vision is available now on the Cohere platform, and on Hugging Face for research purposes. For local deployment, it can run on either two A100 GPUs or a single H100 GPU with 4-bit quantization.
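As a rough illustration of why quantization matters for hardware requirements, the following Python sketch estimates the memory needed for model weights alone at different bit widths. The parameter count is a placeholder chosen for easy arithmetic, not a figure from the article, and the `weight_memory_gb` helper is hypothetical. Actual memory use will be higher, since inference also needs room for activations and caches.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Placeholder parameter count for illustration; NOT the model's actual size.
HYPOTHETICAL_PARAMS = 100e9

fp16_gb = weight_memory_gb(HYPOTHETICAL_PARAMS, 16)
int4_gb = weight_memory_gb(HYPOTHETICAL_PARAMS, 4)
print(f"16-bit weights: {fp16_gb:.0f} GB, 4-bit weights: {int4_gb:.0f} GB")
```

Under these assumptions, moving from 16-bit to 4-bit weights cuts the weight footprint to a quarter, which is the kind of reduction that can bring a large model within reach of one or two GPUs.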