Cohere's new Command A Vision model is designed to handle images, diagrams, PDFs, and other types of visual data. Cohere says the model outperforms GPT-4.1, Llama 4 Maverick, Pixtral Large, and Mistral Medium 3 on standard vision benchmarks.

The model's OCR can recognize both the text and the structure of documents such as invoices and forms, outputting the extracted data in structured JSON. Command A Vision can also process real-world images, like identifying potential risks in industrial environments, the company says.

Image: Cohere

Command A Vision is available through the Cohere platform and for research on Hugging Face. The model can run locally on two A100 GPUs, or on a single H100 when using 4-bit quantization.

Black Forest Labs and Krea AI have released FLUX.1 Krea [dev], an open text-to-image model designed to generate more realistic images with fewer of the exaggerated, AI-typical textures.

The model is based on FLUX.1 [dev] and remains fully compatible with its architecture. It was built for flexible customization and easy integration into downstream applications. Model weights are available on Hugging Face, with commercial licenses offered through the BFL Licensing Portal. Partners like FAL, Replicate, Runware, DataCrunch, and TogetherAI provide API access.


Google is rolling out Opal, a new experimental tool that lets users build AI-powered mini-apps with simple natural language prompts, no coding required.

Opal takes descriptions written in everyday language and automatically connects prompts, AI models, and other tools to turn them into working apps, displaying everything as a visual workflow.

Once an app is built, users can share it with others, who only need a Google account to use it. Opal is launching as a public beta in the US, with plans to develop the tool further based on feedback from the community.
