
Google's AI infrastructure is under strain as demand for its latest models increases. Product manager Logan Kilpatrick responded to complaints about the limited availability of Gemini 2.5 Pro Deep Think, explaining, "the release is constrained because this is a big model and takes a boat load of compute to run, when our TPU's are already burning to keep up with massive growth on Veo, Gemini 2.5 pro, AI mode rollout to hundreds of millions, etc."

Kilpatrick addressed criticism after users pointed out that, despite strong benchmark scores, Gemini 2.5 Pro Deep Think is difficult to use due to access restrictions. Even Ultra subscribers can only make a handful of requests per day as the system struggles to keep up with demand.

Image: Kilpatrick via X

Wan2.2 A14B now tops the rankings for open-source video models, according to Artificial Analysis. It ranks seventh for text-to-video and fourteenth for image-to-video, with the lower placement in the latter likely due to its 16 frames per second output, compared to 24 fps from some competitors. Among open models, Wan2.2 A14B leads the field, but it still trails closed models like Veo 3 and Seedance 1.0 in overall performance. Pricing, however, is often much lower, depending on the provider.

Image: Artificial Analysis

Uber Eats now manipulates food images using generative AI.

Uber Eats is now using generative AI to identify and enhance low-quality food photos on its menus. The technology does more than adjust lighting, resolution, or cropping. It can move food onto different plates or backgrounds, and even modify the food itself, making portions look bigger or digitally filling in gaps for a more polished look.

This approach goes further than traditional retouching or generic stock photos. The AI is capable of generating convincing images of dishes that, in some cases, never actually existed in this form.

Image: Uber

Cohere's new Command A Vision model is designed to handle images, diagrams, PDFs, and other types of visual data. Cohere says the model outperforms GPT-4.1, Llama 4 Maverick, Pixtral Large, and Mistral Medium 3 on standard vision benchmarks.

The model's OCR can recognize both the text and the structure of documents such as invoices and forms, outputting the extracted data as structured JSON. Command A Vision can also interpret real-world images, for example identifying potential hazards in industrial environments, the company says.

Image: Cohere

Command A Vision is available through the Cohere platform and for research on Hugging Face. The model can run locally on either two A100 GPUs or a single H100 with 4-bit quantization.


Black Forest Labs and Krea AI have released FLUX.1 Krea [dev], an open text-to-image model designed to generate more realistic images with fewer of the exaggerated, AI-typical textures.

The model is based on FLUX.1 [dev] and remains fully compatible with its architecture. It was built for flexible customization and easy integration into downstream applications. Model weights are available on Hugging Face, with commercial licenses offered through the BFL Licensing Portal. Partners like FAL, Replicate, Runware, DataCrunch, and TogetherAI provide API access.
