Short

Google DeepMind has launched Multimodal Canvas, an experimental testing console for developers. With a valid API key, developers can use Gemini 1.5 Flash to quickly test multimodal prompts that combine text, drawings, camera shots, and other images. Gemini 1.5 Flash is faster and less expensive than the larger Gemini 1.5 Pro and supports a context window of 1 million tokens.

Short

According to Android Authority, Google is planning to introduce a number of new AI features under the "Google AI" brand for the Pixel 9 series. In addition to existing features like Circle to Search and Gemini, there are three new ones: "Add Me" is designed to make sure everyone is in a group photo, and builds on the Best Take feature. "Studio" could become an AI image generator similar to Apple's Image Playground. The most interesting new feature is "Pixel Screenshots", a more privacy-friendly alternative to Microsoft's controversial Recall feature. Instead of automatically recording everything, it only works with screenshots the user takes, which are enriched with metadata and analyzed by a local AI. Users can then search the screenshots by content and ask questions about them.

Short

Perplexity AI has released an enhanced version of Pro Search. Pro Search can now answer multi-step questions, handle advanced math and programming tasks through the integration of the Wolfram|Alpha engine, and take intelligent actions based on search results, such as running follow-up searches. All users can use Pro Search free of charge five times every four hours. The startup and its "answer engine" are currently being criticized for possible copyright infringement and questionable data collection practices.

Short

Meta is changing the "Made with AI" label to "AI info" to indicate the use of AI in photos. The company is responding to complaints from photographers that images were being labeled even when only simple AI-assisted editing tools were used. Meta hopes the change will make it clear that the labeled images were not necessarily created entirely with AI. Meta also continues to use technical metadata standards like C2PA and IPTC.

Image: Meta
Short

Agility Robotics, maker of the Digit humanoid robot, and logistics service provider GXO Logistics have signed a multi-year agreement to commercially integrate Digit robots into GXO's logistics centers. The agreement, which follows a pilot in late 2023, represents both the industry's first formal commercial deployment and the first robotics-as-a-service (RaaS) deployment of humanoid robots, according to the companies. Under the RaaS agreement, GXO will deploy Digit robots alongside the Agility Arc cloud automation platform. At a SPANX omnichannel distribution center in Atlanta, the Digit robots are assisting with repetitive tasks such as moving totes and placing them on conveyor belts. The companies plan to explore additional use cases and expand the use of Digit as needed.

Short

According to Bloomberg reporter Mark Gurman, Apple is working to bring Apple Intelligence to the Vision Pro headset. One challenge is to optimize the features for mixed reality. The AI features will not be released for the Vision Pro until next year - Apple Intelligence will launch on all other supported devices in the fall. By then, Gurman expects a deal with Google or Anthropic to support additional AI models. Longer term, he speculates, the company may be planning a monthly subscription service like "Apple Intelligence+" that offers additional features to monetize the technology. Apple already takes a cut of subscription revenue from any AI partner it brings on board. "The company will be less reliant on hardware tweaks to drive its business and will actually be making money from AI — something everyone in Silicon Valley is hoping to pull off," Gurman says.

Short

LMSYS Org has added image recognition to the Chatbot Arena to compare vision language models (VLMs) from OpenAI, Anthropic, Google, and other AI vendors. In two weeks, more than 17,000 user preferences were collected in more than 60 languages. GPT-4o and Claude 3.5 Sonnet performed significantly better at image recognition than Gemini 1.5 Pro and GPT-4 Turbo. While Claude 3 Opus ranks above Gemini 1.5 Flash as a language model, the two perform similarly as VLMs. The open-source model LLaVA-v1.6-34B is slightly better than Claude 3 Haiku. The collected data shows common applications such as image description, math problems, document comprehension, meme explanation, and story writing. Next, the team plans to add support for multiple images, as well as PDFs, video, and audio. The Large Model Systems Organization (LMSYS Org) is an open research organization founded by UC Berkeley students and faculty in collaboration with UCSD and CMU.

Image: LMSYS