Short

Microsoft has released a set of vision models called Florence 2. Florence 2 is a prompt-based vision model designed for computer vision and image processing tasks such as image description, object recognition, localization, and segmentation. According to Microsoft, Florence 2 can outperform larger and more specialized vision models on some tasks. To train Florence 2, Microsoft created the FLD-5B dataset, which contains 5.4 billion annotations for 126 million images. The models come in two sizes, with 0.23B and 0.77B parameters, and are available on Hugging Face for commercial use under the MIT license.

Short

TikTok has launched Symphony Digital Avatars, a generative AI tool that allows creators and brands to create AI avatars of real people for branded content. The tool offers two kinds of avatars: stock avatars, which are pre-built with paid actors, and custom avatars, which represent a creator or brand spokesperson. With the Symphony AI Dubbing tool, content can be translated into more than 10 languages. TikTok is also launching the Symphony Collective, an advisory board that will provide feedback on TikTok's AI marketing solutions.

@tiktoknewsroom

Introducing Symphony Digital Avatars, to help creators and brands captivate global audiences and deliver impactful messages in an immersive and authentic way. Check out our Newsroom to learn more.


Short

Apple has released 20 new Core ML models and 4 datasets on the Hugging Face platform to help developers build AI applications that run directly on devices. The models cover areas such as image classification, depth estimation, and semantic segmentation, and are optimized to run on Apple devices without a network connection. Apple is working closely with Hugging Face on initiatives such as the MLX community and the integration of open-source AI into Apple Intelligence capabilities.

Short

Camb AI has released Mars5, an open-source voice cloning AI model that the company claims offers higher realism than competitors such as ElevenLabs. According to the company, Mars5 captures nuances in speech, including emotion, rhythm, and intonation. Camb is also planning to release Boli, a translation model that it says captures context and colloquialisms better than tools like Google Translate. The company is working with clients such as Major League Soccer, Tennis Australia, and movie studios. Mars5 is available in English on GitHub, while the multi-language version with support for 140 languages is accessible through Camb's paid Studio platform.

Short

McDonald's is ending its AI experiment for drive-through orders after a two-year trial in more than 100 restaurants. The fast-food giant, which launched automated order taking (AOT) in partnership with IBM, will shut the system down on July 26, 2024. The goal was to speed up drive-through service and streamline operations. Despite the setback, McDonald's still believes in the potential of voice recognition for food ordering. The company plans to find a new partner for more extensive research by the end of the year. IBM says it is also in talks with other fast-food chains. The two companies do not appear to be parting on good terms over this project.
