Google Magika is an accurate and fast AI-based open-source file type recognition tool

Key Points

  • Google has open-sourced Magika, an AI-based file type recognition system that can quickly and accurately identify binary and text-based file types.
  • Compared to traditional recognition tools that rely on hand-crafted heuristics and user-defined rules, Magika uses a deep learning model and a large training dataset to ensure more reliable recognition.
  • Already in use internally at Google, Magika is designed to help other software applications improve their file recognition accuracy and provide researchers with a reliable tool for large-scale recognition.

Google is open-sourcing Magika, an AI system for file type recognition. It can quickly and accurately identify binary and text-based file types.

Accurately identifying file types is a challenging problem due to the diverse structures of file formats. Traditional recognition tools such as libmagic rely on hand-crafted heuristics and user-defined rules, which can be time-consuming and error-prone.
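To make the contrast concrete, here is a minimal sketch of what rule-based detection looks like in principle. It is not libmagic's actual implementation; the signature table and the `guess_type` helper are hypothetical and exist only for illustration.

```python
# Hypothetical, hand-written signature table: (offset, magic bytes, label).
# Real tools ship thousands of such rules; this is only an illustration.
SIGNATURES = [
    (0, b"%PDF-", "pdf"),
    (0, b"\x89PNG\r\n\x1a\n", "png"),
    (0, b"PK\x03\x04", "zip"),
    (0, b"\x7fELF", "elf"),
]


def guess_type(data: bytes) -> str:
    """Return a coarse file type label using fixed byte signatures."""
    for offset, magic, label in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return label
    # No signature matched: fall back to a crude text-vs-binary check.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown binary"
    return "text"


print(guess_type(b"%PDF-1.7\n"))        # matches the PDF signature
print(guess_type(b"print('hello')\n"))  # Python source, but only seen as text
print(guess_type(b"# Title\n"))         # Markdown, but only seen as text
```

Binary formats with a fixed header are easy to cover this way, but text-based formats such as source code or Markdown have no such signature. Each one needs its own hand-written rule, and that is where this approach becomes laborious and error-prone.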

Magika addresses these issues with an AI-based model trained on a large dataset, which Google says provides a more reliable way to identify file types at scale. The custom deep learning model is only about 1 MB in size and can identify a file in milliseconds, according to Google.

In a benchmark of one million files, Magika outperformed existing tools by 20 percent, with even better performance for text files.

Magika achieves near-perfect file recognition results across the board. | Image: Google

Google says it uses Magika internally to route Gmail, Drive, and Safe Browsing files to the correct security and content policy scanners.

By open-sourcing Magika, Google aims to help other software improve its file type detection accuracy and to give researchers a reliable tool for large-scale detection. The upcoming integration with VirusTotal is expected to improve the platform's efficiency and accuracy in detecting malicious code.

Users can try Magika in a web demo. The tool is available on GitHub under the Apache 2.0 license and can be installed from PyPI as both a standalone command-line tool and a Python library with the command "pip install magika".
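As a rough sketch of the Python library in use, the snippet below identifies the type of an in-memory byte string. It assumes the package's `Magika` class and its `identify_bytes` method. The attribute that holds the predicted label has differed between releases (`label` in newer versions, `ct_label` in older ones), so the hypothetical `identify_label` helper checks both. Treat it as an illustration, not as official usage.

```python
from typing import Optional

try:
    from magika import Magika  # third-party: pip install magika
except ImportError:  # keep the sketch importable without the package
    Magika = None


def identify_label(data: bytes) -> Optional[str]:
    """Return Magika's predicted content type label, or None if unavailable.

    identify_label is a hypothetical helper written for this example.
    """
    if Magika is None:
        return None
    result = Magika().identify_bytes(data)
    output = result.output
    # Assumption: newer releases expose `label`, older ones `ct_label`.
    label = getattr(output, "label", None)
    if label is None:
        label = getattr(output, "ct_label", None)
    return None if label is None else str(label)


print(identify_label(b"# Example\nThis is an example of markdown!\n"))
```

In real use, a single `Magika` instance would be created once and reused, since the model is loaded when the object is created.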

Source: Google