
Google is open-sourcing Magika, an AI system for file type recognition. It can quickly and accurately identify binary and text-based file types.

Accurately identifying file types is challenging because file formats vary widely in structure. Traditional recognition tools such as libmagic rely on hand-crafted heuristics and user-defined rules, which are time-consuming to create and error-prone.
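
To make the rule-based approach concrete, here is a minimal Python sketch of signature ("magic byte") matching. It is a simplified illustration and not libmagic's actual implementation; the `SIGNATURES` table and the `guess_type` helper are hypothetical names chosen for this example.

```python
# Simplified illustration of hand-written signature rules.
# SIGNATURES and guess_type are hypothetical names, not part of libmagic.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"PK\x03\x04", "zip"),
    (b"\x7fELF", "elf"),
]


def guess_type(data: bytes) -> str:
    """Return the label of the first signature that matches the start of data."""
    for prefix, label in SIGNATURES:
        if data.startswith(prefix):
            return label
    # Text formats rarely carry a fixed header, so fixed-prefix rules
    # fall through here and each such format needs its own heuristic.
    return "unknown"


print(guess_type(b"%PDF-1.7\n"))               # pdf
print(guess_type(b"def main():\n    pass\n"))  # unknown
```

The second call shows where such rules run out: source code and other text formats have no fixed prefix to match. Container formats are a further complication, since a .docx file is a ZIP archive and also begins with `PK\x03\x04`, so this table alone would label it `zip`.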

Magika addresses these issues with an AI-based model trained on a large dataset. According to Google, it provides a more reliable way to identify file types at scale: the tool uses a custom deep learning model that is only 1 MB in size and can identify files in milliseconds.

In a benchmark of one million files, Magika outperformed existing tools by 20 percent, with an even larger margin on text files.

Magika achieves near-perfect file recognition results across the board. | Image: Google

Google says it uses Magika internally to route Gmail, Drive, and Safe Browsing files to the correct security and content policy scanners.
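
The routing step can be pictured as a lookup from detected type to scanner. The sketch below is purely illustrative: Google has not published its routing code in this article, and `SCANNERS`, `route_file`, `toy_detect`, and the scanner names are invented for this example. The detector is passed in as a function, so any file type identifier could fill that role.

```python
from typing import Callable

# Hypothetical mapping from a detected type label to a scanner name.
SCANNERS = {
    "pdf": "document_scanner",
    "javascript": "script_scanner",
    "elf": "binary_scanner",
}


def route_file(data: bytes, detect: Callable[[bytes], str]) -> str:
    """Pick a scanner for data based on the label returned by detect."""
    label = detect(data)
    # Unrecognized types go to a catch-all scanner in this sketch.
    return SCANNERS.get(label, "generic_scanner")


def toy_detect(data: bytes) -> str:
    """Stand-in detector used only to exercise the routing logic."""
    return "pdf" if data.startswith(b"%PDF-") else "unknown"


print(route_file(b"%PDF-1.7\n", toy_detect))  # document_scanner
print(route_file(b"hello", toy_detect))       # generic_scanner
```

In a setup like this, a misidentified file ends up at the wrong scanner, so the quality of the routing depends on the accuracy of the detector.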

Magika's open-source approach is intended to help other software improve the accuracy of its file detection, and to provide researchers with a reliable tool for large-scale detection. The upcoming integration with VirusTotal is expected to improve the platform's efficiency and accuracy in detecting malicious code.

Users can try the web demo of Magika or install it as a Python library and standalone command-line tool.

Magika is available on GitHub under the Apache 2.0 license. The package is published on PyPI and can be installed with the "pip install magika" command.

Summary
  • Google has open-sourced Magika, an AI-based file type recognition system that can quickly and accurately identify binary and text-based file types.
  • Compared to traditional recognition tools that rely on hand-crafted heuristics and user-defined rules, Magika uses a deep learning model and a large training dataset to ensure more reliable recognition.
  • Already in use internally at Google, Magika is designed to help other software applications improve their file recognition accuracy and provide researchers with a reliable tool for large-scale recognition.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.