Google’s Magika file identification tool goes open source

The AI-based file identification system could be a boon for cybersecurity applications.

Google has announced that its Magika file identification tool is now open source, with the project hosted on GitHub.

There is also a web demo available for immediate testing.

“Today, we’re opening up Magika, Google’s AI-based file type identification system, to help others accurately detect binary and textual file types,” Google said in a blog post.

“Under the hood, Magika employs a highly optimized, custom deep learning model, enabling accurate file identification in milliseconds, even when running on a CPU.”

Traditional file identification methods struggle because they often rely on “a handcrafted collection of custom heuristics and rules to detect each file format,” according to Google. Writing and maintaining those rules takes time and, due to the human element, is quite prone to errors. The approach is also comparatively slow.
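To make the rule-based approach concrete, here is a minimal Python sketch of signature matching. It is an illustration only, not the code of any real tool: the signature table, the function name `identify_by_rules`, and the UTF-8 fallback are assumptions chosen for the example.

```python
# Hypothetical, hand-written signature table: (offset, magic bytes, label).
# Real rule-based tools carry far more entries than this illustration.
SIGNATURES = [
    (0, b"%PDF-", "pdf"),
    (0, b"\x89PNG\r\n\x1a\n", "png"),
    (0, b"PK\x03\x04", "zip"),
    (0, b"\x7fELF", "elf"),
    (0, b"GIF87a", "gif"),
    (0, b"GIF89a", "gif"),
]


def identify_by_rules(data: bytes) -> str:
    """Return a coarse file type label using hand-written rules."""
    # Rule 1: compare the bytes at each known offset with a magic value.
    for offset, magic, label in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return label
    # Rule 2 (fallback heuristic): anything that decodes as UTF-8 is "text".
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    return "text"


if __name__ == "__main__":
    print(identify_by_rules(b"%PDF-1.7\n"))        # matches the PDF signature
    print(identify_by_rules(b"print('hello')\n"))  # falls through to "text"
```

The sketch also shows where such rules run out: a .docx file is a ZIP container, so it is reported only as "zip", and source code in any language is reported only as "text". Each finer distinction needs another handwritten rule.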

Magika, however, uses a custom deep learning model built with Keras. It uses ONNX as its inference engine and is capable of identifying files in milliseconds. The model is just one megabyte in size and achieved a score of 99.31 percent in Google’s 1 million file benchmark.

The closest competitor, file-magic 5.44, scored only 81.3 percent.

Google uses Magika internally as a security tool.

“Magika is used at scale to help improve security for Google users by routing Gmail, Drive, and Safe Browsing files to the appropriate security and content policy scanners,” Google said.

“Looking at a weekly average of hundreds of billions of files reveals that Magika improves the accuracy of file type identification by 50 percent compared to our previous system that relied on hand-crafted rules.”
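The routing Google describes can be pictured as a dispatch table keyed by the detected file type. The Python sketch below is purely illustrative: the scanner functions, the `ROUTES` table, and the labels are hypothetical and do not reflect Google’s internal systems or Magika’s actual output format.

```python
from typing import Callable, Dict


# Hypothetical scanners; each returns the name of the pipeline that ran.
def scan_document(data: bytes) -> str:
    return "document-scanner"


def scan_executable(data: bytes) -> str:
    return "executable-scanner"


def scan_generic(data: bytes) -> str:
    return "generic-scanner"


# Hypothetical mapping from a detected file type label to its scanner.
ROUTES: Dict[str, Callable[[bytes], str]] = {
    "pdf": scan_document,
    "docx": scan_document,
    "elf": scan_executable,
    "pe": scan_executable,
}


def route(label: str, data: bytes) -> str:
    """Send a file to the scanner registered for its detected type."""
    # Labels with no dedicated scanner fall back to the generic one.
    handler = ROUTES.get(label, scan_generic)
    return handler(data)


if __name__ == "__main__":
    print(route("pdf", b"%PDF-1.7"))
    print(route("mystery", b"\x00\x01"))
```

In a design like this, a wrong label sends the file to the wrong scanner, which is why the accuracy of the identification step matters so much.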

Magika will also soon be integrated into VirusTotal, which already uses Google’s artificial intelligence (AI) to detect malicious files.

“Magika will act as a pre-filter before Code Insight analyzes the files, improving the efficiency and accuracy of the platform,” according to Google.

“This integration, due to the collaborative nature of VirusTotal, directly contributes to the global cybersecurity ecosystem, fostering a more secure digital environment.”
