
An investigation by cybersecurity startup Lasso Security reveals that more than 1,500 HuggingFace API tokens were exposed, including tokens belonging to Meta.

A recent investigation into HuggingFace, a major platform for AI developers, has revealed that more than 1,500 API tokens were publicly exposed. According to Lasso Security, a startup specializing in cybersecurity for language models and other generative AI models, this leaves millions of Meta Llama, Bloom, and Pythia users vulnerable to potential attacks.

HuggingFace is an important resource for developers working on AI projects such as language models. The platform offers an extensive library of AI models and datasets, including Meta's widely used Llama models.

The HuggingFace API lets developers and organizations integrate models into their applications and, via API tokens, read, create, modify, and delete repositories or the files within them.
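For illustration, here is a minimal sketch of what such token-based access looks like with the huggingface_hub Python client; the repository name and token value are placeholders, not details from the investigation:

    from huggingface_hub import HfApi

    # A leaked token grants whatever scope it was created with (read or write).
    api = HfApi(token="hf_xxx")  # placeholder token

    # Read access: identify the token's owner and list files in a repository.
    print(api.whoami())
    files = api.list_repo_files("some-org/some-model")  # placeholder repo

    # Write access: add or delete files in that same repository.
    api.upload_file(
        path_or_fileobj=b"{}",
        path_in_repo="config.json",
        repo_id="some-org/some-model",
    )
    api.delete_file(path_in_repo="config.json", repo_id="some-org/some-model")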


Lasso Security gains full access to Meta repositories

The team searched GitHub and HuggingFace repositories for exposed API tokens using the platforms' search functions. Best practices from providers such as OpenAI and HuggingFace state that API tokens should not be stored directly in code for exactly this reason.
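The difference looks roughly like this (a hedged sketch; the token string and environment variable name are illustrative, not taken from the report):

    import os
    from huggingface_hub import login

    # Anti-pattern: a token committed to a public repository is effectively public.
    # login(token="hf_XXXXXXXXXXXXXXXXXXXXXXXXXXXX")

    # Preferred: keep the token outside the codebase, for example in an
    # environment variable or a secrets manager, and read it at runtime.
    login(token=os.environ["HF_TOKEN"])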

The Lasso Security team found 1,681 tokens in their search and were able to uncover accounts from major organizations including Meta, Microsoft, Google, and VMware. The data also gave the team full access to the widely used Meta Llama, Bloom, Pythia, and HuggingFace repositories. Exposing such a large number of API tokens poses significant risks to organizations and their users, the team said.

Lasso lists some key dangers associated with exposed API tokens:

1. Supply chain vulnerabilities: If attackers gained full access to accounts such as Meta Llama 2, BigScience Workshop, and EleutherAI, they could manipulate existing models and potentially turn them into malicious versions, the team says. This could affect millions of users who rely on these foundation models for their applications.

2. Training data poisoning: With write access to 14 datasets that see tens to hundreds of thousands of downloads per month, attackers could manipulate trusted datasets and compromise the integrity of AI models built on them, with far-reaching consequences.

3. Model theft: The team says it used the method to gain access to more than 10,000 private AI models and more than 2,500 datasets. Theft on that scale could lead to economic losses, eroded competitive advantage, and access to sensitive information.

Team provides security tips to users and HuggingFace

To address these vulnerabilities, developers are advised to avoid hard-coded tokens and to follow token-handling best practices. HuggingFace should also continuously scan for publicly exposed API tokens and either revoke them or notify the affected users and organizations.
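To illustrate the scanning side, the sketch below searches source files for strings matching the typical hf_ token prefix and checks whether they are still valid; the regular expression and directory handling are assumptions for illustration, not Lasso's or HuggingFace's actual tooling:

    import re
    from pathlib import Path
    from huggingface_hub import HfApi

    # HuggingFace user tokens carry an "hf_" prefix; this pattern is an assumed
    # rough shape, good enough for a first-pass scan.
    TOKEN_RE = re.compile(r"hf_[A-Za-z0-9]{30,}")

    def find_candidate_tokens(root: str):
        """Yield (file, candidate token) pairs found in Python files under root."""
        for path in Path(root).rglob("*.py"):
            for match in TOKEN_RE.findall(path.read_text(errors="ignore")):
                yield path, match

    def is_still_valid(token: str) -> bool:
        """A token that can answer whoami() has not been revoked yet."""
        try:
            HfApi(token=token).whoami()
            return True
        except Exception:
            return False

    for path, token in find_candidate_tokens("."):
        status = "ACTIVE - revoke it" if is_still_valid(token) else "appears revoked"
        print(f"{path}: {token[:10]}... ({status})")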

Organizations should also consider token classification and implement security solutions that inspect IDEs and code reviews, designed specifically to protect their investment in LLMs. By addressing these issues now, organizations can strengthen their defenses and avert the threats posed by these vulnerabilities, Lasso Security said.

Summary
  • An investigation by Lasso Security has revealed that over 1,500 HuggingFace API tokens, including Meta's, are freely available, potentially leaving millions of Meta Llama, Bloom and Pythia users vulnerable to attack.
  • The exposed API tokens pose significant risks, Lasso said, such as supply chain vulnerabilities, training data poisoning, and model theft, which could have far-reaching consequences for organizations and their users.
  • To close these security gaps, developers should follow best practices for handling tokens, HuggingFace should scan regularly for exposed tokens, and organizations should implement security solutions designed specifically to protect their investments in AI models.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.