AI in practice

Silo AI's Poro34B open-source LLM aims to master all official EU languages

Matthias Bastian

DALL-E 3 prompted by THE DECODER

Helsinki-based AI startup Silo AI has launched Poro, an open-source large language model (LLM) aimed at advancing multilingual AI capabilities for European languages and code.

Developed by SiloGen, Silo AI's generative AI division, and the TurkuNLP research group at the University of Turku, Poro is the first in a planned series of models covering all official languages of the European Union, "with the aim of ensuring European digital sovereignty and democratizing access to LLMs." Silo AI describes itself as "the largest private AI lab in the Nordics that builds AI as a service."

The 34.2 billion parameter Poro 34B model uses a BLOOM transformer architecture with ALiBi embeddings and is trained on a multilingual dataset of one trillion tokens covering English, Finnish, and programming languages such as Python and Java. Training is currently about 30 percent complete and is being run on LUMI, Europe's fastest supercomputer, located in Finland.
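ALiBi (Attention with Linear Biases), the positional scheme mentioned above, replaces learned position embeddings with a fixed, head-specific linear penalty on attention scores based on query-key distance. A minimal sketch of how those biases are computed (the head count and sequence length here are illustrative, not Poro's actual configuration):

```python
def alibi_slopes(n_heads: int) -> list[float]:
    """Per-head slopes as defined in the ALiBi paper: a geometric
    sequence with ratio 2^(-8/n) for n heads (n a power of two)."""
    ratio = 2 ** (-8 / n_heads)
    return [ratio ** (i + 1) for i in range(n_heads)]

def alibi_bias(slope: float, seq_len: int) -> list[list[float]]:
    """Causal bias matrix added to one head's attention scores:
    each query at position i penalizes a key at position j <= i
    in proportion to the distance (i - j); future keys are masked."""
    return [[-slope * (i - j) if j <= i else float("-inf")
             for j in range(seq_len)]
            for i in range(seq_len)]

slopes = alibi_slopes(8)         # 0.5, 0.25, ..., down to 2**-8
bias = alibi_bias(slopes[0], 4)  # 4-token causal bias for the first head
```

Because the penalty is a fixed function of distance rather than a learned embedding, ALiBi-based models can extrapolate to sequences longer than those seen during training, which is one reason the architecture is attractive for large-scale pretraining runs like this one.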

The model uses a cross-lingual training approach to address the challenge of building high-performance language models for under-resourced European languages. Progress is documented through the Poro Research Checkpoints program, which provides transparency into the training process.

In benchmarks, Poro 34B achieves state-of-the-art results in the low-resource language Finnish without sacrificing its English capabilities. After LeoLM, the German-specific language model recently trained by LAION and Hessian.AI, this makes Poro the second European language-specific LLM to perform well in both English and its target language.

Poro is freely available under the Apache 2.0 license, making it suitable for both commercial and research use. You can see its model card here.

LLMs like GPT-4 perform well in many languages, but are typically strongest in English because it dominates the dataset.

With France's Mistral 7B showing competitive overall performance and Germany's Aleph Alpha recently receiving a 500 million investment, Europe finally appears to be getting its act together - if we don't count the actual EU AI Act, which seems to be taking a little longer.
