Helsinki-based AI startup Silo AI has launched Poro, an open-source large language model (LLM) aimed at advancing multilingual AI capabilities for European languages and code.

Developed by SiloGen, Silo AI's generative AI division, and the TurkuNLP research group at the University of Turku, Poro is the first in a planned series of models covering all official languages of the European Union, "with the aim of ensuring European digital sovereignty and democratizing access to LLMs." Silo AI describes itself as "the largest private AI lab in the Nordics that builds AI as a service."

The 34.2-billion-parameter Poro 34B model uses a BLOOM transformer architecture with ALiBi embeddings and is trained on a one-trillion-token multilingual dataset focused on English, Finnish, and programming languages such as Python and Java. So far, Poro is 30 percent trained, and this training was done on LUMI, Europe's fastest supercomputer, located in Finland.
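
For readers unfamiliar with ALiBi (Attention with Linear Biases), the rough idea is that each attention head adds a distance-based penalty to its attention scores instead of using learned position embeddings. The following is a minimal sketch of that general technique, not Poro's actual implementation: the function names are hypothetical, the head count of 8 is purely illustrative, and the sketch only handles head counts that are powers of two.

```python
from typing import List


def alibi_slopes(num_heads: int) -> List[float]:
    """Per-head slopes as a geometric sequence: 2^(-8/n), 2^(-16/n), ..., 2^(-8).

    Assumption: num_heads is a power of two (the simple case of the scheme).
    """
    if num_heads < 1 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(num_heads: int, seq_len: int) -> List[List[List[float]]]:
    """Bias added to causal attention scores, indexed as [head][query][key].

    A key that lies d positions before the query gets a penalty of -slope * d;
    keys after the query are masked out with -inf (causal attention).
    """
    bias = []
    for slope in alibi_slopes(num_heads):
        head = [
            [slope * (k - q) if k <= q else float("-inf") for k in range(seq_len)]
            for q in range(seq_len)
        ]
        bias.append(head)
    return bias


if __name__ == "__main__":
    # Illustrative values only: 8 heads, 4 token positions.
    for row in alibi_bias(num_heads=8, seq_len=4)[0]:
        print(row)
```

Heads with steeper slopes focus more on nearby tokens, while heads with shallower slopes can attend further back.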

The model uses a cross-lingual training approach to address the challenge of training high-performance natural language models for under-resourced European languages. Poro's training progress is documented through the Poro Research Checkpoints program, providing transparency into the model's training.

In benchmarks, Poro 34B achieves state-of-the-art results in the low-resource language Finnish without sacrificing its English capabilities. After the German-specific language model LeoLM, which was recently trained by LAION and Hessian.ai, this is the second European language-specific LLM that can perform well in both English and its native language.

Poro is freely available under the Apache 2.0 license, making it suitable for both commercial and research use. You can see its model card here.

LLMs like GPT-4 perform well in many languages, but are typically strongest in English because it dominates their training data.

With France's Mistral 7B showing competitive overall performance and Germany's Aleph Alpha recently receiving a $500 million investment, it appears that Europe is finally getting its act together, if we don't count the actual EU AI Act, which seems to be taking a little longer.

Summary
  • Silo AI, a Helsinki-based startup, has launched Poro, an open-source large language model focused on improving multilingual AI capabilities for European languages, starting with English and Finnish.
  • The 34.2-billion-parameter Poro 34B model uses a BLOOM transformer architecture and is trained on a one-trillion-token multilingual dataset, with training progress documented through the Poro Research Checkpoints program for transparency.
  • Poro is part of a growing trend of European language-specific AI models, such as France's Mistral 7B and Germany's LeoLM, aimed at addressing the dominance of English in large language models such as GPT-4.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.
Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.