
KBLab, the data lab at the National Library of Sweden, combines thousands of works into datasets used to train AI models.


By law, the National Library of Sweden has collected virtually all Swedish-language writings from the past 500 years. A total of 16 petabytes have already been collected, and the collection is growing by 50 terabytes every month.

On this basis, KBLab, the library's integrated research lab established in 2019, has trained more than two dozen AI models. "Before our lab was created, researchers couldn't access a dataset at the library — they'd have to look at a single object at a time," said Love Börjeson, KBLab's director. "There was a need for the library to create datasets that enabled researchers to conduct quantity-oriented research."

Highly specialized datasets for research

Thanks to this work, researchers will soon be able to create highly specialized datasets, "for example, pulling up every Swedish postcard that depicts a church, every text written in a particular style or every mention of a historical figure across books, newspaper articles and TV broadcasts," according to the Nvidia blog. The models were trained on hardware from the graphics processor maker.


For its first model, KBLab used 20 GB of data; today it works with about 70 GB, according to Hugging Face, and it plans to scale up to a full terabyte of Swedish text. In addition to Swedish, the dataset will also include Dutch, Norwegian and German, which should improve the models' performance.

Generative text model in development

In addition to the Transformer models that understand Swedish text, KBLab has built an AI tool that converts audio to text. It allows the library to transcribe its extensive collection of radio broadcasts so that researchers can search the audio for specific content.

KBLab is also developing generative text models and an AI model to automatically create descriptions of video content. Together with researchers at the University of Gothenburg and the Swedish Academy, KBLab is supporting the modernization of dictionaries.

Summary
  • To enable scientists to extract highly accurate datasets from several centuries of Swedish texts, the National Library of Sweden is training AI models on thousands of works.
  • The initial AI models are based on about 70 gigabytes of data.
  • However, the collection grows by about 50 terabytes every month, so training will be expanded to even larger datasets and models.
Jonathan works as a freelance tech journalist for THE DECODER, focusing on AI tools and how GenAI can be used in everyday work.