
Jais is a large language model focused on Arabic and is currently the best open model of its kind.

Researchers from the United Arab Emirates, in collaboration with Cerebras, introduce two new open language models: Jais and Jais-chat. The models were trained on Arabic and English text as well as code, and significantly outperform existing open-source models for Arabic.

Jais is a 13 billion parameter model pre-trained on 395 billion tokens, of which 116 billion are Arabic tokens. Jais-chat has been instruction-tuned on an additional 10 million instruction/response pairs and outperforms all existing open Arabic and multilingual chatbots.

The models are the first Arabic-centric open models of this scale.


Jais can match ChatGPT in some tasks

Arabic websites, books, news, and Wikipedia were used as training data, with all data filtered before training. To compensate for the limited amount of Arabic data available, the team adds 232 billion tokens of English data from EleutherAI's The Pile, as well as 46 billion tokens of code.

In benchmarks, Jais and Jais-chat outperform existing, freely available Arabic models by 11 to 15 points in accuracy and are competitive with Meta's Llama 2 in English, according to the team. Commercial models such as OpenAI's ChatGPT or Anthropic's Claude are still ahead on average in the benchmarks, but they are also significantly larger. For some tasks, such as writing, Jais and Jais-chat are on par with ChatGPT, the team said.

The team also equips Jais-chat with a number of safety mechanisms, such as filters and classifiers for unwanted requests and outputs.

Another special feature of the model: it was not trained on Nvidia GPUs, but on Cerebras' CS-2 systems. The company produces a wafer-scale AI chip that powers the CS-2 systems.

Jais and Jais-chat are available on Hugging Face and can be tried out on Arabic-GPT.
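For readers who want to experiment with the model locally via the Hugging Face transformers library rather than in the demo, a minimal sketch might look like the following. The repo id and the trust_remote_code flag are assumptions based on typical Hub usage, not details confirmed in this article.

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "inception-mbzuai/jais-13b-chat"  # assumed repo id; check the Hugging Face Hub for the exact name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",       # spread the 13B-parameter model across available devices
    trust_remote_code=True,  # assumption: the model ships a custom model class on the Hub
)

# Arabic prompt: "Write a short paragraph about artificial intelligence."
prompt = "اكتب فقرة قصيرة عن الذكاء الاصطناعي."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))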

Summary
  • Jais is an open LLM for Arabic, developed in collaboration between researchers from the United Arab Emirates and Cerebras. It has 13 billion parameters and outperforms existing open-source models for Arabic.
  • The model was trained on 395 billion tokens, including 116 billion Arabic tokens, and is competitive with Meta's Llama 2 in English. For some tasks in Arabic, such as writing, Jais is on par with OpenAI's ChatGPT.
  • Jais and Jais-chat are available on Hugging Face and were trained on Cerebras' CS-2 systems.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.