
GPT-4 and other models rely on transformers. With StripedHyena, researchers present an alternative to the widely used architecture.


With StripedHyena, the Together AI team presents a family of language models with 7 billion parameters. What makes it special: StripedHyena uses a new set of architectural components designed to improve training and inference performance compared to the widely used transformer architecture, which underlies models such as GPT-4.

The release includes StripedHyena-Hessian-7B (SH 7B), a base model, and StripedHyena-Nous-7B (SH-N 7B), a chat model. These models are designed to be faster, more memory efficient, and capable of processing very long contexts of up to 128,000 tokens. Researchers from HazyResearch, hessian.AI, Nous Research, MILA, HuggingFace, and the German Research Centre for Artificial Intelligence (DFKI) were involved.

StripedHyena: an efficient alternative to transformers

According to Together AI, StripedHyena is the first alternative model that can compete with the best open-source transformers. The base model achieves comparable performance to Llama-2, Yi, and Mistral 7B on OpenLLM leaderboard tasks and outperforms them on long context summarization.


The core component of the StripedHyena models is a state-space model (SSM) layer. Traditionally, SSMs have been used to model complex sequences and time-series data, and they are particularly useful for tasks with long temporal dependencies. In the last two years, however, researchers have found increasingly effective ways to apply SSMs to sequence modeling for language and other domains. The reason: they require less computing power than attention.
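To make the idea concrete, here is a minimal sketch of a discrete linear state-space recurrence in NumPy. This is a generic textbook SSM, not StripedHyena's actual layer (which combines SSM-derived convolutions with attention); the function and variable names are illustrative:

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Run a discrete linear state-space model over an input sequence.

    State update:  x_{t+1} = A x_t + B u_t
    Output:        y_t     = C x_t

    A: (d, d) state matrix, B: (d,) input map, C: (d,) output map,
    u: (T,) input sequence. Returns y: (T,) outputs.
    Each step costs O(d^2) regardless of sequence length T,
    which is why SSMs scale well to very long contexts.
    """
    d = A.shape[0]
    x = np.zeros(d)          # hidden state carries the sequence history
    y = np.empty(len(u))
    for t, u_t in enumerate(u):
        y[t] = C @ x         # read out before updating the state
        x = A @ x + B * u_t  # fold the new input into the state
    return y

# Toy example: a leaky integrator as a 1-state SSM.
A = np.array([[0.9]])
B = np.array([1.0])
C = np.array([1.0])
y = ssm_scan(A, B, C, np.ones(5))
```

The fixed-size hidden state is the key contrast with self-attention, where every new token attends over all previous tokens and cost grows with context length.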

The result: in end-to-end training, StripedHyena is more than 30, 50, and 100 percent faster than conventional transformers on sequences of 32,000, 64,000, and 128,000 tokens, respectively.

The main goal of the StripedHyena models is to push the boundaries of architectural design beyond transformers. In the future, the researchers plan to investigate larger models with longer contexts, multimodal support, further performance optimizations, and the integration of StripedHyena into retrieval pipelines to take full advantage of the longer context.

Summary
  • Together AI introduces StripedHyena, a 7 billion parameter language model that uses new AI architectures to improve training and inference performance over the Transformer architecture.
  • StripedHyena consists of two models, SH 7B (base model) and SH-N 7B (chat model), which are faster, more memory efficient, and can handle very long contexts of up to 128,000 tokens.
  • The core component of the StripedHyena models is a state space model (SSM) layer, which requires less computing power and is faster than classical transformers when training long sequences.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.