
StripedHyena: A new architecture for next-generation generative AI?

Image: DALL-E 3 prompted by THE DECODER

GPT-4 and other models rely on transformers. With StripedHyena, researchers present an alternative to the widely used architecture.

With StripedHyena, the Together AI team presents a family of language models with 7 billion parameters. What makes them special: StripedHyena uses a new set of architectural components designed to improve training and inference performance over the transformer architecture found in models such as GPT-4.

The release includes StripedHyena-Hessian-7B (SH 7B), a base model, and StripedHyena-Nous-7B (SH-N 7B), a chat model. These models are designed to be faster, more memory efficient, and capable of processing very long contexts of up to 128,000 tokens. Researchers from HazyResearch, hessian.AI, Nous Research, MILA, HuggingFace, and the German Research Centre for Artificial Intelligence (DFKI) were involved.

StripedHyena: an efficient alternative to transformers

According to Together AI, StripedHyena is the first alternative model that can compete with the best open-source transformers. The base model achieves comparable performance to Llama-2, Yi, and Mistral 7B on OpenLLM leaderboard tasks and outperforms them on long context summarization.


The core component of the StripedHyena models is a state-space model (SSM) layer. Traditionally, SSMs have been used to model complex sequences and time-series data, and they are particularly useful for tasks with long-range temporal dependencies. Over the last two years, however, researchers have developed increasingly effective ways to apply SSMs to sequence modeling in language and other domains. The appeal: they require less computing power than attention, scaling linearly rather than quadratically with sequence length.
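To make the idea concrete, here is a minimal sketch of the linear recurrence at the heart of a discrete state-space model. This is an illustrative toy, not StripedHyena's actual layer: the matrix shapes, the function name `ssm_scan`, and the random parameters are all assumptions for demonstration.

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Run a discrete linear state-space model over an input sequence.

    x_t = A @ x_{t-1} + B * u_t   (state update)
    y_t = C @ x_t                 (output projection)
    """
    d = A.shape[0]
    x = np.zeros(d)
    ys = []
    for u_t in u:
        x = A @ x + B * u_t   # hidden state carries past context forward
        ys.append(C @ x)      # project state to a scalar output
    return np.array(ys)

# Toy example: 4-dimensional state, scalar input and output
rng = np.random.default_rng(0)
A = 0.9 * np.eye(4)            # stable state transition (eigenvalues < 1)
B = rng.standard_normal(4)
C = rng.standard_normal(4)
u = rng.standard_normal(16)    # input sequence of length 16
y = ssm_scan(A, B, C, u)       # output sequence of the same length
```

Each step touches only a fixed-size state, so the cost grows linearly with sequence length; attention, by contrast, compares every token with every other token, which is quadratic. This is the property that makes very long contexts attractive for SSM-based layers.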

The result: in end-to-end training, StripedHyena is more than 30, 50, and 100 percent faster than conventional transformers at sequence lengths of 32,000, 64,000, and 128,000 tokens, respectively.

The main goal of the StripedHyena models is to push the boundaries of architectural design beyond transformers. In the future, the researchers plan to investigate larger models with longer contexts, multimodal support, further performance optimizations, and the integration of StripedHyena into retrieval pipelines to take full advantage of the longer context.
