
GPT-4 and other models rely on transformers. With StripedHyena, researchers present an alternative to the widely used architecture.

With StripedHyena, the Together AI team presents a family of language models with 7 billion parameters. What makes it special: StripedHyena is built on a new set of AI architectures designed to improve training and inference performance compared to the widely used transformer architecture that underpins models such as GPT-4.

The release includes StripedHyena-Hessian-7B (SH 7B), a base model, and StripedHyena-Nous-7B (SH-N 7B), a chat model. These models are designed to be faster, more memory efficient, and capable of processing very long contexts of up to 128,000 tokens. Researchers from HazyResearch, hessian.AI, Nous Research, MILA, HuggingFace, and the German Research Centre for Artificial Intelligence (DFKI) were involved.

StripedHyena: an efficient alternative to transformers

According to Together AI, StripedHyena is the first alternative model that can compete with the best open-source transformers. The base model achieves comparable performance to Llama-2, Yi, and Mistral 7B on OpenLLM leaderboard tasks and outperforms them on long context summarization.


The core component of the StripedHyena models is a state-space model (SSM) layer. Traditionally, SSMs have been used to model complex sequences and time-series data, and they are particularly useful for tasks where temporal dependencies need to be captured. Over the past two years, however, researchers have found increasingly effective ways to apply SSMs to sequence modeling in language and other domains. The appeal: they require less computing power than transformers.
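To illustrate the basic idea, here is a minimal sketch of a generic linear state-space recurrence in Python. It is not StripedHyena's actual layer; the matrices, dimensions, and the `ssm_scan` helper are illustrative assumptions. The point is that the cost per token is constant, so processing a sequence scales linearly with its length rather than quadratically, as the pairwise comparisons in self-attention do.

```python
# Minimal sketch of a discrete linear state-space model (SSM) recurrence.
# Generic illustration only - not StripedHyena's implementation; all matrices
# and dimensions below are arbitrary assumptions for the toy example.
import numpy as np

def ssm_scan(u, A, B, C, D):
    """Run the SSM recurrence over an input sequence u.

    x_t = A @ x_{t-1} + B @ u_t   (hidden state update)
    y_t = C @ x_t + D @ u_t       (output projection)

    Each step costs the same regardless of position, so a sequence of
    length L costs O(L), versus O(L^2) for full self-attention.
    """
    x = np.zeros(A.shape[0])
    outputs = []
    for u_t in u:                      # one step per token
        x = A @ x + B @ u_t            # update hidden state
        outputs.append(C @ x + D @ u_t)
    return np.stack(outputs)

# Toy example: 8-token sequence of 4-dimensional inputs, 16-dimensional state.
rng = np.random.default_rng(0)
A = rng.normal(scale=0.1, size=(16, 16))
B = rng.normal(size=(16, 4))
C = rng.normal(size=(4, 16))
D = rng.normal(size=(4, 4))
u = rng.normal(size=(8, 4))
print(ssm_scan(u, A, B, C, D).shape)   # (8, 4)
```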

The result: StripedHyena trains end-to-end more than 30 percent, 50 percent, and 100 percent faster than conventional transformers on sequences of 32,000, 64,000, and 128,000 tokens, respectively.

The main goal of the StripedHyena models is to push the boundaries of architectural design beyond transformers. In the future, the researchers plan to investigate larger models with longer contexts, multimodal support, further performance optimizations, and the integration of StripedHyena into retrieval pipelines to take full advantage of the longer context.

Summary
  • Together AI introduces StripedHyena, a 7 billion parameter language model that uses new AI architectures to improve training and inference performance over the Transformer architecture.
  • StripedHyena consists of two models, SH 7B (base model) and SH-N 7B (chat model), which are faster, more memory efficient, and can handle very long contexts of up to 128,000 tokens.
  • The core component of the StripedHyena models is a state space model (SSM) layer, which requires less computing power and is faster than classical transformers when training long sequences.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.