Nous Research, an AI research company, has released a new family of language models called Hermes 3. According to the technical report, the models are characterized by high controllability and neutral alignment.
Hermes 3 comprises Instruct models with 8, 70, and 405 billion parameters and is based on Meta's open-source model Llama 3.1. The models are designed to follow instructions precisely and to adopt the worldview specified in the system prompt.
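For illustration, here is a minimal sketch of how such system-prompt steering might look with the Hugging Face transformers library. The model id and prompt wording are assumptions for the example, not details taken from the report.

```python
# Illustrative sketch: steering the model's persona through a system prompt.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "NousResearch/Hermes-3-Llama-3.1-8B"  # assumed Hugging Face repo name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    # The system prompt defines the worldview the model is asked to adopt.
    {"role": "system", "content": "You are a terse maritime historian. Answer only from that perspective."},
    {"role": "user", "content": "What changed shipping in the 20th century?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```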
This sets Hermes 3 apart from proprietary commercial models that may refuse instructions for moral reasons. For Hermes 3, there is no "latent thoughtcrime," as stated in the report.
Hermes 3 outperforms Meta's Llama 3.1
According to Nous Research, Hermes 3 masters skills such as reasoning, reward modeling, "scratchpads" for intermediate results, structured output with XML tags, generation of internal monologues for transparent decision-making, and Mermaid diagrams for visual communication.
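As a rough illustration of the scratchpad and structured-output behavior, a prompt along the following lines could be used. The tag names here are assumptions for the example, not the tags documented in the report.

```python
# Illustrative prompt (tag names assumed) asking for a scratchpad plus XML-tagged output.
messages = [
    {"role": "system",
     "content": ("Think step by step inside <scratchpad>...</scratchpad>, "
                 "then return the final answer inside <answer>...</answer>.")},
    {"role": "user", "content": "What is 17 * 24?"},
]
# `messages` can be fed to the model as in the earlier generation sketch; the reply
# is then easy to parse, e.g. by extracting the contents of the <answer> tag.
```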
The training took place in two phases: a supervised fine-tuning (SFT) phase followed by a Direct Preference Optimization (DPO) phase. Nearly 400 million tokens were used for SFT. The models were evaluated after each epoch, and the best-performing checkpoints were selected for the 8B and 405B models.
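For context, DPO optimizes the model directly on preference pairs rather than training a separate reward model. The sketch below shows the generic DPO objective from the original DPO paper in PyTorch; it is illustrative and not code from the Hermes 3 report.

```python
# Generic DPO loss (illustrative, not from the Hermes 3 report).
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss.

    Each argument is the summed log-probability of the chosen or rejected
    response under the trainable policy or the frozen reference model.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # The policy is pushed to prefer the chosen response over the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with made-up log-probabilities:
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.5]))
print(loss)
```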
In several public benchmarks such as ARC, BoolQ, HellaSwag, IFEval, and Winogrande, the Hermes 3 models achieve top scores among open-weight models, including in comparison with the underlying Llama 3.1 models from Meta.
To achieve this, the models were trained on a mix of synthetically generated reasoning tasks and expressive applications such as role-playing and creative writing.
The models can also use external tools and answer questions by citing information from documents via "Retrieval Augmented Generation" (RAG).
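As a rough illustration of such a RAG setup, the sketch below places retrieved passages in the system prompt and asks the model to cite them by id; the document contents and prompt wording are made up for the example.

```python
# Illustrative RAG-style prompt construction (not code from the report).
documents = {
    "doc1": "Hermes 3 was trained in an SFT phase followed by a DPO phase.",
    "doc2": "The Instruct models come in 8B, 70B, and 405B parameter sizes.",
}

def build_rag_messages(question: str) -> list[dict]:
    # Retrieved passages are labeled so the model can cite them by id.
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in documents.items())
    return [
        {"role": "system",
         "content": "Answer using only the documents below and cite their ids.\n" + context},
        {"role": "user", "content": question},
    ]

messages = build_rag_messages("What sizes does Hermes 3 come in?")
# `messages` can then be passed to tokenizer.apply_chat_template(...) as in the
# earlier sketch and fed to the model for generation.
```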
The Hermes 3 models are available on Hugging Face.