
Researchers at Kaiserslautern University have developed a new method using open-source language models to detect phishing emails. The approach combines two AI techniques and shows significantly better results than existing methods.


According to a new study from Kaiserslautern University, automated systems can now detect phishing emails with up to 96% accuracy.

"Phishing is a significant and increasing threat to cybersecurity. Attacks using constantlyevolving techniques aim to tempt people into revealing sensitive personal information.It is estimated that 90 percent of all successful cyberattacks have phishing as an initialvector of attack," the researchers write in their study. The team combined two AI techniques: few-shot learning and retrieval-augmented generation (RAG).

Few-shot learning means providing the AI model with some phishing email examples as context. This teaches the model what to look for without requiring retraining. The RAG component selects these examples dynamically: for each email being checked, it searches a database for the five most similar known phishing emails. These then serve as context.
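The retrieval step described above can be illustrated with a minimal Python sketch. It is not the researchers' implementation: the article does not say how similarity is computed, so a simple bag-of-words cosine similarity stands in here as an assumption. The function names (`retrieve_similar`, `build_prompt`) and the prompt wording are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_similar(email, known_phishing, k=5):
    """Return the k known phishing emails most similar to `email`.

    Assumption: word-count vectors replace whatever similarity
    search the study actually used.
    """
    query = Counter(tokenize(email))
    ranked = sorted(
        known_phishing,
        key=lambda example: cosine_similarity(query, Counter(tokenize(example))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(email, examples):
    """Assemble a few-shot prompt (hypothetical wording) for a language model."""
    lines = ["The following emails are known phishing attempts:"]
    for index, example in enumerate(examples, start=1):
        lines.append(f"Example {index}: {example}")
    lines.append("Classify the next email as PHISHING or LEGITIMATE.")
    lines.append(f"Email: {email}")
    return "\n".join(lines)
```

Calling `build_prompt(email, retrieve_similar(email, known_phishing))` yields a text prompt that could be passed to any of the language models mentioned. Because the examples are chosen per email, the context changes with every check while the model itself stays untouched.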


The researchers tested their method with eleven different open-source language models, including Mixtral 8x7B, Llama 3.1, and Google DeepMind's new Gemma family. The tests showed that combining few-shot learning and RAG significantly improved detection rates, especially with larger models.

Small models with RAG show good performance

The large Llama 3.1 70B model achieved the best results with 96.18% accuracy. The much smaller Gemma2 9B model performed surprisingly well, reaching almost the same accuracy at 95%. In general, however, models with fewer than 10 billion parameters struggled to use the RAG method effectively.

For their tests, the researchers used a balanced dataset of 2,900 legitimate and 2,900 phishing emails. The phishing emails came from real attacks between 2022 and 2024. The legitimate emails were sourced from the publicly available CSDMC Spam Corpus.

The research team sees room for improvement: future versions could incorporate additional data sources and consider email metadata and file attachments. They also suggest that using AI agents with API access could be a promising extension of the system.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Summary
  • Researchers at the University of Applied Sciences in Kaiserslautern, Germany, have developed a method that recognizes phishing emails with up to 96% accuracy. It combines few-shot learning and retrieval-augmented generation (RAG) with open-source language models.
  • The system dynamically selects five similar known phishing emails as context for each email to be checked. In tests with eleven different language models, Llama 3.1 70B achieved the highest accuracy at 96.18 percent, closely followed by the smaller Gemma2 9B at 95 percent.
  • The method was tested on a balanced dataset of 2,900 legitimate and 2,900 phishing emails. According to the researchers, the detection rate could be further improved by integrating email metadata, file attachments and AI agents with API access.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.