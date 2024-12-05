AI research
Maximilian Schreiner

Researchers achieve 96% accuracy in detecting phishing emails with open-source AI

Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.

Researchers at Kaiserslautern University have developed a new method using open-source language models to detect phishing emails. The approach combines two AI techniques and shows significantly better results than existing methods.

According to a new study from Kaiserslautern University, automated systems can now detect phishing emails with up to 96% accuracy.

"Phishing is a significant and increasing threat to cybersecurity. Attacks using constantly evolving techniques aim to tempt people into revealing sensitive personal information. It is estimated that 90 percent of all successful cyberattacks have phishing as an initial vector of attack," the researchers write in their study. The team combined two AI techniques: few-shot learning and retrieval-augmented generation (RAG).

Few-shot learning means providing the AI model with some phishing email examples as context. This teaches the model what to look for without requiring retraining. The RAG component selects these examples dynamically: for each email being checked, it searches a database for the five most similar known phishing emails. These then serve as context.
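
The retrieval and prompting steps described above can be sketched roughly as follows. This is a minimal illustration, not the researchers' implementation: the bag-of-words cosine similarity is a toy stand-in for whatever embedding model and database the study used, and the names `retrieve_similar` and `build_prompt` as well as the prompt wording are assumptions made for this example.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a toy stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_similar(email: str, known_phishing: list[str], k: int = 5) -> list[str]:
    """Return the k known phishing emails most similar to `email`."""
    query = bag_of_words(email)
    ranked = sorted(
        known_phishing,
        key=lambda doc: cosine_similarity(query, bag_of_words(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(email: str, examples: list[str]) -> str:
    """Assemble a few-shot prompt: retrieved examples first, then the email to check."""
    shots = "\n\n".join(
        f"Example {i} (phishing):\n{text}" for i, text in enumerate(examples, start=1)
    )
    return (
        "The following are known phishing emails.\n\n"
        f"{shots}\n\n"
        "Classify the next email as PHISHING or LEGITIMATE.\n\n"
        f"Email:\n{email}\n\nAnswer:"
    )
```

The resulting prompt string would then be sent to a language model of choice. Because the examples are chosen per email, the model needs no retraining when new phishing samples are added to the database.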

The researchers tested their method with eleven different open-source language models, including Mixtral 8x7B, Llama 3.1, and Google DeepMind's new Gemma family. The tests showed that combining few-shot learning and RAG significantly improved detection rates, especially with larger models.

Small models with RAG show good performance

The large Llama 3.1 70B model achieved the best results with 96.18% accuracy. The much smaller Gemma2 9B model performed surprisingly well, reaching almost the same accuracy at 95%. As a rule, though, models with fewer than 10 billion parameters struggled to use the RAG method effectively.

For their tests, the researchers used a balanced dataset of 2,900 legitimate and 2,900 phishing emails. The phishing emails came from real attacks between 2022 and 2024. The legitimate emails were sourced from the publicly available CSDMC Spam Corpus.
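
To make the accuracy figures concrete, here is a small sketch of how accuracy is computed on a balanced dataset of this size. The label strings and the `accuracy` helper are assumptions for illustration; the study's own evaluation code is not described in the article.

```python
def accuracy(predictions: list[str], labels: list[str]) -> float:
    """Fraction of predictions that match the true labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equally long")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Balanced dataset as described in the article: 2,900 emails per class.
labels = ["phishing"] * 2900 + ["legitimate"] * 2900

# A trivial classifier that flags every email as phishing gets exactly half right.
always_phishing = ["phishing"] * len(labels)
baseline = accuracy(always_phishing, labels)
```

On a balanced dataset, this trivial baseline sits at 50%, which is the reference point against which accuracies of 95% and above should be read.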

The research team sees room for improvement: future versions could incorporate additional data sources and consider email metadata and file attachments. They also suggest that using AI agents with API access could be a promising extension of the system.

Summary
  • Researchers at the University of Applied Sciences in Kaiserslautern, Germany, have developed a method that recognizes phishing emails with up to 96% accuracy. It combines few-shot learning and retrieval-augmented generation (RAG) with open-source language models.
  • The system dynamically selects five similar known phishing emails as context for each email to be checked. In tests with eleven different language models, Llama 3.1 70B achieved the highest accuracy at 96.18 percent, closely followed by the smaller Gemma2 9B at 95 percent.
  • The method was tested on a balanced dataset of 2,900 legitimate and 2,900 phishing emails. According to the researchers, the detection rate could be further improved by integrating email metadata, file attachments, and AI agents with API access.
Sources
HS Offenburg