International Conference on Artificial Intelligence; City, Industry and Health
Enhancing Multilingual Spam Detection Using Machine Learning and Synthetic Data Augmentation
Authors:
Mohamad Hosein Ghojavand (1), Hamid Rastegari (2)
1- Department of Computer Engineering, Na.C., Islamic Azad University, Najafabad, Iran
2- Department of Computer Engineering, Na.C., Islamic Azad University, Najafabad, Iran
Keywords:
Spam Detection, Large Language Models, Machine Learning, Data Augmentation, Text Classification
Abstract:
This study investigates the effectiveness of classical machine learning algorithms for multilingual SMS spam detection, focusing on English and the low-resource language Farsi. We conduct a comprehensive evaluation of 18 classical models on original datasets for both languages. To address the challenge of limited data, particularly for Farsi, we augment the datasets with synthetic SMS messages generated using large language models (LLMs). The performance of the models is compared across three configurations: main (real-world data), synthetic (LLM-generated data), and a combination of both. Results show that synthetic data augmentation significantly improves the performance of several models in Farsi SMS spam detection, leading to enhanced accuracy, precision, recall, and F1-scores. Notably, classifiers trained on the combined Farsi dataset consistently outperform those trained solely on the original data. These findings highlight the potential of LLM-generated synthetic data to enhance the effectiveness of classical models in low-resource language contexts. Moreover, the study underscores the value of data augmentation in multilingual text classification tasks where resources are limited. By comparing performance across languages and dataset types, this research provides insights into cross-linguistic spam detection and demonstrates that classical machine learning remains a viable approach when complemented with modern data augmentation techniques.
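The abstract compares models by accuracy, precision, recall, and F1-score across three training configurations (main, synthetic, and combined). The following is a minimal sketch of how those metrics and configurations could be computed. It is not the authors' code: the function names `binary_metrics` and `build_configurations`, the label encoding (1 = spam, 0 = ham), and the toy labels in the usage lines are all assumptions made for illustration, and the numbers are not results from the study.

```python
from typing import Dict, List, Sequence, Tuple

Example = Tuple[str, int]  # (message text, label); label 1 = spam, 0 = ham (assumed encoding)


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, precision, recall and F1 with spam (1) as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def build_configurations(
    main: List[Example], synthetic: List[Example]
) -> Dict[str, List[Example]]:
    """Assemble the three training sets named in the abstract (hypothetical helper)."""
    return {
        "main": list(main),                         # real-world data only
        "synthetic": list(synthetic),               # LLM-generated data only
        "combined": list(main) + list(synthetic),   # both together
    }


# Toy usage with made-up labels (illustrative only, not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(binary_metrics(y_true, y_pred))
```

In a comparison like the one described, each classifier would be trained once per configuration and scored with the same metric function on a held-out set of real messages, so that differences reflect the training data rather than the evaluation.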
List of Papers
List of Archived Papers
Application of ANN artificial network in slope behavior evaluation using machine learning technique
Marziyeh Tourani - Hadi Bahadori
A Review of Machine Learning Algorithms for Urban Air Pollution Prediction
Neda Chatraei Azizabadi - Nasim Noorafza
Diagnosis of Autism in Brain MRI Images Using Fuzzy Clustering and Gaussian Mixture, Feature Extraction with VGG-16, and Classification with ResNet
Mansoor Zeinali - Ghadeer ketab yousif Alkhafaji
Reconfigurable Pulse Charge BMS with Pre-charge Capability for Li-Ion Energy Storage Systems (ESSs)
Amirhossein Rahimian Zarif - Amin Kazemi - Yasser Mafinejad
Bridging the Gap: Understanding Adult Learners' Fear of AI in Language Classrooms and How to Overcome It
Bahareh Assarzadegan - Oimd Tabatabaei - Ali Yousefi
The Evolution of Smart Grids: Decentralization, Communication, and Economic Impact
Saiedeh Mehrabani-Najafabadi - Hossein Shahinzadeh - Hamed Nafisi - Shohreh Azani - Ehsan Etemadnia - Ali Karimi
Predictive Modeling of Pollutant Emissions from Biodiesel-diesel Fuel Blends in a Diesel Engine Using Artificial Neural Networks
Alireza Shirneshan
The Role of Combating Non-Technical Losses in Improving Energy Optimization
احسان آقاباباگلی
Applications of Artificial Intelligence in Food Quality and Safety Control
Nasrin Hesami - Hadiseh Dehghan
Technical and Economic Analysis of Solar Panels Available on the Market and Selection of a Suitable Panel for Climatic Conditions Nationwide
ابراهیم گوگونانی - داود طغرایی - مجید ریاحی سامانی - مجتبی مجتبی رحیمی - بابک مهماندوست