International Conference on Artificial Intelligence; City, Industry and Health
Enhancing Multilingual Spam Detection Using Machine Learning and Synthetic Data Augmentation
Authors:
Mohamad Hosein Ghojavand (1)
Hamid Rastegari (2)
1- Department of Computer Engineering, Na.C., Islamic Azad University, Najafabad, Iran
2- Department of Computer Engineering, Na.C., Islamic Azad University, Najafabad, Iran
Keywords:
Spam Detection, Large Language Models, Machine Learning, Data Augmentation, Text Classification
Abstract:
This study investigates the effectiveness of classical machine learning algorithms for multilingual SMS spam detection, focusing on English and Farsi, a low-resource language. We evaluate 18 classical models on the original datasets for both languages. To address the challenge of limited data, particularly for Farsi, we augment the datasets with synthetic SMS messages generated by large language models (LLMs). Model performance is compared across three configurations: main (real-world data), synthetic (LLM-generated data), and combined (both). Results show that synthetic data augmentation significantly improves the performance of several models in Farsi SMS spam detection, raising accuracy, precision, recall, and F1-score. Notably, classifiers trained on the combined Farsi dataset consistently outperform those trained solely on the original data. These findings highlight the potential of LLM-generated synthetic data to strengthen classical models in low-resource language contexts and, more broadly, the value of data augmentation in multilingual text classification tasks where resources are limited. By comparing performance across languages and dataset types, this research provides insights into cross-linguistic spam detection and demonstrates that classical machine learning remains a viable approach when complemented with modern data augmentation techniques.
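The comparison described in the abstract (three training configurations, scored by accuracy, precision, recall, and F1) can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not describe their implementation, and the names `Message`, `build_configurations`, and `binary_metrics` are hypothetical. The sketch assumes binary labels, with 1 for spam and 0 for legitimate messages, and uses only the Python standard library.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical record type: (message text, label), where 1 = spam and 0 = ham.
Message = Tuple[str, int]


def build_configurations(
    main: List[Message], synthetic: List[Message]
) -> Dict[str, List[Message]]:
    """Return the three training sets named in the abstract."""
    return {
        "main": list(main),                          # real-world data only
        "synthetic": list(synthetic),                # LLM-generated data only
        "combined": list(main) + list(synthetic),    # both together
    }


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 with spam (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

In such a setup, each classifier would be trained once per configuration and scored with `binary_metrics` on the same held-out set of real messages, so that differences in the scores reflect the training data rather than the test data. The choice of a shared real-data test set is an assumption of this sketch, not a detail stated in the abstract.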