International Conference on Artificial Intelligence; City, Industry and Health
Enhancing Multilingual Spam Detection Using Machine Learning and Synthetic Data Augmentation
Authors:
Mohamad Hosein Ghojavand (1), Hamid Rastegari (2)
1- Department of Computer Engineering, Na.C., Islamic Azad University, Najafabad, Iran
2- Department of Computer Engineering, Na.C., Islamic Azad University, Najafabad, Iran
Keywords:
Spam Detection, Large Language Models, Machine Learning, Data Augmentation, Text Classification
Abstract:
This study investigates the effectiveness of classical machine learning algorithms for multilingual SMS spam detection, focusing on English and Farsi, a low-resource language. We evaluate 18 classical models on the original datasets for both languages. To address the limited data available, particularly for Farsi, we augment the datasets with synthetic SMS messages generated by large language models (LLMs). Model performance is compared across three configurations: main (real-world data), synthetic (LLM-generated data), and combined (both). Results show that synthetic data augmentation significantly improves the performance of several models in Farsi SMS spam detection, raising accuracy, precision, recall, and F1-score. Notably, classifiers trained on the combined Farsi dataset consistently outperform those trained solely on the original data. These findings highlight the potential of LLM-generated synthetic data to strengthen classical models in low-resource language contexts and underscore the value of data augmentation in multilingual text classification where resources are limited. By comparing performance across languages and dataset types, this research provides insights into cross-linguistic spam detection and demonstrates that classical machine learning remains a viable approach when complemented with modern data augmentation techniques.
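The abstract names three training configurations and four evaluation metrics without saying how they are assembled or computed. The following is a minimal sketch of one way this could be set up, not the authors' implementation. The names `Message`, `build_configurations`, and `spam_metrics` are hypothetical, and the sketch assumes binary labels with spam (1) treated as the positive class.

```python
from typing import Dict, List, Tuple

# Assumption: each message is (text, label), with label 1 = spam and 0 = ham.
Message = Tuple[str, int]


def build_configurations(real: List[Message],
                         synthetic: List[Message]) -> Dict[str, List[Message]]:
    """Return the three training sets named in the abstract."""
    return {
        "main": list(real),                          # real-world data only
        "synthetic": list(synthetic),                # LLM-generated data only
        "combined": list(real) + list(synthetic),    # both together
    }


def spam_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, with spam (1) as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # ham flagged as spam
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # ham passed through

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Under these assumptions, each classifier would be trained once per entry returned by `build_configurations` and scored with `spam_metrics` on the same held-out set of real messages, so that differences between configurations reflect the training data and not the test data.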