Domestic violence risk prediction in Iran using a machine learning approach by analyzing Persian textual content in social media

Meysam Salehi, Shahrbanoo Ghahari, Mehdi Hosseinzadeh, Leila Ghalichi

Research output: Contribution to journal › Article › peer-review


Abstract

Domestic violence (DV) against women in Iran is a hidden societal issue. In addition to its chronic physical, mental, industrial, and economic effects on women, children, and families, DV prevents victims from receiving mental health care. At the same time, DV campaigns on social media have encouraged victims and society to share their stories of abuse. As a result, a massive amount of data about this violence has been generated, which can be used for analysis and early detection. This study therefore aimed to analyze and classify Persian textual content pertinent to DV against women on social media, and to use machine learning to predict the risk level of this content. After 53,105 Persian-language tweets and captions were collected from Twitter and Instagram between April 2020 and April 2021, 1611 tweets and captions were selected at random and categorized using criteria compiled and approved by an expert in the field of DV. Modeling and evaluation were then performed on the labeled data using machine learning algorithms. The Naïve Bayes model, with an accuracy of 86.77%, was the most accurate of all the machine learning models tested for predicting critical Persian content pertinent to DV on social media. These findings indicate that a machine learning approach can predict the risk of Persian social media content related to DV against women.
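The abstract reports a Naïve Bayes model as the most accurate classifier. To make that classification step concrete, the following is a minimal multinomial Naïve Bayes text classifier in plain Python. It is an illustrative sketch, not the authors' implementation: it assumes whitespace tokenization, Laplace smoothing with `alpha=1.0`, two hypothetical labels (`critical` and `non-critical`), and made-up English toy sentences standing in for the study's Persian data, whose preprocessing and features are not described in this record.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Toy multinomial Naive Bayes over whitespace tokens with Laplace smoothing (hypothetical helper)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # documents per class, for the prior
        self.token_counts = defaultdict(Counter)   # class -> token frequencies
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.split()
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_tokens = {c: sum(cnt.values()) for c, cnt in self.token_counts.items()}
        return self

    def _log_posterior(self, tokens, label):
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)  # log prior P(class)
        denom = self.total_tokens[label] + self.alpha * len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.token_counts[label][tok] + self.alpha) / denom)
        return score

    def predict(self, texts):
        # Pick the class with the highest (unnormalized) log posterior for each text.
        return [
            max(self.class_counts, key=lambda c: self._log_posterior(t.split(), c))
            for t in texts
        ]


# Made-up toy data; the labels mimic a "critical" vs "non-critical" risk split.
train_texts = ["he hit me again", "threatened to hurt me",
               "lovely family dinner", "happy weekend trip"]
train_labels = ["critical", "critical", "non-critical", "non-critical"]

model = MultinomialNaiveBayes().fit(train_texts, train_labels)
print(model.predict(["he threatened me", "family trip"]))  # ['critical', 'non-critical']
```

Working in log space avoids numerical underflow when many small token probabilities are multiplied. For real work, scikit-learn's `MultinomialNB` combined with a vectorizer is a more complete alternative.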

Original language: English
Article number: e15667
Number of pages: 11
Journal: Heliyon
Volume: 9
Issue number: 5
Early online date: 23 Apr 2023
Publication status: Published, May 2023

Bibliographical note

Funding Information:
This study used Python version 3.4 for analysis and modeling. Python was chosen for its ease of use, its popularity and application in modeling with machine learning algorithms, its ability to call libraries developed for Persian language processing, its use in other studies in this field, and the many resources available on the Internet for resolving coding errors. The history of the Python developer community, and the ability to seek guidance and support from it, also informed the choice of Python for processing and modeling in this study. This study is divided into six sections, each of which is described in detail below.

Modeling: The current study used supervised machine learning algorithms for modeling, applied to a classification problem. After completing the previous steps and preparing the data for model training, classification algorithms including Logistic Regression (LR), Naive Bayes (NB), Decision Tree (DT), Random Forest (RF), and Support Vector Machine (SVM) were applied to identify the best-fitting algorithm. The k-fold cross-validation method was applied to the entire dataset. This method partitions the data into K subsets. In each round, one subset is used for validation and the remaining K-1 subsets are used for training. This procedure is repeated K times, so that each subset is used exactly once for validation and K-1 times for training. Finally, the average of the K validation results is taken as the final estimate [27]. Stratified k-fold cross-validation with 10 folds was applied to each machine learning algorithm, the accuracies of the ten folds were averaged, and the algorithm with the highest mean accuracy was identified.
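The stratified 10-fold procedure described in the Modeling paragraph can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the round-robin fold assignment, the `seed` parameter, the helper names `stratified_k_fold` and `cross_val_accuracy`, the `MajorityClass` stand-in model, and the 30/70 toy label split are all hypothetical. Stratification here means each fold receives roughly the same class proportions as the full dataset.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every index lands in exactly one test fold."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for idx in by_class.values():
        rng.shuffle(idx)
        # Deal each class round-robin so every fold keeps roughly the class ratio.
        for j, i in enumerate(idx):
            folds[(j + offset) % k].append(i)
        offset += len(idx)  # continue dealing where the previous class stopped
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def cross_val_accuracy(make_model, X, y, k=10, seed=0):
    """Mean accuracy over k stratified folds; make_model() returns an object with fit/predict."""
    scores = []
    for train, test in stratified_k_fold(y, k, seed):
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        preds = model.predict([X[i] for i in test])
        scores.append(sum(p == y[i] for p, i in zip(preds, test)) / len(test))
    return sum(scores) / len(scores)


class MajorityClass:
    """Hypothetical stand-in model: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label] * len(X)


# Toy data: 30 "critical" and 70 "non-critical" placeholder posts.
y = ["critical"] * 30 + ["non-critical"] * 70
X = [f"post {i}" for i in range(100)]

acc = cross_val_accuracy(MajorityClass, X, y, k=10)
print(round(acc, 3))  # 0.7
```

To compare algorithms in the way the record describes, `cross_val_accuracy` would be called once per candidate model (LR, NB, DT, RF, SVM) and the model with the highest mean accuracy retained. scikit-learn's `StratifiedKFold` and `cross_val_score` provide a maintained equivalent of these helpers.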

Publisher Copyright:
© 2023 The Authors

Keywords

  • Domestic violence
  • Machine learning
  • Mental health
  • Social media

ASJC Scopus subject areas

  • General
