Domestic violence risk prediction in Iran using a machine learning approach by analyzing Persian textual content in social media

Meysam Salehi; Shahrbanoo Ghahari; Mehdi Hosseinzadeh; Leila Ghalichi

doi:10.1016/j.heliyon.2023.e15667

Applied Health Research

Research output: Contribution to journal › Article › peer-review

Abstract

Domestic violence (DV) against women in Iran is a hidden societal issue. In addition to its chronic physical, mental, industrial, and economic effects on women, children, and families, DV prevents victims from receiving mental health care. At the same time, DV campaigns on social media have encouraged victims and society to share their stories of abuse. As a result, a massive amount of data has been generated about this violence, which can be used for analysis and early detection. This study therefore aimed to analyze and classify Persian textual content pertinent to DV against women on social media, and to use machine learning to predict the risk level of this content. After collecting 53,105 tweets and captions in the Persian language from Twitter and Instagram between April 2020 and April 2021, 1611 tweets and captions were chosen at random and categorized using criteria compiled and approved by an expert in the field of DV. Machine learning algorithms were then used to model and evaluate the tagged data. The Naïve Bayes model, with an accuracy of 86.77%, was the most accurate of the machine learning models for predicting critical Persian content pertinent to domestic violence on social media. The findings indicate that a machine learning approach can predict the risk level of Persian social media content related to DV against women.

Original language	English
Article number	e15667
Number of pages	11
Journal	Heliyon
Volume	9
Issue number	5
Early online date	23 Apr 2023
DOIs	https://doi.org/10.1016/j.heliyon.2023.e15667
Publication status	Published - May 2023

Bibliographical note

Funding Information:
This study used Python version 3.4 for analysis and modeling. Python was chosen because of its ease of use, its popularity and established application in modeling with machine learning algorithms, its ability to call libraries developed for Persian language processing, its use in previous studies in this field, and the availability of online resources for resolving coding errors. The history of the Python developer community, and the ability to seek guidance and support from it, were further factors in selecting Python for processing and modeling in this study. This study is divided into six sections, each of which is described in detail below.

Modeling: The current study used supervised machine learning algorithms for modeling. The algorithms were applied to a classification problem. After the previous steps were completed and the data were prepared for model training, the classification algorithms Logistic Regression (LR), Naive Bayes (NB), Decision Tree (DT), Random Forest (RF), and Support Vector Machine (SVM) were applied to identify the best-fitting algorithm. The entire dataset was subjected to k-fold cross-validation. This method partitions the data into k subsets. In each round, one subset is used for validation and the remaining k-1 subsets are used for training. The procedure is repeated k times, so that each subset is used exactly once for validation and k-1 times for training. Finally, the average of these k validation results is taken as the final estimate [27]. Stratified k-fold with k = 10 was applied to each of the machine learning algorithms, and the accuracy over the ten folds was averaged to identify the algorithm with the highest mean accuracy.
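The stratified 10-fold procedure described above can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' code: the record does not say which libraries were used, so the function names (`stratified_k_fold`, `fit_naive_bayes`, `predict_naive_bayes`, `cross_val_accuracy`), the whitespace tokenizer, the Laplace-smoothed Naive Bayes, and the English placeholder texts and labels are all assumptions made for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs with similar class proportions per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal the shuffled indices of each class round-robin across the k folds.
    folds = [[] for _ in range(k)]
    position = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for idx in indices:
            folds[position % k].append(idx)
            position += 1

    # Each fold is the validation set once; the other k-1 folds form the training set.
    for i in range(k):
        val_idx = sorted(folds[i])
        train_idx = sorted(idx for j in range(k) if j != i for idx in folds[j])
        yield train_idx, val_idx


def fit_naive_bayes(texts, labels):
    """Count documents per class and tokens per class (toy multinomial Naive Bayes)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = text.split()  # assumption: plain whitespace tokenization
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_naive_bayes(model, text):
    """Return the class with the highest log-posterior, using Laplace smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for token in text.split():
            if token in vocab:  # tokens unseen in training are skipped
                count = word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_val_accuracy(texts, labels, k=10, seed=0):
    """Average validation accuracy over k stratified folds (needs at least k samples)."""
    scores = []
    for train_idx, val_idx in stratified_k_fold(labels, k, seed):
        model = fit_naive_bayes([texts[i] for i in train_idx],
                                [labels[i] for i in train_idx])
        hits = sum(predict_naive_bayes(model, texts[i]) == labels[i] for i in val_idx)
        scores.append(hits / len(val_idx))
    return sum(scores) / len(scores)


# Hypothetical placeholder data standing in for the expert-labeled Persian posts.
texts = ["urgent threat report"] * 10 + ["awareness campaign news"] * 10
labels = ["critical"] * 10 + ["non-critical"] * 10

mean_accuracy = cross_val_accuracy(texts, labels, k=10)
print(f"mean accuracy over 10 folds: {mean_accuracy:.2%}")
```

The placeholder data are trivially separable, so the printed score says nothing about the 86.77% reported in the study. To compare several algorithms as the study did, the same folds would be reused for each classifier and the mean accuracies compared.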

Publisher Copyright:
© 2023 The Authors

Keywords

Domestic violence
Machine learning
Mental health
Social media

ASJC Scopus subject areas

General

Access to Document

10.1016/j.heliyon.2023.e15667 (Licence: Creative Commons: Attribution-NonCommercial-NoDerivs (CC BY-NC-ND))

SalehiM2023Domestic: Final published version, 2.19 MB (Licence: Creative Commons: Attribution-NonCommercial-NoDerivs (CC BY-NC-ND))

Cite this

@article{d302a34ad3ef497aa6790f92e2305609,

title = "Domestic violence risk prediction in Iran using a machine learning approach by analyzing Persian textual content in social media",

keywords = "Domestic violence, Machine learning, Mental health, Social media",

author = "Meysam Salehi and Shahrbanoo Ghahari and Mehdi Hosseinzadeh and Leila Ghalichi",

year = "2023",

month = may,

doi = "10.1016/j.heliyon.2023.e15667",

language = "English",

volume = "9",

journal = "Heliyon",

issn = "2405-8440",

publisher = "Elsevier",

number = "5",

}

TY - JOUR

T1 - Domestic violence risk prediction in Iran using a machine learning approach by analyzing Persian textual content in social media

AU - Salehi, Meysam

AU - Ghahari, Shahrbanoo

AU - Hosseinzadeh, Mehdi

AU - Ghalichi, Leila

PY - 2023/5

Y1 - 2023/5

KW - Domestic violence

KW - Machine learning

KW - Mental health

KW - Social media

UR - http://www.scopus.com/inward/record.url?scp=85153601601&partnerID=8YFLogxK

U2 - 10.1016/j.heliyon.2023.e15667

DO - 10.1016/j.heliyon.2023.e15667

M3 - Article

AN - SCOPUS:85153601601

SN - 2405-8440

VL - 9

JO - Heliyon

JF - Heliyon

IS - 5

M1 - e15667

ER -
