Revealing Temporal Label Noise in Multimodal Hateful Video Classification

  • Shuonan Yang
  • Tailin Chen
  • Rahul Singh
  • Jiangbei Yue
  • Jianbo Jiao
  • Zeyu Fu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The rapid proliferation of online multimedia content has intensified the spread of hate speech, presenting critical societal and regulatory challenges. While recent work has advanced multimodal hateful video detection, most approaches rely on coarse, video-level annotations that overlook the temporal granularity of hateful content. This introduces substantial label noise, as videos annotated as hateful often contain long non-hateful segments. In this paper, we investigate the impact of such label ambiguity through a fine-grained approach. Specifically, we trim hateful videos from the HateMM and MultiHateClip English datasets using annotated timestamps to isolate explicitly hateful segments. We then conduct an exploratory analysis of these trimmed segments to examine the distribution and characteristics of both hateful and non-hateful content. This analysis reveals the degree of semantic overlap and the confusion introduced by coarse, video-level annotations. Finally, controlled experiments demonstrate that timestamp noise fundamentally alters model decision boundaries and weakens classification confidence, underscoring the inherent context dependency and temporal continuity of hate speech expression. Our findings provide new insights into the temporal dynamics of multimodal hateful videos and highlight the need for temporally aware models and benchmarks for improved robustness and interpretability. Code and data are available at https://github.com/Multimodal-Intelligence-Lab-MIL/HatefulVideoLabelNoise.
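
The trimming step described in the abstract can be illustrated with a minimal sketch. It is not taken from the released repository: the function names, the representation of annotations as `(start_sec, end_sec)` pairs, and the `noise_ratio` measure are all assumptions made for illustration. The sketch splits a video timeline into hateful and non-hateful segments and reports what fraction of a video labeled hateful is actually non-hateful.

```python
from typing import List, Tuple

# Assumed annotation format: (start_sec, end_sec) pairs marking hateful spans.
Span = Tuple[float, float]


def merge_spans(spans: List[Span]) -> List[Span]:
    """Sort spans and merge any that overlap or touch."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def split_by_annotation(duration: float, hateful: List[Span]) -> Tuple[List[Span], List[Span]]:
    """Return (hateful_segments, non_hateful_segments) covering [0, duration]."""
    # Clip annotations to the video bounds and drop empty spans.
    clipped = [(max(0.0, s), min(duration, e)) for s, e in hateful]
    hate = merge_spans([(s, e) for s, e in clipped if s < e])

    # Everything between the hateful spans is treated as non-hateful.
    rest: List[Span] = []
    cursor = 0.0
    for s, e in hate:
        if s > cursor:
            rest.append((cursor, s))
        cursor = e
    if cursor < duration:
        rest.append((cursor, duration))
    return hate, rest


def noise_ratio(duration: float, hateful: List[Span]) -> float:
    """Fraction of a hateful-labeled video that is not annotated as hateful."""
    hate, _ = split_by_annotation(duration, hateful)
    hateful_time = sum(e - s for s, e in hate)
    return 1.0 - hateful_time / duration


if __name__ == "__main__":
    # Hypothetical 100-second video with three annotated spans.
    hate, rest = split_by_annotation(100.0, [(10.0, 20.0), (15.0, 30.0), (80.0, 120.0)])
    print(hate, rest, noise_ratio(100.0, [(10.0, 20.0), (15.0, 30.0), (80.0, 120.0)]))
```

Under a video-level label, every second of such a video would be treated as hateful; the ratio gives a rough sense of how much of that label is noise. Actual clip extraction (for example with a video-processing tool) would consume the returned spans and is left out here.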
Original language: English
Title of host publication: MUWS '25
Subtitle of host publication: Proceedings of the 4th International Workshop on Multimodal Human Understanding for the Web and Social Media
Publisher: Association for Computing Machinery (ACM)
Pages: 26-34
Number of pages: 9
ISBN (Print): 9798400718380
Publication status: Published - 26 Oct 2025
Event: The 4th International Workshop on Multimodal Human Understanding for the Web and Social Media - Dublin, Ireland
Duration: 28 Oct 2025 to 28 Oct 2025

Conference

Conference: The 4th International Workshop on Multimodal Human Understanding for the Web and Social Media
Abbreviated title: MUWS 2025
Country/Territory: Ireland
City: Dublin
Period: 28/10/25 to 28/10/25

Keywords

  • hateful video detection
  • label noise
  • multimodal computing
