EVILSEED: A Guided Approach to Finding Malicious Web Pages

Research output: Chapter in Book/Report/Conference proceeding > Chapter

Authors

  • Luca Invernizzi
  • Stefano Benvenuti
  • Paolo Milani Comparetti
  • Christopher Kruegel
  • Giovanni Vigna

External organisations

  • University of California
  • University of Genova
  • Vienna University of Technology

Abstract

Malicious web pages that use drive-by download attacks or social engineering techniques to install unwanted software on a user’s computer have become the main avenue for the propagation of malicious code. To search for malicious web pages, the first step is typically to use a crawler to collect URLs that are live on the Internet. Then, fast prefiltering techniques are employed to reduce the number of pages that need to be examined by more precise, but slower, analysis tools (such as honeyclients). While effective, these techniques require a substantial amount of resources. A key reason is that the crawler encounters many pages on the web that are benign; that is, the “toxicity” of the stream of URLs being analyzed is low.
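The crawl, prefilter, and analysis pipeline described above can be illustrated with a minimal sketch. It assumes that “toxicity” means the fraction of URLs in a stream that turn out to be malicious; the abstract does not give a formal definition. The names toxicity, analyze, prefilter, and honeyclient are hypothetical and stand in for whatever components a real system would use.

```python
from typing import Callable, Iterable, List


def toxicity(urls: Iterable[str], is_malicious: Callable[[str], bool]) -> float:
    """Fraction of URLs in the stream that the oracle flags as malicious.

    Assumption: toxicity is a simple ratio; an empty stream has toxicity 0.
    """
    url_list = list(urls)
    if not url_list:
        return 0.0
    return sum(1 for u in url_list if is_malicious(u)) / len(url_list)


def analyze(
    urls: Iterable[str],
    prefilter: Callable[[str], bool],
    honeyclient: Callable[[str], bool],
) -> List[str]:
    """Run the slow, precise check only on URLs the fast prefilter keeps."""
    suspicious = [u for u in urls if prefilter(u)]
    return [u for u in suspicious if honeyclient(u)]
```

Under this reading, a stream with low toxicity means most honeyclient runs are spent on benign pages, which is the cost the abstract points to.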

In this paper, we present EVILSEED, an approach to search the web more efficiently for pages that are likely malicious. EVILSEED starts from an initial seed of known, malicious web pages. Using this seed, our system automatically generates search engine queries to identify other malicious pages that are similar or related to the ones in the initial seed. By doing so, EVILSEED leverages the crawling infrastructure of search engines to retrieve URLs that are much more likely to be malicious than a random page on the web. In other words, EVILSEED increases the “toxicity” of the input URL stream. Also, we envision that the features that EVILSEED presents could be directly applied by search engines in their prefilters. We have implemented our approach, and we evaluated it on a large-scale dataset. The results show that EVILSEED is able to identify malicious web pages more efficiently when compared to crawler-based approaches.
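The seed-to-query idea can be sketched as follows. This is only an illustration and not the authors' method: the abstract does not say how queries are derived, so the sketch assumes a simple term-frequency heuristic over the text of seed pages. The functions build_queries and expand_seed, the benign_terms stop list, and the caller-supplied search function are all hypothetical.

```python
import re
from collections import Counter
from typing import Callable, Iterable, List, Set


def build_queries(
    seed_pages: Iterable[str], benign_terms: Set[str], n_terms: int = 3
) -> List[str]:
    """Build one query per seed page from its most frequent distinctive terms.

    Assumption: frequent words that are not in benign_terms characterize the page.
    """
    queries = []
    for text in seed_pages:
        words = re.findall(r"[a-z0-9]+", text.lower())
        counts = Counter(w for w in words if len(w) > 2 and w not in benign_terms)
        top = [w for w, _ in counts.most_common(n_terms)]
        if top:
            queries.append(" ".join(top))
    return queries


def expand_seed(
    seed_pages: Iterable[str],
    benign_terms: Set[str],
    search: Callable[[str], Iterable[str]],
    known_urls: Iterable[str],
) -> List[str]:
    """Return new candidate URLs from search results, skipping known ones."""
    seen = set(known_urls)
    candidates = []
    for query in build_queries(seed_pages, benign_terms):
        for url in search(query):  # search is supplied by the caller
            if url not in seen:
                seen.add(url)
                candidates.append(url)
    return candidates
```

The candidates returned here would still need to pass through a prefilter and a slower analysis tool; the sketch only covers the step that replaces blind crawling with seed-guided search.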

Details

Original language: English
Title of host publication: Proceedings of the IEEE Symposium on Security and Privacy
Publication status: Published - May 2012