Prophiler: a Fast Filter for the Large-Scale Detection of Malicious Web Pages

D Canali, Marco Cova, G Vigna, C Kruegel

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

227 Citations (Scopus)


Malicious web pages that host drive-by-download exploits have become a popular means for compromising hosts on the Internet and, subsequently, for creating large-scale botnets. In a drive-by-download exploit, an attacker embeds a malicious script (typically written in JavaScript) into a web page. When a victim visits this page, the script is executed and attempts to compromise the browser or one of its plugins. To detect drive-by-download exploits, researchers have developed a number of systems that analyze web pages for the presence of malicious code. Most of these systems use dynamic analysis. That is, they run the scripts associated with a web page either directly in a real browser (running in a virtualized environment) or in an emulated browser, and they monitor the scripts' execution for malicious activity. While these tools are quite precise, the analysis process is costly, often requiring on the order of tens of seconds for a single page. Therefore, performing this analysis on a large set of web pages containing hundreds of millions of samples can be prohibitive.

One approach to reduce the resources required for performing large-scale analysis of malicious web pages is to develop a fast and reliable filter that can quickly discard pages that are benign, forwarding to the costly analysis tools only the pages that are likely to contain malicious code. In this paper, we describe the design and implementation of such a filter. Our filter, called Prophiler, uses static analysis techniques to quickly examine a web page for malicious content. This analysis takes into account features derived from the HTML contents of a page, from the associated JavaScript code, and from the corresponding URL. We automatically derive detection models that use these features using machine-learning techniques applied to labeled datasets.
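The filtering idea described above can be illustrated with a minimal Python sketch. It is not Prophiler's implementation: the feature names, the `WEIGHTS` table, and the `THRESHOLD` value are hypothetical, and a hand-weighted linear score stands in for the detection models that the paper derives with machine learning from labeled datasets. It only shows the general shape of such a filter: compute cheap static features from the URL, HTML, and embedded JavaScript, then forward only the suspicious pages to the costly analysis.

```python
import re
from urllib.parse import urlparse


def extract_features(url: str, html: str) -> dict:
    """Compute a few cheap static features (an illustrative choice, not the paper's feature set)."""
    host = urlparse(url).hostname or ""
    # Bodies of inline <script> elements, joined into one string of JavaScript source.
    scripts = re.findall(r"<script\b[^>]*>(.*?)</script>", html, flags=re.I | re.S)
    js = "\n".join(scripts)
    # Rough approximation of string literals in the JavaScript source.
    js_strings = re.findall(r"[\"']([^\"']*)[\"']", js)
    return {
        "iframe_count": len(re.findall(r"<iframe\b", html, flags=re.I)),
        "eval_count": len(re.findall(r"\beval\s*\(", js)),
        "longest_js_string": max((len(s) for s in js_strings), default=0),
        "url_host_is_ip": int(bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host))),
    }


# Hypothetical weights and threshold; a real filter would learn these from labeled data.
WEIGHTS = {
    "iframe_count": 1.0,
    "eval_count": 2.0,
    "longest_js_string": 0.01,
    "url_host_is_ip": 2.0,
}
THRESHOLD = 3.0


def is_suspicious(features: dict) -> bool:
    """Return True when the weighted feature score reaches the threshold."""
    score = sum(WEIGHTS[name] * value for name, value in features.items())
    return score >= THRESHOLD


def filter_pages(pages):
    """Return the URLs of pages that should be forwarded to the costly dynamic analysis."""
    return [url for url, html in pages if is_suspicious(extract_features(url, html))]


pages = [
    ("http://example.com/", "<html><body><p>hello</p></body></html>"),
    (
        "http://192.0.2.7/x",
        '<iframe src="a" width=0></iframe>'
        '<script>eval(unescape("%u9090%u9090%u9090"))</script>',
    ),
]
forwarded = filter_pages(pages)
print(forwarded)
```

In this sketch, pages scoring below the threshold are discarded as benign. Lowering the threshold trades more forwarded pages (more load on the dynamic analysis) for fewer missed malicious pages.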

To demonstrate the effectiveness and efficiency of Prophiler, we crawled and collected millions of pages, which we analyzed for malicious behavior. Our results show that our filter is able to reduce the load on the more costly dynamic analysis tools by more than 85%, with a negligible number of missed malicious pages.
Original language: English
Title of host publication: Proceedings of the 20th international conference on World wide web
Publisher: Association for Computing Machinery
Number of pages: 10
ISBN (Print): 978-1-4503-0632-4
Publication status: Published - 28 Mar 2011
Event: International Conference on World Wide Web (WWW '11), 20th - New York, United States
Duration: 1 Apr 2011 → …


Conference: International Conference on World Wide Web (WWW '11), 20th
Country/Territory: United States
City: New York
Period: 1/04/11 → …

