Peering into the Muddy Waters of Pastebin
by Srdjan Matic, Aristide Fattori, Danilo Bruschi and Lorenzo Cavallaro
Technological advances and the steady shift of services toward the cloud are proving increasingly popular with legitimate users and cybercriminals alike. How frequently is sensitive information leaked to the public? And how easy is it to identify it amongst the tangled maze of legitimate posts that are published daily? The underground economy and the trade of users' stolen information are once again rising to the surface and mutating into a bazaar under the eyes of everyone. Do we have to worry about it and can we do anything to stop it?
Pastebin applications, also known simply as “pastebin”, are the most well-known information-sharing web applications on the Internet. Pastebin applications enable users to share information with others by creating a paste. Users only need to submit the information to be shared and the service provides a URL to retrieve it. In addition to being useful for sharing long messages in accordance with policies (eg Twitter) and netiquette (IRC chats), one of the main features that make pastebin appealing is the possibility of anonymously sharing information with a potentially large crowd.
Unfortunately, along with the legitimate use of such services comes their inevitable exploitation for illegal activities. The first outbreak occurred in late 2009, when roughly 20,000 compromised Hotmail accounts were disclosed in a public post. Many other sensitive leaks followed shortly thereafter, but it is with the illegal activities of the hacker groups Anonymous and LulzSec that such security concerns reached a much wider audience [1].
To gain insights into the underground economy, we, at Royal Holloway, University of London and the University of Milan, jointly developed a framework that automatically monitors text-based, pastebin-like content-sharing applications to harvest and categorize (using pattern matching and machine learning) leaked sensitive information.
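The pattern-matching half of such a categorizer can be pictured with a minimal Python sketch. The article does not list the rules the framework actually used, so the two regular expressions, the category labels and the `min_hits` threshold below are all hypothetical choices made for illustration only.

```python
import re

# Hypothetical patterns: the framework's real rules are not given in the article.
# A line that looks like "email <separator> password".
CREDENTIAL_LINE = re.compile(
    r"^\s*[\w.+-]+@[\w-]+(?:\.[\w-]+)+\s*[:;|]\s*\S+\s*$"
)
# A line that looks like part of an SQL dump.
SQL_DUMP_HINT = re.compile(r"\b(?:INSERT INTO|CREATE TABLE)\b", re.IGNORECASE)


def categorize(paste_text, min_hits=3):
    """Assign a coarse category to a paste by counting matching lines.

    min_hits is an assumed threshold, not a value taken from the study.
    """
    lines = paste_text.splitlines()
    credential_hits = sum(1 for line in lines if CREDENTIAL_LINE.match(line))
    sql_hits = sum(1 for line in lines if SQL_DUMP_HINT.search(line))
    if sql_hits >= min_hits:
        return "database_dump"
    if credential_hits >= min_hits:
        return "compromised_accounts"
    return "other"
```

Rules of this kind are cheap to run on every paste, but they misfire easily, which is presumably one reason to combine them with a learned classifier.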
We monitored pastebin.com from late 2011 to early 2012, periodically downloading public pastes and following links to user-defined posts. We recorded a diverse range of categories of sensitive or malicious information leaked daily: lists of compromised accounts, database dumps, lists of compromised hosts (with backdoor accesses), stealer malware dumps, and lists of premium accounts.
The list of compromised accounts (ie username and password pairs) is the most commonly recorded stolen sensitive information (685 posts with 197,022 unique accounts). Such lists are often packed with references to where these accounts were stolen and the websites where they would be valid, giving miscreants (or just random curious readers) an easy shot. Such information enables us to shed some light on previous security trends and weaknesses [2] (eg password strengths and credential reuse). For instance, more than 75% of such passwords were cracked in a negligible amount of time, indicating that users still rely on poorly chosen or weak passwords.
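Counting unique accounts and estimating how many rely on weak passwords can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the `user:password` line format, the case-insensitive treatment of usernames, the tiny `COMMON_PASSWORDS` wordlist and the six-character length cut-off are all hypothetical, and a wordlist lookup is only a stand-in for actual password cracking.

```python
import re

# Assumed line format "username:password"; real leaks vary widely.
PAIR = re.compile(r"^\s*(\S+?)\s*:\s*(\S+)\s*$")

# Tiny illustrative wordlist; a real check would use a much larger one.
COMMON_PASSWORDS = {"123456", "password", "qwerty", "abc123", "letmein"}


def unique_accounts(pastes):
    """Collect distinct (username, password) pairs across several pastes."""
    accounts = set()
    for text in pastes:
        for line in text.splitlines():
            match = PAIR.match(line)
            if match:
                # Usernames are compared case-insensitively (an assumption).
                accounts.add((match.group(1).lower(), match.group(2)))
    return accounts


def weak_fraction(accounts):
    """Fraction of accounts whose password is very short or in the wordlist."""
    if not accounts:
        return 0.0
    weak = sum(
        1 for _, pw in accounts
        if len(pw) < 6 or pw.lower() in COMMON_PASSWORDS
    )
    return weak / len(accounts)
```

The simple `PAIR` pattern also matches unrelated lines that happen to contain a colon (a bare URL, for instance), so a real pipeline would need stricter rules before trusting the counts.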
Similarly, posts of leaked database dumps often include references to the attacked servers, precise information on the exploited vulnerability and clear indications of the tools used to perform the attack, providing interesting insights into the attackers’ methods.
Posts containing leaked information about compromised servers (104 posts with 5,011 unique accounts) include lists of URLs with recurring patterns (eg webdav, shell, dos). Our analysis shows that the PHP-written shells found at these URLs are generally aimed at performing UDP-based DoS attacks.
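Spotting such recurring URL patterns is straightforward to sketch. In the Python fragment below, the three tokens are the examples quoted above, while the URL regular expression and the choice to search only the path component are assumptions made for illustration.

```python
import re
from urllib.parse import urlparse

# Rough URL matcher (an assumption; it stops at whitespace, quotes and angle brackets).
URL = re.compile(r"https?://[^\s\"'<>]+")

# Tokens taken from the examples in the text.
TOKENS = ("webdav", "shell", "dos")


def suspicious_urls(paste_text):
    """Return (url, matched_tokens) for URLs whose path contains a known token."""
    flagged = []
    for url in URL.findall(paste_text):
        path = urlparse(url).path.lower()
        hits = [token for token in TOKENS if token in path]
        if hits:
            flagged.append((url, hits))
    return flagged
```

Plain substring matching will also flag harmless paths (any path containing "kudos", for example, matches "dos"), so the output is best treated as a list of candidates for further inspection.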
Information leaked by malware was responsible for 121 posts with 12,036 unique accounts. Such posts report very precise information associated with the leaked credentials, ie the URL of the website for which the account is valid, the program from which they were stolen, an IP address, a computer name and a date.
Finally, posts of leaked premium website accounts contain lists of usernames and passwords used to access web applications that provide enhanced features for paying customers (892 posts with 239,976 unique accounts). Unsurprisingly, the two most common categories of premium accounts refer to pornography and file-sharing websites.
As previous researchers have done [2], we evaluated the potential value of this sensitive information on the black market [3]; prices and values are reported in Table 1.