FIGURE 4.9 SocialMention displaying results and associated statistics.

Social Searcher (http://www.social-searcher.com/)
Social Searcher is yet another social media search engine. It uses Facebook, Twitter, and Google+ as its sources. The interface provided by this search engine is simple. Under the search tab the results are distributed into three tabs based on the source, and under these tabs the posts are listed with a preview, which is very helpful in identifying the ones relevant to us. Similar to SocialMention, we can also set up e-mail alerts. Under the analytics tab we can get sentiment analysis, users, keywords, domains, and much more. One of the more interesting of these is the popular tab, which lists the results with the most interaction, such as likes, retweets, etc.

TWITTER
Twitter is one of the most popular social networking sites and has a huge impact. Apart from its usual microblogging functionality, it also allows us to understand the reach and user base of any entity, which makes it a powerful tool for reconnaissance. Today it is widely used for market promotion as well as for analyzing the social landscape.

Topsy (http://topsy.com/)
Topsy is a tool which allows us to search and monitor Twitter. Using it we can check out the trend of any keyword over Twitter and analyze its reach. The interface is pretty simple and looks like a conventional search engine, except that the results are based only on Twitter. The results presented by it can be narrowed down to various timeframes such as 1 day, 30 days, etc. We can also filter the results to only see images, tweets, links, videos, or influencers. There is another filter which allows us to see only results in specific languages. All in all, Topsy is a great tool for monitoring the market for specific keywords.

FIGURE 4.10 Topsy search.

Trendsmap (http://trendsmap.com/)
Trendsmap is a great visual platform which shows trending topics from Twitter, in the form of keywords, hashtags, and Twitter handles, over the world map. It is a great platform which utilizes visual representation of the trends to understand what's hot in a specific region of the world. Apart from showing this visual form of information, it also allows us to search through this information by topic or location, which makes it easier to see only what we want.

Tweetbeep (http://tweetbeep.com/)
In its own words, Tweetbeep is like Google Alerts for Twitter. It is a great service which allows us to monitor topics of interest on Twitter such as a brand name, a product, updates related to companies, and even links. For market monitoring purposes it's a great tool which can help us respond quickly to topics of interest.

Twiangulate (http://twiangulate.com/search)
Twiangulate is a great tool which allows us to perform Twitter triangulations. Using it we can find the common people who are followers of, and are followed by, two different Twitter users. Similarly, it also provides the feature to compare the reach of two users. It is a great tool to understand and compare the influence of different Twitter users.

SOURCE CODE SEARCH
Most of the search engines we have used only look at the text visible on the web page, but there are some search engines which index the source code present on the internet. These kinds of search engines can be very helpful when we are looking for a specific technology used over the internet, such as a content management system like WordPress.
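To get a feel for why page source is such a giveaway, the check these engines automate can be reproduced for a single page with a few lines of Python. The following is only a rough sketch: the URL and the signature strings (markers commonly associated with WordPress installs) are placeholder assumptions.

import urllib.request

url = "http://www.example.com/"                      # placeholder target page
signatures = ["wp-content", "wp-includes", 'content="WordPress']   # assumed WordPress markers

# Fetch the raw HTML and look for the signature strings in it.
html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", errors="ignore")
found = [s for s in signatures if s in html]
if found:
    print("Possible WordPress install; matched:", ", ".join(found))
else:
    print("No WordPress signature found in the source of", url)

A source code search engine essentially runs this kind of matching against millions of already-crawled pages instead of fetching them one at a time.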
The uses of such search engines range from search engine optimization and competitive analysis to keyword research for marketing, and are limited only by the creativity of the user. Due to storage and scalability issues there were earlier no service providers in this domain, but with technological advancements some options are opening up now; let's check out some of these.

NerdyData (http://nerdydata.com)
NerdyData is one of the first of its kind, a unique search engine which allows us to search the code of web pages. Using the platform is pretty simple: go to the URL https://search.nerdydata.com/, enter a keyword like "WordPress 3.7," and NerdyData will list the websites which contain that keyword in their source code. The results not only provide the URL of the website but also show the section of the code with the keyword highlighted, under the section Source Code Snippet. Apart from this there are various features such as contact author, fetch backlinks, and others which can be very helpful, but most of these are paid; still, the limited free usage of NerdyData is very useful and worth a try.

FIGURE 4.11 NerdyData code search results.

Ohloh code (https://code.ohloh.net)
Ohloh code is another great search engine for source code searching, but it's a bit different in that it searches open source code. What this means is that its source of information is the code residing in open spaces, such as Git repositories. It provides great options to filter the results based on definitions, (programming) languages, extensions, etc. through a bar on the left-hand side titled "Filter Code Results."

Searchcode (https://searchcode.com)
Similar to Ohloh, Searchcode also uses open source code repositories as its information source. The search filters provided by Searchcode are very helpful; some of them are repository, source, and language.

TECHNOLOGY INFORMATION
In this special section on search engines we will be working with some unique search engines which will help us gather information related to various different technologies and much more. In this segment we will be dealing heavily with IP addresses and related terms, so it is advised to go through the section "Defining the basic terms" in the first chapter.

Whois (http://whois.net/)
Whois is basically a service which allows us to get information about the registrant of an internet resource such as a domain name. Whois.net provides a platform using which we can perform a Whois search for a domain or IP address. A Whois record usually consists of registrar info; dates of registration and expiry; and registrant info such as name, e-mail address, etc.

Robtex (http://www.robtex.com)
Robtex is a great tool to find out information about internet resources such as an IP address, a domain name, an Autonomous System (AS) number, etc. The interface is pretty simple and straightforward. At the top left-hand corner is a search bar using which we can look up information. Searching for a domain gives us related information like IP address, route, AS number, location, etc. Similar information is provided for IP addresses, routes, etc.

W3dt (https://w3dt.net/)
W3dt is a great online resource to find out networking-related information. There are various sections which we can explore using this single platform. The first section is domain name system (DNS) tools, which allows us to perform various DNS-related queries such as DNS lookup, reverse DNS lookup, DNS server fingerprinting, etc.
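As a quick aside, the basic forward and reverse lookups just mentioned can also be reproduced locally. The following is a minimal sketch using only the Python standard library; the host name is a placeholder.

import socket

host = "www.example.com"                      # placeholder host name
ip = socket.gethostbyname(host)               # forward lookup: name -> IPv4 address
print(host, "resolves to", ip)

try:
    name, aliases, addresses = socket.gethostbyaddr(ip)   # reverse lookup: address -> name
    print(ip, "reverses to", name)
except socket.herror:
    print("No reverse (PTR) record found for", ip)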
The second section provides tools related to the network/internet, such as port scan, traceroute, MX record retriever, etc. The next section is web/HTTP, which consists of tools such as SSL certificate info, URL encode/decode, HTTP header retrieval, etc. Then comes the database lookups section, under which come MAC address lookup, Whois lookup, etc., and at the end there are some general and ping-related tools. All in all it is a great set of tools which allows us to perform a huge list of different useful functions under a single interface.

Shodan (http://www.shodanhq.com/)
So far we have used various types of search engines which help us explore the web in all different ways. What we haven't encountered till now is an internet search engine (remember the difference between the web and the internet explained in chapter 1) or, simply said, a computer search engine. Shodan is a computer search engine which scans the internet and grabs the service banner based on IP address and port. It allows us to search this information using IP addresses, country filters, and much more. Using it we can find out simple information, such as websites using a specific type of web server like Internet Information Services (IIS) or Apache, and also information which can be quite sensitive, such as IP cameras without authentication or SCADA systems exposed over the internet. Though the free version without registration provides very limited information, which can be mitigated a bit using a registered account, it is sufficient to understand the power of this unique search engine. We can also utilize the power of this tool through a browser add-on or through its application programming interface (API). Shodan has a very active development history and comes up with new features all the time, so we can expect much more from it in the future.

FIGURE 4.12 Shodan results for port 21.
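For readers who want to automate such queries, Shodan also exposes its search through an official Python library. The following is a minimal sketch, assuming a registered account and an API key; the key below is a placeholder, and free accounts may restrict how many results a query returns.

import shodan   # pip install shodan

api = shodan.Shodan("YOUR_API_KEY")            # placeholder API key from a registered account

results = api.search("port:21")                # the same kind of query as in the figure above
print("Total results:", results["total"])
for match in results["matches"][:5]:           # look at the first few banners only
    print(match["ip_str"], match.get("port"), match["data"][:60].strip())

Each match returned is essentially the service banner shown in the figure above, along with the IP address, port, and related metadata.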
WayBack Machine (http://archive.org/web/web.php)
The Internet Archive WayBack Machine is a great resource to look up how a website looked in the past. Simply type the website address into the search bar and it will return a timeline with the available snapshots highlighted on a calendar. Hovering over these highlighted dates on the calendar will present a link to the snapshot. This is a great tool to analyze how a website has evolved and thus monitor its past growth. It can also be helpful to retrieve information from a website which was available in the past but is not anymore.

REVERSE IMAGE SEARCH
We are all familiar with the phrase "A picture is worth a thousand words" and its veracity, and are also aware of platforms like Google Images (http://images.google.com), Flickr (https://www.flckr.com/), and Deviantart (http://www.deviantart.com/), which provide us images for the keywords provided. Usually when we need to look up some information, we have a keyword or a set of them in the form of text; following the same lead, the search engines we have dealt with till now take text as an input and get us the results. But in case we have an image and we want to see where it appears on the web, where do we go? This is where reverse image search engines come in, which take an image as input and look up its appearances on the web. Let's get familiar with some of these.

Google Images (http://images.google.com/)
We are all aware that Google allows us to search the web for images, but what many of us are unaware of is that it also allows us to perform a reverse image search. We simply need to go to the URL http://images.google.com, click on the camera icon, and provide the URL of an image on the web or upload a locally stored image file; we can also drag and drop an image file into the search bar, and voila, Google comes up with links to the pages containing that or similar images on the web.

FIGURE 4.13 Google reverse image search.

TinEye (https://www.tineye.com/)
TinEye is another reverse image search engine and has a huge database of images. Similar to Google Images, searching on TinEye is very simple: we can provide the URL to the image, upload it, or perform a drag and drop. TinEye also provides browser plugins for major browsers, which make the task much easier. Though the results of TinEye are not as comprehensive as those of Google Images, it provides a great platform for the task and must be tried.

ImageRaider (http://www.ImageRaider.com/)
Last but not the least in this list is ImageRaider. ImageRaider simply lists the results domain-wise. If a domain contains more than one occurrence of the image, then it also tells us that, and the links to those images are listed under the domain name.

Reverse image search can be very helpful to find out more about someone when we are hitting dead ends using conventional methods. As many people use the same profile picture on various different platforms, performing a reverse image search can lead us to other platforms where the user has created a profile and which hold previously undiscovered information.

MISCELLANEOUS
We have dealt with a huge list of search engines which are specialized in their domains and are popular within a community. In this section we will be dealing with some different types of search platforms which are lesser known but serve unique purposes and are very helpful in special cases.

DataMarket (http://datamarket.com/)
DataMarket is an open portal which consists of large data sets and presents the data in a great manner through visualizations. The simple search feature provides results for global topics with a list of different visualizations related to the topic; for example, searching for the keyword gold would provide results such as gold statistics, import/export of gold, and much more. The results page consists of a bar on the left which provides a list of filters using which the listed results can be narrowed down. It also allows us to upload our own data and create visualizations from it. Refer to the link http://datamarket.com/topic/list/ for a huge list of topics on which DataMarket provides information.

WolframAlpha (http://www.wolframalpha.com/)
In this chapter we have learned about various search engines which take some value as input and provide us with links which might contain the answer to the questions we are actually looking for, but what we are going to learn about now is not a search engine but a computational knowledge engine. What this means is that it takes our queries as input but does not provide the URLs to websites containing the information; instead it tries to understand our natural language queries and, based upon an organized data set, provides a factual answer to them in the form of text and sometimes an apposite visualization. Say, for example, we want to know the purpose of the .mil domain; we can simply type in the query "what is the purpose of the .mil internet domain?" and get the result. To get the words starting with a and ending with e, a query like "words starting with a and ending with e" would give us the results; we can even check the net worth of Warren Buffett with a query like "Warren Buffett net worth." For more examples of the queries from various domains that WolframAlpha is able to answer, check out the page http://www.wolframalpha.com/examples/.
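WolframAlpha can also be queried programmatically over HTTP. The sketch below uses its short-answers endpoint and is only an illustration; the AppID is a placeholder that has to be obtained from Wolfram's developer portal, and the available endpoints and quotas depend on the plan.

import urllib.parse
import urllib.request

app_id = "YOUR_APPID"                          # placeholder AppID from Wolfram's developer portal
question = "Warren Buffett net worth"          # the same kind of natural-language query as above

# Build the request against the short-answers endpoint and print the plain-text reply.
url = ("https://api.wolframalpha.com/v1/result?appid=" + app_id +
       "&i=" + urllib.parse.quote_plus(question))
print(urllib.request.urlopen(url, timeout=10).read().decode("utf-8"))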
FIGURE 4.14 WolframAlpha result.

Addictomatic (http://addictomatic.com)
Usually we visit various different platforms to search for information related to a topic, but Addictomatic aggregates various news and media sources to create a single dashboard for any topic of our interest. The aggregated content is displayed in various sections depending upon the source. It also allows us to move these sections around according to our preference, for better readability.

Carrot2 (http://search.carrot2.org/stable/search)
Carrot2 is a search results clustering engine. What this means is that it takes search results from other search engines and organizes them into topics using its search results clustering algorithms. Its unique capability to cluster the results into topics allows us to get a better understanding of them and of associated terms. These clusters are also represented in different interesting forms such as folders, circles, and FoamTree. Carrot2 can be used through its web interface, which can be accessed using the URL http://search.carrot2.org/, and also through a software application which can be downloaded from http://project.carrot2.org/download.html.

FIGURE 4.15 Carrot2 search result cluster.

Boardreader (http://boardreader.com/)
Boards and forums are rich sources of information, as a lot of interaction and Q&A goes on in places like these. Members of such platforms range from newbies to experts in the domain to which the forum is related. In places like these we can get answers to questions which are difficult to find elsewhere, as they purely comprise user-generated content. But how do we search them? Here is the answer: Boardreader. It allows us to search forums to get results which contain content with human interaction. It also displays a trend graph of the search query keyword to show the amount of activity related to it. The advanced search features provided by it, such as sort by relevance, occurrence between specific dates, domain-specific search, etc., add to its already incredible feature set.

Omgili (http://omgili.com/)
Similar to Boardreader, Omgili is also a forum and board search engine. It displays the results in the form of broad bars, and these bars contain information such as date, number of posts, author, etc., which can be helpful in estimating the relevance of a result. One such piece of information is Thread Info, which provides further information about a thread, such as forum name, number of authors, and replies to the thread, without actually visiting the original thread's forum page. It also allows us to filter the results based upon the timeline of their occurrence, such as past month, week, day, etc.
Truecaller (http://www.truecaller.com)
Almost everyone who uses or has ever used a smartphone is familiar with the concept of mobile applications, better known as apps, and many if not most of them have used the famous app called Truecaller, which helps to identify the person behind a phone number. What many of us are unaware of is that it can also be used through a web browser. Truecaller simply allows us to search using a phone number and provides the user's details from its crowdsourced database.

So we have discussed a huge list of various search engines under various categories which are not conventionally used, but as we have already seen, they are very useful in different scenarios. We are all addicted to Google for all our searching needs, and being one of the best in its domain it has served our purpose most of the time, but sometimes we need different and specific answers to our queries, and then we need these kinds of search engines. This list tries to cover most aspects of daily searching needs, yet surely there are other platforms which remain to be discovered and which should be used commonly to solve specific problems.

Other search engines worth trying:
• Meta search engine
  • Search (http://www.search.com/)
• People search
  • ZabaSearch (http://www.zabasearch.com/)
• Company search
  • Hoovers (http://www.hoovers.com/)
  • Kompass (http://kompass.com/)
• Semantic
  • Sensebot (http://www.sensebot.net/)
• Social media search
  • Whostalkin (http://www.whostalkin.com/)
• Twitter search
  • Mentionmapp (http://mentionmapp.com/)
  • SocialCollider (http://socialcollider.net/)
  • GeoChirp (http://www.geochirp.com/)
  • Twitterfall (http://beta.twitterfall.com/)
• Source code search
  • Meanpath (https://meanpath.com)
• Technology search
  • Netcraft (http://www.netcraft.com/)
  • Serversniff (http://serversniff.net)
• Reverse image search
  • NerdyData image search (https://search.nerdydata.com/images)
• Miscellaneous
  • Freebase (http://www.freebase.com/)

In this chapter we learned about various unconventional search engines, their features, and their functionalities, but what about the conventional search engines like Google, Bing, Yahoo, etc. that we use on a daily basis? Oh! We already know how to use them, or do we? The search engines we use on a daily basis have various advanced features of which many users are unaware. These features allow users to filter the results so that we can get more information and less noise. In the next chapter we will be dealing with conventional search engines and will learn how to use them effectively to perform better searches and get specific results.

CHAPTER 5
Advanced Web Searching

INFORMATION IN THIS CHAPTER
• Search Engines
• Conventional Search Engines
• Advanced Search Operators of various Search Engines
• Examples and Usage

INTRODUCTION
In the last chapter we dealt with some special platforms which allowed us to perform domain-specific searches; now let's go into the depths of the conventional search engines which we use on a daily basis and check out how we can utilize them more efficiently. In this chapter, basically, we will understand the working and advanced search features of some of the well-known search engines and see what functionalities and filters they provide to serve us better.
So we already have a basic idea about what a search engine is and how it crawls over the web to collect information, which is further indexed to provide us with search results. Let's revise it once and understand it in more depth. Web pages as we see them are not actually what they look like. Web pages basically contain HyperText Markup Language (HTML) code and, most of the time, some JavaScript and other scripting languages. HTML is basically a markup language and uses tags to structure the information; for example, the tag <h1> is used to create a heading.
When we receive this HTML code from the server, our browser interprets it and displays the web page in its rendered form. To check the client-side source code of a web page, simply press Ctrl+U in the browser with the web page open. Once the web crawler of a search engine reaches a web page, it goes through its HTML code. Most of the time these pages also contain links to other pages, which are used by the crawlers to move further in their quest to collect data. The content crawled by the web crawler is then stored and indexed by the search engine based on a variety of factors. The pages are ranked based upon their structure (as defined in HTML), the keywords used, the interlinking of the pages, the media present on the page, and many other details. Once a page has been crawled and indexed, it is ready to be presented to the user of the search engine depending upon the query.

Once a page has been crawled, the job of the crawler does not finish for that page. The crawler is scheduled to perform the complete process again after a specific time, as the content of the page might change. So this process keeps on going, and as new pages are linked they are also crawled and indexed. The search engine business is a huge industry in itself which helps us in our web exploration, but there is another industry which depends directly on search engines, and that is search engine optimization (SEO). SEO is basically about increasing the rank of a website/web page, or in other words bringing it up to the starting result pages of a search engine. The motivation behind this is that it will increase the visibility of that page/site and hence it will get more traffic, which can be helpful from a commercial or personal point of view.
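To make the crawling process described above a little more concrete, here is a toy crawler sketch in Python. The seed URL is a placeholder, the limit is deliberately tiny, and unlike a real crawler it ignores robots.txt, politeness delays, and indexing altogether.

import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkParser(HTMLParser):
    # Collects the href value of every <a> tag encountered in the page.
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, limit=10):
    queue, seen = [seed], set()
    while queue and len(seen) < limit:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", errors="ignore")
        except Exception:
            continue                      # unreachable pages are simply skipped
        parser = LinkParser()
        parser.feed(html)
        # Newly discovered links feed the queue, exactly as described above.
        queue.extend(urljoin(url, link) for link in parser.links)
        print("crawled:", url)
    return seen

crawl("http://www.example.com/")          # placeholder seed URL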
Now that we have a good understanding of search engines and how they operate, let's move ahead and see how we can better use some of the conventional ones.

GOOGLE
Google is one of the most widely used search engines and is the starting point for web exploration for most of us. Initially Google search was accessible through a very simple interface and provided limited information. Apart from the search box there were some special search links, links about the company, and a subscription box where we could enter our email to get updates. There were no ads, no different language options, no login, etc. It's not only the look and feel of the interface that has changed over the years but also the functionalities. It has evolved from providing simple web links to pages containing relevant information into a whole bunch of related tools which not only allow us to search different media types and categories but also narrow down these results using various filters. Today there are various categories of search results such as images, news, maps, videos, and much more. This plethora of functionalities provided by Google has certainly made our lives much easier and made the act of finding information on the web a piece of cake. Still, sometimes we face difficulty in finding the exact information we are looking for, and the main reason behind it is not the lack of information but, on the contrary, the abundance of it.

Let's move on to see how we perform a Google search and how to improve it. Whenever we need to search for something in Google we simply think of some of the keywords associated with it, type them into the search bar, and hit Enter. Based upon its indexing, Google simply provides us with the associated resources. Now if we want to get better results, or filter the existing results based upon various factors, we need to use Google's advanced search operators. Let's have a look at these operators and their usage.

site:
It fetches results only for the site provided. It is very useful when we want to limit our search to a specific domain. It can be used with another keyword, and Google will bring back related pages from the site specified. From an information security perspective it is very useful for finding different subdomains related to a particular domain.
Examples: site:gov, site:house.gov

FIGURE 5.1 Google "site" operator usage.

inurl:
This operator allows looking for keywords in the uniform resource locator (URL) of the site. It is useful for finding pages which follow a usual keyword for specific pages, such as contact us. Generally, as the URL contains some keywords associated with the body contents, it will help us find the equivalent page for the keyword we are searching for.
Example: inurl:hack

allinurl:
Similar to "inurl," this operator allows looking for multiple keywords in the URL. So we can search for multiple keywords in the URL of a page. This also enhances the chances of getting quality content for what we are looking for.
Example: allinurl:hack security

intext:
This operator makes sure that the keyword specified is present in the text of the page. Sometimes, just for the sake of SEO, we find pages that only contain keywords to enhance the page rank but not the associated content. In that case we can use this query parameter to get the appropriate content from a page for the keyword we are looking for.
Example: intext:hack

allintext:
Similar to "intext," this operator allows us to look for multiple keywords in the text. As we discussed earlier, searching for multiple keywords always enhances the content quality of the result page.
Example: allintext:data marketing

intitle:
It allows us to restrict the results by the keywords present in the title of the pages (title tag: XYZ). It can be helpful for identifying pages which follow a convention for the title, such as directory listings with the keywords "index of"; also, most sites include keywords in the title to improve page rank, so this query parameter always helps when searching for a particular keyword.
Example: intitle:blueocean

allintitle:
This is the multiple-keyword counterpart of the "intitle" operator.
Example: allintitle:blueocean market

filetype:
This operator is used to find files of a specific kind. It supports multiple file types such as pdf, swf, kml, doc, svg, txt, etc. This operator comes in handy when we are only looking for a specific type of file on a specific domain.
Example: filetype:pdf, site:xyz.com filetype:doc

ext:
The operator ext simply stands for extension, and it works similarly to the filetype operator.
Example: ext:pdf

define:
This operator is used to find the meaning of the keyword supplied. Google returns the dictionary meaning and synonyms for the keyword.
Example: define:data

AROUND
This operator is helpful when we are looking for results which contain two different keywords, but in close association. It allows us to restrict the maximum number of words between the two keywords in the search results.
Example: A AROUND(6) Z

AND
A simple Boolean operator which makes sure the keywords on both sides are present in the search results.
Example: data AND market

OR
Another Boolean operator, which provides search results that contain either of the keywords present on each side of the operator.
Example: data OR intelligence

NOT
Yet another Boolean operator, which excludes from the search results those that contain the keyword following it.
Example: lotus NOT flower

""
This operator is useful when we need to search for results which contain the provided keywords in the exact sequence. For example, we can search for pages which contain quotes or some lyrics.
Example: "time is precious"

-
This operator excludes from the search results those which contain the keyword following it (no space).
Example: lotus -flower

*
This wildcard operator is used as a generic placeholder for an unknown term. We can use this to find quotes which we only partially remember, or to check variants of one.
Example: "* is precious"

..
This special operator is used to provide a number range. It is quite useful for enforcing a price range, a time range (dates), etc.
Example: japan volcano 1990..2000

info:
The info operator provides the information Google has on a specific domain. Links to different types of information are present in the results, such as the cache, similar websites, etc.
Example: info:elsevier.com

related:
This operator is used to find other web pages similar to the provided domain. It is very helpful when we are looking for websites which provide services similar to a given website, or to find its competitors.
Example: related:elsevier.com

cache:
This operator redirects to the latest cache of the page that Google has crawled. In case we don't get a result for a website which was accessible earlier, this is a good option to try.
Example: cache:elsevier.com

An advanced Google search can also be performed using the page http://www.google.com/advanced_search, which allows us to perform a restricted search without using the operators mentioned above.

FIGURE 5.2 Google advanced search page.

Apart from the operators, Google also provides some operations which allow us to check information about current events and perform some other useful things. Some examples are:

time
Simply entering this keyword displays the current time of the location we are residing in. We can also use the name of a region to get its current time.
Example: time france

weather
This keyword shows the current weather conditions of our location. Similar to the "time" keyword, we can also use it to get the weather conditions of a different region.
Example: weather sweden

Calculator
Google also solves mathematical equations and provides a calculator.
Example: 39*(9823-312)+44/3

Convertor
Google can be used to perform conversions for different types of units, like measurement units, currency, time, etc.
Example: 6 feet in meters

This is not all; sometimes Google also shows relevant information related to global events as and when they happen, for example the FIFA World Cup. Apart from searching the web in general, Google also allows us to search specific categories such as images, news, videos, etc. All these categories, including web, have some common and some category-specific search filters of their own. These options can simply be accessed by clicking on the "Search tools" tab just below the search bar. We find options which allow us to restrict the results based upon the country and time of publication for web results; for images there are options like the color of the image, its type, usage rights, etc.; and there are similarly relevant filters for the other categories.
These options can be very helpful in finding the required information in a category, as they are designed according to that specific category. For example, if we are looking for an old photograph of something, it is a good idea to see only the results which are black and white.

The operators we discussed are certainly very useful for anyone who needs to find some information on the web, but the InfoSec community has certainly taken them to the next level. These simple and innocent operators are widely used in the cyber security industry to find and demonstrate how, without even touching the target system, critical and compromising information can be retrieved. This technique of using Google search engine operators to find such information is termed "Google Hacking." When it comes to Google Hacking, one name that jumps to mind is Johnny Long. Johnny was an early adopter and a pioneer in the field of creating Google queries which could provide sensitive information related to a target. These queries are widely popular under the name Google Dorks.

Let's understand how this technique works. We saw a number of operators which can narrow down search results to a specific domain, filetype, title value, etc. In Google Hacking our motive is to find sensitive information related to the target; for this, people have come up with various signatures for different files and pages which are known to contain such information. For example, let's say we know the name of a sensitive directory which should not be directly accessible to any user publicly, but remains public by default after the installation of the related application. If we want to find the sites which have not changed the accessibility of this directory, we can simply use the query "inurl:/sensitive_directory_name/" and we will get a bunch of websites which haven't changed the setting. If we want to further narrow it down to a specific website, we can combine the query with the operator "site," as in "site:targetdomain.com inurl:/sensitive_directory_name/." Similarly, we can find sensitive files existing on a website by using the operators "site" and "filetype" in combination.

Let's take another example of Google Hacking which can help us discover a high-severity vulnerability on a website. Many developers use flash to make websites more interactive and visually appealing. Small web format (SWF) is a flash file format used to create such multimedia. There are many SWF players known to be vulnerable to cross-site scripting (XSS), which could lead to an account compromise. If we want to find out whether the target domain is vulnerable to such an attack, we can simply put in the query "site:targetdomain.com filetype:swf SWFPlayer_signature_keyword" and test the resulting pages using publicly available payloads to verify. There are a huge number of signatures for finding various types of pages, such as sensitive directories, web server identification, files containing usernames/passwords, admin login pages, and much more. The Google Hacking Database created by Johnny Long can be found at http://www.hackersforcharity.org/ghdb/; though it is no longer updated, it is a great place to understand and learn how we can use Google to find sensitive information. A regularly updated version can be found at http://www.exploit-db.com/google-dorks/.

FIGURE 5.3 Google hacking database- www.exploit-db.com/google-dorks/.
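When a number of such signatures have to be tried against a single target, it can help to generate the queries programmatically instead of typing them one by one. The following sketch only builds the search URLs; the target domain and the signature list are placeholder assumptions, and the resulting queries still have to be run and reviewed manually, and only against systems we are authorized to test.

from urllib.parse import quote_plus

target = "targetdomain.com"                 # placeholder target domain
signatures = [                              # placeholder dork signatures
    'intitle:"index of"',
    "inurl:admin",
    "filetype:swf",
    'filetype:xls "password"',
]

# Combine each signature with the site: operator and print a ready-made search URL.
for sig in signatures:
    query = "site:{} {}".format(target, sig)
    print("https://www.google.com/search?q=" + quote_plus(query))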
BING
Microsoft has been providing search engine solutions for a long time, under different names. Bing is the latest and most feature-rich search engine in this series. Unlike its predecessors, Bing provides a cleaner and simpler interface. As Microsoft covers a major part of the operating system market, the general perception is that Bing is just another side product from a technology giant, and hence most users do not take it seriously. But that is wrong. Like all search engines, Bing also has some unique features that will force you to use it when you need them, and those features definitely leave their own mark on how we search. We will discuss not only the special features but also the general operators, which will allow us to understand the search engine and its functionalities.

+
This operator works quite similarly in all the search engines. It allows a user to forcefully include single or multiple keywords in a search query. Bing will make sure the keywords coming after the + operator are present in the result pages.
Example: power +search

-
This operator is also known as the NOT operator. It is used to exclude something from a set of things, such as excluding a cuisine.
Example: Italian food -pizza
Here Bing will display all the Italian foods available, but not pizza. We can write this in another form which fetches the same result, as in the example below.
Example: Italian food NOT pizza

""
This is also the same in most search engines. It is used to search for the exact phrase placed inside the double quotation marks.
Example: "How to do Power Searching?"

|
This is also known as the OR operator, mostly used for getting results for one of two keywords, or one of many keywords joined with this operator.
Example: ios | android
ios OR android

&
This operator is also known as the AND operator. This is the search operator used by default: if we do nothing and add multiple keywords, then Bing will do an AND search in the backend and give us the result.
Example: power AND search
power & search
As this is the default search, it's very important to keep in mind that unless we write OR and NOT in capitals, Bing won't understand them as operators.

()
This can be called the group operator. As parentheses have the top priority in the order of precedence, we can put lower-priority operators such as OR inside them and create a group query to execute the lower-priority operators first.
Example: android phone AND (nexus OR xperia)

site:
This operator helps to search for a particular keyword within a specific website. It works much the same in most search engines.
Example: site:owasp.org clickjacking

filetype:
This allows a user to search for data in a specific type of file. Bing supports all but a few file types; mostly, those supported by Google are also supported by Bing.
Example: hack filetype:pdf

ip:
This unique operator provided by Bing allows us to search web pages based upon an IP address. Using it we can perform a reverse IP search, which means it allows us to look for the pages hosted on the specified IP.
Example: ip:176.65.66.66

Grouping of Bing operators is supported in the following order of precedence:
()
""
NOT/-
AND/&
OR/|

feed:
Yet another unique operator provided by Bing is feed, which allows us to look for web feed pages containing the provided keyword.

One other feature that Bing provides is social search, using the page https://www.bing.com/explore/social.
It allows us to connect our social network accounts with Bing and perform searches within them.

FIGURE 5.4 Bing "ip" search.

FIGURE 5.5 Bing social search.

YAHOO
Yahoo is one of the oldest players in the search engine arena and has been quite popular. The search page for Yahoo also has a lot of content, such as news, trending topics, weather, financial information, and much more. Earlier, Yahoo utilized third-party services to power its search capabilities; later it shifted to become independent, and it has once again joined forces with Bing for its search services. Though there is not too much that Yahoo offers in terms of advanced searching as compared to other search engines, the operators it provides are worth trying and comparing with others. Let's see some of the operators that can be useful.

+
This operator is used to make sure the search results contain the keyword following it.
Example: +data

-
Opposite to the "+" operator, this operator is used to exclude a specific keyword from the search results.
Example: -info

OR
This operator allows us to get results for either of the keywords supplied.
Example: data OR info

site:
This operator allows restricting the results only to the site provided. We will only get to see links from the specified website. There are two other operators which work like this one but do not provide results as accurate or as in-depth: domain and hostname. Their usage is similar to that of the "site" operator.
Example: site:elsevier.com

link:
This is another interesting operator, which allows us to look up web pages that link to the specific web page provided. While using this operator, do keep in mind to provide the URL with the protocol (http:// or https://).
Example: link:http://www.elsevier.com/

define:
We can use this operator to find the dictionary meaning of a word.
Example: define:data

intitle:
The "intitle" operator is used to get results which contain the specified keyword in their title tag.
Example: intitle:data

So these are the operators which Yahoo supports. Apart from these, we can access the Yahoo advanced search page at http://search.yahoo.com/search/options?fr=fp-top&p=, which allows us to achieve well-filtered search results. One other thing that Yahoo offers is an advanced news search, which can be performed using the page http://news.search.yahoo.com/advanced.

FIGURE 5.6 Yahoo "link" search.

FIGURE 5.7 Yahoo advanced search page.

YANDEX
Yandex is a Russian search engine and is not very popular outside the country, but it is one of the most powerful search engines available. Like Google, Bing, and Yahoo, it has its own unique keywords and indexed data. Yandex is the most popular and widely used search engine in Russia, and it is the fourth largest search engine in the world. Apart from Russia, it is also used in countries like Ukraine, Kazakhstan, Turkey, and Belarus. It is also a much underrated search engine, as its use is mostly limited to specific countries, but in the security community we see it otherwise. Most people are either happy with their conventional search engine or think that all the internet's information is available in the search engine they are using. But the fact is that search engines like Yandex also have many unique features that can provide us with far more efficient results as compared to other search engines. Here we will discuss how Yandex can be a game changer in searching for data on the internet and how to use it efficiently.
As discussed, like other search engines, Yandex has its own operators, such as lang, parentheses, Boolean operators, and so on. Let's get familiar with these operators and their usage.

+
This operator works much the same across search engines. For Yandex as well, the + operator is used to require a keyword in the search result pages. The keyword added after the + operator is the primary keyword in the search query; the results fetched by the search engine must contain that keyword.
Let’s say we want to setup a radius or boundary for a keyword with respect to another; in that case we have to specify that keyword in second position. Example: power /(-3 +4) searching Here we are setting up a radius for searching with respect to power. This means that the page is displayed in results shown only if either “searching” will be found within 3 words before or after “power” within 4 word count. This can be helpful when we are searching for two people’s names. In that case we cannot guess that which name will come fist and which name will come next so it’s better to create a radius for those two names, and the query will serve our purpose. As we discussed a lot about word-based keyword search, now let’s put some light on sentence-based keyword search. For sentence based keyword search we can use Yandex && operator with this number operator. Example: power && /4 searching In this case we can get result pages containing these two keywords with in 4 sentence difference irrespective of the position of the keyword. That means either “power” may come fist and “searching” after that or vice versa. ! This operator does something special. And this is one of my favorite keyword. It gives a user freedom to only search a specifi keyword without similar word search or extended search and all. What exactly happens in general search is that if you Yandex 93 search for a keyword, let’s say AND, you will get some results showing only AND and then the results will extend to ANDroid or AMD and so on. If we want to get only result for AND keyword; use this operator. Example: !and This will restrict the search engine to provide results only showing pages which contains this particular keyword AND. !! It can be used to search the dictionary form of the keyword. Example: !!and () When we want to create a complex query with different keywords and operators we can use these brackets to group them. As we already used these brackets above, now we will see some other example to understand the true power of this. FIGURE 5.8 Yandex complex query. Example: power && (+searching | !search) Here the query will search for both sets of keywords fist power searching and power search but not both in same result. “” Now it’s about a keyword let’s say we want to search a particular string or set of keywords then what to do? Here this operator “” comes for rescue. It is quite similar 94 CHAPTER 5 Advanced Web Searching as Google’s “”. This will allow a user to search for exact keywords or string which is put inside the double quotes. Example: “What is OSINT?” It will search for exact string and if available will give us the result accordingly. * This operator can be refereed as wildcard operator. The use of this operator is quite same in most of the search engines. This operator is used to fil the missing keyword or suggest relevant keywords according to the other keywords used in the search query. Example: osint is * of technology It will search for auto fil the space where * is used to complete the query with relevant keywords. In this case that can be ocean or treasure or anything. We can also use this operator with double quote to get more effiient and accurate result. Example: “OSINT is * of technology” | This is also quite similar to OR operator of Google. It allows us to go for different keywords where we want results for any of them. In-real time scenario we can search for options using this operator. 
Let’s say I want to buy a laptop and I have different options: in that case this operator will come to picture. Example: dell | toshiba | macbook Here we can get result for any of these three options but not all in one result. << This is an unusual operator known as non-ranking “AND.” It is basically used to add additional keywords to the list of keywords without impacting the ranking of the website on result. We might not get to know what exactly it does by just going through its defiitions. So in simple words it can be used to tag additional keywords to the query list without impacting the page rankings. Example: power searching << OSINT It can be used to additionally search for OSINT along with the other two keywords without impacting the page ranking in the result page. title: This is quite equivalent to the “intitle.” It can be used to search the pages with the keyword (s) specifid after title query parameter. Example: title:osint Yandex 95 This will provide pages that contain OSINT in the title of the web page. Similarly we can use this title query parameter to search for more than one keyword. Example: title:(power searching) url: This “url” search query parameter is also an add-on. It searches for the exact URL provided by the user in Yandex database. Example: url:http://attacker.in Here Yandex will provide a result if and only if the URL has been crawled and indexed in its database. inurl: It can be used to search for keywords present in a URL or in other words for URL fragment search. This “inurl” query parameter works quite similar in all the search engines. Example: inurl:osint It will search for all the URLs that contain osint keyword no matter what the position of the keyword is. mime:fietype This query parameter is quite similar to “fietype” query parameter of Google. This helps a user to search for a particular fie type. Example: osint mime:pdf FIGURE 5.9 Yandex fie search. 96 CHAPTER 5 Advanced Web Searching It will provide us all the PDF links that contains osint keyword. The fie types supported by Yandex mime are PDF, RTF, SWF, DOC, XLS, PPT, DOCX, PPTX, XLSX, ODT, ODS, ODP, ODG host: It can be used to search all the available hosts. This can be used by the penetration testers mostly. Example: host:owasp.org rhost: It is quite similar to host but “rhost” searches for reverse hosts. This can also be used by the penetration testers to get all the reverse host details. It can be used in two ways. One is for subdomains by using the wildcard operator * at the end or another without that. Example: rhost:org.owasp.* rhost:org.owasp.www site: This operator is like the best friend of a penetration tester or hacker. This is available in most of the search engines. It provides all the details of subdomains of the provided URL. For penetration testers or hackers fiding the right place to search for vulnerability is most important. As in most cases the main sites are much secured as compared to the subdomains, if any operator helps to simplify the process by providing details of the subdomains to any hacker or penetration tester then half work is done. So the importance of this operator is defiitely felt in security industry. Example: site:http://www.owasp.org It will provide all the available subdomains of the domain owasp.com as well as all the pages. date: This query can be used to either limit the search data to a specifi date or to specifi period by a little enhancement in the query. 
Example: date:201408* In this case, format of date used is YYYYMMDD, but in case of the DD we used wildcard operator “*” so we will get results limited to August 2014. Yandex 97 We can also limit the same to a particular date of the August 2014 by changing a bit in the query. date:20140808 It will only show results belong to that date. We can also use “=” in place of “:” and it will still work the same. So the above query can be changed to date=201408* date=20140808 As we discussed earlier we can also limit the search results to a particular time period. Let’s say we want to search something from a particular date to till date. In that case we can use date=>20140808 It will provide results from 8th August 2014 to till date, but what if we want to limit both the start date and the end date. In that case also Yandex provide us a provision of providing range. date=20140808..20140810 Here we will get the results form date 8th August 2014 to 10th August 2014. domain: It can be used to specify the search results based of top level domains (TLDs). Mostly this type of the domain search was done to get results from country-specifi domains. Let’s say we wanted to get the list of CERT-empanelled security service providing company names from different countries. In that case we can search for the countryspecifi domain extension let’s say we want to get these details for New Zealand then its TLD is nz. So we can craft a query like Example: “cert empanelled company” domain:nz lang: It can be used to search pages written in specifi languages. Yandex supports some specifi languages such as RU: Russian UK: Ukrainian BE: Belorussian EN: English FR: French DE: German KK: Kazakh TT: Tatar TR: Turkish 98 CHAPTER 5 Advanced Web Searching Though we can always use Google translator to translate the page from any languages to English or any other languages, it’s an added feature provided by Yandex to fulfil minimum requirements of the regions where Yandex is used popularly. So to search a page we need to provide the short form of the languages. Example: power searching lang:en It will search for the pages in English that contains power searching. cat: It is also something unique provided by Yandex. Cat stands for category. Yandex categorizes different things based on region id or topic id. Using cat we can search for a result based on region or topic assigned in Yandex database. The details of Regional codes: http://search.yaca.yandex.ru/geo.c2n. The details of Topic codes: http://search.yaca.yandex.ru/cat.c2n. Though the pages contains data in Russian language, we can always use Google translate to serve this purpose. As we discussed in the beginning that Yandex is an underrated search engine some of its cool features are defiitely going to put a mark on our life once we go through this chapter. One of such feature is its advanced search GUI. There are lazy people like me who want everything in GUI so that they just have to customize everything by providing limited details and selecting some checkbox or radio buttons. Yandex provides that in the below link http://www.yandex.com/search/advanced?&lr=10558 Here we have to just select what we want and most importantly it covers most of the operators we discussed above. So go to the page, select what you want, and search effiiently using GUI. Defiitely after going through all these operators we can easily feel the impact of the advance search or we can also use the term power search for that. 
The advance search facilitates a user with faster, effiient, and reliable data in the result. It always reduces our manual efforts to get the desired data. And the content quality is also better in advance search as we limit the search to what we are actually looking for. It can be either country-specifi domain search, a particular fie type, or content from a specifi date. These things cannot be done easily with simple keyword search. We are in an age where information is everything. Then the reliability factor comes in to picture and if we want bulk of reliable information from the net in very less time span then we need to focus on the advance search. We can use any conventional search engine of our choice. Most of the search engines have quite similar operators to serve the purpose but there are some special features present; so look for those special features and use different search engines for different customized advance search. Yandex 99 So we learned about various search engines and their operators and how to utilize these operators to search better and get precise results. For some operators we say their individual operations and how they can help to narrow down the results and for some we saw how they can be used with other operators to generate a great query which directly gets us to what we want. Though there are some operators for different search engines which work more or less in the same fashion yet as the crawling and indexing techniques of different platforms are different, it is worthwhile to check which one of them provides better results depending upon our requirements. One thing that we need to keep in mind is that the search providers keep on deprecating the operators or features which are not used frequently enough and also some functionalities are not available in some regions. We saw how easily we can get the results that we actually want with the use of some small but effective techniques. The impact of these techniques is not just limited to fiding out the links to websites, but if used creatively they can be implemented in various filds. Apart from fiding the information on the web, which certainly is useful for everyone, these techniques can be used to fid out details which are profession specifi. For example a marketing professional can scale the size of the website of competitor using the operator “site,” or a sales professional can fid out emails for a company using the wildcard operator “*@randomcompany.com.” We also saw how search engine dorks are used by cyber security professionals to fid out sensitive and compromising information just by using some simple keywords and operators. The takeaway here is not just to learn about the operators but also about how we can use them creatively in our profession. We have covered a lot about how to perform searching using different searching platforms in this and some previous chapters. Till now we have mainly focused on browser-based applications or we can say web applications. In the next chapter we will be moving on and learn about various tools which need to be installed as applications and provide us various features for extracting data related to various filds, using various methods. Hacking Web Intelligence. http://dx.doi.org/10.1016/B978-0-12-801867-5.00006-9 101 Copyright © 2015 Elsevier Inc. All rights reserved. 
CHAPTER 6
OSINT Tools and Techniques

INFORMATION IN THIS CHAPTER

• OSINT Tools
• Geolocation
• Information Harvesting
• Shodan
• Search Diggity
• Recon-ng
• Yahoo Pipes
• Maltego

INTRODUCTION

In the previous chapters we learned about the basics of the internet and effective ways to search it. We went from searching social media in depth to unconventional search engines, and further learned effective techniques for using regular search engines. In this chapter we will move a step further and discuss some of the automated tools and web-based services frequently used to perform reconnaissance by professionals of various intelligence-related domains, especially information security. We will start with installation, move on to understanding each tool's interface, and then learn about its functionality and usage. Some of these tools provide a rich graphical user interface (GUI) and some are command line based (CLI), but don't judge them by their interface; judge them by their functionality and relevance to our field of work.

Before moving any further we should install the dependencies for these tools so that we don't face any issues during their installation and usage. The packages we need are

• Java (latest version)
• Python 2.7
• Microsoft .NET Framework v4

We simply need to download the relevant package for our system configuration and we are good to go.

CREEPY

Most of us are addicted to social networks, and image sharing is one of the most used features of these platforms. But sometimes when we share these pictures it's not just the image we are sharing; we may also be sharing the exact location where the picture was taken. Creepy is a Python application which can extract this information and display the geolocation on a map. Currently Creepy supports searches on Twitter, Flickr, and Instagram. It extracts geolocation from the EXIF information stored in images, from location data available through the platforms' application programming interfaces (APIs), and through some other techniques. It can be downloaded from http://ilektrojohn.github.io/creepy/. We simply need to select the version for our platform and install it.

The next phase after installing Creepy is to configure the plugins available in it, for which we simply click the Plugin Configuration button under the Edit tab. Here we can select the plugins and configure each one using its individual configuration wizard. Once the configuration is done we can check whether it is working properly using the Test Plugin Configuration button.

FIGURE 6.1 Configure Creepy.

After the configuration phase is done, we can start a new project by clicking on the person icon in the top bar. Here we can name the project and search for people on the different portals. From the search results we can select the person of interest, include him/her in the target list, and finish the wizard. After this our project will be displayed in the project bar on the right-hand side.

FIGURE 6.2 Search users.

Now we simply need to select our project and click on the target icon, or right-click on the project and click Analyze Current Project. Creepy will then start the analysis, which will take some time. Once the analysis is complete, Creepy will display the results on the map.

FIGURE 6.3 Creepy results.

Now we can see the results: the map is populated with markers according to the identified geolocations.
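To make the EXIF part of this concrete, the following is a minimal Python sketch of how location data can be read from an image, assuming the Pillow library is installed and a local file named photo.jpg; it only illustrates the technique Creepy automates and is not Creepy's own code.

from PIL import Image                     # Pillow imaging library
from PIL.ExifTags import TAGS, GPSTAGS

def rational(value):
    # Older Pillow versions return (numerator, denominator) pairs,
    # newer ones return rational objects that float() accepts
    return value[0] / float(value[1]) if isinstance(value, tuple) else float(value)

def gps_from_image(path):
    # Read the raw EXIF block; many images carry none at all
    exif = Image.open(path)._getexif()
    if not exif:
        return None
    # Translate numeric tag IDs into readable names and pull out the GPS block
    named = {TAGS.get(tag, tag): val for tag, val in exif.items()}
    gps_raw = named.get("GPSInfo")
    if not gps_raw:
        return None
    gps = {GPSTAGS.get(tag, tag): val for tag, val in gps_raw.items()}

    def to_decimal(dms, ref):
        # EXIF stores coordinates as degrees, minutes, seconds
        degrees, minutes, seconds = (rational(part) for part in dms)
        decimal = degrees + minutes / 60.0 + seconds / 3600.0
        return -decimal if ref in ("S", "W") else decimal

    latitude = to_decimal(gps["GPSLatitude"], gps["GPSLatitudeRef"])
    longitude = to_decimal(gps["GPSLongitude"], gps["GPSLongitudeRef"])
    return latitude, longitude

print(gps_from_image("photo.jpg"))

Tools like Creepy combine this kind of extraction with the location data exposed by the social networks' own APIs and plot the coordinates on a map.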
Creepy further allows us to narrow down these results using various filters. Clicking on the calendar button lets us filter the results by a time period. We can also filter the results by area, which we can define as a radius in kilometers from a point of our choice. We can also view the results as a heat map instead of markers. The negative sign (−) at the end can be used to remove all the filters imposed on the results.

FIGURE 6.4 Applying filters.

The results we get from Creepy can also be downloaded as a CSV file or as KML, which can be used to display the markers in another map. Creepy can be used in the information-gathering phase of a pentest (penetration test) and also as a proof-of-concept tool to show users what information they are revealing about themselves.

FIGURE 6.5 Download Creepy results.

THEHARVESTER

TheHarvester is an open source intelligence (OSINT) tool for obtaining e-mail addresses, employee names, open ports, subdomains, host banners, etc., from public sources such as search engines like Google and Bing and other sites such as LinkedIn. It is a simple Python tool which is easy to use and contains different information-gathering functions. Being a Python tool, it naturally requires Python to be installed on our system. It was created by Christian Martorella and is one of the simplest, most popular, and most widely used information-gathering tools. TheHarvester can be found here: http://www.edge-security.com/theharvester.php

Generally we input a domain name or company name to collect relevant information such as e-mail addresses, subdomains, or the other details mentioned above, but we can also use keywords to collect related information. We can also specify which particular public source we want to use for the information gathering. There are lots of public sources that theHarvester uses, but before moving to those let's understand how to use it.

EX: theharvester -d example.com -l 500 -b google

-d = domain name or company name
-l = number of results to work with
-b = data source; in the above command it is Google, but we can also use LinkedIn, or "all" to query all the available public sources

FIGURE 6.6 TheHarvester in action.

Apart from the ones mentioned above, theHarvester has other options we can specify, such as:

-s = start with a particular result number (the default value is 0)
-v = get virtual hosts by verifying hostnames via DNS resolution
-f = save the data (available formats are HTML and XML)
-n = perform a DNS resolve query for all the discovered ranges
-c = perform a DNS brute force for all domain names
-t = perform a DNS TLD expansion discovery
-e = use a specific DNS server
-l = limit the number of results to work with
-h = use the Shodan database to query discovered hosts

FIGURE 6.7 TheHarvester HTML results.

The sources it uses are Google, Google Profiles, Bing, pretty good privacy (PGP) key servers, LinkedIn, Jigsaw, Shodan, Yandex, name servers, people123, and Exalead.
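Before looking at these sources individually, it is worth seeing how the options combine in practice. A typical invocation might look like the following; the domain and output name are placeholders, and the exact flags can vary between theHarvester versions:

theharvester -d example.com -b all -l 200 -f results

This queries all the available public sources, limits the number of results worked with to 200, and saves the output to HTML/XML files named results, which is handy for reporting.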
Google, Yandex, Bing, and Exalead are search engines used in the backend as sources, while Shodan is also a search engine but not a conventional one; we discussed it briefly earlier and will discuss it in detail later in this chapter. PGP key servers are used for data security and are also a good source for collecting e-mail details. The people123 source is for searching for a particular person, and Jigsaw is a cloud-based solution for lead generation and other sales work. TheHarvester collects different information from different sources: for e-mail harvesting it uses Google, Bing, PGP servers, and sometimes Exalead, running their specific queries in the background to get the desired result. Similarly, for subdomains or host names it uses Google, Bing, Yandex, Exalead, and PGP servers. Finally, for the list of employee names it uses LinkedIn, Google Profiles, people123, and Jigsaw as its main sources. This is how theHarvester harvests all the information and gives us the desired results for our query, so craft your query wisely to harvest all the required information.

SHODAN

We previously discussed Shodan briefly in Chapter 4, but this unique search engine deserves much more than a paragraph on its usage and impact. As discussed earlier, Shodan is a computer search engine. The internet consists of many different types of devices connected online and available publicly. Most of these devices have a banner, which they send as a response to an application request sent by a client. Many if not most of these banners contain information that can be called sensitive in nature, such as server version, device type, authentication mode, etc. Shodan allows us to search for such devices over the internet and also provides filters to narrow down the results.

It is highly recommended to create an account to utilize this great tool, as it removes some of the restrictions imposed on free usage. After logging into the application we simply go to the dashboard at http://www.shodanhq.com/home. Here we can see some of the recent searches as well as popular searches made on the platform. This page also shows a quick reference to the filters we can use. Moving on, more popular searches are listed at http://www.shodanhq.com/browse. Here we can see various search queries which look quite interesting, such as webcam, default password, SCADA, etc. Clicking on one of these takes us directly to the result page and lists details of machines on the internet matching that keyword. The page http://www.shodanhq.com/help/filters shows the list of all the filters we can use in Shodan to perform a more focused search, such as country, hostname, port, etc., including the usual filters "+," "-," and "|."

FIGURE 6.8 Shodan popular searches.

FIGURE 6.9 Shodan filters.

Let's perform a simple search on Shodan for the keyword "webcam." Shodan finds more than 15,000 results for this keyword; though we cannot view all of them under the free package, what we get is enough to understand its reach and the availability of such devices on the internet. Some of these might be protected by an authentication mechanism such as a username and password, but some might be publicly accessible without any such mechanism.
We can simply find out by opening their listed IP addresses in our browsers (Warning: doing so may be illegal depending upon the laws of the country involved). We can further narrow down these results to a country using the "country" filter, so our new query is "webcam country:us", which gives us a list of webcams in the United States of America.

FIGURE 6.10 Shodan results for query "webcam."

To get a list of machines running the file transfer protocol (FTP) service and residing in India, we can use the query "port:21 country:in". We can also search for a specific IP address or range using the filter "net." Shodan provides a great deal of relevant information, and its application is limited only by the creativity of its users.

FIGURE 6.11 Shodan results for query "port:21 country:in."

Apart from this, Shodan also offers an API to integrate its data into our own applications. There are also some other services provided at a price which are worth a try for anyone working in the information security domain. Recently there has been a lot of development in Shodan and its associated services, which makes this product a must-try for information security enthusiasts.

SEARCH DIGGITY

In the last chapter we learned a lot about using the advanced search features of various search engines and also briefly discussed the term "Google Hacking." To use such techniques we need the list of operators available and have to type each query by hand to see if anything is vulnerable; but what if there were a tool with a database of such queries which we could simply run? Here enters Search Diggity. Search Diggity is a tool by Bishop Fox which has a huge set of options and a large database of queries for various search engines, allowing us to gather compromising information related to our target. It can be downloaded from http://www.bishopfox.com/resources/tools/google-hacking-diggity/attack-tools/. The basic requirement for its installation is Microsoft .NET Framework v4.

Once we have downloaded and installed the application, the things we need are the search IDs and API keys. These search IDs/API keys are required so that we can perform a larger number of searches without too many restrictions. We can find out how to get and use these keys in the contents section under the Help tab and also from some simple Google searches. Once all the keys (Google, Bing, Shodan, etc.) are in place we can move forward with the usage of the tool.
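For context, the kinds of queries such a tool automates are ordinary search engine dorks. The following are common, purely illustrative patterns and are not necessarily taken from Search Diggity's own lists:

intitle:"index of" "backup"
filetype:log inurl:"password"
filetype:sql "insert into" "values"

Running whole categories of such queries automatically, instead of typing them one at a time, is exactly the convenience a tool like Search Diggity offers.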