FIGURE 4.9 SocialMention displaying results and associated statistics.

Social Searcher (http://www.social-searcher.com/)
Social Searcher is yet another social media search engine. It uses Facebook, Twitter, and Google+ as its sources. The interface provided by this search engine is simple. Under the search tab the results are distributed into three tabs based on the source, and under these tabs the posts are listed with a preview, which is very helpful in identifying the ones relevant to us. Similar to SocialMention, we can also set up e-mail alerts. Under the analytics tab we can get sentiment analysis, users, keywords, domains, and much more. One of the more interesting of these is the popular tab, which lists the results with the most interaction, such as likes, retweets, etc.

TWITTER
Twitter is one of the most popular social networking sites and has a huge impact. Apart from its usual microblogging functionality, it also allows us to understand the reach and user base of any entity, which makes it a powerful tool for reconnaissance. Today it is widely used for market promotion as well as for analyzing the social landscape.

Topsy (http://topsy.com/)
Topsy is a tool which allows us to search and monitor Twitter. Using it we can check out the trend of any keyword over Twitter and analyze its reach. The interface is pretty simple and looks like a conventional search engine, except that the results are based only on Twitter. The results presented by it can be narrowed down to various timeframes such as 1 day, 30 days, etc. We can also filter the results to only see images, tweets, links, videos, or influencers. There is another filter which allows us to see only results in specific languages. All in all, Topsy is a great tool for monitoring the market for specific keywords.

FIGURE 4.10 Topsy search.

Trendsmap (http://trendsmap.com/)
Trendsmap is a great visual platform which shows trending topics from Twitter, in the form of keywords, hashtags, and Twitter handles, over the world map. It is a great platform which utilizes visual representation of the trends to understand what's hot in a specific region of the world. Apart from showing this visual form of information, it also allows us to search through this information by topic or location, which makes it easier to see only what we want.

Tweetbeep (http://tweetbeep.com/)
In its own words, Tweetbeep is like Google Alerts for Twitter. It is a great service which allows us to monitor topics of interest on Twitter such as a brand name, a product, updates related to companies, and even links. For market monitoring purposes it's a great tool which can help us respond quickly to topics of interest.

Twiangulate (http://twiangulate.com/search)
Twiangulate is a great tool which allows us to perform Twitter triangulations. Using it we can find the common people who are followers of, and are followed by, two different Twitter users. Similarly, it also provides the feature to compare the reach of two users. It is a great tool to understand and compare the influence of different Twitter users.

SOURCE CODE SEARCH
Most of the search engines we have used only look at the text visible on the web page, but there are some search engines which index the source code present on the internet. These kinds of search engines can be very helpful when we are looking for a specific technology used over the internet, such as a content management system like WordPress.
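To get a feel for why page source is such a giveaway, the check these engines automate can be reproduced for a single page with a few lines of Python. The following is only a rough sketch: the URL and the signature strings (markers commonly associated with WordPress installs) are placeholder assumptions.

import urllib.request

url = "http://www.example.com/"                      # placeholder target page
signatures = ["wp-content", "wp-includes", 'content="WordPress']   # assumed WordPress markers

# Fetch the raw HTML and look for the signature strings in it.
html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", errors="ignore")
found = [s for s in signatures if s in html]
if found:
    print("Possible WordPress install; matched:", ", ".join(found))
else:
    print("No WordPress signature found in the source of", url)

A source code search engine essentially runs this kind of matching against millions of already-crawled pages instead of fetching them one at a time.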
The uses of such search engines range from search engine optimization and competitive analysis to keyword research for marketing, and are limited only by the creativity of the user. Due to storage and scalability issues there were earlier no service providers in this domain, but with technological advancements some options are opening up now; let's check out some of these.

NerdyData (http://nerdydata.com)
NerdyData is one of the first of its kind, a unique search engine which allows us to search the code of web pages. Using the platform is pretty simple: go to the URL https://search.nerdydata.com/, enter a keyword like "WordPress 3.7," and NerdyData will list the websites which contain that keyword in their source code. The results not only provide the URL of the website but also show the section of the code with the keyword highlighted, under the section Source Code Snippet. Apart from this there are various features such as contact author, fetch backlinks, and others which can be very helpful, but most of these are paid; still, the limited free usage of NerdyData is very useful and worth a try.

FIGURE 4.11 NerdyData code search results.

Ohloh code (https://code.ohloh.net)
Ohloh code is another great search engine for source code searching, but it's a bit different in that it searches open source code. What this means is that its source of information is the code residing in open spaces, such as Git repositories. It provides great options to filter the results based on definitions, (programming) languages, extensions, etc. through a bar on the left-hand side titled "Filter Code Results."

Searchcode (https://searchcode.com)
Similar to Ohloh, Searchcode also uses open source code repositories as its information source. The search filters provided by Searchcode are very helpful; some of them are repository, source, and language.

TECHNOLOGY INFORMATION
In this special section on search engines we will be working with some unique search engines which will help us gather information related to various different technologies and much more. In this segment we will be dealing heavily with IP addresses and related terms, so it is advised to go through the section "Defining the basic terms" in the first chapter.

Whois (http://whois.net/)
Whois is basically a service which allows us to get information about the registrant of an internet resource such as a domain name. Whois.net provides a platform using which we can perform a Whois search for a domain or IP address. A Whois record usually consists of registrar info; dates of registration and expiry; and registrant info such as name, e-mail address, etc.

Robtex (http://www.robtex.com)
Robtex is a great tool to find out information about internet resources such as an IP address, a domain name, an Autonomous System (AS) number, etc. The interface is pretty simple and straightforward. At the top left-hand corner is a search bar using which we can look up information. Searching for a domain gives us related information like IP address, route, AS number, location, etc. Similar information is provided for IP addresses, routes, etc.

W3dt (https://w3dt.net/)
W3dt is a great online resource to find out networking-related information. There are various sections which we can explore using this single platform. The first section is domain name system (DNS) tools, which allows us to perform various DNS-related queries such as DNS lookup, reverse DNS lookup, DNS server fingerprinting, etc.
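As a quick aside, the basic forward and reverse lookups just mentioned can also be reproduced locally. The following is a minimal sketch using only the Python standard library; the host name is a placeholder.

import socket

host = "www.example.com"                      # placeholder host name
ip = socket.gethostbyname(host)               # forward lookup: name -> IPv4 address
print(host, "resolves to", ip)

try:
    name, aliases, addresses = socket.gethostbyaddr(ip)   # reverse lookup: address -> name
    print(ip, "reverses to", name)
except socket.herror:
    print("No reverse (PTR) record found for", ip)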
The second section provides tools related to the network/internet, such as port scan, traceroute, MX record retriever, etc. The next section is web/HTTP, which consists of tools such as SSL certificate info, URL encode/decode, HTTP header retrieval, etc. Then comes the database lookups section, under which come MAC address lookup, Whois lookup, etc., and at the end there are some general and ping-related tools. All in all it is a great set of tools which allows us to perform a huge list of different useful functions under a single interface.

Shodan (http://www.shodanhq.com/)
So far we have used various types of search engines which help us explore the web in all different ways. What we haven't encountered till now is an internet search engine (remember the difference between the web and the internet explained in chapter 1) or, simply said, a computer search engine. Shodan is a computer search engine which scans the internet and grabs the service banner based on IP address and port. It allows us to search this information using IP addresses, country filters, and much more. Using it we can find out simple information, such as websites using a specific type of web server like Internet Information Services (IIS) or Apache, and also information which can be quite sensitive, such as IP cameras without authentication or SCADA systems exposed over the internet. Though the free version without registration provides very limited information, which can be mitigated a bit using a registered account, it is sufficient to understand the power of this unique search engine. We can also utilize the power of this tool through a browser add-on or through its application programming interface (API). Shodan has a very active development history and comes up with new features all the time, so we can expect much more from it in the future.

FIGURE 4.12 Shodan results for port 21.
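For readers who want to automate such queries, Shodan also exposes its search through an official Python library. The following is a minimal sketch, assuming a registered account and an API key; the key below is a placeholder, and free accounts may restrict how many results a query returns.

import shodan   # pip install shodan

api = shodan.Shodan("YOUR_API_KEY")            # placeholder API key from a registered account

results = api.search("port:21")                # the same kind of query as in the figure above
print("Total results:", results["total"])
for match in results["matches"][:5]:           # look at the first few banners only
    print(match["ip_str"], match.get("port"), match["data"][:60].strip())

Each match returned is essentially the service banner shown in the figure above, along with the IP address, port, and related metadata.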
WayBack Machine (http://archive.org/web/web.php)
The Internet Archive WayBack Machine is a great resource to look up how a website looked in the past. Simply type the website address into the search bar and it will return a timeline with the available snapshots highlighted on a calendar. Hovering over these highlighted dates on the calendar will present a link to the snapshot. This is a great tool to analyze how a website has evolved and thus monitor its past growth. It can also be helpful to retrieve information from a website which was available in the past but is not anymore.

REVERSE IMAGE SEARCH
We are all familiar with the phrase "A picture is worth a thousand words" and its veracity, and are also aware of platforms like Google Images (http://images.google.com), Flickr (https://www.flckr.com/), and Deviantart (http://www.deviantart.com/), which provide us images for the keywords provided. Usually when we need to look up some information, we have a keyword or a set of them in the form of text; following the same lead, the search engines we have dealt with till now take text as an input and get us the results. But in case we have an image and we want to see where it appears on the web, where do we go? This is where reverse image search engines come in, which take an image as input and look up its appearances on the web. Let's get familiar with some of these.

Google Images (http://images.google.com/)
We are all aware that Google allows us to search the web for images, but what many of us are unaware of is that it also allows us to perform a reverse image search. We simply need to go to the URL http://images.google.com, click on the camera icon, and provide the URL of an image on the web or upload a locally stored image file; we can also drag and drop an image file into the search bar, and voila, Google comes up with links to the pages containing that or similar images on the web.

FIGURE 4.13 Google reverse image search.

TinEye (https://www.tineye.com/)
TinEye is another reverse image search engine and has a huge database of images. Similar to Google Images, searching on TinEye is very simple: we can provide the URL to the image, upload it, or perform a drag and drop. TinEye also provides browser plugins for major browsers, which make the task much easier. Though the results of TinEye are not as comprehensive as those of Google Images, it provides a great platform for the task and must be tried.

ImageRaider (http://www.ImageRaider.com/)
Last but not the least in this list is ImageRaider. ImageRaider simply lists the results domain-wise. If a domain contains more than one occurrence of the image, then it also tells us that, and the links to those images are listed under the domain name.

Reverse image search can be very helpful to find out more about someone when we are hitting dead ends using conventional methods. As many people use the same profile picture on various different platforms, performing a reverse image search can lead us to other platforms where the user has created a profile and which hold previously undiscovered information.

MISCELLANEOUS
We have dealt with a huge list of search engines which are specialized in their domains and are popular within a community. In this section we will be dealing with some different types of search platforms which are lesser known but serve unique purposes and are very helpful in special cases.

DataMarket (http://datamarket.com/)
DataMarket is an open portal which consists of large data sets and presents the data in a great manner through visualizations. The simple search feature provides results for global topics with a list of different visualizations related to the topic; for example, searching for the keyword gold would provide results such as gold statistics, import/export of gold, and much more. The results page consists of a bar on the left which provides a list of filters using which the listed results can be narrowed down. It also allows us to upload our own data and create visualizations from it. Refer to the link http://datamarket.com/topic/list/ for a huge list of topics on which DataMarket provides information.

WolframAlpha (http://www.wolframalpha.com/)
In this chapter we have learned about various search engines which take some value as input and provide us with links which might contain the answer to the questions we are actually looking for, but what we are going to learn about now is not a search engine but a computational knowledge engine. What this means is that it takes our queries as input but does not provide the URLs to websites containing the information; instead it tries to understand our natural language queries and, based upon an organized data set, provides a factual answer to them in the form of text and sometimes an apposite visualization. Say, for example, we want to know the purpose of the .mil domain; we can simply type in the query "what is the purpose of the .mil internet domain?" and get the result. To get the words starting with a and ending with e, a query like "words starting with a and ending with e" would give us the results; we can even check the net worth of Warren Buffett with a query like "Warren Buffett net worth." For more examples of the queries from various domains that WolframAlpha is able to answer, check out the page http://www.wolframalpha.com/examples/.
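WolframAlpha can also be queried programmatically over HTTP. The sketch below uses its short-answers endpoint and is only an illustration; the AppID is a placeholder that has to be obtained from Wolfram's developer portal, and the available endpoints and quotas depend on the plan.

import urllib.parse
import urllib.request

app_id = "YOUR_APPID"                          # placeholder AppID from Wolfram's developer portal
question = "Warren Buffett net worth"          # the same kind of natural-language query as above

# Build the request against the short-answers endpoint and print the plain-text reply.
url = ("https://api.wolframalpha.com/v1/result?appid=" + app_id +
       "&i=" + urllib.parse.quote_plus(question))
print(urllib.request.urlopen(url, timeout=10).read().decode("utf-8"))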
FIGURE 4.14 WolframAlpha result.

Addictomatic (http://addictomatic.com)
Usually we visit various different platforms to search for information related to a topic, but Addictomatic aggregates various news and media sources to create a single dashboard for any topic of our interest. The aggregated content is displayed in various sections depending upon the source. It also allows us to move these sections around according to our preference, for better readability.

Carrot2 (http://search.carrot2.org/stable/search)
Carrot2 is a search results clustering engine. What this means is that it takes search results from other search engines and organizes them into topics using its search results clustering algorithms. Its unique capability to cluster the results into topics allows us to get a better understanding of them and of associated terms. These clusters are also represented in different interesting forms such as folders, circles, and FoamTree. Carrot2 can be used through its web interface, which can be accessed using the URL http://search.carrot2.org/, and also through a software application which can be downloaded from http://project.carrot2.org/download.html.

FIGURE 4.15 Carrot2 search result cluster.

Boardreader (http://boardreader.com/)
Boards and forums are rich sources of information, as a lot of interaction and Q&A goes on in places like these. Members of such platforms range from newbies to experts in the domain to which the forum is related. In places like these we can get answers to questions which are difficult to find elsewhere, as they purely comprise user-generated content. But how do we search them? Here is the answer: Boardreader. It allows us to search forums to get results which contain content with human interaction. It also displays a trend graph of the search query keyword to show the amount of activity related to it. The advanced search features provided by it, such as sort by relevance, occurrence between specific dates, domain-specific search, etc., add to its already incredible feature set.

Omgili (http://omgili.com/)
Similar to Boardreader, Omgili is also a forum and board search engine. It displays the results in the form of broad bars, and these bars contain information such as date, number of posts, author, etc., which can be helpful in estimating the relevance of a result. One such piece of information is Thread Info, which provides further information about a thread, such as forum name, number of authors, and replies to the thread, without actually visiting the original thread's forum page. It also allows us to filter the results based upon the timeline of their occurrence, such as past month, week, day, etc.
Truecaller (http://www.truecaller.com)
Almost everyone who uses or has ever used a smartphone is familiar with the concept of mobile applications, better known as apps, and many if not most of them have used the famous app called Truecaller, which helps to identify the person behind a phone number. What many of us are unaware of is that it can also be used through a web browser. Truecaller simply allows us to search using a phone number and provides the user's details from its crowdsourced database.

So we have discussed a huge list of various search engines under various categories which are not conventionally used, but as we have already seen, they are very useful in different scenarios. We are all addicted to Google for all our searching needs, and being one of the best in its domain it has served our purpose most of the time, but sometimes we need different and specific answers to our queries, and then we need these kinds of search engines. This list tries to cover most aspects of daily searching needs, yet surely there are other platforms which remain to be discovered and which should be used commonly to solve specific problems.

Other search engines worth trying:
• Meta search engine
  • Search (http://www.search.com/)
• People search
  • ZabaSearch (http://www.zabasearch.com/)
• Company search
  • Hoovers (http://www.hoovers.com/)
  • Kompass (http://kompass.com/)
• Semantic
  • Sensebot (http://www.sensebot.net/)
• Social media search
  • Whostalkin (http://www.whostalkin.com/)
• Twitter search
  • Mentionmapp (http://mentionmapp.com/)
  • SocialCollider (http://socialcollider.net/)
  • GeoChirp (http://www.geochirp.com/)
  • Twitterfall (http://beta.twitterfall.com/)
• Source code search
  • Meanpath (https://meanpath.com)
• Technology search
  • Netcraft (http://www.netcraft.com/)
  • Serversniff (http://serversniff.net)
• Reverse image search
  • NerdyData image search (https://search.nerdydata.com/images)
• Miscellaneous
  • Freebase (http://www.freebase.com/)

In this chapter we learned about various unconventional search engines, their features, and their functionalities, but what about the conventional search engines like Google, Bing, Yahoo, etc. that we use on a daily basis? Oh! We already know how to use them, or do we? The search engines we use on a daily basis have various advanced features of which many users are unaware. These features allow users to filter the results so that we can get more information and less noise. In the next chapter we will be dealing with conventional search engines and will learn how to use them effectively to perform better searches and get specific results.

CHAPTER 5
Advanced Web Searching

INFORMATION IN THIS CHAPTER
• Search Engines
• Conventional Search Engines
• Advanced Search Operators of various Search Engines
• Examples and Usage

INTRODUCTION
In the last chapter we dealt with some special platforms which allowed us to perform domain-specific searches; now let's go into the depths of the conventional search engines which we use on a daily basis and check out how we can utilize them more efficiently. In this chapter, basically, we will understand the working and advanced search features of some of the well-known search engines and see what functionalities and filters they provide to serve us better.
So we already have a basic idea about what a search engine is and how it crawls over the web to collect information, which is further indexed to provide us with search results. Let's revise it once and understand it in more depth. Web pages as we see them are not actually what they look like. Web pages basically contain HyperText Markup Language (HTML) code and, most of the time, some JavaScript and other scripting languages. HTML is basically a markup language and uses tags to structure the information; for example, the tag <h1> is used to create a heading.
When we receive this HTML code from the server, our browser interprets it and displays the web page in its rendered form. To check the client-side source code of a web page, simply press Ctrl+U in the browser with the web page open. Once the web crawler of a search engine reaches a web page, it goes through its HTML code. Most of the time these pages also contain links to other pages, which are used by the crawlers to move further in their quest to collect data. The content crawled by the web crawler is then stored and indexed by the search engine based on a variety of factors. The pages are ranked based upon their structure (as defined in HTML), the keywords used, the interlinking of the pages, the media present on the page, and many other details. Once a page has been crawled and indexed, it is ready to be presented to the user of the search engine depending upon the query.

Once a page has been crawled, the job of the crawler does not finish for that page. The crawler is scheduled to perform the complete process again after a specific time, as the content of the page might change. So this process keeps on going, and as new pages are linked they are also crawled and indexed. The search engine business is a huge industry in itself which helps us in our web exploration, but there is another industry which depends directly on search engines, and that is search engine optimization (SEO). SEO is basically about increasing the rank of a website/web page, or in other words bringing it up to the starting result pages of a search engine. The motivation behind this is that it will increase the visibility of that page/site and hence it will get more traffic, which can be helpful from a commercial or personal point of view.
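To make the crawling process described above a little more concrete, here is a toy crawler sketch in Python. The seed URL is a placeholder, the limit is deliberately tiny, and unlike a real crawler it ignores robots.txt, politeness delays, and indexing altogether.

import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkParser(HTMLParser):
    # Collects the href value of every <a> tag encountered in the page.
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, limit=10):
    queue, seen = [seed], set()
    while queue and len(seen) < limit:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", errors="ignore")
        except Exception:
            continue                      # unreachable pages are simply skipped
        parser = LinkParser()
        parser.feed(html)
        # Newly discovered links feed the queue, exactly as described above.
        queue.extend(urljoin(url, link) for link in parser.links)
        print("crawled:", url)
    return seen

crawl("http://www.example.com/")          # placeholder seed URL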
Now that we have a good understanding of search engines and how they operate, let's move ahead and see how we can better use some of the conventional ones.

GOOGLE
Google is one of the most widely used search engines and is the starting point for web exploration for most of us. Initially Google search was accessible through a very simple interface and provided limited information. Apart from the search box there were some special search links, links about the company, and a subscription box where we could enter our email to get updates. There were no ads, no different language options, no login, etc. It's not only the look and feel of the interface that has changed over the years but also the functionalities. It has evolved from providing simple web links to pages containing relevant information into a whole bunch of related tools which not only allow us to search different media types and categories but also narrow down these results using various filters. Today there are various categories of search results such as images, news, maps, videos, and much more. This plethora of functionalities provided by Google has certainly made our lives much easier and made the act of finding information on the web a piece of cake. Still, sometimes we face difficulty in finding the exact information we are looking for, and the main reason behind it is not the lack of information but, on the contrary, the abundance of it.

Let's move on to see how we perform a Google search and how to improve it. Whenever we need to search for something in Google we simply think of some of the keywords associated with it, type them into the search bar, and hit Enter. Based upon its indexing, Google simply provides us with the associated resources. Now if we want to get better results, or filter the existing results based upon various factors, we need to use Google's advanced search operators. Let's have a look at these operators and their usage.

site:
It fetches results only for the site provided. It is very useful when we want to limit our search to a specific domain. It can be used with another keyword, and Google will bring back related pages from the site specified. From an information security perspective it is very useful for finding different subdomains related to a particular domain.
Examples: site:gov, site:house.gov

FIGURE 5.1 Google "site" operator usage.

inurl:
This operator allows looking for keywords in the uniform resource locator (URL) of the site. It is useful for finding pages which follow a usual keyword for specific pages, such as contact us. Generally, as the URL contains some keywords associated with the body contents, it will help us find the equivalent page for the keyword we are searching for.
Example: inurl:hack

allinurl:
Similar to "inurl," this operator allows looking for multiple keywords in the URL. So we can search for multiple keywords in the URL of a page. This also enhances the chances of getting quality content for what we are looking for.
Example: allinurl:hack security

intext:
This operator makes sure that the keyword specified is present in the text of the page. Sometimes, just for the sake of SEO, we find pages that only contain keywords to enhance the page rank but not the associated content. In that case we can use this query parameter to get the appropriate content from a page for the keyword we are looking for.
Example: intext:hack

allintext:
Similar to "intext," this operator allows us to look for multiple keywords in the text. As we discussed earlier, searching for multiple keywords always enhances the content quality of the result page.
Example: allintext:data marketing

intitle:
It allows us to restrict the results by the keywords present in the title of the pages (title tag: XYZ). It can be helpful for identifying pages which follow a convention for the title, such as directory listings with the keywords "index of"; also, most sites include keywords in the title to improve page rank, so this query parameter always helps when searching for a particular keyword.
Example: intitle:blueocean

allintitle:
This is the multiple-keyword counterpart of the "intitle" operator.
Example: allintitle:blueocean market

filetype:
This operator is used to find files of a specific kind. It supports multiple file types such as pdf, swf, kml, doc, svg, txt, etc. This operator comes in handy when we are only looking for a specific type of file on a specific domain.
Example: filetype:pdf, site:xyz.com filetype:doc

ext:
The operator ext simply stands for extension, and it works similarly to the filetype operator.
Example: ext:pdf

define:
This operator is used to find the meaning of the keyword supplied. Google returns the dictionary meaning and synonyms for the keyword.
Example: define:data

AROUND
This operator is helpful when we are looking for results which contain two different keywords, but in close association. It allows us to restrict the maximum number of words between the two keywords in the search results.
Example: A AROUND(6) Z

AND
A simple Boolean operator which makes sure the keywords on both sides are present in the search results.
Example: data AND market

OR
Another Boolean operator, which provides search results that contain either of the keywords present on each side of the operator.
Example: data OR intelligence

NOT
Yet another Boolean operator, which excludes from the search results those that contain the keyword following it.
Example: lotus NOT flower

""
This operator is useful when we need to search for results which contain the provided keywords in the exact sequence. For example, we can search for pages which contain quotes or some lyrics.
Example: "time is precious"

-
This operator excludes from the search results those which contain the keyword following it (no space).
Example: lotus -flower

*
This wildcard operator is used as a generic placeholder for an unknown term. We can use this to find quotes which we only partially remember, or to check variants of one.
Example: "* is precious"

..
This special operator is used to provide a number range. It is quite useful for enforcing a price range, a time range (dates), etc.
Example: japan volcano 1990..2000

info:
The info operator provides the information Google has on a specific domain. Links to different types of information are present in the results, such as the cache, similar websites, etc.
Example: info:elsevier.com

related:
This operator is used to find other web pages similar to the provided domain. It is very helpful when we are looking for websites which provide services similar to a given website, or to find its competitors.
Example: related:elsevier.com

cache:
This operator redirects to the latest cache of the page that Google has crawled. In case we don't get a result for a website which was accessible earlier, this is a good option to try.
Example: cache:elsevier.com

An advanced Google search can also be performed using the page http://www.google.com/advanced_search, which allows us to perform a restricted search without using the operators mentioned above.

FIGURE 5.2 Google advanced search page.

Apart from the operators, Google also provides some operations which allow us to check information about current events and perform some other useful things. Some examples are:

time
Simply entering this keyword displays the current time of the location we are residing in. We can also use the name of a region to get its current time.
Example: time france

weather
This keyword shows the current weather conditions of our location. Similar to the "time" keyword, we can also use it to get the weather conditions of a different region.
Example: weather sweden

Calculator
Google also solves mathematical equations and provides a calculator.
Example: 39*(9823-312)+44/3

Convertor
Google can be used to perform conversions for different types of units, like measurement units, currency, time, etc.
Example: 6 feet in meters

This is not all; sometimes Google also shows relevant information related to global events as and when they happen, for example the FIFA World Cup. Apart from searching the web in general, Google also allows us to search specific categories such as images, news, videos, etc. All these categories, including web, have some common and some category-specific search filters of their own. These options can simply be accessed by clicking on the "Search tools" tab just below the search bar. We find options which allow us to restrict the results based upon the country and time of publication for web results; for images there are options like the color of the image, its type, usage rights, etc.; and there are similarly relevant filters for the other categories.
These options can be very helpful in finding the required information in a category, as they are designed according to that specific category. For example, if we are looking for an old photograph of something, it is a good idea to see only the results which are black and white.

The operators we discussed are certainly very useful for anyone who needs to find some information on the web, but the InfoSec community has certainly taken them to the next level. These simple and innocent operators are widely used in the cyber security industry to find and demonstrate how, without even touching the target system, critical and compromising information can be retrieved. This technique of using Google search engine operators to find such information is termed "Google Hacking." When it comes to Google Hacking, one name that jumps to mind is Johnny Long. Johnny was an early adopter and a pioneer in the field of creating Google queries which could provide sensitive information related to a target. These queries are widely popular under the name Google Dorks.

Let's understand how this technique works. We saw a number of operators which can narrow down search results to a specific domain, filetype, title value, etc. In Google Hacking our motive is to find sensitive information related to the target; for this, people have come up with various signatures for different files and pages which are known to contain such information. For example, let's say we know the name of a sensitive directory which should not be directly accessible to any user publicly, but remains public by default after the installation of the related application. If we want to find the sites which have not changed the accessibility of this directory, we can simply use the query "inurl:/sensitive_directory_name/" and we will get a bunch of websites which haven't changed the setting. If we want to further narrow it down to a specific website, we can combine the query with the operator "site," as in "site:targetdomain.com inurl:/sensitive_directory_name/." Similarly, we can find sensitive files existing on a website by using the operators "site" and "filetype" in combination.

Let's take another example of Google Hacking which can help us discover a high-severity vulnerability on a website. Many developers use flash to make websites more interactive and visually appealing. Small web format (SWF) is a flash file format used to create such multimedia. There are many SWF players known to be vulnerable to cross-site scripting (XSS), which could lead to an account compromise. If we want to find out whether the target domain is vulnerable to such an attack, we can simply put in the query "site:targetdomain.com filetype:swf SWFPlayer_signature_keyword" and test the resulting pages using publicly available payloads to verify. There are a huge number of signatures for finding various types of pages, such as sensitive directories, web server identification, files containing usernames/passwords, admin login pages, and much more. The Google Hacking Database created by Johnny Long can be found at http://www.hackersforcharity.org/ghdb/; though it is no longer updated, it is a great place to understand and learn how we can use Google to find sensitive information. A regularly updated version can be found at http://www.exploit-db.com/google-dorks/.

FIGURE 5.3 Google hacking database- www.exploit-db.com/google-dorks/.
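When a number of such signatures have to be tried against a single target, it can help to generate the queries programmatically instead of typing them one by one. The following sketch only builds the search URLs; the target domain and the signature list are placeholder assumptions, and the resulting queries still have to be run and reviewed manually, and only against systems we are authorized to test.

from urllib.parse import quote_plus

target = "targetdomain.com"                 # placeholder target domain
signatures = [                              # placeholder dork signatures
    'intitle:"index of"',
    "inurl:admin",
    "filetype:swf",
    'filetype:xls "password"',
]

# Combine each signature with the site: operator and print a ready-made search URL.
for sig in signatures:
    query = "site:{} {}".format(target, sig)
    print("https://www.google.com/search?q=" + quote_plus(query))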
BING
Microsoft has been providing search engine solutions for a long time, under different names. Bing is the latest and most feature-rich search engine in this series. Unlike its predecessors, Bing provides a cleaner and simpler interface. As Microsoft covers a major part of the operating system market, the general perception is that Bing is just another side product from a technology giant, and hence most users do not take it seriously. But that is wrong. Like all search engines, Bing also has some unique features that will force you to use it when you need them, and those features definitely leave their own mark on how we search. We will discuss not only the special features but also the general operators, which will allow us to understand the search engine and its functionalities.

+
This operator works quite similarly in all the search engines. It allows a user to forcefully include single or multiple keywords in a search query. Bing will make sure the keywords coming after the + operator are present in the result pages.
Example: power +search

-
This operator is also known as the NOT operator. It is used to exclude something from a set of things, such as excluding a cuisine.
Example: Italian food -pizza
Here Bing will display all the Italian foods available, but not pizza. We can write this in another form which fetches the same result, as in the example below.
Example: Italian food NOT pizza

""
This is also the same in most search engines. It is used to search for the exact phrase placed inside the double quotation marks.
Example: "How to do Power Searching?"

|
This is also known as the OR operator, mostly used for getting results for one of two keywords, or one of many keywords joined with this operator.
Example: ios | android
ios OR android

&
This operator is also known as the AND operator. This is the search operator used by default: if we do nothing and add multiple keywords, then Bing will do an AND search in the backend and give us the result.
Example: power AND search
power & search
As this is the default search, it's very important to keep in mind that unless we write OR and NOT in capitals, Bing won't understand them as operators.

()
This can be called the group operator. As parentheses have the top priority in the order of precedence, we can put lower-priority operators such as OR inside them and create a group query to execute the lower-priority operators first.
Example: android phone AND (nexus OR xperia)

site:
This operator helps to search for a particular keyword within a specific website. It works much the same in most search engines.
Example: site:owasp.org clickjacking

filetype:
This allows a user to search for data in a specific type of file. Bing supports all but a few file types; mostly, those supported by Google are also supported by Bing.
Example: hack filetype:pdf

ip:
This unique operator provided by Bing allows us to search web pages based upon an IP address. Using it we can perform a reverse IP search, which means it allows us to look for the pages hosted on the specified IP.
Example: ip:176.65.66.66

Grouping of Bing operators is supported in the following order of precedence:
()
""
NOT/-
AND/&
OR/|

feed:
Yet another unique operator provided by Bing is feed, which allows us to look for web feed pages containing the provided keyword.

One other feature that Bing provides is social search, using the page https://www.bing.com/explore/social.
It allows us to connect our social network accounts with Bing and perform searches within them.

FIGURE 5.4 Bing "ip" search.

FIGURE 5.5 Bing social search.

YAHOO
Yahoo is one of the oldest players in the search engine arena and has been quite popular. The search page for Yahoo also has a lot of content, such as news, trending topics, weather, financial information, and much more. Earlier, Yahoo utilized third-party services to power its search capabilities; later it shifted to become independent, and it has once again joined forces with Bing for its search services. Though there is not too much that Yahoo offers in terms of advanced searching as compared to other search engines, the operators it provides are worth trying and comparing with others. Let's see some of the operators that can be useful.

+
This operator is used to make sure the search results contain the keyword following it.
Example: +data

-
Opposite to the "+" operator, this operator is used to exclude a specific keyword from the search results.
Example: -info

OR
This operator allows us to get results for either of the keywords supplied.
Example: data OR info

site:
This operator allows restricting the results only to the site provided. We will only get to see links from the specified website. There are two other operators which work like this one but do not provide results as accurate or as in-depth: domain and hostname. Their usage is similar to that of the "site" operator.
Example: site:elsevier.com

link:
This is another interesting operator, which allows us to look up web pages that link to the specific web page provided. While using this operator, do keep in mind to provide the URL with the protocol (http:// or https://).
Example: link:http://www.elsevier.com/

define:
We can use this operator to find the dictionary meaning of a word.
Example: define:data

intitle:
The "intitle" operator is used to get results which contain the specified keyword in their title tag.
Example: intitle:data

So these are the operators which Yahoo supports. Apart from these, we can access the Yahoo advanced search page at http://search.yahoo.com/search/options?fr=fp-top&p=, which allows us to achieve well-filtered search results. One other thing that Yahoo offers is an advanced news search, which can be performed using the page http://news.search.yahoo.com/advanced.

FIGURE 5.6 Yahoo "link" search.

FIGURE 5.7 Yahoo advanced search page.

YANDEX
Yandex is a Russian search engine and is not very popular outside the country, but it is one of the most powerful search engines available. Like Google, Bing, and Yahoo, it has its own unique keywords and indexed data. Yandex is the most popular and widely used search engine in Russia, and it is the fourth largest search engine in the world. Apart from Russia, it is also used in countries like Ukraine, Kazakhstan, Turkey, and Belarus. It is also a much underrated search engine, as its use is mostly limited to specific countries, but in the security community we see it otherwise. Most people are either happy with their conventional search engine or think that all the internet's information is available in the search engine they are using. But the fact is that search engines like Yandex also have many unique features that can provide us with far more efficient results as compared to other search engines. Here we will discuss how Yandex can be a game changer in searching for data on the internet and how to use it efficiently.
As discussed, like other search engines, Yandex has its own operators, such as lang, parentheses, Boolean operators, and so on. Let's get familiar with these operators and their usage.

+
This operator works much the same across search engines. For Yandex as well, the + operator is used to require a keyword in the search result pages. The keyword added after the + operator is the primary keyword in the search query; the results fetched by the search engine must contain that keyword.
Let’s say we want to setup a radius or boundary for a keyword with respect to another; in that case we have to specify that keyword in second position. Example: power /(-3 +4) searching Here we are setting up a radius for searching with respect to power. This means that the page is displayed in results shown only if either “searching” will be found within 3 words before or after “power” within 4 word count. This can be helpful when we are searching for two people’s names. In that case we cannot guess that which name will come fist and which name will come next so it’s better to create a radius for those two names, and the query will serve our purpose. As we discussed a lot about word-based keyword search, now let’s put some light on sentence-based keyword search. For sentence based keyword search we can use Yandex && operator with this number operator. Example: power && /4 searching In this case we can get result pages containing these two keywords with in 4 sentence difference irrespective of the position of the keyword. That means either “power” may come fist and “searching” after that or vice versa. ! This operator does something special. And this is one of my favorite keyword. It gives a user freedom to only search a specifi keyword without similar word search or extended search and all. What exactly happens in general search is that if you Yandex 93 search for a keyword, let’s say AND, you will get some results showing only AND and then the results will extend to ANDroid or AMD and so on. If we want to get only result for AND keyword; use this operator. Example: !and This will restrict the search engine to provide results only showing pages which contains this particular keyword AND. !! It can be used to search the dictionary form of the keyword. Example: !!and () When we want to create a complex query with different keywords and operators we can use these brackets to group them. As we already used these brackets above, now we will see some other example to understand the true power of this. FIGURE 5.8 Yandex complex query. Example: power && (+searching | !search) Here the query will search for both sets of keywords fist power searching and power search but not both in same result. “” Now it’s about a keyword let’s say we want to search a particular string or set of keywords then what to do? Here this operator “” comes for rescue. It is quite similar 94 CHAPTER 5 Advanced Web Searching as Google’s “”. This will allow a user to search for exact keywords or string which is put inside the double quotes. Example: “What is OSINT?” It will search for exact string and if available will give us the result accordingly. * This operator can be refereed as wildcard operator. The use of this operator is quite same in most of the search engines. This operator is used to fil the missing keyword or suggest relevant keywords according to the other keywords used in the search query. Example: osint is * of technology It will search for auto fil the space where * is used to complete the query with relevant keywords. In this case that can be ocean or treasure or anything. We can also use this operator with double quote to get more effiient and accurate result. Example: “OSINT is * of technology” | This is also quite similar to OR operator of Google. It allows us to go for different keywords where we want results for any of them. In-real time scenario we can search for options using this operator. 
Let’s say I want to buy a laptop and I have different options: in that case this operator will come to picture. Example: dell | toshiba | macbook Here we can get result for any of these three options but not all in one result. << This is an unusual operator known as non-ranking “AND.” It is basically used to add additional keywords to the list of keywords without impacting the ranking of the website on result. We might not get to know what exactly it does by just going through its defiitions. So in simple words it can be used to tag additional keywords to the query list without impacting the page rankings. Example: power searching << OSINT It can be used to additionally search for OSINT along with the other two keywords without impacting the page ranking in the result page. title: This is quite equivalent to the “intitle.” It can be used to search the pages with the keyword (s) specifid after title query parameter. Example: title:osint Yandex 95 This will provide pages that contain OSINT in the title of the web page. Similarly we can use this title query parameter to search for more than one keyword. Example: title:(power searching) url: This “url” search query parameter is also an add-on. It searches for the exact URL provided by the user in Yandex database. Example: url:http://attacker.in Here Yandex will provide a result if and only if the URL has been crawled and indexed in its database. inurl: It can be used to search for keywords present in a URL or in other words for URL fragment search. This “inurl” query parameter works quite similar in all the search engines. Example: inurl:osint It will search for all the URLs that contain osint keyword no matter what the position of the keyword is. mime:fietype This query parameter is quite similar to “fietype” query parameter of Google. This helps a user to search for a particular fie type. Example: osint mime:pdf FIGURE 5.9 Yandex fie search. 96 CHAPTER 5 Advanced Web Searching It will provide us all the PDF links that contains osint keyword. The fie types supported by Yandex mime are PDF, RTF, SWF, DOC, XLS, PPT, DOCX, PPTX, XLSX, ODT, ODS, ODP, ODG host: It can be used to search all the available hosts. This can be used by the penetration testers mostly. Example: host:owasp.org rhost: It is quite similar to host but “rhost” searches for reverse hosts. This can also be used by the penetration testers to get all the reverse host details. It can be used in two ways. One is for subdomains by using the wildcard operator * at the end or another without that. Example: rhost:org.owasp.* rhost:org.owasp.www site: This operator is like the best friend of a penetration tester or hacker. This is available in most of the search engines. It provides all the details of subdomains of the provided URL. For penetration testers or hackers fiding the right place to search for vulnerability is most important. As in most cases the main sites are much secured as compared to the subdomains, if any operator helps to simplify the process by providing details of the subdomains to any hacker or penetration tester then half work is done. So the importance of this operator is defiitely felt in security industry. Example: site:http://www.owasp.org It will provide all the available subdomains of the domain owasp.com as well as all the pages. date: This query can be used to either limit the search data to a specifi date or to specifi period by a little enhancement in the query. 
Example: date:201408* In this case, format of date used is YYYYMMDD, but in case of the DD we used wildcard operator “*” so we will get results limited to August 2014. Yandex 97 We can also limit the same to a particular date of the August 2014 by changing a bit in the query. date:20140808 It will only show results belong to that date. We can also use “=” in place of “:” and it will still work the same. So the above query can be changed to date=201408* date=20140808 As we discussed earlier we can also limit the search results to a particular time period. Let’s say we want to search something from a particular date to till date. In that case we can use date=>20140808 It will provide results from 8th August 2014 to till date, but what if we want to limit both the start date and the end date. In that case also Yandex provide us a provision of providing range. date=20140808..20140810 Here we will get the results form date 8th August 2014 to 10th August 2014. domain: It can be used to specify the search results based of top level domains (TLDs). Mostly this type of the domain search was done to get results from country-specifi domains. Let’s say we wanted to get the list of CERT-empanelled security service providing company names from different countries. In that case we can search for the countryspecifi domain extension let’s say we want to get these details for New Zealand then its TLD is nz. So we can craft a query like Example: “cert empanelled company” domain:nz lang: It can be used to search pages written in specifi languages. Yandex supports some specifi languages such as RU: Russian UK: Ukrainian BE: Belorussian EN: English FR: French DE: German KK: Kazakh TT: Tatar TR: Turkish 98 CHAPTER 5 Advanced Web Searching Though we can always use Google translator to translate the page from any languages to English or any other languages, it’s an added feature provided by Yandex to fulfil minimum requirements of the regions where Yandex is used popularly. So to search a page we need to provide the short form of the languages. Example: power searching lang:en It will search for the pages in English that contains power searching. cat: It is also something unique provided by Yandex. Cat stands for category. Yandex categorizes different things based on region id or topic id. Using cat we can search for a result based on region or topic assigned in Yandex database. The details of Regional codes: http://search.yaca.yandex.ru/geo.c2n. The details of Topic codes: http://search.yaca.yandex.ru/cat.c2n. Though the pages contains data in Russian language, we can always use Google translate to serve this purpose. As we discussed in the beginning that Yandex is an underrated search engine some of its cool features are defiitely going to put a mark on our life once we go through this chapter. One of such feature is its advanced search GUI. There are lazy people like me who want everything in GUI so that they just have to customize everything by providing limited details and selecting some checkbox or radio buttons. Yandex provides that in the below link http://www.yandex.com/search/advanced?&lr=10558 Here we have to just select what we want and most importantly it covers most of the operators we discussed above. So go to the page, select what you want, and search effiiently using GUI. Defiitely after going through all these operators we can easily feel the impact of the advance search or we can also use the term power search for that. 
The advance search facilitates a user with faster, effiient, and reliable data in the result. It always reduces our manual efforts to get the desired data. And the content quality is also better in advance search as we limit the search to what we are actually looking for. It can be either country-specifi domain search, a particular fie type, or content from a specifi date. These things cannot be done easily with simple keyword search. We are in an age where information is everything. Then the reliability factor comes in to picture and if we want bulk of reliable information from the net in very less time span then we need to focus on the advance search. We can use any conventional search engine of our choice. Most of the search engines have quite similar operators to serve the purpose but there are some special features present; so look for those special features and use different search engines for different customized advance search. Yandex 99 So we learned about various search engines and their operators and how to utilize these operators to search better and get precise results. For some operators we say their individual operations and how they can help to narrow down the results and for some we saw how they can be used with other operators to generate a great query which directly gets us to what we want. Though there are some operators for different search engines which work more or less in the same fashion yet as the crawling and indexing techniques of different platforms are different, it is worthwhile to check which one of them provides better results depending upon our requirements. One thing that we need to keep in mind is that the search providers keep on deprecating the operators or features which are not used frequently enough and also some functionalities are not available in some regions. We saw how easily we can get the results that we actually want with the use of some small but effective techniques. The impact of these techniques is not just limited to fiding out the links to websites, but if used creatively they can be implemented in various filds. Apart from fiding the information on the web, which certainly is useful for everyone, these techniques can be used to fid out details which are profession specifi. For example a marketing professional can scale the size of the website of competitor using the operator “site,” or a sales professional can fid out emails for a company using the wildcard operator “*@randomcompany.com.” We also saw how search engine dorks are used by cyber security professionals to fid out sensitive and compromising information just by using some simple keywords and operators. The takeaway here is not just to learn about the operators but also about how we can use them creatively in our profession. We have covered a lot about how to perform searching using different searching platforms in this and some previous chapters. Till now we have mainly focused on browser-based applications or we can say web applications. In the next chapter we will be moving on and learn about various tools which need to be installed as applications and provide us various features for extracting data related to various filds, using various methods. Hacking Web Intelligence. http://dx.doi.org/10.1016/B978-0-12-801867-5.00006-9 101 Copyright © 2015 Elsevier Inc. All rights reserved. 
CHAPTER 6
OSINT Tools and Techniques

INFORMATION IN THIS CHAPTER

• OSINT Tools
• Geolocation
• Information Harvesting
• Shodan
• Search Diggity
• Recon-ng
• Yahoo Pipes
• Maltego

INTRODUCTION

In the previous chapters we learned about the basics of the internet and effective ways to search it. We went from searching social media in depth to unconventional search engines, and further learned effective techniques for using regular search engines. In this chapter we will move a step further and discuss some of the automated tools and web-based services frequently used to perform reconnaissance by professionals of various intelligence-related domains, especially information security. We will start with installation, move on to understanding each tool's interface, and then learn about its functionality and usage. Some of these tools provide a rich graphical user interface (GUI) and some are command line based (CLI), but don't judge them by their interface; judge them by their functionality and relevance to our field of work.

Before moving any further we should install the dependencies for these tools so that we don't face any issues during their installation and usage. The packages we need are

• Java (latest version)
• Python 2.7
• Microsoft .NET Framework v4

We simply need to download the relevant package for our system configuration and we are good to go.

CREEPY

Most of us are addicted to social networks, and image sharing is one of the most used features of these platforms. But sometimes when we share these pictures it's not just the image we are sharing; we may also be sharing the exact location where the picture was taken. Creepy is a Python application which can extract this information and display the geolocation on a map. Currently Creepy supports searches on Twitter, Flickr, and Instagram. It extracts geolocation from the EXIF information stored in images, from location data available through the platforms' application programming interfaces (APIs), and through some other techniques. It can be downloaded from http://ilektrojohn.github.io/creepy/. We simply need to select the version for our platform and install it.

The next phase after installing Creepy is to configure the plugins available in it, for which we simply click the Plugin Configuration button under the Edit tab. Here we can select the plugins and configure each one using its individual configuration wizard. Once the configuration is done we can check whether it is working properly using the Test Plugin Configuration button.

FIGURE 6.1 Configure Creepy.

After the configuration phase is done, we can start a new project by clicking on the person icon in the top bar. Here we can name the project and search for people on the different portals. From the search results we can select the person of interest, include him/her in the target list, and finish the wizard. After this our project will be displayed in the project bar on the right-hand side.

FIGURE 6.2 Search users.

Now we simply need to select our project and click on the target icon, or right-click on the project and click Analyze Current Project. Creepy will then start the analysis, which will take some time. Once the analysis is complete, Creepy will display the results on the map.

FIGURE 6.3 Creepy results.

Now we can see the results: the map is populated with markers according to the identified geolocations.
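To make the EXIF part of this concrete, the following is a minimal Python sketch of how location data can be read from an image, assuming the Pillow library is installed and a local file named photo.jpg; it only illustrates the technique Creepy automates and is not Creepy's own code.

from PIL import Image                     # Pillow imaging library
from PIL.ExifTags import TAGS, GPSTAGS

def rational(value):
    # Older Pillow versions return (numerator, denominator) pairs,
    # newer ones return rational objects that float() accepts
    return value[0] / float(value[1]) if isinstance(value, tuple) else float(value)

def gps_from_image(path):
    # Read the raw EXIF block; many images carry none at all
    exif = Image.open(path)._getexif()
    if not exif:
        return None
    # Translate numeric tag IDs into readable names and pull out the GPS block
    named = {TAGS.get(tag, tag): val for tag, val in exif.items()}
    gps_raw = named.get("GPSInfo")
    if not gps_raw:
        return None
    gps = {GPSTAGS.get(tag, tag): val for tag, val in gps_raw.items()}

    def to_decimal(dms, ref):
        # EXIF stores coordinates as degrees, minutes, seconds
        degrees, minutes, seconds = (rational(part) for part in dms)
        decimal = degrees + minutes / 60.0 + seconds / 3600.0
        return -decimal if ref in ("S", "W") else decimal

    latitude = to_decimal(gps["GPSLatitude"], gps["GPSLatitudeRef"])
    longitude = to_decimal(gps["GPSLongitude"], gps["GPSLongitudeRef"])
    return latitude, longitude

print(gps_from_image("photo.jpg"))

Tools like Creepy combine this kind of extraction with the location data exposed by the social networks' own APIs and plot the coordinates on a map.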
Creepy further allows us to narrow down these results using various filters. Clicking on the calendar button lets us filter the results by a time period. We can also filter the results by area, which we can define as a radius in kilometers from a point of our choice. We can also view the results as a heat map instead of markers. The negative sign (−) at the end can be used to remove all the filters imposed on the results.

FIGURE 6.4 Applying filters.

The results we get from Creepy can also be downloaded as a CSV file or as KML, which can be used to display the markers in another map. Creepy can be used in the information-gathering phase of a pentest (penetration test) and also as a proof-of-concept tool to show users what information they are revealing about themselves.

FIGURE 6.5 Download Creepy results.

THEHARVESTER

TheHarvester is an open source intelligence (OSINT) tool for obtaining e-mail addresses, employee names, open ports, subdomains, host banners, etc., from public sources such as search engines like Google and Bing and other sites such as LinkedIn. It is a simple Python tool which is easy to use and contains different information-gathering functions. Being a Python tool, it naturally requires Python to be installed on our system. It was created by Christian Martorella and is one of the simplest, most popular, and most widely used information-gathering tools. TheHarvester can be found here: http://www.edge-security.com/theharvester.php

Generally we input a domain name or company name to collect relevant information such as e-mail addresses, subdomains, or the other details mentioned above, but we can also use keywords to collect related information. We can also specify which particular public source we want to use for the information gathering. There are lots of public sources that theHarvester uses, but before moving to those let's understand how to use it.

EX: theharvester -d example.com -l 500 -b google

-d = domain name or company name
-l = number of results to work with
-b = data source; in the above command it is Google, but we can also use LinkedIn, or "all" to query all the available public sources

FIGURE 6.6 TheHarvester in action.

Apart from the ones mentioned above, theHarvester has other options we can specify, such as:

-s = start with a particular result number (the default value is 0)
-v = get virtual hosts by verifying hostnames via DNS resolution
-f = save the data (available formats are HTML and XML)
-n = perform a DNS resolve query for all the discovered ranges
-c = perform a DNS brute force for all domain names
-t = perform a DNS TLD expansion discovery
-e = use a specific DNS server
-l = limit the number of results to work with
-h = use the Shodan database to query discovered hosts

FIGURE 6.7 TheHarvester HTML results.

The sources it uses are Google, Google Profiles, Bing, pretty good privacy (PGP) key servers, LinkedIn, Jigsaw, Shodan, Yandex, name servers, people123, and Exalead.
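Before looking at these sources individually, it is worth seeing how the options combine in practice. A typical invocation might look like the following; the domain and output name are placeholders, and the exact flags can vary between theHarvester versions:

theharvester -d example.com -b all -l 200 -f results

This queries all the available public sources, limits the number of results worked with to 200, and saves the output to HTML/XML files named results, which is handy for reporting.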
Google, Yandex, Bing, and Exalead are search engines used in the backend as sources, while Shodan is also a search engine but not a conventional one; we discussed it briefly earlier and will discuss it in detail later in this chapter. PGP key servers are used for data security and are also a good source for collecting e-mail details. The people123 source is for searching for a particular person, and Jigsaw is a cloud-based solution for lead generation and other sales work. TheHarvester collects different information from different sources: for e-mail harvesting it uses Google, Bing, PGP servers, and sometimes Exalead, running their specific queries in the background to get the desired result. Similarly, for subdomains or host names it uses Google, Bing, Yandex, Exalead, and PGP servers. Finally, for the list of employee names it uses LinkedIn, Google Profiles, people123, and Jigsaw as its main sources. This is how theHarvester harvests all the information and gives us the desired results for our query, so craft your query wisely to harvest all the required information.

SHODAN

We previously discussed Shodan briefly in Chapter 4, but this unique search engine deserves much more than a paragraph on its usage and impact. As discussed earlier, Shodan is a computer search engine. The internet consists of many different types of devices connected online and available publicly. Most of these devices have a banner, which they send as a response to an application request sent by a client. Many if not most of these banners contain information that can be called sensitive in nature, such as server version, device type, authentication mode, etc. Shodan allows us to search for such devices over the internet and also provides filters to narrow down the results.

It is highly recommended to create an account to utilize this great tool, as it removes some of the restrictions imposed on free usage. After logging into the application we simply go to the dashboard at http://www.shodanhq.com/home. Here we can see some of the recent searches as well as popular searches made on the platform. This page also shows a quick reference to the filters we can use. Moving on, more popular searches are listed at http://www.shodanhq.com/browse. Here we can see various search queries which look quite interesting, such as webcam, default password, SCADA, etc. Clicking on one of these takes us directly to the result page and lists details of machines on the internet matching that keyword. The page http://www.shodanhq.com/help/filters shows the list of all the filters we can use in Shodan to perform a more focused search, such as country, hostname, port, etc., including the usual filters "+," "-," and "|."

FIGURE 6.8 Shodan popular searches.

FIGURE 6.9 Shodan filters.

Let's perform a simple search on Shodan for the keyword "webcam." Shodan finds more than 15,000 results for this keyword; though we cannot view all of them under the free package, what we get is enough to understand its reach and the availability of such devices on the internet. Some of these might be protected by an authentication mechanism such as a username and password, but some might be publicly accessible without any such mechanism.
We can simply find out by opening their listed IP addresses in our browsers (Warning: doing so may be illegal depending upon the laws of the country involved). We can further narrow down these results to a country using the "country" filter, so our new query is "webcam country:us", which gives us a list of webcams in the United States of America.

FIGURE 6.10 Shodan results for query "webcam."

To get a list of machines running the file transfer protocol (FTP) service and residing in India, we can use the query "port:21 country:in". We can also search for a specific IP address or range using the filter "net." Shodan provides a great deal of relevant information, and its application is limited only by the creativity of its users.

FIGURE 6.11 Shodan results for query "port:21 country:in."

Apart from this, Shodan also offers an API to integrate its data into our own applications. There are also some other services provided at a price which are worth a try for anyone working in the information security domain. Recently there has been a lot of development in Shodan and its associated services, which makes this product a must-try for information security enthusiasts.

SEARCH DIGGITY

In the last chapter we learned a lot about using the advanced search features of various search engines and also briefly discussed the term "Google Hacking." To use such techniques we need the list of operators available and have to type each query by hand to see if anything is vulnerable; but what if there were a tool with a database of such queries which we could simply run? Here enters Search Diggity. Search Diggity is a tool by Bishop Fox which has a huge set of options and a large database of queries for various search engines, allowing us to gather compromising information related to our target. It can be downloaded from http://www.bishopfox.com/resources/tools/google-hacking-diggity/attack-tools/. The basic requirement for its installation is Microsoft .NET Framework v4.

Once we have downloaded and installed the application, the things we need are the search IDs and API keys. These search IDs/API keys are required so that we can perform a larger number of searches without too many restrictions. We can find out how to get and use these keys in the contents section under the Help tab and also from some simple Google searches. Once all the keys (Google, Bing, Shodan, etc.) are in place we can move forward with the usage of the tool.
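For context, the kinds of queries such a tool automates are ordinary search engine dorks. The following are common, purely illustrative patterns and are not necessarily taken from Search Diggity's own lists:

intitle:"index of" "backup"
filetype:log inurl:"password"
filetype:sql "insert into" "values"

Running whole categories of such queries automatically, instead of typing them one at a time, is exactly the convenience a tool like Search Diggity offers.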