hacking


SUBMITTED BY: nesde.nokn0wn3ntity

DATE: June 15, 2016, 3:04 p.m.

FORMAT: Text only

SIZE: 73.3 kB

HITS: 358

  1. FIGURE 4.9
  2. SocialMention displaying results and associated statistics.
  3. Social Searcher (http://www.social-searcher.com/)
  4. Social Searcher is yet another social media search engine. It uses Facebook, Twitter and
  5. Google+ as its sources. The interface provided by this search engine is simple. Under
  6. the search tab the search results are distributed into three tabs based on the source,
  7. Introduction 65
  8. under these tabs the posts are listed with a preview, which is very helpful in identifying the ones relevant to us. As with SocialMention, we can also set up e-mail alerts.
  9. Under the analytics tab we can get the sentiment analysis, users, keywords,
  10. domains, and much more. One of the more interesting of these is the popular tab, which
  11. lists the results with the most interaction, such as likes, retweets, etc.
  12. TWITTER
  13. Twitter is one of the most popular social networking sites with huge impact. Apart
  14. from its usual microblogging functionality, it also lets us understand the reach and
  15. user base of any entity, which makes it a powerful tool for reconnaissance. Today it is
  16. widely used for market promotion as well as for analyzing the social landscape.
  17. Topsy (http://topsy.com/)
  18. Topsy is a tool which allows us to search and monitor Twitter. Using it we can check
  19. out the trend of any keyword over Twitter and analyze its reach. The interface is
  20. pretty simple and looks like a conventional search engine, except that the results are
  21. based only on Twitter. The results can be narrowed down to various timeframes such as 1 day, 30 days, etc. We can also filter the results to see only the
  22. images, tweets, links, videos, or influencers. There is another filter which allows us to
  23. see only results in specific languages. All in all Topsy is a great
  24. tool for market monitoring of specific keywords.
  25. FIGURE 4.10
  26. Topsy search.
  27. 66 CHAPTER 4 Search the Web—Beyond Convention
  28. Trendsmap (http://trendsmap.com/)
  29. Trendsmap is a great visual platform which shows trending topics in the form of keywords, hashtags, and Twitter handles from the Twitter platform over the world map.
  30. It is a great platform which uses visual representation of the trends to show
  31. what’s hot in a specific region of the world. Apart from showing this visual form of
  32. information it also allows us to search through this information by topic
  33. or location, which makes it easier for us to see only what we want.
  34. Tweetbeep (http://tweetbeep.com/)
  35. In its own words, Tweetbeep is like Google alerts for Twitter. It is a great service
  36. which allows us to monitor topics of interest on Twitter such as a brand name, product, or updates related to companies and even links. For market monitoring purposes
  37. it’s a great tool which can help us to respond quickly to topics of interest.
  38. Twiangulate (http://twiangulate.com/search)
  39. Twiangulate is a great tool which allows us to perform Twitter triangulations. Using
  40. it we can find the people who both follow and are followed
  41. by two different Twitter users. Similarly it also provides a feature to compare the
  42. reach of two users. It is a great tool to understand and compare the influence of different Twitter users.
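As a rough illustration of the triangulation idea, the common followers of two accounts are simply the intersection of two follower sets. The sketch below uses made-up handles and plain Python sets; the function names are our own and it does not call Twitter or Twiangulate.

```python
# Hypothetical follower lists; in practice these would come from an export.
followers_of_a = {"alice", "bob", "carol", "dave"}
followers_of_b = {"bob", "dave", "erin"}


def common_followers(first, second):
    """Return the accounts that follow both users, sorted for stable output."""
    return sorted(set(first) & set(second))


def reach_comparison(first, second):
    """Compare the audience sizes and the overlap of two follower sets."""
    first, second = set(first), set(second)
    return {
        "only_first": len(first - second),
        "only_second": len(second - first),
        "shared": len(first & second),
    }


print(common_followers(followers_of_a, followers_of_b))
print(reach_comparison(followers_of_a, followers_of_b))
```

The same set operations apply to the accounts that two users follow, which gives the other half of the triangulation.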
  43. SOURCE CODE SEARCH
  44. Most of the search engines we have used only look for the text visible on the web
  45. page, but there are some search engines which index the source code present on the
  46. internet. These kinds of search engines can be very helpful when we are looking for
  47. a specific technology used over the internet, such as a content management system
  48. like WordPress. Such search engines are useful for search engine optimization,
  49. competitive analysis, and keyword research for marketing, and are limited only by the
  50. creativity of the user.
  51. Due to storage and scalability issues there were earlier no service providers in
  52. this domain, but with technological advancements some options are opening up now.
  53. Let’s check out some of these.
  54. NerdyData (http://nerdydata.com)
  55. NerdyData is one of the first search engines of its kind; it allows us to
  56. search the code of web pages. Using the platform is pretty simple: go to the URL
  57. https://search.nerdydata.com/, enter a keyword like WordPress 3.7, and NerdyData
  58. will list the websites which contain that keyword in their source code. The
  59. results not only provide the URL of the website but also show the section of the
  60. code with the keyword highlighted under the section Source Code Snippet. Apart
  61. from this there are various features such as contact author, fetch backlink, and others
  62. which can be very helpful, but most of these are paid. Even so, the limited free usage of
  63. NerdyData is very useful and is worth a try.
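To see why a keyword like WordPress 3.7 can be found in page source at all, it helps to look at what such a search matches. WordPress sites commonly emit a generator meta tag in their HTML, and the sketch below pulls that tag out of a page's source with Python's standard html.parser module. The sample HTML and the names GeneratorFinder and find_generators are our own illustration, not part of NerdyData.

```python
from html.parser import HTMLParser


class GeneratorFinder(HTMLParser):
    """Collect the content of every <meta name="generator"> tag."""

    def __init__(self):
        super().__init__()
        self.generators = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "generator":
            self.generators.append(attributes.get("content") or "")


def find_generators(html_source):
    """Return the generator strings found in a page's source code."""
    finder = GeneratorFinder()
    finder.feed(html_source)
    return finder.generators


# Made-up page source for illustration only.
sample = (
    "<html><head>"
    '<meta name="generator" content="WordPress 3.7" />'
    "</head><body><p>Hello</p></body></html>"
)
print(find_generators(sample))
```

A source code search engine effectively runs this kind of match over the stored source of a very large number of pages.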
  65. FIGURE 4.11
  66. NerdyData code search results.
  67. Ohloh code (https://code.ohloh.net)
  68. Ohloh code is another great search engine for source code searching, but it’s
  69. a bit different in that it searches open source code. What this means
  70. is that its source of information is the code residing in open space, such as Git
  71. repositories.
  72. It provides great options to filter the results based on definitions, languages
  73. (programming), extensions, etc. through a bar on the left-hand side titled “Filter
  74. Code Results.”
  75. Searchcode (https://searchcode.com)
  76. Similar to Ohloh, Searchcode also uses open source code repositories as its information source. The search filters provided by Searchcode are very helpful; some of them
  77. are repository, source, and language.
  78. TECHNOLOGY INFORMATION
  79. In this special section we will be working with some unique search
  80. engines which will help us to gather information related to various technologies and much more. In this segment we will be dealing heavily with IP addresses
  81. and related terms, so it is advised to go through the section “Defining the basic terms”
  82. in the first chapter.
  84. Whois (http://whois.net/)
  85. Whois is basically a service which allows us to get information about the registrant
  86. of an internet resource such as a domain name. Whois.net provides a platform using
  87. which we can perform a Whois search for a domain or IP address. A whois record
  88. usually consists of registrar info; date of registration and expiry; registrant info such
  89. as name, e-mail address, etc.
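A Whois lookup can also be sketched in a few lines of code. The WHOIS protocol is plain text over TCP port 43: the client sends the name followed by a line break and reads until the server closes the connection. The sketch below assumes the IANA server whois.iana.org as a starting point, and the sample record is made up; real records vary by registry, so the parser only collects simple "Key: value" lines.

```python
import socket


def whois_query(name, server="whois.iana.org", port=43, timeout=10.0):
    """Send one WHOIS query and return the raw text reply.

    The default server is an assumption; registries run their own servers.
    """
    with socket.create_connection((server, port), timeout=timeout) as connection:
        connection.sendall(name.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            chunk = connection.recv(4096)
            if not chunk:  # the server closes the connection when done
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(record):
    """Collect 'Key: value' lines into a dict of lists (keys may repeat)."""
    fields = {}
    for line in record.splitlines():
        if line.startswith(("%", "#")):  # comment lines used by many servers
            continue
        key, separator, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if separator and key and value:
            fields.setdefault(key, []).append(value)
    return fields


# Made-up record for illustration; no network access is needed to parse it.
SAMPLE_RECORD = """\
Domain Name: EXAMPLE.TEST
Registrar: Example Registrar, Inc.
Creation Date: 2010-01-15T00:00:00Z
Registry Expiry Date: 2030-01-15T00:00:00Z
Name Server: NS1.EXAMPLE.TEST
Name Server: NS2.EXAMPLE.TEST
"""

fields = parse_whois(SAMPLE_RECORD)
print(fields["Registrar"], fields["Name Server"])
```

Calling parse_whois(whois_query("example.com")) would run the same parsing on a live reply, subject to the server's usage policy.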
  90. Robtex (http://www.robtex.com)
  91. Robtex is a great tool to find information about internet resources such as IP
  92. address, domain name, Autonomous System (AS) number, etc. The interface is
  93. pretty simple and straightforward. At the top left-hand corner is a search bar using
  94. which we can look up information. Searching for a domain gives us related information like IP address, route, AS number, location, etc. Similarly other information is
  95. provided for IP addresses, routes, etc.
  96. W3dt (https://w3dt.net/)
  97. W3dt is a great online resource to find networking-related information. There are
  98. various sections which we can explore using this single platform. The first section is
  99. domain name system (DNS) tools, which allows us to perform various DNS-related
  100. queries such as DNS lookup, reverse DNS lookup, DNS server fingerprinting, etc.
  101. The second section provides tools related to network/internet such as port scan, traceroute, MX record retriever, etc. The next section is web/HTTP, which consists of tools
  102. such as SSL certificate info, URL encode/decode, HTTP header retrieval, etc. Then
  103. comes the database lookups section, which includes MAC address lookup, Whois
  104. lookup, etc. Finally there are some general and ping-related tools. All in all it is
  105. a great set of tools which allows us to perform a huge list of useful functions
  106. under a single interface.
  107. Shodan (http://www.shodanhq.com/)
  108. So far we have used various types of search engines which help us to explore the web
  109. in many different ways. What we haven’t encountered till now is an internet search engine
  110. (remember the difference between web and internet explained in chapter 1) or, simply
  111. put, a computer search engine. Shodan is a computer search engine which scans the
  112. internet and grabs the service banner based on IP address and port. It allows us to search
  113. this information using IP addresses, country filters, and much more. Using it we can
  114. find simple information such as websites using a specific type of web server such as
  115. Internet Information Services (IIS) or Apache and also information which can be quite
  116. sensitive such as IP cameras without authentication or SCADA systems exposed over the internet.
  117. The free version without registration provides very limited information,
  118. a limit that a registered account eases a bit, yet it is sufficient to
  119. understand the power of this unique search engine. We can also use this
  120. tool through a browser add-on or through its application programming interface.
  121. Shodan has a very active development history and comes up with new features all the
  122. time, so we can expect much more from it in the future.
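To make the idea of a service banner concrete, here is a minimal sketch of grabbing one from a single host and port with Python's socket module. Many services (FTP, SSH, SMTP) send a greeting line as soon as a client connects, and that greeting is the banner. The function name grab_banner and its defaults are our own; this is not how Shodan itself is implemented, and it should only be pointed at hosts we are allowed to probe.

```python
import socket


def grab_banner(host, port, timeout=3.0, max_bytes=1024):
    """Connect to host:port and return whatever greeting the service sends.

    Returns an empty string if the service stays silent until the timeout.
    """
    with socket.create_connection((host, port), timeout=timeout) as connection:
        try:
            data = connection.recv(max_bytes)
        except socket.timeout:
            return ""
    return data.decode("utf-8", errors="replace").strip()
```

For example, grab_banner("192.0.2.10", 21) against an FTP server we control would typically return a line starting with 220; the address here is a documentation placeholder. A search engine of this kind stores such replies for very many address and port pairs and makes them searchable.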
  124. FIGURE 4.12
  125. Shodan results for port 21.
  126. WayBack Machine (http://archive.org/web/web.php)
  127. Internet Archive WayBack Machine is a great resource to look up how a website looked
  128. in the past. Simply type the website address into the search bar and it will return
  129. a timeline with the available snapshots highlighted on the calendar. Hovering
  130. over these highlighted dates on the calendar will present a link to the snapshot. This is
  131. a great tool to analyze how a website has evolved and thus monitor its past growth. It
  132. can also be helpful to retrieve information from a website which was available in the
  133. past but is not now.
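Snapshot links follow a regular pattern, so they can also be built by hand: the archive address, a 14-digit timestamp (year, month, day, hour, minute, second), and then the original URL. The helper below is a small sketch of that pattern; the function name is our own, and when no capture exists at the exact moment requested the archive normally redirects to the closest one it holds.

```python
from datetime import datetime


def wayback_snapshot_url(site, when):
    """Build a WayBack Machine link for `site` at or near the moment `when`."""
    stamp = when.strftime("%Y%m%d%H%M%S")  # e.g. 20140102030405
    return f"https://web.archive.org/web/{stamp}/{site}"


print(wayback_snapshot_url("http://example.com/", datetime(2014, 1, 2, 3, 4, 5)))
```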
  134. REVERSE IMAGE SEARCH
  135. We are all familiar with the phrase “A picture is worth a thousand words” and its veracity, and are also aware of platforms like Google Images (http://images.google.com),
  136. Flickr (https://www.flickr.com/), and Deviantart (http://www.deviantart.com/), which
  137. provide us images for the keywords provided. Usually when we need to look up some
  138. information, we have a keyword or a set of them in the form of text. Following the
  139. same lead, the search engines we have dealt with till now take text as input and
  140. get us the results. But if we have an image and want to see where it appears
  141. on the web, where do we go? This is where reverse image search engines come in,
  142. which take an image as input and look for its appearances on the web. Let’s get
  143. familiar with some of these.
  145. Google Images (http://images.google.com/)
  146. We are all aware that Google allows us to search the web for images, but what many
  147. of us are unaware of is that it also allows us to perform a reverse image search. We
  148. simply need to go to the URL http://images.google.com, click on the camera icon,
  149. and provide the URL of the image on the web or upload a locally stored image file.
  150. We can also drag and drop an image file into the search bar, and voila, Google comes
  151. up with links to the pages containing that or similar images on the web.
  152. FIGURE 4.13
  153. Google reverse image search.
  154. TinEye (https://www.tineye.com/)
  155. TinEye is another reverse image search engine and has a huge database of images.
  156. Similar to Google Images, searching on TinEye is very simple: we can provide the
  157. URL to the image, upload it, or perform a drag and drop. TinEye also provides
  158. browser plugins for major browsers, which makes the task much easier. Though the
  159. results of TinEye are not as comprehensive as those of Google Images, it provides a great
  160. platform for the task and is worth trying.
  161. ImageRaider (http://www.ImageRaider.com/)
  162. Last but not least in this list is ImageRaider. ImageRaider simply lists the
  163. results domain-wise. If a domain contains more than one occurrence of the
  165. image then it also reports that, and the links to those images are listed under the
  166. domain name.
  167. Reverse image search can be very helpful to find out more about someone when
  168. we are hitting dead ends using conventional methods. As many people use the same
  169. profile picture on various platforms, a reverse image search can
  170. lead us to other platforms where the user has created a profile and to previously
  171. undiscovered information.
  172. MISCELLANEOUS
  173. We dealt with a huge list of search engines which specialize in their domain and
  174. are popular among a community. In this section we will be dealing with some different types of search platforms which are lesser known but serve unique purposes and
  175. are very helpful in special cases.
  176. DataMarket (http://datamarket.com/)
  177. DataMarket is an open portal which consists of large data sets and presents the data
  178. clearly through visualizations. The simple search feature provides results
  179. for global topics with a list of different visualizations related to the topic. For example,
  180. searching for the keyword gold would provide results such as gold statistics, import/
  181. export of gold, and much more. The results page has a bar on the left which
  182. provides a list of filters with which the listed results can be narrowed down. It also
  183. allows us to upload our own data and create visualizations from it. Refer to the link
  184. http://datamarket.com/topic/list/ for a huge list of topics on which DataMarket provides information.
  185. WolframAlpha (http://www.wolframalpha.com/)
  186. In this chapter we learned about various search engines which take some value as
  187. input and provide us with links which might contain the answer to the questions
  188. we are actually asking, but what we are going to learn about now is not a search
  189. engine but a computational knowledge engine. What this means is that it takes our
  190. queries as input but does not provide us with the URLs of websites containing the
  191. information. Instead it tries to understand our natural language queries and, based
  192. upon an organized data set, provides a factual answer to them in the form of text and
  193. sometimes an apposite visualization as well.
  194. Say, for example, we want to know the purpose of the .mil domain. We can simply type in the query “what is the purpose of the .mil internet domain?” and get
  195. the results. To get the words starting with a and ending with e, a query like “words
  196. starting with a and ending with e” would give us the results. We can even check the
  197. net worth of Warren Buffett with a query like “Warren Buffett net worth.” For more
  198. examples of the queries in various domains that WolframAlpha is able to answer,
  199. check out the page http://www.wolframalpha.com/examples/.
  201. FIGURE 4.14
  202. WolframAlpha result.
  203. Addictomatic (http://addictomatic.com)
  204. Usually we visit various different platforms to search for information related to a topic,
  205. but Addictomatic aggregates various news and media sources to create a single dashboard for any topic of interest. The content aggregated is displayed in various
  206. sections depending upon the source. It also allows us to move these sections around according to our preference for better readability.
  207. Carrot2 (http://search.carrot2.org/stable/search)
  208. Carrot2 is a search results clustering engine. What this means is that it takes
  209. search results from other search engines and organizes these results into topics
  210. using its search results clustering algorithms. Its unique capability to cluster
  211. the results into topics gives us a better understanding of a query and its associated terms. These clusters are also represented in different interesting forms
  212. such as folders, circles, and FoamTree. Carrot2 can be used through its web
  213. interface, which can be accessed using the URL http://search.carrot2.org/,
  214. and also through a software application which can be downloaded from
  215. http://project.carrot2.org/download.html.
  217. FIGURE 4.15
  218. Carrot2 search result cluster.
  219. Boardreader (http://boardreader.com/)
  220. Boards and forums are a rich source of information as a lot of interaction and Q&A goes
  221. on in places like this. Members of such platforms range from newbies to experts in the
  222. domain to which the forum is related. In places like this we can get answers to questions
  223. which are difficult to find elsewhere, as they consist purely of user-generated content,
  224. but how do we search them? The answer is Boardreader. It allows us to search
  225. forums to get results which contain content with human interaction. It also displays a
  226. trend graph of the search query keyword to show the amount of activity related to it. The
  227. advanced search features it provides, such as sort by relevance, occurrence between
  228. specific dates, domain-specific search, etc., add to its already incredible features.
  229. Omgili (http://omgili.com/)
  230. Similar to Boardreader, Omgili is also a forum and boards search engine. It displays
  231. the results in the form of broad bars, and these bars contain information such as date,
  232. number of posts, author, etc., which can be helpful in estimating the relevance of the
  233. result. One such item is Thread Info, which provides further information about
  234. a thread such as forum name, number of authors, and replies to the thread, without
  235. actually visiting the original thread forum page. It also allows us to filter the results
  236. based upon the timeline of their occurrence such as past month, week, day, etc.
  238. Truecaller (http://www.truecaller.com)
  239. Almost everyone who uses or has ever used a smartphone is familiar with the concept
  240. of mobile applications, better known as apps, and many if not most of them have used
  241. the famous app called Truecaller, which helps to identify the person behind a phone
  242. number. What many of us are unaware of is that it can also be used through a web
  243. browser. Truecaller simply allows us to search using a phone number and provides
  244. the user’s details using its crowdsourced database.
  245. So we discussed a huge list of various search engines under various categories
  246. which are not conventionally used but, as we have already seen, are very useful
  247. in different scenarios. We are all addicted to Google for all our searching needs, and
  248. being one of the best in its domain it has served our purpose most of the time, but
  249. sometimes we need different and specific answers to our queries, and then we need these
  250. kinds of search engines. This list tries to cover most aspects of daily searching
  251. needs, yet surely there must be other platforms which need to be found and used
  252. commonly to solve specific problems.
  253. In this chapter we learned about various unconventional search engines, their
  254. features, and functionalities, but what about the conventional search engines like
  255. Google, Bing, Yahoo, etc. that we use on a daily basis? Oh! we already know how to
  256. Other search engines worth trying:
  257. • Meta search engine
  258.   • Search (http://www.search.com/)
  259. • People search
  260.   • ZabaSearch (http://www.zabasearch.com/)
  261. • Company search
  262.   • Hoovers (http://www.hoovers.com/)
  263.   • Kompass (http://kompass.com/)
  264. • Semantic
  265.   • Sensebot (http://www.sensebot.net/)
  266. • Social media search
  267.   • Whostalkin (http://www.whostalkin.com/)
  268. • Twitter search
  269.   • Mentionmapp (http://mentionmapp.com/)
  270.   • SocialCollider (http://socialcollider.net/)
  271.   • GeoChirp (http://www.geochirp.com/)
  272.   • Twitterfall (http://beta.twitterfall.com/)
  273. • Source code search
  274.   • Meanpath (https://meanpath.com)
  275. • Technology search
  276.   • Netcraft (http://www.netcraft.com/)
  277.   • Serversniff (http://serversniff.net)
  278. • Reverse image search
  279.   • NerdyData image search (https://search.nerdydata.com/images)
  280. • Miscellaneous
  281.   • Freebase (http://www.freebase.com/)
  283. use them or do we? The search engines we use on a daily basis have various advanced
  284. features which many users are unaware of. These features allow users to filter
  285. the results so that we can get more information and less noise. In the next chapter
  286. we will be dealing with conventional search engines and will learn how to use them
  287. effectively to perform better searches and get specific results.
  288. Hacking Web Intelligence. http://dx.doi.org/10.1016/B978-0-12-801867-5.00005-7 77
  289. Copyright © 2015 Elsevier Inc. All rights reserved.
  290. CHAPTER 5
  291. Advanced Web Searching
  292. INFORMATION IN THIS CHAPTER
  293. • Search Engines
  294. • Conventional Search Engines
  295. • Advanced Search Operators of various Search Engines
  296. • Examples and Usage
  297. INTRODUCTION
  298. In the last chapter we dealt with some special platforms which allowed us to perform domain-specific searches; now let’s go into the depths of the conventional search
  299. engines which we use on a daily basis and check out how we can utilize them more
  300. efficiently. In this chapter we will understand the working and advanced
  301. search features of some of the well-known search engines and see what functionalities and filters they provide to serve us better.
  302. We already have a basic idea of what a search engine is and how it crawls
  303. the web to collect information, which is then indexed to provide us with search
  304. results. Let’s revise it once and understand it in more depth.
  305. Web pages as we see them are not actually what they look like. Web pages basically contain HyperText Markup Language (HTML) code and most of the time
  306. some JavaScript and other scripting languages. HTML is a markup language and uses tags to structure the information; for example, the tag <h1></h1>
  307. is used to create a heading. When we receive this HTML code from the server, our
  308. browsers interpret this code and display the web page in its rendered form. To
  309. check the client-side source code of a web page, simply press Ctrl+U in the browser
  310. with a web page open.
  311. Once the web crawler of a search engine reaches a web page, it goes through
  312. its HTML code. Most of the time these pages also contain links to other
  313. pages, which are used by the crawlers to move further in their quest to collect
  314. data. The content crawled by the web crawler is then stored and indexed by the
  315. search engine based on a variety of factors. The pages are ranked based upon their
  316. structure (as defined in HTML), the keywords used, interlinking of the pages,
  317. media present on the page, and many other details. Once a page has been crawled
  318. and indexed it is ready to be presented to the user of the search engine depending
  319. upon the query.
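The crawling step described above can be sketched with Python's standard library alone. The example parses one page's HTML, records its h1 headings, and collects the links a crawler would queue up next. The class name PageScanner, the helper scan_page, and the sample page are our own; a real crawler adds fetching, politeness rules, de-duplication, and ranking, none of which is shown here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageScanner(HTMLParser):
    """Collect h1 headings and outgoing links from one HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.headings = []
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own address.
                self.links.append(urljoin(self.base_url, href))
        elif tag == "h1":
            self._in_heading = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_heading = False

    def handle_data(self, data):
        if self._in_heading:
            self.headings[-1] += data


def scan_page(base_url, html_source):
    """Return (links, headings) found in one page's source."""
    scanner = PageScanner(base_url)
    scanner.feed(html_source)
    return scanner.links, [heading.strip() for heading in scanner.headings]


# Made-up page for illustration only.
page = (
    "<html><body><h1>Welcome</h1>"
    '<a href="/about">About</a> '
    '<a href="http://other.example/x">Elsewhere</a>'
    "</body></html>"
)
links, headings = scan_page("http://example.com/index.html", page)
print(headings)
print(links)
```

The list of links is what lets the crawler move on to further pages, while the headings are one small example of the structural signals an index can store.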
  320. 78 CHAPTER 5 Advanced Web Searching
  321. Once a page has been crawled, the job of the crawler does not finish for that page.
  322. The crawler is scheduled to perform the complete process again after a specific time
  323. as the content of the page might change. So this process keeps on going, and as new
  324. pages are linked they are also crawled and indexed.
  325. Search engines are a huge industry in themselves which helps us in our web exploration,
  326. but there is another industry which depends directly on search engines, and that is
  327. search engine optimization (SEO). SEO is basically about increasing the rank of
  328. a website/web page, or in other words bringing it up to the first result pages of a
  329. search engine. The motivation behind this is that it will increase the visibility of that
  330. page/site, which will hence get more traffic, which can be helpful from a commercial or
  331. personal point of view.
  332. Now that we have a good understanding of search engines and how they operate,
  333. let’s move ahead and see how we can better use some of the conventional search
  334. engines.
  335. GOOGLE
  336. Google is one of the most widely used search engines and is the starting point for
  337. web exploration for most of us. Initially Google search was accessible through a very
  338. simple interface and provided limited information. Apart from the search box there
  339. were some special search links, links about the company, and a subscription box
  340. where we could enter our email to get updates. There were no ads, no different language options, no login, etc.
  341. It’s not only the look and feel of the interface that has changed over the years but
  342. also the functionalities. It has evolved from providing simple web links to the pages
  343. containing relevant information to a whole bunch of related tools which not only
  344. allow us to search different media types and categories but also narrow down these
  345. results using various filters. Today there are various categories of search results such
  346. as images, news, maps, videos, and much more. This plethora of functionalities
  347. provided by Google today has certainly made our lives much easier and made the act
  348. of finding information on the web a piece of cake. Still, sometimes we face difficulty
  349. in finding the exact information we are looking for, and the main reason behind it is
  350. not the lack of information but, to the contrary, the abundance of it.
  351. Let’s move on to see how we perform a Google search and how to improve it.
  352. Whenever we need to search for something in Google we simply think of some of the
  353. keywords associated with it, type them into the search bar, and hit Enter. Based
  354. upon the indexing Google simply provides us with the associated resources. Now if
  355. we want to get better results or filter the existing results based upon various factors,
  356. we need to use Google advanced search operators. Let’s have a look at these operators and their usage.
  357. site:
  358. It fetches results only for the site provided. It is very useful when we want to limit our
  359. search to some specific domain. It can be used with another keyword and Google
  360. Google 79
  361. will bring back related pages from the site specified. From an information security
  362. perspective it is very useful for finding different subdomains related to a particular
  363. domain.
  364. Examples: site:gov, site:house.gov
  365. FIGURE 5.1
  366. Google “site” operator usage.
  367. inurl:
  368. This operator allows looking for keywords in the uniform resource locator (URL) of
  369. the site. It is useful to find pages which use a common keyword for specific pages,
  370. such as contact us. Generally, as the URL contains some keywords associated with
  371. the body contents, it will help us to find the equivalent page for the keyword we
  372. are searching for.
  373. Example: inurl:hack
  374. allinurl:
  375. Similar to “inurl” this operator allows looking for multiple keywords in the URL. So
  376. we can search for multiple keywords in the URL of a page. This also enhances the
  377. chances of getting quality content of what we are looking for.
  378. Example: allinurl:hack security
  379. intext:
  380. This operator makes sure that the keyword specified is present in the text of the page.
  381. Sometimes, just for the sake of SEO, some pages contain keywords only
  382. to enhance the page rank but not the associated content. In that case we can use this
  384. query parameter to get the appropriate content from a page for the keyword we are
  385. looking for.
  386. Example: intext:hack
  387. allintext:
  388. Similar to “intext”, this operator allows us to look up multiple keywords in the
  389. text. As we discussed earlier the feature of searching for multiple keywords always
  390. enhances the content quality in the result page.
  391. Example: allintext:data marketing
  392. intitle:
  393. It allows us to restrict the results by the keywords present in the title of the pages
  394. (title tag: <title>XYZ</title>). It can be helpful to identify pages which follow a convention for the title of the pages such as directory listing by the keywords “index of”
  395. and most of the sites provide the keywords in the title for improving the page rank.
  396. So this query parameter always helps to search for a particular keyword.
  397. Example: intitle:blueocean
  398. allintitle:
  399. This is the multiple keyword counterpart of “intitle” operator.
  400. Example: allintitle:blueocean market
  401. filetype:
  402. This operator is used to find files of a specific kind. It supports multiple file types
  403. such as pdf, swf, kml, doc, svg, txt, etc. This operator comes in handy when we are only
  404. looking for a specific type of file on a specific domain.
  405. Example: filetype:pdf, site:xyz.com, filetype:doc
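Operators can be combined in a single query, for instance to look for PDF files on one domain. The sketch below composes such a query string and turns it into a Google search URL using only the standard library. The helper names build_query and google_search_url are our own, and xyz.com is the placeholder domain from the example above.

```python
from urllib.parse import urlencode


def build_query(keywords="", **operators):
    """Join operator:value pairs and free keywords into one query string."""
    parts = [f"{name}:{value}" for name, value in operators.items()]
    if keywords:
        parts.append(keywords)
    return " ".join(parts)


def google_search_url(query):
    """Return the search URL for a query, with the query URL-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_query("annual report", site="xyz.com", filetype="pdf")
print(query)
print(google_search_url(query))
```

Typing the printed query straight into the search box has the same effect as opening the URL.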
  406. ext:
  407. The operator ext simply stands for extension and it works similarly to the filetype
  408. operator.
  409. Example: ext:pdf
  410. define:
  411. This operator is used to find the meaning of the keyword supplied. Google returns
  412. the dictionary meaning and synonyms for the keyword.
  413. Example: define:data
  414. AROUND
  415. This operator is helpful when we are looking for the results which contain two
  416. different keywords, but in close association. It allows us to restrict the number
  418. of words as the maximum distance between two different keywords in the search
  419. results.
  420. Example: A AROUND(6) Z
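To get a feel for what a proximity restriction means, the sketch below checks a piece of text locally: it reports whether two words occur within a given number of word positions of each other. This is only one plausible reading of the distance rule, written for illustration; it is not Google's implementation, and the function name within_distance is our own.

```python
import re


def within_distance(text, first, second, max_words):
    """True if `first` and `second` occur at most `max_words` positions apart."""
    words = re.findall(r"\w+", text.lower())
    first_positions = [i for i, word in enumerate(words) if word == first.lower()]
    second_positions = [i for i, word in enumerate(words) if word == second.lower()]
    return any(
        abs(a - b) <= max_words
        for a in first_positions
        for b in second_positions
    )


sentence = "Data drives the modern market"
print(within_distance(sentence, "data", "market", 3))
print(within_distance(sentence, "data", "market", 4))
```

Here the two words are four positions apart, so the check fails with a limit of 3 and passes with a limit of 4.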
  421. AND
  422. A simple Boolean operator which makes sure the keywords on both sides are present
  423. in the search results.
  424. Example: data AND market
  425. OR
  426. Another Boolean operator which provides search results that contain either of the
  427. keywords on the two sides of the operator.
  428. Example: data OR intelligence
  429. NOT
  430. Yet another Boolean operator, which excludes the search results that contain the keyword that follows it.
  431. Example: lotus NOT flower
  432. “”
  433. This operator is useful when we need to search for the results which contain the
  434. provided keyword in the exact sequence. For example we can search pages which
  435. contain quotes or some lyrics.
  436. Example: “time is precious”
  437. -
  438. This operator excludes the search results which contain the keyword that follows it
  439. (no space).
  440. Example: lotus -flower
  441. *
  442. This wildcard operator is used as a generic placeholder for the unknown term.
  443. We can use this to get quotes which we partially remember or to check variants
  444. of one.
  445. Example: “* is precious”
  446. ..
  447. This special operator is used to provide a number range. It is quite useful to enforce
  448. a price range, time range (date), etc.
  449. Example: japan volcano 1990..2000
  451. info:
  452. The info operator provides the information that Google has on a specific domain. Links
  453. to different types of information are present in the results, such as cache, similar
  454. websites, etc.
  455. Example: info:elsevier.com
  456. related:
  457. This operator is used to find other web pages similar to the provided domain. It
  458. is very helpful when we are looking for websites which provide services similar to those of a
  459. website, or when we want to find its competitors.
  460. Example: related:elsevier.com
  461. cache:
  462. This operator redirects to the latest cache of the page that Google has crawled. In case we
  463. don’t get a result for a website which was accessible earlier, this is a good option to try.
  464. Example: cache:elsevier.com
  465. Advanced Google search can also be performed using the page
  466. http://www.google.com/advanced_search, which allows us to perform restricted
  467. search without using the operators mentioned above.
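The operators above are plain text, so a query can be assembled programmatically before it is pasted into the search box. The following is a minimal Python sketch, not an official Google interface: the helper names google_query, around, number_range, and search_url are hypothetical, and the URL form https://www.google.com/search?q=... is an assumption about how the search page is commonly addressed.

```python
from urllib.parse import urlencode


def around(first, distance, second):
    """Proximity search: both keywords within `distance` words of each other."""
    return f"{first} AROUND({distance}) {second}"


def number_range(low, high):
    """Number range (years, prices, ...) using the `..` operator."""
    return f"{low}..{high}"


def google_query(*terms, site=None, filetype=None, exclude=()):
    """Join plain terms with optional site:, filetype: and - operators."""
    parts = list(terms)
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    # The exclusion operator takes no space between "-" and the keyword.
    parts.extend(f"-{word}" for word in exclude)
    return " ".join(parts)


def search_url(query):
    """Assumed URL form; urlencode escapes ':' and turns spaces into '+'."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(google_query("report", site="xyz.com", filetype="pdf"))
    print(google_query("lotus", exclude=["flower"]))
    print(around("A", 6, "Z"))
    print(google_query("japan volcano", number_range(1990, 2000)))
    print(search_url(google_query("report", site="xyz.com", filetype="pdf")))
```

Keeping each operator in its own small function makes it easy to combine them, which is the way the operators are used in the rest of this chapter.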
  468. FIGURE 5.2
  469. Google advanced search page.
  470. Apart from the operators, Google also provides some operations which allow us to
  471. check information about current events and also perform some other useful things.
  472. Some examples are:
  474. time
  475. Simply entering this keyword displays the current time at our location. We can also add the name of a region to get its current time.
  476. Example: time france
  477. weather
  478. This keyword shows the current weather conditions at our location. Similar to the
  479. “time” keyword, we can also use it to get the weather conditions of a different region.
  480. Example: weather sweden
  481. Calculator
  482. Google also evaluates mathematical expressions and provides a calculator.
  483. Example: 39*(9823-312)+44/3
  484. Converter
  485. Google can be used to perform conversions for different types of units, such as units of measurement, currency, time, etc.
  486. Example: 6 feet in meters
  487. That is not all; sometimes Google also shows relevant information related to
  488. global events as and when they happen, for example, the FIFA World Cup.
  489. Apart from searching the web in general, Google also allows us to search specific categories such as images, news, videos, etc. All these categories, including web,
  490. have some common search filters and some specific ones of their own. These options can
  491. be accessed simply by clicking on the “Search tools” tab just below the search bar.
  492. For web results we can find options which allow us to restrict the results based upon the country and the
  493. time of publication; for images there are options like the color of the image, its type,
  494. usage rights, etc.; and similarly there are other relevant filters for different categories. These
  495. options can be very helpful in finding the required information in a category, as they
  496. are designed specifically for that category. For example, if we are looking for an
  497. old photograph of something, it is a good idea to see only the results which are black
  498. and white.
  499. The operators we discussed are certainly very useful for anyone who needs to find
  500. some information on the web, but the InfoSec community has taken them to the next
  501. level. These simple and innocent operators are widely used in the cyber
  502. security industry to find and demonstrate how, without even touching the target system,
  503. critical and compromising information can be retrieved. This technique of using Google
  504. search engine operators to find such information is termed “Google Hacking.”
  505. When it comes to “Google Hacking,” one name that jumps to mind is Johnny
  506. Long. Johnny was an early adopter and pioneer in the field of creating Google
  507. queries which could provide sensitive information related to the target. These queries
  508. are widely known by the name Google Dorks.
  509. Let’s understand how this technique works. We saw a number of operators which
  510. can narrow down search results to a specific domain, filetype, title value, etc. Now
  512. in Google Hacking our motive is to find sensitive information related to the target;
  513. for this, people have come up with various signatures for different files and
  514. pages which are known to contain such information. For example, let’s just say we
  515. know the name of a sensitive directory which should not be directly accessible to
  516. any user publicly, but remains public by default after the installation of the related
  517. application. So now if we want to find the sites which have not changed the
  518. accessibility of this directory, we can simply use the query “inurl:/sensitive_directory_name/” and we will get a bunch of websites which haven’t changed the setting.
  519. Now if we want to narrow it down further to a specific website, we can combine
  520. the query with the operator “site,” as “site:targetdomain.com inurl:/sensitive_directory_name/.” Similarly we can find sensitive files that exist on a website
  521. by using the operators “site” and “filetype” in combination.
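The two dork patterns just described can be sketched in Python as follows. This is only an illustration of how the query strings are composed: the function names directory_dork and file_dork are hypothetical, and sensitive_directory_name and targetdomain.com are the placeholders used in the text, not real signatures.

```python
def directory_dork(directory, site=None):
    """Build an inurl: query for a directory, optionally limited to one site."""
    # Normalise the directory so it is always wrapped in single slashes.
    path = "/" + directory.strip("/") + "/"
    query = f"inurl:{path}"
    if site:
        query = f"site:{site} {query}"
    return query


def file_dork(site, filetype, keyword=None):
    """Combine site: and filetype:, with an optional signature keyword."""
    query = f"site:{site} filetype:{filetype}"
    if keyword:
        query = f"{query} {keyword}"
    return query


if __name__ == "__main__":
    print(directory_dork("sensitive_directory_name"))
    print(directory_dork("sensitive_directory_name", site="targetdomain.com"))
    print(file_dork("targetdomain.com", "pdf"))
```

The first call gives the broad query across all indexed sites, and the second narrows the same signature down to a single target domain.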
  522. Let’s take another example of Google Hacking which can help us to discover a high-severity
  523. vulnerability in a website. Many developers use Flash to make websites more
  524. interactive and visually appealing. Small web format (SWF) is a Flash file format used
  525. to create such multimedia. There are many SWF players known to be vulnerable to
  526. cross-site scripting (XSS), which could lead to an account compromise. If we want
  527. to find out whether the target domain is vulnerable to such an attack, we can simply put in
  528. the query “site:targetdomain.com filetype:swf SWFPlayer_signature_keyword” and test
  529. the resulting pages using publicly available payloads to verify. There are a huge number
  530. of signatures to find various types of pages such as sensitive directories, web server
  531. identification, files containing usernames/passwords, admin login pages, and much more.
  532. The Google Hacking Database created by Johnny Long can be found at
  533. http://www.hackersforcharity.org/ghdb/. Though it is no longer updated, it is a great place
  534. to understand and learn how we can use Google to find sensitive information. A
  535. regularly updated version can be found at http://www.exploit-db.com/google-dorks/.
  536. FIGURE 5.3
  537. Google hacking database- www.exploit-db.com/google-dorks/.
  539. BING
  540. Microsoft has been providing search engine solutions for a long time, and they
  541. have been known under different names. Bing is the latest and most feature-rich search
  542. engine in this series. Unlike its predecessors, Bing provides a cleaner and simpler
  543. interface. As Microsoft covers a major part of the operating system market, the general
  544. perception among users is that Bing is just another side product from a technology giant, and hence most of them do not take it seriously.
  545. That perception is wrong. Like all search engines, Bing has some unique features that make it the right choice when we need them, and those
  546. features leave a distinct mark on how we search. We will discuss not only the
  547. special features but also the general operators which allow us to understand the
  548. search engine and its functionalities.
  549. +
  550. This operator works much the same in all search engines. It allows a user to
  551. force one or more keywords into a search query. Bing will make sure that the
  552. keywords that come after the + operator are present in the result pages.
  553. Example: power +search
  554. -
  555. This operator is also known as the NOT operator. It is used to exclude something from
  556. a set of things, such as excluding a dish from a cuisine.
  557. Example: Italian food -pizza
  558. Here Bing will display results for Italian food but not pizza. We can write
  559. this in another form which fetches the same result, as in the example below.
  560. Example: Italian food NOT pizza
  561. “”
  562. This also works the same in most search engines. It is used to search for the exact phrase
  563. placed inside double quotation marks.
  564. Example: “How to do Power Searching?”
  565. |
  566. This is also known as the OR operator, mostly used for getting results for any one of the two
  567. or more keywords joined with this operator.
  568. Example: ios | android
  569. ios OR android
  571. &
  572. This operator is also known as the AND operator. It is the search operator used by
  573. default. If we simply enter multiple keywords, Bing will perform an AND
  574. search in the backend and give us the result.
  575. Example: power AND search
  576. power & search
  577. As AND is the default, it is very important to keep in mind that unless
  578. we write OR and NOT in capitals, Bing will not treat them as operators.
  579. ()
  580. This can be called the group operator.
  581. As parentheses have the highest priority, we can place lower-priority operators
  582. such as OR inside them and create a group query so that the lower-priority operators are executed
  583. first.
  584. Example: android phone AND (nexus OR xperia)
  585. site:
  586. This operator helps to search for a particular keyword within a specific website. It
  587. works much the same in most search engines.
  588. Example: site:owasp.org clickjacking
  589. filetype:
  590. This allows a user to search for data in a specific type of file. Bing supports most file
  591. types; those supported by Google are mostly also supported by Bing.
  592. Example: hack filetype:pdf
  593. ip:
  594. This unique operator provided by Bing allows us to search web pages based upon
  595. IP address. Using it we can perform a reverse IP search, which means it allows us to
  596. look for pages hosted on the specified IP.
  597. Example: ip:176.65.66.66
  598. Bing evaluates operators in the following order of precedence.
  599. ()
  600. “”
  601. NOT/-
  602. AND/&
  603. OR/|
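To see why the grouping operator matters under this ordering, the following Python sketch models a page as a set of words and checks whether it would satisfy the query with and without parentheses. It is only a model of keyword presence, not Bing's actual matching or ranking; the function names and sample pages are hypothetical. It relies on the fact that Python's own not, and, or operators bind in the same relative order as the NOT, AND, OR entries listed above.

```python
def matches_ungrouped(words):
    """android phone AND nexus OR xperia (no parentheses).

    AND binds tighter than OR, so this reads as
    (android AND phone AND nexus) OR xperia.
    """
    return ("android" in words and "phone" in words and "nexus" in words
            or "xperia" in words)


def matches_grouped(words):
    """android phone AND (nexus OR xperia)."""
    return ("android" in words and "phone" in words
            and ("nexus" in words or "xperia" in words))


if __name__ == "__main__":
    camera_review = {"xperia", "camera", "review"}
    phone_page = {"android", "phone", "xperia"}
    for page in (camera_review, phone_page):
        print(sorted(page), matches_ungrouped(page), matches_grouped(page))
```

A page that only mentions xperia satisfies the ungrouped query but not the grouped one, which is why the parentheses are needed when OR is meant to apply to the last two keywords only.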
  605. feed:
  606. Yet another unique operator provided by Bing is feed, which allows us to look for
  607. web feed pages containing the provided keyword.
  608. One other feature that Bing provides is to perform social search using the page
  609. https://www.bing.com/explore/social. It allows us to connect our social network
  610. accounts with Bing and perform search within them.
  611. FIGURE 5.4
  612. Bing “ip” search.
  613. FIGURE 5.5
  614. Bing social search.
  616. YAHOO
  617. Yahoo is one of the oldest players in the search engine arena and has been quite popular. The search page for Yahoo also has a lot of content such as news, trending topics,
  618. weather, financial information, and much more. Earlier, Yahoo used third-party
  619. services to power its search capabilities; later it shifted to become independent, and
  620. it has once again joined forces, this time with Bing, for its searching services. Though Yahoo
  621. does not offer as much in terms of advanced searching as other
  622. search engines, the operators it does provide are worth trying. Let’s see
  623. some of the operators that can be useful.
  624. +
  625. This operator is used to make sure the search results contain the keyword that follows it.
  626. Example: +data
  627. -
  628. Opposite to the “+” operator, this operator is used to exclude a specific keyword
  629. from the search results.
  630. Example: -info
  631. OR
  632. This operator allows us to get results for either of the keywords supplied.
  633. Example: data OR info
  634. site:
  635. This operator restricts the results to the site provided. We will only get
  636. to see the links from the specified website. There are two other operators, domain
  637. and hostname, which work like this operator but do not provide results as accurate or
  638. in-depth. Their usage is similar to the “site” operator.
  639. Example: site:elsevier.com
  640. link:
  641. It is another interesting operator which allows us to look up web pages which link to
  642. the specific web page provided. While using this operator, keep in mind to provide
  643. the URL with the protocol (http:// or https://).
  645. Example: link:http://www.elsevier.com/
  646. define:
  647. We can use this operator to find out the dictionary meaning of a word.
  648. Example: define:data
  649. intitle:
  650. The “intitle” operator is used to get the results which contain the specified keyword
  651. in their title tag.
  652. Example: intitle:data
  653. So these are the operators which Yahoo supports. Apart from these we can access
  654. the Yahoo advanced search page at http://search.yahoo.com/search/options?fr=fp-top&p=,
  655. which allows us to achieve well-filtered search results. One other thing
  656. that Yahoo offers is advanced news search, which can be performed using the page
  657. http://news.search.yahoo.com/advanced.
  658. FIGURE 5.6
  659. Yahoo “link” search.
  661. FIGURE 5.7
  662. Yahoo advanced search page.
  663. YANDEX
  664. Yandex is a Russian search engine and is not very popular outside the country,
  665. but it is one of the most powerful search engines available. Like Google, Bing, and Yahoo,
  666. it has its own unique keywords and indexed data. Yandex is the most popular and
  667. widely used search engine in Russia. It’s the fourth largest search engine in the world.
  668. Apart from Russia, it is also used in countries like Ukraine, Kazakhstan, Turkey, and
  669. Belarus. It is also the most underrated search engine, as its use is limited to specific
  670. countries, but in the security community we see it otherwise. Most people are either
  671. happy with their conventional search engine or think all the information on the internet
  672. is available in the search engine they are using. But the fact is that search engines like
  673. Yandex also have many unique features that can provide us with far more efficient results
  674. than other search engines.
  675. Here we will discuss how Yandex can be a game changer in searching data on
  676. the internet and how to use it efficiently.
  677. As discussed earlier, like other search engines, Yandex has its own operators such
  678. as lang, parentheses, Boolean operators, and so on. Let’s get familiar with these operators and
  679. their usage.
  680. +
  681. This operator works much the same in all search engines. In Yandex too, the +
  682. operator is used to include a keyword in a search result page. The keyword added
  683. after the + operator is the primary keyword in the search query. The results fetched by the
  684. search engine must contain that keyword.
  686. Example: osint +tools
  687. Here the result page might not contain the keyword osint but must contain the keyword
  688. tools. So when we want to focus on a particular keyword or set of keywords in
  689. Yandex, we must use the + operator.
  690. ~~
  691. This is the NOT operator, which is used to exclude a keyword from a search result
  692. page. It can be used to exclude a particular thing from a set of things. Let’s say
  693. we want to buy a mobile phone but not a Windows phone. Then we can craft a query
  694. that removes Windows phones from the search results by using the ~~ operator.
  695. Example: mobile phone ~~ windows
  696. Unlike the ~~ operator, ~ excludes a keyword not from the search result page but from the
  697. search result sentence. That means a page may contain all the keywords present
  698. in the query, but the excluded keyword must not be in any sentence with the
  699. other keywords mentioned. I understand this is a little complicated, so let me explain
  700. simply. Let’s start with the above query
  701. mobile phone ~~ windows
  702. Here if a page contains both mobile phone and windows, Yandex will
  703. exclude that page from the search results.
  704. Example: mobile phone ~ windows
  705. But for the example shown above, it will show all the pages that contain both
  706. mobile phone and windows, but not if these keywords are in the same sentence.
  707. &&
  708. The && operator is used to show pages that contain both keywords in the search
  709. results.
  710. Example: power && searching
  711. It will provide all the pages that contain both these keywords.
  712. &
  713. This operator is used to show only pages that contain both keywords in the same
  714. sentence. It provides a more refined result for both keywords.
  715. Example: power & searching
  716. /number
  717. It’s a special operator which can be used for different purposes according to the number used after the slash. It’s used for defining the closeness of the keywords. It is quite
  718. similar to the AROUND operator of Google and the NEAR operator of Bing. The number
  719. used with the slash defines the word distance between two keywords.
  721. Example: power /4 searching
  722. Yandex will make sure that the result page contains these two keywords
  723. within four words of each other, irrespective of keyword position. That means the order
  724. of the keywords in the result page might differ from the order in our query.
  725. What if we need to fix the order? Yes, Yandex has a solution for that also: adding
  726. a + sign before the number.
  727. Example: power /+4 searching
  728. Adding the + sign before the number forces Yandex to respond
  729. only with pages where these two keywords are in the same order and within a
  730. 4-word distance.
  731. What if we need the reverse of it? Let’s say we need results with the keyword
  732. “searching” first and “power” after it within a 4-word distance, and not vice versa. In
  733. that case a negative number comes in handy: we can use the - sign to reverse
  734. what we just did without getting the vice versa result.
  735. Example: power /-4 searching
  736. This will only display pages which contain the keyword searching with power after
  737. it within a 4-word distance.
  738. Let’s say we want to set up a radius or boundary for a keyword with respect to
  739. another; in that case we have to specify that keyword in the second position.
  740. Example: power /(-3 +4) searching
  741. Here we are setting up a radius for searching with respect to power. This means
  742. that a page is shown in the results only if “searching” is found
  743. within 3 words before “power” or within 4 words after it.
  744. This can be helpful when we are searching for two people’s names. In that case
  745. we cannot guess which name will come first and which will come next,
  746. so it’s better to create a radius for those two names, and the query will serve our
  747. purpose.
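The four forms of the number operator can be compared with a small Python model of the matching rule. This sketch only imitates the behaviour described above on a piece of text; it is not how Yandex is implemented. The function name near is hypothetical, words are split on non-word characters, and adjacent words are assumed to be at distance 1.

```python
import re


def near(text, first, second, before, after):
    """Return True if `second` occurs within `before` words before
    or `after` words after an occurrence of `first` in `text`."""
    words = re.findall(r"\w+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    for p in first_positions:
        for q in second_positions:
            offset = q - p  # negative: `second` comes before `first`
            if offset != 0 and -before <= offset <= after:
                return True
    return False


if __name__ == "__main__":
    text = "Power tools make searching easier."
    print(near(text, "power", "searching", 4, 4))  # models power /4 searching
    print(near(text, "power", "searching", 0, 4))  # models power /+4 searching
    print(near(text, "power", "searching", 4, 0))  # models power /-4 searching
    print(near(text, "power", "searching", 3, 4))  # models power /(-3 +4) searching
```

In this model /4 corresponds to before=4, after=4; /+4 to before=0, after=4; /-4 to before=4, after=0; and /(-3 +4) to before=3, after=4.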
  748. We have discussed word-based keyword search at length; now let’s shed some light
  749. on sentence-based keyword search. For sentence-based keyword search we can use the
  750. Yandex && operator with this number operator.
  751. Example: power && /4 searching
  752. In this case we get result pages containing these two keywords within a 4-sentence
  753. distance, irrespective of the position of the keywords. That means either
  754. “power” may come first and “searching” after that, or vice versa.
  755. !
  756. This operator does something special, and it is one of my favorite operators. It
  757. gives a user the freedom to search only for a specific keyword, without similar-word search
  758. or extended search. What happens in a general search is that if you
  760. search for a keyword, let’s say AND, you will get some results showing only AND,
  761. and then the results will extend to ANDroid or AMD and so on. If we want
  762. results for the AND keyword only, we use this operator.
  763. Example: !and
  764. This will restrict the search engine to showing only pages which
  765. contain this particular keyword AND.
  766. !!
  767. It can be used to search the dictionary form of the keyword.
  768. Example: !!and
  769. ()
  770. When we want to create a complex query with different keywords and operators, we
  771. can use these brackets to group them. As we have already used these brackets above, we
  772. will now see another example to understand their true power.
  773. FIGURE 5.8
  774. Yandex complex query.
  775. Example: power && (+searching | !search)
  776. Here the query will search for both sets of keywords, first power searching and
  777. then power search, but not both in the same result.
  778. “”
  779. Now suppose we want to search for a particular string or set of
  780. keywords rather than a single keyword; what do we do? Here the “” operator comes to the rescue. It is quite similar
  782. to Google’s “”. It allows a user to search for the exact keywords or string
  783. put inside the double quotes.
  784. Example: “What is OSINT?”
  785. It will search for the exact string and, if available, will give us the result accordingly.
  786. *
  787. This operator can be referred to as the wildcard operator. Its use is much the
  788. same in most search engines. This operator is used to fill in a missing keyword
  789. or suggest relevant keywords according to the other keywords used in the search
  790. query.
  791. Example: osint is * of technology
  792. It will fill the space where * is used, completing the query with
  793. relevant keywords. In this case that can be ocean or treasure or anything. We can also
  794. use this operator with double quotes to get more efficient and accurate results.
  795. Example: “OSINT is * of technology”
  796. |
  797. This is quite similar to the OR operator of Google. It allows us to supply different
  798. keywords when we want results for any of them. In a real-life scenario we can search
  799. for options using this operator. Let’s say I want to buy a laptop and I have different
  800. options: in that case this operator comes into the picture.
  801. Example: dell | toshiba | macbook
  802. Here we can get results for any of these three options but not all in one result.
  803. <<
  804. This is an unusual operator known as the non-ranking “AND.” It is used to
  805. add additional keywords to the list of keywords without impacting the ranking of
  806. the websites in the results. We might not get to know what exactly it does by just going
  807. through its definition. So in simple words it can be used to tag additional keywords
  808. to the query list without impacting the page rankings.
  809. Example: power searching << OSINT
  810. It can be used to additionally search for OSINT along with the other two keywords without impacting the page ranking in the result page.
  811. title:
  812. This is equivalent to “intitle.” It can be used to search for pages with the
  813. keyword(s) specified after the title query parameter.
  814. Example: title:osint
  816. This will provide pages that contain OSINT in the title of the web page. Similarly
  817. we can use this title query parameter to search for more than one keyword.
  818. Example: title:(power searching)
  819. url:
  820. This “url” search query parameter is also an add-on. It searches for the exact URL
  821. provided by the user in the Yandex database.
  822. Example: url:http://attacker.in
  823. Here Yandex will provide a result if and only if the URL has been crawled and
  824. indexed in its database.
  825. inurl:
  826. It can be used to search for keywords present in a URL, or in other words for URL
  827. fragment search. This “inurl” query parameter works much the same in all search
  828. engines.
  829. Example: inurl:osint
  830. It will search for all the URLs that contain the osint keyword, no matter what the position of the keyword is.
  831. mime:filetype
  832. This query parameter is quite similar to the “filetype” query parameter of Google. It
  833. helps a user to search for a particular file type.
  834. Example: osint mime:pdf
  835. FIGURE 5.9
  836. Yandex file search.
  838. It will provide us all the PDF links that contain the osint keyword. The file types
  839. supported by Yandex mime are
  840. PDF, RTF, SWF, DOC, XLS, PPT, DOCX, PPTX, XLSX, ODT, ODS, ODP,
  841. ODG
  842. host:
  843. It can be used to search all the available hosts. It is mostly used by penetration
  844. testers.
  845. Example: host:owasp.org
  846. rhost:
  847. It is quite similar to host, but “rhost” searches for reverse hosts. It can also be used
  848. by penetration testers to get all the reverse host details.
  849. It can be used in two ways: for subdomains, by using the wildcard operator
  850. * at the end, or for a single host, without it.
  851. Example: rhost:org.owasp.*
  852. rhost:org.owasp.www
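As the examples show, the value given to rhost is the hostname with its labels written in reverse order. A minimal Python sketch of that conversion is given below; the function name to_rhost and its subdomains flag are hypothetical helpers for building the query text, not part of Yandex.

```python
def to_rhost(hostname, subdomains=False):
    """Turn a hostname into a Yandex rhost: query.

    www.owasp.org becomes rhost:org.owasp.www; with subdomains=True the
    trailing wildcard is added so that all subdomains are matched.
    """
    labels = hostname.strip(".").lower().split(".")
    reversed_name = ".".join(reversed(labels))
    if subdomains:
        return f"rhost:{reversed_name}.*"
    return f"rhost:{reversed_name}"


if __name__ == "__main__":
    print(to_rhost("www.owasp.org"))
    print(to_rhost("owasp.org", subdomains=True))
```

The first call reproduces the single-host form and the second the wildcard form from the examples above.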
  853. site:
  854. This operator is the best friend of a penetration tester or hacker. It is available in most search engines. It provides all the details of the subdomains of the
  855. provided URL.
  856. For penetration testers or hackers, finding the right place to search for a vulnerability is most important. In most cases the main sites are much more secure
  857. than the subdomains, so if an operator simplifies the process by
  858. providing details of the subdomains to a hacker or penetration tester, then half
  859. the work is done. So the importance of this operator is definitely felt in the security
  860. industry.
  861. Example: site:http://www.owasp.org
  862. It will provide all the available subdomains of the domain owasp.org as well as
  863. all the pages.
  864. date:
  865. This query can be used to limit the search data either to a specific date or to a specific
  866. period by a little enhancement in the query.
  867. Example: date:201408*
  868. In this case, the format of the date is YYYYMMDD, but in place of the DD we used the
  869. wildcard operator “*”, so we will get results limited to August 2014.
  871. We can also limit the results to a particular date in August 2014 by changing the
  872. query a bit.
  873. date:20140808
  874. It will only show results belonging to that date.
  875. We can also use “=” in place of “:” and it will still work the same. So the above
  876. queries can be changed to
  877. date=201408*
  878. date=20140808
  879. As we discussed earlier, we can also limit the search results to a particular time
  880. period. Let’s say we want to search for something from a particular date until today. In
  881. that case we can use
  882. date=>20140808
  883. It will provide results from 8th August 2014 until today. But what if we want to
  884. limit both the start date and the end date? In that case Yandex also lets us provide a range.
  885. date=20140808..20140810
  886. Here we will get the results from 8th August 2014 to 10th August 2014.
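Since every form of the date filter uses the YYYYMMDD layout, the filter strings can be generated from ordinary date values. The Python sketch below reproduces the four forms shown above; the function names are hypothetical, and the exact spellings (date:, date=>, and date= with a .. range) are copied from the examples in this section rather than verified against Yandex.

```python
from datetime import date


def on_day(day):
    """Results from one specific date."""
    return f"date:{day:%Y%m%d}"


def in_month(year, month):
    """Results from a whole month, using * in place of the day."""
    return f"date:{year:04d}{month:02d}*"


def since(day):
    """Results from the given date until today."""
    return f"date=>{day:%Y%m%d}"


def between(start, end):
    """Results between two dates, both included in the query text."""
    if end < start:
        raise ValueError("end date must not be earlier than start date")
    return f"date={start:%Y%m%d}..{end:%Y%m%d}"


if __name__ == "__main__":
    print(in_month(2014, 8))
    print(on_day(date(2014, 8, 8)))
    print(since(date(2014, 8, 8)))
    print(between(date(2014, 8, 8), date(2014, 8, 10)))
```

Using date objects rather than hand-typed strings avoids malformed values such as a one-digit month or day.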
  887. domain:
  888. It can be used to restrict the search results to particular top-level domains (TLDs). Mostly
  889. this type of domain search is done to get results from country-specific domains.
  890. Let’s say we want to get the list of CERT-empanelled security service provider
  891. company names from different countries. In that case we can search by the country-specific domain extension. Let’s say we want these details for New Zealand; then
  892. its TLD is nz. So we can craft a query like
  893. Example: “cert empanelled company” domain:nz
  894. lang:
  895. It can be used to search pages written in specific languages.
  896. Yandex supports some specific languages such as
  897. RU: Russian
  898. UK: Ukrainian
  899. BE: Belorussian
  900. EN: English
  901. FR: French
  902. DE: German
  903. KK: Kazakh
  904. TT: Tatar
  905. TR: Turkish
  907. Though we can always use Google Translate to translate a page from any
  908. language to English or any other language, it’s an added feature provided by
  909. Yandex to fulfil the minimum requirements of the regions where Yandex is
  910. popular.
  911. So to search for a page we need to provide the short form of the language.
  912. Example: power searching lang:en
  913. It will search for pages in English that contain power searching.
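A small Python sketch can guard against using a language code that is not in the list above. The table below contains only the codes named in this section, written in lower case as in the example query; the names YANDEX_LANGUAGES and with_language are hypothetical.

```python
# Language codes taken from the list in this section (lower-cased).
YANDEX_LANGUAGES = {
    "ru": "Russian",
    "uk": "Ukrainian",
    "be": "Belorussian",
    "en": "English",
    "fr": "French",
    "de": "German",
    "kk": "Kazakh",
    "tt": "Tatar",
    "tr": "Turkish",
}


def with_language(query, code):
    """Append lang:<code> to a query, rejecting codes not listed above."""
    code = code.lower()
    if code not in YANDEX_LANGUAGES:
        raise ValueError(f"language code not in the listed set: {code}")
    return f"{query} lang:{code}"


if __name__ == "__main__":
    print(with_language("power searching", "en"))
```

Rejecting unknown codes early is a design choice for this sketch; Yandex itself may accept codes that are not listed here.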
  914. cat:
  915. It is also something unique provided by Yandex. Cat stands for category. Yandex categorizes different things based on region ID or topic ID. Using cat we can search for
  916. results based on the region or topic assigned in the Yandex database.
  917. The details of Regional codes: http://search.yaca.yandex.ru/geo.c2n.
  918. The details of Topic codes: http://search.yaca.yandex.ru/cat.c2n.
  919. Though the pages contain data in Russian, we can always use Google
  920. Translate to serve this purpose.
  921. As we discussed in the beginning, Yandex is an underrated search engine;
  922. some of its cool features are definitely going to leave a mark on us once we go
  923. through this chapter. One such feature is its advanced search GUI.
  924. There are lazy people like me who want everything in a GUI so that they just have
  925. to customize everything by providing limited details and selecting some checkboxes or
  926. radio buttons. Yandex provides that at the link below
  927. http://www.yandex.com/search/advanced?&lr=10558
  928. Here we just have to select what we want, and most importantly it covers most
  929. of the operators we discussed above. So go to the page, select what you want, and
  930. search efficiently using the GUI.
  931. After going through all these operators we can definitely feel the impact
  932. of advanced search, or what we can also call power search.
  933. Advanced search provides a user with faster, more efficient, and more reliable data in the results.
  934. It reduces the manual effort needed to get the desired data. The content quality is also better in advanced search, as we limit the search to what we are actually
  935. looking for. It can be a country-specific domain search, a particular file type,
  936. or content from a specific date. These things cannot be done easily with a simple
  937. keyword search.
  938. We are in an age where information is everything. Then the reliability factor
  939. comes into the picture, and if we want a bulk of reliable information from the net in a very
  940. short time span, then we need to focus on advanced search. We can use any conventional search engine of our choice. Most search engines have quite similar
  941. operators to serve the purpose, but there are some special features present; so look
  942. for those special features and use different search engines for different customized
  943. advanced searches.
  945. So we learned about various search engines and their operators and how to utilize
  946. these operators to search better and get precise results. For some operators we saw
  947. their individual operations and how they can help to narrow down the results, and for
  948. some we saw how they can be used with other operators to generate a great query
  949. which directly gets us to what we want. Though some operators work in more or less the same fashion across different search engines, the crawling
  950. and indexing techniques of different platforms are different, so it is worthwhile to check
  951. which one of them provides better results depending upon our requirements. One
  952. thing that we need to keep in mind is that the search providers keep on deprecating
  953. the operators or features which are not used frequently enough, and some functionalities are not available in some regions.
  954. We saw how easily we can get the results that we actually want with the use
  955. of some small but effective techniques. The impact of these techniques is not just
  956. limited to finding links to websites; if used creatively they can be applied in various fields. Apart from finding information on the web, which certainly is useful for everyone, these techniques can be used to find details which
  957. are profession specific. For example, a marketing professional can gauge the size of
  958. a competitor’s website using the operator “site,” or a sales professional can find
  959. emails for a company using the wildcard operator “*@randomcompany.com.”
  960. We also saw how search engine dorks are used by cyber security professionals to find
  961. sensitive and compromising information just by using some simple keywords and
  962. operators. The takeaway here is not just to learn about the operators but also about
  963. how we can use them creatively in our profession.
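As a rough illustration of combining operators into one query, the sketch below assembles a query string from keywords plus a few common operators. The function name build_query and its parameters are hypothetical, and the exact operator spelling ("site:", "filetype:", a leading minus for exclusion) is an assumption that should be checked against the search engine being used.

```python
def build_query(keywords, site=None, filetype=None, exclude=()):
    """Compose a search-engine query string from keywords and common operators.

    The operator names used here are assumptions; engines differ in what
    they support, so verify them before relying on the result.
    """
    parts = [keywords.strip()]
    if site:
        parts.append("site:" + site)
    if filetype:
        parts.append("filetype:" + filetype)
    # A leading minus sign is commonly used to exclude a term.
    parts.extend("-" + term for term in exclude)
    # Drop empty pieces so a query without keywords has no stray spaces.
    return " ".join(p for p in parts if p)


if __name__ == "__main__":
    print(build_query("annual report", site="example.com", filetype="pdf"))
    print(build_query("", site="randomcompany.com"))
```

Keeping the query as a list of parts makes it easy to add or drop an operator when a particular engine does not support it.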
  964. We have covered a lot about how to search using different search
  965. platforms in this and some previous chapters. Till now we have mainly focused on
  966. browser-based applications, or web applications. In the next chapter we
  967. will move on and learn about various tools which need to be installed as applications and which provide various features for extracting data related to various fields,
  968. using various methods.
  969. Hacking Web Intelligence. http://dx.doi.org/10.1016/B978-0-12-801867-5.00006-9 101
  970. Copyright © 2015 Elsevier Inc. All rights reserved.
  971. CHAPTER 6
  972. OSINT Tools and Techniques
  974. INFORMATION IN THIS CHAPTER
  975. • OSINT Tools
  976. • Geolocation
  977. • Information Harvesting
  978. • Shodan
  979. • Search Diggity
  980. • Recon-ng
  981. • Yahoo Pipes
  982. • Maltego
  983. INTRODUCTION
  984. In the previous chapters we learned about the basics of the internet and effective ways
  985. to search it. We went to great depths, from searching social media to unconventional
  986. search engines, and further learned effective techniques for using regular search
  987. engines. In this chapter we will move a step further and discuss some of
  988. the automated tools and web-based services which are frequently used to perform
  989. reconnaissance by professionals in various intelligence-related domains, especially
  990. information security. We will start with installation, move on to understanding their
  991. interfaces, and then learn about their functionality and usage. Some of these
  992. tools provide a rich graphical user interface (GUI) and some of them are command line
  993. based (CLI); don’t judge them by their interface but by their functionality and
  994. relevance in our field of work.
  995. Before moving any further we must install the dependencies for these tools so that
  996. we don’t face any issues during their installation and usage. The packages
  997. we need are
  998. • Java latest version
  999. • Python 2.7
  1000. • Microsoft .NET Framework v4
  1001. We simply need to download the relevant package depending upon our system
  1002. configuration and we are good to go.
  1004. CREEPY
  1005. Most of us are addicted to social networks, and image sharing is one of the most utilized
  1006. features of these platforms. But sometimes when we share these pictures it’s not just the
  1007. image that we are sharing but possibly also the exact location where that picture was taken.
  1008. Creepy is a Python application which can extract this information and display
  1009. the geolocation on a map. Currently Creepy supports search for Twitter, Flickr, and
  1010. Instagram. It extracts the geolocation based on EXIF information stored in images,
  1011. geolocation information available through application programming interfaces (APIs),
  1012. and some other techniques.
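To make the EXIF part concrete: EXIF GPS tags store latitude and longitude as degrees, minutes, and seconds together with a hemisphere reference (N/S/E/W), and a map needs them as signed decimal degrees. The following is a minimal sketch of that conversion only; it is not Creepy's own code, the function name dms_to_decimal is hypothetical, and reading the tags out of an image file is left to whatever EXIF library is in use.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert an EXIF-style degrees/minutes/seconds value to decimal degrees.

    ref is the hemisphere letter stored alongside the value in EXIF
    ("N", "S", "E" or "W").
    """
    value = degrees + minutes / 60.0 + seconds / 3600.0
    # South latitudes and west longitudes are negative in decimal notation.
    if ref.upper() in ("S", "W"):
        value = -value
    return value


if __name__ == "__main__":
    # Illustrative coordinates only, not taken from any real image.
    lat = dms_to_decimal(37, 58, 30.0, "N")
    lon = dms_to_decimal(23, 43, 39.0, "E")
    print(round(lat, 6), round(lon, 6))
```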
  1013. It can be downloaded from http://ilektrojohn.github.io/creepy/. We simply need
  1014. to select the version according to our platform and install it. The next phase after
  1015. installing Creepy is to configure the plugins that are available in it, for which
  1016. we simply need to click on the Plug-in Configuration button present under the Edit
  1017. tab. Here we can select the plugins and configure them using their individual configuration
  1018. wizards. Once the configuration is done we can check whether it
  1019. is working properly using the Test Plugin Configuration button.
  1020. FIGURE 6.1
  1021. Configure Creepy.
  1022. After the configuration phase is done, we can start a new project by clicking on
  1023. the person icon on the top bar. Here we can name the project and search for people
  1024. on different portals. From the search results we can select the person of interest,
  1025. include him/her in the target list, and finish the wizard. After this our project will be
  1026. displayed under the project bar on the right-hand side.
  1028. FIGURE 6.2
  1029. Search users.
  1030. Now we simply need to select our project and click on the target icon or right
  1031. click on the project and click Analyze Current Project. After this Creepy will start
  1032. the analysis, which will take some time. Once the analysis is complete, Creepy will
  1033. display the results on the map.
  1034. FIGURE 6.3
  1035. Creepy results.
  1036. Now we can see the results, in which the map is populated with markers
  1037. according to the identified geolocation. Creepy further allows us to narrow down
  1038. these results based on various filters.
  1040. Clicking on the calendar button allows us to filter the results based on a time
  1041. period. We can also filter the results based upon area, which we can define in the form
  1042. of a radius in kilometers from a point of our choice. We can also see the results in the
  1043. form of a heat map instead of the markers. The negative sign (−) present at the end
  1044. can be used to remove all the filters imposed on the results.
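The radius filter described above can be reproduced outside the tool with a great-circle distance check. This is a sketch under stated assumptions, not Creepy's implementation: it uses the haversine formula with a mean Earth radius of 6371 km, and the names haversine_km and within_radius are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(points, center, radius_km):
    """Keep only the (lat, lon) points lying within radius_km of center."""
    clat, clon = center
    return [p for p in points
            if haversine_km(clat, clon, p[0], p[1]) <= radius_km]


if __name__ == "__main__":
    # Illustrative points on the equator, not real results.
    points = [(0.0, 0.5), (0.0, 2.0)]
    print(within_radius(points, (0.0, 0.0), 100))
```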
  1045. FIGURE 6.4
  1046. Applying filter.
  1047. The results that we get from Creepy can also be downloaded as a CSV
  1048. file or as KML, which can be used to display the markers on another map.
  1049. Creepy can be used for the information-gathering phase during a pentest
  1050. (penetration test) and also as a proof-of-concept tool to demonstrate to users what
  1051. information they are revealing about themselves.
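For readers unfamiliar with KML, the sketch below shows roughly what such an export contains: one Placemark per location. It is an illustrative minimal document, not the exact file Creepy writes, and the function name points_to_kml is hypothetical. Note that KML lists coordinates as longitude first, then latitude.

```python
from xml.sax.saxutils import escape


def points_to_kml(points):
    """Build a minimal KML document from (name, lat, lon) tuples."""
    placemarks = []
    for name, lat, lon in points:
        placemarks.append(
            "  <Placemark>\n"
            "    <name>{}</name>\n"
            # KML expects longitude before latitude.
            "    <Point><coordinates>{},{}</coordinates></Point>\n"
            "  </Placemark>".format(escape(name), lon, lat)
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
        "<Document>\n" + "\n".join(placemarks) + "\n</Document>\n</kml>\n"
    )


if __name__ == "__main__":
    # Illustrative marker only.
    print(points_to_kml([("Sample marker", 37.975, 23.7275)]))
```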
  1052. FIGURE 6.5
  1053. Download Creepy results.
  1055. THEHARVESTER
  1056. TheHarvester is an open source intelligence (OSINT) tool for obtaining e-mail
  1057. addresses, employee names, open ports, subdomains, host banners, etc. from public
  1058. sources such as the search engines Google and Bing and other sites such as LinkedIn. It’s
  1059. a simple Python tool which is easy to use and contains different information-gathering
  1060. functions. Being a Python tool, it requires Python to be
  1061. installed on our system. This tool was created by Christian Martorella and is
  1062. one of the simple, popular, and widely used tools for information gathering.
  1063. TheHarvester can be found here: http://www.edge-security.com/theharvester.php
  1064. Generally we need to input a domain name or company name to collect relevant
  1065. information such as e-mail addresses, subdomains, or the other details mentioned in
  1066. the above paragraph. But we can also use keywords to collect related information.
  1067. We can also narrow our search by specifying which particular public source we want to
  1068. use for the information gathering. There are lots of public sources that theHarvester uses for
  1069. information gathering, but before moving to that let’s understand how to use theHarvester.
  1070. EX: theharvester -d example.com -l 500 -b Google
  1071. -d = Generally, the domain name or company name
  1072. -l = Limit on the number of results to work with
  1073. -b = The data source; in the above command it’s Google, but
  1074. apart from that we can also use LinkedIn or all (to use all the available public
  1075. sources) as a source to collect information.
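When the same run has to be repeated for several domains, the command above can be assembled from a script. The sketch below only builds the argument list from the three flags explained here (-d, -l, -b); the helper name harvester_command is hypothetical, and the executable name and source value are copied from the example rather than verified against any particular installed version.

```python
import shlex


def harvester_command(domain, limit=500, source="Google"):
    """Return the argument list for a theHarvester run using -d, -l and -b.

    The executable name and the default source mirror the example command;
    adjust them to match the installed version of the tool.
    """
    return ["theharvester", "-d", domain, "-l", str(limit), "-b", source]


if __name__ == "__main__":
    argv = harvester_command("example.com")
    # Print the command line; on a machine where the tool is installed the
    # same list could be handed to subprocess.run(argv).
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Passing the arguments as a list rather than one string avoids shell-quoting problems when a company name contains spaces.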
  1076. FIGURE 6.6
  1077. TheHarvester in action.
  1079. Apart from the options mentioned above, theHarvester also has other options,
  1080. such as:
  1081. -s = to start with a particular result number (the default value is 0)
  1082. -v = to get virtual hosts by verifying hostnames via DNS resolution
  1083. -f = for saving the data (formats available are HTML or XML)
  1084. -n = to perform a DNS resolve query for all the discovered ranges
  1085. -c = to perform a DNS brute force for all domain names
  1086. -t = to perform a DNS TLD expansion discovery
  1087. -e = to use a specific DNS server
  1088. -l = to limit the number of results to work with
  1089. -h = to use the Shodan database to query discovered hosts
  1090. FIGURE 6.7
  1091. TheHarvester HTML results.
  1092. The sources it uses are Google, Google profiles, Bing, pretty good privacy
  1093. (PGP) servers, LinkedIn, Jigsaw, Shodan, Yandex, name servers, people123, and
  1095. Exalead. Google, Yandex, Bing, and Exalead are search engines that are used in the
  1096. back end as sources, while Shodan is also a search engine but not a conventional one; we already discussed it briefly earlier and will discuss it in
  1097. detail later in this chapter. PGP servers are key servers used
  1098. for data security, and they are also a good source for collecting e-mail details.
  1099. people123 is for searching for a particular person, and Jigsaw is a cloud-based
  1100. solution for lead generation and other sales purposes. TheHarvester collects different information from different sources: for e-mail harvesting it uses Google,
  1101. Bing, PGP servers, and sometimes Exalead, and runs their specific queries in the
  1102. background to get the desired result. Similarly, for subdomains or host names it
  1103. again uses Google, Bing, Yandex, Exalead, and PGP servers. And finally,
  1104. for the list of employee names it uses LinkedIn, Google profiles, people123, and
  1105. Jigsaw as its main sources.
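The e-mail harvesting step boils down to pulling addresses for the target domain out of whatever text the sources return. The sketch below illustrates that idea with a regular expression; it is not theHarvester's own code, the function name extract_emails is hypothetical, and the pattern is deliberately simple rather than a complete e-mail grammar.

```python
import re


def extract_emails(text, domain):
    """Return unique e-mail addresses at domain (or its subdomains) found in text."""
    pattern = re.compile(
        r"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)*" + re.escape(domain)
        # Do not stop in the middle of a longer host name such as example.com.au.
        + r"(?![A-Za-z0-9-]|\.[A-Za-z0-9])",
        re.IGNORECASE,
    )
    found = []
    for match in pattern.findall(text):
        address = match.lower()
        if address not in found:  # keep first-seen order, drop duplicates
            found.append(address)
    return found


if __name__ == "__main__":
    # Made-up sample text standing in for a page of search results.
    sample = ("Contact Alice@Example.com, bob@mail.example.com "
              "or eve@other.org. Also alice@example.com.")
    print(extract_emails(sample, "example.com"))
```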
  1106. This is how theHarvester harvests all the information and gives us the desired
  1107. result as per our query. So craft your query wisely to harvest all the required
  1108. information.
  1109. SHODAN
  1110. We have previously discussed Shodan briefly in Chapter 4, but this unique
  1111. search engine deserves much more than a paragraph to discuss its usage and impact.
  1112. As discussed earlier, Shodan is a computer search engine. The internet consists of
  1113. many different types of devices connected online and available publicly. Most of
  1114. these devices have a banner, which they send as a response to the application request
  1115. sent by a client. Many if not most of these banners contain information which
  1116. can be called sensitive in nature, such as server version, device type, authentication
  1117. mode, etc. Shodan allows us to search for such devices over the internet and also provides
  1118. filters to narrow down the results.
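To see why a banner can be revealing, consider the response headers a web-enabled device might return. The sketch below parses such a banner and pulls out the fields that hint at server software and authentication mode. The banner text is a made-up sample (the product name ExampleCam is fictional), and parse_http_banner is a hypothetical helper, not part of Shodan.

```python
def parse_http_banner(banner):
    """Split an HTTP response banner into its status line and a header dict."""
    lines = banner.replace("\r\n", "\n").split("\n")
    status_line = lines[0].strip()
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header section
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return status_line, headers


# Made-up banner for illustration; real devices return their own values.
SAMPLE_BANNER = (
    "HTTP/1.1 401 Unauthorized\r\n"
    "Server: ExampleCam/1.0\r\n"
    'WWW-Authenticate: Basic realm="camera"\r\n'
    "\r\n"
)

if __name__ == "__main__":
    status, headers = parse_http_banner(SAMPLE_BANNER)
    print(status)
    print("Server software:", headers.get("server"))
    print("Authentication:", headers.get("www-authenticate"))
```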
  1119. It is highly recommended to create an account to utilize this great
  1120. tool, as it removes some of the restrictions imposed on free usage. So
  1121. after logging into the application we will simply go to the dashboard at
  1122. http://www.shodanhq.com/home. Here we can see some of the recent searches as
  1123. well as popular searches made on this platform. This page also shows a quick reference to the filters that we can use. Moving on, let’s see more popular searches
  1124. listed under the URL http://www.shodanhq.com/browse. Here we can see
  1125. various search queries which look quite interesting, such as webcam, default password, SCADA, etc. Clicking on one of these takes us directly
  1126. to the result page and lists details of machines on the internet with that specific
  1127. keyword. The page http://www.shodanhq.com/help/filters shows the list of all
  1128. the filters that we can use in Shodan to perform a more focused search, such as
  1129. country, hostname, port, etc., including the usual filters “+,” “-,” and “|.”
  1131. FIGURE 6.8
  1132. Shodan popular searches.
  1133. FIGURE 6.9
  1134. Shodan filters.
  1135. Let’s perform a simple search on Shodan for the keyword “webcam.” Shodan has
  1136. found more than 15,000 results for this keyword; though we cannot view all the
  1137. results under the free package, what we get is enough to understand the reach and
  1138. availability of such devices on the internet. Some of these might be protected by some
  1139. kind of authentication mechanism such as a username and password, but some might be
  1140. publicly accessible without any such mechanism. We can simply find out by opening
  1142. their listed IP address in our browsers (Warning: It might be illegal to do so depending
  1143. upon the laws of the country, etc.). We can further narrow down these results to a
  1144. country by using the “country” filter. So our new query is “webcams country:us” which
  1145. gives us a list of webcams in the United States of America.
  1146. FIGURE 6.10
  1147. Shodan results for query “webcam”
  1148. To get a list of machines with file transfer protocol (FTP) service, residing in India,
  1149. we can use the query “port:21 country:in”. We can also search for a specific IP
  1150. address or a range of addresses using the filter “net.” Shodan provides a great deal of relevant
  1151. information and its application is only limited by the creativity of its users.
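Queries like the ones above follow a simple "filter:value" pattern, so they are easy to generate when many combinations need to be tried. The following is a minimal sketch; the helper name shodan_query is hypothetical, the address range in the example is a documentation range, and quoting of values that contain spaces is an assumption about Shodan's syntax.

```python
def shodan_query(keyword="", **filters):
    """Build a Shodan query string from a keyword and filter:value pairs."""
    parts = [keyword] if keyword else []
    for name, value in filters.items():
        value = str(value)
        if " " in value:
            # Assumption: values containing spaces need to be quoted.
            value = '"' + value + '"'
        parts.append("{}:{}".format(name, value))
    return " ".join(parts)


if __name__ == "__main__":
    print(shodan_query(port=21, country="in"))
    print(shodan_query("webcams", country="us"))
    print(shodan_query(net="192.0.2.0/24"))
```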
  1152. FIGURE 6.11
  1153. Shodan results for query “port:21 country:in.”
  1155. Apart from this, Shodan also offers an API to integrate its data into our own
  1156. application. There are also some other paid services provided by it which are
  1157. worth a try for anyone working in the information security domain. Recently there
  1158. has been a lot of development in Shodan and its associated services, which makes
  1159. this product a must-try for information security enthusiasts.
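As a rough sketch of what API integration can look like, the example below uses the third-party shodan Python package (installed separately, for example with pip) to run a query and print a short summary. The helper names, the SHODAN_API_KEY environment variable, and the choice of fields to display are assumptions made for illustration; the result layout (a "matches" list whose entries carry "ip_str", "port", and "data") follows that package's search response as I understand it, so check it against the current documentation.

```python
import os


def summarize_matches(results, limit=5):
    """Return (ip, port, first banner line) tuples from a Shodan search result dict."""
    rows = []
    for match in results.get("matches", [])[:limit]:
        banner = match.get("data", "").strip()
        first_line = banner.splitlines()[0] if banner else ""
        rows.append((match.get("ip_str", ""), match.get("port"), first_line))
    return rows


def search_shodan(api_key, query):
    """Run a live search; requires the third-party 'shodan' package and an API key."""
    import shodan  # imported lazily so the summary helper works without it
    api = shodan.Shodan(api_key)
    return api.search(query)


if __name__ == "__main__":
    # SHODAN_API_KEY is a hypothetical variable name chosen for this sketch.
    key = os.environ.get("SHODAN_API_KEY")
    if key:
        results = search_shodan(key, "port:21 country:in")
        print("Total results:", results["total"])
        for row in summarize_matches(results):
            print(row)
    else:
        print("Set SHODAN_API_KEY to run a live search.")
```

Keeping the summary logic separate from the network call means it can be tried on saved results without spending API credits.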
  1160. SEARCH DIGGITY
  1161. In the last chapter we learned a lot about using advanced search features of various
  1162. search engines and also briefly discussed the term “Google Hacking.” To perform such functions we need to have the list of operators that we can use and
  1163. have to type each query to see if anything is vulnerable, but what if there was a tool
  1164. which has a database of such queries that we can simply run? Here enters Search
  1165. Diggity. Search Diggity is a tool by Bishop Fox which has a huge set of options and a
  1166. large database of queries for various search engines which allow us to gather compromising information related to our target. It can be downloaded from http://www.bishopfox.com/resources/tools/google-hacking-diggity/attack-tools/. The basic
  1167. requirement for its installation is Microsoft .NET Framework v4.
  1168. Once we have downloaded and installed the application, the things we need are
  1169. the search IDs and API keys. These search IDs/API keys are required so that we can
  1170. perform a larger number of searches without too many restrictions. We can find out how
  1171. to get and use these keys in the contents section under the Help tab and also from
  1172. some simple Google searches. Once all the keys (Google, Bing, Shodan, etc.) are in
  1173. place we can move forward with the usage of the tool.
