Browsing in Python with Mechanize


SUBMITTED BY: alemotta

DATE: March 14, 2017, 4:23 p.m.

The mechanize module in Python is similar to Perl's WWW::Mechanize.
It gives you a browser-like object to interact with web pages.
Here is an example of how to use it in a program.

import mechanize
br = mechanize.Browser()
br.open("http://www.example.com/")
# Follow the second link whose text matches the regular expression (nr is zero-based)
response1 = br.follow_link(text_regex=r"cheese\s*shop", nr=1)
assert br.viewing_html()
print(br.title())
print(response1.geturl())
print(response1.info())  # headers
print(response1.read())  # body
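
To make the text_regex and nr arguments a little more concrete, here is a minimal sketch that imitates the selection using only the standard re module. It assumes the regular expression is searched within each link's text and that nr counts matching links from zero. The pick_link helper and the sample link texts are made up for illustration and are not part of mechanize.

```python
import re


def pick_link(link_texts, text_regex, nr=0):
    """Return the nr-th (zero-based) text that the regex matches somewhere."""
    matches = [text for text in link_texts if re.search(text_regex, text)]
    return matches[nr]


# Hypothetical link texts, standing in for the links on a page.
texts = ["home", "cheese shop", "cheeseshop mirror", "contact"]

# \s* allows zero or more whitespace characters between the two words,
# so both "cheese shop" and "cheeseshop" match; nr=1 picks the second match.
second = pick_link(texts, r"cheese\s*shop", nr=1)
print(second)
```

With nr=0 the same call would return the first matching text instead.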

To get the response code from a website, you can use response.code:

from mechanize import Browser
browser = Browser()
response = browser.open('http://www.google.com')
print(response.code)
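
The response code is a plain integer such as 200 or 404. As a small sketch of how it might be turned into something readable, the snippet below looks the number up in the standard library's table of HTTP reason phrases. It assumes Python 3 (where the table lives in http.client), and describe_status is a hypothetical helper name, not a mechanize function.

```python
from http.client import responses


def describe_status(code):
    """Return the status code followed by its standard reason phrase."""
    reason = responses.get(code, "Unknown")
    return "%d %s" % (code, reason)


# The integer would normally come from response.code; literals are used here.
print(describe_status(200))
print(describe_status(404))
```

In a script like the one above, this could be called as describe_status(response.code).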

Get all forms from a website:

import mechanize
br = mechanize.Browser()
br.open("http://www.google.com/")
for f in br.forms():
    print(f)

I found this post at http://stockrt.github.com that very accurately describes how
to emulate a browser in Python using mechanize.

Browsing with Python (written by Drew Stephens)

#!/usr/bin/python
import re
from mechanize import Browser
br = Browser()
# Ignore robots.txt
br.set_handle_robots(False)
# Google demands a user-agent that isn't a robot
br.addheaders = [('User-agent', 'Firefox')]
# Retrieve the Google home page
br.open("http://google.com")
# Select the search box and search for 'foo'
br.select_form('f')
br.form['q'] = 'foo'
# Get the search results
br.submit()
# Find the link to foofighters.com; why did we run a search?
resp = None
for link in br.links():
    siteMatch = re.compile(r'www\.foofighters\.com').search(link.url)
    if siteMatch:
        resp = br.follow_link(link)
        break
# Print the site, if a matching link was found
if resp is not None:
    content = resp.get_data()
    print(content)

The script above is split up to make it easier to read.
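
The link-search loop in the script could also be pulled out into a small function, which makes the "no match" case explicit. The sketch below is one possible shape for it, not the original author's code. The name first_matching_link is hypothetical, and FakeLink is a stand-in for the link objects that br.links() yields, on the assumption that only their url attribute is needed.

```python
import re
from collections import namedtuple


def first_matching_link(links, url_pattern):
    """Return the first link whose .url matches url_pattern, or None."""
    regex = re.compile(url_pattern)
    for link in links:
        if regex.search(link.url):
            return link
    return None


# Stand-in for real link objects; only the .url attribute is used above.
FakeLink = namedtuple("FakeLink", ["url", "text"])

sample_links = [
    FakeLink("http://example.com/about", "About"),
    FakeLink("http://www.foofighters.com/", "Foo Fighters"),
    FakeLink("http://www.foofighters.com/tour", "Tour"),
]

# The dots are escaped so they match a literal "." rather than any character.
match = first_matching_link(sample_links, r"www\.foofighters\.com")
print(match.url)
```

With a real browser object, the call would be first_matching_link(br.links(), ...), and the result would be passed to br.follow_link() only when it is not None.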
