Python Mechanize Cheat Sheet


SUBMITTED BY: alemotta

DATE: March 14, 2017, 4:19 p.m.


Mechanize
A very useful Python module for navigating through web forms is Mechanize.
In a previous post I wrote about "Browsing in Python with Mechanize".
Today I found this excellent cheat sheet on ScraperWiki that I would like to
share.

Create a browser object
Create a browser object and give it some optional settings.
import mechanize
br = mechanize.Browser()
br.set_handle_robots(False)   # ignore robots.txt
br.set_handle_refresh(False)  # can sometimes hang without this
br.addheaders = [('User-agent', 'Firefox')]  # headers sent with every request
# br.set_all_readonly(False)  # forwarded to the selected form, so call it only after select_form()

Open a webpage
Open a webpage and inspect its contents.
response = br.open(url)       # url is the address of the page to fetch
print(response.read())        # the text of the page
response1 = br.response()     # get a fresh copy of the current response
print(response1.read())       # can be passed to lxml.html.fromstring()

Using forms
List the forms that are in the page.
for form in br.forms():
    print("Form name: %s" % form.name)
    print(form)
To go on, the mechanize browser object must have a form selected.
br.select_form(name="form1")     # works when the form has a name
br.form = list(br.forms())[0]    # use when the form is unnamed

Using Controls
Iterate through the controls in the form.
for control in br.form.controls:
    print(control)
    # control.value also works for unnamed controls, unlike br[control.name]
    print("type=%s, name=%s, value=%s" % (control.type, control.name, control.value))

Controls can be found by name.
control = br.form.find_control("controlname")
A select control tells you which values can be selected.
if control.type == "select":  # means it's a mechanize.SelectControl (formerly ClientForm.SelectControl)
    for item in control.items:
        print(" name=%s values=%s" % (item.name, [label.text for label in item.get_labels()]))

Because select controls can have multiple selections, they must be set
with a list, even if it has only one element.
print(control.value)
print(control)                    # the selected value is starred
control.value = ["ItemName"]
print(control)
br[control.name] = ["ItemName"]   # equivalent and more common

Text controls can be set as a string.
if control.type == "text":  # means it's a mechanize.TextControl (formerly ClientForm.TextControl)
    control.value = "stuff here"
br["controlname"] = "stuff here"  # equivalent

Controls can be marked readonly or disabled.
control.readonly = False
control.disabled = True
Or disable every submit control like so:
for control in br.form.controls:
    if control.type == "submit":
        control.disabled = True

Submit the form
When your form is complete, you can submit it.
response = br.submit()
print(response.read())
br.back()  # go back to the previous page

Finding Links
Following links in mechanize is a hassle because you need to have the link
object.
Sometimes it is easier to get them all and find the link you want from the text.
for link in br.links():
    print("%s %s" % (link.text, link.url))
follow_link and click_link relate to each other the way submit and click do:
click_link only builds the request, while follow_link also opens it.
request = br.click_link(link)    # a Request object; nothing is fetched yet
response = br.follow_link(link)  # opens the link and returns the response
print(response.geturl())

I hope this gave you a better understanding of the Mechanize module in Python.
