Getting the most popular pages from your Apache logfile


SUBMITTED BY: alemotta

DATE: March 14, 2017, 4:30 p.m.


An Apache logfile can be huge and hard to read.
Here is a way to get a list of the most visited pages (or files) from an Apache logfile.
In this example, we only want the URLs from GET requests. We will use the wonderful Counter class from Python's collections module.
import collections

clean_log = []
# Open the logfile; the with-block closes it automatically when done.
with open("yourlogfile.log", "r") as logfile:
    for line in logfile:
        try:
            # Copy the URLs to the list.
            # We take the part between "GET " and "HTTP".
            clean_log.append(line[line.index("GET") + 4:line.index("HTTP")])
        except ValueError:
            # str.index raises ValueError if "GET" or "HTTP" is missing; skip the line.
            pass

counter = collections.Counter(clean_log)

# Print the top 50 most popular URLs, each preceded by its hit count.
for url, count in counter.most_common(50):
    print(str(count) + " " + str(url))
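
The script above slices each line between the first "GET" and the first "HTTP", so a URL that itself contains one of those strings would be cut in the wrong place. A possible variant, offered only as a sketch, matches the quoted request field with a regular expression instead. It assumes the logfile uses the Common or Combined Log Format, where the request appears as "GET /path HTTP/1.1". The names GET_REQUEST and top_urls and the sample lines are made up for illustration and are not part of the original snippet.

```python
import collections
import re

# Hypothetical helper, not part of the original script.
# Assumes Common/Combined Log Format, where the request field looks like
# "GET /index.html HTTP/1.1"; the group captures the requested URL.
GET_REQUEST = re.compile(r'"GET (\S+) HTTP/[\d.]+"')


def top_urls(lines, limit=50):
    """Return up to `limit` (url, hits) pairs, most requested first."""
    counter = collections.Counter()
    for line in lines:
        match = GET_REQUEST.search(line)
        if match:  # lines without a GET request field are skipped
            counter[match.group(1)] += 1
    return counter.most_common(limit)


# Made-up sample lines, for illustration only.
sample = [
    '127.0.0.1 - - [14/Mar/2017:16:30:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '127.0.0.1 - - [14/Mar/2017:16:30:01 +0000] "GET /about.html HTTP/1.1" 200 128',
    '127.0.0.1 - - [14/Mar/2017:16:30:02 +0000] "GET /index.html HTTP/1.1" 200 512',
    '127.0.0.1 - - [14/Mar/2017:16:30:03 +0000] "POST /login HTTP/1.1" 302 0',
]

for url, hits in top_urls(sample):
    print(hits, url)
```

Because a file object yields its lines when iterated, an open logfile can be passed to top_urls directly in place of the sample list.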
