Downloading thread code.

Gribouillis
Nov 30th, 2008
The following program downloads the Python programs contained in a thread of the Python forum. Start the program and it will prompt you for the thread number, then create a directory with the code extracted from the thread. I used it to download all the wx examples. Note that the algorithm is very primitive: only code written in language tags is extracted (raw code is not), and data attached to posts is not downloaded. It's also not robust: if the formatting on DaniWeb changes tomorrow, it won't work anymore.
#!/usr/bin/env python
# danidown.py
# Written for Python 2 (htmllib and urllib2 do not exist in Python 3).
from htmllib import HTMLParser
from formatter import AbstractFormatter, AbstractWriter, NullWriter
import re
from os import mkdir
from os.path import isdir, join as pjoin
from urllib2 import urlopen

class aThread(object):
    # The pager text "Page 1 of N" gives the number of pages in the thread.
    itsCntPattern = re.compile(r"Page \d+ of (\d+)")
    def __init__(o, theThreadNumber):
        o.itsNumber = theThreadNumber
        o._itsPageCnt = None
        o.itsReply = 0
        o.itsCodeIndex = 0
    def itsUrl(o, thePage=1):
        # Page 1 has no suffix, later pages end in "-<page>".
        x = "" if (thePage == 1) else "-%d" % thePage
        return "http://www.daniweb.com/forums/thread%d%s.html" % (
            o.itsNumber, x)
    def itsContent(o, thePage=1):
        theUrl = o.itsUrl(thePage)
        f = urlopen(theUrl)
        s = f.read()
        f.close()
        return s
    @property
    def itsPageCnt(o):
        # Computed once from the first page, defaults to 1 if no pager is found.
        if o._itsPageCnt is None:
            o._itsPageCnt = 1
            theContent = o.itsContent(1)
            theMatch = o.itsCntPattern.search(theContent)
            if theMatch is not None:
                o._itsPageCnt = int(theMatch.group(1))
        return o._itsPageCnt
    @property
    def itsTriples(o):
        # Yields (reply number, author, code) for every code block in the thread.
        theCnt = o.itsPageCnt
        printMessage("The thread contains %d pages..." % theCnt)
        for i in xrange(1, theCnt + 1):
            printMessage("Page %d..." % i)
            theWriter = Writer1()
            theParser = HTMLParser(AbstractFormatter(theWriter))
            theContent = o.itsContent(i)
            theParser.feed(theContent)
            theParser.close()
            for theTriple in theWriter.itsTriples:
                yield theTriple

    @property
    def itsFolder(o):
        return "thread%d" % o.itsNumber

    def itsReplyFolder(o, n):
        return pjoin(o.itsFolder, "reply%d" % n)

    def doDownload(o):
        for theReply, theAuthor, theCode in o.itsTriples:
            theFolder = o.itsReplyFolder(theReply)
            if theReply > o.itsReply:
                # First code block of a new reply: create its folder.
                printMessage("reply %d..." % theReply)
                if o.itsReply == 0:
                    if not isdir(o.itsFolder):
                        mkdir(o.itsFolder)
                o.itsReply = theReply
                o.itsCodeIndex = 0
                if not isdir(theFolder):
                    mkdir(theFolder)
                f = open(pjoin(theFolder, "author"), "w")
                f.write(theAuthor + "\n")
                f.close()
            o.itsCodeIndex += 1
            f = open(pjoin(theFolder, "prog%d.py" % o.itsCodeIndex), "w")
            f.write(theCode)
            f.close()
        print "done."

class Writer1(NullWriter):
    def __init__(o):
        NullWriter.__init__(o)
        o.isInCode = False
        o.itsCode = None
        o.itsAuthor = "unknown"
        o.itsAnswer = 0
        o.justReadAuthor = False
        o.nextIsAuthor = False
        o.nextIsNumber = False
        o.itsTriples = []
    def send_label_data(o, data):
        #print "send_label_data(%s)" % repr(data)
        # Labels are the line numbers of a code listing: "1." opens a block.
        if o.isInCode:
            assert(data[-1] == ".")
            n = int(data[:-1])
            o.itsCode.append([])
            assert(len(o.itsCode) == n)
        elif data == "1.":
            o.isInCode = True
            o.itsCode = [[]]
    def send_literal_data(o, data):
        #print "send_literal_data(%s)" % repr(data)
        if o.isInCode and data != "\xa0":
            o.itsCode[-1].append(data)
    def send_line_break(o):
        if o.isInCode:
            o.itsCode[-1] = "".join(o.itsCode[-1])
    def send_paragraph(o, data):
        #print "send_paragraph(%s)" % repr(data)
        # A paragraph break ends the current code block.
        if o.isInCode:
            theCode = "\n".join(o.itsCode)
            o.itsCode = None
            o.isInCode = False
            o.itsTriples.append((o.itsAnswer, o.itsAuthor, theCode))
    def send_flowing_data(o, data):
        # Tracks the author name and the reply number from the post header.
        if o.nextIsNumber:
            o.itsAnswer = int(data)
            o.nextIsNumber = False
        elif o.justReadAuthor:
            if data == " #":
                o.nextIsNumber = True
                o.justReadAuthor = False
        elif o.nextIsAuthor:
            o.itsAuthor = data
            o.justReadAuthor = True
            o.nextIsAuthor = False
        elif data.startswith(" Solved Threads:"):
            o.nextIsAuthor = True

def printMessage(msg):
    print msg

if __name__ == "__main__":
    n = int(raw_input("Enter thread number: "))
    theThread = aThread(n)
    theThread.doDownload()
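The URL scheme and the page count detection in the listing above can be tried on their own, without any network access. What follows is only a minimal Python 3 sketch of those two steps. The function names thread_url and page_count, the thread number 12345 and the sample HTML strings are made up for illustration. The pattern and the URL format are copied from itsCntPattern and itsUrl.

```python
import re

# Same pattern as itsCntPattern in the listing: "Page 1 of 4" -> 4 pages.
PAGE_COUNT_PATTERN = re.compile(r"Page \d+ of (\d+)")


def thread_url(thread_number, page=1):
    """Build a page URL the way itsUrl does: page 1 has no suffix."""
    suffix = "" if page == 1 else "-%d" % page
    return "http://www.daniweb.com/forums/thread%d%s.html" % (thread_number, suffix)


def page_count(html_text):
    """Return the page count, falling back to 1 when no pager text is found."""
    match = PAGE_COUNT_PATTERN.search(html_text)
    return int(match.group(1)) if match else 1


# Hypothetical inputs, for illustration only.
print(thread_url(12345))
print(thread_url(12345, 3))
print(page_count("<td>Page 1 of 4</td>"))
print(page_count("<p>no pager here</p>"))
```

Keeping these two steps as plain functions makes it easy to adjust the pattern or the URL format if the site layout changes.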
Last edited by Gribouillis; Nov 30th, 2008 at 1:24 pm.
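The way Writer1 collects a code block can also be pictured without htmllib. The sketch below is a Python 3 approximation of the four code-related callbacks (send_label_data, send_literal_data, send_line_break, send_paragraph). The class name CodeCollector, the shortened method names and the event sequence are assumptions made for illustration. The real order of calls depends on the page markup and on the formatter.

```python
class CodeCollector:
    """Python 3 sketch of the code-gathering callbacks of Writer1."""

    def __init__(self):
        self.in_code = False
        self.lines = None   # lines of the current block; the last may be a list of fragments
        self.blocks = []    # finished code blocks

    def label(self, data):
        # The label "1." opens a block; each later label opens the next line.
        if self.in_code:
            assert data.endswith(".")
            self.lines.append([])
            assert len(self.lines) == int(data[:-1])
        elif data == "1.":
            self.in_code = True
            self.lines = [[]]

    def literal(self, data):
        # Collect the fragments of the current line, skipping non-breaking spaces.
        if self.in_code and data != "\xa0":
            self.lines[-1].append(data)

    def line_break(self):
        # A line break closes the current line by joining its fragments.
        if self.in_code:
            self.lines[-1] = "".join(self.lines[-1])

    def paragraph(self):
        # A paragraph break closes the whole block.
        if self.in_code:
            self.blocks.append("\n".join(self.lines))
            self.lines = None
            self.in_code = False


# Hypothetical event sequence for a two-line listing.
collector = CodeCollector()
collector.label("1.")
collector.literal("import ")
collector.literal("os")
collector.line_break()
collector.label("2.")
collector.literal("print(os.name)")
collector.line_break()
collector.paragraph()
print(collector.blocks)
```

The sketch assumes every code line is followed by a line break before the closing paragraph. If that does not hold, the last line is still a list of fragments and the final join fails, which is one reason the approach is fragile.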