Grouping strings by prefixes

Please support our Python advertiser: Programming Forums - DaniWeb Sister Site
Gribouillis Gribouillis is offline Offline Oct 11th, 2008, 1:51 pm |
0
This snippet defines a class which can group a collection of strings according to a given set of prefixes. Each string goes in the group of the longest prefix it contains.
Quick reply to this message  
Python Syntax
  1. #!/usr/bin/env python
  2.  
  3. # Gribouillis at www.daniweb.com
  4. """
  5. The class Grouper implements grouping strings by prefixes.
  6.  
  7. A Grouper object is created with a sequence of prefixes
  8. G = Grouper(["pref1", "pref2", ...])
  9.  
  10. (an empty prefix is automatically added). The Grouper object
  11. is a dictionary "prefix --> set of strings". Initially, it looks
  12. like
  13. {
  14. "pref1" : set([]),
  15. "pref2" : set([]),
  16. }
  17.  
  18. The method 'add_strings' can be used to add a sequence of strings
  19. to the grouper. Each string goes in one of the sets according to
  20. the longest prefix it contains.
  21. After
  22. G.add_strings(["pref1aaaaa", "pref2jjj", "pref1s"])
  23.  
  24. it's content will be
  25.  
  26. {
  27. "pref1" : set(["pref1aaaaa", "pref1s"]),
  28. "pref2" : set(["pref2jjj"]),
  29. }
  30.  
  31. More strings can be added.
  32. """
  33.  
  34. from collections import deque
  35. from bisect import bisect_left ,bisect_right
  36.  
  37. class Grouper (dict ):
  38. def __init__ (self ,iterable ):
  39. prefixes =set (iterable )
  40. prefixes .add ("")
  41. dict .__init__ (self ,[(p ,set ())for p in prefixes ])
  42. self .skeys =[]
  43.  
  44. def add_strings (self ,iterable ):
  45. strings =sorted (set (iterable ))
  46. if len (self .skeys )!=len (self ):
  47. self .skeys =sorted (self )[1 :]
  48. stack =deque ([("",chr (255 ))])
  49. k =0
  50. for p in self .skeys :
  51. while stack [-1 ][1 ]<=p :
  52. q ,qs =stack .pop ()
  53. k =self ._forward (q ,k ,qs ,strings )
  54. k =self ._forward (stack [-1 ][0 ],k ,p ,strings )
  55. stack .append ((p ,p [:-1 ]+chr (ord (p [-1 ])+1 )))
  56. while stack :
  57. q ,qs =stack .pop ()
  58. k =self ._forward (q ,k ,qs ,strings )
  59.  
  60. def _forward (self ,q ,k ,qs ,strings ):
  61. ks =bisect_left (strings ,qs ,k )
  62. self [q ].update (strings [i ]for i in xrange (k ,ks ))
  63. return ks
  64.  
  65. if __name__ =="__main__":
  66. grouper =Grouper (["/usr","/usr/lib","hello","heat"])
  67. grouper .add_strings ([
  68. "/usr/bin/python","/usr/lib/python2.5","/dev",
  69. "hello world","hello","heat foo","heat","her",
  70. "/usr/liberation",
  71. ])
  72. for k ,s in grouper .items ():
  73. print ("%s: %s"%(repr (k ),str (s )))
  74. print "Adding more strings"
  75. grouper .add_strings ([
  76. "hello fred","/usrssss",
  77. ])
  78. for k ,s in grouper .items ():
  79. print ("%s: %s"%(repr (k ),str (s )))

Message:


Thread Tools Search this Thread



Tag cloud for Python
About Us | Contact Us | Advertise | DaniWeb | Acceptable Use Policy | RSS Feed

©2003 - 2009 DaniWeb® LLC