| | |
Grouping strings by prefixes
Please support our Python advertiser: Programming Forums - DaniWeb Sister Site
This snippet defines a class which can group a collection of strings according to a given set of prefixes. Each string goes in the group of the longest prefix it contains.
#!/usr/bin/env python # Gribouillis at www.daniweb.com """ The class Grouper implements grouping strings by prefixes. A Grouper object is created with a sequence of prefixes G = Grouper(["pref1", "pref2", ...]) (an empty prefix is automatically added). The Grouper object is a dictionary "prefix --> set of strings". Initially, it looks like { "pref1" : set([]), "pref2" : set([]), } The method 'add_strings' can be used to add a sequence of strings to the grouper. Each string goes in one of the sets according to the longest prefix it contains. After G.add_strings(["pref1aaaaa", "pref2jjj", "pref1s"]) it's content will be { "pref1" : set(["pref1aaaaa", "pref1s"]), "pref2" : set(["pref2jjj"]), } More strings can be added. """ from collections import deque from bisect import bisect_left ,bisect_right class Grouper (dict ): def __init__ (self ,iterable ): prefixes =set (iterable ) prefixes .add ("") dict .__init__ (self ,[(p ,set ())for p in prefixes ]) self .skeys =[] def add_strings (self ,iterable ): strings =sorted (set (iterable )) if len (self .skeys )!=len (self ): self .skeys =sorted (self )[1 :] stack =deque ([("",chr (255 ))]) k =0 for p in self .skeys : while stack [-1 ][1 ]<=p : q ,qs =stack .pop () k =self ._forward (q ,k ,qs ,strings ) k =self ._forward (stack [-1 ][0 ],k ,p ,strings ) stack .append ((p ,p [:-1 ]+chr (ord (p [-1 ])+1 ))) while stack : q ,qs =stack .pop () k =self ._forward (q ,k ,qs ,strings ) def _forward (self ,q ,k ,qs ,strings ): ks =bisect_left (strings ,qs ,k ) self [q ].update (strings [i ]for i in xrange (k ,ks )) return ks if __name__ =="__main__": grouper =Grouper (["/usr","/usr/lib","hello","heat"]) grouper .add_strings ([ "/usr/bin/python","/usr/lib/python2.5","/dev", "hello world","hello","heat foo","heat","her", "/usr/liberation", ]) for k ,s in grouper .items (): print ("%s: %s"%(repr (k ),str (s ))) print "Adding more strings" grouper .add_strings ([ "hello fred","/usrssss", ]) for k ,s in grouper .items (): print ("%s: %s"%(repr (k ),str (s )))
Similar Threads
- Doubt with C++ file name prefixes (C++)
- Fast Lookups for Mobile Number Prefixes. (Java)
- Need help with grouping (MySQL)
- Grouping question (MySQL)
- Please Help me on Data Grouping in VB 6.0 (Visual Basic 4 / 5 / 6)
| Thread Tools | Search this Thread |
Tag cloud for Python
advanced anydbm app bash beginner bits calling chmod cmd code data dictionary directory dynamic edit examples excel external feet file float format ftp function gui halp homework http images import info input ip itunes java keycontrol line linux list lists loan loop maintain millimeter mouse newb number numbers output panel parsing path port prime print program programming projects push py-mailer py2exe pygame pyqt python queue random recursion recursive scrolledtext smtp split ssh string strings sudokusolver table terminal text thread threading time tkinter tlapse tricks tuple tutorial ubuntu unicode update urllib urllib2 variable variables ventrilo web webservice whileloop windows wxpython xlib



