Removing duplicates from list in slightly different case


dilipkk
#1, Mar 28th, 2009
For an application, I need to parse a string that contains URLs and their titles.
For example:
'name="My Mobile Blog" url="http://caydab565.blogspot.com/" name="Creative Disaster" url="http://kevinlara.blogspot.com/" ...'
Here, name is the title of the URL.
I want a list of strings, each containing both a title and its URL.
For the string above, that would be:
['name="My Mobile Blog" url="http://caydab565.blogspot.com/"', 'name="Creative Disaster" url="http://kevinlara.blogspot.com/"']
That part is simple, and I know how to do it using the re module.
What I actually want is the same kind of list, but with unique titles.
For example, given:
'name="Creative Disaster" url="http://abc122.blogspot.com/" name="My Mobile Blog" url="http://caydab565.blogspot.com/" name="Creative Disaster" url="http://kevinlara.blogspot.com/" ...'
I want a list like this:
['name="Creative Disaster" url="http://abc122.blogspot.com/"', 'name="My Mobile Blog" url="http://caydab565.blogspot.com/"']
Can anyone help with this?
Thanks in advance.

Dilip Kumar Kola
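
A minimal sketch of what the question asks for, assuming every entry has exactly the form name="..." url="..." with no double quotes inside the values, and that the first URL seen for a title is the one to keep (as in the expected output above). The names ENTRY and unique_by_title are made up for this illustration.

```python
import re

# Assumed entry format: name="<title>" url="<url>", no double quotes inside values.
ENTRY = re.compile(r'name="([^"]*)"\s+url="([^"]*)"')

def unique_by_title(data):
    """Return the 'name=... url=...' substrings, keeping the first one per title."""
    seen = set()
    result = []
    for match in ENTRY.finditer(data):
        title = match.group(1)
        if title not in seen:
            seen.add(title)
            result.append(match.group(0))  # the whole matched entry
    return result

data = ('name="Creative Disaster" url="http://abc122.blogspot.com/" '
        'name="My Mobile Blog" url="http://caydab565.blogspot.com/" '
        'name="Creative Disaster" url="http://kevinlara.blogspot.com/"')

for entry in unique_by_title(data):
    print(entry)
```

The set only tracks which titles have already appeared; the list preserves the order in which entries were first seen.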

Gribouillis
#2, Mar 29th, 2009
I think this function should help you:

    import re

    # A key is a run of word characters immediately followed by '='.
    keyPatt = re.compile(r"\b\w+=")

    testData = 'name="My Mobile Blog" url="http://caydab565.blogspot.com/" name="Creative Disaster" url="http://kevinlara.blogspot.com/" ...'

    def gen_pairs(dataString):
        key, pos = None, 0
        for match in keyPatt.finditer(dataString):
            startPos, endPos = match.span()
            if key is not None:
                # The previous value runs up to the start of this key.
                value = dataString[pos:startPos].strip()
                yield (key, value)
            key, pos = dataString[startPos:endPos-1], endPos
        if key is not None:
            # The last value runs to the end of the string.
            value = dataString[pos:].strip()
            yield (key, value)

    for item in gen_pairs(testData):
        print(item)
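
One possible way to go from those (key, value) pairs to a list with unique titles, sketched under the assumption that the pairs alternate name, url as in testData. first_url_per_name is a hypothetical helper, not part of the function above; it accepts any iterable of pairs, so the output of gen_pairs could be passed in directly.

```python
def first_url_per_name(pairs):
    """Pair each 'name' value with the 'url' value that follows it; first title wins."""
    seen = set()
    result = []
    current_name = None
    for key, value in pairs:
        if key == 'name':
            current_name = value
        elif key == 'url' and current_name is not None:
            if current_name not in seen:
                seen.add(current_name)
                result.append((current_name, value))
            current_name = None
    return result

# Pairs in the shape gen_pairs yields (values keep their surrounding quotes).
pairs = [
    ('name', '"Creative Disaster"'), ('url', '"http://abc122.blogspot.com/"'),
    ('name', '"My Mobile Blog"'), ('url', '"http://caydab565.blogspot.com/"'),
    ('name', '"Creative Disaster"'), ('url', '"http://kevinlara.blogspot.com/"'),
]
print(first_url_per_name(pairs))
```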

dilipkk
#3, Mar 29th, 2009
Thanks for replying, Gribouillis.

I find your solution a little difficult to understand.

I found a solution myself:

    # Let's say I already have a list of strings extracted from a big string
    # containing many URLs and titles.
    strings = ['name="Creative Disaster" url="http://kevinlara121.blogspot.com/"',
               'name="My Mobile Blog" url="http://caydab565.blogspot.com/"',
               'name="Creative Disaster" url="http://kevinlara.blogspot.com/"']
    d = {}; f = {}
    for string in strings:
        index = string.find('url="')
        # The title sits between 'name="' and the '" ' before url; [:-1] drops the URL's closing quote.
        d[string[6:index-2]] = string[index+5:-1]  # a repeated title keeps its last URL
    for t, u in d.items():
        f[u] = t  # invert, so each URL also appears only once
    strings = [(t, u) for u, t in f.items()]
    """
    strings = [('Creative Disaster', 'http://kevinlara.blogspot.com/'), ('My Mobile Blog', 'http://caydab565.blogspot.com/')]
    """
Last edited by dilipkk; Mar 29th, 2009 at 3:02 am.
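
Plain dictionary assignment keeps the last URL seen for a repeated title, whereas the expected output in the first post keeps the first one. A small sketch of the difference, using dict.setdefault; the (title, url) tuples are assumed to have been extracted already, and the variable names are only illustrative.

```python
entries = [
    ('Creative Disaster', 'http://kevinlara121.blogspot.com/'),
    ('My Mobile Blog', 'http://caydab565.blogspot.com/'),
    ('Creative Disaster', 'http://kevinlara.blogspot.com/'),
]

last_wins = {}
first_wins = {}
for title, url in entries:
    last_wins[title] = url             # overwrites an earlier URL for the same title
    first_wins.setdefault(title, url)  # stores only if the title is not present yet

print(last_wins['Creative Disaster'])
print(first_wins['Creative Disaster'])
```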

Gribouillis
#4, Mar 29th, 2009
I see. In fact, I wrote a function that can handle general data of the form
    'key1=value1 key2=value2 key3=value3'
It only assumes that the values don't contain the '=' sign and that the keys are made of one or more alphanumeric characters (or underscores, since the pattern uses \w).
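
To illustrate that assumption: a value containing '=' (for instance a URL with a query string) gets split at the extra key-like token. The sketch below shows this with the same pattern, next to one possible alternative that relies on the values being double-quoted. The example.com URL and quotedPatt are invented for the illustration.

```python
import re

keyPatt = re.compile(r"\b\w+=")               # pattern from the function above
quotedPatt = re.compile(r'(\w+)="([^"]*)"')   # assumes double-quoted values

data = 'name="Search" url="http://example.com/?q=python"'

# keyPatt also matches 'q=' inside the URL, so the value would be cut there.
print([m.group() for m in keyPatt.finditer(data)])

# Matching the quotes keeps the whole URL together.
print(quotedPatt.findall(data))
```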