Removing duplicates from list in slightly different case

Question

dilipkk 0 Newbie Poster

16 Years Ago

For an application, I need to parse a string which contains urls and their titles.
For example:
'name="My Mobile Blog" url="http://caydab565.blogspot.com/" name="Creative Disaster" url="http://kevinlara.blogspot.com/" ...'
Here, name means the title of the url.
I want a list of strings, each containing both a title and its url.
For example:
for the above string
This is very simple, and I know how to do it using the re module.
What I actually want is a list of strings like the above, but with unique titles.
For example:
'name="Creative Disaster" url="http://abc122.blogspot.com/" name="My Mobile Blog" url="http://caydab565.blogspot.com/" name="Creative Disaster" url="http://kevinlara.blogspot.com/" ...'
From the above string I want a list of strings like below:

Can anyone help with this?
Thanks in advance.

Dilip Kumar Kola

python


All 3 Replies

Gribouillis 1,391 Programming Explorer

16 Years Ago

I think this function should help you

import re

keyPatt = re.compile(r"\b\w+=")

testData='name="My Mobile Blog" url="http://caydab565.blogspot.com/" name="Creative Disaster" url="http://kevinlara.blogspot.com/" ...'

def gen_pairs(dataString):
  key, pos = None, 0
  for match in keyPatt.finditer(dataString):
    startPos, endPos = match.span()
    if key is not None:
      value = dataString[pos:startPos].strip()
      yield (key, value)
    key, pos = dataString[startPos:endPos-1], endPos
  if key is not None:
    value = dataString[pos:].strip()
    yield (key, value)

for item in gen_pairs(testData):
  print(item)
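
For readers who want to go from (key, value) pairs of this shape to one entry per title, here is a minimal sketch. The helper name unique_by_title is hypothetical, the sample pairs are typed in by hand from the three-blog example in the question, and it assumes that a later url should replace an earlier one for the same title. The result is sorted only to make the order predictable.

```python
def unique_by_title(pairs):
    """Collapse (key, value) pairs into (title, url) tuples, one per title."""
    by_title = {}
    title = None
    for key, value in pairs:
        value = value.strip('"')  # values arrive still wrapped in double quotes
        if key == "name":
            title = value
        elif key == "url" and title is not None:
            by_title[title] = value  # assumption: a later url replaces an earlier one
            title = None
    return sorted(by_title.items())

# Hand-written sample in the (key, value) shape, taken from the question's example
pairs = [
    ("name", '"Creative Disaster"'), ("url", '"http://abc122.blogspot.com/"'),
    ("name", '"My Mobile Blog"'), ("url", '"http://caydab565.blogspot.com/"'),
    ("name", '"Creative Disaster"'), ("url", '"http://kevinlara.blogspot.com/"'),
]
result = unique_by_title(pairs)
print(result)
```

If the first url for a title should be kept instead, by_title.setdefault(title, value) would be one way to do that.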


dilipkk 0 Newbie Poster · Answer 1 · 2009-03-29T12:01:00+00:00

Thanks for replying, Gribouillis.

I find your solution a little difficult to understand.

I found a solution myself:

# Let's say I already have a list of strings extracted from a big string containing many urls and titles
strings = ['name="Creative Disaster" url="http://kevinlara121.blogspot.com/"','name="My Mobile Blog" url="http://caydab565.blogspot.com/"','name="Creative Disaster" url="http://kevinlara.blogspot.com/"']
d={}; f={}
for string in strings:
     index=string.find('url="')
     # the title sits between 'name="' (6 characters) and the '" ' just before 'url="'
     d[string[6:index-2]]=string[index+5:-1]  # -1 drops the closing quote of the url
for t,u in d.items():
     f[u]=t  # inverting the dict keeps the urls unique as well
strings=[(t,u) for u,t in f.items()]  # back to (title, url) order
"""
strings = [('Creative Disaster', 'http://kevinlara.blogspot.com/'),('My Mobile Blog' , 'http://caydab565.blogspot.com/')]
"""

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 2 · 2009-03-29T12:22:42+00:00

I see. In fact, I wrote a function that can handle general data of the form

'key1=value1 key2=value2  key3=value3'

It only assumes that the values don't contain the '=' sign and that the keys are made of one or more alphanumeric characters.
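
To illustrate why the '=' restriction matters, here is a small sketch with made-up data (the example.com url is not from the thread). A key-only pattern such as \b\w+= also matches 'id=' inside a url query string. One possible alternative, assuming every value is wrapped in double quotes and contains no double quote itself, is to match the quotes explicitly; the name quoted_pair is hypothetical.

```python
import re

# Assumption: every value is wrapped in double quotes and contains no '"'.
quoted_pair = re.compile(r'(\w+)="([^"]*)"')

sample = 'name="A Blog" url="http://example.com/?id=5"'  # made-up data

# A key-only pattern also fires on 'id=' inside the url value.
key_hits = re.findall(r"\b\w+=", sample)

# Matching the surrounding quotes keeps the whole url together.
pairs = quoted_pair.findall(sample)

print(key_hits)
print(pairs)
```

The trade-off is that this version no longer handles unquoted values such as key1=value1.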
