For an application, I need to parse a string that contains urls and their titles.
For example:
'name="My Mobile Blog" url="" name="Creative Disaster" url="" ...'
Here, name is the title of the url.
I want a list of strings, each containing both a title and a url.
For example:
for the above string
This is very simple, and I know how to do it using the re module.
What I actually want is a list of strings like the above, but with unique titles.
For example:
'name="Creative Disaster" url="" name="My Mobile Blog" url="" name="Creative Disaster" url="" ...'
From the above string I want a list of strings like the one below:

Can anyone help with this?
Thanks in advance.

Dilip Kumar Kola

I think this function should help you

import re

keyPatt = re.compile(r"\b\w+=")

testData='name="My Mobile Blog" url="" name="Creative Disaster" url="" ...'

def gen_pairs(dataString):
  key, pos = None, 0
  for match in keyPatt.finditer(dataString):
    startPos, endPos = match.span()
    if key is not None:
      value = dataString[pos:startPos].strip()
      yield (key, value)
    key, pos = dataString[startPos:endPos-1], endPos
  if key is not None:
    value = dataString[pos:].strip()
    yield (key, value)

for item in gen_pairs(testData):
  print(item)
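
The (key, value) pairs produced by gen_pairs could be folded into unique (title, url) tuples along the following lines. This is only a sketch: unique_titles is a hypothetical helper that does not appear in the thread, gen_pairs is repeated (with renamed variables) so the block runs on its own, and the sketch assumes every url follows its name and that the first url seen for a title is the one to keep.

```python
import re

KEY_PATTERN = re.compile(r"\b\w+=")

def gen_pairs(data_string):
    """Yield (key, value) pairs; same logic as the function above."""
    key, pos = None, 0
    for match in KEY_PATTERN.finditer(data_string):
        start, end = match.span()
        if key is not None:
            yield key, data_string[pos:start].strip()
        key, pos = data_string[start:end - 1], end
    if key is not None:
        yield key, data_string[pos:].strip()

def unique_titles(data_string):
    """Hypothetical helper: return (title, url) tuples with unique titles."""
    seen = {}      # title -> first url seen for that title
    title = None   # most recent name that has not yet been paired with a url
    for key, value in gen_pairs(data_string):
        value = value.strip('"')  # gen_pairs keeps the surrounding quotes
        if key == "name":
            title = value
        elif key == "url" and title is not None:
            seen.setdefault(title, value)  # assumption: the first url wins
            title = None
    return list(seen.items())

sample = ('name="Creative Disaster" url="" name="My Mobile Blog" url="" '
          'name="Creative Disaster" url=""')
print(unique_titles(sample))
```

Since Python 3.7 dictionaries keep insertion order, so the titles come back in the order they first appear in the string.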

Thanks for replying, Gribouillis.

I find your solution a little difficult to understand.

I found a solution myself:

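import re
# Let's say I already have a list of strings taken from a big string containing many urls and titles
strings = ['name="Creative Disaster" url=""','name="My Mobile Blog" url=""','name="Creative Disaster" url=""']
d = {}
for string in strings:
  t, u = re.findall(r'"(.*?)"', string)  # assumed step: the two quoted parts are title and url
  d[t] = u  # the dict keeps one entry per title
strings = []
for t, u in d.items():
  strings.append((t, u))
# strings is now [('Creative Disaster', ''), ('My Mobile Blog', '')]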

I see. In fact I wrote a function which can handle general data having the form

'key1=value1 key2=value2  key3=value3'

It only supposes that the values don't contain the '=' sign and that the keys are made of one or more alphanumeric characters.
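
To illustrate that format and its limitation, here is a minimal sketch that gets the same kind of pairs with re.split instead of the generator. split_pairs is a hypothetical name, and the example.com url is made up for the demonstration.

```python
import re

def split_pairs(data_string):
    """Hypothetical helper: split 'key1=value1 key2=value2' into (key, value) pairs."""
    # With a capturing group, re.split returns the keys along with the text between them:
    # ['', 'key1', 'value1 ', 'key2', 'value2  ', 'key3', 'value3']
    parts = re.split(r"\b(\w+)=", data_string)
    # parts[0] is whatever precedes the first key; after that, keys and values alternate.
    return [(key, value.strip()) for key, value in zip(parts[1::2], parts[2::2])]

print(split_pairs("key1=value1 key2=value2  key3=value3"))
# A value containing '=' breaks the assumption: 'q=' is treated as a new key.
print(split_pairs("url=http://example.com/?q=1 name=foo"))
```

The second call shows why the values must not contain '=': the url is cut at '?' and 'q' shows up as a key of its own.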
