Hey,

Does anyone know how to get only the language IDs from the Google Chrome site ("view-source:https://www.google.com/chrome?hl=en-GB") using this regex: "<option value=""([a-zA-Z])"">[&#0-9; a-zA-Z()]</option>"?


I have tried the following, but it did not work:

def getsourcecode():
  url ="https://www.google.com/chrome?hl=da"
  req = urllib2.Request(url, None)
  source_code = urllib2.urlopen(req).read()
  #return (source_code)

  for line in getsourcecode: 
  matchObj = re.match(r"<option value=""([a-zA-Z]*)"">[&#0-9; a-zA-Z()]*</option>", line, re.M|re.I)

  if matchObj:
    print "matchObj.group(1) : ", matchObj.group(1)

  else:
    print "No match!!"

You can't double the double quotes like this:

>>> r"<option value=""([a-zA-Z])"">[&#0-9; a-zA-Z()]</option>" # bad
'<option value=([a-zA-Z])>[&#0-9; a-zA-Z()]</option>'
>>> r'<option value="([a-zA-Z])">[&#0-9; a-zA-Z()]</option>' # good
'<option value="([a-zA-Z])">[&#0-9; a-zA-Z()]</option>'

Use Kodos to debug regexes.

Edit: in Python, adjacent string literals are concatenated, so r"foo""bar""baz" is the same as r"foo" + "bar" + "baz".
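
A quick way to confirm this behaviour is to compare the doubled-quote literal with the explicit concatenation. This is only a small sketch; the variable names are made up for illustration.

```python
# Adjacent string literals are joined by the compiler, so each doubled
# quote just ends one literal and starts the next one.
doubled = r"<option value=""([a-zA-Z])"">"
explicit = r"<option value=" + "([a-zA-Z])" + ">"
single_quoted = r'<option value="([a-zA-Z])">'

print(doubled == explicit)    # both spellings build the same string
print('"' in doubled)         # checks whether any double quote survived
print('"' in single_quoted)   # the single-quoted form keeps its quotes
```

Because the quotes disappear from the doubled version, the resulting pattern can never match markup that contains value="...".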

My mistake. I just tried it and it still didn't work, even though I have tested the regex and it works.

def getsourcecode():
  url ="https://www.google.com/chrome?hl=da"
  req = urllib2.Request(url, None)
  source_code = urllib2.urlopen(req).read()
  #return (source_code)

  for line in getsourcecode: 
  matchObj = re.match(r"<option value="([a-zA-Z])">[&#0-9; a-zA-Z()]</option>", line)

  if matchObj:
    print "matchObj.group(1) : ", matchObj.group(1)

  else:
    print "No match!!"
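
For comparison, here is a minimal sketch of one way the extraction could work. It makes several assumptions: the SAMPLE_HTML markup is invented for illustration and the real page may look different, the names extract_lang_ids and OPTION_RE are hypothetical, and the pattern is loosened to [a-zA-Z-]+ so that IDs such as "en-GB" can match. It also uses findall rather than re.match, since re.match only tries to match at the very start of the string.

```python
import re

# Hypothetical sample of the language selector markup; the real page may differ.
SAMPLE_HTML = """
<select id="language-selector">
<option value="da">Dansk</option>
<option value="en-GB" selected>English (UK)</option>
<option value="pt-BR">Portugu&#234;s (Brasil)</option>
</select>
"""

# Single-quoted raw string, so the double quotes stay inside the pattern.
# [a-zA-Z-]+ accepts IDs of any length, including a region suffix.
OPTION_RE = re.compile(r'<option value="([a-zA-Z-]+)"[^>]*>')


def extract_lang_ids(html):
    """Return every language ID found in <option value="..."> tags."""
    # findall scans the whole text and returns the captured group of each hit.
    return OPTION_RE.findall(html)


print(extract_lang_ids(SAMPLE_HTML))
```

To run this against the live page, the string returned by urllib2.urlopen(url).read() (or urllib.request.urlopen(url).read().decode("utf-8") on Python 3) could be passed to extract_lang_ids in place of the sample. Note that the function has to be called, and the page source handed over as one string, rather than looping over the function object itself.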