Extracting values from a regex match

Question

Tom_45 0 Newbie Poster

1 Year Ago

I am trying to extract three values from the td tags in an html downloaded file.

<tr align="right"><td>236</td><td>Roy</td><td>Allyson</td>
<tr align="right"><td>237</td><td>Marvin</td><td>Pamela</td>
<tr align="right"><td>238</td><td>Micah</td><td>Kristine</td>
<tr align="right"><td>239</td><td>Collin</td><td>Raquel</td>

I am using the pattern match = re.findall(r'<td.?>([\d+])([.?])*<\/td>', file)

The file is created with a read() statement.

The output should look like

(236, "Roy", "Allyson")
(237, "Marvin", "Pamela")
(238, "Micah", "Kristine")
(239, "Collin", "Raquel")

What I get is

(236, "")
(237, "")
(238, "")
(239, "")

I've tried different variations of the same pattern and get

('236', '23', '6')
('Roy', '', 'Roy')
('Allyson', '', 'Allyson')
('237', '23', '7')
('Marvin', '', 'Marvin')
('Pamela', '', 'Pamela')
('238', '23', '8')
('Micah', '', 'Micah')
('Kristine', '', 'Kristine')
('239', '23', '9')
('Collin', '', 'Collin')
('Raquel', '', 'Raquel')

I'm relatively new to regular expressions so be gentle, but any help would
be appreciated.

PS: I'm using Python

python regex

Edited 1 Year Ago by Tom_45

4 Contributors
5 Replies
190 Views
1 Week Discussion Span
Latest Post 1 Year Ago Latest Post by Tom_45

All 5 Replies

Reverend Jim 5,259 Hi, I'm Jim, one of DaniWeb's moderators.

1 Year Ago

The trick is to use lazy matching (the ? after .+), which matches the shortest possible string.

import re

html = '<tr align="right"><td>236</td><td>Roy</td><td>Allyson</td>'
pat = r'<td>(.+?)</td>'

then

re.split(pat,html)

returns

['<tr align="right">', '236', '', 'Roy', '', 'Allyson', '']

and

re.split(pat,html)[1::2]

returns

['236', 'Roy', 'Allyson']
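
For anyone who would rather skip the slicing step, here is a minimal sketch of the same lazy pattern used with re.findall, one line at a time. The two-row sample string and the names cell, rows, and greedy are assumptions made up for illustration; they are not from the original post.

```python
import re

# Two sample rows in the same shape as the question's data (illustrative only).
html = '''<tr align="right"><td>236</td><td>Roy</td><td>Allyson</td>
<tr align="right"><td>237</td><td>Marvin</td><td>Pamela</td>'''

# Lazy quantifier: .+? stops at the first </td> instead of the last one.
cell = re.compile(r'<td>(.+?)</td>')

rows = []
for line in html.splitlines():
    cells = cell.findall(line)
    if len(cells) == 3:  # only keep lines with exactly three cells
        number, first, second = cells
        rows.append((int(number), first, second))

for row in rows:
    print(row)

# For comparison, the greedy form swallows everything up to the last </td>.
greedy = re.findall(r'<td>(.+)</td>', html.splitlines()[0])
print(greedy)
```

The greedy version returns a single string that spans all three cells, which is why the ? matters here.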

Edited 1 Year Ago by Reverend Jim

Tom_45 commented: Hey Jim, thanks for the advice. I did finally get the results I was looking for using the pattern '<td>(\d+)+<\/td><td>(\w+)<\/td><td>(\w+)'. +0

pritaeas 2,211 ¯\_(ツ)_/¯

1 Year Ago

Sidenote: If you want to learn, understand and experiment with regexes I can highly recommend RegexBuddy.

score 0 · Answer 1 · 2024-02-12T10:56:53+00:00

Reverend Jim 5,259 Hi, I'm Jim, one of DaniWeb's moderators.

1 Year Ago

Also autoregex

AndreRet 526 Senior Poster · Answer 2 · 2024-02-16T20:18:25+00:00

Same question, different post - Extracting values from capturing groups in rege

Tom_45 0 Newbie Poster Premium Member · Answer 3 · 2024-02-19T11:32:08+00:00

Question has been answered.

The correct pattern is:

matches = re.findall(r'<td>(\d+)+<\/td><td>(\w+)<\/td><td>(\w+)', file)
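
As a rough sketch of how this pattern behaves on the sample rows from the question: the variable names text, original, tidied, and records are assumptions for illustration. In the first attempt, [\d+] is a character class that matches a single character (a digit or a literal +), and [.?] matches a literal dot or question mark, so neither acts as a quantifier. The tidied variant below is one possible simplification, not the only one: the outer + on (\d+)+ is redundant, and / does not need escaping in Python.

```python
import re

# Sample rows copied from the question; `text` stands in for the file contents.
text = '''<tr align="right"><td>236</td><td>Roy</td><td>Allyson</td>
<tr align="right"><td>237</td><td>Marvin</td><td>Pamela</td>
<tr align="right"><td>238</td><td>Micah</td><td>Kristine</td>
<tr align="right"><td>239</td><td>Collin</td><td>Raquel</td>'''

# The pattern as posted in the thread.
original = r'<td>(\d+)+<\/td><td>(\w+)<\/td><td>(\w+)'

# A tidied variant: no nested quantifier, no escaped slashes, closing tag included.
tidied = r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>'

matches = re.findall(tidied, text)

# Both patterns capture the same groups on this sample.
same_as_original = matches == re.findall(original, text)

# findall returns strings; convert the first field to int to get (236, "Roy", "Allyson").
records = [(int(number), first, second) for number, first, second in matches]

for record in records:
    print(record)
```

Note that \w+ only covers letters, digits, and underscores, so names containing hyphens, apostrophes, or spaces would need a wider class such as [^<]+.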
