Regular expression class with child/parent

Question

elpablo 0 Newbie Poster

16 Years Ago

Hi,

I'm trying to figure out the best way to design a regular expression class whose instances can have child or parent expressions.

I want to develop a generic regex extractor for text files.

Example :
- An HTML file has a table
- Each table has some data (let's say classes)
- Each class has some properties.
- Each property can hold multiple values (an array).
- And so on

We need :
- A regex to extract each class, which is a subtable of the main table.
- Regexes for each property; the properties are rows.
- Regexes for each value in an array.

You see the pattern. So I need a recursive class, or something like that.

Does someone have an idea of what could be a good design?

Thanks

regex

3 Contributors
9 Replies
166 Views
2 Days Discussion Span
Latest Post 16 Years Ago by elpablo

All 9 Replies

Rashakil Fol 978 Super Senior Demiposter

16 Years Ago

You see the pattern. So I need a recursive class, or something like that.

What the fuck is a recursive class? What the fuck are you talking about?

Rashakil Fol 978 Super Senior Demiposter

16 Years Ago

Rashakil: please stay polite; your answer is very unprofessional.

I am a professional programmer, so that means my answer is by definition professional :P

Rashakil Fol 978 Super Senior Demiposter

16 Years Ago

I think it would help if you specified more precisely what you expect the input text to be, and gave examples.


LizR 171 Posting Virtuoso · Answer 1 · 2009-01-28T23:48:40+00:00

So, is a class represented by a row in the table? If so, it's easy: just look for the <tr> tags.
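As a rough illustration of the "look for the tr tags" idea, here is a minimal Python sketch. The names `ROW_RE`, `CELL_RE`, `extract_rows` and the sample markup are made up for this example, and it assumes flat, well-formed rows. A non-greedy pattern like this stops at the first `</tr>`, so it would not cope with tables nested inside rows.

```python
import re

# Non-greedy capture of everything between <tr ...> and the next </tr>.
# Assumption: rows are not nested inside other rows.
ROW_RE = re.compile(r"<tr\b[^>]*>(.*?)</tr>", re.IGNORECASE | re.DOTALL)
CELL_RE = re.compile(r"<td\b[^>]*>(.*?)</td>", re.IGNORECASE | re.DOTALL)

def extract_rows(html):
    """Return one list of cell strings per <tr> row found in html."""
    return [
        [cell.strip() for cell in CELL_RE.findall(row)]
        for row in ROW_RE.findall(html)
    ]

# Hypothetical sample input, not taken from any real page.
sample = """
<table>
  <tr><td>Name</td><td>Widget</td></tr>
  <tr><td>Sizes</td><td>S, M, L</td></tr>
</table>
"""
rows = extract_rows(sample)
```

Each entry of `rows` is the list of cell contents for one row, so a second pass (for example splitting "S, M, L" on commas) could pull out the array values.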

LizR 171 Posting Virtuoso · Answer 2 · 2009-01-29T00:26:28+00:00

Perhaps he means a linked list?

elpablo 0 Newbie Poster · Answer 3 · 2009-01-29T21:29:04+00:00

Perhaps he means a linked list?

Sort of a linked list. It's not only a question of <tr> tags. Inside each <tr> there could be other sets of values I need to extract; inside these values there might be other values, and so on.

So a regular expression could carry a set of other regular expressions.

Algorithm:

matches m_parent = regex_parent.match(text)
foreach (x in m_parent)
{
   load the set of sub_regexes for this parent
   foreach (r in sub_regexes)
   {
      matches m_child = r.match(x)
      ...
      load set of sub_sub...
      ... so on
   }
}
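The pseudocode above can be sketched as a small recursive class. This is only one possible shape, written in Python for brevity; `RegexNode`, its fields, and the sample patterns are assumptions made for the example, not an existing library. Each node applies its own pattern, then hands the inner text of every match to its children.

```python
import re
from dataclasses import dataclass, field

@dataclass
class RegexNode:
    """A named pattern plus the child patterns applied inside each match."""
    name: str
    pattern: str
    children: list = field(default_factory=list)

    def extract(self, text):
        """Return one dict per match; child results are nested under their names."""
        results = []
        for m in re.finditer(self.pattern, text, re.DOTALL):
            # Use group 1 as the inner text when the pattern defines a group,
            # otherwise fall back to the whole match.
            inner = m.group(1) if m.re.groups else m.group(0)
            node = {"text": inner.strip()}
            for child in self.children:
                # Recursion: the child only ever sees its parent's match.
                node[child.name] = child.extract(inner)
            results.append(node)
        return results

# Hypothetical three-level tree: rows -> cells -> comma-separated values.
tree = RegexNode("class", r"<tr>(.*?)</tr>", [
    RegexNode("property", r"<td>(.*?)</td>", [
        RegexNode("value", r"[^,\s][^,]*"),
    ]),
])

html = "<tr><td>a, b</td><td>c</td></tr><tr><td>d</td></tr>"
result = tree.extract(html)
```

Here `result` is a list with one dict per row, each holding a `"property"` list, whose entries in turn hold a `"value"` list. A child named `"text"` would clash with the key used for the match itself, so a real design would probably separate the two.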

One application of this extractor would be extracting the results of a Google query: the page is made of blocks, and each block holds some info.
The same application could work with Yahoo, The Pirate Bay, etc. Only the regex file would need to change.

Rashakil: please stay polite; your answer is very unprofessional.

elpablo 0 Newbie Poster · Answer 4 · 2009-01-29T21:40:04+00:00

I am a professional programmer, so that means my answer is by definition professional :P

Ok then... as a professional answer it wasn't useful.

LizR 171 Posting Virtuoso · Answer 5 · 2009-01-29T22:33:08+00:00

Sort of a linked list. It's not only a question of <tr> tags. Inside each <tr> there could be other sets of values I need to extract; inside these values there might be other values, and so on.

Well, a regular expression would handle that just fine...

elpablo 0 Newbie Poster · Answer 6 · 2009-01-31T12:05:03+00:00

I want to create an application that could extract any structured data. Kind of a generic parser.

Examples :
- Google results
- CNN news
- Forums
- Engadget
- ...

All these websites have structured data, but each of them is structured differently. It could be easy to extract data from them using a structured tree of regular expressions.
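To make the "one tree of regexes per site" idea concrete, here is a hedged Python sketch in which the tree is loaded from a JSON definition rather than hard-coded. The JSON layout (`name`, `pattern`, `children`), the `extract` function, and the sample markup are all invented for this illustration and do not match any real site.

```python
import json
import re

# Hypothetical definition for one site; in practice each site would have
# its own definition file and the extraction code would stay the same.
SITE_DEFINITION = """
{
  "name": "result",
  "pattern": "<li class=\\"hit\\">(.*?)</li>",
  "children": [
    {"name": "title", "pattern": "<h3>(.*?)</h3>"},
    {"name": "link",  "pattern": "href=\\"([^\\"]*)\\""}
  ]
}
"""

def extract(spec, text):
    """Apply spec["pattern"] to text, then each child spec to every match."""
    found = []
    for m in re.finditer(spec["pattern"], text, re.DOTALL):
        # Prefer the first capture group; fall back to the whole match.
        inner = m.group(1) if m.re.groups else m.group(0)
        entry = {"text": inner}
        for child in spec.get("children", []):
            entry[child["name"]] = extract(child, inner)
        found.append(entry)
    return found

# Made-up page fragment standing in for a results page.
page = (
    '<li class="hit"><h3>First</h3><a href="http://example.com/1">go</a></li>'
    '<li class="hit"><h3>Second</h3><a href="http://example.com/2">go</a></li>'
)
spec = json.loads(SITE_DEFINITION)
hits = extract(spec, page)
titles = [h["title"][0]["text"] for h in hits]
links = [h["link"][0]["text"] for h in hits]
```

Supporting another site would then mean writing a new definition and leaving `extract` untouched, which is the point of keeping the patterns in data.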
