Regular expression class with child/parent
Join Date: May 2005
Posts: 7
Solved Threads: 0
Hi,
I'm trying to figure out the best way to design a regular expression class that can have a child or a parent.
I want to develop a generic regex extractor for text files.
Example:
- An HTML file has a table.
- Each table has some data (let's say classes).
- Each class has some properties.
- Each property can have multiple values (an array).
- And so on.
We need:
- A regex to extract each class, which is a subtable in the main table.
- Regexes for each property; the properties are rows.
- Regexes for each value in an array.
You see the scheme. So I need a recursive class or something like that.
Does someone have an idea of what could be a good design?
Thanks
Join Date: May 2005
Posts: 7
Solved Threads: 0
Sort of a linked list. It's not only a question of <tr> tags. Inside each <tr> there could be other sets of values I need to extract, and inside these values there might be other values, and so on.
So a regular expression could carry a set of other regular expressions.
Algorithm:

    // Pseudocode: match the parent regex, then run every child regex on each parent match.
    MatchCollection parentMatches = parentRegex.Matches(text);
    foreach (Match x in parentMatches)
    {
        // load the set of sub-regexes
        foreach (Regex r in subRegexes)
        {
            MatchCollection childMatches = r.Matches(x.Value);
            // ... load the set of sub-sub-regexes
            // ... and so on
        }
    }

One application of this extractor would be extracting the results of a Google query. There are blocks of pages, and in each block there's some info.
The same application could work with Yahoo, The Pirate Bay, etc. Only the regex file would need to change.
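The nested loops above could be folded into one recursive class. This is only a minimal sketch: the `RegexNode` and `ExtractedNode` names, the convention of handing capture group 1 down to the children, and the use of `RegexOptions.Singleline` are assumptions made for illustration, not something from the thread. Also note that a lazy pattern such as `<table>(.*?)</table>` stops at the first closing tag, so directly nested tags of the same name would not be handled this way.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical result type: one matched value plus whatever the child regexes found inside it.
public class ExtractedNode
{
    public string Name { get; }
    public string Value { get; }
    public List<ExtractedNode> Children { get; } = new List<ExtractedNode>();

    public ExtractedNode(string name, string value)
    {
        Name = name;
        Value = value;
    }
}

// Hypothetical node type: one regex plus the child regexes that run on each of its matches.
public class RegexNode
{
    public string Name { get; }
    public Regex Pattern { get; }
    public RegexNode Parent { get; private set; }
    public List<RegexNode> Children { get; } = new List<RegexNode>();

    public RegexNode(string name, string pattern)
    {
        Name = name;
        // Singleline lets '.' cross line breaks, which HTML blocks usually need.
        Pattern = new Regex(pattern, RegexOptions.Singleline);
    }

    // Attaches a child and returns this node so calls can be chained.
    public RegexNode Add(RegexNode child)
    {
        child.Parent = this;
        Children.Add(child);
        return this;
    }

    // Matches this node's regex against the text, then recurses into each match.
    public List<ExtractedNode> Extract(string text)
    {
        var results = new List<ExtractedNode>();
        foreach (Match m in Pattern.Matches(text))
        {
            // Assumption: use capture group 1 when the pattern defines one, else the whole match.
            string value = m.Groups.Count > 1 ? m.Groups[1].Value : m.Value;
            var node = new ExtractedNode(Name, value);
            foreach (RegexNode child in Children)
            {
                node.Children.AddRange(child.Extract(value));
            }
            results.Add(node);
        }
        return results;
    }
}
```

For example, `new RegexNode("row", "<tr>(.*?)</tr>").Add(new RegexNode("cell", "<td>(.*?)</td>"))` builds a two-level tree, and calling `Extract` on it returns one `ExtractedNode` per row with its cells as children. Deeper levels are just more `Add` calls on the child nodes.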
Rashakil: Please stay polite; your answer is very unprofessional.
Join Date: Aug 2008
Posts: 1,735
Solved Threads: 186
Did I just hear "You gotta help us, Doc. We've tried nothin' and we're all out of ideas"? Is this you? Don't let this be you! I will put in as much effort as you seem to.
Join Date: May 2005
Posts: 7
Solved Threads: 0
I want to create an application that could extract any structured data, a kind of generic parser.
Examples:
- Google results
- CNN news
- Forums
- Engadget
- ...
All these websites have structured data, but each of them is structured differently. It would be easy to extract data from them using a structured tree of regular expressions.
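To make "one tree of regexes per website" concrete, here is a rough sketch of loading such a tree from text. The definition format is entirely made up for this example (one `name<TAB>pattern` entry per line, two leading spaces per nesting level), and `PatternDef` and `PatternFile` are hypothetical names; a real tool might just as well use XML or JSON.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical tree node: a named regex and the regexes nested under it.
public class PatternDef
{
    public string Name;
    public Regex Pattern;
    public List<PatternDef> Children = new List<PatternDef>();
}

public static class PatternFile
{
    // Parses the made-up format: "name<TAB>pattern" per line, two spaces of indent per level.
    public static List<PatternDef> Parse(string definition)
    {
        var roots = new List<PatternDef>();
        var stack = new List<PatternDef>(); // stack[d] is the most recent node at depth d

        foreach (string raw in definition.Split('\n'))
        {
            string line = raw.TrimEnd('\r');
            if (line.Trim().Length == 0) continue; // skip blank lines

            int indent = line.Length - line.TrimStart(' ').Length;
            int depth = indent / 2;
            if (depth > stack.Count)
                throw new FormatException("Indentation skips a level: " + line);

            string[] parts = line.Trim().Split(new[] { '\t' }, 2);
            if (parts.Length != 2)
                throw new FormatException("Expected name<TAB>pattern: " + line);

            var def = new PatternDef
            {
                Name = parts[0],
                Pattern = new Regex(parts[1], RegexOptions.Singleline)
            };

            if (depth == 0) roots.Add(def);
            else stack[depth - 1].Children.Add(def);

            // Drop deeper levels left over from the previous branch, then record this node.
            stack.RemoveRange(depth, stack.Count - depth);
            stack.Add(def);
        }
        return roots;
    }
}
```

With this, each site would get its own definition text, for instance a `result` entry with `title` and `link` entries indented beneath it, and the extraction code itself would stay the same from site to site. The patterns you put in the file depend on each site's actual markup, which is not shown here.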