DaniWeb IT Discussion Community
KimJack

Regex

  #1  
Oct 3rd, 2007
Hello All,

I have been working on this simple little thing for the past few days and I am stuck. Can anyone suggest or explain a regex that will extract the words from XML?

Example:

<person> Sue Smith
<age> 32 </age>
<sex> female </sex>
</person>
<person> John
<child>
<name>Jim</name>
<age> 2</age>
</child>
<age> 45 </age>
<sex> female </sex>
</person>


output:
name: Sue Smith
age: 32
sex: female


name: John
child: Jim
age: 2
age: 45
sex: female

Thanks for any suggestions and input.
Last edited by KimJack : Oct 3rd, 2007 at 5:04 pm. Reason: formatting
Ezzaral

Re: Regex

  #2  
Oct 3rd, 2007
You could use regex to extract all of those things, but an XML parser would be much easier and more appropriate to the task. I would recommend an XML DOM parser. See the XML tutorial here: http://java.sun.com/xml/tutorial_intro.html, specifically the section "XML and the Document Object Model (DOM)".
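To give a rough idea of what the DOM route looks like, here is a minimal sketch using the standard javax.xml.parsers API. The class name DomSketch and the describe() and walk() helpers are made up for illustration. It also assumes the fragment gets wrapped in a <people> root element, because the sample has two top-level <person> elements and a DOM parser expects a single root. It labels each value with its tag name, so the person's name shows up as "person:" rather than "name:".

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DomSketch {

    // Appends one "tag: text" line per non-blank text node, indenting nested elements.
    static void walk(Element element, String indent, StringBuilder out) {
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                String text = child.getNodeValue().trim();
                if (!text.isEmpty()) {
                    out.append(indent).append(element.getTagName())
                       .append(": ").append(text).append('\n');
                }
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) child, indent + "  ", out);
            }
        }
    }

    // Parses a fragment of <person> elements and returns the extracted lines.
    static String describe(String fragment) {
        try {
            // Assumption: wrap the fragment so the document has a single root element.
            String wrapped = "<people>" + fragment + "</people>";
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(wrapped.getBytes(StandardCharsets.UTF_8)));
            StringBuilder out = new StringBuilder();
            NodeList people = doc.getDocumentElement().getElementsByTagName("person");
            for (int i = 0; i < people.getLength(); i++) {
                walk((Element) people.item(i), "", out);
            }
            return out.toString();
        } catch (Exception e) {
            // Rethrown unchecked to keep the sketch short.
            throw new IllegalStateException("Could not parse fragment", e);
        }
    }
}
```

Calling DomSketch.describe() on the sample from the first post should give one line per value, with the child's name and age indented one level deeper than John's own age, so the nesting problem never comes up.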
KimJack

Re: Regex

  #3  
Oct 3rd, 2007
As much as I want to use an XML parser, I have to write it myself.

Any suggestions would be great.

Thanks
KimJack

Re: Regex

  #4  
Oct 3rd, 2007
Better yet, I think I will learn more if I just write it without using regex. Any ideas?
Ezzaral

Re: Regex

  #5  
Oct 3rd, 2007
Originally Posted by KimJack:
Better yet, I think I will learn more if I just write it without using regex. Any ideas?

Well, if you absolutely cannot use an XML parser (I assume because of class assignment restrictions?), then regex is your next best choice. Straight parsing might be possible if you are certain that each data element will always be on a separate line, as in your example above, but if that is not guaranteed, you should stick with regex.
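For the straight-parsing case, something along these lines might be enough. This is only a sketch and it leans entirely on the assumption that every element starts on its own line, as in the example. The class name LineParserSketch and the parse() method are made up for illustration. Like the DOM approach, it labels the person's name with the tag name "person".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class LineParserSketch {

    // Returns one "tag: value" entry for every line that opens a tag and carries text.
    static List<String> parse(String xml) {
        List<String> out = new ArrayList<>();
        Scanner scanner = new Scanner(xml);
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine().trim();
            // Skip blank lines, stray text, and closing tags such as </person>.
            if (!line.startsWith("<") || line.startsWith("</")) {
                continue;
            }
            int close = line.indexOf('>');
            if (close < 0) {
                continue; // malformed line, ignored in this sketch
            }
            String tag = line.substring(1, close);
            String rest = line.substring(close + 1);
            // Cut the value off where the closing tag (if any) begins.
            int nextTag = rest.indexOf('<');
            if (nextTag >= 0) {
                rest = rest.substring(0, nextTag);
            }
            rest = rest.trim();
            if (!rest.isEmpty()) {
                out.add(tag + ": " + rest);
            }
        }
        scanner.close();
        return out;
    }
}
```

Note that this flat version does not track nesting at all, so the child's age and the parent's age come out as two plain "age" entries. Keeping a stack of open tags would be one way to tell them apart.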

Even with regex there can be tricky spots with XML because of its nested nature. The code below will show you what I mean and perhaps give you a starting point to work from:
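import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
    public static void main(String[] args) {
        // Sample input from the first post, joined into a single line
        String in = "<person> Sue Smith "+
                    "<age> 32 </age> "+
                    "<sex> female </sex> "+
                    "</person> "+
                    "<person> John "+
                    "<child> "+
                    "<name>Jim</name> "+
                    "<age> 2</age> "+
                    "</child> "+
                    "<age> 45 </age> "+
                    "<sex> female </sex> "+
                    "</person>";
        // Reluctant (.+?) stops at the first closing tag instead of the last one
        Pattern personPattern = Pattern.compile("<person>(.+?)</person>",Pattern.CASE_INSENSITIVE);
        // The name is the text between <person> and the next tag
        Pattern namePattern = Pattern.compile("<person>(.+?)<",Pattern.CASE_INSENSITIVE);
        Pattern agePattern = Pattern.compile("<age>(.+?)<",Pattern.CASE_INSENSITIVE);
        Matcher personMatcher = personPattern.matcher(in);
        while (personMatcher.find()){
            System.out.println("person match:");
            // group(0) is the whole <person>...</person> block
            Matcher nameMatcher = namePattern.matcher(personMatcher.group(0));
            while (nameMatcher.find()){
                System.out.println("Name: "+nameMatcher.group(1).trim());
            }
            // Matches every <age> in the block, including any inside <child>
            Matcher ageMatcher = agePattern.matcher(personMatcher.group(0));
            while (ageMatcher.find()){
                System.out.println("Age: "+ageMatcher.group(1).trim());
            }
        }
    }
}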
If you run that (a test class with a main() method is fine), you will see that it catches two ages for John because there is a <child> element with an <age> tag within his <person> element. Your regex parsing needs to take that into account to separate the person's own age from the child's age. Good luck!
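One possible way to handle that, as a sketch and not the only approach: pull the <child> blocks out of each person first, read their ages separately, and then blank them out before matching the person's own age. The class name NestedAgeSketch and the ages() method are made up for illustration, and the sketch assumes <child> and <person> elements are never nested inside elements of the same name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedAgeSketch {
    private static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;
    private static final Pattern PERSON = Pattern.compile("<person>(.+?)</person>", FLAGS);
    private static final Pattern CHILD = Pattern.compile("<child>(.+?)</child>", FLAGS);
    private static final Pattern AGE = Pattern.compile("<age>(.+?)</age>", FLAGS);

    // Returns "Child age: n" and "Age: n" entries in the order they are found.
    static List<String> ages(String in) {
        List<String> out = new ArrayList<>();
        Matcher person = PERSON.matcher(in);
        while (person.find()) {
            String body = person.group(1);

            // Step 1: ages that belong to nested <child> elements.
            Matcher child = CHILD.matcher(body);
            while (child.find()) {
                Matcher childAge = AGE.matcher(child.group(1));
                while (childAge.find()) {
                    out.add("Child age: " + childAge.group(1).trim());
                }
            }

            // Step 2: remove the <child> blocks so only the person's own <age> is left.
            String withoutChildren = CHILD.matcher(body).replaceAll("");
            Matcher ownAge = AGE.matcher(withoutChildren);
            while (ownAge.find()) {
                out.add("Age: " + ownAge.group(1).trim());
            }
        }
        return out;
    }
}
```

Run against the sample string, this should report 32 for Sue, then 2 as a child age and 45 as John's own age. Deeper nesting would need a real parser or a tag stack, since a regex cannot match arbitrarily nested tags.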