Views: 544 | Replies: 4
Hello All,
I have been working on this simple little thing for the past few days and I am stuck. Can anyone suggest or explain a regex that will extract words from XML?
Example:
<person> Sue Smith
<age> 32 </age>
<sex> female </sex>
</person>
<person> John
<child>
<name>Jim</name>
<age> 2</age>
</child>
<age> 45 </age>
<sex> male </sex>
</person>
output:
name: Sue Smith
age: 32
sex: female
name: John
child: Jim
age: 2
age: 45
sex: male
Thanks for any suggestions and input
Last edited by KimJack : Oct 3rd, 2007 at 5:04 pm. Reason: formatting
You could use regex to extract all of those things, but an XML parser would be a lot easier and more appropriate to the task. I would recommend an XML DOM parser. See the XML tutorial here: http://java.sun.com/xml/tutorial_intro.html and specifically the section "XML and the Document Object Model (DOM)".
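As a rough sketch of the DOM approach (not taken from the tutorial), the example could be walked like this. Two assumptions: the fragment is wrapped in a made-up `<people>` root element, because an XML document needs exactly one root, and each value is labelled with its tag name, so the name comes out as "person: Sue Smith" rather than "name: Sue Smith". The class and method names are only illustrative.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomPersonDemo {

    // Recursively appends "tag: text" for every non-blank text node under the element.
    static void walk(Element element, StringBuilder out) {
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                String text = child.getNodeValue().trim();
                if (!text.isEmpty()) {
                    out.append(element.getTagName()).append(": ").append(text).append('\n');
                }
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) child, out);
            }
        }
    }

    // Parses a fragment of <person> elements and returns one "tag: text" line per value.
    static String extract(String xmlFragment) {
        try {
            // Assumption: wrap the fragment in a hypothetical root so it is well-formed.
            String wrapped = "<people>" + xmlFragment + "</people>";
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(wrapped.getBytes(StandardCharsets.UTF_8)));
            doc.getDocumentElement().normalize(); // merge adjacent text nodes
            StringBuilder out = new StringBuilder();
            walk(doc.getDocumentElement(), out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML fragment", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<person> John <child> <name>Jim</name> <age> 2</age> </child>"
                + " <age> 45 </age> </person>";
        System.out.print(extract(xml));
    }
}
```

Because the walk follows the tree, the child's age and the parent's age are visited in their own elements, so nesting needs no special handling. Mapping tag names to the labels you want (for example "person" to "name") would be a small extra step.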
Better yet, I think I will learn more if I just write it without using regex. Any ideas?
Well, if you absolutely cannot use an XML parser (I am assuming because of class assignment restrictions?), then regex is your next best choice. Straight parsing might be possible if you are certain that each data element will always be on a separate line, such as your example above, but if it is not then you should stick with regex.
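For the straight-parsing route, a minimal sketch might look like the following. It assumes, as in the example above, that every element starts on its own line and that a line holds at most one value; the class and method names are made up for illustration. It uses only `indexOf` and `substring`, no regex, and labels each value with its tag name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LineParserDemo {

    // Turns lines such as "<age> 32 </age>" into "age: 32".
    // Closing tags and lines without a value (e.g. "<child>") are skipped.
    static List<String> parseLines(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String rawLine : lines) {
            String line = rawLine.trim();
            if (!line.startsWith("<") || line.startsWith("</")) {
                continue; // not an opening tag
            }
            int close = line.indexOf('>');
            if (close < 0) {
                continue; // malformed line, ignore it
            }
            String tag = line.substring(1, close);
            String rest = line.substring(close + 1);
            int nextTag = rest.indexOf('<');
            // The value is the text up to the next tag, or the rest of the line.
            String value = (nextTag >= 0 ? rest.substring(0, nextTag) : rest).trim();
            if (!value.isEmpty()) {
                result.add(tag + ": " + value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "<person> John",
                "<child>",
                "<name>Jim</name>",
                "<age> 2</age>",
                "</child>",
                "<age> 45 </age>",
                "</person>");
        for (String entry : parseLines(lines)) {
            System.out.println(entry);
        }
    }
}
```

This breaks as soon as two elements share a line or a value spans several lines, which is why the one-element-per-line assumption matters.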
Even with regex there can be tricky spots with XML because of its nested nature. The code below will show you what I mean and perhaps give you a starting point to work from:
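import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PersonRegexExample {
    public static void main(String[] args) {
        // Sample input from the question, joined into a single line.
        String in = "<person> Sue Smith " +
                "<age> 32 </age> " +
                "<sex> female </sex> " +
                "</person> " +
                "<person> John " +
                "<child> " +
                "<name>Jim</name> " +
                "<age> 2</age> " +
                "</child> " +
                "<age> 45 </age> " +
                "<sex> male </sex> " +
                "</person>";

        // One <person>...</person> block at a time (non-greedy match).
        Pattern personPattern = Pattern.compile("<person>(.+?)</person>", Pattern.CASE_INSENSITIVE);
        // The text right after <person>, up to the next tag, is the name.
        Pattern namePattern = Pattern.compile("<person>(.+?)<", Pattern.CASE_INSENSITIVE);
        // Every <age> value in the block, including one nested inside <child>.
        Pattern agePattern = Pattern.compile("<age>(.+?)<", Pattern.CASE_INSENSITIVE);

        Matcher personMatcher = personPattern.matcher(in);
        while (personMatcher.find()) {
            System.out.println("person match:");
            // group(0) is the whole block, including the <person> tags.
            Matcher nameMatcher = namePattern.matcher(personMatcher.group(0));
            while (nameMatcher.find()) {
                System.out.println("Name: " + nameMatcher.group(1).trim());
            }
            // Note: for John this prints both the child's age and his own.
            Matcher ageMatcher = agePattern.matcher(personMatcher.group(0));
            while (ageMatcher.find()) {
                System.out.println("Age: " + ageMatcher.group(1).trim());
            }
        }
    }
}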
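The snippet above prints the child's age (2) under John as well as his own (45), because the age pattern matches every `<age>` inside the person block. One possible way around that, sketched here under the assumption that `<child>` blocks are never nested inside each other, is to cut the child blocks out before looking for the person's own age. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestedAgeDemo {

    private static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;

    // One <person> block; group(1) is everything between the tags.
    private static final Pattern PERSON = Pattern.compile("<person>(.+?)</person>", FLAGS);
    // A whole <child> block (assumes children are not nested in children).
    private static final Pattern CHILD = Pattern.compile("<child>.*?</child>", FLAGS);
    // A numeric age with optional surrounding whitespace.
    private static final Pattern AGE = Pattern.compile("<age>\\s*(\\d+)\\s*</age>", FLAGS);

    // Returns each person's own age, ignoring ages that belong to a child.
    static List<String> ownAges(String in) {
        List<String> ages = new ArrayList<>();
        Matcher person = PERSON.matcher(in);
        while (person.find()) {
            // Remove child blocks so their <age> tags cannot match.
            String withoutChildren = CHILD.matcher(person.group(1)).replaceAll("");
            Matcher age = AGE.matcher(withoutChildren);
            if (age.find()) {
                ages.add(age.group(1));
            }
        }
        return ages;
    }

    public static void main(String[] args) {
        String in = "<person> Sue Smith <age> 32 </age> </person> "
                + "<person> John <child> <name>Jim</name> <age> 2</age> </child>"
                + " <age> 45 </age> </person>";
        System.out.println(ownAges(in));
    }
}
```

The same trick can be turned around to handle the children separately: match the `<child>` blocks first, extract their name and age, then process what is left of the person. Deeper nesting is where regex starts to fall apart and a real parser pays off.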