Hello guys!
I need some help... I'm new to Java programming and I'm dealing with an issue. I'm at a step of my project that keeps stopping me from going further. I have some XML files that contain diacritics, and I need to read them and extract the tags that I need. Here's an example of my XML files:

<?xml version='1.0'?>
<entry>
<word>VAGABONDAJ</word>
<pos>s. n. </pos>
<year>1837</year>
<etim> fr. vagabondage, it. vagabondaggio. </etim>
</entry>
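Because the example declares no encoding, a standard XML parser will read it as UTF-8 (the XML default when there is no declaration or byte-order mark). Instead of regex, the JDK's built-in DOM parser (`javax.xml.parsers`) can do the extraction, as in the minimal sketch below. It assumes the files really are saved as UTF-8, and the names `EntryReader` and `readEntry` are hypothetical. The important detail is passing the parser raw bytes (an `InputStream`) rather than text already decoded with the platform default charset, so the parser handles the decoding itself.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class EntryReader {

    // Hypothetical helper: reads <word>, <year> and <etim> from one <entry>.
    // Giving the parser an InputStream lets it choose the encoding from the
    // XML declaration, defaulting to UTF-8 when none is declared.
    static Map<String, String> readEntry(InputStream in) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(in);
        Map<String, String> fields = new LinkedHashMap<>();
        for (String tag : new String[] {"word", "year", "etim"}) {
            NodeList nodes = doc.getElementsByTagName(tag);
            // getTextContent() returns already-decoded text; trim() drops the padding spaces
            fields.put(tag, nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : null);
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        // Same shape as the example entry; for a file use new FileInputStream(path)
        String xml = "<?xml version='1.0'?>\n<entry>\n<word>VAGABONDAJ</word>\n"
                + "<pos>s. n. </pos>\n<year>1837</year>\n"
                + "<etim> fr. vagabondage, it. vagabondaggio. </etim>\n</entry>";
        Map<String, String> entry = readEntry(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(entry);
    }
}
```

For a file on disk, `readEntry(new FileInputStream(path))` is the equivalent. A `FileReader`, by contrast, decodes with the platform default charset on JDKs older than 18, before the parser ever sees the text.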

From this I extracted the <word>, the <year> and, most importantly, the <etim> (I did this with regex). If a word has a French or Italian etymology, my program searches some HTML pages (online dictionaries: http://www.cnrtl.fr/etymologie and http://www.sapere.it), which are then processed (also with regex), and from them I take the etymology field of the French and the Italian word. I want to put all of this (the Romanian word, the year, its etymology, and then the etymology of the French and Italian words) into a .jsp page. The big problem is that when my program searches those pages, it puts some codes in place of the words with accents (diacritics), and I cannot get rid of them. I've set UTF-8 encoding everywhere I could (when reading the pages and when writing the pages in my JSP; the console shows the same problem), but still nothing.
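The "codes" could be one of two things, and this sketch assumes either may apply. They may be mojibake from decoding the downloaded bytes with the wrong charset, or they may be HTML character references such as `&#233;` that sit in the page source and pass through regex extraction unchanged. The minimal sketch below (hypothetical class `PageFetcher`) first decodes the page with the charset named in the HTTP `Content-Type` header. If no usable charset is given it falls back to UTF-8, which is an assumption, and a charset declared only in a `<meta>` tag is not inspected. It then turns numeric references into real characters. Named references such as `&eacute;` are left untouched; handling those would need a full entity table or an HTML library.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageFetcher {

    // Matches the charset parameter of a Content-Type header, e.g. "charset=ISO-8859-1"
    private static final Pattern CHARSET =
            Pattern.compile("charset=\"?([\\w.:-]+)", Pattern.CASE_INSENSITIVE);
    // Matches decimal (&#233;) and hexadecimal (&#xE9;) character references
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    // Returns the declared charset, or UTF-8 (assumed fallback) if none is usable.
    static Charset charsetFromContentType(String contentType) {
        if (contentType != null) {
            Matcher m = CHARSET.matcher(contentType);
            if (m.find()) {
                try {
                    return Charset.forName(m.group(1));
                } catch (IllegalArgumentException unknownOrIllegalName) {
                    // fall through to the default below
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Replaces numeric character references with the characters they stand for.
    // Named references such as &eacute; are deliberately left as they are.
    static String decodeNumericRefs(String html) {
        Matcher m = NUMERIC_REF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int cp = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            // Keep the reference unchanged if it does not name a valid code point
            String replacement = Character.isValidCodePoint(cp)
                    ? new String(Character.toChars(cp))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Downloads a page and decodes its bytes with an explicit charset instead of
    // relying on the platform default.
    static String fetch(String address) throws IOException {
        URLConnection conn = URI.create(address).toURL().openConnection();
        Charset cs = charsetFromContentType(conn.getContentType());
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return decodeNumericRefs(new String(buf.toByteArray(), cs));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            System.out.println(fetch(args[0]));
        } else {
            // Offline demonstration of the two helpers
            System.out.println(charsetFromContentType("text/html; charset=ISO-8859-1"));
            System.out.println(decodeNumericRefs("&#233;tymologie"));
        }
    }
}
```

Run with a dictionary page URL as the argument to try a real page. Keeping the charset lookup and the reference decoding as separate static methods means each can be checked offline, independently of the network.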
Does any of you have a clue for me? If you need some code, let me know.
Thanks!
