I got the string \x3Cb\x3EHello, World\x3C\x2Fb\x3E as a web response. I think it means <b>Hello, World</b>,
but I don't know how to unescape that sequence into a Java string. Could anyone please help me with this?
Thank you.

You can use the methods of the Java String class to extract Hello World from the string above. The methods are all listed in the class documentation. I don't know much about web stuff, but put the response into a String and apply whichever methods from that list you think fit best.

Your best bet will be regexes IMO, if you're familiar with them. If not, and you want to try them out, this may help you.

I did the following, but it doesn't seem to work.

String res=new String("\\x3Cb\\x3EHello, World\\x3C\\x2Fb\\x3E".getBytes(),
                                                            "US-ASCII");
System.out.println(res);

Try this regex pattern:
H\w+,\s+W\w+
It is tailor-made to get "Hello, World".

It starts with H and matches the word characters that follow (the \w+ part), which stops at the first comma. It then matches the comma and one or more whitespace characters (via \s+), and finally does the same as it did for H, but with W.
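For anyone who wants to try the pattern, here is a minimal sketch of how it could be applied in Java. The class and method names (GreetingFinder, find) are made up for illustration, and note that every backslash in the regex has to be doubled inside a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the thread: applies the suggested pattern.
final class GreetingFinder {
    // Regex H\w+,\s+W\w+ written as a Java literal (each \ doubled).
    private static final Pattern GREETING = Pattern.compile("H\\w+,\\s+W\\w+");

    // Returns the first match in the input, or null if there is none.
    static String find(String input) {
        Matcher m = GREETING.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The escaped response from the question, as a Java literal.
        System.out.println(find("\\x3Cb\\x3EHello, World\\x3C\\x2Fb\\x3E")); // Hello, World
    }
}
```

This only pulls out the greeting; it does nothing about the \xNN sequences themselves.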

Thank you for the reply, but I need "<b>Hello, World</b>", not only "Hello, World". Also, the response might contain other sequences besides "<b>Hello, World</b>", like "<div id='box'></div>", so writing regex pattern for all of them would be impossible.

I need to convert the escaped hex values into their corresponding ASCII characters. If I do it with regex or String.contains(..), I think I'd have to write 255 if conditions to map them to the appropriate characters. Any alternative ideas would be welcome.

writing regex pattern for all of them would be impossible

Complicated? Yes. Impossible? No.
Example: <[^>]*> can be used in conjunction with a replace by "" to remove non-nested HTML tags.
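A small sketch of that replace, assuming the text has already been unescaped into real < and > characters. TagStripper and stripTags are illustrative names only, and the pattern will misbehave if a ">" appears inside an attribute value.

```java
// Hypothetical helper, not from the thread: removes simple HTML tags.
final class TagStripper {
    // Replaces every "<...>" run that contains no '>' with the empty string.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<b>Hello, World</b>")); // Hello, World
    }
}
```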

I need to convert the escaped hex values into their corresponding ASCII characters.

I hope you don't plan to check them one by one after that? That's going to be quite slow.

ps : Click Here

:D

ps : Click Here

I couldn't understand half of the things in that post :D. Well, I think I'm going to iterate through the JSON string, convert each hex value into a char and push it into a StringBuilder. I'll post if anything goes wrong.
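The plan described above could look roughly like the following sketch. It assumes every escape has the form \x followed by exactly two hex digits and that each value is a single character code. Anything that does not fit that shape is copied through unchanged. HexUnescaper and its methods are hypothetical names.

```java
// Hypothetical helper, not from the thread: one pass with a StringBuilder.
final class HexUnescaper {
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            // An escape needs 4 characters: '\', 'x' and two hex digits.
            if (s.startsWith("\\x", i) && i + 4 <= s.length()
                    && isHex(s.charAt(i + 2)) && isHex(s.charAt(i + 3))) {
                // Integer.parseInt with radix 16 does the mapping, so no
                // per-character if conditions are needed.
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                out.append(s.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    private static boolean isHex(char c) {
        return Character.digit(c, 16) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(unescape("\\x3Cb\\x3EHello, World\\x3C\\x2Fb\\x3E")); // <b>Hello, World</b>
    }
}
```

Because the input is scanned once from left to right, a decoded character is never re-examined, so a decoded backslash cannot accidentally start a new escape.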

thank you.

yo !

i don't know how to unescape that sequence into java string..could anyone please help me with this??

That's not what a normal HTML response looks like. If that service is under your control, the first step should be to investigate why you are getting hex-escaped HTML as opposed to a normal string.

URL encoding/decoding rules are different from the HTML ones, hence a URL decoder should only be used for decoding URLs. HTML encoding uses ampersand entities (such as &lt;), whereas URL encoding uses % encoding.

For HTML decoding, the StringEscapeUtils class from Apache Commons should be used.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String res = "\\x3Cb\\x3EHello, World\\x3C\\x2Fb\\x3E";
System.out.println(res);

// Regex \\x([0-9a-fA-F]{2}): a literal backslash, an 'x', then exactly two hex digits.
Pattern regex = Pattern.compile("\\\\x([0-9a-fA-F]{2})");
Matcher regexMatcher = regex.matcher(res);
StringBuffer out = new StringBuffer();
while (regexMatcher.find()) {
    // Convert the two hex digits into the character they encode.
    char decoded = (char) Integer.parseInt(regexMatcher.group(1), 16);
    // quoteReplacement stops a decoded '$' or '\' from being read as replacement syntax.
    regexMatcher.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
}
// Copy whatever follows the last escape sequence.
regexMatcher.appendTail(out);
res = out.toString();
System.out.println(res);

I have this so far, and it's giving the correct output. Thanks, everyone, for your support. Why do I have to put 4 '\' in the Pattern.compile?

why do i have to put 4 '\' in the Pattern.compile??

Because you have 2 \ in your string, so you need one extra \ for each.

"\\" => "\" in Java, isn't it? So basically, in my string I have only one "\".

~s.o.s~
Yes, I realised that it isn't URL encoded (nor is it HTML encoded: no ampersands), but at first look it seemed like URL encoding with \x instead of %, so I thought there might be an easy solution based on replacing every \x with % and URL decoding the result. This works for the given test data:

String s= "\\x3Cb\\x3EHello, World\\x3C\\x2Fb\\x3E";

s = s.replaceAll("\\\\x", "%"); // now it looks like a URL
s = java.net.URLDecoder.decode(s, "UTF-8"); 

// s is now "<b>Hello, World</b>"
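One caveat with this shortcut: URLDecoder also turns "+" into a space and treats any "%" already in the text as the start of an escape. A possible workaround, sketched below under the assumption that the \xNN values are UTF-8 bytes, is to protect those two characters before swapping \x for %. UrlStyleUnescaper is a made-up name, and a \x that is not followed by two hex digits would still make URLDecoder throw an IllegalArgumentException.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper, not from the thread: the \x -> % trick with guards.
final class UrlStyleUnescaper {
    static String unescape(String s) {
        // String.replace works on literal text, so no regex escaping is needed.
        String prepared = s.replace("%", "%25")   // keep literal percent signs
                           .replace("+", "%2B")   // keep literal plus signs
                           .replace("\\x", "%");  // now it looks URL encoded
        try {
            return URLDecoder.decode(prepared, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java runtime is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(unescape("\\x3Cb\\x3EHello, World\\x3C\\x2Fb\\x3E")); // <b>Hello, World</b>
        System.out.println(unescape("1+1 = 100%")); // 1+1 = 100%
    }
}
```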

In Java or C, if you want a \ in your string, you have to put another \ before it. It's the holy rule of the backslashes. ;)

Well, I got it. '\x' is itself an escape sequence for hex, so we need four backslashes.

Not quite. "\x" is not an escape in Java. You get the 4 \ because of the combination of Java and regex. \ is an escape char in both Java and regex, so if you want a literal \ in a regex you have to code it as \\. But then if that's a Java string you have to code each of those \ as \\, giving \\\\.
so
\\\\ in a Java regex string is \\ in the regex, which is a literal regex \

(That's why I have a "\\\\x" in my 2-line solution above - the first String in a replaceAll is a regex.)
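The two layers of escaping can be checked directly. This is just an illustrative sketch with made-up names (BackslashLayers and its members). It also shows that String.replace, which takes literal text rather than a regex, gets by with only two backslashes.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not from the thread.
final class BackslashLayers {
    // Java literal with 4 backslashes: the string holds \ \ x (3 characters),
    // which the regex engine reads as "a literal backslash, then x".
    static final String REGEX_SOURCE = "\\\\x";
    // Java literal with 2 backslashes: the string holds \ x (2 characters).
    static final String TEXT = "\\x";

    static boolean regexMatchesText() {
        return Pattern.matches(REGEX_SOURCE, TEXT);
    }

    // Literal replacement: no regex involved, so no second layer of escaping.
    static String replaceWithoutRegex(String s) {
        return s.replace(TEXT, "%");
    }

    public static void main(String[] args) {
        System.out.println(REGEX_SOURCE.length());              // 3
        System.out.println(TEXT.length());                      // 2
        System.out.println(regexMatchesText());                 // true
        System.out.println(replaceWithoutRegex("\\x3Cb\\x3E")); // %3Cb%3E
    }
}
```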

Thanks for the information, that was really new to me :)
Could you also tell me about URLEncoder? If I do
URLEncoder.encode("hello world");
it gives hello+world instead of hello%20world

According to the rules for URL encoding, most non-alphanumerics are replaced by %(hex value), except for a space, which is replaced by a "+".

When encoding a String, the following rules apply:

The alphanumeric characters "a" through "z", "A" through "Z" and "0" through "9" remain the same.
The special characters ".", "-", "*", and "_" remain the same.
The space character " " is converted into a plus sign "+".
All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte...
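If %20 is needed instead of "+", one common workaround is to post-process the encoder's output. The sketch below assumes UTF-8, and SpaceEncoding is a hypothetical name. The replace is safe because a literal "+" in the input has already been encoded as %2B, so any "+" left in the output can only stand for a space.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not from the thread.
final class SpaceEncoding {
    // Form-style encoding as done by URLEncoder: spaces become '+'.
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java runtime is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Same encoding, but with spaces written as %20.
    static String percentEncodeSpaces(String s) {
        return formEncode(s).replace("+", "%20");
    }

    public static void main(String[] args) {
        System.out.println(formEncode("hello world"));          // hello+world
        System.out.println(percentEncodeSpaces("hello world")); // hello%20world
    }
}
```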

thanks for the help..finally done.
