I got the string \x3Cb\x3EHello, World\x3C\x2Fb\x3E as a web response. I think it means <b>Hello, World</b>,
but I don't know how to unescape that sequence into a Java string. Could anyone please help me with this?
Thank you.

Edited 3 Years Ago by cool_zephyr

You can use the methods of Java's String class to extract "Hello, World" from the string above; they are all listed in the String Javadoc. I don't know much about web stuff, but put the above into a String and apply whichever of those methods you think will work best.

Your best bet will be regexes IMO, if you're familiar with them. If not, and you want to try them out, this may help you.

Edited 3 Years Ago by somjit{}

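I tried the following, but it doesn't seem to work:

// Prints the escapes unchanged: re-decoding the same bytes as US-ASCII does not interpret \x sequences
String res = new String("\\x3Cb\\x3EHello, World\\x3C\\x2Fb\\x3E".getBytes(), "US-ASCII");
System.out.println(res);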

Try this regex pattern:
H\w+,\s+W\w+
It is tailor-made to get "Hello, World".

It starts with H and matches all the word characters after it (the \w+ part), which stops at the first comma. It then matches the comma and one or more whitespace characters (via \s+), and finally does for W what it did for H.
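A minimal sketch of applying that pattern with java.util.regex to the raw response (the class and method names are made up for illustration). Because the backslashes belong in the regex, each one is doubled in the Java string literal:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HelloWorldRegex {
    // The pattern above: 'H', word chars, a comma, whitespace, 'W', word chars
    private static final Pattern HELLO_WORLD = Pattern.compile("H\\w+,\\s+W\\w+");

    // Returns the first match in the input, or null if there is none
    static String findHelloWorld(String input) {
        Matcher m = HELLO_WORLD.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String response = "\\x3Cb\\x3EHello, World\\x3C\\x2Fb\\x3E";
        System.out.println(findHelloWorld(response)); // Hello, World
    }
}
```

Note that \w+ stops at the backslash of \x3C, so the match does not run into the trailing escapes.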

Edited 3 Years Ago by somjit{}

Thank you for the reply, but I need "<b>Hello, World</b>", not only "Hello, World". Also, the response might contain other sequences besides "<b>Hello, World</b>", like "<div id='box'></div>", so writing a regex pattern for all of them would be impossible.

I need to convert the escaped hex values into their corresponding ASCII characters. If I do it with a regex or String.contains(..), I think I'll have to write 255 if conditions to map them to the appropriate characters. Any alternative ideas would be welcome.
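The 255 conditions shouldn't be needed: Integer.parseInt(hex, 16) already maps a two-digit hex string to its numeric value, and a single regex can find every escape. A minimal sketch, assuming every escape is exactly \x followed by two hex digits and stands for one char (HexEscapeDecoder is a hypothetical name):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HexEscapeDecoder {
    // A literal backslash, 'x', then exactly two hex digits (captured), e.g. \x3C
    private static final Pattern HEX_ESCAPE = Pattern.compile("\\\\x([0-9a-fA-F]{2})");

    static String decode(String input) {
        Matcher m = HEX_ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Radix 16 turns "3C" into 60, which is '<'
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps a decoded '$' or '\' from being treated as special
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out); // copy whatever follows the last escape
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("\\x3Cb\\x3EHello, World\\x3C\\x2Fb\\x3E")); // <b>Hello, World</b>
    }
}
```

StringBuffer is used because the StringBuilder overloads of appendReplacement/appendTail only arrived in Java 9.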

Edited 3 Years Ago by cool_zephyr

writing a regex pattern for all of them would be impossible

Complicated? Yes. Impossible? No.
For example, <[^>]*> can be used in conjunction with a replacement by "" to remove non-nested HTML tags.
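A small sketch of that replacement (StripTags is a made-up name). It assumes the \x escapes have already been decoded, since the raw response contains no literal '<' for the pattern to find, and it will misbehave on markup such as a '>' inside an attribute value:

```java
public class StripTags {
    // Removes everything from a '<' up to the next '>'
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<b>Hello, World</b>"));      // Hello, World
        System.out.println(stripTags("<div id='box'>text</div>")); // text
    }
}
```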

I need to convert the escaped hex values into their corresponding ASCII.

I hope you don't plan to check them one by one after that? That's going to be quite slow.

Edited 3 Years Ago by somjit{}

P.S.: Click Here

I couldn't understand half of the things in that post :D. Well, I think I'm going to iterate through the JSON string, convert each hex value into a char and push it into a StringBuilder. I'll post if anything goes wrong.

Thank you.
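The plan described above (scan the string, turn each hex value into a char, append to a StringBuilder) could look roughly like this. It is a sketch with a hypothetical class name, assuming escapes are \x plus two hex digits; anything that doesn't fit that shape is copied through unchanged:

```java
public class HexScanDecoder {
    static String decode(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            // A full escape needs four chars: '\', 'x' and two hex digits
            if (c == '\\' && i + 3 < s.length() && s.charAt(i + 1) == 'x'
                    && isHex(s.charAt(i + 2)) && isHex(s.charAt(i + 3))) {
                sb.append((char) Integer.parseInt(s.substring(i + 2, i + 4), 16));
                i += 4; // skip past the whole escape
            } else {
                sb.append(c); // ordinary character: copy as-is
                i++;
            }
        }
        return sb.toString();
    }

    private static boolean isHex(char c) {
        return Character.digit(c, 16) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(decode("\\x3Cb\\x3EHello, World\\x3C\\x2Fb\\x3E")); // <b>Hello, World</b>
    }
}
```

Because the index moves past each escape, a decoded backslash (\x5C) is never re-read as the start of a new escape.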

I don't know how to unescape that sequence into a Java string. Could anyone please help me with this?

That's not what a normal HTML response looks like. If that service is under your control, your first action should be to investigate why you are getting hex-escaped HTML as opposed to a normal string.

URL encoding/decoding rules are different from the HTML ones, hence a URL decoder should only be used for decoding URLs. HTML encoding uses ampersand encoding (e.g. &lt; for <), whereas URL encoding uses % encoding (e.g. %3C for <).
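To make the difference concrete, here is a small sketch (the class and method names are hypothetical) showing that java.net.URLDecoder only understands %XX escapes and '+', and leaves HTML entities such as &lt; untouched:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UrlVsHtml {
    static String urlDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(urlDecode("%3Cb%3E"));   // <b>
        System.out.println(urlDecode("&lt;b&gt;")); // &lt;b&gt; (HTML entities pass through)
    }
}
```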

For HTML decoding, use the StringEscapeUtils class from Apache Commons (for example its unescapeHtml4 method).

good info
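import java.io.UnsupportedEncodingException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HexUnescape {
    public static void main(String[] args) {
        try {
            String res = new String("\\x3Cb\\x3EHello, World\\x3C\\x2Fb\\x3E".getBytes(), "US-ASCII");
            System.out.println(res);
            int index = 0;
            int posHolder = 0;
            // A literal backslash, then 'x', then exactly two hex digits, e.g. \x3C
            Pattern regex = Pattern.compile("\\\\x([0-9a-fA-F]{2})");
            Matcher regexMatcher;
            while (true) {
                index = res.indexOf("\\x", index);
                if (index < 0) {
                    break;
                }

                posHolder = index;
                index += 2;
                String hex = res.substring(index, index + 2);
                int decVal = Integer.parseInt(hex, 16);
                regexMatcher = regex.matcher(res.substring(posHolder));
                // quoteReplacement stops a decoded '$' or '\' being read as a group reference or escape
                String result = regexMatcher.replaceFirst(Matcher.quoteReplacement(String.valueOf((char) decVal)));
                String prefix = res.substring(0, posHolder);
                res = prefix + result;
                // Resume just after the decoded char so it is never decoded twice
                index = posHolder + 1;
            }
            System.out.println(res);
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }
}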

I have this so far, and it's giving the correct output. Thanks everyone for your support! Why do I have to put four '\' in the Pattern.compile call?

Edited 3 Years Ago by cool_zephyr

Why do I have to put four '\' in the Pattern.compile?

Because you have two \ in your string, and so you need one extra \ for each.

~s.o.s~
Yes, I realised that isn't URL encoded (nor is it HTML encoded: there are no ampersands), but at first glance it looked like URL encoding with \x instead of %, so I thought there could be an easy solution: replace every \x with % and URL-decode the result. This works for the given test data:

String s= "\\x3Cb\\x3EHello, World\\x3C\\x2Fb\\x3E";

s = s.replaceAll("\\\\x", "%"); // now it looks like a URL
s = java.net.URLDecoder.decode(s, "UTF-8"); 

// s is now "<b>Hello, World</b>"
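One caveat with the \x-to-% trick, shown in a small sketch (the wrapper name is made up): URLDecoder also turns '+' into a space and throws IllegalArgumentException on a stray '%', so the trick is only safe when the response cannot contain those characters literally:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class ReplaceThenUrlDecode {
    // The two-line solution above, wrapped in a helper
    static String decode(String s) {
        try {
            return URLDecoder.decode(s.replaceAll("\\\\x", "%"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("\\x3Cb\\x3EHello, World\\x3C\\x2Fb\\x3E")); // <b>Hello, World</b>
        System.out.println(decode("1+1=2\\x21")); // 1 1=2!  (the literal '+' became a space)
    }
}
```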

Edited 3 Years Ago by JamesCherrill

In Java or C, if you want a \ in your string, you have to put another \ before it. It's the holy rule of the backslashes. ;)

Well, I got it: '\x' is itself an escape sequence for hex, so we need four backslashes.

Not quite. "\x" is not an escape in Java (the compiler rejects it as an illegal escape character in a string literal). You get the four \ because of the combination of Java and regex. \ is an escape char in both Java and regex, so if you want a literal \ in a regex you have to code it as \\. But then, if that's a Java string, you have to code each of those \ as \\, giving \\\\.
So \\\\ in a Java regex string is \\ in the regex, which matches a literal \.

(That's why I have a "\\\\x" in my 2-line solution above - the first String in a replaceAll is a regex.)
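The two layers can be checked directly. In this sketch (BackslashLayers and its members are hypothetical names) the Java literal "\\\\x" is a 3-character string that the regex engine reads as a literal backslash followed by 'x'. As an aside, java.util.regex does define \xhh as its own hex escape, so with only two backslashes in the source the regex \x3C matches '<' rather than the text "\x3C":

```java
import java.util.regex.Pattern;

public class BackslashLayers {
    // Java source "\\\\x" -> string \\x -> regex: literal '\' then 'x'
    static final String BACKSLASH_X_REGEX = "\\\\x";

    static boolean startsWithBackslashX(String s) {
        return Pattern.compile(BACKSLASH_X_REGEX).matcher(s).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(BACKSLASH_X_REGEX.length());    // 3
        System.out.println(startsWithBackslashX("\\x3C")); // true

        // Pattern.quote builds the same literal match without doubling by hand
        System.out.println(Pattern.compile(Pattern.quote("\\x")).matcher("\\x3C").lookingAt()); // true

        // Java source "\\x3C" -> regex \x3C, the regex engine's hex escape for '<'
        System.out.println(Pattern.matches("\\x3C", "<")); // true
    }
}
```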

Edited 3 Years Ago by JamesCherrill

Thanks for the information, that was really new to me :)
Could you also tell me about URLEncoder? If I do
URLEncoder.encode("hello world");
it gives hello+world instead of hello%20world.

Edited 3 Years Ago by cool_zephyr

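According to the rules URLEncoder follows (HTML form encoding, i.e. the application/x-www-form-urlencoded format), most non-alphanumeric characters are replaced by %(hex value), except for a space, which is replaced by a "+". From the URLEncoder Javadoc: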

When encoding a String, the following rules apply:

The alphanumeric characters "a" through "z", "A" through "Z" and "0" through "9" remain the same.
The special characters ".", "-", "*", and "_" remain the same.
The space character " " is converted into a plus sign "+".
All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte...
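If %20 is what you need, one common workaround is to form-encode and then rewrite the '+' signs, sketched below with a made-up class name. This is safe because a literal '+' in the input has already become %2B by that point. It also uses the two-argument encode, since the one-argument URLEncoder.encode(String) is deprecated and depends on the platform's default encoding. Note the result still follows the form-encoding rules quoted above (for example '*' is left as-is):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class PercentEncodeSpaces {
    static String encode(String s) {
        try {
            // Form-encode with UTF-8, then turn each '+' (an encoded space) into %20
            return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("hello world")); // hello%20world
        System.out.println(encode("a+b c"));       // a%2Bb%20c
    }
}
```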

Edited 3 Years Ago by JamesCherrill
