escaped sequence in java

Question

cool_zephyr 7 Junior Poster in Training

11 Years Ago

I got the string \x3Cb\x3EHello, World\x3C\x2Fb\x3E as a web response. I think it means <b>Hello, World</b>,
but I don't know how to unescape that sequence into a Java string. Could anyone please help me with this?
Thank you.

java

Edited 11 Years Ago by cool_zephyr

All 21 Replies

somjit{} 60 Junior Poster in Training

11 Years Ago

You can use the methods of the Java String class to extract "Hello, World" from the string above; the methods are all listed there. I don't know much about web stuff, but put the above into a String and apply the methods you think fit best from the list.

Your best bet will be regexes, IMO, if you're familiar with them. If not, and you want to try them out, this may help you.

Edited 11 Years Ago by somjit{}

~s.o.s~ 2,560 Failure as a human

11 Years Ago

i don't know how to unescape that sequence into java string..could anyone please help me with this??

That's not what a normal HTML response looks like. If that service is under your control, the first step should be to investigate why you are getting hex-encoded characters instead of a normal string.

JamesCherrill 4,733 Most Valuable Poster

11 Years Ago

Have a look at the URLDecoder class
http://docs.oracle.com/javase/6/docs/api/java/net/URLDecoder.html

~s.o.s~ 2,560 Failure as a human

11 Years Ago

URL encoding/decoding rules are different from the HTML ones, so URLDecoder should only be used for decoding URLs. HTML encoding uses ampersand encoding (for example &lt;), whereas URL encoding uses % encoding.

For HTML decoding, the StringEscapeUtils class from Apache commons should be used.

somjit{} commented: good info +0

JamesCherrill 4,733 Most Valuable Poster

11 Years Ago

~s.o.s~
Yes, I realised that isn't URL encoded (nor is it HTML encoded - no ampersands), but at first look it seemed like URL with \x instead of %, so I thought maybe there could be an easy solution based on replacing all \x by % and URL decoding it. This works for the given test data:

String s= "\\x3Cb\\x3EHello, World\\x3C\\x2Fb\\x3E";

s = s.replaceAll("\\\\x", "%"); // now it looks like a URL
s = java.net.URLDecoder.decode(s, "UTF-8"); 

// s is now "<b>Hello, World</b>"
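
A runnable sketch of the two-line approach above, with one caveat worth knowing: URLDecoder also turns "+" into a space and treats any "%" already in the input as the start of an escape. The class and method names here (UrlDecoderTrick, unescape, unescapeSafer) are made up for illustration, and pre-escaping "%" and "+" is an assumption about what the response might contain, not something from the original post. With "UTF-8", escapes above \x7F are read as UTF-8 bytes, which may or may not match what the service intends.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class UrlDecoderTrick {
    // The two-line approach from the post, wrapped in a method.
    static String unescape(String s) {
        s = s.replaceAll("\\\\x", "%"); // regex \\x = literal backslash followed by x
        return decode(s);
    }

    // Variant that protects literal '%' and '+' before reusing URLDecoder.
    static String unescapeSafer(String s) {
        s = s.replace("%", "%25").replace("+", "%2B"); // escape '%' first, then '+'
        s = s.replaceAll("\\\\x", "%");
        return decode(s);
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(unescape("\\x3Cb\\x3EHello, World\\x3C\\x2Fb\\x3E")); // <b>Hello, World</b>
        System.out.println(unescape("1+1"));       // "1 1": the plus sign became a space
        System.out.println(unescapeSafer("1+1"));  // "1+1"
    }
}
```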

Edited 11 Years Ago by JamesCherrill

cool_zephyr 7 Junior Poster in Training · Answer 1 · 2013-08-23T06:03:01+00:00

I did the following, but it doesn't seem to work.

String res=new String("\\x3Cb\\x3EHello, World\\x3C\\x2Fb\\x3E".getBytes(),
                                                            "US-ASCII");
System.out.println(res);

somjit{} 60 Junior Poster in Training Featured Poster · Answer 2 · 2013-08-23T06:29:19+00:00

Try this regex pattern:
H\w+,\s+W\w+
It is tailor-made to get "Hello, World".

It starts with H, matches the word characters that follow (the \w+ part) and stops at the first comma, then matches one or more whitespace characters (via \s+), then does for W what it did for H.

cool_zephyr 7 Junior Poster in Training · Answer 3 · 2013-08-23T06:39:18+00:00

Thank you for the reply, but I need "<b>Hello, World</b>", not only "Hello, World". The response might also contain other sequences besides "<b>Hello, World</b>", like "<div id='box'></div>", so writing a regex pattern for all of them would be impossible.

I need to convert the escaped hex values into their corresponding ASCII characters. If I do it with regex or String.contains(..), I think I'd have to write 255 if conditions to map them to the appropriate characters. Any alternative ideas would be welcome.

somjit{} 60 Junior Poster in Training Featured Poster · Answer 4 · 2013-08-23T06:46:02+00:00

writing regex pattern for all of them would be impossible

Complicated? Yes. Impossible? No.
Example: <[^>]*> can be used in conjunction with a replace by "" to remove non-nested HTML tags.

I need to convert the escaped hex values into their corresponding ascii.

I hope you don't plan to check them one by one after that? That's going to be quite slow.

somjit{} 60 Junior Poster in Training Featured Poster · Answer 5 · 2013-08-23T07:16:15+00:00

ps : Click Here

:D

Edited 11 Years Ago by somjit{}

cool_zephyr 7 Junior Poster in Training · Answer 6 · 2013-08-23T07:24:37+00:00

ps : Click Here

I couldn't understand half of the things in that post :D. Well, I think I'm going to iterate through the JSON string, convert each hex value into a char and push it into a StringBuilder. I'll post if anything goes wrong.
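
The plan described here (walk the string, turn each \xHH into a char, append to a StringBuilder) might look roughly like the sketch below. HexScan and unescape are hypothetical names, and copying malformed escapes through unchanged is an assumption about the desired behaviour. No table of 255 conditions is needed, because the two hex digits are converted arithmetically.

```java
final class HexScan {
    // Replaces every \xHH (H = hex digit) with the character it encodes.
    static String unescape(String in) {
        StringBuilder out = new StringBuilder(in.length());
        int i = 0;
        while (i < in.length()) {
            // A candidate escape needs 4 characters: '\', 'x' and two hex digits.
            if (i + 3 < in.length() && in.charAt(i) == '\\' && in.charAt(i + 1) == 'x') {
                int hi = Character.digit(in.charAt(i + 2), 16); // -1 if not a hex digit
                int lo = Character.digit(in.charAt(i + 3), 16);
                if (hi >= 0 && lo >= 0) {
                    out.append((char) (hi * 16 + lo));
                    i += 4;
                    continue;
                }
            }
            out.append(in.charAt(i)); // not an escape: copy the character unchanged
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("\\x3Cb\\x3EHello, World\\x3C\\x2Fb\\x3E")); // <b>Hello, World</b>
    }
}
```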

thank you.

somjit{} 60 Junior Poster in Training Featured Poster · Answer 7 · 2013-08-23T07:34:50+00:00

yo !

cool_zephyr 7 Junior Poster in Training · Answer 8 · 2013-08-23T08:15:10+00:00

try {
    String res = new String("\\x3Cb\\x3EHello, World\\x3C\\x2Fb\\x3E".getBytes(), "US-ASCII");
    System.out.println(res);
    int index = 0;
    int posHolder = 0;
    Pattern regex = Pattern.compile("([\\\\x]([a-zA-Z0-9]{3}))");
    Matcher regexMatcher;
    while (true) {
        index = res.indexOf("\\x", index);
        if (index < 0) {
            break;
        }

        posHolder = index;
        index += 2;
        String hex = res.substring(index, index + 2);
        int decVal = Integer.parseInt(hex, 16);
        regexMatcher = regex.matcher(res.substring(posHolder));
        String result = regexMatcher.replaceFirst("" + (char) decVal);
        String prefix = res.substring(0, posHolder);
        String suffix = result;
        res = prefix + suffix;
        index -= 3;
    }
    System.out.println(res);
} catch (UnsupportedEncodingException e) {
    e.printStackTrace();
}

I have this so far, and it's giving the correct output. Thanks everyone for your support. Why do I have to put 4 '\' in the Pattern.compile?
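
For comparison, here is a shorter regex-based sketch of the same idea. The loop above works for the sample, but replaceFirst gives "$" and "\" a special meaning in the replacement string, so an input containing \x24 or \x5C would throw; Matcher.quoteReplacement avoids that. The names HexRegex and unescape are hypothetical, and restricting the match to exactly two hex digits is an assumption about the input format.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HexRegex {
    // Regex \\x([0-9A-Fa-f]{2}): a literal backslash, 'x', then two hex digits (captured).
    private static final Pattern HEX_ESCAPE = Pattern.compile("\\\\x([0-9A-Fa-f]{2})");

    static String unescape(String in) {
        Matcher m = HEX_ESCAPE.matcher(in);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps '$' and '\' from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("\\x3Cb\\x3EHello, World\\x3C\\x2Fb\\x3E")); // <b>Hello, World</b>
    }
}
```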

somjit{} 60 Junior Poster in Training Featured Poster · Answer 9 · 2013-08-23T08:50:32+00:00

why do i have to put 4 '\' in the Pattern.compile??

because you have 2 \ in your string , and so , one extra \ for each.

cool_zephyr 7 Junior Poster in Training · Answer 10 · 2013-08-23T09:14:05+00:00

"\\" => "\" in Java, isn't it? So basically in my string I have only one "\".

somjit{} 60 Junior Poster in Training Featured Poster · Answer 11 · 2013-08-23T10:16:08+00:00

In Java or C, if you want a \ in your string, you have to put another \ before it. It's the holy rule of the backslashes. ;)

cool_zephyr 7 Junior Poster in Training · Answer 12 · 2013-08-23T14:35:15+00:00

well i got it..'\x' is itself an escape sequence for hex so we need four backslashes

JamesCherrill 4,733 Most Valuable Poster Team Colleague Featured Poster · Answer 13 · 2013-08-23T15:08:07+00:00

Not quite. "\x" is not an escape in Java. You get the 4 \ because of the combination of Java and regex. \ is an escape char in both Java and regex, so if you want a literal \ in a regex you have to code it as \\. But then if that's a Java string you have to code each of those \ as \\, giving \\\\.
so
\\\\ in a Java regex string is \\ in the regex, which is a literal regex \

(That's why I have a "\\\\x" in my 2-line solution above - the first String in a replaceAll is a regex.)
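
A small sketch of the two escaping layers, with two ways to avoid counting backslashes: Pattern.quote and the literal (non-regex) String.replace. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

final class BackslashLayers {
    // Java literal "\\\\x" is the 3-character regex \\x, which matches a literal \x.
    static String viaRegex(String s) {
        return s.replaceAll("\\\\x", "%");
    }

    // Pattern.quote turns the 2-character string \x into a regex that matches it literally.
    static String viaQuote(String s) {
        return s.replaceAll(Pattern.quote("\\x"), "%");
    }

    // String.replace takes plain text, not a regex, so only the Java-level escape is needed.
    static String viaLiteral(String s) {
        return s.replace("\\x", "%");
    }

    public static void main(String[] args) {
        System.out.println("\\\\x".length());     // 3: backslash, backslash, x
        System.out.println("\\x3C".length());     // 4: backslash, x, 3, C
        System.out.println(viaRegex("\\x3C"));    // %3C
        System.out.println(viaQuote("\\x3C"));    // %3C
        System.out.println(viaLiteral("\\x3C"));  // %3C
    }
}
```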

cool_zephyr 7 Junior Poster in Training · Answer 14 · 2013-08-23T15:51:06+00:00

thanks for the information..that was really a new thing to me :)
could you also tell me about URLEncoder because if i do
URLEncoder.encode("hello world");
it gives hello+world instead of hello%20world

JamesCherrill 4,733 Most Valuable Poster Team Colleague Featured Poster · Answer 15 · 2013-08-23T16:03:18+00:00

According to the rules for URL encoding, most non-alphanumerics are replaced by %(hex value), except for a space, which is replaced by a "+".

When encoding a String, the following rules apply:

The alphanumeric characters "a" through "z", "A" through "Z" and "0" through "9" remain the same.
The special characters ".", "-", "*", and "_" remain the same.
The space character " " is converted into a plus sign "+".
All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte...
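
If hello%20world is what you need, one common workaround is to replace the "+" afterwards; this is safe because a literal "+" in the input has already been encoded as %2B by that point. The sketch below uses hypothetical names (SpaceEncoding, formEncode, percentEncode) and only addresses the space character; it is not meant as a complete general-purpose URL encoder.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class SpaceEncoding {
    // Form-style encoding as done by URLEncoder: a space becomes '+'.
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Same output, but with spaces written as %20 instead of '+'.
    static String percentEncode(String s) {
        return formEncode(s).replace("+", "%20");
    }

    public static void main(String[] args) {
        System.out.println(formEncode("hello world"));    // hello+world
        System.out.println(percentEncode("hello world")); // hello%20world
        System.out.println(percentEncode("a+b c"));       // a%2Bb%20c
    }
}
```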

cool_zephyr 7 Junior Poster in Training · Answer 16 · 2013-08-23T16:07:02+00:00

thanks for the help..finally done.
