Hello,

I'm currently working on a program that splits a string on symbols only. I need to separate the string around the symbols while keeping the symbols as well, e.g.

test_String_123_^

would result in an array containing: [test] [_] [String] [_] [123] [_] [^]

I have achieved this with the following code:

exampelstring.split("(?<=\\^)|(?=\\^)|(?<=\\_)|(?=\\_)");

However, I was wondering whether there is a way to achieve this with less code. If I add more symbols, the regex becomes huge, because each symbol needs its own lookahead and lookbehind check. Is there a way to combine multiple lookaheads and lookbehinds into the same regex, or a cleaner way of performing such a split?

Thanks for helping :)

All 4 Replies

How about not using split and instead using Pattern and Matcher with (([^_^]+)?([_^])?)

i.e.

package bogustest;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

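public class BogusTest {
  public static void main(String[] args) {
    String test = "test_String_123_^";
    // group(2): optional run of non-symbol characters; group(3): optional single symbol
    Pattern p = Pattern.compile("(([^_^]+)?([_^])?)");
    Matcher m = p.matcher(test);
    while (m.find()) {
      // Both parts are optional, so the pattern also matches the empty string
      // at the end of the input; skip that match.
      if (m.group(1).isEmpty()) continue;
      System.out.println("Match:  " + m.group(1));
      if (m.group(2) != null) System.out.println("substr 1:  " + m.group(2));
      if (m.group(3) != null) System.out.println("substr 2:  " + m.group(3));
    }
  }
}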
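If the tokens are wanted in a list rather than printed, the same Matcher idea can be sketched with a simpler alternation that never matches the empty string. This is only a rough sketch: the class name TokenCollector and the method tokenize are made up for illustration, and the symbol set is assumed to be just '_' and '^'.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original reply.
class TokenCollector {
  // Either a run of non-symbol characters, or exactly one symbol.
  private static final Pattern TOKEN = Pattern.compile("[^_^]+|[_^]");

  static List<String> tokenize(String input) {
    List<String> tokens = new ArrayList<>();
    Matcher m = TOKEN.matcher(input);
    while (m.find()) {
      tokens.add(m.group()); // the whole match is one token
    }
    return tokens;
  }

  public static void main(String[] args) {
    // prints [test, _, String, _, 123, _, ^]
    System.out.println(tokenize("test_String_123_^"));
  }
}
```

Because each alternative consumes at least one character, there is no empty match to filter out at the end of the input.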

Edit:
Of course, if you want to keep split, you can use this (it needs import java.util.Arrays;):
System.out.println(Arrays.asList("test_String_123_^".split("(?=[_^])|(?<=[_^])")));
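For the "what if I add more symbols" part of the question, one option is to build that lookahead/lookbehind pair from a string of symbols instead of typing it by hand. The following is a minimal sketch under that assumption; SymbolSplitter, splitRegex and splitKeeping are hypothetical names, and Pattern.quote is used so that no symbol has to be escaped manually.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original reply.
class SymbolSplitter {
  // Builds "(?=S)|(?<=S)", where S is an alternation of the quoted symbols.
  static String splitRegex(String symbols) {
    if (symbols.isEmpty()) {
      throw new IllegalArgumentException("at least one symbol is required");
    }
    StringBuilder alternation = new StringBuilder();
    for (char c : symbols.toCharArray()) {
      if (alternation.length() > 0) alternation.append('|');
      // Pattern.quote turns the character into a literal, whatever it is.
      alternation.append(Pattern.quote(String.valueOf(c)));
    }
    String group = "(?:" + alternation + ")";
    return "(?=" + group + ")|(?<=" + group + ")";
  }

  static String[] splitKeeping(String input, String symbols) {
    return input.split(splitRegex(symbols));
  }

  public static void main(String[] args) {
    // prints [test, _, String, _, 123, _, ^]
    System.out.println(Arrays.asList(splitKeeping("test_String_123_^", "_^")));
  }
}
```

Adding a symbol then only means adding one character to the second argument, e.g. splitKeeping(input, "_^;").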

As a note, though, '^' is a special character ONLY when used as the first character within square brackets, and '_' is NOT a special character at all, so there was no reason for the \\ occurrences in your initial regex.

'^' is a special character ONLY when used as the first character within square brackets

^ means "beginning of the line" outside of square brackets, so it does need to be escaped.

You're right about _ though.

Ach, yeah, I'm an idiot. Inside square brackets, however, it does NOT need to be escaped, as long as it is not the FIRST character within those brackets (if it is to be the only character in the brackets, it does have to be escaped).
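To make the escaping rules from this exchange concrete, here is a small sketch that checks each case with java.util.regex. The class name CaretEscaping and the helpers are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original thread.
class CaretEscaping {
  // True if the regex matches the entire input.
  static boolean matchesAll(String regex, String input) {
    return Pattern.compile(regex).matcher(input).matches();
  }

  // True if the regex matches anywhere in the input.
  static boolean findsIn(String regex, String input) {
    return Pattern.compile(regex).matcher(input).find();
  }

  public static void main(String[] args) {
    // Outside brackets an unescaped ^ is an anchor, not a literal caret.
    System.out.println(findsIn("a^b", "a^b"));    // false
    System.out.println(findsIn("a\\^b", "a^b"));  // true

    // Inside brackets, not in first position: a literal caret.
    System.out.println(matchesAll("[_^]", "^"));  // true

    // Inside brackets, in first position: negation ("anything but _").
    System.out.println(matchesAll("[^_]", "_"));  // false
    System.out.println(matchesAll("[^_]", "x"));  // true

    // Caret as the only character in the brackets: it must be escaped.
    System.out.println(matchesAll("[\\^]", "^")); // true
    System.out.println(matchesAll("[\\^]", "x")); // false
  }
}
```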