Hi there, I am having some trouble with a regex and am hoping one of you great folks can assist me. Just a heads up: I am very weak with regular expressions.

Here is the text I am using a regex on:

Control flow         6881            Server           0.01 Kbps          5089.59 Kbps     
BitTorrent           6881            Server           0.01 Kbps          6963.48 Kbps     
Control flow         6881            Server           0.01 Kbps          6653.03 Kbps     
BitTorrent           47649           Server           0.01 Kbps          6033.00 Kbps     
Control flow         47649           Server           0.01 Kbps          6432.94 Kbps     
BitTorrent           47649           Server           0.01 Kbps          6014.99 Kbps 

Here is the regex I am using that grabs the download speed at the end:

("([0-9]{1,5}\\.[0-9]{1,2})");

I need to change this regex/create new regex to grab the download speed at the end, but only on the lines that start with BitTorrent.

Here is what I tried:

("[^BitTorrent]+([0-9]{1,5}\\.[0-9]{1,2})");

However that regex is giving me an error later on in my code when I go to convert the string to a double.

If anyone can give me any help I would greatly appreciate it. Also if you need me to give any more information just let me know.

Bonus points - If anyone could show me a regex that will grab the download speed at the end on lines that contain 6881 I would be stoked.


There are too many ways to do this. Is that the content in each line?

("[^BitTorrent]+([0-9]{1,5}\.[0-9]{1,2})");

Your regex is incorrect when you use [] in regex because anything inside [] means "or" for any character inside it. However, the symbol ^ inside [] means "not" and that means you are looking for any combination of i|T|o|r|e|n|t but not 'B'.

"^.+(\\d+\.\\d+)\\s*Kbps\\s*$" means anything from start but group only the end digits with precision value and end with Kbps.

The one with "BitTorrent" could be "^Bit.+(\\d+\.\\d+)\s*\Kbps\\s*$"

The one with 6881 could be "^[A-Za-z\\s]+6881\\s+.+(\\d+\.\\d+)\\s*\Kbps\\s*$"

PS: I believe regex string in Java needs double backslash for escaping?

Actually, [^BitTorrent] means any letter that is none of B, i, t, T, o, r, e, or n. It's surprising that deadsolo somehow stumbled onto something so contrary to the solution to the problem even though deadsolo must know how [] works.
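
To see that character-class behaviour in action, here is a small sketch. The class and method names (NegatedClassDemo, firstMatch) are made up for illustration, and the sample lines are shortened versions of the data in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegatedClassDemo {
    // Hypothetical helper: returns the first substring matched by regex, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // [^BitTorrent]+ is a run of characters that are NOT B, i, t, T, o, r, e or n.
        // It skips the word "BitTorrent" itself and stops at the 'e' in "Server".
        System.out.println("[" + firstMatch("[^BitTorrent]+", "BitTorrent 6881 Server") + "]");

        // So the original attempt still matches lines that do not start with BitTorrent.
        String controlLine = "Control flow 6881 Server 0.01 Kbps 5089.59 Kbps";
        System.out.println(
            firstMatch("[^BitTorrent]+([0-9]{1,5}\\.[0-9]{1,2})", controlLine) != null);
    }
}
```

In other words, the brackets never look for the word BitTorrent, so they cannot be used to filter lines by their first column.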

It seems to me that the best solution would be just to abandon anything fancy and do it in a brute force way, like this:

^BitTorrent\\s+\\d+\\s+\\w+\\s+\\d*\\.\\d* Kbps\\s+(\\d*\\.\\d*) Kbps$

We have exactly five columns of data, so just have a part of the regular expression for each column. Once you start doing it that way, then things like looking for lines containing 6881 become easy:

^.+\\s+6881\\s+\\w+\\s+\\d*\\.\\d* Kbps\\s+(\\d*\\.\\d*) Kbps$

Thanks for the quick responses guys/gals.

Taywin - I had to add extra backslashes for escaping. When I try your regex:

"^Bit.+(\\d+\\.\\d+)\\s*\\Kbps\\s*$"

I get an error:
java.util.regex.PatternSyntaxException:
null (in java.util.regex.Pattern)

bguild - when I try your regex:

^BitTorrent\\s+\\d+\\s+\\w+\\s+\\d*\\.\\d* Kbps\\s+(\\d*\\.\\d*) Kbps$

I am not getting any matches.

When I try:

^BitTorrent\\s+\\d+\\s+\\w+\\s+\\d*\\.\\d* Kbps\\s+([0-9]{1,5}\\.[0-9]{1,2})

my string variable becomes the whole line

BitTorrent           6881            Server           0.01 Kbps          6963.48 Kbps

I just want the 6963.48 part. Do you guys have any more ideas/suggestions? I really appreciate all the help.
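
One possible cause, and this is a guess since the matching code was not posted: Matcher.group() with no argument returns everything the pattern matched, while group(1) returns only the part inside the first pair of parentheses. A minimal sketch using the pattern above unchanged (CaptureGroupDemo, wholeMatch and firstGroup are illustrative names, not from the original code):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureGroupDemo {
    // The pattern from the post above, unchanged.
    static final Pattern SPEED = Pattern.compile(
        "^BitTorrent\\s+\\d+\\s+\\w+\\s+\\d*\\.\\d* Kbps\\s+([0-9]{1,5}\\.[0-9]{1,2})");

    // group() (same as group(0)) is everything the pattern matched.
    static String wholeMatch(String line) {
        Matcher m = SPEED.matcher(line);
        return m.find() ? m.group() : null;
    }

    // group(1) is only the text inside the first pair of parentheses.
    static String firstGroup(String line) {
        Matcher m = SPEED.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "BitTorrent           6881            Server"
            + "           0.01 Kbps          6963.48 Kbps     ";
        System.out.println(wholeMatch(line));                      // the line up to the speed
        System.out.println(Double.parseDouble(firstGroup(line)));  // just the speed
    }
}
```

If the code currently calls group(), switching to group(1) should hand Double.parseDouble only the number.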

let me know

Is it necessary to use regular expressions to get your output?

May I suggest an alternative solution to your requirement?

Wouldn't split() work well for this requirement?

Is it necessary to use regular expressions to get your output?
May I suggest an alternative solution to your requirement?
Wouldn't split() work well for this requirement?

What do you think split() uses?

Here's its signature:
public String[] split(String regex)

API:
Splits this string around matches of the given regular expression.

I was wrong in my explanation; thanks to bguild for the correction. Anyway, bguild's regex doesn't work because of the \w. The shorthand includes digits as well; in other words, it matches [A-Za-z0-9_]. The match is greedy by default and will attempt to match to the end, so it does not see the whitespace char but rather a period char. It won't backtrack, so it returns not found.

/*
with ^BitTorrent\\s+\\d+\\s+\\w+\\s+

BitTorrent           6881            Server           0.01 Kbps          6963.48 Kbps
^-----------------------------------------------------^

Then there is no \\s+ because it found character . instead.
*/

"^Bit.+(\\d+\.\\d+)\s\\Kbps\\s$"

Did I add too many backslashes? The two backslashes before K aren't supposed to be there because there is no \K.
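
A quick way to check both points is the sketch below. It assumes the fully double-escaped form of the pattern, and the helper names (GreedyCaptureDemo, compiles, capture) are made up for illustration. It also shows a second thing worth watching: with a greedy .+ in front of the group, the group only gets the last digit before the decimal point, while the reluctant .+? leaves the whole number for the group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class GreedyCaptureDemo {
    // Returns true if Java's regex engine accepts the pattern.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Returns capture group 1 of the first match, or null if there is none.
    static String capture(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "BitTorrent           6881            Server"
            + "           0.01 Kbps          6963.48 Kbps     ";

        // "\\Kbps" reaches the regex engine as \Kbps, and \K is not an escape Java supports.
        System.out.println(compiles("^Bit.+(\\d+\\.\\d+)\\s*\\Kbps\\s*$"));

        // Greedy .+ swallows as much as it can, leaving the group only "3.48".
        System.out.println(capture("^Bit.+(\\d+\\.\\d+)\\s*Kbps\\s*$", line));

        // Reluctant .+? stops as early as possible, so the group gets "6963.48".
        System.out.println(capture("^Bit.+?(\\d+\\.\\d+)\\s*Kbps\\s*$", line));
    }
}
```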

@tux4life, it is a good suggestion. I would go that route too, but the OP wants a regex for a line, so I am not going to stop him/her.

@tux4life, it is a good suggestion. I would go that route too, but the OP wants a regex for a line, so I am not going to stop him/her.

Perhaps you misunderstood the point I was trying to make.

radhakrishna.p said:

Is it necessary to use regular expressions to get your output?

What follows that question seems to imply that split() doesn't make use of regular expressions, which it does. That's the point I wanted to make.

I am a pretty new programmer and had never heard of split() until now, so I would not know how to use it for this issue without a bit more assistance. I tried googling it but cannot understand how I would implement it.

Taywin - My IDE tells me that I have to have 2 backslashes everywhere.

Anyway, bguild's regex doesn't work because of the \w. The shorthand includes digits as well; in other words, it matches [A-Za-z0-9_]. The match is greedy by default and will attempt to match to the end, so it does not see the whitespace char but rather a period char. It won't backtrack, so it returns not found.

That can't be right. For one thing, there is no period char anywhere near the \\w+ in the pattern, and \\w+ certainly won't cross whitespace. On top of that, I tested the regular expressions I posted before posting them and they work fine. I suspect there may be some whitespace at the end of the data that deadsolo is using.

yes there is some whitespace after the final Kbps
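
That would explain the missing matches: a pattern ending in "Kbps$" cannot match when spaces follow the final Kbps. A rough sketch of two ways around it, either trimming the line first or allowing \\s* before the $ (the class name and the STRICT/TOLERANT constants are made up for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingWhitespaceDemo {
    // The column-by-column pattern as posted: it requires the line to end right after "Kbps".
    static final Pattern STRICT = Pattern.compile(
        "^BitTorrent\\s+\\d+\\s+\\w+\\s+\\d*\\.\\d* Kbps\\s+(\\d*\\.\\d*) Kbps$");

    // Same pattern, but \\s* before $ tolerates trailing whitespace.
    static final Pattern TOLERANT = Pattern.compile(
        "^BitTorrent\\s+\\d+\\s+\\w+\\s+\\d*\\.\\d* Kbps\\s+(\\d*\\.\\d*) Kbps\\s*$");

    // Returns the captured download speed, or null if the line does not match.
    static String speed(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "BitTorrent           6881            Server"
            + "           0.01 Kbps          6963.48 Kbps     ";
        System.out.println(speed(STRICT, line));         // no match because of the trailing spaces
        System.out.println(speed(STRICT, line.trim()));  // matches once the line is trimmed
        System.out.println(speed(TOLERANT, line));       // matches without trimming
    }
}
```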

Hi deadsolo,

I tried solving your requirement using split(" +").

I don't know whether this is an exact solution for your requirement, because I have not considered other concerns (performance, etc.).

I know this is not the regex you were looking for, but it is one way to get to the solution.

My apologies to all the experts here who are giving solutions to this problem.

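import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class DownloadSpeedReader {
    public static void main(String[] args) throws IOException {
        File f = new File("Your_data_file_location here");
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader br = new BufferedReader(new FileReader(f))) {
            String strt;
            while ((strt = br.readLine()) != null) {
                // Only process lines whose first column is BitTorrent
                if (strt.startsWith("BitTorrent")) {
                    System.out.println(strt);
                    // Split on runs of spaces; trailing empty tokens are dropped
                    String[] str_toks = strt.split(" +");
                    // The last token is "Kbps", so the speed is second from the end
                    System.out.println("your desired value :" + str_toks[str_toks.length - 2]);
                }
            }
        }
    }
}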

Explanation of my code:
1. I put your data in a file.
2. I read the file line by line.
3. I check whether the line starts with "BitTorrent"; if it does, I split the line using the regular expression above.
4. Based on the requirement, I take the value from the array of split tokens.

That's it.

Let me know whether this is the right approach or not.

I will wait for all your valuable comments.

Note: I don't have much experience in Java.

Thank you radha krishna .p!

Using this line:

if(strt.startsWith("BitTorrent")){

I am able to do half my project.

Later I will need to figure out how to extract the speed on lines specifically with 6881 and specifically 47649. But I will work with what I have for now. I'll do some more of my own research before I get to that part. If I need to I will make a new thread about it.
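
For the port part, one way the same split() idea might be extended is sketched below. It assumes every line has the five columns shown in the question, and it counts tokens from the end of the line because "Control flow" splits into two tokens while "BitTorrent" is one. PortFilterDemo and speedForPort are names invented for this sketch.

```java
class PortFilterDemo {
    // Returns the download speed if the line's port column equals the given port, else null.
    static Double speedForPort(String line, String port) {
        // trim() avoids an empty first token; \\s+ splits on any run of whitespace.
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 7) {
            return null; // not a data line in the assumed layout
        }
        // Counting from the end: port, role, upload, "Kbps", download, "Kbps"
        String linePort = tokens[tokens.length - 6];
        if (!linePort.equals(port)) {
            return null;
        }
        return Double.parseDouble(tokens[tokens.length - 2]);
    }

    public static void main(String[] args) {
        String a = "Control flow         6881            Server           0.01 Kbps          5089.59 Kbps     ";
        String b = "BitTorrent           47649           Server           0.01 Kbps          6033.00 Kbps     ";
        System.out.println(speedForPort(a, "6881"));   // speed from the 6881 line
        System.out.println(speedForPort(b, "6881"));   // null, the port is 47649
        System.out.println(speedForPort(b, "47649"));  // speed from the 47649 line
    }
}
```

Combining this with startsWith("BitTorrent") would narrow it to BitTorrent lines on a given port.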

Thanks to everyone who tried to help me out, I really appreciate it. This forum board is great!!!
