Hey people. I need someone to tell me what they think of this.
I've been trying to write this small Java program to search for mobile phone numbers in a document.
The document is a .csv file, but that's not too relevant.
The numbers are Irish mobile numbers in the format 08[3,5,6,7,8] followed by seven digits, so ten digits in total, e.g. 0868278402. That's not my number, btw.

Anyway, in a document which contains at least 5000 numbers (I estimate), the scan only finds about 3500. It's important that I get all the numbers, or as many as possible.
So I need help.

Here is the code:

class checker extends Thread
{
	String ID = null;
	private boolean goGo = true;
	public lineBox linebox;
	public isolatedUseTools IUTools;
	char[] number;
	char[] numberDone;
	String[] Pt;
	String woow;
	boolean newNumber;
	boolean hasNum;

	public checker(lineBox linebox, isolatedUseTools IUTools)
	{
		this.IUTools = IUTools;
		this.linebox = linebox;
		number = new char [10];
		numberDone = new char [10];
	}

	public void setID(String ID)
	{
		this.ID = ID;
	}

	public static boolean isNumber(char [] number)
	{
		for(int i = 0; i < 10; i++)
		{
			if(number[i] != '0'&& number[i] != '1'&& number[i] != '2'&& number[i] != '3'){
				if(number[i] != '4'&& number[i] != '5'&& number[i] != '6'){
					if(number[i] != '7'&& number[i] != '8'&& number[i] != '9'){
						return false;
					}
				}
			}
			if(number[i] == '"'|| number[i] == ','|| number[i] == '/'|| number[i] == '-'||
				number[i] == '\\' || number[i] == ':'){
				return false;
			}
		}
		return true;
	}

	public int processLine(String woow)
	{
		hasNum = false;
		newNumber = false;
		char [] blankNum = new char [10];
		number = blankNum;

		if(woow != null)
		{
			for(int i=1; i<woow.length()-13; i++)
			{
				if(woow.charAt(i)=='0' && woow.charAt(i+1)=='8' && i+10<woow.length())
				{
					if(woow.charAt(i+2)=='3'||woow.charAt(i+2)=='5'||
						woow.charAt(i+2)=='6'||woow.charAt(i+2)=='7'||woow.charAt(i+2)=='8')
					{
						int a = 0;
						int b = 0;
						while(a<10){
							if(woow.charAt(i+b) == ' ' || woow.charAt(i+b) == '-'){
								a--;
							}
							else {
								number[a] = woow.charAt(i+b);
							}
							a++;
							b++;
						}
						if(isNumber(number) == true)
						{
							newNumber = true;
						}
					}
				}
				else if(woow.charAt(i) == '3' && woow.charAt(i+1) == '5' && woow.charAt(i+2) == '3' && i+13 < woow.length())
				{
					number[0] = '0';
					int a = 1;
					int b = i + 3;
					while(a<10)
					{
						if(woow.charAt(b) == ' ' || woow.charAt(b) == '-'){
							a--;
						}
						else {
							number[a] = woow.charAt(b);
						}
						a++;
						b++;
					}
					if(isNumber(number) == true)
					{
						newNumber = true;
					}
				}
				else if(woow.charAt(i) == '8' && i+13 < woow.length())
				{
					if(woow.charAt(i+1)=='6'||woow.charAt(i+1)=='7'||
						woow.charAt(i+1)=='5'||woow.charAt(i+1)=='3'||woow.charAt(i+1)=='8')
					{
						number[0] = '0';
						int a = 1;
						int b = 0;
						while(a<10)
						{
							if(woow.charAt(i+b) == ' ' || woow.charAt(i+b) == '-'){
								a--;
							}
							else{
								number[a] = woow.charAt(i+b);
							}
							a++;
							b++;
						}
						if(isNumber(number)){
							newNumber = true;
						}
					}
				}
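				// Compare array contents rather than references, and remember a copy,
				// so a second number on the same line is not mistaken for the last one.
				if(newNumber && isNumber(number) && !java.util.Arrays.equals(number, numberDone))
				{
					IUTools.outNumber(number);
					numberDone = number.clone();
					newNumber = false;
					hasNum = true;
					i = i+9;
					System.out.println(number);
				}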
			}
		}
		return 0;	
	}

	public void run()
	{
		String line="";
		while(goGo)
		{
			hasNum = false;
			IUTools.hasNum = false;
			newNumber = false;

			line = linebox.takeLine();
			if(line.compareTo("yoke~~~~~~yoke") == 0){
				goGo = false;
			}
			else{
				if(line.length()>13)
				{
					int waitForThisMethod = processLine(line);
					hasNum = IUTools.hasNum;
					newNumber = IUTools.newNum;
					if(line!=null && !hasNum){
						IUTools.outAddress(line);
					}
				}
			}
			IUTools.newNum = false;
		}
	}
}

Numbers can sometimes have spaces in them that need to be omitted. Sometimes the first 0 is missing, and they appear as 868242602. And sometimes they might be in the form +353868242602.

Write a separate method that takes a String as argument and removes all spaces:
Input: +0 69 34 55 --> Result: +0693455

And sometimes the first 0 is missing, and they appear as 868242602. And sometimes they might be in the form +353868242602.

Then another method that takes the above result and:
If it starts with +353, remove that prefix from the String
Then if the leading 0 is missing, add it to the String
Then check if the result is a number (long)
Then if it has length 10
Then if it starts with 08
Then if the 3rd character is [3,5,6,7,8]

And now you are sure that what you read is a number.

And use that ONE method to check every line in the file
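
A minimal sketch of those steps, for what it's worth. The class and method names (`NumberNormalizer`, `normalize`) are made up for illustration, and it assumes the candidate has already been cut out of the line. It also drops dashes as well as spaces, since the original code skips both, and it checks the digits one by one instead of parsing to a long.

```java
final class NumberNormalizer {

    /** Returns the ten-digit form (e.g. 0868242602), or null if this is not an Irish mobile number. */
    static String normalize(String candidate) {
        if (candidate == null) {
            return null;
        }
        // Remove spaces and dashes.
        String s = candidate.replace(" ", "").replace("-", "");
        // Strip the international prefix if present.
        if (s.startsWith("+353")) {
            s = s.substring(4);
        }
        // Put back the leading 0 if it is missing.
        if (!s.startsWith("0")) {
            s = "0" + s;
        }
        // Must be exactly ten characters, all of them digits.
        if (s.length() != 10) {
            return null;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return null;
            }
        }
        // Must start with 08 and have 3, 5, 6, 7 or 8 as the third digit.
        if (!s.startsWith("08") || "35678".indexOf(s.charAt(2)) < 0) {
            return null;
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(normalize("086 3211514"));   // 0863211514
        System.out.println(normalize("+353868242602")); // 0868242602
        System.out.println(normalize("868242602"));     // 0868242602
        System.out.println(normalize("02/09/82"));      // null
    }
}
```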

Hey, thanks for the ideas.
The thing is the number will be contained in a much bigger line, so it can be surrounded by any amount of useless information on either side.
In order to successfully extract the number, I need to check for all these things (+353, 86, etc.) as I scan through the line. So what benefit would there be in disregarding this info (whether it starts with 0, 3, or 8) once the number has been extracted, only to return to it again in a separate method?

Please explain further. I'm sure I'm missing your point.

Can you provide an example of what the lines might look like?

This is a segment of a line:

6,"Mr","Mzzzzd","Azzzzzl","AzzzzRI M",4999999,,02/09/82,19975,"MOD",,"086 3211514",,,,,,16/05/05,,30/03/05,,00:00:00,00:00:00,,,,,25/06/07,"male",,,,"0999996a 11/04",,,0,"74 Bazzzzn Road","Rzzzzzzzzzm","Dzzz 9",,"Mr Azzzzzl",,,,,,,"his","him","he","Dzzzzy",20/02/01,"Tzzzzy",16:20:00,00:10:00,,,,,,,,,,,

From what I see, the fields are comma separated, so use StringTokenizer or String.split() to break the line into its elements and check each one to see whether it is a phone number.
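
Roughly like this, as a sketch only. `CsvFieldScan`, `mobilesIn` and `isMobile` are hypothetical names, and the check only covers the plain 08x form. It also assumes that no quoted field contains a comma and that the number fills a whole field. A plain `split(",")` would cut a field like "Smith, John" in two, so a real CSV parser is safer if that can happen.

```java
import java.util.ArrayList;
import java.util.List;

final class CsvFieldScan {

    /** Splits one CSV line on commas and returns the fields that are Irish mobile numbers. */
    static List<String> mobilesIn(String line) {
        List<String> found = new ArrayList<>();
        // The -1 limit keeps trailing empty fields instead of dropping them.
        for (String field : line.split(",", -1)) {
            String f = field.trim();
            // Strip the surrounding double quotes, if any.
            if (f.length() >= 2 && f.startsWith("\"") && f.endsWith("\"")) {
                f = f.substring(1, f.length() - 1);
            }
            String digits = f.replace(" ", "").replace("-", "");
            if (isMobile(digits)) {
                found.add(digits);
            }
        }
        return found;
    }

    /** True for ten digits starting 08 with 3, 5, 6, 7 or 8 as the third digit. */
    static boolean isMobile(String s) {
        if (s.length() != 10 || !s.startsWith("08")) {
            return false;
        }
        if ("35678".indexOf(s.charAt(2)) < 0) {
            return false;
        }
        for (int i = 3; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String line = "6,\"Mr\",4999999,,02/09/82,\"MOD\",,\"086 3211514\",,,16/05/05";
        System.out.println(mobilesIn(line)); // [0863211514]
    }
}
```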

Regardless of whether the elements have been separated by a split, regular expressions are the easiest way to provide a match pattern for the phone number that allows for the variability that was described.
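
For example, something along these lines. This is a sketch, not a tested production pattern: the class name `RegexNumberScan` and the exact expression are my own assumptions, based on the variations described (optional +353 or leading 0, spaces or dashes between digits). The lookarounds stop it matching inside a longer run of digits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexNumberScan {

    // Optional "+353"/"353" or "0" prefix, then 8 and one of 3,5,6,7,8,
    // then seven more digits, each optionally preceded by a space or dash.
    private static final Pattern MOBILE = Pattern.compile(
            "(?<!\\d)(?:\\+?353[ -]?|0)?(8[35678])((?:[ -]?\\d){7})(?!\\d)");

    /** Returns every mobile number found in the line, rewritten to the ten-digit 08x form. */
    static List<String> findMobiles(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = MOBILE.matcher(line);
        while (m.find()) {
            // Group 1 is the "8x" part, group 2 the last seven digits plus any separators.
            String rest = m.group(2).replaceAll("[ -]", "");
            found.add("0" + m.group(1) + rest);
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findMobiles("\"MOD\",,\"086 3211514\",,"));          // [0863211514]
        System.out.println(findMobiles("call +353868242602 or 868242602"));     // [0868242602, 0868242602]
    }
}
```

Note that accepting the form without the leading 0 means any stray nine-digit value starting with 83, 85, 86, 87 or 88 will also match, so the results are worth a sanity check.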

Then follow Ezzaral's suggestion if you are not familiar with either of the two recommendations.

I had a look at the Sun tutorials on regular expressions, and it does look great. Good call.
The only thing is I have to learn it, which sucks, time-wise. But it can be done.
I'll try the other ideas on my way towards regular expressions and see if there's a significant increase in how many numbers are found.
Thanks for the suggestions!

I tried removing all the space chars and stuff before processing, and it made no difference.

The problem remains: I've figured out that the count reads 6196, but the output contains only 5011 numbers. I don't understand why.

I have found my problem, or at least the main one.
I had to close the PrintWriter and all properly. I had the method to do it, but I never called it. This resulted in a bunch of numbers missing from the end of the file: 6196 - 5011 = 1185. Not sure why that amount exactly.
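
For anyone hitting the same thing: a buffered writer keeps output in memory until it is flushed or closed, so whatever is still sitting in the buffer when the program ends never reaches the file. Below is a minimal sketch of one way to make the close automatic, using try-with-resources (Java 7 and later). `NumberFileWriter` and `writeAll` are hypothetical names, not the ones from my program.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

final class NumberFileWriter {

    /** Writes one number per line; the writer is closed (and so flushed) on exit from the try block. */
    static void writeAll(Path out, List<String> numbers) throws IOException {
        try (PrintWriter writer = new PrintWriter(Files.newBufferedWriter(out))) {
            for (String n : numbers) {
                writer.println(n);
            }
        } // close() runs here, even if an exception was thrown inside the block
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("numbers", ".txt");
        writeAll(out, Arrays.asList("0863211514", "0868242602"));
        System.out.println(Files.readAllLines(out).size()); // 2
    }
}
```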
