
reading duplicate records in text file

hello everyone,

I have an issue with reading duplicate records in a text file...

I need to read the file, look for any duplicate data/keys in it, and write all the duplicate records to another file...

How can I do that in a loop? :-/

any help...:)

thanks!

r0n
Newbie Poster
19 posts since Jun 2009
Reputation Points: 10
Solved Threads: 0
 

That really depends on what you are classifying as a duplicate record. What is your primary key? Is it composite? Can you provide an example of the data?

ztini
Posting Whiz in Training
299 posts since Jan 2011
Reputation Points: 54
Solved Threads: 52
 

I smell homework here... :)

Okay, so start us off. You have a text file. Can you open it and read lines from it, in a loop? Go ahead and do that, just dumping the lines straight to the screen.

That'll make a good first step.
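
A minimal sketch of that first step, assuming the file is read with a java.util.Scanner. The class name, the method name readLines and the default file name "data.txt" are placeholders for illustration only:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ReadLinesSketch {
    // Reads every remaining line from the scanner into a list, in file order.
    static List<String> readLines(Scanner scanner) {
        List<String> lines = new ArrayList<>();
        while (scanner.hasNextLine()) {
            lines.add(scanner.nextLine());
        }
        return lines;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // "data.txt" is a placeholder; pass another path as the first argument.
        String path = args.length > 0 ? args[0] : "data.txt";
        try (Scanner scanner = new Scanner(new File(path))) {
            // Dump each line straight to the screen.
            for (String line : readLines(scanner)) {
                System.out.println(line);
            }
        }
    }
}
```

Taking the Scanner as a parameter keeps the loop independent of where the text comes from, so the same method works on a file or on an in-memory string.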

jon.kiparsky
Posting Virtuoso
1,849 posts since Jun 2010
Reputation Points: 383
Solved Threads: 187
 

heheheh...;)

Actually, my issue here is how to get the duplicate data in the text file (.txt) and display it.

All I know is how to display the data without the duplicates...

I need to get the duplicate data and put it in a file.

Here's my code.
I have a Scanner, an ArrayList and a TreeSet.
In the Scanner's while loop I add each line to the ArrayList:


while (scanner.hasNextLine()) {
    String line = scanner.nextLine();
    arrayList.add(line);
}

// in the TreeSet i add the arrayList
set.addAll(arrayList);

// i declare iterator here to get the data without the duplicates
Iterator it = set.iterator();
while (it.hasNext()) {
    System.out.println("print data w/o duplicates " + (String) it.next());
}

--how can i display the duplicates only...

thanks!:)

r0n
 

Again what IS a duplicate record?

Take this data for example:
Id Name Title Salary
1 Rodgers, Frank Developer 55,000
2 Smith, Joe Developer 55,000
3 Rodger, Frank Team Lead 55,000

Are records 2 & 3 duplicate b/c they share the same salary as 1?
Is 3 a duplicate of 1 b/c they share the same name?
Is 2 a duplicate of 1 b/c they share the same title?
Are none of them duplicates b/c they have different id numbers?

Solving your homework depends on what a duplicate record actually is---simply saying you need duplicate records is too abstract to code; you need parameters that define the duplication.
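
To illustrate why the definition matters, here is a rough sketch in which the notion of "duplicate" is passed in as a key function. Everything in it is an assumption for illustration: the pipe-delimited layout (the sample above has no clear delimiter), the class name and the method name duplicatesBy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class DuplicateKeySketch {
    // Returns every record whose key was already produced by an earlier record.
    static List<String> duplicatesBy(List<String> records, Function<String, String> keyOf) {
        Set<String> seenKeys = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String record : records) {
            // Set.add returns false when the key is already present.
            if (!seenKeys.add(keyOf.apply(record))) {
                duplicates.add(record);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Hypothetical pipe-delimited layout: id|name|title|salary
        List<String> records = Arrays.asList(
                "1|Rodgers, Frank|Developer|55,000",
                "2|Smith, Joe|Developer|55,000",
                "3|Rodger, Frank|Team Lead|55,000");
        // Key = id: nothing is reported.
        System.out.println(duplicatesBy(records, r -> r.split("\\|")[0]));
        // Key = title: record 2 is reported.
        System.out.println(duplicatesBy(records, r -> r.split("\\|")[2]));
        // Key = salary: records 2 and 3 are reported.
        System.out.println(duplicatesBy(records, r -> r.split("\\|")[3]));
    }
}
```

The same three records give three different answers depending on which field is treated as the key.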

ztini
 
r0n wrote: "i need to read the file and look for any duplicates data/keys in the text file and write"

Can you clarify this? Are you trying to eliminate duplicate lines of data, or do you need to parse the lines into data and keys? That would be an added step.

In any case, if you want to display the duplicates only, maybe you should check each item against the rest of the list when you're reading it into the list. If it's already in the list, do whatever you need to do with it - write it to a file, skip adding it to the list, put it in another list, paint it blue and ship it to Waukegan, whatever you like.
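
One way that check-while-reading idea might look, as a sketch only. It assumes whole lines are compared, and the names seen, duplicates and collectDuplicates are made up for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class CheckWhileReadingSketch {
    // Reads lines one at a time; a line already in 'seen' goes to 'duplicates' instead.
    static List<String> collectDuplicates(Scanner scanner) {
        List<String> seen = new ArrayList<>();
        List<String> duplicates = new ArrayList<>();
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine();
            if (seen.contains(line)) {   // linear search through the lines read so far
                duplicates.add(line);
            } else {
                seen.add(line);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // In-memory sample input instead of a real file.
        Scanner scanner = new Scanner("AAAAA\nBBBBB\nBBBBB\nCCCCC");
        System.out.println(collectDuplicates(scanner)); // prints [BBBBB]
    }
}
```

Here the duplicate goes into a second list, but the body of the if is the place to do whatever else is needed with it.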

jon.kiparsky
 

Actually I'm not trying to eliminate the duplicate lines; all I need is to put all the duplicate lines into another file...

For instance, in my file "data.txt" I have duplicate lines:

AAAAA
BBBBB
BBBBB
CCCCC

i need to copy the [BBBBB] lines to another file...

how should i do that in a loop?..

thanks

r0n
 

Maybe you should check each item against the rest of the list when you're reading it into the list. If it's already in the list, do whatever you need to do with it - write it to a file, skip adding it to the list, put it in another list, paint it blue and ship it to Waukegan, whatever you like.

jon.kiparsky
 

Or, if you want a second loop, after you've read everything in from the file, you pretty much have to check each item against each other item. Now you're talking about a generic problem of eliminating duplicated items from a list.
The easiest thing to do is to sort the list and go through it - is this item like the one after it? If so, put it in a second list. Write the second list to the file.
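
A sketch of the sort-and-compare approach, assuming the lines are plain Strings compared with equals(). The class and method names are invented for the example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortedDuplicatesSketch {
    static List<String> adjacentDuplicates(List<String> lines) {
        List<String> sorted = new ArrayList<>(lines); // copy so the caller's order is untouched
        Collections.sort(sorted);                     // equal lines end up next to each other
        List<String> duplicates = new ArrayList<>();
        // Stop one short of the end because each step looks at i + 1.
        for (int i = 0; i < sorted.size() - 1; i++) {
            if (sorted.get(i).equals(sorted.get(i + 1))) {
                duplicates.add(sorted.get(i));
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("BBBBB", "AAAAA", "CCCCC", "BBBBB");
        System.out.println(adjacentDuplicates(lines)); // prints [BBBBB]
    }
}
```

Note that a line occurring three times lands in the second list twice, once for each adjacent pair.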

jon.kiparsky
 

I don't know how to display the duplicate lines in a loop, because whenever I put the arrayList in the loop it just displays every line in 'data.txt'.

// this displays every line, even the ones that are not duplicates
for (int i = 0; i < arrayList1.size(); i++) {
    System.out.println(arrayList1.get(i));
}

-thanks

r0n
 

i just need to display the duplicate lines...how should i do that?:)

r0n
 

Yes, that just goes through and for each item in the ArrayList, it prints it.

That's not what you want, though. For each item in the list, you want to check if it's a duplicate of some other item in the list.

Suppose you have a list of Strings:

blueberry
tangerine
apricot
kiwi
durian

and I give you another String:
gorgonzola

How do you check whether it's in the list?
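
For readers following along, one possible answer is a plain loop that compares each element with equals(). This is only a sketch; the class name and isInList are hypothetical:

```java
import java.util.Arrays;
import java.util.List;

class ContainsSketch {
    // Walks the list and compares each element with equals(), not ==.
    static boolean isInList(List<String> list, String candidate) {
        for (String item : list) {
            if (item.equals(candidate)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> fruit = Arrays.asList("blueberry", "tangerine", "apricot", "kiwi", "durian");
        System.out.println(isInList(fruit, "gorgonzola")); // prints false
        System.out.println(isInList(fruit, "kiwi"));       // prints true
    }
}
```

List.contains performs the same equals()-based search, so fruit.contains("gorgonzola") gives the same result.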

jon.kiparsky
 

hello again,

Thanks for replying to this thread. I've already figured out how to do it... :)

r0n
 

Each time you read an element from the txt file, you start a loop and compare it to all the elements read until that point.

You probably also have to check that the place where you dump the duplicates stays free of duplicates itself. If your data.txt is like this:

AAA
BBB
BBB
BBB

Your duplicates file will probably look like this:
BBB
BBB

...
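
A sketch of that extra check, assuming each repeated value should appear only once in the output. The two lists and the method name distinctDuplicates are illustrative, not from the thread:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DistinctDuplicatesSketch {
    static List<String> distinctDuplicates(List<String> lines) {
        List<String> seen = new ArrayList<>();
        List<String> duplicates = new ArrayList<>();
        for (String line : lines) {
            if (!seen.contains(line)) {
                seen.add(line);               // first time this line shows up
            } else if (!duplicates.contains(line)) {
                duplicates.add(line);         // first repeat: record it once
            }
            // later repeats are already recorded, so they are skipped
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("AAA", "BBB", "BBB", "BBB");
        System.out.println(distinctDuplicates(lines)); // prints [BBB]
    }
}
```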

While writing this in the quick reply window, I realized the thread has 2 pages and saw it's already solved.

Buffalo101
Junior Poster
116 posts since Oct 2009
Reputation Points: 9
Solved Threads: 3
 

Then you can (probably) use some kind of Set, which doesn't allow duplicates, and you can test membership with someSet#contains.
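
As a rough sketch of the Set idea, assuming a HashSet of whole lines (the class and method names are placeholders):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SetContainsSketch {
    // Returns every line that was already in the set when it was read.
    static List<String> repeats(List<String> lines) {
        Set<String> seen = new HashSet<>();
        List<String> repeats = new ArrayList<>();
        for (String line : lines) {
            if (seen.contains(line)) {
                repeats.add(line);
            } else {
                seen.add(line);
            }
        }
        return repeats;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("AAAAA", "BBBBB", "BBBBB", "CCCCC");
        System.out.println(repeats(lines)); // prints [BBBBB]
    }
}
```

Compared with searching an ArrayList, a HashSet lookup does not have to walk through every line read so far.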

mKorbel
Veteran Poster
1,141 posts since Feb 2011
Reputation Points: 480
Solved Threads: 224
 

Use a Hashtable to store records as they come in. Looking up whether an element is in a Hashtable is O(1) on average. If hashtable.containsKey(record) returns true, the record is a duplicate, so print it to the file :)
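
A sketch of how that could be wired up end to end, under a few assumptions: the record is the whole line, the Hashtable value is just a dummy Boolean, and the file names in main are placeholders.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Hashtable;
import java.util.List;

class HashtableDuplicatesSketch {
    // Copies every line that has been seen before from 'input' to 'output'.
    static void writeDuplicates(Path input, Path output) throws IOException {
        Hashtable<String, Boolean> seen = new Hashtable<>();
        List<String> lines = Files.readAllLines(input);
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(output))) {
            for (String record : lines) {
                if (seen.containsKey(record)) {
                    out.println(record);            // already stored: it is a duplicate
                } else {
                    seen.put(record, Boolean.TRUE); // first occurrence: remember it
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Both file names are placeholders.
        writeDuplicates(Paths.get("data.txt"), Paths.get("duplicates.txt"));
    }
}
```

Only the keys matter here, so a HashSet would serve equally well.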

apolishch
Newbie Poster
5 posts since Jan 2006
Reputation Points: 10
Solved Threads: 1
 

I've already figured it out, thanks again! :)

r0n
 

Assuming we have already declared arrayList and finalFile, and after adding the lines to an ArrayList<> (which I think you can do):

Sort them:
Collections.sort(arrayList);
Then loop through the list with a for loop and compare the values at (i) and (i + 1), writing the matches to finalFile.
// assuming the values are numbers
for(int i = 0; i

gALENA
Newbie Poster
7 posts since Aug 2009
Reputation Points: 10
Solved Threads: 1
 

r0n, do you remember that Disney cartoon of the Sorcerer's Apprentice? The one where Mickey Mouse conjures up an endless stream of animated brooms?

I don't know why that came to mind, but maybe you should mark this thread as "closed".

jon.kiparsky
 

Oh, I forgot to close it.. thanks anyway :)

r0n
 

This question has already been solved
