Hi friends,
I am new to SQL.

I have a table with 200 rows and 100 columns.

I want to extract the unique rows from this table such that no two rows are more than 60% similar. The 60% is a variable that the user will define.
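The post doesn't say how "similar" is measured. Here is a minimal sketch that assumes it means the fraction of columns in which two rows hold the same value. The function name `row_similarity` is hypothetical, and so is the measure itself, so swap in your own rule if your Excel sheet does something different.

```python
from typing import Hashable, Sequence


def row_similarity(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Return the fraction of columns in which rows a and b hold equal values.

    Assumption: "similar" means column-by-column equality. The post does
    not define the measure, so replace this if your rule differs.
    """
    if len(a) != len(b) or not a:
        raise ValueError("rows must be non-empty and have the same number of columns")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


# Two 5-column rows that agree in 3 columns are exactly 60% similar,
# so with a 0.6 threshold they do NOT count as "more than 60% similar".
print(row_similarity([1, 2, 3, 4, 5], [1, 2, 0, 4, 0]))  # 0.6
```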

I have attached an example file with the raw data. I have done this in Excel, and my logic is as follows:

First I compare row 2 with row 1 and work out how similar they are. If the similarity is above the allowed threshold I discard row 2, otherwise I add it to an array. I then move on to the next row and compare it with every row already in the array: if it is too similar to any of them it is ruled out, otherwise it is added to the array.
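The greedy procedure described above can be sketched like this. It is a sketch only, assuming the same column-equality similarity as before (redefined here so the block runs on its own); `filter_dissimilar` and `row_similarity` are hypothetical names, and the array of kept rows is just a Python list.

```python
from typing import Hashable, List, Sequence

Row = Sequence[Hashable]


def row_similarity(a: Row, b: Row) -> float:
    """Fraction of columns where a and b are equal (assumed measure)."""
    if len(a) != len(b) or not a:
        raise ValueError("rows must be non-empty and have the same number of columns")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def filter_dissimilar(rows: Sequence[Row], threshold: float) -> List[Row]:
    """Greedy pass over the rows in their original order.

    A row is kept only if its similarity to every row kept so far is at
    most `threshold`. The first row is always kept because there is
    nothing to compare it with yet.
    """
    kept: List[Row] = []
    for row in rows:
        if all(row_similarity(row, k) <= threshold for k in kept):
            kept.append(row)
    return kept


rows = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],  # 80% similar to row 1 -> discarded
    [0, 0, 0, 0, 0],  # 0% similar to row 1 -> kept
    [1, 0, 0, 0, 0],  # 80% similar to row 3 -> discarded
]
print(filter_dissimilar(rows, 0.6))  # [[1, 1, 1, 1, 1], [0, 0, 0, 0, 0]]
```

Note that the result depends on the order in which rows are processed, just as in the Excel approach: sorting the table differently can keep a different set of rows.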

I actually have to perform this test on data containing more than 65000 rows and 200 columns, so I need an SQL query that will make this faster.

Thanks for your help.


Not sure about this one.

Looks like a stored procedure may be required. Or possibly views based on views.

The major problem as I see it is that once you've tested row 1 against all of the others, you then have to repeat the process starting at row 2, then row 3, and so on. This is going to get very messy and very, very slow.
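To put a rough number on "slow": in the greedy pass each row is compared at most once with every earlier row, so the worst case (every row kept) is n(n - 1)/2 row pairs, each needing up to one comparison per column. The sketch below assumes that cost model; `worst_case_pairs` is a hypothetical helper, and real counts are lower when many rows are discarded early.

```python
def worst_case_pairs(n_rows: int) -> int:
    """Upper bound on row-to-row comparisons for the greedy pass.

    Each row is compared at most once with every earlier row, so the total
    is at most n * (n - 1) / 2. The bound is reached only when every row is kept.
    """
    return n_rows * (n_rows - 1) // 2


# Sizes taken from the question: the 200 x 100 sample and the 65000 x 200 real table.
for n_rows, n_cols in [(200, 100), (65000, 200)]:
    pairs = worst_case_pairs(n_rows)
    print(f"{n_rows} rows: {pairs:,} row pairs, {pairs * n_cols:,} cell comparisons")
```

For the 200-row sample this is 19,900 pairs, but for 65000 rows it is 2,112,467,500 pairs (about 422 billion cell comparisons), which is why the approach scales so badly.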

Are you sure that this is a business requirement, or is it just the top of an idle wish list?