SQL Query required to extract unique rows based on a relevancy percentage

Question

c.vaibhav 0 Newbie Poster

15 Years Ago

Hi friends,
I am new to SQL.

I have a table with 200 rows and 100 columns.

I want to extract unique rows from this table such that no two rows are more than 60% similar. The 60% is a variable and would be user defined.

I have attached an example file with the raw data. I have done it in Excel, and my logic goes as follows:

First I compare Row 2 with Row 1 and find out its relevancy. If it is more than what is desired, I discard the row; otherwise I add it to an array. I then move on to the next row and compare it with all the rows present in the array. If the current row is too similar to any row in the array, it is ruled out; otherwise it is added to the array.
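The logic above can be sketched in a few lines of Python before attempting it in SQL. This is only a rough sketch, not a definitive implementation: the post does not say how "relevancy" is calculated, so the `similarity` function below assumes it is the fraction of columns in which two rows hold the same value. The names `similarity` and `filter_rows` and the sample rows are made up for illustration.

```python
from typing import List, Sequence


def similarity(a: Sequence, b: Sequence) -> float:
    """Fraction of column positions where both rows hold equal values.

    ASSUMPTION: the original post does not define "relevancy";
    column-by-column equality is used here as a stand-in.
    """
    if len(a) != len(b):
        raise ValueError("rows must have the same number of columns")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def filter_rows(rows: Sequence[Sequence], threshold: float = 0.6) -> List[Sequence]:
    """Keep a row only if it is not more than `threshold` similar
    to any row that has already been kept (the "array" in the post)."""
    kept: List[Sequence] = []
    for row in rows:
        if all(similarity(row, k) <= threshold for k in kept):
            kept.append(row)
    return kept


if __name__ == "__main__":
    sample = [
        ("a", "b", "c", "d", "e"),  # first row is always kept
        ("a", "b", "c", "d", "x"),  # 4 of 5 columns match row 1: discarded
        ("a", "b", "y", "z", "w"),  # 2 of 5 columns match row 1: kept
        ("p", "q", "r", "s", "t"),  # nothing matches: kept
    ]
    for row in filter_rows(sample, threshold=0.6):
        print(row)
```

Note that the result depends on row order: a row is only ever compared against rows that were kept before it, so sorting the table differently can produce a different set of "unique" rows.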

I actually have to perform this test on a data set containing more than 65000 rows and 200 columns, so I need an SQL query that will make my work faster.

Thanks for your help.

Regards,
Vaibhav

mssql sql


full-raw-listing.zip (60.47 KB)


pclfw 23 Junior Poster · Answer 1 · 2010-04-07T20:46:35+00:00

Not sure about this one.

Looks like a stored procedure may be required. Or possibly views based on views.

The major problem as I see it is that once you've tested row 1 against all of the others you now have to do the process again starting at row 2, then row 3, and so on. This is going to get very messy and vvvvveeeeeeerrrrrryyyyyyy ssssssssssllllllllllooooooooooowwwwwwwww.
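To put a rough number on "slow", here is a small Python sketch that counts the comparisons in the worst case, where every row is tested against every other row and nothing is discarded early. The function name `worst_case_comparisons` is made up for illustration, and the figures assume each row-to-row comparison looks at every column once. The row and column counts are the ones given in the question.

```python
from typing import Tuple


def worst_case_comparisons(n_rows: int, n_cols: int) -> Tuple[int, int]:
    """Return (row pairs, cell comparisons) for an all-pairs check.

    ASSUMPTION: no row is discarded early and each pair is compared
    across every column, so this is an upper bound, not a measurement.
    """
    pairs = n_rows * (n_rows - 1) // 2  # each unordered pair counted once
    return pairs, pairs * n_cols


if __name__ == "__main__":
    # Sample table from the question: 200 rows, 100 columns
    print(worst_case_comparisons(200, 100))      # (19900, 1990000)
    # Full data set from the question: 65000 rows, 200 columns
    print(worst_case_comparisons(65000, 200))    # (2112467500, 422493500000)
```

So the full data set could need about 2.1 billion row pairs in the worst case. The real figure may be lower, because rows that get discarded are never compared against later rows.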

Are you sure that this is a business requirement, or is it just at the top of an idle wish list?