I think that, if possible, the data could be read in chunks of, say, 2K rows. Each time a chunk is read, first look for duplicates inside the chunk itself, and then compare it against the other 2M − 2K rows. If a row is present both in the 2K chunk and in the remaining 1,998,000 rows, discard it from the chunk. Supposing N duplicates were removed, write the 2K − N unique rows left in the chunk to a text file, formatted as CSV for example. Continue with the next chunk of 2K rows in the same manner, appending its unique emails to the file started with the first chunk. In total, 1,000 chunks would have to be processed and examined by the regex (1,000 × 2,000 = 2M). A rough sketch of the idea is below.
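Here is a minimal Python sketch of that chunked approach, assuming the 2M rows sit in a plain text file with one email per line; the file names (emails.txt, unique_emails.csv) and the read_chunks/deduplicate helpers are placeholders of my own, not anything from the original code. One simplification: instead of re-scanning the remaining 1,998,000 rows for every chunk, it keeps a set of emails already written, which gives the same de-duplicated output in a single pass over the file.

import csv

CHUNK_SIZE = 2000                   # 2K rows per chunk, as described above
INPUT_PATH = "emails.txt"           # hypothetical input: one email per line
OUTPUT_PATH = "unique_emails.csv"   # hypothetical output: one-column CSV

def read_chunks(path, size):
    """Yield lists of at most `size` non-empty, stripped lines from the input file."""
    chunk = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            email = line.strip()
            if email:
                chunk.append(email)
            if len(chunk) == size:
                yield chunk
                chunk = []
    if chunk:
        yield chunk

def deduplicate(in_path=INPUT_PATH, out_path=OUTPUT_PATH):
    # Emails already written out; stands in for comparing each chunk
    # against the rest of the 2M rows.
    seen = set()
    with open(out_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        for chunk in read_chunks(in_path, CHUNK_SIZE):
            # dict.fromkeys removes duplicates inside the chunk while keeping order;
            # the `seen` check then drops anything an earlier chunk already emitted.
            for email in dict.fromkeys(chunk):
                if email not in seen:
                    seen.add(email)
                    writer.writerow([email])

if __name__ == "__main__":
    deduplicate()

Any per-row regex validation would slot in right before the `seen` check, so each of the 2M rows is only matched once.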