0

Hi all,

I have a csv that has over 800,000 records that gets loaded into my db via SSIS package routines. However, there are about 190 records that I do not want to be loaded into the the database. The csv comes from an external source and is unaltered at package runtime. In order to not load these records, I made a table to control load variables but I'm not sure how to use it nor do I know if it is the right approach. Basically my thoughts are:`

 if a record identifier from the csv file matches the record identifier from the Exclusion table, 
 exclude it, 
 else load db. 

If someone can set me straight on a proper approach I would truly appreciate it.

2
Contributors
6
Replies
9
Views
4 Years
Discussion Span
Last Post by BitBlt
1

There are several approaches you could take. It all depends if you want to exclude the rows before the load. Given the small proportion of rows to exclude, I would suggest you just go ahead and load the entire CSV into the database, then as a next step, join that table to your exclusion list and delete.

The statement might look something like this:

DELETE x
FROM dbo.my_Loaded_CSV_Data_Table x
INNER JOIN dbo.my_Exclusion_Table y
on x.myTableId = y.excludeId

That way you don't take the performance hit of doing a lookup inside the SSIS package.

Votes + Comments
Awesome suggestion, thank you so much!
0

Thank you so much BitBit. I'm going to give your suggestion a go and see the logic of the statement. I will be forever learning and I am so glad that there are sites like this to aid one's learning.

0

Hi BitBit, I know this is just a staging table but I am not wanting to delete rows that shouldn't be deleted. I've used the following to select the rows in the staging table that shouldn't be there

select  x.vSeries_Geography,x.vSeries_Type
from dbo.tblCPIStaging x
right outer join dbo.tblVSeries_Exclusion y
on x.vSeries_Geography=y.Exclusion_GEO
and x.vSeries_Type = ltrim(rtrim(y.Exclusion_Type))

Does this seem right? If I understand my revision, I am selecting all geography and type records from tblCPIStaging that match the geography and type records from tblVSeries_Exclusion. If I use an inner join, it seems that the same number of rows are selected as well.

So, do you suppose the following would work right?

DELETE x
FROM dbo.tblCPIStaging x
INNER JOIN d bo.tblVSeries_Exclusion y
ON  x.vSeries_Geography=y.Exclusion_GEO
AND x.vSeries_Type = ltrim(rtrim(y.Exclusion_Type))
1

Yes, you did great! The good news is that since it was a staging table, even if you'd completely trashed it, you could still just reload from the original source.

This question has already been answered. Start a new discussion instead.
Have something to contribute to this discussion? Please be thoughtful, detailed and courteous, and be sure to adhere to our posting rules.