0

Hi all,

This is going to be difficult for me to describe but I'm going to give it a go!

I have tables called tblCPI, tblCPI_Staging, & tblVSeriesList. Please see the diagram.
Diagram

Right now I am using SSIS 2008 to get data from Statistics Canada and load the staging table. I need to do a couple of things though. First, I need a statement that fills the vSeries_ID in the tblCPIStaging table with the vSeries_ID from the tblVSeriesList where vSeries_Geography = vSeries_Geography and vSeries_Type = vSeries_Type from both tables. I have been playing around with select statements and nested select statements but to no avail. My understanding of SQL syntax is decent but not good enough to figure this out.

From there, I need the tblCPISC to be filled accordingly from the staging table. Again, I don't really know how I should approach this.

Can someone swing their fist at me in the appropriate direction to knock some sense into me? I would really appreciate it!

2
Contributors
13
Replies
14
Views
5 Years
Discussion Span
Last Post by Stuugie
Featured Replies
  • 1

    Seems that on the surface your problem doesn't sound like it should be giving you difficulties. Questions: 1. Is the combination of vSeries_Geography and vSeries_Type from the staging table enough to uniquely identify one row in the table tblVseriesList? 2. Is there a special reason you have created referential integrity … Read More

  • 1

    To begin with, in order to compare values between the tables in a single select, you have to join them. So, perhaps something like this would be better: SELECT a.vSeries_ID FROM dbo.tblCPIStaging a INNER JOIN dbo.tblVSeriesList b ON a.vSeries_Geography = b.vSeries_Geography AND a.vSeries_Type = b.vSeries_Type If this select works, it's … Read More

  • 1

    I usually use something like: LTRIM(RTRIM(myStringColumn)) to take care of both leading and trailing spaces. You can look for unprintable characters by looking at the LENGTH(myStringColumn) function. Here's a little "science experiment" script you can use to demonstrate how this works: declare @myVarchar varchar(25) declare @myNVarchar nVarchar(25) select @myVarchar = … Read More

1

Seems that on the surface your problem doesn't sound like it should be giving you difficulties. Questions:
1. Is the combination of vSeries_Geography and vSeries_Type from the staging table enough to uniquely identify one row in the table tblVseriesList?
2. Is there a special reason you have created referential integrity between the two tables? Usually for a staging table it's better left off so you don't run into "bad data" issues and performance issues.
3. What have you already tried? Please post.
4. What error conditions or other unwanted results are you seeing?
5. What do you expect your final result to look like? Some sample data would be very helpful here.

Also, a narrative of how your load goes would be good (as in, what SSIS transformations are you using, and in what order). Which table starts out empty, what order do your steps progress, etc.

Edited by BitBlt

0

Thanks for the reply BitBit,
1. Yes, they are enough to uniquely identify one row in the tblVSeriesList
2. There's no special reason other than I thought it may be appropriate. I will reverse the idea.
3. I have only tried to retrieve the row with the combination from question 1 without success.
4. The unwanted results are returns with no values. I think it is the way I am trying to retrieve the values in the where clause. My last attempt looked similar to:

SELECT vSeries_ID 
FROM tblVSeriesList
WHERE (dbo.tblCPIStaging.vSeries_Geography = dbo.tblVSeriesList.vSeries_Geography AND dbo.tblCPIStaging.vSeries_Type = 
dbo.tblVSeriesList.vSeries_Type)

I know this is wrong but I am confused by why!
5. I expect to be able to load the vSeries_ID, vSeries_Value, & vSeries_Ref_Date into the table tblCPISC.

As for a narrative,
1. I query a table called tblRelease_Dates to determine if there is an update from the source.
2. If yes, download a zipped file from the source
3. Unzip it
4. Delete the zip file
5. Use a script task to figure out which table is being updated
6. Once it has been figured out, run a Data Flow Task that connects to the flat file (csv) and load the staging table via OLE DB Destination Editor
7. Delete the csv

SSISLoad

Data_Flow_CPI

1

To begin with, in order to compare values between the tables in a single select, you have to join them. So, perhaps something like this would be better:

SELECT a.vSeries_ID 
FROM dbo.tblCPIStaging a
INNER JOIN dbo.tblVSeriesList b
ON a.vSeries_Geography = b.vSeries_Geography 
AND a.vSeries_Type = b.vSeries_Type

If this select works, it's simple to rearrange a little and turn it into an update statement:

UPDATE a
SET a.vSeries_ID = b.vSeries_ID
FROM dbo.tblCPIStaging a
INNER JOIN dbo.tblVSeriesList b
ON a.vSeries_Geography = b.vSeries_Geography 
AND a.vSeries_Type = b.vSeries_Type

If it doesn't work properly, one thing that jumps out at me is that the datatypes and lengths of the columns you compare are different...for example, vSeriesGeography on one table is varchar(100) and the other nvarchar(50). Are there perhaps unprintable characters or appended spaces? Unicode that didn't translate well? You may wish to include a conversion or scrubbing transformation in there somewhere.

Anyway, the other caveat is that if any of your staging data doesn't match up with the "decode" data in tblVSeriesList, then you'll have staging data that doesn't get completely populated. You'll want to test for that further downstream in your SSIS package.

Hope this helps! Good luck!

Edited by BitBlt

0

Alright, I fixed the datatypes, I meant to do that already but I was having truncation errors before and for some reason, changing the datatypes in the staging table was suggested and it worked. That aside, if I try:

SELECT a.vSeries_ID
FROM dbo.tblVSeriesList AS a
INNER JOIN
dbo.tblCPIStaging AS b ON
a.vSeries_Geography = b.vSeries_Geography

I get vSeries_ID values returned. If I try:

SELECT a.vSeries_ID
FROM dbo.tblVSeriesList AS a
INNER JOIN
dbo.tblCPIStaging AS b ON
a.vSeries_Type = b.vSeries_Type

I get vSeries_ID values returned. If I try:

SELECT a.vSeries_ID
FROM dbo.tblVSeriesList AS a
INNER JOIN
dbo.tblCPIStaging AS b ON
a.vSeries_Geography = b.vSeries_Geography AND
a.vSeries_Type = b.vSeries_Type

I do not get any values returned.

0

Correction: some of the vSeries_Type values have leading whitespaces which I think is what is causing the problems. How do I deal with such things? I though SQL Server ignored leading whitespaces?

0

Coolness! This does the trick for your first select statement:
`

select a.vSeries_ID
from dbo.tblVSeriesList as a
inner join
dbo.tblCPIStaging b
on b.vSeries_Type = LTRIM(a.vSeries_Type)
and b.vSeries_Geography=a.vSeries_Geography

`

1

I usually use something like: LTRIM(RTRIM(myStringColumn)) to take care of both leading and trailing spaces. You can look for unprintable characters by looking at the LENGTH(myStringColumn) function. Here's a little "science experiment" script you can use to demonstrate how this works:

declare @myVarchar varchar(25)
declare @myNVarchar nVarchar(25)

select @myVarchar = 'No Trailing Stuff',
@myNVarchar = 'No Trailing Stuff'

select @myVarchar, LEN(@myVarchar), @myNVarchar, LEN(@myNVarchar)

select @myVarchar = 'Trailing Stuff   ',
@myNVarchar = 'Trailing Stuff   '

select @myVarchar, LEN(@myVarchar), ltrim(rtrim(@myNVarchar)), LEN(ltrim(rtrim(@myNVarchar)))

select @myVarchar = 'No Trailing Stuff',
@myNVarchar = 'No Trailing Stuff' + CHAR(11)

select @myVarchar, LEN(@myVarchar), ltrim(rtrim(@myNVarchar)), LEN(ltrim(rtrim(@myNVarchar)))

Oh, and if you got your problem working, please be sure to mark this thread "Solved". Best of luck!

0

It's almost there. Now I need to update the CPISC table with just the vSeries_ID, Variable_Value, & Variable_Date. Working on that right now. Oh and thanks for the above suggestion.

0

Alright, I'm at the final stages of this.

In tblCPIStaging, the vSeries_ID is set to allow nulls based on the fact that the csv file does not contain the vSeries_ID numbers, they are located and loaded with the update query:

UPDATE a
SET a.vSeries_ID = b.vSeries_ID
FROM dbo.tblCPIStaging a
INNER JOIN dbo.tblVSeriesList b
ON a.vSeries_Geography = b.vSeries_Geography 
AND a.vSeries_Type = ltrim(rtrim(b.vSeries_Type))

No problem.

From there, we use decided to go with the Insert Into query:

UPDATE a
SET a.vSeries_ID = b.vSeries_ID
FROM dbo.tblCPIStaging a
INNER JOIN dbo.tblVSeriesList b
ON a.vSeries_Geography = b.vSeries_Geography 
AND a.vSeries_Type = ltrim(b.vSeries_Type)

No problem.

However, when I use:

insert into tblCPISC(vSeries_ID,Variable_Value,Variable_Date)
select l.vSeries_ID,l.vSeries_Value,l.vSeries_Ref_Date 
from tblCPIStaging as l

The tblCPISC’s vSeries_ID cannot accept nulls because of its join with the tblVSeriesList’s vSeries_ID. I’ve been trying to work around this issue but I keep getting null errors even though the data set does not contain any nulls at this stage.

Am I making sense here and if so, what are my options regarding null properties?

Edited by Stuugie

0

I recreated the tblCPISC to allow nulls and now the procedure works for the whole ETL process. :)

0

Thanks again BitBit, you helped me so much with this and also with my understanding of SQL statements!

0

Yikes, my problem with the nulls is actually from the table tblVSeriesList. The data I filled into it is missing data that the staging table has and therefore the nulls happen. I have to refill tblVSeriesList to exactly match the source's geography and types. So the procedure I am working on is right but the data isn't. No problems though, all this means to me is a little more time.

This question has already been answered. Start a new discussion instead.
Have something to contribute to this discussion? Please be thoughtful, detailed and courteous, and be sure to adhere to our posting rules.