Forum

 
ForumForumDiscussionsDiscussionsSQLBI Methodolo...SQLBI Methodolo...utilizing views and lookups - your experienceutilizing views and lookups - your experience
Previous Previous
 
Next Next
New Post
 6/2/2009 9:30 PM
 

I've been using views since day 1 to source the integration and my cubes , and use it as a decouple mechanism (like Ansi-Sparc prescribes) and I think it's nice many think the same.

But when using SSIS and while sourcing the packages with the views I noticed a very strange and , dangerous, behaviour:

Sometimes they issue false surrogate keys and issue a -1 (unknown) value while an existing SK value should be issued

Further detail:

- the view is a straight select statement

- within the views I use surrogate lookup functions , for instance a snippet of a view which shows it

SELECT

    etl.fn_lookup_SK_Organisation (@BK_Organisation, @startdate, @enddate), the variables are replaced by concrete attributes of the staging table.

I prefer surrogate lookups within the views, using functions, because this way I only need to maintain the surrogate lookup at one place. When using the SSIS lookup you will find yourself changing tons of packages once a lookup strategy, of a popular used dimension, is being changed.

- What is your favourite key lookup strategy?

- did you experience false key issuing?

(note : i'm not using materialized views)

 

New Post
 6/3/2009 6:23 PM
 

Your approach is interesting, because it centralizes the logic of SCD surrogate key resolution. However, you could obtain a similar result by using stored procedures in SSIS lookup components, but probably it would affect performance in a bad way. At that point, the problem is that you will end having a distributed logic between SSIS and SQL views and it could be a bad idea (having SQLBI layers in mind, where we use SQL views only to decouple layers and not to put transformation logic into).

Talking about false key issues: you should check your lookup function, maybe there is some issue there... for example, could it be a race condition running several processes in parallel (for example, dimension still not ready with dimension member when fact row referencing it has to be loaded?)

Marco

 

New Post
 6/4/2009 11:28 PM
 
 Modified By johan  on 6/4/2009 11:44:18 PM

Thanks for the quick response.

We're not facing LAD's (lately arriving dimensions) with the false keys, actually these dimensions which are  incorrect issued with -1 (which is very rarely the case within 1 specific package) are days to months old and were in the DWH before the facts came in. When running the view, without the package, it neatly shows the surrogate key with the correct value. When actually rerunning the package again it works well so i blame it to "stress". I ever heard about 'processor floating points' issues which are capable to corrupt calculations but in such case it is about 10th digit decimals and not such a simple thing like key lookup within a function which  I would think of 100% trusted computable under any stress condition.

Another strategy I use is putting the SK_ (surrogates) in the staging tables and fill them using stored procedures which are initiated by the ETL package in case of large datasets, prior to the actual loading of the target table.

Of course you have a theoretical point in not putting transformation into the view (which I do by calling the T-SQL function), but a centralized repository of lookups being built, factually doesn't hurt the decoupling strategy (anybody could think of a scenario where it does? I would love to hear). In a performance sense I don't see a problem too, because the SSIS lookup also needs to apply full table scans to resolve the key. At the end I end up with simple lookup maintenance rather than divergent similar functionality which needs multiple editing when situations change.

 

 

 

 

 

New Post
 6/5/2009 12:41 AM
 

The failure of surrogate key lookup appears really strange - if it is the result of a stress condition, it should be investigated more.

Do you use SQL2005 or SQL2008?

Did you tried to disable parallelism using OPTION (MAXDOP 1) hint?

New Post
 6/5/2009 10:05 AM
 
 Modified By johan  on 6/5/2009 10:06:49 AM

we're using 2005.

I actually wanted to change that specific package by using SSIS lookups but I will try parallelism first, thanks for the hint I will report over here whether this resolves the problem.

 

Previous Previous
 
Next Next
ForumForumDiscussionsDiscussionsSQLBI Methodolo...SQLBI Methodolo...utilizing views and lookups - your experienceutilizing views and lookups - your experience