Tuesday, March 9, 2010

Number of Threads Parameter for Reference Data loading

Kalido DIW 8.5 has several ways to load the data into the warehouse. We can use Feeds if we want to load only some columns for a particular CBE or we can use file definitions to load from flat files or ODBC.

I was using feeds and was doing a performance testing for reference data loading. Now the machine which we have for our performance testing is 4 times better machine than what we have in DEV environment. So my initial guess was that the data loads would take atleast 50% less time than it takes in DEV environment.

But guess what Kalido loads are not linearly proportional to the Hardware configuration. It i snot always the case if we double the Hardware configuration, the load time would decrease by half. Not true.

So i played with the loads little more and explored some of the parameters Kalido has and with some permutations and combination's of the parameters i was able to decrease the load time, not significantly, but it did decrease.

So here is the list of parameters i used

Number of threads : 2
The interesting thing about this parameter, the pre-processing time is reduced, but the compound processing still takes the same time. So this parameter is only useful in the initial stages when the data is preprocessed for loading and Delta Detection

Staging Table Update Method: IGNORE
Now Kalido updates the staging table stating that the record was loaded successfully or the record was rejected and specifying the reason why it was rejected. This functionality of Kalido is coool, but it is an overhead on the application to write back to the tables and it utilizes more time.

So there is another parameter called IGNORE, in which we can tell Kalido DIW ignore any updates and load the data in the warehouse and dont bother about the staging table.
If we use this parameter, the overhead is reduced and for the rejected record we have to go to the STAGINGEXCEPTION table , which is okay to find the reason if the record was rejected.

And then finally COMMIT Size was set to 20000 in the reference data load. If we try to increase the COMMIT size beyond 20000 we ran into memory errors when we had huge huge amounts of memory. So just beware if you try to increase the commit size. My personal recommendations is to set at 20000

After using all the three parameters, i was able to get a improvement in performance but i would not say it was because of the hardware was 4 times better. But using the above parameters definitely helps and i would always recommend using the above parameters for reference data loading (Number of Threads, Staging table update Method, Commit size)

The interesting part is the Number of threads parameter is only for reference data loading and there is no such parameter for Transaction data loading. I have no idea why this parameter is not for Transaction Data loading.

If you know why we dont have this parameter for Fact data loads, please let me know?

As usual sharing my learning experiences, would be back with more knowledge sharing...

Till then take care...

Nilesh

No comments:

Post a Comment