Cloud computing is booming lately, and more and more customers prefer to store their data in the cloud. Google is the biggest example of cloud computing. I personally like the cloud; from any part of the world I can access my data, provided I have an internet connection.
Recently I came across a technology where all the dashboards and executive reports are delivered as a service, and the underlying architecture is the cloud.
The application I am talking about is GoodData. It delivers a complete business intelligence platform-as-a-service (BI PaaS) that brings the benefits of cloud computing to the world of business intelligence and data warehousing. Built as a complete integrated platform and offered as a service, GoodData delivers on the fundamental promise of the cloud: on-demand, self-service to deploy and use, and easily adaptable to business requirements (www.gooddata.com). It also provides open APIs which any developer can pick up and customize to their business requirements.
This got me thinking along two lines.
Thought #1: The business analytics industry is not mature yet, and a recent survey by Deloitte Consulting confirmed it. The results showed that only 33% of US firms are using dashboards and executive reports, which is a staggering shock to me. In spite of the many advancements in the business intelligence and data warehousing world, 67% of firms have still not been able to implement BI/DW technology. (http://www.cioupdate.com/news/article.php/3900476/Deloitte---33-of-Firms-Missing-Out-on-Business-Analytics.htm)
And on the flip side, my other thought is:
Thought #2: Would cloud computing help those 67% of firms build the right business technology? Does cloud computing alleviate the challenges firms face while building an in-house business intelligence platform?
I believe cloud computing still has a long way to go. And since my expertise is Master Data Management, I would be really happy to come across a technology where I can build master hierarchies and master data using the cloud.
That would be so cool: Master Data Management as a Service!
As always, I am eager to hear my readers' experiences; any comments or suggestions are welcome.
Talk to you soon!!
Nilesh
Tuesday, September 28, 2010
Monday, July 26, 2010
Do frequent upgrades of a product/software affect the end users?
Recently I came across this question: "Do frequent upgrades of a product/software affect the end users?"
http://www.linkedin.com/groupItem?view=&gid=45685&type=member&item=25553748&qid=d6f850cb-3794-4a19-86ec-722190e40d61&goback=.gmp_45685
I would like to share my thoughts on this one.
Based on several years of experience in consulting, I have participated in four or five upgrades of various products and software. In all of those engagements I would say the upgrade takes a significant amount of resources, time, and money. It also depends on why a particular piece of software is being upgraded.
Is it because:
1) A particular bug the client has been waiting on is fixed? This covers the majority of upgrade engagements.
2) New features have been introduced in the new version, so the client is upgrading?
3) Support has been extended to other platforms? For example, a tool that initially supported only Oracle and SQL Server databases now supports MySQL or Netezza in the new version.
So every customer may have different reasons for an upgrade project, and it is always a process everyone has to go through. The best practice I like to follow is: upgrade the software every six or twelve months, depending on the requirements. Twelve months if the requirements are not pressing; six months if there is a feature that would clearly benefit IT.
To summarize:
The pros I have seen are that end users get new features in a product and support is extended to a wider breed of platforms. The cons are time, money, and resources, and above all the confidence of the end users, whose buy-in is often difficult to get.
And to answer the question: yes, it does affect the end users. They have to be made aware of the new features in the tool, and a training process has to follow so that users get familiar with the new version. In my experience a minor version does not change the product inside and out the way a major version can. So it is case by case; requirements and scope can be different in every implementation.
What have been your experiences? Do you have any best practices or recommendations that should be followed during upgrade engagements?
Nilesh Makhija
Wednesday, June 2, 2010
Toad 9.0 with Oracle 11g
Here are my two cents on connectivity issues between Toad and Oracle 11g...
Everyone knows Toad; it is the most widely used tool for connecting to Oracle databases. We use Toad all the time to connect to our Oracle 10g databases. But recently we upgraded one of our environments to Oracle 11g, and Toad stopped working. (Don't we run into such issues all the time?)
Basically this is the issue:
I tried connecting to Oracle 11g using Toad 9.1 and it popped an error saying invalid username/password. So I tried SQL*Plus, Oracle's native connection, expecting the same error, but guess what: SQL*Plus connected.
So Toad was unable to connect while SQL*Plus could. Then I installed SQL Developer, and surprisingly it also connected using the same tnsnames.ora file. So there was something going on with Toad, and I needed to find out what.
Here is what it turned out to be:
Oracle 11g has a parameter called SEC_CASE_SENSITIVE_LOGON, which controls whether password case sensitivity is enforced at logon.
Keep in mind this parameter was introduced only in Oracle 11g; no previous version of Oracle had a parameter for password case sensitivity.
If sec_case_sensitive_logon = TRUE, password case matters;
if sec_case_sensitive_logon = FALSE, password case does not matter.
Now the issue on the Toad side: Toad converts all passwords to uppercase (which is a bug in Toad 9.0). The workaround: when connecting with Toad, enter the username but leave the password field blank. Toad will prompt a dialog box asking for the password; enter the password there, and there you go, you are successfully connected to Oracle 11g.
The other solution is to change sec_case_sensitive_logon to FALSE, using the following statement:
alter system set sec_case_sensitive_logon = FALSE;
Just sharing my knowledge... as I learn.
Nilesh Makhija
Multiple Data Feeds in Kalido 8.5
As we have seen in most data warehousing projects, we try to run reference data loads in parallel and increase the number of threads to maximize the resources allocated to the database server. Kalido can also run parallel loads, but with one caveat: we cannot run parallel loads against the same object in the Kalido warehouse. Meaning, if we have a GL Account CBE seeded from two source systems, SRC1 and SRC2, we cannot run the loads from the two source systems at the same time. They can certainly run back to back, but not in parallel with each other. The reason is that Kalido locks the object, and only one load can update it (in our example, the GL Account CBE). If we try to run another load while one is in progress, the second load errors out. Consider yourself lucky: Kalido generates a log file readable by a human being, with the error message "Another Load is in Progress".
So if we have feeds from various source systems loading data into the same CBE, they should be executed sequentially; a dependency has to be built so that when one load completes, it triggers the next load on the same object. We can of course execute feeds for different objects in parallel, so parallelism is achievable, just not on the same object. We can also increase the number of threads while loading the data, but not beyond 4 (loads start degrading after 4).
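The sequencing rule above can be sketched with a simple per-object lock. This is a hypothetical scheduler in plain Python, not Kalido's API: feeds for different objects run in parallel, while feeds for the same object (our GL Account CBE) queue behind one another.

```python
import threading

# One lock per warehouse object: loads on different objects may overlap,
# but loads on the same object are forced to run back to back.
object_locks = {"GL_Account": threading.Lock(), "Cost_Center": threading.Lock()}
results = []

def run_load(obj, source):
    """Hypothetical load runner; the real Kalido load would sit inside the lock."""
    with object_locks[obj]:
        results.append((obj, source))  # placeholder for the actual data load

threads = [
    threading.Thread(target=run_load, args=("GL_Account", "SRC1")),
    threading.Thread(target=run_load, args=("GL_Account", "SRC2")),
    threading.Thread(target=run_load, args=("Cost_Center", "SRC1")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All three loads finish, but the two GL_Account feeds never ran concurrently.
print(sorted(results))
```

The same idea applies whatever the scheduler: the dependency "one load on an object must complete before the next starts" is exactly a lock keyed by object.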
So the thought for the day is: never, ever run multiple data feeds on the same object in Kalido at the same time.
As usual, enjoy working with this great tool...
Nilesh Makhija
Friday, April 9, 2010
Oracle V/s Kalido
Hello,
There is an interesting link I came across which compares implementing a warehouse using the Oracle suite versus implementing it using the Kalido tool, with Oracle still as the back-end database.
In the latter case Kalido functions as the warehouse tool, automating much of the manual work and also covering some of the ETL functionality.
Here is the link; let me know your feedback.
Oracle V/s Kalido
Thursday, April 1, 2010
Cube V/s Data Marts
Cubes and Data marts
I am sure all of you who have worked in a warehousing practice have at least heard these words, and many of you have actually implemented these concepts: a cube and a data mart.
Recently I was discussing with my peers in the BI/warehousing practice what the differences are between cubes and data marts, and to my surprise the discussion was not very productive; we digressed into terms like OLAP and ROLAP, as no one was clear which way to go for a particular scenario.
So basically I am asking two questions:
1) What are the differences between a cube and a data mart? I've had folks tell me that a data mart is more than just a collection of cubes; I've also had people tell me that a data mart is a reporting cube, nothing more...
2) In what scenario would you choose a cube over a data mart, and vice versa?
Waiting for your inputs...
--Nilesh
Tuesday, March 9, 2010
Number of Threads Parameter for Reference Data loading
Kalido DIW 8.5 has several ways to load data into the warehouse. We can use feeds if we want to load only some columns for a particular CBE, or we can use file definitions to load from flat files or ODBC.
I was using feeds and doing performance testing for reference data loading. The machine we have for performance testing is four times better than the one in the DEV environment, so my initial guess was that the data loads would take at least 50% less time than they do in DEV.
But guess what: Kalido load times are not linearly proportional to the hardware configuration. It is not always the case that doubling the hardware halves the load time.
So I played with the loads a little more and explored some of the parameters Kalido has, and with some permutations and combinations of the parameters I was able to decrease the load time; not dramatically, but it did decrease.
Here is the list of parameters I used:
Number of threads: 2
The interesting thing about this parameter is that the pre-processing time is reduced, but the compound processing still takes the same time. So it only helps in the initial stages, when the data is preprocessed for loading and delta detection.
Staging table update method: IGNORE
Normally Kalido writes back to the staging table stating whether each record loaded successfully or was rejected, along with the reason for rejection. This functionality is cool, but the write-back is an overhead on the application and costs time.
With the IGNORE setting we tell Kalido DIW to skip those updates and just load the data into the warehouse, without bothering with the staging table.
The overhead is reduced, and for rejected records we can go to the STAGINGEXCEPTION table instead, which is fine for finding the reason a record was rejected.
And finally, the COMMIT size was set to 20000 for the reference data load. When we tried to increase the commit size beyond 20000, we ran into memory errors even with huge amounts of memory. So beware if you try to increase the commit size; my personal recommendation is to set it at 20000.
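The commit-size trade-off itself is generic: larger batches mean fewer commits but more memory held per transaction. A minimal sketch of the batching idea in plain Python (this is not Kalido's loader; 20000 is simply the value that worked for us):

```python
COMMIT_SIZE = 20000  # the value that worked best in our reference data loads

def commit_batches(records, commit_size=COMMIT_SIZE):
    """Yield successive batches of records; each batch would be one commit."""
    batch = []
    for rec in records:
        batch.append(rec)
        if len(batch) == commit_size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch

# 45,000 records with a commit size of 20,000 gives three commits:
sizes = [len(b) for b in commit_batches(range(45000))]
print(sizes)  # [20000, 20000, 5000]
```

Pushing commit_size higher holds more rows in memory per transaction, which matches the memory errors we hit beyond 20000.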
After using all three parameters I did get a performance improvement, though I would not attribute it to the hardware being four times better. Using the above parameters (number of threads, staging table update method, commit size) definitely helps, and I would always recommend them for reference data loading.
The interesting part is that the number of threads parameter exists only for reference data loading; there is no such parameter for transaction data loading, and I have no idea why.
If you know why we don't have this parameter for fact data loads, please let me know.
As usual, sharing my learning experiences; I will be back with more knowledge sharing...
Till then take care...
Nilesh
Tuesday, February 23, 2010
Changing column names in Kalido MDM 8.5
Kalido MDM introduced ODBC loaders in the new release of the MDM product. Data can now be loaded from the staging tables themselves, which makes Kalido MDM a complete MDM tool.
But once we create the staging tables in MDM and save them, we are stuck with them. We cannot go back and change the column names or make any other modification to the staging tables. The only way to edit them through the tool is to drop the staging tables and re-create them from scratch, which is quite a process in most cases.
A better way to change the column names in the MDM staging tables is to export the whole model to XML (which is done using the MDM tool itself), then edit that XML file and change the column names as required. Once the editing is done, we import the file back into the MDM tool; it will have the new column names, along with any structural changes we made to the MDM model.
The only disadvantage of this process is that it requires some knowledge of the XML format into which the MDM model is exported.
Remember: always take a backup of the original XML file...
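As a sketch of the export-edit-import step: the element and attribute names below are made up for illustration (the real MDM export schema will differ), but the mechanics of renaming a column in the exported XML look like this:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of an exported MDM model; the real schema differs.
xml_text = """
<model>
  <stagingTable name="STG_PRODUCT">
    <column name="PROD_CD"/>
    <column name="PROD_DESC"/>
  </stagingTable>
</model>
"""

root = ET.fromstring(xml_text)
for col in root.iter("column"):
    if col.get("name") == "PROD_CD":
        col.set("name", "PRODUCT_CODE")  # the rename

new_xml = ET.tostring(root, encoding="unicode")
print("PRODUCT_CODE" in new_xml)  # True
```

The edited file is then imported back through the MDM tool, which picks up the new column names.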
This is one way we can change the column names in MDM staging tables.
Kalido DIW 8.5
I was creating a new warehouse using Kalido, on a UNIX database server. When we create a new warehouse, Kalido gives the option of loading the initial data for TIME, Unit of Measure, and Currency. I chose not to load the data, figuring I could always come back and reload it.
But to my surprise, there is no option to come back and reload that data. Ideally there should be a way to return to the tool and reload it (TIME, UOM, and Currency), but there is no such magic button.
The other option is always to generate the scripts and re-run them manually. But if the scripts were never generated, we have to re-create the warehouse from scratch.
Just sharing my learning experiences...