Data Warehousing Testing

September 27th, 2009 admin Posted in Data Warehouse Consulting, Data warehousing consultants, Uncategorized 2 Comments »

Summary: These days, reviewing historical trends and monitoring the near real-time operational data has become a key competitive advantage. This is why businesses are primarily focusing on the collection and organization of several kinds of data for strategic decision-making. Here in this article, you will find some practical recommendations and suggestions for testing extract, transform and load ETL applications based on years of experience in data warehousing testing. Every possible effort has been made to help the organizations looking to improve their existing data warehouse.

There are some common goals for testing an ETL application:

Data completeness-It ensures that all expected data is loaded.

Data Quality- It promises that the ETL application correctly rejects, substitutes default values, corrects and reports invalid data.

Data transformation- This is meant for ensuring that all data is correctly transformed according to business rules and design specifications.

Performance and scalability- This is to ensure that the data loads and queries perform within expected time frames and the technical architecture is scalable.

Integration testing- It is to ensure that ETL process functions well with other upstream and downstream applications.

User-acceptance testing- It ensures the solution fulfills the users’ current expectations and also anticipates their future expectations.

Regression testing- This is to keep the existing functionality intact each time a new release of code is completed.

Data warehouse testing is basically divided into two categories ‘Back-end testing’ and ‘Front-end testing’. The former applies where the source systems data is compared to the end-result data in Loaded area, while the latter refers to where the user checks the data by comparing their MIS with the data that is displayed by the end-user tools.

Data Validation

Data completeness is one of the basic ways for data validation. This is needed to verify that all expected data loads into the data warehouse. This includes the validation of all the records, fields and ensures that the full contents of each field are loaded.

Data Transformation

Validating that the data is transformed correctly based on business rules, can be one of the most complex parts of testing an ETL application with significant transformation logic. Another way of testing is to pick up some sample records and compare them for validating data transformation manually, but this method requires manual testing steps and testers who have a good amount of experience and understand of the ETL logic.

Testing data warehouse is not any easy task; rather it is a very complex process that needs a high degree of experience from the testers. There are some basic and common steps for system analysis and process testing.

First Step- One needs to analyze documents of some sort. These documents can be useful for testing the development process, but the downside is that they lack the necessary details to support test development and execution. Many a times, people require other documents, which are, know as ‘ source-to-target mappings’. Such documents are full of the detailed technical specifications and help people to know where the data is coming from, what should be done to that data and where it should be loaded.

Second Step- After analyzing the documentation source, the second step is to develop strategy and test plans. It is recommended that the incremental testing approach is good when testing a data warehouse. The added advantage of this approach is that it avoids an overwhelming delivery of data and enables early defect detection and simplified debugging. Further, this approach serves you to set up the detailed processes that involve the development and testing process.

This article has been written by a data warehouse expert.

AddThis Social Bookmark Button

11 Steps to Successful Data Warehousing

September 25th, 2009 admin Posted in Data Warehouse Consulting, Data warehousing consultants 1 Comment »

Summary: Data warehousing is becoming more and more popular with every passing day as most of the companies are taking the support of data warehousing as it is really helpful to win over a new customer, develop new products and lower the costs. Searching through the data meant for corporate transactions can provide insights and highlight critical facts that can improve your business. Data warehousing has been a common practice for big companies, but now even small organizations can avail it’s advantages due to the reduced cost of data warehousing technology.

Here are some steps that are considered in implementing your data warehousing solution:

1-Recognize that the job is probably harder than you expect.

According to reports of several experts, 30-50 percent of the information in a typical database is missing or incorrect. This much percentage of error is completely unacceptable in a data warehousing system that is designed to sort through millions of historical records to identify trends or select customers for a new product or service. Sometimes, the data is correct, but it may not be usable in a data warehouse environment. For examples, legacy system programmers use shortcuts to save the disk space. They use numbers in place of the names of cities and thus their data is meaningless in a generic environment.

2- Understand the data in your existing systems

The second step is an important part of understanding the existing data. It is done to determine the interrelationships between various systems. Interrelationship is essential to be maintained while moving the data into the warehouse. One must have a clear idea about the data relationships among various heterogeneous systems to determine how any change may impact the system.

3-Do recognize equivalent entities

One of the most important points while preparing for a data warehousing project is to identify equivalent entities and heterogeneous systems. This problem generally occurs when the same kind of information appears under different field names. For example, it is possible that two different companies may be serving the same customer but they may be using the name differently such as AIG and American International Group.

4- Using Metadata

Metedata is considered highly crucial for successful data warehousing implementation. Metadata refers to the data that indicates the subject of a web document. There are various categories of data that can be associated with a database to characterize an index data, facilitate or restrict access to data, determine the source and currency of data etc. One major task is to synchronize the metadata between various vendor products, different functions and several metadata stores.

5-Choose the right data transformation tools

Data transformation tools are required to extract data from the operational sources, clean it and load it into the data warehouse. Such a transformation process involves the creation and population of new fields from operational data, surprising data to an appropriate level for analysis and error checking operations to validate the integrity of data. Hence, buy tools that ensure a high level of data transformation. This usually requires the help of a good data warehousing consultant.

6- Use external sources

The third party provides the external data such as data from a customer’s transaction processing systems or market research data and they are great value for internal information. For example, you want to know the income of the customers in a particular country; you can easily do so using external data sources. This type of data will not provide you with the exact information, but it will give you a rough idea about their income.

7- Use new information distribution methods

Today, with the advancement in technology, information can be distributed in several ways directly to individuals who require it. Now, it is very easy to subscribe to regular reports and have them delivered through email. Another good way to delver information is to let the users log in, search for the desired data and open the files that give the desired information.

8- Focus on hot marketing applications

Applications in data warehousing involve high-payback marketing applications. For example, catalog manufacturers are using them to generate higher sales.

9- Focus on early wins to build support throughout the organization

Off- the- shelf solutions can be used to provide point solutions in a short time that serve as a training and demonstration platform and then build for full-scale implementation. Many times is still requires data warehouse consulting.

10- Do not undervalue hardware equipments

Hardware equipments are required, as a large number of CPU cycles are needed to slice and dice the data again and again to fulfill various types of needs of users throughout the organization.

11- Outsource your data ware house development and maintenance

Most of the companies these days outsource the data warehouse development and maintenance in order to avoid the difficulty of locating the high cost of retaining skilled IT staff.

An expert data warehouse designer has written this article.

AddThis Social Bookmark Button

Grids in Data Warehousing

September 24th, 2009 admin Posted in Data Warehouse Consultant, Data Warehouse Consulting, Data warehousing Consulting, Data warehousing consultants No Comments »

Summary: IT organizations face a lot of problems when data volume explodes, but there is no need to worry, as gird technology is available for use. A Grid is very useful to handle data volume and make the data processing faster. IT firms that have implemented the grid technology in their database and ETL tools are reaping the benefits of fast processing.

What is grid technology? A grid basically refers to a collection of low-cost servers connected over a high-speed network in which IT resources such as computer power, storage and network capacity are pooled into a single set of shared services that can be distributed on demand. A Grid results in effective utilization of already available resources.

Why do we use gird technology? Data warehousing is basically about loading of the several forms of data such as raw data, summery data and metadata from heterogeneous sources like operational systems, mainframes and files etc. This is one of the most useful technologies to extract, transform and load data into the data warehouse. This process of data loading is called as ETL. The general problem in this process is that loading becomes inefficient when the data volume increases. Hence, the ETL process should be designed in such a way that data loading should be completed within the given load window.

To handle all the data explosion and the other related problems, grid computing is one of the innovative solutions that provides the following benefits:

1- Scalability: By distributing the task over a shared pool of resources, the scalability and performance can be improved.

2- Reliability- In a grid, if one server fails, then another server may be used for further processing tasks. Thus, grid computing is a highly reliable way of processing.

3- Cost saving- It is an interesting fact that companies can enhance their return on investment and lower the cost of ownership by utilizing the computing power of unused resources.

4- Effective utilization of resources- A number of users can access the shared pool of resources in order to obtain the best possible response time. Maximizing the utilization of all the resources, which are available in the pool, can do this.

Some examples of the implementation process of a grid:

1- Informatica Corporation
Informatica PowerCenter 8 is the latest release of Informatica that harnesses the power of grid computing. It delivers load balancing, dynamic partitioning and parallel processing to ensure optimal scalability, performance and reliability.

2- Oracle Corporation

Oracle has implemented grid computing in their Oracle 10g version. In Oracle 10g, the database can balance the workload with new processing capacity. On the other hand, the Oracle Database 11g delivers the benefits of grid computing with more self-management and automation in order to store more data and run queries faster and protect and audit the data.

A Grid in data warehousing is useful to several people like IT managers and directors, data warehouse architects and specialists and most importantly business analysts and decision makers.

An expert data warehousing consultant has written this article.

AddThis Social Bookmark Button

The Importance Of Data Cleansing Your Data Warehouse

August 22nd, 2009 admin Posted in Data Warehouse Consultant, Data Warehouse Consultants, Data Warehouse Consulting, Data warehousing Consulting, Data warehousing consultants, data warehousing consultant No Comments »

Data Cleansing and ETL

Summary: Data warehouse plays a crucial role in defining the process flow in an organization. When you are designing a data warehouse, it is essential to make it go through cleansing and ETL. There are various reasons behind it. Read the article to know more.

A data warehouse is a system that produces data, scrubs it up, organizes the data and places it in the right order for data analysis, browsing and decision-making. We can say that a data warehouse helps different departments within an organization to work in a smooth way. There are many important elements of any data warehouse development process and data cleansing is one of them.

Data cleansing or data scrubbing is the process to stop undecipherable data from entering the data warehouse. During this process, all the records are checked for consistency & accuracy and some of them are even deleted or corrected. This is absolutely essential for data warehousing.

The process of data cleansing can occur within a single set of records or in multiple data sets. It is most of the times an intensive and contentious process that makes a data warehouse more secure.

What is ETL?

Extract, transform and load is a category of tools that will help in ensuring that the data is clean and confirms to quality standard before it is being entered into the data warehouse. The ETL tools supplied by the vendor are more easy to use for managing the data cleansing process on an ongoing basis. The tools are used for reading the incoming data and if they come across a data for which they have been programmed to transform, they will make changes in it before loading it into the warehouse.

You can use the ETL tools for extracting data from remote databases manually or through automatically scheduled events. Although there are other alternatives to purchase ETL but all this depends on the complexity and the budget of the project.

If you are working on smaller projects, the database administrations can write scripts that will perform the ETL functionality. Even Microsoft has a SQL server with a free ETL tool called Data Transforming Service. The tool is good but has a few limitations when it comes to ongoing administration of data cleansing.

Planning for Data Cleansing

It is essential for a data warehouse designer to map out the data before entering it into the data warehouse as soon as possible. Many aspects will change as the project will get mature but this documentation will serve as a valuable source to get commitment from data owners that they will not change the format of the warehouse without informing the designer.

Your very first step should be to create a list of data that will be extracted, transformed and loaded. One should also make a separate list of data that has chances of changing formats. The next step should be to decide if you want to purchase ETL tools or work with free tools..

An expert in data warehousing consultant has written this article.

AddThis Social Bookmark Button

How to Implement a Data Warehouse?

August 4th, 2009 admin Posted in Data Warehouse Consultant, Data Warehouse Consultants, Data Warehouse Consulting, Data warehousing Consulting, Data warehousing consultants, data warehousing consultant 1 Comment »

A data warehouse is the repository of a company’s electronically stored data. It is designed to facilitate reporting and analyzing tasks. Apart from analyzing and reporting a data warehouse has the tools to extract, transform and load data, manage and retrieve Meta data.

You cannot purchase or install a data warehouse by yourself. Many times a data warehouse consultant is needed. For implementing a data warehouse, you will have to integrate the implementation of many products. Here are some steps of data warehouse implementation:

• Collecting and analyzing business requirements.

• Creating a data model, physical design and deciding the appropriate hardware platform for the data warehouse.

• Defining data sources.

• Choosing DBMS and software platform for the data warehouse.

• Steps like extracting the data from operational data sources, translating it, cleaning and loading it into the data warehouse model.

• Choosing the right database access and reporting tools.

• Finalizing the data analysis (OLAP) and presentation (client GUI) software’s.

• Periodic refreshing of the data warehouse.

Choosing Access Tools

Without SAS, no data warehouse vendor can provide a comprehensive single-window software tool that is capable of handling different aspects of data warehousing project implementation. SAS not only has its own database but also has the capability of importing data from any vendor DBMS software. Thus you can implement a data warehousing and mining solution individually with SAS.

While you are choosing the tools, the best way is to understand the capability and compatibility of available options for the type of access you want. Here is the type of access and reporting:

• Time series analysis.

• General statistical analysis.

• Data visualization, graphics, charting and pivoting.

• Ad-hoc user specific queries.

• Artificial intelligence features like identification and validation of data, trends discovery and more.

• Predefined repeatable queries.

• Text mining.

• Reporting the analysis by drilling down.

• Queries with multilevel sub-queries, multiple table forces and sophisticated search criteria.

When we talk of big organizations, there are chances of the user requirement exceeding the capability of available tools. If you are facing a similar problem, it is better to search for other tools in the market. There are ample options that will enable the ordinary user to make customized reports by composing and executing ad-hoc queries. You don’t need to have specific knowledge about the design details, technology, SQL or data model to work with these tools.

When you are getting a data warehouse for your organization, it is important to consider all these points for getting flawless results. Investing on a data warehouse  is not an easy thing therefore you have every right to pick and choose the options depending on the nature of your business and how much user friendly application you want. This is why the industry has seen a rise in data warehousing consulting due to the complex nature of the product and the implementation.

AddThis Social Bookmark Button