Data Warehousing Vs Databases

September 26th, 2009 admin Posted in Data Warehouse Consultant, Data Warehouse Consultants, Data warehousing Consulting, data warehousing consultant 3 Comments »

Summary:In simple terms, a database is an integrated collection of information that is logically organized so that it can be easily accessed, managed and updated. On the basis of content, a database can be classified into different types such as bibliographic, full-text, numeric and images. On the other hand, data warehousing refers to an electronically stored data that is meant for facilitating reporting and analysis. It serves to retrieve, analyze, extract, transform and manage the data dictionary. In data warehousing, many types of tools are used to extract, transform and load data into the repository.

Many people are confused between a database and data warehouse. Not many know as to why Business Intelligence systems are using Data Warehouse rather than Database to pull historical data? What is the difference between a Database and Data Warehouse?

Here are some valuable points that differentiate them from each other:

Database:

1- Database is commonly used for Online Transactional Processing (OLTP). This serves as a historical record.

2- To design the database, rational modeling techniques is used.

3- It is optimized for writing operation.

4- For analysis queries, performance is low.

Data Warehouse:

1- This is used for Online Analytical Processing (OLAP). It is meant for reading the data for users involved in business decisions.

2- It is an especially setup database used for reporting and analysis purposes.

3-It is meant to typically keep a very long history from several years to the entire life of the company so that all the old records can be viewed.

4-In order to save space and reduce redundant data, the tables and joins are normalized.

5- Ensures high performance for analytical queries.

6- In this, modeling techniques are used for database design.

General Data Flow- (Considering an example of An Online Insurance Registration)

1- First of all, it allows customers to enter the details in an online registration form.

2- To save the details, a customer needs to enter the ‘ Submit button’ in the form.

3- To retrieve the data from the database tables to the data warehouse tables, a businesses Intelligence Team of Insurance Company uses an ETL tool.

4- Finally, Businesses Management uses Business Reporting Tools to retrieve a Data Warehouse table for preparing business reports. For example, to see the customer details entered into an Online Registration Form by the customer.

Hence, there are many points that differentiate a database from a data warehouse, however both are used for several types of information, which is used for business and non-business purposes. They are integral parts of almost each and every business now to maintain valuable records. Online transactional activities cannot happen without the facility of database and data warehousing facilities. Useful information for businesses cannot be availed without tools for storing and retrieving data. Data is useful for everyone including researchers, businessmen, students, customers and many more. It is therefore that the significance of data tools have gained much more value now. There are various types of tools for extracting, storing and retrieving data in the market. They have different configurations and performance results.

This article has been written by a data warehouse expert.

AddThis Social Bookmark Button

Grids in Data Warehousing

September 24th, 2009 admin Posted in Data Warehouse Consultant, Data Warehouse Consulting, Data warehousing Consulting, Data warehousing consultants No Comments »

Summary: IT organizations face a lot of problems when data volume explodes, but there is no need to worry, as gird technology is available for use. A Grid is very useful to handle data volume and make the data processing faster. IT firms that have implemented the grid technology in their database and ETL tools are reaping the benefits of fast processing.

What is grid technology? A grid basically refers to a collection of low-cost servers connected over a high-speed network in which IT resources such as computer power, storage and network capacity are pooled into a single set of shared services that can be distributed on demand. A Grid results in effective utilization of already available resources.

Why do we use gird technology? Data warehousing is basically about loading of the several forms of data such as raw data, summery data and metadata from heterogeneous sources like operational systems, mainframes and files etc. This is one of the most useful technologies to extract, transform and load data into the data warehouse. This process of data loading is called as ETL. The general problem in this process is that loading becomes inefficient when the data volume increases. Hence, the ETL process should be designed in such a way that data loading should be completed within the given load window.

To handle all the data explosion and the other related problems, grid computing is one of the innovative solutions that provides the following benefits:

1- Scalability: By distributing the task over a shared pool of resources, the scalability and performance can be improved.

2- Reliability- In a grid, if one server fails, then another server may be used for further processing tasks. Thus, grid computing is a highly reliable way of processing.

3- Cost saving- It is an interesting fact that companies can enhance their return on investment and lower the cost of ownership by utilizing the computing power of unused resources.

4- Effective utilization of resources- A number of users can access the shared pool of resources in order to obtain the best possible response time. Maximizing the utilization of all the resources, which are available in the pool, can do this.

Some examples of the implementation process of a grid:

1- Informatica Corporation
Informatica PowerCenter 8 is the latest release of Informatica that harnesses the power of grid computing. It delivers load balancing, dynamic partitioning and parallel processing to ensure optimal scalability, performance and reliability.

2- Oracle Corporation

Oracle has implemented grid computing in their Oracle 10g version. In Oracle 10g, the database can balance the workload with new processing capacity. On the other hand, the Oracle Database 11g delivers the benefits of grid computing with more self-management and automation in order to store more data and run queries faster and protect and audit the data.

A Grid in data warehousing is useful to several people like IT managers and directors, data warehouse architects and specialists and most importantly business analysts and decision makers.

An expert data warehousing consultant has written this article.

AddThis Social Bookmark Button

The Importance Of Data Cleansing Your Data Warehouse

August 22nd, 2009 admin Posted in Data Warehouse Consultant, Data Warehouse Consultants, Data Warehouse Consulting, Data warehousing Consulting, Data warehousing consultants, data warehousing consultant No Comments »

Data Cleansing and ETL

Summary: Data warehouse plays a crucial role in defining the process flow in an organization. When you are designing a data warehouse, it is essential to make it go through cleansing and ETL. There are various reasons behind it. Read the article to know more.

A data warehouse is a system that produces data, scrubs it up, organizes the data and places it in the right order for data analysis, browsing and decision-making. We can say that a data warehouse helps different departments within an organization to work in a smooth way. There are many important elements of any data warehouse development process and data cleansing is one of them.

Data cleansing or data scrubbing is the process to stop undecipherable data from entering the data warehouse. During this process, all the records are checked for consistency & accuracy and some of them are even deleted or corrected. This is absolutely essential for data warehousing.

The process of data cleansing can occur within a single set of records or in multiple data sets. It is most of the times an intensive and contentious process that makes a data warehouse more secure.

What is ETL?

Extract, transform and load is a category of tools that will help in ensuring that the data is clean and confirms to quality standard before it is being entered into the data warehouse. The ETL tools supplied by the vendor are more easy to use for managing the data cleansing process on an ongoing basis. The tools are used for reading the incoming data and if they come across a data for which they have been programmed to transform, they will make changes in it before loading it into the warehouse.

You can use the ETL tools for extracting data from remote databases manually or through automatically scheduled events. Although there are other alternatives to purchase ETL but all this depends on the complexity and the budget of the project.

If you are working on smaller projects, the database administrations can write scripts that will perform the ETL functionality. Even Microsoft has a SQL server with a free ETL tool called Data Transforming Service. The tool is good but has a few limitations when it comes to ongoing administration of data cleansing.

Planning for Data Cleansing

It is essential for a data warehouse designer to map out the data before entering it into the data warehouse as soon as possible. Many aspects will change as the project will get mature but this documentation will serve as a valuable source to get commitment from data owners that they will not change the format of the warehouse without informing the designer.

Your very first step should be to create a list of data that will be extracted, transformed and loaded. One should also make a separate list of data that has chances of changing formats. The next step should be to decide if you want to purchase ETL tools or work with free tools..

An expert in data warehousing consultant has written this article.

AddThis Social Bookmark Button

How to Implement a Data Warehouse?

August 4th, 2009 admin Posted in Data Warehouse Consultant, Data Warehouse Consultants, Data Warehouse Consulting, Data warehousing Consulting, Data warehousing consultants, data warehousing consultant 1 Comment »

A data warehouse is the repository of a company’s electronically stored data. It is designed to facilitate reporting and analyzing tasks. Apart from analyzing and reporting a data warehouse has the tools to extract, transform and load data, manage and retrieve Meta data.

You cannot purchase or install a data warehouse by yourself. Many times a data warehouse consultant is needed. For implementing a data warehouse, you will have to integrate the implementation of many products. Here are some steps of data warehouse implementation:

• Collecting and analyzing business requirements.

• Creating a data model, physical design and deciding the appropriate hardware platform for the data warehouse.

• Defining data sources.

• Choosing DBMS and software platform for the data warehouse.

• Steps like extracting the data from operational data sources, translating it, cleaning and loading it into the data warehouse model.

• Choosing the right database access and reporting tools.

• Finalizing the data analysis (OLAP) and presentation (client GUI) software’s.

• Periodic refreshing of the data warehouse.

Choosing Access Tools

Without SAS, no data warehouse vendor can provide a comprehensive single-window software tool that is capable of handling different aspects of data warehousing project implementation. SAS not only has its own database but also has the capability of importing data from any vendor DBMS software. Thus you can implement a data warehousing and mining solution individually with SAS.

While you are choosing the tools, the best way is to understand the capability and compatibility of available options for the type of access you want. Here is the type of access and reporting:

• Time series analysis.

• General statistical analysis.

• Data visualization, graphics, charting and pivoting.

• Ad-hoc user specific queries.

• Artificial intelligence features like identification and validation of data, trends discovery and more.

• Predefined repeatable queries.

• Text mining.

• Reporting the analysis by drilling down.

• Queries with multilevel sub-queries, multiple table forces and sophisticated search criteria.

When we talk of big organizations, there are chances of the user requirement exceeding the capability of available tools. If you are facing a similar problem, it is better to search for other tools in the market. There are ample options that will enable the ordinary user to make customized reports by composing and executing ad-hoc queries. You don’t need to have specific knowledge about the design details, technology, SQL or data model to work with these tools.

When you are getting a data warehouse for your organization, it is important to consider all these points for getting flawless results. Investing on a data warehouse  is not an easy thing therefore you have every right to pick and choose the options depending on the nature of your business and how much user friendly application you want. This is why the industry has seen a rise in data warehousing consulting due to the complex nature of the product and the implementation.

AddThis Social Bookmark Button