Data Mining

 

 

The purpose of data analysis is to discover previously unknown data characteristics, relationships, dependencies, or trends. A typical data analysis tool relies on the end users to define the problem, select the data, and initiate the appropriate data analysis. The resulting information helps model and solve the problems that the end users uncover.

 

If the end user fails to detect a problem, no action is taken. Given that limitation, some current BI environments now support various types of automated alerts. The alerts are software agents that constantly monitor certain parameters, such as sales indicators and inventory levels, and then perform specified actions (send e-mail or alert messages, run programs, and so on) when such parameters reach predefined levels.
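The alert mechanism described above can be sketched in a few lines of Python. This is only an illustration, not how any particular BI product implements alerts: the `AlertRule` class, the `notify` action, the parameter names, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AlertRule:
    """One monitored parameter with a predefined level (hypothetical structure)."""
    parameter: str
    threshold: float
    direction: str                       # "below" or "above"
    action: Callable[[str, float], str]  # what to do when the level is reached


def notify(parameter: str, value: float) -> str:
    # Stand-in for sending an e-mail or running a program.
    return f"ALERT: {parameter} reached {value}"


def check_alerts(readings: Dict[str, float], rules: List[AlertRule]) -> List[str]:
    """Run each rule's action when its parameter reaches the predefined level."""
    fired = []
    for rule in rules:
        value = readings.get(rule.parameter)
        if value is None:
            continue  # no current reading for this parameter
        if rule.direction == "below":
            breached = value <= rule.threshold
        else:
            breached = value >= rule.threshold
        if breached:
            fired.append(rule.action(rule.parameter, value))
    return fired


# Assumed example values: inventory is low, sales are below the alert level.
rules = [
    AlertRule("inventory_level", 100, "below", notify),
    AlertRule("daily_sales", 50000, "above", notify),
]
readings = {"inventory_level": 80, "daily_sales": 42000}
messages = check_alerts(readings, rules)
print(messages)
```

Note that the agent only reacts to levels someone defined in advance, which is why this kind of tool is still reactive.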


 

In contrast to the traditional (reactive) BI tools, data mining is proactive. Instead of having the end user define the problem, select the data, and select the tools to analyze the data, data-mining tools automatically search the data for anomalies and possible relationships, thereby identifying problems that have not yet been identified by the end user.

 

In other words, data mining refers to the activities that analyze the data, uncover problems or opportunities hidden in the data relationships, form computer models based on their findings, and then use the models to predict business behavior, all with minimal end-user intervention.

 


Data mining describes a new breed of specialized decision support tools that automate data analysis. In short, data-mining tools initiate analysis to create knowledge. Such knowledge can be used to address any number of business problems.

 

To put data mining in perspective, look at the pyramid in the following figure, which represents how knowledge is extracted from data. Data form the pyramid base and represent what most organizations collect in their operational databases. The second level contains information that represents the purified and processed data. Information forms the basis for decision making and business understanding. Knowledge is found at the pyramid’s apex and represents highly specialized information.

 

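[Figure: Extracting knowledge from data. A pyramid with data at the base, information in the middle, and knowledge at the apex.]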

 

Although precise standards are lacking, data mining generally proceeds through four phases:

 

I. Data preparation:

In the data preparation phase, the main data sets to be used by the data-mining operation are identified and cleaned of any data impurities. Because the data in the data warehouse are already integrated and filtered, the data warehouse usually is the target set for data-mining operations.
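As a rough illustration of what "cleaning data impurities" can mean, the sketch below trims stray whitespace, drops records with missing required fields, and removes exact duplicates. The `prepare` function, the field names, and the sample records are hypothetical. Real data preparation typically involves far more than this.

```python
def prepare(records, required_fields):
    """Return cleaned records: normalized, complete, and free of exact duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        # Normalize string values by trimming surrounding whitespace.
        normalized = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in rec.items()
        }
        # Skip records that are missing any required field.
        if any(normalized.get(field) in (None, "") for field in required_fields):
            continue
        # Skip exact duplicates (after normalization).
        fingerprint = tuple(sorted(normalized.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(normalized)
    return cleaned


# Assumed sample data with a duplicate and an incomplete record.
raw = [
    {"customer_id": 1, "city": " Austin "},
    {"customer_id": 1, "city": "Austin"},
    {"customer_id": 2, "city": ""},
    {"customer_id": 3, "city": "Boston"},
]
cleaned = prepare(raw, ["customer_id", "city"])
print(cleaned)
```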

 

II. Data analysis and classification:

 

The data analysis and classification phase studies the data to identify common data characteristics or patterns. During this phase, the data-mining tool applies specific algorithms to find:

 

• Data groupings, classifications, clusters, or sequences.

• Data dependencies, links, or relationships.

• Data patterns, trends, and deviations.
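To make the last item concrete, here is a minimal sketch of one way to flag deviations: mark any value whose z-score (distance from the mean in standard deviations) exceeds a cutoff. The z-score approach, the cutoff of 2.0, and the sample figures are assumptions chosen for illustration. Data-mining tools may use quite different algorithms.

```python
import statistics


def find_deviations(values, cutoff=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds the cutoff."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all values identical, so nothing deviates
    return [
        (index, value)
        for index, value in enumerate(values)
        if abs(value - mean) / spread > cutoff
    ]


# Assumed weekly sales figures; the last week is unusually high.
weekly_sales = [100, 102, 98, 101, 99, 180]
deviations = find_deviations(weekly_sales)
print(deviations)
```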

 

III. Knowledge acquisition:

The knowledge acquisition phase uses the results of the data analysis and classification phase. During the knowledge acquisition phase, the data-mining tool (with possible intervention by the end user) selects the appropriate modeling or knowledge acquisition algorithms. The most common algorithms used in data mining are based on neural networks, decision trees, rule induction, genetic algorithms, classification and regression trees, memory-based reasoning, nearest neighbor, and data visualization.
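Of the algorithms listed, nearest neighbor is the simplest to sketch. The example below classifies a new customer by majority vote among the k closest known customers. The features (months since last card use, purchases per month), the labels, and the training points are all made up for illustration.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    # Sort known examples by Euclidean distance to the query point.
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Assumed training data: (months_inactive, purchases_per_month) -> outcome.
training = [
    ((1, 5), "stays"),
    ((2, 4), "stays"),
    ((1, 6), "stays"),
    ((7, 1), "cancels"),
    ((8, 0), "cancels"),
    ((6, 2), "cancels"),
]
inactive_customer = knn_predict(training, (7, 2))
active_customer = knn_predict(training, (2, 5))
print(inactive_customer, active_customer)
```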

 

IV. Prognosis:

 

 

In the prognosis phase, the data-mining findings are used to predict future behavior and forecast business outcomes. Examples of data-mining findings can be:

• Sixty-five percent of customers who did not use a particular credit card in the last six months are 88 percent likely to cancel that account.

• Eighty-two percent of customers who bought a 42-inch or larger LCD TV are 90 percent likely to buy an entertainment center within the next four weeks.
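Findings of the second kind resemble association rules of the form "customers who buy A also tend to buy B". As a hedged sketch, the code below computes two common measures for such a rule: support (the share of all transactions containing both items) and confidence (the share of transactions with A that also contain B). The `rule_stats` function and the transaction data are hypothetical and do not reproduce the percentages quoted above.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    total = len(transactions)
    with_antecedent = [t for t in transactions if antecedent in t]
    with_both = [t for t in with_antecedent if consequent in t]
    support = len(with_both) / total if total else 0.0
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence


# Assumed purchase histories, one set of items per customer.
transactions = [
    {"lcd_tv", "entertainment_center"},
    {"lcd_tv", "entertainment_center"},
    {"lcd_tv"},
    {"hdmi_cable"},
    {"lcd_tv", "entertainment_center", "hdmi_cable"},
]
support, confidence = rule_stats(transactions, "lcd_tv", "entertainment_center")
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

A high confidence on historical data is what lets the rule serve as a forecast for customers who have just made the first purchase.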
