Notes and Study Materials

Data Mining

 

 

The purpose of data analysis is to discover previously unknown data characteristics, relationships, dependencies, or trends. A typical data analysis tool relies on the end users to define the problem, select the data and initiate the appropriate data analysis to generate the information that helps model and solve problems that the end users uncover.

 

If the end user fails to detect a problem, no action is taken. Given that limitation, some current BI environments now support various types of automated alerts. The alerts are software agents that constantly monitor certain parameters, such as sales indicators and inventory levels, and then perform specified actions (send e-mail or alert messages, run programs, and so on) when such parameters reach predefined levels.
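
As a rough illustration of such an alert agent, the sketch below checks a set of readings against predefined levels and runs an action when a level is reached. The names `AlertRule`, `check_alerts`, and `notify`, the parameter names, and the thresholds are all hypothetical and chosen only for illustration; a real BI environment would supply its own monitoring and notification mechanisms.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AlertRule:
    """One monitored parameter, its trigger condition, and the action to run."""
    parameter: str                        # hypothetical name, e.g. "inventory_level"
    triggered: Callable[[float], bool]    # True when the predefined level is reached
    action: Callable[[str, float], None]  # what to do when the rule fires


def check_alerts(readings: Dict[str, float], rules: List[AlertRule]) -> List[str]:
    """Evaluate every rule against the latest readings and run matching actions.

    Returns the names of the parameters whose rules fired.
    """
    fired = []
    for rule in rules:
        value = readings.get(rule.parameter)
        if value is not None and rule.triggered(value):
            rule.action(rule.parameter, value)
            fired.append(rule.parameter)
    return fired


messages: List[str] = []


def notify(parameter: str, value: float) -> None:
    # Stand-in for sending an e-mail, posting an alert message, or running a program.
    messages.append(f"ALERT: {parameter} reached {value}")


# Assumed thresholds: reorder below 100 units, flag sales below 5000.
rules = [
    AlertRule("inventory_level", lambda v: v < 100, notify),
    AlertRule("daily_sales", lambda v: v < 5000, notify),
]

fired = check_alerts({"inventory_level": 42, "daily_sales": 7200}, rules)
print(fired)
print(messages)
```

In practice an agent like this would be run on a schedule or on every data refresh, so that the end user does not have to notice the problem first.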


Data as a Corporate Asset:

 

 

Historically, data were not viewed as a company asset, for two reasons:

 

1. There were simply too many data to be processed manually.

2. Internal and external business operations moved at a much slower pace than they do today, so there was relatively little need for quick reactions triggered by fast-flowing information.

 

Data are a valuable resource that can translate into information. If the information is accurate and timely, it is likely to trigger actions that enhance the company’s competitive position and generate wealth. In effect, an organization is subject to a data-information-decision cycle; that is, the data user applies intelligence to data to produce information that is the basis of knowledge used in decision making by the user. This cycle is illustrated in the following figure.

 

 


 

The decisions made by high-level managers trigger actions within the organization’s lower levels. Such actions produce additional data to be used for monitoring company performance. In turn, the additional data must be recycled within the data-information-decision framework. Thus, data form the basis for decision making, strategic planning, control, and operations monitoring.

[Figure: data as a corporate asset]

As organizations become more dependent on information, the accuracy of that information becomes ever more critical. Dirty data, or data that suffer from inaccuracies and inconsistencies, become an even greater threat to these organizations. Data can become dirty for many reasons, such as:

• Lack of enforcement of integrity constraints (not null, uniqueness, referential integrity, etc.).

• Data entry typographical errors.

• Use of synonyms and/or homonyms across systems.

• Nonstandardized use of abbreviations in character data.

• Different decompositions of composite attributes into simple attributes across systems.
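
The first cause in the list, missing integrity constraints, is the easiest to show concretely. The sketch below uses Python's built-in `sqlite3` module; the `customer` and `invoice` tables and their columns are hypothetical. It assumes the constraints are declared in the schema, in which case the DBMS itself rejects rows that would otherwise become dirty data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema with not null, uniqueness, and referential integrity rules.
conn.executescript("""
CREATE TABLE customer (
    cust_id INTEGER PRIMARY KEY,
    email   TEXT NOT NULL UNIQUE
);
CREATE TABLE invoice (
    inv_id  INTEGER PRIMARY KEY,
    cust_id INTEGER NOT NULL REFERENCES customer(cust_id)
);
""")
conn.execute("INSERT INTO customer (cust_id, email) VALUES (1, 'ann@example.com')")

# Each statement violates exactly one of the declared constraints.
bad_rows = [
    ("INSERT INTO customer (cust_id, email) VALUES (2, NULL)", "not null"),
    ("INSERT INTO customer (cust_id, email) VALUES (3, 'ann@example.com')", "uniqueness"),
    ("INSERT INTO invoice (inv_id, cust_id) VALUES (10, 99)", "referential integrity"),
]

rejected = []
for sql, rule in bad_rows:
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        # The DBMS refused the row, so it never reaches the table.
        rejected.append(rule)

customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(rejected)
print(customer_count)
```

The other causes in the list (typing errors, synonyms, abbreviations, differing decompositions) cannot be caught by constraints alone and usually need profiling and standardization work.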

 

Efforts to control dirty data are generally referred to as data quality initiatives.

 

 

Data quality is a comprehensive approach to ensuring the accuracy, validity, and timeliness of the data. Data quality is concerned with more than just cleaning dirty data; it also focuses on the prevention of future inaccuracies in the data, and building user confidence in the data. Large-scale data quality initiatives tend to be complex and expensive projects. While data quality efforts vary greatly from one organization to another, most involve an interaction of:

• A data governance structure that is responsible for data quality.

• Measurements of current data quality.

• Definition of data quality standards in alignment with business goals.

• Implementation of tools and processes to ensure future data quality.

 

There are a number of tools that can assist in the implementation of data quality initiatives. In particular, data profiling and master data management software is available from many vendors to assist in ensuring data quality. Data profiling software consists of programs that gather statistics and analyze existing data sources. These programs analyze existing data and the metadata to determine data patterns, and can compare the existing data patterns against standards that the organization has defined. Master data management (MDM) software helps to prevent dirty data by coordinating common data across multiple systems. MDM provides a _master_ copy of entities, such as customers, that appear in numerous systems throughout the organization. 
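
To make the idea of data profiling more concrete, here is a minimal sketch of what such a program might compute for a single column: basic statistics, the character patterns that occur, and the values that do not match a standard the organization has defined. The function `profile_column`, the sample phone numbers, and the chosen standard format are assumptions made for illustration; commercial profiling tools are far more extensive.

```python
import re
from collections import Counter


def profile_column(values, standard_pattern):
    """Gather basic statistics for one column and compare it with a standard pattern."""
    non_null = [v for v in values if v not in (None, "")]

    def shape(value):
        # Reduce a value to its pattern: digits become 9, letters become A.
        return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))

    patterns = Counter(shape(v) for v in non_null)
    violations = [v for v in non_null if not re.fullmatch(standard_pattern, v)]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "patterns": dict(patterns),
        "violations": violations,
    }


# Hypothetical phone column; the assumed standard is NNN-NNN-NNNN.
phones = ["615-555-0101", "615-555-0101", "6155550102", None, "615.555.0103"]
report = profile_column(phones, r"\d{3}-\d{3}-\d{4}")

for key, value in report.items():
    print(key, value)
```

The pattern counts show at a glance how many competing formats are in use, which is the kind of evidence a data quality initiative needs before it can set or enforce a standard.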

 


 

 


The Need and Role of a Database in an Organization:

 

 

Data are used by different people in different departments for different reasons. Therefore, data management must address the concept of shared data. The DBMS facilitates:

 

• Interpretation and presentation of data in useful formats by transforming raw data into information.

• Distribution of data and information to the right people at the right time.

 

• Preservation of data and monitoring of data usage for adequate periods of time.

 

• Control over data duplication and use, both internally and externally.

 


Summary of DBA Activities

 

 

As you examine the following figure, note that the DBA is the focal point for data/user interaction. The DBA defines and enforces the procedures and standards to be used by programmers and end users during their work with the DBMS. The DBA also verifies that programmer and end-user access meets the required quality and security standards.

 

The DBA activities portrayed in the above figure suggest the need for a diverse mix of skills. In large companies, such skills are likely to be distributed among several people who work within the DBA function. In small companies, the skills might be the domain of just one individual. The skills can be divided into two categories—managerial and technical.


DBA Evolution:

 

 

Data administration has its roots in the old, decentralized world of the file system. The cost of data and managerial duplication in such file systems gave rise to a centralized data administration function known as the electronic data processing (EDP) or data processing (DP) department. The DP department’s task was to pool all computer resources to support all departments at the operational level. The DP administration function was given the authority to manage all existing company file systems as well as resolve data and managerial conflicts created by the duplication and/or misuse of data.


Differences Between DA and DBA:

 

 

Effective data administration requires both technical and managerial skills. For example, the DA’s job typically has a strong managerial orientation with company-wide scope. In contrast, the DBA’s job tends to be more technically oriented and has a narrower DBMS-specific scope.

Roles of Data Administrator (DA):


Security Of DBMS

 

 

Security refers to activities and measures to ensure the confidentiality, integrity, and availability of an information system and its main asset, data. It is important to understand that securing data requires a comprehensive, company-wide approach.

To understand the scope of data security, let’s discuss each of the three security goals in more detail:


Database Administration Tools:

 

 

The Data Dictionary:

A data dictionary was defined as “a DBMS component that stores the definition of data characteristics and relationships.” You may recall that such “data about data” are called metadata. The DBMS data dictionary provides the DBMS with its self-describing characteristic.

Two main types of data dictionaries exist: integrated and standalone. An integrated data dictionary is included with the DBMS. For example, all relational DBMSs include a built-in data dictionary or system catalog that is frequently accessed and updated by the RDBMS. Other DBMSs, especially older types, do not have a built-in data dictionary; instead, the DBA may use third-party standalone data dictionary systems.
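
As a small illustration of an integrated data dictionary, the sketch below queries the system catalog of SQLite through Python's built-in `sqlite3` module. SQLite is used here only because it ships with Python; the `product` table is hypothetical, and other RDBMSs expose their catalogs through different tables or views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; creating it makes the DBMS record metadata in its catalog.
conn.execute("CREATE TABLE product (p_code TEXT PRIMARY KEY, p_price REAL NOT NULL)")

# sqlite_master is SQLite's catalog table: one row per table, index, view, or trigger.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) for each column.
columns = [
    (name, col_type, bool(notnull), bool(pk))
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
        "PRAGMA table_info(product)"
    )
]

print(tables)
for column in columns:
    print(column)
```

Because the catalog is maintained by the DBMS itself, the metadata read here always reflects the current structure of the database, which is what gives the DBMS its self-describing characteristic.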
