Query Optimization in DDBMS

Performance transparency and query optimization in DDBMS

One of the most important functions of a database is its ability to make data available. Because all data reside at a single site in a centralized database, the DBMS must evaluate every data request and find the most efficient way to access the local data.

In contrast, the DDBMS makes it possible to partition a database into several fragments, thereby rendering the query translation more complicated, because the DDBMS must decide which fragment of the database to access.

In addition, the data may also be replicated at several different sites. The data replication makes the access problem even more complex, because the database must decide which copy of the data to access. The DDBMS uses query optimization techniques to deal with such problems and to ensure acceptable database performance.

You May Also Like:

Evoluation of DDBMS
Distributed Processing and Distributed Databases
Advantages and Disadvantages of DDBMS
Components of DDBMS

The objective of a query optimization routine is to minimize the total cost associated with the execution of a request.

The costs associated with a request are a function of the:

• Access time (I/O) cost involved in accessing the physical data stored on disk.

• Communication cost associated with the transmission of data among nodes in distributed database systems.

• CPU time cost associated with the processing overhead of managing distributed transactions.

To evaluate query optimization, keep in mind that the TP must receive data from the DP, synchronize it, assemble the answer, and present it to the end user or an application. Although that process is standard, you should consider that a particular query may be executed at any one of several different sites.

One of the most important characteristics of query optimization in distributed database systems is that it must provide distribution transparency as well as replica transparency. Replica transparency refers to the DDBMS’s ability to hide the existence of multiple copies of data from the user.

Most of the algorithms proposed for query optimization are based on two principles:

• The selection of the optimum execution order.

• The selection of sites to be accessed to minimize communication costs.

Within those two principles, a query optimization algorithm can be evaluated on the basis of its operation mode or the timing of its optimization.

Operation modes can be classified as manual or automatic. Automatic query optimization means that the DDBMS finds the most cost-effective access path without user intervention. Manual query optimization requires that the optimization be selected and scheduled by the end user or programmer.

Query optimization algorithms can also be classified according to when the optimization is done. Within this timing classification, query optimization algorithms can be classified as static or dynamic.

• Static query optimization takes place at compilation time. In other words, the best optimization strategy is selected when the query is compiled by the DBMS. This approach is common when SQL statements are embedded in procedural programming languages such as C# or Visual Basic .NET. When the program is submitted to the DBMS for compilation, it creates the plan necessary to access the database. When the program is executed, the DBMS uses that plan to access the database.

• Dynamic query optimization takes place at execution time. Database access strategy is defined when the program is executed. Therefore, access strategy is dynamically determined by the DBMS at run time, using the most up-to-date information about the database. Although dynamic query optimization is efficient, its cost is measured by run-time processing overhead. The best strategy is determined every time the query is executed; this could happen several times in the same program.

Finally, query optimization techniques can be classified according to the type of information that is used to optimize the query. For example, queries may be based on statistically based or rule-based algorithms.

• A statistically based query optimization algorithm uses statistical information about the database. The statistics provide information about database characteristics such as size, number of records, average access time, number of requests serviced, and number of users with access rights. These statistics are then used by the DBMS to determine the best access strategy.

• The statistical information is managed by the DDBMS and is generated in one of two different modes: dynamic or manual. In the dynamic statistical generation mode, the DDBMS automatically evaluates and updates the statistics after each access. In the manual statistical generation mode, the statistics must be updated periodically through a user-selected utility such as IBM’s RUNSTAT command used by DB2 DBMSs.

• A rule-based query optimization algorithm is based on a set of user-defined rules to determine the best query access strategy. The rules are entered by the end user or database administrator, and they are typically very general in nature.

You May Also Like:

Client/Server Vs. DDBMS

Distributed Database Design Concepts

Different Types of Distribution Transparency

Transaction Transparency

Back to DBMS Questions