Thursday, February 11, 2010

Database Models

Post-relational database models

Products offering a more general data model than the relational model are sometimes classified as post-relational. The data model in such products incorporates relations but is not constrained by the Information Principle, which requires that all information be represented by data values within relations.

Some of these extensions to the relational model actually integrate concepts from technologies that pre-date the relational model. For example, they allow representation of a directed graph with trees on the nodes.
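As a loose illustration of the kind of structure such extensions allow, the sketch below stores a directed graph whose nodes carry nested (tree-structured) values, something a strictly first-normal-form relational schema would disallow. The layout and names here are illustrative assumptions, not taken from any particular product.

```python
# Node "relation": node_id -> tree-structured value (a nested dict),
# which the classic relational model's Information Principle would not allow.
nodes = {
    1: {"name": "alice", "address": {"city": "Paris", "zip": "75001"}},
    2: {"name": "bob",   "address": {"city": "Lyon",  "zip": "69001"}},
}

# Edge "relation": (from_node, to_node) pairs describing the directed graph.
edges = [(1, 2), (2, 1)]

def successors(node_id):
    """Return the tree values of all nodes reachable in one hop."""
    return [nodes[dst] for (src, dst) in edges if src == node_id]

print(successors(1))   # [{'name': 'bob', 'address': {...}}]
```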

Some products implementing such models do so by extending relational database systems with non-relational features. Others, however, have arrived in much the same place by adding relational features to pre-relational systems. Paradoxically, this allows products that are historically pre-relational, such as PICK and MUMPS, to make a plausible claim to be post-relational in their current architecture.

Object database models

In recent years, the object-oriented paradigm has been applied to database technology, creating a new programming model known as object databases. These databases attempt to bring the database world and the application-programming world closer together, in particular by ensuring that the database uses the same type system as the application program. This aims to avoid the overhead (sometimes referred to as the impedance mismatch) of converting information between its representation in the database (for example as rows in tables) and its representation in the application program (typically as objects). At the same time, object databases attempt to introduce key ideas of object programming, such as encapsulation and polymorphism, into the world of databases.
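To make the impedance mismatch concrete, here is a minimal Python sketch (class names and fields are hypothetical) of the conversion layer that object databases try to eliminate: rows coming out of a relational store must be rebuilt into objects on every read and flattened back into rows on every write.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    customer_id: int
    name: str
    balance: float

# Representation in the database: rows (tuples) in a table.
rows = [(1, "alice", 120.0), (2, "bob", -15.5)]

# The conversion layer the object-database approach tries to avoid.
def row_to_object(row):
    return Customer(customer_id=row[0], name=row[1], balance=row[2])

def object_to_row(c):
    return (c.customer_id, c.name, c.balance)

customers = [row_to_object(r) for r in rows]       # rows -> objects on read
rows_back = [object_to_row(c) for c in customers]  # objects -> rows on write
print(customers[0], rows_back[0])
```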

A variety of approaches have been tried for storing objects in a database. Some products have approached the problem from the application-programming side, by making the objects manipulated by the program persistent. This typically also requires the addition of some kind of query language, since conventional programming languages lack language-level functionality for finding objects based on their information content. Others have attacked the problem from the database end, by defining an object-oriented data model for the database and a database programming language that allows full programming capabilities as well as traditional query facilities.
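As a rough sketch of the application-side approach, the toy store below makes ordinary Python objects persistent and adds a small content-based query facility, since the host language has no built-in way to find objects by their field values. The ObjectStore class and its methods are illustrative assumptions, not the API of any real product.

```python
import pickle

class ObjectStore:
    def __init__(self, path):
        self.path = path
        self.objects = []

    def persist(self, obj):
        # "Persistence" here is just collecting objects for later pickling.
        self.objects.append(obj)

    def query(self, predicate):
        # The added query facility: select objects by information content.
        return [o for o in self.objects if predicate(o)]

    def save(self):
        with open(self.path, "wb") as f:
            pickle.dump(self.objects, f)

store = ObjectStore("customers.db")
store.persist({"name": "alice", "balance": 120.0})
store.persist({"name": "bob", "balance": -15.5})
print(store.query(lambda o: o["balance"] > 0))   # content-based lookup
store.save()
```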

Storage structures

Databases may store relational tables/indexes in memory or on hard disk in one of a number of forms:

  • ordered/unordered flat files
  • ISAM
  • heap files
  • hash buckets
  • B+ trees

These have various advantages and disadvantages, discussed further in the articles on each topic. The most commonly used are B+ trees and ISAM.
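As a very rough sketch of the ISAM idea (a sorted data file plus a sparse index over it), the toy layout below keeps records sorted by key in small fixed-size pages and uses an in-memory index of first keys to find the right page. Page size and record format are arbitrary assumptions; real engines add overflow areas, caching, and on-disk index pages.

```python
import bisect

PAGE_SIZE = 4  # records per page (tiny, for illustration)

records = sorted((k, f"value-{k}") for k in range(1, 21))   # (key, payload)
pages = [records[i:i + PAGE_SIZE] for i in range(0, len(records), PAGE_SIZE)]
index = [page[0][0] for page in pages]   # sparse index: first key of each page

def lookup(key):
    """Binary-search the sparse index, then scan a single page."""
    page_no = bisect.bisect_right(index, key) - 1
    for k, payload in pages[page_no]:
        if k == key:
            return payload
    return None

print(lookup(7))    # 'value-7'
print(lookup(99))   # None
```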

Object databases use a range of storage mechanisms. Some use virtual memory-mapped files to make the native-language (C++, Java, etc.) objects persistent; this can be highly efficient, but it can make multi-language access more difficult. Others break the objects down into fixed- and varying-length components that are then clustered tightly together in fixed-size blocks on disk and reassembled into the appropriate format, either for the client or in the client address space. Another popular technique involves storing the objects in tuples (much like a relational database), which the database server then reassembles for the client.
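The "fixed- and varying-length components" idea can be sketched as follows: each object is split into a fixed-size header (numeric fields plus the length of the variable part) and a variable-length body, packed into a single byte block and reassembled on read. The field layout here is purely an illustrative assumption.

```python
import struct

HEADER_FMT = "!idI"   # object id (4 bytes), balance (8 bytes), body length (4 bytes)

def pack_object(obj_id, balance, name):
    body = name.encode("utf-8")                                   # varying-length part
    header = struct.pack(HEADER_FMT, obj_id, balance, len(body))  # fixed-length part
    return header + body                                          # one contiguous block

def unpack_object(block):
    header_size = struct.calcsize(HEADER_FMT)
    obj_id, balance, body_len = struct.unpack(HEADER_FMT, block[:header_size])
    name = block[header_size:header_size + body_len].decode("utf-8")
    return {"id": obj_id, "balance": balance, "name": name}

block = pack_object(1, 120.0, "alice")
print(unpack_object(block))   # {'id': 1, 'balance': 120.0, 'name': 'alice'}
```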

Other important design choices relate to the clustering of data by category (such as grouping data by month or location), creating pre-computed views known as materialized views, and partitioning data by range or hash. Memory management and storage topology can be important design choices for database designers as well. Just as normalization is used to reduce storage requirements and improve the extensibility of the database, denormalization is often used to reduce join complexity and query execution time.
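The difference between range and hash partitioning can be shown with a short sketch; the partition count and range boundaries below are arbitrary assumptions.

```python
NUM_PARTITIONS = 4
RANGE_BOUNDS = [100, 200, 300]   # upper bounds of the first three ranges

def hash_partition(key):
    """Hash partitioning: equal keys always land in the same partition."""
    return hash(key) % NUM_PARTITIONS

def range_partition(key):
    """Range partitioning: nearby keys stay together, which helps range scans."""
    for p, bound in enumerate(RANGE_BOUNDS):
        if key < bound:
            return p
    return len(RANGE_BOUNDS)   # final partition catches everything else

keys = [42, 150, 250, 999]
print([hash_partition(k) for k in keys])
print([range_partition(k) for k in keys])   # [0, 1, 2, 3]
```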

Transactions and concurrency

In addition to their data model, most practical databases ("transactional databases") attempt to enforce database transactions. Ideally, the database software should enforce the ACID rules, summarized here (a small sketch of atomicity and rollback follows the list):

  • Atomicity: Either all the tasks in a transaction must happen, or none of them. The transaction must be completed, or else it must be undone (rolled back).
  • Consistency: Every transaction must preserve the integrity constraints — the declared consistency rules — of the database. It cannot leave the data in a contradictory state.
  • Isolation: Two simultaneous transactions cannot interfere with one another. Intermediate results within a transaction must remain invisible to other transactions.
  • Durability: Completed transactions cannot be aborted later or their results discarded. They must persist through (for instance) restarts of the DBMS after crashes.
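The sketch below illustrates atomicity (and, loosely, consistency) with a toy in-memory store that snapshots its state at the start of a transaction and restores it on rollback. It is only a sketch under simplifying assumptions; real systems use write-ahead logging, locks, and durable storage rather than an in-memory snapshot.

```python
class TinyTransactionalStore:
    def __init__(self):
        self.data = {}          # committed state
        self.undo_log = None    # snapshot used for rollback

    def begin(self):
        self.undo_log = dict(self.data)

    def write(self, key, value):
        self.data[key] = value

    def commit(self):
        self.undo_log = None        # keep all changes

    def rollback(self):
        self.data = self.undo_log   # atomicity: discard every partial update
        self.undo_log = None

store = TinyTransactionalStore()
store.data = {"alice": 100, "bob": 50}

store.begin()
store.write("alice", store.data["alice"] - 70)   # debit
try:
    if store.data["alice"] < 0:                  # consistency rule: no negatives
        raise ValueError("negative balance")
    store.write("bob", store.data["bob"] + 70)   # credit
    store.commit()
except ValueError:
    store.rollback()                             # either both updates or neither

print(store.data)   # {'alice': 30, 'bob': 120}
```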

In practice, many DBMSs allow the selective relaxation of most of these rules for better performance.

Concurrency control ensures that transactions execute in a safe manner and follow the ACID rules. The DBMS must be able to ensure that only serializable, recoverable schedules are allowed, and that no actions of committed transactions are lost while undoing aborted transactions.
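One common way to picture this is lock-based concurrency control: each data item has a lock, and a transaction holds every lock it acquires until it finishes (strict two-phase locking), so interleaved transactions behave as if they ran one after another. The sketch below is a simplified illustration under that assumption; it omits shared versus exclusive lock modes and real deadlock detection.

```python
import threading

locks = {"alice": threading.Lock(), "bob": threading.Lock()}
balances = {"alice": 100, "bob": 50}

def transfer(src, dst, amount):
    # Acquire locks in a fixed (sorted) order so this sketch cannot deadlock.
    first, second = sorted([src, dst])
    with locks[first], locks[second]:    # hold both locks until the end
        balances[src] -= amount
        balances[dst] += amount          # locks released here (end of transaction)

t1 = threading.Thread(target=transfer, args=("alice", "bob", 30))
t2 = threading.Thread(target=transfer, args=("bob", "alice", 10))
t1.start(); t2.start(); t1.join(); t2.join()
print(balances)   # total is preserved: {'alice': 80, 'bob': 70}
```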
