MySQL 5.6.14 Source Code Document
For each of these "sent" transactions, there are three possible states:
The poll method invoked (either Ndb::pollNdb() or Ndb::sendPollNdb()) will return when:
The NDB Kernel is the collection of storage nodes belonging to a MySQL Cluster. The application programmer can for most purposes view the set of all storage nodes as a single entity. Each storage node is made up of three main components:
When an application program executes a transaction, it connects to one transaction co-ordinator on one storage node. Usually, the programmer does not need to specify which TC should be used, but in some cases when performance is important, the programmer can provide "hints" to use a certain TC. (If the node with the desired transaction co-ordinator is down, then another TC will automatically take over the work.)
Every storage node has an ACC and a TUP which store the indexes and data portions of the database table fragment. Even though one TC is responsible for the transaction, several ACCs and TUPs on other storage nodes might be involved in the execution of the transaction.
The default method is to select the transaction co-ordinator (TC) determined to be the "closest" storage node, using a heuristic for proximity based on the type of transporter connection. In order of closest to most distant, these are
As noted previously, the application programmer can provide hints to the NDB API as to which transaction co-ordinator it should use. This is done by providing a table and partition key (usually the primary key). By using the primary key as the partition key, the transaction will be placed on the node where the primary replica of that record resides. Note that this is only a hint; the system can be reconfigured at any time, in which case the NDB API will choose a transaction co-ordinator without using the hint. For more information, see NdbDictionary::Column::getPartitionKey() and Ndb::startTransaction(). The application programmer can specify the partition key from SQL by using the construct CREATE TABLE ... ENGINE=NDB PARTITION BY KEY (attribute-list);
The NDB Cluster engine used by MySQL Cluster is a relational database engine storing records in tables just as with any other RDBMS. Table rows represent records as tuples of relational data. When a new table is created, its attribute schema is specified for the table as a whole, and thus each record of the table has the same structure. Again, this is typical of relational databases, and NDB is no different in this regard.
Each record has from 1 up to 32 attributes which belong to the primary key of the table.
Transactions are committed first to main memory, and then to disk after a global checkpoint (GCP) is issued. Since all data is (in most NDB Cluster configurations) synchronously replicated and stored on multiple NDB nodes, the system can still handle processor failures without loss of data. However, in the case of a system failure (for example, the whole system goes down), all transactions (committed or not) occurring since the latest GCP are lost.
NDB Cluster uses pessimistic concurrency control based on locking. If a requested lock (implicit and depending on the database operation) cannot be obtained within a specified time, then a timeout error occurs.
Concurrent transactions as requested by parallel application programs and thread-based applications can sometimes deadlock when they try to access the same information simultaneously. Thus, applications need to be written in a manner so that timeout errors occurring due to such deadlocks are handled gracefully. This generally means that the transaction encountering a timeout should be rolled back and restarted.
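The roll-back-and-restart pattern described above can be sketched as a bounded retry loop. This is only a minimal illustration, not NDB API code: TxnResult, run_with_retry and the attempt callback are hypothetical names introduced here, and the callback is assumed to start the transaction, execute it, and roll it back itself before reporting a timeout.

```cpp
#include <functional>

// Hypothetical outcome of one attempt at running a transaction.
enum class TxnResult { Committed, Timeout, Failed };

// Calls `attempt` until it reports something other than Timeout, or until
// `max_attempts` tries have been made. `attempt` is assumed to roll back its
// own transaction before returning Timeout, so each retry starts fresh.
inline TxnResult run_with_retry(const std::function<TxnResult()>& attempt,
                                int max_attempts) {
    TxnResult last = TxnResult::Failed;  // returned if max_attempts <= 0
    for (int i = 0; i < max_attempts; ++i) {
        last = attempt();
        if (last != TxnResult::Timeout) {
            return last;  // committed, or failed for a non-retryable reason
        }
    }
    return last;  // still timing out after the final attempt
}
```

Bounding the number of attempts keeps an application from spinning indefinitely when two transactions repeatedly deadlock on the same rows; a real application would probably also wait briefly between attempts.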
Placing the transaction co-ordinator in close proximity to the actual data used in the transaction can in many cases improve performance significantly. This is particularly true for systems using TCP/IP. For example, a Solaris system using a single 500 MHz processor has a cost model for TCP/IP communication which can be represented by the formula
[30 microseconds] + ([100 nanoseconds] * [number of bytes])
Because the fixed cost of 30 microseconds per message dominates for small messages, this means that if we can ensure that we use "popular" links, we increase buffering and thus drastically reduce the communication cost. The same system using SCI has a different cost model:
[5 microseconds] + ([10 nanoseconds] * [number of bytes])
Thus, the efficiency of an SCI system is much less dependent on selection of transaction co-ordinators. Typically, TCP/IP systems spend 30-60% of their working time on communication, whereas for SCI systems this figure is closer to 5-10%. Thus, employing SCI for data transport means that less care from the NDB API programmer is required and greater scalability can be achieved, even for applications using data from many different parts of the database.
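The two cost formulas can be compared with a small calculation. The sketch below takes only the constants from the formulas above; the names LinkCost, kTcpIp, kSci, separate_ns and batched_ns are introduced here for illustration, and the model treats buffering as simply merging several messages into one.

```cpp
#include <cstdint>

// Linear cost model: a fixed per-message cost plus a per-byte cost,
// both expressed in nanoseconds.
struct LinkCost {
    std::uint64_t fixed_ns;
    std::uint64_t per_byte_ns;

    constexpr std::uint64_t message_ns(std::uint64_t bytes) const {
        return fixed_ns + per_byte_ns * bytes;
    }
};

// Constants taken from the formulas in the text.
constexpr LinkCost kTcpIp{30000, 100};  // 30 microseconds + 100 ns per byte
constexpr LinkCost kSci{5000, 10};      // 5 microseconds + 10 ns per byte

// Cost of sending `count` messages of `bytes` each, one message at a time.
constexpr std::uint64_t separate_ns(LinkCost link, std::uint64_t count,
                                    std::uint64_t bytes) {
    return count * link.message_ns(bytes);
}

// Cost of sending the same payload buffered into a single message.
constexpr std::uint64_t batched_ns(LinkCost link, std::uint64_t count,
                                   std::uint64_t bytes) {
    return link.message_ns(count * bytes);
}
```

Under this model, ten 100-byte messages cost 400 microseconds over TCP/IP when sent separately but 130 microseconds when buffered into one message; over SCI the figures are 60 and 15 microseconds. The absolute amount saved by buffering is therefore far larger on TCP/IP, which is why co-ordinator placement matters more there.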
A simple example is an application that performs many simple updates, where each transaction needs to update one record. This record has a 32-bit primary key, which is also the partition key. In this case, keyData is the address of the integer holding the primary key, and keyLen is 4.
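The keyData and keyLen values for this case can be derived as in the following sketch. PartitionHint and make_hint are hypothetical helpers introduced here so that the fragment compiles without the NDB API headers; in a real application the two values would be passed to Ndb::startTransaction(), whose exact signature should be checked against the NDB API headers for the version in use.

```cpp
#include <cstdint>

// Hypothetical holder for the two hint arguments described in the text.
struct PartitionHint {
    const char*   keyData;  // address of the partition key value
    std::uint32_t keyLen;   // length of the key in bytes
};

// Builds the hint for a 32-bit primary key that is also the partition key.
// The caller must keep `pk` alive for as long as the hint is used, because
// keyData points at the caller's variable rather than at a copy.
inline PartitionHint make_hint(const std::uint32_t& pk) {
    return PartitionHint{reinterpret_cast<const char*>(&pk),
                         static_cast<std::uint32_t>(sizeof(pk))};
}
```

Taking the key by reference rather than by value is deliberate: it ensures that keyData is the address of the application's own integer, as the text describes.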