MySQL 5.6.14 Source Code Document
|
#include <waiting_threads.h>
#include <m_string.h>
Go to the source code of this file.
Classes | |
struct | st_wt_resource |
struct | deadlock_arg |
Macros | |
#define | incr(VAR, LOCK) do { (VAR)++; } while(0) |
Functions | |
void | wt_init () |
void | wt_end () |
void | wt_thd_lazy_init (WT_THD *thd, const ulong *ds, const ulong *ts, const ulong *dl, const ulong *tl) |
void | wt_thd_destroy (WT_THD *thd) |
my_bool | wt_resource_id_memcmp (const void *a, const void *b) |
int | wt_thd_will_wait_for (WT_THD *thd, WT_THD *blocker, const WT_RESOURCE_ID *resid) |
int | wt_thd_cond_timedwait (WT_THD *thd, mysql_mutex_t *mutex) |
void | wt_thd_release (WT_THD *thd, const WT_RESOURCE_ID *resid) |
Variables | |
ulonglong | wt_wait_table [WT_WAIT_STATS] |
uint32 | wt_wait_stats [WT_WAIT_STATS+1] |
uint32 | wt_cycle_stats [2][WT_CYCLE_STATS+1] |
uint32 | wt_success_stats |
"waiting threads" subsystem - a unified interface for threads to wait on each other, with built-in deadlock detection.
Main concepts ^^^^^^^^^^^^^ a thread - is represented by a WT_THD structure. One physical thread can have only one WT_THD descriptor at any given moment.
a resource - a thread does not wait for other threads directly, instead it waits for a "resource", which is "owned" by other threads. It waits, exactly, for all "owners" to "release" a resource. It does not have to correspond to a physical resource. For example, it may be convenient in certain cases to force resource == thread. A resource is represented by a WT_RESOURCE structure.
a resource identifier - a pair of {resource type, value}. A value is an ulonglong number. Represented by a WT_RESOURCE_ID structure.
a resource type - a pointer to a statically defined instance of WT_RESOURCE_TYPE structure. This structure contains a pointer to a function that knows how to compare values of this resource type. In the simple case it could be wt_resource_id_memcmp().
a wait-for graph - a graph, that represenst "wait-for" relationships. It has two types of nodes - threads and resources. There are directed edges from a thread to a resource it is waiting for (WT_THD::waiting_for), from a thread to resources that it "owns" (WT_THD::my_resources), and from a resource to threads that "own" it (WT_RESOURCE::owners)
Graph completeness ^^^^^^^^^^^^^^^^^^
For flawless deadlock detection wait-for graph must be complete. It means that when a thread starts waiting it needs to know all its blockers, and call wt_thd_will_wait_for() for every one of them. Otherwise two phenomena should be expected:
Fuzzy timeouts:
thread A needs to get a lock, and is blocked by a thread B. it waits. Just before the timeout thread B releases the lock. thread A is ready to grab the lock but discovers that it is also blocked by a thread C. It waits and times out.
As a result thread A has waited two timeout intervals, instead of one.
Unreliable cycle detection:
Thread A waits for threads B and C Thread C waits for D Thread D wants to start waiting for A
one can see immediately that thread D creates a cycle, and thus a deadlock is detected.
But if thread A would only wait for B, and start waiting for C when B would unlock, thread D would be allowed to wait, a deadlock would be only detected when B unlocks or somebody times out.
These two phenomena don't affect a correctness, and strictly speaking, the caller is not required to call wt_thd_will_wait_for() for all blockers - it may optimize wt_thd_will_wait_for() calls. But they may be perceived as bugs by users, it must be understood that such an optimization comes with its price.
Usage ^^^^^
First, the wt* subsystem must be initialized by calling wt_init(). In the server you don't need to do it, it's done in mysqld.cc.
Similarly, wt_end() frees wt* structures, should be called at the end, but in the server mysqld.cc takes care of that.
Every WT_THD should be initialized with wt_thd_lazy_init(). After that they can be used in other wt_thd_* calls. Before discarding, WT_THD should be free'd with wt_thd_destroy(). In the server both are handled in sql_class.cc, it's an error to try to do it manually.
To use the deadlock detection one needs to use this thread's WT_THD, call wt_thd_will_wait_for() for every thread it needs to wait on, then call wt_thd_cond_timedwait(). When thread releases a resource it should call wt_thd_release() (or wt_thd_release_all()) - it will notify (send a signal) threads waiting in wt_thd_cond_timedwait(), if appropriate.
Just like with pthread's cond_wait, there could be spurious wake-ups from wt_thd_cond_timedwait(). A caller is expected to handle that (that is, to re-check the blocking criteria).
wt_thd_will_wait_for() and wt_thd_cond_timedwait() return either WT_OK or WT_DEADLOCK. Additionally wt_thd_cond_timedwait() can return WT_TIMEOUT. Out of memory and other fatal errors are reported as WT_DEADLOCK - and a transaction must be aborted just the same.
Configuration ^^^^^^^^^^^^^ There are four config variables. Two deadlock search depths - short and long - and two timeouts. Deadlock search is performed with the short depth on every wt_thd_will_wait_for() call. wt_thd_cond_timedwait() waits with a short timeout, performs a deadlock search with the long depth, and waits with a long timeout. As most deadlock cycles are supposed to be short, most deadlocks will be detected at once, and waits will rarely be necessary.
These config variables are thread-local. Different threads may have different search depth and timeout values.
Also, deadlock detector supports different killing strategies, the victim in a deadlock cycle is selected based on the "weight". See "weight" description in waiting_threads.h for details. It's up to the caller to set weights accordingly.
Status ^^^^^^ We calculate the number of successfull waits (WT_OK returned from wt_thd_cond_timedwait()), a number of timeouts, a deadlock cycle length distribution - number of deadlocks with every length from 1 to WT_CYCLE_STATS, and a wait time distribution - number of waits with a time from 1 us to 1 min in WT_WAIT_STATS intervals on a log e scale.
Sample usage as was done in the Maria engine ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
while (keyinfo->ck_insert(info, (*keyinfo->make_key)(info, &int_key, i, buff, record, filepos, info->trn->trid))) { / * we got a write error * / if error is not "duplicate key" then return error; info->dup_key_trid has the culprit: if the culprit is ourselves then return error; otherwise: blocker= trnman_trid_to_trn(info->trn, info->dup_key_trid); / * if blocker TRN was not found, it means that the conflicting transaction was committed long time ago. It could not be aborted, as it would have to wait on the key tree lock to remove the conflicting key it has inserted. / if (!blocker || blocker->commit_trid != ~(TrID)0) { / * committed * / if (blocker) pthread_mutex_unlock(& blocker->state_lock); rw_unlock(&keyinfo->root_lock); goto err; } / * release root_lock to let blocker finish its work * / rw_unlock(&keyinfo->root_lock); { / * running. now we wait * / WT_RESOURCE_ID rc; int res; const char *old_proc_info; rc.type= &ma_rc_dup_unique; rc.value= (intptr)blocker; res= wt_thd_will_wait_for(info->trn->wt, blocker->wt, & rc); if (res != WT_OK) { pthread_mutex_unlock(& blocker->state_lock); my_errno= HA_ERR_LOCK_DEADLOCK; goto err; } old_proc_info= proc_info_hook(0, "waiting for a resource", __func__, __FILE__, __LINE__); res= wt_thd_cond_timedwait(info->trn->wt, & blocker->state_lock); proc_info_hook(0, old_proc_info, __func__, __FILE__, __LINE__); pthread_mutex_unlock(& blocker->state_lock); if (res != WT_OK) { my_errno= res == WT_TIMEOUT ? HA_ERR_LOCK_WAIT_TIMEOUT : HA_ERR_LOCK_DEADLOCK; goto err; } / * if we come here, blocker has rolled back or committed, so is gone, we can retry the ck_insert() * / } rw_wrlock(&keyinfo->root_lock);#ifndef MARIA_CANNOT_ROLLBACK keyinfo->version++; #endif }
Tests ^^^^^ unittest/mysys/waiting_threads-t.c, currently disabled.
Definition in file waiting_threads.c.
my_bool wt_resource_id_memcmp | ( | const void * | a, |
const void * | b | ||
) |
Trivial resource id comparison function - bytewise memcmp.
It can be used in WT_RESOURCE_TYPE structures where bytewise comparison of values is sufficient.
Definition at line 651 of file waiting_threads.c.
int wt_thd_cond_timedwait | ( | WT_THD * | thd, |
mysql_mutex_t * | mutex | ||
) |
called by a waiter (thd) to start waiting
It's supposed to be a drop-in replacement for pthread_cond_timedwait(), and it takes mutex as an argument.
Definition at line 1139 of file waiting_threads.c.
void wt_thd_lazy_init | ( | WT_THD * | thd, |
const ulong * | ds, | ||
const ulong * | ts, | ||
const ulong * | dl, | ||
const ulong * | tl | ||
) |
Lazy WT_THD initialization
Cheap initialization of WT_THD. Only initialize fields that don't require memory allocations - basically, it only does assignments. The rest of the WT_THD structure will be initialized on demand, on the first use. This allows one to initialize lazily all WT_THD structures, even if some (or even most) of them will never be used for deadlock detection.
ds | a pointer to deadlock search depth short value |
ts | a pointer to deadlock timeout short value |
dl | a pointer to deadlock search depth long value |
tl | a pointer to deadlock timeout long value |
Definition at line 594 of file waiting_threads.c.
void wt_thd_release | ( | WT_THD * | thd, |
const WT_RESOURCE_ID * | resid | ||
) |
called by a blocker when it releases a resource
it's conceptually similar to pthread_cond_broadcast, and must be done under the same mutex as wt_thd_cond_timedwait().
resid | a resource to release. 0 to release all resources |
Definition at line 1213 of file waiting_threads.c.
notify the system that a thread needs to wait for another thread
called by a waiter to declare that it (thd) will wait for another thread (blocker) on a specific resource (resid). can be called many times, if many blockers own a blocking resource. but must always be called with the same resource id - a thread cannot wait for more than one resource at a time.
As a new edge is added to the wait-for graph, a deadlock detection is performed for this new edge.
Definition at line 1011 of file waiting_threads.c.
uint32 wt_cycle_stats[2][WT_CYCLE_STATS+1] |
distribution of cycle lengths first column tells whether this was during short or long detection
Definition at line 294 of file waiting_threads.c.
uint32 wt_wait_stats[WT_WAIT_STATS+1] |
wait time distribution (log e scale)
Definition at line 289 of file waiting_threads.c.
ulonglong wt_wait_table[WT_WAIT_STATS] |
preset table of wait intervals
Definition at line 285 of file waiting_threads.c.