158 lines
6.2 KiB
Plaintext
158 lines
6.2 KiB
Plaintext
|
Generic Mutex Subsystem
|
||
|
|
||
|
started by Ingo Molnar <mingo@redhat.com>
|
||
|
updated by Davidlohr Bueso <davidlohr@hp.com>
|
||
|
|
||
|
What are mutexes?
|
||
|
-----------------
|
||
|
|
||
|
In the Linux kernel, mutexes refer to a particular locking primitive
|
||
|
that enforces serialization on shared memory systems, and not only to
|
||
|
the generic term referring to 'mutual exclusion' found in academia
|
||
|
or similar theoretical text books. Mutexes are sleeping locks which
|
||
|
behave similarly to binary semaphores, and were introduced in 2006[1]
|
||
|
as an alternative to these. This new data structure provided a number
|
||
|
of advantages, including simpler interfaces, and at that time smaller
|
||
|
code (see Disadvantages).
|
||
|
|
||
|
[1] http://lwn.net/Articles/164802/
|
||
|
|
||
|
Implementation
|
||
|
--------------
|
||
|
|
||
|
Mutexes are represented by 'struct mutex', defined in include/linux/mutex.h
|
||
|
and implemented in kernel/locking/mutex.c. These locks use a three
|
||
|
state atomic counter (->count) to represent the different possible
|
||
|
transitions that can occur during the lifetime of a lock:
|
||
|
|
||
|
1: unlocked
|
||
|
0: locked, no waiters
|
||
|
negative: locked, with potential waiters
|
||
|
|
||
|
In its most basic form it also includes a wait-queue and a spinlock
|
||
|
that serializes access to it. CONFIG_SMP systems can also include
|
||
|
a pointer to the lock task owner (->owner) as well as a spinner MCS
|
||
|
lock (->osq), both described below in (ii).
|
||
|
|
||
|
When acquiring a mutex, there are three possible paths that can be
|
||
|
taken, depending on the state of the lock:
|
||
|
|
||
|
(i) fastpath: tries to atomically acquire the lock by decrementing the
|
||
|
counter. If it was already taken by another task it goes to the next
|
||
|
possible path. This logic is architecture specific. On x86-64, the
|
||
|
locking fastpath is 2 instructions:
|
||
|
|
||
|
0000000000000e10 <mutex_lock>:
|
||
|
e21: f0 ff 0b lock decl (%rbx)
|
||
|
e24: 79 08 jns e2e <mutex_lock+0x1e>
|
||
|
|
||
|
the unlocking fastpath is equally tight:
|
||
|
|
||
|
0000000000000bc0 <mutex_unlock>:
|
||
|
bc8: f0 ff 07 lock incl (%rdi)
|
||
|
bcb: 7f 0a jg bd7 <mutex_unlock+0x17>
|
||
|
|
||
|
|
||
|
(ii) midpath: aka optimistic spinning, tries to spin for acquisition
|
||
|
while the lock owner is running and there are no other tasks ready
|
||
|
to run that have higher priority (need_resched). The rationale is
|
||
|
that if the lock owner is running, it is likely to release the lock
|
||
|
soon. The mutex spinners are queued up using MCS lock so that only
|
||
|
one spinner can compete for the mutex.
|
||
|
|
||
|
The MCS lock (proposed by Mellor-Crummey and Scott) is a simple spinlock
|
||
|
with the desirable properties of being fair and with each cpu trying
|
||
|
to acquire the lock spinning on a local variable. It avoids expensive
|
||
|
cacheline bouncing that common test-and-set spinlock implementations
|
||
|
incur. An MCS-like lock is specially tailored for optimistic spinning
|
||
|
for sleeping lock implementation. An important feature of the customized
|
||
|
MCS lock is that it has the extra property that spinners are able to exit
|
||
|
the MCS spinlock queue when they need to reschedule. This further helps
|
||
|
avoid situations where MCS spinners that need to reschedule would continue
|
||
|
waiting to spin on mutex owner, only to go directly to slowpath upon
|
||
|
obtaining the MCS lock.
|
||
|
|
||
|
|
||
|
(iii) slowpath: last resort, if the lock is still unable to be acquired,
|
||
|
the task is added to the wait-queue and sleeps until woken up by the
|
||
|
unlock path. Under normal circumstances it blocks as TASK_UNINTERRUPTIBLE.
|
||
|
|
||
|
While formally kernel mutexes are sleepable locks, it is path (ii) that
|
||
|
makes them more practically a hybrid type. By simply not interrupting a
|
||
|
task and busy-waiting for a few cycles instead of immediately sleeping,
|
||
|
the performance of this lock has been seen to significantly improve a
|
||
|
number of workloads. Note that this technique is also used for rw-semaphores.
|
||
|
|
||
|
Semantics
|
||
|
---------
|
||
|
|
||
|
The mutex subsystem checks and enforces the following rules:
|
||
|
|
||
|
- Only one task can hold the mutex at a time.
|
||
|
- Only the owner can unlock the mutex.
|
||
|
- Multiple unlocks are not permitted.
|
||
|
- Recursive locking/unlocking is not permitted.
|
||
|
- A mutex must only be initialized via the API (see below).
|
||
|
- A task may not exit with a mutex held.
|
||
|
- Memory areas where held locks reside must not be freed.
|
||
|
- Held mutexes must not be reinitialized.
|
||
|
- Mutexes may not be used in hardware or software interrupt
|
||
|
contexts such as tasklets and timers.
|
||
|
|
||
|
These semantics are fully enforced when CONFIG DEBUG_MUTEXES is enabled.
|
||
|
In addition, the mutex debugging code also implements a number of other
|
||
|
features that make lock debugging easier and faster:
|
||
|
|
||
|
- Uses symbolic names of mutexes, whenever they are printed
|
||
|
in debug output.
|
||
|
- Point-of-acquire tracking, symbolic lookup of function names,
|
||
|
list of all locks held in the system, printout of them.
|
||
|
- Owner tracking.
|
||
|
- Detects self-recursing locks and prints out all relevant info.
|
||
|
- Detects multi-task circular deadlocks and prints out all affected
|
||
|
locks and tasks (and only those tasks).
|
||
|
|
||
|
|
||
|
Interfaces
|
||
|
----------
|
||
|
Statically define the mutex:
|
||
|
DEFINE_MUTEX(name);
|
||
|
|
||
|
Dynamically initialize the mutex:
|
||
|
mutex_init(mutex);
|
||
|
|
||
|
Acquire the mutex, uninterruptible:
|
||
|
void mutex_lock(struct mutex *lock);
|
||
|
void mutex_lock_nested(struct mutex *lock, unsigned int subclass);
|
||
|
int mutex_trylock(struct mutex *lock);
|
||
|
|
||
|
Acquire the mutex, interruptible:
|
||
|
int mutex_lock_interruptible_nested(struct mutex *lock,
|
||
|
unsigned int subclass);
|
||
|
int mutex_lock_interruptible(struct mutex *lock);
|
||
|
|
||
|
Acquire the mutex, interruptible, if dec to 0:
|
||
|
int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
|
||
|
|
||
|
Unlock the mutex:
|
||
|
void mutex_unlock(struct mutex *lock);
|
||
|
|
||
|
Test if the mutex is taken:
|
||
|
int mutex_is_locked(struct mutex *lock);
|
||
|
|
||
|
Disadvantages
|
||
|
-------------
|
||
|
|
||
|
Unlike its original design and purpose, 'struct mutex' is larger than
|
||
|
most locks in the kernel. E.g: on x86-64 it is 40 bytes, almost twice
|
||
|
as large as 'struct semaphore' (24 bytes) and tied, along with rwsems,
|
||
|
for the largest lock in the kernel. Larger structure sizes mean more
|
||
|
CPU cache and memory footprint.
|
||
|
|
||
|
When to use mutexes
|
||
|
-------------------
|
||
|
|
||
|
Unless the strict semantics of mutexes are unsuitable and/or the critical
|
||
|
region prevents the lock from being shared, always prefer them to any other
|
||
|
locking primitive.
|