LWLock in PostgreSQL

The last article introduced SpinLock in PostgreSQL. The LWLock that this article will introduce is a lightweight lock (Lightweight Lock) based on SpinLock.

1. What is LWLock？

From the notes of PG 10.5, LWLock mainly provides mutually exclusive access to shared memory variables, such as Clog buffer (transaction commit status cache), Shared buffers (data page cache), Substran buffer (sub-transaction cache) and so on.

The data structure of LWLock is defined as follows:

typedef struct LWLock
{
	uint16		tranche;		/* tranche ID */
	pg_atomic_uint32 state;		/* state of exclusive/nonexclusive lockers */
	proclist_head waiters;		/* list of waiting PGPROCs */
#ifdef LOCK_DEBUG
	pg_atomic_uint32 nwaiters;	/* number of waiters */
	struct PGPROC *owner;		/* last exclusive owner of the lock */
#endif
} LWLock;

From the code point of view, the mutually exclusive access of LWLock relies on the variable pg_atomic_uint32 state, which is an atomic variable implemented inside PG. As mentioned in the previous article, the atomic operation in PG also depends on SpinLock. tranche: equivalent to the id of LWLock, uniquely marking a LWLock, mainly used to find a certain LWLock to observe its status. waiters: Record the process numbers waiting to obtain LWLog. nwaiters: The number of processes waiting, related to debugging. owner: The process of obtaining LWLock last time, related to debugging.

We know that the application scenario of the lock is multi-concurrent control access, that is, different processes/threads will concurrently access the lock. The contemporary CPU architecture basically uses a multi-level architecture for memory access: L1 Cache -> L2 Cache -> L3 Cache -> Memory -> Disk. Among them, access to L1 Cache is the fastest, followed by L2 Cache (data is no longer in L1, go to L2 to find, and so on), and the following storage access speeds are in turn. For more information about CPU cache, readers can read this wiki. In addition, different CPUs have their own local L1 Cache and L2 Cache. Therefore, when a thread on a CPU updates the data on the L1 Cache, it needs to communicate between the CPUs to update the Cache on other CPUs. Otherwise, "dirty reads" will appear.

Therefore, considering the memory access architecture of contemporary computers, PG has optimized the data structure of LWLock, as shown in the following code:

#define LWLOCK_PADDED_SIZE	PG_CACHE_LINE_SIZE  // 128
#define LWLOCK_MINIMAL_SIZE (sizeof(LWLock) <= 32 ? 32 : 64)

/* LWLock, padded to a full cache line size */
typedef union LWLockPadded
{
	LWLock		lock;
	char		pad[LWLOCK_PADDED_SIZE];
} LWLockPadded;

/* LWLock, minimally padded */
typedef union LWLockMinimallyPadded
{
	LWLock		lock;
	char		pad[LWLOCK_MINIMAL_SIZE];
} LWLockMinimallyPadded;

In fact, there is no high-level optimization ==, that is, padding is done when LWLock is allocated. Among them, LWLockPadded ensures that the allocation size of LWLock occupies one or more cache lines to prevent false sharing and affect performance. In most cases now in PG, when LWLock is allocated, the data type LWLockPadded is used. LWLockMinimallyPadded (currently, only shared buffers will use this type of LWLock) to ensure that LWLock allocation will not cross cache lines, but will only be on one cache line, because if two cache lines are crossed, it will be read/updated. Two cache lines are required, which affects performance.

False sharing is the thread t1 and t2 on two different CPUs. Among them, t1 changes a data x on a cache line in the local cache, and t2 changes another data y on the same local cache line. And when t1 needs to access another data z (or x) on this cache line, according to the logic of the current CPU Cache failure, it needs to reload the data of this cache line from the shared memory (this process Is very slow), but in fact the z on the cache line has not been modified by any other thread.
Why ensure that LWLock does not cross the cache line of 32 or 64? Because the actual size of LWLock is about 16 bytes, and PG considers the size of the cache line to be 128 bytes, the actual cache line size is also 32 bytes/64 bytes/128 bytes, so if there is no padding, follow it The size is aligned and may cross cache lines. And allocation according to 32 or 64 byte alignment can be guaranteed within a cache line.

2. Some details

In PG, LWLock is divided into modules (called named tranches). Each module has its own independent LWLock (one or more). To access the shared variables of a module, use the LWLock of the corresponding module. The advantages of this are: 1. It can reduce the conflict of locking. Each module uses its own lock to access the shared variables of the module; 2. It is convenient to track and debug the locked state, so that it is convenient for development and users to observe which module is locked. More serious, so as to optimize the code logic or business logic.

View lock waiting commands: SELECT pid, wait_event_type, wait_event FROM pg_stat_activity; When the wait_event_type is WAIT_LWLOCK_NAMED and WAIT_LWLOCK_TRANCHE, it is the lock type waiting of LWLock. Among them, WAIT_LWLOCK_NAMED is The backend is waiting for a specific named lightweight lock. Each such lock protects a particular data structure in shared memory.wait_eventwill contain the name of the lightweight lock., an ID will only have 1 LWLock, a total of 45, such as ShmemIndexLock, OidGenLock, etc. So in the LWLock of the following submodule, the start ID starts from NUM_INDIVIDUAL_LWLOCKS (46). LWLockTranche: The backend is waiting for one of a group of related lightweight locks. All locks in the group perform a similar function; wait_eventwill identify the general purpose of locks in that group. There are usually multiple locks corresponding to this ID.
Currently, the locking sub-modules of LWLock include:

/*
 * Every tranche ID less than NUM_INDIVIDUAL_LWLOCKS is reserved; also,
 * we reserve additional tranche IDs for builtin tranches not included in
 * the set of individual LWLocks.  A call to LWLockNewTrancheId will never
 * return a value less than LWTRANCHE_FIRST_USER_DEFINED.
 */
typedef enum BuiltinTrancheIds
{
	LWTRANCHE_CLOG_BUFFERS = NUM_INDIVIDUAL_LWLOCKS,
	LWTRANCHE_COMMITTS_BUFFERS,
	LWTRANCHE_SUBTRANS_BUFFERS,
	LWTRANCHE_MXACTOFFSET_BUFFERS,
	LWTRANCHE_MXACTMEMBER_BUFFERS,
	LWTRANCHE_ASYNC_BUFFERS,
	LWTRANCHE_OLDSERXID_BUFFERS,
	LWTRANCHE_WAL_INSERT,
	LWTRANCHE_BUFFER_CONTENT,
	LWTRANCHE_BUFFER_IO_IN_PROGRESS,
	LWTRANCHE_REPLICATION_ORIGIN,
	LWTRANCHE_REPLICATION_SLOT_IO_IN_PROGRESS,
	LWTRANCHE_PROC,
	LWTRANCHE_BUFFER_MAPPING,
	LWTRANCHE_LOCK_MANAGER,
	LWTRANCHE_PREDICATE_LOCK_MANAGER,
	LWTRANCHE_PARALLEL_QUERY_DSA,
	LWTRANCHE_TBM,
	LWTRANCHE_FIRST_USER_DEFINED
}			BuiltinTrancheIds;

2.1 Initialization of LWLock

When PG initializes shared memory and semaphore, it will initialize LWLock array (CreateLWLocks). The specific things to do are: 1. Calculate the memory space of shared memory that LWLock needs to occupy: Calculate the number of fixed and each sub-module (requested named tranches) LWLock (fixed when the system is initialized, the LWLock that needs to be allocated is: buffer_mapping, lock_manager, predicate_lock_manager, parallel_query_dsa, tbm), the size of each LWLock (LWLOCK_PADDED_SIZE+counter, couter is a lock calculator, used to record how many share locks are added), and the information occupied by the submodule. 2. Allocate memory space and align with the cache line. 3. Call LWLockInitialize for each LWLock in turn to initialize, and set the state of LWLock to LW_FLAG_RELEASE_OK. 4. Use the LWLockRegisterTranche function to register all the sub-modules that initialized LWLock, including the pre-defined system (BuiltinTrancheIds) and user-defined ones.

User-defined LWLock needs to use the RequestNamedLWLockTranche function. In the current PG10.5 code, the LWLock of the pg_stat_statements module is customized. This module is used to monitor the statistics of SQL execution. Custom LWLock is generally used for PG extension. RequestNamedLWLockTranche needs to be called in the _PG_init function. Otherwise, once the shared memory is allocated (the LWLock phase above must also be executed), then the custom lock will not be allocated.

/*
 * LWLockInitialize - initialize a new lwlock; it's initially unlocked
 */
void
LWLockInitialize(LWLock *lock, int tranche_id)
{
	pg_atomic_init_u32(&lock->state, LW_FLAG_RELEASE_OK);
#ifdef LOCK_DEBUG
	pg_atomic_init_u32(&lock->nwaiters, 0);
#endif
	lock->tranche = tranche_id;
	proclist_init(&lock->waiters);
}

/*
 * Register a tranche ID in the lookup table for the current process.  This
 * routine will save a pointer to the tranche name passed as an argument,
 * so the name should be allocated in a backend-lifetime context
 * (TopMemoryContext, static variable, or similar).
 */
void
LWLockRegisterTranche(int tranche_id, char *tranche_name)
{
	Assert(LWLockTrancheArray != NULL);

	if (tranche_id >= LWLockTranchesAllocated)
	{
		int			i = LWLockTranchesAllocated;
		int			j = LWLockTranchesAllocated;

		while (i <= tranche_id)
			i *= 2;

		LWLockTrancheArray = (char **)
			repalloc(LWLockTrancheArray, i * sizeof(char *));
		LWLockTranchesAllocated = i;
		while (j < LWLockTranchesAllocated)
			LWLockTrancheArray[j++] = NULL;
	}

	LWLockTrancheArray[tranche_id] = tranche_name;
}

/*
 * RequestNamedLWLockTranche
 *		Request that extra LWLocks be allocated during postmaster
 *		startup.
 *
 * This is only useful for extensions if called from the _PG_init hook
 * of a library that is loaded into the postmaster via
 * shared_preload_libraries.  Once shared memory has been allocated, calls
 * will be ignored.  (We could raise an error, but it seems better to make
 * it a no-op, so that libraries containing such calls can be reloaded if
 * needed.)
 */
void
RequestNamedLWLockTranche(const char *tranche_name, int num_lwlocks)
{
	NamedLWLockTrancheRequest *request;

	if (IsUnderPostmaster || !lock_named_request_allowed)
		return;					/* too late */

	if (NamedLWLockTrancheRequestArray == NULL)
	{
		NamedLWLockTrancheRequestsAllocated = 16;
		NamedLWLockTrancheRequestArray = (NamedLWLockTrancheRequest *)
			MemoryContextAlloc(TopMemoryContext,
							   NamedLWLockTrancheRequestsAllocated
							   * sizeof(NamedLWLockTrancheRequest));
	}

	if (NamedLWLockTrancheRequests >= NamedLWLockTrancheRequestsAllocated)
	{
		int			i = NamedLWLockTrancheRequestsAllocated;

		while (i <= NamedLWLockTrancheRequests)
			i *= 2;

		NamedLWLockTrancheRequestArray = (NamedLWLockTrancheRequest *)
			repalloc(NamedLWLockTrancheRequestArray,
					 i * sizeof(NamedLWLockTrancheRequest));
		NamedLWLockTrancheRequestsAllocated = i;
	}

	request = &NamedLWLockTrancheRequestArray[NamedLWLockTrancheRequests];
	Assert(strlen(tranche_name) + 1 < NAMEDATALEN);
	StrNCpy(request->tranche_name, tranche_name, NAMEDATALEN);
	request->num_lwlocks = num_lwlocks;
	NamedLWLockTrancheRequests++;
}

2.2 Use of LWLock

The use of LWLock is the same as other locks, mainly divided into three behaviors: lock, lock and wait.

Locking: Use LWLockAcquire(LWLock *lock, LWLockMode mode) to lock, where the mode can be LW_SHARED (shared) and LW_EXCLUSIVE (exclusive). When locking, first put the lock that needs to be added into the waiting queue, and then judge whether the lock can be successfully locked through the state state in LWLock. If the lock can be successfully locked, use the atomic operation campare and set to modify the state of LWLock, and remove the lock from Waiting to be deleted from the queue. Otherwise, you need to wait for the lock. You can also use LWLockConditionalAcquire(LWLock *lock, LWLockMode mode) to acquire the lock. The difference from LWLockAcquire is that if you can't get it and return directly, it won't sleep and wait. LWLockAcquireOrWait function, if the lock is unsuccessful, it will wait forever, but if the lock becomes free, it will not be locked but will return directly; currently this function is used in WALWriteLock, when a backend needs to flush WAL, it will Add WALWriteLock, and then the WAL generated by other backends will be flushed incidentally. Therefore, other backends waiting for the lock to flush WAL do not actually need to flush WAL.
Release lock: Use LWLockRelease(LWLock *lock) to release a given lock. Find the given lock from the LWLockRelease(LWLock *lock)LWLock acquired by the current proc, and determine whether there is another process waiting for the lock. If so, call LWLockWakeup to wake up the backend waiting for the lock.
Wait lock: The wait lock logic is also in the LWLockAcquire function. When the lock is not added, it will wait for a semaphore proc->sem (it will sleep at this time and will not consume the CPU), because the semaphore regular lock manager and ProcWaitForSignal will be used, when the semaphore is acquired at this time , Is not necessarily issued by LWLockRelease, so if it is not issued by LWLockRelease, you need to wait for the semaphore, if it is, then re-enable the above lock.
In addition to LW_SHARED and LW_EXCLUSIVE, the lock mode also has a mode of LW_WAIT_UNTIL_FREE. This mode is only available through LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
to use, the scenario is: if the lock is held by other backends (the mode needs to be LW_EXCLUSIVE. If the LW_SHARED mode will not block, return directly), then wait for the corresponding backend to release the lock , Or the backend holding the lock updated valptr through the LWLockUpdateVar function (the lock will not be released) (if valptr is updated, this value will be assigned to newval). In PG10.5, LWLockWaitForVar is only used in XLog insertion: when flushing WAL to disk, you need to call WaitXLogInsertionsToFinish to flush the disk at a certain point. At this time, you have to wait for the current insertion that may be in progress to complete. Therefore, Here, LWLockWaitForVar will be called. If the lock is in the free state and LWLockWaitForVar returns true, it means that there is no waiting for the lock or no insert is in progress, so the position of the insert has not changed. Just take out the position written by the last Xlog. If the lock is in the exclusive state, indicating that a backend is being inserted, LWLockWaitForVar will or obtain the new position after the xlog is inserted, and then return.

3. Comparison of LWLock and SpinLock

As can be seen from the above introduction, LWLock uses atomic operations (based on SpinLock implementation) for mutual exclusive access, so LWLock in PG is implemented based on SpinLock.

LWLock provides two modes: share and exclusive, while SpinLock has only one mode, that is exclusive. Obviously, if a variable needs to be accessed frequently in share mode, then using LWLock is a better choice.
LWLock is wait-free, which means that LWLock basically does not consume CPU resources when it needs to wait for the lock. For this reason, LWLock implements a waiting queue that can reduce atomic operations to determine the state , Thereby reducing the competition overhead during atomic operations. SpinLock consumes a lot of CPU resources while waiting for the lock state.
Based on the second point, LWLock can be applied to shared variables that will be locked for a long time, while SpinLock lock operation variables must be very short, otherwise it will cause a lot of overhead .

Reference

https://www.postgresql.org/docs/9.6/monitoring-stats.html

Transfer from

LWLock in PostgreSQL

Intelligent Recommendation

<Meta name = "viewport" ... ＞ What does it mean

1. Name attribute value ViewPort represents the view port, which not only matches the PC end, but also can match the mobile terminal. 2. Content attribute value: (1) Width: The width of the visible ar...

C language math function ceil(), floor(), round()

C language math function ceil(), floor(), round() joePosted @ April 24, 2010 17:07 in programwith tags C language, 1279 reading usage: Ceil(x) returns the smallest integer value (not converted to doub...

Python-Data Type: List List

Python-Data Type: List List 0 foreword 1 important special new list 1 list of methods for list 2 Create a list 3 List transformation with other data types 4 list unpack 5 list slice 6 list add 7 Add l...

HTML video audio

HTML<video>label The tag defines the video, such as movie clips or other video streams. Currently,<video> Element supports three video formats: MP4, WebM, Ogg. Tips and notes: Tip: You can...

Salesforce Batchable

Foreword Salesforce development language APEX is relatively simple relative to other programming languages, which are not complicated. Based on APEX does not provide a thread management and use, the q...