Rich Text Editor, editor1 https://erp5.nexedi.net/ckeditor.gadget.html
1/22/21, 12:35 AM

A brief overview of cache replacement algorithms

Cache replacement algorithms define which item a cache evicts when it is full. They are critical for performance because they drive the cache's hit ratio up or down. A good cache replacement algorithm keeps the most used items and gets rid of the unused ones.

Belady's Algorithm

Named after the Hungarian computer scientist László Bélády. Also called the "optimal" algorithm, it is a clairvoyant algorithm that evicts the item that will not be needed for the longest time in the future. This algorithm is not implementable because it is impossible for a computer to predict the future. It is mainly used as a benchmark, to compare how well your algorithm performs against a perfect one.

FIFO (First In First Out)

Just a simple fixed-size queue used as a cache. When the cache is full, the item that was inserted the longest time ago is evicted (just dequeue the underlying queue). An advantage is that it is very easy to implement, especially if you already have a queue library.

FILO (First In Last Out)

Just a simple fixed-size stack used as a cache. When the cache is full, the item that was inserted most recently is evicted (just pop the underlying stack). An advantage is that it is very easy to implement, especially if you already have a stack library.

LRU (Least Recently Used)

The item that has not been accessed for the longest time is evicted. There are also multiple variations on the principle of this algorithm, such as 2Q, LRU-K and MQ. The downside is that it is more complicated to implement: you need a way to efficiently track when each item was accessed, and a naive scan to find the least recently used item is O(n).

LRU is also very bad at cyclic access patterns.
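Both the bookkeeping and the cyclic weakness are easy to see in a minimal sketch. Assuming a Python implementation built on OrderedDict (class and method names here are illustrative, not from the text), a hash map plus an ordered structure brings the per-access cost down to O(1):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU sketch: the OrderedDict keeps keys ordered from least
    to most recently used, so lookup, refresh and eviction are all O(1)."""

    def __init__(self, size):
        self.size = size
        self.items = OrderedDict()

    def access(self, key):
        """Access `key`; return True on a cache hit."""
        if key in self.items:
            self.items.move_to_end(key)      # now the most recently used
            return True
        if len(self.items) >= self.size:
            self.items.popitem(last=False)   # evict the least recently used
        self.items[key] = None               # values omitted for brevity
        return False

# A cyclic pattern one element larger than the cache never hits:
cache = LRUCache(3)
hits = sum(cache.access(k) for k in [1, 2, 3, 4] * 2)
print(hits)  # 0
```

Note that FIFO falls out of the same structure if you simply skip the move_to_end refresh on hits.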
Imagine your cache has a size of 3 and an access pattern of 1 2 3 4 1 2 3 4; after the first 3 elements, the cache will always be evicting an element that it will need right after, ending up with a hit rate of 0. LRU is best for applications with high access locality (that access the same data often).

PLRU (Pseudo LRU)

The computational cost of LRU is high because you have to keep track of the accesses of each item. PLRU solves this by implementing a sort of binary tree which yields, in O(log n) time, one of the least recently used items. This algorithm isn't perfect, as it doesn't always evict the best victim, but it has a lower complexity than LRU.

LFU (Least Frequently Used)

Counts how many times each item is used; when the cache is full, the least used item is evicted. Be careful: while this might seem like the best algorithm, it could evict the currently least used item even though that item might be needed right after its eviction.

LFUDA (LFU Dynamic Aging)

The issue with LFU is that an item that was used a lot in the past will have a high use count and will therefore be hard to evict from the cache. If that item is never used again, it will rot in the cache. LFUDA brings down the use count of items as time passes while they are not accessed, so that items that were highly accessed in the past but not anymore eventually get evicted.

SLRU (Segmented LRU)

The cache is split into two segments, the probationary segment and the protected segment. Items enter the cache through the probationary segment and upon a cache hit are moved into the protected segment. Victims are chosen using an LRU algorithm from the probationary segment; if it is empty, victims are chosen from the protected segment instead. The advantage over LRU is that SLRU better protects frequently used items, because they must be accessed twice to reach the protected segment: once to get into the cache, a second time to be promoted.
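Assuming Python and illustrative segment sizes, the two-segment flow can be sketched like this (demoting the protected segment's LRU victim back into the probationary segment on promotion is one common design choice, not something the text above fixes):

```python
from collections import OrderedDict

class SLRUCache:
    """SLRU sketch: two LRU segments, keys ordered least to most recently used."""

    def __init__(self, probationary_size, protected_size):
        self.probationary = OrderedDict()  # entered once, not yet re-accessed
        self.protected = OrderedDict()     # accessed at least twice
        self.probationary_size = probationary_size
        self.protected_size = protected_size

    def access(self, key):
        """Access `key`; return True on a cache hit."""
        if key in self.protected:
            self.protected.move_to_end(key)
            return True
        if key in self.probationary:       # hit: promote to protected
            del self.probationary[key]
            if len(self.protected) >= self.protected_size:
                # demote the protected LRU victim back to probationary
                victim, _ = self.protected.popitem(last=False)
                self._insert_probationary(victim)
            self.protected[key] = None
            return True
        self._insert_probationary(key)     # miss: enter through probationary
        return False

    def _insert_probationary(self, key):
        if len(self.probationary) >= self.probationary_size:
            self.probationary.popitem(last=False)  # evict probationary LRU
        self.probationary[key] = None

cache = SLRUCache(probationary_size=2, protected_size=2)
cache.access("a")   # miss: "a" enters the probationary segment
cache.access("a")   # hit: "a" is promoted to the protected segment
```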
TLRU (Time Aware Least Recently Used)

TLRU was invented by Muhammad Bilal and Shin-Gak Kang in 2014 as a caching algorithm optimized for content distribution. This algorithm introduces the notion of TTU (Time to Use), which represents the lifetime of the item. It is assigned to all items from the start. The cache then calculates a cache-local TTU using the item's TTU and a function predefined in the cache. Using this TTU it can choose which victim to evict from the cache, or decide whether it should even store the item in the cache at all. The idea is that an item with a lower TTU will be invalidated sooner, so you might as well get rid of it. There are multiple advantages: first, you can give a TTU of 0 to an item you never want to see in a cache. Also, the function predefined in the cache could be anything, so for example you can make larger items have a lower/higher TTU so that they are kept in the cache less/more.

MRU (Most Recently Used)

Evict the item that was most recently used, on the assumption that the items that will be needed next are the oldest ones. This algorithm is very good when you have cyclic access patterns, as you are most likely to encounter again the items you saw the longest time ago (when restarting the cycle).

RR (Random Replacement)

Evict a random victim from the cache. It costs very little to implement and, for this reason, it has been used in ARM processors.

LFRU (LFU + LRU)

This algorithm is close to SLRU but uses LRU for protected segment eviction and LFU for probationary segment eviction.

LIRS (Low Inter-reference Recency Set)

LIRS ranks items by reuse distance (the number of distinct items accessed between two consecutive accesses to the same item) rather than by pure recency, which lets it avoid LRU's pathological cases; see the LIRS paper in the bibliography for details.

Clock

This algorithm was invented in 1969 by Fernando J. Corbató for Multics. Clock is an approximation of LRU with a much lower cost. It can be represented using a circular buffer (the clock), a pointer inside the circular buffer (the hand of the clock) and a reference bit on each element inside the circular buffer.
Let's see how it works step by step (the original diagrams are not reproduced here). Imagine our cache's circular buffer has a size of 3; an asterisk in the diagrams meant that an element's reference bit was set.

We have the following access pattern: 0 4 1 4 2 4 3 4 2 4 0 4 1 4 2 4 3 4

0: place 0 where the hand is and advance the hand
4: place 4 where the hand is and advance the hand
1: place 1 where the hand is and advance the hand
4: 4 is already in the cache, just set its reference bit
2: 2 is not in the cache, replace the element the hand points to (0) with 2 and advance the hand
4: 4 is already in the cache and its reference bit is already set, do nothing
3: 3 is not in the cache; 4 has its reference bit set, so it is spared but its bit is cleared and the hand advances; 1 is then replaced with 3
4: 4 is already in the cache, just set its reference bit
2: 2 is already in the cache, just set its reference bit
4: 4 is already in the cache and its reference bit is already set, do nothing
0: 2 has its reference bit set, clear it and advance the hand; 4 has its reference bit set, clear it and advance the hand; replace 3 with 0 and advance the hand
4: 4 is already in the cache, just set its reference bit
1: replace 2 with 1 and advance the hand
4: 4 is already in the cache with its reference bit set, do nothing
2: 4 has its reference bit set, clear it and advance the hand; replace 0 with 2 and advance the hand
4: 4 is already in the cache, just set its reference bit
3: replace 1 with 3 and advance the hand
4: 4 is already in the cache with its reference bit set, do nothing

Pseudocode for Clock:

    Let data be the data currently accessed.
    while 1:
        if data is already in the cache:
            set its reference bit
            break
        if hand points to slot with reference bit cleared or slot is empty:
            set slot pointed by hand to data
            advance hand
            break
        else: /* hand points to slot with reference bit set */
            clear reference bit of slot pointed by hand
            advance hand

Clock has the same weak point as LRU: a perfectly cyclic access pattern such as 1234 1234 1234 could end up with a 0% hit ratio. But with a slightly cyclic access pattern such as 1 2 3 4 1 3 4 1 2 3 4 1, we get a hit ratio of 25% with LRU and 33.3% with Clock.

Clock Pro

CLOCK-Pro extends Clock with ideas borrowed from LIRS, tracking reuse distance so that pages accessed only once do not push out frequently reused pages; see the Clock-Pro paper and the LWN article in the bibliography.

MQ (Multi-Queue)

MQ was invented by Zhou, Philbin, and Li to improve second-level caches. MQ uses multiple LRU queues organized hierarchically. All items start in the lowest queue (i=0) and move to the queue above (i+1) once they reach 2^i accesses. When items are evicted, their reference along with their access count is stored in the Qout queue. When the Qout queue is full, it evicts references LRU style.
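To round off the Clock discussion: the pseudocode above translates almost line for line into Python (class and method names here are mine). Replaying the walkthrough's access pattern on a 3-slot cache reproduces its final state, with 2, 3 and 4 cached:

```python
class ClockCache:
    """Clock sketch: a circular buffer of slots, one reference bit per slot,
    and a hand that sweeps the buffer looking for a victim."""

    def __init__(self, size):
        self.slots = [None] * size    # the circular buffer (the clock face)
        self.bits = [False] * size    # one reference bit per slot
        self.hand = 0                 # the clock hand

    def access(self, data):
        """Access `data`; return True on a cache hit."""
        if data in self.slots:        # hit: just set the reference bit
            self.bits[self.slots.index(data)] = True
            return True
        while True:                   # miss: sweep until a victim is found
            if self.slots[self.hand] is None or not self.bits[self.hand]:
                self.slots[self.hand] = data   # empty or unreferenced slot
                self.bits[self.hand] = False
                self.hand = (self.hand + 1) % len(self.slots)
                return False
            # referenced: spare the slot, clear its bit, advance the hand
            self.bits[self.hand] = False
            self.hand = (self.hand + 1) % len(self.slots)

pattern = [0, 4, 1, 4, 2, 4, 3, 4, 2, 4, 0, 4, 1, 4, 2, 4, 3, 4]
cache = ClockCache(3)
hits = sum(cache.access(x) for x in pattern)
print(hits, sorted(cache.slots))  # 9 [2, 3, 4]
```

The linear `in`/`index` scans keep the sketch short; a real implementation would pair the buffer with a hash map from item to slot.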
Bibliography:

Cache Replacement Policies - https://en.wikipedia.org/wiki/Cache_replacement_policies
TLRU (2017) - https://arxiv.org/pdf/1801.00390.pdf
PLRU Wikipedia - https://en.wikipedia.org/wiki/Pseudo-LRU
LIRS Wikipedia - https://en.wikipedia.org/wiki/LIRS_caching_algorithm
LIRS Paper - http://web.cse.ohio-state.edu/hpcs/WWW/HTML/publications/abs02-6.html
A paging experiment with Multics (Clock) - https://www.multicians.org/paging-experiment.pdf
CLOCK-PRO LWN - https://lwn.net/Articles/147879/
Page Replacement LinuxMM - https://linux-mm.org/PageReplacementDesign
Clock-Pro Paper - http://web.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-05-3.pdf
Clock-Pro approximation by LinuxMM - https://linux-mm.org/ClockProApproximation
ClockPro Implementation by NetBSD - http://fxr.watson.org/fxr/source/uvm/uvm_pdpolicy_clockpro.c