Keywords: microprocessor, latency, cache coherence, bandwidth, multiprocessor, cache coherence protocol, shared memory, multicore processor
Many multiprocessor chips and computer systems today include hardware support for shared memory, because shared-memory multicore chips are considered a cost-effective way to deliver increased computing power: they are built from economically interconnected, low-cost microprocessors. Shared-memory multiprocessors use caches to reduce memory access latency and to significantly reduce the bandwidth demands placed on the global interconnect and on local memory modules. However, these systems still face the cache coherence problem, introduced by local caching of data, which can cause stale values to be read and thus reduce effective processor execution speed. In today's microprocessors, the hardware cache coherence problem is mitigated through the implementation of various cache coherence protocols. This article reviews the literature on cache coherence, with particular attention to the cache coherence problem and the protocols, both hardware and software, that have been proposed to solve it. Most importantly, it identifies a specific problem associated with cache coherence and proposes a novel solution.
[...] The figure illustrates a cache coherence problem. Initially, memory location x holds the value 0, and processors 0 and 1 both read location x into their caches. If processor 0 then writes the value 1 to location x, processor 1's cache still holds the value 0 for x. If processor 1 subsequently keeps reading location x, the cached, stale value 0 will be returned each time. This is not what a programmer normally expects, since the anticipated behavior is that any read returns the most recently written copy of the data. [...]
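The stale-read scenario above can be sketched in a few lines. This is a minimal, hypothetical simulation (the `Cache` class and its methods are illustrative, not from the article): two private caches share a backing memory but have no coherence mechanism, so a write by one processor is invisible to the other's cached copy.

```python
class Cache:
    """A trivially simple private cache with no coherence support."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}  # address -> locally cached value

    def read(self, addr):
        if addr not in self.lines:        # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]           # hit: return the cached copy

    def write(self, addr, value):
        self.lines[addr] = value          # update own cache and memory,
        self.memory[addr] = value         # but never notify peer caches


memory = {"x": 0}
p0, p1 = Cache(memory), Cache(memory)

p0.read("x")             # both processors cache x = 0
p1.read("x")
p0.write("x", 1)         # p0 writes 1; memory now holds 1
print(p1.read("x"))      # p1 still returns the stale value 0
```

Even though memory holds the new value, p1's read hits its private cache and returns 0, which is exactly the incoherence the figure describes.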
[...] When a cache observes a read request on the bus, it checks whether it holds the most up-to-date copy of the datum; if so, it responds to the bus request. When the cache observes a write on the bus and holds the corresponding line, it invalidates that line. Building snoopy bus-based systems is relatively easy (Ferdman et al., 2012). However, as the number of processors on a bus increases, the bus becomes a bandwidth bottleneck, which in turn makes dependence on broadcast techniques a scalability nightmare. Snoopy protocols are nevertheless commonly used in commercial multicore processors. [...]
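The snooping behavior described above can be sketched as a small state machine. This is an illustrative sketch assuming the classic MSI states (Modified, Shared, Invalid), which the article does not name explicitly; the `Bus` and `SnoopyCache` classes are hypothetical. Every bus transaction is broadcast, and each cache snoops it: a write on the bus invalidates matching lines elsewhere, and a read downgrades a Modified owner to Shared.

```python
MODIFIED, SHARED, INVALID = "M", "S", "I"


class Bus:
    """Broadcast medium: every transaction is seen by all attached caches."""

    def __init__(self):
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast(self, op, addr, origin):
        for c in self.caches:
            if c is not origin:
                c.snoop(op, addr)


class SnoopyCache:
    def __init__(self, bus):
        self.state = {}              # address -> MSI state
        self.bus = bus
        bus.attach(self)

    def read(self, addr):
        if self.state.get(addr, INVALID) == INVALID:
            self.bus.broadcast("BusRd", addr, origin=self)
            self.state[addr] = SHARED      # miss serviced; now Shared
        return self.state[addr]

    def write(self, addr):
        self.bus.broadcast("BusWr", addr, origin=self)
        self.state[addr] = MODIFIED        # exclusive, dirty copy

    def snoop(self, op, addr):
        if op == "BusWr" and addr in self.state:
            self.state[addr] = INVALID     # another cache wrote: invalidate
        elif op == "BusRd" and self.state.get(addr) == MODIFIED:
            self.state[addr] = SHARED      # supply data, downgrade to Shared


bus = Bus()
c0, c1 = SnoopyCache(bus), SnoopyCache(bus)
c0.read("x"); c1.read("x")               # both caches hold x in Shared
c0.write("x")                            # broadcast invalidates c1's copy
print(c0.state["x"], c1.state["x"])      # prints: M I
```

The `broadcast` loop is also where the scalability problem lives: every write visits every cache, so bus traffic grows with the processor count.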
[...] The first part describes cache coherence and analyzes the cache coherence problem. The second part reviews the literature on cache coherence protocols and describes the various protocols. The third part compares and contrasts the protocols discussed in part two with regard to their advantages and disadvantages. Additionally, a specific cache coherence problem is identified and a suitable solution is proposed. A conclusion then summarizes the subject of cache coherence and recommends, on the basis of the trade-offs discussed, which cache coherence protocol is ideal. [...]
[...] The values in the two caches diverge because of a subsequent store. If p1 stores to a memory block that is present in both caches, the copy in p2's cache becomes stale, since p1 by default stores into its own cache. If p2 never loaded from that memory block again, or if the multiprocessor did not support shared memory, this incoherence would not be a problem. However, multiprocessors do support shared memory, and so there is a point at which p2 must receive the value p1 stored. [...]
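The propagation requirement above, that p2 must eventually receive the value p1 stored, is what a write-invalidate policy provides. The sketch below is hypothetical (the `InvalidatingCache` class is illustrative, not from the article): on a store, the writer removes the block from every peer cache, so the peer's next load misses and fetches the new value.

```python
class InvalidatingCache:
    """Write-through private cache that invalidates peer copies on a store."""

    peers = []  # all caches sharing the same memory

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}
        InvalidatingCache.peers.append(self)

    def load(self, addr):
        if addr not in self.lines:           # miss: fetch the current value
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def store(self, addr, value):
        self.lines[addr] = value
        self.memory[addr] = value            # write through to memory
        for peer in InvalidatingCache.peers:
            if peer is not self:
                peer.lines.pop(addr, None)   # invalidate stale peer copies


memory = {"b": 0}
p1, p2 = InvalidatingCache(memory), InvalidatingCache(memory)
p2.load("b")             # p2 caches b = 0
p1.store("b", 42)        # the store invalidates p2's cached copy
print(p2.load("b"))      # p2 misses and receives p1's value: 42
```

Contrast this with the earlier incoherent example: the only change is the invalidation loop in `store`, and that one step is what the coherence protocols surveyed here exist to implement efficiently.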
[...] When a snoopy cache coherence protocol is used, the speed of the shared bus and the coherence overhead limit the bandwidth available for broadcasting messages. On the other hand, when a directory-based cache coherence protocol is used, the extra interconnect traversal and the directory access lie on the critical path of cache-to-cache misses, where the directory state must be consulted and updated. Neither class of protocol addresses both of these problems. As such, on the basis of the review offered herein, it would be ideal to design a hybrid cache coherence protocol that combines the two approaches. [...]
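The directory-side cost described above, an extra interconnect traversal on the critical path of a cache-to-cache miss, can be made concrete with a sketch. This is an assumed, simplified directory organization (the `Directory` and `DirCache` classes and the hop counting are illustrative, not from the article): a read miss first travels to the home directory, and if another cache owns the block, the request is forwarded to that owner, adding hops that a snooping bus would not incur.

```python
class Directory:
    """Home directory: tracks, per address, the owning cache and sharers."""

    def __init__(self):
        self.owner = {}     # address -> owning cache (or absent)
        self.sharers = {}   # address -> set of sharing caches

    def read_miss(self, addr, requester, memory):
        hops = 1                          # requester -> directory
        owner = self.owner.get(addr)
        if owner is not None:             # cache-to-cache miss:
            value = owner.lines[addr]     # directory forwards to the owner
            hops += 2                     # directory -> owner -> requester
        else:
            value = memory[addr]
            hops += 1                     # directory -> requester (from memory)
        self.sharers.setdefault(addr, set()).add(requester)
        return value, hops


class DirCache:
    def __init__(self, directory, memory):
        self.directory, self.memory = directory, memory
        self.lines = {}

    def load(self, addr):
        if addr not in self.lines:
            value, hops = self.directory.read_miss(addr, self, self.memory)
            self.lines[addr] = value
            return value, hops
        return self.lines[addr], 0        # hit: no interconnect traffic

    def store(self, addr, value):
        self.lines[addr] = value
        self.directory.owner[addr] = self  # directory records the new owner


memory = {"x": 0}
d = Directory()
c0, c1 = DirCache(d, memory), DirCache(d, memory)
c0.store("x", 7)                 # c0 becomes the owner of x
value, hops = c1.load("x")       # miss is indirected through the directory
print(value, hops)               # 7, delivered after 3 interconnect traversals
```

The three-hop path (requester, directory, owner) is the indirection latency a hybrid design would try to avoid, while still avoiding the broadcast traffic of snooping.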