The gap between processor and memory speeds continues to widen with each new technology generation, and Chip Multiprocessors (CMPs) have further increased the pressure on the memory hierarchy. It has therefore become important to manage on-chip memory judiciously to reduce average memory access time. Present cache hierarchies almost always enforce the inclusion property, one implication of which is that a higher-level cache is always a subset of the lower-level cache. This replication of data in the cache hierarchy reduces the effective cache space available. This paper proposes a non-inclusive cache implementation in which any data item can be present at only one level of the cache hierarchy. The aggregate cache space available is therefore larger and more data can be kept on chip, reducing off-chip communication. In CMPs, multiple cores may need to share the same block of data; in this case the block may be replicated across the cache hierarchies of multiple cores, but a block is never replicated within the cache hierarchy of a single core. The advantage of implementing the inclusion property is ease of maintaining coherence, since coherence has to be maintained only at the last-level cache. This paper therefore also proposes a coherence protocol, a slight modification of existing protocols, to maintain coherence in non-inclusive caches.
Keywords: Cache memory, Inclusion property, Chip Multiprocessing, Multi-level cache hierarchy, Cache Coherence
[...] A hardware implementation of this scheme requires one saturating counter per block in the entire cache hierarchy, which is a significant hardware cost; this paper proposes two alternatives to this approach with lower hardware requirements. This paper also proposes a coherence protocol to maintain coherence in non-inclusive caches.
Fig.: Data cache miss rate of the L2 cache for different SPEC programs for inclusive and non-inclusive (Basic, AP, CS) caches.
L. Hsu proposed organizing the last level of cache as a shared cache to optimize on-chip cache utilization. [...]
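The per-block reuse tracking mentioned above can be illustrated with a small saturating counter. This is an illustrative sketch only: the counter width, update events, and promotion threshold are assumptions, not the paper's actual hardware design, which the paper notes is costly and replaces with cheaper alternatives.

```python
class SaturatingCounter:
    """A 2-bit saturating counter tracking per-block reuse (sketch).

    Width and threshold are assumed values for illustration; the
    paper's hardware scheme keeps one such counter per cache block.
    """

    def __init__(self, bits=2):
        self.max = (1 << bits) - 1  # saturates at 3 for 2 bits
        self.value = 0

    def increment(self):
        # Block was referenced again: count the reuse, saturating at max.
        self.value = min(self.value + 1, self.max)

    def decrement(self):
        # Block aged without reuse: decay the counter, floor at 0.
        self.value = max(self.value - 1, 0)

    def is_hot(self, threshold=2):
        # Candidate for promotion once reuse crosses the threshold.
        return self.value >= threshold
```

A block's counter is incremented on each reference and decayed periodically, so only blocks with sustained reuse report as "hot".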
[...] The proposed architecture removes this redundancy by placing data at only one level of the cache hierarchy; this type of architecture is called a non-inclusive architecture. In an inclusive cache architecture, whenever a cache miss occurs the new block of data is brought directly to the highest cache level. The block just brought in may or may not be used in the future as frequently as the block it replaces. It would therefore be beneficial not to bring a new block directly to the highest cache level on its very first reference, because it may never be referenced again. [...]
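The fill policy described above can be sketched as follows: on a miss, the block is installed only in the lowest cache level, and a later hit there promotes it one level closer to the core. The set-based "caches" and promotion-by-one-level rule are simplifying assumptions for illustration, not the paper's actual implementation.

```python
def access(caches, addr):
    """Sketch of a non-inclusive fill/promotion policy (assumed model).

    caches: list of sets of block addresses; index 0 is the highest
    level (closest to the core), index -1 the lowest. A block lives
    at exactly one level at a time, so there is no replication.
    """
    for level, cache in enumerate(caches):
        if addr in cache:
            if level > 0:
                # Reuse observed: move the block one level closer
                # to the core (it exists at only one level at a time).
                cache.remove(addr)
                caches[level - 1].add(addr)
            return ("hit", level)
    # First reference: fill only the lowest level, not the highest,
    # so a never-reused block cannot displace hot blocks near the core.
    caches[-1].add(addr)
    return ("miss", len(caches) - 1)
```

A first reference to a block thus misses and fills the lowest level; only repeated references walk it up toward L1.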
[...] Franklin, "Non-Inclusion Property in Multi-level Caches Revisited", International Journal of Computers and Their Applications, June 2007.
J. Chang and G. S. Sohi, "Cooperative Caching for Chip Multiprocessors", Proc. of the International Symposium on Computer Architecture, Boston.
M. Zhang and K. Asanovic, "Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors", Proc. of the 32nd Annual International Symposium on Computer Architecture, Madison.
B. M. Beckmann and D. A. Wood, "Adaptive Selective Replication for CMP Caches", Proc. of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, Orlando.
L. [...]
[...] With a non-inclusive cache, the total cache space available in a uniprocessor will be A+B+C KB. In a CMP, a block may need to be shared by several cores, so it has to be replicated across the cache hierarchies of multiple cores. With private L1 and L2 caches and a shared L3, the total cache space available in a CMP with N cores will be N(A+B)+C KB when no block is shared between cores, and A+B+C KB when all the cores hold the same data in their L1 and L2 caches. [...]
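The capacity argument above reduces to simple arithmetic. The helper below encodes one reading of it, assuming private per-core L1 (A KB) and L2 (B KB) caches and a shared L3 (C KB); the two extremes are no inter-core replication versus full replication.

```python
def effective_capacity_kb(A, B, C, N, fully_shared):
    """Aggregate unique on-chip data (KB) in a non-inclusive hierarchy.

    Assumed interpretation of the capacity argument in the text:
    A, B are per-core private L1/L2 sizes, C is a shared L3, N is the
    core count. fully_shared=True models every core caching the same
    blocks in its private hierarchy.
    """
    if fully_shared:
        # Every private hierarchy replicates the same data, so the
        # unique content is one core's A+B plus the shared L3.
        return A + B + C
    # No replication across cores: every private byte is distinct.
    return N * (A + B) + C
```

For example, with A=32, B=256, C=2048 and N=4, the aggregate ranges from 2336 KB (full replication) up to 3200 KB (no sharing).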
[...] If the requested block is in the shared mode and present in the private cache hierarchies of several processors, the L3 directory will fetch the requested block from one of the sharers and transfer it to the requesting processor. The requesting processor will be added to the list of sharers, and the block will be marked as shared in the L2 directories of these processors. The L2 directory checks at which level the block to be invalidated is present and then invalidates the block. [...]
Fig.: Proposed coherence protocol flow chart.