A single-core cache hierarchy simulator written in python.
The goal is to accurately simulate the caching (allocation/hit/miss/replace/evict) behavior of all cache levels found in modern processors. It is developed as a backend to kerncraft, but is also planned to introduce a command line interface to replay LOAD/STORE instructions.
- Currently supported features:
- Inclusive cache hierarchies
- LRU, MRU, RR and FIFO policies
- N-way cache associativity
- Write-allocate with write-back caches
- Non-write-allocate with write-through caches
- Write-combining with sub-blocking
- Tracking of cacheline states (e.g., using dirty bits)
- Speed (core is implemented in C)
- Python 2.7+ and 3.4+ support, with no other dependencies
- Planned features:
- Report cachelines on all levels (preliminary support through
backend.verbosity > 0
) - Report timeline of cache events (preliminary support through
backend.verbosity > 0
) - Visualize events (html file?)
- Interface to Valgrind Infrastructure (see Lackey) for access history replay.
- (uncertain) instruction cache
- Optional classification into compulsory/capacity and conflict misses (by simulating other cache configurations in parallel)
- (uncertain) multi-core support
- Report cachelines on all levels (preliminary support through
pycachesim is licensed under AGPLv3.
from cachesim import CacheSimulator, Cache, MainMemory
mem = MainMemory()
l3 = Cache("L3", 20480, 16, 64, "LRU") # 20MB: 20480 sets, 16-ways with cacheline size of 64 bytes
mem.load_to(l3)
mem.store_from(l3)
l2 = Cache("L2", 512, 8, 64, "LRU", store_to=l3, load_from=l3) # 256KB
l1 = Cache("L1", 64, 8, 64, "LRU", store_to=l2, load_from=l2) # 32KB
cs = CacheSimulator(l1, mem)
cs.load(2342) # Loads one byte from address 2342, should be a miss in all cache-levels
cs.store(512, length=8) # Stores 8 bytes to addresses 512-519,
# will also be a load miss (due to write-allocate)
cs.load(512, length=8) # Loads from address 512 until (exclusive) 520 (eight bytes)
cs.force_write_back()
cs.print_stats()
This should return:
CACHE *******HIT******** *******MISS******* *******LOAD******* ******STORE*******
L1 1 ( 8B) 2 ( 65B) 3 ( 73B) 1 ( 8B)
L2 0 ( 0B) 2 ( 128B) 2 ( 128B) 1 ( 64B)
L3 0 ( 0B) 2 ( 128B) 2 ( 128B) 1 ( 64B)
MEM 2 ( 128B) 0 ( 0B) 2 ( 128B) 1 ( 64B)
Each row refers to one memory-level, starting with L1 and ending with main memory. The 3 loads in L1 are the sum of all individual accesses to the cache-hierarchy. 1 (from first load) + 1 (from store with write-allocate) + 1 (from second load) = 3.
The 1 hit, is for bytes which were cached already. Internally the pycachesim operates on cache-lines, which all addresses get transformed to. Thus, the two misses throughout all cache-levels are actually two complete cache-lines and after the cache-line had been loaded the consecutive access to the same cache-line are handled as hits. That is also the reason why data sizes increase from L1 to L2. L1 is accessed byte-wise and L2 only with cache-line granularity.
So: hits, misses, stores and loads in L1 are byte-wise. Every other statistical information are based on cache-lines.
When using victim caches, setting victims_to to the victim cache level, will cause pycachesim to forward unmodified cache-lines to this level on replacement. During a miss, victims_to is checked for availability and only hit if it the cache-line is found. This means, that load stats will equal hit stats in victim caches and misses should always be zero.
While searching for more versatile cache simulator for kerncraft, I stumbled across the following:
- gem5: Very fully-featured full system simulator. Complex to extract only the memory subsystem
- dineroIV: Nice and simple code, but does not support exclusive caches and not available under open source license.
- cachegrind: Maintained and stable code of a well established open source project, but only supports inclusive first and last level caches.
- callgrind: see cachegrind
- SMPcache: Only supports one single cache and runs on Windows with GUI. Also not freely available.
- CMPsim: Was only academically published and source code never made available.
- CASPER: Was only academically published and source code never made available.
Package | instructions [0] | blocks [1] | sub-blocks [2] | associtivity [3] | LRU [4] | MRU [4] | FIFO [4] | RR [4] | CCC [5] | 3+ levels [6] | exclusive [7] | victim [8] | multi-core [9] | API [10] | open source [11] |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
gem5 | x | x | ? | x | x | x | x | ? | ? | x | ? | ? | ? | python, ruby, c++ | yes, BSD-style |
dineroIV | x | x | x | x | x | x | x | x | x | c | no, free for non-comercial use | ||||
cachegrind | x | x | x | x | cli | yes, GPLv2 | |||||||||
callgrind | x | x | x | x | cli | yes, GPLv2 | |||||||||
SMPcache | x | x | x | x | x | ? | Windows GUI | no, free for education und research | |||||||
CMPsim | x | x | x | x | x | x | x | ? | ? | x | ? | no, source not public | |||
CASPER | x | x | x | x | x | x | x | x | x | x | x | perl, c | no, source not public | ||
pycachesim | x | x | x | x | x | x | x | x | x | x | python, C backend | yes, AGPLv3 |
[0] | Instruction cache support (typically L1I) |
[1] | Cacheline/block granular caching |
[2] | Sub-blocking/sectoring for in cache-storage |
[3] | Support for n-way associativity |
[4] | (1, 2, 3, 4) Support least-recently-used (LRU), most-recently-used (MRU), first-in-last-out (FIFO), random (RR) replacement policy |
[5] | Classification of misses into: compulsory (first time access), capacity (access after replacement), conflict (would have been a hit with full-associativity) |
[6] | Combining of at least three cache levels |
[7] | Exclusive cache relations (two levels may not share the same cacheline) |
[8] | Victim caches, where only evicted lines endup(e.g., AMD Bulldozer L3) |
[9] | Multi-core cache hierarchies with private and shared caches and cache coherency protocol |
[10] | Supported interfaces (cli = command-line-interface) |
[11] | Published under an Open Source Initiative approved license? |