Knowledge base on how to set up the ECM model inputs
- LIKWID
- LaTeX and TikZ for plotting (Step 3)
- Only for streaming kernels
- No hardware counter measurements included for verification
The script ./bench_scan_size.sh
has to be used to run the benchmark and collect the performance results.
The script takes a run configuration file which defines the likwid-bench
benchmarks to be run, the result folders and the hardware settings.
Some sample configuration files can be found in the run_config
folder.
The benchmark runs streaming kernels with different stream/array sizes.
For example, to run with the settings in the run_config/casclakesp2_config.txt
file, the following can be used:
./bench_scan_size.sh run_config/casclakesp2_config.txt
NB: If a basic performance-relevant hardware setting is not configured as described in the config file, the script exits early with the message "Hardware not configured properly".
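When several machine settings have to be measured, the call above can be repeated for every configuration file. The following is a minimal sketch, not part of the repository: the function name run_all_configs is hypothetical, and it assumes that all configuration files end in .txt like the sample shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the scripts): run the benchmark
# script once for every *.txt configuration file in a folder.
# Assumption: configuration files use the .txt extension, as in the sample.
run_all_configs() {
    local bench_script="$1" config_dir="$2" cfg
    for cfg in "$config_dir"/*.txt; do
        [ -e "$cfg" ] || continue          # skip if the glob matched nothing
        echo "Running $cfg"
        # Stop at the first failing run, e.g. when the hardware check fails.
        "$bench_script" "$cfg" || { echo "Failed: $cfg" >&2; return 1; }
    done
}

# Usage (from the repository root):
#   run_all_configs ./bench_scan_size.sh run_config
```

Stopping at the first failure keeps a misconfigured machine from producing a partial set of results.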
This step is optional. It generates the ECM model plots corresponding to the benchmarks run in the previous step. The ECM generation requires two basic inputs: an application model and a machine model.
The application model defines the properties of the benchmark.
The application model is defined using a file written into the folder ecm_generator/application_model
which indicates the number and type of streams seen in the benchmark.
- Read-only specifies the number of streams/arrays that are only read
- Write-only specifies the number of streams/arrays that are only written
- Read-Write specifies the number of streams/arrays that are both read and written
Examples of the application model files can be found in the ecm_generator/application_model
folder.
The machine model determines the machine capabilities.
It is defined using files written to ecm_generator/machine_model
folder.
The files carry information like:
- CL_size specifies the cacheline size in bytes
- [cache-name]_read_bw specifies the read bandwidth between the given [cache-name] cache level and its higher hierarchy. For example, L1_read_bw indicates the read bandwidth between L1 and registers
- [cache-name]_write_bw specifies the write bandwidth between the given [cache-name] cache level and its higher hierarchy. For example, L1_write_bw indicates the write bandwidth between L1 and registers
- [cache-name]_shared_bw specifies the bandwidth between the given [cache-name] cache level and its higher hierarchy when a different resource becomes a bottleneck. For example, L1_shared_bw indicates the bandwidth when the common (for both reads and writes) address generation unit (AGU) becomes a bottleneck
- [cache-name]_WA indicates whether write-allocate is applicable for the cache
- [cache-name]_VICTIM indicates whether the cache is a victim cache
- [cache-name]_SIZE indicates the cache size in kB
- ECM_hypothesis indicates the overlap hypothesis of the given hardware under a given setting
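To illustrate how the stream counts of the application model combine with the machine model entries, the sketch below estimates the cycles needed to move one cacheline per stream between two adjacent levels. It is not the logic of ecm_generator/ecm.sh. The function name transfer_cycles is hypothetical, and the sketch assumes that bandwidths are given in bytes per cycle and that loads and stores on the link do not overlap.

```shell
#!/usr/bin/env bash
# Hypothetical helper, for illustration only.
# Arguments: read_only write_only read_write cl_size read_bw write_bw wa
#   stream counts   : as in the application model
#   cl_size         : cacheline size in bytes
#   read_bw/write_bw: ASSUMED to be in bytes per cycle
#   wa              : 1 if write-allocate applies, 0 otherwise
transfer_cycles() {
    awk -v ro="$1" -v wo="$2" -v rw="$3" -v cl="$4" \
        -v rbw="$5" -v wbw="$6" -v wa="$7" 'BEGIN {
        # With write-allocate, every write-only stream also loads its line first.
        loads  = ro + rw + (wa ? wo : 0)
        stores = wo + rw
        # ASSUMPTION: loads and stores are serialized on the link.
        printf "%.2f\n", loads * cl / rbw + stores * cl / wbw
    }'
}

# Copy-like kernel: one read-only and one write-only stream, 64-byte lines,
# 64 B/cycle read and 32 B/cycle write bandwidth, write-allocate enabled.
transfer_cycles 1 1 0 64 64 32 1   # prints 4.00
```

With write-allocate disabled (last argument 0) the same call counts only one load, which shows why the [cache-name]_WA entry matters for kernels with write-only streams.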
The script ecm_generator/ecm.sh
generates the ECM model prediction data.
It takes the application model and machine model defined above as input.
For example, to generate the ECM model with the application model ecm_generator/application_model/copy.config
and the machine model ecm_generator/machine_model/casclakesp2_nps1.config
the following command can be used:
cd ecm_generator
./ecm.sh "application_model/copy.config" "machine_model/casclakesp2_nps1.config"
Note that in general only the application model (2.1) and the machine model (2.2) have to be defined. The ECM model generation (2.3) need not be run manually as shown here; it is done automatically when calling the plotting script. See the next section.
The script ./generate_all_plots.sh
runs the ECM script ecm_generator/ecm.sh
(2.3) and combines its predictions with the performance measurements collected in Step 1 to generate the final plots.
The script requires the location of the folder where the performance measurements are stored (as specified through the configuration file in Step 1) and the machine file corresponding to the machine model.
For example, to plot the results collected in results/casclakesp2/nps2/avx512/
with the ECM model corresponding to the machine file ecm_generator/machine_model/casclakesp2/nps2/avx512/casclakesp2_nps2_avx512.config
the following command should be used:
./generate_all_plots.sh results/casclakesp2/nps2/avx512/ ecm_generator/machine_model/casclakesp2/nps2/avx512/casclakesp2_nps2_avx512.config
The plots are then generated in a folder called plots
located in the results directory given as input (results/casclakesp2/nps2/avx512/
in the above example).
Plots (in PDF format) for the individual benchmarks, as well as an overall compiled plot called ecm.pdf,
can be found in the plots
directory.
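Since the output location follows directly from the results folder passed to the plotting script, the path of the compiled plot can be derived in a script. This is a small sketch under that assumption; the function name ecm_pdf_path is hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print where the compiled plot is expected for a
# given results folder (the first argument of ./generate_all_plots.sh).
ecm_pdf_path() {
    local results_dir="${1%/}"   # drop one trailing slash, if present
    printf '%s/plots/ecm.pdf\n' "$results_dir"
}

# Usage:
#   pdf=$(ecm_pdf_path results/casclakesp2/nps2/avx512/)
#   [ -f "$pdf" ] && echo "Found $pdf"
```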