Pipelining allows the stages of various instructions to be executed in parallel.
In gcc, the `__builtin_expect()` function allows the program to give the compiler branch-prediction hints.

| State | Observed | Generated | Next State |
|---|---|---|---|
| Valid | PrRd | ~ | Valid |
| Valid | PrWr | BusWr | Valid |
| Valid | BusWr | ~ | Invalid |
| Invalid | PrWr | BusWr | Valid |
| Invalid | PrRd | BusRd | Valid |

| State | Observed | Generated | Next State |
|---|---|---|---|
| Modified | PrRd | ~ | Modified |
| Modified | PrWr | ~ | Modified |
| Modified | BusRd | BusWB | Shared |
| Modified | BusRdX | BusWB | Invalid |
| Shared | PrRd | ~ | Shared |
| Shared | BusRd | ~ | Shared |
| Shared | BusRdX | ~ | Invalid |
| Shared | PrWr | BusRdX | Modified |
| Invalid | PrRd | BusRd | Shared |
| Invalid | PrWr | BusRdX | Modified |
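The transition table above can be read as a small state machine. The following is a minimal sketch of one way to encode it in C; the type and function names (`msi_state`, `msi_event`, `msi_step`, and so on) are made up for illustration, and the table's `~` is modeled as `MSI_NONE`. Combinations the table does not list, such as Invalid observing BusRd, are assumed here to leave the state unchanged and generate nothing.

```c
/* States, observed events, and generated bus transactions,
 * named after the columns of the table (names are illustrative). */
typedef enum { MSI_INVALID, MSI_SHARED, MSI_MODIFIED } msi_state;
typedef enum { MSI_PRRD, MSI_PRWR, MSI_BUSRD, MSI_BUSRDX } msi_event;
typedef enum { MSI_NONE, MSI_GEN_BUSRD, MSI_GEN_BUSRDX, MSI_GEN_BUSWB } msi_action;

typedef struct {
    msi_state  next;    /* state after the event */
    msi_action action;  /* transaction generated, MSI_NONE for "~" */
} msi_result;

/* Apply one observed event to a cache line in state s. */
msi_result msi_step(msi_state s, msi_event e)
{
    /* Default (assumption for unlisted rows): stay put, generate nothing. */
    msi_result r = { s, MSI_NONE };

    switch (s) {
    case MSI_MODIFIED:
        /* PrRd and PrWr hit locally; bus requests force a write-back. */
        if (e == MSI_BUSRD)  { r.next = MSI_SHARED;  r.action = MSI_GEN_BUSWB; }
        if (e == MSI_BUSRDX) { r.next = MSI_INVALID; r.action = MSI_GEN_BUSWB; }
        break;
    case MSI_SHARED:
        /* Reads keep the line shared; a write needs exclusive ownership. */
        if (e == MSI_BUSRDX) { r.next = MSI_INVALID; }
        if (e == MSI_PRWR)   { r.next = MSI_MODIFIED; r.action = MSI_GEN_BUSRDX; }
        break;
    case MSI_INVALID:
        /* Any processor access must first fetch the line over the bus. */
        if (e == MSI_PRRD) { r.next = MSI_SHARED;   r.action = MSI_GEN_BUSRD; }
        if (e == MSI_PRWR) { r.next = MSI_MODIFIED; r.action = MSI_GEN_BUSRDX; }
        break;
    }
    return r;
}
```

For example, `msi_step(MSI_INVALID, MSI_PRWR)` yields next state `MSI_MODIFIED` and action `MSI_GEN_BUSRDX`, matching the last row of the table. The two-state Valid/Invalid table could be encoded the same way.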

The `volatile` keyword provides the following features:

- Use of variables between `setjmp` and `longjmp`.
- Use of `sig_atomic_t` variables in signal handlers.

The `volatile` keyword does not prevent reordering of instructions.

pthreads fix the issues of `clone` and provide a uniform interface for most systems.

See Lecture 6 - Working with Threads for pthreads.

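The signal-handler use of `volatile` mentioned above can be sketched as follows. This is only an illustration: the names `got_signal`, `on_signal`, and `signal_flag_demo` are hypothetical, and `SIGINT` is chosen arbitrarily because it is defined by standard C. The flag is declared `volatile sig_atomic_t` so that normal code re-reads it from memory after the handler has run; this says nothing about ordering between threads.

```c
#include <signal.h>

/* Written by the handler, read by normal code. `volatile` makes the
 * compiler perform a real load on every read; `sig_atomic_t` is a type
 * that can be accessed atomically with respect to signal delivery. */
static volatile sig_atomic_t got_signal = 0;

/* Handler: only sets the flag, which is safe to do from a handler. */
static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Hypothetical demo helper: installs the handler, raises SIGINT in the
 * calling thread, and returns the flag value observed afterwards
 * (or -1 if the handler could not be installed). */
int signal_flag_demo(void)
{
    got_signal = 0;
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    raise(SIGINT);          /* the handler runs before raise() returns */
    return got_signal;
}
```

Calling `signal_flag_demo()` should return `1`, because `raise()` does not return until the handler has finished.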
| ~ | Read 2nd | Write 2nd |
|---|---|---|
| Read 1st | RAR - No Dependency | WAR - Antidependency |
| Write 1st | RAW - True Dependency | WAW - Output Dependency |
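The table above can also be written as a tiny classifier. This is a sketch with hypothetical names (`access_kind`, `dep_kind`, `classify_dependency`), and it assumes both accesses touch the same memory location and are given in program order.

```c
typedef enum { ACCESS_READ, ACCESS_WRITE } access_kind;
typedef enum { DEP_RAR, DEP_WAR, DEP_RAW, DEP_WAW } dep_kind;

/* Example statement pairs for each kind, with x as the shared location:
 *   RAR: a = x;  b = x;   (no dependency)
 *   WAR: a = x;  x = 1;   (antidependency)
 *   RAW: x = 1;  a = x;   (true dependency)
 *   WAW: x = 1;  x = 2;   (output dependency)
 */
dep_kind classify_dependency(access_kind first, access_kind second)
{
    if (first == ACCESS_READ)
        return second == ACCESS_READ ? DEP_RAR : DEP_WAR;
    return second == ACCESS_READ ? DEP_RAW : DEP_WAW;
}

/* Human-readable label matching the table cells. */
const char *dependency_name(dep_kind d)
{
    switch (d) {
    case DEP_RAR: return "RAR - No Dependency";
    case DEP_WAR: return "WAR - Antidependency";
    case DEP_RAW: return "RAW - True Dependency";
    case DEP_WAW: return "WAW - Output Dependency";
    }
    return "unknown";
}
```

Only the RAR case lets the two accesses be reordered freely; the other three constrain the order in which they may execute.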
See Lecture 8 - Asynchronous I/O for cURL.
See Lecture 9 - Of Asgard Hel for Valgrind.
If `sleep()` is called frequently, then the threads in the lock convoy have an increased probability of making progress.

When using `notify()` to wake a single thread instead of all the threads, it is possible for the `notify()` to be lost.

The gcc flags `-fdump-tree-gimple` and `-fdump-tree-all` can be used to see all the three-address code.

### restrict Qualifier

The `restrict` qualifier on a pointer `p` tells the compiler that it may assume that, in the scope of `p`, the program will not use any other pointer `q` to access the data at `*p`.

The `restrict` qualifier allows a compiler to better optimize code, especially critical loops.

- `icc`: Intel C Compiler.
- `cc`: Solaris Studio Compiler.
- `gcc`: GNU C Compiler - Graphite.
- `clang`: Clang Compiler - polly.

```c
for (i = 0; i < 1000; i++)
    x[i] = i + 3;
```

```c
for (i = 0; i < 100; i++)
    for (j = 0; j < 100; j++)
        x[i][j] = x[i][j] + y[i - 1][j];
```

```c
for (i = 0; i < 10; i++)
    x[2 * i + 1] = x[2 * i];
```

```c
for (j = 0; j <= 10; j++)
    if (j > 5) x[i] = i + 3;
```

```c
for (i = 0; i < 100; i++)
    for (j = i; j < 100; j++)
        x[i][j] = 5;
```
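Whether a loop like these can be run in parallel depends on whether one iteration touches data that another iteration writes. As an illustration, take the third loop, `x[2 * i + 1] = x[2 * i]`: every write goes to an odd index and every read comes from an even index, so no iteration reads or overwrites another iteration's result. The sketch below (helper names and test data are assumptions, not from the lecture) runs the iterations in two different orders and compares the arrays. Agreement on one input is only a sanity check; the index argument is what shows the iterations are independent.

```c
#include <string.h>

#define N 20  /* indices 0..19 are touched by the loop */

/* The loop from the text, iterating upward. */
void copy_even_to_odd_forward(int x[N])
{
    for (int i = 0; i < 10; i++)
        x[2 * i + 1] = x[2 * i];
}

/* Same loop body, iterations executed in reverse order. */
void copy_even_to_odd_reverse(int x[N])
{
    for (int i = 9; i >= 0; i--)
        x[2 * i + 1] = x[2 * i];
}

/* Returns 1 if both iteration orders leave identical arrays. */
int orders_agree(void)
{
    int a[N], b[N];
    for (int k = 0; k < N; k++)
        a[k] = b[k] = k * k;          /* arbitrary test data */
    copy_even_to_odd_forward(a);
    copy_even_to_odd_reverse(b);
    return memcmp(a, b, sizeof a) == 0;
}
```

A loop such as `x[i] = x[i - 1] + 1` would not pass this kind of check in general, because each iteration reads the value written by the previous one.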
See Lecture 13 - OpenMP for OpenMP.
See Lecture 14 - OpenMP Tasks for OpenMP Tasks.
#pragma omp flush [(list)]
See Lecture 15 - Memory Consistency.
- `mfence`: All loads and stores before the barrier become visible before any loads and stores after the barrier become visible.
- `sfence`: All stores before the barrier become visible before all stores after the barrier become visible.
- `lfence`: All loads before the barrier become visible before all loads after the barrier become visible.

See Lectures 17 to 36 for Post-Midterm Content; School Closure because of the Pandemic.