course schedule: lectures
Lecture Schedule
Week 1 [05/11 - 05/17]
[Wed 05/14] Lecture 1: Overview of parallel computation
Online videos
1a. Course objectives and policies [11:32] Watch
1b. Architectural trends [12:58] Watch
1c. Types of parallelism [12:03] Watch
1d. Flynn taxonomy [8:28] Watch
1e. Scope of CSC/ECE 506 [7:09] Watch
In-class exercises
Number of transistors on a chip Submit
Multicore/manycore processor info Submit
Top 500 observation Submit
[Fri 05/16] Lecture 2: Three parallel-programming models
Online videos
2a. Three parallel programming models [5:14] Watch
2b. The shared address-space model [7:32] Watch
2c. The message-passing model [7:08] Watch
2d. Interconnection networks [8:52] Watch
For your knowledge, not on quiz: Spaced repetition in learning Watch
Week 2 [05/18 - 05/24]
[Mon 05/19] Lecture 3: GPU architecture
Online videos
3a. Introduction to heterogeneous parallel computing [16:54] Watch
3b. Portability and scalability in heterogeneous parallel computing [9:12] Watch
3c. Introduction to CUDA, data parallelism and threads [23:15] Watch
The testing effect Watch
[Wed 05/21] Lecture 4: Shared-memory parallel programming
Online videos
4a. Amdahl's law [9:30] Watch
4b. Steps in parallelization [4:43] Watch
4c. Loop-dependence analysis [5:16] Watch
4d. Loop-independent vs. loop-carried dependences [8:58] Watch
[Fri 05/23] Lecture 5: Dependences, DOACROSS, DOPIPE
Online videos
5a. Finding parallel tasks across iterations [4:30] Watch
5b. DOACROSS parallelism [4:03] Watch
5c. Function parallelism [3:37] Watch
5d. DOPIPE parallelism [3:24] Watch
Week 3 [05/25 - 05/31]
[Wed 05/28] Lecture 6: Variable scope
Online videos
6a. Determining variable scope [8:59] Watch
6b. Privatization [4:59] Watch
6c. Reduction [2:50] Watch
6d. Summary of scope criteria [7:32] Watch
6e. Synchronization [5:03] Watch
[Fri 05/30] Lecture 7: Parallelizing the Ocean application
Online videos
7a. Parallelization in the Ocean simulation [5:03] Watch
7b. The serial solver [3:25] Watch
7c. Decomposition of the serial algorithm [6:30] Watch
7d. Assignment of elements to processes [2:46] Watch
7e. The data-parallel and shared-memory orchestrations [9:29] Watch
7f. The message-passing orchestration [5:19] Watch
Week 4 [06/01 - 06/07]
[Mon 06/02] Lecture 8: Data-parallel algorithms
Online videos
8a. Control parallelism vs. data parallelism [4:53] Watch
8b. Building blocks for data parallelism [13:16] Watch
8c. Pointer doubling [10:14] Watch
8d. Multiplying matrices [5:00] Watch
8e. Labeling regions in an image [8:01] Watch
[Wed 06/04] Lecture 9: Parallelizing linked data structures
Online videos
9a. Parallel access to linked data structures [3:58] Watch
9b. Correctness of parallel LDS operations [8:26] Watch
9c. Three approaches to parallelization [12:28] Watch
[Fri 06/06] Lecture 10: Caches
Online videos
10a. Intro and direct-mapped caches [16:05] Watch
10b. Fully associative caches [9:03] Watch
10c. Address translation and set-associative caches [12:39] Watch
10d. Multilevel caches and the principle of inclusion [8:09] Watch
Week 5 [06/08 - 06/14]
[Mon 06/09] Test 1
[Wed 06/11] Lecture 11: Physical and logical cache organization
Online videos
11a. Translation lookaside buffers [4:30] Watch
11b. Virtually vs. physically indexed caches [7:20] Watch
11c. Inclusive, exclusive, and NINE caches [6:00] Watch
11d. Replacement policies and mechanisms [8:47] Watch
[Fri 06/13] Lecture 12: The cache-coherence problem
Online videos
12a. Bus-based multiprocessors [6:08] Watch
12b. The cache-coherence problem [2:54] Watch
12c. Peterson's algorithm [6:53] Watch
12d. Coherence vs. consistency [7:56] Watch
Week 6 [06/15 - 06/21]
[Mon 06/16] Lecture 13: Coherence and consistency
Online videos
13a. Bus-based coherence [4:27] Watch
13b. Coherence with write-through caches [5:54] Watch
13c. Invalidation vs. update protocols [5:03] Watch
13d. Memory consistency [10:35] Watch
[Fri 06/20] Lecture 14: Invalidation and update protocols
Online videos
14a. The MSI protocol [14:20] Watch
14b. The MESI protocol [10:35] Watch
14c. The Dragon protocol [10:37] Watch
14d. The Firefly protocol [6:52] Watch
Week 7 [06/22 - 06/28]
[Mon 06/23] Lecture 15: Multicore caches: organization & performance
Online videos
15a. Classifying cache misses [9:29] Watch
15b. Simulating cache parameters [9:25] Watch
15c. Physical and logical cache organization [6:20] Watch
[Wed 06/25] Lecture 16: Hardware support for locking
Online videos
16a. Lock implementations [9:07] Watch
16b. Test-and-set lock (TSL) [6:21] Watch
16c. Test and test-and-set lock (TTSL) [3:52] Watch
16d. Load linked/store conditional (LL/SC) [9:04] Watch
16e. Ticket lock [3:30] Watch
16f. Array-based queuing locks [8:57] Watch
[Fri 06/27] Lecture 17: Barrier implementations
Online videos
17a. Centralized barrier implementations [7:44] Watch
17b. Distributed barrier implementations [2:03] Watch
Week 8 [06/29 - 07/05]
[Mon 06/30] Lecture 18: Memory consistency
Online videos
18a. The two hypotheses of memory consistency [6:40] Watch
18b. Sequentially consistent outcomes [5:45] Watch
18c. Building an SC system [8:41] Watch
[Wed 07/02] Lecture 19: Relaxed memory-consistency models
Online videos
19a. Relaxed memory consistency models [4:27] Watch
19b. Sequential and causal consistency [8:19] Watch
19c. Processor consistency [8:34] Watch
19d. Weak ordering [8:19] Watch
19e. Release consistency [7:43] Watch
Week 9 [07/06 - 07/12]
[Mon 07/07] Test 2
[Wed 07/09] Lecture 20: Caching in DSM machines
Online videos
20a. How to scale a multiprocessor [5:22] Watch
20b. Bus-based vs. directory-based coherence [2:07] Watch
20c. Mapping memory on a DSM [3:08] Watch
20d. Handling misses [4:05] Watch
20e. Alternatives for organizing directories [7:09] Watch
[Fri 07/11] Lecture 21: Coherence in DSM machines
Online videos
21a. Basic DSM cache coherence [4:32] Watch
21b. Main-memory states and network transactions [3:22] Watch
21c. Full bit-vector animation [4:54] Watch
21d. Scaling FBV with the number of processors [7:40] Watch
21e. The SSCI protocol [6:51] Watch
Week 10 [07/13 - 07/19]
[Mon 07/14] Lecture 22: The Silicon Graphics S2MP architecture
Online videos
22a. Today's MP architectures [7:20] Watch
22b. Directory-based coherence [8:41] Watch
22c. Scaling the SMP model [7:05] Watch
22d. SGI's Origin [5:55] Watch
22e. Design issues [9:36] Watch
22f. Directory organization [5:42] Watch
22g. Coherence protocol and summary [10:33] Watch
[Wed 07/16] Lecture 23: DSM implementation correctness & performance
Online videos
23a. Protocol races: out-of-sync directory [6:42] Watch
23b. Transitions from States S and U [4:25] Watch
23c. Handling races: non-atomic messages [7:39] Watch
23d. Write propagation and memory consistency models [4:39] Watch
[Fri 07/18] Lecture 24: Caching in multicore architectures
Online videos
24a. Write requests to blocks in state U or S [2:03] Watch
24b. Write request to a block in state EM [1:57] Watch
24c. Dealing with imprecise directory information [4:41] Watch
24d. Accelerating thread migration [1:51] Watch
Week 11 [07/20 - 07/26]
[Mon 07/21] Lecture 25: Interconnection network topologies
Online videos
25a. Interconnection networks and metrics [9:18] Watch
25b. Interconnection topologies: ring and mesh [8:18] Watch
25c. Hypercubes and shuffle-exchange [10:08] Watch
25d. Butterfly and Benes networks [5:50] Watch
25e. Trees and fat trees [4:26] Watch
[Wed 07/23] Lecture 26: Routing and switch design
Online videos
26a. Routing algorithms [8:01] Watch
26b. Deadlock-free routing [7:20] Watch
26c. Turn-model routing [2:54] Watch
26d. Store and forward, switch design [4:34] Watch
[Fri 07/25] Lecture 27: Review
In-class exercises
Three orchestrations of Ocean Submit See
Coherence and consistency Submit
Physical and logical cache organization Submit
Four "C"s of cache misses Submit
Summing a vector with copy-scan Submit See
Miscellaneous questions Submit See