CSC/ECE 506: Architecture of Parallel Computers

Summer 2025

Lecture Schedule

Week 1[ 05/11 - 05/17 ]

[Wed 05/14] Lecture 1: Overview of parallel computation
Online videos

1a. Course objectives and policies [11:32] Watch  
1b. Architectural trends [12:58] Watch  
1c. Types of parallelism [12:03] Watch  
1d. Flynn taxonomy [8:28] Watch  
1e. Scope of CSC/ECE 506 [7:09] Watch  

In-class exercises

Number of transistors on a chip  Submit  
Multicore/manycore processor info  Submit  
Top 500 observation  Submit  

[Fri 05/16] Lecture 2: Three parallel-programming models
Online videos

2a. Three parallel programming models [5:14] Watch  
2b. The shared address-space model [7:32] Watch  
2c. The message-passing model [7:08] Watch  
2d. Interconnection networks [8:52] Watch  
For your knowledge, not on quiz: Spaced repetition in learning Watch  

Week 2[ 05/18 - 05/24 ]

[Mon 05/19] Lecture 3: GPU architecture
Online videos

3a. Introduction to heterogeneous parallel computing [16:54] Watch  
3b. Portability and scalability in heterogeneous parallel computing [9:12] Watch  
3c. Introduction to CUDA, data parallelism and threads [23:15] Watch  
The testing effect Watch  

[Wed 05/21] Lecture 4: Shared-memory parallel programming
Online videos

4a. Amdahl's law [9:30] Watch  
4b. Steps in parallelization [4:43] Watch  
4c. Loop-dependence analysis [5:16] Watch  
4d. Loop-independent vs. loop-carried dependences [8:58] Watch  

[Fri 05/23] Lecture 5: Dependences, DOACROSS, DOPIPE
Online videos

5a. Finding parallel tasks across iterations [4:30] Watch  
5b. DOACROSS parallelism [4:03] Watch  
5c. Function parallelism [3:37] Watch  
5d. DOPIPE parallelism [3:24] Watch  

Week 3[ 05/25 - 05/31 ]

[Wed 05/28] Lecture 6: Variable scope
Online videos

6a. Determining variable scope [8:59] Watch  
6b. Privatization [4:59] Watch  
6c. Reduction [2:50] Watch  
6d. Summary of scope criteria [7:32] Watch  
6e. Synchronization [5:03] Watch  

[Fri 05/30] Lecture 7: Parallelizing the Ocean application
Online videos

7a. Parallelization in the Ocean simulation [5:03] Watch  
7b. The serial solver [3:25] Watch  
7c. Decomposition of the serial algorithm [6:30] Watch  
7d. Assignment of elements to processes [2:46] Watch  
7e. The data-parallel and shared-memory orchestrations [9:29] Watch  
7f. The message-passing orchestration [5:19] Watch  

Week 4[ 06/01 - 06/07 ]

[Mon 06/02] Lecture 8: Data-parallel algorithms
Online videos

8a. Control parallelism vs. data parallelism [4:53] Watch  
8b. Building blocks for data parallelism [13:16] Watch  
8c. Pointer doubling [10:14] Watch  
8d. Multiplying matrices [5:00] Watch  
8e. Labeling regions in an image [8:01] Watch  

[Wed 06/04] Lecture 9: Parallelizing linked data structures
Online videos

9a. Parallel access to linked data structures [3:58] Watch  
9b. Correctness of parallel LDS operations [8:26] Watch  
9c. Three approaches to parallelization [12:28] Watch  

[Fri 06/06] Lecture 10: Caches
Online videos

10a. Intro and direct-mapped caches [16:05] Watch  
10b. Fully associative caches [9:03] Watch  
10c. Address translation and set-associative caches [12:39] Watch  
10d. Multilevel caches and the principle of inclusion [8:09] Watch  

Week 5[ 06/08 - 06/14 ]

[Mon 06/09] Test 1
[Wed 06/11] Lecture 11: Physical and logical cache organization
Online videos

11a. Translation lookaside buffers [4:30] Watch  
11b. Virtually vs. physically indexed caches [7:20] Watch  
11c. Inclusive, exclusive, and NINE caches [6:00] Watch  
11d. Replacement policies and mechanisms [8:47] Watch  

[Fri 06/13] Lecture 12: The cache-coherence problem
Online videos

12a. Bus-based multiprocessors [6:08] Watch  
12b. The cache-coherence problem [2:54] Watch  
12c. Peterson's algorithm [6:53] Watch  
12d. Coherence vs. consistency [7:56] Watch  

Week 6[ 06/15 - 06/21 ]

[Mon 06/16] Lecture 13: Coherence and consistency
Online videos

13a. Bus-based coherence [4:27] Watch  
13b. Coherence with write-through caches [5:54] Watch  
13c. Invalidation vs. update protocols [5:03] Watch  
13d. Memory consistency [10:35] Watch  

[Fri 06/20] Lecture 14: Invalidation and update protocols
Online videos

14a. The MSI protocol [14:20] Watch  
14b. The MESI protocol [10:35] Watch  
14c. The Dragon protocol [10:37] Watch  
14d. The Firefly protocol [6:52] Watch  

Week 7[ 06/22 - 06/28 ]

[Mon 06/23] Lecture 15: Multicore caches: organization & performance
Online videos

15a. Classifying cache misses [9:29] Watch  
15b. Simulating cache parameters [9:25] Watch  
15c. Physical and logical cache organization [6:20] Watch  

[Wed 06/25] Lecture 16: Hardware support for locking
Online videos

16a. Lock implementations [9:07] Watch  
16b. Test-and-set lock (TSL) [6:21] Watch  
16c. Test and test-and-set lock (TTSL) [3:52] Watch  
16d. Load linked/store conditional (LL/SC) [9:04] Watch  
16e. Ticket lock [3:30] Watch  
16f. Array-based queuing locks [8:57] Watch  

[Fri 06/27] Lecture 17: Barrier implementations
Online videos

17a. Centralized barrier implementations [7:44] Watch  
17b. Distributed barrier implementations [2:03] Watch  

Week 8[ 06/29 - 07/05 ]

[Mon 06/30] Lecture 18: Memory consistency
Online videos

18a. The two hypotheses of memory consistency [6:40] Watch  
18b. Sequentially consistent outcomes [5:45] Watch  
18c. Building an SC system [8:41] Watch  

[Wed 07/02] Lecture 19: Relaxed memory-consistency models
Online videos

19a. Relaxed memory consistency models [4:27] Watch  
19b. Sequential and causal consistency [8:19] Watch  
19c. Processor consistency [8:34] Watch  
19d. Weak ordering [8:19] Watch  
19e. Release consistency [7:43] Watch  

Week 9[ 07/06 - 07/12 ]

[Mon 07/07] Test 2
[Wed 07/09] Lecture 20: Caching in DSM machines
Online videos

20a. How to scale a multiprocessor [5:22] Watch  
20b. Bus-based vs. directory-based coherence [2:07] Watch  
20c. Mapping memory on a DSM [3:08] Watch  
20d. Handling misses [4:05] Watch  
20e. Alternatives for organizing directories [7:09] Watch  

[Fri 07/11] Lecture 21: Coherence in DSM machines
Online videos

21a. Basic DSM cache coherence [4:32] Watch  
21b. Main-memory states and network transactions [3:22] Watch  
21c. Full bit-vector animation [4:54] Watch  
21d. Scaling FBV with the number of processors [7:40] Watch  
21e. The SSCI protocol [6:51] Watch  

Week 10[ 07/13 - 07/19 ]

[Mon 07/14] Lecture 22: The Silicon Graphics S2MP architecture
Online videos

22a. Today's MP architectures [7:20] Watch  
22b. Directory-based coherence [8:41] Watch  
22c. Scaling the SMP model [7:05] Watch  
22d. SGI's Origin [5:55] Watch  
22e. Design issues [9:36] Watch  
22f. Directory organization [5:42] Watch  
22g. Coherence protocol and summary [10:33] Watch  

[Wed 07/16] Lecture 23: DSM implementation correctness & performance
Online videos

23a. Protocol races: out-of-sync directory [6:42] Watch  
23b. Transitions from States S and U [4:25] Watch  
23c. Handling races: non-atomic messages [7:39] Watch  
23d. Write propagation and memory consistency models [4:39] Watch  

[Fri 07/18] Lecture 24: Caching in multicore architectures
Online videos

24a. Write requests to blocks in state U or S [2:03] Watch  
24b. Write request to a block in state EM [1:57] Watch  
24c. Dealing with imprecise directory information [4:41] Watch  
24d. Accelerating thread migration [1:51] Watch  

Week 11[ 07/20 - 07/26 ]

[Mon 07/21] Lecture 25: Interconnection network topologies
Online videos

25a. Interconnection networks and metrics [9:18] Watch  
25b. Interconnection topologies: ring and mesh [8:18] Watch  
25c. Hypercubes and shuffle-exchange [10:08] Watch  
25d. Butterfly and Benes networks [5:50] Watch  
25e. Trees and fat trees [4:26] Watch  

[Wed 07/23] Lecture 26: Routing and switch design
Online videos

26a. Routing algorithms [8:01] Watch  
26b. Deadlock-free routing [7:20] Watch  
26c. Turn-model routing [2:54] Watch  
26d. Store and forward, switch design [4:34] Watch  

[Fri 07/25] Lecture 27: Review
  • Lecture notes
In-class exercises

Three orchestrations of Ocean  Submit  See
Coherence and consistency  Submit  
Physical and logical cache organization  Submit  
Four "C"s of cache misses  Submit  
Summing a vector with copy-scan  Submit  See
Miscellaneous questions  Submit  See

Week 12[ 07/27 - 08/02 ]

[Mon 07/28] Final Exam
©2007-2024 NC State University