Course Schedule: Lectures
Lecture Schedule
Week 1 [01/08 - 01/14]
[Mon 01/09] Lecture 1: Overview of parallel computation
In-class exercises
Where are you sitting today?
Number of transistors on a chip
Multicore/manycore processor info
Top 500 observation
[Wed 01/11] Lecture 2: Three parallel-programming models
In-class exercises
Advantages and disadvantages of SMP organization
Overheads of message-passing
Shared-memory vs. message-passing programming
Reflection
Week 2 [01/15 - 01/21]
[Wed 01/18] Lecture 3: GPU architecture
In-class exercises
Best definition of speedup
Amdahl's law example
Upload your answers to practice questions
Week 3 [01/22 - 01/28]
[Mon 01/23] Lecture 4: Caches
In-class exercises
Direct-mapped cache: field sizes
Fully associative cache: field sizes
Set-associative cache: field sizes
Write policy in two-level caches
Reflection
[Wed 01/25] Lecture 5: Physical and logical cache organization
In-class exercises
Steps in cache access
Parallelism in cache access
Alternatives for cache indexing and tagging
Multilevel cache design
Characteristics of inclusion properties
Reflection
Week 4 [01/29 - 02/04]
[Mon 01/30] Lecture 6: The cache-coherence problem
In-class exercises
Shared vs. distributed memory
Cache-coherence questions
Software lock using a flag
What's gone wrong in this race situation?
Reflection
[Wed 02/01] Lecture 7: Coherence and consistency
In-class exercises
How does write-through guarantee coherence?
How many processors on a write-through bus?
What happens when a block is ejected?
Invalidation vs. update protocols
Ordering of operations in two threads
Why might A not print as 1?
Reflection
Week 5 [02/05 - 02/11]
[Mon 02/06] Lecture 8: Shared-memory parallel programming
In-class exercises
Please make a photocopy of textbook pages for me
The three levels of parallelism
Dependences example
Dependences in truncated 4-point iteration example
LDG for Loop Nest 2
Second dependences example
Reflection
[Wed 02/08] Lecture 9: Dependences, DOACROSS, DOPIPE
In-class exercises
Dependences in function-parallelism example
Dependences in DOPIPE-parallelism example
Variable scopes - Example 1
Exercise 2: for i tasks
Reflection
Week 6 [02/12 - 02/18]
[Mon 02/13] Test 1 - 7:00-9:00 PM, EB II 1231
[Wed 02/15] Lecture 10: Variable scope
In-class exercises
Why is each variable privatizable?
Example 1: Which variables should be declared as shared/private?
Example 2: Which variables should be declared as shared/private?
Scopes in matrix multiplication - parallelizing the for k loop
Scopes in matrix multiplication - parallelizing the for i loop
Reflection
Week 7 [02/19 - 02/25]
[Mon 02/20] Lecture 11: Parallelizing the Ocean application
In-class exercises
Questions about the serial solver
Order of updating points
Concurrency along antidiagonals
Bad ways of exploiting parallelism in Ocean application
Red/black ordering
Does it matter that execution is no longer deterministic?
[Wed 02/22] Lecture 12: Parallelization in three models
In-class exercises
Advantages and disadvantages of assignment options
Block assignment and communication
Block partitioning
Synchronization in the shared-memory program
Barrier synchronization in shared-memory version
Questions about the message-passing program
Typos in message-passing if statements
Reflection
Week 8 [02/26 - 03/04]
[Mon 02/27] Lecture 13: Data-parallel algorithms
Online videos
13a. Control parallelism vs. data parallelism [4:53]
13b. Building blocks for data parallelism [13:16]
13c. Pointer doubling [10:14]
13d. Multiplying matrices [5:00]
13e. Labeling regions in an image [8:01]
[Wed 03/01] Lecture 14: Parallelizing linked data structures
In-class exercises
Parallelizing operations on linked data structures
Conflict between an insertion and a deletion
Fine-grain locking approach
Questions about insertion with fine-grain locks
Reflection
Week 9 [03/05 - 03/11]
[Mon 03/06] Lecture 15: Invalidation and update protocols
Online videos
15a. The MSI protocol [14:20]
15b. The MESI protocol [10:35]
15c. The Dragon protocol [10:37]
15d. The Firefly protocol [6:52]
[Wed 03/08] Lecture 16: Multicore caches: organization & performance
In-class exercises
Hits and misses in set-associative cache
Hits and misses in direct-mapped cache
Coherence misses
Cache changes to reduce miss rate
Effects of increasing line size
Context-switch misses
Logical cache organization
Partitioned shared cache organization
Week 10 [03/19 - 03/25]
[Mon 03/20] Lecture 17: Hardware support for locking
In-class exercises
Performance of test-and-set
TSL vs. TTSL
LL/SC vs. TTSL
Ticket locks vs. array-based queueing locks
Reflection
[Wed 03/22] Lecture 18: Barrier implementations
In-class exercises
Ticket lock with MSI
Scalability at the barrier
Performance of combining-tree barrier
Reflection
Week 11 [03/26 - 04/01]
[Mon 03/27] Test 2 - 7:00-9:00 PM, EB II 1231
[Wed 03/29] Lecture 19: Memory consistency
In-class exercises
Permission form for study on dual-submission homework
Interest in independent study/thesis topics
Example: Why is a memory consistency model needed?
Sequentially consistent vs. non-sequentially-consistent outcomes
Which outcomes are possible under SC?
Prefetching early and late
Reflection
Week 12 [04/02 - 04/08]
[Mon 04/03] Lecture 20: Relaxed memory-consistency models
In-class exercises
Need for relaxed consistency models
Causal-consistency example
Strongest consistency model
How can both processes be killed?
Weak ordering
[Wed 04/05] Lecture 21: Caching in DSM machines
In-class exercises
Why doesn't a bus-based design scale?
Why aren't invalidations too slow?
Page placement without interleaving
Directory messages for read and write misses
Merging the directory with the LLC tag array
Reflection
Week 13 [04/09 - 04/15]
[Mon 04/10] Lecture 22: Coherence in DSM machines
In-class exercises
Pseudocode for full bit-vector approach
Block states in main memory
Optimizing a full bit-vector scheme
Reflection
[Wed 04/12] Lecture 23: The Silicon Graphics S2MP architecture
Online videos
23a. Today's MP architectures [7:20]
23b. Directory-based coherence [8:41]
23c. Scaling the SMP model [7:05]
23d. SGI's Origin [5:55]
23e. Design issues [9:36]
23f. Directory organization [5:42]
23g. Coherence protocol and summary [10:33]
Week 14 [04/16 - 04/22]
[Mon 04/17] Lecture 24: DSM implementation correctness & performance
In-class exercises
An invalidation to a node that no longer has a block
Transition from state U on a read request
Transition from state S on a readX request
Home-centric vs. requester-assisted approach
Reflection
[Wed 04/19] Lecture 25: Caching in multicore architectures
In-class exercises
ReadX in state S or U with non-atomic message
ReadX to EM block with non-atomic message
What's wrong with imprecise directory info?
Increased power consumption and latency
Other problems with stale directory info
Accelerating thread migration
Reflection
Week 15 [04/23 - 04/29]
[Mon 04/24] Lecture 26: Review
Lecture notes, etc.
In-class exercises
Three orchestrations of Ocean
Coherence and consistency
Physical and logical cache organization
Four "C"s of cache misses
Summing a vector with copy-scan
Miscellaneous questions
Kahoot questions