















































































































































[Figure: animation frame "Processor P3 Reads A". Caption: "Read operation completes." Two caches, including P3's, hold A = 2 in state S; main memory holds A = 2. The frame also shows the access trace (reads and writes of A) and a protocol selector grid (Inv./Upd. columns; MSI, Firefly, MESI, Dragon).]

NC STATE UNIVERSITY, CSC/ECE 506: Architecture of Parallel Computers











































MESI Example (Cache-to-Cache Transfer)

| Proc Action | State P1 | State P2 | State P3 | Bus Action  | Data From    |
|-------------|----------|----------|----------|-------------|--------------|
| R1          | E        | -        | -        | BusRd       | Mem          |
| W1          | M        | -        | -        | -           | Own cache    |
| R3          | S        | -        | S        | BusRd/Flush | P1 cache     |
| W3          | I        | -        | M        | BusRdX      | Mem          |
| R1          | S        | -        | S        | BusRd/Flush | P3 cache     |
| R3          | S        | -        | S        | -           | Own cache    |
| R2          | S        | S        | S        | BusRd/Flush | P1/P3 cache* |

\* Data from memory if no cache-to-cache transfer, BusRd/-




MESI Example (Cache-to-Cache Transfer+BusUpgr)

| Proc Action | State P1 | State P2 | State P3 | Bus Action   | Data From    |
|-------------|----------|----------|----------|--------------|--------------|
| R1          | E        | -        | -        | BusRd        | Mem          |
| W1          | M        | -        | -        | -            | Own cache    |
| R3          | S        | -        | S        | BusRd/Flush  | P1 cache     |
| W3          | I        | -        | M        | BusUpgr      | Own cache    |
| R1          | S        | -        | S        | BusRd/Flush  | P3 cache     |
| R3          | S        | -        | S        | -            | Own cache    |
| R2          | S        | S        | S        | BusRd/Flush' | P1/P3 cache* |

\* Data from memory if no cache-to-cache transfer, BusRd/-
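The two MESI tables differ only in the W3 row: a write to a block held in S issues BusRdX in the first and BusUpgr in the second. The following is a minimal sketch that replays the same access sequence for a single block and reports states and bus actions. The function name `mesi_trace`, the `use_bus_upgr` flag, and the op encoding (`"R1"`, `"W3"`) are assumptions made for illustration; the sketch prints `I` where the slides print `-` for a block that was never loaded, and it does not model the `Flush'` variant or replacements.

```python
from typing import Dict, List, Tuple

PROCS = ("P1", "P2", "P3")

def mesi_trace(ops: List[str], use_bus_upgr: bool = False) -> List[Tuple[str, Dict[str, str], str]]:
    """Replay ops such as 'R1' or 'W3' on one block; return (op, states, bus action) rows."""
    state = {p: "I" for p in PROCS}          # every cache starts without the block
    rows = []
    for op in ops:
        kind, me = op[0], "P" + op[1]
        others = [p for p in PROCS if p != me]
        bus = "-"                            # hits in M/E/S generate no bus traffic on reads
        if kind == "R":
            if state[me] == "I":             # read miss
                sharers = [p for p in others if state[p] != "I"]
                # With cache-to-cache transfer, a holder supplies the block.
                bus = "BusRd/Flush" if sharers else "BusRd"
                for p in sharers:
                    state[p] = "S"           # M, E and S all end in S after a snooped BusRd
                state[me] = "S" if sharers else "E"
        else:                                # write
            if state[me] == "S":
                # The data is already local: BusUpgr only invalidates, BusRdX also refetches.
                bus = "BusUpgr" if use_bus_upgr else "BusRdX"
            elif state[me] == "I":
                bus = "BusRdX"
            if state[me] in ("S", "I"):      # M and E upgrade silently
                for p in others:
                    state[p] = "I"
            state[me] = "M"
        rows.append((op, dict(state), bus))
    return rows

if __name__ == "__main__":
    for op, st, bus in mesi_trace(["R1", "W1", "R3", "W3", "R1", "R3", "R2"], use_bus_upgr=True):
        print(op, st, bus)
```

Calling `mesi_trace` with `use_bus_upgr=False` reproduces the bus-action column of the first table, and with `use_bus_upgr=True` that of the second (up to the prime on the last row).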










Dragon Writeback Update Protocol

Four states:

- Exclusive-clean (E): memory and I have it.
- Shared-clean (Sc): I, others, and maybe memory have it, but I'm not the owner.
- Shared-modified (Sm): I and others have it, but not memory, and I'm the owner.
  - Sm and Sc can coexist in different caches, with at most one Sm.
- Modified or dirty (M): I have it and no one else does.

On replacement: Sc can silently drop, Sm has to flush.

No invalid state:

- If in cache, cannot be invalid.
- If not present in cache, can view as being in not-present or invalid state.

New processor events: PrRdMiss, PrWrMiss

- Introduced to specify actions when block not present in cache.

New bus transaction: BusUpd

- Broadcasts the single word written on the bus; updates other relevant caches.
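The state definitions above imply two checks that are easy to state in code: which states require a flush on replacement, and which combinations of states may coexist across caches. The sketch below is illustrative only; the names `Dragon`, `must_flush_on_replacement`, and `is_consistent` are hypothetical, and `None` stands for "not present". The slide gives the replacement rule only for Sc and Sm; treating M as needing a flush (it is dirty) and E as droppable (it is clean) is an assumption that follows from the definitions.

```python
from enum import Enum
from typing import Iterable, Optional

class Dragon(Enum):
    E = "E"     # exclusive-clean: memory and this cache
    SC = "Sc"   # shared-clean: not the owner
    SM = "Sm"   # shared-modified: the owner, memory is stale
    M = "M"     # modified: only this cache, memory is stale

def must_flush_on_replacement(s: Dragon) -> bool:
    """Sm has to flush, Sc can drop silently (from the slide); M and E are assumed by analogy."""
    return s in (Dragon.SM, Dragon.M)

def is_consistent(copies: Iterable[Optional[Dragon]]) -> bool:
    """Check one block's states across all caches; None means not present."""
    present = [s for s in copies if s is not None]
    if sum(s is Dragon.SM for s in present) > 1:
        return False                      # at most one owner in Sm
    if any(s in (Dragon.E, Dragon.M) for s in present) and len(present) > 1:
        return False                      # E and M mean no other cache has the block
    return True

if __name__ == "__main__":
    print(is_consistent([Dragon.SM, None, Dragon.SC]))   # Sm and Sc may coexist
    print(is_consistent([Dragon.SM, Dragon.SM, None]))   # two owners are not allowed
```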











[Figure: animation frame "Processor P Reads A" (processor index not legible). The processor attempts to read A from its cache; no copy of A is shown in the P2 or P3 caches; main memory holds A = 1. A following frame is captioned "Read operation completes."]











































[Figure: animation frame "Processor P3 Reads A". Caption: "Processor P3 reads from its cache." The P1 and P3 caches hold A = 3 in a shared state (state suffix not legible); the P2 cache holds no copy; main memory still holds A = 1.]















[Figure: animation frame "Processor P2 Reads A". Caption: "Operation completes." The P1, P2, and P3 caches all hold A = 3 in a shared state (state suffixes not legible); main memory still holds A = 1.]





| Action | State P1 | State P2 | State P3 | Bus Action  | Data From |
|--------|----------|----------|----------|-------------|-----------|
| R1     | E        | -        | -        | BusRd       | Mem       |
| W1     | M        | -        | -        | -           | Own cache |
| R3     | Sm       | -        | Sc       | BusRd/Flush | P1 cache  |
| W3     | Sc       | -        | Sm       | BusUpd/Upd  | Own cache |
| R1     | Sc       | -        | Sm       | -           | Own cache |
| R3     | Sc       | -        | Sm       | -           | Own cache |
| R2     | Sc       | Sc       | Sm       | BusRd/Flush | P3 cache  |
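The table above uses the Dragon states (E, Sc, Sm, M), and its rows can be reproduced with a small replay loop. This is a sketch under simplifying assumptions, not a full protocol model: `dragon_trace` and the op encoding are hypothetical names, `None` stands for "not present", replacements are not modeled, and a write miss is treated as a fetch followed by a write (the table never exercises that path).

```python
from typing import Dict, List, Optional, Tuple

PROCS = ("P1", "P2", "P3")

def dragon_trace(ops: List[str]) -> List[Tuple[str, Dict[str, Optional[str]], str]]:
    """Replay ops such as 'R1' or 'W3' on one block; return (op, states, bus action) rows."""
    state: Dict[str, Optional[str]] = {p: None for p in PROCS}
    rows = []
    for op in ops:
        kind, me = op[0], "P" + op[1]
        holders = [p for p in PROCS if p != me and state[p] is not None]
        bus = []
        if state[me] is None:                       # PrRdMiss or PrWrMiss: fetch first
            dirty_owner = any(state[p] in ("M", "Sm") for p in holders)
            bus.append("BusRd/Flush" if dirty_owner else "BusRd")
            for p in holders:
                # A dirty copy keeps ownership as Sm; clean copies become Sc.
                state[p] = "Sm" if state[p] in ("M", "Sm") else "Sc"
            state[me] = "Sc" if holders else "E"
        if kind == "W":
            if holders:
                bus.append("BusUpd/Upd")            # broadcast the written word
                for p in holders:
                    state[p] = "Sc"                 # the previous owner gives up ownership
                state[me] = "Sm"                    # the writer becomes the owner
            else:
                state[me] = "M"                     # no sharers: silent upgrade
        rows.append((op, dict(state), " + ".join(bus) or "-"))
    return rows

if __name__ == "__main__":
    for op, st, bus in dragon_trace(["R1", "W1", "R3", "W3", "R1", "R3", "R2"]):
        print(op, st, bus)
```

Because no copy is ever invalidated, the later R1 and R3 accesses hit locally, which is where this trace differs from the MESI examples.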



























[Figure: animation frame "Processor P1 Reads A". Caption: "Read operation completes." The P1 cache holds A = 1 in state V; main memory holds A = 1. The frame also shows the access trace and the protocol selector grid (Inv./Upd. columns; MSI, Firefly, MESI, Dragon).]











































[Figure: animation frame "Processor P2 Reads A". Caption: "Main memory controller observes the BusRd." The remaining frame contents are not legible.]





| Action | State P1 | State P2 | State P3 | Bus Action  | Data From |
|--------|----------|----------|----------|-------------|-----------|
| R1     | V        | -        | -        | BusRd       | Mem       |
| W1     | D        | -        | -        | -           | Own cache |
| R3     | S        | -        | S        | BusRd/Flush | P1 cache  |
| W3     | S        | -        | S        | BusUpd      | Own cache |
| R1     | S        | -        | S        | -           | Own cache |
| R3     | S        | -        | S        | -           | Own cache |
| R2     | S        | S        | S        | BusRd/Flush | P1 cache  |
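The states in this last table (V, D, S) are read here as those of a three-state update protocol such as Firefly, which is an inference from the state letters and the protocol selector in the frames rather than something the table states. Under that reading, a minimal sketch of the replay looks as follows. The name `firefly_trace` is hypothetical, `None` stands for "not present", replacements are not modeled, and a write to a shared block is assumed to update the other copies and memory so that every copy stays in S.

```python
from typing import Dict, List, Optional, Tuple

PROCS = ("P1", "P2", "P3")

def firefly_trace(ops: List[str]) -> List[Tuple[str, Dict[str, Optional[str]], str]]:
    """Replay ops such as 'R1' or 'W3' on one block; return (op, states, bus action) rows."""
    state: Dict[str, Optional[str]] = {p: None for p in PROCS}
    rows = []
    for op in ops:
        kind, me = op[0], "P" + op[1]
        holders = [p for p in PROCS if p != me and state[p] is not None]
        bus = []
        if state[me] is None:                 # miss: fetch the block
            # Assumed: any holder supplies the block, as in the R2 row of the table.
            bus.append("BusRd/Flush" if holders else "BusRd")
            for p in holders:
                state[p] = "S"                # V and D both drop to S once shared
            state[me] = "S" if holders else "V"
        if kind == "W":
            if holders:
                bus.append("BusUpd")          # update sharers; all copies remain S
            else:
                state[me] = "D"               # sole copy: write locally, now dirty
        rows.append((op, dict(state), " + ".join(bus) or "-"))
    return rows

if __name__ == "__main__":
    for op, st, bus in firefly_trace(["R1", "W1", "R3", "W3", "R1", "R3", "R2"]):
        print(op, st, bus)
```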



