Chapter02 - Home

Exercise. This is the solution to exercise 2.1 in the book.

Solution. (a) The timing for the various operations is broken down in the following table:

Operation | Time, ns |

Fetch | 2 |

Compare | 1 |

Shift | 1 |

Add | 1 |

Normalize | 1 |

Round | 1 |

Store | 2 |

Total | 9 |

(b) Unpipelined addition should take 9000 nanoseconds.

(c) Notice that operations taking 1 ns will be idle for 1 ns waiting for fetches to complete (and the idle 1ns propagates through the pipeline). Hence, the operation will occur as detailed in the following table:

Operation | Occuring at times, ns |

Fetches | 0,2,4,$\dots $,1998 |

Compares | 2,4,6,$\dots $,2000 |

Shifts | 3,5,7,$\dots $,2001 |

Adds | 4,6,8,$\dots $,2002 |

Normalize | 5,7,9,$\dots $,2003 |

Round | 6,8,10,$\dots $,2004 |

Stores | 7,9,11,$\dots $,2005 |

So the pipelined addition of 1000 pairs of floats will take 2006 ns.

(d) If we operate under the assumptions from (c) (i.e., fetches/stores are 2ns) and there are no level 1 misses then the operation completes in 2006 ns. For fetches from level 2 cache (at 5ns latency) and stores at 2 ns (stores will have 3 ns idle time between them, all other operations except fetches will have 4 ns idle time between them), the pipelined timing is:

Operation | Occuring at times, ns |

Fetches | 0,5,10,$\dots $,4995 |

Compares | 5,10,15,$\dots $,5000 |

Shifts | 6,11,16,$\dots $,5001 |

Adds | 7,12,17,$\dots $,5002 |

Normalize | 8,13,18,$\dots $,5003 |

Round | 9,14,19,$\dots $,5004 |

Stores | 10,15,20,$\dots $,5005 |

Thus, with no level 2 cache misses the total operation will take 5006 ns. For fetches from main memory (and assuming all other operations maintain the same timing as in (c)), we get the following pipelined timing:

Operation | Occuring at times, ns |

Fetches | 0,50,100,$\dots $,49950 |

Compares | 50,100,150,$\dots $,50000 |

Shifts | 51,101,151,$\dots $,50001 |

Adds | 52,102,152,$\dots $,50002 |

Normalize | 53,103,153,$\dots $,50003 |

Round | 54,104,154,$\dots $,50004 |

Stores | 55,105,155,$\dots $,50005 |

So memory fetches have high latency, with the whole operation completing in some 50 microseconds. If, for example we get a level 1 cache miss at the 3rd fetch, then the fetch will complete with the latency of the main memory. Hence the timing becomes:

Operation | Occuring at times, ns |

Fetches | 0,2,4,54,56,$\dots $,2046 |

Compares | 2,4,54,56,58,$\dots $,2048 |

Shifts | 3,5,55,57,59,$\dots $,2049 |

Adds | 4,6,56,58,60,$\dots $,2050 |

Normalize | 5,7,57,59,61,$\dots $,2051 |

Round | 6,8,58,60,62,$\dots $,2052 |

Stores | 7,9,59,61,63,$\dots $,2053 |

Hence with a single level 1 cache miss the latency of completion increases from 2006 ns to 2054 ns (by 50-2=48 ns). Notice also that after completing the addition of the second pair the pipeline is idle from 11 ns (after second store is completed) to 54 ns (when 4th fetch and 3rd compare can start).

Similarly, if we get a level 2 cache miss latency increases from 5006 ns to 5051 ns (by 50-5=45 ns).