# An Introduction to Parallel Programming

Chapter02 - Home
Exercises: 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 - 11 - 12 - 13 - 14 - 15 - 16 - 17 - 18 - 19 - 20 - 21 - 22 - 23 - 24

Exercise. This is the solution to exercise 2.1 in the book.

Solution. (a) The timing for the various operations is broken down in the following table:

 Operation Time, ns Fetch 2 Compare 1 Shift 1 Add 1 Normalize 1 Round 1 Store 2 Total 9

(b) Unpipelined addition should take 9000 nanoseconds.

(c) Notice that operations taking 1 ns will be idle for 1 ns waiting for fetches to complete (and the idle 1ns propagates through the pipeline). Hence, the operation will occur as detailed in the following table:

 Operation Occuring at times, ns Fetches 0,2,4,$\dots$,1998 Compares 2,4,6,$\dots$,2000 Shifts 3,5,7,$\dots$,2001 Adds 4,6,8,$\dots$,2002 Normalize 5,7,9,$\dots$,2003 Round 6,8,10,$\dots$,2004 Stores 7,9,11,$\dots$,2005

So the pipelined addition of 1000 pairs of floats will take 2006 ns.

(d) If we operate under the assumptions from (c) (i.e., fetches/stores are 2ns) and there are no level 1 misses then the operation completes in 2006 ns. For fetches from level 2 cache (at 5ns latency) and stores at 2 ns (stores will have 3 ns idle time between them, all other operations except fetches will have 4 ns idle time between them), the pipelined timing is:

 Operation Occuring at times, ns Fetches 0,5,10,$\dots$,4995 Compares 5,10,15,$\dots$,5000 Shifts 6,11,16,$\dots$,5001 Adds 7,12,17,$\dots$,5002 Normalize 8,13,18,$\dots$,5003 Round 9,14,19,$\dots$,5004 Stores 10,15,20,$\dots$,5005

Thus, with no level 2 cache misses the total operation will take 5006 ns. For fetches from main memory (and assuming all other operations maintain the same timing as in (c)), we get the following pipelined timing:

 Operation Occuring at times, ns Fetches 0,50,100,$\dots$,49950 Compares 50,100,150,$\dots$,50000 Shifts 51,101,151,$\dots$,50001 Adds 52,102,152,$\dots$,50002 Normalize 53,103,153,$\dots$,50003 Round 54,104,154,$\dots$,50004 Stores 55,105,155,$\dots$,50005

So memory fetches have high latency, with the whole operation completing in some 50 microseconds. If, for example we get a level 1 cache miss at the 3rd fetch, then the fetch will complete with the latency of the main memory. Hence the timing becomes:

 Operation Occuring at times, ns Fetches 0,2,4,54,56,$\dots$,2046 Compares 2,4,54,56,58,$\dots$,2048 Shifts 3,5,55,57,59,$\dots$,2049 Adds 4,6,56,58,60,$\dots$,2050 Normalize 5,7,57,59,61,$\dots$,2051 Round 6,8,58,60,62,$\dots$,2052 Stores 7,9,59,61,63,$\dots$,2053

Hence with a single level 1 cache miss the latency of completion increases from 2006 ns to 2054 ns (by 50-2=48 ns). Notice also that after completing the addition of the second pair the pipeline is idle from 11 ns (after second store is completed) to 54 ns (when 4th fetch and 3rd compare can start).

Similarly, if we get a level 2 cache miss latency increases from 5006 ns to 5051 ns (by 50-5=45 ns).