An Introduction to Parallel Programming

Chapter02 - Home
Exercises: 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 - 11 - 12 - 13 - 14 - 15 - 16 - 17 - 18 - 19 - 20 - 21 - 22 - 23 - 24

Exercise. This is the solution to exercise 2.1 in the book.

Solution. (a) The timing for the various operations is broken down in the following table:

OperationTime, ns


Fetch 2
Compare 1
Shift 1
Add 1
Normalize 1
Round 1
Store 2


Total 9


(b) Unpipelined addition should take 9000 nanoseconds.

(c) Notice that operations taking 1 ns will be idle for 1 ns waiting for fetches to complete (and the idle 1ns propagates through the pipeline). Hence, the operation will occur as detailed in the following table:

Operation Occuring at times, ns


Fetches 0,2,4,,1998
Compares 2,4,6,,2000
Shifts 3,5,7,,2001
Adds 4,6,8,,2002
Normalize 5,7,9,,2003
Round 6,8,10,,2004
Stores 7,9,11,,2005

So the pipelined addition of 1000 pairs of floats will take 2006 ns.

(d) If we operate under the assumptions from (c) (i.e., fetches/stores are 2ns) and there are no level 1 misses then the operation completes in 2006 ns. For fetches from level 2 cache (at 5ns latency) and stores at 2 ns (stores will have 3 ns idle time between them, all other operations except fetches will have 4 ns idle time between them), the pipelined timing is:

Operation Occuring at times, ns


Fetches 0,5,10,,4995
Compares 5,10,15,,5000
Shifts 6,11,16,,5001
Adds 7,12,17,,5002
Normalize 8,13,18,,5003
Round 9,14,19,,5004
Stores 10,15,20,,5005

Thus, with no level 2 cache misses the total operation will take 5006 ns. For fetches from main memory (and assuming all other operations maintain the same timing as in (c)), we get the following pipelined timing:

Operation Occuring at times, ns


Fetches 0,50,100,,49950
Compares 50,100,150,,50000
Shifts 51,101,151,,50001
Adds 52,102,152,,50002
Normalize 53,103,153,,50003
Round 54,104,154,,50004
Stores 55,105,155,,50005

So memory fetches have high latency, with the whole operation completing in some 50 microseconds. If, for example we get a level 1 cache miss at the 3rd fetch, then the fetch will complete with the latency of the main memory. Hence the timing becomes:

Operation Occuring at times, ns


Fetches 0,2,4,54,56,,2046
Compares 2,4,54,56,58,,2048
Shifts 3,5,55,57,59,,2049
Adds 4,6,56,58,60,,2050
Normalize 5,7,57,59,61,,2051
Round 6,8,58,60,62,,2052
Stores 7,9,59,61,63,,2053

Hence with a single level 1 cache miss the latency of completion increases from 2006 ns to 2054 ns (by 50-2=48 ns). Notice also that after completing the addition of the second pair the pipeline is idle from 11 ns (after second store is completed) to 54 ns (when 4th fetch and 3rd compare can start).

Similarly, if we get a level 2 cache miss latency increases from 5006 ns to 5051 ns (by 50-5=45 ns).