-
Notifications
You must be signed in to change notification settings - Fork 14
/
Copy pathcs151b
167 lines (124 loc) · 4.38 KB
/
cs151b
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
Computer Science M151B - Computer Systems Architecture
==============
Jan. 11, 2017
==============
Defining Performance
---------------------
Response Time: how long it takes to do a task
Throughput: total work done per unit time
e.g. tasks/transactions/... per hour
How are response time and throughput affected by
- replacing processors with faster versions
- adding more processors
Relative Performance
---------------------
Define Performance = 1/Execution Time
"X is n times faster than Y"
Performance(X) / Performance(Y) = ExecutionTime(Y) / ExecutionTime(X) = n
Measuring Execution Time
-------------------------
Elapsed Time
total response time, including all aspects
processing, I/O, OS overhead, idle time
determines system performance
CPU Time
time spent processing a given job
discounts I/O time, other jobs' shares
comprises user CPU time and system CPU time
different programs are affected differently by CPU and system performance
CPU Clocking
-------------
operation of digital hardware governed by a constant-rate clock
clock period
|<----------->|
| +------+ +------+ +------+
clock (cycles) |--+ +------+ +------+ +------+
|
data transfer |
computation | |<----------->|<----------->|<----------->|
|
update state | [-] [-] [-]
+---------------------------------------------------->
clock period: duration of a clock cycle
clock frequency (rate): cycles per second
CPU Time
---------
CPU Time = CPU Clock Cycles * Clock Cycle Time
= CPU Clock Cyclcs / Clock Rate
Performance imrpoved by
- reducing number of clock cycles
- increasing clock rate
- hardware designer must often trade off clock rate against cycle count
Instruction Count and Cycles per Instruction (CPI)
---------------------------------------------------
Clock Cycles = Instruction Count * CPI
CPU Time = Instruction Count * CPI * Clock Cycle Time
= (Instruction Count * CPI) / Clock Rate
Instruction Count for a program
determined by program, ISA, and compiler
static: actual count in compiled code
dynamic: includes loops (implied instructions) <-- using this one
Average Cycles per Instruction
determined by CPU hardware
if different instructions have different CPI
average CPI affected by instruction mix
If different instruction classes take different numbers of cycles
Clock Cycles = sum(i=1 -> n) (CPI(i) * InstructionCount(i))
Weighted average CPI
CPI = Clock Cycles / Instruction Count
= sum(i=i -> n) (CPI(i) * (InstructionCount(i) / InstructionCount))
Summary
--------
CPU Time = Instructions/Program * Clock Cycles/Instruction
* Seconds/Clock Cycle
Performance depends on:
- algorithm
affects IC, possibly CPI
- programming language
affects IC, CPI
- compiler
affects IC, CPI
- instruction set architecture
affects IC, CPI, Tc
Power Trends
-------------
Power = Capacitive Load * Voltage^2 * Frequency
Multiprocessors
----------------
Multicore processors
more than one processor per chip
Requires explicitly parallel programming
compare with instruction level parallelism
hardware executes multiple instructions at once
hidden from programmer
hard to do
programming for performance
load balancing
optimizing communication and synchronization
Pitfall: Amdahl's Law
----------------------
Improving an aspect of a computer and expecting a proportional improvement
in overall performance
T_improved = T_affected / ImprovementFactor + T_unaffected
Corollary: make the common case fast
Pitfall: MIPS as a Performance Metric
--------------------------------------
MIPS: millions of instructions per second
doesn't account for
differences in ISAs between computers
differences in complexity between instructions
MIPS = InstructionCount / ExecutionTime * 10^6
= InstructionCount / (InstructionCount * CPI * 10^6 / ClockRate)
= ClockRate / CPI * 10^6
CPI varies between programs on a given CPU
Summary
--------
Cost/performance is improving
due to underlying technology development
Hierarchical layers of abstraction
in both hardware and software
Instruction set architecture
hardware/software interface
Execution time: the best performance measure
Power is a limiting factor
use parallelism to improve performance