CS 33 (Fall '17)
Contributed by Chenyu Xu.
For the following forms of machine parallelism, give an example of an application that will likely work better with this form of parallelism.
- instruction level parallelism
- multiplexing
- process parallelism
- When we have one CPU and one core, and our program does not rely heavily on long chains of dependent values, instruction-level parallelism lets micro-operations execute at the same time. This parallelism works alongside the other forms of machine parallelism and is achieved largely through coding and compiler techniques.
- When we have only one CPU and core, and we need control over the application. For example, we may want to give preferred service or priority to a specific client or part of the program. Also, when the whole application needs to share data between flows, multiplexing lets every logical flow access the entire address space of the process.
- When we have several CPUs and cores, and several clients working on the application. Besides enabling parallelism for each client, every client has its own separate address space, so none of them can overwrite another's memory, keeping their memory safe.
Contributed by Caleb Chau.
Given two numbers x and y, determine if x is less than or equal to y. Return 1 if x <= y and 0 if not.
isLessOrEqual(5, 6) returns 1, isLessOrEqual(6, 5) returns 0, isLessOrEqual(5, 5) returns 1
int isLessOrEqual(int x, int y) {
/*
 * If x <= y, then either x is negative and y is nonnegative, or
 * they have the same sign and y - x is nonnegative (sign bit is
 * 0). xneg_ypos is 1 only when x is negative and y is
 * nonnegative. nonneg_difference is 1 only when x and y have the
 * same sign and their difference is nonnegative.
 */
int difference = y + (~x + 1);
int xneg_ypos = (x & (~y)) >> 31 & 0x01;
int nonneg_difference = (~(x ^ y) & ~difference) >> 31 & 0x01;
return xneg_ypos | nonneg_difference;
}
Contributed by Joey Ganley.
What will the following code output? Hint: the binary representation of 1065353216 is "0 01111111 00000000000000000000000"
A) 1065353216
B) FLT_MAX
C) 1
D) NaN
#include <iostream>
using namespace std;
union MagicUnion {
int a;
float b;
};
int main() {
MagicUnion test;
test.a = 1065353216;
cout << test.b << endl;
}
The answer is C. The union takes the binary data and outputs the data in whichever type is being requested. Since "test.b" is being requested, the union will output the binary data in float form. Since floating point value "0 01111111 00000000000000000000000" is equal to 1, the program will output 1.
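To see why, note that 1065353216 is 0x3F800000: sign 0, exponent field 01111111 (127, exactly the bias, so the unbiased exponent is 0), mantissa 0, giving 1.0 × 2^0 = 1. A minimal check of that bit pattern, using memcpy for the type pun (reading a union member other than the one last written is technically undefined in C++, though compilers support it as an extension; the function name here is ours):

```c
#include <string.h>

/* Reinterpret the bits of an int as a float by copying the raw bytes. */
float bits_to_float(int a) {
    float b;
    memcpy(&b, &a, sizeof b);
    return b;
}
```

bits_to_float(1065353216) evaluates to 1.0f, matching answer C.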
Contributed by Anirudh Veeraragavan.
Write a function isPositive that returns 1 if x > 0 and 0 otherwise.
Restrictions:
- You may only use the operators ! ~ & ^ | + << >>
- You may only use the above operators 8 times.
- You may not use any control flow constructs, i.e., no if, do, while, for, switch, etc.
- You may not cast and may not use data types other than int.
isPositive(-1) = 0
x is positive if and only if:
- The sign bit is 0
- x is not 0
int isPositive(int x) {
int sign = (x >> 31) & 1;
int isZero = !x;
return !sign & !isZero;
}
Contributed by Arvind Ramachandran.
The following code transposes the elements of a square matrix.
#define N ___
void transpose(int A[N][N]) {
int i, j;
for (i=0; i < N; i++) {
for (j=0; j < i; j++) {
int t = A[i][j];
A[i][j] = A[j][i];
A[j][i] = t;
}
}
}
When compiled with optimization level -O1, GCC generates the following code for the inner loop:
movl (%rdx), %ecx
movl (%rax), %esi
movl %esi, (%rdx)
movl %ecx, (%rax)
addq $4, %rdx
addq $400, %rax
cmpq %rdi, %rax
jne .L4
- Which register holds a pointer to array element A[i][j]?
- Which register holds a pointer to array element A[j][i]?
- What is the value of N?
(A) 4
(B) 100
(C) 400
- %rdx holds a pointer to A[i][j]: it advances by 4 bytes (one int) per iteration, moving along row i.
- %rax holds a pointer to A[j][i]: it advances by 400 bytes per iteration, moving down column i one full row at a time.
- (B) 100, since the row stride is 400 bytes = N * sizeof(int) = N * 4.
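The register roles can be mirrored in C. In this sketch (our reconstruction, not GCC's output) one pointer steps 4 bytes along row i like %rdx, while the other steps 400 bytes down column i like %rax:

```c
#define N 100  /* recovered from the 400-byte stride: 400 / sizeof(int) */

/* Inner loop of transpose, rewritten with the two pointers made explicit. */
void transpose_row(int A[N][N], int i) {
    int *rowp = &A[i][0]; /* plays %rdx: &A[i][j], advances 4 bytes per step */
    int *colp = &A[0][i]; /* plays %rax: &A[j][i], advances 400 bytes per step */
    int *stop = &A[i][i]; /* plays %rdi: loop ends when j reaches i */
    while (colp != stop) {
        int t = *rowp;    /* movl (%rdx), %ecx */
        *rowp = *colp;    /* movl (%rax), %esi; movl %esi, (%rdx) */
        *colp = t;        /* movl %ecx, (%rax) */
        rowp += 1;        /* addq $4, %rdx */
        colp += N;        /* addq $400, %rax */
    }
}
```

Calling transpose_row for every i swaps each below-diagonal element with its mirror, completing the transpose.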
Contributed by Benjamin Yang.
Create a counter that is incremented by two threads, 5 times each. Print out the thread number and current value of count after each increment. Print out the final value after both threads are finished as well. Use POSIX threads and semaphores to implement your solution. Use global variables as appropriate.
Sample output (the semaphore makes each increment and print atomic, but the exact interleaving of the two threads is not guaranteed):
Thread 1 incremented count to 1
Thread 2 incremented count to 2
Thread 1 incremented count to 3
Thread 2 incremented count to 4
Thread 1 incremented count to 5
Thread 2 incremented count to 6
Thread 1 incremented count to 7
Thread 2 incremented count to 8
Thread 1 incremented count to 9
Thread 2 incremented count to 10
The value of count is 10
#include <pthread.h>
#include <stdio.h>
#include <semaphore.h>
sem_t s;
int count;
void *countup(void *p) {
int *id = (int*)p;
for (int i = 0; i < 5; i++) {
sem_wait(&s);
count++;
printf(""Thread %d incremented count to %d\n"", *id, count);
sem_post(&s);
}
return NULL;
}
int main() {
pthread_t tid1, tid2;
count = 0;
sem_init(&s, 0, 1);
int id1 = 1;
int id2 = 2;
pthread_create(&tid1, NULL, countup, &id1);
pthread_create(&tid2, NULL, countup, &id2);
pthread_join(tid1, NULL);
pthread_join(tid2, NULL);
printf(""The value of count is %d\n"", count);
}
Contributed by Vaibhav Aggarwal.
How can you subtract two integers using only bitwise operators?
int subtract(int a, int b);
If we create a table of the different bitwise scenarios:
a | b | diff | borrow
0 | 0 | 0 | 0
0 | 1 | 1 | 1
1 | 0 | 1 | 0
1 | 1 | 0 | 0
We can see that we only borrow if (~a) & b
, and the difference bit is a ^ b
. Therefore:
int subtract(int a, int b) {
while (b) {
int borrow = (~a) & b;
a = a ^ b;
b = borrow << 1;
}
return a;
}
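The loop above, packaged with a couple of checks (this relies on two's-complement wraparound of the signed left shift, which the C standard leaves undefined but which mainstream compilers provide):

```c
/* a ^ b gives the per-bit difference; (~a) & b marks the positions that
   must borrow, shifted left into the next column on each pass. */
int subtract(int a, int b) {
    while (b) {
        int borrow = (~a) & b;
        a = a ^ b;
        b = borrow << 1;
    }
    return a;
}
```

For example, subtract(7, 5) yields 2 and subtract(5, 7) wraps around to -2.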
Contributed by Nyan Tun.
Given an integer A and an integer B, determine the number of bits that must be toggled to convert A to B.
in: a, b = 2, 3 // in binary: 10, 11
out: 1
in: a, b = 10, 20 // in binary: 01010, 10100
out: 4
Iterate from the least significant bit leftwards. a^b produces a 1 in every bit position where a and b differ, and c & 1 masks c to retain only its lowest bit.
#include <iostream>
using namespace std;
int toggle_bit(int a, int b) {
int count = 0;
for (int c = a ^ b; c != 0; c = c >> 1) {
count += c & 1;
}
return count;
}
int main() {
cout << toggle_bit(2, 3) << endl;
return 0;
}
Contributed by Shawye Ho.
1a. Discuss the difference between a Solid State Drive (SSD) and a regular Hard Drive. What advantages and disadvantages does an SSD have over a regular hard disk drive due to the differences in their design? (Remember: SSDs use flash memory)
1b. Some SSD manufacturers display a warning to leave at least 25% of the storage space empty for optimal write performance and SSD longevity. Why do you think this is? Other manufacturers do not display such a warning for their SSDs. What differences between the SSDs of different manufacturers might lead to some manufacturers recommending that 25% of the storage space on their SSD be left empty, while other SSD manufacturers do not have such a recommendation?
1c. Samsung is currently working on a new type of SSD called a Key SSD. This SSD can bypass the flash controller on the SSD, and stores "keys" that map to certain blocks of memory on the SSD. Thus, to find one of these "blocks" of memory, the SSD can use the keys, rather than working through the flash controller. What advantage(s) (or disadvantage(s)) might the Key SSD have over a regular SSD? What types of objects or files might the Key SSD be best suited to store?
1a. An SSD uses flash memory, which means it has large blocks of integrated-circuit assemblies as memory, accessed through a flash controller that allows the hardware (usually a computer) to communicate with the drive. This controller performs many of the operations necessary to maintain the flash memory, including bad block mapping, encryption, caching, error detection and correction, garbage collection, and wear leveling, among other things. This type of memory, with integrated-circuit assemblies rather than spinning disks and read/write heads, has no moving parts. As such, SSDs tend to be more resistant to physical shock and damage. They are also quieter when they run. Furthermore, because of the nature of their memory access capabilities, they tend to read faster (lower access time and lower latency) than hard disk drives. However, due to the cost of manufacturing, SSDs cost much more (per GB of memory) than hard disk drives.
1b. Flash memory (like that in SSDs) is most effective when written in large blocks, because it is typically accessed in blocks. Thus, the flash controller will always attempt to write to a free block, to optimize the SSD's read capability. If there are no free blocks and something new must be written, the flash controller must first read a block (or multiple blocks) and write it somewhere else so that the current item can be written to a free block. Therefore, if there is little free space on the SSD, writing will be slower. Furthermore, because this occurrence also requires MORE writing (an old block must be rewritten before the new block is written), it wears out the flash memory sooner. Thus, some manufacturers might recommend keeping at least 25% of the SSD free to improve write performance and the longevity of the SSD. Other manufacturers do not have this recommendation; their SSDs might have flash controllers that optimize differently (perhaps always wiping and writing at the same time), or might include extra unusable space, hidden from the user, that acts as a buffer in these scenarios so that write performance and longevity do not suffer.
1c. The Key SSD can bypass the flash controller, which is slower, and access memory directly through a key, which is faster. Thus, the Key SSD should be faster than regular SSDs, because a key lookup should be faster than going through the (relatively) slow flash controller. However, because each key is associated with a particular region, this system could be insecure. While traditional flash controllers encrypt the data being stored on the drive (making it impossible to access the data unless you have the flash controller associated with the drive), the key directly gives the region of the information, making encryption more difficult. As a result, the Key SSD could be less secure than traditional SSDs. Because the Key SSD uses keys that map to large blocks of memory, it is best suited to objects or files that occupy large blocks of memory. These include, among others, videos, images, and large blocks of data organized in some particular way, such as a large Excel spreadsheet.
Contributed by Koji Kusumi.
Examine the function loop1 below:
#include <stdio.h>
void loop1(void) {
int arr[1000][1000];
for ( int j = 0; j < 1000; j++ ) {
for ( int i = 0; i < 1000; i++ )
arr[i][j] = i*j;
}
int sum = 0;
for ( int j = 0; j < 1000; j++ ) {
for (int i = 0; i < 1000; i++ )
sum += arr[i][j];
}
printf(""the sum is %d \n"", sum);
}
int main(void) {
loop1();
//loop2();
}
With your understanding of locality and caching, write an improved function, loop2. This function should compute the same sum and still use loops in a meaningful way. Explain what changes you made and how they improve the speed of the function.
A two-dimensional array is stored as a single line of values. If a two-dimensional array is indexed by [row][col], values in the same row are stored adjacent to one another. Notice that loop1 sums values in the same column, so for each index in the array, an entire new block of memory must be loaded. Blocks of memory pulled into a cache are pieces of memory stored adjacently. Since values in the same column are not adjacent, we cannot use more than one value from a single block.
To speed up the program, we can sum values by row. Now, when we load a block, we can use all of its values, because the values are adjacent and we are summing adjacent indices. In code, this manifests itself as swapping the order of our loops.
#include <stdio.h>
void loop2(void) {
int arr[1000][1000];
for ( int i = 0; i < 1000; i++ ) {
for ( int j = 0; j < 1000; j++)
arr[i][j] = i*j;
}
int sum = 0;
for ( int i = 0; i < 1000; i++ ) {
for ( int j = 0; j < 1000; j++ )
sum += arr[i][j];
}
printf(""the sum is %d \n"", sum);
}
void loop1(void) {
int arr[1000][1000];
for ( int j = 0; j < 1000; j++ ) {
for ( int i = 0; i < 1000; i++ )
arr[i][j] = i*j;
}
int sum = 0;
for ( int j = 0; j < 1000; j++ ) {
for (int i = 0; i < 1000; i++ )
sum += arr[i][j];
}
printf(""the sum is %d \n"", sum);
}
int main(void) {
loop1();
loop2();
}
Contributed by Kevin Tan.
Consider the following C code:
void strIncrement(int *n, int x, const char *word) {
int i;
for (i = 0; i < strlen(word); i++) {
*n += i * x;
}
}
- Suppose word is a very long C-string. How well does speculative execution perform when evaluating the condition of the for loop?
- Without modifying the body of the for loop, propose one way to improve the performance of this code.
- Our function is rewritten to conform to the syntax required for thread routines:
// Global struct definition
struct arguments {
int *n;
int x;
const char *word;
};
void *strIncrement(void *args) {
// Cast args to a pointer to a "struct arguments"
struct arguments *arg = (struct arguments *)args;
// Get parameters
int *n = arg->n;
int x = arg->x;
const char *word = arg->word;
// Main thread routine
int i;
for (i = 0; i < strlen(word); i++) {
*n += i * x;
}
return NULL;
}
Is strIncrement always thread-safe? Explain. If not, also propose a way to make the function always thread-safe.
- If word is long, we are evaluating the for loop condition many times. Speculative execution works well: the CPU will guess that the condition evaluates to true, since that is the most typical program behavior. The condition only ever evaluates to false when we exit the loop!
- We can hoist out the call to strlen so that it isn't called every time the program evaluates the loop condition (function calls are treated as black boxes that cannot be optimized, with the exception of calls to inline functions). It would look something like:
void strIncrement(int *n, int x, const char *word) {
    int i;
    unsigned long length = strlen(word);
    for (i = 0; i < length; i++) {
        *n += i * x;
    }
}
- strIncrement is not always thread-safe. It never accesses any global or static variables, but it is not explicitly reentrant - it is implicitly reentrant. If the parameter n (or arg->n) points to the same address for two or more different threads, we have race conditions: one thread might update the value of n after another thread has already read its old value.
One way we can fix this is with semaphores, or more specifically a mutex. We could create a mutex, then surround the body of the for loop with P and V operations.
Another way we could fix this is to change the function so that it calculates and returns the total value that it would've added to n. Then, the main thread can reap all child threads, sum up all of their return values, and manually update n. This solution would require notably more code rewriting and overhead than the one above.
Contributed by Kyle Romero.
Which of the following statements is NOT true regarding the general use of caches?
A. Caches utilize spatial locality to offer fast access to elements in arrays
B. Accessing data in a cache is faster than accessing data in RAM
C. If we write data to the cache, this value is always changed immediately in the backing store
D. Caches optimize data accesses while iterating through loops
E. All of the above statements are true regarding general use of caches
C. If we write data to the cache, this value is always changed immediately in the backing store. This statement is not always true because it depends on the write policy of the cache: under a write-through policy the backing store is updated immediately, but under a write-back policy it is not.
Contributed by An Nguyen.
typedef struct {
char a;
int b;
char c;
double d;
} X;
What is the size of X? How can you reduce memory consumption of X? How small can X be?
a: 1 byte
padding: 3 bytes
b: 4 bytes
c: 1 byte
padding: 7 bytes
d: 8 bytes
=> Size of X is 24 bytes
To reduce memory consumption of X, put data in the struct in decreasing order of size.
typedef struct {
double d;
int b;
char a;
char c;
} X;
Now, size of X is 16 bytes.
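The two layouts can be verified with sizeof. The 24 and 16 figures assume a typical 64-bit ABI (4-byte-aligned int, 8-byte-aligned double); padding is implementation-defined in general:

```c
#include <stddef.h>

/* Original order: 3 bytes of padding after a, 7 bytes after c -> 24 bytes. */
typedef struct { char a; int b; char c; double d; } X_orig;

/* Decreasing-size order: the two chars pack in after the int -> 16 bytes. */
typedef struct { double d; int b; char a; char c; } X_packed;
```

On x86-64, sizeof(X_orig) is 24 and sizeof(X_packed) is 16.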
Contributed by Andrew Ding.
#include <iostream>
int rotateRight(int x, int n) {
    int shift = (!n + ~0) & (32 + ~n + 1); // shift = 32 - n, forced to 0 when n == 0
    int mask1 = (x >> n) & ~(~0 << shift); // shifted value with sign-extended bits cleared
    int mask2 = x << shift;                // the n low bits rotated around to the top
    return mask1 | mask2;
}
Contributed by Marko Sevo.
Are the following thread-safe functions in the Rio I/O package also reentrant?
ssize_t rio_readn(int fd, void *usrbuf, size_t n);
ssize_t rio_writen(int fd, void *usrbuf, size_t n);
Just because the functions in the Rio I/O package are thread-safe does not mean that they are reentrant. These functions are definitely not explicitly reentrant because they both take pointers as arguments, but they are implicitly reentrant. This means that if they are used correctly and no shared data is passed as arguments (global data, static data, etc.), then they are reentrant. However, since reentrancy is a property of both the caller and the callee, if shared data is passed into the functions, then they are not reentrant.
Contributed by Max (Yingbo) Wang.
Consider the new memory allocations on each line in the following function (compiled as C++). Is the memory allocated on the stack or the heap? After the function returns, is there a memory leak? If so, which line of code is causing the leak?
int cs33Rocks()
{
char *ptr; // 1
char buffer[128]; // 2
ptr = new char[128]; // 3
return 0;
}
A. stack, stack, heap; #2 is causing a memory leak
B. stack, stack, heap; #3 is causing a memory leak
C. stack, heap, heap; #3 is causing a memory leak
D. stack, heap, heap; no memory leak
Correct answer: B
Local variables, pointers, and function return addresses are stored in stack memory. The stack is managed automatically: all space allocated in the scope is reclaimed when the function returns. Memory allocated manually with the new operator or malloc goes on the heap instead, and the heap is where memory leaks occur. The developer is responsible for deallocating the space with delete[] or free when the object is no longer needed.
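A sketch of the leak-free variant of the function above: every new[] is paired with a delete[] before the pointer goes out of scope (the function name is ours):

```cpp
// Same allocations as cs33Rocks, but the heap block from line 3 is freed.
int cs33RocksFixed()
{
    char *ptr;            // 1: the pointer itself lives on the stack
    char buffer[128];     // 2: the array lives on the stack
    (void)buffer;         // silence the unused-variable warning
    ptr = new char[128];  // 3: 128 bytes allocated on the heap
    delete[] ptr;         // matching delete[], so no leak
    return 0;
}
```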
Contributed by Michael Mallett.
Four functions are shown in the code below. Find the one which is not explicitly reentrant.
#include <stdio.h>
int t;
// function a
void hello() {
printf(""Hello World"");
}
// function b
int f(int i) {
hello();
return i + 2;
}
// function c
void swap(int *x, int *y) {
t = *x;
*x = *y;
*y = t;
}
// function d
unsigned int fib_rec(unsigned int n) {
if (n == 0) {
return 0;
}
if (n == 1) {
return 1;
}
return fib_rec(n - 1) + fib_rec(n - 2);
}
Function c is not reentrant because it uses the global variable t.
A is thread-safe and reentrant: it accesses no shared data.
B is also reentrant, despite calling another function, because the function it calls is itself reentrant.
D is reentrant because, although recursive, it uses only its arguments and local stack frames.
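A reentrant rewrite of function c keeps the temporary on each caller's stack instead of in the global t:

```c
/* Reentrant swap: only arguments and a local, no shared state. */
void swap_reentrant(int *x, int *y) {
    int t = *x; /* each call gets its own t in its own stack frame */
    *x = *y;
    *y = t;
}
```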
Contributed by Tina Luo.
Write a function add(int x, int y) that returns the sum of two integers. The function cannot use any of the arithmetic operators (+, ++, --, -, etc.).
#include <stdio.h>
int add(int x, int y) {
while (y != 0) {
int carry = x & y;
x = x ^ y;
y = carry << 1;
}
return x;
}
Contributed by Yizhu Zhang.
Given an array of integers, every element appears twice except for one. Find that single one.
Given [2, 3, 5, 3, 5], returns 2. XOR-ing all the elements works because x ^ x = 0 and x ^ 0 = x, so the paired elements cancel and only the single element remains.
int singleNumber(int A[], int n) {
int result = 0;
for (int i = 0; i < n; i++)
result ^= A[i];
return result;
}
Contributed by Daniel Li.
Explain the two types of locality used in caching.
Temporal locality: data that we access now is likely to be accessed again in the near future. Spatial locality: if we access a piece of data, we are likely to access nearby data soon.
Contributed by Derek Xu.
Prof. Eggert has developed a new programming language that acts exactly like C except that it uses a new data type called bloat instead of double. A bloat is a number inspired by floating point and has the following structure:
+-----------+----------+----------+
| sign byte | mantissa | exponent |
+-----------+----------+----------+
|<--1 bit-->|<-32bits->|<-31bits->|
Assume that casting a bloat to a long will preserve the binary representation. Assume that bloats, like floats, can be in normalized, denormalized, or special forms. Assume any operations on bloats act the same as if performed on floats (including NaN behavior).
a) Can bloating point numbers represent all ints? Why or why not?
b) Can bloating point numbers represent all doubles? Why or why not?
c) You are tasked with writing a C function that typecasts longs into bloats. You do not need to consider rounding when finding the closest bloat representation. You may only use bitwise operations, addition, subtraction, while loops, if statements, comparison operators, and unions. Try to make your solution as efficient as possible, rather than using brute force (whatever that may be). HINT: 2^30 = 1073741824
Complete the following function:
bloat long2bloat (long x){
}
a) Yes: the bloat's mantissa is 32 bits and an int is 32 bits, so every int fits in the mantissa. Thus bloats can represent all ints.
b) No: the double's mantissa is 52 bits while the bloat's is 32 bits, so a double can represent numbers at a finer grain than a bloat can. The bloat's larger exponent field gives it more range than a double, but not enough precision to represent all doubles.
c)
typedef union {
long l;
bloat b;
} conversion;
bloat long2bloat (long x) {
conversion c;
c.l = 0;
//convert negative to positive 2s complement
if (x < 0) {
c.l = 0x8000000000000000;
//hard code LONG_MIN case (can use unsigned long too)
if (x != LONG_MIN)
x = (x^0xffffffffffffffff) + 1;
else {
c.l = 0x8000000043e00000;
return c.b;
}
}
//calculate exponent and make mantissa
long shifts = 0;
while ((x & 0x7fffffff00000000) != 0) {
x = x >> 1;
shifts--;
}
while ((x & 0x0000000100000000) == 0) {
x = x << 1;
shifts++;
}
//calculate bias: 2^30 - 1 = 1073741823
long exponent = 1073741823 + 32 - shifts;
//combine everything to form new number
c.l = c.l | exponent | ((x & 0x00000000ffffffff) << 31);
//use union to print out same binary representation
return c.b;
}
© 2017 UCLA UPE Tutoring Committee.