How many cache hits and cache misses does this code generate? - caching

I have some code and I need to work out how many cache misses and cache hits occur when it is executed.
The cache has 1024 lines, organized in blocks of 4 words of 1 byte each. The cache is direct-mapped.
The code:
typedef unsigned char byte; // 1-byte element; "byte" is not a built-in C type
int N = 1024;
byte x[N], y[N];
for (int i = 0; i < N - 1; i++) {
    for (int j = 0; j < N; j++) {
        x[i] = x[i + 1] * y[j];
    }
}
Can someone help me?

Related

How do we calculate the number of reads/misses of the cache in this code snippet?

I'm trying to understand how to calculate the cache misses in the code below, an example given in a textbook (from the link on this page). I can see where the calculations come from, but because both loop bounds are the same (32), I cannot work out how to do the calculation when the bounds of the two loops differ. Using different-sized loops, what would the calculations be, please?
for (i = 32; i >= 0; i--) {
    for (j = 128; j >= 0; j--) {
        total_x += grid[i][j].x;
    }
}
for (i = 128; i >= 0; i--) {
    for (j = 32; j >= 0; j--) {
        total_y += grid[i][j].y;
    }
}
If we had a matrix with 128 rows and 24 columns (instead of the 32 x 32 in the example), using 32-bit integers, and with each memory block able to hold 16 bytes, how do we calculate the number of compulsory misses in the top loop?
Also, if we use a direct-mapped cache holding 256 bytes of data, how would we calculate the number of all the data cache misses when running the top loop?
Finally, if we flip it and use the bottom loop, how does the maths change (if it does) for the points above?
Apologies, this is all new to me; I just want to understand the maths behind it so I can answer the problem myself, rather than just be given an answer.
Nothing - it's a theoretical question

Counting Byte Occurrences in Read Files in BPF

I am relatively new to BPF and am trying to write a program that counts the occurrences of each byte value read from a file (later I will calculate the entropy).
The idea is to have two BPF_PERCPU_ARRAYs to get around the stack size limit. Into one, I copy the first 4096 bytes of the content of the written file; with the other, I count the occurrences of each possible byte value.
The described arrays are initialized like this:
struct data_t {
    u8 data[4096];
};
struct counter_t {
    u32 data[256];
};
BPF_PERCPU_ARRAY(storage, struct data_t, 1);     //to store the buffer
BPF_PERCPU_ARRAY(countarr, struct counter_t, 1); //to count occurrences
and used later in a function:
//look up element 0 of each per-CPU array
int zero = 0;
struct data_t *pstorage = storage.lookup(&zero);
struct counter_t *pcountarr = countarr.lookup(&zero);
//check that both lookups worked
if (!pstorage)
    return 0;
if (!pcountarr)
    return 0;
//copy data to storage
bpf_probe_read((void*)&pstorage->data, sizeof(pstorage->data), (void*)buf);
u8 tmpint = 0;
for(i = 0; i < 4095; i++){
    if (i == count){
        break;
    }
    tmpint = pstorage->data[i];
    //TROUBLE IS HERE
    //bpf_trace_printk("Current Byte: %d", (int)tmpint); //THIS IS LINE A
    //pcountarr->data[tmpint]++; //THIS IS LINE B
}
The last two lines that are commented out are the ones giving me trouble. Uncommenting line A gives me the error
invalid access to map value, value_size=4096 off=4096 size=1
R8 min value is outside of the allowed memory range
processed 102513 insns (limit 1000000) max_states_per_insn 4 total_states 981 peak_states 977 mark_read 459
with R8 (are R8 and R8_w the same?) being
R8_w=map_value(id=0,off=4096,ks=4,vs=4096,imm=0)
Doing the same with line B results in pretty much the same problem. At this point I'm fairly lost, given my experience, and wish I had posted this several days ago :D...
Any help is appreciated :)
You are assigning zero to i, but i is declared outside the loop: for(i = 0; i < 4095; i++){. I suspect that i is a signed type, so the verifier thinks it can have a negative minimum value. I would declare i as a u16 and see if that fixes the issue:
for(u16 i = 0; i < 4095; i++){

Minimize the number of page faults by loop interchange

Assume the page size is 1024 words and each row is stored in one page.
If the OS allocates 512 frames to the program and uses the LRU page-replacement algorithm, what will be the number of page faults in the following programs?
int A[][] = new int[1024][1024];
Program 1:
for (j = 0; j < A.length; j++)
    for (i = 0; i < A.length; i++)
        A[i][j] = 0;
Program 2:
for (i = 0; i < A.length; i++)
    for (j = 0; j < A.length; j++)
        A[i][j] = 0;
I assume that accessing the pages row by row is better than column by column; however, I cannot support my claim. Can you help me calculate the number of page faults?
One way to answer this is by simulation. You could change your loops to output the address of the assignment rather than setting it to zero:
printf("%p\n", &A[i][j]);
Then, write a second program that simulates page placement, so it would do something like:
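#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#define NWORKING_SET 512   /* number of page frames */
/* assumed: 1024-word pages with 4-byte ints */
static const uintptr_t pagesize = 1024 * 4;

/* read one address per line, as printed by printf("%p\n", ...) */
static int gethex(uintptr_t *h) {
    return scanf("%" SCNxPTR, h) == 1;
}

int main(void) {
    uintptr_t h;
    uintptr_t work[NWORKING_SET] = {0};   /* resident page numbers */
    int lru = 0;                           /* next slot to replace (round-robin) */
    int fault = 0;
    while (gethex(&h)) {
        h /= pagesize;                     /* address -> page number */
        int i;
        for (i = 0; i < NWORKING_SET && work[i] != h; i++) {
        }
        if (i == NWORKING_SET) {           /* not resident: page fault */
            work[lru] = h;
            fault++;
            lru = (lru+1) % NWORKING_SET;
        }
    }
    printf("%d\n", fault);
    return 0;
}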
With that program in place, you can try multiple traversal strategies. PS: my lru just happens to work; I'm sure you can do much better.
For the second program, the CPU accesses the first int in a row, causing a page fault, and then accesses the other ints in that row while the page is already present. This means that (if the rows start on page boundaries) you'll get one page fault per row, plus probably one when the program's code is first run, plus probably another when the program's stack is first used (and one more if the array isn't aligned on a page boundary). That probably works out to 1026 or 1027 page faults.
For the first program, the CPU accesses the first int in a row, causing a page fault, but by the time it accesses the second int in that same row the page has been evicted (it became the least recently used page and was replaced by a different one). This means that you'll get 1024*1024 page faults while accessing the array (plus one each for the program's code and stack). That probably works out to 1048578 page faults (as long as the start of the array is aligned to sizeof(int)).
However, this all assumes that the compiler doesn't optimize anything. In reality, it's extremely likely that any compiler worth using would have transformed both programs into something more like memset(array, 0, sizeof(int)*1024*1024), which does consecutive writes (possibly writing multiple ints in a single larger write if the underlying CPU supports that). This implies that both programs would probably cause 1026 or 1027 page faults.

NodeMCU - problem writing to an arbitrary location in memory

From the ESP8266 documentation:
You need to call EEPROM.begin(size) before you start reading or writing, size being the number of bytes you want to use. Size can be anywhere between 4 and 4096 bytes.
I had a problem reading/writing memory when the start address was not 0 but some other address xx, so I wrote a small test program to check it out.
First, this code:
EEPROM.begin(16);
for (int i = 0; i < 16; i++)
{
    EEPROM.write(i, i);
}
EEPROM.commit();
EEPROM.end();
This works; everything is written correctly.
But if I change the start address:
EEPROM.begin(20);
for (int i = 0; i < 20; i++)
{
    EEPROM.write(i+16, i);
}
EEPROM.commit();
EEPROM.end();
it only writes the first 4 bytes (to addresses 16 to 19), since in begin() I requested 20 bytes.
My question is: is this normal, or is it a bug? The documentation states
size being the number of bytes you want to use
so if I want to use only 20 bytes starting at an arbitrary address, should I write EEPROM.begin(20);?
If it's not a bug, how do I read from address 5000 when the maximum size for begin() is 4096?

Understanding Cache memory

I am trying to understand how cache memory handles reads and writes, and how to determine the hit and miss rates. I have read the textbook "Computer Systems: A Programmer's Perspective" over and over and can't seem to grasp the idea. Maybe someone can help me understand this:
I am working with a two-dimensional array with 480 rows and 640 columns. The cache is direct-mapped and 64 KB, with 4-byte lines. Below is the C code:
struct pixel {
    char r;
    char g;
    char b;
    char a;
};
struct pixel buffer[480][640];
register int i, j;
register char *cptr;
register int *iptr;
sizeof(char) == 1, so each element of the array (one struct pixel) consists of 4 bytes, if I am understanding that correctly. The buffer begins at memory address 0 and the cache is initially empty (cold cache). The only memory accesses are to the entries of the array; all other variables are stored in registers.
for (j = 0; j < 640; j++) {
    for (i = 0; i < 480; i++) {
        buffer[i][j].r = 0;
        buffer[i][j].g = 0;
        buffer[i][j].b = 0;
        buffer[i][j].a = 0;
    }
}
The code above initializes all the elements in the array to 0, so it is writing. I can see that this has bad locality because it writes the array column by column instead of row by row. Doesn't that affect the miss rate? I am trying to determine the miss rate for this code based on the cache size. I think the miss rate is 100%, and that if the traversal were row by row it would be 25%. But I don't fully understand how cache memory works, so can anyone tell me something that could help me understand this better?
I would recommend watching the whole tutorial if you are a beginner.
For your question, though, lectures 27 to 31 explain everything.
https://www.youtube.com/watch?v=tGarzP488Wc&index=29&list=PL2F82ECDF8BB71B0C
IISc Bangalore.
