Can somebody explain the usage of WRITE_ONCE and READ_ONCE?
Internally, WRITE_ONCE uses a volatile qualifier. Why?
How do WRITE_ONCE and READ_ONCE solve the cache-coherency problem?
What is the difference between *(volatile __u8_alias_t *) p and (volatile __u8_alias_t *) *p?
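On that last point, a minimal illustration of the difference (the typedef below is a hedged stand-in for the kernel's may_alias typedef, not the kernel's own definition):

#include <stdint.h>

typedef uint8_t __u8_alias_t; /* stand-in for the kernel's may_alias typedef */

uint8_t read_byte_once(void *p)
{
    /* *(volatile __u8_alias_t *)p casts the POINTER to
     * pointer-to-volatile and then dereferences it, so the load is a
     * volatile access the compiler must emit exactly once and cannot
     * merge, hoist, or reorder with other volatile accesses. */
    return *(volatile __u8_alias_t *)p;

    /* (volatile __u8_alias_t *)*p, by contrast, would dereference p
     * first (an ordinary, optimizable load) and then cast the loaded
     * VALUE to a pointer type -- a different operation entirely. */
}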
I am trying to implement a cache-based covert channel in C, but noticed something weird. The sender and the receiver share a physical address by mmap()ing the same file with the MAP_SHARED option. Below is the code for the sender process, which flushes an address from the cache to transmit a 1 and loads an address into the cache to transmit a 0. It also measures the latency of a load in both cases:
// computes latency of a load operation
static inline CYCLES load_latency(volatile void* p) {
    CYCLES t1 = rdtscp();
    load = *((int *)p);
    CYCLES t2 = rdtscp();
    return (t2 - t1);
}
void send_bit(int one, void *addr) {
    if (one) {
        clflush((void *)addr);
        load__latency = load_latency((void *)addr);
        printf("load latency = %d.\n", load__latency);
        clflush((void *)addr);
    }
    else {
        x = *((int *)addr);
        load__latency = load_latency((void *)addr);
        printf("load latency = %d.\n", load__latency);
    }
}
int main(int argc, char **argv) {
    if (argc == 2) {
        bit = atoi(argv[1]);
    }
    // transmit bit
    init_address(DEFAULT_FILE_NAME);
    send_bit(bit, address);
    return 0;
}
The load operation takes around 0 cycles on a cache hit and around 1000 cycles on a cache miss when issued by the same process.
The receiver program loads the same shared physical address and measures the latency during a cache hit or a cache miss; the code for it is shown below:
int main(int argc, char **argv) {
    init_address(DEFAULT_FILE_NAME);
    rdtscp();
    load__latency = load_latency((void *)address);
    printf("load latency = %d\n", load__latency);
    return 0;
}
(I ran the receiver manually after the sender process terminated)
However, the latency observed in this scenario is very different from the first case. The load operation takes around 5000-10000 cycles.
Both processes have been pinned to the same core-id using the taskset command. So if I'm not wrong, both processes should experience the load latency of the L1 cache on a cache hit and of DRAM on a cache miss. Yet these two processes experience very different latencies. What could be the reason for this observation, and how can I have both processes experience the same latency?
The initial access to an mmaped region will page-fault (lazy mapping/allocation by the kernel), unless you use mmap(MAP_POPULATE), or mlock, or touch some other cache line of the page first.
You're probably timing a page fault if you only do one time measurement per mmap, or per run of a whole program.
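To rule that out, you could prefault the mapping up front; a minimal sketch (map_shared_page is a hypothetical helper name, and MAP_POPULATE is a Linux-specific flag):

#include <fcntl.h>
#include <sys/mman.h>

/* Map one page of the shared file with MAP_POPULATE so the page tables
 * are filled eagerly and the first timed load doesn't pay for a page
 * fault. Touching the page once before timing, or mlock(), would also work. */
static void *map_shared_page(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;
    return mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_POPULATE, fd, 0);
}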
(Also, you don't seem to be doing anything to warm up the CPU frequency, so one core cycle could be many reference cycles. Some of the time for an L3 miss is fixed in terms of memory clock cycles, but another part of it scales with the core/uncore clock.)
Also note that unless you run the 2nd process immediately (e.g. from the same shell command), the OS will get a chance to put that core into a deep sleep. On Intel CPUs at least, that empties L1d and L2 so it can power them down in the deeper C states. Probably also the TLBs.
It's also strange that you cast away volatile in load = *((int *)p);
Assigning the load result to a global(?) variable inside the timed region is also pretty questionable; that could also soft page fault. If so, RDTSCP will have to wait for it, because the store can't retire.
(But on a TLB hit for the store, it doesn't have to wait for it to commit to cache, since there's no mfence before rdtscp. A store can retire while the data is still in the store buffer. In fact it must retire before the store-buffer entry is known to be non-speculative, so it can commit.)
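Putting those code-level points together, a sketch of a timing helper that keeps the access volatile and moves the store out of the timed region (CYCLES and rdtscp() are assumed from your existing code):

static inline CYCLES load_latency(volatile int *p, int *out)
{
    CYCLES t1 = rdtscp();
    int v = *p;        /* volatile load: can't be optimized away */
    CYCLES t2 = rdtscp();
    *out = v;          /* store to memory happens outside the timed region */
    return t2 - t1;
}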
Hi guys, I am new to coding and I am trying to monitor the voltage and resistance of a potentiometer using SPI communication on a Raspberry Pi 3. I found that code, but when I try to run it, the program gives me:
Problem transmitting spi data..ioctl: Invalid argument
I read the code carefully again, but I couldn't find anything wrong. Maybe I am missing something. Any help would be much appreciated. Thanks :) By the way, the code is here:
http://www.hertaville.com/interfacing-an-spi-adc-mcp3008-chip-to-the-raspberry-pi-using-c.html
You should try to fully initialize your SPI structure.
.........
spi[i].speed_hz = this->speed ;
spi[i].bits_per_word = this->bitsPerWord ;
spi[i].cs_change = 0;
// You should add these lines
spi[i].pad = 0;
spi[i].tx_nbits = 0;
spi[i].rx_nbits = 0;
It should help :)
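Equivalently, you could zero the whole transfer struct before filling in the fields you use, which also covers any fields added in newer kernel headers; a minimal sketch (prepare_transfers is a hypothetical helper name):

#include <string.h>
#include <linux/spi/spidev.h>

void prepare_transfers(struct spi_ioc_transfer *spi, size_t n)
{
    /* Zero everything first so every field the kernel checks
       (pad, tx_nbits, rx_nbits, ...) starts out as 0, then fill in
       only the members you actually use. */
    memset(spi, 0, n * sizeof *spi);
}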
In an attempt to recreate the getenvironment(..) C function of _winapi.c (direct link) in plain python using ctypes, I'm wondering how the following C code could be translated:
buffer = PyMem_NEW(Py_UCS4, totalsize);
if (! buffer) {
    PyErr_NoMemory();
    goto error;
}
p = buffer;
end = buffer + totalsize;
for (i = 0; i < envsize; i++) {
    PyObject* key = PyList_GET_ITEM(keys, i);
    PyObject* value = PyList_GET_ITEM(values, i);
    if (!PyUnicode_AsUCS4(key, p, end - p, 0))
        goto error;
    p += PyUnicode_GET_LENGTH(key);
    *p++ = '=';
    if (!PyUnicode_AsUCS4(value, p, end - p, 0))
        goto error;
    p += PyUnicode_GET_LENGTH(value);
    *p++ = '\0';
}
/* add trailing null byte */
*p++ = '\0';
It seems that the function ctypes.create_unicode_buffer(..) (doc, code) does something quite close, which I could reproduce if only I had access to the Py_UCS4 C type, or could be sure of its relation to some other type accessible to Python through ctypes.
Would c_wchar be a good candidate? It seems I can't make that assumption, as Python 2.7 could be compiled in UCS-2 mode if I'm right (source), and I guess Windows is really waiting for UCS-4 there... even if it seems that ctypes.wintypes.LPWSTR is an alias for c_wchar_p in cPython 2.7 (code).
For this question, it is safe to make the assumption that the target platform is python 2.7 on Windows if that helps.
Context (if it has some importance):
I'm delving for the first time into ctypes, attempting a plain-Python fix for cPython 2.7's bug affecting the Windows subprocess.Popen(..) implementation. This bug is marked as won't-fix. It prevents the use of unicode in command-line calls (as the executable name or arguments). This is fixed in Python 3, so I'm having a go at reverse-implementing in plain Python the actual cPython 3 implementation of the required CreateProcess(..) in _winapi.c, which in turn calls getenvironment(..).
This possible workaround was mentioned in the comments of this answer to a question related to subprocess.Popen(..) unicode issues.
This doesn't answer the part of the title about building a specifically UCS-4 buffer, but it gives a partial answer to the question in bold and manages to create a unicode buffer that seems to work on my current Python 2.7 on Windows (so maybe UCS-4 is not required).
So we are taking the assumption here that c_wchar is what Windows requires (whether it is UCS-4 or UCS-2 is not so clear to me yet, and it might not matter, but I admit to having very light confidence in my knowledge here).
So here is the Python code that reproduces the C code as requested in the question (note the loop variables must match the ones used in the format string):
## creation of a buffer of size totalsize
wenv = (c_wchar * totalsize)()
wenv.value = (unicode("").join([
    unicode("%s=%s\0") % (k, v)
    for k, v in env.items()])) + "\0"
This wenv can then be fed to CreateProcessW and this seems to work.
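One detail the snippet leaves implicit is how totalsize is computed; a sketch, assuming env maps unicode keys to unicode values:

# Each "key=value\0" entry contributes len(key) + len(value) + 2
# characters (the "=" and the terminating NUL); one more for the
# trailing NUL that ends the whole block.
totalsize = sum(len(k) + len(v) + 2 for k, v in env.items()) + 1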
I'm trying to get total system memory using GlobalMemoryStatusEx():
MEMORYSTATUSEX memory;
GlobalMemoryStatusEx(&memory);
#define PRINT(v) {printf("%s ~%.3fGB\n", (#v), ((double)v)/(1024.*1024.*1024.));}
PRINT(memory.ullAvailPhys);
PRINT(memory.ullTotalPhys);
PRINT(memory.ullTotalVirtual);
PRINT(memory.ullAvailPageFile);
PRINT(memory.ullTotalPageFile);
#undef PRINT
fflush(stdout);
But the result is very weird and makes no sense to me:
memory.ullAvailPhys ~1.002GB
memory.ullTotalPhys ~1.002GB
memory.ullTotalVirtual ~0.154GB
memory.ullAvailPageFile ~0.002GB
memory.ullTotalPageFile ~1.002GB
My total physical memory is 8GB, but none of the results is close to it. All values are much smaller.
Also, the 'total' values keep changing whenever I execute the program. For instance, here is another result:
memory.ullAvailPhys ~0.979GB
memory.ullTotalPhys ~0.979GB
memory.ullTotalVirtual ~0.154GB
memory.ullAvailPageFile ~0.002GB
memory.ullTotalPageFile ~0.979GB
What am I doing wrong?
This is the part you are missing:
MEMORYSTATUSEX memory = { sizeof memory };
MSDN:
dwLength
The size of the structure, in bytes. You must set this member before calling GlobalMemoryStatusEx.
If you had checked the value returned by GlobalMemoryStatusEx, you would have seen the problem from the error indication there.
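Put together, a minimal corrected sketch with error checking:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* dwLength is the first member, so this brace initializer sets it. */
    MEMORYSTATUSEX memory = { sizeof memory };

    if (!GlobalMemoryStatusEx(&memory)) {
        fprintf(stderr, "GlobalMemoryStatusEx failed: %lu\n", GetLastError());
        return 1;
    }
    printf("total physical ~%.3f GB\n",
           (double)memory.ullTotalPhys / (1024. * 1024. * 1024.));
    return 0;
}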
My task is to copy the first 255 bytes from an external EEPROM (24LC64) to the internal EEPROM of a PIC16F877 via the I2C bus. I've read AN1488, all the datasheets, and the MikroC guide (oh yes, I'm using MikroC), but I'm hopeless. My code tries to read something, but then, reading my PIC's EEPROM in the programmer (which can't read the 24LC64, so I don't even know what's on it, but there is definitely something, and it is different from what I'm getting), I find the whole internal EEPROM filled with "A2" or "A3". My guess is that it's that first address byte with which I'm addressing the 24LC64. Could you please inspect my code (it's quite small =)) and point out my mistakes?
char i;
unsigned short Data;

void main() {
    PORTB = 0;
    TRISB = 0;
    I2C1_Init(100000);
    PORTB = 0b00000010;
    for (i = 0x00; i < 0xFF; i++) {
        I2C1_Start();
        I2C1_Wr(0xA2); // being 1010 001 0
        // I'm getting the full internal EE filled with what's in the brackets above
        I2C1_Wr(0b00000000);
        I2C1_Wr(i);
        I2C1_Repeated_Start();
        I2C1_Wr(0xA3); // being 1010 001 1
        Data = I2C1_Rd(0);
        I2C1_Stop();
        EEPROM_write(i, Data); // How could that 1010 001 0 get into here???
        Delay_100ms();
    }
    PORTB = 0b00000000;
    while (1) {
    }
}
P.S. I've tried this with sequential read, but it "reads" (again that "A2"...) only the 1st byte, so I've posted this one.
P.P.S. I'm working in hardware, no Proteus involved.
P.P.P.S. I can't test writing, because I have only one 24LC64 with important info on it, so its WP pin is even pulled up to Vcc.
This isn't a specific answer but more of a checklist for I2C comms, since it's difficult to help with your problem without looking at a scope and without delving into the API calls that you've provided.
Check the address of your EEPROM. I2C uses a 7-bit address with a R/W bit appended to the end, so it's easy to make a mistake here (see the sketch after this list).
Check the command sequence that your EEPROM expects to receive for a "data read".
Check how the I2C_ API that you're using deals with acks from the EEPROM. They need to be handled somewhere (usually in an ISR) and it's not obvious where they're dealt with from your example.
Check that you've got the correct pull-ups on SDA and SCL as per the requirements of your design - they're needed for I2C to work.
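On the first point in particular: with all three address pins of the 24LC64 tied low, the control bytes would be 0xA0 (write) and 0xA1 (read) rather than 0xA2/0xA3. A hedged sketch of a single random read in MikroC terms (eeprom_read_byte is a hypothetical helper; adjust the control bytes to match how your A2..A0 pins are actually strapped):

unsigned short eeprom_read_byte(unsigned int addr) {
    unsigned short data;
    I2C1_Start();
    I2C1_Wr(0xA0);          // control byte 1010 000 0: select device, write mode
    I2C1_Wr(addr >> 8);     // address high byte (24LC64 has a 13-bit address)
    I2C1_Wr(addr & 0xFF);   // address low byte
    I2C1_Repeated_Start();
    I2C1_Wr(0xA1);          // control byte 1010 000 1: same device, read mode
    data = I2C1_Rd(0);      // read one byte, NACK to end the transfer
    I2C1_Stop();
    return data;
}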