Our team is currently working on a custom device.
There is a Cyclone V board with a COM Express amd64-based PC plugged into it. The board acts as a PCIe native endpoint. It powers on first, then powers the PC on. The PC runs Linux with kernel 4.10, plus some drivers and software that access PCI BAR0 via MMIO.
The system works flawlessly until the first reboot from the Linux terminal. On the next boot, MMIO read access is broken while MMIO write access is fine.
Let's say there are two offsets to read, A and B, with values 0xa and 0xb respectively. If we read bytes from these offsets, it seems as if the values retrieved are delayed by 8 read operations:
read A ten times - returns 0xa every time
read B eight times - returns 0xa every time
read B ten times - returns 0xb every time
read A once - returns 0xb
read B seven times - returns 0xb every time
read B once - returns 0xa
read B ten times - returns 0xb every time
If offsets A and B fall within the same 64-bit word, everything works as expected.
MMIO access is done via the readb/readw/readl/readq functions; which function is used does not affect the delay at all.
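For reference, the access pattern boils down to something like this (a minimal sketch; bar0 is the BAR0 mapping obtained with pci_iomap(), and OFFSET_A/OFFSET_B are hypothetical register offsets standing in for A and B):

void __iomem *bar0 = pci_iomap(pdev, 0, 0);
u8 a, b;
int i;

for (i = 0; i < 10; i++)
        a = readb(bar0 + OFFSET_A);   /* returns 0xa, as expected */
for (i = 0; i < 8; i++)
        b = readb(bar0 + OFFSET_B);   /* still returns 0xa: stale by 8 reads */
b = readb(bar0 + OFFSET_B);           /* only now returns 0xb */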
Subsequent reboots may fix the MMIO reads or break them again.
From the Linux point of view, mmiotrace shows the same picture with the broken data.
From the device's point of view, the SignalTap logic analyzer shows valid data values on the PCIe core bus.
We have no PCIe bus analyzer, so we have no way to inspect the data exchange between those two points.
What could cause such behaviour, and how can it be fixed?
I have a limited buffer in my microcontroller, so rather than read an entire sector, I'm trying to read N bytes from a sector on an SD card, send those N bytes to a target device (an FPGA), and repeat until the whole sector is read. To do that I have to deassert the chip select (CS) to the SD card and assert the CS to the FPGA. When I reassert CS on the SD card, I can't seem to read any more data from that sector, so I'm wondering if deasserting CS terminates that transaction on the SD card. I can't find anything in the CS specs to prove this, though.
When you first assert CS, the SD card (and every other SPI device I've ever used) expects to start a new transaction. The clock edge at which CS first asserts is used as a synchronization point; without it, the stream would just be an endless bitstream requiring some alternate synchronization method. The specs may not explicitly say that deasserting CS terminates a transaction, but the fact that asserting it starts a new one implies that it does.
I think standard-capacity SD (SDSC) cards will let you read blocks smaller than 512 bytes. That limits which cards you can use, but if that's acceptable, it's an option to consider. Otherwise your best bet (without modifying the hardware) is probably to read the block over and over, as many times as you need.
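If re-reading the block each time is acceptable, the loop might look roughly like this (a sketch only; sd_cmd17(), sd_wait_data_token(), spi_xfer_byte() and the CS/FPGA helpers are hypothetical names for whatever your HAL provides):

#define SECTOR_SIZE 512
#define CHUNK        64                        /* what fits in our buffer */

for (int off = 0; off < SECTOR_SIZE; off += CHUNK) {
        uint8_t buf[CHUNK];
        int i;

        sd_cs_assert();
        sd_cmd17(sector_addr);                 /* start a fresh single-block read */
        sd_wait_data_token();                  /* wait for the 0xFE start token */
        for (i = 0; i < off; i++)
                spi_xfer_byte(0xFF);           /* clock past bytes already forwarded */
        for (i = 0; i < CHUNK; i++)
                buf[i] = spi_xfer_byte(0xFF);  /* keep this window */
        for (i = off + CHUNK; i < SECTOR_SIZE + 2; i++)
                spi_xfer_byte(0xFF);           /* flush rest of block plus 2 CRC bytes */
        sd_cs_deassert();

        fpga_cs_assert();
        fpga_send(buf, CHUNK);                 /* forward the window to the FPGA */
        fpga_cs_deassert();
}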
I'm using an STM32F411 with the USB CDC library, and the maximum speed for this library is ~1 Mb/s.
I'm building a project with 8 microphones connected to an ADC line (this part works fine). I need a 16-bit signal, so I increase the effective resolution by summing the first 16 samples from one line (the ADC gives only a 12-bit signal). I need 96k 16-bit samples per line, which is 0.768M samples across all 8 lines. That amounts to 12000 Kb of data, but the STM32 has only 128 KB of SRAM, so I decided to send about 120 chunks of 100 Kb each within one second.
The conclusion is that I need ~11.72 Mb/s to send all of this.
The problem is that I can't do that, because CDC USB limits me to ~1 Mb/s.
The question is how to increase the USB speed to 12 Mb/s on the STM32F4. I need a hint or a library.
Or should I set up an "audio device" in CubeMX instead?
If the small 'b' in your question means bytes, the answer is: it is not possible, as your micro has a full-speed (FS) USB peripheral whose maximum signalling rate is 12 Mbits per second.
If it means bits, your ~1 Mb/s (bit) speed assumption is wrong, but you still will not reach a 12 Mbit/s payload transfer rate.
You may try to write your own USB class (only worthwhile if 'b' means bits), but I'm afraid you will not find a ready-made library, and you will also need to write the device driver on the host computer.
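For a sense of scale: even a perfect custom class cannot use the full 12 Mbit/s signalling rate. Assuming the usual full-speed bulk limit of 19 packets of 64 bytes per 1 ms frame, the theoretical payload ceiling is

19 * 64 bytes * 1000 frames/s = 1,216,000 B/s ≈ 9.7 Mbit/s

which is still short of the ~11.72 Mb/s the question asks for.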
I was wondering how exactly a CPU requests data in a computer. In a 32-bit architecture, I thought the CPU would put a destination on the address bus and receive 4 bytes on the data bus. I recently read about memory alignment and it confused me: apparently the CPU has to read memory twice to access an address that is not a multiple of 4. Why is that? The address bus would seem to let it access a non-multiple-of-4 address directly.
The address bus itself, even in a 32-bit architecture, is usually not 32 bits wide. For example, the Pentium's address bus was 29 bits. Given that the processor still has a full 32-bit address range, that means each slot in memory is eight bytes wide. Reading a value that straddles two of those slots takes two reads rather than one, and alignment prevents that from happening.
Other processors (including other implementations of the 32-bit Intel architecture) have different memory access word sizes, but the point is generic.
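To make that concrete, here is a small illustration (assuming, as in the Pentium example above, hardware that fetches aligned 8-byte slots; the program itself just shows which slots a misaligned value touches):

#include <stdint.h>
#include <string.h>
#include <stdio.h>

int main(void)
{
        uint8_t mem[16];                /* slot 0: bytes 0..7, slot 1: bytes 8..15 */
        uint32_t v;
        int i;

        for (i = 0; i < 16; i++)
                mem[i] = (uint8_t)i;

        /* A 4-byte value at address 6 occupies bytes 6..9, straddling the
         * slot boundary at 8: the hardware needs two aligned fetches.
         * The same value at address 4 would sit entirely in slot 0. */
        memcpy(&v, mem + 6, sizeof v);
        printf("%08x\n", v);            /* prints 09080706 on little-endian */
        return 0;
}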
I have a block device driver which is working, after a fashion. It is for a PCIe device, and I am handling the bios directly with a make_request_fn rather than using a request queue, since the device has no seek time. However, it still has per-transaction overhead.
When I read sequentially from the device, I get bios with many segments (generally my maximum of 32), each consisting of 2 hardware sectors (so 2 * 2k), and these are handled as one scatter-gather transaction to the device, saving a lot of signalling overhead. On a write, however, the bios each have just one segment of 2 sectors, so the operations take much longer in total. What I would like is to somehow make the incoming bios consist of many segments, or to merge bios sensibly together myself. What is the right approach here?
The current content of the make_request_fn is something along the lines of the following (a condensed code sketch follows the list):

1. Determine the read/write direction of the bio
2. For each segment in the bio, make an entry in a scatterlist with sg_set_page
3. Map this scatterlist to PCI with pci_map_sg
4. For every segment in the scatterlist, add it to a device-specific structure defining a multi-segment DMA scatter-gather operation
5. Map that structure to DMA
6. Carry out the transaction
7. Unmap the structure and SG DMA
8. Call bio_endio with -EIO if it failed and 0 if it succeeded
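Condensed into code, that flow looks roughly like this (a sketch only, matching the older kernel API implied by the two-argument bio_endio() in step 8; mydev_hw_xfer() is a hypothetical stand-in for the device-specific scatter-gather submission in steps 4-6):

static void mydev_make_req(struct request_queue *q, struct bio *bio)
{
        struct mydev *dev = q->queuedata;
        struct scatterlist sgl[MYDEV_BLOCK_MAX_SEGS];
        struct bio_vec bvec;
        struct bvec_iter iter;
        int dir = bio_data_dir(bio);                      /* step 1 */
        int nsegs = 0, mapped, err;

        sg_init_table(sgl, MYDEV_BLOCK_MAX_SEGS);
        bio_for_each_segment(bvec, bio, iter)             /* step 2 */
                sg_set_page(&sgl[nsegs++], bvec.bv_page,
                            bvec.bv_len, bvec.bv_offset);

        mapped = pci_map_sg(dev->pdev, sgl, nsegs,        /* step 3 */
                            dir == WRITE ? PCI_DMA_TODEVICE
                                         : PCI_DMA_FROMDEVICE);

        err = mydev_hw_xfer(dev, sgl, mapped, dir);       /* steps 4-6 */

        pci_unmap_sg(dev->pdev, sgl, nsegs,               /* step 7 */
                     dir == WRITE ? PCI_DMA_TODEVICE
                                  : PCI_DMA_FROMDEVICE);

        bio_endio(bio, err ? -EIO : 0);                   /* step 8 */
}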
The request queue is set up like:
#define MYDEV_BLOCK_MAX_SEGS 32
#define MYDEV_SECTOR_SIZE 2048
blk_queue_make_request(mydev->queue, mydev_make_req);
set_bit(QUEUE_FLAG_NONROT, &mydev->queue->queue_flags);   /* no seek penalty */
blk_queue_max_segments(mydev->queue, MYDEV_BLOCK_MAX_SEGS);
blk_queue_physical_block_size(mydev->queue, MYDEV_SECTOR_SIZE);
blk_queue_logical_block_size(mydev->queue, MYDEV_SECTOR_SIZE);
blk_queue_flush(mydev->queue, 0);                         /* no flush/FUA support */
blk_queue_segment_boundary(mydev->queue, -1UL);           /* no segment boundary restriction */
blk_queue_dma_alignment(mydev->queue, 0x7);               /* buffers must be 8-byte aligned */
Do the Windows built-in COM port drivers support non-standard baud rates? (Actually, does Windows even have a built-in driver for COM1 & COM2?)
The reason I ask is that I'm having trouble getting a reliable connection to a device that uses the unusual baud rate 5787. The device and PC talk briefly, then seem to lose the dialogue, then get it back again. Once a long message is sent, it gets lost at the other end, and a short time later the dialogue is back. This sounds to me like the classic baud rate mismatch: not quite close enough to be reliable, but close enough that some data gets through.
If I use an inexpensive PCI serial board, it works without problems. It's only the computers that use on-board serial that I've found don't work properly.
Baud rates in a PC are controlled by a UART and a crystal. The crystal frequency determines what baud rates the serial port can generate. The baud rate is typically generated by a divide-by-16 counter. The crystal frequency in a standard PC is normally 1.8432 MHz, and dividing that by 16 gives you 115200, which is usually the maximum the COM port can do.
Inside the UART is a divisor latch (accessed via the DLAB bit) that further divides this clock. So essentially, to get 5787 baud you're dividing 115200 by 5787, which gives 19.906687...
Since that's close to 20, you'd load the divisor latch with 20, and 115200 / 20 gives 5760. You're therefore probably getting 5760 baud out of the PC COM port. That's probably enough of a difference to cause the issue that you're seeing.
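The arithmetic, spelled out (using the divisor scheme described above):

unsigned divisor = (115200 + 5787 / 2) / 5787;   /* rounds to 20 */
unsigned actual  = 115200 / divisor;             /* = 5760 baud */
/* mismatch: (5787 - 5760) / 5787 ≈ 0.47% */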
No, the difference between 5760 and 5787 is nowhere near enough to explain any sort of problem. UARTs identify the start of a byte from the leading edge of the start bit, then sample the data in the middle of each bit. This means they tolerate baud rate errors up to the point where the predicted middle of a bit drifts onto an edge. That's a half-bit error across one full byte, because each byte ends with a stop bit, so there is a re-synchronisation event per byte. One half bit in ten bits (8 data, one start, one stop) is 5%. The difference between 5760 and 5787 is only about 0.5%, so it's miles inside the safe region.