mlx4 VF PF responsibilities - Query Mellanox - linux-kernel

I was going through the Mellanox driver (mlx4) and had difficulty working out which portions of the code are executed by the PF (physical function) driver and which by the VF (virtual function) driver in SR-IOV mode.
My confusion arises because I understood that QPs and CQs (and their creation and state management commands) are handled by the virtual function driver (VF driver),
and that the role of the physical function driver (PF driver) is just to take care of resource_tracker.c and ICM allocation.
But lately I think I may have misunderstood. There is code that is executed specifically depending on whether mlx4_is_master is true or false (indicating PF or VF).
There is also code that is not guarded by this test, which suggests it is executed in both cases (PF driver as well as VF driver).
Is my understanding correct? If so, is the QP, CQ, and Ethernet TX/RX functionality executed by both master and slave?
Is there any way to clearly separate the files used by the PF from the files used by the VF in the drivers/net/ethernet/mellanox/mlx4 sub-directory?
I would really appreciate any help or clarification in understanding this.
Thank you so much.
Best Regards,
Bob

If anyone is interested, this question was also posted and answered here:
http://linux-pci.vger.kernel.narkive.com/mxQuEb2w/mlx4-query-regarding-pf-vf-functionality-division

Related

Meaning of IRP_MJ_ACQUIRE_FOR_SECTION_SYNCHRONIZATION

I'm currently developing a minifilter driver from scratch.
Right now I'm just trying to understand how all of this works: which actions lead to which IRP events, and so on.
After some tests with the MiniSpy filter driver, I see these three major operations and can't figure out what they do:
IRP_MJ_ACQUIRE_FOR_SECTION_SYNCHRONIZATION
IRP_MJ_QUERY_INFORMATION
IRP_MJ_RELEASE_FOR_SECTION_SYNCHRONIZATION
I usually use this link: https://msdn.microsoft.com/en-us/library/windows/hardware/ff548630(v=vs.85).aspx
But I can't find ACQUIRE/RELEASE_FOR_SECTION_SYNCHRONIZATION there.
Can someone explain what they mean?
First of all you might want to check this out.
You can think of IRP_MJ_ACQUIRE_FOR_SECTION_SYNCHRONIZATION as the callback for CreateFileMapping. It essentially tells you that the FILE_OBJECT in question is about to have a section object created for it.
IRP_MJ_QUERY_INFORMATION is the file-system callback for ZwQueryInformationFile. Check that one out for more details on the various information classes and the structures behind the buffer for each class.
IRP_MJ_RELEASE_FOR_SECTION_SYNCHRONIZATION has no parameters. Consider it the equivalent of CloseHandle(SectionHandle). Check this.
Hope this clears things up.
Good luck.
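When testing with MiniSpy, one way to provoke the acquire/release pair from user mode is to map a file into memory. The sketch below is only an illustration under stated assumptions: on Windows, CPython's mmap module creates the mapping through CreateFileMapping, so running it against a file on a filtered volume would be expected to show up as the section-synchronization operations described above. The helper names make_sample_file and map_and_read are made up for this example.

```python
import mmap
import os
import tempfile


def make_sample_file(payload=b"hello section object"):
    """Create a small temporary file to map (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
    return path


def map_and_read(path, length=5):
    """Map the file read-only and return its first `length` bytes.

    On Windows the mapping is created via CreateFileMapping; on Linux
    the same call uses mmap(2), so no minifilter is involved there.
    """
    with open(path, "rb") as f:
        # Length 0 means "map the whole file"; the file must not be empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return bytes(view[:length])


if __name__ == "__main__":
    sample = make_sample_file()
    try:
        print(map_and_read(sample))
    finally:
        os.remove(sample)
```

Closing the mapping (leaving the inner with block) is the user-mode counterpart of closing the section handle.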

How to implement new instruction in linux KVM at unused x86 opcode

As part of understanding virtualization, I am trying to extend KVM and define a new instruction. The instruction will use a previously unused opcode.
Reference: ref.x86asm.net/coder32.html
Take an instruction like 'CPUID' (which causes a VM exit): I want to add a new instruction, say 'NEWCPUID', which is similar to 'CPUID' in privilege and is trapped by the hypervisor, but differs in its implementation.
After going through some online resources, I understand how to define new system calls, but I am not sure which files in the Linux source code I need to add the code for NEWCPUID to. Is there a better way than relying only on the 'find' command?
I am facing the following challenges:
1. Which places in the Linux source code do I need to add code to?
2. How can this new instruction be mapped to a previously unused opcode?
As I am completely new to this field and willing to learn, can someone briefly explain how to go about this task? I need a pointer in the right direction. If there is a reference, tutorial, or blog describing the process, it would be a great help!
Here are answers to some of your questions:
... but I am not sure about which all files in linux source code do I need to add the code for NEWCPUID?
A - The right place to add emulation for KVM is arch/x86/kvm/emulate.c. Take a look at how opcode_table[] is defined and at the hooks to the functions it executes. The basic idea is that the guest executes an undefined instruction such as "db 0xunused"; this results in an exit since the instruction is undefined. In KVM, you look at the rip from the VMCS/VMCB, determine whether it's an instruction KVM knows about (such as NEWCPUID), and then KVM calls x86_emulate_instruction().
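To picture the table-driven idea, here is a toy userspace model in Python. It is not KVM code and does not mirror the layout of opcode_table[]; the NEWCPUID byte sequence, the handler names, and the register dictionary are all hypothetical, chosen only to show how the bytes at the guest's rip can select an emulation handler.

```python
# Toy model of table-driven instruction emulation. Not KVM code.

CPUID_OPCODE = bytes([0x0F, 0xA2])     # the real two-byte CPUID encoding
NEWCPUID_OPCODE = bytes([0x0F, 0x04])  # placeholder bytes, assumed unused


def emulate_cpuid(regs):
    """Stand-in for the existing CPUID emulation."""
    regs["eax"] = 0x1


def emulate_newcpuid(regs):
    """Hypothetical handler for the new instruction."""
    regs["eax"] = 0xC0FFEE


# Maps opcode bytes to the function that emulates them.
OPCODE_TABLE = {
    CPUID_OPCODE: emulate_cpuid,
    NEWCPUID_OPCODE: emulate_newcpuid,
}


def handle_exit(guest_memory, rip, regs):
    """Emulate the instruction at `rip`; return the next rip or None.

    None means the opcode is unknown; a real hypervisor would inject
    an undefined-opcode exception into the guest at that point.
    """
    opcode = bytes(guest_memory[rip:rip + 2])
    handler = OPCODE_TABLE.get(opcode)
    if handler is None:
        return None
    handler(regs)
    return rip + len(opcode)  # skip past the emulated instruction


if __name__ == "__main__":
    memory = bytes([0x90, 0x0F, 0x04, 0x90])  # nop, NEWCPUID, nop
    registers = {"eax": 0}
    print(handle_exit(memory, 1, registers), hex(registers["eax"]))
```

In this model the guest would emit the placeholder bytes directly, which is the same role the "db" trick plays in the answer.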
...Is there a better way than only relying on 'find' command?
A - Yes, pick an example system call and then use a symbol cross reference such as cscope.
...n me in short how to go about this task?
A - As I mentioned in 1, first find a way for the guest to attempt to execute this unused opcode (such as the db trick). I think the assembler will try to reject unknown opcodes, so that is the first step. Second, check whether your instruction causes a VM exit. For this, you can use tracing. Tracing emits a lot of output, so you have to use some filter options. If tracing is overwhelming, simply printk something in vmx_handle_exit (vmx.c). Finally, find a way to hook into your custom function from here. KVM already has handle_exception() to handle guest exceptions; that would be a good place to insert your custom function. See how this function calls emulate_instruction to emulate an exception to be injected into the guest.
I have deliberately skipped some of the questions since I consider them essential to figure out yourself in the process of learning. BTW, I don't think this is the best way to understand virtualization. A better way might be to write your own userspace hypervisor that utilizes KVM services via /dev/kvm, or maybe just a standalone hypervisor.

HOW do I write from a Spartan6 to the Micron external Cellular RAM on the Nexys3 FPGA Board?

I have looked everywhere (the datasheet, the Xilinx website, Digilent, etc.) and can't find anything! I was able to use the Adept tool to verify that my Cellular RAM is functioning correctly, but I just can't find any stock VHDL controller code to write data to it and read data from it! Help!
Found this link but it's for asynchronous mode, which is not nearly fast enough:
http://embsi.blogspot.com/2013/01/how-to-use-cellular-ram-from-micron.html
Eventually found this on the Nexys 2 Digilent page, under "Onboard Memory controller reference design":
http://www.digilentinc.com/Products/Detail.cfm?Prod=NEXYS2
It's just a shame that this was not included with Nexys 3 details as it would have saved a lot of time!
Hopefully somebody else with this issue could at least find what I posted here and find it quickly...

make_request and queue limits

I'm writing a linux kernel module that emulates a block device.
There are various calls that can be used to tell the kernel the block size, so that it aligns and sizes every request to the driver accordingly. This is well documented in the book "Linux Device Drivers, 3rd Edition".
The book describes two methods of implementing a block device: using a "request" function, or using a "make_request" function.
It is not clear whether the queue limit calls apply when using the minimalistic "make_request" approach (which is also the more efficient one if the underlying device gains no real benefit from sequential over random I/O, which is the case for me).
I would really like the kernel to talk to me using 4K block sizes, but I see smaller bios hitting my make_request function.
My question is: should the blk_queue_limit_* calls affect the bio size when using make_request?
Thank you in advance.
I think I've found enough evidence in the kernel code that if you use make_request, you'll get correctly sized and aligned bios.
The answer is:
You must call blk_queue_make_request first, because it sets queue limits to defaults. After this, set queue limits as you'd like.
It seems that every part of the kernel that submits bios does check for validity, and it's up to the submitter to do these checks. I've found only incomplete validation in submit_bio and generic_make_request. But as long as no one does tricks, it's fine.
Since submitting correct bios is a policy, but it's up to the submitter to take care and no one in the middle checks, I think I have to implement explicit checks and fail the bad bios. Since it's a policy, it's fine to fail on violation, and since it's not enforced by the kernel, explicit checks are a good thing.
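The explicit check boils down to simple arithmetic, sketched here as a userspace model in Python rather than kernel code. It assumes the start position is given in 512-byte sectors and the driver wants 4K blocks; the function name bio_is_valid and its parameters are made up for this example, and a real driver would read the equivalent values from the bio it receives.

```python
SECTOR_SIZE = 512  # assumed unit of the start position, in bytes


def bio_is_valid(start_sector, size_bytes, block_size=4096):
    """Return True if the request is block-aligned and block-sized.

    Model only: a zero-length request (for example a flush without
    data) passes, because 0 is a multiple of every block size.
    """
    if size_bytes % block_size != 0:
        return False  # length is not a whole number of blocks
    start_byte = start_sector * SECTOR_SIZE
    return start_byte % block_size == 0  # start must sit on a block boundary


if __name__ == "__main__":
    # Sector 8 is byte offset 4096, so a 4096-byte request there is fine.
    print(bio_is_valid(8, 4096))
    # Sector 1 is byte offset 512, which is not 4K-aligned.
    print(bio_is_valid(1, 4096))
```

A driver applying this policy would fail any request for which the check returns False instead of trying to service it.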
If you want to read a bit more on the story, see http://tlfabian.blogspot.com/2012/01/linux-block-device-drivers-queue-and.html.

Snoop interprocess communications

Has anyone tried to create a log file of interprocess communications? Could someone give me a little advice on the best way to achieve this?
The question is not quite clear, and comments make it less clear, but anyway...
The two things to try first are ipcs and strace -e trace=ipc.
If you want to log all IPC (which seems very intensive), you should consider instrumentation.
There are a lot of good tools for this; check out Pin in particular, especially this section of the manual:
"In this example, we show how to do more selective instrumentation by examining the instructions. This tool generates a trace of all memory addresses referenced by a program. This is also useful for debugging and for simulating a data cache in a processor."
If you're doing some heavyweight tuning and analysis, check out TAU (Tuning and Analysis Utilities).
Communication to a kernel driver can take many forms. There is usually a special device file for communication, or there can be a special socket type, like NETLINK. If you are lucky, there's a character device to which read() and write() are the sole means of interaction - if that's the case then those calls are easy to intercept with a variety of methods. If you are unlucky, many things are done with ioctls or something even more difficult.
However, running 'strace' on the program that communicates with the kernel driver can reveal just about everything it does, though 'ltrace' might be more readable if the program happens to use libraries for the communication. By tuning the arguments to 'strace', you can probably get a dump which contains just the information you need:
First, just eyeball the calls and try to figure out the means of kernel communication
Then, add filters to strace call to log only the kernel communication calls
Finally, make sure strace logs the full strings of all calls, so you don't have to deal with truncated data
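The three steps above might end up as a strace invocation like the one assembled by this Python sketch. The options used are standard strace options (-f to follow child processes, -e trace= to filter system calls, -s to raise the string length limit, -o to write a log file); the program name ./my_tool, the chosen system calls, and the log file name are assumptions for illustration.

```python
import shlex


def build_strace_argv(program_argv,
                      syscalls=("ioctl", "read", "write"),
                      max_string=4096,
                      log_path="driver-trace.log"):
    """Build an strace command line that logs only the given syscalls."""
    return [
        "strace",
        "-f",                                  # follow child processes
        "-e", "trace=" + ",".join(syscalls),   # keep only these calls
        "-s", str(max_string),                 # avoid truncated strings
        "-o", log_path,                        # write the trace to a file
        *program_argv,
    ]


if __name__ == "__main__":
    argv = build_strace_argv(["./my_tool", "--probe"])
    # Print a copy-pasteable command; pass argv to subprocess.run to execute it.
    print(shlex.join(argv))
```

Start with a broad syscall list while eyeballing the output, then narrow it once the communication path is clear.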
The answers which point to IPC debugging are probably not relevant, as communicating with the kernel almost never has anything to do with IPC (at least not the various UNIX IPC facilities).
