error LNK2001: unresolved external symbol _fltused in WDK

I am trying to define a variable of type double in C code that will be used in the Windows kernel. The code compiles but fails at link time. I tried adding libcntpr.lib and also defining the __fltused variable in the code, but to no avail. I would really appreciate it if someone could explain how to make this work.

I don't know whether this still applies to the current WDK, but Walter Oney discourages the use of floating point in drivers here.
The problem is worse than just finding the right library,
unfortunately. The C compiler's floating point support assumes that it
will be operating in an application environment where you can
initialize the coprocessor, install some exception handlers, and then
blast away. It also assumes that the operating system will take care
of saving and restoring each thread's coprocessor context as required
by all the thread context switches that occur from then on.
These assumptions aren't usually true in a driver. Furthermore, the
runtime library support for coprocessor exceptions can't work because
there's a whole bunch of missing infrastructure.
What you basically need to do is write your code in such a way that
you initialize the coprocessor each time you want to use it (don't
forget KeSaveFloatingPointState and KeRestoreFloatingPointState). Set
things up so that the coprocessor will never generate an exception,
too. Then you can simply define the symbol __fltused somewhere to
satisfy the linker. (All that symbol usually does is drag in the
runtime support. You don't want that support because, as I said, it
won't work in kernel mode.) You'll undoubtedly need some assembly
language code for the initialization steps.
If you have a system thread that will be doing all your floating point
math, you can initialize the coprocessor once at the start of the
thread. The system will save and restore your state as necessary from
then on.
Don't forget that you can only do floating point at IRQL <
DISPATCH_LEVEL.
There's FINIT, among other things. If you're rusty on coprocessor
programming, my advice would be to tell your management that this is a
specialized problem that will require a good deal of study to solve.
Then fly off to Martinique for a week or so (after hurricane season,
that is) to perform the study in an appropriate environment.
Seriously, if you're unfamiliar with FINIT and other math coprocessor
instructions, this is probably not something you should be
incorporating into your driver.
There is also an interesting read from Microsoft: C++ for Kernel Mode Drivers: Pros and Cons
On x86 systems, the floating point and multimedia units are not
available in kernel mode unless specifically requested. Trying to use
them improperly may or may not cause a floating-point fault at raised
IRQL (which will crash the system), but it could cause silent data
corruption in random processes. Improper use can also cause data
corruption in other processes; such problems are often difficult to
debug.
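The save/restore bracket Oney describes can be sketched roughly as follows. This is only an illustration under stated assumptions, not a drop-in driver routine: `scale_by_ratio` and the `BUILDING_DRIVER` macro are hypothetical names, and the non-driver branch holds do-nothing stand-ins so the control flow can be compiled outside the WDK. The C-level spelling `_fltused` assumes the usual leading-underscore decoration on 32-bit x86, where the linker reports the symbol as `__fltused`.

```c
#ifdef BUILDING_DRIVER              /* hypothetical macro set by the driver build */
#include <ntddk.h>

/* Satisfies the linker without dragging in the user-mode FP runtime.
   Spelled _fltused in C; 32-bit x86 decorates it to __fltused. */
int _fltused = 0;
#else
/* Do-nothing stand-ins (assumptions) for compiling the sketch outside the WDK. */
#include <assert.h>
#include <stddef.h>
typedef int NTSTATUS;
typedef struct { int unused; } KFLOATING_SAVE;
#define NT_SUCCESS(s)            ((NTSTATUS)(s) >= 0)
#define STATUS_SUCCESS           ((NTSTATUS)0)
#define STATUS_INVALID_PARAMETER ((NTSTATUS)0xC000000D)
static NTSTATUS KeSaveFloatingPointState(KFLOATING_SAVE *s)
{
    assert(s != NULL);
    return STATUS_SUCCESS;
}
static NTSTATUS KeRestoreFloatingPointState(KFLOATING_SAVE *s)
{
    assert(s != NULL);
    return STATUS_SUCCESS;
}
#endif

/* Hypothetical helper: computes value * num / den in double precision.
   Must be called at IRQL < DISPATCH_LEVEL; assumes the result fits in a long. */
NTSTATUS scale_by_ratio(long value, long num, long den, long *out)
{
    KFLOATING_SAVE save;
    NTSTATUS status;

    if (out == 0 || den == 0)       /* avoid a divide-by-zero FP exception */
        return STATUS_INVALID_PARAMETER;

    status = KeSaveFloatingPointState(&save);
    if (!NT_SUCCESS(status))
        return status;              /* state not saved: do no FP work */

    *out = (long)((double)value * (double)num / (double)den);

    KeRestoreFloatingPointState(&save);
    return STATUS_SUCCESS;
}
```

The sketch deliberately keeps all floating-point work between the save and restore calls and rejects inputs that could raise an exception. It does not attempt the coprocessor initialization or exception masking that the quoted advice also calls for.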

Related

How to implement new instruction in linux KVM at unused x86 opcode

As part of understanding virtualization, I am trying to extend KVM and define a new instruction. The instruction will use a previously unused opcode.
Reference: ref.x86asm.net/coder32.html.
Take an instruction like 'CPUID' (which causes a VM exit). I want to add a new instruction, say 'NEWCPUID', which is similar to 'CPUID' in privilege and is trapped by the hypervisor, but differs in its implementation.
After going through some online resources, I was able to understand how to define new system calls, but I am not sure which files in the Linux source code I need to change to add NEWCPUID. Is there a better way than relying only on the 'find' command?
I am facing the following challenges:
1. Which places in the Linux source code do I need to add code to?
2. How can this new instruction be mapped to a previously unused opcode?
As I am completely new to this field and willing to learn, can someone briefly explain how to go about this task? I need a pointer in the right direction. A reference, tutorial, or blog describing the process would be a great help!
Here are answers to some of your questions:
... but I am not sure about which all files in linux source code do I need to add the code for NEWCPUID?
A - The right place to add emulation for KVM is arch/x86/kvm/emulate.c. Take a look at how opcode_table[] is defined and at the hooks to the functions that its entries execute. The basic idea is that the guest executes an undefined instruction such as "db 0xunused"; this results in an exit since the instruction is undefined. In KVM, you look at the rip from the VMCS/VMCB and determine whether it's an instruction KVM knows about (such as NEWCPUID), and then KVM calls x86_emulate_instruction().
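To make the table-driven idea concrete, here is a toy user-space sketch of opcode dispatch. It is not KVM's actual data structure or API: every name in it (`toy_vcpu`, `toy_emulate`, the `TOY_OP_*` constants) is hypothetical, and the value chosen for NEWCPUID is an arbitrary placeholder rather than a claim about which encodings are really unused.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest register file: only what this toy needs. */
struct toy_vcpu {
    uint32_t eax;
};

typedef int (*toy_handler)(struct toy_vcpu *vcpu);

/* CPUID is encoded as 0F A2; 0xA2 is its second byte. */
#define TOY_OP_CPUID    0xA2
/* Arbitrary placeholder, NOT a claim that this encoding is really unused. */
#define TOY_OP_NEWCPUID 0xFF

static int toy_em_cpuid(struct toy_vcpu *vcpu)    { vcpu->eax = 0x1; return 0; }
static int toy_em_newcpuid(struct toy_vcpu *vcpu) { vcpu->eax = 0x2; return 0; }

/* Table indexed by the second opcode byte; unlisted slots stay NULL. */
static const toy_handler toy_twobyte_table[256] = {
    [TOY_OP_CPUID]    = toy_em_cpuid,
    [TOY_OP_NEWCPUID] = toy_em_newcpuid,
};

/* Returns 0 if a handler ran, -1 if the opcode has none
   (the point where a hypervisor would treat the opcode as undefined). */
int toy_emulate(struct toy_vcpu *vcpu, uint8_t second_byte)
{
    toy_handler handler;

    assert(vcpu != NULL);
    handler = toy_twobyte_table[second_byte];
    if (handler == NULL)
        return -1;
    return handler(vcpu);
}
```

Adding an instruction in this model means writing one handler and filling one table slot, which is the pattern to look for when reading the real tables in emulate.c.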
...Is there a better way than only relying on 'find' command?
A - Yes, pick an example system call and then use a symbol cross reference such as cscope.
... can someone explain me in short how to go about this task?
A - As I mentioned in 1, first find a way for the guest to attempt to execute this unused opcode (such as the db trick). I think the assembler will try to reject unknown opcodes, so that is the first step. Second, check whether your instruction causes a VM exit. For this, you can use tracing. Tracing emits a lot of output, so you have to use some filter options. If tracing is overwhelming, simply printk something in vmx_handle_exit (vmx.c). Finally, find a way to hook your custom function in from there. KVM already has handle_exception() to handle guest exceptions; that would be a good place to insert your custom function. See how this function calls emulate_instruction to emulate an exception to be injected into the guest.
I have deliberately skipped some of the questions since I consider them essential to figure out yourself in the process of learning. BTW, I don't think this is the best way to understand virtualization. A better way might be to write your own userspace hypervisor that utilizes KVM services via /dev/kvm, or maybe just a standalone hypervisor.

GCC: In what way is visibility internal "pretty useless in real world usage"?

I am currently developing a library for QNX (x86) using GCC, and I want to make some symbols that are used exclusively inside the library invisible to other modules, notably to the code that uses the library.
This already works, but while researching how to achieve it, I found a very worrying passage in GCC's documentation (see http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Code-Gen-Options.html#Code-Gen-Options, explanation for flag -fvisibility):
Despite the nomenclature, default always means public; i.e., available
to be linked against from outside the shared object. protected and
internal are pretty useless in real-world usage so the only other
commonly used option is hidden. The default if -fvisibility isn't
specified is default, i.e., make every symbol public—this causes the
same behavior as previous versions of GCC.
I am very interested in why visibility "internal" is pretty useless in real-world usage. From what I have understood from another passage in GCC's documentation (http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Function-Attributes.html#Function-Attributes, explanation of the visibility attribute), visibility "internal" is even stronger (more useful for me) than visibility "hidden":
Internal visibility is like hidden visibility, but with additional
processor specific semantics. Unless otherwise specified by the psABI,
GCC defines internal visibility to mean that a function is never
called from another module. Compare this with hidden functions which,
while they cannot be referenced directly by other modules, can be
referenced indirectly via function pointers. By indicating that a
function cannot be called from outside the module, GCC may for
instance omit the load of a PIC register since it is known that the
calling function loaded the correct value.
Could anybody explain in depth?
If you just want to hide your internal symbols, just use -fvisibility=hidden. It does exactly what you want.
The internal flag goes much further than the hidden flag. It tells the compiler that ABI compatibility isn't important, since nobody outside the module will ever use the function. If some outside code does manage to call the function, it will probably crash.
Unfortunately, there are plenty of ways to accidentally expose internal functions to the outside world, including function pointers and C++ virtual methods. Plenty of libraries use callbacks to signal events, for example. If your program uses one of these libraries, you must never use an internal function as the callback. If you do, the compiler and linker won't notice anything wrong, and your program will have subtle, hard-to-debug crash bugs.
Even if your program doesn't use function pointers now, it might start using them years down the road when everyone (including you) has forgotten about this restriction. Sacrificing safety for tiny performance gains is usually a bad idea, so internal visibility is not a recommended project-wide default.
The internal visibility is more useful if you have some heavily-used code that you are trying to optimize. You can mark those few specific functions with __attribute__ ((visibility ("internal"))), which tells the compiler that speed is more important than compatibility. You should also leave a comment for yourself, so you remember to never take a pointer to these functions.
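As a small sketch of how the three visibility levels might be mixed in one library (the `mylib_*` names are hypothetical, and the code assumes GCC or a compatible compiler on an ELF target):

```c
#include <assert.h>
#include <stddef.h>

/* Hidden: callable anywhere inside the shared object, but not exported.
   Its address may still be handed out, e.g. as a callback. */
__attribute__((visibility("hidden")))
long mylib_sum(const int *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* Internal: additionally promises that no other module ever calls it,
   not even through a pointer. NEVER take the address of this function. */
__attribute__((visibility("internal")))
long mylib_div(long total, size_t count)
{
    return total / (long)count;
}

/* Default: exported, part of the library's public interface. */
__attribute__((visibility("default")))
long mylib_mean(const int *values, size_t count)
{
    assert(values != NULL && count > 0);
    return mylib_div(mylib_sum(values, count), count);
}
```

When building with -fvisibility=hidden, the explicit "hidden" attribute becomes redundant and only the exported functions need the "default" annotation.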
I cannot provide an in-depth answer, but I think that "internal" might be impractical because it is processor-dependent. You might get the expected behaviour on some systems, but on others you get only "hidden".

Which memory allocation functions can be called from the interrupt environment in AIX?

When writing an AIX kernel extension, xmalloc can be used only in the process environment.
Which memory allocation functions can be called from the interrupt environment in AIX?
Thanks.
The network memory allocation routines. Look in /usr/include/net/net_malloc.h. The lowest level is net_malloc and net_free.
I don't see much documentation in IBM's publications or on the internet. There are a few examples in various header files.
There is no public prototype that I can find for these.
If you look in net_malloc.h, you will see MALLOC and NET_MALLOC macros defined that call it. Then if you grep in all the files under /usr/include, you will see uses of these macros. From these uses, you can deduce the arguments to the macros and thus deduce the arguments to net_malloc itself. I would write one pass-through routine to net_malloc whose interface you control.
On your target system, do "netstat -m". The last bucket size you see will be the largest size you can request from net_malloc with the M_NOWAIT flag. M_WAIT can be used only at process time and waits for netm to allocate more memory if necessary. M_NOWAIT returns 0 if there is not enough pinned memory. At interrupt time, you must use M_NOWAIT.
There is no real checking for the "type" but it is good to pick an appropriate type for debugging purposes later on. The netm output from kdb shows the type.
In a similar fashion, you can figure out how to call net_free.
It's sad that IBM has chosen not to document this. An alternative way to get this information officially is to pay for an "ISV" question. If you are doing serious AIX development, you want to become an ISV / Partner. It will save you a lot of heartbreak. I don't know the cost, but it is within reach of small companies and even individuals.
This book is nice to have too.

Is UNALIGNED memory access required on LINUX (porting from Windows to Linux)

I am porting code from Windows to Linux (Red Hat Linux or Fedora). In the existing code, I find references of the form (datatype UNALIGNED*).
Can you please let me know:
1) Is UNALIGNED memory access required when porting to Linux?
2) If it is required, how can I achieve the same thing?
I have looked around for a Linux version and came across the use of arm/unaligned.h. When I try to include it, I get the error "No such file or directory".
Thanks.
With a recent gcc you might consider using __attribute__ ((__packed__))
But I suggest avoiding it when possible. The compiler does quite a good job of aligning fields, and the ABI might define rules for alignment.
You should understand why your source code uses UNALIGNED: is it because the data has an externally defined format, or is it for "performance" reasons? Leave the optimization to the compiler!
Alignment is a CPU restriction, not an OS thing. x86 CPUs can do unaligned accesses (with some performance penalty); many others will produce a bus error under the same Linux (or whatever) version if you try to load a word through something other than an aligned pointer.
The UNALIGNED keyword in MSVC is, on x86, a no-op as far as I know. On other architectures it will emit more complicated instruction sequences to make sure that the access completes successfully. Are you trying to find a gcc equivalent? I don't believe one exists.
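There may be no keyword-for-keyword equivalent, but two common ways of getting the same effect with GCC can be sketched as follows. The names `load_u32_unaligned` and `wire_header` are hypothetical, and the struct layout is only an example of an externally defined format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value from an address with no alignment guarantee.
   memcpy lets the compiler choose a safe instruction sequence for the target. */
uint32_t load_u32_unaligned(const void *src)
{
    uint32_t value;

    assert(src != NULL);
    memcpy(&value, src, sizeof value);
    return value;
}

/* Hypothetical wire format: without the attribute the compiler would
   normally insert padding after `tag` to align `length`. */
struct __attribute__((__packed__)) wire_header {
    uint8_t  tag;
    uint32_t length;   /* sits at offset 1, so it is misaligned */
};
```

The memcpy form is plain standard C and works on any CPU, while the packed attribute is a GCC extension that is mainly worth using when the layout is dictated from outside, as the first answer suggests.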

How Does AQTime Do It?

I've been testing out the performance and memory profiler AQTime to see if it's worthwhile spending those big $$$ for it for my Delphi application.
What amazes me is how it can give you source line level performance tracing (which includes the number of times each line was executed and the amount of time that line took) without modifying the application's source code and without adding an inordinate amount of time to the debug run.
The way that they do this so efficiently makes me think there might be some techniques/technologies used here that I don't know about that would be useful to know about.
Do you know what kind of methods they use to capture the execution line-by-line without code changes?
Are there other profiling tools that also do non-invasive line-by-line checking and if so, do they use the same techniques?
I've made an open source profiler for Delphi which does the same:
http://code.google.com/p/asmprofiler/
It's not perfect, but it's free :-). It also uses the Detours technique.
It stores every call (you must manually select which functions you want to profile),
so it can build an exact call history tree, including a time chart (!).
This is just speculation, but perhaps AQtime is based on a technology that is similar to Microsoft Detours?
Detours is a library for instrumenting
arbitrary Win32 functions on x86, x64,
and IA64 machines. Detours intercepts
Win32 functions by re-writing the
in-memory code for target functions.
I don't know about Delphi in particular, but a C application debugger can do line-by-line profiling relatively easily - it can load the code and associate every code path with a block of code. Then it can break on all the conditional jump instructions and just watch and see what code path is taken. Debuggers like gdb can operate relatively efficiently because they work through the kernel and don't modify the code, they just get informed when each line is executed. If something causes the block to be exited early (longjmp), the debugger can hook that and figure out how far it got into the blocks when it happened and increment only those lines.
Of course, it would still be tough to code, but when I say easily I mean that you could do it without wasting time breaking on each and every instruction to update a counter.
The long-since-defunct TurboPower also had a great profiling/analysis tool for Delphi called Sleuth QA Suite. I found it a lot simpler than AQTime, and it was also far easier to get meaningful results from. Might be worth trying to track down - eBay, maybe?
