Enable Enhanced Instruction Set for a single function/file - visual-studio

Is it possible to enable an enhanced instruction set (SSE/AVX) for a single function or file within a Visual Studio project? I'd like to have multiple versions of a function that target different instruction sets, all within the same output binary.

It is not possible to enable a custom instruction set for a single function or for an arbitrary file. However, you can enable one for a single translation unit, which is usually a .c/.cpp file. Note that code in a header is compiled with the instruction set of whichever translation unit includes it, so it may differ between .cpp files.
I suppose that if you compile different .cpp files with different instruction sets, you can then link them together and the resulting binary will work. It is important, though, to ensure that calling conventions are compatible everywhere. I think they would be, unless you use something like __vectorcall (which, by the way, requires at least SSE2).
If you want to compile some functions for multiple instruction sets, you might want to look at this question. In general, this technique is called "CPU dispatch".
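As a rough illustration of CPU dispatch, here is a minimal sketch in C. The names cpu_has_avx2, sum_baseline, sum_avx2 and select_sum are made up for this example. Both function bodies are plain scalar stand-ins kept in one file so the sketch compiles on its own; in a real project the AVX2 variant would sit in its own source file built with /arch:AVX2, and only the selector would be compiled with the default settings.

```c
#include <assert.h>
#include <stddef.h>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#include <immintrin.h>
/* MSVC: query CPUID and XCR0 directly. */
static int cpu_has_avx2(void) {
    int info[4];
    __cpuid(info, 0);
    if (info[0] < 7) return 0;                 /* CPUID leaf 7 not available */
    __cpuid(info, 1);
    if ((info[2] & (1 << 27)) == 0) return 0;  /* ECX bit 27: OSXSAVE */
    if ((info[2] & (1 << 28)) == 0) return 0;  /* ECX bit 28: AVX */
    if ((_xgetbv(0) & 0x6) != 0x6) return 0;   /* OS saves XMM and YMM state */
    __cpuidex(info, 7, 0);
    return (info[1] & (1 << 5)) != 0;          /* EBX bit 5: AVX2 */
}
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* GCC/Clang on x86: the compiler provides a feature query builtin. */
static int cpu_has_avx2(void) {
    return __builtin_cpu_supports("avx2") != 0;
}
#else
/* Unknown compiler or non-x86 target: always use the baseline. */
static int cpu_has_avx2(void) { return 0; }
#endif

typedef float (*sum_fn)(const float *data, size_t n);

/* Baseline variant: would be built with the default instruction set. */
static float sum_baseline(const float *data, size_t n) {
    assert(data != NULL || n == 0);
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

/* Stand-in for the AVX2 variant (scalar placeholder body).
   Assumption: the real version lives in a file compiled with /arch:AVX2. */
static float sum_avx2(const float *data, size_t n) {
    assert(data != NULL || n == 0);
    float even = 0.0f, odd = 0.0f;
    size_t i = 0;
    for (; i + 1 < n; i += 2) { even += data[i]; odd += data[i + 1]; }
    if (i < n) even += data[i];
    return even + odd;
}

/* Pick the implementation once, based on what the running CPU supports. */
static sum_fn select_sum(void) {
    return cpu_has_avx2() ? sum_avx2 : sum_baseline;
}
```

A caller would store the result of select_sum() in a function pointer at startup and call through it afterwards. One thing to watch for with this layout: inline functions from shared headers may be emitted in several translation units, and the linker keeps only one copy, so it is safer to keep the AVX-compiled files free of shared inline code.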

Related

How can I set optimization level per file in Xcode?

I'm writing some performance critical Swift code that I'm sure is safe to be optimized with -Ounchecked. I'd like the rest of the code to be compiled with a less aggressive optimization.
I can set compiler settings per file as per the answer here: Specific compiler flags for specific files in Xcode
How can I use that knowledge to set a specific file in my project to one of Swift's various optimization levels? (i.e. what compiler settings are available to me and how can I use them)
I am not sure whether this is an answer to your question or just a side note, but you can disable or enable optimization for a specific function, not just per file, using the optimize() function attribute (a GCC extension):
void* __attribute__((optimize("O0"))) myfuncn(void* pointer) {
// unmodifiable compiler code
}
This ensures that your myfuncn() function will not be optimized.

Missing #InitializeRecord

I am working on a Delphi 7 project with a minimalistic system.pas/sysinit.pas.
When I try to use records in my project, the compiler reports this error:
System unit out of date or corrupted: missing '#InitializeRecord'
Since I am trying to program in pure Pascal with no RTL, is there a way to manually enable or call the initialization for the records?
Thank you for your help.
The Delphi compiler relies on some "intrinsic functions", which are called by the generated code.
For instance, when you define a record in your code, the Delphi compiler will generate a call to InitializeRecord, even if you do not use any RTL. The same applies to string and dynamic array handling.
So you won't be able to bypass or ignore those functions, since the compiler itself expects them to exist.
Delphi is not designed for stripping down the low-level RTL units. That said, I've done it in some cases:
For our LVCL units (similar to your expectations), our enhanced RTL files can be compiled in a stripped-down form when the LVCL conditional is defined;
For DWPL-based projects, targeting DOS with the Delphi compiler;
The TORO kernel.
FreePascal is much better at stripping down the system units. Since it even targets embedded systems, you can optionally strip string support, the FPU, or even the whole heap handling.

How To Structure Large OpenCL Kernels?

I have worked with OpenCL on a couple of projects, but have always written the kernel as one (sometimes rather large) function. Now I am working on a more complex project and would like to share functions across several kernels.
But the examples I can find all show the kernel as a single file (very few even call secondary functions). It seems like it should be possible to use multiple files - clCreateProgramWithSource() accepts multiple strings (and combines them, I assume) - although pyopencl's Program() takes only a single source.
So I would like to hear from anyone with experience doing this:
Are there any problems associated with multiple source files?
Is the best workaround for pyopencl to simply concatenate files?
Is there any way to compile a library of functions (instead of passing in the library source with each kernel, even if not all are used)?
If it's necessary to pass in the library source every time, are unused functions discarded (no overhead)?
Any other best practices/suggestions?
Thanks.
I don't think OpenCL has a concept of multiple source files in a program - a program is one compilation unit. You can, however, use #include and pull in headers or other .cl files at compile time.
You can have multiple kernels in an OpenCL program - so, after one compilation, you can invoke any of the set of kernels compiled.
Any code not used - functions, or anything statically known to be unreachable - can be assumed to be eliminated during compilation, at some minor cost to compile time.
In OpenCL 1.2, you can compile sources separately and link the resulting object files together (clCompileProgram and clLinkProgram).
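To make the "shared helpers, several kernels, one program" idea concrete, here is a small host-side sketch in C. The helper and kernel sources are invented for illustration (square, square_all, sum_squares), and join_sources is a hypothetical utility, not part of OpenCL. It only shows how separate source strings can be combined into one compilation unit, which is the concatenation workaround mentioned for pyopencl.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared helper library, held as OpenCL C source text. */
static const char *helpers_src =
    "float square(float x) { return x * x; }\n";

/* Two kernels that both call the shared helper. */
static const char *kernels_src =
    "__kernel void square_all(__global float *v) {\n"
    "    size_t i = get_global_id(0);\n"
    "    v[i] = square(v[i]);\n"
    "}\n"
    "__kernel void sum_squares(__global const float *a,\n"
    "                          __global const float *b,\n"
    "                          __global float *out) {\n"
    "    size_t i = get_global_id(0);\n"
    "    out[i] = square(a[i]) + square(b[i]);\n"
    "}\n";

/* Join `count` source strings into one newly allocated string.
   The caller frees the result; returns NULL if allocation fails. */
static char *join_sources(const char *const *parts, size_t count) {
    assert(parts != NULL);
    size_t total = 1; /* room for the terminating NUL */
    for (size_t i = 0; i < count; ++i) total += strlen(parts[i]);

    char *joined = malloc(total);
    if (joined == NULL) return NULL;

    size_t offset = 0;
    for (size_t i = 0; i < count; ++i) {
        size_t len = strlen(parts[i]);
        memcpy(joined + offset, parts[i], len);
        offset += len;
    }
    joined[offset] = '\0';
    return joined;
}
```

With the C API the joining step is optional: the same array of strings can be handed to clCreateProgramWithSource directly, with a count of 2 and a NULL lengths argument for null-terminated strings. Either way the helpers must appear before the kernels that use them, just as in a single file.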

Creating a list similar to .ctors from multiple object files

I'm currently at a point where I need to link in several modules (basically ELF object files) to my main executable due to a limitation of our target (background: kernel, targeting the ARM architecture). On other targets (x86 specifically) these object files would be loaded at runtime and a specific function in them would be called. At shutdown another function would be called. Both of these functions are exposed to the kernel as symbols, and this all works fine.
When the object files are statically linked however there's no way for the kernel to "detect" their presence so to speak, and therefore I need a way of telling the kernel about the presence of the init/fini functions without hardcoding their presence into the kernel - it needs to be extensible. I thought a solution to this might be to put all the init/fini function pointers into their own section - in much the same way you'd expect from .ctors and .dtors - and call through them at the relevant time.
Note that they can't actually go into .ctors, as they require specific support to be running by the time they're called (specifically threads and memory management, if you're interested).
What's the best way of going about putting a bunch of arbitrary function pointers into a specific section? Even better - is it possible to inject arbitrary data into a section, so I could also store stuff like module name (a struct rather than a function pointer, basically). Using GCC targeted to arm-elf.
GCC attributes can be used to specify a section:
__attribute__((section("foobar")))
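A slightly fuller sketch of how this could be used for a module table follows. Everything named here (module_info, REGISTER_MODULE, the modtab section, the uart and timer modules) is hypothetical. It also assumes GNU ld on an ELF target, which defines __start_<name> and __stop_<name> symbols for sections whose names are valid C identifiers. The section stores pointers to descriptors rather than the descriptors themselves, so each entry is pointer-sized and the table can be walked as a plain array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-module descriptor; field names are illustrative. */
typedef struct module_info {
    const char *name;
    void (*init)(void);
    void (*fini)(void);
} module_info;

/* Define a descriptor and place a pointer to it in the "modtab" section. */
#define REGISTER_MODULE(ident, init_fn, fini_fn)                       \
    static const module_info module_desc_##ident =                     \
        { #ident, init_fn, fini_fn };                                   \
    static const module_info *module_ptr_##ident                       \
        __attribute__((used, section("modtab"))) = &module_desc_##ident

/* Provided by GNU ld for a section named "modtab" (assumption: ELF target). */
extern const module_info *__start_modtab[];
extern const module_info *__stop_modtab[];

/* Two stand-in modules that just record whether they are initialized. */
static int uart_ready, timer_ready;
static void uart_init(void)  { uart_ready = 1; }
static void uart_fini(void)  { uart_ready = 0; }
static void timer_init(void) { timer_ready = 1; }
static void timer_fini(void) { timer_ready = 0; }

REGISTER_MODULE(uart, uart_init, uart_fini);
REGISTER_MODULE(timer, timer_init, timer_fini);

/* Call every registered init function; returns how many were called. */
static size_t run_module_inits(void) {
    size_t count = 0;
    for (const module_info **p = __start_modtab; p != __stop_modtab; ++p) {
        assert((*p)->init != NULL);
        (*p)->init();
        ++count;
    }
    return count;
}

/* Call every registered fini function, walking the table backwards. */
static size_t run_module_finis(void) {
    size_t count = 0;
    for (const module_info **p = __stop_modtab; p != __start_modtab; ) {
        --p;
        (*p)->fini();
        ++count;
    }
    return count;
}
```

The kernel would call run_module_inits() once threads and memory management are up, and run_module_finis() at shutdown. With a custom linker script, the section may need an explicit KEEP(*(modtab)) entry and its own start/stop symbols, especially if unused sections are garbage-collected.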

Patching an EXE using IDA

Say there is a buggy program that contains a sprintf() call and I want to change it to snprintf() so it doesn't have a buffer overflow. How do I do that in IDA?
You really don't want to make that kind of change using information from IDA Pro.
Although IDA's disassembly is relatively high quality, it is not high quality enough to support executable rewriting. Converting a call to sprintf into a call to snprintf requires pushing a new argument onto the stack. That requires introducing a new instruction, which changes the effective address (EA) of everything that follows it in the executable image. Updating those effective addresses requires extremely high-quality disassembly. In particular, you need to be able to:
Identify which addresses in the executable are data, and which ones are code
Identify which instruction operands are symbolic (address references) and which instruction operands are numeric.
IDA can't (reliably) give you that information. Also, if the executable is statically linked against the CRT, it may not contain snprintf, which would make performing the rewrite by hand VERY difficult.
There are a few potential workarounds. If there is sufficient padding available in (or after) the function making the call, you might be able to get away with rewriting only a single function. Alternatively, if you have access to the object files, and those object files were compiled with the /Gy switch (assuming you are using Visual Studio), then you may be able to edit the object file. However, editing the object file may still require substantial fix-ups.
Presumably, however, if you have access to the object files you probably also have access to the source. Changing the source is probably your best bet.
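For comparison, the source-level version of the fix is small. The sketch below uses a made-up helper, format_greeting, to show the extra size argument that snprintf takes. That extra argument is what the binary patch would have to push at the call site.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats a greeting into a caller-supplied buffer.
   Returns 1 if the whole text fit, 0 if it had to be truncated. */
static int format_greeting(char *dst, size_t dst_size, const char *name) {
    assert(dst != NULL && dst_size > 0 && name != NULL);

    /* Before: sprintf(dst, "Hello, %s!", name);   (no bound on the write) */
    /* After: the buffer size is passed as an additional argument.        */
    int needed = snprintf(dst, dst_size, "Hello, %s!", name);

    /* snprintf reports the length the full text would have had. */
    return needed >= 0 && (size_t)needed < dst_size;
}
```

When the text does not fit, snprintf still NUL-terminates the buffer, so the caller gets a truncated string instead of an overflow.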
