How do I retrieve high and low-order parts of a value from two registers in inline assembly?

How do I retrieve high and low-order parts of a value from two registers in inline assembly? - gcc

I'm currently working on a little game that can run from the boot sector of a hard drive, just for something fun to do. This means my program runs in 16-bit real mode, and I have my compiler flags set up to emit pure i386 code. I'm writing the game in C++, but I do need a lot of inline assembly to talk to the BIOS via interrupt calls. Some of these calls return a 32-bit integer, but stored in two 16-bit registers. Currently I'm doing the following to get my number out of the assembly:
auto getTicks = [](){
uint16_t ticksL{ 0 }, ticksH{ 0 };
asm volatile("int $0x1a" : "=c"(ticksH), "=d"(ticksL) : "a"(0x0));
return static_cast<uint32_t>( (ticksH << 16) | ticksL );
};
This is a lambda function I use to call this interrupt function which returns a tick count. I'm aware that there are better methods to get time data, and that I haven't implemented a check for AL to see if midnight has passed, but that's another topic.
As you can see, I have to use two 16-bit values, get the register values separately, then combine them into a 32-bit number the way you see at the return statement.
Is there any way I could retrieve that data into a single 32-bit number in my code right away avoid the shift and bitwise-or? I know that those 16-bit registers I'm accessing are really just the higher and lower 16-bits of a 32-bit register in reality, but I have no idea how to access the original 32-bit register as a whole.

I know that those 16-bit registers I'm accessing are really just the higher and lower 16-bits of a 32-bit register in reality, but I have no idea how to access the original 32-bit register as a whole.
As Jester has already pointed out, these are in fact 2 separate registers, so there is no way to retrieve "the original 32-bit register."
One other point: That interrupt modifies the ax register (returning the 'past midnight' flag), however your asm doesn't inform gcc that you are changing ax. Might I suggest something like this:
asm volatile("int $0x1a" : "=c"(ticksH), "=d"(ticksL), "=a"(midnight) : "a"(0x0));
Note that midnight is also a uint16_t.

As other answers suggest you can't load DX and CX directly into a 32-bit register. You'd have to combine them as you suggest.
In this case there is an alternative. Rather than using INT 1Ah/AH=0h you can read the BIOS Data Area (BDA) in low memory for the 32-bit DWORD value and load it into a 32-bit register. This is allowed in real mode on i386 processors. Two memory addresses of interest:
40:6C dword Daily timer counter, equal to zero at midnight;
incremented by INT 8; read/set by INT 1A
40:70 byte Clock rollover flag, set when 40:6C exceeds 24hrs
These two memory addresses are in segment:offset format, but would be equivalent to physical address 0x0046C and 0x00470.
All you'd have to do is temporarily set the DS register to 0 (saving the previous value), turn off interrupts with CLI retrieve the values from lower memory using C/C++ pointers, re-enable interrupts with STI and restore DS to the previously saved value. This of course is added overhead in the boot sector compared to using INT 1Ah/AH=0h but would allow you direct access to the memory addresses the BIOS is reading/writing on your behalf.
Note: If DS is set to zero already no need to save/set/restore it. Since we don't see the code that sets up the environment before calling into the C++ code I don't know what your default segment values are. If you don't need to retrieve both the roll over and timer values and only wish to get them individually you can eliminate the CLI/STI.

You're looking for the 'A' constraint, which refers to the dx:ax register pair as a double-wide value. You can see the full set of defined constraints for x86 in the gcc documentation. Unfortunately there are no constraints for any other register pairs, so you have to get them as two values and reassemble them with shift and or, like you describe.

Related

Documentation for MIPS predefined macros

When I compile a C code using GCC to MIPS, it contains code like:
daddiu $28,$28,%lo(%neg(%gp_rel(f)))
And I have trouble understanding instructions starting with %.
I found that they are called macros and predefined macros are dependent on the assembler but I couldn't find description of the macros (as %lo, %neg etc.) in the documentation of gas.
So does there exist any official documentation that explains macros used by GCC when generating MIPS code?
EDIT: The snippet of the code comes from this code.

This is a very odd instruction to find in compiled C code, since this instruction is not just using $28/$gp as a source but also updating that register, which the compiler shouldn't be doing, I would think.  That register is the global data pointer, which is setup on program start, and used by all code accessing near global variables, so it shouldn't ever change once established.  (Share a godbolt.org example, if you would.)
The functions you're referring to are for composing the address of labels that are located in global data.  Unlike x86, MIPS cannot load (or otherwise have) a 32-bit immediate in one instruction, and so it uses multiple instructions to do work with 32-bit immediates including address immediates.  A 32-bit immediate is subdivided into 2 parts — the top 16-bits are loaded using an LUI and the bottom 16-bits using an ADDI (or LW/SW instruction), forming a 2 instruction sequence.
MARS does not support these built-in functions.  Instead, it uses the pseudo instruction, la $reg, label, which is expanded by the assembler into such a sequence.  MARS also allows lw $reg, label to directly access the value of a global variable, however, that also expands to multiple instruction sequence (sometimes 3 instructions of which only 2 are really necessary..).
%lo computes the low 16-bits of a 32-bit address for the label of the argument to the "function".  %hi computes the upper 16-bits of same, and would be used with LUI.  Fundamentally, I would look at these "functions" as being a syntax for the assembly author to communicate to the assembler to share certain relocation information/requirements to the linker.  (In reverse, a disassembler may read relocation information and determine usage of %lo or %hi, and reflect that in the disassembly.)
I don't know %neg() or %gp_rel(), though could guess that %neg negates and %gp_rel produces the $28/$gp relative value of the label.
%lo and %hi are a bit odd in that the value of the high immediate sometimes is offset by +1 — this is done when the low 16-bits will appear negative.  ADDI and LW/SW will sign extend, which will add -1 to the upper 16-bits loaded via LUI, so %hi offsets its value by +1 to compensate when that happens.  This is part of the linker's operation since it knows the full 32-bit address of the label.
That generated code is super weird, and completely different from that generated by the same compiler, but 32-bit version.  I added the option -msym32 and then the generated code looks like I would expect.
So, this has something to do with the large(?) memory model on MIPS 64, using a multiple instruction sequence to locate and invoke g, and swapping the $28/$gp register as part of the call.  Register $25/$t9 is somehow also involved as the generated code sources it without defining it; later, prior to where we would expect the call it sets $25.
One thing I particularly don't understand, though, is where is the actual function invocation in that sequence!  I would have expected a jalr instruction, if it's using an indirect branch because it doesn't know where g is (except as data), but there's virtually nothing but loads and stores.
There are two additional oddities in the output: one is the blank line near where the actual invocation should be (maybe those are normal, but usually don't see those inside a function) and the other is a nop that is unnecessary but might have been intended for use in the delay slot following an invocation instruction.

Registers usage during compilation

I found information that general purpose registers r1-r23 and r26-r28 are used by the compiler to store local variables, but do they have any other purpose? Also which memory are this registers part of(cache/RAM)?
Finally what does global pointer gp in register r26 points to?

Also which memory are this registers part of(cache/RAM)?
Register are on-processors storage allowing a fast data transfer (2 reads/1 write per cycle). They store variables that can represent memory addresses, but, besides that, are completely unrelated to memory or cache.
I found information that general purpose registers r1-r23 and r26-r28 are used by the compiler to store local variables, but do they have any other purpose?
Registers are use with respect to hardware or software conventions. Hardware conventions are related to the instruction set architecture. For instance, the call instruction transfers control to a subroutine and stores return address in register r31 (ra). Very nasty things are likely to happen if you overwrite r31 register by any mean without precautions. Software conventions are supposed to insure a proper behavior if used consistently within software. They indicate which register have special use, which need to be saved when context switching, etc. These conventions can be changed without hardware modifications, but doing so will probably require changes in several software tools (compiler, linker, loader, OS, ...).
general purpose registers r1-r23 and r26-r28 are used by the compiler to store local variables
Actually, some registers are reserved.
r1 is used by asm for macro expansion. (sw)
r2-r7 are used by the compiler to pass arguments to functions or get return values. (sw)
r24-r25 can only be used by exception handlers. (sw)
r26-r28 hold different pointers (global, stack, frame) that are set either by the runtime or the compiler and cannot be modified by the programmer.(sw)
r29-r31 are hw coded returns addresses for subprograms or interrupts/exceptions. (hw)
So only r8-r23 can used by the compiler.
but do they have any other purpose?
No, and that's why they can be freely used by the compiler or programmer.
Finally what does global pointer in register r26 points to?
Accessing memory with load or stores have a based memory addressing. Effective address for ldx or stx (where 'x' is is b, bu, h, etc depending on data characteristics) is computed by adding a register and a 16 bits immediate. This only allows to go an an address within +/-32k of the content of register.
If the processor has the address of a var in a register (for instance the value returned by a malloc) the immediate allows to do a displacement to access fields in a struct, next array value, etc.
If the address is local or global, it must be computed by the program. Pointers registers are used to that purpose. Local vars addresses are computed by adding an immediate to the stack pointer (r27or sp).
Addresses of global or static vars are computed by adding an integer to the global pointer (r26 or gp). Content of gp corresponds to the start of the memory data segment and is initialized by the loader just before program execution and must not be modified. The immediate displacement with respect to the start of data segment is computed by the linker when it defines memory layout.
Note that this only allows to access 64k memory because of the 16 bits immediate width. If the size of global/static variables exceeds this value and a var is not within this range, a couple of instructions are required to enter the 32 bits of the address of the var before the data transfer. With gp this is not required and it is a way to provide a faster access to global variables.

What does "a GP/function address pair" mean in IA-64?

What does "a GP/function address pair" mean in Itanium C++ ABI? What does GP stand for?

Short explanation: gp is, for all practical means, a hidden parameter to all functions that comply with the Itanium ABI. It's a kind of this pointer to the global variables the function uses. As far as I know, no mainstream OS does it anymore.
GP stands for "globals pointer". It's a base address for data statically allocated by executables, and the Itanium architecture has a register just for it.
For instance, if you had these global variables and this function in your program:
int foo;
int bar;
int baz;
int func()
{
foo++;
bar += foo;
baz *= bar / foo;
return foo + bar + baz;
}
The gp/function pair would conceptually be &foo, &func. The code generated for func would refer to gp to find where the globals are located. The compiler knows foo can be found at gp, bar can be found at gp + 4 and baz can be found at gp + 8.
Assuming func is defined in an external library, if you call it from your program, the compiler will use a sequence of instructions like this one:
save current gp value to the stack;
load code address from the pair for func into some register;
load gp value from same pair into GP;
perform indirect call to the register where we stored the code address;
restore old gp value that we saved on the stack before, resume calling function.
This makes executables fully position-independent since they don't ever store absolute addresses to data symbols, and therefore makes it possible to maintain only one instance of any executable file in memory, no matter how many processes use it (you could even load the same executable multiple times within a single process and still only have one copy of the executable code systemwide), at the cost of making function pointers a little weird. With the Itanium ABI, a function pointer is not a code address (like it is with "regular" x86 ABIs): it's an address to a gp value and a code address, since that code address might not be worth much if it can't access its global variables, just like a method might not be able to do much if it doesn't have a this pointer.
The only other ABI I know that uses this concept was the Mac OS Classic PowerPC ABI. They called those pairs "transition vectors".
Since x86_64 supports RIP-relative addressing (x86 did not have an equivalent EIP-relative addressing), it's now pretty easy to create position-independent code without having to use an additional register or having to use "enhanced" function pointers. Code and data just have to be kept at constant offsets. Therefore, this part of the Itanium ABI is probably gone for good on Intel platforms.
From the Itanium Register Conventions:
8.2 The gp Register
Every procedure that references statically-allocated data or calls another procedure requires a pointer to its data segment in the gp register, so that it can access its static data and its linkage tables. Each load module has its own data segment, and the gp register must be set correctly prior to calling any entry point within that load module.
The linkage conventions require that each load module define exactly one gp value to refer to a location within its short data segment. It is expected that this location will be chosen to maximize the usefulness of short-displacement immediate instructions for addressing scalars and linkage table entries. The DLL loader will determine the absolute value of the gp register for each load module after loading its data segment into memory.
For calls within a load module, the gp register will remain unchanged, so calls known to be local can be optimized accordingly.
For calls between load modules, the gp register must be initialized with the correct gp value for the new load module, and the calling function must ensure that its own gp value is saved and restored.

Just a comment about this quote from the other answer:
It is expected that this location will be chosen to maximize the usefulness of short-displacement immediate instructions for addressing scalars and linkage table entries.
What this is talking about: Itanium has three different ways to put a value into a register (where 'immediate' here means 'offset from the base'). You can support a full 64 bit offset from anywhere, but it takes two instructions:
// r34 has base address
movl r33 = <my immediate>
;;
add r35 = r34, r35
;;
Not only does that take 2 separate clocks, it takes 3 instruction slots across 2 bundles to make that happen.
There are two shorter versions: add14 (also adds) and add22 (also addl). The difference was in the immediate size each could handle. Each took a single 'A' slot iirc, and completed in a single clock.
add14 could use any register as the source & target, but could only handle up to 14 bit immediates.
add22 could use any register as the target, but for source, only two bits were allocated. So you could only use r0, r1, r2, r3 as the source regs. r0 is not a real register - it's hardwired to 0. But using one of the other 3 as a local stack registers, means you can address 256 times the memory using simple offsets, compared to using the local stack registers. Therefore, if you put your global base address into r1 (the convention), you could access that much more local offsets before having to do a separate movl and/or modifying gp for the next section of code.

Why doesn't gcc handle volatile register?

I'm working on a timing loop for the AVR platform where I'm counting down a single byte inside an ISR. Since this task is a primary function of my program, I'd like to permanently reserve a processor register so that the ISR doesn't have to hit a memory barrier when its usual code path is decrement, compare to zero, and reti.
The avr-libc docs show how to bind a variable to a register, and I got that working without a problem. However, since this variable is shared between the main program (for starting the timer countdown) and the ISR (for actually counting and signaling completion), it should also be volatile to ensure that the compiler doesn't do anything too clever in optimizing it.
In this context (reserving a register across an entire monolithic build), the combination volatile register makes sense to me semantically, as "permanently store this variable in register rX, but don't optimize away checks because the register might be modified externally". GCC doesn't like this, however, and emits a warning that it might go ahead and optimize away the variable access anyway.
The bug history of this combination in GCC suggests that the compiler team is simply unwilling to consider the type of scenario I'm describing and thinks it's pointless to provide for it. Am I missing some fundamental reason why the volatile register approach is in itself a Bad Idea, or is this a case that makes semantic sense but that the compiler team just isn't interested in handling?

The semantics of volatile are not exactly as you describe "don't optimize away checks because the register might be modified externally" but are actually more narrow: Try to think of it as "don't cache the variable's value from RAM in a register".
Seen this way, it does not make any sense to declare a register as volatile because the register itself cannot be 'cached' and therefore cannot possibly be inconsistent with the variable's 'actual' value.
The fact that read accesses to volatile variables are usually not optimzed away is merely a side effect of the above semantics, but it's not guaranteed.
I think GCC should assume by default that a value in a register is 'like volatile' but I have not verified that it actually does so.
Edit:
I just did a small test and found:
avr-gcc 4.6.2 does not treat global register variables like volatiles with respect to read accesses, and
the Naggy extension for Atmel Studio detects an error in my code: "global register variables are not supported".
Assuming that global register variables are actually considered "unsupported" I am not surprised that gcc treats them just like local variables, with the known implications.
My test code looks like this:
uint8_t var;
volatile uint8_t volVar;
register uint8_t regVar asm("r13");
#define NOP asm volatile ("nop\r\n":::)
int main(void)
{
var = 1; // <-- kept
if ( var == 0 ) {
NOP; // <-- optimized away, var is not volatile
}
volVar = 1; // <-- kept
if ( volVar == 0 ) {
NOP; // <-- kept, volVar *is* volatile
}
regVar = 1; // <-- optimized away, regVar is treated like a local variable
if ( regVar == 0 ) {
NOP; // <-- optimized away consequently
}
for(;;){}
}

The reason you would use the volatile keyword on AVR variables is to, as you said, avoid the compiler optimizing access to the variable. The question now is, how does this happen though?
A variable has two places it can reside. Either in the general purpose register file or in some location in RAM. Consider the case where the variable resides in RAM. To access the latest value of the variable, the compiler loads the variable from RAM, using some form of the ld instruction, say lds r16, 0x000f. In this case, the variable was stored in RAM location 0x000f and the program made a copy of this variable in r16. Now, here is where things get interesting if interrupts are enabled. Say that after loading the variable, the following occurs inc r16, then an interrupt triggers and its corresponding ISR is run. Within the ISR, the variable is also used. There is a problem, however. The variable exists in two different versions, one in RAM and one in r16. Ideally, the compiler should use the version in r16, but this one is not guaranteed to exist, so it loads it from RAM instead, and now, the code does not operate as needed. Enter then the volatile keyword. The variable is still stored in RAM, however, the compiler must ensure that the variable is updated in RAM before anything else happens, thus the following assembly may be generated:
cli
lds r16, 0x000f
inc r16
sei
sts 0x000f, r16
First, interrupts are disabled. Then, the the variable is loaded into r16. The variable is increased, interrupts are enabled and then the variable is stored. It may appear confusing for the global interrupt flag to be enabled before the variable is stored back in RAM, but from the instruction set manual:
The instruction following SEI will be executed before any pending interrupts.
This means that the sts instruction will be executed before any interrupts trigger again, and that the interrupts are disabled for the minimum amount of time possible.
Consider now the case where the variable is bound to a register. Any operations done on the variable are done directly on the register. These operations, unlike operations done to a variable in RAM, can be considered atomic, as there is no read -> modify -> write cycle to speak of. If an interrupt triggers after the variable is updated, it will get the new value of the variable, since it will read the variable from the register it was bound to.
Also, since the variable is bound to a register, any test instructions will utilize the register itself and will not be optimized away on the grounds the compiler may have a "hunch" it is a static value, given that registers by their very nature are volatile.
Now, from experience, when using interrupts in AVR, I have sometimes noticed that the global volatile variables never hit RAM. The compiler kept them on the registers all the time, bypassing the read -> modify -> write cycle alltogether. This was due, however, to compiler optimizations, and it should not be relied on. Different compilers are free to generate different assembly for the same piece of code. You can generate a disassembly of your final file or any particular object files using the avr-objdump utility.
Cheers.

Reserving a register for one variable for a complete compilation unit is probably too restrictive for a compiler's code generator. That is, every C routine would have to NOT use that register.
How do you guarantee that other called routines do NOT use that register once your code goes out of scope? Even stuff like serial i/o routines would have to NOT use that reserved register. Compilers do NOT recompile their run-time libraries based on a data definition in a user program.
Is your application really so time sensitive that the extra delay for bringing memory up from L2 or L3 can be detected? If so, then your ISR might be running so frequently that the required memory location is always available (i.e. it doesn't get paged back down thru the cache) and thus does NOT hit a memory barrier (I assume by memory barrier you are referring to how memory in a cpu really operates, through caching, etc.). But for this to really be true the up would have to have a fairly large L1 cache and the ISR would have to run at a very high frequency.
Finally, sometimes an application's requirements make it necessary to code it in ASM in which case you can do exactly what you are requesting!

(8051) Check if a single bit is set

I'm writing a program for a 8051 microcontroller. In the first part of the program I do some calculations and based on the result, I either light the LED or not (using CLR P1.7, where P1.7 is the port the LED is attached to in the microcontroller).
In the next part of the program I want to retrieve the bit, perhaps store it somewhere, and use it in a if-jump instruction like JB. How can I do that?
Also, I've seen the instruction MOV C, P1.7 in a code sample. What's the C here?

The C here is the 8051's carry flag - called that because it can be used to hold the "carry" when doing addition operations on multiple bytes.
It can also be used as a single-bit register - so (as here) where you want to move bits around, you can load it with a port value (such as P1.7) then store it somewhere else, for example:
MOV C, P1.7
MOV <bit-address>, C
Then later you can branch on it using:
JB <bit-address>, <label>

Some of the special function registers are also bit addressable. I believe its all the ones ending in 0 or 8. Don't have a reference in front of me but you can do something like setb r0.1. That way if you need the carry for something you dont have to worry about pushing it and using up space on your stack.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio