How to tell GCC to place an inline assembly instruction at a specific position? - gcc

I am working on the bootloader for a processor architecture that is based on ORPSoC. To execute a program, the bootloader loads it into memory and then jumps to the beginning of that program.
Now I need the custom instruction l.cust1 inserted in the delay slot of the jump. This instruction is implemented by the processor and activates decryption of the following instructions. That is the reason why it has to be placed in the delay slot. Any later, and the program could not be executed, as its instructions are encrypted. Similarly, if the decryption is activated too early, the bootloader crashes because it is not encrypted.
I am now wondering whether it is possible to tell GCC where to place the l.cust1 instruction. Currently I have to manually modify the bootloader binary accordingly.
Inserting inline assembly __asm__("l.cust1\n\t"); in the bootloader's C source code results in the instruction being added somewhere before the relevant jump:
1fc2e10: 9c 21 01 b4 l.addi r1,r1,436
1fc2e14: 70 00 00 00 l.cust1 # switching on decryption
1fc2e18: 18 40 01 ff l.movhi r2,0x1ff
1fc2e1c: 9c 72 ff ff l.addi r3,r18,-1
1fc2e20: a8 42 7c 94 l.ori r2,r2,0x7c94
1fc2e24: 9c 90 00 04 l.addi r4,r16,4
1fc2e28: 85 62 00 60 l.lwz r11,96(r2)
1fc2e2c: 48 00 58 00 l.jalr r11
1fc2e30: 9d c0 00 00 l.addi r14,r0,0
However, I need it to be located in the delay slot of the jump:
1fc2e10: 9c 21 01 b4 l.addi r1,r1,436
1fc2e14: 9d c0 00 00 l.addi r14,r0,0
1fc2e18: 18 40 01 ff l.movhi r2,0x1ff
1fc2e1c: 9c 72 ff ff l.addi r3,r18,-1
1fc2e20: a8 42 7c 94 l.ori r2,r2,0x7c94
1fc2e24: 9c 90 00 04 l.addi r4,r16,4
1fc2e28: 85 62 00 60 l.lwz r11,96(r2)
1fc2e2c: 48 00 58 00 l.jalr r11
1fc2e30: 70 00 00 00 l.cust1 # switching on decryption

Put the l.cust1 in the same inline assembler statement as the jump, which must be declared volatile as it has side effects and have "memory" in the clobber list as it depends on the contents of memory.

You will want to use asm volatile and an artificial dependency to the code that you wish the inline assembly to proceed. The example taken from the gcc documentation is below:
asm volatile ("mtfsf 255,%1" : "=X"(sum): "f"(fpenv));
sum = x + y;
According to the documentation volatile alone is not enough. In other words the compiler still might change the order of the following code:
asm volatile("mtfsf 255,%0" : : "f" (fpenv));
sum = x + y;
By adding the dependency you ensure the placement of the inline assembly. In the first code snippet by adding a dependency on sum it is ensured that the compiler will not reorder the code as it believes it will be generating a different result due to the dependency. You can find another example of this on this webpage in the "C code optimization" section.

Related

parse the following ibeacon packet

I am trying to parse this ibeacon packet received by scanning through a hci socket
b'\x01\x03\x00\x18\xbe\x99m\xf3\x14\x1e\x02\x01\x1a\x1a\xffL\x00\x02\x15e\xec\xe2\x90\xc7\xdbM\xd0\xb8\x1aV\xa6-b 2\x00\x00\x00\x02\xc5\xcc'
hex format 01 03 00 18 be 99 6d f3 14 1e 02 01 1a 1a ff 4c 00 02 15 65 ec e2 90 c7 db 4d d0 b8 1a 56 a6 2d 62 20 32 00 00 00 02 c5 cc
the parameters after applying the parser are
'UUID': '65ece290c7db4dd0b81a56a62d622032', 'MAJOR': '0000', 'MINOR': '0002', 'TX': -59, 'RSSI': -60
I am not sure if the RSSI portion of this parsing is right.
Referring to this https://stackoverflow.com/a/19040616/10355673
the last bit of the beacon advertising packet is the TX power value.
so how do we get the rssi value? here, I have taken rssi to be cc and tx to be c5. Is this correct?
There are flags headers before the manufacturer advertisement sequence shown below, but you really don't care about the flags. Here are the bytes you care about:
ff # manufacturee adv type
4c 00 # apple Bluetooth company code
02 15 # iBeacon type code
65 ec e2 90 c7 db 4d d0 b8 1a 56 a6 2d 62 20 32 # proximity uuid
00 00 # major
00 02 # minor
c5 # measured power (tx power)
cc # crc
Proximity UUUD: 65ece290-c7db-4dd0-b81a-56a62d622032,
Major: 0,
Minor: 2,
Measured Power: -59 dBm
The RSSI is not part of the transmitted packet, but a measurement taken by the receiver based on the strength of the signal. It will typically be a slightly different value for each packet that is received. You get this value from an API on a mobile device or embedded system that fetches it from the bluetooth chip.

What is the algorithm behind Cassandra's token function?

The Token function in my driver doesn't support a composite partition key, but it works very well with a single partition key, it takes a binary in 8 bits form as an input and pass it to murmur3 hash function and extract the 64-signed-little-integer (Token) from the result of murmur3 and ignore any extra binary buffer.
So my hope is to generate the binary for a composite partition key and then pass it to murmur3 as usual, an algorithm or bitwise operations will be really helpful or at least a source in any programming language.
I don't mean murmur3 part, only the token side which converts/mixes the composite partition key and outputs raw bytes in binary form.
Take a look at the drivers since they have generate the token to find the correct coordinator. https://github.com/datastax/java-driver/blob/8be7570a3c7fbba773ae2581bbf26e8196e7d6fb/driver-core/src/main/java/com/datastax/driver/core/Token.java#L112
Its slightly different than the typical murmur3 due to a bug when it was made and inability to change it without breaking existing clusters. So I would recommend copying it from them or better yet, use the existing drivers to find the token.
Finally I found a solution to my question :The Algorithm to compute the token for a composite partition key :
Primary_key((text, int)) -> therefore the partition key is a composite_partition_key (text, int).
Example : a row with composite_partition_key ('hello', 1)
Applying the algorithm :
1- lay-out the components of the composite partition key in big-endian (16 bits) presentation :
first_component = 'hello' -> 68 65 6c 6c 6f
sec_component = 1 -> 00 00 00 01
68 65 6c 6c 6f 00 00 00 01
2- add two-bytes length of component before each component
first_component = 'hello', length= 5-> 00 05 68 65 6c 6c 6f
sec_component = 1, therefore length= 4 -> 00 04 00 00 00 01
00 05 68 65 6c 6c 6f 00 04 00 00 00 01
3- add zero-value after each component
first_component = 'hello' -> 00 05 68 65 6c 6c 6f 00
sec_component = 1 -> 00 04 00 00 00 01 00
4- result
00 05 68 65 6c 6c 6f 00 00 04 00 00 00 01 00
now pass the result as whatever binary base your murmur3 function understand (make sure it's cassandra variant).

Is math.bits in Go using machine code instructions?

The new golang package "math/bits" provides useful functions. The source code shows how the function result are computed.
Are these functions replaced by the corresponding processor OP codes when available?
Yes, as stated in Go 1.9 Release Notes: New bit manipulation package:
Go 1.9 includes a new package, math/bits, with optimized implementations for manipulating bits. On most architectures, functions in this package are additionally recognized by the compiler and treated as intrinsics for additional performance.
Some related / historical github issues:
math/bits: an integer bit twiddling library #18616
proposal: cmd/compile: intrinsicify user defined assembly functions #17373
Provide a builtin for population counting / hamming distance #10757
Yes it does. You can see an example of it working here
first compile this
package main
import (
"fmt"
"math/bits"
)
func count(i uint) {
fmt.Printf("%d has %d length\n", i, bits.Len(i))
}
func main() {
for i := uint(0); i < 100; i++ {
count(i)
}
}
Then look at the disassembly. Note the BSR instruction - Bit Scan Reverse which from the manual: Searches the value in a register or a memory location (second operand) for the most-significant set bit.
0000000000489620 <main.count>:
489620: 64 48 8b 0c 25 f8 ff mov %fs:0xfffffffffffffff8,%rcx
489627: ff ff
489629: 48 3b 61 10 cmp 0x10(%rcx),%rsp
48962d: 0f 86 f2 00 00 00 jbe 489725 <main.count+0x105>
489633: 48 83 ec 78 sub $0x78,%rsp
489637: 48 89 6c 24 70 mov %rbp,0x70(%rsp)
48963c: 48 8d 6c 24 70 lea 0x70(%rsp),%rbp
489641: 48 8b 84 24 80 00 00 mov 0x80(%rsp),%rax
489648: 00
489649: 48 0f bd c8 bsr %rax,%rcx ; **** Bit scan reverse instruction ***
48964d: 48 89 44 24 48 mov %rax,0x48(%rsp)
489652: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
489659: 48 0f 44 c8 cmove %rax,%rcx
48965d: 48 8d 41 01 lea 0x1(%rcx),%rax

In WinDbg commands, how can I refer to a register with the same name as a global variable?

Suppose I want to debug this program using the WinDbg, cdb, or ntsd debuggers for Windows:
/* test.c */
#include <stdio.h>
int rip = 42;
int main(void)
{
puts("Hello world!");
return (0);
}
I compile the program for AMD64 and run it under WinDbg. I set a breakpoint at main(), and when the breakpoint hits, I want to inspect the value at the RIP register (program counter), and the memory around that value if the value is treated as a pointer.
I can see the value of the register directly with r rip, but when I try to look at the memory around that address, WinDbg shows me a different address! Having read the symbols in test.pdb, WinDbg sees that rip is a global variable declared in the C code and shows me the memory around &rip.
0:000> bu test!main
0:000> g
Breakpoint 0 hit
test!main:
00007ff6`de1868d0 4883ec28 sub rsp,28h
0:000> r rip
rip=00007ff6de1868d0
0:000> db rip
00007ff6`de1f2000 2a 00 00 00 ff ff ff ff-01 00 00 00 00 00 00 00 *...............
00007ff6`de1f2010 01 00 00 00 02 00 00 00-ff ff ff ff ff ff ff ff ................
00007ff6`de1f2020 00 00 00 00 00 00 00 00-43 46 92 e5 1b df 00 00 ........CF......
00007ff6`de1f2030 bc b9 6d 1a e4 20 ff ff-00 00 00 00 00 00 00 00 ..m.. ..........
00007ff6`de1f2040 00 01 00 00 00 00 00 00-ca b0 1e de f6 7f 00 00 ................
00007ff6`de1f2050 00 00 00 00 00 80 00 00-00 00 00 00 00 80 00 00 ................
00007ff6`de1f2060 d0 66 fc c2 f2 01 03 00-ab 90 ec 5e 22 c0 b2 44 .f.........^"..D
00007ff6`de1f2070 a5 dd fd 71 6a 22 2a 15-00 00 00 00 00 00 00 00 ...qj"*.........
0:000> ? rip
Evaluate expression: 140698265264128 = 00007ff6`de1f2000
0:000> ? dwo(rip)
Evaluate expression: 42 = 00000000`0000002a
This is really annoying, but as long as I'm aware of it, it isn't a problem when manually reading data like this. But if I want to use the register value, for example in scripting the debugger, then there is no easy workaround:
0:000> bu test!main ".if (dwo(rip) == 0n42) { .echo Whoops! I don't want to get here! }"
0:000> g
Whoops! I don't want to get here!
test!main:
00007ff6`de1868d0 4883ec28 sub rsp,28h
This problem, that symbols in the program hide register names, makes things really difficult for me. An actual scenario this broke:
I wanted to set a breakpoint on CreateFileW(), a very commonly called Windows API function.
Since I only cared about one particular file, I wanted to inspect the filename, which is passed in the RCX register, and continue past the breakpoint unless the filename matched the file I wanted.
But I couldn't write this condition, because another module in the program defined a symbol foobar!rcx, and any references to rcx I make in the command to execute on the breakpoint refer to that global variable!
So how do I tell WinDbg that yes, I really meant to read the register? And what if I want to write that register? There must be a simple thing I am missing here.
As noted in passing by another question, you can put an at sign (#) in front of a register name to force it to be interpreted as a register or pseudo-register, bypassing the attempt to parse it as a hexadecimal number or a symbol.
Registers and Pseudo-Registers in MASM Expressions
You can use registers and pseudo-registers within MASM expressions. You can add an at sign (#) before all registers and pseudo-registers. The at sign causes the debugger to access the value more quickly. This at sign is unnecessary for the most common x86-based registers. For other registers and pseudo-registers, we recommend that you add the at sign, but it is not actually required. If you omit the at sign for the less common registers, the debugger tries to parse the text as a hexadecimal number, then as a symbol, and finally as a register.

Re-attempting BASIC 6502 N-byte integer addition?

I initially (asked for help) and wrote a BASIC program in the 6502 pet emulator which added two n-byte integers. However, my feedback was that it was simply adding two 16 bit integers (not adding n-byte integers).
Can anyone help me understand this feedback by looking at my code and point me in the right direction to make a program that adds two n-byte integers?
Thank You for the collaboration!
Documentation:
Adds two n-byte integers using absolute indexed addressing. The addends begin at memory locations $0600, $0700 and the answer is at $0800. Byte length of the integers is at $0600 (¢ —> 256)
Machine Code:
18 a2 00 ac 00 06 bd 00
07 7d 00 08 9d 00 09 e8
00 88 00 d0
Op Codes, Documentation, Variables:
A1 = $0600
B1 = $0700
B2 = $0800
Z1 = $0900
[START] = $0500
CLC 18 // loads x with 0
LDX A2 00 // loads length on Y
LDY A1 AC 00 06 // load first operand
loop: LDA B1, x BD 00 07 // adds second operand
ADC B2, x 7D 00 08 // store result
STA Z1, x 9D 00 09 // go to next byte
INX E8 00 // count how many are left
DEY 88 00 // do more if needed
BNE loop D0
It looked to me like your code does what you claim -- adds two N byte operands in little-endian byte order. I vaguely remembered the various addressing modes of the 6502 from my misspent youth and the code seems fine. X is used to index the current byte from the two numbers, Y is a counter for the length of the operands in bytes and you loop over those bytes, stored at addresses 0x0700 and 0x0800 and write the result at address 0x0900.
Rather than get the Commodore 64 out of the attic and try it out I used an online virtual 6502 simulator. On this site we can set the memory address and load the byte values in. They even link to a page to assemble opcodes too. So setting the memory locations 0x0600 to "04" and both 0x0700 and 0x0800 to "04 03 02 01" we should see this code add these two 32 bit values (0x01020304 + 0x01020304 == 0x02040608).
Stepping through the code by clicking on the PC register and setting it to 0x0500 and then single stepping we see there is a bug in your machine code. After INX which compiles to E8 we hit a spurious 0x00 value(BRK) which terminates. The corrected code as below runs to completion and the expected value is seen by reading the memory at 0x0900.
0000 CLC 18
0001 LDX #$00 A2 00
0003 LDY $0600 AC 00 06
0006 LOOP: LDA $0700,X BD 00 07
0009 ADC $0800,X 7D 00 08
000C STA $0900,X 9D 00 09
000F INX E8
0010 DEY 88
0011 BNE LOOP: D0 F3
Memory dump:
:0900 08 06 04 02 00 00 00 00
:0908 00 00 00 00 00 00 00 00

Resources