Is it possible to force particular registers in inline assembly code?

I have the following assembly code:
__asm__ __volatile__ (
"1: subi %0, 1" "\n\t"
"brne 1b"
: "=d" (__count)
: "M" (__count));
which results in the following compiler output:
ce: 81 50 subi r24, 0x01 ; 1
d0: f1 f7 brne .-4 ; 0xce <main>
d2: 80 e0 ldi r24, 0x00 ; 0
d4: 90 e0 ldi r25, 0x00 ; 0
How can I achieve the following?
ce: 81 50 subi r16, 0x01 ; 1
d0: f1 f7 brne .-4 ; 0xce <main>
d2: 80 e0 ldi r16, 0x00 ; 0
Is it even possible to tell the compiler to use r16 instead of r24:r25? That way I can save the one cycle spent on the ldi r25, 0x00 line.
Thanks
Jack

This question is old and you most certainly already solved it, but for archiving purposes, let me answer it: yes, you can. Declare __count like this:
register <type> __count __asm__ ("r16");
And voilà! Using the GNU extension for explicit register variables, you've asked the compiler to place the C variable __count in r16. Note that GCC only guarantees this placement where the variable is used as an operand of an extended asm statement; elsewhere the compiler may keep the value in a different register.
This declaration should have local scope; at file scope the register is reserved, so the compiler will avoid using it in other functions.

Check out this: http://www.nongnu.org/avr-libc/user-manual/inline_asm.html#io_ops
It seems you can't force it to use a specific register. However, if you use "=a" instead of "=d", you restrict it to registers r16..r23, which should be what you want (you just don't want it to use the paired registers r24/r25).

Related

Multiplication table in VMLab / AVR

I am trying to figure out this question. I posted my code below. It doesn't work properly: it seems like it's not multiplying the two least significant nibbles. I don't know AVR very well.
Write AVR code that generates a multiplication table for SRAM addresses 0x0100 to 0x01FF. The value at each address is the product of the two least significant nibbles of the address. For example, at address 0x0123, the multiplicand is 3 and the multiplier is 2. Calculate the product (6 in this case) and store it at address 0x0123. The answer should be about 10-12 lines of code with a loop.
.include "C:\VMLAB\include\m168def.inc"
ldi r27, 0x01
ldi r26, 0x00
ldi r30, 0xff
main:
mov r16, r26
andi r16,0x0f
mov r17,r27
andi r17,0xf0
swap r17
mul r17, r16
st x+, r16
dec r30
brne main

According to the manual, the MUL instruction stores its result in R0 and R1. So you need to read R0 after MUL, not R16.

You have three errors in your code:
mul stores its result in the R1:R0 register pair, not in its operands.
If you use X as the index register and write the address symbolically as ABCD, then both C and D are in the XL register (r26), not in XH (r27). AB is constant (01).
Your loop runs only 255 times, not 256.
ldi xh, 0x01
ldi xl, 0x00
ldi r30, 0x00
main:
mov r16, xl ;put D to R16
andi r16,0x0f
mov r17, xl ;put C to R17
andi r17,0xf0
swap r17
mul r17, r16 ;multiply C x D
st x+, r0 ;store result low byte to memory
dec r30 ;repeat 256 times
brne main
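
To check what the corrected loop should leave in SRAM, a small host-side model can help. The following Python sketch is not AVR code; the function name and its parameters are made up for illustration. It computes the value expected at each address:

```python
def multiplication_table(base=0x0100, count=256):
    """Map each address to the product of its two low nibbles."""
    table = {}
    for addr in range(base, base + count):
        low = addr & 0x0F           # D: what andi r16,0x0f leaves in r16
        high = (addr >> 4) & 0x0F   # C: what andi r17,0xf0 + swap leaves in r17
        # The product is at most 15 * 15 = 225, so it always fits in R0
        # (the low byte of the R1:R0 result of mul).
        table[addr] = low * high
    return table

table = multiplication_table()
print(hex(table[0x0123]))  # the example from the question: 2 * 3
```

Comparing this table against the simulator's memory view is one way to confirm that all 256 addresses were written.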

How to setup two different led delays for a specific sequence?

I am working on a program that is supposed to create a certain blink sequence on my Arduino board (ATmega328P). The pattern that I am trying to create is:
ON for 1/2 second
OFF for 1/2 second
ON for 1/2 second
Off for one full second
Repeat this sequence.
I approached the problem by creating two different delays, one for the 1/2 sec and the other for the 1 sec, and then calling them.
If I only have one delay, the light works with that pattern, but once I put both delays in the loop together the light does not follow the pattern at all. I apologize if this is an easy question; I don't know if I am approaching this right.
Here is my code:
#include "config.h"
.section .data
dummy: .byte 0 ; dummy global variable
.section .text
.global main
.extern delay
.org 0x0000
main:
; clear the SREG register
eor r1, r1 ; cheap zero
out _(SREG), r1 ; clear flag register
; set up the stack
ldi r28, (RAMEND & 0x00ff)
ldi r29, (RAMEND >> 8)
out _(SPH), r29
out _(SPL), r28
; initialize the CPU clock to run at full speed
ldi r24, 0x80
sts CLKPR, r24 ; allow access to clock setup
sts CLKPR, r1 ; run at full speed
; set up the LED port
sbi LED_DIR, LED_PIN ; set LED pin to output
cbi LED_PORT, LED_PIN ; start with the LED off
; enter the blink loop
1: rcall toggle
rcall delay
rcall delay2
rjmp 1b
toggle:
in r24, LED_PORT ; get current bits
ldi r25, (1 << LED_PIN) ; LED is pin 5
eor r24, r25 ; flip the bit
out LED_PORT, r24 ; write the bits back
ret
delay: ; 1/2 sec delay loop
ldi r21, 41
ldi r22, 150
ldi r23, 127
1: dec r23
brne 1b
dec r22
brne 1b
dec r21
brne 1b
ret
delay2: ; 1 sec delay loop
ldi r18, 82
ldi r19, 43
ldi r20, 0
2: dec r20
brne 2b
dec r19
brne 2b
dec r18
brne 2b
ret
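
One way to read the blink loop above: each pass toggles the LED once and then waits through both delays back to back, so every LED state lasts roughly 1.5 s. The Python sketch below models that on the host. It is only an illustration: the `run` helper and the step lists are hypothetical, and the delays are assumed to be exactly 0.5 s and 1 s. It compares the posted loop with a loop body that would produce the requested pattern:

```python
HALF, FULL = 0.5, 1.0  # nominal delay lengths in seconds (assumed exact)

def run(steps, passes=1, led_on=False):
    """Return (led_on, seconds) intervals produced by repeating a loop body.

    Each step is either the string "toggle" or a delay length in seconds.
    """
    timeline = []
    for _ in range(passes):
        for step in steps:
            if step == "toggle":
                led_on = not led_on
            elif timeline and timeline[-1][0] == led_on:
                # Back-to-back delays in the same LED state merge into one.
                timeline[-1] = (led_on, timeline[-1][1] + step)
            else:
                timeline.append((led_on, step))
    return timeline

posted = ["toggle", HALF, FULL]  # rcall toggle / rcall delay / rcall delay2
wanted = ["toggle", HALF, "toggle", HALF, "toggle", HALF, "toggle", FULL]

print(run(posted, passes=2))
print(run(wanted))
```

In assembly terms, the second list corresponds to a loop body of four toggle calls, the first three followed by the half-second delay and the last one by the one-second delay.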

How to do analogRead() in AVR assembly language?

If I need to be specific: I'm asking about ATmega328P chip. The analog pins are under PortC on this chip.
I have learnt that digitalWrite can be done using out, and digitalRead using in.
But how can I do analogRead ?? Please explain. I'm new to this.
EXTRA: It would be helpful if you show analogWrite too (In the sense of PWM).

You can read the source code of analogRead from the Arduino environment:
https://github.com/arduino/ArduinoCore-avr/blob/master/cores/arduino/wiring_analog.c
The important thing is to find all the places where it reads or writes from a special function register (SFR) like ADMUX, and then make sure you do the same thing in your assembly code.
You should also look at the ATmega328P datasheet, which defines all of those SFRs, as a way to double check that you are doing the correct thing.
If you have further trouble, I recommend asking a new question where you show some code and get specific about exactly what part of analogRead is confusing to you.

This is for the future visitors who stumble upon here...
As mentioned by Rev1.0, Arduino C does make things too easy for you. A lot of complicated things are going on under the hood when you write a simple statement analogRead(). But it's not that complicated once you understand it. You should definitely read up on ADCs.
As mentioned by David Grayson, you should definitely take a look at the source code of analogRead(). Here is the datasheet of ATmega328P and the instruction set manual for ATmega328P to help you understand what is going on.
You can read this and this to get some idea on how to exactly write the code.
Now, here is what I came up with for my use-case in my project.
The bold-face words are there to tell you that this code was NOT written for a general use-case. Copy-Pasting this will most probably not work.
You see the amount of links in this post? Read all of them. The code below is only a reference in case you get stuck.
adcInit:
ldi r16, 0b01100000 ; Voltage Reference: AVcc with external capacitor at AREF pin
sts ADMUX, r16 ; Enable ADC Left Adjust Result
; Analog Channel: ADC0
ldi r16, 0b10000101 ; Enable ADC
sts ADCSRA, r16 ; ADC Prescaling Factor: 32
ret
adcRead:
ldi r16, 0b01000000 ; Set ADSC flag to Trigger ADC Conversion process
lds r17, ADCSRA ;
or r17, r16 ;
sts ADCSRA, r17 ;
ret
adcWait:
lds r17, ADCSRA ; Observe the ADIF flag, it gets set by hardware when ADC conversion completes
sbrs r17, 4 ;
jmp adcWait ; Keep checking until the flag is set by hardware
ldi r16, 0b00010000 ; Set the flag again to signal 'ready-to-be-cleared' by hardware
lds r17, ADCSRA ;
or r17, r16 ;
sts ADCSRA, r17 ;
ret
It is used like this:
call adcInit
mainLoop:
call adcRead
call adcWait
lds r18, ADCL ; Must read ADCL first, and ADCH after that
lds r19, ADCH
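
The binary constants in the routines above are easier to audit when built from named bits. The Python sketch below does that on the host, using the bit positions given in the ATmega328P datasheet for ADMUX and ADCSRA. The helper names `bits` and `left_adjusted` are made up for this illustration:

```python
# Bit positions from the ATmega328P register descriptions.
REFS0, ADLAR = 6, 5           # ADMUX: AVcc reference, left-adjust result
ADEN, ADSC, ADIF = 7, 6, 4    # ADCSRA: enable, start conversion, done flag
ADPS2, ADPS0 = 2, 0           # ADCSRA: prescaler bits; 101 selects /32

def bits(*positions):
    """OR together one set bit per given position."""
    value = 0
    for p in positions:
        value |= 1 << p
    return value

def left_adjusted(adch, adcl):
    """Rebuild the 10-bit result when ADLAR = 1.

    ADCH holds result bits 9..2, and the top two bits of ADCL hold bits 1..0.
    """
    return (adch << 2) | (adcl >> 6)

admux_init = bits(REFS0, ADLAR)
adcsra_init = bits(ADEN, ADPS2, ADPS0)
print(format(admux_init, "08b"), format(adcsra_init, "08b"))
```

With the result left-adjusted, reading ADCH alone already gives an 8-bit value; the second helper is only needed when all 10 bits matter.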

After a long struggle, reading the ATmega328P datasheet and many articles found through Google, I ended up with the simple, working code below.
; UNO_asmADCapp.asm
; revised by bsliao: 2020/5/12 03:39:20 PM, TEST OK 2020/05/13, 11:33
; Reference:
; https://stackoverflow.com/questions/38972805/
; [1] how-to-code-an-adc-for-an-atmega328p-in-assembly
; Author : Dario, Created: 8/14/2016 7:34:43 AM
; [2] https://robotics.ee.uwa.edu.au/courses/des/labprep/
; LabPrep%205%20-%20Timers%20and%20ADC%20in%20ATMEL.pdf
; [3] https://www.avrfreaks.net/forum/adc-converter-assembly-using-atmega328p-mcu
; AD0 --- uno A0
; value ADCH (b9 b8) ADCL (b7- b0) <Internal> --- PB1(uno d9) PB0 (d8), PD7-PD0 (uno D7 -D0)
#define F_CPU 16000000UL
.def temp =r16
; Replace with your application code
.include "./m328Pdef.inc"
.org 0x000
rjmp start
; .org 0x002A
; rjmp ADC_conversion_complete_Handler
start:
eor r1, r1
out SREG, r1
ldi temp, HIGH(RAMEND)
out SPH, r16
ldi temp, LOW(RAMEND)
out SPL, r16
setup:
ldi temp, 0xFF ; set r16 = 1111 1111
out ddrb, temp ; set all PORTB pins as output
out ddrd, temp ; set all PORTD pins as output
configADC0:
;------initialize ADC0 ------- Set ADMUX and ADCSRA:
;REFS1 REFS0 ADLAR - (MUX3 MUX2 MUX1 MUX0 )=(0000)
;AREF pin (5.0 V here) is the reference, default right-adjusted result, analog input on ADC0
LDI temp, 0x00
STS ADMUX, temp
;ADC enable, (ADPS2 ADPS1 ADPS0 )=(111) : division factor=128, ADC clock = 16 MHz/128 = 125 kHz
LDI temp, (1<<ADEN)|(1<<ADPS2)|(1<<ADPS1)|(1<<ADPS0)
STS ADCSRA, temp
andi temp, 0b11011111 ; clear ADATE (bit 5): no auto triggering, single conversions only
STS ADCSRA, temp
; the first conversion
LDS temp,ADCSRA
ori temp, (1<<ADSC);
STS ADCSRA, temp
LOOP:
; start the next single conversion on ADCn, here n=0
LDS temp,ADCSRA
ori temp, (1<<ADSC);
STS ADCSRA, temp
adc_read_loop:
// while (bit_is_set(ADCSRA, ADSC));
lds temp,ADCSRA
sbrc temp,ADSC ;after ADC0 conversion over, the bit ADSC in the ADCSRA is set to zero and the bit ADIF is set to one.
rjmp adc_read_loop
read_ADC_value:
lds r24,ADCL
lds r25,ADCH
display_ADC_value:
andi r25, 0x03
out PORTB, r25 ; LEDs active high, PORTB most significant byte
com r24 ; LEDs active low
out PORTD, r24 ; PORTD less significant byte
call one_sec_delay
rjmp LOOP
one_sec_delay:
ldi r20, 20
ldi r21, 255
ldi r22, 255
delay: dec r22
brne delay
dec r21
brne delay
dec r20
brne delay
ret

Gameboy emulation - Clarification need on CD instruction

I'm currently in the process of writing a Gameboy emulator, and I've noticed something that seems strange to me.
My emulator is hitting a jump instruction 0xCD, for example CD B6 FF, but my understanding was that a jump should only be jumping to an address within cartridge ROM (0x7FFF maximum), because I'm assuming the CPU can only execute instructions from ROM, not RAM. The ROM in question is Dr. Mario, which I'd expect to only be carrying out valid operations. 0xFFB6 is in high RAM, which seems odd to me.
Am I correct in my thinking? If I am, presumably that means my program counter is somehow ending up at the wrong address and that the CD is actually part of another instruction's data, and not an instruction itself?
I'd be grateful for some clarification, thanks.
For reference, I've been using Gameboy Opcodes and CPU docs to implement the instructions. I know they contain a few errors, and I think I've accounted for them (for example, 0xE2 being listed as a two-byte instruction, when it's only one)

Just checked Dr. Mario 1.1: at startup it copies the OAM DMA transfer routine to hFFB6. Then, when VBlank happens, the routine at 0:01A6 is called, which in turn calls the OAM DMA transfer routine.
During OAM DMA transfer, the CPU can only access HRAM, so writing a short routine in HRAM that will wait for the transfer to be completed is required. The OAM DMA transfer takes 160 µs, so you usually make a loop that will wait this amount of time after specifying the OAM transfer source.
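
For an emulator, the point above means a call into 0xFF80-0xFFFE is legitimate and the CPU must be able to fetch instructions from there. The Python sketch below is a deliberately simplified model of that rule: the function names are hypothetical, and real hardware behaviour during DMA has more detail than this:

```python
HRAM_START, HRAM_END = 0xFF80, 0xFFFE   # high RAM range on the Game Boy

def in_hram(addr):
    return HRAM_START <= addr <= HRAM_END

def cpu_can_access(addr, oam_dma_active):
    """Simplified rule from the answer: only HRAM is usable during OAM DMA."""
    return in_hram(addr) or not oam_dma_active

def call_target(operand_lo, operand_hi):
    """CALL a16 stores its target little-endian: CD B6 FF -> 0xFFB6."""
    return operand_lo | (operand_hi << 8)

target = call_target(0xB6, 0xFF)
print(hex(target), in_hram(target))
```

So CD B6 FF decodes to a call into HRAM, which is exactly where code has to live while the DMA transfer is running.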
This is the part of the initialization routine run at startup that copies the DMA transfer routine to HRAM:
...
ROM0:027E 0E B6 ld c,B6 ;destination hFFB6
ROM0:0280 06 0A ld b,0A ;length 0xA
ROM0:0282 21 86 23 ld hl,2386 ;source 0:2386
ROM0:0285 2A ldi a,(hl) ;copy OAM DMA transfer routine from source
ROM0:0286 E2 ld (ff00+c),a ;paste to destination
ROM0:0287 0C inc c ;destination++
ROM0:0288 05 dec b ;length--
ROM0:0289 20 FA jr nz,0285 ;loop until DMA transfer routine is copied
...
When VBlank happens, it jumps to the routine at 0:01A6:
ROM0:0040 C3 A6 01 jp 01A6
Which contains a call to our OAM DMA transfer routine, waiting for DMA to be completed:
ROM0:01A6 F5 push af
ROM0:01A7 C5 push bc
ROM0:01A8 D5 push de
ROM0:01A9 E5 push hl
ROM0:01AA F0 B1 ld a,(ff00+B1)
ROM0:01AC A7 and a
ROM0:01AD 28 0B jr z,01BA
ROM0:01AF FA F1 C4 ld a,(C4F1)
ROM0:01B2 A7 and a
ROM0:01B3 28 05 jr z,01BA
ROM0:01B5 F0 EF ld a,(ff00+EF)
ROM0:01B7 A7 and a
ROM0:01B8 20 09 jr nz,01C3
ROM0:01BA F0 E1 ld a,(ff00+E1)
ROM0:01BC FE 03 cp a,03
ROM0:01BE 28 03 jr z,01C3
ROM0:01C0 CD B6 FF call FFB6 ;OAM DMA transfer routine is in HRAM
...
OAM DMA transfer routine:
HRAM:FFB6 3E C0 ld a,C0
HRAM:FFB8 E0 46 ld (ff00+46),a ;source is wC000
HRAM:FFBA 3E 28 ld a,28 ;loop start
HRAM:FFBC 3D dec a
HRAM:FFBD 20 FD jr nz,FFBC ;wait for the OAM DMA to be completed
HRAM:FFBF C9 ret ;ret to 0:01C3
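
To see why the loop counter is 0x28, it helps to count machine cycles. The Python sketch below does so under stated assumptions: the per-instruction costs are taken from commonly published Game Boy opcode tables, and the DMA length is assumed to be one machine cycle per OAM byte. The function name is made up for this illustration:

```python
# Machine-cycle costs (1 machine cycle = 4 clock cycles), assumed from
# commonly published Game Boy opcode tables.
LD_A_D8 = 2
DEC_A = 1
JR_NZ_TAKEN = 3
JR_NZ_NOT_TAKEN = 2

OAM_DMA_CYCLES = 160  # assumption: one machine cycle per byte, 0xA0 bytes

def wait_cycles(count):
    """Machine cycles spent in: ld a,count / loop: dec a / jr nz,loop."""
    assert 1 <= count <= 255  # a count of 0 would wrap and loop 256 times
    # Every pass executes dec a; all passes but the last take the jump.
    loop = count * DEC_A + (count - 1) * JR_NZ_TAKEN + JR_NZ_NOT_TAKEN
    return LD_A_D8 + loop

print(wait_cycles(0x28), OAM_DMA_CYCLES)
```

Under these assumptions the wait comes out at 161 machine cycles, just over the 160 assumed for the transfer, so the ret executes only after the DMA has finished.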

Here is my analysis:
Looking for CD B6 FF in the raw ROM I can only find it in one place of the memory which is 0x01C0 (448 in decimal).
So I decided to disassemble the ROM, to see if it is a valid instruction.
I used gb-disasm to disassemble the ROM. Here are the values from 0x150 (ROM start) to address 0x201.
[0x00000100] 0x00 NOP
[0x00000101] 0xC3 0x50 0x01 JP $0150
[0x00000150] 0xC3 0xE8 0x01 JP $01E8
[0x00000153] 0x01 0x0E 0xD0 LD BC,$D00E
[0x00000156] 0x0A LD A,[BC]
[0x00000157] 0xA7 AND A
[0x00000158] 0x20 0x0D JR NZ,$0D ; 0x167
[0x0000015A] 0xF0 0xCF LDH A,[$CF] ; HIMEM
[0x0000015C] 0xFE 0xFE CP $FE
[0x0000015E] 0x20 0x04 JR NZ,$04 ; 0x164
[0x00000160] 0x3E 0x01 LD A,$01
[0x00000162] 0x18 0x01 JR $01 ; 0x165
[0x00000164] 0xAF XOR A
[0x00000165] 0x02 LD [BC],A
[0x00000166] 0xC9 RET
[0x00000167] 0xFA 0x46 0xD0 LD A,[$D046]
[0x0000016A] 0xE0 0x01 LDH [$01],A ; SB
[0x0000016C] 0x18 0xF6 JR $F6 ; 0x164
[0x000001E8] 0xAF XOR A
[0x000001E9] 0x21 0xFF 0xDF LD HL,$DFFF
[0x000001EC] 0x0E 0x10 LD C,$10
[0x000001EE] 0x06 0x00 LD B,$00
[0x000001F0] 0x32 LD [HLD],A
[0x000001F1] 0x05 DEC B
[0x000001F2] 0x20 0xFC JR NZ,$FC ; 0x1F0
[0x000001F4] 0x0D DEC C
[0x000001F5] 0x20 0xF9 JR NZ,$F9 ; 0x1F0
[0x000001F7] 0x3E 0x0D LD A,$0D
[0x000001F9] 0xF3 DI
[0x000001FA] 0xE0 0x0F LDH [$0F],A ; IF
[0x000001FC] 0xE0 0xFF LDH [$FF],A ; IE
[0x000001FE] 0xAF XOR A
[0x000001FF] 0xE0 0x42 LDH [$42],A ; SCY
[0x00000201] 0xE0 0x43 LDH [$43],A ; SCX
The way we have to disassemble a ROM is by following the flow of instructions. For example, we know that the main program starts at position 0x150. So we should start disassembling there. Then we follow instruction by instruction until we hit any JUMP instruction (JP, JR, CALL, RET, etc). From that moment on the flow of the program is forked in two and we should follow both paths to disassemble.
The thing to understand here is that if I show you a random memory position in a ROM, you cannot tell me whether it is data or instructions. The only way to find out is by following the program flow. We need to define blocks of code that start at a jump destination and end at another jump instruction.
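
The flow-following idea described above can be sketched as a small worklist algorithm. The Python below is a minimal illustration and not how gb-disasm is implemented: the opcode table lists only a handful of real Game Boy opcodes, and the `find_code` name and the instruction kinds are made up for this sketch:

```python
# Minimal opcode table: opcode -> (length in bytes, kind).
# A full disassembler needs every opcode; these few are enough for a demo.
OPCODES = {
    0x00: (1, "plain"),     # NOP
    0xAF: (1, "plain"),     # XOR A
    0x3E: (2, "plain"),     # LD A,d8
    0x18: (2, "jr"),        # JR r8
    0x20: (2, "jr_cond"),   # JR NZ,r8
    0xC3: (3, "jp"),        # JP a16
    0xCD: (3, "call"),      # CALL a16
    0xC9: (1, "ret"),       # RET
}

def find_code(rom, entry):
    """Return the set of ROM addresses reached as instruction starts."""
    seen, work = set(), [entry]
    while work:
        pc = work.pop()
        while 0 <= pc < len(rom) and pc not in seen and rom[pc] in OPCODES:
            length, kind = OPCODES[rom[pc]]
            if pc + length > len(rom):
                break                      # truncated instruction at end of ROM
            seen.add(pc)
            if kind in ("jr", "jr_cond"):
                offset = rom[pc + 1]
                if offset > 127:
                    offset -= 256          # operand is a signed byte
                work.append(pc + length + offset)
            elif kind in ("jp", "call"):
                work.append(rom[pc + 1] | (rom[pc + 2] << 8))
            if kind in ("jr", "jp", "ret"):
                break                      # control never falls through
            pc += length
    return seen

rom = bytes([
    0xC3, 0x05, 0x00,   # 0x00: JP 0x0005
    0xFF, 0xFF,         # 0x03: bytes never reached by the flow
    0x3E, 0x01,         # 0x05: LD A,0x01
    0x20, 0x01,         # 0x07: JR NZ,+1 (to 0x0A)
    0xC9,               # 0x09: RET
    0xAF,               # 0x0A: XOR A
    0xC9,               # 0x0B: RET
])
print(sorted(find_code(rom, 0)))
```

A static walk like this can only follow targets that lie inside the ROM image. A target such as 0xFFB6 is outside it, so code copied to RAM at run time is invisible to this approach.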
gb-disasm skips any memory position that is not inside a code block. 0x16C marks the end of a block.
[0x0000016C] 0x18 0xF6 JR $F6 ; 0x164
The next block starts on 0x1E8. We know that because it is the destination address of a jump located on 0x150.
[0x00000150] 0xC3 0xE8 0x01 JP $01E8
The memory block from 0x16E until 0x1E8 is not considered a code block. That's why you don't see the memory position 0x01C0 listed as an instruction.
So there you are: it is very likely that you are interpreting the instructions in the wrong way. If you want to be 100% sure, you can disassemble the whole ROM and check whether any instruction points into 0x16E-0x1E8 and reads it as raw data, such as a tile or something.
Please leave a comment if you agree with the analysis.

AVR Studio - AVR Simulator 2 Carry Flag Issue

I've just started an assembly language programming class and I'm having an issue with a problem. I'm adding 240 to 49 and I know it will overflow; my goal is to make register r1 equal to 1 when these numbers overflow. I know that the carry flag is set when I add them, but I'm unsure how to use this flag to make r1 equal to 1.
This program should calculate:
; R0 = R16 + R17 + R18
;
;--*1 Do not change anything between here and the line starting with *--
.cseg
ldi r16, 0x30
ldi r17, 0x31
ldi r18, 0x32
;*--1 Do not change anything above this line to the --*
;***
; Your code goes here:
;
add r0, r16
add r0, r17
add r0, r18
;****
;--*2 Do not change anything between here and the line starting with *--
done: jmp done
;*--2 Do not change anything above this line to the --*

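I'm sure there are smarter ways, but you can use brcs (branch if carry set). Note that the carry flag only reflects the most recent add, and that ldi cannot load r1 (it only works on r16..r31), so inc is used instead:
clr r1 ; r1 = 0 unless a carry occurs
add r0, r16
add r0, r17
add r0, r18
brcs carry ; Branch if carry set
rjmp done ; No carry: skip to the done label from the question
carry: inc r1 ; Carry set: r1 = 1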
