Bus error when executing `emms` MMX instruction - gcc

I'm working on a port of some software with inline assembly because we took a few bug reports from a Debian maintainer under X32. The code is fine under both X86 and X64.
We're catching a bus error on the emms instruction:
...
0x005520fd <+3885>: pop %rsp
0x005520fe <+3886>: emms
=> 0x00552100 <+3888>: pop %rbx
0x00552101 <+3889>: jmpq 0x5519e3
0x00552106 <+3894>: nopw %cs:0x0(%rax,%rax,1)
...
According to the manual, the following exceptions are raised:
Exceptions:
RM PM VM SMM Description
#UD #UD #UD #UD If CR0.EM = 1
#NM #NM #NM #NM If CR0.TS = 1
#MF #MF #MF #MF If pending FPU Exception
Here is the mask used in the MMX status register:
mxcsr 0x1f80 [ IM DM ZM OM UM PM ]
I don't believe I have access to the control registers to determine what actually caused the exception, so I'm having trouble locating the cause of the bus error.
What are some of the potential causes of the bus error? Or how can I trouble shoot this further?
Here's info float:
(gdb) info float
R7: Empty 0xffffffffffffffffffff
R6: Empty 0xffffa5a5a5a5a5a5a5a5
R5: Empty 0xfffffedcba9876543210
R4: Empty 0xffffb182db48cf349120
R3: Empty 0xffff926cd0b6a839b535
R2: Empty 0xfffff373de2d49584e7a
R1: Empty 0xffff16166e76b1bb925f
=>R0: Empty 0xffff24f0130c63ac9332
Status Word: 0x0000
TOP: 0
Control Word: 0x037f IM DM ZM OM UM PM
PC: Extended Precision (64-bits)
RC: Round to nearest
Tag Word: 0xffff
Instruction Pointer: 0x00:0x00000000
Operand Pointer: 0x00:0x00000000
Opcode: 0x0000
And here's from info registers:
(gdb) info registers
rax 0xffffcb58 0xffffcb58
rbx 0x30 0x30
rcx 0x14f3 0x14f3
rdx 0x61d560 0x61d560
rsi 0xffffcb08 0xffffcb08
rdi 0x14 0x14
rbp 0xffffcb58 0xffffcb58
rsp 0xb62f7cbfffffc8d8 0xb62f7cbfffffc8d8
r8 0x0 0x0
r9 0x40 0x40
r10 0x2e676e696e6e7572 0x2e676e696e6e7572
r11 0x246 0x246
r12 0x9028a0 0x9028a0
r13 0xffffcaf0 0xffffcaf0
r14 0x8f6120 0x8f6120
r15 0xffffca6c 0xffffca6c
rip 0x552100 0x552100
eflags 0x10246 [ PF ZF IF RF ]
cs 0x33 0x33
ss 0x2b 0x2b
ds 0x2b 0x2b
es 0x2b 0x2b
fs 0x63 0x63
gs 0x0 0x0
Here's a breakout of the MMX status register bits:
IM - Invalid Operation Mask
DM - Denormalized Mask
ZM - Divide By Zero Mask
OM - Overflow Mask
UM - Underflow Mask
PM - Precision Mask

EMMS is executing without problem, as shown the arrow and the value of rip the fault is with the following pop because rsp points to invalid memory. The correct value of rsp is something less than 0x100000000.

Related

VMWare Workstation's strange bootloader behaviour (failing to print a character)

.code16
.text
.org 0x0
.global _start
_start:
jmp _testing
nop
_testing:
mov $0x0E, %ah
mov the_byte, %al #the line in question
int $0x10
jmp .
the_byte: .byte 0x41
.fill (510-(.-_start)), 1, 0
.word 0xAA55
This simple 'bootloader', as it were, is supposed to print out A to the screen, but it fails to do so when I'm using VMWare Workstation 16 (unlike Bochs, which happily shows A on its screen). If I change the line in question to
mov $0x41, %al
I can see A on VMWare Workstation as well.
Have you by any chance got any idea what can be causing such strange behaviour?
PS. It is indeed loaded into 0x7C00 with a separate linker file.
Apparently, mov the_byte, %al is assembled into something akin to mov %ds:0x7C0C, %al, hence DS had to be zeroed out (along with other segment pointers):
cli
mov %cs, %ax # CS is set to 0x0
mov %ax, %es # ES = CS = 0x0
mov %ax, %ds # DS = CS = 0x0
mov %ax, %ss # SS = CS = 0x0
sti

AVR TWI (I2C) problem: Operand 1 out of range

I´m using Atmega2560 and 24c16A EEPROM for testing an I2C code. For every state transition, controller responds by changing the status register TWSR. After start condition is transmitted, EEPROM responds and then also for Device address and write instruction (SLA+R/W), status register updates fine. But when I transmit next 8 bits, instead of ACK (Data byte sent ACK), status changes to repeated start. The code is given below. I never could find the possible solution to make it work.
.include "./m2560def.inc"
.list
.cseg
.org 0x00
jmp inicio ; PC = 0x0000 RESET
inicio:
LDI R21, HIGH(RAMEND) ;Set Up Stack
OUT SPH, R21
LDI R21, LOW(RAMEND)
OUT SPL, R21
CALL I2C_INIT ;Initialize TWI(I2C)
CALL I2C_START ;Transmit START condition
LDI R27, 0b11010000 ;SLA(0b1001100) + W(0)
CALL I2C_WRITE ;Write R27 ato the I2C bus
LDI R27, 0b11110000 ;Data to be transmitted
CALL I2C_WRITE ;Write R27 ato the I2C bus
CALL I2C_STOP ;Transmit STOP condition
HERE: RJMP HERE
;----------------------------I2C_INIT-----------------------------
I2C_INIT:
LDI R21, 0
OUT TWSR, R21 ;Set prescaler bits to 0
LDI R21, 0x47 ;R21 = 0x47
OUT TWBR, R21 ;Fclk = 50 KHz (8 MHz Xtal)
LDI R21, (1<<TWEN) ;R21 = 0x04
OUT TWCR, R21 ;HEnable TWI (I2C)
RET
;----------------------------I2C_START-----------------------------
I2C_START:
LDI R21, (1<<TWINT)|1<<(TWSTA)|(1<<TWEN)
OUT TWCR, R21 ;Transmit START condition
WAIT1:
IN R21, TWCR ;Read Control Register TWCR into R21
SBRS R21, TWINT ;Skip the next line if TWINT is 1
RJMP WAIT1 ;Jump a WAIT1 if TWINT is 1
RET
;----------------------------I2C_WRITE -----------------------------
I2C_WRITE:
OUT TWDR, R27 ;Move the byte into TWRD
LDI R21, (1<<TWINT)|(1<<TWEN)
OUT TWCR, R21 ;Configure TWCR to send TWDR
WAIT3:
IN R21, TWCR ;Read Control Register TWCR into R21
SBRS R21, TWINT ;Skip the next line if TWINT is 1
RJMP WAIT3 ;Jump a WAIT3 if TWINT is 1
RET
;----------------------------I2C_STOP------------------------------
I2C_STOP:
LDI R21, (1<<TWINT)|1<<(TWSTO)|(1<<TWEN)
OUT TWCR, R21 ;Transmit STOP condition
RET
;----------------------------I2C_READ------------------------------
I2C_READ:
LDI R21, (1<<TWINT)|(1<<TWEN)
OUT TWCR, R21
WAIT2:
IN R21, TWCR ;Read Control Register TWCR into R21
SBRS R21, TWINT ;Skip the next line if TWINT is 1
RJMP WAIT2 ;Jump a WAIT2 if TWINT is 1
IN R27, TWCR ;Read received data into R21
RET
The output is
D:\ATMEL_AVR\I2C_TWI\I2C_TWI.asm(41,0): error: Operand 1 out of range: 0xb9
D:\ATMEL_AVR\I2C_TWI\I2C_TWI.asm(43,0): error: Operand 1 out of range: 0xb8
D:\ATMEL_AVR\I2C_TWI\I2C_TWI.asm(45,0): error: Operand 1 out of range: 0xbc
D:\ATMEL_AVR\I2C_TWI\I2C_TWI.asm(51,0): error: Operand 1 out of range: 0xbc
D:\ATMEL_AVR\I2C_TWI\I2C_TWI.asm(53,0): error: Operand 2 out of range: 0xbc
D:\ATMEL_AVR\I2C_TWI\I2C_TWI.asm(60,0): error: Operand 1 out of range: 0xbb
D:\ATMEL_AVR\I2C_TWI\I2C_TWI.asm(62,0): error: Operand 1 out of range: 0xbc
D:\ATMEL_AVR\I2C_TWI\I2C_TWI.asm(64,0): error: Operand 2 out of range: 0xbc
D:\ATMEL_AVR\I2C_TWI\I2C_TWI.asm(72,0): error: Operand 1 out of range: 0xbc
D:\ATMEL_AVR\I2C_TWI\I2C_TWI.asm(78,0): error: Operand 1 out of range: 0xbc
D:\ATMEL_AVR\I2C_TWI\I2C_TWI.asm(80,0): error: Operand 2 out of range: 0xbc
D:\ATMEL_AVR\I2C_TWI\I2C_TWI.asm(83,0): error: Operand 2 out of range: 0xbc
Assembly failed, 12 errors, 0 warnings
The code lines involved in these errors are:
OUT TWSR, R21
OUT TWBR, R21
OUT TWCR, R21
OUT TWCR, R21
IN R21, TWCR
OUT TWDR, R27
OUT TWCR, R21
IN R21, TWCR
OUT TWCR, R21
OUT TWCR, R21
IN R21, TWCR
IN R27, TWCR
In AVR normally all I/O registers are mapped into the memory space, starting from address 0x0020
in and out instructions could be used ONLY with the first 64 of them (with 0x20...0x5F memory addresses)
sbi, cbi, sbis and sbic has further limitation and can be used only with first 32 I/O registres (with memory address 0x20...0x3F). Please refer to the AVR Instruction Set Manual
In the datasheet to ATmega2560, section 33 "Register Summary", page 401, you can see the TWI registers have addresses 0xB8 thru 0xBD, i.e. they cannot be accessed using in and out instructions. You have to use STS and LDS.
I.e. LDS R21, TWCR instead of IN R21, TWCR etc.
Also, always be careful when accessing the first 64 registers. Make sure the corresponding names are defining their I/O and not RAM addresses. I.e. PORTA should be equal to 0x02 (IO address for in / out instruction) and not 0x22 (RAM address or PORTA, which, when accessed using in or out, equals to EEARH register).

Bootloader memory location

This is a part of a bootloader that I am studying from
`[ORG 0x00]
[BITS 16]
SECTION .text
jmp 0x07c0:START ; set CS(segment register) to 0x07C0 and jump to START label.
TOTALSECTORCOUNT:
dw 0x02
KERNEL32SECTORCOUNT:
dw 0x02
START:
mov ax, 0x07c0
mov ds, ax ; set DS(segment register) to the address of the bootloader.
mov ax, 0xb800
mov es, ax ; set ES(segment register) to the address of the video memory starting address.
; stack initialization
mov ax, 0x0000
mov ss, ax
mov sp, 0xfffe
mov bp, 0xfffe
; clear the screen
mov si, 0
CLEARSCREEN:
mov byte[es:si], 0
mov byte[es:si + 1], 0x0a
add si, 2
cmp si, 80 * 25 * 2
jl CLEARSCREEN
; print welcome message`
I don't understand the beginning: jmp 0x07C0:START How does it set the CS register?
And what are the two variables TOTALSECTORCOUNT and KERNEL32SECTORCOUNT for? They don't appear anywhere in the bootsector file and if I remove them, the bootloader fails to load the welcome message.
Removing the parts causes the OS to fail to load. So what is the significance of that jmp statement and the two variables?
``[ORG 0x00]
[BITS 16]
jmp START
START:
mov ax, 0x07c0
mov ds, ax ; set DS(segment register) to the address of the bootloader.
mov ax, 0xb800
mov es, ax ; set ES(segment register) to the address of the video memory starting address.
; stack initialization
mov ax, 0x0000
mov ss, ax
mov sp, 0xfffe
mov bp, 0xfffe
`
I am not great with assembly and usually use the AT&T syntax also. I have however written a bootloader before.
Hopefully you have learnt about the segmented addressing system used in 16 bit applications. The cs register holds the code segment. http://wiki.osdev.org/Segmentation
jmp 0x07C0:START ;This is a long jump
jmp segment:offset
A long jump sets the cs register to segment parameter and then does a jump to the offset parameter. When you do a short jump the cs register doesn't change. I assume that it would contain 0x0. You can use a short jump but you must tell your assembler or linker where the code will be run.
EDIT: After reading the code again there is the [org 0x00] line. This sets the cs register to 0x00 by default. If you wanted to use the short jump try changing this line to [org 0x7c00]
CS should already be set to 0x7c00 by the BIOS so the line:
jmp 0x07c0:START
can be replaced by:
jmp START
The two variables you mention must be use elsewhere in the code to load the kernel. However, it appears you haven't posted the whole code here.
Without seeing the rest of the bootsector code, we cannot help.

Triple fault on stack in C code [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 3 years ago.
Improve this question
I am writing a toy kernel for learning purposes, and I am having a bit of trouble with it. I have made a simple bootloader that loads a segment from a floppy disk (which is written in 32 bit code), then the bootloader enables the A20 gate and turns on protected mode. I can jump to the 32 bit code fine if I write it in assembler, but if I write it in C, I get a triple fault. When I disassemble the C code I can see that the first two instructions involve setting up a new stack frame. This is the key difference between the working ASM code and the failing C code.
I am using NASM v2.10.05 for the ASM code, and GCC from the DJGPP 4.72 collection for the C code.
This is the bootloader code:
org 7c00h
BITS 16
entry:
mov [drive], dl ;Save the current drive
cli
mov ax,cs ; Setup segment registers
mov ds,ax ; Make DS correct
mov ss,ax ; Make SS correct
mov bp,0fffeh
mov sp,0fffeh ;Setup a temporary stack
sti
;Set video mode to text
;===================
mov ah, 0
mov al, 3
int 10h
;===================
;Set current page to 0
;==================
mov ah, 5
mov al, 0
int 10h
;==================
;Load the sector
;=============
call load_image
;=============
;Clear interrupts
;=============
cli
;=============
;Disable NMIs
;============
in ax, 70h
and ax, 80h ;Set the high bit to 1
out 70h, ax
;============
;Enable A20:
;===========
mov ax, 02401h
int 15h
;===========
;Load the GDT
;===========
lgdt [gdt_pointer]
;===========
;Clear interrupts
;=============
cli
;=============
;Enter protected mode
;==================
mov eax, cr0
or eax, 1 ;Set the low bit to 1
mov cr0, eax
;==================
jmp 08h:clear_pipe ;Far jump to clear the instruction queue
;======================================================
load_image:
reset_drive:
mov ah, 00h
; DL contains *this* drive, given to us by the BIOS
int 13h
jc reset_drive
read_sectors:
mov ah, 02h
mov al, 01h
mov ch, 00h
mov cl, 02h
mov dh, 00h
; DL contains *this* drive, given to us by the BIOS
mov bx, 7E0h
mov es, bx
mov bx, 0
int 13h
jc read_sectors
ret
;======================================================
BITS 32 ;Protected mode now!
clear_pipe:
mov ax, 10h ; Save data segment identifier
mov ds, ax ; Move a valid data segment into the data segment register
mov es, ax
mov fs, ax
mov gs, ax
mov ss, ax ; Move a valid data segment into the stack segment register
mov esp, 90000h ; Move the stack pointer to 90000h
mov ebp, esp
jmp 08h:7E00h ;Jump to the kernel proper
;===============================================
;========== GLOBAL DESCRIPTOR TABLE ==========
;===============================================
gdt: ; Address for the GDT
gdt_null: ; Null Segment
dd 0
dd 0
gdt_code: ; Code segment, read/execute, nonconforming
dw 0FFFFh ; LIMIT, low 16 bits
dw 0 ; BASE, low 16 bits
db 0 ; BASE, middle 8 bits
db 10011010b ; ACCESS byte
db 11001111b ; GRANULARITY byte
db 0 ; BASE, low 8 bits
gdt_data: ; Data segment, read/write, expand down
dw 0FFFFh
dw 0
db 0
db 10010010b
db 11001111b
db 0
gdt_end: ; Used to calculate the size of the GDT
gdt_pointer: ; The GDT descriptor
dw gdt_end - gdt - 1 ; Limit (size)
dd gdt ; Address of the GDT
;===============================================
;===============================================
drive: db 00 ;A byte to store the current drive in
times 510-($-$$) db 00
db 055h
db 0AAh
And this is the kernel code:
void main()
{
asm("mov byte ptr [0x8000], 'T'");
asm("mov byte ptr [0x8001], 'e'");
asm("mov byte ptr [0x8002], 's'");
asm("mov byte ptr [0x8003], 't'");
}
The kernel simply inserts those four bytes into memory, which I can check as I am running the code in a VMPlayer virtual machine. If the bytes appear, then I know the code is working. If I write code in ASM that looks like this, then the program works:
org 7E00h
BITS 32
main:
mov byte [8000h], 'T'
mov byte [8001h], 'e'
mov byte [8002h], 's'
mov byte [8003h], 't'
hang:
jmp hang
The only differences are therefore the two stack operations I found in the disassembled C code, which are these:
push ebp
mov ebp, esp
Any help on this matter would be greatly appreciated. I figure I am missing something relatively minor, but crucial, here, as I know this sort of thing is possible to do.
Try using the technique here:
Is there a way to get gcc to output raw binary?
to produce a flat binary of the .text section from your object file.
I'd try moving the address of your protected mode stack to something else - say 0x80000 - which (according to this: http://wiki.osdev.org/Memory_Map_(x86)) is RAM which is guaranteed free for use (as opposed to the address you currently use - 0x90000 - which apparently may not be present - depending a bit on what you're using as a testing environment).

Inline Assembly -- Access local char* variable -- gcc

I am trying to write a simple program that prints out a C string without using one of the linux system calls or the standard C library functions. This is for learning purposes only, and I would never do this in production (unless I got really good at it =)).
First my system info:
[mehoggan#fedora sandbox-print_chars]$ uname -a
Linux fedora.laptop 2.6.35.14-106.fc14.i686.PAE #1 SMP Wed Nov 23 13:39:51 UTC 2011 i686 i686 i386 GNU/Linux
[mehoggan#fedora sandbox-print_chars]$ gcc --version
gcc (GCC) 4.5.1 20100924 (Red Hat 4.5.1-4)
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Next the code:
#include <unistd.h>
#include <sys/syscall.h>
void main()
{
char *str = "Hello World";
while(*(str) != '\0') {
//printf("%c", *(str++));
//syscall(__NR_write, 1, *(str++), 1);
__asm__( "movl %0, %%ecx" :"=c" (str));
__asm__( "movl $0X4, %eax" );
__asm__( "movl $0X1, %ebx" );
__asm__( "movl $0X1, %edx" );
__asm__( "int $0X80" );
str++;
}
return;
}
Compiled with the following makefile:
all: sandbox_c
sandbox_c: sandbox.c
gcc -Wall -o sandbox_c ./sandbox.c
gcc -S -Wall -o sandbox_c.asm ./sandbox.c
Things compile just fine, I just cant get the syntax right to get the thing to work. Your corrections are greatly appreciated, but if you could also point me to how you obtained the solution that would be great. I am trying to get better at using the man pages etc.
ADDITION
Running the executable through gdb I can see that ecx is not being pointed at the right address:
[mehoggan#fedora sandbox-print_chars]$ gdb ./sandbox_c
GNU gdb (GDB) Fedora (7.2-52.fc14)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/mehoggan/Code/Assembly/sandbox/sandbox-print_chars/sandbox_c...done.
(gdb) break sandbox.c:8
Breakpoint 1 at 0x80483a2: file ./sandbox.c, line 8.
(gdb) run
Starting program: /home/mehoggan/Code/Assembly/sandbox/sandbox-print_chars/sandbox_c
Breakpoint 1, main () at ./sandbox.c:8
8 while(*(str) != '\0') {
Missing separate debuginfos, use: debuginfo-install glibc-2.13-2.i686
(gdb) step
11 __asm__( "movl %0, %%ecx" :"=c" (str));
(gdb) info registers
eax 0x48 72
ecx 0x34092fad 873017261
edx 0x1 1
ebx 0x567ff4 5668852
esp 0xbffff2a4 0xbffff2a4
ebp 0xbffff2b8 0xbffff2b8
esi 0x0 0
edi 0x0 0
eip 0x80483a4 0x80483a4 <main+16>
eflags 0x200206 [ PF IF ID ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
(gdb) step
12 __asm__( "movl $0X4, %eax" );
(gdb) info registers
eax 0x48 72
ecx 0x34092fad 873017261
edx 0x1 1
ebx 0x34092fad 873017261
esp 0xbffff2a4 0xbffff2a4
ebp 0xbffff2b8 0xbffff2b8
esi 0x0 0
edi 0x0 0
eip 0x80483ab 0x80483ab <main+23>
eflags 0x200206 [ PF IF ID ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
(gdb) step
13 __asm__( "movl $0X1, %ebx" );
(gdb) info registers
eax 0x4 4
ecx 0x34092fad 873017261
edx 0x1 1
ebx 0x34092fad 873017261
esp 0xbffff2a4 0xbffff2a4
ebp 0xbffff2b8 0xbffff2b8
esi 0x0 0
edi 0x0 0
eip 0x80483b0 0x80483b0 <main+28>
eflags 0x200206 [ PF IF ID ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
(gdb) step
14 __asm__( "movl $0X1, %edx" );
(gdb) info registers
eax 0x4 4
ecx 0x34092fad 873017261
edx 0x1 1
ebx 0x1 1
esp 0xbffff2a4 0xbffff2a4
ebp 0xbffff2b8 0xbffff2b8
esi 0x0 0
edi 0x0 0
eip 0x80483b5 0x80483b5 <main+33>
eflags 0x200206 [ PF IF ID ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
(gdb) step
15 __asm__( "int $0X80" );
(gdb) info registers
eax 0x4 4
ecx 0x34092fad 873017261
edx 0x1 1
ebx 0x1 1
esp 0xbffff2a4 0xbffff2a4
ebp 0xbffff2b8 0xbffff2b8
esi 0x0 0
edi 0x0 0
eip 0x80483ba 0x80483ba <main+38>
eflags 0x200206 [ PF IF ID ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
(gdb) step
16 str++;
(gdb) info registers
eax 0xfffffff2 -14
ecx 0x34092fad 873017261
edx 0x1 1
ebx 0x1 1
esp 0xbffff2a4 0xbffff2a4
ebp 0xbffff2b8 0xbffff2b8
esi 0x0 0
edi 0x0 0
eip 0x80483bc 0x80483bc <main+40>
eflags 0x200206 [ PF IF ID ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
Try this:
__asm__ volatile ( "movl $0X4, %eax
movl $0X1, %ebx
movl $0X1, %edx
int $0X80"
: /* outputs: */ /* none */
: /* inputs: */ "c" (str)
: /* clobbers: */ "eax", "ebx", "edx");
I've not tested it, but the syntax looks right. You might need to add some more "clobbers" if the syscall overwrites anything else - check the documentation.
Breaking it down:
There's no need to move %0 to %ecx because the "c" constraint already did that.
str is an input, but you had it as an output.
volatile tells the compiler not to remove it - it has no output so the compiler might think it can.
You need to tell the compiler what registers are 'clobbered' by the code. I've added the obvious ones, but the system call might clobber more?
You need to put them in all one asm or the compiler might think it can move them around.
For whatever it's worth, calling "int 0x80" is making a system call.
You're just not using "printf", or using the standard C library wrappers for Linux syscalls.
And there's nothing wrong with that :)
Anyway, here's a complete example illustrating EXACTLY what you're after:
http://asm.sourceforge.net/intro/hello.html
'Hope that helps ... and have fun!

Resources