I have a simple NASM program, shown below. I want to change the value 43 (at offset +3 in the trx array) to 99.
section .data
trx db 25,21,17,43
section .text
global _start
_start:
mov [trx+3], byte 99
last:
mov rax, 60
mov rdi, 0
syscall
When I debug it and step past the instruction in _start, it works: the value 43 changes to 99.
(gdb) i var
All defined variables:
Non-debugging symbols:
0x00000000006000c4 trx
0x00000000006000c8 __bss_start
0x00000000006000c8 _edata
0x00000000006000c8 _end
(gdb) x/4b &trx
0x6000c4: 25 21 17 43
(gdb) break _start
Breakpoint 1 at 0x4000b0
(gdb) run
Starting program: /home/hexdemsion/Desktop/asm/exec
Breakpoint 1, 0x00000000004000b0 in _start ()
(gdb) stepi
0x00000000004000b8 in last ()
(gdb) x/4b &trx
0x6000c4: 25 21 17 99
Now, how can I set that value directly in GDB? I have tried these commands in GDB, but none of them work.
(gdb) set 0x00000000006000c4+3 = 99
Left operand of assignment is not an lvalue.
(gdb) set {int}0x00000000006000c4+3 = 99
Left operand of assignment is not an lvalue.
(gdb) set {b}0x00000000006000c4+3 = 99
No symbol table is loaded. Use the "file" command.
In addition, I didn't include any debug information at assembly time.
nasm -f elf64 -o obj.o source.asm; ld -o exec obj.o
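You almost had it; use set {char}(0x00000000006000c4+3) = 99. The parentheses matter: {type} binds more tightly than +, so {int}0x00000000006000c4+3 reads an int and then adds 3 to it, which is a value rather than a location.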
Here's a more detailed explanation:
In gdb's set statement, the expression to the left of the = can be a convenience variable, or a register name, or an lvalue corresponding to some object in the target.
An lvalue is an object that has an address, a type, and is assignable.
A literal or computed address such as 0x00000000006000c4 or 0x00000000006000c4+3 isn't an lvalue, but you can cast it to an lvalue using *(type *)(0x00000000006000c4+3) or {type}(0x00000000006000c4+3).
Gdb knows about C primitive types, plus whatever types the executable and libraries you're debugging may contain in their symbol tables or debug sections. In your case, since you want to set a byte, you'd use C's char type.
(gdb) x/4b &trx
0x6000c4: 25 21 17 43
(gdb) set {char}(0x00000000006000c4+3) = 99
(gdb) x/4b &trx
0x6000c4: 25 21 17 99
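The same idea can be written in C, which is roughly what gdb's {char}(addr) and *(char *)(addr) forms mirror. This is only a sketch: poke_byte is a made-up helper name, and it assumes the address really points at writable memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write one byte at (address + offset).
   Hypothetical helper, the C spelling of gdb's
   set {char}(address + offset) = value. */
void poke_byte(uintptr_t address, size_t offset, unsigned char value)
{
    assert(address != 0);   /* a null base is certainly not a valid object */

    /* address + offset is only a number. Casting it to a pointer and
       dereferencing it is what turns it into an assignable object. */
    *(unsigned char *)(address + offset) = value;
}
```

Calling poke_byte((uintptr_t)trx, 3, 99) on a four-byte array holding 25, 21, 17, 43 leaves it holding 25, 21, 17, 99.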
In the toy program below, I declare a variable in the .text section and write to it, which gives a segmentation fault, since the .text section is marked as READ-ONLY:
Breakpoint 1, 0x00401000 in start ()
(gdb) disassemble
Dump of assembler code for function start:
=> 0x00401000 <+0>: movl $0x2,0x40100a
End of assembler dump.
(gdb) stepi
Program received signal SIGSEGV, Segmentation fault.
0x00401000 in start ()
(gdb)
Here is the objdump output:
test.exe: file format pei-i386
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 0000001f 00401000 00401000 00000200 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .idata 00000014 00402000 00402000 00000400 2**2
CONTENTS, ALLOC, LOAD, DATA
However, linking using the --omagic switch (disables READ-ONLY .text section) yields the following results:
ld --omagic -o test.exe test.obj
test.exe: file format pei-i386
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 0000001f 00401000 00401000 000001d0 2**4
CONTENTS, ALLOC, LOAD, CODE
1 .idata 00000014 00402000 00402000 000003d0 2**2
CONTENTS, ALLOC, LOAD, DATA
But debugging this using GDB gives the following (weird) results:
Breakpoint 1, 0x00401000 in start ()
(gdb) disassemble
Dump of assembler code for function start:
=> 0x00401000 <+0>: dec %ebp
0x00401001 <+1>: pop %edx
0x00401002 <+2>: nop
0x00401003 <+3>: add %al,(%ebx)
0x00401005 <+5>: add %al,(%eax)
0x00401007 <+7>: add %al,(%eax,%eax,1)
End of assembler dump.
(gdb) stepi
0x00401001 in start ()
(gdb) stepi
0x00401002 in start ()
(gdb) stepi
0x00401003 in start ()
(gdb) stepi
0x00401005 in start ()
(gdb) stepi
Program received signal SIGSEGV, Segmentation fault.
0x00401005 in start ()
(gdb)
First of all, I still get a segmentation fault, and the disassembled code has also changed completely.
How can I link the .text section as writable on Windows 10 x64?
Toy program:
BITS 32
section .text
global _start
_start:
mov [var], dword 2
var: dd 0
ret
For some reason, ld completely changes the PE executable linked using the --omagic option.
A quick comparison of the files using the cmp utility shows:
137 177 222
141 0 320
142 6 5
213 0 320
214 2 1
217 142 205
218 154 353
397 0 320
398 2 1
437 0 320
438 4 3
465 0 307
...
So there are lots of differences, although ld should in principle only change the flags in the section header of .text, i.e. set the flag IMAGE_SCN_MEM_WRITE.
Changing the flags manually using HxD, i.e. setting byte at offset 0x19F to 0xE0 solves the issue...
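For what it's worth, the manual patch amounts to OR-ing IMAGE_SCN_MEM_WRITE into the Characteristics field, which is the last four bytes (little-endian) of the 40-byte section header. A minimal C sketch, assuming the raw section header has already been read into a buffer; pe_section_make_writable is a made-up name, and the 0x60000020 starting value mentioned afterwards is my assumption about what the linker emitted for .text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PE_CHARACTERISTICS_OFFSET 36u          /* within a 40-byte section header */
#define IMAGE_SCN_MEM_WRITE       0x80000000u  /* section can be written to */

/* Set the "writable" flag in a raw PE section header held in memory.
   Returns the new Characteristics value. Hypothetical helper. */
uint32_t pe_section_make_writable(uint8_t *header)
{
    assert(header != NULL);
    uint8_t *p = header + PE_CHARACTERISTICS_OFFSET;

    /* The field is stored little-endian, so assemble it byte by byte. */
    uint32_t flags = (uint32_t)p[0]
                   | ((uint32_t)p[1] << 8)
                   | ((uint32_t)p[2] << 16)
                   | ((uint32_t)p[3] << 24);

    flags |= IMAGE_SCN_MEM_WRITE;

    p[0] = (uint8_t)(flags & 0xFFu);
    p[1] = (uint8_t)((flags >> 8) & 0xFFu);
    p[2] = (uint8_t)((flags >> 16) & 0xFFu);
    p[3] = (uint8_t)((flags >> 24) & 0xFFu);
    return flags;
}
```

Starting from 0x60000020 (code, executable, readable), only the top byte changes, from 0x60 to 0xE0, which matches the single byte edited in HxD.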
A trial run of the program with the order of var and ret interchanged (otherwise the program would crash):
Breakpoint 1, 0x00401000 in start ()
(gdb) disassemble
Dump of assembler code for function start:
=> 0x00401000 <+0>: movl $0x2,0x40100b
0x0040100a <+10>: ret
End of assembler dump.
(gdb) stepi
0x0040100a in start ()
(gdb) disassemble
Dump of assembler code for function start:
0x00401000 <+0>: movl $0x2,0x40100b
=> 0x0040100a <+10>: ret
End of assembler dump.
(gdb) x/wx var
0x40100b <var>: 0x00000002
(gdb)
and we see things work as expected.
My conclusion is that ld somehow generates a badly formatted PE executable, and I see that @RossRidge has the answer to this (ld doesn't respect the file alignment of sections).
The --omagic flag is causing the GNU linker to generate a bad PECOFF executable. Sections must be aligned in the file with a minimum file alignment of 512 bytes, but the linker puts the .text section at a file offset of 0x1d0.
Instead of using the --omagic flag, generate your executable normally and then use objcopy to change the flags in the section header:
ld -o test-tmp.exe test.obj
$(OBJCOPY) --set-section-flags .text=code,data,alloc,contents,load test-tmp.exe test.exe
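The alignment rule is easy to check against the "File off" column of the two objdump listings. A small C sketch; is_file_aligned is a made-up helper, and 512 is the minimum file alignment quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Return nonzero when file_offset is a multiple of file_alignment.
   file_alignment must be a power of two (512 for the case above). */
int is_file_aligned(uint32_t file_offset, uint32_t file_alignment)
{
    assert(file_alignment != 0);
    assert((file_alignment & (file_alignment - 1u)) == 0);   /* power of two */
    return (file_offset & (file_alignment - 1u)) == 0;
}
```

With the normally linked file the section offsets are 0x200 and 0x400, which both pass. With --omagic they are 0x1d0 and 0x3d0, which both fail.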
UPDATE: Sure enough, it was a bug in the latest version of nasm. I "downgraded" and after fixing my code as shown in the answer I accepted, everything is working properly. Thanks, everyone!
I'm having problems with what should be a very simple program in 32-bit assembler on OS X.
First, the code:
section .data
hello db "Hello, world", 0x0a, 0x00
section .text
default rel
global _main
extern _printf, _exit
_main:
sub esp, 12 ; 16-byte align stack
push hello
call _printf
push 0
call _exit
It assembles and links, but when I run the executable it crashes with a segmentation fault: 11.
The command lines to assemble and link are:
nasm -f macho32 hello32x.asm -o hello32x.o
I know the -o there is not 100 percent necessary
Linking:
ld -lc -arch i386 hello32x.o -o hello32x
When I run it in lldb to debug it, everything is fine until it enters the call to _printf, where it crashes as shown below:
(lldb) s
Process 1029 stopped
* thread #1: tid = 0x97a4, 0x00001fac hello32x`main + 8, queue = 'com.apple.main-thread', stop reason = instruction step into
frame #0: 0x00001fac hello32x`main + 8
hello32x`main:
-> 0x1fac <+8>: calll 0xffffffff991e381e
0x1fb1 <+13>: pushl $0x0
0x1fb3 <+15>: calll 0xffffffff991fec84
0x1fb8: addl %eax, (%eax)
(lldb) s
Process 1029 stopped
* thread #1: tid = 0x97a4, 0x991e381e libsystem_c.dylib`vfprintf + 49, queue = 'com.apple.main-thread', stop reason = instruction step into
frame #0: 0x991e381e libsystem_c.dylib`vfprintf + 49
libsystem_c.dylib`vfprintf:
-> 0x991e381e <+49>: xchgb %ah, -0x76f58008
0x991e3824 <+55>: popl %esp
0x991e3825 <+56>: andb $0x14, %al
0x991e3827 <+58>: movl 0xc(%ebp), %ecx
(lldb) s
Process 1029 stopped
* thread #1: tid = 0x97a4, 0x991e381e libsystem_c.dylib`vfprintf + 49, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x890a7ff8)
frame #0: 0x991e381e libsystem_c.dylib`vfprintf + 49
libsystem_c.dylib`vfprintf:
-> 0x991e381e <+49>: xchgb %ah, -0x76f58008
0x991e3824 <+55>: popl %esp
0x991e3825 <+56>: andb $0x14, %al
0x991e3827 <+58>: movl 0xc(%ebp), %ecx
As you can see toward the bottom, it stops due to a bad access error.
16-byte Stack Alignment
One serious issue with your code is stack alignment. 32-bit OS/X code requires 16-byte stack alignment at the point you make a CALL. The Apple IA-32 Calling Convention says this:
The function calling conventions used in the IA-32 environment are the same as those used in the System V IA-32 ABI, with the following exceptions:
Different rules for returning structures
The stack is 16-byte aligned at the point of function calls
Large data types (larger than 4 bytes) are kept at their natural alignment
Most floating-point operations are carried out using the SSE unit instead of the x87 FPU, except when operating on long double values. (The IA-32 environment defaults to 64-bit internal precision for the x87 FPU.)
You subtract 12 from ESP to align the stack to a 16 byte boundary (4 bytes for return address + 12 = 16). The problem is that when you make a CALL to a function the stack MUST be 16 bytes aligned just prior to the CALL itself. Unfortunately you push 4 bytes before the call to printf and exit. This misaligns the stack by 4, when it should be aligned to 16 bytes. You'll have to rework the code with proper alignment. As well you must clean up the stack after you make a call. If you use PUSH to put parameters on the stack you need to adjust ESP after your CALL to restore the stack to its previous state.
One naive way (not my recommendation) to fix the code would be to do this:
section .data
hello db "Hello, world", 0x0a, 0x00
section .text
default rel
global _main
extern _printf, _exit
_main:
sub esp, 8
push hello ; 4(return address)+ 8 + 4 = 16 bytes stack aligned
call _printf
add esp, 4 ; Remove arguments
push 0 ; 4 + 8 + 4 = 16 byte alignment again
call _exit ; This will not return so no need to remove parameters after
The code above works because we can take advantage of the fact that both functions (exit and printf) require exactly one DWORD being placed on the stack for parameters. 4 bytes for main's return address, 8 for the stack adjustment we made, 4 for the DWORD parameter = 16 byte alignment.
A better way to do this is to compute the amount of stack space you will need for all your stack-based local variables (in this case 0) in your main function, plus the maximum number of bytes you will need for any parameters to function calls made by main, and then pad that amount so that, together with the 4-byte return address already on the stack, it is a multiple of 16 (that is, 12, 28, 44, ...). In our case the maximum number of bytes needed for any one function call is 4 bytes. We add 8 bytes of padding to those 4 (8+4=12), so that 12 plus the 4-byte return address equals 16. We then subtract 12 from ESP at the start of our function.
Instead of using PUSH to put parameters on the stack you can now move the parameters directly onto the stack into the space we have reserved. Because we don't PUSH the stack doesn't get misaligned. Since we didn't use PUSH we don't need to fix ESP after our function calls. The code could then look something like:
section .data
hello db "Hello, world", 0x0a, 0x00
section .text
default rel
global _main
extern _printf, _exit
_main:
sub esp, 12 ; 16-byte align stack + room for parameters passed
; to functions we call
mov [esp],dword hello ; First parameter at esp+0
call _printf
mov [esp], dword 0 ; First parameter at esp+0
call _exit
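The arithmetic behind the sub esp, N value can be sketched as a small C helper. This is only an illustration: stack_reserve and its parameter names are made up, and it assumes ESP was 16-byte aligned immediately before the CALL that entered the function.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_ALIGN 16u

/* already_pushed: bytes on the stack since the last aligned point
                   (4 for the return address, 8 once EBP is pushed too).
   locals:         bytes of stack-based local variables.
   max_args:       largest number of argument bytes any single call needs.
   Returns the amount to subtract from ESP so that ESP is 16-byte
   aligned at every CALL. */
uint32_t stack_reserve(uint32_t already_pushed, uint32_t locals, uint32_t max_args)
{
    assert(already_pushed % 4u == 0);   /* 32-bit pushes are 4 bytes each */

    uint32_t total = already_pushed + locals + max_args;

    /* Round the total up to the next multiple of 16 ... */
    uint32_t rounded = (total + STACK_ALIGN - 1u) / STACK_ALIGN * STACK_ALIGN;

    /* ... and reserve everything except what is already on the stack. */
    return rounded - already_pushed;
}
```

For the code above, stack_reserve(4, 0, 4) gives 12, and it still gives 12 when a call needs two DWORD parameters (8 bytes). If a function has already pushed EBP, stack_reserve(8, 0, 8) gives 8.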
If you wanted to pass multiple parameters you place them manually on the stack as we did with a single parameter. If we wanted to print an integer 42 as part of our call to printf we could do it this way:
section .data
hello db "Hello, world %d", 0x0a, 0x00
section .text
default rel
global _main
extern _printf, _exit
_main:
sub esp, 12 ; 16-byte align stack + room for parameters passed
; to functions we call
mov [esp+4], dword 42 ; Second parameter at esp+4
mov [esp],dword hello ; First parameter at esp+0
call _printf
mov [esp], dword 0 ; First parameter at esp+0
call _exit
When run we should get:
Hello, world 42
16-byte Stack Alignment and a Stack Frame
If you are looking to create a function with a typical stack frame then the code in the previous section has to be adjusted. Upon entry to a function in a 32-bit application the stack is misaligned by 4 bytes because the return address was placed on the stack. A typical stack frame prologue looks like:
push ebp
mov ebp, esp
Pushing EBP onto the stack after entry to your function still results in a misaligned stack, but it is misaligned now by 8 bytes (4 + 4).
Because of that the code must subtract 8 from ESP rather than 12. Likewise, when determining the space needed to hold parameters, local stack variables, and pad bytes for alignment, the stack allocation size plus the 8 bytes already on the stack must be a multiple of 16 (so 8, 24, 40, ...), rather than the 12, 28, 44, ... that applies without a stack frame. Code with a stack frame could look like:
section .data
hello db "Hello, world %d", 0x0a, 0x00
section .text
default rel
global _main
extern _printf, _exit
_main:
push ebp
mov ebp, esp ; Set up stack frame
sub esp, 8 ; 16-byte align stack + room for parameters passed
; to functions we call
mov [esp+4], dword 42 ; Second parameter at esp+4
mov [esp],dword hello ; First parameter at esp+0
call _printf
xor eax, eax ; Return value = 0
mov esp, ebp
pop ebp ; Remove stack frame
ret ; We linked with C library that calls _main
; after initialization. We can do a RET to
; return back to the C runtime code that will
; exit the program and return the value in EAX
; We can do this instead of calling _exit
Because you link with the C library on OS/X it will provide an entry point and do initialization before calling _main. You can call _exit but you can also do a RET instruction with the program's return value in EAX.
Yet Another Potential NASM Bug?
I discovered that NASM v2.12 installed via MacPorts on El Capitan seems to generate incorrect relocation entries for _printf and _exit, and when linked to a final executable the code doesn't work as expected. I observed almost the identical errors you did with your original code.
The first part of my answer about stack alignment still applies; however, it appears you will need to work around the NASM issue as well. One way to do this is to install the NASM that comes with the latest XCode command line tools. This version is much older, only supports Macho-32, and doesn't support the default directive. Using my previous stack-aligned code, this should work:
section .data
hello db "Hello, world %d", 0x0a, 0x00
section .text
;default rel ; This directive isn't supported in older versions of NASM
global _main
extern _printf, _exit
_main:
sub esp, 12 ; 16-byte align stack
mov [esp+4], dword 42 ; Second parameter at esp+4
mov [esp],dword hello ; First parameter at esp+0
call _printf
mov [esp], dword 0 ; First parameter at esp+0
call _exit
To assemble with NASM and link with LD you could use:
/usr/bin/nasm -f macho hello32x.asm -o hello32x.o
ld -macosx_version_min 10.8 -no_pie -arch i386 -o hello32x hello32x.o -lc
Alternatively you could link with GCC:
/usr/bin/nasm -f macho hello32x.asm -o hello32x.o
gcc -m32 -Wl,-no_pie -o hello32x hello32x.o
/usr/bin/nasm is the location of the XCode command line tools version of NASM that Apple distributes. The version I have on El Capitan with latest XCode command line tools is:
NASM version 0.98.40 (Apple Computer, Inc. build 11) compiled on Jan 14 2016
I don't recommend NASM version 2.11.08 because it has a serious bug related to macho64 format. I recommend 2.11.09rc2. I have tested that version here and it does seem to work properly with the code above.
I'm working through some examples in Windows System Programming, 4th edition. Using windbg.exe I'm trying to inspect the parameters passed to a function (GetCurrentDirectoryA). Below is the source.
int _tmain (int argc, LPTSTR argv [])
{
/* Buffer to receive current directory allows for the CR,
LF at the end of the longest possible path. */
TCHAR pwdBuffer [DIRNAME_LEN];
DWORD lenCurDir;
lenCurDir = GetCurrentDirectory (DIRNAME_LEN, pwdBuffer);
if (lenCurDir == 0)
ReportError (_T ("Failure getting pathname."), 1, TRUE);
if (lenCurDir > DIRNAME_LEN)
ReportError (_T ("Pathname is too long."), 2, FALSE);
PrintMsg (GetStdHandle (STD_OUTPUT_HANDLE), pwdBuffer);
return 0;
}
First I dump the local variables using dv -t -v. In this case I'm interested in the pwdBuffer.
0018ff3c int argc = 0n1
0018ff40 char ** argv = 0x00582470
0018fe18 unsigned long lenCurDir = 0x775b994a
0018fe24 char [262] pwdBuffer = char [262] ""
Then I set a breakpoint at Kernel32!GetCurrentDirectoryA, which yields the following.
00 0018ff34 00428759 00000001 00582470 005824c0 kernel32!GetCurrentDirectoryA
What I don't understand is the values of the parameters to the function. I was expecting to see 0018fe24 as one of the values, representing pwdBuffer.
The next thing I do is gu, which executes Kernel32!GetCurrentDirectoryA to its end.
Thereafter I dumped the pwdBuffer address that I got initially from the dv -t -v command.
0:000> da 0018fe24
0018fe24 "C:\microsoft_press\WSP4_Examples"
0018fe44 "\Utility_4_dll"
This is what I expect from the buffer. So my question is why didn't I see this 0018fe24 value passed to GetCurrentDirectory?
Try single stepping past the mov ebp, esp instruction at the start of GetCurrentDirectoryA. The numbers you're seeing look like values from your _tmain function, specifically, its frame pointer (EBP), its return address, and its arguments argc and argv (along with the hidden envp parameter). Once EBP is loaded with the correct frame pointer for GetCurrentDirectoryA, windbg may be able to display the function's arguments correctly.
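To make the difference concrete, here is a small C model of the two ways of locating arguments on a 32-bit stack. It is only a sketch: arg_via_esp and arg_via_ebp are made-up names, and the stack is modeled as an array of DWORDs with the lowest address first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Argument i as seen through ESP on function entry, before push ebp:
   esp[0] is the return address, esp[1] is the first argument. */
uint32_t arg_via_esp(const uint32_t *esp, unsigned i)
{
    assert(esp != NULL);
    return esp[1 + i];
}

/* Argument i as seen through a frame pointer:
   ebp[0] is the saved EBP, ebp[1] the return address,
   ebp[2] the first argument. */
uint32_t arg_via_ebp(const uint32_t *ebp, unsigned i)
{
    assert(ebp != NULL);
    return ebp[2 + i];
}
```

At the breakpoint, EBP still points at _tmain's frame, so arg_via_ebp returns _tmain's own argc and argv (the 00000001 and 00582470 in the question), while arg_via_esp(esp, 1) returns the buffer address that was actually passed to GetCurrentDirectoryA.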
The stack should show the parameters to the function as soon as the breakpoint is hit, not only after you single-step. I just ran similar code (no CRT, Windows APIs only) through windbg and it works as expected.
When analyzing unknown or potentially malicious binaries, one careless single step can result in a fatal infection. If logic exists, don't use lucky charms.
my current directory
:\>echo %cd%
C:\temp\temp\temp\temp
contents of current directory
:\>ls -l
total 12
-rwxrwxrwx 1 Admin 0 197 2015-10-10 16:29 compile.bat
-rw-rw-rw- 1 Admin 0 336 2015-10-10 16:13 getdir.cpp
-rw-rw-rw- 1 Admin 0 145 2015-10-10 16:47 wtf.txt
src code for test
:\>type getdir.cpp
#include <windows.h>
int main (void) {
PCHAR buff=0;int bufflen=0;
bufflen=GetCurrentDirectory(0,NULL);
buff = (PCHAR)VirtualAlloc(NULL,bufflen,MEM_COMMIT,PAGE_READWRITE);
if(buff){
GetCurrentDirectory(bufflen,buff);
MessageBox(NULL,buff,"Current Directory",MB_OK);
VirtualFree(buff,0,MEM_RELEASE);
}
}
compiled with
:\>type compile.bat
@call "C:\Program Files\Microsoft Visual Studio 10.0\VC\vcvarsall.bat" x86
cl /Zi /EHsc /O2 /nologo /W4 /analyze *.cpp /link /SUBSYSTEM:Windows /RELEASE /ENTRY:main user32.lib kernel32.lib
pause
:\>compile.bat
Setting environment for using Microsoft Visual Studio 2010 x86 tools.
getdir.cpp
Press any key to continue . . .
executed with
:\>cdb -cf wtf.txt getdir.exe
The -cf command-line option to windbg / cdb takes a file whose contents are executed as if you had typed them at the prompt.
the contents of wtf.txt is
:\>type wtf.txt
bp kernel32!VirtualAlloc
g
gu
? @eax
bc *
bp kernel32!GetCurrentDirectoryA
g
dd @esp l3
r $t0 = poi(@esp+8)
? @$t0
gu
da @$t0;
g
q
At the initial system breakpoint the script does the following:
set a breakpoint on VirtualAlloc and run the binary
when the breakpoint is hit, go up with gu (we are interested only in the return value of this function) and inspect eax, which holds that return value
clear all breakpoints
set a breakpoint on GetCurrentDirectoryA
run the binary again with g and, on hitting the breakpoint, inspect the stack with dd @esp l3 (display three dwords from the stack pointer: one return address and the two parameters to GetCurrentDirectoryA())
note that the stack contains the same address we previously inspected at the return of VirtualAlloc using ? @eax
save the address of the buffer to a pseudo-register and go up
print the ASCII string from the buffer with da @$t0
exit the session
The result of this session follows. Note that we got 00350000 as the address of the buffer allocated by VirtualAlloc, that this address was indeed passed to GetCurrentDirectory, and that it holds the current directory string.
:\>cdb -cf wtf.txt getdir.exe
0:000> bp kernel32!VirtualAlloc
0:000> g
Breakpoint 0 hit
kernel32!VirtualAlloc:
7c809af1 8bff mov edi,edi
0:000> gu
getdir!main+0x21:
00401021 8bf0 mov esi,eax
0:000> ? @eax
Evaluate expression: 3473408 = 00350000 <--------
0:000> bc *
0:000> bp kernel32!GetCurrentDirectoryA
0:000> g
Breakpoint 0 hit
kernel32!GetCurrentDirectoryA:
7c83502e 8bff mov edi,edi
0:000> dd @esp l3
0013ffac 0040102b 00000017 00350000 <------
0:000> r $t0 = poi(@esp+8)
0:000> ? @$t0
Evaluate expression: 3473408 = 00350000 <----------
0:000> gu
getdir!main+0x2b:
0040102b 6a00 push 0
0:000> da @$t0;
00350000 "C:\temp\temp\temp\temp"
0:000> g
Edit: with everything else the same, I just added a kb command to the script file and ran it to show the stack trace.
Sorry for my bad English.
My workflow:
I wrote a simple program for the GNU assembler (GAS), test_c.s:
.intel_syntax noprefix
.globl my_string
.data
my_string:
.ascii "Hello, world!\0"
.text
.globl main
main:
push rbp
mov rbp, rsp
sub rsp, 32
lea rcx, my_string
call printf
add rsp, 32
pop rbp
ret
Compile asm-source with debug symbols:
gcc -g test_c.s
Debug a.exe in GDB:
gdb a -q
Reading symbols from C:\a.exe...done.
(gdb) start
Temporary breakpoint 1 at 0x4014e4: file test_c.s, line 14.
Starting program: C:\a.exe
[New Thread 3948.0x45e4]
Temporary breakpoint 1, main () at test_c.s:14
14 sub rsp, 32
(gdb) whatis my_string
type = <data variable, no debug info> <-------------------- why?
(gdb) info variables
All defined variables:
...
Non-debugging symbols:
0x0000000000403000 __data_start__
0x0000000000403000 __mingw_winmain_nShowCmd
0x0000000000403010 my_string <-------------------- why?
....
Why is 'my_string' a 'no debug info' variable?
How can I make GDB recognize that 'my_string' is a user-defined variable? Are there gcc flags or gas directives for this?
P.S.: The file test_c.s listed above is generated by gcc from simple c application test_c.c:
#include<stdio.h>
char my_string[] = "Hello, world!";
int main(void)
{
printf(my_string);
}
gcc test_c.c -S -masm=intel
I tried debugging this C application and got the expected result:
gcc -g test_c.c
gdb a -q
Reading symbols from C:\a.exe...done.
(gdb) start
Temporary breakpoint 1 at 0x4014ed: file test_c.c, line 7.
Starting program: C:\a.exe
[New Thread 11616.0x1688]
Temporary breakpoint 1, main () at test_c.c:7
7 printf(my_string);
(gdb) whatis my_string
type = char [18] <-------------------- OK
(gdb) info variables
...
File test_c.c:
char my_string[18]; <-------------------- OK
...
The problem is that I need debug information related to the GAS source, not the C source.
P.P.S.: MinGW-builds x64 v.4.8.1
The reason is simple: you should have generated the asm file from the c file with debugging enabled, that is gcc test_c.c -S -masm=intel -g, to have the compiler emit the required information. If you do that, you will notice a section named .debug_info in your asm source, which, unfortunately, isn't user friendly.
I am getting unexpected global variable read results when compiling the following code in avr-gcc 4.6.2 for ATmega328:
#include <avr/io.h>
#include <util/delay.h>
#define LED_PORT PORTD
#define LED_BIT 7
#define LED_DDR DDRD
uint8_t latchingFlag;
int main() {
LED_DDR = 0xFF;
for (;;) {
latchingFlag=1;
if (latchingFlag==0) {
LED_PORT ^= 1<<LED_BIT; // Toggle the LED
_delay_ms(100); // Delay
latchingFlag = 1;
}
}
}
This is the entire code. I would expect the LED toggling to never execute, seeing as latchingFlag is set to 1, however the LED blinks continuously. If latchingFlag is declared local to main() the program executes as expected: the LED never blinks.
The disassembled code doesn't reveal any gotchas that I can see, here's the disassembly of the main loop of the version using the global variable (with the delay routine call commented out; same behavior)
59 .L4:
27:main.cpp **** for (;;) {
60 .loc 1 27 0
61 0026 0000 nop
62 .L3:
28:main.cpp **** latchingFlag=1;
63 .loc 1 28 0
64 0028 81E0 ldi r24,lo8(1)
65 002a 8093 0000 sts latchingFlag,r24
29:main.cpp **** if (latchingFlag==0) {
66 .loc 1 29 0
67 002e 8091 0000 lds r24,latchingFlag
68 0032 8823 tst r24
69 0034 01F4 brne .L4
30:main.cpp **** LED_PORT ^= 1<<LED_BIT; // Toggle the LED
70 .loc 1 30 0
71 0036 8BE2 ldi r24,lo8(43)
72 0038 90E0 ldi r25,hi8(43)
73 003a 2BE2 ldi r18,lo8(43)
74 003c 30E0 ldi r19,hi8(43)
75 003e F901 movw r30,r18
76 0040 3081 ld r19,Z
77 0042 20E8 ldi r18,lo8(-128)
78 0044 2327 eor r18,r19
79 0046 FC01 movw r30,r24
80 0048 2083 st Z,r18
31:main.cpp **** latchingFlag = 1;
81 .loc 1 31 0
82 004a 81E0 ldi r24,lo8(1)
83 004c 8093 0000 sts latchingFlag,r24
27:main.cpp **** for (;;) {
84 .loc 1 27 0
85 0050 00C0 rjmp .L4
The lines 71-80 are responsible for port access: according to the datasheet, PORTD is at address 0x2B, which is decimal 43 (cf. lines 71-74).
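The read, exclusive-or, write sequence in those lines is just a bit toggle. A C sketch of the same operation on a plain value rather than the real port; toggle_bit is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

#define LED_BIT 7

/* Return port_value with the given bit flipped. This is the same
   read / exclusive-or / write that LED_PORT ^= 1<<LED_BIT compiles to. */
uint8_t toggle_bit(uint8_t port_value, unsigned bit)
{
    assert(bit < 8);
    return (uint8_t)(port_value ^ (1u << bit));
}
```

The mask 1<<7 is 0x80, which is why the listing loads lo8(-128) into r18: as an 8-bit value, -128 and 0x80 are the same bit pattern.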
The only difference between local/global declaration of the latchingFlag variable is how latchingFlag is accessed: the global variable version uses sts (store direct to data space) and lds (load direct from data space) to access latchingFlag, whereas the local variable version uses ldd (Load Indirect from Data Space to Register) and std (Store Indirect From Register to Data Space) using register Y as the address register (which can be used as a stack pointer, by avr-gcc AFAIK). Here are the relevant lines from the disassembly:
63 002c 8983 std Y+1,r24
65 002e 8981 ldd r24,Y+1
81 004a 8983 std Y+1,r24
The global version also has latchingFlag in the .bss section. I am really not sure what to attribute the different global vs. local variable behavior to. Here's the avr-gcc command line (notice -O0):
/usr/local/avr/bin/avr-gcc \
-I. -g -mmcu=atmega328p -O0 \
-fpack-struct \
-fshort-enums \
-funsigned-bitfields \
-funsigned-char \
-D CLOCK_SRC=8000000UL \
-D CLOCK_PRESCALE=8UL \
-D F_CPU="(CLOCK_SRC/CLOCK_PRESCALE)" \
-Wall \
-ffunction-sections \
-fdata-sections \
-fno-exceptions \
-Wa,-ahlms=obj/main.lst \
-Wno-uninitialized \
-c main.cpp -o obj/main.o
With the -Os compiler flag the loop is gone from the disassembly, but it can be forced back by declaring latchingFlag volatile, in which case the unexpected behavior persists for me.
According to your disassembler listing, the latchingFlag global variable is located at RAM address 0. This address corresponds to the mirrored register r0 and is not a valid RAM address for a global variable.
After a couple of checks and code comparisons in the EE chat, I noticed that my version of avr-gcc (4.7.0) stores the value of latchingFlag at 0x0100, whereas Egor Skriptunoff mentioned SRAM address 0 appearing in the OP's assembly listing.
Looking at the OP's disassembly (the avr-dump version), I noticed that the OP's compiler (4.6.2) stores the latchingFlag value at a different address (specifically, 0x060) than my compiler (version 4.7.0), which stores it at address 0x0100.
My advice is to update the avr-gcc version to at least version 4.7.0. The advantage of 4.7.0 rather than latest and greatest available is the ability to compare the generated code again with my findings.
Of course, if 4.7.0 solves the issue, then there is no harm in upgrading to a more recent version (if available).
Egor Skriptunoff's suggestion is almost exactly right: the SRAM variable is mapped to the wrong memory address. The latchingFlag variable is not at address 0x0100, which is the first valid SRAM address, but is mapped to 0x060, overlapping the WDTCSR register. This can be seen in disassembly lines like the following one:
lds r24, 0x0060
This line is supposed to load the value of latchingFlag from SRAM, and we can see that location 0x060 is used instead of 0x100.
The problem has to do with a bug in binutils that shows up when two conditions are met:
The linker is invoked with --gc-sections flag (compiler options: -Wl,--gc-sections) to save code space
None of your SRAM variables are initialized (i.e. initialized to non-zero values)
When both of these conditions are met, the .data section gets removed. When the .data section is missing, the SRAM variables start at address 0x060 instead of 0x100.
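To see why 0x060 is a problem, it helps to classify data-space addresses against the ATmega328P memory map. The C sketch below is illustrative only: classify_data_address and the constant names are made up, and the region boundaries are the ones I take from the datasheet (32 registers, 64 I/O registers, 160 extended I/O registers, then 2 KiB of internal SRAM).

```c
#include <assert.h>
#include <stdint.h>

/* ATmega328P data-space layout (boundaries assumed from the datasheet). */
#define REGS_END    0x001Fu   /* r0..r31 */
#define IO_END      0x005Fu   /* I/O registers, PORTD is 0x2B */
#define EXT_IO_END  0x00FFu   /* extended I/O, WDTCSR is 0x60 */
#define SRAM_START  0x0100u
#define SRAM_SIZE   2048u     /* 2 KiB of internal SRAM */
#define SRAM_END    (SRAM_START + SRAM_SIZE - 1u)

typedef enum {
    REGION_REGISTERS,
    REGION_IO,
    REGION_EXT_IO,
    REGION_SRAM,
    REGION_UNMAPPED
} avr_region;

/* Tell which part of the data space an address falls into. */
avr_region classify_data_address(uint16_t addr)
{
    assert(SRAM_END == 0x08FFu);   /* sanity check on the constants above */

    if (addr <= REGS_END)   return REGION_REGISTERS;
    if (addr <= IO_END)     return REGION_IO;
    if (addr <= EXT_IO_END) return REGION_EXT_IO;
    if (addr <= SRAM_END)   return REGION_SRAM;
    return REGION_UNMAPPED;
}
```

Address 0x0060 lands in the extended I/O region, so a variable placed there shares its storage with a hardware register, whereas 0x0100 is ordinary SRAM.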
One solution is to reinstall binutils: the current versions have this bug fixed. Another solution is to edit your linker scripts: on Ubuntu these are probably in /usr/lib/ldscripts. For ATmega168/328 the script that needs to be edited is avr5.x, but you should really edit all of them, otherwise you could run into this bug on other AVR platforms. The change that needs to be made is the following one:
.data : AT (ADDR (.text) + SIZEOF (.text))
{
PROVIDE (__data_start = .) ;
- *(.data)
+ KEEP(*(.data))
So replace the line *(.data) with KEEP(*(.data)). This ensures that the .data section is not discarded, and consequently the SRAM variable addresses start at 0x0100