8051 with Keil-C filesize issue using a Megawin processor - 8051

I've written a program using Keil C for a MegaWin 8051 MPC82G516A. When I check the file size of the Intel generated hex file it has a size of 8kb (I see the code in the binary code window), but when I go to program the device using Megawin's tool it increases the code size to around 29kb!? Can anyone provide the reason for why it might be doing this?
Also, something else that is strange is the fact that it seems to be writing the code at the top of the processor memory and not at the start. There are like 4 bytes at the start of the code, but the complete rest of it is in the end of the memory.
Please help
Cameron.

you write that the file size of the intel hex file with your code is about 8k. Part of your program is written to the bottom of the address space.
Another part is written to the bottom of the address space.
The intel hex file not only contains the program code but also the address where that code should be written to.
You can check yourself if your file contains code for the bottom and the top of the address space.
Some information about intel hex format: http://www.keil.com/support/man/docs/oh166/oh166_ih_record.htm
If this is the case you can check the .m51 file which is generated by the linker during the build process.
This file contains information about the modules included in your programm and the addresses they are linked to.
Perhaps there is some linker setting in your project which tells the linker behave as you tell.

I was miss-reading the file size on the editor. Also, Keil's free version starts at position 2000Kb. It's part of it's limitations of the evaluation version.

Related

why need linker script and startup code?

I've read this tutorial
I could follow the guide and run the code. but I have questions.
1) Why do we need both load-address and run-time address. As I understand it is because we have put .data at flash too; so why we don't run app there, but need start-up code to copy it into RAM?
http://www.bravegnu.org/gnu-eprog/c-startup.html
2) Why we need linker script and start-up code here. Can I not just build C source as below and run it with qemu?
arm-none-eabi-gcc -nostdlib -o sum_array.elf sum_array.c
Many thanks
Your first question was answered in the guide.
When you load a program on an operating system your .data section, basically non-zero globals, are loaded from the "binary" into the right offset in memory for you, so that when your program starts those memory locations that represent your variables have those values.
unsigned int x=5;
unsigned int y;
As a C programmer you write the above code and you expect x to be 5 when you first start using it yes? Well, if are booting from flash, bare metal, you dont have an operating system to copy that value into ram for you, somebody has to do it. Further all of the .data stuff has to be in flash, that number 5 has to be somewhere in flash so that it can be copied to ram. So you need a flash address for it and a ram address for it. Two addresses for the same thing.
And that begins to answer your second question, for every line of C code you write you assume things like for example that any function can call any other function. You would like to be able to call functions yes? And you would like to be able to have local variables, and you would like the variable x above to be 5 and you might assume that y will be zero, although, thankfully, compilers are starting to warn about that. The startup code at a minimum for generic C sets up the stack pointer, which allows you to call other functions and have local variables and have functions more than one or two lines of code long, it zeros the .bss so that the y variable above is zero and it copies the value 5 over to ram so that x is ready to go when the code your entry point C function is run.
If you dont have an operating system then you have to have code to do this, and yes, there are many many many sandboxes and toolchains that are setup for various platforms that already have the startup and linker script so that you can just
gcc -O myprog.elf myprog.c
Now that doesnt mean you can make system calls without a...system...printf, fopen, etc. But if you download one of these toolchains it does mean that you dont actually have to write the linker script nor the bootstrap.
But it is still valuable information, note that the startup code and linker script are required for operating system based programs too, it is just that native compilers for your operating system assume you are going to mostly write programs for that operating system, and as a result they provide a linker script and startup code in that toolchain.
1) The .data section contains variables. Variables are, well, variable -- they change at run time. The variables need to be in RAM so that they can be easily changed at run time. Flash, unlike RAM, is not easily changed at run time. The flash contains the initial values of the variables in the .data section. The startup code copies the .data section from flash to RAM to initialize the run-time variables in RAM.
2) Linker-script: The object code created by your compiler has not been located into the microcontroller's memory map. This is the job of the linker and that is why you need a linker script. The linker script is input to the linker and provides some instructions on the location and extent of the system's memory.
Startup code: Your C program that begins at main does not run in a vacuum but makes some assumptions about the environment. For example, it assumes that the initialized variables are already initialized before main executes. The startup code is necessary to put in place all the things that are assumed to be in place when main executes (i.e., the "run-time environment"). The stack pointer is another example of something that gets initialized in the startup code, before main executes. And if you are using C++ then the constructors of static objects are called from the startup code, before main executes.
1) Why do we need both load-address and run-time address.
While it is in most cases possible to run code from memory mapped ROM, often code will execute faster from RAM. In some cases also there may be a much larger RAM that ROM and application code may compressed in ROM, so the executable code may not simply be copied from ROM also decompressed - allowing a much larger application than the available ROM.
In situations where the code is stored on non-memory mapped mass-storage media such as NAND flash, it cannot be executed directly in any case and must be loaded into RAM by some sort of bootloader.
2) Why we need linker script and start-up code here. Can I not just build C source as below and run it with qemu?
The linker script defines the memory layout of you target and application. Since this tutorial is for bare-metal programming, there is no OS to handle that for you. Similarly the start-up code is required to at least set an initial stack-pointer, initialise static data, and jump to main. On an embedded system it is also necessary to initialise various hardware such as the PLL, memory controllers etc.

Intel Pin Tool: Get instruction from address

I'm using Intel's Pin Tool to do some binary instrumentation, and was wondering if there an API to get the instruction byte code at a given address.
Something like:
instruction = getInstructionatAddr(addr);
where addr is the desired address.
I know the function Instruction (used in many of the simple/manual examples) given by Pin gets the instruction, but I need to know the instructions at other addresses. I perused the web with no avail. Any help would be appreciated!
CHEERS
wondering if there an API to get the instruction byte code at a given
address
Yes, it's possible but in a somewhat contrived way: with PIN you are usually interested in what is executed (or manipulated through the executed instructions), so everything outside the code / data flow is not of any interest for PIN.
PIN is using (and thus ships with) Intel XED which is an instruction encoder / decoder.
In your PIN installation you should have and \extra folder with two sub-directories: xed-ia32 and xed-intel64 (choose the one that suits your architecture). The main include file for XED is xed-interface.h located in the \include folder of the aforementioned directories.
In your Pintool, given any address in the virtual space of your pintooled program, use the PIN_SafeCopy function to read the program memory (and thus bytes at the given address). The advantage of PIN_SafeCopy is that it fails graciously even if it can't read the memory, and can read "shadowed" parts of the memory.
Use XED to decode the instruction bytes for you.
For an example of how to decode an instruction with XED, see the first example program.
As the small example uses an hardcoded buffer (namely itext in the example program), replace this hardcoded buffer with the destination buffer you used in PIN_SafeCopy.
Obviously, you should make sure that the memory you are reading really contains code.
AFAIK, it is not possible to get an INS type (the usual type describing an instruction in PIN) from an arbitrary address as only addresses in the code flow will "generate" an INS type.
As a side note:
I know the function Instruction (used in many of the simple/manual
examples) given by Pin gets the instruction
The Instruction routine used in many PIN example is called an "Instrumentation routine": its name is not relevant in itself.
Pin_SafeCopy may help you. This API could copy memory content from the address space of target process to one specified buffer.

x86 segmentation, DOS, MZ file format, and disassembling

I'm disassembling "Test Drive III". It's a 1990 DOS game. The *.EXE has MZ format.
I've never dealt with segmentation or DOS, so I would be grateful if you answered some of my questions.
1) The game's system requirements mention 286 CPU, which has protected mode. As far as I know, DOS was 90% real mode software, yet some applications could enter protected mode. Can I be sure that the app uses the CPU in real mode only? IOW, is it guaranteed that the segment registers contain actual offset of the segment instead of an index to segment descriptor?
2) Said system requirements mention 1 MB of RAM. How is this amount of RAM even meant to be accessed if the uppermost 384 KB of the address space are reserved for stuff like MMIO and ROM? I've heard about UMBs (using holes in UMA to access RAM) and about HMA, but it still doesn't allow to access the whole 1 MB of physical RAM. So, was precious RAM just wasted because its physical address happened to be reserved for UMA? Or maybe the game uses some crutches like LIM EMS or XMS?
3) Is CS incremented automatically when the code crosses segment boundaries? Say, the IP reaches 0xFFFF, and what then? Does CS switch to the next segment before next instruction is executed? Same goes for SS. What happens when SP goes all the way down to 0x0000?
4) The MZ header of the executable looks like this:
signature 23117 "0x5a4d"
bytes_in_last_block 117
blocks_in_file 270
num_relocs 0
header_paragraphs 32
min_extra_paragraphs 3349
max_extra_paragraphs 65535
ss 11422
sp 128
checksum 0
ip 16
cs 8385
reloc_table_offset 30
overlay_number 0
Why does it have no relocation information? How is it even meant to run without address fixups? Or is it built as completely position-independent code consisting from program-counter-relative instructions? The game comes with a cheat utility which is also an MZ executable. Despite being much smaller (8448 bytes - so small that it fits into a single segment), it still has relocation information:
offset 1
segment 0
offset 222
segment 0
offset 272
segment 0
This allows IDA to properly disassemble the cheat's code. But the game EXE has nothing, even though it clearly has lots of far pointers.
5) Is there even such thing as 'sections' in DOS? I mean, data section, code (text) section etc? The MZ header points to the stack section, but it has no information about data section. Is data and code completely mixed in DOS programs?
6) Why even having a stack section in EXE file at all? It has nothing but zeroes. Why wasting disk space instead of just saying, "start stack from here"? Like it is done with BSS section?
7) MZ header contains information about initial values of SS and CS. What about DS? What's its initial value?
8) What does an MZ executable have after the exe data? The cheat utility has whole 3507 bytes in the end of the executable file which look like
__exitclean.__exit.__restorezero._abort.DGROUP#.__MMODEL._main._access.
_atexit._close._exit._fclose._fflush._flushall._fopen._freopen._fdopen
._fseek._ftell._printf.__fputc._fputc._fputchar.__FPUTN.__setupio._setvbuf
._tell.__MKNAME._tmpnam._write.__xfclose.__xfflush.___brk.___sbrk._brk._sbrk
.__chmod.__close._ioctl.__IOERROR._isatty._lseek.__LONGTOA._itoa._ultoa.
_ltoa._memcpy._open.__open._strcat._unlink.__VPRINTER.__write._free._malloc
._realloc.__REALCVT.DATASEG#.__Int0Vector.__Int4Vector.__Int5Vector.
__Int6Vector.__C0argc.__C0argv.__C0environ.__envLng.__envseg.__envSize
Is this some kind of debugging symbol information?
Thank you in advance for your help.
Re. 1. No, you can't be sure until you prove otherwise to yourself. One giveaway would be the presence of MOV CR0, ... in the code.
Re. 2. While marketing materials aren't to be confused with an engineering specification, there's a technical reason for this. A 286 CPU could address more than 1M of physical address space. The RAM was only "wasted" in real mode, and only if an EMM (or EMS) driver wasn't used. On 286 systems, the RAM past 640kb was usually "pushed up" to start at the 1088kb mark. The ISA and on-board peripherals' memory address space was mapped 1:1 into the 640-1024kb window. To use the RAM from the real mode needed an EMM or EMS driver. From protected mode, it was simply "there" as soon as you set up the segment descriptor correctly.
If the game actually needed the extra 384kb of RAM over the 640kb available in the real mode, it's a strong indication that it either switched to protected mode or required the services or an EMM or EMS driver.
Re. 3. I wish I remembered that. On reflection, I wish not :) Someone else please edit or answer separately. Hah, I did know it at some point in time :)
Re. 4. You say "[the code] has lots of instructions like call far ptr 18DCh:78Ch". This implies one of three things:
Protected mode is used and the segment part of the address is a selector into the segment descriptor table.
There is code there that relocates those instructions without DOS having to do it.
There is code there that forcibly relocates the game to a constant position in the address space. If the game doesn't use DOS to access on-disk files, it can remove DOS completely and take over, gaining lots of memory in the process. I don't recall whether you could exit from the game back to the command prompt. Some games where "play until you reboot".
Re. 5. The .EXE header does not "point" to any stack, there is no stack section you imply, the concept of sections doesn't exist as far as the .EXE file is concerned. The SS register value is obtained by adding the segment the executable was loaded at with the SS value from the header.
It's true that the linker can arrange sections contiguously in the .EXE file, but such sections' properties are not included in the .EXE header. They often can be reverse-engineered by inspecting the executable.
Re. 6. The SS and SP values in the .EXE header are not file pointers. The EXE file might have a part that maps to the stack, but that's entirely optional.
Re. 7. This is already asked and answered here.
Re. 8. This looks like a debug symbol list. The cheat utility was linked with the debugging information left in. You can have completely arbitrary data there - often it'd various resources (graphics, music, etc.).

Flash size of a bare metal ARM program

How to know the flash size of a bare metal arm code. If I have the elf is it possible to know how much flash will be required to store the program? For example if I have the elf file that is supposed to go into a ARM based MCU, how can I determine how much of the MCU's flash will be consumed by the code?
The ELF headers should contain the information you need. You can use either the objdump (with -h) or readelf tool to read these. Those tools should be included with your toolchain.
Basically, you're looking to add up the size all the loadable sections, such as .text and .data. Look for the LOAD flag in the output from objdump, for example.
You can ignore non-loadable sections such as .comment, .debug and .bssĀ· Some of those are there for the benefit of the debugger, for example, and some are just placeholders for memory that will be used by the program at run time, but contains no pre-existing data.
When I say "add up the size", that's not strictly true; the linker will have already allocated each section to a specific address in flash (I'm assuming your program will run directly from ROM), so you need to find the end-address of the last section to determine how much is left.

For strict education purposes, what exact format of bytes/bits do modern BIOS understand?

BIOS will look in the first 512 bytes of the first sector(at least on PC BIOS, AmeriTrend, PhoenixBIOS, etc.), and any .bin file binary formatted block of bytes will be understood by BIOS, am I correct here?
I just want to ask this to be certain, and because I want to assure that I don't make mistakes when writing my operating system carefully.
The BIOS will be executing under the processor and native-architecture obviously, so once I instruct BIOS with the binary to have the processor move the bytes in to memory I can then transfer control to my software which will then instruct the processor on what it does next, right?
I just want to know if I have this right, and I assure you this isn't spam, as I'm a curious hobbyist who has C/C++, Java, C#, x86 Assembly, and some hardware-design experience as well.
EDIT PEOPLE: I also would like to know if there's a modernized format, file, or block of bytes the BIOS must be assembled/compiled to to be executed, such as a .bin.
As pst comment says, the boot sector is treated as i386 machine code.
The last 2 bytes need to match a special signature (0x55AA), but I think that is it as far as hard requirements.
The code just gets loaded and executed as is.
If you are trying to conform to MBR or GPT partition specs (so that other OS's can see your disk partitions) there is more to it, but that is another thing altogether.
There is no specific "file format" for a boot sector. The BIOS simply reads the raw bytes from the boot sector, and jumps to the first instruction. It is literally just a "block of bytes", the file extension (you keep mentioning .bin) is not relevant at all.

Resources