binary containing just one instruction - gcc

I want to produce a binary file to load to mips target (barebone/no os) that will contain only one jump instruction and will be linked to the specific address. How to do it? With all my attempts, I get quite big files, that apparently contain linked libraries (they are unnecessary for one single jump instruction). I believe I shall get a file of just few bytes in length. How to do it?
Thanks.

Related

How to link a program in GCC to prelinked library?

OK, I have the problem, I do not know exactly the correct terms in order to find what I am looking for on google. So I hope someone here can help me out.
When developing real time programs on embedded devices you might have to iterate a few hundred or thousand times until you get the desired result. When using e.g. ARM devices you wear out the internal flash quite quickly. So typically you develop your programs to reside in the RAM of the device and all is ok. This is done using GCC's functionality to split the code in various sections.
Unfortunately, the RAM of most devices is much smaller than the flash. So at one point in time, your program gets too big to fit in RAM with all variables etc. (You choose the size of the device such that one assumes it will fit the whole code in flash later.)
Classical shared objects do not work as there is nothing like a dynamical linker in my environment. There is no OS or such.
My idea was the following: For the controller it is no problem to execute code from both RAM and flash. When compiling with the correct attributes for the functions this is also no big problem for the compiler to put part of the program in RAM and part in flash.
When I have some functionality running successfully I create a library and put this in the flash. The main development is done in the 'volatile' part of the development in RAM. So the flash gets preserved.
The problem here is: I need to make sure, that the library always gets linked to the exact same location as long as I do not reflash. So a single function must always be on the same address in flash for each compile cycle. When something in the flash is missing it must be placed in RAM or a lining error must be thrown.
I thought about putting together a real library and linking against that. Here I am a bit lost. I need to tell GCC/LD to link against a prelinked file (and create such a prelinked file).
It should be possible to put all the library objects together and link this together in the flash. Then the addresses could be extracted and the main program (for use in RAM) can link against it. But: How to do these steps?
In the internet there is the term prelink as well as a matching program for linux. This is intended to speed up the loading times. I do not know if this program might help me out as a side effect. I doubt it but I do not understand the internals of its work.
Do you have a good idea how to reach the goal?
You are solving a non-problem. Embedded flash usually has a MINIMUM write cycle of 10,000. So even if you flash it 20 times a day, it will last a year and half. An St-Nucleo is $13. So that's less than 3 pennies a day :-). The TYPICAL write cycle is even longer, at about 100,000. It will be a long time before you wear them out.
Now if you are using them for dynamic storage, that might be a concern, depending on the usage patterns.
But to answer your questions, you can build your code into a library .a file easily enough. However, GCC does not guarantee that it links the object code in any order, as it depends on optimization level. Furthermore, only functions that are referenced in a library file is pulled in, so if your function calls change, it may pull in more or less library functions.

How are Opcodes & Operands Defined in a CPU?

Extensive searching has sent me in a loop over the course of 3 days, so I'm depending on you guys to help me catch a break.
Why exactly does one 8-bit sequence of high's and low's perform this action, and 8-bit sequence performs that action.
My intuition tells me that the CPU's circuitry hard-wired one binary sequence to do one thing, and another to do another thing. That would mean different Processor's with potentially different chip circuitry wouldn't define one particular binary sequence as the same action as another?
Is this why we have assembly? I need someone to confirm and/or correct my hypothesis!
Opcodes are not always 8 bits but yes, it is hardcoded/wired in the logic to isolate the opcode and then send you down a course of action based on that. Think about how you would do it in an instruction set simulator, why would logic be any different? Logic is simpler than software languages, there is no magic there. ONE, ZERO, AND, OR, NOT thats as complicated as it gets.
Along the same lines if I was given an instruction set document and you were given an instruction set document and told to create a processor or write an instruction set simulator. Would we produce the exact same code? Even if the variable names were different? No. Ideally we would have programs that are functionally the same, they both parse the instruction and execute it. Logic is no different you give the spec to two engineers you might get two different processors that functionally are the same, one might perform better, etc. Look at the long running processor families, x86 in particular, they re-invent that every couple-three years being instruction set compatible for the legacy instructions while sometimes adding new instructions. Same for ARM and others.
And there are different instruction sets ARM is different from x86 is different from MIPS, the opcodes and/or bits you examine in the instruction vary, for none of these can you simply look at 8 bits, each you have some bits then if that is not enough to uniquely identify the instruction/operation then you need to examine some more bits, where those bits are what the rules are are very specific to each architecture. Otherwise what would be the point of having different names for them if they were the same.
And this information was out there you just didnt look in the right places, there are countless open online courses on the topic, books that google should hit some pages on, as well as open source processor cores you can look at and countless instruction set simulators with source code.

Forth, interpreted or compiled?

Supposedly Forth programs can be "compiled" but I don't see how that is true if they have words that are only evaluated at runtime. For example, there is the word DOES> which stores words for evaluation at runtime. If those words include an EVALUATE or INTERPRET word then there will be a runtime need for the dictionary.
To support such statements it would mean the entire word list (dictionary) would have to be embedded inside the program, essentially what interpreted programs (not compiled programs) do.
This would seem to prevent you from compiling small programs using Forth because the entire dictionary would have to be embedded in the program, even if you used only a fraction of the words in the dictionary.
Is this correct, or is there some way to compile Forth programs without embedding the dictionary? (maybe by not using runtime words at all ??)
Forth programs can be compiled with or without word headers. The headers include the word names (called "name space").
In the scenario you describe, where the program may include run-time evalutation calls such as EVALUATE, the headers will be needed.
The dictionary can be divided into three logically distinct parts: name space, code space, and data space. Code and data are needed for program execution, names are usually not.
A normal Forth program will usually not do any runtime evaluation. So in most cases, the names aren't needed in a compiled program.
The code after DOES> is compiled, so it's not evaluated at run time. It's executed at run time.
Even though names are included, they usually don't add much to program size.
Many Forths do have a way to leave out the names from a program. Some have a switch to remove word headers (the names). Other have cross compilers which keep the names in the host system during compile time, but generate target code without names.
No, the entire dictionary need not be embedded, nor compiled. All that need remain is just the list of words used, and their parent words, (& grandparents, etc.). And the even names of the words aren't necessary, the word locations are enough. Forth code compiled by such methods can be about as compact as it gets, rivaling or even surpassing assembly language in executable size.
Proof by example: Tom Almy's ForthCMP, an '80s-'90s MSDOS compiler that shrunk executable code way down. Its README says:
. Compiles Forth into machine code -- not interpreted.
. ForthCMP is written in Forth so that Forth code can be executed
during compilation, as is customary in Forth applications.
. Very fast -- ForthCMP compiles Forth code into an executable file
in a single pass.
. Generated code is extremely compact. Over 110 Forth "primitives"
are compiled in-line. ForthCMP performs constant expression
folding, strength reduction, register optimization, DO...LOOP
optimization, tail recursion, and various "peephole"
optimizations.
. Built-in assembler.
4C.COM runs under emulators like dosemu or dosbox.
A "Hello World" compiles into a 117 byte .COM file, a wc program compiles to a 3K .COM file (from 5K of source code). No dictionary or external libraries, (aside from standard MSDOS calls, i.e. the OS it runs on).
Forth can be a bear to get your head around from the outside because there is NO standard implementation of the language. Much of what people see are from the early days of Forth when the author (Charles Moore) was still massaging his own thoughts. Or worse, homemade systems that people call Forth because it has a stack but are really not Forth.
So is Forth Interpreted or Compiled?
Short answer: both
Early years:
Forth had a text interpreter facing the programmer. So Interpreted: Check
But... The ':' character enabled the compiler which "compiled" the addresses of the words in the language so it was "compiled" but not as native machine code. It was lists of addresses where the code was in memory. The clever part was that those addresses could be run with a list "interpreter" that was only 2 or 3 instructions on most machines and a few more on an old 8 bit CPU. That meant it was still pretty fast and quite space efficient.
These systems are more of an image system so yes the system goes along with your program but some of those system kernels were 8K bytes for the entire run-time including the compiler and interpreter. Not heavy lifting.
This is what most people think of as Forth. See JonesForth for a literate example. (This was called "threaded code" at the time, not to be confused with multi-threading)
1990ish
Forth gurus and Chuck Moore began to realize that a Forth language primitive could be as little as one machine instruction on modern machines so why not just compile the instruction rather than the address. This became very useful with 32bit machines since the address was sometimes bigger than the instruction. They could then replace the little 3 instruction interpreter with the native CALL/Return instructions of the processor. This was called sub-routine threading. The front end interpreter did not disappear. It simply kicked off native code sub-routines
Today
Commercial Forth systems generate native code, inline many/most primitives and do many of the other optimization tricks you see in modern compilers.
They still have an interpreter facing the programmer. :-)
You can also buy (or build) Forth cross-compilers that create standalone executables for different CPUs that include multi-tasking, TCP/IP stacks and guess what, that text interpreter can be compiled into the executable as an option for remote debugging and configuration if you want it.
So is Forth Interpreted or Compiled? Still both.
You are right that a program that executes INTERPRET (EVALUATE, LOAD, INCLUDE etc.) is obliged to have a dictionary. That is hardly a disadvantage because even a 64 bit executable is merely a 50 K for Linux or MS-Windows. Modern single board computer like the MSP430 can have the whole dictionary in flash memory. See ciforth and noforth respectively. Then there is scripting. If you use Forth as a scripting language, it is similar to perl or python The script is small, and doesn't contain the whole language. It requires though that the language is installed on your computer.
In case of really small computers you can resort to cross compiling or using an umbellical Forth where the dictionary is positioned on a host computer and communicates and programs via a serial line. These are special techniques that are normally not needed. You can't use INTERPRETing code in those cases on the sbc, because obviously there is no dictionary there.
Note: mentioning the DOES> instruction doesn't serve to make the question clearer. I recommend that you edit this out.

Executable size on MAC

I somehow managed to produce a MAC executable of half the usual size using Lazarus, but nobody knows how. I've inspected the executable using nm, but found no relevant differences:
exactly the same maximum address
nearly exactly the same number of lines (123316 vs 123318)
randomly chosen inspected lines were either the same or differed slightly in the address
I wonder how can I find out more. I have no idea about MAC and similarly for binary formats.

Windows API calls from assembly while minimizing program size

I'm trying to write a program in assembly and make the resulting executable as small as possible. Some of what I'm doing requires windows API calls to functions such as WriteProcessMemory. I've had some success with calling these functions, but after compiling and linking, my program comes out in the range of 14-15 KB. (From a source of less than 1 KB) I was hoping for much, much less than that.
I'm very new to doing low level things like this so I don't really know what would need to be done to make the program smaller. I understand that the exe format itself takes up quite a bit of space. Can anything be done to minimize that?
I should mention that I'm using NASM and GCC but I can easily change if that would help.
See Tiny PE for a bunch of tips and tricks you can use to reduce the final size of your executable. Be warned that some of the later techniques in that article are extremely fragile.
The default section alignment for most PE files is 4K to align with the natural system memory layout. If you have a .data, .text and .resource section - that's 12K already. Most of it will be 0's and a waste of space.
There are a few things you can do to minimize this waste. First, reduce the section alignment to 512 bytes (don't know the options needed for nasm/gcc). Second, merge the sections so that you only have a single .text section. This can be a problem though for modern machines with the NX bit turned on. This security feature prevents modification of executable sections of code from things like viruses.
There are also a slew of PE compression tools out there that will compact your PE and decompress it when executed.
I suggest using the DumpBin utility (or GNU's objdump) to determine what takes the most space. It may be resource files, huge global variables or something like that.
FWIW, the smallest programs I can assemble using ML or ML64 are on the order of 3kb. (That's just saying hello world and exiting.)
Give me a small C program (not C++), and I'll show you how to make a 1 ko .exe with it. The smallest size of executable I recommend is 1K, because it will fail to run on some Windows if it's not at least this size.
You merely have to play with linker switches to make it happen!
A good linker to do this is polink.
And if you do everything in Assembly, it's even easier. Just go to the MASM32 forum and you'll see plenty of programs like this.

Resources