Compiling Binary - compilation

I would like to get a feel of how computers originally worked. I know initially with computers such as ENIAC, they physically had to plug in wires in the correct order to make their programs execute. They later used punch cards and finally then came up with assembly language(s). It just build upward from there with FORTRAN, COBOL, etc. Is there any way I am compile 0s and 1s on my computer. If I open textedit, and type in a specific sequence of zeroes and ones, then is how can I make that a binary file and not a text with a sequence of ASCII characters? I am open to any method. (Disclaimer: I know doing things in binary takes forever, I just want to learn how to very basic things.)

The easiest way to do this is to start with an assembler of your choice, in an IDE if you like. Use some sort of debugger (such as an IDE) so you can see the effect of your code without also having to write to console or file.
Rather than writing only binary as text digits, write a complete assembler source using data elements instead of instructions.
So, instead of
.code
main proc
mov eax,5
add eax,6
main endp
end main
you could write:
main proc
db 10111000b, 00000101b, 00000000b, 00000000b, 00000000b
db 10000011b, 11000000b, 00000110b
main endp
end main
db means define byte and the b suffix means binary.
And, with this, you'd be all set up to cheat, but I won't tell you how until you ask so I don't spoil the fun for you.
Here is a good tutorial for getting started on Windows with MASM and Visual Studio 2015.

The way to create a binary data stream depends heavily on its purpose.
Binary data itself is not much of magic. You can take any hex-editor and start typing the desired binary input.
But this is not how computers are programmed nowadays. If you really want to go to the lowest level, you can have a look on assembly programming, which basically allows you to tell your machine the exact instructions it should execute in a more handy way.
But even here you won't have much fun. If you want to be able to actually execute your programs and see some results on your display or perhaps even things like keyboard input, the code would grow really large and hard to write and understand for humans.
This is why we use compilers. Compilers generate such code from a high level language and eliminate the need to write the smallest instruction blocks over and over again.
If you really just want to understand how computers work in principle, download some emulator for a simple CPU (perhaps with a nice GUI)
and play around with it. Edumips is one of those emulators for educational purpose.

Related

Is there any way to know if a DOS application uses graphics mode in it or not?

I am currently trying to create a program that detects if a DOS executable uses non-text mode in it or not.
I am uncertain if there is any semantics for it in the header of the 16-bit executable but the only way I found out (for now), is to check the value of AX every time int 10h(CD10) is found in the code (that is by using some debugger like technology)
Is there any easier and more efficient way to know if a DOS executable might use graphics mode in it or is this the only possible way to achieve it?
Also, is it possible to execute over to a specific offset in the code (position in the exe file) and then find the value of AX at that point of time (programmatically)?
The proposed solution/code can be purely written in Windows (if needed)
(I personally feel the method proposed by me might not be 100% accurate as the value of AX might depend on the flow of the program)

Modifying code during a debugging session.

Does anyone know of a Debugger or Programming Language that allows you to set a break point, and then modify the code and then execute the newly modified code.
This is even more useful if the Debugger also had the ability for reverse debugging. So you could step though the buggy code, stack backwards, fix the code, and then step though it again to see if you fixed the bug. Now that's sexy, is anyone doing this?
I believe the Hot Code Replace in eclipse is what you meant in the problem:
The idea is that you can start a debugging session on a given runtime
workbench and change a Java file in your development workbench, and
the debugger will replace the code in the receiving VM while it is
running. No restart is required, hence the reference to "hot".
But there are limitations:
HCR only works when the class signature does not change; you cannot
remove or add fields to existing classes, for instance. However, HCR
can be used to change the body of a method.
The totalview debugger provides the concept of Evaluation Point which allows user to "fix his code on the fly" or to "patch it" or to examine what if scenario without having to recompile.
Basically, user plants an Evaluation Point at some line and writes a piece of C/C++ or Fortran code he wants to execute instead. Could be a simple printf, goto, a set of if-then-else tests, some for loops etc... This is really powerful and time-sparing.
As for reverse-debugging, it's a highly desirable feature, but I'm not sure it already exists.
http://msdn.microsoft.com/en-us/library/bcew296c%28v=vs.80%29.aspx
The link is for VS 2005 but applies to 2008 and 2010 as well.
Edit, 2015: Read chapters 1 and 2 of my MSc thesis, Combining reverse debugging and live programming towards visual thinking in computer programming, it answers the question in detail.
The Python debugger, Pdb, allows you to run arbitrary code while paused (like at a breakpoint). For example, let's say you are debugging and have paused at the following line in your program, where the variable hasn't been declared in the program itself :
print (x)
so that moving forward (i.e., running that line) would result in :
NameError: name 'x' is not defined
You can define that variable in the debugger, and have the program continue executing with it :
(Pdb) 'x' in locals()
False
(Pdb) x = 1
(Pdb) 'x' in locals()
True
If you meant that the change should not be provided at the debugger console, but that you want to change the original code in some editor, then have the debugger automatically update the state of the live program in some way, so that the executing program reflects that change, that is called "live programming". (Not to be confused with "live coding" which is live performance of coding -- see TOPLAP -- though there is some confusion.) There has been an interest in research into live programming (and live coding) in the last 2 or 3 years. It is a very difficult problem to solve, and there are many different approaches. You can watch Bret Victor's talk, Inventing on Principle, for some examples of that. Note that those are prototypes only, to illustrate the idea. Hot-swapping of code so that the tree is drawn differently in the next loop of some draw() function, or so that the game character responds differently next time, (or so that the music or visuals are changed during a live coding session), is not that difficult, some languages and systems cater for that explicitly. However, the state of the program is not necessarily then a true reflection of the code (as also in the Pdb example above) -- if e.g. the game character could access an area based on some ability like jumping, and the code is then swapped out, he might never be able to access that area in the game any longer should the game be played from the start. To solve change propagation for general programming is difficult -- you can see that his search example re-runs the code from the start each time a change is made.
True reverse execution is also a tricky problem. There are a number of commercial projects, but almost all of them only record trace data to browse it afterwards, called omniscient debugging (but they are often called reverse-, back-in-time, bidirectional- or time-travel-debuggers, also a lot of confusion). In terms of free and open-source projects, the GNU debugger, gdb, has two modes, one is process record and replay which also only records the program for browsing it afterwards, the other is true reverse debugging which allows you to reverse in a live program. It is extremely slow, as it undoes single machine instruction at a time. The extended python debugger prototype, epdb, also allows for true reversing in a live program, and is much faster as it uses a snapshot/checkpoint and replay mechanism. Here is the thesis and here is the program and the code.

Are there any good reference implementations available for command line implementations for embedded systems?

I am aware that this is nothing new and has been done several times. But I am looking for some reference implementation (or even just reference design) as a "best practices guide". We have a real-time embedded environment and the idea is to be able to use a "debug shell" in order to invoke some commands. Example: "SomeDevice print reg xyz" will request the SomeDevice sub-system to print the value of the register named xyz.
I have a small set of routines that is essentially made up of 3 functions and a lookup table:
a function that gathers a command line - it's simple; there's no command line history or anything, just the ability to backspace or press escape to discard the whole thing. But if I thought fancier editing capabilities were needed, it wouldn't be too hard to add them here.
a function that parses a line of text argc/argv style (see Parse string into argv/argc for some ideas on this)
a function that takes the first arg on the parsed command line and looks it up in a table of commands & function pointers to determine which function to call for the command, so the command handlers just need to match the prototype:
int command_handler( int argc, char* argv[]);
Then that function is called with the appropriate argc/argv parameters.
Actually, the lookup table also has pointers to basic help text for each command, and if the command is followed by '-?' or '/?' that bit of help text is displayed. Also, if 'help' is used for a command, the command table is dumped (possible only a subset if a parameter is passed to the 'help' command).
Sorry, I can't post the actual source - but it's pretty simple and straight forward to implement, and functional enough for pretty much all the command line handling needs I've had for embedded systems development.
You might bristle at this response, but many years ago we did something like this for a large-scale embedded telecom system using lex/yacc (nowadays I guess it would be flex/bison, this was literally 20 years ago).
Define your grammar, define ranges for parameters, etc... and then let lex/yacc generate the code.
There is a bit of a learning curve, as opposed to rolling a 1-off custom implementation, but then you can extend the grammar, add new commands & parameters, change ranges, etc... extremely quickly.
You could check out libcli. It emulates Cisco's CLI and apparently also includes a telnet server. That might be more than you are looking for, but it might still be useful as a reference.
If your needs are quite basic, a debug menu which accepts simple keystrokes, rather than a command shell, is one way of doing this.
For registers and RAM, you could have a sub-menu which just does a memory dump on demand.
Likewise, to enable or disable individual features, you can control them via keystrokes from the main menu or sub-menus.
One way of implementing this is via a simple state machine. Each screen has a corresponding state which waits for a keystroke, and then changes state and/or updates the screen as required.
vxWorks includes a command shell, that embeds the symbol table and implements a C expression evaluator so that you can call functions, evaluate expressions, and access global symbols at runtime. The expression evaluator supports integer and string constants.
When I worked on a project that migrated from vxWorks to embOS, I implemented the same functionality. Embedding the symbol table required a bit of gymnastics since it does not exist until after linking. I used a post-build step to parse the output of the GNU nm tool for create a symbol table as a separate load module. In an earlier version I did not embed the symbol table at all, but rather created a host-shell program that ran on the development host where the symbol table resided, and communicated with a debug stub on the target that could perform function calls to arbitrary addresses and read/write arbitrary memory. This approach is better suited to memory constrained devices, but you have to be careful that the symbol table you are using and the code on the target are for the same build. Again that was an idea I borrowed from vxWorks, which supports both teh target and host based shell with the same functionality. For the host shell vxWorks checksums the code to ensure the symbol table matches; in my case it was a manual (and error prone) process, which is why I implemented the embedded symbol table.
Although initially I only implemented memory read/write and function call capability I later added an expression evaluator based on the algorithm (but not the code) described here. Then after that I added simple scripting capabilities in the form of if-else, while, and procedure call constructs (using a very simple non-C syntax). So if you wanted new functionality or test, you could either write a new function, or create a script (if performance was not an issue), so the functions were rather like 'built-ins' to the scripting language.
To perform the arbitrary function calls, I used a function pointer typedef that took an arbitrarily large (24) number of arguments, then using the symbol table, you find the function address, cast it to the function pointer type, and pass it the real arguments, plus enough dummy arguments to make up the expected number and thus create a suitable (if wasteful) maintain stack frame.
On other systems I have implemented a Forth threaded interpreter, which is a very simple language to implement, but has a less than user friendly syntax perhaps. You could equally embed an existing solution such as Lua or Ch.
For a small lightweight thing you could use forth. Its easy to get going ( forth kernels are SMALL)
look at figForth, LINa and GnuForth.
Disclaimer: I don't Forth, but openboot and the PCI bus do, and I;ve used them and they work really well.
Alternative UI's
Deploy a web sever on your embedded device instead. Even serial will work with SLIP and the UI can be reasonably complex ( or even serve up a JAR and get really really complex.
If you really need a CLI, then you can point at a link and get a telnet.
One alternative is to use a very simple binary protocol to transfer the data you need, and then make a user interface on the PC, using e.g. Python or whatever is your favourite development tool.
The advantage is that it minimises the code in the embedded device, and shifts as much of it as possible to the PC side. That's good because:
It uses up less embedded code space—much of the code is on the PC instead.
In many cases it's easier to develop a given functionality on the PC, with the PC's greater tools and resources.
It gives you more interface options. You can use just a command line interface if you want. Or, you could go for a GUI, with graphs, data logging, whatever fancy stuff you might want.
It gives you flexibility. Embedded code is harder to upgrade than PC code. You can change and improve your PC-based tool whenever you want, without having to make any changes to the embedded device.
If you want to look at variables—If your PC tool is able to read the ELF file generated by the linker, then it can find out a variable's location from the symbol table. Even better, read the DWARF debug data and know the variable's type as well. Then all you need is a "read-memory" protocol message on the embedded device to get the data, and the PC does the decoding and displaying.

How to debug a program without a debugger? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
Interview question-
Often its pretty easier to debug a program once you have trouble with your code.You can put watches,breakpoints and etc.Life is much easier because of debugger.
But how to debug a program without a debugger?
One possible approach which I know is simply putting print statements in your code wherever you want to check for the problems.
Are there any other approaches other than this?
As its a general question, its not restricted to any specific language.So please share your thoughts on how you would have done it?
EDIT- While submitting your answer, please mention a useful resource (if you have any) about any concept. e.g. Logging
This will be lot helpful for those who don't know about it at all.(This includes me, in some cases :)
UPDATE: Michal Sznajderhas put a real "best" answer and also made it a community wiki.Really deserves lots of up votes.
Actually you have quite a lot of possibilities. Either with recompilation of source code or without.
With recompilation.
Additional logging. Either into program's logs or using system logging (eg. OutputDebugString or Events Log on Windows). Also use following steps:
Always include timestamp at least up to seconds resolution.
Consider adding thread-id in case of multithreaded apps.
Add some nice output of your structures
Do not print out enums with just %d. Use some ToString() or create some EnumToString() function (whatever suits your language)
... and beware: logging changes timings so in case of heavily multithreading you problems might disappear.
More details on this here.
Introduce more asserts
Unit tests
"Audio-visual" monitoring: if something happens do one of
use buzzer
play system sound
flash some LED by enabling hardware GPIO line (only in embedded scenarios)
Without recompilation
If your application uses network of any kind: Packet Sniffer or I will just choose for you: Wireshark
If you use database: monitor queries send to database and database itself.
Use virtual machines to test exactly the same OS/hardware setup as your system is running on.
Use some kind of system calls monitor. This includes
On Unix box strace or dtrace
On Windows tools from former Sysinternals tools like http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx, ProcessExplorer and alike
In case of Windows GUI stuff: check out Spy++ or for WPF Snoop (although second I didn't use)
Consider using some profiling tools for your platform. It will give you overview on thing happening in your app.
[Real hardcore] Hardware monitoring: use oscilloscope (aka O-Scope) to monitor signals on hardware lines
Source code debugging: you sit down with your source code and just pretend with piece of paper and pencil that you are computer. Its so called code analysis or "on-my-eyes" debugging
Source control debugging. Compare diffs of your code from time when "it" works and now. Bug might be somewhere there.
And some general tips in the end:
Do not forget about Text to Columns and Pivot Table in Excel. Together with some text tools (awk, grep or perl) give you incredible analysis pack. If you have more than 32K records consider using Access as data source.
Basics of Data Warehousing might help. With simple cube you may analyse tons of temporal data in just few minutes.
Dumping your application is worth mentioning. Either as a result of crash or just on regular basis
Always generate you debug symbols (even for release builds).
Almost last but not least: most mayor platforms has some sort of command line debugger always built in (even Windows!). With some tricks like conditional debugging and break-print-continue you can get pretty good result with obscure bugs
And really last but not least: use your brain and question everything.
In general debugging is like science: you do not create it you discover it. Quite often its like looking for a murderer in a criminal case. So buy yourself a hat and never give up.
First of all, what does debugging actually do? Advanced debuggers give you machine hooks to suspend execution, examine variables and potentially modify state of a running program. Most programs don't need all that to debug them. There are many approaches:
Tracing: implement some kind of logging mechanism, or use an existing one such as dtrace(). It usually worth it to implement some kind of printf-like function that can output generally formatted output into a system log. Then just throw state from key points in your program to this log. Believe it or not, in complex programs, this can be more useful than raw debugging with a real debugger. Logs help you know how you got into trouble, while a debugger that traps on a crash assumes you can reverse engineer how you got there from whatever state you are already in. For applications that you use other complex libraries that you don't own that crash in the middle of them, logs are often far more useful. But it requires a certain amount of discipline in writing your log messages.
Program/Library self-awareness: To solve very specific crash events, I often have implemented wrappers on system libraries such as malloc/free/realloc which extensions that can do things like walk memory, detect double frees, attempts to free non-allocated pointers, check for obvious buffer over-runs etc. Often you can do this sort of thing for your important internal data types as well -- typically you can make self-integrity checks for things like linked lists (they can't loop, and they can't point into la-la land.) Even for things like OS synchronization objects, often you only need to know which thread, or what file and line number (capturable by __FILE__, __LINE__) the last user of the synch object was to help you work out a race condition.
If you are insane like me, you could, in fact, implement your own mini-debugger inside of your own program. This is really only an option in a self-reflective programming language, or in languages like C with certain OS-hooks. When compiling C/C++ in Windows/DOS you can implement a "crash-hook" callback which is executed when any program fault is triggered. When you compile your program you can build a .map file to figure out what the relative addresses of all your public functions (so you can work out the loader initial offset by subtracting the address of main() from the address given in your .map file). So when a crash happens (even pressing ^C during a run, for example, so you can find your infinite loops) you can take the stack pointer and scan it for offsets within return addresses. You can usually look at your registers, and implement a simple console to let you examine all this. And voila, you have half of a real debugger implemented. Keep this going and you can reproduce the VxWorks' console debugging mechanism.
Another approach, is logical deduction. This is related to #1. Basically any crash or anomalous behavior in a program occurs when it stops behaving as expected. You need to have some feed back method of knowing when the program is behaving normally then abnormally. Your goal then is to find the exact conditions upon which your program goes from behaving correctly to incorrectly. With printf()/logs, or other feedback (such as enabling a device in an embedded system -- the PC has a speaker, but some motherboards also have a digital display for BIOS stage reporting; embedded systems will often have a COM port that you can use) you can deduce at least binary states of good and bad behavior with respect to the run state of your program through the instrumentation of your program.
A related method is logical deduction with respect to code versions. Often a program was working perfectly at one state, but some later version is not longer working. If you use good source control, and you enforce a "top of tree must always be working" philosophy amongst your programming team, then you can use a binary search to find the exact version of the code at which the failure occurs. You can use diffs then to deduce what code change exposes the error. If the diff is too large, then you have the task of trying to redo that code change in smaller steps where you can apply binary searching more effectively.
Just a couple suggestions:
1) Asserts. This should help you work out general expectations at different states of the program. As well familiarize yourself with the code
2) Unit tests. I have used these at times to dig into new code and test out APIs
One word: Logging.
Your program should write descriptive debug lines which include a timestamp to a log file based on a configurable debug level. Reading the resultant log files gives you information on what happened during the execution of the program. There are logging packages in every common programming language that make this a snap:
Java: log4j
.Net: NLog or log4net
Python: Python Logging
PHP: Pear Logging Framework
Ruby: Ruby Logger
C: log4c
I guess you just have to write fine-grain unit tests.
I also like to write a pretty-printer for my data structures.
I think the rest of the interview might go something like this...
Candidate: So you don't buy debuggers for your developers?
Interviewer: No, they have debuggers.
Candidate: So you are looking for programmers who, out of masochism or chest thumping hamartia, make things complicated on themselves even if they would be less productive?
Interviewer: No, I'm just trying to see if you know what you would do in a situation that will never happen.
Candidate: I suppose I'd add logging or print statements. Can I ask you a similar question?
Interviewer: Sure.
Candidate: How would you recruit a team of developers if you didn't have any appreciable interviewing skill to distinguish good prospects based on relevant information?
Peer review. You have been looking at the code for 8 hours and your brain is just showing you what you want to see in the code. A fresh pair of eyes can make all the difference.
Version control. Especially for large teams. If somebody changed something you rely on but did not tell you it is easy to find a specific change set that caused your trouble by rolling the changes back one by one.
On *nix systems, strace and/or dtrace can tell you an awful lot about the execution of your program and the libraries it uses.
Binary search in time is also a method: If you have your source code stored in a version-control repository, and you know that version 100 worked, but version 200 doesn't, try to see if version 150 works. If it does, the error must be between version 150 and 200, so find version 175 and see if it works... etc.
use println/log in code
use DB explorer to look at data in DB/files
write tests and put asserts in suspicious places
More generally, you can monitor side effects and output of the program, and trigger certain events in the program externally.
A Print statement isn't always appropriate. You might use other forms of output such as writing to the Event Log or a log file, writing to a TCP socket (I have a nice utility that can listen for that type of trace from my program), etc.
For programs that don't have a UI, you can trigger behavior you want to debug by using an external flag such as the existence of a file. You might have the program wait for the file to be created, then run through a behavior you're interested in while logging relevant events.
Another file's existence might trigger the program's internal state to be written to your logging mechanism.
like everyone else said:
Logging
Asserts
Extra Output
&
your favorite task manager or process
explorer
links here and here
Another thing I have not seen mentioned here that I have had to use quite a bit on embedded systems is serial terminals.
You can cannot a serial terminal to just about any type of device on the planet (I have even done it to embedded CPUs for hydraulics, generators, etc). Then you can write out to the serial port and see everything on the terminal.
You can get real fancy and even setup a thread that listens to the serial terminal and responds to commands. I have done this as well and implemented simple commands to dump a list, see internal variables, etc all from a simple 9600 baud RS-232 serial port!
Spy++ (and more recently Snoop for WPF) are tremendous for getting an insight into Windows UI bugs.
A nice read would be Delta Debugging from Andreas Zeller. It's like binary search for debugging

How Does AQTime Do It?

I've been testing out the performance and memory profiler AQTime to see if it's worthwhile spending those big $$$ for it for my Delphi application.
What amazes me is how it can give you source line level performance tracing (which includes the number of times each line was executed and the amount of time that line took) without modifying the application's source code and without adding an inordinate amount of time to the debug run.
The way that they do this so efficiently makes me think there might be some techniques/technologies used here that I don't know about that would be useful to know about.
Do you know what kind of methods they use to capture the execution line-by-line without code changes?
Are there other profiling tools that also do non-invasive line-by-line checking and if so, do they use the same techniques?
I've made an open source profiler for Delphi which does the same:
http://code.google.com/p/asmprofiler/
It's not perfect, but it's free :-). Is also uses the Detour technique.
It stores every call (you must manual set which functions you want to profile),
so it can make an exact call history tree, including a time chart (!).
This is just speculation, but perhaps AQtime is based on a technology that is similar to Microsoft Detours?
Detours is a library for instrumenting
arbitrary Win32 functions on x86, x64,
and IA64 machines. Detours intercepts
Win32 functions by re-writing the
in-memory code for target functions.
I don't know about Delphi in particular, but a C application debugger can do line-by-line profiling relatively easily - it can load the code and associate every code path with a block of code. Then it can break on all the conditional jump instructions and just watch and see what code path is taken. Debuggers like gdb can operate relatively efficiently because they work through the kernel and don't modify the code, they just get informed when each line is executed. If something causes the block to be exited early (longjmp), the debugger can hook that and figure out how far it got into the blocks when it happened and increment only those lines.
Of course, it would still be tough to code, but when I say easily I mean that you could do it without wasting time breaking on each and every instruction to update a counter.
The long-since-defunct TurboPower also had a great profiling/analysis tool for Delphi called Sleuth QA Suite. I found it a lot simpler than AQTime, but also far easier to get meaningful result. Might be worth trying to track down - eBay, maybe?

Resources