Is it possible to capture the bitstream of post-interpreted code (pre-execution)? e.g. to speed up calls I make often - bash

I've wondered this many times and in many contexts, and I like to learn, so general or close-but-not-exact answers are acceptable to me.
I'll get specific, to help explain the question. Please remember that this question is more about accelerating common interpreted-language calls (yes, with exactly the same arguments) than it is about the specific programs I'm calling in this case.
Here we go:
Using i3WM I use i3lock-fancy to lock my workspace with a key-combo mapped to the command:
i3lock-fancy -p -f /usr/share/fonts/fantasque_mono.ttf
So here is why I think this is possible, though my google-fu has failed me:
i3lock-fancy is a bash script, and bash is an interpreted language
each time I run the command I call it with the same arguments
Theoretically the interpreter is spitting out the same bitstream to be executed, right?
Please don't complain about portability; I understand that the captured bitstream would not be portable.
For visual people:
When I call the above command > bash interpreter converts bash-code to byte-code > CPU executes byte-code
I want to:
execute command > bash interpreter converts to byte-code > save to file
so that I can effectively skip interpretation (since it's EXACTLY the same every time):
call file > CPU executes byte-code
What I tried:
Looking around on SO before asking the question led me to shc, which is similar in some ways to what I'm asking for.
But this is not what shc is for (thanks #stefan)
Is there a way to do this which is more like what I've described?
Simply put, is there a way to interpret bash, and save the result without actually running it?
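(For reference, the closest built-in thing I've found is bash's -n flag, which parses a script without executing it, though as far as I can tell the parse result is thrown away rather than saved; the path below is just where the script lives on my machine:)
bash -n /usr/bin/i3lock-fancy   # parse/syntax-check only; nothing is executed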

Related

What is the best way to determine all commands that would be executed by a shell script?

I'd like to be able to parse a shell script and return the set of commands (excluding shell keywords like if and for) that it could possibly execute, including all commands fed by pipes and all commands included in backticks.
What would be an appropriate way to approach this? I'm thinking regular expressions, but they may not be sufficient, and I don't know enough parsing theory to figure this out properly.
Edit: Thanks for the responses so far. I see how external input would be a problem in determining this. If we exclude that from the analysis, would it then be a reasonable task?
You can't, or put differently: every command could potentially be executed.
Here's an example script.
$(cat somerandomfile)
What will it do? The answer depends on what is inside somerandomfile. To determine the set of potentially executed commands, you'd have to evaluate the whole environment (which changes with basically every clock tick).
Let's assume you have a program X that decides whether shell script Y executes command Z. Then we can construct a shell script Y' that runs an arbitrary program and, after it finishes, calls command Z. The hypothetical program X that you ask for would thus solve the Halting Problem, which is undecidable.
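A minimal sketch of the construction, with placeholder names:
#!/bin/bash
# Y': run an arbitrary program, then invoke command Z.
./arbitrary_program   # may or may not ever terminate
Z                     # reached only if the line above halts
Deciding whether Y' ever runs Z is exactly deciding whether ./arbitrary_program halts.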

Debugging a program without source code (Unix / GDB)

This is homework. Tips only, no exact answers please.
I have a compiled program (no source code) that takes in command line arguments. There is a correct sequence of a given number of command line arguments that will make the program print out "Success." Given the wrong arguments it will print out "Failure."
One thing that is confusing me is that the instructions mention two system tools (without naming them) which will help in figuring out the correct arguments. The only tool I'm familiar with (unless I'm overlooking something) is GDB, so I believe I am missing a critical component of this challenge.
The challenge is to figure out the correct arguments. So far I've run the program in GDB and set a breakpoint at main but I really don't know where to go from there. Any pro tips?
Are you sure you have to debug it? It would be easier to disassemble it. When you disassemble it, look for cmp instructions.
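For instance, something along these lines (the binary name is illustrative):
objdump -d ./program | grep -B2 cmp   # disassemble and show compare instructions with context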
There exist not only tools to disassemble x86 binaries into assembler listings, but also some which attempt to produce a more high-level, readable listing. Try googling and see what you find. I'd be specific, but then, that would be counterproductive if your job is to learn some reverse-engineering skills.
It is possible that the code is something like this: if arg(1) = "FOO" then print "Success". So you might not need to disassemble at all. Instead you might only need a tool which dumps out all the strings in the executable that look like sequences of ASCII characters. Provided the sequence you are supposed to input consists of characters easily typed at the keyboard, there exist many utilities that will do this. If the program has been very carefully constructed, the author won't have left "FOO" (if that was the "password") in plain sight, but will have tried to obscure it somewhat.
Personally, I would start with an ltrace of the program with an arbitrary set of arguments. I'd then use the strings command and guess from its output what some of the hidden argument literals might be. (Let's assume, for the moment, that the professor hasn't encrypted or obfuscated the strings and that they appear in the binary as literals.) Then try again with one or two arguments (or the requisite number, if known).
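Concretely, that first pass might look like this; the binary name and the guessed arguments are made up:
strings ./program | less                    # scan for candidate argument literals
ltrace ./program foo bar 2>&1 | grep cmp    # library calls such as strcmp() show up here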
If you're lucky, the program was compiled and provided to you without being run through strip. In that case you might have the symbol table to help. Then you could try single-stepping through the program (read the gdb manuals). It might be tedious, but there are ways to set a breakpoint and tell the debugger to run through some function call (such as any from the standard libraries) and stop upon return. Do this repeatedly: identify where it's calling into standard or external libraries, set a breakpoint for the next instruction after the return, let gdb run the process through the call, and then inspect what the code is doing in between.
Coupled with the ltrace, it should be fairly easy to see the sequencing of the strcmp() (or similar) calls. As you see the string against which your input is being compared, you can break out of the whole process and re-invoke gdb and the program with that one argument, trace through 'til the next one, and so on. Or you might learn some more advanced gdb tricks and actually modify your argument vector and restart main() from scratch.
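A rough gdb session in that spirit, assuming a dynamically linked binary on x86-64 Linux (names are placeholders):
gdb ./program
(gdb) break strcmp     # stop at every string comparison
(gdb) run guess1 guess2
(gdb) x/s $rdi         # first strcmp argument, per the x86-64 calling convention
(gdb) x/s $rsi         # second argument: likely the string yours is compared against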
It actually sounds like fun, and I might have my wife whip up a simple binary for me to try this on. I might also create a little program to generate binaries of this sort. I'm thinking of a little #include in the sources which provides the "passphrase" of arguments, and a makefile that selects three to five words from /usr/dict/words, generates that #include file from a template, then compiles the binary using that sequence.
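A rough sketch of that generator, with invented file names:
WORDS=$(shuf -n 3 /usr/dict/words | tr '\n' ' ')    # pick a random passphrase
printf '#define PASSPHRASE "%s"\n' "$WORDS" > passphrase.h
cc -o challenge challenge.c                          # challenge.c compares argv against PASSPHRASE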

BASH shell process control - any other examples of controlling/scheduling work

I've inherited a medium-sized project in which the main (batch) program is fed work through a large set of shell scripts that do a lot of process control (waiting for processes to complete, sleeping, checking for conditions, etc.) [and reprocessing through Perl scripts].
Are there other examples of process control by shell scripts? I would like to see what other people have done as a comparison (as I'm not really fond of the 6,668-line shell script).
The conclusion may be that the current program works and doesn't need to be messed with, or that for maintenance reasons it's too cumbersome and doing it another way will be easier to maintain - but I need other examples.
To reduce the "generality" of the question here's an example of what I'm looking for: procsup
The Inquisitor project relies extensively on process control from shell scripts. You might want to look at its directory with the main function set, or the directory with the tests (i.e. slave processes) that it runs.
This is quite a general question, and therefore giving specific answers may be a little difficult. (And you won't be happy with a 5000-line example.) Most probably the architecture of your application is faulty and requires a rather complete rework.
As you probably already know, process control with bash is pretty simple:
./test_script.sh &             # start the script in the background
test_script_pid=$!             # $! holds the PID of the last background job
wait $test_script_pid          # waits until it's done
./test_script2.sh              # runs in the foreground
echo $?                        # prints the return code of the previous command
You can do the same things with, for example, Python's subprocess module (or with Perl, obviously). If you have a complex architecture with a large number of different programs, then the process is obviously non-trivial.
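For instance, fanning out several workers and collecting their exit statuses only takes a few more lines (the worker scripts are placeholders):
pids=()
for t in ./worker1.sh ./worker2.sh ./worker3.sh; do
    "$t" &                    # start each worker in the background
    pids+=($!)
done
status=0
for pid in "${pids[@]}"; do
    wait "$pid" || status=1   # remember if any worker failed
done
echo "overall status: $status"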
That is an awfully big shell script. Have you considered refactoring it?
From the sound of it, there may be a lot of instances where you could replace several lines of code with a call to a shell function. If you can simplify the code in this way, then it will be easier to see where there are errors in the logic.
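For example, a repeated run-and-check idiom might collapse into a single function; the names here are invented:
run_and_check() {
    "$@"                                    # run the command given as arguments
    local rc=$?
    [ $rc -ne 0 ] && echo "FAILED: $*" >&2
    return $rc
}
run_and_check ./step1.sh
run_and_check ./step2.sh --reprocess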
I've used this tactic successfully with a humongous Perl script, and it turned out to have some serious logic errors and to be a security risk, because it had embedded passwords that were obfuscated in an easily reversible way. The exposed passwords could have been used by persons unknown (well, a disgruntled employee) to shut down an entire global network.
Some managers were leaning towards making a security exception because this script was so important, but when the logic error was explained and it was clear that the script was providing incorrect data, it was decided that no data was better than dirty data. The guy who wrote that script had taught himself programming with a Perl book and the writing of that script.

How to bundle bash completion with a program and have it work in the current shell?

I sweated over the question above. The answer I'm going to supply took me a while to piece together, but it still seems hopelessly primitive and hacky compared to what one could do were completion to be redesigned to be less staticky. I'm almost afraid to ask if there's some good reason that completion logic seems to be completely divorced from the program it's completing for.
I wrote a command line library (can be seen in scala trunk) which lets you flip a switch to have a "--bash" option. If you run
./program --bash
It calculates the completion file, writes it out to a tempfile, and echoes
. /path/to/temp/file
to the console. The result is that you can use backticks like so:
`./program --bash`
and you will have completion for "program" in the current shell since it will source the tempfile.
For a concrete example: check out scala trunk and run test/partest.
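What the tempfile contains is ordinary bash completion code; a minimal stand-in (the option list is invented, not partest's real logic) looks like:
_program() {
    COMPREPLY=( $(compgen -W "--bash --help --verbose" -- "${COMP_WORDS[COMP_CWORD]}") )
}
complete -F _program program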

Pitfalls of using shell scripts to wrap a program?

Consider: I have a program that needs an environment set up. It is in Perl, and I want to modify the environment (to search for libraries in a special spot).
Every time I mess with the standard way to do things in UNIX I pay a heavy price, and I pay a penalty in flexibility.
I know that by using a simple shell script I will inject an additional process into the process tree. Any process accessing its own process tree might be thrown for a little bit of a loop.
Anything recursive in a nontrivial way would need to defend against multiple expansions of the environment (see the sketch below for the kind of guard I mean).
Anything resembling being in a pipe of programs (or closing and opening STDIN, STDOUT, or STDERR) is my biggest area of concern.
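(By a defense I mean something like this sentinel-variable guard; the variable name is made up:)
if [ -z "$MYWRAPPER_ENV_DONE" ]; then
    export MYWRAPPER_ENV_DONE=1
    export PATH=/my/special/path:$PATH   # prepended only on the first pass
fi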
What am I doing to myself?
What am I doing to myself?
Getting yourself all het up over nothing?
Wrapping a program in a shell script in order to set up the environment is actually quite standard and the risk is pretty minimal unless you're trying to do something really weird.
If you're really concerned about having one more process around (and UNIX processes are very cheap, by design), then use the exec builtin, which, instead of forking a new process, simply execs a new executable in place of the current one. So, where you might have had
#!/bin/bash -
FOO=hello
PATH=/my/special/path:${PATH}
perl myprog.pl
You'd just say
#!/bin/bash -
FOO=hello
PATH=/my/special/path:${PATH}
exec perl myprog.pl
and the spare process goes away.
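You can see that exec reuses the same process with a quick check like this sketch:
echo "wrapper pid: $$"
exec perl -e 'print "perl pid: $$\n"'   # prints the same pid as the wrapper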
This trick, however, is almost never worth the bother; the one counter-example is that if you can't change your default shell, it's useful to say
$ exec zsh
in place of just running the shell, because then you get the expected behavior for process control and so forth.
