This question already has answers here:
How do Linux binary installers (.bin, .sh) work?
(5 answers)
Closed 1 year ago.
I noticed that the 64-Bit Command Line Anaconda Installer for macOS is a large 400+ MB Bash/Bourne shell script.
When I tried to read it, I noticed that its first 555 lines are readable text, but the following part of the script is in the binary format, probably encrypted.
See https://www.anaconda.com/products/individual and https://repo.anaconda.com/archive/Anaconda3-2021.05-MacOSX-x86_64.sh.
I noticed similar scripts, such as Tcl scripts associated with electronic design automation software.
How do we transform source code files, such as scripts (shell scripts, or Tcl/Perl/Python/Ruby scripts, or C++/Java/Scala/Haskell/Lisp source code), into partially readable text and binary otherwise?
Can we just merge two parts, one in ASCII/text format, and the other in binary format?
That said, how do we obtain the binary executable for scripts, such as shell scripts or Tcl/Perl/Python/Ruby scripts?
I know how to obtain binary executables for C++ and C, and FORTRAN.
Other than using a platform-specific (in terms of operating system and hardware configuration, such as processor type or instruction set architecture) compiler to compile scripts into binary executables, and concatenating the text files with the binary files, how else can I do it?
Are there software applications that do this? What techniques, in terms of algorithms, do these software applications use?
Thank you so much, and have a good day!
To answer one of your questions, here is a helpful guide to embedding a binary file into a shell/bash script:
https://www.xmodulo.com/embed-binary-file-bash-script.html
(code example below is taken from this link)
The body of the shell script needs to be commands to isolate & execute the binary data contained within.
The trick is to place an "exit" command at the end of the written script followed by a unique delimiter line (which is "__PAYLOAD_BEGINS__" in the below example):
#!/bin/bash
# line number where payload starts
PAYLOAD_LINE=$(awk '/^__PAYLOAD_BEGINS__/ { print NR + 1; exit 0; }' "$0")
# directory where a binary executable is to be saved
WORK_DIR=/tmp
# name of an embedded binary executable
EXE_NAME=dummy_executable
# extract the embedded binary executable
tail -n +${PAYLOAD_LINE} "$0" | base64 -d > "${WORK_DIR}/${EXE_NAME}"
chmod +x "${WORK_DIR}/${EXE_NAME}"
# run the executable as needed
"${WORK_DIR}/${EXE_NAME}"
exit 0
__PAYLOAD_BEGINS__
Then you can append the script with base64-encoded binary data:
$ base64 dummy_executable >> script.sh
You could also append the binary data without base64-encoding, but this is not recommended, since editing and saving the script in a text editor afterwards is likely to corrupt the binary part.
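To see the whole round trip in one place, here is a minimal sketch that builds a stub, appends a base64 payload, and runs the result. The file names (payload.sh, stub.sh, installer.sh) are made up for the demo, the payload is a tiny shell script standing in for a real binary, and `base64 -d` assumes a GNU-style base64.

```shell
#!/bin/sh
# Sketch: build and run a self-extracting script in a scratch directory.
# payload.sh, stub.sh and installer.sh are made-up names for this demo.
demo_dir=$(mktemp -d)

# 1. A stand-in for the binary: any file works, here a tiny script.
printf '#!/bin/sh\necho "payload ran"\n' > "$demo_dir/payload.sh"

# 2. The stub finds the marker line in itself and decodes everything after it.
cat > "$demo_dir/stub.sh" <<'EOF'
#!/bin/sh
start=$(awk '/^__PAYLOAD_BEGINS__$/ { print NR + 1; exit }' "$0")
out=$(mktemp)
tail -n +"$start" "$0" | base64 -d > "$out"
# The demo payload is a shell script, so sh runs it; a real binary
# would get chmod +x and be invoked directly.
sh "$out"
status=$?
rm -f "$out"
exit "$status"
__PAYLOAD_BEGINS__
EOF

# 3. Stub plus base64-encoded payload gives the installer.
cat "$demo_dir/stub.sh" > "$demo_dir/installer.sh"
base64 < "$demo_dir/payload.sh" >> "$demo_dir/installer.sh"

result=$(sh "$demo_dir/installer.sh")
echo "$result"
```

The marker pattern is anchored at both ends so that the awk line inside the stub, which also mentions the marker, is not mistaken for the start of the payload.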
Shell Scripts with Payload
In Anaconda3....sh there is nothing encrypted. There are multiple binary files appended to the end of the script. Creating such a file yourself is trivial. Open a terminal and run
cat script.sh file1 file2 ... > script-with-payload.sh
The only tricky part is to write a script.sh that can handle the payload.
For starters, write exit at the end of your script.sh, so that the shell does not try to interpret the binary part as shell commands when executing script-with-payload.sh.
Then, somewhere inside script.sh use something like tail, sed, or dd to extract the binary data at the end of the script.
For a concrete example see Combine a shell script and a zip file into a single executable for deployment or Self-extracting script in sh shell or How do Linux binary installers (.bin, .sh) work?.
In Anaconda3....sh they use dd commands to extract a Mach-O 64-bit x86_64 executable (14'807'207 bytes) and a tar.bz2 file (438'910'836 bytes). Comments in the script point out that the shell script was generated by shar.py.
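Extraction by byte offset can be sketched as follows. This is not the Anaconda script itself: the file names and the payload are invented, and the offset is measured here with wc, whereas a generated installer would presumably have the numbers written into it.

```shell
#!/bin/sh
# Sketch: append raw bytes to a script and pull them back out by byte offset.
# File names and payload contents are invented for illustration.
work=$(mktemp -d)
printf '#!/bin/sh\necho "stub only"\nexit 0\n' > "$work/stub.sh"
printf 'raw\000bytes\n' > "$work/payload.bin"   # includes a NUL byte

cat "$work/stub.sh" "$work/payload.bin" > "$work/combined.sh"

# The payload starts right after the stub, so the stub's size is the offset.
offset=$(wc -c < "$work/stub.sh" | tr -d ' ')
size=$(wc -c < "$work/payload.bin" | tr -d ' ')

# bs=1 keeps the arithmetic simple but is slow for large payloads.
dd if="$work/combined.sh" of="$work/extracted.bin" bs=1 \
   skip="$offset" count="$size" 2>/dev/null

if cmp -s "$work/payload.bin" "$work/extracted.bin"; then
    roundtrip=ok
else
    roundtrip=mismatch
fi
echo "$roundtrip"
```

Because the stub ends with exit 0, running combined.sh never reaches the raw bytes.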
Remaining Questions
How do we transform source code files, such as [...] C++/Java/Scala/Haskell/Lisp [...] into partially readable text and binary otherwise?
C++, Java, and so on have to be compiled to be run, so distributing the uncompiled text file with an embedded payload doesn't really make sense.
how do we obtain the binary executable for scripts, such as shell scripts or Tcl/Perl/Python/Ruby scripts?
This is an entirely different question and has to be answered for each scripting language independently. The general answer is, you don't. Scripting languages are not meant to be compiled.
Are there software applications that do this?
Yes. Searching for "bash payload" or "bash self-extracting" turns up quite a few tools. However, most of them seem rather hacked together. The most established ones are GNU sharutils and makeself.
What techniques, in terms of algorithms, do these software applications use?
The principle is always the same: concatenate a script and some payload, then let the script extract the payload from itself. There is no "algorithm" involved.
This question already has answers here:
How to obtain the first letter in a Bash variable?
(7 answers)
Closed 3 years ago.
I am trying to make a custom terminal command. I just learned I am supposed to do it using the Unix script? I don't really know much of what that is and am still trying to figure it out. What I do know is that $1 is an arg. Is it possible to make it a variable and then get the first letter like you could in Python?
EX:
s = 'happy'
s[0]  # 'h'
You're asking a few different things here.
I am trying to make a custom terminal command.
That could mean a few different things, but the most obvious meaning is that you want to add an executable to your path so that when you type it at the terminal, it runs just like any other executable on your system. This requires just a few things:
the executable permission must be set.
the file must specify how it can be executed. For interpreted programs such as bash scripts or python scripts, you can do so by beginning the file with a "shebang line" that specifies the interpreter for the file.
the file must be in one of the locations specified by your $PATH.
I just learned I am supposed to do it using the Unix script?
There's no such thing as a "Unix script", but what you seem to be referring to is a "shell script". Though these are commonly associated with Unix, they're no more inherently a Unix script than a script in any other language. A shell, such as bash, sh, or any other, is just an interpreted language designed to be convenient both for interactive use by a human and for programmatic execution from a saved file.
I don't really know much of what that is and am still trying to figure it out.
Let's get into some specifics.
First I edit a file called 'hello-world' to contain:
#!/bin/bash
echo "Hello, world!"
Note that this filename has no "extension". Though heuristics based on file extension are sometimes used (especially on Windows) to determine a file type, Unix typically sees a file "extension" as part of the arbitrary file name. The thing that makes this a potentially executable bash script is the specification of that interpreter on the shebang line.
We can run our script right now from bash, just as we could if we wrote a python script.
$ bash hello-world
Hello, world!
To make the bash implicit, we mark the file as executable. This lets the operating system consult the leading "magic bytes" of the file to determine how to run it. These bytes might signify an ELF file (a compiled executable, written in e.g. C, C++, or Go). Or they might be #!, which means "read the rest of this first line to determine the command to run, and hand this file to that command to be interpreted".
$ chmod +x hello-world
ls -l will show us the "permissions" on the file (more accurately called the "file mode", hence chmod rather than chperm). The x stands for executable, so we have enabled the use of the leading bytes to determine the method of execution. Remember, the first two bytes of this file, and the rest of that first line, then specify that this file should be "run through bash", so to speak.
$ ls -l hello-world
-rwxr-xr-x 1 danfarrell staff 33 Dec 27 20:02 hello-world
Now we can run the file from the current directory:
$ ./hello-world
Hello, world!
At this point, the only difference between this command and any other on the system, is that you have to specify its location. That's because my current directory is not in the system path. In short, the path (accessible in a unix shell via the $PATH variable) specifies an ordered list of locations that should be searched for a specified command whose location is not otherwise specified.
For example, there's a very common program called whoami. I can run it directly from my terminal without specifying a location of the executable:
$ whoami
danfarrell
This is because there's a location in my $PATH in which the shell was able to find that command. Let's take a closer look. First, here's my path:
$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/go/bin
And there's also a convenient program called whereis which can help show which path elements supply a named executable:
$ whereis whoami
/usr/bin/whoami
Sure enough, whoami is in one of the elements of the $PATH. (Actually I shared a simplified $PATH. Yours might be somewhat longer).
Finally, then, we can get to the last thing. If I put hello-world in one of the $PATH elements, I will be able to invoke it without a path. There are two ways to do this: we can move the executable to a location specified in the path, or we can add a new location to the path. For simplicity's sake I'll choose the first of these.
$ sudo cp hello-world /usr/local/bin/
Password:
I needed to use sudo to write to /usr/local/bin because it's not accessible as my user directly - that's quite standard.
Finally, I've achieved the goal of being able to run my very important program from any location, without specifying the executable's location.
$ hello-world
Hello, world!
$ which hello-world
/usr/local/bin/hello-world
It works! I've created what might be described as a "custom terminal command".
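The other option mentioned above, adding a new location to the path, might look like the sketch below. The directory is a temporary stand-in (a personal ~/bin is a common choice in practice), and it assumes files in the temporary directory are allowed to execute.

```shell
#!/bin/sh
# Sketch of the second option: add a directory of your own to PATH.
# my_bin is a temporary stand-in; ~/bin is a common real-world choice.
my_bin=$(mktemp -d)
printf '#!/bin/sh\necho "Hello, world!"\n' > "$my_bin/hello-world"
chmod +x "$my_bin/hello-world"

# Prepend so this directory is searched first.
# The change only affects the current shell and its children.
PATH="$my_bin:$PATH"
export PATH

found=$(command -v hello-world)
greeting=$(hello-world)
echo "$found"
echo "$greeting"
```

To make the change permanent, the PATH assignment would go into a shell startup file such as ~/.profile or ~/.bashrc.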
What I do know is that $1 is an arg. Is it possible to make it a variable and then get the first letter like you could in Python?
Well, one option would be to simply write the custom terminal command in python. If python is available,
$ which python
/usr/bin/python
You can specify it in a shebang just like a shell can be:
#!/usr/bin/env python
print("hello, world!"[0])
$ hello-world
h
it works!
Okay, confession time. I actually used #!/usr/bin/env python, not /usr/bin/python. env helps find the correct python to use in the user's environment, rather than hard coding one particular python. If you've been using python during the very long running python 2 to python 3 migration, you can no doubt understand why I'm reluctant to hard code a python executable in my program.
It's certainly possible to get the first letter of a string in a bash script. But it's also very possible to write a custom command in a program other than shell. Python is an excellent choice for string manipulation, if you know it. I often use python for shell one-liners that need to interact with json, a format that doesn't lend itself well to standard unix tool stream editing.
Anyway, at the expense of incurring SO community's ire by reanswering an "already answered" question, I'll include a version in shell (Credit goes to David C Rankin)
#!/bin/bash
echo "${1:0:1}"
$ hello-world hiworld
h
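The ${1:0:1} form is a bash extension. If the script has to run under a plain POSIX sh, a sketch along these lines should give the same result; the variable names are only for illustration.

```shell
#!/bin/sh
# Sketch: first character of a string without bash-specific syntax.
word=hiworld

# ${word#?} is the string minus its first character; stripping that
# remainder off the end as a suffix leaves only the first character.
first=${word%"${word#?}"}

# printf can truncate too: %.1s prints at most one byte of its argument,
# which is one character for ASCII input.
also_first=$(printf '%.1s' "$word")

echo "$first $also_first"
```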
Is there a way to see the original code of an executable sh script? (I am very new to Linux and trying to understand what things do and such.)
If you know how, I need a very clear step-by-step process so I can just type in the commands and run them.
Thanks for your help. Trying to learn (Windows man for 25 years here)
A shell script specifically can be seen in the original text form by simply printing the contents of the file:
cat disk-space.sh.x
Several caveats:
If you mean an executable rather than a script, the situation is different. Scripts are read by an interpreter at runtime, which then executes them line by line. Executables may be either scripts or ELF binaries. The latter have been transformed from the original source code into a machine-readable form which is very much harder for humans to read.
The extension of the file (.sh.x or .x) does not control whether the file contents are executed as a binary or script.
If the file really is a script it may have been obfuscated, meaning that the source code on your system has deliberately been changed to make the resulting file hard to read.
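One way to tell the two cases apart before running cat is to look at the first few bytes of the file. The sketch below does that by hand; classify is a hypothetical helper name, and head -c is assumed to be available (it is on GNU and BSD systems).

```shell
#!/bin/sh
# Sketch: decide whether a file is a script or an ELF binary from its first bytes.
# classify is a hypothetical helper, not a standard command.
classify() {
    # od -c prints the bytes as characters; 0x7f shows up as octal 177.
    magic=$(head -c 4 "$1" | od -An -c | tr -d ' \n')
    case $magic in
        '#!'*)  echo script ;;
        177ELF) echo elf ;;
        *)      echo unknown ;;
    esac
}

# Two small sample files to try it on.
sample_script=$(mktemp)
printf '#!/bin/sh\necho hi\n' > "$sample_script"
sample_elf=$(mktemp)
printf '\177ELF\002\001' > "$sample_elf"   # only the ELF magic, not a real binary

script_kind=$(classify "$sample_script")
elf_kind=$(classify "$sample_elf")
echo "$script_kind $elf_kind"
```

The standard file command performs the same kind of check with a much larger table of signatures, so file disk-space.sh.x is usually the quicker route.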
I have no idea how to do that, so I come here for help :) Here is what I'd need. I need to parse some configuration files or bash/sh scripts on a Red Hat Linux system, and look for the paths to the files/commands/scripts meant to be executed by them. The configuration files can have different syntax or be using different languages.
Here are the files I have to look at:
Config scripts:
/etc/inittab
/var/spool/cron/root
/var/spool/cron/tabs/root
/etc/crontab
/etc/xinetd.conf
Files located under /etc/cron.d/* recursively
Bash / Sh scripts:
Files located under /etc/init.d/* or /etc/rc.d/* recursively. These folders contain only shell scripts so maybe all the other files listed above need separate treatment.
Now here are the challenges that I can think of:
The paths within the files may be absolute or relative;
The paths within the files may be at the beginning of lines or preceded by a character such as space, colon or semicolon;
File paths expressed as arguments to commands/scripts must be ignored;
Paths to directories must be ignored;
Shell functions or built-in commands must be ignored.
Some examples (extracted from /etc/init.d/avahi-daemon):
if [ -s /etc/localtime ]; then
cp -fp /etc/localtime /etc/avahi/etc >/dev/null 2>&1
-> Only /bin/cp and /bin/[ must be returned in the snippet above (they are the only commands actually executed)
AVAHI_BIN=/usr/sbin/avahi-daemon
$AVAHI_BIN -r
-> /usr/sbin/avahi-daemon must be returned, but only because the variable is called after.
Note that I do not have access to the actual filesystem, I just have a copy of the files to parse.
After writing this up, I realize how complicated it is and unlikely to have a 100% working solution... But if you like programming challenges :)
The good part is I can use any scripting language: bash/sh/grep/sed/awk, php, python, perl, ruby or a combination of these..
I tried to start writing up in PHP but I am struggling to get coherent results.
Thanks!
The language you use to implement this doesn't matter. What matters is that the problem is undecidable, because it is equivalent to the halting problem.
Just as we know that it is impossible to determine if a program will halt, it is impossible to know if a program will call another program. For example, you may think your script will invoke X then Z, but if X never returns, Z will never be invoked. Also, you may not notice that your script invokes Y, because the string Y may be determined dynamically and never actually appear in the program text.
There are other problems which may stymie you along the way, too, such as:
python -c 'import subprocess; subprocess.call("ls")'
Now you need not only a complete parser for Bash, but also for Python. Not to mention solve the halting problem in Python.
In other words, what you want is not possible. To make it feasible you would have to significantly reduce the scope of the problem, e.g. "Find everything starting with /usr/bin or /bin that isn't in a comment". And it's unclear how useful that would be.
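The reduced problem from the last paragraph can be sketched with sed and grep. This is deliberately naive: list_bin_paths is a hypothetical helper, everything after a # is treated as a comment (which is wrong for things like $#), and grep -o is assumed to be available.

```shell
#!/bin/sh
# Sketch of the reduced problem: list /bin and /usr/bin paths outside comments.
# list_bin_paths is a hypothetical helper; it cannot tell commands from arguments.
list_bin_paths() {
    sed 's/#.*//' "$1" |
        grep -oE '(^|[^[:alnum:]_./-])/(usr/)?bin/[[:alnum:]_.+-]+' |
        sed 's/^[^/]*//' |
        sort -u
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
# /bin/ignored is in a comment
if /usr/bin/test -s /etc/localtime; then
    /bin/cp -fp /etc/localtime /etc/avahi/etc
fi
PATH=/usr/local/bin/tool:/bin/ls
EOF

paths=$(list_bin_paths "$sample")
echo "$paths"
```

On the sample, /bin/ls is reported even though it only appears inside an assignment, which illustrates why the usefulness of such a heuristic is doubtful.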
I'm trying to create a shell script via cygwin that will automatically build an executable and run it. It's a very simple format of
#!/bin/bash
gcc test.c -o hello
./hello.exe
When I enter the 2nd and 3rd lines separately, everything works normally. However, if I save those 3 lines into a .sh file, the resulting .exe built has some extra character added in that will always throw off the last line.
helloļ.exe
I can't even replicate the file name because no tool, including the character map/MS word/other ASCII tools online will give me any result. Some online tool gave me the ASCII result , but as far as I can tell that doesn't correspond to anything meaningful. How can I avoid this problem in my shell script?
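Very likely you have Windows line endings (CRLF) in the .sh file, so the carriage return at the end of the gcc line becomes part of the output file name. Make sure the file uses Unix line endings (LF).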
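A quick way to check for and strip the carriage returns, as a sketch; build.sh is a made-up file name that the example creates itself.

```shell
#!/bin/sh
# Sketch: detect carriage returns at line ends and write a cleaned copy.
# build.sh is a made-up name; the file is created here with CRLF endings.
work=$(mktemp -d)
printf '#!/bin/bash\r\necho built\r\n' > "$work/build.sh"

cr=$(printf '\r')

# Count the lines that end in a carriage return.
before=$(grep -c "$cr\$" "$work/build.sh")

# Delete every carriage return and count again.
tr -d '\r' < "$work/build.sh" > "$work/build.unix.sh"
after=$(grep -c "$cr\$" "$work/build.unix.sh")

echo "CRLF lines before: $before, after: $after"
```

Where it is installed, dos2unix does the same conversion in place. Setting the editor to save with Unix line endings avoids the problem at the source.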
I have a bunch of scripts (which can't be modified) written on Windows. Windows allows relative paths in its #! commands. We are trying to run these scripts on Unix but Bash only seems to respect absolute paths in its #! directives. I've looked around but haven't been able to locate an option in Bash or a program designed to replace an interpreter name. Is it possible to override that functionality -- perhaps even by using a different shell?
Typically you can just specify the binary to execute the script, which will cause the #! to be ignored. So, if you have a Python script that looks like:
#!..\bin\python2.6
# code would be here.
On Unix/Linux you can just say:
prompt$ python2.6 <scriptfile>
And it'll execute using the command line binary. I view the hashbang line as one which asks the operating system to use the binary specified on the line, but you can override it by not executing the script as a normal executable.
Worst case you could write some wrapper scripts that would explicitly tell the interpreter to execute the code in the script file for all the platforms that you'd be using.
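Such a wrapper could be sketched as below: it reads the #! line, throws away the directory part (with either kind of slash), and looks the interpreter up in PATH instead. interp_name is a hypothetical helper, and the demo uses sh rather than python2.6 only so that it runs anywhere.

```shell
#!/bin/sh
# Sketch of a wrapper: ignore the directory part of a script's #! line and
# find the interpreter through PATH. interp_name is a hypothetical helper.
interp_name() {
    # Take the first line, drop any CR and the "#!", drop interpreter
    # arguments, then drop everything up to the last / or \.
    head -n 1 "$1" | tr -d '\r' |
        sed -e 's/^#![[:space:]]*//' -e 's/[[:space:]].*$//' -e 's|^.*[/\\]||'
}

# A demo script with a Windows-style relative interpreter path.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!..\bin\sh
echo "ran via wrapper"
EOF

interp=$(interp_name "$demo")
output=$("$interp" "$demo")
echo "$interp: $output"
```

Since the interpreter is named explicitly on the command line, the original #! line is read as an ordinary comment and the scripts themselves stay unmodified.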