I have a program which prints something that contains null bytes \0 and special characters like \x1f and newlines. For instance:
someprogram
#!/bin/bash
printf "ALICE\0BOB\x1fCHARLIE\n"
Given such a program, I want to read its output in such a way that all those special characters are captured in a shell variable output. So, if I run:
echo $output
because I'm not giving -e, I'd want the output to be:
ALICE\0BOB\x1fCHARLIE\n
How can this be achieved?
My first attempt was:
output=$(someprogram)
But I got this echoed output which doesn't have the special characters:
./myscript.sh: line 2: warning: command substitution: ignored null byte in input
ALICEBOBCHARLIE
I also tried to use read as follows:
output=""
while read -r
do
output="$output$REPLY"
done < <(someprogram)
Then I got rid of the warning but the output is still missing all special characters:
ALICEBOBCHARLIE
So how can I capture the output of someprogram in such a way that I have all the special characters in my resulting string?
EDIT: Note that it is possible to have such strings in bash:
$ x="ALICE\0BOB\x1fCHARLIE\n"
$ echo $x
ALICE\0BOB\x1fCHARLIE\n
So that shouldn't be the problem.
EDIT2: I'll reformulate the question a little bit now that I got an accepted answer and I understood things a little bit better. So, I just needed to be able to store the output of someprogram in some shell variable in such a way that I can print it to stdout without any changes in any special characters as if someprogram was just piped directly to stdout.
You simply can't store a zero (NUL) byte in a bash variable. It's impossible.
The usual solution is to convert the stream of bytes into hexadecimal. Then convert it back each time you want to do something with it.
$ x=$(printf "ALICE\0BOB\x1fCHARLIE\n" | xxd -p)
$ echo "$x"
414c49434500424f421f434841524c49450a
$ <<<"$x" xxd -p -r | hexdump -C
00000000 41 4c 49 43 45 00 42 4f 42 1f 43 48 41 52 4c 49 |ALICE.BOB.CHARLI|
00000010 45 0a |E.|
00000012
You can also write your own serialization and deserialization functions for the purpose.
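A minimal sketch of such a pair of functions, using od and printf instead of xxd. The names to_hex and from_hex are made up for this illustration (they are not standard commands), and bash is assumed:

```shell
#!/bin/bash
# to_hex / from_hex are hypothetical helper names, not standard commands.

# Encode stdin as one continuous string of lowercase hex digits.
to_hex() {
    od -An -v -tx1 | tr -d ' \t\n'
}

# Decode a hex string (first argument) back into raw bytes on stdout.
from_hex() {
    local hex=$1 i
    for ((i = 0; i < ${#hex}; i += 2)); do
        printf "\\x${hex:i:2}"
    done
}

# Capture output that contains a NUL byte, then replay it unchanged.
output=$(printf 'ALICE\0BOB\x1fCHARLIE\n' | to_hex)
echo "$output"
from_hex "$output" | od -c
```

The variable only ever holds hex digits, so nothing is lost in the command substitution; the raw bytes exist only inside pipes.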
Another idea is to read the data into an array, using the zero byte as the separator (any other byte is valid inside a variable). This, however, makes it hard to tell whether the input ended with a trailing zero byte:
$ readarray -d '' arr < <(printf "ALICE\0BOB\x1fCHARLIE\n")
$ printf "%s\0" "${arr[@]}" | hexdump -C
00000000 41 4c 49 43 45 00 42 4f 42 1f 43 48 41 52 4c 49 |ALICE.BOB.CHARLI|
00000010 45 0a 00 |E..|
# ^^ additional zero byte if input doesn't contain a trailing zero byte
00000013
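One possible way around the trailing-byte ambiguity is to append a sentinel before splitting and strip it again afterwards. This is only a sketch: the sentinel trick and the names capture and replay are assumptions of mine, not part of the answer above, and readarray -d needs bash 4.4 or newer.

```shell
#!/bin/bash
# capture / replay are hypothetical helper names; requires bash 4.4+.

# Run a command and split its output on NUL bytes into the global array arr.
# A sentinel "X" followed by a NUL is appended, so the last element always
# exists and a trailing NUL in the real output is not lost.
capture() {
    readarray -d '' arr < <("$@"; printf 'X\0')
    local last=$(( ${#arr[@]} - 1 ))
    arr[last]=${arr[last]%X}   # drop the sentinel again
}

# Write the captured bytes back out: a NUL after every element but the last.
replay() {
    local last=$(( ${#arr[@]} - 1 )) i
    for ((i = 0; i < last; i++)); do
        printf '%s\0' "${arr[i]}"
    done
    printf '%s' "${arr[last]}"
}

capture printf 'ALICE\0BOB\x1fCHARLIE\n'
replay | od -c
```

If the real output ends with a zero byte, the last array element is simply empty after the sentinel is removed, so replay still emits that final zero byte.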
I am new to shell scripting.
I have tried to multiply two hex numbers in a shell script in the following manner:
initial= expr 0x10000 \* 0x22
echo $initial
When I run the script, the following error is shown:
expr: non-numeric argument
Can someone point out what the mistake might be?
There's no need for expr; use $(( )), like this:
$ echo $((0x10000 * 0x22))
2228224
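Or you can use bc, indicating that the input is hex (ibase) and that the desired output is also hex (obase), as Adobe's deleted answer states. Set obase before ibase: once ibase=16 is in effect, a following obase=16 is itself read as hex, which means base 22.
$ echo "obase=16; ibase=16; 10000*22" | bc
220000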
$ echo "ibase=16; 10000*22" | bc
2228224
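Putting it together for the script in the question, here is a small sketch (bash assumed). The original line `initial= expr ...` runs expr with an empty variable in its environment instead of capturing anything, and expr does not understand hex anyway. The variable name hex is my own addition:

```shell
#!/bin/bash
# Arithmetic expansion understands the 0x prefix directly.
initial=$((0x10000 * 0x22))
echo "$initial"

# Format the result as hex again; "hex" is a name chosen for this example.
printf -v hex '%x' "$initial"
echo "0x$hex"
```

This prints 2228224 followed by 0x220000.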
Is there any comprehensive list of characters that need to be escaped in Bash? Can it be checked just with sed?
In particular, I was checking whether % needs to be escaped or not. I tried
echo "h%h" | sed 's/%/i/g'
and it worked fine, without escaping %. Does that mean % does not need to be escaped? Was this a good way to check whether escaping is necessary?
And more generally: are the characters to escape the same in sh and bash?
There are two easy and safe rules which work not only in sh but also bash.
1. Put the whole string in single quotes
This works for all chars except single quote itself. To escape the single quote, close the quoting before it, insert the single quote, and re-open the quoting.
'I'\''m a s@fe $tring which ends in newline
'
sed command: sed -e "s/'/'\\\\''/g; 1s/^/'/; \$s/\$/'/"
2. Escape every char with a backslash
This works for all characters except newline. For newline characters use single or double quotes. Empty strings must still be handled - replace with ""
\I\'\m\ \a\ \s\@\f\e\ \$\t\r\i\n\g\ \w\h\i\c\h\ \e\n\d\s\ \i\n\ \n\e\w\l\i\n\e"
"
sed command: sed -e 's/./\\&/g; 1{$s/^$/""/}; 1!s/^/"/; $!s/$/"/'.
2b. More readable version of 2
There's an easy safe set of characters, like [a-zA-Z0-9,._+:@%/-], which can be left unescaped to keep it more readable
I\'m\ a\ s@fe\ \$tring\ which\ ends\ in\ newline"
"
sed command: LC_ALL=C sed -e 's/[^a-zA-Z0-9,._+:@%/-]/\\&/g; 1{$s/^$/""/}; 1!s/^/"/; $!s/$/"/'.
Note that in a sed program, one can't know whether the last line of input ends with a newline byte (except when it's empty). That's why both above sed commands assume it does not. You can add a quoted newline manually.
Note that shell variables are only defined for text in the POSIX sense. Processing binary data is not defined. For the implementations that matter, binary works with the exception of NUL bytes (because variables are implemented with C strings, and meant to be used as C strings, namely program arguments), but you should switch to a "binary" locale such as latin1.
(You can easily validate the rules by reading the POSIX spec for sh. For bash, check the reference manual linked by @AustinPhillips)
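Rule 1 can also be applied without sed. The following is a sketch in pure bash; sq_quote is a name I made up for the illustration, and the eval at the end is only there to show that the quoted form reads back as the original string:

```shell
#!/bin/bash
# sq_quote is a hypothetical helper implementing rule 1 in pure bash.
sq_quote() {
    local s=$1 q="'" esc="'\\''"
    # Replace every ' with '\'' and wrap the result in single quotes.
    printf "'%s'" "${s//$q/$esc}"
}

original=$'I\'m a s@fe $tring which ends in newline\n'
quoted=$(sq_quote "$original")
echo "$quoted"

# Feed the quoted form back to the shell and compare with the original.
eval "copy=$quoted"
if [[ $copy == "$original" ]]; then
    echo "round trip OK"
fi
```

Unlike the sed version, this sees the whole string at once, so a trailing newline is quoted along with everything else.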
format that can be reused as shell input
Edit February 2021: bash ${var@Q}
Under bash, you could store your variable content with Parameter Expansion's @ command for Parameter transformation:
${parameter@operator}
Parameter transformation. The expansion is either a transformation
of the value of parameter or information about parameter
itself, depending on the value of operator. Each operator is a
single letter:
Q The expansion is a string that is the value of parameter
quoted in a format that can be reused as input.
...
A The expansion is a string in the form of an assignment
statement or declare command that, if evaluated, will
recreate parameter with its attributes and value.
Sample:
$ var=$'Hello\nGood world.\n'
$ echo "$var"
Hello
Good world.
$ echo "${var@Q}"
$'Hello\nGood world.\n'
$ echo "${var@A}"
var=$'Hello\nGood world.\n'
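To illustrate what "reused as input" means in practice, here is a short sketch (it needs bash 4.4 or newer; the variable names quoted and copy are mine):

```shell
#!/bin/bash
# Requires bash 4.4 or newer for the @Q transformation.
var=$'Hello\nGood world.\n'

# Store the quoted representation, then let the shell parse it again.
quoted=${var@Q}
eval "copy=$quoted"

if [[ $copy == "$var" ]]; then
    echo "identical"
fi
```

The quoted form survives being written to a file or passed through eval, including the trailing newline that a plain command substitution would strip.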
Old answer
There is a special printf format directive (%q) built for this kind of request:
printf [-v var] format [arguments]
%q causes printf to output the corresponding argument
in a format that can be reused as shell input.
Some samples:
read foo
Hello world
printf "%q\n" "$foo"
Hello\ world
printf "%q\n" $'Hello world!\n'
$'Hello world!\n'
This could be used through variables too:
printf -v var "%q" "$foo
"
echo "$var"
$'Hello world\n'
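Since printf reuses its format for every remaining argument, %q can also quote a whole argument list in one go. A sketch, where the array names args and restored are only for illustration:

```shell
#!/bin/bash
args=('two words' "it's" '$HOME' 'a;b')

# Quote every element; the format '%q ' is applied once per argument.
printf -v line '%q ' "${args[@]}"
echo "$line"

# Parsing the line again yields exactly the original four arguments.
eval "set -- $line"
restored=("$@")
echo "${#restored[@]} arguments, second is: ${restored[1]}"
```

The last line prints "4 arguments, second is: it's".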
Quick check with all (128) ascii bytes:
Note that all bytes from 128 to 255 have to be escaped.
for i in {0..127} ;do
printf -v var \\%o $i
printf -v var $var
printf -v res "%q" "$var"
esc=E
[ "$var" = "$res" ] && esc=-
printf "%02X %s %-7s\n" $i $esc "$res"
done |
column
This should render something like:
00 E '' 1A E $'\032' 34 - 4 4E - N 68 - h
01 E $'\001' 1B E $'\E' 35 - 5 4F - O 69 - i
02 E $'\002' 1C E $'\034' 36 - 6 50 - P 6A - j
03 E $'\003' 1D E $'\035' 37 - 7 51 - Q 6B - k
04 E $'\004' 1E E $'\036' 38 - 8 52 - R 6C - l
05 E $'\005' 1F E $'\037' 39 - 9 53 - S 6D - m
06 E $'\006' 20 E \ 3A - : 54 - T 6E - n
07 E $'\a' 21 E \! 3B E \; 55 - U 6F - o
08 E $'\b' 22 E \" 3C E \< 56 - V 70 - p
09 E $'\t' 23 E \# 3D - = 57 - W 71 - q
0A E $'\n' 24 E \$ 3E E \> 58 - X 72 - r
0B E $'\v' 25 - % 3F E \? 59 - Y 73 - s
0C E $'\f' 26 E \& 40 - @ 5A - Z 74 - t
0D E $'\r' 27 E \' 41 - A 5B E \[ 75 - u
0E E $'\016' 28 E \( 42 - B 5C E \\ 76 - v
0F E $'\017' 29 E \) 43 - C 5D E \] 77 - w
10 E $'\020' 2A E \* 44 - D 5E E \^ 78 - x
11 E $'\021' 2B - + 45 - E 5F - _ 79 - y
12 E $'\022' 2C E \, 46 - F 60 E \` 7A - z
13 E $'\023' 2D - - 47 - G 61 - a 7B E \{
14 E $'\024' 2E - . 48 - H 62 - b 7C E \|
15 E $'\025' 2F - / 49 - I 63 - c 7D E \}
16 E $'\026' 30 - 0 4A - J 64 - d 7E E \~
17 E $'\027' 31 - 1 4B - K 65 - e 7F E $'\177'
18 E $'\030' 32 - 2 4C - L 66 - f
19 E $'\031' 33 - 3 4D - M 67 - g
The first field is the hex value of the byte, the second contains E if the character needs to be escaped, and the third shows the escaped representation of the character.
Why ,?
You may notice that some characters are flagged even though they don't always need to be escaped, like ,, } and {.
So not always, but sometimes:
echo test 1, 2, 3 and 4,5.
test 1, 2, 3 and 4,5.
or
echo test { 1, 2, 3 }
test { 1, 2, 3 }
but take care:
echo test{1,2,3}
test1 test2 test3
echo test\ {1,2,3}
test 1 test 2 test 3
echo test\ {\ 1,\ 2,\ 3\ }
test 1 test 2 test 3
echo test\ {\ 1\,\ 2,\ 3\ }
test 1, 2 test 3
To save someone else from having to RTFM... in bash:
Enclosing characters in double quotes preserves the literal value of all characters within the quotes, with the exception of $, `, \, and, when history expansion is enabled, !.
...so if you escape those (and the quote itself, of course) you're probably okay.
If you take a more conservative 'when in doubt, escape it' approach, you can avoid accidentally creating escape sequences with special meaning by not escaping identifier characters (i.e. ASCII letters, numbers, or '_'). It's very unlikely these would ever (i.e. in some weird POSIX-ish shell) have special meaning and thus need to be escaped.
Using the printf '%q' technique, we can run a loop to find out which characters are special:
#!/bin/bash
special=$'`!@#$%^&*()-_+={}|[]\\;\':",.<>?/ '
for ((i=0; i < ${#special}; i++)); do
    char="${special:i:1}"
    # %q yields the character unchanged only if it needs no escaping
    printf -v q_char '%q' "$char"
    if [[ "$char" != "$q_char" ]]; then
        printf 'Yes, character %s needs to be escaped\n' "$char"
    else
        printf 'No, character %s does not need to be escaped\n' "$char"
    fi
done | sort
It gives this output:
No, character % does not need to be escaped
No, character + does not need to be escaped
No, character - does not need to be escaped
No, character . does not need to be escaped
No, character / does not need to be escaped
No, character : does not need to be escaped
No, character = does not need to be escaped
No, character @ does not need to be escaped
No, character _ does not need to be escaped
Yes, character needs to be escaped
Yes, character ! needs to be escaped
Yes, character " needs to be escaped
Yes, character # needs to be escaped
Yes, character $ needs to be escaped
Yes, character & needs to be escaped
Yes, character ' needs to be escaped
Yes, character ( needs to be escaped
Yes, character ) needs to be escaped
Yes, character * needs to be escaped
Yes, character , needs to be escaped
Yes, character ; needs to be escaped
Yes, character < needs to be escaped
Yes, character > needs to be escaped
Yes, character ? needs to be escaped
Yes, character [ needs to be escaped
Yes, character \ needs to be escaped
Yes, character ] needs to be escaped
Yes, character ^ needs to be escaped
Yes, character ` needs to be escaped
Yes, character { needs to be escaped
Yes, character | needs to be escaped
Yes, character } needs to be escaped
Some of the results, like , look a little suspicious. It would be interesting to get @CharlesDuffy's input on this.
Characters that need escaping are different in Bourne or POSIX shell than in Bash. Very generally, Bash is a superset of those shells, so anything you escape in sh should also be escaped in Bash.
A nice general rule would be "if in doubt, escape it". But escaping some characters gives them a special meaning, like \n. These are listed in the man bash pages under Quoting and echo.
Other than that, escape any character that is not alphanumeric, it is safer. I don't know of a single definitive list.
The man pages list them all somewhere, but not in one place. Learn the language, that is the way to be sure.
One that has caught me out is !. This is a special character (history expansion) in Bash (and csh) but not in Korn shell. Even echo "Hello world!" gives problems. Using single-quotes, as usual, removes the special meaning.
I presume that you're talking about bash strings. There are different types of strings, each with a different set of requirements for escaping; e.g. single-quoted strings are different from double-quoted strings.
The best reference is the Quoting section of the bash manual.
It explains which characters need escaping. Note that some characters may need escaping depending on which options are enabled, such as history expansion.
I noticed that bash automatically escapes some characters when using auto-complete.
For example, if you have a directory named dir:A, bash will auto-complete to dir\:A
Using this, I ran some experiments with the characters of the ASCII table and derived the following lists:
Characters that bash escapes on auto-complete: (includes space)
!"$&'()*,:;<=>?@[\]^`{|}
Characters that bash does not escape:
#%+-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~
(I excluded /, as it cannot be used in directory names)
I have a lot of strings like the one below and I want to find a command to convert them to ASCII. I tried echo -e and od, but it did not work.
0xA7.0x9B.0x46.0x8D.0x1E.0x52.0xA7.0x9B.0x7B.0x31.0xD2
This worked for me.
$ echo 54657374696e672031203220330 | xxd -r -p
Testing 1 2 3$
-r tells it to convert hex to ascii as opposed to its normal mode of doing the opposite
-p tells it to use a plain format.
This code will convert the text 0xA7.0x9B.0x46.0x8D.0x1E.0x52.0xA7.0x9B.0x7B.0x31.0xD2 into a stream of 11 bytes with equivalent values. These bytes will be written to standard out.
TESTDATA=$(echo '0xA7.0x9B.0x46.0x8D.0x1E.0x52.0xA7.0x9B.0x7B.0x31.0xD2' | tr '.' ' ')
for c in $TESTDATA; do
echo $c | xxd -r
done
As others have pointed out, this will not result in a printable ASCII string for the simple reason that the specified bytes are not ASCII. You need to post more information about how you obtained this string for us to help you with that.
How it works: xxd -r translates hexadecimal data to binary (like a reverse hexdump). xxd requires that each line start off with the index number of the first character on the line (run hexdump on something and see how each line starts off with an index number). In our case we want that number to always be zero, since each execution only has one line. As luck would have it, our data already has zeros before every character as part of the 0x notation. The lower case x is ignored by xxd, so all we have to do is pipe each 0xhh character to xxd and let it do the work.
The tr translates periods to spaces so that for will split it up correctly.
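If xxd is not available, the same conversion can be sketched with bash builtins only. The function name dotted_hex_to_bytes is hypothetical, and the od call is just there to make the resulting bytes visible:

```shell
#!/bin/bash
# dotted_hex_to_bytes is a hypothetical helper name.
# Converts a string such as 0x41.0x42 into the raw bytes it describes.
dotted_hex_to_bytes() {
    local -a bytes
    local b
    # Split the argument on periods into an array of 0xHH tokens.
    IFS=. read -r -a bytes <<<"$1"
    for b in "${bytes[@]}"; do
        # Strip the 0x prefix and let printf emit the byte.
        printf "\\x${b#0x}"
    done
}

dotted_hex_to_bytes '0xA7.0x9B.0x46.0x8D.0x1E.0x52.0xA7.0x9B.0x7B.0x31.0xD2' | od -An -v -tx1
```

As with the xxd loop, the output is 11 raw bytes, most of which are not printable ASCII.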
You can use xxd:
$ cat hex.txt
68 65 6c 6c 6f
$ cat hex.txt | xxd -r -p
hello
You can use something like this.
$ cat test_file.txt
54 68 69 73 20 69 73 20 74 65 78 74 20 64 61 74 61 2e 0a 4f 6e 65 20 6d 6f 72 65 20 6c 69 6e 65 20 6f 66 20 74 65 73 74 20 64 61 74 61 2e
$ for c in `cat test_file.txt`; do printf "\x$c"; done;
This is text data.
One more line of test data.
The values you provided are UTF-8 values. When set, the array of:
declare -a ARR=(0xA7 0x9B 0x46 0x8D 0x1E 0x52 0xA7 0x9B 0x7B 0x31 0xD2)
Will be parsed to print the plaintext characters of each value.
for ((n=0; n < ${#ARR[*]}; n++)); do echo -e "\u${ARR[$n]//0x/}"; done
And the output will yield a few printable characters and some non-printable characters.
For converting hex values to plaintext using the echo command:
echo -e "\x<hex value here>"
And for converting Unicode code points (given in hex) to plaintext using the echo command:
echo -e "\u<Unicode code point here>"
And then for converting octal to plaintext using the echo command:
echo -e "\0<octal value here>"
When you have encoding values you aren't familiar with, take the time to check out the ranges in the common encoding schemes to determine what encoding a value belongs to. Then conversion from there is a snap.
The echo -e must have been failing for you because of wrong escaping.
The following code works fine for me on a similar output from your_program with arguments:
echo -e $(your_program with arguments | sed -e 's/0x\(..\)\.\?/\\x\1/g')
Please note however that your original hexstring consists of non-printable characters.
Make a script like this:
#!/bin/bash
echo $((0x$1)).$((0x$2)).$((0x$3)).$((0x$4))
Example:
sh converthextoip.sh c0 a8 00 0b
Result:
192.168.0.11
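A variant of the same idea, sketched as a function that takes the address as one 8-digit hex string instead of four separate arguments. The name hex_to_ip is made up for this example, and it relies on bash substring expansion:

```shell
#!/bin/bash
# hex_to_ip is a hypothetical helper; expects exactly 8 hex digits.
hex_to_ip() {
    local h=$1
    # 16#NN tells arithmetic expansion to read NN as base 16.
    echo "$((16#${h:0:2})).$((16#${h:2:2})).$((16#${h:4:2})).$((16#${h:6:2}))"
}

hex_to_ip c0a8000b
```

This prints 192.168.0.11, the same result as the four-argument script.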
I have a (java) program that prints a line of hex numbers to stdout every 5ish seconds, until the program is terminated by the user.
I would like to redirect that output to a bash script so I could convert each of those hex numbers independently to decimal, then print the parsed line to stdout.
I tried using myProgram | myScript but that did the piping before any lines were printed, then didn't keep listening to stdout. I then tried myProgram > myScript, and that just overwrote the script.
Ideas?
Edit: adding output from the runs (sorry for the poor formatting; I couldn't get it all into the code highlighting, so the middle of the output is not highlighted).
Here is the script
#!/bin/bash
echo $0
echo $#
echo $1
Here is how my program runs when its output goes straight to stdout; this would continue forever if I didn't terminate it.
mmmm@mmmm:~/mmmm/mmmm/mmmmm$ java net.tinyos.tools.Listen -comm
serial@/dev/ttyUSB0:micaz
serial@/dev/ttyUSB0:57600: resynchronising
00 FF FF 00 02 04 22 93 00 02 02 C9
00 FF FF 00 03 04 22 93 00 03 03 0E
00 FF FF 00 02 04 22 93 00 03 03 0E
00 FF FF 00 02 04 22 93 00 02 02 C9
^Z
[5]+ Stopped java net.tinyos.tools.Listen -comm
serial@/dev/ttyUSB0:micaz
Here is where I try to pipe it to my script (which I have set to print the number of command-line arguments and the first argument). It just freezes after this...
mmmm@mmmm:~/mmmm/mmmm/mmmmm$ java net.tinyos.tools.Listen -comm serial@/dev/ttyUSB0:micaz | ./parser.sh
./parser.sh
0
serial@/dev/ttyUSB0:57600: resynchronising
Diagnosis
When you use this script like this:
java javaprog | myScript
and myScript contains:
#!/bin/bash
echo $0
echo $#
echo $1
Then the output from the script will be its name (myScript) from the echo $0, the number of arguments it was passed (0) from the echo $#, and the first argument (an empty line is echoed) from the echo $1. The script then exits (successfully). The issue is nothing to do with buffering; it is all to do with the script not reading anything from its standard input. Even a trivial modification would be an improvement:
#!/bin/bash
while read data; do echo $data; done
That's a slower form of cat, except that it normalizes random sequences of spaces and tabs into single spaces, stripping leading and trailing spaces off the line. It would at least demonstrate the script processing the output from the Java program.
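Staying in bash, the read loop can be extended to do the actual conversion. This is only a sketch: the function name convert_hex_lines is mine, and skipping lines that contain anything other than hex fields (such as the "resynchronising" banner) is an assumption about what is wanted:

```shell
#!/bin/bash
# convert_hex_lines is a hypothetical helper name.
# Reads lines of space-separated hex numbers from stdin and prints each
# line with every field converted to decimal.
convert_hex_lines() {
    local -a fields out
    local f
    while read -r -a fields; do
        out=()
        for f in "${fields[@]}"; do
            # Skip the whole line if any field is not a plain hex number.
            [[ $f =~ ^[0-9A-Fa-f]+$ ]] || continue 2
            out+=("$((16#$f))")
        done
        echo "${out[*]}"
    done
}

printf '00 FF FF 00 02 04 22 93 00 02 02 C9\n' | convert_hex_lines
```

For the sample line this prints 0 255 255 0 2 4 34 147 0 2 2 201. In the real pipeline the function would read from the Java program, e.g. java ... | ./parser.sh with the function called at the end of parser.sh.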
Trying awk
To do what you're after, you should probably replace that with an awk program or something similar. This is a first draft, but it stands some chance of working:
awk '{ for (i = 1; i <= NF; i++) { x = "0x" $i + 0; printf(" %d", x); } printf "\n"; }'
This says 'for each line (because there is no pattern before the open brace)', do 'for each of the fields 1..NF, convert the field into an explicit hex string with the 0x prefix and adding 0, then print the value as a decimal number (trusting awk to convert a string such as '0xC9' to a number)', and finally print a newline.
Using Perl
Unfortunately, a little testing shows that this does not work; the problem is getting a value other than 0 for x. So, ... time to fall back on Perl in awk-emulation mode:
$ echo '00 C9 28 13 A0 FF 01' |
> perl -na -e 'for ($i = 0; $i < scalar(@F); $i++) { printf(" %d", hex $F[$i]); }
> printf "\n";'
0 201 40 19 160 255 1
$
That works - it's even fairly easy to understand. The -n option means 'read each line of data and execute the commands in the script on each line (but do not print $_ at the end)'. The -a option combined with either -n (as here, or -p which is like -n except it prints $_ automatically) means 'automatically split the input into the array @F'. The script then processes each element of @F in each line (rather verbosely), using the hex function to convert the string in $F[$i] to a number and then printing that number with printf(). The verbosity can be reduced (this is Perl: There's More Than One Way To Do It, or TMTOWTDI - tim-toady) with:
$ echo '00 C9 28 13 A0 FF 01' |
> perl -na -e 'foreach my $i (@F) { printf(" %d", hex $i); } printf "\n";'
0 201 40 19 160 255 1
$
Same result, less code. There might be more abbreviated techniques; that's compact enough without being wholly illegible.
1. Check if your system has the unbuffer command installed
which unbuffer
(typically systems that are using bash are Linux-based, and have unbuffer available)
2. If yes,
unbuffer myProgram | myScript
edit
As you have shown us your shell script as
#!/bin/bash
echo $0
echo $#
echo $1
Please recall that the values you are echoing, $0, $#, $1, are bash's positional parameters, which relate to the command-line arguments (typically options or filenames for processing), not to the data arriving on standard input.
To print the whole line, the number of fields on the line, and the value of the first field, awk is a perfect solution to this problem.
Try changing your script to
cat myScript.awk
#!/bin/awk -f
{
print $0
print NF
print $1
}
chmod 755 myScript.awk
Hmm.. Seeing ^Z to stop input tells me you are using Windows or are you using bash under Cygwin?
I hope this helps.
This might be a buffering issue. The GNU Coreutils come with a tool called stdbuf. If it is available on your system, try running:
stdbuf -o0 program | stdbuf -i0 script