Assign a string containing a null character (\0) to a variable in Bash

While trying to process a list of file/folder names correctly (see my other questions) by using a NULL character as a delimiter, I stumbled over a strange behaviour of Bash that I don't understand:
When assigning a string containing one or more NULL characters to a variable, the NULL characters are lost / ignored / not stored.
For example,
echo -ne "n\0m\0k" | od -c # -> 0000000 n \0 m \0 k
But:
VAR1=`echo -ne "n\0m\0k"`
echo -ne "$VAR1" | od -c # -> 0000000 n m k
This means that I would need to write that string to a file (for example, in /tmp) and read it back from there if piping directly is not desired or feasible.
When executing these scripts in Z shell (zsh) the strings containing \0 are preserved in both cases, but sadly I can't assume that zsh is present in the systems running my script while Bash should be.
How can strings containing \0 chars be stored or handled efficiently without losing any (meta-) characters?

In Bash, you can't store the NULL-character in a variable.
You may, however, store a plain hex dump of the data (and later reverse this operation again) by using the xxd command.
VAR1=`echo -ne "n\0m\0k" | xxd -p | tr -d '\n'`
echo -ne "$VAR1" | xxd -r -p | od -c # -> 0000000 n \0 m \0 k

As others have already stated, you can't store/use NUL char:
in a variable
in an argument of the command line.
However, you can handle any binary data (including NUL char):
in pipes
in files
So to answer your last question:
can anybody give me a hint how strings containing \0 chars can be
stored or handled efficiently without losing any (meta-) characters?
You can use files or pipes to store and handle efficiently any string with any meta-characters.
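For example, a minimal sketch (file names purely illustrative) of NUL-delimited file names travelling through a pipe intact:
find . -type f -print0 |
while IFS= read -r -d '' f; do
    printf 'got: %q\n' "$f"    # each name arrives whole, even with spaces or newlines in it
done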
If you plan to handle data, you should note additionally that:
Only the NUL char is eaten by variables and command-line arguments; you can check this yourself.
Be wary that command substitution (as $(command..) or `command..`) has an additional twist beyond being a variable: it will also eat your trailing newlines. Both points are illustrated just below.
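A quick check of both behaviours (purely illustrative):
# NUL chars are dropped on assignment via command substitution
# (recent bash versions even print a warning about the ignored NUL bytes)...
v=$(printf 'n\0m\0k'); printf '%s' "$v" | od -c    # -> n m k
# ...and command substitution also strips trailing newlines
v=$(printf 'abc\n\n\n'); printf '%s' "$v" | od -c  # -> a b c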
Bypassing limitations
If you want to use variables, then you must get rid of the NUL char by encoding it, and various other solutions here give clever ways to do that (an obvious way is to use for example base64 encoding/decoding).
If you are concerned by memory or speed, you'll probably want to use a minimal parser and only quote NUL character (and the quoting char). In this case this would help you:
quote() { sed 's/\\/\\\\/g;s/\x0/\\x00/g'; }
Then, you can secure your data before storing it in variables and command-line arguments by piping your sensitive data into quote, which will output a safe data stream without NUL chars. You can get the original string (with NUL chars) back by using echo -en "$var_quoted", which will send the correct string to standard output.
Example:
## Our example output generator, with NUL chars
ascii_table() { echo -en "$(echo '\'0{0..3}{0..7}{0..7} | tr -d " ")"; }
## store
myvar_quoted=$(ascii_table | quote)
## use
echo -en "$myvar_quoted"
Note: use | hd to get a clean view of your data in hexadecimal and check that you didn't lose any NUL chars.
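A round-trip check along those lines (illustrative; the /tmp file names are just placeholders):
## dump the raw stream, then quote/store/unquote and dump again
ascii_table | od -An -tx1 > /tmp/original.hex
myvar_quoted=$(ascii_table | quote)
echo -en "$myvar_quoted" | od -An -tx1 > /tmp/restored.hex
diff /tmp/original.hex /tmp/restored.hex && echo "round trip OK"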
Changing tools
Remember you can go pretty far with pipes without using variables nor argument in command line, don't forget for instance the <(command ...) construct that will create a named pipe (sort of a temporary file).
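For instance, a tiny illustration of <(...) carrying NUL bytes without any variable being involved:
## both substitutions feed identical NUL-containing streams to cmp through /dev/fd entries
cmp <(printf 'n\0m\0k') <(printf 'n\0m\0k') && echo "identical, NULs intact"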
EDIT: the first implementation of quote was incorrect and would not deal correctly with \ special characters interpreted by echo -en. Thanks to @xhienne for spotting that.
EDIT2: the second implementation of quote had a bug: using only \0 would actually eat up more zeroes, as \0, \00, \000 and \0000 are equivalent. So \0 was replaced by \x00. Thanks to @MatthijsSteen for spotting this one.

Use uuencode and uudecode for POSIX portability
xxd and base64 are not POSIX 7 but uuencode is.
VAR="$(uuencode -m <(printf "a\0\n") /dev/stdout)"
uudecode -o /dev/stdout <(printf "$VAR") | od -tx1
Output:
0000000 61 00 0a
0000003
Unfortunately I don't see a POSIX 7 alternative to the Bash <() process substitution extension except writing to a file, and uuencode/uudecode are not installed on Ubuntu 12.04 by default (sharutils package).
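For completeness, a rough sketch of the write-to-a-file route in plain sh (mktemp itself is not strictly POSIX, but it is nearly universal):
tmp=$(mktemp) || exit 1
printf 'a\0\n' > "$tmp"                      # the binary data lives in a file, not a variable
VAR="$(uuencode -m "$tmp" /dev/stdout)"      # the encoded form is NUL-free, safe in a variable
printf '%s\n' "$VAR" | uudecode -o /dev/stdout | od -tx1
rm -f "$tmp"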
So I guess that the real answer is: don't use Bash for this, use Python or some other saner interpreted language.

I love jeff's answer. I would use Base64 encoding instead of xxd. It saves a little space and would be (I think) more recognizable as to what is intended.
VAR=$(echo -ne "foo\0bar" | base64)
echo -n "$VAR" | base64 -d | xargs -0 ...
As for -e, it is needed when echoing a literal string with an encoded null ('\0'). I also seem to recall that "echo -e" is unsafe if you're echoing any user input, since they could inject escape sequences that echo will interpret and end up doing bad things. The -e flag is not needed when echoing the encoded, stored string into the decoder.
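If you would rather sidestep the echo -e question entirely, a small variation (just a sketch) uses printf on both sides:
VAR=$(printf 'foo\0bar' | base64)            # \0 in the printf format string really emits a NUL byte
printf '%s' "$VAR" | base64 -d | od -c       # %s passes the stored text through verbatim -> f o o \0 b a r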

Here’s a maximally memory-efficient solution that just escapes the NULL bytes with \xFF.
(Since I wasn’t happy with base64 or the like. :)
esc0() { sed 's/\xFF/\xFF\xFF/g; s/\x00/\xFF0/g'; }
cse0() { sed 's/\xFF0/\xFF\x00/g; s/\xFF\(.\)/\1/g'; }
It of course escapes any actual \xFF by doubling it too, so it works exactly like when backslashes are used for escaping. This is also why a simple mapping can’t be used, and referring to the match in the replacement is required.
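A quick round-trip sanity check (illustrative; od shows the \xFF byte as octal 377, and depending on your sed the output may gain a trailing newline):
var="$(printf 'a\x00b\xffc' | esc0)"   # NUL-free, so it survives the variable assignment
printf '%s' "$var" | cse0 | od -c      # -> a \0 b 377 c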
Here’s an example that paints gradients onto the framebuffer (doesn’t work in X), using variables to pre-render blocks and lines for speed:
width=7680; height=1080; # Set these to your framebuffer’s size.
blocksPerLine=$(( $width / 256 ))
block="$( for i in 0 1 2 3 4 5 6 7 8 9 A B C D E F; do for j in 0 1 2 3 4 5 6 7 8 9 A B C D E F; do echo -ne "\x$i$j"; done; done | esc0 )"
line="$( for ((b=0; b < blocksPerLine; b++)); do echo -en "$block"; done )"
for ((l=0; l <= $height; l++)); do echo -en "$line"; done | cse0 > /dev/fb0
Note how $block contains escaped NULLs (plus \xFFs), and at the end, before writing everything to the framebuffer, cse0 unescapes them.


How to remove duplicate with bash script command xargs when the string has some quotes ""?

I am a newbie in bash script.
Here is my environment:
Mac OS X Catalina
/bin/bash
I found here a mix of several commands to remove duplicate strings within a string.
I needed it for my program, which updates the .zshrc profile file.
Here is my code:
#!/bin/bash
a='export PATH="/Library/Frameworks/Python.framework/Versions/3.8/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/local/bin:"'
myvariable=$(echo "$a" | tr ':' '\n' | sort | uniq | xargs)
echo "myvariable : $myvariable"
Here is the output:
xargs: unterminated quote
myvariable :
After some test, I know that the source of the issue is due to some quotes "" inside my variable '$a'.
Why am I so sure?
Because when I execute this code for example:
#!/bin/bash
a="/Library/Java/JavaVirtualMachines/jdk1.8.0_271.jdk/Contents/Home:/Library/Java/JavaVirtualMachines/jdk1.8.0_271.jdk/Contents/Home"
myvariable=$(echo "$a" | tr ':' '\n' | sort | uniq | xargs)
echo "myvariable : $myvariable"
where $a doesn't contain any quotes, I get the correct output:
myvariable : /Library/Java/JavaVirtualMachines/jdk1.8.0_271.jdk/Contents/Home
I tried to search for a solution for "xargs: unterminated quote" but each answer found on the web is for a particular case which doesn't correspond to my problem.
As I am a newbie and this line command is using several complex commands, I was wondering if anyone know the magic trick to make it work.
Basically, you want to remove duplicates from a colon-separated list.
I don't know if this is considered cheating, but I would do this in another language and invoke it from bash. First I would write a script for this purpose in zsh: it accepts as a parameter a string with colon separators and outputs a colon-separated list with duplicates removed:
#!/bin/zsh
original=${1?Parameter missing} # Original string
# Auxiliary array, which is set up to act like a Set, i.e. without
# duplicates
typeset -aU nodups_array
# Split the original strings on the colons and store the pieces
# into the array, thereby removing duplicates. The core idea for
# this is stolen from:
# https://stackoverflow.com/questions/2930238/split-string-with-zsh-as-in-python
nodups_array=("${(@s/:/)original}")
# Join the array back with colons and write the resulting string
# to stdout.
echo ${(j':')nodups_array}
If we call this script nodups_string, you can invoke it in your bash-setting as:
#!/bin/bash
a_path="/Library/Frameworks/Python.framework/Versions/3.8/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/local/bin:"
nodups_a_path=$(nodups_string "$a_path")
my_variable="export PATH=$nodups_a_path"
echo "myvariable : $myvariable"
The overall effect would be literally what you asked for. However, there is still an open problem I should point out: if one of the PATH components happens to contain a space, the resulting export statement cannot validly be executed. This problem is also inherent in your original approach; you just didn't mention it. You could do something like
my_variable=export\ PATH='"'$nodups_a_path"'"'
to avoid this. Of course, I wonder why you go to such effort to generate a syntactically valid export command, instead of simply building the PATH directly where it is needed.
Side note: if you were using zsh as your shell instead of bash, and only wanted to keep your PATH free of duplicates, a simple
typeset -U path
would suffice, and zsh takes care of the rest.
With awk:
awk -v RS=[:\"] 'NR > 1 { pth[$0]="" } END { for (i in pth) { if (i !~ /[[:space:]]+/ && i != "" ) { printf "%s:",i } } }' <<< "$a"
Set the record separator to : and double quotes. Then, when the record number is greater than one, set up an array called pth with the path as the index. At the end, loop through the array, reprinting the paths separated with :.
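A quick illustration with a shortened path list (this relies on GNU awk, since a multi-character RS is treated as a regular expression; also note that for (i in pth) visits the array in no particular order, so the output order is not guaranteed):
a='export PATH="/usr/local/bin:/usr/bin:/bin:/usr/bin:"'
awk -v RS=[:\"] 'NR > 1 { pth[$0]="" } END { for (i in pth) { if (i !~ /[[:space:]]+/ && i != "" ) { printf "%s:",i } } }' <<< "$a"
# one possible output (order may differ): /usr/local/bin:/usr/bin:/bin: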

Shell - How can I split a line of text into fragments using SED?

I got an output from a command and it is something like this: 2048,4096,8192,16384,24576,32768.
I want to split it into 6 different files, but only the numbers, not the commas, e.g.:
The initial text 2048,4096,8192,16384,24576,32768 should be split into: 2048 to file A, 4096 to file B, 8192 to file C, and so on.
That output follows these rules:
There are always 6 values, separated by commas
The numbers are always 3 to 5 digits in "length" (I don't know the proper English word)
As I told you, the commas don't interest me because I'm going to do mathematical operations with those numbers
I tried to delete the last X numbers but didn't find a way to "detect" a comma so the operation could stop.
Is it possible using SED?
The following requires on no commands external to a POSIX-compliant shell (such as busybox ash, which you're most likely to be using on Android):
csv=/system/file.csv
IFS=, read a b c d e f <"$csv"
echo "$a" >A
echo "$b" >B
echo "$c" >C
echo "$d" >D
echo "$e" >E
echo "$f" >F
This does assume that the files to be written (A, B, C, D, E and F) are all inside the current working directory. If you want to write them somewhere else, either amend their names, or use cd to change to that other directory.
I'm not sure sed is the right tool for that.
With a simple Bash script:
IFS=',' read -ra val < file.csv
for i in "${val[@]}"; do
    echo "$i" > file$(( ++j ))
done
It writes each value of your csv into file1, file2, etc.:
The read command assigns the values from file.csv to the array variable val.
In the loop, each value is written to its own file.
Just make sure you have write permissions in the current directory. If not, change the redirection (eg: > /dirWithWritePermissions/).
This might work for you (GNU sed, bash and parallel):
parallel --xapply echo {1} ">>" part{2} :::: <(sed 's/,/\n/g' file.csv) ::: {1..6}
This "zips" together two files reusing the shorter file as necessary.
N.B. Remember to remove any part* files before applying this command otherwise those files will grow (>> appends).
declare -a list_of_files=(fileA fileB fileC fileD fileE fileF)
readarray -t a <<< "$(sed 's/,/\n/;{;P;D;}' <<< '2048,4096,8192,16384,24576,32768')"
for i in "${a[@]}"; do
    echo "$i" > "${list_of_files[$((num++))]}"
done
Explanation:
s/,/\n/ substitutes every comma with a newline
{ starts a command group
P prints everything in the pattern buffer up to the first newline
D deletes everything in the pattern buffer up to and including the first newline and then restarts the commands on what remains, without reading a new input line
} ends the command group
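To see what this P/D loop produces on its own (illustrative):
printf '2048,4096,8192\n' | sed 's/,/\n/;{;P;D;}'
# 2048
# 4096
# 8192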
EDIT:
Let's say you want to copy the information into /system/file but want to have every number in its own row:
$ sed 's/,/\n/;{;P;D;}' < /sys/module/lowmemorykiller/parameters/minfree > /system/file
This will create a new file /system/file that will contain the formatted output.
EDIT: even shorter would be sed 's/,/\n/g', which works by replacing every comma with a newline (the g at the end makes the substitution apply to every comma, not just the first).
Also note that while sed is a nice tool to use (you gotta love it for its confusing language and commands...), the better and faster way is to use the bash built-in read.

Bash Version of C64 Code Art: 10 PRINT CHR$(205.5+RND(1)); : GOTO 10

I picked up a copy of the book 10 PRINT CHR$(205.5+RND(1)); : GOTO 10
http://www.amazon.com/10-PRINT-CHR-205-5-RND/dp/0262018462
This book discusses the art produced by the single line of Commodore 64 BASIC:
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
This just repeatedly prints a randomly chosen character, 205 or 206, from the PETSCII set to the screen:
http://en.wikipedia.org/wiki/PETSCII
https://vimeo.com/26472518
I'm not sure why the original uses the characters 205 and 206 instead of the identical 109 and 110. Also, I prefer to add a clear at the beginning. This is what I usually type into the C64:
1?CHR$(147)
2?CHR$(109.5+RND(1));:GOTO2
RUN
You can try this all for yourself in an emulator, such as this one using Flash or JavaScript:
http://codeazur.com.br/stuff/fc64_final/
http://www.kingsquare.nl/jsc64
When inputting the above code into the emulators listed, you'll need to realize that
( is *
) is (
+ is ]
I decided it would be amusing to write a bash line to do something similar.
I currently have:
clear; while :; do [ $(($RANDOM%2)) -eq 0 ] && (printf "\\") || (printf "/"); done;
Two questions:
Any suggestions for making this more concise?
Any suggestions for a better output character? The forward and backward slash are not nearly as beautiful since their points don't line up. The characters used from PETSCII are special characters, not slashes. I didn't see anything in ASCII that could work as well, but maybe you can suggest a way to pull in a character from UTF-8 or something else?
Best ANSWERS So Far
Shortest for bash (40 characters):
yes 'c=(╱ ╲);printf ${c[RANDOM%2]}'|bash
Here is a short one for zsh (53 characters):
c=(╱ ╲);clear;while :;do printf ${c[RANDOM%2+1]};done
Here is an alias I like to put in my .bashrc or .profile
alias art='c=(╱ ╲);while :;do printf "%s" ${c[RANDOM%2]};done'
Funny comparing this to the shortest I can do for C64 BASIC (23 characters):
1?C_(109.5+R_(1));:G_1
The underscores are shift+H, shift+N, and shift+O respectively. I can't paste the character here since they are specific to PETSCII. Also, the C64 output looks prettier ;)
You can read about the C64 BASIC abbreviations here:
http://www.commodore.ca/manuals/c64_programmers_reference/c64-programmers_reference_guide-02-basic_language_vocabulary.pdf
How about this?
# The characters you want to use
chars=( $'\xe2\x95\xb1' $'\xe2\x95\xb2' )
# Precompute the size of the array chars
nchars=${#chars[@]}
# clear screen
clear
# The loop that prints it:
while :; do
    printf -- "${chars[RANDOM%nchars]}"
done
As a one-liner with shorter variable names to make it more concise:
c=($'\xe2\x95\xb1' $'\xe2\x95\xb2'); n=${#c[@]}; clear; while :; do printf -- "${c[RANDOM%n]}"; done
You can get rid of the loop if you know in advance how many characters to print (here 80*24=1920)
c=($'\xe2\x95\xb1' $'\xe2\x95\xb2'); n=${#c[@]}; clear; printf "%s" "${c[RANDOM%n]"{1..1920}"}"
Or, if you want to include the characters directly instead of their code:
c=(╱ ╲); n=${#c[@]}; clear; while :; do printf "${c[RANDOM%n]}"; done
Finally, with the size of the array c precomputed and removing unnecessary spaces and quotes (and I can't get shorter than this):
c=(╱‬ ╲);clear;while :;do printf ${c[RANDOM%2]};done
Number of bytes used for this line:
$ wc -c <<< 'c=(╱‬ ╲);clear;while :;do printf ${c[RANDOM%2]};done'
59
Edit. A funny way using the command yes:
clear;yes 'c=(╱ ╲);printf ${c[RANDOM%2]}'|bash
It uses 50 bytes:
$ wc -c <<< "clear;yes 'c=(╱ ╲);printf \${c[RANDOM%2]}'|bash"
51
or 46 characters:
$ wc -m <<< "clear;yes 'c=(╱ ╲);printf \${c[RANDOM%2]}'|bash"
47
After looking at some UTF stuff:
2571 BOX DRAWINGS LIGHT DIAGONAL UPPER RIGHT TO LOWER LEFT
2572 BOX DRAWINGS LIGHT DIAGONAL UPPER LEFT TO LOWER RIGHT
(╱‬ and ╲) seem best.
f="╱╲";while :;do print -n ${f[(RANDOM % 2) + 1]};done
also works in zsh (thanks Clint on OFTC for giving me bits of that)
Here is my 39 character command line solution I just posted to #climagic:
grep -ao "[/\\]" /dev/urandom|tr -d \\n
In bash, you can remove the double quotes around the [/\] match expression and make it even shorter than the C64 solution, but I've included them for good measure and cross shell compatibility. If there was a 1 character option to grep to make grep trim newlines, then you could make this 27 characters.
I know this doesn't use the Unicode characters so maybe it doesn't count. It is possible to grep for the Unicode characters in /dev/urandom, but that will take a long time because that sequence comes up less often and if you pipe it the command pipeline will probably "stick" for quite a while before producing anything due to line buffering.
Bash supports Unicode now, so we don't need to use UTF-8 character sequences such as $'\xe2\x95\xb1'.
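A quick illustration (assuming bash 4.2 or newer, which added the \u/\U escapes in printf and $'...'):
printf '\u2571\u2572\n'    # -> ╱╲  (\u takes 4 hex digits, \U up to 8)
echo $'\U00002571'         # -> ╱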
This is my most-correct version: it loops, prints either / or \ based on a random number as others do.
for((;;x=RANDOM%2+2571)){ printf "\U$x";}
41
My previous best was:
while :;do printf "\U257"$((RANDOM%2+1));done
45
And this one 'cheats' using embedded Unicode (I think for obviousness, maintainability, and simplicity, this is my favourite).
Z=╱╲;for((;;)){ printf ${Z:RANDOM&1:1};}
40
My previous best was:
while Z=╱╲;do printf ${Z:RANDOM&1:1};done
41
And here are some more.
while :;do ((RANDOM&1))&&printf "\U2571"||printf "\U2572";done
while printf -v X "\\\U%d" $((2571+RANDOM%2));do printf $X;done
while :;do printf -v X "\\\U%d" $((2571+RANDOM%2));printf $X;done
while printf -v X '\\U%d' $((2571+RANDOM%2));do printf $X;done
c=('\U2571' '\U2572');while :;do printf ${c[RANDOM&1]};done
X="\U257";while :;do printf $X$((RANDOM%2+1));done
Now, this one runs until we get a stack overflow (not another one!) since bash does not seem to support tail-call elimination yet.
f(){ printf "\U257"$((RANDOM%2+1));f;};f
40
And this is my attempt to implement a crude form of tail-process elimination. But when you have had enough and press ctrl-c, your terminal will vanish.
f(){ printf "\U257"$((RANDOM%2+1));exec bash -c f;};export -f f;f
UPDATE:
And a few more.
X=(╱ ╲);echo -e "\b${X[RANDOM&1]"{1..1000}"}" 46
X=("\U2571" "\U2572");echo -e "\b${X[RANDOM&1]"{1..1000}"}" 60
X=(╱ ╲);while :;do echo -n ${X[RANDOM&1]};done 46
Z=╱╲;while :;do echo -n ${Z:RANDOM&1:1};done 44
Sorry for necroposting, but here's a bash version in 38 characters.
yes 'printf \\u$[2571+RANDOM%2]'|bash
using for instead of yes inflates this to 40 characters:
for((;;)){ printf \\u$[2571+RANDOM%2];}
109 characters for Python 3, which was the smallest I could get it.
#!/usr/bin/python3
import random
while True:
    if random.randrange(2)==1:print('\u2572',end='')
    else:print('\u2571',end='')
#!/usr/bin/python3
import random
import sys
while True:
    if random.randrange(2)==1:sys.stdout.write("\u2571")
    else:sys.stdout.write("\u2572")
    sys.stdout.flush()
Here's a version for Batch which fits in 127 characters:
cmd /v:on /c "for /l %a in (0,0,0) do #set /a "a=!random!%2" >nul & if "!a!"=="0" (set /p ".=/" <nul) else (set /p ".=\" <nul)"

reading a very long line from without breaking into multiple line in shell script

I have a file which has very long rows of data. When I try to read it using a shell script, the data comes out on multiple lines, i.e., it breaks at certain points.
Example row:
B_18453583||Active|917396140129|405819121107402|Active|7396140129||7396140129|||||||||18-MAY-10|||||18-MAY-10|405819121107402|Outgoing International Calls,Outgoing Calls,WAP,Call Waiting,MMS,Data Service,National Roaming-Voice,Outgoing International Calls except home country,Conference Call,STD,Call Forwarding-Barr,CLIP,Incoming Calls,INTSNS,WAPSNS,International Roaming-Voice,ISD,Incoming Calls When Roaming Internationally,INTERNET||For You Plan||||||||||||||||||
All this is the content of a single line.
I use a normal read like this:
var=`cat pranay.psv`
for i in $var; do
echo $i
done
The output comes as:
B_18453583||Active|917396140129|405819121107402|Active|7396140129||7396140129|||||||||18- MAY-10|||||18-MAY-10|405819121107402|Outgoing
International
Calls,Outgoing
Calls,WAP,Call
Waiting,MMS,Data
Service,National
Roaming-Voice,Outgoing
International
Calls
except
home
country,Conference
Call,STD,Call
Forwarding-Barr,CLIP,Incoming
Calls,INTSNS,WAPSNS,International
Roaming-Voice,ISD,Incoming
Calls
When
Roaming
Internationally,INTERNET||For
You
Plan||||||||||||||||||
How do I print it all on a single line??
Please help.
Thanks
This is because of word splitting. An easier way to do this (which also dispenses with the useless use of cat) is this:
while IFS= read -r -d $'\n' -u 9
do
    echo "$REPLY"
done 9< pranay.psv
To explain in detail:
$'...' can be used to create human readable strings with escape sequences. See man bash.
IFS= is necessary so that characters in IFS are not stripped from the start and end of $REPLY.
-r avoids interpreting backslash in text specially.
-d $'\n' splits lines by the newline character.
Use file descriptor 9 for the input instead of standard input, so that greedy commands inside the loop that read standard input (like cat) don't eat the rest of the file.
You need proper quoting. In your case, you should use the command read:
while read line ; do
echo "$line"
done < pranay.psv

Replacing HTML ascii codes via a bash script?

I need a way to replace HTML ASCII codes like &#33; with their correct character in bash.
Is there a utility I could run my output through to do this, or something along those lines?
$ echo '&#33;' | recode html/..
!
$ echo '&lt;&#8734;&gt;' | recode html/..
<∞>
I don't know of an easy way, here is what I suppose I would do...
You might be able to script a browser into reading the file in and then saving it as text. If lynx supports html character entities then it might be worth looking into. If that doesn't work out...
The general solution to something like this is done with sed. You need a "higher order" edit for this, as you would first start with an entity table and then you would edit that table into an edit script itself with a multiple-step procedure. Something like:
. . .
s/&Dagger;/‡/g
s/&#8221;/”/g
. . .
Then, encapsulate this as html, read it into a browser, and save it as text in the character set you are targeting. If you get it to produce lines like:
s/&lt;/</g
then you win. A bash script that calls sed or ex can be driven by the substitute commands in the file.
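Once you have such a file of substitute commands (say entities.sed, a hypothetical name), applying it is a one-liner:
sed -f entities.sed input.html > output.txt    # run every s/&entity;/char/g command over the input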
Here is my solution with the standard Linux toolbox.
$ foo="This is a line feed&#10;And e acute:&#233; with a grinning face &#128512;."
$ echo "$foo"
This is a line feed&#10;And e acute:&#233; with a grinning face &#128512;.
$ eval "$(printf '%s' "$foo" | sed 's/^/printf "/;s/&#0*\([0-9]*\);/\$( [ \1 -lt 128 ] \&\& printf "\\\\$( printf \"%.3o\\201\" \1)" || \$(which printf) \\\\U\$( printf \"%.8x\" \1) )/g;s/$/\\n"/')" | sed "s/$(printf '\201')//g"
This is a line feed
And e acute:é with a grinning face 😀.
You can see that it works for all kinds of escapes: even Line Feed, e acute (é), which is a 2-byte UTF-8 character, and even the new emoticons, which live in a supplementary plane (4-byte Unicode).
This command works ALSO with dash, which is a trimmed-down shell (the default /bin/sh on Ubuntu), and is also compatible with bash and shells like ash used by Synology devices.
If you don't mind sticking with bash and dropping the compatibility, you can make it much simpler.
Bits used should be in any decent Linux box (or OS X?)
- which
- printf (GNU and builtin)
- GNU sed
- eval (shell builtin)
The bash-only version doesn't need which nor the GNU printf.
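For instance, a bash-only sketch of my own (not the command above), limited to decimal numeric entities such as &#233; and needing bash >= 4.2 for the \U escape in printf:
decode_entities() {
    local s=$1 re='&#0*([0-9]+);' ch
    while [[ $s =~ $re ]]; do
        # build e.g. \U000000e9 and expand it into the real character
        printf -v ch "\\U$(printf '%08x' "${BASH_REMATCH[1]}")"
        s=${s//"${BASH_REMATCH[0]}"/"$ch"}
    done
    printf '%s\n' "$s"
}
decode_entities 'e acute:&#233; and a grin: &#128512;'   # -> e acute:é and a grin: 😀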
