using special characters in bash variables / array - bash

I'm having issues declaring all printable characters in an array in a bash script. I’d like to display all printable characters through a loop 4 times.
Example
array=( a b c d … z A B C … Z 1 2 3 … 0 ! # # $ % ^ & * ( ) _ +)
For chr1 in ${array[#]}
Do
For chr2 in ${array[#]}
Do
Echo $chr1$chr2
Done
Done
I've been able to get the space character to print with using ${array[value of space]} but I still haven't been able to get the * character to print. It tends to print a list of files for some reason.
Any idea how I can get this to work?

Quotes! More quotes!
array=( a b c d z A B C Z 1 2 3 0 '!' '#' '#' '$' '%' '^' '&' '*' '(' ')' '_' '+')
for chr1 in "${array[#]}"
do
for chr2 in "${array[#]}"
do
echo "$chr1$chr2"
done
done
Slap quotes around the special characters in your array declaration, and slap double quotes around the variable accesses in the loops.

In shell scripts, quoting is your friend. Always.
array=(a b c d e f g h i j k l m z A B C Z 1 2 3 0 \! \# \# \$ \% \^ \& \* \( \) _ +)
for chr1 in "${array[#]}"; do
for chr2 in "${array[#]}"; do
echo "$chr1$chr2"
done
done
Works fine here.

chr () { printf "\\$(($1/64*100+$1%64/8*10+$1%8))"; }
ord () { printf '%s' "$(( ( 256 + $(printf '%d' "'$1"))%256 ))"; }
for i in {32..126}
do
for j in {32..126}
do
chr $i; chr $j
echo
done
done

the * is a special character to bash.
wild card [asterisk]. The * character serves as a "wild card" for filename expansion in globbing. By itself, it matches every filename in a given directory.
From the absolute bash scripting guide[1], you'll want to escape it like the first answer does.
[1] http://tldp.org/LDP/abs/html/special-chars.html

Related

When performing sed command, some tabs in a file are converted to a single whitespace

Background
I have a .xyz file from which I need to remove a specific set of lines from. As well as do some text replacements. I have a separate .txt file that contains a list of integers, corresponding to line numbers that need to be removed, and another for the lines which need replacing. This file will be called atomremove.txt and looks as follows. The other file is structured similarly.
Just as a preemptive TL;DR: The tabs in my input file that happen to have one extra whitespace (because they justify to a certain position regardless of one extra whitespace), end up being converted to a single whitespace in the output file.
14
13
11
10
4
The xyz file from which I need to remove lines will look like something like this.
24
Comment block
H 18.38385 15.26701 2.28399
C 19.32295 15.80772 2.28641
O 16.69023 17.37471 2.23138
B 17.99018 17.98940 2.24243
C 22.72612 1.13322 2.17619
C 14.47116 18.37823 2.18809
C 15.85803 18.42398 2.20614
C 20.51484 15.08859 2.30584
C 22.77653 3.65203 2.19000
H 20.41328 14.02079 2.31959
H 22.06640 8.65013 2.27145
C 19.33725 17.20040 2.26894
H 13.96336 17.42048 2.19342
H 21.69450 3.68090 2.22196
C 23.01832 9.16815 2.25575
C 23.48143 2.42830 2.16161
H 22.07113 11.03567 2.32659
C 13.75496 19.59644 2.16380
O 23.01248 6.08053 2.20226
C 12.41476 19.56937 2.14732
C 16.54400 19.61620 2.20021
C 23.50500 4.83405 2.17735
C 23.03249 10.56089 2.28599
O 17.87129 19.42333 2.22107
My Code
I am successful in doing the line removal, and the replacements, although the output is not as expected. It appears to replace some of the tabs with the whitespace, specifically for lines that have a 'y' coordinate with only 5 decimals. I am going to share the resulting output first, and then my code.
Here is the output
19
Comment Block
H 18.38385 15.26701 2.28399
C 19.32295 15.80772 2.28641
O 16.69023 17.37471 2.23138
H 22.72612 1.13322 2.17619
C 14.47116 18.37823 2.18809
C 15.85803 18.42398 2.20614
C 20.51484 15.08859 2.30584
C 22.77653 3.65203 2.19000
C 19.33725 17.20040 2.26894
C 23.01832 9.16815 2.25575
C 23.48143 2.42830 2.16161
H 22.07113 11.03567 2.32659
C 13.75496 19.59644 2.16380
O 23.01248 6.08053 2.20226
C 12.41476 19.56937 2.14732
C 16.54400 19.61620 2.20021
C 23.50500 4.83405 2.17735
H 23.03249 10.56089 2.28599
O 17.87129 19.42333 2.22107
Here is my code.
atomstorefile="./extract_internal/atomremove.txt"
atomchangefile="./extract_internal/atomchange.txt"
temp="temp.txt"
tempp="tempp.txt"
temppp="temppp.txt"
filestoreloc="./"$basefilename"_xyzoutputs/chops"
#get number of files in directory and set a loop for that # of files
numfiles=$( ls "./"$basefilename"_xyzoutputs/splits" | wc -l )
numfiles=$(( numfiles/2 ))
counter=1
while [ $counter -lt $(( numfiles + 1 )) ];
do
#set a loop for each split half
splithalf=1
while [ $splithalf -lt 3 ];
do
#storing the xyz file in a temp file for edits (non destructive)
cat ./"$basefilename"_xyzoutputs/splits/split"$splithalf"-geometry$counter.xyz > $temp
#changin specified atoms
while read line;
do
line=$(( line + 2 ))
sed -i "${line}s/C/H/" $temp
done < $atomchangefile
# removing specified atoms
while read line;
do
line=$(( line + 2 ))
sed -i "${line}d" $temp
done < $atomstorefile
remainatoms=$( wc -l $temp | awk '{print $1}' )
remainatoms=$(( remainatoms - 2 ))
tail -n $remainatoms $temp > $tempp
echo $remainatoms > "$filestoreloc"/split"$splithalf"-geometry$counter.xyz
echo Comment Block >> "$filestoreloc"/split"$splithalf"-geometry$counter.xyz
cat $tempp >> "$filestoreloc"/split"$splithalf"-geometry$counter.xyz
splithalf=$(( splithalf + 1 ))
done
counter=$(( counter + 1 ))
done
I am sure the solution is simple. Any insight into what is causing this issue would be very appreciated.
Not sure what you are doing but you file can be fixed using column -t < filename command.
Example :
❯ cat test
H 18.38385 15.26701 2.28399
C 19.32295 15.80772 2.28641
O 16.69023 17.37471 2.23138
H 22.72612 1.13322 2.17619
C 14.47116 18.37823 2.18809
C 15.85803 18.42398 2.20614
C 20.51484 15.08859 2.30584
C 22.77653 3.65203 2.19000
C 19.33725 17.20040 2.26894
C 23.01832 9.16815 2.25575
C 23.48143 2.42830 2.16161
H 22.07113 11.03567 2.32659
C 13.75496 19.59644 2.16380
O 23.01248 6.08053 2.20226
C 12.41476 19.56937 2.14732
C 16.54400 19.61620 2.20021
C 23.50500 4.83405 2.17735
H 23.03249 10.56089 2.28599
O 17.87129 19.42333 2.22107
~
❯ column -t < test
H 18.38385 15.26701 2.28399
C 19.32295 15.80772 2.28641
O 16.69023 17.37471 2.23138
H 22.72612 1.13322 2.17619
C 14.47116 18.37823 2.18809
C 15.85803 18.42398 2.20614
C 20.51484 15.08859 2.30584
C 22.77653 3.65203 2.19000
C 19.33725 17.20040 2.26894
C 23.01832 9.16815 2.25575
C 23.48143 2.42830 2.16161
H 22.07113 11.03567 2.32659
C 13.75496 19.59644 2.16380
O 23.01248 6.08053 2.20226
C 12.41476 19.56937 2.14732
C 16.54400 19.61620 2.20021
C 23.50500 4.83405 2.17735
H 23.03249 10.56089 2.28599
O 17.87129 19.42333 2.22107
~
❯
The reason you wreck your whitespace is that you need to quote your strings. But a much superior solution is to refactor all of this monumentally overcomplicated shell script to a simple sed or Awk script.
Assuming the line numbers all indicate line numbers in the original input file, try this.
tmp=$(mktemp -t atomtmpXXXXXXXXX) || exit
trap 'rm -f "$tmp"' ERR EXIT
( sed 's%$%s/C/H/%' extract_internal/atomchange.txt
sed 's%$%d%' extract_internal/atomremove.txt ) >"$tmp"
ls -l "$tmp"; nl "$tmp" # debugging
for file in "$basefilename"_xyzoutputs/splits/*; do
dst= "$basefilename"_xyzoutputs/chops/${file#*/splits/}
sed -f "$tmp" "$file" >"$dst"
done
This combines the two input files into a new sed script (remarkably, by way of sed); the debugging line lets you inspect the result (probably remove it once you understand how this works).
Your question doesn't really explain how the input files relate to the output files so I had to guess a bit. One of the important changes is to avoid sed -i when you are not modifying an existing file; but above all, definitely avoid repeatedly overwriting the same file with sed -i.

Taking a count from file, I want to print no of variables using shell/bash

Taking count from file, say if count = 5, I want to print 5 variables. i.e. A B C D E.
If count = 2, Print 2 variables A B, etc.
I have tried using the ASCII values but couldn't go through it.
for i in {1..5}; do
count=5; a=0;
printf "\x$(printf %x '65+$a')";
count=count+1;
done
if count = 5, I want to print 5 variables. i.e. A B C D E. If count = 2, Print 2 variables A B, etc.
Here's a program that matches your style that does what you are looking for:
a=0
for i in {1..5}; do
printf "\x$(printf %x $(( 65 + a )) )";
a=$((a+1));
done
The first thing to note is that in order to do math in bash, you'll need to use the $(( )) operation. Above, you can see I replaced you '65+$a' with $(( 65 + a )) . That's the big news that you need to get math done.
There were a couple of other little issues, but you were stuck on the $(()) stuff so they weren't clear yet. Incidentally, the 'a' variable can be completely removed from the program to just use the 'i' variable like this:
for i in {1..5}; do
printf "\x$(printf %x $(( 64 + i )) )";
done
I had to change the constant to 64 since we are now counting starting at 1.
The {1..5} expression is a good short cut for 1 2 3 4 5, but you won't be able to put a variable into it. So, if you need to add a count variable back in, consider using the seq program instead like this:
count=$1
for i in $(seq 1 $count); do
printf "\x$(printf %x $(( 64 + i )) )";
done
Note that $() is different than the math operator $(()). $() runs a subcommand returning the results.
method 1: simple brace expansion
#!/bin/bash
# generate a lookup table
vars=( - $(echo {A..Z}) )
# use the elements
for i in {1..5}; do
echo ${vars[$i]}
done
{A..Z} generates 26 strings: A, B, ..., Z
which get stored in an array variable by vars=(...)
we prepend a - that we'll ignore
we can then do 1-based indexing into the array
limited to 26 variables (or whatever range we choose)
method 2: multiple brace expansion to generate arbitrary long variables
#!/bin/bash
if [[ ! $1 =~ ^[0-9]+$ ]]; then
echo "Usage: $0 count"
exit
fi
cmd='{A..Z}'
for (( i=$1; i>26; i=i/26 )); do
cmd="${A..Z}$cmd"
done
vars=( $(eval echo $cmd) )
for (( i=0; i<$1; i++ )); do
echo ${vars[$i]}
done
i/26 does integer division (throws away the remainder)
I'm lazy and generate "more than enough" variables rather than attempting to calculate how many is "exactly enough"
{a..b}{a..b}{a..b} becomes aaa aab aba abb baa bab bba bbb
using eval lets us do the brace expansion without knowing in advance how many sets are needed
Sample output:
$ mkvar.sh 10000 |fmt -64 | tail -5
ORY ORZ OSA OSB OSC OSD OSE OSF OSG OSH OSI OSJ OSK OSL OSM
OSN OSO OSP OSQ OSR OSS OST OSU OSV OSW OSX OSY OSZ OTA OTB
OTC OTD OTE OTF OTG OTH OTI OTJ OTK OTL OTM OTN OTO OTP OTQ
OTR OTS OTT OTU OTV OTW OTX OTY OTZ OUA OUB OUC OUD OUE OUF
OUG OUH OUI OUJ OUK OUL OUM OUN OUO OUP

Bash variable not decrementing in a pipeline

The variable x in the first example doesn't get decremented, while in the second example it works. Why?
Non working example:
#!/bin/bash
x=100
f() {
echo $((x--)) | tr 0-9 A-J
# this also wouldn't work: tr 0-9 A-J <<< $((x--))
f
}
f
Working example:
#!/bin/bash
x=100
f() {
echo $x | tr 0-9 A-J
((x--))
# this also works: a=$((x--))
f
}
f
I think it's related to subshells since I think that the individual commands in the pipeline are running in subshells.
It does decrement if you don't use a pipeline (and avoid a sub shell forking):
x=10
f() {
if ((x)); then
echo $((x--))
f
fi
}
Then call it as:
f
it will print:
10
9
8
7
6
5
4
3
2
1
Since decrement is happening inside the subshell hence current shell doesn't see the decremented value of x and goes in infinite recursion.
EDIT: You can try this work around:
x=10
f() {
if ((x)); then
x=$(tr 0-9 A-J <<< $x >&2; echo $((--x)))
f
fi
}
f
To get this output:
BA
J
I
H
G
F
E
D
C
B

Add space between every letter

How can I add spaces between every character or symbol within a UTF-8 document? E.g. 123hello! becomes 1 2 3 h e l l o !.
I have BASH, OpenOffice.org, and gedit, if any of those can do that.
I don't care if it sometimes leaves extra spaces in places (e.g. 2 or 3 spaces in a single place is no problem).
Shortest sed version
sed 's/./& /g'
Output
$ echo '123hello!' | sed 's/./& /g'
1 2 3 h e l l o !
Obligatory awk version
awk '$1=$1' FS= OFS=" "
Output
$ echo '123hello!' | awk '$1=$1' FS= OFS=" "
1 2 3 h e l l o !
sed(1) can do this:
$ sed -e 's/\(.\)/\1 /g' < /etc/passwd
r o o t : x : 0 : 0 : r o o t : / r o o t : / b i n / b a s h
d a e m o n : x : 1 : 1 : d a e m o n : / u s r / s b i n : / b i n / s h
It works well on e.g. UTF-8 encoded Japanese content:
$ file japanese
japanese: UTF-8 Unicode text
$ sed -e 's/\(.\)/\1 /g' < japanese
E X I F 中 の 画 像 回 転 情 報 対 応 に よ り 、 一 部 画 像 ( 特 に 『
$
sed is ok but this is pure bash
string=hello
for ((i=0; i<${#string}; i++)); do
string_new+="${string:$i:1} "
done
Since you have bash, I am will assume that you have access to sed. The following command line will do what you wish.
$ sed -e 's:\(.\):\1 :g' < input.txt > output.txt
I like these solutions because they do not have a trailing space like the rest
here.
GNU awk:
echo 123hello! | awk NF=NF FS=
GNU awk:
echo 123hello! | awk NF=NF FPAT=.
POSIX awk:
echo 123hello! | awk '{while(a=substr($0,++b,1))printf b-1?FS a:a}'
This might work for you:
echo '1 23h ello ! ' | sed 's/\s*/ /g;s/^\s*\(.*\S\)\s*$/\1/;l'
1 2 3 h e l l o !$
1 2 3 h e l l o !
In retrospect a far better solution:
sed 's/\B/ /g' file
Replaces the space between letters with a space.
string='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
echo ${string} | sed -r 's/(.{1})/\1 /g'
Pure POSIX Shell version:
addspace() {
__addspace_str="$1"
while [ -n "${__addspace_str#?}" ]; do
printf '%c ' "$__addspace_str"
__addspace_str="${__addspace_str#?}"
done
printf '%c' "$__addspace_str"
}
Or if you need to put it in a variable:
addspace_var() {
addspace_result=""
__addspace_str="$1"
while [ -n "${__addspace_str#?}" ]; do
addspace_result="$addspace_result${__addspace_str%${__addspace_str#?}} "
__addspace_str="${__addspace_str#?}"
done
addspace_result="$addspace_result$__addspace_str"
}
addspace_var abc
echo "$addspace_result"
Tested with dash, ksh, zsh, bash (+ bash --posix), and busybox ash.
Explanation
${x#?}
This parameter expansion removes the first character of x. ${x#...} in general removes a prefix given by a pattern, and ? matches any single character.
printf '%c ' "$str"
The %c format parameter transforms the string argument into its first character, so the full format string '%c ' prints the first character of the string followed by a space. Note that if the string was empty this would cause issues, but we already checked that it wasn't before, so it's fine. To print the first character safely in any situation we can use '%.1s', but I like living dangerously :3j
${x%${x#?}}
This is an alternate way to get the first character of the string. We already know that ${x#?} is all but the first character. Well, ${x%...} removes ... from the end of x, so ${x%${x#?}} removes all but the first character from the end of x, leaving only the first one.
__prefixed_variable_names
POSIX doesn't define local, so to avoid variable conflicts it's safer to create unique names that are unlikely to clobber each other. I am starting to experiment using M4 to generate unique names while not having to destroy my code every time but it's probably overkill for people who don't use shell as much as me.
[ -n "${str#?}" ]
Why not just [ -n "$str" ]? It's to avoid the dreaded trailing space, it's also why we have a little statement guy at the bottom there outside the loop. The loops goes until the string is one character long, then we finish outside of it so we can append this last character without adding a space.
When should I use this?
This is good for small inputs in long running loops, since it avoids the overhead of calling an external process, but for larger inputs it starts lagging behind fast, specially the var version. (I fault the ${x%${x#?}} trick).
Benchmark Commands
# addspace
time dash -c ". ./addspace.sh; for x in $(seq -s ' ' 1 10000); do addspace \"$input\" >/dev/null; done"
# addspace_var
time dash -c ". ./addspace.sh; for x in $(seq -s ' ' 1 10000); do addspace_var \"$input\" >/dev/null; done"
# sed for comparison
time dash -c ". ./addspace.sh; for x in $(seq -s ' ' 1 10000); do echo \"$input\" | sed 's/./& /g' >/dev/null; done"
Input Length = 3
addspace addspace_var sed
real 0m0,106s 0m0,106s 0m10,651s
user 0m0,077s 0m0,075s 0m9,349s
sys 0m0,029s 0m0,031s 0m3,030s
Input Length = 200
addspace addspace_var sed
real 0m6,050s 0m47,115s 0m11,049s
user 0m5,557s 0m46,919s 0m9,727s
sys 0m0,488s 0m0,068s 0m3,085s
Input Length = 1000
addspace addspace_var sed
real 0m55,989s TBD 0m11,534s
user 0m53,560s TBD 0m10,214s
sys 0m2,428s TBD 0m2,975s
(Yeah, I was waiting a bit for that last var one.)
In situations like this you can simply check the length of the input and call the appropriate function for maximum performance.
addspace() {
if [ ${#1} -lt 100 ]; then
addspace_builtins "$1"
else
addspace_proccess "$1"
fi
}

BASH: Iterating two v ariables in a for loop

I am having two files numbers.txt(1 \n 2 \n 3 \n 4 \n 5 \n) and alpha.txt (a \n n \n c \n d \n e \n)
Now I want to iterate both the files at the same time something like.
for num in `cat numbers.txt` && alpha in `cat alpha.txt`
do
echo $num "blah" $alpha
done
Or other idea I was having is
for num in `cat numbers.txt`
do
for alpha in `cat alpha.txt`
do
echo $num 'and' $alpha
break
done
done
but this kind of code always take the first value of $alpha.
I hope my problem is clear enough.
Thanks in advance.
Here it is what I actually intended to do. (Its just an example)
I am having one more file say template.txt having content.
variable1= NUMBER
variable2= ALPHA
I wanted to take the output from two files i.e numbers.txt and alpha.txt(one line from both at a time) and want to replace the NUMBER and ALPHA with the respective content from those two files.
so here it what I did as i got to know how to iterate both files together.
paste number.txt alpha.txt | while read num alpha
do
cp template.txt temp.txt
sed -i "{s/NUMBER/$num/g}" temp.txt
sed -i "{s/ALPHA/$alpha/g}" temp.txt
cat temp.txt >> final.txt
done
Now what i am having in final.txt is:
variable1= 1
variable2= a
variable1= 2
variable2= b
variable1= 3
variable2= c
variable1= 4
variable2= d
variable1= 5
variable2= e
variable1= 6
variable2= f
variable1= 7
variable2= g
variable1= 8
variable2= h
variable1= 9
variable2= i
variable1= 10
variable2= j
Its very simple and stupid approach. I wanted to know is there any other way to do this??
Any suggestion will be appreciated.
No, your question isn't clear enough. Specifically, the way you wish to iterate through your files is unclear, but assuming you want to have an output such as:
1 blah a
2 blah b
3 blah c
4 blah d
5 blah e
you can use the paste utility, like this:
paste number.txt alpha.txt | while read alpha num ; do
echo "$num and $alpha"
done
or even:
paste -d# alpha num | sed 's/#/ blah /'
Your first loop is impossible in bash. Your second one, without the break, would combine each line from numbers.txt with each line from alpha.txt, like this:
1 AND a
1 AND n
1 AND c
...
2 AND a
...
3 AND a
...
4 AND a
...
Your break makes it skip all lines from the alpha.txt, except the 1st one (bmk has already explained it in his answer)
It should be possible to organize the correct loop using the while loop construction, but it would be rather ugly.
There're lots of easier alternatives which maybe a better choice, depending on specifics of your task. For example, you could try this:
paste numbers.txt alpha.txt
or, if you really want your "AND"s, then, something like this:
paste numbers.txt alpha.txt | sed 's/\t/ AND /'
And if your numbers are really sequential (and you can live without 'AND'), you can simply do:
cat -n alpha.txt
Here is an alternate solution according to the first model you suggested:
while read -u 5 a && read -u 6 b
do
echo $a $b
done 5<numbers.txt 6<alpha.txt
The notation 5<numbers.txt tells the shell to open numbers.txt using file descriptor 5. read -u 5 a means read from a value for a from file descriptor 5, which has been associated with numbers.txt.
The advantage of this approach over paste is that it gives you fine-grain control over how you merge the two files. For example you could read one line from the first file and twice from the second file.
In your second example the inner loop is executed only once because of the break. It will simply jump out of the loop, i.e. you will always only get the first element of alpha.txt. Therefore I think you should remove it:
for num in `cat numbers.txt`
do
for alpha in `cat alpha.txt`
do
echo $num 'and' $alpha
done
done
If multiple loop isn't specifically your requirement but getting corresponding lines is then you may try the following code:
for line in `cat numbers.txt`
do
echo $line "and" $(cat alpha.txt| head -n$line | tail -n1 )
done
The head gets you the number of lines equal to the value of line and tail gets you the last element.
#tollboy, I think the answer you are looking for is this:
count=1
for item in $(paste number.txt alpha.txt); do
if [[ "${item}" =~ [a-zA-Z] ]]; then
echo "variable${count}= ${item}" >> final.txt
elif [[ "${item}" =~ [0-9] ]]; then
echo "variable${count}= ${item}" >> final.txt
fi
count=$((count+1))
done
When you type paste number.txt alpha.txt in your console, you see:
1 a
2 b
3 c
4 d
5 e
6 f
7 g
8 h
9 i
10 j
From bash's point of view $(paste number.txt alpha.txt) it looks like this:
1 a 2 b 3 c 4 d 5 e 6 f 7 g 8 h 9 i 10 j
So for each item in that list, figure out if it is alpha or numeric, and print it to the output file.
Lastly, increment the count.

Resources