Replace the second entry in all lines containing a specific string in bash - bash

I would like to replace the second entry of all (space or tab-separated) lines containing a specific string. In the following text
1078.732700000 0.00001000 1 0 0 39 13 27 0 0 0 40 14 26 81SaWoLa.43 BAD LABEL, REASSIGNED
-1077.336700000 0.00001000 1 0 0 45 12 34 0 0 0 46 13 34 81SaWoLa.48 BAD LABEL
I would like to replace 0.00001000 by 0.00005214 (searching for the string 81SaWoLa). The result should be
1078.732700000 0.00005214 1 0 0 39 13 27 0 0 0 40 14 26 81SaWoLa.43 BAD LABEL, REASSIGNED
-1077.336700000 0.00005214 1 0 0 45 12 34 0 0 0 46 13 34 81SaWoLa.48 BAD LABEL
Could you please help me with this?

Try this for GNU sed:
$ cat input.txt
1078.732700000 0.00001000 1 0 0 39 13 27 0 0 0 40 14 26 81SaWoLa.43 BAD LABEL, REASSIGNED
-1077.336700000 0.00001000 1 0 0 45 12 34 0 0 0 46 13 34 81SaWoLa.48 BAD LABEL
$ sed -r '/81SaWoLa/s/^([^ \t]+[ \t]+)[^ \t]+(.*)/\10.00005214\2/' input.txt
1078.732700000 0.00005214 1 0 0 39 13 27 0 0 0 40 14 26 81SaWoLa.43 BAD LABEL, REASSIGNED
-1077.336700000 0.00005214 1 0 0 45 12 34 0 0 0 46 13 34 81SaWoLa.48 BAD LABEL

You may use awk to achieve your goal more easily.
$ awk '/81SaWoLa/{$2="0.00005214"}1' file
1078.732700000 0.00005214 1 0 0 39 13 27 0 0 0 40 14 26 81SaWoLa.43 BAD LABEL, REASSIGNED
-1077.336700000 0.00005214 1 0 0 45 12 34 0 0 0 46 13 34 81SaWoLa.48 BAD LABEL
To use shell variables, pass them to awk with -v:
$ var1=81SaWoLa
$ var2=0.00005214
$ awk -v var1="$var1" -v var2="$var2" '$0 ~ var1{$2=var2}1' file
1078.732700000 0.00005214 1 0 0 39 13 27 0 0 0 40 14 26 81SaWoLa.43 BAD LABEL, REASSIGNED
-1077.336700000 0.00005214 1 0 0 45 12 34 0 0 0 46 13 34 81SaWoLa.48 BAD LABEL

Use awk:
$ awk '/pattern/ { $2 = "new string." }; 1' input.txt
In your case:
$ awk '/81SaWoLa/ { $2 = "0.00005214" }; 1' input.txt
#      ^^^^^^^^^^   ^^^^^^^^^^^^^^^^^    ^
#      For lines    Replace second       Print each line, same as:
#      matching     column with          ``{ print $0 }''. ``$0''
#      81SaWoLa     ``0.00005214''       contains the whole line.
Use -v name=val to specify a variable:
$ awk -v pat="81SaWoLa" \
-v rep="0.00005214" \
'$0 ~ pat { $2 = rep }; 1' input.txt
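One caveat with both the `/81SaWoLa/` and the `$0 ~ pat` forms is that the search string is treated as a regular expression, so a `.` in it matches any character. A minimal sketch, assuming the string should be matched literally, uses awk's `index()` instead. The function name `replace_second_field` and the shortened sample lines are made up for illustration.

```shell
# Hypothetical helper: on every line that contains the literal string $1
# (no regex interpretation), replace field 2 with $2.
replace_second_field() {
    awk -v pat="$1" -v rep="$2" 'index($0, pat) { $2 = rep } 1'
}

# Shortened sample lines; only the first one contains the search string.
printf '%s\n' \
    '1078.7327 0.00001000 1 0 81SaWoLa.43 BAD LABEL' \
    '1078.7327 0.00001000 1 0 other.1' |
    replace_second_field 81SaWoLa 0.00005214
```

Assigning to `$2` makes awk rebuild the line with the output field separator (a single space by default), so tab-separated input comes back space-separated unless something like `-F'\t' -v OFS='\t'` is added. Note also that `-v` still processes backslash escapes in the value.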

Related

Sort text file after the 16th column with p-values in bash

I have a tab-delimited text file with 18 columns and more than 300000 rows. It also has a header line. I would like to sort the whole file by the 16th column, which contains p-values, so that the lowest p-values come first and the header line stays where it is.
I already have a code, it doesn't give me any error message, but it only shows the header line in the output file, and nothing else.
Here is my file:
filename CHROM ID x11_CT x12_CT CT1 CT2 SampleSize x21_CT x21 x22_CT x22 x11 x12 chIGSFA P_value GZD ZGSR
V1003 1 rs3131972 212 1 1068 14 541 856 0.791127541589649 13 0.0120147874306839 0.195933456561922 0.000924214417744917 0.70567673346914 0.400882778478405 0.00649170940375354 0.0361163844076152
V1003 1 rs3131962 170 1 1066 14 540 896 0.82962962962963 13 0.012037037037037 0.157407407407407 0.000925925925925926 0.40191966550969 0.526099523335894 0.00450617283950613 0.027281782875571
V1003 1 rs12562034 128 0 1068 14 541 940 0.868761552680222 14 0.0129390018484288 0.118299445471349 0 0.951515008754774 0.329333964471109 0.00612270697448755 0.041938142300103
V1003 1 rs12131377 78 0 1060 14 537 982 0.914338919925512 14 0.0130353817504655 0.0726256983240224 0 0.555433052966582 0.456106209942983 0.0037868148101911 0.0321609387794883
Output should look like this:
filename CHROM ID x11_CT x12_CT CT1 CT2 SampleSize x21_CT x21 x22_CT x22 x11 x12 chIGSFA P_value GZD ZGSR
V1003 1 rs12562034 128 0 1068 14 541 940 0.868761552680222 14 0.0129390018484288 0.118299445471349 0 0.951515008754774 0.329333964471109 0.00612270697448755 0.041938142300103
V1003 1 rs3131972 212 1 1068 14 541 856 0.791127541589649 13 0.0120147874306839 0.195933456561922 0.000924214417744917 0.70567673346914 0.400882778478405 0.00649170940375354 0.0361163844076152
V1003 1 rs12131377 78 0 1060 14 537 982 0.914338919925512 14 0.0130353817504655 0.0726256983240224 0 0.555433052966582 0.456106209942983 0.0037868148101911 0.0321609387794883
V1003 1 rs3131962 170 1 1066 14 540 896 0.82962962962963 13 0.012037037037037 0.157407407407407 0.000925925925925926 0.40191966550969 0.526099523335894 0.00450617283950613 0.027281782875571
Here is my code:
awk 'NR==1; NR > 1 {print $0 | "sort -g -rk 16,16"}' file.txt > file_out.txt
I'm guessing your sort doesn't have a -g option and so it's failing and not producing any output. Try this instead just using POSIX options:
$ awk 'NR==1; NR > 1 {print | "sort -nrk 16,16"}' file
filename CHROM ID x11_CT x12_CT CT1 CT2 SampleSize x21_CT x21 x22_CT x22 x11 x12 chIGSFA P_value GZD ZGSR
V1003 1 rs3131962 170 1 1066 14 540 896 0.82962962962963 13 0.012037037037037 0.157407407407407 0.000925925925925926 0.40191966550969 0.526099523335894 0.00450617283950613 0.027281782875571
V1003 1 rs12131377 78 0 1060 14 537 982 0.914338919925512 14 0.0130353817504655 0.0726256983240224 0 0.555433052966582 0.456106209942983 0.0037868148101911 0.0321609387794883
V1003 1 rs3131972 212 1 1068 14 541 856 0.791127541589649 13 0.0120147874306839 0.195933456561922 0.000924214417744917 0.70567673346914 0.400882778478405 0.00649170940375354 0.0361163844076152
V1003 1 rs12562034 128 0 1068 14 541 940 0.868761552680222 14 0.0129390018484288 0.118299445471349 0 0.951515008754774 0.329333964471109 0.00612270697448755 0.041938142300103
Would you please try the following:
cat <(head -n 1 file.txt) <(tail -n +2 file.txt | sort -nk16,16) > file_out.txt
Using GNU awk (for array sorting):
awk 'NR==1 { print;next } { map[$16][$3]=$0 } END { PROCINFO["sorted_in"]="@ind_num_asc";for(i in map) { for(j in map[i]) { print map[i][j] } } }' file
Explanation
awk 'NR==1 {
print;next # Header record, print and skip to the next line
}
{
map[$16][$3]=$0 # Non-header line - store the line in a two-dimensional array indexed by the 16th field (the p-value) and then by ID (assuming that it is unique in the file)
}
END { PROCINFO["sorted_in"]="@ind_num_asc"; # Traverse arrays by index, compared numerically, in ascending order
for(i in map) { # Outer loop: p-values, lowest first
for(j in map[i]) {
print map[i][j] # Loop through the array printing the stored lines
}
}
}' file
I suggest you try the following script:
#!/bin/bash
head -n 1 file.txt > file_out.txt
tail -n +2 file.txt | sort -k 16 >> file_out.txt
This works on your sample output once the blanks are converted to tabs.
awk to the rescue!
$ awk 'NR==1; NR>1{print | "sort -k16n"}' file | column -t
filename CHROM ID x11_CT x12_CT CT1 CT2 SampleSize x21_CT x21 x22_CT x22 x11 x12 chIGSFA P_value GZD ZGSR
V1003 1 rs12562034 128 0 1068 14 541 940 0.868761552680222 14 0.0129390018484288 0.118299445471349 0 0.951515008754774 0.329333964471109 0.00612270697448755 0.041938142300103
V1003 1 rs3131972 212 1 1068 14 541 856 0.791127541589649 13 0.0120147874306839 0.195933456561922 0.000924214417744917 0.70567673346914 0.400882778478405 0.00649170940375354 0.0361163844076152
V1003 1 rs12131377 78 0 1060 14 537 982 0.914338919925512 14 0.0130353817504655 0.0726256983240224 0 0.555433052966582 0.456106209942983 0.0037868148101911 0.0321609387794883
V1003 1 rs3131962 170 1 1066 14 540 896 0.82962962962963 13 0.012037037037037 0.157407407407407 0.000925925925925926 0.40191966550969 0.526099523335894 0.00450617283950613 0.027281782875571
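The answers above differ mainly in the sort flags. As a hedged sketch of the header-preserving approach: `-n` does not understand exponents such as 1e-05, which p-values often use, while `-g` (general numeric) does but is a GNU/BSD extension, not POSIX. The helper name `sort_keep_header` is made up, and the file is assumed to be tab-separated as described in the question.

```shell
# Hypothetical helper: print the header of tab-separated file $1 unchanged,
# then the remaining rows sorted ascending by column $2.
sort_keep_header() {
    tab=$(printf '\t')
    head -n 1 "$1"
    # -g (general numeric) is a GNU/BSD extension; it also parses 1e-05.
    # LC_ALL=C keeps "." as the decimal point regardless of locale.
    tail -n +2 "$1" | LC_ALL=C sort -t "$tab" -k "$2,$2g"
}

# Usage, with the file names from the question:
# sort_keep_header file.txt 16 > file_out.txt
```

Restricting the key to `16,16` matters: a bare `-k 16` would extend the key from column 16 to the end of the line.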

Need help to find average, min and max values in shell script from text file (again)

This is an update to a question I posted before. I've gotten a little farther into this but need help with a new problem.
I'm working on a shell script right now. I need to loop through a text file, grab the text from it, and find the average number, max number and min number from each line of numbers then print them in a chart with the name of each line. This is the text file:
Experiment1 9 8 1 2 9 0 2 3 4 5
collect1 83 39 84 2 1 3 0 9
jump1 82 -1 9 26 8 9
exp2 22 0 7 1 0 7 3 2
jump2 88 7 6 5
taker1 5 5 44 2 3
This is my code so far. It should be working, but it won't do any of the calculations. The first loop grabs the line of text and the second loop separates the name from the numbers; these two work. The third loop takes the numbers and does the calculations. It keeps giving me an error saying "expr: non integer argument". Why is it doing that?
#!/bin/bash
while read line
do
echo $line | while read first second
do
echo $first
echo $second
sum=0
max=0
min=0
len=0
for arg in $second
do
sum=`expr $sum + $arg`
if [ $min > $arg ]
then
set min=$arg
fi
if [ $max < $arg ]
then
set max=$arg
fi
len=`expr $len + 1`
done
avg=`expr $sum / $len`
echo $avg
echo $min
echo $max
done
done < mystats.txt
This is the desired output when you type "bash statcalc.sh -s name mystats.txt"
Experiment Name Average Max Min
collect1 27 84 0
exp2 5 22 0
Experiment1 3 9 0
jump1 21 82 -1
jump2 31 88 5
taker1 13 44 2
Using awk
awk 'NR==1 {print "Experiment Name Average Max Min"} {sum=0; max=$2; min=$2; for(i=2;i<=NF;i++) {sum+=$i; if ($i>max) max=$i; if ($i<min) min=$i} print $1, int(sum/(NF-1)), max, min}' file.txt
Demo :
$awk 'NR==1 {print "Experiment Name Average Max Min"} {sum=0; max=$2; min=$2; for(i=2;i<=NF;i++) {sum+=$i; if ($i>max) max=$i; if ($i<min) min=$i} print $1, int(sum/(NF-1)), max, min}' file.txt | column -t
Experiment Name Average Max Min
Experiment1 4 9 0
collect1 27 84 0
jump1 22 82 -1
exp2 5 22 0
jump2 26 88 5
taker1 11 44 2
$cat file.txt
Experiment1 9 8 1 2 9 0 2 3 4 5
collect1 83 39 84 2 1 3 0 9
jump1 82 -1 9 26 8 9
exp2 22 0 7 1 0 7 3 2
jump2 88 7 6 5
taker1 5 5 44 2 3
$
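For comparison, here is a pure-shell sketch of the same calculation. It assumes integer-only input, as in the sample file, and `line_stats` is a made-up name. Two things differ from the script in the question: inside `[ ]` an unquoted `>` or `<` is a redirection, so the numeric tests use `-gt` and `-lt`, and variables are assigned with `min=$arg`, since `set min=$arg` sets the positional parameters instead.

```shell
# Hypothetical helper: read "name n1 n2 ..." lines on stdin and print
# "name average max min" for each line (integer arithmetic only).
line_stats() {
    local name rest n sum len max min
    while read -r name rest; do
        sum=0 len=0 max= min=
        for n in $rest; do
            sum=$((sum + n))
            len=$((len + 1))
            # -gt / -lt compare integers; a bare > or < inside [ ] would redirect
            if [ -z "$max" ] || [ "$n" -gt "$max" ]; then max=$n; fi
            if [ -z "$min" ] || [ "$n" -lt "$min" ]; then min=$n; fi
        done
        # Skip lines without numbers to avoid dividing by zero
        if [ "$len" -gt 0 ]; then
            echo "$name $((sum / len)) $max $min"
        fi
    done
}

printf '%s\n' 'jump1 82 -1 9 26 8 9' 'jump2 88 7 6 5' | line_stats
```

With the file from the question this could be run as `line_stats < mystats.txt | sort | column -t`; the header line would still have to be printed separately.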

How to replace a character in a row based on the numbers in a column as the index of that character?

I have the following file:
#0035e19a-bf41-43ee-b01e-f386c5d9969b
TAGTATATTTTGTTTAGTTATGTTGGGTGGTGATTTTATGAGTTTTTGTTATTTATGAAA
&$'&&%'&')-1:96)$$$'##&'%&2&&&:?9537=&&*&<6CC##
2 0255 0 39 216 255 255
3 0254 1 19 236 255 255
7 0255 0 42 213 255 255
10 0255 0 61 194 255 255
15 0255 0 1 254 255 255
I want to replace letters in the second row with "C" based on the numbers in the first column after the 3rd row (2,3,7,10,15) as their index.
output like this:
#0035e19a-bf41-43ee-b01e-f386c5d9969b
TCCTATCTTCTGTTCAGTTATGTTGGGTGGTGATTTTATGAGTTTTTGTTATTTATGAAA
&$'&&%'&')-1:96)$$$'##&'%&2&&&:?9537=&&*&<6CC##
2 0255 0 39 216 255 255
3 0254 1 19 236 255 255
7 0255 0 42 213 255 255
10 0255 0 61 194 255 255
15 0255 0 1 254 255 255
I know how to replace for example for one number as an index but as my table and second row are long it is not possible to do it one by one for each number as an index.
Thank you in advance
The following script with comments inside:
# input copied from your post; the quoted 'EOF' stops the shell from expanding $$ in the third line
cat <<'EOF' >file
#0035e19a-bf41-43ee-b01e-f386c5d9969b
TAGTATATTTTGTTTAGTTATGTTGGGTGGTGATTTTATGAGTTTTTGTTATTTATGAAA
&$'&&%'&')-1:96)$$$'##&'%&2&&&:?9537=&&*&<6CC##
2 0255 0 39 216 255 255
3 0254 1 19 236 255 255
7 0255 0 42 213 255 255
10 0255 0 61 194 255 255
15 0255 0 1 254 255 255
EOF
# this will be a script executed with sed
sedscript=$(
# get all lines except 3 first lines
<file tail -n+4 |
# extract first field
cut -d' ' -f1 |
# for each field
# printf the command for sed:
# in the second line, substitute
# the character at the position given by the field
# taken from https://stackoverflow.com/questions/9318021/change-string-char-at-index-x
xargs printf "2s/./C/%d\n"
)
# execute the script on the file
sed -i "$sedscript" file
cat file
will output:
#0035e19a-bf41-43ee-b01e-f386c5d9969b
TCCTATCTTCTGTTCAGTTATGTTGGGTGGTGATTTTATGAGTTTTTGTTATTTATGAAA
&$'&&%'&')-1:96)$$$'##&'%&2&&&:?9537=&&*&<6CC##
2 0255 0 39 216 255 255
3 0254 1 19 236 255 255
7 0255 0 42 213 255 255
10 0255 0 61 194 255 255
15 0255 0 1 254 255 255
Tested on tutorialspoint.
I create a sed script made of lines of the form 2s/./C/<number>. Each such command replaces the character at position <number> of the second line with C. One line is generated per index, and the resulting script is then run with sed.
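To make the generated script concrete, the following sketch prints it before applying it. It uses a cut-down version of the input; the helper name `build_sed_script` and the temporary file are assumptions for illustration only.

```shell
# Hypothetical helper: emit one "2s/./C/N" command per index found in the
# first column of the table (line 4 onward) of file $1.
build_sed_script() {
    tail -n +4 "$1" | cut -d' ' -f1 | xargs printf '2s/./C/%d\n'
}

demo=$(mktemp)
printf '%s\n' '#id' 'TAGTATATTT' '&$$$' '2 0255' '3 0254' '7 0255' > "$demo"

build_sed_script "$demo"                    # show the generated sed commands
sed "$(build_sed_script "$demo")" "$demo"   # apply them without -i
rm -f "$demo"
```

The numeric flag in `s/./C/N` tells sed to replace only the N-th match, and since `.` matches every character, the N-th match is the N-th character. Each substitution keeps the line length unchanged, so later indices stay valid.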
$ cat tst.awk
{ rec[NR] = $0 }
NR > 3 { rec[2] = substr(rec[2],1,$1-1) "C" substr(rec[2],$1+1) }
END {
for (i=1; i<=NR; i++) {
print rec[i]
}
}
$ awk -f tst.awk file
#0035e19a-bf41-43ee-b01e-f386c5d9969b
TCCTATCTTCTGTTCAGTTATGTTGGGTGGTGATTTTATGAGTTTTTGTTATTTATGAAA
&$'&&%'&')-1:96)$$$'##&'%&2&&&:?9537=&&*&<6CC##
2 0255 0 39 216 255 255
3 0254 1 19 236 255 255
7 0255 0 42 213 255 255
10 0255 0 61 194 255 255
15 0255 0 1 254 255 255
Here's an answer using just bash:
#!/bin/bash
declare -a lines
lines=()
(
while IFS= read -r line
do
if [ "${line:0:1}" = "#" ]
then
for li in "${lines[@]}"
do
echo -e "$li"
done
unset lines
lines=()
lines+=("$line") # #...
IFS= read -r line
lines+=("$line") # TCA...
IFS= read -r line
lines+=("$line") # &...
else
lines+=("$line") # <num>...
pos="${line/\ */}" #strip from first space to leave number
bpos="$((pos-1))"
# using acta as intermediary to not lose my head
acta="${lines[1]}"
acta="${acta:0:bpos}C${acta:$pos}" # substitute C
lines[1]="$acta"
fi
done < "$1"
for li in "${lines[@]}"
do
echo -e "$li"
done
)
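The core of the bash answer is the substring expansion that splices a "C" into the sequence line. Isolated as a small sketch (bash only; the function name `replace_at` and the short sample sequence are made up for illustration):

```shell
# Hypothetical helper: replace the character at 1-based position $2 of
# string $1 with "C", using bash substring expansion.
replace_at() {
    local s=$1 pos=$2
    # ${s:0:pos-1} is everything before the position, ${s:pos} everything after
    printf '%s\n' "${s:0:pos-1}C${s:pos}"
}

dna=TAGTATATTT
for p in 2 3 7; do
    dna=$(replace_at "$dna" "$p")
done
echo "$dna"
```

Because bash offsets are 0-based while the table uses 1-based positions, the prefix has length `pos-1` and the suffix starts at offset `pos`.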
Shell script composed of descriptive functions
Here's a solution with a shell script containing various functions with (hopefully) helpful names.
Usage
Assuming the file is saved as input.txt and the shell script below is named script.sh, you can generate the output with:
$ ./script.sh input.txt
Actual Script
#!/bin/sh
read_file_input_output_var() {
output_var="$(cat "$1")"
}
get_table() {
tail -n +4 $1
}
get_first_column() {
cut -d ' ' -f 1
}
replace_character_at_column() {
sed_str="2s/./C/$1"
sed $sed_str
}
replace_characters_in_output_var() {
while read i
do
output_var="$(echo "$output_var" | replace_character_at_column $i)"
done
}
print_result() {
echo "$output_var"
}
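main() {
read_file_input_output_var "$1"
# The last stage of a pipeline usually runs in a subshell, so print the
# result there, before the modified output_var is lost
get_table "$1" |
get_first_column |
{
replace_characters_in_output_var
print_result
}
}
main "$1"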

R plotting boxplot with different amount of entries

I have a 50x2 matrix, but the entries in column 2 contain different numbers of values. How can I make a box plot where the x axis is the position and the y axis shows the counts? Ideally, I'd like to take the absolute value of the counts. Thanks in advance!
> mat.count[1:50,]
position count
1 136873135 0
2 136873136 0
3 136873137 0
4 136873138 0
5 136873139 0
6 136873140 -15
7 136873141 0
8 136873142 0
9 136873143 0
10 136873144 0
11 136873145 0
12 136873146 0
13 136873147 0
14 136873148 0
15 136873149 0
16 136873150 0
17 136873151 0
18 136873152 0
19 136873153 0
20 136873154 0
21 136873155 0
22 136873156 0
23 136873157 0
24 136873158 0
25 136873159 0
26 136873160 0
27 136873161 0
28 136873162 0
29 136873163 0
30 136873164 0
31 136873165 0
32 136873166 0
33 136873167 0
34 136873168 -1
35 136873169 0
36 136873170 0
37 136873171 0
38 136873172 0
39 136873173 -70
40 136873174 -66
41 136873175 -73,-1,-1,-1,-73,-1
42 136873176 -52
43 136873177 0
44 136873178 0
45 136873179 -66,-1
46 136873180 -1
47 136873181 0
48 136873182 -68,-75
49 136873183 -67,-67
50 136873184 -60,-56,-56

Maths in a while loop causing random negative numbers

So I have done this in both python and bash, and the code I am about to post probably has a world of things wrong with it but it is generally very basic and I cannot see a reason that it would cause this 'bug' which I will explain soon.. I have done the same in Python, but much more professionally and cleanly and it also causes this error (at some point, the maths generates a negative number, which makes no sense.)
#!/bin/bash
while [ 1 ];
do
zero=0
ARRAY=()
ARRAY2=()
first=`command to generate a list of numbers`
sleep 1
second=`command to generate a list of numbers`
# so now we have two data sets, 1 second between the capture of each.
for i in $first;
do
ARRAY+=($i)
done
for i in $second;
do
ARRAY2+=($i)
done
for (( c=$zero; c<=${#ARRAY2[@]}; c++ ))
do
expr ${ARRAY2[$c]} - ${ARRAY[$c]}
done
ARRAY=()
ARRAY2=()
zero=0
c=0
first=``
second=``
math=''
done
So the script grabs a set of data, waits 1 second, grabs it again, does math on the two sets to get the difference, that difference is printed. It's very simple, and I have done it elegantly in Python too - no matter how I would do it every now and then, could be anywhere from 3 loops in to 30 loops in, we will get negative numbers.. like so:
START 0 0 0 0 0 19 10 563 0
-34 19 14 2 0
-1302 1198
-532 639
-1078 1119 1 0 0
-843 33 880 0 5
-8
-13508 8773 4541 988 181
-12
-205 217
-9 7 1
-360 303 60 1 0 0
-12
-96 98 3
-870 904
-130
-2105 2264 6
-3084 1576 1650
-939 971
-2249 1150 1281
-693 9 513 142 76 expr: syntax error
Please help, I simply can't find anything about this.
Sample OUTPUT as requested:
ARRAY1 OUTPUT
1 15 1 25 25 1 2 1 3541 853 94567 42 5 1 351 51 1 11 1 13 7 14 12 3999 983 5 1938 3 8287 40 1 1 1 5253 706 1 1 1 1 5717 3 50 1 85 100376 17334 4655 1 1345 2 1 16 1777 1 3 38 23 8 32 47 781 947 1 1 206 9 1 3 2 81 2602 7 158 1 1 43 91 1 120 6589 6 2534 1092 1 6014 7 2 2 37 1 1 1 80 2 1 1270 15448 66 1 10238 1 10794 16061 4 1 1 1 9754 5617 1123 926 3 24 10 16
ARRAY2 OUTPUT
1 15 1 25 25 1 2 1 3555 859 95043 42 5 1 355 55 1 11 1 13 7 14 12 4015 987 5 1938 3 8335 40 1 1 1 5280 706 1 1 1 1 5733 3 50 1 85 100877 17396 4691 1 1353 2 1 16 1782 1 3 38 23 8 32 47 787 947 1 1 206 9 1 3 2 81 2602 7 159 1 1 43 91 1 120 6869 6 2534 1092 1 6044 7 2 2 37 1 1 1 80 2 1 1270 15563 66 1 10293 1 10804 16134 4 1 1 1 9755 5633 1135 928 3 24 10 16
START
The answer lies in Russell Uhl's comment above. Your loop runs one time too many (this is your code):
for (( c=$zero; c<=${#ARRAY2[@]}; c++ ))
do
expr ${ARRAY2[$c]} - ${ARRAY[$c]}
done
To fix, you need to change the test condition from c <= ${#ARRAY2[@]} to c < ${#ARRAY2[@]}:
for (( c=$zero; c < ${#ARRAY2[@]}; c++ ))
do
echo $((${ARRAY2[$c]} - ${ARRAY[$c]}))
done
I've also changed the expr to use arithmetic evaluation builtin $((...)).
The test script (sum.sh):
#!/bin/bash
zero=0
ARRAY=()
ARRAY2=()
first="1 15 1 25 25 1 2 1 3541 853 94567 42 5 1 351 51 1 11 1 13 7 14 12 3999 983 5 1938 3 8287 40 1 1 1 5253 706 1 1 1 1 5717 3 50 1 85 100376 17334 4655 1 1345 2 1 16 1777 1 3 38 23 8 32 47 7
second="1 15 1 25 25 1 2 1 3555 859 95043 42 5 1 355 55 1 11 1 13 7 14 12 4015 987 5 1938 3 8335 40 1 1 1 5280 706 1 1 1 1 5733 3 50 1 85 100877 17396 4691 1 1353 2 1 16 1782 1 3 38 23 8 32 47
for i in $first; do
ARRAY+=($i)
done
# Alternately as chepner suggested:
ARRAY2=($second)
for (( c=$zero; c < ${#ARRAY2[@]}; c++ )); do
echo -n $((${ARRAY2[$c]} - ${ARRAY[$c]})) " "
done
Running it:
samveen@precise:/tmp$ echo $BASH_VERSION
4.2.25(1)-release
samveen@precise:/tmp$ bash sum.sh
0 0 0 0 0 0 0 0 14 6 476 0 0 0 4 4 0 0 0 0 0 0 0 16 4 0 0 0 48 0 0 0 0 27 0 0 0 0 0 16 0 0 0 0 501 62 36 0 8 0 0 0 5 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 280 0 0 0 0 30 0 0 0 0 0 0 0 0 0 0 0 115 0 0 55 0 10 73 0 0 0 0 1 16 12 2 0 0 0 0
EDIT:
* Added improvements from suggestions in comments.
I think the problem has to be when the two arrays don't have the same size. It's easy to reproduce that syntax error -- one of the operands for the minus operator is an empty string:
$ a=5; b=3; expr $a - $b
2
$ a=""; b=3; expr $a - $b
expr: syntax error
$ a=5; b=""; expr $a - $b
expr: syntax error
$ a=""; b=""; expr $a - $b
-
Try
ARRAY=( $(command to generate a list of numbers) )
sleep 1
ARRAY2=( $(command to generate a list of numbers) )
if (( ${#ARRAY[@]} != ${#ARRAY2[@]} )); then
echo "error: different size arrays!"
echo "ARRAY: ${#ARRAY[@]} (${ARRAY[*]})"
echo "ARRAY2: ${#ARRAY2[@]} (${ARRAY2[*]})"
fi
"The error occurs whenever the first array is smaller than the second" -- of course. You're looping from 0 to the array size of ARRAY2. When ARRAY has fewer elements, you'll eventually try to access an index that does not exist in the array. When you try to reference an unset variable, bash gives you the empty string.
$ a=(1 2 3)
$ b=(4 5 6 7)
$ i=2; expr ${a[i]} - ${b[i]}
-3
$ i=3; expr ${a[i]} - ${b[i]}
expr: syntax error
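Putting the two points from the answers together (stop at index length minus one, and refuse to subtract when the captures differ in size), a minimal bash sketch could look like this. The function name `diff_lists` is made up, and the inputs are assumed to be whitespace-separated integers.

```shell
# Hypothetical helper: print second[i] - first[i] for every index, and
# fail instead of subtracting when the two lists differ in length.
diff_lists() {
    local -a a b
    local i
    a=($1)   # unquoted on purpose: split the list on whitespace
    b=($2)
    if (( ${#a[@]} != ${#b[@]} )); then
        echo "error: ${#a[@]} vs ${#b[@]} elements" >&2
        return 1
    fi
    # Valid indices run from 0 to length-1, hence "<" and not "<="
    for (( i = 0; i < ${#a[@]}; i++ )); do
        echo $(( b[i] - a[i] ))
    done
}

diff_lists "1 15 3541 853" "1 15 3555 859"
```

When the lengths differ, the elements no longer line up by index, so any differences computed from them (including negative ones) would not be meaningful; failing early makes that visible.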
