How to delete lines that do not match the pattern except the first line?
To delete every line except the first, I use
sed '1!d'
To delete lines that do not match a pattern, I use
sed '/pattern/!d'
How can I combine both conditions? Something like
sed '1!/pattern/!d'
doesn't work; sed says unknown command: '/'
For example (with pattern="rcu"),
from the input
PID PPID PRI NI VSZ RSS STAT TIME CMD
1 0 19 0 33664 4832 Ss 00:00:07 /sbin/init splash
2 0 19 0 0 0 S 00:00:00 [kthreadd]
3 2 39 -20 0 0 I< 00:00:00 [rcu_gp]
4 2 39 -20 0 0 I< 00:00:00 [rcu_par_gp]
8 2 39 -20 0 0 I< 00:00:00 [mm_percpu_wq]
9 2 19 0 0 0 S 00:00:04 [ksoftirqd/0]
I want to get
PID PPID PRI NI VSZ RSS STAT TIME CMD
3 2 39 -20 0 0 I< 00:00:00 [rcu_gp]
4 2 39 -20 0 0 I< 00:00:00 [rcu_par_gp]
Thanks
Output first line and all lines which contain rcu:
sed -n '1p; /rcu/p' file
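The delete-style phrasing from the question can also be kept by branching past the delete on line 1. This is only a minimal sketch: the input is a shortened copy of the sample, and the variable names input and result are made up for illustration.

```shell
# Shortened copy of the sample input (variable names are illustrative only).
input='PID PPID PRI NI VSZ RSS STAT TIME CMD
1 0 19 0 33664 4832 Ss 00:00:07 /sbin/init splash
3 2 39 -20 0 0 I< 00:00:00 [rcu_gp]
9 2 19 0 0 0 S 00:00:04 [ksoftirqd/0]'

# On line 1, 'b' without a label jumps to the end of the script, so the
# header is printed as-is. Every other line is deleted unless it matches rcu.
result=$(printf '%s\n' "$input" | sed -e '1b' -e '/rcu/!d')
printf '%s\n' "$result"
```

The two expressions are passed with separate -e options because some sed implementations expect a newline, not a semicolon, after the b command.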
You could also write a script that reads the file, saves the first line (in a temporary file or in a variable) even though it does not match the pattern, deletes the unwanted lines from the rest, and then puts the saved line back in its original place.
Since you tagged awk, this prints only the lines that match rcu (note that it does not keep the header line):
awk '{if ($0~/rcu/) print}'
Demo :
$ cat file1.txt
PID PPID PRI NI VSZ RSS STAT TIME CMD
1 0 19 0 33664 4832 Ss 00:00:07 /sbin/init splash
2 0 19 0 0 0 S 00:00:00 [kthreadd]
3 2 39 -20 0 0 I< 00:00:00 [rcu_gp]
4 2 39 -20 0 0 I< 00:00:00 [rcu_par_gp]
8 2 39 -20 0 0 I< 00:00:00 [mm_percpu_wq]
9 2 19 0 0 0 S 00:00:04 [ksoftirqd/0]
$ awk '{if ($0~/rcu/) print}' file1.txt
3 2 39 -20 0 0 I< 00:00:00 [rcu_gp]
4 2 39 -20 0 0 I< 00:00:00 [rcu_par_gp]
$
If by "except the first" you mean the first line of the input, even if it doesn't match the regexp, then this is what you want:
$ awk '/rcu/ || NR==1' file
PID PPID PRI NI VSZ RSS STAT TIME CMD
3 2 39 -20 0 0 I< 00:00:00 [rcu_gp]
4 2 39 -20 0 0 I< 00:00:00 [rcu_par_gp]
Or, if you mean the first line that doesn't match the regexp, wherever it occurs, then this is what you want:
$ awk '/rcu/ || !c++' file
PID PPID PRI NI VSZ RSS STAT TIME CMD
3 2 39 -20 0 0 I< 00:00:00 [rcu_gp]
4 2 39 -20 0 0 I< 00:00:00 [rcu_par_gp]
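On the sample input both commands print the same thing, because the header is both line 1 and the first non-matching line. The difference only shows when line 1 itself matches. A small sketch with made-up input (the process names and variable names are invented for the illustration):

```shell
# Made-up input whose first line already matches the pattern.
input='10 [rcu_sched]
11 [kthreadd]
12 [rcu_gp]
13 [ksoftirqd/0]'

# Keeps line 1 plus every match; line 1 happens to be a match here.
by_position=$(printf '%s\n' "$input" | awk '/rcu/ || NR==1')

# Keeps every match plus the first line that does NOT match (line 2 here).
# c++ is only evaluated when /rcu/ is false, so c counts non-matching lines.
by_first_miss=$(printf '%s\n' "$input" | awk '/rcu/ || !c++')

printf 'NR==1:\n%s\n!c++:\n%s\n' "$by_position" "$by_first_miss"
```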
Related
I have a tab-delimited text file with 18 columns and more than 300000 rows. It also has a header line. I would like to sort the whole file by the 16th column, which contains p-values, so that the lowest p-values come first and the header line stays where it is.
I already have some code. It doesn't give me any error message, but the output file contains only the header line and nothing else.
Here is my file:
filename CHROM ID x11_CT x12_CT CT1 CT2 SampleSize x21_CT x21 x22_CT x22 x11 x12 chIGSFA P_value GZD ZGSR
V1003 1 rs3131972 212 1 1068 14 541 856 0.791127541589649 13 0.0120147874306839 0.195933456561922 0.000924214417744917 0.70567673346914 0.400882778478405 0.00649170940375354 0.0361163844076152
V1003 1 rs3131962 170 1 1066 14 540 896 0.82962962962963 13 0.012037037037037 0.157407407407407 0.000925925925925926 0.40191966550969 0.526099523335894 0.00450617283950613 0.027281782875571
V1003 1 rs12562034 128 0 1068 14 541 940 0.868761552680222 14 0.0129390018484288 0.118299445471349 0 0.951515008754774 0.329333964471109 0.00612270697448755 0.041938142300103
V1003 1 rs12131377 78 0 1060 14 537 982 0.914338919925512 14 0.0130353817504655 0.0726256983240224 0 0.555433052966582 0.456106209942983 0.0037868148101911 0.0321609387794883
Output should look like this:
filename CHROM ID x11_CT x12_CT CT1 CT2 SampleSize x21_CT x21 x22_CT x22 x11 x12 chIGSFA P_value GZD ZGSR
V1003 1 rs12562034 128 0 1068 14 541 940 0.868761552680222 14 0.0129390018484288 0.118299445471349 0 0.951515008754774 0.329333964471109 0.00612270697448755 0.041938142300103
V1003 1 rs3131972 212 1 1068 14 541 856 0.791127541589649 13 0.0120147874306839 0.195933456561922 0.000924214417744917 0.70567673346914 0.400882778478405 0.00649170940375354 0.0361163844076152
V1003 1 rs12131377 78 0 1060 14 537 982 0.914338919925512 14 0.0130353817504655 0.0726256983240224 0 0.555433052966582 0.456106209942983 0.0037868148101911 0.0321609387794883
V1003 1 rs3131962 170 1 1066 14 540 896 0.82962962962963 13 0.012037037037037 0.157407407407407 0.000925925925925926 0.40191966550969 0.526099523335894 0.00450617283950613 0.027281782875571
Here is my code:
awk 'NR==1; NR > 1 {print $0 | "sort -g -rk 16,16"}' file.txt > file_out.txt
I'm guessing your sort doesn't have a -g option and so it's failing and not producing any output. Try this instead, using only POSIX options. I've also dropped -r so that the lowest p-values come first, as in your expected output:
$ awk 'NR==1; NR > 1 {print | "sort -nk 16,16"}' file
filename CHROM ID x11_CT x12_CT CT1 CT2 SampleSize x21_CT x21 x22_CT x22 x11 x12 chIGSFA P_value GZD ZGSR
V1003 1 rs12562034 128 0 1068 14 541 940 0.868761552680222 14 0.0129390018484288 0.118299445471349 0 0.951515008754774 0.329333964471109 0.00612270697448755 0.041938142300103
V1003 1 rs3131972 212 1 1068 14 541 856 0.791127541589649 13 0.0120147874306839 0.195933456561922 0.000924214417744917 0.70567673346914 0.400882778478405 0.00649170940375354 0.0361163844076152
V1003 1 rs12131377 78 0 1060 14 537 982 0.914338919925512 14 0.0130353817504655 0.0726256983240224 0 0.555433052966582 0.456106209942983 0.0037868148101911 0.0321609387794883
V1003 1 rs3131962 170 1 1066 14 540 896 0.82962962962963 13 0.012037037037037 0.157407407407407 0.000925925925925926 0.40191966550969 0.526099523335894 0.00450617283950613 0.027281782875571
Would you please try the following:
cat <(head -n 1 file.txt) <(tail -n +2 file.txt | sort -nk16,16) > file_out.txt
Using GNU awk (for array sorting):
awk 'NR==1 { print;next } { map[$16][$3]=$0 } END { PROCINFO["sorted_in"]="@ind_num_asc";for(i in map) { for(j in map[i]) { print map[i][j] } } }' file
Explanation
awk 'NR==1 {
    print;next # Header record, print and skip to the next line
}
{
    map[$16][$3]=$0 # Non-header line - store it in a two-dimensional array indexed first by the 16th field (the p-value) and then by ID (assumed to be unique in the file)
}
END { PROCINFO["sorted_in"]="@ind_num_asc"; # Traverse array indices in ascending numeric order, so the outer loop walks the p-values from lowest to highest
    for(i in map) {
        for(j in map[i]) {
            print map[i][j] # Loop through the array printing the values
        }
    }
}' file
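I suggest trying the following script:
#!/bin/bash
head -n 1 file.txt > file_out.txt
tail -n +2 file.txt | sort -n -k 16,16 >> file_out.txt
This reproduces your sample output (after converting the blanks in the posted sample into tabs). The key is restricted to column 16 and compared numerically; a bare -k 16 would compare text from column 16 to the end of the line.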
awk to the rescue!
$ awk 'NR==1; NR>1{print | "sort -k16n"}' file | column -t
filename CHROM ID x11_CT x12_CT CT1 CT2 SampleSize x21_CT x21 x22_CT x22 x11 x12 chIGSFA P_value GZD ZGSR
V1003 1 rs12562034 128 0 1068 14 541 940 0.868761552680222 14 0.0129390018484288 0.118299445471349 0 0.951515008754774 0.329333964471109 0.00612270697448755 0.041938142300103
V1003 1 rs3131972 212 1 1068 14 541 856 0.791127541589649 13 0.0120147874306839 0.195933456561922 0.000924214417744917 0.70567673346914 0.400882778478405 0.00649170940375354 0.0361163844076152
V1003 1 rs12131377 78 0 1060 14 537 982 0.914338919925512 14 0.0130353817504655 0.0726256983240224 0 0.555433052966582 0.456106209942983 0.0037868148101911 0.0321609387794883
V1003 1 rs3131962 170 1 1066 14 540 896 0.82962962962963 13 0.012037037037037 0.157407407407407 0.000925925925925926 0.40191966550969 0.526099523335894 0.00450617283950613 0.027281782875571
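The header can also be split off inside a single shell group, without awk and without reading the file twice. The sketch below uses a made-up three-column, tab-separated input and sorts on column 3 instead of column 16; the variable names are illustrative only.

```shell
tab=$(printf '\t')
# Made-up, tab-separated sample with the p-value in column 3.
input=$(printf 'id\tn\tP_value\nrsA\t1\t0.52\nrsB\t2\t0.07\nrsC\t3\t0.40\n')

sorted=$(printf '%s\n' "$input" | {
    IFS= read -r header           # consume only the header line
    printf '%s\n' "$header"       # emit it unchanged
    sort -t "$tab" -k 3,3n        # sort the remaining lines by column 3, ascending
})
printf '%s\n' "$sorted"
```

One caveat: -n does not understand exponent notation such as 1e-05. If the real p-values are written that way and GNU sort is available, its -g option is the usual substitute.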
I have two files that I need to merge together based on what string they contain in a specific column.
File 1 looks like this:
1 1655 1552 189
1 1433 1552 185
1 1623 1553 175
1 691 1554 182
1 1770 1554 184
1 1923 1554 182
1 1336 1554 181
1 660 1592 179
1 743 1597 179
File 2 looks like this:
1 1552 0 0 2 -9 G A A A
1 1553 0 0 2 -9 A A G A
1 1554 0 751 2 -9 A A A A
1 1592 0 577 1 -9 G A A A
1 1597 0 749 2 -9 A A G A
1 1598 0 420 1 -9 A A A A
1 1600 0 0 1 -9 A A G G
1 1604 0 1583 1 -9 A A A A
1 1605 0 1080 2 -9 G A A A
I want to match column 3 of file 1 against column 2 of file 2, with my output looking like:
1 1655 1552 189 0 0 2 -9 G A A A
1 1433 1552 185 0 0 2 -9 G A A A
1 1623 1553 175 0 0 2 -9 A A G A
1 691 1554 182 0 751 2 -9 A A A A
1 1770 1554 184 0 751 2 -9 A A A A
1 1923 1554 182 0 751 2 -9 A A A A
1 1336 1554 181 0 751 2 -9 A A A A
1 660 1592 179 0 577 1 -9 G A A A
1 743 1597 179 0 749 2 -9 A A G A
I am not interested in keeping any lines in file 2 that are not in file 1. Thanks in advance!
Thanks to @Abelisto I managed to figure something out 4 hours later!
sort -k 3,3 File1.txt > Pheno1.txt
awk '($2 >0)' File2.ped > Ped1.ped
sort -k 2,2 Ped1.ped > Ped2.ped
join -1 3 -2 2 Pheno1.txt Ped2.ped > Ped3.txt
cut -d ' ' -f 1,4,5 --complement Ped3.txt > Output.ped
My real File2 actually contained negative values in the 2nd column (thankfully my real File1 didn't have any negatives), hence the use of awk to remove those rows.
Using awk:
awk 'NR == FNR { arr[$2]=$3" "$4" "$5" "$6" "$7" "$8" "$9" "$10 } NR != FNR { print $1" "$2" "$3" "$4" "arr[$3] }' file2 file1
Process file2 first (NR==FNR). Set up an array called arr with the 2nd space-delimited field as the index and the 3rd to 10th fields, separated by spaces, as the value. Then, when processing file1 (NR!=FNR), print the 1st to the 4th space-delimited fields followed by the contents of arr indexed by field 3.
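One thing to be aware of with this lookup-array approach: a file1 row whose key is missing from file2 is still printed, just without the extra columns. The following is a sketch of a variant that skips such rows. The two tiny input files, the temporary directory, and the array name rest are all made up for the illustration.

```shell
tmp=$(mktemp -d)
# Two-row stand-ins for the real files; key 9999 has no partner in file2.
printf '%s\n' '1 1655 1552 189' '1 700 9999 170' > "$tmp/file1"
printf '%s\n' '1 1552 0 0 2 -9 G A A A' '1 1598 0 420 1 -9 A A A A' > "$tmp/file2"

merged=$(awk '
    NR == FNR {                  # first file on the command line: file2
        key = $2
        $1 = $2 = ""             # blank the two leading fields ...
        sub(/^ +/, "")           # ... and trim the separators they leave behind
        rest[key] = $0           # remember columns 3 onward under the key
        next
    }
    ($3 in rest) { print $0, rest[$3] }   # file1 rows that have a partner
' "$tmp/file2" "$tmp/file1")

rm -rf "$tmp"
printf '%s\n' "$merged"
```

Blanking fields and trimming the record avoids spelling out $3 to $10 by hand, at the cost of rebuilding the record with single spaces.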
Since $1 seems to be a constant 1 and I have no idea about the row counts of either file (800,000 columns in file2 sounded like a lot), I'm hashing file1 instead:
$ awk '
NR==FNR {
a[$3]=a[$3] (a[$3]==""?"":ORS) $2 OFS $3 OFS $4
next
}
($2 in a) {
n=split(a[$2],t,ORS)
for(i=1;i<=n;i++) {
$2=t[i]
print
}
}' file1 file2
Output:
1 1655 1552 189 0 0 2 -9 G A A A
1 1433 1552 185 0 0 2 -9 G A A A
1 1623 1553 175 0 0 2 -9 A A G A
1 691 1554 182 0 751 2 -9 A A A A
1 1770 1554 184 0 751 2 -9 A A A A
1 1923 1554 182 0 751 2 -9 A A A A
1 1336 1554 181 0 751 2 -9 A A A A
1 660 1592 179 0 577 1 -9 G A A A
1 743 1597 179 0 749 2 -9 A A G A
When posting a question, please add details such as row and column counts to it. Better requirements yield better answers.
The command '(sleep 4 ; echo q) | topas -P | tee /tmp/top' on AIX writes the output shown below to a file of type "commands text", and I am not able to remove the blank lines from it. I have tried sed, perl, and awk commands to print only the lines containing characters, but nothing has helped. Do I need to convert this file to ASCII text before sed, perl, grep, or awk can remove the empty lines?
$ file /tmp/top
/tmp/top: commands text
$ head -n33 /tmp/top
DATA TEXT PAGE PGFAULTS
USER PID PPID PRI NI RES RES SPACE TIME CPU% I/O OTH COMMAND
root 9044256 20447712 60 20 2.43M 304K 2.43M 0:00 2.1 0 253 topas
root 14942646 8913178 60 20 72.0M 37.8M 72.0M 0:42 0.2 0 1 TaniumCl
root 20447712 21889434 60 20 148K 312K 508K 0:00 0.2 0 0 ksh
root 21955056 20447712 60 20 216K 36.0K 216K 0:00 0.1 0 3 sed
root 24838602 20447712 60 20 120K 8.00K 120K 0:00 0.1 0 1 tee
root 9830690 10355194 60 20 120K 4.00K 120K 0:00 0.1 0 0 sleep
root 12255642 13893896 60 41 57.5M 39.8M 57.5M 33:42 0.1 0 0 mmfsd
root 10355194 20447712 60 20 148K 312K 508K 0:00 0.1 0 0 ksh
root 9109790 4063622 39 20 12.9M 3.68M 12.9M 5:19 0.1 0 0 rmcd
root 13697394 4063622 60 20 8.27M 55.9M 8.27M 17:18 0.1 0 0 backup_a
root 20906328 1 60 20 1.81M 0 1.81M 3:15 0.0 0 0 nfsd
root 4260244 1 60 20 620K 88.0K 620K 41:23 0.0 0 0 getty
root 1573172 0 37 41 960K 0 960K 15:17 0.0 0 0 gil
nagios 9240876 4063622 60 20 23.7M 736K 23.7M 9:43 0.0 0 0 ncpa_pas
root 4391332 1 60 20 12.5M 252K 12.5M 4:43 0.0 0 0 secldapc
a_RTHOMA 8323456 12059082 60 20 636K 3.06M 1016K 0:00 0.0 0 0 sshd
root 8388902 4063622 60 20 1.76M 1.05M 1.76M 7:03 0.0 0 0 clcomd
root 3539312 1 60 20 448K 0 448K 5:07 0.0 0 0 lock_rcv
root 3670388 1 60 20 448K 0 448K 4:18 0.0 0 0 sec_rcv
root 5767652 4063622 48 8 392K 324K 392K 2:49 0.0 0 0 xntpd
root 6816242 1 60 20 1.19M 0 1.19M 1:05 0.0 0 0 rpc.lock
root 459026 0 16 41 640K 0 640K 2:19 0.0 0 0 reaffin
root 23921008 1 60 20 1.00M 0 1.00M 4:36 0.0 0 0 n4cb
lpar2rrd 23200112 25625020 64 22 868K 120K 868K 0:00 0.0 0 0 vmstat
root 7143896 1 40 41 448K 0 448K 0:48 0.0 0 0 nfsWatch
root 6160840 1 60 20 448K 0 448K 0:09 0.0 0 0 j2gt
It looks like your output file contains characters other than plain text, so it is easier to print only the lines that start with a letter, like this:
awk '/^[a-zA-Z]/' Input_file
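If the "blank" lines turn out to hold only spaces, tabs, or carriage returns (an assumption; the real topas output may contain other control sequences), a whitespace-aware filter is another option. A minimal sketch on made-up input:

```shell
# Made-up input: real rows mixed with lines holding only spaces, a tab plus
# carriage return, or nothing at all.
input=$(printf 'USER PID\n   \nroot 1\n\t\r\n\nnagios 2\n')

# tr strips carriage returns; awk prints a line only if it has at least one field.
cleaned=$(printf '%s\n' "$input" | tr -d '\r' | awk 'NF')
printf '%s\n' "$cleaned"
```

Unlike the letter-anchored pattern, this also keeps rows that begin with whitespace or a digit.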
I would like to replace the second entry of all (space or tab-separated) lines containing a specific string. In the following text
1078.732700000 0.00001000 1 0 0 39 13 27 0 0 0 40 14 26 81SaWoLa.43 BAD LABEL, REASSIGNED
-1077.336700000 0.00001000 1 0 0 45 12 34 0 0 0 46 13 34 81SaWoLa.48 BAD LABEL
I would like to replace 0.00001000 by 0.00005214 (searching for the string 81SaWoLa). The result should be
1078.732700000 0.00005214 1 0 0 39 13 27 0 0 0 40 14 26 81SaWoLa.43 BAD LABEL, REASSIGNED
-1077.336700000 0.00005214 1 0 0 45 12 34 0 0 0 46 13 34 81SaWoLa.48 BAD LABEL
Could you please show me how to do this?
Try this for GNU sed:
$ cat input.txt
1078.732700000 0.00001000 1 0 0 39 13 27 0 0 0 40 14 26 81SaWoLa.43 BAD LABEL, REASSIGNED
-1077.336700000 0.00001000 1 0 0 45 12 34 0 0 0 46 13 34 81SaWoLa.48 BAD LABEL
$ sed -r '/81SaWoLa/s/^([^ \t]+[ \t]+)[^ \t]+(.*)/\10.00005214\2/' input.txt
1078.732700000 0.00005214 1 0 0 39 13 27 0 0 0 40 14 26 81SaWoLa.43 BAD LABEL, REASSIGNED
-1077.336700000 0.00005214 1 0 0 45 12 34 0 0 0 46 13 34 81SaWoLa.48 BAD LABEL
You may use awk to achieve your goal more easily.
$ awk '/81SaWoLa/{$2="0.00005214"}1' file
1078.732700000 0.00005214 1 0 0 39 13 27 0 0 0 40 14 26 81SaWoLa.43 BAD LABEL, REASSIGNED
-1077.336700000 0.00005214 1 0 0 45 12 34 0 0 0 46 13 34 81SaWoLa.48 BAD LABEL
To use shell variables for the pattern and the replacement, pass them with -v (quoted, so that unusual values survive the shell):
$ var1=81SaWoLa
$ var2=0.00005214
$ awk -v var1="$var1" -v var2="$var2" '$0 ~ var1{$2=var2}1' file
1078.732700000 0.00005214 1 0 0 39 13 27 0 0 0 40 14 26 81SaWoLa.43 BAD LABEL, REASSIGNED
-1077.336700000 0.00005214 1 0 0 45 12 34 0 0 0 46 13 34 81SaWoLa.48 BAD LABEL
Use awk:
$ awk '/pattern/ { $2 = "new string." }; 1' input.txt
In your case:
$ awk '/81SaWoLa/ { $2 = "0.00005214" }; 1' input.txt
#      ^^^^^^^^^^   ^^^^^^^^^^^^^^^^^    ^
#      For lines    Replace second       Print each line, same as:
#      matching     column with          ``{ print $0 }''. ``$0''
#      81SaWoLa     ``0.00005214''       contains the whole line.
Use -v name=val to specify a variable:
$ awk -v pat="81SaWoLa" \
-v rep="0.00005214" \
'$0 ~ pat { $2 = rep }; 1' input.txt
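A side effect worth knowing about: assigning to $2 makes awk rebuild the whole line with the output field separator, so runs of spaces and any tabs between fields collapse to single spaces, whereas the sed substitution leaves the original spacing alone. A small sketch with a made-up, shortened line:

```shell
# Shortened, made-up line with three spaces and a tab between fields.
line=$(printf '1078.7   0.00001000\t1 81SaWoLa.43')

# Default OFS: the rebuilt line uses single spaces everywhere.
rebuilt=$(printf '%s\n' "$line" | awk '/81SaWoLa/ { $2 = "0.00005214" } 1')

# Setting OFS to a tab makes the rebuilt line tab-separated instead.
tabbed=$(printf '%s\n' "$line" | awk -v OFS='\t' '/81SaWoLa/ { $2 = "0.00005214" } 1')

printf '%s\n%s\n' "$rebuilt" "$tabbed"
```

If the column alignment of the original file matters, the sed version is probably the safer choice.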
I have written this in both Python and bash. The code I am about to post probably has plenty wrong with it, but it is very basic, and I cannot see why it would cause the 'bug' I describe below. The Python version is much cleaner, yet it shows the same problem: at some point the maths produces a negative number, which makes no sense.
#!/bin/bash
while [ 1 ];
do
zero=0
ARRAY=()
ARRAY2=()
first=`command to generate a list of numbers`
sleep 1
second=`command to generate a list of numbers`
# so now we have two data sets, 1 second between the capture of each.
for i in $first;
do
ARRAY+=($i)
done
for i in $second;
do
ARRAY2+=($i)
done
for (( c=$zero; c<=${#ARRAY2[@]}; c++ ))
do
expr ${ARRAY2[$c]} - ${ARRAY[$c]}
done
ARRAY=()
ARRAY2=()
zero=0
c=0
first=``
second=``
math=''
done
So the script grabs a set of data, waits 1 second, grabs it again, subtracts the first set from the second, and prints the differences. It's very simple, and I have done it elegantly in Python too. However I do it, every now and then (anywhere from 3 to 30 loops in) we get negative numbers, like so:
START 0 0 0 0 0 19 10 563 0
-34 19 14 2 0
-1302 1198
-532 639
-1078 1119 1 0 0
-843 33 880 0 5
-8
-13508 8773 4541 988 181
-12
-205 217
-9 7 1
-360 303 60 1 0 0
-12
-96 98 3
-870 904
-130
-2105 2264 6
-3084 1576 1650
-939 971
-2249 1150 1281
-693 9 513 142 76 expr: syntax error
Please help, I simply can't find anything about this.
Sample OUTPUT as requested:
ARRAY1 OUTPUT
1 15 1 25 25 1 2 1 3541 853 94567 42 5 1 351 51 1 11 1 13 7 14 12 3999 983 5 1938 3 8287 40 1 1 1 5253 706 1 1 1 1 5717 3 50 1 85 100376 17334 4655 1 1345 2 1 16 1777 1 3 38 23 8 32 47 781 947 1 1 206 9 1 3 2 81 2602 7 158 1 1 43 91 1 120 6589 6 2534 1092 1 6014 7 2 2 37 1 1 1 80 2 1 1270 15448 66 1 10238 1 10794 16061 4 1 1 1 9754 5617 1123 926 3 24 10 16
ARRAY2 OUTPUT
1 15 1 25 25 1 2 1 3555 859 95043 42 5 1 355 55 1 11 1 13 7 14 12 4015 987 5 1938 3 8335 40 1 1 1 5280 706 1 1 1 1 5733 3 50 1 85 100877 17396 4691 1 1353 2 1 16 1782 1 3 38 23 8 32 47 787 947 1 1 206 9 1 3 2 81 2602 7 159 1 1 43 91 1 120 6869 6 2534 1092 1 6044 7 2 2 37 1 1 1 80 2 1 1270 15563 66 1 10293 1 10804 16134 4 1 1 1 9755 5633 1135 928 3 24 10 16
START
The answer lies in Russell Uhl's comment above. Your loop runs one time too many (this is your code):
for (( c=$zero; c<=${#ARRAY2[@]}; c++ ))
do
expr ${ARRAY2[$c]} - ${ARRAY[$c]}
done
To fix, you need to change the test condition from c <= ${#ARRAY2[@]} to c < ${#ARRAY2[@]}:
for (( c=$zero; c < ${#ARRAY2[@]}; c++ ))
do
echo $((${ARRAY2[$c]} - ${ARRAY[$c]}))
done
I've also replaced expr with the shell's built-in arithmetic expansion, $((...)).
The test script (sum.sh):
#!/bin/bash
zero=0
ARRAY=()
ARRAY2=()
first="1 15 1 25 25 1 2 1 3541 853 94567 42 5 1 351 51 1 11 1 13 7 14 12 3999 983 5 1938 3 8287 40 1 1 1 5253 706 1 1 1 1 5717 3 50 1 85 100376 17334 4655 1 1345 2 1 16 1777 1 3 38 23 8 32 47 7
second="1 15 1 25 25 1 2 1 3555 859 95043 42 5 1 355 55 1 11 1 13 7 14 12 4015 987 5 1938 3 8335 40 1 1 1 5280 706 1 1 1 1 5733 3 50 1 85 100877 17396 4691 1 1353 2 1 16 1782 1 3 38 23 8 32 47
for i in $first; do
ARRAY+=($i)
done
# Alternately as chepner suggested:
ARRAY2=($second)
for (( c=$zero; c < ${#ARRAY2[@]}; c++ )); do
echo -n $((${ARRAY2[$c]} - ${ARRAY[$c]})) " "
done
Running it:
samveen@precise:/tmp$ echo $BASH_VERSION
4.2.25(1)-release
samveen@precise:/tmp$ bash sum.sh
0 0 0 0 0 0 0 0 14 6 476 0 0 0 4 4 0 0 0 0 0 0 0 16 4 0 0 0 48 0 0 0 0 27 0 0 0 0 0 16 0 0 0 0 501 62 36 0 8 0 0 0 5 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 280 0 0 0 0 30 0 0 0 0 0 0 0 0 0 0 0 115 0 0 55 0 10 73 0 0 0 0 1 16 12 2 0 0 0 0
EDIT:
* Added improvements from suggestions in comments.
I think the problem has to be when the two arrays don't have the same size. It's easy to reproduce that syntax error -- one of the operands for the minus operator is an empty string:
$ a=5; b=3; expr $a - $b
2
$ a=""; b=3; expr $a - $b
expr: syntax error
$ a=5; b=""; expr $a - $b
expr: syntax error
$ a=""; b=""; expr $a - $b
-
Try
ARRAY=( $(command to generate a list of numbers) )
sleep 1
ARRAY2=( $(command to generate a list of numbers) )
if (( ${#ARRAY[@]} != ${#ARRAY2[@]} )); then
echo "error: different size arrays!"
echo "ARRAY: ${#ARRAY[@]} (${ARRAY[*]})"
echo "ARRAY2: ${#ARRAY2[@]} (${ARRAY2[*]})"
fi
fi
"The error occurs whenever the first array is smaller than the second" -- of course. You're looping from 0 to the array size of ARRAY2. When ARRAY has fewer elements, you'll eventually try to access an index that does not exist in the array. When you try to reference an unset variable, bash gives you the empty string.
$ a=(1 2 3)
$ b=(4 5 6 7)
$ i=2; expr ${a[i]} - ${b[i]}
-3
$ i=3; expr ${a[i]} - ${b[i]}
expr: syntax error
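Putting both points together, one defensive option is to bound the loop by the shorter of the two arrays, so that no index is ever missing on either side. This is only a sketch: it needs bash, the numbers are a made-up excerpt, and the names first, second, n, and diffs are chosen for the illustration.

```shell
#!/bin/bash
first=(1 15 3541 853)
second=(1 15 3555 859 42)    # one extra element on purpose

# Use the smaller of the two lengths as the loop bound.
n=${#first[@]}
if (( ${#second[@]} < n )); then
    n=${#second[@]}
fi

diffs=()
for (( c = 0; c < n; c++ )); do
    # Inside $(( )), array elements can be referenced without ${ }.
    diffs+=( $(( second[c] - first[c] )) )
done
echo "${diffs[*]}"
```

This silently ignores the surplus element, which may or may not be what you want. If a size mismatch means the two snapshots are not comparable, reporting it as in the check above is probably the better reaction.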