How can I compare two 2D-array files with bash?

I have two 2D-array files to read with bash, and I want to extract and compare the elements in both files.
The two files have different dimensions (rows x columns), such as:
file1.txt (nx7)
NO DESC ID TYPE W S GRADE
1 AAA 20 AD 100 100 E2
2 BBB C0 U 200 200 D
3 CCC 9G R 135 135 U1
4 DDD 9H Z 246 246 T1
5 EEE 9J R 789 789 U1
.
.
.
file2.txt (mx3)
DESC W S
AAA 100 100
CCC 135 135
EEE 789 789
.
.
.
Here is what I want to do:
Extract each element in the DESC column of file2.txt, then find the corresponding element in file1.txt.
Extract the W and S elements from that row of file2.txt, then find the corresponding W and S elements in the matching row of file1.txt.
If [W1==W2 && S1==S2]; then echo "${DESC[colindex]} ok"; else echo "${DESC[colindex]} NG"; fi
How can I read this kind of file as a 2D array in bash, or is there a more convenient way to do this?

bash does not support 2D arrays. You can simulate them by generating 1D array variables like array1, array2, and so on.
Assuming DESC is a key (i.e. has no duplicate values) and does not contain any spaces:
#!/bin/bash
# read data from file1, one array variable (data0, data1, ...) per row
idx=0
while read -a data$idx; do
    let idx++
done <file1.txt
# process data from file2
while read desc w2 s2; do
    for ((i=0; i<idx; i++)); do
        v="data$i[1]"                  # DESC is column index 1 of file1
        [ "$desc" = "${!v}" ] && {     # ${!v} is bash indirect expansion
            w1="data$i[4]"             # W is column index 4
            s1="data$i[5]"             # S is column index 5
            if [ "$w2" = "${!w1}" -a "$s2" = "${!s1}" ]; then
                echo "$desc ok"
            else
                echo "$desc NG"
            fi
            break
        }
    done
done <file2.txt
For brevity, optimizations such as taking advantage of sort order are left out.
If the files actually contain the header NO DESC ID TYPE ... then use tail -n +2 to discard it before processing.
A more elegant solution is also possible, one that avoids reading the entire file into memory; this should only be relevant for really large files, though.
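For example, here is a minimal awk sketch (my illustration, not part of the original answer: it assumes DESC values are unique, fields are whitespace-delimited, and both files carry a header line) that builds a small W,S lookup table from file1.txt and then streams file2.txt against it:
awk 'FNR == 1  { next }                               # skip the header of each file
     NR == FNR { w[$2] = $5; s[$2] = $6; next }       # file1: index W and S by DESC
     $1 in w   { print $1, ($2 == w[$1] && $3 == s[$1] ? "ok" : "NG") }' \
    file1.txt file2.txt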

If row order does not need to be preserved (the files can be sorted), maybe this is enough:
join -2 2 -o 1.1,1.2,1.3,2.5,2.6 <(tail -n +2 file2.txt|sort) <(tail -n +2 file1.txt|sort) |\
sed 's/^\([^ ]*\) \([^ ]*\) \([^ ]*\) \2 \3/\1 OK/' |\
sed '/ OK$/!s/\([^ ]*\) .*/\1 NG/'
For file1.txt
NO DESC ID TYPE W S GRADE
1 AAA 20 AD 100 100 E2
2 BBB C0 U 200 200 D
3 CCC 9G R 135 135 U1
4 DDD 9H Z 246 246 T1
5 EEE 9J R 789 789 U1
and file2.txt
DESC W S
AAA 000 100
CCC 135 135
EEE 789 000
FCK xxx 135
produces:
AAA NG
CCC OK
EEE NG
Explanation:
skip the header line in both files - tail -n +2
sort both files
join the needed columns from both files into one table; only lines that have a common DESC field appear in the result, like this:
AAA 000 100 100 100
CCC 135 135 135 135
EEE 789 000 789 789
in lines where column 2 matches column 4 and column 3 matches column 5, substitute everything after the first column with OK
in the remaining lines, substitute everything after the first column with NG

Related

bash: conserve tab with spaces for alignment with column

I am trying to display .tsv files nicely aligned as columns, while limiting the display to the current screen width. The following approach works in general, but it fails if the input contains the particular character that I pass to column as the delimiter. My current solution works as follows:
bash$ cat sample.tsv | tr '\t' '#' | column -n -t -s # | cut -c-`tput cols`
I tried using the tab character directly but could not make it work. And with column's default options, any whitespace, not just tabs, is treated as a delimiter, so that does not work for me either. I would be thankful for any better alternative to the above.
PS:
A sample is shown below
bash:~$ cat sample.tsv
Sl Name Number Status
1 W Jhon +1 234 4454 y
2 M Walter +2 232 453 n
3 S M Ray +1 343 453 y
bash:~$ cat sample.tsv | tr '\t' '#' | column -n -t -s # | cut -c-`tput cols`
Sl  Name      Number       Status
1   W Jhon    +1 234 4454  y
2   M Walter  +2 232 453   n
3   S M Ray   +1 343 453   y
bash:~$ cat sample.tsv | column -n -t | cut -c-`tput cols`
Sl  Name  Number  Status
1   W     Jhon    +1      234  4454  y
2   M     Walter  +2      232  453   n
3   S     M       Ray     +1   343   453  y
bash:~$
You can tell column to use the tab character as the column delimiter with -s:
column -t -s $'\t' -n sample.tsv
Sl  Name      Number       Status
1   W Jhon    +1 234 4454  y
2   M Walter  +2 232 453   n
3   S M Ray   +1 343 453   y
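If you also need the screen-width limit from the question, the same cut trick still applies, e.g.:
column -t -s $'\t' -n sample.tsv | cut -c -"$(tput cols)"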

Search phrases and terms (stored in a file) in a text article

I have two text files: one containing keywords and phrases (file1.txt), and a paragraph-based text file (file2.txt). I'm trying to find the keywords/phrases from file1.txt that appear in file2.txt.
Here's a sample data:
File 1 (file1.txt):
123 111 1111
ABC 000
A 999
B 000
C 111
Thank you
File 2 (file2.txt)
Hello!
The following order was completed: ABC 000
Item 1 (A 999)
Item 2 (X 412)
Item 3 (8 357)
We will call: 123 111 1111 if we encounter any issues
Thank you very much!
Desired output:
123 111 1111
ABC 000
A 999
Thank you
I've tried the grep command:
grep -Fxf file1.txt file2.txt > output.txt
And I'm getting a blank output.txt
What suggestions do you have to get the right output?
Try dropping -x, which requires a pattern to match a whole line (your phrases appear inside longer lines, which is why output.txt came out blank), and use -o instead:
grep -o -f file1.txt <file2.txt
-o : print only the matching part of each line
-f : read the patterns from file1.txt, one per line
< : redirect file2.txt to standard input
Demo :
$ cat file1.txt
123 111 1111
ABC 000
A 999
B 000
C 111
Thank you
$ cat file2.txt
Hello!
The following order was completed: ABC 000
Item 1 (A 999)
Item 2 (X 412)
Item 3 (8 357)
We will call: 123 111 1111 if we encounter any issues
Thank you very much!
$ grep -o -f file1.txt <file2.txt
ABC 000
A 999
123 111 1111
Thank you
$
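Note that without -F the patterns from file1.txt are treated as basic regular expressions. If your keywords may contain regex metacharacters such as . or *, keep -F (fixed strings) and drop only -x:
grep -oFf file1.txt file2.txt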

bash: reading multiple text files and setting variables

How can I read each line and then set each string as a separate variable?
for example:
555 = a
abc = b
5343/abc = c
22 = d
2323 = e
233/2344 = f
test1.txt
555 abc 5343/abc
444 cde 343/ccc
test2.txt
22 2323 233/2344
112 223 13/12
echo $a $d $f
desired output:
555 22 233/2344
444 112 13/12
The following script will set each line as a variable, but I want each string within a line set as a separate variable.
paste test1.txt test2.txt | while IFS="$(printf '\t')" read -r f1 f2
do
    printf 'codesonar %s %s\n' "$f1 $f2"
done
You have to name the variables you want in your read.
$: paste test1.txt test2.txt |
> while read a b c d e f g h i j k l m n o p q etc
> do echo $a $d $f
> done
555 22 233/2344
444 112 13/12
Am I missing something?
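Alternatively, here is a sketch (my variant, assuming whitespace-separated fields) that reads each pasted line into an array and picks fields by position:
paste test1.txt test2.txt | while read -r -a fld; do
    echo "${fld[0]} ${fld[3]} ${fld[5]}"    # a, d, f by position
done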

Counting the number of 10-digit numbers in a file

I need to count the total number of instances in which a 10-digit number appears within a file. All of the numbers have leading zeros, e.g.:
This is some text. 0000000001
Returns:
1
If the same number appears more than once, it is counted again, e.g.:
0000000001 This is some text.
0000000010 This is some more text.
0000000001 This is some other text.
Returns:
3
Sometimes there are no spaces between the numbers, but each continuous run of 10 digits should still be counted:
00000000010000000010000000000100000000010000000001
Returns:
5
How can I determine the total number of 10-digit numbers appearing in a file?
Try this:
grep -o '[0-9]\{10\}' inputfilename | wc -l
The -o option prints each match on its own line, so wc -l counts every match, including multiple matches per input line.
The last requirement - counting multiple numbers per line - seemed to exclude grep; as far as I knew, it could only count per line.
Edit: Obviously, I stand corrected by Nate :) grep's -o option is what I was looking for.
You can however do this easily with sed. The script below turns every non-digit into a dot, replaces each run of exactly ten digits with the word "num", then deletes the leftover digits and dots, so wc -w counts one word per ten-digit number:
$ cat mkt.sh
sed -r -e 's/[^0-9]/./g' -e 's/[0-9]{10}/num /g' -e 's/[0-9.]//g' "$1"
$ for i in *.txt; do echo --- $i; cat $i; echo --- number count; ./mkt.sh $i|wc -w; done
--- 1.txt
This is some text. 0000000001
--- number count
1
--- 2.txt
0000000001 This is some text.
0000000010 This is some more text.
0000000001 This is some other text.
--- number count
3
--- 3.txt
00000000010000000010000000000100000000010000000001
--- number count
5
--- 4.txt
1 2 3 4 5 6 6 7 9 0
11 22 33 44 55 66 77 88 99 00
123456789 0
--- number count
0
--- 5.txt
1.2.3.4.123
1234567890.123-AbceCMA-5553///q/\1231231230
--- number count
2
$
This might work for you (the first sed turns any pre-existing X into a space and then puts each ten-digit run on its own line as an X marker, the second keeps only the marker lines, and the third prints the final line count):
cat <<! >test.txt
0000000001 This is some text.
0000000010 This is some more text.
0000000001 This is some other text.
00000000010000000010000000000100000000010000000001
1 a 2 b 3 c 4 d 5 e 6 f 7 g 8 h 9 i 0 j
12345 67890 12 34 56 78 90
!
sed 'y/X/ /;s/[0-9]\{10\}/\nX\n/g' test.txt | sed '/X/!d' | sed '$=;d'
8
"I need to count the total number of instances in which a 10-digit number appears within a file. All of the numbers have leading zeros"
So I think this might be more accurate:
$ grep -o '0[0-9]\{9\}' filename | wc -l

Extract numbers from filenames disregarding extensions

I'm making a script to rename some video files. Some are named XXX blah blah.ext and some are XXX - XXX blah blah.ext, where the "X" are digits. Furthermore, some files are .avi and some are .mp4. What I'd like is to extract the numbers from these filenames, separated by a space if there is more than one, while disregarding the "4" in ".mp4" files.
My current implementation is egrep -o "[[:digit:]]*", and while this does separate the numbers into different outputs, it also captures the "4" of ".mp4".
With sed I have neither been able to produce separate output for every number, nor to exclude the "4". Note: I'm very new to sed; I began learning it for the purpose of writing this script.
How can I do this?
for file in *
do
    # strip everything from the first dot (the extension), then pull out each run of digits
    echo "$file" | sed 's/\..*$//' | egrep -o "[[:digit:]]*"
done
You should find this to be pretty robust:
sed 's/^[^[:digit:]]*\([[:digit:]]\+\)[^[:digit:]]\+\( [[:digit:]]\+\)\?[^[:digit:]]\+[[:digit:]]\?$/\1\2/'
If your sed supports -r, you can eliminate the backslashes which are used for escaping:
sed -r 's/^[^[:digit:]]*([[:digit:]]+)[^[:digit:]]+( [[:digit:]]+)?[^[:digit:]]+[[:digit:]]?$/\1\2/'
Demo:
$ echo '123 blah blah.avi
234 blah blah.mp4
345 - 678 blah blah.avi
901 - 234 blah blah.mp4' |
sed -r 's/^[^[:digit:]]*([[:digit:]]+)[^[:digit:]]+( [[:digit:]]+)?[^[:digit:]]+[[:digit:]]?$/\1\2/'
123
234
345 678
901 234
This depends on there being a space in the filename before the second number (when there is one). If there are files that don't have that, then a simple modification can make it work.
This might work for you (the first expression deletes a lone trailing digit together with the non-digits before it, i.e. the "4" of ".mp4"; the second turns every remaining run of non-digits into a single space):
# echo '123 bla bla.avi
456 - 789 bla bla.avi
012bla bla.avi
345-678blabla.avi
901 bla bla.mp4
234 - 567 bla bla.mp4
890bla bla.mp4
123 - 456 - 789 bla bla.mp4' |
sed 's/[^0-9]*[0-9]$//;s/[^0-9]\+/ /g'
123
456 789
012
345 678
901
234 567
890
123 456 789
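Another option (a sketch of an alternative, assuming each filename has a single dot-extension) is to strip the extension with bash parameter expansion first, which sidesteps the "4" of ".mp4" entirely, then join the digit runs with spaces:
for file in *; do
    # ${file%.*} removes the shortest trailing .ext; paste joins the matches with spaces
    grep -oE '[[:digit:]]+' <<< "${file%.*}" | paste -sd ' ' -
done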
