I have two files:
file1:
aaa
bbb
ccc
ddd
file2:
bbb
ddd
How can I use diff to get only the differences, i.e. the lines of file1 that are not in file2?
aaa
ccc
If what you want is records unique to file1, then:
$ comm -23 <(sort file1) <(sort file2)
aaa
ccc
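comm's other suppression flags give the related sets too; a quick sketch with the same sample files (recreated here with printf):

```shell
# recreate the sample files from the question
printf 'aaa\nbbb\nccc\nddd\n' > file1
printf 'bbb\nddd\n' > file2

# -13 suppresses lines unique to file1 and the common lines,
# leaving lines unique to file2 (empty for this sample)
comm -13 <(sort file1) <(sort file2)

# -3 suppresses only the common lines: the symmetric difference
comm -3 <(sort file1) <(sort file2)
```

Note that comm requires sorted input, which is why both files go through sort first.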
Related
I have two text files: one containing keywords and phrases (file1.txt), and a paragraph-based text file (file2.txt). I'm trying to find the keywords/phrases from file1.txt that appear in file2.txt.
Here's a sample data:
File 1 (file1.txt):
123 111 1111
ABC 000
A 999
B 000
C 111
Thank you
File 2 (file2.txt)
Hello!
The following order was completed: ABC 000
Item 1 (A 999)
Item 2 (X 412)
Item 3 (8 357)
We will call: 123 111 1111 if we encounter any issues
Thank you very much!
Desired output:
123 111 1111
ABC 000
A 999
Thank you
I've tried the grep command:
grep -Fxf file1.txt file2.txt > output.txt
And I'm getting a blank output.txt
What suggestions do you have to get the right output?
Try:
grep -o -f file1.txt <file2.txt
The -x in your attempt requires each pattern to match a complete line of file2.txt, which never happens here, so nothing is printed. The flags above:
-o : print only the matching part of each line
-f file1.txt : read the search patterns, one per line, from file1.txt
< : redirect file2.txt to standard input
Demo :
$cat file1.txt
123 111 1111
ABC 000
A 999
B 000
C 111
Thank you
$cat file2.txt
Hello!
The following order was completed: ABC 000
Item 1 (A 999)
Item 2 (X 412)
Item 3 (8 357)
We will call: 123 111 1111 if we encounter any issues
Thank you very much!
$grep -o -f file1.txt <file2.txt
ABC 000
A 999
123 111 1111
Thank you
$
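One caveat with -o: it matches patterns anywhere inside a line, including inside longer runs of characters. Adding -w restricts matches to whole words. A small sketch (the sample line here is my own, not from the question):

```shell
printf 'C 111\n' > patterns.txt
printf 'ABC 1111 is not C 111\n' > text.txt

grep -o  -f patterns.txt text.txt   # also matches inside "ABC 1111": prints "C 111" twice
grep -ow -f patterns.txt text.txt   # word boundaries required: prints "C 111" once
```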
cat file_a
aaa
bbb
ccc
cat file_b
ddd
eee
fff
cat file_x
bbb
ccc
ddd
eee
I want to cat file_a file_b | remove_from_stream_what_is_in(file_x)
Result:
aaa
fff
If there is no basic filter to do this with, then I wonder if there is a way with ruby -ne '...'.
Try:
$ cat file_a file_b | grep -vFf file_x
aaa
fff
-v means remove matching lines.
-F tells grep to treat the match patterns as fixed strings, not regular expressions.
-f file_x tells grep to get the match patterns from the lines of file_x.
Other options that you may want to consider are:
-w tells grep to match only complete words.
-x tells grep to match only complete lines.
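The effect of -x is easy to see by adding a pattern that is a substring of a kept line (the bb pattern below is hypothetical, added for illustration):

```shell
printf 'aaa\nbbb\nccc\n' > file_a
printf 'ddd\neee\nfff\n' > file_b
printf 'bb\nddd\n'       > file_x   # "bb" is only a substring of "bbb"

cat file_a file_b | grep -vFf  file_x   # drops "bbb" too, via the substring match
cat file_a file_b | grep -vxFf file_x   # -x: only whole-line matches removed, "bbb" stays
```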
IO.write('file_a', %w| aaa bbb ccc |.join("\n")) #=> 11
IO.write('file_b', %w| ddd eee fff |.join("\n")) #=> 11
IO.write('file_x', %w| bbb ccc ddd eee |.join("\n")) #=> 15
From Ruby:
IO.readlines('file_a', chomp: true) + IO.readlines('file_b', chomp: true) -
IO.readlines('file_x', chomp: true)
#=> ["aaa", "fff"]
I have 2 CSV files. The 1st is my main CSV that contains all the columns I need. The 2nd contains 2 columns, where the 1st column is an identifier and the 2nd is the replacement value. For example
Main.csv
aaa 111 bbb 222 ccc 333
ddd 444 eee 555 fff 666
iii 777 jjj 888 kkk 999
lll 101 eee 201 nnn 301
replacement.csv
bbb abc
jjj def
eee ghi
I want the results to look like the following: the 3rd column of main.csv is matched against the 1st column of replacement.csv, and wherever they match, the 5th column of main.csv is replaced with the 2nd column of replacement.csv. Also, main.csv can contain repeated identifiers, so every occurrence should be replaced with the appropriate value
aaa 111 bbb 222 abc 333
ddd 444 eee 555 ghi 666
iii 777 jjj 888 def 999
lll 101 eee 201 ghi 301
I tried a code like this
while read col1 col2 col3 col4 col5 col6
do
while read col7 col8
do
if[$col7==col3]
then
col5=col8
fi
done < RepCSV
done < MainCSV > MainCSV
But it did not work.
I'm quite new to bash, so the help will be appreciated. Thanks in advance
Using awk:
$ awk '
NR==FNR { # process the first file
a[$1]=$2 # hash $2 to a, $1 as key
next # next record
}
{ # second file
$5=($3 in a?a[$3]:$5) # replace $5 based on $3
}1' replacement main
aaa 111 bbb 222 abc 333
ddd 444 eee 555 ghi 666
iii 777 jjj 888 def 999
lll 101 eee 201 ghi 301
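If you want to stay in pure bash, the same hash-lookup idea can be written with an associative array (bash 4+). A sketch, assuming space-separated fields and the file names main.csv and replacement.csv:

```shell
#!/bin/bash
# build the identifier -> replacement lookup table
declare -A repl
while read -r key val; do
    repl[$key]=$val
done < replacement.csv

# rewrite column 5 wherever column 3 is a known identifier
while read -r c1 c2 c3 c4 c5 c6; do
    [ -n "${repl[$c3]+x}" ] && c5=${repl[$c3]}
    echo "$c1 $c2 $c3 $c4 $c5 $c6"
done < main.csv
```

Unlike the attempt in the question, this reads the replacement file once instead of re-reading it for every line, and it writes to stdout rather than redirecting back onto the input file (which truncates the file before it is read).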
I have two 2D-array files to read with bash.
What I want to do is extract the elements inside both files.
These two files have different dimensions (n rows x 7 columns vs m rows x 3 columns), such as:
file1.txt (nx7)
NO DESC ID TYPE W S GRADE
1 AAA 20 AD 100 100 E2
2 BBB C0 U 200 200 D
3 CCC 9G R 135 135 U1
4 DDD 9H Z 246 246 T1
5 EEE 9J R 789 789 U1
.
.
.
file2.txt (mx3)
DESC W S
AAA 100 100
CCC 135 135
EEE 789 789
.
.
.
Here is what I want to do:
Extract each element in the DESC column of file2.txt, then find the corresponding row in file1.txt.
Extract the W and S elements from that row of file2.txt, then find the corresponding W and S elements in the matching row of file1.txt.
If [W1==W2 && S1==S2]; then echo "${DESC[colindex]} ok"; else echo "${DESC[colindex]} NG"
How can I read this kind of file as a 2D array with bash or is there any convenient way to do that?
bash does not support 2D arrays. You can simulate them by generating 1D array variables like array1, array2, and so on.
Assuming DESC is a key (i.e. has no duplicate values) and does not contain any spaces:
#!/bin/bash
# read data from file1
idx=0
while read -a data$idx; do
let idx++
done <file1.txt
# process data from file2
while read desc w2 s2; do
for ((i=0; i<idx; i++)); do
v="data$i[1]"
[ "$desc" = "${!v}" ] && {
w1="data$i[4]"
s1="data$i[5]"
if [ "$w2" = "${!w1}" -a "$s2" = "${!s1}" ]; then
echo "$desc ok"
else
echo "$desc NG"
fi
break
}
done
done <file2.txt
For brevity, optimizations such as taking advantage of sort order are left out.
If the files actually contain the header NO DESC ID TYPE ... then use tail -n +2 to discard it before processing.
A more elegant solution is also possible, which avoids reading the entire file in memory. This should only be relevant for really large files though.
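For comparison, the same lookup can be done without simulated 2D arrays in awk, whose arrays are associative. A sketch, assuming the header lines are present as shown:

```shell
awk '
NR==FNR { w[$2]=$5; s[$2]=$6; next }   # file1: index W and S by DESC
FNR>1 && $1 in w {                     # file2: skip its header, look up DESC
    print $1, (($2==w[$1] && $3==s[$1]) ? "ok" : "NG")
}' file1.txt file2.txt
```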
If row order does not need to be preserved (the files can be sorted), maybe this is enough:
join -2 2 -o 1.1,1.2,1.3,2.5,2.6 <(tail -n +2 file2.txt|sort) <(tail -n +2 file1.txt|sort) |\
sed 's/^\([^ ]*\) \([^ ]*\) \([^ ]*\) \2 \3/\1 OK/' |\
sed '/ OK$/!s/\([^ ]*\) .*/\1 NG/'
For file1.txt
NO DESC ID TYPE W S GRADE
1 AAA 20 AD 100 100 E2
2 BBB C0 U 200 200 D
3 CCC 9G R 135 135 U1
4 DDD 9H Z 246 246 T1
5 EEE 9J R 789 789 U1
and file2.txt
DESC W S
AAA 000 100
CCC 135 135
EEE 789 000
FCK xxx 135
produces:
AAA NG
CCC OK
EEE NG
Explanation:
skip the header line in both files - tail -n +2
sort both files
join the needed columns from both files into one table; the result contains only the lines that have a common DESC field, like this:
AAA 000 100 100 100
CCC 135 135 135 135
EEE 789 000 789 789
in the lines where columns 2 and 4 and columns 3 and 5 hold the same values, replace everything after the 1st column with OK
in the remaining lines, replace everything after the 1st column with NG
How to do natural sort on uniq -c output?
When the counts are <10, the uniq -c | sort output looks fine:
alvas@ubi:~/testdir$ echo -e "aaa\nbbb\naa\ncd\nada\naaa\nbbb\naa\nccd\naa" > test.txt
alvas@ubi:~/testdir$ cat test.txt
aaa
bbb
aa
cd
ada
aaa
bbb
aa
ccd
aa
alvas@ubi:~/testdir$ cat test.txt | sort | uniq -c | sort
1 ada
1 ccd
1 cd
2 aaa
2 bbb
3 aa
but when the counts are > 10, even into the hundreds or thousands, the sort messes up because it compares the counts as strings, not as integers:
alvas@ubi:~/testdir$ echo -e "aaa\nbbb\naa\nnaa\nnaa\naa\nnaa\nnaa\nnaa\nnaa\nnaa\nnaa\nnaa\nnaa\nnnaa\ncd\nada\naaa\nbbb\naa\nccd\naa" > test.txt
alvas@ubi:~/testdir$ cat test.txt | sort | uniq -c | sort
10 naa
1 ada
1 ccd
1 cd
1 nnaa
2 aaa
2 bbb
4 aa
How can I natural-sort the output of "uniq -c" in ascending/descending order?
Use -n in your sort command, so that it sorts numerically. Also -r allows you to reverse the result:
$ sort test.txt | uniq -c | sort -n
1 ada
1 ccd
1 cd
1 nnaa
2 aaa
2 bbb
4 aa
10 naa
$ sort test.txt | uniq -c | sort -nr
10 naa
4 aa
2 bbb
2 aaa
1 nnaa
1 cd
1 ccd
1 ada
From man sort:
-n, --numeric-sort
compare according to string numerical value
-r, --reverse
reverse the result of comparisons
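Combined with head, the reversed numeric sort gives a quick top-N frequency list. A sketch with a tiny sample of my own:

```shell
printf 'naa\nnaa\naa\naa\naa\nbb\n' > test.txt
# the two most frequent lines, highest count first (aa with 3, then naa with 2)
sort test.txt | uniq -c | sort -nr | head -n 2
```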