How to create a loop using values from a different file? [duplicate] - bash

This question already has an answer here:
Find patterns of a file in another file and print out a corresponding field of the latter maintaining the order
(1 answer)
Closed 5 years ago.
I have a file containing the following numbers:
file1.txt
1
5
6
8
14
I have another file named rmsd.txt which contains values like the following
1 2.12
2 3.1243
3 4.156
4 3.22
5 3.882
6 8.638
7 8.838
8 7.5373
9 10.7373
10 8.3527
11 3.822
12 5.672
13 7.23
14 5.9292
I want to get the values of column 2 from rmsd.txt for the numbers present in file1.txt, and end up with something like the following:
1 2.12
5 3.882
6 8.638
8 7.5373
14 5.9292
I can do that by running grep 1 rmsd.txt and so on, but that would take a long time. I was trying a for loop, something like:
for a in awk '{print $1}' file.txt; do
grep $a rmsd.txt >result.txt
done
But it didn't work. Maybe it is very simple and I am thinking in the wrong direction. Any help would be highly appreciated.

for WORD in `cat FILE`
do
echo $WORD
command $WORD > $WORD
done
Original source
EDIT: Here is your code with a few fixes (backticks for the command substitution around awk, and >> so each match is appended to result.txt instead of overwriting it):
for a in `awk '{print $1}' file.txt`
do
grep $a rmsd.txt >>result.txt
done
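As a side note (not from the original answer), here is a sketch of a more robust variant: a while-read loop avoids the word-splitting pitfalls of looping over backticks, and anchoring the pattern keeps 1 from also matching 14 or 11:
# read the first field of each line of file1.txt and look it up in the first column of rmsd.txt
while read -r a _; do
    grep -w "^$a" rmsd.txt
done < file1.txt > result.txt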

This is a tailor-made job for awk:
awk 'NR==FNR{a[$1]; next} $1 in a' file1.txt rmsd.txt
1 2.12
5 3.882
6 8.638
8 7.5373
14 5.9292
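For anyone not familiar with the NR==FNR idiom, here is the same one-liner with comments added (the logic is unchanged):
# FNR resets to 1 for each input file, while NR keeps counting across files,
# so NR==FNR is true only while the first file (file1.txt) is being read.
awk 'NR==FNR {a[$1]; next}   # remember each wanted number as an array key, skip to next line
     $1 in a                 # while reading rmsd.txt: print lines whose first field was remembered
    ' file1.txt rmsd.txt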
Most likely it's a duplicate; as soon as I find a good dup, I will mark it as such.

awk to the rescue!
$ awk 'NR==FNR{a[$1];next} $1 in a' file1 file2
1 2.12
5 3.882
6 8.638
8 7.5373
14 5.9292
or with sort/join
$ join <(sort file1) <(sort file2) | sort -n
or grep/sed
$ grep -f <(sed 's/.*/^&\\b/' file1) file2
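For illustration, the sed in that last variant turns each number in file1 into an anchored pattern (relying on GNU grep's \b word-boundary extension), so grep -f only matches whole first fields:
$ sed 's/.*/^&\\b/' file1
^1\b
^5\b
^6\b
^8\b
^14\b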

Related

is it possible to get the content of file1 minus file2 by using bash cmd?

I have two files:
log.txt
log.bak2022.06.20.10.00.txt
log.bak2022.06.20.10.00.txt is a backup of log.txt taken at 10:00 on 2022.06.20,
but log.txt is a content-increasing file.
Now I have a requirement: I want to get the content of log.txt minus log.bak2022.06.20.10.00.txt and write it into a new file.
Is it possible to implement this?
Assumptions:
the small file contains N lines, and these N lines are an exact match for the 1st N lines in the big file
Sample inputs:
$ cat small
4
2
1
3
$ cat big
4
2
1
3
8
10
9
4
One comm idea:
$ comm --nocheck-order -13 small big
8
10
9
4
One awk idea:
$ awk '
FNR==NR { max=FNR; next }
FNR>max
' small big
8
10
9
4
One wc/sed idea:
$ max=$(wc -l < small)
$ ((max++))
$ sed -n "$max,$ p" big
8
10
9
4
An awk-based solution with no need for unix pipe chains, regexes, function calls, or array splitting:
{m,n,g}awk '(_+= NR==FNR ) < FNR' FS='^$' small.txt big.txt
8
10
9
4
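A commented sketch of that one-liner, with the behaviour unchanged:
# "_" counts the lines of the first file: NR==FNR is 1 there and 0 afterwards,
# so while reading small.txt the condition "_ < FNR" is never true.
# While reading big.txt, "_" stays at small.txt's line count, so only the
# lines beyond that count are printed. FS='^$' just disables field splitting.
awk '(_ += NR==FNR) < FNR' FS='^$' small.txt big.txt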

Optimizing grep -f piping commands [duplicate]

This question already has answers here:
Inner join on two text files
(5 answers)
Closed 4 years ago.
I have two files.
file1 has some keys that have abc in the second column:
et1 abc
et2 abc
et55 abc
file2 has those column 1 values (in its last column) and some other numbers I need to add up:
1 2 3 4 5 et1
5 5 5 5 5 et100
3 3 3 3 3 et55
5 5 5 5 4 et1
6 6 6 6 3 et1
For the keys extracted from file1, I need to add up the corresponding column 5 values whenever the key matches. file2 itself is very large.
This command seems to be working but it is very slow:
egrep -isr "abc" file1.tcl | awk '{print $1}' | grep -vwf /dev/stdin file2.tcl | awk '{tl+=$5} END {print tl}'
How would I go about optimizing the pipeline? Also, what am I doing wrong with grep -f? Is it generally not recommended to do something like this?
Edit: The expected output is the sum of column 5 over all rows of file2 whose column 6 key is present in file1.
Edit 2: Expected output: since file1 has the keys et1, et2 and et55, adding up column 5 of file2 for the matching keys in rows 1, 3, 4 and 5 gives [5+3+4+3=15].
Use a single awk to read file1 into the keys of an array. Then when reading file2, add $5 to a total variable when $6 is in the array.
awk 'NR==FNR {if ($2 == "abc") a[$1] = 0;
next}
$6 in a {total += $5}
END { print total }
' file1.tcl file2.tcl
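With the sample files above, that prints the single total the OP expects:
$ awk 'NR==FNR {if ($2 == "abc") a[$1] = 0; next} $6 in a {total += $5} END {print total}' file1.tcl file2.tcl
15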
Could you please try the following, which reads file2.tcl first and uses fewer loops. Since your expected output was not clear, I haven't completely tested it.
awk 'FNR==NR{a[$NF]+=$(NF-1);next} $2=="abc"{print $1,a[$1]+0}' file2.tcl file1.tcl
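Note that with the same sample files this second script prints per-key sums rather than one grand total, so the output would be:
$ awk 'FNR==NR{a[$NF]+=$(NF-1);next} $2=="abc"{print $1,a[$1]+0}' file2.tcl file1.tcl
et1 12
et2 0
et55 3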

Variable in commands in bash

I wrote a program that should print the words from example.txt from the longest to the shortest. I don't know exactly how '^.{$v}$' should look to make it work.
#!/bin/bash
v=30
while [ $v -gt 0 ] ; do
grep -P '^.{$v}$' example.txt
v=$(($v - 1))
done
I tried:
${v}
$v
"$v"
It is my first question, sorry for any mistake :)
What you're doing is not how you'd approach this problem in shell. Read why-is-using-a-shell-loop-to-process-text-considered-bad-practice to learn some of the issues and then this is how you'd really do what you're trying to do in a shell script:
$ cat file
now
is
the
winter
of
our
discontent
$ awk -v OFS='\t' '{print length($0), NR, $0}' file | sort -k1rn -k2n | cut -f3-
discontent
winter
now
the
our
is
of
To understand what that's doing, look at the awk output:
$ awk -v OFS='\t' '{print length($0), NR, $0}' file
3 1 now
2 2 is
3 3 the
6 4 winter
2 5 of
3 6 our
10 7 discontent
The first number is the length of each line and the second number is the order the lines appeared in the input file so when we come to sort it:
$ awk -v OFS='\t' '{print length($0), NR, $0}' file | sort -k1rn -k2n
10 7 discontent
6 4 winter
3 1 now
3 3 the
3 6 our
2 2 is
2 5 of
we can sort by length (longest first) with -k1rn but retain the order from the input file for lines that are the same length by adding -k2n. Then the cut just removes the 2 leading numbers that awk added for sort to use.
Use double quotes so the shell expands $v inside the pattern:
grep -P "^.{$v}$" example.txt
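The underlying issue is that single quotes suppress parameter expansion while double quotes allow it; a quick way to see the difference:
v=30
echo '^.{$v}$'    # single quotes: prints ^.{$v}$ literally
echo "^.{$v}$"    # double quotes: prints ^.{30}$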

How can I use awk to sort columns by the last value of a column?

I have a file like this (with hundreds of lines and columns)
1 2 3
4 5 6
7 88 9
and I would like to re-order the columns based on the values of the last line (or of a specific line):
1 3 2
4 6 5
7 9 88
How can I use awk (or other) to accomplish this task?
Thank you in advance for your help
EDIT: I would like to thank everybody and to apologize if I wasn't clear enough.
What I would like to do is:
take a line (for example the last one);
reorder the columns of the matrix, using the sorted values of the chosen line to determine the order.
So, the last line is 7 88 9, which sorted is 7 9 88; the three columns then have to be reordered so that, in this case, the last two columns are swapped.
A more generic, four-column example, again based on the last line:
Input:
1 2 3 4
4 5 6 7
7 88.0 9 -3
Output:
4 1 3 2
7 4 6 5
-3 7 9 88.0
Here's a quick, dirty and improvable solution: (edited because OP clarified that numbers are floating point).
$ cat test.dat
1 2 3
4 5 6
.07 .88 -.09
$ awk "{print $(printf '$%d%.0s\n' \
$(i=0; for x in $(tail -n1 test.dat); do
echo $((++i)) $x
done |
sort -k2g) | paste -sd,)}" test.dat
3 1 2
6 4 5
-.09 .07 .88
To see what's going on there (or at least part of it):
$ echo "{print $(printf '$%d%.0s\n' \
$(i=0; for x in $(tail -n1 test.dat); do
echo $((++i)) $x
done |
sort -k2g) | paste -sd,)}" test.dat
{print $3,$1,$2} test.dat
To make it work for an arbitrary line $L, replace tail -n1 with tail -n+$L | head -n1
This problem can be elegantly solved using GNU awk's array sorting feature. GNU awk allows you to control array traversal using PROCINFO. So two passes of the file are required, the first pass to split the last record into an array and the second pass to loop through the indices of the array in value order and output fields based on indices. The code below probably explains it better than I do.
awk 'BEGIN{PROCINFO["sorted_in"] = "@val_num_asc"};
NR == FNR {for (x in arr) delete arr[x]; split($0, arr)};
NR != FNR{sep=""; for (x in arr) {printf sep""$x; sep=" "} print ""}' file.txt file.txt
4 1 3 2
7 4 6 5
-3 7 9 88.0
Update:
Create a file called transpose.awk like this:
{
for (i=1; i<=NF; i++) {
a[NR,i] = $i
}
}
NF>p { p = NF }
END {
for(j=1; j<=p; j++) {
str=a[1,j]
for(i=2; i<=NR; i++){
str=str OFS a[i,j];
}
print str
}
}
Now here is the script that should do the work for you:
awk -f transpose.awk file | sort -n -k $(awk 'NR==1{print NF}' file) | awk -f transpose.awk
1 3 2
4 6 5
7 9 88
I am using transpose.awk twice here: once to transpose rows to columns, then I sort numerically by the last column, and then I transpose rows back to columns again. It may not be the most efficient solution, but it works as per the OP's requirements.
Transposing awk script courtesy of @ghostdog74, from "An efficient way to transpose a file in Bash".

Computing differences between columns of tab delimited file

I have a tab-delimited file of 4 columns and n rows.
I want to find the difference between the values in columns 3 and 2 and store the results in another file.
This is what I am doing
cat filename | awk '{print $3 - $2}'>difference
and it is not working. How can I improve the code?
Solution:
I was missing the closing single quote, and my eyes were so tuned to the screen that I couldn't figure out what was going wrong in 35 lines of code... Out of frustration I wrote the question on the forum, and, to complete the comedy of errors, the syntax I wrote here in the question is correct (as it contains both single quotes).
Thank you all for your help.
Set the field separator if you have other whitespace in the lines.
BEGIN {
FS="\t"
}
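Dropped into a complete command, that looks like this (equivalent to the -F form in the next answer):
awk 'BEGIN { FS="\t" } { print $3 - $2 }' filename > difference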
Try using -F to force the delimiter to be a tab (enclosing the \t in quotes):
cat filename | awk -F"\t" '{print $3 - $2}' > difference
Does anyone test before they give their answers? awk splits fields on any whitespace by default, not just on spaces.
I just did this:
awk '{print $3 - $2}' temp.txt
And it works perfectly.
Here's my file:
1 2 7 4
11 12 13 14
1 12 3 4
1 2 3 4
1 2 3 4
And here's my results:
$ awk '{print $3 - $2}' temp.txt
5
1
-9
1
1
$
In fact, I used your command and got the same results.
Can you explain what's not working for you? What data are you using, and what results are you getting?
Try this:
cat filename | awk -F '^T' '{print $3 - $2}' > difference
where ^T is a literal tab character (insert it by typing Ctrl+V and then Tab)
