Matching contents of one file with another and returning second column - bash

So I have two txt files
file1.txt
s
j
z
z
e
and file2.txt
s h
f a
j e
k m
z l
d p
e o
What I want to do is match the first letter of file1 with the first letter of file2 and return the second column of file2. For example, the expected output would be:
h
e
l
l
o
I'm trying to use join file1.txt file2.txt, but that just prints out the entire second file. I'm not sure how to fix this. Thank you.

This is an awk classic:
$ awk 'NR==FNR{a[$1]=$2;next}{print a[$1]}' file2 file1
h
e
l
l
o
Explained:
$ awk '
NR==FNR {          # processing the first file named (file2)
    a[$1]=$2       # hash records: first field is the key, second field is the value
    next
}
{                  # processing the second file named (file1)
    print a[$1]    # output the value stored for this key
}' file2 file1
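If file1 may contain keys that are missing from file2, the one-liner prints an empty line for them. A minimal sketch, assuming you would rather print a placeholder; the function name lookup_col2 and the default placeholder N/A are illustrative and not part of the answer above:

```shell
# lookup_col2 MAPFILE KEYFILE [PLACEHOLDER]
# Hypothetical helper: for each key in KEYFILE, print the second column of
# the matching MAPFILE line, or PLACEHOLDER when the key is unknown.
lookup_col2() {
    awk -v missing="${3:-N/A}" '
        NR == FNR { a[$1] = $2; next }            # first file: build the map
        { print (($1 in a) ? a[$1] : missing) }   # second file: look up each key
    ' "$1" "$2"
}

# Demo with throwaway files; the key q has no match.
tmp=$(mktemp -d)
printf '%s\n' 's h' 'j e' 'z l' 'e o' > "$tmp/file2.txt"
printf '%s\n' s j q z e > "$tmp/file1.txt"
lookup_col2 "$tmp/file2.txt" "$tmp/file1.txt"
rm -rf "$tmp"
```

Using ($1 in a) rather than testing a[$1] avoids creating empty array entries for unknown keys.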

Related

Add a specific string at the end of each line

I have a mainfile with 4 columns, such as:
a b c d
e f g h
i j k l
In another file, I have one line of text corresponding to each line of the mainfile, which I want to add as a new column, so that the result looks like:
a b c d x
e f g h y
i j k l z
Is this possible in bash? I can only add the same string to the end of each line.
There are two ways to do this:
1) paste file1 file2 (add -d' ' to separate with a space instead of the default tab)
2) Iterate over both files, combine them line by line, and write the result to a new file
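Both options can be sketched as follows, assuming the two files have the same number of lines. The function name append_column and the file names mainfile and extra are illustrative:

```shell
# Option 2: read both files in lockstep on separate file descriptors.
# append_column is a hypothetical helper; it stops at the shorter file.
append_column() {
    while IFS= read -r left <&3 && IFS= read -r right <&4; do
        printf '%s %s\n' "$left" "$right"
    done 3< "$1" 4< "$2"
}

# Demo with throwaway files
tmp=$(mktemp -d)
printf '%s\n' 'a b c d' 'e f g h' 'i j k l' > "$tmp/mainfile"
printf '%s\n' x y z > "$tmp/extra"

# Option 1: paste joins corresponding lines; -d' ' uses a space
# instead of the default tab.
paste -d' ' "$tmp/mainfile" "$tmp/extra"

# Option 2 produces the same lines.
append_column "$tmp/mainfile" "$tmp/extra"
rm -rf "$tmp"
```

paste is the shorter choice; the loop is mainly useful when each pair of lines needs extra processing before it is written out.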
You could use GNU parallel for that:
fe-laptop-m:test fe$ cat first
a b c d
e f g h
i j k l
fe-laptop-m:test fe$ cat second
x
y
z
fe-laptop-m:test fe$ parallel echo ::::+ first second
a b c d x
e f g h y
i j k l z
Is this what you are trying to achieve?
This might work for you (GNU sed):
sed -E 's#(^.*) .*#/^\1/s/$/ &/#' file2 | sed -f - file1
This creates a sed script from file2 that uses a regexp to match a line in file1 and, if it matches, appends the contents of that line in file2 to the matched line.
N.B. This is independent of the order and length of file1.
You can try using pr:
pr -mts' ' file1 file2

How to repeat lines in bash and paste with different columns?

Is there a short way in bash to repeat each line of a file as often as needed to paste it with another file, as a kind of Kronecker product (for the mathematicians among you)?
What I mean is, I have a file A:
a
b
c
and a file B:
x
y
z
and I want to merge them as follows:
a x
a y
a z
b x
b y
b z
c x
c y
c z
I could probably write a script, read the files line by line and loop over them, but I am wondering if there is a short one-line command that could do the same job. I can't think of one and, as you can see, I am also lacking some keywords to search for. :-D
Thanks in advance.
You can use this one-liner awk command:
awk 'FNR==NR{a[++n]=$0; next} {for(i=1; i<=n; i++) print $0, a[i]}' file2 file1
a x
a y
a z
b x
b y
b z
c x
c y
c z
Breakdown:
NR == FNR {                    # while processing the first file in the list (file2)
    a[++n] = $0                # store the row in array a under an incrementing index
    next                       # move to the next record
}
{                              # while processing the second file (file1)
    for (i = 1; i <= n; i++)   # iterate over the array a
        print $0, a[i]         # print the current row and the array element
}
An alternative to awk:
join <(sed 's/^/_\t/' file1) <(sed 's/^/_\t/' file2) | cut -d' ' -f2-
This adds a fake key so that join matches every record of file1 with every record of file2; cut trims the key afterwards.
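For comparison, the script the question alludes to (reading both files line by line) is also short. A sketch, with cross_lines as an illustrative name; it rereads the second file once per line of the first, so the awk version is probably the better choice for large inputs:

```shell
# cross_lines A B
# Hypothetical helper: pair every line of A with every line of B.
cross_lines() {
    while IFS= read -r a; do
        # The inner loop has its own stdin, so it does not consume lines of A.
        while IFS= read -r b; do
            printf '%s %s\n' "$a" "$b"
        done < "$2"
    done < "$1"
}

# Demo with throwaway files
tmp=$(mktemp -d)
printf '%s\n' a b c > "$tmp/A"
printf '%s\n' x y z > "$tmp/B"
cross_lines "$tmp/A" "$tmp/B"
rm -rf "$tmp"
```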

awk to print all columns from the nth to the last with spaces

I have the following input file:
a 1 o p
b 2 o p p
c 3 o p p p
In the last line there is a double space between the last p's, and the columns have different spacing.
I have used the solution from: Using awk to print all columns from the nth to the last.
awk '{for(i=2;i<=NF;i++){printf "%s ", $i}; printf "\n"}'
It works fine until it reaches the double space in the last column, where it removes one space.
How can I avoid that while still using awk?
Since you want to preserve spaces, let's just use cut:
$ cut -d' ' -f2- file
1 o p
2 o p p
3 o p p p
Or, for example, to start from column 4:
$ cut -d' ' -f4- file
p
p p
p p p
This will work as long as the columns you are removing are one-space separated.
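To see why the single-space condition matters, here is a small illustration (not from the original answer). cut treats every space as a delimiter, so consecutive spaces produce empty fields and the result shifts:

```shell
# One space between columns: the first column is removed cleanly.
printf 'a 1 o p\n' | cut -d' ' -f2-     # prints "1 o p"

# Two spaces after the first column: field 2 is empty, so the
# output starts with a leftover space.
printf 'a  1 o p\n' | cut -d' ' -f2-    # prints " 1 o p"
```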
If the columns you are removing are also separated by varying amounts of whitespace, you can use the beautiful solution by Ed Morton in Print all but the first three columns:
awk '{sub(/[[:space:]]*([^[:space:]]+[[:space:]]+){1}/,"")}1'
The number in braces ({1} here) is the number of columns to remove.
Test
$ cat a
a 1 o p
b 2 o p p
c 3 o p p p
$ awk '{sub(/[[:space:]]*([^[:space:]]+[[:space:]]+){2}/,"")}1' a
o p
o p p
o p p p
GNU sed
Remove the first n fields:
sed -r 's/([^ ]+ +){2}//' file
GNU awk 4.0+
awk '{sub("([^"FS"]"FS"){2}","")}1' file
GNU awk <4.0
awk --re-interval '{sub("([^"FS"]"FS"){2}","")}1' file
In case the FS version doesn't work (Ed's suggestion):
awk '{sub(/([^ ] ){2}/,"")}1' file
Replace 2 with the number of fields you wish to remove.
EDIT
Another way (doesn't require --re-interval):
awk '{for(i=0;i<2;i++)sub($1"[[:space:]]*","")}1' file
Further edit
As advised by Ed Morton, it is bad to use fields in sub as they may contain regex metacharacters, so here is another alternative:
awk '{for(i=0;i<2;i++)sub(/[^[:space:]]+[[:space:]]*/,"")}1' file
Output
o p
o p p
o p p p
In Perl, you can use split with capturing to keep the delimiters:
perl -ne '@f = split /( +)/; print @f[ 1 * 2 .. $#f ]'
# The 1 in "1 * 2" is the column number to start from (counting from 0).
If you want to preserve all spaces after the start of the second column, this will do the trick:
{
match($0, ($1 "[ \\t]+"))
print substr($0, RSTART+RLENGTH)
}
The call to match locates the start of the first 'token' on the line and the length of the first token and the whitespace that follows it. Then you just print everything on the line after that.
You could generalize it somewhat to ignore the first N tokens this way:
BEGIN {
N = 2
}
{
r = ""
for (i=1; i<=N; i++) {
r = (r $i "[ \\t]+")
}
match($0, r)
print substr($0, RSTART+RLENGTH)
}
Applying the above script to your example input yields:
o p
o p p
o p p p

replace a particular row and column value of one file with another

I have a file containing
a b c d
g h i j
d e f f
and another file containing
1 2 3 4
5 6 7 8
9 1 0 1
I know that I can extract a particular row and column using
awk 'FNR == 2 {print $3}' fit_detail.txt
But I need to replace the 2nd column and 3rd row of the first file with the 2nd row and 3rd column of the second file. How could I do this and save the result to another file?
Finally, my output should look like
a b c d
g h i j
d 1 f f
$ awk 'NR==FNR && NR==3 {a=$2} NR==FNR {next} FNR==3 {$2=a} {print}' file2 file1
a b c d
g h i j
d 1 f f
Explanation:
NR==FNR && NR==3 {a=$2}
In awk, NR is the number of records (lines) that have been read in total and FNR is the number of records (lines) that have been read in from the current file. So, when NR==FNR, then we know that we are working on the first file named on the command line. For that file, we select only the third row (NR==3) and save the value of its second column in the variable a.
NR==FNR {next}
If we are processing the first named file on the command line, skip to next line.
FNR==3 {$2=a}
Because of the preceding next statement, it is only possible to get to this command if we are now working on the second named file. For this file, if we are on the third row, change the 2nd column to the value a.
{print}
All lines from the second named file are printed.
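The row and column numbers are hard-coded in the command above. A sketch of a parameterized variant, assuming whitespace-separated fields; the helper name replace_cell and its argument order are illustrative:

```shell
# replace_cell SRC SROW SCOL DST DROW DCOL
# Hypothetical helper: copy the cell at (SROW, SCOL) of SRC into
# (DROW, DCOL) of DST and print the modified DST.
replace_cell() {
    awk -v sr="$2" -v sc="$3" -v dr="$5" -v dc="$6" '
        NR == FNR { if (FNR == sr) v = $sc; next }   # first file: remember the cell
        FNR == dr { $dc = v }                        # second file: overwrite the cell
        { print }
    ' "$1" "$4"
}

# Demo with throwaway files
tmp=$(mktemp -d)
printf '%s\n' 'a b c d' 'g h i j' 'd e f f' > "$tmp/file1"
printf '%s\n' '1 2 3 4' '5 6 7 8' '9 1 0 1' > "$tmp/file2"
replace_cell "$tmp/file2" 3 2 "$tmp/file1" 3 2   # last line becomes "d 1 f f"
rm -rf "$tmp"
```

Redirect the output (for example replace_cell file2 3 2 file1 3 2 > newfile) to save it. Calling it with source row 2 and column 3 instead would insert 7, which is the cell the question's wording describes, while the expected output shown corresponds to row 3, column 2.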
Controlling the output format
By default, awk separates output fields with a space. If another output field separator, such as a tab, is desired, it can be specified as follows:
$ awk -v OFS="\t" 'NR==FNR && NR==3 {a=$2} NR==FNR {next} {$2=$2} FNR==3 {$2=a} {print}' file2 file1
a b c d
g h i j
d 1 f f
To accomplish this, we made two changes:
The output field separator (OFS) was specified as a tab with the -v option: -v OFS="\t"
When using a simple print statement, such as {print}, awk will normally apply the new output field separator only if the line had been changed in some way. That is accomplished here with the statement $2=$2. This assigns the second field to itself. Even though this leaves the second field unchanged, it is enough to trigger awk to replace the old field separators with new ones on output.
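The effect of the self-assignment can be checked in isolation. A small illustration, using a dash as OFS so the difference is visible:

```shell
# Without touching a field, the record is printed as read: OFS is not applied.
printf 'a b c\n' | awk -v OFS='-' '{ print }'            # prints "a b c"

# Assigning a field to itself rebuilds the record using OFS.
printf 'a b c\n' | awk -v OFS='-' '{ $1 = $1; print }'   # prints "a-b-c"
```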

Merging two outputs in shell script

I have output of 2 commands like:
Output of the first command:
A B
C D
E F
G H
Output of the second command:
I J
K L
M B
I want to merge both outputs, and if a value in the second column appears in both outputs, I'll take the entry from the first output.
So , my output should be
A B
C D
E F
G H
I J
K L
//not taking (M B) since B is already there in the first output (A B), so preference goes to the first output
Can I do this in a shell script? Is there a command for it?
You can use awk:
awk 'FNR==NR{a[$2];print;next} !($2 in a)' file1 file2
A B
C D
E F
G H
I J
K L
If the order of entries is not important, you can sort on the 2nd column and keep only unique keys:
sort -u -k2 file1 file2
Both -u and -k are specified in the POSIX standard
This wouldn't work if there are repeated entries in the 2nd column of file1.
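The difference shows up as soon as file1 repeats a value in its second column. A small illustration with made-up data (not from the question):

```shell
tmp=$(mktemp -d)
printf '%s\n' 'A B' 'C B' > "$tmp/file1"    # B appears twice in file1
printf '%s\n' 'M B' 'I J' > "$tmp/file2"

# awk keeps every line of file1 and drops only the file2 line whose key
# is already taken: three lines (A B, C B, I J).
awk 'FNR==NR{a[$2];print;next} !($2 in a)' "$tmp/file1" "$tmp/file2"

# sort -u keeps a single line per distinct second column, so one of the
# two B lines from file1 is lost: two lines in total.
sort -u -k2 "$tmp/file1" "$tmp/file2"
rm -rf "$tmp"
```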
