bash: using 2 variables from same file and sed - bash

I have 2 files:
file1.txt
rs142159069:45000079:TACTTCTTGGACATTTCC:T 45000079
rs111285978:45000103:A:AT 45000103
rs190363568:45000168:C:T 45000168
file2.txt
rs142159069:45000079:TACTTCTTGGACATTTCC:T rs142159069
rs111285978:45000103:A:AT rs111285978
rs190363568:45000168:C:T rs190363568
Using file2.txt, I want to replace the names (column 1 of file1.txt, which is also column 1 of file2.txt) with the entry in column 2 of file2.txt. The output file would then be:
rs142159069 45000079
rs111285978 45000103
rs190363568 45000168
I have tried reading in the columns of file2.txt, but without success:
while read -r a b
do
cat file1.txt | sed s'/$a/$b/'
done < file2.txt
I am quite new to bash. Also, not sure how to write an output file with my command. Any help would be deeply appreciated.

In your case, using awk or perl would be easier, if you are willing to accept an answer without sed:
awk '(NR==FNR){out[$1]=$2;next}{out[$1]=out[$1]" "$2}END{for (i in out){print out[i]} }' file2.txt file1.txt > output.txt
output.txt :
rs142159069 45000079
rs111285978 45000103
rs190363568 45000168
Note: this assumes that all names in column 1 are unique, and that they are all present in both files.
explanation:
(NR==FNR){out[$1]=$2;next} : while you are parsing the first file, create a map with the name from the first column as key
{out[$1]=out[$1]" "$2} : append the value from the second column
END{for (i in out){print out[i]} } : print all the values in the map
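One caveat with the END loop: awk does not guarantee the order in which for (i in out) visits the keys, so the output lines may not follow the input order. A minimal sketch of a variant that prints while reading file1.txt, and so keeps its order, is shown here. The scratch directory and the array name `name` are assumptions made for illustration, not part of the original answer.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory holding copies of the sample data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > file1.txt <<'EOF'
rs142159069:45000079:TACTTCTTGGACATTTCC:T 45000079
rs111285978:45000103:A:AT 45000103
rs190363568:45000168:C:T 45000168
EOF

cat > file2.txt <<'EOF'
rs142159069:45000079:TACTTCTTGGACATTTCC:T rs142159069
rs111285978:45000103:A:AT rs111285978
rs190363568:45000168:C:T rs190363568
EOF

# Pass 1 (file2.txt): remember the short name for each long name.
# Pass 2 (file1.txt): swap column 1 when a short name is known, then print.
awk 'NR == FNR { name[$1] = $2; next }
     ($1 in name) { $1 = name[$1] }
     { print }' file2.txt file1.txt > output.txt

cat output.txt
```

Lines of file1.txt whose name has no entry in file2.txt are printed unchanged in this sketch, which may or may not be what you want.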

Apparently $2 of file2 is part of $1 of file1, so you could use awk and redefine FS:
$ awk -F"[: ]" '{print $1,$NF}' file1
rs142159069 45000079
rs111285978 45000103
rs190363568 45000168

Related

Extracting unique values between 2 files with awk

I need to get the unique lines when comparing 2 files. The files use ":" as a field separator, and it should be treated as the end of the line when comparing strings.
The file1 contains these lines
apple:tasty
apple:red
orange:nice
kiwi:awesome
kiwi:expensive
banana:big
grape:green
orange:oval
banana:long
The file2 contains these lines
orange:nice
banana:long
The output file should be (2 occurrences of orange and 2 occurrences of banana deleted)
apple:tasty
apple:red
kiwi:awesome
kiwi:expensive
grape:green
So only the strings before the : should be compared.
Is it possible to complete this task in 1 command?
I tried to complete the task this way, but the field separator has no effect here.
awk -F: 'FNR==NR {a[$0]++; next} !a[$0]' file1 file2 > outputfile
You basically had it, but $0 refers to the whole line when you want to deal with only the first field, which is $1.
Also you need to take care with the order of the input files. To use the values from file2 for deciding which lines to include from file1, process file2 first:
$ awk -F: 'FNR==NR {a[$1]++; next} !a[$1]' file2 file1
apple:tasty
apple:red
kiwi:awesome
kiwi:expensive
grape:green
One comment: awk can be inefficient with arrays. In real life, with big files, it may be better to use something like:
comm -3 <(cut -d : -f 1 f1 | sort -u) <(cut -d : -f 1 f2 | sort -u) | grep -h -f /dev/stdin f1 f2

bash - replace a value in first file looking at other file referring it as line number

I need to replace the first value on each line of file1.txt with a line from the second file, file2.txt, treating that first value as a line number in file2.txt.
For ex:
file1.txt
3|1|D|A
3|2|2018-09-11 11:25:13.000000857|2018-09-11 11:26:03.000000459
file2.txt
12~299673112~S
12~299673232~S
13~299673233~W
13~299673222~W
Output
13~299673233~W|1|D|A
13~299673233~W|2|2018-09-11 11:25:13.000000857|2018-09-11 11:26:03.000000459
Thanks in advance
You may use this awk:
awk 'BEGIN{FS=OFS="|"} NR==FNR{a[FNR]=$0; next} $1 in a{$1=a[$1]; print}' file2 file1
13~299673233~W|1|D|A
13~299673233~W|2|2018-09-11 11:25:13.000000857|2018-09-11 11:26:03.000000459
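The one-liner above can be unpacked as follows. This is only an annotated sketch of the same logic run against the sample data; the scratch directory, the array name `line`, and the output file name are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory holding copies of the sample data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' '3|1|D|A' \
  '3|2|2018-09-11 11:25:13.000000857|2018-09-11 11:26:03.000000459' > file1.txt
printf '%s\n' '12~299673112~S' '12~299673232~S' \
  '13~299673233~W' '13~299673222~W' > file2.txt

awk 'BEGIN { FS = OFS = "|" }
     # First file on the command line (file2.txt): store each line under its line number.
     NR == FNR { line[FNR] = $0; next }
     # Second file (file1.txt): if field 1 is a stored line number, substitute it and print.
     $1 in line { $1 = line[$1]; print }' file2.txt file1.txt > output.txt

cat output.txt
```

Lines of file1.txt whose first field is not a valid line number in file2.txt are silently skipped, as in the original command.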

ksh shell script to print and delete matched line based on a string

I have 2 files like below. I need a script to find each string from File2 in File1, delete the line that contains the string from File1, and put the result in another file (Output1.txt). It should also print the lines deleted and, if the string doesn't exist in File1, the string (Output2.txt).
File1:
Apple
Boy: Goes to school
Cat
File2:
Boy
Dog
I need output like below.
Output1.txt:
Apple
Cat
Output2.txt:
Dog
Can anyone help please
If you have awk available on your system:
awk -v FS='[ :]' 'NR==FNR{a[$1]}NR>FNR&&!($1 in a){print $1}' File2 File1 > Output1.txt
awk -v FS='[ :]' 'NR==FNR{a[$1]}NR>FNR&&!($1 in a){print $1}' File1 File2 > Output2.txt
The script is storing in an array a the first element $1 of the first file given in argument.
If the first parameter of the second file is not part of the array, print it.
Note that the delimiter is either a space or a :
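Because the commands above print only $1, any surviving File1 line that has more than one word would be cut down to its first word. A possible variant that keeps the whole line is sketched below: a pattern with no action prints $0. The scratch directory and the array name `seen` are assumptions for illustration; the data is the sample from the question.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory holding copies of the sample data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'Apple' 'Boy: Goes to school' 'Cat' > File1
printf '%s\n' 'Boy' 'Dog' > File2

# Keep whole File1 lines whose first word is not listed in File2.
awk -F'[ :]' 'NR == FNR { seen[$1]; next } !($1 in seen)' File2 File1 > Output1.txt

# Keep File2 strings that never appear as the first word of a File1 line.
awk -F'[ :]' 'NR == FNR { seen[$1]; next } !($1 in seen)' File1 File2 > Output2.txt

cat Output1.txt Output2.txt
```

Like the original, this compares only the first word of each line, so a string that appears later in a File1 line would not count as a match.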

interactive shell or bash script to manipulate a text file

I have a text file that contains 2 columns( example below )
Account_name Device_name
12345 1a3T567890f2
Values of the Device_name column then needs to be changed to:
Uppercase letters if letters exist (example 1A3T567890F2)
awk '{ print toupper($0) }' file.txt > file2.txt
The Colon symbol needs to be inserted to separate the value in to 2 char
chunks (example 1A:3T:56:78:90:F2)
sed 's/\(\w\w\)\(\w\w\)\(\w\w\)\(\w\w\)\(\w\w\)\(\w\w\)/\1:\2:\3:\4:\5:\6/g' file2.txt > file3.txt
I would like to create a script that does those two functions at once.
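You can just add \U at the start of your sed's replacement expression to switch everything that follows to uppercase, then run it directly on the original file. \U is a GNU sed extension, and -r is needed for the unescaped parentheses to act as groups:
sed -r 's/(\w\w)(\w\w)(\w\w)(\w\w)(\w\w)(\w\w)/\U\1:\2:\3:\4:\5:\6/g' file.txt > file3.txt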
Test run :
$ echo "1a3T567890f2" | sed -r 's/(\w\w)(\w\w)(\w\w)(\w\w)(\w\w)(\w\w)/\U\1:\2:\3:\4:\5:\6/g'
1A:3T:56:78:90:F2
You can do everything in awk:
awk '{$2=toupper($2);gsub(/[[:alnum:]]{2}/,"&:", $2);sub(/:[[:space:]]*$/,"",$2)}1' file
That's a bit more intuitive, and it works for device names of any length.

join two files based on a column when there is no one-to-one correspondence in bash script (awk, grep, sed)

file1.txt
112|9305|/inst.exe
112|9305|/lkj.exe
112|9305|/dje.jar
112|9305|/ind.pdf
112|9306|/ma.exe
112|9306|/ngg.pdf
112|9307|/jhhh.dat
112|9312|/ee.dat
112|9312|/qwq.dll
file2.txt
117|9305|www.gahan.com
117|9306|www.google.com
117|9312|www.mihan.com
117|9307|translate.com
expected output
112|9305|www.gahan.com/inst.exe
112|9305|www.gahan.com/lkj.exe
112|9305|www.gahan.com/dje.jar
112|9305|www.gahan.com/ind.pdf
112|9306|www.google.com/ma.exe
112|9306|www.google.com/ngg.pdf
112|9307|translate.com/jhhh.dat
112|9312|www.mihan.com/ee.dat
112|9312|www.mihan.com/qwq.dll
I want to prepend the third column of file2.txt to the third column of file1.txt, based on the second column values. In effect I want to join the files on the second column, but there is no one-to-one correspondence between their lines. How can I do this with awk, grep, or sed in a shell script?
You can use awk like this:
awk 'BEGIN{FS=OFS="|"} FNR==NR{a[$2]=$3; next} $2 in a{$3=a[$2] $3} 1' file2.txt file1.txt
112|9305|www.gahan.com/inst.exe
112|9305|www.gahan.com/lkj.exe
112|9305|www.gahan.com/dje.jar
112|9305|www.gahan.com/ind.pdf
112|9306|www.google.com/ma.exe
112|9306|www.google.com/ngg.pdf
112|9307|translate.com/jhhh.dat
112|9312|www.mihan.com/ee.dat
112|9312|www.mihan.com/qwq.dll
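For readers new to the two-file awk idiom, here is the same command spread over several lines and run on the sample data. It is a sketch only; the scratch directory, the array name `host`, and the output file `joined.txt` are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory holding copies of the sample data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > file1.txt <<'EOF'
112|9305|/inst.exe
112|9305|/lkj.exe
112|9305|/dje.jar
112|9305|/ind.pdf
112|9306|/ma.exe
112|9306|/ngg.pdf
112|9307|/jhhh.dat
112|9312|/ee.dat
112|9312|/qwq.dll
EOF

cat > file2.txt <<'EOF'
117|9305|www.gahan.com
117|9306|www.google.com
117|9312|www.mihan.com
117|9307|translate.com
EOF

awk 'BEGIN { FS = OFS = "|" }
     # file2.txt is read first: map the key in column 2 to the host in column 3.
     FNR == NR { host[$2] = $3; next }
     # file1.txt: when the key is known, put the host in front of the path.
     $2 in host { $3 = host[$2] $3 }
     # Print every file1.txt line, joined or not.
     { print }' file2.txt file1.txt > joined.txt

cat joined.txt
```

Since each key occurs once in file2.txt but many times in file1.txt, loading file2.txt into the array first is what makes the one-to-many join work. A file1.txt line whose key is missing from file2.txt is printed unchanged.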
