awk/sed: How to perform nested replace operation with two files? - bash

I'm a complete newbie to bash operations. I have two files, let's call them file A and file B.
In file A I have a line like this:
STRING_TO_BE_SEARCHED = "SOME_STRING_IN_FILE_A"
In file B I also have a similar line where just the string differs, like this:
STRING_TO_BE_SEARCHED = "SOME_STRING_IN_FILE_B"
What I need to do is to find the lines that start with STRING_TO_BE_SEARCHED in both files and replace the corresponding line in file B with the corresponding line in A.
How can I achieve this? Is it possible to do this in a single command?

Using awk you can do this in a single command: scan fileA first for the given search string, then use its value to replace the corresponding line in fileB.
awk -v s='STRING_TO_BE_SEARCHED' 'BEGIN{ FS=OFS=" = " } FNR == NR && $1 == s {
a[$1] = $2; next } $1 in a { $2 = a[$1] } 1' fileA fileB
To save changes into fileB use:
awk -v s='STRING_TO_BE_SEARCHED' 'BEGIN{ FS=OFS=" = " } FNR == NR && $1 == s {
a[$1] = $2; next } $1 in a { $2 = a[$1] } 1' fileA fileB > $$.tmp && mv $$.tmp fileB
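For illustration, assuming fileA and fileB each contain the single sample lines shown in the question, fileB ends up holding the value from fileA:
$ cat fileB
STRING_TO_BE_SEARCHED = "SOME_STRING_IN_FILE_A"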

If you don't have any special chars, perhaps a two-step sed is easier:
key='STRING_TO_BE_SEARCHED *= *'; \
val=$(sed 's/'"$key"'//' fileA); \
sed -r 's/('"$key"').*/\1'"$val"'/' fileB
This prints:
STRING_TO_BE_SEARCHED = "SOME_STRING_IN_FILE_A"
You can make the second sed replacement in place by adding the -i option.
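For example, keeping the first step as-is and editing fileB directly (GNU sed):
key='STRING_TO_BE_SEARCHED *= *'
val=$(sed 's/'"$key"'//' fileA)
sed -r -i 's/('"$key"').*/\1'"$val"'/' fileB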

1- Store the line in the variable lineA:
lineA=$(perl -ne 'if(/^\QSTRING_TO_BE_SEARCHED\E/){print;exit}' fileA)
2- Replace the line in fileB; the old file is saved as fileB.BAK:
perl -i.BAK -pe 'BEGIN{$line=shift(@ARGV)}if(/^\QSTRING_TO_BE_SEARCHED\E/){$_="$line\n"}' "$lineA" fileB
or create a new file fileB.new without changing fileB:
perl -pe 'BEGIN{$line=shift(@ARGV)}if(/^\QSTRING_TO_BE_SEARCHED\E/){$_="$line\n"}' "$lineA" fileB > fileB.new
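With the sample lines from the question, fileB.new should then contain the line taken from fileA, while fileB itself is left untouched:
$ grep STRING_TO_BE_SEARCHED fileB.new
STRING_TO_BE_SEARCHED = "SOME_STRING_IN_FILE_A"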

Related

awk output to file based on filter

I have a big CSV file that I need to cut into different pieces based on the value in one of the columns. My input file dataset.csv is something like this:
NOTE: edited to clarify that the data is comma-separated, with no spaces around the commas.
action,action_type, Result
up,1,stringA
down,1,stringB
left,2,stringC
So, to split by action_type I simply do (I need the whole matching line in the resulting file):
awk -F, '$2 ~ /^1$/ {print}' dataset.csv >> 1_dataset.csv
awk -F, '$2 ~ /^2$/ {print}' dataset.csv >> 2_dataset.csv
This works as expected, but I am basically traversing my original dataset twice. My original dataset is about 5GB and I have 30 action_type categories. I need to do this every day, so I need to script the thing to run on its own efficiently.
I tried the following but it does not work:
# This is a file called myFilter.awk
{
action_type=$2;
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
Then I run it as:
awk -f myFilter.awk dataset.csv
But I get nothing. Literally nothing, not even errors. That sort of tells me that my code is simply not matching anything or that my print / pipe statement is wrong.
You may try this awk to do it in a single command:
awk -F, 'NR > 1{fn = $2 "_dataset.csv"; print >> fn; close(fn)}' dataset.csv
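With the sample dataset.csv above this should create one output file per action_type value, e.g.:
$ cat 1_dataset.csv
up,1,stringA
down,1,stringB
$ cat 2_dataset.csv
left,2,stringC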
With GNU awk to handle many concurrently open files and without replicating the header line in each output file:
awk -F',' 'NR>1{print > ($2 "_dataset.csv")}' dataset.csv
or if you also want the header line to show up in each output file then with GNU awk:
awk -F',' '
NR==1 { hdr = $0; next }
!seen[$2]++ { print hdr > ($2 "_dataset.csv") }
{ print > ($2 "_dataset.csv") }
' dataset.csv
or the same with any awk:
awk -F',' '
NR==1 { hdr = $0; next }
{ out = $2 "_dataset.csv" }
!seen[$2]++ { print hdr > out }
{ print >> out; close(out) }
' dataset.csv
As currently coded the input field separator has not been defined.
Current:
$ cat myfilter.awk
{
action_type=$2;
if (action_type=="1") print $0 >> 1_dataset.csv;
else if (action_type=="2") print $0 >> 2_dataset.csv;
}
Invocation:
$ awk -f myfilter.awk dataset.csv
There are a couple of ways to address this (note that the output file names must also be quoted; an unquoted 1_dataset.csv is not a valid awk expression):
$ awk -v FS="," -f myfilter.awk dataset.csv
or
$ cat myfilter.awk
BEGIN {FS=","}
{
action_type=$2
if (action_type=="1") print $0 >> "1_dataset.csv";
else if (action_type=="2") print $0 >> "2_dataset.csv";
}
$ awk -f myfilter.awk dataset.csv

Edit multiple columns in a line using awk command?

I'm trying to edit 3 columns in a file if the value in column 1 equals a specific string. This is my current attempt:
cp file file.copy
awk -F':' 'OFS=":" { if ($1 == "root1") $2="test"; print}' file.copy>file
rm file.copy
I've only been able to get the awk command working with one column being changed, I want to be able to edit $3 and $8 as well. Is this possible in the same command? Or is it only possible with separate awk commands or with a different command all together?
Edit note: in the real command I'll be passing variables to the columns, i.e. $2=$var.
It'll be used to edit the /etc/passwd file; a sample input line:
root:$6$fR7Vrjyp$irnF38R/htMSuk0efLSnAten/epf.5v7gfs0q.NcjKcFPeJmB/4TnnmgaAoTUE9.n4p4UyWOgFwB1guJau8AL.:17976::::::
You can create multiple statements for the if condition with a block {}.
awk -F':' 'OFS=":" { if ($1 == "root1") {$2="test"; $3="test2";} print}' file.copy>file
You can also improve your command by using awk's default workflow, condition{commands}, plus a trailing 1 so that non-matching lines are still printed. For this you need to pass OFS as an input variable (-v flag):
awk -F':' -v OFS=":" '$1=="root1"{$2="test"; $3="test2"}1' file.copy>file
You may use
# Fake sample values
v1=pass1
v2=pass2
awk -v var1="$v1" -v var2="$v2" 'BEGIN{FS=OFS=":"} $1 == "root1" { $2 = var1; $3 = var2}1' file > tmp && mv tmp file
See the demo:
s="root1:xxxx:yyyy
root11:xxxx:yyyy
root1:zzzz:cccc"
v1=pass1
v2=pass2
awk -v var1="$v1" -v var2="$v2" 'BEGIN{FS=OFS=":"} $1 == "root1" { $2 = var1; $3 = var2}1' <<< "$s"
Output:
root1:pass1:pass2
root11:xxxx:yyyy
root1:pass1:pass2
Note:
-v var1="$v1" -v var2="$v2" pass the variables you need to use in the awk command
BEGIN{FS=OFS=":"} set the field separator
$1 == "root1" check if Field 1 is equal to some value
{ $2 = var1; $3 = var2 } set Field 2 and 3 values
1 calls the default print command
file > tmp && mv tmp file writes the result to a temporary file and renames it over the original, emulating an in-place edit.
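For the /etc/passwd case from the question, a sketch extending the same pattern to fields $2, $3 and $8; the variables v1, v2 and v3 here are hypothetical placeholders for whatever values you pass in:
# Hypothetical new values for fields 2, 3 and 8
v1=newval2; v2=newval3; v3=newval8
awk -v var1="$v1" -v var2="$v2" -v var3="$v3" '
BEGIN { FS = OFS = ":" }                            # passwd fields are colon-separated
$1 == "root1" { $2 = var1; $3 = var2; $8 = var3 }   # edit the three fields on a match
1                                                   # default action: print every line
' file > tmp && mv tmp file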

Copy files containing all lines of an input file

I want to copy files in a directory which contain all the lines of an inputFile. Here is an example:
inputFile
Line3
Line1
LineX
Line4
LineB
file1
Line1
Line2
LineX
LineB
file2
Line100
Line10
LineB
Line4
LineX
Line3
Line1
Line4
Line1
The script is expected to copy only file2 to a destination directory since all lines of the inputFile are found in file2 but not in file1.
I could compare each file with inputFile individually, as partly discussed here, and copy a file manually if the script produces no output. That is:
awk 'NR==FNR{a[$0];next}!($0 in a)' file1 inputFile
Line3
Line4
awk 'NR==FNR{a[$0];next}!($0 in a)' file2 inputFile
The first run prints the lines missing from file1, so file1 should not be copied; the second run produces no output, indicating all lines of inputFile are found in file2, so do a cp file2 ../distDir/.
This is time-consuming, and I hope there is some way to do it in a for loop. I am not particular about awk; any bash scripting tool can be used.
Thank you,
Assuming the following:
All the files you need to check are in the current directory
The base file is also in the current directory and named inputFile
The target path is ../distDir/
You may run a BASH script like the following which basically loops over all the files, compares them against the base file and copies them if required.
#!/bin/bash
inputFile="./inputFile"
targetDir="../distDir/"
for file in *; do
    [ "$file" = "inputFile" ] && continue            # skip the base file itself
    dif=$(awk 'NR==FNR{a[$0];next}!($0 in a)' "$file" "$inputFile")
    if [ "$dif" == "" ]; then
        # File contains all lines, copy
        cp "$file" "$targetDir"
    fi
done
bash (with comm + wc commands) solution:
#!/bin/bash
n=$(wc -l inputFile | cut -d' ' -f1) # number of lines of inputFile
for f in /yourdir/file*
do
if [[ $n == $(comm -12 <(sort inputFile) <(sort "$f") | wc -l | cut -d' ' -f1) ]]
then
cp "$f" "/dest/${f##*/}"
fi
done
comm -12 FILE1 FILE2 - output only lines that appear in both files
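For example, with the sample inputFile and file2 from the question (comm needs sorted input, hence the process substitutions):
$ comm -12 <(sort inputFile) <(sort file2) | wc -l
5
which equals the 5 lines of inputFile, so file2 gets copied.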
Could you please try the following and let me know if it helps.
The system() call says "echo cp " val " destination_path", so remove the echo and put in the actual destination path once you are happy with the echoed result (which simply prints, e.g., cp file2 destination_path).
awk 'function check(array,val,count){
if(length(array)==count){
system("echo cp " val " destination_path")
}
}
FNR==NR{
a[$0];
next
}
val!=FILENAME{
check(a,val,count)
}
FNR==1{
val=FILENAME;
count=total="";
delete b
}
($1 in a) && !b[$1]++{
count++
}
END{
check(a,val,count)
}
' Input_file file1 file2
EDIT1: As per the OP, the names of the files to compare against Input_file could be anything, so the code was changed to pick them up with find.
find -type f -exec awk 'function check(array,val,count){
if(length(array)==count){
system("echo cp " val " destination_path")
}
}
FNR==NR{
a[$0];
next
}
val!=FILENAME{
check(a,val,count)
}
FNR==1{
val=FILENAME;
count=total="";
delete b
}
($1 in a) && !b[$1]++{
count++
}
END{
check(a,val,count)
}
' Input_file {} +
Explanation: an annotated version of the same code follows.
find -type f -exec awk 'function check(array,val,count){ ##Use the find command to get only the files in the directory and pass them to awk via -exec. The awk code starts here: create a function named check, with parameters array, val and count, to be called whenever a file has been read completely.
if(length(array)==count){ ##Check if the length of array equals the variable count; if yes, every line of Input_file was found in the current file, so do the following.
system("echo cp " val " destination_path") ##The awk system function lets us run shell commands from an awk script. echo is written here for checking purposes only; it prints the copy command whenever all lines match Input_file. Remove the echo once you are happy with the result.
}
}
FNR==NR{ ##The condition FNR==NR is only TRUE while the very first file, Input_file, is being read.
a[$0]; ##Create an array named a whose index is the current line.
next ##The next keyword skips all further statements for this line.
}
val!=FILENAME{ ##When the variable val differs from the current file name, the previous file has been fully read, so
check(a,val,count) ##call the check function, passing array a, val and count.
}
FNR==1{ ##FNR==1 is TRUE whenever the first line of a new file is being read.
val=FILENAME; ##Store the current file name in the variable val.
count=total=""; ##Reset the variables count and total.
delete b ##Delete array b.
}
($1 in a) && !b[$1]++{ ##If the first field is in array a and has not been seen in this file before, then
count++ ##increment the variable count.
}
END{ ##In the awk END block,
check(a,val,count) ##call the function check once more for the last file read.
}
' Input_file {} + ##Mention Input_file here as the first input.
PS: I tested/wrote this in GNU awk.

AWK: Compare two CSV files

I have two CSV files and I want to compare them using AWK and generate a new file.
file1.csv:
"no","loc"
"abc121","C:/pro/in"
"abc122","C:/pro/abc"
"abc123","C:/pro/xyz"
"abc124","C:/pro/in"
file2.csv:
"no","loc"
"abc121","C:/pro/in"
"abc122","C:/pro/abc"
"abc125","C:/pro/xyz"
"abc126","C:/pro/in"
output.csv:
"file1","file2","Diff"
"abc121","abc121","Match"
"abc122","abc122","Match"
"abc123","","Unmatch"
"abc124","","Unmatch"
"","abc125","Unmatch"
"","abc126","Unmatch"
One way with awk:
script.awk:
BEGIN {
FS = ","
}
NR>1 && NR==FNR {
a[$1] = $2
next
}
FNR>1 {
print ($1 in a) ? $1 FS $1 FS "Match" : "\"\"" FS $1 FS "Unmatch"
delete a[$1]
}
END {
for (x in a) {
print x FS "\"\"" FS "Unmatch"
}
}
Output:
$ awk -f script.awk file1.csv file2.csv
"abc121","abc121",Match
"abc122","abc122",Match
"","abc125",Unmatch
"","abc126",Unmatch
"abc124","",Unmatch
"abc123","",Unmatch
I didn't use awk alone, but if I understood the gist of what you're asking correctly, I think this long one-liner should do it...
join -t, -a 1 -a 2 -o 1.1 2.1 1.2 2.2 file1.csv file2.csv | awk -F, '{ if ( $3 == $4 ) var = "\"Match\""; else var = "\"Unmatch\"" ; print $1","$2","var }' | sed -e '1d' -e 's/^,/"",/' -e 's/,$/,""/' -e 's/,,/,"",/g'
Description:
The join portion takes the two CSV files, joins them on the first column (default behavior of join) and outputs all four fields (-o 1.1 2.1 1.2 2.2), making sure to include rows that are unmatched for both files (-a 1 -a 2).
The awk portion takes that output and appends either "Match" or "Unmatch" based on whether the 3rd and 4th columns are in fact equal. I had to make an assumption about this behavior based on your example.
The sed portion deletes the "no","loc" header from the output (-e '1d') and replaces empty fields with open-close quote marks (-e 's/^,/"",/' -e 's/,$/,""/' -e 's/,,/,"",/g'). This last part might not be necessary for you.
EDIT:
As tripleee points out, the above fails if the two initial files are unsorted. Here's an updated command to fix that. It punts the header line and sorts each file before passing them to join...
join -t, -a 1 -a 2 -o 1.1 2.1 1.2 2.2 <( sed 1d file1.csv | sort ) <( sed 1d file2.csv | sort ) | awk -F, '{ if ( $3 == $4 ) var = "\"Match\""; else var = "\"Unmatch\"" ; print $1","$2","var }' | sed -e 's/^,/"",/' -e 's/,$/,""/' -e 's/,,/,"",/g'
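With the sample file1.csv and file2.csv this should print:
"abc121","abc121","Match"
"abc122","abc122","Match"
"abc123","","Unmatch"
"abc124","","Unmatch"
"","abc125","Unmatch"
"","abc126","Unmatch"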

How to replace all but the last match in a file using bash?

Assuming bash and a configuration file like:
param-a=aaaaaa
param-b=bbbbbb
param-foo=first occurence <-- Replace
param-c=cccccc
# param-foo=first commented foo <-- Commented: don't replace
param-d=dddddd
param-e=eeeeee
param-foo=second occurence <-- Replace
param-foo=third occurence <-- Last active: don't replace
param-x=xxxxxx1
param-f=ffffff
# param-foo=second commented foo <-- Commented: don't replace
param-x=xxxxxx2
in which multiple commented or uncommented param-foo lines can appear,
how can you comment out all the uncommented param-foos except the very last active one,
resulting in:
param-a=aaaaaa
param-b=bbbbbb
# param-foo=first occurence <-- Replaced
param-c=cccccc
# param-foo=first commented foo <-- Left
param-d=dddddd
param-e=eeeeee
# param-foo=second occurence <-- Replaced
param-foo=third occurence <-- Left
param-x=xxxxxx1
param-f=ffffff
# param-foo=second commented foo <-- Left
param-x=xxxxxx2
Two parts of the question:
1. How to do it with only one known repeating param?
(only param-foo in the example above)
2. How to do it with all multiple active params at once?
(param-foo + param-x in the example above)
Attention: in this case I don't know the names of the repeating params in advance!
Thanks
If awk is acceptable, this will do it for param-foo and param-x:
awk -F= -v p='param-foo param-x' 'BEGIN {
ARGV[ARGC++] = ARGV[ARGC - 1]
n = split(p, t, OFS)
for (i = 0; ++i <= n;) _p[t[i]]
}
NR == FNR {
$1 in _p && nr[$1] = NR
next
}
$1 in nr && FNR != nr[$1] {
$0 = "# " $0
}1' infile
You may use a single parameter: p=param-x or add more parameters separated by spaces: p='param-1 param-2 ... param-n'.
Edit: I'm assuming the real input file looks like this:
param-a=aaaaaa
param-b=bbbbbb
param-foo=first occurence
param-c=cccccc
# param-foo=commented foo
param-d=dddddd
param-e=eeeeee
param-foo=second occurence
param-foo=third occurence
param-x=xxxxxx1
param-f=ffffff
param-x=xxxxxx2
Let me know if it's different.
Second edit: providing a solution for mawk users:
awk -F= -v p='param-foo param-x' 'BEGIN {
n = split(p, t, OFS)
for (i = 0; ++i <= n;) _p[t[i]]
}
NR == FNR {
$1 in _p && nr[$1] = NR
next
}
$1 in nr && FNR != nr[$1] {
$0 = "# " $0
}1' infile infile
Adding a solution for the latest requirement:
awk -F= 'NR == FNR {
if (NF && !/^#/)
_p[$1]++ && nr[$1] = NR
next
}
$1 in nr && FNR != nr[$1] {
$0 = "# " $0
}1' infile infile
I have not fully tested the script, but it worked on the first example:
#!/bin/bash
input_file=/path/to/your/input/file
last_occurence=`nl $input_file | grep 'param-foo' | grep -v '#' | tail -1 | awk -F" " '{print $1}'`
sed -i '/#/!s/param-foo/# param-foo/g' $input_file
sed -i "${last_occurence}s/# param-foo/param-foo/" $input_file
It's very straightforward logic. First we get the last occurrence of param-foo that is not commented.
The first sed goes and comments all param-foo, which are not commented.
The second sed uses the line number of the last occurrence of param-foo and removes the # character. You can easily wrap that in a function and use it inside a loop, providing a list of parameters instead of only one, as sketched below.
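A sketch of such a wrapper, assuming GNU sed for -i, param names free of regex metacharacters, and a hypothetical config_file:
# Comment every uncommented occurrence of a param in a file except the last one
comment_all_but_last() {
    local param=$1 file=$2 last
    # line number of the last uncommented occurrence (commented lines start with #)
    last=$(grep -n "^$param" "$file" | tail -1 | cut -d: -f1)
    [ -z "$last" ] && return                          # param not present: nothing to do
    sed -i '/#/!s/'"$param"'/# '"$param"'/g' "$file"  # comment all active occurrences
    sed -i "${last}s/# $param/$param/" "$file"        # uncomment the last one again
}

for p in param-foo param-x; do
    comment_all_but_last "$p" config_file
done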
A bit slow for long files, but should work for all the parameters (it extracts every uncommented param name, finds the last occurrence of each, builds matching sed commands that comment every other occurrence, and applies them with sed -f-):
grep -v ^# $file |
cut -f1 -d= |
sort -u |
sed 's/^/grep -n . '$file' |
tac |
grep -m1 :/;s/$/= /' |
bash |
sed -r 's%([0-9]+):(.*)=(.*)%\1!s/^\2=/# \2=/%' |
sed -f- $file
This might work:
param="param-foo"
tac input_file |sed '/#/!{/'"$param"'/{x;/./{x;s/'"$param"'/# &/;t};x;h;}}'|tac >output_file
For multiple params:
cp input_file{,.backup}
params=(param-{foo,bar,baz})
tac input_file >backwards_file
for param in "${params[@]}"; do
sed -i '/#/!{/'"$param"'/{x;/./{x;s/'"$param"'/# &/;t};x;h;}}' backwards_file
done
tac backwards_file >output_file
Turn input_file backwards, prepend a comment # to all but the first occurrence of $param, then reverse the file back.
EDIT:
To extract the params from the file use this piece of code:
params=($(sed -rn '/^#/d;/^$/!s/^\s*([^=]*).*/\1/gp' input_file | sort | uniq))
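Putting the two pieces together for the unknown-params case, a sketch reusing the backwards-file loop above:
params=($(sed -rn '/^#/d;/^$/!s/^\s*([^=]*).*/\1/gp' input_file | sort | uniq))
tac input_file >backwards_file
for param in "${params[@]}"; do
    sed -i '/#/!{/'"$param"'/{x;/./{x;s/'"$param"'/# &/;t};x;h;}}' backwards_file
done
tac backwards_file >output_file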
