Copy files containing all lines of an input file - bash

I want to copy files in a directory which contain all the lines of an inputFile. Here is an example:
inputFile
Line3
Line1
LineX
Line4
LineB
file1
Line1
Line2
LineX
LineB
file2
Line100
Line10
LineB
Line4
LineX
Line3
Line1
Line4
Line1
The script is expected to copy only file2 to a destination directory since all lines of the inputFile are found in file2 but not in file1.
I could compare an individual file with inputFile, as partly discussed here, and copy files manually if the script produced no output. That is:
awk 'NR==FNR{a[$0];next}!($0 in a)' file1 inputFile
Line3
Line4
awk 'NR==FNR{a[$0];next}!($0 in a)' file2 inputFile
This output shows that file1 is missing lines, so there is no need to copy it; running the same command with file2 in place of file1 produces no output, indicating that all lines of inputFile are found in file2, so do a cp file2 ../distDir/.
Doing this file by file is time consuming, and I hope there is some way I could do it in a for loop. I am not particular about awk; any bash scripting tool can be used.
Thank you,

Assuming the following:
All the files you need to check are in the current directory
The base file is also in the current directory and named inputFile
The target path is ../distDir/
You may run a BASH script like the following which basically loops over all the files, compares them against the base file and copies them if required.
#!/bin/bash
inputFile="./inputFile"
targetDir="../distDir/"
for file in *; do
    [ "$file" = "${inputFile##*/}" ] && continue   # skip the base file itself
    dif=$(awk 'NR==FNR{a[$0];next}!($0 in a)' "$file" "$inputFile")
    if [ "$dif" == "" ]; then
        # File contains all lines, copy
        cp "$file" "$targetDir"
    fi
done
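If you are not tied to awk (the question allows any bash tool), the same check can also be written around grep; the following is an untested sketch, not part of the answer above. grep -Fxvf "$file" "$inputFile" prints the inputFile lines that do not occur verbatim in the candidate file, so an exit status of 1 (nothing printed) means every line was found.
#!/bin/bash
# Untested grep-based variant of the loop above.
# -F fixed strings, -x whole-line match, -v invert, -q quiet, -f read patterns from a file.
inputFile="./inputFile"
targetDir="../distDir/"
for file in *; do
    [ -f "$file" ] || continue                      # only regular files
    [ "$file" = "${inputFile##*/}" ] && continue    # skip the base file itself
    # grep exits 1 when no inputFile line is missing from "$file" -> copy in that case
    if ! grep -Fxvqf "$file" "$inputFile"; then
        cp "$file" "$targetDir"
    fi
done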

bash (with comm + wc commands) solution:
#!/bin/bash
n=$(wc -l inputFile | cut -d' ' -f1) # number of lines of inputFile
for f in /yourdir/file*
do
if [[ $n == $(comm -12 <(sort inputFile) <(sort "$f") | wc -l | cut -d' ' -f1) ]]
then
cp "$f" "/dest/${f##*/}"
fi
done
comm -12 FILE1 FILE2 outputs only the lines that appear in both (sorted) files.
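To make the counting concrete, here is roughly what the comparison gives for the sample files from the question (untested transcript; inputFile has n=5 lines):
$ comm -12 <(sort inputFile) <(sort file1) | wc -l   # only Line1, LineB, LineX are common
3
$ comm -12 <(sort inputFile) <(sort file2) | wc -l   # all five lines of inputFile are found
5
Only file2 matches n, so only file2 gets copied.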

Could you please try the following and let me know if it helps you.
The copy is wrapped in system("echo cp " val " destination_path"), so for now it only prints the command (e.g. cp file2 destination_path); once you are happy with that output, remove the echo and put the actual destination path in place of destination_path.
awk 'function check(array,val,count){
if(length(array)==count){
system("echo cp " val " destination_path")
}
}
FNR==NR{
a[$0];
next
}
val!=FILENAME{
check(a,val,count)
}
FNR==1{
val=FILENAME;
count=total="";
delete b
}
($1 in a) && !b[$1]++{
count++
}
END{
check(a,val,count)
}
' Input_file file1 file2
Will add explanation shortly too.
EDIT1: As per the OP, the names of the files to be compared against Input_file could be anything, so the code was changed to use find as per that request.
find . -type f -exec awk 'function check(array,val,count){
if(length(array)==count){
system("echo cp " val " destination_path")
}
}
FNR==NR{
a[$0];
next
}
val!=FILENAME{
check(a,val,count)
}
FNR==1{
val=FILENAME;
count=total="";
delete b
}
($1 in a) && !b[$1]++{
count++
}
END{
check(a,val,count)
}
' Input_file {} +
Explanation: Adding explanation too as follows.
find . -type f -exec awk 'function check(array,val,count){ ##Using the find command to get only the files in the directory and passing them to awk via -exec. From here the awk code starts: creating a function named check, which takes the parameters array, val and count whenever a call is made to it.
if(length(array)==count){ ##Checking here if length of array is equal to variable count, if yes then do following action.
system("echo cp " val " destination_path")##Using awks system function here by which we could execute shell commands in awk script, so I have written here echo to only check purposes initially, it will print copy command if any files al lines are matching to Input_file file, if OP is happy with it OP should remove echo then.
}
}
FNR==NR{ ##The FNR==NR condition will only be TRUE while the very first file, named Input_file, is being read.
a[$0]; ##creating an array named a whose index is the current line.
next ##the next keyword skips all further statements.
}
val!=FILENAME{ ##when variable val does not hold the current file name, perform the following action.
check(a,val,count) ##calling the check function with the arguments array a, val and count.
}
FNR==1{ ##Checking FNR==1, which is true whenever the first line of a new file is being read.
val=FILENAME; ##creating a variable named val whose value is the current file name.
count=total=""; ##Nullifying variables count and total now.
delete b ##Deleting array b here.
}
($1 in a) && !b[$1]++{ ##Checking if the first field of the file is in array a and has not been seen before in array b; if so, do the following.
count++ ##incrementing variable count by 1 each time the cursor comes in here.
}
END{ ##starting awk END block here.
check(a,val,count) ##Calling function named check with arguments array a,val and count in it.
}
' Input_file {} + ##Mentioning Input_file here
PS: I tested/wrote this in GNU awk.
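For the sample directory from the question the dry run would print something like the transcript below. This is a hedged sketch, not part of the original answer: the awk program is assumed to be saved as check.awk (a hypothetical name), and ! -name Input_file is added so the base file is not compared against itself.
$ find . -type f ! -name Input_file -exec awk -f check.awk Input_file {} +
cp ./file2 destination_path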

Related

awk from file using echo and output to file

A.txt contains:
/*333*/
asdfasdfadfg
sadfasdfasgadas
###
/*555*/
hfawehfihohawe
aweihfiwahif
aiwehfwwh
###
/*777*/
jawejfiawjia
ajwiejfjeiie
eiuehhawefjj
###
B.txt contains:
555
777
I want to create a loop: for each string found in B.txt, output the block starting at '/*'[the string]'*/' up to (but not including) the first '###' that follows, each block to its own file (the string is also used as the file name).
So based on the sample above, the result should be :
555.txt, which contains:
/*555*/
hfawehfihohawe
aweihfiwahif
aiwehfwwh
and 777.txt, which contains:
/*777*/
jawejfiawjia
ajwiejfjeiie
eiuehhawefjj
I tried this script but it outputs nothing:
for i in `cat B.txt`; do echo $i | awk '/{print "/*"$1}/{flag=1} /###/{flag=0} flag' A.txt > $i.txt; done
Thank you in advance
With your shown samples, please try the following awk code. Written and tested in GNU awk; it should work in any awk.
awk '
FNR==NR{
if($0~/^\/\*/){
line=$0
gsub(/^\/\*|\*\/$/,"",line)
arr[++count]=$0
arr1[line]=count
next
}
if($0=="###"){ next }
arr[count]=(arr[count]?arr[count] ORS:"") $0
next
}
($0 in arr1){
outputFile=$0".txt"
print arr[arr1[$0]] >> (outputFile)
close(outputFile)
}
' A.txt B.txt
Explanation: Adding detailed explanation for above code.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when file1 is being read.
if($0~/^\/\*/){ ##Checking condition if current line starts with /* then do following.
line=$0 ##Setting $0 to line variable here.
gsub(/^\/\*|\*\/$/,"",line) ##using gsub to globally substitute starting /* and ending */ with NULL in line here.
arr[++count]=$0 ##Creating arr with index of ++count and value is $0.
arr1[line]=count ##Creating arr1 with index of line and value of count.
next ##next will skip all further statements from here.
}
if($0=="###"){ next } ##Skipping the ### separator lines so they do not end up in the stored blocks.
arr[count]=(arr[count]?arr[count] ORS:"") $0 ##Creating arr with index of count and keep appending values of same count values with current line value.
next ##next will skip all further statements from here.
}
($0 in arr1){ ##checking if current line is present in arr1 then do following.
outputFile=$0".txt" ##Creating outputFile with current line .txt value here.
print arr[arr1[$0]] >> (outputFile) ##Printing arr value with index of arr1[$0] to outputFile.
close(outputFile) ##Closing outputFile in backend to avoid too many opened files error.
}
' A.txt B.txt ##Mentioning the input file names here: the content file (A.txt) first, then the list of strings (B.txt).
Making a few alterations to your code provides the desired outcome with the example data provided:
while read -r f
do
awk -v var="/[*]$f[*]/" '$0 ~ var {flag=1} /###/{flag=0} flag' A.txt > "$f".txt
done < B.txt
cat 555.txt
/*555*/
hfawehfihohawe
aweihfiwahif
aiwehfwwh
cat 777.txt
/*777*/
jawejfiawjia
ajwiejfjeiie
eiuehhawefjj
Does this solve your problem?
Here is another awk solution for this:
awk '
FNR == NR {
map["/*" $0 "*/"] = $0
next
}
$0 in map {
fn = map[$0] ".txt"
}
/^###$/ {
close(fn)
fn = ""
}
fn {print > fn}' B.txt A.txt

issue for condition on unique rows in bash

I want to print the rows of a table into files; the issue is that when I use a read loop, the result gets reprinted several times. Here is my input file:
aa ,DEC ,file1.txt
aa ,CHAR ,file1.txt
cc ,CHAR ,file1.txt
dd ,DEC ,file2.txt
bb ,DEC ,file3.txt
bb ,CHAR ,file3.txt
cc ,DEC ,file1.txt
Here is the result I want to have:
printed in file1.txt
aa#DEC,CHAR
cc#CHAR,DEC
printed in file2.txt
dd#DEC
printed in file3.txt
bb#DEC,CHAR
here is my attempt:
(cat input.txt|while read line
do
table=`echo $line|cut -d"," -f1
variable=`echo $line|cut -d"," -f2
file=`echo $line|cut -d"," -f3
echo ${table}#${variable},
done ) > ${file}
This can be done in a single pass with GNU awk like this:
awk -F ' *, *' '{
map[$3][$1] = (map[$3][$1] == "" ? "" : map[$3][$1] ",") $2
}
END {
for (f in map)
for (d in map[f])
print d "#" map[f][d] > f
}' file
This will produce these files:
=== file1.txt ===
aa#DEC,CHAR
cc#CHAR,DEC
=== file2.txt ===
dd#DEC
=== file3.txt ===
bb#DEC,CHAR
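Note that map[$3][$1] is a true multidimensional (array-of-arrays) construct, which needs GNU awk 4.0 or later. For other awks, a rough, untested sketch of the same idea with a classic SUBSEP compound key could look like this:
awk -F ' *, *' '{
    key = $3 SUBSEP $1                      # output file name plus row label as a compound key
    map[key] = (map[key] == "" ? "" : map[key] ",") $2
}
END {
    for (key in map) {
        split(key, p, SUBSEP)               # p[1] = output file, p[2] = row label
        print p[2] "#" map[key] > p[1]
    }
}' file
As with the gawk version, the order of lines within each output file depends on the array traversal order.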
With your shown samples, could you please try the following; it was written and tested with the shown samples in GNU awk.
awk '
{
sub(/^,/,"",$3)
}
FNR==NR{
sub(/^,/,"",$2)
arr[$1,$3]=(arr[$1,$3]?arr[$1,$3]",":"")$2
next
}
(($1,$3) in arr){
close(outputFile)
outputFile=$3
print $1"#"arr[$1,$3] >> (outputFile)
delete arr[$1,$3]
}
' Input_file Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
{
sub(/^,/,"",$3) ##Substituting starting comma in 3rd field with NULL.
}
FNR==NR{ ##Checking condition FNR==NR will be true when first time Input_file is being read.
sub(/^,/,"",$2) ##Substituting starting comma with NULL in 2nd field.
arr[$1,$3]=(arr[$1,$3]?arr[$1,$3]",":"")$2 ##Creating arr with index of 1st and 3rd fields, which has 2nd field as value.
next ##next will skip all further statements from here.
}
(($1,$3) in arr){ ##Checking condition if 1st and 3rd fields are in arr then do following.
close(outputFile) ##Closing output file, to avoid "too many opened files" error.
outputFile=$3 ##Setting outputFile with value of 3rd field.
print $1"#"arr[$1,$3] >> (outputFile)
##printing 1st field # arr value and output it to outputFile here.
delete arr[$1,$3] ##Deleting array element with index of 1st and 3rd field here.
}
' Input_file Input_file ##Mentioning Input_file 2 times here.
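For the sample input this should recreate the files shown in the previous answer. A hypothetical usage, assuming the two-pass script above is saved as group.awk (remove any stale output files first, since the script appends with >>):
$ awk -f group.awk input.txt input.txt
$ cat file1.txt
aa#DEC,CHAR
cc#CHAR,DEC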
You have several errors in your code. You can use the built-in read to split on a comma, and the parentheses are completely unnecessary.
while IFS=, read -r table variable file
do
echo "${table}#${variable}," >>"$file"
done< input.txt
Using $file in a redirect after done is an error; the shell opens the redirection target once, before the loop runs, when file is not yet defined. And as per your requirements, each line should go to a different file anyway.
Notice also quoting fixes and the omission of the useless cat.
Wrapping fields with the same value onto the same line would be comfortably easy with an Awk postprocessor, but then you might as well do all of this in Awk, as in the other answer you already received.
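If you do want to keep the read loop and only merge afterwards, a rough, untested sketch of such a postprocessor (assuming the loop above produced lines like aa#DEC, in file1.txt) might be:
# merge lines that share the same table name; splitting on [#,] also drops the trailing comma
awk -F'[#,]' '{ a[$1] = (a[$1] ? a[$1] "," : "") $2 }
              END { for (t in a) print t "#" a[t] }' file1.txt
Redirect the result to a temporary file and move it back over file1.txt if you need the merged version in place; the order of the output lines is unspecified.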

Replacing text strings in bash using awk

I have a list of files in bash - file1.txt, file2.txt, file3.txt - and I would like to make another list that includes these strings without .txt, so
names2 = (file1, file2, file3)
Then, I would like to find these strings in a file and add a- before them. How to do that, please?
My Code:
names = (file1.txt, file2.txt, file3.txt)
for i in "${names[#]}"; do
awk '{ gsub("$i","a-$i") }' f.txt > g.txt
f.txt:
TEXT
\connect{file1}
\begin{file2}
\connect{file3}
TEXT
75
Desired output g.txt
TEXT
\connect{a-file1}
\begin{a-file2}
\connect{a-file3}
TEXT
75
With sed+printf:
$ names=(file1 file2 file3) # Declare array
$ printf 's/%s/a-&/g\n' "${names[#]}" # Generate sed replacement script
s/file1/a-&/g
s/file2/a-&/g
s/file3/a-&/g
$ sed -f <(printf 's/%s/a-&/g\n' "${names[@]}") f.txt
TEXT
\connect{a-file1}
\begin{a-file2}
\connect{a-file3}
TEXT
75
If your array elements contain the .txt suffix, use this:
$ names=(file1.txt file2.txt file3.txt) # Declare array
$ printf 's/%s/a-&/g\n' "${names[#]%.txt}" # Generate sed replacement script
s/file1/a-&/g
s/file2/a-&/g
s/file3/a-&/g
$ sed -f <(printf 's/%s/a-&/g\n' "${names[@]%.txt}") f.txt
TEXT
\connect{a-file1}
\begin{a-file2}
\connect{a-file3}
TEXT
75
If the file list contains names that overlap as substrings, you can use word boundaries (\<, \>) to handle this.
e.g.
$ cat f.txt
TEXT
\connect{file1}
\begin{file2}
\connect{file3file2}
TEXT
75
$ names=(file1.txt file2.txt file3file2.txt) # Declare array
$ printf 's/\<%s\>/a-&/g\n' "${names[@]%.txt}" # Generate sed replacement script
s/\<file1\>/a-&/g
s/\<file2\>/a-&/g
s/\<file3file2\>/a-&/g
$ sed -f <(printf 's/\<%s\>/a-&/g\n' "${names[@]%.txt}") f.txt
TEXT
\connect{a-file1}
\begin{a-file2}
\connect{a-file3file2}
TEXT
75
Could you please try the following.
Variable with values in shell:
string="file1.txt, file2.txt, file3.txt"
Creating a shell array as follows:
IFS=', ' read -r -a array <<< "$string"
OR, if you want to stick with your way of defining the array, then do:
array=(file1.txt file2.txt file3.txt)
Pass the shell array created above to awk and read Input_file to do the final operations.
awk -v arr="${array[*]}" '
BEGIN{
FS=OFS="{"
num=split(arr,array," ")
for(i=1;i<=num;i++){
sub(/\.txt/,"",array[i])
array1[array[i]"}"]
}
}
$2 in array1{
$2="a-"$2
}
1
' Input_file
Explanation: Adding explanation of above code here.
awk -v arr="${array[*]}" ' ##Creating a variable named arr whose value is all elements of array(shell array).
BEGIN{ ##Starting BEGIN section of awk code here.
FS=OFS="{" ##Setting FS and OFS as { here.
num=split(arr,array," ") ##Splitting arr variable into array named array with delimiter space and its length is stored in num variable.
for(i=1;i<=num;i++){ ##Starting for loop from i=1 to till value of variable num.
sub(/\.txt/,"",array[i]) ##Using sub to substitute .txt with NULL in array value whose index is variable named i.
array1[array[i]"}"] ##Creating an array1 whose index is array[i] value with } in it.
} ##Closing for loop here.
} ##Closing BEGIN section of code here.
$2 in array1{ ##Checking condition if $2 of current line is present in array named array1 then do following.
$2="a-"$2 ##Adding string a- with value of $2.
} ##Closing BLOCK for condition here.
1 ##Mentioning 1 will print edited/non-edited line of Input_file.
' Input_file ##Mentioning Input_file name here.
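For the sample f.txt the call would look roughly like this; a hypothetical usage assuming the awk program above is saved as prefix.awk:
$ array=(file1.txt file2.txt file3.txt)
$ awk -v arr="${array[*]}" -f prefix.awk f.txt > g.txt
$ cat g.txt
TEXT
\connect{a-file1}
\begin{a-file2}
\connect{a-file3}
TEXT
75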

How do I join lines using space and comma

I have the file that contains content like:
IP
111
22
25
I want to print the output in the format IP 111,22,25.
I have tried tr ' ' , but it's not working
Welcome to paste
$ paste -sd " ,," file
IP 111,22,25
Normally paste writes to standard output lines consisting of the sequentially corresponding lines of each given file, separated by a <tab> character. The -s option changes that: it pastes the lines of each file sequentially, again separated by a <tab>. With the -d flag you can give a list of delimiters to be used instead of the <tab>; the list is consumed one delimiter per join and reused from the start when it runs out. Here the list " ,," means: a space after the first line, then commas for the remaining lines of this four-line file.
In pure Bash:
# Read file into array
mapfile -t lines < infile
# Print to string, comma-separated from second element on
printf -v str '%s %s' "${lines[0]}" "$(IFS=,; echo "${lines[*]:1}")"
# Print
echo "$str"
Output:
IP 111,22,25
I'd go with:
{ read a; read b; read c; read d; } < file
echo "$a $b,$c,$d"
This will also work:
xargs printf "%s %s,%s,%s" < file
Try cat file.txt | tr '\n' ',' | sed "s/IP,/IP /g"
tr replaces the newlines with commas, sed then changes IP,111,22,25 into IP 111,22,25
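Note that the file's final newline also becomes a comma, so the pipeline as written leaves a trailing comma; an untested refinement could strip it:
tr '\n' ',' < file.txt | sed -e 's/^IP,/IP /' -e 's/,$//'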
The following awk script will do what was requested:
awk 'BEGIN{OFS=","} FNR==1{first=$0;next} {val=val?val OFS $0:$0} END{print first FS val}' Input_file
Explanation: Adding explanation for above code now.
awk ' ##Starting awk program here.
BEGIN{ ##Starting BEGIN section here of awk program.
OFS="," ##Setting OFS as comma, output field separator.
} ##Closing BEGIN section of awk here.
FNR==1{ ##Checking if line is first line then do following.
first=$0 ##Creating variable first whose value is current first line.
next ##next is a built-in awk keyword which skips all further statements from here.
} ##Closing FNR==1 BLOCK here.
{ ##This BLOCK will be executed for all lines apart from 1st line.
val=val?val OFS $0:$0 ##Creating variable val, appending the current line to its previous value each time.
}
END{ ##Mentioning awk END block here.
print first FS val ##Printing variable first FS(field separator) and variable val value here.
}' Input_file ##Mentioning Input_file name here which is getting processed by awk.
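With the sample file this should print:
$ awk 'BEGIN{OFS=","} FNR==1{first=$0;next} {val=val?val OFS $0:$0} END{print first FS val}' Input_file
IP 111,22,25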
Using Perl
$ cat captain.txt
IP
111
22
25
$ perl -0777 -ne ' @k=split(/\s+/); print $k[0]," ",join(",",@k[1..$#k]) ' captain.txt
IP 111,22,25
$

Extracting a field from a line with condition in bash

I am reading lines from a file and need to extract field 3 from lines in another file if fields 5 and 6 from the first file exist in the second file.
I tried to do so with the following but it doesn't work. I appreciate any help.
filename=file.txt
while read -r f1 f2 f3 f4 f5
do
awk '$17 == $f4 && $18 == $f5 {print $3}' file2.txt
done < "$filename"
The correct approach will be something like:
awk '
NR==FNR { a[$17,$18]=$3; next }
($4,$5) in a { print a[$4,$5] }
' file2.txt file.txt
but it's an untested guess since you haven't provided sample input/output yet.
You can do this all in awk, using getline:
awk '{var1=$5; var2=$6
while ((getline < "file2.txt") > 0)
if (index($0, var1) && index($0, var2)) print $3
close("file2.txt")
}' file1.txt
You are reading each line from file1.txt, putting field 5 & 6 into an awk variable to test later. Then using a while/getline to go through each line of the second file, and if both fields are found, then printing $3. Closing the file so that the next loop starts from record 1 of the second file.
Or, if you want to loop over file1 in bash and then use awk, you can pass the variables in (as mentioned here by someone else), or escape them out.
awk '{if ($2 == "'"$var1"'") print $3}' file2.txt
The single quotes are closed around the shell variable and it is wrapped in double quotes, so awk sees the value of $var1 as a string literal.
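The usual alternative, as hinted above, is to pass the value in with -v, which avoids the quote juggling entirely (a small sketch, with var1 taken from the surrounding loop):
awk -v v1="$var1" '$2 == v1 {print $3}' file2.txt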
