awk from file using echo and output to file - bash

A.txt contains:
/*333*/
asdfasdfadfg
sadfasdfasgadas
###
/*555*/
hfawehfihohawe
aweihfiwahif
aiwehfwwh
###
/*777*/
jawejfiawjia
ajwiejfjeiie
eiuehhawefjj
###
B.txt contains:
555
777
I want to loop over each string in B.txt and, for each one, write the block of A.txt that starts at '/*'[the string]'*/' and ends right before the next '###' to its own file (the string is also used as the file name).
So based on the sample above, the result should be:
555.txt, which contains:
/*555*/
hfawehfihohawe
aweihfiwahif
aiwehfwwh
and 777.txt, which contains:
/*777*/
jawejfiawjia
ajwiejfjeiie
eiuehhawefjj
I tried this script but it outputs nothing:
for i in `cat B.txt`; do echo $i | awk '/{print "/*"$1}/{flag=1} /###/{flag=0} flag' A.txt > $i.txt; done
Thank you in advance

With your shown samples, please try the following awk code. It was written and tested in GNU awk and should work in any awk.
awk '
FNR==NR{
if($0~/^\/\*/){
line=$0
gsub(/^\/\*|\*\/$/,"",line)
arr[++count]=$0
arr1[line]=count
next
}
arr[count]=(arr[count]?arr[count] ORS:"") $0
next
}
($0 in arr1){
outputFile=$0".txt"
print arr[arr1[$0]] >> (outputFile)
close(outputFile)
}
' file1 file2
Explanation: Adding detailed explanation for above code.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when file1 is being read.
if($0~/^\/\*/){ ##Checking condition if current line starts with /* then do following.
line=$0 ##Setting $0 to line variable here.
gsub(/^\/\*|\*\/$/,"",line) ##using gsub to globally substitute starting /* and ending */ with NULL in line here.
arr[++count]=$0 ##Creating arr with index of ++count and value is $0.
arr1[line]=count ##Creating arr1 with index of line and value of count.
next ##next will skip all further statements from here.
}
arr[count]=(arr[count]?arr[count] ORS:"") $0 ##Creating arr with index of count and keep appending values of same count values with current line value.
next ##next will skip all further statements from here.
}
($0 in arr1){ ##checking if current line is present in arr1 then do following.
outputFile=$0".txt" ##Creating outputFile with current line .txt value here.
print arr[arr1[$0]] >> (outputFile) ##Printing arr value with index of arr1[$0] to outputFile.
close(outputFile) ##Closing outputFile in backend to avoid too many opened files error.
}
' file1 file2 ##Mentioning Input_file names here.

Making a few alterations to your code provides the desired outcome with the example data provided:
while read -r f
do
awk -v var="/[*]$f[*]/" '$0 ~ var {flag=1} /###/{flag=0} flag' A.txt > "$f".txt
done < B.txt
cat 555.txt
/*555*/
hfawehfihohawe
aweihfiwahif
aiwehfwwh
cat 777.txt
/*777*/
jawejfiawjia
ajwiejfjeiie
eiuehhawefjj
Does this solve your problem?
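The original loop printed nothing because awk never reads its pattern from the piped echo: once A.txt is named as an argument, standard input is ignored, and the text between the slashes is taken as a literal regex. The following is a minimal sketch of a variant that passes each key with -v and compares the marker line as a plain string, so no regex escaping is needed. The temporary directory and the names workdir and key are illustrative assumptions, not part of the question.

```shell
#!/bin/sh
# Sketch: one output file per key, matching the marker line by string equality.
workdir=$(mktemp -d)

# Sample data from the question.
printf '%s\n' '/*333*/' 'asdfasdfadfg' 'sadfasdfasgadas' '###' \
              '/*555*/' 'hfawehfihohawe' 'aweihfiwahif' 'aiwehfwwh' '###' \
              '/*777*/' 'jawejfiawjia' 'ajwiejfjeiie' 'eiuehhawefjj' '###' \
              > "$workdir/A.txt"
printf '%s\n' 555 777 > "$workdir/B.txt"

while IFS= read -r key; do
    # flag turns on at the exact marker line and off at the next ### line;
    # the bare "flag" pattern prints every line while it is on.
    awk -v key="$key" '
        $0 == "/*" key "*/" { flag = 1 }
        $0 == "###"         { flag = 0 }
        flag
    ' "$workdir/A.txt" > "$workdir/$key.txt"
done < "$workdir/B.txt"

cat "$workdir/777.txt"
```

This still reads A.txt once per key, which is fine for small inputs; the single-pass awk answers avoid that cost.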

Here is another awk solution for this:
awk '
FNR == NR {
map["/*" $0 "*/"] = $0
next
}
$0 in map {
fn = map[$0] ".txt"
}
/^###$/ {
close(fn)
fn = ""
}
fn {print > fn}' B.txt A.txt

Related

Issue with a condition on unique rows in bash

I want to print the rows of a table to files, but when I read the input line by line the result is printed several times. Here is my input file:
aa ,DEC ,file1.txt
aa ,CHAR ,file1.txt
cc ,CHAR ,file1.txt
dd ,DEC ,file2.txt
bb ,DEC ,file3.txt
bb ,CHAR ,file3.txt
cc ,DEC ,file1.txt
Here is the result I want to have:
printed in file1.txt
aa#DEC,CHAR
cc#CHAR,DEC
printed in file2.txt
dd#DEC
printed in file3.txt
bb#DEC,CHAR
Here is my attempt:
(cat input.txt|while read line
do
table=`echo $line|cut -d"," -f1
variable=`echo $line|cut -d"," -f2
file=`echo $line|cut -d"," -f3
echo ${table}#${variable},
done ) > ${file}
This can be done in a single pass with GNU awk (it relies on gawk's arrays of arrays), like this:
awk -F ' *, *' '{
map[$3][$1] = (map[$3][$1] == "" ? "" : map[$3][$1] ",") $2
}
END {
for (f in map)
for (d in map[f])
print d "#" map[f][d] > f
}' file
This will write the following data:
=== file1.txt ===
aa#DEC,CHAR
cc#CHAR,DEC
=== file2.txt ===
dd#DEC
=== file3.txt ===
bb#DEC,CHAR
With your shown samples, please try the following, written and tested with the shown samples in GNU awk.
awk '
{
sub(/^,/,"",$3)
}
FNR==NR{
sub(/^,/,"",$2)
arr[$1,$3]=(arr[$1,$3]?arr[$1,$3]",":"")$2
next
}
(($1,$3) in arr){
close(outputFile)
outputFile=$3
print $1"#"arr[$1,$3] >> (outputFile)
delete arr[$1,$3]
}
' Input_file Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
{
sub(/^,/,"",$3) ##Substituting starting comma in 3rd field with NULL.
}
FNR==NR{ ##Checking condition FNR==NR will be true when first time Input_file is being read.
sub(/^,/,"",$2) ##Substituting starting comma with NULL in 2nd field.
arr[$1,$3]=(arr[$1,$3]?arr[$1,$3]",":"")$2
##Creating arr with index of 1st and 3rd fields, which has 2nd field as value.
next ##next will skip all further statements from here.
}
(($1,$3) in arr){ ##Checking condition if 1st and 3rd fields are in arr then do following.
close(outputFile) ##Closing output file, to avoid "too many opened files" error.
outputFile=$3 ##Setting outputFile with value of 3rd field.
print $1"#"arr[$1,$3] >> (outputFile)
##printing 1st field # arr value and output it to outputFile here.
delete arr[$1,$3] ##Deleting array element with index of 1st and 3rd field here.
}
' Input_file Input_file ##Mentioning Input_file 2 times here.
You have several errors in your code. You can use the built-in read to split on a comma, and the parentheses are completely unnecessary.
while IFS=, read -r table variable file
do
echo "${table}#${variable}," >>"$file"
done< input.txt
Using $file in a redirect after done is an error; the shell wants to open the file handle to redirect to before file is defined. Besides, as per your requirements, each line should go to a different file.
Notice also quoting fixes and the omission of the useless cat.
Wrapping fields with the same value onto the same line would be comfortably easy with an Awk postprocessor, but then you might as well do all of this in Awk, as in the other answer you already received.

Copy one csv header to another csv with type modification

I want to copy the header of one csv into another csv, one column per row, with some modifications.
Input csv
name,"Mobile Number","mobile1,mobile2",email2,Address,email21
test, 123456789,+123456767676,a#test.com,testaddr,a1#test.com
test1,7867778,8799787899898,b#test,com, test2addr,b2#test.com
In the new csv it should look like this, and the file should also be created. For the string column I will pass the column name, so only that column will be converted to string.
name.auto()
Mobile Number.auto()
mobile1,mobile2.string()
email2.auto()
Address.auto()
email21.auto()
As you can see above, each header with its type modification should be written on a separate row.
I have tried the command below, but it only copies the first row:
sed '1!d' input.csv > output.csv
You may try this alternative gnu awk command as well:
awk -v FPAT='"[^"]+"|[^,]+' 'NR == 1 {
for (i=1; i<=NF; ++i)
print gensub(/"/, "", "g", $i) "." ($i ~ /,/ ? "string" : "auto") "()"
exit
}' file
name.auto()
Mobile Number.auto()
mobile1,mobile2.string()
email2.auto()
Address.auto()
email21.auto()
Or using sed:
sed -i -e '1i 1234567890.string(),My address is test.auto(),abc3#gmail.com.auto(),120000003.auto(),abc-003.auto(),3.com.auto()' -e '1d' test.csv
EDIT: As per the OP's comment, to print only the first line (header), please try the following.
awk -v FPAT='[^,]*|"[^"]+"' '
FNR==1{
for(i=1;i<=NF;i++){
if($i~/^".*,.*"$/){
gsub(/"/,"",$i)
print $i".string()"
}
else{
print $i".auto()"
}
}
exit
}
' Input_file > output_file
Could you please try the following, written and tested with GNU awk with the shown samples.
awk -v FPAT='[^,]*|"[^"]+"' '
FNR==1{
for(i=1;i<=NF;i++){
if($i~/^".*,.*"$/){
gsub(/"/,"",$i)
print $i".string()"
}
else{
print $i".auto()"
}
}
next
}
1
' Input_file
Explanation: Adding detailed explanation for above.
awk -v FPAT='[^,]*|"[^"]+"' ' ##Starting awk program and setting FPAT to [^,]*|"[^"]+".
FNR==1{ ##Checking condition if this is first line then do following.
for(i=1;i<=NF;i++){ ##Running for loop from i=1 to till NF value.
if($i~/^".*,.*"$/){ ##Checking condition if current field starts from " and ends with " and having comma in between its value then do following.
gsub(/"/,"",$i) ##Substitute all occurrences of " with NULL in current field.
print $i".string()" ##Printing current field and .string() here.
}
else{ ##else do following.
print $i".auto()" ##Printing current field dot auto() string here.
}
}
next ##next will skip all further statements from here.
}
1 ##1 will print current line.
' Input_file ##Mentioning Input_file name here.
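The answers above decide on string() by looking for a comma inside a quoted field, whereas the question says the string column will be passed by name. A minimal sketch of that variant follows; it also avoids FPAT, so it does not need GNU awk. The variable name strcol and the character-by-character parser are assumptions made for illustration, and the parser does not handle doubled quotes ("") inside a quoted field.

```shell
#!/bin/sh
# Sketch: split a CSV header by hand and mark one named column as string().
header='name,"Mobile Number","mobile1,mobile2",email2,Address,email21'

result=$(printf '%s\n' "$header" | awk -v strcol='mobile1,mobile2' '
NR == 1 {
    field = ""; quoted = 0
    line = $0 ","                      # trailing comma flushes the last field
    for (i = 1; i <= length(line); i++) {
        c = substr(line, i, 1)
        if (c == "\"") {
            quoted = !quoted           # toggle on every double quote, drop it
        } else if (c == "," && !quoted) {
            print field "." (field == strcol ? "string" : "auto") "()"
            field = ""
        } else {
            field = field c            # commas inside quotes land here
        }
    }
    exit                               # only the header line is needed
}')
printf '%s\n' "$result"
```

Redirecting the final printf to a file would create the new csv the question asks for.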

How to compare two files and print the values of both the files which are different

There are 2 files. I need to sort them first, then compare them, and for each difference print the values from both file 1 and file 2.
file1:
pair,bid,ask
AED/MYR,3.918000,3.918000
AED/SGD,3.918000,3.918000
AUD/CAD,3.918000,3.918000
file2:
pair,bid,ask
AUD/CAD,3.918000,3.918000
AUD/CNY,3.918000,3.918000
AED/MYR,4.918000,4.918000
Output should be:
pair,inputbid,inputask,outputbid,outtputask
AED/MYR,3.918000,3.918000,4.918000,4.918000
The only difference in 2 files is AED/MYR with different bid/ask rates. How can I print difference value from file 1 and file 2.
I tried using below commands:
nawk -F, 'NR==FNR{a[$1]=$4;a[$2]=$5;next} !($4 in a) || !($5 in a) {print $1 FS a[$1] FS a[$2] FS $4 FS $5}' file1 file2
Result output as below:
pair,bid,ask,bid,ask
AUD/CAD,3.918000,3.918000,3.918000,3.918000
AUD/CHF,3.918000,3.918000,3.918000,3.918000
AUD/CNH,3.918000,3.918000,3.918000,3.918000
AUD/CNY,3.918000,3.918000,3.918000,3.918000
AED/MYR,3.918000,3.918000,4.918000,4.918000
We are still not able to get only the difference.
Could you please try following, written and tested in GNU awk with shown samples.
awk -v header="pair,inputbid,inputask,outputbid,outtputask" '
BEGIN{
FS=OFS=","
}
FNR==NR{
arr[$1]=$0
next
}
($1 in arr) && arr[$1]!=$0{
val=$1
$1=""
sub(/^,/,"")
if(!found){
print header
found=1
}
print arr[val],$0
}' Input_file1 Input_file2
Explanation: Adding detailed explanation for above.
awk -v header="pair,inputbid,inputask,outputbid,outtputask" ' ##Starting awk program from here and setting this to header value here.
BEGIN{ ##Starting BEGIN section of this program from here.
FS=OFS="," ##Setting field separator and output field separator as comma here.
}
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when Input_file1 is being read.
arr[$1]=$0 ##Creating arr with index $1 and keep value as current line.
next ##next will skip all further statements from here.
}
($1 in arr) && arr[$1]!=$0{ ##Checking condition if first field is present in arr and its value NOT equal to $0
val=$1 ##Creating val which has the first field value in it.
$1="" ##Nullifying first field here.
sub(/^,/,"") ##Substitute starting , with NULL here.
if(!found){ ##Checking if found is NULL then do following.
print header ##Printing header here only once.
found=1 ##Setting found here.
}
print arr[val],$0 ##Printing arr with index of val and current line here.
}' Input_file1 Input_file2 ##Mentioning Input_files here.
With bash process substitution, then join and then choosing with awk:
# print header
printf "%s\n" "pair,inputbid,inputask,outputbid,outtputask"
# remove first line from both files, then sort them on first field
# then join them on first field and output first 5 fields
join -t, -1 1 -2 1 -o 1.1,1.2,1.3,2.2,2.3 <(tail -n +2 file1 | sort -t, -k1,1) <(tail -n +2 file2 | sort -t, -k1,1) |
# output only those lines whose columns differ
awk -F, '$2 != $4 || $3 != $5'
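Process substitution is a bash feature. Under a plain POSIX sh the same pipeline can be sketched with temporary files, as below. The workdir, sorted1 and sorted2 names are illustrative assumptions; LC_ALL=C is set so that sort and join agree on the collation order, and the header text is copied verbatim from the question.

```shell
#!/bin/sh
# Sketch: join two sorted CSV files on the pair column and keep changed rows.
LC_ALL=C; export LC_ALL
workdir=$(mktemp -d)
printf '%s\n' 'pair,bid,ask' 'AED/MYR,3.918000,3.918000' \
              'AED/SGD,3.918000,3.918000' 'AUD/CAD,3.918000,3.918000' > "$workdir/file1"
printf '%s\n' 'pair,bid,ask' 'AUD/CAD,3.918000,3.918000' \
              'AUD/CNY,3.918000,3.918000' 'AED/MYR,4.918000,4.918000' > "$workdir/file2"

# Drop the header line, then sort each file on the key column only.
tail -n +2 "$workdir/file1" | sort -t, -k1,1 > "$workdir/sorted1"
tail -n +2 "$workdir/file2" | sort -t, -k1,1 > "$workdir/sorted2"

result=$(
    printf '%s\n' 'pair,inputbid,inputask,outputbid,outtputask'
    # Pairs present in both files, with bid/ask from each side.
    join -t, -1 1 -2 1 -o 1.1,1.2,1.3,2.2,2.3 "$workdir/sorted1" "$workdir/sorted2" |
        awk -F, '$2 != $4 || $3 != $5'      # keep rows where a rate changed
)
printf '%s\n' "$result"
```

Pairs that exist in only one file (AED/SGD, AUD/CNY) are dropped by join, which matches the requested output.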

Replace the first column in a file with another column in different file using shell

I have two files file1 and file2
file1
Shyam=123=12.3.4.5=user#gmail.com
Shyam=123=12.2.5.4=user#gmail.com
Joshwa=234=14.3.4.67=user#gmail.com
Anil=879=15.3.4.98=user#gmail.com
Anil=765=15.4.5.65=user#gmail.com
.......
file2
Shyam=ShyamLal
Joshwa=JoshwaSam
Anil=AnilAcharya
....
"=" is used as the separator in file1 and file2.
I want to update the names as given in file2, i.e. Shyam will be replaced with ShyamLal, Joshwa with JoshwaSam and Anil with AnilAcharya. I don't want to use if-else conditions, because I have a large amount of data.
My output should be like:
ShyamLal=123=12.3.4.5=user#gmail.com
ShyamLal=123=12.2.5.4=user#gmail.com
JoshwaSam=234=14.3.4.67=user#gmail.com
AnilAcharya=879=15.3.4.98=user#gmail.com
AnilAcharya=765=15.4.5.65=user#gmail.com.
I tried this, but I don't know whether I am doing it right:
while IFS= read -r line
do
key=`echo $line | awk -F "=" '{print $1}'` < file1.txt
value=`echo $line | awk -F "=" '{print $2}' < file2.txt`
cat file1.txt | sed 's/$key/$value/g'
done
How can I proceed?
Could you please try following.
awk '
BEGIN{
FS=OFS="="
}
FNR==NR{
a[$1]=$2
next
}
($1 in a){
$1=a[$1]
}
1
' Input_file2 Input_file1
Explanation: Adding detailed explanation for above code here.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section here.
FS=OFS="=" ##Setting FS and OFS as = for all lines here.
} ##Closing BLOCK for BEGIN section of this program here.
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when Input_file2 is being read.
a[$1]=$2 ##Creating an array named a with index $1 with value of $2 of current line.
next ##next will skip all further statements from here.
}
($1 in a){ ##Checking condition if $1 is present in array a this will be done when Input_file1 is being read.
$1=a[$1] ##Setting $1 to array a value with index $1 of current line.
}
1 ##1 will print edited/non-edited line here.
' Input_file2 Input_file1 ##Mentioning Input_file names here.
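To see how the lookup behaves when a name has no entry in file2, here is a small self-contained run of the same idea. The Ravi row is a made-up line added only to show the pass-through case, and the names full and workdir are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: replace column 1 via a lookup table; unmatched rows pass through.
workdir=$(mktemp -d)
printf '%s\n' 'Shyam=123=12.3.4.5=user#gmail.com' \
              'Joshwa=234=14.3.4.67=user#gmail.com' \
              'Ravi=111=10.0.0.1=user#gmail.com' > "$workdir/file1"   # Ravi is hypothetical
printf '%s\n' 'Shyam=ShyamLal' 'Joshwa=JoshwaSam' 'Anil=AnilAcharya' > "$workdir/file2"

result=$(awk '
BEGIN { FS = OFS = "=" }
FNR == NR  { full[$1] = $2; next }    # first file: short name -> full name
$1 in full { $1 = full[$1] }          # second file: swap when a mapping exists
{ print }                             # print every line, changed or not
' "$workdir/file2" "$workdir/file1")
printf '%s\n' "$result"
```

The lookup is a single array access per line, so no if-else chain is needed however many names file2 holds.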

How to run a bash script in a loop

I wrote a bash script that pulls substrings from one input file, at positions given in a second input file, and saves them to an output file. The input files look like this:
input file 1
>genotype1
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
input file 2
gene1 10 20
gene2 40 50
genen x y
my script
>output_file
cat input_file2 | while read row; do
echo $row > temp
geneName=`awk '{print $1}' temp`
startPos=`awk '{print $2}' temp`
endPos=`awk '{print $3}' temp`
length=$(expr $endPos - $startPos)
for i in temp; do
echo ">${geneName}" >> genes_fasta
awk -v S=$startPos -v L=$length '{print substr($0,S,L)}' input_file1 >> output file
done
done
How can I make it work in a loop for more than one string in input file 1?
new input file looks like this:
>genotype1
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
>genotype2
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
>genotypen...
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn...
I would like a separate output file for every genotype, with the genotype name as the file name.
Thank you!
If I'm understanding correctly, would you try the following:
awk '
FNR==NR {
name[NR] = $1
start[NR] = $2
len[NR] = $3 - $2
count = NR
next
}
/^>/ {
sub(/^>/,"")
genotype=$0
next
}
{
for (i = 1; i <= count; i++) {
print ">" name[i] > genotype
print substr($0, start[i], len[i]) >> genotype
}
close(genotype)
}' input_file2 input_file1
input_file1:
>genotype1
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
>genotype2
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
>genotype3
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Input_file2:
gene1 10 20
gene2 40 50
gene3 20 25
[Results]
genotype1:
>gene1
aaaaaaaaaa
>gene2
aaaaaaaaaa
>gene3
aaaaa
genotype2:
>gene1
bbbbbbbbbb
>gene2
bbbbbbbbbb
>gene3
bbbbb
genotype3:
>gene1
nnnnnnnnnn
>gene2
nnnnnnnnnn
>gene3
nnnnn
[EDIT]
If you want to store the output files to a different directory,
please try the following instead:
dir="./outdir" # directory name to store the output files
# you can modify the name as you want
mkdir -p "$dir"
awk -v dir="$dir" '
FNR==NR {
name[NR] = $1
start[NR] = $2
len[NR] = $3 - $2
count = NR
next
}
/^>/ {
sub(/^>/,"")
genotype=$0
next
}
{
for (i = 1; i <= count; i++) {
print ">" name[i] > dir"/"genotype
print substr($0, start[i], len[i]) >> dir"/"genotype
}
close(dir"/"genotype)
}' input_file2 input_file1
The first lines (the dir assignment and mkdir) are executed in bash to define and create the destination directory.
Then the directory name is passed to awk via the -v option.
Hope this helps.
Could you please try the following. I am assuming that the name on each line of Input_file1 that starts with > should be matched against the first column of Input_file2 (the samples are confusing, so this has been written based on the OP's attempt).
awk '
FNR==NR{
start_point[$1]=$2
end_point[$1]=$3
next
}
/^>/{
sub(/^>/,"")
val=$0
next
}
{
print val ORS substr($0,start_point[val],end_point[val])
val=""
}
' Input_file2 Input_file1
Explanation: Adding explanation for above code.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when first Input_file named Input_file2 is being read.
start_point[$1]=$2 ##Creating an array named start_point with index $1 of current line and its value is $2.
end_point[$1]=$3 ##Creating an array named end_point with index $1 of current line and its value is $3.
next ##next will skip all further statements from here.
}
/^>/{ ##Checking condition if a line starts from > then do following.
sub(/^>/,"") ##Substituting starting > with NULL.
val=$0 ##Creating a variable val whose value is $0.
next ##next will skip all further statements from here.
}
{
print val ORS substr($0,start_point[val],end_point[val]) ##Printing val, a newline (ORS) and a sub-string of the current line that starts at start_point[val]; note that the 3rd argument of substr is a length, so end_point[val] characters are printed.
val="" ##Nullifying variable val here.
}
' Input_file2 Input_file1 ##Mentioning Input_file names here.
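Since awk's substr takes a 1-based start and a length rather than an end position, the two calling styles used in the answers give different results. A minimal sketch of the difference, using a made-up 26-letter sequence and the gene1 coordinates (10 and 20) from the sample; the variable names are illustrative assumptions:

```shell
#!/bin/sh
# Sketch: substr(s, start, length) versus passing the end position as length.
seq='abcdefghijklmnopqrstuvwxyz'
start=10
end=20

# Length computed as end - start, as in the original script (10 characters).
by_length=$(awk -v s="$seq" -v a="$start" -v b="$end" \
    'BEGIN { print substr(s, a, b - a) }')

# End position passed directly: awk reads it as a length of 20 and simply
# stops at the end of the string.
by_end=$(awk -v s="$seq" -v a="$start" -v b="$end" \
    'BEGIN { print substr(s, a, b) }')

printf '%s\n' "$by_length" "$by_end"
```

If the end coordinate is meant to be inclusive, the length would be end - start + 1 instead.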
