How to increase value of a text variable in a file - shell

file1.txt contains the data below.
VARIABLE=00
RATE=14
PRICE=100
I need to increment the value by 1, only for the line below, each time I run the command.
VARIABLE=00 (file name: file1.txt)
After the first run the output should be:
VARIABLE=01
On the next run it should be VARIABLE=02, and so on.

Please try the following, written and tested in GNU awk with the samples shown.
awk 'BEGIN{FS=OFS="="} /^VARIABLE/{$NF=sprintf("%02d",$NF+1)} 1' Input_file > temp && mv temp Input_file
Explanation: a detailed, annotated version of the above.
awk ' ##Start of the awk program.
BEGIN{ ##BEGIN section, run before any input is read.
FS=OFS="=" ##Set both FS and OFS to = here.
}
/^VARIABLE/{ ##If the line starts with VARIABLE, do the following.
$NF=sprintf("%02d",$NF+1) ##Add 1 to the last field and save it back as a two-digit value.
}
1 ##1 prints the current line.
' Input_file > temp && mv temp Input_file ##Write to a temporary file, then replace Input_file with it.
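To see the counter advance across runs, the one-liner above can be wrapped in a small function and called twice. This is only a sketch: the function name bump_variable and the scratch directory are illustrative and not part of the original answer.

```shell
#!/bin/sh
# Hypothetical helper: bump VARIABLE by 1 in the given file, keeping two digits.
bump_variable() {
    file=$1
    tmp=$(mktemp) || return 1
    awk 'BEGIN{FS=OFS="="} /^VARIABLE=/{$NF=sprintf("%02d",$NF+1)} 1' "$file" > "$tmp" &&
        mv "$tmp" "$file"
}

# Scratch copy of the sample data from the question.
workdir=$(mktemp -d)
printf 'VARIABLE=00\nRATE=14\nPRICE=100\n' > "$workdir/file1.txt"

bump_variable "$workdir/file1.txt"
first_run=$(grep '^VARIABLE=' "$workdir/file1.txt")
bump_variable "$workdir/file1.txt"
second_run=$(grep '^VARIABLE=' "$workdir/file1.txt")

echo "$first_run"     # VARIABLE=01
echo "$second_run"    # VARIABLE=02
```

The RATE and PRICE lines are printed back unchanged because only the line matching ^VARIABLE= is modified.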

You can do it quite simply as a one-liner in Perl:
perl -i -pe '/^VARIABLE=/ && s/(\d+)/sprintf("%02d",$1+1)/e' file
In case you are unfamiliar with Perl, that says:
Run Perl and modify the file in place. On any line starting with VARIABLE=, replace the first run of digits with the result of an expression: the old number plus 1, formatted with two digits so that 00 becomes 01 rather than 1.
Note that Perl is a standard part of macOS - i.e. automatically included with all versions.

Related

Splitting file based on pattern '\r\n00' in korn shell

My file temp.txt looks like below
00ABC
PQR123400
00XYZ001234
012345
0012233
I want to split the file based on pattern '\r\n00'. In this case temp.txt should split into 3 files
first.txt:
00ABC
PQR123400
second.txt
00XYZ001234
012345
third.txt
0012233
I am trying to use csplit to match the pattern '\r\n00', but it reports an invalid pattern. Can someone please help me match the exact pattern using csplit?
With your shown samples, please try the following awk code, written and tested in GNU awk.
This code creates files named 1.txt, 2.txt and so on. It also closes each output file once it is written, so the infamous "too many open files" error is avoided.
awk -v RS='\r?\n00' -v count="1" '
{
outputFile=(count++".txt")
rt=RT
sub(/\r?\n/,"",rt)
if(!rt){
sub(/\n+/,"")
rt=prevRT
}
printf("%s%s\n",(count>2?rt:""),$0) > outputFile
close(outputFile)
prevRT=rt
}
' Input_file
Explanation: a detailed, annotated version of the above code.
awk -v RS='\r?\n00' -v count="1" ' ##Start the awk program, setting RS to \r?\n00 and count to 1.
{
outputFile=(count++".txt") ##Build the output file name from count (incremented for each record) followed by .txt.
rt=RT ##Copy RT (the text that matched RS) into rt.
sub(/\r?\n/,"",rt) ##Remove \r?\n from rt, leaving 00.
if(!rt){ ##If rt is empty (the last record has no terminator), do the following.
sub(/\n+/,"") ##Remove the first run of newlines from the record (the trailing newline in the sample).
rt=prevRT ##Reuse the previous terminator by setting rt to prevRT.
}
printf("%s%s\n",(count>2?rt:""),$0) > outputFile ##Print rt (for every record after the first) and the record into outputFile.
close(outputFile) ##Close outputFile.
prevRT=rt ##Save rt in prevRT for the next record.
}
' Input_file ##Input_file name goes here.
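The program above relies on GNU awk features (a regular expression in RS, and the RT variable). Where gawk is not available, a line-oriented sketch gives the same three pieces for the sample, under the assumption that every piece begins at a line starting with 00 and that carriage returns at line ends can be dropped. The output names 1.txt, 2.txt, ... follow the answer above; the scratch directory and the dir variable are illustrative.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Sample from the question, written with CRLF line endings.
printf '00ABC\r\nPQR123400\r\n00XYZ001234\r\n012345\r\n0012233\r\n' > "$workdir/temp.txt"

awk -v dir="$workdir" '
{ sub(/\r$/, "") }              # drop a trailing carriage return, if present
/^00/ {                         # a line starting with 00 opens a new piece
    if (out != "") close(out)
    out = dir "/" (++n) ".txt"
}
out != "" { print > out }       # lines before the first 00 line are skipped
' "$workdir/temp.txt"

ls "$workdir"
```

Note that this variant writes the pieces with plain newline endings, which may or may not be what is wanted for a CRLF input.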

How to add single quote after specific word using sed?

I am trying to write a script that puts single quotes around the value following the word "GOOD".
For example, I have file1:
//WER GOOD=ONE
//WER1 GOOD=TWO2
//PR1 GOOD=THR45
...
The desired change is to add single quotes:
//WER GOOD='ONE'
//WER1 GOOD='TWO2'
//PR1 GOOD='THR45'
...
This is the script which I am trying to run:
#!/bin/bash
for item in `grep "GOOD" file1 | cut -f2 -d '='`
do
sed -i 's/$item/`\$item/`\/g' file1
done
Thank you in advance for the help!
Could you please try the following.
sed "s/\(.*=\)\(.*\)/\1'\2'/" Input_file
OR, as per OP's comment, to also remove empty lines use:
sed "s/\(.*=\)\(.*\)/\1'\2'/;/^$/d" Input_file
Explanation: the following is for explanation purposes only.
sed " ##Start of the sed command.
s/ ##s starts the substitution.
\(.*=\)\(.*\) ##Capture groups: everything up to and including = is saved in the 1st group, the rest of the line in the 2nd.
/\1'\2' ##Replace the match with the 1st group, then the 2nd group wrapped in single quotes, so the quotes go after the =.
/" Input_file ##End of the substitution, followed by the Input_file name.
Please use the -i option with the above code if you want to save the output into Input_file itself.
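Since .*= is greedy, the substitution above splits each line at its last = sign and applies to any line containing one. A slightly narrower sketch anchors on the literal GOOD= instead, so other key=value lines are left alone. It assumes the value runs to the end of the line, as in the sample; the fourth input line (OTHER=X) is a made-up extra to show a non-matching case.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf '//WER GOOD=ONE\n//WER1 GOOD=TWO2\n//PR1 GOOD=THR45\n//PR2 OTHER=X\n' > "$workdir/file1"

# Quote only what follows GOOD=; lines without GOOD= pass through unchanged.
quoted=$(sed "s/\(GOOD=\)\(.*\)/\1'\2'/" "$workdir/file1")
printf '%s\n' "$quoted"
```

The first three lines come out as GOOD='ONE', GOOD='TWO2' and GOOD='THR45', while the OTHER=X line is printed as is.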
2nd solution with awk:
awk 'match($0,/=.*/){$0=substr($0,1,RSTART) "\047" substr($0,RSTART+1,RLENGTH) "\047"} 1' Input_file
Explanation: an annotated version of the above code.
awk '
match($0,/=.*/){ ##Use the match function to match everything from = to the end of the line.
$0=substr($0,1,RSTART) "\047" substr($0,RSTART+1,RLENGTH) "\047" ##Rebuild $0: the text up to and including =, a single quote, the rest of the line, and a closing single quote.
} ##RSTART and RLENGTH are set by match whenever the regex matches.
1 ##1 prints the line, edited or not.
' Input_file ##Input_file name goes here.
3rd solution: if each line has only 2 fields when split on =, a simpler awk works:
awk 'BEGIN{FS=OFS="="} {$2="\047" $2 "\047"} 1' Input_file
Explanation of the 3rd code (for explanation only; to run it, use the one-liner above).
awk ' ##Start of the awk program.
BEGIN{FS=OFS="="} ##Set FS and OFS to = for all lines of Input_file.
{$2="\047" $2 "\047"} ##Wrap $2 in single quotes (\047 is the octal code for a single quote).
1 ##1 prints each line, edited or not.
' Input_file ##Input_file name goes here.

replace names in fasta

I want to change the sequence names in a fasta file according to a text file containing the new names. I found several approaches and seqkit made a good impression, but I can't get it running (see "Replace key with value by key-value file" in its documentation).
The fasta file seq.fa looks like
>BC1
ATGCATGCATGCATGCATGCATGCATGCATGCATGCATGC
ATGCATGCATGCATGCATGCATGCATGCATGCATGCATGC
>BC2
TGCATGCATGCATGCATGCATGCATGCATGCATGCATGCG
GCATGCATGCATGCATGCATGCATGCATGCATGCG
>BC3
GCATGCATGCATGCATGCATGCATGCATGCATGCCCCCCC
TGCATGCATGCATG
and the ref.txt tab delimited text file like
BC1 1234
BC2 1235
BC3 1236
Using seqkit in Git Bash runs through the file but doesn't change the names.
seqkit replace -p' (.+)$' -r' {kv}' -k ref.txt seq.fa --keep-key
I'm used to R and new to bash and can't find the bug, but I guess I need to adjust for tab and _?
In the example at https://bioinf.shenwei.me/seqkit/usage/#replace (part 7, "Replace key with value by key-value file"), the sequence name is tab delimited and only the second part is replaced.
Any advice on how to adjust the code?
The desired outcome replaces each name with the number from the text file, e.g. BC1 becomes 1234:
>1234
ATGCATGCATGCATGCATGCATGCATGCATGCATGCATGC
ATGCATGCATGCATGCATGCATGCATGCATGCATGCATGC
>1235
TGCATGCATGCATGCATGCATGCATGCATGCATGCATGCG
GCATGCATGCATGCATGCATGCATGCATGCATGCG
>1236
GCATGCATGCATGCATGCATGCATGCATGCATGCCCCCCC
TGCATGCATGCATG
Could you please try the following.
awk '
FNR==NR{
a[$1]=$2
next
}
($2 in a) && /^>/{
print ">"a[$2]
next
}
1
' ref.txt FS="[> ]" seq.fa
Explanation: a detailed, annotated version of the above code.
awk ' ##Start of the awk program.
FNR==NR{ ##FNR==NR is TRUE only while the 1st file, ref.txt, is being read.
a[$1]=$2 ##Fill array a with index $1 (old name) and value $2 (new name).
next ##next skips all further statements for this line.
} ##End of the FNR==NR block.
($2 in a) && /^>/{ ##If the line starts with > and its $2 is an index of array a, do the following.
print ">"a[$2] ##Print > followed by the value stored in a for $2.
next ##next skips all further statements for this line.
}
1 ##1 prints all other lines (those of seq.fa that are not renamed headers).
' ref.txt FS="[> ]" seq.fa ##Input file names; FS is set to either space or > for seq.fa only.
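A small run of the program above on a cut-down copy of the sample data shows the renaming. The sequences are shortened here for brevity and the scratch directory is illustrative; the names and the mapping are those from the question.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf 'BC1\t1234\nBC2\t1235\nBC3\t1236\n' > "$workdir/ref.txt"
printf '>BC1\nATGC\n>BC2\nTGCA\n>BC3\nGCAT\n' > "$workdir/seq.fa"   # shortened sequences

renamed=$(awk '
FNR==NR { a[$1]=$2; next }                     # first file: old name -> new name
($2 in a) && /^>/ { print ">" a[$2]; next }    # header with a known name
1                                              # everything else unchanged
' "$workdir/ref.txt" FS="[> ]" "$workdir/seq.fa")
printf '%s\n' "$renamed"
```

The headers come out as >1234, >1235 and >1236, with the sequence lines untouched.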
EDIT: As per OP's comment, the output should also carry an occurrence number (e.g. >1234_1), so here is an adjusted version.
awk '
FNR==NR{
a[$1]=$2
b[$1]=++c[$2]
next
}
($2 in a) && /^>/{
print ">"a[$2]"_"b[$2]
next
}
1
' ref.txt FS="[> ]" seq.fa
awk solution that doesn't require GNU awk:
awk 'NR==FNR{a[$1]=$2;next}
NF==2{$2=a[$2]; print ">" $2;next}
1' FS='\t' ref.txt FS='>' seq.fa
The first statement fills the array a with the contents of the tab-delimited file ref.txt.
The second statement handles the header lines of the second file, seq.fa: with > as the field delimiter they have 2 fields, so the name in $2 is replaced by its value from a and printed after a >.
The last statement prints all remaining lines of that file unchanged.
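One case the programs above do not spell out is a header whose name has no entry in ref.txt. A possible variant, sketched here under the assumption that such headers should be kept as they are, looks the name up explicitly. BC2 is deliberately left out of the mapping, the sequences are shortened, and the variable name is illustrative.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf 'BC1\t1234\nBC3\t1236\n' > "$workdir/ref.txt"               # BC2 deliberately absent
printf '>BC1\nATGC\n>BC2\nTGCA\n>BC3\nGCAT\n' > "$workdir/seq.fa"   # shortened sequences

renamed=$(awk -F '\t' '
NR==FNR { a[$1]=$2; next }               # ref.txt: old name -> new name
/^>/ {
    name = substr($0, 2)                 # header text without the leading >
    if (name in a) print ">" a[name]
    else print                           # no mapping: keep the header as is
    next
}
1                                        # sequence lines unchanged
' "$workdir/ref.txt" "$workdir/seq.fa")
printf '%s\n' "$renamed"
```

Here >BC1 and >BC3 are renamed while >BC2 stays. The whole header text after > is used as the key, so headers carrying a description after the name would need extra handling.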

Replace header of one column by file name

I have about 100 comma-separated text files with eight columns.
Example of two file names:
sample1_sorted_count_clean.csv
sample2_sorted_count_clean.csv
Example of file content:
Domain,Phylum,Class,Order,Family,Genus,Species,Count
Bacteria,Proteobacteria,Alphaproteobacteria,Sphingomonadales,Sphingomonadaceae,Zymomonas,Zymomonas mobilis,0.0
Bacteria,Bacteroidetes,Flavobacteria,Flavobacteriales,Flavobacteriaceae,Zunongwangia,Zunongwangia profunda,0.0
For each file, I would like to replace the column header "Count" by sample ID, which is contained in the first part of the file name (sample1, sample2)
In the end, the header should then look like this:
Domain,Phylum,Class,Order,Family,Genus,Species,sample1
If I use my code, the header looks like this:
Domain,Phylum,Class,Order,Family,Genus,Species,${f%_clean.csv}
for f in *_clean.csv; do echo ${f}; sed -e "1s/Domain,Phylum,Class,Order,Family,Genus,Species,RPMM/Domain,Phylum,Class,Order,Family,Genus,Species,${f%_clean.csv}/" ${f} > ${f%_clean.csv}_clean2.csv; done
I also tried:
for f in *_clean.csv; do gawk -F"," '{$NF=","FILENAME}1' ${f} > t && mv t ${f%_clean.csv}_clean2.csv; done
In this case, "count" is replaced by the entire file name, but each row of the column contains file name now. The count values are no longer present. This is not what I want.
Do you have any ideas on what else I may try?
Thank you very much in advance!
Anna
If you are OK with awk, please try the following.
awk 'BEGIN{FS=OFS=","} FNR==1{var=FILENAME;sub(/_.*/,"",var);$NF=var} 1' *.csv
EDIT: Since OP wants everything after the 2nd underscore removed from the file name, try the following.
awk 'BEGIN{FS=OFS=","} FNR==1{split(FILENAME,array,"_");$NF=array[1]"_"array[2]} 1' *.csv
Explanation: an annotated version of the above code.
awk ' ##Start of the awk program.
BEGIN{ ##BEGIN section, executed before any Input_file is read.
FS=OFS="," ##Set FS and OFS to comma for all lines of all files.
} ##End of the BEGIN section.
FNR==1{ ##FNR==1 is TRUE for the very first line of each Input_file.
split(FILENAME,array,"_") ##Use the built-in split function to split FILENAME (the current file name) on _ into an array named array.
$NF=array[1]"_"array[2] ##Set the last field to the 1st array element, an underscore, and the 2nd array element.
} ##End of the FNR==1 block.
1 ##1 prints every line of the current Input_file, including the edited header.
' *.csv ##Pass all *.csv files to the awk program.
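The commands above print all files to standard output one after another. To get one renamed file per input, as in the question's own loop, the same idea can be run file by file. This is a sketch assuming the sample ID is the text before the first underscore; the _clean2.csv output naming is taken from the question, and the scratch directory with its two small sample files is illustrative.

```shell
#!/bin/sh
workdir=$(mktemp -d)
for s in sample1 sample2; do
    printf 'Domain,Phylum,Class,Order,Family,Genus,Species,Count\nBacteria,Proteobacteria,Alphaproteobacteria,Sphingomonadales,Sphingomonadaceae,Zymomonas,Zymomonas mobilis,0.0\n' \
        > "$workdir/${s}_sorted_count_clean.csv"
done

for f in "$workdir"/*_clean.csv; do
    base=${f##*/}          # file name without the directory
    id=${base%%_*}         # text before the first underscore, e.g. sample1
    awk -v id="$id" 'BEGIN{FS=OFS=","} FNR==1{$NF=id} 1' "$f" > "${f%_clean.csv}_clean2.csv"
done

head -n 1 "$workdir/sample1_sorted_count_clean2.csv"
```

Passing the ID in with -v keeps the shell expansion outside the awk program, and only the first line of each file is modified, so the count values are preserved.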
