How to replace multiple lines of text with new text in bash

I have:
SIESTA_ARCH = unknown
CC = gcc
FPP = $(FC) -E -P -x c
FC = gfortran
and I want to replace it with
SIESTA_ARCH = amd64 (x86_64)
CC = mpicc
FPP = $(FC) -E -P -x c
FC = mpif90

I think the following solution should work for you (edited according to the OP's answers):
script.sed
#!/bin/sed -f
/^SIESTA_ARCH = unknown/,/^FC =/{
s/^SIESTA_ARCH =.*/SIESTA_ARCH = amd64 (x86_64)/
s/^CC =.*/CC = mpicc/
s/^FC =.*/FC = mpif90/
}
Invoke as ./script.sed Makefile to see the results on the standard output or as ./script.sed -i Makefile to update the file Makefile.
This changes every block that starts at a line matching SIESTA_ARCH = unknown and ends at the next line beginning with FC =, replacing the old values inside it with the new ones.
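To make the address-range behaviour concrete, here is a minimal Python sketch of the same logic. It is not part of the original answer, and the name rewrite_block is made up for illustration. It applies the three substitutions only from a line starting with SIESTA_ARCH = unknown through the next line starting with FC =, and leaves everything outside that block alone.

```python
import re

START = re.compile(r"SIESTA_ARCH = unknown")
END = re.compile(r"FC =")

# Same three substitutions as in script.sed.
SUBSTITUTIONS = [
    (re.compile(r"^SIESTA_ARCH =.*"), "SIESTA_ARCH = amd64 (x86_64)"),
    (re.compile(r"^CC =.*"), "CC = mpicc"),
    (re.compile(r"^FC =.*"), "FC = mpif90"),
]


def rewrite_block(lines):
    """Hypothetical helper: mimic sed's /start/,/end/{...} range."""
    out = []
    inside = False
    for line in lines:
        starting = not inside and START.match(line) is not None
        if starting:
            inside = True
        if inside:
            # Like sed, look for the end pattern only after the start line.
            closing = not starting and END.match(line) is not None
            for pattern, replacement in SUBSTITUTIONS:
                line = pattern.sub(replacement, line)
            if closing:
                inside = False
        out.append(line)
    return out


makefile = [
    "CC = gcc",  # outside the block: left untouched
    "SIESTA_ARCH = unknown",
    "CC = gcc",
    "FPP = $(FC) -E -P -x c",
    "FC = gfortran",
]
result = rewrite_block(makefile)
print("\n".join(result))
```

The first CC = gcc line sits before the block and is therefore not rewritten, which is the main difference from running three unconditional substitutions over the whole file.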

In bash you can define a function like this (just execute this one-liner in a terminal or script):
function repl() { FIND="$2" REPLACE="$3" ruby -p -i -e "gsub(ENV['FIND'], ENV['REPLACE'])" "$1"; }
Then you can replace whatever literal strings you want in whatever file, e.g.:
repl ~/Code/Makefile 'SIESTA_ARCH = unknown' 'SIESTA_ARCH = amd64 (x86_64)'
repl ~/Code/Makefile 'CC = gcc' 'CC = mpicc'
repl ~/Code/Makefile 'FC = gfortran' 'FC = mpif90'
Note that this will replace all occurrences of such strings in the specified file.
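For comparison, a rough Python equivalent of this literal find-and-replace might look as follows. This is only a sketch: it reads the whole file at once instead of going line by line as ruby -p does, which gives the same result as long as the search string contains no newline. The demo writes to a temporary file that stands in for the real Makefile.

```python
import tempfile
from pathlib import Path


def repl(path, find, replace):
    """Replace every literal occurrence of `find` in the file at `path`."""
    p = Path(path)
    p.write_text(p.read_text().replace(find, replace))


# Demo on a throwaway file rather than a real Makefile.
with tempfile.TemporaryDirectory() as tmp:
    makefile = Path(tmp) / "Makefile"
    makefile.write_text("SIESTA_ARCH = unknown\nCC = gcc\nFC = gfortran\n")
    repl(makefile, "SIESTA_ARCH = unknown", "SIESTA_ARCH = amd64 (x86_64)")
    repl(makefile, "CC = gcc", "CC = mpicc")
    repl(makefile, "FC = gfortran", "FC = mpif90")
    updated = makefile.read_text()

print(updated)
```

Because str.replace treats both arguments as plain text, characters such as ( and ) need no escaping.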

If you have ed, you can use a script like this:
cat script.ed
H
g/^\(SIESTA_ARCH =\)\(.\+\)$/s//\1 amd64 (x86_64)/
g/^\(CC =\)\(.\+\)$/s//\1 mpicc/
,p
Q
Run the script against your file:
ed -s Makefile < script.ed
Output:
SIESTA_ARCH = amd64 (x86_64)
CC = mpicc
FPP = $(FC) -E -P -x c
FC = gfortran
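Note that this script does not touch FC; to change it as well, add an analogous line such as g/^\(FC =\)\(.\+\)$/s//\1 mpif90/. To edit the file in place, change ,p and Q to w and q: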
H
g/^\(SIESTA_ARCH =\)\(.\+\)$/s//\1 amd64 (x86_64)/
g/^\(CC =\)\(.\+\)$/s//\1 mpicc/
w
q
ed -s Makefile < script.ed

Related

Makefile: command line argument -e of echo is passed to the file

In a Makefile, when I write to a file using echo -e "text" > file, the -e ends up in the file as well:
APIM_5 = echo -e "[Desktop Entry]\nName=$(MAIN)\nExec=$(MAIN)\nIcon=$(MAIN)\nType=Application\nVersion=1.0\nCategories=Utility;" > AppDir/usr/share/applications/$(MAIN).desktop;
But the file I echo into ($(MAIN).desktop) looks like below:
-e [Desktop Entry]
Name=main
Exec=main
Icon=main
Type=Application
Version=1.0
Categories=Utility;
All definitions together and how I call them:
APIM_1 = cd output;
APIM_2 = $(RM) AppDir appimage-build;
APIM_3 = mkdir -p AppDir/usr/bin AppDir/usr/share/applications AppDir/usr/share/icons/hicolor/256x256/apps/ AppDir/usr/lib;
APIM_4 = touch AppDir/usr/share/applications/$(MAIN).desktop;
APIM_5 = echo -e "[Desktop Entry]\nName=$(MAIN)\nExec=$(MAIN)\nIcon=$(MAIN)\nType=Application\nVersion=1.0\nCategories=Utility;" > AppDir/usr/share/applications/$(MAIN).desktop;
APIM_6 = cp $(MAIN) AppDir/usr/bin/;
APIM_7 = cp ../meta/icon/$(MAIN).png AppDir/usr/share/icons/hicolor/256x256/apps/;
APIM_8 = appimage-builder --skip-test;
appimage: all
	$(APIM_1) $(APIM_2) $(APIM_3) $(APIM_4) $(APIM_5) $(APIM_6) $(APIM_7) $(APIM_8)
	@echo Executing 'appimage' complete!
What causes this?

Using bash functions in snakemake

I am trying to download some files with snakemake. The files (http://snpeff.sourceforge.net/SnpSift.html#dbNSFP) I would like to download are on a google site/drive and my usual wget approach does not work. I found a bash function that does the job (https://www.zachpfeffer.com/single-post/wget-a-Google-Drive-file):
function gdrive_download () {
CONFIRM=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id=$1" -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$CONFIRM&id=$1" -O $2
rm -rf /tmp/cookies.txt
}
gdrive_download 120aPYqveqPx6jtssMEnLoqY0kCgVdR2fgMpb8FhFNHo test.txt
I have tested this function with my ids in a plain bash script and was able to download all the files. To add a bit to the complexity, I must use a workplace template, and incorporate the function into it.
rule dl:
    params:
        url = 'ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_{genome}/{afile}'
    output:
        'data/{genome}/{afile}'
    params:
        id1 = '0B7Ms5xMSFMYlOTV5RllpRjNHU2s',
        f1 = 'dbNSFP.txt.gz'
    shell:
        """CONFIRM=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id={{params.id1}}" -O- | sed -rn "s/.*confirm=([0-9A-Za-z_]+).*/\1\n/p") && wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$CONFIRM&id={{params.id1}}" -O {{params.f1}} && rm -rf /tmp/cookies.txt"""
        #'wget -c {params.url} -O {output}'
rule checksum:
    input:
        i = 'data/{genome}/{afile}'
    output:
        o = temp('tmp/{genome}/{afile}.md5')
    shell:
        'md5sum {input} > {output}'
rule file_size:
    input:
        i = 'data/{genome}/{afile}'
    output:
        o = temp('tmp/{genome}/{afile}.size')
    shell:
        'du -csh --apparent-size {input} > {output}'
rule file_info:
    """md5 checksum and file size"""
    input:
        md5 = 'tmp/{genome}/{afile}.md5',
        s = 'tmp/{genome}/{afile}.size'
    output:
        o = temp('tmp/{genome}/info/{afile}.csv')
    run:
        with open(input.md5) as f:
            md5, fp = f.readline().strip().split()
        with open(input.s) as f:
            size = f.readline().split()[0]
        with open(output.o, 'w') as fout:
            print('filepath,size,md5', file=fout)
            print(f"{fp},{size},{md5}", file=fout)
rule manifest:
    input:
        expand('tmp/{genome}/info/{suffix}.csv', genome=('GRCh37','GRCh38'), suffix=('dbNSFP.txt.gz', 'dbNSFP.txt.gz.tbi'))
        #expand('tmp/{genome}/info/SnpSift{suffix}.csv', genome=('GRCh37','GRCh38'), suffix=('dbNSFP.txt.gz', 'dbNSFP.txt.gz.tbi'))
    output:
        o = 'MANIFEST.csv'
    run:
        pd.concat([pd.read_csv(afile) for afile in input]).to_csv(output.o, index=False)
There are four downloadable files for which I have ids (I only show one in params), however I don't know how to call the bash functions as written by ZPfeffer for all the ids I have with snakemake. Additionally, when I run this script, there are several errors, the most pressing being
sed: -e expression #1, char 31: unterminated `s' command
I am far from a snakemake expert. Any assistance on how to modify my script to a) call the function with the 4 different ids, b) remove the sed error, and c) verify whether this is the correct url format (currently url = 'https://docs.google.com/uc?export/{afile}') will be greatly appreciated.
Use a raw string literal so that Python does not process escape sequences, such as the backslashes in the sed command, before the command reaches the shell. For example (notice the r in front of the shell command):
rule foo:
    shell:
        r"sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p'"
You could use --printshellcmds or -p to see how exactly shell: commands get resolved by snakemake.
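The effect is easy to see in plain Python, without snakemake. The short sketch below takes the sed program from the question and writes it once as a normal string and once as a raw string. In the normal string, Python turns \1 into a control character and \n into a real newline, so sed receives an s command that is cut off by a line break. That is presumably what triggers the "unterminated `s' command" error.

```python
# The sed program from the question, once as a normal and once as a raw string.
normal = "s/.*confirm=([0-9A-Za-z_]+).*/\1\n/p"
raw = r"s/.*confirm=([0-9A-Za-z_]+).*/\1\n/p"

print(repr(normal))  # \1 has become the control character \x01, \n a real newline
print(repr(raw))     # both backslashes are still there for sed to interpret
```

The raw string is two characters longer because each of the two backslashes is kept as its own character.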
Here is how I "solved" it:
import pandas as pd
rule dl:
    output:
        'data/{genome}/{afile}'
    shell:
        "sh download_snpsift.sh"
rule checksum:
    input:
        i = 'data/{genome}/{afile}'
    output:
        o = temp('tmp/{genome}/{afile}.md5')
    shell:
        'md5sum {input} > {output}'
rule file_size:
    input:
        i = 'data/{genome}/{afile}'
    output:
        o = temp('tmp/{genome}/{afile}.size')
    shell:
        'du -csh --apparent-size {input} > {output}'
rule file_info:
    """md5 checksum and file size"""
    input:
        md5 = 'tmp/{genome}/{afile}.md5',
        s = 'tmp/{genome}/{afile}.size'
    output:
        o = temp('tmp/{genome}/info/{afile}.csv')
    run:
        with open(input.md5) as f:
            md5, fp = f.readline().strip().split()
        with open(input.s) as f:
            size = f.readline().split()[0]
        with open(output.o, 'w') as fout:
            print('filepath,size,md5', file=fout)
            print(f"{fp},{size},{md5}", file=fout)
rule manifest:
    input:
        expand('tmp/{genome}/info/{suffix}.csv', genome=('GRCh37','GRCh38'), suffix=('dbNSFP.txt.gz', 'dbNSFP.txt.gz.tbi'))
    output:
        o = 'MANIFEST.csv'
    run:
        pd.concat([pd.read_csv(afile) for afile in input]).to_csv(output.o, index=False)
And here is the bash script.
function gdrive_download () {
CONFIRM=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id=$1" -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$CONFIRM&id=$1" -O $2
rm -rf /tmp/cookies.txt
}
gdrive_download 0B7Ms5xMSFMYlSTY5dDJjcHVRZ3M data/GRCh37/dbNSFP.txt.gz
gdrive_download 0B7Ms5xMSFMYlOTV5RllpRjNHU2s data/GRCh37/dbNSFP.txt.gz.tbi
gdrive_download 0B7Ms5xMSFMYlbTZodjlGUDZnTGc data/GRCh38/dbNSFP.txt.gz
gdrive_download 0B7Ms5xMSFMYlNVBJdFA5cFZRYkE data/GRCh38/dbNSFP.txt.gz.tbi

How to use eval in a Makefile command to change a macro's value from a bash variable

I have a bash function inside a Makefile recipe and want to change a macro's value from it.
Is it possible?
C_D_FLAGS :=
gui :
	parse_flags () { echo $$1; for word in $$1; do if [ $${word::2} = -D ] ; then $(eval C_D_FLAGS+=$${word}); fi ; done ; } ; parse_flags "-D/test -D/TEST"
	@echo "C_D_FLAGS :$(C_D_FLAGS)"
make expands $(eval ...) when it expands the recipe, before the shell runs your bash function. You cannot update make variables from bash: the shell is a child process of make.
However, what you are trying to do is easy to express in native make syntax, e.g.:
$ cat Makefile
C_D_FLAGS :=
gui: C_D_FLAGS += -D/test -D/TEST
gui:
	@echo "C_D_FLAGS: $(C_D_FLAGS)"
$ make gui
C_D_FLAGS: -D/test -D/TEST
If the flags are provided from elsewhere, they can also be filtered, i.e.:
$ cat Makefile
C_D_FLAGS :=
gui: C_D_FLAGS += $(filter -D%,$(EXTRA_FLAGS))
gui:
	@echo "C_D_FLAGS: $(C_D_FLAGS)"
$ make gui
C_D_FLAGS:
$ make gui EXTRA_FLAGS="-Isomething -DFOO -m32"
C_D_FLAGS: -DFOO
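As an illustration of what $(filter -D%,$(EXTRA_FLAGS)) selects, here is a small Python sketch. It is a simplification under stated assumptions: make_filter is a made-up name, it handles only a single pattern with at most one %, and it is not a reimplementation of make's function.

```python
def make_filter(pattern, words):
    """Rough stand-in for $(filter pattern,words) with a single pattern."""
    if "%" not in pattern:
        # Without a wildcard, only exact matches are kept.
        return [w for w in words.split() if w == pattern]
    prefix, suffix = pattern.split("%", 1)
    return [
        w for w in words.split()
        if len(w) >= len(prefix) + len(suffix)
        and w.startswith(prefix)
        and w.endswith(suffix)
    ]


print(make_filter("-D%", "-Isomething -DFOO -m32"))
print(make_filter("-D%", ""))
```

The two calls correspond to the two make invocations above: with EXTRA_FLAGS set only -DFOO survives, and with no flags the result is empty.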

How to validate that an ID is correct before creating a file using a Makefile

I need to validate IDs against the pattern Abbbbb-yyy.
Example:
ID := A12345-789 B98765-123 C58730-417
VARIANT := test1 test2 test3
The build and post-processing generate one file per VARIANT:
`sw_main_test1.hex`, `sw_main_test2.hex` and `sw_main_test3.hex`
.PHONY : SW_TEST
SW_TEST :
if <ID is correct>
cp sw_main_test1.hex --> A12345-789.hex
cp sw_main_test2.hex --> B98765-123.hex
cp sw_main_test3.hex --> C58730-417.hex
I am having trouble validating the ID against the pattern
`Abbbbb-yyy.txt`
where A=[A-Z], b=[0-9], y=[0-9].
Please let me know how to verify that the ID is correct using regular expressions inside the Makefile, with any tool or utility.
In this script I assume you get your ID from a file (here called someidcontent.txt). You could then write something like this (assuming you are working only on Linux):
getID = $(shell cat someidcontent.txt)
all:
	if [ "$(getID)" = "1234567890" ]; then \
		cp -v output.txt ./delivery/$(getID).txt; \
	fi
.PHONY: all
Edit
I made a mistake in my previous script: it did not check whether the ID is well-formed. The newer script below reads the ID from a file and checks it. If the ID is correct, a file is copied into the target directory under the ID as its name.
# get ID from a file
getID := $(shell cat someidcontent.txt)
# need a hack for successful checking
idToCheck := $(getID)
# check procedure
checkID := $(shell echo $(idToCheck) | grep -E "^[A-Z][0-9]{5}-[0-9]{3}$$")
all:
ifeq "$(checkID)" "$(idToCheck)"
	echo found
	cp -v output.txt ./delivery/$(idToCheck).txt;
endif
.PHONY: all
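The pattern from the question (one uppercase letter, five digits, a hyphen, three digits) can also be checked outside of make. The following Python sketch shows the regular expression in isolation; the helper name is_valid_id is hypothetical, and calling such a script from a Makefile (for instance through $(shell ...)) is an assumption, not something the answer above does.

```python
import re

# One uppercase letter, five digits, a hyphen, three digits (Abbbbb-yyy).
ID_PATTERN = re.compile(r"[A-Z][0-9]{5}-[0-9]{3}")


def is_valid_id(candidate):
    """Hypothetical helper: True only if the whole string matches."""
    return ID_PATTERN.fullmatch(candidate) is not None


for candidate in ["A12345-789", "B98765-123", "C58730-417", "123456+328"]:
    print(candidate, is_valid_id(candidate))
```

Using fullmatch rather than search matters here: it rejects strings that merely contain a valid ID, such as one with an extra trailing digit.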
Edit 2
Ok, this was a little bit challenging, but I solved it somehow. Maybe there are also other ways to solve this better. In my solution, I assume that the file with IDs and source filenames look like this (in other words, this is the content of my someidcontent.txt):
A2345-678:output1.txt
B3456-123:output.txt
C0987-987:thirdfile.txt
And this is my makefile with comments for additional explanation. I hope, they are sufficient
# retrieve id and filename data from other file
listContent := $(shell cat someidcontent.txt)
# extract only IDs from other files
checkIDs = $(shell echo $(listContent) | grep -o "[A-Z][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9]")
all:
# iterate only over IDs
# first, give me the ID
# second retrieve the filename part for successful copy procedure
# and copy the file to the target dir with ID as filename
	@$(foreach x,$(checkIDs), \
echo $(x); \
cp -v $(shell echo $(listContent) | grep -o "$(x):[A-Z0-9a-z\.]*" | sed "s/[-A-Z0-9]*://g") ./delivery/$(x).t$
)
.PHONY: all
You can check simple string patterns quite ok (don't want to say "nicely") from within make:
[A-F] := A B C D E F#
[a-f] := a b c d e f#
[A-Z] := $([A-F]) G H I J K L M N O P Q R S T U V W X Y Z#
[a-z] := $([a-f]) g h i j k l m n o p q r s t u v w x y z#
[0-9] := 0 1 2 3 4 5 6 7 8 9#
######################################################################
##### $(call explode,_stringlist_,_string_)
## Insert a blank after every occurrence of the strings from _stringlist_ in _string_.
## This function serves mainly to convert a string into a list.
## Example: `$(call explode,0 1 2 3 4 5 6 7 8 9,0xl337c0de)` --> `0 xl3 3 7 c0 de`
explode = $(if $1,$(subst $(firstword $1),$(firstword $1) ,$(call explode,$(wordlist 2,255,$1),$2)),$2)
ID := A12345-789 B98765-123 C58730-417 123456+328
############################################################
# $(call check-id,_id-string_)
# Return 'malformed' or the given id
check-id = $(if $(call check-id-1,$(call explode,- $([A-Z]) $([0-9]),$1)),malformed,$1)
check-id-1 = $(strip $(filter-out $([A-Z]),$(wordlist 1,1,$1)) $(filter-out $([0-9]),$(wordlist 2,6,$1)) $(filter-out -,$(word 7,$1)) $(filter-out $([0-9]),$(wordlist 8,10,$1)) )
$(info $(foreach w,$(ID),$(call check-id,$(w))))

GNU MAKE: functions in dependencies

I would like to generate a number of files using GNU Make using the following recipe.
ina_as%.dat: ina_driver.m ina_as$(word 1,$(subst _epsi, , %)).m
	echo "modelType = '$(word 1,$(subst _epsi, , $*))'; ofile = '$@'; epsi = '$(word 2,$(subst _epsi, , $*))';" | cat - $< | nohup matlab -nodesktop -nosplash
The targets are in a format -- ina_as%d_epsi%.2f.dat (e.g. ina_as1_epsi0.50.dat) and the second prerequisite is ina_as%d.m (e.g. ina_as1.m) (notice, the second part _epsi%.2f missing in the prerequisite file name).
I have tried several combination for the implicit rule ($, $$, $(eval $*) etc.), but it still does not work. I think it could be because Make could not understand the functions ( '$(word 1,$(subst _epsi, , %))' ) in the dependency definition.
Is there any way to overcome this problem?
Thanks.
Questions like this come up from time to time. The short answer is that Make simply can't do this in a clean way; the text manipulation statements expand before executing any rule (i.e. before % has any value), and Make doesn't handle wildcards (or regular expressions) very well.
The longer answer is that it can be done, but only by resorting to one kludge or another. If your version of Make supports SECONDEXPANSION, you can do it this way:
.SECONDEXPANSION:
ina_as%.dat: ina_as$$(word 1,$$(subst _, ,%)).m
	@echo "modelType = '$(word 1,$(subst _epsi, , $*))'; ofile = '$@'; epsi\
= '$(word 2,$(subst _epsi, , $*))';" | cat - $< | nohup matlab -nodesktop\
-nosplash
If not, you can resort to recursive Make (useful sometimes, no matter what they say):
ina_as%.dat :
	@$(MAKE) dummy MODELTYPE=`echo $* | sed "s/_.*//"` EPSI=`echo $* | sed \
"s/.*_epsi//"`
dummy: ina_as$(MODELTYPE).m
	@echo "modelType = $(MODELTYPE); ofile = ina_as$(MODELTYPE)_epsi$(EPSI)\
; epsi = $(EPSI);" | cat - ina_as$(MODELTYPE).m | nohup matlab -nodesktop\
-nosplash
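Both variants above come down to the same string manipulation: take the stem of the target name, split it at _epsi, and build the prerequisite name from the first part. The Python sketch below spells that out for one target name; split_target is a made-up helper for illustration, not something make provides.

```python
def split_target(target):
    """Hypothetical helper: split 'ina_as<model>_epsi<epsi>.dat' into its parts."""
    prefix, suffix = "ina_as", ".dat"
    if not (target.startswith(prefix) and target.endswith(suffix)):
        raise ValueError(f"unexpected target name: {target}")
    stem = target[len(prefix):-len(suffix)]  # what make calls $*
    model, epsi = stem.split("_epsi", 1)
    return {
        "stem": stem,
        "model": model,
        "epsi": epsi,
        "prerequisite": f"ina_as{model}.m",
    }


parts = split_target("ina_as1_epsi0.50.dat")
print(parts)
```

In make the difficulty is only one of timing: the stem is not known when the prerequisite list is first expanded, which is why secondary expansion or a recursive call is needed there.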
