Bash: replace part of filename - bash

I have a command I want to run on all of the files of a folder, and the command's syntax looks like this:
tophat -o <output_file> <input_file>
What I would like is a script that loops over all the files in an arbitrary folder and uses each input file name to create a similar, but different, output file name. The file names look like this:
input name               desired output name
path/to/sample1.fastq    path/to/sample1.bam
path/to/sample2.fastq    path/to/sample2.bam
Getting the input to work seems simple enough:
for f in *.fastq
do
tophat -o <output_file> $f
done
I tried using output=${f,.fastq,.bam} and using that as the output parameter, but that doesn't work. All I get is an error: line 3: ${f,.fastq,.bam}: bad substitution. Is this the way to do what I want, or should I do something else? If it's the correct way, what am I doing wrong?
[EDIT]:
Thanks for all the answers! A bonus question, though... What if I have files named like this, instead:
path/to/sample1_1.fastq
path/to/sample1_2.fastq
path/to/sample2_1.fastq
path/to/sample2_2.fastq
...
... where I can have an arbitrary number of samples (sampleX), but all of them have two files associated with them (_1 and _2). The command now looks like this:
tophat -o <output_file> <input_1> <input_2>
So, there's still just the one output, for which I could do something like "${f/_[1-2].fastq/.bam}", but I'm unsure how to get a loop that only iterates once over every sampleX at the same time as taking both the associated files... Ideas?
[EDIT #2]:
So, this is the final script that did the trick!
for f in *_1.fastq
do
tophat -o "${f/_1.fastq/.bam}" "$f" "${f/_1.fastq/_2.fastq}"
done
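A slightly more defensive take on the loop above, as a sketch only: it derives the mate file with suffix removal, skips samples whose _2 file is missing, and prints the commands instead of running them. The function name print_tophat_pairs is hypothetical, and the printed tophat command line simply mirrors the one in the question.

```shell
#!/usr/bin/env bash
# print_tophat_pairs is a hypothetical helper; it is a dry run that only
# prints one tophat command per sample found in the given directory.
print_tophat_pairs() {
  local dir=${1:-.} f1 f2
  for f1 in "$dir"/*_1.fastq; do
    [ -e "$f1" ] || continue            # the glob matched nothing
    f2=${f1%_1.fastq}_2.fastq           # mate file of the same sample
    if [ ! -e "$f2" ]; then
      echo "skipping $f1: missing $f2" >&2
      continue
    fi
    printf 'tophat -o %s %s %s\n' "${f1%_1.fastq}.bam" "$f1" "$f2"
  done
}

# Usage: print_tophat_pairs path/to
```

Once the printed commands look right, the printf line could be replaced by the real tophat call with the same three arguments quoted.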

You can use:
tophat -o "${f/.fastq/.bam}" "$f"
Testing:
f='path/to/sample1.fastq'
echo "${f/.fastq/.bam}"
path/to/sample1.bam
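The attempt in the question failed because the comma form is not a replacement syntax in bash; replacement uses slashes, as above. One caveat: ${f/.fastq/.bam} replaces the first match anywhere in the string. If a directory name could also contain ".fastq", stripping the suffix is safer. A minimal sketch, where to_bam is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# to_bam is a hypothetical helper, not part of bash or tophat.
to_bam() {
  local f=$1
  # ${f%.fastq} removes ".fastq" only when it is the end of the string
  printf '%s\n' "${f%.fastq}.bam"
}

to_bam 'path/to/sample1.fastq'        # path/to/sample1.bam
to_bam 'run.fastq.d/sample1.fastq'    # run.fastq.d/sample1.bam
```

With the slash form, the second path would come out as run.bam.d/sample1.fastq, which is why the suffix form is sometimes preferred for extensions.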

Not an answer but a suggestion: as a bioinformatician, you should use GNU make and its -j option (number of parallel jobs). The Makefile would be:
.PHONY: all
FASTQS=$(shell ls *.fastq)
%.bam: %.fastq
	tophat -o $@ $<
all: $(FASTQS:.fastq=.bam)

An alternative to anubhava's concise solution, using dirname and basename:
f=path/to/sample1.fastq
d=$(dirname "$f")
b=$(basename "$f" .fastq)
echo "$d/$b.bam"
path/to/sample1.bam
tophat -o "$d/$b.bam" "$f"

Related

I want to change the names of some files, and the change is in the middle of each name: I basically need to remove part of the variable and write something else.

I have tried this:
$ls
casts.c endian.c ptr.c signed-unsigned-representations.c signed-unsigned.c test-hard-link.c
$for i in *.c;do mv "$i" "$i"__swa.c; done
$ls
casts.c__swa.c endian.c__swa.c ptr.c__swa.c signed-unsigned-representations.c__swa.c signed-unsigned.c__swa.c test-hard-link.c__swa.c
I know this happens because my i variable holds the full name matched by *.c, so when I rename and append the __swa.c part, it just gets added to the end of the existing name.
I need the files to be renamed like this:
casts__swa.c endian__swa.c ptr__swa.c signed-unsigned-representations__swa.c signed-unsigned__swa.c test-hard-link__swa.c
Using Bash's parameter expansion, you could do something like this:
for f in *.c; do
echo mv "$f" "${f%%.c}"__swa.c
done
(Remove the echo, of course, once the output looks like what you want.)
But I generally prefer the more flexible Perl-based rename, as suggested in the answer by Cyrus.
With Perl's standalone rename or prename command:
rename -n 's/\./__swa./' *.c
If output looks okay, remove -n.
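Where Perl's rename is not installed, the same renaming can be sketched in plain bash by splitting each name at its last dot. The helper name add_suffix_before_ext is hypothetical, and the loop keeps the echo so that nothing is actually renamed:

```shell
#!/usr/bin/env bash
# add_suffix_before_ext is a hypothetical helper: it inserts a suffix in
# front of the last extension, or appends it if the name has no dot.
add_suffix_before_ext() {
  local name=$1 suffix=$2
  case $name in
    *.*) printf '%s\n' "${name%.*}${suffix}.${name##*.}" ;;
    *)   printf '%s\n' "${name}${suffix}" ;;
  esac
}

# Dry run on two of the names from the question
for f in casts.c endian.c; do
  echo mv "$f" "$(add_suffix_before_ext "$f" __swa)"
done
```

Splitting at the last dot means a name such as a.tar.gz would become a.tar__swa.gz, which may or may not be what is wanted for multi-part extensions.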

/bin/sh: -c: line 1: syntax error: unexpected end of file in bash [duplicate]

Considering that every command is run in its own shell, what is the best way to run a multi-line bash command in a makefile? For example, like this:
for i in `find`
do
all="$all $i"
done
gcc $all
You can use backslash for line continuation. However note that the shell receives the whole command concatenated into a single line, so you also need to terminate some of the lines with a semicolon:
foo:
	for i in `find`; \
	do \
		all="$$all $$i"; \
	done; \
	gcc $$all
But if you just want to take the whole list returned by the find invocation and pass it to gcc, you actually don't necessarily need a multiline command:
foo:
	gcc `find`
Or, using a more shell-conventional $(command) approach (notice the $ escaping though):
foo:
	gcc $$(find)
As indicated in the question, every sub-command is run in its own shell. This makes writing non-trivial shell scripts a little bit messy -- but it is possible! The solution is to consolidate your script into what make will consider a single sub-command (a single line).
Tips for writing shell scripts within makefiles:
Escape the script's use of $ by replacing with $$
Convert the script to work as a single line by inserting ; between commands
If you want to write the script on multiple lines, escape end-of-line with \
Optionally start with set -e to match make's provision to abort on sub-command failure
This is totally optional, but you could bracket the script with () or {} to emphasize the cohesiveness of a multiple line sequence -- that this is not a typical makefile command sequence
Here's an example inspired by the OP:
mytarget:
	{ \
	set -e ;\
	msg="header:" ;\
	for i in $$(seq 1 3) ; do msg="$$msg pre_$${i}_post" ; done ;\
	msg="$$msg :footer" ;\
	echo msg=$$msg ;\
	}
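To check such a recipe outside of make, the same text can be handed straight to a shell with every $$ written as a single $ and the continued lines joined, which is roughly what the shell ends up executing. A sketch, assuming seq is available; run_recipe_body is a hypothetical name:

```shell
#!/usr/bin/env bash
# run_recipe_body is a hypothetical helper: it runs the mytarget recipe text
# as one sh -c command, with make's $$ already reduced to $.
run_recipe_body() {
  sh -c '{ set -e ; msg="header:" ; for i in $(seq 1 3) ; do msg="$msg pre_${i}_post" ; done ; msg="$msg :footer" ; echo msg=$msg ; }'
}

run_recipe_body   # msg=header: pre_1_post pre_2_post pre_3_post :footer
```

If this works in the shell but the recipe fails under make, a missing semicolon or an unescaped $ in the makefile is a likely cause.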
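The .ONESHELL special target (GNU make 3.82 and later) lets you write multi-line recipes that are executed in a single shell invocation. The recipe below uses bash arrays, so SHELL is set to bash:
SHELL := /bin/bash
all: foo
SOURCE_FILES = $(shell find . -name '*.c')
.ONESHELL:
foo: ${SOURCE_FILES}
	FILES=()
	for F in $^; do
		FILES+=($${F})
	done
	gcc "$${FILES[@]}" -o $@
There is a drawback, though: the special prefix characters (‘@’, ‘-’, and ‘+’) are interpreted differently; with .ONESHELL, only the first line of the recipe is checked for them.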
https://www.gnu.org/software/make/manual/html_node/One-Shell.html
Of course, the proper way to write a Makefile is to actually document which targets depend on which sources. In the trivial case, the proposed solution will make foo depend on itself, but of course, make is smart enough to drop a circular dependency. But if you add a temporary file to your directory, it will "magically" become part of the dependency chain. Better to create an explicit list of dependencies once and for all, perhaps via a script.
GNU make knows how to run gcc to produce an executable out of a set of .c and .h files, so maybe all you really need amounts to
foo: $(wildcard *.h) $(wildcard *.c)
What's wrong with just invoking the commands?
foo:
	echo line1
	echo line2
	....
And for your second question, you need to escape the $ by using $$ instead, i.e. bash -c '... echo $$a ...'.
EDIT: Your example could be rewritten as a single-line script like this (with each $ doubled, since it sits in a makefile recipe):
gcc $$(for i in `find`; do echo $$i; done)

Can you use make to process command line specified files?

This is exactly what I want to do: I wonder whether I can define a rule in make such that I can give make a .o file on the command line, and that file will then be processed with the command
arm-none-eabi-objdump -D <object file>
The idea is to simplify debugging by not having to do that command by hand but rather do something like:
debug file.o
If make can't do this, is there any other way to do it?
Sort of. You can create a rule like:
%.dump : %.o; arm-none-eabi-objdump -D $^
then do:
$ make file.dump
hth
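If the literal form debug file.o is wanted, a small shell function is another option, independent of make. This is only a sketch: debug is a hypothetical function name, and the OBJDUMP variable is an assumed override so that a different objdump can be substituted.

```shell
#!/usr/bin/env bash
# debug is a hypothetical shell function (put it in ~/.bashrc or similar).
# OBJDUMP is an assumed override; it defaults to the tool from the question.
debug() {
  if [ "$#" -eq 0 ]; then
    echo "usage: debug <object file>..." >&2
    return 2
  fi
  local obj
  for obj in "$@"; do
    "${OBJDUMP:-arm-none-eabi-objdump}" -D "$obj"
  done
}

# Usage: debug file.o
```

Unlike the make rule, this disassembles whatever file is named without checking whether it is up to date.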

how to work at the same time with multiple files inside file name array in linux shell script?

I have a shell script in which I first read all the .s files in a specified folder, then compile each of them to an object file in a loop, and after that link them into an executable.
Like this:
FILES=PTscalar_1.0/mibenchforpt/security/sha/*.s
for sfile in $FILES
do
echo "------------------------------------------------"
echo $sfile
objectFile="${sfile%.s}.o"
exefile="${objectFile%.o}.ex"
simplescalar/bin/sslittle-na-sstrix-as -o $objectFile $sfile
done
But I have a problem: in the sha MiBench program there are two files, each of which goes through this flow:
.c -> .s -> .o
but at the last stage the two .o files should be linked into one executable.
How can I get the two file names at the same time and build a command to link them?
The actual link command is this:
simplescalar/bin/sslittle-na-sstrix-ld -o __sha.ex _sha.o _sha_driver.o
Is there any way to index into the list of files, like this:
OFILES=PTscalar_1.0/mibenchforpt/security/sha/*.o
simplescalar/bin/sslittle-na-sstrix-ld -o $exefile OFILES[0] OFILES[1]
and then do that in a loop for all files that follow this pattern:
the first file is like *.o or *_main.o
the second is *_driver.o
Thanks
Obviously this is possible in shell. However, many people find that the make utility is better than shell scripts for building software, precisely because of these dependencies. Take a look at GNU Make; its documentation contains numerous examples of what you're trying to do.
Caveat: Your tags "linux shell" do not specify a specific shell. POSIX sh, the standard specifying minimum required behavior for /bin/sh, does not support arrays; you should use a specific shell, such as bash or ksh, which does. To do this, you need to start your script with an appropriate shebang (such as #!/bin/bash instead of #!/bin/sh), and do any manual invocations with the correct shell (so bash -x myscript if you would otherwise use sh -x myscript... though if you've set the shebang correctly and have +x permissions, you can always just ./myscript)
# this is broken
FILES=PTscalar_1.0/mibenchforpt/security/sha/*.s
...does not create an array.
# this works in bash, ksh, and zsh
files=( PTscalar_1.0/mibenchforpt/security/sha/*.s )
does create an array, which can be expanded as "${files[@]}". So:
# this works in bash and ksh, and probably zsh
for file in "${files[@]}"; do
...
done
However, in this particular case, you don't have a reason to use an array at all:
# this works with absolutely any POSIX-compatible shell
for sfile in PTscalar_1.0/mibenchforpt/security/sha/*.s; do
echo "$sfile"
objectFile=${sfile%.s}.o
exefile=${objectFile%.o}.ex
simplescalar/bin/sslittle-na-sstrix-as -o "$objectFile" "$sfile"
done
Note a few corrections made in the above:
The right-hand side of an assignment does not need to be quoted when it contains no literal whitespace.
All expansions (such as $objectFile) do need to be quoted: "$objectFile".
...yes, this does include echo; to test this, run s='*' and compare the output of echo $s to echo "$s".
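The difference can be made visible by counting the words the shell produces in each case. The sketch below works in a throwaway directory created with mktemp; count_args and demo_quoting are hypothetical helper names.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper that reports how many words it received.
count_args() { echo "$#"; }

# demo_quoting builds a scratch directory with three files, one of which has
# a space in its name, and compares unquoted and quoted expansion of s='*'.
demo_quoting() {
  local dir
  dir=$(mktemp -d) || return 1
  touch "$dir/a.s" "$dir/b.s" "$dir/c d.s"
  (
    cd "$dir" || exit 1
    s='*'
    echo "unquoted=$(count_args $s)"     # the glob expands to every file name
    echo "quoted=$(count_args "$s")"     # stays a single literal word
  )
  rm -rf "$dir"
}

demo_quoting
```

Here the unquoted form yields three words and the quoted form one, which is the same effect that makes an unquoted echo $s list file names instead of printing the asterisk.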
To address the follow-up question you edited in:
ofiles=( PTscalar_1.0/mibenchforpt/security/sha/*.o )
simplescalar/bin/sslittle-na-sstrix-ld -o "$exefile" "${ofiles[0]}" "${ofiles[1]}"
...is a literal answer, but this would need to be edited if you had more than two object files. Much better to do it this way instead:
ofiles=( PTscalar_1.0/mibenchforpt/security/sha/*.o )
simplescalar/bin/sslittle-na-sstrix-ld -o "$exefile" "${ofiles[@]}"
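As a sketch of how the array form scales to any number of object files, the function below builds the link command from every .o file in a directory and prints it rather than running it. The names print_link_cmd and LD_TOOL are hypothetical; the default linker path is the one from the question.

```shell
#!/usr/bin/env bash
# print_link_cmd is a hypothetical dry-run helper. LD_TOOL is an assumed
# override variable for the linker path.
print_link_cmd() {
  local exefile=$1 dir=$2
  local ofiles=( "$dir"/*.o )          # one array element per matching file
  [ -e "${ofiles[0]}" ] || { echo "no .o files in $dir" >&2; return 1; }
  printf '%s -o %s' "${LD_TOOL:-simplescalar/bin/sslittle-na-sstrix-ld}" "$exefile"
  printf ' %s' "${ofiles[@]}"          # printf repeats the format per element
  printf '\n'
}

# Usage: print_link_cmd __sha.ex PTscalar_1.0/mibenchforpt/security/sha
```

The glob determines the order of the object files, so this only suits cases where the link order does not matter.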
I created this file and it worked:
#!/bin/bash
#compile to assembly:
FILES=*_driver.s
for sdriverfile in $FILES
do
echo "------------------------------------------------"
# s file
echo $sdriverfile
sfile="${sdriverfile%_driver.s}.s"
echo $sfile
# object files
obj="${sfile%.s}.o"
obj_driver="${sdriverfile%.s}.o"
#exe file
exefile="${sfile%.s}_as.ex"
echo $exefile
#compile
/home/mahdi/programs/simplescalar/bin/sslittle-na-sstrix-as -o $obj $sfile
/home/mahdi/programs/simplescalar/bin/sslittle-na-sstrix-as -o $obj_driver $sdriverfile
#link
/home/mahdi/programs/simplescalar/bin/sslittle-na-sstrix-ld -o $exefile $obj $obj_driver -L /home/mahdi/programs/simplescalar/sslittle-na-sstrix/lib -lc -L /home/mahdi/programs/simplescalar/lib/gcc-lib/sslittle-na-sstrix/2.7.2.3/ -lgcc
done
thanks for answers.
