I am very, very new to UNIX programming (running on Mac OS X Mountain Lion via Terminal). I've been learning the basics from a bioinformatics and molecular methods course (we've had two classes) where we will eventually be using Perl and Python for data management purposes. Anyway, we have been tasked with writing a shell script to take data from a group of files and write it to a new file in a format that can be read by a specific program (Migrate-N).
I have gotten a number of commands to do exactly what I need independently when I type them into the command line, but when I put them all together in a script and try to run it, I get an error. Here are the details (I apologize for the length):
#! /bin/bash
grep -f Samples.NFCup.txt locus1.fasta > locus1.NFCup.txt
grep -f Samples.NFCup.txt locus2.fasta > locus2.NFCup.txt
grep -f Samples.NFCup.txt locus3.fasta > locus3.NFCup.txt
grep -f Samples.NFCup.txt locus4.fasta > locus4.NFCup.txt
grep -f Samples.NFCup.txt locus5.fasta > locus5.NFCup.txt
grep -f Samples.Salmon.txt locus1.fasta > locus1.Salmon.txt
grep -f Samples.Salmon.txt locus2.fasta > locus2.Salmon.txt
grep -f Samples.Salmon.txt locus3.fasta > locus3.Salmon.txt
grep -f Samples.Salmon.txt locus4.fasta > locus4.Salmon.txt
grep -f Samples.Salmon.txt locus5.fasta > locus5.Salmon.txt
grep -f Samples.Cascades.txt locus1.fasta > locus1.Cascades.txt
grep -f Samples.Cascades.txt locus2.fasta > locus2.Cascades.txt
grep -f Samples.Cascades.txt locus3.fasta > locus3.Cascades.txt
grep -f Samples.Cascades.txt locus4.fasta > locus4.Cascades.txt
grep -f Samples.Cascades.txt locus5.fasta > locus5.Cascades.txt
echo 3 5 Salex_melanopsis > Smelanopsis.mig
echo 656 708 847 1159 779 >> Smelanopsis.mig
echo 154 124 120 74 126 NFCup >> Smelanopsis.mig
cat locus1.NFCup.txt locus2.NFCup.txt locus3.NFCup.txt locus4.NFCup.txt locus5.NFCup.txt >> Smelanopsis.mig
echo 32 30 30 18 38 Salmon River >> Smelanopsis.mig
cat locus1.Salmon.txt locus2.Salmon.txt locus3.Salmon.txt locus4.Salmon.txt locus5.Salmon.txt >> Smelanopsis.mig
echo 56 52 24 29 48 Cascades >> Smelanopsis.mig
cat locus1.Cascades.txt locus2.Cascades.txt locus3.Cascades.txt locus4.Cascades.txt locus5.Cascades.txt >> Smelanopsis.mig
The series of greps just pulls out the DNA sequence data for each site at each locus into new text files. The Samples...txt files have the sample ID numbers for a site, and the .fasta files have the sequence information organized by sample ID; the grepping works just fine at the command line if I run each command individually.
The second group of commands creates the actual new file I need to end up with, which ends in .mig. The echo lines add data about counts (base pairs per locus, populations in the analysis, samples per site, etc.) that the program needs. The cat lines mash together the locus-by-site data created by all the grepping, below the site-specific information dictated in each echo line. You no doubt get the picture.
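(As an aside, those fifteen greps could be collapsed into nested loops; a sketch, assuming the same file naming as above:)
for site in NFCup Salmon Cascades; do
    for locus in 1 2 3 4 5; do
        grep -f "Samples.$site.txt" "locus$locus.fasta" > "locus$locus.$site.txt"
    done
done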
For creating the shell script I've been starting in Excel so I can easily copy-paste/autofill cells, saving as tab-delimited text, then opening that text file in TextWrangler to remove the tabs before saving it as a .sh file (Line breaks: Unix (LF); Encoding: Unicode (UTF-8)) in the same directory as all the files used in the script. I've tried chmod +x FILENAME.sh and chmod u+x FILENAME.sh to make sure it is executable, but to no avail. Even if I cut the script down to a single grep line (with the #! /bin/bash first line) I can't get it to work. The process only takes a moment when I type the commands directly into the command line, as none of these files are larger than 160 KB and some are significantly smaller. This is what I type in and what I get when I try to run the file (HW is the correct directory):
localhost:HW Mirel$ MigrateNshell.sh
-bash: MigrateNshell.sh: command not found
I've been at this impasse for two days now, so any input would be greatly appreciated! Thanks!!
For security reasons, the shell will not search the current directory (by default) for an executable. You have to be specific, and tell bash that your script is in the current directory (.):
$ ./MigrateNshell.sh
Change the first line to the following, as pointed out by Marc B:
#!/bin/bash
Then mark the script as executable and execute it from the command line:
chmod +x MigrateNshell.sh
./MigrateNshell.sh
Or simply execute bash from the command line, passing your script as a parameter:
/bin/bash MigrateNshell.sh
Also make sure you are not using PATH as a variable name in your script, which will override the existing PATH environment variable.
Also try running dos2unix on the shell script; it sometimes has Windows line endings, which the shell does not recognize:
$ dos2unix MigrateNshell.sh
This helps sometimes.
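If dos2unix is not installed, the same check and fix can be done with standard tools (a sketch, using the script name from the question):
$ file MigrateNshell.sh        # reports "CRLF line terminators" if the file is affected
$ tr -d '\r' < MigrateNshell.sh > MigrateNshell.unix.sh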
#! /bin/bash
^---
Remove the indicated space. The shebang should be:
#!/bin/bash
Unix has a variable called PATH that is a list of directories where to find commands.
$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/Users/david/bin
If I type a command foo at the command line, my shell will first see if there's an executable command /usr/local/bin/foo. If there is, it will execute /usr/local/bin/foo. If not, it will see if there's an executable command /usr/bin/foo and if not there, it will look to see if /bin/foo exists, etc. until it gets to /Users/david/bin/foo.
If it can't find a command foo in any of those directories, it tells me command not found.
There are several ways I can handle this issue:
Use the command bash foo since foo is a shell script.
Include the directory name when you execute the command, like /Users/david/foo or $PWD/foo or just plain ./foo.
Change your $PATH variable to add the directory that contains your commands to the PATH.
You can modify $HOME/.bash_profile or $HOME/.profile if .bash_profile doesn't exist. I did that to add in /usr/local/bin, which I placed first in my path. This way, I can override the standard commands that are in the OS. For example, I have Ant 1.9.1, but the Mac came with Ant 1.8.4. I put my ant command in /usr/local/bin, so my version of ant will execute first. I also added $HOME/bin to the end of the PATH for my own commands. If I had a file like the one you want to execute, I'd place it in $HOME/bin to execute it.
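For example, the relevant line in $HOME/.bash_profile might look like this (a sketch; the directory names are from the description above):
export PATH="/usr/local/bin:$PATH:$HOME/bin"
New shells pick this up automatically; for the current shell:
$ source $HOME/.bash_profile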
Try chmod u+x MigrateNshell.sh
There have been a few good comments about adding the shebang line to the beginning of the script. I'd like to add a recommendation to use the env command as well, for additional portability.
While #!/bin/bash may be the correct location on your system, that's not universal. Additionally, that may not be the user's preferred bash. #!/usr/bin/env bash will select the first bash found in the path.
Also make sure /bin/bash is the proper location for bash. If you took that line from an example somewhere, it may not match your particular server. If you are specifying an invalid location for bash, you're going to have a problem.
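You can check where (and whether) bash actually lives before hard-coding the shebang:
$ command -v bash      # first bash found in your PATH
$ ls -l /bin/bash      # confirm the hard-coded location exists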
Add the lines below to your .profile:
PATH=$PATH:$HOME/bin:$Dir_where_script_exists
export PATH
Now your script should work without ./
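To pick up the change in the current session, reload the profile and then run the script by name (script name from the question):
$ . $HOME/.profile
$ MigrateNshell.sh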
I'm new to shell scripting too, but I had this same issue. Make sure your script ends with a newline (that is, the last line is terminated). Otherwise it won't work.
First:
chmod 777 ./MigrateNshell.sh
Then:
./MigrateNshell.sh
Or, add your program to a directory listed in your $PATH variable, which will then allow you to call your program without ./
I'm storing commands in a file to be read and run line by line by a POSIX shell program. It looks something like this:
curl -fLo $HOME/.antigen.zsh git.io/antigen
curl -fLo $HOME/.vim/autoload/plug.vim --create-dirs https://raw.githubusercontent.com/junegunn/vim-plug/master/plug.vim
vim +"so $HOME/.vimrc" +PlugInstall +qa!
I'm also using this small function body to go through it and run every line:
while read -r line; do
$line
done < file
Simple stuff. And it works! However, I am having trouble expanding $HOME to my home directory (and ~, for that matter). I've tried using an exec subshell and removing the -r from the read loop, but the curl statements create a literal '$HOME' directory, which is not what I want; I want the commands to target my /home/.\+/ directory.
Since this is a strange question and you'll probably be wondering at this point (I certainly would), this is not an XY problem. I have spent a considerable time designing this piece of software and am certain that I need to store these commands in a file for my program to work and I won't consider doing otherwise unless this is proven absolutely impossible. Also, I'm not expanding $HOME myself because I want the commands to work in other users' computers.
Any ideas? Thanks in advance!
Transferring comments into an answer.
Can you use:
sh -c "$line"
Or:
eval "$line"
Usually eval is regarded as dangerous, but I'm not sure that sh -c is much different. Come to think of it, why not simply execute the file storing the commands?
sh "$file"
You can use sh -e "$file" to stop on an unchecked error, and add -x to see what is being executed.
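Putting that together with the loop from the question, a sketch:
while read -r line; do
    sh -c "$line"
done < file
Or, executing the whole file at once, stopping on errors and tracing each command:
$ sh -e -x file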
I'm a greenhorn at bash scripting. I have a file called dir.txt storing this text:
./src/defaultPackage
How do I read dir.txt and switch to the directory ./src/defaultPackage in one line?
I've tried cat ./dir.txt | xargs cd, but cd throws a "no such file or directory" error.
You could do
cd "$(head -1 dir.txt)"
For explanations, read the GNU bash manual, then head(1). In some cases (e.g. if dir.txt contains comments) you may want to use gawk(1) instead of head.
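For instance, if dir.txt may contain comment lines starting with #, a sketch using awk to take the first non-comment line instead:
cd "$(awk '!/^#/ { print; exit }' dir.txt)"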
Notice that xargs(1) cannot work with any bash-builtins(7), including cd; xargs needs an executable file (or script) since it uses exec(3) functions. See execve(2) and elf(5).
You may want to read Advanced Linux Programming and syscalls(2) then study the source code of your GNU bash, which is free software. You could be interested in using strace(1) or ltrace(1) or gdb(1) to understand the behavior of programs or processes, including your Unix shell.
Of course, avoid weird characters such as newlines, tabs, or spaces in file or directory names. See path_resolution(7) and glob(7).
You could consider using GNU guile (see this and SICP) or Python for your scripts. Both are much more readable (and in my opinion, easier to write) than bash or even zsh or fish. Of course, use a distributed version control system (such as git or mercurial) on your scripts.
You might be interested by Linux From Scratch.
Using only builtins:
IFS= read -r line < dir.txt
cd -- "$line"
To read a line from a file, use read (and not head/cat/sed/whatever).
The reason you get your error is that cd is a builtin, not a command in your PATH: it wouldn't make sense for cd to be an external command, since it changes the internal state of the shell.
Note that if the line is empty, this will cd to $HOME. To avoid that:
IFS= read -r line < dir.txt && cd -- "${line:-.}"
I would like to get just the filename (with extension) of the output file I pass to my bash script:
a=$1
b=$(basename -- "$a")
echo $b #for debug
if [ "$b" == "test" ]; then
echo $b
fi
If I type in:
./test.sh /home/oscarchase/test.sh > /home/oscarchase/test.txt
I would like to get:
test.txt
in my output file but I get:
test.sh
How can I proceed to parse this first argument to get the right name?
Try this:
#!/bin/bash
output=$(readlink /proc/$$/fd/1)
echo "output is performed to \"$output\""
but please remember that this solution is system-dependent (particularly for Linux). I'm not sure that the /proc filesystem has the same structure in e.g. FreeBSD, and this script certainly won't work in bash for Windows.
Aha: FreeBSD obsoleted procfs a while ago and now has a different facility called procstat. Its output should give you an idea of how to extract the information you need. I guess some awk-ing is required :)
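For example (FreeBSD; flags and output layout may vary by version):
$ procstat -f $$    # lists the shell's open files; the row for fd 1 holds the name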
Finding out the name of the file that is opened on file descriptor 1 (standard output) is not something you can do directly in bash; it depends on what operating system you are using. You can use lsof and awk to do this; it doesn't rely on the proc file system, and although the exact call may vary, this command worked for both Linux and Mac OS X, so it is at least somewhat portable.
output=$( lsof -p $$ -a -d 1 -F n | awk '/^n/ {print substr($1, 2)}' )
Some explanation:
-p $$ selects open files for the current process
-d 1 selects only file descriptor 1
-a is used to require that both -p and -d apply (the default is to show all files that match either condition)
-F n modifies the output so that you get one line per field, prefixed with an identifier character. With this, you'll get two lines: one beginning with p indicating the process ID, and one beginning with n indicating the file name of the file.
The awk command simply selects the line starting with n and outputs the first field minus the initial n.
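For the invocation in the question, a sketch of a script using this:
#!/bin/bash
output=$( lsof -p $$ -a -d 1 -F n | awk '/^n/ {print substr($1, 2)}' )
basename "$output"    # prints test.txt for the redirection shown in the question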
I often want to perform a function on the most recent file in the current directory. Essentially I want a more general version of a method to open the last modified file in the directory using vi.
I am able to write a global alias in zsh that does part of what I need:
alias -g lafi='`ls -rt|tail -n 1`'
Now I can execute something like
cat lafi
and I will see the content of the most recent file in the current dir. Or I can issue echo lafi to figure out what the last file was (or I could even say ls -rt|tail -n 1).
Is there a way to modify the alias definition such that it outputs the last file name (to STDERR?) and then hands it on, like lafi above, for further consumption on the command line? So for the above cat lafi I would hope for this output:
last file: <name of last-file>
<content of last-file>
I suspect this involves tee but my shell kung fu doesn't cover this in sufficient detail.
Perhaps
alias -g lafi='`ls -rt | tail -n 1 | tee >({ printf "last file: "; cat; } >&2)`'
I think zsh has process substitutions like that.
These lines work when copy-pasted to the shell but don't work in a script:
ls -l file1 > /path/`echo !#:2`.txt
ls -l file2 > /path/`echo !#:2`.txt
ls -l file1 > /path/$(echo !#:2).txt
ls -l file2 > /path/$(echo !#:2).txt
What's the syntax for doing this in a bash script?
If possible, I would like to know how to do this for one file and for all files with the same extension in a folder.
Non-interactive shells have history expansion disabled.
Add the following two lines to your script to enable it:
set -o history
set -o histexpand
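A minimal sketch combining those two options with the lines from the question:
#!/bin/bash
set -o history
set -o histexpand
ls -l file1 > /path/$(echo !#:2).txt
ls -l file2 > /path/$(echo !#:2).txt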
(UPDATE: I misunderstood the original question as referring to arguments to the script, not arguments to the current command within the script; this is a rewritten answer.)
As @choroba said, history is disabled by default in scripts, because it's not really the right way to do things like this in a script.
The preferred way to do things like this in a script is to store the item in question (in this case the filename) in a variable, then refer to it multiple times in the command:
fname=file1
ls -l "$fname" > "/path/$fname.txt"
Note that you should almost always put variable references inside double-quotes (as I did above) to avoid trouble if they contain spaces or other shell metacharacters. If you want to do this for multiple files, use a for loop:
for fname in *; do # this will repeat for each file (or directory) in the current directory
ls -l "$fname" > "/path/$fname.txt"
done
If you want to operate on files someplace other than the current directory, things are a little more complicated. You can use /inputpath/*, but it'll include the path along with each filename (e.g. it'd run the loop with "/inputpath/file1", "/inputpath/file2", etc), and if you use that directly in the output redirect you'll get something like > /path/inputpath/file1.txt (i.e. the two different paths will get appended together), probably not what you want. In this case, you can use the basename command to strip off the unwanted path for output purposes:
for fpath in /inputpath/*; do
ls -l "$fpath" > "/path/$(basename "$fpath").txt"
done
If you want a list of files with a particular extension, just use *.foo or /inputpath/*.foo as appropriate. However, in this case you'll wind up with the output going to files named e.g. "file1.foo.txt"; if you don't want stacked extensions, basename has an option to trim that as well:
for fpath in /inputpath/*.foo; do
ls -l "$fpath" > "/path/$(basename "$fpath" .foo).txt"
done
Finally, it might be neater (depending how complex the actual operation is, and whether it occurs multiple times in the script) to wrap this in a function, then use that:
doStuffWithFile() {
ls -l "$1" > "/path/$(basename "$1" "$2").txt"
}
for fpath in /inputpath/*.foo; do
doStuffWithFile "$fpath" ".foo"
done
doStuffWithFile /otherpath/otherfile.bar .bar