Problems with basename in a loop - shell

I am new to shell scripting and I'm trying to run a program that reads a text and performs its POS tagging. It requires an input and an output, as can be seen in the usage example:
$ cat input.txt | /path/to/tagger/run-Tagger.sh > output.txt
What I'm trying to do is run this line not just for one text, but for a set of texts in a specific folder, and write output files with the same names as the input files. So I wrote this script:
#!/bin/bash
path="/home/rafaeldaddio/Documents/"
program="/home/rafaeldaddio/Documents/LX-Tagger/POSTagger/Tagger/run-Tagger.sh"
for arqin in '/home/rafaeldaddio/Documents/teste/*'
do
out=$(basename $arqin)
output=$path$out
cat $arqin | $program > $output
done
I tried it with only one file and it works, but when I try with more than one, I get this error:
basename: extra operand ‘/home/rafaeldaddio/Documents/teste/3’
Try 'basename --help' for more information.
./scriptLXTagger.sh: 12: ./scriptLXTagger.sh: cannot create /home/rafaeldaddio/Documents/: Is a directory
Any insights on what I'm doing wrong? Thanks.

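Remove the quotes around the glob pattern so the shell can expand it, and quote your variable expansions instead. With the pattern quoted, the loop runs only once, and the unquoted $arqin then expands to every file at once, which is why basename complains about an extra operand: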
for arqin in /home/rafaeldaddio/Documents/teste/*
do
out=$(basename "$arqin")
output=$path$out
"$program" <"$arqin" >"$output"
done
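To see why the quotes matter, here is a minimal sketch using a throwaway directory. The file names a.txt and b.txt are made up for illustration. With the pattern quoted, the loop body runs once with the literal pattern; unquoted, it runs once per file.

```shell
#!/bin/bash
# Hypothetical demo directory with two files (names are illustrative only).
demo=$(mktemp -d)
touch "$demo/a.txt" "$demo/b.txt"

# Quoted pattern: the loop body runs once, with the literal string ".../*".
quoted_count=0
for f in "$demo/*"; do
    quoted_count=$((quoted_count + 1))
done

# Unquoted pattern: the shell expands the glob, one iteration per file.
names=""
unquoted_count=0
for f in "$demo"/*; do
    unquoted_count=$((unquoted_count + 1))
    names="$names$(basename "$f") "
done

echo "quoted: $quoted_count iteration(s); unquoted: $unquoted_count iteration(s)"
echo "basenames: $names"
rm -r "$demo"
```

Note that only the directory part is quoted in the second loop; the * itself has to stay outside the quotes to be expanded.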

Related

no such file or directory error when using variables (works otherwise)

I am new to programming and just starting in bash.
I'm trying to print a list of directories and files to a txt file, and remove some of the path that gets printed to make it cleaner.
It works with this:
TODAY=$(date +"%Y-%m-%d")
cd
cd Downloads
ls -R ~/Music/iTunes/iTunes\ Media/Music | sed 's/\/Users\/BilPaLo\/Music\/iTunes\/iTunes\ Media\/Music\///g' > music-list-$TODAY.txt
But to clean it up I want to use variables like so,
# Creates a string of the date, format YYYY-MM-DD
TODAY="$(date +"%Y-%m-%d")"
# Where my music folders are
MUSIC="$HOME/Music/iTunes/iTunes\ Media/Music/"
# Where I want it to go
DESTINATION="$HOME/Downloads/music-list-"$TODAY".txt"
# Path name to be removed from text file
REMOVED="\/Users\/BilPaLo\/Music\/iTunes\/iTunes\ Media\/Music\/"
ls -R "$MUSIC" > "$DESTINATION"
sed "s/$REMOVED//g" > "$DESTINATION"
but it gives me a 'no such file or directory' error that I can't seem to get around.
I'm sure there are many other problems with this code but this one I don't understand.
Thank you everyone! I followed the much-needed formatting advice and @amo-ej1's answer, and now this works:
# Creates a string of the date format YYYY-MM-DD
today="$(date +"%Y-%m-%d")"
# Where my music folders are
music="$HOME/Music/iTunes/iTunes Media/Music/"
# Where I want it to go
destination="$HOME/Downloads/music-list-$today.txt"
# Temporary file
temp="$HOME/Downloads/temp.txt"
# Path name to be removed of text file to only leave artist name and album
remove="\\/Users\\/BilPaLo\\/Music\\/iTunes\\/iTunes\\ Media\\/Music\\/"
# lists all children of music and writes it in temp
ls -R "$music" > "$temp"
# substitutes remove by nothing and writes it in destination
sed "s/$remove//g" "$temp" > "$destination"
rm "$temp" # deletes temp
First, when debugging bash it can be helpful to start bash with the -x flag (bash -x script.sh) or to put set -x inside the script. Bash will then print each command it executes (with variables expanded), which makes errors easier to spot.
In this specific snippet, the ls output is redirected to the file named by $DESTINATION, and then sed, which was given no input file, reads from standard input and also writes to $DESTINATION. So the way you replaced the pipe from your one-liner is wrong. As a result, the script will look as if it has hung, but sed is simply waiting for input on standard input.
As for the 'no such file or directory' error, try running with set -x and double-check the paths it is trying to access. In particular, inside double quotes the backslash in iTunes\ Media is kept literally, so the path stored in $MUSIC does not exist.
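For comparison, here is a sketch of the one-liner written with variables and a pipe, so that sed reads the ls output directly and no temporary file is needed. The directory layout is a stand-in created with mktemp, not the real iTunes path, and using | as the sed delimiter (so that slashes need no escaping) is a choice made here rather than something taken from the answer.

```shell
#!/bin/bash
# Stand-in music folder (hypothetical; replace with your real path).
root=$(mktemp -d)
music="$root/Music Library"
mkdir -p "$music/Artist/Album"
touch "$music/Artist/Album/song.mp3"
destination="$root/music-list.txt"

# ls -R writes to the pipe; sed reads from it and strips the leading path.
# Using | as the s-command delimiter avoids escaping every / in the path.
ls -R "$music" | sed "s|$music/||g" > "$destination"

cat "$destination"
```

This assumes the path contains no | character and no characters that sed would treat as special in a surprising way; for arbitrary paths the pattern would need escaping.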

bash script to read my files and execute the command

I have a script which one can run as ./analyze file.txt file.root log.txt
file.txt is the input file, which lists executable root files with their paths; the others are outputs. My problem is that I have almost 30 input files and I do not want to type the command each time to run the code. A bash script would be nice, but I did not manage to write one; it gives an error. Below is an example of the code I tried to run:
#!/bin/bash
do
echo
./analyze runlist1.txt runlist1.root log1.txt
./analyze runlist2.txt runlist2.root log.txt
./analyze runlist3.txt runlist3.root log.txt
./analyze runlist4.txt runlist4.root res4.txt
done
I get the error "syntax error near unexpected token `do'". Any help would be appreciated.
If all the input files are named Brunsplit<number>.txt, the following will help:
for file in Brunsplit*.txt; do
tmp=${file%.txt}
nr=${tmp#Brunsplit}
./analyze "${file}" "runlist${nr}.root" "res${nr}.txt"
done
The tmp and nr variables are set using parameter expansion: ${file%.txt} cuts .txt off the end, and ${tmp#Brunsplit} cuts Brunsplit off the start.
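A small sketch of those two expansions on a single name, assuming the file follows the Brunsplit<number>.txt pattern (Brunsplit12.txt is just an example value):

```shell
#!/bin/bash
file="Brunsplit12.txt"   # example name, assumed to follow the Brunsplit<N>.txt pattern

tmp=${file%.txt}         # remove the shortest match of ".txt" from the end
nr=${tmp#Brunsplit}      # remove the shortest match of "Brunsplit" from the start

echo "root file: runlist${nr}.root, result file: res${nr}.txt"
```

Here % trims from the end and # trims from the start; neither changes the original variable.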
As others indicated in the comments, a do out of context is a syntax error (it is a valid keyword only right after a for, while, or until control statement).
Apparently, there is no systematic mapping between input file names and the corresponding log file names, so you need a script which maintains this mapping. Something like this, then:
while read -r suffix logfile; do
./analyze "runlist$suffix.txt" "runlist$suffix.root" "$logfile"
done <<'____HERE'
1 log1.txt
2 log.txt
3 log.txt
4 res4.txt
____HERE
The here document (the stuff between << delimiter and the delimiter alone on a line) is just like a text file, except it is embedded as part of the script.
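To try the here-document loop without the analyze program, one option is a dry run that only builds and prints each command line. The sketch below uses two of the mappings from above; the commands variable is added here purely to collect the output.

```shell
#!/bin/bash
# Dry run: build and print each command line instead of executing ./analyze.
commands=""
while read -r suffix logfile; do
    line="./analyze runlist$suffix.txt runlist$suffix.root $logfile"
    echo "$line"
    commands="$commands$line;"
done <<'EOF'
1 log1.txt
4 res4.txt
EOF
```

Because the loop reads from a redirection rather than a pipe, it runs in the current shell, so variables set inside it are still available afterwards.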
Here is the primitive but modified version:
#!/bin/bash
for file in Brunsplit1.txt Brunsplit2.txt
do
echo $file
./analyze Brunsplit1.txt runlist1.root res1.txt
./analyze Brunsplit2.txt runlist2.root res2.txt
done
Thanks.

Output filename from input in bash

I have this script:
#!/bin/bash
FASTQFILES=~/Programs/ncbi-blast-2.2.29+/DB_files/*.fastq
FASTAFILES=~/Programs/ncbi-blast-2.2.29+/DB_files/*.fasta
clear
for file in $FASTQFILES
do cat $FASTQFILES | perl -e '$i=0;while(<>){if(/^\@/&&$i==0){s/^\@/\>/;print;}elsif($i==1){print;$i=-3}$i++;}' > ~/Programs/ncbi-blast-2.2.29+/DB_files/"${FASTQFILES%.*}.fasta"
mv $FASTAFILES ~/Programs/ncbi-blast-2.2.29+/db/
done
I'm trying to get it to grab the files defined in $FASTQFILES, do the .fastq to .fasta conversion, name the output with the same filename as the input, and move it to a new folder. E.g., ~/./DB_files/HELLO.fastq should give a converted ~/./db/HELLO.fasta
The problem is that the output of the conversion is a properly formatted hidden file called .fasta in the first folder instead of the expected one named HELLO.fasta. So there is nothing to mv. I think I'm messing up in the ${FASTQFILES%.*}.fasta argument but I can't seem to fix it.
I see three problems:
One part of your trouble is that you use cat $FASTQFILES instead of cat $file.
You also need to fix the I/O redirection at the end of that line to > "${file%.fastq}.fasta". Because $file already includes the directory, the path must not be prefixed with it again.
The mv command needs to be executed outside the loop.
In fact, when processing a single file at a time, you don't need to use cat at all (UUOC — Useless Use Of Cat). Simply provide "$file" as an argument to the Perl script.
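Putting those points together, a corrected loop might look like the sketch below. It runs in throwaway directories rather than the real ~/Programs/ncbi-blast-2.2.29+ tree, and it swaps the Perl one-liner for a simplified awk converter that assumes strict four-line FASTQ records, so treat the conversion step as a placeholder.

```shell
#!/bin/bash
# Stand-in directories (hypothetical; the question uses ~/Programs/ncbi-blast-2.2.29+/...).
work=$(mktemp -d)
src="$work/DB_files"
dst="$work/db"
mkdir -p "$src" "$dst"
printf '@read1\nACGT\n+\nIIII\n' > "$src/HELLO.fastq"

for file in "$src"/*.fastq; do
    # ${file%.fastq} already contains the directory, so only the suffix changes.
    out="${file%.fastq}.fasta"
    # Simplified converter, assuming strict 4-line FASTQ records:
    # keep the header (with @ replaced by >) and the sequence line.
    awk 'NR % 4 == 1 { sub(/^@/, ">"); print } NR % 4 == 2 { print }' "$file" > "$out"
done

# Move the converted files once, after the loop has finished.
mv "$src"/*.fasta "$dst"/
cat "$dst/HELLO.fasta"
```

The file is passed to the converter as an argument, so no cat is involved, and the loop variable rather than the glob variable is used for both input and output names.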

Unix Shell Script to take multiple files from standard input (csh)

Using either the for loop or the pipe (both work with one filename), I need to figure out how to accept unlimited specified files from standard input. I have tried regular expressions, and various wildcard forms. The two main issues I'm running into: only the first file is put through the script or every single file in the directory is put through. This is an assignment for a basic Unix Course and my problem thus far is over-complication. Based on the rest of the semester, there's a simple fix for what I'm wanting to do and here I've spent two hours perusing hundreds of websites and posts making my head spin.
EDIT: The command line prompt would be something like this ~/dir/script currentWord newWord fileName1 fileName2 fileName3
#!/bin/csh
set currentWord=$1
set newWord=$2
set fileName=$3
if { grep -q $1 *$3 } then
sed -i.bak -e "s/$1/$2/g" $3
else
echo "The string is not found."
endif
#grep -q $1 $3 | sed -i.bak -e "s/$1/$2/g" $3
You can access the command line arguments using $argv[]. To loop over them but skip the first two, you can use this construct:
foreach file ($argv[3-])
# do stuff here, eg
echo $file
end
You shouldn't use csh for scripting, though; if your professor has instructed you to do so, I would question that.
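If switching shells is an option, a rough bash equivalent of the loop might look like the sketch below. This is not csh: it wraps the logic in a function named replace_word (a name invented here), uses shift 2 to drop the first two arguments, and assumes a sed that accepts -i.bak, as in the question.

```shell
#!/bin/bash
# replace_word CURRENT NEW FILE...  (hypothetical helper, not from the question)
replace_word() {
    current=$1
    new=$2
    shift 2                      # drop the two words; "$@" now holds only file names
    for file in "$@"; do
        if grep -q "$current" "$file"; then
            sed -i.bak -e "s/$current/$new/g" "$file"
        else
            echo "The string is not found in $file."
        fi
    done
}

# Usage with two throwaway files:
dir=$(mktemp -d)
echo "hello world" > "$dir/one.txt"
echo "nothing here" > "$dir/two.txt"
replace_word hello goodbye "$dir/one.txt" "$dir/two.txt"
cat "$dir/one.txt"
```

In a standalone script the function body would operate on the script's own "$@" in the same way, so any number of file names can follow the two words.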

Shell script not running, command not found

I am very, very new to UNIX programming (running on MacOSX Mountain Lion via Terminal). I've been learning the basics from a bioinformatics and molecular methods course (we've had two classes) where we will eventually be using perl and python for data management purposes. Anyway, we have been tasked with writing a shell script to take data from a group of files and write it to a new file in a format that can be read by a specific program (Migrate-N).
I have gotten a number of functions to do exactly what I need independently when I type them into the command line, but when I put them all together in a script and try to run it I get an error. Here are the details (I apologize for the length):
#! /bin/bash
grep -f Samples.NFCup.txt locus1.fasta > locus1.NFCup.txt
grep -f Samples.NFCup.txt locus2.fasta > locus2.NFCup.txt
grep -f Samples.NFCup.txt locus3.fasta > locus3.NFCup.txt
grep -f Samples.NFCup.txt locus4.fasta > locus4.NFCup.txt
grep -f Samples.NFCup.txt locus5.fasta > locus5.NFCup.txt
grep -f Samples.Salmon.txt locus1.fasta > locus1.Salmon.txt
grep -f Samples.Salmon.txt locus2.fasta > locus2.Salmon.txt
grep -f Samples.Salmon.txt locus3.fasta > locus3.Salmon.txt
grep -f Samples.Salmon.txt locus4.fasta > locus4.Salmon.txt
grep -f Samples.Salmon.txt locus5.fasta > locus5.Salmon.txt
grep -f Samples.Cascades.txt locus1.fasta > locus1.Cascades.txt
grep -f Samples.Cascades.txt locus2.fasta > locus2.Cascades.txt
grep -f Samples.Cascades.txt locus3.fasta > locus3.Cascades.txt
grep -f Samples.Cascades.txt locus4.fasta > locus4.Cascades.txt
grep -f Samples.Cascades.txt locus5.fasta > locus5.Cascades.txt
echo 3 5 Salex_melanopsis > Smelanopsis.mig
echo 656 708 847 1159 779 >> Smelanopsis.mig
echo 154 124 120 74 126 NFCup >> Smelanopsis.mig
cat locus1.NFCup.txt locus2.NFCup.txt locus3.NFCup.txt locus4.NFCup.txt locus5.NFCup.txt >> Smelanopsis.mig
echo 32 30 30 18 38 Salmon River >> Smelanopsis.mig
cat locus1.Salmon.txt locus2.Salmon.txt locus3.Salmon.txt locus4.Salmon.txt locus5.Salmon.txt >> Smelanopsis.mig
echo 56 52 24 29 48 Cascades >> Smelanopsis.mig
cat locus1.Cascades.txt locus2.Cascades.txt locus3.Cascades.txt locus4.Cascades.txt locus5.Cascades.txt >> Smelanopsis.mig
The series of greps are just pulling out DNA sequence data for each site for each locus into new text files. The Samples...txt files have the sample ID numbers for a site, the .fasta files have the sequence information organized by sample ID; the grepping works just fine in command line if I run it individually.
The second group of code creates the actual new file I need to end up with, that ends in .mig. The echo lines are data about counts (basepairs per locus, populations in the analysis, samples per site, etc.) that the program needs information on. The cat lines are to mash together the locus by site data created by all the grepping below the site-specific information dictated in the echo line. You no doubt get the picture.
For creating the shell script I've been starting in Excel so I can easily copy-paste/autofill cells, saving as tab-delimited text, then opening that text file in TextWrangler to remove the tabs before saving as a .sh file (Line breaks: Unix (LF) and Encoding: Unicode (UTF-8)) in the same directory as all the files used in the script. I've tried using chmod +x FILENAME.sh and chmod u+x FILENAME.sh to try to make sure it is executable, but to no avail. Even if I cut the script down to just a single grep line (with the #! /bin/bash first line) I can't get it to work. The process only takes a moment when I type it directly into the command line as none of these files are larger than 160KB and some are significantly smaller. This is what I type in and what I get when I try to run the file (HW is the correct directory)
localhost:HW Mirel$ MigrateNshell.sh
-bash: MigrateNshell.sh: command not found
I've been at this impasse for two days now, so any input would be greatly appreciated! Thanks!!
For security reasons, the shell will not search the current directory (by default) for an executable. You have to be specific, and tell bash that your script is in the current directory (.):
$ ./MigrateNshell.sh
Change the first line to the following as pointed out by Marc B
#!/bin/bash
Then mark the script as executable and execute it from the command line
chmod +x MigrateNshell.sh
./MigrateNshell.sh
or simply execute bash from the command line passing in your script as a parameter
/bin/bash MigrateNshell.sh
Make sure you are not using PATH as a variable name in your script; assigning to it overrides the PATH environment variable that the shell uses to find commands.
Also try running dos2unix on the shell script, because sometimes it has Windows line endings and the shell does not recognize them.
$ dos2unix MigrateNshell.sh
This helps sometimes.
#! /bin/bash
  ^---
remove the indicated space. The shebang should be
#!/bin/bash
Unix has a variable called PATH that is a list of directories where to find commands.
$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/Users/david/bin
If I type a command foo at the command line, my shell will first see if there's an executable command /usr/local/bin/foo. If there is, it will execute /usr/local/bin/foo. If not, it will see if there's an executable command /usr/bin/foo and if not there, it will look to see if /bin/foo exists, etc. until it gets to /Users/david/bin/foo.
If it can't find a command foo in any of those directories, it tells me command not found.
There are several ways I can handle this issue:
Use the command bash foo since foo is a shell script.
Include the directory name when you execute the command, like /Users/david/foo or $PWD/foo or just plain ./foo.
Change your $PATH variable to add the directory that contains your commands to the PATH.
You can modify $HOME/.bash_profile, or $HOME/.profile if .bash_profile doesn't exist. I did that to add /usr/local/bin, which I placed first in my path. This way, I can override the standard commands that come with the OS. For example, I have Ant 1.9.1, but the Mac came with Ant 1.8.4. I put my ant command in /usr/local/bin, so my version of ant will execute first. I also added $HOME/bin to the end of the PATH for my own commands. If I had a file like the one you want to execute, I'd place it in $HOME/bin to execute it.
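The lookup behaviour described above can be checked with a small sketch. It creates a throwaway directory holding a script called foo-demo (a made-up name chosen so it cannot clash with a real command) and assumes the temporary directory allows executing files.

```shell
#!/bin/bash
# Throwaway directory holding a tiny executable script named foo-demo
# (the name is made up so it cannot clash with a real command).
bindir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from foo-demo"\n' > "$bindir/foo-demo"
chmod u+x "$bindir/foo-demo"

# 1. Not on PATH yet: a bare name is not found.
if command -v foo-demo >/dev/null 2>&1; then before=found; else before=missing; fi

# 2. An explicit path always works, regardless of PATH.
by_path=$("$bindir/foo-demo")

# 3. After appending the directory to PATH, the bare name resolves.
PATH="$PATH:$bindir"
after=$(foo-demo)

echo "before: $before; by path: $by_path; after: $after"
```

The PATH change here lasts only for the running script; making it permanent is what the profile edit is for.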
Try chmod u+x MigrateNshell.sh
There have been a few good comments about adding the shebang line to the beginning of the script. I'd like to add a recommendation to use the env command as well, for additional portability.
While #!/bin/bash may be the correct location on your system, that's not universal. Additionally, that may not be the user's preferred bash. #!/usr/bin/env bash will select the first bash found in the path.
Also make sure /bin/bash is the proper location for bash .... if you took that line from an example somewhere it may not match your particular server. If you are specifying an invalid location for bash you're going to have a problem.
Add the lines below to your .profile:
PATH=$PATH:$HOME/bin:$Dir_where_script_exists
export PATH
Now your script should work without ./
Raj Dagla
I'm new to shell scripting too, but I had this same issue. Make sure at the end of your script you have a blank line. Otherwise it won't work.
First:
chmod 777 ./MigrateNshell.sh
Then:
./MigrateNshell.sh
Or, add your program to a directory recognized in your $PATH variable.
Which will then allow you to call your program without ./

Resources