Get output filename in Bash Script - bash

I would like to get just the filename (with extension) of the output file I pass to my bash script:
a=$1
b=$(basename -- "$a")
echo $b #for debug
if [ "$b" == "test" ]; then
echo $b
fi
If I type in:
./test.sh /home/oscarchase/test.sh > /home/oscarchase/test.txt
I would like to get:
test.txt
in my output file but I get:
test.sh
How can I proceed to parse this first argument to get the right name?

Try this:
#!/bin/bash
output=$(readlink /proc/$$/fd/1)
echo "output is performed to \"$output\""
but please remember that this solution is system-dependent (particularly for Linux). I'm not sure that /proc filesystem has the same structure in e.g. FreeBSD and certainly this script won't work in bash for Windows.
Ahha, FreeBSD obsoleted procfs a while ago and now has a different facility called procstat. You may get an idea on how to extract the information you need from the following screenshot. I guess some awk-ing is required :)
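To get only the file name the question asks for, the readlink result can be passed through basename. This is a minimal sketch, assuming Linux with /proc mounted; fd_basename is a hypothetical helper name, not something from the answer above.

```shell
#!/bin/bash
# fd_basename FD: print the file name (without directory) that descriptor FD
# of this script refers to. Hypothetical helper; Linux-only because of /proc.
fd_basename() {
    local fd=${1:-1} target
    target=$(readlink "/proc/$$/fd/$fd") || return 1
    basename -- "$target"
}

# Capture the name first, then report it on stderr so the message does not
# end up in the redirected output file. Inside $( ), $$ is still the PID of
# the script itself, so descriptor 1 of the script is inspected.
name=$(fd_basename 1) && printf 'stdout goes to: %s\n' "$name" >&2
```

When stdout is a terminal or a pipe rather than a file, the printed name is whatever the kernel reports for that descriptor (for example a pts device or a pipe identifier).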

Finding out the name of the file that is opened on file descriptor 1 (standard output) is not something you can do directly in bash; it depends on what operating system you are using. You can use lsof and awk to do this; it doesn't rely on the proc file system, and although the exact call may vary, this command worked for both Linux and Mac OS X, so it is at least somewhat portable.
output=$( lsof -p $$ -a -d 1 -F n | awk '/^n/ {print substr($1, 2)}' )
Some explanation:
-p $$ selects open files for the current process
-d 1 selects only file descriptor 1
-a is used to require that both -p and -d apply (the default is to show all files that match either condition)
-F n modifies the output so that you get one line per field, prefixed with an identifier character. With this, you'll get two lines: one beginning with p and indicating the process ID, and one beginning with n indicating the name of the file.
The awk command simply selects the line starting with n and outputs the first field minus the initial n.

Related

Bash File names will not append to file from script

Hello, I am trying to get all files with Jane's name into a separate file called oldFiles.txt. In a directory called "data" I am reading a list of file names from a file called list.txt, from which I put all the file names containing the name Jane into the files variable. Then I'm trying to test the files variable against the files in list.txt to ensure they are in the file system, then append all the files containing jane to the oldFiles.txt file (which will be in the scripts directory), after it tests to make sure the item within the files variable passes.
#!/bin/bash
> oldFiles.txt
files= grep " jane " ../data/list.txt | cut -d' ' -f 3
if test -e ~data/$files; then
for file in $files; do
if test -e ~/scripts/$file; then
echo $file>> oldFiles.txt
else
echo "no files"
fi
done
fi
The above code gets the desired files and displays them correctly, as well as creates the oldFiles.txt file, but when I open the file after running the script I find that nothing was appended to the file. I tried changing the file assignment to a pointer instead files= grep " jane " ../data/list.txt | cut -d' ' -f 3 ---> files=$(grep " jane " ../data/list.txt) to see if that would help by just capturing raw data to write to file, but then the error comes up "too many arguments on line 5" which is the 1st if test statement. The only way I get the script to work semi-properly is when I do ./findJane.sh > oldFiles.txt on the shell command line, which is me essentially manually creating the file. How would I go about this so that I create oldFiles.txt and append to the oldFiles.txt all within the script?
The biggest problem you have is matching names like "jane" or "Jane's", etc. while not matching "Janes". grep provides the options -i (case-insensitive match) and -w (whole-word match), which can tailor your search to what you appear to want without having to use the kludge (" jane ") of adding spaces before and after your search term. (To properly do that you would use [[:space:]]jane[[:space:]].)
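The difference between the two patterns can be checked on a small sample. The three-line list below is a hypothetical stand-in for list.txt (id, name, file name, separated by single spaces), since the real file was not posted.

```shell
# Hypothetical sample of list.txt: id, name, file name.
list=$(mktemp)
printf '%s\n' '001 jane a.txt' '002 Jane b.txt' '003 janes c.txt' > "$list"

# Space-padded, case-sensitive pattern: only the first line matches.
padded=$(grep -c ' jane ' "$list")

# Whole-word, case-insensitive match: "jane" and "Jane" match, "janes" does not.
whole_word=$(grep -ciw 'jane' "$list")

printf 'padded=%s whole_word=%s\n' "$padded" "$whole_word"   # padded=1 whole_word=2
rm -f "$list"
```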
You also have the problem of what your "script dir" is if you call your script from a directory other than the one containing it, such as calling it from your $HOME directory with bash script/findJane.sh. In that case your script will attempt to append to $HOME/oldFiles.txt. The positional parameter $0 holds the path that was used to invoke the script (not necessarily an absolute one), so you can capture the script directory no matter where you call the script from with:
dirname "$0"
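Because dirname "$0" may return a relative path such as "." or "script", a common idiom is to turn it into an absolute path with cd and pwd. The sketch below assumes the script is run as a file rather than sourced, and it does not resolve symlinks; the variable names are illustrative.

```shell
# Resolve the directory containing this script to an absolute path.
# CDPATH is cleared so that cd cannot jump elsewhere or print extra output.
script_dir=$(CDPATH= cd -- "$(dirname -- "$0")" && pwd)

# Build the output path relative to the script, not the caller's directory.
outfn="$script_dir/oldFiles.txt"
printf 'would append to: %s\n' "$outfn"
```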
You are using bash, so store all the filenames resulting from your grep command in an array, not some general variable (especially since your use of " jane " suggests that your filenames contain whitespace)
You can make your script much more flexible if you take the input file (e.g. list.txt), the term to search for (e.g. "jane"), the location where to check for existence of the files (e.g. $HOME/data) and the output filename to append the names to (e.g. "oldFiles.txt") as command-line (positional) parameters. You can give each a default value so the script behaves as you currently desire without any arguments.
Even with the additional scripting flexibility of taking the command line arguments, the script actually has fewer lines simply filling an array using mapfile (synonymous with readarray) and then looping over the contents of the array. You also avoid the additional subshell for dirname with a simple parameter expansion and test whether the path component is empty -- to replace with '.', up to you.
If I've understood your goal correctly, you can put all the pieces together with:
#!/bin/bash
# positional parameters
src="${1:-../data/list.txt}" # 1st param - input (default: ../data/list.txt)
term="${2:-jane}" # 2nd param - search term (default: jane)
data="${3:-$HOME/data}" # 3rd param - file location (default: $HOME/data)
outfn="${4:-oldFiles.txt}" # 4th param - output (default: oldFiles.txt)
# save the path to the current script in script
script="$(dirname "$0")"
# if outfn not given, prepend path to script to outfn to output
# in script directory (if script called from elsewhere)
[ -z "$4" ] && outfn="$script/$outfn"
# split names w/term into array
# using the -iw option for case-insensitive whole-word match
mapfile -t files < <(grep -iw "$term" "$src" | cut -d' ' -f 3)
# loop over files array
for ((i=0; i<${#files[@]}; i++)); do
# test existence of file in data directory, redirect name to outfn
[ -e "$data/${files[i]}" ] && printf "%s\n" "${files[i]}" >> "$outfn"
done
(note: test expression and [ expression ] are synonymous, use what you like, though you may find [ expression ] a bit more readable)
(further note: "Janes" being plural is not considered the same as the singular -- adjust the grep expression as desired)
Example Use/Output
As was pointed out in the comment, without a sample of your input file, we cannot provide an exact test to confirm your desired behavior.
Let me know if you have questions.
As far as I can tell, this is what you're going for. This is totally a community effort based on the comments, catching your bugs. Obviously credit to Mark and Jetchisel for finding most of the issues. Notable changes:
Fixed $files to use command substitution
Fixed path to data/$file, assuming you have a directory at ~/data full of files
Fixed the test to not test for a string of files, but just the single file (also using -f to make sure it's a regular file)
Using double brackets — you could also use double quotes instead, but you explicitly have a Bash shebang so there's no harm in using Bash syntax
Adding a second message about not matching files, because there are two possible cases there; you may need to adapt depending on the output you're looking for
Removed the initial empty redirection — if you need to ensure that the file is clear before the rest of the script, then it should be added back, but if not, it's not doing any useful work
Changed the shebang to make sure you're using the user's preferred Bash, and added set -e because you should always add set -e
#!/usr/bin/env bash
set -e
files=$(grep " jane " ../data/list.txt | cut -d' ' -f 3)
for file in $files; do
if [[ -f $HOME/data/$file ]]; then
if [[ -f $HOME/scripts/$file ]]; then
echo "$file" >> oldFiles.txt
else
echo "no matching file"
fi
else
echo "no files"
fi
done

How to delay `redirection operator` of BASH `>`

First I create 3 files:
$ touch alpha bravo carlos
Then I want to save the list to a file:
$ ls > info.txt
However, I always got my info.txt inside:
$ cat info.txt
alpha
bravo
carlos
info.txt
It looks like the redirection operator creates my info.txt first.
In this case, my question is: how can I save my list of files before info.txt is created?
The main question is about the redirection operator. Why does it act first, and how to delay it so I complete my task first? Using the example above to answer it.
When you redirect a command's output to a file, the shell opens a file handle to the destination file, then runs the command in a child process whose standard output is connected to this file handle. There is no way to change this order, but you can redirect to a file in a different directory if you don't want the ls output to include the new file.
ls >/tmp/info.txt
mv /tmp/info.txt ./
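The ordering is easy to observe by counting lines. This is a small sketch that works in a scratch directory created with mktemp, so the directory and variable names are illustrative only.

```shell
dir=$(mktemp -d)
touch "$dir/alpha" "$dir/bravo" "$dir/carlos"

# The shell creates info.txt before ls starts, so ls reports four entries.
ls "$dir" > "$dir/info.txt"
same_dir=$(grep -c '' "$dir/info.txt")
rm "$dir/info.txt"

# Redirecting to a file outside the listed directory leaves three entries.
elsewhere=$(mktemp)
ls "$dir" > "$elsewhere"
other_dir=$(grep -c '' "$elsewhere")

printf 'same_dir=%s other_dir=%s\n' "$same_dir" "$other_dir"   # same_dir=4 other_dir=3
rm -rf "$dir" "$elsewhere"
```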
In a production script, you should make sure that the file name is unique and unpredictable.
t=$(mktemp -t lstemp.XXXXXXXXXX) || exit
trap 'rm -f "$t"' INT HUP
ls >"$t"
mv "$t" ./info.txt
Alternatively, capture the output into a variable, and then write that variable to a file.
files=$(ls)
echo "$files" >info.txt
As an aside, probably don't use ls in scripts. If you want a list of files in the current directory
printf '%s\n' *
does that.
One simple approach is to save your command output to a variable, like this:
ls_output="$(ls)"
and then write the value of that variable to the file, using any of these commands:
printf '%s\n' "$ls_output" > info.txt
cat <<< "$ls_output" > info.txt
echo "$ls_output" > info.txt
Some caveats with this approach:
Bash variables can't contain null bytes. If the output of the command includes a null byte, that byte and everything after it will be discarded.
In the specific case of ls, though, this shouldn't be an issue, because the output of ls should never contain a null byte.
$(...) removes trailing newlines. The above compensates for this by adding a newline while creating info.txt, but if the command output ends with multiple newlines, then the above will effectively collapse them into a single newline.
In the specific case of ls, this could happen if a filename ends with a newline — very unusual, and unlikely to be intentional, but nonetheless possible.
Since the above adds a newline while creating info.txt, it will put a newline there even if the command output doesn't end with a newline.
In the specific case of ls, this shouldn't be an issue, because the output of ls should always end with a newline.
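The trailing-newline behaviour can be seen by measuring the captured string. The second half shows a widely used workaround (not part of the answer above): print a sentinel character last and strip it afterwards.

```shell
# Command substitution drops every trailing newline.
plain=$(printf 'a\n\n\n')
plain_len=${#plain}

# Workaround: emit a sentinel character last, then strip it again.
kept=$(printf 'a\n\n\n'; printf x)
kept=${kept%x}
kept_len=${#kept}

printf 'plain=%s kept=%s\n' "$plain_len" "$kept_len"   # plain=1 kept=4
```

With the sentinel in place, printf '%s' "$kept" writes the output back byte for byte, without adding or collapsing newlines.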
If you want to avoid the above issues, another approach is to save your command output to a temporary file in a different directory, and then move it to the right place; for example:
tmpfile="$(mktemp)"
ls > "$tmpfile"
mv -- "$tmpfile" info.txt
. . . which obviously has different caveats (e.g., it requires access to write to a different directory), but should work on most systems.
One way to do what you want is to exclude the info.txt file from the ls output.
If you can rename the list file to .info.txt then it's as simple as:
ls >.info.txt
ls doesn't list files whose names start with . by default.
If you can't rename the list file but you've got GNU ls then you can use:
ls --ignore=info.txt >info.txt
Failing that, you can use:
ls | grep -v '^info\.txt$' >info.txt
All of the above options have the advantage that you can safely run them after the list file has been created.
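The same exclusion can be done without ls at all by looping over a glob. This is a sketch only; list_dir_except is a hypothetical helper name, and like plain ls it skips names that start with a dot.

```shell
# list_dir_except DIR NAME: print every entry of DIR, one per line,
# except the entry called NAME. Hypothetical helper.
list_dir_except() {
    local dir=$1 skip=$2 f
    for f in "$dir"/*; do
        [ -e "$f" ] || continue          # unmatched glob: nothing to list
        f=${f##*/}                       # keep only the file name
        [ "$f" = "$skip" ] && continue   # leave out the list file itself
        printf '%s\n' "$f"
    done
}
```

A call such as list_dir_except . info.txt > info.txt can then be repeated safely, because the list file never appears in its own contents.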
Another general approach is to capture the output of ls with one command and save it to the list file with a second command. As others have pointed out, temporary files and shell variables are two specific ways to capture the output. Another way, if you've got the moreutils package installed, is to use the sponge utility:
ls | sponge info.txt
Finally, note that you may not be able to reliably extract the list of files from info.txt if it contains plain ls output. See ParsingLs - Greg's Wiki for more information.

Writing a script that uses agrep to loop through lines in a document one by one against lines in another document and getting a result

I am trying to write a script that uses agrep to loop through files in one document and match them against another document. I believe this might use a nested loop however, I am not completely sure. In the template document, I need for it to take one string and match it against other strings in another document then move to the next string and match it again
If unable to see images for some odd reason I have included the links at the bottom here as well. Also If you need me to explain more just let me know. This is my first post so I am not sure how this will be perceived or if I used the correct terminologies :)
Template agrep/highlighted- https://imgur.com/kJvySbW
Matching strings not highlighted- https://imgur.com/NHBlB2R
I have already looked on various websites regarding loops
#!/bin/bash
#agrep script
echo ${BASH_VERSION}
TemplateSpacers="/Users/kj/Documents/Research/Dr. Gage
Research/Thesis/FastA files for AGREP
test/Template/TA21_spacers.fasta"
MatchingSpacers="/Users/kj/Documents/Research/Dr. Gage
Research/Thesis/FastA files for AGREP test/Matching/TA26_spacers.fasta"
for * in filename
do
agrep -3 * to file im comparing to
#potentially may need to use nested loop but not sure
Ok, I get it now, I think. This should get you started.
#!/bin/bash
document="documentToSearchIn.txt"
grep -v spacer fileWithSearchStrings.txt | while read srchstr ; do
echo "Searching for $srchstr in $document"
echo agrep -3 "$srchstr" "$document"
done
If that looks correct, remove the echo before agrep and run again.
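A slightly more defensive version of the same dry run is sketched below. It assumes, as the answer does, that header lines contain the word "spacer"; dry_run_search is a hypothetical function name. It only prints the agrep commands and never runs them.

```shell
# dry_run_search STRINGS_FILE DOCUMENT: print one agrep command per
# search string. Hypothetical helper; nothing is executed.
dry_run_search() {
    local strings=$1 document=$2 srchstr
    # IFS= and -r keep leading whitespace and backslashes intact.
    grep -v spacer "$strings" | while IFS= read -r srchstr; do
        [ -n "$srchstr" ] || continue    # skip blank lines
        printf 'agrep -3 %s %s\n' "$srchstr" "$document"
    done
}
```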
If, as you say in the comments, you want to store the script somewhere else, say in $HOME/bin, you can do this:
mkdir $HOME/bin
Save the script above as $HOME/bin/search. Now make it executable (only necessary one time) with:
chmod +x $HOME/bin/search
Now add $HOME/bin to your PATH. So, find the line starting:
export PATH=...
in your login profile, and change it to include the new directory:
export PATH=$PATH:$HOME/bin
Then start a new Terminal and you should be able to just run:
search
If you want to be able to specify the name of the strings file and the document to search in, you can change the code to this:
#!/bin/bash
# Pick up parameters, if supplied
# 1st param is name of file with strings to search for
# 2nd param is name of document to search in
str=${1:-""}
doc=${2:-""}
# Ensure name of strings file is valid
while : ; do
[ -f "$str" ] && break
read -p "Enter strings filename:" str
done
# Ensure name of document file is valid
while : ; do
[ -f "$doc" ] && break
read -p "Enter document name:" doc
done
echo "Search for strings from: $str, searching in document: $doc"
grep -v spacer "$str" | while read -r srchstr ; do
echo "Searching for $srchstr in $doc"
echo agrep -3 "$srchstr" "$doc"
done
Then you can run:
search path/to/file/with/strings path/to/document/to/search/in
or, if you run like this:
search
it will ask you for the 2 filenames.

Investigating a diff error in a bash script when variables are used instead of hardcoded file names

I have a script that looks for files of specific type in a specified directory and if they are present, generates a file with the basenames before creating a tar.gz. Once compressed, I check to ensure the tarball contains all the files by running a diff check.
I have created a pair of variables that are the pre-compressed file list and those found in the tarball. When I run an if statement including diff of the variables, I receive this error:
diff: missing operand after `/my/original/dir/filelist.txt'
diff: Try `diff --help' for more information.
I worked around this by referencing the files themselves rather than the created variables. If I run the if statement in a separate bash script, it works just fine using the variables so I am entirely lost as to what my error is in my larger script. Below I provide both the snippet from the large script and the diff statement as its own script for reference.
The if diff in its own script:
#!/bin/sh
filelist=(filelist.txt)
tarfiles=(tarfiles.txt)
#differences=$(diff filelist.txt tarfiles.txt) #Uncomment if below fails
differences=$(diff $filelist $tarfiles)
if $differences > /dev/null ; then
echo Same
else
echo Different
fi
The above works just fine.
Now including this at the end my larger script:
TARFILES=$(tar -tzf "$ARCHIVES/tarredfiles.tar.gz" | awk -F/ '{ if($NF != "") print $NF }' > $LOGS/tarfiles.txt)
FILELIST=($LOGS/filelist.txt)
#Check to see if it all worked
DIFF=$(diff $FILELIST $TARFILES)
cd $LOGS #I shouldn't need to do this but I do as a safety mechanism
#if diff filelist.txt tarfiles.txt > /dev/null ; then
if diff $FILELIST $TARFILES > /dev/null ; then
echo "Today's files have been archived and checked."
else
echo "Some or none of today's files have been archived, check the logs to find the error."
echo (diff $TARFILES $FILELIST) > $LOGS/$(date '+%Y%m%d')errors.txt
fi
I have tried enclosing the variables in "" and it didn't seem to make a difference.
The way you populate TARFILES results in it being empty. What is it that you're trying to store in the variable?
This line
TARFILES=$(tar -tzf "$ARCHIVES/tarredfiles.tar.gz" | awk -F/ '{ if($NF != "") print $NF }' > $LOGS/tarfiles.txt)
Does the following steps
Extracts a list of the filenames (-t) from the compressed (-z) tar file (-f) named tarredfiles.tar.gz in the directory referred to by the $ARCHIVES variables
Sends (pipes) that list of filenames into awk where you print the last component of the filename, that is the last field ($NF) of each line when it is split by / (-F/)
Sends (redirects) all of that output into the log file $LOGS/tarfiles.txt
Captures any other output (of which there will be none!) and stores it in the TARFILES variable.
So, the variable TARFILES is always empty, but the file tarfiles.txt has content in it.
It seems that you want the diff to compare the content of tarfiles.txt with the content of filelist.txt, but you're trying to use your variables in a way that isn't really compatible with that.
An expression of the form:
TARFILES=$( command goes here )
captures the output of that command.
And
TARFILES=$( command goes here > some-file.txt )
sends the output of the command into the file, and then captures nothing.
What you want is something like:
TARFILES=some-file.txt
command goes here > $TARFILES
which will set the variable to be the name of your file, and then run a command which put content into that file.
So, specifically:
TARFILES=$LOGS/tarfiles.txt
tar -tzf "$ARCHIVES/tarredfiles.tar.gz" | awk -F/ '{ if($NF != "") print $NF }' > $TARFILES
When working with shell scripts, it is very common to be running commands that produce output that goes into files, etc. One thing you need to be really clear about in the logic of your script is when you want your variables to contain actual content (that is, the output of a command), and when you want them to contain filenames.
In your case you want to run diff on 2 files ("tarfiles" and "filelist") that happen to contain a list of filenames, so that means there's a little bit more to keep track of, but essentially you want to populate "tarfiles" with the output from a command, and then run a diff where you pass in the 2 file names "tarfiles" and "filelist". So you never want to use $( ... ) to populate tarfiles.txt because that is how you capture the output of a command into a variable, and what you're trying to do is store a filename in your variable.
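The distinction can be checked in a few lines. In this sketch printf stands in for the tar | awk pipeline, and the directory comes from mktemp; both are assumptions made only so the example runs anywhere.

```shell
logs=$(mktemp -d)

# Redirect inside $( ): the output lands in the file, nothing is captured.
captured=$(printf 'a.txt\nb.txt\n' > "$logs/tarfiles.txt")
printf 'captured is %s characters long\n' "${#captured}"   # 0

# Store file NAMES in the variables, then fill the files separately.
tarfiles=$logs/tarfiles.txt
filelist=$logs/filelist.txt
printf 'a.txt\nb.txt\n' > "$filelist"

# diff receives two file names, so it has both operands it needs.
if diff "$filelist" "$tarfiles" > /dev/null; then
    result=same
else
    result=different
fi
printf '%s\n' "$result"
rm -rf "$logs"
```

With an empty variable in place of a file name, diff sees only one operand, which is what produces the "missing operand" error quoted in the question.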

Redirecting the result files to different variable file names

I have a folder with, say, ten data files I01.txt, ..., I10.txt. Each file, when run through the program with ./a.out, gives me five output files, namely f1.txt, f2.txt, ... f5.txt.
I have written a simple bash program to execute all the files and save the output printed on the screen to a variable file using the command
./cosima_peaks_444_temp_muuttuva -args > $counter-res.txt
Using this, I am able to save the on screen output to the file. But the five files f1 to f5 are altered to store results of the last file run, in this case I10, and the results of the first nine files are lost.
So I want to save the output of each I*.txt file (f1 ... f5) to a different file such that, when the program processes I01.txt using ./a.out, it stores the output files as
f1>var1-f1.txt , f2>var1-f2.txt... f5 > var1-f5.txt
and then repeats the same for I02 (f1>var2-f1.txt ...).
#!/bin/bash
# echo "for looping over all the .txt files"
echo -e "Enter the name of the file or q to quit "
read dir
if [[ $dir = q ]]
then
exit
fi
filename="$dir*.txt"
counter=0
if [[ $dir == I ]]
then
for f in $filename ; do
echo "output of $filename"
((counter ++))
./cosima_peaks_444_temp_muuttuva $f -m202.75 -c1 -ng0.5 -a0.0 -b1.0 -e1.0 -lg > $counter-res.txt
echo "counter $counter"
done
fi
If I understand correctly, you want to pass files l01.txt, l02.txt, ... to a.out and save the output of each execution of a.out to a separate file like f01.txt, f02.txt, .... In that case you could use a short script that loops over each file named l*.txt in the directory and passes its name to a.out, redirecting the output to a file fN.txt (where N is the same number as in the lN.txt filename). This presumes you are passing each filename to a.out and that a.out is not reading the entire directory automatically.
for i in l*.txt; do
num=$(sed 's/^l\(.*\)[.]txt/\1/' <<<"$i")
./a.out "$i" > "f${num}.txt"
done
(note: that is 's/(lowercase L) ... /\one..')
note: if you do not want the same N from the filename (with its leading '0'), then you can trim the leading '0' from the N value for the output filename.
(you can use a counter as you have shown in your edited post, but you have no guarantee in sort order of the filenames used by the loop unless you explicitly sort them)
note: this presumes NO spaces, embedded newlines or other odd characters in the filename. If your lN.txt names can have odd characters or spaces, then feeding a while loop with find can avoid the odd character issues.
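A find-fed while loop of that kind might look like the sketch below. It is bash-only (read -d '' and process substitution) and relies on -maxdepth, -print0 and sort -z, which GNU and BSD tools provide but POSIX does not guarantee. plan_outputs is a hypothetical helper that only prints the input-to-output mapping and does not run a.out.

```shell
#!/bin/bash
# plan_outputs DIR: print "input -> output" for every l*.txt file in DIR.
# Hypothetical helper; names are read NUL-delimited so spaces and newlines
# in file names survive intact.
plan_outputs() {
    local dir=$1 path name num
    while IFS= read -r -d '' path; do
        name=${path##*/}      # strip the directory part
        num=${name#l}         # drop the leading "l"
        num=${num%.txt}       # drop the extension
        printf '%s -> f%s.txt\n' "$name" "$num"
    done < <(find "$dir" -maxdepth 1 -type f -name 'l*.txt' -print0 | sort -z)
}
```

Replacing the printf line with ./a.out "$path" > "f${num}.txt" would perform the actual runs; the sort order still depends on the current locale.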
With f1 - f5 Created Each Run
You know the format for the output file name, so you can test for the existence of an existing file name and set a prefix or suffix to provide unique names. For example, if your first pass creates filenames 'pass1-f01.txt', 'pass1-f02.txt', then you can check for that pattern (in several ways) and increment your 'passN' prefix as required:
for f in $filename; do # unquoted so the glob stored in $filename expands
num=$(sed 's/l*\(.*\)[.]txt/\1/' <<<"$f")
count=$(sed 's/^0*//' <<<"$num")
while [ -f "pass${count}-f${num}.txt" ]; do
((count++))
done
./a.out "$f" > "pass${count}-f${num}.txt"
done
Give that a try and let me know if that isn't closer to what you need.
(note: the use of the herestring (<<<) is bash-only, if you need a portable solution, pipe the output of echo "$var" to sed, e.g. count=$(echo "$num" | sed 's/^0*//') )
I replaced your cosima_peaks_444_temp_muuttuva with a function myprog.
The OP asked for more explanation, so I put in a lot of comment:
# This function makes 5 output files for testing the construction
function myprog {
# Fill the test output file f1.txt with the input filename and a datestamp
echo "Output run $1 on $(date)" > f1.txt
# The original prog makes 5 output files, so I copy the new testfile 4 times
cp f1.txt f2.txt
cp f1.txt f3.txt
cp f1.txt f4.txt
cp f1.txt f5.txt
}
# Use the number in the inputfile for making a unique filename and move the output
function move_output {
# The parameter ${1} is filled with something like I03.txt
# You can get the number with a sed action, but it is more efficient to use
# bash functions, even in 2 steps.
# First step: Cut off from the end as much as possible (%%) starting with a dot.
Inumber=${1%%.*}
# Step 2: Remove the I from the Inumber (that is filled with something like "I03").
number=${Inumber#I}
# Move all outputfiles from last run
for outputfile in f*txt; do
# Put the number in front of the original name
mv "${outputfile}" "${number}_${outputfile}"
done
}
# Start the main processing. You will perform the same logic for all input files,
# so make a loop for all files. I guess all input files start with an "I",
# followed by 2 characters (a number), and .txt. No need to use ls for listing those.
for input in I??.txt; do
# Call the dummy prog above with the name of the current input file as a parameter
myprog "${input}"
# Now finally the show starts.
# Call the function for moving the 5 outputfiles to another name.
move_output "${input}"
done
I guess you have the source code of this a.out binary. If so, I would modify it so that it outputs to several fds instead of several files. Then you can solve this very cleanly using redirects:
./a.out 3> fileX.1 4> fileX.2 5> fileX.3
and so on for every file you want to output. Writing to a file or to a (redirected) fd is equivalent in most programs (notable exception: memory mapped I/O, but that is not commonly used for such scripts - look for mmap calls)
Note that this is not very esoteric, but a very well known technique that is regularly used to separate output (stdout, fd=1) from errors (stderr, fd=2).
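To illustrate the idea without the real binary, the sketch below uses a shell function as a stand-in for a modified a.out. The function name produce, the two extra descriptors and the output file names are all hypothetical.

```shell
# Stand-in for a modified a.out (hypothetical): regular output on stdout,
# two result streams on descriptors 3 and 4.
produce() {
    echo "screen output for $1"
    echo "first result for $1" >&3
    echo "second result for $1" >&4
}

out=$(mktemp -d)
for input in I01.txt I02.txt; do
    n=${input%.txt}   # I01, I02, ...
    # Each run gets its own set of files, so nothing is overwritten.
    produce "$input" > "$out/$n-res.txt" 3> "$out/$n-f1.txt" 4> "$out/$n-f2.txt"
done
ls "$out"
```

Each input then gets its own result files, chosen by the caller at run time rather than hard-coded in the program.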
