Bash File names will not append to file from script - bash

Hello I am trying to get all files with Jane's name to a separate file called oldFiles.txt. In a directory called "data" I am reading from a list of file names from a file called list.txt, from which I put all the file names containing the name Jane into the files variable. Then I'm trying to test the files variable with the files in list.txt to ensure they are in the file system, then append the all the files containing jane to the oldFiles.txt file(which will be in the scripts directory), after it tests to make sure the item within the files variable passes.
#!/bin/bash
> oldFiles.txt
files= grep " jane " ../data/list.txt | cut -d' ' -f 3
if test -e ~data/$files; then
for file in $files; do
if test -e ~/scripts/$file; then
echo $file>> oldFiles.txt
else
echo "no files"
fi
done
fi
The above code gets the desired files and displays them correctly, as well as creates the oldFiles.txt file, but when I open the file after running the script I find that nothing was appended to the file. I tried changing the file assignment to a pointer instead files= grep " jane " ../data/list.txt | cut -d' ' -f 3 ---> files=$(grep " jane " ../data/list.txt) to see if that would help by just capturing raw data to write to file, but then the error comes up "too many arguments on line 5" which is the 1st if test statement. The only way I get the script to work semi-properly is when I do ./findJane.sh > oldFiles.txt on the shell command line, which is me essentially manually creating the file. How would I go about this so that I create oldFiles.txt and append to the oldFiles.txt all within the script?

The biggest problem you have is matching names like "jane" or "Jane's", etc. while not matching "Janes". grep provides the options -i (case insensitive match) and -w (whole-word match) which can tailor your search to what you appear to want without having to use the kludge (" jane ") of appending spaces before an after your search term. (to properly do that you would use [[:space:]]jane[[:space:]])
You also have the problem of what is your "script dir" if you call your script from a directory other than the one containing your script, such as calling your script from your $HOME directory with bash script/findJane.sh. In that case your script will attempt to append to $HOME/oldFiles.txt. The positional parameter $0 always contains the full pathname to the current script being run, so you can capture the script directory no matter where you call the script from with:
dirname "$0"
You are using bash, so store all the filenames resulting from your grep command in an array, not some general variable (especially since your use of " jane " suggests that your filenames contain whitespace)
You can make your script much more flexible if you take the information of your input file (e.g list.txt), the term to search for (e.g. "jane"), the location where to check for existence of the files (e.g. $HOME/data) and the output filename to append the names to (e.g. "oldFile.txt") as command line [positonal] parameters. You can give each default values so it behaves as you currently desire without providing any arguments.
Even with the additional scripting flexibility of taking the command line arguments, the script actually has fewer lines simply filling an array using mapfile (synonymous with readarray) and then looping over the contents of the array. You also avoid the additional subshell for dirname with a simple parameter expansion and test whether the path component is empty -- to replace with '.', up to you.
If I've understood your goal correctly, you can put all the pieces together with:
#!/bin/bash
# positional parameters
src="${1:-../data/list.txt}" # 1st param - input (default: ../data/list.txt)
term="${2:-jane}" # 2nd param - search term (default: jane)
data="${3:-$HOME/data}" # 3rd param - file location (defaut: ../data)
outfn="${4:-oldFiles.txt}" # 4th param - output (default: oldFiles.txt)
# save the path to the current script in script
script="$(dirname "$0")"
# if outfn not given, prepend path to script to outfn to output
# in script directory (if script called from elsewhere)
[ -z "$4" ] && outfn="$script/$outfn"
# split names w/term into array
# using the -iw option for case-insensitive whole-word match
mapfile -t files < <(grep -iw "$term" "$src" | cut -d' ' -f 3)
# loop over files array
for ((i=0; i<${#files[#]}; i++)); do
# test existence of file in data directory, redirect name to outfn
[ -e "$data/${files[i]}" ] && printf "%s\n" "${files[i]}" >> "$outfn"
done
(note: test expression and [ expression ] are synonymous, use what you like, though you may find [ expression ] a bit more readable)
(further note: "Janes" being plural is not considered the same as the singular -- adjust the grep expression as desired)
Example Use/Output
As was pointed out in the comment, without a sample of your input file, we cannot provide an exact test to confirm your desired behavior.
Let me know if you have questions.

As far as I can tell, this is what you're going for. This is totally a community effort based on the comments, catching your bugs. Obviously credit to Mark and Jetchisel for finding most of the issues. Notable changes:
Fixed $files to use command substitution
Fixed path to data/$file, assuming you have a directory at ~/data full of files
Fixed the test to not test for a string of files, but just the single file (also using -f to make sure it's a regular file)
Using double brackets — you could also use double quotes instead, but you explicitly have a Bash shebang so there's no harm in using Bash syntax
Adding a second message about not matching files, because there are two possible cases there; you may need to adapt depending on the output you're looking for
Removed the initial empty redirection — if you need to ensure that the file is clear before the rest of the script, then it should be added back, but if not, it's not doing any useful work
Changed the shebang to make sure you're using the user's preferred Bash, and added set -e because you should always add set -e
#!/usr/bin/env bash
set -e
files=$(grep " jane " ../data/list.txt | cut -d' ' -f 3)
for file in $files; do
if [[ -f $HOME/data/$file ]]; then
if [[ -f $HOME/scripts/$file ]]; then
echo "$file" >> oldFiles.txt
else
echo "no matching file"
fi
else
echo "no files"
fi
done

Related

Bash script MV is disappearing files

I've written a script to go through all the files in the directory the script is located in, identify if a file name contains a certain string and then modify the filename. When I run this script, the files that are supposed to be modified are disappearing. It appears my usage of the mv command is incorrect and the files are likely going to an unknown directory.
#!/bin/bash
string_contains="dummy_axial_y_position"
string_dontwant="dummy_axial_y_position_time"
file_extension=".csv"
for FILE in *
do
if [[ "$FILE" == *"$string_contains"* ]];then
if [[ "$FILE" != *"$string_dontwant"* ]];then
filename= echo $FILE | head -c 15
combined_name="$filename$file_extension"
echo $combined_name
mv $FILE $combined_name
echo $FILE
fi
fi
done
I've done my best to go through the possible errors I've made in the MV command but I haven't had any success so far.
There are a couple of problems and several places where your script can be improved.
filename= echo $FILE | head -c 15
This pipeline runs echo $FILE adding the variable filename having the null string as value in its environment. This value of the variable is visible only to the echo command, the variable is not set in the current shell. echo does not care about it anyway.
You probably want to capture the output of echo $FILE | head -c 15 into the variable filename but this is not the way to do it.
You need to use command substitution for this purpose:
filename=$(echo $FILE | head -c 15)
head -c outputs only the first 15 characters of the input file (they can be on multiple lines but this does not happen here). head is not the most appropriate way for this. Use cut -c-15 instead.
But for what you need (extract the first 15 characters of the value stored in the variable $FILE), there is a much simpler way; use a form of parameter expansion called "substring expansion":
filename=${FILE:0:15}
mv $FILE $combined_name
Before running mv, the variables $FILE and $combined_name are expanded (it is called "parameter expansion"). This means that the variable are replaced by their values.
For example, if the value of FILE is abc def and the value of combined_name is mnp opq, the line above becomes:
mv abc def mnp opq
The mv command receives 4 arguments and it attempts to move the files denoted by the first three arguments into the directory denoted by the fourth argument (and it probably fails).
In order to keep the values of the variables as single words (if they contain spaces), always enclose them in double quotes. The correct command is:
mv "$FILE" "$combined_name"
This way, in the example above, the command becomes:
mv "abc def" "mnp opq"
... and mv is invoked with two arguments: abc def and mnp opq.
combined_name="$filename$file_extension"
There isn't any problem in this line. The quotes are simply not needed.
The variables filename and file_extension are expanded (replaced by their values) but on assignments word splitting is not applied. The value resulted after the replacement is the value assigned to variable combined_name, even if it contains spaces or other word separator characters (spaces, tabs, newlines).
The quotes are also not needed here because the values do not contain spaces or other characters that are special in the command line. They must be quoted if they contain such characters.
string_contains="dummy_axial_y_position"
string_dontwant="dummy_axial_y_position_time"
file_extension=".csv"
It is not not incorrect to quote the values though.
for FILE in *
do
if [[ "$FILE" == *"$string_contains"* ]];then
if [[ "$FILE" != *"$string_dontwant"* ]]; then
This is also not wrong but it is inefficient.
You can use the expression from the if condition directly in the for statement (and get rid of the if statement):
for FILE in *"$string_contains"*; do
if [[ "$FILE" != *"$string_dontwant"* ]]; then
...
If you have read and understood the above (and some of the linked documentation) you will be able to figure out yourself where were your files moved :-)

Writing a script that uses agrep to loop through lines in a document one by one against lines in another document and getting a result

I am trying to write a script that uses agrep to loop through files in one document and match them against another document. I believe this might use a nested loop however, I am not completely sure. In the template document, I need for it to take one string and match it against other strings in another document then move to the next string and match it again
If unable to see images for some odd reason I have included the links at the bottom here as well. Also If you need me to explain more just let me know. This is my first post so I am not sure how this will be perceived or if I used the correct terminologies :)
Template agrep/highlighted- https://imgur.com/kJvySbW
Matching strings not highlighted- https://imgur.com/NHBlB2R
I have already looked on various websites regarding loops
#!/bin/bash
#agrep script
echo ${BASH_VERSION}
TemplateSpacers="/Users/kj/Documents/Research/Dr. Gage
Research/Thesis/FastA files for AGREP
test/Template/TA21_spacers.fasta"
MatchingSpacers="/Users/kj/Documents/Research/Dr. Gage
Research/Thesis/FastA files for AGREP test/Matching/TA26_spacers.fasta"
for * in filename
do
agrep -3 * to file im comparing to
#potentially may need to use nested loop but not sure
Ok, I get it now, I think. This should get you started.
#!/bin/bash
document="documentToSearchIn.txt"
grep -v spacer fileWithSearchStrings.txt | while read srchstr ; do
echo "Searching for $srchstr in $document"
echo agrep -3 "$srchstr" "$document"
done
If that looks correct, remove the echo before agrep and run again.
If, as you say in the comments, you want to store the script somewhere else, say in $HOME/bin, you can do this:
mkdir $HOME/bin
Save the script above as $HOME/bin/search. Now make it executable (only necessary one time) with:
chmod +x $HOME/bin/search
Now add $HOME/bin to your PATH. So, find the line starting:
export PATH=...
in your login profile, and change it to include the new directory:
export PATH=$PATH:$HOME/bin
Then start a new Terminal and you should be able to just run:
search
If you want to be able to specify the name of the strings file and the document to search in, you can change the code to this:
#!/bin/bash
# Pick up parameters, if supplied
# 1st param is name of file with strings to search for
# 2nd param is name of document to search in
str=${1:-""}
doc=${2:-""}
# Ensure name of strings file is valid
while : ; do
[ -f "$str" ] && break
read -p "Enter strings filename:" str
done
# Ensure name of document file is valid
while : ; do
[ -f "$doc" ] && break
read -p "Enter document name:" doc
done
echo "Search for strings from: $str, searching in document: $doc"
grep -v spacer "$str" | while read srchstr ; do
echo "Searching for $str in $doc"
echo agrep -3 "$str" "$doc"
done
Then you can run:
search path/to/file/with/strings path/to/document/to/search/in
or, if you run like this:
search
it will ask you for the 2 filenames.

Loop through all the files with .txt extension in bash [duplicate]

This question already has answers here:
Loop through all the files with a specific extension
(7 answers)
Closed 4 years ago.
I am trying to loop over files in a folder and test for .txt extensions.
But I get the following error: "awk: cannot open = (No such file or directory)
Here's my code:
!/bin/bash
files=$(ls);
for file in $files
do
# extension=$($file | awk -F . '{ print $NF }');
if [ $file | awk -F . "{ print $NF }" = txt ]
then
echo $file;
else
echo "Not a .txt file";
fi;
done;
The way you are doing this is wrong in many ways.
You should never parse output of ls. It does not handle the filename containing special characters intuitively See Why you shouldn't parse the output of ls(1)
Don't use variables to store multi-line data. The output of ls in a variable is expected to undergo word splitting. In your case files is being referenced as a plain variable, and without a delimiter set, you can't go through the multiple files stored.
Using awk is absolutely unnecessary here, the part $file | awk -F . "{ print $NF }" = txt is totally wrong, you are not passing the name the file to the pipe, just the variable $file, it should have been echo "$file"
The right interpreter she-bang should have been set as #!/bin/bash in your script if you were planning to run it as an executable, i.e. ./script.sh. The more recommended way would be to say #!/usr/bin/env bash to let the shell identify the default version of the bash installed.
As such your requirement could be simply reduced to
for file in *.txt; do
[ -f "$file" ] || continue
echo "$file"
done
This is a simple example using a glob pattern using *.txt which does pathname expansion on the all the files ending with the txt format. Before the loop is processed, the glob is expanded as the list of files i.e. assuming the folder has files as 1.txt, 2.txt and foo.txt, the loop is generated to
for file in 1.txt 2.txt foo.txt; do
Even in the presence of no files, i.e. when the glob matches empty (no text files found), the condition [ -f "$file" ] || continue would ensure the loop is exit gracefully by checking if the glob returned any valid file results or just an un-expanded string. The condition [ -f "$file" ] would fail for everything if except a valid file argument.
Or if you are targeting scripts for bourne again shell, enable glob options to remove non-matching globs, rather than preserving them
shopt -s nullglob
for file in *.txt; do
echo "$file"
done
Another way using shell array to store the glob results and parse them over later to do a specific action on them. This way is useful when doing a list of files as an argument list to another command. Using a proper quoted expansion "${filesList[#]}" will preserve the spacing/tabs/newlines and other meta characters in filenames.
shopt -s nullglob
filesList=(*.txt)
for file in "${filesList[#]}"; do
echo "$file"
done

Loop for all files with certain name in directory in bash

I'm trying to make a script that would test whether my file is exactly as it should be, but I haven't been using bash before:
#!/bin/bash
./myfile <test.in 1>>test.out 2>>testerror.out
if cmp -s "test.out" "pattern.out"
then
echo "Test matches pattern"
else
echo "Test does not match pattern"
fi
if cmp -s "testerror.out" "pattern.err"
then
echo "Errors matches pattern"
else
echo "Errors does not match pattern"
fi
Can I write it in such way that after calling ./script.sh myfile pattern my scripts would run over all files named pattern*.in and check if myfile gives same files as pattern*.out and pattern*.err ? e.g there are files pattern1, pattern2, pattern4 and i want to run test for them, but not for pattern3 that doesn't exist.
Can I somehow go around creating new files? (Assuming i don't need them) If I were doing it from command line, I'd go with something like
< pattern.in ./myfile | diff -s ./pattern.out
but I have no idea how to write it in script file to make it work.
Or maybe i should just use rm everytime?
If I understand you correctly:
for infile in pattern*.in ; do
outfile="${infile%.in}.out"
errfile="${infile%.in}.err"
echo "Working on input $infile with output $outfile and error $errfile"
./myfile <"$infile" >>"$outfile" 2>>"$errfile"
# Your `if`..`fi` blocks here, referencing infile/outfile/errfile
done
The % replacement operator strips a substring off the end of a variable's value. So if $infile is pattern.in, ${infile%.in} is that without the trailing .in, i.e., pattern. The outfile and errfile assignments use this to copy the first part (e.g., pattern1) of the particular .in file being processed.

Bash: Extract user path (/home/userID) from read line containing full path and replace with "~"

I'm constructing a bash script file a bit at a time. I'm learning as I
go. But I can't find anything online to help me at this point: I need to
extract a substring from a large string, and the two methods I found using ${} (curly brackets) just won't work.
The first, ${x#y}, doesn't do what it should.
The second, ${x:p} or ${x:p:n}, keeps reporting bad substitution.
It only seems to work with constants.
The ${#x} returns a string length as text, not as a number, meaning it does not work with either ${x:p} or ${x:p:n}.
Fact is, it's seems really hard to get bash to do much math at all. Except for the for statements. But that is just counting. And this isn't a task for a for loop.
I've consolidated my script file here as a means of helping you all understand what it is that I am doing. It's for working with PureBasic source files, but you only have to change the grep's "--include=" argument, and it can search other types of text files instead.
#!/bin/bash
home=$(echo ~) # Copy the user's path to a variable named home
len=${#home} # Showing how to find the length. Problem is, this is treated
# as a string, not a number. Can't find a way to make over into
# into a number.
echo $home "has length of" $len "characters."
read -p "Find what: " what # Intended to search PureBasic (*.pb?) source files for text matches
grep -rHn $what $home --include="*.pb*" --exclude-dir=".cache" --exclude-dir=".gvfs" > 1.tmp
while read line # this checks for and reads the next line
do # the closing 'done' has the file to be read appended with "<"
a0=$line # this is each line as read
a1=$(echo "$a0" | awk -F: '{print $1}') # this gets the full path before the first ':'
echo $a0 # Shows full line
echo $a1 # Shows just full path
q1=${line#a1}
echo $q1 # FAILED! No reported problem, but failed to extract $a1 from $line.
q1=${a0#a1}
echo $q1 # FAILED! No reported problem, but failed to extract $a1 from $a0.
break # Can't do a 'read -n 1', as it just reads 1 char from the next line.
# Can't do a pause, because it doesn't exist. So just run from the
# terminal so that after break we can see what's on the screen .
len=${#a1} # Can get the length of $a1, but only as a string
# q1=${line:len} # Right command, wrong variable
# q1=${line:$len} # Right command, right variable, but wrong variable type
# q1=${line:14} # Constants work, but all $home's aren't 14 characters long
done < 1.tmp
The following works:
x="/home/user/rest/of/path"
y="~${x#/home/user}"
echo $y
Will output
~/rest/of/path
If you want to use "/home/user" inside a variable, say prefix, you need to use $ after the #, i.e., ${x#$prefix}, which I think is your issue.
The hejp I got was most appreciated. I got it done, and here it is:
#!/bin/bash
len=${#HOME} # Showing how to find the length. Problem is, this is treated
# as a string, not a number. Can't find a way to make over into
# into a number.
echo $HOME "has length of" $len "characters."
while :
do
echo
read -p "Find what: " what # Intended to search PureBasic (*.pb?) source files for text matches
a0=""; > 0.tmp; > 1.tmp
grep -rHn $what $home --include="*.pb*" --exclude-dir=".cache" --exclude-dir=".gvfs" >> 0.tmp
while read line # this checks for and reads the next line
do # the closing 'done' has the file to be read appended with "<"
a1=$(echo $line | awk -F: '{print $1}') # this gets the full path before the first ':'
a2=${line#$a1":"} # renove path and first colon from rest of line
if [[ $a0 != $a1 ]]
then
echo >> 1.tmp
echo $a1":" >> 1.tmp
a0=$a1
fi
echo " "$a2 >> 1.tmp
done < 0.tmp
cat 1.tmp | less
done
What I don't have yet is an answer as to whether variables can be used in place of constants in the dollar-sign, curly brackets where you use colons to mark that you want a substring of that string returned, if it requires constants, then the only choice might be to generate a child scriot using the variables, which would appear to be constants in the child, execute there, then return the results in an environmental variable or temporary file. I did stuff like that with MSDOS a lot. Limitation here is that you have to then make the produced file executable as well using "chmod +x filename". Or call it using "/bin/bash filename".
Another bash limitation found it that you cannot use "sudo" in the script without discontinuing execution of the present script. I guess a way around that is use sudo to call /bin/bash to call a child script that you produced. I assume then that if the child completes, you return to the parent script where you stopped at. Unless you did "sudo -i", "sudo -su", or some other variation where you become super user. Then you likely need to do an "exit" to drop the super user overlay.
If you exit the child script still as super user, would typing "exit" but you back to completing the parent script? I suspect so, which makes for some interesting senarios.
Another question: If doing a "while read line", what can you do in bash to check for a keyboard key press? The "read" option is already taken while in this loop.

Resources