Adding test_ in front of a file name with path - bash

I have a list of files stored in a text file, and if a Python file is found in that list, I want to run the corresponding test file using pytest.
My file looks like this:
/folder1/file1.txt
/folder1/file2.jpg
/folder1/file3.md
/folder1/file4.py
/folder1/folder2/file5.py
When the 4th and 5th files are found, I want to run pytest like this:
pytest /folder1/test_file4.py
pytest /folder1/folder2/test_file5.py
Currently, I am using this command:
cat /workspace/filelist.txt | while read line; do if [[ $$line == *.py ]]; then exec "pytest test_$${line}"; fi; done;
which is not working correctly, as the entries in the text file include the file path as well. Any idea how to implement this?

Use Bash's parameter expansion (substring removal) to add the test_ prefix. As a one-liner:
$ while read line; do if [[ $line == *.py ]]; then echo "pytest ${line%/*}/test_${line##*/}"; fi; done < file
In more readable form:
while read line
do
    if [[ $line == *.py ]]
    then
        echo "pytest ${line%/*}/test_${line##*/}"
    fi
done < file
Output:
pytest /folder1/test_file4.py
pytest /folder1/folder2/test_file5.py
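To see what the two parameter expansions do, take one of your paths (just a quick demonstration):
line=/folder1/folder2/file5.py
echo "${line%/*}"     # /folder1/folder2   (everything up to the last /)
echo "${line##*/}"    # file5.py           (everything after the last /)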
I don't know anything about Google Cloud Build, so I'll let you experiment with the double dollar signs.
Update:
In case some files already have the test_ prefix, use this Bash script, which enables extglob so the substring removal can optionally strip an existing test_:
shopt -s extglob # notice
while read line
do
    if [[ $line == *.py ]]
    then
        echo "pytest ${line%/*}/test_${line##*/?(test_)}" # notice
    fi
done < file
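For example, a line that already carries the prefix ends up with a single test_ (a quick check of the ?(test_) part):
shopt -s extglob
line=/folder1/test_file4.py
echo "test_${line##*/?(test_)}"   # prints: test_file4.py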

You can easily refactor all your conditions into a simple sed script. This also gets rid of the useless cat and the similarly useless exec.
sed -n 's%[^/]*\.py$%test_&%p' /workspace/filelist.txt |
xargs -n 1 pytest
The regular expression matches anything after the last slash, which means the entire line if there is no slash; we include the .py suffix to make sure this only matches those files.
The pipe to xargs is a common way to convert standard input into command-line arguments. The -n 1 says to pass one argument at a time, rather than as many as possible. (Maybe pytest allows you to specify many tests; then, you can take out the -n 1 and let xargs pass in as many as it can fit.)
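For example, if your pytest accepts several test paths in one invocation, the pipeline could be shortened to something like this (a sketch; adjust to however you normally invoke pytest):
sed -n 's%[^/]*\.py$%test_&%p' /workspace/filelist.txt | xargs pytest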
If you want to avoid adding the test_ prefix to files which already have it, one solution is to break up the sed script into two separate actions:
sed -n '/test_[^/]*\.py/p;t;s%[^/]*\.py$%test_&%p' /workspace/filelist.txt |
xargs -n 1 pytest
The first p simply prints the matches verbatim; the t says if that matched, skip the rest of the script for this input.
(MacOS / BSD sed will want a newline instead of a semicolon after the t command.)
sed is arguably a bit of a read-only language; this is already pressing towards the boundary where perhaps you would rewrite this in Awk instead.
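For comparison, a rough Awk equivalent might look like this (a sketch; it treats the last /-separated field as the file name and skips names that already start with test_):
awk -F/ '/\.py$/ { if ($NF !~ /^test_/) $NF = "test_" $NF; print }' OFS=/ /workspace/filelist.txt |
xargs -n 1 pytest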

You may want to focus on lines that end with the ".py" string.
You can achieve that using grep with a regex that checks whether a line ends with .py, which eliminates the if statement.
IFS=$'\n'
for file in $(cat /workspace/filelist.txt | grep '\.py$'); do pytest "$file"; done
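A variant that avoids changing IFS and copes with spaces in paths would be to feed grep's output into a read loop instead (just a sketch):
grep '\.py$' /workspace/filelist.txt | while IFS= read -r file; do pytest "$file"; done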

Related

Bash script MV is disappearing files

I've written a script to go through all the files in the directory the script is located in, identify if a file name contains a certain string and then modify the filename. When I run this script, the files that are supposed to be modified are disappearing. It appears my usage of the mv command is incorrect and the files are likely going to an unknown directory.
#!/bin/bash
string_contains="dummy_axial_y_position"
string_dontwant="dummy_axial_y_position_time"
file_extension=".csv"
for FILE in *
do
if [[ "$FILE" == *"$string_contains"* ]];then
if [[ "$FILE" != *"$string_dontwant"* ]];then
filename= echo $FILE | head -c 15
combined_name="$filename$file_extension"
echo $combined_name
mv $FILE $combined_name
echo $FILE
fi
fi
done
I've done my best to go through the possible errors I've made in the mv command, but I haven't had any success so far.
There are a couple of problems and several places where your script can be improved.
filename= echo $FILE | head -c 15
This pipeline runs echo $FILE with the variable filename set to the empty string in its environment. That value is visible only to the echo command; the variable is not set in the current shell, and echo does not care about it anyway.
You probably want to capture the output of echo $FILE | head -c 15 into the variable filename, but this is not the way to do it.
You need to use command substitution for this purpose:
filename=$(echo $FILE | head -c 15)
head -c 15 outputs only the first 15 bytes of the input (they could span multiple lines, but that does not happen here). head is not the most appropriate tool for this; use cut -c-15 instead.
But for what you need (extract the first 15 characters of the value stored in the variable $FILE), there is a much simpler way; use a form of parameter expansion called "substring expansion":
filename=${FILE:0:15}
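For example (a quick demonstration with a name similar to yours):
FILE="dummy_axial_y_position_1.csv"
filename=${FILE:0:15}
echo "$filename"    # prints: dummy_axial_y_p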
mv $FILE $combined_name
Before running mv, the variables $FILE and $combined_name are expanded (this is called "parameter expansion"). This means that the variables are replaced by their values.
For example, if the value of FILE is abc def and the value of combined_name is mnp opq, the line above becomes:
mv abc def mnp opq
The mv command receives 4 arguments and it attempts to move the files denoted by the first three arguments into the directory denoted by the fourth argument (and it probably fails).
In order to keep the values of the variables as single words (if they contain spaces), always enclose them in double quotes. The correct command is:
mv "$FILE" "$combined_name"
This way, in the example above, the command becomes:
mv "abc def" "mnp opq"
... and mv is invoked with two arguments: abc def and mnp opq.
combined_name="$filename$file_extension"
There isn't any problem with this line; the quotes are simply not needed.
The variables filename and file_extension are expanded (replaced by their values), but word splitting is not applied to assignments. The value that results after the replacement is assigned to the variable combined_name, even if it contains spaces or other word-separator characters (spaces, tabs, newlines).
The quotes are also not needed here, because the values do not contain spaces or other characters that are special on the command line. They must be quoted if they contain such characters.
string_contains="dummy_axial_y_position"
string_dontwant="dummy_axial_y_position_time"
file_extension=".csv"
It is not incorrect to quote the values, though.
for FILE in *
do
if [[ "$FILE" == *"$string_contains"* ]];then
if [[ "$FILE" != *"$string_dontwant"* ]]; then
This is also not wrong but it is inefficient.
You can use the expression from the outer if condition directly in the for statement (and get rid of that if statement):
for FILE in *"$string_contains"*; do
if [[ "$FILE" != *"$string_dontwant"* ]]; then
...
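Putting those fixes together, the whole script might look like this (a sketch that keeps your original 15-character truncation and adds the quoting discussed above):
#!/bin/bash
string_contains="dummy_axial_y_position"
string_dontwant="dummy_axial_y_position_time"
file_extension=".csv"
for FILE in *"$string_contains"*; do
    if [[ "$FILE" != *"$string_dontwant"* ]]; then
        filename=${FILE:0:15}                  # first 15 characters of the name
        combined_name="$filename$file_extension"
        echo "$combined_name"
        mv -- "$FILE" "$combined_name"
    fi
done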
If you have read and understood the above (and some of the linked documentation), you will be able to figure out for yourself where your files were moved :-)

Shell script Issues and Errors when tested in school's program

Files created in 'testdir':
file1 file2.old file3old file4.old
Execution of 'oldfiles2 testdir':
Files in 'testdir' after 'oldfiles2' was run:
file1.old file2.old file3old.old file4.old
Error: 'for' does not seem to loop only through required filenames
Please hit to continue with the Assignment
That is the error I am hitting with a script I'm running for school.
Here is the script:
#!/bin/bash
shopt -s extglob nullglob
dir=$1
for file in "$dir"/!(*.old)
do
    [[ $file == *.old ]] || mv -- "$file" "$file.old"
done
The assignment was written by someone who doesn't know bash well. Your approach is way better.
Instead of grepping ls, you can use extglob (and also nullglob in case there are no matches):
#!/bin/bash
shopt -s extglob nullglob
dir=$1
for file in "$dir"/!(*.old)
do
    mv -- "$file" "$file.old"
done
As the file listing in your validator's output demonstrates, it works perfectly:
file1 does not end in .old, and so it's renamed to file1.old
file2.old ends in .old, and is not renamed.
file3old does not end in .old (old != .old), and is renamed.
file4.old ends in .old, and is not renamed.
However, the validator refuses to accept it, which indicates that the validator itself is wrong. A common mistake for people who don't know bash well (like your professor) is to use grep -v .old or grep -v '.old$', which doesn't actually check whether files end in .old, because an unescaped . means "any character".
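You can see the problem with a quick test (just a demonstration):
printf '%s\n' file3old file4.old | grep -v '.old$'
This prints nothing, because the unescaped . also matches the 3 in file3old, so both names are filtered out.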
We can emulate this bug in the script:
#!/bin/bash
shopt -s extglob nullglob
dir=$1
for file in "$dir"/!(*?old*)
do
    mv -- "$file" "$file.old"
done
This code is objectively wrong, but may pass the incorrect validator. Alternatively, "$dir"/!(*?old) will emulate a buggy grep anchored to the end of the line.
If I read correctly what your teacher wants, here is a one-liner using grep -v and no if statement. You can write it out as a block in the script or leave it as a one-liner.
ls | grep -v '\.old' | while read FILE; do mv "${FILE}" "${FILE}.old"; done
BTW, I've tested this and it works: the "." in '\.old' is a literal dot (or period) rather than "any character", because it's escaped with a backslash.
Here is sample output from Terminal
System1:test 123$ ls -1
file name 1
file name 2
file name.old
file.old
file1
file2
System1:test 123$ ls | grep -v '\.old' | while read FILE; do mv "${FILE}" "${FILE}.old"; done
System1:test 123$ ls -1
file name 1.old
file name 2.old
file name.old
file.old
file1.old
file2.old
System1:test 123$
Try:
#!/bin/bash
for filename in $(ls $1 | grep -v "\.old$")
do
    mv $1/$filename $1/$filename.old
done
In Bash you can use character classes beginning with the inversion character ^ or ! to match all characters except the listed character. In your case:
for file in "$dir"/*.[^o][^l][^d]*; do
[ "$file" = *.old ] || mv -- "$file" "$file.old"
done
That will locate all files in $dir that do NOT have an .old extension and move each file to $file.old. For a case-insensitive version:
for file in "$dir"/*.[^oO][^lL][^dD]*; do
You can use the bash [[ operator for the [[ "$file" == *.old ]] test as well, but it is less portable in practice. (Character classes are also not portable.) Unless a filename potentially starts with -, there isn't any reason to include -- after mv (but it doesn't hurt either).

How can I grep contents of files with bash only without using find or grep -r?

I have an assignment to write a bash program which if I type in the following:
-bash-4.1$ ./sample.sh path regex keyword
that will result something like that:
path/sample.txt:12
path/sample.txt:34
path/dir/sample1.txt:56
path/dir/sample2.txt:78
The numbers are the line numbers of the search results. I have absolutely no idea how I can achieve this in bash without using find or grep -r. I am allowed to use grep, sed, awk, …
Break the problem into parts.
First, you need to obtain the file names to search in. How can you list the files in a directory and its subdirectories? (Hint: there's a glob pattern for that.)
You need to iterate over the files. What form of loop should this be?
For each file, you need to read each line from the file in turn. There's a builtin for that.
For each line, you need to test whether the line matches the specified regexp. There's a construct for that.
You need to maintain a counter of the number of lines read in a file to be able to print the line number.
Search for globstar in the bash manual.
See https://unix.stackexchange.com/questions/18886/why-is-while-ifs-read-used-so-often-instead-of-ifs-while-read/18936#18936 regarding while read loops.
shopt -s globstar # to enable **/
GLOBIGNORE=.:..   # to match dot files
dir=$1; regex=$2
for file in "$dir"/**/*; do
    [[ -f $file ]] || continue
    n=1
    while IFS= read -r line; do
        if [[ $line =~ $regex ]]; then
            echo "$file:$n"
        fi
        ((++n))
    done <"$file"
done
It's possible that your teacher didn't intend you to use the globstar feature, which is a relatively recent addition to bash (appeared in version 4.0). If so, you'll need to write a recursive function to recurse into subdirectories.
traverse_directory () {
    for x in "$1"/*; do
        if [ -d "$x" ]; then
            traverse_directory "$x"
        elif [ -f "$x" ]; then
            grep "$regexp" "$x"
        fi
    done
}
Putting this into practice:
#!/bin/sh
regexp="$2"
traverse_directory "$1"
Follow-up exercise: the glob pattern * omits files whose name begins with a . (dot files). You can match dot files as well by looping over .* in addition, i.e. for x in .* *; do …. However, this throws the function into an infinite loop, as it recurses forever into . (and also ..). How can you change the function to work with dot files as well?
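If you want to try it, one possible approach (a sketch) is to loop over both .* and *, and skip the special entries . and .. explicitly:
traverse_directory () {
    for x in "$1"/.* "$1"/*; do
        case ${x##*/} in .|..) continue;; esac   # avoid recursing into . and ..
        if [ -d "$x" ]; then
            traverse_directory "$x"
        elif [ -f "$x" ]; then
            grep "$regexp" "$x"
        fi
    done
}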
while read -r
do
    [[ $REPLY =~ foo ]] && echo "$REPLY"
done < file.txt

Basename puts single quotes around variable

I am writing a simple shell script to make automated backups, and I am trying to use basename to create a list of directories and then parse this list to get the first and the last directory from it.
The problem is: when I use basename in the terminal, all goes fine and it gives me the list exactly as I want it. For example:
basename -a /var/*/
gives me a list of all the directories inside /var without the / in the end of the name, one per line.
BUT, when I use it inside a script and pass a variable to basename, it puts single quotes around the variable:
while read line; do
    dir_name=$(echo $line)
    basename -a $dir_name/*/ > dir_list.tmp
done < file_with_list.txt
When running with -x:
+ basename -a '/Volumes/OUTROS/backup/test/*/'
and, therefore, the result is not what I need.
Now, I know there must be a thousand ways to go around the basename problem, but then I'd learn nothing, right? ;)
How to get rid of the single quotes?
And if my directory name has spaces in it?
If your directory name could include spaces, you need to quote the value of dir_name (which is a good idea for any variable expansion, whether you expect spaces or not).
while read line; do
    dir_name=$line
    basename -a "$dir_name"/*/ > dir_list.tmp
done < file_with_list.txt
(As jordanm points out, you don't need to quote the RHS of a variable assignment.)
Assuming your goal is to populate dir_list.tmp with a list of directories found under each directory listed in file_with_list.txt, this might do.
#!/bin/bash
inputfile=file_with_list.txt
outputfile=dir_list.tmp
rm -f "$outputfile" # the -f makes rm fail silently if file does not exist
while read line; do
    # basic syntax checking
    if [[ ! ${line} =~ ^/[a-z][a-z0-9/-]*$ ]]; then
        continue
    fi
    # collect targets using globbing
    for target in "$line"/*; do
        if [[ -d "$target" ]]; then
            printf "%s\n" "$target" >> "$outputfile"
        fi
    done
done < "$inputfile"
As you develop whatever tool will process your dir_list.tmp file, be careful of special characters (including spaces) in that file.
Note that I'm using printf instead of echo so that targets whose first character is a hyphen won't cause errors.
This might work:
while read; do
    find "$REPLY" >> dir_list.tmp
done < file_with_list.txt

In a small script to monitor a folder for new files, the script seems to be finding the wrong files

I'm using this script to monitor the downloads folder for new .bin files being created. However, it doesn't seem to be working. If I remove the grep, I can make it copy any file created in the Downloads folder, but with the grep it's not working. I suspect the problem is how I'm trying to compare the two values, but I'm really not sure what to do.
#!/bin/sh
downloadDir="$HOME/Downloads/"
mbedDir="/media/mbed"
inotifywait -m --format %f -e create $downloadDir -q | \
while read line; do
    if [ $(ls $downloadDir -a1 | grep '[^.].*bin' | head -1) == $line ]; then
        cp "$downloadDir/$line" "$mbedDir/$line"
    fi
done
The ls $downloadDir -a1 | grep '[^.].*bin' | head -1 is the wrong way to go about this. To see why, suppose you had files named a.txt and b.bin in the download directory, and then c.bin was added. inotifywait would print c.bin, ls would print a.txt\nb.bin\nc.bin (with actual newlines, not \n), grep would thin that to b.bin\nc.bin, head would remove all but the first line leaving b.bin, which would not match c.bin. You need to be checking $line to see if it ends in .bin, not scanning a directory listing. I'll give you three ways to do this:
First option, use grep to check $line, not the listing:
if echo "$line" | grep -q '[.]bin$'; then
Note that I'm using the -q option to suppress grep's output, and instead simply letting the if command check its exit status (success if it found a match, failure if not). Also, the RE is anchored to the end of the line, and the period is in brackets so it'll only match an actual period (normally, . in a regular expression matches any single character). \.bin$ would also work here.
Second option, use the shell's ability to edit variable contents to see if $line ends in .bin:
if [ "${line%.bin}" != "$line" ]; then
the "${line%.bin}" part gives the value of $line with .bin trimmed from the end if it's there. If that's not the same as $line itself, then $line must've ended with .bin.
Third option, use bash's [[ ]] expression to do pattern matching directly:
if [[ "$line" == *.bin ]]; then
This is (IMHO) the simplest and clearest of the bunch, but it only works in bash (i.e. you must start the script with #!/bin/bash).
Other notes: to avoid some possible issues with whitespace and backslashes in filenames, use while IFS= read -r line; do and follow @shellter's recommendation about double-quotes religiously.
Also, I'm not very familiar with inotifywait, but AIUI its -e create option will notify you when the file is created, not when its contents are fully written out. Depending on the timing, you may wind up copying partially-written files.
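If partly-written files turn out to be a problem, one commonly used workaround (a sketch, assuming your inotifywait supports the close_write event) is to trigger on close_write instead of create:
inotifywait -m --format %f -e close_write -q "$downloadDir" | \
while IFS= read -r line; do
    if [[ "$line" == *.bin ]]; then
        cp "$downloadDir/$line" "$mbedDir/$line"
    fi
done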
Finally, you don't have any checking for duplicate filenames. What should happen if you download a file named foo.bin, it gets copied, you delete the original, and then download a different file also named foo.bin? As the script is now, it'll silently overwrite the first foo.bin. If this isn't what you want, you should add something like:
if [ ! -e "$mbedDir/$line" ]; then
    cp "$downloadDir/$line" "$mbedDir/$line"
elif ! cmp -s "$downloadDir/$line" "$mbedDir/$line"; then
    echo "Eeek, a duplicate filename!" >&2
    # or possibly something more constructive than that...
fi
