How to check filetype in if statement bash using wildcard and -f

subjects_list=$(ls -l /Volumes/Backup_Plus/PPMI_10 | awk '{ print $NF }')
filepath="/Volumes/Backup_Plus/PPMI_10/$subjects/*/*/S*/"
for subjects in $subjects_list; do
if [[ -f "${filepath}/*.bval" && -f "${filepath}/*.bvec" && -f "${filepath}/*.json" && -f "${filepath}/*.nii.gz" ]]; then
echo "${subjects}" >> /Volumes/Backup_Plus/PPMI_10/keep_subjects.txt
else
echo "${subjects}" >> /Volumes/Backup_Plus/PPMI_10/not_keep_subjects.txt
fi
done
The problem is supposedly in the if statement. I tried this:
bvalfile = (*.bval)
bvecfile =(*.bvec)
jsonfile =(*.json)
niigzfile =(*.nii.gz)
if [[ -f "$bvalfile" && -f "$bvecfile" && -f "$jsonfile" && -f "$niigzfile" ]]; then
However, that didn't work. Can anyone help with the syntax or errors, or does it need to be changed completely? I'm trying to separate the subjects that have these file types from those that don't by making two lists.
thanks

You're assigning filepath outside the for-subject loop but using the unset variable $subjects in it. You want to move that inside the loop.
Double-quoted wildcards aren't expanded, so both $filepath and your -f test will be looking for filenames with literal asterisks in them.
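-f only tests a single path, so even if you fix the quotes it won't do what you want: inside [[ ]] the pattern is never expanded (it is tested as a literal name), and inside [ ] an unquoted pattern that matches more than one file produces an error.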
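Both effects are easy to see in a throwaway demo. This is a minimal sketch, not part of the original answer; the temporary directory and the file names a.bval and b.bval are made up for illustration.

```bash
#!/usr/bin/env bash
# Throwaway demo (hypothetical files): quoted vs. unquoted globs.
dir=$(mktemp -d)
touch "$dir/a.bval" "$dir/b.bval"

quoted=( "$dir/*.bval" )     # quoted: no expansion, one literal element
unquoted=( "$dir"/*.bval )   # unquoted: the shell expands the pattern

echo "quoted:   ${#quoted[@]} element(s)"
echo "unquoted: ${#unquoted[@]} element(s)"

# [[ -f ... ]] never performs pathname expansion, so this tests
# for a file literally named "*.bval", which does not exist.
if [[ -f "$dir/*.bval" ]]; then
  literal=found
else
  literal=missing
fi
echo "literal -f test: $literal"

rm -r "$dir"
```

The quoted array ends up with one literal element, the unquoted one with two file names, and the -f test reports the literal name as missing.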
So I think what you want is something like this:
# note: array assignment -
# shell does the wildcard expansion, no ls required
prefix_list=( /Volumes/Backup_Plus/PPMI_10/* )
# and array expansion
for prefix in "${prefix_list[@]}"; do
    # the subject is just the last component of the path
    subject=${prefix##*/}
    # start by assuming we're keeping this one
    decision=keep
    # in case filepath pattern matches more than one directory, loop over them
    for filepath in "$prefix"/*/*/S*/; do
        # if any of the files don't exist, switch to not keeping it
        for file in "$filepath"/{*.bval,*.bvec,*.json,*.nii.gz}; do
            if [[ ! -f "$file" ]]; then
                decision=not_keep
                # we have our answer and can stop looping now
                break 2
            fi
        done
    done
    # now append to the correct list
    printf '%s\n' "$subject" >>"/Volumes/Backup_Plus/PPMI_10/${decision}_subjects.txt"
done
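A variation on the same idea uses nullglob arrays, so each file type is simply counted instead of tested with -f. This is a sketch under a stated assumption: a subject is kept if at least one file of each type exists anywhere under SUBJECT/*/*/S*/, which is looser than the loop above (that loop requires every matching S* directory to hold all four types). The helper name subject_complete and the sub1/sub2 layout are hypothetical. Looping over "$root"/*/ also keeps the two .txt output files, which the script writes into the same directory, out of the subject list.

```bash
#!/usr/bin/env bash
# Sketch (assumption): keep a subject if at least one file of each type
# exists anywhere under SUBJECT/*/*/S*/.
shopt -s nullglob   # unmatched globs expand to nothing instead of themselves

subject_complete() {   # hypothetical helper
  local prefix=$1 ext
  local -a matches
  for ext in bval bvec json nii.gz; do
    matches=( "$prefix"/*/*/S*/*."$ext" )
    (( ${#matches[@]} > 0 )) || return 1
  done
  return 0
}

# Demo on a hypothetical layout in a temporary directory.
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root"/sub1/v1/dti/S1 "$root"/sub2/v1/dti/S1
touch "$root"/sub1/v1/dti/S1/scan.{bval,bvec,json,nii.gz}
touch "$root"/sub2/v1/dti/S1/scan.bval

for prefix in "$root"/*/; do   # trailing / limits the loop to directories
  prefix=${prefix%/}
  if subject_complete "$prefix"; then
    echo "keep: ${prefix##*/}"
  else
    echo "not_keep: ${prefix##*/}"
  fi
done
```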

Related

Finding presence of substring within a string in BASH

I have a script that is trying to find the presence of a given string inside a file of arbitrary text.
I've settled on something like:
#!/bin/bash
file="myfile.txt"
for j in `cat blacklist.txt`; do
echo Searching for $j...
unset match
match=`grep -i -m1 -o "$j" $file`
if [ $match ]; then
echo "Match: $match"
fi
done
blacklist.txt contains lines of potential matches, like so:
matchthis
"match this too"
thisisasingleword
"This is multiple words"
myfile.txt could be something like:
I would matchthis if I could match things with grep. I really wish I could.
When I ask it to match this too, it fails to matchthis. It should match this too - right?
If I run this at a bash prompt, like so:
j="match this too"
grep -i -m1 -o "$j" myfile.txt
...I get "match this too".
However, when the script runs, despite the variables being set correctly (verified via echo lines), grep never matches and returns nothing.
Where am I going wrong?
Wouldn't
grep -owF -f blacklist.txt myfile.txt
instead of writing an inefficient loop, do what you want?
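One wrinkle: the question's blacklist wraps some phrases in double quotes, which grep -F would search for literally, and the question's loop also uses -i. A minimal sketch that strips the quotes on the fly and adds -i, assuming GNU or BSD grep and bash process substitution; the sample files are recreated from the question in a temporary directory.

```bash
#!/usr/bin/env bash
# Recreate the question's sample files in a temporary directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'matchthis' '"match this too"' 'thisisasingleword' '"This is multiple words"' > blacklist.txt
printf '%s\n' 'I would matchthis if I could match things with grep. I really wish I could.' \
  'When I ask it to match this too, it fails to matchthis. It should match this too - right?' > myfile.txt

# -o: print only the matched text, -w: whole words, -i: ignore case,
# -F: fixed strings, -f: read patterns from a file (here, quote-stripped by sed).
matches=$(grep -owiF -f <(sed 's/^"//; s/"$//' blacklist.txt) myfile.txt)
printf '%s\n' "$matches"
```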
Would you please try:
#!/bin/bash
file="myfile.txt"
while IFS= read -r j; do
j=${j#\"}; j=${j%\"} # remove surrounding double quotes
echo "Searching for $j..."
match=$(grep -i -m1 -o "$j" "$file")
if (( $? == 0 )); then # if match
echo "Match: $match" # then print it
fi
done < blacklist.txt
Output:
Searching for matchthis...
Match: matchthis
Searching for match this too...
Match: match this too
match this too
Searching for thisisasingleword...
Searching for This is multiple words...
I wound up abandoning grep entirely and using sed instead.
match=`sed -n "s/.*\($j\).*/\1/p" "$file"`
Works well, and I was able to use unquoted multiple word phrases in the blacklist file.
With this:
if [ $match ]; then
you are passing the unquoted, word-split contents of $match to test as separate arguments. This is not how you properly check whether a variable is empty; use test -n with the variable quoted:
if [ -n "$match" ]; then
You might also use grep's exit code instead:
if [ "$?" -eq 0 ]; then
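The two checks can also be folded together, because an assignment from a command substitution takes the exit status of the command inside it. A minimal sketch of that idiom; the search strings, the temporary file and the hits array are made up for illustration.

```bash
#!/usr/bin/env bash
file=$(mktemp)
printf 'When I ask it to match this too, it fails.\n' > "$file"

hits=()
for j in 'match this too' 'not present'; do
  # The assignment's status is grep's status: 0 on a match, 1 on none.
  if match=$(grep -i -m1 -o -- "$j" "$file"); then
    hits+=("$match")
    echo "Match: $match"
  else
    echo "No match for: $j"
  fi
done
rm -f "$file"
```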
for ... in X splits X at spaces, tabs and newlines (the characters in the default IFS), and you are expecting the script to match whole lines.
Define IFS properly:
IFS='
'
for j in `cat blacklist.txt`; do
blacklist.txt contains "match this too" with the quotes, the for loop reads it that way, and grep then searches for the quote characters literally.
j="match this too" does not cause j variable to contain quotes.
j='"match this too"' does, and then it will not match.
Since whole lines are read properly from the blacklist.txt file now, you can probably remove quotes from that file.
Script:
#!/bin/bash
file="myfile.txt"
IFS='
'
for j in `cat blacklist.txt`; do
echo Searching for $j...
unset match
match=`grep -i -m1 -o "$j" "$file"`
if [ -n "$match" ]; then
echo "Match: $match"
fi
done
Alternative to the for ... in ... loop (no IFS= needed):
while read -r; do
j="$REPLY"
...
done < 'blacklist.txt'

How can I check if a file with a name matching a "template" exists in the directory?

Given a variable named template, for example template=*.txt,
how can I check whether files with names matching this template exist in the current directory?
For example, with the template above, I want to know whether there are files with the suffix .txt in the current directory.
I would do it like this with just built-ins:
templcheck () {
for f in * .*; do
[[ -f $f ]] && [[ $f = $1 ]] && return 0
done
return 1
}
This takes the template as an argument (must be quoted to prevent premature expansion) and returns success if there was a match in the current directory. This should work for any filenames, including those with spaces and newlines.
Usage would look like this:
$ ls
file1.txt 'has space1.txt' script.bash
$ templcheck '*.txt' && echo yes
yes
$ templcheck '*.md' && echo yes || echo no
no
To use with the template contained in a variable, that expansion has to be quoted as well:
templcheck "$template"
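Bash also has a built-in that performs this check directly: compgen -G expands a glob and exits with a non-zero status when nothing matches. This is a swapped-in technique, not the answer's, and it assumes a bash build with programmable completion (the usual case). Unlike templcheck, it also counts directories as matches. The helper name has_match is hypothetical.

```bash
#!/usr/bin/env bash
# has_match PATTERN: succeed if PATTERN matches at least one path (hypothetical helper).
has_match() {
  compgen -G "$1" > /dev/null
}

cd "$(mktemp -d)" || exit 1
touch file1.txt 'has space1.txt'
template='*.txt'

has_match "$template" && echo "txt: yes" || echo "txt: no"
has_match '*.md' && echo "md: yes" || echo "md: no"
```

As with templcheck, the pattern must be quoted at the call site so the shell does not expand it early.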
Use find:
: > found.txt # Ensure the file is empty
find . ! -name . -prune -name "$template" > found.txt # entries of . only, no recursion
if [ -s found.txt ]; then
    echo "Matching files found"
else
    echo "No matching files"
fi
Strictly speaking, you can't assume that found.txt contains exactly one file name per line; a filename with an embedded newline will look the same as two separate files. But this does guarantee that an empty file means no matching files.
If you want an accurate list of matching file names, you need to disable field splitting while keeping pathname expansion.
[[ -v IFS ]] && OLD_IFS=$IFS
IFS=
shopt -s nullglob
files=( $template )
[[ -v OLD_IFS ]] && IFS=$OLD_IFS
printf "Found: %s\n" "${files[@]}"
This requires several bash extensions (the nullglob option, arrays, and the -v operator for convenience of restoring IFS). Each element of the array is exactly one match.
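Why IFS matters is clearest with a template that itself contains a space. A minimal sketch with made-up file names; it uses unset IFS (which restores default splitting) as a shortcut instead of the OLD_IFS bookkeeping above.

```bash
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1
touch 'has space1.txt' file1.txt
shopt -s nullglob
template='has space*.txt'

# Default IFS: $template is split into "has" and "space*.txt" before globbing.
split=( $template )

# Empty IFS: no field splitting, but pathname expansion still happens.
IFS=
whole=( $template )
unset IFS   # back to default splitting behaviour

printf 'split: %s\n' "${split[@]}"
printf 'whole: %s\n' "${whole[@]}"
```

With default splitting the array holds only the literal word has (it contains no glob characters, so nullglob leaves it alone, while the unmatched space*.txt is dropped); with IFS empty it holds exactly the one matching name.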

bash script not filtering

I'm hoping this is a simple question, since I've never done shell scripting before. I'm trying to filter certain files out of a list of results. While the script executes and prints out a list of files, it's not filtering out the ones I don't want. Thanks for any help you can provide!
#!/bin/bash
# Purpose: Identify all *md files in H2 repo where there is no audit date
#
#
#
# Example call: no_audits.sh
#
# If that call doesn't work, try ./no_audits.sh
#
# NOTE: Script assumes you are executing from within the scripts directory of
# your local H2 git repo.
#
# Process:
# 1) Go to H2 repo content directory (assumption is you are in the scripts dir)
# 2) Use for loop to go through all *md files in each content sub dir
# and list all file names and directories where audit date is null
#
#set counter
count=0
# Go to content directory and loop through all 'md' files in sub dirs
cd ../content
FILES=`find . -type f -name '*md' -print`
for f in $FILES
do
if [[ $f == "*all*" ]] || [[ $f == "*index*" ]] ;
then
# code to skip
echo " Skipping file: " $f
continue
else
# find audit_date in file metadata
adate=`grep audit_date $f`
# separate actual dates from rest of the grepped line
aadate=`echo $adate | awk -F\' '{print $2}'`
# if create date is null - proceed
if [[ -z "$aadate" ]] ;
then
# print a list of all files without audit dates
echo "Audit date: " $aadate " " $f;
count=$((count+1));
fi
fi
done
echo $count " files without audit dates "
First, to address the immediate issue:
[[ $f == "*all*" ]]
is only true if the exact contents of f is the string *all* -- with the wildcards as literal characters. If you want to check for a substring, then the asterisks shouldn't be quoted:
[[ $f = *all* ]]
...is a better-practice solution. (Note the use of = rather than == -- this isn't essential, but is a good habit to be in, as the POSIX test command is only specified to permit = as a string comparison operator; if one writes [ "$f" == foo ] by habit, one can get unexpected failures on platforms with a strictly compliant /bin/sh).
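A small demo of the difference, using two hypothetical helpers and made-up paths:

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the demo.
contains_all() {
  [[ $1 = *all* ]]        # unquoted right-hand side: treated as a pattern
}
quoted_contains_all() {
  [[ $1 == "*all*" ]]     # quoted right-hand side: literal string comparison
}

for f in ./content/install.md ./content/intro.md; do
  contains_all "$f" && a=yes || a=no
  quoted_contains_all "$f" && q=yes || q=no
  echo "$f  unquoted:$a  quoted:$q"
done
```

If only part of the pattern should be literal, quote just that part, as in [[ $f = *"all"* ]].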
That said, a ground-up implementation of this script intended to follow best practices might look more like the following:
#!/usr/bin/env bash
count=0
while IFS= read -r -d '' filename; do
aadate=$(awk -F"'" '/audit_date/ { print $2; exit; }' <"$filename")
if [[ -z $aadate ]]; then
(( ++count ))
printf 'File %q has no audit date\n' "$filename"
else
printf 'File %q has audit date %s\n' "$filename" "$aadate"
fi
done < <(find . -not '(' -name '*all*' -o -name '*index*' ')' -type f -name '*md' -print0)
echo "Found $count files without audit dates" >&2
Note:
An arbitrary list of filenames cannot be stored in a single bash string (because all characters that might otherwise be used to determine where the first name ends and the next name begins could be present in the name itself). Instead, read one NUL-delimited filename at a time -- emitted with find -print0, read with IFS= read -r -d ''; this is discussed in [BashFAQ #1].
Filtering out unwanted names can be done internal to find.
There's no need to preprocess input to awk using grep, as awk is capable of searching through input files itself.
< <(...) is used to avoid the behavior in BashFAQ #24, wherein content piped to a while loop causes variables set or modified within that loop to become unavailable after its exit.
printf '...%q...\n' "$name" is safer than echo "...$name..." when handling unknown filenames, as printf will emit printable content that accurately represents those names even if they contain unprintable characters or characters which, when emitted directly to a terminal, act to modify that terminal's configuration.
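The BashFAQ #24 point can be checked in isolation. A minimal sketch, assuming a non-interactive bash without the lastpipe option (the default):

```bash
#!/usr/bin/env bash
# Piped loop: without lastpipe, the while loop runs in a subshell.
count=0
printf '%s\n' a b c | while read -r line; do (( ++count )); done
piped=$count

# Process substitution: the loop runs in the current shell.
count=0
while read -r line; do (( ++count )); done < <(printf '%s\n' a b c)
substituted=$count

echo "piped: $piped, process substitution: $substituted"
```

The piped loop's increments are lost when its subshell exits, so piped stays 0 while substituted reaches 3.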
Never mind, I found the answer here:
bash script to check file name begins with expected string
I tried various versions of the wildcard/filename and ended up with:
if [[ "$f" == *all.md ]] || [[ "$f" == *index.md ]] ;
The link above said not to put those in quotes, and removing the quotes did the trick!

Why doesn't counting files with "for file in $0/*; let i=$i+1; done" work?

I'm new to shell scripting and have the following script, which I created based on a simpler one. I want to pass it an argument with the path of the directory whose files should be counted. I can't find the logical mistake that keeps it from working: the output is always "1".
#!/bin/bash
i=0
for file in $0/*
do
let i=$i+1
done
echo $i
To execute the code I use:
sh scriptname.sh /path/to/folder/to/count/files
$0 is the name with which your script was invoked (roughly, subject to several exceptions that aren't pertinent here). The first argument is $1, and so it's $1 that you want to use in your glob expression.
#!/bin/bash
i=0
for file in "$1"/*; do
i=$(( i + 1 )) ## $(( )) is POSIX-compliant arithmetic syntax; let is not POSIX.
done
echo "$i"
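To see why the question's script always prints 1, here is a sketch that runs both loops through bash -c, where the first word after the command string plays the role of $0 and the next one $1. The temporary directory and its three files are made up for the demo.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
touch "$dir"/{a,b,c}
cd "$dir" || exit 1   # make sure no "scriptname.sh" directory is in the way

# With bash -c, the first word after the command string becomes $0, the next $1.
names=$(bash -c 'printf "0=%s 1=%s\n" "$0" "$1"' scriptname.sh "$dir")

# Counting with "$0"/*: the glob matches nothing and stays literal, so i ends at 1.
with_0=$(bash -c 'i=0; for f in "$0"/*; do i=$((i+1)); done; echo "$i"' scriptname.sh "$dir")
# Counting with "$1"/*: the glob expands to the three files.
with_1=$(bash -c 'i=0; for f in "$1"/*; do i=$((i+1)); done; echo "$i"' scriptname.sh "$dir")

echo "$names"
echo "with \$0: $with_0, with \$1: $with_1"
```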
That said, you can get this number more directly:
#!/bin/bash
shopt -s nullglob # allow globs to expand to an empty list
files=( "$1"/* ) # put list of files into an array
echo "${#files[@]}" # count the number of items in the array
...or even:
#!/bin/sh
set -- "$1"/* # override $# with the list of files matching the glob
if [ -e "$1" ] || [ -L "$1" ]; then # if $1 exists, then it had matches
echo "$#" # ...so emit their number.
else
echo 0 # otherwise, our result is 0.
fi
If you want a rough count of the files in a directory, you can run something like this (it skips hidden files and counts a name containing a newline more than once):
ls /path/to/folder/to/count/files | wc -l
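If hidden entries and unusual names should be counted exactly, a glob-based count avoids parsing ls. A minimal sketch; the helper name count_entries is hypothetical, and like ls it counts directories as well as files.

```bash
#!/usr/bin/env bash
# Count directory entries, including hidden ones, without parsing ls.
# The function body is a subshell ( ... ), so the shopt changes do not leak out.
count_entries() (
  shopt -s nullglob dotglob   # empty dir -> 0; include names starting with "."
  entries=( "$1"/* )          # "." and ".." are never matched by *
  echo "${#entries[@]}"
)

dir=$(mktemp -d)
touch "$dir/a" "$dir/.hidden" "$dir/$(printf 'new\nline')"
count_entries "$dir"
```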

Recursively list hidden files without ls, find or extendedglob

As an exercise I have set myself the task of recursively listing files using bash builtins. I particularly don't want to use ls or find and I would prefer not to use setopt extendedglob. The following appears to work but I cannot see how to extend it with /.* to list hidden files. Is there a simple workaround?
g() { for k in "$1"/*; do # loop through directory
[[ -f "$k" ]] && { echo "$k"; continue; }; # echo file path
[[ -d "$k" ]] && { [[ -L "$k" ]] && { echo "$k"; continue; }; # echo symlinks but don't follow
g "$k"; }; # start over with new directory
done; }; g "/Users/neville/Desktop" # original directory
Added later: sorry - I should have said: 'bash-3.2 on OS X'
Change
for k in "$1"/*; do
to
for k in "$1"/* "$1"/.[^.]* "$1"/..?*; do
The second glob matches all files whose names start with a dot followed by anything other than a dot, while the third matches all files whose names start with two dots followed by at least one more character. Between the two of them, they match all hidden files other than the . and .. entries.
Unfortunately, unless the shell option nullglob is set, those globs (like the first one) remain as-is if no file names match (extremely likely in the case of the third one), so it is necessary to verify that the name actually refers to an existing file.
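With nullglob enabled (here only inside a subshell, so it does not affect the rest of the script), unmatched globs simply disappear and the existence check is no longer needed. A minimal sketch; the helper name list_all and the file names are hypothetical.

```bash
#!/usr/bin/env bash
# List every entry of a directory, hidden ones included, using the three globs.
list_all() (
  shopt -s nullglob   # unmatched globs expand to nothing
  for k in "$1"/* "$1"/.[^.]* "$1"/..?*; do
    printf '%s\n' "${k##*/}"
  done
)

dir=$(mktemp -d)
touch "$dir/c" "$dir/.a" "$dir/..b"
list_all "$dir"
```

Each glob contributes one name here: c from the first, .a from the second and ..b from the third, while . and .. never appear.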
An alternative would be to use the much simpler glob "$1"/.*, which will always match the . and .. directory entries, and will consequently always be substituted. In that case, it's necessary to remove the two entries from the list:
for k in "$1"/* "$1"/.*; do
if ! [[ $k =~ /\.\.?$ ]]; then
# ...
fi
done
(It is still possible for "$1"/* to remain in the list, though. So that doesn't help as much as it might.)
Set the GLOBIGNORE variable to exclude . and ..; setting it to a non-null value implicitly turns on the dotglob option (as if by shopt -s dotglob). Then your original code works with no other changes.
user@host [/home/user/dir]
$ touch file
user@host [/home/user/dir]
$ touch .dotfile
user@host [/home/user/dir]
$ echo *
file
user@host [/home/user/dir]
$ GLOBIGNORE=".:.."
user@host [/home/user/dir]
$ echo *
.dotfile file
Note that this is bash-specific. In particular, it does not work in ksh.
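The same session as a self-contained script, with GLOBIGNORE confined to a subshell so that it (and the dotglob it enables) does not leak into the rest of the script. The helper name with_globignore and the two files are made up for the demo.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
touch "$dir/file" "$dir/.dotfile"

# Subshell body keeps GLOBIGNORE and dotglob local to this function.
with_globignore() (
  cd "$1" || exit 1
  GLOBIGNORE=.:..   # any non-null value also turns on dotglob
  printf '%s\n' *
)

before=$(cd "$dir" && printf '%s\n' *)
after=$(with_globignore "$dir")
echo "before: $before"
echo "after:"
echo "$after"
```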
You can specify multiple arguments to for:
for k in "$1"/* "$1"/.*; do
But if you do search for .* in directories, you should be aware that it also gives you the . and .. entries. You may also be given a nonexistent file name if the "$1"/* glob matches nothing, so I would check for that too.
With that in mind, this is how I would correct the loop:
g() {
local k subdir
for k in "$1"/* "$1"/.*; do # loop through directory
[[ -e "$k" ]] || continue # Skip missing files (unmatched globs)
subdir=${k##*/}
[[ "$subdir" = . ]] || [[ "$subdir" = .. ]] && continue # Skip the pseudo-directories "." and ".."
if [[ -f "$k" ]] || [[ -L "$k" ]]; then
printf %s\\n "$k" # Echo the paths of files and symlinks
elif [[ -d "$k" ]]; then
g "$k" # start over with new directory
fi
done
}
g ~neville/Desktop
Here the funky-looking ${k##*/} is just a fast way to take the basename of the file, while local was put in so that the variables don't modify any existing variables in the shell.
One more thing I've changed is echo "$k" to printf %s\\n "$k", because echo is irredeemably flawed in its argument handling and should be avoided for the purpose of echoing an unknown variable. (See Rich's sh tricks for an explanation of how; it boils down to -n and -e throwing a spanner in the works.)
By the way, this will NOT print sockets or fifos - is that intentional?
