Get directory when last folder in path ends in given string (sed in if-else) - bash

I am attempting to find files with the .py extension, grep to see if any of them contain the string nn, and then return only the unique directory names. Afterwards, if the last folder of the path ends in nn, I want to select it.
For example:
find `pwd` -iname '*.py' | xargs grep -l 'nn' | xargs dirname | sort -u | while read files; do if [[ sed 's|[\/](.*)*[\/]||g' == 'nn' ]]; then echo $files; fi; done
However, I cannot use sed inside an if-else expression. How can I use it for this case?

[[ ]] is not bracket syntax for an if statement like in other languages such as C or Java. It's a special command for evaluating a conditional expression. Depending on your intentions you need to either exclude it or use it correctly.
If you're trying to test a command for success or failure just call the command:
if command ; then
    :
fi
If you want to test whether the output of the command is equal to some value, you need to use a command substitution:
if [[ $( command ) = some_value ]] ; then
    :
fi
In your case though, a simple parameter expansion will be easier:
# if $files does not contain a trailing slash
if [[ "${files: -2}" = "nn" ]] ; then
    echo "${files}"
fi
# if $files does contain a trailing slash
if [[ "${files: -3}" = "nn/" ]] ; then
    echo "${files%/}"
fi
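Putting the pieces together, here is a minimal sketch of the original pipeline with the parameter-expansion test in place of sed (assuming, as in the question, pathnames without whitespace; dirname emits no trailing slash):
find "$PWD" -iname '*.py' | xargs grep -l 'nn' | xargs dirname | sort -u |
while read -r dir; do
    # compare the last two characters of the directory path
    if [[ "${dir: -2}" == "nn" ]]; then
        echo "$dir"
    fi
done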

The shell loop and the [[ test are superfluous here, since you use sed anyway. The task can be accomplished with:
find "$PWD" -type f -name '*.py' -exec grep -l 'nn' {} + |
sed -n 's%\(.*nn\)/[^/]*$%\1%p' | sort -u
assuming pathnames don't contain a newline character.
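For a quick illustration with hypothetical paths, the sed expression strips the filename and prints the remaining directory only when it ends in nn:
printf '%s\n' /src/proj_nn/util.py /src/other/util.py |
sed -n 's%\(.*nn\)/[^/]*$%\1%p'
# prints only: /src/proj_nn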

Related

How can I sort an array based on a non-integer substring in Bash?

I wrote a cleanup script to delete certain files. The files are stored in subfolders, and I use find to collect them into an array, so the search is recursive. An array entry can look like this:
(path to file)
./2021_11_08_17_28_45_1733556/2021_11_12_04_15_51_1733556_0.jfr
As you can see, the filenames are timestamps. find sorts by the folder name only (./2021_11_08_17_28_45_1733556), but I need to sort all the files, which can sit in different folders, by the timestamp of the files alone (the folders can be ignored completely), so that I can delete the oldest files first. Below is my script in its not-yet-working state; I need to add some sorting to fix my problem.
Any ideas?
#!/bin/bash
# handle -h (help)
if [[ "$1" == "-h" || "$1" == "" ]]; then
echo -e '-p [Pfad zum Zielordner] \n-f [Anzahl der Files welche noch im Ordner vorhanden sein sollen] \n-d [false um dryRun zu deaktivieren]'
exit 0
fi
# handle parameters
while getopts p:f:d: flag
do
    case "${flag}" in
        p) pathToFolder=${OPTARG};;
        f) maxFiles=${OPTARG};;
        d) dryRun=${OPTARG};;
        *) echo -e '-p [Pfad zum Zielordner] \n-f [Anzahl der Files welche noch im Ordner vorhanden sein sollen] \n-d [false um dryRun zu deaktivieren]'
    esac
done
if [[ -z $dryRun ]]; then
    dryRun=true
fi
# fill the array with the found .jfr files; it should end up sorted so that the oldest files get deleted first
fillarray() {
    files=($(find -name "*.jfr" -type f))
    totalFiles=${#files[@]}
}
# Return size of file
getfilesize() {
    filesize=$(du -k "$1" | cut -f1)
}
count=0
checkfiles() {
    # Check whether the file count still exceeds the maxFiles parameter
    if [[ ${#files[@]} -gt $maxFiles ]]; then
        # Check if dryRun is enabled
        if [[ $dryRun == "false" ]]; then
            echo "msg=\"Removal result\", result=true, file=$(realpath $1) filesize=$(getfilesize $1), reason=\"outside max file boundary\""
            ((count++))
            rm $1
        else
            ((count++))
            echo msg="\"Removal result\", result=true, file=$(realpath $1) filesize=$(getfilesize $1), reason=\"outside max file boundary\""
        fi
        # Remove the file from the files array
        files=(${files[@]/$1})
    else
        echo msg="\"Removal result\", result=false, file=$(realpath $1), reason=\"within max file boundary\""
    fi
}
# Scan for empty files
scanfornullfiles() {
for file in "${files[#]}"
do
filesize=$(! getfilesize $file)
if [[ $filesize == 0 ]]; then
files=(${files[#]/$file})
echo msg="\"Removal result\", result=false, file=$(realpath $file), reason=\"empty file\""
fi
done
}
echo msg="jfrcleanup.sh started", maxFiles=$maxFiles, dryRun=$dryRun, directory=$pathToFolder
{
    cd $pathToFolder > /dev/null 2>&1
} || {
    echo msg="no permission in directory"
    echo msg="jfrcleanup.sh stopped"
    exit 0
}
fillarray #> /dev/null 2>&1
scanfornullfiles
for file in "${files[#]}"
do
checkfiles $file
done
echo msg="\"jfrcleanup.sh finished\", totalFileCount=$totalFiles filesRemoved=$count"
Assuming the file paths do not contain newline characters, would you please try the following Schwartzian transform method:
#!/bin/bash
pat="/([0-9]{4}(_[0-9]{2}){5})[^/]*\.jfr$"
while IFS= read -r -d "" path; do
    if [[ $path =~ $pat ]]; then
        printf "%s\t%s\n" "${BASH_REMATCH[1]}" "$path"
    fi
done < <(find . -type f -name "*.jfr" -print0) | sort -k1,1 | head -n 1 | cut -f2- | tr "\n" "\0" | xargs -0 echo rm
The string pat is a regex pattern to extract a timestamp such as 2021_11_12_04_15_51 from the filename.
The timestamp is then prepended to the filename, delimited by a tab character.
The output lines are sorted by the timestamp in ascending order (oldest first).
head -n 1 picks the oldest line. If you want to change the number of files to remove, modify the number given to the -n option.
cut -f2- drops the timestamp to retrieve the filename.
tr "\n" "\0" protects filenames which contain whitespace or tab characters.
xargs -0 echo rm just outputs the command lines as a dry run. If the output looks good, drop the echo.
If you have GNU find, and pathnames don't contain newline ('\n') or tab ('\t') characters, the output of this command will be ordered by basenames:
find path/to/dir -type f -printf '%f\t%p\n' | sort | cut -f2-
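For instance, applied to the layout from the question:
find . -type f -name '*.jfr' -printf '%f\t%p\n' | sort | cut -f2-
The decorated intermediate line here is the basename, a tab, and the full path (e.g. 2021_11_12_04_15_51_1733556_0.jfr followed by ./2021_11_08_17_28_45_1733556/2021_11_12_04_15_51_1733556_0.jfr), so sort orders by the timestamped basename and cut -f2- keeps only the path.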
TL;DR: since you're using find, and if it supports the -printf option, something like:
find . -type f -name '*.jfr' -printf '%f/%h/%f\n' | sort -k1 -n | cut -d '/' -f2-
Otherwise, a while read loop with a different -printf format:
#!/usr/bin/env bash
while IFS='/' read -rd '' time file; do
    printf '%s\n' "$file"
done < <(find . -type f -name '*.jfr' -printf '%T@/%p\0' | sort -zn)
Note that -printf from find and the -z flag of sort are GNU extensions.
To save the file names, you could change
printf '%s\n' "$file"
to something like the following, which appends to an array named files:
files+=("$file")
Then "${files[@]}" has the file names as elements.
The last snippet with the while read loop does not depend on the file names but on the timestamp reported by GNU find.
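As a minimal, self-contained sketch of that variant (GNU find -printf and GNU sort -z assumed, as noted above), collecting the paths into a files array ordered oldest first:
#!/usr/bin/env bash
files=()
# %T@ prints the modification time in seconds since the epoch; sort -zn orders
# the NUL-delimited records numerically, i.e. oldest first
while IFS='/' read -rd '' _mtime file; do
    files+=("$file")
done < <(find . -type f -name '*.jfr' -printf '%T@/%p\0' | sort -zn)
printf '%s\n' "${files[@]}"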
I solved the problem! I sort the array with the following so the oldest files will be deleted first:
files=($(printf '%s\n' "${files[@]}" | sort -t/ -k3))

How to match a folder name and use it in an if condition using grep in bash?

for d in */ ; do
    cd $d
    NUM = $(echo ${PWD##*/} | grep -q "*abc*");
    if [[ "$NUM" -ne "0" ]]; then
        pwd
    fi
    cd ..
done
Here I'm trying to match the folder name against the substring 'abc' and check whether the output of the grep is not 0, but it gives me an error that reads NUM: command not found.
An error was addressed in comments.
NUM = $(echo ${PWD##*/} | grep -q "*abc*"); should be NUM=$(echo ${PWD##*/} | grep -q "*abc*");.
To clarify, the core problem is being able to match the current directory name against a pattern.
You can probably simplify the code to just
if grep -q "*abc*" <<< "${PWD##*/}" 2>/dev/null; then
echo "$PWD"
# Your rest of the code goes here
fi
You can use the exit code of grep directly in an if-conditional without a temporary variable ($NUM here). The condition passes if grep finds a match. The here-string <<< feeds the input to grep much as echo with a pipeline would. The 2>/dev/null part just suppresses any errors (stderr, file descriptor 2) that grep throws.
As an additional requirement asked by OP, to negate the conditional check just do
if ! grep -q "*abc*" <<< "${PWD##*/}" 2>/dev/null; then

How to find files and count them (storing the info into a variable)?

I want to have a conditional behavior depending on the number of files found:
found=$(find . -type f -name "$1")
numfiles=$(printf "%s\n" "$found" | wc -l)
if [ $numfiles -eq 0 ]; then
    echo "cannot access $1: No such file" > /dev/stderr; exit 2;
elif [ $numfiles -gt 1 ]; then
    echo "cannot access $1: Duplicate file found" > /dev/stderr; exit 2;
else
    echo "File: $(ls $found)"
    head $found
fi
EDITED CODE (to reflect more precisely what I need)
Though, numfiles isn't equal to 2 (or more) when duplicate files are found...
All the filenames are on one line, separated by a space.
On the other hand, this works correctly:
find . -type f -name "$1" | wc -l
but I don't want to do the recursive search twice within the if/then/else construct...
Adding -print0 doesn't help either.
What would?
PS- Simplifications or improvements are always welcome!
You want to find files and count those whose name is "$1":
find . 2>/dev/null | grep -c "/${1}$"
And store the result in a var. In one command:
numfiles=$(grep -c "/${1}$" <(find . 2>/dev/null))
Using $() to store data in a variable trims trailing whitespace. Since the final newline does not appear in the variable found, wc would miscount by one. You can recover the trailing newline with:
numfiles=$(printf "%s\n" "$found" | wc -l)
This miscounts if found is empty (and if any filenames contain a newline), emphasizing the fact that this entire approach is faulty. If you really want to go this way, you can try:
numfiles=$(test -z "$found" && echo 0 || printf "%s\n" "$found" | wc -l)
or pipe the output of find to a script that counts the output and prints a count along with the first filename:
find . -type f -name "$1" | tr '\n' ' ' |
awk '{c=NF; f=$1 } END {print c, f; exit c!=1}' c=0 |
while read count name; do
    case $count in
        0) echo no files >&2;;
        1) echo 1 file $name;;
        *) echo Duplicate files >&2;;
    esac;
done
All of these solutions fail miserably if any pathnames contain whitespace. If that matters, you could change the awk to a perl script to make it easier to handle null separators and use -print0, but really I think you should stop worrying about special cases. (find -exec and find | xargs both fail to handle the zero-files-matching case cleanly. Arguably this awk solution doesn't handle it cleanly either.)
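If arbitrary pathnames do matter, a minimal sketch (assuming bash 4.4+ for mapfile -d '' and a find with -print0) reads the matches into an array and counts its elements, which copes with whitespace and even newlines in names:
# one array element per NUL-delimited path, regardless of the characters in the name
mapfile -d '' -t matches < <(find . -type f -name "$1" -print0)
numfiles=${#matches[@]}
echo "$numfiles"
The first match is then available as "${matches[0]}" without running find a second time.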

How to ensure only one file is returned by checking for newline

What is the simplest way to check if a string contains newline?
For example, after
FILE=$(find . -name "pattern_*.sh")
I'd like to check for newline to ensure only one file matched.
You can use pattern matching:
[[ $FILE == *$'\n'* ]] && echo More than one line
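Applied to the question's variable, a small sketch that treats a multi-line result as an error:
FILE=$(find . -name "pattern_*.sh")
if [[ $FILE == *$'\n'* ]]; then
    echo "More than one file matched" >&2
    exit 1
fi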
If $str contains a newline, you can check it with:
if [ $(echo "$str" | wc -l) -gt 1 ];
then
// perform operation if it has new lines
else
// no new lines.
fi
To refer to your example: Note that filenames could contain newlines, too.
A safe way to count files would be
find -name "pattern_*.sh" -printf '\n' | wc -c
This avoids printing the filename and prints only a newline instead.
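Applied to the question, a small sketch that stores that count (-printf is a GNU find feature) and branches on it:
count=$(find . -name "pattern_*.sh" -printf '\n' | wc -c)
if [ "$count" -eq 1 ]; then
    echo "exactly one file matched"
fi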

Removing final bash script argument

I'm trying to write a script that searches a directory for files and greps for a pattern. Something similar to the below except the find expression is much more complicated (excludes particular directories and files).
#!/bin/bash
if [ -d "${!#}" ]
then
path=${!#}
else
path="."
fi
find $path -print0 | xargs -0 grep "$@"
Obviously, the above doesn't work because "$@" still contains the path. I've tried variants of building up an argument list by iterating over all the arguments to exclude the path, such as
args=${@%$path}
find $path -print0 | xargs -0 grep "$args"
or
whitespace="[[:space:]]"
args=""
for i in "${#%$path}"
do
# handle the NULL case
if [ ! "$i" ]
then
continue
# quote any arguments containing white-space
elif [[ $i =~ $whitespace ]]
then
args="$args \"$i\""
else
args="$args $i"
fi
done
find $path -print0 | xargs -0 grep --color "$args"
but these fail with quoted input. For example,
# ./find.sh -i "some quoted string"
grep: quoted: No such file or directory
grep: string: No such file or directory
Note that if $@ doesn't contain the path, the first script does do what I want.
EDIT : Thanks for the great solutions! I went with a combination of the answers:
#!/bin/bash
path="."
end=$#
if [ -d "${!#}" ]
then
path="${!#}"
end=$((end - 1))
fi
find "$path" -print0 | xargs -0 grep "${#:1:$end}"
EDIT:
Original was just slightly off. No removal is to be done if the last argument is not a directory.
#!/bin/bash
if [ -d "${!#}" ]
then
path="${!#}"
remove=1
else
path="."
remove=0
fi
find "$path" -print0 | xargs -0 grep "${#:1:$(($#-remove))}"
