Simulating the find command: why is my code not recursing correctly? - bash

My assignment is to write a Unix shell script that asks the user for the name of a directory, and then works exactly like find.
Here is what I have so far:
#!/bin/bash
dir_lister()
{
cd "$1"
echo "$1"
list=$(ls -l ${1})
nolines=$(echo "$list" | awk 'END{printf "%d",NF}')
if [ $nolines -eq 2 ]
then
echo "$1"
return
fi
filelist=$(echo "$list" | grep ^-.*)
dirlist=$(echo "$list" | grep ^d.*)
filename=$(echo "$filelist"| awk '{printf "%s\n",$NF}')
present=$(pwd)
echo "$filename"| awk -v pres=$present '{printf "%s/%s\n",pres,$0}'
dirlist2=$(echo "$dirlist" | awk '{printf "%s\n",$NF}')
echo "$dirlist2" | while IFS= read -r line;
do
nextCall=$(echo "$present/$line");
dir_lister $nextCall;
cd ".."
done
cd ".."
}
read -p "Enter the name of the directory: " dName
dir_lister $dName
The problem is, after a depth of three directories, this script gets into an infinite loop, and I don't see why.
EDIT:
Here is the code I came up with after looking at your answer. It still doesn't go deeper than one directory level:
#!/bin/bash
shopt -s dotglob # don't miss "hidden files"
shopt -s nullglob # don't fail on empty directories
list_directory()
{
cd "$2"
cd "$1"
##echo -e "I am called \t $1 \t $2"
for fileName in "$1/"*
do
##echo -e "hello \t $fileName"
if [ -d "$fileName" ];
then
echo "$fileName"
list_directory $fileName $2
else
echo "$fileName"
fi
done
}
read -p "Enter the directory name: " dirName
var=$(pwd)
list_directory $dirName $var

Okay, that is completely the wrong way to list files in a directory (see ParsingLs). I'll give you the pieces and you should be able to put them together into a working script.
Put this at the top of your script:
shopt -s dotglob # don't miss "hidden files"
shopt -s nullglob # don't fail on empty directories
Then you can easily loop over directory contents with:
for file in "$directory/"* ; do
#...
done
Test if you have a directory:
if [ -d "$file" ] ; then
# "$file" is a directory, recurse...
fi
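Putting these pieces together, a minimal sketch of the recursive function might look like the following. The name list_tree is made up for illustration. It assumes that symbolic links to directories should be printed but not followed (which is what find does by default), and it builds each path from its argument instead of calling cd, so there is no working directory to restore after each call.

```bash
#!/bin/bash
shopt -s dotglob   # don't miss "hidden files"
shopt -s nullglob  # don't fail on empty directories

# list_tree is a hypothetical name: print a directory and everything below it.
list_tree() {
    local dir=$1 entry
    printf '%s\n' "$dir"
    for entry in "$dir"/*; do
        if [ -d "$entry" ] && [ ! -L "$entry" ]; then
            # A real directory: recurse. The path is already complete, so no cd is needed.
            list_tree "$entry"
        else
            # A file, or a symlink that we print without following it.
            printf '%s\n' "$entry"
        fi
    done
}
```

After reading the directory name into dName as in the question, it would be called as list_tree "$dName" (with the quotes, so that names containing spaces survive).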


Delete empty files - Improve performance of logic

I need to find and remove empty files. In my use case, an empty file is one that has zero lines.
I did try testing whether the file is empty. However, that behaved strangely: even when a file was empty, the test did not detect it.
Hence, the best I could come up with is the script below, which is way too slow given that it has to test several hundred thousand files:
#!/bin/bash
LOOKUP_DIR="/path/to/source/directory"
cd ${LOOKUP_DIR} || { echo "cd failed"; exit 0; }
for fname in $(realpath */*)
do
if [[ $(wc -l "${fname}" | awk '{print $1}') -eq 0 ]]
then
echo "${fname}" is empty
rm -f "${fname}"
fi
done
Is there a better way to do what I'm after or alternatively, can the above logic be re-written in a way that brings better performance please?
Your script is slow because wc reads every file to the end, which is not needed for your purpose. This might be what you're looking for:
#!/bin/bash
lookup_dir='/path/to/source/directory'
cd "$lookup_dir" || exit
for file in *; do
if [[ -f "$file" && -r "$file" && ! -L "$file" ]]; then
read < "$file" || echo rm -f -- "$file"
fi
done
Drop the echo after making sure it works as intended.
Another version, calling the rm only once, could be:
#!/bin/bash
lookup_dir='/path/to/source/directory'
cd "$lookup_dir" || exit
for file in *; do
if [[ -f "$file" && -r "$file" && ! -L "$file" ]]; then
read < "$file" || files_to_be_deleted+=("$file")
fi
done
rm -f -- "${files_to_be_deleted[@]}"
Explanation:
The core logic is in the line
read < "$file" || rm -f -- "$file"
The read < "$file" command attempts to read a line from the $file. If it succeeds, that is, a line is read, then the rm command on the right-hand side of the || won't be executed (that's how the || works). If it fails then the rm command will be executed. In any case, at most one line will be read. This has great advantage over the wc command because wc would read the whole file.
if ! read < "$file"; then rm -f -- "$file"; fi
could be used instead. The two lines are equivalent.
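To see the exit status of read in isolation, here is a small sketch. The helper name has_line is invented for this illustration. Note that read also fails when it reaches end-of-file before finding a newline, so a file that holds some text but no newline counts as having zero lines, which is the same answer wc -l would give.

```bash
#!/bin/bash
# has_line is a hypothetical helper name used only for this illustration.
# It succeeds (status 0) when the file starts with a complete,
# newline-terminated line, and fails otherwise. At most one line is read.
has_line() {
    local _first
    IFS= read -r _first < "$1"
}
```

It could then be used as has_line "$file" || echo rm -f -- "$file". An unreadable file also makes the redirection fail, which is why the scripts above test -r first.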
To check whether "$fname" exists and is non-empty, use [ -s "$fname" ]:
#!/usr/bin/env sh
LOOKUP_DIR="/path/to/source/directory"
for fname in "$LOOKUP_DIR"/*/*; do
if ! [ -s "$fname" ]; then
echo "${fname}" is empty
# remove echo when output is what you want
echo rm -f "${fname}"
fi
done
See: help test:
File operators:
...
-s FILE True if file exists and is not empty.
Yet another method
wc -l ~/tmp/* 2>/dev/null | awk '$1 == 0 {print $2}' | xargs echo rm
This will break if any of your files have whitespace in the name.
To work around that, with awk still
wc -l ~/tmp/* 2>/dev/null \
| awk 'sub(/^[[:blank:]]+0[[:blank:]]+/, "")' \
| xargs echo rm
This works because the sub function returns the number of substitutions made, which can be treated as a boolean zero/not-zero condition.
Remove the echo to actually delete the files.
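The awk idiom can be tried on canned input before pointing it at real files. In this sketch, strip_zero_count is a made-up helper name and the two sample lines are invented to resemble multi-file wc -l output.

```bash
#!/bin/bash
# strip_zero_count is a hypothetical helper name, not part of the answer above.
# sub() returns 1 on lines where a leading count of 0 was removed, so awk
# prints only those lines, now reduced to the bare file name.
strip_zero_count() {
    awk 'sub(/^[[:blank:]]+0[[:blank:]]+/, "")'
}

# Made-up sample in the shape of multi-file `wc -l` output.
printf '%s\n' '   0 empty file.txt' '  12 notes.txt' | strip_zero_count
# prints: empty file.txt
```

One caveat: by default xargs splits its input on blanks as well, so the file name above would still reach rm as two arguments. With GNU xargs, -d '\n' makes it split on newlines only; other implementations may need a different approach.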

Looping through each file in directory - bash

I'm trying to perform a certain operation on each file in a directory, but there is a problem with the order in which it goes through them. It should process one file at a time. The long line (unzipping, grepping, zipping) works fine on a single file without a script, so the problem is with the loop. Any ideas?
The script should grep through each zipped file and look for word1 or word2. If at least one of them exists, then:
unzip file
grep word1 and word2 and save it to file_done
remove unzipped file
zip file_done to /donefiles/ with original name
remove file_done from original directory
#!/bin/bash
for file in *.gz; do
counter=$(zgrep -c 'word1\|word2' $file)
if [[ $counter -gt 0 ]]; then
echo $counter
for file in *.gz; do
filenoext=${file::-3}
filedone=${filenoext}_done
echo $file
echo $filenoext
echo $filedone
gunzip $file | grep 'word1\|word2' $filenoext > $filedone | rm -f $filenoext | gzip -f -c $filedone > /donefiles/$file | rm -f $filedone
done
else
echo "nothing to do here"
fi
done
The code snippet you've provided has a few problems, e.g. an unneeded nested for loop and an erroneous pipeline
(the whole line gunzip $file | grep 'word1\|word2' $filenoext > $filedone | rm -f $filenoext | gzip...).
Note also that your code will work correctly only if the *.gz file names contain no spaces (or special characters).
Also, zgrep -c 'word1\|word2' matches substrings too, so it will also match strings like line_starts_withword1_orword2_.
Here is the working version of the script:
#!/bin/bash
for file in *.gz; do
counter=$(zgrep -c -E 'word1|word2' $file) # now counter is the number of lines in $file that match word1 or word2
if [[ $counter -gt 0 ]]; then
name=$(basename $file .gz)
zcat $file | grep -E 'word1|word2' > ${name}_done
gzip -f -c ${name}_done > /donefiles/$file
rm -f ${name}_done
else
echo 'nothing to do here'
fi
done
What we can improve here:
since we are unzipping the file anyway to check for the presence of word1|word2, we can unzip it to a temp file and avoid unzipping twice
we don't need to count how many times word1 or word2 occurs in the file; we can just check for their presence
${name}_done can be a temp file that is cleaned up automatically
we can use a while loop to handle file names with spaces
#!/bin/bash
tmp=`mktemp /tmp/gzip_demo.XXXXXX` # create temp file for us
trap "rm -f \"$tmp\"" EXIT INT TERM QUIT HUP # clean $tmp upon exit or termination
find . -maxdepth 1 -mindepth 1 -type f -name '*.gz' | while IFS= read -r f; do
# quotes around $f are now required in case of spaces in it
s=$(basename "$f") # short name w/o dir
gunzip -f -c "$f" | grep -P '\b(word1|word2)\b' > "$tmp"
[ -s "$tmp" ] && gzip -f -c "$tmp" > "/donefiles/$s" # create archive if anything is found
done
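The "check for presence instead of counting" idea can also be expressed through grep's exit status, which is 0 only when at least one line matched. The sketch below is one possible variant, not the script above: filter_gz is an invented name, and the destination directory is passed as a parameter instead of being hard-coded.

```bash
#!/bin/bash
# filter_gz is a hypothetical helper: keep only the matching lines of one
# .gz file and write them, recompressed, under the same name in dest_dir.
filter_gz() {
    local src=$1 dest_dir=$2 tmp
    tmp=$(mktemp) || return
    # The status of a pipeline is that of its last command, here grep:
    # 0 if at least one line matched, non-zero otherwise.
    if gunzip -c -- "$src" | grep -E 'word1|word2' > "$tmp"; then
        gzip -c -- "$tmp" > "$dest_dir/$(basename -- "$src")"
    fi
    rm -f -- "$tmp"
}
```

A loop such as for f in ./*.gz; do filter_gz "$f" /donefiles; done would then process one file at a time, and the quoting keeps file names with spaces intact.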
It looks like you have an inner loop inside the outer one :
#!/bin/bash
for file in *.gz; do
counter=$(zgrep -c 'word1\|word2' $file)
if [[ $counter -gt 0 ]]; then
echo $counter
for file in *.gz; do #<<< HERE
filenoext=${file::-3}
filedone=${filenoext}_done
echo $file
echo $filenoext
echo $filedone
gunzip $file | grep 'word1\|word2' $filenoext > $filedone | rm -f $filenoext | gzip -f -c $filedone > /donefiles/$file | rm -f $filedone
done
else
echo "nothing to do here"
fi
done
The inner loop goes through all the files in the directory whenever one of them contains word1 or word2. You probably want this:
#!/bin/bash
for file in *.gz; do
counter=$(zgrep -c 'word1\|word2' $file)
if [[ $counter -gt 0 ]]; then
echo $counter
filenoext=${file::-3}
filedone=${filenoext}_done
echo $file
echo $filenoext
echo $filedone
gunzip $file | grep 'word1\|word2' $filenoext > $filedone | rm -f $filenoext | gzip -f -c $filedone > /donefiles/$file | rm -f $filedone
else
echo "nothing to do here"
fi
done

Bash script loop through subdirectories and write to file

I have spent many hours on this problem and have run out of ideas. I need to write a script that loops recursively through the subdirectories of the current directory and checks the file count in each directory. If the file count is greater than 10, it should write the names of all these files to a file named "BigList"; otherwise it should write them to a file named "ShortList". The output should look like this:
---<directory name>
<filename>
<filename>
<filename>
<filename>
....
---<directory name>
<filename>
<filename>
<filename>
<filename>
....
My script only works if the subdirectories do not themselves contain subdirectories.
I am confused because it doesn't work as I expect. It would take me less than 5 minutes to write this in any programming language.
Please help me solve this problem, because I have no idea how to do this.
Here is my script
#!/bin/bash
parent_dir=""
if [ -d "$1" ]; then
path=$1;
else
path=$(pwd)
fi
parent_dir=$path
loop_folder_recurse() {
local files_list=""
local cnt=0
for i in "$1"/*;do
if [ -d "$i" ];then
echo "dir: $i"
parent_dir=$i
echo before recursion
loop_folder_recurse "$i"
echo after recursion
if [ $cnt -ge 10 ]; then
echo -e "---"$parent_dir >> BigList
echo -e $file_list >> BigList
else
echo -e "---"$parent_dir >> ShortList
echo -e $file_list >> ShortList
fi
elif [ -f "$i" ]; then
echo file $i
if [ $cur_fol != $main_pwd ]; then
file_list+=$i'\n'
cnt=$((cnt + 1))
fi
fi
done
}
echo "Base path: $path"
loop_folder_recurse $path
I believe that this does what you want:
find . -type d -exec env d={} bash -c 'out=Shortlist; [ $(ls "$d" | wc -l) -ge 10 ] && out=Biglist; { echo "--$d"; ls "$d"; echo; } >>"$out"' ';'
If we don't want either to count subdirectories to the cut-off or to list them in the output, then use this version:
find . -type d -exec env d={} bash -c 'out=Shortlist; [ $(ls -p "$d" | grep -v "/$" | wc -l) -ge 10 ] && out=Biglist; { echo "--$d"; ls -p "$d"; echo; } | grep -v "/$" >>"$out"' ';'
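For comparison, here is a sketch of the same task as a recursive bash function that does not parse ls output. Several things in it are assumptions: the name classify_dirs is invented, the cut-off follows the wording of the question ("greater than 10"), only regular files are counted and listed, hidden files are skipped unless dotglob is set, and BigList and ShortList are appended to in the current working directory.

```bash
#!/bin/bash
shopt -s nullglob  # an empty directory yields no loop iterations

# classify_dirs is a hypothetical name: record one directory, then recurse.
classify_dirs() {
    local dir=$1 entry out
    local -a files=() subdirs=()
    for entry in "$dir"/*; do
        if [ -d "$entry" ] && [ ! -L "$entry" ]; then
            subdirs+=("$entry")          # visit later
        elif [ -f "$entry" ]; then
            files+=("${entry##*/}")      # keep the bare file name
        fi
    done
    # Assumed cut-off, taken from the question text: more than 10 files.
    if (( ${#files[@]} > 10 )); then out=BigList; else out=ShortList; fi
    {
        printf -- '---%s\n' "$dir"
        if (( ${#files[@]} > 0 )); then printf '%s\n' "${files[@]}"; fi
    } >> "$out"
    for entry in "${subdirs[@]}"; do
        classify_dirs "$entry"
    done
}
```

Because every variable is declared local, a nested call cannot overwrite the file list or counter of its caller, which is a common reason for a recursive shell function to work only one level deep.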

Bash: Native way to check if an entry is one line?

I have a find script that automatically opens a file if just one file is found. The way I currently handle it is doing a word count on the number of lines of the search results. Is there an easier way to do this?
if [ "$( cat "$temp" | wc -l | xargs echo )" == "1" ]; then
edit `cat "$temp"`
fi
EDITED - here is the context of the whole script.
term="$1"
temp=".aafind.txt"
find src sql common -iname "*$term*" | grep -v 'src/.*lib' >> "$temp"
if [ ! -s "$temp" ]; then
echo "ø - including lib..." 1>&2
find src sql common -iname "*$term*" >> "$temp"
fi
if [ "$( cat "$temp" | wc -l | xargs echo )" == "1" ]; then
# just open it in an editor
edit `cat "$temp"`
else
# format output
term_regex=`echo "$term" | sed "s%\*%[^/]*%g" | sed "s%\?%[^/]%g" `
cat "$temp" | sed -E 's%//+%/%' | grep --color -E -i "$term_regex|$"
fi
rm "$temp"
Unless I'm misunderstanding, the variable $temp contains one or more filenames, one per line, and if there is only one filename it should be edited?
[ $(wc -l <<< "$temp") = "1" ] && edit "$temp"
If $temp is a file containing filenames:
[ $(wc -l < "$temp") = "1" ] && edit "$(cat "$temp")"
Several of the results here will read through an entire file, whereas one can stop and have an answer after one line and one character:
if { IFS='' read -r result && ! read -n 1 _; } <file; then
echo "Exactly one line: $result"
else
echo "Either no valid content at all, or more than one line"
fi
For safely reading from find, if you have GNU find and bash as your shell, replace <file with < <(find ...) in the above. Even better, in that case, is to use NUL-delimited names, such that filenames with newlines (yes, they're legal) don't trip you up:
if { IFS='' read -r -d '' result && ! read -r -d '' -n 1 _; } \
< <(find ... -print0); then
printf 'Exactly one file: %q\n' "$result"
else
echo "Either no results, or more than one"
fi
Well, given that you are storing these results in the file $temp this is a little easier:
[ "$( wc -l < $temp )" -eq 1 ] && edit "$( cat $temp )"
Instead of 'cat $temp' you can do '< $temp', but it might take away some readability if you are not very familiar with redirection 8)
If you want to test whether the file is empty or not, test -s does that.
if [ -s "$temp" ]; then
edit `cat "$temp"`
fi
(A non-empty file by definition contains at least one line. You should find that wc -l agrees.)
If you genuinely want a line count of exactly one, then yes, it can be simplified substantially;
if [ $( wc -l <"$temp" ) = 1 ]; then
edit `cat "$temp"`
fi
You can use arrays:
x=($(find . -type f))
[ "${#x[*]}" -eq 1 ] && echo "just one" || echo "many"
But you might have problems in case of filenames with whitespace, etc.
Still, something like this would be a native way
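The whitespace problem with the array approach can be avoided by filling the array from NUL-delimited find output. This is a sketch under assumptions: collect_matches and the global array name matches are invented for illustration, and find must support -print0 (GNU and BSD find do).

```bash
#!/bin/bash
# collect_matches is a hypothetical helper: run find with the given arguments
# and store every result in the global array "matches", one path per element.
collect_matches() {
    local path
    matches=()
    # -d '' makes read stop at each NUL byte, so spaces and even newlines
    # inside a file name stay part of a single array element.
    while IFS= read -r -d '' path; do
        matches+=("$path")
    done < <(find "$@" -print0)
}
```

After collect_matches src sql common -iname "*$term*", the test (( ${#matches[@]} == 1 )) tells whether exactly one file was found, and "${matches[0]}" is its name.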
No, this is the way, though you're making it over-complicated:
if [ "`wc -l $temp | cut -d' ' -f1`" = "1" ]; then
edit "$temp";
fi
What's complicating it is:
useless use of cat,
unnecessary use of xargs,
and I'm not sure you really want edit `cat $temp`, which edits the file named by the contents of $temp.

shell get string

I have some lines that all have the same structure, like this:
1000 AS34_59329 RICwdsRSYHSD11-2-IPAAPEK-93 /ifshk5/BC_IP/PROJECT/T1
1073/T11073_RICekkR/Fq/AS34_59329/111220_I631_FCC0E5EACXX_L4_RICwdsRSYHSD11-2-IP
AAPEK-93_1.fq.gz /ifshk5/BC_IP/PROJECT/T11073/T11073_RICekkR/Fq/AS34_5932
9/111220_I631_FCC0E5EACXX_L4_RICwdsRSYHSD11-2-IPAAPEK-93_2.fq.gz /ifshk5/
BC_IP/PROJECT/T11073/T11073_RICekkR/Fq/AS34_59329/clean_111220_I631_FCC0E5EACXX_
L4_RICwdsRSYHSD11-2-IPAAPEK-93_1.fq.gz.total.info 11.824 0.981393
43.8283 95.7401 OK
I want to extract the bold part (the second field, AS34_59329 here) and check whether /home/jesse/ contains a folder with that name; if not, create it with mkdir /home/jesse/AS34_59329.
I use this code
#!/bin/bash
myPath="/home/jesse/"
while read myline
do
dirname= echo "$myline" | awk -F ' ' '{print $2}'
echo $dirname
myPath= $myPath$dirname
echo $myPath
mkdir -p "$myPath"
done < T11073_all_3254.fq.list
But it can't mkdir and show the path name, it shows
-bash: /home/jesse/: is a directory
/home/jesse/
AS39_59324
read can read each field into a separate variable, and mkdir -p will create a dir only if it doesn't exist:
path="/home/jesse"
while read _ dir _
do
mkdir -p "$path/$dir"
done < T11073_all_3254.fq.list
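As background on why the question's script fails: a space after = ends the assignment, so myPath= $myPath$dirname sets myPath to an empty string for one command and then tries to run /home/jesse/ as that command, hence the "is a directory" error. The sketch below wraps the read-based loop in a function so it can be tried on any list file. The name make_dirs_from_list and its two parameters are invented for illustration.

```bash
#!/bin/bash
# make_dirs_from_list is a hypothetical helper: create base/<field 2> for
# every line of the list file.
make_dirs_from_list() {
    local base=$1 list=$2 dir
    # read splits each line on whitespace: the first field is discarded into _,
    # the second goes into dir, and the rest of the line is discarded again.
    while read -r _ dir _; do
        [ -n "$dir" ] || continue     # skip blank or one-field lines
        mkdir -p -- "$base/$dir"
    done < "$list"
}
```

For the question's data it would be called as make_dirs_from_list /home/jesse T11073_all_3254.fq.list.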
for will iterate over each whitespace separated token. Try this instead.
#!/usr/bin/env bash
# Invoke with first arg as file containing the lines
# foo.sh <input_filename>
for i in `cat $1 | cut -d " " -f2`
do
if [ -d /home/jesse/$i ]
then
echo "Directory /home/jesse/$i exists"
else
mkdir /home/jesse/$i;
echo "Directory /home/jesse/$i created"
fi
done
