How to sort recursively by maximum file size and count files? - bash

I'm a beginner in bash programming. I want to display the first $1 results (head -n $1) of sorting the files
in /etc/* by size. The problem is that at the end of the search, I must know how many directories and files were processed.
I composed the following code:
#!/bash/bin
let countF=0;
let countD=0;
for file in $(du -sk /etc/* |sort +0n | head $1); do
if [ -f "file" ] then
echo $file;
let countF=countF+1;
else if [ -d "file" ] then
let countD=countD+1;
fi
done
echo $countF
echo $countD
I get errors at execution. How do I use find with du, since I must search recursively?

#!/bin/bash
# (shebang: directory and program were reversed)
let countF=0 # semicolon not needed (several more places)
let countD=0
while read -r size file; do # du prints the size first, then the path
    if [ -f "$file" ]; then # missing dollar sign and semicolon
        echo "$file"
        let countF=countF+1 # could also be: let countF++
    elif [ -d "$file" ]; then # elif, so a single fi closes the whole if
        let countD=countD+1
    fi
done < <(du -sk /etc/* |sort +0n | head $1) # see below
echo $countF
echo $countD
Changing the loop from a for to a while allows it to work properly in case filenames contain spaces.
I'm not sure what version of sort you have, but I'll take your word for it that the argument is correct.

It's #!/bin/bash not #!/bash/bin.
I don't know what that argument to sort is supposed to be. Maybe you meant sort -r -n?
Your use of head is wrong. Giving head file arguments causes it to ignore its standard input, so in general it's an error to both pipe something to head and give it a file argument. Besides that, "$1" refers to the script's first argument. Did you maybe mean head -n 1, or were you trying to make the number of lines processed configurable from an argument to the script: head -n "$1"?
In your if tests, you're not referencing your loop variable: it should read "$file", not "file".
Not that the bash parser cares, but you should try to indent sanely.

I tried using /etc/* in place of the file variable, but I don't see any result. The idea is to sort all files from a directory and its subdirectories by size and display the first $1 results in size order.
In the process I must also count how many files and directories the searched directory contains.
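One way to combine find with du for the recursive case is sketched below. The function name largest_files and the /etc default are illustrative and not from the posts above. Sizes are du's disk usage in kilobytes, the directory count includes the starting directory itself, and file names containing newlines would throw off the counts.

```bash
#!/bin/bash
# largest_files N [DIR]: print the N largest regular files under DIR
# (default /etc), largest first, followed by the file and directory counts.
# Hypothetical helper; the name and defaults are assumptions for this sketch.
largest_files() {
    local n=$1 dir=${2:-/etc}
    local countF countD
    # find walks the tree; -exec ... {} + hands du many file names at once
    find "$dir" -type f -exec du -k {} + 2>/dev/null | sort -rn | head -n "$n"
    # count every regular file and directory visited (one line per name)
    countF=$(find "$dir" -type f 2>/dev/null | wc -l)
    countD=$(find "$dir" -type d 2>/dev/null | wc -l)
    # $(( )) strips any padding that some wc implementations add
    echo "files: $((countF))"
    echo "directories: $((countD))"
}
```

Calling largest_files 5 /etc would list the five largest files and then the two counts.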

Ruby(1.9+)
#!/usr/bin/env ruby
fc=0
dc=0
a=Dir["/etc/*"].inject([]) do |x,f|
fc+=1 if File.file?(f)
dc+=1 if File.directory?(f)
x<<f
end
puts a.sort
puts "number of files: #{fc}"
puts "number of directories: #{dc}"

Related

Finding presence of substring within a string in BASH

I have a script that is trying to find the presence of a given string inside a file of arbitrary text.
I've settled on something like:
#!/bin/bash
file="myfile.txt"
for j in `cat blacklist.txt`; do
echo Searching for $j...
unset match
match=`grep -i -m1 -o "$j" $file`
if [ $match ]; then
echo "Match: $match"
fi
done
Blacklist.txt contains lines of potential matches, like so:
matchthis
"match this too"
thisisasingleword
"This is multiple words"
myfile.txt could be something like:
I would matchthis if I could match things with grep. I really wish I could.
When I ask it to match this too, it fails to matchthis. It should match this too - right?
If I run this at a bash prompt, like so:
j="match this too"
grep -i -m1 -o "$j" myfile.txt
...I get "match this too".
However, when the batch file runs, despite the variables being set correctly (verified via echo lines), it never greps properly and returns nothing.
Where am I going wrong?
Wouldn't
grep -owF -f blacklist.txt myfile.txt
instead of writing an inefficient loop, do what you want?
Would you please try:
#!/bin/bash
file="myfile.txt"
while IFS= read -r j; do
j=${j#\"}; j=${j%\"} # remove surrounding double quotes
echo "Searching for $j..."
match=$(grep -i -m1 -o "$j" "$file")
if (( $? == 0 )); then # if match
echo "Match: $match" # then print it
fi
done < blacklist.txt
Output:
Searching for matchthis...
Match: matchthis
Searching for match this too...
Match: match this too
match this too
Searching for thisisasingleword...
Searching for This is multiple words...
I wound up abandoning grep entirely and using sed instead.
match=`sed -n "s/.*\($j\).*/\1/p" "$file"`
Works well, and I was able to use unquoted multiple word phrases in the blacklist file.
With this:
if [ $match ]; then
you are passing random arguments to test. This is not how you properly check for a variable not being empty. Use test -n:
if [ -n "$match" ]; then
You might also use grep's exit code instead:
if [ "$?" -eq 0 ]; then
for ... in X splits X at spaces by default, and you are expecting the script to match whole lines.
Define IFS properly:
IFS='
'
for j in `cat blacklist.txt`; do
blacklist.txt contains "match this too" with quotes, and it is read like this by for loop and matched literally.
j="match this too" does not cause j variable to contain quotes.
j='"match this too"' does, and then it will not match.
Since whole lines are read properly from the blacklist.txt file now, you can probably remove quotes from that file.
Script:
#!/bin/bash
file="myfile.txt"
IFS='
'
for j in `cat blacklist.txt`; do
echo Searching for $j...
unset match
match=`grep -i -m1 -o "$j" "$file"`
if [ -n "$match" ]; then
echo "Match: $match"
fi
done
Alternative to the for ... in ... loop (no IFS= needed):
while read -r; do
j="$REPLY"
...
done < 'blacklist.txt'
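Putting the points above together, here is a minimal sketch that reads whole lines, strips optional surrounding quotes, and tests the result with -n. It assumes the blacklist entries are literal phrases rather than regular expressions, which is why it uses grep -F; the function name find_blacklisted is made up for this example.

```bash
#!/bin/bash
# find_blacklisted FILE BLACKLIST: print the first match for each phrase.
# Assumes one literal phrase per line in BLACKLIST (hypothetical helper).
find_blacklisted() {
    local file=$1 blacklist=$2 phrase match
    # "|| [ -n ... ]" also handles a last line without a trailing newline
    while IFS= read -r phrase || [ -n "$phrase" ]; do
        phrase=${phrase#\"}; phrase=${phrase%\"}  # drop surrounding double quotes
        [ -n "$phrase" ] || continue              # skip blank lines
        # -F fixed string, -i ignore case, -o print only the matched text
        match=$(grep -F -i -o -- "$phrase" "$file" | head -n 1)
        if [ -n "$match" ]; then
            printf 'Match: %s\n' "$match"
        fi
    done < "$blacklist"
}
```

With the sample files from the question, find_blacklisted myfile.txt blacklist.txt prints a match for matchthis and for match this too, and nothing for the other two entries.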

How to check filetype in if statement bash using wildcard and -f

subjects_list=$(ls -l /Volumes/Backup_Plus/PPMI_10 | awk '{ print $NF }')
filepath="/Volumes/Backup_Plus/PPMI_10/$subjects/*/*/S*/"
for subjects in $subjects_list; do
if [[ -f "${filepath}/*.bval" && -f "${filepath}/*.bvec" && -f "${filepath}/*.json" && -f "${filepath}/*.nii.gz" ]]; then
echo "${subjects}" >> /Volumes/Backup_Plus/PPMI_10/keep_subjects.txt
else
echo "${subjects}" >> /Volumes/Backup_Plus/PPMI_10/not_keep_subjects.txt
fi
done
problem is supposedly in the if statement, I tried this...
bvalfile = (*.bval)
bvecfile =(*.bvec)
jsonfile =(*.json)
niigzfile =(*.nii.gz)
if [[ -f "$bvalfile" && -f "$bvecfile" && -f "$jsonfile" && -f "$niigzfile" ]]; then
However, that didn't work. Any help with syntax or errors would be appreciated, or does it need to be changed completely? I'm trying to separate the subjects that have the above file types from those that don't by making two lists.
thanks
You're assigning filepath outside the for-subject loop but using the unset variable $subjects in it. You want to move that inside the loop.
Double-quoted wildcards aren't expanded, so both $filepath and your -f test will be looking for filenames with literal asterisks in them.
-f only works on a single file, so even if you fix the quotes, the test won't do what you want: [[ ]] never expands wildcards, and [ ] reports an error when the pattern matches more than one file.
So I think what you want is something like this:
# note: array assignment -
# shell does the wildcard expansion, no ls required
prefix_list=( /Volumes/Backup_Plus/PPMI_10/* )
# and array expansion
for prefix in "${prefix_list[@]}"; do
# the subject is just the last component of the path
subject=${prefix##*/}
# start by assuming we're keeping this one
decision=keep
# in case filepath pattern matches more than one directory, loop over them
for filepath in "$prefix"/*/*/S*/; do
# if any of the files don't exist, switch to not keeping it
for file in "$filepath"/{*.bval,*.bvec,*.json,*.nii.gz}; do
if [[ ! -f "$file" ]]; then
decision=not_keep
# we have our answer and can stop looping now
break 2
fi
done
done
# now append to the correct list
printf '%s\n' "$subject" >>"/Volumes/Backup_Plus/PPMI_10/${decision}_subjects.txt"
done
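The "-f only takes one file" point can also be isolated in a small helper. This is a sketch with made-up names (has_ext, has_all), not part of the answer above: it loops over the glob and succeeds as soon as one regular file matches.

```bash
#!/bin/bash
# has_ext DIR EXT: succeed if DIR contains at least one regular file *.EXT.
# (Hypothetical helper names, introduced only for this sketch.)
has_ext() {
    local f
    for f in "$1"/*."$2"; do
        [ -f "$f" ] && return 0   # an unmatched glob stays literal and fails -f
    done
    return 1
}

# has_all DIR EXT...: succeed only if every listed extension is present.
has_all() {
    local dir=$1 ext
    shift
    for ext in "$@"; do
        has_ext "$dir" "$ext" || return 1
    done
    return 0
}
```

A call such as has_all "$filepath" bval bvec json nii.gz could then replace the four-way -f test inside the loop.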

Why doesn't counting files with "for file in $0/*; let i=$i+1; done" work?

I'm new to shell scripting and wrote the following script based on a simpler one. I want to pass it the path of a folder whose files it should count. I can't find the logical mistake: the output is always "1".
#!/bin/bash
i=0
for file in $0/*
do
let i=$i+1
done
echo $i
To execute the code i use
sh scriptname.sh /path/to/folder/to/count/files
$0 is the name with which your script was invoked (roughly, subject to several exceptions that aren't pertinent here). The first argument is $1, and so it's $1 that you want to use in your glob expression.
#!/bin/bash
i=0
for file in "$1"/*; do
i=$(( i + 1 )) ## $(( )) is POSIX-compliant arithmetic syntax; let is deprecated.
done
echo "$i"
That said, you can get this number more directly:
#!/bin/bash
shopt -s nullglob # allow globs to expand to an empty list
files=( "$1"/* ) # put list of files into an array
echo "${#files[@]}" # count the number of items in the array
...or even:
#!/bin/sh
set -- "$1"/* # override $@ with the list of files matching the glob
if [ -e "$1" ] || [ -L "$1" ]; then # if $1 exists, then it had matches
echo "$#" # ...so emit their number.
else
echo 0 # otherwise, our result is 0.
fi
If you want to count the number of files in a directory, you can run something like this:
ls /path/to/folder/to/count/files | wc -l
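Note that ls | wc -l counts lines, so it skips dotfiles and miscounts names that contain newlines. If that matters, a glob loop avoids both issues. The sketch below is one possible approach under those assumptions; the function name count_entries and its output format are invented for this example.

```bash
#!/bin/bash
# count_entries DIR: print "files=N dirs=M" for the direct children of DIR,
# including hidden ones. (Hypothetical helper for illustration.)
count_entries() {
    local dir=$1 f files=0 dirs=0
    # the second and third patterns pick up dotfiles but never . or ..
    for f in "$dir"/* "$dir"/.[!.]* "$dir"/..?*; do
        if [ -d "$f" ]; then
            dirs=$(( dirs + 1 ))
        elif [ -f "$f" ]; then
            files=$(( files + 1 ))
        fi
        # unmatched patterns stay literal, fail both tests, and are not counted
    done
    echo "files=$files dirs=$dirs"
}
```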

rename numbering within filename using shell

My files have the following pattern:
a0015_random_name.txt
a0016_some_completely_different_name.txt
a0017_and_so_on.txt
...
I would like to rename only the numbering using the shell, so that they are going two numbers down:
a0015_random_name.txt ---> a0013_random_name.txt
a0016_some_completely_different_name.txt ---> a0014_some_completely_different_name.txt
a0017_and_so_on.txt ---> a0015_and_so_on.txt
I've tried already this:
let n=15; for i in *.txt; do let n=n-2; b=`printf a00`$n'*'.txt; echo "mv $i $b"; done
(I use echo first, in order to see what would happen)
but this gave me:
mv a0015_random_name.txt a0013*.txt
mv a0016_some_completely_different_name.txt a0014*.txt
mv a0017_and_so_on.txt a0015*.txt
Also I've tried to find the command, which would set the rest of the name right, but I couldn't find it. Does someone know it, or have a better idea how to do this?
Your code is almost correct. Try this:
let n=15; for i in *.txt; do let t=n-2; b=`echo "$i" | sed "s/^a[0-9]*/a00$t/"`; echo "mv $i $b"; let n=n+1; done
Better yet, to make it more robust, match the exact old number instead of any run of digits:
let n=15; for i in *.txt; do let t=n-2; b=`echo "$i" | sed "s/^a00$n/a00$t/"`; echo "mv $i $b"; let n=n+1; done
If you have the Perl rename.pl script, this is a one-liner:
rename 's/\d+/sprintf "%0${\(length $&)}d", $&-2/e' *.txt
Otherwise, it's a bit wordier. Here's one way:
for f in *.txt; do
number=$(expr "$f" : '^[^0-9]*\([0-9]*\)') # extract the first number from the filename
prefix=${f%%$number*} # remember the part before
suffix=${f#*$number} # and after the number
let n=10#$number-2 # subtract 2
nf=$(printf "%s%0${#number}d%s" \
"$prefix" "$n" "$suffix") # build new filename
echo "mv '$f' '$nf'" # echo the rename command
# mv "$f" "$nf" # uncomment to actually do the rename
done
Note the 10# on the let line - that forces the number to be interpreted in base 10 even if it has leading zeroes, which would otherwise cause it to be interpreted in base 8. Also, the %0${#number}d format tells printf to format the new number with enough leading zeroes to be the same length as the original number.
On your example, the above script produces this output:
mv 'a0015_random_name.txt' 'a0013_random_name.txt'
mv 'a0016_some_completely_different_name.txt' 'a0014_some_completely_different_name.txt'
mv 'a0017_and_so_on.txt' 'a0015_and_so_on.txt'
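The base-10 and zero-padding details can be tried in isolation. The helper below is a small sketch (shift_number is a made-up name) that applies the same 10# prefix and %0Nd format to a single number.

```bash
#!/bin/bash
# shift_number NUMBER DELTA: subtract DELTA while keeping NUMBER's zero padding.
# (Hypothetical helper for illustration.)
shift_number() {
    local number=$1 delta=$2
    # 10# forces base 10; without it, 0015 would be read as octal (decimal 13)
    # ${#number} is the original width, so the padding is preserved
    printf "%0${#number}d\n" "$(( 10#$number - delta ))"
}
```

Here shift_number 0015 2 prints 0013, whereas echo $(( 0015 - 2 )) prints 11 because the leading zero makes bash read 0015 as octal.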

How can I get my bash script to work?

My bash script doesn't work the way I want it to:
#!/bin/bash
total="0"
count="0"
#FILE="$1" This is the easier way
for FILE in $*
do
# Start processing all processable files
while read line
do
if [[ "$line" =~ ^Total ]];
then
tmp=$(echo $line | cut -d':' -f2)
count=$(expr $count + 1)
total=$(expr $total + $tmp)
fi
done < $FILE
done
echo "The Total Is: $total"
echo "$FILE"
Is there another way to modify this script so that it reads arguments into $1 instead of $FILE? I've tried using a while loop:
while [ $1 != "" ]
do ....
done
Also when I implement that the code repeats itself. Is there a way to fix that as well?
Another problem that I'm having is that when I have multiple files hi*.txt it gives me duplicates. Why? I have files like hi1.txt hi1.txt~ but the tilde file is of 0 bytes, so my script shouldn't be finding anything.
What I have is fine, but could be improved. I appreciate your awk suggestions, but awk is currently beyond my level as a Unix programmer.
Strager: The files that my text editor generates automatically contain nothing; they are 0 bytes. But yeah, I went ahead and deleted them just to be sure. But no, my script is in fact reading everything twice. I suppose it's looping again when it really shouldn't. I've tried to silence that with exit commands, but wasn't successful.
while [ "$1" != "" ]; do
# Code here
# Next argument
shift
done
This code is pretty sweet, but I'm specifying all the possible commands at one time. Example: hi[145].txt
If supplied would read all three files at once.
Suppose the user enters hi*.txt;
I then get all my hi files read twice and then added again.
How can I code it so that it reads my files (just once) upon specification of hi*.txt?
I really think that this is because of not having $1.
It looks like you are trying to add up the totals from the lines labelled 'Total:' in the files provided. It is always a good idea to state what you're trying to do - as well as how you're trying to do it (see How to Ask Questions the Smart Way).
If so, then you're doing it in about as complicated a way as I can see. What was wrong with:
grep '^Total:' "$@" |
cut -d: -f2 |
awk '{sum += $1}
END { print sum }'
This doesn't print out "The total is" etc; and it is not clear why you echo $FILE at the end of your version.
You can use Perl or any other suitable program in place of awk; you could do the whole job in Perl or Python - indeed, the cut work could be done by awk:
grep "^Total:" "$@" |
awk -F: '{sum += $2}
END { print sum }'
Taken still further, the whole job could be done by awk:
awk -F: '$1 ~ /^Total/ { sum += $2 }
END { print sum }' "$@"
The code in Perl wouldn't be much harder and the result might be quicker:
perl -na -F: -e '$sum += $F[1] if m/^Total:/; END { print $sum; }' "$@"
When iterating over the file name arguments provided in a shell script, you should use '"$@"' in place of '$*' as the latter notation does not preserve spaces in file names.
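The difference is easy to see by counting loop iterations. In this sketch (the two function names are invented for the demonstration, and the default IFS is assumed), the same arguments are looped over both ways:

```bash
#!/bin/bash
# count_quoted ARGS...: number of iterations when looping over "$@".
count_quoted() {
    local n=0 f
    for f in "$@"; do n=$(( n + 1 )); done      # each argument stays intact
    echo "$n"
}

# count_unquoted ARGS...: number of iterations when looping over $*.
count_unquoted() {
    local n=0 f
    for f in $*; do n=$(( n + 1 )); done        # re-split on whitespace
    echo "$n"
}
```

With the arguments "my file.txt" and other.txt, count_quoted prints 2 and count_unquoted prints 3, because the unquoted form breaks the first name at the space.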
Your comment about '$1' is confusing to me. You could be asking to read from the file whose name is in $1 on each iteration; that is done using:
while [ $# -gt 0 ]
do
...process $1...
shift
done
HTH!
If you define a function, it'll receive the argument as $1. Why is $1 more valuable to you than $FILE, though?
#!/bin/sh
process() {
echo "doing something with $1"
}
for i in "$@" # Note use of "$@" to not break on filenames with whitespace
do
process "$i"
done
while [ "$1" != "" ]; do
# Code here
# Next argument
shift
done
On your problem with tilde files ... those are temporary files created by your text editor. Delete them if you don't want them to be matched by your glob expression (wildcard). Otherwise, filter them in your script (not recommended).
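For readers who would rather stay in plain shell than switch to awk, the summation can be sketched as a function that visits each file argument exactly once. The name sum_totals is made up here, and the sketch assumes every matching line looks like Total: followed by a plain integer.

```bash
#!/bin/bash
# sum_totals FILE...: add up the numbers on lines starting with "Total:".
# Assumes lines of the form "Total: <integer>" (an assumption of this sketch).
sum_totals() {
    local total=0 file line value
    for file in "$@"; do                       # "$@" keeps each file name intact
        # "|| [ -n ... ]" also handles a last line without a trailing newline
        while IFS= read -r line || [ -n "$line" ]; do
            case $line in
                Total:*)
                    value=${line#Total:}       # keep what follows the colon
                    total=$(( total + value )) # shell arithmetic instead of expr
                    ;;
            esac
        done < "$file"
    done
    echo "The Total Is: $total"
}
```

It would be called as sum_totals hi*.txt; empty files such as editor backups simply contribute nothing to the sum.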
