Find duplicated nested directories - bash

I have a large directory tree where some (but not all) nested directories are duplicated:
data/home/home/
data/banners/banners/
resources/users/documents/documents/
How can I merge only the duplicated directories, with these actions:
copy (without replacing) the contents of data/home/home/ to data/home/
delete data/home/home
My current code:
#!/bin/bash
for folder in $(find httpdocs -type d); do
    n=$(echo "$folder" | tr "/" "\n" | wc -l)
    nuniq=$(echo "$folder" | tr "/" "\n" | sort | uniq | wc -l)
    [ "$n" -eq "$nuniq" ] || echo "Duplicated folder $folder"
done
But I have a problem: data/home/es/home is a valid folder, but it is detected as duplicated.
Thanks.

You can use the uniq command as below:
#!/bin/bash
for folder in $(find httpdocs -type d); do
    nuniq=$(echo "$folder" | tr "/" "\n" | uniq -d | wc -l)
    if [ "$nuniq" -gt 0 ]
    then
        echo "Duplicated folder $folder"
    fi
done
From man uniq:
-d, --repeated
only print duplicate lines
You can try the following script for the copy-and-delete step. I cannot test this, so take a backup of your httpdocs folder before running it.
#!/bin/bash
for folder in $(find httpdocs -type d); do
    nuniq=$(echo "$folder" | tr "/" "\n" | uniq -d | wc -l)
    if [ "$nuniq" -gt 0 ]
    then
        dest=$(echo "$folder" | tr '/' '\n' | awk '!a[$0]++' | tr '\n' '/')
        mv -i "$folder"/* "$dest"
        rmdir "$folder"
    fi
done
For example:
user@host $ echo "data/home/es/home" | tr "/" "\n"
data
home
es
home
user@host $ echo "data/home/es/home" | tr "/" "\n" | uniq -d | wc -l
0
user@host $ echo "data/home/home" | tr "/" "\n"
data
home
home
user@host $ echo "data/home/home" | tr "/" "\n" | uniq -d
home
user@host $ echo "data/home/home" | tr "/" "\n" | uniq -d | wc -l
1
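The dest computation from the script above can be checked in isolation. A minimal sketch (the data/home/home path is just the example from the question):

```shell
# Derive the merge destination by dropping repeated path components
# with awk's classic dedup idiom '!seen[$0]++'.
folder="data/home/home"
dest=$(echo "$folder" | tr '/' '\n' | awk '!seen[$0]++' | tr '\n' '/')
echo "$dest"   # data/home/
```

Note that awk '!seen[$0]++' removes every repeated component, not just consecutive ones, so it would also rewrite a path like a/b/c/b; only paths already flagged by the uniq -d guard (consecutive repeats) ever reach it.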

Related

I want my script to echo "$1" into a file literally

This is part of my script:
#!/bin/bash
echo "ls /SomeFolder | grep $1 | xargs cat | grep something | grep .txt | awk '{print $2}' | sed 's/;$//';" >> script2.sh
This echoes everything nicely into my script except $1 and $2. Instead, it outputs the values of those variables, but I want the file to literally read "$1" and "$2". Help?
Escape it:
echo "ls /SomeFolder | grep \$1 | xargs cat | grep something | grep .txt | awk '{print \$2}' | sed 's/;\$//';" >> script2.sh
Quote it:
echo "ls /SomeFolder | grep "'$'"1 | xargs cat | grep something | grep .txt | awk '{print "'$'"2}' | sed 's/;"'$'"//';" >> script2.sh
or like this:
echo 'ls /SomeFolder | grep $1 | xargs cat | grep something | grep .txt | awk '\''{print $2}'\'' | sed '\''s/;$//'\'';' >> script2.sh
Use quoted here document:
cat << 'EOF' >> script2.sh
ls /SomeFolder | grep $1 | xargs cat | grep something | grep .txt | awk '{print $2}' | sed 's/;$//';
EOF
Basically you want to prevent expansion, i.e. take the string literally. You may want to read the BashFAQ page on quotes.
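To see the effect of the quoted here-document delimiter, here is a small sketch (the /tmp path and the one-line script body are made up for the demo):

```shell
# With the delimiter quoted ('EOF'), $1 is written literally; with a
# bare EOF it would be expanded (to an empty string here) first.
cat << 'EOF' > /tmp/demo_literal.sh
echo "$1"
EOF
cat /tmp/demo_literal.sh   # prints: echo "$1"
```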
First, you'd never write this (see https://mywiki.wooledge.org/ParsingLs and http://porkmail.org/era/unix/award.html, and note that you don't need greps+seds+pipes when you're using awk):
ls /SomeFolder | grep $1 | xargs cat | grep something | grep .txt | awk '{print $2}' | sed 's/;$//'
you'd write this instead:
find /SomeFolder -mindepth 1 -maxdepth 1 -type f -name "*$1*" -exec \
awk '/something/ && /.txt/{sub(/;$/,"",$2); print $2}' {} +
or if you prefer using print | xargs instead of -exec:
find /SomeFolder -mindepth 1 -maxdepth 1 -type f -name "*$1*" -print0 |
xargs -0 awk '/something/ && /.txt/{sub(/;$/,"",$2); print $2}'
and now to append that script to a file would be:
cat <<'EOF' >> script2.sh
find /SomeFolder -mindepth 1 -maxdepth 1 -type f -name "*$1*" -print0 |
xargs -0 awk '/something/ && /.txt/{sub(/;$/,"",$2); print $2}'
EOF
Btw, if you want the . in .txt to be treated literally instead of as a regexp metachar meaning "any character", you should use \.txt instead of .txt.

Getting file size in bytes with bash (Ubuntu)

Hi, I'm looking for a way to output a file size in bytes. Whatever I try, I get either 96 or 96k instead of 96000.
if [[ -d $1 ]]; then
largestN=$(find "$1" -depth -type f | tr '\n' '\0' | du -s --files0-from=- | sort | tail -n 1 | awk '{print $2}')
largestS=$(find "$1" -depth -type f | tr '\n' '\0' | du -h --files0-from=- | sort | tail -n 1 | awk '{print $1}')
echo "The largest file is $largestN which is $largestS bytes."
else
echo "$1 is not a directory..."
fi
This prints "The largest file [file] is 96k bytes"
There is a -b option for this:
$ du -b ...
Looks like you're trying to find the largest file in a given directory. It's more efficient (and shorter) to let find do the heavy lifting for you:
find "$1" -type f -printf '%s %p\n' | sort -n | tail -n1
Here, %s expands to the size in bytes of the file, and %p expands to the name of the file.
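On GNU/Linux (which includes Ubuntu), stat is another common way to get an exact byte count; a minimal sketch with a throwaway file:

```shell
# stat -c %s prints the file size in bytes (GNU coreutils).
printf 'hello\n' > /tmp/demo_size.txt
stat -c %s /tmp/demo_size.txt   # 6
```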

unix home directories without entries in /etc/passwd

I am able to get both listings (/etc/passwd and /home), but how do I script something like: read a line of /etc/passwd, parse the home directory, then look for it in /home. If it doesn't exist, throw an error; if it does exist, move along.
/etc/passwd home dir listing for users
cut -d":" -f6 /etc/passwd | grep home | sort
user listing from /home
ls -1 /home | (while read line; do echo "/home/"$line; done)
Maybe write the output from the first command to a file, then read each line into a find command and... or, test with:
if [ -d "$DIRECTORY" ]; then
echo "directory found for user that doesn't exist"
fi
Now how to put it all together...
EDIT: isedev had exactly what I needed. I may have mis-worded my original message... we have been cleaning up users, but not cleaning up their /home directories. So I want to know which /home directories still exist that don't have /etc/passwd entries.
This is what worked to a T:
for name in /home/*; do
if [ -d "$name" ]; then
cut -d':' -f6 /etc/passwd | egrep -q "^$name$"
if [ $? -ne 0 ]; then
echo "directory $name does not correspond to a valid user"
fi
fi
done
From now on, we will be running:
userdel -r login
This will report all home directories from /etc/passwd that should be in /home but aren't:
cut -d":" -f6 /etc/passwd | grep home | sort |
while read dir; do [ -e "$dir" ] || echo Missing $dir; done
And this one reports all that don't exist:
cut -d":" -f6 /etc/passwd | while read dir; do
[ -e "$dir" ] || echo Missing $dir
done
As a first approximation:
perl -F: -lane 'next if m/^#/;print "$F[5] for user $F[0] missing\n" unless(-d $F[5])' /etc/passwd
If you want to find the differences between /etc/passwd and /home:
comm <(find /home -type d -maxdepth 1 -mindepth 1 -print|sort) <(grep -v '^#' /etc/passwd | cut -d: -f6| grep '/home' | sort)
Or, in a narrower form:
comm <(
find /home -type d -maxdepth 1 -mindepth 1 -print |sort
) <(
grep -v '^#' /etc/passwd |cut -d: -f6 |grep /home |sort
)
Depending on the arguments:
comm ... (without flags, as above) will show 3 columns: 1) only in /home, 2) only in /etc/passwd, 3) common to both
comm -23 ... will show directories that are only in /home (and not in /etc/passwd)
comm -13 ... will show directories that are only in /etc/passwd and not in /home
comm -12 ... will show correct directories (existing in both /etc/passwd and /home)
I'm not sure whether -maxdepth/-mindepth are supported by find on AIX.
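Here is a tiny sketch of those comm flags using made-up file lists in place of the real /home listing and /etc/passwd fields (comm requires sorted input):

```shell
# Hypothetical listings: /home contains alice and bob; passwd knows
# about alice and carol. Both inputs are already sorted.
printf '/home/alice\n/home/bob\n'   > /tmp/demo_home
printf '/home/alice\n/home/carol\n' > /tmp/demo_passwd
comm -23 /tmp/demo_home /tmp/demo_passwd   # /home/bob   (dir without user)
comm -13 /tmp/demo_home /tmp/demo_passwd   # /home/carol (user without dir)
comm -12 /tmp/demo_home /tmp/demo_passwd   # /home/alice (in both)
```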
So, assuming you want to know if there are directories under /home which do not correspond to existing users:
for name in /home/*; do
if [ -d "$name" ]; then
cut -d':' -f6 /etc/passwd | egrep -q "^$name$"
if [ $? -ne 0 ]; then
echo "directory $name does not correspond to a valid user"
fi
fi
done
Then again, this assumes you are not using a name service such as LDAP or NIS, in which case, change the line starting with cut to:
getent passwd | cut -d':' -f6 | egrep -q "^$name$"

Find TXT files and show Total Count of records of each file and Size of each file

I need to find the row count and size of each TXT file.
It needs to search all the directories and show the result as:
FileName|Cnt|Size
ABC.TXT|230|23MB
Here is some code:
v_DIR=$1
echo "the directory to cd is "$1
x=`ls -l $0 | awk '{print $9 "|" $5}'`
y=`awk 'END {print NR}' $0`
echo $x '|' $y
Try something like
find -type f -name '*.txt' -exec bash -c 'lines=$(wc -l "$0" | cut -d " " -f1); size=$(du -h "$0" | cut -f1); echo "$0|$lines|$size"' {} \;
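The per-file body of that -exec can be tried on its own; a sketch with an invented sample file (the du column depends on the filesystem block size, so only the line count is predictable):

```shell
# Count lines with wc -l, get a human-readable size with du -h,
# then join them in the FileName|Cnt|Size format.
printf 'row1\nrow2\nrow3\n' > /tmp/ABC.TXT
lines=$(wc -l < /tmp/ABC.TXT)
size=$(du -h /tmp/ABC.TXT | cut -f1)
echo "ABC.TXT|$lines|$size"
```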

Bash scripting: Deleting the oldest directory

I want to look for the oldest directory (inside a directory) and delete it. I am using the following:
rm -R $(ls -1t | tail -1)
ls -1t | tail -1 does indeed give me the oldest directory, but the problem is that it is not deleting the directory, and that it also lists files.
How can I fix that?
rm -R "$(find . -maxdepth 1 -type d -printf '%T@\t%p\n' | sort -r | tail -n 1 | sed 's/[0-9]*\.[0-9]*\t//')"
This also works with directories whose names contain spaces or tabs, or start with a "-".
This is not pretty, but it works:
rm -R $(ls -lt | grep '^d' | tail -1 | tr " " "\n" | tail -1)
rm -R $(ls -tl | grep '^d' | tail -1 | cut -d' ' -f8)
find directory_name -type d -printf "%TY%Tm%Td%TH%TM%TS %p\n" | sort -nr | tail -1 | cut -d" " -f2 | xargs -n1 echo rm -Rf
You should remove the echo before the rm once you've confirmed it produces the right results.
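The timestamp-based selection can be sketched safely with echo first; the demo directories and the /tmp/demo_age path are made up, and %T@ (seconds since the epoch) assumes GNU find:

```shell
# Select the oldest subdirectory by mtime: numeric-sort on %T@ and
# take the first (smallest, i.e. oldest) entry.
mkdir -p /tmp/demo_age/old_dir /tmp/demo_age/new_dir
touch -d '2001-01-01' /tmp/demo_age/old_dir
touch -d '2020-01-01' /tmp/demo_age/new_dir
oldest=$(find /tmp/demo_age -mindepth 1 -maxdepth 1 -type d -printf '%T@\t%p\n' |
         sort -n | head -n 1 | cut -f2-)
echo "would remove: $oldest"   # /tmp/demo_age/old_dir
```

Once the echo prints the right directory, swap it for rm -R -- "$oldest".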
