How to segregate files based on recursive grep - bash

I have a directory with sub-directories, each containing some text files.
main-dir
|
sub-dir1
| file1 "foo"
|
sub-dir2
| file2 "bar"
|
sub-dir3
| file3 "foo"
The files file1 and file3 contain the same text. I want to segregate these sub-directories based on the content of their files: I would like to group sub-dir1 and sub-dir3, since the files in those sub-dirs have the same content. In this example, that means moving sub-dir1 and sub-dir3 to another directory.
Using grep in recursive mode lists all files whose content matches. How can I make use of that output?
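For reference, grep -rl prints one matching file path per line, so with the layout above its output would look something like:
$ grep -rl "foo" .
./sub-dir1/file1
./sub-dir3/file3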

Your solution could be simplified to this:
for dir in *; do
    if grep "foo" "$dir/file1" >/dev/null; then
        cp -rf "$dir" "$HOME_PATH/newdir/"
    fi
done
but it will work only when every directory actually contains a file named file1.
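A hedged variant of the same loop that checks for the file first, so directories without a file1 are skipped instead of producing grep errors:
for dir in *; do
    if [ -f "$dir/file1" ] && grep "foo" "$dir/file1" >/dev/null; then
        cp -rf "$dir" "$HOME_PATH/newdir/"
    fi
done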
Something like this:
grep -rl "foo" * | sed -r 's|(.*)/.*|\1|' | sort -u | while read dir; do
    cp -rf "$dir" "$HOME_PATH/newdir/"
done
or like this:
grep -rl "foo" * | while read f; do
    dirname "$f"
done | sort -u | while read dir; do
    cp -rf "$dir" "$HOME_PATH/newdir/"
done
or like this:
find . -type f -exec grep -l "foo" {} \; | xargs -I {} dirname {} | sort -u |
while read dir; do
    cp -rf "$dir" "$HOME_PATH/newdir/"
done
might be better.
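If file or directory names may contain spaces, a NUL-delimited sketch (assuming GNU grep's -Z option) is safer:
grep -rlZ "foo" . | while IFS= read -r -d '' f; do
    dirname "$f"
done | sort -u | while IFS= read -r dir; do
    cp -rf "$dir" "$HOME_PATH/newdir/"
done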

I managed to write this script, which solves my problem.
PWD=$(pwd)
FILES=$PWD/*
for f in $FILES
do
    str=$(cat "$f/file1")
    if [ "$str" == "foo" ]
    then
        cp -rf "$f" "$HOME_PATH/newdir/"
    fi
done

Related

moving files to their respective folders using bash scripting

I have files in this format:
2022-03-5344-REQUEST.jpg
2022-03-5344-IMAGE.jpg
2022-03-5344-00imgtest.jpg
2022-03-5344-anotherone.JPG
2022-03-5343-kdijffj.JPG
2022-03-5343-zslkjfs.jpg
2022-03-5343-myimage-2010.jpg
2022-03-5343-anotherone.png
2022-03-5342-ebee5654.jpeg
2022-03-5342-dec.jpg
2022-03-5341-att.jpg
2022-03-5341-timephoto_december.jpeg
....
about 13k images like these.
I want to create folders like:
2022-03-5344/
2022-03-5343/
2022-03-5342/
2022-03-5341/
....
I started manually moving them like:
mkdir name
mv name-* name/
But of course I'm not gonna repeat this process for 13k files.
So I want to do this with a bash script, and since I am new to bash and working on a production environment, I want to play it safe. But my script doesn't give me the results I want. This is what I have so far:
#!/bin/bash
name = $1
mkdir "$name"
mv "${name}-*" $name/
All I can do is run ./move.sh name for every folder; I didn't know how to automate this using loops.
With bash and a regex. I assume that the files are all in the current directory.
for name in *; do
    if [[ "$name" =~ (^....-..-....)- ]]; then
        dir="${BASH_REMATCH[1]}"   # dir contains 2022-03-5344, e.g.
        echo mkdir -p "$dir" || exit 1
        echo mv -v "$name" "$dir"
    fi
done
If the output looks okay, remove both echo commands.
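With the filenames above, the dry run should print pairs of lines along these lines:
mkdir -p 2022-03-5344
mv -v 2022-03-5344-REQUEST.jpg 2022-03-5344
mkdir -p 2022-03-5343
mv -v 2022-03-5343-kdijffj.JPG 2022-03-5343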
Try this
xargs -i sh -c 'mkdir -p {}; mv {}-* {}' < <(ls *-*-*-*|awk -F- -vOFS=- '{print $1,$2,$3}'|uniq)
Or:
find . -maxdepth 1 -type f -name "*-*-*-*" | \
awk -F- -vOFS=- '{print $1,$2,$3}' | \
sort -u | \
xargs -i sh -c 'mkdir -p {}; mv {}-* {}'
Or find with regex:
find . -maxdepth 1 -type f -regextype posix-extended -regex ".*/[0-9]{4}-[0-9]{2}-[0-9]{4}.*"
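That command only lists the matching files; a hedged way to complete it is to reuse the same grouping pipeline on its output:
find . -maxdepth 1 -type f -regextype posix-extended \
    -regex ".*/[0-9]{4}-[0-9]{2}-[0-9]{4}.*" -printf '%f\n' |
    awk -F- -vOFS=- '{print $1,$2,$3}' | sort -u |
    xargs -i sh -c 'mkdir -p {}; mv {}-* {}'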
You could use awk
$ cat awk.script
/^[[:digit:]-]/ && ! a[$1]++ {
    dir=$1
}
/^[[:digit:]-]/ {
    system("sudo mkdir " dir)
    system("sudo mv " $0 " " dir "/" $0)
}
To call the script for your purposes:
$ awk -F"-([0-9]+)?[[:alpha:]]+.*" -f awk.script <(ls)
You will see some errors such as:
mkdir: cannot create directory ‘2022-03-5341’: File exists
after the initial dir has been created. You can safely ignore these, as the dir already exists.
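A hedged tweak: replacing mkdir with mkdir -p in the second rule makes the call idempotent, so those errors never appear in the first place:
/^[[:digit:]-]/ {
    system("sudo mkdir -p " dir)
    system("sudo mv " $0 " " dir "/" $0)
}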
Each directory will now contain the relevant files:
$ ls 2022-03-5344
2022-03-5344-00imgtest.jpg 2022-03-5344-IMAGE.jpg 2022-03-5344-REQUEST.jpg 2022-03-5344-anotherone.JPG

Rename files to unique names and move them into a single destination directory

I have hundreds of directories, each containing a file named content.html along with other files.
I am trying to copy all these content.html files under one directory, but since they have the same name, they overwrite each other.
So how can I rename and move all of them under one directory?
Eg:
./0BD3D9D2-F8B1-4472-95C2-13319650A45C:
card.png content.html note.xhtml quickLook.png snippet.txt
./0EA34DB4-CD56-42BE-91DA-F631E44FB6E0:
card.png content.html note.xhtml quickLook.png related snippet.txt
./1A33F29E-3938-4C2F-BA99-6B98FD045742:
card.png content.html note.xhtml quickLook.png snippet.txt
Commands I tried:
Rename content.html to content:
find . -type f | grep content.html | while read f; do mv $f ${f/.html/}; done
Append a number to the filename "content" to make it unique:
find . -type f | grep content | while read f; do i=1; echo mv $f $f$i.html; i=i+1; done
MacBook-Pro$ find . -type f | grep content | while read f; do i=1; echo mv $f $f$i.html; i=i+1; done
mv ./0BD3D9D2-F8B1-4472-95C2-13319650A45C/content ./0BD3D9D2-F8B1-4472-95C2-13319650A45C/content1.html
mv ./0EA34DB4-CD56-42BE-91DA-F631E44FB6E0/content ./0EA34DB4-CD56-42BE-91DA-F631E44FB6E0/content1.html
mv ./1A33F29E-3938-4C2F-BA99-6B98FD045742/content ./1A33F29E-3938-4C2F-BA99-6B98FD045742/content1.html
Once the above step is successful, I should be able to do this to achieve my desired output:
find . -type f | grep content | while read f; do mv $f ../; done
However, I am sure I can do this in a single command, and my step 2 is not working (incrementing i).
Any idea why step 2 is not working?
bash script:
#!/bin/bash
find . -type f -name content.html | while IFS= read -r f; do
    name=$(basename "$f")
    ((++i))
    mv "$f" "for_content/${name%.*}$i.html"
done
Replace for_content with your destination folder name.
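One assumed prerequisite: the destination folder must exist before the loop runs, e.g.:
mkdir -p for_content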
Suppose that in your base directory you create a folder named final for storing the content.html files; then do something like below:
find . -path ./final -prune -o -name "content.html" -print0 |
while IFS= read -r -d '' name
do
    mv "$name" "./final/content$(mktemp -u XXXX).html"
    # mktemp with the -u option just generates random characters (a dry run); no file is created
done
At the end you'll get all the content.html files under the ./final folder in the format contentXXXX.html, where XXXX are random characters.
Note: the -path ./final -prune -o part prevents find from descending into our results folder.
The inode of each file should be unique, so you could use the following:
find $(pwd) -name "content.html" -printf %f" "%i" "%p"\n" | awk '{ system("mv "$3" <directorytomoveto>"$2$1) }'
I'd use something like this:
find . -type f -name 'test' | awk 'BEGIN{ cnt=0 }{ printf "mv %s ./output-dir/content_%03d.txt\n", $0, cnt++ }' | bash;
You can replace ./output-dir/ with your destination directory
Example:
[root@sl7-o2 test]# ls -R
.:
1 2 3 output-dir
./1:
test
./2:
test
./3:
test
./output-dir:
[root@sl7-o2 test]# find . -type f -name 'test' | awk 'BEGIN{ cnt=0 }{ printf "mv %s ./output-dir/content_%03d.txt\n", $0, cnt++ }' | bash;
[root@sl7-o2 test]# ls ./output-dir/
content_000.txt content_001.txt content_002.txt
You can use shopt -s globstar to grab all content.html files recursively and then use a loop to rename them:
#!/bin/bash
shopt -s globstar
counter=0
dest_dir=/path/to/destination
for f in **/content.html; do   # pick up all content.html files recursively
    [[ -f "$f" ]] || continue  # skip if not a regular file
    mv "$f" "$dest_dir/content_$((++counter)).html"
done

Bash : Find and Remove duplicate files from different folders

I have two folders with some common files, and I want to delete the duplicate files from the xyz folder.
folder1:
/abc/file1.csv
/abc/file2.csv
/abc/file3.csv
/abc/file4.csv
folder2:
/xyz/file1.csv
/xyz/file5.csv
I want to compare both folders and remove the duplicates from the /xyz folder, so that only file5.csv remains.
For now I am using:
find "/xyz" "/abc" "/abc" -printf '%P\n' | sort | uniq -u | -exec rm {} \;
But it is failing with:
command-not-found -exec
-bash: -exec: command not found
-exec is an option to find; you have already exited the find command by the time you start the pipes.
Try xargs instead: it takes the data from stdin and appends it to the given command.
UNTESTED
find "/xyz" "/abc" "/abc" -printf '%P\n' | sort | uniq -u | xargs rm
Find every file in the ./234 and ./123 directories, get the bare filenames with -printf, sort them, let uniq -d list the duplicates, restore the path with sed (using ./123 as the directory to delete the duplicates from), and pass the files to xargs rm.
Command:
find ./234 ./123 -type f -printf '%P\n' | sort | uniq -d | sed 's/^/.\/123\//g' | xargs rm
sed is not needed if you are inside the ./123 directory and use full paths for the folders in find.
Another approach: just find the files in abc and attempt to remove them from xyz:
UNTESTED
find /abc -type f -printf 'rm -f /xyz/%P\n' | sh
Remove Duplicate Files From Particular Directory
FileList=$(ls)
for D1 in $FileList; do
    if [[ -f $D1 ]]; then
        for D2 in $FileList; do
            if [[ -f $D2 ]]; then
                if [[ $D1 == $D2 ]]; then
                    : 'Skip Original File'
                else
                    if [[ $(md5sum "$D1" | cut -d ' ' -f 1) == $(md5sum "$D2" | cut -d ' ' -f 1) ]]; then
                        echo "Duplicate File Found : $D2"
                        rm -f "$D2"
                    fi # Detect duplicate using MD5
                fi # Skip original file
            fi # D2 file exists, then next
        done
    fi # D1 file exists, then next
done
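For large directories, a hedged single-pass sketch keyed on the checksum in a bash 4 associative array avoids the quadratic pairwise comparison (assumes GNU md5sum):
#!/bin/bash
declare -A seen                      # checksum -> first file seen with it
for f in *; do
    [[ -f $f ]] || continue          # skip non-regular files
    sum=$(md5sum "$f" | cut -d ' ' -f 1)
    if [[ -n ${seen[$sum]} ]]; then
        echo "Duplicate File Found : $f"
        rm -f -- "$f"                # keep the first occurrence, drop the rest
    else
        seen[$sum]=$f
    fi
done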

Move files from directories listed in file

I have a directory structure like the following toy example:
DirectoryTo
DirectoryFrom
-Dir1
---File1.txt
---File2.txt
---File3.txt
-Dir2
---File4.txt
---File5.txt
---File6.txt
-Dir3
---File1.txt
---File5.txt
---File7.txt
I'm trying to copy all the files from DirectoryFrom to DirectoryTo, keeping the newer file if there are duplicates.
DirectoryTo
-File1.txt
-File2.txt
-File3.txt
-File4.txt
-File5.txt
-File6.txt
-File7.txt
DirectoryFrom
-Dir1
---File1.txt
---File2.txt
---File3.txt
-Dir2
---File4.txt
---File5.txt
---File6.txt
-Dir3
---File1.txt
---File5.txt
---File7.txt
I've created a text file with a list of all the subdirectories. This list is in the order such that the NEWEST files will be listed first:
Filelist.txt
C:/DirectoryFrom/Dir1
C:/DirectoryFrom/Dir2
C:/DirectoryFrom/Dir3
So what I'd like to do is loop through each directory in Filelist.txt, copy the files, and NOT replace if the file already exists.
I'd like to do this at the command line, in a shell script, or possibly in Python. I'm pretty new to Python, but have a little experience with the command line. However, I've never done something this complicated.
In reality, I have ~60 folders, each with 50-200 files in them, to give you a feel for how many I have. Also, each file is ~75MB.
I've done something similar in R before, but it's slow and not really meant for this. But here's what I've tried for a shell script, edited to fit this toy example:
#!/bin/bash
for line in Filelist.txt
do
cp -n line C:/DirectoryTo/
done
If you have only one directory level in your DirectoryFrom, then you can use:
cp -n DirectoryFrom/*/* DirectoryTo
Explanation: copy every file that exists in a subdirectory of DirectoryFrom to DirectoryTo if it doesn't already exist there.
The -n flag prevents overwriting files that already exist.
cp will also skip directories that exist inside the subdirectories of DirectoryFrom.
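If the newest-first order of Filelist.txt must be honored exactly, a hedged loop over the list preserves that guarantee, because cp -n never overwrites what an earlier (newer) directory has already supplied:
while IFS= read -r dir; do
    cp -n "$dir"/* C:/DirectoryTo/
done < Filelist.txt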
# Create the test environment:
mkdir C:/DirectoryTo
mkdir C:/DirectoryFrom
cd C:/DirectoryFrom
mkdir Dir1 Dir2 Dir3
(
cat << EOF
Dir1/File1.txt
Dir1/File2.txt
Dir1/File3.txt
Dir2/File4.txt
Dir2/File5.txt
Dir2/File6.txt
Dir3/File1.txt
Dir3/File5.txt
Dir3/File7.txt
EOF
) | while read f
do
    echo "$f : `date`"
    echo "$f : `date`" > "$f"
    sleep 1
done
# Create the Filelist.txt file:
(
cat << EOF
C:/DirectoryFrom/Dir1
C:/DirectoryFrom/Dir2
C:/DirectoryFrom/Dir3
EOF
) > Filelist.txt
# Generate the list of all filenames:
cd C:/DirectoryFrom
cat Filelist.txt | while read f; do ls -1 $f; done | sort -u > filenames.txt
cat filenames.txt
# List all file paths, sorted by modification time:
cd C:/DirectoryFrom
ls -1tr */* > all_filespath_sorted.txt
cat all_filespath_sorted.txt
# Select the files to be copied:
cat filenames.txt | while read f; do cat all_filespath_sorted.txt | grep $f | tail -1 ; done
# Copy the selected files:
cat filenames.txt | while read f; do cat all_filespath_sorted.txt | grep $f | tail -1 ; done | while read c
do
    echo "$c"
    cp -p "$c" C:/DirectoryTo
done
# Verify the result:
cd C:/DirectoryTo
ls -ltr
# or
ls -1 | while read f; do echo -e "\n$f\n-------"; cat $f; done
#------------------------------------------------
# Other solution, for a limited number of files:
#------------------------------------------------
# To list the files in time order:
find `cat Filelist.txt | xargs` -type f | xargs ls -1tr
# To copy the files (the newer will replace the older):
find `cat Filelist.txt | xargs` -type f | xargs ls -1tr | while read c
do
    echo "$c"
    cp -p "$c" C:/DirectoryTo
done

BASH: How to remove all files except those named in a manifest?

I have a manifest file which is just a list of newline separated filenames. How can I remove all files that are not named in the manifest from a folder?
I've tried to build a find ./ ! -name "filename" command dynamically:
command="find ./ ! -name \"MANIFEST\" "
for line in `cat MANIFEST`; do
command=${command}"! -name \"${line}\" "
done
command=${command} -exec echo {} \;
$command
But the files remain.
[Note:] I know this uses echo. I want to check what my command does before using it.
Solution (thanks PixelBeat):
ls -1 > ALLFILES
sort MANIFEST MANIFEST ALLFILES | uniq -u | xargs rm
Without temp file:
ls -1 | sort MANIFEST MANIFEST - | uniq -u | xargs rm
Both ignore whether the files are sorted or not.
For each file in the current directory, grep for the filename in the MANIFEST file and rm the file if it is not matched:
for file in *
do grep -q -F "$file" PATH_TO_YOUR_MANIFEST || rm "$file"
done
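Note that -F alone matches the filename anywhere inside a manifest line; adding -x (whole-line match) gives a stricter, hedged variant:
for file in *
do grep -qxF -- "$file" PATH_TO_YOUR_MANIFEST || rm -- "$file"
done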
Using the "set difference" pattern from http://www.pixelbeat.org/cmdline.html#sets
(find ./ -type f -printf "%P\n"; cat MANIFEST MANIFEST; echo MANIFEST) |
sort | uniq -u | xargs -r rm
Note I list MANIFEST twice in case there are files listed there that are not actually present.
Also note the above supports files in subdirectories
figured it out:
ls -1 > ALLFILES
comm -3 MANIFEST ALLFILES | xargs rm
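comm expects sorted input, so a hedged refinement sorts both lists on the fly, keeps only column 2 (files on disk but not in the manifest), and spares the manifest itself:
comm -13 <(sort MANIFEST) <(ls -1 | sort) | grep -vx MANIFEST | xargs -r rm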
Just for fun, a Perl 1-liner... not really needed in this case but much more customizable/extensible than Bash if you want something fancier :)
$ ls
1 2 3 4 5 M
$ cat M
1
3
$ perl -e '{use File::Slurp; %M = map {chomp; $_ => 1} read_file("M"); $M{M}=1;
foreach $f (glob("*")) {next if $M{$f}; unlink "$f"||die "Can not unlink: $!\n" };}'
$ ls
1 3 M
The above can be even shorter if you pass the manifest on STDIN
perl -e '{%M = map {chomp; $_ => 1} <>; $M{M}=1;
foreach $f (glob("*")) {next if $M{$f};unlink "$f"||die "Can not unlink: $!\n" };}' M
Assumes that MANIFEST is already sorted:
find -type f -printf %P\\n | sort | comm -3 MANIFEST - | xargs rm
