BASH: How to remove all files except those named in a manifest?

I have a manifest file which is just a list of newline separated filenames. How can I remove all files that are not named in the manifest from a folder?
I've tried to build a find ./ ! -name "filename" command dynamically:
command="find ./ ! -name \"MANIFEST\" "
for line in `cat MANIFEST`; do
command=${command}"! -name \"${line}\" "
done
command=${command} -exec echo {} \;
$command
But the files remain.
[Note:] I know this uses echo. I want to check what my command does before using it.
Solution:(thanks PixelBeat)
ls -1 > ALLFILES
sort MANIFEST MANIFEST ALLFILES | uniq -u | xargs rm
Without temp file:
ls -1 | sort MANIFEST MANIFEST - | uniq -u | xargs rm
Both variants work whether or not the file lists are sorted.
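To see why feeding MANIFEST to sort twice works, here is a minimal dry-run sketch. The helper name list_unlisted is hypothetical, and it assumes file names without newlines. It only prints the candidates instead of removing them.

```shell
# list_unlisted DIR MANIFEST
# Print the names in DIR that do not appear in MANIFEST (dry run, no rm).
# Hypothetical helper; assumes file names contain no newlines.
list_unlisted() {
    local dir="$1" manifest="$2"
    # The manifest is read twice, so every listed name occurs at least
    # twice and is dropped by `uniq -u`. Names that exist only in the
    # directory occur exactly once and survive.
    ( cd "$dir" && ls -1 ) | sort - "$manifest" "$manifest" | uniq -u
}
```

Once the printed list looks right, its output can be piped to xargs rm from inside the directory. Note that a MANIFEST stored in that same directory is itself reported unless it lists its own name.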

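For each file in the current directory, grep for its name in the MANIFEST file and rm the file if there is no match. The -x option makes grep match whole lines only, so a file named "a" is not kept just because the manifest lists "ab".
for file in *
do grep -qxF -- "$file" PATH_TO_YOUR_MANIFEST || rm -- "$file"
done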
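A dry-run sketch of the same loop, in case you want to check the result before deleting anything. The function name unlisted_exact is made up for illustration; it matches whole manifest lines (grep -x) rather than substrings.

```shell
# unlisted_exact DIR MANIFEST
# Print regular files in DIR whose name is not an exact line of MANIFEST.
# Hypothetical helper for a dry run; nothing is deleted.
unlisted_exact() {
    local dir="$1" manifest="$2" f name
    for f in "$dir"/*; do
        [ -f "$f" ] || continue            # skip directories and an unmatched glob
        name=${f##*/}                      # strip the directory part
        # -x: whole-line match, -F: fixed string, -q: quiet
        grep -qxF -- "$name" "$manifest" || printf '%s\n' "$name"
    done
}
```

With plain grep -F, a file called "a" would count as listed whenever the manifest contains "ab"; the whole-line match avoids that.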

Using the "set difference" pattern from http://www.pixelbeat.org/cmdline.html#sets
(find ./ -type f -printf "%P\n"; cat MANIFEST MANIFEST; echo MANIFEST) |
sort | uniq -u | xargs -r rm
Note I list MANIFEST twice in case there are files listed there that are not actually present.
Also note the above supports files in subdirectories

figured it out:
ls -1 > ALLFILES
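comm -13 MANIFEST ALLFILES | xargs rm
This assumes MANIFEST is sorted in the same order as the ls output. -13 suppresses the lines unique to MANIFEST and the lines common to both, leaving only the names that appear in ALLFILES alone.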

Just for fun, a Perl 1-liner... not really needed in this case but much more customizable/extensible than Bash if you want something fancier :)
$ ls
1 2 3 4 5 M
$ cat M
1
3
$ perl -e '{use File::Slurp; %M = map {chomp; $_ => 1} read_file("M"); $M{M}=1;
foreach $f (glob("*")) {next if $M{$f}; unlink($f) or die "Can not unlink: $!\n" };}'
$ ls
1 3 M
The above can be even shorter if you pass the manifest on STDIN
perl -e '{%M = map {chomp; $_ => 1} <>; $M{M}=1;
foreach $f (glob("*")) {next if $M{$f}; unlink($f) or die "Can not unlink: $!\n" };}' M

Assumes that MANIFEST is already sorted:
find . -type f -printf '%P\n' | sort | comm -13 MANIFEST - | xargs -r rm
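If the manifest is not sorted, one way around that is to sort a copy first. This is only a sketch: unlisted_comm is a hypothetical name, it assumes names without newlines, and it prints rather than deletes.

```shell
# unlisted_comm DIR MANIFEST
# Print names present in DIR but absent from MANIFEST, using comm.
# Hypothetical helper; sorts both inputs itself, so MANIFEST may be unsorted.
unlisted_comm() {
    local dir="$1" manifest="$2" sorted
    sorted=$(mktemp) || return 1
    # comm needs both inputs sorted in the same collation order.
    LC_ALL=C sort "$manifest" > "$sorted"
    # -1 drops lines only in the manifest, -3 drops lines common to both,
    # so only column 2 (names only in the directory) is printed.
    ( cd "$dir" && ls -1 ) | LC_ALL=C sort | LC_ALL=C comm -13 "$sorted" -
    rm -f "$sorted"
}
```

Forcing LC_ALL=C on both sort and comm keeps the two from disagreeing about ordering.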

Related

execute an if statement on every folder

I have for example 3 files (it could be 1 or it could be 30) like this:
name_date1.tgz
name_date2.tgz
name_date3.tgz
When extracted it will look like :
name_date1/data/info/
name_date2/data/info/
name_date3/data/info/
Here how it looks inside each folder:
name_date1/data/info/
you.log
you.log.1.gz
you.log.2.gz
you.log.3.gz
name_date2/data/info/
you.log
name_date3/data/info/
you.log
you.log.1.gz
you.log.2.gz
What I want to do is concatenate all the you.log files in each folder, then concatenate those per-folder results into one single file.
1st step: extract all the folder
for a in *.tgz
do
a_dir=${a%.tgz}
mkdir $a_dir 2>/dev/null
tar -xvzf $a -C $a_dir >/dev/null
done
2nd step: executing an if statement on each folder available and cat everything
myarray=(`find */data/info/ -maxdepth 1 -name "you.log.*.gz"`)
ls -d */ | xargs -I {} bash -c "cd '{}' &&
if [ ${#myarray[@]} -gt 0 ];
then
find data/info -name "you.log.*.gz" -print0 | sort -z -rn -t. -k4 | xargs -0 zcat | cat - data/info/you.log > youfull1.log
else
cat - data/info/you.log > youfull1.log
fi "
cat */youfull1.log > youfull.log
My issue: when I supply multiple name_date*.tgz files, I get this error:
gzip: stdin: unexpected end of file
Despite the error, all my files are still concatenated, so why the error message?
When I supply only one .tgz file, I don't have any issue, regardless of the number of you.log files.
Any suggestions, please?
Try something simpler. No need for myarray. Pass files one at a time as they are inputted and decide what to do with them one at a time. Try:
find */data/info -maxdepth 1 -type f -name "you.log*" -print0 |
sort -z |
xargs -0 -n1 bash -c '
if [[ "${1##*.}" == "gz" ]]; then
zcat "$1";
else
cat "$1";
fi
' --
If you have to iterate over directories, don't use ls, still use find.
find . -maxdepth 1 -type d -name 'name_date*' -print0 |
sort -z |
while IFS= read -r -d '' dir; do
cat "$dir"/data/info/you.log
find "$dir"/data/info -maxdepth 1 -type f -name 'you.log.*.gz' -print0 |
sort -z -t'.' -n -k3 |
xargs -r -0 zcat
done
or (if you have to) with xargs, which should give you the idea how it's used:
find . -maxdepth 1 -type d -name 'name_date*' -print0 |
sort -z |
xargs -0 -n1 bash -c '
cat "$1"/data/info/you.log
find "$1"/data/info -maxdepth 1 -type f -name "you.log.*.gz" -print0 |
sort -z -t"." -n -k3 |
xargs -r -0 zcat
' --
Use -t option with xargs to see what it's doing.
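A likely cause of the gzip message in the question: the script passed to bash -c sits inside double quotes, so ${#myarray[@]} is expanded once by the outer shell and every folder takes the same branch. In a folder without any .gz file, find prints nothing, GNU xargs (without -r) still runs zcat once with no arguments, and zcat then reads an empty stdin. The sketch below avoids xargs altogether. concat_rotated is a hypothetical name, and it follows the order used in the question (highest rotation number first, then you.log).

```shell
# concat_rotated INFO_DIR
# Print the rotated logs of one folder, highest rotation number first,
# followed by the current you.log. Hypothetical helper; assumes paths
# contain no newlines.
concat_rotated() {
    local info_dir="$1" gz n
    for gz in "$info_dir"/you.log.*.gz; do
        [ -f "$gz" ] || continue      # glob matched nothing: no rotated logs
        n=${gz%.gz}                   # drop the .gz suffix
        n=${n##*.}                    # keep only the rotation number
        printf '%s\t%s\n' "$n" "$gz"  # number<TAB>path, for numeric sorting
    done | sort -rn | cut -f2- | while IFS= read -r gz; do
        gzip -dc -- "$gz"             # same as zcat, but portable
    done
    if [ -f "$info_dir/you.log" ]; then
        cat -- "$info_dir/you.log"
    fi
}
```

Something like `for d in name_date*/; do concat_rotated "${d}data/info"; done > youfull.log` would then build the single file, assuming the layout shown in the question.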

Rename files to unique names and move them into a single destination directory

I have hundreds of directories, each containing a file named content.html along with other files.
I am trying to copy all these content.html files into one directory, but since they have the same name, they overwrite each other.
So how can I rename and move them all into one directory?
Eg:
./0BD3D9D2-F8B1-4472-95C2-13319650A45C:
card.png content.html note.xhtml quickLook.png snippet.txt
./0EA34DB4-CD56-42BE-91DA-F631E44FB6E0:
card.png content.html note.xhtml quickLook.png related snippet.txt
./1A33F29E-3938-4C2F-BA99-6B98FD045742:
card.png content.html note.xhtml quickLook.png snippet.txt
command i tried:
rename content.html to content
find . -type f | grep content.html | while read f; do mv $f ${f/.html/}; done
append number to filename "content" to make it unique
find . -type f | grep content | while read f; do i=1; echo mv $f $f$i.html; i=i+1; done
MacBook-Pro$ find . -type f | grep content | while read f; do i=1; echo mv $f $f$i.html; i=i+1; done
mv ./0BD3D9D2-F8B1-4472-95C2-13319650A45C/content ./0BD3D9D2-F8B1-4472-95C2-13319650A45C/content1.html
mv ./0EA34DB4-CD56-42BE-91DA-F631E44FB6E0/content ./0EA34DB4-CD56-42BE-91DA-F631E44FB6E0/content1.html
mv ./1A33F29E-3938-4C2F-BA99-6B98FD045742/content ./1A33F29E-3938-4C2F-BA99-6B98FD045742/content1.html
once above step is successful, i should be able do this to achieve my desired output:
find . -type f | grep content | while read f; do mv $f ../; done
However, I am sure I can do this in a single command, and my step 2 is not working (incrementing i).
Any idea why step 2 is not working?
bash script:
#!/bin/bash
find . -type f -name content.html | while IFS= read -r f; do
name=$(basename "$f")
((++i))
mv "$f" "for_content/${name%.*}$i.html"
done
replace for_content with your destination folder name
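As for why step 2 in the question never increments: i=1 sits inside the loop body, so it is reset on every pass, and i=i+1 assigns the literal string "i+1" rather than doing arithmetic. A minimal sketch of a working counter, with move_numbered as a hypothetical name and assuming paths without newlines:

```shell
# move_numbered SRC DEST
# Move every content.html found under SRC into DEST as content1.html,
# content2.html, ... Hypothetical helper; DEST must already exist.
move_numbered() {
    local src="$1" dest="$2" i=0 f
    find "$src" -type f -name content.html | while IFS= read -r f; do
        i=$((i + 1))                      # arithmetic expansion, not i=i+1
        mv -- "$f" "$dest/content$i.html"
    done
}
```

The counter is initialised once, before the loop. Because the loop runs on the right side of a pipe it lives in a subshell, so the final value of i is not visible afterwards, which does not matter here.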
Suppose in your base directory, you create a folder named final for storing
content.html files, then do something like below
find . -path ./final -prune -o -name "content.html" -print0 |
while IFS= read -r -d '' name
do
mv "$name" "./final/content$(mktemp -u XXXX).html"
# mktemp -u only prints a random name without creating the file (a dry run)
done
At the end you'll get all the content.html files under ./final folder in the format contentXXXX.html where XXXX are random characters.
Note: -path ./final -prune -o in find prevents it from descending into our results folder.
The inode number of each file is unique within a filesystem, so you could use it as a prefix:
find $(pwd) -name "content.html" -printf %f" "%i" "%p"\n" | awk '{ system("mv "$3" <directorytomoveto>"$2$1) }'
I'd use something like this:
find . -type f -name 'test' | awk 'BEGIN{ cnt=0 }{ printf "mv %s ./output-dir/content_%03d.txt\n", $0, cnt++ }' | bash;
You can replace ./output-dir/ with your destination directory
Example:
[root@sl7-o2 test]# ls -R
.:
1 2 3 output-dir
./1:
test
./2:
test
./3:
test
./output-dir:
[root@sl7-o2 test]# find . -type f -name 'test' | awk 'BEGIN{ cnt=0 }{ printf "mv %s ./output-dir/content_%03d.txt\n", $0, cnt++ }' | bash;
[root@sl7-o2 test]# ls ./output-dir/
content_000.txt content_001.txt content_002.txt
You can use shopt -s globstar to grab all content.html files recursively and then use a loop to rename them:
#!/bin/bash
shopt -s globstar
counter=0
dest_dir=/path/to/destination
for f in **/content.html; do # pick up all content.html files
[[ -f "$f" ]] || continue # skip if not a regular file
mv "$f" "$dest_dir/content_$((++counter)).html"
done

Bash : Find and Remove duplicate files from different folders

I have two folders with some common files, I want to delete duplicate files from xyz folder.
folder1:
/abc/file1.csv
/abc/file2.csv
/abc/file3.csv
/abc/file4.csv
folder2:
/xyz/file1.csv
/xyz/file5.csv
I want to compare both folders and remove duplicate from /xyz folder. Output should be: file5.csv
For now I am using :
find "/xyz" "/abc" "/abc" -printf '%P\n' | sort | uniq -u | -exec rm {} \;
But it fails with this message: if -exec is not a typo you can run the following command to lookup the package that contains the binary:
command-not-found -exec
-bash: -exec: command not found
-exec is an option to find, and the find command has already ended by the time you reach the pipes.
Try xargs instead: it takes all the data from stdin and appends it as arguments to the program.
UNTESTED
find "/xyz" "/abc" "/abc" -printf '%P\n' | sort | uniq -u | xargs rm
Find every file in the 234 and 123 directories, print each relative file name with -printf, sort them, let uniq -d list the names that occur in both, use sed to prepend the path of the directory to delete from (123 here), and pass the result to xargs rm.
Command:
find ./234 ./123 -type f -printf '%P\n' | sort | uniq -d | sed 's/^/.\/123\//g' | xargs rm
sed is not needed if you run the command from inside the ./123 directory and give find full paths for the folders.
Another approach: just find the files in abc and attempt to remove them from xyz:
UNTESTED
find /abc -type f -printf 'rm -f /xyz/%P\n' | sh
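The same idea without generating shell commands, which avoids trouble with spaces or quotes in file names. This is a sketch only: remove_common is a hypothetical name, it compares by relative path (not by content), and it assumes names without newlines.

```shell
# remove_common REF TARGET
# Delete from TARGET every regular file whose relative path also exists
# under REF. Hypothetical helper; name-based comparison only.
remove_common() {
    local ref="$1" target="$2" rel
    # The cd happens in a subshell, so the loop still runs in the
    # caller's working directory and relative arguments keep working.
    ( cd "$ref" && find . -type f ) | while IFS= read -r rel; do
        if [ -f "$target/$rel" ]; then
            rm -- "$target/$rel"
        fi
    done
}
```

With the folders from the question, `remove_common /abc /xyz` should leave only file5.csv in /xyz.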
Remove Duplicate Files From Particular Directory
FileList=$(ls)
for D1 in $FileList ;do
if [[ -f $D1 ]]; then
for D2 in $FileList ;do
if [[ -f $D2 ]]; then
if [[ $D1 == "$D2" ]]; then
: 'Skip original file'
else
if [[ $(md5sum "$D1" | cut -d'=' -f 2 | cut -d ' ' -f 1 ) == $(md5sum "$D2" | cut -d'=' -f 2 | cut -d ' ' -f 1 ) ]]; then
echo "Duplicate File Found : $D2"
rm -f "$D2"
fi # Detect duplicate using MD5
fi # Skip original file
fi # D2 is a regular file
done
fi # D1 is a regular file
done

How to segregate files based on recursive grep

I have a directory, sub-directories each containing some text files.
main-dir
|
sub-dir1
| file1 "foo"
|
sub-dir2
| file2 "bar"
|
sub-dir3
| file3 "foo"
Some of these files (file1 and file3 here) contain the same text. I want to segregate the sub-directories based on the content of their files: sub-dir1 and sub-dir3 should be grouped because their files have the same content. In this example, that means moving sub-dir1 and sub-dir3 to another directory.
Using grep in recursive mode lists all the sub-directories whose files match the content. How can I make use of that output?
Your solution could be simplified to this:
for dir in *; do
if grep "foo" "$dir/file1" >/dev/null; then
cp -rf "$dir" "$HOME_PATH/newdir/"
fi
done
but will work only when all directories actually contain a file file1.
Something like this:
grep -rl "foo" * | sed -r 's|(.*)/.*|\1|' | sort -u | while read dir; do
cp -rf "$dir" "$HOME_PATH/newdir/"
done
or like this:
grep -rl "foo" * | while read f; do
dirname "$f"
done | sort -u | while read dir; do
cp -rf "$dir" "$HOME_PATH/newdir/"
done
or like this:
find . -type f -exec grep -l "foo" {} \; | xargs -I {} dirname {} | sort -u |
while read dir; do
cp -rf "$dir" "$HOME_PATH/newdir/"
done
might be better.
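If only the first level of sub-directories matters, the pipelines above can be sketched as a loop that asks grep once per directory. dirs_containing is a hypothetical name; it relies on grep -r, which GNU and BSD grep both provide, and it treats the pattern as a fixed string.

```shell
# dirs_containing PATTERN ROOT
# Print each immediate sub-directory of ROOT that has at least one file
# (at any depth) containing PATTERN. Hypothetical helper.
dirs_containing() {
    local pattern="$1" root="$2" dir
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue          # glob matched nothing
        # -r: recurse, -q: stop at first match, -F: fixed string
        if grep -rqF -- "$pattern" "$dir"; then
            printf '%s\n' "${dir%/}"       # drop the trailing slash
        fi
    done
}
```

The printed directories can then be fed to a `while IFS= read -r dir; do cp -rf "$dir" "$HOME_PATH/newdir/"; done` loop like the ones above.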
I managed to write this script which solves my question.
# $pwd is an unset variable, so iterate over "$PWD"/* directly
for f in "$PWD"/*
do
str=$(cat "$f/file1")
if [ "$str" == "foo" ];
then
cp -rf "$f" "$HOME_PATH/newdir/"
fi
done

Escape single quotes in long directory name then pass it to xargs [Bash 3.2.48]

In my directory I have subfolders, and I want to list all directories like this:
- ./subfolder
- ./subfolder/subsubfolder1
- ./subfolder/subsubfolder2
- ./subfolder/subsubfolder2/subsubsubfolder
I want to list this structure:
./fol'der/subfol'der/
Here is my code:
echo -n "" > myfile
find . -type d -print0 | xargs -0 -I# | cat | grep -v -P "^.$" | sed -e "s/'/\\\'/g" | xargs -I# echo "- #" >> myfile
The desired output would be like this:
- ./fol'der
- ./fol'der/subfol'der
But the output is:
- ./fol'der
- #
It seems like sed fails at the second occurrence of the single quote (') character, or something. I have no idea. Can you help me? (I'm on OS X 10.7.4.)
I've been grep-ing and sed-ing like an idiot. I thought about it a little and came up with a much simpler solution, a for loop.
echo -n "" > myfile
for folder in $(find . -type d)
do
if [[ $folder != "." ]]
then
echo "- ${folder}" >> myfile
fi
done
My previous solution didn't work with names containing whitespace, so the correct one is:
echo -n "" > myfile
find . -type d -print0 | while IFS= read -r -d '' folder
do
if [[ "${folder}" != "." ]]
then
echo "- ${folder}" >> myfile
fi
done
With GNU Parallel you can do:
find . -type d -print0 | parallel -q -0 echo '- '{}
Your output will be screwed up if you have any dirs with \n in its name. If you do not have any dirs with \n in the name you can do:
find . -type d -print | parallel -q echo '- '{}
The -q is only needed if you really need two spaces after '-'.
You can install GNU Parallel simply by:
wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem
Watch the intro videos for GNU Parallel to learn more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
This is on Linux, but it should work on OS X:
find . -type d -print0 | xargs -0 -I # echo '- #'
It works for me regardless of whether the last set of quotes are single or double.
Output:
- ./fol'der
- ./fol'der/subfol'der
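A note on the original pipeline: without -0, xargs gives quotes and backslashes a special meaning in its input, which is a likely source of the trouble with names like fol'der. A sketch that sidesteps xargs entirely follows. list_dirs is a hypothetical name, -mindepth is supported by GNU and BSD find but is not POSIX, and names with newlines are assumed not to occur.

```shell
# list_dirs ROOT
# Print every directory below ROOT (excluding ROOT itself) as "- path".
# Hypothetical helper; quotes in names need no escaping because the
# names never pass through xargs or a second round of shell parsing.
list_dirs() {
    find "$1" -mindepth 1 -type d | sort | while IFS= read -r d; do
        printf '%s\n' "- $d"
    done
}
```

Running `list_dirs . > myfile` from the top folder writes the list in one go, so the file does not have to be emptied first.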
