mac: compare two folders, extract non-identical photos only - macos

I have been looking on S.O. for ways to compare two folders and extract only the "new" or non-identical photos. I have two large folders and I only want the new files; I need a way to identify them. Is there a way to do this, or an application that can?
Situation: when iOS 8 came out, I gave it a try, and then reverted back to iOS 7.1.1. But my most recent backup (the one I made before upgrading to iOS 8) was corrupted because of the downgrade.
Now I have a copy of my iOS photos from that recent backup (I had to use a backup extractor to pull the photos out of it), but I also have the photos from a month-old backup that I restored onto my phone after giving up on the corrupt recent backup.
So I now have two sets of photo libraries: one with the up-to-date photos (which cannot be restored to the iPhone through iTunes), and one with the month-old photo library (which was restored to my iPhone through iTunes without trouble).
I extracted the photos from both backups and ended up with two directories. I only need the new photos (the difference between the two folders).
I hope it's now clearer, and more detailed.
Thanks a lot!

Can you install duff via homebrew or macports? If so, the following should give you a list of files that occur only once:
$ duff -r -f '' folder1 folder2 | sort > duplicate_files.txt
$ find folder1 folder2 -print | sort > all_files.txt
$ diff all_files.txt duplicate_files.txt | grep '^< ' | cut -c 3-

If you don't want to install additional packages, this would also work:
sort <(ls dir1) <(ls dir2) | uniq -u
That'll sort the combined list of file names from both directories and then return only the items that appear exactly once. If you also want the locations of those files, you can then search for them by name.
This compares files by name only, which might not be what you want. If you want to compare them by something else (e.g. size or content), the answer gets a little more complicated.
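If it is really the photo contents that matter (the backup extractor may have renamed identical images), one rough sketch is to checksum every file in both folders and keep only the checksums that occur exactly once. This assumes the md5 command that ships with macOS; on Linux, md5sum prints the same checksum-first layout:
find folder1 folder2 -type f -exec md5 -r {} + | sort |
    awk '{ count[$1]++; line[NR] = $0; key[NR] = $1 }
         END { for (i = 1; i <= NR; i++) if (count[key[i]] == 1) print line[i] }'
Each remaining line is a checksum and a path for a photo that has no byte-identical twin anywhere in the two folders.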

Related

Obtain a list of files in Terminal in same order as in Finder

I have a folder of about 1,000 image files. I need to create a list of them. I saw somewhere that if I go into Terminal and cd into the folder, all I have to do is type in
ls > list.csv
to generate a list.
The thing is, this list is not in the same order as the files I see in Finder. For example, in Finder, the first image is 16_left.jpg. However, the first image in the generated list.csv file is 10017_left.jpg, whilst 16_left.jpg is snuggled all the way down in between 15975_right.jpg and 16007_right.jpg.
I can see that Finder treats 16 as the smallest number among the files and puts it at the top, whereas list.csv is sorted not by the whole number but character by character, left to right.
How do I get list.csv to be in the same order as Finder?
Probably your Finder is showing the files ordered by creation time from newest to oldest, while ls with no arguments lists them by name.
ls -lt will list by modification time from newest to oldest (on macOS, add -U to sort by creation time instead).
I posted this question on https://unix.stackexchange.com/ and got the answer here:
https://unix.stackexchange.com/questions/716914/obtain-a-list-of-files-in-terminal-in-same-order-as-in-finder?noredirect=1#comment1359014_716914
Basically the command to use is: print -rC1 -- *.jpg(NDn) > list.csv
Do upvote their (more fleshed-out) answer here as it worked like a charm!
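For reference, here is the same zsh command with its glob qualifiers spelled out (this is just an annotated restatement of the accepted solution, not a different one):
# print -rC1   print the matches raw, in one column (one file name per line)
# N            null glob: don't raise an error if nothing matches
# D            also include dotfiles
# n            numeric (natural) sort, giving the "16 before 10017" order Finder shows
print -rC1 -- *.jpg(NDn) > list.csv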

diff between folders whilst ignoring filename changes

How can I use diff in terminal but ignore changes in file names?
Currently this is what I'm doing:
diff -wrN folder1 folder2 | grep '^>' | wc -l
How can I do a git diff between two commit ids whilst:
ignoring file renames
only looking at java files
ignoring specific folder names, e.g. folders 'a' and 'b'
performing the grep '^>' | wc -l
You seem unaware of how hard this problem is, so I'd like to point out why it is so difficult.
Take two directories which are identical in the beginning and both contain, say, 1000 files. Now you rename, say, 500 files in one of the directories. Renamings can vary greatly; a file originally called foobar.txt can be named DSC-3457.orig.jpg afterwards. The diff command cannot really find it again without having any idea of what has been renamed into what.
Additionally, a file called x could be renamed to y, while a file called y is renamed to x. In this case it is even questionable whether this should be regarded as a mere renaming, or whether the two files' contents have simply been exchanged.
All this means that, in general, you will have a hard time accomplishing this. Standard tools will not do it out of the box.
That said, there are two aspects I want to point out which might help you.
File Sizes
You can sort all files by size and then compare each same-size pair across the two directories. This can work well if the only changes are renamings and every file has a different size. If several files share the same size (by pure chance, or because they are all in a format with a fixed size), you are in trouble again and will have to compare each possible pair within the same-size group.
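As a rough sketch of that idea (stat -f is the BSD/macOS form; on Linux use stat -c '%s %n' instead), you could list every file with its size and compare only the size columns; sizes that occur in just one folder point at files that were changed or added rather than merely renamed:
find folder1 -type f -exec stat -f '%z %N' {} + | sort -n > folder1_sizes.txt
find folder2 -type f -exec stat -f '%z %N' {} + | sort -n > folder2_sizes.txt
# comm -3 suppresses sizes present in both lists, leaving only the one-sided ones
comm -3 <(cut -d' ' -f1 folder1_sizes.txt | sort) <(cut -d' ' -f1 folder2_sizes.txt | sort)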
Git-Diff
You mentioned git-diff in the tags. git actually keeps track of renames, so if you intend to use git diff, you can rely to some degree on git's ability to detect them. This typically works if a file is removed and added under a new name in one single commit; if it gets added under a new name in one commit and the older version is removed in another commit, it won't work properly. There is a lot more to learn about renames in git diff: see man git-diff and search for "rename"; it is mentioned in about a dozen places, so I won't try to summarize it all here.
EDIT: You can use a command like git diff --find-renames --diff-filter=ACDMTUX (i.e. you let all kinds of changes pass the filter with the exception of renamings).
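For the specific combination asked about (two commit ids, only java files, a couple of excluded folders, then counting the changed lines), a sketch along these lines should work; COMMIT1 and COMMIT2 are placeholders, and note that git diff marks added lines with '+' rather than the '>' that plain diff prints:
git diff --find-renames --diff-filter=ACDMTUX COMMIT1 COMMIT2 \
    -- '*.java' ':(exclude)a' ':(exclude)b' \
    | grep '^+' | grep -v '^+++' | wc -l
The ':(exclude)' pathspecs keep folders a and b out of the diff, and the second grep drops the '+++' file-header lines so only added content lines are counted.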

Extracting contents of many zipped folders into a single directory

Kind of an easy question, but I can't find the answer. I want to extract the contents of multiple zipped folders into a single directory. I am using the bash console, which is the only tool available on the particular website I am using.
For example, I have two archives: a.zip (which contains a1.txt and a2.txt) and b.zip (which contains b1.txt and b2.txt). I want to extract all four text files into a single directory.
I have tried
unzip \*.zip -d \newdirectory
But it creates two directories (a and b) with two text files in each.
I also tried concatenating the two zip files into one big archive and extracting that, but it still creates two directories, even when I specify a new directory.
I can't figure out what I am doing wrong. Any help?
Thanks in advance!
Use the -j parameter to ignore any directory structure.
unzip -j -d /path/to/your/directory '*.zip*'
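If your copy of unzip doesn't expand the quoted wildcard itself, an equivalent loop form (with the target directory as a placeholder) is:
for z in *.zip; do
    unzip -j "$z" -d /path/to/your/directory
done
The -j flag is what flattens each archive's internal folder structure, so every extracted file lands directly in the target directory.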

Grep Zip files in windows - Have a process that works, but could this be faster?

I have seen posts about zipgrep for Linux.
For example - grep -f on files in a zipped folder
rem zipgrep -s "pattern" TestZipFolder.zip
rem zipgrep [egrep_options] pattern file[.zip] [file(s) ...] [-x xfile(s) ...]
Using Google, I did find http://www.info-zip.org/mans/zipgrep.html, but looking in their archives I don't see zipgrep in there. It also seems the Info-ZIP binaries/code haven't been updated in quite a while. I suppose I could grab some of their source and compile it.
I also looked on the Cygwin site and saw that they are toying with this as well.
Here is what I am using today. Just wondering if I could make this faster?
D:\WORK\Scripts\unzip -c D:\Logs\ArchiveTemp\%computername%-04-07-2014-??-00-00-compressed.zip server.log.* | D:\WORK\Scripts\grep -i ">somestring<" >> somestring.txt
A couple of issues with the command I have posted:
* It does not show which log file the string is in
* It does not show which zip file the string is in
So while the command works, it has a lot of room for improvement.
There is not much headroom for optimization, but it is worth noting that different implementations of unzip vary in performance. For speed on Windows, decompress the zip file using 7-Zip or the Cygwin unzip utility (obtainable via setup -nqP unzip, or through the setup utility's GUI).
After unzipping, grep the directory tree recursively with fgrep -r (i.e. grep -rF).
In summary:
1) copy the zip file to fooCopy.zip
2) unzip fooCopy.zip
3) fgrep -r "search string" fooCopy
Rationale: because the file is compressed, you would have to incrementally decompress the pieces to grep them anyway. Doing it as one batch job is faster, and clearer for someone else to understand.
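To also address the two issues noted above (knowing which zip and which log file each hit came from), a rough Cygwin bash sketch could look like the following; the paths and the scratch directory are placeholders, and the date pattern is the one from the original command:
for z in /cygdrive/d/Logs/ArchiveTemp/${COMPUTERNAME}-04-07-2014-??-00-00-compressed.zip; do
    scratch=$(mktemp -d)
    unzip -q "$z" 'server.log.*' -d "$scratch"
    # grep -r prefixes each hit with the log file name; sed then prefixes the zip name
    grep -ri ">somestring<" "$scratch" | sed "s|^|$z: |" >> somestring.txt
    rm -rf "$scratch"
done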

Finding and Removing Unused Files Through Command Line

My website's file structure has gotten very messy over the years from uploading random files to test different things out. I have a list of all my files, such as this:
file1.html
another.html
otherstuff.php
cool.jpg
whatsthisdo.js
hmmmm.js
Is there any way I can feed my list of files in via the command line, search the contents of all the other files on my website, and output a list of the files that aren't mentioned anywhere in the other files?
For example, if cool.jpg and hmmmm.js weren't mentioned in any of my other files then it could output them in a list like this:
cool.jpg
hmmmm.js
And any of the other files mentioned above wouldn't be listed, because they are mentioned somewhere in another file. Note: I don't want it to automatically delete the unused files; I'll do that manually.
Also, of course, I have multiple folders, so it will need to search recursively from my current location and output all the unused (unreferenced) files.
I'm thinking the command line would be the fastest/easiest way, unless someone knows of another. Thanks in advance for any help!
Yep! This is pretty easy to do with grep. In this case, you would run a command like:
$ for orphan in $(cat orphans.txt); do \
echo "Checking for presence of ${orphan} in the current directory..." ;
grep -rl "$orphan" . ; done
And orphans.txt would look like your list of files above, one file name per line. You can add -i to the grep if you want to match case-insensitively, and you would want to run the command in /var/www or wherever your distribution keeps its webroot. If a "Checking for..." line has no matches printed below it, nothing references that file, so it is unused.
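If you only want the final list of unreferenced files rather than the per-file "Checking..." output, a small variation along these lines should work; it assumes orphans.txt holds one file name per line, that you run it from the webroot, and it excludes orphans.txt itself so the list doesn't count as a reference:
while IFS= read -r name; do
    # -r recurse, -q quiet (exit status only), -F treat the name as a fixed string
    if ! grep -rqF --exclude=orphans.txt -- "$name" . ; then
        echo "$name"
    fi
done < orphans.txt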

Resources