I have a few hundred thousand files in many, many subdirectories. I am trying to extract all of the relevant image files, using a regular expression like so:
find -E . -regex '.+\.ca/.+(\.gif|\.jpg|\.tif|\.jpeg|\.tiff|\.png|\.jp2|\.j2k|\.bmp|\.pict|\.wmf|\.emf|\.ico|\.xbm)'
This finds the files. However, I want to move them to a new directory, newdir, and have them named like so:
1.png
2.jpg
3.ico
4.pict
5.png
And so forth. I haven't been able to find a way that (a) preserves the various extensions and (b) renames the files as they come in. Many of the files will be duplicates, and I want to preserve those as well. Thanks so much for your help.
i=1
find ... | while IFS= read -r filename; do
    newname=$i.${filename##*.}       # keep the original extension
    mv "$filename" newdir/"$newname"
    i=$((i+1))
done
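If any of the matched paths could contain newlines, a null-delimited pipeline is safer. A minimal sketch reusing the question's regex; -E is BSD find syntax as in the question (GNU find supports -print0 but spells the regex flavour option -regextype):
i=1
find -E . -regex '.+\.ca/.+(\.gif|\.jpg|\.tif|\.jpeg|\.tiff|\.png|\.jp2|\.j2k|\.bmp|\.pict|\.wmf|\.emf|\.ico|\.xbm)' -print0 |
while IFS= read -r -d '' filename; do
    mv "$filename" newdir/"$i.${filename##*.}"
    i=$((i+1))
done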
I have a bunch of files with different names in different subdirectories. I created a txt file with those names, but I cannot make find work using that file. I have seen posts on problems creating the list, and on not using find at all (though I do not understand the reason). Suggestions? It is difficult for me to come up with an example because I do not know how to reproduce the directory structure.
The following are the names of the files (just in case there is a formatting problem):
AO-169
AO-170
AO-171
The best that I came up with is:
cat ExtendedList.txt | xargs -I {} find . -name {}
It obviously dies in the first directory that it finds.
I also tried
ta="AO-169 AO-170 AO-171"
find . -name $ta
but it complains find: AO-170: unknown primary or operator
If you are trying to ask "how can I find files with any of these names in subdirectories of the current directory", the answer to that would look something like
xargs printf -- '-o\0-name\0%s\0' <ExtendedList.txt |
xargs -r0 find . -false
The -false is just a cute way to let the list of actual predicates start with "... or".
If the list of names in ExtendedList.txt is large, this could fail if the second xargs decides to break it up between -o and -name.
The option -0 is not portable, but should work e.g. on Linux or wherever you have GNU xargs.
If you can guarantee that the list of strings in ExtendedList.txt does not contain any characters which are problematic to the shell (like single quotes), you could simply say
sed "s/.*/-o -name '&'/" ExtendedList.txt |
xargs -r find . -false
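If you would rather sidestep both caveats, a small bash loop can build the predicate list in an array, where word splitting never comes into play. A sketch, assuming the names in ExtendedList.txt are one per line and contain no newlines themselves:
args=()
while IFS= read -r name; do
    args+=( -o -name "$name" )    # one "-o -name NAME" group per line
done < ExtendedList.txt
find . -false "${args[@]}"
Note that a very long list can still exceed the system's argument-length limit, which is exactly what the xargs versions above work around.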
I would appreciate any help; I'm relatively new here.
I have the following directory structure
Main_dir
|-Barcode_subdirname_01\(many further subfolders)\filename.pdf
|-Barcode_subdirname_02\(many further subfolders)\filename.csv
There are thousands of files within many subfolders.
The first-level subdirectories carry the barcode associated with all files within, e.g. 123456_dirname.
I want to copy all files within all subfolders to the main_dir and
rename the files to subdirname_barcode_filename.extension (based only on the first-level subdirectory name and barcode).
I've been attempting to write a bash script to do this from the main_dir but have hit the limit of my coding ability (I'm open to any other way that'll work).
First, identifying the first-level subfolders:
find -maxdepth 1 -type d |
then cutting out the first two parts, delimited by the underscores:
cut -d\_ -f1 > barcode
then finding the files within the subfolders, renaming and moving them:
find -type f -print0 |
while IFS= read -r filenames; do
newname="${barcode/sudirname/filename\/}"
mv "filename" "main_dir"/"newname"
done
I can't get it to work and may be headed in the wrong direction.
You can use rename with sed-like substitution conventions. For example,
$ rename 's~([^_]+)_([^_]+)_.*/([^/.]+\..*)~$1_$2_$3~' barcode_subdir_01/a/b/c/file2.csv
will rename the file to
barcode_subdir_file2.csv
I used ~ instead of the more common / separator to make the expression clearer.
You can test the script with the -n option to show the renamed files without actually performing the action.
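To apply the same substitution to every file under Main_dir in one pass, you could feed rename a file list from find. A sketch, assuming the Perl-based rename (File::Rename) used above and that you run it from inside Main_dir; keep -n until the preview looks right:
cd Main_dir
# find prints paths like ./Barcode_subdirname_01/sub/file.pdf;
# the substitution collapses each one to ./Barcode_subdirname_file.pdf in Main_dir
find . -mindepth 2 -type f -print0 |
xargs -0 rename -n 's~([^_]+)_([^_]+)_.*/([^/.]+\..*)~$1_$2_$3~'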
On Mac/Linux there is a command to merge MP3 files together, which is
cat file1.mp3 file2.mp3 > newfile.mp3
I was wondering if there is a simpler way or command to select multiple MP3s in a folder and output them as a single file?
The find command would work. In this example I produce a sorted list of *.mp3 files in the current directory, cat each file, and append it to an output file called out:
find . -maxdepth 1 -type f -name '*.mp3' -print0 |
sort -z |
xargs -0 cat -- >>out
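If all the tracks sit in a single folder and their names already sort in the order you want, a plain glob does the same job; a sketch (the output goes one level up so it can never match the *.mp3 glob on a re-run):
# shells expand ./*.mp3 in lexical order
cat ./*.mp3 > ../newfile.mp3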
I should warn you, though: if your MP3 files have ID3 headers in them, then simply appending the files is not a good way to go, because the headers will wind up littered throughout the file. There are tools that manage this much better, for example http://mp3wrap.sourceforge.net/.
Simply concatenating the files won't work. Don't forget that modern MP3 files have metadata in the header. Even if you don't care about the artist name, album name, etc., you should at least make the "end of file" mark correct.
Better to use a tool like http://mulholland.xyz/dev/mp3cat/.
You can use mp3cat by Darren Mulholland available at https://darrenmulholland.com/dev/mp3cat.html
Source is available at https://github.com/dmulholland/mp3cat
I am attempting to find the differences between a directory and a list of files kept inside the bash script itself, for portability.
For example: search a directory with phpBB installed, compare the recursive directory listing to a list of core installation files (excluding themes, uploads, etc.), and display the additional and missing files.
Thus far I have attempted to use diff, comm, and tr, and hit "argument too long" errors. This is likely because, given lists of file names, these commands attempt to compare the actual files rather than the lists themselves.
The file list in the script looks something like this (But I am willing to format differently):
./file.php
./file2.php
./dir/file.php
./dir/.file2.php
I am attempting to use one of the following to print the list:
find ./ -type f -printf "%P\n"
or
find ./ -type f -print
Then use any command you can think of to compare the results to the list of files inside the script.
The following are difficult to use, as there are often thousands of files to check, each version can change the listings, and it is a pain to update a whole script every time there is a new release.
find . ! -wholename './file.php' ! -wholename './file2.php'
find . ! -name './file.php' ! -name './file2.php'
find . ! -path './file.php' ! -path './file2.php'
With the lists being in different orders to accommodate any additional files, it can't be a straight comparison.
I'm just stumped. I greatly appreciate any advice or if I could be pointed in the right direction. Ask away for clarification!
You can use the -r option of the diff command to recursively compare the contents of the two directories. This way you don't need all the file names on the command line, just the two top-level directory names.
It will report missing files, newly added files, and the differences in changed files. Many things can be controlled by its options.
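For example (hypothetical directory names; -q makes diff list only which files differ or exist on one side, without printing content differences):
diff -rq path/to/reference path/to/your/directory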
If you mean you have a list of expected files somewhere and only one directory to be compared against it, then you can try the tree command: create the list once using tree, and at comparison time run tree again on the directory and compare it with the stored "expected output" using diff.
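A minimal sketch of that idea, assuming tree is installed (the directory names are placeholders; -a includes hidden files, -i drops the indentation lines, -f prints full paths so each line stands alone):
# Record the expected listing once, relative to the directory itself
( cd path/to/clean/install && tree -a -i -f ) > expected.txt
# Later, compare the live directory against the stored listing
( cd path/to/your/directory && tree -a -i -f ) | diff expected.txt -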
Do you have to use coreutils? If so:
Put your list in a file, say list.txt, with one file path per line.
comm -23 <(find path/to/your/directory -type f | sort) \
<(sort path/to/list.txt) \
> diff.txt
diff.txt will have one line per file in path/to/your/directory that is not in your list.
If you care about files in your list that are not in path/to/your/directory, do comm -13 with the same parameters.
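Spelled out, that inverted check would look like this (missing.txt is just a hypothetical output name):
comm -13 <(find path/to/your/directory -type f | sort) \
         <(sort path/to/list.txt) \
         > missing.txt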
Otherwise, you can also use sd (stream diff), which requires neither sorting nor process substitution and supports infinite streams, like so:
find path/to/your/directory -type f | sd 'cat path/to/list.txt' > diff.txt
And just invert the streams to get the second requirement, like so:
cat path/to/list.txt | sd 'find path/to/your/directory -type f' > diff.txt
Probably not that much of a benefit in this example other than succinctness, but still consider it; in some cases you won't be able to use comm, grep -F, or diff.
Here's a blogpost I wrote about diffing streams on the terminal, which introduces sd.
I have a few thousand PDFs that I need merged based on filename.
Named like:
Lastname, Firstname_12345.pdf
Instead of overwriting or appending, our software appends a number/datetime to the PDF if there are additional pages, like:
Lastname, Firstname_12345_201305160953344627.pdf
For all the ones that don't have a second (or third) PDF, the script doesn't need to do anything. But all the ones that have multiples need to be merged into a new file, *_merged.pdf, and the originals deleted.
I gave this my best effort and this is what I have so far.
#! /bin/bash
# list all pdfs to show shortest name first
LIST=$(ls -r *.pdf)
for x in "$LIST"
# Remove .pdf extension. merge pdfs. delete originals.
do
y=${x%%.*}
pdftk "$y"*.pdf cat output "$y"_merged.pdf
find "$y"*.pdf -type f ! -iname "*_merged.pdf" -delete
done
This script works to a certain extent. It will merge and delete the originals, but it doesn't have anything in it to skip ones that don't need anything appended to them, and when I run it in a folder with several test files it stops after one file. Can anyone point me in the right direction?
Since your file names contain spaces, the for loop won't work as written.
Once you have the list of file names, a test on the number of files matching "$y"*.pdf determines whether you need to merge the PDFs.
#!/bin/bash
LIST=( * )
# Strip the .pdf extension, merge matching pdfs, delete the originals.
for x in "${LIST[@]}"; do
    y=${x%.pdf}
    # Only merge when more than one file shares this prefix.
    if [ "$(ls "$y"*.pdf 2>/dev/null | wc -l)" -gt 1 ]; then
        pdftk "$y"*.pdf cat output "$y"_merged.pdf
        find "$y"*.pdf -type f ! -iname "*_merged.pdf" -delete
    fi
done
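A variant that avoids parsing ls output, using a glob captured in an array instead; a minimal sketch, assuming bash and that no *_merged.pdf files exist yet:
#!/bin/bash
shopt -s nullglob                 # unmatched globs expand to nothing
for x in *.pdf; do
    y=${x%.pdf}
    matches=( "$y"*.pdf )         # every pdf sharing this prefix
    if (( ${#matches[@]} > 1 )); then
        pdftk "${matches[@]}" cat output "$y"_merged.pdf
        rm -- "${matches[@]}"
    fi
done
Like the ls-based version, this relies on no base name being a prefix of another; files already removed on an earlier pass simply produce an empty array and are skipped.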