How to merge files in bash in alphabetical order - bash

I need to merge a bunch of mp3 files together. I know that simply doing
cat file1.mp3 >> file2.mp3
seems to work fine (at least it plays back correctly on my Zune anyway).
I'd like to run
cat *.mp3 > merged.mp3
but since there are around 50 separate mp3 files I don't want to be surprised halfway through by a file in the wrong spot (this is an audio book that I don't feel like re-ripping).
I read through the cat man page and couldn't find whether the expansion order of the wildcard is defined.
If cat doesn't work for this, is there a simple way (perhaps using ls and xargs) that might be able to do this for me?

Your version (cat *.mp3 > merged.mp3) should work as you'd expect. The *.mp3 is expanded by the shell and will be in alphabetical order.
From the Bash Reference Manual:
After word splitting, unless the -f option has been set, Bash scans each word for the characters ‘*’, ‘?’, and ‘[’. If one of these characters appears, then the word is regarded as a pattern, and replaced with an alphabetically sorted list of file names matching the pattern.
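One caveat worth knowing before relying on that: the sort is lexicographic, not numeric. A quick sketch in a scratch directory (made-up file names) shows the trap:

```shell
# Glob expansion is sorted, but lexicographically: "track10" sorts
# before "track2" because '1' < '2' character by character.
cd "$(mktemp -d)"
touch track1.mp3 track2.mp3 track10.mp3
printf '%s\n' *.mp3
# track1.mp3
# track10.mp3
# track2.mp3
```

For an audio book ripped as track1.mp3 … track50.mp3, zero-padding the numbers (track01.mp3, track02.mp3, …) or pre-sorting the list with GNU sort -V keeps the chapters in order.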
However, do be aware that if you have many files (or long file names) you'll be hampered by the "argument list too long" error.
If that happens, use find instead:
find . -maxdepth 1 -name "*.mp3" -print0 | sort -z | xargs -0 cat > merged.mp3
The -print0 option in find uses a null character as the field separator (to properly handle filenames with spaces, which are common in MP3 files), while -z in sort and -0 in xargs tell those programs to expect that separator.
Bonus feature: leave out -maxdepth 1 to also include files in subdirectories.
However, that method of merging MP3 files will mess up metadata such as your ID3 headers and duration info, which can affect playability on pickier players such as iTunes (maybe?).
To do it properly, see "A better way to losslessly join MP3 files" or "What is the best way to merge mp3 files?"

try:
ls *.mp3 | sort | xargs -d '\n' cat > ../merged.mp3
(Note that ls already sorts its output, that xargs needs -d '\n' to cope with spaces in filenames, and that the output file goes to the parent directory so the listing doesn't pick it up. Anyway, I'm not sure that you can merge mp3 files that way.)

Related

Given a text file with file names, how can I find files in subdirectories of the current directory?

I have a bunch of files with different names in different subdirectories. I created a txt file with those names, but I cannot make find work using that file. I have seen posts on problems creating the list, and on not using find (though I don't understand the reason). Suggestions? It is difficult for me to come up with an example because I do not know how to reproduce the directory structure.
The following are the names of the files (just in case there is a formatting problem)
AO-169
AO-170
AO-171
The best that I came up with is:
cat ExtendedList.txt | xargs -I {} find . -name {}
It obviously dies in the first directory that it finds.
I also tried
ta="AO-169 AO-170 AO-171"
find . -name $ta
but it complains find: AO-170: unknown primary or operator
If you are trying to ask "how can I find files with any of these names in subdirectories of the current directory", the answer to that would look something like
xargs printf -- '-o\0-name\0%s\0' <ExtendedList.txt |
xargs -r0 find . -false
The -false is just a cute way to let the list of actual predicates start with "... or".
If the list of names in ExtendedList.txt is large, this could fail if the second xargs decides to break it up between -o and -name.
The option -0 is not portable, but should work e.g. on Linux or wherever you have GNU xargs.
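To see what that first stage emits, here is a small sketch using the names from the question (the NUL separators are translated to spaces just for display):

```shell
# The printf stage emits NUL-separated find predicates; tr makes them
# visible. The real pipeline feeds the NULs straight into xargs -0.
cd "$(mktemp -d)"
printf '%s\n' AO-169 AO-170 > ExtendedList.txt
xargs printf -- '-o\0-name\0%s\0' <ExtendedList.txt | tr '\0' ' '
# -o -name AO-169 -o -name AO-170
```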
If you can guarantee that the list of strings in ExtendedList.txt does not contain any characters which are problematic to the shell (like single quotes), you could simply say
sed "s/.*/-o -name '&'/" ExtendedList.txt |
xargs -r find . -false
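Likewise, a quick sketch of what that sed stage generates for the same hypothetical names:

```shell
# Each input line becomes a quoted -name predicate, ready for xargs
# to splice onto "find . -false".
cd "$(mktemp -d)"
printf '%s\n' AO-169 AO-170 > ExtendedList.txt
sed "s/.*/-o -name '&'/" ExtendedList.txt
# -o -name 'AO-169'
# -o -name 'AO-170'
```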

How can I iterate from a list of source files and locate those files on my disk drive? I'm using FD and RIPGREP

I have a very long list of files stored in a text file (missing-files.txt) that I want to locate on my drive. These files are scattered in different folders on my drive. I want to get the closest available match that can be found.
missing-files.txt
wp-content/uploads/2019/07/apple.jpg
wp-content/uploads/2019/08/apricots.jpg
wp-content/uploads/2019/10/avocado.jpg
wp-content/uploads/2020/04/banana.jpg
wp-content/uploads/2020/07/blackberries.jpg
wp-content/uploads/2020/08/blackcurrant.jpg
wp-content/uploads/2021/06/blueberries.jpg
wp-content/uploads/2021/01/breadfruit.jpg
wp-content/uploads/2021/02/cantaloupe.jpg
wp-content/uploads/2021/03/carambola.jpg
....
Here's my working bash code:
while read p;
do
file="${p##*/}"
/usr/local/bin/fd "${file}" | /usr/local/bin/rg "${p}" | /usr/bin/head -n 1 >> collected-results.txt
done <missing-files.txt
What's happening in my bash code:
I iterate over my list of files
I use FD (https://github.com/sharkdp/fd) to locate those files on my drive
I then pipe the results to RIPGREP (https://github.com/BurntSushi/ripgrep) to filter them and find the closest match. The match I'm looking for should have the same file name and folder structure. I limit it to one result.
Finally I store the result in another text file, where I can later evaluate the list for the next step
Where I need help:
Is this the most efficient way to do this? I have over 2,000 files that I need to locate. I'm open to other solutions; this is just something I devised.
For some reason my code broke. It stopped returning results to "collected-results.txt". My guess is that it broke somewhere at the second pipe, right after the FD command. I haven't set up any condition for when it encounters an error or can't find a file, so it's hard for me to determine.
Additional Information:
I'm using Mac, and running on Catalina
Clearly this is not my area of expertise
"Missing" sounds like they do not exist where expected.
What makes you think they would be somewhere else?
If they are, I'd put the filenames in a list.txt file with just enough of a pattern to pick them out of the output of find.
$: cat list.txt
/apple.jpg$
/apricots.jpg$
/avocado.jpg$
/banana.jpg$
/blackberries.jpg$
/blackcurrant.jpg$
/blueberries.jpg$
/breadfruit.jpg$
/cantaloupe.jpg$
/carambola.jpg$
Then search the whole machine, which is gonna take a bit...
$: find / | grep -f list.txt
/tmp/apricots.jpg
/tmp/blackberries.jpg
/tmp/breadfruit.jpg
/tmp/carambola.jpg
Or if you want those longer partial paths,
$: find / | grep -f missing-files.txt
That should show you the actual paths to wherever those files exist IF they do exist on the system.
From the way I understand it, you want to find all files that could match the directory structure:
path/to/file
So it should return something like "/full/path/to/file" and "/another/full/path/to/file"
Using a single find command you can get a list of all files that match these criteria.
With find you can search your hard disk in a single go with something of the form:
$ find -regex pattern
The idea is now to build pattern, which we can do from the file missing_files.txt. The pattern should look something like .*/\(file1\|file2\|...\|filen\). So we can use the following sed command to do so:
$ sed ':a;N;$!ba;s/\n/\|/g' missing_files.txt
So now we can do exactly what you did, but a bit quicker, in the following way:
pattern="$(sed ':a;N;$!ba;s/\n/\|/g' missing_files.txt)"
pattern=".*/\($pattern\)"
find -regex "$pattern" > file_list.txt
In order to find the files, you can now do something like:
grep -F -f missing_files.txt file_list.txt
This will return all the matching cases. If you just want the first match for each file, use:
awk '(NR==FNR){a[$0]++;next}{for(i in a) if (!(i in b)) if ($0 ~ i) {print; b[i]}}' missing_files.txt file_list.txt
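A minimal sketch with made-up names shows the first-match behaviour: two missing files, three candidate paths, and only the first hit per name is printed.

```shell
# The awk loads the missing names into a[], then prints only the first
# path matching each name (b[] remembers names already printed).
cd "$(mktemp -d)"
printf '%s\n' apple.jpg banana.jpg > missing_files.txt
printf '%s\n' ./x/apple.jpg ./y/apple.jpg ./z/banana.jpg > file_list.txt
awk '(NR==FNR){a[$0]++;next}{for(i in a) if (!(i in b)) if ($0 ~ i) {print; b[i]}}' \
    missing_files.txt file_list.txt
# ./x/apple.jpg
# ./z/banana.jpg
```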
Is this the most efficient way to do this?
I/O is usually the biggest bottleneck, and you are running fd once per file. Instead, run it once to find all the files - a single pass over the disk. In shell you would do:
find . -type f '(' -name "first name" -o -name "other name" -o .... ')'
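For example, a sketch in a scratch directory with made-up names, showing one find call matching several names at once:

```shell
# Parentheses group the -o list of names; one traversal finds them all.
cd "$(mktemp -d)"
mkdir -p a b
touch a/one.jpg b/two.jpg b/skip.txt
find . -type f '(' -name one.jpg -o -name two.jpg ')' | sort
# ./a/one.jpg
# ./b/two.jpg
```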
How can I iterate from a list of source files and locate those files on my disk drive?
Use -path to match the full path. First build the arguments then call find.
findargs=()
# Read bashfaq/001
while IFS= read -r patt; do
# I think */ should match anything in front.
findargs+=(-o -path "*/$patt")
done < <(
# TODO: escape glob better, not tested
# see https://pubs.opengroup.org/onlinepubs/009604499/utilities/xcu_chap02.html#tag_02_13
sed 's/[?*[]/\\&/g' missing-files.txt
)
# remove leading -o
unset findargs[0]
find / -type f '(' "${findargs[@]}" ')'
Topics to research: var=() - bash arrays, < <(...) shell redirection with process substitution and when to use it (bashfaq/024), glob (and see man 7 glob) and man find.

How do I filter down a subset of files based upon time?

Let's assume I have done lots of work whittling down the files in a directory to the 10 files that I am interested in. There were hundreds of files, and I have finally found the ones I need.
I can either pipe out the results of this (piping from ls), or I can say I have an array of those values (doing this inside a script). Doesn't matter either way.
Now, of that list, I want to find only the files that were created yesterday.
We can use tools like find -mtime 1 which are fine. But how would we do that with a subset of files in a directory? Can we pass a subset to find via xargs?
I can do this pretty easily with a for loop. But I was curious if you smart people knew of a one-liner.
If they're in an array:
files=(...)
find "${files[@]}" -mtime 1
If they're being piped in:
... | xargs -d'\n' -I{} find {} -mtime 1
Note that the second one runs a separate find command for each file, which is a bit inefficient.
If any of the items are directories and you don't want to search inside of them, add -maxdepth 0 to disable find's recursion.
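If the names arrive on a pipe and you'd rather not run one find per file, here is a sketch (assuming bash for mapfile, and GNU touch just to fake the timestamps) that reads them into an array and calls find once:

```shell
# Read piped-in names into an array, then run find once over all of
# them. -maxdepth 0 tests each listed path itself without descending;
# -mtime 1 matches files last modified between 24 and 48 hours ago.
cd "$(mktemp -d)"
touch today
touch -d '30 hours ago' yesterday   # GNU touch, fabricating the ages
printf '%s\n' today yesterday |
  { mapfile -t files; find "${files[@]}" -maxdepth 0 -mtime 1; }
# yesterday
```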
Another option that won't recurse, though I'd just use John's find solution if I were you.
$: stat -c "%n %w" "${files[@]}" | sed -n "
/ $(date +'%Y-%m-%d' --date=yesterday) /{ s/ .*//; p; }"
The stat will print the name and creation date of files in the array.
The sed "greps" for the date you want and strips the date info before printing the filename.

concat a lot of files to stdout

I have a large number of files in a directory - ~100k. I want to combine them and pipe them to standard output (I need that to upload them as one file elsewhere), but cat $(ls) fails with -bash: /bin/cat: Argument list too long. I know how to merge all those files into a temporary one, but can I avoid it?
For a start, cat $(ls) is not the right way to go about this - cat * would be more appropriate. If the number of files is too high, you can use find like this:
find . -exec cat {} +
This gathers the results from find and passes them as arguments to cat, running as many separate instances as needed. It behaves much the same way as xargs, but doesn't need a separate process or any non-standard features like -print0, which is only supported in some versions of find.
find is recursive by default, so you can specify a -maxdepth 1 to prevent this if your version supports it. If there are other things in the directory, you can also filter by -type (but I guess there aren't, based on your original attempt).
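A throwaway sketch with made-up file names; note that find's traversal order is unspecified, so this form is best when the concatenation order doesn't matter:

```shell
# Concatenate every regular file in the current directory; find batches
# the names into as few cat invocations as possible.
cd "$(mktemp -d)"
printf 'alpha\n' > 1.txt
printf 'beta\n'  > 2.txt
find . -maxdepth 1 -type f -exec cat {} +
```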
find . -type f -print0 |xargs -0 cat
xargs will invoke cat several times, each time with as many arguments as it can fit on the command line (the combined length of the args can be no more than getconf ARG_MAX).
-print0 (separate file names with \0) for find, in combination with -0 (expect \0-separated input) for xargs, is a good habit to follow: it prevents the commands from breaking on filenames containing whitespace or other special characters.

Merging Multiple .mp3 files

On Mac/Linux there is a command to merge mp3 files together, which is
cat file1.mp3 file2.mp3 > newfile.mp3
I was wondering if there is a simpler way or command to select multiple mp3's in a folder and output it as a single file?
The find command would work. In this example I produce a sorted list of *.mp3 files in the current directory, cat each file, and append it to the output file called out:
find . -maxdepth 1 -type f -name '*.mp3' -print0 |
sort -z |
xargs -0 cat -- >>out
I should warn you though: if your mp3 files have ID3 headers in them, then simply appending the files is not a good way to go, because the headers will wind up littered through the file. There are tools that manage this much better, for example http://mp3wrap.sourceforge.net/.
Simply joining the files together won't work properly. Don't forget that modern MP3 files have metadata in the header. Even if you don't care about the metadata (album name, etc.), you should at least make the "end of file" mark correct.
Better to use a tool like http://mulholland.xyz/dev/mp3cat/.
You can use mp3cat by Darren Mulholland available at https://darrenmulholland.com/dev/mp3cat.html
Source is available at https://github.com/dmulholland/mp3cat
