Split and rejoin wav files - bash

I'm trying to edit around 200 WAV files with a Windows program that doesn't support command-line batch processing. So it seems like the easiest way to do it would be to combine the WAVs into one file (they're all short), and then split them back the way they were after editing.
Sox will give me the length, and I already have the names of course. Is there any way to say, combine all the wavs in a directory into a single wav file, while preserving the names, lengths, and which order they were combined in a txt file, and then use the txt to turn them back into wavs with the original names and lengths?
Edit: I seem to be doing something wrong. I ran this script first:
#!/bin/bash
for f in *.wav
do
dd if="$f" of="new_$f" bs=1 skip=44
done
Then I moved all of the original files out of the folder, deleted the first of the new files, and copied the first of the originals back in. Then I did this:
cat *.wav > merged.wav
This gives me one file that's as big as it should be, but when I open it with a media player, it just plays the portion that was the first file, and then stops before playing the others.

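I don't know how creative you want to get. A WAV file is just a header followed by binary data. As long as they're all in the same format (sample size and everything), you can use cut or split to strip the 44 bytes off the beginning of each of them, keep one copy of the header at the beginning, cat them into one file, do what you want to them, and split the result back up using another script with the same list of filenames. Note that the header also stores the length of the audio data, so the size fields in the header you keep have to be updated to cover the merged data; otherwise a player stops where the first file ended.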
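To make the header idea concrete, here is a minimal sketch in Python rather than bash. It uses the standard wave module, which writes the header of the merged file with the correct length. The function names and the tab-separated manifest layout are my own assumptions, it only handles uncompressed PCM files that all share one format, and it assumes the edit does not change the duration of the audio.

```python
import wave
from pathlib import Path


def merge_wavs(src_dir, merged_path, manifest_path):
    """Concatenate all .wav files in src_dir; record name and frame count of each."""
    files = sorted(Path(src_dir).glob("*.wav"))
    if not files:
        raise ValueError("no .wav files found")
    entries = []
    fmt = None
    # Keep merged_path outside src_dir so a later run does not pick it up as input.
    with wave.open(str(merged_path), "wb") as out:
        for path in files:
            with wave.open(str(path), "rb") as src:
                this_fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
                if fmt is None:
                    fmt = this_fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif this_fmt != fmt:
                    raise ValueError(f"{path.name} has a different format")
                frames = src.getnframes()
                out.writeframes(src.readframes(frames))
                entries.append((path.name, frames))
    # One line per file, in merge order: "<frames><TAB><name>"
    Path(manifest_path).write_text(
        "".join(f"{frames}\t{name}\n" for name, frames in entries)
    )
    return entries


def split_wav(merged_path, manifest_path, out_dir):
    """Cut merged_path back into files using the names and lengths in the manifest."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with wave.open(str(merged_path), "rb") as src:
        fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
        for line in Path(manifest_path).read_text().splitlines():
            frames, name = line.split("\t", 1)
            with wave.open(str(out_dir / name), "wb") as out:
                out.setnchannels(fmt[0])
                out.setsampwidth(fmt[1])
                out.setframerate(fmt[2])
                out.writeframes(src.readframes(int(frames)))
```

Lengths are stored in frames rather than seconds so the split lands on exact sample boundaries.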

SoX can do this.
Assuming all WAV files are in c:\temp, the command is
sox c:\temp\*.wav c:\temp\merged.wav
(The example is for Windows; on Linux, use Linux path notation.)
To preserve the lengths and names, I'd use SoX to get the length of each file and then create a cue file from that info. This cue file can later be used to split the audio.
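As a rough illustration of what such a cue file could look like, here is a small Python sketch that builds one from a list of names and durations. The function names are hypothetical and the durations are assumed to have been collected already (for example from SoX). Cue sheets express positions as MM:SS:FF with 75 frames per second, so start times are rounded to 1/75 s, and because the format comes from audio CDs (at most 99 tracks) some tools may not accept a sheet with around 200 entries.

```python
def cue_time(seconds):
    """Format seconds as MM:SS:FF, where FF counts 1/75-second cue frames."""
    total_frames = round(seconds * 75)
    minutes, rest = divmod(total_frames, 75 * 60)
    secs, frames = divmod(rest, 75)
    return f"{minutes:02d}:{secs:02d}:{frames:02d}"


def build_cue(merged_name, tracks):
    """tracks is a list of (title, duration_in_seconds) in merge order."""
    lines = [f'FILE "{merged_name}" WAVE']
    start = 0.0
    for number, (title, duration) in enumerate(tracks, 1):
        lines += [
            f"  TRACK {number:02d} AUDIO",
            f'    TITLE "{title}"',
            f"    INDEX 01 {cue_time(start)}",
        ]
        start += duration  # the next track begins where this one ends
    return "\n".join(lines) + "\n"
```

Using the original file name as the TITLE keeps the name, order and start position of every piece in one text file.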

Related

Archiving differences between time sequence of text files

There is a sensor network from which I download measurements every ten minutes or on demand. Each download is a text file consisting of several lines, each with a timestamp and values. The name of the text file also contains a timestamp of when the download occurred. So as time progresses I collect a lot of text files, which form a sequence. Because of the physical parameters the values are taken from, there are little to no differences between adjacent text files.
I want to archive all of the downloaded text files into a (compressed) file in an efficient way. I thought that archiving only the differences between adjacent text files would be one such way.
I want some ideas to work it out in Bash, using well-known tools like tar and diff. I also know about git, but it is not useful for creating an archive file.
I will try to clarify a bit. A text file consists of several lines in the following space-separated format:
timestamp sensor_uuid value_1 ... value_N
Not every line has exactly the same number of values (say N), but the number of tokens per line varies little. The values themselves also vary little over time. As they come from sensors, and there is a single sensor per line, the number of lines in the text file depends on how many responses I got for each call. Zero lines is possible.
Finally, the text file name carries its own timestamp; it is a concatenation of a base name with a date-time string:
sensors_2019-12-11_153043.txt for today’s 15:30:43 request.
Needless to say, the timestamps in the lines of this example file are usually earlier than the one in the filename, and some lines and timestamps may even be repeated from text files created before.
So my idea for efficient archiving is to put the first text file into the archive and then only the updates, i.e. the differences between each pair of adjacent text files, which can eventually be traced back to the first text file actually archived. But on retrieval I need to get a complete text file, as if it had been archived itself rather than its difference from the past.
Tar takes in the whole text files, and a couple of differing lines between the text files do not produce a repeating pattern suitable for strong compression.
When tar is combined with a compressor (for example tar -z for gzip), the compressor already identifies the repeating patterns and compresses them. But if you want to eliminate the repeated parts yourself, you can use the diff command, with some simple manipulation of its output, and redirect the result to a file.
Let's say we have two files, file1.txt and file2.txt. You can use this command line to get only the lines added in the second file (file2.txt):
diff -u file1.txt file2.txt | tail -n +3 | grep -E '^\+' | sed -E 's/^\+//'
Here tail -n +3 skips the two header lines (--- and +++) of the unified diff, so added lines that themselves contain a + are kept. Then we just need to redirect the output to another file and replace or delete file2.txt before the tar operation. Do not redirect straight onto file2.txt in the same pipeline, because the shell truncates it before diff reads it.
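Keeping only the added lines drops deletions and line positions, so the later file cannot always be rebuilt exactly. Here is a minimal Python sketch of a delta that can be replayed, built on the standard difflib module. The delta layout (copy ranges plus literal inserts) and the function names are my own assumptions, not an established format.

```python
import difflib


def make_delta(old_lines, new_lines):
    """Describe new_lines as copies from old_lines plus literal inserted lines."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))              # reuse old_lines[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("insert", new_lines[j1:j2]))  # store only the new lines
        # "delete": the old lines are simply not copied
    return delta


def apply_delta(old_lines, delta):
    """Rebuild the newer file from the older file and the delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(old_lines[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out
```

Archiving the first file in full and one delta per later file means any file can be restored by replaying the deltas in order from the first one.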

Maximum number of input file for Ghostscript (gs)

I simply want to combine multiple EPS files into one big file using the gs command.
The command works flawlessly, except when I specify more than 20 input files.
Somehow the command ignores the input files from the 21st onward.
Has anyone experienced the same behavior? Is there a cap on the number of input files specified anywhere?
I looked through the site and couldn't find one.
Sample command:
gs -o output.eps -sDEVICE=eps2write file1.eps file2.eps .... file21.eps
Thank you.
Edit: add sample command
Almost certainly you have simply reached the maximum length of the command line for your operating system. You can use Ghostscript's @ syntax (@filename) to supply a file containing the command line instead.
https://www.ghostscript.com/doc/current/Use.htm#Input_control
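For illustration, here is a small Python sketch that writes the input names to an argument file and builds a gs command that reads it through @. The helper name and the default inputs.txt are hypothetical; only the input names go into the file, and names containing whitespace are rejected because the arguments in such a file are separated by whitespace.

```python
from pathlib import Path


def gs_command(inputs, list_path="inputs.txt", output="output.eps"):
    """Write the input names to list_path and return a gs command that reads it."""
    for name in inputs:
        if any(ch.isspace() for ch in name):
            raise ValueError(f"whitespace in file name: {name!r}")
    # One input file name per line.
    Path(list_path).write_text("\n".join(inputs) + "\n")
    return ["gs", "-o", output, "-sDEVICE=eps2write", f"@{list_path}"]
```

The returned list can be handed to subprocess.run(..., check=True) on a machine where gs is installed.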
Note that the EPS files will not be placed appropriately using that command, and this does not actually combine EPS files, it creates a new EPS file whose marking content should be the same as the input(s).
If you actually want to combine the EPS files, it's easy enough, but it will require a small amount of programming to parse the EPS file headers and produce appropriate scale/translate operations, as well as stripping off any bitmap previews (which will also happen when you run them through Ghostscript).

Merge two text files

I'm using Windows and Notepad++ to work with separate txt files. I have 2 files that I have to merge side by side, line by line, for my data analysis.
Here is the example:
file1.txt
Abcdefghijk
abcdefghijk
file2.txt
123456
123456
then the output I want is like this:
Abcdefghijk123456
abcdefghijk123456
in the next file or output file. Does anybody here know how to do this?
Your question is answered here by TheMadTechnician. Using PowerShell, you read both source files (1 and 2) as arrays of lines. Then comes a simple loop, like "merge line x from file1 with line x from file2 as long as you have lines left in file1".
Unfortunately it's impossible with pure cmd.
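The same loop can be sketched in Python, in case PowerShell is not an option. The function name is hypothetical; if one file is longer than the other, the missing lines are treated as empty.

```python
from itertools import zip_longest
from pathlib import Path


def merge_side_by_side(path1, path2, out_path):
    """Join line x of path1 with line x of path2 and write the result to out_path."""
    lines1 = Path(path1).read_text().splitlines()
    lines2 = Path(path2).read_text().splitlines()
    # zip_longest pads the shorter file with empty strings
    merged = [a + b for a, b in zip_longest(lines1, lines2, fillvalue="")]
    Path(out_path).write_text("\n".join(merged) + "\n")
    return merged
```

Called as merge_side_by_side("file1.txt", "file2.txt", "output.txt"), it writes the joined lines to a third file and leaves the two inputs untouched.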
@riki, you could also write a batch program to do this programmatically. There should be no limit on the number of lines.
It may depend on the number of lines you have in each file. I suggest copying and pasting by hand if it is fewer than 50 lines.
Otherwise,
use a more powerful language like Python, C, PHP, etc., and run it before performing the data analysis.
There is a free utility you can download and run on your computer, called txtcollector. I read about it here. I used it because I had a whole folder of files to concatenate. It was a breeze. The only slight imperfection I noticed was that I couldn't paste in the path to the specific folder in the first step (choosing the folder where the files to be concatenated were). However, I could do this when choosing where to save the result.

Finding Duplicate image files

I have around 1 TB of images stored on my hard disk. These are pictures of friends and family taken over time. Many of these pictures are duplicates, in the sense that the same file is saved in different locations, probably under a different name too. Is there any tool, utility or approach (I can code one) to find the duplicate files?
I would recommend using md5deep or sha1deep. On Linux, simply install the md5deep package (it is included in most Linux distributions).
Once you have it installed, run it in recursive mode over your whole disk and save the checksum of every file into a text file, using a command like this:
md5deep -r -l . > filelist.txt
If you like sha1 better than md5, use sha1deep instead (it is part of the same package).
Once you have a file, simply sort it using sort (or pipe it into sort in previous step):
sort < filelist.txt > filelist_sorted.txt
Now look at the result in any text editor: files with the same checksum end up on adjacent lines, so you will quickly see all the duplicates along with their locations on disk.
If you are so inclined, you can write a simple script in Perl or Python to remove duplicates based on this file list.
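If you would rather do the whole thing in Python, here is a minimal sketch of the same checksum idea using only the standard library. The function names are hypothetical. It groups files by size first, so only files that share a size get hashed, which should save a lot of reading on a large collection.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-1 of a file, read in chunks so large images do not fill memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents have the same checksum."""
    by_size = defaultdict(list)
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            by_size[os.path.getsize(path)].append(path)

    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for path in paths:
            by_hash[file_digest(path)].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Each returned group lists every location of one picture. Like the md5deep approach, this only finds byte-identical files, not resized or re-encoded copies.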

Diff for 3 binary files

I have 3 binary files. Let's call them file1.bin, file2.bin and file3.bin.
file1.bin and file2.bin have some common parts.
file2.bin and file3.bin have some common parts.
I want to find the common parts between file1.bin and file2.bin that are different between file2.bin and file3.bin.
How do you recommend to accomplish that? I have already dumped the binary files to text files using xxd and then did a 3-way diff using vim -d file1.txt file2.txt file3.txt.
However, vim marks a part as changed in all the files even if it has changed in only one file and remains the same in the other two. I want those special kinds of occurrences to be marked differently.
Perhaps you can use the built-in Unix diff (I think it is part of OS X), but with the --unchanged-group-format option to list the similarities. Do that for file1 and file2. Then do it for file2 and file3. You can then do a regular diff on the two resulting files.
For an idea of how to get the similarities, have a look at this post.
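The same two-step idea can be sketched directly on the bytes in Python with the standard difflib module, skipping the hex dump. The function names and the min_size threshold are my own assumptions, SequenceMatcher only pairs up blocks that appear in the same order in both files, and it is slow on anything but small inputs, so treat this as an illustration rather than a tool.

```python
import difflib


def matched_offsets(a, b, min_size=4):
    """Offsets in b that lie inside a block of at least min_size bytes also found in a."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    covered = set()
    for block in matcher.get_matching_blocks():
        if block.size >= min_size:
            covered.update(range(block.b, block.b + block.size))
    return covered


def to_ranges(offsets):
    """Collapse a set of offsets into sorted (start, end) half-open ranges."""
    ranges = []
    for off in sorted(offsets):
        if ranges and off == ranges[-1][1]:
            ranges[-1][1] = off + 1
        else:
            ranges.append([off, off + 1])
    return [tuple(r) for r in ranges]


def shared_with_1_not_3(f1, f2, f3, min_size=4):
    """Ranges of f2 that it shares with f1 but does not share with f3."""
    in_1 = matched_offsets(f1, f2, min_size)
    in_3 = matched_offsets(f3, f2, min_size)
    return to_ranges(in_1 - in_3)
```

The ranges are offsets into file2.bin, so they can be looked up in the xxd dump of that file.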
The tool that I work on (ECMerge) does that. You just diff the 3 binary files; it presents equal portions side by side, with modified bytes placed appropriately in between. There is no need to get a hex dump first. You can script it in JavaScript to output whatever you like based on the diff results and the bytes in the files (it also works from the command line).
Chromium used bsdiff, then switched to Courgette for binary diffs, as explained in their blog here. You might find useful leads there.
