How can I cut several sound files using a script? - praat

I am new to Praat and wondering whether someone can help me find out how to cut all my sound files with a script or some other way.
I have like 100 sound files I need for my research. They all have a different length, some are 1 min and others are 3 min long.
I would like to have only the first 22 sec from each sound file.
Thanks in advance!
Kind regards
Olga

The first step is to construct a script that extracts the initial 22 seconds of some specific sound object that is already open. In general, the easiest way to at least start a script is to do a thing manually once, and after you've done that, in a Praat script window, paste the command history (with ctrl-h) to see what the underlying commands are. The manual approach is to look for "Extract part" under "Convert", which corresponds to the command
Extract part: 0, 22, "rectangular", 1, "no"
There is also a command to save a file as a wav file, so you would add that to the core of the script.
Then you need to add a loop that does this a number of times, to different files. You will (probably) need a file with wav file names, and some system for naming the output files, for example if you have "input1.wav", you might want to call the cut-down version "output1.wav". This implies some computation of the output file name based on the input file name, so you need to get familiar with how string manipulation works in Praat.
If you have that much sorted out, then the basic logic is
get the next input file name
compute output name
open the input file
extract from that file
save the extracted file
remove the extract
remove the original
loop until no more files
I would plan on spending a lot of time trying to understand simple things like string variables, or object selection. I left out explicitly selecting objects since it is not necessarily required, but every command works on "the selected object" and it's easy to lose track of what is selected.
Another common approach is to beg a colleague to write it for you.
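For comparison, the loop outlined above can be sketched outside Praat. This is only an illustration in Python using the standard-library wave module, not a Praat script; it assumes the files are uncompressed PCM WAV, and the names trim_wav and trim_folder are made up for the example. It keeps each input file name and writes the cut version to a separate output folder instead of computing a new name.

```python
import wave
from pathlib import Path


def trim_wav(src, dst, seconds=22.0):
    """Copy the first `seconds` of src into dst and return the frame count written."""
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        # Files shorter than the limit are copied whole.
        n_frames = min(reader.getnframes(), int(seconds * reader.getframerate()))
        frames = reader.readframes(n_frames)
    with wave.open(str(dst), "wb") as writer:
        writer.setparams(params)  # the frame count in the header is fixed up on write
        writer.writeframes(frames)
    return n_frames


def trim_folder(in_dir, out_dir, seconds=22.0):
    """Trim every .wav file in in_dir and save it under the same name in out_dir."""
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(in_dir.glob("*.wav")):  # get next input file name
        dst = out_dir / src.name              # compute output name
        trim_wav(src, dst, seconds)           # open, extract, save
        written.append(dst)
    return written
```

Calling trim_folder("recordings", "recordings_22s") would process a whole folder; both folder names are placeholders.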

Related

Monitor value via OCR

Background:
Sometimes I need to monitor the change of a value in a certain program.
My solution is to use a batch file to capture the part of the screen where the value is shown with Minicap and then use Tesseract to convert the value to plain text. However, this script would not work well if I needed to monitor the value every second for several hours.
Current solution (simplified example):
minicap.exe -captureregion 800 600 850 620 -save C:\file.png -exit -escapequit
tesseract.exe C:\file.png out.txt
Question:
What I would like is some simple way to OCR a value directly from the screen to use in the batch file, perhaps buffering several values before appending them to a csv file. I would prefer to do this without the need to install python or write compiled software.
(Posted on behalf of the question author, to move the solution to the answer space).
I found that I could use Capture2Text. The following command takes the on screen text and prints it to stdout:
Capture2Text_CLI.exe --screen-rect "800 600 850 620"
This way it's possible to run the command, check if the value is changed, and if so, append it to a log file together with a timestamp.
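The check-and-append logic might look roughly like the sketch below. It is written in Python only to show the logic, which the question would rather avoid installing; the same steps can be done in a batch file. The Capture2Text_CLI.exe call and its --screen-rect argument are taken from the command above and assumed to be on the PATH; the function names and the CSV layout are made up for the example.

```python
import csv
import subprocess
from datetime import datetime


def read_screen_value(rect="800 600 850 620"):
    """Run Capture2Text_CLI (assumed to be on the PATH) and return its trimmed stdout."""
    result = subprocess.run(
        ["Capture2Text_CLI.exe", "--screen-rect", rect],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def log_if_changed(value, last_value, log_path):
    """Append (timestamp, value) to the CSV only if value differs from last_value.

    Returns the value to remember for the next comparison.
    """
    if value == last_value:
        return last_value
    with open(log_path, "a", newline="") as f:
        csv.writer(f).writerow([datetime.now().isoformat(timespec="seconds"), value])
    return value
```

A polling loop would call last = log_if_changed(read_screen_value(), last, "values.csv") once per second, starting with last = None.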

Why is in-place editing of a file slower than making a new file?

As you can see in this answer, it seems that editing a text file in place takes much more time than creating a new file, deleting the old file, and moving a temporary file from another file system and renaming it, let alone creating a new file in the same file system and just renaming it. I was wondering what the reason behind that is?
Because when you edit a file in place, you are opening the same file for both writing and reading. But when you use another file, you only read from one file and write to another file.
When you open a file for reading, its contents are read from disk into memory. After that, when you edit the file you change its contents on disk, so the contents you have in memory must be updated to prevent data inconsistency. But when you use a new file, you don't have to update the contents of the first file in memory. You just read the whole file once and write the other file once, and don't update anything. Removing a file also takes very little time because you just remove it from the file system and don't write any bits to the location of the file on the disk. The same goes for renaming. Moving can also be done very fast depending on the file system, but most likely not as fast as removing and renaming.
There is also another more important reason.
When you remove the numbers from the beginning of the first line, all of the other characters have to be shifted back a little. Then, when you remove the numbers from the second line, all of the characters after that point have to be shifted back again, because the characters have to be consecutive. If you just wanted to change some characters, editing in place would have been a lot faster. But since you are changing the length of the file on each removal, all of the other characters have to be shifted, and that takes a lot of time. It's not exactly like this, and it's much more complicated depending on the implementation of your operating system and your file system, but this is the idea behind it. It's like an array operation. When you remove a single element from an array, you have to shift all of the other elements of the array, because it is an array. In contrast, if you were to remove an element from a linked list you would not need to shift the other elements, but files are implemented similarly to arrays, so that is that.
While tgwtdt's answer may give a few good insights it does not explain everything. Here is a counter example on a 140MB file:
$ time sed 's/a/b/g' data > newfile
real 0m2.612s
$ time sed -i -- 's/a/b/g' data
real 0m9.906s
Why is this a counter example, you may ask. Because I replace a with b which means that the replacement text has the same length. Thus, no data needs to be moved, but it still took about four times longer.
While tgwtdt gave a good reasoning for why in place usually takes longer, it's a question that cannot be answered 100% for the general case, because it is implementation dependent.
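For reference, the "write a new file, then rename it" approach from the question can be sketched as follows. This is a minimal Python illustration, not what any particular tool does internally; rewrite_via_temp is a made-up name, and the temporary file is created in the same directory so that the final rename stays within one file system.

```python
import os
import tempfile


def rewrite_via_temp(path, transform):
    """Stream path through transform line by line into a temp file, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        # newline="" keeps the original line endings untouched.
        with os.fdopen(fd, "w", newline="") as dst, open(path, newline="") as src:
            for line in src:  # the source is read once, front to back
                dst.write(transform(line))  # the copy is written once
        os.replace(tmp_path, path)  # rename over the original
    except BaseException:
        os.unlink(tmp_path)  # do not leave the temp file behind on failure
        raise
```

Each line may grow or shrink here without anything being shifted, because the output is always appended to a fresh file.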

NiFi: how to avoid copying files that are partially written

I am trying to use NiFi to get a file from an SFTP server. The file can potentially be big, so my question is how to avoid getting the file while it is still being written. I am planning to use ListSFTP+FetchSFTP but am also okay with GetSFTP if it can avoid copying partially written files.
thank you
In addition to Andy's solid answer you can also be a bit more flexible by using the ListSFTP/FetchSFTP processor pair by doing some metadata based routing.
After ListSFTP each flowfile will have attributes such as 'file.lastModifiedTime' and others. You can read about them here https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.3.0/org.apache.nifi.processors.standard.ListSFTP/index.html
You can put a RouteOnAttribute processor between the List and Fetch to detect objects that, at least based on the reported last modified time, are 'too new'. You could route those to a processor that is just a slow pass-through to intentionally wait a bit. You can then run those back through the first router until they are 'old enough'. Now, this is admittedly a power-user approach, but it does give you a lot of flexibility and control. The approach I'm mentioning here is not foolproof, as the source system may not report the last modified time correctly, an old timestamp may not mean the source file is done being written, etc. But it gives you additional options IF you cannot do the definitely correct thing above that Andy talks about.
If you have control over the process which writes the file, a common pattern to solve this is to initially write the file with a specific naming structure, such as beginning with a dot (.). After the successful write operation, the file is renamed without the dot and is picked up by the processor. Both GetSFTP and ListSFTP have a processor property called Ignore Dotted Files, which is set to true by default and means those processors will not operate on or return files beginning with the dot character.
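On the writing side, the dot-file pattern could be sketched like this. It is a Python illustration that assumes the writer has direct file-system access to the directory the SFTP server exposes; publish_file is a made-up name and nothing here is part of NiFi.

```python
import os


def publish_file(directory, name, data):
    """Write data to a hidden dot-file first, then rename it to its final name."""
    hidden = os.path.join(directory, "." + name)
    final = os.path.join(directory, name)
    with open(hidden, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes are on disk before the rename
    os.replace(hidden, final)  # the complete file appears under its real name
    return final
```

While the write is in progress only the dotted name exists, so a listing that ignores dotted files never sees a partial file.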
There is a minimum file age property you can use. The last modification time gets updated as the file is being written, so setting this value to something other than 0 will help fix the problem.

Monitor A File For Additions And Get Last Added Line

I'm having trouble monitoring a file for changes. I need to be able to know when a file changes, and when it does, I need the new line that was added. I intend to parse each line and find ones that match certain criteria, and act on information in those lines. I know the expected number of matching lines ahead of time, but I do not know how many lines in total will be added to the file, or where the matching lines will be.
I've tried 2 packages so far, to no avail.
fsnotify/fsnotify
As far as I can tell, fsnotify can only tell me when a file is modified, not what the details of the modification were. Since I need to know what exactly was added to the file, this is no good for me.
(As a side-question, can this be run in a loop? The example that I tried exited after just one modification. I need to monitor for multiple modifications.)
hpcloud/tail
This package tries to mimic the Unix tail command, but it seems to have its own issues. The output that I get includes timestamps and other data - I just want the added line, nothing else. Also, it seems to think a file has been modified multiple times, even when it's just one edit. Further, the deal breaker here is that it does not output the last line if the line was not followed by a newline character.
Delegating to tail
I came across this answer, which suggests to delegate this work to the tail command itself, but I need this to work cross-platform (specifically, macOS, Linux and Windows). I don't believe that an equivalent command exists on Windows.
How do I go about tackling this?
@user2515526,
Usually a diff of the changes is out of scope for a file watcher, because, you know, you could change an image, and a watcher would need to keep track of several MB of diff in memory, and what if we have thousands of files?
However, as bad as it sounds, this may be exactly the way you want to implement this (sure, depends on your app, etc. - could be fine for text files), i.e. - keeping a map of diffs (1 diff per file) since last modification. Cannot say I like it, but sounds like fsnotify has no support for changes/diffs that you need.
Also, regarding your question about running in a loop, maybe you can get some hints here: https://github.com/kataras/iris/blob/8370d76910cdd8de043753ed81ae080eae8dc798/utils/file.go
It's a framework that can build a server that watches for TypeScript file changes, so it sounds similar to your case/question.
Cheers,
-D
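One lighter variant of the "map per file" idea is to remember only how far each file has been read, then read what comes after that point whenever the watcher reports a change. The sketch below is in Python rather than Go and is only an illustration; the class and method names are made up, and it does not handle truncated or replaced files.

```python
class AppendReader:
    """Remember a byte offset per file and return only the lines added since the last call."""

    def __init__(self):
        self.offsets = {}  # path -> number of bytes already consumed

    def new_lines(self, path, include_partial=False):
        offset = self.offsets.get(path, 0)
        with open(path, "rb") as f:
            f.seek(offset)
            chunk = f.read()
        if not include_partial:
            # Leave an unterminated last line in the file for the next call.
            chunk = chunk[: chunk.rfind(b"\n") + 1]
        self.offsets[path] = offset + len(chunk)
        return [line.decode("utf-8") for line in chunk.splitlines()]
```

The first call returns everything already in the file. Passing include_partial=True also returns a final line that has no newline yet, which is the case the question mentions.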

Incrementally reading logs

Looked around with numerous search strings but can't find anything quite like this:
I'm writing a custom log parser (ala analog or webalizer except not for webserver) and I want to be able to skip the hard work for the lines that have already been parsed. I have thought about using a history file like webalizer but have no idea how it actually works internally and my C is pretty poor.
I've considered hashing each line and writing the hashes out, then parsing the history file for their presence but I think this will perform poorly.
The only other method I can think of is storing the line number of the last parse and skipping until that number is reached the next time round. What happens when the log is rotated I am not sure.
Any other ideas would be appreciated. I will be writing the parser in ruby but tips in a similar language will help as well.
The solutions I can think of right now are bound to be brittle.
Even if you store the line number and later realize it would be past the length of the current file, what happens if old lines have been trimmed? You would start reading (well) after the last position.
If, on the other hand, you are sure your log files won't be tampered with and they will only be rotated, I only see two ways of doing what you want, and I'm not sure the second is applicable to you.
Anyway, here goes.
First solution
You store the last line you parsed along with a timestamp. At the next run, you consider all the rotated log files sorting them by their last modified date, figure out which one you read last time, and start reading from there.
I didn't think this through, there might be funny corner cases you will need to handle.
Second solution
You create a background script that continuously watches the log file. A quick search on Google turned up this gem, but I'm not sure if that's even an option for you. Even then, you might want to combine this solution with the previous one in case your daemon gets interrupted (because that's clearly bound to happen at some point).
As you read the file and parse the lines keep track of the byte count. Save that. On next read, try to seek to that byte offset in the file. If the file is smaller than the byte count, it's a new file so start at the beginning.
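The byte-offset idea can be sketched as below. The question mentions Ruby; this uses Python purely as an illustration, and the function name and the one-number state file are assumptions made for the example.

```python
import os


def read_new_lines(log_path, state_path):
    """Return complete lines appended to log_path since the offset saved in state_path."""
    try:
        with open(state_path) as f:
            offset = int(f.read().strip() or 0)
    except FileNotFoundError:
        offset = 0  # first run: start at the beginning
    if os.path.getsize(log_path) < offset:
        offset = 0  # the file is smaller than last time, so treat it as a new file
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    data = data[: data.rfind(b"\n") + 1]  # skip a line that is still being written
    with open(state_path, "w") as f:
        f.write(str(offset + len(data)))  # save the new position for the next run
    return [line.decode("utf-8", errors="replace") for line in data.splitlines()]
```

A rotated file that has already grown past the saved offset would not be detected by the size check alone, so this remains a starting point, not a complete solution.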
