Subtitle Workshop and Pascal: divide subtitles intelligently - word-wrap

Is there a Pascal script that can make Subtitle Workshop split lines "intelligently", based on the type of word at the break point?
The idea is rather simple, and it reflects exactly the standards that professional subtitlers must follow for subtitles to be accepted for commercial use:
To be considered professional, subtitles not only need to follow certain timing and length rules, but also need to break lines at the right words and places. From a professional point of view, a subtitle line is not allowed to end like this:
I am going to the
beach because today is a
very nice day and the
sun is up shining on
… of
… for
I start from a .txt file, which is not optimized, and all those words at the end of the first line must be moved to the beginning of the second line. Hopefully, a Pascal script could do that once you define which words should be pushed to the second line whenever they appear at the end of the first.
You will not see any TV series with subtitle lines ending in these forbidden words, as it's not professional. I really need to solve this, so any tips you can give me are hugely welcome. Thanks!

Subtitle editing like this requires a fairly uncommon, complex algorithm.
Without additional input on your question (your first attempt at a Pascal script, for example), I am afraid the best place to get a valuable answer is the URUWorks forum.
Edit:
Apart from the scripting considerations, it may take some additional knowledge to program something like a hyphenation algorithm for a given language.
Judge for yourself: the hyphenation rules used by the Lout Document Formatting Language for French run to more than 1000 lines of code.
Edit2:
You can find here an open-source hyphenation Delphi unit (English, French, Italian and Spanish hyphen tables included).
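As a rough illustration of the rule the question describes (never leave a forbidden word such as an article, preposition or conjunction at the end of the first line), here is a minimal sketch in Python rather than Pascal; the word list and the two-line layout are assumptions, and the same logic could be ported to a Subtitle Workshop Pascal script:
```python
# Re-wrap a subtitle into two lines so the first line never ends with a
# "forbidden" word. The word list is an assumption; adjust it to your house rules.
FORBIDDEN_LINE_ENDINGS = {"a", "an", "the", "of", "for", "and", "but", "or",
                          "to", "in", "on", "at", "with", "by"}

def split_subtitle(text, max_line_len=42):
    words = text.split()
    # First pass: plain greedy fill of the first line.
    line1, length = [], 0
    for word in words:
        extra = len(word) + (1 if line1 else 0)   # +1 for the joining space
        if length + extra > max_line_len:
            break
        line1.append(word)
        length += extra
    # Second pass: push forbidden trailing words down to the second line.
    while len(line1) > 1 and line1[-1].lower().strip(",.!?") in FORBIDDEN_LINE_ENDINGS:
        line1.pop()
    line2 = words[len(line1):]          # a full version would re-check line 2's length
    return " ".join(line1), " ".join(line2)

print(split_subtitle("I am going to the beach because today is a very nice day"))
# -> ('I am going to the beach because today is', 'a very nice day')
```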

Related

software or algorithms to detect grammar units in texts

I'm not sure this is the right fit for Stack Overflow (feel free to suggest where else this question should go), but here it is anyway. Suppose I have a few sentences of text like this:
John reads newspapers everyday. Right now he has just finished reading
one. He will read another one and might even read a small book
tomorrow.
This small extract contains the following grammar units:
present simple (reads)
present perfect (has finished)
future simple (will read)
modal verb (might)
Do you know of any software, algorithm or study that defines rules for identifying these grammar patterns?
Read this as well. If you are going to use Ruby, you can use TreeTop, or find an equivalent parser in another programming language.
NLTK is a natural language toolkit for Python; it works by tagging words. You can look at some examples here. It creates parse trees, which are very useful for these types of problems.
I haven't seen it distinguish between simple and perfect tenses, but it could be modified to do so.
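As a rough sketch of the tagging approach (assuming NLTK is installed and its tokenizer and tagger data have been downloaded), the part-of-speech tags already get you most of the way to the patterns listed in the question; the pattern rules below are naive assumptions of mine, not an NLTK feature:
```python
import nltk
# One-time setup: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

text = ("John reads newspapers everyday. Right now he has just finished reading one. "
        "He will read another one and might even read a small book tomorrow.")

tagged = nltk.pos_tag(nltk.word_tokenize(text))
# e.g. [('John', 'NNP'), ('reads', 'VBZ'), ..., ('has', 'VBZ'), ('just', 'RB'), ('finished', 'VBN'), ...]

# Naive pattern scan over (word, tag) pairs: VBZ -> present simple,
# have/has + VBN -> present perfect, will + VB -> future simple, MD -> modal.
for i, (word, tag) in enumerate(tagged):
    window = tagged[i + 1:i + 3]        # allow one intervening adverb ("has JUST finished")
    if word.lower() in ("has", "have") and any(t == "VBN" for _, t in window):
        print("present perfect around:", word)
    elif word.lower() == "will" and any(t == "VB" for _, t in window):
        print("future simple around:", word)
    elif tag == "MD" and word.lower() != "will":
        print("modal verb:", word)
    elif tag == "VBZ" and word.lower() not in ("has", "is", "does"):
        print("present simple:", word)
```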

Adding Accents to Speech Generation

The first part of this question has now been split into its own question, here: Analyzing Text for Accents
Question: How could accents be added to generated speech?
I do not mean just accent marks, or inflection, or anything singular like that. I mean something like a full British accent, or a Scottish accent, or a Russian one, etc.
I would think that this could be done independently of the language as well. For example, something in Russian could be generated with a British accent, or something in Mandarin could have a Russian accent.
What I've come up with: I think the basic process would be this:
1. Analyze the text.
2. Compare with a database (or something like that) to determine what needs an accent, how strong it should be, etc.
3. Generate the speech in the specified language. (Easy with normal text-to-speech processors.)
4. Determine the specified accent based on the analyzed text. (This is the part in question. I think an array of amplitudes and filters would work best for the next step.)
5. Mesh speech and accent. (This would be the easy part. It could probably be done by multiplying the speech by the accent, like many other DSP methods do.)
This is really more of a general DSP question, but I'd like to come up with a programmatic algorithm to do this rather than just a general idea.
This question isn't really "programming" per se: it's linguistics. The programming is comparatively easy. The analysis is going to be really difficult, and in truth you're probably better off getting the user to specify the accent. Or are you going for an automated story reader?
However, a basic accent is doable with modern text-to-speech. Are you aware of the International Phonetic Alphabet? http://en.wikipedia.org/wiki/International_Phonetic_Alphabet
It basically lists all the sounds a human voice can possibly make. An accent is then just a mapping (a function) from the alphabet to itself. For instance, to make an American accent sound British to an American person (though not sufficient to make it sound British to a British person), you can de-rhotacise all the "r" sounds in the middle of a word. So, for instance, the alveolar trill would be replaced with the voiced uvular fricative. (Lots of corner cases to work out just for this.)
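As a toy sketch of "accent as a mapping over the phonetic alphabet": the single rule below (drop /r/ unless a vowel follows, roughly the non-rhotic feature of many British accents) and the phoneme spelling are illustrative assumptions, not a real accent model:
```python
# An "accent" modelled as a function from one phoneme sequence to another.
VOWELS = set("aeiouɑɒæəɜɪʊ")

def non_rhotic(phonemes):
    out = []
    for i, p in enumerate(phonemes):
        nxt = phonemes[i + 1] if i + 1 < len(phonemes) else None
        if p == "r" and (nxt is None or nxt not in VOWELS):
            continue                    # de-rhotacise: drop /r/ when no vowel follows
        out.append(p)
    return out

# A crude "car park": /kɑr pɑrk/ -> /kɑ pɑk/
print("".join(non_rhotic(list("kɑr pɑrk"))))
```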
Long and short: it's not easy, which is probably why no one has done it. I'm sure a couple of linguistics professors out there would say it's impossible, but that's what linguistics professors do. You'll basically need to read several thick textbooks on accents and pronunciation to make any headway with this problem. Good luck!
What is an accent?
An accent is not a sound filter; it's a pattern of acoustic realization of text in a language. You can't take a recording of American English, run it through an "array of amplitudes and filters", and have British English pop out. What DSP is useful for is implementing prosody, not accent.
Basically (and simplest to model), an accent consists of rules for phonetic realization of a sequence of phonemes. Perception of accent is further influenced by prosody and by which phonemes a speaker chooses when reading text.
Speech generation
The process of speech generation has two basic steps:
Text-to-phonemes: Convert written text to a sequence of phonemes (plus suprasegmentals like stress, and prosodic information like utterance boundaries). This is somewhat accent-dependent (e.g. the output for "laboratory" differs between American and British speakers).
Phoneme-to-speech: given the sequence of phonemes, generate audio according to the dialect's rules for the phonetic realization of phonemes. (Typically you combine diphones and then acoustically adjust the prosody.) This is highly accent-dependent, and it is this step that imparts the main quality of the accent. A particular phoneme, even if shared between two accents, may have strikingly different acoustic realizations.
Normally these are paired. While you could have a British-accented speech generator that uses American pronunciations, that would sound odd.
Generating speech with a given accent
Writing a text-to-speech program is an enormous amount of work (in particular, to implement one common scheme, you have to record a native speaker speaking each possible diphone in the language), so you'd be better off using an existing one.
In short, if you want a British accent, use a British English text-to-phoneme engine together with a British English phoneme-to-speech engine.
For common accents like American and British English, Standard Mandarin, Metropolitan French, etc., there will be several choices, including open-source ones that you will be able to modify (as below). For example, look at FreeTTS and eSpeak. For less common accents, existing engines unfortunately may not exist.
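For instance, with eSpeak installed you can already hear the difference just by choosing the voice; the exact voice names vary by installation (list them with `espeak --voices=en`), so the ones below are typical examples rather than guaranteed:
```python
import subprocess

TEXT = "I left my car keys in the laboratory."

# Assumes the `espeak` binary is on PATH; "en-us" and "en-gb" are common voice
# names but installation-dependent.
for voice in ("en-us", "en-gb"):
    subprocess.run(["espeak", "-v", voice, TEXT], check=True)
```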
Speaking text with a foreign accent
English-with-a-foreign-accent is socially not very prestigious, so complete systems probably don't exist.
One strategy would be to combine an off-the-shelf text-to-phoneme engine for a native accent with a phoneme-to-speech engine for the foreign language. For example, a native Russian speaker who learned English in the U.S. would plausibly use American pronunciations of words like laboratory, but map the English phonemes onto his native Russian phonemes, pronouncing them as in Russian. (I believe there is a website that does this for English and Japanese, but I don't have the link.)
The problem is that the result is too extreme. A real English learner would attempt to recognize and generate phonemes that do not exist in his native language, and would also alter his realization of his native phonemes to approximate the native pronunciation. How closely the result matches a native speaker of course varies, but using the pure foreign extreme sounds ridiculous (and mostly incomprehensible).
So to generate plausible American-English-with-a-Russian-accent (for instance), you'd have to write a text-to-phoneme engine. You could use existing American English and Russian text-to-phoneme engines as a starting point. If you're not willing to find and record such a speaker, you could probably still get a decent approximation by using DSP to combine the samples from those two engines. eSpeak uses formant synthesis rather than recorded samples, so with it, it might be easier to combine information from multiple languages.
Another thing to consider is that foreign speakers often modify the sequence of phonemes under influence by the phonotactics of their native language, typically by simplifying consonant clusters, inserting epenthetic vowels, or diphthongizing or breaking vowel sequences.
There is some literature on this topic.

visualising piano performance evaluation

I need to develop a performance evaluator for piano playing. Based on a MIDI file generated from sheet music, I need to evaluate a MIDI recording of the actual playing (from a MIDI keyboard). I'm planning to evaluate the playing based on note pitch, duration and loudness. The evaluation is, I suppose, a comparison between the notes of the sheet music and those of the playing, both in MIDI.
But I have no idea how I can visualise this evaluation process (i.e. show where the person has gone wrong), e.g. maybe show the notation and highlight which notes went wrong.
How can I show any of this in graphical form, or more precisely on a stave (a music score) itself? I have note details (pitch, duration) and score details (key and time signature) stored in a table, and I'm using Java. But I have no clue as to how I can put all this into graphical form.
Any insight is most gratefully appreciated. Thanks in advance!
What you're talking about, really, is a graphical diff tool for musical notation. I think the easiest way to show differences is with an overlay of played notes (and rests) over the "correct" score symbols. Where it gets tricky is in showing volume differences, and whether notes are played (or should be played) staccato, marcato, tenuto, etc. For example, a note with a dot over it is meant to be played staccato, but your MIDI representation of that quarter note might be interpreted as an eighth note followed by an eighth rest, etc.
You'll also have to quantize the results of the live playing, which means you will have to allow some leeway for the human being to be slightly ahead of or behind the beat without notating it differently. If you don't do this, the only performance judged "correct" will be a very mechanical one (and not pleasing to the ear).
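A minimal sketch of that comparison-with-tolerance step, assuming the score and the performance have both already been reduced to (pitch, onset, duration) tuples measured in beats; the tolerance values are arbitrary placeholders:
```python
# Compare a reference score against a performance, both given as lists of
# (midi_pitch, onset_in_beats, duration_in_beats) tuples.
ONSET_TOL = 0.125       # how far off the beat a note may be before it is flagged
DURATION_TOL = 0.25

def evaluate(score, performance):
    problems = []
    remaining = list(performance)
    for pitch, onset, dur in score:
        # Closest unmatched played note with the same pitch, within the onset tolerance.
        candidates = [n for n in remaining
                      if n[0] == pitch and abs(n[1] - onset) <= ONSET_TOL]
        if not candidates:
            problems.append(("missing or wrong pitch", pitch, onset))
            continue
        played = min(candidates, key=lambda n: abs(n[1] - onset))
        remaining.remove(played)
        if abs(played[2] - dur) > DURATION_TOL:
            problems.append(("duration off", pitch, onset))
    problems.extend(("extra note", p, o) for p, o, _ in remaining)
    return problems     # feed these into the notation overlay for highlighting

score = [(60, 0.0, 1.0), (62, 1.0, 1.0), (64, 2.0, 1.0)]
played = [(60, 0.05, 0.9), (61, 1.02, 1.0), (64, 2.4, 1.0)]
print(evaluate(score, played))
```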
As for drawing the notation and placing it on the correct lines or spaces on the staves, that is not hard if you understand how to draw graphics. There are musical fonts available that let you use alphanumeric characters to represent note bodies, stems, rests, etc. You will also have to understand key signatures, accidentals, when certain notes are enharmonic, and so on.
This is not a small undertaking you are proposing, and there is already a lot of software out there that does much of what you are trying to do. Maybe something already exists that does exactly what you want, so do research it before you start coding. :) Look at the work that has already been done and see if there is anything you can use, or anything that would put you off your project.
I made my own keyboard player/recorder for QuickTime's MIDI implementation a few years ago and had to solve a number of the problems you face. I did it for fun, and it was fun (and educational for me), but it could never compete with commercial software in the genre. And although people did enjoy it, I really did not have time to maintain it and add the features people wanted, so eventually I had to abandon it. This kind of thing is really a lot of work.

Word wrap algorithms for Japanese

In a recent web application I built, I was pleasantly surprised when one of our users decided to use it to create something entirely in Japanese. However, the text was wrapped strangely and awkwardly. Apparently browsers don't cope with wrapping Japanese text very well, probably because it contains few spaces, as each character forms a whole word. However, that's not really a safe assumption to make as some words are constructed of several characters, and it is not safe to break some character groups into different lines.
Googling around hasn't really helped me understand the problem any better. It seems to me that one would need a dictionary of unbreakable patterns and assume that everywhere else is safe to break. But I fear I don't know enough about Japanese to really know all the words, which, I understand from some of my searching, can be quite complicated.
How would you approach this problem? Are there any libraries or algorithms you are aware of that already exist that deal with this in a satisfactory way?
Japanese word wrap rules are called kinsoku shori and are surprisingly simple. They're actually mostly concerned with punctuation characters and do not try to keep words unbroken at all.
I just checked with a Japanese novel and indeed, both words in the syllabic kana script and those consisting of multiple Chinese ideograms are wrapped mid-word with impunity.
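A minimal sketch of that punctuation-centric rule in Python; the character sets below are only small samples of the full kinsoku tables, and the "pull the character onto the previous line" strategy is one of several used in practice:
```python
# Wrap at a fixed column, but never start a line with a "not at line start"
# character and never end one with a "not at line end" character.
NOT_AT_LINE_START = set("。、」』）！？ゃゅょっー")   # sample: closing punctuation, small kana
NOT_AT_LINE_END = set("「『（")                      # sample: opening brackets

def wrap_ja(text, width):
    lines, line = [], ""
    for ch in text:
        if (len(line) < width or ch in NOT_AT_LINE_START
                or (line and line[-1] in NOT_AT_LINE_END)):
            line += ch          # normal fill, or pull the character onto the full line
        else:
            lines.append(line)
            line = ch
    if line:
        lines.append(line)
    return lines

print(wrap_ja("彼は「今日はいい天気ですね。」と言った。", 7))
# -> ['彼は「今日はい', 'い天気ですね。」', 'と言った。']
```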
The projects listed below are useful for Japanese word wrapping (or word breaking, looked at from the other direction).
budou (Python): https://github.com/google/budou
mikan (JS): https://github.com/trkbt10/mikan.js
mikan.sharp (C#): https://github.com/YoungjaeKim/mikan.sharp
mikan takes a regex-based approach, while budou uses natural language processing.

extracting a specific melody/beat/rhythm from a specific instrument from a mixed wave (or other music format) file

Is it possible to write a program that can extract the melody/beat/rhythm played by a specific instrument in a wave (or other music format) file made up of multiple instruments?
Which algorithms could be used for this, and which programming language would be best suited to it?
This is a fascinating area. The basic mathematical tool here is the Fourier Transform. To get an idea of how it works, and how challenging it can be, take a look at the analysis of the opening chord to A Hard Day's Night.
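As a tiny illustration of what the Fourier Transform gives you (assuming NumPy), here is how you would look at which frequencies dominate a short chunk of audio; the synthetic two-note "mix" stands in for samples decoded from your wave file:
```python
import numpy as np

sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
# Placeholder mix: A4 (440 Hz) plus a quieter E5 (659 Hz).
samples = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 659 * t)

spectrum = np.abs(np.fft.rfft(samples))
freqs = np.fft.rfftfreq(len(samples), d=1 / sample_rate)

# The strongest bins point at the notes present in this chunk.
print(freqs[np.argsort(spectrum)[-2:]])   # -> [659. 440.]
```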
An instrument produces a sound signature, just the same way our voices do. There are algorithms that can pick a single voice out of a crowd and identify that voice from its signature in a database; this is used in forensics. In exactly the same way, the sound signature of a single instrument can be picked out of a soundscape (such as your mixed wave) and used to pick out a beat, or to copy that instrument onto its own track.
Obviously, if you're thinking about making copies of tracks, i.e. breaking the mixed wave down into a single track per instrument, you're going to be looking at a lot of work. My understanding is that, because of the frequency overlaps between instruments, this isn't going to be straightforward by any means... not impossible, though, as you've already been told.
There's quite an interesting blog post by Comparisonics about sound matching technologies which might be useful as a start for your quest for information: http://www.comparisonics.com/SearchingForSounds.html
To extract the beat or rhythm, you might not need perfect isolation of the instrument you're targeting. A general solution may be hard, but if you're trying to solve it for a particular piece, it may be possible. Try implementing a band-pass filter and see if you can tune it to select the instrument you're after.
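A minimal band-pass sketch using SciPy; the cut-off frequencies are placeholders you would tune to the target instrument's range:
```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass(samples, sample_rate, low_hz, high_hz, order=4):
    # Butterworth band-pass; cut-offs are normalised to the Nyquist frequency.
    nyquist = sample_rate / 2.0
    sos = butter(order, [low_hz / nyquist, high_hz / nyquist],
                 btype="bandpass", output="sos")
    return sosfiltfilt(sos, samples)

# Placeholder usage: keep roughly the range of a bass guitar (~40-400 Hz).
sample_rate = 44100
samples = np.random.randn(sample_rate)        # stand-in for your decoded track
bass_only = bandpass(samples, sample_rate, 40, 400)
```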
Also, I just found this Mac product called PhotoSounder. They have a blog showing different ways it can be used, including isolating an individual instrument (with manual intervention).
Look into Karaoke machine algorithms. If they can remove voice from a song, I'm sure the same principles can be applied to extract a single instrument.
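The classic karaoke trick is centre-channel cancellation: lead vocals are usually mixed equally into both stereo channels, so subtracting one channel from the other largely removes anything panned dead centre. A minimal sketch, assuming the track is already decoded to a float array of shape (num_samples, 2):
```python
import numpy as np

def remove_center(stereo):
    # Anything mixed dead-centre (typically the vocal) cancels out;
    # side-panned instruments remain, though the result is mono.
    return stereo[:, 0] - stereo[:, 1]
```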
Most instruments make sound within certain frequency ranges.
If you write a tunable bandpass filter - a filter that only lets a certain frequency range through - it'll be about as close as you're likely to get. It will not be anywhere near perfect; you're asking for black magic. The only way to perfectly extract a single instrument from a track is to have an audio sample of the track without that instrument, and do a difference of the two waveforms.
C, C++, Java, C#, Python, Perl should all be able to do all of this with the right libraries. Which one is "best" depends on what you already know.
It's possible in principle, but very difficult - an open area of research, even. You may be interested in the project paper for Dancing Monkeys, a step generation program for StepMania. It does some fairly sophisticated beat detection and music analysis, which is detailed in the paper (linked near the bottom of that page).
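For the beat-detection side specifically, a common starting point is an energy-based onset detector: chop the signal into short frames and flag frames whose energy jumps well above the recent average. A rough sketch; the frame size, threshold and history length are placeholders to tune:
```python
import numpy as np

def onset_frames(samples, frame_size=1024, threshold=1.5, history=43):
    # Energy per frame, then flag frames whose energy exceeds `threshold` times
    # the average of the previous `history` frames (~1 second at 44.1 kHz).
    n_frames = len(samples) // frame_size
    energy = np.array([np.sum(samples[i * frame_size:(i + 1) * frame_size] ** 2)
                       for i in range(n_frames)])
    onsets = []
    for i in range(history, n_frames):
        local_avg = energy[i - history:i].mean()
        if local_avg > 0 and energy[i] > threshold * local_avg:
            onsets.append(i)
    return onsets   # frame indices; multiply by frame_size / sample_rate for seconds
```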
