Import Subtitles in Processing

Import Subtitles in Processing - processing

I am working on this sketch on Processing which gets the videofeed from my webcam/smartphone and shows it when running. I want to import an .srt converted to txt subtitles file from a film to it. I can see that the text file has all these numbers that stand for the start and end subtitle frame before the actual text.
Here's an example:
{9232}{9331}Those swings are dangerous.|Stay off there. I haven't fixed them yet.
{9333}{9374}I think you're gonna live.
What I would like to do is to figure outa code that will
use the numbers and set them as start/end frames to run at the right time as in the film
display the subtitles
figure out how the '|'sign can be used as a symbol to trigger in the script the change of line.
I guess that might already be quite complicated but I just wanted to check whether someone has done anything similar in the past..
I guess what I want to do is save me from doing the whole
if ((current_frame > 9232) && ((current_frame < 9331)) {
text("Those swings are dangerous.", 200, 500/2);
text("Stay off there. I haven't fixed them yet..", 200, (500/2 + 35));
}
thing for each subtitle...
I am quite new to processing so not that familiar with many commands apart from 'for' and 'if', a newbie at importing .txt files and an ignoramus in working with arrays. But I really want to find a nice way in the last two bits..
Any help in any form will be greatly appreciated :)
Cheers,
George

For displaying the appropriate subtitle, you could do something like the following (explanation below, sorry in advance for the wall of text):
String[] subtitles = loadStrings("subtitles.txt");
int currentFrame = 0;
int subtitleIndex = -1;
int startFrame = -1, endFrame = -1;
int fontSize = 10; //change to suit your taste
String[] currentSubtitle;
...
//draw loop start:
//video drawing code goes here
if(currentFrame > endFrame){ //update which subtitle is now/next
subtitleIndex++;
startFrame = int(subtitles[subtitleIndex].split("\\}\\{")[0].substring(1));
endFrame = int(subtitles[subtitleIndex].split("\\}\\{")[1].split("\\}")[0]);
currentSubtitle = subtitles[subtitleIndex].split("\\}")[2].split("\\|");
}
if(currentFrame >= startFrame && currentFrame <= endFrame){
for(int i = 0; i < currentSubtitle.length; i++){
text(currentSubtitle[i], width/2, height - fontSize * (currentSubtitle.length - i));
}
}
currentFrame++;
//draw loop end
Probably that looks pretty intimidating to you, so here's some walk-through commentary. Your program will be a type of state machine. It will either be in the state of displaying a subtitle, or not. We'll keep this in mind later when we're designing the code. First, you need to declare and initialize your variables.
The first line uses the loadStrings() function, which reads through a text file and returns a String array where each element in the array is a line in the file. Of course, you'll need to change the filename to fit your file.
Your code uses a variable called current_frame, which is a very good idea, but I've renamed it to currentFrame to fit the java coding convention. We'll start at zero, and later on our code will increment it on every frame display. This variable will tell us where we are in the subtitle sequence and which message should be displayed (if any).
Because the information for what frame each subtitle starts and ends on is encoded in a string, it's a bit tricky to incorporate it into the code. For now, let's just create some variables that represent when the "current" subtitle-- the subtitle that we're either currently displaying or will be displaying next-- starts and ends. We'll also create an index to keep track of which element in the subtitles array is the "current" subtitle. These variables all start at -1, which may seem a bit odd. Whereas we initialized currentFrame to 0, these don't really have a real "initial" value, at least not for now. If we chose 0, then that's not really true, because the first subtitle may not (probably doesn't) begin and end at frame 0, and any other positive number doesn't make much sense. -1 is often used as a dummy index that will be replaced before the variable actually gets used, so we'll do that here, too.
Now for the final variable: currentSubtitle. The immediate thought would be to have this be a plain String, not a String array. However, because each subtitle may need to be split on the pipe (|) symbols, each subtitle may actually represent several lines of text, so we'll create an array just to be safe. It's possible that some subtitles may be a single-element array, but that's fine.
Now for the hard part!
Presumably, your code will have some sort of loop in it, where on each iteration the pertinent video frame is drawn to the screen and (if the conditions are met), the subtitle is drawn over top of it. I've left out the video portion, as that's not part of your question.
Before we do anything else, we need to remember that some of our variables don't have real values yet-- all those -1s from before need to be set to something. The basic logic of the drawing loop is 1) figure out if a subtitle needs to be drawn, and if so, draw it, and 2) figure out if the "current" subtitle needs to be moved to the next one in the array. Let's do #2 first, because on the first time through the loop, we don't know anything about it yet! The criterion (in general) for moving to the next subtitle is if we're past the end of the current one : currentFrame > endFrame. If that is true, then we need to shift all of our variables to the next subtitle. subtitleIndex is easy, we just add one and done. The rest are... not as easy. I know it looks disgusting, but I'll talk about that at the end so as to not break the flow. Skip ahead to the bottom if you just can't wait :)
After (if necessary) changing all of the variables so that they're relevant to the current subtitle, we need to do some actual displaying. The second if statement checks to see if we're "inside" the frame-boundaries of the current subtitle. Because the currentSubtitle variable can either refer to the subtitle that needs to be displayed RIGHT NOW, or merely just the next one in the sequence, we need to do some checking to determine which one it is for this frame. That's the second if statement-- if we're past the start and before the end, then we should be displaying the subtitle! Recall that our currentSubtitle variable is an array, so we can't just display it directly. We'll need to loop through it and display each element on a separate line. You mentioned the text() command, so I won't go too in depth here. The tricky bit is the y-coordinate of the text, since it's supposed to be on multiple lines. We want the first element to be above the second, which is above the third, etc. To do that, we'll have the y-coordinate depend on which element we're on, marked by i. We can scale the difference between lines by changing the value of fontSize; that'll just be up to your taste. Know that the number you set it to will be equal to the height of a line in pixels.
Now for the messy bit that I didn't want to explain above. This code depends on String's split() method, which is performed on the string that you want split and takes a string as a parameter that instructs it how to split the string-- a regex. To get the startFrame out of a subtitle line in the file, we need to split it along the curly braces, because those are the dividers between the numbers. First, we'll split the string everywhere that "}{" occurs-- right after the first number (and right before the second). Because split() returns an array, we can reference a single string from it using an index between square braces. We know that the first number will be in the first string return by splitting on "}{", so we'll use index 0. This will return (for example) "{1234", because split() removes the thing you're splitting on. Now we need to just take the substring that occurs after the first character, convert it to an int using int(), and we're done!
For the second number, we can take a similar approach. Let's split on "}{" again, only we'll take the second (index 1) element in the returned array this time. Now, we have something like "9331}Those swings are dang...", which we can split again on "}", choose the first string of that array, convert to int, and we're done! In both of these cases, we're using subtitles[subtitleIndex] as the original String, which represents the raw input of the file that we loaded using loadStrings() at the beginning. Note that during all this splitting, the original string in subtitles is never changed-- split(), substring(), etc only return new sequences and don't modify the string you called it on.
I'll leave it to you to figure out how the last line in that sequence works :)
Finally, you'll see that there are a bunch of backslashes cluttering up the split() calls. This is because split() takes in a regex, not a simple string. Regexs use a lot of special notation which I won't get into here, but if you just passed split() something like "}{", it would try to interpret it and it would not behave as expected. You need to escape the characters, telling split() that you don't want them to be interpreted as special and you just want the characters themselves. To do that, you use a backlash before any character that needs to be escaped. However, the backslash itself is yet another special character, so you need to escape it, too! This results in stuff like "\\{" -- the first backslash escapes the second one, which escapes whatever the third character is. Note that the | character also needs to be escaped.
Sorry for the wall of text! It's nice to see questions asked intelligently and politely, so I thought I'd give a good answer in return.

Related

custom array printing in gdb

I know gdb has several means of exploring data, some of them quite convenient. However, I cannot combine them to get that I need/want. I would like to display some custom string based on the first n values of a big array starting at <PT_arr>, and the last m values of the same array at a distance (in this case) 4096. Looking something like this:
table beginning:
0x804cfe0 <PT_arr>: 0x00100300 0x00200300 0x00300300 0x00400300
table end:
0x804cfe0 <PT_arr+4064>: 0x00500300 0x00600300 0x00700300 0x00800300
printf let's me add custom text (like table beginning)
the examine x gives me that nice alignment, let's me read many elements and group them by byte, words, etc; and shows addresses at the left (which is ideal for my case).
x aligns the content of regions of memory in an easy to read manner with the size and unit parameters. (what I want)
display is constantly printing. (what I want).
The issue with display (manual), is that unlike examine x (manual) it doesn't have a size or unit parameter.
Is there a way to accomplish that?
Thanks.

How to program faster, (generate code from a pattern?)

I frequently run into problems that could be solved with automating code writing, but aren't long enough to justify it as tediously entering each piece is faster.
Here is an example:
Putting lists into dictionaries and things like this. Converting A into B.
A
hotdog HD
hamburger HB
hat H
B
def symbolizeType
case self.type
when "hotdog"
return "HD"
when "hamburger"
return "HB"
when "hat"
return "H"
end
Sure I could come up with something to do this automatically, but it would only make sense if the list was 100+ items long. For a list of 10-20 items, is there a better solution than tediously typing? This is a Ruby example, but I typically run into cases like this all the time. Instead of a case statement, maybe it's a dictionary, maybe it's a list, etc.
My current solution is a python template with the streaming input and output already in place, and I just have to write the parsing and output code. This is pretty good, but is there better? I feel like this would be something VIM macro would excel at, but I'm that experienced with VIM. Can VIM do this easily?

For vim, it'd be a macro running over a list of space separated pairs of words, inserting the first 'when "' bit, the long form word 'hotdog', the ending quote, a newline and 'return "', and then the abbreviation and then final quote, then going back to the list and repeating.
Starting with a register w of:
when "
register r of:
return "
an initial list of:
hotdog HD
hamburger HB
hat H
and a starting file of:
def symbolizeType
case self.type
"newline here"
you can use the following macro at the start of the initial list:
^"ayeeeb"byeo"wp"apa"^Mrb"j
where ^M is a newline.

I do this frequently, and I use a single register and a macro, so I'll share.
Simply pick a register, record your keystrokes, and then replay your keystrokes from the register.
This is a long explanation, but the process is extremely simple and intuitive.
Here are the steps that I would take:
A. The starting text
hotdog HD
hamburger HB
hat H
B. Insert the initial, non-repetitive lines preceding the text to transform
def symbolizeType
case self.type
hotdog HD
hamburger HB
hat H
C. Transform the first line, while recording your keystrokes in a macro
This step I'll write out in detailed sub-steps.
Place the cursor on the first line to transform ("hotdog") and type qa to begin recording your keystrokes as a macro into register a.
Type ^ to move the cursor to the start of the line
Type like you normally would to transform the line to what you want, which for me comes out looking like the following macro
^i^Iwhen "^[ea"^[ldwi^M^Ireturn "^[ea"^[j
Where ^I is Tab, ^[ is Esc, and ^M is Enter.
After the line is transformed to your liking, move your cursor to the next line that you want to transform. You can see this in the macro above with the final j at the end.
This will allow you to automatically repeat the macro while it cycles through each repetitive line.
Stop recording the macro by typing q again.
You can then replay the macro from register a as many times as you like using a standard vim count prefix, in this case two consecutive times starting from the next line to transform.
2#a
This gives the following text
def symbolizeType
case self.type
when "hotdog"
return "HD"
when "hamburger"
return "HB"
when "hat"
return "H"
D. Finally, insert the ending non-repetitive text
def symbolizeType
case self.type
when "hotdog"
return "HD"
when "hamburger"
return "HB"
when "hat"
return "H"
end
Final Comments
This works very quick for any random, repetitive text, and I find it very fluent.
Simply pick a register, record your keystrokes, and then replay your keystrokes from the register.

For things like this I have a few ways of making it easier. One is to use an editor like Sublime Text that allows you to multi-edit a number of things at once, so you can throw in markup with a few keystrokes and convert that into a Hash like:
NAME_TO_CODE = {
hotdog: 'HD',
hamburger: 'HB',
hat: 'H'
}
Not really a whole lot changed there. Your function looks like:
def symbolize_type(type)
NAME_TO_CODE[type.to_sym]
end
Defining this as a data structure has the bonus of being able to manipulate it:
CODE_TO_NAME = NAME_TO_CODE.invert
Now you can do this:
def unsymbolize_type(symbol)
CODE_TO_NAME[symbol.to_s]
end
You can also get super lazy and just parse it on the fly:
NAME_TO_CODE = Hash[%w[
hotdog HD
hamburger HB
hat H
].each_slice(2).to_a]

snippets are like the built-in :abbreviate on steroids, usually with parameter insertions, mirroring, and multiple stops inside them. One of the first, very famous (and still widely used) Vim plugins is snipMate (inspired by the TextMate editor); unfortunately, it's not maintained any more; though there is a fork. A modern alternative (that requires Python though) is UltiSnips. There are more, see this list on the Vim Tips Wiki.
There are three things to evaluate: First, the features of the snippet engine itself, second, the quality and breadth of snippets provided by the author or others; third, how easy it is to add new snippets.

confirm conditional statement applies to >0 observations in Stata

This is something that has puzzled me for some time and I have yet to find an answer.
I am in a situation where I am applying a standardized data cleaning process to (supposedly) similarly structured files, one file for each year. I have a statement such as the following:
replace field="Plant" if field=="Plant & Machinery"
Which was a result of the original code-writing based on the data file for year 1. Then I generalize the code to loop through the years of data. The problem becomes if in year 3, the analogous value in that variable was coded as "Plant and MachInery ", such that the code line above would not make the intended change due to the difference in the text string, but not result in an error alerting the change was not made.
What I am after is some sort of confirmation that >0 observations actually satisfied the condition each instance the code is executed in the loop, otherwise return an error. Any combination of trimming, removing spaces, and standardizing the text case are not workaround options. At the same time, I don't want to add a count if and then assert statement before every conditional replace as that becomes quite bulky.
Aside from going to the raw files to ensure the variable values are standardized, is there any way to do this validation "on the fly" as I have tried to describe? Maybe just write a custom program that combines a count if, assert and replace?

The idea has surfaced occasionally that replace should return the number of observations changed, but there are good reasons why not, notably that it is not a r-class or e-class command any way and it's quite important not to change the way it works because that could break innumerable programs and do-files.
So, I think the essence of any answer is that you have to set up your own monitoring process counting how many values have (or would be) changed.
One pattern is -- when working on a current variable:
gen was = .
foreach ... {
...
replace was = current
replace current = ...
qui count if was != current
<use the result>
}

Parsing text files in Ruby when the content isn't well formed

I'm trying to read files and create a hashmap of the contents, but I'm having trouble at the parsing step. An example of the text file is
put 3
returns 3
between
3
pargraphs 1
4
3
#foo 18
****** 2
The word becomes the key and the number is the value. Notice that the spacing is fairly erratic. The word isn't always a word (which doesn't get picked up by /\w+/) and the number associated with that word isn't always on the same line. This is why I'm calling it not well-formed. If there were one word and one number on one line, I could just split it, but unfortunately, this isn't the case. I'm trying to create a hashmap like this.
{"put"=>3, "#foo"=>18, "returns"=>3, "paragraphs"=>1, "******"=>2, "4"=>3, "between"=>3}
Coming from Java, it's fairly easy. Using Scanner I could just use scanner.next() for the next key and scanner.nextInt() for the number associated with it. I'm not quite sure how to do this in Ruby when it seems I have to use regular expressions for everything.

I'd recommend just using split, as in:
h = Hash[*s.split]
where s is your text (eg s = open('filename').read. Believe it or not, this will give you precisely what you're after.
EDIT: I realized you wanted the values as integers. You can add that as follows:
h.each{|k,v| h[k] = v.to_i}

Selecting random phrase from a list

I've been playing around with a .lua file which passes a random phrase using the following line:
SendChatMessage(GetRandomArgument("text1", "text2", "text3", "text4"), "RAID")
My problem is that I have a lot of phrases and the one line of code is very long indeed.
Is there a way to hold
text1
text2
text3
text3
in a list somewhere else in the code (or externally) and call a random value from the main code. Would make maintaining the list of text options easier.

For lists up to a few hundred elements, then the following will work:
messages = {
"text1",
"text2",
"text3",
"text4",
-- ...
}
SendChatMessage(GetRandomArgument(unpack(messages)), "RAID")
For longer lists, you would be well served to replace GetRandomArgument with GetRandomElement that would take a single table as its argument and return a random entry from the table.
Edit: Olle's answer shows one way that something like GetRandomElement might be implemented. But it used table.getn on every call which is deprecated in Lua 5.1, and its replacement (table.maxn) has a runtime cost proportional to the number of elements in the table.
The function table.maxn is only required if the table in use might have missing elements in its array part. However, in this case of a list of items to choose among, there is likely to be no reason to need to allow holes in the list. If you need to edit the list at run time, you can always use table.remove to remove an item since it will also close the gap.
With a guarantee of no gaps in the array of text, then you can implement GetRandomElement like this:
function GetRandomElement(a)
return a[math.random(#a)]
end
So that you send the message like this:
SendChatMessage(GetRandomElement(messages), "RAID")

You want a table to contain your phrases like
phrases = { "tex1", "text2", "text3" }
table.insert(phrases ,"text4") -- alternative syntax
SendChatMessage(phrases[math.random(table.getn(phrases))], "RAID")
Note: getn gets the size of the table; math.random gets a random number (with a max of the size of the phrases table) and the phrases[] syntax returns the table element at the index inside [].

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Import Subtitles in Processing - processing

Related

custom array printing in gdb

How to program faster, (generate code from a pattern?)

confirm conditional statement applies to >0 observations in Stata

Parsing text files in Ruby when the content isn't well formed

Selecting random phrase from a list

Categories

Resources