I read about NSSpeechRecognizer and found that it can recognize a set of commands associated with it, reporting them through the delegate method -speechRecognizer:didRecognizeCommand:
I have a simple question: can this delegate method be called for any word spoken by the user? As I understand it, only a finite number of words can be associated with it.
Thanks,
Miraaj
It's exactly what it says on the tin: It's for recognizing commands. So, yes, you need to tell it up front what commands it should recognize.
It's not a dictation API. I would guess that if you tried to load up the command list with an English dictionary, you'd make recognition very processor-intensive, slow, and inaccurate.
If you want dictation, you should file an enhancement request to ask for it.
What's a good way to parse HTML in AppleScript?
I haven't dabbled in AppleScript in quite some time, and even when I did it was very minimal and uninvolved, so I don't really think naturally in the language quite yet. But I need to do some string manipulation and parse some HTML (basically some simple screen scraping).
Naturally, I'd like to avoid common pitfalls of HTML parsing. However, this is a temporary script and doesn't need to be particularly robust or supportable. I really just need to scrape specific substrings (from a known starting substring to the next known character) into a file.
I've done plenty of string manipulation in C# and similar languages, but AppleScript is an interesting change of pace to say the least. Can somebody point me to some good resources (Google searches on this subject seem to have a high noise-to-signal ratio), or help me out with some sample code snippets?
The ultimate goal of what I'm doing is to take a pre-determined list of pages, open each one in Safari (I'm doing everything through tell application "Safari"), parse out links which fit a certain pattern, and store all of those links in a file. Then go through that file, open each of those links, parse out more links which fit another pattern, and store all of those links in a file.
(The site is actually owned by someone we're working with, so don't worry about me violating any terms of service or anything like that. But for reasons outside the scope of this question, I'm doing some page scraping in AppleScript.)
I can't say enough good things about Matt Neuburg's AppleScript: the Definitive Guide. Without a doubt the most complete documentation of AppleScript ever done. Matt's also one of my favorite tech writers.
I would also check out this article. It contains a tutorial on how to do this; the example provided there parses HTML data from only one source, but I think it's worth looking at.
When you search for something on Google and misspell a word (maybe by mistake, or maybe because you really mean this non-dictionary word), Google says:
"Showing results for ..... Search instead for .......".
I am trying to figure out how this would work.
This basically means being able to find the closest dictionary word to the non-dictionary word entered. How does it work? One way I can guess is:
count the number of instances of each character, then scan the dictionary for a word with the same number of instances of each character (allowing a difference of only ±1). But this will also return anagrams.
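To make the anagram problem concrete, here is a minimal Python sketch of the character-count idea. The function names and the tiny word list are made up for illustration; this is only the guess described above, not how any real search engine works.

```python
from collections import Counter

def counts_match(a, b, tolerance=1):
    """True if every character's count differs by at most `tolerance`."""
    ca, cb = Counter(a), Counter(b)
    # Counter returns 0 for characters that are missing from a word.
    return all(abs(ca[ch] - cb[ch]) <= tolerance for ch in set(ca) | set(cb))

def close_by_counts(word, dictionary, tolerance=1):
    """Return dictionary words whose character counts are close to `word`."""
    return [w for w in dictionary if counts_match(word, w, tolerance)]

# Hypothetical dictionary, for illustration only.
words = ["listen", "silent", "google", "banana"]
matches = close_by_counts("lisetn", words)
```

With this word list, `matches` contains both "listen" and "silent", because the check ignores character order entirely.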
Is some kind of probabilistic model, such as a Markov model, of any use here? I don't understand Markov models well enough to throw the term around, so this is just a very wild guess.
Any insights?
You're forgetting that Google has a lot more information available to it than you do. They track when people type in a word, don't select a result, and then do another search shortly afterwards. They then use this information to suggest better searches for you.
See How does the Google "Did you mean?" Algorithm work? for a fuller explanation.
Note that this approach makes sense when you consider that Google aren't actually doing spell-checking. Instead, they are trying to work out what search term will give you the answer you are looking for. Obviously there is a lot of overlap between this and spell-checking, but it means they are not always trying to correct a search for, e.g., "Flickr".
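As a rough illustration of the idea (not Google's actual implementation, which isn't described here), the following Python sketch counts which query people most often type after abandoning another one. The session tuple layout `(query, clicked, next_query)` and the function name are assumptions made for the example.

```python
from collections import Counter, defaultdict

def build_suggestions(sessions):
    """Map each abandoned query to its most common follow-up query.

    `sessions` is an iterable of (query, clicked, next_query) tuples;
    this record layout is hypothetical.
    """
    followups = defaultdict(Counter)
    for query, clicked, next_query in sessions:
        # Treat "no click, then a different query" as a reformulation.
        if not clicked and next_query and next_query != query:
            followups[query][next_query] += 1
    return {q: counts.most_common(1)[0][0] for q, counts in followups.items()}

# Made-up log entries, for illustration only.
log = [
    ("flikr", False, "flickr"),
    ("flikr", False, "flickr"),
    ("flikr", False, "flicker"),
    ("flickr", True, None),
]
suggestions = build_suggestions(log)
```

Because "flickr" itself led to a click, it never gets a suggestion, which fits the point that the goal is a useful search term rather than a dictionary spelling.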
When your search is related to earlier searches that were close to yours and returned more results, Google suggests those searches.
It is not really spell-checking; it shows what other people queried for related keywords.
Edit: Turns out, I was misled during my initial explorations of the accessibility APIs. Once I found the secure text field in the AX hierarchy, I was easily able to set the value. Not sure what to do with this question beyond that, but I wanted to update this for future searchers.
I'm working on some code that will post keyboard events to targeted applications using the Accessibility APIs. So far, I have been able to write a trivial app that allows me to type in a string value and then post keyboard events with those key codes to the targeted application. In reality, the strings would be read from another location.
What I have not yet been able to figure out is how to ascertain whether and which modifier keys should also be posted. For instance, when I type Hello, world! into my test application, the input is sent to the other application as hello, world1 because I am not yet including the modifier keys to create the upper case H and the exclamation point. This is made doubly complicated by multi-keystroke characters like é or ü. Sending é sends a raw e with no accent for example.
Is there a simple method I am overlooking for discerning the modifiers to combine with a keycode for creating a particular NSString or unichar? If not, does anyone have a suggestion of how to proceed? So far, the best I have come up with is calling UCKeyTranslate with all possible modifier combinations until I find one that matches the unichar I get using -[NSString characterAtIndex:]. I'm not sure this is scalable or reliable, though, given the multi-keystroke nature of some characters as noted above.
Thanks in advance!
This probably won't help. But just in case: Is it really necessary to send keyboard events? Because that is going to get really difficult if you need to support, say, Kotoeri.
It's a simple matter to override insertText: and doCommandBySelector: and send the results of the key sequence, rather than the individual keystrokes.
I have found an example which does the trick, but it's incomplete. It will not be a general solution in any case: how can it handle multiple keyboard layouts?
There is an obsolete Quartz function for this, CGPostKeyboardEvent (I'm not sure whether it's possible to pass only the character). Maybe it can still be used, although it is marked as undocumented and has some side effects.
EDIT: Using UCKeyTranslate to build a dictionary is interesting, but how does the OS do this? A better answer must be hidden somewhere!
I have a list of airport names, and my users can enter an airport name to select it for further processing.
How would you handle misspelled names and present a list of suggestions?
Look up Levenshtein distances to match a correct name against a given user input.
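A minimal Python sketch of that suggestion, assuming a small in-memory list of names; the `suggest` helper, the `limit` parameter, and the sample airports are illustrative, not part of any library.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute if different
            ))
        previous = current
    return previous[-1]

def suggest(user_input, names, limit=3):
    """Return the `limit` names closest to the user's input."""
    query = user_input.casefold()
    return sorted(names, key=lambda n: levenshtein(query, n.casefold()))[:limit]

# Sample data for illustration only.
airports = ["Heathrow", "Gatwick", "Schiphol", "Changi"]
best = suggest("Heathrw", airports, limit=1)
```

This compares the input against every name, which is fine for a list of a few thousand airports; for much larger lists you would want an index of some kind.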
http://norvig.com/spell-correct.html
It does something like Levenshtein but, because he doesn't go all the way, it's more efficient.
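The core trick in that article is to generate every string one edit away from the input and keep only those that are known words. Below is a pared-down Python sketch in that style. It leaves out the word-frequency ranking the article uses, assumes lowercase a-z names only, and the `candidates` helper and sample set are made up for the example.

```python
import string

def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def candidates(word, known):
    """Known names equal to the input or exactly one edit away from it."""
    word = word.lower()
    if word in known:
        return {word}
    return edits1(word) & known

# Sample data for illustration only.
known_airports = {"heathrow", "gatwick", "changi"}
found = candidates("gatwcik", known_airports)
```

The cost depends on the length of the input rather than the size of the dictionary, since the only dictionary operation is a set intersection.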
Employ spell check in your code. The list of words should contain only correct spellings of airports.
This is not a great way to do this. You should either go for a control that provides auto complete option or a drop down as someone else suggested.
Use AJAX if your technology supports it.
I know it's not what you asked, but if this is an application where getting the right airport is important (e.g. booking tickets) then you might want to have a confirmation stage to make sure you have the right one. There have been cases of people getting tickets for the wrong Sydney, for instance.
It may be better to let the user select from the list of airport names instead of letting them type in their own. No mistakes can be made that way.
While it won't help right away, you could keep track of typos, and see which name they finally enter when a correct name is entered. That way you can track most common typos, and offer the best options.
Adding to Kevin's suggestion, it might be the best of both worlds if you use an input box with JavaScript autocomplete, such as jQuery Autocomplete.
edit: danish beat me :(
There may be an existing spell-check library you can use. The code to do this sort of thing well is non-trivial. If you do want to write this yourself, you might want to look at dictionary tries.
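For a feel of what a dictionary trie looks like, here is a small Python sketch that stores names character by character and lists everything under a typed prefix. The class and method names are my own, and it only does prefix lookup, not fuzzy matching.

```python
class Trie:
    """A minimal character trie supporting prefix lookup."""

    def __init__(self):
        self.children = {}
        self.is_word = False

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def with_prefix(self, prefix):
        """Return all stored words starting with prefix, sorted."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results = []
        stack = [(node, prefix)]
        while stack:  # iterative depth-first walk below the prefix node
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)

# Sample data for illustration only.
trie = Trie()
for name in ["sydney", "syracuse", "seattle"]:
    trie.insert(name)
hits = trie.with_prefix("sy")
```

A spell checker built on a trie would walk the same structure while allowing a bounded number of mismatches, which is where the non-trivial part comes in.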
One method that may work is to just generate a huge list of possible error words and their corrections (here's an implementation in Python), which you could cache for greater performance.
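A toy Python version of that precomputation might look like the following. To keep the table small it only generates single-character deletions and adjacent swaps, which is an assumption of this sketch; the helper names and sample airports are also invented for the example.

```python
def variants(word):
    """Misspellings made by dropping one character or swapping two neighbours."""
    deletes = {word[:i] + word[i + 1:] for i in range(len(word))}
    swaps = {
        word[:i] + word[i + 1] + word[i] + word[i + 2:]
        for i in range(len(word) - 1)
    }
    return deletes | swaps

def build_table(names):
    """Precompute a lookup from (mis)spelling to the set of correct names."""
    table = {}
    for name in names:
        key = name.lower()
        table.setdefault(key, set()).add(name)  # the correct spelling itself
        for wrong in variants(key):
            table.setdefault(wrong, set()).add(name)
    return table

# Sample data for illustration only; build once and cache the result.
table = build_table(["Gatwick", "Changi"])
corrections = table.get("gatwik", set())
```

Lookups are then a single dictionary access, at the cost of memory that grows with both the number of names and the number of error types you generate.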
I want to write a Songbird extension that binds the multimedia keys available on all Apple Mac OS X platforms. Unfortunately this isn't an easy Google search and I can't find any docs.
Can anyone point me to resources on accessing these keys or tell me how to do it?
I have extensive programming experience, but this will be my first time coding in both MacOSX and XUL (Firefox, etc), so any tips on either are welcome.
Please note that these are not regular key events. I assume it must be a different type of system event that I will need to hook or subscribe to.
This blog post has a solution:
http://www.rogueamoeba.com/utm/posts/Article/mediaKeys-2007-09-29-17-00.html
You basically need to subclass NSApplication and override sendEvent,
looking for special scan codes. I don't know what songbird is, but if it's
not a real application then I doubt you'll be able to do this.
Or maybe you can, a simple category may suffice:
@implementation NSApplication (WantMediaKeysCategoryKBye)
- (void)sendEvent:(NSEvent *)event
{
    // intercept media keys here
}
@end
Are you sure your multimedia keys are working in your installation? Every single key generates a scan code which is translated into a key code by the kernel. If xev doesn't show you any keycodes I guess those scan codes aren't mapped and so the kernel has no knowledge of them.
http://gentoo-wiki.com/HOWTO_Use_Multimedia_Keys has a nice explanation of finding key codes and offers help on how you can find raw scan codes and translate them into key codes.
xev might help you if you want to find out which codes are being sent by multimedia keys.