Publish game in portuguese language on AirConsole - airconsole

Is it possible to publish a game in Portuguese on AirConsole, or is it required to be in English?

Only a small percentage of AirConsole users speak Portuguese. The primary language on AirConsole is English, so games should also be available in English to reach as many players as possible.
Offering the game in another language as a second option is of course no problem.


Speech recognition, such as Siri

Software such as Siri takes voice commands and responds to those questions appropriately (98%). I want to know: when we write software that takes an input stream of voice signals and responds to those questions,
do we need to convert the input into a human-readable language, such as English?
In nature we have so many different languages, but when we speak, we basically just make different noises. That's it. However, we have created the so-called alphabet to denote those noise variations.
So, again, my question is: when we write speech recognition algorithms, do we match those noise-variation signals against our database, or do we first convert those noise variations into English and then look up the answer in the database?
The "noise variation signals" you are referring to are called phonemes. How a speech recognition system translates these phonemes into words depends on the type of system. Siri is not a grammar-based system, where you tell the speech recognition system what types of phrases you are expecting based on a set of rules. Since Siri translates speech in an open context, it probably uses some type of statistical modeling. A popular statistical model for speech recognition today is the Hidden Markov Model. While there is a database of sorts involved, it is not a simple lookup of groups of phonemes to words. There is a pretty good high-level description of the process and issues with translation here.
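To make the Hidden Markov Model idea a bit more concrete, here is a minimal sketch of Viterbi decoding, the standard way to find the most likely hidden-state sequence in an HMM. Everything in the toy model (the three phoneme states, the frame labels "burst", "voiced", "silence", and all probabilities) is invented for illustration; it says nothing about how Siri or any real recognizer is configured.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))


# Hypothetical toy model: hidden states are phonemes, observations are
# coarse labels for short audio frames. All numbers are made up.
STATES = ["k", "ae", "t"]
START_P = {"k": 0.8, "ae": 0.1, "t": 0.1}
TRANS_P = {
    "k": {"k": 0.3, "ae": 0.6, "t": 0.1},
    "ae": {"k": 0.1, "ae": 0.4, "t": 0.5},
    "t": {"k": 0.1, "ae": 0.1, "t": 0.8},
}
EMIT_P = {
    "k": {"burst": 0.7, "voiced": 0.1, "silence": 0.2},
    "ae": {"burst": 0.1, "voiced": 0.8, "silence": 0.1},
    "t": {"burst": 0.3, "voiced": 0.1, "silence": 0.6},
}

frames = ["burst", "voiced", "voiced", "silence"]
print(viterbi(frames, STATES, START_P, TRANS_P, EMIT_P))  # ['k', 'ae', 'ae', 't']
```

Note that the decoder never looks up "groups of phonemes" in a table; it scores whole paths through the model, which is why the same frame label can be assigned to different phonemes depending on its neighbours.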
Apple's Siri is based on natural language understanding.
I believe Nuance is behind the scenes. Refer to this article.
Nuance is a leader in speech recognition system development.
The accuracy of the Nuance Dragon engine is just amazing.
The client I am working for uses the Nuance NOD service for their IVR system.
I have tried the Nuance Dragon SDK for Android.
From my experience, if you use Nuance you need not worry about noise variation and so on. But when you go for an enterprise release of your application, Nuance might be costly.
If you are planning to use the power of voice to drive your application, the Google API is also a good choice.
APIs like Sphinx and PocketSphinx can also help with speech application development. All of the above APIs take care of noise rejection, converting speech into text, and so on.
All you need to worry about is building your system to understand the semantic meaning of the given string or recognized speech content. Apple presumably has a very good semantic interpreter. So give the Nuance SDK a try. It is available for Android, iOS, Windows Phone, and as an HTTP client version.
I hope this helps.

Adding Accents to Speech Generation

The first part of this question is now its own, here: Analyzing Text for Accents
Question: How could accents be added to generated speech?
What I've come up with:
I do not mean just accent marks, or inflection, or anything singular like that. I mean something like a full British accent, or a Scottish accent, or Russian, etc.
I would think that this could be done outside of the language as well. Ex: something in Russian could be generated with a British accent, or something in Mandarin could have a Russian accent.
I think the basic process would be this:
Analyze the text
Compare with a database (or something like that) to determine what needs an accent, how strong it should be, etc.
Generate the speech in specified language
Easy with normal text-to-speech processors.
Determine the specified accent based on the analyzed text.
This is the part in question.
I think an array of amplitudes and filters would work best for the next step.
Mesh speech and accent.
This would be the easy part.
It could probably be done by multiplying the speech by the accent, like many other DSP methods do.
This is really more of a general DSP question, but I'd like to come up with a programmatic algorithm to do this instead of a general idea.
This question isn't really "programming" per se: it's linguistics. The programming is comparatively easy. The analysis is going to be really difficult, and in truth you're probably better off getting the user to specify the accent. Or are you going for an automated story reader?
However, a basic accent is doable with modern text-to-speech. Are you aware of the International Phonetic Alphabet?
It basically lists all the sounds a human voice can possibly make. An accent is then just a mapping (A function) from the alphabet to itself. For instance, to make an American accent sound British to an American person (Though not sufficient to make it sound British to a British person), you can de-rhotacise all the "r" sounds in the middle of a word. So for instance the alveolar trill would be replaced with the voiced uvular fricative. (Lots of corner cases to work out just for this).
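As a rough sketch of "an accent is a mapping from the alphabet to itself", the snippet below treats a word as a list of IPA symbols and rewrites it. The rules are deliberately crude assumptions made up for illustration (drop /ɹ/ unless a vowel follows, plus a one-entry substitution table); they are nowhere near a real description of any accent, and the function names are hypothetical.

```python
# A small, incomplete vowel inventory, assumed just for this example.
VOWELS = {"ɑ", "ə", "ɪ", "i", "ɛ", "æ", "ʌ", "u", "ɔ"}


def derhotacise(phonemes):
    """Drop /ɹ/ unless a vowel follows it (a crude non-rhotic rule)."""
    out = []
    for i, p in enumerate(phonemes):
        following = phonemes[i + 1] if i + 1 < len(phonemes) else None
        if p == "ɹ" and following not in VOWELS:
            continue  # the r is silent in this toy accent
        out.append(p)
    return out


def apply_accent(phonemes, substitutions):
    """Map each phoneme through a symbol-to-symbol table; others pass through."""
    return [substitutions.get(p, p) for p in phonemes]


print(derhotacise(["k", "ɑ", "ɹ"]))            # ['k', 'ɑ']
print(derhotacise(["k", "æ", "ɹ", "ə", "t"]))  # ['k', 'æ', 'ɹ', 'ə', 't']
print(apply_accent(["b", "æ", "θ"], {"æ": "ɑ"}))  # ['b', 'ɑ', 'θ']
```

Even this tiny example needs a context-sensitive rule next to the plain table, which is where the corner cases start piling up.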
Long and short: it's not easy, which is probably why no one has done it. I'm sure a couple of linguistics professors out there would say it's impossible. But that's what linguistics professors do. You'll basically need to read several thick textbooks on accents and pronunciation to make any headway with this problem. Good luck!
What is an accent?
An accent is not a sound filter; it's a pattern of acoustic realization of text in a language. You can't take a recording of American English, run it through "array of amplitudes and filters", and have British English pop out. What DSP is useful for is in implementing prosody, not accent.
Basically (and simplest to model), an accent consists of rules for phonetic realization of a sequence of phonemes. Perception of accent is further influenced by prosody and by which phonemes a speaker chooses when reading text.
Speech generation
The process of speech generation has two basic steps:
Text-to-phonemes: Convert written text to a sequence of phonemes (plus suprasegmentals like stress, and prosodic information like utterance boundaries). This is somewhat accent-dependent (e.g. the output for "laboratory" differs between American and British speakers).
Phoneme-to-speech: given the sequence of phonemes, generate audio according to the dialect's rules for phonetic realizations of phonemes. (Typically you then combine diphones and then adjust acoustically the prosody). This is highly accent-dependent, and it is this step that imparts the main quality of the accent. A particular phoneme, even if shared between two accents, may have strikingly different acoustic realizations.
Normally these are paired. While you could have a British-accented speech generator that uses American pronunciations, that would sound odd.
Generating speech with a given accent
Writing a text-to-speech program is an enormous amount of work (in particular, to implement one common scheme, you have to record a native speaker speaking each possible diphone in the language), so you'd be better off using an existing one.
In short, if you want a British accent, use a British English text-to-phoneme engine together with a British English phoneme-to-speech engine.
For common accents like American and British English, Standard Mandarin, Metropolitan French, etc., there will be several choices, including open-source ones that you will be able to modify (as below). For example, look at FreeTTS and eSpeak. For less common accents, existing engines unfortunately may not exist.
Speaking text with a foreign accent
English-with-a-foreign-accent is socially not very prestigious, so complete systems probably don't exist.
One strategy would be to combine an off-the-shelf text-to-phoneme engine for a native accent with a phoneme-to-speech engine for the foreign language. For example, a native Russian speaker that learned English in the U.S. would plausibly use American pronunciations of words like laboratory, and map its phonemes onto his native Russian phonemes, pronouncing them as in Russian. (I believe there is a website that does this for English and Japanese, but I don't have the link.)
The problem is that the result is too extreme. A real English learner would attempt to recognize and generate phonemes that do not exist in his native language, and would also alter his realization of his native phonemes to approximate the native pronunciation. How closely the result matches a native speaker of course varies, but using the pure foreign extreme sounds ridiculous (and mostly incomprehensible).
So to generate plausible American-English-with-a-Russian-accent (for instance), you'd have to write a phoneme-to-speech engine. You could use existing American English and Russian phoneme-to-speech engines as a starting point. If you're not willing to find and record a speaker with that accent, you could probably still get a decent approximation using DSP to combine the samples from those two engines. eSpeak uses formant synthesis rather than recorded samples, so it might be easier to combine information from multiple languages with it.
Another thing to consider is that foreign speakers often modify the sequence of phonemes under influence by the phonotactics of their native language, typically by simplifying consonant clusters, inserting epenthetic vowels, or diphthongizing or breaking vowel sequences.
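A minimal sketch of one such phonotactic adjustment, inserting an epenthetic vowel to break up consonant clusters, might look like this. The five-letter vowel set and the choice of "u" as the inserted vowel are assumptions for the example; a real language would have its own inventory and rules, and word-final consonants are ignored here.

```python
# Toy vowel inventory; a real system would work on proper phoneme symbols.
VOWELS = {"a", "e", "i", "o", "u"}


def break_clusters(phonemes, epenthetic="u"):
    """Insert an epenthetic vowel between any two adjacent consonants."""
    out = []
    for p in phonemes:
        if out and out[-1] not in VOWELS and p not in VOWELS:
            out.append(epenthetic)
        out.append(p)
    return out


print(break_clusters(["s", "t", "r", "i", "t"]))
# ['s', 'u', 't', 'u', 'r', 'i', 't']
```

A step like this would sit between the text-to-phoneme and phoneme-to-speech stages, rewriting the phoneme sequence before any audio is generated.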
There is some literature on this topic.

How to get an application's GUI translated into other languages

I'd like to translate my GUI into other languages. Unfortunately I don't speak Mandarin, Spanish, Arabic, or any common language other than English.
The technical hurdles are no problem... what I'm wondering is: How do you get the actual translations?
Amazon's Mechanical Turk? Google Translate? Pay an actual translation company?
What you're looking for is called "Localization". Generally, you have to do the actual translating part on your own, or find people to do it for you. Translating text is just too ambiguous for most software to do it reliably.
You can try as well. There's quite a bit of translation work going on there really cheap.
Or you can just blog about it, or depending on your application's availability give the users the possibility to contribute voluntarily.
You're going to have to hire a native speaker who speaks your native language as well.
Another thing to consider is that your layout will likely have to change. Buttons that used to be big enough may be too narrow. Users may expect to see things in a right to left fashion and so on. There's a lot involved to do a proper localization.
There are companies that specialize in this kind of thing. Maybe one of them would be the way to go.
If you can afford it, you can hire people who specifically translate software. Search for "software localization" and look at professional localization services companies (I found a few on the first page of Google). They normally have a bunch of translators for dozens of languages on the payroll.
This will cost you though.

Why isn't speech recognition advancing? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
What's so difficult about the subject that algorithm designers are having a hard time tackling it?
Is it really that complex?
I'm having a hard time grasping why this topic is so problematic. Can anyone give me an example as to why this is the case?
Auditory processing is a very complex task. Human evolution has produced a system so good that we don't realize how good it is. If three people are talking to you at the same time, you will be able to focus on one signal and discard the others, even if they are louder. Noise is discarded very well too. In fact, if you hear a human voice played backwards, the first stages of the auditory system will send this signal to a different processing area than they would a real speech signal, because the system will regard it as "no-voice". This is an example of the outstanding abilities humans have.
Speech recognition advanced quickly from the 70s on because researchers were studying the production of voice. This is a simpler system: vocal cords excited or not, resonance of the vocal tract... It is a mechanical system that is easy to understand. The main product of this approach is cepstral analysis. This led automatic speech recognition (ASR) to achieve acceptable results. But it is a sub-optimal approach. Noise separation is quite bad: even if it works more or less in clean environments, it is not going to work with loud music in the background, not the way humans do.
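For readers who haven't met cepstral analysis: the real cepstrum of a frame is the inverse Fourier transform of the log magnitude of its Fourier transform. Here is a minimal standard-library sketch of just that definition. It uses a naive O(n²) DFT for clarity, and the `eps` guard against log(0) is my own assumption; a real front end would use an FFT on short windowed frames and usually further processing on top.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)); fine for tiny frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(
            x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n)
        )
        out.append(total / n if inverse else total)
    return out


def real_cepstrum(frame, eps=1e-12):
    """Inverse DFT of the log magnitude spectrum of one frame."""
    spectrum = dft(frame)
    # eps avoids log(0) for frequency bins with no energy.
    log_magnitude = [math.log(abs(c) + eps) for c in spectrum]
    return [c.real for c in dft(log_magnitude, inverse=True)]
```

A single impulse has a flat spectrum, so its cepstrum is (numerically) all zeros, which makes a handy sanity check. The point of taking the log is that the vocal-cord excitation and the vocal-tract resonances, which are multiplied together in the spectrum, become a sum that is easier to pull apart.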
The optimal approach depends on understanding the auditory system: its first stages in the cochlea, the inferior colliculus... but the brain is involved too. And we don't know that much about this. It is proving to be a difficult paradigm shift.
In a paper, Professor Hynek Hermansky compared the current state of the research with the time when humans wanted to fly. We didn't know what the secret was (the feathers? the flapping wings?) until we discovered Bernoulli's principle.
Because if people find it hard to understand other people with a strong accent why do you think computers will be any better at it?
I remember reading that Microsoft had a team working on speech recognition, and they called themselves the "Wreck a Nice Beach" team (a name given to them by their own software).
To actually turn speech into words, it's not as simple as mapping discrete sounds, there has to be an understanding of the context as well. The software would need to have a lifetime of human experience encoded in it.
This kind of problem is more general than only speech recognition.
It exists also in vision processing, natural language processing, artificial intelligence, ...
Speech recognition is affected by the semantic gap problem:
The semantic gap characterizes the difference between two descriptions of an object by different linguistic representations, for instance languages or symbols. In computer science, the concept is relevant whenever ordinary human activities, observations, and tasks are transferred into a computational representation.
Between an audio wave form and a textual word, the gap is big,
Between the word and its meaning, it is even bigger...
beecos iyfe peepl find it hard to arnerstand uvver peepl wif e strang acsent wie doo yoo fink compootrs wyll bee ani bettre ayt it?
I bet it took you half a second to work out what the hell I was typing, and all I was doing was repeating Simon's answer in a different 'accent'. The processing power just isn't there yet, but it's getting there.
The variety in language would be the predominant factor making it difficult. Dialects and accents make this more complicated. Also, context. The book was read. The book was red. How do you determine the difference? The extra effort needed for this would make it easier to just type the thing in the first place.
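One common way to attack the read/red problem is to score each candidate against its neighbouring words. Here is a minimal sketch using bigram counts; the counts are invented for the example (a real system would estimate them from a large corpus), and `pick_homophone` is a hypothetical helper, not part of any library.

```python
# Made-up bigram counts, standing in for statistics from a real corpus.
BIGRAM_COUNTS = {
    ("was", "read"): 8,
    ("was", "red"): 5,
    ("read", "aloud"): 6,
    ("red", "and"): 7,
    ("read", "and"): 2,
}


def pick_homophone(prev_word, candidates, next_word, counts=BIGRAM_COUNTS):
    """Pick the candidate that fits best between its two neighbours."""

    def score(word):
        # Add-one smoothing so an unseen pair does not zero out the product.
        left = counts.get((prev_word, word), 0) + 1
        right = counts.get((word, next_word), 0) + 1
        return left * right

    return max(candidates, key=score)


print(pick_homophone("was", ["read", "red"], "aloud"))  # read
print(pick_homophone("was", ["read", "red"], "and"))    # red
```

This only resolves the ambiguity when the surrounding words help; "The book was read." and "The book was red." on their own stay ambiguous, which is the point being made here.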
Now, there would probably be more effort devoted to this if it was more necessary, but advances in other forms of data input have come along so quickly that it is not deemed that necessary.
Of course, there are areas where it would be great, even extremely useful or helpful. Situations where you have your hands full or can't look at a screen for input. Helping the disabled etc. But most of these are niche markets which have their own solutions. Maybe some of these are working more towards this, but most environments where computers are used are not good candidates for speech recognition. I prefer my working environment to be quiet. And endless chatter to computers would make crosstalk a realistic problem.
On top of this, unless you are dictating prose to the computer, any other type of input is easier and quicker using keyboard, mouse or touch. I did once try coding using voice input. The whole thing was painful from beginning to end.
Because Lernout&Hauspie went bust :)
(sorry, as a Belgian I couldn't resist)
The basic problem is that human language is ambiguous. Therefore, in order to understand speech, the computer (or human) needs to understand the context of what is being spoken. That context is actually the physical world the speaker and listener inhabit. And no AI program has yet demonstrated a deep understanding of the physical world.
Speech synthesis is very complex by itself - many parameters are combined to form the resulting speech. Breaking it apart is hard even for people - sometimes you mishear one word for another.
Most of the time we humans understand based on context, so that a particular sentence is in harmony with the whole conversation. Unfortunately, computers have a big handicap in this sense: they just try to capture the words, not what's between them.
We would understand a foreigner whose English accent is very poor, maybe by guessing what he is trying to say instead of what he is actually saying.
To recognize speech well, you need to know what people mean - and computers aren't there yet at all.
You said it yourself, algorithm designers are working on it... but language and speech are not algorithmic constructs. They are the peak of the development of the highly complex human system involving concepts, meta-concepts, syntax, exceptions, grammar, tonality, emotions, neuronal as well as hormonal activity, etc.
Language needs a highly heuristic approach, and that's why progress is slow and the prospects are maybe not too optimistic.
I once asked my instructor a similar question; I asked him something like what the challenge is in making a speech-to-text converter. Among the answers he gave, he asked me to pronounce 'p' and 'b'. Then he said that they differ for a very short time at the beginning, and then they sound similar. My point is that it is hard even to recognize what sound is made; recognizing voice would be even harder. Also, note that once you record people's voices, it is just numbers that you store. Imagine trying to find metrics like accent, frequency, and other parameters useful for identifying voice from nothing but input such as matrices of numbers. Computers are good at numerical processing, but voice is not really 'numbers'. You need to encode voice in numbers and then do all the computation on them.
I would expect some advances from Google in the future because of their voice data collection through 1-800-GOOG411
It's not my field, but I do believe it is advancing, just slowly.
And I believe Simon's answer is somewhat correct in a way: part of the problem is that no two people speak alike in terms of the patterns that a computer is programmed to recognize. Thus, it is difficult to analyze speech.
Computers are not even very good at natural language processing to start with. They are great at matching, but when it comes to inferring, it gets hairy.
Then try to figure out the same word from hundreds of different accents/inflections, and it suddenly doesn't seem so simple.
Well I have got Google Voice Search on my G1 and it works amazingly well. The answer is, the field is advancing, but you just haven't noticed!
If speech recognition was possible with substantially less MIPS than the human brain, we really could talk to the animals.
Evolution wouldn't spend all those calories on grey matter if they weren't required to do the job.
Spoken language is context sensitive, ambiguous. Computers don't deal well with ambiguous commands.
I don't agree with the assumption in the question - I have recently been introduced to Microsoft's speech recognition and am impressed. It can learn my voice after a few minutes and usually identifies common words correctly. It also allows new words to be added. It is certainly usable for my purposes (understanding chemistry).
Differentiate between recognising the (word) tokens and understanding the meaning of them.
I don't yet know about other languages or operating systems.
The problem is that there are two types of speech recognition engines. Speaker-trained ones such as Dragon are good for dictation. They can recognize almost any spoken text with fairly good accuracy, but require (a) training by the user, and (b) a good microphone.
Speaker-independent speech rec engines are most often used in telephony. They require no "training" by the user, but must know ahead of time exactly what words are expected. The application development effort to create these grammars (and deal with errors) is huge. Telephony is limited to a 4 kHz bandwidth due to historical limits in our public phone network. This limited audio quality greatly hampers the speech rec engines' ability to "hear" what people are saying. Digits such as "six" or "seven" contain an ssss sound that is particularly hard for the engines to distinguish. This means that recognizing strings of digits, one of the most basic recognition tasks, is problematic. Add in regional accents, where "nine" is pronounced "nan" in some places, and accuracy really suffers.
The best hope is interfaces that combine graphics and speech rec. Think of an iPhone application that you can control with your voice.

What is a good game involving coding?

I remember the days of Shadowrun, which got me excited about hacking. There are CoreWar and LightBot, which are both fun (though CoreWar is a little dated). What other games involving coding are fun and challenging, and can be used to get someone excited about coding, flex their chops, or even learn the basics?
How about RoboCode
You code your tank in Java and let it loose in the 'ring' with other coded tanks. People got pretty into coding strategy, targeting, etc. IBM sponsored it and came up with some nice introductory programming tutorials to get you started.
Here's a great article to get the feel for it:
Rock 'em, sock 'em Robocode!
Uplink isn't so much a coding game, but it is a great game that makes you feel like a hacker.
There's a whole bunch of "drag-and-drop" coding games, where you make a little thing (usually a robot) solve some puzzle by giving it a list of instructions. They're only vaguely similar to actual coding, but they are still pretty fun.
The Codex of Alchemical Engineering
Not sure if it's considered a "game", but the TopCoder Competitions are fun, and come in various sizes and commitment levels. You can also work on puzzles from the archives for some good programming practice.
The Python Challenge is like those "look at the html source" riddles, but requires a bit of programming to get the answers.
When I was a kid I played "Rocky's Boots", where you had to hook up logic gates to solve puzzles. That had a big impact on my thinking.
Core Wars.
Here's something that allows you to make games and animations: Alice
If you're looking for a board game, you might want to have a look at Robo Rally. In this game 2-8 people try to maneuver their robots over the board as quickly as possible, dodging deadly obstacles and trying to shove other people's robots into obstacles on the way.
Each game round all players have to "code" the program the robot is going to execute in the next round and then the robots just follow their program. The programs are just five instructions long, but still creating an optimal program can be quite tricky. There usually is very little luck involved, which is why I really like this game.
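To give a feel for what "coding" a robot looks like, here is a small sketch that runs a five-instruction program on an empty grid. The instruction names ("MOVE n", "LEFT", "RIGHT") and the coordinate convention are simplifications I made up for the example, not the actual card set or rules, and there are no obstacles or other robots.

```python
# Headings as (dx, dy), clockwise starting from north.
HEADINGS = [(0, 1), (1, 0), (0, -1), (-1, 0)]


def run_program(program, x=0, y=0, heading=0):
    """Execute a five-instruction program and return (x, y, heading)."""
    if len(program) != 5:
        raise ValueError("a program is exactly five instructions")
    for instruction in program:
        if instruction == "LEFT":
            heading = (heading - 1) % 4
        elif instruction == "RIGHT":
            heading = (heading + 1) % 4
        elif instruction.startswith("MOVE"):
            steps = int(instruction.split()[1])
            dx, dy = HEADINGS[heading]
            x, y = x + dx * steps, y + dy * steps
        else:
            raise ValueError(f"unknown instruction: {instruction}")
    return x, y, heading


print(run_program(["MOVE 2", "RIGHT", "MOVE 1", "LEFT", "MOVE 3"]))  # (1, 5, 0)
```

The tricky part in the real game is that every player's program runs at the same time, so a shove from another robot can leave yours executing the rest of its program from the wrong square.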
Similar to Uplink is HackWars. Instead of point and click hacking though, it's multiplayer and you can write your own attack scripts. There's actually an included runtime for writing 2d/3d games and there's a bunch of different places to hook in scripts (for defense, banking, in game website, etc).
Scripting language looks similar to Java.
How about Ai-Board
You play it on your phone/tablet.
Its IDE is built into the game.
It has a built-in node-based visual programming language, whose code-behind is a python-like language.
You write the code, that drives the Ai, that moves the pawns, that plays the game, all still on your mobile device.
YouTube Video: Visual Programming Time-lapse on a mobile device
It comes with quite a few tutorials that introduce the player to programming, genetic algorithms etc, and you get a step-by-step walk-through of all these methods.
It also comes with ready-made scripts that work right out of the box and are ready for you to copy into your free 'Dev' & 'Test' envs, where you can tweak them to your heart's content, knowing that you can always revert to the original at any time.
The in-built machine learning engine allows you to
train your AiBot to play the board-game,
play your AiBot against their in-built Ai
play against your own AiBot
breed your AiBot (Genetic Algorithm)
fine-tune your AiBot (Back-Propagation)
...debug your AiBot, and so on.
YouTube Video: Machine Learning is mobile!
It's currently in BETA testing, but soon to be released, and everything described comes for free.
In addition, there are single and multi-player modes as well, but it is primarily a game about coding, coming complete "...with batteries included!"
Project Euler