Using Google's API to split a string into words?

I'm trying to figure out which API I should use to get Google to intelligently split a string into words.
Input:
thequickbrownfoxjumpsoverthelazydog
Output:
the quick brown fox jumps over the lazy dog
When I go to Google Translate and input the string (with auto-detect language) and click on the "Listen" icon for Google to read out the string, it breaks up the words and reads it out correctly. So, I know they're able to do it.
But what I can't figure out is whether it's the Google Translate API or their Text-to-Speech API that's breaking up the words, or whether there's any way to get those separated words in an API response somewhere.
Does anyone have experience using Google's APIs to do this?

AFAIK, there isn't an API in Google Cloud that does that specifically, although it looks like the Translation API does parse the concatenated words in the background when you translate text.
Since you can't use the same language as both source and target, what you could do is translate to any other language and then translate back to the original language. That seems like overkill, though; a sketch of the round trip is below.
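For illustration, a minimal sketch of that round trip using the Cloud Translation basic (v2) Python client (pip install google-cloud-translate). It assumes GOOGLE_APPLICATION_CREDENTIALS points at a service-account key, and whether the inferred spaces survive the trip back depends entirely on the translations, so treat it as a best-effort hack:

from google.cloud import translate_v2 as translate

client = translate.Client()
text = "thequickbrownfoxjumpsoverthelazydog"

# English -> French: the API appears to segment the concatenated words here.
to_french = client.translate(text, source_language="en", target_language="fr")

# French -> English: with luck, the word boundaries survive the trip back.
back = client.translate(to_french["translatedText"], source_language="fr", target_language="en")

print(back["translatedText"])  # hopefully "the quick brown fox jumps over the lazy dog"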
You could create a Feature Request to ask for such a feature to be implemented in the NLP API, for example.
But, depending on your use case, I suppose you could also use the method suggested in this other Stack Overflow answer, which uses dynamic programming to infer the location of spaces in a string without spaces (sketched further below).
Another user even made a pip package named wordninja (see the second answer on the same post) based on that.
Install it with pip3 install wordninja.
Example usage:
$ python
>>> import wordninja
>>> wordninja.split('thequickbrownfoxjumpsoverthelazydog')
['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
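And for reference, a minimal sketch of the dynamic-programming idea from that answer, with a toy hard-coded dictionary standing in for the frequency-based word costs the real answer derives from a corpus:

def infer_spaces(s, words):
    # best[i]: fewest words covering s[:i], or None if s[:i] can't be segmented.
    best = [0] + [None] * len(s)
    # back[i]: start index of the last word in an optimal split of s[:i].
    back = [0] * (len(s) + 1)
    for i in range(1, len(s) + 1):
        for j in range(i):
            if best[j] is not None and s[j:i] in words:
                if best[i] is None or best[j] + 1 < best[i]:
                    best[i], back[i] = best[j] + 1, j
    if best[-1] is None:
        raise ValueError("string cannot be segmented with this dictionary")
    # Walk the back-pointers from the end of the string to recover the words.
    pieces, i = [], len(s)
    while i > 0:
        pieces.append(s[back[i]:i])
        i = back[i]
    return pieces[::-1]

words = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}
print(infer_spaces("thequickbrownfoxjumpsoverthelazydog", words))
# ['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']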

Related

Translate from a language to English on the console

I work in a company with various languages other than my own (English), so I use https://translate.google.com a reasonable amount, but as I am on the terminal a lot, it would be far more convenient to do it there than to open a new Google tab. The URL structure is trivial, and this works if put into any browser: https://translate.google.com/?sl=fr&tl=en&text=bonjour&op=translate. Replace fr with any source language, en with any target language, and bonjour with any word/phrase (URL-encoded, e.g. bonjour%20mon%20ami). Ideally, I would like 2 functions in bash:
tt (translate to), tt <target-lang> <English word or phrase to translate to target-lang>
tf (translate from), tf <source-lang> <word or phrase to translate to English>
I have tried for a few days without success with lynx, elinks, etc., and many searches on commandlinefu and other sites (e.g. https://www.commandlinefu.com/commands/matching/translate-english/dHJhbnNsYXRlIGVuZ2xpc2g=/sort-by-votes), but have not found the trick to getting the translated text back. Is Google blocking this somehow, and is there a workaround? Surely some tool (lynx, elinks, links2) can render the text sent back when we hit the URL, and then we can extract just the translated text using sed, cut, grep, etc.?
If this is being blocked by cookies or some sign-on requirement, are there alternative console tools, or translation services other than Google Translate, that would work from the terminal?
Various translation services have an API: Google Translate has one, and so does DeepL. I find some are more accurate than others, but this is a matter of personal preference.
https://www.deepl.com/docs-api
https://cloud.google.com/translate/docs/reference/rest/v2/translate
If you want to use it from the shell, it is easy enough to cobble together a small bash script with curl and jq to process the JSON responses, or, better, use Python or Perl, which support all of these operations natively. A sketch of the Python route is below.
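For example, a minimal Python sketch against the Translation v2 REST endpoint linked above. The GOOGLE_API_KEY environment variable and the tt.py filename are assumptions; you could wrap calls to this script in the tt/tf bash functions described in the question:

#!/usr/bin/env python3
# Usage: tt.py <source-lang> <target-lang> <word or phrase...>
import os
import sys

import requests

def translate(source, target, text):
    # POST to the basic (v2) translate endpoint with an API key.
    resp = requests.post(
        "https://translation.googleapis.com/language/translate/v2",
        params={"key": os.environ["GOOGLE_API_KEY"]},
        data={"q": text, "source": source, "target": target, "format": "text"},
    )
    resp.raise_for_status()
    return resp.json()["data"]["translations"][0]["translatedText"]

if __name__ == "__main__":
    source, target = sys.argv[1], sys.argv[2]
    print(translate(source, target, " ".join(sys.argv[3:])))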

Google Natural Language Python library has problems predicting when certain words are in a sentence

On the portal, I can insert a sentence and get a score back. If I use the Python library, I get sentences with no scores. Upon further investigation, it turns out a single word (without punctuation) prevents the prediction. If I replace this word with another, it works; if I replace it with 2 words, it works; if I replace it with "United States", however, which is different from the original word, I also get no sentiment score. None of this is an issue on the portal, so either it's the Python library or the portal is using a different predictor engine.
Has anyone run into this before and found a solution? I am going to have to look at their REST interface now, as I have lost confidence in the Python library.
The C# library works fine; way to go, Google, for a shoddy Python library.
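One way to narrow down where the problem lies is to send the same sentence to the raw REST endpoint and compare the result against the Python client. A minimal sketch, assuming an API key in a GOOGLE_API_KEY environment variable and a stand-in sentence for the problematic one:

import os

import requests

resp = requests.post(
    "https://language.googleapis.com/v1/documents:analyzeSentiment",
    params={"key": os.environ["GOOGLE_API_KEY"]},
    json={
        "document": {"type": "PLAIN_TEXT", "content": "I visited United States last summer."},
        "encodingType": "UTF8",
    },
)
resp.raise_for_status()

# Print the per-sentence scores the portal shows.
for sentence in resp.json()["sentences"]:
    print(sentence["text"]["content"], sentence["sentiment"]["score"])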

Bot Composer: how to iterate through the characters of a string using the LG language?

We need to extract a number from a phrase. For example:
"hey, 1234" -> "1234"
"ok, 4567" -> "4567"
"b3456f" -> "3456"
But we haven't found a way to iterate through a string using only the language generator of the Bot Composer.
We tried things like:
join(foreach(createArray("ab c"), x, concat(x, '-')), '')
But with no result... Is there any prebuilt function that converts a simple string into an array of chars, so we can iterate char by char using foreach?
Thanks!
As far as I know, this currently isn't possible as there's no way to iterate over a string or split a string into a new array by character. I've opened a GitHub issue to request it as an enhancement.
For:
"hey, 1234" -> "1234"
"ok, 4567" -> "4567"
You can use split().
Unfortunately, you're out of luck for your "b3456f" -> "3456" example, unless you know it's going to come in that exact format, in which case, you could use substring().
You could maybe look into using a regex to do this, if you know the formats will be pretty controlled (a quick sketch of that idea follows this answer). Another option is to look at the LUIS language understanding services from Microsoft, which are built exactly for understanding different parts of a text message, especially in a bot context. Here's a link to getting started with this for C# (on the menu just below in this link is a Node example, if that's what you need).
There's also a tag here on Stack Overflow focused just on LUIS, if you run into trouble or need any more help.
Hope that helps
[Update] I re-read your question and I see now it's about Bot Composer, not a custom-developed bot. As a result, the sample I linked to is not valid, but LUIS certainly is. I haven't used Bot Composer myself, but LUIS is integrated as part of it; see here.
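For what it's worth, the regex idea mentioned above looks like this outside Composer; a minimal Python sketch, assuming the number is the only run of digits in the utterance:

import re

def extract_number(utterance):
    # Grab the first run of digits: "hey, 1234" -> "1234", "b3456f" -> "3456".
    match = re.search(r"\d+", utterance)
    return match.group() if match else None

for text in ["hey, 1234", "ok, 4567", "b3456f"]:
    print(extract_number(text))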

Matching users with objects based on keywords and activity in Ruby

I have users that have authenticated with a social media site. Now based on their last X (let's say 200) posts, I want to map how much that content matches up with a finite list of keywords.
What would be the best way to do this so as to capture associated words/concepts (maybe that's too difficult), or just to get a score of how much, say, my tweet history maps to 'Walrus' or 'banana'?
Would a naive Bayes work here to separate into 'matches' and 'no match'?
In Python, I would say NLTK can easily do it. In Ruby, maybe the gem called lda-ruby will help you. The whole LDA concept is well explained here; look at the Sarah Palin email example. There's even an example of an app (not entirely in Ruby, but still) that did that -> github.com/echen/sarah-palin-lda
Or maybe I'm just saying stupid things and that won't help you at all. I'm not an expert ;)
A simple naive Bayes would work in this case; it is widely used to detect whether emails are spam or not, so for simple keyword matching it should work pretty well. A minimal sketch is below.
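For example, in Python with scikit-learn (the question is about Ruby, but the idea transfers; the training texts and labels here are made up):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy labeled examples: posts that do or don't relate to the keyword list.
train_texts = [
    "saw a walrus at the zoo today",
    "bananas are my favorite fruit",
    "shipping the new release tonight",
    "meeting notes and quarterly numbers",
]
train_labels = ["match", "match", "no_match", "no_match"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

posts = ["walrus walrus banana", "budget review at 3pm"]
print(model.predict(posts))        # e.g. ['match' 'no_match']
print(model.predict_proba(posts))  # per-class probabilities, usable as a score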
For this problem you could also apply a recommendation system where you look for the top recommended keyword for a user (or for a post).
There are a ton of ways of doing this. I would recommend reading Programming Collective Intelligence. It is explained using Python, but since you know Ruby, there should be no problem understanding the code.

What's the quickest way to do complex queries on Twitter using their Ruby API?

I want to create a complex query, e.g. return the first 100 Twitter Users that match the following criteria:
Have greater than X # of followers
Have greater than X # of tweets
Have the string "Rails developer" or "Rails" in their bio
Have tweeted in the last X days.
I was looking through their API docs and it seems so complex to just get something up and running quickly. I don't want to create a full blown app, I just want something simple that will help me do some research.
Am I overthinking this, and should it be easy to do via their API (Ruby preferably)?
I also don't mind it being run locally, and spitting out a text file or a csv file - but also if there is a nice way to have it spit out a nicely formatted HTML page that would be good too.
I just want to get at the data, that's all.
Your best bet is going to be the GET users/search API method. You can search on "rails" and page through the results, discarding any users who don't match your follower/status requirements. It isn't going to be perfect, but in general Twitter tries to return popular/relevant users first. Something like the sketch below can get you started.
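A minimal Python sketch of that loop against the v1.1 users/search endpoint (you asked for Ruby, but the shape is the same; the OAuth placeholders and thresholds are assumptions):

from datetime import datetime, timedelta, timezone

import requests
from requests_oauthlib import OAuth1  # pip install requests-oauthlib

# users/search requires user-context OAuth 1.0a credentials.
auth = OAuth1("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")

MIN_FOLLOWERS = 1000
MIN_TWEETS = 500
ACTIVE_SINCE = datetime.now(timezone.utc) - timedelta(days=30)

matches = []
for page in range(1, 51):  # the API caps users/search at roughly the first 1000 results
    resp = requests.get(
        "https://api.twitter.com/1.1/users/search.json",
        params={"q": "Rails developer", "page": page, "count": 20},
        auth=auth,
    )
    resp.raise_for_status()
    users = resp.json()
    if not users:
        break
    for u in users:
        last = u.get("status")  # most recent tweet; absent for silent users
        if (
            u["followers_count"] > MIN_FOLLOWERS
            and u["statuses_count"] > MIN_TWEETS
            and "rails" in (u.get("description") or "").lower()
            and last
            and datetime.strptime(last["created_at"], "%a %b %d %H:%M:%S %z %Y") >= ACTIVE_SINCE
        ):
            matches.append(u["screen_name"])
    if len(matches) >= 100:
        break

print(matches[:100])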
