I heard that we can use English words to express numbers in Mathematica, like using "one hundred" to express 100. Which function can do this?
A solution basically equivalent to dreeves's solution (but not available at the time of his answer) would be to call WolframAlpha[] directly from Mathematica (this requires an internet connection). For example,
WolframAlpha["6 million 2 hundred and 12 thousand and fifty two",
{{"Input", 1}, "Plaintext"}]
returns the string
"6212052"
So we can construct the following function that returns the actual number:
textToNumber[num_String] :=
Module[{in = WolframAlpha[num, {{"Input", 1}, "Plaintext"}]},
If[StringMatchQ[in, NumberString], ToExpression[in], $Failed]]
It also works with decimals and negative numbers, e.g., textToNumber["minus one point one"].
Note that we could ask for things other than "Plaintext" output. The easiest way to find out what is available is to enter some number, e.g., WolframAlpha["twelve"], and explore the options that appear when you press the ⨁ signs to the right of each "pod". It is also worth exploring the documentation, where you will find useful output "formats" such as "MathematicaParse" and "PodIDs".
We can also go in the other direction:
numberToText[num_Integer] := WolframAlpha[ToString[num],
{{"NumberName", 1}, "Plaintext"}]
I couldn't find the right incantations to get the spoken phrase form for non-integers. If someone knows the right spell, or if W|A gains this ability, please feel free to update this answer. It's a shame that SpokenString does not have an option for reading numbers as their spoken phrases.
I see that Wolfram Alpha can do that, so here's a kludgy little function that sends the English string to Wolfram Alpha and parses the result:
w2n[s_String] := ToExpression[StringCases[
Import["http://www.wolframalpha.com/input/?i=" <> StringReplace[s, " "->"+"],
"String"],
RegularExpression["Hold\\[([^\\]]*)\\]"] -> "$1"][[1]]]
Example:
w2n["two million six hundred sixty-six"]
> 2000666
Does Wolfram Alpha provide an actual API? That would be really great!
PS: They have one now but it's expensive: http://products.wolframalpha.com/api/
PPS: I notice that the wolframalpha results page changed a bit and my scraping no longer works. Some variant on that regular expression should work though.
This is the code:
IntegerName[78372112345]
This is the output:
78 billion 372 million 112 thousand 345
not available back in '09...
SemanticInterpretation["one hundred fifty thousand three hunded and six"]
or
Interpreter["SemanticNumber"]["one hundred fifty thousand three hunded and six"]
150306
(Notice my spelling error didn't faze it.)
The two functions are not the same, by the way:
SemanticInterpretation["six and forty two thousandths"]//N (* 6.042 *)
Interpreter["SemanticNumber"]["six and forty two thousandths"] (*fails*)
In our schools, we have books of the same title by the same author but different ISBN #s. I am working on an inventory list so that we can scan the different ISBNs and then find out what is on hand for a title.
Here is my working spreadsheet demo. In the live version, the data (columns A-D) will come in on a separate sheet (possibly via Google Forms), and another sheet (columns F-J) will do all the math. For convenience/testing, they are all on one sheet here.
Essentially, in column F, I would like to sum all the quantities in A where the ISBNs in C match any of the comma-separated values in G, and place the result in F.
The formula I am using in F doesn't seem to completely work:
=SUMIF(C:C,arrayformula(split(G2,",")),A:A)
It captures the first match but doesn't loop over the rest. I have looked at SUMIFS and MATCH, and I cannot seem to get any closer with the syntax. I would greatly appreciate it if anyone could help me solve this dilemma.
Additionally, I know how to do this with a custom script, but I need to avoid that: end users break things for one reason or another, and I can't handle the debugging load given how widely this could be deployed.
Thanks in advance for anyone willing to take a look at this!
~Allan
Try this in F2:
=sum(query(A:D,"select A where C matches '"& textjoin("|",,split(G2,",")) &"' ",0))
Delete everything in F2:F and J2:J, then use this in F2:
=INDEX(IF(G2:G="",,MMULT(IFERROR(VLOOKUP(SPLIT(G2:G, ","), {C:C, A:A}, 2, ), 0),
SEQUENCE(COLUMNS(SPLIT(G2:G, ",")), 1, 1, ))))
In J2 use:
=ARRAYFORMULA(IF(G2:G="",,F2:F*I2:I))
I'm using Gensim with FastText word vectors to return similar words.
This is my code:
import gensim
model = gensim.models.KeyedVectors.load_word2vec_format('cc.it.300.vec')
words = model.most_similar(positive=['sole'],topn=10)
print(words)
This will return:
[('sole.', 0.6860659122467041), ('sole.Ma', 0.6750558614730835), ('sole.Il', 0.6727924942970276), ('sole.E', 0.6680260896682739), ('sole.A', 0.6419174075126648), ('sole.È', 0.6401025652885437), ('splende', 0.6336565613746643), ('sole.La', 0.6049465537071228), ('sole.I', 0.5922051668167114), ('sole.Un', 0.5904430150985718)]
The problem is that "sole" ("sun" in English) returns a series of words with a dot in them (like sole., sole.Ma, etc.). Where is the problem? Why does most_similar return these meaningless words?
EDIT
I tried with the English word vectors, and the word "sun" returns this:
[('sunlight', 0.6970556974411011), ('sunshine', 0.6911839246749878), ('sun.', 0.6835992336273193), ('sun-', 0.6780728101730347), ('suns', 0.6730450391769409), ('moon', 0.6499731540679932), ('solar', 0.6437565088272095), ('rays', 0.6423950791358948), ('shade', 0.6366724371910095), ('sunrays', 0.6306195259094238)]
Is it impossible to reproduce results like relatedwords.org?
Perhaps the bigger question is: why does the Facebook FastText cc.it.300.vec model include so many meaningless words? (I haven't noticed that before – is there any chance you've downloaded a peculiar model that has decorated words with extra analytical markup?)
To gain the unique benefits of FastText – including the ability to synthesize plausible (better-than-nothing) vectors for out-of-vocabulary words – you may not want to use the general load_word2vec_format() on the plain-text .vec file, but rather a Facebook-FastText specific load method on the .bin file. See:
https://radimrehurek.com/gensim/models/fasttext.html#gensim.models.fasttext.load_facebook_vectors
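For example, a minimal sketch, assuming you have also downloaded the matching cc.it.300.bin file (the binary model offered alongside the .vec file):

import gensim
from gensim.models.fasttext import load_facebook_vectors

# Loads the full FastText data, including subword n-gram vectors, so
# plausible vectors can be synthesized even for out-of-vocabulary words.
wv = load_facebook_vectors('cc.it.300.bin')

print(wv.most_similar(positive=['sole'], topn=10))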
(I'm not sure that will help with these results, but if choosing to use FastText, you may be interested in using it "fully".)
Finally, given the source of this training – common-crawl text from the open web, which may contain lots of typos/junk – these might be legitimate word-like tokens, essentially typos of sole, that appear often enough in the training data to get word-vectors. (And because they really are typo-synonyms for 'sole', they're not necessarily bad results for all purposes, just for your desired purpose of only seeing "real-ish" words.)
You might find it helpful to try using the restrict_vocab argument of most_similar(), to only receive results from the leading (most-frequent) part of all known word-vectors. For example, to only get results from among the top 50000 words:
words = model.most_similar(positive=['sole'], topn=10, restrict_vocab=50000)
Picking the right value for restrict_vocab might help in practice to leave out long-tail 'junk' words, while still providing the real/common similar words you seek.
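Alternatively, since the junk tokens in your output all contain a dot, a crude post-filter on the returned words can serve the same goal. A sketch, reusing the model object from your code:

import re

# Keep only tokens made entirely of letters (drops 'sole.', 'sole.Ma', etc.).
word_re = re.compile(r"^[^\W\d_]+$")

results = model.most_similar(positive=['sole'], topn=50)
clean = [(w, score) for w, score in results if word_re.match(w)][:10]
print(clean)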
I'd like to calculate the standard deviation over two fields from the same dataset.
example:
MyFields1 = 10, 10
MyFields2 = 20
What I want now is the standard deviation of (10, 10, 20); the expected result is 4.7.
In SSRS I'd like to have something like this:
=StDevP(Fields!MyField1.Value + Fields!MyField2.Value)
Unfortunately this isn't possible, since (Fields!MyField1.Value + Fields!MyField2.Value) returns a single value and not a list of values. Is there no way to combine two fields from the same dataset into some kind of temporary dataset?
The only solutions I have are:
1. Create a new dataset that contains all values from both fields. But this is very annoying, because I need about twenty of those and I have six report parameters that need to filter every query. => It will probably get very slow and annoying to maintain.
2. Write the formula by hand. But I don't really know how yet; StDevP is not that trivial to me (a check of the math follows below). This is how I did it with Avg, which is mathematically simpler:
=(SUM(Fields!MyField1.Value)+SUM(Fields!MyField2.Value))/2
found here: http://social.msdn.microsoft.com/Forums/is/sqlreportingservices/thread/7ff43716-2529-4240-a84d-42ada929020e
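For reference, here is the math I am trying to reproduce, checked in Python with the example values above; the identity at the end only needs per-field sums and counts, which is why I hope it fits in a single expression:

import statistics

field1 = [10.0, 10.0]  # Fields!MyField1.Value across the rows
field2 = [20.0]        # Fields!MyField2.Value

print(statistics.pstdev(field1 + field2))  # 4.714..., the expected ~4.7

# The same result via StDevP = sqrt(mean(x^2) - mean(x)^2),
# built from sums and counts of each field only:
n = len(field1) + len(field2)
s = sum(field1) + sum(field2)
s2 = sum(v * v for v in field1) + sum(v * v for v in field2)
print((s2 / n - (s / n) ** 2) ** 0.5)  # 4.714...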
Btw. I know that it's odd to make such a calculation, but this is what my customer wants and I have to deliver somehow.
Thanks for any help.
StDevP is standard deviation.
Such an expression works fine for me:
=StDevP(Fields!MyField1.Value + Fields!MyField2.Value)
but that's the deviation of the single value (Fields!MyField1.Value + Fields!MyField2.Value), which is always 0.
You can look here for the formula:
standard deviation (wiki)
I believe that you need to calculate this for some group (or the full dataset); to do this, you need to set the scope in StDevP:
=StDevP(Fields!MyField1.Value + Fields!MyField2.Value, "MyDataSet1")
I am new to Ruby and Shoes, and I think I have everything. The program appears to work correctly except at the last step. I enter the loan amount and interest rate into edit_lines; when I press the calculate button, it performs the calculations and stores the calculated numbers in variables. The last step is dividing the total loan (loan plus interest) by the length of the loan in months to get the monthly payment, so I can make a payment table for the entire loan, but I either get incorrect results or no results at all.
I think I converted the integers to floats, etc., but... I'm not sure. It appears to add, multiply, and subtract, but it will not divide two objects. If I enter literal numbers, it works OK.
What am I doing wrong? It doesn't seem like it should be that difficult. Could someone show example code for dividing the value in one variable by the value of another variable?
It looks like you're using eval(), which you almost never, ever want to use. You can do the exact same thing in normal Ruby. I'm just guessing right now, since the code I can see in your comment is lacking newlines, but I think this code would work:
@numberbox3.text = @totalinterest + @loanamount  # total = interest + principal
@numberbox5.text = @totalloan / @lengthyears     # both must be Floats, or / truncates
Hope this helps!
Possible Duplicate:
How do you implement a “Did you mean”?
I am writing an application where I require functionality similar to Google's "did you mean?" feature used by their search engine.
Is there source code available for such a thing or where can I find articles that would help me to build my own?
You should check out Peter Norvig's article about implementing a spell checker in a few lines of Python:
How to Write a Spelling Corrector. It also has links to implementations in other languages (e.g., C#).
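The core of that approach fits in a few lines: generate every string within one edit of the input, and keep the ones that appear in a word-frequency table built from a corpus. A minimal sketch (the tiny WORDS counter here is a stand-in for real corpus counts):

from collections import Counter

# Stand-in corpus counts; Norvig builds this from a large text file.
WORDS = Counter(["spelling", "spelling", "corrector", "britney", "spears"])

def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correction(word):
    """The most frequent known word within one edit, else the word itself."""
    candidates = [w for w in edits1(word) if w in WORDS] or [word]
    return max(candidates, key=lambda w: WORDS[w])

print(correction("speling"))  # -> 'spelling'

Norvig's full version also handles edit distance 2 and weighs candidates by corpus frequency, which is what makes the suggestions feel smart.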
I attended a seminar by a Google engineer a year and a half ago, where they talked about their approach to this. The presenter was saying that (at least part of) their algorithm has little intelligence at all; but rather, utilises the huge amounts of data they have access to. They determined that if someone searches for "Brittany Speares", clicks on nothing, and then does another search for "Britney Spears", and clicks on something, we can have a fair guess about what they were searching for, and can suggest that in future.
Disclaimer: This may have just been part of their algorithm
Python has a module called difflib. It provides a function called get_close_matches. From the Python documentation:
get_close_matches(word, possibilities[, n][, cutoff])
Return a list of the best "good enough" matches. word is a sequence for which close matches are desired (typically a string), and possibilities is a list of sequences against which to match word (typically a list of strings).
Optional argument n (default 3) is the maximum number of close matches to return; n must be greater than 0.
Optional argument cutoff (default 0.6) is a float in the range [0, 1]. Possibilities that don't score at least that similar to word are ignored.
The best (no more than n) matches among the possibilities are returned in a list, sorted by similarity score, most similar first.
>>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
['apple', 'ape']
>>> import keyword
>>> get_close_matches('wheel', keyword.kwlist)
['while']
>>> get_close_matches('apple', keyword.kwlist)
[]
>>> get_close_matches('accept', keyword.kwlist)
['except']
Could this library help you?
You can use http://developer.yahoo.com/search/web/V1/spellingSuggestion.html, which provides similar functionality.
You can check out the source code for Xapian which provides this functionality, as do a lot of other search libraries. http://xapian.org/
I am not sure if it serves your purpose, but a string edit distance algorithm with a dictionary might suffice for a small application.
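For example, a sketch of that idea with a plain Levenshtein distance and a toy dictionary (both stand-ins for your real word list):

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb))) # substitute
        prev = curr
    return prev[-1]

dictionary = ["britney spears", "brittany murphy", "bruce springsteen"]
query = "brittany speares"

# Suggest the dictionary word with the smallest edit distance to the query.
print(min(dictionary, key=lambda w: levenshtein(query, w)))  # britney spears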
I'd take a look at this article on Google bombing. It shows that it just suggests answers based on previously entered results.
AFAIK the "did you mean?" feature doesn't check the spelling. It only gives you another query based on the content parsed by Google.
A great chapter on this topic can be found in the openly available Introduction to Information Retrieval.
You could use n-grams for the comparison: http://en.wikipedia.org/wiki/N-gram
Using the Python ngram module: http://packages.python.org/ngram/index.html
import ngram

G2 = ngram.NGram(["iis7 configure ftp 7.5",
                  "ubunto configre 8.5",
                  "mac configure ftp"])

print("String", "\t", "Similarity")
for i in G2.search("iis7 configurftp 7.5", threshold=0.1):
    print(i[0], "\t", i[1])
You get:
String                     Similarity
"iis7 configure ftp 7.5"   0.76
"mac configure ftp"        0.24
"ubunto configre 8.5"      0.19
Take a look at Levenshtein-Automata.