How to detect numerical value of a text? - ruby

We have data for survey question (e.g. rate us between 1-5) that's supposed to be numerical. However, we find that the response also includes
👍 repeated 5 times
❤️ repeated 4 times
Great!
four
3 and a half
I'd like a way to turn the user response into a numerical value. e.g. the text above should translate into 5, 4, 5, 4, 3.5 respectively. Obviously this won't work 100% of the time so I'm looking for an optimal solution (perhaps a text analysis approach) that gets me over 80%.

If you are solely looking to turn THESE SPECIFIC responses into numerical values, then you can pass them through a series of if statements in a function:
def inputToNumber(string)
#thumbs up emoji
if string == "\u{1f44d}"
return 5
#the word four
elsif string == "four"
return 4
#etc., etc. with if statements for your other cases
end
end
But it might make more sense for you to only allow numeric answers to begin with, because:
You can't predict every possible written response
Someone could input malicious code
You didn't provide your code to show how you are accepting input so I can't really offer you specific solutions, but you can look here for some suggestions: Accept only numeric input
Good luck with your project.

Related

Yahtzee 3 of a Kind

Here is my situation, i am currently creating a Yahtzee game using Turbo Pascal Language in Lazarus IDE and i am up to the scoring side of the developement, i have already completed the Lower section of scoring and i have started the Higher section but i need some help writting a procedure to check for a three of a kind, my initial thought was to use an array and load the random numbers for the dice values and then use a loop function to check for 3 equal numbers but i'm not very confident in this area. Could i get some help ? I'm not asking for code, although it would be helpful, just a push in the right direction.
My dice integer value variables are, "Dice1" , "Dice2" , "Dice3" , "Dice4" , "Dice5" , "Dice6"
I think the conceptually simplest approach is to have an array of six counters - one for each possible value - that you initialize to zero and then loop over your dice array and increment the counters with each die's value.
You can then check if any of the counts becomes 3 (or more).
Or sort and then iterate to see if you have 3 same values in a row. The sorted array with dice values is also usable for the other detections like street, Carré (four of a kind), Yathzee etc.

Training neural network in Ruby

I'm a total beginner when it comes to neural networks. I've been wrestling with ruby-fann and ai4r all day and unfortunately I don't have anything to show for it, so I figured I would come onto Stack Overflow and ask the knowledgeable people here.
I have a set of samples -- each day has one data point, but they don't fit any clear pattern that I've been able to figure out (I tried a couple regressions). Still, I think it would be neat to see if there was any way to predict the data going into the future just from the date, and I thought a neural network would be a good way to generate a function that could hope to express that relationship.
The dates are DateTime objects and the data points are decimal numbers, like 7.68. I've been converting the DateTime objects to floats and then dividing by 10,000,000,000 to get a number between 0 and 1, and I've been dividing the decimal numbers by 1,000 to also get a number between 0 and 1. I have over a thousand samples... here's what a short excerpt looks like:
[
["2012-03-15", "7.68"],
["2012-03-14", "4.221"],
["2012-03-13", "12.212"],
["2012-03-12", "42.1"]
]
Which when transformed looks like this:
[
[0.13317696, 0.000768],
[0.13316832, 0.0004221],
[0.13315968, 0.0012212],
[0.13315104, 0.00421]
]
I kind of wish this transformation weren't necessary, but I digress. The problem is that both ai4r and ruby-fann return one constant number, generally something in the middle of the range of the samples, when I run them. Here's the code for ruby-fann:
#fann = RubyFann::Standard.new(:num_inputs=>1, :hidden_neurons=>[3, 3], :num_outputs=>1)
training_data = RubyFann::TrainData.new(:inputs => formatted_data.collect{|d| [d.first]}, :desired_outputs => formatted_data.collect{|d| [d.last]})
#fann.train_on_data(training_data, 1000, 1, 0.0001)
#fann.run([DateTime.now.to_f / 10000000000.0]) # Always something random, and always the same number no matter what date I request it for
And for ai4r:
#ai4r = Ai4r::NeuralNetwork::Backpropagation.new([1, 3, 3, 1])
1000.times do
formatted_data.each do |data|
#ai4r.train(data.first, data.last)
end
end
#ai4r.eval([DateTime.now.to_f / 10000000000.0]) # A different result frmo above, but always seemingly random and the same for any requested date
I feel like I'm missing something really basic here. I know this is a rather open-ended question but if anyone could help me figure out how I'm improperly teaching my neural networks, I'd really appreciate it!
alfa has a good point in his comment, an alternative ways of using the NN might be more appropriate.
It depends on the problem, but if the day's value is even partly a function of
the previous days' values, treating this as a time series might yield better
results.
You would then instead teach the NN to produce the day's value as a function
of a window of, say, the previous ten days values; you could also keep the date
parameter as a real input scale between [0, 1] as you believe it has a significant effect
on the day's value.

Need help psuedocoding a dice game and dont know where to even start [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
Improve this question
I am a beginner IT student and doing a project for my programming logic and design class. I need to create a psuedocode for a dice game that allows you 2 rolls with 5 dice. On the first roll you get to pick 1 die to keep. The computer then rolls the other 4 dice and calculates you're score based on what you rolled. There are 3 rolls per game and the total score is displayed. Rolling nothing takes points away. The scoring is: 2 of a kind=50 points, 3 of a kind=75 points, 4 of a kind=100 points and nothing subtracts 50 points.
The whole problem I have is I dont even know where to start. I think I need this to repeat 3 times, but what variables do set? Please someone help me, I cant really ask my instructor because he is outside smoking the whole class and everything I have learned about this class mostly came from the internet and reading the book. I dont want to fail this class...someone please help me through this???
First of all don't panic. What you are about to do is break the task down into small steps.
Pseudo-code is not really code - you can't use it directly as a language, but instead it is just plain english to describe what it is you are doing and the flow of events.
So what are the initial steps to get you started?
Ask yourself what are the facts, what do you know exist in advance. These are the "declarations" that you make.
You have five dice. Each is a seperate object so each gets it's own variable declaration
dice_1
dice_2
dice_3
dice_4
dice_5
Next decide if each die has an initial value
dice_1 initial value = 0
etc...
Next you know that you have to throw the dice a number of times. Throwing is a variable with an initial value
turns initial value = 2
turns_counter initial value = 2
You should be getting the idea now. Are there other things you should declare in advance? I think so!
Next you have to decide what it is you are doing step by step. Is it just a sequence of events or is it repeating? If it's repeating how do you get it to stop?
While turns_counter is less than 2
Repeat the following:
turns_counter = turns_counter + 1
if turns_counter = 2
Throw. Collect_result. Sum_result.
else
Throw. Collect_result. Sum_result. Remove_a_dice.
endif.
perhaps you have to tell the reusable code which objects they are going to be working with? These are parameters that you pass to the reusable code Throw(dice_1) perhaps also you need to update some variables that you created? do it in the reusable code with or without passing them as parameters.
This is by no means complete or perfect, but you should get the idea about what's going on and how to break it down. It could take quite a while to do.
Most languages provide a pseudo-random number generator function that returns a random number within a certain range. I would start by figuring out which language you'll use and which function it provides.
Once you have that, you will need to call it for each roll of each dice. If you are rolling 5 dice, you would call it 5 times. And you would call it 5 more times for a second roll.
That's a start anyway.
You have already almost answered the question by simply writing it down here. There is no strict definition of what pseudocode is. Why don't you start by re-writing what you've described here as a sequence of steps. Then, for each step simply refine that step further until you think you've made it as fine-grain as you like.
You could start with something like this:
Roll 5 dice.
Pick 1 die to keep.
Rolls the other 4 dice
Calculate the score.
// etc...
Quite weird to think that it's easier to ask SO than your instructor! :)
The easiest way to get started on this is to not rigorously bind yourself to the constraint of a specific language, or even to pseudocode. Simply, in natural English, write out how you would do this. Imagine that YOU are the computer, and somebody wants to play the game with you. Just imagine, in very specific detail, what you would do at each potential step, i.e.
Give the user 5 dice
Ask the user to roll them
From that roll, allow the user to pick one die to keep
...etc. Once you have done this, and you are sure it is correct, start transforming it into pseudo code by thinking about what a computer would need to do to solve this problem. For instance, you'll need a variable keeping track of how many points the user as, as well as how many total rolls have occurred. If you were very specific in your English description of the problem, this should mean you basically only need to plug pseudo code into a few sentences you already have - in other words, you're just substituting one type of pseudo code for another.
I'd like to help, but straight-up providing the pseudo code wouldn't be very helpful to you. One of the hardest steps in beginning programming is learning to break a problem down into its constituent elements. That type of granular thinking is unintuitive at first, but gets easier the more time you spend on it.
Well, pseudo-code, in my experience, is best drawn up when you pretend you're writing up the work for someone else to do:
THINGS WE NEED
Dice
Players
Score
THINGS WE TRACK
Dice rolls
Player score
THINGS WE KNOW
(These are also called constants)
Nothing (-50)
2 of a kind (+50)
3 of a kind (+75)
4 of a kind (+100)
All of these are vital tools to getting started. And...well, asking questions on stackoverflow.
Next, define your "actions" (things we do), which utilizes the above known things that we will need.
I would start the same place I always do: creating our things.
def player():
"""Create a new player"""
def dice():
"""Creates 4 new, 6 sided dice"""
def welcome():
"""Welcome player by name, give option to quit"""
def game():
"""Initialize number of turns (start at 0)"""
def humanturn():
"""Roll dice, display, ask which one they'll keep"""
def compturn():
"""Roll four dice"""
def check():
"""Check for any matches in the dice"""
def score():
"""Tally up the score for any matches"""
def endturn():
"""Update turn(s), update total score"""
def gameover():
"""Display name, total score, ask for retry"""
def quit():
"""Quit the game"""
Those are your components, all fleshed out in a very procedural manner. There are many other ways to do this that are much better, but for now you're just writing the skeleton of an idea. You may be tempted to combine many of these methods together when you're ready to start coding, but it's a good idea to separate everything until you're confident you won't get lost chasing down a bug.
Good luck!

How to spot and analyse similar patterns like Excel does?

You know the functionality in Excel when you type 3 rows with a certain pattern and drag the column all the way down Excel tries to continue the pattern for you.
For example
Type...
test-1
test-2
test-3
Excel will continue it with:
test-4
test-5
test-n...
Same works for some other patterns such as dates and so on.
I'm trying to accomplish a similar thing but I also want to handle more exceptional cases such as:
test-blue-somethingelse
test-yellow-somethingelse
test-red-somethingelse
Now based on this entries I want say that the pattern is:
test-[DYNAMIC]-something
Continue the [DYNAMIC] with other colours is whole another deal, I don't really care about that right now. I'm mostly interested in detecting the [DYNAMIC] parts in the pattern.
I need to detect this from a large of pool entries. Assume that you got 10.000 strings with this kind of patterns, and you want to group these strings based on similarity and also detect which part of the text is constantly changing ([DYNAMIC]).
Document classification can be useful in this scenario but I'm not sure where to start.
UPDATE:
I forgot to mention that also it's possible to have multiple [DYNAMIC] patterns.
Such as:
test_[DYNAMIC]12[DYNAMIC2]
I don't think it's important but I'm planning to implement this in .NET but any hint about the algorithms to use would be quite helpful.
As soon as you start considering finding dynamic parts of patterns of the form : <const1><dynamic1><const2><dynamic2>.... without any other assumptions then you would need to find the longest common subsequence of the sample strings you have provided. For example if I have test-123-abc and test-48953-defg then the LCS would be test- and -. The dynamic parts would then be the gaps between the result of the LCS. You could then look up your dynamic part in an appropriate data structure.
The problem of finding the LCS of more than 2 strings is very expensive, and this would be the bottleneck of your problem. At the cost of accuracy you can make this problem tractable. For example, you could perform LCS between all pairs of strings, and group together sets of strings having similar LCS results. However, this means that some patterns would not be correctly identified.
Of course, all this can be avoided if you can impose further restrictions on your strings, like Excel does which only seems to allow patterns of the form <const><dynamic>.
finding [dynamic] isnt that big of deal, you can do that with 2 strings - just start at the beginning and stop when they start not-being-equals, do the same from the end, and voila - you got your [dynamic]
something like (pseudocode - kinda):
String s1 = 'asdf-1-jkl';
String s2= 'asdf-2-jkl';
int s1I = 0, s2I = 0;
String dyn1, dyn2;
for (;s1I<s1.length()&&s2I<s2.length();s1I++,s2I++)
if (s1.charAt(s1I) != s2.charAt(s2I))
break;
int s1E = s1.length(), s2E = s2.length;
for (;s2E>0&&s1E>0;s1E--,s2E--)
if (s1.charAt(s1E) != s2.charAt(s2E))
break;
dyn1 = s1.substring(s1I, s1E);
dyn2 = s2.substring(s2I, s2E);
About your 10k data-sets. You would need to call this (or maybe a little more optimized version) with each combination to figure out your patten (10k x 10k calls). and then sort the result by pattern (ie. save the begin and the ending and sort by these fields)
I think what you need is to compute something like the Levenshtein distance, to find the group of similar strings, and then in each group of similar strings, you indentify the dynamic part in a typical diff-like algorithm.
Google docs might be better than excel for this sort of thing, believe it or not.
Google has collected massive amounts of data on sets - for example the in the example you gave it would recognise the blue, red, yellow ... as part of the set 'colours'. It has far more complete pattern recognition than Excel so would stand a better chance of continuing the pattern.

Best way to generate order numbers for an online store?

Every order in my online store has a user-facing order number. I'm wondering the best way to generate them. Criteria include:
Short
Easy to say over the phone (e.g., "m" and "n" are ambiguous)
Unique
Checksum (overkill? Useful?)
Edit: Doesn't reveal how many total orders there have been (a customer might find it unnerving to make your 3rd order)
Right now I'm using the following method (no checksum):
def generate_number
possible_values = 'abfhijlqrstuxy'.upcase.split('') | '123456789'.split('')
record = true
while record
random = Array.new(5){possible_values[rand(possible_values.size)]}.join
record = Order.find(:first, :conditions => ["number = ?", random])
end
self.number = random
end
As a customer I would be happy with:
year-month-day/short_uid
for example:
2009-07-27/KT1E
It gives room for about 33^4 ~ 1mln orders a day.
Here is an implementation for a system I proposed in an earlier question:
MAGIC = [];
29.downto(0) {|i| MAGIC << 839712541[i]}
def convert(num)
order = 0
0.upto(MAGIC.length - 1) {|i| order = order << 1 | (num[i] ^ MAGIC[i]) }
order
end
It's just a cheap hash function, but it makes it difficult for an average user to determine how many orders have been processed, or a number for another order. It won't run out of space until you've done 230 orders, which you won't hit any time soon.
Here are the results of convert(x) for 1 through 10:
1: 302841629
2: 571277085
3: 34406173
4: 973930269
5: 437059357
6: 705494813
7: 168623901
8: 906821405
9: 369950493
10: 638385949
At my old place it was the following:
The customer ID (which started at 1001), the sequence of the order they made then the unique ID from the Orders table. That gave us a nice long number of at least 6 digits and it was unique because of the two primary keys.
I suppose if you put dashes or spaces in you could even get us a little insight into the customer's purchasing habits. It isn't mind boggling secure and I guess a order ID would be guessable but I am not sure if there is security risk in that or not.
Ok, how about this one?
Sequentially, starting at some number (2468) and add some other number to it, say the day of the month that the order was placed.
The number always increases (until you exceed the capacity of the integer type, but by then you probably don't care, as you will be incredibly successful and will be sipping margaritas in some far-off island paradise). It's simple enough to implement, and it mixes things up enough to throw off any guessing as to how many orders you have.
I'd rather submit the number 347 and get great customer service at a smaller personable website than: G-84e38wRD-45OM at the mega-site and be ignored for a week.
You would not want this as a system id or part of a link, but as a user-friendly number it works.
Douglas Crockford's Base32 Encoding works superbly well for this.
http://www.crockford.com/wrmg/base32.html
Store the ID itself in your database as an auto-incrementing integer, starting at something suitably large like 100000, and simply expose the encoded value to the customer/interface.
5 characters will see you through your first ~32 million orders, whilst performing very well and satisfying most of these requirements. It doesn't allow for the exclusion of similar sounding characters though.
Rather than generating and storing a number, you might try creating an encrypted version that would not reveal the number of orders in the system. Here's an article on exactly that.
Something like this:
Get sequential order number. Or, maybe, an UNIX timestamp plus two or three random digits (when two orders are placed at the same moment) is fine too.
Bitwise-XOR it with some semi-secret value to make number appear "pseudo-random". This is primitive and won't stop those who really want to investigate how many orders you have, but for true "randomness" you need to keep a (large) permutation table. Or you'll need to have large random numbers, so you won't be hit by the birthday paradox.
Add checkdigit using Verhoeff algorithm (I'm not sure it will have such a good properties for base33, but it shouldn't be bad).
Convert the number to - for example - base 33 ("0-9A-Z", except for "O", "Q" and "L" which can be mistaken with "0" and "1") or something like that. Ease of pronouncation means excluding more letters.
Group the result in some visually readable pattern, like XXX-XXX-XX, so users won't have to track the position with their fingers or mouse pointers.
Sequentially, starting at 1? What's wrong with that?
(Note: This answer was given before the OP edited the question.)
Just one Rube Goldberg-style idea:
You could generate a table with a random set of numbers that is tied to a random period of time:
Time Period Interleaver
next 2 weeks: 442
following 8 days: 142
following 3 weeks: 580
and so on... this gives you an unlimited number of Interleavers, and doesn't let anyone know your rate of orders because your time periods could be on the order of days and your interleaver is doing a lot of low-tech "mashing" for you.
You can generate this table once, and simply ensure that all Interleavers are unique.
You can ensure you don't run out of Interleavers by simply adding more characters into the set, or start by defining longer Interleavers.
So you generate an order ID by getting a sequential number, and using today's Interleaver value, interleave its digits (hence, the name) in between each sequential number's digits. Guaranteed unique - guaranteed confusing.
Example:
Today I have a sequential number 1, so I will generate the order ID: 4412
The next order will be 4422
The next order will be 4432
The 10th order will be 41402
In two weeks my interleaver will change to 142,
The 200th order will be 210402
The 201th order will be 210412
Eight days later, my interleaver changes to 580:
The 292th order will be 259820
This will be completely confusing but completely deterministic. You can just remove every other digit starting at the 1's place. (except when your order id is only one digit longer than your interleaver)
I didn't say this was the best way - just a Friday idea.
You could do it like a postal code:
2b2 b2b
That way it has some kind of checksum (not really, but at least you know it's wrong if there are 2 consecutive numbers or letters). It's easy to read out over the phone, and it doesn't give an indication of how many orders are in the system.
http://blog.logeek.fr/2009/7/2/creating-small-unique-tokens-in-ruby
>> rand(36**8).to_s(36)
=> "uur0cj2h"
How about getting the current time in miliseconds and using that as your order ID?

Resources