Noob here, sorry if this is too silly a question. I am only looking for the name of the algorithm, because I am pretty sure someone has solved this problem before, but I cannot find anything on Google, mostly for lack of vocabulary.
Basically, I am looking for the best algorithm to solve the following situation:
I have a group of elements, let's say companies. I need to process all of them, one by one, with the rule that the next one to process is always the least recently attended. For example, if my universe were 3 companies:
Oracle
Apple
Google
The first time, any of them fulfills the criteria, so let's say we choose Oracle. We process Oracle, so in the next round it will be either Apple or Google, but clearly not Oracle. Let's now choose Apple. The next one is then clearly Google. When I finish the first round, I need to attend to them all again, and this time I do not need to choose at random, because of the 3 companies, Oracle was processed the longest time ago.
I am sure there is a well-known algorithm for this.
As #Henry mentioned in the comments, the answer to my question is "Round Robin"
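A minimal sketch of the round-robin idea in Python, assuming the elements fit in memory and that "least recently attended" simply means "front of a queue". The function name `round_robin` is made up for illustration.

```python
from collections import deque


def round_robin(items):
    """Yield items forever, always picking the one processed longest ago."""
    queue = deque(items)
    while queue:
        item = queue.popleft()  # front of the queue = least recently processed
        yield item
        queue.append(item)      # back of the queue = most recently processed


companies = ["Oracle", "Apple", "Google"]
picker = round_robin(companies)
first_six = [next(picker) for _ in range(6)]
print(first_six)
```

Here the first pass uses the list order instead of a random pick; shuffling `companies` beforehand would give the random first round described in the question.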
Intro
Some time ago a certain beverage company ran a giveaway: you could buy a product and inside the bottle you would find a 10-character alphanumeric code that you could enter online and possibly win a prize. I liked that drink, so whenever I bought one I would use the code. It happened that one of those codes was a winning one, and I saved it. Later on I found out that my friend had also won a prize, so out of interest I asked him for his code. When comparing the two, I discovered that the ASCII codes of the characters in each winning code added up to the same number. Trying my other, non-winning codes, I found that none of them satisfied this ASCII sum criterion, but they all fell within a certain range (I'm not sure what it is).
I tried typing random codes, but none of them were valid (not merely non-winning; the system didn't accept them at all). So I thought it must be one of two things:
They had a pre-prepared list of codes (whose length I guess is measured in millions, since the drink is very popular), and only some of them are winning (also predetermined).
The system tested each code against some specific criteria to establish whether it came from an actual product or was randomly typed.
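The ASCII-sum observation from the intro can be checked mechanically. This is only a sketch: the codes below are invented for illustration (they are not real codes from the giveaway), and `shared_ascii_sum` is a hypothetical helper name.

```python
def ascii_sum(code):
    """Sum of the ASCII values of the characters in a code."""
    return sum(ord(ch) for ch in code)


def shared_ascii_sum(codes):
    """Return the common ASCII sum if all codes share one, else None."""
    sums = {ascii_sum(code) for code in codes}
    return sums.pop() if len(sums) == 1 else None


# Invented example codes, not taken from the actual giveaway.
winning = ["AB12CD34EF", "BA21DC43FE"]
other = "ZZ99ZZ99ZZ"

print(shared_ascii_sum(winning))            # same characters, same sum
print(shared_ascii_sum(winning + [other]))  # sums differ, so no shared rule
```

A check like this only tests one candidate rule; it says nothing about whether that rule is the one the system actually uses.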
Question
So, this got me thinking: "Is there an algorithm that takes a list of such codes and tries to find a rule that all of them are following?"
Sadly, I'm not familiar with any algorithms (in C, Haskell, Prolog...) that are able to do this, so I'm asking here. I would appreciate it if someone could help me with this, since I am a computer science student and the existence of such an algorithm (or anything related to my question, for that matter) could help me in the future.
Thanks in advance!
I am running a tennis website on behalf of a friend of mine, because she's not really that passionate about technology and computers.
When we create a tournament subscription page, users and amateur tennis players fill out a form to subscribe to that tournament.
There is a field in the form where the user can describe their availability based on their needs.
Basically, users write when they can play matches, and most of the time these are time constraints, for example:
"I can play every evening after 9.00 PM",
"only on weekdays",
"because of work, I can play only on weekends",
"Always, except after 10.00 PM, because I have to wake up early".
I call them time constraints.
Yesterday I found a new constraint, and it goes like this:
"Me (UserA) and my friend (UserB) will share a car to participate in the tournament, because we live far from you and have to travel a long way, and we would like to come together in order to save fuel.
As long as my friend is not eliminated from the tournament, I'd prefer to play at similar times to my friend (UserB).
If my friend is eliminated, I can play at any time."
My question now is whether there is an algorithm to satisfy all these constraints, or a precooked solution my friend can use even though she's not a techie or a geek.
I understand that this algorithm should run after every day, because match winners are of course not known in advance and hence users' time constraints vary.
I also understand it is an operations research problem, but I haven't got experience and I'm not a professional programmer.
Please leave any pointer you may have on specific literature or software.
Thanks
There is no precooked solution to such problems AFAIK. Somebody will have to build a model and an application for that.
As suggested, Constraint Programming is one technique that solves this kind of problem and proposes solutions that satisfy all given constraints. Choco is a very handy open source tool.
However, you may want to formulate it as an optimization problem. You want the algorithm to place each pair UserA/UserB in the same day/time slot when scheduling the next round. How many such pairs are there? What if it is not possible to place all such pairs?
Going for the largest number of pairs placed together would be doable using MILP (mixed-integer linear programming). Maybe take history into account and average out the number of times each pair comes together? Such a model is definitely more complex...
I have an algorithm that uses time, and other variables to create keys that are 30 characters in length.
The FULL results look like this: d58fj27sdn2348j43dsh2376scnxcs
Because it uses time, and other variables, it will always change.
All you see is the first 6: d58fj2
So basically, if you monitor my results, each time you would see different ones, and only the first 6:
d58fj2
kfd834
n367c6
9vh364
vc822c
c8u23n
df8m23
jmsd83
QUESTION: Would you ever be able to reverse engineer and figure out the algorithm calculating them? REMEMBER, you NEVER see the full result, just the first 6 digits out of 30.
QUESTION 2: To those saying it's possible, how many keys would you need in order to do that? (And I still mean, just the first 6 digits out of the 30)
Thanks
I'm willing to risk the downvotes, but somehow this quickly started smelling like school (home)work.
The question itself - "Is this reverse-engineerable? REMEMBER, you never see the full result" is suspicious enough; if you can see the full result, so can I. Whether you store it locally so I can take my time inspecting it, or whether it goes through the wire so I have to hunt it down, is another matter - whether or not I have to use Wireshark, I can still see what's being transmitted to and from the app.
Remember, at some point WEP used to be "unbreakable", while now a lot of low-end laptops can crack it easily.
The second question however - "how many samples would you need to see to figure it out" sounds like one of those dumb impractical teacher questions. Some people guess their friends' passwords on the first try, some take a few weeks... The number of tries, unfortunately, isn't the deciding factor in reverse-engineering. Only having the time to try them all is; which is why people use expensive locks on their doors - not because they're unbreakable, but because it takes more than a few seconds to break them, which increases the chances that the neighbours will notice suspicious activity.
But asking the crowd "how many keys would you need to see to crack this algorithm you know nothing about" leads nowhere, as it's merely a defensive move that does not provide any guarantees; the author of the algorithm knows very well how many samples one needs to break it using statistical analysis (in the case of WEP, that's anywhere between 5000 - 50000 - 200000 IVs). Some passwords break with 5k, some hardly break with 200k...
Answering your questions in more detail with academic proof requires more info from your side; much more than the ambiguous "can you do it, and if yes, how long would it take?" question which is what it currently is.
I'm working on a web application which will be used for classifying photos of automobiles. The users will be presented with photos of various vehicles, and will be asked to answer a series of questions about what they see. The results will be recorded to a database, averaged, and displayed.
I'm looking for algorithms to help me identify users who frequently don't vote with the group, indicating that they're probably either not paying attention to the photos or lying about what they see. I then want to exclude these users and recalculate the results, so that I can say, with a known amount of confidence, that this particular photo shows a vehicle that is this and that.
This question goes out to all you computer science guys: where can I find such algorithms, or get the theoretical background to design them myself? I'm assuming I'm going to have to learn some probability and statistics, maybe some data mining. Some book recommendations would be great. Thanks!
P.S. These are multiple choice questions.
All of these are good suggestions. Thank you! I wish there was a way on stack overflow to select multiple correct answers so more of you could be acknowledged for your contributions!!
Read The Elements of Statistical Learning, it is a great compendium on data mining.
You may be especially interested in unsupervised algorithms, for example clustering. Assuming that most people do not lie, the biggest cluster is right and the rest are wrong. Mark people accordingly, then apply some Bayesian statistics and you'll be done.
Of course, most data mining techniques are pretty experimental, so don't count on them always being right... or even being right in most cases.
I believe what you described is solved using outlier/anomaly detection.
A number of techniques exist:
statistical-based methods
distance-based methods
model-based methods
I suggest you take a look at these slides from the excellent book Introduction to Data Mining
If you know what answers you are expecting, why do you ask people to vote? By excluding some values you basically turn the vote into something that you like. Automobiles make different impressions on different individuals. If 100 people loved a car and then someone comes and says that he/she doesn't like it, do you exclude that vote?
But anyway, considering that you still want to do this, first of all you will need a large set of data from "trusted" voters. This will give you an idea of what a "good" answer is, and from this point you can choose the exclusion threshold.
Without an initial set of data you cannot apply any algorithm, because you will get false results. Consider a single vote of 100 on a scale from 0 to 100. If the second vote is "1", you will then exclude it because it is too far from the average.
I think a pretty simple algorithm could accomplish this for you. You could try and get fancier by calculating the standard deviations and such, but I wouldn't bother.
Here's a simple approach that should be sufficient:
For each of your users, calculate the number of questions they answered and the number of times they selected the most popular answer for the question. The users with the lowest ratio of popular answers to total answers are the ones you can guess are providing bogus data.
You probably would not want to throw out the data from users who have only answered a small number of questions, because they have likely just disagreed on a few rather than put in bogus data.
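The approach above could be sketched like this in Python. The data layout (a list of `(user, question, answer)` tuples), the function name `agreement_ratios` and the `min_answers` cutoff are all assumptions made for illustration; ties for the most popular answer are broken by whichever answer was seen first.

```python
from collections import Counter, defaultdict


def agreement_ratios(votes, min_answers=3):
    """Return {user: share of answers matching the most popular answer}.

    Users with fewer than min_answers answers are left out, since a few
    disagreements are not enough to judge them.
    """
    by_question = defaultdict(Counter)
    for _user, question, answer in votes:
        by_question[question][answer] += 1
    popular = {q: c.most_common(1)[0][0] for q, c in by_question.items()}

    answered, agreed = Counter(), Counter()
    for user, question, answer in votes:
        answered[user] += 1
        if answer == popular[question]:
            agreed[user] += 1
    return {u: agreed[u] / n for u, n in answered.items() if n >= min_answers}


# Invented sample data.
votes = [
    ("alice", "q1", "sedan"), ("alice", "q2", "truck"), ("alice", "q3", "coupe"),
    ("bob", "q1", "sedan"), ("bob", "q2", "truck"), ("bob", "q3", "coupe"),
    ("carol", "q1", "sedan"), ("carol", "q2", "truck"), ("carol", "q3", "coupe"),
    ("dave", "q1", "truck"), ("dave", "q2", "coupe"), ("dave", "q3", "sedan"),
    ("eve", "q1", "sedan"),  # too few answers to be rated
]
ratios = agreement_ratios(votes)
print(sorted(ratios.items(), key=lambda item: item[1]))
```

Users at the low end of the sorted list are the candidates for a closer look; where to draw the line is a judgment call.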
What kind of questions are they (Yes/No, or 1 to 10?).
You may be able to get away with not discarding anything by using a median instead of an average (mean). With an average, extreme outliers in the responses can skew the result, but if you use the median you may get a better answer. So for example, if you had 5 answers, order them and pick the middle one.
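A small illustration of the difference, assuming numeric answers (for example ratings from 1 to 100); the numbers are made up.

```python
from statistics import mean, median

# Four sensible ratings and one extreme outlier (invented data).
ratings = [4, 5, 5, 6, 100]

avg = mean(ratings)    # pulled far upwards by the single 100
mid = median(ratings)  # middle value of the sorted list
print(avg, mid)
```

The single outlier drags the average to 24, while the median stays at 5.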
I think what you are saying is that you are concerned that certain people are "outliers", and they are adding noise to your data, making the categorizations less reliable. So, if you have a Chevy Camaro, and most people say it is either a pony car, a muscle car, or a sports car, but you have some goofball who says it's a family sedan, you would want to minimize the impact of his vote.
One thing you could do is provide a Stack Overflow-like reputation score for users:
The more a user is "in agreement" with other users, the better his or her score would be. For a given user (User X), this could be determined by a simple calculation of what percentage of users who responded to a question chose the same category as User X, then averaging this value over all questions answered.
You may want to multiply this value by the total number of questions answered, to encourage people to answer as many questions as possible. (Note: if you choose to do this, it would be equivalent to just summing the percentage agreement scores rather than averaging them.)
You could present the final reputation score to users, making sure to explain that they will be rewarded for how well their responses agree with those of other users. This will encourage people to answer more questions but also to take care in their answers.
Finally, you could calculate a certainty score for a given categorization by adding up the total reputation score of all people who chose a given category.
Some of these ideas may need some refinement, especially since I don't know your exact situation. Certainly, if people can see what other people chose before they vote, it would be way too easy to game the system.
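A rough sketch of the reputation and certainty scores described above, using the summed (not averaged) agreement variant. The data layout, the function names, and the choice to count the user themselves among the respondents are assumptions for illustration.

```python
from collections import Counter, defaultdict


def reputation_scores(votes):
    """votes: {question: {user: category}}.

    A user's score is the sum, over the questions they answered, of the
    fraction of respondents who chose the same category (the user included).
    """
    scores = defaultdict(float)
    for answers in votes.values():
        counts = Counter(answers.values())
        total = len(answers)
        for user, category in answers.items():
            scores[user] += counts[category] / total
    return dict(scores)


def certainty(votes, scores, question):
    """Total reputation behind each category chosen for one question."""
    totals = defaultdict(float)
    for user, category in votes[question].items():
        totals[category] += scores[user]
    return dict(totals)


# Invented sample data.
votes = {
    "camaro": {"ann": "muscle car", "ben": "muscle car",
               "cat": "muscle car", "dan": "family sedan"},
    "civic": {"ann": "compact", "ben": "compact",
              "cat": "compact", "dan": "truck"},
}
scores = reputation_scores(votes)
camaro_certainty = certainty(votes, scores, "camaro")
print(scores)
print(camaro_certainty)
```

In this toy data the lone "family sedan" vote carries a total weight of 0.5 against 4.5 for "muscle car".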
If you were to collect votes like "on a scale from 1 to 10, how would you rate this car", you could probably use simple average and standard deviation: the smaller the standard deviation, the more unanimous the general consensus is among your voters, and you can flag users who are e.g. 3 standard devs from the average.
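For numeric ratings, the standard-deviation check might look like the following sketch. The function name and data are hypothetical, and it uses the population standard deviation, which is one of several reasonable choices.

```python
from statistics import mean, pstdev


def flag_outliers(ratings, threshold=3.0):
    """ratings: {user: numeric rating} for one photo.

    Return users whose rating is more than `threshold` standard
    deviations away from the mean.
    """
    values = list(ratings.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:  # everyone agrees, nothing to flag
        return []
    return [u for u, r in ratings.items() if abs(r - mu) > threshold * sigma]


# Invented data: 19 voters say 5, one says 10.
ratings = {f"user{i}": 5 for i in range(19)}
ratings["troll"] = 10
flagged = flag_outliers(ratings)
print(flagged)
```

Note that with only a handful of votes no single rating can end up 3 standard deviations from the mean, so this check only becomes meaningful once a photo has a reasonable number of votes.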
For multiple choice, you need to be more careful. Simply discarding all but the most-voted option will do nothing but disgruntle the voters. You need to establish a measure of how significant the winner is w.r.t. the other options, e.g. flag users who voted for options with less than 1/3 of the winning option's count.
Note that I wrote "flag users", not discard votes. If you discard votes, you can't tell how confident you are about the result ("91% voted this to be a Ford Mustang"). If a user has more than a certain percentage of his votes flagged - well, that's up to you.
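The "less than 1/3 of the winning option's count" rule for multiple choice could be sketched as follows. The function name and sample votes are invented, and the 1/3 ratio is just the example figure from above, not a recommended value.

```python
from collections import Counter


def flag_minority_voters(votes, ratio=1 / 3):
    """votes: {user: option} for one photo.

    Flag (do not discard) users whose option got fewer than `ratio`
    times the votes of the winning option.
    """
    if not votes:
        return []
    counts = Counter(votes.values())
    winner_count = max(counts.values())
    return sorted(user for user, option in votes.items()
                  if counts[option] < ratio * winner_count)


# Invented data: 9 votes for Mustang, 4 for Camaro, 1 for minivan.
votes = {f"m{i}": "Ford Mustang" for i in range(9)}
votes.update({f"c{i}": "Chevy Camaro" for i in range(4)})
votes["zed"] = "minivan"
flagged = flag_minority_voters(votes)
print(flagged)
```

The Camaro voters are a sizeable minority and are left alone; only the lone minivan vote is flagged, and all votes stay in the data.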
Your trickiest problem, however, will probably be to collect sufficient votes. Depending on how easy the multiple choice problem is, you probably need several times the number of options as votes, per photo. Otherwise the statistics are meaningless.
I'd just like someone to verify whether the following problem is NP-complete or if there is actually a better/easier solution to it than simple brute-force combination checking.
We have a sort-of resource allocation problem in our software, and I'll explain it with an example.
Let's say we need 4 people to be at work during the day-shift. This number, and the fact that it is a "day-shift" is recorded in our database.
However, we don't require just anyone to fill those spots; there are some requirements that need to be met in order to fit the bill.
Of those 4, let's say 2 of them have to be nurses, and 1 of them has to be a doctor.
One of the doctors also has to work as part of a particular team.
So we have this set of information:
Day-shift: 4
1 doctor
1 doctor, need to work in team A
1 nurse
The above is not the problem. The problem comes when we start picking people to work the day-shift and trying to figure out if the people we've picked so far can actually fill the criteria.
For instance, let's say we pick James, John, Ursula and Mary to work, where James and Ursula are doctors, John and Mary are nurses.
Ursula also works in team A.
Now, depending on the order we try to fit the bill, we might end up deducing that we have the right people, or not, unless we start trying different combinations.
For instance, if we go down the list and pick Ursula first, we could match her with the "1 doctor" criterion. Then we get to James, and we notice that since he doesn't work in team A, the other criterion, "1 doctor, need to work in team A", can't be filled with him. Since the other two people are nurses, they won't fit that criterion either.
So we backtrack and try James first, and he too can fit the first criteria, and then Ursula can fit the criteria that needs that team.
So it looks to us as if we need to try different combinations until we've either found a combination that works or tried them all, in which case some criteria remain unfilled even though the total number of heads working is the same as the total number of heads needed.
Is this the only solution, can anyone think of a better one?
Edit: Some clarification.
Comments on this question mention that with this few people we should go with brute force, and I agree; that's probably what we could do, and we might even do that, in the same vein as sort implementations that look at the size of the data and pick a different sort algorithm with less initial overhead if the data is small.
The problem, though, is that this is part of a roster planning system, in which quite a large number of people may be involved: "We need X people on the day shift", "We have this pool of Y people that will be doing it", and potentially a large "We have this list of Z criteria for those X people that will somehow have to match up with these Y people". Add to that the fact that we have a number of days to do the same calculation for, in real time, as the leader adjusts the roster, and the need for a speedy solution comes up.
Basically, the leader will see a live summary on-screen that says how many people are still missing on the day-shift as a whole, how many people are fitting the various criteria, and how many people we actually need in addition to the ones we have. This display will have to update semi-live while the leader adjusts the roster with "What if James takes the day-shift instead of Ursula, and Ursula takes the night-shift".
But huge thanks to the people who have answered this so far; the constraint satisfaction problem sounds like the way we need to go, but we'll definitely look hard at all the links and algorithm names here.
This is why I love StackOverflow :)
What you have there is a constraint satisfaction problem; their relationship to NP is interesting, because they're typically NP but often not NP-complete, i.e. they're tractable to polynomial-time solutions.
As ebo noted in comments, your situation sounds like it can be formulated as an exact cover problem, which you can apply Knuth's Algorithm X to. If you take this tack, please let us know how it works out for you.
It does look like you have a constraint satisfaction problem.
In your case I would particularly look at constraint propagation techniques first -- you may be able to reduce the problem to a manageable size that way.
What happens if no one fits the criteria?
What you are describing is the 'Roommate Problem'; it is lightly described in this thesis.
Bear with me, I'm searching for better links.
EDIT
Here's another fairly dense thesis.
As for me, I would most likely try to find a reduction to the bipartite graph matching problem. Also, proving that a problem is NP-complete is usually much more complicated than stating that you cannot find a polynomial solution.
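To make the bipartite matching idea concrete, here is a sketch that treats requirements as one side of the graph and people as the other, and uses the classic augmenting-path method to look for a matching that fills every requirement. The data layout and the name `match_requirements` are assumptions; it also assumes each person fills at most one requirement.

```python
def match_requirements(requirements, people):
    """requirements: list of predicates over a person's attributes.
    people: {name: attributes}.

    Return {requirement index: name} if every requirement can be filled
    by a distinct person, else None. Runs in polynomial time.
    """
    candidates = [[name for name, attrs in people.items() if req(attrs)]
                  for req in requirements]
    assigned = {}  # person name -> index of the requirement they fill

    def try_fill(i, seen):
        for name in candidates[i]:
            if name in seen:
                continue
            seen.add(name)
            # Take a free person, or move their current requirement elsewhere.
            if name not in assigned or try_fill(assigned[name], seen):
                assigned[name] = i
                return True
        return False

    for i in range(len(requirements)):
        if not try_fill(i, set()):
            return None
    return {i: name for name, i in assigned.items()}


people = {
    "Ursula": {"role": "doctor", "teams": {"A"}},
    "James": {"role": "doctor", "teams": set()},
    "John": {"role": "nurse", "teams": set()},
    "Mary": {"role": "nurse", "teams": set()},
}
requirements = [
    lambda p: p["role"] == "doctor",
    lambda p: p["role"] == "doctor" and "A" in p["teams"],
    lambda p: p["role"] == "nurse",
]
result = match_requirements(requirements, people)
print(result)
```

In this run Ursula is first placed on the plain "doctor" requirement and then moved to the team A one when James turns out to be able to take her place, which is the reshuffling the question does by hand.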
I am not sure your problem is NP; it does not smell that way. But what I would do if I were you is order the requirements for the positions so that you try to fill the most specific ones first: since fewer people will be available to fill these positions, you are less likely to have to backtrack a lot. There is no reason why you should not combine this with Algorithm X, an algorithm of pure Knuth-ness.
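The "most specific first" ordering can be sketched as a plain backtracking search that sorts requirements by how few people qualify for each. Names and data layout are hypothetical, and the worst case is still exponential; the ordering only tends to reduce backtracking.

```python
def assign_most_specific_first(requirements, people):
    """requirements: {label: predicate}; people: {name: attributes}.

    Return {label: name} with distinct people, or None if impossible.
    """
    candidates = {label: [n for n, attrs in people.items() if pred(attrs)]
                  for label, pred in requirements.items()}
    # Requirements with the fewest qualifying people are tried first.
    order = sorted(candidates, key=lambda label: len(candidates[label]))

    def backtrack(pos, used):
        if pos == len(order):
            return {}
        label = order[pos]
        for name in candidates[label]:
            if name in used:
                continue
            rest = backtrack(pos + 1, used | {name})
            if rest is not None:
                return {label: name, **rest}
        return None  # nobody left for this requirement: undo earlier picks

    return backtrack(0, frozenset())


people = {
    "Ursula": {"role": "doctor", "teams": {"A"}},
    "James": {"role": "doctor", "teams": set()},
    "John": {"role": "nurse", "teams": set()},
    "Mary": {"role": "nurse", "teams": set()},
}
requirements = {
    "doctor": lambda p: p["role"] == "doctor",
    "doctor in team A": lambda p: p["role"] == "doctor" and "A" in p["teams"],
    "nurse": lambda p: p["role"] == "nurse",
}
assignment = assign_most_specific_first(requirements, people)
print(assignment)
```

Because "doctor in team A" has only one candidate, it is handled first and Ursula is never wasted on the generic "doctor" slot.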
I'll leave the theory to others, since my mathematical savvy is not so great, but you may find a tool like Cassowary/Cassowary.net or NSolver useful to represent your problem declaratively as a constraint satisfaction problem and then solve the constraints.
In such tools, the simplex method combined with constraint propagation is frequently employed to deterministically reduce the solution space and then find an optimal solution given a cost function. For larger solution spaces (which don't seem to apply in the size of problem you specify), occasionally genetic algorithms are employed.
If I remember correctly, NSolver also includes in sample code a simplification of an actual Nurse-rostering problem that Dr. Chun worked on in Hong Kong. And there's a paper on the work he did.
It sounds to me like you have a couple of separable problems that would be a lot easier to solve:
-- select one doctor from team A
-- select another doctor from any team
-- select two nurses
So you have three independent problems.
A clarification though, do you have to have two doctors (one from the specified team) and two nurses, or one doctor from the specified team, two nurses, and one other that can be either doctor or nurse?
Some Questions:
Is the goal to satisfy the constraints exactly, or only approximately (but as much as possible)?
Can a person be a member of several teams?
What are all possible constraints? (For example, could we need a doctor which is a member of several teams?)
If you want to satisfy the constraints exactly, then I would order the constraints by decreasing strictness; that is, the ones which are hardest to satisfy (e.g. doctor AND team A in your example above) should be checked first!
If you want to satisfy the constraints approximately, then it's a different story... you would have to specify some kind of weighting/importance function which determines what we would rather have when we can't match exactly and have several possibilities to choose from.
If you have several or many constraints, take a look at Drools Planner (open source, java).
Brute force, branch and bound and similar techniques take too long. Deterministic algorithms such as "fill the largest shifts first" are very suboptimal. Meta-heuristics are a very good way to deal with this.
Take a specific look at the real-world nurse rostering example of Drools Planner. It's easy to add many constraints, such as "young nurses don't want to work the Saturday night" or "some nurses don't want to work too many days in a row".