So I got this table
The given minimum support is 2 and the minimum confidence is 50%.
From the table above, I can see that all the items are frequent, so I am unsure how to build the association rules.
So the question is: how do I generate the association rules from the given minimum support and confidence?
Related
There is an inventory of products, e.g. A: 10 units, B: 15 units, C: 20 units and so on. We have customer orders for some of these products, such as customer1 {A: 10 units, B: 15 units}, customer2 {A: 5 units, B: 10 units}, customer3 {A: 5 units, B: 5 units}. The task is to fulfill the maximum number of customer orders with the limited inventory we have. The result in this case should be filling the orders of customer2 and customer3 instead of just customer1. [The background for this problem is a real-time online retail scenario, where we have millions of customers and millions of products and we are trying to fulfill the orders as efficiently as possible.]
How do I solve this? Is there an algorithm for this kind of problem, something like optimisation?
Edit: The requirement here is fixed. The only aim here is maximizing the number of fulfilled orders regardless of value. But we have millions of users and millions of products.
This problem includes the knapsack problem as a special case. To see why, consider only one product A: the stock of the product is your bag capacity, the order quantities are the weights, and each order has value 1. Your problem is to maximize the total value you can fit in the bag.
Don't expect an exact solution for your problem in polynomial time...
An approach I'd go for is a random search: make a list of the orders and compute a solution (i.e. complete orders in sequence, skipping the orders you cannot fulfill). Then change the solution by applying a permutation on the orders and see if it's better.
Keep going with search until time runs out or you're happy with the solution.
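A minimal sketch of the random search described above, assuming each order is a (name, {product: quantity}) pair and that a "permutation" is a random swap of two orders. The function names, the swap move and the fixed iteration budget are illustrative choices, not part of the original answer.

```python
import random

def fulfill_in_sequence(orders, inventory):
    """Fill orders in the given sequence, skipping any that no longer fit."""
    stock = dict(inventory)
    filled = []
    for name, items in orders:
        if all(stock.get(p, 0) >= q for p, q in items.items()):
            for p, q in items.items():
                stock[p] -= q
            filled.append(name)
    return filled

def random_search(orders, inventory, iterations=1000, seed=0):
    """Swap two random orders per step; keep the new sequence if it is not worse."""
    rng = random.Random(seed)
    current = list(orders)
    best = fulfill_in_sequence(current, inventory)
    if len(current) < 2:
        return best
    for _ in range(iterations):
        candidate = current[:]
        i, j = rng.sample(range(len(candidate)), 2)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        filled = fulfill_in_sequence(candidate, inventory)
        if len(filled) >= len(best):
            current, best = candidate, filled
    return best

# The example from the question.
inventory = {"A": 10, "B": 15, "C": 20}
orders = [
    ("customer1", {"A": 10, "B": 15}),
    ("customer2", {"A": 5, "B": 10}),
    ("customer3", {"A": 5, "B": 5}),
]
print(sorted(random_search(orders, inventory)))
```

Accepting equally good sequences (>= rather than >) lets the search drift across plateaus; with millions of orders you would want to evaluate swaps incrementally rather than replaying the whole sequence.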
It can be solved by DP.
Firstly sort all your orders with respect to A in increasing order.
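Use this DP, where DP[n][m][o] is the maximum number of orders that can be filled with n units of A, m units of B and o units of C, and (a, b, c) are the quantities requested by the order being considered:
DP[n][m][o] = max(DP[n][m][o], DP[n-a][m-b][o-c] + 1) where n-a >= 0, m-b >= 0 and o-c >= 0
DP[0][0][0] = 0;
Do bottom up computation :
Set DP[i][j][k] = 0 , for all i = 0 to Amax; j = 0 to Bmax; k = 0 to Cmax
For Each order (a, b, c)
For Each n : Amax down to a
For Each m : Bmax down to b
For Each o : Cmax down to c
DP[n][m][o] = max(DP[n][m][o], DP[n-a][m-b][o-c] + 1);
Iterating downwards ensures that each order is counted at most once. The answer is DP[Amax][Bmax][Cmax]. The running time is O(orders * Amax * Bmax * Cmax), which is pseudo-polynomial and only practical for a few products with modest stock levels.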
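For illustration, here is a small Python sketch of a knapsack-style DP over stock levels, taking the maximum of "skip this order" and "fill this order". It assumes every order gives a quantity for every product (0 if not requested); the function name and the tuple layout are hypothetical. The table has one cell per combination of stock levels, so this is only feasible for toy sizes.

```python
from itertools import product

def max_orders_dp(capacities, orders):
    """capacities: stock per product; orders: one demand tuple per order."""
    states = list(product(*(range(c + 1) for c in capacities)))
    # dp[s] = max number of orders fillable using at most s units of each product
    dp = {s: 0 for s in states}
    for demand in orders:
        prev = dict(dp)  # snapshot, so each order is used at most once
        for s in states:
            rest = tuple(x - d for x, d in zip(s, demand))
            if min(rest) >= 0:
                dp[s] = max(prev[s], prev[rest] + 1)
    return dp[tuple(capacities)]

# Stock (A, B, C) and the three orders from the question.
print(max_orders_dp((10, 15, 20), [(10, 15, 0), (5, 10, 0), (5, 5, 0)]))  # 2
```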
Reams have been written about order fulfillment and yet no one has come up with a standard answer, because companies have different approaches and different requirements.
There are so many variables that a one-size-fits-all solution is not possible.
You would have to sit down and ask hundreds of questions before you could even start to come up with an approach tailored to your customer's needs.
Indeed those needs might also vary, based on the time of year, the day of the week, what promotions are currently being run, whether customers are ranked, numbers of picking and packing staff/machinery currently employed, nature, size, weight of products, where products are in the warehouse, whether certain products are in fast/automated picking lines, standard picking faces or in bulk. The list can appear endless.
Then consider whether all orders have to be filled completely, or whether you are allowed to partially fill an order and back-order out-of-stock products.
Does the entire order have to fit in a single box, or are multiple-box orders permitted?
Are you dealing with multiple warehouses and, if so, can partial orders be sent from each or do they have to be transferred for consolidation?
Should precedence be given to local or overseas orders?
The amount of information that you need at your fingertips before you can even start to plan a methodology to fit your customer's specific requirements can be enormous and sadly, you are not going to get a definitive answer. It does not exist.
Whilst I realise that this is not a) an answer or b) necessarily a welcome post, the hard truth is that you will require your customer to provide you with immense detail as to what it is that they wish to achieve, how and when.
Your job, initially, is to play devil's advocate, in attempting to nail them down.
P.S. Welcome to S.O.
When there are no ratings, a common scenario is to use implicit feedback (items bought, pageviews, clicks, ...) to suggest recommendations. I'm using a model-based approach and I'm wondering how to deal with multiple identical feedback events.
As an example, let's imagine that consumers buy items more than once. Should I treat the number of feedback events (pageviews, items bought, ...) as a rating, or compute a custom value?
To model implicit feedback, we usually have a mapping procedure that maps implicit user feedback into explicit ratings. I guess in most domains, repeated user action on the same item indicates that the user's preference for the item is increasing.
This is certainly true if the domain is music or video recommendation. In a shopping site, such a behavior might indicate the item is consumed periodically, e.g., diapers or printer ink.
One way I am aware of to model repeated implicit feedback is to create a numeric rating mapping function. As the number of times (k) the feedback occurs increases, the mapped rating should increase. At k = 1, you have the minimal rating for positive feedback, for example 0.6; as k increases, it approaches 1. Of course, you don't need to map to [0,1]; you can have integer ratings 0, 1, 2, 3, 4, 5.
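As a rough sketch of such a mapping, one possible choice (my assumption, not a standard formula) is 1 - (1 - floor) / k, which gives the floor value at k = 1 and approaches 1 as k grows. The function name and the 0.6 default are only illustrative.

```python
def implicit_rating(k, floor=0.6):
    """Map a repeat count k to a rating in [floor, 1); 0 means no feedback."""
    if k <= 0:
        return 0.0
    return 1.0 - (1.0 - floor) / k

for k in (1, 2, 5, 20):
    print(k, round(implicit_rating(k), 3))
```

Any increasing, saturating function would serve the same purpose; how fast it saturates is something to tune against your own data.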
To give you a concrete example of the mapping, here is what was done in a music recommendation domain. In short, they used per-user statistics of the items to define the mapping function.
"We assume that the more times the user has listened to an artist the more the user likes that particular artist. Note that user’s listening habits usually present a power law distribution, meaning that a few artists have lots of plays in the users profile, while the rest of the artists have significantly less play counts. Therefore, we compute the complementary cumulative distribution of artist plays in the users’ profile. Artists located in the top 80-100% of the distribution are assigned a score of 5, while artists in the 60-80% range assign a score of 4."
Another way I have seen in the literature is to create another variable besides a binary rating variable. They call it confidence levels. See here for details.
Probably not that helpful for OP any longer, but it might be for others in the same boat.
Evaluating Various Implicit Factors in E-commerce
Modelling User Preferences from Implicit Preference Indicators via Compensational Aggregations
If anyone knows more papers/methods, please share as I'm currently looking for state of the art approaches to this problem. Thanks in advance.
You typically use a sum of clicks, or some weighted sum of events, as a "score" for each user-item pair in implicit feedback systems. It's not a rating, and that's more than a semantic distinction. You won't get good results if you feed these values into a process that expects rating-like values and tries to minimize a squared-error loss.
You treat 3 clicks as adding 3 times the value of 1 click to the user-item interaction strength. Other events, like a purchase, might be weighted much more highly than a click. But in the end it also adds to a sum.
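A minimal sketch of that weighted sum, assuming events arrive as (user, item, event_type) tuples. The event names and weights below are made up for illustration; real weights would have to be chosen or tuned for your data.

```python
from collections import defaultdict

# Hypothetical weights: a purchase counts much more than a click.
EVENT_WEIGHTS = {"click": 1.0, "add_to_cart": 3.0, "purchase": 10.0}

def interaction_scores(events, weights=EVENT_WEIGHTS):
    """Sum the event weights per (user, item) pair; unknown events add 0."""
    scores = defaultdict(float)
    for user, item, event_type in events:
        scores[(user, item)] += weights.get(event_type, 0.0)
    return dict(scores)

events = [
    ("u1", "i1", "click"),
    ("u1", "i1", "click"),
    ("u1", "i1", "click"),
    ("u1", "i2", "purchase"),
]
print(interaction_scores(events))
```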
I am new to data mining. I want to mine multi-dimensional and ordinal association rules from my data set e.g.
if (income => 100) ^ (priority=>1) ^ (skill=>technician ) then (approve=>prove)
What I have learned is that
categorical = for skills e.g. technician, plumber or any textual data
quantitative = numeric for date, balance
So the main question is: which association rule algorithm should be used? Most algorithms are either quantitative or categorical; is there a combined one?
I think you are misunderstanding the concept of association rule mining on its own.
Your quantitative data cannot be used as such in association rule mining (as I understood your question). At least, you cannot 'tune' the quantity to fit your needs, because everything in association rule mining is either an item (quantitative or qualitative) or a transaction, and the rules relate the items to each other. Therefore, the quantities become 'fixed' items.
Recall what association rule mining is: given a set of binary attributes (items) and a set of transactions, each containing a subset of the items, you define a set of rules, which are implications X -> Y (with X and Y being disjoint subsets of the set of items).
You can interpret (or model) the implication of a rule as an if, but that is just syntactic sugar. There are no quantitative or qualitative attributes as we know them in association rule mining, just items that belong to a set and the relationships (implications/rules) that we define between them.
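To make this concrete, here is a small brute-force sketch. It assumes each record is turned into a transaction by treating every attribute=value pair as one fixed item (numeric values would have to be binned beforehand, e.g. into "high"/"low"), and it keeps rules X -> Y that meet a minimum support count and a minimum confidence. The records, attribute names and function names are hypothetical, and real data sets would call for Apriori or FP-Growth rather than enumerating every itemset.

```python
from itertools import combinations

def to_items(record):
    """Turn {attribute: value} into a set of fixed 'attribute=value' items."""
    return frozenset(f"{k}={v}" for k, v in record.items())

def mine_rules(transactions, min_support, min_confidence):
    """Return (X, Y, support count, confidence); assumes min_support >= 1."""
    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            sup = support(itemset)
            if sup < min_support:
                continue
            # Split the frequent itemset into every X -> Y with X, Y non-empty.
            for r in range(1, size):
                for lhs in combinations(combo, r):
                    x = frozenset(lhs)
                    conf = sup / support(x)
                    if conf >= min_confidence:
                        rules.append((x, itemset - x, sup, conf))
    return rules

records = [
    {"skill": "technician", "income": "high", "approve": "yes"},
    {"skill": "technician", "income": "high", "approve": "yes"},
    {"skill": "plumber", "income": "low", "approve": "no"},
]
transactions = [to_items(r) for r in records]
for x, y, sup, conf in mine_rules(transactions, min_support=2, min_confidence=0.5):
    print(sorted(x), "->", sorted(y), sup, conf)
```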
I have this problem but I'm not sure what kind of algorithm it calls for.
We are trying to make a scheduling system where the users can choose their time preferences and are then grouped into classes at their most preferred time.
Let's say I have 100 users. Those users have their time preferences. We want to divide them into 4 to 6 classes with about 20 to 25 students in each class. My question is:
How do I schedule them into the class times that they most prefer, using the least number of classes?
(Another constraint is the number of teachers we have and the maximum hours a week they can teach. Also, we want to be able to have makeup classes, e.g. students who miss class this week can be rescheduled for next week.)
One way to go about a multi-objective optimization problem like this is to find a solution that satisfies one objective, and then use local search to attempt to satisfy the remaining constraints.
For example, if you ignore the constraint that you would like to minimize the number of classes used, then you can treat this as a variant on the Stable Marriage Problem (or weighted bipartite graph matching problem) - this has the nice property that it can be solved in polynomial time. Your variant on the problem is most similar to the "hospitals/residents" problem (assigning many residents to a few hospitals based on preference).
This will leave you with several classes with only a few students in them, so you next perform a local search to satisfy the "minimize the number of classes" constraint (if you formulated the Stable Marriage algorithm correctly then you should have already satisfied the "no class exceeds 25 students" constraint) - from there you have two options:
1. Sort the classes from fewest to most students and close the classes with the fewest students, reassigning the evicted students to the remaining classes
2. Continue to take the students' preferences into account and sort the classes from least-preferred to most-preferred (so if you have 5 students in a class who assigned it a weight of 10 then you would first close a class with 10 students who assigned it a weight of 2).
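A minimal sketch of one step of the first option, assuming the initial matching is already available as a mapping from class to students and that each student has a ranked list of acceptable classes. The data layout and the function name are hypothetical.

```python
def close_smallest_class(assignment, preferences, capacity):
    """Close the class with the fewest students and reassign its students.

    assignment: class -> list of students
    preferences: student -> classes, most preferred first
    Returns the new assignment, or None if some student cannot be placed.
    """
    smallest = min(assignment, key=lambda c: len(assignment[c]))
    remaining = {c: list(s) for c, s in assignment.items() if c != smallest}
    for student in assignment[smallest]:
        for cls in preferences[student]:
            if cls in remaining and len(remaining[cls]) < capacity:
                remaining[cls].append(student)
                break
        else:
            return None  # no room anywhere: keep the class open
    return remaining

assignment = {"Mon": ["a", "b"], "Tue": ["c", "d", "e"], "Wed": ["f"]}
preferences = {"f": ["Wed", "Tue", "Mon"]}
print(close_smallest_class(assignment, preferences, capacity=3))
```

You would call this repeatedly until it returns None or the number of classes is acceptable; the second option only changes the key used to pick which class to close.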
You would then perform another local search to satisfy the teachers' hours constraints - you would perform the "teachers can't teach more than X hours" search after you perform the "minimize the number of classes" search, since the latter optimization will make it easier to perform the former optimization.
If the resulting algorithm is fast enough then you can randomize it and run it a few dozen times, saving the best result. For example, rather than closing the class with the fewest students first, randomly select a class to close (weight the selection so that it usually selects the smallest class - a completely random search won't perform well)
It's possible that you'll find that one of your constraints is causing a lot of conflicts, e.g. when satisfying the teachers' hours constraint you discover that you're having to rearrange a large percentage of the students. If this occurs then change the constraint ordering, so that satisfying the teachers' hours is done on the second or even the first pass rather than on the third pass.
The makeup-class-constraint might actually belong at the top of the constraint list (even though it's got a low priority), depending on its specifications. For example, you may have the requirement that the makeup class occur before any other class (so that e.g. a student with class on Tuesday can have the makeup class on Monday and get caught up prior to his regular class); even though the makeup class constraint has a low priority, it has the tightest requirements, and so it needs to get ordered first.
Can the values in the User-Item matrix be binary values like 0 and 1, which indicate “didn’t buy” vs “bought”?
And if I apply a latent factor model to the matrix, can the predicted value (for example 0.8) stand for the probability of the user's behavior (i.e. didn’t buy or bought)?
Yes, it is quite common to use implicit feedback in place of ratings. One slight pitfall with your suggestion is that 0 is ambiguous: it could mean the user saw the item but chose not to buy it, or that the user never even saw the item (i.e. gave no feedback).
Typically the value output from your recommendation algorithm isn't a probability of a purchase, but rather a numerical score used to rank that item versus all other potential items. This way you can identify the top X items to recommend to a user.
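As a small illustration of using the scores only for ranking, the sketch below assumes the model's predictions for one user are available as an item -> score mapping (the values are invented and deliberately not confined to [0, 1]) and returns the top X items the user has not bought yet.

```python
def top_items(scores, already_bought, x):
    """Rank unseen items by predicted score and return the best x."""
    candidates = {i: s for i, s in scores.items() if i not in already_bought}
    return sorted(candidates, key=candidates.get, reverse=True)[:x]

# Hypothetical predicted scores for one user.
scores = {"a": 0.8, "b": 0.3, "c": 1.4, "d": -0.2}
print(top_items(scores, already_bought={"a"}, x=2))  # ['c', 'b']
```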
You can use standard collaborative filtering on the type of data you described, and factorisation techniques work as well.