I'm trying to figure out how to best match a borrower to a lender for a real estate transaction. Let's say there's a network of 1,000 lenders on a platform. A borrower would log in and be asked to provide the following:
Personal Information and Track Record (how many projects they have done, credit score, net worth etc.)
Loan Information (loan size, type, leverage etc.)
Project Information (number of units, floors, location, building type etc.)
On the other side, a lender would provide criteria on which they would agree to lend on. For example, a lender agrees to lend to a borrower if:
They have done more than 5 projects
Credit Score > 700
Net Worth > Loan Amount
$500,000 < Loan Amount < $5,000,000
Leverage < 75%
Building Size > 10 Units
Location = CA, AZ, NY, CO
etc...
I want to create a system that matches a lender to a borrower based on the information the borrower provided and the criteria the lender provided. Ideally, the system would assign 1,000 scores to the borrower, one "matchmaking" score for each lender on the platform. A borrower that meets more of a lender's lending requirements would get a higher score, since the match should be better. What machine learning algorithm would be best suited to generate such a score? Or would this problem be solved using combinatorial optimization?
Thanks!
If you don't have the system yet, you are unlikely to have good data for machine learning.
So write a few custom rules and start collecting data. Once you have data, do something like build a logistic regression for estimating the probability of acceptance. Once the model is good enough to beat your home grown rules in an A/B test, switch to the machine learning model.
But you can't invoke the magic of machine learning until you have data to learn from.
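To make the rule-based starting point concrete: for the lender-matching question above, the score can simply be the fraction of a lender's criteria the borrower satisfies, computed against all 1,000 lenders. A minimal sketch in Python; the field names and criteria structure are my own illustration, not a real platform API:

borrower = {
    "projects_completed": 7,
    "credit_score": 720,
    "net_worth": 3_000_000,
    "loan_amount": 2_000_000,
    "leverage": 0.70,
    "units": 12,
    "state": "CA",
}

# One lender's criteria, expressed as predicates over the borrower profile.
lender_criteria = [
    lambda b: b["projects_completed"] > 5,
    lambda b: b["credit_score"] > 700,
    lambda b: b["net_worth"] > b["loan_amount"],
    lambda b: 500_000 < b["loan_amount"] < 5_000_000,
    lambda b: b["leverage"] < 0.75,
    lambda b: b["units"] > 10,
    lambda b: b["state"] in {"CA", "AZ", "NY", "CO"},
]

def match_score(borrower, criteria):
    # Fraction of the lender's criteria the borrower meets (0.0 to 1.0).
    return sum(rule(borrower) for rule in criteria) / len(criteria)

print(match_score(borrower, lender_criteria))  # 1.0 for this borrower

Running this against every lender's criteria list gives the per-lender scores, and the acceptance/rejection decisions you collect along the way become the labels for the logistic regression mentioned above.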
I've noticed that on food delivery apps it says something like 'orders with nearby collection and drop-off points are grouped together for efficiency'. I have a similar problem where delivery jobs can come in in real time or be pre-booked, and the algorithm needs to group jobs to get them done faster. I have data on the distance between locations and on how jobs are grouped manually.
I was wondering what kind of algorithms these big companies (here it's Grab, Foodpanda, Deliveroo, etc.) use to group orders. Is it a secret?
Also, I was told this algorithm has to have AI in it, because it's a buzzword that clients love. I'm scratching my head trying to figure out how to incorporate that. E.g., use supervised learning and treat it like a classification problem on which person to choose for each job, based on distance or something? The 'label' would be data on how humans grouped jobs, which isn't really optimal, and the client wants an improvement on that as well.
My question is whether the commercial algorithms out there for grouping food orders use AI, whether it's appropriate to use AI and how, and in general any insight into what kind of algorithms they use. Thanks in advance.
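I can't say what Grab, Foodpanda, or Deliveroo actually run internally, but a common non-AI baseline for the 'nearby collection and drop-off' behaviour is a greedy batching rule: pair jobs whose pickups and drop-offs both fall within some radius. A rough sketch under that assumption (coordinates and the radius are made up):

import math

# Each job: (pickup_x, pickup_y, drop_x, drop_y). Toy coordinates.
jobs = [
    (1.30, 103.80, 1.32, 103.85),
    (1.30, 103.81, 1.33, 103.86),
    (1.40, 103.70, 1.45, 103.75),
]

def dist(a, b):
    # Plain Euclidean distance; real systems would use road travel time.
    return math.hypot(a[0] - b[0], a[1] - b[1])

def greedy_batches(jobs, radius=0.02):
    # Greedily group jobs whose pickups AND drop-offs are both within `radius`.
    batches, used = [], set()
    for i, a in enumerate(jobs):
        if i in used:
            continue
        batch = [i]
        for j in range(i + 1, len(jobs)):
            b = jobs[j]
            if j in used:
                continue
            if dist(a[:2], b[:2]) <= radius and dist(a[2:], b[2:]) <= radius:
                batch.append(j)
                used.add(j)
        used.add(i)
        batches.append(batch)
    return batches

print(greedy_batches(jobs))  # [[0, 1], [2]]

If the client insists on 'AI', the manually grouped jobs you already have could be used to learn the batching threshold or to rank candidate groupings, rather than as labels to imitate outright.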
I have the task of forecasting how quickly goods will sell (for example, within one category). E.g., the client inputs the price at which they want their item to be sold, and the algorithm should display that, at that price, it will sell in n days. It should also have three intervals: quick, medium, and long sell. Like in the picture:
The question: how exactly should I prepare the algorithm?
My suggestion: use clustering techniques to identify these three price ranges, and then solve a regression task for each cluster to predict the number of days. Is this the right approach?
There are two questions here, and I think the answer to each lies in a different domain:
Given an input price, predict how long it will take to sell the item. This is a well-defined prediction problem and can be tackled using ML algorithms, e.g. use your entire dataset to train and test a regression model for prediction (a minimal sketch follows at the end of this answer).
Translate the prediction into a class: quick-, medium- or slow-sell. This problem is product oriented - there doesn't seem to be any concrete data allowing you to train a classifier on this translation; and I agree with #anony-mousse that using unsupervised learning might not yield easy-to-use results.
You can either consult your users or a product manager on reasonable thresholds to use (there might be considerations here like the type of item, season etc.), or try getting some additional data in order to train a supervised classifier.
E.g. you could ask your users, post-sell, if they think the sell was quick, medium or slow. Then you'll have some data to use for thresholding or for classification.
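A minimal sketch of point 1, assuming you have historical listings with the asking price (plus any other features you track) and the observed number of days until sale; scikit-learn's RandomForestRegressor is just one reasonable choice here, and the numbers are invented:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Toy historical data: price -> days until the item sold. Replace with real listings.
prices = np.array([[10], [15], [20], [25], [30], [35], [40], [50]])
days_to_sell = np.array([3, 5, 8, 12, 18, 25, 33, 45])

X_train, X_test, y_train, y_test = train_test_split(
    prices, days_to_sell, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Predict how long a new listing at a given price would take to sell.
print(model.predict([[28]]))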
I suggest you simply define thresholds of 10 days and 31 days. Keep it simple.
Because these are values the users will want to understand. If you use clustering, you may end up with thresholds like 0.31415 days or similarly non-intuitive values that you cannot explain to the user anyway.
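Mapping the predicted number of days onto those 10- and 31-day thresholds is then a tiny helper, e.g.:

def sell_speed(predicted_days, quick_max=10, medium_max=31):
    # Bucket a predicted time-to-sell into the three user-facing labels.
    if predicted_days <= quick_max:
        return "quick"
    if predicted_days <= medium_max:
        return "medium"
    return "long"

print(sell_speed(7), sell_speed(20), sell_speed(60))  # quick medium long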
I'm attempting to write some code for item-based collaborative filtering for product recommendations. The input has buyers as rows and products as columns, with a simple 0/1 flag to indicate whether or not a buyer has bought an item. The output is a list of similar items for a given purchased item, ranked by cosine similarity.
I am attempting to measure the accuracy of a few different implementations, but I am not sure of the best approach. Most of the literature I find mentions using some form of mean square error, but this really seems more applicable when your collaborative filtering algorithm predicts a rating (e.g. 4 out of 5 stars) instead of recommending which items a user will purchase.
One approach I was considering was as follows...
Split the data into training/holdout sets and train on the training data
For each item (A) in the set, select data from the holdout set where users bought A
Determine what percentage of those A-buyers bought one of the top 3 recommendations generated for A
The above seems kind of arbitrary, but I think it could be useful for comparing two different algorithms when trained on the same data.
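For what it's worth, here is a sketch of that evaluation, assuming a 0/1 buyer-by-product matrix and item-item cosine similarity as described in the question (the helper names are mine):

import numpy as np

def item_similarities(purchases):
    # purchases: buyers x products 0/1 matrix; returns item-item cosine similarity.
    norms = np.linalg.norm(purchases, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for never-bought items
    normalized = purchases / norms
    return normalized.T @ normalized

def hit_rate_at_k(train, holdout, k=3):
    # For each item A, take the top-k most similar items (learned from training data)
    # and check what fraction of holdout buyers of A bought at least one of them.
    sims = item_similarities(train)
    hits, total = 0, 0
    for a in range(train.shape[1]):
        ranked = [i for i in np.argsort(-sims[a]) if i != a][:k]
        for buyer in np.where(holdout[:, a] == 1)[0]:
            total += 1
            if holdout[buyer, ranked].any():
                hits += 1
    return hits / total if total else 0.0

# Toy example: 4 training buyers and 2 holdout buyers over 4 products.
train = np.array([[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1], [1, 0, 0, 1]], dtype=float)
holdout = np.array([[1, 1, 0, 0], [0, 1, 1, 0]], dtype=float)
print(hit_rate_at_k(train, holdout, k=2))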
Actually, your approach is quite similar to the literature, but I think you should consider using recall and precision, as most of the papers do.
http://en.wikipedia.org/wiki/Precision_and_recall
Moreover, if you use Apache Mahout, there is an implementation of recall and precision in the class GenericRecommenderIRStatsEvaluator.
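If you don't want to pull in Mahout just for this, precision and recall at k are easy to compute directly; a minimal sketch:

def precision_recall_at_k(recommended, actually_bought, k=3):
    # recommended: ranked list of item ids from your algorithm.
    # actually_bought: set of item ids the user bought in the holdout period.
    top_k = recommended[:k]
    hits = len(set(top_k) & actually_bought)
    precision = hits / k
    recall = hits / len(actually_bought) if actually_bought else 0.0
    return precision, recall

# Example: 2 of the top-3 recommendations were actually bought.
print(precision_recall_at_k([5, 9, 2, 7], {9, 2, 11}, k=3))  # roughly (0.67, 0.67)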
The best way to test a recommender is always to manually verify the results. However, some kind of automatic verification is also good.
In the spirit of a recommendation system, you should split your data in time and see if your algorithm can predict the user's future purchases. This should be done for all users.
Don't expect it to predict everything; 100% correctness is usually a sign of over-fitting.
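A rough sketch of that time-based split, assuming each purchase record carries a timestamp:

def temporal_split(purchases, cutoff):
    # purchases: list of (user_id, item_id, timestamp). Train on everything before
    # the cutoff; evaluate on what each user bought afterwards.
    train = [p for p in purchases if p[2] < cutoff]
    future = {}
    for user, item, ts in purchases:
        if ts >= cutoff:
            future.setdefault(user, set()).add(item)
    return train, future

purchases = [(1, "a", 10), (1, "b", 20), (2, "a", 15), (1, "c", 30), (2, "d", 35)]
train, future = temporal_split(purchases, cutoff=25)
print(future)  # {1: {'c'}, 2: {'d'}}

You would then train the recommender on `train` and check how many of each user's `future` items appear in their recommendations.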
I'm a CS student doing a report on alternative voting systems. One of the best systems, I believe, is a ranked vote. For example: in a presidential election, each candidate would be ranked 1-5. (IMO the problem with the US system is that only votes for the winner actually count.)
Just wondering if anyone knows the best way to add up the ratings. I have searched around and I know Amazon uses weighted averages. I would think it might make sense to just add up each "star" and have the person with the most win. Maybe someone more mathematically inclined can suggest something better?
One fun thing you can do to a rating system is pass the votes through a low-pass filter. This helps eliminate extremes where one person out of 100 just wants to troll-blam something. It also helps mitigate people who, right after they post something, five-star their own product with self-made accounts.
An average does almost the same thing, but with a low-pass filter you can bias the voting system to make it harder or easier to raise or keep a ranking, which can vary from subject to subject.
A low pass filter can look as simple as:
ranks = [2, 3, 1, 2, 4, 2, 1, 2, 3, 4, 3, 4, 2]

# Seed the filter with the first three raw ratings.
y = [ranks[0], ranks[1], ranks[2]]

for i in range(3, len(ranks)):
    # Weighted blend of the newest rating and the three previous outputs.
    current_rank = 0.2 * ranks[i] + 0.3 * y[2] + 0.2 * y[1] + 0.3 * y[0]
    y.append(current_rank)
    y.pop(0)  # keep only the three most recent outputs
There are other properties to using a filter like this, but finding those cool properties out would just require researching digital low-pass filters :)
If all items have a lot of raters, taking the average should work reasonably well. Problems arise when the data is sparse. For example, assume product A has one 5-star rating, and product B has five 5-star and five 4-star ratings. I would trust product B more, although its arithmetic average is lower (4.5 vs. 5).
The underlying issue is taking uncertainty into account. Intuitively, when there are few ratings we fall back on a prior belief that sits somewhere in an average range. This excellent blog post formalizes this idea and derives a Bayesian approach.
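One common way to formalize this is to shrink each item's average toward a prior mean using pseudo-counts (a simple Bayesian estimate). The prior mean and weight below are assumptions you would tune:

def smoothed_rating(ratings, prior_mean=3.5, prior_weight=10):
    # Shrinks the raw average toward prior_mean; items with few ratings stay
    # close to the prior, items with many ratings dominate it.
    return (prior_mean * prior_weight + sum(ratings)) / (prior_weight + len(ratings))

product_a = [5]                   # one 5-star rating
product_b = [5] * 5 + [4] * 5     # five 5-star and five 4-star ratings

print(smoothed_rating(product_a))  # ~3.64
print(smoothed_rating(product_b))  # 4.0

With this smoothing, product B now ranks above product A, matching the intuition above.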
I'm thinking of writing an app to classify movies in an HTPC based on what the family members like.
I don't know statistics or AI, but the stuff here looks very juicy. I wouldn't know where to start, though.
Here's what I want to accomplish:
Compose a set of samples from each user's likes, rating each sample attribute separately. For example, maybe a user likes western movies a lot, so the western genre would carry a bit more weight for that user (and so on for other attributes, like actors, director, etc.).
A user can get suggestions based on the likes of the other users. For example, if both user A and B like Spielberg (connection between the users), and user B loves Batman Begins, but user A loathes Katie Holmes, weigh the movie for user A accordingly (again, each attribute separately, for example, maybe user A doesn't like action movies so much, so bring the rating down a bit, and since Katie Holmes isn't the main star, don't take that into account as much as the other attributes).
Basically, compare user A's sets to similar sets from user B, and come up with a rating for user A.
I have a crude idea about how to implement this, but I'm certain some bright minds have already thought of a far better solution already, so... any suggestions?
Actually, after some quick research, it seems a Bayesian filter would work. If so, would this be the better approach? Would it be as simple as just "normalizing" the movie data, training a classifier for each user, and then classifying each movie?
If your suggestion includes some brain-melting concepts (I'm not experienced in these subjects, especially in AI), I'd appreciate it if you also included a list of some basics for me to research before diving into the meaty stuff.
Thanks!
Matthew Podwysocki had some interesting articles on this stuff
http://codebetter.com/blogs/matthew.podwysocki/archive/2009/03/30/functional-programming-and-collective-intelligence.aspx
http://codebetter.com/blogs/matthew.podwysocki/archive/2009/04/01/functional-programming-and-collective-intelligence-ii.aspx
http://weblogs.asp.net/podwysocki/archive/2009/04/07/functional-programming-and-collective-intelligence-iii.aspx
This is similar to this question where the OP wanted to build a recommendation system. In a nutshell, we are given a set of training data consisting of users' ratings of movies (a 1-5 star rating, for example) and a set of attributes for each movie (year, genre, actors, ...). We want to build a recommender that outputs a predicted rating for unseen movies. So the input data looks like:
user movie year genre ... | rating
---------------------------------------------
1 1 2006 action | 5
3 2 2008 drama | 3.5
...
and for an unrated movie X:
10 20 2009 drama ?
we want to predict a rating. Doing this for all unseen movies then sorting by predicted movie rating and outputting the top 10 gives you a recommendation system.
The simplest approach is to use a k-nearest neighbor algorithm. Among the rated movies, search for the "closest" ones to movie X, and combine their ratings to produce a prediction.
This approach has the advantage of being very simple and easy to implement from scratch.
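As a sketch of that nearest-neighbor idea for a single user: encode each movie as a feature vector (year, one-hot genre, and so on), find the k rated movies closest to the unseen one, and average their ratings. The toy encoding below is my own, not a prescribed format:

import numpy as np

# Movies the user has already rated: (feature vector, rating).
# Toy features: [year scaled to 0-1, is_action, is_drama]
rated = [
    (np.array([0.6, 1, 0]), 5.0),
    (np.array([0.8, 0, 1]), 3.5),
    (np.array([0.7, 1, 0]), 4.5),
    (np.array([0.3, 0, 1]), 2.0),
]

def predict_rating(movie, rated, k=2):
    # Average the ratings of the k movies closest to `movie` in feature space.
    nearest = sorted(rated, key=lambda fr: np.linalg.norm(fr[0] - movie))[:k]
    return sum(rating for _, rating in nearest) / k

unseen = np.array([0.9, 1, 0])        # a recent action movie
print(predict_rating(unseen, rated))  # 4.75, close to the other action ratings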
Other, more sophisticated approaches exist. For example, you can build a decision tree that fits a set of rules on the training data. You can also use Bayesian networks, artificial neural networks, or support vector machines, among many others... Going through each of these won't be easy for someone without the proper background.
Still, I expect you would be using an external tool/library. Now, you seem to be familiar with Bayesian networks, so a simple naive Bayes net could in fact be very powerful. One advantage is that it allows for prediction under missing data.
The main idea would be somewhat the same; take the input data you have, train a model, then use it to predict the class of new instances.
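As a sketch of that train-then-predict loop with a naive Bayes model (using scikit-learn's CategoricalNB on made-up, ordinally encoded movie attributes, not a real dataset):

import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Toy training data: each movie encoded as [genre_id, decade_id, director_id],
# labelled 1 if the user liked it, 0 otherwise. The encoding is illustrative.
X = np.array([
    [0, 2, 1],   # western, 2000s, director 1
    [0, 1, 1],   # western, 1990s, director 1
    [1, 2, 0],   # drama,   2000s, director 0
    [2, 2, 2],   # action,  2000s, director 2
    [1, 0, 0],   # drama,   1980s, director 0
])
y = np.array([1, 1, 0, 1, 0])

model = CategoricalNB()
model.fit(X, y)

# Probability that this user dislikes/likes an unseen western from the 2000s by director 1.
print(model.predict_proba(np.array([[0, 2, 1]])))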
If you want to play around with different algorithms in a simple, intuitive package that requires no programming, I suggest you take a look at Weka (my first choice), Orange, or RapidMiner. The most difficult part would be preparing the dataset in the required format. The rest is as easy as choosing an algorithm and applying it (all in a few clicks!).
I guess for someone not looking to go into too much detail, I would recommend going with the nearest-neighbor method, as it is intuitive and easy to implement. Still, the option of using Weka (or one of the other tools) is worth looking into.
There are a few algorithms that are good for this:
ARTMAP: groups via probability against each other (this isn't fast, but it's the best thing for your problem IMO)
ARTMAP holds a group of common attributes and determines the likelihood of similarity via percentages.
ARTMAP
KMeans: This separates out the vectors by the distance between them (see the sketch after this list)
KMeans: Wikipedia
PCA: will separate the average of all the values from the varying bits. This is what you would use for face detection and background subtraction in computer vision.
PCA
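For the KMeans entry above, a quick sketch of grouping movies by their feature vectors with scikit-learn (the vectors are invented):

import numpy as np
from sklearn.cluster import KMeans

# Toy movie feature vectors: [year scaled, is_action, is_drama, is_western]
movies = np.array([
    [0.9, 1, 0, 0],
    [0.8, 1, 0, 0],
    [0.3, 0, 1, 0],
    [0.2, 0, 1, 0],
    [0.5, 0, 0, 1],
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(movies)
print(kmeans.labels_)  # which cluster each movie landed in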
The K-nearest neighbor algorithm may be right up your alley.
Check out some of the work of the top teams for the Netflix Prize.