Knapsack With 4 Constraints

I am trying to figure out the logic for a knapsack with four constraints. I want to make a program where you input the calories, fat, carbs, and protein you want to consume in a meal and it looks through a list of possible foods for the closest combination of foods that would meet the input criteria.
Example
I have these items
4oz beef (231 calories, 15g fat, 0g carbs, 22g protein)
1/2 cup oatmeal (260 calories, 2g fat, 58g carbs, 10g protein)
1/2 cup black beans (120 calories, .5g fat, 23g carbs, 7g protein)
1 banana (105 calories, 0g fat, 27g carbs, 1g protein)
1/2 cup cottage cheese (110 calories, 5g fat, 6g carbs, 11g protein)
1/2 cup whole wheat pasta (200 calories, 1g fat, 40g carbs, 8g protein)
and my goal is to consume 745 calories, <= 20g fat, <= 80g carbs, and >= 40g protein.
I've seen a lot of implementations of the knapsack problem, but I've never seen it with 4 constraints. My question is whether it is doable. Can you lead me to the right algorithms for my program? Thank you.

This is basic Linear Programming. You should look at a solver for these types of problems. In the Microsoft world, the Solver Foundation will solve this. You can also play with the Excel Solver to see it done interactively.
There are lots of open-source solutions too. If you prefer to write the code yourself and need a general solution, use array math: you will be solving a system of inequalities. That way you can handle N constraints and Y variables.
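To make that concrete, here is a minimal sketch using the open-source PuLP library (pip install pulp). The nutrient numbers and targets are taken from the question; the integer serving counts, the 0-3 upper bound per food, and the idea of minimising the deviation from the 745-calorie target while keeping the other three limits as hard constraints are assumptions added for illustration.

# Sketch with PuLP: pick integer serving counts that satisfy the fat/carb/protein
# limits and come as close as possible to 745 calories.
import pulp

foods = {
    # name: (calories, fat, carbs, protein) per serving, from the question
    "beef_4oz":       (231, 15,   0,  22),
    "oatmeal_half":   (260,  2,  58,  10),
    "black_beans":    (120,  0.5, 23,   7),
    "banana":         (105,  0,  27,   1),
    "cottage_cheese": (110,  5,   6,  11),
    "ww_pasta":       (200,  1,  40,   8),
}

prob = pulp.LpProblem("meal", pulp.LpMinimize)
x = {f: pulp.LpVariable(f, lowBound=0, upBound=3, cat="Integer") for f in foods}

cal  = pulp.lpSum(v[0] * x[f] for f, v in foods.items())
fat  = pulp.lpSum(v[1] * x[f] for f, v in foods.items())
carb = pulp.lpSum(v[2] * x[f] for f, v in foods.items())
prot = pulp.lpSum(v[3] * x[f] for f, v in foods.items())

# model |calories - 745| with two non-negative slack variables
over  = pulp.LpVariable("over",  lowBound=0)
under = pulp.LpVariable("under", lowBound=0)
prob += over + under                 # objective: calorie deviation
prob += cal - 745 == over - under
prob += fat  <= 20
prob += carb <= 80
prob += prot >= 40

prob.solve()
print("status:", pulp.LpStatus[prob.status])   # "Infeasible" if no combination fits
for f in foods:
    if x[f].value():
        print(f, int(x[f].value()))

If the hard constraints cannot all be met at once, the solver reports infeasibility; you can then relax some of them into penalty terms the same way the calorie deviation is handled.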

Related

Is this a classification problem or regression problem? And what algorithm can be applied to solve it?

Let's say we are selling 3 different flavors of juice (orange, apple, and grape), and customers purchase several bottles of juice for a group of people. For the sake of this question, let's assume they select flavors depending on various input data such as season, weather, temperature, etc. There can be many inputs, but let's limit the inputs to 4 in this example. Here is an example of their purchase history:
Order qty   Input_1 (season)   Input_2 (weather)   Input_3   Input_4    orange   apple   grape
50          summer             sunny               78        adult      20       0       30
30          winter             rainy               35        children   20       10      0
75          spring             cloudy              50        both       30       30      15
What machine learning algorithm can predict how many bottles of each flavor a customer would purchase given the input parameters? Notice that the totals of the 3 flavors must add up to the order quantity, and no number can be less than zero.
It is a regression problem (more specifically, multi-output regression).
You can solve it with deep learning: one-hot encode the categorical features and normalize the numerical values.
X is the input columns (order quantity plus the four inputs) and y is the orange, apple, and grape quantities.
Then you can train a deep neural network and use it to predict.
TensorFlow and PyTorch are good examples of deep learning libraries.
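As a rough sketch (the layer sizes and the fraction-based output head are assumptions added here, not anything stated in the question), one way to keep the three predictions non-negative and make them sum to the order quantity is to predict fractions with a softmax and scale them by the order quantity:

# Hypothetical PyTorch sketch: predict the fraction of each flavor (softmax =>
# non-negative, sums to 1), then multiply by the order quantity so the three
# outputs always add up to the order total.
import torch
import torch.nn as nn

class FlavorModel(nn.Module):
    def __init__(self, n_features: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 32),
            nn.ReLU(),
            nn.Linear(32, 3),               # one logit per flavor
        )

    def forward(self, x: torch.Tensor, order_qty: torch.Tensor) -> torch.Tensor:
        fractions = torch.softmax(self.net(x), dim=-1)
        return fractions * order_qty.unsqueeze(-1)

# x holds the one-hot encoded season/weather columns plus the normalized numeric
# columns; train with e.g. nn.MSELoss() against the observed per-flavor quantities.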

Liftover dog canFam3 to human hg19

Hi, my question is relatively simple. I've converted dog coordinates to human using UCSC LiftOver. These are 200bp intergenic regions that are differentially methylated between normal dogs and dogs with cancer.
I've converted these to human coordinates and found that a lot of them overlap with differentially methylated regions we found in the human model. Is this okay to do?
What stringency settings should I check or modify, e.g. the minimum ratio of bases that need to overlap?
Lastly, is this program taking the 200bp dog sequence and saying that at least 10% of the sequence has to align with a region within the human genome?
I haven't seen a sufficient answer online yet.

Is there any non-heuristic approach to solving these types of problems

I am trying to resolve the following problem.
Given the following:
resources: food, wood, stone, gold
units: peon(requirements: 50 food, town hall, 3 turns to make) which can produce 10 of any resource or 3 building points
warrior(requirements: 30 food, 20 wood, 4 turns to make, barracks)
archer(requirements: 30 wood, 25 gold, archery, 3 turns to make)
buildings: town hall(requirements: 500 food, 500 points, 20 building points) required to produce peons, only one peon can be produced at a time
barracks: 100 wood, 50 stone, 10 building points, required to produce warriors
archery: 200 wood, 30 gold, 12 building points, at least one barracks, required to produce archers
and the following: starting resources, buildings, units and their quantities
final resources, buildings, units and their quantities
output: the minimum required turns to get from starting quantities to final quantities of resources, buildings and units
notes: you start with at least one town hall
what's the point of having multiple town halls: they can produce peons faster
Now, my first approach was to tackle this problem heuristically, by selecting the most expensive resource/building/unit from the final state and determining what I need in order to reach that quantity.
And my question is: is there any non-heuristic approach to solving this problem / these types of problems?
Well, you can simplify the problem with a little analysis...
warriors and archers are given no value in the problem as posed, so you'd never spend resources creating them, and therefore have no need for barracks or archery buildings
You're left with:
units: peon(requirements: 50 food, town hall, 3 turns to make) which can produce 10 of any resource or 3 building points
buildings: town hall(requirements: 500 food, 500 points, 20 building points) required to produce peons, only one peon can be produced at a time
You can then evaluate the number of turns needed if you use only the existing peons and town halls, working away until you have the required resources.
If creating more town halls will improve on that, you obviously would want to start creating the town hall as early as possible. So, speculatively analyse the impact of doing so and compare it to the earlier result. If the building helped, do similar analysis to decide whether to start yet another building as each completes....
You could perform a breadth-first search in the search tree of all possible moves.
This might take a long time, but it is guaranteed to find the optimal solution.
Wikipedia: breadth-first search
An A* search can be much faster, but you need to find a heuristic which never overestimates the cost of the remaining (unknown) part of the solution.
Wikipedia: A* search
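As a rough illustration of the breadth-first idea, here is a skeletal sketch; successors() and goal_reached() are placeholders you would implement from the game rules above (they are not from any library), and states must be hashable, e.g. tuples of quantities.

# Sketch of a breadth-first search over game states.  successors(state) must
# enumerate every legal combination of actions for one turn (assign peons to
# resources or building points, start buildings, queue units, ...).
from collections import deque

def min_turns(start, goal_reached, successors):
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        state, turns = queue.popleft()
        if goal_reached(state):
            return turns                # first hit is the fewest turns (BFS property)
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, turns + 1))
    return None                         # goal unreachable

The branching factor is huge, so in practice you would prune dominated states or switch to A* with an admissible heuristic, as suggested above.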
It seems like the only decision is whether to build extra buildings not needed in the final result. I think it could be done using a cost-benefit approach. E.g. building an extra town hall has a known cost; the benefit depends on how long it is operational.
I think it is OK to use heuristics; general solutions to optimisation problems are NP-hard.

Clustering algorithm to cluster objects based on their relation weight

I have n words and their relatedness weights, which gives me an n*n matrix. I'm going to use this for a search algorithm, but the problem is I need to cluster the entered keywords based on their pairwise relations. So let's say the keywords are {tennis, federer, wimbledon, london, police} and we have the following data from our weight matrix:
           tennis  federer  wimbledon  london  police
tennis     1       0.8      0.6        0.4     0.0
federer    0.8     1        0.65       0.4     0.02
wimbledon  0.6     0.65     1          0.08    0.09
london     0.4     0.4      0.08       1       0.71
police     0.0     0.02     0.09       0.71    1
I need an algorithm to cluster them into 2 clusters: {tennis, federer, wimbledon} and {london, police}. Is there any known clustering algorithm that can deal with such a thing? I did some research; it appears that the k-means algorithm is the most well-known clustering algorithm, but apparently k-means doesn't suit this case.
I would greatly appreciate any help.
You can treat it as a network clustering problem. With a recent version of mcl software (http://micans.org/mcl), you can do this (I've called your example fe.data).
mcxarray -data fe.data -skipr 1 -skipc 1 -write-tab fe.tab -write-data fe.mci -co 0 -tf 'gq(0)' -o fe.cor
# the above computes correlations (put in data file fe.cor) and a network (put in data file fe.mci).
# below proceeds with the network.
mcl fe.mci -I 3 -o - -use-tab fe.tab
# this outputs the clustering you expect. -I is the 'inflation parameter'. The latter affects
# cluster granularity. With the default parameter 2, everything ends up in a single cluster.
Disclaimer: I wrote mcl and a slew of associated network loading/conversion and analysis programs recently rebranded as 'mcl-edge'. They all come together in a single software package. Seeing your example made me curious whether it would be doable with mcl-edge, so I quickly tested it.
Consider DBSCAN. If it suits your needs, you might wish to take a closer look at an optimised version, TI-DBSCAN, which uses the triangle inequality to reduce spatial query cost.
DBSCAN's advantages and disadvantages are discussed on Wikipedia. It splits the input data into a set of clusters whose cardinality isn't known a priori. You'd have to transform your similarity matrix into a distance matrix, for example by taking 1 - similarity as the distance.
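As an illustrative sketch (the eps and min_samples values are guesses you would need to tune, not recommendations), scikit-learn's DBSCAN accepts that 1 - similarity matrix directly with metric='precomputed':

# Sketch: run DBSCAN on the precomputed distance matrix D = 1 - S.
import numpy as np
from sklearn.cluster import DBSCAN

words = ["tennis", "federer", "wimbledon", "london", "police"]
S = np.array([
    [1.0,  0.8,  0.6,  0.4,  0.0 ],
    [0.8,  1.0,  0.65, 0.4,  0.02],
    [0.6,  0.65, 1.0,  0.08, 0.09],
    [0.4,  0.4,  0.08, 1.0,  0.71],
    [0.0,  0.02, 0.09, 0.71, 1.0 ],
])
D = 1.0 - S

labels = DBSCAN(eps=0.45, min_samples=2, metric="precomputed").fit_predict(D)
print(dict(zip(words, labels)))   # groups {tennis, federer, wimbledon} and {london, police}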
Check this book on information retrieval:
http://nlp.stanford.edu/IR-book/html/htmledition/hierarchical-agglomerative-clustering-1.html
It explains very well what you want to do.
Your weights are higher for more similar words and lower for more different words. A clustering algorithm requires similar points/words to be closer spatially and different words to be distant. You should change the matrix M into 1-M and then use any clustering method you want, including k-means.
If you've got a distance matrix, it seems a shame not to try http://en.wikipedia.org/wiki/Single_linkage_clustering. By hand, I think you get the following clustering:
((federer, tennis), wimbledon) (london, police)
The similarity for the link that joins the two main groups (either tennis-london or federer-london) is smaller than any of the similarities that build the two groups: london-police, tennis-federer, and federer-wimbledon: this characteristic is guaranteed by single linkage clustering, since it binds together closest clusters at each stage, and the two main groups are linked by the last binding found.
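If you want to reproduce that by machine, here is a small sketch with SciPy, again using 1 - similarity as the distance (the similarity numbers are copied from the question):

# Sketch: single-linkage clustering on distances 1 - similarity, cut into 2 clusters.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

words = ["tennis", "federer", "wimbledon", "london", "police"]
S = np.array([
    [1.0,  0.8,  0.6,  0.4,  0.0 ],
    [0.8,  1.0,  0.65, 0.4,  0.02],
    [0.6,  0.65, 1.0,  0.08, 0.09],
    [0.4,  0.4,  0.08, 1.0,  0.71],
    [0.0,  0.02, 0.09, 0.71, 1.0 ],
])
D = 1.0 - S                                  # symmetric, zero diagonal

Z = linkage(squareform(D), method="single")  # condensed distance matrix in, tree out
labels = fcluster(Z, t=2, criterion="maxclust")
print(dict(zip(words, labels)))              # {tennis, federer, wimbledon} vs {london, police}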
DBSCAN (see other answers) and successors such as OPTICS are clearly an option.
While the examples are on vector data, all that the algorithms need is a distance function. If you have a similarity matrix, that can trivially be used as distance function.
The example data set is probably a bit too small for them to produce meaningful results. If you have just this little data, any "hierarchical clustering" should be feasible and do the job for you. You then just need to decide on the best number of clusters.

Algorithm to generate numerical concept hierarchy

I have a couple of numerical datasets for which I need to create a concept hierarchy. For now, I have been doing this manually by observing the data (and a corresponding line chart). Based on my intuition, I created some acceptable hierarchies.
This seems like a task that can be automated. Does anyone know if there is an algorithm to generate a concept hierarchy for numerical data?
To give an example, I have the following dataset:
Bangladesh 521
Brazil 8295
Burma 446
China 3259
Congo 2952
Egypt 2162
Ethiopia 333
France 46037
Germany 44729
India 1017
Indonesia 2239
Iran 4600
Italy 38996
Japan 38457
Mexico 10200
Nigeria 1401
Pakistan 1022
Philippines 1845
Russia 11807
South Africa 5685
Thailand 4116
Turkey 10479
UK 43734
US 47440
Vietnam 1042
for which I created the following hierarchy:
LOWEST ( < 1000)
LOW (1000 - 2500)
MEDIUM (2501 - 7500)
HIGH (7501 - 30000)
HIGHEST ( > 30000)
Maybe you need a clustering algorithm?
Quoting from the link:
Cluster analysis or clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. Clustering is a method of unsupervised learning, and a common technique for statistical data analysis used in many fields.
Jenks Natural Breaks is a very efficient single dimension clustering scheme: http://www.spatialanalysisonline.com/OUTPUT/html/Univariateclassificationschemes.html#_Ref116892931
As comments have noted, this is very similar to k-means. However, I've found it even easier to implement, particularly the variation found in Borden Dent's Cartography: http://www.amazon.com/Cartography-Thematic-Borden-D-Dent/dp/0697384950
I think you're looking for something akin to data discretization that's fairly common in AI to convert continuous data (or discrete data with such a large number of classes as to be unwieldy) into discrete classes.
I know Weka uses Fayyad & Irani's MDL method as well as Kononenko's MDL method; I'll see if I can dig up some references.
This is only a 1-dimensional problem, so there may be a dynamic programming solution. Assume that it makes sense to take the points in sorted order and then make n-1 cuts to generate n clusters. Assume that you can write down a penalty function f() for each cluster, such as the variance within the cluster, or the distance between min and max in the cluster. You can then minimise the sum of f() evaluated over the clusters.
Work one point at a time, from left to right. At each point, for each cluster count from 1 up to the target number minus 1, work out the best way to split the points so far into that many clusters, and store the cost of that answer and the location of its rightmost split. You can work this out for point P and cluster count c as follows: consider all possible cuts to the left of P; for each cut, add f() evaluated on the group of points to the right of the cut to the (stored) cost of the best solution for cluster count c-1 at the point just to the left of the cut. Once you have worked your way to the far right, do the same trick once more to work out the best answer for the full cluster count, and use the stored locations of rightmost splits to recover all the splits that give that best answer.
This might actually be more expensive than a k-means variant, but it has the advantage of guaranteeing that it finds a globally best answer (for your chosen f(), under these assumptions).
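A compact sketch of that dynamic program (the within-cluster sum of squared deviations as f() and the function interface are choices made here for illustration):

# Sketch of the DP above: points are sorted, f(i, j) is the penalty of making
# xs[i:j] one cluster, best[c][i] is the minimal total penalty for splitting the
# first i points into c clusters, and cut[c][i] remembers the rightmost split.
def cluster_1d(points, n_clusters):
    xs = sorted(points)
    n = len(xs)

    def f(i, j):                            # penalty of the cluster xs[i:j]
        seg = xs[i:j]
        mean = sum(seg) / len(seg)
        return sum((v - mean) ** 2 for v in seg)

    INF = float("inf")
    best = [[INF] * (n + 1) for _ in range(n_clusters + 1)]
    cut = [[0] * (n + 1) for _ in range(n_clusters + 1)]
    best[0][0] = 0.0
    for c in range(1, n_clusters + 1):
        for i in range(c, n + 1):
            for k in range(c - 1, i):       # the last cluster is xs[k:i]
                cost = best[c - 1][k] + f(k, i)
                if cost < best[c][i]:
                    best[c][i] = cost
                    cut[c][i] = k
    bounds, i = [], n                       # walk the stored cuts back to recover splits
    for c in range(n_clusters, 0, -1):
        bounds.append((cut[c][i], i))
        i = cut[c][i]
    return [xs[a:b] for a, b in reversed(bounds)]

# e.g. cluster_1d(list_of_country_values, 5) (a hypothetical list of the values
# from the question) would produce a 5-level grouping like the one shown above.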
I was wondering about a genetic hierarchical clustering algorithm.
Apparently what you are looking for are clean breaks. So before launching yourself into complicated algorithms, you may perhaps consider a differential approach: sort the values and look at the relative increase from one value to the next.
[1, 1.2, 4, 5, 10]
[20%, 233%, 25%, 100%]
Now depending on the number of breaks we are looking for, it's a matter of selecting them:
2 categories: [1, 1.2] + [4, 5, 10]
3 categories: [1, 1.2] + [4, 5] + [10]
I don't know about you, but it does feel natural in my opinion, and you can even use a threshold approach, saying that a variation of less than x% does not warrant a cut.
For example, here 4 categories do not seem to make much sense.
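A tiny sketch of that idea (the 50% threshold is an arbitrary illustration; the percentages are the relative increases between consecutive sorted values):

# Sketch: sort the values, compute the relative increase between neighbours, and
# start a new category wherever the increase exceeds a threshold.  Assumes the
# values are positive.
def differential_breaks(values, threshold=0.5):
    xs = sorted(values)
    groups, current = [], [xs[0]]
    for prev, cur in zip(xs, xs[1:]):
        if (cur - prev) / prev > threshold:   # big jump -> cut here
            groups.append(current)
            current = []
        current.append(cur)
    groups.append(current)
    return groups

print(differential_breaks([1, 1.2, 4, 5, 10]))   # -> [[1, 1.2], [4, 5], [10]]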

Resources