I'm having a real-world problem at work and I was hoping to solve it with Python, but I can't find the right algorithm.
Say I have trash cans of a fixed size, and I need to throw away a number of trash bags, each of a specific, fixed size. What algorithm can I use to throw away all the trash using the minimum number of trash cans?
This sounds like a bin packing (or knapsack) problem. You should know that both are NP-hard, so there is no known polynomial-time algorithm that always finds the optimal solution. However, there are a number of heuristic algorithms that are easy to implement and guarantee "near" optimal solutions.
The most well known are First Fit and Best Fit. Both of these algorithms are guaranteed to pack items into bins such that the number of bins used is not greater than 1.7x the number of bins used in an optimal solution. You can read more about these algorithms here.
A very simple example using First Fit in Python might look like this:
import random

bin_capacity = 1.0
bins = [[]]
bin_stats = [0.0]  # used to keep track of how full each bin is
jobs = [random.uniform(0.0, 1.0) for i in range(100)]

for job in jobs:  # iterate through jobs
    job_assigned = False
    for i in range(len(bin_stats)):
        if job + bin_stats[i] <= bin_capacity:
            # if the job fits into this bin, assign it there
            bins[i].append(job)
            bin_stats[i] += job
            job_assigned = True
            break
    if not job_assigned:
        # if the job doesn't fit into any open bin, open a new bin
        bins.append([job])
        bin_stats.append(job)

for i in range(len(bins)):
    print("Bin {} is {:.2f}% full and contains the following jobs".format(i, bin_stats[i] * 100))
    for job in bins[i]:
        print("\t", job)
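For comparison, a Best Fit version of the same loop might look something like this (just a sketch, re-initialising bins and bin_stats and reusing jobs and bin_capacity from above): instead of placing a job into the first bin with room, it picks the bin that would be left with the least free space.

bins = [[]]
bin_stats = [0.0]
for job in jobs:
    # open bins that still have room for this job
    candidates = [i for i in range(len(bins)) if job + bin_stats[i] <= bin_capacity]
    if candidates:
        # pick the bin that would be left with the least free space
        best = min(candidates, key=lambda i: bin_capacity - (bin_stats[i] + job))
        bins[best].append(job)
        bin_stats[best] += job
    else:
        # no open bin fits the job, so open a new one
        bins.append([job])
        bin_stats.append(job)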
I built a machine learning model to predict a value Y'. For this, I used the log of Y to scale the data.
Once I have the predicted Y' and the actual Y, I have to convert the log values of Y and Y' back with the exponential function.
BUT there is a huge distortion for values above exp(7) (roughly 1100)... It produces a lot of MSE (error).
How can I avoid this huge distortion? (In general, I need to predict values over 1000.)
Thanks!
For this, I used Log value of Y for data scaling.
That is not really scaling; the log transform is used to make the target variable's distribution closer to normal.
If your MSE grows as the real target value grows, it means the model simply cannot fit the large values well enough. Usually this can be solved by cleaning the data (removing outliers), or by switching to another ML model.
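To see why this shows up after the back-transform: a constant error in log space becomes a constant relative error after exp, so the absolute error grows with the size of Y. A quick sketch:

import numpy as np

log_y_true = np.array([2.0, 5.0, 7.0, 9.0])   # ln(Y) targets
log_y_pred = log_y_true + 0.1                 # the same +0.1 error in log space

abs_err = np.exp(log_y_pred) - np.exp(log_y_true)
rel_err = abs_err / np.exp(log_y_true)
print(abs_err)   # grows from about 0.8 to about 850 as Y grows
print(rel_err)   # constant: exp(0.1) - 1, roughly 10.5%, for every sample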
UPDATE
You can run KFold and, for each fold, calculate the MSE/MAE between the predicted and real values. Then take the cases with big errors and look at which parameter/feature values those cases have.
You can eliminate the cases with big errors, but that is usually dangerous.
In general, a bad fit on big values means that you did not remove outliers from your original dataset. Plot histograms and scatter plots and make sure you don't have any.
Check categorical variables: maybe some categories are rare (<= 5% of cases). If so, group them.
Or you may need to create two models: one for small values, one for big ones.
I am looking for a way to express the following logistics/distribution problem as an equation that can be run through a solver to find an optimal solution (or for a known algorithm that fits this problem best):
Assume there is a distribution center that is tasked with distributing different types of soda/drinks to a list of target locations (#ofStores).
Assume the distribution center has a list of trucks (#ofTrucks), each with a different carrying capacity #truckCapacity (let's say in tons), and each rated for the type of products it can distribute (for example, there might be trucks that can only distribute soda cans, while others can distribute both cans and bottles), but only one product type is transported at a time.
Also assume that it takes X days (#travelDays) for a truck to deliver the product (for now assume at least one day to reach the destination, one day for unloading, and the same speed for the return trip).
Each target distribution location (store) has a maximum limit on how much of each item it can store (again assume tons), called maximumInventory, as well as a minimum limit for each type of product, minimumInventory (the supply should not drop below this limit).
Each distribution target (store) also provides a list of expected sales for the next X days, for each product type, salesRates (again, to simplify, assume tons). So given the current stock level, we can estimate the inventory of each product for the upcoming days.
Assume we are the distribution center and we are tasked with scheduling trucks to deliver the products to each destination, looking at the current store inventory for each product, expected consumption/sales, maximum and minimum inventory, the available trucks, the number of days it takes to deliver the product, and any trucks that are already en route for delivery. The goal is to schedule the trucks in such a way that the stock stays within the max/min limits given the expected sales.
Also, to simplify the problem for a first iteration, we could further assume that we only have one product type (let's say we are only tasked with distributing Coke cans) and that all trucks can distribute this product type (we could even assume all trucks carry the same load, for further simplification).
What is the right optimization algorithm to solve this problem and how should the input be specified? Thank you.
The game 2048 has exploded in popularity since its release in February 2014. For a description of the game and discussion of optimal algorithms, see What is the optimal algorithm for the game 2048?. Here is the source code.
A blind algorithm for 2048 is one that cannot see the board; the only feedback the algorithm receives is whether or not an attempted slide occurred (we may suppose a blocked slide produces an audible beep). A blind algorithm is practically useful for getting started in 2048 without having to give the game your undivided attention.
Here is my specific question: is there a blind algorithm for 2048 that consistently does better than a mean score of 3500 in 10^6 trials? (only post an answer you have validated)
This is the performance of the LADDER algorithm, which may be notated as (LD* RD*)* (+U). That is, one loops over "left, down repeatedly until stuck, right, down repeatedly until stuck" and presses up iff left, right, and down are all blocked, which occurs iff the top row(s) are completely empty and the bottom row(s) are completely full. I call this algorithm LADDER because of the letters LDDR, and because I imagine climbing down ladders like Mario in Donkey Kong. The motivation for the algorithm is to maintain an increasing gradient from top to bottom of the board, similar to many of the non-blind algorithms.
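A rough Python sketch of this control loop (just an illustration, assuming a hypothetical callback try_move(direction) that attempts a slide and returns True if any tile moved and False if the slide was blocked, i.e. the beep):

def ladder(try_move):
    # one pass = left, down until stuck, right, down until stuck;
    # press up only when left, right and down were all blocked this pass
    while True:
        moved = try_move('L')
        while try_move('D'):
            moved = True
        if try_move('R'):
            moved = True
        while try_move('D'):
            moved = True
        if not moved:
            if not try_move('U'):
                break  # nothing moves in any direction: game over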
Here is a histogram for 10^6 trials of LADDER colored by top tile on the final board with bin width 32 and mean 3478.1. I generated this data by simulating the game and algorithm in Python, using probability .9 that each new tile is a 2, as in the original game. You can't see the 1024 games at this vertical scale but they are sparsely distributed between 8000 and 16000. The fractal structure relates to the number of occurrences of the top tile, second-from-top tile, and so on. By comparison, random button mashing gave a mean of about 800 in 10^4 trials.
The most important thing in the 2048 game is to concentrate the high numbers along the borders and not in the middle. So a very good strategy is to keep everything along the bottom as long as possible. Your LADDER algorithm does this, but I'd like to concentrate more on the left side and not switch to the right side completely. This is the algorithm in pseudocode:
while (true)
{
    if (down)
        continue;
    else if (left)
        continue;
    else if (right)
        continue;
    else
    {
        up;
        down;   // if forced to go up, go back down immediately
    }
}
Using your convention this would be:
((D*L)*R)U
in words: go down as long as you can; if you cannot, go left; if you cannot go left, go right. You will rarely need to go up.
Since I won't have time anytime soon to implement this and run it 10⁶ times, I hope someone else can provide the correct statistics, but my guess is that it will outperform your LADDER algorithm.
I'm using MATLAB.
I have a three-dimensional array filled with logicals. This array represents data of a cylinder with N uniformly shaped, but arbitrarily oriented, staples in it. The volume is discretized into voxels (3-dimensional pixels); a logical '1' means 'at this point in the cylinder there IS part of a staple', while a '0' means 'at this point in the cylinder is air'.
The following picture contains ONE two dimensional slice of the full volume. Imagine the complete volume composed of such slices. White means '1' and black means '0'.
To my problem now: I have to separate the staples from each other as well as possible.
The output products should be N three dimensional arrays with only the voxels belonging to a certain staple being '1', everything else '0'. So that I have arrays that only contain the data of one staple.
The biggest problem is that '1's of different staples can lie next to each other (touching and entangled), making it difficult to decide which staple they belong to.
One simplification: boundary voxels of a staple may be cut away; I can work with any output array that preserves the approximate shape of the original staple.
Maybe somebody can provide an idea of how such a problem could be solved, or even name algorithms I could take a look at.
Thanks in advance.
Since the staples are many-voxel objects, you can start by reducing noise using 3D median filtering or bwareaopen. Then bwlabeln can be used to label the connected components in the binary array. After that you can use regionprops to further analyze each connected object and see whether it is a single staple or more than one. This can be done using features such as 'Perimeter' to identify the different cases, but you'll have to investigate these and other regionprops features yourself.
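Not MATLAB, but if it helps to see the steps spelled out, here is a rough Python/SciPy sketch of the same pipeline (denoise, label connected components, extract one mask per component). volume is assumed to be the 3-D boolean voxel array; touching staples will still end up in one component, as noted above.

import numpy as np
from scipy import ndimage

def split_components(volume, min_voxels=50):
    # light denoising, roughly analogous to 3-D median filtering / bwareaopen
    cleaned = ndimage.median_filter(volume.astype(np.uint8), size=3).astype(bool)
    labels, n = ndimage.label(cleaned)        # like bwlabeln: one label per connected component
    masks = []
    for lab in range(1, n + 1):
        mask = labels == lab
        if mask.sum() >= min_voxels:          # drop tiny speckle components
            masks.append(mask)
    return masks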
From a series of MIDI notes stored in an array (as MIDI note numbers), is there an algorithm to get the most likely key or scale implied by these notes?
If you're using Python you can use the music21 toolkit to do this:
import music21
score = music21.converter.parse('filename.mid')
key = score.analyze('key')
print(key.tonic.name, key.mode)
If you care about specific algorithms for key finding, you can use them instead of the generic "key":
key1 = score.analyze('Krumhansl')
key2 = score.analyze('AardenEssen')
etc. Any of these methods will work for chords also.
(Disclaimer: music21 is my project, so of course I have a vested interest in promoting it; but you can look at the music21.analysis.discrete module to take ideas from there for other projects/languages. If you have a MIDI parser, the Krumhansl algorithm is not hard to implement).
The algorithm by Carol Krumhansl is the best known. The basic idea is very straightforward. A reference sample of pitches is drawn from music in a known key and transposed to the other 11 keys. Major and minor keys must be handled separately. Then a sample of pitches is drawn from the music in an unknown key. This yields a 12-component pitch vector for each of the 24 reference samples and one for the unknown sample, something like:
[ I,    I#,   II,   II#,  III,  IV,   IV#,  V,    V#,   VI,   VI#,  VII  ]
[ 0.30, 0.02, 0.10, 0.05, 0.25, 0.20, 0.03, 0.30, 0.05, 0.13, 0.10, 0.15 ]
Compute the correlation coefficient between the unknown pitch vector and each reference pitch vector and choose the best match.
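As a sketch of that correlation step (assuming you already have a 12-bin pitch-class histogram for the piece; the profile numbers below are the commonly published Krumhansl-Kessler probe-tone ratings, not values taken from this answer):

import numpy as np

MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

def krumhansl_key(pc_hist):
    """Return (tonic pitch class 0-11, 'major' or 'minor') with the best correlation."""
    best = None
    for tonic in range(12):
        for mode, profile in (('major', MAJOR), ('minor', MINOR)):
            # rotate the C-based reference profile so its tonic lines up with `tonic`
            r = np.corrcoef(pc_hist, np.roll(profile, tonic))[0, 1]
            if best is None or r > best[0]:
                best = (r, tonic, mode)
    return best[1], best[2]

# Example: a histogram of notes drawn mostly from C major
hist = np.array([20, 1, 9, 1, 12, 10, 2, 15, 1, 8, 1, 6], dtype=float)
print(krumhansl_key(hist))   # expected: (0, 'major'), i.e. C major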
Craig Sapp has written (copyrighted) code, available at http://sig.sapp.org/doc/examples/humextra/keycor/
David Temperley and Daniel Sleator developed a different, more difficult algorithm as part of their (copyrighted) Melisma package, available at
http://www.link.cs.cmu.edu/music-analysis/ftp-contents.html
A (free) Matlab version of the Krumhansl algorithm is available from T. Eerola and P. Toiviainen in their Midi Toolbox:
https://www.jyu.fi/hum/laitokset/musiikki/en/research/coe/materials/miditoolbox
There are a number of key-finding algorithms around, in particular those of Carol Krumhansl (most papers I've seen cite Krumhansl's methods).
Assuming no key changes, a simple algorithm could be based on a pitch-class histogram (an array with 12 entries, one for each pitch class, i.e. each note within an octave). Each time you get a note, you add one to the corresponding entry. At the end, the two most frequent notes will very likely be 7 semitones (entries) apart, representing the tonic and the dominant: the tonic is the note you're looking for, and the dominant is 7 semitones above it (or, equivalently, 5 semitones below).
The good thing about this approach is that it's scale-independent: it only relies on the tonic and the dominant being the two most important notes and occurring most often. The algorithm could probably be made more robust by giving extra weight to the first and last notes of large subdivisions of a piece.
As for detecting the scale: once you have the key, you can list the notes that are above a certain threshold in your histogram as offsets from that root note. Say you detect a key of A (from A and E occurring most often) and the notes you have are A C D E G; you would then obtain the offsets 0 3 5 7 10, which, searched for in a database like this one, would give you "Minor Pentatonic" as the scale name.
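A rough sketch of that histogram heuristic (my own illustration, not code from the answer): count pitch classes, look for two peaks a fifth (7 semitones) apart, and report the remaining notes as offsets from the inferred tonic.

from collections import Counter

def guess_key_and_offsets(midi_notes, threshold=0.05):
    counts = Counter(n % 12 for n in midi_notes)
    total = sum(counts.values())
    # the two most common pitch classes are the candidates for tonic and dominant
    (a, _), (b, _) = counts.most_common(2)
    if (b - a) % 12 == 7:        # b sits a fifth above a, so a is the tonic
        tonic = a
    elif (a - b) % 12 == 7:      # a sits a fifth above b, so b is the tonic
        tonic = b
    else:
        tonic = a                # fall back to the single most common note
    offsets = sorted({(pc - tonic) % 12
                      for pc, c in counts.items() if c / total >= threshold})
    return tonic, offsets

# An A minor pentatonic line (A C D E G): expect tonic 9 (= A) and offsets [0, 3, 5, 7, 10]
print(guess_key_and_offsets([69, 57, 64, 60, 62, 64, 67, 69, 64, 57, 62, 67, 69, 64]))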