Best algorithm to fill pages with collages - ruby

I'm working on an algorithm to fill a book with a number of pictures.
Each page holds either a single picture or a collage of 8, 10 or 12 pictures.
There is a constraint on the maximum pages in the book.
I need an efficient algorithm to lay out the pages so that:
All pictures are used
The number of pages is as close to the maximum as possible
I could resolve this with a little recursion but I recently read something about Dynamic Programming and figured this might be a good problem for DP.
Having absolutely zero experience with DP, I did some research, but I didn't find a good example/tutorial that describes a problem like this.
Could someone give me an explanation of how and where to start?
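One minimal way to set this up as a DP, sketched in Ruby (the page capacities come from the question; everything else here is illustrative, not a finished solution): for each picture count, track which page counts are reachable, then take the largest reachable count for the full set of pictures that stays within the limit.

```ruby
require 'set'

# A page holds either a single picture or a collage of 8, 10 or 12 pictures.
# reachable[n] collects every page count that can be built from exactly n
# pictures; the answer is the largest count reachable with all pictures that
# does not exceed max_pages (nil if no exact layout exists).
def best_page_count(num_pictures, max_pages, page_sizes = [1, 8, 10, 12])
  reachable = Array.new(num_pictures + 1) { Set.new }
  reachable[0] << 0
  (1..num_pictures).each do |n|
    page_sizes.each do |size|
      next if size > n
      reachable[n - size].each do |pages|
        reachable[n] << pages + 1 if pages + 1 <= max_pages
      end
    end
  end
  reachable[num_pictures].max
end

# e.g. best_page_count(57, 12)  #=> 12 (seven single pages + five 10-picture collages)
```

From the chosen page count you can recover an actual assignment by walking the table backwards, which is the usual second half of a DP solution.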

Related

Calculating efficient use of window casing (trim)

I'm working on an app that's going to estimate building material for my business. The part I'm working on right now deals specifically with trim that goes around windows.
The best way to explain this is to give an example:
Window trim is purchased in lengths of 14 feet (168 inches). Let say I have 5 rectangular windows of various sizes, all of which consist of 4 pieces of trim each (the top and bottom, and right and left). I'm trying to build an algorithm that will determine the best way to cut these pieces with the least amount of waste.
I've looked into using permutations to calculate every possible outcome and keep track of waste, but the number of permutations was beyond the trillions once I got past 5 windows (20 different pieces of trim).
Does anyone have any insight into how I might do this?
Thanks.
You are looking at a typical case of the cutting stock problem.
I find this lecture from the University of North Carolina (pdf) rather clear. It is oriented towards implementation, with an example worked throughout, and has few prerequisites -- you may just need to look up a few acronyms. But there are also 2 hours of video lectures from the University of Madras on the topic, if you want more detail at a reasonably slow pace.
It relies on solving the knapsack problem several times, which you can grab directly from Rosetta Code if you don't want to go through a second linear optimization problem.
In short, you want to select some ways (how many pieces of each length) in which to cut stock (in your case the window trim), and how many times to use each way.
You start with a trivial set: for each length you need, make a way of cutting with just that size. You then iterate: the knapsack problem gives the least favourable way to cut stock from your current configuration, and the simplex method then "removes" this combination from your set of ways to cut stock, by pivoting.
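A hedged Ruby sketch of just that pricing step (the function and variable names are made up; the Rosetta Code knapsack solutions do the same job): given the current dual values from the simplex step, it finds the single most valuable way to cut one 168-inch length.

```ruby
# lengths:   the required trim lengths (e.g. in inches)
# values:    their current dual prices from the simplex step
# stock_len: the length of one piece of stock (168" here)
# Unbounded knapsack: returns the best value and the cutting pattern
# (length => number of pieces) achieving it for a single stock length.
def best_cutting_pattern(lengths, values, stock_len)
  best   = Array.new(stock_len + 1, 0.0)
  choice = Array.new(stock_len + 1, nil)
  (1..stock_len).each do |cap|
    lengths.each_with_index do |len, i|
      next if len > cap
      candidate = best[cap - len] + values[i]
      if candidate > best[cap]
        best[cap]   = candidate
        choice[cap] = i
      end
    end
  end
  pattern = Hash.new(0)
  cap = stock_len
  while (i = choice[cap])
    pattern[lengths[i]] += 1
    cap -= lengths[i]
  end
  [best[stock_len], pattern]
end
```

If the best pattern's value exceeds 1 (the cost of one new stock length), it is worth adding as a new column; otherwise the current set of cutting patterns is already optimal, which is the usual stopping test in column generation.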
To optimize the casements on windows and bi-fold doors for the company I worked for, I used this simple matrix: I simply took the most common openings and decided what would be the most reasonable and optimal cut lengths.
For example, a 3050 window could be trimmed, with some waste, by using one 8' cut and one 12' cut.

Understanding figures in the algorithm design manual

I want to start learning about algorithms, so I began reading The Algorithm Design Manual by Steven Skiena because it is recommended in some threads I read on SO. However, I just stopped here
and can't understand most of it, because many explanations are represented in figures or images which my screen reader recognizes but can't read.
For example,
"The nearest neighbor rule is very efficient, for it looks at each pair of points
tex2html_wrap_inline23349 //That's how my screen reader read it, I assume it's an image.
at most twice, once when adding
tex2html_wrap_inline23351 //image
to the tour, the other when adding
tex2html_wrap_inline23353 //another image
Against all these positives there is only one problem. This algorithm is completely wrong."
This is really frustrating for me because I'm beginning to enjoy the book, though I can understand why those images help a lot of readers.
So is there a way to understand this material without seeing the figures? Or should I read another book?
Thanks in advance and happy new year everyone.
Considering these algorithms are dealing with geometrical analysis, I am afraid it would be difficult to understand them without the images, and even more difficult to replace these images with an equivalent textual description.

Cut optimisation algorithm

Some friends and I at college were assigned the practical task of developing a net application for optimizing the cutting of rectangular parts from some kind of material. Something like the apps in this list, but more simplistic. Basically, I'm interested in whether there is any source code for this kind of optimization algorithm available on the internet. I'm planning to develop the app using the Adobe Flex framework, and the programming part will be done in ActionScript 3, of course. However, I doubt that there are any optimization samples for this language. There may be some for Java, C++, C#, Ruby or Python and other more popular languages, though (then I'd just have to rewrite it in AS). So, if anyone knows any free libs or algorithm code samples that would suit me, I'd like to hear your suggestions. :)
This sounds just like the stock cutting problem, which is extremely hard! The best solutions use linear programming (typically based on the simplex method) with column generation (which, even after years on a constraint-solving research project, I feel unequipped to explain half decently). In short, you won't want to try this approach in ActionScript; consequently, with whatever you do implement, you shouldn't expect great results on anything other than small problems.
The best advice I can offer, then, is to see if you can cut the source rectangle into strips (each of the width of the largest rectangles you need), then subdivide the remainder of each strip after the "head" rectangle has been removed.
I'd recommend using branch-and-bound as your optimisation strategy. BnB works by doing an exhaustive tree search that keeps track of the best solution seen so far. When you find a solution, update the bound and backtrack, looking for the next solution. Whenever the search takes you down a branch that you know cannot lead to a better solution than the best you have found, you can backtrack early at that point.
Since these search trees will be very large, you will probably want to place a time limit on the search and just return your best effort.
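To make that concrete, here is a bare-bones Ruby skeleton of branch-and-bound for the one-dimensional version of the problem (cutting required pieces from fixed-length bars); the names and the bound are illustrative, and a serious implementation would add stronger bounds and symmetry breaking.

```ruby
# Minimise the number of fixed-length bars needed to cut all required pieces.
# Pieces are assumed to be no longer than one bar.
def min_bars(pieces, stock_len)
  pieces = pieces.sort.reverse          # place big pieces first
  best_count = Float::INFINITY
  best_bars  = nil

  search = lambda do |idx, bars|
    if idx == pieces.length
      if bars.length < best_count       # new incumbent solution
        best_count = bars.length
        best_bars  = bars.map(&:dup)
      end
      return
    end
    # Lower bound: bars already open, plus bars needed for whatever part of
    # the remaining length cannot fit in the free space of the open bars.
    remaining = pieces[idx..-1].sum
    free      = bars.sum { |bar| stock_len - bar.sum }
    extra     = [((remaining - free).to_f / stock_len).ceil, 0].max
    return if bars.length + extra >= best_count   # prune this branch

    piece = pieces[idx]
    bars.each do |bar|                  # try every open bar the piece fits in
      next if bar.sum + piece > stock_len
      bar << piece
      search.call(idx + 1, bars)
      bar.pop
    end
    bars << [piece]                     # or open a new bar
    search.call(idx + 1, bars)
    bars.pop
  end

  search.call(0, [])
  [best_count, best_bars]
end

# e.g. min_bars([60, 60, 60, 48, 48, 36], 168)
```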
Hope this helps.
I had trouble finding examples when I wanted to do the same for the woodworking company I work for. The problem itself is NP-hard, so you need to use an approximation algorithm like first fit or best fit.
Do a search for 2D bin-packing algorithms. In the one I found, you sort the panels biggest to smallest, then add them to the sheets in order, putting each panel in the first bin it will fit. Sorry, I don't have the code with me and it's in VB.NET anyway.
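For illustration, a rough Ruby sketch of that first-fit-decreasing idea using a very simple shelf heuristic (the struct and method names are made up, and it ignores rotation, saw kerf and guillotine constraints, so treat it as a starting point only):

```ruby
# Panels and sheets are in the same units.  Each sheet is filled shelf by
# shelf: a shelf is a horizontal band whose height is set by the first panel
# placed on it.
Panel = Struct.new(:w, :h)

class Sheet
  def initialize(w, h)
    @w, @h = w, h
    @shelves = []          # each shelf: { height:, used_width: }
    @used_height = 0
  end

  # Try to place the panel on an existing shelf, or open a new shelf.
  def try_place(panel)
    shelf = @shelves.find { |s| panel.h <= s[:height] && panel.w <= @w - s[:used_width] }
    if shelf
      shelf[:used_width] += panel.w
      return true
    end
    if panel.h <= @h - @used_height && panel.w <= @w
      @shelves << { height: panel.h, used_width: panel.w }
      @used_height += panel.h
      return true
    end
    false
  end
end

# First fit decreasing: biggest panels first, each goes into the first sheet
# that accepts it, otherwise a new sheet is opened.  Panels are assumed to be
# no larger than a sheet.
def pack(panels, sheet_w, sheet_h)
  sheets = []
  panels.sort_by { |p| -(p.w * p.h) }.each do |panel|
    next if sheets.any? { |s| s.try_place(panel) }
    sheet = Sheet.new(sheet_w, sheet_h)
    sheet.try_place(panel)
    sheets << sheet
  end
  sheets
end
```

Real cut optimisers do considerably better by allowing rotation and guillotine-aware splitting, but this captures the sort-then-first-fit idea described above.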

Pagerank and its mathematics: Explanation needed

I am a student interested in developing a search engine that indexes pages from my country. I have been researching algorithms to use for some time now, and I have identified HITS and PageRank as the best out there. I have decided to go with PageRank since it is more stable than the HITS algorithm (or so I have read).
I have found countless articles and academic papers related to PageRank, but my problem is that I don't understand most of the mathematical symbols that form the algorithm in these papers. Specifically, I don't understand how the Google Matrix (the irreducible, stochastic matrix) is calculated.
My understanding is based on these two articles:
http://online.redwoods.cc.ca.us/instruct/darnold/LAPROJ/fall2005/levicob/LinAlgPaperFinal2-Screen.pdf
http://ilpubs.stanford.edu:8090/386/1/1999-31.pdf
Could someone provide a basic explanation (examples would be nice) with less mathematical symbols?
Thanks in advance.
The formal definition of PageRank, given on page 4 of the cited document, is expressed in the mathematical equation with the funny "E" symbol (it is in fact the capital Greek letter Sigma; Sigma is the Greek "S", which here stands for Summation).
In a nutshell, this formula (written out in full after this breakdown) says that to calculate the PageRank of page X...
For all the backlinks to this page (=all the pages that link to X)
you need to calculate a value that is
The PageRank of the page that links to X [R'(v)]
divided by
the number of links found on this page. [Nv]
to which you add
some "source of rank", [E(u)] normalized by c
(we'll get to the purpose of that later.)
And you need to make the sum of all these values [The Sigma thing]
and finally, multiply it by a constant [c]
(this constant is just to keep the range of PageRank manageable)
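For reference, written out in the paper's own symbols, the rule described above is:

```latex
R'(u) = c \sum_{v \in B_u} \frac{R'(v)}{N_v} + c\,E(u)
```

where B_u is the set of pages linking to u (its backlinks), N_v the number of links on page v, E(u) the "source of rank", and c the normalizing constant.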
The key idea behind this formula is that all web pages that link to a given page X add value to its "worth". By linking to some page they are "voting" in favor of that page. However, this "vote" has more or less weight, depending on two factors:
The popularity of the page that links to X [R'(v)]
The fact that the page that links to X also links to many other pages or not. [Nv]
These two factors reflect very intuitive ideas:
It's generally better to get a letter of recommendation from a recognized expert in the field than from an unknown person.
Regardless of who gives the recommendation, if they also give recommendations to many other people, they diminish the value of their recommendation to you.
As you notice, this formula involves something of a circular reference: to know the PageRank of X, you need to know the PageRank of all pages linking to X. So how do you figure out these PageRank values?... That's where the issue of convergence, explained in the next section of the document, kicks in.
Essentially, by starting with some "random" (or preferably "decent guess") values of PageRank for all pages, and by recalculating the PageRank with the formula above, the calculated values get "better" as you iterate this process a few times. The values converge, i.e. they each get closer and closer to the actual/theoretical value. Therefore, by iterating a sufficient number of times, we reach a point where additional iterations would not add any practical precision to the values provided by the last iteration.
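Purely as an illustration, here is a tiny Ruby version of that iteration. It uses the common damping-factor formulation rather than the paper's E(u) term; the graph format and names are made up, and dangling pages are ignored for simplicity.

```ruby
# links: { "A" => ["B", "C"], ... } -- every page appears as a key, mapped to
# the pages it links to.  Iterate the PageRank formula until it converges.
def pagerank(links, damping: 0.85, tol: 1.0e-8)
  pages = links.keys
  rank  = pages.map { |p| [p, 1.0 / pages.size] }.to_h   # uniform starting guess
  backlinks = Hash.new { |h, k| h[k] = [] }
  links.each { |src, outs| outs.each { |dst| backlinks[dst] << src } }

  loop do
    new_rank = pages.map do |p|
      incoming = backlinks[p].sum { |v| rank[v] / links[v].size }
      [p, (1 - damping) / pages.size + damping * incoming]
    end.to_h
    delta = pages.sum { |p| (new_rank[p] - rank[p]).abs }
    rank  = new_rank
    break if delta < tol                                  # converged
  end
  rank
end

# e.g. pagerank({ "A" => ["B"], "B" => ["A", "C"], "C" => ["A"] })
```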
Now... That is nice and dandy, in theory. The trick is to convert this algorithm into something equivalent that can be computed more quickly. There are several papers that describe how this, and similar tasks, can be done. I don't have such references off-hand, but will add them later. Beware, they will involve a healthy dose of linear algebra.
EDIT: as promised, here are a few links regarding algorithms to calculate page rank.
Efficient Computation of PageRank, Haveliwala 1999
Exploiting the Block Structure of the Web for Computing PageRank, Kamvar et al. 2003
A Fast Two-Stage Algorithm for Computing PageRank, Lee et al. 2002
Although many of the authors of the links provided above are from Stanford, it doesn't take long to realize that the quest for efficient PageRank-like calculation is a hot field of research. I realize this material goes beyond the scope of the OP, but it is important to hint at the fact that the basic algorithm isn't practical for big webs.
To finish with a very accessible text (yet with many links to in-depth info), I'd like to mention Wikipedia's excellent article
If you're serious about this kind of thing, you may consider an introductory/refresher class in maths, particularly linear algebra, as well as a computer science class that deals with graphs in general. BTW, great suggestion from Michael Dorfman, in this post, for OCW's videos of the 18.06 lectures.
I hope this helps a bit...
If you are serious about developing an algorithm for a search engine, I'd seriously recommend you take a Linear Algebra course. In the absence of an in-person course, the MIT OCW course by Gilbert Strang is quite good (video lectures at http://ocw.mit.edu/OcwWeb/Mathematics/18-06Spring-2005/VideoLectures/).
A class like this would certainly allow you to understand the mathematical symbols in the document you provided -- there's nothing in that paper that wouldn't be covered in a first-year Linear Algebra course.
I know this isn't the answer you are looking for, but it's really the best option for you. Having someone try to explain the individual symbols or algorithms to you when you don't have a good grasp of the basic concepts isn't a very good use of anybody's time.
This is the paper that you need: http://infolab.stanford.edu/~backrub/google.html (If you do not recognise the names of the authors, you will find more information about them here: http://www.google.com/corporate/execs.html).
The symbols used in the document are described in the document itself, in lay English.
Thanks for making me google this.
You might also want to read the introductory tutorial on the mathematics behind the construction of the PageRank matrix written by David Austin, entitled How Google Finds Your Needle in the Web's Haystack; it starts with a simple example and builds up to the full definition.
"The $25,000,000,000 Eigenvector: The Linear Algebra Behind Google". from Rose-Hulman is a bit out of date, because now Page Rank is the $491B linear algebra problem. I think the paper is very well written.
"Programming Collective Intelligence" has a nice discussion of Page Rank as well.
Duffymo posted the best reference in my opinion. I studied the PageRank algorithm in my senior undergrad year. PageRank does the following:
Define the set of current webpages as the states of a finite Markov chain.
Define the probability of transitioning from site u to site v, where there is an outgoing link from u to v, to be
1/N_u, where N_u is the number of outgoing links from u.
Assume the Markov chain defined above is irreducible (this can be enforced with only a slight degradation of the results).
It can be shown that every finite irreducible Markov chain has a stationary distribution. Define the PageRank to be that stationary distribution, that is to say the vector that holds the probability of a random particle ending up at each given site as the number of state transitions goes to infinity.
Google uses a slight variation on the power method to find the stationary distribution (the power method finds the dominant eigenvector). Other than that there is nothing to it. It's rather simple and elegant, and probably one of the simplest applications of Markov chains I can think of, but it is worth a lot of money!
So all the PageRank algorithm does is take the topology of the web into account as an indication of how important a website should be. The more incoming links a site has, the greater the probability of a random particle spending its time at the site over an infinite amount of time.
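To tie this back to the "Google matrix" the question asks about, one standard way of writing the construction (with α as the damping factor, and dangling pages patched to link to every page before this step) is:

```latex
S_{vu} =
  \begin{cases}
    1/N_u & \text{if page } u \text{ links to page } v,\\
    0     & \text{otherwise,}
  \end{cases}
\qquad
G = \alpha S + (1 - \alpha)\,\frac{1}{n}\,\mathbf{1}\mathbf{1}^{\mathsf{T}},
\qquad
G\pi = \pi,
```

where N_u is the number of outgoing links of page u, n the total number of pages, and π the stationary distribution, i.e. the PageRank vector; the power method converges to it because G is stochastic with all entries positive.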
If you want to learn more about PageRank with less math, then this is a very good tutorial on basic matrix operations. I recommend it for everyone who has little math background but wants to dive into ranking algorithms.

How does the PageRank algorithm handle links?

We discussed Google's PageRank algorithm in my algorithms class. What we discussed was that the algorithm represents webpages as a graph and puts them in an adjacency matrix, then does some matrix tweaking.
The only thing is that in the algorithm we discussed, if I link to a webpage, that webpage is also considered to link back to me. This seems to make the matrix multiplication simpler. Is this still the way that PageRank works? If so, why doesn't everyone just link to slashdot.com, yahoo.com, and microsoft.com just to boost their page rankings?
If you read the PageRank paper, you will see that links are not bi-directional, at least for the purposes of the PageRank algorithm. Indeed, it would make no sense if you could boost your page's PageRank by linking to a highly valued site.
If you link to a web page, that web page gets its PageRank number increased according to your site's PageRank.
It doesn't work the other way around. Links are not bidirectional. So if you link to Slashdot, you won't get any increase in PageRank; if Slashdot links to you, you will get an increase in PageRank.
It's a mystery beyond what we know about the beginnings of BackRub and the paper that avi linked.
My favorite (personal) theory involves lots and lots of hamsters with wheel revolutions per minute heavily influencing the rank of any particular page. I don't know what they give the hamsters .. probably something much milder than LSD.
See the paper "The 25 Billion dollar eigenvector"
http://www.rose-hulman.edu/~bryan/googleFinalVersionFixed.pdf
