Scholarly reference for IDW variable radius algorithm - algorithm

I'm trying to come up with a scholarly reference for the variant of the Inverse Distance Weighting (IDW) algorithm in which only the closest N points are used. The Wikipedia page for IDW lists it at the bottom under the heading Modified Shepard's Algorithm, but the information there is pretty sketchy.
This algorithm is in common use in the GIS world (see the bottom of this ArcGIS Desktop Help page for a simple description). Does anyone know of a better (preferably authoritative) reference?

The citation in the Wikipedia page is the original paper by Shepard, and is heavily cited. It does not get more authoritative than that. Otherwise, a good book on remote sensing or GIS could be an adequate reference.
The modified Shepard version is attributed to Franke and Nielson in this paper, and is also heavily cited in the literature (you can get an idea of citation counts in Google Scholar).
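For readers who want to see the variant concretely, here is a minimal Python sketch of nearest-N IDW. It illustrates the general idea described above, not Franke and Nielson's exact modified Shepard's method; the function name, parameters, and defaults are all illustrative choices of mine.

```python
import numpy as np

def idw_nearest_n(points, values, query, n=12, power=2.0):
    """Interpolate at `query` using inverse-distance weights over the
    n nearest sample points only (the nearest-N / variable-radius variant).

    A sketch of the general idea, not Franke and Nielson's exact
    modified Shepard's method; names and defaults are illustrative.
    """
    points = np.asarray(points, dtype=float)
    values = np.asarray(values, dtype=float)
    d = np.linalg.norm(points - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(d)[:n]            # keep only the n closest samples
    d, v = d[nearest], values[nearest]
    if np.any(d == 0):                     # query coincides with a sample point
        return v[d == 0][0]
    w = 1.0 / d**power                     # classic inverse-distance weights
    return np.sum(w * v) / np.sum(w)
```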

Related

Efficiently finding all points dominated by a point in 2D

In the OCW Advanced Data Structures course, Prof. E. Demaine mentions a data structure that is able to find all the points dominated by a query point (b2, b3) using O(n) space and O(k) time, provided that a search for point b3 has already been completed, where k is the size of the output.
The solution works by transforming the above problem into a ray stabbing problem and using a technique similar to fractional cascading, as shown in an image in the lecture notes.
While the concept itself is intuitive, implementing the actual data structure is not straightforward at all.
Chazelle describes this in a paper as Filtering Search (p. 712).
I would like to find additional literature or answers that describe and explain this data structure and algorithm (perhaps with pseudocode and more images, with a focus on implementation).
Additionally, I would like to know whether this structure can be implemented in a way that is not "static". That is, I would like to be able to insert and delete points from the structure as efficiently as possible.
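To pin down the semantics before diving into the literature, here is the query in brute-force form. This is only an O(n)-per-query baseline sketch, not the O(k) structure from the lecture, and the less-than-or-equal convention for dominance is an assumption on my part.

```python
def dominated_points(points, query):
    """Return every point dominated by `query`, i.e. with both coordinates
    less than or equal to the query's (the <= convention is my assumption).

    Brute force, O(n) per query -- only a baseline that pins down the
    query semantics; the lecture's structure answers the same query in
    O(k) time after the initial search."""
    qx, qy = query
    return [(x, y) for (x, y) in points if x <= qx and y <= qy]
```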
The book "Computational Geometry: Algorithms and Applications" covers data structures for questions like these. Each chapter has a nice section describing where to learn more, including more complex structures for answering the same problems that are not covered in the book. There are enough diagrams, but not much pseudocode.
Many structures like this can be dynamized using techniques discussed in the book "The Design of Dynamic Data Structures". Jeff Erickson has some nice notes on the topic. Combining this with fractional cascading is discussed in "Cache-Oblivious Streaming B-trees"; see the section about cache-oblivious lookahead arrays.

Is it possible to find the equation of a series using genetic programming?

I have a list of numbers which form a series. I want to find the equation that can regenerate the same series. Is this possible? Also, what would you recommend for programming it (GA, GP, etc.)? Please give an example.
You may take a look at project Eureqa
Eureqa (pronounced "eureka") is a software tool for detecting equations and hidden mathematical relationships in your data. Its goal is to identify the simplest mathematical formulas which could describe the underlying mechanisms that produced the data. Eureqa is free to download and use.
The software is designed to find least squares approximations for series of data. If your series can be exactly described as a function, you'll probably find it. Eureqa uses genetic algorithms, and on the web page there are a few references to papers and articles.
[Screenshot: Eureqa's results, from my machine, for a series generated as 3*x^2 + 4.]
Post Scriptum:
Regrettably the software isn't free anymore :(
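If you want to experiment without Eureqa, below is a minimal mutation-only evolutionary search over expression trees, a stripped-down relative of genetic programming, applied to the 3*x^2 + 4 series mentioned above. Everything here (operator set, population size, mutation rate) is an illustrative choice of mine, not how Eureqa works internally; real GP systems also use crossover, and a given run may or may not recover the exact formula.

```python
import random, operator

OPS = [(operator.add, '+'), (operator.sub, '-'), (operator.mul, '*')]

def random_tree(depth=3):
    """Random expression tree over {+, -, *}, x, and small constants."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(['x', random.randint(0, 5)])
    return (random.choice(OPS), random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if tree == 'x':
        return x
    if isinstance(tree, int):
        return tree
    (fn, _), left, right = tree
    return fn(evaluate(left, x), evaluate(right, x))

def fitness(tree, xs, ys):
    """Sum of squared errors against the target series (lower is better)."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys))

def mutate(tree):
    """Replace a random subtree, or recurse into one child."""
    if random.random() < 0.3 or not isinstance(tree, tuple):
        return random_tree(2)
    op, left, right = tree
    if random.random() < 0.5:
        return (op, mutate(left), right)
    return (op, left, mutate(right))

def show(tree):
    if not isinstance(tree, tuple):
        return str(tree)
    (_, sym), left, right = tree
    return f'({show(left)} {sym} {show(right)})'

# Target series: y = 3*x^2 + 4 on x = 0..9
xs = list(range(10))
ys = [3 * x * x + 4 for x in xs]

pop = [random_tree() for _ in range(200)]
for gen in range(200):
    pop.sort(key=lambda t: fitness(t, xs, ys))
    if fitness(pop[0], xs, ys) == 0:       # exact formula found
        break
    survivors = pop[:50]                   # truncation selection
    pop = survivors + [mutate(random.choice(survivors)) for _ in range(150)]

print(show(pop[0]), 'error =', fitness(pop[0], xs, ys))
```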

Nesting maximum amount of shapes on a surface

In industry, there is often a problem where you need to calculate the most efficient use of material, be it fabric, wood, metal, etc. The starting point is X shapes of given dimensions, made out of polygons and/or curved lines, and the target is another polygon of given dimensions.
I assume many of the current CAM suites implement this, but having no experience using them or knowledge of their internals: what kind of computational algorithm is used to find the most efficient use of space? Can someone point me to a book or other reference that discusses this topic?
After Andrew, in his answer, pointed me in the right direction and named the problem for me, I decided to dump my research results here in a separate answer.
This is indeed a packing problem, and to be more precise, it is a nesting problem. The problem is mathematically NP-hard, and thus the algorithms currently in use are heuristic approaches. There do not seem to be any solutions that solve the problem in linear time, except for trivial problem sets. Solving complex problems takes from minutes to hours with current hardware, if you want to achieve a solution with good material utilization. There are dozens of commercial software solutions that offer nesting of shapes, but I was not able to locate any open source solutions, so there are no real examples where one could see the algorithms actually implemented.
An excellent description of the nesting and strip nesting problem, with historical solutions, can be found in a paper written by Benny Kjær Nielsen of the University of Copenhagen (Nielsen).
The general approach seems to be to mix and match multiple known algorithms in order to find the best nesting solution. These algorithms include (Guided / Iterated) Local Search, Fast Neighborhood Search based on the No-Fit Polygon, and Jostling Heuristics. I found a great paper on this subject with pictures of how the algorithms work, which also has benchmarks of the different software implementations so far. This paper was presented at the International Symposium on Scheduling 2006 by S. Umetani et al. (Umetani).
A relatively new, and possibly the best, approach to date is based on a Hybrid Genetic Algorithm (HGA), a hybrid of simulated annealing and a genetic algorithm, described by Wu Qingming et al. of Wuhan University (Qingming). They implemented it using Visual Studio, a SQL database, and the genetic algorithm optimization toolbox (GAOT) in MATLAB.
You are referring to the well-known computer science domain of packing problems, for which a variety of problems have been defined and researched, for both 2-dimensional and 3-dimensional space.
There is considerable material available on the net for the defined problems, but to find it you kind of have to know the name of the problem to search for.
Some packages might well adopt a heuristic approach (which I suspect they will) and some might go to the lengths of calculating all the possibilities to get the exact answer.
http://en.wikipedia.org/wiki/Packing_problem
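To give a feel for how the heuristics work, here is a toy greedy "bottom-left" packer for axis-aligned rectangles. Real nesting engines handle arbitrary polygons (no-fit polygons, local search, and so on); this sketch only shows the greedy placement idea, and the candidate-position rule is a common simplification of mine, not any particular commercial algorithm.

```python
def pack_bottom_left(rects, bin_width, bin_height):
    """Greedy bottom-left packing for axis-aligned rectangles.

    A toy illustration of the heuristic flavor of nesting algorithms;
    rects that do not fit anywhere are simply skipped.
    rects: list of (w, h); returns list of (x, y, w, h) placements.
    """
    placed = []

    def overlaps(x, y, w, h):
        return any(x < px + pw and px < x + w and
                   y < py + ph and py < y + h
                   for (px, py, pw, ph) in placed)

    # Sort big-first: a common, cheap improvement for greedy packing.
    for w, h in sorted(rects, key=lambda r: r[0] * r[1], reverse=True):
        # Candidate positions: the origin plus corners of placed rectangles.
        candidates = ([(0, 0)]
                      + [(px + pw, py) for (px, py, pw, ph) in placed]
                      + [(px, py + ph) for (px, py, pw, ph) in placed])
        # Bottom-left rule: lowest y, then lowest x, among feasible spots.
        feasible = [(x, y) for (x, y) in candidates
                    if x + w <= bin_width and y + h <= bin_height
                    and not overlaps(x, y, w, h)]
        if feasible:
            x, y = min(feasible, key=lambda p: (p[1], p[0]))
            placed.append((x, y, w, h))
    return placed

# Example: pack a few rectangles into a 10 x 10 sheet
print(pack_bottom_left([(4, 5), (6, 3), (3, 3), (5, 4)], 10, 10))
```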

Pagerank and its mathematics: Explanation needed

I am a student interested in developing a search engine that indexes pages from my country. I have been researching algorithms to use for some time now, and I have identified HITS and PageRank as the best out there. I have decided to go with PageRank since it is more stable than the HITS algorithm (or so I have read).
I have found countless articles and academic papers related to PageRank, but my problem is that I don't understand most of the mathematical symbols that form the algorithm in these papers. Specifically, I don't understand how the Google matrix (the irreducible, stochastic matrix) is calculated.
My understanding is based on these two articles:
http://online.redwoods.cc.ca.us/instruct/darnold/LAPROJ/fall2005/levicob/LinAlgPaperFinal2-Screen.pdf
http://ilpubs.stanford.edu:8090/386/1/1999-31.pdf
Could someone provide a basic explanation (examples would be nice) with fewer mathematical symbols?
Thanks in advance.
The formal definition of PageRank, given on page 4 of the cited document, is expressed in the mathematical equation with the funny "E" symbol (it is in fact the capital Greek letter Sigma; Sigma is the letter "S", which here stands for summation).
In a nutshell this formula says that to calculate the PageRank of page X...
For all the backlinks to this page (=all the pages that link to X)
you need to calculate a value that is
The PageRank of the page that links to X [R'(v)]
divided by
the number of links found on this page. [Nv]
to which you add
some "source of rank", [E(u)] normalized by c
(we'll get to the purpose of that later.)
And you need to make the sum of all these values [The Sigma thing]
and finally, multiply it by a constant [c]
(this constant is just to keep the range of PageRank manageable)
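Written out compactly (using B(X) here for the set of backlinks of X, a notation choice of mine), the walkthrough above corresponds to:

R'(X) = c * Σ_{v ∈ B(X)} R'(v) / N_v + c * E(X)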
The key idea behind this formula is that all web pages that link to a given page X are adding to its "worth". By linking to some page they are "voting" in favor of this page. However, this "vote" has more or less weight, depending on two factors:
The popularity of the page that links to X [R'(v)]
The fact that the page that links to X also links to many other pages or not. [Nv]
These two factors reflect very intuitive ideas:
It's generally better to get a letter of recommendation from a recognized expert in the field than from an unknown person.
Regardless of who gives the recommendation, by also giving recommendations to other people, they are diminishing the value of their recommendation to you.
As you notice, this formula makes use of a somewhat circular reference, because to know the PageRank of X, you need to know the PageRank of all pages linking to X. So how do you figure out these PageRank values? That's where the issue of convergence, explained in the next section of the document, kicks in.
Essentially, by starting with some "random" (or preferably "decent guess") values of PageRank for all pages, and by recalculating the PageRank with the formula above, the newly calculated values get "better" as you iterate this process a few times. The values converge, i.e. each gets closer and closer to the actual/theoretical value. Therefore, by iterating a sufficient number of times, we reach a point where additional iterations would not add any practical precision to the values provided by the last iteration.
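Here is a minimal Python sketch of that iteration, a naive illustration of the convergence process just described, not Google's actual implementation. The damping constant stands in for the "source of rank" term (with the common uniform choice, an assumption here), and the four-page link structure is made up.

```python
import numpy as np

def pagerank(links, damping=0.85, tol=1e-10):
    """Naive PageRank by repeated application of the update formula.
    links[u] is the list of pages that page u links to."""
    n = len(links)
    rank = np.full(n, 1.0 / n)                 # initial uniform guess
    while True:
        new = np.full(n, (1.0 - damping) / n)  # uniform "source of rank"
        for u, outs in enumerate(links):
            if outs:
                for v in outs:                 # u votes for each page it links to
                    new[v] += damping * rank[u] / len(outs)
            else:                              # dangling page: spread rank evenly
                new += damping * rank[u] / n
        if np.abs(new - rank).sum() < tol:     # values have converged
            return new
        rank = new

# Tiny 4-page web: 0 -> 1,2 ; 1 -> 2 ; 2 -> 0 ; 3 -> 2
print(pagerank([[1, 2], [2], [0], [2]]))
```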
Now... that is nice and dandy, in theory. The trick is to convert this algorithm to something equivalent that can be computed more quickly. There are several papers that describe how this, and similar tasks, can be done. I don't have such references off-hand, but will add these later. Beware: they will involve a healthy dose of linear algebra.
EDIT: as promised, here are a few links regarding algorithms to calculate PageRank.
Efficient Computation of PageRank, Haveliwala, 1999
Exploiting the Block Structure of the Web for Computing PageRank, Kamvar et al., 2003
A Fast Two-Stage Algorithm for Computing PageRank, Lee et al., 2002
Although many of the authors of the links provided above are from Stanford, it doesn't take long to realize that the quest for efficient PageRank-like calculation is a hot field of research. I realize this material goes beyond the scope of the OP, but it is important to hint at the fact that the basic algorithm isn't practical for large webs.
To finish with a very accessible text (yet with many links to in-depth info), I'd like to mention Wikipedia's excellent article on PageRank.
If you're serious about this kind of thing, you may consider an introductory/refresher class in math, particularly linear algebra, as well as a computer science class that deals with graphs in general. BTW, great suggestion from Michael Dorfman, in this post, for OCW's videos of the 18.06 lectures.
I hope this helps a bit...
If you are serious about developing an algorithm for a search engine, I'd seriously recommend you take a Linear Algebra course. In the absence of an in-person course, the MIT OCW course by Gilbert Strang is quite good (video lectures at http://ocw.mit.edu/OcwWeb/Mathematics/18-06Spring-2005/VideoLectures/).
A class like this would certainly allow you to understand the mathematical symbols in the document you provided; there's nothing in that paper that wouldn't be covered in a first-year Linear Algebra course.
I know this isn't the answer you are looking for, but it's really the best option for you. Having someone try to explain the individual symbols or algorithms to you when you don't have a good grasp of the basic concepts isn't a very good use of anybody's time.
This is the paper that you need: http://infolab.stanford.edu/~backrub/google.html (If you do not recognise the names of the authors, you will find more information about them here: http://www.google.com/corporate/execs.html).
The symbols used in the document are described in the document itself, in lay English.
Thanks for making me google this.
You might also want to read the introductory tutorial on the mathematics behind the construction of the PageRank matrix written by David Austin, entitled How Google Finds Your Needle in the Web's Haystack; it starts with a simple example and builds to the full definition.
"The $25,000,000,000 Eigenvector: The Linear Algebra Behind Google". from Rose-Hulman is a bit out of date, because now Page Rank is the $491B linear algebra problem. I think the paper is very well written.
"Programming Collective Intelligence" has a nice discussion of Page Rank as well.
Duffymo posted the best reference, in my opinion. I studied the PageRank algorithm in my senior undergrad year. PageRank does the following:
Define the set of current web pages as the states of a finite Markov chain.
Define the probability of transitioning from site u to site v, where u has an outgoing link to v, to be 1/n_u, where n_u is the number of outgoing links from u.
Assume the Markov chain defined above is irreducible (this can be enforced with only a slight degradation of the results).
It can be shown that every finite irreducible Markov chain has a stationary distribution. Define the PageRank to be the stationary distribution, that is to say, the vector that holds the probability of a random particle ending up at each given site as the number of state transitions goes to infinity.
Google uses a slight variation on the power method to find the stationary distribution (the power method finds the dominant eigenvector). Other than that there is nothing to it. It's rather simple and elegant, and probably one of the simplest applications of Markov chains I can think of, but it is worth a lot of money!
So all the PageRank algorithm does is take the topology of the web into account as an indication of whether a website should be important. The more incoming links a site has, the greater the probability of a random particle spending its time at that site over an infinite amount of time.
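The same idea in matrix form, as a sketch of the Markov-chain view above: build the transition matrix, force irreducibility with the usual uniform teleportation trick (the 0.85/0.15 split is the conventional choice, an assumption here), and run the power method for the stationary distribution.

```python
import numpy as np

# Build the column-stochastic transition matrix P:
# P[v, u] = 1/n_u if page u links to page v, else 0.
links = [[1, 2], [2], [0], [2]]        # same tiny 4-page web as before
n = len(links)
P = np.zeros((n, n))
for u, outs in enumerate(links):
    for v in outs:
        P[v, u] = 1.0 / len(outs)

# Uniform teleportation makes the chain irreducible ("Google matrix").
G = 0.85 * P + 0.15 / n

# Power method: pi converges to the dominant eigenvector (eigenvalue 1),
# which is the stationary distribution, i.e. the PageRank vector.
pi = np.full(n, 1.0 / n)
for _ in range(100):
    pi = G @ pi
print(pi / pi.sum())
```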
If you want to learn more about PageRank with less math, then this is a very good tutorial on basic matrix operations. I recommend it for everyone who has little math background but wants to dive into ranking algorithms.

Graph drawing algorithms - I'm trying to render finite state automata

I want to write something that will draw finite state automata. Does anyone know any algorithms that are related to this?
EDIT: I should mention that I know about graphviz. I want to build my own draw program/function, so what I'm looking for is some more theoretical stuff/pseudo-code for algorithms.
Graph drawing is a fairly complex subject because different graphs need to be drawn in different ways; there is no one-algorithm-fits-all approach.
May I suggest the following resource:
http://cs.brown.edu/people/rtamassi/papers/gd-tutorial/gd-constraints.pdf
It should be a good starting point, page 15 provides a number of links and books to follow up.
To get started with graph drawing algorithms, see this famous paper:
"A technique for drawing directed graphs" (1993), by Emden R. Gansner, Eleftherios Koutsofios, Stephen C. North, Kiem-phong Vo, IEEE Transactions on Software Engineering.
It describes the algorithm used by dot, a graphviz drawing program. On the linked page you will find many more references. You will also find some more papers when you google for "drawing directed graphs".
Also, you might find OpenFst convenient, a general toolkit for finite-state machines. It has a binary called fstdraw, which will output a finite-state machine in a format that can be read by dot.
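Since dot is the common denominator here, one cheap way to get an automaton on screen is to emit dot text yourself and let Graphviz do the layout. Below is a small Python sketch; the FSM encoding (a dict keyed by state and input symbol) is just an illustrative choice of mine.

```python
def fsm_to_dot(transitions, start, accepting):
    """Emit a finite-state machine as dot text that Graphviz can render.
    transitions: dict mapping (state, symbol) -> next state."""
    lines = ['digraph fsm {', '  rankdir=LR;']
    # Accepting states get double circles; everything after defaults to circle.
    lines += [f'  "{q}" [shape=doublecircle];' for q in accepting]
    lines.append('  node [shape=circle];')
    lines.append(f'  "" [shape=none]; "" -> "{start}";')  # entry arrow
    for (src, symbol), dst in transitions.items():
        lines.append(f'  "{src}" -> "{dst}" [label="{symbol}"];')
    lines.append('}')
    return '\n'.join(lines)

# Automaton accepting binary strings with an even number of 1s
print(fsm_to_dot({('even', '0'): 'even', ('even', '1'): 'odd',
                  ('odd', '0'): 'odd', ('odd', '1'): 'even'},
                 start='even', accepting=['even']))
```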
Check out Graphviz. It's an open source graph visualization software.
EDIT: Check out the documentation section which links to some of the layout algorithms used.
Maybe I'm a little late in answering this question. Anyway, this is a very comprehensive reference on the different types of graphs and the algorithms to visualize them.
http://www.cs.brown.edu/~rt/gdhandbook/
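If you would rather implement a layout algorithm from scratch, force-directed methods are among the simplest to code. Below is a minimal Fruchterman-Reingold style sketch in Python; note this is a different family from the layered algorithm in the Gansner et al. paper, and all constants (temperature schedule, iteration count) are arbitrary choices of mine.

```python
import math, random

def layout(nodes, edges, width=1.0, height=1.0, iters=200):
    """Force-directed layout: all pairs repel, edges attract, with cooling."""
    k = math.sqrt(width * height / len(nodes))     # ideal edge length
    pos = {v: [random.uniform(0, width), random.uniform(0, height)]
           for v in nodes}
    for it in range(iters):
        disp = {v: [0.0, 0.0] for v in nodes}
        for v in nodes:                            # repulsion between all pairs
            for u in nodes:
                if u == v:
                    continue
                dx = pos[v][0] - pos[u][0]
                dy = pos[v][1] - pos[u][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[v][0] += dx / d * f
                disp[v][1] += dy / d * f
        for u, v in edges:                         # attraction along edges
            dx = pos[v][0] - pos[u][0]
            dy = pos[v][1] - pos[u][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            for p, s in ((u, 1), (v, -1)):         # pull endpoints together
                disp[p][0] += s * dx / d * f
                disp[p][1] += s * dy / d * f
        t = 0.1 * (1 - it / iters)                 # cooling temperature
        for v in nodes:                            # move, capped by temperature
            d = math.hypot(*disp[v]) or 1e-9
            pos[v][0] += disp[v][0] / d * min(d, t)
            pos[v][1] += disp[v][1] / d * min(d, t)
    return pos

# Example: a 3-state automaton's transition graph
print(layout(['q0', 'q1', 'q2'], [('q0', 'q1'), ('q1', 'q2'), ('q2', 'q0')]))
```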
