How can I calculate the trending nature of a link? [closed] - algorithm

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
The above image represents an article's page views over time. I'm looking for a decent, not too complex calculation, from either physics or statistics, that would tell me (based on the history of the page views) how the page views have been trending over the past n days (represented by the blue box).
So basically: in the past 5 days, is this link trending unusually higher than it normally does, and if so, by what degree/magnitude?
Ideally the accepted answer would provide an algorithm class that applies to this problem as well as some example of that using the data provided from this chart above.

One approach could be to perform a least squares fit of the points within the blue box. Trends could then be measured by the difference between the points and the value predicted by the least squares fit.

It sounds like you want to compare a short term (5-day) moving average to a longer-term moving average (e.g., something like 90 days).
As a refinement, you might want to do a least-squares linear regression over the longer term, and then compare the shorter term average to the projection you get from that.
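A minimal sketch of that refinement, assuming the page views arrive as a plain list with one count per day. The function names (`linear_fit`, `trend_ratio`) and the sample numbers are made up for illustration, since the chart data itself isn't available here. It fits a least-squares line to the older history, projects that line over the recent window, and reports how the recent average compares to the projection.

```python
from statistics import mean


def linear_fit(ys):
    """Least-squares line y = slope * x + intercept over x = 0..len(ys)-1."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def trend_ratio(views, recent_days=5):
    """Recent average divided by what the long-term line predicted.

    A value near 1.0 means "as expected"; 2.0 means twice the expected level.
    Assumes at least two baseline days and a non-zero projection.
    """
    baseline, recent = views[:-recent_days], views[-recent_days:]
    slope, intercept = linear_fit(baseline)
    n = len(baseline)
    # Extend the baseline line over the days covered by the recent window.
    projected = [slope * x + intercept for x in range(n, n + recent_days)]
    return mean(recent) / mean(projected)


# Hypothetical data: 90 flat days at 100 views, then 5 days at 200 views.
views = [100] * 90 + [200] * 5
print(trend_ratio(views))  # 2.0 for this flat-then-doubled series
```

A ratio is only one way to express the magnitude; dividing the difference by the standard deviation of the baseline residuals would give a z-score style measure instead.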


What's a good data structure to store and work with a thesaurus? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
I've been working for a few years on an English-language thesaurus project, which combines a few sources (e.g. WordNet, Wiktionary Thesaurus, Moby Thesaurus, Word2vec) to make a large thesaurus. Currently I have the data defined as a list of lists. Each link has a score (higher = stronger), so "hotel" and "inn" might have a score of 2.0, but "hotel" and "fleabag" have a score of 0.2. High scores are near synonyms, low scores are more distant associations. I've been able to use Dijkstra and A* to find links between words (so-called "synonym chains").
Is there a type of graph database and/or set of analysis tools that is ideally suited to this sort of data? Word relationship strengths are often asymmetric. For example, "Hoover Dam" links to "Herbert Hoover" more strongly than "Herbert Hoover" links back to "Hoover Dam". I'm interested in better ways to find the links between words, find unrelated words, and measure word similarity.
I'd appreciate any new pointers/direction.
Interesting question. Not sure about the best data structure, but for processing, you can look at shell neighbors within this package:
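Independent of whichever package is used, the data described in the question maps naturally onto a weighted directed graph stored as a dict of dicts, which handles the asymmetric scores directly. Here is a rough sketch of a "synonym chain" search with Dijkstra on such a structure. Treating `1 / score` as the edge cost is my own assumption (it simply makes strong links cheap), and apart from the hotel/inn and hotel/fleabag scores quoted in the question, the numbers are invented.

```python
import heapq

# graph[a][b] is the strength of the link a -> b; graph[b][a] may differ.
graph = {
    "hotel": {"inn": 2.0, "fleabag": 0.2},
    "inn": {"hotel": 2.0, "fleabag": 1.0},        # inn -> fleabag is made up
    "Hoover Dam": {"Herbert Hoover": 2.0},        # made-up asymmetric pair
    "Herbert Hoover": {"Hoover Dam": 0.5},
}


def strongest_chain(graph, start, goal):
    """Dijkstra over edge cost 1/score; returns (total_cost, path) or None."""
    best = {start: 0.0}
    queue = [(0.0, start, [start])]
    while queue:
        cost, word, path = heapq.heappop(queue)
        if word == goal:
            return cost, path
        if cost > best.get(word, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbor, score in graph.get(word, {}).items():
            new_cost = cost + 1.0 / score
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor, path + [neighbor]))
    return None  # no chain exists in this direction


print(strongest_chain(graph, "hotel", "fleabag"))
```

With these scores the chain through "inn" (cost 0.5 + 1.0) beats the weak direct link (cost 5.0). The same layout loads easily into most graph libraries or graph databases as weighted directed edges.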

How does SVM work? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
Is it possible to provide a high-level, but specific explanation of how SVM algorithms work?
By high-level I mean it does not need to dig into the specifics of all the different types of SVM, parameters, none of that. By specific I mean an answer that explains the algebra, versus solely a geometric interpretation.
I understand it will find a decision boundary that separates the data points from your training set into two pre-labeled categories. I also understand it will seek to do so by finding the widest possible gap between the categories and drawing the separation boundary through it. What I would like to know is how it makes that determination. I am not looking for code, rather an explanation of the calculations performed and the logic.
I know it has something to do with orthogonality, but the specific steps are very "fuzzy" everywhere I could find an explanation.
Here's a video that covers one seminal algorithm quite nicely. The big revelations for me are (1) optimize the square of the critical metric, giving us a value that's always positive, so that minimizing the square (still easily differentiable) gives us the optimum; (2) Using a simple, but not-quite-obvious "kernel trick" to make the vector classifications compute easily.
Watch carefully at how unwanted terms disappear, leaving N+1 vectors to define the gap space in N dimensions.
I'll give you a few small details that will help you continue understanding how SVM works.
Keep everything simple: 2 dimensions and linearly separable data. The general idea in SVM is to find a hyperplane that maximizes the margin between the two classes. Each of your data points is a vector from the origin. Once you have a candidate hyperplane, you project each data vector onto the vector that defines the hyperplane (its normal vector) and then check whether the length of the projected vector falls before or after the hyperplane; this is how you assign the two classes.
This is a very simple way of seeing it, and you can then go into more detail by following some papers or videos.
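To make the algebra a bit more concrete, here is a small sketch for the 2D, linearly separable case. The hyperplane is `w . x + b = 0`, a point is classified by the sign of `w . x + b` (the projection onto the normal `w`, shifted by `b`), and the gap between the classes has width `2 / ||w||`, so widening the gap means minimizing `||w||^2`. Note that this is not the classic quadratic-programming solver: as an assumption for brevity it uses plain subgradient descent on the soft-margin (hinge loss) objective, and the data, function names and hyperparameters are all invented.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y * (w . x + b)))."""
    dim, n = len(points[0]), len(points)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # pull towards a small ||w|| (wide gap)
        grad_b = 0.0
        for x, y in zip(points, labels):
            if y * (dot(w, x) + b) < 1:  # point is inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Which side of the hyperplane x falls on."""
    return 1 if dot(w, x) + b >= 0 else -1


def margin_width(w):
    """Distance between the two margin lines w . x + b = +1 and -1."""
    return 2.0 / math.sqrt(dot(w, w))


# Made-up, clearly separable training data.
points = [(2, 2), (3, 3), (3, 1), (-2, -2), (-3, -1), (-1, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print(w, b, margin_width(w))
print([predict(w, b, p) for p in points])
```

Only the points that end up on or inside the margin contribute to the gradient, which is the same idea as the support vectors in the exact formulation.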

What's a good selective pressure to use in tournament selection in a genetic algorithm? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
What is the optimal and usual value of selective pressure in tournament selection? What percent of the best members of the current generation should propagate to the next generation?
Unfortunately, there isn't a great answer to this question. The optimal parameters will vary from problem to problem, and people use a wide range of them. Selecting the right tournament selection parameters is currently more of an art than a science. Stronger selective pressure (a larger tournament) will generally result in the population converging on a solution faster, at the cost of that solution potentially not being as good. This is called the exploration vs. exploitation tradeoff, and it underlies most algorithms for searching a large space of possible solutions - you're not going to get away from it.
I know that's not very helpful, though - you want a starting place, and that's completely reasonable. So here's the best one I know of (and I know a number of others who use it as a go-to default tournament configuration as well): a tournament size of two. Basically, this means you just keep picking random pairs of solutions, choosing the better one of each pair, and sending it to the next generation (with mutation and crossover as desired), until the next generation is the desired size. This has the nice property that any member of the population besides the absolute worst has a chance of getting to the next generation, but better ones have a better chance.
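For reference, a bare-bones sketch of that size-two tournament. The function names are just for illustration, mutation and crossover are left out, and I'm assuming contestants are drawn without replacement (which is what gives the "everyone but the worst has a chance" property when fitness values are distinct). Raising `k` increases the selective pressure.

```python
import random


def tournament_select(population, fitness, k=2, rng=random):
    """Pick k distinct random individuals and return the fittest of them."""
    contestants = rng.sample(population, k)
    return max(contestants, key=fitness)


def next_generation(population, fitness, size=None, k=2, rng=random):
    """Run tournaments until the new generation has the desired size."""
    size = len(population) if size is None else size
    return [tournament_select(population, fitness, k, rng) for _ in range(size)]


# Toy population where an individual's fitness is simply its value.
population = list(range(10))
print(next_generation(population, fitness=lambda ind: ind))
```

With `k` equal to the population size, every tournament returns the single best individual, which is the extreme "all exploitation" end of the tradeoff.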

Algorithm for creating infinite terrain/landscape/surface? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
Does anyone have an algorithm for creating infinite terrain/landscape/surface?
1. The algorithm should start from a random seed.
2. The algorithm should be deterministic (the same seed gives the same result).
3. Other input parameters are allowed as long as 2 is fulfilled.
4. The algorithm may output a 2D map.
5. It is supposed to create only a surface with varying height (mountains), not trees, oceans, etc.
6. I'm looking for an algorithm and not a piece of software.
7. It should be fast.
None of the other related questions here answer this question.
If anything is unclear please let me know!
I would suggest something like Perlin noise. I've used it before for something like what you're describing, and it fits the bill. Check out this Example and you can see the sort of output you would expect from the noise generator. Here is a link to algorithm p-code too.
As others have already said, Perlin noise is a possibility. GPU Gems 3 has a nice chapter about procedural generation using (IIRC, it has been some time since I read this) 3D Perlin noise.
Of course there are other methods too, e.g. might be worth a look.
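To show the general shape of such a generator, here is a sketch using value noise, a simpler relative of Perlin noise (it interpolates random heights at grid points rather than gradients, so it is a stand-in and not Perlin's algorithm itself). Heights are computed on demand from the seed and the coordinates, so the terrain is effectively unbounded and the same seed always gives the same surface. All names are illustrative, and hashing with SHA-256 is chosen for clarity, not speed; a cheap integer hash would be the usual choice where performance matters.

```python
import hashlib
import math


def lattice_value(seed, octave, ix, iy):
    """Deterministic pseudo-random height in [0, 1] for one grid point."""
    digest = hashlib.sha256(f"{seed}:{octave}:{ix}:{iy}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def smoothstep(t):
    """Ease curve so the surface has no visible creases at grid lines."""
    return t * t * (3 - 2 * t)


def value_noise(seed, octave, x, y):
    """Smoothly interpolate the four grid heights surrounding (x, y)."""
    ix, iy = math.floor(x), math.floor(y)
    sx, sy = smoothstep(x - ix), smoothstep(y - iy)
    v00 = lattice_value(seed, octave, ix, iy)
    v10 = lattice_value(seed, octave, ix + 1, iy)
    v01 = lattice_value(seed, octave, ix, iy + 1)
    v11 = lattice_value(seed, octave, ix + 1, iy + 1)
    top = v00 + sx * (v10 - v00)
    bottom = v01 + sx * (v11 - v01)
    return top + sy * (bottom - top)


def height(seed, x, y, octaves=4):
    """Sum several octaves: big smooth hills plus finer, weaker detail."""
    total = norm = 0.0
    amplitude, frequency = 1.0, 1.0
    for octave in range(octaves):
        total += amplitude * value_noise(seed, octave, x * frequency, y * frequency)
        norm += amplitude
        amplitude *= 0.5
        frequency *= 2.0
    return total / norm  # normalized back into [0, 1]


# Sample a small patch of the height map; any (x, y) works, including negatives.
patch = [[height(42, col * 0.1, row * 0.1) for col in range(4)] for row in range(4)]
print(patch)
```

Because each height depends only on `(seed, x, y)`, tiles can be generated in any order, or in parallel, without storing the rest of the map.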

Google similar images algorithm [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
Does anyone have an idea what sort of algorithm Google might be using to find similar images?
No, but they could be using SIFT.
I'm not sure this has much to do with image processing. When I ask for "similar images" of the Eiffel tower, I get a bunch of photos of Paris Hilton, and street maps from Paris. Curiously, all of these images have the word "Paris" in the file name.
Currently the Google Image Search provides these filtering options:
Image size
Face detection
Continuous-tone ("Photo") vs. smooth shading ("Clipart") vs. bitonal ("Line drawing")
Color histogram
These options can be seen in its Image Search Result page.
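As an illustration of how a color histogram can be used to compare two images (this is a generic textbook technique, not a claim about what Google actually does), here is a small sketch using histogram intersection. Images are assumed to be plain lists of `(r, g, b)` tuples with 0-255 channels, and the function names are made up.

```python
def color_histogram(pixels, bins=4):
    """Normalized RGB histogram with bins**3 buckets."""
    hist = [0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each 0-255 channel value to a bucket index in 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[(ri * bins + gi) * bins + bi] += 1
    total = len(pixels)
    return [count / total for count in hist]


def histogram_intersection(h1, h2):
    """1.0 for identical color distributions, 0.0 for no overlap at all."""
    return sum(min(a, b) for a, b in zip(h1, h2))


red, blue = (255, 0, 0), (0, 0, 255)
image_a = [red, red]
image_b = [red, blue]
similarity = histogram_intersection(color_histogram(image_a), color_histogram(image_b))
print(similarity)  # 0.5: half of image_b's pixels share image_a's color
```

A histogram ignores where the colors appear in the image, so it is cheap to compute but easily fooled; feature-based methods such as SIFT address that at a higher cost.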
I don't know about faces, but see at least:
Compare two images the python/linux way
I have heard that one should use this when comparing images
(I mean: make the prob model, calc. the probs, use this):
Or then it might even be one of those PCFG things that MIT people tend to use with robotics stuff. One I read used some sort of PCFG model made of basic shapes (that you can rotate magically) and searched the best match with
