While reading this interesting book (The Nature of Magnetism), I came across a particular approximation of the hyperbolic tangent. In the first case, T greater than Tc, it is just a Taylor series. In the case T smaller than Tc (where an additional term is added to the argument of the hyperbolic tangent), the approximation is not clear to me. I attached a screenshot of the pages.
Any hints? Thank you very much for your help.
For more context see
I hope someone can help me with this one:
I want to compute the similarity between articles using some of their features (author, category, year, impact factor, citations).
I don't have a clue how to do it for the nominal data. For the numerical features I can use cosine similarity, but how can I do it for the nominal ones?
Thanks in advance, everybody!
While I don't want to recommend this approach, it seems to be very popular:
Encode your categories as binary attributes, i.e.:
A1=Car -> (1,0,0)
A1=Truck -> (0,1,0)
A1=Bike -> (0,0,1)
Then you can continue as you would with text. This is effectively the same as treating them as three different words.
It will work, but IMHO there is just no notion of "correlation" outside of continuous numerical values. Even on text it is more of a hack to make things work than a good approach.
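To make the encoding above concrete, here is a minimal sketch in plain Python. The helper names (one_hot, cosine_similarity) and the CATEGORIES list are made up for illustration; they are not from any particular library.

```python
import math

# Hypothetical category list, taken from the Car/Truck/Bike example above.
CATEGORIES = ["Car", "Truck", "Bike"]

def one_hot(value, categories):
    """Encode a nominal value as a binary vector, one slot per category."""
    return [1.0 if value == c else 0.0 for c in categories]

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

car = one_hot("Car", CATEGORIES)
truck = one_hot("Truck", CATEGORIES)
print(cosine_similarity(car, car))    # identical categories
print(cosine_similarity(car, truck))  # different categories
```

With a single nominal attribute this reduces to "1 if equal, 0 otherwise"; the encoding only becomes interesting once the binary slots are concatenated with the numerical features.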
I have a set of samples (vectors), each with a dimension M of about 10000, and the size of the set N is also about 10000. I want to find the first 10 principal components (those with the largest eigenvalues) of this set. Because of the high dimension of the samples, I cannot calculate the covariance matrix in reasonable time. Are there any methods to select principal components without calculating the full covariance matrix, or methods that handle high-dimensional data efficiently? Such methods should require fewer than O(M*M*N) operations.
NIPALS -- Non-linear iterative partial least squares
see for example here:
Guys, maybe this could help somehow: I have found a solution in the family of EM-PCA methods (see for example this,
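For illustration, here is a rough NumPy sketch of the NIPALS iteration mentioned above. It never forms the M x M covariance matrix; each inner iteration costs two matrix-vector products, i.e. O(N*M). The function name, tolerance and iteration cap are my own choices, and this is a sketch rather than a tuned implementation.

```python
import numpy as np

def nipals_pca(X, n_components, tol=1e-9, max_iter=500):
    """Return the leading principal directions (as rows) via NIPALS."""
    X = np.asarray(X, dtype=float)
    residual = X - X.mean(axis=0)          # center the data once
    components = np.zeros((n_components, residual.shape[1]))
    for k in range(n_components):
        # Start from the column with the largest remaining variance.
        t = residual[:, np.argmax(residual.var(axis=0))].copy()
        for _ in range(max_iter):
            p = residual.T @ t             # loading estimate, O(N*M)
            p /= np.linalg.norm(p)
            t_new = residual @ p           # score estimate, O(N*M)
            converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        components[k] = p
        residual -= np.outer(t, p)         # deflate: remove the found component
    return components
```

Convergence is slow when two eigenvalues are nearly equal, so for real data the iteration cap and tolerance may need adjusting.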
I have implemented the AdaBoost algorithm and am currently trying to implement the so-called cascaded AdaBoost, based on the original paper by P. Viola and M. Jones. Unfortunately, I have some doubts about adjusting the threshold for one stage. In the original paper, the procedure is described in literally one sentence:
Decrease threshold for the ith classifier until the current
cascaded classifier has a detection rate of at least
d × D_{i−1} (this also affects F_i)
I am unsure about three things:
What is the threshold? Is it the value of the expression 0.5 * sum(alpha), or only the factor 0.5?
What should be the initial value of the threshold? (0.5?)
What does "decrease threshold" mean in detail? Do I need to iteratively select a new threshold, e.g. 0.5, 0.4, 0.3? What is the step size?
I have tried to search for this on Google, but unfortunately I could not find any useful information.
Thank you for your help.
I had the exact same doubt and have not found an authoritative source so far. However, this is my best guess:
1. The threshold is 0.5 * sum(alpha).
2. The initial value of the threshold is the one above. Next, classify the samples using the intermediate strong classifier (what you currently have). You get a score for each sample, and depending on the current threshold, some of the positive samples will be classified as negative, etc. So, depending on the detection rate desired for this stage (strong classifier), reduce the threshold until that many positive samples are classified correctly.
Say the threshold was 10, and these are the current classifier outputs for the positive training samples:
9.5, 10.5, 10.2, 5.4, 6.7
and I want a detection rate of 80% => 80% of the above 5 samples classified correctly => 4 of the above => set the threshold to 6.7.
Clearly, changing the threshold also changes the FP rate, so update that, and if the desired FP rate for the stage is not reached, add another classifier to that stage.
I have not taken a formal course on AdaBoost etc., but this is my observation based on some research papers I tried to implement. Please correct me if something is wrong. Thanks!
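The worked example above can be sketched in a few lines of Python. This assumes a sample counts as detected when its score is greater than or equal to the threshold; the function name is made up for illustration.

```python
import math

def threshold_for_detection_rate(positive_scores, target_rate):
    """Largest threshold that still detects at least target_rate of the positives.

    Assumes a positive sample is detected when score >= threshold.
    """
    ranked = sorted(positive_scores, reverse=True)
    # Number of positives that must pass; the small epsilon guards against
    # floating-point noise in target_rate * len(ranked).
    needed = max(1, math.ceil(target_rate * len(ranked) - 1e-9))
    return ranked[needed - 1]

scores = [9.5, 10.5, 10.2, 5.4, 6.7]
print(threshold_for_detection_rate(scores, 0.8))  # 6.7, as in the example above
```

After lowering the threshold this way, the false positive rate has to be re-measured on the negative samples, since it changes as well.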
I have found a Master's thesis on real-time face detection by Karim Ayachi (pdf) in which he describes the Viola-Jones face detection method.
As it is written in Section 5.2 (Creating the Cascade using AdaBoost), we can set the maximal threshold of the strong classifier to sum(alpha) and the minimal threshold to 0 and then find the optimal threshold using binary search (see Table 5.1 for pseudocode).
Hope this helps!
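A rough Python sketch of such a binary search, under my own assumptions: it is not a transcription of the pseudocode in the thesis, a sample counts as detected when score >= threshold, and all scores are non-negative so that a threshold of 0 detects everything. The function names are hypothetical.

```python
def detection_rate(positive_scores, threshold):
    """Fraction of positive samples whose score reaches the threshold."""
    hits = sum(1 for s in positive_scores if s >= threshold)
    return hits / len(positive_scores)

def binary_search_threshold(positive_scores, alpha_sum, target_rate, iterations=60):
    """Search [0, alpha_sum] for the largest threshold meeting target_rate.

    Works because the detection rate can only drop as the threshold rises.
    """
    low, high = 0.0, alpha_sum
    for _ in range(iterations):
        mid = (low + high) / 2.0
        if detection_rate(positive_scores, mid) >= target_rate:
            low = mid    # still enough detections: try a higher threshold
        else:
            high = mid   # too many misses: lower the threshold
    return low
```

The returned value always satisfies the target rate; how close it gets to the exact switching point depends on the number of iterations.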
I have an array of date => value pairs, like this:
"2010-10-12 14:58:36" =>13.4
"2010-10-17 14:58:36" =>12
"2010-10-22 14:58:36" =>17.6
"2010-10-27 14:58:36" =>22
"2010-11-01 14:58:36" =>10
I use these date-value pairs to draw a graph in JavaScript.
Now I would like to mark those dates that are "very special".
My problem (and question) is: which aspects should I consider to find those specific dates?
As a human, I would pick the date "2010-10-17 14:58:36", because "something" must have happened on this date: the value at the next date rises by 5.6 points, which is the biggest step up, followed by one more big step up. On the other hand, the date "2010-10-27 14:58:36" is also a "highlight", because this is
the top of all values and
after this date, there comes the biggest step down.
So as a human, I would choose both dates.
My problem is: what could an algorithm look like?
I tried averaging the values for n dates before and after the current value, which results in an accumulation of those specific dates at the beginning and at the end of the graph.
So I tried to find the biggest percentage step up (relative to the date before), but I'm not sure whether I really find the specific dates I'm looking for.
How would you tackle the problem?
Thank you.
Looks like a financial stock issue :-) You are looking for time series analysis; this is a statistical issue. I'd recommend using the R programming language to play with it (you can do complex statistical things very quickly). There are dozens of specialized packages, surely financial ones too. Once you know what you want, you can implement the solution in any other language.
Just try googling "time series analysis r".
EDIT: note that R is very powerful. I'd bet there is a tool for using R packages from other languages.
If you have information over a timeline, you could use interpolation.
A polynomial interpolation will give you an approximating polynomial that goes through the points.
What's nice about this is that you can then use mathematical analysis, which is easy on polynomials, to find interesting points (large gradients, min-max points etc.).
You also get an approximation of how the function behaves, so you could extrapolate "future" points and see what may happen in the near future.
Of course, looking into the future isn't very accurate, but forms of interpolation are used in analytics to see trends and behaviors.
And of course, it's easy to plot a polynomial, which is always nice.
This is really a question of statistics, of the context of your data, and of what you're looking to highlight. For example, the fact that the data moved by negative 1.4 units between 12/10 and 17/10 may be more useful in some scenarios than a larger positive step change.
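As one possible reading of "step changes", here is a small Python sketch using the data from the question. It marks the date from which the largest rise starts and the date from which the largest fall starts. The function names are made up, and whether these are the "right" highlights depends on the context, as said above.

```python
series = [
    ("2010-10-12 14:58:36", 13.4),
    ("2010-10-17 14:58:36", 12),
    ("2010-10-22 14:58:36", 17.6),
    ("2010-10-27 14:58:36", 22),
    ("2010-11-01 14:58:36", 10),
]

def step_changes(points):
    """Pair each date with the change from its value to the next value."""
    return [
        (points[i][0], points[i + 1][1] - points[i][1])
        for i in range(len(points) - 1)
    ]

def biggest_steps(points):
    """Return (date before the largest rise, date before the largest fall)."""
    steps = step_changes(points)
    up = max(steps, key=lambda s: s[1])
    down = min(steps, key=lambda s: s[1])
    return up[0], down[0]

print(biggest_steps(series))
```

On this data it picks 2010-10-17 and 2010-10-27, the same two dates the question singles out by eye.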
You need sample data on which to build a function that calculates an expected value for any given date, for instance by averaging the values of the day before, the same weekday of the previous week, of the previous month and so on. After that, decide on a threshold: interesting dates are those for which the real value lies outside expected value +- threshold.
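A minimal Python sketch of this idea, with simplifying assumptions of my own: the expected value is just the mean of the previous `window` values (instead of same weekday, previous month, etc.), and the threshold of 6 is picked arbitrarily for the sample data from the question.

```python
def flag_unexpected(points, window=2, threshold=6.0):
    """Return the dates whose value deviates from the expected value by more than threshold.

    The expected value is the mean of the previous `window` values;
    the first `window` points cannot be judged and are skipped.
    """
    flagged = []
    for i in range(window, len(points)):
        previous = [value for _, value in points[i - window:i]]
        expected = sum(previous) / window
        date, actual = points[i]
        if abs(actual - expected) > threshold:
            flagged.append(date)
    return flagged

data = [
    ("2010-10-12 14:58:36", 13.4),
    ("2010-10-17 14:58:36", 12),
    ("2010-10-22 14:58:36", 17.6),
    ("2010-10-27 14:58:36", 22),
    ("2010-11-01 14:58:36", 10),
]
print(flag_unexpected(data))
```

With more history, the expected-value function can be swapped for something seasonal without changing the thresholding step.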
How do you calculate the distance between 2 cities?
If you need to take the curvature of the earth into account, the great-circle distance is what you're looking for. The Wikipedia article probably does a better job of explaining how the formula works than I can, and there's also this aviation formulary page that goes into more detail.
The formulas are only the first part of the puzzle, though. If you need to make this work for arbitrary cities, you'll need a location database to get the lat/long from. Luckily you can get this for free from, although there are commercial DBs available (ask Google). So, in general, look up the two cities you want, get the lat/long co-ordinates and plug them into the formula as in the Wikipedia Worked Example.
Other suggestions:
For a full commercial solution, there's PC Miler, which is used by many trucking companies to calculate shipping rates.
Make calls to the Google Maps (or other) api. If you need to do many requests per day, consider caching the results on the server.
Also very important is to consider building an equivalence database for cities, suburbs, towns etc. if you think you'll ever need to group your data. This gets really complicated though, and you may not find a one-size-fits-all solution for your problem.
Last but not least, Joel wrote an article about this problem a while back, so here you go: New Feature: Job Search
You use the Haversine formula.
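For reference, a minimal Python sketch of the haversine formula. It assumes a spherical Earth with a mean radius of 6371 km; the function name and the example coordinates (approximate values for London and Paris) are my own choices.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the spherical model is an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

# Approximate coordinates, for illustration only.
london = (51.5074, -0.1278)
paris = (48.8566, 2.3522)
print(haversine_km(*london, *paris))  # roughly 344 km
```

This is the straight "as the crow flies" distance over the sphere, not a driving distance.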
This is very easy to do with the geography type in SQL Server 2008.
SELECT geography::Point(lat1, lon1, 4326).STDistance(geography::Point(lat2, lon2, 4326))
-- computes distance in meters using an ellipsoidal model, accurate to the mm
4326 is the SRID for the WGS84 ellipsoidal Earth model
You can use the A* algorithm to find the shortest path between those two cities, and this way you'll have the distance.
If you're talking about the shortest distance between two real cities on a real spherical planet, like Earth, you want the great circle distance.
If you are working in the plane and you want the Euclidean distance "as the crow flies":
// Cities are points x0,y0 and x1,y1 in kilometers or miles or Smoots[1]
dx = x1 - x0;
dy = y1 - y0;
dist = sqrt(dx*dx + dy*y);
No trigonometry needed! Just the Pythagorean theorem and the fact that squares are never negative, so you don't need dx = abs(x1 - x0), etc. to get a non-negative number to pass to sqrt().
Note that you could probably do this in one line, and a compiler would probably reduce it to the equivalent of the code above:
dist = sqrt((x1-x0)*(x1-x0) + (y1-y0)*(y1-y0));
You can get the distance between two cities from the Google Maps API.
Here is an implementation in Python:
import requests
from sys import argv

# The endpoint URL is missing from the original post; set it before running.
API_URL = "..."

def get_distance(origin, destination):
    payload = {"origins": origin, "destinations": destination, "sensor": "false"}
    try:
        # The request line is also missing from the post; a plain GET is assumed here.
        a = requests.get(API_URL, params=payload)
        data = a.json()
        origin = str(data['origin_addresses'][0])
        destination = str(data['destination_addresses'][0])
        distance = data['rows'][0]['elements'][0]['distance']['text']
        return distance, origin, destination
    except Exception:
        print("The %s or %s does not exist :(" % (origin, destination))

if __name__ == "__main__":
    if len(argv) < 3:
        print("sorry Check the format")
    else:
        result = get_distance(argv[1], argv[2])
        if result:
            distance, origin, destination = result
            print("%s ---> %s : %s" % (origin, destination, distance))
Example link:
You find the Lat/Lon of the city, then use a distance estimation algorithm for Lat/Lon coordinates.
If you need a code example, I think I have one I could dig up at home, but as many of the previous answers say, you need a long/lat DB to do the calculation.
It is better to use a look-up table for obtaining the distance between two cities.
This makes sense because
* The formula to calculate the distance is quite computationally intensive.
* Distance between cities is unlikely to change.
So unless your needs are very specific (like terrain mapping from a satellite, some topography algorithm or something else), you should really just save the list of cities and the distances between them in a table and look them up as needed.
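A small Python sketch of such a look-up table, under my own assumptions: the distances are precomputed once with a spherical great-circle formula, the three cities and their approximate coordinates are only illustrative, and all names are hypothetical.

```python
import math
from itertools import combinations

# Approximate (latitude, longitude) in degrees, for illustration only.
CITIES = {
    "London": (51.5074, -0.1278),
    "Paris": (48.8566, 2.3522),
    "Berlin": (52.5200, 13.4050),
}

def great_circle_km(a, b, radius_km=6371.0):
    """Spherical great-circle distance between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))

# Computed once; a frozenset key makes the lookup independent of argument order.
DISTANCE_TABLE = {
    frozenset(pair): great_circle_km(CITIES[pair[0]], CITIES[pair[1]])
    for pair in combinations(CITIES, 2)
}

def lookup_distance(city_a, city_b):
    """Return the stored distance in kilometers between two known cities."""
    if city_a == city_b:
        return 0.0
    return DISTANCE_TABLE[frozenset((city_a, city_b))]
```

The table grows quadratically with the number of cities, so this trades storage for computation.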
I've been doing a lot of work with this recently. I'm finding SQL2008's new features really make this easy. I can find all the points that are within X km in a 100k-record table in sub-second time... not too shabby.
In my testing, the great circle method (spherical assumption) was about 2.5 miles off when compared to the Vincenty formula (ellipsoidal assumption, which is what the earth is).
The real trick is getting the lat and long. For that I'm using Google.
@Jared - a minor correction to your code example. The last line of the first code example should read:
dist = sqrt(dx*dx + dy*dy);
I agree that once you have the info, if it's not going to change, store it somehow. @Marko Tinto Thanks for the T-SQL sample. For those who don't have access to SQL Server or prefer another method: if you need high accuracy, check out Wikipedia's entry on the Vincenty algorithm for more info. I believe there is a JS implementation, which would (if not already) be easily ported to other languages. Also, at the bottom of that page is a link to GeographicLib, which purports to be 1000 times more accurate than the Vincenty algorithm (if you have data that good, it might matter).
Why would you use something like the Vincenty method? Because the earth is not a perfect sphere and methods like that allow for inputting a more accurate major and minor axis for modeling the earth.
I use distancy.
It is simple and clean.