Cursors for data selection in matplotlib - user-interface

I am trying to get user input from a matplotlib XY plot. The plot contains multiple datasets, and I need the user to select which dataset to use and over what range, so that I can fit a model to the right dataset and range.
I therefore need two indicators that can be "attached" to a specific dataset of the user's choosing, and I need to read both the dataset and the range back from them.
This is along the lines of what commercial plotting packages (Igor Pro, KaleidaGraph, SigmaPlot...) provide as "cursors" or similarly named widgets for controlling their fitting interfaces, which is what I am trying to reproduce.
I have checked various examples using rangeselector and other methods I could find on the web, but none of them seems to provide what I need.
Would anyone have any pointers on where to look or what to start with, please?

You might want to look at this example: http://matplotlib.sourceforge.net/examples/pylab_examples/ginput_manual_clabel.html
The interesting functions are ginput and waitforbuttonpress.
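As a rough sketch of how ginput could be turned into "cursors", the code below snaps two mouse clicks to the nearest data points. The first click decides the dataset and the two snapped indices give the range. The helper names (nearest_dataset, selection_from_clicks, pick_interactively) are made up for this illustration, and distances are measured in plain data units, which is a simplification when the two axes have very different scales.

```python
def nearest_dataset(click, datasets):
    """Return (name, index) of the data point closest to a click.

    click    -- (x, y) tuple in data coordinates, e.g. one item from ginput()
    datasets -- dict mapping a dataset name to a list of (x, y) points
    Distances are measured in data units, which ignores axis scaling.
    """
    cx, cy = click
    best = None
    for name, points in datasets.items():
        for i, (x, y) in enumerate(points):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            if best is None or d2 < best[0]:
                best = (d2, name, i)
    return best[1], best[2]


def selection_from_clicks(clicks, datasets):
    """Turn two clicks into (dataset name, first index, last index).

    The first click decides which dataset is used; the second click is
    snapped to the nearest point of that same dataset.
    """
    name, i0 = nearest_dataset(clicks[0], datasets)
    _, i1 = nearest_dataset(clicks[1], {name: datasets[name]})
    return name, min(i0, i1), max(i0, i1)


def pick_interactively(datasets):
    """Plot all datasets, wait for two mouse clicks, return the selection."""
    import matplotlib.pyplot as plt  # imported here: needs a GUI backend

    fig, ax = plt.subplots()
    for name, points in datasets.items():
        xs, ys = zip(*points)
        ax.plot(xs, ys, "o-", label=name)
    ax.legend()
    clicks = plt.ginput(2, timeout=0)  # blocks until two points are clicked
    plt.close(fig)
    return selection_from_clicks(clicks, datasets)
```

Calling pick_interactively(datasets) opens the plot and returns the dataset name together with the first and last point index of the selected range, which can then be passed to the fitting routine.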

GoogleMaps Autocomplete (Places API)

I am trying to fetch airports using the Places API autocomplete feature.
I look at the types parameter: if the result is an airport I display it, otherwise I show "no airports found".
I want to enhance this app by showing the terminals within each airport that I display on the front end.
I have found Nearby Search within the Places API, but it is difficult to build a search query using keyword and type that returns exact results for all the airports around the world.
Does anyone have any idea what would be the best way to get airports and their terminals using the Places API?

Well, I'm not sure if this is really the result you expect, but here's what I tried:
Get the placeId of an airport using Place Autocomplete.
https://maps.googleapis.com/maps/api/place/autocomplete/json?input=dublin&radius=500&types=airport&key=API_KEY
Then use that placeId to do a Place Details request and get the coordinates of the airport.
https://maps.googleapis.com/maps/api/place/details/json?place_id=ChIJLxmTab4RZ0gRVfMlt7UbElU&key=API_KEY
This also returns an overview of the airport, which in this case reads:
"Airport with 2 runways, a 2nd terminal opened in 2010 plus buses into Dublin & other towns/cities."
Then, with the coordinates in hand, I use them to do a Nearby Search request.
https://maps.googleapis.com/maps/api/place/nearbysearch/json?keyword=terminal&location=53.42644809999999,-6.249909799999999&radius=10000&key=API_KEY
This managed to get the terminals around it. I just threw in some radius, but I guess it should be different for other locations. I also tried this with other airports and it somehow worked.
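The three requests above can be chained roughly as follows. This is only a sketch that builds the URLs and reads the coordinates out of a Place Details response. The function names are made up for this example, the default radius values are just the ones used above, and actually sending the requests (and handling errors or quotas) is left out.

```python
from urllib.parse import urlencode

BASE = "https://maps.googleapis.com/maps/api/place"


def autocomplete_url(text, api_key, radius=500):
    """Step 1: Place Autocomplete request restricted to airports."""
    query = urlencode({"input": text, "radius": radius,
                       "types": "airport", "key": api_key})
    return f"{BASE}/autocomplete/json?{query}"


def details_url(place_id, api_key):
    """Step 2: Place Details request for the place_id found in step 1."""
    query = urlencode({"place_id": place_id, "key": api_key})
    return f"{BASE}/details/json?{query}"


def location_from_details(details):
    """Pull (lat, lng) out of a decoded Place Details JSON response."""
    location = details["result"]["geometry"]["location"]
    return location["lat"], location["lng"]


def nearby_terminals_url(lat, lng, api_key, radius=10000):
    """Step 3: Nearby Search for the keyword 'terminal' around the airport."""
    query = urlencode({"keyword": "terminal", "location": f"{lat},{lng}",
                       "radius": radius, "key": api_key},
                      safe=",")  # keep the comma in "lat,lng" readable
    return f"{BASE}/nearbysearch/json?{query}"
```

The radius in the last step probably needs tuning per airport, since large airports can have terminals several kilometres apart.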
If this doesn't work for your use case, another thing you could do is store the coordinates of known airports (please note that coordinates/placeIDs are the only things we are allowed to store/cache; please see the Specific Terms) and create an object that also stores the coordinates of their corresponding terminals. This would be extensive work if you want to cover airports all around the world.
Hope this helps.

How do I access h2o xgb model input features after saving a model to disk and reloading it?

I'm using h2o's xgboost implementation in Python. I've saved a model to disk and I'm trying to load it later for analysis and prediction. I'm trying to access the list of input features or, even better, the list of features actually used by the model, excluding the ones it decided not to use. The usual advice is to use the varimp function to get the variable importance. While this does drop features that aren't used in the model, it gives you the variable importance of the intermediate features created by one-hot encoding (OHE) the categorical features, not the original categorical feature names.
I've searched for how to do this; so far I've found the following, but no concrete way to do it:
Someone asking something very similar to this and being told the feature has been requested in Jira.
Said Jira ticket, which has been marked resolved, but which I believe says this was implemented but is not customer-visible.
A similar ticket requesting this feature (original categorical feature importance) for variable importance heatmaps, but it is still open.
Someone else who found an unofficial way to access the columns with model._model_json['output']['names'], but that doesn't exclude the features that weren't used by the model, and they are told to use a different method that doesn't work if you have saved the model to disk and reloaded it (which I am doing).
The only option I see is to take the varimp features, split each name on the period character to break up the OHE feature names, keep the first part of each split, and then build a set of the results to get the unique column names. But I'm hoping there's a better way to do this.
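For reference, the workaround described above would look roughly like this. The function name original_columns is made up, and the code assumes that OHE feature names have the form "column.level" and that the original column names contain no period themselves. It keeps the first-seen order instead of using a plain set, which is a small deviation from the description.

```python
def original_columns(varimp_names):
    """Collapse one-hot-encoded feature names back to their source columns.

    varimp_names -- iterable of feature names as reported by varimp,
                    e.g. "color.red" for level "red" of column "color"
    Assumes the original column names contain no period.
    """
    # Keep only the part before the first period of every name.
    bases = (name.split(".", 1)[0] for name in varimp_names)
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys(bases))
```

If varimp gives back rows whose first element is the variable name (which I am assuming here), the input would be something like [row[0] for row in model.varimp()].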

export PMML Model

I have a PMML file generated with SPSS's neural networks.
The model has 20 predictors, a hidden layer, and a binary result.
I can run this model as many times as I want at home, but I really want to export it to the hospital's PC with a simple interface, such as a small window in which to enter the 20 predictors and a field displaying the predicted value.
I'm really new to the field, and I would appreciate any help you can provide.
I have tried googling, but there doesn't seem to be an easy way to get this done.

Nvidia Digits accuracy and loss plots data

I trained my model in Nvidia Digits 5 and I would now like to extract the accuracy and loss plots that were generated during training for a report. Is this data saved somewhere so that it would be possible to extract it, plot it in Python, and perhaps ultimately modify the plots to compare different models?

The best solution I have found is to either look at the HTML file or to scan the text file caffe_output.log that is produced by Caffe. The text file is usually stored in /var/digits/jobs/insert_your_job_id/ but on Linux systems you can also just run:
locate caffe_output.log
Go to your DIGITS job folder and locate your job's subfolder. Inside you'll find a file status.pickle, which is a pickled object containing all your job's information.
You can load it in Python like so:
import pickle
import digits  # must be importable so pickle can rebuild the DIGITS job classes

with open('status.pickle', 'rb') as f:
    data = pickle.load(f)
This object is somewhat generic and may contain multiple tasks. For a typical classification task it will likely be just one, but you will still need to access it via data.tasks[0]. From there you can grab the plots:
data.tasks[0].combined_graph_data()
which returns a somewhat convoluted dict (unfortunately, since your network can produce many accuracy/loss outputs, as well as custom ones). It contains everything you need, though. I managed to plot accuracy with:
import matplotlib.pyplot as plt
plt.plot(data.tasks[0].combined_graph_data()['columns'][2][1:])
but it's likely that you'll have to write a bit of custom code. As always, dir() is your friend.
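As an example of such custom code, here is a small sketch that turns the 'columns' entry into a dict of named series. It assumes, based only on the indexing used above, that every column is a list whose first element is a label and whose remaining elements are the values. The function name columns_to_series is made up, and the actual labels depend on your network.

```python
def columns_to_series(graph_data):
    """Map each series label to its list of values.

    graph_data -- dict with a 'columns' key, where every column is assumed
                  to look like [label, value1, value2, ...]
    """
    series = {}
    for column in graph_data["columns"]:
        label, values = column[0], list(column[1:])
        series[label] = values
    return series
```

With that, columns_to_series(data.tasks[0].combined_graph_data()) gives a dict whose keys can be printed to see which series exist before picking the ones to plot.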

Generating vector data (points) for OpenLayers Cluster

In my web application I am going to use the OpenLayers.Strategy.AnimatedCluster strategy because I need to visualize a large number of point features. Here is a very good example of what it looks like. In both examples on that page, the data (point features) are either generated or taken from a GeoJSON file.
So, can anybody provide me with a file containing 100 000+ (or even better, 500 000+) features (world cities, for instance), or explain how I can generate them so that they are located all over the world (not just in Spain, as in the first example on that page)?

Use a geolocation database to supply the data you need, GeoLite for example.
If 400K+ locations are enough, download their CSV city list.
If you want more, you might want to try the Nominatim downloads, but they are quite bulky (more than 25GB) and parsing the data is not as simple as with a CSV file.
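If synthetic data is good enough for testing the clustering, the points can also be generated. Below is a minimal sketch that writes random points spread over the whole globe as a GeoJSON FeatureCollection. The function name, the "name" property and the output file name are arbitrary choices for this example, and the points are random positions, not real cities.

```python
import json
import math
import random


def random_world_points(n, seed=None):
    """Build a GeoJSON FeatureCollection with n random points on the globe."""
    rng = random.Random(seed)
    features = []
    for i in range(n):
        lon = rng.uniform(-180.0, 180.0)
        # asin of a uniform value spreads points evenly over the sphere
        # instead of crowding them near the poles.
        lat = math.degrees(math.asin(rng.uniform(-1.0, 1.0)))
        features.append({
            "type": "Feature",
            "id": i,
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": f"point-{i}"},
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    # Write 100 000 points to a file that the cluster layer can load.
    with open("random_points.geojson", "w") as f:
        json.dump(random_world_points(100000, seed=42), f)
```

The coordinates are longitude/latitude in degrees, so they may need to be reprojected if the map uses a different projection.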
