I'm using django-rest-framework-gis to load Leaflet maps, and at the top level of my app I'm looking at a map of the world. The basemap is from Mapbox. I make a call to my REST API and return an outline of all of the individual countries that are included in the app. Currently, the GeoJSON file that is returned is 1.1MB in size, and I have more countries to add, so I'd like to reduce the size to improve performance.
Here is an example of the contents:
{"type":"FeatureCollection","features":[{"type":"Feature","geometry":{"type":"MultiPolygon","coordinates":[[[[-64.54916992187498,-54.71621093749998],[-64.43881835937495,-54.739355468749984],[-64.22050781249999,-54.721972656249996],[-64.10532226562495,-54.72167968750003],[-64.054931640625,-54.72988281250001],[-64.03242187499995,-54.74238281249998],[-63.881933593750006,-54.72294921875002],[-63.81542968749997,-54.725097656250014],[-63.83256835937499,-54.76796874999995],[-63.97124023437499,-54.810644531250034],[-64.0283203125,-54.79257812499999],[-64.32290039062497,-54.79648437499999],[-64.45327148437497,-54.84033203124995],[-64.50869140625,-54.83994140624996],[-64.637353515625,-54.90253906250001],
The size of the file is a function of the number of points and the precision of those points. I was thinking that the most expedient way to reduce the size, while preserving my original data, would be to reduce the precision of the geometry points. But I'm at a bit of a loss as to how to do this. I've looked through the documentation on GitHub and haven't found any clues.
Is there a field option to reduce the precision of the GeoJSON returned? Or is there another way to achieve what I'm trying to do?
Many thanks.
I ended up simplifying the geometry using PostGIS and then passing that queryset to the serializer. I started with creating a raw query in the model manager.
class RegionQueryset(models.query.QuerySet):
    def simplified(self):
        return self.raw(
            "SELECT region_code, country_code, name, slug, ST_SimplifyVW(geom, 0.01) as geom FROM regions_region "
            "WHERE active=TRUE AND region_type = 'Country'"
        )


class RegionsManager(models.GeoManager):
    def get_queryset(self):
        return RegionQueryset(self.model, using=self._db)

    def simplified(self):
        return self.get_queryset().simplified()
The view is quite simple:
class CountryApiGeoListView(ListAPIView):
    queryset = Region.objects.simplified()
    serializer_class = CountryGeoSerializer
And the serializer:
class CountryGeoSerializer(GeoFeatureModelSerializer):
    class Meta:
        model = Region
        geo_field = 'geom'
        queryset = Region.objects.filter(active=True)
        fields = ('name', 'slug', 'region_code', 'geom')
I ended up settling on the PostGIS function ST_SimplifyVW() after running some tests.
My dataset has 20 countries with geometry provided by Natural Earth. Without optimizing, the geojson file was 1.2MB in size, the query took 17ms to run and 1.15 seconds to load in my browser. Of course, the quality of the rendered outline was great. I then tried the ST_Simplify() and ST_SimplifyVW() functions with different parameters. From these very rough tests, I decided on ST_SimplifyVW(geom, 0.01)
| Function | Size | Query time | Load time | Appearance |
|---|---|---|---|---|
| None | 1.2MB | 17ms | 1.15s | Great |
| ST_Simplify(geom, 0.1) | 240KB | 15.94ms | 371ms | Barely acceptable |
| ST_Simplify(geom, 0.01) | 935KB | 22.45ms | 840ms | Good |
| ST_SimplifyVW(geom, 0.01) | 409KB | 25.92ms | 628ms | Good |
My setup was Postgres 9.4 and PostGIS 2.2. ST_SimplifyVW is not included in PostGIS 2.1, so you must use 2.2.
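For intuition about what the tolerance in ST_SimplifyVW controls, here is a minimal pure-Python sketch of the Visvalingam-Whyatt idea: repeatedly drop the interior vertex whose triangle with its two neighbours has the smallest area, until every remaining triangle is at least as large as the tolerance. The function names and the sample line are made up for illustration, and this is not how PostGIS implements it (PostGIS is far more efficient and handles polygons and rings), so treat it only as a sketch.

```python
def triangle_area(a, b, c):
    """Area of the triangle formed by three (x, y) points."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def simplify_vw(points, min_area):
    """Naive Visvalingam-Whyatt: drop the least significant vertex until
    every interior vertex spans a triangle of at least min_area."""
    pts = list(points)
    while len(pts) > 2:
        # Effective area of each interior vertex (the endpoints are always kept).
        areas = [
            triangle_area(pts[i - 1], pts[i], pts[i + 1])
            for i in range(1, len(pts) - 1)
        ]
        smallest = min(areas)
        if smallest >= min_area:
            break
        # areas[0] belongs to pts[1], hence the +1 offset.
        del pts[areas.index(smallest) + 1]
    return pts


# Hypothetical line: the second vertex is a tiny wiggle, the fourth a real peak.
line = [(0, 0), (1, 0.01), (2, 0), (3, 2), (4, 0)]
print(simplify_vw(line, 0.1))
```

With this input the tiny wiggle is removed while the peak survives, which is roughly why the Visvalingam-Whyatt output can still look good at a much smaller size.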
You could save some space by setting the precision with GeometryField during serialization. This is an extract of my code to model the same WorldBorder model defined in geodjango GIS tutorial. For serializers.py:
from rest_framework_gis.serializers import (
    GeoFeatureModelSerializer, GeometryField)

from .models import WorldBorder


class WorldBorderSerializer(GeoFeatureModelSerializer):
    # set a custom precision for the geometry field
    mpoly = GeometryField(precision=2, remove_duplicates=True)

    class Meta:
        model = WorldBorder
        geo_field = "mpoly"
        fields = (
            "id", "name", "area", "pop2005", "fips", "iso2", "iso3",
            "un", "region", "subregion", "lon", "lat",
        )
Explicitly defining the precision with mpoly = GeometryField(precision=2) will do the trick. Setting remove_duplicates=True removes the identical points generated by truncating the numbers. You need to keep the geo_field reference to your geometry field in the Meta class, or the REST framework will not work. This is my views.py code to see the GeoJSON object using a ViewSet:
from rest_framework import viewsets, permissions

from .models import WorldBorder
from .serializers import WorldBorderSerializer


class WorldBorderViewSet(viewsets.ModelViewSet):
    queryset = WorldBorder.objects.all()
    serializer_class = WorldBorderSerializer
    permission_classes = (permissions.IsAuthenticatedOrReadOnly, )
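To get a feel for how much lower precision and duplicate removal can save, here is a small standalone sketch that works on plain GeoJSON coordinate lists. It does not use the library's GeometryField; the helper names are hypothetical, the second sample point is invented so that a duplicate appears, and it uses rounding, which may differ slightly from what the library does internally.

```python
import json


def round_coords(coords, precision):
    """Recursively round every number in a GeoJSON coordinates array."""
    if isinstance(coords, (int, float)):
        return round(coords, precision)
    return [round_coords(c, precision) for c in coords]


def drop_consecutive_duplicates(ring):
    """Remove points that are equal to the point immediately before them."""
    cleaned = []
    for point in ring:
        if not cleaned or cleaned[-1] != point:
            cleaned.append(point)
    return cleaned


# The first and last points come from the question; the middle one is made up.
ring = [
    [-64.54916992187498, -54.71621093749998],
    [-64.54881835937495, -54.719355468749984],
    [-64.22050781249999, -54.721972656249996],
]

rounded = drop_consecutive_duplicates(round_coords(ring, 2))
print(rounded)
print(len(json.dumps(ring)), "characters before,", len(json.dumps(rounded)), "after")
```

Two decimal places of a degree is roughly a kilometre at the equator, which is usually fine for a world-level country outline but too coarse for detailed views.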
However the most effective improvement in saving space is to simplify geometries as described by geoAndrew. Here I calculate on the fly the geometry simplification using serializers:
from rest_framework_gis.serializers import (
    GeoFeatureModelSerializer, GeometrySerializerMethodField)

from .models import WorldBorder


class WorldBorderSerializer(GeoFeatureModelSerializer):
    # in order to simplify polygons on the fly
    simplified_mpoly = GeometrySerializerMethodField()

    def get_simplified_mpoly(self, obj):
        # Returns a new GEOSGeometry, simplified to the specified tolerance
        # using the Douglas-Peucker algorithm. A higher tolerance value implies
        # fewer points in the output. If no tolerance is provided, it
        # defaults to 0.
        return obj.mpoly.simplify(tolerance=0.01, preserve_topology=True)

    class Meta:
        model = WorldBorder
        geo_field = "simplified_mpoly"
        fields = (
            "id", "name", "area", "pop2005", "fips", "iso2", "iso3",
            "un", "region", "subregion", "lon", "lat",
        )
The two solutions are different and can't be merged (see how rest_framework_gis.fields is implemented). Simplifying the geometry is probably the better way to preserve quality and save space. Hope it helps!
My goal is to create a DRF model in H2O with the TRAIN, VALIDATION and TEST datasets I have and predict the RMSE, R2, MSE etc on the TEST model.
Below is the piece of code:
DRFParameters rfParms = (DRFParameters) algParameter;
rfParms._response_column = trainDataFrame._names[responseColumn(trainDataFrame)]; //The response column
rfParms._train = trainDataFrame._key;
//rfParms._valid = testDataFrame._key;
rfParms._nfolds = 5;
DRF job = new DRF(rfParms);
DRFModel drf = job.trainModel().get(); // Train the model
Frame pred = drf.score(testDataFrame); //Score the test
Here I don't know how to proceed with finding the metrics (R2, RMSE, MSE, MAE etc.) after scoring.
Could you please help in H2O DRF modeling and predictions calculation using JAVA?
Depending on whether your model is a regression, binomial or multinomial model you'll have to use one of ModelMetricsRegression.make(), ModelMetricsBinomial.make() or ModelMetricsMultinomial.make(). They have slightly different signatures - you can find them in our Java docs.
For the trainDataFrame you can get them from your drf model, it's in drf._output._training_metrics (you might need to cast it to an appropriate type as this one is a generic ModelMetrics). If you use your test dataset as a validation frame you can get the metrics from drf._output._validation_metrics.
#Edit:
DRFModel drf = job.trainModel().get(); // Train the model
Frame preds = drf.score(testDataFrame); //Score the test
// Compare the predictions with the actual labels of the frame that was scored
ModelMetricsBinomial mm = ModelMetricsBinomial.make(preds.vec(2), testDataFrame.vec(rfParms._response_column));
double auc = mm.auc();
double rmse = mm.rmse();
double r2 = mm.r2();
// etc.
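For reference, the regression metrics asked about are simple functions of the actual and predicted values. The sketch below is not H2O code and is written in Python only for brevity; the function name and the sample numbers are invented. It can serve as a cross-check for the values reported by the model metrics object.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MSE, RMSE, MAE and R2 from two equally long lists of numbers."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    # Total sum of squares: spread of the actual values around their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    # R2 = 1 - (residual sum of squares / total sum of squares).
    r2 = 1.0 - (mse * n) / ss_tot
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


# Made-up values: only the last prediction is off, by exactly 1.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(metrics)
```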
Is it better to use nested relationships or PrimaryKeyRelated field if you have lots of data?
I have a model with deep relationships.
For simplicity I did not add the columns.
Model:
Usecase:
User creates 1 Workoutplan with 2 Workouts and 3 WorkoutExercises.
User creates 6 Sets for each WorkoutExercise/Exercise.
User starts workout > new FinishedWorkout is created
User does first exercise and enters the used weights > new FinishedWorkoutExercise with FinishedSet is created
Question:
I want to track the progression for each workoutplan > workout > exercise.
So over time the user may have finished dozens of workouts, and therefore hundreds of sets are already in the database.
If I now use nested Relationships I may load a lot of data I don't need.
But if I use PrimaryKeyRelatedFields I have to load all the data I need separately which means more effort in my frontend.
Which method is preferred in such a situation?
Edit:
If I use PrimaryKeyRelatedFields how do I distinguish if e.g. Workouts in Workoutplan is an array with primary keys or an array with the loaded objects?
If you use PrimaryKeyRelatedField, the frontend will have to make many extra requests to fetch the data it needs.
In your case, I would create specific serializers that expose only the fields you want (using the Meta.fields attribute). That way you won't load unnecessary data, and the frontend won't need to request more data from the backend.
I can write a sample code, if you need more details.
I'll get to the question regarding serializers in a second, but first of all and for clarification. What is the purpose of having duplicate models as Workout/Finished Workout, Set/Finished Set,...?
Why not...
class Workout(models.Model):
    #...stuff...
    finished = models.DateTimeField(null=True, blank=True)
    #...more stuff...
Then you can just set a finished date on a workout when it's done.
Now, regarding the question. I would suggest you think about user interactions. What parts of the front-end are you trying to populate? How is the data related and how would the user access it?
You should think about what parameters you're querying DRF with. You can send a date and expect workouts finished on a specific day:
// This example is done in Angular, but you get the point...
var date = {
    'day': '24',
    'month': '10',
    'year': '2015'
};
API.finishedWorkout.query(date).$promise
    .then(function(workouts){
        //...workouts is an array of workout objects...
    });
Viewset...
import datetime

from rest_framework import mixins, response, viewsets


class FinishedWorkoutViewset(mixins.ListModelMixin, viewsets.GenericViewSet):
    serializer_class = FinishedWorkoutSerializer
    queryset = Workout.objects.all()

    def list(self, request):
        # Query-string values arrive as strings, so convert them to integers.
        day = int(request.query_params['day'])
        month = int(request.query_params['month'])
        year = int(request.query_params['year'])
        queryset = self.filter_queryset(
            self.get_queryset().filter(
                finished__date=datetime.date(year, month, day),
                user=request.user,
            )
        )
        serializer = self.get_serializer(queryset, many=True)
        return response.Response(serializer.data)
And then your FinishedWorkoutSerializer can just have whatever fields you want for that specific type of query.
This leaves you with a bunch of very specific URLs, which isn't all that great, but you can use specific serializers for those interactions, and you're also free to change the filter dynamically depending on which parameters are sent with the request.
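Since the day, month and year come from the client, it may be worth validating them before building the filter. Here is a minimal standalone sketch of such a helper; the name parse_date_params is hypothetical, and in a view you would pass it the request's query parameters and return a 400 response when it yields None.

```python
import datetime


def parse_date_params(params):
    """Build a date from the 'day', 'month' and 'year' string values.

    Returns None when a value is missing, not numeric, or not a real date.
    """
    try:
        return datetime.date(
            int(params["year"]), int(params["month"]), int(params["day"])
        )
    except (KeyError, TypeError, ValueError):
        return None


print(parse_date_params({"day": "24", "month": "10", "year": "2015"}))
print(parse_date_params({"day": "31", "month": "02", "year": "2015"}))
```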
There is also a chance that you may want to filter differently depending what method is being called, say you want to list only active exercises, but if a user queries a specific exercise, you want him to have access to it (note that the Exercise object should have a models.BooleanField attribute called "active").
class ExerciseViewset(viewsets.GenericViewSet, mixins.RetrieveModelMixin, mixins.ListModelMixin):
    serializer_class = ExerciseSerializer
    queryset = Exercise.objects.all()

    def list(self, request):
        queryset = self.filter_queryset(self.get_queryset().filter(active=True))
        page = self.paginate_queryset(queryset)
        serializer = self.get_serializer(queryset, many=True)
        return response.Response(serializer.data)
Now you have different objects show up on the same URL, depending on the action. It's a bit closer to what you need, but you're still using the same serializer, so if you need a huge nested object on retrieve(), you're also gonna get a bunch of them when you list().
In order to keep lists short and details nested, you need to use different serializers.
Let's say you only want to send an exercise's pk and name attributes when exercises are listed, but whenever a single exercise is queried, you want to send along all related "Set" objects ordered inside an array of "WorkoutSets"...
# Taken from an SO answer on an old question...
class MultiSerializerViewSet(viewsets.GenericViewSet):
    serializers = {
        'default': None,
    }

    def get_serializer_class(self):
        return self.serializers.get(self.action, self.serializers['default'])


class ExerciseViewset(MultiSerializerViewSet, mixins.RetrieveModelMixin, mixins.ListModelMixin):
    queryset = Exercise.objects.all()
    serializers = {
        'default': SimpleExerciseSerializer,
        'retrieve': DetailedExerciseSerializer
    }
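The per-action lookup in get_serializer_class() is just a dictionary lookup with a fallback, so it can be tried outside of DRF. In the sketch below the class names are hypothetical and plain strings stand in for the serializer classes; in a real viewset, DRF sets self.action for you.

```python
class MultiSerializerLookup:
    """Stand-in for the viewset: maps an action name to a serializer."""
    serializers = {"default": None}
    action = None  # DRF sets this to 'list', 'retrieve', ... on a real viewset

    def get_serializer_class(self):
        # Fall back to the 'default' entry when the action has no own entry.
        return self.serializers.get(self.action, self.serializers["default"])


class ExerciseLookup(MultiSerializerLookup):
    # Strings are used here instead of real serializer classes.
    serializers = {
        "default": "SimpleExerciseSerializer",
        "retrieve": "DetailedExerciseSerializer",
    }


view = ExerciseLookup()
view.action = "list"
listed = view.get_serializer_class()
view.action = "retrieve"
retrieved = view.get_serializer_class()
print(listed, retrieved)
```

Any action without its own entry, such as list here, quietly gets the default serializer, which keeps list responses small.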
Then your serializers.py could look a bit like...
#------------------Exercise
#--------------------------Simple List
class SimpleExerciseSerializer(serializers.ModelSerializer):
    class Meta:
        model = Exercise
        fields = ('pk', 'name')


#--------------------------Detailed Retrieve
class ExerciseWorkoutExerciseSetSerializer(serializers.ModelSerializer):
    class Meta:
        model = Set
        fields = ('pk', 'name', 'description')


class ExerciseWorkoutExerciseSerializer(serializers.ModelSerializer):
    set_set = ExerciseWorkoutExerciseSetSerializer(many=True)

    class Meta:
        model = WorkoutExercise
        fields = ('pk', 'set_set')


class DetailedExerciseSerializer(serializers.ModelSerializer):
    # Django's default reverse accessor is the lower-cased model name plus "_set"
    workoutexercise_set = ExerciseWorkoutExerciseSerializer(many=True)

    class Meta:
        model = Exercise
        fields = ('pk', 'name', 'workoutexercise_set')
I'm just throwing around use cases and attributes that probably make no sense in your model, but I hope this is helpful.
P.S.; Check out how Java I got in the end there :p "ExcerciseServiceExcersiceBeanWorkoutFactoryFactoryFactory"
Does anybody know whether it is possible to save a trained Spark Naive Bayes classifier model (for example to a text file) and load it again later if required?
Thank You.
I tried saving and loading the model. I was not able to recreate the model from the stored weights (I couldn't find the proper constructor), but the whole model is serializable, so you can store and load it as follows:
store as :
val fos = new FileOutputStream(<storage path>)
val oos = new ObjectOutputStream(fos)
oos.writeObject(model)
oos.close
and load it in:
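val fis = new FileInputStream(<storage path>)
val ois = new ObjectInputStream(fis)
// Cast to the class of the model that was saved; for a Naive Bayes model that would be NaiveBayesModel
val newModel = ois.readObject().asInstanceOf[org.apache.spark.mllib.classification.LogisticRegressionModel]
ois.close()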
It worked for me
it is discussed in this thread :
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-save-mllib-model-to-hdfs-and-reload-it-td11953.html
You can use the built-in functions (Spark version 2.1.0). Use NaiveBayesModel#save to store the model and NaiveBayesModel#load to read a previously stored model.
The save method comes from Saveable and is implemented by a wide range of classification models. The load method seems to be static in each classification model implementation.
I have a ModelViewSet that I want to add filtering to. My simple model looks like
class Article(models.Model):
    date = models.DateField()
    language = models.CharField(max_length=10)

    class Meta:
        ordering = ['-date']
And the ModelViewSet (read only):
class ArticleViewSet(viewsets.ReadOnlyModelViewSet):
    queryset = Article.objects.all()
    serializer_class = ArticleSerializer
Articles on the API are now ordered by date descending, as I would expect. Now I wish to allow filtering on language. I've set the filter backend to DjangoFilterBackend in settings.py. My updated ModelViewSet now looks like:
class ArticleViewSet(viewsets.ReadOnlyModelViewSet):
    queryset = Article.objects.all()
    serializer_class = ArticleSerializer
    filter_fields = ['language']
This changes the ordering to language ASC. Adding order_by('-date') to queryset does not change anything. Adding ordering = ('-date', ) does not change anything. => How do I specify both filtering and ordering (or simply use default ordering while allowing filtering)?
EDIT:
Current functionality seems to come from AutoFilterSet created in Rest Framework by default:
https://github.com/tomchristie/django-rest-framework/blob/822eb39599b248c68573c3095639a831ab6df99a/rest_framework/filters.py#L53
... where order_by=True and the handling of this in django-filter get_ordering_field here: https://github.com/alex/django-filter/blob/d88b98dd2b70551deb9c128b209fcf783b325acc/django_filters/filterset.py#L325
=> Seems I have to create a FilterSet class:
class LanguageFilter(django_filters.FilterSet):
    class Meta:
        model = Article
        fields = ['language']
        order_by = model()._meta.ordering


class ArticleViewSet(viewsets.ReadOnlyModelViewSet):
    queryset = Article.objects.all()
    serializer_class = ArticleSerializer
    filter_class = LanguageFilter
Does this look correct? Seems a bit "much"/verbose to retain default ordering.
Rather than implementing your own FilterSet, you can instead just add an OrderingFilter, specifying an ordering = ['-date'] or better: ordering = Article._meta.ordering on your view, to restore the lost (default) ordering. This would also allow your users to use an ordering query parameter to override your default ordering of results.
Note that this issue has been resolved in master... https://github.com/tomchristie/django-rest-framework/pull/1836 and is due to be released in version 2.4.3.
Good question.
It is OK to apply an ordering filter in conjunction with django-filter, but I don't think it is right for a filter backend to reorder the results.
In my case I have to cache my random queryset, so I can't use django-filter anymore, even if I'm not filtering on the page's first asynchronous call.
I'm using PySide to write a plugin browser. The available plugins are stored in a three dimensional model like this:
pluginType/pluginCategory/pluginName
e.g.:
python/categoryA/toolA
python/categoryB/toolAA
etc.
In my custom view, I am showing all tools of a given plugin type (i.e. "python") in a list, regardless of their category:
(python)
categoryA/toolA
categoryA/toolB
categoryA/toolC
categoryB/toolAA
categoryB/toolBB
categoryB/toolCC
I am now wondering how to best sort this view, so the tools are sorted by name regardless of their parent category. The sorting method in my current proxy model yields a sorted list per category like the above one, but what I am after is this:
(python)
categoryA/toolA
categoryB/toolAA
categoryA/toolB
categoryB/toolBB
categoryA/toolC
categoryB/toolCC
Do I have to make my proxy model convert the multi-dimensional source model into a one-dimensional one in order to achieve this or is there a better way? I would love to be able to sync the custom view with a standard tree view which is why I chose the multi-dimensional model.
Thanks,
frank
edit 1:
Here is what I have as a simplified example. I'm not sure if this is the way to go about it (changing the model structure into a 1-dimensional model), and if it is, I'm not sure how to create the data in the proxy model properly so it is linked with the source model as expected.
import sys
from PySide.QtGui import *
from PySide.QtCore import *


class ToolModel(QStandardItemModel):
    '''multi dimensional model'''
    def __init__(self, parent=None):
        super(ToolModel, self).__init__(parent)
        self.setTools()

    def setTools(self):
        for contRow, container in enumerate(['plugins', 'python', 'misc']):
            contItem = QStandardItem(container)
            self.setItem(contRow, 0, contItem)
            for catRow, category in enumerate(['catA', 'catB', 'catC']):
                catItem = QStandardItem(category)
                contItem.setChild(catRow, catItem)
                for toolRow, tool in enumerate(['toolA', 'toolB', 'toolC']):
                    toolItem = QStandardItem(tool)
                    catItem.setChild(toolRow, toolItem)


class ToolProxyModel(QSortFilterProxyModel):
    '''
    proxy model for sorting and filtering.
    need to be able to sort by toolName regardless of category,
    So I might have to convert the data from sourceModel to a 1-dimensional model?!
    Not sure how to do this properly.
    '''
    def __init__(self, parent=None):
        super(ToolProxyModel, self).__init__(parent)

    def setSourceModel(self, model):
        index = 0
        for contRow in xrange(model.rowCount()):
            containerItem = model.item(contRow, 0)
            for catRow in xrange(containerItem.rowCount()):
                categoryItem = containerItem.child(catRow)
                for itemRow in xrange(categoryItem.rowCount()):
                    toolItem = categoryItem.child(itemRow)
                    # how to create new, 1-dimensional data for self?


app = QApplication(sys.argv)
mainWindow = QWidget()
mainWindow.setLayout(QHBoxLayout())
model = ToolModel()
proxyModel = ToolProxyModel()
proxyModel.setSourceModel(model)
treeView = QTreeView()
treeView.setModel(model)
treeView.expandAll()
listView = QListView()
listView.setModel(proxyModel)
mainWindow.layout().addWidget(treeView)
mainWindow.layout().addWidget(listView)
mainWindow.show()
sys.exit(app.exec_())
edit:
Or maybe I should be asking how to best prepare a source model so that it can be used by QTreeView but also sorted in the above mentioned way for display in a list view?!
Use a QTableView and sort (using the sort proxy) by the tool_name column.
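To illustrate the idea without any Qt code, here is a minimal sketch that flattens a nested type/category/tool structure into table rows and sorts them by the tool name column. The dictionary layout and the function name are assumptions made for the example; in the real application the rows would come from the source model and the sorting would be done by the sort proxy.

```python
# Hypothetical stand-in for the pluginType/pluginCategory/pluginName model.
tools = {
    "python": {
        "categoryA": ["toolA", "toolB", "toolC"],
        "categoryB": ["toolAA", "toolBB", "toolCC"],
    },
}


def flat_rows(tree, plugin_type):
    """Return (category, tool_name) rows for one plugin type, sorted by tool name."""
    rows = [
        (category, tool)
        for category, names in tree[plugin_type].items()
        for tool in names
    ]
    # Sort on the tool name column only, ignoring the category.
    return sorted(rows, key=lambda row: row[1])


for category, tool in flat_rows(tools, "python"):
    print(category + "/" + tool)
```

The printed order matches the list asked for in the question: tools interleave across categories because only the name column takes part in the comparison.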