Django REST - separating valid data from non-valid and serializing the former with many=True - django-rest-framework

I am using Django REST framework to come up with a REST api for my app.
In one of my views, I am trying to use many=True when initializing my serializer object in order to bulk_insert multiple rows at once. The problem is that if one of the records in the dataset is invalid, serializer's is_valid() method return False, thus rejecting the entire dataset. Whereas the desired behavior would be inserting valid records and ignoring invalid ones.
I have succeed in achieving this using the following code, but I have a terrible feeling that this is junk code and the REST framework has a native way to do this.
My code below (that I consider junk code :)):
serializers.py
class MySerializer(serializers.ModelSerializer):
class Meta:
model = CalendarEventAttendee
fields = '__all__'
view.py
def my_view(request):
validated_data = []
# Separate valid data from invalid
for record in request.data:
if MySerializer(data = record).is_valid():
validated_data.append(record)
# bulk_insert valid data
serializer = MySerializer(data=validated_data, many=True)
if serializer.is_valid():
serializer.save()
Can anyone suggest a better approach ?

Related

Django Rest Framework ModelSerializer get field data depending on query param

I have a ModelSerializer with several field, many of them are StringRelatedField so the default representation of this fields is given by the __str__ method on the model, in my case this are mostly name field in each model. In other cases I need to retrieve the id instead, so how can I do this, for example depending of a query param.
Solving this way for now, please share if someone has a better solution:
class ExampleSerializer(serializers.ModelSerializer):
...
def __init__(self, *args, **kwargs):
super(ExampleSerializer, self).__init__(*args, **kwargs)
v = self.context['request'].query_params.get('v', None) # using v query param for a "variant" handling
if v == '1': # one of my variants
self.fields['examplefield'] = serializers.StringRelatedField(many=False, allow_null=True)
Note that there is no else or another if statement because I only needed two variants, one of them for retrieving the name field, the other for the id. By default DRF will use PrimaryRelatedField (which is the id) for ModelSerializer related fields see doc

Django REST Framework (DRF): ModelSerializer does not validate models on serialization

I want to ask how to use Django REST Framework (DRF) ModelSerializers correctly for serializing from model.
I have Django model with two required fields:
class Book(models.Model):
title = models.CharField()
desc = models.CharField()
I have DRF ModelSerializer:
class BookSerializer(serializers.ModelSerializer):
class Meta:
model = Book
fields = ['title', 'desc']
I can deseralize and validate incoming request using:
serializer = BookSerializer(data=request.data)
serializer.is_valid(raise_exception=True)
But how to serialize and send response? DRF allows me to break contact built using ModelSerializer. If I forgot to set one of mandatory Book fields, it will still still pass through BookSerializer!
invalid_book = Book(title="Foo") # but forgotten to set "desc"
serializer = BookSerializer(instance=invalid_book)
serializer.data # it contains book without required "desc"
Serialized created using instance parameter throws error if I try is_validate().
Why ModelSerializer can validate incoming data, but cannot outgoing?
Validation is only performed when deserializing. As per the documentation:
Serializers also provide deserialization, allowing parsed data to be converted back into complex types, after first validating the incoming data.
That makes sense (edit: in the way Django Rest Framework seems to be construed). Because it isn't the 'role' of the Serializer to make sure that your complex data such as querysets and model instances (eg. your Book instance) that you are going to serialize is construed 'legitimately', thus they also don't validate while serializing.
So if you would save the instance like invalid_book.save(), Django would throw an error because of the missing field.
Edit
After a comment about being 'a point of view' and thus being opiniated I want to stress and make clear that this seems to be the way that Django Rest Framework (DRF) is construed. After digging deeper on SO I link this answer in support.
Also if you read the documentation of DRF, it is somewhat implied that serialization and validation are two separate concepts.
Furthermore, analyzing serializers.py makes clear that validation is only run when calling is_valid() and the validation is only run on the provided data flag. In fact, it can't even be run when only an instance is provided:
def __init__(self, instance=None, data=empty, **kwargs):
self.instance = instance
if data is not empty:
self.initial_data = data
self.partial = kwargs.pop('partial', False)
self._context = kwargs.pop('context', {})
kwargs.pop('many', None)
super().__init__(**kwargs)
...
def is_valid(self, raise_exception=False):
assert hasattr(self, 'initial_data'), (
'Cannot call `.is_valid()` as no `data=` keyword argument was '
'passed when instantiating the serializer instance.'
)
if not hasattr(self, '_validated_data'):
try:
self._validated_data = self.run_validation(self.initial_data)
except ValidationError as exc:
self._validated_data = {}
self._errors = exc.detail
else:
self._errors = {}
if self._errors and raise_exception:
raise ValidationError(self.errors)
return not bool(self._errors)
You are under a very wrong assumption. A serializer (not de-serializer) does one thing. Convert an Object to JSON. Here, you are creating an object Book(name='sad book'). This is just a regular Python Object. Django Serializers will attempt to serialize any Object that is passed to it.
What you might be wondering is the field is required in Model but why doesn't the serializer validate? Because of the way DRF handles serialization. I will show some excerpts from DRF Source code.
This is how the data property is calulated.
class BaseSerializer():
...
...
#property
def data(self):
if hasattr(self, 'initial_data') and not hasattr(self, '_validated_data'):
msg = (
'When a serializer is passed a `data` keyword argument you '
'must call `.is_valid()` before attempting to access the '
'serialized `.data` representation.\n'
'You should either call `.is_valid()` first, '
'or access `.initial_data` instead.'
)
raise AssertionError(msg)
if not hasattr(self, '_data'):
if self.instance is not None and not getattr(self, '_errors', None):
# THIS IS WHERE WE GO. THE to_representation() CAN BE FOUND IN THE IMPLEMENTATION
# OF ModelSerializer() which inherits from this class BaseSerializer
self._data = self.to_representation(self.instance)
elif hasattr(self, '_validated_data') and not getattr(self, '_errors', None):
self._data = self.to_representation(self.validated_data)
else:
self._data = self.get_initial()
return self._data
What happens in ModelSerializer.to_representation() ?
class ModelSerializer(BaseSerializer):
...
...
def to_representation(self, instance):
"""
Object instance -> Dict of primitive datatypes.
"""
ret = OrderedDict()
fields = self._readable_fields
for field in fields:
try:
attribute = field.get_attribute(instance)
except SkipField:
continue
# We skip `to_representation` for `None` values so that fields do
# not have to explicitly deal with that case.
#
# For related fields with `use_pk_only_optimization` we need to
# resolve the pk value.
check_for_none = attribute.pk if isinstance(attribute, PKOnlyObject) else attribute
if check_for_none is None:
ret[field.field_name] = None
else:
ret[field.field_name] = field.to_representation(attribute)
return ret
As you can see, in this case the serializer only maps the fields from the Object being passed. So, there is no validation during serialization. For more info, check the source code of DRF. It's pretty easy if you use Pycharm Pro.

django-REST: Nested relationships vs PrimaryKeyRelatedField

Is it better to use nested relationships or PrimaryKeyRelated field if you have lots of data?
I have a model with deep relationships.
For simplicity I did not add the colums.
Model:
Usecase:
User creates 1 Workoutplan with 2 Workouts and 3 WorkoutExercises.
User creates 6 Sets for each WorkoutExercise/Exercise.
User starts workout > new FinishedWorkout is created
User does first exercise and enters the used weights > new FinishedWorkoutExercise with FinishedSet is created
Question:
I want to track the progression for each workoutplan > workout > exercise.
So with time the user may have finished dozens of workouts therefore hundreds if sets are already in the database.
If I now use nested Relationships I may load a lot of data I don't need.
But if I use PrimaryKeyRelatedFields I have to load all the data I need separately which means more effort in my frontend.
Which method is preferred in such a situation?
Edit:
If I use PrimaryKeyRelatedFields how do I distinguish if e.g. Workouts in Workoutplan is an array with primary keys or an array with the loaded objects?
If you use PrimaryKeyRelatedField, you'll have a big overload to request the the necessary data in frontend
In your case, I would create specific serializers with the fields you want (using Meta.fields attribute). So, you won't load unecessary data and the frontend won't need to request more data from backend.
I can write a sample code, if you need more details.
I'll get to the question regarding serializers in a second, but first of all and for clarification. What is the purpose of having duplicate models as Workout/Finished Workout, Set/Finished Set,...?
Why not...
class Workout(models.Model):
#...stuff...
finished = models.DateTimeField(null=True, blank=True)
#...more stuff...
Then you can just set a finished date on a workout when it's done.
Now, regarding the question. I would suggest you think about user interactions. What parts of the front-end are you trying to populate? How is the data related and how would the user access it?
You should think about what parameters you're querying DRF with. You can send a date and expect workouts finished on a specific day:
// This example is done in Angular, but you get the point...
var date= {
'day':'24',
'month':'10',
'year':'2015'
};
API.finishedWorkout.query(date).$promise
.then(function(workouts){
//...workouts is an array of workout objects...
});
Viewset...
class FinishedWorkoutViewset(viewsets.GenericAPIView,mixins.ListModelMixin):
serializer_class = FinishedWorkOutSerializer
queryset = Workout.objects.all()
def list(self, request):
user = self.request.user
day = self.data['day'];
month = self.data['month'];
year = self.data['year'];
queryset = self.filter_queryset(self.get_queryset().filter(finished__date=datetime.date(year,month,day)).filter(user=user))
page = self.paginate_queryset(queryset)
serializer = self.get_serializer(queryset, many=True)
return response.Response(serializer.data)
And then your FinishedWorkoutSerializer can just have whatever fields you want for that specific type of query.
This leaves you with a bunch of very specific URLs, which isn't all that great, but you can use specific serializers for those interactions and you're also open to dynamically changing the filter, depending on what paramaters are in self.data.
There is also a chance that you may want to filter differently depending what method is being called, say you want to list only active exercises, but if a user queries a specific exercise, you want him to have access to it (note that the Exercise object should have a models.BooleanField attribute called "active").
class ExerciseViewset(viewsets.GenericViewSet, mixins.RetrieveModelMixin, mixins.ListModelMixin):
serializer_class = ExerciseSerializer
queryset = Exercise.objects.all()
def list(self, request):
queryset = self.filter_queryset(self.get_queryset().filter(active=True))
page = self.paginate_queryset(queryset)
serializer = self.get_serializer(queryset, many=True)
return response.Response(serializer.data)
Now you have different objects show up on the same URL, depending on the action. It's a bit closer to what you need, but you're still using the same serializer, so if you need a huge nested object on retrieve(), you're also gonna get a bunch of them when you list().
In order to keep lists short and details nested, you need to use different serializers.
Let's say you want to only send exercises' pk and name attributes when they are listed, but whenever an exercise is queried, you wan't to send along all related "Set" objects ordered inside an array of "WorkoutSets"...
# Taken from an SO answer on an old question...
class MultiSerializerViewSet(viewsets.GenericViewSet):
serializers = {
'default': None,
}
def get_serializer_class(self):
return self.serializers.get(self.action, self.serializers['default'])
class ExerciseViewset(MultiSerializerViewSet, mixins.RetrieveModelMixin, mixins.ListModelMixin):
queryset = Exercise.objects.all()
serializers = {
'default': SimpleExerciseSerializer,
'retrieve': DetailedExerciseSerializer
}
Then your serializers.py could look a bit like...
#------------------Exercise
#--------------------------Simple List
class SimpleExerciseSerializer(serializers.ModelSerializer):
class Meta:
model Exercise
fields = ('pk','name')
#--------------------------Detailed Retrieve
class ExerciseWorkoutExerciseSetSerializer(serializers.ModelSerializer):
class Meta:
model Set
fields = ('pk','name','description')
class ExerciseWorkoutExerciseSerializer(serializers.ModelSerializer):
set_set = ExerciseWorkoutExerciseSetSerializer(many=True)
class Meta:
model WorkoutExercise
fields = ('pk','set_set')
class DetailedExerciseSerializer(serializers.ModelSerializer):
workoutExercise_set = exerciseWorkoutExerciseSerializer(many=True)
class Meta:
model Exercise
fields = ('pk','name','workoutExercise_set')
I'm just throwing around use cases and attributes that probably make no sense in your model, but I hope this is helpfull.
P.S.; Check out how Java I got in the end there :p "ExcerciseServiceExcersiceBeanWorkoutFactoryFactoryFactory"

Django Rest Framework "This field is required" only when POSTing JSON, not when POSTing form content

I'm getting a strange result whereby POSTing JSON to a DRF endpoint returns:
{"photos":["This field is required."],"tags":["This field is required."]}'
Whereas when POSTing form data DRF doesn't mind that the fields are empty.
My model is:
class Story(CommonInfo):
user = models.ForeignKey(User)
text = models.TextField(max_length=5000,blank=True)
feature = models.ForeignKey("Feature", blank=True, null=True)
tags = models.ManyToManyField("Tag")
My serializer is:
class StorySerializer(serializers.HyperlinkedModelSerializer):
user = serializers.CharField(read_only=True)
def get_fields(self, *args, **kwargs):
user = self.context['request'].user
fields = super(StorySerializer, self).get_fields(*args, **kwargs)
fields['feature'].queryset = fields['feature'].queryset.filter(user=user)
fields['photos'].child_relation.queryset = fields['photos'].child_relation.queryset.filter(user=user)
return fields
class Meta:
model = Story
fields = ('url', 'user', 'text', 'photos', 'feature', 'tags')
And my api.py is:
class StoryViewSet(viewsets.ModelViewSet):
serializer_class = StorySerializer
def get_queryset(self):
return self.request.user.story_set.all()
def perform_create(self, serializer):
serializer.save(user=self.request.user)
The results:
# JSON request doesn't work
IN: requests.post("http://localhost:8001/api/stories/",
auth=("user", "password",),
data=json.dumps({'text': 'NEW ONE!'}),
headers={'Content-type': 'application/json'}
).content
OUT: '{"photos":["This field is required."],"tags":["This field is required."]}'
# Form data request does work
IN: requests.post("http://localhost:8001/api/stories/",
auth=("user", "password",),
data={'text': 'NEW ONE!'},
).content
OUT: '{"url":"http://localhost:8001/api/stories/277/","user":"user","text":"NEW ONE!","photos":[],"feature":null,"tags":[]}'
The issue here isn't obvious at first, but it has to do with a shortcoming in form-data and how partial data is handled.
Form data has two special cases that Django REST framework has to handle
There is no concept of "null" or "empty" data for some inputs, including checkboxes and other inputs that allow for multiple selections.
There is no input type that supports multiple values for a single field, checkboxes being the one exception.
Both of these combine together to make it difficult to handle accepting form data within Django REST framework, so it has to handle a few things differently from most parsers.
If a field is not passed in, it is assumed to be None or the default value for the field. This is because inputs with no values are not passed along in the form data, so their key is missing.
If a single value is passed in for a multiple-value field, it will be treated like the one selected value. This is because there is no difference between a single checkbox selected out of many and a single checkbox at all in form data. Both of them are passed in as a single key.
But the same doesn't apply to JSON. Because you are not passing an empty list in for the photos and tags keys, DRF does not know what to give it for a default value and does not pass it along to the serializer. Because of this, the serializer sees that there is nothing passed in and triggers the validation error because the required field was not provided.
So the solution is to always provide all keys when using JSON (not including PATCH requests, which can be partial), even if they contain no data.

Django: How do I cache object from get_object() in a class-based view?

I have been struggling for hours with this: I just can't figure a proper way to cache an object queryset result (object = queryset.get()) in order to avoid re-hitting the database on each view request.
This is my current (simplified) code, and as you can see, I override get_object() to add some extra data (not only the today variable), check if object is in sessions and add object to session.
views.py
from myapp import MyModel
from django.core.cache.utils import make_template_fragment_key
from django.views.generic import DetailView
class myClassView(DetailView):
model = MyModel
def get_object(self,queryset=None):
if queryset is None:
queryset = self.get_queryset()
pk = self.kwargs.get(self.pk_url_kwarg, None)
if pk is not None:
queryset = queryset.filter(pk=pk)
else:
raise AttributeError("My error message.")
try:
today = datetime.today().strftime('%Y%m%d')
cache_key = make_template_fragment_key('some_name', [pk, today])
if cache.has_key(cache_key):
object = self.request.session[cache_key]
return object
else:
object = queryset.get()
object.id = my_id
object.today = today
# Add object to session
self.request.session[cache_key] = object
except queryset.model.DoesNotExist:
raise Http404("Error 404")
return object
The above only works if I add the following:
settings.py
SESSION_SERIALIZER = 'django.contrib.sessions.serializers.PickleSerializer'
But I don't like this hack since it is not secure for Django 1.6 and newer versions because, according to How To Use Sessions (Django 1.7 documents):
If the SECRET_KEY is not kept secret and you are using the PickleSerializer, this can lead to arbitrary remote code execution
If I don't add the SESSIONS_SERIALIZER line I get a "django object is not JSON serializable" error. However, elsewhere my code breaks and I get KeyError errors when trying to pull data from session. This issue is solved converting my string keys into integers. Before changing the settings file Django was converting the str keys into integers automatically when session data was getting requested.
So considering this session serializer security issue I'd prefer other option. So I read here and here about caching get_object(), but I just don't get how to fit that into my get_object() bit. I tried..
if cache.has_key(cache_key):
self._object = super(myClassView,self).get_object(queryset=None)
return self._object
...but it fails. This seems the best solution so far. But how do I implement this into my code? Or, is there a better idea? I'm all ears. Thanks!
You should step back and reassess the situation. What are you trying to achieve?
The get_object is a method that get called in the detailed view to access one specific object from the database.
If you access this method the first time the object gets invalidated and cached in the Queryset.
In order to cache the get_queryset method you need a good cache backend like Redis or Memcached in place so that you can do a simple Write-through Cache operation:
if cache.has_key(cache_key):
object = cache.get(cache_key)
return object
else:
object = queryset.get(pk=pk)
cache.set(cache_key,object)
return object
Note that the django objects are serialized in the cache backend and retrieved as objects when deserialised.
That approach is the just a starting point. You cache the object the first time it misses.
You can also add a post_save,post_update signal to save the object in the cache every time the model is saved or updated:
#receiver(post_save, sender=MyModel)
#receiver(post_delete, sender=MyModel)
def add_MyModel_to_cache(sender, **kwargs):
object = kwargs['instance']
cache.set(cache_key,object)
You have to carefully review what you want to cache and when as it is very easy to misjudge requests

Resources