Scrapy check if image response is 404 - image

I want to process image URLs, I enabled and configured as Scrapy Docs; but what happens if the image URL returns 404 or is redirected. I want to log that, save the failed URLs and the HTTP error/redirect code. Where can I put the code to do that?

It is completely wrong to handle that in the pipleline, because the response would go throw all the middlewares back to your spider then to your pipleline, while your purpose is just logging the failure.
You should build your own middleware to handle any HTTP response code.
By default, scrapy allows responses with statues codes between 200 and 300. You can edit that by listing the statue codes that you would like to receive like this:
class Yourspider(spider):
handle_httpstatus_list = [404, 302] #add any other code you need
Then you should build your middleware and add it to your configuration like this:
DOWNLOADER_MIDDLEWARES = {
'myProject.myMiddlewares.CustomSpiderMiddleware': SELECT_NUMBER_SUITS_FOR_YOU,
}
in your CustomSpiderMiddleware check the status like this:
process_spider_input(response, spider):
if response.status == 404
#do what ever you want

You have to create your custom pipeline, inherit it from the Imagepipeline, then override the item_completed method, as mentioned in the documentation
def item_completed(self, results, item, info):
image_paths = [x['path'] for ok, x in results if ok]
if not image_paths:
raise DropItem("Item contains no images")
item['image_paths'] = image_paths
return item
and lastly in the settings.py add your custom pipeline
ITEM_PIPELINES = {
'myproject.mypipeline': 100,
}

Related

Django response send file as well as some text data

Currently I am send a zip file in response from my Django-Rest controller, the zip file will get downloaded in front-end and this feature is working fine but now I want to send some data as well with the zip file in response, is there any way?
This is my Django-REST controller code
response = HttpResponse(byte_io.getvalue(),content_type='application/x-zip-compressed')
response['Content-Disposition'] = f'attachment;filename{my-sample-zip-file}'
return response
How can I send some data with this zip file on front-end?
You can do it with HTTP Request-Response module of django.
Refer: https://docs.djangoproject.com/en/2.2/ref/request-response/#telling-the-browser-to-treat-the-response-as-a-file-attachment
If you want to do it in production like code, you might also want to handle case where file is not available, set error and return neat. Also, better put that in a common file and then define download_file like below:
from common_library import FileResponse
from django.http import JsonResponse
def download_file( self ):
if self.error_response:
response = JsonResponse( { "error" : self.error_response } )
else:
response = FileResponse(self.report_file_abs_path, self.report_filename)
response['Content-Type'] = 'application/xlsx'
response['Content-Disposition'] = 'attachment; filename=' + self.report_filename
return response
NOTE: FileResponse is a user defined wrapper function which you can define.

drf custom response to override Response()

I'm using DRF and to return response, I used Response() which located at from rest_framework.response import Response
To make custom response, first, I copied all of the source to custom file.
And in my views, I changed Response() to my own file.
But when I running django server and access via web, it through errors.
AssertionError: .accepted_renderer not set on Response
I just copied original Repsonse() and re-use it.
Why it occur errors?
Purpose of custom response is I want to add more argument that something likes cursor for pagination.
As you know that in original Response, takes 6 arguments.
def __init__(self, data=None, status=None,
template_name=None, headers=None,
exception=False, content_type=None):
And return it in def rendered_content(self), line of ret = renderer.render(self.data, accepted_media_type, context)
So my scenario is, add cursor to __init__ and pass through it to renderer.render().
Any problem with my way?
Thanks.

Django ajax warning on same page after DoesNotExist exception on form POST

I have a django form attached to a view. In the form a user types in a query which is passed to a Model.objects.get( query ) like so:
def post(self, request):
try:
Model.objects.get(query)
except Model.DoesNotExist:
# something here
Upon exception i'd like to send an ajax request to my template that stops it from refreshing, and displays a warning to the user that there's nothing in the database matching that get request. What would I put in the view and the template?
The http standard response would be a 404 response. Django has a shortcut function for this: get_object_or_404
def post(self, request):
my_object = get_object_or_404(Model, query)
If the lookup fails, django will raise an 404 error, which will result in a 404 http response back to the client. In your javascript ajax handling code, you should check the http status, and handle any 404 responses appropriately.
For example, if you are using the fetch api, the code might look like this.
fetch('/some/url/?query=foobar').then(response => {
if (response.ok) return response.json()
if (response.status == 404) throw new Error('404')
})

How to do AJAX in Django CMS Admin

I have 2 models in Django CMS - a Map (which has name and an image attributes), and a Location, which one of it's attributes is a Map. I want, when the user changes the Map, to perform an AJAX request to get the map details for that item so that I can add the Map image to the page to do some further jQuery processing with it. But I'm new to Django and I can't seem to figure it out. Anything I find seems unrelated - in that it talks about using AJAX on front end forms.
I have my jQuery file ready to go but I don't know what to put for the URL of the AJAX call and how/where to set up the endpoint in Django.
Your question seem related to custom Django admin url.
First, update your MapAdmin to provide an endpoint to search location
from django.contrib import admin
from django.http import JsonResponse
class MapAdmin(admin.ModelAdmin):
def get_urls(self):
admin_view = self.admin_site.admin_view
info = self.model._meta.app_label, self.model._meta.model_name
urls = [
url(r'^search_location$', admin_view(self.search_location), name=("%s_%s_search_location" % (info))),
]
return urls + super(VideoAdmin, self).get_urls()
def search_location(self, request, *args, **kwargs):
map = request.GET.get('map')
# Do something with map param to get location.
# Set safe=False if location_data is an array.
return JsonResponse(["""..location_data"""], safe=False)
Next, somewhere in your template file, define the URL point to search location endpoint. And use that URL to fetch location data
once map is changed.
var searchLocationUrl = "{% url 'admin:appName_mapModel_search_location' %}";

Django POST data dictionary is empty when posting from test client

I am trying to test and AJAX view in my Django Project. When submit the post from JQuery the data is correctly accessible in the Django View but when I try to make queries for the Django test client it is emplty.
Here is the code I use:
The view
def add_item(request):
if request.is_ajax() and request.method == 'POST':
post_data = request.POST
print post_data ## <----- THIS IS EMPTY
name = post_data.get('name')
# Render the succes response
json_data = json.dumps({"success":1})
return HttpResponse(json_data, content_type="application/json")
else:
raise Http404
And the test
class TestAddItem(TestCase):
def test_some_test(self):
data = {
'description':"description",
}
response = self.client.post('theurl', data, content_type='application/json')
Any Idea what I might be doing wrong?
I tried also without content type and also using plain url like thurl/?name=name without succes.
Any help will be appreciated.
Olivier
After trying different combinations of parameter formating, content types, etc..
I found one solution that works :
response = self.client.post(reverse('theurl'), data,
**{'HTTP_X_REQUESTED_WITH': 'XMLHttpRequest'})
And I get the following dictionary on the POST parameters of the request:
<QueryDict: {u'name': [u'name']}>
Edit I love testing :)

Resources