Dajaxice. Cannot call method '...' of undefined. Again - ajax

I'm stuck trying to create the simplest possible application with dajaxice.
I've read all the topics about this problem, here and elsewhere, and rewritten the code many times, but I still don't see what the problem is.
The most interesting thing is that these examples work (almost all of them):
https://github.com/jorgebastida/django-dajaxice/downloads dajaxice-examples.tar.gz
But in my project I get this:
Uncaught TypeError: Cannot call method 'sayhello' of undefined
my tools:
Windows 7 64
python-2.7.3
Django-1.4.2
django-dajaxice-0.2
project structure:
BlocalProject/
    templates/
        template_1.html
    manage.py
    BlocalProject/
        ajapp/
            __init__.py
            ajview.py
        __init__.py
        settings.py
        urls.py
        views.py
        wsgi.py
urls.py:
from django.conf.urls.defaults import *
import settings
from dajaxice.core import dajaxice_autodiscover

dajaxice_autodiscover()

urlpatterns = patterns('',
    (r'^%s/' % (settings.DAJAXICE_MEDIA_PREFIX), include('dajaxice.urls')),
    (r'^$', 'BlocalProject.views.start_page'),
)
views.py:
from django.shortcuts import render

def start_page(request):
    return render(request, 'template_1.html')
ajapp.py:
from django.utils import simplejson
from dajaxice.core import dajaxice_functions

def sayhello(request):
    return simplejson.dumps({'message': 'Trololo!'})

dajaxice_functions.register(sayhello)
template_1.html:
{% load dajaxice_templatetags %}
<html>
{% dajaxice_js_import %}
<script>
    function alertMessage(data){
        alert(data.message);
        return false;
    }
</script>
<body>
Some text
<input type="button" value="Get!" onclick="Dajaxice.ajapp.sayhello(alertMessage);" />
</body>
</html>
settings.py:
# Django settings for BlocalProject project.
DEBUG = True
TEMPLATE_DEBUG = DEBUG
ADMINS = (
    # ('Your Name', 'your_email@example.com'),
)
MANAGERS = ADMINS
DATABASE_ENGINE = '' # 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'oracle'.
DATABASE_NAME = '' # Or path to database file if using sqlite3.
DATABASE_USER = '' # Not used with sqlite3.
DATABASE_PASSWORD = '' # Not used with sqlite3.
DATABASE_HOST = '' # Set to empty string for localhost. Not used with sqlite3.
DATABASE_PORT = '' # Set to empty string for default. Not used with sqlite3.
# Local time zone for this installation. Choices can be found here:
# http://en.wikipedia.org/wiki/List_of_tz_zones_by_name
# although not all choices may be available on all operating systems.
# In a Windows environment this must be set to your system time zone.
TIME_ZONE = 'America/Chicago'
# Language code for this installation. All choices can be found here:
# http://www.i18nguy.com/unicode/language-identifiers.html
LANGUAGE_CODE = 'en-us'
SITE_ID = 1
# If you set this to False, Django will make some optimizations so as not
# to load the internationalization machinery.
USE_I18N = True
# Absolute filesystem path to the directory that will hold user-uploaded files.
# Example: "/home/media/media.lawrence.com/media/"
MEDIA_ROOT = ''
# URL that handles the media served from MEDIA_ROOT. Make sure to use a
# trailing slash.
# Examples: "http://media.lawrence.com/media/", "http://example.com/media/"
MEDIA_URL = ''
# URL prefix for admin media -- CSS, JavaScript and images. Make sure to use a
# trailing slash.
# Examples: "http://foo.com/media/", "/media/".
ADMIN_MEDIA_PREFIX = '/media/'
# URL prefix for static files.
# Example: "http://media.lawrence.com/static/"
STATIC_URL = '/'
# Make this unique, and don't share it with anybody.
SECRET_KEY = ')er9!%4v0=nmxd#2=j1*tlktmidq8aam2y)-%fjf6%^xp*5r)c'
# List of callables that know how to import templates from various sources.
TEMPLATE_LOADERS = (
    'django.template.loaders.filesystem.Loader',
    'django.template.loaders.app_directories.Loader',
    # 'django.template.loaders.eggs.load_template_source',
)
MIDDLEWARE_CLASSES = (
    'django.middleware.common.CommonMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
)
ROOT_URLCONF = 'BlocalProject.urls'
# Python dotted path to the WSGI application used by Django's runserver.
#WSGI_APPLICATION = 'BlocalProject.wsgi.application'
TEMPLATE_DIRS = (
    'D:/_/Site_test/Djpr/BlocalProject/templates',
    # Put strings here, like "/home/html/django_templates" or "C:/www/django/templates".
    # Always use forward slashes, even on Windows.
    # Don't forget to use absolute paths, not relative paths.
)
TEMPLATE_CONTEXT_PROCESSORS = (
    "django.contrib.auth.context_processors.auth",
    "django.core.context_processors.debug",
    "django.core.context_processors.i18n",
    "django.core.context_processors.media",
    "django.core.context_processors.static",
    "django.core.context_processors.request",
    "django.contrib.messages.context_processors.messages",
)
INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
    'dajaxice',
    'BlocalProject.ajapp',
)
DAJAXICE_MEDIA_PREFIX = "dajaxice"
DAJAXICE_DEBUG = True
DAJAXICE_JS_DOCSTRINGS = True
#DAJAXICE_NOTIFY_EXCEPTIONS = True
import logging
logging.basicConfig(level=logging.DEBUG)

This might be too late, but...
IMHO, dajaxice_autodiscover() doesn't actually pick up your method. I remember having a similar problem with another ajax lib and solving it by adding an import to models.py (or views.py), which gets imported when the app starts. The example you mention has an import in simple/views.py:
# Create your views here.
from django.shortcuts import render
# THIS ONE!
from dajaxice.core import dajaxice_functions
# ---------

def index(request):
    return render(request, 'simple/index.html')
which looks like it's initialising stuff. My approach would be:
- Put a couple of print statements in your ajapp.py and see if they get printed out.
- Create empty models.py and views.py in your BlocalProject.ajapp (does Django still use models.py to validate a module as a Django app?)
If your print statements don't get triggered, then you need to find out why. As I mentioned, it might be as simple as importing your ajax module in models.py :)
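For illustration, a minimal sketch of that idea, assuming the sayhello registration lives in ajapp/ajview.py (the file name is taken from the question's project tree; whether this fixes dajaxice's discovery is exactly what the print statements would confirm):

# BlocalProject/ajapp/models.py -- empty except for this import, which runs
# the registration code as soon as Django loads the app
from BlocalProject.ajapp import ajview  # imported only for its side effects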

Related

How to limit url parameters in Django REST Framework's Router to ints?

When I have an urls.py file like this:
# urls.py
router = SimpleRouter(trailing_slash=False)
router.register("/?", MembersController, basename="member")
urlpatterns = router.urls
Then the generated URL for the single object is (?P<pk>[^/.]+)$. I'd like it to include the int: "converter type". Is that possible at all? Or would I have to stop using DRF's router and create my own URL patterns?
In your MembersController you can specify '[0-9]+' as the lookup_value_regex:
class MembersController(ModelViewSet):
    lookup_value_regex = '[0-9]+'
    # …
By default it makes use of '[^/.]+', as we can see in the source code [GitHub]:
lookup_value = getattr(viewset, 'lookup_value_regex', '[^/.]+')
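Wired into the question's router, a minimal sketch (the commented-out queryset/serializer lines are placeholders, not from the question):

from rest_framework.routers import SimpleRouter
from rest_framework.viewsets import ModelViewSet

class MembersController(ModelViewSet):
    lookup_value_regex = '[0-9]+'  # detail URLs now only match digit pks
    # queryset = Member.objects.all()      # placeholder
    # serializer_class = MemberSerializer  # placeholder

router = SimpleRouter(trailing_slash=False)
router.register("/?", MembersController, basename="member")
urlpatterns = router.urls
# The generated detail route now ends in (?P<pk>[0-9]+)$ instead of (?P<pk>[^/.]+)$.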

Possible to replace Scrapy's default lxml parser with Beautiful Soup's html5lib parser?

Question: Is there a way to integrate BeautifulSoup's html5lib parser into a Scrapy project, instead of Scrapy's default lxml parser?
Scrapy's parser fails for some elements on the pages I scrape.
This only happens on about 2 out of every 20 pages.
As a fix, I've added BeautifulSoup's parser to the project (which works).
That said, I feel like I'm doubling the work with conditionals and multiple parsers... at a certain point, what's the reason for using Scrapy's parser? The code does work... it just feels like a hack.
I'm no expert; is there a more elegant way to do this?
Much appreciation in advance.
Update: Adding a middleware class to Scrapy (from the Python package scrapy-beautifulsoup) works like a charm. Apparently, lxml from Scrapy is not as robust as BeautifulSoup's lxml. I didn't have to resort to the html5lib parser, which is 30x+ slower.
from bs4 import BeautifulSoup

class BeautifulSoupMiddleware(object):
    def __init__(self, crawler):
        super(BeautifulSoupMiddleware, self).__init__()
        self.parser = crawler.settings.get('BEAUTIFULSOUP_PARSER', "html.parser")

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)

    def process_response(self, request, response, spider):
        """Overridden process_response would "pipe" response.body through BeautifulSoup."""
        return response.replace(body=str(BeautifulSoup(response.body, self.parser)))
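For context, a downloader middleware only runs once it is enabled in settings.py; a minimal sketch, assuming the class above lives in myproject/middlewares.py (the dotted path and priority are placeholders, while BEAUTIFULSOUP_PARSER is the key the middleware itself reads):

# settings.py
DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.BeautifulSoupMiddleware': 543,  # hypothetical dotted path
}
BEAUTIFULSOUP_PARSER = 'lxml'  # handed to BeautifulSoup by the middleware above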
Original:
import scrapy
from scrapy.item import Item, Field
from scrapy.loader.processors import TakeFirst, MapCompose
from scrapy import Selector
from scrapy.loader import ItemLoader
from w3lib.html import remove_tags
from bs4 import BeautifulSoup

class SimpleSpider(scrapy.Spider):
    name = 'SimpleSpider'
    allowed_domains = ['totally-above-board.com']
    start_urls = [
        'https://totally-above-board.com/nefarious-scrape-page.html'
    ]
    custom_settings = {
        'ITEM_PIPELINES': {
            'crawler.spiders.simple_spider.Pipeline': 400
        }
    }

    def parse(self, response):
        yield from self.parse_company_info(response)
        yield from self.parse_reviews(response)

    def parse_company_info(self, response):
        print('parse_company_info')
        print('==================')
        loader = ItemLoader(CompanyItem(), response=response)
        loader.add_xpath('company_name',
                         '//h1[contains(@class,"sp-company-name")]//span//text()')
        yield loader.load_item()

    def parse_reviews(self, response):
        print('parse_reviews')
        print('=============')
        # Beautiful Soup
        selector = Selector(response)
        # On the page (total reviews) # 49
        search = '//span[contains(@itemprop,"reviewCount")]//text()'
        review_count = selector.xpath(search).get()
        review_count = int(float(review_count))
        # Number of elements Scrapy's lxml could find # 0
        search = '//div[@itemprop="review"]'
        review_element_count = len(selector.xpath(search))
        # Use Scrapy or Beautiful Soup?
        if review_count > review_element_count:
            # Try Beautiful Soup
            soup = BeautifulSoup(response.text, "lxml")
            root = soup.findAll("div", {"itemprop": "review"})
            for review in root:
                loader = ItemLoader(ReviewItem(), selector=review)
                review_text = review.find("span", {"itemprop": "reviewBody"}).text
                loader.add_value('review_text', review_text)
                author = review.find("span", {"itemprop": "author"}).text
                loader.add_value('author', author)
                yield loader.load_item()
        else:
            # Try Scrapy
            review_list_xpath = '//div[@itemprop="review"]'
            selector = Selector(response)
            for review in selector.xpath(review_list_xpath):
                loader = ItemLoader(ReviewItem(), selector=review)
                loader.add_xpath('review_text',
                                 './/span[@itemprop="reviewBody"]//text()')
                loader.add_xpath('author',
                                 './/span[@itemprop="author"]//text()')
                yield loader.load_item()
        yield from self.paginate_reviews(response)

    def paginate_reviews(self, response):
        print('paginate_reviews')
        print('================')
        # Try Scrapy
        selector = Selector(response)
        search = '''//span[contains(@class,"item-next")]
                    //a[@class="next"]/@href
                 '''
        next_reviews_link = selector.xpath(search).get()
        # Try Beautiful Soup
        if next_reviews_link is None:
            soup = BeautifulSoup(response.text, "lxml")
            try:
                next_reviews_link = soup.find("a", {"class": "next"})['href']
            except Exception as e:
                pass
        if next_reviews_link:
            yield response.follow(next_reviews_link, self.parse_reviews)
It’s a common feature request for Parsel, Scrapy’s library for XML/HTML scraping.
However, you don’t need to wait for such a feature to be implemented. You can fix the HTML code using BeautifulSoup, and use Parsel on the fixed HTML:
from bs4 import BeautifulSoup
# …
response = response.replace(body=str(BeautifulSoup(response.body, "html5lib")))
You can get a charset error with @Gallaecio's answer if the original page was not utf-8 encoded, because the response has been set to a different encoding.
So you must first switch the encoding.
In addition, there may be a problem with character escaping.
For example, if the character < is encountered in the text of the HTML, then it must be escaped as &lt;. Otherwise, "lxml" will delete it and the text near it, considering it an erroneous HTML tag.
"html5lib" escapes characters, but is slow.
response = response.replace(encoding='utf-8',
                            body=str(BeautifulSoup(response.body, 'html5lib')))
"html.parser" is faster, but from_encoding must also be specified (for example 'cp1251').
response = response.replace(encoding='utf-8',
                            body=str(BeautifulSoup(response.body, 'html.parser', from_encoding='cp1251')))

Scrapy works in shell but spider returns empty csv

I am learning Scrapy. Now I'm just trying to scrape items, and when I call the spider:
planefinder]# scrapy crawl planefinder -o /User/spider/planefinder/pf.csv -t csv
it shows tech information and no scraped content (Crawled 0 pages .... etc.), and it returns an empty csv file.
The problem is that when I test the XPath in scrapy shell it works:
>>> from scrapy.selector import Selector
>>> sel = Selector(response)
>>> flights = sel.xpath("//div[@class='col-md-12'][1]/div/div/table//tr")
>>> items = []
>>> for flt in flights:
...     item = flt.xpath("td[1]/a/@href").extract_first()
...     items.append(item)
...
>>> items
The following is my planeFinder.py code:
# -*- coding: utf-8 -*-
from scrapy.spiders import CrawlSpider
from scrapy.selector import Selector, HtmlXPathSelector
from planefinder.items import arr_flt_Item, dep_flt_Item

class planefinder(CrawlSpider):
    name = 'planefinder'
    host = 'https://planefinder.net'
    start_url = ['https://planefinder.net/data/airport/PEK/']

    def parse(self, response):
        arr_flights = response.xpath("//div[@class='col-md-12'][1]/div/div/table//tr")
        dep_flights = response.xpath("//div[@class='col-md-12'][2]/div/div/table//tr")
        for flight in arr_flights:
            arr_item = arr_flt_Item()
            arr_flt_url = flight.xpath('td[1]/a/@href').extract_first()
            arr_item['arr_flt_No'] = flight.xpath('td[1]/a/text()').extract_first()
            arr_item['STA'] = flight.xpath('td[2]/text()').extract_first()
            arr_item['From'] = flight.xpath('td[3]/a/text()').extract_first()
            arr_item['ETA'] = flight.xpath('td[4]/text()').extract_first()
            yield arr_item
Before going to CrawlSpider, please check the docs for Spiders. Some of the issues I've found were:
- Instead of host, use allowed_domains
- Instead of start_url, use start_urls
It seems that the page needs to have some cookies set, or maybe it's using some kind of basic anti-bot protection, and you need to land somewhere else first.
Try this (I've also changed it a bit):
# -*- coding: utf-8 -*-
from scrapy import Field, Item, Request
from scrapy.spiders import CrawlSpider, Spider

class ArrivalFlightItem(Item):
    arr_flt_no = Field()
    arr_sta = Field()
    arr_from = Field()
    arr_eta = Field()

class PlaneFinder(Spider):
    name = 'planefinder'
    allowed_domains = ['planefinder.net']
    start_urls = ['https://planefinder.net/data/airports']

    def parse(self, response):
        yield Request('https://planefinder.net/data/airport/PEK', callback=self.parse_flight)

    def parse_flight(self, response):
        flights_xpath = ('//*[contains(@class, "departure-board") and '
                         './preceding-sibling::h2[contains(., "Arrivals")]]'
                         '//tr[not(./th) and not(./td[@class="spacer"])]')
        for flight in response.xpath(flights_xpath):
            arrival = ArrivalFlightItem()
            arr_flt_url = flight.xpath('td[1]/a/@href').extract_first()
            arrival['arr_flt_no'] = flight.xpath('td[1]/a/text()').extract_first()
            arrival['arr_sta'] = flight.xpath('td[2]/text()').extract_first()
            arrival['arr_from'] = flight.xpath('td[3]/a/text()').extract_first()
            arrival['arr_eta'] = flight.xpath('td[4]/text()').extract_first()
            yield arrival
The problem here is not understanding correctly which "Spider" to use, as Scrapy offers different custom ones.
The main one, and the one you should be using here, is the plain Spider rather than CrawlSpider, because CrawlSpider is meant for deeper, more intensive crawls of forums, blogs, etc.
Just change the type of spider to:
from scrapy import Spider

class PlaneFinder(Spider):
    ...
Check the value of ROBOTSTXT_OBEY in your settings.py file. By default it's set to True (but not when you run shell). Set it to False if you want to disobey the robots.txt file.
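For completeness, the corresponding settings.py line would be:

# settings.py
ROBOTSTXT_OBEY = False  # allow the spider to fetch pages disallowed by robots.txt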

Django1.11 - Show media on localhost

I'm having a hard time displaying media on my local system. The problem is that:
{{ producer.img.url }}
gives me a URL path relative to the page I'm browsing, so it always fails to locate the file. It actually prints something like:
media/media/djprofiles/john_0VtCrdA.jpg
which obviously fails (note the missing initial "/").
Following the Django docs, I added in my urls.py:
urlpatterns = [
    url(r'^i18n/', include('django.conf.urls.i18n')),
]
urlpatterns += i18n_patterns(
    ...
) + static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)
and settings.py is as follows:
MEDIA_ROOT = os.path.join(BASE_DIR, "media")
MEDIA_URL = 'media/'
The img field is defined in models.py as follows:
img = models.ImageField(upload_to=settings.MEDIA_URL + 'djprofiles')
I know there are already many questions related to showing media on the local system, but none seems to provide a working solution.
Did you try
MEDIA_URL = '/media/'
in settings.py?
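The doubled media/media in the printed path also points at the model field: upload_to is interpreted relative to MEDIA_ROOT, so prepending MEDIA_URL to it stores files under an extra media/ segment. A minimal sketch of the conventional setup (names taken from the question):

# settings.py
MEDIA_URL = '/media/'  # the leading slash makes generated URLs absolute
MEDIA_ROOT = os.path.join(BASE_DIR, 'media')

# models.py -- upload_to is relative to MEDIA_ROOT; don't prepend MEDIA_URL
img = models.ImageField(upload_to='djprofiles')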

Django Urls and App urls with Ajax POST

I've found an issue and tracked it down to the URL conf. I'm attempting to perform an ajax POST to the /gallery/add page, which adds a new record into the database.
Originally I added a urls.py to my app and then include()'d it from the root urls.py, but that failed during the ajax POST (logging shows that just /gallery/ is returned).
Then I reverted to just the root urls.py and it worked as I expected.
So the question is: are these URL confs equivalent?
(A)
# ./urls.py
from django.conf.urls.defaults import *

urlpatterns = patterns('',
    (r'^gallery$', 'gallery.views.home'),
    (r'^gallery/add$', 'gallery.views.add'),  # ajax post works with this one
)
(B)
# ./urls.py
from django.conf.urls.defaults import *

urlpatterns = patterns('',
    (r'^gallery/', include('gallery.urls')),
)
# ./gallery/urls.py
from django.conf.urls.defaults import *

urlpatterns = patterns('',
    (r'$', 'gallery.views.home'),
    (r'add$', 'gallery.views.add'),  # ajax request doesn't work, instead it goes to gallery.views.home
)
In the second example you still need the ^, because otherwise the first regex will match any old string that has an ending (due to the $), and that is of course all of them :)
# ./gallery/urls.py
from django.conf.urls.defaults import *

urlpatterns = patterns('',
    (r'^$', 'gallery.views.home'),
    (r'^add$', 'gallery.views.add'),
)
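To see why, a quick re sketch (not from the original answer): after the resolver strips the ^gallery/ prefix, it effectively runs re.search on the remaining path, here 'add', against each pattern.

>>> import re
>>> re.search(r'$', 'add') is not None   # bare '$' matches at the end of any string,
True                                     # so 'gallery.views.home' swallows /gallery/add
>>> re.search(r'^$', 'add') is not None  # anchored pattern matches only an empty remainder
False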
