Scrapy Shell FormRequest - xpath

I am trying to scrape some ajax content from http://lsgelection.kerala.gov.in/lbtrend2015/views/lnkResultsGrama.php
Once the two dropdowns are selected and submitted, Chrome's Network tab shows:
Request URL: http://lsgelection.kerala.gov.in/lbtrend2015/includes/detailed_results_grama_ajax.php
Request Method: POST
FormData
token: 9fd54c089d36035c3ce2b5cf08f38982
process: getGramaWonCandData
cno: 46
districtCode: D02001
Panchayat: G02069
I tried to scrape the data from the Scrapy shell together with Splash, since cno is generated by JavaScript:
scrapy shell 'http://localhost:8050/render.html?url=http://lsgelection.kerala.gov.in/lbtrend2015/views/lnkResultsGrama.php'
token = response.xpath('//input[@id="token"]/@value').extract_first()
cno = response.xpath('//input[@id="cno"]/@value').extract_first()
Then I tried to fetch the form response using:
fetch(scrapy.FormRequest.from_response(
    response,
    url='http://lsgelection.kerala.gov.in/lbtrend2015/includes/detailed_results_grama_ajax.php',
    method='POST',
    formdata={'token': token, 'process': 'getGramaWonCandData', 'cno': cno, 'districtCode': 'D02001', 'Panchayat': 'G02069'},
    headers={'Content-Type': 'json/...'},
))
When I read response.text or response.body, I get back only b'\n\n\n\n\n\n\n'.
Where have I gone wrong?

Related

Cypress Redirect To Incorrect (new url)

I'm trying to test the submission of a form. However, when I submit the form and wait/check for an element, the form navigates to a new/incorrect domain. Example:
Form URL: https://formsenv.test.co.uk/IdentityDocuments/Submission/Index
Page it tries to navigate to: (new url) https://sts.test.co.uk/IdentityDocuments/default/Thankyou
It looks like it's taking the base url from the browser url field (screenshot - had to redact some info)
Screenshot of runner
Quick code snippet:
cy.get('[id="submitButton"]').click()
// Commented out as this didn't help. I tried to wait for the request to see
// if I was requesting the 'cy.get('[id="thankyou-details-panel"]')' element
// too early.
//cy.intercept({
//  method: 'GET',
//  url: '/IdentityDocuments/default/Thankyou',
//}).as('dataGetFirst');
// Wait for response.status to be 200
//cy.wait('@dataGetFirst').its('response.statusCode').should('equal', 200)
cy.get('[id="thankyou-details-panel"]').should('contain','Thank you for completing the ID verification / Right to Study form. You will be notified if there are any issues with the document(s) provided.')
Any help would be appreciated. Thanks.

Scrapy ajax POST request not working, though working in Postman

I am implementing a Scrapy spider to crawl a website that contains real estate offers. Each offer page includes the real estate agent's telephone number, which can be retrieved by an AJAX POST request. The request yielded by Scrapy returns an error from the server, while the same request sent from Postman returns the desired data.
Here's the site URL: https://www.otodom.pl/oferta/piekne-mieszkanie-na-mokotowie-do-wynajecia-ID3ezHA.html
I recorded the request using the Network tab in Chrome's dev tools. The URL of the AJAX request is: https://www.otodom.pl/ajax/misc/contact/phone/3ezHA/ The only data needed to send the request is the CSRF token contained in the page's source, which changes periodically. In Postman, sending only the CSRF token as form-data gives the expected answer.
This is how I construct the request in scrapy:
token_input = response.xpath('//script[contains(./text(), "csrf")]/text()').extract_first()
csrf_token = token_input[23:-4]
offerID_input = response.xpath('//link[@rel="canonical"]/@href').extract_first()
offerID = (offerID_input[:-5])[-7:]
form_data = {'CSRFToken' : csrf_token}
request_to_send = scrapy.Request(url='https://www.otodom.pl/ajax/misc/contact/phone/3ezHA/', headers = {"Content-Type" : "application/x-www-form-urlencoded"}, method="POST", body=urllib.urlencode(form_data), callback = self.get_phone)
yield request_to_send
Unfortunately, I get an error, though everything should be OK. Does anybody have any idea what the problem might be? Is it maybe connected with encoding? The site uses UTF-8.
You can find the token in page source:
<script type="text/javascript">
var csrfToken = '0ec80a520930fb2006e4a3e5a4beb9f7e0d6f0de264d15f9c87b572a9b33df0a';
</script>
And you can get it quite easily with this regular expression:
re.findall("csrfToken = '(.+?)'", response.text)
To get the whole thing you can use Scrapy's FormRequest, which builds a correct POST request for you:
import re
from scrapy import FormRequest

def parse(self, response):
    # response.text is the decoded page source, so a str pattern works on it
    token = re.findall("csrfToken = '(.+?)'", response.text)[0]
    yield FormRequest('https://www.otodom.pl/ajax/misc/contact/phone/3ezHA/',
                      formdata={'CSRFToken': token},
                      callback=self.parse_phone)

def parse_phone(self, response):
    print(response.body)
    #'{"value":"515 174 616"}'
You can debug your Scrapy requests by inserting an inspect_response call and looking into the request object:
def parse_phone(self, response):
    from scrapy.shell import inspect_response
    inspect_response(response, self)
    # shell opens up here and spider is put on pause
    # now check `request.body` and `request.headers`, match those to what you see in your browser
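The token extraction and form encoding described above can also be checked outside Scrapy. The following is a minimal sketch using only the standard library; the helper names `extract_csrf_token` and `build_form_body` are hypothetical, and the sample HTML is the script fragment quoted in the answer. It shows the body an `application/x-www-form-urlencoded` POST would carry, which is what you would compare against `request.body` in the shell.

```python
import re
from urllib.parse import urlencode

# Page fragment quoted in the answer above; a real page contains much more markup.
SAMPLE_HTML = """<script type="text/javascript">
var csrfToken = '0ec80a520930fb2006e4a3e5a4beb9f7e0d6f0de264d15f9c87b572a9b33df0a';
</script>"""


def extract_csrf_token(html):
    """Return the first csrfToken value found in the page source, or None."""
    match = re.search(r"csrfToken = '(.+?)'", html)
    return match.group(1) if match else None


def build_form_body(token):
    """Encode the single form field as a form-urlencoded request body."""
    return urlencode({"CSRFToken": token})


token = extract_csrf_token(SAMPLE_HTML)
body = build_form_body(token)
print(body)
```

Returning None when the pattern is missing avoids the IndexError that indexing `[0]` on an empty `re.findall` result would raise if the site changes its markup.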

Django API Requests

I'm trying to access another service's API, using my model's fields as the keywords in the API request. The URL would look like so:
http://api.example.com/json/?first_name=FNAME&last_name=LNAME&key={key}
Here's my code from views.py:
class ExamplePersonView(ListView):
    context_object_name = "example_person"
    template_name = "templates/example_person.html"

    def get_queryset(self):
        lname = get_object_or_404(ExamplePeople, lname__iexact=self.args[0])
        return ExamplePeople.objects.filter(lname=lname)
From what I understand, I'll need to use AJAX to communicate between my page template and my views.py to send the request and then present the information on the page.
I've found several Django apps that make it easy to turn your models into a public API, but none that help you access APIs from another service. Does anyone know of an app like that?
If not, does anyone have a good understanding of using AJAX with Django to make the request and present it in the template?
There are several ways to communicate with a "foreign" API, and AJAX is not required. AJAX is just for making background calls from a template, triggered by whatever event you have in mind.
But let's say you want to communicate with the facebook GraphAPI to retrieve a profile
http://graph.facebook.com/bill.clinton
The standard result is serialized as JSON, which is easy to consume from AJAX or any JavaScript library, hence the name JavaScript Object Notation.
So an example with AJAX might be:
function callFacebook() {
    $.ajax({
        type: "GET",
        data: ({}),
        dataType: 'json',
        url: "http://graph.facebook.com/bill.clinton",
        success: function(data){
            alert("Hi I am former "+data.name);
        }
    });
}
callFacebook();
Include this in your javascript file or within your template between script tags and you should get a nice alert message:
Hi I am former President Bill Clinton
Now you could turn this alert into something more meaningful, and put it within a h1 tag (not sure why this is meaningful)
$("body").html("<h1>"+data.name+"</h1>");
But sometimes you would want to retrieve data and do something with it server side in your application.
So create a django urlpattern and view, e.g.:
import json
from urllib.request import urlopen  # urllib2.urlopen on Python 2
from django.http import HttpResponse

def call_bill(request):
    url = "http://graph.facebook.com/bill.clinton"
    # parse the JSON so the data can be used server side
    data = json.loads(urlopen(url).read())
    # do whatever you want
    # serialize the parsed data once; dumping the raw string would double-encode it
    return HttpResponse(json.dumps(data), content_type="application/json")

# add this to your url patterns
url("^call_bill_clinton/$", call_bill)
Now visit your url
As a logical consequence, it's also perfectly possible to trigger async events through some user action. For example, the url parameter in the previously mentioned AJAX example could also be a Django URL like "/call_bill_clinton/".
<!-- add a button to call the function -->
<button onclick="callFacebook();">Call Bill</button>
function callFacebook() {
    $.ajax({
        type: "GET",
        data: ({}),
        dataType: 'json',
        url: "/call_bill_clinton/",
        success: function(data){
            alert("Hi I am former "+data.name+" and I came from Django");
        }
    });
}
// remove the auto call
Furthermore, AJAX calls let you do the same trickery as HTTP requests: you can use a variety of request methods combined with JavaScript event hooks, like a beforeSend event:
beforeSend: function() {
    $('#loading').show();
},
Where the #loading could be something like:
<div id="loading" style="display:none;">
    <img src="{% static "images/loading.gif" %}" />
</div>
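For the server-side route, the request URL from the question still has to be built from the model's field values. A minimal sketch, assuming the URL pattern given in the question; the function name `build_api_url` and the sample values are hypothetical. `urlencode` takes care of escaping names that contain spaces or other reserved characters.

```python
from urllib.parse import urlencode

# Base URL taken from the pattern in the question.
API_BASE = "http://api.example.com/json/"


def build_api_url(first_name, last_name, key):
    """Build the query URL for the external API from model field values."""
    query = urlencode({
        "first_name": first_name,
        "last_name": last_name,
        "key": key,
    })
    return f"{API_BASE}?{query}"


# Sample values for illustration only.
print(build_api_url("Mary Ann", "Smith", "demo-key"))
```

Inside a view, the resulting URL could be passed to urlopen in the same way as the Graph API URL above.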

codeigniter get URL after ajax

I am trying to get the URL I see in my browser after I make an AJAX request, but the problem is that the AJAX URL replaces it.
For example:
I am on domain.com/user/username
and the AJAX URL that I call is domain.com/posts/submit
When I echo $_SERVER['REQUEST_URI'] in the submit function of the posts controller, it displays the second URL and not the first. How can I get the first URL inside the AJAX function and be sure that it is valid and has not been changed by the user, to prevent any bad action?
Thanks
There is HTTP_REFERER but I don't know if that works for javascript requests. Another problem of this: It won't work for all browsers.
You could try the following:
1.) As the user visits domain.com/user/username the current URL is saved with a token - let's say 5299sQA332 - into the database and the token is provided through PHP to Javascript
2.) The ajax request will send this token along with the other variables needed to the controller through POST
3.) In your ajax controller you search the database for the given token 5299sQA332 and there you have your first URL and you can be damn sure that it hasn't been manipulated
:)
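The three steps above can be sketched independently of CodeIgniter. This is a rough illustration in Python, assuming an in-memory dictionary stands in for the database table; the class name `PageTokenStore` and its methods are hypothetical.

```python
import secrets


class PageTokenStore:
    """Maps unguessable tokens to the page URL they were issued for."""

    def __init__(self):
        # Stand-in for the database table mentioned in step 1.
        self._urls = {}

    def issue(self, url):
        """Step 1: store the current URL under a fresh random token."""
        token = secrets.token_urlsafe(16)
        self._urls[token] = url
        return token

    def resolve(self, token):
        """Step 3: look up the original URL; None means an unknown token."""
        return self._urls.get(token)


store = PageTokenStore()
token = store.issue("/user/username")  # rendered into the page for the JS to send
print(store.resolve(token))            # what the AJAX controller would look up
```

Because the URL is looked up server side by token, the client never gets to supply the URL itself, only the token it was handed.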
If I understand you correctly, you want to make sure the ajax call is coming from the page it is supposed to be on? In that case just pass a token with the call.
In the controller function, set a token variable in the session:
public function username() {
    $this->session->set_userdata('ajax_token', time());
}
Then in the view with the JS:
$.ajax({
    url: '/user/username',
    type: 'post',
    data: 'whatever=bob&token=<?php echo $this->session->userdata('ajax_token'); ?>',
    success: function( data ) {
    },
    error: function( data ) {
    }
});
Then in your form validation, add a custom callback to check that they are the same.
Have you looked at CodeIgniter's Input Class ?
$this->input->get('something', TRUE);
I used JavaScript for it and it seems to work. I hope not to have any problems with it in the future.
PS: I don't get why my other answer was deleted. That's the answer anyway.

Django - Start Session by Ajax Request

I need to know how to start a session by Ajax in Django. I'm doing exactly as described below, but it is not working! The request is sent correctly, but it doesn't start any session. If I make the request directly, without Ajax, it works! What is going on?
# urls
r'^logout/$', 'autenticacao.views.logout_view'
# view of login
def login_view(request):
    username = request.GET.get('username', '')
    password = request.GET.get('password', '')
    user = authenticate(username=username, password=password)
    if user is not None:
        if user.is_active:
            login(request, user)
            return HttpResponse(user.get_profile().sos_user.name)
    return HttpResponse('user invalido')
# ajax in a html page
$(function(){
    $.get('http://localhost:8000/logout/?username=usuario?>&password=senha', function(data){
        alert(data);
    });
});
You're not calling the login_view. Your ajax request is going to the /logout/ url, which calls autenticacao.views.logout_view.
Also, the ?> after username=usuario doesn't look right in your GET url.
My guess is you should be doing something like http://localhost:8000/login/?username=usuario&password=senha (but I'd need to see your login url mapping to be sure).
Also, you should be POSTing the login information and using HTTPS for security reasons, but that's a different issue.
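Both problems pointed out above are visible if you parse the two URLs. A small sketch using Python's standard library; the helper name `describe` is hypothetical, and the /login/ URL is the guess from the answer, not a confirmed mapping.

```python
from urllib.parse import parse_qs, urlsplit


def describe(url):
    """Return the path and the decoded query parameters of a URL."""
    parts = urlsplit(url)
    return parts.path, parse_qs(parts.query)


# URL used in the question's $.get call.
broken = "http://localhost:8000/logout/?username=usuario?>&password=senha"
# URL suggested in the answer (the /login/ mapping is a guess).
fixed = "http://localhost:8000/login/?username=usuario&password=senha"

print(describe(broken))
print(describe(fixed))
```

The first URL targets /logout/ and carries the username with the stray ?> still attached, so authenticate would not find the intended user even if the request reached login_view.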

Resources