Proper FormRequest to an AJAX scrolling page

I want to scrape all 'belts' from https://www.thingiverse.com/thing:3270948/remixes in Scrapy.
First of all, I want to write a proper request.
I tried:
scrapy.FormRequest(url="https://www.thingiverse.com/thing:3270948/remixes",
                   method="POST",
                   formdata={
                       'page': '7',
                       'id': '3270948'},
                   headers={
                       'x-requested-with': 'XMLHttpRequest',
                       'content-type':
                           ['application/x-www-form-urlencoded',
                            'charset=UTF-8']})
The response contains only the first page (24 belts). How do I write a proper request to get the next pages, or all of the belts?

The request payload has more parameters than you are sending; I've copied them all from the browser's Network tab:
import scrapy


class TestSpider(scrapy.Spider):
    name = 'test'
    start_urls = ['https://www.thingiverse.com/thing:3270948/remixes']
    ajax_url = 'https://www.thingiverse.com/ajax/things/remixes'
    payload = 'id=3270948&auto_scroll=true&page={}&total=153&per_page=24&last_page=7&base_url=%2Fthing%3A3270948%2Fremixes%2F&extra_path=&%24container=.results-container&source=%2Fajax%2Fthings%2Fremixes'

    def parse(self, response):
        page = response.meta.get('page', 1)
        print('----')
        # just to show that content is always different, so pages are different
        print(page, response.css('div.item-header a span::text').getall()[:3])
        print('----')
        # why 7: check `last_page` param in payload; stop once the last page has been parsed
        if page >= 7:
            return
        yield scrapy.Request(self.ajax_url,
                             method='POST',
                             headers={
                                 'x-requested-with': 'XMLHttpRequest',
                                 'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
                             },
                             body=self.payload.format(page + 1),
                             meta={'page': page + 1})
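The hand-written payload string can also be generated, which makes the relationship between `total`, `per_page` and `last_page` explicit (ceil(153 / 24) = 7). This is a minimal stdlib sketch, assuming the fields copied from the Network tab are everything the endpoint needs; `remixes_payload` is a hypothetical helper, and the meaning of each field is inferred from its name, not documented by Thingiverse.

```python
from math import ceil
from urllib.parse import urlencode


def remixes_payload(thing_id: int, page: int, total: int, per_page: int = 24) -> str:
    """Build the form-encoded body for one page of the remixes AJAX endpoint."""
    last_page = ceil(total / per_page)  # 153 items at 24 per page -> 7 pages
    fields = {
        "id": thing_id,
        "auto_scroll": "true",
        "page": page,
        "total": total,
        "per_page": per_page,
        "last_page": last_page,
        "base_url": f"/thing:{thing_id}/remixes/",
        "extra_path": "",
        "$container": ".results-container",
        "source": "/ajax/things/remixes",
    }
    # urlencode percent-encodes keys and values ("/" -> %2F, ":" -> %3A, "$" -> %24),
    # producing the same string as the payload copied from the browser.
    return urlencode(fields)
```

In the spider, `body=remixes_payload(3270948, page + 1, 153)` could replace `self.payload.format(page + 1)`, and the stop condition could compare against `ceil(total / per_page)` instead of a hard-coded 7.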

Related

Ajax PATCH/PUT problem 401 (Unauthorized)

I wrote an API using Django and django-ninja.
Here is the relevant section of my api.py file, which is included in my URLs.
import json

import orjson
from django.http import HttpResponse
from ninja import NinjaAPI
from ninja.renderers import BaseRenderer
from ninja.security import APIKeyQuery

# CustomUser is the project's own user model (its import is not shown)


class ORJSONRenderer(BaseRenderer):
    media_type = "application/json"

    def render(self, request, data, *, response_status):
        return orjson.dumps(data)


class ApiKey(APIKeyQuery):
    param_name = "api_key"

    def authenticate(self, request, key):
        try:
            return CustomUser.objects.get(api_key=key)
        except CustomUser.DoesNotExist:
            pass


api_key = ApiKey()

api = NinjaAPI(
    title="Good Text",
    version="0.0.1",
    description="That This",
    renderer=ORJSONRenderer(),
    # csrf=True
)


@api.patch(
    "/car/color/{new_color}", auth=api_key, tags=["Car"], summary="Does something",
    description="Does something"
)
def update_team_name(request, new_color):
    try:
        # Do something
        msg = {"success": "Done"}
    except Exception:
        msg = {"error": "Problem"}
    return HttpResponse(json.dumps(msg), content_type='application/json')
I have other GET endpoints too, and requesting them works without any problem.
But when I send a request to the PATCH endpoints I get 401 (Unauthorized), and only with Ajax. Python's requests works:
import requests
load = dict(
    api_key='SOME KEY'
)
r = requests.get("http://127.0.0.1:8000/api/car/color/red", params=load)
print(r.text)
But JavaScript doesn't:
$.ajax({
    url: "/api/car/color/red",
    data: {
        "api_key": "some key"
    },
    cache: false,
    type: "PATCH",
    success: function(response_country) {
        console.log(response_country);
    },
    error: function(xhr) {
        console.log(xhr);
    }
});
What I tried:
I tried adding
headers: {"X-CSRFToken": $crf_token},
to the Ajax request, even though CSRF is set to False in django-ninja.
I tried changing PATCH to PUT.
I tried adding a timeout to the Ajax request.
I tried sending the api_key through a header instead of in the data.
None of these worked.
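One plausible cause, not confirmed in this thread: for any method other than GET or HEAD, jQuery sends `data` in the request body instead of appending it to the URL, while django-ninja's `APIKeyQuery` reads the key from the query string only, so the PATCH arrives without a key. Appending the key to `url` (for example `"/api/car/color/red?api_key=" + encodeURIComponent(key)`), or switching to `APIKeyHeader` if you prefer to send it as a header, would address that. The stdlib sketch below builds the PATCH counterpart of the working `requests.get` call so the endpoint can be tested outside the browser; `build_patch_request` is a hypothetical helper.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_patch_request(base_url: str, path: str, api_key: str) -> Request:
    """Build (without sending) a PATCH request that carries api_key in the query string.

    APIKeyQuery looks the key up in the query string, so that is where a PATCH
    request has to put it; a key in the body is not seen.
    """
    query = urlencode({"api_key": api_key})  # "some key" -> "api_key=some+key"
    return Request(f"{base_url}{path}?{query}", method="PATCH")


req = build_patch_request("http://127.0.0.1:8000", "/api/car/color/red", "some key")
print(req.get_method(), req.full_url)
# PATCH http://127.0.0.1:8000/api/car/color/red?api_key=some+key
# urllib.request.urlopen(req) would send it to the running dev server.
```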

Bing Spell Check API doesn't work - error code 404

I've followed the guide at
https://learn.microsoft.com/en-us/azure/cognitive-services/bing-spell-check/quickstarts/python, but I'm getting a 404 error code. This is the code from the guide:
import requests
import json
api_key = "mykey"  # placeholder: your subscription key
example_text = "Hollo, wrld" # the text to be spell-checked
endpoint = "https://api.cognitive.microsoft.com/bing/v7.0/SpellCheck"
data = {'text': example_text}
params = {
'mkt':'en-us',
'mode':'spell'
}
headers = {
'Content-Type': 'application/x-www-form-urlencoded',
'Ocp-Apim-Subscription-Key': api_key,
}
response = requests.post(endpoint, headers=headers, params=params, data=data)
json_response = response.json()
print(json.dumps(json_response, indent=4))
But when I create a resource, the endpoint I'm given is either https://api.bing.microsoft.com/ or https://spellcheck3.cognitiveservices.azure.com/, depending on which guide I follow.
How do I correctly run this code?
The code as written in the guide doesn't seem to work; here's a solution I found.
import requests

subscription_key = "YOUR_KEY"  # placeholder: your subscription key
search_url = "https://api.bing.microsoft.com/v7.0/spellcheck"
search_term = "wrld helath"
params = {
'mkt':'en-us',
'mode':'spell',
'text' : search_term
}
headers = {"Ocp-Apim-Subscription-Key": subscription_key}
response = requests.get(search_url, headers=headers, params=params)
response.raise_for_status()
search_results = response.json()
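To turn `search_results` into corrected text, a small helper can apply the top-scoring suggestion for each flagged token. This is a sketch assuming the v7 response shape: a `flaggedTokens` list whose entries carry `offset`, `token` and `suggestions` (each with `suggestion` and `score`). `apply_corrections` is a hypothetical helper, not part of any Microsoft SDK, and the sample dict in the usage line is hand-written for illustration, not a real API response.

```python
def apply_corrections(text: str, results: dict) -> str:
    """Replace each flagged token with its highest-scoring suggestion."""
    tokens = results.get("flaggedTokens", [])
    # Work from the end of the string backwards so earlier offsets stay valid
    # even when a replacement has a different length than the original token.
    for tok in sorted(tokens, key=lambda t: t["offset"], reverse=True):
        suggestions = tok.get("suggestions") or []
        if not suggestions:
            continue
        best = max(suggestions, key=lambda s: s.get("score", 0))["suggestion"]
        start = tok["offset"]
        end = start + len(tok["token"])
        text = text[:start] + best + text[end:]
    return text


sample = {"flaggedTokens": [
    {"offset": 0, "token": "wrld", "suggestions": [{"suggestion": "world", "score": 0.9}]},
    {"offset": 5, "token": "helath", "suggestions": [{"suggestion": "health", "score": 0.8}]},
]}
print(apply_corrections("wrld helath", sample))  # world health
```

With the real call, `apply_corrections(search_term, search_results)` would be the equivalent line.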

How can I pass a variable in a request with HTTParty?

I need to generate a value with a POST request and then use that value in the GET (query) and DELETE requests. How can I do this?
Is it possible to pass the value of a variable directly into the retrieve method, for the GET or DELETE request?
I want to reuse the value generated by the Faker gem and stored in a variable for both the GET and the DELETE.
require 'httparty'
require 'httparty/request'
require 'httparty/response/headers'
require 'faker'

class Crud
  include HTTParty

  def create
    @@codigo = Faker::Number.number(digits: 5)
    @nome = Faker::Name.first_name
    @salario = Faker::Number.decimal(l_digits: 4, r_digits: 2)
    @idade = Faker::Number.number(digits: 2)
    @base_url = 'http://dummy.restapiexample.com/api/v1/create'
    @body = {
      "id": @@codigo,
      "name": @nome,
      "salary": @salario,
      "age": @idade
    }.to_json
    @headers = {
      "Accept": 'application/vnd.tasksmanager.v2',
      'Content-Type': 'application/json'
    }
    @@request = Crud.post(@base_url, body: @body, headers: @headers)
  end

  def retrieve
    self.class.get('http://dummy.restapiexample.com/api/v1/employee/1')
  end
end
Just parse the API response and use the returned id. You don't need to pass an id when creating an employee; it is generated automatically.
require 'httparty'
require 'faker'
require 'json'

class Crud
  include HTTParty
  base_uri 'http://dummy.restapiexample.com/api/v1'

  def create
    nome = Faker::Name.first_name
    salario = Faker::Number.decimal(l_digits: 4, r_digits: 2)
    idade = Faker::Number.number(digits: 2)
    # note: pass the body as a JSON string
    body = { name: nome, salary: salario, age: idade }.to_json
    headers = {
      'Accept' => 'application/vnd.tasksmanager.v2',
      'Content-Type' => 'application/json'
    }
    self.class.post('/create', body: body, headers: headers)
  end

  def retrieve(id)
    self.class.get("/employee/#{id}")
  end
end
> client = Crud.new
> response = client.create
> id = JSON.parse(response.body)['id']
> client.retrieve(id)
Please read about variables in Ruby, in particular the difference between local, instance and global variables. Global variables should be used only rarely; most of the time you need instance or local ones.

Rails Ajax -> Sinatra -> Amazon API and back

I'm not sure that I really understand how Sinatra works.
I'd like to fetch some products from Amazon through their API in my Rails app, but the HTTP requests block I/O. I got a tip to create a Sinatra app and make an Ajax request to it instead.
Ajax (from my Rails app):
$.ajax({
    url: "http://sinatra.mydomain.com",
    dataType: "json",
    success: function(data) {
        console.log(data);
    }
});
Sinatra app (I also use the sinatra-synchrony gem):
require 'sinatra'
require 'sinatra/synchrony'
require 'erb'
require 'rest-client'
require 'amazon_product'

Sinatra::Synchrony.overload_tcpsocket!

get '/' do
  req = AmazonProduct["us"]
  req.configure do |c|
    c.key = "KEY"
    c.secret = "SECRET"
    c.tag = "TAG"
  end
  req << { :operation => 'ItemSearch',
           :search_index => "DVD",
           :response_group => %w{ItemAttributes Images},
           :keywords => "nikita",
           :sort => "" }
  resp = req.get
  @item = resp.find('Item').shuffle.first
  erb :layout, :locals => { :amazon_product => @item }
end
Layout.erb (renders fine if I open this URL in the browser):
<%= amazon_product %>
Problem:
My Ajax request returns 200 OK, but the response body is empty.
I can't figure out what's wrong. Please advise.
It looks like you've run into the Ajax cross-domain security restriction (the same-origin policy). Try JSONP (JSON with padding).
Change your Sinatra GET handler:
require 'json'

get '/' do
  req = AmazonProduct["us"]
  req.configure do |c|
    c.key = KEY
    c.secret = SECRET
    c.tag = TAG
  end
  req << { :operation => 'ItemSearch',
           :search_index => "DVD",
           :response_group => %w{ItemAttributes Images},
           :keywords => "nikita",
           :sort => "" }
  resp = req.get
  @item = resp.find('Item').shuffle.first
  callback = params.delete('callback') # jsonp
  json = @item.to_json
  # the value of the if/else is returned as the response body
  if callback
    content_type :js
    "#{callback}(#{json})"
  else
    content_type :json
    json
  end
end
And change your Ajax request:
$.getJSON("http://address_of_sinatra?callback=?",
function(data) {
console.log(data);
});
Or you can add dataType: 'jsonp' to your $.ajax request.
After that, you should see the data object in the JS debugger (at least it works in my case :D).
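The handler's JSONP branch boils down to one small decision: wrap the JSON in `callback(...)` when a callback name is supplied, otherwise return plain JSON. Here is that logic as a Python sketch; `jsonp_response` is a hypothetical helper, and the callback-name check is an extra safeguard assumed here rather than part of the answer, since echoing an arbitrary `callback` parameter into JavaScript lets a caller inject script.

```python
import json
import re

# Assumed whitelist: dotted JavaScript identifiers such as "cb" or "jQuery172_123".
_CALLBACK_RE = re.compile(r"[A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)*")


def jsonp_response(data, callback=None):
    """Return (content_type, body), mirroring the Sinatra handler's if/else."""
    payload = json.dumps(data)
    if callback:
        if not _CALLBACK_RE.fullmatch(callback):
            raise ValueError(f"invalid JSONP callback: {callback!r}")
        return "application/javascript", f"{callback}({payload})"
    return "application/json", payload


print(jsonp_response({"a": 1}, "cb"))  # ('application/javascript', 'cb({"a": 1})')
```

jQuery's `callback=?` placeholder is replaced with a generated function name, which is the kind of value the whitelist is meant to accept.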

Ruby mechanize post with header

I have a page whose JavaScript posts data via XMLHttpRequest, and the server-side script checks for that header (X-Requested-With). How do I send this header with Mechanize?
agent = WWW::Mechanize.new { |a|
  a.user_agent_alias = 'Mac Safari'
  a.log = Logger.new('./site.log')
}
agent.post('http://site.com/board.php',
           {
             'act' => '_get_page',
             "gid" => 1,
             'order' => 0,
             'page' => 2
           }
) do |page|
  p page
end
I found this post with a web search (two months later, I know) and just wanted to share another solution.
You can add custom headers without monkey patching Mechanize using a pre-connect hook:
agent = WWW::Mechanize.new
agent.pre_connect_hooks << lambda { |p|
  p[:request]['X-Requested-With'] = 'XMLHttpRequest'
}
ajax_headers = { 'X-Requested-With' => 'XMLHttpRequest',
                 'Content-Type' => 'application/json; charset=utf-8',
                 'Accept' => 'application/json, text/javascript, */*' }
params = { 'emailAddress' => 'me@my.com' }.to_json
response = agent.post('http://example.com/login', params, ajax_headers)
The above code works for me (Mechanize 1.0) to make the server think the request is coming via AJAX. As stated in other answers, though, it depends on what the server is looking for, and that differs between frameworks and JS library combinations.
The best thing to do is to use the Firefox HTTPLiveHeaders plugin or HTTPScoop to look at the request headers the browser sends, and then replicate them.
It seems the lambda used to take one argument, but in newer Mechanize versions it takes two (the agent and the request):
agent = Mechanize.new do |a|
  a.pre_connect_hooks << lambda do |_agent, request|
    request["Accept-Language"] = "ru"
  end
end
Take a look at the documentation.
You need to either monkey-patch or derive your own class from WWW::Mechanize to override the post method so that custom headers are passed through to the private method post_form.
For example,
class WWW::Mechanize
  def post(url, query = {}, headers = {})
    node = {}
    # Create a fake form
    class << node
      def search(*args); []; end
    end
    node['method'] = 'POST'
    node['enctype'] = 'application/x-www-form-urlencoded'
    form = Form.new(node)
    query.each { |k, v|
      if v.is_a?(IO)
        form.enctype = 'multipart/form-data'
        ul = Form::FileUpload.new(k.to_s, ::File.basename(v.path))
        ul.file_data = v.read
        form.file_uploads << ul
      else
        form.fields << Form::Field.new(k.to_s, v)
      end
    }
    post_form(url, form, headers)
  end
end

agent = WWW::Mechanize.new
agent.post(URL, POSTDATA, { 'custom-header' => 'custom' }) do |page|
  p page
end
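For comparison, the same "looks like Ajax" POST can be built with Python's standard library, which makes it easy to see exactly which headers go on the wire. This sketch reuses the header values from the Mechanize 1.0 answer; `build_ajax_post` is a hypothetical helper, and nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import json
from urllib.request import Request

AJAX_HEADERS = {
    "X-Requested-With": "XMLHttpRequest",
    "Content-Type": "application/json; charset=utf-8",
    "Accept": "application/json, text/javascript, */*",
}


def build_ajax_post(url: str, payload: dict) -> Request:
    """Build (without sending) a JSON POST carrying the usual Ajax headers."""
    body = json.dumps(payload).encode("utf-8")
    return Request(url, data=body, headers=AJAX_HEADERS, method="POST")


req = build_ajax_post("http://example.com/login", {"emailAddress": "me@my.com"})
# urllib stores header names via str.capitalize(), so look them up as "X-requested-with";
# HTTP header names are case-insensitive, so the server sees the same header.
print(req.get_method(), req.get_header("X-requested-with"))  # POST XMLHttpRequest
```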
