Ruby Mechanize, fill dynamic Form / Send JSON (Airbnb calendar) - ruby

My goal
I am trying to update my Airbnb calendar using Ruby. For example, here is the URL of a calendar: https://www.airbnb.com/manage-listing/ROOM_ID/calendar
The issue
If you already use Airbnb, you know that to update your calendar you click on the start date, then on the end date, and a form pops up.
So when I use Mechanize to get the page content, this form is not loaded and doesn't appear (the calendar itself is also loaded dynamically, so I can't simulate the clicks either), which makes basic Mechanize form filling impossible...
What I did so far
I used Chrome's developer tools to watch the Network tab. When I update my calendar from Chrome, there is a single JSON PUT to https://www.airbnb.com/api/v2/calendars/ROOM_ID/START_DATE/END_DATE?_format=host_calendar&t=1427377357561&key=d306zoyjsyarp7ifhu67rjxn52tv0t20 with some JSON data such as days, availability, price...
My first attempt was to reproduce this JSON call with the following code:
data = { "event_name" => "calendar",
"event_data" => { "page_uri" => "/manage-listing/ROOM_ID/calendar",
"controller" => "rooms",
"action" => "manage_listing",
"hosting_id" => ROOM_ID,
"start_date" => "2015-03-26",
"end_date" => "2015-03-29",
"available" => true,
"native_price" => 111,
"native_currency" => "EUR"
}
}
page = agent.post 'https://www.airbnb.com/api/v2/calendars/ROOM_ID/2015-03-26/2015-03-29?_format=host_calendar&t=1427374574309&key=d306zoyjsyarp7ifhu67rjxn52tv0t20', data.to_json, {'Content-Type' => 'application/json'}
But I get a 404 response:
Mechanize::ResponseCodeError (404 => Net::HTTPNotFound for https://www.airbnb.com/api/v2/calendars/ROOM_ID/2015-03-26/2015-03-29?_format=host_calendar&t=1427374574309&key=d306zoyjsyarp7ifhu67rjxn52tv0t20 -- unhandled response)
Do you have any suggestions on how to either submit the form even though it is not in the page content, or send the request as JSON?
Thanks for your help
Here is the complete JSON call from Chrome:
General
Remote Address:xx.xx.xx.xx:xx
Request URL:https://www.airbnb.com/api/v2/calendars/ROOM_ID/2015-03-26/2015-03-29?_format=host_calendar&t=1427379998507&key=d306zoyjsyarp7ifhu67rjxn52tv0t20&currency=EUR&locale=fr-CA
Request Method:PUT
Status Code:200 OK
Response Headers
cache-control:max-age=0, private, must-revalidate
connection:keep-alive
content-encoding:gzip
content-length:236
content-type:application/json; charset=utf-8
date:Thu, 26 Mar 2015 14:26:46 GMT
etag:W/"10845765865e36a6ccb1541bbda1c2a7"
server:nginx/1.7.7
status:200 OK
strict-transport-security:max-age=10886400; includeSubdomains
vary:Accept-Encoding
version:HTTP/1.1
x-frame-options:SAMEORIGIN
x-hi-human:The Production Infrastructure team added this header. Come work with us! Email kevin.rice+hiring@airbnb.com
x-ua-compatible:IE=Edge,chrome=1
x-xss-protection:1; mode=block
Request Headers
:host:www.airbnb.com
:method:PUT
:path:/api/v2/calendars/ROOM_ID/2015-03-26/2015-03-29?_format=host_calendar&t=1427379998507&key=d306zoyjsyarp7ifhu67rjxn52tv0t20&currency=EUR&locale=fr-CA
:scheme:https
:version:HTTP/1.1
accept:application/json, text/javascript, */*; q=0.01
accept-encoding:gzip, deflate, sdch
accept-language:fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4
content-length:59
content-type:application/json
cookie:__ssid=4166c81a-49bd-4826-ac44-08307c5700ca; _csrf_token=V4%24.airbnb.ca%24CL1nNdfYkF0%24ulPyJJJWr1h6CvuBMf32YcXtnZssDud3_CqBQoqXOU0%3D; li=1; roles=0; _airbed_session_id=dfa72c17e6d014f9fd0b9705d097e5d8; flags=4027711488; EPISODES=s=1427377914349&r=https%3A%2F%2Ffr.airbnb.ca%2Fmanage-listing%2F5780104%2Fcalendar; _ga=GA1.2.1981489078.1427272843; fbs=not_authorized; _pt=1--WyJjZmYxZmE4N2RhOTU4NGNhYzhhN2M5YTIyNzkyMDliMDI0YTk1YWEzIl0%3D--2890e7d8df5181677516659fbdc4761e6de82a61; bev=1427272835_bw8KI59ELTQAsMt3; _user_attributes=%7B%22curr%22%3A%22EUR%22%2C%22guest_exchange%22%3A0.9134%2C%22id%22%3A29905162%2C%22hash_user_id%22%3A%22cff1fa87da9584cac8a7c9a2279209b024a95aa3%22%2C%22eid%22%3A%22FBPqvskr4MN1Rnpqf-oY-lG7-VNdCJVSYwUMUtm6YyOXzEpbRvmU9FWTxKNdf0UA%22%2C%22num_msg%22%3A0%2C%22num_h%22%3A1%2C%22name%22%3A%22St%C3%A9phane%22%2C%22is_admin%22%3Afalse%2C%22can_access_photography%22%3Afalse%7D
origin:https://www.airbnb.com
referer:https://www.airbnb.com/manage-listing/ROOM_ID/calendar
user-agent:Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36
x-csrf-token:V4$.airbnb.ca$CL1nNdfYkF0$ulPyJJJWr1h6CvuBMf32YcXtnZssDud3_CqBQoqXOU0=
x-requested-with:XMLHttpRequest
Query String Parameters
_format:host_calendar
t:1427379998507
key:d306zoyjsyarp7ifhu67rjxn52tv0t20
currency:EUR
locale:fr-CA
Request Payload
{availability: "available", daily_price: "999", notes: ""}
availability: "available"
daily_price: "999"
notes: ""

I succeeded in updating my room's calendar using a JSON PUT request. Here is what I did.
The data looks like:
data = { "availability" => availability,
"daily_price" => price,
"notes" => note
}.to_json
Retrieve the cookies:
cookie_csrf_token = ''
cookie_airbed_session_id = ''

agent.cookie_jar.each do |value|
  if value.to_s.include? "_csrf_token"
    cookie_csrf_token = value.to_s
  elsif value.to_s.include? "_airbed_session_id"
    cookie_airbed_session_id = value.to_s
  end
end
The headers:
headers = { 'X-CSRF-Token' => URI.unescape(cookie_csrf_token.scan(/=(.*)/).join(",")),
            'Content-Type' => 'application/json',
            'Cookie'       => "#{cookie_csrf_token}; #{cookie_airbed_session_id}"
          }
The only cookies you need are _csrf_token and _airbed_session_id, which are related to each other. My mistake was to use the csrf_token from the login page... You can find these cookies in the cookie_jar variable of your Mechanize agent.
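For reference, here is a quick way to list what the agent's cookie jar actually holds (a minimal sketch using the standard Mechanize cookie jar API):

# Print every cookie Mechanize collected, e.g. after logging in and loading the calendar page
agent.cookie_jar.each do |cookie|
  puts "#{cookie.name}=#{cookie.value} (domain: #{cookie.domain})"
end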
After that you will need to construct your URL. The URL has a particular parameter called "key". You can retrieve it from a meta tag (id='_bootstrap-layout-init') on your calendar page. To do that I used Nokogiri combined with a regex:
param_t = Time.now.to_i
param_key = nil  # defined outside the block so the value is still visible afterwards

noko.xpath("//meta[@id='_bootstrap-layout-init']/@content").each do |attr|
  param_key = attr.value[/key":"(.*?)"/, 1]
end
Now you are good to go to update your calendar:
url = "https://www.airbnb.com"
uri = URI.parse(url)

http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_NONE

# Send the PUT request to update the calendar
res = http.start { |req|
  req.send_request('PUT', "/api/v2/calendars/#{room_id}/#{start_date}/#{end_date}?_format=host_calendar&t=#{param_t}&key=#{param_key}", data, headers)
}
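To confirm the update went through, you can check the response afterwards (a minimal sketch; res is the Net::HTTPResponse returned by the block above):

require 'json'

if res.code == '200'
  # the API answers with JSON (content-type: application/json), so it can be parsed for inspection
  puts JSON.parse(res.body).inspect
else
  puts "Update failed: #{res.code} #{res.message}"
end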

Related

POST request detected as bot with HTTParty but not with Postman (same headers)

I am trying to fetch a public API. When I do it from Postman everything works fine, but when I do it from my app I get an error message: <META NAME=\"robots\" CONTENT=\"noindex,nofollow\"
I do not understand how this is possible.
Here are the header variables I set when I make my request with Postman:
Cookie:"some cookie"
Cache-Control: no-cache
Content-Type:application/json
Host:"some host"
Here is my HTTParty request:
response = HTTParty.post(url,
  :body => body_request,   # same as with postman
  :headers => {
    'Content-Type'  => 'application/json',
    'cookie'        => 'same cookie as above',
    'Host'          => 'same host as above',
    'Cache-Control' => 'no-cache'
  }
)
Why would it work with Postman but not with an HTTParty request?
Thank you
I would look into User-Agent: even if you don't explicitly set the header, your HTTP client is still sending one.
Postman uses:
"User-Agent": "PostmanRuntime/7.26.8",
while HTTParty is simply
"User-Agent": "Ruby"
Maybe your public API (I could be more precise if we knew which one) has a whitelist of 'non-bot' user agents and HTTParty is not among them.
Try overriding it:
resp = HTTParty.get 'https://httpbin.org/headers' , headers: {'User-Agent': 'xx'}
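Applied to your original request, it would look something like this (a sketch; the User-Agent string is just an example of a browser-like value):

response = HTTParty.post(url,
  :body => body_request,   # same body as before
  :headers => {
    'Content-Type'  => 'application/json',
    'cookie'        => 'same cookie as above',
    'Host'          => 'same host as above',
    'Cache-Control' => 'no-cache',
    # send a browser-like User-Agent instead of HTTParty's default "Ruby"
    'User-Agent'    => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36'
  }
)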

Trying to replicate Mobile App POST Request in Ruby, Getting 502 Gateway Error

I'm trying to automate actions I can take manually in an iPhone app using Ruby, but when I do, I get a 502 bad gateway error.
Using Charles Proxy I got the request the iPhone app is making:
POST /1.1/user/-/friends/invitations HTTP/1.1
Host: redacted.com
Accept-Locale: en_US
Accept: */*
Authorization: Bearer REDACTED
Content-Encoding: gzip
Accept-Encoding: br, gzip, deflate
Accept-Language: en_US
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Content-Length: 66
Connection: keep-alive
X-App-Version: 814
invitedUserId=REDACTED&source=PROFILE_INVITATION
I wrote the following code in Ruby to send this same request:
@header_post = {
  "Host" => "redacted.com",
  "Accept-Locale" => "en_US",
  "Accept" => "*/*",
  "Authorization" => "Bearer REDACTED",
  "Content-Encoding" => "gzip",
  "Accept-Encoding" => "br, gzip, deflate",
  "Accept-Language" => "en_US",
  "Content-Type" => "application/x-www-form-urlencoded; charset=UTF-8",
  "Connection" => "keep-alive",
  "X-App-Version" => "814"
}

http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true

path = '/1.1/user/-/friends/invitations'
data = "invitedUserId=REDACTED&source=PROFILE_INVITATION"

resp, data = http.post(path, data, @header_post)
Unfortunately I get a 502 Bad Gateway Error when running this code.
One thing I noticed, which I think is key to the solution here: in the POST request the mobile app makes, the content length is 66, but the length of the string "invitedUserId=REDACTED&source=PROFILE_INVITATION" with the un-redacted userId is only 46.
Am I missing another form variable with format "&param=value" which has length 20? Or am I missing something else?
Thank you in advance!
This is probably not directly tied to the body length you're sending.
I see possibly two problems here:
The 502 error: are your uri.host and port correct? A 502 error means something went wrong on the server side. Also try removing the Host header.
The body content is not gzipped.
You're defining a Content-Encoding: gzip header but you didn't compress the data (Net::HTTP doesn't do that automatically).
Try something like this:
require "gzip"
#header_post = {
# ...
}
http = Net::HTTP.new(uri.host, uri.port)
path = '/1.1/user/-/friends/invitations'
data = "invitedUserId=REDACTED&source=PROFILE_INVITATION"
# instanciate a new gzip buffer
gzip = Zlib::GzipWriter.new(StringIO.new)
# append your data
gzip << data
# get the gzip body and use it in your request
body = gzip.close.string
resp, data = http.post(path, body, #header_post)
Alternatively, maybe the server accepts non-gzipped content. You could try simply deleting the Content-Encoding header from your original code.
However, if that were the only mistake, the server should not send a 502 but a 4xx error. So I'm guessing there is another issue with the uri config, as I suggested above.
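If you want to try the non-gzipped variant, here is a minimal sketch (same request as in your code, with the Content-Encoding header removed):

# Drop the Content-Encoding header and send the plain (uncompressed) body
@header_post.delete("Content-Encoding")

http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true

resp = http.post('/1.1/user/-/friends/invitations',
                 "invitedUserId=REDACTED&source=PROFILE_INVITATION",
                 @header_post)
puts resp.code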

Redirect After AJAX Request even though Headers and Data Attributes are passed correctly with Python3

On a government site I managed to log in with my credentials (specified as a Python dictionary in login_data) as follows:
with requests.Session() as s:
    url = 'https:......../login'
    r = s.get(url, data=login_data, headers=headers, verify=False)
    r = s.post(url, data=login_data, headers=headers, verify=False)
    print(r.content)
which displays an HTML page:
b'<!DOCTYPE html..... and if I search for my username I find <span class="rich-messages-label msg-def-inf-label">Welcome, USER..XYZ!<, from which I conclude a successful login.
Next I want to proceed to the search subsite (url = 'https:......./search) of the site I'm now logged in to. This subsite allows me to search the government records for an incident (incident-ID) on a given date (start_date, end_date).
Because of the login success I tried the following:
with requests.Session() as s:
    url = 'https:......../search'
    r = s.get(url, data=search_data, headers=headers, verify=False)
    r = s.post(url, data=search_data, headers=headers, verify=False)
    print(r.content)
Beforehand I defined search_data using the Google Chrome Inspector (Network tab and headers):
search_data = {
    'AJAXREQUEST': '_viewRoot',
    'theSearchForm': 'theSearchForm',
    'incident-ID': '12345',
    'start_date': '05/03/2019 00:00:00 +01:00',
    'end_date': '05/03/2019 23:59:59 +01:00',
}
and I specified headers to include more than just the agent:
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8',
    'Connection': 'keep-alive',
    'Cookie': 'JSESSIONID=8351xxxxxxxxxxxxFD5; _ga=GA1.2.xxxxxxx.xxxxxxxx',
    'Host': 'somehost...xyz.eu',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36',
}
So far the setup should be fine, no? But I ran into a problem: print(r.content) doesn't give me the HTML as it did after the login, but only this disappointingly short response: b'<?xml version="1.0" encoding="UTF-8"?>\n<html xmlns="http://www.w3.org/1999/xhtml"><head><meta name="Ajax-Response" content="redirect" /><meta name="Location" content="home.seam?cid=3774801" /></head></html>
It's a pity, because I can see in the Inspector that the response to the post request in the browser yields exactly the data I am looking for. Similarly, the first post request yields the exact same data as my Python command r = s.post(url, data=login_data, headers=headers, verify=False). But print(r.content), as already said, seems to be a redirect which only brings me back to the login page, stating that I'm already logged in.
To sum up:
The first request.Session.get & .post worked (I get the same response HTML as in the Google Chrome Inspector).
The second request.Session.post doesn't work, as it just yields some weird redirect (but I get the correct response in the Google Chrome Inspector).
What am I missing??? Please Help! :S

ruby http request freeze with SSL

I'm trying to download images with Ruby and found an interesting issue.
Here is the part of my code that downloads an image (HTTP request only):
HTTParty.get(url)
or with
Net::HTTP.new(URI.parse(url))
When I try to download an image from Nike
url = 'https://c.static-nike.com/a/images/t_PDP_1728_v1/f_auto,b_rgb:f5f5f5/bfau7aauvleh5puvuiqa/zoom-pegasus-turbo-mens-running-shoe-Z163c3.jpg'
it works well
but for some reason, it freezes when I open the Adidas one:
url = 'https://www.adidas.com.sg/dis/dw/image/v2/bcbs_prd/on/demandware.static/-/Sites-adidas-products/default/dw0eb054ad/zoom/G27805_01_standard.jpg'
I get logs like this:
SSL established
<- "GET /dis/dw/image/v2/bcbs_prd/on/demandware.static/-/Sites-adidas-products/default/dw0eb054ad/zoom/G27805_01_standard.jpg HTTP/1.1\r\nUser-Agent: Mozilla/5.0\r\nConnection: close\r\nHost: www.adidas.com.sg\r\n\r\n"
I tried to switch off SSL validation with
verify: false,
but it doesn't solve my pain ¯\_(ツ)_/¯
However, it works well with curl -O for both URLs
There is filtering being done on the server side for the Adidas URL, likely to prevent automated scraping. At a minimum you must specify additional headers to successfully make a connection.
The following example successfully returns a response from the Adidas URL:
url = 'https://www.adidas.com.sg/dis/dw/image/v2/bcbs_prd/on/demandware.static/-/Sites-adidas-products/default/dw0eb054ad/zoom/G27805_01_standard.jpg'
headers = {
  'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'Accept-Encoding' => 'br, gzip, deflate',
  'Accept-Language' => 'en-us'
}
response = HTTParty.get(url, headers: headers)
=> #<HTTParty::Response:0x7fcb02856298 parsed_response="\xFF\xD8\xFF\xE0\x00\x10JFIF ...
The three headers listed are the minimum needed to get a response; all three must be present.
You can see from the returned response that it is returning a JPEG, so this example should work as requested.
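If the goal is to save the image locally, the response body can then be written straight to disk (a minimal sketch; the filename is just an example):

# response comes from the HTTParty.get call above; write the raw JPEG bytes to a file
File.binwrite('G27805_01_standard.jpg', response.body)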
It's possible that they block requests when some specific headers are missing, so you might want to set some of them:
HTTParty.get(url, { headers: {
    "User-Agent" => "Mozilla/5.0 (iPhone; CPU iPhone OS 12_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) FxiOS/7.0.4 Mobile/16B91 Safari/605.1.15",
    "Accept-Language" => "en-US,en;q=0.9,bg;q=0.8",
    "Accept-Encoding" => "gzip, deflate, br"
  }
})

scrapy get data by post method but got 403

I used both F12 (Chrome) and Postman to check the request and its details on the site
http://www.zhihu.com/
(email: jianguo.bai@hirebigdata.cn, password: wsc111111), then go to
http://www.zhihu.com/people/hynuza/columns/followed
I want to get all the columns the user Hynuza has followed, which is currently 105. When the page opens, only 20 of them are shown, and I need to scroll down to load more. Each time I scroll down, the details of the request look like this:
Remote Address:60.28.215.70:80
Request URL:http://www.zhihu.com/node/ProfileFollowedColumnsListV2
Request Method:POST
Status Code:200 OK
Request Headers
Accept:*/*
Accept-Encoding:gzip,deflate
Accept-Language:en-US,en;q=0.8,zh-CN;q=0.6,zh;q=0.4
Connection:keep-alive
Content-Length:157
Content-Type:application/x-www-form-urlencoded; charset=UTF-8
Cookie:_xsrf=f1460d2580fbf34ccd508eb4489f1097; q_c1=867d4a58013241b7b5f15b09bbe7dc79|1419217763000|1413335199000; c_c=2a45b1cc8f3311e4bc0e52540a3121f7; q_c0="MTE2NmYwYWFlNmRmY2NmM2Q4OWFkNmUwNjU4MDQ1OTN8WXdNUkVxRDVCMVJaODNpOQ==|1419906156|cb0859ab55258de9ea95332f5ac02717fcf224ea"; __utma=51854390.1575195116.1419486667.1419902703.1419905647.11; __utmb=51854390.7.10.1419905647; __utmc=51854390; __utmz=51854390.1419905647.11.9.utmcsr=zhihu.com|utmccn=(referral)|utmcmd=referral|utmcct=/people/hynuza/columns/followed; __utmv=51854390.100--|2=registration_date=20141222=1^3=entry_date=20141015=1
Host:www.zhihu.com
Origin:http://www.zhihu.com
Referer:http://www.zhihu.com/people/hynuza/columns/followed
User-Agent:Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36
X-Requested-With:XMLHttpRequest
Form Data
method:next
params:{"offset":20,"limit":20,"hash_id":"18c79c6cc76ce8db8518367b46353a54"}
_xsrf:f1460d2580fbf34ccd508eb4489f1097
Then I used Postman to simulate the request, like this:
As you can see, it got what I wanted, and it worked even when I was logged out of the site.
Based on all of this, I wrote my spider like this:
# -*- coding: utf-8 -*-
import scrapy
import urllib
from scrapy.http import Request


class PostSpider(scrapy.Spider):
    name = "post"
    allowed_domains = ["zhihu.com"]
    start_urls = (
        'http://www.zhihu.com',
    )

    def __init__(self):
        super(PostSpider, self).__init__()

    def parse(self, response):
        return scrapy.FormRequest.from_response(
            response,
            formdata={'email': 'jianguo.bai@hirebigdata.cn', 'password': 'wsc111111'},
            callback=self.login,
        )

    def login(self, response):
        yield Request("http://www.zhihu.com/people/hynuza/columns/followed",
                      callback=self.parse_followed_columns)

    def parse_followed_columns(self, response):
        # here deal with the first 20 divs
        params = {"offset": "20", "limit": "20", "hash_id": "18c79c6cc76ce8db8518367b46353a54"}
        method = 'next'
        _xsrf = 'f1460d2580fbf34ccd508eb4489f1097'
        data = {
            'params': params,
            'method': method,
            '_xsrf': _xsrf,
        }
        r = Request(
            "http://www.zhihu.com/node/ProfileFollowedColumnsListV2",
            method='POST',
            body=urllib.urlencode(data),
            headers={
                'Accept': '*/*',
                'Accept-Encoding': 'gzip,deflate',
                'Accept-Language': 'en-US,en;q=0.8,zh-CN;q=0.6,zh;q=0.4',
                'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
                'Cache-Control': 'no-cache',
                'Cookie': '_xsrf=f1460d2580fbf34ccd508eb4489f1097; '
                          'c_c=2a45b1cc8f3311e4bc0e52540a3121f7; '
                          '__utmt=1; '
                          '__utma=51854390.1575195116.1419486667.1419855627.1419902703.10; '
                          '__utmb=51854390.2.10.1419902703; '
                          '__utmc=51854390; '
                          '__utmz=51854390.1419855627.9.8.utmcsr=zhihu.com|utmccn=(referral)|utmcmd=referral|utmcct=/;'
                          '__utmv=51854390.100--|2=registration_date=20141222=1^3=entry_date=20141015=1;',
                'User-Agent': 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) '
                              'Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36',
                'host': 'www.zhihu.com',
                'Origin': 'http://www.zhihu.com',
                'Connection': 'keep-alive',
                'X-Requested-With': 'XMLHttpRequest',
            },
            callback=self.parse_more)
        r.headers['Cookie'] += response.request.headers['Cookie']
        print r.headers
        yield r
        print "after"

    def parse_more(self, response):
        # here is where I want to get the returned divs
        print response.url
        followers = response.xpath("//div[@class='zm-profile-card "
                                   "zm-profile-section-item zg-clear no-hovercard']")
        print len(followers)
Then I got 403 like this:
2014-12-30 10:34:18+0800 [post] DEBUG: Crawled (403) <POST http://www.zhihu.com/node/ProfileFollowedColumnsListV2> (referer: http://www.zhihu.com/people/hynuza/columns/followed)
2014-12-30 10:34:18+0800 [post] DEBUG: Ignoring response <403 http://www.zhihu.com/node/ProfileFollowedColumnsListV2>: HTTP status code is not handled or not allowed
So it never enters parse_more.
I've been working on this for two days and still have nothing; any help or advice would be appreciated.
The login sequence is correct. However, the parse_followed_columns() method totally corrupts the session.
You cannot use hardcoded values for data['_xsrf'] and params['hash_id'].
You should find a way to read this information directly from the HTML content of the previous page and inject the values dynamically.
Also, I suggest you remove the headers parameter from this request; it can only cause trouble.
