I am getting a 400 response for an HTTP POST request. How do I resolve this issue? - http-post

I have been trying to make a simple web request that posts data with Python. The response is:
The server cannot or will not process the request due to something that is perceived to be a client error (e.g., malformed request syntax, invalid request message framing, or deceptive request routing).
The code below sends a POST request using the Python requests library and receives a 400 response when executed. Could this be caused by header syntax or formatting issues?
code:
headers = {
'Host': 'host.url',
'Content-Length': '1847',
'Sec-Ch-Ua': '"Chromium";v="95", ";Not A Brand";v="99"',
'Accept': 'application/json, text/plain, */*',
'Content-Type': 'application/json',
'Authorization': 'auth-key',
'Sec-Ch-Ua-Mobile': '?0',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36',
'Sec-Ch-Ua-Platform': '"Windows"',
'Origin': 'origin.url',
'Sec-Fetch-Site': 'same-origin',
'Sec-Fetch-Mode': 'cors',
'Sec-Fetch-Dest': 'empty',
'Referer': 'referer.url',
'Accept-Encoding': 'gzip, deflate',
'Accept-Language': 'en-US,en;q=0.9',
'Connection': 'close',
}
import json
import requests
data = {}
# Serialize the (empty) payload to a JSON string before posting it.
json_object = json.dumps(data, indent=4)
response = requests.post('url', data=json_object, headers=headers, verify=False)
print(response.text)
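One thing worth checking, as an assumption rather than a confirmed diagnosis: the Content-Length header above is hard-coded to 1847, while the body is the JSON for an empty dict, which is only a few bytes. Some servers reject a request whose declared length disagrees with the body. A minimal sketch of letting requests compute those headers is shown below. The URL is a placeholder, and Request.prepare() is used so that nothing is sent over the network.

```python
import requests

# Placeholder endpoint and token; substitute the real values.
url = "https://example.com/api"
headers = {
    "Accept": "application/json, text/plain, */*",
    "Authorization": "auth-key",
}
data = {}

# json= serializes the dict and sets Content-Type; requests computes
# Content-Length from the actual body, so neither is hard-coded.
prepared = requests.Request("POST", url, json=data, headers=headers).prepare()

print(prepared.body)
print(prepared.headers["Content-Length"])
print(prepared.headers["Content-Type"])

# To actually send it:
# with requests.Session() as session:
#     response = session.send(prepared)
#     print(response.status_code, response.text)
```

Headers copied from the browser such as Host, Content-Length and Connection are usually safe to drop, because the library fills them in itself.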

Related

Web scraping beginner. AJAX POST request not working

I have just started out with web scraping.
The data I need seems to be returned by an AJAX POST request. POST requests are very rarely covered by scraping tutorials and seem to come with lots of "gotchas" for new users like myself.
I copied the request from Chrome dev tools into Postman using cURL and then generated the Python request code. The request uses a peculiar set of query parameters. I have repeated this process, however, and the only parameter that changes is the session ID.
The problem is that the request stops working after some time has elapsed (Internal server error 500). I would then have to copy the request from the site again with the new session ID.
Any pointers in the right direction would be appreciated.
import requests
url = "https://online.natis.gov.za/gateway/PreBooking?_flowId=PreBooking&_flowExecutionKey=e1s2&flowName=[object%20Object]&_eventId_next=Next?dtoName=perSummaryDetailDto&viewId=perSummaryDetail&flowExecutionKey=e1s2&flowExecutionUrl=%2Fgateway%2FPreBooking%3F_flowId%3DPreBooking%26_flowExecutionKey%3De1s2&sessionId=IWhelPTLyYDa7JohJV6x8So_qEKdC8wOknArAXkS&surname={SURNAME}&initials=R&firstName1={FIRSTNAME}&emailAddress={EMAIL}&cellN={CELL}&isWithinPriorityDate=false&viewPrioritySlots=false&showPrioritySlotsModal=false&provcdt=4&supportUser=false"
payload = {}
headers = {
'Connection': 'keep-alive',
'Content-Length': '0',
'sec-ch-ua': '" Not;A Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"',
'Accept': 'application/json, text/plain, */*',
'sec-ch-ua-mobile': '?0',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36',
'Origin': 'https://online.natis.gov.za',
'Sec-Fetch-Site': 'same-origin',
'Sec-Fetch-Mode': 'cors',
'Sec-Fetch-Dest': 'empty',
'Referer': 'https://online.natis.gov.za/',
'Accept-Language': 'en-US,en;q=0.9',
'Cookie': 'JSESSIONID=IWhelPTLyYDa7JohJV6x8So_qEKdC8wOknArAXkS.master:gateway_3; Gateway=R35619282; ROUTEID.33f40c02f95309866c572c0def16f016=.node1; JSESSIONID=BadmtwJ7c8YWEz73xe6Wu165Q7gapmm4WTY6at-p.master:gateway_3; Gateway=R35619282',
'dnt': '1',
'sec-gpc': '1'
}
response = requests.request(
"POST", url, headers=headers, data=payload, verify=False)
print(response.text)
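In the request above, the sessionId query parameter matches the part of the first JSESSIONID cookie value before the dot. A minimal sketch, assuming that relationship holds in general, derives the session ID from a cookie value and rebuilds the URL so that the session ID is the only variable part. The helper names and the shortened parameter list are illustrative only, and how a fresh cookie is obtained from this site is an assumption.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

BASE_URL = "https://online.natis.gov.za/gateway/PreBooking"


def session_id_from_cookie(jsessionid: str) -> str:
    """Return the part of a JSESSIONID cookie value before the first dot."""
    return jsessionid.split(".", 1)[0]


def build_url(session_id: str, flow_key: str = "e1s2") -> str:
    """Build the query string with the session ID as the variable part.

    Only a subset of the original parameters is included here.
    """
    params = {
        "_flowId": "PreBooking",
        "_flowExecutionKey": flow_key,
        "sessionId": session_id,
        "provcdt": "4",
        "supportUser": "false",
    }
    return BASE_URL + "?" + urlencode(params)


cookie = "IWhelPTLyYDa7JohJV6x8So_qEKdC8wOknArAXkS.master:gateway_3"
url = build_url(session_id_from_cookie(cookie))
print(url)

# With requests (assumption: the site sets JSESSIONID on a first GET):
# session = requests.Session()
# session.get("https://online.natis.gov.za/")
# fresh_cookie = session.cookies.get("JSESSIONID")
```

Using a requests.Session also keeps the cookies in sync automatically, so the Cookie header would not need to be pasted by hand.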

No response for XHR request in python with requests.get()

I want to scrape German poll data from a server. Here, I search for an example street (Straße), "Judengasse".
I have been trying to reproduce this. Unfortunately, the link from the reference is not intact anymore, so I couldn't directly compare it to my problem. Since I am fairly inexperienced, I do not know what is exactly needed to reproduce the request that is submitted via the web interface.
I don't know which header attributes are needed for my request to work and which of them might be redundant. In Chrome's inspect mode I see that in my case there are more header attributes than in the referenced example.
My code so far (which does not work) from trying to reproduce the SE post:
import requests
url = 'https://online-service2.nuernberg.de/Finder/action/getItems'
data = {
"finder":"Wahlraumfinder",
"strasse":"Judengasse",
"hausnummer":"0"
}
headers = {
'Host': 'online-service2.nuernberg.de',
'Referer': 'https://online-service2.nuernberg.de/Finder/?Wahlraumfinder',
'Accept': '*/*',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7',
'Connection': 'keep-alive',
'Content-Length': '312',
'Content-Type': 'multipart/form-data; boundary=----WebKitFormBoundaryeJZfrnZATOw6B5By',
'DNT': '1',
'Host': 'online-service2.nuernberg.de',
'Referer': 'https://online-service2.nuernberg.de/Finder/?Wahlraumfinder',
'Sec-Fetch-Mode': 'cors',
'Sec-Fetch-Site': 'same-origin',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36',
'X-Requested-With': 'XMLHttpRequest'
}
response = requests.get(url, data=data, headers=headers)
I don't get a response. I added all request headers to headers.
I am not sure whether more headers are needed.
Further, I am not sure if the url is correct.
I am looking to generate output of the following form, for this specific request "Judengasse":
Nr 0652
Wahllokal Willstätt.-Gym., Innerer Laufer Platz 11
This corresponds to entering "Judengasse" in the search bar, clicking the search button "Suche", and extracting parts of the first output box, "Wahl-/Stimmbezirk".
When I look at the XHR in Chrome's dev mode:
General
Request URL: https://online-service2.nuernberg.de/Finder/action/getItems
Request Method: POST
Status Code: 200 OK
Remote Address: 193.22.166.102:443
Referrer Policy: no-referrer-when-downgrade
Response Header
Connection: Keep-Alive
Content-Length: 1149
Content-Type: application/json;charset=UTF-8
Date: Wed, 04 Dec 2019 00:21:30 GMT
Keep-Alive: timeout=5, max=100
Server: Apache
Request Header
Accept: */*
Accept-Encoding: gzip, deflate, br
Accept-Language: de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7
Connection: keep-alive
Content-Length: 312
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryx2jHYJHo3ejnKw0l
DNT: 1
Host: online-service2.nuernberg.de
Origin: https://online-service2.nuernberg.de
Referer: https://online-service2.nuernberg.de/Finder/?Wahlraumfinder
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36
X-Requested-With: XMLHttpRequest
Form Data
------WebKitFormBoundaryx2jHYJHo3ejnKw0l
Content-Disposition: form-data; name="action"
"action/getItems"
------WebKitFormBoundaryx2jHYJHo3ejnKw0l
Content-Disposition: form-data; name="data"
{"finder":"Wahlraumfinder","strasse":"Judengasse","hausnummer":"0"}
------WebKitFormBoundaryx2jHYJHo3ejnKw0l--
Thank you for reading.
After some research I finally managed to get a 200 response from this server.
Firstly, requests.get in this case should be replaced by requests.post, since you want to replicate an HTTP POST request, according to the info you got from Chrome's dev mode, "General" section.
Secondly, from the headers we can see that the data is sent as a "multipart/form-data" request. As far as I understand, this type of request is typically used to send files rather than regular form data (more about this type of request here).
So, I converted the string sent through the POST request to bytes (by prefixing the literals with b) and passed it to the files parameter of the request. This parameter is read as (name, value) pairs, which is why the payload is wrapped as a tuple inside a set, hence the {(None, data)}.
I also substituted the street name into data as a parameter, so it's easier to change.
I got this working code (I'm using my browser's request):
import requests
url = 'https://online-service2.nuernberg.de/Finder/action/getItems'
street = b'Judengasse'
data = b'-----------------------------15242581323522\r\n' \
b'Content-Disposition: form-data; name=\"action\"\r\n\r\n' \
b'\"action/getItems\"\r\n-----------------------------15242581323522\r\n' \
b'Content-Disposition: form-data; name="data"\r\n\r\n' \
b'{\"finder\":\"Wahlraumfinder\",\"strasse\":\"%s\",\"hausnummer\":\"0\"}\r\n' \
b'-----------------------------15242581323522--' % street
headers = {"Host": "online-service2.nuernberg.de",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0",
"Accept": "*/*",
"Accept-Language": "en-US,en;q=0.5",
"Accept-Encoding": "gzip, deflate, br",
"X-Requested-With": "XMLHttpRequest",
"Content-Type": "multipart/form-data; boundary=---------------------------15242581323522",
"Content-Length": "321",
"Origin": "https://online-service2.nuernberg.de",
"DNT": "1",
"Connection": "keep-alive",
"Referer": "https://online-service2.nuernberg.de/Finder/?Wahlraumfinder",
}
multipart_data = {(None, data,)}
response = requests.post(url, files=multipart_data, headers=headers)
print(response.text)
I got this raw response:
{"id":"8c4f7a57-1bd6-423a-8ab8-e1e40e1e3852","items":[{"zeilenbeschriftung":"Wahl-/Stimmbezirk","linkAdr":null,"mapUrl":"http://online-service.nuernberg.de/Themenstadtplan/sta_gebietsgliederungen.aspx?p_urlvislayer=Stimmbezirke&XKoord=4433503.05&YKoord=5480253.301&Zaehler=1&Textzusatz=Judengasse+0&z_XKoord=4433670.0&z_YKoord=5480347.0&z_Zaehler=1&z_Textzusatz=Wahllokal%20Willst%E4tt.-Gym.%2C+Innerer+Laufer+Platz+11","items":["0652","Judengasse, Neue Gasse","Willstätt.-Gym., Innerer Laufer Platz 11","Zi. 101 ,1. OG",null]},{"zeilenbeschriftung":"Stimmkreis Landtagswahl","linkAdr":null,"mapUrl":"http://online-service.nuernberg.de/Themenstadtplan/sta_gebietsgliederungen.aspx?p_urlvislayer=Stimmkreis_LTW&XKoord=4433503.05&YKoord=5480253.301&Zaehler=1&Textzusatz=Judengasse+0&p_scale=100000","items":["501","Nürnberg-Nord"]},{"zeilenbeschriftung":"Wahlkreis Bundestagswahl","linkAdr":null,"mapUrl":"http://online-service.nuernberg.de/Themenstadtplan/sta_gebietsgliederungen.aspx?p_urlvislayer=Wahlkreis_BTW&XKoord=4433503.05&YKoord=5480253.301&Zaehler=1&Textzusatz=Judengasse+0&p_scale=150000","items":["244","Nürnberg-Nord"]}],"status":200}
which you can easily parse to get the result you expect:
print(response.json()["items"][0]["items"])
yielding:
['0652', 'Judengasse, Neue Gasse', 'Willstätt.-Gym., Innerer Laufer Platz 11', 'Zi. 101 ,1. OG', None]
Hope it helps.
Regards
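As an alternative to building the multipart body by hand, the files parameter of requests can encode the two form fields itself. The sketch below is untested against the live server and only prepares the request locally. Each (None, value) tuple is sent as a plain form field without a filename, and the field names and values are taken from the Form Data shown in the question.

```python
import json
import requests

url = "https://online-service2.nuernberg.de/Finder/action/getItems"
query = {"finder": "Wahlraumfinder", "strasse": "Judengasse", "hausnummer": "0"}

# (None, value) sends a plain form field without a filename.
fields = {
    "action": (None, '"action/getItems"'),
    "data": (None, json.dumps(query, separators=(",", ":"))),
}

# No Content-Type or Content-Length here: requests generates the boundary
# and computes the length when it encodes the multipart body.
headers = {"X-Requested-With": "XMLHttpRequest"}

prepared = requests.Request("POST", url, files=fields, headers=headers).prepare()
print(prepared.headers["Content-Type"])
print(prepared.body.decode("utf-8"))

# To send it (assumption: the server accepts this encoding):
# response = requests.Session().send(prepared)
# print(response.json()["items"][0]["items"])
```

Leaving Content-Type out of the headers matters here, because a hand-written boundary would no longer match the one requests puts in the body.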

Why is $request->file('document') not being recognized? Always is null

I cannot pinpoint why my application fails to send a file with the request body. I have tried all manner of request header configurations, but to no avail. I know it's not my endpoint in Laravel, because Postman works just fine with it.
Various header combinations I have tried:
'Content-Type': 'application/x-www-form-urlencoded',
'Content-Transfer-Encoding': 'multipart/form-data'
OR
'Content-Type': 'multipart/form-data',
'Content-Transfer-Encoding': 'multipart/form-data'
OR
'Content-Type': 'application/json; charset=UTF-8',
'Content-Transfer-Encoding': 'multipart/form-data'
Whenever my endpoint checks whether $request->file('document') is null, it always comes back true.
Here is my api code:
public function store(Request $request)
{
$value = $request->file('document')->storeAs(
$request->input('path'), $request->input('name')
);
return response()->json($value, 201);
}
Here is the last set of headers that was used and failed:
Accept: */*
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Connection: keep-alive
Content-Length: 164
Content-Transfer-Encoding: multipart/form-data
Content-Type: multipart/form-data; charset=UTF-8
Cookie: remember_web_59ba36addc2b2f9401580f014c7f58ea4e30989d=eyJpdiI6IjhZMXRwM3BSYnlsMUdlUHQ1OEVzZkE9PSIsInZhbHVlIjoibFJBV3AxaHU0T3BJY1M5UGRQZG5YdmhxTndWYXRRRHFsZlhEZ0tNa1NqWnlFZndwUGdkeGxFNzZXVW53OUxKMWJ0Q0s3VkFxZTM5T1dKUTdQVE5HbHVhcHBoS29rMllQb1wvbUhKeWFMcjdOOGU3elRYWWlyV3daY1duUUZCb1k1amE3aEVHWEN5SkJLZFVCNnNlRlJIa0hVT2FGb1poVjhCZzVOR21EMUttND0iLCJtYWMiOiI2YjFjYTA5MzcyYzcxMDk4OWFmNzJlNTMzMzQ0ODRkYTZmYzEzZDNjYmQ2YTdiNmZhZWFhODc2NWM0MWExMzZiIn0%3D; XSRF-TOKEN=eyJpdiI6Ikh4R1JhSXJ4M1IycmJTNmFsRjRic0E9PSIsInZhbHVlIjoib0xpQzZqaHRRRFQ3V0RQU0lVT3VSdmU4RFwvS3MwSWpSeTdmOURVZE9kRlhPaTBFeWlBOHljQ1F4aGt0VEFIbWwiLCJtYWMiOiI3MjcyMGM5YzIwZjE5NTFkOTQyNjA3MDlmOTJjMjY0OTg2NGViZWY5NzYwZmJlNGEyYmM0MzFmNDYxMDRlN2U5In0%3D; conversion_session=eyJpdiI6InIwc2NRMzZhc1RZdWZtMDl0OFVMaHc9PSIsInZhbHVlIjoiNmY5MjJUYTNteW5zVG9MVmlmXC90a0FCaTlEY3VhS0w2UXYreUtmdG5yUE5IUnpWVE1qTWthemdnSzRKbmE5NWEiLCJtYWMiOiJjNDkzZTVlNzE0YjBhMDRiNDU3ZmQxMzNlZDkzMjQ1MWQ2YTcxMzYxODU0ZGMyYTFkOTZhYjA2ZmUwNDZhOGQ2In0%3D
Host: conversion:8000
Origin: http://conversion:8000
Referer: http://conversion:8000/accounts/1/projects/1?_token=
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36
X-XSRF-TOKEN: // Not pasted here
I discovered that I was assigning my post data incorrectly for the files. A tutorial had instructed me incorrectly, and I found another one that showed me the correct way.

Unable to Send Data in header to Spring Boot application in Angular 6

I am trying to call my Spring Boot APIs from Angular 6. I am sending some data as part of the headers, like this:
const headers = new HttpHeaders({
'X-TenantID': tenantId,
'Accept': 'application/json'
});
this.httpClient.get(this.constants.urls.TENANT.VALIDATE + '/' + tenantId, {headers: headers });
At the back end, I am using an interceptor, and there I am trying to get 'X-TenantID' from my request headers like this:
if (request.getHeader("X-TenantID") != null) {
String tenantName = request.getHeader("X-TenantID");
ThreadLocaleStorage.setTenant(tenantName);
return tenantName;
}
Unfortunately, it always returns null for 'X-TenantID'. When I print all the headers, I get the following:
host
localhost:8080
connection
keep-alive
access-control-request-method
GET
origin
http://localhost:4200
user-agent
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36
dnt
1
access-control-request-headers
**x-tenantid**
accept
*/*
accept-encoding
gzip, deflate, br
accept-language
en-US,en;q=0.9
It's clear that x-tenantid is present in the headers, so why am I not getting its value?
Please help me: how can I get this value from the headers?
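A possible reading of the header dump, offered as a hedged observation: it contains access-control-request-method and access-control-request-headers, which browsers send on a CORS preflight (OPTIONS) request. A preflight only names the custom header it wants to send; it does not carry the header itself, so looking up X-TenantID on that request would return null. The sketch below shows the check in Python; the function name is illustrative, and the OPTIONS method is an assumption because the dump does not show the request method.

```python
def is_cors_preflight(method: str, headers: dict) -> bool:
    """Return True if the request looks like a CORS preflight."""
    names = {name.lower() for name in headers}
    return (
        method.upper() == "OPTIONS"
        and "origin" in names
        and "access-control-request-method" in names
    )


# Headers from the dump above; the OPTIONS method is an assumption,
# because the dump does not show the request method.
logged = {
    "host": "localhost:8080",
    "origin": "http://localhost:4200",
    "access-control-request-method": "GET",
    "access-control-request-headers": "x-tenantid",
    "accept": "*/*",
}

print(is_cors_preflight("OPTIONS", logged))
# The custom header is only named in a preflight, never carried as a header.
print("x-tenantid" in logged)
```

If this is what is happening, the interceptor would need to let preflight requests through without a tenant and read X-TenantID only on the actual GET that follows.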

scrapy get data by post method but got 403

I used both F12(Chrome) and postman to check the request and its detailed info on site
http://www.zhihu.com/
(email:jianguo.bai#hirebigdata.cn, password:wsc111111), then go to
http://www.zhihu.com/people/hynuza/columns/followed
I want to get all the columns that the user Hynuza has followed, currently 105. When the page opens, only 20 of them are shown, and I need to scroll down to load more. Each time I scroll down, the request details look like this:
Remote Address:60.28.215.70:80
Request URL:http://www.zhihu.com/node/ProfileFollowedColumnsListV2
Request Method:POST
Status Code:200 OK
Request Headers
Accept:*/*
Accept-Encoding:gzip,deflate
Accept-Language:en-US,en;q=0.8,zh-CN;q=0.6,zh;q=0.4
Connection:keep-alive
Content-Length:157
Content-Type:application/x-www-form-urlencoded; charset=UTF-8
Cookie:_xsrf=f1460d2580fbf34ccd508eb4489f1097; q_c1=867d4a58013241b7b5f15b09bbe7dc79|1419217763000|1413335199000; c_c=2a45b1cc8f3311e4bc0e52540a3121f7; q_c0="MTE2NmYwYWFlNmRmY2NmM2Q4OWFkNmUwNjU4MDQ1OTN8WXdNUkVxRDVCMVJaODNpOQ==|1419906156|cb0859ab55258de9ea95332f5ac02717fcf224ea"; __utma=51854390.1575195116.1419486667.1419902703.1419905647.11; __utmb=51854390.7.10.1419905647; __utmc=51854390; __utmz=51854390.1419905647.11.9.utmcsr=zhihu.com|utmccn=(referral)|utmcmd=referral|utmcct=/people/hynuza/columns/followed; __utmv=51854390.100--|2=registration_date=20141222=1^3=entry_date=20141015=1
Host:www.zhihu.com
Origin:http://www.zhihu.com
Referer:http://www.zhihu.com/people/hynuza/columns/followed
User-Agent:Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36
X-Requested-With:XMLHttpRequest
Form Data
method:next
params:{"offset":20,"limit":20,"hash_id":"18c79c6cc76ce8db8518367b46353a54"}
_xsrf:f1460d2580fbf34ccd508eb4489f1097
Then I used Postman to simulate the request like this:
As you can see, it returned what I wanted, and it worked even after I logged out of the site.
Based on all of this, I wrote my spider like this:
# -*- coding: utf-8 -*-
import scrapy
import urllib
from scrapy.http import Request
class PostSpider(scrapy.Spider):
    name = "post"
    allowed_domains = ["zhihu.com"]
    start_urls = (
        'http://www.zhihu.com',
    )
    def __init__(self):
        super(PostSpider, self).__init__()
    def parse(self, response):
        return scrapy.FormRequest.from_response(
            response,
            formdata={'email': 'jianguo.bai#hirebigdata.cn', 'password': 'wsc111111'},
            callback=self.login,
        )
    def login(self, response):
        yield Request("http://www.zhihu.com/people/hynuza/columns/followed",
                      callback=self.parse_followed_columns)
    def parse_followed_columns(self, response):
        # here deal with the first 20 divs
        params = {"offset": "20", "limit": "20", "hash_id": "18c79c6cc76ce8db8518367b46353a54"}
        method = 'next'
        _xsrf = 'f1460d2580fbf34ccd508eb4489f1097'
        data = {
            'params': params,
            'method': method,
            '_xsrf': _xsrf,
        }
        r = Request(
            "http://www.zhihu.com/node/ProfileFollowedColumnsListV2",
            method='POST',
            body=urllib.urlencode(data),
            headers={
                'Accept': '*/*',
                'Accept-Encoding': 'gzip,deflate',
                'Accept-Language': 'en-US,en;q=0.8,zh-CN;q=0.6,zh;q=0.4',
                'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
                'Cache-Control': 'no-cache',
                'Cookie': '_xsrf=f1460d2580fbf34ccd508eb4489f1097; '
                          'c_c=2a45b1cc8f3311e4bc0e52540a3121f7; '
                          '__utmt=1; '
                          '__utma=51854390.1575195116.1419486667.1419855627.1419902703.10; '
                          '__utmb=51854390.2.10.1419902703; '
                          '__utmc=51854390; '
                          '__utmz=51854390.1419855627.9.8.utmcsr=zhihu.com|utmccn=(referral)|utmcmd=referral|utmcct=/;'
                          '__utmv=51854390.100--|2=registration_date=20141222=1^3=entry_date=20141015=1;',
                'User-Agent': 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) '
                              'Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36',
                'host': 'www.zhihu.com',
                'Origin': 'http://www.zhihu.com',
                'Connection': 'keep-alive',
                'X-Requested-With': 'XMLHttpRequest',
            },
            callback=self.parse_more)
        r.headers['Cookie'] += response.request.headers['Cookie']
        print r.headers
        yield r
        print "after"
    def parse_more(self, response):
        # here is where I want to get the returned divs
        print response.url
        followers = response.xpath("//div[@class='zm-profile-card "
                                   "zm-profile-section-item zg-clear no-hovercard']")
        print len(followers)
Then I got 403 like this:
2014-12-30 10:34:18+0800 [post] DEBUG: Crawled (403) <POST http://www.zhihu.com/node/ProfileFollowedColumnsListV2> (referer: http://www.zhihu.com/people/hynuza/columns/followed)
2014-12-30 10:34:18+0800 [post] DEBUG: Ignoring response <403 http://www.zhihu.com/node/ProfileFollowedColumnsListV2>: HTTP status code is not handled or not allowed
So it never enters parse_more.
I've been working on this for two days and still have nothing. Any help or advice would be appreciated.
The login sequence is correct. However, the parse_followed_columns() method totally corrupts the session.
You cannot use hardcoded values for data['_xsrf'] and params['hash_id'].
You should find a way to read this information directly from the HTML content of the previous page and inject the values dynamically.
Also, I suggest you remove the headers parameter from this request, which can only cause trouble.
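The advice above can be sketched roughly as follows, in Python 3 and outside Scrapy so that it runs on its own. The sample markup, the helper names and the hash_id value are hypothetical: the sketch assumes the token sits in a hidden input named _xsrf, which would need to be checked against the real page. It also sends params as a JSON string, matching the Form Data shown in the question.

```python
import json
import re
from urllib.parse import urlencode, parse_qs

# Hypothetical markup: assumes the page carries the token in a hidden input.
SAMPLE_HTML = '<form><input type="hidden" name="_xsrf" value="abc123"/></form>'


def extract_xsrf(html: str) -> str:
    """Pull the _xsrf token out of the page instead of hard-coding it."""
    match = re.search(r'name="_xsrf"\s+value="([^"]+)"', html)
    if match is None:
        raise ValueError("_xsrf token not found")
    return match.group(1)


def build_body(html: str, hash_id: str, offset: int = 20, limit: int = 20) -> str:
    """Form-encode the payload; params is sent as a JSON string."""
    params = {"offset": offset, "limit": limit, "hash_id": hash_id}
    return urlencode({
        "method": "next",
        "params": json.dumps(params),
        "_xsrf": extract_xsrf(html),
    })


# hash_id would also be read from the previous page; this value is made up.
body = build_body(SAMPLE_HTML, hash_id="hypothetical-hash-id")
print(body)
```

Inside the spider, the same values could be read from response with an XPath or CSS selector and the body passed to the POST request, leaving cookies to Scrapy's own cookie handling.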
