I'm having a hard time receiving an image over a socket. I think the problem is that the HTTP response contains both a header and the actual image data, and the two need to be handled differently.
This is the code:
import socket

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('www.py4inf.com', 80))
mysock.send(
    'GET http://www.py4inf.com/cover.jpg HTTP/1.0\n\n'.encode('utf-8'))
count = 0
fhand = open("stuff.jpg", "wb")
while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    fhand.write(data)
mysock.close()
fhand.close()
Yes, there is a header. It ends at the first \r\n\r\n sequence. Once you see that sequence, write everything after it to the file. Here's a crude fix:
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as mysock:
    mysock.connect(('www.py4inf.com', 80))
    mysock.send(b'GET http://www.py4inf.com/cover.jpg HTTP/1.0\n\n')
    header = b''
    while True:
        data = mysock.recv(512)
        if not data:
            raise RuntimeError('no header?')
        header += data
        # end-of-header in buffer yet?
        eoh = header.find(b'\r\n\r\n')
        if eoh != -1:
            break
    # split the header off and keep data read after it.
    header, data = header[:eoh+4], header[eoh+4:]
    print(header.decode())
    with open("stuff.jpg", "wb") as fhand:
        fhand.write(data)
        while True:
            data = mysock.recv(512)
            if not data:
                break
            fhand.write(data)
Here's the header. Note that the content length is in the header, so if you were to send an HTTP request with keep-alive, you would have to read exactly that many bytes after the header. Since Connection: close is specified, you only have to read until no more data is received.
HTTP/1.1 200 OK
Date: Sun, 22 May 2016 23:22:20 GMT
Server: Apache
Last-Modified: Fri, 04 Dec 2015 19:05:04 GMT
ETag: "b294001f-111a9-526172f5b7cc9"
Accept-Ranges: bytes
Content-Length: 70057
Connection: close
Content-Type: image/jpeg
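For the keep-alive case, where the body has to be read by length rather than until the connection closes, a minimal sketch could look like the following. The helper names parse_content_length and read_body are made up for illustration, and recv stands for any callable that behaves like mysock.recv.

```python
from typing import Callable, Optional


def parse_content_length(header: bytes) -> Optional[int]:
    """Return the Content-Length value from a raw header block, or None."""
    # Skip the status line, then inspect each "Name: value" line.
    for line in header.split(b'\r\n')[1:]:
        name, sep, value = line.partition(b':')
        if sep and name.strip().lower() == b'content-length':
            return int(value.strip())
    return None


def read_body(recv: Callable[[int], bytes], leftover: bytes, length: int) -> bytes:
    """Read exactly `length` body bytes, starting with bytes already received."""
    body = bytearray(leftover[:length])
    while len(body) < length:
        # Never ask for more than what is still missing.
        chunk = recv(min(512, length - len(body)))
        if not chunk:
            raise ConnectionError('connection closed before the full body arrived')
        body += chunk
    return bytes(body)
```

With the variables from the fix above, this would be used roughly as length = parse_content_length(header) followed by body = read_body(mysock.recv, data, length).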
I am sending a GET request to a host over a TCP socket, but I keep getting "301 Moved Permanently" from sites that use HTTPS.
I have tried changing the port from 80 to 443.
I have also tried the ssl library.
But I keep getting the 301 code.
This is the code:
import socket
import click

#click.command()
#click.option("-h", "--host", prompt=True)
#click.option("-p", "--port", type=int, prompt=True, default=80)
def cli(host, port):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((host, port))
    message = f"GET / HTTP/1.1\r\nHost:{host}\r\nConnection: close\r\n\r\n"
    request = message.encode('utf-8')
    sent = 0
    while sent < len(request):
        sent = sent + sock.send(request[sent:])
    response = b""
    while True:
        chunk = sock.recv(4096)
        if len(chunk) == 0:  # If no more data received, quitting
            break
        response = response + chunk
    response_decode = response.decode('latin-1')
    sock.close()
    print(response_decode)
This is the response when I connect to www.eltiempo.com on port 80:
HTTP/1.1 301 Moved Permanently
Server: AkamaiGHost
Content-Length: 0
Location: https://www.eltiempo.com/
Cache-Control: max-age=120
Expires: Sat, 12 Feb 2022 18:24:28 GMT
Date: Sat, 12 Feb 2022 18:22:28 GMT
Connection: close
Server-Timing: cdn-cache; desc=HIT
Server-Timing: edge; dur=1
version: desktop
I get this error with port 443
chunk = sock.recv(4096)
ConnectionResetError: [Errno 104] Connection reset by peer
Please tell me how to improve my code to avoid this 301 code.
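The Location header in the 301 response points to an https:// URL, and a server listening on port 443 expects a TLS handshake before any HTTP is sent, which is a likely reason for the reset when a plain-text request arrives there. A minimal sketch using the standard ssl module is shown below; the function names build_request and fetch_https are illustrative, not part of the original code, and the 10-second timeout is an arbitrary choice.

```python
import socket
import ssl


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request that asks the server to close afterwards."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def fetch_https(host: str, path: str = "/", port: int = 443) -> bytes:
    """Fetch a page over TLS and return the raw response (headers and body)."""
    context = ssl.create_default_context()  # verifies the server certificate
    with socket.create_connection((host, port), timeout=10) as raw_sock:
        # wrap_socket performs the TLS handshake; server_hostname enables SNI.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            tls_sock.sendall(build_request(host, path))
            response = b""
            while True:
                chunk = tls_sock.recv(4096)
                if not chunk:  # server closed the connection
                    break
                response += chunk
    return response
```

Calling print(fetch_https("www.eltiempo.com").decode("latin-1")) would then print the response received over TLS. Whether the site answers with 200 or with another redirect depends on the server.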
I need to check whether a response is an image.
For work, I need to generate photo URLs that may or may not exist and record the ones that actually contain an image.
When a generated URL doesn't point to a photo, the website responds with an HTML page whose body is:
<body>No File Found</body>
and response.status is still 200.
The response headers don't carry any useful information to tell the two cases (image and No File Found) apart.
For instance:
HTTP/1.1 200 OK
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Transfer-Encoding: chunked
Expires: 0
Server: Microsoft-IIS/8.5
X-Powered-By: ASP.NET
X-Frame-Options: AllowAll
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: *
Date: Tue, 13 Aug 2019 01:44:40 GMT
The way that I found to check if the response is an image for this case was:
try:
    no_file_found = response.xpath("/html/body[contains(., 'No File Found')]")
except:
    photo_url = response.url
    photo = PhotoItem()
    photo['id'] = id
    photo['url'] = photo_url
    yield photo
This works because when the response is an image, the line
no_file_found = response.xpath("/html/body[contains(., 'No File Found')]")
throws this exception:
raise NotSupported("Response content isn't text")
I know that this isn't an elegant solution, but in this context it works.
Question
My question is whether there is a more elegant way to solve this problem that doesn't rely on try/except.
Note that I don't need to download the image; I just need to record the valid URL.
Any suggestion is welcome.
Thanks in advance!!!
The simplest way would probably be to just check the type of the response:
from scrapy.http.response.text import TextResponse

if not isinstance(response, TextResponse):
    # it's probably an image; do image stuff
    ...
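If checking the response class feels too indirect, another option is to look at the first bytes of the body, since common image formats start with fixed signatures. The sketch below covers only JPEG, PNG, and GIF, and looks_like_image is a hypothetical helper name, not a Scrapy API.

```python
# Leading bytes ("magic numbers") of a few common image formats.
IMAGE_SIGNATURES = (
    b"\xff\xd8\xff",       # JPEG
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"GIF87a",             # GIF (1987 version)
    b"GIF89a",             # GIF (1989 version)
)


def looks_like_image(body: bytes) -> bool:
    """Return True if the body starts with a known JPEG, PNG, or GIF signature."""
    # bytes.startswith accepts a tuple and matches any of its entries.
    return body.startswith(IMAGE_SIGNATURES)
```

In a spider callback this could be used as if looks_like_image(response.body): followed by the code that records response.url.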
I have run into the same issue as this person: X-Drupal-Cache for Drupal 7 website always hits MISS, and I cannot find a way out.
I am running Drupal 7 - Pressflow
and
Varnish 4.0
When I curl I get this result:
HTTP/1.1 200 OK
Date: Fri, 08 Jul 2016 17:45:08 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Set-Cookie: __cfduid=db5fd757e7485622ac16af86f292603f51467999908; expires=Sat, 08-Jul-17 17:45:08 GMT; path=/; domain=.adland.tv; HttpOnly
X-Content-Type-Options: nosniff
**X-Drupal-Cache: MISS**
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Cache-Control: public, max-age=86400
X-Content-Type-Options: nosniff
Content-Language: en
X-Generator: Drupal 7 (http://drupal.org)
Last-Modified: Fri, 08 Jul 2016 17:41:27 GMT
Vary: Accept-Encoding
X-Varnish: 196743 3
Age: 213
Via: 1.1 varnish-v4
**X-Cache: HIT**
X-Cache-Hits: 22
Server: cloudflare-nginx
CF-RAY: 2bf55922d49b23d8-IAD
isvarnishworking.com tells me: "You deserve a gold star, here you go: gold star badge"....
Meanwhile, the "Varnish Indicator Chrome Extension" suggested in the linked Drupal.org thread tells me Varnish missed on every single page of my website, regardless of whether I am logged in or not.
If I turn off the Drupal cache for anonymous users at admin/config/development/performance, Varnish will not work at all. If I set different minimum cache lifetimes there, it makes no difference.
In my settings.php I have this:
$conf['varnish_version'] = 4;
$conf['reverse_proxy'] = True;
$conf['reverse_proxy_addresses'] = array('127.0.0.1');
$conf['page_cache_invoke_hooks'] = FALSE;
$conf['page_cache_maximum_age'] = 86400;
$conf['cache_backends'][] = 'sites/all/modules/varnish/varnish.cache.inc';
$conf['cache_class_cache_page'] = 'VarnishCache';
$conf['reverse_proxy_header'] = 'HTTP_X_FORWARDED_FOR';
$conf['omit_vary_cookie'] = True;
$conf['drupal_http_request_fails'] = FALSE;
and this
$conf['cache_backends'][] = 'sites/all/modules/filecache/filecache.inc';
$conf['cache_backends'][] = 'sites/all/modules/authcache/authcache.cache.inc';
$conf['cache_backends'][] = 'sites/all/modules/authcache/modules/authcache_builtin/authcache_builtin.cache.inc';
$conf['cache_class_cache_page'] = 'DrupalFileCache';
while this has been commented out from the Varnish config in settings.php because if I don't, Varnish fails:
//$conf['cache'] = 1;
//$conf['cache_lifetime'] = 01080;
I have turned off all modules that could interfere, such as captcha modules, and I will note that the statistics won't count node hits correctly now, so something is being cached...
The VCL I use is grabbed straight from this github master with minimum changes
How can I troubleshoot this X-Drupal-Cache: MISS issue?
Your backend is clearly sending cookies:
Set-Cookie: __cfduid=db5fd757e7485622ac16af86f292603f51467999908; expires=Sat, 08-Jul-17 17:45:08 GMT; path=/; domain=.adland.tv; HttpOnly
In its default configuration, Varnish will not cache an object coming from the backend with a Set-Cookie header present. Also, if the client sends a Cookie header, Varnish will bypass the cache and go directly to the backend.
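As a rough illustration of those two rules, the sketch below models only the Cookie and Set-Cookie checks. It is a simplification under stated assumptions: TTL, Cache-Control handling, and any custom VCL are ignored, and default_varnish_would_cache is a made-up name, not a Varnish API.

```python
def default_varnish_would_cache(request_headers: dict, response_headers: dict) -> bool:
    """Simplified model of the two cookie rules described above.

    Only the Cookie / Set-Cookie checks are modelled; everything else that
    influences caching (TTL, Cache-Control, custom VCL) is ignored.
    """
    # Header names are case-insensitive, so compare them in lower case.
    request_names = {name.lower() for name in request_headers}
    response_names = {name.lower() for name in response_headers}
    if "cookie" in request_names:
        return False  # the request is passed straight to the backend
    if "set-cookie" in response_names:
        return False  # the backend response is not stored
    return True
```

Feeding it the header names from a curl -I run against the backend directly, rather than through any CDN in front of it, gives a quick hint as to whether cookies are what keeps a page out of the cache.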
I'm using JMeter and would like to identify the end time of each request for each user.
Please take a look at my test plan:
Thread group: 2 users
loop:1
2 HTTP request (request_1, request_2)
When I run the web performance test, the View Results Tree shows 4 results (2 for request_1, 2 for request_2).
For request_2, 1 passed and 1 failed. In the request table of the results tree, I see:
Thread Name: jp#gc - Stepping Thread Group 1-1
Sample Start: 2014-04-18 09:28:06 ICT
Load time: 1100554
Latency: 550450
Size in bytes: 408190
Headers size in bytes: 4774
Body size in bytes: 403416
Sample Count: 1
Error Count: 0
Response code: 200
Response message: OK
Response headers:
HTTP/1.1 200 OK
Date: Fri, 18 Apr 2014 02:28:15 GMT
Server: Apache
X-Powered-By: PHP/5.3.3
Set-Cookie: ls23166422738597439695-runtime-publicportal=h4knpfldt76e3kvmunrn5i4u16; path=/limesurvey/
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Cache-Control: no-store, no-cache, must-revalidate
Pragma: no-cache
P3P: CP="IDC DSP COR ADM DEVi TAIi PSA PSD IVAi IVDi CONi HIS OUR IND CNT"
Last-Modified: Fri, 18 Apr 2014 02:36:09 GMT
Cache-Control: post-check=0, pre-check=0
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8
HTTPSampleResult fields:
ContentType: text/html; charset=utf-8
DataEncoding: utf-8
The questions are:
How can I identify the time at which request_2 failed, and how can I display the end time of each request for each user?
How can I display information in JMeter's log panel (with debug logging enabled in the GUI), such as "This is an error... due to..."?
Also, in the log panel (with debug logging active in the GUI), the log entries sometimes stop at Thread 1-n (n=1,2...), and then after about 30 seconds the log continues. During that pause, is the web server in an error state, and is JMeter still sending requests or waiting for the web server to respond?
Thanks.
It can be done via a Beanshell PostProcessor, which you can add as a child of any "interesting" request (a PostProcessor runs after the sampler, so prev refers to that request's result). Example code would look like:
import java.util.Date;
long end_time_ms = prev.getEndTime(); // obtain sampler end time (in milliseconds from 1st Jan 1970)
Date end_time_date = new Date(end_time_ms); //convert it to human-readable date if you prefer
String response_message = prev.getResponseMessage(); // get initial response message
StringBuilder response = new StringBuilder(); // initialize StringBuilder to construct new response
response.append(response_message); // add initial response message
response.append(System.getProperty("line.separator")); // add new line
response.append("Thread finished at: ").append(end_time_date); // add thread finish date
prev.setResponseMessage(response.toString()); // set new response message
log.info("Thread finished at: " + end_time_date); // to print it to the log
See above for Beanshell code and image below for UI impact
Never use the GUI for anything apart from developing or debugging tests. If you want to add something to the log, use log.info("something"); as above or JMeter's __log() function.
I am making some HTTP requests with WinHttp.lib.
When I GET data with a Range header, such as
"GET someURL\r\n Range: bytes=4096-8191,0-4095",
the response data I receive after the response header looks like this (according to RFC 2616):
================================
--46228a661764c4210
Content-type: text/plain
Content-range: bytes 4096-8191/14065
...Content Data of Range #1
--46228a661764c4210
Content-type: text/plain
Content-range: bytes 0-4095/14065
...Content Data of Range #2
--46228a661764c4210--
Is there an efficient way to extract exactly the content data of each range while the data is being received as a stream?
Have you tried Fiddler?
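For the multipart/byteranges body shown in the question, one possible approach is to walk the parts in order and use each part's Content-Range header to know exactly how many data bytes follow, instead of searching the data for the next boundary. The sketch below is in Python rather than WinHTTP C/C++, purely to show the parsing logic. It assumes the whole body is already buffered, that the boundary string has been taken from the response's Content-Type header, and parse_byteranges is a hypothetical helper name.

```python
import re
from typing import Dict, Tuple

# Matches e.g. "Content-range: bytes 4096-8191/14065" inside a part's headers.
CONTENT_RANGE = re.compile(rb"content-range:\s*bytes\s+(\d+)-(\d+)/", re.IGNORECASE)


def parse_byteranges(body: bytes, boundary: bytes) -> Dict[Tuple[int, int], bytes]:
    """Map (first_byte, last_byte) to the data of each part in the body."""
    delimiter = b"--" + boundary
    parts = {}
    pos = body.find(delimiter)
    while pos != -1:
        pos += len(delimiter)
        if body.startswith(b"--", pos):
            break  # closing delimiter "--boundary--"
        header_end = body.find(b"\r\n\r\n", pos)
        if header_end == -1:
            raise ValueError("part headers are incomplete")
        match = CONTENT_RANGE.search(body, pos, header_end)
        if match is None:
            raise ValueError("part has no Content-Range header")
        first, last = int(match.group(1)), int(match.group(2))
        start = header_end + 4
        end = start + (last - first + 1)  # ranges are inclusive
        if end > len(body):
            raise ValueError("part data is truncated")
        parts[(first, last)] = body[start:end]
        # Continue the search after the data, so boundary-like bytes
        # inside the data itself are never examined.
        pos = body.find(delimiter, end)
    return parts
```

Sorting the keys and concatenating the values, as in b"".join(parts[key] for key in sorted(parts)), puts the ranges back in file order. For true streaming, the same steps could be applied incrementally: buffer until a part's headers are complete, then copy the announced number of bytes as they arrive.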