Python: User-agent information urllib retrieve() - user-agent

I am attempting to use the following code to save an image from a URL using Python:
image = urllib.URLopener()
image.retrieve("http://example.com/image.jpg","image.jpg")
The image saves as expected. I was wondering whether it is possible to assign a User-agent using the urllib method.

I don't think you can add custom headers while using urllib, but I know there are multiple ways to do it using urllib2.
Here is one way:
import urllib2
headers = { 'User-Agent' : 'Mozilla/5.0' }
req = urllib2.Request('http://example.com/image.jpg', None, headers)
data = urllib2.urlopen(req).read()
# 'wb' creates the file if needed and writes the raw image bytes unchanged
with open('download.jpg', 'wb') as f:
    f.write(data)
This downloads the image to download.jpg. Opening the file in 'wb' mode creates it if it does not already exist.
There are more ways to do it; I would take a look at Setting the User-Agent.
Also take a look at this question.
Good Luck!
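On Python 3, urllib2 was merged into urllib.request, so the same idea works without a second library. The sketch below is not part of the answer above; `build_request`, `download` and `make_opener` are hypothetical helper names, and "Mozilla/5.0" is simply the string the answer uses. `make_opener` covers the retrieve() case from the question: once an opener whose `addheaders` contains the User-Agent is installed with `urllib.request.install_opener`, plain `urllib.request.urlretrieve` sends that header too.

```python
import shutil
import urllib.request

# Assumption: any browser-like string works; "Mozilla/5.0" mirrors the answer above.
USER_AGENT = "Mozilla/5.0"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request object that will send a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, path, user_agent=USER_AGENT):
    """Fetch url with the custom header and stream the bytes into path ('wb' creates it)."""
    with urllib.request.urlopen(build_request(url, user_agent)) as resp, \
            open(path, "wb") as out:
        shutil.copyfileobj(resp, out)


def make_opener(user_agent=USER_AGENT):
    """Opener whose default headers include the custom User-Agent.

    After urllib.request.install_opener(make_opener()), a plain
    urllib.request.urlretrieve(url, path) call sends this header too.
    """
    opener = urllib.request.build_opener()
    opener.addheaders = [("User-Agent", user_agent)]
    return opener
```

For example, download("http://example.com/image.jpg", "image.jpg") saves the image from the question while identifying as Mozilla/5.0.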

Related

Data looks like binary data, but returns as string. How to save as image?

I'm grabbing image data using the requests module. The data that comes back looks like interpreted binary data, like so:
����JFIF��>CREATOR: gd-jpeg v1.0 (using IJG JPEG v62), default quality ... (followed by more unreadable binary characters)
I have tried saving using:
image = open("test.jpg", "wb")
image.write(image_data)
image.close()
But that complains that it needs a bytes-like object. I have tried result.text.encode() with various encodings like "utf-8", but the resulting image file cannot be opened. I have also tried bytes(result.text, "utf-8") and bytearray(result.text, "utf-8"), with the same problem. I think those are all roughly equivalent anyway. Can someone help me convert this to a bytes-like object without destroying the data?
Also, the header in my request is 'image/jpeg', but it still sends me the data as a string.
Thanks!
Use the content field instead of text:
import requests
r = requests.get('https://www.google.com/images/branding/googlelogo/1x/googlelogo_color_272x92dp.png')
with open('test.png', 'wb') as file:
    file.write(r.content)
See: https://requests.readthedocs.io/en/master/user/quickstart/#binary-response-content
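Why r.text cannot be turned back into the image: the � characters in the question are U+FFFD replacement characters, inserted wherever the raw bytes were not valid text in the codec used for decoding. Once a byte has been replaced, encoding the string again cannot restore it. The sketch below reproduces this with the standard library only. It assumes a UTF-8 decode with errors="replace" (requests picks the codec from the response, so the exact characters may differ), and JPEG_START is just the opening of a typical JFIF file, not data from the question.

```python
# The first bytes of a typical JPEG/JFIF file; 0xFF can never appear in valid UTF-8.
JPEG_START = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"


def text_round_trip(raw: bytes) -> bytes:
    """Mimic decoding binary data to text (like r.text) and encoding it back."""
    text = raw.decode("utf-8", errors="replace")  # invalid bytes become U+FFFD
    return text.encode("utf-8")  # U+FFFD encodes as b"\xef\xbf\xbd", not the original byte


def save_binary(raw: bytes, path: str) -> None:
    """Write the bytes untouched, as file.write(r.content) does."""
    with open(path, "wb") as f:
        f.write(raw)


if __name__ == "__main__":
    print(text_round_trip(JPEG_START) == JPEG_START)  # False: the image data is corrupted
```

Plain ASCII survives the round trip, which is why HTML pages look fine through r.text while images do not.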

Use Dash with websockets

What is the best way to use Dash with WebSockets to build a real-time dashboard? I would like to update a graph every time a message is received, but the only thing I've found is calling the callback every x seconds, like the example below.
import dash
import dash_core_components as dcc
import dash_html_components as html
import dash_daq as daq
from dash.dependencies import Input, Output
import plotly
import plotly.graph_objs as go
from websocket import create_connection
from tinydb import TinyDB, Query
import json
import ssl
# Setting up the websocket and the necessary web handles
# (`address` is the websocket URL; it is not shown in the question)
ws = create_connection(address, sslopt={"cert_reqs": ssl.CERT_NONE})
app = dash.Dash(__name__)
app.layout = html.Div(
    [
        dcc.Graph(id='live-graph', animate=True),
        dcc.Interval(
            id='graph-update',
            interval=1*1000,
            n_intervals=0)
    ]
)
@app.callback(Output('live-graph', 'figure'),
              [Input('graph-update', 'n_intervals')])
def update_graph_live(n):
    # recv() blocks until the next message arrives and returns it as a string;
    # this assumes each message is a JSON object
    message = json.loads(ws.recv())
    x = message.get('data1')
    y = message.get('data2')
    # ..... (further processing elided in the question)
    fig = go.Figure(
        data=[go.Bar(x=x, y=y)],
        layout=go.Layout(
            title=go.layout.Title(text="Bar Chart")
        )
    )
    return fig
if __name__ == '__main__':
    app.run_server(debug=True)
Is there a way to trigger the callback every time a message is received (maybe storing the messages in a database first)?
This forum post describes a method to use websocket callbacks with Dash:
https://community.plot.ly/t/triggering-callback-from-within-python/23321/6
Update
I tried it, and it works well. Environment: Windows 10 x64 + Python 3.7.
To test, download the .tar.gz file and run python usage.py. It will complain about some missing packages; install them. You might have to change the address from 0.0.0.0 to 127.0.0.1 in usage.py. Browse to http://127.0.0.1:5000 to see the results. If I had more time, I'd put this example up on GitHub (ping me if you have trouble getting it to work, or if the original gets lost).
I had two separate servers, one for Dash and the other as a socket server, running on different ports. On receiving a message, I edited a common JSON file to share the information with Dash's callback. That's how I did it.
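The shared-JSON-file approach from the last answer can be sketched as two small helpers: the socket server calls one for every message, and the Dash callback calls the other on each Interval tick. This is an illustration under assumptions, not the answerer's code: `save_message`, `load_latest` and the file name dash_shared_state.json are hypothetical, and the write-then-replace step is one way to avoid reading a half-written file.

```python
import json
import os
import tempfile

# Hypothetical location of the JSON file shared by the socket server and Dash.
STATE_PATH = os.path.join(tempfile.gettempdir(), "dash_shared_state.json")


def save_message(message, path=STATE_PATH):
    """Socket-server side: store the latest message."""
    directory = os.path.dirname(path) or "."
    # Write to a temporary file in the same directory, then swap it into place,
    # so readers see either the old or the new file (atomic on POSIX; on Windows
    # os.replace can fail if another process has the file open).
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(message, tmp)
    os.replace(tmp_path, path)


def load_latest(path=STATE_PATH):
    """Dash side: return the last stored message, or {} if there is none yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
```

In the Dash callback, message = load_latest() would replace ws.recv(), so an Interval tick never blocks waiting for the next websocket message; the graph simply shows the most recent one.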

How to encode image to send over Python HTTP server?

I would like some help with the following handler:
import http.server
class MyHandler(http.server.BaseHTTPRequestHandler):
    def do_HEAD(client):
        client.send_response(200)
        client.send_header("Content-type", "text/html")
        client.end_headers()
    def do_GET(client):
        if client.path == "/":
            client.send_response(200)
            client.send_header("Content-type", "text/html")
            client.end_headers()
            client.wfile.write(load('index.html'))
def load(file):
    with open(file, 'r') as file:
        return encode(str(file.read()))
def encode(file):
    return bytes(file, 'UTF-8')
I've got this; the function load() is somewhere else in the file. Sending an HTML page over my HTTP handler seems to be working, but how can I send an image? How do I need to encode it, and what Content-type should I use?
Help is greatly appreciated!
(PS: I would like the image that is sent to be displayed in the browser when I connect to my HTTP server.)
For a PNG image you have to set the Content-type to "image/png"; for JPG, "image/jpeg".
Other content types can be found here.
Edit: Yes, I forgot about encoding in my first edit.
The answer is: you don't! When you load your image from a file, it is already in the correct encoding.
I read about your codec problem: as far as I can see, the problem is in your load function. Don't try to encode the file content.
For binary data you can use this:
def load_binary(filename):
    with open(filename, 'rb') as file_handle:
        return file_handle.read()
As Juergen mentioned, you have to set the corresponding Content-type.
This example I found may help you: https://github.com/tanzilli/playground/blob/master/python/httpserver/example2.py
The example is in Python 2, but the changes should be minor.
Also, it's better to use self instead of client (see PEP 8, Python's style guide).
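Putting the answers' pieces together for Python 3 might look like the sketch below: read files in binary mode and pick the Content-type from the extension. Using the standard mimetypes module for that lookup is my assumption (the answers list the types by hand), as are the helper name `content_type_for` and the choice to serve only files that sit directly in the working directory (os.path.basename strips any sub-path, which also blocks "../" tricks).

```python
import http.server
import mimetypes
import os
import urllib.parse

# Assumption: index.html and the images sit in the directory the server is started from.
ROOT = os.getcwd()


def content_type_for(filename):
    """Choose the Content-type from the extension, e.g. .png -> image/png."""
    ctype, _ = mimetypes.guess_type(filename)
    return ctype or "application/octet-stream"


def load_binary(filename):
    """Read the file as raw bytes; no text encoding step is involved."""
    with open(filename, "rb") as file_handle:
        return file_handle.read()


class MyHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Drop any query string, then keep only the base name so "../" cannot escape ROOT.
        path = urllib.parse.urlsplit(self.path).path
        name = "index.html" if path == "/" else os.path.basename(path)
        filename = os.path.join(ROOT, name)
        if not os.path.isfile(filename):
            self.send_error(404, "File not found")
            return
        body = load_binary(filename)
        self.send_response(200)
        self.send_header("Content-type", content_type_for(filename))
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it, start the server with http.server.HTTPServer(("127.0.0.1", 8000), MyHandler).serve_forever() and browse to http://127.0.0.1:8000/picture.png, where picture.png is a hypothetical image in the same directory. An <img src="picture.png"> tag in index.html then shows the picture inside the page.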

How to get image URI in Selenium?

I used the first answer to this question and adapted it to my need: automatically saving the pictures of a given URL on my laptop. My problem is how to get the URI of every image on the webpage so that I can complete my code:
from selenium import webdriver
class TestFirefox:
    def testFirefox(self):
        self.driver = webdriver.Firefox()
        # There are 2 pictures on google.com, I want to download them
        self.driver.get("http://www.google.com")
        self.l = []  # List to store URI to my images
        self.r = self.driver.find_element_by_tag_name('img')
        # I did print(self.r) but it does not reflect the URI of
        # the image: which is what I want.
        # What can I do to retrieve the URIs and run this:
        self.l.append(self.image_uri)
        for uri_to_img in self.l:
            self.driver.get(uri_to_img)
            # I want to download the images, but I am not sure
            # if this is the good way to proceed since my list's content
            # may not be correct for the moment
            self.driver.save_screenshot(uri_to_img)
        self.driver.close()
if __name__ == '__main__':
    TF = TestFirefox()
    TF.testFirefox()
You need to get the src attribute of the given image in order to determine its name and (possibly) its address; remember, src can also be a relative URI.
for img in self.l:
    # here self.l holds the <img> WebElements, not URI strings
    url = img.get_attribute("src")
For downloading the image, try a simple HTTP client like urllib:
import urllib.request
urllib.request.urlretrieve(url, "image.png")
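Because src may be relative (or empty, or an inline data: URI), it helps to resolve each value against the page URL before downloading. The sketch below is an assumption-level illustration: `resolve_image_urls`, `filename_for` and `download_all` are hypothetical helpers, and it expects a plain list of src strings. With Selenium 4 such a list could be collected as [img.get_attribute("src") for img in driver.find_elements(By.TAG_NAME, "img")], with By imported from selenium.webdriver.common.by.

```python
import os
import urllib.parse
import urllib.request


def resolve_image_urls(page_url, srcs):
    """Turn src values (possibly relative, possibly empty) into absolute URLs."""
    urls = []
    for src in srcs:
        if not src or src.startswith("data:"):
            continue  # skip missing src attributes and inline data: URIs
        urls.append(urllib.parse.urljoin(page_url, src))
    return urls


def filename_for(url, index):
    """Derive a local file name from the URL path; fall back to image_<index>."""
    name = os.path.basename(urllib.parse.urlsplit(url).path)
    return name or f"image_{index}"


def download_all(urls):
    """Save each image into the current directory with urlretrieve."""
    for i, url in enumerate(urls):
        urllib.request.urlretrieve(url, filename_for(url, i))
```

urljoin leaves absolute URLs unchanged, so it is safe to apply even when the browser already reports a fully resolved src.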

Convert multiple querysets to json in django

I asked a related question earlier today
In this instance, I have 4 queryset results:
action_count = Action.objects.filter(complete=False, onhold=False).annotate(action_count=Count('name'))
hold_count = Action.objects.filter(onhold=True, hold_criteria__isnull=False).annotate(action_count=Count('name'))
visible_tags = Tag.objects.filter(visible=True).order_by('name').filter(action__complete=False).annotate(action_count=Count('action'))
hidden_tags = Tag.objects.filter(visible=False).order_by('name').filter(action__complete=False).annotate(action_count=Count('action'))
I'd like to return them to an ajax function. I have to convert them to json, but I don't know how to include multiple querysets in the same json string.
I know this thread is old, but using simplejson to convert Django models doesn't work in many cases, such as decimals (as noted by rebus above).
As stated in the Django documentation, the serializer looks like the better choice:
Django’s serialization framework provides a mechanism for “translating” Django models into other formats. Usually these other formats will be text-based and used for sending Django data over a wire, but it’s possible for a serializer to handle any format (text-based or not).
Django Serialization Docs
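You can use Python's json module (older Django versions bundled it as django.utils.simplejson, which has since been removed). This code is untested though!
import json
from django.core.serializers.json import DjangoJSONEncoder  # also handles Decimal and dates
from django.db.models import Count
from django.http import HttpResponse
data = {
    'action_count': list(Action.objects.filter(complete=False, onhold=False).annotate(action_count=Count('name')).values()),
    'hold_count': list(Action.objects.filter(onhold=True, hold_criteria__isnull=False).annotate(action_count=Count('name')).values()),
    # ... (visible_tags and hidden_tags in the same way)
}
return HttpResponse(json.dumps(data, cls=DjangoJSONEncoder), content_type='application/json')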
I'll test and rewrite the code as necessary when I have the time to, but this should get you started.
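The decimal problem mentioned above is easy to reproduce without Django: the standard json module raises TypeError for decimal.Decimal values, which QuerySet.values() returns for DecimalField columns. Below is a minimal stdlib-only sketch of the workaround; the `to_json` name and the sample rows are hypothetical, and converting Decimal to a string mirrors what Django's DjangoJSONEncoder does. Inside a Django project, json.dumps(data, cls=DjangoJSONEncoder) is the ready-made equivalent.

```python
import json
from decimal import Decimal


def _default(obj):
    """Called by json.dumps for objects it cannot serialize natively."""
    if isinstance(obj, Decimal):
        return str(obj)  # keeps the exact value, e.g. Decimal("1.50") -> "1.50"
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


def to_json(payload):
    """Serialize a dict of lists of dicts, e.g. several QuerySet.values() results."""
    return json.dumps(payload, default=_default)


sample = {
    "action_count": [{"name": "a", "amount": Decimal("1.50")}],  # made-up rows
    "hidden_tags": [],
}
print(to_json(sample))
```

Strings rather than floats keep values such as 1.50 exact; the JavaScript side can parse them with Number() if it needs arithmetic.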
