How to get image URI in Selenium? - firefox

I used the first answer to this question in order to adapt it to my need: saving pictures of a given URL on my laptop automatically. My problem is how to get the URI of every image that exist on the webpage in order to complete my code correctly:
import selenium
class TestFirefox:
def testFirefox(self):
self.driver=webdriver.Firefox()
# There are 2 pictures on google.com, I want to download them
self.driver.get("http://www.google.com")
self.l=[] # List to store URI to my images
self.r=self.driver.find_element_by_tag_name('img')
# I did print(self.r) but it does not reflect the URI of
# the image: which is what I want.
# What can I do to retrieve the URIs and run this:
self.l.append(self.image_uri)
for uri_to_img in self.l:
self.driver.get(uri_to_img)
# I want to download the images, but I am not sure
# if this is the good way to proceed since my list's content
# may not be correct for the moment
self.driver.save_screenshot(uri_to_image)
driver.close()
if __name__=='__main__':
TF=TestFirefox()
TF.testFirefox()

You need to get get src attribute of the given image in order to determine it's name and (possibly) address - remember, src can be also relative URI.
for img in self.l:
url = img.get_attribute("src")
For downloading image you should try simple HTTP client like urllib
import urllib.request
urllib.request.urlretrieve(url, "image.png")

Related

How to check that a PDF file has some link with Ruby/Rspec?

I am using prawnpdf/pdf-inspector to test that content of a PDF generated in my Rails app is correct.
I would want to check that the PDF file contains a link with certain URL. I looked at yob/pdf-reader but haven't found any useful information related to this topic
Is it possible to test URLs within PDF with Ruby/RSpec?
I would want the following:
expect(urls_in_pdf(pdf)).to include 'https://example.com/users/1'
The https://github.com/yob/pdf-reader contains a method for each page called text.
Do something like
pdf = PDF::Reader.new("tmp/pdf.pdf")
assert pdf.pages[0].text.include? 'https://example.com/users/1'
assuming what you are looking for is at the first page
Since pdf-inspector seems only to return text, you could try to use the pdf-reader directly (pdf-inspector uses it anyways).
reader = PDF::Reader.new("somefile.pdf")
reader.pages.each do |page|
puts page.raw_content # This should also give you the link
end
Anyway I only did a quick look at the github page. I am not sure what raw_content exactly returns. But there is also a low-level method to directly access the objects of the pdf:
reader = PDF::Reader.new("somefile.pdf")
puts reader.objects.inspect
With that it surely is possible to get the url.

Carrierwave filename method creating issue when uploading file to s3

I have an ImageUploader and I want to upload an image to S3.
Also, I would like to change file name using filename method.
Here is the code:
class ImageUploader < CarrierWave::Uploader::Base
storage :fog
def store_dir
"images"
end
def filename
"#{model.id}_#{SecureRandom.urlsafe_base64(5)}.#{file.extension}" if original_filename
end
end
First time when I save an image, it gets a correct file name, e.g 1_23434.png but when I get the model object from the console, it returns a different image name.
Is there anyone here who can help me? It works fine when I don't use fog.
The problem is in the filename method. On every call, it returns a different value. This is because SecureRandom.urlsafe_base64(5) generates a random string (and it isn't cached). filename is also used under the hood to build path-related strings by CarrierWave. This is why you are getting different image name when you run object.image.filename from the console.
The method that you are looking for is image_identifier (where image prefix is under what name your uploader is mounted).
You can try something like:
object.public_send("#{object.image.mounted_as}_identifier") || generate_unique_name
where generate_unique_name is your current filename implementation. Another approach is storing the hash in the model itself for the future use.
Also, the official wiki page about creating random and unique filenames might be useful for you.

Manually populate an ImageField

I have a models.ImageField which I sometimes populate with the corresponding forms.ImageField. Sometimes, instead of using a form, I want to update the image field with an ajax POST. I am passing both the image filename, and the image content (base64 encoded), so that in my api view I have everything I need. But I do not really know how to do this manually, since I have always relied in form processing, which automatically populates the models.ImageField.
How can I manually populate the models.ImageField having the filename and the file contents?
EDIT
I have reached the following status:
instance.image.save(file_name, File(StringIO(data)))
instance.save()
And this is updating the file reference, using the right value configured in upload_to in the ImageField.
But it is not saving the image. I would have imagined that the first .save call would:
Generate a file name in the configured storage
Save the file contents to the selected file, including handling of any kind of storage configured for this ImageField (local FS, Amazon S3, or whatever)
Update the reference to the file in the ImageField
And the second .save would actually save the updated instance to the database.
What am I doing wrong? How can I make sure that the new image content is actually written to disk, in the automatically generated file name?
EDIT2
I have a very unsatisfactory workaround, which is working but is very limited. This illustrates the problems that using the ImageField directly would solve:
# TODO: workaround because I do not yet know how to correctly populate the ImageField
# This is very limited because:
# - only uses local filesystem (no AWS S3, ...)
# - does not provide the advance splitting provided by upload_to
local_file = os.path.join(settings.MEDIA_ROOT, file_name)
with open(local_file, 'wb') as f:
f.write(data)
instance.image = file_name
instance.save()
EDIT3
So, after some more playing around I have discovered that my first implementation is doing the right thing, but silently failing if the passed data has the wrong format (I was mistakingly passing the base64 instead of the decoded data). I'll post this as a solution
Just save the file and the instance:
instance.image.save(file_name, File(StringIO(data)))
instance.save()
No idea where the docs for this usecase are.
You can use InMemoryUploadedFile directly to save data:
file = cStringIO.StringIO(base64.b64decode(request.POST['file']))
image = InMemoryUploadedFile(file,
field_name='file',
name=request.POST['name'],
content_type="image/jpeg",
size=sys.getsizeof(file),
charset=None)
instance.image = image
instance.save()

Avoid repeated calls to an API in Jekyll Ruby plugin

I have written a Jekyll plugin to display the number of pageviews on a page by calling the Google Analytics API using the garb gem. The only trouble with my approach is that it makes a call to the API for each page, slowing down build time and also potentially hitting the user call limits on the API.
It would be possible to return all the data in a single call and store it locally, and then look up the pageview count from each page, but my Jekyll/Ruby-fu isn't up to scratch. I do not know how to write the plugin to run once to get all the data and store it locally where my current function could then access it, rather than calling the API page by page.
Basically my code is written as a liquid block that can be put into my page layout:
class GoogleAnalytics < Liquid::Block
def initialize(tag_name, markup, tokens)
super # options that appear in block (between tag and endtag)
#options = markup # optional optionss passed in by opening tag
end
def render(context)
path = super
# Read in credentials and authenticate
cred = YAML.load_file("/home/cboettig/.garb_auth.yaml")
Garb::Session.api_key = cred[:api_key]
token = Garb::Session.login(cred[:username], cred[:password])
profile = Garb::Management::Profile.all.detect {|p| p.web_property_id == cred[:ua]}
# place query, customize to modify results
data = Exits.results(profile,
:filters => {:page_path.eql => path},
:start_date => Chronic.parse("2011-01-01"))
data.first.pageviews
end
Full version of my plugin is here
How can I move all the calls to the API to some other function and make sure jekyll runs that once at the start, and then adjust the tag above to read that local data?
EDIT Looks like this can be done with a Generator and writing the data to a file. See example on this branch Now I just need to figure out how to subset the results: https://github.com/Sija/garb/issues/22
To store the data, I had to:
Write a Generator class (see Jekyll wiki plugins) to call the API.
Convert data to a hash (for easy lookup by path, see 5):
result = Hash[data.collect{|row| [row.page_path, [row.exits, row.pageviews]]}]
Write the data hash to a JSON file.
Read in the data from the file in my existing Liquid block class.
Note that the block tag works from the _includes dir, while the generator works from the root directory.
Match the page path, easy once the data is converted to a hash:
result[path][1]
Code for the full plugin, showing how to create the generator and write files, etc, here
And thanks to Sija on GitHub for help on this.

amazon s3 and carrierwave random image name in bucket does not match in database

I'm using carrier wave, rails and amazon s3. Every time I save an image, the image shows up in s3 and I can see it in the management console with the name like this:
https://s3.amazonaws.com/bucket-name/
uploads/images/10/888fdcfdd6f0eeea_1351389576.png
But in the model, the name is this:
https://bucket-name.s3.amazonaws.com/
uploads/images/10/b3ca26c2baa3b857_1351389576.png
First off, why is the random name different? I am generating it in the uploader like so:
def filename
if original_filename
"#{SecureRandom::hex(8)}_#{Time.now.to_i}#{File.extname(original_filename).downcase}"
end
end
I know it is not generating a random string every call because the wrong url in the model is consistent and saved. Somewhere in the process a new one must be getting generated to save in the model after the image name has been saved and sent to amazon s3. Strange.
Also, can I have the url match the one in terms of s3/bucket instead of bucket.s3 without using a regex? Is there an option in carrierwave or something for that?
CarrierWave by default doesn't store the URL. Instead, it generates it every time you need it.
So, every time filename is called it will return a different value, because of Time.now.to_i.
Use created_at column instead, or add a new column for storing the random id or the full filename.
I solved it by saving the filename if it was still the original filename. In the uploader, put:
def filename
if original_filename && original_filename == #filename
#filename = "#{any_string}#{File.extname(original_filename).downcase}"
else
#filename
end
end
The issue of the sumbdomain versus the path is not actually an issue. It works with the subdomain. I.e. https://s3.amazonaws.com/bucket-name/ and https://bucket-name.s3.amazonaws.com/ both work fine.

Resources