I have this string:
#<Fletcher::Model::Amazon alt="You Are Not a Gadget: A Manifesto (Vintage)" border="0" element="img" height="240" id="prodImage" onload="if (typeof uet == 'function') { if(typeof setCSMReq=='function'){setCSMReq('af');setCSMReq('cf');}else{uet('af');uet('cf');amznJQ.completedStage('amznJQ.AboveTheFold');} }" onmouseout="sitb_doHide('bookpopover'); return false;" onmouseover="sitb_showLayer('bookpopover'); return false;" src="http://ecx.images-amazon.com/images/I/51bpl1wA%2BaL._BO2,204,203,200_PIsitb-sticker-arrow-click,TopRight,35,-76_AA240_SH20_OU01_.jpg" width="240">
I simply want the link in the src attribute:
http://ecx.images-amazon.com/images/I/51bpl1wA%2BaL._BO2,204,203,200_PIsitb-sticker-arrow-click,TopRight,35,-76_AA240_SH20_OU01_.jpg
How can I parse this string to get the link?
Below is a listing of the relevant functions:
module Fletcher
  module Model
    class Amazon < Fletcher::Model::Base
      # A regular expression for determining if a url comes from a specific service/website
      def self.regexp
        /amazon\.com/
      end

      # Parse data and look for object attributes to give to object
      def parse(data)
        super(data)
        case doc
        when Nokogiri::HTML::Document
          # Get Name
          self.name = doc.css("h1.parseasinTitle").first_string
          # Get Description
          self.description = doc.css("div#productDescriptionWrapper").first_string
          # Get description from meta title if not found
          self.description = doc.xpath("//meta[@name='description']/@content").first_string if description.nil?
          # Get Price
          parse_price(doc.css("b.priceLarge").first_string)
          # Get Images
          self.images = doc.xpath("//table[@class='productImageGrid']//img").attribute_array
          self.image = images.first
        end
      end
    end
  end
end
In that case I believe it would be: fletchedProduct.image[:src]
require 'open-uri'
x = %Q{#<Fletcher::Model::Amazon alt="You Are Not a Gadget: A Manifesto (Vintage)" border="0" element="img" height="240" id="prodImage" onload="if (typeof uet == 'function') { if(typeof setCSMReq=='function'){setCSMReq('af');setCSMReq('cf');}else{uet('af');uet('cf');amznJQ.completedStage('amznJQ.AboveTheFold');} }" onmouseout="sitb_doHide('bookpopover'); return false;" onmouseover="sitb_showLayer('bookpopover'); return false;" src="http://ecx.images-amazon.com/images/I/51bpl1wA%2BaL._BO2,204,203,200_PIsitb-sticker-arrow-click,TopRight,35,-76_AA240_SH20_OU01_.jpg" width="240">}
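# Restrict extraction to http/https so the src URL is the only match,
# rather than relying on its position among all URI-like tokens
urls = URI.extract(x, %w[http https])
puts urls.first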
output:
http://ecx.images-amazon.com/images/I/51bpl1wA%2BaL._BO2,204,203,200_PIsitb-sticker-arrow-click,TopRight,35,-76_AA240_SH20_OU01_.jpg
Hope this helps. I just happened to need to be able to do this last week and looked it up.
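For comparison, here is a minimal Python sketch of the same idea: pulling the value of the src attribute out of the inspect-style string with a regular expression. The function name extract_src and the shortened sample string are made up for illustration; this assumes the attribute value is wrapped in double quotes, as in the string above.

```python
import re


def extract_src(inspect_string):
    """Return the value of the first src="..." attribute, or None if absent."""
    # \b keeps us from matching attribute names that merely end in "src"
    match = re.search(r'\bsrc="([^"]+)"', inspect_string)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical, shortened stand-in for the string in the question
    sample = '#<Fletcher::Model::Amazon alt="x" src="http://example.com/a.jpg" width="240">'
    print(extract_src(sample))
```

If the object is available rather than just its string form, reading the attribute directly (as suggested above) is the more robust route.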
For educational purposes I've been creating a spider to fetch data from an HTML-based marketplace. I managed to fetch all the text-based data I need; however, I also need to fetch the item ID, the data-mins-elapsed value, the item quality and the item trait. These are not text nodes; they are stored in HTML attributes and class names.
The ItemID is a unique ID for each item. In the HTML code of the website it can be found here (I need the number "319842588"; this number is unique to each item):
<tr class="cursor-pointer" data-on-click-link="/pc/Trade/Detail/319842588" data-on-click-link-action="NewWindow" data-toggle="tooltip" data-original-title="" title="">
The data-mins-elapsed attribute keeps track of how long ago the item was posted. This number changes every time you refresh the page. It can be found here (I need the number "3"; this number changes constantly):
<td class="bold hidden-xs" data-mins-elapsed="3">Now</td>
The item quality is the quality of a certain item. In the HTML code of the website it can be found here (I need "superior"; the quality is specific to each item):
<img class="trade-item-icon item-quality-superior"
alt="Icon"
src="/Content/icons/bow.png"
data-trait="Infused"
/>
The item trait is the trait of a certain item. In the HTML code of the website it can be found here (I need "Infused"; the trait is specific to each item):
<img class="trade-item-icon item-quality-superior"
alt="Icon"
src="/Content/icons/bow.png"
data-trait="Infused"
/>
How do I build an XPath expression or something similar to fetch these values?
Website link: https://eu.tamrieltradecentre.com/pc/Trade/SearchResult?SearchType=Sell&ItemID=10052&ItemNamePattern=Briarheart+Bow&IsChampionPoint=true&LevelMin=160&LevelMax=&ItemCategory1ID=&ItemCategory2ID=&ItemCategory3ID=&ItemQualityID=&ItemTraitID=3&PriceMin=&PriceMax=25000
Part of the relevant HTML code
This includes the HTML for each product; each product is listed in a TR with the class "cursor-pointer".
<table class="trade-list-table max-width">
<thead>
...
</thead>
<tr class="cursor-pointer" data-on-click-link="/pc/Trade/Detail/319836098"
data-on-click-link-action="NewWindow" data-toggle="tooltip">
<td>
<img class="trade-item-icon item-quality-superior" alt="Icon"
src="/Content/icons/bow.png" data-trait="Infused"/>
<div class="item-quality-superior">
Briarheart Bow
</div>
<div>
Level:
<img class="small-icon" src="/Content/icons/championPoint.png" />
160
</div>
</td>
<td class="hidden-xs">
...
</td>
<td class="hidden-xs">
...
</td>
<td class="gold-amount bold">
...
</td>
<td class="bold hidden-xs" data-mins-elapsed="15"></td>
</tr>
Spider file
# -*- coding: utf-8 -*-
import scrapy
import os
import csv


class TTCSpider(scrapy.Spider):
    name = "ttc_spider"
    allowed_domains = ["eu.tamrieltradecentre.com"]
    start_urls = ['https://eu.tamrieltradecentre.com/pc/Trade/SearchResult?ItemID=10052&SearchType=Sell&ItemNamePattern=Briarheart+Bow&ItemCategory1ID=&ItemCategory2ID=&ItemCategory3ID=&ItemTraitID=3&ItemQualityID=&IsChampionPoint=true&IsChampionPoint=false&LevelMin=160&LevelMax=&MasterWritVoucherMin=&MasterWritVoucherMax=&AmountMin=&AmountMax=&PriceMin=&PriceMax=25000']

    def start_requests(self):
        """Read keywords from keywords file and construct the search URL"""
        with open(os.path.join(os.path.dirname(__file__), "../resources/keywords.csv")) as search_keywords:
            for keyword in csv.DictReader(search_keywords):
                search_text = keyword["keyword"]
                url = "https://eu.tamrieltradecentre.com/pc/Trade/{0}".format(search_text)
                # The meta is used to send our search text into the parser as metadata
                yield scrapy.Request(url, callback=self.parse, meta={"search_text": search_text})

    def parse(self, response):
        containers = response.css('.cursor-pointer')
        for container in containers:
            # Defining the XPaths
            XPATH_ITEM_NAME = ".//td[1]//div[1]//text()"
            XPATH_ITEM_LEVEL = ".//td[1]//div[2]//text()"
            XPATH_ITEM_LOCATION = ".//td[3]//div[1]//text()"
            XPATH_ITEM_TRADER = ".//td[3]//div[2]//text()"
            XPATH_ITEM_PRICE = ".//td[4]//text()[2]"
            XPATH_ITEM_QUANTITY = ".//td[4]//text()[4]"
            XPATH_ITEM_LASTSEEN = "Help me plis :3"
            XPATH_ITEM_ITEMID = "Help me plis :3"
            XPATH_ITEM_QUALITY = "Help me plis :3"
            XPATH_ITEM_TRAIT = "Help me plis :3"

            # Extracting from list
            raw_item_name = container.xpath(XPATH_ITEM_NAME).extract()
            raw_item_level = container.xpath(XPATH_ITEM_LEVEL).extract()
            raw_item_location = container.xpath(XPATH_ITEM_LOCATION).extract()
            raw_item_trader = container.xpath(XPATH_ITEM_TRADER).extract()
            raw_item_price = container.xpath(XPATH_ITEM_PRICE).extract()
            raw_item_quantity = container.xpath(XPATH_ITEM_QUANTITY).extract()
            raw_item_lastseen = container.xpath(XPATH_ITEM_LASTSEEN).extract()
            raw_item_itemid = container.xpath(XPATH_ITEM_ITEMID).extract()
            raw_item_quality = container.xpath(XPATH_ITEM_QUALITY).extract()
            raw_item_trait = container.xpath(XPATH_ITEM_TRAIT).extract()

            # Cleaning the data
            item_name = ''.join(raw_item_name).strip() if raw_item_name else None
            item_level = ''.join(raw_item_level).replace('Level:', '').strip() if raw_item_level else None
            item_location = ''.join(raw_item_location).strip() if raw_item_location else None
            item_trader = ''.join(raw_item_trader).strip() if raw_item_trader else None
            item_price = ''.join(raw_item_price).strip() if raw_item_price else None
            item_quantity = ''.join(raw_item_quantity).strip() if raw_item_quantity else None
            item_lastseen = ''.join(raw_item_lastseen).strip() if raw_item_lastseen else None
            item_itemid = ''.join(raw_item_itemid).strip() if raw_item_itemid else None
            item_quality = ''.join(raw_item_quality).strip() if raw_item_quality else None
            item_trait = ''.join(raw_item_trait).strip() if raw_item_trait else None

            yield {
                'item_name': item_name,
                'item_level': item_level,
                'item_location': item_location,
                'item_trader': item_trader,
                'item_price': item_price,
                'item_quantity': item_quantity,
                'item_lastseen': item_lastseen,
                'item_itemid': item_itemid,
                'item_quality': item_quality,
                'item_trait': item_trait,
            }
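You can use the built-in .re_first() to match a regular expression for the ItemID:
ItemID = container.xpath('./@data-on-click-link').re_first(r'(\d+)$')
# The same approach works for ItemQuality, for example:
# ItemQuality = container.xpath('.//img[@data-trait]/@class').re_first(r'item-quality-(\w+)')
ItemTrait = container.xpath('.//img[@data-trait]/@data-trait').get()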
First of all, you shouldn't be asking such questions; a simple Google search should suffice. Nonetheless, all you need is a way to access the data stored in the attributes of an HTML node. In XPath you do this by prefixing the attribute name with @; e.g. to access the class attribute you would use div/@class.
For your problem I can suggest an XPath for one of your items; you should be able to take it from there. Note that in the HTML you posted the attribute sits on the fifth td of the row, so selecting the cell by the attribute itself is safer than selecting it by position:
XPATH_ITEM_LASTSEEN = ".//td[@data-mins-elapsed]/@data-mins-elapsed"
Also, to get 319842588 out of data-on-click-link="/pc/Trade/Detail/319842588", you can use an XPath similar to the one above together with Python's built-in string methods such as replace() or split(). For example:
Suppose you have:
x = "/pc/Trade/Detail/319842588"
# you could do either of the following
item_id = x.replace('/pc/Trade/Detail/', '')
item_id = x.split('/')[-1]
Hope that helps.
Cheers!!
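To see the attribute lookups from both answers in one place, here is a minimal sketch that runs without Scrapy. It uses the standard library's xml.etree.ElementTree, whose limited XPath support returns elements rather than attribute nodes, so the attribute is read with .get() instead of a trailing /@name step. The trimmed ROW_HTML and the function name extract_row_fields are illustrative assumptions, and the snippet only works because this fragment happens to be well-formed XML; real pages should go through Scrapy's selectors.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of one result row from the question (illustrative)
ROW_HTML = """
<tr class="cursor-pointer" data-on-click-link="/pc/Trade/Detail/319836098">
  <td>
    <img class="trade-item-icon item-quality-superior" alt="Icon"
         src="/Content/icons/bow.png" data-trait="Infused"/>
  </td>
  <td class="bold hidden-xs" data-mins-elapsed="15"></td>
</tr>
"""


def extract_row_fields(row_html):
    """Pull the attribute-based fields out of a single <tr> row."""
    row = ET.fromstring(row_html.strip())

    # Select elements by the presence of an attribute, then read the attribute
    img = row.find(".//img[@data-trait]")
    elapsed_cell = row.find(".//td[@data-mins-elapsed]")

    # The item ID is the trailing number of the detail link on the row itself
    id_match = re.search(r"(\d+)$", row.get("data-on-click-link", ""))
    # The quality is encoded in a class name such as "item-quality-superior"
    quality_match = (
        re.search(r"item-quality-(\w+)", img.get("class", "")) if img is not None else None
    )

    return {
        "item_itemid": id_match.group(1) if id_match else None,
        "item_lastseen": elapsed_cell.get("data-mins-elapsed") if elapsed_cell is not None else None,
        "item_quality": quality_match.group(1) if quality_match else None,
        "item_trait": img.get("data-trait") if img is not None else None,
    }


if __name__ == "__main__":
    print(extract_row_fields(ROW_HTML))
```

In the spider itself the equivalent selectors would be the @-prefixed XPath expressions shown in the answers, combined with .get() or .re_first().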
I have trained a model that predicts whether an image shows a lake or an ocean. I have been able to serve this model successfully on localhost: I upload an image and it predicts the class (ocean or lake) as well as the probability/confidence. I can return that result or I can return the image, but for some reason I cannot return both the image and the prediction result.
I have searched Stack Overflow and GitHub and tried many different things, as shown in the commented-out code. I can display an image from the web, but I can't display the image that was uploaded. I have read and adapted code from GitHub, but that only returns the image without the prediction results.
from flask import Flask, flash, request, redirect, url_for
import os
from werkzeug.utils import secure_filename
from flask import send_from_directory

UPLOAD_FOLDER = ''
ALLOWED_EXTENSIONS = set(['jpg'])

app = Flask(__name__)
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER


def allowed_file(filename):
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS


@app.route('/', methods=['GET', 'POST'])
def upload_file():
    if request.method == 'POST':
        # check if the post request has the file part
        if 'file' not in request.files:
            flash('No file part')
            return redirect(request.url)
        file = request.files['file']
        # if user does not select file, browser also
        # submit an empty part without filename
        if file.filename == '':
            flash('No selected file')
            return redirect(request.url)
        if file and allowed_file(file.filename):
            filename = secure_filename(file.filename)
            file.save(os.path.join(app.config['UPLOAD_FOLDER'], filename))
            image = open_image(filename)
            image_url = url_for('uploaded_file', filename=filename)
            #print(learn.predict(image))
            ok = learn.predict(image)
            first = round(ok[2].data.tolist()[0], 4)*100
            second = round(ok[2].data.tolist()[1], 4)*100
            if first > second:
                okp = first
            else:
                okp = second
            #return redirect(url_for('uploaded_file', filename=filename)) I can get this to work
            #return '''url_for('uploaded_file', filename=filename)'''
            #return '''<img src = "{{image}}"/>'''
            #return '''<h1>The prediction is: {}</h1><h1>With a confidence of: {}%'''.format(ok[0], okp)
            return '''<h1>The prediction is: {}</h1><h1>With a confidence of: {}%</h1>
            <img src= "{{image_url}}" height = "85" width="200"/>'''.format(ok[0], okp)
            #return '''<img src = "{{send_from_directory(app.config['UPLOAD_FOLDER'], filename)}}"/>'''
    return '''
    <!doctype html>
    <title>Upload new File</title>
    <h1>Upload a jpg of an Ocean or a Lake</h1>
    <form method=post enctype=multipart/form-data>
      <input type=file name=file>
      <input type=submit value=Upload>
    </form>
    '''


@app.route('/uploads/<filename>')
def uploaded_file(filename):
    return send_from_directory(app.config['UPLOAD_FOLDER'], filename)


if __name__ == '__main__':
    app.run(port=5000, debug=False)
This is what I get:
The prediction is: oceans
With a confidence of: 94.66%
followed by the broken-image icon that the browser shows when an image fails to load.
I would like to show the uploaded image along with the results.
Just put the image_url value in the src attribute of your <img> tag:
return '''<h1>The prediction is: {}</h1><h1>With a confidence of: {}%</h1>
<img src="{}" height = "85" width="200"/>'''.format(ok[0], okp, image_url)
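You can't use the {{image_url}} syntax here; that would require rendering the string with Flask's Jinja2 templating. With str.format(), doubled braces are an escape sequence, so {{image_url}} ends up as the literal text {image_url} in the page.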
image_url is the string that you generated for the uploaded_file() view, so the browser knows where to load the image from to fill the <img /> tag in the HTML page.
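The difference is easy to check outside Flask. The sketch below, using a hypothetical helper render_result and a made-up URL, shows what str.format() does with doubled braces versus a plain {} placeholder:

```python
def render_result(label, confidence, image_url):
    """Build the result page with three positional placeholders."""
    return (
        '<h1>The prediction is: {}</h1>'
        '<h1>With a confidence of: {}%</h1>'
        '<img src="{}" height="85" width="200"/>'
    ).format(label, confidence, image_url)


# Doubled braces are format()'s escape for literal braces, so no value is
# substituted and the browser is asked to load an image from "{image_url}".
broken = '<img src="{{image_url}}"/>'.format()

if __name__ == "__main__":
    print(broken)
    # "/uploads/lake.jpg" is a hypothetical value of url_for('uploaded_file', ...)
    print(render_result("oceans", 94.66, "/uploads/lake.jpg"))
```

Values that come from users should be HTML-escaped before being interpolated like this; rendering through a Jinja2 template does that escaping automatically.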
I am using the Plivo API. When I do a phone number search, I get this returned in the terminal:
{"stock"=>1,
"voice_enabled"=>true,
"region"=>"New York, UNITED STATES",
"voice_rate"=>"0.00900",
"prefix"=>"212",
"sms_rate"=>"0.00800",
"number_type"=>"local",
"setup_rate"=>"0.00000",
"rental_rate"=>"0.80000",
"group_id"=>"29753262281573",
"sms_enabled"=>true,
"resource_uri"=>
"/v1/Account/MAZDQ1ZJIYMDZKMMZKYM/AvailableNumberGroup/29753262281573/"}
How can I loop through it and render as HTML to get a result like this:
<div>
  <ul>
    <li>
      Region: json_obj['region']
    </li>
    <li>
      Prefix: json_obj['prefix']
    </li>
    and so on ...
  </ul>
</div>
The plivo gem lets you get the returned object with:
obj = response.last
I have tried:
obj = response.last
@region = obj['region']
followed by:
<%= @region %>
which either produces an error saying that the object or method does not exist, or renders nothing.
My ruby code is:
get '/search' do
  erb :search
end
and
get '/search/data' do
  country_iso = params[:country_iso]
  region = params[:region]
  prefix = params[:prefix]
  p = RestAPI.new(AUTH_ID, AUTH_TOKEN)
  params = {'country_iso' => country_iso, 'region' => region, 'prefix' => prefix}
  #warn params.inspect
  response = p.get_number_group(params)
  pp response
  obj = response.last
  @region = obj['region']
  @prefix = obj['prefix']
  erb :search
end
The terminal shows me that the params are passed correctly from the ajax call and the desired number search takes place, I just cannot print the json response out in nice HTML.
Should erb :search appear twice as above or just once? On the search.erb page there is a dropdown and input boxes to collect the parameters, that all works fine.
Sorry it's so long. To summarise: how can I render a json response in nice HTML and am I getting confused about my erb pages with the two ruby segments above?
I'm new to Grails (1.3.7) and I've run into a strange situation when displaying an image from the filesystem. The one.png picture is in the web-app/images/fotos directory. The zz.gsp:
<img src="${resource(dir:'images/fotos', file:'one.png')}" alt="Nothing" />
together with the empty action def zz = {} works fine. But if I try to display the same picture in rawRenderImage.gsp:
<body>
<p>
${fdir} <br/> ${fname} <!-- OK -->
</p>
<g:if test="${fname}">
<img src="${resource(dir:'fdir',file: 'fname')}" alt ="Nothing"/>
</g:if>
</body>
the picture doesn't appear, even though the parameters fdir and fname are passed to the page. The action in the controller is:
def rawRenderImage = {
    // def basePath = servletContext.getRealPath("/")
    // def basePath = grailsAttributes.getApplicationContext().getResource("/").getFile().toString() + "/images/fotos/"
    // def basePath = grailsAttributes.getApplicationContext().getResource("/").getFile().toString()
    // def fname = params.photoId + ".png"
    // def fname = "/images/fotos/" + params.photoId + ".png"
    def basePath = "images/fotos" // or basePath = "images" for /images/one.png
    def fname = "one.png"
    [fdir:basePath, fname:fname]
}
Even the direct assignments basePath = "images/fotos" and fname = "one.png" don't work, nor does any combination with basePath that yields the absolute path. Putting the picture in the images directory doesn't work either. I use NetBeans, but it also fails when run from the console.
Help please.
When you pass the filename and directory as variables in the model, don't quote them in the tag's src attribute. Quoted, 'fdir' and 'fname' are literal strings; unquoted, the Groovy ${} expression evaluates them as the variables from the model.
<g:if test="${fname}">
<img src="${resource(dir:fdir,file:fname)}" alt ="Something"/>
</g:if>
I wrote some code.
I can save an image in a BlobProperty,
but I cannot load the image into an HTML page.
Source code:
class Product(db.Model):
    image = db.BlobProperty()
    ...
class add:
    productImage = self.request.get('image')
    product.image = db.Blob(productImage)
    product.put()
I then wrote {{product.image}} in the HTML template, but the page showed garbled binary characters instead of the image.
What should I do if I want to load the image from the datastore?
I use an auxiliary view:
def serve_image(request, image):
    if image == "None":
        image = ""
    response = HttpResponse(image)
    response['Content-Type'] = "image/png"
    response['Cache-Control'] = "max-age=7200"
    return response
and in the model:
def get_image_path(self):
    # This returns the url of serve_image, with the argument of image's pk.
    # Something like /main/serve_image/1231234dfg22; this url will return a
    # response image with the blob
    return reverse("main.views.serve_image", args=[str(self.pk)])
and just use {{ model.get_image_path }} instead.
(this is django-nonrel, but I guess you could figure out what it does)
Also, there is a post here about this; you should check it out.
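The underlying idea is that the template should contain a URL pointing at a view that returns the raw bytes with an image Content-Type, rather than the bytes themselves. A framework-free sketch of such a view, written as a plain WSGI callable, might look like the following. IMAGE_STORE, the key "1" and the /serve_image/<key> path layout are hypothetical stand-ins for the datastore lookup, and the stored bytes are only a PNG signature, not a real image.

```python
# Hypothetical in-memory stand-in for the datastore: key -> image bytes
IMAGE_STORE = {
    "1": b"\x89PNG\r\n\x1a\n",  # just the PNG signature, for illustration
}


def serve_image_app(environ, start_response):
    """WSGI callable that serves the blob named by the last path segment."""
    key = environ.get("PATH_INFO", "").rsplit("/", 1)[-1]
    blob = IMAGE_STORE.get(key)

    if blob is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"image not found"]

    headers = [
        ("Content-Type", "image/png"),       # tells the browser these bytes are an image
        ("Cache-Control", "max-age=7200"),   # same cache lifetime as the view above
        ("Content-Length", str(len(blob))),
    ]
    start_response("200 OK", headers)
    return [blob]
```

The HTML page then only needs an <img> tag whose src is that URL, for example /serve_image/1 under the assumed layout.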