Extracting price from script id tag using XPath

I am using Content Egg and need to extract the price from a script tag (selected by its id) on a website, using XPath.
<script id="test-script" class="">var ecomm_event_view_item = {"event":"view_item","event_data":{"items":[{"id":"PARENT","name":" telescopic pole pruner (12\" bar & chain)","price":688.94,"brand":"Stihl","category":"Groundcare & Landscaping\/Gardening Machinery\/Chainsaws\/Pole Saws"}]}};
</script>
So far I am using:
//*[@id="test-script"]/text()
This returns the whole script text, but I need to extract the price only.
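The XPath above returns the full text of the script, so the price still has to be cut out of it. If the tool evaluates XPath 1.0 string functions (an assumption, not verified for Content Egg), something like substring-before(substring-after(//script[@id="test-script"]/text(), '"price":'), ',') may be enough. Outside such a tool, a minimal Python sketch could parse the JavaScript object as JSON instead. The function name extract_price and the regular expression are illustrative, and the sketch assumes the script always has the shape shown in the question:

```python
import json
import re

# Script text copied from the question (a raw string keeps the \" and \/ escapes intact).
SCRIPT_TEXT = r'''var ecomm_event_view_item = {"event":"view_item","event_data":{"items":[{"id":"PARENT","name":" telescopic pole pruner (12\" bar & chain)","price":688.94,"brand":"Stihl","category":"Groundcare & Landscaping\/Gardening Machinery\/Chainsaws\/Pole Saws"}]}};'''


def extract_price(script_text):
    """Return the price of the first item, or None if no object literal is found.

    Hypothetical helper: assumes the assignment has the form `var x = {...};`
    and that the object literal is valid JSON.
    """
    match = re.search(r'=\s*(\{.*\})\s*;', script_text, re.DOTALL)
    if match is None:
        return None
    data = json.loads(match.group(1))
    return data["event_data"]["items"][0]["price"]


price = extract_price(SCRIPT_TEXT)
```

Parsing the object rather than matching "price": directly keeps the lookup from breaking if the key order changes.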

Related

how to scrape popups on the website with scrapy

I want to scrape the name, age and gender of the reviewers on boots.com. The age and gender are only visible when you hover the mouse over the name in each review. First, I wrote the code below to scrape the name, but it's not working. Second, I don't know how to scrape the age and gender from the popup. Could you help me, please? Thanks in advance.
Link:https://www.boots.com/clearasil-ultra-rapid-action-treatment-cream-25ml-10084703
import scrapy
from scrapy.loader import ItemLoader

from ..items import BootsItem


class bootsSpider(scrapy.Spider):
    name = 'boots'
    start_urls = ['https://www.boots.com/clearasil-ultra-rapid-action-treatment-cream-25ml-10084703']
    allowed_domains = ["boots.com"]

    def parse(self, response):
        # One <div> per review block.
        reviews = response.xpath("//div[@class='bv-content-item-avatar-offset bv-content-item-avatar-offset-off']")
        for review in reviews:
            loader = ItemLoader(item=BootsItem(), selector=review, response=response)
            # Reviewer name, relative to the current review block.
            loader.add_xpath("name", ".//div[@class='bv-content-reference-data bv-content-author-name']/span/text()")
            yield loader.load_item()
JavaScript is used to display the data (78 reviews in your case), so the reviews are not in the HTML that Scrapy downloads. You should use Selenium to scrape them. To display all the comments, you'll have to click the following button multiple times:
//button[contains(@class,"load-more")]
Then, to scrape the names of all consumers, you can use the following XPath (then use the .text attribute to extract the data):
//li//div[@class="bv-content-header-meta"][./span[@class="bv-content-rating bv-rating-ratio"]]//span[@class="bv-author"]/*/span
Output: 78 nodes
If you want to scrape the review text, you can use:
//li//div[@class="bv-content-header-meta"][./span[@class="bv-content-rating bv-rating-ratio"]]/following::p[1]
Output: 78 nodes
To get the age and the gender of each consumer, you'll have to mouse over their names (see the preceding XPath), then fetch the values with the following XPaths:
//span[@class="bv-author-userinfo-value"][preceding-sibling::span[@class="bv-author-userinfo-data"][.="Age"]]
//span[@class="bv-author-userinfo-value"][preceding-sibling::span[@class="bv-author-userinfo-data"][.="Gender"]]
Alternatively, if you don't want to or can't use Selenium, you can download the JSON (see the XHR requests in your browser), which contains everything you need.
https://api.bazaarvoice.com/data/batch.json?passkey=324y3dv5t1xqv8kal1wzrvxig&apiversion=5.5&displaycode=2111-en_gb&resource.q0=reviews&filter.q0=isratingsonly:eq:false&filter.q0=productid:eq:868029&filter.q0=contentlocale:eq:en_EU,en_GB,en_IE,en_US,en_CA&sort.q0=submissiontime:desc&stats.q0=reviews&filteredstats.q0=reviews&include.q0=authors,products,comments&filter_reviews.q0=contentlocale:eq:en_EU,en_GB,en_IE,en_US,en_CA&filter_reviewcomments.q0=contentlocale:eq:en_EU,en_GB,en_IE,en_US,en_CA&filter_comments.q0=contentlocale:eq:en_EU,en_GB,en_IE,en_US,en_CA&limit.q0=100&offset.q0=0&limit_comments.q0=3&callback=bv_1111_50671
For this case, I set &limit.q0= to 100 and &offset.q0= to 0 to be sure to fetch all the data. Once you get the JSON, you'll find all the information in: Batched Results > q0 > Results > 0, 1, 2, 3, ..., 78
To download the JSON, use requests and extract the data with the json module.
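As a rough sketch of the extraction step, the code below walks the path described above. Because the URL carries a callback parameter, the response is probably wrapped as JSONP, so the wrapper is stripped first. The key names BatchedResults, UserNickname and ContextDataValues, and the sample payload, are assumptions for illustration and should be checked against the real response:

```python
import json
import re

# Hypothetical, heavily trimmed response used only to exercise the code.
SAMPLE = (
    'bv_1111_50671({"BatchedResults":{"q0":{"Results":['
    '{"UserNickname":"alice","ContextDataValues":'
    '{"Age":{"Value":"25to34"},"Gender":{"Value":"Female"}}},'
    '{"UserNickname":"bob","ContextDataValues":{}}]}}})'
)


def strip_jsonp(text):
    """Parse JSON that may be wrapped in a JSONP call such as callback({...})."""
    match = re.match(r'^\s*[\w.$]+\s*\((.*)\)\s*;?\s*$', text, re.DOTALL)
    body = match.group(1) if match else text
    return json.loads(body)


def extract_reviewers(payload):
    """Return name, age and gender for each review (assumed field names)."""
    rows = []
    for review in payload["BatchedResults"]["q0"]["Results"]:
        context = review.get("ContextDataValues", {})
        rows.append({
            "name": review.get("UserNickname"),
            "age": context.get("Age", {}).get("Value"),
            "gender": context.get("Gender", {}).get("Value"),
        })
    return rows


reviewers = extract_reviewers(strip_jsonp(SAMPLE))
```

In practice the text passed to strip_jsonp would be the body returned by requests.get(url).text. Missing age or gender entries come back as None rather than raising an error.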

XPath to track if a product is: in stock/its price

I am a complete beginner who knows a little bit of HTML and Java so my question might sound very dumb.
I'm basically trying to use Google Spreadsheets in order to track the availability of an item/its price on this website. I'm using the "IMPORTXML" function and have no trouble getting the title of the product or its description. However I cannot get the price as it needs me to select a size first, which I don't know how to do through the "IMPORTXML" function.
Right now, this returns "Imported content is empty.":
=IMPORTXML("https://www.artisan-jp.com/fx-hien-eng.html","//p[#id='price']")
Would creating a function through Google Script work? If so, how do I do it?
Thank you!
You won't be able to fetch any data with IMPORTXML, since JavaScript is used to display the price. With the IMPORTFROMWEB add-on you can activate JS rendering, but you'll only get the price of the default product.
It's probably better to use Selenium + Python (or any other language) to achieve your goal. That way you'll be able to click and select a specific product (size, color, hardness).
If you really want to do this with a Google solution, you'll have to write your own custom function in Google Apps Script that sends a POST request to a specific URL: https://www.artisan-jp.com/get_syouhin.php. Something like:
function myFunction() {
  // POST parameters describing the product variant.
  var formData = {
    'kuni': 'on',
    'sir': '140',
    'size': '1',
    'color': '1',
  };
  var options = {
    'method' : 'post',
    'payload' : formData
  };
  // Log the raw text returned by the server.
  Logger.log(UrlFetchApp.fetch('https://www.artisan-jp.com/get_syouhin.php', options).getContentText());
}
In the first part (formData), you declare the parameters of the POST. These parameters correspond to the properties of the product.
Sir :
XSoft = 140
Soft = 141
Mid = 142
Size :
S = 1
M = 2
L = 3
XL = 4
Color :
Red = 1
Black = 5
Output :
You'll get the reference number, the description of the product and its price.
When the product is not in stock, there's a preceding NON in the output.
It's up to you now to extract the data of interest from the output and to populate the cells of your workbook.
Assuming your function is named "mouse" and returns the response text, just use SPLIT to display the data properly.
=SPLIT(mouse();"/")
To extract the price only, you can use SPLIT then QUERY. SUBSTITUTE is used to coerce the result to a number.
=SUBSTITUTE(QUERY(SPLIT(mouse();"/");"select Col4");".";",")*1
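For anyone scripting this outside Google Sheets, the same POST can be sketched in Python with the standard library. The code tables come from the lists above; the function name build_request and the dictionary names are made up for this example, and the request is only constructed here, not sent:

```python
import urllib.parse
import urllib.request

ENDPOINT = "https://www.artisan-jp.com/get_syouhin.php"

# Codes listed in the answer above.
HARDNESS = {"XSoft": "140", "Soft": "141", "Mid": "142"}
SIZE = {"S": "1", "M": "2", "L": "3", "XL": "4"}
COLOR = {"Red": "1", "Black": "5"}


def build_request(hardness, size, color):
    """Build (but do not send) the POST request for one product variant."""
    form = {
        "kuni": "on",
        "sir": HARDNESS[hardness],
        "size": SIZE[size],
        "color": COLOR[color],
    }
    body = urllib.parse.urlencode(form).encode("ascii")
    # Supplying data makes urllib issue a POST instead of a GET.
    return urllib.request.Request(ENDPOINT, data=body)


request = build_request("XSoft", "S", "Red")
```

Passing the request to urllib.request.urlopen(request) would send it. The response would then need the same splitting on "/" that the SPLIT formula performs.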

Magento : how to set canonical URL for product?

I'm using Magento 1.7.0.2, and some products have 3 or 4 different URLs:
example.com/category1/product1.html
example.com/category1/category2/product1.html
example.com/product1.html
...
In the HTML code of product1, Magento adds this tag:
<link rel="canonical" href="http://example.com/product1.html">
How can I choose another URL? For example:
<link rel="canonical" href="http://example.com/category1/category2/product1.html">
Unfortunately it can't be done with the default Magento functionality.
First, you need to decide which criteria to take into account when defining the canonical URL.
Our Magento SEO extension will let you choose:
The longest URLs (the ones that contain the biggest # of characters)
URLs with the maximum category depth level
Non-root URLs with the minimum category depth level (and at least one category)
Non-root URLs with the minimum # of characters (and at least one category)
These URLs can also be added to an HTML and XML sitemaps.
On top of that, you can individually select any URL for any chosen product or enter a custom canonical.
You can also try it on your own, using the collection of rewrites. For the Community Edition it looks like this:
$collection = Mage::getResourceModel('core/url_rewrite_collection');
// Keep only system rewrites for this product that include a category.
$collection->getSelect()->where('product_id = ? AND category_id is not null AND is_system = 1', $productId, Zend_Db::INT_TYPE);
$collection->addStoreFilter(Mage::app()->getStore()->getId());
// The sort order depends on the rule chosen for the canonical URL (here: longest path first).
$collection->getSelect()->order(new Zend_Db_Expr('LENGTH(request_path) ' . 'DESC'));
$rewriteModel = $collection->getFirstItem();
// Dump the retrieved object.
var_dump($rewriteModel);
Then you need to concatenate the base store URL + the "request_path" property from the object + the URL suffix (if needed).
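To make the selection and concatenation step concrete, here is a small language-neutral sketch in Python rather than Magento code. It mirrors the "longest request_path first" ordering used in the query and then joins the three parts. The function name build_canonical and the sample paths are illustrative only:

```python
def build_canonical(base_url, request_paths, suffix=""):
    """Pick the longest request path and join it to the base store URL.

    Mirrors ORDER BY LENGTH(request_path) DESC followed by getFirstItem().
    """
    longest = max(request_paths, key=len)
    return base_url.rstrip("/") + "/" + longest.lstrip("/") + suffix


# Paths taken from the example URLs in the question.
paths = [
    "category1/product1.html",
    "category1/category2/product1.html",
    "product1.html",
]
canonical = build_canonical("http://example.com/", paths)
```

Swapping max for min, or sorting on the number of "/" characters instead of the length, would correspond to the other criteria listed above.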

Google spreadsheet xpath only selects arrays instead of single values

I am trying to use this bit of code to select a single episode from a show database. I only want the latest episode, but Google Spreadsheet keeps returning an array with the latest episode from every season.
=importXML("http://services.tvrage.com/feeds/episode_list.php?sid="&B2, "//episode[number(translate(airdate, '-', '')) < "&I2&"][last()]")
B2 is the show id (11215) and I2 is today's date in ISO-style format (20130626). The date is obtained in Google Spreadsheet with the formula =TEXT( TODAY() ; "yyyyMMdd" )
So can anyone help me get just the latest episode for the current season?
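Put parentheses around the first part of your XPath expression, so that [last()] applies to the whole result set instead of to the episodes of each season separately: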
(//episode[number(translate(airdate, '-', '')) < "&I2&"])[last()]
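The difference can be seen in a small Python sketch. The feed layout below (Show > Episodelist > Season > episode) and its contents are assumed for illustration. ElementTree only supports a subset of XPath, so the date filter is left out and the parenthesised form is emulated by taking the last element of the full list:

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature feed with two seasons.
FEED = """
<Show>
  <Episodelist>
    <Season no="1">
      <episode><airdate>2012-05-01</airdate><title>S1 first</title></episode>
      <episode><airdate>2012-05-08</airdate><title>S1 last</title></episode>
    </Season>
    <Season no="2">
      <episode><airdate>2013-06-01</airdate><title>S2 first</title></episode>
      <episode><airdate>2013-06-08</airdate><title>S2 last</title></episode>
    </Season>
  </Episodelist>
</Show>
"""

root = ET.fromstring(FEED)

# //episode[last()]: the predicate is evaluated per parent, so one node per season.
per_season = [e.findtext("title") for e in root.findall(".//episode[last()]")]

# (//episode)[last()]: the predicate is applied to the combined node set.
overall = root.findall(".//episode")[-1].findtext("title")
```

Here per_season holds one title per season while overall holds a single title, which is the behaviour the parentheses give in the spreadsheet formula.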

Magento display a 3rd price on the product page

I would like to add a third price to my products, like the "MSRP" in the latest version, which is not available in mine (Magento 1.4.1.1).
I would like to display it on the product page and it has to change according to the selected option.
So, I started by creating a new attribute called "msrp".
I managed to get it for every child product on my view.phtml file with the following code:
<?php
if ($_product->isConfigurable())
{
    // Simple products associated with this configurable product.
    $_associatedProducts = $_product->getTypeInstance()->getUsedProducts();
    foreach ($_associatedProducts as $assProducts)
    {
        $msrp = $assProducts->getData("msrp");
        echo "MSRP: ".$msrp."<br />" ;
    }
}
?>
Now the question is: how do I show only one MSRP at a time, namely the one that corresponds to the selected option?
(Just like the normal price changes when we select an option.)
Maybe with a piece of javascript in this file?
/app/design/frontend/[...]/[...]/template/catalog/product/view/type/options/configurable.phtml
Thanks for your help !
