How to webscrape deeply nested XML?

I'm trying to scrape XML data into Google Sheets using the IMPORTXML function. I previously tried the IMPORTHTML function, but its "list" and "table" query types don't work for this page.
I can't work out the correct way to scrape the "Price (GBX)" value from a simple market data page such as https://markets.ft.com/data/equities/tearsheet/summary?s=LSE:LSE
I have tried searching by div, span and class name, but I'm not getting anywhere.
I have also tried copying the XPath /html/body/div[3]/div[2]/section[1]/div/div/div[1]/div[3]/ul/li[1]/span[2]
but this doesn't seem to work.
Is it possible to retrieve deeply nested data through XML scraping in Google Sheets?

Try using
=IMPORTXML(A1,"//ul/li/span[@class='mod-ui-data-list__label'][contains(text(),'GBX')]/following-sibling::span")
Output:
7576.00
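
To see what that XPath is doing, here is a minimal Python sketch of the same selection logic: find the label span whose class matches and whose text contains 'GBX', then take the span siblings that follow it. The sample markup and the function name following_sibling_values are assumptions made for illustration (the real page is much larger), and the standard-library ElementTree does not support the following-sibling axis, so the traversal is written out by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup modelled on the class name used in the XPath.
SAMPLE = """
<ul>
  <li>
    <span class="mod-ui-data-list__label">Price (GBX)</span>
    <span class="mod-ui-data-list__value">7576.00</span>
  </li>
  <li>
    <span class="mod-ui-data-list__label">Today's Change</span>
    <span class="mod-ui-data-list__value">-10.00</span>
  </li>
</ul>
"""


def following_sibling_values(root, label_class, needle):
    """Mimic //ul/li/span[@class=...][contains(text(), needle)]/following-sibling::span."""
    values = []
    for ul in root.iter("ul"):
        for li in ul.findall("li"):
            spans = li.findall("span")
            for index, span in enumerate(spans):
                # Match the label span by class and by its first text node.
                if span.get("class") == label_class and needle in (span.text or ""):
                    # Every span after the label, inside the same <li>.
                    values.extend(s.text for s in spans[index + 1:])
    return values


root = ET.fromstring(SAMPLE)
print(following_sibling_values(root, "mod-ui-data-list__label", "GBX"))
```

Matching on the label text rather than on a position such as li[1]/span[2] is what makes the expression robust when the page layout shifts.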

Related

Get synonyms of a word from Thesaurus using Google Sheets importxml function

I am trying to build a dictionary in Google Sheets that includes one synonym for each word. I want to get only the first synonym from the Thesaurus search result page by copying its XPath and pasting it into the IMPORTXML function. Could anybody help me with the complete formula, so that I can apply it to all of my words? (There are hundreds of words, so I need automation.)

You can use something like this (first we concatenate to build the URL, then we pass the XPath expression):
=IMPORTXML("https://www.thesaurus.com/browse/"&B2,"(//h2)[1]/following::li[1]//text()")
EDIT: To get 3 synonyms in one cell, use:
=TEXTJOIN(";",TRUE,IMPORTXML("https://www.thesaurus.com/browse/"&B2,"(//h2)[1]/following::li[position()<4]//text()"))
These formulas use commas as argument separators; in locales where Google Sheets expects semicolons, replace the commas between arguments with semicolons.
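
As a rough illustration of the two steps in that answer (build the URL from the word, then take the text of the first few li elements after the first h2), here is a Python sketch using only the standard library. The sample markup and the names build_url and first_synonyms are assumptions for illustration; the real Thesaurus page is structured differently and may change.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

BASE_URL = "https://www.thesaurus.com/browse/"

# Hypothetical, simplified markup; not the real page.
SAMPLE_PAGE = """
<html><body>
  <h2>Synonyms for happy</h2>
  <ul>
    <li><a>cheerful</a></li>
    <li><a>contented</a></li>
    <li><a>delighted</a></li>
    <li><a>ecstatic</a></li>
  </ul>
</body></html>
"""


def build_url(word):
    """Same idea as "https://www.thesaurus.com/browse/"&B2, with URL escaping."""
    return BASE_URL + quote(word)


def first_synonyms(root, limit=3):
    """Mimic (//h2)[1]/following::li[position() <= limit]//text()."""
    first_h2 = next(root.iter("h2"), None)
    if first_h2 is None:
        return []
    inside_h2 = set(first_h2.iter())  # the following axis excludes descendants
    texts, passed_h2, taken = [], False, 0
    for element in root.iter():  # document order
        if element is first_h2:
            passed_h2 = True
            continue
        if not passed_h2 or element in inside_h2 or element.tag != "li":
            continue
        if taken == limit:
            break
        taken += 1
        # Collect the non-empty text nodes under this <li>.
        texts.extend(t.strip() for t in element.itertext() if t.strip())
    return texts


root = ET.fromstring(SAMPLE_PAGE)
print(build_url("happy"))
print(";".join(first_synonyms(root)))  # joined like TEXTJOIN(";", ...)
```

Passing limit=1 corresponds to the single-synonym formula, and limit=3 to the TEXTJOIN variant.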

Need to Get data for an index for a timeframe using curl or using url

I need to get data from an index for a given timeframe, using curl or a URL.
I was able to get the data using the URL below:
http://localhost:9200/index_name/_search?size=10&q="* AND severity:major|critical"
but I am not sure where to specify the timeframe. For example, I only want data from the last 15 minutes.
Can anyone help me with a way to do this?

You can do it by adding @timestamp:[now-15m TO now] to your query:
http://localhost:9200/index_name/_search?size=10&q="* AND severity:major|critical AND @timestamp:[now-15m TO now]"
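
When the URL is sent from curl or a script, the spaces, brackets and other special characters in the query need to be percent-encoded. The following Python sketch builds such a URL with the standard library. The function name build_search_url is hypothetical, the query is passed without the surrounding double quotes, and it assumes the time field is named @timestamp; adjust both to match your index.

```python
from urllib.parse import urlencode


def build_search_url(base, index, query, size=10):
    """Return a _search URL whose q parameter is safely percent-encoded."""
    params = urlencode({"size": size, "q": query})
    return f"{base}/{index}/_search?{params}"


# Query string taken from the answer above; the field name is an assumption.
QUERY = "* AND severity:major|critical AND @timestamp:[now-15m TO now]"

url = build_search_url("http://localhost:9200", "index_name", QUERY)
print(url)
```

The printed URL can then be passed to curl inside single quotes so the shell does not interpret any of its characters.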

Using custom fields in freemarker

I am trying to print a custom field in a PDF template. The field comes from a saved search, where it is named Bank Name (Custom). How do I use this name in the FreeMarker PDF template? I have not been able to fetch the value with attempts such as subs.bankname, where subs is the result of the saved search.

A NetSuite custom field ID is prefixed with "cust" (i.e. custrecord, custentity, custbody, etc.), so you need to find the field's actual ID in order to display it with the FreeMarker syntax. Also, as "subs" is the result of a saved search, you might need to iterate over all rows.
In your case it would be something like this (to display the first row):
${subs[0].cust_bankname}
Or the following to iterate over all rows:
<#list subs as sub>
${sub.cust_bankname}
</#list>
I hope it helps.

Official reference for Google Spreadsheet Api Structured Query syntax

I'm looking for the official reference for the query syntax used for creating structured queries for requesting rows in the Google Spreadsheet API, as discussed here
The only example given is:
GET https://spreadsheets.google.com/feeds/list/key/worksheetId/private/full?sq=age>25%20and%20height<175
There must be some references of the query syntax used somewhere?
In particular I want to know how to query for all rows (containing column a and b) for which val(a) != val(b)

Can't get the result with yql and xpath

I would like to fetch part of a web page with YQL. I have tried several queries, and all but one of them return the correct result.
Here is the failing query:
select * from html where url="http://www.cngold.org/img_date/livesilvercn_rmb.html" and xpath='//div[6]/div[2]/div/div[2]/table/tbody/tr[4]/td[6]'
I expect to get the price, but the result is empty.
If I retrieve the whole page with YQL and check the XPath of that element, the XPath is instead
//div[3]/div/div[2]/a/div/div[2]/table/tbody/tr[4]/td[6]
Why are the two XPaths so different?
How should I handle this situation?
Thanks in advance.

YQL cannot get values that are computed dynamically. In that case, you are better off using PhantomJS.
This answer https://stackoverflow.com/a/7978072/1337392 lists several tools you can use for HTML scraping.
Hope it helps!
