Import XML google spreadsheet is not working Properly - xpath

find attached link for my Google spreadsheet in which i tried to pull out the data from booking.com.
i retrieved Hotel name in column. it successfully retrieved the hotel name. but not work with every column. Output is like this.
even i referenced the formula correctly. but it automatically changed to like this..=CONTINUE(E1, 2, 1)
red highlighted cell is having problem. look there.
Please find below link for your reference. Thanks in Advance.
https://docs.google.com/spreadsheet/ccc?key=0AtjX3Ttg50IydHJzR1FMSkJyY0U1bHBSeWtrMDRSZ1E

The formula in E1:
=ImportXML(D1,"//div[#id = 'wrap-hotelpage-top']/h1[#class='item']/span")
is returning a 2 x 2 array. To return just the top left-cell and avoid population of the CONTINUE functions (so that you can fill down successfully), use
=INDEX(ImportXML(D1,"//div[#id = 'wrap-hotelpage-top']/h1[#class='item']/span");1;1)

Related

How to properly scraping filtered content using XPath Query to Google Sheet?

So, this is about a content from a website which I want to get and put it in my Google Sheets, but I'm having difficulty understanding the class of the content.
target link: https://www.cnbc.com/quotes/?symbol=XAU=
This number is what I want to get from. Picture 1: The part which i want to scrape
And this is what the code looks like in inspector. Picture 2: The code shown in inspector
The target is inside a span attribute but the span attribute looks very difficult to me, so I tried to simplify it using this line of code here =IMPORTXML("https://www.cnbc.com/quotes/?symbol=XAU=","//table[#class='quote-horizontal regular']//tr/td/span")
Picture 3: List is shown when putting the code
After some tries, I am able to get the right target, but it confuse me, Im using this code =IMPORTXML("https://www.cnbc.com/quotes/?symbol=XAU=","//table[#class='quote-horizontal regular']//tr/td/span[#class='last original'][1]")
Picture 4: The right target is shown when the xpath query is more specified
As what you can see in 2nd Picture, 'last original' is not really the full name of the class, when I put the 'last original ng-binding' instead it gave me an error saying imported content is empty
So, correct me if my code is wrong, or accidental worked out somehow because there's another correct way?
How about this answer?
Modified formula 1:
When the name of class is last original and last original ng-binding, how about the following xpath and formula?
=IMPORTXML(A1,"//span[contains(#class,'last original')][1]")
In this case, the URL of https://www.cnbc.com/quotes/?symbol=XAU= is put in the cell "A1".
In this case, //span[contains(#class,'last original')][1] is used as the xpath. The value of span that the name of class includes last original is retrieved. So last original and last original ng-binding can be used.
Modified formula2:
As other xpath, how about the following xpath and formula?
=IMPORTXML(A1,"//meta[#itemprop='price']/#content")
It seems that the value is included in the metadata. So this sample retrieves the value from the metadata.
Reference:
IMPORTXML
To complete #Tanaike's answer, two alternatives :
=IMPORTXML(B2;"//span[#class='year high']")
"Year high" seems always equal to the current stock index value.
Or, with value retrieved from the script element :
=IMPORTXML(B2;"substring-before(substring-after(//script[contains(.,'modApi')],'""last\"":\""'),'\')")
Note : since I'm based in Europe, you need to replace ; with , in the formulas.

Extract substring using importxml and substring-after

Using Google sheet 'ImportXML', I was able to extract the following data from a url(in cell A2) using:
=IMPORTXML(A2,"//a/#href[substring-after(., 'AGX:')]").
Data:
/vector/AGX:5WH
/vector/AGX:Z74
/vector/AGX:C52
/vector/AGX:A27
/vector/AGX:C6L
But, I want to extract the code after "/vector/AGX:". The code is not fixed to 3 letters and number of rows is not fixed as well.
I used =INDEX(SPLIT(AP2,"/,'vector',':'"),1,2). But it applied to only one line of data. Had to copy the index+split function to the whole column and had to insert an additional column to store the codes.
5WH
Z74
C52
A27
C6L
But, I want to be able to extract the code(s) after AGX: using ImportXML in one go. Is there a way?
Solution
Your issue is in how you are implementing the index formula. The first parameter returns the rows (in your case each element) and the second the column (in your case either AGX or the code after that).
If instead of getting a single cell we apply this formula on a range and we do not set any value for the row, the formula will return all the values achieving what you were aiming for. Here is its implementation (where F1:F5 will be the range of values you want this formula to be applied) :
=INDEX(SPLIT(F1:F5,"/,'vector',':'"),,2)
If you are interested in a solution simply using IMPORTXML and XPATH, according to the documentation you could use a substring as follows:
=IMPORTXML(A1,"//a/#href[substring-after(.,'SGX:')]")
The drawback of this is that it will return the full string and not exclusively what is after the SGX: which means that you would need to use a Google sheet formula to splitting this. This is the furthest I have achieved exclusively using XPath. In XML it would be easier to apply a forEach and really select what is after the : but I believe in sheets is more complicated if not impossible just using XPath.
I hope this has helped you. Let me know if you need anything else or if you did not understood something. :)

Scraping all data from Reddit searches

I am using PRAW to scrape data off of reddit. I am using the .search method to search very specific people. I can easily print the title of the submission if the keyword is in the title, but if the keyword is in the text of the submission nothing pops up. Here is the code I have so far.
import praw
reddit = praw.Reddit(----------)
alls = reddit.subreddit("all")
for submission in alls.search("Yoa ming",sort = comment, limit = 5):
print(submission.title)
When I run this code i get
Yoa Ming next to Elephant!
Obama's Yoa Ming impression
i used to yoa ming... until i took an arrow to the knee
Could someone make a rage face out of our dearest Yoa Ming? I think it would compliment his first one so well!!!
If you search Yoa Ming on reddit, there are posts that dont contain "Yoa Ming" in the title but "Yoa Ming" in the text and those are the posts I want.
Thanks.
You might need to update the version of PRAW you are using. Using v6.3.1 yields the expected outcome and includes submissions that have the keyword in the body and not the title.
Also, the sort=comment parameter should be sort='comments'. Using an invalid value for sort will not throw an error but it will fall back to the default value, which may be why you are seeing different search results between your script and the website.

Google sheets api adding multiple rows

I'm trying to figure this out and I must be overlooking something basic. (It took me WAAAY longer than it should have just to realize I hadn't added the trigger.)
When forms are submitted, if that page runs out of rows, it automatically expands. I have a reconciliation page where it is pulling the submitted data over line by line and analyzing it for discrepancies (the form collects billable time and tasks.)
So while the Form Responses 1 page will expand, I want to use a trigger on form submit to add a line to the reconciliation page and copy the formulas down. I can't seem to get the line to add though. Looking at the google page for expanding the number of rows, I'm not sure what I am doing wrong there either but I THINK I need to add more java features to my computer.
If I simply copy and paste the example into a new sheet, most of the code is black instead of the standard editor colors. Saving pops up "Missing ; before statement. (line 1, file "Code")"
Line one is simply "import com.google.gdata.client.spreadsheet.*;"
So zerothly: Whats the most basic code I can use to add that blank row?
Then first: Do I need to import a bunch of stuff to get this (adding rows) to work?
Second: If so, and I transfer ownership of the sheet to someone, do they need to do the imports also?
Third: If so, and I want to do edits on another device, will I need to do imports there too?
Fourth: The example uses Update() but I can't seem to find an Update() function in javascript or googlesheets api documentation.
This is the code I am trying and variations commented out which doesn't seem to work:
function onFormSubmit(e) {
Logger.log('form submit triggered')
var sheet = SpreadsheetApp.getActive()
var sss = sheet.getSheetByName('Reconciliation')
var col2 = sss.getRange("B:B");
var col2val = col2.getValues();
var counter = 0;
var sssrange = sss.getDataRange();
// sss.Rows = sss.getLastRow() + 1 //Object does not allow properties to be added or changed if I uncomment - this seems to match the google example line though
Logger.log(sss.getLastRow());//=8
var newsssrange = sssrange.offset(1,0); // didn't actually think this would work (since it also had the .update() part that previously didn't work for me) but came across it and was getting desparate.
// sss.setRowCount(sss.getLastRow() + 1); // TypeError: Cannot find function setRowCount in object Sheet.
Logger.log(sss.getLastRow()); //=8
// sss.Update();//TypeError: Cannot find function Update in object Sheet.
}
Sigh... I am still wondering how to add more rows but I did answer my original need of adding a single line since form submits only add a single line. So I'm going to answer it since I had done so many searches and for some reason this never came up, maybe someone will find this useful if they are having the same issue.
function onFormSubmit(e) {
var sheet = SpreadsheetApp.getActive()
var sss = sheet.getSheetByName('Reconciliation')
sss.appendRow(['']);
}
Note that this adds a single blank line. If you run it a second time it won't add a second blank line as appendRow() adds after the last line with data. If you put a string in there or something and run it over and over you will get multiple lines.
I really would like to know about adding multiple lines though also since that will come up and I still seem to be missing something, probably obvious.
Did you know that arrayformula woluld make new lines automatically.? If you paste this formula in Sheet2:
=OFFSET(Sheet1!A1,,,counta(Sheet1!A:A))
and then paste new values in Sheet1 range A:A, then new rows would be added on Sheet2.

ImportXML Xpath Query Return txt

I need to use Google Spreadsheet ImportXML return a value from this website...
http://www.e-go.com.au/calculatorAPI2?pickuppostcode=2000&pickupsuburb=SYDNEY+CITY&deliverypostcode=4000&deliverysuburb=BRISBANE&type=Carton&width=40&height=35&depth=65&weight=2&items=3
the website simply displays the below in text and code...
error=OK
eta=Overnight
price=64.69
I need to return the values after last line 'price=', being a newbee I'm struggling with xpath query (?) required to make this happens...
=importxml("url",?)
Your help is greatly appreciated.
Thank you in advance.
Regards
first of all, IMPORTXML() won't work because your webpage is not formatted correctly for XML, and google sheets doesn't like it.
All hope is not lost tho, as your output is so simple. you can simply load the whole output using IMPORTDATA() and then process within google sheets
have a look at the output of the following formulae (where the url is stored in A1)
=IMPORTDATA(A1)
=transpose(IMPORTDATA(A1))
=index(IMPORTDATA(A1),3,1) - IF there are always 3 results, and price will always be in the third one this will work
=filter(IMPORTDATA(A1),left(IMPORTDATA(A1),5)="price") - if the price can appear in any of the result lines, but always starting with "price"

Resources