I am doing research on how mobile phones have evolved over the years, so I need to create a database with the specifications of as many phones as possible. I am trying to scrape data from the GSM Arena website.
Example page: http://www.gsmarena.com/samsung_galaxy_note7-8082.php
I am using XPath expressions that contain the label preceding each value, for example //tr[contains(.,"Sensors")]/td[2]
But some values, the last one in each category, have no preceding label.
How do I pick this info:
Non-removable Li-Po 3500 mAh battery
or this info:
Fast battery charging
Qi wireless charging (market dependent)
ANT+ support
S-Voice natural language commands and dictation
MP4/DivX/XviD/WMV/H.265 player
MP3/WAV/WMA/eAAC+/FLAC player
Photo/video editor
Document editor
Note that different phones have a different number of rows on the page, so using [number] in the XPath would pick different info from each page:
http://www.gsmarena.com/samsung_galaxy_note7-8082.php - need to pick 5th row of features
http://www.gsmarena.com/samsung_sgh_600-49.php - need to pick 8th row of features
To select the rows without a label in the Battery section, use this XPath:
//tbody[.//th[contains(.,'Battery')]]//td[@class="ttl" and not(*)]/following-sibling::td
To select the unlabeled info from Features, use this:
//tbody[.//th[contains(.,'Features')]]//td[@class="ttl" and not(*)]/following-sibling::td
To select the Camera features:
//tbody[.//th[contains(.,'Camera')]]//td[@class="ttl" and contains(.,'Features')]/following-sibling::td
To select Loudspeaker in the Sound category:
//tbody[.//th[contains(.,'Sound')]]//td[@class="ttl" and contains(.,'Loudspeaker')]/following-sibling::td
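A quick way to check these expressions offline is to run them against a small hand-written fragment. The sketch below assumes the third-party lxml package and a made-up table that mimics the layout described above (a th naming the section, td class="ttl" cells holding the label inside a link, and an empty label cell for the unlabeled row). The real page markup may differ, the "Stand-by" row and the "placeholder value" cells are invented, and the helper name unlabeled_values is hypothetical.

```python
from lxml import html  # third-party package: pip install lxml

# Hand-written stand-in for two spec tables; NOT the real GSM Arena markup.
SAMPLE = """
<html><body>
<table><tbody>
  <tr><th rowspan="2">Battery</th>
      <td class="ttl">&nbsp;</td>
      <td class="nfo">Non-removable Li-Po 3500 mAh battery</td></tr>
  <tr><td class="ttl"><a href="#">Stand-by</a></td>
      <td class="nfo">placeholder value</td></tr>
</tbody></table>
<table><tbody>
  <tr><th rowspan="2">Features</th>
      <td class="ttl"><a href="#">Sensors</a></td>
      <td class="nfo">placeholder value</td></tr>
  <tr><td class="ttl">&nbsp;</td>
      <td class="nfo">Fast battery charging</td></tr>
</tbody></table>
</body></html>
"""

def unlabeled_values(tree, section):
    """Return the text of value cells whose label cell has no child element."""
    query = ('//tbody[.//th[contains(., $section)]]'
             '//td[@class="ttl" and not(*)]/following-sibling::td')
    # lxml binds $section from the keyword argument of the same name.
    return [td.text_content().strip() for td in tree.xpath(query, section=section)]

tree = html.fromstring(SAMPLE)
print(unlabeled_values(tree, "Battery"))
print(unlabeled_values(tree, "Features"))
```

Because the section is matched by its heading text and the row by its empty label cell, the result does not depend on how many rows the section has on a given phone's page.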
I'm trying to extract information from Wikipedia tables.
More specifically, I'm trying to make a list of all teams and all players in the premier league.
So far I can traverse all the teams in the 2019-2020 Premier League table of teams; for every team I open its Wikipedia page and traverse its players to get their information.
I assumed there was a fixed template in which every Premier League team page has its table of players at position 3, but after six teams I hit a team whose table is at position 2.
So I was using the following XPath query on every team's wiki page
"//table[3]/tbody//tr[position() > 1]//td[4]//span/a/@href"
but, for example, the following team's players table is at position 2. How can I make this query more generic rather than fixing it to a certain position? I have noticed that all of the relevant tables are preceded by an element with the text "First-team squad".
The HTML of the table is too long, so I am posting the wiki link of one team here
https://en.wikipedia.org/wiki/Crystal_Palace_F.C.
Hope to get help! thanks.
You have to use another "anchor" which works for each page. The table you need is always the first one after the span element "Players".
So with this:
//span[@id='Players']/following::table[1]//span[@class="fn"]//text()
you'll get the names of all players in the team's current squad.
With this:
//span[@id='Players']/following::table[1]//span[@class="fn"]//@href
you'll get the associated URLs. /!\ Some players don't have a Wikipedia page.
So you can end up with 26 player names but only 25 URLs, like here:
https://en.wikipedia.org/wiki/Chelsea_F.C.
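One way to keep names and URLs aligned when some players have no page is to select each span once and read both values from it. This is only a sketch: it assumes the third-party lxml package and a made-up fragment shaped like the markup the XPath above expects, and the function name squad and the Player A/B/C entries are hypothetical.

```python
from lxml import html  # third-party package: pip install lxml

# Made-up fragment in the shape the XPath expects; NOT real Wikipedia markup.
SAMPLE = """
<html><body>
<h2><span id="Players">Players</span></h2>
<table><tbody>
  <tr><td><span class="fn"><a href="/wiki/Player_A">Player A</a></span></td></tr>
  <tr><td><span class="fn">Player B</span></td></tr>
</tbody></table>
<table><tbody>
  <tr><td><span class="fn"><a href="/wiki/Player_C">Player C</a></span></td></tr>
</tbody></table>
</body></html>
"""

def squad(tree):
    """Return (name, url_or_None) pairs from the first table after 'Players'."""
    spans = tree.xpath(
        '//span[@id="Players"]/following::table[1]//span[@class="fn"]')
    players = []
    for span in spans:
        name = span.text_content().strip()
        hrefs = span.xpath('.//@href')  # empty list when the player has no link
        players.append((name, hrefs[0] if hrefs else None))
    return players

tree = html.fromstring(SAMPLE)
for name, url in squad(tree):
    print(name, url)
```

Reading the name and the link from the same span means a player without a page gets None instead of shifting every later URL up by one.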
I am trying to import the Sector and Industry tags into a google doc.
I am using...
=IMPORTXML("https://finance.yahoo.com/quote/ABT/profile?p=ABT","//span[@data-reactid='21']")
Shows in three different cells:
Profile
Healthcare
About Our Ads
But all I want is Healthcare in one cell
=IMPORTXML("https://finance.yahoo.com/quote/ABT/profile?p=ABT","//span[@data-reactid='25']")
Shows in three different cells:
Financials
Medical Devices
Sitemap
But all I want is Medical Devices in one cell
What is wrong with my syntax?
I think your XPath matches two other spans as well, hence three rows instead of one.
You might want to use a more precise XPath, like these:
=IMPORTXML(
"https://finance.yahoo.com/quote/ABT/profile?p=ABT",
"//p/span[text()='Sector']/following-sibling::span[1]/text()")
=IMPORTXML(
"https://finance.yahoo.com/quote/ABT/profile?p=ABT",
"//p/span[text()='Industry']/following-sibling::span[1]/text()")
I want to analyse the sales of a certain company in Power Bi. I have a customer dataset with nine columns (gender, city, age range, hair colour etc.) and one million records. Now I want to put those columns in a matrix. For instance:
Rows: Gender
Columns: Age Range (<16, 17-20, 21-25 etc.)
Values: Number of Sales
I present this dashboard to some people and I want to 'play' with the data: what happens if I change the rows to 'hair colour', for instance? Is there a way to do this without using bookmarks? In one sentence: can I swap the rows and columns of a matrix while presenting the dashboard, when I cannot use the 'Fields' option? Or could you at least point me in the right direction? It would really help me. Thanks in advance!
Unfortunately, the quickest way to do this is using the Fields pane. The only other option available would be Bookmarks, but I guess you have already tried that. I guess you are looking for an option similar to the one available in "Pivot Charts", where you can "Switch Rows/Columns" with the click of a button. That option is not available in Power BI at this point, as far as I know.
I'm new to Tableau. I'm using Tableau Desktop Professional 10.0.15. I need to write a very simple report that does not use any visualization.
Here's an example of the layout (the numbers are made up):
Web Site 1                North America   Europe
Total Hits                3,523,483       3,523,483
Sessions                  1,248,234       1,248,234
Unique Visitors           1,809,392       1,809,392
New Visitors              383,932         383,932
% new                     10.9%           10.9%
Avg Page Views per user   1.9             1.9
Web Site 2                North America   Europe
Total Hits                3,523,483       3,523,483
Sessions                  1,248,234       1,248,234
Unique Visitors           1,809,392       1,809,392
New Visitors              383,932         383,932
% new                     10.9%           10.9%
Avg Page Views per user   1.9             1.9
The users want the measures to be in one column, but they're not the same measures. Some measures need to be formatted as percentages. The average should have 1 decimal place. I have a feeling it's not possible to format the same measure differently in Tableau. Ideally, there would be something like a banded report where I could stack the measures on top of each other. But, I don't see a way to do that in Tableau. I could create a table in my database and put the measures in the same field and add the formatting in the database (which feels wrong), but it would have to be text (to have '%'). But, Tableau won't treat a text field as a measure. Also, it seems like if you don't add a measure, Tableau will insert a fake measure and put 'Abc' as the value (at least, I think that's why I'm getting these 'Abc' columns in my reports that I didn't add and that aren't in my data).
It seems like Tableau wants you to do something like this:
Unfortunately, this is not what my users want. Any suggestions?
In Tableau it's possible to put many measures in a single column by placing Measure Names on rows.
To do this, add the dimension called Measure Names (the last one) to the Filters shelf. Select all the measures you'd like to show (Hits, Sessions, Visitors, Views, etc.). Then drag this dimension to the Rows shelf. Next, drag the measure called Measure Values (also the last one) to the Marks shelf, specifically onto the Text box. You can also add other dimensions (such as website) to the Rows shelf, to the left of Measure Names. This will show a table similar to your requirement.
By default, each entry in Measure Values is the SUM of its measure. Just right-click on one and select the aggregation you need (AVG, COUNT or other).
Finally, you can format each measure as you want: right-click a measure value and select Format...
In Google Spreadsheets I have a column of various dates (these are employees' start dates). I want the cells to be highlighted when today's date is within a week of these start dates.
I have already been playing with =(B4-TODAY())>7 but this seems to highlight all the past dates.
If this is not possible, just being able to highlight this month's dates is fine (which is easy to do in Excel, but I can't seem to figure it out in Google Spreadsheets).
Then, once this has been done, I have another column with a drop-down selection of DONE and PENDING.
I would like to conditionally format it so that when DONE is selected, the highlighted start dates in this month (or within 7 days of today) are highlighted in a different colour.
So it can easily be seen that in 1 week employees are coming, and when done is clicked, we can see their administrative stuff has been dealt with.
Please try =B1="DONE" for the alternative colour and for the +/-7 days:
=and(A1<today()+7,A1>today()-7)
in that order.
=and(…) is used in one of the formulae because the relevant condition is for a bounded range. When I enter =today() in Google Spreadsheets and change that cell’s format to Number I see 41,845.00. Since one week either side makes up the ‘band’ to which attention is to be drawn, the relevant values for CF run from 41,838 to 41,852. Note that the strict < and > in the formula above leave out those two end values themselves; use <= and >= if dates exactly seven days away should also be highlighted.
But for display purposes I switch to one day either side, rather than one week, and leave off 41840 throughout, so today becomes represented by 5, and the reduced range of interest therefore 4 to 6 (both inclusive). Of all the possibilities, any value up to and including 3, and 7 or greater, is to be ignored for CF:
The range of interest is everything less than 7 (green) that is also more than 3 (blue):
For “that is also” Google prefers and. In case of any remaining uncertainty creating your own example with a week either side of 41845 etc may help.
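The same bounded-range test can be tried outside the spreadsheet. The sketch below is a rough Python equivalent of the =and(…) condition, not what Google Sheets runs internally; the names SHEETS_EPOCH, to_serial and within_week are made up for illustration, and the serial number assumes the usual spreadsheet convention that day 0 is 30 December 1899.

```python
from datetime import date, timedelta

# Assumption: spreadsheet serial numbers count days from 1899-12-30.
SHEETS_EPOCH = date(1899, 12, 30)

def to_serial(d):
    """Whole-day serial number of a date, as shown with Number formatting."""
    return (d - SHEETS_EPOCH).days

def within_week(start, today, days=7):
    """Mirror =and(A1<today()+7, A1>today()-7): strict on both sides."""
    window = timedelta(days=days)
    return today - window < start < today + window

today = date(2014, 7, 25)
print(to_serial(today))  # 41845, the value quoted above

for offset in (-7, -6, 0, 6, 7):
    start = today + timedelta(days=offset)
    print(offset, to_serial(start), within_week(start, today))
```

With the strict comparisons, offsets of -6 to 6 days pass and exactly 7 days either way does not; replacing < with <= in within_week would include both ends of the band.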