Splitting XPATH produces more results than the actual possible - xpath

I have been trying to gather historical data on football club managers and noticed some weird behaviour. I am trying to scrape the table of clubs a manager has managed from this page: https://www.transfermarkt.co.in/carlo-ancelotti/profil/trainer/523
With the entire XPath passed as a single expression, the code works as expected:
clubs = response.xpath("//div[@id='yw1']//td[@class='hauptlink no-border-links']//a/text()").extract()
print(clubs)
Output : ['Everton', 'SSC Napoli', 'Bayern Munich ', 'Real Madrid', 'Paris SG',\
'Chelsea', 'Milan', 'Juventus', 'AC Parma', 'Reggiana', 'Italy']
That's the list of clubs from the history table mentioned above. However, when the XPath is split as shown in the following code, it also fetches club names from the other table, even though that table has a completely different div id (it is not 'yw1'):
career_table = response.xpath("//div[@id='yw1']")
clubs = career_table.xpath("//td[@class='hauptlink no-border-links']//a/text()").extract()
print(clubs)
Output : ['Everton', 'SSC Napoli', 'Bayern Munich ', 'Real Madrid', 'Paris SG',\
'Chelsea', 'Milan', 'Juventus', 'AC Parma', 'Reggiana', 'Italy', 'Milan', 'Retired',\
'AS Roma', 'Milan', 'AC Parma', 'AS Roma', 'Parma U19', 'AC Parma', 'Reggiolo', 'Parma U19']
Can someone tell me what I'm missing here?

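You need to use a relative XPath (one starting with a dot). A path that begins with // searches from the document root, even when it is called on career_table:
clubs = career_table.xpath(".//td[@class='hauptlink no-border-links']//a/text()").extract()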
print(clubs)
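The effect of scoping can be sketched without Scrapy. The snippet below is only an illustration: it uses the standard library's xml.etree.ElementTree on a made-up, simplified page (the second div id 'yw2' and the markup are assumptions), and it compares a search scoped to the yw1 div with a search over the whole document, which is what a leading // gives you in Scrapy.

```python
import xml.etree.ElementTree as ET

# Made-up, simplified page: two tables that share the same cell class.
PAGE = """
<html><body>
  <div id="yw1"><table><tr>
    <td class="hauptlink no-border-links"><a>Everton</a></td>
    <td class="hauptlink no-border-links"><a>SSC Napoli</a></td>
  </tr></table></div>
  <div id="yw2"><table><tr>
    <td class="hauptlink no-border-links"><a>AS Roma</a></td>
  </tr></table></div>
</body></html>
"""

root = ET.fromstring(PAGE)
CELL_LINKS = ".//td[@class='hauptlink no-border-links']//a"

# Relative search: only looks below the selected div.
career_table = root.find(".//div[@id='yw1']")
scoped = [a.text for a in career_table.findall(CELL_LINKS)]

# Search from the document root: picks up both tables.
whole_page = [a.text for a in root.findall(CELL_LINKS)]

print(scoped)
print(whole_page)
```

The scoped list contains only the clubs from the first table, while the whole-page list also contains the club from the second one.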

Related

XPath to return value only elements containing the text

I would like to return the values of 'Earnings per share' (i.e. -7.3009, -7.1454, -19.6295, -1.6316)
from "http://www.aastocks.com/en/stocks/analysis/company-fundamental/earnings-summary?symbol=01801"
using the formula below as an example for '-7.3009':
=importxml("http://www.aastocks.com/en/stocks/analysis/company-fundamental/earnings-summary?symbol=01801", "//tr/td[contains(text(),'Earnings')]/td[2]")
However, it returns #N/A.
Can someone help?
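This XPath will return the data you want:
id("cnhk-list")//tr[td[contains(., "Earnings Per Share")]]/td[starts-with(@class, "cfvalue")]//text()
In plain English: inside the element whose id is cnhk-list, find the row that has a cell containing "Earnings Per Share", then select the text of that row's td cells whose class starts with "cfvalue". Your expression returned #N/A because /td[2] looks for a td nested inside the label td, while the values are sibling cells in the same row.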
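The same row-then-cell logic can be sketched in plain Python. This is an illustration rather than the spreadsheet formula itself: it uses xml.etree.ElementTree, which does not support contains() or starts-with(), so those two tests are done in Python. The markup is hypothetical; only the id, the row label, the "cfvalue" class prefix and the four values come from the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the class suffix and the extra row are made up for the demo.
SAMPLE = """
<table id="cnhk-list">
  <tr><td>Some Other Row</td><td class="cfvalue_bold">0</td></tr>
  <tr>
    <td>Earnings Per Share</td>
    <td class="cfvalue_bold">-7.3009</td>
    <td class="cfvalue_bold">-7.1454</td>
    <td class="cfvalue_bold">-19.6295</td>
    <td class="cfvalue_bold">-1.6316</td>
  </tr>
</table>
"""

root = ET.fromstring(SAMPLE)


def cell_text(td):
    """Full text of a cell, including text of nested elements."""
    return "".join(td.itertext()).strip()


# A td nested inside a td: the cells are siblings, so nothing matches.
nested_cells = root.findall(".//td/td")

# Pick the row by its label, then keep the cells whose class starts with "cfvalue".
values = [
    cell_text(td)
    for tr in root.iter("tr")
    if any("Earnings Per Share" in cell_text(c) for c in tr.findall("td"))
    for td in tr.findall("td")
    if td.get("class", "").startswith("cfvalue")
]

print(nested_cells)
print(values)
```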

How do I improve this Stored Procedure?

I have a question:
Assuming an assembly line where a bike goes through some tests, and then the devices send the information regarding the test to
our database (in oracle). I created this stored procedure; it works correctly for what I want, which is:
It gets a list of the first test (per type of test) that a bike has gone through. For instance, if a bike had 2 tests of the same type, it only
shows the first one, AND it shows it only when that first test is between the dates specified by the user. Also I look from 2 months back
because a bike cannot spend more than 2 months (I'm probably overestimating) on the assembly line. If the user searches a 2-day range, for instance, and I only look between those days, I could leave out of my results a test made on a bike 3 or 4 days earlier, and it gets worse if they search between hours.
As I said before, the sp works just fine, but I'm wondering if there's a way to optimize it.
Also consider that the table has around 7 million records by the end of the year, so I cannot query the whole year because it could get ugly.
Here's the main part of the stored procedure:
SELECT pid AS "bike_id",
TYPE AS "type",
stationnr AS "stationnr",
testtime AS "testtime",
rel2.releasenr AS "releasenr",
placedesc AS description,
tv.recordtime AS "recordtime",
To_char(tv.testtime, 'YYYY.MM.DD') AS "dategroup",
testcounts AS "testcounts",
tv.result AS "result",
progressive AS "PROGRESIVO"
FROM (SELECT l_bike_id AS pid,
l_testcounts AS testcounts,
To_char(l_testtime, 'yyyy-MM-dd hh24:mi:ss') AS testtimes,
testtime,
pl.code AS place,
t2.recordtime,
t2.releaseid,
t2.testresid,
t2.stationnr,
t2.result,
v.TYPE,
v.progressive,
v.prs,
pl.description AS placeDesc
FROM (SELECT v.bike_id AS l_bike_id,
v.TYPE AS l_type,
Min(t.testtime) AS l_testtime,
Count(t.testtime) AS l_testcounts
FROM result_test t
inner join bikes v
ON v.bike_id = t.pid
inner join result_release rel
ON t.releaseid = rel.releaseid
inner join resultconfig.places p
ON p.place = t.place
WHERE t.testtime >= Add_months(Trunc(p_startdate), -2)
GROUP BY v.bike_id,
v.TYPE,
p.code)p_bikelist
inner join result_test t2
ON p_bikelist.l_bike_id = t2.pid
AND p_bikelist.l_testtime = t2.testtime
inner join resultconfig.places pl
ON pl.place = t2.place
inner join bikes v
ON v.bike_id = t2.pid
inner join result_release rel2
ON t2.releaseid = rel2.releaseid
ORDER BY t2.pid)tv
inner join result_release rel2
ON tv.releaseid = rel2.releaseid
WHERE tv.testtime BETWEEN p_startdate AND p_enddate
ORDER BY testtime;
Thank you for answering!!
I'm struggling a bit to understand the business requirement from the English description you give. The wording suggests that this procedure is intended to work per bike but I don't see any obvious bike_id parameters being supplied, instead, you appear to be returning the earliest result for all bikes tested between given dates. Is that the aim? If it is designed to be run per bike, then ensure bike id gets passed in and used early :)
There is some confusion about your data types. You convert testtime in result_test (presumably a DATE or TIMESTAMP column ) into a string in the p_bikelist subquery but then compare back to the original value in the tv subquery. You further use (presumably typed parameters) p_startdate and p_enddate to filter results. I strongly suspect the conversion in p_bikelist to be unnecessary, and possibly a cause for index avoidance.
Finally, I don't get the add_months logic. By all means, extend the window back in time to get tests that finished within the window but started up to 2 months before the start date, but as written you will exclude the earlier starts anyway because of the condition on tv.testtime. Most likely you'd be better off fudging the startdate earlier in the stored procedure with code like
l_assumedstart := add_months(p_startdate, -2);
and then using l_assumedstart in the query itself.
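To make the look-back idea concrete, here is a small Python sketch of the intended logic, not of the stored procedure itself. It assumes a test is identified by (bike_id, place, testtime), that "first test" means the earliest testtime per bike and place, and it approximates add_months(p_startdate, -2) with 62 days. All names and sample rows are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: 62 days stands in for Oracle's add_months(p_startdate, -2).
LOOKBACK = timedelta(days=62)


def first_tests_in_window(tests, start, end, lookback=LOOKBACK):
    """Return (bike_id, place, testtime) for each first test inside [start, end].

    tests is an iterable of (bike_id, place, testtime) tuples.
    """
    assumed_start = start - lookback
    earliest = {}
    for bike_id, place, testtime in tests:
        if testtime < assumed_start:
            continue  # older than the widened window, so not scanned at all
        key = (bike_id, place)
        if key not in earliest or testtime < earliest[key]:
            earliest[key] = testtime
    # Only now apply the window the user asked for.
    hits = [(b, p, t) for (b, p), t in earliest.items() if start <= t <= end]
    return sorted(hits, key=lambda row: row[2])


tests = [
    (1, "BRAKE", datetime(2020, 2, 27, 9, 0)),  # real first test of bike 1
    (1, "BRAKE", datetime(2020, 3, 2, 9, 0)),   # repeat test inside the window
    (2, "BRAKE", datetime(2020, 3, 1, 12, 0)),
]
start, end = datetime(2020, 3, 1), datetime(2020, 3, 3)

result = first_tests_in_window(tests, start, end)
without_lookback = first_tests_in_window(tests, start, end, lookback=timedelta(0))
print(result)
print(without_lookback)
```

Without the look-back, the repeat test of bike 1 is wrongly reported as its first test; with it, bike 1 drops out because its real first test predates the requested window.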

Oracle: getting non unique duplicates with group by ... having count

I'm trying to build a query that shows only non-unique duplicates. I've already built a query that shows all the records coming into consideration:
SELECT tbl_tm.title, lp_index.starttime, musicsound.archnr
FROM tbl_tm
INNER JOIN musicsound on tbl_tm.fk_tbl_tm_musicsound = musicsound.pk_musicsound
INNER JOIN lp_index ON musicsound.pk_musicsound = lp_index.fk_index_musicsound
INNER JOIN plan ON lp_index.fk_index_plan = plan.pk_plan
WHERE tbl_tm.FK_tbl_tm_title_type_music = '22' AND plan.airdate
BETWEEN to_date ('15-01-13') AND to_date('17-01-13')
GROUP BY tbl_tm.title, lp_index.starttime, musicsound.archnr
HAVING COUNT (tbl_tm.title) > 0;
The corresponding result set looks like this:
title starttime archnr
============================================
Pumped up kicks 05:05:37 0616866
People Help The People 05:09:13 0620176
I can't dance 05:12:43 0600109
Locked Out Of Heaven 05:36:08 0620101
China in your hand 05:41:33 0600053
Locked Out Of Heaven 08:52:50 0620101
It gives me music titles played between a certain timespan along with their starting time and archive ID.
What I want to achieve is something like this:
title starttime archnr
============================================
Locked Out Of Heaven 05:36:08 0620101
Locked Out Of Heaven 08:52:50 0620101
There would only be two rows left: both share the same title and archive number but differ in the time part. Increasing the 'HAVING COUNT' value will give me a zero-row
result set, since there aren't any entries that are exactly the same.
What I've found out so far is that the solution for this problem will most likely have a nested subquery, but I can't seem to get it done. Any help on this would be greatly appreciated.
Note: I'm on an Oracle 11g server. My user has read-only privileges. I use SQL Developer on my workstation.
You can try something like this:
SELECT title, starttime, archnr
FROM (
SELECT title, starttime, archnr, count(*) over (partition by title) cnt
FROM (your_query))
WHERE cnt > 1
Here is a sqlfiddle demo
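What count(*) over (partition by title) does can be sketched in plain Python: every row is kept, each row is given the number of rows that share its title, and only rows whose count exceeds 1 survive. The sketch below uses collections.Counter on the rows from the question; it illustrates the idea and is not a replacement for running the query in Oracle.

```python
from collections import Counter

# Rows from the question: (title, starttime, archnr).
rows = [
    ("Pumped up kicks", "05:05:37", "0616866"),
    ("People Help The People", "05:09:13", "0620176"),
    ("I can't dance", "05:12:43", "0600109"),
    ("Locked Out Of Heaven", "05:36:08", "0620101"),
    ("China in your hand", "05:41:33", "0600053"),
    ("Locked Out Of Heaven", "08:52:50", "0620101"),
]

# Equivalent of count(*) over (partition by title): one count per title.
plays_per_title = Counter(title for title, _, _ in rows)

# Equivalent of WHERE cnt > 1: keep every row whose title occurs more than once.
repeated = [row for row in rows if plays_per_title[row[0]] > 1]

for row in repeated:
    print(row)
```

Unlike GROUP BY, this keeps the individual rows, so the differing start times stay visible.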

Facebook API: Getting photos that have a comment containing a string

I'm implementing a pseudo hash-tagging system for the company I work at with their customer's facebook photos. A customer can upload a photo to the page, and a page admin can tag it with the hashtag for a product.
What I am trying to do is get all photos from the page that have a certain tag in the comments (for instance, get all photos from the company page with a comment containing only '#bluepants').
I am trying to make sure the Facebook API handles the heavy lifting (we'll cache the results), so I'd like to use FQL or the Graph API, but I can't seem to get it working (my SQL is quite rusty after relying on an ORM for so long). I would prefer it to return as many results as possible, but I'm not sure if FB lets you fetch more than 25 at once.
This is going to be implemented in a sinatra site (I am currently playing around with the Koala gem, so bonus points if I can query using it)
Could anyone give me some guidance?
Thanks!
I've got something like this working in FQL/PHP. Here is my multiquery.
{'activity':
"SELECT post_id, created_time FROM stream WHERE source_id = PAGE_ID AND
attachment.fb_object_type = 'photo' AND created_time > 1338834720
AND comments.count > 0 LIMIT 0, 500",
'commented':
"SELECT post_id, text, fromid FROM comment WHERE post_id IN
(SELECT post_id FROM #activity) AND (strpos(upper(text), '#HASHTAG') >= 0)",
'accepted':
"SELECT post_id, actor_id, message, attachment, place, created_time, likes
FROM stream WHERE post_id IN (SELECT post_id FROM #commented)
ORDER BY likes.count DESC",
'images':
"SELECT pid, src, src_big, src_small, src_width, src_height FROM photo
WHERE pid IN (SELECT attachment.media.photo.pid FROM #accepted)",
'users':
"SELECT name, uid, current_location, locale FROM user WHERE uid IN
(SELECT actor_id FROM #accepted)",
'pages':
"SELECT name, page_id FROM page WHERE page_id IN (SELECT actor_id FROM #accepted)",
'places':
"SELECT name, page_id, description, display_subtext, latitude, longitude
FROM place WHERE page_id IN (SELECT place FROM #accepted)"
}
To break this down:
#activity gets all stream objects created after the start date of my campaign that are photos and have a non-zero comment count. Using a LIMIT of 500 seems to return the maximum number of posts. Higher or lower values return fewer.
#commented finds the posts that have #HASHTAG in the text of one of their comments. Note, I'm not looking for a #, which is a reserved character in FQL. Using it may cause you problems.
#accepted gets the full details of the posts found in #commented.
#images gets all the details of the images in those posts. I have it on my todos to refactor this to use object_id instead of pid and try using the new real_width specification to make my layout easier.
#users and #pages get the details of the actor who originally posted the item. I now know I could have used the profile table to get this in one query.
#places gets the location details for geo-tagged posts.
You can see this in action here: http://getwellgabby.org/show-us-a-sign

VS 2010 reporting services grouping

I want to load the list of the groups as well as data into two separate datatables (or one, but I don't see that possible). Then I want to apply the grouping like this:
Groups
A
B
Bar
C
Car
Data
Ale
Beer
Bartender
Barry
Coal
Calm
Carbon
The final result after grouping should be like this.
*A
Ale
*B
*Bar
Bartender
Barry
Beer
*C
Calm
*Car
Carbon
Coal
I only have a list of groups, not their levels or anything else. The items that fall under a given group are the ones that start with the same letters as the group's name. The indentation is not a must. Hopefully my example clarifies what I need; I don't know what this is called, so I have been unable to find anything similar on Google.
The key things here are:
1. Grouping by a provided list of groups
2. There can be unlimited layers of grouping
Since every record has its children, the query should also return a parent for each record. Then there is a nice trick in the advanced grouping tab: choosing the parent's column yields as many higher-level groups as needed, recursively. I learnt about that in http://blogs.microsoft.co.il/blogs/barbaro/archive/2008/12/01/creating-sum-for-a-group-with-recursion-in-ssrs.aspx
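How a parent could be derived from nothing but the names can be sketched in Python. The rule assumed here is "the parent is the longest group name that is a proper prefix", which reproduces the example from the question; the function name is made up, and in practice the same rule would have to be expressed in the dataset query.

```python
def longest_prefix(name, groups):
    """Return the longest group that is a proper prefix of name, or None."""
    candidates = [g for g in groups if name.startswith(g) and g != name]
    return max(candidates, key=len, default=None)


groups = ["A", "B", "Bar", "C", "Car"]
data = ["Ale", "Beer", "Bartender", "Barry", "Coal", "Calm", "Carbon"]

# Parent of each group (None marks a top-level group).
group_parent = {g: longest_prefix(g, groups) for g in groups}

# Group each data item falls under.
item_parent = {d: longest_prefix(d, groups) for d in data}

print(group_parent)
print(item_parent)
```

Because the rule only compares prefixes, it works for any number of nesting levels without knowing the levels in advance.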
I suggest reporting from a query like this:
select gtop.category top_category,
gsub.category sub_category,
dtab.category data_category
from groupTable gtop
join groupTable gsub on gsub.category like gtop.category + '%'
left join dataTable dtab on dtab.category like gsub.category + '%'
where len(gtop.category) = 1 and
not exists
(select null
from groupTable gchk
where gsub.category = gtop.category and
gchk.category like gsub.category + '%' and
gchk.category <> gsub.category and
dtab.category like gchk.category + '%')
- with report groups on top_category and sub_category, and headings for both groups. You will probably want to hide the sub_category heading row when sub_category = top_category.
