How to get all the videos of a YouTube channel with the Yt gem? - ruby

I want to use the Yt gem to get all the videos of channel. I configure the gem with my YouTube Data API key.
Unfortunately when I use it it returns a maximum of ~1000 videos, even for channels having more than 1000 videos. Yt::Channel#video_count returns the correct number of videos.
channel = Yt::Channel.new id: "UCGwuxdEeCf0TIA2RbPOj-8g"
channel.video_count # => 1845
channel.videos.map(&:id).size # => 949
The Youtube API can't be set to return more than 50 items per request, so I guess Yt automatically performs several requests going through each next page of results to be able to return more than 50 results.
For some reason though it does not go through all the result pages. I don't see a way in Yt for me to control how it goes through the pages of results. In particular I could not find a way to force it to get a single page of results, access the returned value nextPageToken, in order to perform a new request with this value.
Any idea?

Looking into gem's /spec folder, you can see a test for your code.
describe 'when the channel has more than 500 videos' do
let(:id) { 'UC0v-tlzsn0QZwJnkiaUSJVQ' }
specify 'the estimated and actual number of videos can be retrieved' do
# #note: in principle, the following three counters should match, but
# in reality +video_count+ and +size+ are only approximations.
expect(channel.video_count).to be > 500
expect(channel.videos.size).to be > 500
end
end
I did some tests and what I have noticed it that: video_count is the number that is displayed on youtube next to channel's name. This value is not accurate. Not rly sure what it represents.
If you do channel.videos.size, the number is not accurate either, because the videos collection can contain some empty(?) records.
If you do channel.videos.map(&:id).size the returned value should be correct. By correct I mean it should equal to number of videos listed at:
https://www.youtube.com/channel/:channel_id/videos

Related

Youtube API - Subscriptions list returns different number of total results in set

I'm trying to get the complete list of my subscriptions. I've tried 3 methods, all of them returns different amount of subscriptions and I don't know what to do :)
1: Using Subscriptions: list with channel ID:
https://www.googleapis.com/youtube/v3/subscriptions?part=snippet&channelId=MY_CHANNEL_ID&maxResults=50&key=MY_API_KEY
"totalResults" is 942
2: Using Subscriptions: list with "mine" flag. the "totalResult" field is 991.
Where do 49 subscriptions appear from?
3: Open browser in incognite mode, go to
https://www.youtube.com/channel/MY_CHANNEL_ID
Click on "Channels" tab, scroll down to the end of the subscriptions list, open console and type something like that
document.querySelectorAll("#contents #items > *").length
I see 1039. Where do another 48 subscriptions come from?
And the 1039 seems to be the most accurace number - I have 6 subscriptions in a row and the last row has only 1 item. 173*6+1 = 1039
So the questions is - how do I get all the 1039 subscriptions by API? And why does it return wrong amount of subscriptions?
You are using Subscriptions: list and shouldn't have such kind of bugs with totalResults however maybe there is a YouTube Data API v3 endpoint bug as documented in Search: list totalResults is:
integer
The total number of results in the result set. Please note that the value is an approximation and may not represent an exact value. In addition, the maximum value is 1,000,000.
You should not use this value to create pagination links. Instead, use the nextPageToken and prevPageToken property values to determine whether to show pagination links.
So I would recommend you to enumerate all subscriptions you have with the different methods you explained and so count on your own by using nextPageToken.

PlaylistItems seems to be limited to 20 000 entires for uploads playlist. Is there any workaround?

It seems that somewhat recently uploads playlists were limited to 20 000 entries. Is there a way to get list of all videos uploaded by a channel?
For example channel UCFL1sCAksD6_7JIZwwHcwjQ has 57 849 videos when searching for it:
https://www.youtube.com/results?search_query=jtbc+entertainment.
But its uploads playlist has only 20 000 videos:
https://www.youtube.com/playlist?list=UUFL1sCAksD6_7JIZwwHcwjQ.
When querying YouTube Data API through Python, after reaching page that has 20 000th entry, nextPageToken doesn't exist.
How can I find rest of the videos?
You may try using repeatedly the Search.list API endpoint queried with the following parameters:
channelId=UCFL1sCAksD6_7JIZwwHcwjQ,
type=video,
order=date,
publishedBefore=...,
maxResults=50,
where publishedBefore is computed appropriately.
The initial publishedBefore is set to 1 second before the value of the publishedAt property of the (chronologically) last video you have obtained from PlaylistItems.list endpoint invoked with playlistId=UUFL1sCAksD6_7JIZwwHcwjQ.
Successive values of publishedBefore will be set, similarly, to 1 second before the value of the publishedAt property of the (chronologically) last video of the previous call to Search.list endpoint.
One more remark: do note that -- even if the API will allow you to go beyond the 20000 limit using the algorithm above (I don't know if it will; you have to test that yourself) -- the cost of this procedure is quite high: each Search.list endpoint call has a quota cost of 100 units (expensive indeed).

Unable to get results more than 100 results on google custom search api

I need to use Google Custom Search API https://developers.google.com/custom-search/v1/overview. From that page, it said:
For CSE users, the API provides 100 search queries per day for free.
If you need more, you may sign up for billing in the Developers
Console. Additional requests cost $5 per 1000 queries, up to 10k
queries per day.
I already sign up for billing inside the developer console. However, I still could not retrieve results more than 100. What things should I do more? https://www.googleapis.com/customsearch/v1?cx=CSE_INSTANCE&key=API_KEY&q=QUERY&start=100
{ error: { errors: [ { domain: "global", reason: "invalid", message:
"Invalid Value" } ], code: 400, message: "Invalid Value" } }
Query: Definition
https://support.google.com/customsearch/answer/1361951
Any actual user query from a Google Site Search engine, including but
not limited to search engines installed on your website using XML,
iFrame, or the Custom Search Element.
That means you would probably need to send eleven queries to get more than 100 results.
GET https://www.googleapis.com/customsearch/v1?&q=QUERY&...&start=1
GET https://www.googleapis.com/customsearch/v1?&q=QUERY&...&start=11
GET https://www.googleapis.com/customsearch/v1?&q=QUERY&...&start=21
GET ...
GET https://www.googleapis.com/customsearch/v1?&q=QUERY&...&start=81
GET https://www.googleapis.com/customsearch/v1?&q=QUERY&...&start=91
GET https://www.googleapis.com/customsearch/v1?&q=QUERY&...&start=101
Check every response and if error code is 400, you can stop - there is probably no need to send next (&start=previous+10) request.
Now you can merge responses and start building results page.
Google Custom Search and Google Site Search return up to 10 results
per query. If you want to display more than 10 results to the user,
you can issue multiple requests (using the start=0, start=11 ...
parameters) and display the results on a single page. In this case,
Google will consider each request as a separate query, and if you are
using Google Site Search, each query will count towards your limit.
There might be a better way to do this then I described above. (But, I'm not sure about batching API calls.)
And (finally) possible answer to your question: I made more than few tests, but I haven't had any luck with start greater than 100 (I was getting the same as you - <Response [400]>). I'm using "Browser key" from my billing-enabled project. That could mean we can't get 101st, 102nd, 103rd, etc. results with CSE API.
The API documentation says it never returns more than 100 items.
https://developers.google.com/custom-search/v1/reference/rest/v1/cse/list
start
integer (uint32 format)
The index of the first result to return. The default number of results
per page is 10, so &start=11 would start at the top of the second page
of results. Note: The JSON API will never return more than 100
results, even if more than 100 documents match the query, so setting
the sum of start + num to a number greater than 100 will produce an
error. Also note that the maximum value for num is 10.

Scraping Real Time Visitors from Google Analytics

I have a lot of sites and want to build a dashboard showing the number of real time visitors on each of them on a single page. (would anyone else want this?) Right now the only way to view this information is to open a new tab for each site.
Google doesn't have a real-time API, so I'm wondering if it is possible to scrape this data. Eduardo Cereto found out that Google transfers the real-time data over the realtime/bind network request. Anyone more savvy have an idea of how I should start? Here's what I'm thinking:
Figure out how to authenticate programmatically
Inspect all of the realtime/bind requests to see how they change. Does each request have a unique key? Where does that come from? Below is my breakdown of the request:
https://www.google.com/analytics/realtime/bind?VER=8
&key= [What is this? Where does it come from? 21 character lowercase alphanumeric, stays the same each request]
&ds= [What is this? Where does it come from? 21 character lowercase alphanumeric, stays the same each request]
&pageId=rt-standard%2Frt-overview
&q=t%3A0%7C%3A1%3A0%3A%2Ct%3A11%7C%3A1%3A5%3A%2Cot%3A0%3A0%3A4%2Cot%3A0%3A0%3A3%2Ct%3A7%7C%3A1%3A10%3A6%3D%3DREFERRAL%3B%2Ct%3A10%7C%3A1%3A10%3A%2Ct%3A18%7C%3A1%3A10%3A%2Ct%3A4%7C5%7C2%7C%3A1%3A10%3A2!%3Dzz%3B%2C&f
The q variable URI decodes to this (what the?):
t:0|:1:0:,t:11|:1:5:,ot:0:0:4,ot:0:0:3,t:7|:1:10:6==REFERRAL;,t:10|:1:10:,t:18|:1:10:,t:4|5|2|:1:10:2!=zz;,&f
&RID=rpc
&SID= [What is this? Where does it come from? 16 character uppercase alphanumeric, stays the same each request]
&CI=0
&AID= [What is this? Where does it come from? integer, starts at 1, increments weirdly to 150 and then 298]
&TYPE=xmlhttp
&zx= [What is this? Where does it come from? 12 character lowercase alphanumeric, changes each request]
&t=1
Inspect all of the realtime/bind responses to see how they change. How does the data come in? It looks like some altered JSON. How many times do I need to connect to get the data? Where is the active visitors on site number in there? Here is a dump of sample data:
19
[[151,["noop"]
]
]
388
[[152,["rt",[{"ot:0:0:4":{"timeUnit":"MINUTES","overTimeData":[{"values":[49,53,52,40,42,55,49,41,51,52,47,42,62,82,76,71,81,66,81,86,71,66,65,65,55,51,53,73,71,81],"name":"Total"}]},"ot:0:0:3":{"timeUnit":"SECONDS","overTimeData":[{"values":[0,1,1,1,1,0,1,0,1,1,1,0,2,0,2,2,1,0,0,0,0,0,2,1,1,2,1,2,0,5,1,0,2,1,1,1,2,0,2,1,0,5,1,1,2,0,0,0,0,0,0,0,0,0,1,1,0,3,2,0],"name":"Total"}]}}]]]
]
388
[[153,["rt",[{"ot:0:0:4":{"timeUnit":"MINUTES","overTimeData":[{"values":[52,53,52,40,42,55,49,41,51,52,47,42,62,82,76,71,81,66,81,86,71,66,65,65,55,51,53,73,71,81],"name":"Total"}]},"ot:0:0:3":{"timeUnit":"SECONDS","overTimeData":[{"values":[2,1,1,1,1,1,0,1,0,1,1,1,0,2,0,2,2,1,0,0,0,0,0,2,1,1,2,1,2,0,5,1,0,2,1,1,1,2,0,2,1,0,5,1,1,2,0,0,0,0,0,0,0,0,0,1,1,0,3,2],"name":"Total"}]}}]]]
]
388
[[154,["rt",[{"ot:0:0:4":{"timeUnit":"MINUTES","overTimeData":[{"values":[53,53,52,40,42,55,49,41,51,52,47,42,62,82,76,71,81,66,81,86,71,66,65,65,55,51,53,73,71,81],"name":"Total"}]},"ot:0:0:3":{"timeUnit":"SECONDS","overTimeData":[{"values":[0,3,1,1,1,1,1,0,1,0,1,1,1,0,2,0,2,2,1,0,0,0,0,0,2,1,1,2,1,2,0,5,1,0,2,1,1,1,2,0,2,1,0,5,1,1,2,0,0,0,0,0,0,0,0,0,1,1,0,3],"name":"Total"}]}}]]]
]
Let me know if you can help with any of the items above!
To get the same, Google has launched new Real Time API. With this API you can easily retrieve real time online visitors as well as several Google Analytics with following dimensions and metrics. https://developers.google.com/analytics/devguides/reporting/realtime/dimsmets/
This is quite similar to Google Analytics API. To start development on this,
https://developers.google.com/analytics/devguides/reporting/realtime/v3/devguide
With Google Chrome I can see the data on the Network Panel.
The request endpoint is https://www.google.com/analytics/realtime/bind
Seems like the connection stays open for 2.5 minutes, and during this time it just keeps getting more and more data.
After about 2.5 minutes the connection is closed and a new one is open.
On the Network panel you can only see the data for the connections that are terminated. So leave it open for 5 minutes or so and you can start to see the data.
I hope that can give you a place to start.
Having google in the loop seems pretty redundant. Suggest you use a common element delivered on demand from the dashboard server and include this item by absolute URL on all pages to be monitored for a given site. The script outputting the item can read the IP of the browser asking and these can all be logged into a database and filtered for uniqueness giving a real time head count.
<?php
$user_ip = $_SERVER["REMOTE_ADDR"];
/// Some MySQL to insert $user_ip to the database table for website XXX goes here
$file = 'tracking_image.gif';
$type = 'image/gif';
header('Content-Type:'.$type);
header('Content-Length: ' . filesize($file));
readfile($file);
?>
Ammendum:
A database can also add a timestamp to every row of data it stores. This can be used to further filter results and provide the number of visitors in the last hour or minute.
Client side Javascript with AJAX for fine tuning or overkill
The onblur and onfocus javascript commands can be used to tell if the the page is visible, pass the data back to the dashboard server via Ajax. http://www.thefutureoftheweb.com/demo/2007-05-16-detect-browser-window-focus/
When a visitor closes a page this can also be detected by the javascript onunload function in the body tag and Ajax can be used to send data back to the server one last time before the browser finally closes the page.
As you may also wish to collect some information about the visitor like Google analytics does this page https://panopticlick.eff.org/ has a lot of javascript that can be examined and adapted.
I needed/wanted realtime data for personal use so I reverse-engineered their system a little bit.
Instead of binding to /bind I get data from /getData (no pun intended).
At /getData the minimum request is apparently: https://www.google.com/analytics/realtime/realtime/getData?pageId&key={{propertyID}}&q=t:0|:1
Here's a short explanation of the possible query parameters and syntax, please remember that these are all guesses and I don't know all of them:
Query Syntax: pageId&key=propertyID&q=dataType:dimensions|:page|:limit:filters
Values:
pageID: Required but seems to only be used for internal analytics.
propertyID: a{{accountID}}w{{webPropertyID}}p{{profileID}}, as specified at the Documentation link below. You can also find this in the URL of all analytics pages in the UI.
dataType:
t: Current data
ot: Overtime/Past
c: Unknown, returns only a "count" value
dimensions (| separated or alone), most values are only applicable for t:
1: Country
2: City
3: Location code?
4: Latitude
5: Longitude
6: Traffic source type (Social, Referral, etc.)
7: Source
8: ?? Returns (not set)
9: Another location code? longer.
10: Page URL
11: Visitor Type (new/returning)
12: ?? Returns (not set)
13: ?? Returns (not set)
14: Medium
15: ?? Returns "1"
page:
At first this seems to work for pagination but after further analysis it looks like it's also used to specify which of the 6 pages (Overview, Locations, Traffic Sources, Content, Events and Conversions) to return data for.
For some reason 0 returns an impossibly high metrictotal
limit: Result limit per page, maximum of 50
filters:
Syntax is as specified at the Documentation 2 link below except the OR is specified using | instead of a comma.6==CUSTOM;1==United%20States
You can also combine multiple queries in one request by comma separating them (i.e. q=t:1|2|:1|:10,t:6|:1|:10).
Following the above "documentation", if you wanted to build a query that requests the page URL and city of the top 10 active visitors with a traffic source type of CUSTOM located in the US you would use this URL: https://www.google.com/analytics/realtime/realtime/getData?key={{propertyID}}&pageId&q=t:10|2|:1|:10:6==CUSTOM;1==United%20States
Documentation
Documentation 2
I hope that my answer is readable and (although it's a little late) sufficiently answers your question and helps others in the future.

Google Places API does not return gym info

I could not get Google Places API to return gym info. Below is an example request api.
https://maps.googleapis.com/maps/api/place/search/json?location=33.347075,-111.96318&radius=100&types=gym&sensor=false&key=AIzaSyBg8HI6sH1Rxyhn1Mno_hhgDawuF1KAfq0
(Open URL)
I know the lat/lon in the link is valid.
If I remove the "types=gym" (see below link" it returns some places info but none of type gym.
https://maps.googleapis.com/maps/api/place/search/json?location=33.347075,-111.96318&radius=100&sensor=false&key=AIzaSyBg8HI6sH1Rxyhn1Mno_hhgDawuF1KAfq0
(Open URL)
Is there a limitation on the api?
Also, could I have the api to return an uri which takes me directly to the location?
You just need to increase your search radius a bit - you're looking for results within a 100m circle. Try this:
https://maps.googleapis.com/maps/api/place/search/json?location=33.347075,-111.96318&radius=200&types=gym&sensor=false&key=AIzaSyBg8HI6sH1Rxyhn1Mno_hhgDawuF1KAfq0
Increasing the radius to just 200m returns a result; at 1000m you get four results.
You can then pass the reference value to a Places Details search to get the url value, as follows:
https://maps.googleapis.com/maps/api/place/details/json?reference=CnRnAAAA99xxsFT0V-FNigzMi7GEnmkqWRYCOZG-lrQH0fpw9iI_JUp5WHrYOCcTGpeyzVdHrtk3rE2zrHleBxRw4i67K0sT_fhsSQufaAHN80Oi4OvxR-amG_W4plz5Mr8a-512584oHpfUpV87jMqyF2R8cRIQpqTgOCgZtZF0hYR4R_ZVRRoUEIS-oN1fcyVQcN5nj7DxaNK-e8o&sensor=false&key=AIzaSyBg8HI6sH1Rxyhn1Mno_hhgDawuF1KAfq0
The url links to the Google Maps Place page: http://maps.google.com/maps/place?cid=2681829493569576902
See the docs here: http://code.google.com/apis/maps/documentation/places/#PlaceDetailsRequests
also there is a limit of the no. requests on this particular api.
2500 IIRC -- but you can read it in the docs
Are there 20 coming back with the "types=gym" call? If so then you are hitting the limitation of the api and it just so happens the 20 returned are not a gym:
The Places API returns up to 20 establishment results. Additionally,
political results may be returned which serve to identify the area of
the request

Resources