Youtube API - Subscriptions list returns different number of total results in set - google-api

I'm trying to get the complete list of my subscriptions. I've tried 3 methods, all of them returns different amount of subscriptions and I don't know what to do :)
1: Using Subscriptions: list with channel ID:
https://www.googleapis.com/youtube/v3/subscriptions?part=snippet&channelId=MY_CHANNEL_ID&maxResults=50&key=MY_API_KEY
"totalResults" is 942
2: Using Subscriptions: list with "mine" flag. the "totalResult" field is 991.
Where do 49 subscriptions appear from?
3: Open browser in incognite mode, go to
https://www.youtube.com/channel/MY_CHANNEL_ID
Click on "Channels" tab, scroll down to the end of the subscriptions list, open console and type something like that
document.querySelectorAll("#contents #items > *").length
I see 1039. Where do another 48 subscriptions come from?
And the 1039 seems to be the most accurace number - I have 6 subscriptions in a row and the last row has only 1 item. 173*6+1 = 1039
So the questions is - how do I get all the 1039 subscriptions by API? And why does it return wrong amount of subscriptions?

You are using Subscriptions: list and shouldn't have such kind of bugs with totalResults however maybe there is a YouTube Data API v3 endpoint bug as documented in Search: list totalResults is:
integer
The total number of results in the result set. Please note that the value is an approximation and may not represent an exact value. In addition, the maximum value is 1,000,000.
You should not use this value to create pagination links. Instead, use the nextPageToken and prevPageToken property values to determine whether to show pagination links.
So I would recommend you to enumerate all subscriptions you have with the different methods you explained and so count on your own by using nextPageToken.

Related

How to get all the videos of a YouTube channel with the Yt gem?

I want to use the Yt gem to get all the videos of channel. I configure the gem with my YouTube Data API key.
Unfortunately when I use it it returns a maximum of ~1000 videos, even for channels having more than 1000 videos. Yt::Channel#video_count returns the correct number of videos.
channel = Yt::Channel.new id: "UCGwuxdEeCf0TIA2RbPOj-8g"
channel.video_count # => 1845
channel.videos.map(&:id).size # => 949
The Youtube API can't be set to return more than 50 items per request, so I guess Yt automatically performs several requests going through each next page of results to be able to return more than 50 results.
For some reason though it does not go through all the result pages. I don't see a way in Yt for me to control how it goes through the pages of results. In particular I could not find a way to force it to get a single page of results, access the returned value nextPageToken, in order to perform a new request with this value.
Any idea?
Looking into gem's /spec folder, you can see a test for your code.
describe 'when the channel has more than 500 videos' do
let(:id) { 'UC0v-tlzsn0QZwJnkiaUSJVQ' }
specify 'the estimated and actual number of videos can be retrieved' do
# #note: in principle, the following three counters should match, but
# in reality +video_count+ and +size+ are only approximations.
expect(channel.video_count).to be > 500
expect(channel.videos.size).to be > 500
end
end
I did some tests and what I have noticed it that: video_count is the number that is displayed on youtube next to channel's name. This value is not accurate. Not rly sure what it represents.
If you do channel.videos.size, the number is not accurate either, because the videos collection can contain some empty(?) records.
If you do channel.videos.map(&:id).size the returned value should be correct. By correct I mean it should equal to number of videos listed at:
https://www.youtube.com/channel/:channel_id/videos

Unable to get results more than 100 results on google custom search api

I need to use Google Custom Search API https://developers.google.com/custom-search/v1/overview. From that page, it said:
For CSE users, the API provides 100 search queries per day for free.
If you need more, you may sign up for billing in the Developers
Console. Additional requests cost $5 per 1000 queries, up to 10k
queries per day.
I already sign up for billing inside the developer console. However, I still could not retrieve results more than 100. What things should I do more? https://www.googleapis.com/customsearch/v1?cx=CSE_INSTANCE&key=API_KEY&q=QUERY&start=100
{ error: { errors: [ { domain: "global", reason: "invalid", message:
"Invalid Value" } ], code: 400, message: "Invalid Value" } }
Query: Definition
https://support.google.com/customsearch/answer/1361951
Any actual user query from a Google Site Search engine, including but
not limited to search engines installed on your website using XML,
iFrame, or the Custom Search Element.
That means you would probably need to send eleven queries to get more than 100 results.
GET https://www.googleapis.com/customsearch/v1?&q=QUERY&...&start=1
GET https://www.googleapis.com/customsearch/v1?&q=QUERY&...&start=11
GET https://www.googleapis.com/customsearch/v1?&q=QUERY&...&start=21
GET ...
GET https://www.googleapis.com/customsearch/v1?&q=QUERY&...&start=81
GET https://www.googleapis.com/customsearch/v1?&q=QUERY&...&start=91
GET https://www.googleapis.com/customsearch/v1?&q=QUERY&...&start=101
Check every response and if error code is 400, you can stop - there is probably no need to send next (&start=previous+10) request.
Now you can merge responses and start building results page.
Google Custom Search and Google Site Search return up to 10 results
per query. If you want to display more than 10 results to the user,
you can issue multiple requests (using the start=0, start=11 ...
parameters) and display the results on a single page. In this case,
Google will consider each request as a separate query, and if you are
using Google Site Search, each query will count towards your limit.
There might be a better way to do this then I described above. (But, I'm not sure about batching API calls.)
And (finally) possible answer to your question: I made more than few tests, but I haven't had any luck with start greater than 100 (I was getting the same as you - <Response [400]>). I'm using "Browser key" from my billing-enabled project. That could mean we can't get 101st, 102nd, 103rd, etc. results with CSE API.
The API documentation says it never returns more than 100 items.
https://developers.google.com/custom-search/v1/reference/rest/v1/cse/list
start
integer (uint32 format)
The index of the first result to return. The default number of results
per page is 10, so &start=11 would start at the top of the second page
of results. Note: The JSON API will never return more than 100
results, even if more than 100 documents match the query, so setting
the sum of start + num to a number greater than 100 will produce an
error. Also note that the maximum value for num is 10.

how to get total number of mixpanel events via API

Can I get total number of events (=data points) for a time period?
The 'events' method (http://mixpanel.com/api/2.0/events/)
seems almost what I need, it's just that it requires a list of event names, and I need the total count of my events, I do not have the names.
I could not find this one in the API.
You can first hit the events by name API to return the list of your events at http://mixpanel.com/api/2.0/events/names/. Then, you can pump the return as a list into your request to http://mixpanel.com/api/2.0/events/ to get the count for each of your events.
Depending on your usage case, it may make more sense to use a URL hack on the main segmentation report instead of hitting the API. If you add union:1 as a new parameter to the URL (they are comma separated at the end) the report will display the union of all your events over a time period -- if you are viewing totals, this will be the total event count.
You can use the JQL console within Mixpanel, under the "Applications" menu in the left-nav. Just run the following, and it'll count the total number of events. See the JQL API reference here: https://mixpanel.com/help/reference/jql/api-reference#api/concepts
function main() {
return Events({
from_date: '2010-02-02',
to_date: '2017-02-03'
}).reduce(mixpanel.reducer.count());
}
// 989322

Yammer JSON Feed returning only 20 messages

I am trying to get all the messages from a particular group. I am getting the json feed back. The only problem is, its returning only 20 messages. Is this set as default or something. Is there any way by by which while doing the request, I can specify whether I want all the messages, by default just 20 or even messages posted between the start and the end date?
My RestApi call is:
https://www.yammer.com/api/v1/messages/in_group/[id].json
From Yammer Developer Documentation
<
Autocomplete: 10 requests in 10 seconds.
Messages: 10 requests in 30 seconds.
Notifications: 10 requests in 30 seconds.
All Other Resources: 10 requests in 10 seconds.
These limits are independent e.g. in the same 30 seconds period, you could make 10 message calls and 10 notification calls. The specific rate limits are subject to change but following the guidelines below will ensure that your app is not blocked.>>
I have tried using limit as the parameter to change the number of message more than 20. But it doesnt seem to be working?
Is this problem because of Rate Limit. If not, what's the problem?
Official documentation from Yammers Developer documentation
Messages - Viewing Messages
Endpoints:
1) All public messages in the user’s (whose access token is being used to make the API call henceforth referred to as current user) Yammer network. Corresponds to “All” conversations in the Yammer web interface.
GET https://www.yammer.com/api/v1/messages.json
2) The user’s feed, based on the selection they have made between “Following” and “Top” conversations.
GET https://www.yammer.com/api/v1/messages/my_feed.json
3) The algorithmic feed for the user that corresponds to “Top” conversations, which is what the vast majority of users will see in the Yammer web interface.
GET https://www.yammer.com/api/v1/messages/algo.json
4) The “Following” feed which is conversations involving people, groups and topics that the user is following.
GET https://www.yammer.com/api/v1/messages/following.json
5) All messages sent by the user. Alias for /api/v1/messages/from_user/logged-in_user_id.format.
GET https://www.yammer.com/api/v1/messages/sent.json
6) Private messages received by the user.
GET https://www.yammer.com/api/v1/messages/private.json
7) All messages received by the user.
GET https://www.yammer.com/api/v1/messages/received.json
Parameters:
The messages API endpoints return a similar structure and support the following query parameters:
older_than - Returns messages older than the message ID specified as a numeric string. This is useful for paginating messages. For example, if you’re currently viewing 20 messages and the oldest is number 2912, you could append “?older_than=2912″ to your request to get the 20 messages prior to those you’re seeing.
newer_than - Returns messages newer than the message ID specified as a numeric string. This should be used when polling for new messages. If you’re looking at messages, and the most recent message returned is 3516, you can make a request with the parameter “?newer_than=3516″ to ensure that you do not get duplicate copies of messages already on your page.
threaded=[true | extended] - threaded=true will only return the first message in each thread. This parameter is intended for apps which display message threads collapsed. threaded=extended will return the thread starter messages in order of most recently active as well as the two most recent messages, as they are viewed in the default view on the Yammer web interface.
limit - Return only the specified number of messages. Works for threaded=true and threaded=extended.
Noted the limit parameter that you can set on your GET request - so based on this documentation if it is correct (I'm not a Yammer Developer but I do use it) you should be able to do
https://www.yammer.com/api/v1/messages.json?limit=50
That is in theory but reading through the documentation there is a section on Search that has
page - Only 20 results of each type will be returned for each page, but a total count is returned with each query. page=1 (the default) will return items 1-20, page=2 will return items 21-30, etc.
Which says to me they are limited to 20 results returned.
UPDATE
After testing this with https://www.yammer.com/api/v1/messages.json?limit=50 and it not returning 50 messages but doing https://www.yammer.com/api/v1/messages.json?limit=5 will return only 5 messages I would say that Yammer restrict the number of messages to 20 Also after reading through the documents a bit more I read
For example, if you’re currently viewing 20 messages and the oldest is number 2912, you could append “?older_than=2912″ to your request to get the 20 messages prior to those you’re seeing"
This says to me that they will only return a max of 20. So I think you are stuck with 20 messages at a time.
Hope this helps.
You need to use Parameters:
The messages API endpoints return a similar structure and support the following query parameters:
older_than - Returns messages older than the message ID specified as a numeric string. This is useful for paginating messages. For example, if you’re currently viewing 20 messages and the oldest is number 2912, you could append “?older_than=2912″ to your request to get the 20 messages prior to those you’re seeing.
newer_than - Returns messages newer than the message ID specified as a numeric string. This should be used when polling for new messages. If you’re looking at messages, and the most recent message returned is 3516, you can make a request with the parameter “?newer_than=3516″ to ensure that you do not get duplicate copies of messages already on your page.
threaded=[true | extended] - threaded=true will only return the first message in each thread. This parameter is intended for apps which display message threads collapsed. threaded=extended will return the thread starter messages in order of most recently active as well as the two most recent messages, as they are viewed in the default view on the Yammer web interface.
limit - Return only the specified number of messages. Works for threaded=true and threaded=extended.
Example : GET https://www.yammer.com/api/v1/messages.json?older_than=2912
while older can be ID of message number 20 and so on you can get 20 by 20
I solved by requesting subsequent pages in a recursive manner.
You can simply increase the page parameter until the response is empty, or update the older_than parameter until the property meta.older_available is false.

Scraping Real Time Visitors from Google Analytics

I have a lot of sites and want to build a dashboard showing the number of real time visitors on each of them on a single page. (would anyone else want this?) Right now the only way to view this information is to open a new tab for each site.
Google doesn't have a real-time API, so I'm wondering if it is possible to scrape this data. Eduardo Cereto found out that Google transfers the real-time data over the realtime/bind network request. Anyone more savvy have an idea of how I should start? Here's what I'm thinking:
Figure out how to authenticate programmatically
Inspect all of the realtime/bind requests to see how they change. Does each request have a unique key? Where does that come from? Below is my breakdown of the request:
https://www.google.com/analytics/realtime/bind?VER=8
&key= [What is this? Where does it come from? 21 character lowercase alphanumeric, stays the same each request]
&ds= [What is this? Where does it come from? 21 character lowercase alphanumeric, stays the same each request]
&pageId=rt-standard%2Frt-overview
&q=t%3A0%7C%3A1%3A0%3A%2Ct%3A11%7C%3A1%3A5%3A%2Cot%3A0%3A0%3A4%2Cot%3A0%3A0%3A3%2Ct%3A7%7C%3A1%3A10%3A6%3D%3DREFERRAL%3B%2Ct%3A10%7C%3A1%3A10%3A%2Ct%3A18%7C%3A1%3A10%3A%2Ct%3A4%7C5%7C2%7C%3A1%3A10%3A2!%3Dzz%3B%2C&f
The q variable URI decodes to this (what the?):
t:0|:1:0:,t:11|:1:5:,ot:0:0:4,ot:0:0:3,t:7|:1:10:6==REFERRAL;,t:10|:1:10:,t:18|:1:10:,t:4|5|2|:1:10:2!=zz;,&f
&RID=rpc
&SID= [What is this? Where does it come from? 16 character uppercase alphanumeric, stays the same each request]
&CI=0
&AID= [What is this? Where does it come from? integer, starts at 1, increments weirdly to 150 and then 298]
&TYPE=xmlhttp
&zx= [What is this? Where does it come from? 12 character lowercase alphanumeric, changes each request]
&t=1
Inspect all of the realtime/bind responses to see how they change. How does the data come in? It looks like some altered JSON. How many times do I need to connect to get the data? Where is the active visitors on site number in there? Here is a dump of sample data:
19
[[151,["noop"]
]
]
388
[[152,["rt",[{"ot:0:0:4":{"timeUnit":"MINUTES","overTimeData":[{"values":[49,53,52,40,42,55,49,41,51,52,47,42,62,82,76,71,81,66,81,86,71,66,65,65,55,51,53,73,71,81],"name":"Total"}]},"ot:0:0:3":{"timeUnit":"SECONDS","overTimeData":[{"values":[0,1,1,1,1,0,1,0,1,1,1,0,2,0,2,2,1,0,0,0,0,0,2,1,1,2,1,2,0,5,1,0,2,1,1,1,2,0,2,1,0,5,1,1,2,0,0,0,0,0,0,0,0,0,1,1,0,3,2,0],"name":"Total"}]}}]]]
]
388
[[153,["rt",[{"ot:0:0:4":{"timeUnit":"MINUTES","overTimeData":[{"values":[52,53,52,40,42,55,49,41,51,52,47,42,62,82,76,71,81,66,81,86,71,66,65,65,55,51,53,73,71,81],"name":"Total"}]},"ot:0:0:3":{"timeUnit":"SECONDS","overTimeData":[{"values":[2,1,1,1,1,1,0,1,0,1,1,1,0,2,0,2,2,1,0,0,0,0,0,2,1,1,2,1,2,0,5,1,0,2,1,1,1,2,0,2,1,0,5,1,1,2,0,0,0,0,0,0,0,0,0,1,1,0,3,2],"name":"Total"}]}}]]]
]
388
[[154,["rt",[{"ot:0:0:4":{"timeUnit":"MINUTES","overTimeData":[{"values":[53,53,52,40,42,55,49,41,51,52,47,42,62,82,76,71,81,66,81,86,71,66,65,65,55,51,53,73,71,81],"name":"Total"}]},"ot:0:0:3":{"timeUnit":"SECONDS","overTimeData":[{"values":[0,3,1,1,1,1,1,0,1,0,1,1,1,0,2,0,2,2,1,0,0,0,0,0,2,1,1,2,1,2,0,5,1,0,2,1,1,1,2,0,2,1,0,5,1,1,2,0,0,0,0,0,0,0,0,0,1,1,0,3],"name":"Total"}]}}]]]
]
Let me know if you can help with any of the items above!
To get the same, Google has launched new Real Time API. With this API you can easily retrieve real time online visitors as well as several Google Analytics with following dimensions and metrics. https://developers.google.com/analytics/devguides/reporting/realtime/dimsmets/
This is quite similar to Google Analytics API. To start development on this,
https://developers.google.com/analytics/devguides/reporting/realtime/v3/devguide
With Google Chrome I can see the data on the Network Panel.
The request endpoint is https://www.google.com/analytics/realtime/bind
Seems like the connection stays open for 2.5 minutes, and during this time it just keeps getting more and more data.
After about 2.5 minutes the connection is closed and a new one is open.
On the Network panel you can only see the data for the connections that are terminated. So leave it open for 5 minutes or so and you can start to see the data.
I hope that can give you a place to start.
Having google in the loop seems pretty redundant. Suggest you use a common element delivered on demand from the dashboard server and include this item by absolute URL on all pages to be monitored for a given site. The script outputting the item can read the IP of the browser asking and these can all be logged into a database and filtered for uniqueness giving a real time head count.
<?php
$user_ip = $_SERVER["REMOTE_ADDR"];
/// Some MySQL to insert $user_ip to the database table for website XXX goes here
$file = 'tracking_image.gif';
$type = 'image/gif';
header('Content-Type:'.$type);
header('Content-Length: ' . filesize($file));
readfile($file);
?>
Ammendum:
A database can also add a timestamp to every row of data it stores. This can be used to further filter results and provide the number of visitors in the last hour or minute.
Client side Javascript with AJAX for fine tuning or overkill
The onblur and onfocus javascript commands can be used to tell if the the page is visible, pass the data back to the dashboard server via Ajax. http://www.thefutureoftheweb.com/demo/2007-05-16-detect-browser-window-focus/
When a visitor closes a page this can also be detected by the javascript onunload function in the body tag and Ajax can be used to send data back to the server one last time before the browser finally closes the page.
As you may also wish to collect some information about the visitor like Google analytics does this page https://panopticlick.eff.org/ has a lot of javascript that can be examined and adapted.
I needed/wanted realtime data for personal use so I reverse-engineered their system a little bit.
Instead of binding to /bind I get data from /getData (no pun intended).
At /getData the minimum request is apparently: https://www.google.com/analytics/realtime/realtime/getData?pageId&key={{propertyID}}&q=t:0|:1
Here's a short explanation of the possible query parameters and syntax, please remember that these are all guesses and I don't know all of them:
Query Syntax: pageId&key=propertyID&q=dataType:dimensions|:page|:limit:filters
Values:
pageID: Required but seems to only be used for internal analytics.
propertyID: a{{accountID}}w{{webPropertyID}}p{{profileID}}, as specified at the Documentation link below. You can also find this in the URL of all analytics pages in the UI.
dataType:
t: Current data
ot: Overtime/Past
c: Unknown, returns only a "count" value
dimensions (| separated or alone), most values are only applicable for t:
1: Country
2: City
3: Location code?
4: Latitude
5: Longitude
6: Traffic source type (Social, Referral, etc.)
7: Source
8: ?? Returns (not set)
9: Another location code? longer.
10: Page URL
11: Visitor Type (new/returning)
12: ?? Returns (not set)
13: ?? Returns (not set)
14: Medium
15: ?? Returns "1"
page:
At first this seems to work for pagination but after further analysis it looks like it's also used to specify which of the 6 pages (Overview, Locations, Traffic Sources, Content, Events and Conversions) to return data for.
For some reason 0 returns an impossibly high metrictotal
limit: Result limit per page, maximum of 50
filters:
Syntax is as specified at the Documentation 2 link below except the OR is specified using | instead of a comma.6==CUSTOM;1==United%20States
You can also combine multiple queries in one request by comma separating them (i.e. q=t:1|2|:1|:10,t:6|:1|:10).
Following the above "documentation", if you wanted to build a query that requests the page URL and city of the top 10 active visitors with a traffic source type of CUSTOM located in the US you would use this URL: https://www.google.com/analytics/realtime/realtime/getData?key={{propertyID}}&pageId&q=t:10|2|:1|:10:6==CUSTOM;1==United%20States
Documentation
Documentation 2
I hope that my answer is readable and (although it's a little late) sufficiently answers your question and helps others in the future.

Resources