YouTube Data API V3 - Maximum search result for Channel ID - youtube-data-api

I came across the API reference document regarding the Search API:
Note: Search results are constrained to a maximum of 500 videos if your request specifies a value for the channelId parameter and sets the type parameter value to video, [...].
Do I have to apply for a paid account to get past the 500-video limit? If yes, how do I apply?

If you need to obtain the list of all videos of a given channel -- identified by its ID, say CHANNEL_ID -- then proceed as follows:
Step 1: Query the Channels.list API endpoint with the parameter id=CHANNEL_ID to obtain the ID of that channel's uploads playlist:
response = youtube.channels().list(
    id = CHANNEL_ID,
    part = 'contentDetails',
    fields = 'items(contentDetails(relatedPlaylists(uploads)))',
    maxResults = 1
).execute()

# the fields filter wraps the payload in 'items', hence the indexing below
uploads_id = response['items'][0] \
    ['contentDetails'] \
    ['relatedPlaylists'] \
    ['uploads']
The code above needs to run only once to obtain the uploads playlist ID as uploads_id; that ID can then be reused as many times as needed.
Usually, a channel ID and its corresponding uploads playlist ID are related by s/^UC([0-9a-zA-Z_-]{22})$/UU\1/.
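For illustration, here is a minimal Python sketch of that substitution (the helper name is mine; since the naming convention is undocumented, the Channels.list query above remains the authoritative way):
import re

def uploads_playlist_id(channel_id):
    # By convention, rewriting the leading 'UC' of a channel ID as 'UU'
    # yields the uploads playlist ID; returns None when the channel ID
    # doesn't match the usual pattern.
    match = re.match(r'^UC([0-9a-zA-Z_-]{22})$', channel_id)
    return 'UU' + match.group(1) if match else None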
Step 2: Using the previously obtained uploads playlist ID -- let's name it UPLOADS_ID --, query the PlaylistItems.list API endpoint to obtain the list of all video IDs of that playlist:
is_video = lambda item: \
    item['snippet']['resourceId']['kind'] == 'youtube#video'
video_id = lambda item: \
    item['snippet']['resourceId']['videoId']

request = youtube.playlistItems().list(
    playlistId = UPLOADS_ID,
    part = 'snippet',
    fields = 'nextPageToken,items(snippet(resourceId))',
    maxResults = 50
)

videos = []
while request:
    response = request.execute()
    items = response.get('items', [])
    videos.extend(map(video_id, filter(is_video, items)))
    request = youtube.playlistItems().list_next(
        request, response)
Upon running the code above, the list videos will contain the IDs of all videos that were uploaded on the channel identified by CHANNEL_ID.
Step 3: Query the Videos.list API endpoint to obtain the statistics object of each of the videos you're interested in:
class Stat:
    def __init__(self, video_id, view_count, like_count):
        self.video_id = video_id
        self.view_count = view_count
        self.like_count = like_count

stats = []
while len(videos):
    ids = videos[0:50]
    del videos[0:50]
    response = youtube.videos().list(
        id = ','.join(ids),
        part = 'id,statistics',
        fields = 'items(id,statistics)',
        maxResults = len(ids)
    ).execute()
    items = response['items']
    assert len(items) == len(ids)
    for item in items:
        stat = item['statistics']
        stats.append(
            Stat(
                video_id = item['id'],
                view_count = stat['viewCount'],
                like_count = stat['likeCount']
            )
        )
Note that the code above, when the list videos has length N, reduces the number of calls to Videos.list from N to math.ceil(N / 50), i.e. math.floor(N / 50) + (1 if N % 50 else 0). That's because the id parameter of the Videos.list endpoint can be specified as a comma-separated list of video IDs (one such list may contain at most 50 IDs).
Note also that each piece of code above uses the fields request parameter to obtain from the invoked API endpoints only the info that is of actual use.
I must also mention that, according to YouTube's staff, there's an upper limit of 20,000 items, set by design, on what the PlaylistItems.list endpoint will return. This is unfortunate, but a fact.

Related

#SNMP - GetBulk V2 request is limited to 100 results?

I'm trying to perform the request below, and the results should be around 900 variables, not 100.
It doesn't matter how many OIDs I send, 1 or 10; I always get no more than 100 variables back.
What am I doing wrong?
var readCommunity = new OctetString("XXXXX");
var oidsList = new List<string>
{
    "1.3.6.1.2.1.2.2.1.3",
    "1.3.6.1.2.1.2.2.1.5",
    "1.3.6.1.2.1.2.2.1.6",
    "1.3.6.1.2.1.2.2.1.7",
    "1.3.6.1.2.1.2.2.1.8",
    "1.3.6.1.2.1.2.2.1.2",
    "1.3.6.1.2.1.2.2.1.10",
    "1.3.6.1.2.1.2.2.1.16",
    "1.3.6.1.2.1.2.2.1.14",
    "1.3.6.1.2.1.31.1.1.1.6"
};
var oids = oidsList.Select(oid => new Variable(new ObjectIdentifier(oid))).ToArray();
ISnmpMessage request = new GetBulkRequestMessage(
    0,              // request ID
    VersionCode.V2,
    readCommunity,
    0,              // non-repeaters
    1000,           // max-repetitions
    oids);
var response = request.GetResponse(60000, new IPEndPoint(IPAddress.Parse("1.1.1.1"), 161));
You can send a request asking for as many items as you wish, but it is the agent that decides how many to return to you. That's how the standard defines it:
The receiving SNMP entity produces a Response-PDU with up to the total number of requested variable bindings communicated by the request.
While the maximum number of variable bindings in the Response-PDU is bounded by N + (M * R), the response may be generated with a lesser number of variable bindings (possibly zero) for either of three reasons.
Reference
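The practical fix is therefore client-side paging: keep issuing GetBulk requests, starting each one from the last OID returned, until you've walked the whole subtree. As an illustration only, here is a minimal sketch of such a walk in Python with the pysnmp library (the host, community string, and the ifDescr OID are placeholders; pysnmp's bulkCmd generator re-issues GetBulk requests under the hood until the subtree is exhausted):
from pysnmp.hlapi import (
    SnmpEngine, CommunityData, UdpTransportTarget, ContextData,
    ObjectType, ObjectIdentity, bulkCmd,
)

# Walk one column of the ifTable; the generator keeps sending GetBulk
# requests, so the agent's per-response cap stops mattering.
for error_indication, error_status, error_index, var_binds in bulkCmd(
        SnmpEngine(),
        CommunityData('XXXXX'),
        UdpTransportTarget(('1.1.1.1', 161)),
        ContextData(),
        0, 50,                        # non-repeaters, max-repetitions
        ObjectType(ObjectIdentity('1.3.6.1.2.1.2.2.1.2')),
        lexicographicMode=False):     # stop at the end of the subtree
    if error_indication or error_status:
        break
    for var_bind in var_binds:
        print(var_bind)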

Power Query delayed recursion

I'm very new to Power Query and trying to piece a little demo together in Excel.
I have two web endpoints: I post some content to the first endpoint, which gives me the URL of the second endpoint, and then I query that second endpoint for the actual results. The second endpoint gives back a JSON response with a field that indicates whether the results are ready. If the results are ready, they can be processed; if not, the endpoint should be queried again later.
Here's the code I have so far:
let
    apikey = "MYAPIKEY",
    proxyendpoint = "URL OF THE FIRST ENDPOINT",
    bytesbody = File.Contents("FILE TO POST"),
    headers = [#"Ocp-Apim-Subscription-Key" = apikey],
    bytesresp = Web.Contents(proxyendpoint, [Headers=headers, Content=bytesbody]),
    jsonresp = Json.Document(bytesresp),
    opLoc = jsonresp[OperationLocation],
    getResult = (url) =>
        let
            linesBody = Web.Contents(url, [Headers=headers]),
            linesJson = Json.Document(linesBody),
            resultStatus = linesJson[status],
            linesData = if (resultStatus = "Succeeded") then
                    linesJson[recognitionResult][lines]
                else
                    // @ is M's inclusive-identifier operator, needed for recursion
                    Function.InvokeAfter(() => @getResult(url), #duration(0,0,0,5))
        in
            linesData,
    linesText = List.Transform(getResult(opLoc), each _[text]),
    table = Table.FromList(linesText)
in
    table
My problem: when I check with Fiddler, I see the second endpoint queried once, and the response there confirms the results are not ready. The data loading then "hangs", but I see no additional calls to the second endpoint, so my recursive calls are apparently never evaluated.
What am I doing wrong?
With the ()=> in the first argument of Function.InvokeAfter, the result of Function.InvokeAfter will be the function getResult rather than the result from getResult. So it should be left out:
Function.InvokeAfter(@getResult(url), #duration(0,0,0,5))
Turns out my code was basically right. The issue was that Web.Contents() does some internal caching, which is why I couldn't see any further calls in Fiddler and why my data loading "hung" (the first time around, the recursion exit criterion was false and that result got cached, so every subsequent recursion just reused the same data).
I created some POCs for the delayed recursion scenario and, strangely, everything worked. I changed things around until I reached a version of the POC where the only difference was the Web.Contents() call. So I searched for this specific issue and found a post here.
As suggested in that post, I added a header value that is unique per call to every Web.Contents() call to avoid the response being cached (and also cleaned up the code a bit):
let
    apikey = "MYAPIKEY",
    proxyendpoint = "URL OF THE FIRST ENDPOINT",
    bytesbody = File.Contents("FILE PATH TO BE POSTED"),
    headers = [#"Ocp-Apim-Subscription-Key" = apikey],
    bytesresp = Web.Contents(proxyendpoint, [Headers=headers, Content=bytesbody]),
    jsonresp = Json.Document(bytesresp),
    opLoc = jsonresp[OperationLocation],
    getResult = (url, apiKeyParam) =>
        let
            // note the extra header here, which is different in every call
            currentHeaders = [#"Ocp-Apim-Subscription-Key" = apiKeyParam, #"CacheHack" = Number.ToText(Number.Random())],
            linesBody = Web.Contents(url, [Headers=currentHeaders]),
            linesJson = Json.Document(linesBody),
            resultStatus = linesJson[status],
            result = if (resultStatus = "Succeeded") then linesJson[recognitionResult][lines]
                else Function.InvokeAfter(() => @getResult(url, apiKeyParam), #duration(0,0,0,5))
        in
            result,
    linesText = List.Transform(getResult(opLoc, apikey), each _[text]),
    table = Table.FromList(linesText)
in
    table

Mailchimp url using power query

I'm not a programmer, so I'm trying to use Power Query to pull the data from Mailchimp; Power Query lets me enter the URL and get the data back in tables (XML/JSON).
This is my URL: http://us5.api.mailchimp.com/3.0/reports?apikey=(secret)
and I get only ten reports instead of 100.
Am I doing something wrong?
Thanks in advance
Yes. It isn't documented anywhere in the Mailchimp docs, but by default you get only the first ten records when you query a Mailchimp 3.0 API. To fetch a larger number of records, you have to use the &offset and &count query-string parameters. In one of my recent Python projects, I implemented it as follows to fetch in blocks/pages of 1000 records per request. Perhaps you can convert it to Power Query:
import base64
import json
import urllib2

campaigns = []
baseurl = "https://" + dc + ".api.mailchimp.com/3.0/"  # dc = your data center, e.g. "us5"
psize, i = 1000, 0  # page size, page index
while True:
    turl = baseurl + "reports"
    turl += "?since_send_time=" + camp_since_send_time
    turl += '&offset=' + str(psize * i) + '&count=' + str(psize)
    request = urllib2.Request(turl)
    base64string = base64.encodestring('%s:%s' % (username, key)).replace('\n', '')
    request.add_header("Authorization", "Basic %s" % base64string)
    try:
        output = urllib2.urlopen(request).read()
        data = json.loads(output)
    except:
        print "Error occurred. Make sure you entered the correct api key"
        exit()
    MailChimpExpress.createfile("allcampaigns.json", output)  # helper from my project
    cnt = len(data['reports'])
    print str(cnt) + " campaigns retrieved"
    for report in data['reports']:
        lst = [report["id"], report["campaign_title"], report["type"], report["emails_sent"]]
        campaigns.append(lst)
    if cnt < psize: break  # cnt could also be zero if no records are returned
    i += 1
EDIT
According to this TechNet link, it looks like you can indeed call URLs in succession using Power Query. You can use a pattern like this one, as mentioned in one of the answers:
let
    Source = Table.FromColumns({{"firstURL", "secondURL", "etc."}}, {"URLS"}),
    InsertedCustom = Table.AddColumn(Source, "Custom", each Web.Page(Web.Contents([URLS])))
in
    InsertedCustom

How to retrieve total view count of large number of pages combined from the GA API

We are interested in the combined statistics of different pages from the Google Analytics Core Reporting API. The only way I found to query statistics for multiple pages at the same time is by creating a filter like so:
ga:pagePath==page?id=a,ga:pagePath==page?id=b,ga:pagePath==page?id=c
And this gets escaped inside the filters parameter of the GET query.
However when the GET query gets over 2000 characters I get the following response:
414. That’s an error.
The requested URL /analytics/v3/data/ga... is too large to process. That’s all we know.
Note that, just like in the example call, the only part that differs per page is a GET parameter in the pagePath, yet we have to OR in a new filter each time, specifying both the dimension (pagePath) and the part of the path that is always identical.
Is there any way to specify a large number of different pages to query without hitting this limit on the GET query (I can't find any documentation for doing POST requests)? Or are there alternatives to creating batches of at most X different pages per query and adding them up on my end?
Instead of using ga:pagePath as part of a filter, you should use it as a dimension. You can get up to 10,000 rows per query this way and paginate to get all results, then parse the results client-side to get what you need. Additionally, use a filter to scope the results down where possible, based on your site structure or page names.
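To make that concrete, here is a minimal Python sketch against the v3 Core Reporting API using the requests library (the profile ID, OAuth token, date range, and filter below are placeholder assumptions, not something from the original answer):
import requests

PROFILE_ID = 'ga:12345678'               # hypothetical view (profile) ID
ACCESS_TOKEN = 'ya29.your-oauth2-token'  # hypothetical OAuth 2.0 token

def fetch_pageviews_by_path():
    # Page through ga:pagePath rows, up to 10,000 at a time.
    rows, start_index = [], 1
    while True:
        resp = requests.get(
            'https://www.googleapis.com/analytics/v3/data/ga',
            params={
                'ids': PROFILE_ID,
                'start-date': '2017-01-01',
                'end-date': '2017-12-31',
                'metrics': 'ga:pageviews',
                'dimensions': 'ga:pagePath',
                # scope results down server-side where possible
                'filters': 'ga:pagePath=@page',
                'start-index': start_index,
                'max-results': 10000,
            },
            headers={'Authorization': 'Bearer ' + ACCESS_TOKEN},
        )
        resp.raise_for_status()
        batch = resp.json().get('rows', [])
        rows.extend(batch)
        if len(batch) < 10000:
            break
        start_index += len(batch)
    return rows
You would then sum ga:pageviews client-side for just the page IDs you care about.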
I am sharing sample code with which you can fetch more than 10,000 records of data with the help of ItemsPerPage:
private void GetDataofPpcInfo(DateTime dtStartDate, DateTime dtEndDate, AnalyticsService gas, List<PpcReportData> lstPpcReportData, string strProfileID)
{
    int intStartIndex = 1;
    int intIndexCnt = 0;
    int intMaxRecords = 10000;
    var metrics = "ga:impressions,ga:adClicks,ga:adCost,ga:goalCompletionsAll,ga:CPC,ga:visits";
    var r = gas.Data.Ga.Get("ga:" + strProfileID, dtStartDate.ToString("yyyy-MM-dd"), dtEndDate.ToString("yyyy-MM-dd"),
        metrics);
    r.Dimensions = "ga:campaign,ga:keyword,ga:adGroup,ga:source,ga:isMobile,ga:date";
    r.MaxResults = 10000;
    r.Filters = "ga:medium==cpc;ga:campaign!=(not set)";
    while (true)
    {
        r.StartIndex = intStartIndex;
        var dimensionOneData = r.Fetch();
        dimensionOneData.ItemsPerPage = intMaxRecords;
        if (dimensionOneData != null && dimensionOneData.Rows != null)
        {
            var enUS = new CultureInfo("en-US");
            intIndexCnt++;
            foreach (var lstFirst in dimensionOneData.Rows)
            {
                var objPPCReportData = new PpcReportData();
                objPPCReportData.Campaign = lstFirst[dimensionOneData.ColumnHeaders.IndexOf(dimensionOneData.ColumnHeaders.FirstOrDefault(h => h.Name == "ga:campaign"))];
                objPPCReportData.Keywords = lstFirst[dimensionOneData.ColumnHeaders.IndexOf(dimensionOneData.ColumnHeaders.FirstOrDefault(h => h.Name == "ga:keyword"))];
                lstPpcReportData.Add(objPPCReportData);
            }
            intStartIndex = intIndexCnt * intMaxRecords + 1;
        }
        else break;
    }
}
The only problematic thing is that your query length shouldn't exceed around 2,000 characters.

Google calendar query returns at most 25 entries

I'm trying to delete all calendar entries from today forward. I run a query, then call getEntries() on the query result. getEntries() always returns 25 entries (or fewer if there are fewer than 25 entries on the calendar). Why aren't all the entries returned? I'm expecting about 80 entries.
As a test, I tried running the query, deleting the 25 entries returned, running the query again, deleting again, and so on. This works, but there must be a better way.
Below is the Java code that only runs the query once.
CalendarQuery myQuery = new CalendarQuery(feedUrl);
DateFormat dfGoogle = new SimpleDateFormat("yyyy-MM-dd'T00:00:00'");
Date dt = Calendar.getInstance().getTime();
myQuery.setMinimumStartTime(DateTime.parseDateTime(dfGoogle.format(dt)));
// Make the end time far into the future so we delete everything
myQuery.setMaximumStartTime(DateTime.parseDateTime("2099-12-31T23:59:59"));

// Execute the query and get the response
CalendarEventFeed resultFeed = service.query(myQuery, CalendarEventFeed.class);

// !!! This returns 25 (or fewer if there are fewer than 25 entries on the calendar) !!!
int test = resultFeed.getEntries().size();

// Delete all the entries returned by the query
for (int j = 0; j < resultFeed.getEntries().size(); j++) {
    CalendarEventEntry entry = resultFeed.getEntries().get(j);
    entry.delete();
}
PS: I've looked at the Data API Developer's Guide and the Google Data API Javadoc. These sites are okay, but not great. Does anyone know of additional Google API documentation?
You can increase the number of results with myQuery.setMaxResults(). There will still be an upper maximum, though, so you can make multiple queries ('paged' results) by varying myQuery.setStartIndex():
http://code.google.com/apis/gdata/javadoc/com/google/gdata/client/Query.html#setMaxResults(int)
http://code.google.com/apis/gdata/javadoc/com/google/gdata/client/Query.html#setStartIndex(int)
Based on the answers from Jim Blackler and Chris Kaminski, I enhanced my code to read the query results in pages. I also do the delete as a batch, which should be faster than doing individual deletions.
I'm providing the Java code here in case it is useful to anyone.
CalendarQuery myQuery = new CalendarQuery(feedUrl);
DateFormat dfGoogle = new SimpleDateFormat("yyyy-MM-dd'T00:00:00'");
Date dt = Calendar.getInstance().getTime();
myQuery.setMinimumStartTime(DateTime.parseDateTime(dfGoogle.format(dt)));
// Make the end time far into the future so we delete everything
myQuery.setMaximumStartTime(DateTime.parseDateTime("2099-12-31T23:59:59"));
// Set the maximum number of results to return for the query.
// Note: A GData server may choose to provide fewer results, but will never provide
// more than the requested maximum.
myQuery.setMaxResults(5000);

int startIndex = 1;
int entriesReturned;
List<CalendarEventEntry> allCalEntries = new ArrayList<CalendarEventEntry>();
CalendarEventFeed resultFeed;

// Run our query as many times as necessary to get all the
// Google calendar entries we want
while (true) {
    myQuery.setStartIndex(startIndex);
    // Execute the query and get the response
    resultFeed = service.query(myQuery, CalendarEventFeed.class);
    entriesReturned = resultFeed.getEntries().size();
    if (entriesReturned == 0)
        // We've hit the end of the list
        break;
    // Add the returned entries to our local list
    allCalEntries.addAll(resultFeed.getEntries());
    startIndex = startIndex + entriesReturned;
}

// Delete all the entries as a batch delete
CalendarEventFeed batchRequest = new CalendarEventFeed();
for (int i = 0; i < allCalEntries.size(); i++) {
    CalendarEventEntry entry = allCalEntries.get(i);
    BatchUtils.setBatchId(entry, Integer.toString(i));
    BatchUtils.setBatchOperationType(entry, BatchOperationType.DELETE);
    batchRequest.getEntries().add(entry);
}

// Get the batch link URL and send the batch request
Link batchLink = resultFeed.getLink(Link.Rel.FEED_BATCH, Link.Type.ATOM);
CalendarEventFeed batchResponse = service.batch(new URL(batchLink.getHref()), batchRequest);

// Ensure that all the operations were successful
boolean isSuccess = true;
StringBuffer batchFailureMsg = new StringBuffer("These entries in the batch delete failed:");
for (CalendarEventEntry entry : batchResponse.getEntries()) {
    String batchId = BatchUtils.getBatchId(entry);
    if (!BatchUtils.isSuccess(entry)) {
        isSuccess = false;
        BatchStatus status = BatchUtils.getBatchStatus(entry);
        batchFailureMsg.append("\nID: " + batchId + " Reason: " + status.getReason());
    }
}
if (!isSuccess) {
    throw new Exception(batchFailureMsg.toString());
}
There is a small quote on the API page
http://code.google.com/apis/calendar/data/1.0/reference.html#Parameters
Note: The max-results query parameter for Calendar is set to 25 by default, so that you won't receive an entire calendar feed by accident. If you want to receive the entire feed, you can specify a very large number for max-results.
So, to get all events from a Google calendar feed, we do this:
google.calendarurl.com/.../basic?max-results=999999
In the API you can also query with setMaxResults(999999).
I got here while searching for a Python solution;
Should anyone be stuck in the same way, the important line is the fourth:
query = gdata.calendar.service.CalendarEventQuery(cal, visibility, projection)
query.start_min = start_date
query.start_max = end_date
query.max_results = 1000
Unfortunately, Google limits the maximum number of entries you can retrieve per query. This keeps queries within their governor guidelines (an HTTP request is not allowed to take more than 30 seconds, for example). They've built their whole architecture around this, so you might as well build the paging logic as you have.
