I am currently experiencing the problem that the descriptions of the videos coming from the Youtube API are being truncated to 150-160 characters.
It was working correctly and I could get full descriptions for each video that I have found, but for the past 2 weeks the descriptions are truncated. Has anyone experienced the same problem?
This is my query:
youtube_instance.search().list(
    q=query,
    part="id,snippet",
    publishedBefore=str(date_string) + "T23:59:59.000Z",
    publishedAfter=str(date_string) + "T00:00:00.000Z",
    type="channel,video",
    maxResults=max_results
).execute()
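One thing worth checking: search().list only ever returns a shortened snippet, while videos().list returns the full description. A minimal sketch of that two-step approach (the helper names are mine; it assumes the google-api-python-client instance you are already using, and the 50-IDs-per-call limit of videos().list):

```python
def chunk_ids(video_ids, size=50):
    """videos().list accepts at most 50 comma-separated IDs per call."""
    for i in range(0, len(video_ids), size):
        yield video_ids[i:i + size]

def fetch_full_descriptions(youtube_instance, video_ids):
    """Return {video_id: full_description} by re-querying videos().list."""
    descriptions = {}
    for batch in chunk_ids(video_ids):
        response = youtube_instance.videos().list(
            part="snippet",
            id=",".join(batch)
        ).execute()
        for item in response.get("items", []):
            descriptions[item["id"]] = item["snippet"]["description"]
    return descriptions
```

So you would collect the video IDs from the search response first, then make one videos().list call per batch of 50 to get the untruncated descriptions.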
I am trying to figure out how many restaurants of a specific cuisine (seafood) there are in each country. I have looked at the Google Places API and the TripAdvisor API, but cannot find these numbers. I don't need the list of restaurants, only the number of restaurants. I found OpenStreetMap, which looked very promising. I downloaded the data for Norway, but the numbers are not correct (osmium tags-filter norway-latest.osm.pbf cuisine=seafood returns 62, which is way too low).
Any suggestions for how and where I can find what I am looking for?
Extrapolate.
You won't get an accurate answer; how would you even define what a seafood restaurant is?
Find out roughly how many restaurants there are in the area you are interested in, and then decide what percentage of them might be seafood restaurants.
You can use this approach to extract the data from OpenStreetMap:
https://gis.stackexchange.com/questions/363474/aggregate-number-of-features-by-country-in-overpass
You can run the query on http://overpass-turbo.eu/ (go to Settings and choose the kumi-systems server).
The query could look like this:
// Define fields for CSV output
[out:csv(name, total)][timeout:2500];
// All countries
area["admin_level"=2];
// Count in each area
foreach->.regio(
  // Collect all nodes, ways and relations with cuisine=seafood in the current area
  (
    node(area.regio)[cuisine=seafood];
    way(area.regio)[cuisine=seafood];
    rel(area.regio)[cuisine=seafood];
  );
  // Assemble the output
  make count name = regio.set(t["name:en"]),
             total = count(nodes) + count(ways) + count(relations);
  out;
);
This query can take a long time (at the time of writing, mine had not yet finished).
You can also run the query via curl on some server and have the results mailed to you, e.g. curl ....... | mail -s "Overpass Result" yourmail@example.com. You can get the curl command from the browser's network tab via "Copy as cURL".
I also considered Taginfo (https://taginfo.openstreetmap.org/tags/cuisine=seafood), but it cannot break the counts down by country.
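Once the query finishes, the [out:csv(name, total)] header means the result arrives as tab-separated text with a header row (the Overpass CSV defaults). A small sketch of turning that into per-country counts, with made-up sample data:

```python
import csv
import io

def parse_overpass_csv(text):
    """Parse Overpass [out:csv(...)] output: tab-separated with a header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["name"]: int(row["total"]) for row in reader}

# Made-up sample in the shape the query above produces.
sample = "name\ttotal\nNorway\t62\nGermany\t140\n"
counts = parse_overpass_csv(sample)
```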
I'm using the "IMPORTXML" function on Google Spreadsheets to get the number of likes and comments on any given YouTube video. However, I can't find the right XPath, and all I've tried return an empty value.
I used ChroPath to extract the XPath of the comments and likes counts, but without success.
This is the XPath that I've been using for amount of comments:
//yt-formatted-string[@class='count-text style-scope ytd-comments-header-renderer']
And this is for amount of likes:
//div[@id='info']//ytd-toggle-button-renderer[1]//a[1]//yt-icon-button[1]
When I try those it just says the content is empty. What is the correct XPath that I should be using to get the number of likes and comments?
You want to retrieve the number of likes of the video on YouTube.
You want to put the value to the Spreadsheet.
How about this formula? Please think of this as just one of several answers.
Sample formula:
=VALUE(IMPORTXML(A1,"//button[@title='I like this']/span"))
In this case, the cell "A1" is the URL like https://www.youtube.com/watch?v=###.
The XPath is //button[@title='I like this']/span.
Sample script of Google Apps Script:
As another method for retrieving the number of likes, if you use Advanced Google services of Google Apps Script, the sample script is as follows.
var count = YouTube.Videos.list("statistics", {id: "###"}).items[0].statistics.likeCount;
### is the video's ID.
References:
IMPORTXML
VALUE
Advanced Google services
Videos: list
If I misunderstood your question and this was not the result you want, I apologize.
For YT likes, you could use:
=IF(ISNA(IMPORTXML("https://www.youtube.com/watch?v=MkgR0SxmMKo","(//*[contains(@class,'like-button-renderer-like-button')])[1]"))=TRUE,0,
IMPORTXML("https://www.youtube.com/watch?v=MkgR0SxmMKo","(//*[contains(@class,'like-button-renderer-like-button')])[1]"))
I am using PRAW to scrape data off of reddit. I am using the .search method to search very specific people. I can easily print the title of the submission if the keyword is in the title, but if the keyword is in the text of the submission nothing pops up. Here is the code I have so far.
import praw

reddit = praw.Reddit(----------)
alls = reddit.subreddit("all")
for submission in alls.search("Yoa ming", sort=comment, limit=5):
    print(submission.title)
When I run this code I get:
Yoa Ming next to Elephant!
Obama's Yoa Ming impression
i used to yoa ming... until i took an arrow to the knee
Could someone make a rage face out of our dearest Yoa Ming? I think it would compliment his first one so well!!!
If you search Yoa Ming on reddit, there are posts that don't contain "Yoa Ming" in the title but do contain it in the text, and those are the posts I want.
Thanks.
You might need to update the version of PRAW you are using. Using v6.3.1 yields the expected outcome and includes submissions that have the keyword in the body and not the title.
Also, the sort=comment parameter should be sort='comments'. Using an invalid value for sort will not throw an error but it will fall back to the default value, which may be why you are seeing different search results between your script and the website.
I have a country list of 245 countries.
Is there any way I can use a VLOOKUP in Google Sheets to import their respective flags?
I was thinking of potentially using a resource such as Wiki or http://www.theodora.com/flags/ but not sure if I can?
Step 1. Get links
A1 = http://www.sciencekids.co.nz/pictures/flags.html
B1 = //@src[contains(.,'flags96')]
A3 = =IMPORTXML(A1,B1)
Step 2. Use the IMAGE function
B3 = =IMAGE(substitute(A3,"..","http://www.sciencekids.co.nz"))
Bonus. Country name:
C1 = ([^/.]+)\.jpg$
C3 = =REGEXEXTRACT(A3,C1)
Update:
After writing this and doing a bit more curious Googling, I found the following APIs:
https://www.countryflags.io/ (for building a country flag url from a country code)
https://restcountries.eu/ (for getting a country code from a name or partial name)
Which allowed me to create this one-liner formula instead:
=IMAGE(CONCATENATE("https://www.countryflags.io/", REGEXEXTRACT(INDEX(IMPORTDATA(CONCAT("https://restcountries.eu/rest/v2/name/", F3)), 1, 3),"""(\w{2})"""), "/flat/64.png"))
(if anyone knows of a better way to import & parse json in Google Sheets - let me know)
Since these are official APIs rather than "sciencekids.co.nz" it would theoretically provide the following benefits:
It's a bit more "proper" to use a purpose-built API than some random website
Maybe slightly more "future proof"
Availability: more likely to be available in the future
Updates/maintenance: more likely to be updated to include new countries and updated flags
But, big downside: it seems to be limited to 64px-wide images (even the originally posted "sciencekids" solution provided 96px-wide images). So if you want higher-quality images, you can adapt the original formula to:
=IMAGE(SUBSTITUTE(SUBSTITUTE(QUERY(IMPORTXML("http://www.sciencekids.co.nz/pictures/flags.html","//@src[contains(.,'flags96')]"),CONCATENATE("SELECT Col1 WHERE Col1 CONTAINS '/", SUBSTITUTE(SUBSTITUTE(A1, " ", "_"), "&", "and") ,".jpg'")),"..","http://www.sciencekids.co.nz"), "flags96", "flags680"))
which provides 680px-wide images on the "sciencekids.co.nz" site. (If anyone finds an API that provides higher-quality images, please let me know. There's got to be one out there)
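On the note above about parsing JSON in Sheets: outside of Sheets, the restcountries step is a one-line lookup. A sketch assuming the v2 /name/ response shape (a JSON list of country objects with an alpha2Code field), with made-up sample data standing in for the live response:

```python
import json

def country_code(api_response_text):
    """Pull the two-letter code from a restcountries /name/ response (a JSON list)."""
    countries = json.loads(api_response_text)
    return countries[0]["alpha2Code"]

# Made-up miniature of the real response body.
sample = '[{"name": "Norway", "alpha2Code": "NO"}]'
```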
Original Post:
To add on to Max's awesome answer, here's the whole thing in a single function:
=IMAGE(SUBSTITUTE(QUERY(IMPORTXML("http://www.sciencekids.co.nz/pictures/flags.html","//@src[contains(.,'flags96')]"),CONCATENATE("SELECT Col1 WHERE Col1 CONTAINS '/", SUBSTITUTE(SUBSTITUTE(A1, " ", "_"), "&", "and") ,".jpg'")),"..","http://www.sciencekids.co.nz"))
(If anyone wants to simplify that a bit, be my guest)
Put this in A2, and put a country name in A1 (e.g. "Turkey" or "Bosnia & Herzegovina"), and it will show a flag for your "search".
Recently converted some Bing Search API v2 code to v5 and it works but I am curious about the behavior of "totalEstimatedMatches". Here's an example to illustrate my question:
A user on our site searches for a particular word. The API query returns 10 results (our page size setting) and totalEstimatedMatches set to 21. We therefore indicate 3 pages of results and let the user page through.
When they get to page 3, totalEstimatedMatches returns 22 rather than 21. Seems odd that with such a small result set it shouldn't already know it's 22, but okay I can live with that. All results are displayed correctly.
Now if the user pages back again from page 3 to page 2, the value of totalEstimatedMatches is 21 again. This strikes me as a little surprising because once the result set has been paged through, the API probably ought to know that there are 22 and not 21 results.
I've been a professional software developer since the 80s, so I get that this is one of those devil-in-the-details issues related to the API design. Apparently it is not caching the exact number of results, or whatever. I just don't remember that kind of behavior in the V2 search API (which I realize was 3rd party code). It was pretty reliable on number of results.
Does this strike anyone besides me as a little bit unexpected?
Turns out this is the reason why the response JSON field totalEstimatedMatches includes the word ...Estimated... and isn't just called totalMatches:
"...search engine index does not support an accurate estimation of total match."
Taken from: News Search API V5 paging results with offset and count
As one might expect, the fewer results you get back, the larger the percentage error you're likely to see in the totalEstimatedMatches value. Similarly, the more complex your query is (for example, running a compound query such as ../search?q=(foo OR bar OR foobar)&..., which is actually 3 searches packed into 1), the more variation this value seems to exhibit.
That said, I've managed to (at least preliminarily) compensate for this by setting the offset == totalEstimatedMatches and creating a simple equivalency-checking function.
Here's a trivial example in python:
while True:
    if original_totalEstimatedMatches < new_totalEstimatedMatches:
        original_totalEstimatedMatches = new_totalEstimatedMatches
        # set_new_offset_and_call_api() is a func that does what it says.
        new_totalEstimatedMatches = set_new_offset_and_call_api()
    else:
        break
Revisiting the API, I've come up with a way to paginate efficiently without having to use the "totalEstimatedMatches" return value:
class ApiWorker(object):
    def __init__(self, q):
        self.q = q
        self.offset = 0
        self.result_hashes = set()
        self.finished = False

    def calc_next_offset(self, resp_urls):
        before_adding = len(self.result_hashes)
        self.result_hashes.update(hash(i) for i in resp_urls)  # <== abuse of set operations.
        after_adding = len(self.result_hashes)
        if after_adding == before_adding:  # <== then we either got a bunch of duplicates or we're getting very few results back.
            self.finished = True
        else:
            self.offset += len(resp_urls)

    def page_through_results(self, *args, **kwargs):
        while not self.finished:
            new_resp_urls = ...<call_logic>...
            self.calc_next_offset(new_resp_urls)
            ...<save logic>...
        print(f'All unique results for q={self.q} have been obtained.')
This^ will stop paginating as soon as a full response of duplicates has been obtained.
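The stop condition can be exercised without calling the API at all: keep a set of URL hashes, advance the offset while a page adds something new, and stop on the first all-duplicate page. A self-contained sketch of the same idea, driven by made-up canned pages:

```python
def paginate(pages):
    """Walk canned result pages, deduplicating by URL hash.
    Returns (unique_count, offset_reached)."""
    seen = set()
    offset = 0
    for page in pages:
        before = len(seen)
        seen.update(hash(u) for u in page)
        if len(seen) == before:  # a full page of duplicates: the index is exhausted
            break
        offset += len(page)
    return len(seen), offset

# Three made-up pages; the third contains only already-seen URLs, so we stop there.
pages = [["a", "b"], ["c", "b"], ["a", "c"]]
```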