First post on Stackoverflow.
I am using the Google API to sort images taken while traveling into organized folders, append tags, and rename files with relevant information. I have my code working well but am not always happy with the results. I want to be able to focus my query results on major tourist attractions such as National Parks, Ski Resorts, Beaches, etc. The problem I am finding is that the "rankby=prominence" setting and the "radius" parameter are not giving satisfactory results. Here is a typical query for Zion National Park.
https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=37.269486111111,-112.948141666667&rankby=prominence&radius=50000&type=natural_feature,tourist_attraction,point_of_interest&keyword=&key=MYAPIKEY
The most prominent result is Springdale, which is the town where you enter the park. Zion National Park is listed much further down in the results. What my code does is take the latitude and longitude extracted from the EXIF data and do a Google API Nearby Search request to find the Place ID for where the photo was taken. It then does another API request for Place Details, using the place_id provided by the previous step, to cut down on the information I need to parse.
https://maps.googleapis.com/maps/api/place/details/json?place_id=ChIJ8R5RCzaNyoARegi3rqVkstk&fields=name,address_component&key=MYAPIKEY
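For context, here is a simplified sketch of that two-step flow (not my exact code; it assumes Java 11+ with java.net.http and the org.json library, and the parsing is reduced to the bare minimum):

    import java.net.URI;
    import java.net.URLEncoder;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;

    import org.json.JSONObject;

    public class PlaceLookup {
        private static final String KEY = "MYAPIKEY";   // placeholder
        private static final HttpClient HTTP = HttpClient.newHttpClient();

        static JSONObject getJson(String url) throws Exception {
            HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
            HttpResponse<String> resp = HTTP.send(req, HttpResponse.BodyHandlers.ofString());
            return new JSONObject(resp.body());
        }

        public static void main(String[] args) throws Exception {
            double lat = 37.269486111111, lon = -112.948141666667;   // from the photo's EXIF data

            // Step 1: nearby search ranked by prominence around the photo location.
            String nearbyUrl = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"
                    + "?location=" + lat + "," + lon
                    + "&rankby=prominence&radius=50000"
                    + "&type=natural_feature,tourist_attraction,point_of_interest"
                    + "&key=" + KEY;
            JSONObject nearby = getJson(nearbyUrl);
            // Take the first (most prominent) result's place_id; assumes at least one result came back.
            String placeId = nearby.getJSONArray("results").getJSONObject(0).getString("place_id");

            // Step 2: place details, restricted to the fields I actually need.
            String detailsUrl = "https://maps.googleapis.com/maps/api/place/details/json"
                    + "?place_id=" + URLEncoder.encode(placeId, StandardCharsets.UTF_8)
                    + "&fields=name,address_component"
                    + "&key=" + KEY;
            JSONObject details = getJson(detailsUrl);
            System.out.println("Folder name candidate: " + details.getJSONObject("result").getString("name"));
        }
    }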
I can force the nearby search to return a national park by searching against "National Park" in the keyword parameter, but that limits my project to only being able to provide National Park results, since the keyword field can only accept one string.
I would like part of my query to be able to return the most prominent tourist attraction at the general level, i.e. Zion National Park, Yosemite National Park, etc., so I can sort images into folders by the general name, and another part of the query to provide the exact location, i.e. that I am on this trail or at this lookout. The problem is that the Google API sees these specific locations (trails, lookouts) as tourist attractions, parks, establishments, etc. as well, so it chooses those first.
What I need help with is trying to figure out whether there is a better way to structure my query to return the high-level name of the major park. From my understanding, the type field only searches on the first type even if there are more in the list, and the keyword field can only accept one string as well, making it impossible for one phrase to capture all major destinations at a high level.
Perhaps it needs to be done with more queries but I am trying to limit the number of queries to stay inside the free quota. Maybe it will just take a long time to fully sort my files.
I have read through and implemented the Google API structure as documented. I'm hoping someone can provide a more detailed query structure or a method to parse out truly prominent locations, rather than relying on Google's interpretation of prominence, which can be affected by user ratings, etc. and is not always accurate.
I am building a system that uses Elasticsearch to store and retrieve library catalogue data. One thing I've been asked for is a browse interface.
Here's a definition of what this is:
The user does a search, for example "Author starts with", and they supply "Smith".
The system puts them into the middle of a list of authors, at or near the position of the first one that starts with "Smith", so they might see:
Smart, Murray
Smart, Murray J.
Smeaton, Duncan
Smieliauskas, Wally
Smillie, John
Smith Milway, Katie <-- this being the first actual search result
Smith, A. M. C.
Smith, Andrew
Smith, Andrew M. C.
etc.
The one with the marker is the one actually searched for, but you can see the ones around it according to the sort order, including ones that don't actually match the query.
These will be paged, with ~20 or so results per page. If the user pages back, they head towards the start of the alphabet; if they page forwards, they go onward.
Each result shown will have a count beside it showing how many results (i.e. catalogue items) are associated with that author.
Clicking on a result takes you to everything by that author (this and everything beyond it is fairly easy and mostly implemented already.)
I'm wondering if anyone has any good ideas on how to approach this. At this stage, I don't care too much about handling searches that aren't "field starts with" searches, as exactly how that will be done is currently up in the air and I'll deal with it when the time comes.
Here's what I'm thinking, but there are serious issues with it:
All the fields that are going to be browsed are faceted
I get a list of all the facets for that field, search through it to find the starting point, and handle the paging manually in code.
This has the big problem that I might be fetching hundreds of thousands of terms and processing them, which won't be quick.
In retrospect, it's no different from loading all the values into their own index and fetching them all in sorted order.
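To make it concrete, here is a rough sketch of that approach (a recent Java with java.net.http and org.json is assumed; the index name, field name and aggregation size are placeholders, and the exact terms-aggregation syntax differs between ES versions):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.ArrayList;
    import java.util.List;

    import org.json.JSONArray;
    import org.json.JSONObject;

    public class AuthorBrowse {
        public static void main(String[] args) throws Exception {
            // Terms aggregation over the not_analyzed/keyword author field, sorted by term.
            // The size is a placeholder large enough to fetch "everything", which is
            // exactly the part I'm worried about.
            String body = """
                {
                  "size": 0,
                  "aggs": {
                    "authors": {
                      "terms": { "field": "author.raw", "size": 500000, "order": { "_key": "asc" } }
                    }
                  }
                }
                """;

            HttpClient http = HttpClient.newHttpClient();
            HttpRequest req = HttpRequest.newBuilder(URI.create("http://localhost:9200/catalogue/_search"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();
            JSONObject resp = new JSONObject(http.send(req, HttpResponse.BodyHandlers.ofString()).body());

            // Flatten the buckets into (author, count) pairs; assumes they come back
            // in the order we want to display.
            JSONArray buckets = resp.getJSONObject("aggregations")
                    .getJSONObject("authors")
                    .getJSONArray("buckets");
            List<String> authors = new ArrayList<>();
            List<Long> counts = new ArrayList<>();
            for (int i = 0; i < buckets.length(); i++) {
                JSONObject b = buckets.getJSONObject(i);
                authors.add(b.getString("key"));
                counts.add(b.getLong("doc_count"));
            }

            // Find the first entry at or after "smith" (ignoring case for simplicity)
            // and show a window around it; "page back"/"page forward" would just move
            // this window in application code.
            String query = "smith";
            int start = 0;
            while (start < authors.size() && authors.get(start).compareToIgnoreCase(query) < 0) {
                start++;
            }
            int pageSize = 20;
            int from = Math.max(0, start - pageSize / 4);   // a little context above the match
            for (int i = from; i < Math.min(authors.size(), from + pageSize); i++) {
                String marker = (i == start) ? " <-- first match" : "";
                System.out.println(authors.get(i) + " (" + counts.get(i) + ")" + marker);
            }
        }
    }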
I'm open to any options here, whether I can somehow jump into the middle of a large set of facets like the query "from" field, or if I should instead put everything into another index specifically for this purpose (though I don't know how I'd structure and query it), or something else.
From what I can see, my ideal solution would be that I can specify the facet field, tell ES that I want to start at the one that starts with "Smith", and it displays from around there, then I have the ability to say "go 20 back", but I'm not sure that this is possible.
You can see an example of the sort of thing I'm talking about in action here: http://hollisclassic.harvard.edu/ - put in Smith as "Author (last name first)", and it gives you a (terribly ugly looking) browse list.
Any thoughts?
On:
The one with the marker is the one actually searched for, but you can see the ones around it according to the sort order, including ones that don't actually match the query.
I had a similar requirement: "Show the user how many records we would have found if the search-conditions were more relaxed".
I solved this by doing two searches (one exact, one more relaxed), as the performance of ES is so good that doing one or two searches does not matter. The time gets eaten up in the displaying (in my case) and not in the search.
Still, you would need to merge these two result sets in your application to generate one list to display.
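Roughly like this (a sketch only, not my production code; the index, field names and the "relaxed" query are made up for illustration, and it assumes java.net.http plus org.json):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.LinkedHashMap;
    import java.util.Map;

    import org.json.JSONArray;
    import org.json.JSONObject;

    public class TwoSearches {
        static JSONArray search(HttpClient http, String body) throws Exception {
            HttpRequest req = HttpRequest.newBuilder(URI.create("http://localhost:9200/catalogue/_search"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();
            JSONObject resp = new JSONObject(http.send(req, HttpResponse.BodyHandlers.ofString()).body());
            return resp.getJSONObject("hits").getJSONArray("hits");
        }

        public static void main(String[] args) throws Exception {
            HttpClient http = HttpClient.newHttpClient();

            // Exact search: what the user literally asked for.
            String exact = """
                { "query": { "match_phrase": { "author": "smith" } }, "size": 20 }
                """;
            // Relaxed search: the same field, but fuzzier, to show what looser
            // conditions would have found.
            String relaxed = """
                { "query": { "match": { "author": { "query": "smith", "fuzziness": "AUTO" } } }, "size": 20 }
                """;

            // Merge the two result sets in the application: exact hits first,
            // relaxed hits appended only if they are not already in the list.
            Map<String, JSONObject> merged = new LinkedHashMap<>();
            for (JSONArray page : new JSONArray[]{search(http, exact), search(http, relaxed)}) {
                for (int i = 0; i < page.length(); i++) {
                    JSONObject hit = page.getJSONObject(i);
                    merged.putIfAbsent(hit.getString("_id"), hit);
                }
            }
            merged.values().forEach(hit -> System.out.println(hit.getJSONObject("_source")));
        }
    }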
I'm curious to know how Market Samurai, Long Tail Pro and other software handle retrieving the top 10 Google search results without running into limits. It appears that these software packages use the user's own Google account. Google Custom Search limits users to 100 queries per day (the free limit), but people tend to do keyword research on hundreds or even thousands of keywords per day and don't pay any additional amounts to Google.
Are they paying extra for this service, are they using a different API (perhaps the Adwords API?) or are they scraping the Google search results page (violation of TOS)? Really would like to know! Thanks.
I have done this in one of my projects (in Java).
It is very simple: in Java there is a library called JSoup. Using this library you can send a GET request to Google, for example:
https://www.google.co.in/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=<your url encoded search term>
This will return the HTML of the Google search results for your term.
Using JSoup you can find a specific HTML tag with a specific class or id. This lets you extract the URL, title and description of each result from the Google search results.
For a working example check here; in that example you can extract Google search result links with a custom search term.
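Something like this (just a sketch; the CSS selector is my guess and Google changes its result markup often, so inspect the page and adjust it):

    import java.io.IOException;
    import java.net.URLEncoder;
    import java.nio.charset.StandardCharsets;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    public class GoogleScrape {
        public static void main(String[] args) throws IOException {
            String term = URLEncoder.encode("jsoup html parser", StandardCharsets.UTF_8);

            // Fetch the normal search results page; a browser-like User-Agent is needed,
            // otherwise Google serves a very different page.
            Document doc = Jsoup.connect("https://www.google.com/search?q=" + term)
                    .userAgent("Mozilla/5.0")
                    .get();

            // Guess at the markup: result titles sit in <h3> headings wrapped by the link.
            for (Element heading : doc.select("h3")) {
                Element link = heading.parent();
                if (link != null && "a".equals(link.tagName())) {
                    System.out.println(heading.text() + " -> " + link.attr("href"));
                }
            }
        }
    }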
I hope this helps.
I am looking up geocodes with the Google Geocoding API:
http://maps.googleapis.com/maps/api/geocode/json?address=london%2C+UK&sensor=false
The problem is that the input isn't very accurate (especially the street), and sometimes Google mixes things up and ignores the UK, because the street has a perfect match (as street and city) somewhere else, e.g. in the US.
Now I cannot solve this issue (input data), but I am wondering if there is a parameter which forces Google to search in the UK and return no result instead of a completely wrong result.
You can add component filters in the URL to constrain results. In this case you can use:
http://maps.googleapis.com/maps/api/geocode/json?address=london%2C+UK&components=country:UK&sensor=false
For more information about how to use component filtering see:
https://developers.google.com/maps/documentation/geocoding/intro#ComponentFiltering
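A quick sketch of that request in Java (java.net.http and org.json assumed; note that current versions of the Geocoding API also require a key, which is a placeholder here):

    import java.net.URI;
    import java.net.URLEncoder;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;

    import org.json.JSONObject;

    public class GeocodeUk {
        public static void main(String[] args) throws Exception {
            String address = URLEncoder.encode("london, UK", StandardCharsets.UTF_8);
            String url = "https://maps.googleapis.com/maps/api/geocode/json"
                    + "?address=" + address
                    + "&components=country:UK"   // restrict matching to the UK
                    + "&key=YOUR_KEY";           // placeholder; newer API versions require a key

            HttpClient http = HttpClient.newHttpClient();
            HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
            JSONObject resp = new JSONObject(http.send(req, HttpResponse.BodyHandlers.ofString()).body());

            if ("ZERO_RESULTS".equals(resp.getString("status"))) {
                // No match inside the UK - better than a confident wrong match elsewhere.
                System.out.println("No result in the UK");
            } else {
                JSONObject location = resp.getJSONArray("results").getJSONObject(0)
                        .getJSONObject("geometry").getJSONObject("location");
                System.out.println(location.getDouble("lat") + ", " + location.getDouble("lng"));
            }
        }
    }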
I'm doing a part-of-speech & morphological analysis project for Japanese sentences. Each sentence will have its own webpage. To make this page more visual, I want to show one picture which is somehow related to the sentence. For example, for the sentence "私は学生です" ("I'm a student"), relevant pictures would be pictures of a school, a Japanese textbook, students, etc. What I have: part-of-speech tagging for every word. My approach now: use 2-3 nouns from every sentence and retrieve the first image from the search results using the Bing Images API. Note: all the sentence processing up to this point was done in Java.
Have a couple of questions though:
1) Which is better (richer corpus & more powerful search) for searching nouns in Japanese: the Google Images API, the Bing Images API, the Flickr API, etc.?
2) How do you select the most important noun from the sentence to use as the image search query, without doing complicated topic modeling, etc.?
Thanks!
Japanese WordNet has links to OpenClipart pictures. That could be another relevant source. They describe it in their paper called "Enhancing the Japanese WordNet".
I thought you would start by choosing any noun before は、が and を and giving these priority - probably in that order.
But that assumes that your part-of-speech tagging is good enough to get は=subject identified properly (as I guess you know that は is not always the subject marker).
I looked at a bunch of sample sentences here with this technique in mind and found it worked as well as could be expected, except where none of those particles are used, which is fairly rare.
And in sentences like the one below, you'd have to consider maybe looking for で and the noun before it in the case where there is no を or は, because the word 人 (people) really doesn't tell you anything about what's being said. Without parsing the context properly, you don't even know whether that noun means a person or people.
毎年 交通事故で 多くの人が 死にます
(many people die in traffic accidents every year)
But basically, couldn't you implement a priority/fallback type system like this?
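Something along these lines, perhaps (a toy sketch in recent Java; it assumes you already have the sentence as a list of (surface, part-of-speech) tokens from your tagger, and the Token type is made up):

    import java.util.List;

    public class KeyNounPicker {

        record Token(String surface, String pos) {}   // pos e.g. "noun", "particle", "verb"

        // Particles to look for, in priority order: は, が, を, then で as a fallback.
        private static final String[] PARTICLES = {"は", "が", "を", "で"};

        static String pickKeyNoun(List<Token> tokens) {
            for (String particle : PARTICLES) {
                for (int i = 1; i < tokens.size(); i++) {
                    if (particle.equals(tokens.get(i).surface())
                            && "noun".equals(tokens.get(i - 1).pos())) {
                        return tokens.get(i - 1).surface();   // noun directly before the particle
                    }
                }
            }
            return null;   // no particle-marked noun found (the rare case mentioned above)
        }

        public static void main(String[] args) {
            // 毎年 交通事故で 多くの人が 死にます
            List<Token> sentence = List.of(
                    new Token("毎年", "noun"),
                    new Token("交通事故", "noun"),
                    new Token("で", "particle"),
                    new Token("多く", "noun"),
                    new Token("の", "particle"),
                    new Token("人", "noun"),
                    new Token("が", "particle"),
                    new Token("死にます", "verb"));
            // Prints 人, which illustrates exactly the problem described above -
            // the priority order alone isn't enough for sentences like this one.
            System.out.println(pickKeyNoun(sentence));
        }
    }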
BTW I hope your sentences all use kanji, or when you see はし (in one of the sentences linked to) you won't know whether to show a bridge or chopsticks - and showing the wrong one will probably not be good.
I am in the middle of designing a web form for German and French users. Within this form, the users would have to type street names several times.
I want to minimize the annoyance to the user, and offer autocomplete feature based on common French and German street names.
Any idea where I can find a royalty-free list?
Would your users have to type the same street name multiple times? Because you could easily prevent this by coding something that prefilled the fields.
Another option could be to use your user database as a resource. Query it for all the available street names entered by your existing users and use that to generate suggestions.
Of course this would only work if you have a considerable number of users.
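For example, something like this (a sketch with plain JDBC; the table and column names and the connection URL are made up):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.util.ArrayList;
    import java.util.List;

    public class StreetSuggestions {
        // Returns up to 10 distinct street names already entered by users
        // that start with the given prefix.
        static List<String> suggest(Connection db, String prefix) throws Exception {
            String sql = "SELECT DISTINCT street FROM user_addresses "
                       + "WHERE street LIKE ? ORDER BY street LIMIT 10";
            try (PreparedStatement ps = db.prepareStatement(sql)) {
                ps.setString(1, prefix + "%");
                List<String> result = new ArrayList<>();
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        result.add(rs.getString(1));
                    }
                }
                return result;
            }
        }

        public static void main(String[] args) throws Exception {
            try (Connection db = DriverManager.getConnection("jdbc:postgresql://localhost/app", "user", "pass")) {
                suggest(db, "Haupt").forEach(System.out::println);   // e.g. Hauptstraße, ...
            }
        }
    }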
[EDIT] You could have a look at OpenStreetMap with their Planet.osm dumps (or have a look here for a dump containing data for just Europe). That is basically the OSM database with all the map information they have, including street names. It's all in an XML format and streets seem to be stored as Ways. There are tools (e.g. Osmosis) to extract the data and put it into a database, or you could write something to plough through the data and filter out the street names for your database.
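If you do write something yourself for the raw OSM XML, a rough sketch could look like this (streaming with StAX, since the dumps are huge; it simply collects the name tag of every way that also carries a highway tag):

    import java.io.FileInputStream;
    import java.util.Set;
    import java.util.TreeSet;

    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;

    public class OsmStreetNames {
        public static void main(String[] args) throws Exception {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader xml = factory.createXMLStreamReader(new FileInputStream(args[0]));

            Set<String> streets = new TreeSet<>();
            boolean inWay = false, isHighway = false;
            String name = null;

            while (xml.hasNext()) {
                int event = xml.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String tag = xml.getLocalName();
                    if ("way".equals(tag)) {
                        inWay = true;
                        isHighway = false;
                        name = null;
                    } else if (inWay && "tag".equals(tag)) {
                        String k = xml.getAttributeValue(null, "k");
                        String v = xml.getAttributeValue(null, "v");
                        if ("highway".equals(k)) isHighway = true;   // the way is a road
                        if ("name".equals(k)) name = v;              // its (street) name
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT && "way".equals(xml.getLocalName())) {
                    if (inWay && isHighway && name != null) streets.add(name);
                    inWay = false;
                }
            }
            streets.forEach(System.out::println);
        }
    }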
Start with http://en.wikipedia.org/wiki/Category:Streets_in_Germany and http://en.wikipedia.org/wiki/Category:Streets_in_France. You may want to verify the Wikipedia copyright isn't more protective than would be suitable for your needs.
Edit (merged from my own comment): Of course, to answer the "programmatically" part of your question: figure out how to spider and scrape those Wikipedia category pages. The polite thing to do would be to cache it, rather than hitting it every time you need to get the street list; refreshing once every month or so should be sufficient, since the information is unlikely to change significantly.
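A rough sketch of the spider-and-cache part with JSoup (the "#mw-pages li a" selector is my guess at the category page markup, and following the category's "next page" links is left out for brevity):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.time.Duration;
    import java.time.Instant;
    import java.util.ArrayList;
    import java.util.List;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Element;

    public class WikipediaStreets {
        // Cache the scraped list locally and refresh at most once a month,
        // so Wikipedia only gets hit when the cache is stale.
        static List<String> streetNames(String categoryUrl, Path cache) throws IOException {
            if (Files.exists(cache)
                    && Files.getLastModifiedTime(cache).toInstant()
                            .isAfter(Instant.now().minus(Duration.ofDays(30)))) {
                return Files.readAllLines(cache);
            }

            List<String> names = new ArrayList<>();
            // "#mw-pages li a" is a guess at the "Pages in category" section markup;
            // adjust if Wikipedia's layout differs.
            for (Element link : Jsoup.connect(categoryUrl)
                    .userAgent("street-name-collector/0.1 (contact: you@example.org)")
                    .get()
                    .select("#mw-pages li a")) {
                names.add(link.text());
            }
            Files.write(cache, names);
            return names;
        }

        public static void main(String[] args) throws IOException {
            List<String> german = streetNames(
                    "http://en.wikipedia.org/wiki/Category:Streets_in_Germany",
                    Path.of("streets_de.txt"));
            german.forEach(System.out::println);
        }
    }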
You could start by pulling names via the Google API (just find, e.g., the lat/long outer bounds of Paris and go to the center), but since Google limits API use, it would probably take a very long time to do it.
I had once contacted City of Bratislava about the street names list and they sent it to me as XLS. Maybe you could try doing that for your preferred cities.
I like Tom van Enckevort's suggestion, but I would be a little more specific than just looking inside the Planet.osm links, because most of them require the use of some tool to deal with the supported formats (PBF, OSM XML, etc.).
In fact, take a look at the following link
http://download.gisgraphy.com/openstreetmap/
The files there are all in .txt format and if it's only the street names that you want to use, just extract the second field (name) and you are done.
As an FYI, I didn't have any use for the French files in my project, but mining the German files resulted (after normalization) in a little more than 380K unique entries (~6 MB in size).
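A quick sketch of that extraction (it assumes the files are tab-separated with the street name in the second column, as described above):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Set;
    import java.util.TreeSet;
    import java.util.stream.Stream;

    public class GisgraphyStreets {
        public static void main(String[] args) throws IOException {
            Set<String> names = new TreeSet<>();
            // Each line is one street; split on tabs and keep the second field (the name).
            try (Stream<String> lines = Files.lines(Path.of(args[0]))) {
                lines.forEach(line -> {
                    String[] fields = line.split("\t");
                    if (fields.length > 1 && !fields[1].isBlank()) {
                        names.add(fields[1]);
                    }
                });
            }
            System.out.println(names.size() + " unique street names");
            Files.write(Path.of("street_names.txt"), names);
        }
    }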
#dusoft might be onto something - maybe someone at a government level can help? I don't think that a simple list of street names can be copyrighted, or that royalties can be charged for it. If that is the case, maybe you could even scrape some mapping data from something like TomTom?
The "Deutsche Post" offers a list with all street names in Germany:
http://www.deutschepost.de/dpag?xmlFile=link1015590_3877
They don't mention the price, but I reckon it's not for free.