Possible algorithms to solve this problem - algorithm

I have a list of names extracted for one hotel. The names were taken from n websites describing the same hotel, so the list contains m names for that single hotel. I have to select one name from the list based on correctness, similarity, and fewest mistakes. How can I achieve this?
Any direction is helpful.
Example: list of names for hotelId 1: {"ABC Hotel", "CDE hotel", "Hotel ABC", "AB Hotel", "Hotel BCA", ...}
From initial research, it looks like a graph-related problem.

This is not gonna work. You will not get reliable similarities based on the names alone, especially if almost every hotel has the keyword "hotel" in its name.
You need more information to match similarities.
Address, geolocation, and attributes of the hotel (wifi, parking, close to beach, pool) could also help, as could knowing whether it belongs to a chain, and so on. The more information you have, the better the matching result you can get.

You can try to leverage some of the Bing or Google APIs: search for the hotel name plus some details from the address in the Search APIs or in some Maps APIs (e.g. search for ["ABC Hotel 5AV Philadelphia", "CDE hotel 5AV Philadelphia", "Hotel ABC 5AV Philadelphia", ...]), then compare your data with the API response.
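As a rough illustration of the "compare your data with the API response" step, here is a minimal Python sketch. It assumes you already have one reference name returned by a search or maps API (the API call itself is not shown), and the helper names `normalize` and `closest_name` are made up for this example. It uses the standard library's `difflib` and ignores word order and case, which is only one of several possible similarity measures.

```python
from difflib import SequenceMatcher

def normalize(name):
    # Lowercase and sort the words so "Hotel ABC" and "ABC Hotel" compare equal.
    return " ".join(sorted(name.lower().split()))

def closest_name(candidates, reference):
    # Return the candidate whose normalized form is most similar to the reference.
    # On ties, max() keeps the first candidate in list order.
    ref = normalize(reference)
    return max(
        candidates,
        key=lambda c: SequenceMatcher(None, normalize(c), ref).ratio(),
    )

candidates = ["ABC Hotel", "CDE hotel", "Hotel ABC", "AB Hotel", "Hotel BCA"]
# "ABC Hotel" stands in for the name returned by the API (an assumption).
best = closest_name(candidates, "ABC Hotel")
print(best)
```

In practice you would probably combine this name score with the address or geolocation comparison mentioned above rather than rely on it alone.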

Related

Number of restaurants with specific cuisine in each country

I am trying to figure out how many restaurants of a specific cuisine (seafood) there are in each country. I have looked at the Google Places API and the TripAdvisor API, but cannot find these numbers. I don't need the list of restaurants, only the number of restaurants. I found OpenStreetMap, which looked very promising. I downloaded the data for Norway, but the numbers are not correct: (osmium tags-filter norway-latest.osm.pbf cuisine=seafood) = 62, which is way too low.
Any suggestion for how and where I can find what I am looking for?
Extrapolate.
You won't get an accurate answer. How do you even define what a seafood restaurant is?
Find out roughly how many restaurants there are in the area you are interested in, and then decide what percentage of them might be seafood restaurants.
You can use this approach to extract the data from OpenStreetMap:
https://gis.stackexchange.com/questions/363474/aggregate-number-of-features-by-country-in-overpass
You can run the query on http://overpass-turbo.eu/ (go to settings and choose the kumi-systems server).
The query could look like this:
// Define fields for CSV output
[out:csv(name, total)][timeout:2500];
// All countries
area["admin_level"=2];
// Count in each area
foreach->.regio(
  // Collect all nodes, ways and relations with cuisine=seafood in the current area
  ( node(area.regio)[cuisine=seafood];
    way(area.regio)[cuisine=seafood];
    rel(area.regio)[cuisine=seafood];);
  // Assemble the output
  make count name = regio.set(t["name:en"]),
             total = count(nodes) + count(ways) + count(relations);
  out;
);
This query can take a long time (at the time of writing, mine had not finished yet).
You can also run the query via curl on some server and have the results mailed to you via curl ....... | mail -s "Overpass Result" yourmail@example.com. You can get the curl command from the browser's network tab with "Copy as cURL".
I also considered Taginfo (https://taginfo.openstreetmap.org/tags/cuisine=seafood), but it cannot filter by country.
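If the all-countries query is too slow, one alternative is to ask for one country at a time. The Python sketch below is only an illustration under stated assumptions: it is a narrower query than the one above, it assumes the country area carries an "ISO3166-1" tag, it assumes the public endpoint https://overpass-api.de/api/interpreter is reachable, and the function names are invented for this example. It uses only the standard library.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a public Overpass endpoint; swap in the server you actually use.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"

def build_count_query(iso_code, tag_key="cuisine", tag_value="seafood"):
    """Build an Overpass QL query that counts tagged objects in one country."""
    return (
        "[out:json][timeout:180];\n"
        f'area["ISO3166-1"="{iso_code}"]["admin_level"="2"]->.country;\n'
        f'nwr["{tag_key}"="{tag_value}"](area.country);\n'
        "out count;"
    )

def run_query(query, url=OVERPASS_URL):
    """POST the query and return the decoded JSON response (needs network access)."""
    data = urllib.parse.urlencode({"data": query}).encode("utf-8")
    with urllib.request.urlopen(url, data=data, timeout=200) as response:
        return json.load(response)

# Build the query for Norway; call run_query(query) to actually send it.
query = build_count_query("NO")
print(query)
```

Looping over a list of country codes and pausing between requests keeps each query small, at the cost of many round trips. The count still only reflects what mappers have tagged, so the caveat about accuracy above applies.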

geotext library is not picking up the correct name of cities in python

Hi, I am a newbie in Python. We are trying to find country and city names using the geotext library, but it is not picking up every name correctly. Could anyone please suggest what could be wrong?
While reading the data from an email, it picks up "Mobile" as a city, although that word comes from the signature of the email.
from geotext import GeoText
# Triple quotes are needed because the text spans several lines
places = GeoText("""Hi , We need to book a flight from Mumbai to London on 13 Aug through shivaji terminal.
Regards,
xyz
Mobile : 5368536
""")
print(places.cities)
Output: ['Mumbai', 'Mobile']
Please help.
There are three cities named 'Mobile' in various states of the US. You cannot avoid picking it up (unless you decide to block that specific word from being treated as a city, but there could easily be other cities with names that match common words).
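The blocking idea mentioned above could look roughly like the following Python sketch. The blocklist contents and the function name `filter_cities` are assumptions made for this example. The function works on a plain list of names, such as the value of `GeoText(text).cities`, so it does not depend on geotext itself.

```python
# Hypothetical blocklist: words that are valid city names but are usually
# ordinary words in email text. Tune this for your own data.
BLOCKLIST = {"mobile"}

def filter_cities(cities, blocklist=BLOCKLIST):
    """Drop detected city names that appear in the blocklist (case-insensitive)."""
    return [city for city in cities if city.lower() not in blocklist]

# Example input, standing in for the list returned by GeoText(text).cities.
detected = ["Mumbai", "London", "Mobile"]
kept = filter_cities(detected)
print(kept)
```

The obvious cost is that a genuine reference to the city of Mobile would be dropped too, so the blocklist should stay as short as your data allows.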

Google Place API street type list

I am using Google Places to retrieve addresses, and we want the street ("route" in Google terminology) to be separated into street name and street type. We also want the street type to match an existing column in our database.
But things get difficult because Google Places sometimes uses "XXXX Street" and sometimes "XXXX St".
For instance, this is a typical Google address:
{
administrative_area_level_1: ['short_name', 'VIC'],
locality: ['long_name', 'Carlton'],
postal_code: ['long_name', '3053'],
route: ['long_name', 'Canada Ln'],
street_number: ['short_name', '12'],
subpremise: ['short_name', '13']
}
But it always shows "Canada Lane" in the suggestion box.
It gets even worse when the abbreviation does not match my local data model. For instance, we use "la" instead of "ln" as the abbreviation for "lane".
I would appreciate it if anyone could tell me where to find a list of the street types (and abbreviations) used by the Google API. Or is there a way to disable the abbreviation option?
Sounds like you're after "street suffixes". These are complicated.
Not only do they change across countries and languages; even within the same country and language they can be used in different ways. Abbreviations can have multiple meanings ("St" can be "Street" or "Saint"), and whether an abbreviation is used at all depends on subtle rules that also change from place to place.
The same goes for cardinal points (North, South, East, West) that are part of road and street names: "North St" or "N 11th Street"? It's complicated.
If you already have a good number of addresses, and you only care about addresses in English, you could take the last word of each street name as the suffix. When matching against your own data, allow for abbreviations rather than trying to expand them.
For instance, don't try to expand "Canada La" into "Canada Lane" so that it matches "Lane". Instead, expand "Lane" into ["Lane", "La", "Ln"] and match suffixes against all of those values.
Then you'd need a strategy for "collisions", that is, abbreviations that can mean two or more suffixes. These seem to be rare: I can't remember any ("St" isn't one, because "Saint" isn't a suffix), and the USPS list at http://pe.usps.gov/text/pub28/28apc_002.htm doesn't seem to have any.
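The "expand the canonical type, then match" approach described above can be sketched in Python as follows. The table `SUFFIX_VARIANTS` is a tiny hypothetical sample, not Google's or USPS's list, and `split_route` is a name invented for this example. It only handles English-style routes where the street type is the last word.

```python
# Hypothetical table: each canonical street type maps to the spellings
# accepted for it. Fill it from your own database column.
SUFFIX_VARIANTS = {
    "Lane": {"lane", "la", "ln"},
    "Street": {"street", "st"},
}

def split_route(route):
    """Split a route into (street name, canonical street type or None)."""
    route = route.strip()
    name, _, last = route.rpartition(" ")
    token = last.lower().rstrip(".")
    for canonical, variants in SUFFIX_VARIANTS.items():
        if token in variants:
            return name, canonical
    # The last word is not a known street type: keep the route whole.
    return route, None

print(split_route("Canada Ln"))
print(split_route("Canada La"))
```

Because the lookup goes from the canonical type to its variants, adding a new abbreviation only means adding one entry to a set. A collision would show up as the same token appearing in two sets, which you could check for when building the table.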

I want to get nodes of parseTree

This is part of my code:
String sentence = "The system Does Not Require users to identify themselves to search for books according to certain criteria and to check the availability of a particular book. However to check out books, to check their respective book loan status, and to place holds on books that are already on loan, users must first identify themselves to the system.";
Parse[] topParses = ParserTool.parseLine(sentence, parser, /* numParses= */ 3);
for (Parse parseTree : topParses) {
    parseTree.show();
}
How can I get the verbs in the sentence? Please!
I mean, how can I get the nodes of the tree?
If you only need the verbs from the sentence, then the POS tagger in OpenNLP is sufficient. All you have to do is use an OpenNLP tokenizer to get the tokens in an array and feed that array to POSTaggerME. It will give you the corresponding POS tags. Then you can filter by the verb tags, such as VB, VBZ, etc.
If you are looking for verb phrases, use the Chunker; if you need just the verbs, use the POS tagger.
Check out this answer: How to extract the noun phrases using Open nlp's chunking parser
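The filtering step after tagging is independent of OpenNLP, so it can be shown in a few lines of Python. This sketch assumes you already have parallel arrays of tokens and Penn Treebank tags (which is what a tokenizer followed by POSTaggerME would give you in Java). The tags below are written by hand for illustration and are not real tagger output, and `extract_verbs` is a name made up for this example.

```python
def extract_verbs(tokens, tags):
    """Return tokens whose Penn Treebank tag starts with "VB".

    That prefix covers VB, VBD, VBG, VBN, VBP and VBZ.
    """
    return [tok for tok, tag in zip(tokens, tags) if tag.startswith("VB")]

tokens = ["users", "must", "first", "identify", "themselves"]
# Hand-written tags for illustration only (an assumption, not tagger output).
tags = ["NNS", "MD", "RB", "VB", "PRP"]
verbs = extract_verbs(tokens, tags)
print(verbs)
```

Note that modal verbs such as "must" are tagged MD in the Penn Treebank scheme, so a prefix filter on "VB" leaves them out. Add "MD" to the filter if you want them.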

yahoo weather country and city list

When I need to get the weather for some city, I use this link:
http://weather.yahooapis.com/forecastrss?w=713169
But how do I get the codes (IDs) for all countries and cities?
Maybe there is some XML or RSS feed?
Thanks
See this: forecastrss?w=713169
This w parameter is your WOEID. WOEID was invented by Yahoo, I think, and stands for Where On Earth ID. So do a Google search for "WOEID list" and you'll find the answer.

Resources