Unified IDs of geographical locations - location

I want to use locations' titles in my app, like 'Chicago, Illinois, USA', or 'Surrey, British Columbia, Canada', or one of Springfields.
I am going to add them to the DB one by one during the app lifecycle, no need to add all at once, and think that it would be nice to identify them all with unique IDs. I could just go from 1 to n, as a key.
But for future potential flexibility I could use some criteria to make sure I will get that very Springfield when I decode and enter its ID somewhere, like Google.
May be I can use lat/lon data from public sources, e.g. Wikipedia and turn the pair into a key? Or may be there are already some IDs assigned by authorities or some agency that are kind of a standard?

One possibility is to use a GeoHash of the location. This would give you a unique code for each well positioned location you are using. An added bonus is that it would allow you to determine how close they were to each other too.

Related

HL7 FHIR mark resources as anonymized

I am trying to map an existing domain into HL7 FHIR.
So far it was pretty easy to find FHIR resources that more or less represent the same data and can be used for that purpose. But now I am running into a problem of which I am not sure how to solve it.
The existing domain allows that data can be anonymized depending on the users access level. e.g. a patient's name or address might be removed and marked as anonymized. Other data will be pseudonymised, for example a the birthdate in 1980 will be replaced with 01.01.1980. An Age of 37 will be replaced with a category of 30-40.
So I am unsure how to integrate that into the FHIR domain. I was thinking I could create an extension holding a boolean, indicating if a value was anonymized or not and always replace or remove the original value. This might work, but I will run into big problems when the anonymized value is of a different type than the original value (e.g. Age is replaced by a range of values)
Is that even a valid approach? I thought this might be common problem, but I could not find any examples where people described methods of how to mark data as altered. Unfortunately the documentation at http://build.fhir.org/extensibility-registry.html does not contain anything that would help my case.
You can use security labels for this purpose (Resource.meta.security). Take a look at REDACTED and SUBSETTED in the security label value set: https://www.hl7.org/fhir/valueset-security-labels.html
If you need to convey a data type other than the one allowed by the resource (e.g. wanting to convey a range rather than a birthdate), you'd need to use an extension. (Note that dates are valid even if you only include the year.)

find a single specific place with google places api

I was wondering if it is possible to find a specific place using the google places api.
I know the name of the place, the address or website url, and the coordinates.
I need this to get the ratings this place has.
Is this possible? If not, is it going to be?
I think your best bet would be to do a nearbysearch with the location (lat,long), a small radius, and the name and types parameters to narrow it down. If you are targeting a specific place, then you can just manually find it in the results and use its reference for a Details request in your solution.
If the target place can be dynamic, for example based on user input, then you might want to show the user the list of results and let them choose the correct one. I don't think there's a way to guarantee that you will always get exactly the result you're looking for as, say, the first result in the list. Experiment with different types of requests and parameters and try to get a sense for the behaviour of the responses to find what will work best for your solution.

Build a website: should I use a number or random unique string as ID in URLs?

Hi I am building an Internet website with Java and Spring framework. I believe my question is not technology or framework related.
I need to have links in user interface so that visitors can click and to see records. These links have the format of
http://mysite.com?id=number-id-or-random-unique-string
Not all records are allowed to view. For the ID parameter in the URL, I could use the database-generated number as the ID value and so I do not need to have additional programming. Or I could use unique random string (for example: jcTDjhdDUls) as the ID value (I have to program this part). Numbers allow curious people (with good or bad intentions) to EASILY guess and try other IDs. Unique random strings seems better in this regard.
However, no matter numbers or strings as the value for the ID, I have security check in the backend code to see whether a visitor is allowed to see a record. From this perspective, I am not sure what is the real benefit of having random string as the ID.
I hope to have input from experienced people. What design decision do you choose? Or other better ideas?
Thanks and regards.
You certainly can if you want to, but I would not go through the trouble to randomize the ID. This is at its root, "security through obscurity (STO)." Sometimes STO is useful, but in this case I don't think it is worth complicating and bloating the code and memory footprint. It's surprisingly easy to enumerate the valid IDs whether they're randomized or not, using a tool like Burp Suite. All the security controls that really matter should be implemented in the backend.

dynamically classify categories

I am new at the idea of programming algorithms. I can work with simplistic ideas, but my current project requires that I create something a bit more complicated.
I'm trying to create a categorization system based on keywords and subsets of 'general' categories that filter down into more detailed categories that requires as little work as possible from the user.
I.E.
Sports >> Baseball >> Pitching >> Nolan Ryan
So, if a user decides they want to talk about "Baseball" and they filter the search, I would like to also include 'Sports"
User enters: "baseball"
User is then taken to Sports >> Baseball
Now I understand that this would be impossible without a living - breathing dynamic program that connects those two categories in some way. It would also require 'some' user input initially, and many more inputs throughout the lifetime of the software in order to maintain it and keep it up to date.
But Alas, asking for such an algorithm would be frivolous without detailing very concrete specifics about what I'm trying to do. And i'm not trying to ask for a hand out.
Instead, I am curious if people are aware of similar systems that have already been implemented and if there is documentation out there describing how it has been done. Or even some real life examples of your own projects.
In short, I have a 'plan' but it requires more user input than I really want. I feel getting more info on the subject would be the best course of action before jumping head first into developing this program.
Thanks
IMHO It isn't as hard as you think. What you want is called Tagging and you can do it Automatically just by setting the correlation between tags (i.e. a Tag can have its meaningful information plus its reation with other ones. Then, if user select a Tag well, you related that with others via looking your ADT collection (can be as simple as an array).
Tag:
Sport
Related Tags
Football
Soccer
...
I'm hoping this helps!
It sounds like what you want to do is create a tree/menu structure, and then be able to rapidly retrieve the "breadcrumb" for any given key in the tree.
Here's what I would think:
Create the tree with all the branches. It's okay if you want branches to share keys - as long as you can give the user a "choice" of "Multiple found, please choose which one... ?"
For every key in the tree, generate the breadcrumb. This is time-consuming, and if the tree is very large and updating regularly then it may be something better done offline, in the cloud, or via hadoop, etc.
Store the key and the breadcrumb in a key/value store such as redis, or in memory/cached as desired. You'll want every value to have an array if you want to share keys across categories/branches.
When the user selects a key - the key is looked up in the store, and if the resulting value contains only one match, then you simply construct the breadcrumb to take the user where you want them to go. If it has multiple, you give them a choice.
I would even say, if you need something more organic, say a user can create "new topic" dynamically from anywhere else, then you might want to not use a tree at all after the initial import - instead just update your key/value store in real-time.

Programmatically find common European street names

I am in the middle of designing a web form for German and French users. Within this form, the users would have to type street names several times.
I want to minimize the annoyance to the user, and offer autocomplete feature based on common French and German street names.
Any idea where I can a royalty-free list?
Would your users have to type the same street name multiple times? Because you could easily prevent this by coding something that prefilled the fields.
Another option could be to use your user database as a resource. Query it for all the available street names entered by your existing users and use that to generate suggestions.
Of course this would only work if you have a considerable number of users.
[EDIT] You could have a look at OpenStreetMap with their Planet.osm dumbs (or have a look here for a dump containing data for just Europe). That is basically the OSM database with all the map information they have, including street names. It's all in an XML format and streets seem to be stored as Ways. There are tools (i.e. Osmosis) to extract the data and put it into a database, or you could write something to plough through the data and filter out the street names for your database.
Start with http://en.wikipedia.org/wiki/Category:Streets_in_Germany and http://en.wikipedia.org/wiki/Category:Streets_in_France. You may want to verify the Wikipedia copyright isn't more protective than would be suitable for your needs.
Edit (merged from my own comment): Of course, to answer the "programmatically" part of your question: figure out how to spider and scrape those Wikipedia category pages. The polite thing to do would be to cache it, rather than hitting it every time you need to get the street list; refreshing once every month or so should be sufficient, since the information is unlikely to change significantly.
You could start by pulling names via Google API (just find e.g. lat/long outer bounds - of Paris and go to the center) - but since Google limits API use, it would probably take very long to do it.
I had once contacted City of Bratislava about the street names list and they sent it to me as XLS. Maybe you could try doing that for your preferred cities.
I like Tom van Enckevort's suggestion, but I would be a little more specific that just looking inside the Planet.osm links, because most of them require the usage of some tool to deal with the supported formats (pbf, osm xml etc)
In fact, take a look at the following link
http://download.gisgraphy.com/openstreetmap/
The files there are all in .txt format and if it's only the street names that you want to use, just extract the second field (name) and you are done.
As an fyi, I didn't have any use for the French files in my project, but mining the German files resulted (after normalization) in a little more than 380K unique entries (~6 MB in size)
#dusoft might be onto something - maybe someone at a government level can help? I don't think that a simple list of street names cannot be copyrighted, nor any royalties be charged. If that is the case, maybe you could even scrape some mapping data from something like a TomTom?
The "Deutsche Post" offers a list with all street names in Germany:
http://www.deutschepost.de/dpag?xmlFile=link1015590_3877
They don't mention the price, but I reckon it's not for free.

Resources