What are alternatives for GUIDs for key generation when central server is not possible? - algorithm

I am looking for alternative to GUIDs for key generation in a distributed app. For example supposed I have Bob, James, and Jack all running a bug tracking application on their desktop where they can do thing like create bug tickets ala JIRA, or Bugzilla ... etc. When a ticket is created it is assigned a number such as T-1, T-2, T-3, T-4 ... etc. Tickets need to have a stable ID and should be creatable without having to consult a central server.
I understand that this is what GUID's are really good for but it in my case displaying a GUID in a UI is ugly people can't just copy and paste it and discuss it on a phone call, I really want integers or some sort of short string that is easy to talk about read in one glance .. etc.
Is there a way to use the bitcoin block chain as some sort of counter?

You may evaluate the approach taken by git. They use sha1 hash of commit information. And then abbreviate IDs are allowed which are much shorter and easier to read\transfer manually.
Having the number of bugs in your tracker is not going to reach millions that should be sufficient. Once it is you'll just need a longer abbreviation.
There seem to be plenty info around on how git calculates hash IDs and abbreviates them.

If I recall correctly how UUIDv1 works - it's "just" putting together the mac address and a very exact timestamp + maybe some additional integer. As your mac address should be unique (unless you've fiddled with it) and there are only so many UUIDs one computer can generate within a nano second, the resulting ID will be unique.
This is a very general and uninformed way to create IDs. If you'd implement a version of it yourself for your specific use case you could get much smaller IDs.
Assuming you can identify each node with a bug tracking system with a simple and unique string - for instance "Bob", "James", "Jack" - and you can create unique continuous integers within each node, you could combine those two and have IDs like "Bob-1", "James-12", ...
As you can see, actually there has to be again one central point, which will assign the unique strings, however depending on the number of nodes and how long they stay within the system, this could be as well done just by a human being.
The additional disadvantage (or advantage, depends how you look at it) of this approach (as well as of UUIDv1) would be, that you'd know where the ticket has been created as well as order of the tickets within one system.

Related

Representing PCP/GP History in FHIR

Background:
I have been digging into the FHIR DSTU2 specification to try and determine what is the most appropriate resource(s) to represent a particular patient's historical list of GPs/PCPs. I am struggling to find an ideal resource to house this information.
The primary criteria I have been using is to identify the proper resource is that it must provide values to associate a patient to a practitioner for a period of time.
Question:
What is the proper resource to represent historical pcp/gp information that can be tied back to a patient resource?
What I have explored:
Here is a list of my possible picks thus far. I paired the resource types with my thought process on why I'm not confident about using it:
Episode of Care - This seems to have the most potential. It has the associations between a patient and a set of doctors for a given time period. However, when I read its description and use-case scenarios, it seems like I would be bastardizing its usage to fit my needs, since it embodies a period of time where a group of related health care activities were performed.
Group - Very generic structure that could fit based on its definition. However, I want to rule out other options before taking this approach.
Care Plan - Similar to Episode of Care rational. It seems like a bastardization to just use this to house PCP/GP history information. The scope of this is much bigger and patient/condition-centric.
I understand that there may not be a clear answer and thus, the question might run the risk of becoming subjective and I apologize in advance if this is the case. Just wondering if anyone can provide concrete evidence of where this information should be stored.
Thanks!
That's not a use-case we've really encountered before. The best possibility is to use the new CareTeam resource (we're splitting out CareTeam from EpisodeOfCare and CarePlan) - take a look at the continuous integration build for a draft.
If you need to use DSTU 2, you could just look at Patient.careProvider and rely on "history" to see changes over time. Or use Basic to look like the new CareTeam resource.

How do you RESTfully get a complicated subset of records?

I have a question about getting 'random' chunks of available content from a RESTful service, without duplicating what the client has already cached. How can I do this in a RESTful way?
I'm serving up a very large number of items (little articles with text and urls). Let's pretend it's:
/api/article/
My (software) clients want to get random chunks of what's available. There's too many to load them all onto the client. They do not have a natural order, so it's not a situation where they can just ask for the latest. Instead, there are around 6-10 attributes that the client may give to 'hint' what type of articles they'd like to see (e.g. popular, recent, trending...).
Over time the clients get more and more content, but at the server I have no idea what they have already, and because they're sent randomly, I can't just pass in the 'most recent' one they have.
I could conceivably send up the GUIDS of what's stored locally. The clients only store 50-100 locally. That's small enough to stuff into a POST variable, but not into the GET query string.
What's a clean way to design this?
Key points:
Data has no logical order
Clients must cache the content locally
Each item has a GUID
Want to avoid pulling down duplicates
You'll never be able to make this work satisfactorily if the data is truly kept in a random order (bear in mind the Dilbert RNG Effect); you need to fix the order for a particular client so that they can page through it properly. That's easy to do though; just make that particular ordering be a resource itself; at that point, you've got a natural (if possibly synthetic) ordering and can use normal paging techniques.
The main thing to watch out for is that you'll be creating a resource in response to a GET when you do the initial query: you probably should use a resource name that is a hash of the query parameters (including the client's identity if that matters) so that if someone does the same query twice in a row, they'll get the same resource (so preserving proper idempotency). You can always delete the resource after some timeout rather than requiring manual disposal…

Is there a way to generate a short random id, avoiding collisions, without hitting persistent storage?

If you've used GoToMeeting, that's the type of ID I want. I'd like it to be random so that it obfuscates the number of items being tracked and short, so that it's easy to reference manually; UUIDs are way too long. I'd like to avoid hitting persistent storage merely for performance reasons, but I can't think of any other way to avoid collisions. Is 9 digits enough to do something time-based?
In response to questions:
I'm building a ticket-tracking application. This ID would be used as the primary key for a table, but it would be needed before the record is persisted which would result in an extra database call that I'd like to avoid if possible.
I'd like to keep it at a 9 digit int. I consider a UUID to be too long because people are going to have to reference the ID manually (via email, phone, etc.).
I'm thinking of using the time of generation somehow. Since time is always ticking on forward, it would continually limit the set of potential IDs, excluding those that had already been generated.
One way is to take a unique number or string (like a random UUID) then calculate a fixed-length digest (such as MD5 or SHA-1) and/or encode it in a higher base (like base64) to shorten it further.
Git does something similar where it generates a sha numbers for commits (and other events) and then the user can references the numbers manually in order to lookup those commits. The trick they used is that the user doesn't have to enter the whole string in order to find the correct event, they simply have to enter a long enough string that it doesn't collide with any other commit currently in the repository. In general this only require 5 or so hex digits for relatively large repositories.

Algorithm for unique CD-KEY generation with validation

I am trying to create a unique CD-KEY to put in our product's box, just like a normal CD-KEY found in standard software boxes that users use to register the product.
However we are not selling software, we are selling DNA collection kit for criminal and medical purposes. Users will receive a saliva collection kit by mail with the CD-KEY on it and they will use that CD-KEY to create an account on our website and get their results. The results from the test will be linked to the CD-KEY. This is the only way that we will have to link the results to the patients. It is therefore important that it does not fail :)
One of the requirements would be that the list of CD-KEYs must be sufficiently "spread" apart so that there is no possibility of someone entering an incorrect CD-KEY and still having it approved for someone else kit, thereby mixing up two kits. That could cost us thousands of dollars in liability.
For example, it cannot be a incremental sequence of numbers such as
00001
00002
00003
...
The reason is that if someone receives the kit 00002, but registers it as 000003 by accident, then his results will be matched to someone else. So it must be like credit card numbers... Unless a valid sequence is entered, your chances of randomly hitting a valid number is 1 in a million...
Also, we are selling over 50,000 kits annually to various providers (who will generate their own CD-KEYS using our algorithm) so we cannot maintain a list of all previously issued CD-KEYS to check for duplicate. The algorithm must generate unique CD-KEYs.
We also require the ability to verify that the CD-KEY is valid using a quick check algorithm, so that we can inform the user if the code he enters is invalid. This leaves out many hashing or MD5 algorithms I believe. And it cannot be a 128 bit because, who would take that time to type it out on the computer screen?
So far this is what I was thinking the final CD-KEY structure would look like
(4 char product code) - (4 char reseller code) - (12 char unique, verifiable CD-KEY)
Ex. 384A - GTLD - {4565 - FR54 - EDF3}
To insure the uniqueness of the KEYS, I could include the current date (20090521) as part of the source. We wont generate unique keys more than once a week, so this value changes often enough for the purpose of unique initial value.
What possible algorithm can I use to generate the unique keys?
Create the strings <providername>000001, <providername>000002, etc. or whatever and encrypt them with a public key, and that's your "CD-KEY" that the user enters. Decrypt the CD-KEY with the private key and validate that when decrypted you get a valid string with a valid provider name.
Credit Card numbers use the Luhn algorithm you might want to look at something similar to that.
I use SeriousBit Ellipter link for software protection but I don't see any reason you could generate a group of unique keys each week and us the library to verify the key validity when entered into your web site. You can also encode optional services into the key allow you to control how the sample is processed from the key (that's if you have different service levels).
As it uses an encrypted method of key generation in the first place and it's relatively cheap, it's certainly worth a look I would say.
I finally settled for a cd-key of this form
<TIMESTAMP>-<incremented number>-<8 char MD5 hash>-<checksumdigit>
I used the mod 11 ISBN checksum digit algorithm.
Generate GUID and catenate a random number to it. GUID is guaranteed to be unique and random number will make it improbable to hit a code accidentally. Just don't modify the GUID in any way or you might compromise the uniqueness.
http://msdn.microsoft.com/en-us/library/aa475087.aspx

Linking two Office documents

Problem:
I have two spreadsheets that each serve different purposes but contain one particular piece of data that needs to be the same in both spreadsheets. This piece of data (one of the columns) gets updated in spreadsheet A but needs to also be updated in spreadsheet B.
Goal:
A solution that would somehow link these two spreadsheets together (keep in mind that they exist on two separate LAN shares on the network) so that when A is updated, B is automatically updated for the corresponding record.
*Note that I understand fully that a database would probably be a better plan for tasks such as these but unfortunately I have no say in that matter.
**Note also that this needs to work for Office 2003 and Office 2007
So you mean that AD743 on spreadsheet B must be equal to AD743 on spreadsheet A? Try this:
Open both spreadsheets on the same
machine.
Go to AD743 on spreadsheet B.
Type =.
Go to spreadsheed A and click on
AD743.
Press enter.
You'll notice that the formula is something like '[path-to-file+file-name].worksheet-name!AD743'.
The value on spreadsheet B will be updated when you open it. In fact, it will ask you if you want to update. Of course, your connection must be up and running for it to update. Also, you can't change the name or the path of spreadsheet A.
I can't say if this is overkill without knowing the details of your usage case, but consider creating a spreadsheet C to hold all data held in common between the two. Links can become dizzyingly complex as spreadsheets age, and having a shared data source might help clear up the confusion.
Perhaps even more "enterprise-y" is the concept of just pasting in all data that otherwise would be shared. That is the official best practice in my company, because external links have caused so much trouble with maintainability. It may seem cumbersome at first, but I've found it may just be the best way to promote maintainability in addition to ease of use, assuming you don't mind the manual intervention.

Resources