I have a dictionary of words split into two lists of different lengths, adjectives and nouns. I want to be able to reversibly encode any phone number into a format where I have one or more adjectives followed by a noun.
Examples might be
"+447911123456" => "agile sassy stingray"
"07911123456" => "funky old golf club"
It should have properties like the avalanche effect, and make relatively even use of all the words in the dictionary.
I've not been able to come up with an algorithm that satisfies all the requirements. Does anyone know how to do this, or where to learn more about doing this sort of encoding?
If it helps, I've made the dictionary available on github. Any help is appreciated!
reversibly encode any phone number
How about something like this?
1. Given that phone numbers are typically 10 to 14 digits including the international code, treat the number as a 64-bit integer (up to 19 digits), ignoring the international dialing prefix "+".
2. Split the integer into 3 roughly even segments of 21 bits each.
3. XOR each segment with a fixed repeating pattern, e.g. 01 for segment 1, 10 for segment 2, 11 for segment 3.
4. Perform a simple encryption that is 21 bits wide; a simple custom one can be developed easily.
5. After these transformations, you end up with 3 numbers. Use them as keys into your dictionary. The 3rd number references the nouns dictionary.
The purpose of steps 3 and 4 is to obfuscate what you are doing.
For instance, if we had 111 111 111 as our number, without steps 3 and 4 we might get "happy happy dog". With steps 3 and 4, even though segments 1 and 2 are identical, they result in different words, such as "happy sloppy dog". Conversely, a totally different number might result in repeated words, e.g. 111 843 111 => "happy happy cat".
Because it is only for obfuscating purposes, these do not need to be terribly "secure"...
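A minimal sketch of the segment transformation described above, assuming Python 3.8+. The multiplier and the function names `scramble` and `unscramble` are illustrative choices, not part of the original answer, and multiplication by an odd constant modulo 2^21 is only a stand-in for the "simple encryption": it is reversible but not secure. The sketch stops at the three 21-bit keys; how those keys map onto word lists of a particular size is left open here.

```python
MASK21 = (1 << 21) - 1

# Fixed repeating XOR patterns, one per segment (01..., 10..., 11...).
PATTERNS = (0x155555, 0x0AAAAA, 0x1FFFFF)

# Hypothetical odd multiplier; any odd value is invertible modulo 2**21.
MULTIPLIER = 0x9E375
INVERSE = pow(MULTIPLIER, -1, 1 << 21)  # modular inverse, Python 3.8+


def scramble(number):
    """Split a 63-bit integer into three 21-bit keys and obfuscate each."""
    if not 0 <= number < (1 << 63):
        raise ValueError("number must fit in 63 bits")
    keys = []
    for i, pattern in enumerate(PATTERNS):
        segment = (number >> (21 * (2 - i))) & MASK21
        keys.append(((segment ^ pattern) * MULTIPLIER) & MASK21)
    return keys


def unscramble(keys):
    """Invert scramble(): undo the multiplication, then the XOR."""
    number = 0
    for key, pattern in zip(keys, PATTERNS):
        segment = ((key * INVERSE) & MASK21) ^ pattern
        number = (number << 21) | segment
    return number


phone = int("+447911123456".lstrip("+"))
assert unscramble(scramble(phone)) == phone
```

Note that converting to an integer drops a leading zero, so "07911123456" would need its length or prefix stored separately to round-trip exactly.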
Related
I think I have a simple problem, but I just cannot wrap my head around it.
Let's say I have an array of integers representing IDs, (1, 2, 3, 4), in this order. I am looking for a way to go through all possible sequences and perform a task on each one. Every ID has a time length associated with it. If I am correct, there should be 24 possibilities. So I would like to start by checking 1234 and then move on to the next possibility. How can I get the loop or loops to go through all 24 possibilities? I need to work with each sequence as soon as it is found, because I want to compare them. (Later it will be 6 numbers, so 720 possibilities, but the procedure should be the same, so solving it for 24 would be enough. Even solving it for 3 numbers with 6 possibilities would help a lot.)
1234
1243
1423
1432
and so on, no duplicates.
For an array with 6 numbers there should be 720? I tried loops within loops but I am stuck. My problem is that I know exactly what to do on paper, but I have no clue how to "tell the computer" to follow my logic.
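One way to visit every ordering without nesting loops by hand is a permutation generator. The sketch below assumes Python and its standard `itertools.permutations`; the `durations` table and the `total_time` task are made-up placeholders for whatever work is done per sequence.

```python
from itertools import permutations

ids = [1, 2, 3, 4]
# Hypothetical time length per ID, only for illustration.
durations = {1: 30, 2: 45, 3: 20, 4: 60}


def total_time(order):
    """Placeholder task: sum the time lengths along one sequence."""
    return sum(durations[i] for i in order)


# Each sequence is produced one at a time, so it can be used immediately.
results = [(order, total_time(order)) for order in permutations(ids)]

print(len(results))   # 24 sequences for 4 IDs
print(results[0][0])  # (1, 2, 3, 4)
```

The same loop works unchanged for 6 IDs, where it yields 720 sequences.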
I know there are websites out there that do this for you, but not to the extent that I need. I want to be able to create a 13-character alphanumeric code several times over and, if possible, have it spit out 1,000 codes at a time. However, I only want there to be 4 digits at most (so 0 to 4 of the 13 characters will be digits and the rest CAPITAL letters). Here is an example: CHC-RCJV-6KK-ZUA. The hyphens are not a necessity. I am new to coding for the most part and I'm not sure whether it's possible to do this on Windows. If so, I would prefer it, but I can use Linux if needed. Thanks for any help!
You want up to 4 random digits and the rest capital letters. That gives you a five stage process:
Pick how many digits, from the range [0..4].
Pick that many random single digits and store them in a list.
Pick enough random capital letters to bring the total to 13 characters and store them in the same list.
Shuffle the contents of your list.
Insert the hyphens and print/display/return/whatever.
Try coding that for yourself. If you have problems making it work then show us your code and we will help you.
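For comparison once you have tried it, the five stages might be sketched as follows in Python, which runs on both Windows and Linux. The function name `make_code` and the 3-4-3-3 hyphen grouping (taken from the example CHC-RCJV-6KK-ZUA) are assumptions.

```python
import random
import string


def make_code(rng=random, length=13, max_digits=4, groups=(3, 4, 3, 3)):
    """Build one code with 0..max_digits digits and capital letters otherwise."""
    # Stage 1: how many digits.
    n_digits = rng.randint(0, max_digits)
    # Stage 2: that many random digits.
    chars = [rng.choice(string.digits) for _ in range(n_digits)]
    # Stage 3: capital letters to fill the remaining positions.
    chars += [rng.choice(string.ascii_uppercase) for _ in range(length - n_digits)]
    # Stage 4: shuffle so the digits land in random positions.
    rng.shuffle(chars)
    # Stage 5: insert the hyphens.
    parts, pos = [], 0
    for size in groups:
        parts.append("".join(chars[pos:pos + size]))
        pos += size
    return "-".join(parts)


codes = [make_code() for _ in range(1000)]
print(codes[0])
```

Codes generated this way are not guaranteed to be unique; collecting them in a set until it holds 1,000 entries would be one way to avoid repeats.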
I am trying to generate a Settlers of Catan game board and am stuck trying to create an efficient implementation of hex numbers.
The goal is to randomly generate a set of numbers from 2-12 (with only one instance each of 2 and 12, and two instances of all numbers in between), ensuring that the values 6 and 8 are not hexagonally (?) adjacent to one another. 6 and 8 are special because they are the numbers you are most likely to roll, so the game does not want them next to one another, as players would get disproportionately more resources of that kind. A 7 means you have to discard resources.
The expected result: http://imgur.com/Ng7Siy8
Right now I have a working brute force implementation that is very slow and I am hoping to optimize it, but I am not sure how. The implementation is in VBA, which has constrained the data structures I can use.
In pseudo code I am doing something like this:
For Each of the 19 hexes
    Loop Until we have a valid number
        Generate a random number between 1 and 12
        Check
            Have we already placed too many of that number?
            Is the number equal to 6 or 8?
                Is the number being placed on a hex next to another hex with 6 or 8 placed on it?
        If valid
            Place
        If invalid
            Regenerate random number
It's very manual and subject to the random generator function, which means it can be anywhere from being really short to being really really long (compounded over 19 hexes).
Note: How my numbers are being placed seems important. I start at the outside of the gameboard (see here http://imgur.com/Ng7Siy8) on the gray hex with number 6, and then move counter clockwise around the board inward. This means that my next hex is 2 light green, 4 light orange...continuing around to 9 dark green and then coming inwards to 4 light orange.
This pattern limits the number of comparisons I need to make.
There are several optimizations you can do. First of all, you know exactly how many of each number are present: you have 2,3,3,4,4,5,5,6,6,8,8,9,9,10,10,11,11,12. So start off with this set of numbers, which eliminates the check for whether a number has been generated too many times. Now you can do a random shuffle of this set of numbers and check whether the result is "valid". This will still result in too many negative checks, I think, but it should perform better than your current approach.
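A rough sketch of the shuffle-and-check idea, written in Python rather than VBA for brevity. It assumes the 19 hexes are addressed with axial coordinates, that the 19th hex is a desert with no number (`None`), and that no two hexes holding a 6 or an 8 may touch; the names `HEXES`, `is_valid` and `random_board` are illustrative.

```python
import random

# Axial coordinates (q, r) of the 19 hexes within distance 2 of the centre.
HEXES = [(q, r) for q in range(-2, 3) for r in range(-2, 3) if abs(q + r) <= 2]
DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]
# The fixed multiset of numbers, plus None for the (assumed) desert hex.
TOKENS = [2, 3, 3, 4, 4, 5, 5, 6, 6, 8, 8, 9, 9, 10, 10, 11, 11, 12, None]


def is_valid(board):
    """True if no hex holding a 6 or 8 touches another hex holding a 6 or 8."""
    for (q, r), token in board.items():
        if token in (6, 8):
            for dq, dr in DIRECTIONS:
                if board.get((q + dq, r + dr)) in (6, 8):
                    return False
    return True


def random_board(rng=random):
    """Shuffle the whole set and retry until the layout is valid."""
    while True:
        tokens = TOKENS[:]
        rng.shuffle(tokens)
        board = dict(zip(HEXES, tokens))
        if is_valid(board):
            return board


board = random_board()
```

Because the counts are fixed up front, the only check left is adjacency, and each attempt is a single shuffle rather than up to 19 separate retry loops.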
Place the 8 first, calculate which of the remaining tiles you'd be happy to place the 6 on (i.e. non-adjacent), then choose one at random for the 6. Then place the rest.
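The constructive variant could look roughly like this, again in Python and under the same assumptions as before: axial coordinates, one numberless desert hex, and both 8s and both 6s kept apart from each other. `place_tokens` and `neighbours` are hypothetical names.

```python
import random

HEXES = [(q, r) for q in range(-2, 3) for r in range(-2, 3) if abs(q + r) <= 2]
DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]
OTHER_TOKENS = [2, 3, 3, 4, 4, 5, 5, 9, 9, 10, 10, 11, 11, 12, None]  # None = desert


def neighbours(hex_):
    """Hexes on the board that share an edge with hex_."""
    q, r = hex_
    return {(q + dq, r + dr) for dq, dr in DIRECTIONS} & set(HEXES)


def place_tokens(rng=random):
    board = {}
    allowed = set(HEXES)  # hexes where a 6 or 8 may still go
    for token in (8, 8, 6, 6):
        # A hex can block at most 3 of the 12 outer hexes, so after three
        # placements at least 3 outer hexes are still allowed.
        spot = rng.choice(sorted(allowed))
        board[spot] = token
        allowed -= {spot} | neighbours(spot)
    # Everything else can go anywhere that is still empty.
    rest = OTHER_TOKENS[:]
    rng.shuffle(rest)
    empty = [h for h in HEXES if h not in board]
    board.update(zip(empty, rest))
    return board


board = place_tokens()
```

This never needs a retry, at the cost of a layout distribution that is not exactly the same as shuffling until valid.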
I have to write an Oracle procedure that registers 16-digit security numbers in registered_security_numbers.
I know that the first 6 digits of a security number are either 1234 11 or 1234 12; the remaining 10 digits are generated randomly.
I have 2 possible solutions :
(1) Write a second procedure, which generates all possible security numbers and inserts them in a possible_security_numbers table, setting property free=1.
Then when I get a request to register a new security number, I query the possible_security_numbers table for a random security number which is free and insert it in registered_security_numbers.
(2) Every time I get a request to register a security number, I generate a random number from the range 1234 1100 0000 0000 - 1234 1299 9999 9999 until I get a security number which does not exist in the registered_security_numbers table, and insert it in registered_security_numbers.
I don't like approach (1) because the possible_security_numbers table will contain several billion entries and I am not sure how good that is or how fast select/update can be run.
I don't like approach (2) because if I have many records in the registered_security_numbers table, generating a random number from the range might have to be repeated many times.
I'd like to know if anyone has other solution or can comment on my solutions, which seem bad to me …
How many numbers are you actually going to generate?
Imagine that, at most, you're going to generate 1 million (10^6) numbers. If so, the odds that you're going to need to generate a second random number are roughly 5 in 10^5 (0.00005 or 0.005%), since 10^6 of the 2 x 10^10 possible numbers are taken. If that's the case, it makes little sense to worry about the expense of occasionally generating a second number or the near impossibility of generating a third. The second approach will be much more efficient.
On the other hand, imagine that you intend to generate 1 billion numbers over time. If that's the case, then by the end, the odds that you're going to need to generate a second number is 5% and you'll need to generate 3 or 4 numbers reasonably often. Here, the trade-offs are much harder to figure out. Depending on the business, the performance impact of catching the unique constraint violation exception and generating multiple numbers on some calls may cause a service to violate the SLA often enough to matter while enumerating the valid numbers may be more efficient on average.
On the third hand, imagine that you intend to generate all 20 billion numbers over time. If that's the case, by the end, you'd expect to have to generate about 20 billion random numbers before you found the one remaining valid number, since each draw hits it with a probability of 1 in 20 billion. If that's the case, the clear advantage will be with the first option of enumerating all possible numbers and tracking which ones have been used.
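The arithmetic behind these three scenarios can be checked with a few lines. This is a small Python sketch rather than PL/SQL, and it assumes every draw is uniform over the 2 x 10^10 numbers in the range; `retry_probability` and `expected_draws` are illustrative names.

```python
TOTAL = 2 * 10**10  # 1234 1100 0000 0000 through 1234 1299 9999 9999


def retry_probability(used):
    """Chance that one random draw hits an already registered number."""
    return used / TOTAL


def expected_draws(used):
    """Mean number of draws until a free number turns up (geometric distribution)."""
    return TOTAL / (TOTAL - used)


for used in (10**6, 10**9, TOTAL - 1):
    print(used, retry_probability(used), expected_draws(used))
```

With 1 million numbers registered the expected number of draws is barely above 1; with 1 billion it is about 1.05; with a single free number left it equals the size of the whole range.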
How reliable is it to use a 10-char hash to identify email addresses?
MailChimp has 10-character alphanumeric IDs for email addresses.
10 characters at 4 bits each gives 40 bits, a bit over one trillion values. Maybe for an enterprise the size of MailChimp this gives reasonable headroom for a unique index space, and they have a single table with all possible emails, indexed with a 40-bit number.
I'd love to use the same style of hashes or coded IDs to include in links. To decide whether to go for indexes or hashes, I need to estimate the probability of two valid email addresses leading to the same 10-character hash.
Any hints to evaluating that for a custom hash function, other than raw testing?
You don't explicitly say what you mean by "reliable", but I presume you're trying to avoid collisions. As wildplasser says, for random identifiers it's all about the birthday paradox, and the chance of a collision in an identifier space with 2^n IDs reaches roughly 50% when about 2^(n/2) IDs are in use.
The Wikipedia page on Birthday Attacks has a great table illustrating probabilities for collisions under various parameters; for instance with 64 bits and a desired maximum collision probability of 1 in 1 million, you can have about 6 million identifiers.
Bear in mind that there are a lot more efficient ways to represent data in characters than hex; base64, for instance, gives you 3 bytes per 4 characters, meaning 10 characters gives you 60 bits, instead of 40 with hex.
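To put numbers on this without raw testing, the usual birthday approximation p ≈ 1 - exp(-k(k-1)/(2N)) can be evaluated directly. The Python sketch below assumes the hash behaves like a uniform random function over N = 2^bits values; `collision_probability` and `max_ids` are illustrative names.

```python
import math


def collision_probability(ids, bits):
    """Approximate chance of at least one collision among `ids` random IDs."""
    space = 2.0 ** bits
    return -math.expm1(-ids * (ids - 1) / (2 * space))


def max_ids(bits, p):
    """Approximate number of IDs that keeps the collision chance below p."""
    return math.sqrt(2 * 2.0 ** bits * -math.log1p(-p))


# 10 hex characters (40 bits) versus 10 base64 characters (60 bits).
for bits in (40, 60):
    print(bits, collision_probability(10**6, bits), int(max_ids(bits, 1e-6)))
```

Under this approximation, one million addresses in a 40-bit space already collide with a probability of roughly 37%, which is why the extra 20 bits from a denser character encoding matter.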