Standard for internationalized conversion of numbers to words - internationalization

I am looking for a standard that describes how a program should convert a number (like 123 456 789) into words (like one hundred twenty-three million four hundred fifty-six thousand seven hundred eighty-nine) depending on the locale (such as en-US or es-ES).

Related

How to detect whether a barcode is for a weight-scale item

How can we detect whether a barcode read by a barcode reader belongs to an item sold by weight or to a regular item (in EAN-13 or other formats)? Is there any part of the code that shows that it is a weighted item?
Barcodes are just strings of characters (mostly numbers and letters) and most barcode readers/scanners do not indicate the type of barcode. They just send the value. But some values, such as an EAN13, have embedded check digits that can be used to auto-discriminate. For example, if you see a 13-digit number and calculate the mod10 check digit over the first 12 digits and it matches the 13th digit, you can be fairly certain you have an EAN13.
Alternatively, if you have control over the creation of the barcodes, you can use GS1 application identifiers to prefix each value. (GS1 barcodes can actually contain multiple values in a single symbol.) See https://www.gs1.org/standards/barcodes/application-identifiers?lang=en for more information on the standard ids. Application ids are routinely used in logistics but are fairly rare in retail channels.
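The check-digit test mentioned above can be sketched as follows. This is a minimal illustration, assuming the standard EAN-13 weighting (weights 1 and 3 alternating from the leftmost digit); the function names are made up for this example. A passing check only suggests that the value is an EAN-13. It says nothing about whether the item is sold by weight.

```python
def ean13_check_digit(first12: str) -> int:
    """Compute the mod-10 check digit over the first 12 digits of an EAN-13."""
    if len(first12) != 12 or not (first12.isascii() and first12.isdigit()):
        raise ValueError("expected exactly 12 ASCII digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def looks_like_ean13(value: str) -> bool:
    """Return True if value has 13 digits and a matching check digit."""
    if len(value) != 13 or not (value.isascii() and value.isdigit()):
        return False
    return ean13_check_digit(value[:12]) == int(value[12])
```

For example, `looks_like_ean13("4006381333931")` is true, while changing the last digit makes the check fail.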

Algorithm help - unique encoding of phone numbers

I have a dictionary of words split into two lists of different lengths, adjectives and nouns. I want to be able to reversibly encode any phone number into a format where I have one or more adjectives followed by a noun.
Examples might be
"+447911123456" => "agile sassy stingray"
"07911123456" => "funky old golf club"
It should have properties like the avalanche effect, and make relatively even use of all the words in the dictionary.
I've not been able to come up with an algorithm that satisfies all the requirements. Does anyone know how to do this, or where to learn more about doing this sort of encoding?
If it helps, I've made the dictionary available on github. Any help is appreciated!
Regarding "reversibly encode any phone number":
How about something like this?
1. Given that phone numbers are typically 10 to 14 digits including the international code, we can treat the number as a 64-bit integer (up to 19 digits) if we ignore the international dialing prefix "+".
2. Split the integer into 3 roughly even segments of 21 bits each.
3. XOR each segment with a fixed repeating pattern, i.e. 01 for segment 1, 10 for segment 2, 11 for segment 3.
4. Perform a simple encryption that is 21 bits wide; a simple custom one can be developed easily.
5. After these transformations, you end up with 3 numbers. Use the numbers as keys into your dictionary. The 3rd number references the nouns dictionary.
The purpose of steps 3 and 4 is to obfuscate what you are doing.
For instance, if our number were 111 111 111, without steps 3 and 4 we might get "happy happy dog". With steps 3 and 4, even though segments 1 and 2 are identical, they map to different words, such as "happy sloppy dog". Repeated words can still occur, but for a quite different number, e.g. 111 843 111 => "happy happy cat".
Because it is only for obfuscating purposes, these do not need to be terribly "secure"...
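A minimal sketch of the scheme described above, under several assumptions that are not in the original answer: the "simple encryption" is replaced by multiplication with an odd constant modulo 2**21 (a bijection, so it is reversible), the constant `MULTIPLIER` is arbitrary, and the three results are returned as indices rather than words. Using them as dictionary keys would need word lists with up to 2**21 entries, or some further mapping.

```python
BITS = 21                      # width of each of the three segments
MASK = (1 << BITS) - 1


def repeating_pattern(two_bits: int) -> int:
    """Repeat a 2-bit pattern (e.g. 0b01) across a 21-bit field."""
    value = 0
    for _ in range(BITS // 2 + 1):
        value = (value << 2) | two_bits
    return value & MASK


# XOR masks for segments 1, 2 and 3 (patterns 01, 10 and 11).
PATTERNS = [repeating_pattern(p) for p in (0b01, 0b10, 0b11)]

# Hypothetical stand-in for the "simple 21-bit encryption": multiplying by an
# odd constant modulo 2**21 is a bijection, so it can be undone exactly.
MULTIPLIER = 0x9E375
INVERSE = pow(MULTIPLIER, -1, 1 << BITS)   # modular inverse (Python 3.8+)


def encode(number: int) -> tuple:
    """Split a number into three 21-bit segments and obfuscate each one."""
    if not 0 <= number < 1 << (3 * BITS):
        raise ValueError("number does not fit in 63 bits")
    segments = [(number >> shift) & MASK for shift in (2 * BITS, BITS, 0)]
    return tuple(((seg ^ pat) * MULTIPLIER) & MASK
                 for seg, pat in zip(segments, PATTERNS))


def decode(indices) -> int:
    """Invert encode(): undo the multiplication, then the XOR, then rejoin."""
    number = 0
    for index, pat in zip(indices, PATTERNS):
        segment = ((index * INVERSE) & MASK) ^ pat
        number = (number << BITS) | segment
    return number
```

Here `decode(encode(447911123456))` returns `447911123456`. Two limitations of this sketch: the number is handled as an integer, so a leading "0" or "+" has to be stored separately, and a 10 to 14 digit number needs at most about 47 bits, so with this layout the first segment is always zero and the first word would always be the same.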

Approximate text matching

I need to compare two pieces of text, say 200 words long. As these were obtained by OCR, discrepancies can arise at two levels:
words can be misspelled,
whole words can be missing or merged, or extra parasitic chunks inserted (in extreme cases, groups of words could be swapped).
The output of the recognition would be a similarity score. I don't think that matching the whole text as a long string can be efficient enough.
Are you aware of methods that specifically address this problem (a two-level Levenshtein distance, perhaps)? Are there libraries available?
(I am not looking for an OCR package.)

Substitution cipher decryption using letter frequency analysis for text without blanks and special characters

I need to find the plain text for given cipher text. I also have statistics (in an Excel document) for the letters in the given language e.g. I have the frequencies of the letters and also of the digraphs.
I tried this approach so far: I evaluated the frequency of each letter in the cipher text I received. Then I sorted the letters in descending order by their frequencies and mapped each letter with the corresponding letter from the Excel document. The problem with this approach is that it gives me some text that has no meaning at all. That is because my text is pretty small (only 1500 characters long).
I considered doing some limited permutations, but I have no idea what I could use to evaluate how good a given permutation is. I think a good evaluation function would solve my problem.
Be aware that all special characters and white spaces are removed from the text. Also there are no numbers.
Thank you in advance.
For fully automated decryption you need a dictionary of commonly used words to compare candidate decryptions against. The candidate in which the most dictionary words are found is probably the right one.
Letter probabilities come with a few problems. They are derived from common texts, so if your encrypted text is, for example, a technical paper rather than fiction (belles-lettres), or it includes equations or tables, the overall letter occurrence can be skewed.
So do it like this:
1. Compute the probabilities of the letters.
2. Divide the letters into groups by probability:
   - commonly used (high-probability) letters form group A,
   - less commonly used (mid-probability) letters form group B,
   - the rest (low-probability) form group C.
3. Substitute group A.
   - First check whether the group A probabilities match your language.
   - If not, the text is in a different language or style/form, or it is not plain text at all. In that case you cannot proceed safely.
   - If they match, substitute the letters from group A. They should be correct on the first run.
4. Try to substitute group B.
   - You know all the letters of group B (both encrypted and decrypted),
   - so generate all permutations of the substitutions.
   - For each one, try to decipher the text
   - and search for words after decryption (ignoring letters not yet substituted).
   - Compute the word-count percentage
   - and remember the best one (or the top few).
5. Try to substitute group C.
   - Do it the same way as step 4.
6. Corrections.
   - It is probable that a few letters will be mixed up in the final result,
   - so there are ways to handle this as well.
   - You can keep a table of letters that are easily mixed up with each other,
   - so you can try permuting them and testing against your dictionary,
   - or find words in your text with 1-2 wrong letters per word (for longer words of 5 or more letters)
   - and permute/correct the substitution of the wrong letters if enough such words are found.
[notes]
- You can obtain dictionaries from translators.
- I have also seen some plain-text translator tables online.
- The groups should differ distinctly in probability from each other.
- The number of groups can change with the language.
- I had the best results for this task with a semi-automated approach: steps 5 and 6 can use user input.
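The group B search described above (generate all permutations, decipher, look for dictionary words, keep the best) could be sketched roughly like this. The scoring rule, the function names and the "?" placeholder for undecided letters are assumptions made for illustration. Because the text has no blanks, words are searched as substrings, and each hit is weighted by word length instead of computing a percentage.

```python
from itertools import permutations


def decrypt(cipher: str, mapping: dict, unknown: str = "?") -> str:
    """Apply a cipher->plain letter mapping; undecided letters become '?'."""
    return "".join(mapping.get(ch, unknown) for ch in cipher)


def word_score(text: str, words) -> int:
    """Score a candidate plaintext by dictionary words found as substrings."""
    return sum(len(w) * text.count(w) for w in words)


def best_group_substitution(cipher, known, cipher_letters, plain_letters, words):
    """Try every assignment of plain_letters to cipher_letters, keep the best.

    known          -- mapping already fixed (e.g. group A)
    cipher_letters -- encrypted letters of the current group
    plain_letters  -- plaintext letters believed to belong to that group
    """
    best_score, best_map = -1, dict(known)
    for perm in permutations(plain_letters, len(cipher_letters)):
        candidate = {**known, **dict(zip(cipher_letters, perm))}
        score = word_score(decrypt(cipher, candidate), words)
        if score > best_score:
            best_score, best_map = score, candidate
    return best_map, best_score
```

The number of permutations grows factorially with the group size, so this is only practical for small groups, which is one reason to split the alphabet into groups first.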

How can I generate order numbers similar to the ones Amazon generates?

Note: I have already read through older questions like "What is the best format for a customer number, order number?"; however, my question is a little more specific.
Generating pseudo-random numbers runs into the "birthday problem" before long. For example, if I am using a 27-bit field for my order number, after 15,000 entries the chance of at least one collision already exceeds 50%.
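The birthday-problem figure can be checked with the usual approximation p ≈ 1 - exp(-n(n-1)/(2N)) for n random draws from N possible values. The sketch below assumes uniformly random, independent order numbers; the function names are illustrative.

```python
import math


def collision_probability(n_items: int, bits: int) -> float:
    """Approximate chance that n random values in a 2**bits space collide."""
    space = 2 ** bits
    return 1.0 - math.exp(-n_items * (n_items - 1) / (2 * space))


def items_for_probability(p: float, bits: int) -> int:
    """Approximate number of random values needed to reach collision chance p."""
    space = 2 ** bits
    return math.ceil(math.sqrt(2 * space * math.log(1 / (1 - p))))
```

With a 27-bit field, `collision_probability(15000, 27)` comes out at roughly 0.57, and the 50% mark is reached somewhat earlier, at a little under 14,000 entries.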
I am wondering whether large e-commerce businesses like Amazon generate their order numbers in some other way, for example:
pre-generate the entire set and pick from it randomly (a few hundred GB of database)
use lexicographical "next_permutation" starting from a particular seed number
take an MD5 or SHA-1 hash of the date, user ID, and other parameters, truncated to 14 digits
etc.
All I want is a non-repeating integer of a certain width (it doesn't need to be very random, except to obfuscate the total number of orders). Any ideas on how this can be achieved?
I suggest starting with the date in reverse format (year first), then a sequence number that starts at 1, followed by a check (or random) digit. If you are unlikely ever to reach 100 orders per day, you need to add two digits for the sequence plus a check/random digit.
The year need include only the final two digits, possibly only the final digit, depending on how long you keep records of orders: 7 years or so is usually enough, meaning the records from 2009 (beginning with 9) could be deleted during 2018 in preparation to use the order numbers again in 2019. You could use mmdd for the next 4 digits, or simply number the days through the year and use just 3 digits - it depends how human-friendly you want the number to be. It's also possible just to omit the day of the month and restart the sequential numbers at the start of each month, rather than every day.
Today is 2 Nov 2017. Let's suppose this is order no. 16 today: your order number would be 71102168 (where the 8 is a check digit or random digit). If you're likely to have up to, but not exceeding, a thousand orders a day, you'll need an extra digit, thus: 711020168. To avoid limiting yourself to a fixed number of digits, you might prefer to use a hyphen: 71102-168. You could include another hyphen before the check/random digit if you wish: 71102-16-8.
If you have several areas dealing with orders, you may wish to include a depot number, perhaps at the beginning or after the date, allowing you to use the sequence numbers at each depot - eg depot 5 might be: 5-71102-168, 71102-5-168 or 711025168. Again, if you don't use hyphens, you'll need to assess whether you need up to ten, a hundred or a thousand (etc) possible depot numbers. I hope this helps!
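A minimal sketch of this numbering scheme, without the depot number. The answer does not say how the check digit is computed, so the Luhn algorithm is used here purely as an assumption, and the resulting final digit will not necessarily match the 8 in the examples above. The function names are illustrative.

```python
from datetime import date


def luhn_check_digit(payload: str) -> int:
    """Luhn check digit for a string of digits (an assumed choice of scheme)."""
    total = 0
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:          # double every second digit, starting at the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def order_number(day: date, sequence: int, seq_width: int = 2) -> str:
    """Build: last digit of year + mmdd + zero-padded daily sequence + check digit."""
    if not 0 < sequence < 10 ** seq_width:
        raise ValueError("sequence does not fit in seq_width digits")
    payload = f"{day.year % 10}{day:%m%d}{sequence:0{seq_width}d}"
    return payload + str(luhn_check_digit(payload))
```

For order 16 on 2 Nov 2017, `order_number(date(2017, 11, 2), 16)` starts with `7110216`, and with `seq_width=3` it starts with `71102016`. The daily sequence counter itself has to come from somewhere that guarantees uniqueness, such as a database sequence.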
This problem has already been solved: why not use a UUID? See RFC 4122. UUIDs are close enough to globally unique that you can easily combine many systems and never have a duplicate, simply because the number space is so massive.
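As a small illustration, Python's standard `uuid` module can generate a random (version 4) UUID. The helper name below is made up for this example. Note that the result is a 128-bit value, far wider than the 14-digit integer mentioned in the question, so it only fits if the width requirement can be relaxed.

```python
import uuid


def new_order_id() -> str:
    """Return a random (version 4) UUID in its canonical 36-character form."""
    return str(uuid.uuid4())


order_id = new_order_id()
print(order_id)                  # canonical hex form with hyphens
print(uuid.UUID(order_id).int)   # the same value as a 128-bit integer
```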
