I've just been reading this mind-blowing and hilarious post about some common falsehoods regarding time. Number forty is:
Every integer is a theoretical possible year
This implies that not every integer is a theoretically possible year. What is the negative case here? Which integer is not a theoretically possible year?
Depending on the context, 0 is not a valid year number. In the Gregorian calendar we're currently using (and in its predecessor, the Julian calendar), the year 1 (CE/AD) was immediately preceded by the year -1 (1 BCE/BC). (For dates before the Gregorian calendar was introduced, we can use either the Julian calendar or the proleptic Gregorian calendar).
In a programming context, this may or may not be directly relevant. Different languages, libraries, and frameworks represent years in different ways. ISO 8601, for example, supports years from 0000 to 9999, where 0000 is 1 BCE; wider ranges can be supported by mutual agreement. Some implementations of the C standard library can only represent times from about 1901 to 2038; others, using a 64-bit time_t, can represent a much wider range and typically treat -1, 0, and 1 as consecutive years.
Ultimately you'll need to check the documentation for whatever language/library/framework you're using.
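For a concrete illustration of the off-by-one, here is a minimal Python sketch (not tied to any particular library; the function names are mine) converting between astronomical year numbering, where 0 is a valid year, and the BCE/CE labelling, which skips it:

# Astronomical year numbering: ..., -1, 0, 1, ... where 0 corresponds to 1 BCE.
# BCE/CE labelling: ..., 2 BCE, 1 BCE, 1 CE, ... with no year 0.

def astronomical_to_label(year: int) -> str:
    """Convert an astronomical year number to a BCE/CE label."""
    if year > 0:
        return f"{year} CE"
    return f"{1 - year} BCE"      # 0 -> "1 BCE", -1 -> "2 BCE", ...

def label_to_astronomical(number: int, era: str) -> int:
    """Convert a BCE/CE pair such as (44, "BCE") to an astronomical year."""
    return number if era == "CE" else 1 - number

print(astronomical_to_label(1))          # 1 CE
print(astronomical_to_label(0))          # 1 BCE
print(astronomical_to_label(-1))         # 2 BCE
print(label_to_astronomical(1, "BCE"))   # 0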
Related
In pseudo-random number generators like WELL512a, WELL1024, and WELL44497b, I understand what WELL (well equidistributed long-period linear) stands for, but I can't find any information on the suffix.
I'm writing a paper on RNGs and I'm not sure if this is relevant.
This is, I believe, log2(RNG period). Thus, WELL512a will have a period of 2^512, WELL1024 a period of 2^1024, etc.
Reference: http://www.iro.umontreal.ca/~lecuyer/myftp/papers/wsc05rng.pdf, Table 1
This is an old question, and I'm sure that OP has moved on, but others may be interested in the answer. @SeverinPappadeux's answer is pretty much correct. The number n in the suffix is roughly the number of bits in the internal state. The period is 2^n - 1. The letters after the numbers indicate different variants of the PRNG with the corresponding period. The different letters don't have any meaning other than indicating different versions.
The Wikipedia page is very brief:
https://en.wikipedia.org/wiki/Well_equidistributed_long-period_linear
This is the official paper on the WELL generators:
http://www.iro.umontreal.ca/~lecuyer/myftp/papers/wellrng.pdf
The table on page 9 lists parameters for the various WELL generators. You have to study the paper to understand the parameters, but the upper Δ1 in the right-hand column is worth noticing. Zero is the best value for Δ1; it's the number of dimensions in which the random numbers are not equidistributed. So it's worth noticing, for example, that Δ1 is not zero for WELL19937a or WELL19937b, but it is zero for WELL19937c. Thus if you want a WELL generator and like the idea of a generator with period 2^19937 - 1, and you don't mind 624 words of state (624 * 32 = 19968), it's probably slightly better to use WELL19937c rather than the other two. (This is probably one reason why WELL19937c is currently the default generator for the Apache Commons Math lib, release 3.6.1, btw.)
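To make the naming arithmetic concrete, here is a small Python sketch (the helper name well_summary is mine, purely for illustration); it encodes nothing beyond the convention above, i.e. the suffix n is roughly the state size in bits and the period is 2^n - 1:

def well_summary(name: str, word_bits: int = 32) -> dict:
    """Derive state size and period from a WELL generator name like 'WELL19937c'."""
    n = int("".join(ch for ch in name if ch.isdigit()))
    words = -(-n // word_bits)            # ceil(n / word_bits)
    return {"state_bits": n, "state_words": words, "period": 2**n - 1}

for name in ("WELL512a", "WELL1024a", "WELL19937c"):
    info = well_summary(name)
    print(name, "->", info["state_words"], "32-bit words of state,",
          "period ~ 10^%d" % (len(str(info["period"])) - 1))

For WELL19937c this prints 624 words of state, matching the 624 * 32 = 19968 figure above.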
I am trying to reverse engineer an algorithm used to generate a check digit.
Numbers are 8 digits long and the last digit is the check digit. I have thousands of valid numbers to test it on.
I have tried the standard Luhn, Verhoeff and modulo-10 algorithms (brute-force checking of all possible weights), but could not find an answer!
Is it possible to calculate this? Any ideas?
Here are some examples of valid numbers:
1002784-5
1000514-7
1001602-8
1001255-2
1001707-1
1003355-5
1005579-1
1004535-0
1004273-1
1001695-9
1004565-9
1000541-9
1001291-1
1005866-1
1004352-7
EDIT:
Thanks guys - I don't have access to the code, unfortunately. The number is a tax number; I need to be able to verify that the number was typed in correctly. From my research it looks like most countries use a pretty standard modulo-10 type system. I've got access to about 60 000 numbers.
I understand that the problem could be impossible to solve, it was more of academic concern.
First check your context:
If the context is credit cards, driver's licenses, or government licensing numbers (not SSN), think Luhn or Mod 10. If it's some other industry, does that industry have a de facto standard? If not, is the developer of the system using the numbers also a player in an industry that has a de facto standard?
Nobody likes to reinvent the wheel if they don't have to.
If that doesn't help, keep in mind:
Don't assume that all the digits in the keys you are testing against are used to arrive at the check digit. It's possible only 4 of the 8 digits are being used to calculate the check digit (or any other combination). It's also possible there is some external PREFIX number that is used with the other digits to arrive at the check digit. So... line up all your numbers with the same check digit and see what the similarities are. Can you add a number to them and then always reach the check digit? Can you test only the first few digits? Last few digits? Every other digit?
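If you want to automate that line-them-up-and-test approach, here is a rough Python sketch of the brute-force idea (the check-digit convention it tries, check = (-weighted sum) mod m with remainder 10 mapped to 0 for mod 11, is just one common choice among several):

from itertools import product

# A handful of the sample numbers from the question; with the full 60,000 you
# would only need to test each candidate weight vector against a few of them.
SAMPLES = ["1002784-5", "1000514-7", "1001602-8", "1001255-2", "1001707-1"]

def parse(sample):
    body, check = sample.split("-")
    return [int(d) for d in body], int(check)

def find_weights(modulus, weight_range=range(1, 8)):
    parsed = [parse(s) for s in SAMPLES]
    n = len(parsed[0][0])
    for weights in product(weight_range, repeat=n):
        for digits, check in parsed:
            expected = -sum(w * d for w, d in zip(weights, digits)) % modulus
            if modulus == 11 and expected == 10:
                expected = 0
            if expected != check:
                break
        else:
            yield weights             # every sample matched

for m in (10, 11):
    for w in find_weights(m):
        print("mod", m, "candidate weights:", w)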
Good luck.
Count how many times each of the digits 0-9 occurs as the check digit in your data (60 thousand numbers). If the digit 0 occurs roughly twice as often as the other digits, it suggests that the algorithm uses a modulo 11 operation: in such an algorithm, if the sum mod 11 = 10, then the check digit is 0.
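A minimal Python sketch of that frequency test (the file name tax_numbers.txt and the one-number-per-line format are assumptions; adapt them to however your 60,000 numbers are stored):

from collections import Counter

def check_digit_histogram(numbers):
    """numbers: iterable of strings like '1002784-5'; counts the final (check) digit."""
    return Counter(n.strip()[-1] for n in numbers if n.strip())

with open("tax_numbers.txt") as f:        # hypothetical input file
    hist = check_digit_histogram(f)

total = sum(hist.values())
for digit in "0123456789":
    count = hist.get(digit, 0)
    print(digit, count, "(%.1f%%)" % (100.0 * count / total))
# If '0' accounts for roughly 2/11 of the numbers and every other digit for
# roughly 1/11, a mod-11 scheme that maps remainder 10 to 0 is a likely culprit.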
I'm taking a Computer Science course, and when I read these definitions I understand them, but I don't know the different purposes of the two representations, or why they exist.
Here is the short explanation of their purpose that my book gives:
Zoned decimal: highly compatible with text data.
Packed decimal: faster computing speed.
Something I want to know is:
1) In the zoned decimal representation there is a zone section attached to every digit. Why? I see no purpose in it :(
2) Why do they say zoned decimal is compatible with text data, and why is packed decimal faster?
Thanks :)
Firstly, where are you learning CS? Those terms are from the 1960s; the more common name is BCD (Binary Coded Decimal).
Zoned decimal uses an entire byte for each digit. This means you can just print a number as if it were text (each 'character' stores a digit 0-9), but since there are only 10 digits and a byte can hold 256 different values, this is a bit wasteful.
Packed decimal uses the fact that 4 bits can store 16 different values. So you can store two digits in a byte (top 4 bits and bottom 4 bits). This is still a bit wasteful, since each half-byte only uses 10 of its 16 possible values, but it's pretty easy to extract the two digits with just shift and mask operations.
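For example, here is a small Python sketch of that packing and of the shift-and-mask extraction (unsigned BCD only; the signed, mainframe-style layout with a trailing sign nibble is covered in the answer about EBCDIC below):

def pack_bcd(number: int) -> bytes:
    """Encode a non-negative integer as unsigned packed BCD, two digits per byte."""
    digits = str(number)
    if len(digits) % 2:
        digits = "0" + digits                 # pad to an even number of digits
    return bytes((int(digits[i]) << 4) | int(digits[i + 1])
                 for i in range(0, len(digits), 2))

def unpack_bcd(data: bytes) -> int:
    """Decode packed BCD using shift and mask."""
    digits = []
    for byte in data:
        digits.append((byte >> 4) & 0x0F)     # top four bits
        digits.append(byte & 0x0F)            # bottom four bits
    return int("".join(str(d) for d in digits))

packed = pack_bcd(1234)
print(packed.hex())          # 1234  (two bytes: 0x12, 0x34)
print(unpack_bcd(packed))    # 1234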
Pretty much the only place you would see BCD these days is in some low-level hardware where you want to read/transmit a digit without using a microprocessor at all; it's easy to make a BCD counter just in transistors. But if you want to do any maths, you either have to do long multiplication on each digit, like you would on paper, or convert into regular ints and back again.
Both of these representations have fallen out of favor, perhaps because they are not directly supported by C, and hence by all of the systems descended from Unix.
Packed decimal has an advantage in two respects: since it takes up less space it can get off the bus and into the processor faster, and many CISC instruction sets have dedicated instructions for arithmetic on it. To quote from http://en.wikipedia.org/wiki/Packed_decimal#Packed_BCD:
Packed BCD [binary coded decimal] is supported in the COBOL programming language as the "COMPUTATIONAL-3" (an IBM extension adopted by many other compiler vendors) or "PACKED-DECIMAL" (part of the 1985 COBOL standard) data type. Besides the IBM System/360 and later compatible mainframes, packed BCD was implemented in the native instruction set of the original VAX processors from Digital Equipment Corporation and was the native format for the Burroughs Corporation Medium Systems line of mainframes (descended from the 1950s Electrodata 200 series).
Zoned decimal (http://en.wikipedia.org/wiki/Zoned_decimal#Zoned_decimal) has an easy mapping between characters on punch cards and their representation in memory, which perhaps explains your textbook's claim that it is "highly compatible with text data." As the Wikipedia article suggests, it's a term more used in IBM mainframe circles. On minis, we tended to just call it plain old decimal, PIC 9 data.
"Zoned Decimal" in its natural environment is meant to be compatable with the EBCDIC char set .
ASCII represents the digits as x'30' to x'39', which display as characters "0" to "9".
The EBCDIC character set (which has its origins in Hollerith punched cards) uses a similar but different scheme, where x'F0' is displayed as character '0' and x'F9' is displayed as character '9'.
Punched cards had a fixed length of 80 characters, and in many cases 10 or 12 of these characters were eaten up by record type identifiers and sequence numbers (desperately important if you dropped a bunch of cards on the floor!). So space was at a premium. Rather than enter a "+" or "-" character next to each number, an "overpunch" (extra holes near the top of the card) was used to represent a positive or negative number, saving a byte.
These overpunched characters were encoded in EBCDIC as x'D0' to x'D9' for -0 to -9 and x'C0' to x'C9' for +0 to +9, usually in the last digit of the number.
Hence the "Zoned Decimal" format. The first four bits of each byte are the Zone, the second four bits the "number" to -42 was encoded as x'F4D2'.
This is more of a convention than anything else, as the computer could not do anything with this format. So it needed to be encoded into "packed" format before any calculations took place. This is pretty easy: x'F4D2' -> x'042D' is mostly a case of grabbing the last zone, then extracting the "numeric" four bits from each byte, which could then be converted to binary.
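A rough Python sketch of that zoned-to-packed step, using the -42 example and the zone/sign conventions just described (this is illustrative only; a real PACK instruction has its own rules for odd lengths and malformed input):

def zoned_to_packed(zoned: bytes) -> bytes:
    """Convert EBCDIC zoned decimal (e.g. x'F4D2') to packed decimal (x'042D')."""
    digits = [b & 0x0F for b in zoned]        # numeric nibble of each byte
    sign = (zoned[-1] >> 4) & 0x0F            # the last byte's zone carries the sign
    if sign not in (0xC, 0xD):                # F (unsigned) treated as positive
        sign = 0xC
    nibbles = digits + [sign]
    if len(nibbles) % 2:
        nibbles = [0] + nibbles               # pad on the left to whole bytes
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))

print(zoned_to_packed(bytes([0xF4, 0xD2])).hex())   # 042d, i.e. -42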
When IBM mainframes were designed, the largest groups of users were banks, insurance companies and utility companies. The bulk of their processing followed this pattern:
read punch card
read tape record
add monthly payment to balance
store new balance on tape
print new balance
Most of the calculations involved currency amounts and most of the results were displayed immediately. It became clear that if the machine could do the arithmetic directly on the packed decimal values, you could avoid several expensive "convert to binary" and "convert to decimal" instructions. As a bonus, it made it easy to place the decimal point at the correct position and perform any decimal rounding. So a great deal of work went into implementing native packed decimal instructions (zero, add, subtract, multiply, divide, shift, round, etc.).
This has been the preferred currency format for IBM mainframes ever since.
For many years, developers on other platforms poured scorn on the mainframers for using such an archaic format, and only recently began to realize how difficult it is to do fixed-point decimal arithmetic to the standards accountants and tax collectors expect. Thanks to the efforts of Mike Cowlishaw and others, the rest of the world has caught up with the venerable IBM 360, and Java programmers can now calculate sales tax correctly using the BigDecimal library, which is based on a variation of the old packed decimal format.
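Python's decimal module is in the same decimal-arithmetic tradition (it follows Cowlishaw's General Decimal Arithmetic specification), so the point can be illustrated there as easily as with BigDecimal; the 8.25% tax rate below is just an example figure:

from decimal import Decimal, ROUND_HALF_UP

# Binary floating point cannot represent most decimal fractions exactly:
print(0.1 + 0.2)                           # 0.30000000000000004
print(Decimal("0.1") + Decimal("0.2"))     # 0.3

# Fixed-point decimal rounding to the nearest cent, as accountants expect:
net = Decimal("19.99")
tax = (net * Decimal("0.0825")).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(net + tax)                           # 21.64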
I have been doing some reading, and I'm having a tough time understanding how to interpret something that is declared as "digits x".
I.E.
type something is digits 6
I get that it's 6 digits of precision, but I guess what has me mixed up is what does that mean.
1) Y.XXXXXX (six X's), or
2) XXX.XXX (any combination, as long as there are always 6 digits in total, counting those both before and after the decimal point)
...
I'm just trying to understand what the range of something that is digits 6 (or digits n, to be more generic) actually is. Is there a formula I can simply plug into to determine what my ranges are on a type that has some number of digits?
A type declared with digits is a floating-point type, similar to Float or Long_Float.
The 6 is "the minimum number of significant decimal digits required for
the floating point type". For example, all the following will be represented reasonably accurately (but not exactly):
type My_Real is digits 6;
X: My_Real := 1.23456;
Y: My_Real := 12345.6;
Z: My_Real := 1.23456E7;
In practice, there are usually just 2 or 3 underlying floating-point types on a given system. The compiler will choose an appropriate one as the underlying type for your declaration. In fact, two types declared with digits 2 and digits 6 will probably have exactly the same representation and precision.
Understanding the phrase "not exactly" requires an understanding of floating-point that's well beyond the scope of a single question, but if you're familiar with floating-point in other languages, it's the same general idea.
If you want a general understanding of what floating-point is and how it works, the Wikipedia article isn't bad. A much more advanced treatment is David Goldberg's classic paper "What Every Computer Scientist Should Know About Floating-Point Arithmetic", which is available online as both a web page and a PDF.
ISO 4217 defines 3-letter currency codes:
EUR
USD
LKR
GBP
Do currencies' minor units (cent, pence) have an ISO or similar standard, too, that defines codes for those sub-units, like
ct
p
?
The standard also defines the relationship between the major currency unit and any minor currency unit. Often, the minor currency unit has a value that is 1/100 of the major unit, but 1/1000 is also common. Some currencies do not have any minor currency unit at all. In others, the major currency unit has so little value that the minor unit is no longer generally used (e.g. the Japanese sen, 1/100th of a yen). This is indicated in the standard by the currency exponent. For example, USD has exponent 2, while JPY has exponent 0. Mauritania does not use a decimal division of units, setting 1 ouguiya (UM) = 5 khoums, and Madagascar has 1 ariary = 5 iraimbilanja.
Wikipedia.
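To illustrate how that exponent is used in practice, here is a minimal Python sketch; the table below contains only the example exponents mentioned above, plus TND added here as a 1/1000 case, and is nowhere near the full ISO 4217 list:

from decimal import Decimal

EXPONENTS = {"USD": 2, "JPY": 0, "TND": 3}

def minor_to_major(amount_minor: int, currency: str) -> Decimal:
    """Convert an integer amount in minor units to the major currency unit."""
    return Decimal(amount_minor).scaleb(-EXPONENTS[currency])

print(minor_to_major(123456, "USD"))   # 1234.56
print(minor_to_major(5000, "JPY"))     # 5000
print(minor_to_major(1500, "TND"))     # 1.500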
As for a better word, how does minor currency unit suit? Although Wikipedia also refers to it as a subunit. Take your pick.
There is a table on that Wikipedia article listing the standard precision for the minor currency unit.
As a sidenote, Wikipedia provides the fractional unit name for all circulating currencies.
You need to look at the standard itself.
From the ISO website:
ISO 4217:2008 specifies the structure for a three-letter alphabetic code and an equivalent three-digit numeric code for the representation of currencies and funds. For those currencies having minor units, it also shows the decimal relationship between such units and the currency itself.
ISO 4217:2008 also establishes procedures for a Maintenance Agency, and specifies the method of application for codes.
The key bit is:
it also shows the decimal relationship between such units and the currency itself.
So to answer your question, I couldn't find an ISO standard that defines codes for the minor units themselves. Similar standards cover Commercial Administration and Finance.
In the financial markets there are roughly two established industry standards.
The first one is really a case-by-case agreement, mostly enforced by exchanges that have their securities quoted in the minor currency unit.
This led to:
GBX for British pence
ZAC for South-African cents
ILA for Israeli agorot
Probably pioneered by Reuters and Bloomberg, the second standard is far more widespread and consistent. The agreement is to lowercase the third letter to denote the minor unit.
GBp, ZAr, ILs, USd, EUr, etc.
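If you have to consume quotes that follow this convention, a small Python sketch of the normalisation looks like this (it assumes the minor unit is 1/100 of the major unit, which holds for pence, cents and agorot but is not guaranteed for every currency):

from decimal import Decimal

def normalise_quote(code: str, price: Decimal):
    """Map a minor-unit quote like ('GBp', 1250) to ('GBP', 12.50)."""
    if len(code) == 3 and code[:2].isupper() and code[2].islower():
        return code.upper(), price / 100      # assumes a 1/100 minor unit
    return code, price

print(normalise_quote("GBp", Decimal("1250")))    # ('GBP', Decimal('12.5'))
print(normalise_quote("USD", Decimal("12.50")))   # ('USD', Decimal('12.50'))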
Related discussions:
http://www.fixtradingcommunity.org/pg/discussions/topicpost/167427/