Why isn't this valid USPS tracking number validating according to their spec? - ruby

I'm writing a gem to detect tracking numbers (called tracking_number, natch). It searches text for valid tracking number formats, and then runs those formats through the checksum calculation as specified in each respective service's spec to determine valid numbers.
The other day I mailed a letter using USPS Certified Mail, got the accompanying tracking number from USPS, and fed it into my gem and it failed the validation. I am fairly certain I am performing the calculation correctly, but have run out of ideas.
The number is validated using USS Code 128 as described in section 2.8 (page 15) of the following document: http://www.usps.com/cpim/ftp/pubs/pub109.pdf
The tracking number I got from the post office was "7196 9010 7560 0307 7385", and the code I'm using to calculate the check digit is:
def valid_checksum?
  # tracking number doesn't have spaces at this point
  chars = self.tracking_number.chars.to_a
  check_digit = chars.pop
  total = 0
  chars.reverse.each_with_index do |c, i|
    x = c.to_i
    x *= 3 if i.even?
    total += x
  end
  check = total % 10
  check = 10 - check unless (check.zero?)
  return true if check == check_digit.to_i
end
According to my calculations based on the spec provided, the last digit should be a 3 in order to be valid. However, Google's tracking number auto detection picks up the number fine as is, so I can only assume I am doing something wrong.

From my manual calculations, it should match what your code does:
posn: 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2   sum  mult
even:  7     9     9     1     7     6     0     0     7     8    54   162
odd:      1     6     0     0     5     0     3     7     3       25    25
                                                                       ===
                                                                       187
Hence the check digit should be three.
If that number is valid, then they're using a different algorithm to the one you think they are.
I think that might be the case since, when I plug the number you gave into the USPS tracker page, I can see its entire path.
In fact, if you look at publication 91, the Confirmation Services Technical Guide, you'll see it uses two extra digits, including the 91 at the front for the tracking application ID. Applying the algorithm found in that publication gives us:
posn: 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2   sum  mult
even:  9     7     9     9     1     7     6     0     0     7     8    63   189
odd:      1     1     6     0     0     5     0     3     7     3       26    26
                                                                             ===
                                                                             215
and that would indeed give you a check digit of 5. I'm not saying that's the answer but it does match with the facts and is at least a viable explanation.
Probably your best bet would be to contact USPS for the information.
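As a quick cross-check of the two hand calculations above, here is a minimal Python sketch of the same weighted mod-10 check digit. The function name `mod10_check_digit` is made up for illustration, and prepending 91 is the same guess as above, not something USPS has confirmed.

```python
def mod10_check_digit(payload):
    """Check digit for `payload` (the number WITHOUT its final check digit).

    Counting from the right, the rightmost payload digit is weighted 3,
    the next one 1, and so on, as in the tables above.
    """
    digits = [int(ch) for ch in payload if ch.isdigit()]
    total = sum(d * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(digits)))
    return (10 - total % 10) % 10


print(mod10_check_digit("7196 9010 7560 0307 738"))     # 3 (weighted sum 187)
print(mod10_check_digit("91 7196 9010 7560 0307 738"))  # 5 (weighted sum 215)
```

Only the second call agrees with the printed check digit of 5, which is consistent with the 91 prefix explanation.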

I don't know Ruby, but it looks as though you're multiplying by 3 at each even number; and the way I read the spec, you sum all the even digits and multiply the sum by 3. See the worked-through example pp. 20-21.
(later)
Your code may be right. This Python snippet gives 7 for their example, and 3 for yours:
#!/usr/bin/python
'check tracking number checksum'
import sys
def check(number = sys.argv[1:]):
    to_check = ''.join(number).replace('-', '')
    print(to_check)
    # digits in even positions counted from the right (check digit is position 1)
    even = sum(map(int, to_check[-2::-2]))
    odd = sum(map(int, to_check[-3::-2]))
    print(even * 3 + odd)
if __name__ == '__main__':
    check(sys.argv[1:])
[added later]
just completing my code, for reference:
jcomeau@intrepid:~$ /tmp/track.py 7196 9010 7560 0307 7385
False
jcomeau@intrepid:~$ /tmp/track.py 91 7196 9010 7560 0307 7385
True
jcomeau@intrepid:~$ /tmp/track.py 71123456789123456787
True
jcomeau@intrepid:~$ cat /tmp/track.py
#!/usr/bin/python
'check tracking number checksum'
import sys
def check(number):
    to_check = ''.join(number).replace('-', '')
    # digits in even positions counted from the right (check digit is position 1)
    even = sum(map(int, to_check[-2::-2]))
    odd = sum(map(int, to_check[-3::-2]))
    checksum = even * 3 + odd
    checkdigit = (10 - (checksum % 10)) % 10
    return checkdigit == int(to_check[-1])
if __name__ == '__main__':
    print(check(''.join(sys.argv[1:]).replace('-', '')))

Related

Drawing from a 2-D prior that is only available as samples in pymc2

I'm trying to play around with Bayesian updating, and have a situation in which I am using a posterior from previous runs as a prior. This is a 2D prior on alpha and beta, for which I have traces, alphatrace and betatrace. So I stack them and use code adopted from https://gist.github.com/jcrudy/5911624 to make a KDE based stochastic.
#from https://gist.github.com/jcrudy/5911624
def KernelSmoothing(name, dataset, bw_method=None, observed=False, value=None):
    '''Create a pymc node whose distribution comes from a kernel smoothing density estimate.'''
    density = gaussian_kde(dataset, bw_method)
    def logp(value):
        #print "VAL", value
        d = density(value)
        if d == 0.0:
            return float('-inf')
        return np.log(d)
    def random():
        result = None
        sample=density.resample(1)
        #print sample, sample.shape
        result = sample[0][0],sample[1][0]
        return result
    if value == None:
        value = random()
    dtype = type(value)
    result = pymc.Stochastic(logp = logp,
                             doc = 'A kernel smoothing density node.',
                             name = name,
                             parents = {},
                             random = random,
                             trace = True,
                             value = None,
                             dtype = dtype,
                             observed = observed,
                             cache_depth = 2,
                             plot = True,
                             verbose = 0)
    return result
Note that the critical thing here is to draw both values together from the joint prior: this is why I need a 2-D prior and not two 1-D priors.
The model itself is:
ctrace=np.vstack((alphatrace, betatrace))
cnew=KernelSmoothing("cnew", ctrace)
@pymc.deterministic
def alphanew(cnew=cnew, name='alphanew'):
    return cnew[0]
@pymc.deterministic
def betanew(cnew=cnew, name='betanew'):
    return cnew[1]
newtheta=pymc.Beta("newtheta", alphanew, betanew)
newexp = pymc.Binomial('newexp', n=[14], p=[newtheta], value=[4], observed=True)
model3=pymc.Model([cnew, alphanew, betanew, newtheta, newexp])
mcmc3=pymc.MCMC(model3)
mcmc3.sample(20000,5000,5)
In case you are wondering, this is to do the 71st experiment in the hierarchical Rat Tumor example in Chapter 5 in Gelman's BDA. The "prior" I am using is the posterior on alpha and beta after 70 experiments.
But, when I sample, things blow up with the error:
ValueError: Maximum competence reported for stochastic cnew is <= 0... you may need to write a custom step method class.
It's not cnew I care about updating as a stochastic, but rather alphanew and betanew. How should I structure the code to make this error go away?
EDIT: initial model which gave me the posteriors I wish to use as the prior:
tumordata="""0 20
0 20
0 20
0 20
0 20
0 20
0 20
0 19
0 19
0 19
0 19
0 18
0 18
0 17
1 20
1 20
1 20
1 20
1 19
1 19
1 18
1 18
3 27
2 25
2 24
2 23
2 20
2 20
2 20
2 20
2 20
2 20
1 10
5 49
2 19
5 46
2 17
7 49
7 47
3 20
3 20
2 13
9 48
10 50
4 20
4 20
4 20
4 20
4 20
4 20
4 20
10 48
4 19
4 19
4 19
5 22
11 46
12 49
5 20
5 20
6 23
5 19
6 22
6 20
6 20
6 20
16 52
15 46
15 47
9 24
"""
tumortuples=[e.strip().split() for e in tumordata.split("\n")]
tumory=np.array([np.int(e[0].strip()) for e in tumortuples if len(e) > 0])
tumorn=np.array([np.int(e[1].strip()) for e in tumortuples if len(e) > 0])
N = tumorn.shape[0]
mu = pymc.Uniform("mu",0.00001,1., value=0.13)
nu = pymc.Uniform("nu",0.00001,1., value=0.01)
@pymc.deterministic
def alpha(mu=mu, nu=nu, name='alpha'):
    return mu/(nu*nu)
@pymc.deterministic
def beta(mu=mu, nu=nu, name='beta'):
    return (1.-mu)/(nu*nu)
thetas=pymc.Container([pymc.Beta("theta_%i" % i, alpha, beta) for i in range(N)])
deaths = pymc.Binomial('deaths', n=tumorn, p=thetas, value=tumory, size=N, observed=True)
I use the joint posterior on alpha and beta from this model as input to the "new model" at top. This also raises the question of whether I ought to include theta1..theta70 in the model at top, as they will update along with alpha and beta thanks to the new data, which is a binomial with n=14, y=4. But I can't even get the little model with only a prior as a 2-D sample array working :-(
I found your question because I ran into a similar problem. According to the documentation of pymc.StepMethod.competence, the problem is that none of the built-in samplers handle the dtype associated with the stochastic variable.
I am not sure what needs to be done to actually resolve that. Maybe one of the sampler methods can be extended to handle special types?
Hopefully someone with more pymc mojo can shine a light on what needs to be done.
def competence(s):
    """
    This function is used by Sampler to determine which step method class
    should be used to handle stochastic variables.
    Return value should be a competence
    score from 0 to 3, assigned as follows:
    0: I can't handle that variable.
    1: I can handle that variable, but I'm a generalist and
       probably shouldn't be your top choice (Metropolis
       and friends fall into this category).
    2: I'm designed for this type of situation, but I could be
       more specialized.
    3: I was made for this situation, let me handle the variable.
    In order to be eligible for inclusion in the registry, a sampling
    method's init method must work with just a single argument, a
    Stochastic object.
    If you want to exclude a particular step method from
    consideration for handling a variable, do this:
    Competence functions MUST be called 'competence' and be decorated by the
    '@staticmethod' decorator. Example:
        @staticmethod
        def competence(s):
            if isinstance(s, MyStochasticSubclass):
                return 2
            else:
                return 0
    :SeeAlso: pick_best_methods, assign_method
    """

Dynamic Programming - Two spies at the river

I think this is a very complicated dynamic programming problem.
Two spies each have a secret number in [1..m]. To exchange numbers they agree to meet at the river and "innocently" take turns throwing stones: from a pile of n=26 identical stones, each spy in turn throws at least one stone in the river.
The only information is in the number of stones each spy throws in each turn. What is the largest m for which they can be sure to complete the exchange?
Develop a recursive formula to count. Here is the start of the table; complete it to n=26. (You should not expect a closed form.)
n  1  2  3  4  5  6  7  8  9 10 11 12
m  1  1  1  2  2  3  4  6  8 12 16 23
Here are some hints from our professor: I suggest changing the problem to making the following table: Let R(n,m) be the range of numbers [1..R(n,m)] that A can indicate to B if they start with n stones, and both know that A has to also receive a number in [1..m] from B.
For example, if A needs no more information, R(n,1) can be computed by considering how many stones A could throw (one to n), then B throws 1 (if any remain) and A gets to decide again. The base cases are R(0,1) = R(1,1) = 1, and you can write a recursive rule if you are careful at the boundaries. (You should find the Fibonacci numbers for R(n,1).)
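The m = 1 case of the hint can be sketched in Python as follows. This covers only R(n,1), not the general R(n,m) recurrence, and it reflects one reading of the hint: B always throws exactly one stone because B has nothing left to say. The function name `r_one` is made up for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def r_one(n):
    """R(n, 1): how many numbers A can signal with n stones if A needs nothing back."""
    if n <= 1:
        return 1  # base cases R(0,1) = R(1,1) = 1
    total = 0
    for k in range(1, n + 1):      # A throws k stones
        remaining = n - k
        if remaining == 0:
            total += 1             # pile is empty, the exchange ends here
        else:
            total += r_one(remaining - 1)  # B throws one, A decides again
    return total


print([r_one(n) for n in range(10)])
# [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```

These values are the Fibonacci numbers the hint predicts.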
If A needs information, then B has to send it by his or her choices, so things are a little more complicated. Here is the start of the table:
n\m    1    2    3    4    5
0      1    0    0    0    0
1      1    0    0    0    0
2      2    0    0    0    0
3      3    1    0    0    0
4      5    2    1    0    0
5      8    4    2    1    1
6     13    7    4    3    2
7     21   12    8    6    4
8     34   20   15   11    8
9     55   33   27   19   16
From the R(n,m) table, how would you recover the entries of the earlier table (the table showing m as a function of n)?

Algorithm suggestion

I'm looking for the best way to accomplish the following tasks:
Given 4 non-repeatable numbers between 1 and 9.
Given 2 numbers between 1 and 6.
Add up the two numbers (1 to 6), then check whether that same total can be made by summing some of the four non-repeatable numbers (1 to 9). You do not have to use all four numbers.
Example:
Your four non-repeatable (1 to 9) numbers are: 2, 4, 6, and 7
Your two numbers between 1 and 6 are: 3 and 3
The total for the two numbers is 3 + 3 = 6.
Looking at the four non-repeatable (1 to 9) numbers, you can make a 6 in two different ways:
2 + 4 = 6
6 = 6
So, this example returns "yes, there is a possible solution".
How do I accomplish this task in the cleanest, most efficient way, algorithmically?
Since there are only 4 elements here, we need not worry about efficiency.
Just loop over 0 to 15 and use each value as a bit mask to enumerate all the sums that can be generated.
Here is some Python code to give you the idea.
a = [2,4,6,7]
for i in range(16):
    x = i
    ans = 0
    for j in range(4):
        # bit j of the mask decides whether a[j] is part of this subset
        if x % 2:
            ans += a[j]
        x //= 2
    print(ans, end=' ')
0 2 4 6 6 8 10 12 7 9 11 13 13 15 17 19
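To turn that enumeration into the yes/no answer the question asks for, one option is to use `itertools.combinations` instead of a bit mask. This is a minimal sketch; the function name `ways_to_make` and its parameters are invented for illustration.

```python
from itertools import combinations


def ways_to_make(numbers, die_a, die_b):
    """Return every non-empty subset of `numbers` whose sum equals die_a + die_b."""
    target = die_a + die_b
    return [combo
            for size in range(1, len(numbers) + 1)
            for combo in combinations(numbers, size)
            if sum(combo) == target]


ways = ways_to_make([2, 4, 6, 7], 3, 3)
print(ways)        # [(6,), (2, 4)]
print(bool(ways))  # True: "yes, there is a possible solution"
```

With four numbers there are only 15 non-empty subsets, so brute force is more than fast enough.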

Identify gaps in repeated sequences

I have a vector that should contain n sequences from 00 to 11
A = [00;01;02;03;04;05;06;07;08;09;10;11;00;01;02;03;04;05;06;07;08;09;10;11]
and I would like to check that the sequence "00 - 11 " is always respected (no missing values).
for example if
A =[00;01;02; 04;05;06;07;08;09;10;11;00;01;02;03;04;05;06;07;08;09;10;11]
(missing 03 in the 3rd position)
For each missing value I would like to have back this information in another vector
missing=
[value_1,position_1;
value_2, position_2;
etc, etc]
Can you help me?
For sure we know that the last element must be 11, so we can already check for this and make our life easier for testing all previous elements. We ensure that A is 11-terminated, so an "element-wise change" approach (below) will be valid. Note that the same is true for the beginning, but changing A there would mess with indices, so we better take care of that later.
missing = [];
if A(end) ~= 11
    missing = [missing; 11, length(A) + 1];
    A = [A, 11];
end
Then we can calculate the change dA = A(2:end) - A(1:end-1); from one element to another, and identify the gap positions idx_gap = find((dA~=1) & (dA~=-11));. Now we need to expand all missing indices and expected values, using ev for the expected value. ev can be obtained from the previous value, as in
for k = 1 : length(idx_gap)
    ev = A(idx_gap(k));
Now, the number of elements to fill in is the change dA in that position minus one (because one means no gap). Note that this can wrap over if there is a gap at the boundary between segments, so we use the modulus.
    for n = 1 : mod(dA(idx_gap(k)) - 1, 12)
        ev = mod(ev + 1, 12);
        missing = [missing; ev, idx_gap(k) + 1];
    end
end
As a test, consider A = [5 6 7 8 9 10 3 4 5 6 7 8 9 10 11 0 1 2 3 4 6 7 8]. That's a case where the special initialization from the beginning will fire, memorizing the missing 11 already, and changing A to [5 6 ... 7 8 11]. missing then will yield
11 24 % recognizes improper termination of A.
11 7
0 7 % properly handles wrap-over here.
1 7
2 7
5 21 % recognizes single element as missing.
9 24
10 24
which should be what you are expecting. Now what's missing still is the beginning of A, so let's say missing = [0 : A(1) - 1, 1; missing]; to complete the list.
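For comparison, the same bookkeeping can be sketched in Python by walking through the vector and tracking the next expected value, rather than differencing as above. This is a different technique from the MATLAB code, and the name `find_gaps` is made up. It assumes every value lies in 0..period-1 and that no complete cycle is missing (a gap of exactly 12 values cannot be detected from the data alone).

```python
def find_gaps(a, period=12):
    """Return (missing_value, position) pairs for a repeating 0..period-1 sequence.

    `position` is the 1-based index in `a` in front of which the value is
    missing; len(a) + 1 means the value is missing after the last element.
    """
    if any(not 0 <= v < period for v in a):
        raise ValueError("values must lie in 0..period-1")
    missing = []
    expected = 0
    for pos, value in enumerate(a, start=1):
        while expected != value:           # everything skipped before `value`
            missing.append((expected, pos))
            expected = (expected + 1) % period
        expected = (value + 1) % period
    while expected != 0:                   # sequence stopped before period-1
        missing.append((expected, len(a) + 1))
        expected = (expected + 1) % period
    return missing


print(find_gaps([0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 11]))  # [(3, 4)]
```

On the test vector used above it reports the five leading values 0 to 4 at position 1, then the same value and position pairs as the table, in order of position.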
This will give you the missing values and their positions in the full sequence:
N = 11; % specify the repeating 0:N sub-sequence
n = 3; % reps of sub-sequence
A = [5 6 7 8 9 10 3 4 5 6 7 8 9 10 11 0 1 2 3 4 6 7 8]'; %' column from s.bandara
da = diff([A; N+1]); % EDITED to include missing end
skipLocs = find(~(da==1 | da==-N));
skipLength = da(skipLocs)-1;
skipLength(skipLength<0) = N + skipLength(skipLength<0) + 1;
firstSkipVal = A(skipLocs)+1;
patchFun = @(x,y)(0:y)'+x - (N+1)*(((0:y)'+x)>N);
patches = arrayfun(patchFun,firstSkipVal,skipLength-1,'uni',false);
locs = arrayfun(@(x,y)(x:x+y)',skipLocs+cumsum([A(1); skipLength(1:end-1)])+1,...
    skipLength-1,'uni',false);
Then putting them together, including any missing values at the beginning:
>> gapMap = [vertcat(patches{:}) vertcat(locs{:})-1]; % not including lead
>> gapMap = [repmat((0 : A(1) - 1)',1,2); gapMap] %' including lead
gapMap =
0 0
1 1
2 2
3 3
4 4
11 11
0 12
1 13
2 14
5 29
9 33
10 34
11 35
The first column contains the missing values. The second column is the 0-based location in the hypothetical full sequence.
>> Afull = repmat(0:N,1,n)
>> isequal(gapMap(:,1), Afull(gapMap(:,2)+1)')
ans =
1
Although this doesn't solve your problem completely, you can identify the position of missing values, or of groups of contiguous missing values, like this:
ind = 1+find(~ismember(diff(A),[1 -11]));
ind gives the position with respect to the current sequence A, not to the completed sequence.
For example, with
A =[00;01;02; 04;05;06;07;08;09;10;11;00;01;02;03; ;06;07;08;09;10;11];
this gives
>> ind = 1+find(~ismember(diff(A),[1 -11]))
ind =
4
16

trying to find the lowest average height in this .dat file of numbers

I'm trying to fit a swimming pool onto this piece of terrain. The first number in the file is the terrain size (10x10 in this case) and the last number is the size of the pool (2x2).
I've figured out how to read in the terrain and get its mean and standard deviation, but now I need to find the pool position with the lowest average height. I know I need to use a while loop, but I don't know how to go about it. Can anyone help me?
10
1 1 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 12 12 12
1 2 3 4 5 6 7 12 12 12
1 2 3 4 5 6 7 12 12 12
21
Here are two answers showing different styles. The first is faster (only important for HUGE terrain sizes), but less "Ruby-esque"; the second is more functional, but creates extra intermediary data. For your own best education, I encourage you to ensure that you understand these thoroughly, and choose how to proceed in a way that is best for you.
Also, I've assumed that the 21 you have in your question is a mistake, and you meant to have a 2 there.
First, both solutions start with the same code that creates an array of arrays for the terrain:
# Load the text file as an array of strings
lines = IO.readlines('pool.txt')
# Turn it into an array of arrays of numbers
terrain = lines.map{ |s| s.scan(/\d+/).map(&:to_i) }
# Throw out the silly grid size; we'll infer it from real data instead!
terrain.shift
# Take the last line (pool size) out of the terrain
pool_size = terrain.pop.first
The first solution walks through the terrain and calculates the average for each sub-grid, keeping track of the lowest number:
# For fun, we'll allow terrain that doesn't have to be square
rows = terrain.length
cols = terrain.first.length
best_size = Float::INFINITY
# upto is inclusive, so the last valid corner is at rows-pool_size
0.upto(rows-pool_size) do |y|
  0.upto(cols-pool_size) do |x|
    # x,y is the upper left corner of a valid pool_size × pool_size grid
    average = 0.0
    0.upto(pool_size-1) do |m|
      0.upto(pool_size-1) do |n|
        # Add up each point in the sub-grid
        average += terrain[y+n][x+m]
      end
    end
    # The number of points we added is the square of the size
    average /= (pool_size*pool_size)
    # Mark this as the best seen so far
    best_size = average if average < best_size
  end
end
p best_size
#=> 1.25
The second solution finds all the sub-grids, and then uses the Enumerable#min_by method to find the best. We also create a method for calculating the average on an array of numbers, just for fun and more self-describing code:
# See http://ruby-doc.org/stdlib-1.9.3/libdoc/matrix/rdoc/Matrix.html
require 'matrix'
class Matrix
  # Average all values in the array (as a float)
  def average
    parts = to_a.flatten
    parts.inject(:+) / parts.length.to_f
  end
end
# Hey look, a nice 2D grid of elevations!
terrain = Matrix[ *terrain ]
# Create an array of matrices, each one representing a possible pool
# (inclusive ranges, so pools touching the bottom and right edges count too)
rows = 0..(terrain.row_size - pool_size)
cols = 0..(terrain.column_size - pool_size)
pools = rows.flat_map{|x| cols.map{ |y| terrain.minor(x,pool_size,y,pool_size) } }
# Find the lowest pool by calling the above 'average' method on each
lowest = pools.min_by(&:average)
p lowest, lowest.average
#=> Matrix[[1, 1], [1, 2]]
#=> 1.25
On my computer the simple array-of-arrays method takes ~0.6s to find the lowest 3x3 pool in a random 400×400 terrain, while the matrix technique takes ~1.3s. So the matrix style is more than twice as slow, but still plenty fast for your assignment. :)
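If the terrain really were huge, a third option not used in either solution above is a summed-area table (2-D prefix sums), which gives each pool's total in constant time. Here is a minimal Python sketch under that assumption; the function name `lowest_average` and its return shape are invented for illustration.

```python
def lowest_average(terrain, k):
    """Return (average, row, col) of the k x k sub-grid with the lowest mean height."""
    rows, cols = len(terrain), len(terrain[0])
    if not 1 <= k <= min(rows, cols):
        raise ValueError("pool does not fit on the terrain")
    # prefix[r][c] holds the sum of terrain[0:r] restricted to columns 0:c
    prefix = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            prefix[r + 1][c + 1] = (terrain[r][c] + prefix[r][c + 1]
                                    + prefix[r + 1][c] - prefix[r][c])
    best = None
    for r in range(rows - k + 1):          # every valid upper-left corner
        for c in range(cols - k + 1):
            total = (prefix[r + k][c + k] - prefix[r][c + k]
                     - prefix[r + k][c] + prefix[r][c])
            if best is None or total < best[0]:
                best = (total, r, c)
    total, r, c = best
    return total / (k * k), r, c


terrain = ([[1, 1, 3, 4, 5, 6, 7, 8, 9, 10]]
           + [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10] for _ in range(6)]
           + [[1, 2, 3, 4, 5, 6, 7, 12, 12, 12] for _ in range(3)])
print(lowest_average(terrain, 2))  # (1.25, 0, 0)
```

For a 10x10 grid this is overkill; it only pays off when both the terrain and the pool are large.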
It's Ruby. You probably want to use iterators, not while loops.
But do your own homework. You'll learn more.
