When I run HPL with multiple values for some of its options, the benchmark performs multiple runs on the system. In my example:
multiple NBMIN
multiple BCAST
multiple DEPTH
etc.
When I then look at the single output file of the run, I don't see how to differentiate those runs. In my example, how do I know which configuration WR01R2C4, WR01R2C8, or WR03R2C4 corresponds to?
The output gives a clue with an encoded variant, but I couldn't find any information on how to decode it.
Does anybody know?
Here is a snippet of my output file...
(On another note: is there an option to highlight, i.e. make bold, text inside a code block on Stack Overflow?)
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 9000
NB : 640
PMAP : Row-major process mapping
P : 3
Q : 3
PFACT : Crout
NBMIN : 4 8
NDIV : 2
RFACT : Right
BCAST : 1ringM 2ringM
DEPTH : 0 1
SWAP : Mix (threshold = 60)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
--------------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR01R2C4 9000 640 3 3 9.42 5.1609e+01
HPL_pdgesv() start time Mon Nov 29 13:12:56 2021
HPL_pdgesv() end time Mon Nov 29 13:13:05 2021
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 2.34317645e-03 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR01R2C8 9000 640 3 3 9.35 5.2011e+01
HPL_pdgesv() start time Mon Nov 29 13:13:06 2021
HPL_pdgesv() end time Mon Nov 29 13:13:15 2021
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 2.50831382e-03 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR03R2C4 9000 640 3 3 9.32 5.2164e+01
HPL_pdgesv() start time Mon Nov 29 13:13:16 2021
HPL_pdgesv() end time Mon Nov 29 13:13:25 2021
If it isn't documented, just look at the source code. In testing/ptest/HPL_pdtest.c you'll find the following call:
HPL_fprintf( TEST->outfp,
"W%c%1d%c%c%1d%c%1d%12d %5d %5d %5d %18.2f %18.3e\n",
( GRID->order == HPL_ROW_MAJOR ? 'R' : 'C' ),
ALGO->depth, ctop, crfact, ALGO->nbdiv, cpfact, ALGO->nbmin,
N, NB, nprow, npcol, wtime[0], Gflops );
Hence, the format of the encoded variant is:
WR01R2C4
^^^^^^^^
||||||||
|||||||+--- NBMIN
||||||+---- PFACT (C = Crout, L = Left, R = Right)
|||||+----- NBDIV
||||+------ RFACT (see PFACT)
|||+------- BCAST (0 = 1ring, 1 = 1ringM, 2 = 2ring, 3 = 2ringM, 4 = long)
||+-------- DEPTH
|+--------- PMAP (R = Row-major, C = Column-major)
+---------- always W
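Putting that together, a small Python helper (my own sketch based on the format string above, not part of HPL itself) can decode such a string:

```python
# Decode an HPL "encoded variant" string such as "WR01R2C4".
# Field order follows the HPL_fprintf format string quoted above.
BCAST = {'0': '1ring', '1': '1ringM', '2': '2ring', '3': '2ringM', '4': 'long'}
FACT = {'L': 'Left', 'C': 'Crout', 'R': 'Right'}

def decode_variant(tv):
    assert tv[0] == 'W', "encoded variants always start with W"
    return {
        'PMAP':  'Row-major' if tv[1] == 'R' else 'Column-major',
        'DEPTH': int(tv[2]),
        'BCAST': BCAST[tv[3]],
        'RFACT': FACT[tv[4]],
        'NDIV':  int(tv[5]),
        'PFACT': FACT[tv[6]],
        'NBMIN': int(tv[7]),
    }

print(decode_variant("WR01R2C4"))
```

For WR01R2C4 this yields DEPTH 0, BCAST 1ringM, NBMIN 4; WR01R2C8 differs only in NBMIN (8), and WR03R2C4 only in BCAST (2ringM), matching the parameter lists in the file header.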
Original question:
I am relatively new to OpenACC, but have so far managed to accelerate my suite of iterative Fortran solvers (based on CG) with relative success, and am getting speed-ups of around 7 on my Nvidia GeForce RTX card. All works fine if no preconditioning is used, or if diagonal preconditioning is used. But problems start if I want to speed up slightly more sophisticated preconditioners, the incomplete LDL^T being my favorite.
The piece of code which performs incomplete LDL^T factorization looks like this:
20 do i = 1, n ! browse through rows
21 sum1 = a % val(a % dia(i)) ! take diagonal entry
22 do j = a % row(i), a % dia(i)-1 ! browse only lower triangular part
23 k = a % col(j) ! fetch the column
24 sum1 = sum1 - f % val(f % dia(k)) * a % val(j) * a % val(j)
25 end do
26
27 ! Keep only the diagonal from LDL decomposition
28 f % val(f % dia(i)) = 1.0 / sum1
29 end do
This piece of code is inherently sequential: as I browse through rows, I use the factorization performed in previous rows, so it is not easily parallelizable from the outset. The only way I found to compile it with OpenACC directives, keeping its sequential nature in mind, is:
20 !$acc parallel loop seq &
21 !$acc& present(a, a % row, a % col, a % dia, a % val) &
22 !$acc& present(f, f % row, f % col, f % dia, f % val)
23 do i = 1, n ! browse through rows
24 sum1 = a % val(a % dia(i)) ! take diagonal entry
25 !$acc loop vector reduction(+:sum1)
26 do j = a % row(i), a % dia(i)-1 ! browse only lower triangular part
27 k = a % col(j) ! fetch the column
28 sum1 = sum1 - f % val(f % dia(k)) * a % val(j) * a % val(j)
29 end do
30
31 ! Keep only the diagonal from LDL decomposition
32 f % val(f % dia(i)) = 1.0 / sum1
33 end do
Although these OpenACC directives keep calculations on the GPU and give the same results as the non-GPU variant, those of you more experienced than me will already see that there is not a lot of parallelism and speed-up here. The outer loop (i, through rows) is sequential, and the inner loop (j and k) browses through only a few elements.
Overall, the GPU variant of incomplete LDL^T preconditioned CG solver is several times slower than the non-GPU version.
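For reference, the recurrence the factorization loop computes can be written as a plain Python sketch (dense storage, for illustration only; the real code works on the sparse storage shown above). It makes the dependence explicit: row i needs the inverse diagonal entries of all earlier rows.

```python
# Dense sketch of the diagonal-only incomplete LDL^T recurrence:
# dinv[i] = 1 / (A[i][i] - sum_{k<i} dinv[k] * A[i][k]^2),
# restricted to the nonzero pattern of A.
def ildlt_diag(A):
    n = len(A)
    dinv = [0.0] * n                 # plays the role of f % val(f % dia(i))
    for i in range(n):
        s = A[i][i]                  # take diagonal entry
        for k in range(i):           # lower triangular part only
            if A[i][k] != 0.0:       # sparsity pattern limits the sum
                s -= dinv[k] * A[i][k] ** 2
        dinv[i] = 1.0 / s
    return dinv
```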
Does anyone have an idea how to work around this? I understand that it might be far from trivial, but maybe some work has already been done on this issue that I am unaware of, or maybe there is a better way to use OpenACC.
Update a couple of days later
I did my homework, read some articles on the Nvidia site and beyond, and yeah, the factorization algorithm is indeed sequential, and it seems to be an open question how to go about it on GPUs. In this write-up, https://docs.nvidia.com/cuda/incomplete-lu-cholesky/index.html, the author uses cuBLAS and cuSPARSE for CG, but still does the factorization on the CPU. The average speed-up he reports is around two, which is slightly underwhelming, I dare say.
So I decided on a workaround: coloring matrix rows and performing the factorization color by color, like this:
21 n_colors = 8
22
23 do color = 1, n_colors
24
25    c_low = (color-1) * n / n_colors + 1    ! color's lower bound
26    c_upp = color * n / n_colors            ! color's upper bound
27
28 do i = c_low, c_upp ! browse your color band
29 sum1 = a % val(a % dia(i)) ! take diagonal entry
30      do j = a % row(i), a % dia(i)-1       ! only lower triangular
31 k = a % col(j)
32 if(k >= c_low) then ! limit entries to your color
33 sum1 = sum1 - f % val(f % dia(k)) * a % val(j) * a % val(j)
34 end if
35 end do
36
37 ! This is only the diagonal from LDL decomposition
38 f % val(f % dia(i)) = 1.0 / sum1
39 end do
40 end do ! colors
With line 32 I am limiting matrix entries to the color band they belong to. Clearly, since the incompleteness of the factorized matrix is more pronounced (more entries are neglected), convergence properties are poorer, but still better than with a simple diagonal preconditioner.
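As a plain Python sketch of this coloring scheme (dense storage, illustration only, not the actual sparse Fortran code):

```python
# Block-coloring sketch: rows are split into n_colors contiguous bands,
# and inside band c only couplings to rows of the same band are kept,
# so the bands become mutually independent.
def ildlt_diag_colored(A, n_colors):
    n = len(A)
    dinv = [0.0] * n
    for color in range(n_colors):           # bands independent -> parallelizable
        c_low = color * n // n_colors       # band's lower bound (0-based)
        c_upp = (color + 1) * n // n_colors # band's upper bound
        for i in range(c_low, c_upp):       # still sequential inside the band
            s = A[i][i]
            for k in range(c_low, i):       # entries outside the band are dropped
                if A[i][k] != 0.0:
                    s -= dinv[k] * A[i][k] ** 2
            dinv[i] = 1.0 / s
    return dinv
```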
I did the following with OpenACC:
21 n_colors = 8
22
23 !$acc parallel loop num_gangs(8) tile(8) & ! do each tile in its own color
24 !$acc& present(a, a % row, a % col, a % dia, a % val) &
25 !$acc& present(f, f % row, f % col, f % dia, f % val)
26 do color = 1, n_colors
27
28    c_low = (color-1) * n / n_colors + 1    ! color's lower bound
29    c_upp = color * n / n_colors            ! color's upper bound
30
31 !$acc loop seq ! inherently sequential
32 do i = c_low, c_upp ! browse your color band
33 sum1 = a % val(a % dia(i)) ! take diagonal entry
34 !$acc loop vector reduction(+:sum1)
35      do j = a % row(i), a % dia(i)-1       ! only lower triangular
36 k = a % col(j)
37 if(k >= c_low) then ! limit entries to your color
38 sum1 = sum1 - f % val(f % dia(k)) * a % val(j) * a % val(j)
39 end if
40 end do
41
42 ! This is only the diagonal from LDL decomposition
43 f % val(f % dia(i)) = 1.0 / sum1
44 end do
45 end do ! colors
The results I obtain from the OpenACC variant are identical to those from the CPU; it's just that performance is still much slower on the GPU. Yet the tile directive seems to work as I expected: each gang appears to be working on its own color, since the results I get from the GPU are identical.
Any advice on how to improve performance? Am I doing something utterly silly in the above code? (The profiler shows that the computation is GPU-bound, since the entire system is on the GPU, but performance is really poor.)
Best regards
Not sure this will help, but this paper describes a method for solving CG on GPUs. It uses cuSPARSE and CUDA for the implementation, but you might be able to get ideas that could be applied to your code.
https://www.dcs.warwick.ac.uk/pmbs/pmbs14/PMBS14/Workshop_Schedule_files/8-CUDAHPCG.pdf
I have a list of items, a, b, c,..., each of which has a weight and a value.
The 'ordinary' Knapsack algorithm will find the selection of items that maximises the value of the selected items, whilst ensuring that the weight is below a given constraint.
The problem I have is slightly different. I wish to minimise the value (easy enough by using the reciprocal of the value), whilst ensuring that the weight is at least the value of the given constraint, not less than or equal to the constraint.
I have tried re-routing the idea through the ordinary Knapsack algorithm, but couldn't make it work. I was hoping there is another combinatorial algorithm that I am not aware of that does this.
In the German wiki it's formalized as:
finite set of objects U
w: weight-function
v: value-function
w: U -> R
v: U -> R
B in R # constraint rhs
Find subset K in U subject to:
sum( w(u) ) <= B | for all u in K
such that:
max sum( v(u) ) | all u in K
So there is no restriction like nonnegativity.
Just use negative weights, negative values and a negative B.
The basic concept is:
sum( w(u) ) <= B | all u in K
<->
-sum( w(u) ) >= -B | all u in K
So in your case:
classic constraint: x0 + x1 <= B | 3 + 7 <= 12 Y | 3 + 10 <= 12 N
becomes: -x0 - x1 <= -B |-3 - 7 <=-12 N |-3 - 10 <=-12 Y
So whether this is allowed depends on the software implementation. In terms of the optimization problem, there is no issue. The integer-programming formulation for your case is as natural as the classic one (and bounded).
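Aside: with nonnegative integer weights, this min-value / weight-at-least-B variant also has a direct dynamic program, if you'd rather avoid an IP solver (a sketch of mine, not from the wiki):

```python
# 0-1 knapsack variant: minimize total value subject to total weight >= B.
# dp[w] = minimum value achievable with total weight >= w (weights capped at B).
def min_value_cover(weights, values, B):
    INF = float('inf')
    dp = [INF] * (B + 1)
    dp[0] = 0
    for wt, val in zip(weights, values):
        for w in range(B, -1, -1):     # descending: each item used at most once
            if dp[w] < INF:
                nw = min(B, w + wt)    # any weight beyond B is as good as B
                if dp[w] + val < dp[nw]:
                    dp[nw] = dp[w] + val
    return dp[B]                       # INF means infeasible

print(min_value_cover([37, 43, 12, 8, 9], [11, 5, 15, 0, 16], 50))  # 5
```

For the same seeded instance as the demo in this answer, it also finds objective value 5.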
Python Demo based on Integer-Programming
Code
import numpy as np
import scipy.sparse as sp
from cylp.cy import CyClpSimplex
np.random.seed(1)
""" INSTANCE """
weight = np.random.randint(50, size = 5)
value = np.random.randint(50, size = 5)
capacity = 50
""" SOLVE """
n = weight.shape[0]
model = CyClpSimplex()
x = model.addVariable('x', n, isInt=True)
model.objective = value # MODIFICATION: default = minimize!
model += sp.eye(n) * x >= np.zeros(n) # could be improved
model += sp.eye(n) * x <= np.ones(n) # """
model += np.matrix(-weight) * x <= -capacity # MODIFICATION
cbcModel = model.getCbcModel()
cbcModel.logLevel = True
status = cbcModel.solve()
x_sol = np.array(cbcModel.primalVariableSolution['x'].round()).astype(int) # assumes existence
print("INSTANCE")
print(" weights: ", weight)
print(" values: ", value)
print(" capacity: ", capacity)
print("Solution")
print(x_sol)
print("sum weight: ", x_sol.dot(weight))
print("value: ", x_sol.dot(value))
Small remarks
This code is just a demo using a somewhat low-level library; there are other tools available which might be better suited (e.g. on Windows: pulp)
it's the classic integer-programming formulation from the wiki, modified as mentioned above
it will scale very well as the underlying solver is pretty good
as written, it's solving the 0-1 knapsack (only variable bounds would need to be changed)
Small look at the core-code:
# create model
model = CyClpSimplex()
# create one variable for each how-often-do-i-pick-this-item decision
# variable needs to be integer (or binary for 0-1 knapsack)
x = model.addVariable('x', n, isInt=True)
# the objective value of our IP: a linear-function
# cylp only needs the coefficients of this function: c0*x0 + c1*x1 + c2*x2...
# we only need our value vector
model.objective = value # MODIFICATION: default = minimize!
# WARNING: typically one should always use variable-bounds
# (cylp problems...)
# workaround: express bounds lower_bound <= var <= upper_bound as two constraints
# a constraint is an affine-expression
# sp.eye creates a sparse-diagonal with 1's
# example: sp.eye(3) * x >= 5
# 1 0 0 -> 1 * x0 + 0 * x1 + 0 * x2 >= 5
# 0 1 0 -> 0 * x0 + 1 * x1 + 0 * x2 >= 5
# 0 0 1 -> 0 * x0 + 0 * x1 + 1 * x2 >= 5
model += sp.eye(n) * x >= np.zeros(n) # could be improved
model += sp.eye(n) * x <= np.ones(n) # """
# cylp somewhat outdated: need numpy's matrix class
# apart from that it's just the weight-constraint as defined at wiki
# same affine-expression as above (but only a row-vector-like matrix)
model += np.matrix(-weight) * x <= -capacity # MODIFICATION
# internal conversion of type needed to treat it as IP (or else it would be an LP)
cbcModel = model.getCbcModel()
cbcModel.logLevel = True
status = cbcModel.solve()
# type-casting
x_sol = np.array(cbcModel.primalVariableSolution['x'].round()).astype(int)
Output
Welcome to the CBC MILP Solver
Version: 2.9.9
Build Date: Jan 15 2018
command line - ICbcModel -solve -quit (default strategy 1)
Continuous objective value is 4.88372 - 0.00 seconds
Cgl0004I processed model has 1 rows, 4 columns (4 integer (4 of which binary)) and 4 elements
Cutoff increment increased from 1e-05 to 0.9999
Cbc0038I Initial state - 0 integers unsatisfied sum - 0
Cbc0038I Solution found of 5
Cbc0038I Before mini branch and bound, 4 integers at bound fixed and 0 continuous
Cbc0038I Mini branch and bound did not improve solution (0.00 seconds)
Cbc0038I After 0.00 seconds - Feasibility pump exiting with objective of 5 - took 0.00 seconds
Cbc0012I Integer solution of 5 found by feasibility pump after 0 iterations and 0 nodes (0.00 seconds)
Cbc0001I Search completed - best objective 5, took 0 iterations and 0 nodes (0.00 seconds)
Cbc0035I Maximum depth 0, 0 variables fixed on reduced cost
Cuts at root node changed objective from 5 to 5
Probing was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Gomory was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Knapsack was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Clique was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
MixedIntegerRounding2 was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
FlowCover was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
TwoMirCuts was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Result - Optimal solution found
Objective value: 5.00000000
Enumerated nodes: 0
Total iterations: 0
Time (CPU seconds): 0.00
Time (Wallclock seconds): 0.00
Total time (CPU seconds): 0.00 (Wallclock seconds): 0.00
INSTANCE
weights: [37 43 12 8 9]
values: [11 5 15 0 16]
capacity: 50
Solution
[0 1 0 1 0]
sum weight: 51
value: 5
The title may not be correct; I was unsure how to phrase my question.
I am attempting to program, in Python 3.6, an asymmetric cipher similar to, I believe, the one used in RSA encrypted communication.
My understanding of the logic is as follows:
Person1 (p1) picks two prime numbers say 17 and 19
let p = 17 and q = 19
the product of these two numbers will be called n (n = p * q)
n = 323
p1 will then make public n
P1 will then make public another prime called e, e = 7
Person2(p2) wants to send p1 the letter H (72 in Ascii)
To do this p2 does the following ((72 ^ e) % n) and calls this value M
M = 13
p2 sends M to p1
p1 receives M and now needs to decrypt it
p1 can do this by calculating D where (e*D) % ((p-1)*(q-1)) = 1
In this example I know D = 247
With D, p1 can calculate p2's message using M^D % n
which successfully gives 72 ('H' in ASCII)
With this said the following rules must apply:
GCD(e,m) = 1
where m = ((p-1)*(q-1))
otherwise a D with (e*D) % ((p-1)*(q-1)) = 1 does not exist.
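For what it's worth, the toy walkthrough above can be checked in a few lines (note that pow(e, -1, m) for the modular inverse needs Python 3.8+; on 3.6 you'd compute D another way, e.g. via the extended Euclidean algorithm):

```python
# Sanity check of the toy RSA numbers above.
p, q, e = 17, 19, 7
n = p * q                  # 323, made public
m = (p - 1) * (q - 1)      # 288
M = pow(72, e, n)          # p2 encrypts 'H' (ASCII 72)
D = pow(e, -1, m)          # modular inverse of e mod m (Python 3.8+)
print(M, D, pow(M, D, n))  # 13 247 72
```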
Now comes my issue! :)
Calculating D when the numbers are not so easy to work with.
Now, please tell me if there is an easier way to calculate D, but this is where I got up to using online aid.
The example I looked at online used different values, so they are as follows:
p=47
q=71
n = p*q = 3337
(p-1)*(q-1) = 3220
e = 79
Now we must find D. We know (e*D) % ((p-1)*(q-1)) = 1
Therefore D = 79^-1 % 3220
The equation is rewritten as 79*d = 1 mod 3220
This is where I get confused
Using the regular Euclidean algorithm, gcd(79, 3220) must equal 1, or there may not actually be a solution (are my descriptions correct here?):
3220 = 40*79 + 60 (79 goes into 3220 40 times with remainder 60)
79 = 1*60 + 19 (The last remainder 60 goes into 79 once with r 19)
60 = 3*19 + 3 (The last remainder 19 goes into 60 three times with r 3)
19 = 6*3 + 1 (The last remainder 3 goes into 19 6 times with r 1)
3 = 3*1 + 0 (The last remainder 1 goes into 3 three times with r 0)
The last nonzero remainder is the gcd. Thus gcd(79,3220) = 1 (as it should be)
The last step here is where I do not know what on earth is happening.
I am told to write the gcd (one) as a linear combination of 79 and 3220 by working back up the tree...
1 = 19-6*3
= 19-6*(60-3*19)
= 19*19 - 6*60
= 19*(79-60) - 6*60
= 19*79 - 25*60
= 19*79 - 25*(3220-40*79)
= 1019*79 - 25*3220
After this I am left with 1019*79 - 25*3220 = 1, and if I mod both sides by 3220, I get 1019*79 = 1 mod 3220
(the term that contains 3220 goes away because 3220 = 0 mod 3220).
Thus d = 1019.
So, the problem is to unwind the following sequence:
3220 = 40*79 + 60
79 = 1*60 + 19
60 = 3*19 + 3
19 = 6*3 + 1
3 = 3*1 + 0
First, forget the last row and start from the one with the last non-zero remainder.
Then proceed step by step:
1 = 19 - 6*3 ; now expand 3
= 19 - 6*(60 - 3*19) = 19 - 6*60 + 18*19
= 19*19 - 6*60 ; now expand 19
= 19*(79 - 1*60) - 6*60 = 19*79 - 19*60 - 6*60
= 19*79 - 25*60 ; now expand 60
= 19*79 - 25*(3220 - 40*79) = 19*79 - 25*3220 + 1000*79
= 1019*79 - 25*3220 ; done
Note that the idea is to expand, at each step, the previous remainder. For instance, when expanding remainder 19 with: 79 - 1*60, you transform 19*19 - 6*60 into 19*(79 - 1*60) - 6*60. This lets you regroup around 79 and 60 and keep going.
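This back-substitution is exactly what the extended Euclidean algorithm automates; as a Python sketch:

```python
# Extended Euclidean algorithm: returns (g, x, y) with a*x + b*y == g == gcd(a, b).
def extended_gcd(a, b):
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    # b*x + (a % b)*y == g  rearranges to  a*y + b*(x - (a // b)*y) == g
    return g, y, x - (a // b) * y

def mod_inverse(e, m):
    g, x, _ = extended_gcd(e, m)
    if g != 1:
        raise ValueError("no inverse exists: gcd(e, m) != 1")
    return x % m

print(mod_inverse(79, 3220))  # 1019, matching the hand calculation
```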
I want to find the best match of a sequence of integers within an NxN matrix. The problem is that I don't know how to extract the position of this best match. The following code should calculate the edit distance, but I would like to know where in my grid that edit distance is smallest!
function res = searchWordDistance(word,grid)
% wordsize = length(word); % extract the actual size
% [x ,y] = find(word(1) == grid);
D(1,1,1)=0;
for i=2:length(word)+1
D(i,1,1) = D(i-1,1,1)+1;
end
for j=2:length(grid)
D(1,1,j) = D(1,1,j-1)+1;
D(1,j,1) = D(1,j-1,1)+1;
end
% inspect the grid for best match
for i=2:length(word)
for j=2:length(grid)
for z=2:length(grid)
if(word(i-1)==grid(j-1,z-1))
d = 0;
else
d=1;
end
c1=D(i-1,j-1,z-1)+d;
c2=D(i-1,j,z)+1;
c3=D(i,j-1,z-1)+1;
D(i,j,z) = min([c1 c2 c3]);
end
end
end
I have used this code (in one less dimension) to compare two strings.
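For reference, the one-less-dimension version mentioned here (ordinary edit distance between two strings) looks like this as a Python sketch of the standard Wagner-Fischer DP:

```python
def edit_distance(a, b):
    # Classic DP: D[i][j] = edit distance between a[:i] and b[:j].
    D = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        D[i][0] = i
    for j in range(1, len(b) + 1):
        D[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            D[i][j] = min(D[i - 1][j - 1] + cost,  # substitute / match
                          D[i - 1][j] + 1,         # delete
                          D[i][j - 1] + 1)         # insert
    return D[len(a)][len(b)]

print(edit_distance("kitten", "sitting"))  # 3
```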
EDIT: Using a 5x5 matrix as an example
15 17 19 20 22
14 8 1 15 24
11 4 17 3 2
14 2 1 14 8
19 23 5 1 22
Now, if I have the sequences [4,1,1] and [15,14,12,14], they should be found by the algorithm. The first one is a perfect match (the diagonal starting at (3,2)). The second one is in the first column and is the closest match for that sequence, since only one number is wrong.
I'm making a function that converts a number into a string with predefined characters. Original, I know. I started it because it seemed fun at the time, to do on my own. Well, it's frustrating and not fun.
I want it to be like binary in that any left character is worth more than its right neighbour. Binary is inefficient, because every bit has only one positive value. Xnary is efficient, because a 'bit' is never 0.
The character set (in this case): A - Z.
A = 1 ..
Z = 26
AA = 27 ..
AZ = 52
BA = 53 ..
BZ = 2 * 26 (B) + 26 * 1 (Z) = 78... Right?
ZZ = 26 * 26 (Z) + 26 * 1 (Z) = 702?? Right??
I found this here, but there AA is the same as A and AAA. The result of the function is never AA or AAA.
The string A is different from AA and AAA, however, so the number should be too (unlike binary 1, 01, 001, etc.). And since a longer string is always more valuable than a shorter one: A < AA < AAA.
Does this make sense? I've tried to explain it before and have failed. I've also tried to make it before. =)
The most important thing: since A < AA < AAA, the value of 'my' ABC is higher than the value of the other script. Another difference: my script doesn't exist, because I keep failing.
I've tried with this algorithm:
N = 1000, Size = 3, (because 26 log(1000) = 2.x), so use 676, 26 and 1 for positions:
N = 1000
P0 = 1000 / 676 = 1.x = 1 = A
N = 1000 - 1 * 676 = 324
P1 = 324 / 26 = 12.x = 12 = L
N = 324 - 12 * 26 = 12
P1 = 12 / 1 = 12 = L
1000 => ALL
Sounds fair? Apparently it's crap. Because:
N = 158760, Size = 4, so use 17576, 676, 26 and 1
P0 = 158760 / 17576 = 9.x = 9 = I
N = 158760 - 9 * 17576 = 576
P1 = 576 / 676 = 0.x = 0 <<< OOPS
If 1 is A (the very first of the xnary), what's 0? Impossible is what it is.
So this one is a bust. The other one (on jsFiddle) is also a bust, because A != AA != AAA and that's a fact.
So what have I been missing for a few long nights?
Oh BTW: if you don't like numbers, don't read this.
P.S. I've tried searching for similar questions, but none are similar enough. The one referenced is most similar, but 'faulty' IMO.
Also known as Excel column numbering. It's easier if we shift by one, A = 0, ..., Z = 25, AA = 26, ..., at least for the calculations. For your scheme, all that's needed then is a subtraction of 1 before converting to Xnary, and an addition of 1 after converting from it.
So, with that modification, let's start finding the conversion. First, how many symbols do we need to encode n? Well, there are 26 one-digit numbers, 26^2 two-digit numbers, 26^3 three-digit numbers etc. So the total of numbers using at most d digits is 26^1 + 26^2 + ... + 26^d. That is the start of a geometric series, we know a closed form for the sum, 26*(26^d - 1)/(26-1). So to encode n, we need d digits if
26*(26^(d-1)-1)/25 <= n < 26*(26^d-1)/25 // remember, A = 0 takes one 'digit'
or
26^(d-1) <= (25*n)/26 + 1 < 26^d
That is, we need d(n) = floor(log_26(25*n/26+1)) + 1 digits to encode n >= 0. Now we must subtract the total of numbers needing at most d(n) - 1 digits to find the position of n in the d(n)-digit numbers, let's call it p(n) = n - 26*(26^(d(n)-1)-1)/25. And the encoding of n is then simply a d(n)-digit base-26 encoding of p(n).
The conversion in the other direction is then a base-26 expansion followed by an addition of 26*(26^(d-1) - 1)/25.
So for N = 1000, we encode n = 999, log_26(25*999/26+1) = log_26(961.5769...) = 2.x, we need 3 digits.
p(999) = 999 - 702 = 297
297 = 0*26^2 + 11*26 + 11
999 = ALL
For N = 158760, n = 158759 and log_26(25*158759/26+1) = 3.66..., we need four digits
p(158759) = 158759 - 18278 = 140481
140481 = 7*26^3 + 25*26^2 + 21*26 + 3
158759 = H Z V D
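The same scheme can also be coded with the usual bijective base-26 trick (a Python sketch; A = 1, ..., Z = 26, AA = 27, equivalent to the shifted derivation above):

```python
# Bijective base-26 ("Excel column") conversion.
def to_xnary(n):
    s = ""
    while n > 0:
        n, r = divmod(n - 1, 26)   # the "-1" is the shift that avoids a zero digit
        s = chr(ord('A') + r) + s
    return s

def from_xnary(s):
    n = 0
    for ch in s:
        n = n * 26 + (ord(ch) - ord('A') + 1)
    return n

print(to_xnary(1000))     # ALL
print(to_xnary(158760))   # HZVD
print(from_xnary("ZZ"))   # 702
```

Both worked examples above (1000 -> ALL, 158760 -> HZVD) come out the same way, and from_xnary("ZZ") = 702 matches the question.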
This appears to be a very standard "implement conversion from base 10 to base N" where N happens to be 26, and you're using letters to represent all digits.
If you have A-Z as a 26ary value, you can represent 0 through (26 - 1) (like binary can represent 0 through (2 - 1)).
BZ = 1 * 26 + 25 *1 = 51
The analogue would be:
19 = 1 * 10 + 9 * 1 (1/B being the first non-zero character, and 9/Z being the largest digit possible).
You basically have the right idea, but you need to shift it so A = 0, not A = 1. Then everything should work relatively sanely.
In the lengthy answer by Daniel I see a call to log(), which is a red flag for performance. Here is a simple way without much complex math:
function excelize(colNum) {
  // Count how many symbols the result needs (order) and how many shorter
  // strings come before it (sub): 26 one-symbol, 26^2 two-symbol, ...
  var order = 1, sub = 0, count = 26;
  while (colNum > sub + count) {
    sub += count;
    count *= 26;
    order++;
  }
  // Offset within the block of order-symbol strings, as a plain base-26 number
  var digits = Number(colNum - sub - 1).toString(26);
  // Shift the base-26 digits 0-9a-p up to a-z, padding with 'a' (= digit 0)
  var symbols = "0123456789abcdefghijklmnopqrstuvwxyz";
  var tr = c => symbols[symbols.indexOf(c) + 10];
  return digits.split('').map(tr).join('').padStart(order, 'a');
}
Explanation:
Since this is not plain base 26, for each additional symbol ("digit") we have to subtract the count of all shorter strings. So we first count the number of symbols (order) of the result, accumulating that subtrahend (sub) at the same time. Then we convert the remainder to base 26 and shift the digit symbols from 0-p to a-z.