Criteria for choosing mod value


I have an array Array = {} of size n.
My constraints are:
n <= 100000
and Array[i] <= 100
I have to find the product of all the elements in the array. I will be given a mod value with which I have to mod the product. The mod value changes all the time, and it is always less than or equal to n.
My problem is that I chose a global mod value, say R = 1000000000 (which is far bigger than the mod constraint), and whenever my product exceeds this value, I mod the result by R.
But I don't know why the result I'm obtaining is zero.
My question is: how do I choose R in such situations?

I don't know your code, but it is likely that 0 is the correct result.
Pick a large prime for R and make sure that none of the elements is divisible by it in order to get a result different from 0.

You haven't shown us your code, but presumably it looks something like the following pseudo-Python code:
    limit = 1000000
    def product_mod( array, m ):
        product = 1
        for k in array:
            product = product * k
            if product > limit: product = product % m
        return product % m
This algorithm should work, provided that the limit is low enough that product * k can never overflow. If it doesn't, you probably have a bug in your code.
However, note that it's quite likely that the result of this function may often be legitimately zero: specifically, this will happen whenever the modulus evenly divides the product of the numbers in your array. Since the product will typically be a highly composite number (it will have at least as many factors as there are numbers in the array), this is pretty likely.
In particular, the output of the function will be zero whenever:
any one of the numbers in the array is zero,
any one of the numbers in the array is a multiple of the modulus, or
the product of any subset of the numbers in the array is a multiple of the modulus.
In all those cases, the product of all the numbers in the array will be either zero or a multiple of the modulus (and will thus reduce to zero).

It sounds like you are calculating the product of the values in the array modulo some given value, but are using another value to limit integer overflow in the intermediate calculations. In that case you have a high risk of getting a wrong result.
E.g. 120 mod 9 = 3 while (120 mod 100) mod 9 = 20 mod 9 = 2
The correct procedure is then to do all the calculations modulo the same number as you are to use for the final result since (a * b) mod n = (a mod n) * (b mod n) mod n
E.g. (24 * 5) mod 9 = (24 mod 9) * (5 mod 9) mod 9 = (6 * 5) mod 9 = 30 mod 9 = 3
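A minimal Python sketch of that procedure, assuming the array holds non-negative integers and the modulus m is at least 1. The name product_mod is borrowed from the pseudo-code in the previous answer, and the single modulus m replaces the separate global R:

```python
def product_mod(values, m):
    """Return the product of all values modulo m, reducing after every step."""
    result = 1 % m  # 1 % m also covers m == 1, where every result is 0
    for v in values:
        # (a * b) mod m == ((a mod m) * (b mod m)) mod m
        result = (result * (v % m)) % m
    return result


print(product_mod([24, 5], 9))  # 3, matching the worked example above
```

Because every intermediate value is reduced modulo m, no intermediate product ever exceeds (m - 1)^2, which for m <= 100000 fits comfortably in a 64-bit integer in languages with fixed-width types.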

Related

Given an integer array (of size N) and a number M, find product of N-1 elements of the array modulo M

Let's say you are given an array A of N integers and another integer M. For any given index i where 0 <= i < N, hide the ith index of A and return the product of all other elements of A modulo M.
For example, say A = {1, 2, 3, 4, 5} and M=100 then for i=1, the result would be (1x3x4x5) mod 100. Hence the result is 60.
Assume that all integers are 32 bit unsigned integers.
Now an obvious approach to do this is to calculate the result for any given value of i. That would mean N-1 multiplications for every given value of i. Is there a more optimal way to do this?
P.S.
First idea would be to store the product of all numbers in A (let's call this total). Now for every given value of i, we can just divide total by A[i] and return the result after taking the modulo. However, the total would cause an overflow so this cannot be done.

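Easy...:)
    // use a 64-bit type for left/right so the products cannot overflow before the % M
    left[0]=a[0]%M;
    for(int i=1;i<=n-1;i++)
        left[i]=(left[i-1]*a[i])%M;
    right[n-1]=a[n-1]%M;
    for(int i=n-2;i>=0;i--)
        right[i]=(right[i+1]*a[i])%M;   // builds on right[i+1], the element to the right
    for query q
        if(q==0)
            return right[1]%M;
        if(q==n-1)
            return left[n-2]%M;
        return (left[q-1]*right[q+1])%M;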
Suppose there is an array of 5 elements (indexed from 1 in this example).
Now
index: 1   2   3   4   5
value: 1   5   2   10  4
Now for query q=3
answer is = ((1*5) * (10*4))%M
for query q=4
answer is = ((1*5*2)*(4))%M
We are basically pre computing all the left and right multiplication
index: 1   2   3   4   5
value: 1   5   2   10  4
left:  1   5   10  100 400
right: 400 400 80  40  4
For q=3 answer is left[2]*right[4]= (5*40)%M= 200%M
For q=4 answer is left[3]*right[5]= (10*4)%M= 40%M
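The precomputation can be sketched in Python as follows. The function names build_tables and product_except are hypothetical, indices are 0-based here, and each table carries one extra sentinel entry equal to 1 so that the first and last positions need no special case:

```python
def build_tables(a, m):
    """Prefix and suffix products of a, each reduced modulo m."""
    n = len(a)
    left = [1] * (n + 1)   # left[i]  = product of a[0..i-1] mod m
    right = [1] * (n + 1)  # right[i] = product of a[i..n-1] mod m
    for i in range(n):
        left[i + 1] = (left[i] * a[i]) % m
    for i in range(n - 1, -1, -1):
        right[i] = (right[i + 1] * a[i]) % m
    return left, right


def product_except(left, right, q, m):
    """Product of every element except the one at 0-based index q, modulo m."""
    return (left[q] * right[q + 1]) % m


left, right = build_tables([1, 5, 2, 10, 4], 1000)
print(product_except(left, right, 2, 1000))  # 200 (query q=3 in the 1-based example)
print(product_except(left, right, 3, 1000))  # 40  (query q=4 in the 1-based example)
```

Building the tables takes O(N) multiplications once, after which every query costs a single multiplication.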

For this answer, I'm assuming that this is not a ONE-TIME calculation, but it is something that can take place many times with different values of i.
First, define a non-volatile array to hold calculated products.
Then, whenever the function is invoked with a given pair of parameters (M and i):
Check in the array (of above) if the product was calculated,
If yes, simply use the stored value, calculate the MOD and return the result,
If not, calculate the product, store it, calculate the MOD and return the value.
This method spares you from having a (potentially long) initialization which might calculate products that would not be needed.

How can I minimise number of additions?

Multiply two numbers without using * operator, and with minimum number of additions
For example, if the input is 5*8, one way would be to add the bigger number smaller number of times, and that will be the answer. But how can I minimise the number of additions?

One strategy to reduce the number of additions is to add things hierarchically. This is the same strategy that is used in the classic power algorithm, which follows the same technique for minimizing the number of multiplications.
Let's say you need
M = a * 8 = a + a + a + a + a + a + a + a
Once you calculate m2 = a + a, you can substitute it into the above addition and get
M = m2 + m2 + m2 + m2
Then you can calculate m4 = m2 + m2 and arrive at
M = m4 + m4
So, the result is calculated in 3 additions instead of the original 7. However, adding a value to itself can be replaced by a left-shift by 1 bit (if this is allowed), thus greatly reducing the number of additions.
This technique can be elegantly implemented through analyzing the binary representation of one of the multiplicands (exactly as it is typically implemented in the power algorithm). E.g. if you need to calculate a * b you can do it in this fashion
    int M = 0;
    for (int m = a; b != 0; b >>= 1, m <<= 1)
        if ((b & 1) != 0)
            M += m;
The total number of additions such implementation will use is the total number of 1 bits in b. It will multiply 5 by 8 in 1 addition.
Note that in order to achieve the lowest number of additions provided by this strategy, multiplying the larger number by the smaller number is not necessarily the best idea. E.g. multiplying by 8 uses fewer additions than multiplying by 5.
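To see the effect of the choice of multiplier, here is a small Python sketch of the same shift-and-add loop that also reports how many additions went into the result. It assumes non-negative integers; the function name multiply and the counter are illustrative only:

```python
def multiply(a, b):
    """Multiply non-negative integers a and b using shifts and additions only.

    Returns (product, additions), where additions counts the `result += m`
    steps. The counter update itself is bookkeeping and is not counted.
    """
    result = 0
    additions = 0
    m = a
    while b != 0:
        if b & 1:          # lowest bit of b is set: this multiple of a is needed
            result += m
            additions += 1
        b >>= 1            # move on to the next bit of b
        m <<= 1            # m is now twice as large
    return result, additions


print(multiply(5, 8))  # (40, 1): 8 has a single 1 bit
print(multiply(8, 5))  # (40, 2): 5 has two 1 bits
```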

A better example will be 5 * 7. This is essentially the binary multiplication using old methods, but with clever choice of the multiplier.
If we can use left-shift and that doesn't count as an addition: choose the number with the smaller number of set bits as the multiplier. This will be 5 in this case.
     111
  x  101
  ------
     111
    000x    <== This is not an addition, only a left shift
   111xx
  ------
  100011    <== 2 additions totally.
  ------
If we cannot use left-shift: note that left shift is the same as doubling / additions. Then we will have to use a slightly different tactic. Since the multiplicand will be shifted the same number of times as the (position of MSB - 1), the number of additions will be the number with the lesser value of (position of MSB - 1) + (number of bits set). In the case of 5 * 8, the values are (3-1) + 2 = 4 and (4-1) = 3 respectively. The lesser is for 8 and hence use that as the multiplier.
     101
  x 1000
  ------
     000
    000x    <== left shift
   000xx    <== left shift
  101xxx    <== left shift
  ------
  101000    <== no addition needed, so 3 additions totally.
  ------
The above has three shifts and zero additions.

I like Codor's suggestion of using shifts and having zero additions!
But if you can truly only use additions and no other operations like shifts, logs, subtractions, etc, I believe the minimal number of additions to compute a * b will be:
min{ceil[log2(a+1)] + numbits(a), ceil[log2(b+1)] + numbits(b)} - 2
where
numbits(n) is the number of ones in the binary representation of
integer n
For example, numbits(4)=1, numbits(5)=2, etc.
ceil[x] is x rounded up to the nearest integer, so ceil[log2(n+1)] is the number of binary digits of n
For example, ceil[2.1]=3
Now, how did we get there? First look at your original example. You can at least group additions together. E.g.
8+8=16
16+16=32
32+8=40
To generalize this, if you need to multiply a by b using only additions of a or of results already computed, you need:
ceil[log2(b+1)]-1 additions to compute all the 2^n.a intermediate numbers you need.
In your example, ceil[log2(5+1)]-1 = 2: you need 2 additions to compute 16 and 32
numbits(b)-1 additions to add all intermediate results together, where numbits(b) is the number of ones in the binary representation of b.
In your example, 5 = 2^2 + 2^0 so numbits(5)-1 = 1: you need 1 addition to do 32 + 8
Interestingly, this means that your statement
add the bigger number smaller number of times
is not always the recipe to minimize the number of additions.
For example, if you need to compute 2^9 * (2^9 - 1), you are better off computing additions based on (2^9-1) than on 2^9 even though 2^9 is larger. The fastest approach is:
x = (2^9-1) + (2^9-1)
And then
x = x+x
8 times for a total of 9 additions.
If instead you added 2^9 to itself, you would need 8 additions to get all the 2^k*2^9 first and then an additional 8 additions to add all these numbers together for a total of 16 additions.
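The counts in this answer can be checked with a short Python sketch. It counts the additions used by the double-then-combine strategy described above (it does not try to prove that no shorter sequence of additions exists). The function names are hypothetical, and b.bit_length() is used for the number of binary digits of b:

```python
def additions_needed(b):
    """Additions used to compute a * b (b >= 1) by doubling a and summing."""
    doublings = b.bit_length() - 1       # a+a, 2a+2a, ... up to the top bit of b
    combines = bin(b).count("1") - 1     # adding the selected multiples together
    return doublings + combines


def best_choice(a, b):
    """Fewest additions over both choices of which number acts as the multiplier."""
    return min(additions_needed(a), additions_needed(b))


print(additions_needed(5))           # 3: 8+8, 16+16, 32+8
print(additions_needed(2**9))        # 9: doubling (2^9 - 1) nine times
print(additions_needed(2**9 - 1))    # 16: 8 doublings of 2^9 plus 8 combining additions
```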

Suppose a is to be multiplied by b and we are storing the result in res. We add a to res only if b is odd, and in every step we halve b and double a. This is done in a loop until b becomes 0. The doubling and halving can be done using bitwise shift operators.
Let the two given numbers be 'a' and 'b'
1) Initialize result 'res' as 0.
2) Do following while 'b' is greater than 0
a) If 'b' is odd, add 'a' to 'res'
b) Double 'a' and halve 'b'
3) Return 'res'.

Linear Hashing calculation?

I am currently studying for my exams and have come up against this question:
(5d) Suppose we are using linear hashing, and start with an empty table with 2 buckets (M = 2), split = 0 and a load factor of 0.9. Explain the steps we go through when the following hashes are added (in order):
5,7,12,11,9
The answer provided for this is:
*— —5— (0,1)
* — —5,7 —
split —*—5,7— — (0,1,2)
—12*—5,7— — —
split —12—5—*—7— (0,1,2,3)
split =M, M = 2*M, split = 0
*—12—5— —7—
*—12—5— —7,11—
split —*—5— —7,11—12— (0,1,2,3,4)
—*—5,9— —7,11—12—
split — —9*— —7,11—12—5— (0,1,2,3,4,5)
This answer doesn't make any sense to me and the lecturer did not go through this.
How do I tackle this question?

I edited your question because the answer looks like a list of descriptions of the hash table state as each operation is performed. Did your professor cover linear hashing at all? The Wikipedia description doesn't mention a load factor precisely, but it's in the original LH paper by Witold Litwin. It's integral to when a controlled split occurs. I also found these descriptions:
Let l denote the Linear Hashing scheme’s load factor, i.e., l = S/b where S is the total number of records and b is the number of buckets used.
Linear Hashing by Zhang, et al (PDF)
The linear hashing algorithm performs splits in a deterministic order, rather than splitting at a bucket that overflowed. The splits are performed in linear order (bucket 0 first, then bucket 1, then 2, ...), and a split is performed when any bucket overflows. If the bucket that overflows is not the bucket that is split (which is the common case), overflow techniques such as chaining are used, but the common case is that few overflow buckets are needed.
snip
Instead of splitting on every collision, you can do a split when the "load" (which is bytes stored / (num buckets * bucket size), i.e. utilization of the data structure) crosses some watermark. This is called controlled splitting; the previously described is called uncontrolled splitting.
Linear Hashing: A new Tool for File and Table Addressing Witold Litwin, Summary by: Steve Gribble and Armando Fox, Online Berkley.edu retrieved June 16
So basically, a load factor is a means of predictably controlling when a split will occur. One variant of linear hashing appears to be called 'uncontrolled split', which adds a new bucket and performs a split whenever a collision occurs. Using a load factor of 0.9 only has a split occur when 90% of the table's buckets are full - or rather, would be full based on the prediction that items are assigned to buckets uniformly.
Based on this and the Wikipedia article I just read, the setup is this:
Table is initially empty with two buckets (N = 2) - - (numbered 0 and 1)
N for number of buckets makes so much more sense to me than M, so I'm using that in my answer.
Apparently N is never changed even as new buckets are added to the table.
Our growth factor (L for bucket level) is 0. It is incremented every time every bucket in the table has been split once, which coincides with when our table has doubled in size.
Step pointer S (also called a split pointer) points to 0th bucket. It indicates which bucket will have a split applied to it next.
This follows the wikipedia article description I linked to above. Now we need to cover the hash and bucket assignment.
A decent hash function for integers you expect to have a normal distribution is to just use the integer itself. So for an input integer I, our hash H(I) is just I. I think this follows the answer key, which is good because the question is unanswerable without knowing H.
To determine which bucket an integer I is added to, one of two function values will be used, depending on whether or not the assignment points to before or after S.
First, calculate H(I) mod (N x 2^L), which is really just I mod (N x 2^L). I'm going to call this B(I) below for brevity (also for bucket). Call this the assignment address A.
If A is greater than or equal to S, we assign input I to address A and move on.
If A (B(I)) is less than S, we actually use a different hash function, I'll call B'(I), which is calculated as I mod (N x 2^(L+1)), giving us an actual assignment address of A'.
I think the reasoning for this is to keep the assignment to buckets more even as buckets are split along the way, but I don't have the mathematical proof of its importance.
I think the * in the answer's notation above denotes the location of the split pointer S. In my notation for the rest of the question below:
Let - denote an empty bucket, i denote a bucket with the Integer i in it, and i,j denote a bucket with both i and j in it.
So the first step of your answer key "— —5— (0,1)" is saying bucket 0 is empty and bucket 1 has 5 in it. I would rewrite this as - 5 for clarity.
I'm thinking your answer breakdown reads like this:
Add 5 to the table.
The linear hashing algorithm puts it into the second bucket (index 1) because:
B(5) = 5 mod (2 x 2^0) = 5 mod (2 x 1) = 5 mod 2 = 1
1 is greater than S, which is still 0, so we use 1 as the address.
Table now has - 5 (0th bucket empty, 1st bucket with 5 in it).
N, L, and S are unchanged
Add 7 to the table.
B(7) = 7 mod 2 = 1, so 7 is added to the same bucket as 5. S still hasn't changed, so again 1 is used as the address.
Table now has - 5,7
A split occurs! Not because a bucket has overflowed, but because the load factor has been exceeded. 2 items added, 2 total buckets, 2/2 = 1.0 > 0.9 = do a split.
First a new bucket is added at the end of the table.
S is incremented to 1. N is not incremented. L is unchanged
The split is done on a bucket. A split means all the items in the bucket get their assignment recalculated based on the new hash table size. However, one key to linear hashing is that the actual buckets are split in order, so the 0th bucket is split even though the 1st bucket is the one thats full.
Post split, the table is now - 5,7 -, with buckets 0 and 2 empty, and 1 still with 5 and 7 in it.
Add 12 to the table.
B(12) = 12 mod (2 x 2^0) = 12 mod 2 = 0
S is 1 and B(12) is 0, so we calculate B'(12) instead for our address.
Coincidentally, this is 12 mod (2 x 2^(0+1)) = 12 mod 4, which is still 0, so 12 is added to the 0th bucket.
Table now has 12 5,7 -, only the 3rd, new bucket is empty.
A split occurs again, because 3/3 = 1.0 > 0.9. This split promises to be more interesting than the last!
A new bucket is added to the end of the table, giving us 12 5,7 - -
S = 1, so the bucket with 5,7 is split. That means new buckets are picked for 5 and 7.
Increment S to 2. This is done after the split target bucket is picked, but before the new buckets are assigned. This ensures the new table is more evenly distributed (again, my supposition, don't have proof).
5 mod 2 = 1, 1 < S, calculate 5 mod (2 x 2^1) = 5 mod 4 = 1. 5 is re-assigned to its same bucket.
7 mod 2 = 1, 1 < S, calculate 7 mod (2 x 2^1) = 7 mod 4 = 3. 7 is re-assigned to 3.
Table now has 12 5 - 7
S = 2, N still equals 2, and L still = 0. S has now reached N x 2^L = 2 x 2^0 = 2, so S is reset to 0 and L is incremented to 1.
Add 11 to the table.
B(11) = 11 mod (2 x 2^1) = 11 mod 4 = 3. 11 is assigned to the 3rd bucket.
Table now has 12 5 - 7,11, 4 items and 4 buckets, so a split occurs again.
S is 0 again, so the 0th bucket with 12 is reassigned after a new bucket is added. S is incremented to 1 before choosing a new bucket for 12.
B(12) = 12 mod (2 x 2^1) = 12 mod 4 = 0. 0 < 1, so recalculate
B'(12) = 12 mod (2 x 2^(1+1)) = 12 mod 8 = 4. 12 is assigned to the 4th bucket.
Table now contains - 5 - 7,11 12
Add 9 to the table.
I'll leave the steps to the last one for you. There are a few nuances to the LH algorithm that I'm not quite grasping. I might ask additional questions about them. But hopefully that's enough for you to get going on. In the future, I would recommend asking the course instructor directly.
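The procedure walked through above can be sketched as a small Python simulation. This is only an illustration under the same assumptions as the walkthrough: H(I) = I, at most one controlled split per insertion, and a split whenever items / buckets exceeds the load factor. The names n, level and split stand for N, L and S; the function name and the history list are hypothetical:

```python
def linear_hash_insert_all(keys, n=2, max_load=0.9):
    """Insert keys one by one and return the bucket layout after each insert."""
    buckets = [[] for _ in range(n)]
    level, split = 0, 0      # L and S in the walkthrough
    count = 0                # number of stored items
    history = []

    def address(key, level, split):
        a = key % (n * 2 ** level)              # B(I)
        if a < split:                           # that bucket was already split
            a = key % (n * 2 ** (level + 1))    # B'(I)
        return a

    for key in keys:
        buckets[address(key, level, split)].append(key)
        count += 1
        if count / len(buckets) > max_load:     # controlled split
            buckets.append([])                  # new bucket at the end
            old, buckets[split] = buckets[split], []
            split += 1                          # advance S before re-assigning
            for k in old:
                buckets[address(k, level, split)].append(k)
            if split == n * 2 ** level:         # every bucket of this round is split
                split, level = 0, level + 1
        history.append([list(b) for b in buckets])
    return history


for state in linear_hash_insert_all([5, 7, 12, 11, 9]):
    print(state)
```

Each printed line is the table after one insertion (including the split it triggers, if any), so the output can be compared row by row with the states in the answer key.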

Encode number to a result

In my app I need to run a 5 digits number through an algorithm and return a number between the given interval, ie:
The function encode gets 3 parameters: the 5-digit initial number, the interval's lower limit and the interval's upper limit, for example:
int res=encode(12879,10,100) returns 83.
The function starts from 12879 and does something with the numbers and returns a number between 10 and 100. This mustn't be random, every time I pass the number 12879 to the encode function must always return the same number.
Any ideas?
Thanks,
Direz

One possible approach:
compute the range of your interval R = (100 - 10) + 1
compute a hash modulo R of the input H = hash(12879) % R
add the lower bound to the modular hash V = 10 + H
Here's the thing, though - you haven't defined any constraints or requirements on the "algorithm" that produces the result. If all you want is to map a value into a given range (without any knowledge of the distribution of the input, or how input values may cluster, etc), you could just as easily take the input modulo the range without hashing (as Foo Bah demonstrates).
If there are certain constraints, requirements, or distributions of the input or output of your encode method, then the approach may need to be quite different. However, you are the only one who knows what additional requirements you have.
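The three steps above can be sketched in Python as follows. The choice of SHA-256 as the hash is an assumption made here for illustration (the question does not specify one), so the result for 12879 will not necessarily be 83; the only guarantees are that the same input always yields the same output and that the output lies in the inclusive interval:

```python
import hashlib


def encode(number, low, high):
    """Map number deterministically into the inclusive interval [low, high]."""
    size = (high - low) + 1                           # R: how many values the interval holds
    digest = hashlib.sha256(str(number).encode()).digest()
    h = int.from_bytes(digest, "big") % size          # H: hash modulo R
    return low + h                                    # V: shift into the interval


res = encode(12879, 10, 100)
print(res)  # some fixed value between 10 and 100, the same on every run
```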

You can do something simple like
encode(x,y,z) --> y + (x mod (z-y))

You don't have an upper limit for this function?
Assume it is 99999 because it is 5 digits. For your case, the simplest way is:
int encode (double N, double L, double H)
{
    // N is the input (10000..99999), L the lower limit, H the upper limit
    return (int)(((H - L) / (99999 - 10000)) * (N - 10000) + L);
}

Derive integer factors of float value?

I have a difficult mathematical question that is breaking my brain, my whiteboard, and all my pens. I am working with a file that expresses 2 values, a multiplicand and a percentage. Both of those values must be integers. These two values are multiplied together to produce a range. Range is a float value.
My users edit the range, and I have to calculate a new percentage and multiplicand value. Confused yet? Here's an example:
Multiplicand: 25000 Apples
Percentage: 400 (This works out to .4% or .004)
Range: 100.0 Apples (Calculated by Multiplicand * Percentage)
To complicate things, the allowable values for Percentage are 0-100000. (Meaning 0-100%) Multiplicand is a value between 1 and 32bit int max (presumably unsigned).
I need to allow for users to input a range, like so:
Range: .04 Apples
And calculate the appropriate Percentage and Multiplicand. Using the first example:
OriginalMultiplicand: 25000 Apples
OriginalPercentage: 400 (This works out to .4% or .004)
OriginalRange: 100.0 Apples (Calculated by Multiplicand * Percentage)
NewRange: .01 Apples
NewPercentage: 40
NewMultiplicand: 25 Apples
The example calculation is easy: all that was required was adjusting the multiplicand and percentage down by the scale factor between the new and old range. The problem arises when the user changes the value to something like 1400.00555. Suddenly I don't have a clean way to adjust the two values.
I need an algorithmic approach to getting values for M & P that produce the closest possible value to the desired range. Any suggestions?

To maximize the number of decimal places stored, you should use a P of 1, i.e. 0.001% (0.00001) on the question's scale, where 100000 means 100%. If that overflows M, then increment P.
So for your example of 1400.00555, P is 1 and M is 140000555.
Your algorithm would search for the lowest P such that M does not overflow. And you can do a binary search here.
    // Returns the lowest P in [P0, P1] for which M = R/(P/100000.0) fits in a 32-bit int.
    public int binarySearch(int P0, int P1) {
        if (P0 >= P1) {
            return P0;
        }
        int P = P0 + (P1 - P0)/2;   // midpoint of the current interval
        if (R/(P/100000.0) does not overflow a 32-bit int) {
            return binarySearch(P0, P);       // P works, look for a smaller one
        } else {
            return binarySearch(P + 1, P1);   // P is too small, M would overflow
        }
    }
    P = binarySearch(1, 100000);
    M = round(R/(P/100000.0));
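Since M shrinks as P grows, the lowest workable P can also be computed directly. The Python sketch below assumes the question's scale (a percentage value of 100000 means 100%, so R = M * P / 100000) and an unsigned 32-bit multiplicand; the constant and function names are hypothetical:

```python
import math

MAX_M = 2**32 - 1   # assumed unsigned 32-bit limit for the multiplicand
SCALE = 100000      # a percentage value of 100000 means 100%


def lowest_percentage(r):
    """Return (P, M) with the smallest P in 1..SCALE for which M fits in 32 bits."""
    target = r * SCALE                       # M * P should be as close to this as possible
    p = max(1, math.ceil(target / MAX_M))    # smallest P that keeps target / P <= MAX_M
    if p > SCALE:
        raise ValueError("range is too large to represent")
    m = round(target / p)
    return p, m


print(lowest_percentage(1400.00555))  # (1, 140000555)
```

For very small ranges the rounded M can come out as 0, which the question's limits (M >= 1) do not allow; that case would need separate handling.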

(I had a bad method here, but I erased it because it sucked.)
EDIT:
There's got to be a better way than that. Let's rephrase the problem:
What you have is an arbitrary floating-point number. You want to represent this floating-point number with two integers. The integers, when multiplied together and then divided by 100000.0, are equal to the floating-point number. The only other constraint is that one of the integers must be equal to or less than 100000.
It's clear that you can't actually represent floating-point numbers accurately. In fact, you can ONLY represent numbers that are expressible in 1/100000s accurately, even if you have an infinite number of digits of precision in "multiplicand". You can represent 333.33333 accurately, with 33333333 as one number and 1 as the other; you just can't get any more 3s.
Given this limitation, I think your best bet is the following:
Multiply your float by 100000 in an integer format, probably a long or some variant of BigNumber.
Factor it. Record all the factors. It doesn't matter if you store them as 2^3 or 2*2*2 or what.
Grab as many factors as you can without the multiplication of them all exceeding 100000. That becomes your percent. (Don't try to do this perfectly; finding the optimal solution is an NP-hard problem.)
Take the rest of the factors and multiply them together. That's your multiplicand.
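A rough Python sketch of these four steps, assuming a positive range whose scaled value rounds to at least 1. Trial division and the largest-first greedy pick are simple stand-ins chosen for illustration, not an optimal selection, and the function names are hypothetical:

```python
def prime_factors(n):
    """Prime factors of n (with repetition) by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def split_range(r, scale=100000):
    """Split round(r * scale) into (percent, multiplicand) with percent <= scale."""
    target = round(r * scale)
    percent, multiplicand = 1, 1
    # Greedy: try the largest factors first, keep a factor in percent if it still fits.
    for f in sorted(prime_factors(target), reverse=True):
        if percent * f <= scale:
            percent *= f
        else:
            multiplicand *= f
    return percent, multiplicand


print(split_range(100.0))  # (78125, 128), since 78125 * 128 = 10000000
```

The multiplicand produced this way still has to be checked against the 32-bit limit; the sketch does not enforce it.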

As I understand from your example, you could represent the range in 100000 different multiplicand * percentage. any choice of multiplicand will give you a satisfying value of percentage, and vice versa. So you have this equation in two variables:
Multiplicand * Percentage = 100.0
You should figure out another equation(constraint), to get a specific value of Multiplicand OR Percentage to solve this equation. Otherwise, you could choose Percentage to be any number between 0-100000 and just substitute it in the first equation to get the value of Multiplicand. I hope I understood the question correctly :)
Edit: OK, then you should be able to factorize the range easily. Get the range, then try to factorize it by dividing the range by the percentage (2-100000). Once the remainder of the division is zero, you have the factors. This is a quick pseudo-code:
    get range;
    percentage = 2;
    while(range % percentage != 0)
    {
        percentage++;
    }
    multiplicand = range / percentage;
All you have to do now is to calculate your limits:
max of percentage = 100000;
max of multiplicand = 4294967295;
Max of range = 4294967295 * 100000 = 429496729500000 (15-digit);
Your max range consists of 15 digits at most. The double data type in most programming languages can represent it. Do the calculation using doubles and just convert the Multiplicand & Percentage to int at the end.

It seems you want to choose M and P such that R = (M * P) / 100000.
So M * P = 100000 * R, where you have to round the right-hand side to an integer.
I'd multiply the range by 100000, and then choose M and P as factors of the result so that they don't overflow their allowed ranges.

say you have
1) M * P = A
then you have a second value for A, so also new values for M and P; let's call them M2, P2 and A2:
2) M2 * P2 = A2
This I don't know for sure, but it is what you seem to be saying, imho: the ratio has to stay the same, so
3) M/P = M2/P2
Now we have 3 equations and 2 unknowns M2 and P2
One way to solve it:
3) becomes
M/P = M2/P2
=>M2 = (M/P)*P2
then substitute that in 2)
(M/P)*P2*P2 = A2
=> P2*P2 = A2 * (P/M)
=> P2 = sqrt(A2 * (P/M))
so first solve for P2, then M2, if I didn't make any mistakes
There will have to be some rounding if M2 and P2 have to be integers.
EDIT: i forgot about the integer percentage, so say
P = percentage/100000 or P*100000 = percentage
P2 = percentage2/100000 or P2*100000 = percentage2
so just solve for P2 and M2, and multiply P2 with 100000
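As a hedged illustration of this derivation in Python, using the question's integer units (range = multiplicand * percentage / 100000) and assuming, as this answer does, that the ratio M/P should stay the same. The function name rescale and the clamping of the percentage to 1..100000 are additions made for this sketch:

```python
import math


def rescale(m, pct, new_range, scale=100000):
    """Return (m2, pct2) with m2 * pct2 / scale close to new_range and m2/pct2 close to m/pct."""
    target = round(new_range * scale)            # A2 expressed in integer units
    pct2 = round(math.sqrt(target * pct / m))    # P2 = sqrt(A2 * (P / M))
    pct2 = min(max(pct2, 1), scale)              # keep the percentage in its allowed range
    m2 = round(target / pct2)                    # M2 = A2 / P2
    return m2, pct2


print(rescale(25000, 400, 0.01))  # (250, 4): 250 * 4 / 100000 = 0.01
```

Because both values are rounded to integers, the product generally only approximates the requested range; the rounding error would need to be checked against whatever tolerance is acceptable.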
