Algorithm for counting substrings in a numerical range

I'm looking for a fast algorithm to solve this problem: given integers A and B (in the range [0,10^18]) and a list of N (N<=1000) numerical substrings, count all the numbers in the range [A,B] that contain any of the N substrings. We always have A<=B, and the numerical substrings are also integers in the range [0,10^18].
Example 1: if A=10, B=22, and the N=2 substrings are {1,10}, the count is 11: the numbers 10 through 19, and 21.
Example 2: if A=175, B=201, and the N=3 substrings are {55,0,200}, the count is 4: the numbers 180, 190, 200 and 201.
The straightforward way is to check each integer in the range [A,B] one after another, but that is not a solution, since the range can be very large (up to 10^18 integers).
The first thing I did to reduce the problem's complexity was to delete useless substrings from the original list, so that no substring is contained in another. For example: {1,10} becomes {1} and {55,0,200} becomes {55,0}. This does not change the final count.
Next, even assuming we can get the count for one substring in the range [A,B], we still cannot simply add up the counts for the individual substrings, as one number can contain several substrings and must not be counted more than once.
Any ideas on how to solve the problem and get the wanted count?

I think it is more of a combinatorial problem.
Calculate the possible numbers of digits of the numbers between A and B. For example, between 2 and 2000 the number of digits can be 1, 2, 3 or 4. With 1 digit you count only the numbers from 2 up, and with 4 digits only the numbers up to 2000, i.e. those beginning with 1 and 2000 itself.
If the number of digits is k and you have to find, say, the numbers containing the substring 234, then choose where to place that substring (k-2 ways) and count the possible values of the remaining digits (I think 10^(k-3) ways). Of course you will have to discount leading zeroes, and numbers in which the substring occurs more than once are counted once per placement.
Repeat this for all substrings.
Now you have to correct for the numbers that contain more than one of the substrings. Repeat the above procedure for the combinations of substrings and subtract those counts from the calculated value (inclusion-exclusion: subtract pairs, add back triples, and so on).
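As an alternative to the placement counting above, here is a minimal sketch of a different technique: a digit DP that walks an Aho-Corasick automaton built from the substrings. It counts the numbers in [0, x] that avoid every substring and subtracts from x + 1, so overlapping matches are never counted twice. The function names (build_automaton, count_up_to, count_in_range) are made up for this illustration.

```python
from collections import deque
from functools import lru_cache

def build_automaton(patterns):
    """Aho-Corasick automaton over decimal digits: (transition table, hit flags)."""
    goto = [[0] * 10]   # goto[state][digit]; 0 doubles as "missing" while building the trie
    hit = [False]       # hit[state]: some pattern ends at this state
    for p in patterns:
        s = 0
        for ch in p:
            d = int(ch)
            if goto[s][d] == 0:
                goto.append([0] * 10)
                hit.append(False)
                goto[s][d] = len(goto) - 1
            s = goto[s][d]
        hit[s] = True
    fail = [0] * len(goto)
    queue = deque(t for t in goto[0] if t)
    while queue:  # breadth-first: fill failure links and missing transitions
        s = queue.popleft()
        hit[s] = hit[s] or hit[fail[s]]
        for d in range(10):
            t = goto[s][d]
            if t:
                fail[t] = goto[fail[s]][d]
                queue.append(t)
            else:
                goto[s][d] = goto[fail[s]][d]
    return goto, hit

def count_up_to(x, goto, hit):
    """How many integers in [0, x] contain at least one pattern."""
    if x < 0:
        return 0
    digits = [int(c) for c in str(x)]
    n = len(digits)

    @lru_cache(maxsize=None)
    def avoid(pos, state, tight, started):
        # Ways to fill positions pos..n-1 without ever reaching a hit state.
        if pos == n:
            return 1
        total = 0
        limit = digits[pos] if tight else 9
        for d in range(limit + 1):
            next_tight = tight and d == limit
            if d == 0 and not started and pos < n - 1:
                # Leading zero: not part of the number, automaton stays at the root.
                total += avoid(pos + 1, 0, next_tight, False)
            else:
                nxt = goto[state][d]
                if not hit[nxt]:
                    total += avoid(pos + 1, nxt, next_tight, True)
        return total

    return x + 1 - avoid(0, 0, True, False)

def count_in_range(a, b, substrings):
    goto, hit = build_automaton([str(s) for s in substrings])
    return count_up_to(b, goto, hit) - count_up_to(a - 1, goto, hit)

print(count_in_range(10, 22, [1, 10]))        # Example 1 from the question
print(count_in_range(175, 201, [55, 0, 200])) # Example 2 from the question
```

The memoised state is (position, automaton state, tight flag, started flag), so the work is roughly proportional to 19 positions times the total length of the substrings times 10 digits, independent of the size of the range.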

Related

Finding the biggest number divisible by other

I need to prepare an algorithm which will display the n-th digit (counting from the right) of the biggest number divisible by B-1, where B is the base of the specified number system. The number can only consist of digits provided in the input.
So for example: the base of the number system is 3, the provided digits are [0, 1, 2] and I'm looking for the 2nd digit. So I need to find the 2nd digit of the biggest number consisting of 0, 1, 2 that is divisible by 2. In this case the result will be 2, because the biggest number is 20 (in base 3).
I've tried to find this algorithm in many ways, but I cannot find any connection between input and output values.
You're asking about an algorithm, so basically I'd do:
1. Generate the biggest number from the digits (that is, sort them in descending order).
2. Check whether it is divisible by B-1.
3. If yes, success: just return the n-th digit. If no, go back to step 1, but generate the next-biggest number (by swapping digits).
Second approach. Not efficient in most cases.
1. Generate all possible numbers.
2. Sort them in descending order.
3. Take the first one divisible by B-1 and return its n-th digit.
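A minimal brute-force sketch of the second approach follows. It rests on two assumptions that the question does not state explicitly: each provided digit may be used at most once, and the number may use only a subset of the digits (which is what the base-3 example suggests). The function names are hypothetical, and the running time is factorial in the number of digits.

```python
from itertools import permutations

def biggest_divisible(digits, base):
    """Digits (most significant first) of the largest value divisible by base - 1,
    or None if no arrangement qualifies. Assumes each digit is used at most once."""
    best_value, best_digits = -1, None
    for length in range(1, len(digits) + 1):
        for perm in permutations(digits, length):
            if length > 1 and perm[0] == 0:
                continue  # skip representations with a leading zero
            value = 0
            for d in perm:
                value = value * base + d
            if value % (base - 1) == 0 and value > best_value:
                best_value, best_digits = value, perm
    return best_digits

def nth_digit_from_right(digits, base, n):
    best = biggest_divisible(digits, base)
    if best is None or n > len(best):
        return None
    return best[-n]

print(nth_digit_from_right([0, 1, 2], 3, 2))
```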

Generating all combinations (not all permutations) of n characters with length up to n

How can we generate all the possible combinations of n characters with length 1 to n in an increasing order of length?
For Example : if n = 4 and characters are 1,2,3,4
we need to generate an array
1,2,3,4
12,13,14,23,24,34
123, 124, 134, 234
1234
Here n is the variable and user can feed the n characters.
There is a bijection between the combinations of up to n elements and the bit masks of length n. A way to solve your problem is to generate all bitsets of length n and sort them by the number of bits they have on. You can use bucket sort for this.
Create n+1 lists corresponding to the numbers from 0 to n. Then iterate over all bitmasks and for each bitmask compute the number of bits that are 1 in it and then add the bitmask in the corresponding list. Using those lists it should be pretty easy to solve the problem.
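The bucket idea above can be sketched as follows. This is only an illustration: the function name combinations_by_length is made up, and bit i of the mask is assumed to select the i-th character.

```python
def combinations_by_length(chars):
    """All non-empty combinations of chars, grouped in increasing order of length."""
    n = len(chars)
    buckets = [[] for _ in range(n + 1)]  # buckets[k] holds the combinations of size k
    for mask in range(1 << n):
        size = bin(mask).count("1")       # number of bits set in the mask
        combo = "".join(chars[i] for i in range(n) if (mask >> i) & 1)
        buckets[size].append(combo)
    # Skip bucket 0 (the empty combination) and flatten the rest in order.
    return [combo for bucket in buckets[1:] for combo in bucket]

print(combinations_by_length("1234"))
```

Within one length the combinations come out in mask order rather than lexicographic order; sort each bucket if that matters.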

Why is the total number of possible substrings of a string n^2?

I read that the total number of substrings that can be formed from a given string is n^2 but I don't understand how to count this.
By substrings, I mean, given a string CAT, the substrings would be:
C
CA
CAT
A
AT
T
The total number of (nonempty) substrings is n + C(n,2). The leading n counts the substrings of length 1, and C(n,2) counts the substrings of length > 1, which equals the number of ways to choose 2 indices from the set of n. The standard formula for binomial coefficients yields C(n,2) = n*(n-1)/2. Combining these two terms and simplifying gives a total of (n^2 + n)/2. @rici notes in the comments that this is the same as C(n+1,2), which makes sense if you think, for example, in terms of Python string slicing, where substrings of s can always be written in the form s[i:j] with 0 <= i < j <= n (j being 1 more than the final index). For n = 3 this works out to (9 + 3)/2 = 6.
In the sense of complexity theory the number of substrings is O(n^2), which might be what you read somewhere.
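A short sketch of the slicing view, using a helper name (all_substrings) invented here: enumerating every pair 0 <= i < j <= n yields exactly n(n+1)/2 slices.

```python
def all_substrings(s):
    """Every non-empty substring s[i:j] with 0 <= i < j <= len(s)."""
    n = len(s)
    return [s[i:j] for i in range(n) for j in range(i + 1, n + 1)]

subs = all_substrings("CAT")
print(subs)
print(len(subs) == 3 * (3 + 1) // 2)
```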
You have a starting point and an end point. If each could point anywhere along the word, each would have n possible values, for an overall total of n^2, so that's an upper limit.
However, we need a constraint saying that the substring cannot end before it starts, so end - start >= 0. This cuts the possible count roughly in half, but in asymptotic terms it's still O(n^2).
Counting substrings logically amounts to
selecting 2 blank spaces that are at least one letter apart.
a | b c | d = substring bc
| a b c | d = substring abc
Now, in how many ways can you choose these 2 blank spaces? For an n-letter word there are n+1 of them.
First select one: n+1 ways.
Select another (not the same): n ways.
So the total is n(n+1). But you have counted everything twice, so it is n*(n+1)/2.
Programmatically, without applying any special algorithms (like the Z algorithm), you can use a map to count the number of distinct substrings (O(n^3)).
You can use a suffix tree to get an O(n^2) substring calculation.
To get a substring of a given string s, you just need to select two different points in the string. Let s contain n characters,
|s[0]|s[1]|...|s[n-1]|
You want to choose two vertical bars to get a substring. How many vertical bars do you have? Exactly n+1. So the number of substrings is C(n+1,2) = n(n+1)/2, the number of ways to choose 2 items from n+1. Of course, this can be written as O(n^2).

Is this possible? Last few digits of sum equal to another number

I have an n-digit number and a list of numbers, from which any number can be used any number of times.
Taking numbers from the list, how do I know whether it is possible to generate a sum such that the last n digits of the sum are the n-digit number?
Note: The sum has some initial value; it's not zero.
EDIT - If a solution exists, I need to find the minimum count of numbers added to get a number whose last 4 digits are the given number. That can easily be solved with DP (minimum coin change problem).
For example, if n=4,
Given number = 1212
Initial value = 5234
List = [1023, 101, 1]
A solution exists: 21212 = 5234 + 1023*15 + 101*6 + 1*27
It's easy to find a counterexample (see comments).
Now, for the solution here's a dynamic programming approach:
All arithmetic is modulo 10^n. For each value in the range 0 to 10^n-1 you need a flag saying whether it has been found, and you need a queue for the elements to be processed.
1. Push the initial value onto the to-be-processed queue.
2. Get an element from the to-be-processed queue. If the queue is empty, you're finished: no solution.
3. Try adding each number of the list separately to this element. If the result was already found, there is nothing to do. If the result is the wanted sum, you've finished: there's a solution. Otherwise, mark it as found and push it onto the queue.
4. Go to step 2.
An actual solution can be reconstructed if you store how you reached a number. You just have to walk back from sum till you hit the initial value.
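The search above can be sketched as a breadth-first search over residues modulo 10^n. This is a minimal illustration with hypothetical names (min_additions, parent); because BFS explores residues in order of the number of additions, the reconstructed list is also a shortest one, which covers the EDIT in the question.

```python
from collections import deque

def min_additions(target, initial, numbers, n_digits):
    """Shortest list of numbers whose sum, added to initial, ends in target.
    Returns None if no such list exists."""
    modulus = 10 ** n_digits
    start, goal = initial % modulus, target % modulus
    parent = {start: None}  # residue -> (previous residue, number that was added)
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while parent[current] is not None:  # walk back to the initial value
                current, added = parent[current]
                path.append(added)
            return path[::-1]
        for x in numbers:
            nxt = (current + x) % modulus
            if nxt not in parent:               # first visit is the shortest route
                parent[nxt] = (current, x)
                queue.append(nxt)
    return None

path = min_additions(1212, 5234, [1023, 101, 1], 4)
print(len(path), (5234 + sum(path)) % 10 ** 4)
print(min_additions(1, 0, [2], 1))  # adding 2 repeatedly never ends in 1
```

There are at most 10^n residues, each expanded once, so the search does about 10^n times the list length additions.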
If the greatest common factor of the numbers in the list is a unit modulo 10^n (that is, not divisible by 2 or 5) you can solve the problem for any choice of the other given values: use the extended Euclid's algorithm to find a linear combination of the list that sums to the gcf, find the multiplicative inverse of the gcf modulo 10^n and multiply by the difference between the given and the initial values.
If the gcf of the numbers in the list is divisible by 2 or 5 (that is, is not a unit) and the difference between the given and the initial value is also divisible by 2 or 5, divide the numbers in the list and the difference by the largest powers of 2 and 5 that divide them all. If the gcf you end up with is a unit there is a solution and you can find it with the procedure above. Otherwise there is no solution.
For example, take the given number 16 (so n = 2), the initial value 5 for the sum, and the list of numbers [3].
The gcf of the numbers in the list is 3, which is a unit. Its inverse modulo 100 is 67 (3×67 = 201).
Multiplying by the difference between the given number and the initial value, 16-5 = 11, we get the factor 67×11 = 737 for 3. Since we're working modulo 100, that's the same as 37.
Checking the result: 5 + 37×3 = 116, whose last two digits are 16. Yep, that works.
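The worked example can be reproduced in a few lines. This sketch covers only the special case of a list with a single number that is a unit modulo 10^n; the function name solve_single is made up, and it relies on Python 3.8 or later, where the built-in pow accepts an exponent of -1 together with a modulus.

```python
def solve_single(target, initial, step, n_digits):
    """How many times to add step so that initial + k*step ends in target.
    Assumes step is not divisible by 2 or 5 (pow raises ValueError otherwise)."""
    modulus = 10 ** n_digits
    inverse = pow(step, -1, modulus)          # modular inverse, Python 3.8+
    return inverse * (target - initial) % modulus

k = solve_single(16, 5, 3, 2)
print(pow(3, -1, 100), k, (5 + 3 * k) % 100)
```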

Given an array of integers, find the LARGEST number using the digits of the array such that it is divisible by 3

E.g.: Array: 4,3,0,1,5 {Assume all digits are >=0. Also each element in array correspond to a digit. i.e. each element on the array is between 0 and 9. }
In the above array, the largest number is: 5430 {using digits 5, 4, 3 and 0 from the array}
My Approach:
For divisibility by 3, we need the sum of digits to be divisible by 3.
So,
Step-1: Remove all the zeroes from the array.
Step-2: These zeroes will come at the end. {Since they don't affect the sum and we have to find the largest number}
Step-3: Find the subset of the elements of the array (excluding zeroes) such that the number of digits is MAXIMUM, the sum of digits is MAXIMUM, and the sum is divisible by 3.
Step-4: The required number consists of the digits of the set found above, in decreasing order.
So, the main step is Step-3, i.e. how to find the subset that contains the MAXIMUM possible number of elements such that their sum is MAX and is divisible by 3.
I was thinking maybe Step-3 could be done by the GREEDY CHOICE of taking all the elements and removing the smallest element in the set until the sum is divisible by 3.
But I am not convinced that this GREEDY choice will work.
Please tell me if my approach is correct.
If it is, then please suggest how to do Step-3.
Also, please suggest any other possible/efficient algorithm.
Observation: If you can get a number that is divisible by 3 at all, you need to remove at most 2 numbers to keep the solution optimal.
A simple O(n^2) solution is to check all possibilities of removing 1 number and, if none is valid, check all pairs (there are O(n^2) of those).
EDIT:
O(n) solution: Create 3 buckets: bucket0, bucket1, bucket2. Each holds the numbers with that value modulo 3. Ignore bucket0 in the algorithm below.
Let the sum of the array be sum.
if sum % 3 == 0: we are done.
else if sum % 3 == 1:
    if there is a number in bucket1: choose the minimal one and remove it
    else: remove the 2 minimal numbers from bucket2
else if sum % 3 == 2:
    if there is a number in bucket2: choose the minimal one and remove it
    else: remove the 2 minimal numbers from bucket1
Note: To achieve O(1) space you don't actually need the buckets. You need only the 2 minimal values from bucket1 and from bucket2, since those are the only numbers we actually use from these buckets.
Example:
arr = { 3, 4, 0, 1, 5 }
bucket0 = {3,0} ; bucket1 = {4,1} bucket2 = { 5 }
sum = 13 ; sum %3 = 1
bucket1 is not empty - choose the minimal from it (1), and remove it from the array.
result array = { 3, 4, 0, 5 }
proceed to STEP 4 "as planned"
Greedy choice definitely doesn't work: consider the set {5, 2, 1}. You'd remove the 1 first, but you should remove the 2.
I think you should work out the sum of the array modulo 3, which is either 0 (you're finished), or 1, or 2. Then you're looking to remove the minimal subset whose sum modulo 3 is 1 or 2.
I think that's fairly straightforward, so no real need for dynamic programming. Do it by removing one number with that modulus if possible, otherwise do it by removing two numbers with the other modulus. Once you know how many to remove, choose the smallest possible. You'll never need to remove three numbers.
You don't need to treat 0 specially, although if you're going to do that then you can further reduce the set under consideration in step 3 if you temporarily remove all 0, 3, 6, 9 from it.
Putting it all together, I would probably:
1. Sort the digits, descending.
2. Calculate the sum modulo 3. If 0, we're finished.
3. Try to remove a digit with that modulus, starting from the end. If successful, we're finished.
4. Remove two digits with negative-that-modulus, starting from the end. This always succeeds, so we're finished.
We might be left with an empty array (e.g. if the input is 1, 1), in which case the problem was impossible. Otherwise, the array contains the digits of our result.
Time complexity is O(n) provided that you do a counting sort in step 1, which you certainly can since the values are digits.
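The four steps above can be sketched as follows. This is an illustration only: the function names are invented, it assumes every element is a single digit 0-9, and for brevity it uses the built-in sort rather than the counting sort mentioned above.

```python
def remove_from_end(digits, residue, count):
    """Delete the `count` smallest digits with digit % 3 == residue from a
    descending list. Returns False (and deletes nothing) if there are too few."""
    indices = [i for i in range(len(digits) - 1, -1, -1)
               if digits[i] % 3 == residue][:count]
    if len(indices) < count:
        return False
    for i in indices:  # indices are descending, so earlier deletions do not shift later ones
        del digits[i]
    return True

def largest_multiple_of_3(digits):
    """Largest number divisible by 3 built from the given digits, or None."""
    digits = sorted(digits, reverse=True)              # step 1
    remainder = sum(digits) % 3                        # step 2
    if remainder:
        if not remove_from_end(digits, remainder, 1):  # step 3
            remove_from_end(digits, 3 - remainder, 2)  # step 4
    if not digits:
        return None                                    # nothing left: impossible
    return int("".join(map(str, digits)))

print(largest_multiple_of_3([4, 3, 0, 1, 5]))
print(largest_multiple_of_3([5, 2, 1]))
print(largest_multiple_of_3([1, 1]))
```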
What do you think about this:
first sort the array elements by value
sum up all the numbers
- if the sum's remainder after division by 3 is equal to 0, just return the sorted array
- otherwise
  - if the sum of the remainders after division by 3 of all the numbers is smaller than the remainder of their sum, there is no solution
  - otherwise
    - if it's equal to 1, try to remove the smallest number with remainder equal to 1, or if there is none, try the two smallest with remainder equal to 2; if there are no such two (I suppose it can happen), there's no solution
    - if it's equal to 2, try to remove the smallest number with remainder equal to 2, or if there is none, try the two smallest with remainder equal to 1; if there are no such two, there's no solution
first sort the array elements by remainder of division by 3, ascending
then sort each subset of equal remainder by value, descending
First, this problem reduces to maximizing the number of elements selected such that their sum is divisible by 3.
Trivial: Select all numbers divisible by 3 (0,3,6,9).
Let a be the elements that leave 1 as remainder, and b the elements that leave 2 as remainder. If (|a|-|b|)%3 is 0, then select all elements from both a and b. If (|a|-|b|)%3 is 1, select all elements from b, and the |a|-1 highest numbers from a. If the remainder is 2, then select all numbers from a, and the |b|-1 highest numbers from b.
Once you have all the numbers, sort them in reverse order and concatenate. That is your answer.
Ultimately, if n is the number of elements, this algorithm returns a number that is at least n-1 digits long (except in corner cases, see below).
NOTE: Take care of corner cases (i.e. what if |a|=0 or |b|=0 etc). (-1)%3 = 2 and (-2)%3 = 1.
If m is the size of the alphabet and n is the number of elements, this algorithm is O(m+n).
Sorting the data is unnecessary, since there are only ten different values.
Just count the number of zeroes, ones, twos etc. in O(n) if n digits are given.
Calculate the sum of all digits, check whether the remainder modulo 3 is 0, 1 or 2.
If the remainder is 1: Remove the first of the following which is possible (one of these is guaranteed to be possible): 1, 4, 7, 2+2, 2+5, 5+5, 2+8, 5+8, 8+8.
If the remainder is 2: Remove the first of the following which is possible (one of these is guaranteed to be possible): 2, 5, 8, 1+1, 1+4, 4+4, 1+7, 4+7, 7+7.
If there are no digits left then the problem cannot be solved. Otherwise, the solution is created by concatenating 9's, 8's, 7's, and so on as many as are remaining.
(Sorting n digits would take O(n log n), unless of course you sort by counting how often each digit occurs and generating the sorted result from these counts.)
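A minimal sketch of this counting approach, with the two removal lists copied from the answer into a table. The names REMOVALS and largest_multiple_of_3_counting are made up, single-digit input is assumed, and the result is returned as a string of digits.

```python
from collections import Counter

# Removal options in the order given above, keyed by the digit sum modulo 3.
REMOVALS = {
    1: [(1,), (4,), (7,), (2, 2), (2, 5), (5, 5), (2, 8), (5, 8), (8, 8)],
    2: [(2,), (5,), (8,), (1, 1), (1, 4), (4, 4), (1, 7), (4, 7), (7, 7)],
}

def largest_multiple_of_3_counting(digits):
    """Digits of the largest multiple of 3 as a string, or None if none is left."""
    counts = [0] * 10
    for d in digits:
        counts[d] += 1                      # O(n) counting, no comparison sort
    remainder = sum(digits) % 3
    if remainder:
        for option in REMOVALS[remainder]:
            needed = Counter(option)
            if all(counts[d] >= k for d, k in needed.items()):
                for d, k in needed.items():
                    counts[d] -= k          # remove the first option that is possible
                break
    if sum(counts) == 0:
        return None
    # Concatenate the 9's, 8's, 7's, ... that remain.
    return "".join(str(d) * counts[d] for d in range(9, -1, -1))

print(largest_multiple_of_3_counting([4, 3, 0, 1, 5]))
```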
Amit's answer is missing one small thing.
Suppose bucket1 is not empty but holds only large values, say 79 and 97, while bucket2 is not empty either and its 2 minimal values are, say, 2 and 5. In this case, when the sum of all digits modulo 3 is 1, we should remove 2 and 5 from bucket2 instead of the minimal value in bucket1 to get the largest concatenated number.
Test case: 8 2 3 5 78 79
If we follow Amit's and Steve's suggested method, the largest number would be 878532, whereas the largest number divisible by 3 that can be formed from this array is 879783.
The solution is to compare the appropriate bucket's minimal value with the concatenation of the two minimal values of the other bucket, and eliminate the smaller one.
