Random pairings with equal counts - algorithm

I am working on some code that will allow assessors to assess something (vague, right?). Before assessing can occur, a random sampling needs to be taken of the submitted items. That part is rather simple.
The part that is fouling me up is the requirement that each item needs to be assessed by two different assessors and that we want the final number of assessments that each assessor performs to be as evenly distributed as possible.
Example: If I have 10 items, that should come out to 20 assessments (2 assessments per item). 20 assessments divided by 4 assessors comes out to 5 assessments per assessor. Obviously the numbers won't always come out this clean (11 items would still come out to 5 per assessor, with the remaining two to get assigned on top after everyone has evened out).
Just looking for some algorithmic help here. The closest I can get ended up being more of a bell curve than I would have liked.

It's not difficult. Let's say you have A assessors and I items. Just run the following loop (all indices are zero-based):
a = 0
for 0 <= r < 2:
    for 0 <= i < I:
        while (assessor a is already assessing item i):
            a = (a + 1) mod A
        assessor a will assess item i on round r
        a = (a + 1) mod A
This will simply allocate the assessors in round-robin fashion, but will skip over those cases where the same assessor would assess the same item twice.
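As a rough illustration, the loop above could be written in Python as follows. The function names `assign_round_robin` and `workload` are made up for this sketch, and it assumes there are at least as many assessors as rounds; it is not a proof that the split is always optimal.

```python
from collections import Counter


def assign_round_robin(num_assessors, num_items, rounds=2):
    """Return assignments[r][i] = assessor of item i in round r."""
    if num_assessors < rounds:
        raise ValueError("need at least as many assessors as rounds")
    assignments = [[None] * num_items for _ in range(rounds)]
    a = 0
    for r in range(rounds):
        for i in range(num_items):
            # Skip assessors who already got this item in an earlier round.
            while any(assignments[p][i] == a for p in range(r)):
                a = (a + 1) % num_assessors
            assignments[r][i] = a
            a = (a + 1) % num_assessors
    return assignments


def workload(assignments):
    """Count how many assessments each assessor received."""
    return Counter(a for round_ in assignments for a in round_)


if __name__ == "__main__":
    result = assign_round_robin(num_assessors=4, num_items=10)
    print(workload(result))
```

For the question's example (10 items, 4 assessors) each assessor ends up with 5 assessments.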

As I understand it, you need to distribute 2N assessments of N items among M assessors so that every assessor gets an equal share, or as close to it as possible.
There is an identity:
2N = ceil(2N/M) + ceil((2N-1)/M) + ... + ceil((2N-M+1)/M)
which can be used for that purpose. Here ceil rounds up to the nearest integer: ceil(2.3) = 3, ceil(4) = 4.
For example, 11 items and 5 assessors give 22 = 5 + 5 + 4 + 4 + 4; with the 4 assessors from your example, 22 = 6 + 6 + 5 + 5.
How does it work? I will refer you to "Concrete Mathematics" by Graham, Knuth & Patashnik, chapter 3, section 4, for an explanation :)
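To make the identity concrete, here is a small Python sketch that computes each assessor's share. The helper name `shares` is hypothetical; it simply evaluates the terms ceil((total - k) / m) for k = 0 .. m-1 using integer arithmetic.

```python
def shares(total, m):
    """Split `total` assessments among `m` assessors via the ceil identity."""
    # -(-x // m) is ceil(x / m) for integers, without floating point.
    return [-(-(total - k) // m) for k in range(m)]


if __name__ == "__main__":
    print(shares(22, 5))  # 11 items, 5 assessors
    print(shares(22, 4))  # 11 items, 4 assessors
    print(shares(20, 4))  # 10 items, 4 assessors
```

The shares always sum to `total` and differ from each other by at most one.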
I've coded Antti's approach and the one described in "Concrete Mathematics":
public static void main(String[] args) {
    wayOne(5, 7);
    System.out.println("======");
    wayTwo(5, 7);
}

// Antti's round-robin approach.
private static void wayOne(int assessors, int items) {
    Integer assessments[][] = new Integer[2][items];
    int assessor = 0;
    for (int pass = 0; pass < 2; pass++) {
        for (int item = 0; item < items; item++) {
            // In the second pass, skip the assessor who already has this item
            // (assumes at least 2 assessors, otherwise this loop never ends).
            while (pass > 0 && assessments[0][item] == assessor)
                assessor = (assessor + 1) % assessors;
            assessments[pass][item] = assessor;
            assessor = (assessor + 1) % assessors;
        }
    }
    for (int pass = 0; pass < assessments.length; pass++) {
        for (int item = 0; item < assessments[pass].length; item++)
            System.out.println("Pass " + pass + " item " + item + " is assessed by " + assessments[pass][item]);
    }
}

// The approach based on the ceil identity from "Concrete Mathematics".
private static void wayTwo(int assessors, int items) {
    Integer distribution[][] = new Integer[2][items];
    int assessments = 2 * items;
    int step = 0, prevBatch = 0;
    while (assessments > 0) {
        // Size of the next batch: ceil((2 * items - step) / assessors).
        int batch = (int) Math.ceil((2.0 * items - step) / assessors);
        assessments -= batch;
        for (int i = prevBatch; i < batch + prevBatch; i++) {
            // Note: unlike wayOne, this does not check whether the same
            // assessor gets the same item in both passes.
            distribution[i / items][i % items] = i % assessors;
        }
        prevBatch += batch;
        step++;
    }
    for (int pass = 0; pass < distribution.length; pass++) {
        for (int item = 0; item < distribution[pass].length; item++)
            System.out.println("Pass " + pass + " item " + item + " is assessed by " + distribution[pass][item]);
    }
}
If I'm correct, the second way will give the more desirable output. For example, try it with 7 items and 5 assessors, or 11 items and 4 assessors.
UPDATE: After I fixed the bug pointed out by Antti, the two routines give the same results.

Related

Collaborative sorting algorithm based on 1 vs 1 choice

I don't know whether this is more of a mathematical question, but I lurked on mathexchange and it doesn't look algorithm-oriented, so I prefer to ask here.
I would like to know whether the following problem has already been solved:
Let's say we have 10 objects and we want to sort them by preference. If the sort concerns a single person, there is no problem: we ask him our questions (using bubble sort or similar), and after a bunch of answers he receives the final ranking.
Now let's say there are 10 people and we want to make a global ranking. This becomes difficult, and everyone can have their own way of solving the problem (for example, asking everyone for their "three favourites", assigning points, and then making a ranking).
I would like to be more scientific and therefore more algorithmic; in other words, use bubble sort (whose implementation amounts to a series of 1-vs-1 questions about which object you prefer, followed by a ranking) for the ten people, while minimizing the number of questions to ask.
So we should have a way to rank the objects globally while giving major importance to the people who do the sorting and, if possible, without waiting for anyone to finish their own ranking, working on the basis of percentages and statistics instead.
I hope I have explained my question well; if you feel it doesn't belong in this group, please let me know and I will move it to another service. Thanks!
Your question is the subject of Arrow's Theorem. In short, what you are trying to do is impossible in general.
If you still want to try, I suggest using directed edges in a directed graph to represent preferences; something like majority prefers A to B, include edge A->B, and no edge in case of ties. If the result is a Directed Acyclic Graph, congratulations, you can order the items with a toposort. Otherwise use Tarjan's Algorithm to identify strongly connected components, which are the trouble spots.
In general, the best way out of this conundrum in my opinion is to obtain scores rather than ranking pairs of items. Then you just average the scores.
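The graph idea above can be sketched in Python with the standard-library `graphlib` module (Python 3.9+). The function name `majority_order` and the input shape (one full ranking per voter, best first) are assumptions made for this illustration; on a majority cycle the sketch simply returns `None` rather than running Tarjan's algorithm.

```python
from graphlib import CycleError, TopologicalSorter
from itertools import combinations


def majority_order(items, rankings):
    """Order items by pairwise majority, or return None if majorities cycle."""
    # positions[v][x] = position of item x in voter v's ranking.
    positions = [{x: r.index(x) for x in items} for r in rankings]
    sorter = TopologicalSorter()
    for x in items:
        sorter.add(x)
    for a, b in combinations(items, 2):
        a_wins = sum(1 for p in positions if p[a] < p[b])
        b_wins = len(positions) - a_wins
        if a_wins > b_wins:
            sorter.add(b, a)  # edge a -> b: a must come before b
        elif b_wins > a_wins:
            sorter.add(a, b)  # edge b -> a
        # a tie adds no edge
    try:
        return list(sorter.static_order())
    except CycleError:
        return None


if __name__ == "__main__":
    items = ["a", "b", "c"]
    print(majority_order(items, [["a", "b", "c"]] * 3))
    # Classic three-voter cycle: a beats b, b beats c, c beats a.
    print(majority_order(items, [["a", "b", "c"], ["b", "c", "a"], ["c", "a", "b"]]))
```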
After the unpromising results of my previous answer, I decided to get started on a practical aspect of the question: how to optimally ask the questions to establish a person's preference.
skipping unnecessary questions
If there are 10 items to order, there are 45 pairs of items which have to be compared. These 45 decisions make up a triangular matrix:
   0 1 2 3 4 5 6 7 8
1  >
2  > <
3  < > =
4  = > < =
5  > < < < >
6  < > > < > <
7  < > < = = < >
8  < < = > = < < <
9  = > > < < > > = >
In the worst case scenario, you'd have to ask a person 45 questions before you can fill out the whole matrix and know his ranking of the 10 items. However, if a person prefers item 1 to item 2, and item 2 to item 3, you can deduce that he prefers item 1 to item 3, and skip that question. In fact, in the best case scenario, just 9 questions will be enough to fill out the whole matrix.
Answering binary questions to deduce an item's place in an ordered list is very similar to filling a binary search tree; however, in a 10-item binary tree, the best-case scenario is 16 questions instead of our theoretical minimum of 9, so I decided to try and find another solution.
Below is an algorithm based on the triangular matrix. It asks the questions in random order, but after every answer it checks which other answers can be deduced, and avoids asking unnecessary questions.
In practice, the number of questions needed to fill out the 45-question matrix is on average 25.33, with 90.5% of instances in the 20-30 range, a minimum value of 12 and a maximum of 40 (tested on 100,000 samples, random question order, no "=" answers).
When the questions are asked systematically (filling the matrix from top to bottom, left to right), the distribution is quite different, with a lower average of 24.44, a strange cutoff below 19, a few samples going up to the maximum of 45, and an obvious difference between odd and even numbers.
I wasn't expecting this difference, but it has made me realise that there are opportunities for optimisation here. I'm thinking of a strategy linked to the binary-tree idea, but without a fixed root. That will be my next step. (UPDATE: see below)
function PrefTable(n) {
this.table = [];
for (var i = 0; i < n; i++) {
this.table[i] = [];
for (var j = 0; j < i; j++) {
this.table[i][j] = null;
}
}
this.addAnswer = function(x, y, pref, deduced) {
if (x < y) {
var temp = x; x = y; y = temp; pref *= -1;
}
if (this.table[x][y] == null) {
this.table[x][y] = pref;
if (! deduced) this.deduceAnswers();
return true;
}
else if (this.table[x][y] != pref) {
console.log("INCONSISTENT INPUT: " + x + ["<", "=", ">"][pref + 1] + y);
}
return false;
}
this.deduceAnswers = function() {
do {
var changed = false;
for (var i = 0; i < this.table.length; i++) {
for (var j = 0; j < i; j++) {
var p = this.table[i][j];
if (p != null) {
for (var k = 0; k < j; k++) {
var q = this.table[j][k];
if (q != null && p * q != -1) {
changed |= this.addAnswer(i, k, p == 0 ? q : p, true);
}
}
for (var k = i + 1; k < this.table.length; k++) {
var q = this.table[k][j];
if (q != null && p * q != 1) {
changed |= this.addAnswer(i, k, p == 0 ? -q : p, true);
}
}
for (var k = j + 1; k < i; k++) {
var q = this.table[i][k];
if (q != null && p * q != 1) {
changed |= this.addAnswer(j, k, p == 0 ? q : -p, true);
}
}
}
}
}
}
while (changed);
}
this.getQuestion = function() {
var q = [];
for (var i = 0; i < this.table.length; i++) {
for (var j = 0; j < i; j++) {
if (this.table[i][j] == null) q.push({a:i, b:j});
}
}
if (q.length) return q[Math.floor(Math.random() * q.length)]
else return null;
}
this.getOrder = function() {
var index = [];
for (i = 0; i < this.table.length; i++) index[i] = i;
index.sort(this.compare.bind(this));
return(index);
}
this.compare = function(a, b) {
if (a > b) return this.table[a][b]
else return -this.table[b][a]; // the table only stores pref(x, y) for x > y, so flip the sign
}
}
// CREATE RANDOM ORDER THAT WILL SERVE AS THE PERSON'S PREFERENCE
var fruit = ["orange", "apple", "pear", "banana", "kiwifruit", "grapefruit", "peach", "cherry", "starfruit", "strawberry"];
var pref = fruit.slice();
for (i in pref) pref.push(pref.splice(Math.floor(Math.random() * (pref.length - i)),1)[0]);
pref.join(" ");
// THIS FUNCTION ACTS AS THE PERSON ANSWERING THE QUESTIONS
function preference(a, b) {
if (pref.indexOf(a) - pref.indexOf(b) < 0) return -1
else if (pref.indexOf(a) - pref.indexOf(b) > 0) return 1
else return 0;
}
// CREATE TABLE AND ASK QUESTIONS UNTIL TABLE IS COMPLETE
var t = new PrefTable(10), c = 0, q;
while (q = t.getQuestion()) {
console.log(++c + ". " + fruit[q.a] + " or " + fruit[q.b] + "?");
var answer = preference(fruit[q.a], fruit[q.b]);
console.log("\t" + [fruit[q.a], "whatever", fruit[q.b]][answer + 1]);
t.addAnswer(q.a, q.b, answer);
}
// PERFORM SORT BASED ON TABLE
var index = t.getOrder();
// DISPLAY RESULT
console.log("LIST IN ORDER:");
for (var i in index) console.log(i + ". " + fruit[index[i]]);
update 1: asking the questions in the right order
If you ask the questions in order, filling up the triangular matrix from top to bottom, what you're actually doing is this: keeping a preliminary order of the items you've already asked about, introducing new items one at a time, comparing it with previous items until you know where to insert it in the preliminary order, and then moving on to the next item.
This algorithm has one obvious opportunity for optimisation: if you want to insert a new item into an ordered list, instead of comparing it to each item in turn, you compare it with the item in the middle: that tells you which half the new item goes into; then you compare it with the item in the middle of that half, and so on. This limits the maximum number of steps to log2(n)+1.
Below is a version of the code that uses this method. In practice, it offers very consistent results, and the number of questions needed is on average 22.21, less than half of the maximum 45. And all the results are in the 19 to 25 range (tested on 100,000 samples, no "=" answers).
The advantage of this optimisation becomes more pronounced as the number of items increases; for 20 items, out of a possible 190 questions, the random method gives an average of 77 (40.5%), while the optimised method gives an average of 62 (32.6%). At 50 items, that is 300/1225 (24.5%) versus 217/1225 (17.7%).
function PrefList(n) {
this.size = n;
this.items = [{item: 0, equals: []}];
this.current = {item: 1, try: 0, min: 0, max: 1};
this.addAnswer = function(x, y, pref) {
if (pref == 0) {
this.items[this.current.try].equals.push(this.current.item);
this.current = {item: ++this.current.item, try: 0, min: 0, max: this.items.length};
} else {
if (pref == -1) this.current.max = this.current.try
else this.current.min = this.current.try + 1;
if (this.current.min == this.current.max) {
this.items.splice(this.current.min, 0, {item: this.current.item, equals: []});
this.current = {item: ++this.current.item, try: 0, min: 0, max: this.items.length};
}
}
}
this.getQuestion = function() {
if (this.current.item >= this.size) return null;
this.current.try = Math.floor((this.current.min + this.current.max) / 2);
return({a: this.current.item, b: this.items[this.current.try].item});
}
this.getOrder = function() {
var index = [];
for (var i in this.items) {
index.push(this.items[i].item);
for (var j in this.items[i].equals) {
index.push(this.items[i].equals[j]);
}
}
return(index);
}
}
// PREPARE TEST DATA
var fruit = ["orange", "apple", "pear", "banana", "kiwifruit", "grapefruit", "peach", "cherry", "starfruit", "strawberry"];
var pref = fruit.slice();
for (i in pref) pref.push(pref.splice(Math.floor(Math.random() * (pref.length - i)),1)[0]);
pref.join(" ");
// THIS FUNCTION ACTS AS THE PERSON ANSWERING THE QUESTIONS
function preference(a, b) {
if (pref.indexOf(a) - pref.indexOf(b) < 0) return -1
else if (pref.indexOf(a) - pref.indexOf(b) > 0) return 1
else return 0;
}
// CREATE TABLE AND ASK QUESTIONS UNTIL TABLE IS COMPLETE
var t = new PrefList(10), c = 0, q;
while (q = t.getQuestion()) {
console.log(++c + ". " + fruit[q.a] + " or " + fruit[q.b] + "?");
var answer = preference(fruit[q.a], fruit[q.b]);
console.log("\t" + [fruit[q.a], "whatever", fruit[q.b]][answer + 1]);
t.addAnswer(q.a, q.b, answer);
}
// PERFORM SORT BASED ON TABLE
var index = t.getOrder();
// DISPLAY RESULT
console.log("LIST IN ORDER:");
for (var i in index) console.log(i + ". " + fruit[index[i]]);
I think this is as far as you can optimise the binary question process for a single person. The next step is to figure out how to ask several people's preferences and combine them without introducing conflicting data into the matrix.
update 2: sorting based on the preferences of more than one person
While experimenting (in my previous answer) with algorithms where different people would answer each question, it was clear that the conflicting preferences would create a preference table with inconsistent data, which wasn't useful as a basis for comparison in a sorting algorithm.
The two algorithms earlier in this answer offer possibilities to deal with this problem. One option would be to fill out the preference table with votes in percentages instead of "before", "after" and "equal" as the only options. Afterwards, you could search for inconsistencies, and fix them by changing the decision with the closest vote, e.g. if apples vs. oranges was 80/20%, oranges vs. pears was 70/30%, and pears vs. apples was 60/40%, changing the preference from "pears before apples" to "apples before pears" would be the best way to resolve the inconsistency.
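A minimal Python sketch of this "flip the closest vote" repair, using the fruit percentages from the paragraph above. The function name `flip_closest` and the dictionary layout are invented for the illustration, and the cycle itself is assumed to have been found already.

```python
def flip_closest(cycle_votes):
    """Break a preference cycle by reversing the decision with the closest vote.

    cycle_votes maps (winner, loser) to the winner's vote share in percent,
    for the decisions that together form one cycle.
    """
    closest = min(cycle_votes, key=lambda edge: cycle_votes[edge])
    fixed = dict(cycle_votes)
    share = fixed.pop(closest)
    winner, loser = closest
    fixed[(loser, winner)] = 100 - share  # reversed decision
    return fixed


if __name__ == "__main__":
    votes = {
        ("apples", "oranges"): 80,
        ("oranges", "pears"): 70,
        ("pears", "apples"): 60,
    }
    print(flip_closest(votes))
```

Here the 60/40 decision is reversed, which leaves the consistent order apples, oranges, pears.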
Another option would be to skip unnecessary questions, thereby removing the chance of inconsistencies in the preference table. This would be the easiest method, but the order in which the questions are asked would then have a greater impact on the end result.
The second algorithm inserts each item into a preliminary order by first checking whether it goes in the first or last half, then whether it goes in the first or last half of that half, and so on, steadily zooming in on the correct position in ever decreasing steps. This means the decisions used to determine the position of each item are of decreasing importance. This could be the basis of a system where more people are asked to vote on important decisions, and fewer people on less important decisions, thus reducing the number of questions that each person has to answer.
If the number of people is much greater than the number of items, you could use something like this: with every new item, the first question is put to half of the people, and every further question is then put to half of the remaining people. That way, everyone would have to answer at most one question per item, and for the whole list everyone would answer at most the number of questions equal to the number of items.
Again, with large groups of people, there are possibilities to use statistics. This could decide at which point a certain answer has developed a statistically significant lead, and the question can be considered as answered, without asking any more people. It could also be used to decide how close a vote has to be to be considered an "equal" answer.
update 3: ask subgroups based on importance of questions
This code version reduces the number of questions per person by asking important questions to a large subgroup of the population and less important questions to a smaller subgroup, as discussed in update 2.
E.g., when finding the position of the eighth item in a list already containing 7 items, a maximum of 3 questions is needed to find the correct position; the population will therefore be split into 3 groups, whose relative sizes are 4:2:1.
The example orders 10 items based on the preferences of 20 people; the maximum number of questions any person is asked is 9.
function GroupPref(popSize, listSize) { // CONSTRUCTOR
if (popSize < steps(listSize)) return {};
this.population = popSize;
this.people = [];
this.groups = [this.population];
this.size = listSize;
this.items = [{item: 0, equals: []}];
this.current = {item: 1, question: 0, try: 0, min: 0, max: 1};
this.getQuestion = function() {
if (this.current.item >= this.size) return null;
if (this.current.question == 0) this.populate();
var group = this.people.splice(0, this.groups[this.current.question++]);
this.current.try = Math.floor((this.current.min + this.current.max) / 2);
return({people: group, a: this.current.item, b: this.items[this.current.try].item});
}
this.processAnswer = function(pref) {
if (pref == 0) {
this.items[this.current.try].equals.push(this.current.item);
} else {
if (pref < 0) this.current.max = this.current.try
else this.current.min = this.current.try + 1;
if (this.current.min == this.current.max) {
this.items.splice(this.current.min, 0, {item: this.current.item, equals: []});
} else return;
}
this.current = {item: ++this.current.item, question: 0, try: 0, min: 0, max: this.items.length};
this.distribute();
}
function steps(n) {
return Math.ceil(Math.log(n) / Math.log(2));
}
this.populate = function() {
for (var i = 0; i < this.population; i++) this.people.splice(Math.floor(Math.random() * (i + 1)), 0, i);
}
this.distribute = function() {
var total = this.population, groups = steps(this.current.item + 1);
this.groups.length = 0;
for (var i = 0; i < groups; i++) {
var size = Math.round(Math.pow(2, i) * total / (Math.pow(2, groups) - 1));
if (size == 0) ++size, --total;
this.groups.unshift(size);
}
}
this.getOrder = function() {
var index = [];
for (var i in this.items) {
var equal = [this.items[i].item];
for (var j in this.items[i].equals) {
equal.push(this.items[i].equals[j]);
}
index.push(equal);
}
return(index);
}
}
// PREPARE TEST DATA
var fruit = ["orange", "apple", "pear", "banana", "kiwifruit", "grapefruit", "peach", "cherry", "starfruit", "strawberry"];
var pref = [];
for (i = 0; i < 20; i++) {
var temp = fruit.slice();
for (j in temp) temp.push(temp.splice(Math.floor(Math.random() * (temp.length - j)), 1)[0]);
pref[i] = temp.join(" ");
}
// THIS FUNCTION ACTS AS THE PERSON ANSWERING THE QUESTIONS
function preference(person, a, b) {
if (pref[person].indexOf(a) - pref[person].indexOf(b) < 0) return -1
else if (pref[person].indexOf(a) - pref[person].indexOf(b) > 0) return 1
else return 0;
}
// CREATE LIST AND ANSWER QUESTIONS UNTIL LIST IS COMPLETE
var t = new GroupPref(20, 10), c = 0, q;
while (q = t.getQuestion()) {
var answer = 0;
console.log(++c + ". ask " + q.people.length + " people (" + q.people + ")\n\tq: " + fruit[q.a] + " or " + fruit[q.b] + "?");
for (i in q.people) answer += preference(q.people[i], fruit[q.a], fruit[q.b]);
console.log("\ta: " + [fruit[q.a], "EQUAL", fruit[q.b]][answer != 0 ? answer / Math.abs(answer) + 1 : 1]);
t.processAnswer(answer);
}
// GET ORDERED LIST AND DISPLAY RESULT
var index = t.getOrder();
console.log("LIST IN ORDER:");
for (var i = 0, pos = 1; i < index.length; i++) {
var pre = pos + ". ";
for (var j = 0; j < index[i].length; j++) {
console.log(pre + fruit[index[i][j]]);
pre = " ";
}
pos += index[i].length;
}

Random number with no repetition

What I am trying to do is make it so that the game I am creating will randomly change characters every 5 seconds.
I got this working via a timer, the only problem is I don't want them repeating, I'm currently working on dummy code so it's just changing the screen colour, but how can I make it so that it doesn't repeat the number it just called?
if (timer <= 0)
{
num = rand.Next(2);
timer = 5.0f;
}
That is the current code and then in the draw I've literally just done "if num equals a certain number then change background colour".
I tried adding a prev_num checker but I can't get it to work properly (here it is)
if (timer <= 0)
{
prev_number = num;
num = rand.Next(2);
if (prev_number == num)
{
num = rand.Next(2);
}
else
{
timer = 5.0f;
}
}
Consider that if you're picking (for example) a random number from 1-5 then there are five possible outcomes, so you would use rand.Next(5) to select the zero-based "ordinal" or index of the outcome, then convert it into the range you actually want (in this case, by adding one).
If you want a random number from 0-4, excluding the number you just picked, then there are only four possible outcomes, not five - if the previous number was 3, then the possible outcomes are 0, 1, 2 or 4. You can then simplify your algorithm by choosing one of those four outcomes (rand.Next(4)) and mapping that ordinal to your desired range. A simple mapping would be to say if the new random number is below the previous number, return it as-is, otherwise (if equal or greater) add one.
int new_num = rand.Next(4);
if(new_num >= prev_num)
{
new_num++;
}
Your new number is now guaranteed to be in the same range as the previous number, but not equal to it.
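The same mapping, sketched in Python for comparison. The function name `next_different` and its `rng` parameter are hypothetical; the idea is exactly the one described above: draw from n - 1 outcomes and shift everything at or above the previous value up by one.

```python
import random


def next_different(prev, n, rng=random):
    """Pick a uniform random integer in [0, n) that is not equal to prev."""
    if n < 2:
        raise ValueError("need at least two possible values")
    candidate = rng.randrange(n - 1)  # n - 1 allowed outcomes
    # Values below prev stay as they are; the rest shift up to skip prev.
    return candidate + 1 if candidate >= prev else candidate


if __name__ == "__main__":
    current = 0
    for _ in range(10):
        current = next_different(current, 5)
        print(current)
```

Unlike a retry loop, this always finishes after a single draw.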
Maybe just put it into a loop instead of a single check?
Also, I think that because your timer reset was inside the else, it was not always updated correctly.
if (timer <= 0)
{
    // Keep drawing until the new number differs from the current one.
    do
    {
        tempNum = rand.Next(2);
    }
    while (tempNum == num);
    num = tempNum;
    timer = 5.0f;
}
Create an array of sequential numbers and then shuffle them (like a deck of cards) when your application begins.
int[] numbers = new int[100];
for(int i = 0; i < numbers.Length; i++)
numbers[i] = i;
Shuffle(numbers);
Using a function to shuffle the list:
public static void Shuffle<T>(IList<T> list)
{
Random rng = new Random();
int n = list.Count;
while (n > 1) {
n--;
int k = rng.Next(n + 1);
T value = list[k];
list[k] = list[n];
list[n] = value;
}
}
You can then access them sequentially out of the list. They will be random as the list was shuffled, but you won't have any repetitions since each number only exists once in the list.
if (timer <= 0)
{
num = numbers[index];
index++;
timer = 5.0f;
}

Find Second largest number in array at most n+log₂(n)−2 comparisons [closed]

You are given as input an unsorted array of n distinct numbers, where n is a power of 2. Give an algorithm that identifies the second-largest number in the array, and that uses at most n+log₂(n)−2 comparisons.
Start with comparing elements of the n element array in odd and even positions and determining largest element of each pair. This step requires n/2 comparisons. Now you've got only n/2 elements. Continue pairwise comparisons to get n/4, n/8, ... elements. Stop when the largest element is found. This step requires a total of n/2 + n/4 + n/8 + ... + 1 = n-1 comparisons.
During the previous step, the largest element was directly compared with log₂(n) other elements. You can determine the largest of these elements in log₂(n)-1 comparisons. That is the second-largest number in the array.
Example: array of 8 numbers [10,9,5,4,11,100,120,110].
Comparisons on level 1: [10,9] -> 10, [5,4] -> 5, [11,100] -> 100, [120,110] -> 120.
Comparisons on level 2: [10,5] -> 10, [100,120] -> 120.
Comparisons on level 3: [10,120] -> 120.
The maximum is 120. It was directly compared with 10 (on level 3), 100 (on level 2), and 110 (on level 1).
Step 2 finds the maximum of 10, 100, and 110, which is 110. That is the second-largest element.
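For readers who want to check the comparison count, here is an iterative Python sketch of the two steps that also counts comparisons. The function name `second_largest_tournament` is made up for this example, and it assumes distinct values and a length that is a power of 2, as the question states.

```python
def second_largest_tournament(values):
    """Return (second_largest, comparisons_used) via a knockout tournament."""
    n = len(values)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of 2 and at least 2")
    comparisons = 0
    # Each player is (value, list of values it has beaten directly).
    players = [(v, []) for v in values]
    while len(players) > 1:
        next_round = []
        for i in range(0, len(players), 2):
            (a, a_beaten), (b, b_beaten) = players[i], players[i + 1]
            comparisons += 1
            if a > b:
                a_beaten.append(b)
                next_round.append((a, a_beaten))
            else:
                b_beaten.append(a)
                next_round.append((b, b_beaten))
        players = next_round
    # Step 2: the runner-up is the largest value that lost directly to the winner.
    _, beaten = players[0]
    second = beaten[0]
    for v in beaten[1:]:
        comparisons += 1
        if v > second:
            second = v
    return second, comparisons


if __name__ == "__main__":
    print(second_largest_tournament([10, 9, 5, 4, 11, 100, 120, 110]))
```

For the 8-element example this uses 7 comparisons in the tournament and 2 more among the three direct losers, 9 in total, which equals n + log₂(n) − 2.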
sly s's answer is derived from this paper, but he didn't explain the algorithm, which means someone stumbling across this question has to read the whole paper, and his code isn't very sleek either. I'll give the crux of the algorithm from the aforementioned paper, complete with complexity analysis, and also provide a Python implementation, just because that's the language I chose while working on these problems.
Basically, we do two passes:
1. Find the max, and keep track of which elements the max was compared to.
2. Find the max among the elements the max was compared to; the result is the second-largest element.
In the example array [10, 4, 5, 8, 7, 2, 12, 3, 1, 6, 9, 11] (shown as a picture in the original post), 12 is the largest number in the array, and was compared to 3, 1, 11, and 10 in the first pass. In the second pass, we find the largest among {3, 1, 11, 10}, which is 11, the second-largest number in the original array.
Time Complexity:
All elements must be looked at, therefore, n - 1 comparisons for pass 1.
Since we divide the problem into two halves each time, the recursion is at most ⌈log₂n⌉ levels deep, and at each level the list of compared elements grows by at most one; the size of that list is thus at most ⌈log₂n⌉, and therefore pass 2 needs at most ⌈log₂n⌉ - 1 comparisons.
Total number of comparisons <= (n - 1) + (log₂n - 1) = n + log₂n - 2
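from typing import MutableSequence, Sequence, Tuple

def second_largest(nums: Sequence[int]) -> int:
    def _max(lo: int, hi: int, seq: Sequence[int]) -> Tuple[int, MutableSequence[int]]:
        # Returns the maximum of seq[lo..hi] and the elements it was compared to.
        if lo >= hi:
            return seq[lo], []
        mid = lo + (hi - lo) // 2
        x, a = _max(lo, mid, seq)
        y, b = _max(mid + 1, hi, seq)
        if x > y:
            a.append(y)
            return x, a
        b.append(x)
        return y, b

    # Pass 1: find the max and collect everything it was compared to.
    comparisons = _max(0, len(nums) - 1, nums)[1]
    # Pass 2: the largest of those elements is the second largest overall.
    return _max(0, len(comparisons) - 1, comparisons)[0]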
The first run for the given example is as follows:
lo=0, hi=1, mid=0, x=10, a=[], y=4, b=[]
lo=0, hi=2, mid=1, x=10, a=[4], y=5, b=[]
lo=3, hi=4, mid=3, x=8, a=[], y=7, b=[]
lo=3, hi=5, mid=4, x=8, a=[7], y=2, b=[]
lo=0, hi=5, mid=2, x=10, a=[4, 5], y=8, b=[7, 2]
lo=6, hi=7, mid=6, x=12, a=[], y=3, b=[]
lo=6, hi=8, mid=7, x=12, a=[3], y=1, b=[]
lo=9, hi=10, mid=9, x=6, a=[], y=9, b=[]
lo=9, hi=11, mid=10, x=9, a=[6], y=11, b=[]
lo=6, hi=11, mid=8, x=12, a=[3, 1], y=11, b=[9]
lo=0, hi=11, mid=5, x=10, a=[4, 5, 8], y=12, b=[3, 1, 11]
Things to note:
There are exactly n - 1=11 comparisons for n=12.
From the last line, y=12 wins over x=10, and the next pass starts with the sequence [3, 1, 11, 10], which has log₂(12)=3.58 ~ 4 elements, and will require 3 comparisons to find the maximum.
I have implemented in Java the algorithm from @Evgeny Kluev's answer. The total number of comparisons is n+log2(n)−2. There is also a good reference:
Alexander Dekhtyar: CSC 349: Design and Analysis of Algorithms. This is similar to the top-voted algorithm.
import java.util.Arrays;

public class op1 {
private static int findSecondRecursive(int n, int[] A){
int[] firstCompared = findMaxTournament(0, n-1, A); //n-1 comparisons;
int[] secondCompared = findMaxTournament(2, firstCompared[0]-1, firstCompared); //log2(n)-1 comparisons.
//Total comparisons: n+log2(n)-2;
return secondCompared[1];
}
private static int[] findMaxTournament(int low, int high, int[] A){
if(low == high){
int[] compared = new int[2];
compared[0] = 2;
compared[1] = A[low];
return compared;
}
int[] compared1 = findMaxTournament(low, (low+high)/2, A);
int[] compared2 = findMaxTournament((low+high)/2+1, high, A);
if(compared1[1] > compared2[1]){
int k = compared1[0] + 1;
int[] newcompared1 = new int[k];
System.arraycopy(compared1, 0, newcompared1, 0, compared1[0]);
newcompared1[0] = k;
newcompared1[k-1] = compared2[1];
return newcompared1;
}
int k = compared2[0] + 1;
int[] newcompared2 = new int[k];
System.arraycopy(compared2, 0, newcompared2, 0, compared2[0]);
newcompared2[0] = k;
newcompared2[k-1] = compared1[1];
return newcompared2;
}
private static void printarray(int[] a){
for(int i:a){
System.out.print(i + " ");
}
System.out.println();
}
public static void main(String[] args) {
//Demo.
System.out.println("Original array: ");
int[] A = {10,4,5,8,7,2,12,3,1,6,9,11};
printarray(A);
int secondMax = findSecondRecursive(A.length,A);
Arrays.sort(A);
System.out.println("Sorted array(for check use): ");
printarray(A);
System.out.println("Second largest number in A: " + secondMax);
}
}
The problem is this:
In comparison level 1, say, the algorithm needs to remember all the array elements, because the largest is not yet known; the same holds on the second level, the third, and so on. Keeping track of these elements requires additional value assignments, and once the largest is known you also have to trace back through them. As a result, it will not be significantly faster than the simple algorithm that uses 2N-2 comparisons. Moreover, because the code is more complicated, you also need to think about potential debugging time.
E.g., in PHP, the running time of a comparison versus a value assignment is roughly (11-19) for a comparison to 16 for a value assignment.
Here is an example for better understanding:
example 1 :
>12 56 98 12 76 34 97 23
>>(12 56) (98 12) (76 34) (97 23)
>>> 56 98 76 97
>>>> (56 98) (76 97)
>>>>> 98 97
>>>>>> 98
The largest element is 98.
Now compare the elements that lost directly to the largest element 98 (12, 56 and 97); the largest of them, 97, is the second largest.
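Divide-and-conquer implementation (it runs in O(n) time rather than n log n, but uses more than n+log₂(n)−2 comparisons)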
public class Test {
public static void main(String...args){
int arr[] = new int[]{1,2,2,3,3,4,9,5, 100 , 101, 1, 2, 1000, 102, 2,2,2};
System.out.println(getMax(arr, 0, 16));
}
public static Holder getMax(int[] arr, int start, int end){
if (start == end)
return new Holder(arr[start], Integer.MIN_VALUE);
else {
int mid = ( start + end ) / 2;
Holder l = getMax(arr, start, mid);
Holder r = getMax(arr, mid + 1, end);
if (l.compareTo(r) > 0 )
return new Holder(l.high(), r.high() > l.low() ? r.high() : l.low());
else
return new Holder(r.high(), l.high() > r.low() ? l.high(): r.low());
}
}
static class Holder implements Comparable<Holder> {
private int low, high;
public Holder(int r, int l){low = l; high = r;}
public String toString(){
return String.format("Max: %d, SecMax: %d", high, low);
}
public int compareTo(Holder data){
if (high == data.high)
return 0;
if (high > data.high)
return 1;
else
return -1;
}
public int high(){
return high;
}
public int low(){
return low;
}
}
}
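Why not use this simple linear scan for the given array[n]? (It does no actual hashing.) It runs in c*n time, where c is the constant cost of one check and update. With the second check that correctness requires, it does up to 2n comparisons, not n.
int first = 0;
int second = 0;
for(int i = 0; i < n; i++) {
    if(array[i] > first) {
        second = first;
        first = array[i];
    } else if(array[i] > second) {
        // Without this branch, values between second and first are missed.
        second = array[i];
    }
}
Or do I just not understand the question...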
In Python2.7: The following code works at O(nlog log n) for the extra sort. Any optimizations?
def secondLargest(testList):
    secondList = []
    # Iterate through the list
    while(len(testList) > 1):
        left = testList[0::2]
        right = testList[1::2]
        if (len(testList) % 2 == 1):
            right.append(0)
        myzip = zip(left,right)
        mymax = [ max(list(val)) for val in myzip ]
        myzip.sort()
        secondMax = [x for x in myzip[-1] if x != max(mymax)][0]
        if (secondMax != 0 ):
            secondList.append(secondMax)
        testList = mymax
    return max(secondList)
public static int FindSecondLargest(int[] input)
{
Dictionary<int, List<int>> dictWinnerLoser = new Dictionary<int, List<int>>();//Keeps track of loosers with winners
List<int> lstWinners = null;
List<int> lstLoosers = null;
int winner = 0;
int looser = 0;
while (input.Count() > 1)//Runs till we get max in the array
{
lstWinners = new List<int>();//Keeps track of winners of each run, as we have to run with winners of each run till we get one winner
for (int i = 0; i < input.Count() - 1; i += 2)
{
if (input[i] > input[i + 1])
{
winner = input[i];
looser = input[i + 1];
}
else
{
winner = input[i + 1];
looser = input[i];
}
lstWinners.Add(winner);
if (!dictWinnerLoser.ContainsKey(winner))
{
lstLoosers = new List<int>();
lstLoosers.Add(looser);
dictWinnerLoser.Add(winner, lstLoosers);
}
else
{
lstLoosers = dictWinnerLoser[winner];
lstLoosers.Add(looser);
dictWinnerLoser[winner] = lstLoosers;
}
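}
if (input.Length % 2 == 1) lstWinners.Add(input[input.Length - 1]);//odd element out advances to the next run unplayed
input = lstWinners.ToArray();//run the loop again with winners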
}
List<int> loosersOfWinner = dictWinnerLoser[input[0]];//Gives all the elemetns who lost to max element of array, input array now has only one element which is actually the max of the array
winner = 0;
for (int i = 0; i < loosersOfWinner.Count(); i++)//Now max in the lossers of winner will give second largest
{
if (winner < loosersOfWinner[i])
{
winner = loosersOfWinner[i];
}
}
return winner;
}

DP algorithm for bounded Knapsack?

The Wikipedia article about the Knapsack problem lists three kinds of it:
1. 1-0 (one item of a type)
2. Bounded (several items of a type)
3. Unbounded (unlimited number of items of a type)
The article contains DP approaches for types 1 and 3, but no solution for type 2.
How can the dynamic programming algorithm for solving type 2 be described?
Use the 0-1 variant, but allow repetition of an item in the solution up to the number of times specified in its bound. You would need to maintain a vector stating how many copies of each item you already included in the partial solution.
The other DP solutions mentioned are all suboptimal as they require you to directly simulate the problem, resulting in an O(number of items * maximum weight * total count of items) runtime complexity.
There are many ways to optimize this, and I'll mention a few of them here:
One solution is to apply a technique similar to Sqrt Decomposition and is described here: https://codeforces.com/blog/entry/59606. This algorithm runs in O(number of items * maximum weight * sqrt(maximum weight)).
However, Dorijan Lendvaj describes a much faster algorithm that runs in O(number of items * maximum weight * log(maximum weight)) here: https://codeforces.com/blog/entry/65202?#comment-492168
Another way to think of the above approach is the following:
For each type of item, let's define the following values:
w, the weight/cost of the current type of item
v, the value of the current type of item
n, the number of copies of the current type of item available to use
Phase 1
First, let us consider 2^k, the largest power of 2 less than or equal to n. We insert the following items (each inserted item is in the format (weight, value)): (w, v), (2 * w, 2 * v), (2^2 * w, 2^2 * v), ..., (2^(k-1) * w, 2^(k-1) * v). Note that the items inserted each represent 2^0, 2^1, ..., 2^(k-1) copies of the current type of item respectively.
Observe that this is the same as inserting 2^k - 1 copies of the current type of item. This is because we can simulate the taking of any number of items (represented as n') by taking the combination of the above items that corresponds to the binary representation of n' (For all whole numbers k', if the bit representing 2^k' is set, take the item that represents 2^k' copies of the current type of item).
Phase 2
Lastly, we just insert the items that correspond to the set bits of n - (2^k - 1). (For all whole numbers k', if the bit representing 2^k' is set, insert (2^k' * w, 2^k' * v)).
Now, we can simulate the taking of up to n items of the current type simply by taking a combination of the above inserted items.
I don't currently have an exact proof of this solution, but after playing around with it for a while it seems correct. If I can figure one out I may update this post later on.
Proof
First, a proposition: All we have to prove is that inserting the above items allows us to simulate the taking of any number of items of the current type up to n.
With that in mind, let's define some variables:
Let n be the number of items of the current type available
Let x be the number of items of the current type we want to take
Let k be the greatest integer such that 2^k <= n
If x < 2^k, we can easily take x items using the method described in phase 1 of the algorithm:
... we can simulate the taking of any number of items (represented as n') by taking the combination of the above items that corresponds to the binary representation of n' (For all whole numbers k', if the bit representing 2^k' is set, take the item that represents 2^k' copies of the current type of item).
Otherwise, we do the following:
Take n - (2^k - 1) items. This is done by taking all the items inserted in phase 2. Now only the items inserted in phase 1 are available for use.
Take x - (n - (2^k - 1)) items. Since this value is always less than 2^k, we can just use the method used for the first case.
Finally, how do we know that x - (n - (2^k - 1)) < 2^k?
If we simplify the left side, we get:
x - (n - (2^k - 1))
= x - n + 2^k - 1
= x - (n + 1) + 2^k
If the above value was >= 2^k, then x - (n + 1) >= 0 would be true, meaning that x > n. That would be impossible as that's not a valid value of x.
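The two phases above can be sketched in Python as follows. This is a minimal illustration rather than the linked author's code: the names `split_item` and `bounded_knapsack` and the `(weight, value, count)` tuple layout are assumptions, and counts are assumed to be at least 1.

```python
def split_item(w, v, n):
    """Split n copies of an item (w, v) into O(log n) bundles."""
    bundles = []
    k = n.bit_length() - 1          # largest k with 2**k <= n (n >= 1)
    # Phase 1: bundles of 1, 2, 4, ..., 2**(k-1) copies.
    for j in range(k):
        bundles.append((w * (1 << j), v * (1 << j)))
    # Phase 2: bundles for the set bits of n - (2**k - 1).
    rest = n - ((1 << k) - 1)
    j = 0
    while rest:
        if rest & 1:
            bundles.append((w * (1 << j), v * (1 << j)))
        rest >>= 1
        j += 1
    return bundles


def bounded_knapsack(items, capacity):
    """items: list of (weight, value, count); returns the best total value."""
    best = [0] * (capacity + 1)
    for w, v, n in items:
        # Each bundle is treated as a single 0/1 item.
        for bw, bv in split_item(w, v, n):
            for c in range(capacity, bw - 1, -1):
                best[c] = max(best[c], best[c - bw] + bv)
    return best[capacity]


print(split_item(1, 1, 5))
print(bounded_knapsack([(2, 3, 2), (3, 4, 3)], 7))
```

For n = 5 the bundles hold 1, 2 and 2 copies, and every count from 0 to 5 can be formed from some subset of them, which is exactly what the proof requires.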
Finally, there is even an approach mentioned here that runs in O(number of items * maximum weight) time.
The algorithm is similar to the brute force method ic3b3rg proposed and just uses simple DP optimizations and sliding window deque to bring down the run time.
My code was tested on this problem (classical bounded knapsack problem): https://dmoj.ca/problem/knapsack
My code: https://pastebin.com/acezMrMY
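For readers who cannot open the paste, the sliding-window idea can be sketched in Python roughly like this. It is not the linked code, only an illustration under the assumption that items are `(weight, value, count)` tuples with positive integer weights; the name `bounded_knapsack_deque` is made up here.

```python
from collections import deque


def bounded_knapsack_deque(items, capacity):
    """Bounded knapsack in O(number of items * capacity) time."""
    best = [0] * (capacity + 1)
    for w, v, n in items:
        prev = best[:]
        # Capacities with the same remainder modulo w form one chain.
        for r in range(min(w, capacity + 1)):
            # Holds (q, prev[r + q*w] - q*v), with values decreasing
            # from front to back, so the front is the window maximum.
            window = deque()
            q = 0
            for c in range(r, capacity + 1, w):
                cand = prev[c] - q * v
                while window and window[-1][1] <= cand:
                    window.pop()
                window.append((q, cand))
                # Drop entries that would need more than n copies.
                while window[0][0] < q - n:
                    window.popleft()
                best[c] = window[0][1] + q * v
                q += 1
    return best[capacity]


print(bounded_knapsack_deque([(2, 3, 2), (3, 4, 3)], 7))
```

Each capacity enters and leaves the deque at most once per item, which is where the linear cost per item comes from.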
I posted an article on Code Project which discusses a more efficient solution to the bounded knapsack algorithm.
From the article:
In the dynamic programming solution, each position of the m array is a
sub-problem of capacity j. In the 0/1 algorithm, for each sub-problem
we consider the value of adding one copy of each item to the knapsack.
In the following algorithm, for each sub-problem we consider the value
of adding the lesser of the quantity that will fit, or the quantity
available of each item.
I've also enhanced the code so that we can determine what's in the
optimized knapsack (as opposed to just the optimized value).
ItemCollection[] ic = new ItemCollection[capacity + 1];
for(int i=0;i<=capacity;i++) ic[i] = new ItemCollection();
for(int i=0;i<items.Count;i++)
for(int j=capacity;j>=0;j--)
if(j >= items[i].Weight) {
int quantity = Math.Min(items[i].Quantity, j / items[i].Weight);
for(int k=1;k<=quantity;k++) {
ItemCollection lighterCollection = ic[j - k * items[i].Weight];
int testValue = lighterCollection.TotalValue + k * items[i].Value;
if(testValue > ic[j].TotalValue) (ic[j] = lighterCollection.Copy()).AddItem(items[i],k);
}
}
private class Item {
public string Description;
public int Weight;
public int Value;
public int Quantity;
public Item(string description, int weight, int value, int quantity) {
Description = description;
Weight = weight;
Value = value;
Quantity = quantity;
}
}
private class ItemCollection {
public Dictionary<string,int> Contents = new Dictionary<string,int>();
public int TotalValue;
public int TotalWeight;
public void AddItem(Item item,int quantity) {
if(Contents.ContainsKey(item.Description)) Contents[item.Description] += quantity;
else Contents[item.Description] = quantity;
TotalValue += quantity * item.Value;
TotalWeight += quantity * item.Weight;
}
public ItemCollection Copy() {
var ic = new ItemCollection();
ic.Contents = new Dictionary<string,int>(this.Contents);
ic.TotalValue = this.TotalValue;
ic.TotalWeight = this.TotalWeight;
return ic;
}
}
The download in the Code Project article includes a test case.
First, store all your data in a single array (with repetition).
Then use the 1st method mentioned in the Wikipedia article(1-0).
For example, trying a bounded knapsack with { 2 (2 times), 4(3 times),...} is equivalent to solving a 1-0 knapsack with {2, 2, 4, 4, 4,...}.
I suggest using the greedy method for the fractional knapsack problem. Its complexity is O(n log n), and it is one of the best algorithms.
Its code in C# is given below.
private static void Knapsack()
{
Console.WriteLine("************Knapsack***************");
Console.WriteLine("Enter no of items");
int _noOfItems = Convert.ToInt32(Console.ReadLine());
int[] itemArray = new int[_noOfItems];
int[] weightArray = new int[_noOfItems];
int[] priceArray = new int[_noOfItems];
int[] fractionArray=new int[_noOfItems];
for(int i=0;i<_noOfItems;i++)
{
Console.WriteLine("[Item"+" "+(i+1)+"]");
Console.WriteLine("");
Console.WriteLine("Enter the Weight");
weightArray[i] = Convert.ToInt32(Console.ReadLine());
Console.WriteLine("Enter the Price");
priceArray[i] = Convert.ToInt32(Console.ReadLine());
Console.WriteLine("");
itemArray[i] = i+1 ;
}//for loop
int temp;
Console.WriteLine(" ");
Console.WriteLine("ITEM" + " " + "WEIGHT" + " "+"PRICE");
Console.WriteLine(" ");
for(int i=0;i<_noOfItems;i++)
{
Console.WriteLine("Item"+" "+(i+1)+" "+weightArray[i]+" "+priceArray[i]);
Console.WriteLine(" ");
}//For Loop For Printing the value.......
//Caluclating Fraction for the Item............
for(int i=0;i<_noOfItems;i++)
{
fractionArray[i] = (priceArray[i] / weightArray[i]);
}
Console.WriteLine("Testing.............");
//sorting the Item on the basis of fraction value..........
//Bubble Sort To Sort the Process Priority
for (int i = 0; i < _noOfItems; i++)
{
for (int j = i + 1; j < _noOfItems; j++)
{
if (fractionArray[j] > fractionArray[i])
{
//item Array
temp = itemArray[j];
itemArray[j] = itemArray[i];
itemArray[i] = temp;
//Weight Array
temp = weightArray[j];
weightArray[j] = weightArray[i];
weightArray[i] = temp;
//Price Array
temp = priceArray[j];
priceArray[j] = priceArray[i];
priceArray[i] = temp;
//Fraction Array
temp = fractionArray[j];
fractionArray[j] = fractionArray[i];
fractionArray[i] = temp;
}//if
}//Inner for
}//outer For
// Printing its value..............After Sorting..............
Console.WriteLine(" ");
Console.WriteLine("ITEM" + " " + "WEIGHT" + " " + "PRICE" + " "+"Fraction");
Console.WriteLine(" ");
for (int i = 0; i < _noOfItems; i++)
{
Console.WriteLine("Item" + " " + (itemArray[i]) + " " + weightArray[i] + " " + priceArray[i] + " "+fractionArray[i]);
Console.WriteLine(" ");
}//For Loop For Printing the value.......
Console.WriteLine("");
Console.WriteLine("Enter the Capacity of Knapsack");
int _capacityKnapsack = Convert.ToInt32(Console.ReadLine());
// Creating the valuse for Solution
int k=0;
int fractionvalue = 0;
int[] _takingItemArray=new int[100];
int sum = 0,_totalPrice=0;
int l = 0;
int _capacity = _capacityKnapsack;
do
{
if(k>=_noOfItems)
{
k = 0;
}
if (_capacityKnapsack >= weightArray[k])
{
_takingItemArray[l] = weightArray[k];
_capacityKnapsack = _capacityKnapsack - weightArray[k];
_totalPrice += priceArray[k];
k++;
l++;
}
else
{
fractionvalue = fractionArray[k];
_takingItemArray[l] = _capacityKnapsack;
_totalPrice += _capacityKnapsack * fractionArray[k];
k++;
l++;
}
sum += _takingItemArray[l-1];
} while (sum != _capacity);
Console.WriteLine("");
Console.WriteLine("Value in Kg Are............");
Console.WriteLine("");
for (int i = 0; i < _takingItemArray.Length; i++)
{
if(_takingItemArray[i]!=0)
{
Console.WriteLine(_takingItemArray[i]);
Console.WriteLine("");
}
else
{
break;
}
}//for loop
Console.WriteLine("Total Value is "+_totalPrice);
}//Method
We can use 0/1 knapsack algorithm with tracking # of items left for each item;
We could do the same on unbounded knapsack algorithm to solve bounded knapsack problem also.

Finding the number of digits of an integer

What is the best method to find the number of digits of a positive integer?
I have found these 3 basic methods:
1. conversion to string
String s = new Integer(t).toString();
int len = s.length();
2. for loop
for(long long int temp = number; temp >= 1;)
{
temp/=10;
decimalPlaces++;
}
3. logarithmic calculation
digits = floor( log10( number ) ) + 1;
where you can calculate log10(x) = ln(x) / ln(10) in most languages.
At first I thought the string method was the dirtiest one, but the more I think about it, the more I think it's the fastest way. Or is it?
There's always this method:
n = 1;
if ( i >= 100000000 ) { n += 8; i /= 100000000; }
if ( i >= 10000 ) { n += 4; i /= 10000; }
if ( i >= 100 ) { n += 2; i /= 100; }
if ( i >= 10 ) { n += 1; }
Well the correct answer would be to measure it - but you should be able to make a guess about the number of CPU steps involved in converting strings and going through them looking for an end marker
Then think how many FPU operations/s your processor can do and how easy it is to calculate a single log.
edit: wasting some more time on a monday morning :-)
String s = new Integer(t).toString();
int len = s.length();
One of the problems with high level languages is guessing how much work the system is doing behind the scenes of an apparently simple statement. Mandatory Joel link
This statement involves allocating memory for a string, and possibly a couple of temporary copies of a string. It must parse the integer and copy the digits of it into a string, possibly having to reallocate and move the existing memory if the number is large. It might have to check a bunch of locale settings to decide if your country uses "," or ".", it might have to do a bunch of unicode conversions.
Then finding the length has to scan the entire string, again considering unicode and any locale-specific settings such as: are you in a right->left language?
Alternatively:
digits = floor( log10( number ) ) + 1;
Just because this would be harder for you to do on paper doesn't mean it's hard for a computer! In fact a good rule in high performance computing seems to have been - if something is hard for a human (fluid dynamics, 3d rendering) it's easy for a computer, and if it's easy for a human (face recognition, detecting a voice in a noisy room) it's hard for a computer!
You can generally assume that the builtin maths functions (log/sin/cos etc.) have been an important part of computer design for 50 years. So even if they don't map directly into a hardware function in the FPU, you can bet that the alternative implementation is pretty efficient.
I don't know, and the answer may well be different depending on how your individual language is implemented.
So, stress test it! Implement all three solutions. Run them on 1 through 1,000,000 (or some other huge set of numbers that's representative of the numbers the solution will be running against) and time how long each of them takes.
Pit your solutions against one another and let them fight it out. Like intellectual gladiators. Three algorithms enter! One algorithm leaves!
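A minimal sketch of such a comparison in Python, assuming positive integers only. The function names and the range of test numbers are illustrative choices, and the timings it prints will differ between machines and interpreters.

```python
import math
import timeit


def digits_str(n):
    # Method 1: convert to a string and take its length.
    return len(str(n))


def digits_div(n):
    # Method 2: repeatedly divide by 10 (integer division).
    count = 0
    while n >= 1:
        n //= 10
        count += 1
    return count


def digits_log(n):
    # Method 3: floor(log10(n)) + 1.
    return int(math.floor(math.log10(n))) + 1


def compare(numbers):
    """Return {function name: seconds} for one pass over numbers."""
    results = {}
    for func in (digits_str, digits_div, digits_log):
        results[func.__name__] = timeit.timeit(
            lambda: [func(n) for n in numbers], number=1)
    return results


if __name__ == "__main__":
    for name, seconds in compare(range(1, 100001)).items():
        print(name, round(seconds, 4))
```

Running it against a range that resembles the real input matters more than the absolute numbers, since the ranking can change with the size of the integers.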
Test conditions
Decimal numeral system
Positive integers
Up to 10 digits
Language: ActionScript 3
Results
digits: [1,10],
no. of runs: 1,000,000
random sample: 8777509,40442298,477894,329950,513,91751410,313,3159,131309,2
result: 7,8,6,6,3,8,3,4,6,1
CONVERSION TO STRING: 724ms
LOGARITMIC CALCULATION: 349ms
DIV 10 ITERATION: 229ms
MANUAL CONDITIONING: 136ms
Note: Author refrains from making any conclusions for numbers with more than 10 digits.
Script
package {
import flash.display.MovieClip;
import flash.utils.getTimer;
/**
* @author Daniel
*/
public class Digits extends MovieClip {
private const NUMBERS : uint = 1000000;
private const DIGITS : uint = 10;
private var numbers : Array;
private var digits : Array;
public function Digits() {
// ************* NUMBERS *************
numbers = [];
for (var i : int = 0; i < NUMBERS; i++) {
var number : Number = Math.floor(Math.pow(10, Math.random()*DIGITS));
numbers.push(number);
}
trace('Max digits: ' + DIGITS + ', count of numbers: ' + NUMBERS);
trace('sample: ' + numbers.slice(0, 10));
// ************* CONVERSION TO STRING *************
digits = [];
var time : Number = getTimer();
for (var i : int = 0; i < numbers.length; i++) {
digits.push(String(numbers[i]).length);
}
trace('\nCONVERSION TO STRING - time: ' + (getTimer() - time));
trace('sample: ' + digits.slice(0, 10));
// ************* LOGARITMIC CALCULATION *************
digits = [];
time = getTimer();
for (var i : int = 0; i < numbers.length; i++) {
digits.push(Math.floor( Math.log( numbers[i] ) / Math.log(10) ) + 1);
}
trace('\nLOGARITMIC CALCULATION - time: ' + (getTimer() - time));
trace('sample: ' + digits.slice(0, 10));
// ************* DIV 10 ITERATION *************
digits = [];
time = getTimer();
var digit : uint = 0;
for (var i : int = 0; i < numbers.length; i++) {
digit = 0;
for(var temp : Number = numbers[i]; temp >= 1;)
{
temp/=10;
digit++;
}
digits.push(digit);
}
trace('\nDIV 10 ITERATION - time: ' + (getTimer() - time));
trace('sample: ' + digits.slice(0, 10));
// ************* MANUAL CONDITIONING *************
digits = [];
time = getTimer();
var digit : uint;
for (var i : int = 0; i < numbers.length; i++) {
var number : Number = numbers[i];
if (number < 10) digit = 1;
else if (number < 100) digit = 2;
else if (number < 1000) digit = 3;
else if (number < 10000) digit = 4;
else if (number < 100000) digit = 5;
else if (number < 1000000) digit = 6;
else if (number < 10000000) digit = 7;
else if (number < 100000000) digit = 8;
else if (number < 1000000000) digit = 9;
else if (number < 10000000000) digit = 10;
digits.push(digit);
}
trace('\nMANUAL CONDITIONING: ' + (getTimer() - time));
trace('sample: ' + digits.slice(0, 10));
}
}
}
This algorithm might also be good, assuming that:
The number is an integer and binary encoded (the << operation is cheap)
We don't know the number's bounds
var num = 123456789L;
var len = 0;
var tmp = 1L;
while(tmp <= num) // <= so that exact powers of ten (10, 100, ...) are counted correctly
{
len++;
tmp = (tmp << 3) + (tmp << 1);
}
This algorithm should have speed comparable to the for-loop (2) provided above, but be a bit faster, since it uses two bit shifts and an add instead of a division.
As for the Log10 algorithm, it will give you only an approximate answer (close to the real one, but still approximate), since the log function is computed from an infinite series that cannot be evaluated exactly (see Wikipedia).
Use the simplest solution in whatever programming language you're using. I can't think of a case where counting digits in an integer would be the bottleneck in any (useful) program.
C, C++:
char buffer[32];
int length = sprintf(buffer, "%ld", (long)123456789);
Haskell:
len = (length . show) 123456789
JavaScript:
length = String(123456789).length;
PHP:
$length = strlen(123456789);
Visual Basic (untested):
length = Len(str(123456789)) - 1
conversion to string: This will have to iterate through each digit, find the character that maps to the current digit, add a character to a collection of characters. Then get the length of the resulting String object. Will run in O(n) for n=#digits.
for-loop: will perform 2 mathematical operation: dividing the number by 10 and incrementing a counter. Will run in O(n) for n=#digits.
logarithmic: Will call log10 and floor, and add 1. Looks like O(1) but I'm not really sure how fast the log10 or floor functions are. My knowledge of this sort of things has atrophied with lack of use so there could be hidden complexity in these functions.
So I guess it comes down to: is looking up digit mappings faster than multiple mathematical operations or whatever is happening in log10? The answer will probably vary. There could be platforms where the character mapping is faster, and others where doing the calculations is faster. Also keep in mind that the first method creates a new String object that only exists for the purpose of getting the length. This will probably use more memory than the other two methods, but it may or may not matter.
You can obviously eliminate method 1 from the competition, because the itoa/toString algorithm it uses would be similar to method 2.
Method 3's speed depends on whether the code is being compiled for a system whose instruction set includes log base 10.
For very large integers, the log method is much faster. For instance, with a 2491327 digit number (the 11920928th Fibonacci number, if you care), Python takes several minutes to execute the divide-by-10 algorithm, and milliseconds to execute 1+floor(log(n,10)).
import math
def numdigits(n):
    return ( int(math.floor(math.log10(n))) + 1 )
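One caveat with the log approach is that log10 works in floating point, so the raw formula can be off by one for values very close to a power of ten. A hedged sketch that uses log10 only as an estimate and then corrects it with exact integer comparisons (the name `num_digits_checked` is made up for this illustration, and n is assumed to be a positive integer):

```python
import math


def num_digits_checked(n):
    """Digit count of a positive int: log10 estimate plus exact fix-up."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    d = int(math.floor(math.log10(n))) + 1   # fast estimate
    # Correct the estimate with exact integer arithmetic.
    while 10 ** (d - 1) > n:
        d -= 1
    while 10 ** d <= n:
        d += 1
    return d


print(num_digits_checked(10 ** 15 - 1))
print(num_digits_checked(10 ** 15))
```

The correction loops normally run zero times, so the cost stays close to that of the plain log version while the result no longer depends on how the logarithm was rounded.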
Regarding the three methods you propose for "determining the number of digits necessary to represent a given number in a given base", I don't like any of them, actually; I prefer the method I give below instead.
Re your method #1 (strings): Anything involving converting back-and-forth between strings and numbers is usually very slow.
Re your method #2 (temp/=10): This is fatally flawed because it assumes that x/10 always means "x divided by 10". But in many programming languages (eg: C, C++), if "x" is an integer type, then "x/10" means "integer division", which isn't the same thing as floating-point division, and it introduces round-off errors at every iteration, and they accumulate in a recursive formula such as your solution #2 uses.
Re your method #3 (logs): it's buggy for large numbers (at least in C, and probably other languages as well), because floating-point data types tend not to be as precise as 64-bit integers.
Hence I dislike all 3 of those methods: #1 works but is slow, #2 is broken, and #3 is buggy for large numbers. Instead, I prefer this, which works for numbers from 0 up to about 18.44 quintillion:
unsigned NumberOfDigits (uint64_t Number, unsigned Base)
{
unsigned Digits = 1;
uint64_t Power = 1;
while ( Number / Power >= Base )
{
++Digits;
Power *= Base;
}
return Digits;
}
Keep it simple:
long long int a = 223452355415634664;
int x;
for (x = 1; a >= 10; x++)
{
a = a / 10;
}
printf("%d", x);
You can use a recursive solution instead of a loop, which is otherwise similar:
@tailrec
def digits (i: Long, carry: Int=1) : Int = if (i < 10) carry else digits (i/10, carry+1)
digits (8345012978643L)
With longs, the picture might change - measure small and long numbers independently against different algorithms, and pick the appropriate one, depending on your typical input. :)
Of course nothing beats a switch:
switch (x) {
case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7: case 8: case 9: return 1;
case 10: case 11: // ...
case 99: return 2;
case 100: // you get the point :)
default: return 10; // switch only over int
}
except a plain-o-array:
int [] size = {1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,... };
int x = 234561798;
return size [x];
Some people will tell you to optimize the code-size, but yaknow, premature optimization ...
log(x,n)-mod(log(x,n),1)+1
where x is the base and n is the number.
Here is the measurement in Swift 4.
Algorithms code:
extension Int {
var numberOfDigits0: Int {
var currentNumber = self
var n = 1
if (currentNumber >= 100000000) {
n += 8
currentNumber /= 100000000
}
if (currentNumber >= 10000) {
n += 4
currentNumber /= 10000
}
if (currentNumber >= 100) {
n += 2
currentNumber /= 100
}
if (currentNumber >= 10) {
n += 1
}
return n
}
var numberOfDigits1: Int {
return String(self).count
}
var numberOfDigits2: Int {
var n = 1
var currentNumber = self
while currentNumber > 9 {
n += 1
currentNumber /= 10
}
return n
}
}
Measurement code:
var timeInterval0 = Date()
for i in 0...10000 {
i.numberOfDigits0
}
print("timeInterval0: \(Date().timeIntervalSince(timeInterval0))")
var timeInterval1 = Date()
for i in 0...10000 {
i.numberOfDigits1
}
print("timeInterval1: \(Date().timeIntervalSince(timeInterval1))")
var timeInterval2 = Date()
for i in 0...10000 {
i.numberOfDigits2
}
print("timeInterval2: \(Date().timeIntervalSince(timeInterval2))")
Output
timeInterval0: 1.92149806022644
timeInterval1: 0.557608008384705
timeInterval2: 2.83262193202972
On this measurement basis String conversion is the best option for the Swift language.
I was curious after seeing @daniel.sedlacek's results, so I did some testing using Swift for numbers having more than 10 digits. I ran the following script in the playground.
let base = [Double(100090000000), Double(100050000), Double(100050000), Double(100000200)]
var rar = [Double]()
for i in 1...10 {
for d in base {
let v = d*Double(arc4random_uniform(UInt32(1000000000)))
rar.append(v*Double(arc4random_uniform(UInt32(1000000000))))
rar.append(Double(1)*pow(1,Double(i)))
}
}
print(rar)
var timeInterval = NSDate().timeIntervalSince1970
for d in rar {
floor(log10(d))
}
var newTimeInterval = NSDate().timeIntervalSince1970
print(newTimeInterval-timeInterval)
timeInterval = NSDate().timeIntervalSince1970
for d in rar {
var c = d
while c > 10 {
c = c/10
}
}
newTimeInterval = NSDate().timeIntervalSince1970
print(newTimeInterval-timeInterval)
Results of 80 elements
0.105069875717163 for floor(log10(x))
0.867973804473877 for div 10 iterations
Adding one more approach to many of the already mentioned approaches.
The idea is to use binarySearch on an array containing the range of integers based on the digits of the int data type.
The signature of Java Arrays class binarySearch is :
binarySearch(dataType[] array, dataType key) which returns the index of the search key, if it is contained in the array; otherwise, (-(insertion point) – 1).
The insertion point is defined as the point at which the key would be inserted into the array.
Below is the implementation:
static int [] digits = {9,99,999,9999,99999,999999,9999999,99999999,999999999,Integer.MAX_VALUE};
static int digitsCounter(int N)
{
int digitCount = Arrays.binarySearch(digits , N<0 ? -N:N);
return 1 + (digitCount < 0 ? ~digitCount : digitCount);
}
Please note that the above approach only works for Integer.MIN_VALUE < N <= Integer.MAX_VALUE (negating Integer.MIN_VALUE overflows), but it can easily be extended to the Long data type by adding more values to the digits array.
For example,
I) for N = 555, digitCount = Arrays.binarySearch(digits , 555) returns -3 (-(2)-1), as 555 is not present in the array but would be inserted at index 2, between 99 & 999, like [9, 99, 555, 999].
As the index we got is negative, we need to take the bitwise complement of the result.
Finally, we need to add 1 to the result to get the actual number of digits in the number N.
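The same lookup can be sketched in Python with the standard `bisect` module. This is an illustrative translation, not the Java code above: the names `THRESHOLDS` and `digits_counter` are made up, and the table is assumed to only need to cover values with at most 19 digits.

```python
from bisect import bisect_left

# Largest value with 1, 2, ..., 18 digits: 9, 99, 999, ...
THRESHOLDS = [10 ** d - 1 for d in range(1, 19)]


def digits_counter(n):
    """Digit count of n via binary search; assumes abs(n) < 10**19."""
    n = abs(n)
    # bisect_left returns the index of the first threshold >= n,
    # which is one less than the number of digits.
    return bisect_left(THRESHOLDS, n) + 1


print(digits_counter(555))
```

Because `bisect_left` always returns the insertion index directly, there is no negative result to complement as in the Java version.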
In Swift 5.x, you can get the number of digits in an integer as below:
Convert to a string and then count the number of characters in the string
let nums = [1, 7892, 78, 92, 90]
for i in nums {
let ch = String(describing: i)
print(ch.count)
}
Calculate the number of digits in an integer using a loop
for i in nums {
var digitCount = 0 // reset for every number
var tmp = i
while tmp >= 1 {
tmp /= 10
digitCount += 1
}
print(digitCount)
}
let numDigits num =
    let num = abs(num)
    let rec numDigitsInner num =
        match num with
        | num when num < 10 -> 1
        | _ -> 1 + numDigitsInner (num / 10)
    numDigitsInner num
F# Version, without casting to a string.

Resources