How to count distinct elements of an array in one pass? - data-structures

Can someone give me an algorithm to count the distinct elements of an array of integers in one pass?
For example, I can traverse the array with a for loop.
I will store the first element in another array. Each subsequent element will be compared with those in the second array, and if it is distinct, I will store it in that array and increment a counter.
Can someone give me a better algorithm than this?
I am using C and C++.

This assumes that your elements are integers and that their values are between 0 and MAXVAL-1.
#include <stdio.h>
#include <string.h>
#define MAXVAL 50
unsigned int CountDistinctsElements(unsigned int* iArray, unsigned int iNbElem) {
unsigned int ret = 0;
//this array will contain the count of each value
//for example, c[3] will contain the count of the value 3 in your original array
unsigned int c[MAXVAL];
memset(c, 0, MAXVAL*sizeof(unsigned int));
for (unsigned int i=0; i<iNbElem; i++) {
unsigned int elem = iArray[i];
if (elem >= MAXVAL) {
continue; //skip values outside the supported range instead of writing past the end of c
}
if (c[elem] == 0) {
ret++;
}
c[elem]++;
}
return ret;
}
int main() {
unsigned int myElements[10] = {0, 25, 42, 42, 1, 2, 42, 0, 24, 24};
printf("Distincts elements : %u\n", CountDistinctsElements(myElements, 10));
return 0;
}
Output:
Distincts elements : 6

Maintain an array of structures.
Each structure should hold a value and a counter for that value.
When you encounter a new element in the array being tested, create a structure with that value and set its counter to 1. If you encounter an element you have already seen, simply access the related structure and increment its counter by 1.
After one complete pass over the array, you will have the required result in the array of structures: the number of structures is the number of distinct elements.

Edit: I wasn't aware you wanted just to count the elements. Updated code below.
#include <vector>
bool doesUniqueContain(int oneElement, int counter, const int *uniqueArray)
{
    //With counter == 0 the loop body never runs, so an empty array contains nothing.
    for(int i = 0; i < counter; i++)
    {
        if(uniqueArray[i] == oneElement)
            return true;
    }
    return false;
}
int countUnique(const int *myArray, int numElements)
{
    std::vector<int> uniqueArray(numElements); //holds the distinct values found so far
    int uniqueElements = 0; //number of distinct values, also the next free slot in uniqueArray
    for(int i = 0; i < numElements; i++)
    {
        int tempElem = myArray[i];
        if(!doesUniqueContain(tempElem, uniqueElements, uniqueArray.data())) //If it doesn't contain it
        {
            uniqueArray[uniqueElements] = tempElem;
            uniqueElements++;
        }
    }
    return uniqueElements;
}
This is only so you can see the logic

How about using a hash table (in the Java HashMap or C# Dictionary sense) to count the elements? Basically you create an empty hash table with the array element type as the key type and the count as values. Then you iterate over your list. If the element is not yet in the hash table, you add it with count 1, otherwise you increment the count for that element.
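In C++ the hash table idea can be sketched with std::unordered_map. This is only an illustration under the assumption that the input is a std::vector<int>; the function name countDistinct is made up for the example.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical helper: counts how often each value occurs in a single pass.
// The number of keys in the map is the number of distinct values.
std::size_t countDistinct(const std::vector<int>& values) {
    std::unordered_map<int, std::size_t> counts;
    for (int v : values) {
        ++counts[v];  // inserts the key with count 0 on first sight, then increments
    }
    return counts.size();
}
```

If the per-value counts are not needed, a std::unordered_set<int> would do the same job with less memory.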

Related

array index 2 is past the end of the array

The string array d[2] should have 3 elements, but it seems that it can contain only 2. The 3rd element is not being stored in the array. What is the reason? Does it have anything to do with the memory allocation which I have done with the new operator?
#include <iostream>
#include <string>
class A
{
public:
A()
{
std::string d[2];
d[0] = "Dilshdur";
d[1] = "Dilshad";
d[2] = "Dolon";
for(int i=0; i<3; i++)
{
std::cout<<d[i]<<std::endl;
}
}
};
int main()
{
A *p;
p = new A;
return 0;
}
You seem to have missed something when learning about arrays: the size you provide when defining an array is the number of elements, not the top index.
So
std::string d[2];
will define d as an array of two elements, with the indexes 0 and 1. Writing to d[2] is out of bounds and leads to undefined behavior.
If you don't know the number of elements beforehand, then use std::vector, as it will allow you to add elements dynamically at run-time.
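A minimal sketch of the std::vector suggestion, reusing the three names from the question. The function name makeNames is hypothetical and only there to keep the example self-contained.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: builds the list without fixing its size in advance.
std::vector<std::string> makeNames() {
    std::vector<std::string> d;  // starts empty and grows as elements are added
    d.push_back("Dilshdur");
    d.push_back("Dilshad");
    d.push_back("Dolon");
    return d;
}
```

Accessing elements with d.at(i) instead of d[i] throws std::out_of_range on a bad index, which would have made the original mistake visible immediately.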

iterate through a set goes to infinite loop

I used almost exactly the same code in both of my files.
One works properly, while the other one (this one) goes into an endless loop.
int arr[5] = {3, 1, 3, 5, 6};
int main() {
int T = 1;
set<int> s;
for (int tc = 0; tc < T; tc++) {
s.emplace(0);
for (auto x : arr) {
auto end = s.end();
for (auto it = s.begin(); it != end; it++) {
// here's where goes to infinite loop
// and i couldn't figure out why..
s.emplace(*it+x);
}
}
}
return 0;
}
The one below works fine:
using namespace std;
int main() {
int arr[5] = {3,1,3,5,6}, sum=20;
set<int> s;
s.emplace(sum);
for (auto x : arr) {
auto end = s.end();
for (auto it = s.begin(); it != end; it++) {
s.emplace(*it-x);
}
}
return 0;
}
The expected result is s = {1, 4, 7, 8, ...},
i.e. the sums of all the subsets of arr.
But it is not working properly, and I don't know why.
The issue is that you're inserting elements into the set while iterating over it (in the inner iterator loop). The loop does not work on a snapshot of the set taken before it started: the end iterator you saved still means "one past the last element", whatever that element is after your insertions, so the loop behaves just like writing:
for(auto it = std::begin(container); it != std::end(container); it++)
Now, std::set is an ordered container. So when you insert/emplace elements smaller than the one your iterator points at, you won't see them later on in the iteration; but if you insert larger elements, you will see them. With s.emplace(*it+x) and a positive x, every new element is larger than *it, so you end up iterating over elements you've just inserted, infinitely. Your other program inserts *it-x, which lands behind the iterator, and that is why it terminates.
What you should probably be doing is not emplace new elements into s during the iteration, but rather place them in some other container, then finally dump all of that new container's contents into the set (e.g. with an std::inserter to the set and an std::copy).
(Also, in general, all of your code seems kind of suspect, i.e. I doubt you really want to do any of this stuff in the first place.)
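The advice to stage new elements in a separate container could look roughly like this. It is a sketch that assumes the goal is the set of all subset sums, as stated in the question; the name subsetSums is invented for the example.

```cpp
#include <set>
#include <vector>

// Hypothetical helper: collects all subset sums of `values`.
// New sums are staged in `pending`, so the set is never modified
// while it is being iterated.
std::set<int> subsetSums(const std::vector<int>& values) {
    std::set<int> sums{0};  // the empty subset
    for (int x : values) {
        std::vector<int> pending;
        pending.reserve(sums.size());
        for (int s : sums) {
            pending.push_back(s + x);
        }
        sums.insert(pending.begin(), pending.end());
    }
    return sums;
}
```

Calling subsetSums({3, 1, 3, 5, 6}) terminates because each inner loop only walks the sums that existed before the current x was applied.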

Make unique array with minimal sum

It is an interview question. Given an array, e.g., [3,2,1,2,7], we want to make all elements in this array unique by incrementing duplicate elements, and we require that the sum of the resulting array be minimal. For example, the answer for [3,2,1,2,7] is [3,2,1,4,7] and its sum is 17. Any ideas?
It's not quite as simple as my earlier comment suggested, but it's not terrifically complicated.
First, sort the input array. If it matters to be able to recover the original order of the elements then record the permutation used for the sort.
Second, scan the sorted array from left to right (ie from low to high). If an element is less than or equal to the element to its left, set it to be one greater than that element.
Pseudocode
sar = sort(input_array)
for index = 2:size(sar) ! I count from 1
if sar(index)<=sar(index-1) sar(index) = sar(index-1)+1
forend
Is the sum of the result minimal? I've convinced myself that it is through some head-scratching and trials, but I haven't got a formal proof.
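The sort-and-scan idea from the pseudocode can be written in C++ roughly as follows. This is a sketch that returns the values in sorted order and does not record the permutation needed to restore the original order; the name makeUniqueMinSum is made up for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: sorts a copy of the input, then lifts every element
// that is not greater than its left neighbour to "neighbour + 1".
std::vector<int> makeUniqueMinSum(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    for (std::size_t i = 1; i < values.size(); ++i) {
        if (values[i] <= values[i - 1]) {
            values[i] = values[i - 1] + 1;
        }
    }
    return values;
}
```

For the example [3,2,1,2,7] this produces 1, 2, 3, 4, 7, whose sum is 17, matching the expected answer.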
If you only need to find ONE of the best solutions, here's the algorithm with some explanations.
The idea is to find an optimal solution by brute force, testing the candidate solutions (there are infinitely many, so let's stick with the reasonable ones).
I wrote the program in C because I'm familiar with it, but you can port it to any language you want.
The program does this: it tries to increment one value up to the maximum possible (I'll explain how to find it in the comments under the code sections), then, if no solution is found, it decreases this value again and goes on with the next one, and so on.
It's an exponential algorithm, so it will be very slow when there are many duplicated values (the intent is that trying every option finds the best solution).
I tested this code with your example, and it worked; not sure if there's any bug left, but the code (in C) is this.
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
typedef int BOOL; //just to ease meanings of values
#define TRUE 1
#define FALSE 0
Just to ease comprehension, I did some typedefs. Don't worry.
typedef struct duplicate { //used to speed up the algorithm; it uses some more memory to remember which values were duplicates
int value;
BOOL duplicate;
} duplicate_t;
int maxInArrayExcept(int *array, int arraySize, int index); //finds the max value in the array, ignoring the value at the given index
int *findDuplicateSum(int *array, int arraySize);
BOOL findDuplicateSum_R(duplicate_t *array, int arraySize, int *tempSolution, int *solution, int *totalSum, int currentSum); //recursive function used to find the solution
BOOL check(int *array, int arraySize); //checks if there's any repeated value in the solution
These are all the functions we'll need, split up to make them easier to follow.
First, we have a struct. It is used to avoid checking, on every iteration, whether the value at a given index was originally a duplicate. We don't want to modify any value that was not originally duplicated.
Then, we have a couple of functions. The first one covers the worst-case scenario, where every value after the duplicated ones is already taken: then we need to increment the duplicated value up to the maximum value reached + 1.
Then there are the main functions, which we'll discuss later.
The check function only checks whether a temporary solution contains any duplicated value.
int main() { //testing purpose
int i;
int testArray[] = { 3,2,1,2,7 }; //test array
int nTestArraySize = 5; //test array size
int *solutionArray; //needed if you want to use the solution later
solutionArray = findDuplicateSum(testArray, nTestArraySize);
for (i = 0; i < nTestArraySize; ++i) {
printf("%d ", solutionArray[i]);
}
return 0;
}
This is the main Function: I used it to test everything.
int * findDuplicateSum(int * array, int arraySize)
{
int *solution = malloc(sizeof(int) * arraySize);
int *tempSolution = malloc(sizeof(int) * arraySize);
duplicate_t *duplicate = calloc(arraySize, sizeof(duplicate_t));
int i, j, currentSum = 0, totalSum = INT_MAX;
for (i = 0; i < arraySize; ++i) {
tempSolution[i] = solution[i] = duplicate[i].value = array[i];
currentSum += array[i];
for (j = 0; j < i; ++j) { //to find ALL the best solutions, we should also mark the first occurrence as a duplicate; it's just one more line
//leaving it unmarked halves the number of values the algorithm has to try (in the best case, which is this one)
if (array[j] == duplicate[i].value) {
duplicate[i].duplicate = TRUE;
}
}
}
if (!findDuplicateSum_R(duplicate, arraySize, tempSolution, solution, &totalSum, currentSum)) {
printf("No solution found\n");
}
free(tempSolution);
free(duplicate);
return solution;
}
This function does a lot of things: first, it sets up the solution array, then it initializes both the solution values and the duplicate array, which is the one used to flag the values that were duplicated at startup. Then, we compute the current sum and set the best sum found so far to the maximum possible integer.
Then the recursive function is called; it tells us whether a solution was found (that should always be the case), and we return the solution as an array.
int findDuplicateSum_R(duplicate_t * array, int arraySize, int * tempSolution, int * solution, int * totalSum, int currentSum)
{
int i;
if (check(tempSolution, arraySize)) {
if (currentSum < *totalSum) { //optimal solution checking
for (i = 0; i < arraySize; ++i) {
solution[i] = tempSolution[i];
}
*totalSum = currentSum;
}
return TRUE; //just to ensure a solution is found
}
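BOOL found = FALSE;
for (i = 0; i < arraySize; ++i) {
if (array[i].duplicate == TRUE) {
//worst case scenario: stop raising a value once it exceeds every other value
//the second test is an added safety bound: no value needs more than arraySize - 1 increments, so the recursion always ends
if (tempSolution[i] <= maxInArrayExcept(tempSolution, arraySize, i) && tempSolution[i] - array[i].value < arraySize - 1) {
tempSolution[i]++;
if (findDuplicateSum_R(array, arraySize, tempSolution, solution, totalSum, currentSum + 1)) {
found = TRUE; //keep searching: another branch may give a smaller sum
}
tempSolution[i]--; //backtracking
}
}
}
return found; //FALSE only if no branch reached a duplicate-free array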
}
This is the recursive function. It first checks whether the temporary solution is valid and whether it is the best one found so far. If both hold, it updates the actual solution with the temporary values and updates the best sum.
Then, we iterate over every repeated value (the if excludes the other indexes) and go deeper into the recursion until (if unlucky) we reach the worst-case scenario: the check condition is still not satisfied above the maximum value.
Then we have to backtrack and continue with the iteration, which will go on with the other values.
PS: an optimization is possible here, if we move the optimality test from the check into the for: if the current sum is already not better than the best one, we can't expect to find a better one just by adding things.
That was the hard part of the code; here are the supporting functions:
int maxInArrayExcept(int *array, int arraySize, int index) {
int i, max = INT_MIN; //start from INT_MIN so that arrays of negative values work too
for (i = 0; i < arraySize; ++i) {
if (i != index) {
if (array[i] > max) {
max = array[i];
}
}
}
return max;
}
BOOL check(int *array, int arraySize) {
int i, j;
for (i = 0; i < arraySize; ++i) {
for (j = 0; j < i; ++j) {
if (array[i] == array[j]) return FALSE;
}
}
return TRUE;
}
I hope this was useful.
Write if anything is unclear.
Well, I got the same question in one of my interviews.
Not sure if you still need it. But here's how I did it. And it worked well.
num_list1 = [2,8,3,6,3,5,3,5,9,4]
def UniqueMinSumArray(num_list):
    max=min(num_list)
    for i,V in enumerate(num_list):
        while (num_list.count(num_list[i])>1):
            if (max > num_list[i]+1) :
                num_list[i] = max + 1
            else:
                num_list[i]+=1
            max = num_list[i]
        i+=1
    return num_list
print (sum(UniqueMinSumArray(num_list1)))
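You can try it with your own list of numbers. For the list above it prints 65, which is the minimum. Be aware that jumping straight to max + 1 can overshoot on other inputs: for [5,5,1,1] it gives 19, while the minimum is 14, so use a sort-and-scan approach when you need a guaranteed minimum.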
I got the same interview question too. But my answer is in JS in case anyone is interested.
For sure it can be improved to get rid of for loop.
function getMinimumUniqueSum(arr) {
// [1,1,2] => [1,2,3] = 6
// [1,2,2,3,3] = [1,2,3,4,5] = 15
if (arr.length > 0) { // a single-element array is already unique, so its sum is that element
var sortedArr = [...arr].sort((a, b) => a - b);
var current = sortedArr[0];
var res = [current];
for (var i = 1; i + 1 <= arr.length; i++) {
// check current equals to the rest array starting from index 1.
if (sortedArr[i] > current) {
res.push(sortedArr[i]);
current = sortedArr[i];
} else if (sortedArr[i] == current) {
current = sortedArr[i] + 1;
// sortedArr[i]++;
res.push(current);
} else {
current++;
res.push(current);
}
}
return res.reduce((a,b) => a + b, 0);
} else {
return 0;
}
}

Simple Sort function not working. Suggestions?

void sort_records_by_id (int []indices, int []students_id )
{
for (int k = 1; k<students_id.length; k++)
{
for (int j = k; j>0 && students_id[j]<students_id[j-1]; j--)
{
int place_holder = indices[j];
indices[j] = indices [j-1];
indices[j-1] = place_holder;
}
}
}
Hi,
I have to create a function that sorts an array of integers, not by rearranging its contents, but by changing the order of the integers in another array called indices. So, I would have an array with a series of ids, which I will call "id" ([#] represents the index):
[0]10001 [1]2001 [2]12334 [3]14332 [4]999999 [5]10111
There is a corresponding array of integer values, which I will call "arr" ([#] is the index):
[0]0 [1]1 [2]2 [3]3 [4]4 [5]5
so that its values correspond to the indexes of the other array.
Now we must reorder "arr" so that its elements list the indexes of array id in the sorted order of the ids. Note that array id is not changed in any way.
So, we can print the ids to the console in ascending order, by using a for loop, the values of arr, and array id.
Please, I would really appreciate if you would be able to provide advice without creating a very complex function. I would just like to alter my existing function I created so that it works.
This is the output of my function so far:
Any input or suggestions would be greatly appreciated.
When indexing the students_id array, don't use j and j-1 directly; use indices[j] and indices[j-1] instead. That way you reorder the indices array while using the students_id array only to fetch the values to compare.
for (int j = k; j>0 && students_id[indices[j]]<students_id[indices[j-1]]; j--)
Also I would change loop into
void sort_records_by_id (int []indices, int []students_id )
{
for (int k = students_id.length - 1; k > 0; --k) // shrink the unsorted range from the end; each pass moves its largest id to position k
{
for (int j = 0; j<k; ++j)
{
if(students_id[indices[j]]>students_id[indices[j+1]]) {
int place_holder = indices[j];
indices[j] = indices [j+1];
indices[j+1] = place_holder;
}
}
}
}
The simplest bubble sort - that's what comes to my mind.
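The same "sort the indices, compare through the ids" idea is shown below as a C++ sketch rather than Java, using std::sort with a comparator instead of a hand-written loop. The function name sortedIndices is hypothetical, and the ids in the usage note are only illustrative values.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical helper: returns the positions of `ids` in ascending order of id.
// The `ids` vector itself is never modified.
std::vector<int> sortedIndices(const std::vector<int>& ids) {
    std::vector<int> indices(ids.size());
    std::iota(indices.begin(), indices.end(), 0);  // 0, 1, 2, ...
    std::sort(indices.begin(), indices.end(),
              [&ids](int a, int b) { return ids[a] < ids[b]; });
    return indices;
}
```

For ids {10001, 2001, 12334, 14332, 999999, 10111} the result is {1, 0, 5, 2, 3, 4}, so printing ids[indices[i]] for i = 0, 1, ... lists the ids in ascending order.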

Parallel radix sort with virtual memory and write-combining

I'm attempting to implement the variant of parallel radix sort described in http://arxiv.org/pdf/1008.2849v2.pdf (Algorithm 2), but my C++ implementation (for 4 digits in base 10) contains a bug that I'm unable to locate.
For debugging purposes I'm using no parallelism, but the code should still sort correctly.
For instance the line arr.at(i) = item accesses indices outside its bounds in the following
std::vector<int> v = {4612, 4598};
radix_sort2(v);
My implementation is as follows
#include <set>
#include <array>
#include <vector>
void radix_sort2(std::vector<int>& arr) {
std::array<std::set<int>, 10> buckets3;
for (const int item : arr) {
int d = item / 1000;
buckets3.at(d).insert(item);
}
//Prefix sum
std::array<int, 10> outputIndices;
outputIndices.at(0) = 0;
for (int i = 1; i < 10; ++i) {
outputIndices.at(i) = outputIndices.at(i - 1) +
buckets3.at(i - 1).size();
}
for (const auto& bucket3 : buckets3) {
std::array<std::set<int>, 10> buckets0, buckets1;
std::array<int, 10> histogram2 = {};
for (const int item : bucket3) {
int d = item % 10;
buckets0.at(d).insert(item);
}
for (const auto& bucket0 : buckets0) {
for (const int item : bucket0) {
int d = (item / 10) % 10;
buckets1.at(d).insert(item);
int d2 = (item / 100) % 10;
++histogram2.at(d2);
}
}
for (const auto& bucket1 : buckets1) {
for (const int item : bucket1) {
int d = (item / 100) % 10;
int i = outputIndices.at(d) + histogram2.at(d);
++histogram2.at(d);
arr.at(i) = item;
}
}
}
}
Can anyone spot my mistake?
I took a look at the paper you linked. You haven't made any mistakes, none that I can see. In fact, in my estimation, you corrected a mistake in the algorithm.
I wrote out the algorithm and ended up with the exact same problem as you did. After reviewing Algorithm 2, either I woefully misunderstand how it is supposed to work, or it is flawed. There are at least a couple of problems with the algorithm, specifically revolving around outputIndices and histogram2.
Looking at the algorithm, the final index of an item is determined by the counting sort stored in outputIndices (let's ignore the histogram for now).
If you had an initial array of numbers {0100, 0103, 0102, 0101}, the prefix sum of that would be 4.
The algorithm gives no indication that I can find that the result should be lagged by 1. That being said, in order for the algorithm to work the way they intend, it does have to be lagged, so, moving on.
Now, the prefix sums are 0, 4, 4.... The algorithm doesn't use the MSD as the index into the outputIndices array, it uses "MSD - 1". So, taking 1 as the index into the array, the starting index for the first item without the histogram is 4! Outside the array on the first try.
The outputIndices array is built with the MSD, so it makes sense for it to be accessed by the MSD.
Further, even if you tweak the algorithm to correctly use the MSD as the index into outputIndices, it still won't sort correctly. With your initial inputs (swapped) {4598, 4612}, they will stay in that order. They are sorted (locally) as if they were 2-digit numbers. If you add other numbers that do not start with 4, they will be sorted globally, but the local sort is never finished.
According to the paper the goal is to use the histogram to do that, but I don't see that happening.
Ultimately, I'm assuming, what you want is an algorithm that works the way described. I've modified the algorithm, keeping with the overall stated goal of the paper of using the MSD to do a global sort, and the rest of the digits by reverse LSD.
I don't think these changes should have any impact on your desire to parallelize the function.
void radix_sort2(std::vector<int>& arr)
{
std::array<std::vector<int>, 10> buckets3;
for (const int item : arr)
{
int d = item / 1000;
buckets3.at(d).push_back(item);
}
//Prefix sum
std::array<int, 10> outputIndices;
outputIndices.at(0) = 0;
for (int i = 1; i < 10; ++i)
{
outputIndices.at(i) = outputIndices.at(i - 1) + buckets3.at(i - 1).size();
}
for (const auto& bucket3 : buckets3)
{
if (bucket3.size() <= 0)
continue;
std::array<std::vector<int>, 10> buckets0, buckets1, buckets2;
for (const int item : bucket3)
buckets0.at(item % 10).push_back(item);
for (const auto& bucket0 : buckets0)
for (const int item : bucket0)
buckets1.at((item / 10) % 10).push_back(item);
for (const auto& bucket1 : buckets1)
for (const int item : bucket1)
buckets2.at((item / 100) % 10).push_back(item);
int count = 0;
for (const auto& bucket2 : buckets2)
{
for (const int item : bucket2)
{
int d = (item / 1000) % 10;
int i = outputIndices.at(d) + count;
++count;
arr.at(i) = item;
}
}
}
}
For extensibility, it would probably make sense to create a helper function that does the local sorting. That way you should be able to extend it to handle numbers with any number of digits.
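One possible shape for such a helper is sketched below. It assumes non-negative integers in base 10 and performs plain LSD passes on a single vector, so it covers only the local sorting step, not the MSD split or any parallelism. The names lsdPass and lsdRadixSort are invented for the example.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical helper: one stable pass that distributes the items by the
// decimal digit selected by `divisor` (1 = units, 10 = tens, ...) and then
// writes the buckets back in order.
void lsdPass(std::vector<int>& items, long long divisor) {
    std::array<std::vector<int>, 10> buckets;
    for (int item : items) {
        buckets[static_cast<std::size_t>((item / divisor) % 10)].push_back(item);
    }
    std::size_t out = 0;
    for (const auto& bucket : buckets) {
        for (int item : bucket) {
            items[out++] = item;
        }
    }
}

// Hypothetical helper: sorts non-negative integers that have at most
// `digits` decimal digits, least significant digit first.
void lsdRadixSort(std::vector<int>& items, int digits) {
    long long divisor = 1;
    for (int pass = 0; pass < digits; ++pass, divisor *= 10) {
        lsdPass(items, divisor);
    }
}
```

Inside radix_sort2, each non-empty bucket3 could be copied into a vector, passed to lsdRadixSort with digits set to 3, and then written to arr starting at the matching outputIndices entry.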

Resources