Cannot understand how to recursively merge sort - c++11

Currently self-learning C++ with Daniel Liang's Introduction to C++.
On the topic of the merge sort, I cannot seem to understand how his code is recursively calling itself.
I understand the general concept of the merge sort, but I am having trouble understanding this code specifically.
In this example, we first pass the list 1, 7, 3, 4, 9, 3, 3, 1, 2, and its size (9) to the mergeSort function.
From there, the list is halved repeatedly until the array size reaches 1. In this case, we would get: 1,7,3,4 -> 1,7 -> 1. We then move on to merge sorting the second half, which in this case is the array [7]. We merge the two arrays [1] and [7] and then delete the two dynamically allocated arrays to prevent a memory leak.
The part I don't understand is how the code continues from here, after delete[] firstHalf and delete[] secondHalf. From my understanding, shouldn't there be another mergeSort function call in order to merge sort the new firstHalf and secondHalf?
#include <iostream>
using namespace std;

// Function prototypes
void arraycopy(int source[], int sourceStartIndex,
  int target[], int targetStartIndex, int length);
void merge(int list1[], int list1Size,
  int list2[], int list2Size, int temp[]);

// The function for sorting the numbers
void mergeSort(int list[], int arraySize)
{
  if (arraySize > 1)
  {
    // Merge sort the first half
    int* firstHalf = new int[arraySize / 2];
    arraycopy(list, 0, firstHalf, 0, arraySize / 2);
    mergeSort(firstHalf, arraySize / 2);

    // Merge sort the second half
    int secondHalfLength = arraySize - arraySize / 2;
    int* secondHalf = new int[secondHalfLength];
    arraycopy(list, arraySize / 2, secondHalf, 0, secondHalfLength);
    mergeSort(secondHalf, secondHalfLength);

    // Merge firstHalf with secondHalf
    merge(firstHalf, arraySize / 2, secondHalf, secondHalfLength,
      list);

    delete [] firstHalf;
    delete [] secondHalf;
  }
}

void merge(int list1[], int list1Size,
  int list2[], int list2Size, int temp[])
{
  int current1 = 0; // Current index in list1
  int current2 = 0; // Current index in list2
  int current3 = 0; // Current index in temp

  while (current1 < list1Size && current2 < list2Size)
  {
    if (list1[current1] < list2[current2])
      temp[current3++] = list1[current1++];
    else
      temp[current3++] = list2[current2++];
  }

  while (current1 < list1Size)
    temp[current3++] = list1[current1++];

  while (current2 < list2Size)
    temp[current3++] = list2[current2++];
}

void arraycopy(int source[], int sourceStartIndex,
  int target[], int targetStartIndex, int length)
{
  for (int i = 0; i < length; i++)
  {
    target[i + targetStartIndex] = source[i + sourceStartIndex];
  }
}

int main()
{
  const int SIZE = 9;
  int list[] = {1, 7, 3, 4, 9, 3, 3, 1, 2};
  mergeSort(list, SIZE);
  for (int i = 0; i < SIZE; i++)
    cout << list[i] << " ";
  return 0;
}

From my understanding, shouldn't there be another mergeSort function
call in order to merge sort the new firstHalf and secondHalf?
It is happening implicitly during the recursive call. When you reach these two lines:
delete [] firstHalf;
delete [] secondHalf;
Reaching them means that one call to mergeSort has completed and control returns to whichever call invoked it. If the completed call was sorting a first half, the caller resumes at the line right after that call, i.e. these lines:
// Merge sort the second half
int secondHalfLength = arraySize - arraySize / 2;
But if the completed call was sorting a second half, control goes back to the line just after that call, i.e. these lines:
// Merge firstHalf with secondHalf
merge(firstHalf, arraySize / 2, secondHalf, secondHalfLength,
  list);
So no extra mergeSort call is needed: every pending call merges its own two halves once both of its recursive calls have returned, and everything works as planned.


Iterating through a set goes into an infinite loop

I used exactly the same code in both of my files.
One works properly while the other one (this one) goes into an endless loop.
int arr[5] = {3, 1, 3, 5, 6};
int main() {
    int T = 1;
    set<int> s;
    for (int tc = 0; tc < T; tc++) {
        for (auto x : arr) {
            auto end = s.end();
            for (auto it = s.begin(); it != end; it++) {
                // here's where it goes into an infinite loop
                // and I couldn't figure out why..
            }
        }
    }
    return 0;
}
The one below works fine:
using namespace std;
int main() {
    int arr[5] = {3,1,3,5,6}, sum=20;
    set<int> s;
    for (auto x : arr) {
        auto end = s.end();
        for (auto it = s.begin(); it != end; it++) {
        }
    }
    return 0;
}
The expected result is s = {1, 4, 7, 8, ...}, i.e. the sums of all the subsets of arr,
but it is not working properly and I don't know why.
The issue is that you're inserting elements into the set while iterating over it. A loop over a container does not remember the state of the range from before the loop started; even a ranged-for loop is just like writing:
for(auto it = std::begin(container); it != std::end(container); it++)
Now, std::set is an ordered container. So when you insert/emplace elements smaller than the one your iterator points at, you won't see them later on in the iteration; but if you insert larger elements, you will see them. So you end up iterating over the elements you have just inserted, indefinitely.
What you should probably be doing is not emplacing new elements into s during the iteration, but rather placing them in some other container, then finally dumping all of that new container's contents into the set (e.g. with an std::inserter to the set and an std::copy).
(Also, in general, all of your code seems kind of suspect, i.e. I doubt you really want to do any of this stuff in the first place.)

Recursive algorithm to find all possible solutions in a nonogram row

I am trying to write a simple nonogram solver, in a kind of bruteforce way, but I am stuck on a relatively easy task. Let's say I have a row with clues [2,3] that has a length of 10
so the solutions are:
I want to find all the possible solutions for a row.
I know that I have to consider each block separately, and that each block will have an available space of n-(sum of remaining blocks' lengths + number of remaining blocks), but I do not know how to progress from here.
Well, this question already has a good answer, so think of this one more as an advertisement of Python's prowess.
def place(blocks,total):
    if not blocks: return ["-"*total]
    if blocks[0]>total: return []
    starts = total-blocks[0] #starts = 2 means possible starting indexes are [0,1,2]
    if len(blocks)==1: #this is a special case
        return [("-"*i+"$"*blocks[0]+"-"*(starts-i)) for i in range(starts+1)]
    ans = []
    for i in range(total-blocks[0]): #append current solutions
        for sol in place(blocks[1:],starts-i-1): #with all possible other solutions
            ans.append("-"*i+"$"*blocks[0]+"-"+sol)
    return ans
To test it:
for i in place([2,3,2],12):
    print(i)
Which produces output like:
This is what I got:
#include <iostream>
#include <vector>
#include <string>
using namespace std;
typedef std::vector<bool> tRow;
void printRow(tRow row){
    for (bool i : row){
        std::cout << ((i) ? '$' : '-');
    }
    std::cout << std::endl;
}
int requiredCells(const std::vector<int> nums){
    int sum = 0;
    for (int i : nums){
        sum += (i + 1); // The number + the at-least-one-cell gap at its right
    }
    return (sum == 0) ? 0 : sum - 1; // The right-most number doesn't need any gap
}
bool appendRow(tRow init, const std::vector<int> pendingNums, unsigned int rowSize, std::vector<tRow> &comb){
    if (pendingNums.size() <= 0){
        comb.push_back(init); // All numbers placed: store this combination
        return false;
    }
    int cellsRequired = requiredCells(pendingNums);
    if (cellsRequired > rowSize){
        return false; // There are no combinations
    }
    tRow prefix;
    int gapSize = 0;
    std::vector<int> pNumsAux = pendingNums;
    pNumsAux.erase(pNumsAux.begin()); // Numbers still pending after the first one
    unsigned int space = rowSize;
    while ((gapSize + cellsRequired) <= rowSize){
        space = rowSize;
        space -= gapSize;
        prefix = init;
        for (int i = 0; i < gapSize; ++i){
            prefix.push_back(false); // Leading gap
        }
        for (int i = 0; i < pendingNums[0]; ++i){
            prefix.push_back(true); // Cells of the current block
            --space;
        }
        if (space > 0){
            prefix.push_back(false); // Mandatory gap after the block
            --space;
        }
        appendRow(prefix, pNumsAux, space, comb);
        ++gapSize;
    }
    return true;
}
std::vector<tRow> getCombinations(const std::vector<int> row, unsigned int rowSize) {
    std::vector<tRow> comb;
    tRow init;
    appendRow(init, row, rowSize, comb);
    return comb;
}
int main(){
    std::vector<int> row = { 2, 3 };
    auto ret = getCombinations(row, 10);
    for (tRow r : ret){
        while (r.size() < 10)
            r.push_back(false); // Pad the row with empty cells
        printRow(r);
    }
    return 0;
}
And my output is:
This can surely be improved.
Note: I didn't test it beyond the case shown above.
Hope it works for you.

Parallel radix sort with virtual memory and write-combining

I'm attempting to implement the variant of parallel radix sort described in (Algorithm 2), but my C++ implementation (for 4 digits in base 10) contains a bug that I'm unable to locate.
For debugging purposes I'm using no parallelism, but the code should still sort correctly.
For instance, the line arr.at(i) = item accesses indices outside its bounds for the following input:
std::vector<int> v = {4612, 4598};
My implementation is as follows:
#include <set>
#include <array>
#include <vector>
void radix_sort2(std::vector<int>& arr) {
    std::array<std::set<int>, 10> buckets3;
    for (const int item : arr) {
        int d = item / 1000;
        buckets3.at(d).insert(item);
    }
    //Prefix sum
    std::array<int, 10> outputIndices;
    outputIndices.at(0) = 0;
    for (int i = 1; i < 10; ++i) {
        outputIndices.at(i) = outputIndices.at(i - 1) + buckets3.at(i - 1).size();
    }
    for (const auto& bucket3 : buckets3) {
        std::array<std::set<int>, 10> buckets0, buckets1;
        std::array<int, 10> histogram2 = {};
        for (const int item : bucket3) {
            int d = item % 10;
            buckets0.at(d).insert(item);
        }
        for (const auto& bucket0 : buckets0) {
            for (const int item : bucket0) {
                int d = (item / 10) % 10;
                buckets1.at(d).insert(item);
                int d2 = (item / 100) % 10;
                ++histogram2.at(d2);
            }
        }
        for (const auto& bucket1 : buckets1) {
            for (const int item : bucket1) {
                int d = (item / 100) % 10;
                int i = outputIndices.at(d) + histogram2.at(d);
                ++histogram2.at(d);
                arr.at(i) = item;
            }
        }
    }
}
Can anyone spot my mistake?
I took a look at the paper you linked. You haven't made any mistakes, none that I can see. In fact, in my estimation, you corrected a mistake in the algorithm.
I wrote out the algorithm and ended up with the exact same problem as you did. After reviewing Algorithm 2, either I woefully misunderstand how it is supposed to work, or it is flawed. There are at least a couple of problems with the algorithm, specifically revolving around outputIndices and histogram2.
Looking at the algorithm, the final index of an item is determined by the counting sort stored in outputIndices (let's ignore the histogram for now).
If you had an initial array of numbers {0100, 0103, 0102, 0101}, the prefix sum of that would be 4.
The algorithm gives no indication, that I can determine, to lag the result by 1. That being said, in order for the algorithm to work the way they intend, it does have to be lagged, so, moving on.
Now, the prefix sums are 0, 4, 4.... The algorithm doesn't use the MSD as the index into the outputIndices array, it uses "MSD - 1". So taking 1 as the index into the array, the starting index for the first item without the histogram is 4! Outside the array on the first try.
The outputIndices array is built with the MSD, so it makes sense for it to be accessed by MSD.
Further, even if you tweak the algorithm to correctly use the MSD to index into outputIndices, it still won't sort correctly. With your initial inputs (swapped) {4598, 4612}, they will stay in that order. They are sorted (locally) as if they were 2-digit numbers. If you add other numbers not starting with 4, they will be sorted globally, but the local sort is never finished.
According to the paper the goal is to use the histogram to do that, but I don't see that happening.
Ultimately, I'm assuming, what you want is an algorithm that works the way described. I've modified the algorithm, keeping with the paper's overall stated goal of using the MSD to do a global sort and sorting the rest of the digits by reverse LSD.
I don't think these changes should have any impact on your desire to parallelize the function.
void radix_sort2(std::vector<int>& arr)
{
    std::array<std::vector<int>, 10> buckets3;
    for (const int item : arr)
    {
        int d = item / 1000;
        buckets3.at(d).push_back(item);
    }
    //Prefix sum
    std::array<int, 10> outputIndices;
    outputIndices.at(0) = 0;
    for (int i = 1; i < 10; ++i)
    {
        outputIndices.at(i) = outputIndices.at(i - 1) + buckets3.at(i - 1).size();
    }
    for (const auto& bucket3 : buckets3)
    {
        if (bucket3.size() <= 0)
            continue;
        std::array<std::vector<int>, 10> buckets0, buckets1, buckets2;
        for (const int item : bucket3) buckets0.at(item % 10).push_back(item);
        for (const auto& bucket0 : buckets0)
            for (const int item : bucket0) buckets1.at((item / 10) % 10).push_back(item);
        for (const auto& bucket1 : buckets1)
            for (const int item : bucket1) buckets2.at((item / 100) % 10).push_back(item);
        int count = 0;
        for (const auto& bucket2 : buckets2)
        {
            for (const int item : bucket2)
            {
                int d = (item / 1000) % 10;
                int i = outputIndices.at(d) + count;
                ++count;
                arr.at(i) = item;
            }
        }
    }
}
For extensibility, it would probably make sense to create a helper function that does the local sorting. That way you should be able to extend it to handle numbers with any number of digits.

how to count distinct elements of an array in one pass?

Can someone give me an algorithm to count the distinct elements of an array of integers in one pass?
For example, I can traverse the array using a for loop.
I will store the first element in another array. Each subsequent element will be compared with those in the second array, and if it is distinct I will store it in that array and increment a counter.
Can someone give me a better algorithm than this?
Using C and C++.
Supposing that your elements are integers and their values are between 0 and MAXVAL-1:
#include <stdio.h>
#include <string.h>
#define MAXVAL 50
unsigned int CountDistinctsElements(unsigned int* iArray, unsigned int iNbElem) {
    unsigned int ret = 0;
    //this array will contain the count of each value
    //for example, c[3] will contain the count of the value 3 in your original array
    unsigned int c[MAXVAL];
    memset(c, 0, MAXVAL*sizeof(unsigned int));
    for (unsigned int i=0; i<iNbElem; i++) {
        unsigned int elem = iArray[i];
        if (elem < MAXVAL && c[elem] == 0) {
            ret++; //first occurrence of this value
        }
        if (elem < MAXVAL) {
            c[elem]++;
        }
    }
    return ret;
}
int main() {
    unsigned int myElements[10] = {0, 25, 42, 42, 1, 2, 42, 0, 24, 24};
    printf("Distinct elements : %u\n", CountDistinctsElements(myElements, 10));
    return 0;
}
Output:
Distinct elements : 6
Maintain an array of structures.
Each structure should have a value and a counter for that value.
As soon as you encounter a new element in the array being tested, create a structure with that value and set its counter to 1. If you encounter an existing element, simply access the related structure and increment its counter by 1.
Finally, after one complete pass of the array, you will have the required result in the array of structures.
Edit: I wasn't aware you wanted just to count the elements. Updated code below.
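bool doesUniqueContain(int oneElement, int counter, const int *uniqueArray);
//uniqueArray must have room for numElements entries
int countUnique(const int *myArray, int numElements, int *uniqueArray)
{
    int counter = 0;        //Number of entries stored in uniqueArray
    int uniqueElements = 0; //Number of distinct elements found
    for(int i = 0; i < numElements; i++)
    {
        int tempElem = myArray[i];
        if(!doesUniqueContain(tempElem, counter, uniqueArray)) //If it doesn't contain it
        {
            uniqueArray[counter] = tempElem;
            counter++;
            uniqueElements++;
        }
    }
    return uniqueElements;
}
bool doesUniqueContain(int oneElement, int counter, const int *uniqueArray)
{
    if(counter == 0)
        return false; //No elements, so it doesn't contain this element.
    for(int i = 0; i < counter; i++)
    {
        if(uniqueArray[i] == oneElement)
            return true;
    }
    return false;
}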
This is only so you can see the logic
How about using a hash table (in the Java HashMap or C# Dictionary sense) to count the elements? Basically you create an empty hash table with the array element type as the key type and the count as values. Then you iterate over your list. If the element is not yet in the hash table, you add it with count 1, otherwise you increment the count for that element.

Segmented Sort with CUDPP/Thrust

Is it possible to do a segmented sort with CUDPP in CUDA? By segmented sort, I mean sorting the elements of an array that are delimited by flags, like below.
Flag array [1,0,1,0,0,1,0,0,0,0]
Sort the elements of A which lie between consecutive 1s.
Expected output
You can do this in a single sorting pass: the idea is to adjust the elements in your array such that the sort will relocate elements only within the "segments".
For your example:
(I removed the first 1 since it's not needed)
First scan the flag array:
Then you have many options depending on the number types, e.g. for unsigned integers you can set the highest bits to distinguish the "segments".
The easiest way is just to add the largest element multiplied by scanned_flags:
A + scanned_flags*10 = [10,9,18,17,16,25,24,23,22,21]
The rest is easy: sort the array and reverse the transformation.
Here are two versions, one using ArrayFire and one using Thrust. Check whichever you like more.
void af_test() {
    int A[] = {10,9,8,7,6,5,4,3,2,1};
    int S[] = {0, 0,1,0,0,1,0,0,0,0};
    int n = sizeof(A) / sizeof(int);
    af::array devA(n, A, af::afHost);
    af::array devS(n, S, af::afHost);
    // obtain the max element of A (not of the flags)
    int maxi = af::max< int >(devA);
    // scan the keys
    // keys = 0,0,1,1,1,2,2,2,2,2
    af::array keys = af::accum(devS);
    // compute: A = A + keys * maxi
    // A = 10,9,18,17,16,25,24,23,22,21
    devA = devA + keys * maxi;
    // sort the array
    // A = 9,10,16,17,18,21,22,23,24,25
    devA = af::sort(devA);
    // compute: A = A - keys * maxi
    // A = 9,10,6,7,8,1,2,3,4,5
    devA = devA - keys * maxi;
    // print the results
}
template<typename T>
struct add_mul : public binary_function<T,T,T>
{
    add_mul(const T& _factor) : factor(_factor) {
    }
    __host__ __device__ T operator()(const T& a, const T& b) const
    {
        return (a + b * factor);
    }
    const T factor;
};
void thrust_test()
{
    int A[] = {10,9,8,7,6,5,4,3,2,1};
    int S[] = {0, 0,1,0,0,1,0,0,0,0};
    int n = sizeof(A) / sizeof(int);
    thrust::host_vector< int > hA(A, A + n), hS(S, S + n);
    thrust::device_vector< int > devA = hA, devS = hS, keys(n);
    // scan the keys
    thrust::inclusive_scan(devS.begin(), devS.end(), keys.begin());
    // obtain the maximal element
    int maxi = *(thrust::max_element(devA.begin(), devA.end()));
    // compute: A = A + keys * maxi
    thrust::transform(devA.begin(), devA.end(), keys.begin(), devA.begin(), add_mul< int >(maxi));
    // sort the array
    thrust::sort(devA.begin(), devA.end());
    // compute: A = A - keys * maxi
    thrust::transform(devA.begin(), devA.end(), keys.begin(), devA.begin(), add_mul< int >(-maxi));
    // copy back to the host
    hA = devA;
    std::cout << "\nSorted array\n";
    thrust::copy(hA.begin(), hA.end(), std::ostream_iterator<int>(std::cout, "\n"));
}
