nearly sorted array using heap sort - algorithm

I have an assignment, and here is the question I'm struggling with.
We are given an unsorted array A[1…n]. Now, imagine its sorted
version. The unsorted array has the property that each element has a
distance of at most k positions, where 0<k<=n, from its index in the
sorted version. For example, when k is 2, an element at index 5 in the
sorted array can be at one of the indices {3,4,5,6,7} in the unsorted
array. The unsorted array can be sorted efficiently by utilizing a
Min-Heap data structure.
Question: Generate arrays of sizes 100, 1000, 10000, and 100000, note the running times, and show the results.
I have implemented my solution in Java.
Here are my running time tables.
In the first table, the k values are chosen according to the array size.
In the second table, the k values are held constant.
(Table: running time of the kHeapSort algorithm)
I was struggling to analyze these tables. I couldn't figure out how the array size and k affect the algorithm. Any hint would be helpful. Thank you in advance.
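In case it helps to connect the tables to the algorithm, here is a minimal sketch of the min-heap approach in Java (assuming 0-based indexing and java.util.PriorityQueue as the heap; the method name follows the question's kHeapSort). The heap never needs to hold more than k+1 elements, so the total work is O(n log k): roughly linear in the array size for a fixed k, and growing only logarithmically as k grows.

```java
import java.util.PriorityQueue;

public class KHeapSort {
    // Sorts an array in which every element is at most k positions from its sorted position.
    static void kHeapSort(int[] a, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>();   // min-heap
        int write = 0;
        for (int read = 0; read < a.length; read++) {
            heap.offer(a[read]);
            if (heap.size() > k) {            // k+1 elements always contain the next smallest value
                a[write++] = heap.poll();
            }
        }
        while (!heap.isEmpty()) {             // drain the remaining elements in order
            a[write++] = heap.poll();
        }
    }
}
```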

Related

Is there a way to recover the two sorted halves in a merge sort algorithm if I have the sorted array?

Suppose I have an unsorted array P and its sorted equivalent P_Sorted. Suppose L and R refer to the left and right halves of P. Is there a way to recover L_Sorted and R_Sorted from P and P_Sorted in linear time without using extra memory?
For further clarification, during a recursive merge sort implementation L_Sorted and R_Sorted would be merged together to form P_Sorted, so I'm kinda looking to reverse the merge step.
In a merge sort, you divide the array into two halves recursively and merge them. So at the last merge, you would have already sorted the left and right halves - they are sorted independently - that is why it is called divide and conquer.
Therefore, when doing a merge you can look at the sizes of the arrays being merged: when their combined size equals the full input size, you are at the last merge. At that point you can store those two sorted arrays in some variable before merging them.
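A rough sketch of that instrumentation, assuming a plain int[] and an ordinary top-down merge sort (class and field names are illustrative):

```java
import java.util.Arrays;

// Plain top-down merge sort that copies the two halves just before the final (top-level) merge.
public class MergeSortCapture {
    static int[] lSorted, rSorted;                   // filled in at the top-level merge

    static void sort(int[] a) { sort(a, 0, a.length, true); }

    private static void sort(int[] a, int lo, int hi, boolean topLevel) {
        if (hi - lo <= 1) return;
        int mid = (lo + hi) / 2;
        sort(a, lo, mid, false);
        sort(a, mid, hi, false);
        if (topLevel) {                              // both halves are now sorted independently
            lSorted = Arrays.copyOfRange(a, lo, mid);
            rSorted = Arrays.copyOfRange(a, mid, hi);
        }
        merge(a, lo, mid, hi);
    }

    private static void merge(int[] a, int lo, int mid, int hi) {
        int[] tmp = new int[hi - lo];
        int i = lo, j = mid, t = 0;
        while (i < mid && j < hi) tmp[t++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        while (i < mid) tmp[t++] = a[i++];
        while (j < hi) tmp[t++] = a[j++];
        System.arraycopy(tmp, 0, a, lo, tmp.length);
    }
}
```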
BUT if you are not allowed to mess with the function, and you need to work only with the sorted array and the original array, I think the solution is not straightforward. I found a URL that poses this problem and a possible solution.
It seems feasible in linear time for very specific datasets:
If there is a way to tell the original position of each data element in the sorted list, for example if these are records with a creation date and a name field and the original array is in chronological order, selecting from the sorted array the elements that fall in the first or second half can be done in a single scan in linear time with no space overhead.
In the general case, sorting the left and right half seems the most efficient way to get L_Sorted and R_Sorted, with or without P_Sorted. The time complexity is O(n log n).
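As a concrete illustration of that special case, here is a sketch in which each element of P_Sorted carries its original position in P; the Item type and its fields are illustrative, not something given in the question. Because the scan preserves the sorted order, each output list comes out sorted.

```java
import java.util.*;

class Item {
    int value;
    int originalIndex;   // position of this element in the unsorted array P
    Item(int value, int originalIndex) { this.value = value; this.originalIndex = originalIndex; }
}

public class RecoverHalves {
    // Single pass over the sorted array: split by original position, keeping sorted order.
    static List<List<Item>> recoverHalves(Item[] pSorted) {
        int mid = pSorted.length / 2;            // P's left half is original indices [0, mid)
        List<Item> lSorted = new ArrayList<>();
        List<Item> rSorted = new ArrayList<>();
        for (Item it : pSorted) {                // ascending scan keeps each output list sorted
            if (it.originalIndex < mid) lSorted.add(it);
            else rSorted.add(it);
        }
        return List.of(lSorted, rSorted);
    }
}
```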

picking the 10 largest values in array

I want to pick the 10 largest values in an array (size ~1e9 elements) in Fortran 90. What is the most time-efficient way to do this? I was looking into efficient sorting algorithms; is that the way to go? Do I need to sort the entire array?
Sorting 10^9 elements to pick 10 from the top sounds like overkill: the log2(N) factor will be about 30, and the process of sorting will move a lot of data.
Make an array of ten items for the result, fill it with the first ten elements from the big array, and sort your 10-element array. Now walk the big array starting at element 11. If the current element is greater than the smallest item in the 10-element array, find the insertion point, shift ten-element array to make space for the new element, and place it in the array. Once you are done with the big array, the small array contains ten largest values.
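A minimal sketch of that loop, in Java for illustration even though the question is about Fortran 90; it assumes the big array has at least ten elements.

```java
import java.util.Arrays;

public class TopTen {
    static long[] tenLargest(long[] big) {
        long[] top = Arrays.copyOf(big, 10);     // seed with the first ten elements
        Arrays.sort(top);                        // top[0] is the smallest kept value
        for (int i = 10; i < big.length; i++) {
            long x = big[i];
            if (x <= top[0]) continue;           // not larger than the smallest kept value
            // Find the insertion point and shift smaller entries down to make room.
            int pos = 1;
            while (pos < 10 && top[pos] < x) {
                top[pos - 1] = top[pos];
                pos++;
            }
            top[pos - 1] = x;
        }
        return top;                              // ten largest values, in ascending order
    }
}
```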
For "larger values of ten" you can get a significant performance improvement by switching to a max-heap data structure. Construct a heap from the first ten items of the big array; store the smallest number for future reference. Then for each number in the big array above the smallest number in the heap so far do the following:
Replace the smallest number with the new number,
Follow the heap structure up to the root to place the number in the correct spot,
Store the location of the new smallest number in the heap.
Once you are done, the heap will contain ten largest items from the big array.
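A minimal sketch of that heap variant using java.util.PriorityQueue (a min-heap), where poll-then-offer plays the role of "replace the root and sift down":

```java
import java.util.PriorityQueue;

public class TopTenHeap {
    static PriorityQueue<Long> tenLargest(long[] big) {
        PriorityQueue<Long> heap = new PriorityQueue<>();   // min-heap of the 10 best so far
        for (long x : big) {
            if (heap.size() < 10) {
                heap.offer(x);
            } else if (x > heap.peek()) {                   // beats the smallest kept value
                heap.poll();                                // drop the current smallest
                heap.offer(x);                              // O(log 10) per replacement
            }
        }
        return heap;                                        // the ten largest values
    }
}
```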
Sorting is not needed. You just need a priority queue of size 10, which costs O(n log 10) = O(n) over the whole array, while the best sort is O(n log n).
No, you don't need to perform a full sorting. You can drop parts of an input array as soon as you know they contain only items from those largest 10, or none of them.
You could, for example, adapt a quicksort algorithm in such a way that you recursively process only the partitions covering the border between the 10th and the 11th highest items. Eventually you'll get the 10 largest items in the last 10 positions (not necessarily ordered by value, though) and all other items below them (not in order, either).
In the pessimistic case (bad pivot selection or too many equal items), however, it may take too long.
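A rough sketch of that partial-quicksort idea (it assumes the array has at least 10 elements; the names are illustrative):

```java
import java.util.Random;

// Partition, but recurse only into the side that still contains the boundary between the
// (n-10)-th and (n-9)-th positions, so the 10 largest values end up in the last 10 slots.
public class PartialSelect {
    static void moveTopTenToEnd(int[] a) {
        select(a, 0, a.length - 1, a.length - 10, new Random());
    }

    // After this call, a[boundary..hi] holds values >= everything in a[lo..boundary-1].
    private static void select(int[] a, int lo, int hi, int boundary, Random rnd) {
        while (lo < hi) {
            int p = partition(a, lo, hi, lo + rnd.nextInt(hi - lo + 1));
            if (p == boundary) return;
            if (p < boundary) lo = p + 1;      // boundary lies in the right part
            else hi = p - 1;                   // boundary lies in the left part
        }
    }

    // Lomuto partition around a[pivotIndex]; returns the pivot's final position.
    private static int partition(int[] a, int lo, int hi, int pivotIndex) {
        swap(a, pivotIndex, hi);
        int pivot = a[hi], store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) swap(a, i, store++);
        }
        swap(a, store, hi);
        return store;
    }

    private static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}
```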
The best solution is passing the big array through a 10-item priority queue, as #J63 mentions in the answer.

How to shuffle rows with keeping two fields in original order?

I have an array of rows with four fields:
GROUP,NAME,KEY,VALUE
I need to "shuffle" this array, but resulting array should comply to following rule: every KEY-VALUE pair having same GROUP should have the same order as in original array
Here's one possible algorithm, which requires an auxiliary array of the same size as the original array. It's O(N), but it makes several passes over the original array.
1. Using a stable counting sort algorithm, make a copy of the original array sorted by GROUP. Keep the histogram for use in step 3.
2. Use the Fisher-Yates shuffle algorithm to shuffle the original array in place.
3. Make a final pass over the shuffled array created in step 2. For each row, replace the KEY and VALUE entries with the next unused KEY and VALUE entries for that row's GROUP from the sorted array created in step 1.
The counting sort algorithm assumes that the GROUP values are integers in a small range, ideally smaller than the total number of rows in the original array. If this is not the case -- either the groups are not integers or they don't have a restricted size -- then the original histogram for the counting sort can be created by placing the GROUP values in a hash-table. The hash-table cannot have more than N entries, so it requires O(N) space and expected O(N) time to create.
If you are planning to repeatedly shuffle the same array, then you should keep the sorted array and a copy of the histogram, since the construction of these auxiliary structures is more than half the time of producing a shuffle.
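Here is a sketch of the three steps, assuming GROUP is a small non-negative integer in [0, groupRange] as the counting sort requires, and omitting the NAME field, which simply travels with its row. The Row type and all names are illustrative.

```java
import java.util.*;

class Row {
    int group;      // GROUP, assumed to be a small non-negative integer
    String key;     // KEY
    String value;   // VALUE
    Row(int g, String k, String v) { group = g; key = k; value = v; }
    public String toString() { return group + ":" + key + "=" + value; }
}

public class GroupShuffle {
    // Shuffle rows so that, within each GROUP, KEY/VALUE pairs keep their original relative order.
    static void shuffleKeepingGroupOrder(Row[] rows, int groupRange, Random rnd) {
        // Step 1: stable counting sort of (key, value) pairs by GROUP; keep per-group cursors.
        int[] count = new int[groupRange + 1];
        for (Row r : rows) count[r.group]++;
        int[] cursor = new int[groupRange + 1];         // start offset of each group's block
        for (int g = 1; g <= groupRange; g++) cursor[g] = cursor[g - 1] + count[g - 1];
        String[] sortedKeys = new String[rows.length];
        String[] sortedVals = new String[rows.length];
        int[] fill = cursor.clone();
        for (Row r : rows) {                            // stable: original order preserved per group
            sortedKeys[fill[r.group]] = r.key;
            sortedVals[fill[r.group]] = r.value;
            fill[r.group]++;
        }
        // Step 2: Fisher-Yates shuffle of the rows (GROUP fields travel with the rows).
        for (int i = rows.length - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            Row tmp = rows[i]; rows[i] = rows[j]; rows[j] = tmp;
        }
        // Step 3: rewrite KEY/VALUE in shuffled order, consuming each group's block in order.
        for (Row r : rows) {
            r.key = sortedKeys[cursor[r.group]];
            r.value = sortedVals[cursor[r.group]];
            cursor[r.group]++;
        }
    }

    public static void main(String[] args) {
        Row[] rows = {
            new Row(0, "a", "1"), new Row(1, "b", "2"),
            new Row(0, "c", "3"), new Row(1, "d", "4")
        };
        shuffleKeepingGroupOrder(rows, 1, new Random());
        System.out.println(Arrays.toString(rows));   // per-group key/value order is unchanged
    }
}
```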

Maintaining sort while changing random elements

I have come across this problem where I need to efficiently remove the smallest element in a list/array. That would be fairly trivial to solve - a heap would be sufficient.
However, the issue now is that when I remove the smallest element, it would cause changes in other elements in the data structure, which may result in the ordering being changed. An example is this:
I have an array of elements:
[1,3,5,7,9,11,12,15,20,33]
When I remove "1" from the array, "5" and "12" get changed to "4" and "17" respectively.
[3,4,7,9,11,17,15,20,33]
And hence the ordering is not maintained.
However, the element that is removed will have pointers to all elements that will be changed, but there is no knowing how many elements will be changed or by how much.
So my question is:
What is the best way to store these elements to maximize performance when removing the smallest element from the data structure while maintaining sort? Or should I just leave it unsorted?
My current implementation is just storing them unsorted in a vector, so the time complexity is O(N^2): O(N) for finding the smallest element, times N removals.
A.
If you have the list M of all changed elements of the ordered list L,
go through M, and for every element
If it is still ordered with respect to its neighbours in L, leave it be.
If it is not in order with its neighbours, remove it from L.
The removed elements form a list N.
Sort N.
Merge N back into L using an algorithm for merging ordered lists: http://en.wikipedia.org/wiki/Merge_algorithm
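A simplified sketch of approach A for a list of integers. For brevity it scans the whole list rather than only the changed elements M; the effect is the same: pull out whatever breaks the order, sort just that, and merge.

```java
import java.util.*;

public class RepairOrder {
    static List<Integer> repair(List<Integer> l) {
        List<Integer> stillSorted = new ArrayList<>();   // elements left in place (already sorted)
        List<Integer> pulledOut = new ArrayList<>();     // the list "N" from the answer
        for (int x : l) {
            if (stillSorted.isEmpty() || stillSorted.get(stillSorted.size() - 1) <= x) {
                stillSorted.add(x);
            } else {
                pulledOut.add(x);
            }
        }
        Collections.sort(pulledOut);                     // "Sort N"
        // Standard merge of two sorted lists.
        List<Integer> merged = new ArrayList<>(l.size());
        int i = 0, j = 0;
        while (i < stillSorted.size() && j < pulledOut.size()) {
            if (stillSorted.get(i) <= pulledOut.get(j)) merged.add(stillSorted.get(i++));
            else merged.add(pulledOut.get(j++));
        }
        while (i < stillSorted.size()) merged.add(stillSorted.get(i++));
        while (j < pulledOut.size()) merged.add(pulledOut.get(j++));
        return merged;
    }
}
```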
B.
If you are sure that the changed elements are few and not changed by much, simply use bubble sort.
I would still go with a heap, backed by an array.
In case only a few elements change after each pop, perform the pop operation and then a sift up/down for every item whose value changed (up if it decreased, down if it increased). That is still on the order of O(n log k), where k is the size of your heap and n is the number of elements that changed.
If a lot of items change in value, then you can treat it as the case where you have an unsorted array, and you just rebuild the heap from the array.
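For the few-changes case, a minimal sketch of a hand-rolled array-backed min-heap (java.util.PriorityQueue offers no cheap "this entry's value changed" operation). Mapping a changed element to its heap index would require a position map, which is left out of this sketch; the types and method names are illustrative.

```java
import java.util.*;

// Minimal array-backed min-heap illustrating the "fix up only what changed" idea.
class MinHeap {
    private final List<Integer> a = new ArrayList<>();

    void buildFrom(Collection<Integer> items) {        // O(n) heapify, for when many items changed
        a.clear(); a.addAll(items);
        for (int i = a.size() / 2 - 1; i >= 0; i--) siftDown(i);
    }

    int popMin() {                                     // remove smallest in O(log n)
        int min = a.get(0);
        int last = a.remove(a.size() - 1);
        if (!a.isEmpty()) { a.set(0, last); siftDown(0); }
        return min;
    }

    // After an element's value changes, restore heap order from its index only.
    void fix(int i, int newValue) {
        int old = a.get(i);
        a.set(i, newValue);
        if (newValue < old) siftUp(i); else siftDown(i);
    }

    private void siftUp(int i) {
        while (i > 0 && a.get((i - 1) / 2) > a.get(i)) { swap(i, (i - 1) / 2); i = (i - 1) / 2; }
    }

    private void siftDown(int i) {
        int n = a.size();
        while (true) {
            int l = 2 * i + 1, r = l + 1, s = i;
            if (l < n && a.get(l) < a.get(s)) s = l;
            if (r < n && a.get(r) < a.get(s)) s = r;
            if (s == i) return;
            swap(i, s); i = s;
        }
    }

    private void swap(int i, int j) { int t = a.get(i); a.set(i, a.get(j)); a.set(j, t); }
}
```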

Create a data structure in O(n+k) time that can allow me to find the number of integers in some range in the array in constant time

I have an array of integers, and I know the range of values of these integers.
I want to create a data structure in O(n+k) time that allows me to find the number of integers in some range in the array in constant time.
Is this possible? I don't want to sort the array, though. Is it possible with some sort of balanced tree? AVL trees?
You could create another array a where a[i] is the number of values in the target array that are greater than or equal to i. Then the number of values in the inclusive range [min, max] is a[min] - a[max + 1] (treating an index past the end of a as 0).
To create the array, you could iterate through the target array, incrementing the value at the corresponding index of the counting array (this is O(n)). Then you iterate through the counting array in reverse, making each value the sum of itself and all the values after it (O(k), if k is the range of the data). Of course this requires quite a bit of memory, depending on the range of your data, but it has the performance characteristics you require.
Tree operations are typically O(log n), so a balanced tree (AVL or otherwise) will not give you constant-time queries.
I would have an array the size of the range, pre-filled with "number of integers less than this value", and return the difference between the entries at the max and min of the range asked for. That's constant time.
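A sketch of the prefix-count idea from the last answer: O(n + k) to build, O(1) per query, assuming the values lie in [0, k]. The class and method names are illustrative.

```java
// O(n + k) preprocessing, O(1) per range query. Assumes values lie in [0, k].
public class RangeCounter {
    private final int[] prefix;           // prefix[v] = how many array values are <= v

    RangeCounter(int[] data, int k) {
        int[] count = new int[k + 1];
        for (int x : data) count[x]++;    // O(n) histogram
        prefix = new int[k + 1];
        int running = 0;
        for (int v = 0; v <= k; v++) {    // O(k) prefix sums
            running += count[v];
            prefix[v] = running;
        }
    }

    // Number of array elements x with lo <= x <= hi, in constant time.
    int countInRange(int lo, int hi) {
        return prefix[hi] - (lo > 0 ? prefix[lo - 1] : 0);
    }
}
```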
