I have the following question and I am unsure how to approach it. I would like some help/hints with designing an efficient algorithm for the following requirements.
Input
The first line of the input contains an integer N, which is the length of the series.
This is followed by N lines, each of which contains a string of only lowercase characters.
1<=N<=100000.
Length of each string is between 1 and 10 (inclusive).
Output
Output the minimum total length (in characters) of a consecutive sub-series that contains all the distinct strings.
Sample Input
6
letitbe
mihon
mihon
omi
omi
letitbe
Sample Output
18
Explanation: the last 4 consecutive strings contain all the distinct strings with the minimum total length (smallest number of characters): 5 + 3 + 3 + 7 = 18.
If I understood this correctly, you want the subseries which:
Contains at least 1 instance of "letitbe", "mihon" and "omi"
Has the lowest possible sum of string lengths
Here is how to do this efficiently, code in C#, algorithm explained in comments:
static void Main(string[] args)
{
    // Input (hardcoded sample; see the stdin reader below for the stated input format)
    var elements = new List<string> { "letitbe", "mihon", "mihon", "omi", "omi", "letitbe" };
    // Find distinct elements
    var distinctElements = elements.Distinct().ToList();
    // Create a dictionary that tells us how many copies of each element we have in the current subseries, initialize all values to 0
    var copiesOfElementInCurrentSubseries = distinctElements.ToDictionary(key => key, value => 0);
    // The sum of lengths of strings in the current subseries
    // Our goal is to minimize this
    var lengthOfCurrentSubseries = 0;
    // How many distinct elements are covered by the current subseries
    // The condition under which we minimize lengthOfCurrentSubseries is that numberOfElementsCoveredByCurrentSubseries equals distinctElements.Count
    var numberOfElementsCoveredByCurrentSubseries = 0;
    // We remember the solution in these
    var bestStartIndex = 0;
    var bestLength = elements.Sum(e => e.Length);
    var bestNum = elements.Count;
    // Start with startIndex and endIndex at 0, increase endIndex until we cover all distinct elements
    // The subseries from startIndex to endIndex (inclusive) is our current subseries
    for (int startIndex = 0, endIndex = 0; endIndex < elements.Count; endIndex++)
    {
        // We add the element at endIndex to our current subseries:
        // If we found an element that previously wasn't covered, increase the count of covered elements
        // Note that we never decrease this, because once we find a solution that covers all elements, we never make a change which "loses" some element
        if (copiesOfElementInCurrentSubseries[elements[endIndex]] == 0)
        {
            numberOfElementsCoveredByCurrentSubseries++;
        }
        // Increase the number of copies of the element we added
        copiesOfElementInCurrentSubseries[elements[endIndex]]++;
        // Increase the total length of the subseries by this element's length
        lengthOfCurrentSubseries += elements[endIndex].Length;
        // Initially, we will just loop increasing endIndex until all elements are covered
        // Once we are covering all elements, try to improve the solution
        if (numberOfElementsCoveredByCurrentSubseries == distinctElements.Count)
        {
            // Move startIndex to the right as far as possible while still covering all elements
            while (copiesOfElementInCurrentSubseries[elements[startIndex]] > 1)
            {
                lengthOfCurrentSubseries -= elements[startIndex].Length;
                copiesOfElementInCurrentSubseries[elements[startIndex]]--;
                startIndex++;
            }
            // If the new solution is better, remember it
            if (lengthOfCurrentSubseries < bestLength)
            {
                bestLength = lengthOfCurrentSubseries;
                bestStartIndex = startIndex;
                bestNum = endIndex - startIndex + 1;
            }
        }
        // Now we add another element by moving endIndex one place to the right, then try improving the solution by moving startIndex to the right, and we repeat this process...
    }
    // The problem asks for the minimum total length (18 for the sample), so print bestLength;
    // the winning subseries itself is printed only for illustration
    Console.WriteLine(bestLength);
    Console.WriteLine(string.Join(" ", elements.Skip(bestStartIndex).Take(bestNum)));
}
Note that even though this has nested loops, the inner while loop performs at most n steps in total across all passes combined, because startIndex keeps its value between passes and only ever moves to the right; the whole algorithm is therefore linear in the input size.
In case you are unfamiliar with C# - Dictionary is basically a hash table - it can efficiently look up values based on keys (as long as the keys have a good hash function, which strings do).
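The Main above hardcodes the sample; a minimal sketch for reading the stated input format from stdin could look like this (the helper name ReadSeries is mine):

using System;
using System.Collections.Generic;

static List<string> ReadSeries()
{
    int n = int.Parse(Console.ReadLine());  // first line: N
    var elements = new List<string>(n);
    for (int i = 0; i < n; i++)             // then N lines, one string each
    {
        elements.Add(Console.ReadLine().Trim());
    }
    return elements;
}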
Let's say we have some array of boolean values:
A = [0 1 1 1 1 0 0 0 0 0 0 0 1 1 1 0 0 0 1 1 0 1 1 1 1 0 0 0 ... 0]
The array is constructed by performing classification on a stream of data. Each element in the array corresponds to the output of a classification algorithm given a small "chunk" of the data. An answer may include restructuring the array to make parsing more efficient.
The array is pseudo random in the sense that groups of 1's and 0's tend to exist in bunches (but not necessarily always).
Given some index, i, what is the most efficient way to find the group of at least n zeros closest to A[i]? For the easy case, take n = 1.
EDIT: Groups should have AT LEAST n zeros. Again, for the easy case, that means at least 1 zero.
EDIT2: This search will be performed O(n) times, where n is the size of the array. (Specifically, it is performed n/c times, where c is some fixed duration.)
In this solution I organize the data so that you can use a binary search, O(log n), to find the nearest group of at least a certain size.
I first create the groups of zeros from the array, then I put each group of zeros into lists containing all groups of size s or larger, so that when you want to find the nearest group of size s or more you just run a binary search in the list that has all groups with a size of s or greater.
The downside is the pre-processing of putting the groups into the lists, with O(n * m) (I think, please check me) time and space cost, where n is the number of groups of zeros and m is the max size of the groups, though in practice the efficiency is probably better.
Here is the code:
public static class Group {
    final public int x1;
    final public int x2;
    final public int size;

    public Group(int x1, int x2) {
        assert x1 <= x2;
        this.x1 = x1;
        this.x2 = x2;
        this.size = x2 - x1 + 1;
    }

    public static final List<Group> getGroupsOfZeros(byte[] arr) {
        List<Group> listOfGroups = new ArrayList<>();
        for (int i = 0; i < arr.length; i++) {
            if (arr[i] == 0) {
                int x1 = i;
                for (++i; i < arr.length; i++)
                    if (arr[i] != 0)
                        break;
                int x2 = i - 1;
                listOfGroups.add(new Group(x1, x2));
            }
        }
        return Collections.unmodifiableList(listOfGroups);
    }

    public static final Group binarySearchNearest(int i, List<Group> list) {
        { // edge cases
            Group firstGroup = list.get(0);
            if (i <= firstGroup.x2)
                return firstGroup;
            Group lastGroup = list.get(list.size() - 1);
            if (i >= lastGroup.x1)
                return lastGroup;
        }
        int lo = 0;
        int hi = list.size() - 1;
        while (lo <= hi) {
            int mid = (hi + lo) / 2;
            Group currGroup = list.get(mid);
            if (i < currGroup.x1) {
                hi = mid - 1;
            } else if (i > currGroup.x2) {
                lo = mid + 1;
            } else {
                // x1 <= i <= x2
                return currGroup;
            }
        }
        // intentionally swapped because: lo == hi + 1
        Group lowGroup = list.get(hi);
        Group highGroup = list.get(lo);
        return (i - lowGroup.x2) < (highGroup.x1 - i) ? lowGroup : highGroup;
    }
}
NOTE: GroupsBySize can be improved, as described by #maraca to only contain a list of Groups per each distinct group size. I'll update tomorrow.
public static class GroupsBySize {
    private List<List<Group>> listOfGroupsBySize = new ArrayList<>();

    public GroupsBySize(List<Group> groups) {
        for (Group group : groups) {
            // ensure the internal list can hold groups up to this size
            while (listOfGroupsBySize.size() < group.size) {
                listOfGroupsBySize.add(new ArrayList<Group>());
            }
            // add the group to all lists up to its size
            for (int i = 0; i < group.size; i++) {
                listOfGroupsBySize.get(i).add(group);
            }
        }
    }

    public final Group getNearestGroupOfAtLeastSize(int index, int atLeastSize) {
        if (atLeastSize < 1)
            throw new IllegalArgumentException("group size must be greater than 0");
        List<Group> groupsOfAtLeastSize = listOfGroupsBySize.get(atLeastSize - 1);
        return Group.binarySearchNearest(index, groupsOfAtLeastSize);
    }
}
public static void main(String[] args) {
    byte[] byteArray = null; // your input array goes here
    List<Group> groups = Group.getGroupsOfZeros(byteArray);
    GroupsBySize groupsBySize = new GroupsBySize(groups);
    int index = 12;
    int atLeastSize = 5;
    Group g = groupsBySize.getNearestGroupOfAtLeastSize(index, atLeastSize);
    System.out.println("nearest group is (" + g.x1 + ":" + g.x2 + ") of size " + g.size);
}
If you have n queries on an array of size n, then the naive approach would take O(n^2) time.
You can optimize this by incorporating the observation that the number of distinct group sizes is on the order of sqrt(n): we get the most distinct group sizes if there is one group of size 1, one of size 2, one of size 3 and so on, and since 1 + 2 + 3 + ... + k is k * (k + 1) / 2, i.e. on the order of k^2, while the array only has size n, the number of distinct group sizes k is on the order of sqrt(n).
1. Create an integer array of size n to record which group sizes are present and how many times.
2. Create a list for the 0-groups; each element should contain the group size and starting index.
3. Scan the array, add the 0-groups to the list and update the present group sizes.
4. Create an array for the different group sizes; each entry should contain the group size and an array with the start indices of the groups of that size.
5. Create an integer array or a map which tells you which group size is at which index, by scanning the array of the present group sizes.
6. Go through the list of 0-groups and fill the start index arrays created at 4.
We end up with an array which takes O(n) space, takes O(n) time to create and contains all present group sizes in order, additionally each entry has an array with the start indices of the groups of that size.
To answer a query we can do a binary search on the start indices of all groups greater than or equal to the given minimum group size. This takes O(log(n) * sqrt(n)) per query, and we do it n times, so overall it is O(n * log(n) * sqrt(n)) = O(n^1.5 * log(n)), which is better than O(n^2).
I think you can get it down to O(n^1.5) by creating a structure which has all distinct group sizes but where each entry contains not only the groups of that size, but also the groups that are bigger. O(n^1.5) would be the time complexity to create that structure, and answering all the n queries would then be faster, O(n * log(sqrt(n)) * log(n)) I think, so the creation step dominates.
example:
[0 1 1 1 1 0 0 0 0 0 0 0 1 1 1 0 0 1 0 0] -- 0-indexed array
hashmap = {1:[0], 2:[15, 18], 7:[5]}
search(i = 7, n = 2) {
    binary search in {2:[15, 18], 7:[5]}
    return min(15, 5)
}
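A rough C# sketch of the structure those steps describe (type and method names are mine; like the example above, distance is measured to a group's start index):

using System;
using System.Collections.Generic;
using System.Linq;

class ZeroGroupIndex
{
    private readonly int[] sizes;          // distinct 0-group sizes, ascending (O(sqrt(n)) of them)
    private readonly int[][] startsBySize; // ascending start indices of the groups of each distinct size

    public ZeroGroupIndex(byte[] arr)
    {
        var bySize = new SortedDictionary<int, List<int>>();
        for (int i = 0; i < arr.Length; )
        {
            if (arr[i] != 0) { i++; continue; }
            int start = i;
            while (i < arr.Length && arr[i] == 0) i++;
            int size = i - start;
            if (!bySize.TryGetValue(size, out var starts))
                bySize[size] = starts = new List<int>();
            starts.Add(start);
        }
        sizes = bySize.Keys.ToArray();
        startsBySize = bySize.Values.Select(l => l.ToArray()).ToArray();
    }

    // Start index of the nearest group with size >= minSize, or -1 if there is none.
    // One binary search per candidate size list: O(sqrt(n) * log(n)) per query.
    public int NearestGroupStart(int index, int minSize)
    {
        int best = -1;
        int k = Array.BinarySearch(sizes, minSize);
        if (k < 0) k = ~k; // first distinct size >= minSize
        for (; k < sizes.Length; k++)
        {
            int[] starts = startsBySize[k];
            int pos = Array.BinarySearch(starts, index);
            if (pos < 0) pos = ~pos; // starts[pos] is the first start > index, if any
            for (int p = pos - 1; p <= pos; p++) // nearest candidate on either side
            {
                if (p < 0 || p >= starts.Length) continue;
                if (best < 0 || Math.Abs(starts[p] - index) < Math.Abs(best - index))
                    best = starts[p];
            }
        }
        return best;
    }
}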
what is the most efficient way to find the group of at least n zeros closest to A[i]
If we are not limited in preprocessing time and resources, the most efficient way would seem to be O(1) time and O(n * sqrt n) space, storing the answers to all possible queries. (To accomplish that, run the algorithm below with a list of all possible queries, that is each distinct zero-group size in the array paired with each index.)
If we are provided with all the n / c queries at once, we can produce the complete result set in O(n log n) time.
Traverse once from the left and once from the right. For each traversal, start with a balanced binary tree of our queries, sorted by zero-group size (the n in the query), where each node has a sorted list of the query indexes (all i's with this particular n).
At each iteration, when a zero-group is registered, update all queries with n equal to or lower than this zero-group's size: remove all equal and lower indexes from the node and keep the records for them (since the index list is sorted, we just remove the head of the list while it is equal or lower than the current index), and store the current index of the zero-group in the node as well (the "last seen" zero-group index). If no i's are left in a node, remove it.
After the traversal, assign each node's "last seen" zero-group index to any remaining i's in that node. Now we have all the answers for this traversal. (Any queries left in the tree have no answer.) In the opposite traversal, if any query comes up with a better (closer) answer, update it in the final record.
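The tree-based sweep above is intricate, so purely as an illustration of the precomputed-table idea from the first paragraph (O(n * sqrt n) space, O(1) per query afterwards), here is a simpler offline C# sketch that, for each distinct group size, computes the nearest qualifying group start for every index with two sweeps; all names are mine, and this is not the traversal algorithm described above:

using System;
using System.Collections.Generic;
using System.Linq;

static Dictionary<int, int[]> PrecomputeNearest(byte[] arr)
{
    // collect (start, end) of every 0-group, in left-to-right order
    var groups = new List<(int start, int end)>();
    for (int i = 0; i < arr.Length; )
    {
        if (arr[i] != 0) { i++; continue; }
        int start = i;
        while (i < arr.Length && arr[i] == 0) i++;
        groups.Add((start, i - 1));
    }
    var table = new Dictionary<int, int[]>(); // distinct size -> nearest group start per index
    foreach (int s in groups.Select(g => g.end - g.start + 1).Distinct())
    {
        var nearest = new int[arr.Length];
        // left-to-right sweep: nearest qualifying group starting at or before i
        int cand = -1, gi = 0;
        for (int i = 0; i < arr.Length; i++)
        {
            while (gi < groups.Count && groups[gi].start <= i)
            {
                if (groups[gi].end - groups[gi].start + 1 >= s) cand = groups[gi].start;
                gi++;
            }
            nearest[i] = cand;
        }
        // right-to-left sweep: keep the closer of the left and right candidates
        cand = -1; gi = groups.Count - 1;
        for (int i = arr.Length - 1; i >= 0; i--)
        {
            while (gi >= 0 && groups[gi].start >= i)
            {
                if (groups[gi].end - groups[gi].start + 1 >= s) cand = groups[gi].start;
                gi--;
            }
            if (cand >= 0 && (nearest[i] < 0 || Math.Abs(cand - i) < Math.Abs(nearest[i] - i)))
                nearest[i] = cand;
        }
        table[s] = nearest;
    }
    return table;
}

A query (index, n) is then answered by reading table[s][index], where s is the smallest distinct group size >= n (one binary search over the table's keys).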
We are given n points in 3D space; we need to find the count of all points that are strictly less than at least one other point,
i.e.
x1<x2 and y1<y2 and z1<z2
so (x1,y1,z1) would be one such point.
For example, given the points
1 4 2
4 3 2
2 5 3
(1,4,2)<(2,5,3)
So the answer for the above case should be the count of such points i.e. 1.
I know this can be solved with an O(n^2) algorithm, but I need something faster. I tried sorting by one dimension and then searching only over the part greater than the key, but it is still O(n^2) in the worst case.
What is the efficient way to do this?
There is a way to optimize your search that may be faster than O(n^2) - I would welcome counter-examples (input that defeats it).
Keep three lists of the indexes of the points, sorted by x, y and z respectively. Make a fourth list associating each point with its place in each of the lists (indexes in the code below; e.g., indexes[0] = [5,124,789] would mean the first point is 5th in the x-sorted list, 124th in the y-sorted list, and 789th in the z-sorted list).
Now iterate over the points - pick the list where the point is highest and test the point against the higher-indexed points in that list, exiting early if the point is strictly less than one of them. If a point is low in all three lists, the likelihood of finding a strictly greater point is higher; otherwise, a higher place in one of the lists means fewer iterations.
JavaScript code:
function strictlyLessThan(p1,p2){
    return p1[0] < p2[0] && p1[1] < p2[1] && p1[2] < p2[2];
}

// iterations
var it = 0;

function f(ps){
    var res = 0,
        indexes = new Array(ps.length);
    // sort by x
    var sortedX =
        ps.map(function(x,i){ return i; })
          .sort(function(a,b){ return ps[a][0] - ps[b][0]; });
    // record index of point in x-sorted list
    for (var i=0; i<sortedX.length; i++){
        indexes[sortedX[i]] = [i,null,null];
    }
    // sort by y
    var sortedY =
        ps.map(function(x,i){ return i; })
          .sort(function(a,b){ return ps[a][1] - ps[b][1]; });
    // record index of point in y-sorted list
    for (var i=0; i<sortedY.length; i++){
        indexes[sortedY[i]][1] = i;
    }
    // sort by z
    var sortedZ =
        ps.map(function(x,i){ return i; })
          .sort(function(a,b){ return ps[a][2] - ps[b][2]; });
    // record index of point in z-sorted list
    for (var i=0; i<sortedZ.length; i++){
        indexes[sortedZ[i]][2] = i;
    }
    // check for possible greater points only in the list
    // where the point is highest
    for (var i=0; i<ps.length; i++){
        var listToCheck,
            startIndex;
        if (indexes[i][0] > indexes[i][1]){
            if (indexes[i][0] > indexes[i][2]){
                listToCheck = sortedX;
                startIndex = indexes[i][0];
            } else {
                listToCheck = sortedZ;
                startIndex = indexes[i][2];
            }
        } else {
            if (indexes[i][1] > indexes[i][2]){
                listToCheck = sortedY;
                startIndex = indexes[i][1];
            } else {
                listToCheck = sortedZ;
                startIndex = indexes[i][2];
            }
        }
        var j = startIndex + 1;
        while (listToCheck[j] !== undefined){
            it++;
            var point = ps[listToCheck[j]];
            if (strictlyLessThan(ps[i],point)){
                res++;
                break;
            }
            j++;
        }
    }
    return res;
}

// var input = [[5,0,0],[4,1,0],[3,2,0],[2,3,0],[1,4,0],[0,5,0],[4,0,1],[3,1,1],[2,2,1],[1,3,1],[0,4,1],[3,0,2],[2,1,2],[1,2,2],[0,3,2],[2,0,3],[1,1,3],[0,2,3],[1,0,4],[0,1,4],[0,0,5]];
var input = new Array(10000);
for (var i=0; i<input.length; i++){
    input[i] = [Math.random(),Math.random(),Math.random()];
}
console.log(input.length + ' points');
console.log('result: ' + f(input));
console.log(it + ' iterations not including sorts');
I doubt that the worst-case complexity can be reduced below N×N, because it is possible to create input where no point is strictly less than any other point:
For any value n, consider the plane that intersects the X, Y and Z axes at (n,0,0), (0,n,0) and (0,0,n), described by the equation x+y+z=n. If the input consists of points on such a plane, none of the points is strictly less than any other point.
Example of worst-case input:
(5,0,0) (4,1,0) (3,2,0) (2,3,0) (1,4,0) (0,5,0)
(4,0,1) (3,1,1) (2,2,1) (1,3,1) (0,4,1)
(3,0,2) (2,1,2) (1,2,2) (0,3,2)
(2,0,3) (1,1,3) (0,2,3)
(1,0,4) (0,1,4)
(0,0,5)
However, the average complexity can be reduced to much less than N×N, e.g. with this approach:
1. Take the first point from the input and put it in a list.
2. Take the second point from the input, and compare it to the first point in the list. If it is strictly less, discard the new point. If it is strictly greater, replace the point in the list with the new point. If it is neither, add the point to the list.
3. For each new point from the input, compare it to each point in the list. If it is strictly less than any point in the list, discard the new point. If it is strictly greater, replace the point in the list with the new point, and also discard any further points in the list which are strictly less than the new point. If the new point is neither strictly less nor strictly greater than any point in the list, add the new point to the list.
4. After checking every point in the input, the result is the number of points in the input minus the number of points in the list.
Since the probability that for any two random points a and b either a<b or b<a is 25%, the list won't grow to be very large (unless the input is specifically crafted to contain few or no points that are strictly less than any other point).
Limited testing with the code below (100 cases) with 1,000,000 randomly distributed points in a cubic space shows that the average list size is around 116 (with a maximum of 160), and the number of checks whether a point is strictly less than another point is around 1,333,000 (with a maximum of 2,150,000).
(And a few tests with 10,000,000 points show that the average number of checks is around 11,000,000 with a list size around 150.)
So in practice, the average complexity is close to N rather than N×N.
function xyzLessCount(input) {
    var list = [input[0]];                      // put first point in list
    for (var i = 1; i < input.length; i++) {    // check every point in input
        var append = true;
        for (var j = 0; j < list.length; j++) { // against every point in list
            if (xyzLess(input[i], list[j])) {   // new point < list point
                append = false;
                break;                          // continue with next point
            }
            if (xyzLess(list[j], input[i])) {   // new point > list point
                list[j] = input[i];             // replace list point
                for (var k = list.length - 1; k > j; k--) {
                    if (xyzLess(list[k], list[j])) { // check rest of list
                        list.splice(k, 1);      // remove list point
                    }
                }
                append = false;
                break;                          // continue with next point
            }
        }
        if (append) list.push(input[i]);        // append new point to list
    }
    return input.length - list.length;

    function xyzLess(a, b) {
        return a.x < b.x && a.y < b.y && a.z < b.z;
    }
}

var points = []; // random test data
for (var i = 0; i < 1000000; i++) {
    points.push({x: Math.random(), y: Math.random(), z: Math.random()});
}
document.write("1000000 → " + xyzLessCount(points));
Suppose we have this array of integers called data
3
2
4
5
2
Also we have the following array of the same size called info
1
4
0
2
3
Each value of info represents an index into the first array. So, for example, the first value is 1, which means that position 0 of the final sorted array will hold the value data[info[0]].
By following this logic the final sorted array will be the following:
data[info[0]] => 2
data[info[1]] => 2
data[info[2]] => 3
data[info[3]] => 4
data[info[4]] => 5
I would like to do an in-place sort of the data array, without using any extra memory of size N, where N is the size of the data array. In addition, I would like the total number of operations to be as small as possible.
I've been trying to think of a solution to my problem, but I couldn't come up with anything that wouldn't use extra memory. Keep in mind that these are my own restrictions for a system that I'm implementing; if they can't be met, then I will probably have to think of something else.
Any ideas would be appreciated.
Thank you in advance
why not simply

for i in 0..n-1:
    info[i] := data[info[i]]

and info now holds the sorted array. If it must be in data, just copy it back:

for i in 0..n-1:
    data[i] := info[i]
2*n copies, overall.
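Rendered as concrete C# with the question's sample arrays, the same two loops are:

int[] data = { 3, 2, 4, 5, 2 };
int[] info = { 1, 4, 0, 2, 3 };
int n = data.Length;

for (int i = 0; i < n; i++)
    info[i] = data[info[i]]; // info now holds the sorted array: 2 2 3 4 5

for (int i = 0; i < n; i++)
    data[i] = info[i];       // copy back, if the result must be in data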
If the info array need not remain intact, you can use that as additional storage and sort in O(n):
for(int i = 0; i < n; ++i) {
    int where = info[i];
    if (where == i) continue;
    info[i] = data[i];
    data[i] = i < where ? data[where] : info[where];
}
If an element of data is already in its correct place, we skip that index. Otherwise, remember the element in the info array, and write the correct element into data, fetching it from data if it comes from a larger index, and from info if it comes from a smaller index.
Of course that simple method requires the types of the info and data arrays to be the same, and in general does 3*n copies.
If the data elements cannot be stored in the info array, we can follow the cycles in info:
for(int i = 0; i < n; ++i) {
    // Check if this is already in the right place, if so mark as done
    if (info[i] == i) info[i] = -1;
    // Nothing to do if we already treated this index
    if (info[i] < 0) continue;
    // New cycle we haven't treated yet
    Stuff temp = data[i]; // remember value
    int j = info[i], k = i;
    while(j != i) {
        // copy the right value into data[k]
        data[k] = data[j];
        // mark index k as done
        info[k] = -1;
        // get next indices
        k = j;
        j = info[j];
    }
    // Now close the cycle
    data[k] = temp;
    info[k] = -1;
}
That does n - F + C copies of data elements, where F is the number of elements that already were in the right place (fixed points of the sorting permutation) and C is the number of cycles of length > 1 in the sorting permutation. That means the number of copies is at most 3*n/2.
There is an array (with space for more than 1000 elements) holding 1000 large numbers (possibly 64-bit). The numbers in the array are not necessarily sorted.
We have to generate a unique number for the 1001st position that is different from the previous 1000.
Justify that the approach used is the best.
My answer (don't know to what extent this was correct):
Sort the numbers and start from position 0. The number at the last (1000th) position, plus 1, is the required number.
Better suggestions for this?
Create an auxiliary array of 1001 elements. Set all of these to 1 (or true or Y or whatever you choose). Run through the main array; if you find a number in the range 0..1000, zero out (or otherwise falsify) the corresponding element in the auxiliary array. Since the main array holds only 1000 numbers, they cannot cover all 1001 candidates, so at the end at least one element of the auxiliary array is still not 0 (or false); the first such element corresponds to a number which is not in the main array.
This is simple and, I think, O(n) in time complexity, where n is the number of elements in the main array.
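A compact C# sketch of this idea, with 0..1000 as the concrete candidate range (any 1001 candidate values would work):

using System;

static long FindMissing(long[] main) // 1000 elements, as in the question
{
    var candidateFree = new bool[1001];
    for (int i = 0; i < candidateFree.Length; i++) candidateFree[i] = true;
    foreach (long v in main)
        if (v >= 0 && v <= 1000)
            candidateFree[v] = false; // this candidate occurs in the main array
    for (int k = 0; k <= 1000; k++)
        if (candidateFree[k]) return k; // first candidate not present
    throw new InvalidOperationException("unreachable: 1000 numbers cannot cover 1001 candidates");
}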
#include <string.h>

#define NNN 1000        /* number of elements in the array */
#define XXX (NNN + 1)   /* histogram size: one slot more than elements */

unsigned array[NNN];    /* the input numbers */

unsigned find_free_slot(void)
{
    unsigned ii, slot;
    /* allocate a histogram */
    unsigned histogram[XXX];
    memset(histogram, 0, sizeof histogram);
    for (ii = 0; ii < NNN; ii++) {
        slot = array[ii] % XXX;
        histogram[slot] += 1;
    }
    /* NNN values cannot fill NNN+1 slots, so one slot must stay empty */
    for (slot = 0; slot < XXX; slot++) {
        if (!histogram[slot]) break;
    }
    /* Now, slot + k * XXX will be a
    ** number that does not occur in the original array */
    return slot;
}

Note: this is similar to High Performance Mark's answer, but at least I typed in the code ...
If you sort your array, you have three possibilities for a unique number:
array[999]+1, if array[999] is not equal to INT_MAX
array[0]-1, if array[0] is not equal to INT_MIN
a number between array[i] and array[i+1], if array[i+1]-array[i]>1 (0<=i<=998). Notice that if the two previous tries have failed, then it is guaranteed that there is a number between two elements in your array.
Notice that this solution will also work for the 1002nd, 1003rd, and so on.
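A C# sketch of these three checks (using long instead of int, since the question allows 64-bit numbers; the helper name is mine):

using System;

static long UniqueAfterSort(long[] arr)
{
    var a = (long[])arr.Clone();
    Array.Sort(a);
    if (a[a.Length - 1] != long.MaxValue) return a[a.Length - 1] + 1; // max + 1
    if (a[0] != long.MinValue) return a[0] - 1;                       // min - 1
    for (int i = 0; i + 1 < a.Length; i++)
        if (a[i] != a[i + 1] && a[i] + 1 != a[i + 1])
            return a[i] + 1; // a gap between two neighbours
    throw new InvalidOperationException("no gap found (cannot happen for 1000 64-bit values)");
}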
An attempt at an (admittedly clumsy) C# implementation:
public class Test
{
    // reuse one instance; re-creating Random in a tight loop repeats seeds
    private static readonly Random Rnd = new Random();

    public List<int> Sequence { get; set; }

    public void GenerateFirstSequence()
    {
        Sequence = new List<int>();
        for (var i = 0; i < 1000; i++)
        {
            var x = Rnd.Next(0, int.MaxValue);
            while (Sequence.Contains(x))
            {
                x = Rnd.Next(0, int.MaxValue);
            }
            Sequence.Add(x);
        }
    }

    public int GetNumberNotInSequence()
    {
        var number = Sequence.Max();
        var mustRedefine = number == int.MaxValue;
        if (mustRedefine)
        {
            // the maximum is already int.MaxValue, so search downwards for a free number
            while (Sequence.Contains(number))
            {
                number = number - 1;
            }
            return number;
        }
        return number + 1;
    }
}
I have some thoughts on this problem:
You could create a hash table H with 1000 slots. Suppose your array is named A; for each element, take the remainder modulo 1000: m[i] = A[i] % 1000.
If there is a conflict between some A[i] and A[j], that is A[i] % 1000 == A[j] % 1000, then by the pigeonhole principle there must exist some k in 0..999 such that no element's remainder equals k; that k is the number you are going to get (k % 1000 == k, so no element can equal k).
If there is no conflict at all, every remainder occurs exactly once, so just pick H[1] + 1000 as your result: it has the same remainder as the element in slot 1 but is a different number, so it cannot be in the array.
The complexity of this algorithm is O(l), where l is the original list size; in the example, l = 1000.
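A C# sketch of this remainder scheme (all names mine):

using System;

static long FindUnique(long[] a) // expects 1000 numbers, as in the question
{
    int m = a.Length;                      // 1000 remainder buckets
    var used = new bool[m];                // which remainders occur
    long repOfBucket1 = 0;                 // an element with remainder 1 (the H[1] above)
    foreach (long v in a)
    {
        int r = (int)(((v % m) + m) % m);  // non-negative remainder
        used[r] = true;
        if (r == 1) repOfBucket1 = v;
    }
    // conflict case: some remainder k is unused; k itself has remainder k, so it is absent
    for (int k = 0; k < m; k++)
        if (!used[k]) return k;
    // no conflict: all m remainders occur exactly once, so repOfBucket1 + m is absent
    return repOfBucket1 + m;
}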
The title explains most of the question.
I have a tile grid which is represented by a 2D array. Some tiles are marked as empty (but they exist in the array, for certain continued uses) while others are in normal state.
What I need to do is reorder the remaining (non-empty) tiles in the grid so that all (or most) of them end up in a different non-empty position. If I just iterate over the non-empty positions and swap each tile with another random one, I might already be reordering many of them automatically (the ones swapped in).
So I was wondering if there's some technique I can follow so as to reorder the grid satisfactorily with minimal looping. Any hints?
public void RandomizeGrid<T>(T[,] grid, Func<T,bool> isEmpty)
{
    // Create a list of the indices of all non-empty cells.
    var indices = new List<Point>(); // System.Drawing.Point, or any simple (x, y) pair
    int width = grid.GetLength(0);
    int height = grid.GetLength(1);
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            if (!isEmpty(grid[x,y])) // function to check emptiness
            {
                indices.Add(new Point(x,y));
            }
        }
    }
    // Randomize the cells using the index-array as displacement.
    int n = indices.Count;
    var rnd = new Random();
    for (int i = 0; i < n; i++)
    {
        int j = rnd.Next(i,n); // Random index i <= j < n
        if (i != j)
        {
            // Swap the two cells
            var p1 = indices[i];
            var p2 = indices[j];
            var tmp = grid[p1.X,p1.Y];
            grid[p1.X,p1.Y] = grid[p2.X,p2.Y];
            grid[p2.X,p2.Y] = tmp;
        }
    }
}
Would it meet your needs ("satisfactorily" is a bit vague) to ensure that every non-empty tile is swapped with one other non-empty tile exactly once?
Say you have a list:
(1,4,7,3,8,10)
we can write down the indices of the list
(0,1,2,3,4,5)
and perform N random swaps on the indices to shuffle it - maybe some numbers move, some don't.
(5,1,3,2,4,0)
Then take these pairwise as a sequence of swaps to perform on our original list.
(8,10,3,7,1,4)
If you have an odd number of elements, the leftover index is swapped with any other element in the list.
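A C# sketch of that pairing scheme (the method name is mine; needs System.Linq):

using System;
using System.Collections.Generic;
using System.Linq;

static void PairSwapShuffle<T>(IList<T> items, Random rnd)
{
    // shuffled list of indices (Fisher-Yates)
    var idx = Enumerable.Range(0, items.Count).ToArray();
    for (int i = idx.Length - 1; i > 0; i--)
    {
        int j = rnd.Next(i + 1);
        (idx[i], idx[j]) = (idx[j], idx[i]);
    }
    // take the shuffled indices pairwise as swaps: (idx[0],idx[1]), (idx[2],idx[3]), ...
    for (int p = 0; p + 1 < idx.Length; p += 2)
    {
        (items[idx[p]], items[idx[p + 1]]) = (items[idx[p + 1]], items[idx[p]]);
    }
    // odd count: swap the leftover with any other element
    if (idx.Length % 2 == 1 && idx.Length > 1)
    {
        int leftover = idx[idx.Length - 1];
        int other = idx[rnd.Next(idx.Length - 1)]; // one of the already-paired indices
        (items[leftover], items[other]) = (items[other], items[leftover]);
    }
}

With an even count, every element takes part in exactly one swap with a distinct partner, so every element moves; in the odd case the final extra swap may move one value back, as the answer concedes.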