I am restating this question with slight modifications so that it becomes easier for me to explain.
There are n employees, and I need to arrange an outing for them on a day of the month when all (or the maximum number of) employees are available.
Each employee fills out an online survey stating his availability, e.g. 1-31 or 15-17, etc. Some might not be available for even a single day.
There is no restriction on the number of trips I can arrange to cover all employees (not counting those who aren't available at all during the month), but I want to find the minimum set of dates that covers all of them. So in the worst-case scenario I would have to arrange 31 trips.
Question: what is the best data structure to hold this data, and what algorithm fits it best? What is the best possible way to solve this problem?
By best I of course mean a time- and space-efficient approach, but I am also open to other options.
My idea is to maintain an array of 31 ints, initialized to 0. Run over each employee and, based on their available dates, increment the corresponding array indexes. At the end, find the maximum value in the array; its index is the date on which the most employees are available. Then apply the same logic to the employees who are left out. The problem is removing the covered employees: I would have to run over the whole list of employees once more to determine which ones can be removed and to build a new list of left-out employees before reapplying the logic. Running over the list twice per pass just to remove employees doesn't seem optimal to me. Any ideas?
As a first step, you should exclude employees with no available dates.
Then your problem becomes a variant of the Set Cover Problem.
Your universe U is the set of all employees, and your collection of sets S corresponds to days: for each day i, employee j is in set S[i] iff that employee is available on day i.
That problem is NP-hard. So, unless you accept an approximate solution, you must check all 2^31 combinations of days, ideally with some pruning.
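As a concrete sketch of that exact search, here is a minimal Python version; the names (availability, min_cover_days) are illustrative, not from the question. Enumerating day subsets from smallest to largest means the first cover found is a minimum one:

from itertools import combinations

def min_cover_days(availability):
    # Exact minimum set of days covering every employee.
    # availability: dict mapping employee -> set of available days (1-31).
    # Employees with no available days are excluded up front, as suggested.
    # Worst case tries all 2^31 day subsets, but searching small subsets
    # first means it stops at the first (hence minimum) cover found.
    employees = [e for e, days in availability.items() if days]
    all_days = sorted({d for days in availability.values() for d in days})
    for size in range(1, len(all_days) + 1):
        for subset in combinations(all_days, size):
            chosen = set(subset)
            if all(availability[e] & chosen for e in employees):
                return chosen
    return set()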
Create an array indexed 1 to 31 (each index representing a date of the month). For each date, keep a doubly linked list of the emp_ids available on that day. You can build these lists in a single pass, keeping each one sorted by emp_id, and also track the size of each list and the array index holding the most employees.
Take the largest list as the first date (the greedy choice).
Now compare each remaining list with the selected list and remove the employees that are already covered by it.
Repeat the same procedure to find the second date, and so on.
Because the lists are sorted by emp_id, each removal pass is a linear merge, and since 31 is a constant, the whole procedure runs in O(n) time,
and O(n) space.
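A minimal Python sketch of this greedy pass, using a dict of sets in place of the array of sorted linked lists (names are illustrative):

def greedy_cover_days(availability):
    # Greedy pass described above: repeatedly pick the day that covers
    # the most still-uncovered employees.
    # availability: dict mapping employee -> set of available days (1-31).
    remaining = {e for e, days in availability.items() if days}
    chosen = []
    while remaining:
        best_day = max(range(1, 32),
                       key=lambda d: sum(d in availability[e] for e in remaining))
        chosen.append(best_day)
        remaining -= {e for e in remaining if best_day in availability[e]}
    return chosen

Note that greedy set cover is an approximation: it can return more dates than the true minimum, though never more than a factor of about ln(n) too many.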
I am looking for an approach or algorithm that can satisfy the following requirements:
Partition the elements into a defined number X of partitions. The number of partitions might be manually redefined over time if needed.
Each partition should not have more than Y elements
Elements have a "category Id" and "element Id". Ideally all elements with the same category Id should be within the same partition. They should overflow to as few partitions as possible only if a given category has more than Y elements. Number of categories is orders of magnitude larger than number of partitions.
If an element has previously been assigned to a given partition, it should continue being assigned to that same partition.
Account for changes in the data. Existing elements might be removed, and new elements can be added within each of the categories.
So far my naive approach (sketched in code after these steps) is to:
sort the categories descending by their number of elements
keep a variable with a count-of-elements for a given partition
assign the rows from the first category to the first partition and increase the count-of-elements
if count-of-elements > Y: if the category itself has more than Y elements, let its overflow spill into the next partition; otherwise assign all elements of the category to the next partition
continue till all elements are assigned to partitions
In order to persist the assignments store in the database all pairs: (element Id, partition Id)
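A minimal Python sketch of this naive first-fit approach, with illustrative names (assign_partitions is not an existing API):

def assign_partitions(categories, num_partitions, max_size):
    # First-fit version of the steps above.
    # categories: dict mapping category_id -> list of element_ids.
    # Returns a dict element_id -> partition index. It never revisits a
    # partly filled partition, so it needs some slack capacity overall.
    assignment, counts, part = {}, [0] * num_partitions, 0
    # step 1: largest categories first
    for cat in sorted(categories, key=lambda c: len(categories[c]), reverse=True):
        elements = categories[cat]
        # a category that fits in one partition moves whole to the next one
        if counts[part] + len(elements) > max_size and len(elements) <= max_size:
            part += 1
        for el in elements:
            if counts[part] >= max_size:  # an oversized category overflows
                part += 1
            assignment[el] = part
            counts[part] += 1
    return assignment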
On the consecutive re-assignments:
remove from the database any elements that were deleted
assign existing elements to the partitions based on (element Id, partition Id)
for any new elements follow the above algorithm
My main worry is that after a few such runs we will end up with categories spread all across the partitions, as the initial partitions fill up. Perhaps adding a buffer (of 20% or so) to Y might help. Also, if one of the categories sees a sudden increase in its number of elements, the partitions will need rebalancing.
Are there any existing algorithms that might help here?
This is NP-hard (knapsack) on top of NP-hard (finding the optimal way to split oversized categories) on top of the currently unknowable (future data changes). Obviously the best that you can do is a heuristic.
Sort the categories by descending size. Using a heap/priority queue for the partitions, put each category into the least full available partition. If the category won't fit, split it as evenly as you can across the smallest possible number of partitions. My guess (experiment!) is that trying to keep partitions equally full works best.
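A minimal sketch of this initial placement in Python (place_categories and the dict format are my assumptions, not the asker's schema); pouring into the least full partition first approximates the equal-fill split:

import heapq

def place_categories(categories, num_partitions, max_size):
    # Largest categories first; each one is poured into the least full
    # partition(s), splitting only when it cannot fit in one of them.
    # categories: dict category_id -> list of element_ids.
    heap = [(0, p) for p in range(num_partitions)]  # (fill level, partition id)
    heapq.heapify(heap)
    assignment = {}
    for cat in sorted(categories, key=lambda c: len(categories[c]), reverse=True):
        pending = list(categories[cat])
        while pending:
            fill, p = heapq.heappop(heap)
            take = min(len(pending), max_size - fill)
            if take == 0:
                raise ValueError("all partitions are full")
            for el in pending[:take]:
                assignment[el] = p
            pending = pending[take:]
            heapq.heappush(heap, (fill + take, p))
    return assignment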
On reassignment, delete the deleted elements first. Then group new elements by category. Sort the categories by how many preferred locations they have ascending, and then by descending size. Now move the categories with 0 preferred locations to the end.
For each category, if possible split its new elements across the preferred partitions, leaving them equally full. If this is not possible, put them into the emptiest possible partition. If that is not possible, then split them to put them across the fewest possible partitions.
It is, of course, possible to come up with data sets that eventually turn this into a mess. But it makes a good-faith effort to come out reasonably well.
This is a coding interview question:
Your school is having an election and you are tasked with coding a program that tallies the results.
You are given a Set of Votes, each vote containing a candidate and a time stamp. Given a time stamp, return the top N candidates with the most votes at that timestamp. (each vote tallied must come before or at the given timestamp)
Use a min-heap plus a HashMap to solve this problem.
1. Tally each qualifying vote in a HashMap (Candidate → Votes).
2. Whenever we want the top N trending candidates, add each HashMap entry (candidate, votes) to a min-heap capped at size N.
3. Return all items from the min-heap; these are the top N candidates with their vote counts (the size-N min-heap filters out everything below the top N).
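A compact Python sketch of these steps (names illustrative); heapq.nlargest maintains exactly the kind of size-N min-heap described above:

import heapq
from collections import Counter

def top_candidates(votes, timestamp, n):
    # votes: iterable of (candidate, time) pairs.
    # Returns the n (candidate, count) pairs with the most votes cast
    # at or before timestamp.
    tally = Counter(c for c, t in votes if t <= timestamp)  # step 1: the HashMap
    return heapq.nlargest(n, tally.items(), key=lambda kv: kv[1])  # steps 2-3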
This is probably far from the most efficient way, but I would:
Create a list candidateList containing each candidate and their respective number of votes (initially 0)
Go through the set of votes, and if a vote meets the time stamp requirement, add 1 to the candidate's votes in candidateList.
After you've gone through the set of votes, find the nth most popular candidate in candidateList (using a selection algorithm), and then iterate over the list to find the candidates more popular than them.
I would do it with an array.
For every new date you read, you create a new subarray. Let's say you get a vote for John Doe from the 9th of August 2016, a date for which you don't have any votes registered yet; as soon as you register that vote, a new entry is created.
Your array should then be constructed like this:
array ->0->date: 09/08/2016
->John Doe: 1
Since I assume all the candidates' names are known at an election, we can simply save them in another array, which we can use when we loop through this one.
In case a new vote for John Doe on another date gets registered, your array would look like this:
array ->0->date: 09/08/2016
->John Doe: 1
->1->date: 11/08/2016
->John Doe: 1
If someone votes for another person on an already known date, it should look like this:
array ->0->date: 09/08/2016
->John Doe: 1
->Jane Doe: 1
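A rough Python equivalent of this nested structure, using dicts keyed by date and candidate (the dates and names are just the examples above):

from collections import defaultdict

# tallies[date][candidate] -> number of votes
tallies = defaultdict(lambda: defaultdict(int))

def register_vote(date, candidate):
    tallies[date][candidate] += 1

register_vote("09/08/2016", "John Doe")
register_vote("11/08/2016", "John Doe")
register_vote("09/08/2016", "Jane Doe")
# tallies now mirrors the array sketched above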
Hope this helps. If you want help looping through this array-structure-thingy, don't be afraid to ask :)
For the last few days I've been trying to accomplish the following task involving the analysis of a set of objects, and the solutions I've come up with either rely heavily on memory (causing OutOfMemory exceptions in some cases) or take an incredibly long time to process. I'm out of ideas, so I think it's time to post it here. I will explain the problem in detail and provide the logic I've followed so far.
Scenario:
First, we have an object, which we'll name Individual, that contains the following properties:
A date
A Longitude - Latitude pair
Second, we have another object, which we'll name Group, whose definition is:
A set of Individuals that, together, match the following conditions:
All Individuals in the set have dates that differ by at most 10 days. That is, any two Individuals in the set, compared pairwise, are no more than 10 days apart.
The distance between each pair of Individuals is less than Y meters.
A Group can have N > 1 Individuals, as long as every pair of Individuals satisfies the conditions.
All individuals are stored in a database.
All groups would also be stored in a database.
The task:
Now, consider a new individual.
The system has to check if the new individual:
Belongs to one or more existing Groups
Forms one or more new Groups with other Individuals
Notes:
The new individual could be in multiple existing groups, or could create multiple new groups.
SubGroups of Individuals are not allowed; for example, if we have a Group containing Individuals {A,B,C}, there cannot exist a Group containing {A,B}, {A,C}, or {B,C}.
Solution (limited by processing time and memory)
First, we filter the database for all the Individuals that match the initial conditions relative to the new one. This outputs a FilteredIndividuals enumerable, containing all the Individuals that we know will form a Group (of 2) with the new one.
Briefly, a Powerset is a set that contains all the possible subsets of a particular set. For example, the powerset of {A,B,C} would be:
{[empty], A, B, C, AB, AC, BC, ABC}
Note: A powerset will output a new set with 2^N combinations, where N is the length of the originating set.
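For reference, the standard itertools powerset recipe in Python:

from itertools import chain, combinations

def powerset(iterable):
    # All 2**N subsets of the input, smallest first.
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

# list(powerset("ABC")) ->
# [(), ('A',), ('B',), ('C',), ('A','B'), ('A','C'), ('B','C'), ('A','B','C')]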
The idea with using powersets is the following:
First, we create a powerset of the FilteredIndividuals list. This gives all possible combinations of Groups within the FilteredIndividuals list. For analysis purposes, and by definition, we can omit all the combinations that have fewer than 2 Individuals in them.
For each combination in the powerset, we check whether its Individuals all match the conditions pairwise.
If they do, all of the Individuals in that combination form a Group with the new Individual. Then, to avoid SubGroups, we eliminate all subsets of the checked combination. I do this by creating a powerset of the checked combination and removing that powerset from the original one.
At this point, we have a list of sets that match the conditions to form a Group.
Before formally creating a Group, I compare the DB with other existing Groups that contain the same elements as the new sets:
If I find a match, I eliminate the newly created set, and add the new Individual to the old Group.
If I don't find a match, it means they are new Groups. So I add the new Individual to the sets and finally create the new Groups.
This solution works well when the FilteredIndividuals enumerable has fewer than 52 Individuals. Beyond that, memory exceptions are thrown. (I know this is because of the maximum sizes allowed for the data types involved, but increasing those sizes does not help with very big sets. For your consideration, the largest number of Individuals matching the conditions that I've found is 345.)
Note: I have access to the definition of both entities. If there's a new property that would reduce the processing time, we can add it.
I'm using the .NET framework with C#, but if a different language is required, we can accept that, as long as we can later convert the results to objects understandable by our main system.
All Individuals in the set have dates that differ by at most 10 days. That is, any two Individuals in the set, compared pairwise, are no more than 10 days apart.
The distance between each pair of Individuals is less than Y meters.
So your problem becomes how to cluster these points in 3-space: a partitioning where X and Y are your longitude and latitude, Z is the time coordinate, and your metric is an appropriately scaled variant of the Manhattan distance. Specifically, you scale Z so that 10 days equals your maximum distance of Y meters.
One possible shortcut is divide et impera: classify your points (Individuals) into buckets Y meters wide and 10 days deep. You do so by dividing their coordinates by Y and by 10 days (you can use Julian dates for that). If an individual is in bucket H { X=5, Y=3, Z=71 }, then no individual in a bucket with X < 4 or X > 6, Y < 2 or Y > 4, or Z < 70 or Z > 72 can be in his same group, because their distance would certainly exceed the threshold. This means you can quickly select a subset of 27 "buckets" and worry only about the individuals in there.
At this point you can enumerate the possible groups your new individual could be in (with a database back end, something like SELECT groups.* FROM groups JOIN iig USING (gid) JOIN individuals USING (uid) WHERE individuals.bucketId IN ( #bucketsId )), and compare those with the groups your individual may form with other individuals (SELECT individuals.id FROM individuals WHERE bucketId IN ( #bucketsId ) AND ((x-#newX)*(x-#newX)+(y-#newY)*(y-#newY)) < #YSquared AND ABS(z - #newZ) < 10).
This approach is not very performant (it depends on the database, and you'll want an index on bucketId at a minimum), but it has the advantage of using as little memory as possible.
On some database backends with geographical extensions, you might want to use the native latitude and longitude functions instead of implicitly converting to meters.
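A minimal Python sketch of the bucketing step, assuming planar x/y coordinates already in meters and Julian day numbers (the function names and the max_dist parameter are mine, not from the question):

def bucket_key(x_meters, y_meters, julian_day, max_dist, day_window=10):
    # Grid cell for an Individual: cells are max_dist meters wide
    # and day_window days deep.
    return (int(x_meters // max_dist),
            int(y_meters // max_dist),
            int(julian_day // day_window))

def neighbor_keys(key):
    # The 27 cells (3x3x3 neighborhood) that can contain candidate
    # group members for an individual in the given cell.
    bx, by, bz = key
    return [(bx + dx, by + dy, bz + dz)
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            for dz in (-1, 0, 1)]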
In a program that generates random groups of students, I give the user the option to force specific students to be grouped together and also block students from being paired. I have tried for two days to make my own algorithm for accomplishing this, but I get lost in all of the recursion. I'm creating the program in Lua, but I'll be able to comprehend any sort of pseudo code. Here's an example of how the students are sorted:
students = {
Student1 = {name="Student1", force={"Student2"}, deny={}};
Student2 = {name="Student2", force={"Student1","Student3"}, deny={}};
Student3 = {name="Student3", force={"Student2"}, deny={}};
}-- A name property is stored as well, in case the student table needs to be accessed by students[num] to retrieve a name
I then create temporary tables:
forced = {}--Every student who has at least 1 entry in their force table is placed here, even if they have 1 or more in the deny table
denied = {}--Every student with 1 entry for the deny table and none in the force table is placed here
leftovers = {}--Every student that doesn't have any entries in the force nor deny tables is placed here
unsortable = {}--None are placed here yet -- this is to store students that are unable to be placed according to the set rules (i.e. a student being forced to be paired with someone in a group that also contains a student that they can't be paired with)
SortStudentsIntoGroups()--predefined; sorts students into above groups
After every student is placed in those groups (note that they also remain in the students table), I begin by inserting the students who are forced to be paired together into groups (well, I have tried to), then insert students who have one or more entries in the deny table into groups where they can be placed, and just fill the remaining groups with the leftovers.
There are a couple of things that will be of some use:
numGroups--predefined number of groups
maxGroupSize--automatically calculated; quick reference to the largest number of students allowed in a group
groups = {}--number of entries is equivalent to numGroups(i.e. groups = {{},{},{}} in the case of three groups). This is for storing students in so that the groups may be displayed to the end user after the algorithm is complete.
sortGroups()--predefined function that sorts the groups from smallest to largest (will sort largest to smallest if supplied a true boolean as a parameter)
As I stated before, I have no clue how to set up a recursive algorithm for this. Every time I try to insert the forced students together, I end up getting the same student in multiple groups, forced pairs not being paired together, etc. Also note the formats: in each student's force/deny table, the name of the target student is given -- not a direct reference to the student. What sort of algorithm should I use (if one exists for this case)? Any help is greatly appreciated.
Seems to me like you are facing an NP-Hard Problem here.
This is equivalent to the graph-coloring problem with k colors, where the edges come from the denial lists.
Graph Coloring:
Given a graph G=(V,E), and an integer `k`, create coloring function f:V->{1,2,..,k} such that:
f(v) = f(u) -> (v,u) is NOT in E
The reduction from graph coloring to your problem:
Given a graph coloring problem (G,k) where G=(V,E), create an instance of your problem with:
students = V
for each student: student.deny = { student' | (student, student') is an edge in E }
#groups = k
Intuitively, each vertex is represented by a student, and a student denies exactly the students whose vertices are connected to theirs by an edge.
The number of groups is the given number of colors.
Now, given a solution to your problem, we get k groups such that if student u denies student v, they are not in the same group. But this is the same as coloring u and v with different colors, so for each edge (u,v) in the original graph, u and v get different colors.
The other way around is similar.
So we have a polynomial reduction from the graph-coloring problem; thus finding an optimal solution to your problem is NP-hard, there is no known efficient solution to it, and most believe one does not exist.
Some alternatives are heuristics such as genetic algorithms, which do not guarantee an optimal solution, or time-consuming brute-force approaches (not feasible for a large number of students).
The brute force simply generates all possible splits into k groups and checks each for feasibility; at the end, the best solution found is chosen.
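A brute-force sketch in Python for very small classes, assuming the students table from the question as a dict of name -> {force, deny} (the function and parameter names are illustrative):

from itertools import product

def brute_force_groups(students, num_groups, max_group_size):
    # Try every assignment of students to groups; return the first one
    # satisfying all force/deny rules, or None. Runs in O(k^n) time, so
    # it is only feasible for a handful of students.
    names = list(students)
    for assignment in product(range(num_groups), repeat=len(names)):
        group_of = dict(zip(names, assignment))
        if any(assignment.count(g) > max_group_size for g in range(num_groups)):
            continue
        if all(group_of[s] == group_of[p]
               for s in names for p in students[s]["force"]) and \
           all(group_of[s] != group_of[p]
               for s in names for p in students[s]["deny"]):
            return group_of
    return None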
I'm designing a piece of a game where the AI needs to determine which combination of armor will give the best overall stat bonus to the character. Each character will have about 10 stats, of which only 3-4 are important, and of those important ones, a few will be more important than the others.
Armor also gives a boost to one or more stats. For example, a shirt might give +4 to the character's Int and +2 Stamina, while a pair of pants may have +7 Strength and nothing else.
So let's say that a character has a healthy choice of armor to use (5 pairs of pants, 5 pairs of gloves, etc.) We've designated that Int and Perception are the most important stats for this character. How could I write an algorithm that would determine which combination of armor and items would result in the highest of any given stat (say in this example Int and Perception)?
Targeting one statistic
This is pretty straightforward. First, a few assumptions:
You didn't mention this, but presumably one can only wear at most one kind of armor for a particular slot. That is, you can't wear two pairs of pants, or two shirts.
Presumably, also, the choice of one piece of gear does not affect or conflict with others (other than the constraint of not having more than one piece of clothing in the same slot). That is, if you wear pants, this in no way precludes you from wearing a shirt. But notice, more subtly, that we're assuming you don't get some sort of synergy effect from wearing two related items.
Suppose that you want to target statistic X. Then the algorithm is as follows:
Group all the items by slot.
Within each group, sort the potential items in that group by how much they boost X, in descending order.
Pick the first item in each group and wear it.
The set of items chosen is the optimal loadout.
Proof: the only way to get a higher X stat would be if there were an item A that provided more X than the item chosen from its group. But we sorted the items in each group by X in descending order, so no such A can exist.
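A minimal Python sketch of this (a single max per slot replaces the sort-then-take-first step; the item format is my assumption):

def best_loadout(items, stat):
    # Keep, per slot, the item with the highest bonus to `stat`.
    # items: iterable of dicts like {"slot": "pants", "stats": {"str": 7}}.
    best = {}
    for item in items:
        slot, bonus = item["slot"], item["stats"].get(stat, 0)
        if slot not in best or bonus > best[slot]["stats"].get(stat, 0):
            best[slot] = item
    return list(best.values())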
What happens if the assumptions are violated?
If assumption one isn't true -- that is, you can wear multiple items in each slot -- then instead of picking the first item from each group, pick the first Q(s) items from each group, where Q(s) is the number of items that can go in slot s.
If assumption two isn't true -- that is, items do affect each other -- then we don't have enough information to solve the problem. We'd need to know specifically how items can affect each other, or else be forced to try every possible combination of items through brute force and see which ones have the best overall results.
Targeting N statistics
If you want to target multiple stats at once, you need a way to tell "how good" something is. This is called a fitness function. You'll need to decide how important the N statistics are, relative to each other. For example, you might decide that every +1 to Perception is worth 10 points, while every +1 to Intelligence is only worth 6 points. You now have a way to evaluate the "goodness" of items relative to each other.
Once you have that, instead of optimizing for X, you instead optimize for F, the fitness function. The process is then the same as the above for one statistic.
If there is no restriction on the number of items per category, the following will work for multiple statistics and multiple items.
Data preparation:
Give each statistic (Int, Perception) a weight, according to how important you determine it is
Store this as a 1-D array statImportance
Give each item-statistic combination a value, according to how much said item boosts said statistic for the player
Store this as a 2-D array itemStatBoost
Algorithm:
In Python (assuming statistics and items are the lists of stat names and items, and statImportance and itemStatBoost are prepared as above, with dicts standing in for the arrays). itemScore maps each item to a numeric score, initialised to 0, and is sorted by score (not by key) at the end.
# Score each item and rank them
itemScore = {I: 0 for I in items}
for S in statistics:
    for I in items:
        itemScore[I] += statImportance[S] * itemStatBoost[I][S]

# Decide which items to use: take the highest-scoring ones
maxEquippableItems = 10  # use the appropriate value
ranked = sorted(itemScore, key=itemScore.get, reverse=True)
selectedItems = ranked[:maxEquippableItems]