I have been working on this problem (https://github.com/alexpchung/File-Distribution-Planning/blob/master/README.pdf) where I need to find an optimal solution to place the files in the node.
Here is my algorithm which I have used so far
Say number of nodes is N.
keep track of available file size for every node iterate through
every file, it has N choices to go to (assuming file fits in etc)
Recursively evaluate for every
Another solution which I have thought is to iterate through each and every node and do a knapsack 0/1. Unfortunately, i got struck because since the node sizes are not fixed it will be an incorrect solution.
If you have any pointers that would be great.
Thanks.
Maybe you can benchmark this:
Sort two lists.(capacity,size, all increasing)
Start from biggest file.
Also start from biggest node.
Check if it fits
true: put it in
false: put it to "failed" list since no bigger node exists.
If selected(biggest) node is full, iterate next smaller node
İterate to next smaller file.
go back to checking step until either one of conditions true
all files assigned, empty nodes exist
all nodes full, unplaced files exist
(*)sort only nodes on their empty spaces(empty nodes exist=true or both true)
duplicate node list with opposite order
check if "latest added file" on least "empty space"d can fit biggest "empty space"d node and if transition yields equal/balanced empty space on both
true: send file to that node
false: iterate on next "least empty space" node since that file can't fit others neighter
iterate both lists(and remove refined pairs from lists)
if at least 1 files could be refined, go to (*)
Related
Disclaimer: I really believe that this is not a duplicate of similar questions. I've read those, and they (mostly) recommend using a heap or a priority queue. My question is more of the "I don't understand how those would work in this case" kind.
In short:
I'm referring to the typical A* (A-star) pathfinding algorithm, as described (for example) on Wikipedia:
https://en.wikipedia.org/wiki/A*_search_algorithm
More specifically, I'm wondering about what's the best data structure (which can be a single well known data structure, or a combination of those) to use so that you never have O(n) performance on any of the operations that the algorithm requires to do on the open list.
As far as I understand (mostly from the Wikipedia article), the operations needed to be done on the open list are as follows:
The elements in this list need to be Node instances with the following properties:
position (or coordinates). For the sake of argument, let's say this is a positive integer ranging in value from 0 to 64516 (I'm limiting my A* area size to 254x254, which means that any set of coordinates can be bit-encoded on 16 bits)
F score. This is positive floating point value.
Given these, the operations are:
Add a node to the open list: if a node with the same position (coordinates) exists (but, potentially, with a different F score), replace it.
Retrieve (and remove) from the open list the node with the lowest F score
(Check if exists and) retrieve from the list a node for a given position (coordinates)
As far as I can see, the problem with using a Heap or Priority Queue for the open list are:
These data structure will use the F-score as sorting criteria
As such, adding a node to this kind of data structure is problematic: how do you check optimally that a node with a similar set of coordinates (but a different F Score) doesn't already exist. Furthermore, even if you somehow are able to do this check, if you actually find such a node, but it is not on the top of the Heap/Queue, how to you optimally remove it such that the Heap/Queue keeps its correct order
Also, checking for existence and removing a node based on its position is not optimal or even possible: if we use a Priority Queue, we have to check every node in it, and remove the corresponding one if found. For a heap, if such a removal is necessary, I imagine that all remaining elements need to be extracted and re-inserted, so that the heap still remains a heap.
The only remaining operating where such a data structure would be good is when we want to remove the node with the lowest F-score. In this case the operation would be O(Log(n)).
Also, if we make a custom data structure, such as one that uses a Hashtable (with position as key) and a Priority Queue, we would still have some operations that require suboptimal processing on either of these: In order to keep them in sync (both should have the same nodes in them), for a given operation, that operation will always be subomtimal on one of the data structures: adding or removing a node by position would be fast on the Hashtable but slow on the Priority Queue. Removing the node with the lowest F score would be fast on the Priority Queue but slow on the Hashtable.
What I've done is make a custom Hashtable for the nodes that uses their position as key, that also keeps track of the current node with the lowest F score. When adding a new node, it checks if its F score is lower than the currently stored lowest F score node, and if so, it replaces it. The problem with this data structure comes when you want to remove a node (whether by position or the lowest F scored one). In this case, in order to update the field holding the current lowest F score node, I need to iterate through all the remaining node in order to find which one has the lowest F score now.
So my question is: is there a better way to store these ?
You can combine the hash table and the heap without slow operations showing up.
Have the hash table map position to index in the heap instead of node.
Any update to the heap can sync itself (which requires the heap to know about the hash table, so this is invasive and not just a wrapper around two off-the-shelf implementations) to the hash table with as many updates (each O(1), obviously) as the number of items that move in the heap, of course only log n items can move for an insertion, remove-min or update-key. The hash table finds the node (in the heap) to update the key of for the parent-updating/G-changing step of A* so that's fast too.
In an interview I was asked to detect the loop node in the linked list and count the number of nodes in the loop. As i was unaware of the flyod algorithm, I tried to figure out my own approach.
The idea is in this scenario, the address of two nodes will be pointing to the same node(loop node).
Eg.
1-->2-->4-->5-->7-->3-->4
Here 2->next and 3->next is same, which is the address of 4. Which means there is a loop in the linked list and 4 is the loop node. And traversing from 4 to 4 will give the number of nodes in loop.
Is there a way we can go ahead with this approach ????
Of course you can trivially find a cycle by maintaining a set of addresses already visited. Since you are interested in loop size, you can make this a map instead: address->count. The count is how many nodes have been visited upon visiting the one at the corresponding address. When you discover a node already in the table, you can get the loop size by subtracting the count in the table from the current one. Of course you can get the same effect by maintaining a "visited" bit and count in the nodes themselves. Then no separate map is needed.
So i have been asked in my homework to merge-sort 2 different sorted circular linked lists without useing a sentinal node allso the lists can be empty, my questsion is what is a sentinal node in the first place?
a sentinal node is a node that contains no real data - it's just there for the convenience of the implementation.
Thus a list with 4 real elements might have one or more extra nodes, making a total of 5 or 6 nodes.
Those extra nodes might be place holders (e.g. marking where you started the merge), pseudo-nodes indicating the head of the list, or anything else the algorithm designer can think up.
A sentinel node is a node that you add to your code to avoid handling degeneracies with special code. For merge sort for example, you can add a node with value = INFINITY to the end of both lists that you want to merge, this guarantees that once you hit the end of a list you can't go beyond that because the value is always greater (or equal) to the values in the other list.
So if you are not using a sentinel, you have to write code to handle this. In your merge routine, you should check that you've reached the end..
Sentinel node is a traversal path terminator in linked lists and tree. It doesn't hold or reference any data managed by the data structure. One of the benefit is to Reduce algorithmic complexity and code size.In your case the complexity,code size will be increased and speed of operation will be decreased.
Apparently you could do either, but the former is more common.
Why would you choose the latter and how does it work?
I read this: http://www.drdobbs.com/cpp/stls-red-black-trees/184410531; which made me think that they did it. It says:
insert_always is a status variable that tells rb_tree whether multiple instances of the same key value are allowed. This variable is set by the constructor and is used by the STL to distinguish between set and multiset and between map and multimap. set and map can only have one occurrence of a particular key, whereas multiset and multimap can have multiple occurrences.
Although now i think it doesnt necessarily mean that. They might still be using containers.
I'm thinking all the nodes with the same key would have to be in a row, because you either have to store all nodes with the same key on the right side or the left side. So if you store equal nodes to the right and insert 1000 1s and one 2, you'd basically have a linked list, which would ruin the properties of the red black tree.
Is the reason why i can't find much on it that it's just a bad idea?
down side of store as multiple nodes:
expands tree size, which make search slower.
if you want to retrieve all values for key K, you need M*log(N) time, where N is number of total nodes, M is number of values for key K, unless you introduce extra code (which complicates the data structure) to implement linked list for these values. (if storing collection, time complexity only take log(N), and it's simple to implement)
more costly to delete. with multi-node method, you'll need to remove node on every delete, but with collection-storage, you only need to remove node K when the last value of key K is deleted.
Can't think of any good side of multi-node method.
Binary Search trees by definition cannot contain duplicates. If you use them to produce a sorted list throwing out the duplicates would produce an incorrect result.
I am working on an implementation of Red Black trees in PHP when I ran into the duplicate issue. We are going to use the tree for sorting and searching.
I am considering adding an occurrence value to the node data type. When a duplicate is encountered just increment occurrence. When walking the tree to produce output just repeat the value by the number of occurrences. I think I would still have a valid BST and avoid having a whole chain of duplicate values which preserve the optimal search time.
I'm trying to find the width of a directed acyclic graph... as represented by an arbitrarily ordered list of nodes, without even an adjacency list.
The graph/list is for a parallel GNU Make-like workflow manager that uses files as its criteria for execution order. Each node has a list of source files and target files. We have a hash table in place so that, given a file name, the node which produces it can be determined. In this way, we can figure out a node's parents by examining the nodes which generate each of its source files using this table.
That is the ONLY ability I have at this point, without changing the code severely. The code has been in public use for a while, and the last thing we want to do is to change the structure significantly and have a bad release. And no, we don't have time to test rigorously (I am in an academic environment). Ideally we're hoping we can do this without doing anything more dangerous than adding fields to the node.
I'll be posting a community-wiki answer outlining my current approach and its flaws. If anyone wants to edit that, or use it as a starting point, feel free. If there's anything I can do to clarify things, I can answer questions or post code if needed.
Thanks!
EDIT: For anyone who cares, this will be in C. Yes, I know my pseudocode is in some horribly botched Python look-alike. I'm sort of hoping the language doesn't really matter.
I think the "width" you're considering here isn't really what you want - the width depends on how you assign levels to each node where you have some choice. You noticed this when you were deciding whether to assign all sources to level 0 or all sinks to the max level.
Instead, you just want to count the number of nodes and divide by the "critical path length", which is the longest path in the dag. This gives the average parallelism for the graph. It depends only on the graph itself, and it still gives you an indication of how wide the graph is.
To compute the critical path length, just do what you're doing - the critical path length is the maximum level you end up assigning.
In my opinion when you're doing this type of last minute development, its best to keep the new structures separate from the ones you are already using. At this point, if I were pressed by time I would go for a simpler solution.
Create an adjacency matrix for the graph using the parent data (should be easy)
Perform a topological sort using this matrix. (or even use tsort if pressed for time)
Now that you have a topological sort, create an array level, one element for each node.
For each node:
If the node has no parents set its level to 0
Otherwise set it to the minimum of level its parents + 1.
Find the maximum level width.
The question is as Keith Randall asked, is this the right measurement you need?
Here's what I (Platinum Azure, the original author) have so far.
Preparations/augmentations:
Add "children" field to linked list ("DAG") node
Add "level" field to "DAG" node
Add "children_left" field to "DAG" node. This is used to make sure that all children are examined before a parent is examined (in a later stage of the algorithm).
Algorithm:
Find the number of immediate children for all nodes; also, determine leaves by adding nodes with children==0 to list.
for l in L:
l.children = 0
for l in L:
l.level = 0
for p in l.parents:
++p.children
Leaves = []
for l in L:
l.children_left = l.children
if l.children == 0:
Leaves.append(l)
Assign every node a "reverse depth" level. Normally by depth, I mean topologically sort and assign depth=0 to nodes with no parents. However, I'm thinking I need to reverse this, with depth=0 corresponding to leaves. Also, we want to make sure that no node is added to the queue without all its children "looking at it" first (to determine its proper "depth level").
max_level = 0
while !Leaves.empty():
l = Leaves.pop()
for p in l.parents:
--p.children_left
if p.children_left == 0:
/* we only want to append parents with for sure correct level */
Leaves.append(p)
p.level = Max(p.level, l.level + 1)
if p.level > max_level:
max_level = p.level
Now that every node has a level, simply create an array and then go through the list once more to count the number of nodes in each level.
level_count = new int[max_level+1]
for l in L:
++level_count[l.level]
width = Max(level_count)
So that's what I'm thinking so far. Is there a way to improve on it? It's linear time all the way, but it's got like five or six linear scans and there will probably be a lot of cache misses and the like. I have to wonder if there isn't a way to exploit some locality with a better data structure-- without actually changing the underlying code beyond node augmentation.
Any thoughts?