How should I approach the following problem?
Given N file paths as input (each with maximum length M), located in K different folders (nested at any level), find L (where L <= K) tree root folders that contain equal (or as nearly equal as possible) numbers of underlying files. Input paths are absolute, e.g.
/folderA/file1
/folderA/folderB/file2
/folderA/folderB/file3
/folderA/folderB/folderC/file4
/folderD/file5
Here are a couple of pointers to get you started:
Since you need to group folders hierarchically, a tree is a suitable data structure to use
Each file path can be added to the tree (so each file becomes a leaf in the tree)
The number of files in a folder equals the number of leaf nodes in that folder's subtree
Using these ideas, it should be easy for you to come up with a solution.
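Following these hints, here is a minimal Python sketch (the class and helper names are my own, not from the original) that builds the tree from the example paths and counts the files under each folder:

```python
class Node:
    """One filesystem entry; a node with no children is a file (leaf)."""
    def __init__(self):
        self.children = {}
        self.file_count = 0

def add_path(root, path):
    """Insert one absolute path, creating intermediate folder nodes."""
    node = root
    for part in [p for p in path.split('/') if p]:
        node = node.children.setdefault(part, Node())

def count_files(node):
    """Fill file_count with the number of leaf (file) nodes per subtree."""
    if not node.children:
        node.file_count = 1  # a leaf is a file
    else:
        node.file_count = sum(count_files(c) for c in node.children.values())
    return node.file_count

paths = ["/folderA/file1", "/folderA/folderB/file2",
         "/folderA/folderB/file3", "/folderA/folderB/folderC/file4",
         "/folderD/file5"]
root = Node()
for p in paths:
    add_path(root, p)
count_files(root)
# root.children["folderA"].file_count == 4, "folderD" == 1
```

Selecting the L best root folders from these counts is the actual problem to solve; the sketch only covers the hinted setup.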
Related
Question:
You are given a tree with n nodes (n can be up to 10^5) and n-1 bidirectional edges. Let's say each node contains two values:
Its index (just a unique label for the node), say from 1 to n.
Its value Vi, which can vary from 1 to 10^8.
Now there will be multiple queries of the same type (up to 10^5 of them) on this same tree, as follows:
You are given node1, node2 and a value P (which can vary from 1 to 10^8).
For each such query, you have to find the number of nodes on the path from node1 to node2 whose value is less than P.
NOTE: There is a unique path between every pair of nodes, and no two edges connect the same pair of nodes.
Required time complexity: O(n log n), or in other terms, but it should run in 1 second under the given constraints.
What I have Tried:
(A) I could solve it easily if the value of P were fixed, using an LCA approach in O(n log n), by storing the following at each node:
the number of nodes whose value is less than P on the path from the root to that node.
But here P varies far too much, so this does not help.
(B) The other approach I considered is a plain DFS per query. But that takes O(nq), where q is the number of queries. Since n and q can both be up to 10^5, this does not fit the time constraint either.
I could not think anything else. Any help would be appreciated. :)
Source:
I read this problem somewhere, on SPOJ I think, but cannot find it now. I tried searching the web but could not find a solution anywhere (Codeforces, CodeChef, SPOJ, Stack Overflow).
Let ans(v, P) be the answer on a vertical path from the root to v and the given value of P.
How can we compute it? There's a simple offline solution: store all queries for a given node in a vector associated with it, then run a depth-first search, keeping all values on the current root-to-node path in a data structure that can do the following:
add a value
delete a value
count the number of elements smaller than X
Any balanced binary search tree would do. You can make it even simpler: since you know all the queries beforehand, you can compress the values so that they're in the [0..n - 1] range and use a binary indexed tree.
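As a sketch of that structure in Python (assuming, as the answer suggests, that all values are collected up front for compression; names are illustrative):

```python
import bisect

class Fenwick:
    """Multiset over compressed indices 0..n-1, stored in a binary
    indexed tree; supports add, remove, and count-below in O(log n)."""
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)

    def _update(self, i, delta):
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def add(self, i):         # insert one occurrence of index i
        self._update(i, 1)

    def remove(self, i):      # delete one occurrence of index i
        self._update(i, -1)

    def count_below(self, i):  # how many stored indices are < i
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s

# Coordinate compression: map arbitrary values (up to 1e8) to 0..n-1.
values = [7, 100_000_000, 42, 7, 5]
sorted_unique = sorted(set(values))
compress = lambda v: bisect.bisect_left(sorted_unique, v)

bit = Fenwick(len(sorted_unique))
for v in values:
    bit.add(compress(v))
less_than_42 = bit.count_below(compress(42))  # == 3  (values 5, 7, 7)
```

During the DFS, a node's value is added on entry and removed on exit, so the multiset always holds exactly the current root-to-node path.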
Back to the original problem: the answer to a (u, v, P) query is ans(v, P) + ans(u, P) - 2 * ans(LCA(u, v), P), plus 1 if the value of LCA(u, v) is itself less than P: the subtraction removes the LCA's own contribution twice, so it has to be added back.
That's it. The time complexity is O((N + Q) log N).
I have been working on this problem (https://github.com/alexpchung/File-Distribution-Planning/blob/master/README.pdf) where I need to find an optimal solution for placing the files across the nodes.
Here is the algorithm I have used so far. Say the number of nodes is N:
keep track of the available space on every node;
iterate through every file; it has N candidate nodes to go to (assuming the file fits, etc.);
recursively evaluate every such placement.
Another solution I thought of is to iterate through each and every node and run a 0/1 knapsack on it. Unfortunately, I got stuck, because the node sizes are not fixed, so that would give an incorrect solution.
If you have any pointers that would be great.
Thanks.
Maybe you can benchmark this:
Sort both lists (node capacities and file sizes, both increasing).
Start from the biggest file.
Also start from the biggest node.
Check if the file fits:
true: put it in.
false: put it on a "failed" list, since no bigger node exists.
If the selected (biggest) node is full, move on to the next smaller node.
Move on to the next smaller file.
Go back to the checking step until either of these conditions is true:
all files are assigned, and empty nodes remain;
all nodes are full, and unplaced files remain.
(*) Sort only the nodes, by their remaining empty space (when empty nodes remain, or both conditions are true).
Duplicate the node list in the opposite order.
Check whether the latest file added to the node with the least empty space would fit into the node with the biggest empty space, and whether moving it would leave both with equally balanced empty space:
true: send the file to that node.
false: move on to the node with the next-least empty space, since that file won't fit the others either.
Iterate both lists (and remove refined pairs from them).
If at least one file could be refined, go back to (*).
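The first pass (before the refinement loop) might be sketched like this in Python. Note this is one reading of the steps above: "start from the biggest node, move to the next when full" is collapsed into "offer each file to the node with the most remaining free space", and all names are my own:

```python
def greedy_place(files, capacities):
    """First pass of the heuristic: each file, biggest first, is offered
    to the node with the most remaining free space; if even that node
    cannot hold it, no node can, so the file goes to the failed list."""
    free = list(capacities)
    placement = {i: [] for i in range(len(capacities))}
    failed = []
    for f in sorted(files, reverse=True):      # biggest file first
        i = max(range(len(free)), key=lambda j: free[j])  # most space left
        if free[i] >= f:
            free[i] -= f
            placement[i].append(f)
        else:
            failed.append(f)                   # no node has room for it
    return placement, failed, free

placement, failed, free = greedy_place([9, 7, 4, 3], [10, 8])
# placement == {0: [9], 1: [7]}, failed == [4, 3], free == [1, 1]
```

The refinement pass would then try to move files between nodes to balance the remaining free space, as described above.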
I have the list of numbers 50, 40, 60, 30, 70. Let's assume I insert these into an empty 2-3-4 tree. Which of these numbers becomes the root of the tree, and why? Is it the insertion order, or how big the number is? I would like to be able to draw a 2-3-4 tree when I'm given a list of numbers, but I can't, because I don't know which number to start with as the root. Simply put: what determines the root of this tree?
In a balanced tree data structure, the root element will usually contain a value close to the median of the items that have been added to it. However, because the tree will usually not be perfectly balanced, you may not have the exact median in the root. The exact structure of the tree may be dependent on the order the values were added to it.
In your question, you mention adding five items to a 2-3-4 tree. That will always end up with a two-level tree structure, but the exact structure will vary depending on the order the elements are added. If you add them in the order they're listed in the question, you'll get:
        <50>
       /    \
<30,40>      <60,70>
But if you added the elements in another order, you could have 40 or 60 in the root and 50 in one of the leaf nodes.
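To see concretely how insertion order decides the root, here is a compact top-down 2-3-4 insertion sketch in Python (every 4-node is split on the way down; the class and function names are mine):

```python
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []          # 1 to 3 keys per node
        self.children = children or []  # empty for leaves

def split_child(parent, i):
    """Split the 4-node parent.children[i], pushing its middle key up."""
    child = parent.children[i]
    left = Node(child.keys[:1], child.children[:2])
    right = Node(child.keys[2:], child.children[2:])
    parent.keys.insert(i, child.keys[1])
    parent.children[i:i + 1] = [left, right]

def insert(root, key):
    """Top-down insertion: split every full (4-)node before descending."""
    if len(root.keys) == 3:             # full root: tree grows one level
        root = Node(children=[root])
        split_child(root, 0)
    node = root
    while node.children:
        i = sum(k < key for k in node.keys)
        if len(node.children[i].keys) == 3:
            split_child(node, i)
            i = sum(k < key for k in node.keys)
        node = node.children[i]
    node.keys.insert(sum(k < key for k in node.keys), key)
    return root

root = Node()
for v in [50, 40, 60, 30, 70]:
    root = insert(root, v)
# root.keys == [50]; the leaves are [30, 40] and [60, 70]

root2 = Node()
for v in [30, 40, 50, 60, 70]:
    root2 = insert(root2, v)
# a different insertion order: root2.keys == [40]
```

The same five numbers produce different roots purely because of insertion order, which is the point of the answer above.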
The task is to build a tree from a list of sorted paths. Each node is a filesystem object (file or folder).
Currently I'm using this one (pseudo code):
foreach (string path in pathList)
{
    INode currentNode = rootNode;
    StringCollection pathTokens = path.split(pathSplitter);
    foreach (pathToken in pathTokens)
    {
        if (currentNode.Children.contains(pathToken))
        {
            // descend into the existing child
            currentNode = currentNode.Children.find(pathToken);
        }
        else
        {
            // create the missing child and descend into it
            currentNode = currentNode.Children.Add(pathToken);
        }
    }
}
pathSplitter is a \ for win and / for *nix.
Is there a more efficient way to solve that task?
The key quality of your input data is that the list of paths is sorted. Hence you can exploit the common prefixes between the current and previous paths quite efficiently. What you can do is maintain the last trace through the tree data structure, from its root to the leaf folder node. Then, for the current path, you just walk along the previous trace (i.e. process the current path relative to the last path) instead of finding the right position in the tree again and again.
When comparing the last and current path, three cases may happen:
1) Same paths
\path\to\folder\file1.txt
\path\to\folder\file2.txt
The trace remains, node for file2.txt is added.
2) New path is a subpath
\path\to\folder\file1.txt
\path\to\folder\subfolder\file2.txt
Nodes for subfolder and file2.txt are added.
3) New path is different
\path\to\folder\file1.txt
\path\to\another_folder\subfolder\file2.txt
First you need to back-track the trace so it represents \path\to\. Then nodes for another_folder, subfolder and file2.txt are added. (Note that the another_folder\subfolder\ portion may also be absent entirely; I hope that's clear.)
Depending on the overall characteristics and volume of the data, such an algorithm may perform faster. You could play with formal big-O estimates, but I think it would be quicker just to test it.
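The three cases translate to Python roughly like this, using nested dicts as nodes (the function and variable names are mine, not from the original):

```python
def build_tree(sorted_paths, sep='/'):
    """Build a nested-dict tree from sorted paths, reusing the trace
    (the folder nodes of the previous path) instead of re-descending
    from the root for every path."""
    root = {}
    trace = []  # (token, node) for each folder on the previous path
    for path in sorted_paths:
        tokens = [t for t in path.split(sep) if t]
        # keep the common folder prefix shared with the previous trace
        common = 0
        while (common < len(trace) and common < len(tokens) - 1
               and trace[common][0] == tokens[common]):
            common += 1
        del trace[common:]           # case 3: back-track the stale tail
        node = trace[-1][1] if trace else root
        # case 2: add nodes for the new folders, extending the trace
        for tok in tokens[common:-1]:
            node = node.setdefault(tok, {})
            trace.append((tok, node))
        node[tokens[-1]] = {}        # case 1: the file becomes a leaf
    return root

tree = build_tree(['/a/b/f1', '/a/b/f2', '/a/b/sub/f3', '/a/c/f4'])
# tree == {'a': {'b': {'f1': {}, 'f2': {}, 'sub': {'f3': {}}},
#                'c': {'f4': {}}}}
```

Since the input is sorted, each path only pays for the part of the trace it actually changes.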
The algorithm seems optimal to me; if I am not mistaken, the sorting of the paths implies that the nodes will be generated in a depth-first sequence with respect to the tree they come from. This means that no unnecessary backtracking in the graph is performed. Furthermore, the algorithm is linear in the number of paths in the input, and every path is processed in time linear in its length, so the overall running time is linear in the size of the input. Complexity-wise the algorithm is therefore optimal, since it is impossible even to read all the paths with lower time complexity.
We have a project in our Data Structures course and I am stuck with one of the problems.
I could not find any suitable solution to this problem on the web, due to the special complexity limitations we were given.
The Problem:
Let there be two linked lists which intersect after m and n nodes respectively (and continue together). The first list has m nodes before the common node, and the second one has n nodes up to the common node
(m and n are not known).
There are two pointers, L1 and L2, to the first link of each list.
There is NO pointer to the end of either list.
The problem is to find the common node within a limit of O(m+n) time [we can't just run to the end of each list...] and O(1) additional memory [no option of changing/adding data in each link].
The two lists only have pointers pointing forwards (singly linked lists).
The list pointers can be changed, but the order of the original lists must be restored afterwards
[although a solution that ruins the lists is still better than nothing].
I am after days of drawing lists and nodes.... losing my mind here :)
Thanks a lot,
Barak.
You already know the number of elements before the common node in each list.
... the first list has m nodes before the common node ...
Just skip that number of nodes in the corresponding list to reach the common intersecting node.
I am not sure you are asking the right question here. Kindly update if there is a change in your problem statement.
->Update:
Iterate each list to find its length.
length(List1) = x
length(List2) = y
Assume x > y (otherwise swap the roles of the lists).
Skip (x - y) nodes on List1.
Traverse List1 and List2 simultaneously and compare the nodes of both the lists. When you find the nodes on both lists to be equal, that will be your point of intersection.
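The updated steps translate to Python roughly like this (the node class and names are mine). Note that nodes are compared by identity, not value, and that only a few pointers are used, so the O(1) memory limit is respected:

```python
class ListNode:
    def __init__(self, val, nxt=None):
        self.val = val
        self.next = nxt

def length(head):
    """Count the nodes in a list (runs to the end once)."""
    n = 0
    while head:
        n += 1
        head = head.next
    return n

def find_intersection(l1, l2):
    """Advance the longer list by the length difference, then walk both
    lists in lockstep; the first identical node is the intersection."""
    x, y = length(l1), length(l2)
    if x < y:
        l1, l2, x, y = l2, l1, y, x
    for _ in range(x - y):
        l1 = l1.next
    while l1 is not l2:          # also terminates (at None) if disjoint
        l1 = l1.next
        l2 = l2.next
    return l1

# Build two lists sharing a tail: 1 -> 2 -> C -> D and 9 -> C -> D
tail = ListNode('C', ListNode('D'))
a = ListNode(1, ListNode(2, tail))
b = ListNode(9, tail)
# find_intersection(a, b) is the node 'C'
```

This runs in O(m + n + common tail) time, never modifies the lists, and returns None when the lists do not intersect.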