Suppose I have the following grid and have to connect pairs of letters. Matching letters must be connected, and the connecting paths must not cross each other. Is there an algorithm that can tell me whether all the pairs can be connected without crossing paths, and that finds the shortest paths?
I realize that this is a graph problem and that the shortest-path part could be solved using BFS. What I am not sure about is how to handle the crossing paths.
+---+---+---+---+---+---+---+---+
| A | | | B | | | | |
+-------------------------------+
| | | | | | | | |
+-------------------------------+
| | | B | | | | D | |
+-------------------------------+
| | | | | | | | |
+-------------------------------+
| | C | | | C | | | |
+-------------------------------+
| | | | A | | | | |
+-------------------------------+
| | | | | | | D | |
+-------------------------------+
| | | | | | | | |
+---+---+---+---+---+---+---+---+
This is an NP-complete problem known as "Disjoint Connecting Paths". Exact algorithms take super-polynomial time in the worst case (really slow on large inputs), while approximation algorithms run faster but may fail to connect every pair or may return non-optimal paths.
An Approximation Algorithm for the Disjoint Paths Problem in Even-Degree Planar Graphs - Jon Kleinberg
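For small grids, an exhaustive search is often still practical. The following is a minimal backtracking sketch, not an approximation algorithm from the paper above: it only decides whether vertex-disjoint paths exist and returns one set of them, without trying to minimise their length. The function name `connect_pairs` and the `pairs` format (label mapped to two `(row, col)` cells) are assumptions made for illustration, and the running time is exponential in the worst case.

```python
def connect_pairs(rows, cols, pairs):
    """Return {label: path} with non-crossing paths, or None if impossible.

    pairs maps a label to its two endpoint cells, e.g. {"A": ((0, 0), (5, 3))}.
    """
    labels = list(pairs)
    # Every endpoint is off-limits to the paths of all other pairs.
    used = {cell for ends in pairs.values() for cell in ends}
    paths = {}

    def neighbours(cell):
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                yield (nr, nc)

    def extend(i, path):
        """Grow the path of pair i by one cell, then recurse."""
        goal = pairs[labels[i]][1]
        for nxt in neighbours(path[-1]):
            if nxt == goal:
                paths[labels[i]] = path + [nxt]
                if i + 1 == len(labels):
                    return True  # every pair is connected
                if extend(i + 1, [pairs[labels[i + 1]][0]]):
                    return True
                del paths[labels[i]]  # this route blocks a later pair
            elif nxt not in used:
                used.add(nxt)
                if extend(i, path + [nxt]):
                    return True
                used.discard(nxt)  # backtrack
        return False

    if not labels:
        return {}
    return paths if extend(0, [pairs[labels[0]][0]]) else None
```

Calling `connect_pairs(3, 3, {"A": ((0, 0), (0, 2)), "B": ((2, 0), (2, 2))})` returns a path per letter, while a layout in which one pair necessarily cuts off another returns `None`.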
Do we need to think about the underlying cluster while designing NiFi templates?
Here is my simple flow:
+-----------------+ +---------------+ +-----------------+
| | | | | |
| READ FROM | | MERGE | | PUT HDFS |
| KAFKA | | FILES | | |
| +-----------------------> | +---------------------> | |
| | | | | |
| | | | | |
| | | | | |
+-----------------+ +---------------+ +-----------------+
I have a 3-node cluster. When the system is running, I check the "cluster" menu and see that only the master node is using resources; the other cluster nodes seem idle. The question is: in such a cluster, should I design the template around the cluster, or should NiFi do the load balancing?
I saw that one of my colleagues created remote process groups for each node in the cluster and put a load balancer in front of them within the template. Is that required? (like below)
+------------------+
| | +-------------+
| REMOTE PROCESS | | input port |
+----> | GROUP FOR | | (rpg) |
| | NODE 1 | +-------------+
| | | |
| | | |
| +------------------+ v
+-----------------+ +-----------------+ RPG
| | | | | +--------------+
| READ FROM | | | | | |
| KAFKA | | LOAD BALANCER | | +------------------+ | MERGE FILES |
| +-------------> | +-------------> | | | |
| | | | | | REMOTE PROCESS | | |
| | | | | | GROUP FOR | | |
| | | | | | NODE 2 | | |
+-----------------+ +-----------------+ RPG | | +--------------+
| +------------------+ |
| |
| v
|
| +-------------------+ +---------------+
| | | | |
| | REMOTE PROCESS | | PUT HDFS |
+-----> | GROUP FOR | | |
| NODE 3 | | |
| | | |
| | | |
+-------------------+ +---------------+
And what is the use case for a load balancer other than remote clusters? Can I use a load balancer to split traffic across several processors to speed up the operation?
Apache NiFi does not do any automatic load balancing or moving of data, so it is up to you to design the data flow in a way that utilizes your cluster. How to do this will depend on the data flow and how the data is being brought into the cluster.
I wrote this article once to try and summarize the approaches:
https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html
In your case with Kafka, you should be able to have the flow run as shown in your first picture (without remote process groups). This is because Kafka is a data source that allows each node to consume different data.
If ConsumeKafka appears to be running on only one node, there could be a couple of reasons for this...
First, make sure ConsumeKafka is not scheduled for primary node only.
Second, figure out how many partitions you have for your Kafka topic. The Kafka client (used by NiFi) will assign 1 consumer to 1 partition, so if you have only 1 partition then you can only ever have 1 NiFi node consuming from it. Here is an article to further describe this behavior:
http://bryanbende.com/development/2016/09/15/apache-nifi-and-apache-kafka
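To see why the partition count limits how many nodes do any work, here is a toy model of the assignment. It is only a sketch: `assign_partitions` is a made-up helper that hands out partitions round-robin, which is a simplification of what the real Kafka client does, but it keeps the property described above that each partition goes to exactly one consumer.

```python
def assign_partitions(num_partitions, nodes):
    """Toy model: give each partition to exactly one consuming node."""
    assignment = {node: [] for node in nodes}
    for partition in range(num_partitions):
        # Round-robin is an assumption; the real assignment strategy may differ.
        assignment[nodes[partition % len(nodes)]].append(partition)
    return assignment


nodes = ["node1", "node2", "node3"]
print(assign_partitions(1, nodes))  # only node1 receives a partition
print(assign_partitions(3, nodes))  # every node receives one partition
```

With a single partition, two of the three nodes have nothing to consume, which matches the idle nodes seen in the cluster view.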
I have a graph (User-[Likes]->Item) with millions of nodes and billions of relationships (roughly 50G on disk), built on a powerful machine with 256G RAM and 40 cores. Currently, I'm computing allShortestPaths() between two items.
To improve the Cypher query performance, I set dbms.pagecache.memory=100g and wrapper.java.additional=-Xmx32g, hoping that the whole Neo4j database can be loaded into memory. However, when I execute the shortest-path query, the CPU usage is 1625% while the memory usage is only 5.7%, and I don't see any performance improvement on the Cypher query. Am I missing something in the settings? Or can I set something up to run the query faster? I have read the Performance Tuning guide in the developer manual but didn't find a solution.
EDIT1:
The Cypher query counts the number of unique users that like both items. The full pattern would be (Brand)-[:Has]->(Item)<-[:LIKES]-(User)-[:LIKES]->(Item)<-[:HAS]-(Brand)
profile
MATCH p = allShortestPaths((p1:Brand {FID:'001'})-[*..4]-(p2:Brand {FID:'002'}))
with [r in RELS(p)|type(r)] as relationshipPath,
[n in nodes(p)|id(n)][2] as user, p1, p2
return p1.FID, p2.FID, count(distinct user);
EDIT2:
Below is a sample query plan. It now seems that I'm not using allShortestPaths efficiently (380,556,69 db hits). I use allShortestPaths to get the common user node between the start and end nodes, and then use count(distinct) to get the unique users. Is it possible to tell Cypher to eliminate paths that contain a node that has been visited before?
Can you try to run this instead:
MATCH (p1:Brand {FID:'001'}),(p2:Brand {FID:'002'})
MATCH (u:User)
WHERE (p1)-[:Has]->()<-[:LIKES]-(u) AND
(p2)-[:Has]->()<-[:LIKES]-(u)
RETURN p1,p2,count(u);
This starts at the user and checks against both brands. The explain plan looks much better:
+----------------------+----------------+------------------------------------------+---------------------------+
| Operator | Estimated Rows | Variables | Other |
+----------------------+----------------+------------------------------------------+---------------------------+
| +ProduceResults | 0 | count(u), p1, p2 | p1, p2, count(u) |
| | +----------------+------------------------------------------+---------------------------+
| +EagerAggregation | 0 | count(u) -- p1, p2 | p1, p2 |
| | +----------------+------------------------------------------+---------------------------+
| +SemiApply | 0 | p2 -- p1, u | |
| |\ +----------------+------------------------------------------+---------------------------+
| | +Expand(Into) | 0 | anon[78] -- anon[87], anon[89], p1, u | (p1)-[:Has]->() |
| | | +----------------+------------------------------------------+---------------------------+
| | +Expand(All) | 0 | anon[87], anon[89] -- p1, u | (u)-[:LIKES]->() |
| | | +----------------+------------------------------------------+---------------------------+
| | +Argument | 1 | p1, u | |
| | +----------------+------------------------------------------+---------------------------+
| +SemiApply | 0 | p1 -- p2, u | |
| |\ +----------------+------------------------------------------+---------------------------+
| | +Expand(Into) | 0 | anon[119] -- anon[128], anon[130], p2, u | (p2)-[:Has]->() |
| | | +----------------+------------------------------------------+---------------------------+
| | +Expand(All) | 0 | anon[128], anon[130] -- p2, u | (u)-[:LIKES]->() |
| | | +----------------+------------------------------------------+---------------------------+
| | +Argument | 1 | p2, u | |
| | +----------------+------------------------------------------+---------------------------+
| +CartesianProduct | 0 | u -- p1, p2 | |
| |\ +----------------+------------------------------------------+---------------------------+
| | +CartesianProduct | 0 | p2 -- p1 | |
| | |\ +----------------+------------------------------------------+---------------------------+
| | | +Filter | 0 | p1 | p1.FID == { AUTOSTRING0} |
| | | | +----------------+------------------------------------------+---------------------------+
| | | +NodeByLabelScan | 0 | p1 | :Brand |
| | | +----------------+------------------------------------------+---------------------------+
| | +Filter | 0 | p2 | p2.FID == { AUTOSTRING1} |
| | | +----------------+------------------------------------------+---------------------------+
| | +NodeByLabelScan | 0 | p2 | :Brand |
| | +----------------+------------------------------------------+---------------------------+
| +NodeByLabelScan | 0 | u | :User |
+----------------------+----------------+------------------------------------------+---------------------------+
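The idea behind the rewritten query can be shown outside the database as a plain set computation: for every user, test whether they like at least one item of each brand. This is only an illustration in Python; the dictionaries `likes` and `brand_items` and the helper `common_users` are made up for the example and do not reflect how Neo4j executes the plan.

```python
def common_users(likes, brand_items, brand_a, brand_b):
    """Users who like at least one item of brand_a and one item of brand_b.

    likes maps user -> set of liked items, brand_items maps brand -> set of items.
    """
    items_a = brand_items[brand_a]
    items_b = brand_items[brand_b]
    return {
        user
        for user, liked in likes.items()
        if liked & items_a and liked & items_b  # one membership test per brand
    }


brand_items = {"001": {"i1", "i2"}, "002": {"i3"}}
likes = {"u1": {"i1", "i3"}, "u2": {"i2"}, "u3": {"i2", "i3"}}
print(len(common_users(likes, brand_items, "001", "002")))  # 2
```

Each user is counted once, so no `distinct` step and no path enumeration is needed.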
I have an interesting programming problem that I need to solve for an iPhone app that I am currently building. The problem is actually a logic problem that does not need to be specific to any particular programming language.
The app needs to produce a linkages map (apologies if this isn't the right terminology but it makes sense to me). You have the following data:
A=C
B=A
C=O
D=F
E=F
F=G
G=D
H=J
I=L
J=N
K=A
L=O
M=C
N=H
O=E
The letters A through O can be linked to any other letter. The app needs to follow the links to create a map, so starting with A: A links to C, C links to O, O links to E, E links to F, etc.
When complete this map would look like the attached photo.
http://i.stack.imgur.com/TEfAs.jpg
The problem I have is that I need to write code that will output any map using any combination of links. So for example another link list might look like
A=B
B=A
C=A
D=A
E=A
F=A
G=A
H=A
I=A
J=A
K=A
L=A
M=A
N=A
O=A
I can't get my head around the pseudocode / logic for drawing the app. There are always 15 letters A-O and a letter can never be linked to itself so A can never = A.
Can anyone help to come up with the logic for drawing the map?
What you want is to draw a graph. There is no canonical graphical representation of a graph, so if you have no constraints on how the graph should be drawn, you can simply lay the letters out in a row and then draw arcs between the letters according to your map.
A little like this (ASCII art):
Example
+-----------------------------------------+
+--------------------------------------+ |
+-----------------------------------+ | |
+--------------------------------+ | | |
+-----------------------------+ | | | |
+--------------------------+ | | | | |
+-----------------------+ | | | | | |
+--------------------+ | | | | | | |
+-----------------+ | | | | | | | |
+--------------+ | | | | | | | | |
+-----------+ | | | | | | | | | |
+--------+ | | | | | | | | | | |
+-----+ | | | | | | | | | | | |
| | | | | | | | | | | | | |
A B C D E F G H I J K L M N O
| |
+--+
Example
+-----------------------------+
+-----------------------------+ |
+--+ +-----------------------------------+
| | | | +--------+ | |
A B C D E F G H I J K L M N O
| | | | | | | | | | | |
+-----+ | +--+ | +-----+ | +--------+
| +-----+ | | +-----------+
| | +--+ +-----------------+
| +--------+ |
+-----------------------------+
It looks a bit confusing, but you cannot always avoid crossings. [In this example you could, but I did not try to, because crossings cannot be avoided in the general case.]
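The row-and-arcs layout can be generated mechanically. Below is a rough Python sketch that draws every arc above the letters, one text row per arc, with the shortest arcs closest to the letters. The helper names `parse_links` and `draw_arcs` and the three-character spacing are assumptions for the example, and, as in the drawings above, no attempt is made to avoid crossings.

```python
def parse_links(text):
    """Turn lines such as 'A=C' into (source, target) tuples."""
    return [tuple(line.split("=")) for line in text.split()]


def draw_arcs(letters, links, spacing=3):
    """Render the links as ASCII arcs above a row of letters."""
    col = {ch: i * spacing for i, ch in enumerate(letters)}
    # A=B and B=A describe the same arc, so store each arc once as (left, right).
    arcs = sorted(
        {tuple(sorted((col[a], col[b]))) for a, b in links},
        key=lambda arc: (arc[1] - arc[0], arc[0]),  # shortest arcs first
    )
    width = (len(letters) - 1) * spacing + 1
    rows = [[" "] * width for _ in arcs]  # rows[0] sits just above the letters
    for level, (left, right) in enumerate(arcs):
        for x in range(left, right + 1):
            rows[level][x] = "-"
        rows[level][left] = rows[level][right] = "+"
        # Drop vertical legs from this arc down through the rows below it.
        for lower in rows[:level]:
            for x in (left, right):
                if lower[x] != "+":
                    lower[x] = "|"
    lines = ["".join(row).rstrip() for row in reversed(rows)]
    lines.append("".join(ch.ljust(spacing) for ch in letters).rstrip())
    return "\n".join(lines)


print(draw_arcs("ABC", parse_links("A=B B=A C=A")))
```

For the three-letter example this prints one long arc from A to C above a short arc from A to B, followed by the letter row.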
I'm looking to modify the "Category Blog Layout" in Joomla 1.5 so that even article rows are right-aligned and odd ones are left-aligned. I'd like to do this to the article title as well. Using HTML or CSS (page class suffix in params) in the article body itself is therefore not an option, as it only affects the table HTML (contentpaneopen) generated by the following two files for each blog item's text:
\components\com_content\views\category\tmpl\blog.php
\components\com_content\views\category\tmpl\blog_item.php
I am guessing I need to overload these two files in my custom template to achieve what I want. The problem is that I don't see how to access the row number that blog_item.php is dealing with.
I have found that ContentViewCategory::getItems in \components\com_content\views\category\view.html.php has the following lines of code:
$item->odd = $k;
$item->count = $i;
But I can't figure out how to access these.
Any ideas?
PS: This is the kind of layout I want to achieve:
---------------------------------------------------
| -------------- ---------------------------- |
| | | | | |
| | | | | |
| | row 1 | | row 1 text | |
| | Image | | | |
| | | | | |
| | | | | |
| |------------| |--------------------------| |
--------------------------------------------------|
---------------------------------------------------
| |--------------------------| |--------------| |
| | | | | |
| | | | | |
| | row 2 text | | row 2 | |
| | | | Image | |
| | | | | |
| | | | | |
| |--------------------------| |--------------| |
--------------------------------------------------|
---------------------------------------------------
| -------------- ---------------------------- |
| | | | | |
| | | | | |
| | row 3 | | row 3 text | |
| | Image | | | |
| | | | | |
| | | | | |
| |------------| |--------------------------| |
--------------------------------------------------|
I hate to answer my own question, but once I was able to debug Joomla, this turned out to be a lot simpler than I thought. I overloaded these two files:
\components\com_content\views\category\tmpl\blog.php
\components\com_content\views\category\tmpl\blog_item.php
in my custom theme folder:
\templates\\html\com_content\views\category\tmpl\blog.php
\templates\\html\com_content\views\category\tmpl\blog_item.php
Following is the one liner I added to blog.php:
$this->assign('itemIndex', $i);
before calling:
echo $this->loadTemplate('item')
Now I can call $this->itemIndex in blog_item.php to get the row index and do what I want with it.
What about a more ordered one?
For example, creating thumbnails of one size on one side,
with the same introtext and title fonts and all.
So I'm stuck. I am working on a credit system with expirations. Similar to credit card miles but not exactly. By the way I am sorry for the book ahead but I needed to add enough detail to help get the whole picture.
What I need is a system where a user accumulates credits for doing activities. But they can also spend these credits on activities. The credits should expire after 30 days if they are not used. I seem to be stuck on how to accurately calculate this in a batch that will run every night. Any ideas in any language would be greatly appreciated as I seem to be stuck on just one minor detail that I can't get around. Here is an example of the data:
7/1: +5 - user signs up
7/2: +5 - user interacts with system
7/2: -3 - user purchases activity
7/3: +5 - user interacts with system
So at this point the user has received 15 credits and has spent 3, leaving him with a total of 12 credits. (At least I got basic math down :P)
I should add that currently we are playing with the idea of having two fields: last processed, next processed. So these values at this time assuming it was a new sign up are:
Last Processed Date: 7/1
Next Process Date: 8/1
So now 8/1 comes around. The batch starts and looks at all credits that are older than 30 days, which at this point is 5.
This is where it starts to get fuzzy.
Then the system should look at all the credits that have been spent in the last 30 days to see whether any of the expiring credits were used, because credits should only expire if they haven't been used. There are 3 spent credits, so I deduct 2 credits from the user: the difference between the credits earned more than 30 days ago and the credits that have been spent. I finish the batch and set the dates accordingly for the next day. Now, assuming they haven't spent any more, I start the calculation over: credits earned more than 30 days ago, which is 5, and credits spent, which again is 3. But I obviously don't want to consider the 3 credits that I already considered yesterday. What is a good approach to avoid including those 3 credits again?
That is where I am stuck.
We are thinking about writing a debit record for the expired credits so we can track them, but I am having a hard time seeing how I can use it in this calculation.
If you read this far, thank you. If you make even somewhat of an effort in your answer, I will at a minimum give you an upvote for the effort.
EDIT:
OK, @Greg mentioned something that I forgot to address: the idea of putting a flag on the credits considered. A valid point, but not one that can work because of the following scenario:
Let's say that on a particular day a user spends 10 credits, but the expiring credits that the batch is considering only add up to 5. He should then still have 5 more spent credits left over to offset a later expiration, because he spent more than a single expiration covers. So the flag wouldn't work, because we would have skipped those 5 extra credits. Hope that makes sense?
For every user of the system, keep an array that stores the amount of credits available to the user on each of the next 30 consecutive days.
For example, the data for some user might look like this:
8 |
7 | |
6 | | | |
5 | | | | | | | | | | |
4 | | | | | | | | | | | | | | | | |
3 | | | | | | | | | | | | | | | | | | | | | | | |
2 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
1 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
-------------------------------------------------------------
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
^ ^ ^
| \_ |
today tomorrow in 15 days
Every time the user earns some credits, you increase the amounts for all days by the number of credits earned. For example, if the user earns 2 credits, the table changes as follows. It's like raising the whole graph up.
10 |
9 | |
8 | | | |
7 | | | | | | | | | | |
6 | | | | | | | | | | | | | | | | |
5 | | | | | | | | | | | | | | | | | | | | | | | |
4 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
3 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
2 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
1 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
-------------------------------------------------------------
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
^ ^ ^
| \_ |
today tomorrow in 15 days
If the user has x credits today and spends y credits, you decrease the amount of credits available to him to x - y for every day on which he has more than x - y. For days on which he has no more than x - y, the amount stays the same. It's like cutting the top of the graph off. For example, if the user spends 3 credits, the graph changes to
7 | | | | | | | | | | |
6 | | | | | | | | | | | | | | | | |
5 | | | | | | | | | | | | | | | | | | | | | | | |
4 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
3 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
2 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
1 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
-------------------------------------------------------------
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
^ ^ ^
| \_ |
today tomorrow in 15 days
Every day you shift the graph to the left to model expiring credits. The user will have the following amounts tomorrow:
7 | | | | | | | | | |
6 | | | | | | | | | | | | | | | |
5 | | | | | | | | | | | | | | | | | | | | | | |
4 | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
3 | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
2 | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
1 | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
-------------------------------------------------------------
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
^ ^ ^
| \_ |
today tomorrow in 15 days
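The three operations described above (raise the graph, cut the top off, shift left) map directly onto a list. Here is a minimal Python sketch; the class name `CreditWindow` and its method names are invented for the example, and it assumes credits earned today are usable on 30 days, starting with today.

```python
class CreditWindow:
    """Credits available on each of the next `days` days; index 0 is today."""

    def __init__(self, days=30):
        self.avail = [0] * days

    @property
    def balance(self):
        return self.avail[0]

    def earn(self, amount):
        # Raise the whole graph: new credits are usable on every day in the window.
        self.avail = [a + amount for a in self.avail]

    def spend(self, amount):
        if amount > self.balance:
            raise ValueError("not enough credits")
        cap = self.balance - amount
        # Cut the top off the graph: no day may exceed today's new balance.
        self.avail = [min(a, cap) for a in self.avail]

    def next_day(self):
        # Shift left: tomorrow becomes today, and the new last day starts at 0.
        self.avail = self.avail[1:] + [0]


w = CreditWindow()
w.earn(5)                              # 7/1: user signs up
w.next_day(); w.earn(5); w.spend(3)    # 7/2: interaction, then a purchase
w.next_day(); w.earn(5)                # 7/3: interaction
print(w.balance)                       # 12, as in the question
```

Running `next_day()` 28 more times moves to 7/31, where the balance is 10: only the 2 unspent credits from 7/1 have expired, and the nightly batch does nothing beyond the shift.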
I wouldn't consider trying to process the data as you present it. Instead, you should keep track of how many credits the user has, and when they expire. That way you keep track of which credits were used when the purchase is made, instead of trying to work it all out later.
So when the user signs up, they have:
5 credits expiring on 8/1
After interacting with the system the next day:
5 credits expiring on 8/1
5 credits expiring on 8/2
After purchasing something:
2 credits expiring on 8/1
5 credits expiring on 8/2
And so on.
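One way to sketch this bookkeeping in Python is a queue of buckets, each holding an expiry date and a remaining amount, with purchases always drawing from the oldest bucket first. The class `CreditBuckets`, the year 2000, and the rule that a bucket is unusable from its expiry date onward are all assumptions made for the example.

```python
from collections import deque
from datetime import date


class CreditBuckets:
    """Credits grouped by expiry date; the oldest bucket is spent first."""

    def __init__(self):
        self.buckets = deque()  # [expires, amount], in order of earning

    def earn(self, amount, expires):
        # Assumes credits are added in chronological order of expiry.
        self.buckets.append([expires, amount])

    def expire(self, today):
        # Nightly job: drop buckets whose expiry date has been reached.
        while self.buckets and self.buckets[0][0] <= today:
            self.buckets.popleft()

    def balance(self, today):
        return sum(amount for expires, amount in self.buckets if expires > today)

    def spend(self, amount, today):
        self.expire(today)
        if amount > self.balance(today):
            raise ValueError("not enough credits")
        while amount > 0:
            bucket = self.buckets[0]
            used = min(amount, bucket[1])
            bucket[1] -= used
            amount -= used
            if bucket[1] == 0:
                self.buckets.popleft()


account = CreditBuckets()
account.earn(5, date(2000, 8, 1))   # signed up on 7/1
account.earn(5, date(2000, 8, 2))   # interaction on 7/2
account.spend(3, date(2000, 7, 2))  # purchase on 7/2
print(list(account.buckets))        # 2 credits expiring 8/1, 5 expiring 8/2
```

Because each purchase is applied to specific buckets at the time it happens, the nightly batch only has to drop expired buckets and never needs to match old expenditures against old earnings.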
Assuming you run this batch on a daily basis, you can have a table that keeps track of all the credits they earned, and the credits they used (negative credits).
At the beginning of the next month, your job is simply to find out which of the credits earned on the first day were not spent during the month.
Compute the number of credits earned on the first day minus the credits they spent during all of last month. If the number is positive, they have some credits that need to expire, so simply add a record to the table with a negative credit. This will zero out the unused credits.
The next day, repeat the process: take the credits they earned on the second day minus the sum of all the credits they spent in the last month, taking into account the record with the negative credits you created the previous day.
How about adding a flag to the expenditures? If the flag is not set, then you can include that expenditure in the batch, if necessary. If you do use the expenditure to offset an expiration, then you set the flag. Next time through, you'll ignore that expenditure because the flag is set.
Use a debit record to record normal expenditures. When the monthly batch job runs, it can calculate the total debits which are less than or equal to the expiring credits. If there are credits to expire, simply insert an appropriate debit record (appropriate == to cancel the excess, in your application). In this way, any 'running total' code which examines only credits and debits will reach the same balance that your batch code intended.
One approach to this problem is to store only the transactions, not the balance. Then you always calculate the balance in real time when needed. Here's the data:
Date : Amount : Expiries
7/1 : +5 : 7/31
7/2 : +5 : 8/1
7/2 : -3 : never
7/3 : +5 : 8/2
The balance at any time is simply the total of all transactions that have not yet expired. No need to run any batch processes.
Regarding Julian's reply (which I can't comment on yet): I'm dealing with the same problem, and Julian's approach won't work because it would allow the account balance to go negative.
If the user didn't use the service for one month, on 8/4 the account balance would be -3, and one activity worth 5 would bring the balance to 2, not to 5 as it should.
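The effect is easy to reproduce with the transaction table from that answer. The snippet below is only a sketch of the "sum of unexpired transactions" rule; the function name `balance`, the tuple layout, and the year 2000 are assumptions for the example.

```python
from datetime import date


def balance(transactions, today):
    """Sum every transaction that never expires or has not expired yet."""
    return sum(
        amount
        for amount, expires in transactions
        if expires is None or expires >= today
    )


transactions = [
    (5, date(2000, 7, 31)),   # 7/1: +5
    (5, date(2000, 8, 1)),    # 7/2: +5
    (-3, None),               # 7/2: -3, never expires
    (5, date(2000, 8, 2)),    # 7/3: +5
]
print(balance(transactions, date(2000, 7, 3)))   # 12
print(balance(transactions, date(2000, 8, 4)))   # -3

transactions.append((5, date(2000, 9, 3)))       # one new activity on 8/4
print(balance(transactions, date(2000, 8, 4)))   # 2, although 5 was expected
```

The purchase keeps counting after the credits it was paid from have expired, which is what drives the balance below zero.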