I am running a 3-node Cassandra cluster with RF=3 on AWS.
I set the read request timeout to 10ms and have noticed that some of my requests are timing out. This is what I observe in cqlsh with TRACING ON:
Tracing session: 9fc1d420-9829-11e6-b04a-834837c1747b
activity | timestamp | source | source_elapsed | client
-------------------------------------------------------------------------------------------------------------------+----------------------------+---------------+----------------+-----------
Execute CQL3 query | 2016-10-22 10:32:02.274000 | 10.20.30.40 | 0 | 127.0.0.1
Parsing select * from recipes where id = fcc7d8b5-46d3-4867-903c-4a5c66a1fd2e; [Native-Transport-Requests-8] | 2016-10-22 10:32:02.274000 | 10.20.30.40 | 264 | 127.0.0.1
Preparing statement [Native-Transport-Requests-8] | 2016-10-22 10:32:02.274000 | 10.20.30.40 | 367 | 127.0.0.1
reading data from /10.20.0.1 [Native-Transport-Requests-8] | 2016-10-22 10:32:02.275000 | 10.20.30.40 | 680 | 127.0.0.1
Sending READ message to /10.20.0.1 [MessagingService-Outgoing-/10.20.0.1] | 2016-10-22 10:32:02.286000 | 10.20.30.40 | 12080 | 127.0.0.1
READ message received from /10.20.30.40 [MessagingService-Incoming-/10.20.30.40] | 2016-10-22 10:32:02.296000 | 10.20.0.1 | 51 | 127.0.0.1
Executing single-partition query on recipes [ReadStage-8] | 2016-10-22 10:32:02.298000 | 10.20.0.1 | 2423 | 127.0.0.1
Acquiring sstable references [ReadStage-8] | 2016-10-22 10:32:02.298000 | 10.20.0.1 | 2481 | 127.0.0.1
Skipped 0/4 non-slice-intersecting sstables, included 0 due to tombstones [ReadStage-8] | 2016-10-22 10:32:02.298000 | 10.20.0.1 | 2548 | 127.0.0.1
Bloom filter allows skipping sstable 55 [ReadStage-8] | 2016-10-22 10:32:02.298000 | 10.20.0.1 | 2614 | 127.0.0.1
Bloom filter allows skipping sstable 130 [ReadStage-8] | 2016-10-22 10:32:02.298000 | 10.20.0.1 | 2655 | 127.0.0.1
Bloom filter allows skipping sstable 140 [ReadStage-8] | 2016-10-22 10:32:02.298000 | 10.20.0.1 | 2704 | 127.0.0.1
Bloom filter allows skipping sstable 141 [ReadStage-8] | 2016-10-22 10:32:02.298001 | 10.20.0.1 | 2739 | 127.0.0.1
Merged data from memtables and 4 sstables [ReadStage-8] | 2016-10-22 10:32:02.298001 | 10.20.0.1 | 2796 | 127.0.0.1
Read 0 live and 0 tombstone cells [ReadStage-8] | 2016-10-22 10:32:02.298001 | 10.20.0.1 | 2854 | 127.0.0.1
Enqueuing response to /10.20.30.40 [ReadStage-8] | 2016-10-22 10:32:02.299000 | 10.20.0.1 | 2910 | 127.0.0.1
Sending REQUEST_RESPONSE message to /10.20.30.40 [MessagingService-Outgoing-/10.20.30.40] | 2016-10-22 10:32:02.302000 | 10.20.0.1 | 6045 | 127.0.0.1
REQUEST_RESPONSE message received from /10.20.0.1 [MessagingService-Incoming-/10.20.0.1] | 2016-10-22 10:32:02.322000 | 10.20.30.40 | 47911 | 127.0.0.1
Processing response from /10.20.0.1 [RequestResponseStage-42] | 2016-10-22 10:32:02.322000 | 10.20.30.40 | 48056 | 127.0.0.1
Request complete | 2016-10-22 10:32:02.323239 | 10.20.30.40 | 49239 | 127.0.0.1
Looking at the DataStax documentation, it seems that the source_elapsed column is the elapsed time in microseconds before the event occurred on the source node.
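(If it helps, the same trace events can be pulled back out of Cassandra afterwards with a query like the sketch below against the system_traces keyspace; the session id is the one shown above.)
-- sketch: re-reading the events of this trace session from system_traces
SELECT activity, source, source_elapsed, thread
FROM system_traces.events
WHERE session_id = 9fc1d420-9829-11e6-b04a-834837c1747b;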
There is a big time gap (roughly 20 ms, from 10:32:02.302 to 10:32:02.322) between "Sending REQUEST_RESPONSE message to /10.20.30.40" and "REQUEST_RESPONSE message received from /10.20.0.1".
Does this indicate a network latency issue?
Related
We have a materialized view refresh (complete refresh) which runs on DB1 and in turn sends a SELECT statement over a dblink to DB2. Initially, when it is run by a privileged user, it gets stuck and runs for hours; when we investigated, the waits were on "cell single block physical read". However, when we kill the refresh (from either DB1 or DB2) and re-run it, it finishes quickly (2 to 10 minutes) and the wait event shows as DB CPU or "cell multiblock physical read". This is happening almost every day for us.
We compared the execution plans for both runs on DB2 and don't see much change, neither in the PLAN_HASH_VALUE and SQL_ID nor in the execution plan itself. Does anybody know what else I am missing here, and does anyone have ideas on how to resolve this problem?
1st run
=======
SNAP_ID NODE BEGIN_INTERVAL_TIME SQL_ID PLAN_HASH_VALUE EXECS AVG_ETIME AVG_LIO AVG_PIO
---------- ------ ------------------------------ ------------- --------------- ------------ ------------ -------------- --------------
77142 3 01-OCT-19 07.15.05.867 AM fybjvvtk09u2j 0 912.735 1,090,393.0 440,686.0
77143 3 01-OCT-19 07.30.00.413 AM fybjvvtk09u2j 0 842.735 989,493.0 467,590.0
kill and re-run
===============
SNAP_ID NODE BEGIN_INTERVAL_TIME SQL_ID PLAN_HASH_VALUE EXECS AVG_ETIME AVG_LIO AVG_PIO
---------- ------ ------------------------------ ------------- --------------- ------------ ------------ -------------- --------------
77144 4 01-OCT-19 07.45.01.936 AM fybjvvtk09u2j 1 98.649 14,272,525.0 14,591,211.0
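(For reference, per-snapshot stats in this shape can be pulled out of AWR with something roughly like the query below. This is only a sketch against DBA_HIST_SQLSTAT / DBA_HIST_SNAPSHOT, not necessarily the exact script we ran, and AVG_ETIME here comes out in seconds per execution.)
-- sketch: per-snapshot stats for the statement, per instance
SELECT s.snap_id,
       s.instance_number AS node,
       sn.begin_interval_time,
       s.sql_id,
       s.plan_hash_value,
       s.executions_delta AS execs,
       ROUND(s.elapsed_time_delta / 1000000 / GREATEST(s.executions_delta, 1), 3) AS avg_etime,
       ROUND(s.buffer_gets_delta / GREATEST(s.executions_delta, 1), 1) AS avg_lio,
       ROUND(s.disk_reads_delta  / GREATEST(s.executions_delta, 1), 1) AS avg_pio
FROM   dba_hist_sqlstat s
       JOIN dba_hist_snapshot sn
         ON sn.snap_id = s.snap_id
        AND sn.instance_number = s.instance_number
WHERE  s.sql_id = 'fybjvvtk09u2j'
ORDER BY s.snap_id;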
Execution plan 1st run :-
Plan hash value: 3547885274
----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | 447K(100)| | | |
| 1 | HASH GROUP BY | | 27M| 1414M| 2004M| 447K (2)| 00:00:18 | | |
|* 2 | HASH JOIN | | 27M| 1414M| 281M| 180K (4)| 00:00:08 | | |
| 3 | TABLE ACCESS STORAGE FULL | AUDIT_LOG | 11M| 146M| | 8504 (4)| 00:00:01 | | |
| 4 | PARTITION HASH ALL | | 30M| 1200M| | 165K (4)| 00:00:07 | 1 | 256 |
|* 5 | TABLE ACCESS STORAGE FULL | DISA_DAILY_HHT_PUB | 30M| 1200M| | 165K (4)| 00:00:07 | 1 | 256 |
| 6 | SORT AGGREGATE | | 1 | 6 | | | | | |
| 7 | INDEX FULL SCAN (MIN/MAX)| DISA_DAILY_HHT_PUB_IDX2 | 1 | 6 | | 4 (0)| 00:00:01 | | |
----------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("P"."AUDIT_LOG_ID"="AL"."AUDIT_LOG_ID")
5 - storage("P"."AUDIT_LOG_ID"=)
filter("P"."AUDIT_LOG_ID"=)
Execution plan on kill and re-run :-
Plan hash value: 3547885274
----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | 437K(100)| | | |
| 1 | HASH GROUP BY | | 26M| 1360M| 1928M| 437K (2)| 00:00:18 | | |
|* 2 | HASH JOIN | | 26M| 1360M| 281M| 179K (4)| 00:00:08 | | |
| 3 | TABLE ACCESS STORAGE FULL | AUDIT_LOG | 11M| 146M| | 8504 (4)| 00:00:01 | | |
| 4 | PARTITION HASH ALL | | 29M| 1171M| | 165K (4)| 00:00:07 | 1 | 256 |
|* 5 | TABLE ACCESS STORAGE FULL | DISA_DAILY_HHT_PUB | 29M| 1171M| | 165K (4)| 00:00:07 | 1 | 256 |
| 6 | SORT AGGREGATE | | 1 | 6 | | | | | |
| 7 | INDEX FULL SCAN (MIN/MAX)| DISA_DAILY_HHT_PUB_IDX2 | 1 | 6 | | 4 (0)| 00:00:01 | | |
----------------------------------------------------------------------------------------------------------------------------------
Do we need to think about the underlying cluster when designing NiFi templates?
Here is my simple flow:
+-----------------+ +---------------+ +-----------------+
| | | | | |
| READ FROM | | MERGE | | PUT HDFS |
| KAFKA | | FILES | | |
| +-----------------------> | +---------------------> | |
| | | | | |
| | | | | |
| | | | | |
+-----------------+ +---------------+ +-----------------+
I have a 3-node cluster. When the system is running I check the "cluster" menu and see that only the master node is utilizing resources; the other cluster nodes seem idle. The question is: in such a cluster, should I design the template according to the cluster, or should NiFi do the load balancing?
I saw that one of my colleagues created remote process groups for each node in the cluster and put a load balancer in front of these within the template. Is that required? (Like below.)
+------------------+
| | +-------------+
| REMOTE PROCESS | | input port |
+----> | GROUP FOR | | (rpg) |
| | NODE 1 | +-------------+
| | | |
| | | |
| +------------------+ v
+-----------------+ +-----------------+ RPG
| | | | | +--------------+
| READ FROM | | | | | |
| KAFKA | | LOAD BALANCER | | +------------------+ | MERGE FILES |
| +-------------> | +-------------> | | | |
| | | | | | REMOTE PROCESS | | |
| | | | | | GROUP FOR | | |
| | | | | | NODE 2 | | |
+-----------------+ +-----------------+ RPG | | +--------------+
| +------------------+ |
| |
| v
|
| +-------------------+ +---------------+
| | | | |
| | REMOTE PROCESS | | PUT HDFS |
+-----> | GROUP FOR | | |
| NODE 3 | | |
| | | |
| | | |
+-------------------+ +---------------+
And what is the use case for a load balancer apart from remote clusters? Can I use a load balancer to split traffic across several processors to speed up the operation?
Apache NiFi does not do any automatic load balancing or moving of data, so it is up to you to design the data flow in a way that utilizes your cluster. How to do this will depend on the data flow and how the data is being brought into the cluster.
I wrote this article once to try and summarize the approaches:
https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html
In your case with Kafka, you should be able to have the flow run as shown in your first picture (without remote process groups). This is because Kafka is a data source that allows each node to consume different data.
If ConsumeKafka appears to be running on only one node, there could be a couple of reasons for this...
First, make sure ConsumeKafka is not scheduled for primary node only.
Second, figure out how many partitions you have for your Kafka topic. The Kafka client (used by NiFi) will assign 1 consumer to 1 partition, so if you have only 1 partition then you can only ever have 1 NiFi node consuming from it. Here is an article to further describe this behavior:
http://bryanbende.com/development/2016/09/15/apache-nifi-and-apache-kafka
I need to use the CUBE grouping function in order to get every possible combination, so that this report can be used in APEX with parameters controlling how the report is viewed. My full query is too complicated to show and would probably just confuse the situation, so here is the cut-down version:
Select /*+ NO_PARALLEL */ ETP_ETPC_UID,
Decode (Grouping(ETP_ET_UID), 1, '-Grouped-', ETP_ET_UID),
/* Decode (Grouping(ETP_ET_UID), 1, '-Grouped-', ET_ABBR), */
Decode (Grouping(ETP_REFERENCE_1), 1, '-Grouped-', ETP_REFERENCE_1),
Decode (Grouping(ETP_REFERENCE_2), 1, '-Grouped-', ETP_REFERENCE_2),
Sum(ETP_COUNT)
From ETP_PROFILE,
ET_EVENT_TYPE
Where ETP_ET_UID = ET_UID
Group By
ETP_ETPC_UID,
Cube( /*(*/ETP_ET_UID/*, ET_ABBR)*/ , ETP_REFERENCE_1, ETP_REFERENCE_2)
The issue I am currently having is with the sections I have commented out. According to this Oracle Base article a composite column is meant to be treated as one, so great: I can have ET_ABBR in my query and just set ET_UID and ET_ABBR as a composite (un-comment my code and you will see).
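For clarity, this is what the cut-down query looks like with those comments removed, i.e. with (ET_UID, ET_ABBR) as a composite column:
Select /*+ NO_PARALLEL */ ETP_ETPC_UID,
Decode (Grouping(ETP_ET_UID), 1, '-Grouped-', ETP_ET_UID),
Decode (Grouping(ETP_ET_UID), 1, '-Grouped-', ET_ABBR),
Decode (Grouping(ETP_REFERENCE_1), 1, '-Grouped-', ETP_REFERENCE_1),
Decode (Grouping(ETP_REFERENCE_2), 1, '-Grouped-', ETP_REFERENCE_2),
Sum(ETP_COUNT)
From ETP_PROFILE,
ET_EVENT_TYPE
Where ETP_ET_UID = ET_UID
Group By
ETP_ETPC_UID,
Cube( (ETP_ET_UID, ET_ABBR), ETP_REFERENCE_1, ETP_REFERENCE_2)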
When I do this, it ridiculously increases the cost of the query (according to the explain plan), and indeed it takes forever to run. If I remove the ET_ABBR column (how the code is right now, with the comments in place) it loads very quickly and the explain plan cost is fine.
Am I doing something wrong here? Should I be using GROUPING SETS or something like that? This is the first time I am messing with these special grouping functions, and they seem good, but very confusing.
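For what it's worth, my understanding is that the composite-column CUBE could also be spelled out explicitly with GROUPING SETS, something like the untested sketch below (same SELECT list and FROM/WHERE as above; only the GROUP BY changes, listing all eight combinations):
Group By
ETP_ETPC_UID,
Grouping Sets (
  (ETP_ET_UID, ET_ABBR, ETP_REFERENCE_1, ETP_REFERENCE_2),
  (ETP_ET_UID, ET_ABBR, ETP_REFERENCE_1),
  (ETP_ET_UID, ET_ABBR, ETP_REFERENCE_2),
  (ETP_ET_UID, ET_ABBR),
  (ETP_REFERENCE_1, ETP_REFERENCE_2),
  (ETP_REFERENCE_1),
  (ETP_REFERENCE_2),
  ()
)
I haven't verified whether the optimizer treats this any better than the CUBE form.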
EDIT:
Explain plan for the commented query (ET_ABBR excluded):
Plan hash value: 3169115854
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 142K| 9M| 408 (3)| 00:00:05 | | |
| 1 | SORT GROUP BY | | 142K| 9M| 408 (3)| 00:00:05 | | |
| 2 | GENERATE CUBE | | 142K| 9M| 408 (3)| 00:00:05 | | |
| 3 | SORT GROUP BY | | 142K| 9M| 408 (3)| 00:00:05 | | |
| 4 | PARTITION RANGE ALL| | 142K| 9M| 401 (1)| 00:00:05 | 1 |1048575|
| 5 | PARTITION LIST ALL| | 142K| 9M| 401 (1)| 00:00:05 | 1 | 10 |
| 6 | TABLE ACCESS FULL| ETP_PROFILE | 142K| 9M| 401 (1)| 00:00:05 | 1 |1048575|
------------------------------------------------------------------------------------------------------
Note
-----
- dynamic sampling used for this statement (level=2)
Explain plan for the un-commented code (ET_ABBR included in the composite):
Plan hash value: 2063641247
--------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 54 | 6426 | 427 (4)| 00:00:06 | | |
| 1 | TEMP TABLE TRANSFORMATION | | | | | | | |
| 2 | MULTI-TABLE INSERT | | | | | | | |
| 3 | SORT GROUP BY ROLLUP | | 54 | 4536 | 412 (3)| 00:00:05 | | |
|* 4 | HASH JOIN | | 142K| 11M| 406 (1)| 00:00:05 | | |
| 5 | TABLE ACCESS FULL | ET_EVENT_TYPE | 55 | 605 | 4 (0)| 00:00:01 | | |
| 6 | PARTITION RANGE ALL | | 142K| 9M| 401 (1)| 00:00:05 | 1 |1048575|
| 7 | PARTITION LIST ALL | | 142K| 9M| 401 (1)| 00:00:05 | 1 | 10 |
| 8 | TABLE ACCESS FULL | ETP_PROFILE | 142K| 9M| 401 (1)| 00:00:05 | 1 |1048575|
| 9 | DIRECT LOAD INTO | SYS_TEMP_0FD9D6E6F_E9BC1839 | | | | | | |
| 10 | DIRECT LOAD INTO | SYS_TEMP_0FD9D6E70_E9BC1839 | | | | | | |
| 11 | LOAD AS SELECT | SYS_TEMP_0FD9D6E70_E9BC1839 | | | | | | |
| 12 | SORT GROUP BY ROLLUP | | 54 | 3402 | 3 (34)| 00:00:01 | | |
| 13 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6E6F_E9BC1839 | 54 | 3402 | 2 (0)| 00:00:01 | | |
| 14 | MULTI-TABLE INSERT | | | | | | | |
| 15 | SORT GROUP BY ROLLUP | | 54 | 3240 | 3 (34)| 00:00:01 | | |
| 16 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6E6F_E9BC1839 | 54 | 3240 | 2 (0)| 00:00:01 | | |
| 17 | DIRECT LOAD INTO | SYS_TEMP_0FD9D6E71_E9BC1839 | | | | | | |
| 18 | DIRECT LOAD INTO | SYS_TEMP_0FD9D6E70_E9BC1839 | | | | | | |
| 19 | LOAD AS SELECT | SYS_TEMP_0FD9D6E70_E9BC1839 | | | | | | |
| 20 | SORT GROUP BY ROLLUP | | 54 | 2322 | 3 (34)| 00:00:01 | | |
| 21 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6E71_E9BC1839 | 54 | 2322 | 2 (0)| 00:00:01 | | |
| 22 | VIEW | | 162 | 19278 | 6 (0)| 00:00:01 | | |
| 23 | VIEW | | 162 | 15066 | 6 (0)| 00:00:01 | | |
| 24 | UNION-ALL | | | | | | | |
| 25 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6E6F_E9BC1839 | 54 | 4536 | 2 (0)| 00:00:01 | | |
| 26 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6E70_E9BC1839 | 54 | 4536 | 2 (0)| 00:00:01 | | |
| 27 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6E71_E9BC1839 | 54 | 3240 | 2 (0)| 00:00:01 | | |
--------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("SYS_TBL_$2$"."ETP_ET_UID"="SYS_TBL_$1$"."ET_UID")
Note
-----
- dynamic sampling used for this statement (level=2)
I'm trying to find all the values in my hosts table which do not contain a partial match to any value in my maildomains table.
hosts
+-------------------+-------+
| host | score |
+-------------------+-------+
| www.gmail.com | 489 |
| www.hotmail.com | 653 |
| www.google.com | 411 |
| w3.hotmail.ca | 223 |
| stackexchange.com | 950 |
+-------------------+-------+
maildomains
+---------------+
| email |
+---------------+
| gmail |
| hotmail |
| outlook |
| mail |
+---------------+
Specifically, I am looking to do a SELECT * of hosts where hosts.host is NOT LIKE any value in '%.maildomains.email%'.
Desired output:
+-------------------+-------+
| host | score |
+-------------------+-------+
| www.google.com | 411 |
| stackexchange.com | 950 |
+-------------------+-------+
Here's how I think it should work logically:
SELECT h.*, m.email
FROM (SELECT * FROM hosts WHERE score > 100) h
LEFT OUTER JOIN maildomains m ON (h.host LIKE CONCAT('%.', m.email, '%'))
WHERE m.email IS NULL
This results in error 10017: both left and right aliases encountered in join ''%''
I also managed to get a similar query to run without error as a CROSS JOIN, but it yields bad results:
SELECT h.*, m.email
FROM (SELECT * FROM hosts WHERE score > 100) h
CROSS JOIN maildomains m
WHERE h.host NOT LIKE CONCAT('%.', m.email, '%')
+-------------------+---------+---------+
| p.host | p.score | m.email |
+-------------------+---------+---------+
| www.gmail.com | 489 | hotmail |
| www.gmail.com | 489 | outlook |
| www.gmail.com | 489 | mail |
| www.hotmail.com | 653 | gmail |
| www.hotmail.com | 653 | outlook |
| www.hotmail.com | 653 | mail |
| www.google.com | 411 | gmail |
| www.google.com | 411 | hotmail |
| www.google.com | 411 | outlook |
| www.google.com | 411 | mail |
| w3.hotmail.ca | 223 | gmail |
| w3.hotmail.ca | 223 | outlook |
| w3.hotmail.ca | 223 | mail |
| stackexchange.com | 950 | gmail |
| stackexchange.com | 950 | hotmail |
| stackexchange.com | 950 | outlook |
| stackexchange.com | 950 | mail |
+-------------------+---------+---------+
I appreciate any and all guidance.
You could do something like this:
select host
from hosts h
left outer join maildomains m
  on (regexp_replace(regexp_replace(regexp_replace(regexp_replace(
        h.host, 'www.', ''), '.com', ''), '.ca', ''), 'w3.', '') = m.email)
where m.email is NULL;
If your Hive version is 0.13 or newer, then you can use a subquery in the WHERE clause to filter the rows from the hosts table. The following is a more generalized approach that does not require you to enumerate all of the top-level domains you might find in your data:
SELECT host, score
FROM hosts
WHERE
regexp_extract(hosts.host, "(?:.*?\\.)?([^.]+)\\.[^.]+", 1) NOT IN
(SELECT email FROM maildomains);
This approach isolates the portion of the host domain just before the TLD with the regexp_extract and then checks to see if that domain name occurs in the subquery on the maildomains table.
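For example, against the sample hosts above, the extracted value looks like this (a quick illustration, assuming the same data; Hive 0.13+ allows SELECT without a FROM clause):
-- what regexp_extract pulls out of a few of the sample hosts
SELECT regexp_extract('www.gmail.com',     "(?:.*?\\.)?([^.]+)\\.[^.]+", 1);  -- gmail (filtered out)
SELECT regexp_extract('w3.hotmail.ca',     "(?:.*?\\.)?([^.]+)\\.[^.]+", 1);  -- hotmail (filtered out)
SELECT regexp_extract('stackexchange.com', "(?:.*?\\.)?([^.]+)\\.[^.]+", 1);  -- stackexchange (kept)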
Reading RFC 4733, it doesn't clearly state whether the event duration should keep incrementing across the final three packets with the E bit set. It seems the important information in the event is the M bit, the timestamp, and the E bit. If the event duration does increment across those final three packets, would it make sense to consider each of them as a separate event and triplicate the tones? Or should the first E-bit packet received mark the end of the event and the last two be discarded? I have a Wireshark capture that shows the event duration incrementing across the three end packets, and I am trying to make sense of this.
Given that the final packet of the event may be transmitted three times, the duration field should monotonically increase. In the discussion in the comments we thus see three packets, each with the E bit set, and durations of 720, 800 and 880. This indicates that the packets are sent 80ms apart, because the duration field in the packet indicates that the event "has so far lasted as long as indicated by this parameter".
However, it's still a single event, so your playout of the event should last for the duration of the first packet you receive.
For example, you're seeing three packets arrive, but if the first packet (with duration 720) didn't arrive, you'd see the second packet (with duration 800), and you should play the tone for 800ms.
That said, I'd expect the sender to send the end packet with the same duration, rather than what you're seeing. That might be a bug in the sender. (Transmission must cause an increment in duration, but this is retransmission.)
The sender is clearly breaking the RFC, since:
- the E bit should be set when the event has ended, and
- the duration is increased according to the duration of the event.
If the duration is still increasing then clearly the event has not ended, but if the E bit is set then the event has ended - i.e. a contradiction.
On the other hand (from section 2.5.2.2):
once the receiver has received the end of the event it should stop playing the tone.
A receiver SHOULD NOT restart a tone once playout has stopped.
The receiver MAY determine on the basis of retained history and the timestamp and event code of the current packet that it corresponds to an event already played out and lapsed. In that case, further reports for the event MUST be ignored
i.e. you can tell from the timestamp that the event has already played out, and you should not repeat the event in this case.
The example in RFC 4733 is given in Table 5:
https://datatracker.ietf.org/doc/rfc4733/
+-------+-----------+------+--------+------+--------+--------+------+
| Time | Event | M | Time- | Seq | Event | Dura- | E |
| (ms) | | bit | stamp | No | Code | tion | bit |
+-------+-----------+------+--------+------+--------+--------+------+
| 0 | "9" | | | | | | |
| | starts | | | | | | |
| 50 | RTP | "1" | 0 | 1 | 9 | 400 | "0" |
| | packet 1 | | | | | | |
| | sent | | | | | | |
| 100 | RTP | "0" | 0 | 2 | 9 | 800 | "0" |
| | packet 2 | | | | | | |
| | sent | | | | | | |
| 150 | RTP | "0" | 0 | 3 | 9 | 1200 | "0" |
| | packet 3 | | | | | | |
| | sent | | | | | | |
| 200 | RTP | "0" | 0 | 4 | 9 | 1600 | "0" |
| | packet 4 | | | | | | |
| | sent | | | | | | |
| 200 | "9" ends | | | | | | |
| 250 | RTP | "0" | 0 | 5 | 9 | 1600 | "1" |
| | packet 4 | | | | | | |
| | first | | | | | | |
| | retrans- | | | | | | |
| | mission | | | | | | |
| 300 | RTP | "0" | 0 | 6 | 9 | 1600 | "1" |
| | packet 4 | | | | | | |
| | second | | | | | | |
| | retrans- | | | | | | |
| | mission | | | | | | |
If you look at the last two rows, you see that they both have the E bit set and that the duration stays at 1600 (an 8000 Hz timestamp clock × 200 ms / 1000 = 1600).
I hope that helps!