My flow works correctly, but after about one hour the flow's data disappears. I've tried heap sizes from 100 MB up to 8 GB and it made no difference; CPU usage climbs to around 500% and then the flow's data disappears, i.e. the In/Out counts of all processors drop to zero. I've attached my flow. Does anybody have a solution?
My system config:
macOS High Sierra
Processor: 2.3 GHz Intel Core i7
Memory: 16 GB 1600 MHz DDR3
This is the log of my flow: (screenshot)
This is my flow after losing the data and deleting the content: (screenshot)
I hope this explanation of these basic concepts clears up the confusion.
About NiFi
NiFi is a flow management tool: you can have processors to ingest, process, and egest data.
Typically a message comes in, and goes out once NiFi is done with it.
About statistics
Each processor will keep track of incoming and outgoing messages. These statistics are tracked on the processor for a while and then 'forgotten'; I believe the time window is 5 minutes.
About queues
You can inspect a queue to see the messages in it; if there are no messages, there is of course nothing to inspect. You might be interested in the provenance instead.
About provenance
You can check the provenance of a message in the queue to see how it developed (content, timestamps) as it passed the processors. I have personally worked mostly with NiFi in HDF, so I'm not sure if this option is available when you run NiFi without a platform around it.
Detecting problems in NiFi
Of course there may be exceptions, but if NiFi is unable to pick up messages, I would expect them to get stuck in a queue. And if NiFi is processing them but failing, you would expect red squares to start appearing in the UI.
So usually it is quite easy to tell if something is going wrong in NiFi.
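If you want to keep an eye on heap and CPU from outside the UI, NiFi also exposes this information over its REST API. A rough sketch, assuming an unsecured NiFi listening on localhost:8080 (a secured instance needs HTTPS and an access token instead):

```scala
import scala.io.Source

// Poll NiFi's system diagnostics over its REST API to watch heap usage and
// processor load from outside the UI. The host/port are assumptions for an
// unsecured, default-configured NiFi instance.
object NifiDiagnostics {
  def main(args: Array[String]): Unit = {
    val src = Source.fromURL("http://localhost:8080/nifi-api/system-diagnostics")
    try println(src.mkString) finally src.close()
  }
}
```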
Related
We work with a lot of data and have a high throughput of files going through our NiFi instances. We have recently been losing provenance records and can't work out what the cause is.
Below are some details, if relevant:
We have our provenance repository on its own drive in the cloud, and are not seeing any high I/O usage or resource contention.
We have allocated additional threads to it, and raised the file-handle limit to 999k.
If it means anything, provenance data is kept for 2 weeks in our configuration.
We are on NiFi version 1.15.3, but are planning an upgrade in the near future.
Any ideas on what the cause may be and how to remediate this? Thanks!
I am currently using an IBM MQ Advanced for Developers server to test our client, and was only able to achieve around 1,000 messages per second using the sample consumer written in JMS, which seems pretty slow. Is this a limit of the developer server, and if so, what throughput can be achieved using a licensed production IBM MQ server?
There is no artificial limit associated with IBM MQ Advanced for Developers. It is the same as the licensed production version of IBM MQ.
You don't say what type of machine you were using, whether your messages were persistent, what size they were, or any other qualifying criteria.
You say client, but I don't know whether you mean "network attached application" or "driving application". Clearly if your program is running "client-attached" (MQ parlance for network attached), then the network performance will also come into this.
On my Windows laptop, I get 4500 non-persistent msgs/sec, or 2000 persistent msgs/sec using a simple C-language locally bound program. Over client connection (just using localhost, not actually going out over a real network connection) I get 2700 non-persistent msgs/sec, or 1500 persistent msgs/sec.
You should read the MQ Performance Reports for details of the expected rates you can get.
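If you want to sanity-check the numbers on your own setup, you can also time the receive loop yourself rather than rely on the sample's reported figure. Below is a rough Scala sketch of the same JMS calls the Java sample makes; the host, port, channel, queue manager, queue name, and credentials are placeholders based on the developer-edition defaults and will need adjusting for your environment.

```scala
import javax.jms.{Connection, MessageConsumer, Session}
import com.ibm.mq.jms.MQConnectionFactory
import com.ibm.msg.client.wmq.WMQConstants

// Rough throughput timing for a JMS consumer. All connection details below are
// placeholder assumptions (developer-edition style defaults), not your real config.
object ConsumerThroughput {
  def main(args: Array[String]): Unit = {
    val cf = new MQConnectionFactory()
    cf.setHostName("localhost")
    cf.setPort(1414)
    cf.setQueueManager("QM1")
    cf.setChannel("DEV.APP.SVRCONN")
    cf.setTransportType(WMQConstants.WMQ_CM_CLIENT) // network-attached ("client") connection

    val connection: Connection = cf.createConnection("app", "passw0rd")
    val session: Session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE)
    val consumer: MessageConsumer = session.createConsumer(session.createQueue("DEV.QUEUE.1"))
    connection.start()

    val target = 10000
    var received = 0
    var timedOut = false
    val start = System.nanoTime()
    while (received < target && !timedOut) {
      val msg = consumer.receive(5000) // 5 s timeout so the loop ends if the queue drains
      if (msg == null) timedOut = true else received += 1
    }
    val secs = (System.nanoTime() - start) / 1e9
    println(f"$received%d messages in $secs%.1f s = ${received / secs}%.0f msgs/sec")

    connection.close()
  }
}
```

Running the same loop locally bound and then over a client connection shows how much of the gap is network rather than MQ itself.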
As an ex MQ performance person I would say - it depends.
At one level you can ask: what can one application process in isolation?
For persistent messages this will come down to the rate at which you can write to the log files.
If you have 10 applications in parallel each putting and getting from their own queue, then you will not get 10 times the throughput - you might get 8 or 9 times the throughput.
If they are all processing the same queue, then the throughput may drop a bit more as the queue usage is serialised.
If only one application is writing to the log, the application may see 1 millisecond response time. If you have 10 applications running concurrently, they may see a 3 milliseconds response time - so individual throughput goes down, but with more threads, the overall throughput goes up.
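As a back-of-envelope illustration of that effect (using the illustrative numbers above, not measurements):

```scala
// Toy model: each app's rate is bounded by its response time, and aggregate
// throughput is apps x per-app rate. The numbers are the illustrative ones above.
object ThroughputModel {
  def perAppRate(responseTimeMs: Double): Double = 1000.0 / responseTimeMs // msgs/sec per app
  def aggregate(apps: Int, responseTimeMs: Double): Double = apps * perAppRate(responseTimeMs)

  def main(args: Array[String]): Unit = {
    println(f"1 app  @ 1 ms: ${perAppRate(1.0)}%.0f msgs/sec each, ${aggregate(1, 1.0)}%.0f total")
    println(f"10 apps @ 3 ms: ${perAppRate(3.0)}%.0f msgs/sec each, ${aggregate(10, 3.0)}%.0f total")
  }
}
```

So per-application rate drops from about 1,000 to about 333 msgs/sec, while the aggregate rises from 1,000 to roughly 3,300 msgs/sec.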
If you have requests coming in over the network, you need to add network time, but you can run more clients and so get improved throughput.
If your application has a delay built in, it may only process a low message rate. You can have lots (thousands) of these and get a high overall throughput.
If your application is putting and getting as fast as possible, you may find that you can run 10-100 instances before the throughput plateaus.
Let's say you want to run your box so it is using 75% of the CPU, and the logging is 50% busy.
If you have just MQ on the box, then this can run more messages than if you also had DB2 on the box (with DB2 using 50% of the CPU).
If you have an application (DB2) hammering the disk, then the MQ throughput will go down.
If you have lots of applications putting to a server queue and only one server program, you will find the throughput is limited by the rate at which the server can process work. If it is doing DB2 work, it will be slower than with no DB2 work. If you find the server queue depth is over 5, then you need more server instances.
As Morag said, see the performance reports, but they are not the clearest reports to understand.
I installed Elasticsearch, Logstash, and Kibana on an Ubuntu server. Before starting these services the CPU utilization is less than 5%, and within a minute of starting them the CPU utilization crosses 85%. I don't know why this is happening. Can anyone help me with this issue?
Thanks in advance.
There is not enough information in your question to give you a specific answer, but I will point out a few possible scenarios and how to deal with them.
Did you wait long enough? Sometimes there is a warm-up period that consumes more CPU until all services are registered and have finished booting. If you have a fairly small machine, it may consume more CPU and take longer to finish.
Folder write permissions. If any of the ELK components fails due to restricted access to directories it needs, whether for logging, creating subfolders, sincedb files, or anything else, it can go into an infinite loop and retry again and again while consuming high CPU.
Connection issues. ES should be the first component to start; if it fails, Kibana and Logstash will try to connect to ES again and again until they succeed, which can cause high CPU.
Bad Logstash configuration. If Logstash fails to read the file named in the configuration, or if you have bad or excessive parsing, for example when the first "match" in the filter section covers the least common pattern, it may consume high CPU.
For further investigation:
I suggest you do not start all of them together: start ES first, and if everything goes well, start Kibana and lastly Logstash (a minimal readiness check is sketched at the end of this answer).
Check the logs of all the ELK components to find error messages, failures, etc.
For a better answer I would need the YAML configuration of all 3 components (ES, Kibana, Logstash), as well as the Logstash pipeline configuration file.
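For the "start ES first" point, a minimal readiness check might look like the sketch below; it assumes Elasticsearch is reachable on localhost:9200 without authentication and simply polls the cluster health endpoint until the status is yellow or green.

```scala
import scala.io.Source

// Poll Elasticsearch's _cluster/health endpoint until it reports yellow or green;
// only then start Kibana and Logstash. Host/port and the lack of authentication
// are assumptions about a default local setup.
object WaitForElasticsearch {
  def main(args: Array[String]): Unit = {
    var ready = false
    while (!ready) {
      ready = try {
        val src = Source.fromURL("http://localhost:9200/_cluster/health")
        val body = try src.mkString finally src.close()
        body.contains("\"status\":\"green\"") || body.contains("\"status\":\"yellow\"")
      } catch {
        case _: Exception => false // ES not listening yet
      }
      if (!ready) {
        println("Elasticsearch not ready yet, waiting...")
        Thread.sleep(5000)
      }
    }
    println("Elasticsearch is up - safe to start Kibana and Logstash")
  }
}
```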
I would recommend analysing the CPU cycles consumed by each of the Elasticsearch, Logstash, and Kibana processes.
Check specifically which of these processes is consuming the most memory/CPU, for example via the top command.
Start only ES first and allow it to settle and the node to start up completely before starting Kibana, and perhaps Logstash after that.
Send me the logs for each and I can assist if there are any errors.
I am facing a strange issue with Elasticsearch. I have 8 nodes with the same configuration (16 GB RAM and an 8-core CPU).
One node, "es53node6", always has a high load, as shown in the screenshot below. Also, 5-6 nodes kept stopping automatically yesterday, every 3-4 hours.
What could be the reason?
ES version : 5.3
There can be a fair share of reasons. Maybe all the data is stored on that node (which should not happen by default), or maybe you are sending all the requests to this single node.
Also, there is no automatic stopping of Elasticsearch built in. You can configure Elasticsearch to stop the JVM process when an out-of-memory exception occurs, but this is not enabled by default, as it relies on a more recent JVM.
You can use the hot threads API to check where the CPU time is spent in Elasticsearch.
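For example, two quick checks, assuming the cluster answers on localhost:9200 without authentication: the _cat/allocation listing shows whether es53node6 holds a disproportionate share of the shards, and _nodes/hot_threads shows what its CPU is actually doing.

```scala
import scala.io.Source

// Two quick diagnostics against a cluster assumed to be reachable on
// localhost:9200 with no authentication:
// 1. _cat/allocation: shard count and disk usage per node (is one node overloaded?)
// 2. _nodes/hot_threads: which threads are burning CPU on each node
object ClusterChecks {
  private def get(path: String): String = {
    val src = Source.fromURL(s"http://localhost:9200$path")
    try src.mkString finally src.close()
  }

  def main(args: Array[String]): Unit = {
    println(get("/_cat/allocation?v"))
    println(get("/_nodes/hot_threads"))
  }
}
```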
I have a Spark project running on a 4-core, 16 GB instance (acting as both master and worker). Can anyone tell me what things I should keep monitoring so that my cluster/jobs never go down?
I have created a small list which includes the following items, please extend the list if you know more:
Monitor the Spark master/worker for failures
Monitor HDFS for filling up or going down
Monitor network connectivity for the master/worker
Monitor Spark jobs for getting killed
That's a good list. But in addition to those I would actually monitor the status of the receivers of the streaming application (assuming you are using some non-HDFS source of data), i.e. whether they are connected or not. Well, to be honest, this was tricky to do with older versions of Spark Streaming, as the instrumentation to get the receiver status didn't quite exist. However, with Spark 1.0 (to be released very soon), you can use the org.apache.spark.streaming.StreamingListener interface to get events regarding the status of the receiver.
A sneak peek at the to-be-released Spark 1.0 docs is at
http://people.apache.org/~tdas/spark-1.0.0-rc10-docs/streaming-programming-guide.html
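As a rough sketch of such a listener (written against a current Spark release, where the trait lives in the org.apache.spark.streaming.scheduler package; event and field names may differ slightly in the 1.0 preview):

```scala
import org.apache.spark.streaming.scheduler._

// Logs receiver lifecycle events so you can tell whether the streaming source is
// still connected; hook the callbacks into whatever alerting you already use.
class ReceiverMonitor extends StreamingListener {
  override def onReceiverStarted(started: StreamingListenerReceiverStarted): Unit =
    println(s"Receiver started: ${started.receiverInfo}")

  override def onReceiverError(error: StreamingListenerReceiverError): Unit =
    println(s"Receiver error: ${error.receiverInfo}")

  override def onReceiverStopped(stopped: StreamingListenerReceiverStopped): Unit =
    println(s"Receiver stopped: ${stopped.receiverInfo}")
}

// Registration, assuming an existing StreamingContext called ssc:
//   ssc.addStreamingListener(new ReceiverMonitor())
```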