It had been working fine, but suddenly processors still have tasks running even after being stopped, and those running tasks have to be terminated manually.
Any thoughts?
Update 1
I use nipyapi to start and stop some processors over and over again. These are the APIs I used:
nipyapi.canvas.get_processor(identifier=p_id, identifier_type='id')
nipyapi.canvas.get_process_group(identifier=pg_id, identifier_type='id')
nipyapi.canvas.schedule_processor(processor=p_id, scheduled=True, refresh=True)
Restarting NiFi solved the problem, but after executing those APIs many times (about 10,000 times, counted with grep processor id | wc -l) the problem occurred again.
I suspect those API calls create a lot of web connections that are never closed.
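For reference, here is a minimal sketch of the kind of start/stop loop described above, assuming a reachable NiFi API; the host URL, processor id, iteration count and sleeps are placeholders, and the entity returned by get_processor is what gets passed to schedule_processor:
import time
import nipyapi

# Point the client at the NiFi REST API (placeholder URL).
nipyapi.config.nifi_config.host = 'http://localhost:8080/nifi-api'

p_id = 'processor-uuid-goes-here'  # placeholder processor id

for _ in range(100):
    # Fetch the current revision of the processor entity by id.
    proc = nipyapi.canvas.get_processor(identifier=p_id, identifier_type='id')
    # Ask the framework to schedule it, then stop scheduling it again.
    nipyapi.canvas.schedule_processor(processor=proc, scheduled=True, refresh=True)
    time.sleep(5)
    nipyapi.canvas.schedule_processor(processor=proc, scheduled=False, refresh=True)
    time.sleep(5)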
Stopping a processor is really just telling the scheduler not to trigger any more executions. It is often the case that an already-triggered thread is still executing after the processor has been stopped, which is why the Terminate option was added.
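The same Terminate action the UI offers is also exposed over the REST API, so a stuck thread can be cleared without clicking through the UI. A rough sketch with requests, assuming NiFi 1.7 or later, an unsecured instance, and that the UI's Terminate button maps to DELETE /processors/{id}/threads; the URL and processor id are placeholders:
import requests

NIFI_API = 'http://localhost:8080/nifi-api'  # placeholder base URL
p_id = 'processor-uuid-goes-here'            # placeholder processor id

# Ask NiFi to interrupt and quarantine any threads still running for this
# processor; the processor itself must already be stopped.
resp = requests.delete(f'{NIFI_API}/processors/{p_id}/threads')
resp.raise_for_status()
print('terminate request accepted:', resp.status_code)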
Related
I'm trying to pull some data from HDFS. I'm running the ListHDFS and FetchHDFS processors for this.
When I stopped the FetchHDFS processor, a number of threads were still active even after the processor was stopped. To kill these threads, I used the "Terminate" option.
I just want to understand how the Terminate option works:
Does it gracefully shut down all the connections with the filesystem?
Since all the threads are killed, do I lose the data that was already consumed by those threads?
Is it advisable to use the Terminate option only when the threads are stuck or the flow enters a frozen state?
When you stop a processor it tells the NiFi framework to no longer schedule/execute the processor, but there may already be threads executing which need to finish what they were doing. Usually these threads should complete and you will see the active threads go away, but sometimes a thread is blocked (typically when trying to make a network connection somewhere without having proper timeouts set) and this thread may never complete, and therefore needs to be terminated.
The terminate option will issue an interrupt to the thread and then quarantine it, which takes it out of the pool for further execution. The thread may then complete in the background, or if it did not respond to the interrupt and is blocked then it may stay stuck in the background until the next restart of NiFi.
In the FetchHDFS case, assuming it was successfully fetching data, it was most likely in the middle of reading a file from HDFS and just needs a few minutes to complete, so you shouldn't need to use Terminate. If it was never fetching data and was stuck connecting to HDFS, then you would use Terminate.
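One way to tell these two cases apart before reaching for Terminate is to watch the processor's active thread count after stopping it: if it drains to zero, the in-flight thread finished on its own. A small sketch using nipyapi, where the id and polling interval are placeholders and the status attribute path is assumed from the processor status DTO:
import time
import nipyapi

p_id = 'fetchhdfs-processor-uuid'  # placeholder processor id

# Poll the processor status for a few minutes after stopping it.
for _ in range(30):
    proc = nipyapi.canvas.get_processor(identifier=p_id, identifier_type='id')
    active = proc.status.aggregate_snapshot.active_thread_count
    print('active threads:', active)
    if active == 0:
        break  # the in-flight thread completed on its own
    time.sleep(10)
else:
    print('threads still active after 5 minutes; consider Terminate')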
After I had to restart NiFi because of a memory overload, I can't start or stop two processors: PutHiveQL and ListHDFS. Other processors react fine.
Even after 4 hours the processors don't react. I tried stopping them by stopping the whole Process Group, but that failed as well.
Check the nifi-app.log file; you can also run tail -f on the file to see what's going on in real time.
Also check these values in the processors:
Concurrent tasks.
Run schedule.
Run duration.
The default configuration is not ideal and can cause problems.
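If it helps, those three values can also be read programmatically with nipyapi (mentioned earlier in this thread); a small sketch, where the processor id is a placeholder and the config field names are assumed from the processor config DTO:
import nipyapi

p_id = 'processor-uuid-goes-here'  # placeholder processor id

proc = nipyapi.canvas.get_processor(identifier=p_id, identifier_type='id')
cfg = proc.component.config  # processor configuration as NiFi stores it

# The three settings mentioned above (field names assumed from the DTO).
print('Concurrent tasks:', cfg.concurrently_schedulable_task_count)
print('Run schedule:', cfg.scheduling_period)
print('Run duration (ms):', cfg.run_duration_millis)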
Is it possible to create a script that is always running on my VPS server? And what do I need to do to keep it running the whole time? (I don't have a VPS server yet, but if this is possible I want to buy one!)
Yes you can; there are several ways to get the result you want.
Supervisord
Supervisord is a process control system that keeps a process running. It automatically starts or restarts your process whenever necessary.
When to use it: use it when you need a process that runs continuously (see the sketch after this list), e.g.:
A queue worker that continuously reads a database, waiting for a job to run.
A Node application that acts like a daemon.
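As a rough illustration, such a worker can be little more than a script that loops forever; the supervisord program section that would keep it alive is sketched in the comments. The program name, paths and the job-polling logic are placeholders:
#!/usr/bin/env python3
"""A minimal long-running worker of the kind supervisord would manage."""
import time

# A hypothetical supervisord entry for this script might look like:
#
#   [program:my_worker]
#   command=/usr/bin/python3 /opt/my_worker/worker.py
#   autostart=true
#   autorestart=true
#
# (section name and paths are placeholders)

def check_for_jobs():
    """Placeholder: poll a database or queue and return pending jobs."""
    return []

while True:
    for job in check_for_jobs():
        print('processing', job)
    time.sleep(5)  # wait a little before polling again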
Cron
Cron lets you run processes regularly, at fixed time intervals. You can, for example, run a process every minute, every 30 minutes, or at whatever interval you need.
When to use it: use it when your process is not long-running, i.e. it does a task and ends, and you do not need it restarted automatically as with Supervisord (see the sketch after this list), e.g.:
A task that collects logs every day and sends them, gzipped, by email.
A backup routine.
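For contrast, a cron-style task is usually just a script that does its work and exits; the crontab line that would schedule it is sketched in the comment. The schedule, paths and the archiving step are placeholders:
#!/usr/bin/env python3
"""A minimal one-shot task of the kind cron would schedule."""
import gzip
import shutil
from datetime import date

# A hypothetical crontab entry running this once a day at 02:00:
#
#   0 2 * * * /usr/bin/python3 /opt/scripts/archive_logs.py
#
# (schedule and paths are placeholders)

SOURCE = '/var/log/myapp/app.log'                  # placeholder log file
TARGET = '/backups/app-%s.log.gz' % date.today()   # placeholder archive path

# Compress today's log into the archive, then exit.
with open(SOURCE, 'rb') as src, gzip.open(TARGET, 'wb') as dst:
    shutil.copyfileobj(src, dst)

print('archived', SOURCE, '->', TARGET)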
Whichever you choose, there are many tutorials on the internet on how to configure both, so I won't go into those details here.
A Ruby application is showing up as multiple processes on the server, even though it's the same application. It's a Windows server.
How can I remove all but one process for this application without manually closing them in the Windows Task Manager?
Please help.
Depending on how you're running your app (Passenger? Thin? Mongrel? mod_ruby?), this could actually be normal. As in, the app keeps a pool of processes running until they time out, each awaiting new requests, much like a dynamic php/fastcgi pool would do.
Along the same lines, and per Peter's comment, might it be using threads? If so, it could be equally normal, as in it launches some background jobs before returning and the processes remain around until those jobs are completed.
It is reported that Thin opens multiple threads per connection over time, and I suppose each thread shows up as a process.
See if
thin restart -C /etc/thin/app.yml
helps.
See http://jordanhollinger.com/2011/04/22/how-to-use-thin-effectivly
Limiting your maximum connections and setting the timeout to the minimum will also help.
Suppose I include a rather long-running startup task in my Azure role, running for something like several minutes. What happens if the startup task runs "too long"?
I'm currently testing on the Compute Emulator and observe the following.
I have a 450-megabyte .zip file together with Info-Zip unzip. The startup task unzips the archive. Deployment starts and I watch Task Manager: numerous service processes start, then unzip.exe runs. After about two minutes all those processes stop, then start anew, and unzip.exe starts again.
So it looks like a deployment is allowed to run for about two minutes, then is forcefully reset and started again.
Is this the expected behavior? Does it also happen in the real cloud? Are there any hard limits on how long a role startup can take? How do I address this situation other than moving the unpacking into RoleEntryPoint.OnStart()?
I had the same question, so I tried an experiment. I ran a Startup Task - taskType="simple" so that it would block the Roles from beginning to execute - and let it run for 50 hours. The Fabric Controller did not complain and the portal did not show any error. It finished its long "do nothing" loop after the 50 hours were up, then the Startup Task exited, and my Web Role started up fine.
So my empirical test says Startup Tasks can take a long time! At least 50 hours.
This should inform the load balancer that your process is still busy:
http://msdn.microsoft.com/en-us/library/microsoft.windowsazure.serviceruntime.roleinstancestatuscheckeventargs.setbusy.aspx
I have run startup tasks that run for a pretty long time (think 20-30 mins) and the role is simply in a 'Busy' state. I don't think there is a hard limit for how long the role will stay in that state as long as the Startup task is still executing and did not exit with a non-zero return code (in fact, this is a gotcha for most first time startup task creators when they pop a prompt). The FC is technically still running just fine, so there would be no reason to 'recover' the role (i.e. heartbeats are still going).
The dev emulator just notices when the role hasn't started and warns you. If you click the 'keep waiting' option, it will continue to run the Startup task to completion. The cloud, of course, does not warn you like this.
I've never tried a task that ran super long, so there might be a very high limit. I seem to recall 3 hours was a magic number in some timeout cases like role recycles, but I have never tested it...
The Azure Fabric Agent performs heartbeat checks against the role. If these are not acknowledged (say, because of a long-running blocking process), the role could be flagged as unavailable.
You might try putting your startup process into a background thread that runs independently. This should help keep the role from being recycled while the process is starting up. Just keep in mind you may need to make some adjustments if you get requests before the role has fully started up. There's also a way (that I can't seem to recall at the moment) to flag the role and take it out of the load balancer temporarily while your process completes.