Scrapinghub job failed - can't diagnose

The spider stopped in the middle of the crawl (after a 7-hour run, ~20K requests). The job status is "failure", even though there are no ERROR messages in the log. The log looks as if the code simply stopped running at a particular range of lines without any error being reported. It happened inside a spider_idle method override. Logging is enabled and I can see all the preceding INFO messages indicating a normal run of the spider. I don't know how to enable DEBUG messages in the Scrapinghub log.
I checked memory consumption - it is stable, at least in short tests; I am now waiting for the results of a long run.
How can I retrieve more info after the job has "failed"?
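For reference, here is a minimal sketch (spider name, URL and numbers are placeholders, not from the original project) showing how DEBUG logging and Scrapy's memory-usage warnings can be switched on from the spider itself, and how the spider_idle handler can log its own entry so the job log shows whether that is where things go quiet. These are standard Scrapy settings and signals; whether Scrapy Cloud honours per-spider overrides set this way is worth double-checking:

import scrapy
from scrapy import signals
from scrapy.exceptions import DontCloseSpider

class MySpider(scrapy.Spider):
    name = "my_spider"  # placeholder name
    start_urls = ["https://example.com"]  # placeholder URL

    # Standard Scrapy settings: LOG_LEVEL controls how much detail reaches the
    # job log; the MEMUSAGE_* values (examples only) make the memory-usage
    # extension warn before the platform kills the job.
    custom_settings = {
        "LOG_LEVEL": "DEBUG",
        "MEMUSAGE_ENABLED": True,
        "MEMUSAGE_WARNING_MB": 800,
    }

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super().from_crawler(crawler, *args, **kwargs)
        crawler.signals.connect(spider.on_idle, signal=signals.spider_idle)
        return spider

    def parse(self, response):
        self.logger.debug("parsed %s", response.url)

    def on_idle(self):
        # Log loudly on entry so the job log shows whether the idle handler
        # itself is where the spider stops making progress.
        self.logger.info("spider_idle fired")
        # Only keep the spider alive while there is genuinely more work to
        # schedule; otherwise let it close normally.
        if getattr(self, "keep_alive", False):
            raise DontCloseSpider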

Jobs on free accounts are stopped automatically after 24 hours. In that case the status is normally "cancelled" and the log shows a SIGTERM.
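To help tell a cancellation apart from a crash after the fact, Scrapy hands the close reason to the spider when it shuts down, so logging it is one cheap way to get more information into the job log. A small sketch (class and method body are illustrative only):

import scrapy

class MySpider(scrapy.Spider):
    name = "my_spider"  # placeholder name

    def closed(self, reason):
        # Scrapy calls this when the spider closes. `reason` is e.g.
        # 'finished' for a normal run, 'shutdown' after a SIGTERM/cancel,
        # or 'memusage_exceeded' if the memory-usage extension stopped it.
        self.logger.info("Spider closed, reason=%s", reason)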

Related

NiFi - Processor has stopped but task still running

It had been working fine until recently. Suddenly, processors still have tasks running even after being stopped, and the running tasks need to be terminated manually.
Any thoughts?
update 1
I use nipyapi to manipulate some processors, starting and stopping them over and over again. These are the APIs I used:
nipyapi.canvas.get_processor(identifier=p_id, identifier_type='id')
nipyapi.canvas.get_process_group(identifier=pg_id, identifier_type='id')
nipyapi.canvas.schedule_processor(processor=p_id, scheduled=True, refresh=True)
I restarted NiFi and the problem went away, but after executing those APIs many times (about 10000 times, per grep processor id | wc -l) the problem occurred again.
I reckon those API calls create a lot of web connections that never get closed.
Stopping a processor is really just telling the scheduler not to trigger any more executions. It is often the case that an already-triggered thread is still executing after the processor has been stopped, which is why the Terminate option was added.
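If the driving script needs to know when a processor has actually finished before flipping it again, one option is to poll the entity after scheduling it off. Below is a rough sketch using the same nipyapi calls as in the question; the function name is made up, and the attribute paths on the returned entity (component.state, status.aggregate_snapshot.active_thread_count) are assumptions based on the NiFi REST model, so check them against your nipyapi version:

import time
import nipyapi

def stop_and_drain(p_id, timeout=60, poll=2):
    # Fetch the full entity and ask the scheduler to stop the processor.
    proc = nipyapi.canvas.get_processor(identifier=p_id, identifier_type='id')
    nipyapi.canvas.schedule_processor(processor=proc, scheduled=False, refresh=True)

    deadline = time.time() + timeout
    while time.time() < deadline:
        proc = nipyapi.canvas.get_processor(identifier=p_id, identifier_type='id')
        # Assumed attribute paths mirroring the NiFi REST API's processor status.
        threads = proc.status.aggregate_snapshot.active_thread_count
        if proc.component.state == 'STOPPED' and threads == 0:
            return True  # no active threads left; safe to reschedule
        time.sleep(poll)
    return False  # still busy after the timeout; consider Terminate in the UI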

Sidekiq - view completed jobs

Is it possible to somehow view Sidekiq's list of completed jobs - for example, to find all PurchaseWorkers with params (1)? Yesterday a delayed method in my app that was supposed to run didn't, and the associated entity (let's say a 'purchase') got stuck in limbo in the "processing" state. I am trying to understand the reason: was the job never enqueued at all, or was it enqueued but exited unexpectedly for some reason? There were no errors in the Sidekiq log.
Thanks.
This is old, but I wanted to see the same thing since I'm not sure whether the jobs I scheduled ran or not!
It turns out Sidekiq doesn't have anything built in to see jobs that completed, and it still doesn't seem to.
If a job errored and never completed, it should be in the 'dead' queue. But checking that something actually ran seems to be beyond Sidekiq by default.
The FAQ suggests installing third-party plugins to track and log information: https://github.com/mperham/sidekiq/wiki/FAQ#how-can-i-tell-when-a-job-has-finished One of them provides a callback for follow-up work (you could, for example, record completed jobs elsewhere).
You can also set up Sidekiq to log somewhere other than STDOUT (the default) so you can output log information about your jobs - in this case, logging that a job completed, or catching errors if for some reason jobs never land in the retry or dead queues when there is a problem. See https://github.com/mperham/sidekiq/wiki/Logging
To see jobs still in a queue, you can use the Rails console and look at the queue by name: https://www.rubydoc.info/gems/sidekiq/Sidekiq/Queue
One option is the built-in stats provided by Sidekiq - https://github.com/mperham/sidekiq/wiki/Monitoring#using-the-built-in-dashboard
The best option is to use the Web UI provided here - https://github.com/mperham/sidekiq/wiki/Monitoring#web-ui

WebSphere Application Server on AIX not starting

While trying to start the WebSphere server, it throws the error:
Thread "server startup:2" (0000000a) has been active for 786590 milliseconds and may be hung.
How do I resolve this WSVR0605W error?
That error will always be displayed when a thread has been running for longer than a certain threshold. It does not necessarily indicate a real problem; it depends on what the applications on your server do. If you have an operation that runs within the same thread for more than 10 minutes, this warning will eventually pop up even though it is part of the normal operation of your application. If that is the case, you will later get a message of the form [10/29/13 8:23:11:008 CET] 00004709 ThreadMonitor W WSVR0606W: Thread "Thread_ID" (00004709) was previously reported to be hung but has completed. It was active for approximately 727968 milliseconds. There is/are 0 thread(s) in total in the server that still may be hung.
If you do not have anything that should be running for that long, the thread may indeed be hung and you need to investigate what operation exactly is hung. Start by examining the logs using the same thread ID as the one reported as possibly hung.

Heroku, apparent silent failure of sucker_punch

My app runs on Heroku with unicorn and uses sucker_punch to send a small quantity of emails in the background without slowing the web UI. This has been working pretty well for a few weeks.
I changed the unicorn config to the Heroku recommended config. The recommended config includes an option for the number of unicorn processes, and I upped the number of processes from 2 to 3.
Apparently that was too much. The sucker_punch jobs stopped running. I have log messages that indicate when they are queued and I have messages that indicate when they start processing. The log shows them being queued but the processing never starts.
My theory is that I exceeded memory by going from 2 to 3 unicorns.
I did not find a message anywhere indicating a problem.
Q1: Should I expect to find a failure message somewhere? Something like "attempting to start sucker_punch -- oops, not enough memory"?
Q2: Any suggestions on how I can be notified of a failure like this in the future?
Thanks.
If you are indeed exceeding dyno memory, you should find R14 or R15 errors in your logs. See https://devcenter.heroku.com/articles/error-codes#r14-memory-quota-exceeded
A more likely problem, though, given that you haven't found these errors, is that something within the perform method of your sucker punch worker is throwing an exception. I've found sucker punch tasks to be a pain to debug because it appears the lib swallows all exceptions silently. Try instantiating your task and calling perform on it from a rails console to make sure that it behaves as you expect.
For example, you should be able to do this without causing an exception:
task = YourTask.new
task.perform :something, 55

Delayed_job going into a busy loop when lodging second task

I am running delayed_job for a few background services, all of which, until recently, ran in isolation - e.g. send an email, write a report, etc.
I now have a need for one delayed_job, as its last step, to lodge another delayed_job.
delay.deploy() - when delayed_job runs this, it triggers a deploy action, the last step of which is to ...
delay.update_status() - when delayed_job runs this job, it checks the status of the deploy we started. If the deploy is still progressing, we call delay.update_status() again; if the deploy has stopped, we write the final deploy status to a db record.
Step 1 works fine - after 5 seconds, delayed_job fires up the deploy, which starts the deployment, and then calls delay.update_status().
But here, instead of update_status() starting after 5 seconds, delayed_job goes into a busy loop, firing off a stream of update_status calls and looping hard without pause.
I can see the logs filling up with all these calls, the server slows down, until the end-condition for update_status is reached (deploy has eventually succeeded or failed), and things get quiet again.
Am I using delayed_job's delay() incorrectly, or am I missing a basic tenet of this use case?
OK, it turns out this is "expected behaviour": if you are already inside code running for a delayed_job and you call .delay() again without specifying a delay, it will run immediately. You need to add the run_at parameter:
delay(queue: :deploy, run_at: 10.seconds.from_now).check_status
See the discussion in Google Groups.
