How does Windows service recovery work with the failure count? - windows

I am using the following command to configure the service failure recovery
sc failure "service" actions= ""/60000/restart/60000/run/120000 reset= 60 command = "\"c:\\windows\notepad2.exe
(I used notepad2.exe just for testing.)
From the Microsoft documentation:
Actions
This field contains an array of integer values that specify the
actions taken by the SCM if the service fails. Separate the values in
the array by [~]. The integer value in the Nth element of the array
specifies the action performed when the service fails for the Nth
time.
So my understanding is that the failure count decides the action: for the first failure Actions[0] will be executed, for the second Actions[1] will be executed, and for all subsequent failures Actions[2] will be executed.
I have the following configuration for the service to test this behavior:
Then I tried killing the process under which service is running by using taskkill.
Here is the first log
Then I tried starting the service manually.
Then I killed the service again after about 2 minutes (so the failure count should have been reset to 0, since the reset period is configured as 1 minute).
Here is the log for the error
In the figure above, it is clear why the count was reset to 0: the reset period is set to 60 seconds and the service had been running for more than 2 minutes.
But the recovery action taken is wrong: restarting the service is the action for the second failure, not for the first.
So why is the failure count 1 while the recovery action is the one configured for the second failure?

I was just playing around with a similar issue, and after I set the "Reset fail count after:" to "1" day, it seems to be working. A possible explanation is that by setting the "Reset fail count" to 1 day, it will not reset the fail count back to 0 after the first fail (which is you stopping and restarting the service manually), and lets it cycle through the rest of the actions (depending on conditions/actions). Your mileage may vary.
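To make the documented rule concrete, here is a minimal Python sketch of it: the Nth failure selects the Nth action, the last action repeats, and the count resets after a quiet period. It models only what the documentation describes and what the asker expected; it is not the SCM's actual bookkeeping, which the logs in the question show behaving differently. The names `action_for_failure` and `FailureCounter` are hypothetical, and measuring the reset period from the previous failure is an assumption.

```python
def action_for_failure(actions, failure_number):
    """Return the action for the Nth failure (1-based); the last action repeats."""
    if failure_number < 1:
        raise ValueError("failure_number is 1-based")
    index = min(failure_number, len(actions)) - 1
    return actions[index]


class FailureCounter:
    """Hypothetical counter: resets to 0 after reset_period seconds without a failure."""

    def __init__(self, reset_period):
        self.reset_period = reset_period
        self.count = 0
        self.last_failure = None

    def record_failure(self, now):
        # Assumption: the quiet period is measured from the previous failure.
        if self.last_failure is not None and now - self.last_failure >= self.reset_period:
            self.count = 0
        self.count += 1
        self.last_failure = now
        return self.count


# Mirrors the question's setup: no action, then restart, then run a command.
ACTIONS = ["none", "restart", "run_command"]
counter = FailureCounter(reset_period=60)
first = counter.record_failure(now=0)      # first kill
second = counter.record_failure(now=150)   # second kill, more than 60 s later
first_action = action_for_failure(ACTIONS, first)
second_action = action_for_failure(ACTIONS, second)
```

Under this model both kills count as failure number 1 and select the first action, which is what the asker expected rather than the restart that was observed.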

Related

I'm executing the script for 1 hour but it only runs for 10 minutes

I'm executing the script for 1 hour, but it only runs for 10 minutes. I have checked that the loop count is set to forever and that the test data is correct, and the whole script runs without any errors. I ran the script three times and validated everything, but I can't work out why this is happening.
How can I fix this and run the script for the full hour?
Normally you can find the reason for the termination of a thread or test in the jmeter.log file. If it is not there or is vague, you can increase the JMeter logging level to something more verbose.
The most common reasons for a premature end of the test are:
Not enough loops in the Thread Group to cover the anticipated duration of the test
Not enough test data, if the CSV Data Set Config is configured to stop the thread on EOF
The Thread Group is configured to stop the thread/test on a sampler error
A Flow Control Action sampler somewhere is configured to stop the thread/test
A runtime error such as OutOfMemoryError or StackOverflowError

Error working with "ScrollElasticsearchHttp" processor in NiFi

I am trying to retrieve data from an index in Elasticsearch. I configured the "QueryElasticsearchHttp" processor and it works just fine. However, when I try to use the ScrollElasticsearchHttp processor with the same URL, query, and index properties, and set 'scroll' to the default of 1 minute, it doesn't work.
I get a 404 error response: "Elasticsearch returned code 404 with message Not found".
I am also tailing the log on the ES cluster and I see this error:
[DEBUG][o.e.a.s.TransportSearchScrollAction] [2] Failed to execute query phase
org.elasticsearch.transport.RemoteTransportException:[127.0.0.1:9300][indices:data/read/search[phase/query+fetch/scroll]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [2]
at org.elasticsearch.search.SearchService.getExecutor(SearchService.java:457) ~[elasticsearch-7.5.2.jar:7.5.2]
I am on Apache NiFi 1.10.0
Here is the config for the processor:
I should see a total of 441 hits, and with page size 20 I should see 23 queries being made to ES.
But I don't get a single result back. I have tried higher values for "scroll" and also played around with "page size" to no avail.
I also noticed that even though the ScrollElasticsearchHttp processor is set to run every 1m, on the ES log I don't see any error log repeated every minute.
Update:
When I cleared the state via UI: "View state" -> "Clear State", I was able to make a single call, that returned a page full of hits in one flowfile.
However, there are more pages to be retrieved. How do I make the processor fetch the next page?
My understanding was that a single invocation of ScrollElasticsearchHttp would page through the whole result set and emit each page as one flowfile. Is this not correct?
Please decrease the scheduling interval to around 10-20 seconds, so that every 10-20 seconds the processor fetches the next set of records based on your page size. Each run retrieves one page, so the run interval should be shorter than the scroll keep-alive; otherwise the search context may expire between runs.
You can check the state value while fetching is in progress: you will find a scroll ID in it. Once fetching is complete, the state value changes to "finishedQuery" : true.
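The page-per-run behavior can be sketched with a small Python model. This is not NiFi's code: `run_once`, the `next_page` state key and the placeholder scroll ID are hypothetical, and only the `finishedQuery` flag and the 441-hit, page-size-20 figures come from the thread. It shows why 23 scheduled runs are needed before the state reports completion.

```python
import math


def expected_pages(total_hits, page_size):
    """Number of page requests needed to read every hit."""
    return math.ceil(total_hits / page_size)


def run_once(state, pages):
    """Model one scheduled run: return the next page, or None when finished."""
    if state.get("finishedQuery"):
        return None
    next_page = state.get("next_page", 0)  # hypothetical bookkeeping key
    if next_page >= len(pages):
        # Nothing left to fetch: replace the scroll state with the finished flag.
        state.clear()
        state["finishedQuery"] = True
        return None
    state["scrollId"] = "hypothetical-scroll-id"
    state["next_page"] = next_page + 1
    return pages[next_page]


hits = list(range(441))
pages = [hits[i:i + 20] for i in range(0, len(hits), 20)]

state = {}
flowfiles = []
while True:
    page = run_once(state, pages)
    if page is None:
        break
    flowfiles.append(page)
```

Each loop iteration stands in for one scheduled run, so with a 10-20 second schedule the full result set would take roughly 23 runs to arrive.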

Access denied calling EnumJobs

I'm trying to get the domain username of jobs in a printer queue on Windows Server 2012 R2 Standard. Code snippet below is in Delphi. OpenPrinter and EnumJobs are part of the Windows Spooler API.
Update! Setting maxJobs to a higher multiple of 4 allows more jobs in the queue to be enumerated, e.g. maxJobs=8 allows for two jobs but not three, and maxJobs=12 allows for three jobs.
Solved! It looks like I can just ignore the return value of EnumJobs, and simply see if the number of jobs it returns > 0 (the last argument when calling). This seems to work fine for all instances listed below, including printer via a share.
const
  maxJobs = 4;
var
  h : THandle;
  jia : array [1..maxJobs] of JOB_INFO_1;
  jiz, jic : DWord; // bytes needed (pcbNeeded), jobs returned (pcReturned)
begin
  if OpenPrinter('DocTest', h, nil) then
  begin
    if EnumJobs(h, 0, maxJobs, 1, @jia, SizeOf(jia), jiz, jic) then
    [...]
EnumJobs returns true or false depending on different conditions listed below. If it returns false in any of the following situations, the error message I'm retrieving is "System Error. Code: 5. Access is denied".
Clearly a permissions problem. I have assigned Print, Manage this Printer, and Manage documents to Everyone in the printer security settings. All jobs have been submitted after those settings have been assigned. My program is running in a session logged in as the domain administrator.
EnumJobs returns TRUE if I print a job from the same session I'm running this program in, and there's only one job in the queue. (See Update above for change)
EnumJobs returns TRUE if I print from another session on the server (it has terminal services installed) as any user, and there's only one job in the queue. (See Update above for change)
EnumJobs returns FALSE if there is more than one job in the queue. It doesn't matter if the jobs are for the same user or not. (See Update above for change)
EnumJobs returns FALSE if I print a job from another server to the printer share. Both servers are in the same domain. It doesn't matter which user prints the job, including the domain administrator.
What's going on here, in particular getting an access denied when enumerating more than (maxJobs / 4) job(s) at a time?
Ignore the return value of EnumJobs and inspect the out argument pcReturned to see if it's greater than 0. This indicates the number of print jobs found.

Understanding delayed_job status

I've implemented long-running tasks in my Rails app using delayed_job along with delayed_job_web. My delayed_job configuration instructs jobs to be attempted once, and for failures to be retained:
config/initializers/delayed_job.rb:
Delayed::Worker.max_attempts = 1
Delayed::Worker.destroy_failed_jobs = false
I tried 2 test jobs that automatically raised errors, in order to see how failures behave. What I get is the following:
My expectation was that Failed jobs would have a count of 2, but that Enqueued / Working / Pending would all be 0. I can't find any documentation on what determines whether a job is Enqueued / Working / Pending, or even what the difference between Working and Pending is (the web interface describes both lists as "contains jobs currently being processed".)
Can anyone provide some clarity?
If you check https://github.com/ejschmitt/delayed_job_web/blob/master/lib/delayed_job_web/application/app.rb , you see the following (starting line 114):
when :working
'locked_at is not null'
when :failed
'last_error is not null'
when :pending
'attempts = 0'
end
Enqueued would be the total number of delayed jobs, i.e. Delayed::Job.count
Working jobs are those that have been locked by the delayed_job process and are currently being worked.
Failed are those that have a last_error
Pending are those jobs that have never been attempted.
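To see how these conditions overlap, the following Python sketch runs the same WHERE clauses against an in-memory SQLite table. The three rows are made-up sample data, not the asker's jobs, and `status_counts` is a hypothetical helper; only the conditions and the "Enqueued is the total count" rule come from the answer above.

```python
import sqlite3

# The conditions quoted from delayed_job_web's app.rb.
CONDITIONS = {
    "working": "locked_at is not null",
    "failed": "last_error is not null",
    "pending": "attempts = 0",
}


def status_counts(conn):
    """Count rows per status; 'enqueued' is simply every row in the table."""
    total = conn.execute("select count(*) from delayed_jobs").fetchone()[0]
    counts = {"enqueued": total}
    for name, condition in CONDITIONS.items():
        query = f"select count(*) from delayed_jobs where {condition}"
        counts[name] = conn.execute(query).fetchone()[0]
    return counts


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table delayed_jobs ("
    "id integer primary key, attempts integer, locked_at text, last_error text)"
)
# Hypothetical sample rows: (attempts, locked_at, last_error)
rows = [
    (1, None, "RuntimeError"),                   # failed, lock released
    (1, "2020-01-01 00:00:00", "RuntimeError"),  # failed, lock still set
    (0, None, None),                             # never attempted
]
conn.executemany(
    "insert into delayed_jobs (attempts, locked_at, last_error) values (?, ?, ?)",
    rows,
)
counts = status_counts(conn)
```

Because the categories are independent filters, a retained failed job still counts toward Enqueued, and a job with both a lock and an error counts as Working and Failed at once.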

Schedules and `max_failures` attribute

I can't understand the max_failures attribute. From the documentation:
This attribute specifies the number of times a job can fail on consecutive scheduled runs before it is automatically disabled.
So, let's suppose I have a schedule. Its running count is 100. Its failure count is 18. Its max failures is 20.
Current run has finished successfully.
What I expect: if I break it, it will run exactly 20 times in state FAILED, after which it will be changed to BROKEN.
What I get: it runs 2 times, so the failure count reaches 20, and even though there were only 2 consecutive failed runs, the schedule is changed to state BROKEN.
What have I missed?
I think "consecutive scheduled runs" means exactly that. If it succeeds, the failure count should be reset to 0.
EDIT
Guess I was wrong, sorry.
Reading up: http://download.oracle.com/docs/cd/E11882_01/server.112/e17120/schedadmin004.htm
As per Gary's comment, it looks like you need to reset the failure count manually.
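The behavior reported in the question can be modeled in a few lines of Python. This is a sketch of the observed behavior only, not Oracle's implementation: the `ScheduledJob` class and the `SCHEDULED` state after a successful run are assumptions, while the 18/20 figures and the FAILED and BROKEN states come from the question.

```python
class ScheduledJob:
    """Hypothetical model: the failure count is cumulative and never auto-resets."""

    def __init__(self, max_failures, failure_count=0):
        self.max_failures = max_failures
        self.failure_count = failure_count
        self.state = "SCHEDULED"  # assumed state name for a healthy job

    def record_run(self, succeeded):
        if self.state == "BROKEN":
            return self.state  # a broken job no longer runs
        if succeeded:
            # A success does not reset failure_count in this model.
            self.state = "SCHEDULED"
            return self.state
        self.failure_count += 1
        if self.failure_count >= self.max_failures:
            self.state = "BROKEN"
        else:
            self.state = "FAILED"
        return self.state


job = ScheduledJob(max_failures=20, failure_count=18)
states = [
    job.record_run(succeeded=True),   # the successful run from the question
    job.record_run(succeeded=False),  # failure count becomes 19
    job.record_run(succeeded=False),  # failure count becomes 20
]
```

With the count carried over from earlier failures, two more failed runs are enough to reach max_failures, which matches what the asker saw.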
