nomad: summary has failed=1 and complete=1 at the same time

I ran one dispatched job that exits with code 0, but the job summary shows it as both failed and complete at the same time.
# nomad status run-packaging-a8173887-b37f-4273-9ad7-8691654bb5d4/dispatch-1503480991-c1bd3b3f
ID            = run-packaging-a8173887-b37f-4273-9ad7-8691654bb5d4/dispatch-1503480991-c1bd3b3f
Name          = run-packaging-a8173887-b37f-4273-9ad7-8691654bb5d4/dispatch-1503480991-c1bd3b3f
Submit Date   = 08/23/17 09:36:31 UTC
Type          = batch
Priority      = 50
Datacenters   = mhd
Status        = dead
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
system      0       0         0        1       1         0

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created At
aec6e219  769ff893  system      0        run      complete  08/23/17 09:36:31 UTC
And alloc-status:
# nomad alloc-status aec6e219
Recent Events:
Time                   Type        Description
08/23/17 09:37:16 UTC  Terminated  Exit Code: 0
08/23/17 09:36:31 UTC  Started     Task started by client
08/23/17 09:36:31 UTC  Task Setup  Building Task Directory
08/23/17 09:36:31 UTC  Received    Task received by client
How do I explain this result? How is it possible, and why does a failure show up for a successful task?

It failed once and completed on a second try. The summary counts every allocation the job has had, so the earlier failed attempt and the later successful one each show up in their own column; the failed allocation itself has likely been garbage-collected already, which is why only the complete one appears under Allocations.
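If you want to dig into what happened, newer Nomad versions can show the job's evaluations and older allocations; the flags below come from the nomad status documentation and may not exist in a 0.6-era binary, so treat them as an assumption to verify against your version:

# List the evaluations that placed allocations for this job
nomad status -evals run-packaging-a8173887-b37f-4273-9ad7-8691654bb5d4/dispatch-1503480991-c1bd3b3f

# Include allocations from older instances of the job, e.g. an earlier failed attempt
nomad status -all-allocs run-packaging-a8173887-b37f-4273-9ad7-8691654bb5d4/dispatch-1503480991-c1bd3b3f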

Related

Changing a Task's Title or Date doesn't change the Task's modified time - Google Tasks API

I am trying to troubleshoot why I cannot get changes to Task titles and dates from Google Tasks based on the query param updatedMin.
In the following scenarios, all changes made to tasks are done in the Google Tasks flyout on calendar.google.com or via the Android Tasks app.
Fail Scenarios
Fail Scenario 1:
I have a Task A in Google Tasks with the title "foo"
I have a datetime called lastSync = 2022-04-04T04:24:02.773Z
I then change Task A's title to "bar" at 2022-04-04T04:25:12.773Z - a minute and 10 seconds after lastSync
I then run the following query:
import { google } from "googleapis";

const taskClient = google.tasks({ version: "v1", auth: oauth2Client });

if (list.updated) {
  const updated = GoogleTaskClient.dateFromRfc339(list.updated);
  if (updated > lastSync) {
    const res = await taskClient.tasks.list({
      tasklist: list.id,
      updatedMin: formatRFC3339(lastSync),
      showHidden: true,
      showDeleted: true,
      showCompleted: true,
      maxResults: 100,
    });
  }
}
and the response has zero items.
Fail Scenario 2:
I have a Task A in Google Tasks with the title "foo"
I have a datetime called lastSync = 2022-04-04T04:24:02.773Z
I then change Task A's date at 2022-04-04T04:25:12.773Z - a minute and 10 seconds after lastSync
run the query
and the response has zero items.
Success Scenarios
Success Scenario 1:
I have a Task A in Google Tasks with the title "foo"
I have a datetime called lastSync = 2022-04-04T04:24:02.773Z
I then mark Task A as complete at 2022-04-04T04:25:12.773Z - a minute and 10 seconds after lastSync
run the query
and the response includes Task A.
Success Scenario 2:
I have a Task A in Google Tasks with the title "foo"
I have a datetime called lastSync = 2022-04-04T04:24:02.773Z
I then change Task A's title to "bar" at 2022-04-04T04:25:12.773Z - a minute and 10 seconds after lastSync
I then change Task A's date at 2022-04-04T04:25:15.773Z
run the query
and the response has Task A with the changes.
Summary
Changing the status of a Task always results in it being returned by the query, but changes to date and title don't appear to work with updatedMin.
Is this a known limitation of the Tasks API? If so, can you point me to some references?
I realized my error; I didn't mention this part 🤦‍♂️...
I was only getting tasks from TaskLists that were updated after lastSync:
const taskClient = google.tasks({ version: "v1", auth: oauth2Client });

if (list.updated) {
  const updated = GoogleTaskClient.dateFromRfc339(list.updated);
  if (updated > lastSync) { // <-- the culprit: gates on the parent list's updated time
    const res = await taskClient.tasks.list({
      tasklist: list.id,
      updatedMin: formatRFC3339(lastSync),
      showHidden: true,
      showDeleted: true,
      showCompleted: true,
      maxResults: 100,
    });
  }
}
The TaskList object from the API has an updated prop (string): "Last modification time of the task list (as a RFC 3339 timestamp)".
My error came from thinking that changing a Task's title or time would also change the updated prop of the parent list. In practice, only changing a task's status or deleting a task appears to do that (I think changing the order does as well). Changes to title, description, or time only update the respective Task's own updated prop.
The docs could clarify what qualifies as an update to a TaskList: https://developers.google.com/tasks/reference/rest/v1/tasklists#TaskList
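For anyone hitting the same thing, here is a minimal sketch of the corrected sync loop, assuming the same oauth2Client, lastSync, and formatRFC3339 helper as in the snippets above: query every list with updatedMin and let the API do the filtering, instead of gating on the parent list's updated prop.

import { google } from "googleapis";

const taskClient = google.tasks({ version: "v1", auth: oauth2Client });

const lists = await taskClient.tasklists.list({ maxResults: 100 });
for (const list of lists.data.items ?? []) {
  if (!list.id) continue;
  // Always ask for tasks changed since lastSync; don't gate on list.updated,
  // since title/date edits don't bump the parent list's updated prop.
  const res = await taskClient.tasks.list({
    tasklist: list.id,
    updatedMin: formatRFC3339(lastSync),
    showHidden: true,
    showDeleted: true,
    showCompleted: true,
    maxResults: 100,
  });
  // res.data.items now includes tasks whose title or date changed
}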

Why aren't the Superset Alert Mails working, even after setting all the required configurations?

I am running Superset on an EC2 instance. In my config.py file, I made only these changes:
FEATURE_FLAGS = {
    "ALERT_REPORTS": True
}
EMAIL_NOTIFICATIONS = True
SMTP_HOST = "email-smtp.us-east-1.amazonaws.com"
SMTP_STARTTLS = True
SMTP_SSL = False
SMTP_USER = "***my user***"
SMTP_PORT = 25
SMTP_PASSWORD = "***my pass***"
SMTP_MAIL_FROM = "***an email ID***"
ENABLE_SCHEDULED_EMAIL_REPORTS = True
ENABLE_ALERTS = True
After setting these, I remembered to run superset init before launching the service.
Yet, after the scheduled time, the UI shows no value in the last run column and the logs give the following message:
DEBUG:cron_descriptor.GetText:Failed to find locale en_US
INFO:werkzeug:127.0.0.1 - - [02/Apr/2021 10:56:51] "GET /api/v1/report/?q=(filters:!((col:type,opr:eq,value:Alert)),order_column:name,order_direction:desc,page:0,page_size:25) HTTP/1.1" 200 -
[Screenshot of the UI omitted.] As can be seen, there is nothing in the last run column, even after the scheduled time (I also tried scheduling it at a 1-minute interval, with the same results).
Alerts/reports are executed in the workers, with celery beat scheduling the jobs and celery workers executing them. You need to configure celery beat and the workers to run; check out this documentation for info on how to set this up with docker-compose: https://superset.apache.org/docs/installation/alerts-reports
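For reference, a minimal sketch of the celery side of superset_config.py, loosely following that documentation; the Redis URLs and schedule values are placeholders to adapt to your setup:

from celery.schedules import crontab

class CeleryConfig:
    broker_url = "redis://localhost:6379/0"          # placeholder broker
    result_backend = "redis://localhost:6379/0"
    imports = ("superset.sql_lab", "superset.tasks")
    beat_schedule = {
        # Polls for alerts/reports that are due to run
        "reports.scheduler": {
            "task": "reports.scheduler",
            "schedule": crontab(minute="*", hour="*"),
        },
        # Cleans up old report execution logs
        "reports.prune_log": {
            "task": "reports.prune_log",
            "schedule": crontab(minute=0, hour=0),
        },
    }

CELERY_CONFIG = CeleryConfig

With that in place, a worker and a beat process still need to run alongside the web server, e.g. celery --app=superset.tasks.celery_app:app worker and celery --app=superset.tasks.celery_app:app beat.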

Problem with nomad job deployment (raw_exec mode, v1.0.1)

A recent update from Nomad v0.9.6 to v1.0.1 breaks a job deployment.
Unfortunately I couldn't get any usable info from the Nomad agent about the "pending or dead" status.
I also checked the trace monitor in the web UI, but without success.
Could you give some advice on how to get the reject/pending reason from the agent?
I use the "raw_exec" driver (non-privileged user, "driver.raw_exec.enable" = "1").
For deployment I use nomad-sdk (version 0.11.3.0).
You can find the job definition (from Nomad's point of view) here:
https://pastebin.com/ZXiaM9RW
OS details:
cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core)
Linux blade1.lab.bulb.hr 3.10.0-693.21.1.el7.x86_64 #1 SMP Wed Mar 7 19:03:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Nomad agent details:
[root@blade1 ~]# nomad node-status
ID        DC   Name                Class   Drain  Eligibility  Status
5838e8b0  dc1  blade1.lab.bulb.hr  <none>  false  eligible     ready

Verbose output:
[root@blade1 ~]# nomad node-status -verbose
ID                                    DC   Name                Class   Address         Version  Drain  Eligibility  Status
5838e8b0-ebd3-5c47-a949-df3d601e0da1  dc1  blade1.lab.bulb.hr  <none>  192.168.112.31  1.0.1    false  eligible     ready

[root@blade1 ~]# nomad node-status -verbose 5838e8b0-ebd3-5c47-a949-df3d601e0da1
ID              = 5838e8b0-ebd3-5c47-a949-df3d601e0da1
Name            = blade1.lab.bulb.hr
Class           = <none>
DC              = dc1
Drain           = false
Eligibility     = eligible
Status          = ready
CSI Controllers = <none>
CSI Drivers     = <none>
Uptime          = 1516h1m31s

Drivers
Driver    Detected  Healthy  Message                             Time
docker    false     false    Failed to connect to docker daemon  2020-12-18T14:37:09+01:00
exec      false     false    Driver must run as root             2020-12-18T14:37:09+01:00
java      false     false    Driver must run as root             2020-12-18T14:37:09+01:00
qemu      false     false    <none>                              2020-12-18T14:37:09+01:00
raw_exec  true      true     Healthy                             2020-12-18T14:37:09+01:00

Node Events
Time                       Subsystem  Message          Details
2020-12-18T14:37:09+01:00  Cluster    Node registered  <none>

Allocated Resources
CPU          Memory      Disk
0/18000 MHz  0 B/53 GiB  0 B/70 GiB

Allocation Resource Utilization
CPU          Memory
0/18000 MHz  0 B/53 GiB

Host Resource Utilization
CPU            Memory         Disk
499/20000 MHz  33 GiB/63 GiB  (/dev/mapper/vg00-root)

Allocations
No allocations placed
Attributes
consul.datacenter = dacs
consul.revision = 1e03567d3
consul.server = true
consul.version = 1.8.5
cpu.arch = amd64
driver.raw_exec = 1
kernel.name = linux
kernel.version = 3.10.0-693.21.1.el7.x86_64
memory.totalbytes = 67374776320
nomad.advertise.address = 192.168.112.31:5656
nomad.revision = c9c68aa55a7275f22d2338f2df53e67ebfcb9238
nomad.version = 1.0.1
os.name = centos
os.signals = SIGTTIN,SIGUSR2,SIGXCPU,SIGBUS,SIGILL,SIGQUIT,SIGCHLD,SIGIOT,SIGKILL,SIGINT,SIGSTOP,SIGSYS,SIGTTOU,SIGFPE,SIGSEGV,SIGTSTP,SIGURG,SIGWINCH,SIGCONT,SIGIO,SIGTRAP,SIGXFSZ,SIGHUP,SIGPIPE,SIGTERM,SIGPROF,SIGABRT,SIGALRM,SIGUSR1
os.version = 7.4.1708
unique.cgroup.mountpoint = /sys/fs/cgroup/systemd
unique.consul.name = grabber1
unique.hostname = blade1.lab.bulb.hr
unique.network.ip-address = 192.168.112.31
unique.storage.bytesfree = 74604830720
unique.storage.bytestotal = 126698909696
unique.storage.volume = /dev/mapper/vg00-root
Meta
connect.gateway_image = envoyproxy/envoy:v${NOMAD_envoy_version}
connect.log_level = info
connect.proxy_concurrency = 1
connect.sidecar_image = envoyproxy/envoy:v${NOMAD_envoy_version}
Job status details:
[root@blade1 ~]# nomad status
ID                                      Type     Priority  Status   Submit Date
lightningCollector-lightningCollector   service  50        pending  2020-12-18T15:06:09+01:00

[root@blade1 ~]# nomad status lightningCollector-lightningCollector
ID            = lightningCollector-lightningCollector
Name          = lightningCollector-lightningCollector
Submit Date   = 2020-12-18T15:06:09+01:00
Type          = service
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = pending
Periodic      = false
Parameterized = false

Summary
Task Group                               Queued  Starting  Running  Failed  Complete  Lost
lightningCollector-lightningCollector-0  0       0         0        0       0         0

Allocations
No allocations placed
Thank you for your effort and time!
Regards,
Ivan
I tested your job locally and was able to reproduce your experience. I noticed that ParentID was set in the job; Nomad uses ParentID to track child instances of periodic or dispatch jobs.
After setting the ParentID value to "", I was able to submit the job, and it evaluated and scheduled properly.
I did some testing across versions and determined that the behavior changed between 0.12.0 and 0.12.1. I filed hashicorp/nomad #10422 in response to this difference in behavior.
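If the job is built through nomad-sdk, the workaround might look like the sketch below; the class and method names are assumptions based on the Java SDK's generated API model (and buildLightningCollectorJob is a hypothetical stand-in for your existing job construction), so verify them against version 0.11.3.0:

import com.hashicorp.nomad.apimodel.Job;
import com.hashicorp.nomad.javasdk.NomadApiClient;
import com.hashicorp.nomad.javasdk.NomadApiConfiguration;

NomadApiClient client = new NomadApiClient(
        new NomadApiConfiguration.Builder()
                .setAddress("http://192.168.112.31:4646")
                .build());

Job job = buildLightningCollectorJob(); // hypothetical: your existing job construction
job.setParentId("");                    // clear ParentID so Nomad treats it as a top-level job
client.getJobsApi().register(job);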

JMeter - UPDATED - Duration Assertion on While Controller (JDBC Sampler)?

My current environment: JMeter v2.11, remote Oracle 12, JDK 7.
There is a system (A) that sends 2000 SOAP/XML submissions per hour into a receiving system (B). System B inserts a new row into the database table for each new submission, setting the application.status column to a numeric value of 1. System B processes the requests and updates application.status from 1 to 6 once processing is complete and the submission is 'approved'.
I have a requirement that states these A-to-B submissions need to be 'approved' within 60 seconds, and I am trying to set up my thread to verify this.
My current workings (after some start up help from Dmitri T) are as follows:
Thread Group
-Beanshell Sampler (to create an XML message)
-Beanshell Sampler (to submit XML to a web service)
-While Controller-->${__javaScript("${status_1}" != "6")}
--Duration Assertion-->60000 milliseconds (Duration)
--JDBC Request-->select status from application where applicationID = (select max(application_id) from application); VarName = status
Currently, my Thread Group will run and multiple JDBC Requests execute until either a JDBC Request takes longer than the Duration Assertion value OR the status value in the application table is updated to 6 (which equates to 'Approved' status).
This is NOT what I need.
I don't want to verify whether a single JDBC request takes longer than the Duration value; it never will. What I need the Duration Assertion for is to fail if the change from application.status=1 to application.status=6 takes longer than 60 seconds.
I've tried the following:
Thread Group
-While Controller-->${__javaScript("${status_1}" != "6")}
--Duration Assertion-->60000 milliseconds (Duration)
--JDBC Request-->select status from application where applicationID = (select max(application_id) from application); VarName = status
Thread Group
-While Controller-->${__javaScript("${status_1}" != "6")}
--JDBC Request-->select status from application where applicationID = (select max(application_id) from application); VarName = status
--Duration Assertion-->60000 milliseconds (Duration)
Thread Group
-While Controller-->${__javaScript("${status_1}" != "6")}
--JDBC Request-->select status from application where applicationID = (select max(application_id) from application); VarName = status
---Duration Assertion-->60000 milliseconds (Duration)
I'm running out of ideas! As with my previous requests, I appreciate any help anyone can provide.
Cheers!
Just move your Duration Assertion one level up (to the same level as the JDBC Request, not as a child of the JDBC Request); that way it is applied to the While Controller duration, not to a single request.
To learn more about assertion scope, cost and interoperability, see the How to Use JMeter Assertions in 3 Easy Steps guide.

Detect which worker returned a TTR-expired job to the queue?

I have multiple workers processing requests in a beanstalkd queue using beanstalk-client-ruby.
For testing purposes, the workers randomly dive into an infinite loop after picking up a job from the queue.
Beanstalk notices that a job has been reserved for too long and returns it to the queue for other workers to process.
How could I detect that this has happened, so that I can kill the malfunctioning worker?
Looks like I can detect that a timeout has happened:
> job.timeouts
=> 0
> sleep 10
=> nil
> job.timeouts
=> 1
Now, how can I do something like this:
> job=queue.reserve
=> 189
> job.MAGICAL_INFO_STORE[:previous_worker_pid] = $$
=> extraordinary magic happened
> sleep 10
=> nil
> job=queue.reserve
=> 189
> job.timeouts
=> 1
> kill_the_sucker(job.MAGICAL_INFO_STORE[:previous_worker_pid])
=> nil
Found a working solution myself (see the sketch after these steps):
Reserve a job
Set up a new tube named after the job_id
Push a job with your PID in the body to the new tube
When a job with timeouts > 0 is found, pop the PID-task from the job_id tube
Kill the worker
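A minimal Ruby sketch of that approach with beanstalk-client-ruby; the owner-<id> tube name and kill logic are illustrative, and the exact client API (reserve-with-timeout in particular) may differ between gem versions:

require 'beanstalk-client'

pool = Beanstalk::Pool.new(['127.0.0.1:11300'])

job = pool.reserve

if job.timeouts > 0
  # The job has timed out at least once, so the side tube still holds the
  # PID of the worker that reserved it last: pop that entry and kill the worker.
  pool.watch("owner-#{job.id}")
  stale = pool.reserve(0)              # reserve with a 0s timeout so we don't block
  Process.kill('KILL', Integer(stale.body))
  stale.delete
  pool.ignore("owner-#{job.id}")
end

# Record ourselves as the current owner before starting the real work.
pool.use("owner-#{job.id}")
pool.put($$.to_s)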
