Marathon: Placement Constraints in Mesosphere Jobs - mesos

How can I add placement constraints to Mesosphere jobs?
I know there is a GUI way to add them to Services; what I need is how to do the same for Jobs.

This is possible, but it may not be exposed in the GUI. See the example below from the README:
{
  "id": "sample-job",
  "description": "A sample job that sleeps",
  "run": {
    "cmd": "sleep 1000",
    "cpus": 0.01,
    "mem": 32,
    "disk": 0,
    "placement": {
      "constraints": [
        {
          "attribute": "hostname",
          "operator": "LIKE",
          "value": "<host-name>"
        }
      ]
    }
  },
  "schedules": [
    {
      "id": "sample-schedule",
      "enabled": true,
      "cron": "0 0 * * *",
      "concurrencyPolicy": "ALLOW"
    }
  ]
}
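As far as I know, the Jobs placement block also accepts an UNLIKE operator, and the LIKE value is treated as a regular expression, so a constraint that keeps the job off a particular node should look roughly like the sketch below (<host-name-to-avoid> is just a placeholder, not something from the original job):
"placement": {
  "constraints": [
    {
      "attribute": "hostname",
      "operator": "UNLIKE",
      "value": "<host-name-to-avoid>"
    }
  ]
}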

Related

Step Functions ignoring Parameters, executing it with default input

I don't know if this is a bug of my own or Amazon's, as I'm unable to properly understand what's going on.
I have the following Step Function (currently a work in progress), which validates an input and then sends it to an SQS-backed Lambda (the CreateInAuth0 step).
My problem is that the SQS function apparently executes with the default parameters, fails, retries, and only then uses the ones specified in the Step Function (or at least that's what I gather from the logs).
I want to know why it ignores the Parameters on the first attempt. Here's all the information I can gather.
There seems to be a difference between TaskStateEntered and TaskScheduled (which I believe is OK).
{
  "Comment": "Add the students to the platform and optionally adds them to their respective grades",
  "StartAt": "AddStudents",
  "States": {
    "AddStudents": {
      "Comment": "Validates that the students are OK",
      "Type": "Task",
      "Resource": "#AddStudents",
      "Next": "Parallel"
    },
    "Parallel": {
      "Type": "Parallel",
      "Next": "FinishExecution",
      "Branches": [
        {
          "StartAt": "CreateInAuth0",
          "States": {
            "CreateInAuth0": {
              "Type": "Task",
              "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
              "Parameters": {
                "QueueUrl": "#SQS_StudentsRegistrationSQSFifo",
                "MessageBody": {
                  "execId.$": "$.result.execId",
                  "tenantType.$": "$.result.tenantType",
                  "tenantId.$": "$.result.tenantId",
                  "studentsToCreate.$": "$.result.newStudents.success",
                  "TaskToken.$": "$$.Task.Token",
                  "studentsSucceeded": []
                }
              },
              "End": true
            }
          }
        }
      ]
    },
    "FinishExecution": {
      "Comment": "This signals the ending in DynamoDB that the transaction finished",
      "Type": "Task",
      "End": true,
      "Resource": "arn:aws:states:::dynamodb:putItem",
      "ResultPath": "$.dynamoDB",
      "Parameters": {
        "TableName": "#SchonDB",
        "Item": {
          "pk": {
            "S.$": "$.arguments.tenantId:exec:$.id"
          },
          "sk": {
            "S": "result"
          },
          "status": {
            "S": "FINISHED"
          },
          "type": {
            "S": "#EXEC_TYPE"
          }
        }
      },
      "Retry": [
        {
          "ErrorEquals": [
            " States.Timeout"
          ],
          "IntervalSeconds": 1,
          "MaxAttempts": 5,
          "BackoffRate": 2.0
        }
      ],
      "Catch": [
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "FinishExecutionIfFinishErrored"
        }
      ]
    },
    "FinishExecutionIfFinishErrored": {
      "Type": "Pass",
      "End": true
    }
  }
}
Here's the visual and the execution event history (screenshots not included).
Note: I have a try/catch statement inside the SQS function that wraps its entire execution.

Automatically add Cloudwatch Alarms and Dashboards for new EC2 instances

I created a CloudWatch alarm and dashboard for the disk space on my EC2 instance, which is fine until autoscaling spins up a new instance and I have to manually edit the alarm and dashboard with the new instance ID.
I suspect the solution lies in setting up a CloudWatch Events rule for when a new instance starts, combined with a Lambda function, although I'd really like some help with the Lambda side of things, if anyone can advise.
# CloudWatch Alarm
{
  "region": "eu-west-1",
  "metrics": [
    [ "CWAgent", "disk_used_percent", "path", "/", "InstanceId", "i-1234567890", "AutoScalingGroupName", "my-autoscalinggrp", "ImageId", "ami-a1b2c3d4e5", "InstanceType", "t3.micro", "device", "nvme0n1p1", "fstype", "xfs", { "stat": "Average" } ]
  ],
  "view": "timeSeries",
  "stacked": false,
  "period": 300,
  "annotations": {
    "horizontal": [
      {
        "label": "disk_used_percent >= 80 for 1 datapoints within 5 minutes",
        "value": 80
      }
    ]
  },
  "title": "disk-used-alarm"
}
# Dashboard
{
  "metrics": [
    [ "CWAgent", "disk_used_percent", "path", "/", "InstanceId", "i-1234567890", "AutoScalingGroupName", "my-autoscalinggrp", "ImageId", "ami-a1b2c3d4e5", "InstanceType", "t3.small", "device", "nvme0n1p1", "fstype", "xfs", { "label": "Server1" } ]
  ],
  "view": "singleValue",
  "stacked": false,
  "region": "eu-west-1",
  "stat": "Average",
  "period": 300,
  "title": "Disk Usage Dashboard"
}
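One way to wire up the Lambda side, as the question suggests, is a CloudWatch Events / EventBridge rule that fires when an instance enters the running state; the Lambda target can then read the instance ID from the event detail and call the CloudWatch PutMetricAlarm API (and the dashboard APIs) for that instance. The event pattern below is only a sketch of that rule, not part of the original question:
{
  "source": ["aws.ec2"],
  "detail-type": ["EC2 Instance State-change Notification"],
  "detail": {
    "state": ["running"]
  }
}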

NiFi: ReplaceText alternatives to modify JSON

My NiFi application receives two somewhat different types of JSON.
The first of them looks like:
[
  {
    "campaign": {
      "resourceName": "customers/8952771329/campaigns/11381694617",
      "status": "ENABLED",
      "name": "Saint_Spring_Active Minerals_oct-nov_2020_trueview_skip_5766500views",
      "id": "11381694617"
    },
    "metrics": {
      "interactionEventTypes": [
        "VIDEO_VIEW"
      ],
      "clicks": "6",
      "videoQuartileP100Rate": 0.44493171079034244,
      "videoQuartileP25Rate": 0.9747718298919024,
      "videoQuartileP50Rate": 0.7339309987701469,
      "videoQuartileP75Rate": 0.5337562301767105,
      "videoViewRate": 0.4471109114825628,
      "videoViews": "27872",
      "viewThroughConversions": "0",
      "contentBudgetLostImpressionShare": 0.0000013066088274492382,
      "contentImpressionShare": 0.0999,
      "contentRankLostImpressionShare": 0.9001,
      "conversionsValue": 0,
      "conversions": 0,
      "costMicros": "9338700950",
      "ctr": 0.00009624947864865732,
      "currentModelAttributedConversions": 0,
      "currentModelAttributedConversionsValue": 0,
      "engagementRate": 0,
      "engagements": "0"
    },
    "segments": {
      "device": "CONNECTED_TV",
      "date": "2020-12-20"
    }
  }
]
And the second:
[
  {
    "adGroup": {
      "resourceName": "customers/5404177717/adGroups/110501283582",
      "campaign": "customers/5404177717/campaigns/11628802542"
    },
    "metrics": {
      "interactionEventTypes": [
        "CLICK"
      ],
      "clicks": "1",
      "averageCpm": 95497428.02172929,
      "gmailForwards": "0",
      "gmailSaves": "0",
      "gmailSecondaryClicks": "0",
      "impressions": "4418",
      "interactionRate": 0.00022634676324128565,
      "interactions": "1"
    },
    "adGroupAd": {
      "resourceName": "customers/5404177717/adGroupAds/110501283582~480227690139",
      "status": "ENABLED",
      "ad": {
        "resourceName": "customers/5404177717/ads/480227690139",
        "id": "480227690139",
        "name": "20 sec perek"
      },
      "adGroup": "customers/5404177717/adGroups/110501283582"
    },
    "segments": {
      "device": "DESKTOP",
      "date": "2020-11-21"
    }
  }
]
I already have two tables in my database to save this data, and I use a table.name attribute so I don't have to duplicate the same processing block where only the table name differs.
My next processor is FlattenJson. After that I'm using ReplaceText with the following search value (the replacement value is an empty string): (customers\\\/${client.customer.id}\\\/campaigns\\\/|customers\\\/${client.customer.id}\\\/adGroups\\\/).
Why? From the line "adGroup": "customers/5404177717/adGroups/110501283582" I only need the last value, 110501283582, as ad_group_id, and from "campaign": "customers/5404177717/campaigns/11628802542" I only need 11628802542. ${client.customer.id} can differ, so I'm using Expression Language.
I also need to rename the JSON field adGroup to ad.group.id, and for that I'm using another ReplaceText.
Can I do this faster, without two ReplaceText processors?
Take a look at the following processors; I think they can serve as an alternative:
JoltTransformJSON:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache.nifi.processors.standard.JoltTransformJSON/
UpdateRecord:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache.nifi.processors.standard.UpdateRecord/index.html
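For example, a JoltTransformJSON chain can do both the ID extraction and the rename in a single processor. The spec below is only an illustration, not a drop-in solution: it assumes a flattened record with a single adGroup field holding the full resource path (adjust the field name to whatever FlattenJson actually produces in your flow). It splits the path on /, keeps the last segment, and shifts it into ad_group_id while passing the other fields through unchanged:
[
  {
    "operation": "modify-overwrite-beta",
    "spec": {
      "adGroup": "=split('/', @(1,adGroup))"
    }
  },
  {
    "operation": "modify-overwrite-beta",
    "spec": {
      "adGroup": "=lastElement(@(1,adGroup))"
    }
  },
  {
    "operation": "shift",
    "spec": {
      "adGroup": "ad_group_id",
      "*": "&"
    }
  }
]
UpdateRecord is the record-oriented alternative if you want to keep a schema for the two JSON types; the exact record paths would depend on that schema.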

Elasticsearch queries consuming 100% of CPU

I'm still relatively new to Elasticsearch. I'm currently attempting to switch from Solr to Elasticsearch, and I'm seeing a huge increase in CPU usage when ES serves our production website. The site sends anywhere from 10,000 to 30,000 requests per second to ES; Solr handles that load just fine on our current hardware.
The books index mapping: https://pastebin.com/bKM9egPS
A query for a book: https://pastebin.com/AdfZ895X
ES is hosted on AWS on an m4.xlarge.elasticsearch instance.
Our cluster is set up as follows (anything not included is default):
"persistent": {
"cluster": {
"routing": {
"allocation": {
"cluster_concurrent_rebalance": "2",
"node_concurrent_recoveries": "2",
"disk": {
"watermark": {
"low": "15.0gb",
"flood_stage": "5.0gb",
"high": "10.0gb"
}
},
"node_initial_primaries_recoveries": "4"
}
}
},
"indices": {
"recovery": {
"max_bytes_per_sec": "60mb"
}
}
Our nodes have the following configuration:
"_nodes": {
"total": 2,
"successful": 2,
"failed": 0
},
"cluster_name": "cluster",
"nodes": {
"####": {
"name": "node1",
"version": "6.3.1",
"build_flavor": "oss",
"build_type": "zip",
"build_hash": "####",
"roles": [
"master",
"data",
"ingest"
]
},
"###": {
"name": "node2",
"version": "6.3.1",
"build_flavor": "oss",
"build_type": "zip",
"build_hash": "###",
"roles": [
"master",
"data",
"ingest"
]
}
}
Can someone please help me figure out what exactly is happening so I can get this deployment finished?

Rasa NLU tutorial doesn't work

Rasa NLU version: 0.11.3
Used backend / pipeline: spacy_sklearn
Operating system: OS X
Issue: I tried to follow the tutorial at https://rasahq.github.io/rasa_nlu/tutorial.html?highlight=project#:
Installed spaCy + sklearn
Created config_spacy.json
Downloaded the sample file and trained
I've tested the greeting and goodbye intents and they work,
but when I test with the command:
curl -X POST localhost:5000/parse -d '{"q":"I am looking for Mexican food"}' | python -m json.tool
it returns:
{
  "intent": {
    "name": "None",
    "confidence": 1.0
  },
  "entities": [],
  "text": "yes"
}
Content of configuration file (if used & relevant):
{
  "project": null,
  "fixed_model_name": null,
  "config": "config.json",
  "data": null,
  "emulate": null,
  "language": "en",
  "log_file": null,
  "log_level": "INFO",
  "mitie_file": "data/total_word_feature_extractor.dat",
  "spacy_model_name": null,
  "num_threads": 1,
  "max_training_processes": 1,
  "path": "/rasa_nlu/projects",
  "port": 5000,
  "token": null,
  "cors_origins": [],
  "max_number_of_ngrams": 7,
  "pipeline": [],
  "response_log": "/rasa_nlu/logs",
  "storage": null,
  "aws_endpoint_url": null,
  "duckling_dimensions": null,
  "duckling_http_url": null,
  "ner_crf": {
    "BILOU_flag": true,
    "features": [
      [
        "low",
        "title",
        "upper",
        "pos",
        "pos2"
      ],
      [
        "bias",
        "low",
        "word3",
        "word2",
        "upper",
        "title",
        "digit",
        "pos",
        "pos2",
        "pattern"
      ],
      [
        "low",
        "title",
        "upper",
        "pos",
        "pos2"
      ]
    ],
    "max_iterations": 50,
    "L1_c": 1,
    "L2_c": 0.001
  },
  "intent_classifier_sklearn": {
    "C": [
      1,
      2,
      5,
      10,
      20,
      100
    ],
    "kernel": "linear"
  }
}
Status:
{
  "available_projects": {
    "default": {
      "status": "ready",
      "available_models": [
        "fallback"
      ]
    }
  }
}
In your config file the pipeline is set to [], but it needs to be configured properly. The pipeline configuration option and the available choices are described in the Rasa NLU documentation.
The pipeline can either be a pre-configured template like mitie, spacy_sklearn, or keyword, or a custom pipeline like ["nlp_spacy", "ner_crf", "ner_synonyms"]. I would recommend setting your pipeline to:
"pipeline": "spacy_sklearn"
Update your configuration file and restart the server. If the server is still running in a console window, press Ctrl + C to stop it, then re-enter the command you used to start it.
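Assuming you keep the JSON configuration file shown above, the change amounts to a one-line edit; a minimal sketch of the relevant part of config_spacy.json (only the pipeline value changes, the other keys are taken from your posted config and can stay as they are):
{
  "language": "en",
  "pipeline": "spacy_sklearn",
  "path": "/rasa_nlu/projects",
  "port": 5000
}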
