How to scale down instances based on their uptime with Apache Marathon (Mesos)?

I find myself in a situation where I need to scale down container instances based on their actual lifetime. It looks like fresh instances are removed first when scaling down through Marathon's API. Is there any configuration I'm not aware of to implement this kind of policy when scaling down instances on Apache Marathon?
As of right now I'm using marathon-lb-autoscale to automatically adjust the number of running instances. Under the hood, marathon-lb-autoscale performs a PUT request that updates the instances property of the current application when req/s increases or decreases:
scale_list.each do |app, instances|
  req = Net::HTTP::Put.new('/v2/apps/' + app)
  if !@options.marathonCredentials.empty?
    req.basic_auth(@options.marathonCredentials[0], @options.marathonCredentials[1])
  end
  req.content_type = 'application/json'
  req.body = JSON.generate({'instances' => instances})
  Net::HTTP.new(@options.marathon.host, @options.marathon.port).start do |http|
    http.request(req)
  end
end
I don't know whether the upgradeStrategy configuration is taken into account when scaling down instances. With the default settings I cannot get the expected behaviour to work.
{
  "upgradeStrategy": {
    "minimumHealthCapacity": 1,
    "maximumOverCapacity": 1
  }
}
ACTUAL
instance 1
instance 2
PUT /v2/apps/my-app {instances: 3}
instance 1
instance 2
instance 3
PUT /v2/apps/my-app {instances: 2}
instance 1
instance 2
EXPECTED
instance 1
instance 2
PUT /v2/apps/my-app {instances: 3}
instance 1
instance 2
instance 3
PUT /v2/apps/my-app {instances: 2}
instance 2
instance 3

You can specify a killSelection directly in the application's config: YoungestFirst (the default) kills the youngest tasks first, while OldestFirst kills the oldest ones first, which matches the expected behaviour above. For example:
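A minimal sketch of an app definition using it (the exact value spelling, e.g. OldestFirst vs. OLDEST_FIRST, depends on your Marathon version, so check the reference below):

{
  "id": "/my-app",
  "instances": 2,
  "killSelection": "OLDEST_FIRST"
}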
Reference: https://mesosphere.github.io/marathon/docs/configure-task-handling.html

Related

Gatling - Dynamic Scenario, InjectionProfile, Assertion creation based on Configuration

I am attempting to write a simulation that can read from a config file for a set of APIs that each have a set of properties.
I read the config for n active scenarios and create requests from a CommonRequest class.
Those requests are then built into scenarios from a CommonScenario.
CommonScenarios have attributes that are used to create their injection profiles.
That all seems to work without issue, but when I try to use the properties / CommonScenario requests to build a set of Assertions it does not work as expected.
// get active scenarios from the config
val activeApiScenarios: List[String] = Utils.getStringListProperty("my.active_scenarios")
// build all active scenarios from config
var activeScenarios: Set[CommonScenario] = Set[CommonScenario]()
activeApiScenarios.foreach { scenario =>
  activeScenarios += CommonScenarioBuilder()
    .withRequestName(Utils.getProperty("my." + scenario + ".request_name"))
    .withRegion(Utils.getProperty("my." + scenario + ".region"))
    .withConstQps(Utils.getDoubleProperty("my." + scenario + ".const_qps"))
    .withStartQps(Utils.getDoubleListProperty("my." + scenario + ".toth_qps").head)
    .withPeakQps(Utils.getDoubleListProperty("my." + scenario + ".toth_qps")(1))
    .withEndQps(Utils.getDoubleListProperty("my." + scenario + ".toth_qps")(2))
    .withFeeder(Utils.getProperty("my." + scenario + ".feeder"))
    .withAssertionP99(Utils.getDoubleProperty("my." + scenario + ".p99_lte_assertion"))
    .build
}
// build population builder set by adding inject profile values to scenarios
var injectScenarios: Set[PopulationBuilder] = Set[PopulationBuilder]()
var assertions: Set[Assertion] = Set[Assertion]()
activeScenarios.foreach { scenario =>
  // create injection profiles from CommonScenarios
  injectScenarios += scenario.getCommonScenarioBuilder
    .inject(nothingFor(5 seconds),
      rampUsersPerSec(scenario.startQps).to(scenario.rampUpQps).during(rampOne seconds),
      rampUsersPerSec(scenario.rampUpQps).to(scenario.peakQps).during(rampTwo seconds),
      rampUsersPerSec(scenario.peakQps).to(scenario.rampDownQps).during(rampTwo seconds),
      rampUsersPerSec(scenario.rampDownQps).to(scenario.endQps).during(rampOne seconds))
    .protocols(httpProtocol)
  // create scenario assertions; this does not work for some reason
  assertions += Assertion(Details(List(scenario.requestName)), TimeTarget(ResponseTime, Percentiles(4)), Lte(scenario.assertionP99))
}
setUp(injectScenarios.toList)
  .assertions(assertions)
Note: scenario.requestName comes straight from the built scenario, which is chained with:
.feed(feederBuilder)
.exec(commonRequest)
I would expect the Assertions to be built from their scenarios into an iterable and passed into setUp().
What I get:
When I print everything out, the scenarios and injection profiles all look good, but when I print my assertions I get 4 assertions for the same scenario name with 4 different Lte() values. The output below is generalized; I actually configured 12 APIs, all with different names, Lte() values, etc.
Details(List(Request Name)) - TimeTarget(ResponseTime,Percentiles(4.0)) - Lte(500.0)
Details(List(Request Name)) - TimeTarget(ResponseTime,Percentiles(4.0)) - Lte(1500.0)
Details(List(Request Name)) - TimeTarget(ResponseTime,Percentiles(4.0)) - Lte(1000.0)
Details(List(Request Name)) - TimeTarget(ResponseTime,Percentiles(4.0)) - Lte(2000.0)
After the simulation the assertions all run like normal:
Request Name: 4th percentile of response time is less than or equal to 500.0 : false
Request Name: 4th percentile of response time is less than or equal to 1500.0 : false
Request Name: 4th percentile of response time is less than or equal to 1000.0 : false
Request Name: 4th percentile of response time is less than or equal to 2000.0 : false
Not sure what I am doing wrong when building my assertions. Is this even a valid approach? I wanted to ask for help before I abandon this for a different approach.
Disclaimer: Gatling creator here.
It should work.
That said, there are several things I'm really not fond of.
assertions += Assertion(Details(List(scenario.requestName)), TimeTarget(ResponseTime, Percentiles(4)), Lte(scenario.assertionP99))
You shouldn't be using the internal AST here. You should use the DSL like you've done for the injection profile.
var assertions: Set[Assertion] = Set[Assertion]()
activeScenarios.foreach { scenario =>
You should use map on activeScenarios (similar to Java's Stream API) rather than a mutable accumulator.
val activeScenarios = activeApiScenarios.map(???)
val injectScenarios = activeScenarios.map(???)
val assertions = activeScenarios.map(???)
Also, as you don't seem to be familiar with Scala, you might consider switching to Java (which Gatling has supported for more than a year).

Why does Triton serving with shared memory fail when running multiple uvicorn workers to send concurrent requests to the models?

I run a model in Triton serving with shared memory and it works correctly.
To simulate the backend structure, I wrote a FastAPI app for my model and ran it with gunicorn with 6 workers. Then I wrote another FastAPI app to route locust requests to the first FastAPI app, as in the image below (pseudo code). The second FastAPI app runs with uvicorn. The problem is that when I use multiple workers for uvicorn, Triton serving fails with shared memory.
Note: without shared memory everything works, but the response time is much longer than with the shared memory option, so I need to use shared memory.
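The image is not reproduced here; as a rough sketch, the described setup might look like the following (the app structure, endpoint names, ports, and URLs are all assumptions, not the asker's actual code):

# Rough sketch of the described setup; endpoint names, ports, and URLs are assumptions.
from fastapi import FastAPI
import requests

# First FastAPI app (run with gunicorn, 6 workers): wraps the Triton client.
model_api = FastAPI()

@model_api.post("/predict")
def model_predict(payload: dict):
    # calls the Triton client's predict(), which uses shared memory
    ...

# Second FastAPI app (run with uvicorn): routes locust traffic to the first app.
router_api = FastAPI()

@router_api.post("/predict")
def route_predict(payload: dict):
    # forward the request to the first FastAPI app
    return requests.post("http://localhost:8000/predict", json=payload).json()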
Here is my Triton client code. My client code has a predict function, which uses requestGenerator to share the input_simple and output_simple regions.
This is my requestGenerator:
def requestGenerator(self, triton_client, batched_img_data, input_name,
                     output_name, dtype, batch_data):
    # drop any regions registered by a previous request
    triton_client.unregister_system_shared_memory()
    triton_client.unregister_cuda_shared_memory()
    output_simple = "output_simple"
    input_simple = "input_simple"
    input_data = np.ones(
        shape=(batch_data, 3, self.width, self.height), dtype=np.float32)
    input_byte_size = input_data.size * input_data.itemsize
    output_byte_size = input_byte_size * 2
    # create and register the output region, then the input region
    shm_op0_handle = shm.create_shared_memory_region(
        output_name, output_simple, output_byte_size)
    triton_client.register_system_shared_memory(
        output_name, output_simple, output_byte_size)
    shm_ip0_handle = shm.create_shared_memory_region(
        input_name, input_simple, input_byte_size)
    triton_client.register_system_shared_memory(
        input_name, input_simple, input_byte_size)
    inputs = []
    inputs.append(
        httpclient.InferInput(input_name, batched_img_data.shape, dtype))
    inputs[0].set_data_from_numpy(batched_img_data, binary_data=True)
    outputs = []
    outputs.append(
        httpclient.InferRequestedOutput(output_name, binary_data=True))
    # point the request at the shared memory regions instead of inline data
    inputs[-1].set_shared_memory(input_name, input_byte_size)
    outputs[-1].set_shared_memory(output_name, output_byte_size)
    yield inputs, outputs, shm_ip0_handle, shm_op0_handle
This is my predict function:
def predict(self, triton_client, batched_data, input_layer, output_layer, dtype):
    responses = []
    results = None
    for inputs, outputs, shm_ip_handle, shm_op_handle in self.requestGenerator(
            triton_client, batched_data, input_layer, output_layer, dtype,
            len(batched_data)):
        self.sent_count += 1
        # copy the batch into the shared input region before inference
        shm.set_shared_memory_region(shm_ip_handle, [batched_data])
        responses.append(
            triton_client.infer(model_name=self.model_name,
                                inputs=inputs,
                                request_id=str(self.sent_count),
                                model_version="",
                                outputs=outputs))
    output_buffer = responses[0].get_output(output_layer)
    if output_buffer is not None:
        results = shm.get_contents_as_numpy(
            shm_op_handle, triton_to_np_dtype(output_buffer['datatype']),
            output_buffer['shape'])
    # unregister and destroy the regions after reading the result
    triton_client.unregister_system_shared_memory()
    triton_client.unregister_cuda_shared_memory()
    shm.destroy_shared_memory_region(shm_ip_handle)
    shm.destroy_shared_memory_region(shm_op_handle)
    return results
Any help would be appreciated on how to use multiple uvicorn workers to send concurrent requests to my Triton code without failing.

I lose the user session with Ruby + Sinatra + Puma + Sequel, but only when Puma has more than one worker process

My Heroku app with Ruby + Sinatra + Puma + Sequel is fine while workers = 1. When I increase workers to 2, or scale to 2 dynos, I start randomly losing the user session at different points in the system, which makes it very difficult to locate the specific error through the Heroku logs.
The same app works fine with a single Puma worker and a single dyno, but you lose the value of session[:usuario] with two workers or two dynos.
My Rack/Sinatra app class:
class Main < Sinatra::Application
  use Rack::Session::Pool
  set :protection, :except => :frame_options

  def usuarioLogueado?
    if defined?(session[:usuario])
      if session[:usuario].nil?
        return false
      else
        return true
      end
    else
      return false
    end
  end

  get "/" do
    if usuarioLogueado?
      redirect "/app"
      .....
    else
      redirect "/home"
    end
  end
end
My Sequel connection:
pool_size = 10
# db = Sequel.connect(strConexion, :max_connections => pool_size)
# db.extension(:connection_validator)
# db.pool.connection_validation_timeout = -1
My puma.rb (max 20 DB connections):
workers Integer(ENV['WEB_CONCURRENCY'] || 1)
threads_count = Integer(ENV['MAX_THREADS'] || 10)
threads threads_count, threads_count
preload_app!
rackup DefaultRackup
port ENV['PORT'] || 3000
Rack::Session::Pool is a simple memory based session store. Each process has its own store and they are not shared between processes or hosts. When a request gets directed to a different dyno or different process on the same dyno, the session data will not be available.
You could look at sticky sessions, but they won’t work in all situations (e.g. when dynos are created or destroyed) and won’t work at all if you have multiple processes on a single dyno.
You should look at using cookie based sessions (e.g. Rack::Session::Cookie), or set up a shared server side store such as memcached with Dalli, so that it doesn't matter which dyno or process each request is routed to.

Why is EC2 instance status 'terminated' when I try to create it with IO1 volume type?

Here's a block of code that is used to create an EC2 instance:
def create_instance(connection, name, instance_type, security_groups, ami,
                    key, placement, cluster, optimized_ebs):
    # NEW BLOCK BEGIN
    dev_sda1 = boto.ec2.blockdevicemapping.EBSBlockDeviceType()
    dev_sda1.size = 23  # size in gigabytes
    dev_sda1.volume_type = 'io1'
    dev_sda1.iops = 44
    bdm = boto.ec2.blockdevicemapping.BlockDeviceMapping()
    bdm['/dev/sda1'] = dev_sda1
    # NEW BLOCK END
    res = connection.run_instances(
        ami,
        key_name=key,
        instance_type=instance_type,
        security_groups=security_groups,
        placement=placement,
        ebs_optimized=optimized_ebs,
        block_device_map=bdm)
    inst = res.instances[0]
    time.sleep(30)
    inst.update()
    connection.create_tags([inst.id],
                           {'Name': '%s-%s' % (cluster, name), 'Cluster': cluster})
Before the #NEW BLOCK code block was added it all worked. After create_instance was called, I would check the state of the new instance and it would be 'running'.
I added the block to create the instance with the volume of type 'IO1' instead of the default (following the accepted answer here). I do not get any exceptions or other errors here but when I check the instance state later I get 'terminated'. What am I doing wrong?
There may be other issues but one problem I see is the value you are providing for iops in the block device mapping. That value must be between 100-4000 (see API docs).
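For illustration, here is the question's block-device mapping with an in-range value; a minimal sketch, with everything other than iops unchanged from the question:

import boto.ec2.blockdevicemapping

# Same mapping as in the question, with iops moved into the allowed 100-4000 range.
dev_sda1 = boto.ec2.blockdevicemapping.EBSBlockDeviceType()
dev_sda1.size = 23           # size in gigabytes
dev_sda1.volume_type = 'io1'
dev_sda1.iops = 100          # 44 was below the documented minimum of 100
bdm = boto.ec2.blockdevicemapping.BlockDeviceMapping()
bdm['/dev/sda1'] = dev_sda1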

Akka actors and clustering: I'm having trouble with ClusterSingletonManager (unhandled event in state Start)

I've got a system that uses Akka 2.2.4 which creates a bunch of local actors and sets them as the routees of a Broadcast Router. Each worker handles some segment of the total work, according to some hash range we pass it. It works great.
Now I've got to cluster this application for failover. Based on the requirement that only one worker per hash range should exist/be triggered on the cluster, it seems to me that setting up each one as a ClusterSingletonManager would make sense. However, I'm having trouble getting it working. The actor system starts up, it creates the ClusterSingletonManager, it adds the path in the code cited below to a Broadcast Router, but it never instantiates my actual worker actor to handle my messages for some reason. All I get is a log message: "unhandled event ${my message} in state Start". What am I doing wrong? Is there something else I need to do to start up this single-instance cluster? Am I sending the wrong actor a message?
Here's my Akka config (I use the default config as a fallback):
akka {
  cluster {
    roles = ["workerSystem"]
    min-nr-of-members = 1
    role {
      workerSystem.min-nr-of-members = 1
    }
  }
  daemonic = true
  remote {
    enabled-transports = ["akka.remote.netty.tcp"]
    netty.tcp {
      hostname = "127.0.0.1"
      port = ${akkaPort}
    }
  }
  actor {
    provider = akka.cluster.ClusterActorRefProvider
    single-message-bound-mailbox {
      # FQCN of the MailboxType. The Class of the FQCN must have a public
      # constructor with
      # (akka.actor.ActorSystem.Settings, com.typesafe.config.Config) parameters.
      mailbox-type = "akka.dispatch.BoundedMailbox"
      # If the mailbox is bounded then it uses this setting to determine its
      # capacity. The provided value must be positive.
      # NOTICE:
      # Up to version 2.1 the mailbox type was determined based on this setting;
      # this is no longer the case, the type must explicitly be a bounded mailbox.
      mailbox-capacity = 1
      # If the mailbox is bounded then this is the timeout for enqueueing
      # in case the mailbox is full. Negative values signify infinite
      # timeout, which should be avoided as it bears the risk of dead-lock.
      mailbox-push-timeout-time = 1
    }
    worker-dispatcher {
      type = PinnedDispatcher
      executor = "thread-pool-executor"
      # Throughput defines the number of messages that are processed in a batch
      # before the thread is returned to the pool. Set to 1 for as fair as possible.
      throughput = 500
      thread-pool-executor {
        # Keep alive time for threads
        keep-alive-time = 60s
        # Min number of threads to cap factor-based core number to
        core-pool-size-min = ${workerCount}
        # The core pool size factor is used to determine thread pool core size
        # using the following formula: ceil(available processors * factor).
        # Resulting size is then bounded by the core-pool-size-min and
        # core-pool-size-max values.
        core-pool-size-factor = 3.0
        # Max number of threads to cap factor-based number to
        core-pool-size-max = 64
        # Minimum number of threads to cap factor-based max number to
        # (if using a bounded task queue)
        max-pool-size-min = ${workerCount}
        # Max no of threads (if using a bounded task queue) is determined by
        # calculating: ceil(available processors * factor)
        max-pool-size-factor = 3.0
        # Max number of threads to cap factor-based max number to
        # (if using a bounded task queue)
        max-pool-size-max = 64
        # Specifies the bounded capacity of the task queue (< 1 == unbounded)
        task-queue-size = -1
        # Specifies which type of task queue will be used, can be "array" or
        # "linked" (default)
        task-queue-type = "linked"
        # Allow core threads to time out
        allow-core-timeout = on
      }
      fork-join-executor {
        # Min number of threads to cap factor-based parallelism number to
        parallelism-min = 1
        # The parallelism factor is used to determine thread pool size using the
        # following formula: ceil(available processors * factor). Resulting size
        # is then bounded by the parallelism-min and parallelism-max values.
        parallelism-factor = 3.0
        # Max number of threads to cap factor-based parallelism number to
        parallelism-max = 1
      }
    }
  }
}
Here's where I create my actors (it's written in Groovy):
Props clusteredProps = ClusterSingletonManager.defaultProps("worker".toString(), PoisonPill.getInstance(), "workerSystem",
    new ClusterSingletonPropsFactory() {
      @Override
      Props create(Object handOverData) {
        log.info("called in ClusterSingletonManager")
        Props.create(WorkerActorCreator.create(applicationContext, it.start, it.end)).withDispatcher("akka.actor.worker-dispatcher").withMailbox("akka.actor.single-message-bound-mailbox")
      }
    })
ActorRef manager = system.actorOf(clusteredProps, "worker-${it.start}-${it.end}".toString())
String path = manager.path().child("worker").toString()
path
When I try to send a message to the actual worker actor, should the path above resolve? Currently it does not.
What am I doing wrong? Also, these actors live within a Spring application, and the worker actors are set up with some @Autowired dependencies. While this Spring integration worked well in a non-clustered environment, are there any gotchas in a clustered environment I should be looking out for?
Thank you.
FYI: I've also posted this in the akka-user Google group. Here's the link.
The path in your code is to the ClusterSingletonManager actor that you start on each node with role "workerSystem". It will create a child actor (WorkerActor) with name "worker-${it.start}-${it.end}" on the oldest node in the cluster, i.e. a singleton within the cluster.
You should also define the name of the ClusterSingletonManager, e.g. system.actorOf(clusteredProps, "workerSingletonManager").
You can't send the messages to the ClusterSingletonManager. You must send them to the path of the active worker, i.e. including the address of the oldest node. That is illustrated by the ConsumerProxy in the documentation.
I'm not sure you should use a singleton at all for this. All workers will be running on the same node, the oldest. I would prefer to discuss alternative solutions to your problem at the akka-user google group.
