OpenTelemetry for short-lived scripts?

Our system consists of many python scripts that are run on "clean" machines, that is, they need to have as little additional software on them as possible. Is there a way we could use OpenTelemetry without having to run additional servers on those machines? Is there a push model for sending data instead of pull?

Considering your additional explanation, I imagine you will eventually want to collect all telemetry from these systems. Using OTLP exporters you can send all three signals (traces, metrics, logs) to a collector service (as of now, only tracing is stable; metrics and logs support is experimental). You would not have to run any additional servers on these machines for your use case. There are two deployment strategies recommended for the OpenTelemetry Collector.
As an agent - runs along with the application on the same host machine.
As a gateway - runs on a standalone server outside the application host machine.
Running a collector agent on the same application host machine offloads some of the work from the language client libs and enriches the telemetry, but can be resource intensive.
Read more about the collector here: https://opentelemetry.io/docs/collector/getting-started/
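For a short-lived Python script, the push model looks roughly like the sketch below: spans are exported over OTLP to a remote collector and explicitly flushed before the process exits. This assumes the opentelemetry-sdk and opentelemetry-exporter-otlp packages; the collector endpoint is a placeholder.

    # Push-model sketch for a short-lived script: spans go over OTLP to a
    # remote collector, so nothing extra runs on the local machine.
    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor
    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

    provider = TracerProvider()
    provider.add_span_processor(
        BatchSpanProcessor(OTLPSpanExporter(endpoint="collector-host:4317", insecure=True))
    )
    trace.set_tracer_provider(provider)

    tracer = trace.get_tracer(__name__)
    with tracer.start_as_current_span("short-lived-job"):
        pass  # the script's actual work goes here

    # Crucial for short-lived scripts: flush buffered spans before exiting.
    provider.shutdown()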

Related

503 error on server load tests on Wildfly server on Jelastic

I have an app deployed on a WildFly server on the Jelastic PaaS. The app functions normally with a few users. I'm trying to do some load tests using JMeter, in this case calling a REST API 300 times in 1 second.
This leads to around a 60% error rate on the requests, all of them being 503 (service temporarily unavailable). I don't know what I have to tweak in the environment to get rid of those errors. I'm pretty sure it's not my app's fault, since it is not heavy and I get the same results even when testing the load on the index page.
The topology of the environment is simply one WildFly node (with 20 cloudlets) and a Postgres database with 20 cloudlets. I had fancier topologies, but to narrow the problem down I removed the load balancer (NGINX) and the multiple WildFly nodes.
Requests via the shared load balancer (i.e. when your internet-facing node does not have a public IP) face strict QoS limits to protect platform stability. The whole point of the shared load balancer is that it's shared by many users, so you can't take 100% of its resources for yourself.
With a public IP, your traffic goes straight from the internet to your node and therefore those QoS limits are not needed or applicable.
As stated in the documentation, you need a public IP for production workloads (a load test should be considered 'production' in this context).
I don't know what things I have to tweak in the environment to get rid of those errors
We don't know either, and as your question doesn't provide a sufficient level of detail, we can only come up with generic suggestions like:
Check the WildFly log for any suspicious entries. HTTP 503 is a server-side error, so it should be logged along with the stacktrace, which will lead you to the root cause (see the sketch after this list)
Check whether the WildFly instance(s) have enough headroom to operate in terms of CPU, RAM, etc.; this can be done using e.g. the JMeter PerfMon Plugin
Check JVM- and WildFly-specific JMX metrics using JVisualVM or the aforementioned JMeter PerfMon Plugin
Double-check the Undertow subsystem configuration for any connection/request/rate-limiting entries
Use a profiler tool like JProfiler or YourKit to see the slowest functions, largest objects, etc.
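As a trivial illustration of the first suggestion, something like the following can pull ERROR entries (with a few lines of stacktrace context) out of the WildFly log; the log path is an assumption based on WildFly's default standalone layout.

    # Illustrative only: scan the WildFly server log for ERROR entries and
    # print each with a few lines of context (adjust LOG_PATH to your install).
    LOG_PATH = "/opt/wildfly/standalone/log/server.log"
    CONTEXT = 5  # lines of stacktrace context to show after each error

    with open(LOG_PATH, errors="replace") as log:
        lines = log.readlines()

    for i, line in enumerate(lines):
        if " ERROR " in line:
            print("".join(lines[i:i + CONTEXT]), end="---\n")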

Testing performance on Linux Docker containers

I have several scripts testing the performance of a Linux server. There are about 12 Docker containers running on that Linux machine.
We are also interested in collecting metrics for the containers (right now we are collecting only those of the Linux machine itself).
Is there any plugin for this, or can this be done with the PerfMon plugin?
There are several ways of monitoring Docker instance statistics:
Built-in Runtime Metrics
cAdvisor - probably this one will be the easiest to set up and use
Any of built-in Linux monitoring tools
Of course you can use the JMeter PerfMon Plugin as usual; this way you will get performance-monitoring results integrated into your test script and will be able to correlate JMeter metrics with server health metrics. Just make sure there is TCP/UDP connectivity between JMeter and the PerfMon Server Agent; the default port is 4445, so make sure the container exposes this port to the outside world.
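If you also want per-container numbers from your own scripts, a small sketch with the Docker SDK for Python (the docker package) reads the same statistics that docker stats shows; the output format here is illustrative.

    # Sketch: snapshot per-container CPU and memory via the Docker SDK
    # (pip install docker). This is the data behind `docker stats`.
    import docker

    client = docker.from_env()
    for container in client.containers.list():
        stats = container.stats(stream=False)  # single snapshot, not a stream
        cpu = stats["cpu_stats"]["cpu_usage"]["total_usage"] \
            - stats["precpu_stats"].get("cpu_usage", {}).get("total_usage", 0)
        system = stats["cpu_stats"].get("system_cpu_usage", 0) \
            - stats["precpu_stats"].get("system_cpu_usage", 0)
        cpu_pct = (cpu / system) * stats["cpu_stats"].get("online_cpus", 1) * 100 if system else 0.0
        mem_mib = stats["memory_stats"]["usage"] / (1024 * 1024)
        print(f"{container.name}: cpu={cpu_pct:.1f}% mem={mem_mib:.0f} MiB")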

Quartz with centralized scheduling and monitoring

We are trying to revamp our batch job scheduling and monitoring process over the entire enterprise. Currently all our batch jobs are scheduled using Unix crontab and are monitored using log files generated by shell scripts.
This process has a lot of disadvantages, and as the number of applications grows it gets really complicated.
Two copies of each application need to be deployed: one to the app server and one standalone (since business logic is shared between both). This complicates our build process too.
There is no easy-to-use web UI for us to see the status of jobs and manually rerun failed jobs remotely without getting onto the Unix box.
There is no failover or load-balanced batch processing.
So I was thinking of using Quartz (with our existing Spring apps) in our applications, deploying them to app servers, and no longer relying on the Unix crontab.
Is there a way I can write a centralized web application from where I can schedule and monitor jobs running on different quartz schedulers on different app servers?
P.S: I know quartzdesk.com is one solution, but I don't want to enable RMI on my JVM.
You could use the Spring Boot scheduler as an orchestrator and call REST APIs for remote (or local, if you are small) execution. This way, as your app grows, you could easily put it behind a load balancer.
If you have the possibility of using cloud services (like AWS, Azure, or Google Cloud), this could be done easily using their own load balancers. They also support Docker and can absorb any peaks in utilization.
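The answer above suggests Spring Boot; as a language-neutral sketch of the same pattern (a central scheduler triggering jobs over REST), here is what the orchestrator side could look like in Python. The endpoints, job names, and response fields are all hypothetical.

    # Hypothetical orchestrator: triggers a job on a remote app server over
    # REST and polls until it finishes, keeping scheduling in one place.
    import time
    import requests  # pip install requests

    JOB_ENDPOINTS = {  # hypothetical job registry
        "nightly-report": "http://app-server-1:8080/jobs/nightly-report",
        "cleanup": "http://app-server-2:8080/jobs/cleanup",
    }

    def trigger(job_name):
        """Start a job remotely; assume the server returns an execution id."""
        resp = requests.post(JOB_ENDPOINTS[job_name] + "/run", timeout=10)
        resp.raise_for_status()
        return resp.json()["executionId"]

    def wait_for(job_name, execution_id):
        """Poll until the remote job reports a terminal status."""
        while True:
            resp = requests.get(
                f"{JOB_ENDPOINTS[job_name]}/executions/{execution_id}", timeout=10
            )
            status = resp.json()["status"]
            if status in ("SUCCEEDED", "FAILED"):
                return status
            time.sleep(30)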

JMeter search result is different in the local machine and remote server machine

JMeter results differ between the local machine and the remote server machine.
The same JMeter batch file is run in both environments separately.
The same website is tested in JMeter, but the load times on the local machine and the remote server are different.
Both machines' internet speed is fine.
Several factors can influence that:
1) The network path from where JMeter runs to your server. You should always consider that; a quick way to measure it is sketched after this list.
If, say, you're testing a microservice hosted on Amazon Cloud (AWS), and the downstream consumers of its data are also running in the same cloud, it doesn't make a lot of sense to run JMeter on your local machine; you have to run it on AWS as well (as your consumers do) to get realistic timings.
The round trip over the network path would add hundreds of milliseconds; moreover, it's pretty unpredictable, and deviations may be huge.
2) GUI vs non-GUI mode; the rule of thumb is that the GUI is for development/debugging only. It takes quite a toll on performance.
3) Available resources on the machine - you didn't mention them at all, but mind that the Java Runtime Environment is far from lean, so if the machine is not dedicated to running JMeter, and especially if the machine is not very powerful, the results may vary.
4) In addition to the resource point: by default, the scripts that launch JMeter are quite restrictive in resource allocation, and if you overwhelm the instance with a lot of threads, the timings may get distorted.
These are general factors; if you want something more specific to your case, show how and where (i.e., on what type of machine) you run your tests, and where your target sits in terms of network path.
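For the first factor, a quick and dirty way to see how much of the difference is pure network path is to time the same request from both machines; the URL is a placeholder.

    # Illustrative only: time the same request from both machines and compare.
    import time
    import urllib.request

    URL = "https://your-server.example.com/"  # placeholder target

    samples = []
    for _ in range(10):
        start = time.perf_counter()
        with urllib.request.urlopen(URL, timeout=30) as resp:
            resp.read()
        samples.append((time.perf_counter() - start) * 1000)

    print(f"min={min(samples):.0f} ms  max={max(samples):.0f} ms  "
          f"avg={sum(samples) / len(samples):.0f} ms")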

How to share large files between two microservices in Mesos?

I have a Mesos cluster and I need to run two types of microservices: one produces very large files (possibly more than 2GB per file), and the other analyzes those files. The analysis microservice takes more time than the producer service.
After the analysis service is done - the file can be deleted.
I thought of two options:
NFS - the producer service creates all files on NFS and the analysis service reads them directly from the shared folder. (I'm concerned that this approach will consume all the internal bandwidth in my cluster.)
Local disk (my preferred option) - in this case I need to somehow force the analysis microservice to run on the same Mesos slave as the producer service that created the specific file. (I'm not sure this approach is possible.)
What would be best practice in this case?
I guess this can be implemented in different ways, depending on your requirements:
If you want to be able to handle a host (agent) failure, I think there is no other way than using a shared filesystem such as NFS. Otherwise, if you use Marathon to schedule your microservices, the task will be restarted on another agent (where the data isn't locally available). You would also then need to make sure that the same mount points are available on each agent, and use these as host volumes in your containers. As a side note, the POD feature for co-locating tasks only starts to be available in Mesos 1.1.0 and Marathon 1.4 (not yet finally released)...
If you don't care about host (agent) failures, then you could possibly co-locate the two microservices on the same agent by using hostname constraints in Marathon and mounting host volumes, which can then be shared across the services (a sketch follows below). I guess you'd need some orchestration to start the analysis service only after the producing service has finished.
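As a sketch of the second option (not the poster's actual setup), both services can be pinned to one agent with a Marathon hostname constraint and given the same host volume via Marathon's REST API; the host name, paths, and images below are assumptions.

    # Sketch: register two Marathon apps pinned to the same agent, sharing
    # a host directory. Endpoint, hostnames, and images are placeholders.
    import requests  # pip install requests

    MARATHON = "http://marathon.example.com:8080"

    def app_definition(app_id, image):
        return {
            "id": app_id,
            "cpus": 1.0,
            "mem": 2048,
            "instances": 1,
            # Pin the task to one agent so both apps see the same local disk.
            "constraints": [["hostname", "CLUSTER", "agent-1.example.com"]],
            "container": {
                "type": "DOCKER",
                "docker": {"image": image},
                # Shared host directory mounted into both containers.
                "volumes": [
                    {"hostPath": "/mnt/shared", "containerPath": "/data", "mode": "RW"}
                ],
            },
        }

    for app_id, image in [("producer", "myrepo/producer"), ("analyzer", "myrepo/analyzer")]:
        resp = requests.post(f"{MARATHON}/v2/apps", json=app_definition(app_id, image))
        resp.raise_for_status()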
