how to setup openTelemetry tracing with Grafana Tempo - installation

I have the following scenario: 2 Services (client) running in one location that also make calls to one central backend not in the same location. The latency to this backend is not very good since the physical distance is very large.
I have now instumented everything using Open Telemerty. Focus relies on traces.
But now I am struggeling to decide how to correctly setup the infrastructure i.e. Storage Backend, OTel Collector. Ideally I would find a solution that collects locally and then on a pull basis gets the data to a central location.
my idea
I am new to this topic and so I need some input please if this makes sense at all.
I also dont want too much overhead for the tracing. Only what is necessary in order for it to work properly.

Related

Should event driven architecture be targeted for all data & analytics platforms?

For example,
You have an IT estate where a mix of batch and real-time data sources exists from multiple systems, e.g. ERP, Project management, asset, website, monitoring etc.
The aim is to integrate the datasources into a cloud environment (agnostic).
There is a need for reporting and analytics on combinations of all data sources.
Inevitably, some source systems are not capable of streaming, hence batch loading is required.
Potential use-cases for performing functionality/changes/updates based on the ingested data.
Given a steer for creating a future-proofed platform, architecturally, how would you look to design it?
It's a very open-end question, but there are some good principles you can adopt to help direct you in the right direction:
Avoid point-to-point integration, and get everything going through a few common points - ideally one. Using an API Gateway can be a good place to start, the big players (Azure, AWS, GCP) all have their own options, plus there's lots of decent independent ones like Tyk or Kong.
Batches and event-streams are totally different, but even then you can still potentially route them all through the gateway so that you get the centralised observability (reporting, analytics, alerting, etc).
Use standards-based API specifications where possible. A good REST based API, based off a proper resource model is a non-trivial undertaking, not sure if it fits with what you are doing if you are dealing with lots of disparate legacy integration. If you are going to adopt REST, use OpenAPI to specify the API's. Using this standard not only makes it easier for consumers, but also helps you with better tooling as many design, build and test tools support OpenAPI. There's also AsyncAPI for event/async API's
Do some architecture. Moving sh*t to cloud doesn't remove the sh*t - it just moves it to the cloud. Don't recreate old problems in a new place.
Work out the logical components in your new solution: what does each of them do (what's it's reason to exist)? Don't forget ancillary components like API catalogues, etc.
Think about layering the integration (usually depending on how they will be consumed and what role they need to play, e.g. system interface, orchestration, experience APIs, etc).
Want to handle data in a consistent way regardless of source (your 'agnostic' comment)? You'll need to think through how data is ingested and processed. This might lead you into more data / ETL centric considerations rather than integration ones.
Co-design. Is the integration mainly data coming in or going out? Is the integration with 3rd parties or strictly internal?
If you are designing for external / 3rd party consumers then a co-design process is advised, since you're essentially designing the API for them.
If the API's are for internal use, consider designing them for external use so that when/if you decide to do that later it's not so hard.
Taker a step back:
Continually ask yourselves "what problem are we trying to solve?". Usually, a technology initiate is successful if there's a well understood reason for doing it, which has solid buy-in from the business (non-IT).
Who wants the reporting, and why - what problem are they trying to solve?
As you mentioned its an IT estate aka enterprise level solution mix of batch and real time so first you have to identify what is end goal of this migration. You can think of refactoring applications. If you are trying to make it event driven then assess the refactoring efforts and cost. Separation of responsibility is the key factor for refactoring and migration.
If you are thinking about future proofing your solution then consider Cloud for storing and processing your data. Not necessary it will be cheap but mix of Cloud and on-prem could be a way. There are services available by cloud providers to move your data in minimal cost. Cloud native solutions are there for performing analysis on your data. Database migration service in AWS or Azure can move data and then capture on-going changes. So you can keep using on-prem db & apps and perform analysis for reporting on cloud. It will ease out load on your transactional DB. Most data sync from on-prem to cloud is near real time.

NiFi: Modify flow without disruption or downtime using Java API

Is there a way to modify NiFi flow dynamically using Java API? The use case is to add a processor to an active data flow (data is flowing through it). The new processor should be added at the beginning of the flow without application disruption or downtime.
In case Java API is not available, please feel free to suggest alternatives. I have already looked at change-nifi-flow-using-rest-api-part-1. Thanks.
Any action you can perform from the UI can also be performed from REST API, the UI is just making calls to the REST API behind the scenes.
I would suggest opening Chrome's Dev Tools and performing the action you are interested in and then seeing what requests were made to perform the action. You can then script these operations however you need.
In addition, if you are trying to deploy flows then you should be taking advantage of NiFi Registry which allows you to place a flow under version control. You can then make changes from your local instance or dev instance, and upgrade the flow in production in-place without stopping your whole NiFi instance.

How to handle common variables in microservices architecture?

Let's consider a situation, where multiple services relay on data that can change any time and should be updated in each microservice roughly at the same time - for example there is a list of supported languages or some common policies that could change one day and affect many services at once.
One solution that I could think of is to have another microservice that could hold that data and any service that needs current state can just ask for it. The drawback is that this data is not changing very frequently, asking by HTTP is not that cheap and there is a lot of traffic to this let's say global registry service. As it is not changing very often, many services could just cache the data - in order to not ask for it every time - and not be able to respond to change quick enough when the change is made to the configuration.
The other solution could be to externalize such configuration - in AWS for example there could be some configuration file on S3 that would be available for others. The drawback here is that there is no way (as far as I know) to track changes in such file and there is no way to add some logic for verification if changed value in configuration is correct (there is no typos and so on), etc.
So my question is how to handle global configuration/registry in microservice world so that there is little HTTP overhead, you can audit changes as well as introduce change at the same time in many services?
I will prefer the option 1. Apart from the HTTP overhead, this will also lead your system in an inconsistent state. Service 1 might be working on new values but service 2 will be on old.
Since this is a distributed system that we are talking about, I am willing to take a risk with availability.
Have a configuration service that allows you to plan your config changes. Instead of saying change the value of A from x to y, you say change from x to y at time t. This t allows you to consistently propagate changes to all your system.You need to put in effort to understand what the min value of t should be for you set of services, how will you make all services acknowledge the changes and make them at the right time and how will you manage the new services that come up in between.
Another approach is use Spring Cloud Config (or something similar). It ask the service to register with the centralised config service and make refresh call to all the services to update config. Limitation being not all configs could be refreshed and if you are behind the LB you still need to handle ways to make sure all instances gets updated.
Use Config Server( spring cloud config server) that will maintain centralized configurations, you need to make changes to config server related to configurations, each microservices will come on startup for configurations to config server, even after start up after certain interval of time microservices can come to config server for validating any change in configurations and update accordingly.
There are couple of ways to do it, a better way especially in prod is to use external Configuration Store Pattern.
You can save the configuration in external stores like Azure Key Vault or Azure App configuration
Find more details about Azure key vault here:
Azure key vault
5-Minute quickstarts of Azure key vault integration
If you absolutely must have a shared config, best decoupled architecture I've encountered is as follows:
You have a standalone Config Service, completely private to the outside world and can only be accessed through an internal network for your microservices
ON STARTUP: Microservices do a pull request from the Config Service of what is needed per service and is stored in memory. if it is unable to pull from Config Service do not allow it to start. Have Retry Mechanism on this front.
ON CHANGE of the Config Service: Publish an event to your messaging layer that will force services to update their respective configurations.
Caveats:
do not put time sensitive configurations here, since we are using asynchronous communications here (if you have time critical configs why are they shared in the first place, you might need to revisit)
you need to handle your own plumbing, retry mechanism, memory management etc etc.

Performance monitoring all layers of a system

I use several loadtesting tools (Loadrunner, JMeter, NeoLoad) to performance test different applications. Im wondering if it is possible to monitor all layers of an application stack so for example. Say i have the following data chain.
Loadbalancer <-x-> Application Server <-x-> RMI <-x-> Java Application <-x-> MQ <-x-> Legacy application <-x-> Database
Where i have marked the x in the chain i am interested in monitoring, for example avg responsetimes.
Obviously we could simply create a wrapper on all endpoints which would gather the statistics for us and maybe we could import it into loadrunner or other loadtesting tools and sideline hem with the tools inbuilt performance statistics, but maybe there is tools/applications which already does this?
If not, how should we proceed, in order to gather this kind of statistics?
The standard for this was supposed to be Application Response Measurement (ARM). It was a cross language set of APIs that did just what you were looking for. The issue is that the products that implement this spec all tend to be big, expensive "enterprise" level monitoring tools. Think multi-week installs, consultants, more infrastructure and lots of buzzwords.
Still, if this is a mission critical app with a mission critical budget, this may be what you need. But you may be able to build your own that does just enough without too much effort. A quick search turns up at least one open source ARM implementation if you still want to use that API.
Another option is to simply to have transactions you can run against each tier of the system to check general responsiveness. For example you can have a static web page on the LB, a no-op tx on the app server, a "hello" servlet on the Java app, put a message directly on the queue, etc. During a performance / load test, these could be hit directly by the load testing tool or you could write a wrapper servlet / application call that does this as a single HTTP (RMI?) call. Running these a few times a minute won't add too much load to the system, but it should help you pinpoint which tier is slower. The nice thing about this approach is that it also works in production, just watch out for security issues.
For single user kind of test, where you know you have problem (e.g. this tx is "slow"), I have also had pretty good luck with network tracing. It's very tedious, but when you aren't sure what tier is slow, starting up a network trace on a few machines and running a single tx usually gives a good idea of what the system is doing.
I have handled this decomposition a number of ways in the past. The first is at a very low level using protocol analyzer dumped data to find the time points where a conversation leaves tier X and enters tier Y. The second method is through the use of log examination for the various tiers. Something that can make your examination quite usefule in this case is a common log server for all of your components (syslog, Rsyslog, etc....) and a nice log parsing tool, such as the freely available Microsoft Logparser. The third method utilization of the audit trail for an application stored in the database. You may find this when working on enterprise services bus style applications which have a consumer/producer model and a bus to pass information rather than a direct connection. The audit trails I have seen are typically stored in a database and allow the tracking of an individual transaction through the entire application infrastructure. Your Load balancer, as a network device, may be out of the hunt on this one.
Note, if you go the protocol analyzer or log route, then be sure and synchronize all of your source information devices to a common time server. Having one of your collectors (analyzer, app log) off on a time stamp basis can really be a hair pulling experience when you get into the analysis phase.
As to how you move from your collected data into LoadRunner, that part is very mechanical. The Analysis program supports an interface to import external datapoints. The format is very specific and is documented in both help and the online docs. This import process works very well, as I often have to use it for collection of statistics from hosts which I do not have direct monitoring access to, but which need to be included as a part of the monitored test infrastructure.
James Pulley
Moderator (YahooGroups LoadRunner, Advanced-Loadrunner; GoogleGroups lr-LoadRunner; Linkedin LoadRunner, LoadRunnerByTheHour; SQAForums LoadRunner, WinRunner)

Measuring Application Performance

I was wondering if there is a tool to keep track of application performance. What I have in mind is a tool that will listen for updates and register performance metrics published by an application. i.e. time to serve a request, time a certain operation took to finish. And this tool would then aggregate the data and measure performance trends.
If you want to measure your application from outside, then you can use RRDtool to collect the data.
You can use slamd for webapp written in Java.
For Django use hotshot.
Search for profiler + your language, framework
Take a look at HP SiteScope. It's ability to drive the system with a Web User Script, to monitor the metrics on the backend, even to the extent of creation of custom shell scripts and database queries, plus the ability to add logic for report/alert against these combined data sets appears to be what you need.
Other mechanisms that you might consider would be a roll your own service using CURL to push information in, queries to the systems involved to pull metrics or database information and then your own interface for alerting and reporting.
Then it becomes a cost question, can you roll the level of functionality for less money than you can purchase an already existing solution on the open market.
Ref:
HP SiteScope Wiki Page

Resources