New Relic: How to set tags automatically, especially when Autoscaling - amazon-ec2

When using EC2 autoscaling with New Relic, there will inevitably be a number of servers being started and terminated over time. When using multiple scaling groups, it'd be very useful to have them distinguishable in NR by tags, for example one group could be tagged as "production,workers" and another as "staging,workers" and yet another with the tags "production,api". This can be achieved by manually tagging them in the NR web interface, but that's not practical.
Is there a way to accomplish this automatically, either through nrsysmond or a configuration API?

You can use New Relic's REST API:
https://docs.newrelic.com/docs/features/getting-started-with-the-new-relic-rest-api
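As a minimal sketch of using that API from an instance: the snippet below builds (but does not send) a labeling request. The `labels.json` endpoint, the `X-Api-Key` header, and the payload shape are assumptions based on the v2 REST API docs linked above, so verify them against the current documentation before relying on them.

```python
import json
import urllib.request

# Assumed v2 endpoint for attaching labels to servers; verify against
# the REST API docs linked above.
NEW_RELIC_LABELS_URL = "https://api.newrelic.com/v2/labels.json"

def build_label_request(api_key, category, name, server_ids):
    """Build (but do not send) a PUT request that attaches a label such
    as Environment:production to a set of New Relic server IDs."""
    payload = {
        "label": {
            "category": category,
            "name": name,
            "links": {"servers": server_ids},
        }
    }
    return urllib.request.Request(
        NEW_RELIC_LABELS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="PUT",
    )
```

An autoscaled instance could run something like this at boot (e.g. from EC2 user data), looking up its own server ID and tagging itself "production,workers" or "staging,workers" without any manual step in the web interface.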


With the CQRS pattern, how are you still limiting to one service per database?

According to my understanding
We should only have one service connecting to a database
With CQRS you will be keeping two databases in sync, hypothetically using some “service” gluing them together
Doesn’t that now mean there’s a service whose only purpose is to keep the two in sync, and another service to access the data?
Questions
Doesn’t that go against the rule above? Or does this pattern only apply when native replication is being used?
Also, other than being able to independently scale the replicated database for more frequent reads, doesn’t the process of keeping both in sync kind of take away from that? Either way we’re writing the same data to both in the end.
Ty!
We should only have one service connecting to a database
I would rephrase this to: each service should be accessible only via that service's API, and all internals, like the database, should be completely hidden. Hence, there should be no (logical) database sharing between services.
With CQRS you will be keeping two databases in sync, hypothetically using some “service” gluing them together
CQRS is a pattern for splitting how a service talks to a data layer. A typical example would be separating reads and writes, as those are fundamentally different: e.g. you do writes as commands via a queue and reads as exports via some stream.
CQRS is just an access pattern; using it (or not using it) does nothing for synchronization. If you do need a service to keep two other ones in sync, then you should still use the services' APIs instead of going into the data layer directly. And CQRS could sit behind those APIs to optimize data processing.
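As a minimal, framework-free sketch of that access pattern (all names here are illustrative, not from any library): writes go through a command handler that appends to a write store, while reads are served from a separate, denormalized read model.

```python
class OrderCommands:
    """Write side: accepts commands and appends events to a write store."""

    def __init__(self, write_store, read_store):
        self._write_store = write_store
        self._read_store = read_store

    def place_order(self, order_id, item):
        self._write_store.append(("order_placed", order_id, item))
        # In a real system a projector would consume events asynchronously
        # and update the read model; here it is updated inline for brevity.
        self._read_store[order_id] = item


class OrderQueries:
    """Read side: serves reads from the read model only."""

    def __init__(self, read_store):
        self._read_store = read_store

    def get_order(self, order_id):
        return self._read_store.get(order_id)
```

The point is that callers never touch either store directly; they go through the command and query objects, which is what lets the two sides be stored, optimized, and scaled differently behind one service API.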
The text above might address your first question. As for the second one: keeping the database encapsulated in a service does allow that database (and service) to be scaled as needed. So if you are using replication for reads, that would be a reasonable solution (assuming you address async vs. sync replication).
As for "writing data on both ends", I am actually confused about what that means...

Is there a feature for setting min/max/fixed function/action replicas in OpenWhisk?

I have an OpenWhisk setup on Kubernetes using [1]. For study purposes, I want to have a fixed number of replicas/pods for each action that I deploy, essentially disabling the auto-scaling feature.
A similar facility exists for OpenFaaS [2], where during deployment of a function we can configure the system to have N function replicas at all times. These N function replicas (or pods) for the given function will always be present.
I assume this can be configured somewhere while deploying an action, but being a beginner in OpenWhisk, I could not find a way to do this. Is there a specific configuration that I need to change?
What can I do to achieve this in OpenWhisk? Thanks :)
https://github.com/apache/openwhisk-deploy-kube
https://docs.openfaas.com/architecture/autoscaling/#minmax-replicas
OpenWhisk serverless functions follow a model closer to AWS Lambda's: you don't set the number of replicas. OpenWhisk uses various heuristics and can specialize a container in milliseconds, so elasticity on demand is more practical than in Kubernetes-based solutions. There is no mechanism in the system today to set minimums or maximums. A function gets to scale proportionally to the resources available in the system, and when that capacity is maxed out, requests will queue.
Note that while AWS allows one to set the max concurrency, this isn’t the same as what you’re asking for, which is a fixed number of pre-provisioned resources.
Update to answer your two questions specifically:
Is there a specific configuration that I need to change?
There isn’t. This feature isn’t available at user level or deployment time.
What can I do to achieve this in OpenWhisk?
You can modify the implementation in several ways to achieve what you’re after. For example, one model is to extend the stem-cell pool for specific users or functions. If you were interested in doing something like this, the project Apache dev list is a great place to discuss this idea.

Task distribution across microservices

We are building our first microservice architecture using Spring Boot and Kubernetes. I have a general question about scaling up one of our microservices which processes RSS feeds.
Currently we have about 100 feeds and run one instance of the microservice to process them. The feed sources are stored in a database, and once the feeds are processed they are written to a central Kafka topic.
We want to increase the number of feeds and the number of instances of the microservice to process the feeds.
Are there any design patterns I could follow to distribute the RSS feeds across the instances available? How would I dynamically allocate which microservice instance processes which set of feeds?
Any recommendations or best practice advice would be appreciated.
A first approach is to use a messaging system.
You could publish a message saying that some "RSS feed must be processed", with the essential information about the task (feed id, link, whatever).
Then have every instance consume from that queue.
This way, the instances compete for the jobs: the more feeds you have, the more messages there will be in the queue. You can then scale out the number of microservice instances.
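The competing-consumers idea above can be sketched with an in-process queue standing in for the broker (in a real deployment this would be RabbitMQ, SQS, or a Kafka consumer group; all names below are illustrative):

```python
import queue
import threading

def worker(tasks, results, lock):
    """One 'service instance': pull the next feed task until the queue is empty."""
    while True:
        try:
            feed_id = tasks.get_nowait()
        except queue.Empty:
            return
        # "Process" the feed; here we just record that it was handled.
        with lock:
            results.append(feed_id)
        tasks.task_done()

# 100 feeds to process, 5 competing instances.
tasks = queue.Queue()
for feed_id in range(100):
    tasks.put(feed_id)

results, lock = [], threading.Lock()
threads = [threading.Thread(target=worker, args=(tasks, results, lock))
           for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each feed is processed exactly once regardless of how many instances run, which is exactly the property that lets you scale the consumer count up and down without reassigning work by hand.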
You can use a hash function to distribute RSS feeds across your microservice instances. Let's say you have 5 instances; you can use the algorithm below for assigning a feed to an instance:
hash_code = hashing_algorithm(feed)
node_id = hash_code % num_of_nodes  // 5 in this case
get_service(node_id).send(feed)
The process of assigning feeds to instances can also be scaled easily: because the mapping is deterministic, you can launch 3 independent processes to read from your DB and assign feeds to instances without any coordination.
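A runnable version of the sketch above. It uses a stable digest rather than Python's built-in `hash()`, because `hash()` is salted per process and independent assigner processes would otherwise disagree on the mapping:

```python
import hashlib

def assign_feed(feed_url: str, num_nodes: int) -> int:
    """Deterministically map a feed URL to a node index in [0, num_nodes).
    Every assigner process computes the same mapping with no coordination."""
    digest = hashlib.md5(feed_url.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes

node = assign_feed("https://example.com/rss", 5)
```

One caveat of plain modulo hashing: changing `num_nodes` reshuffles most feeds onto different instances. If you expect to scale the instance count often, consistent hashing limits that churn to roughly 1/N of the feeds.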

Multiple flows with nifi

We have multiple (50+) NiFi flows that all do basically the same thing: pull some data out of a DB, append some columns, convert to Parquet, and upload to HDFS. They differ only in details such as the SQL query to run or the location in HDFS where they land.
The question is how to factor these common NiFi flows out such that any change made to the common flow automatically applies to all derived flows. E.g. if I want to add an extra step to also publish the data to Kafka, I want to make this change once and have it automatically apply to all 50 flows.
We've tried to get this working with NiFi Registry; however, it seems like an imperfect fit. Essentially the issue is that NiFi Registry seems to work well for updating a flow in one environment (say uat) and then automatically updating it in another environment (say prod). It seems less suited for updating multiple flows in the same environment, one specific example being that it will reset the name of each flow to the template name every time we redeploy, meaning that all flows end up with the same name!
Does anyone know how one is supposed to manage a situation like ours? I guess it must be pretty common.
Apache NiFi has Process Groups. As the name suggests, a process group is there to group together a set of processors and their pipeline that performs a similar task.
So for your case, you can refactor the flow by moving the common, reusable part into a separate process group with an input port. Connect each outside flow that depends on this reusable flow to the input port of the reusable process group. Depending on your requirements, you can create an output port as well in this process group and connect it back to the outside flow.
Attaching a sample:
For the sake of explanation, I have made a mock flow, so ignore the processor types that are used; rather, look at the names I have given those processors.
The following screenshots show that I read from two different sources and individually connect them to two different processors that make the source-specific changes.
Then I connect these two flows to the input port of a process group that contains the reusable flow. So ultimately the two different flows shown in the above screenshot get to work with a common reusable flow.
Showing what's inside the reusable flow:
Finally, the output port output to outside connects the reusable flow to the outside component Write to somewhere.
I hope this helps you with refactoring your complex flows. Feel free to get back, if you have any queries.

Comparison with mainstream workflow engines

I'd like to use Spring SM in my next project, which has very simple workflows: 3-4 states, rule-based transitions, and a few actors at most.
The WF is pretty fixed, so storing its definition in java config is quite ok.
I'd prefer to use a state machine over a workflow engine, which comes with the whole machinery, but I couldn't find out whether there is a notion of an Actor.
Meaning, only one particular user (determined by login string) can trigger a transition between states.
Also, can I run the same state machine definition in parallel? Is there a notion of an instance, like a process instance in WF jargon?
Thanks,
Milan
An actor with security is an interesting concept, but we don't have anything built in right now. I'd say that this can be accomplished via Spring Security, i.e. https://spring.io/blog/2013/07/04/spring-security-java-config-preview-method-security/ and there's more in its reference docs.
I could try to think whether there's something we could do to make this easier with Spring Security.
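To illustrate the idea (this is a language-agnostic sketch, not Spring Statemachine or Spring Security API; all names are made up): each transition can carry a guard naming the only login allowed to trigger it, which is roughly what method security would enforce around a transition-triggering method.

```python
class GuardedStateMachine:
    """Toy state machine whose transitions are guarded by an actor login."""

    def __init__(self, initial, transitions):
        # transitions: {(state, event): (next_state, allowed_actor)}
        self.state = initial
        self.transitions = transitions

    def fire(self, event, actor):
        next_state, allowed = self.transitions[(self.state, event)]
        if actor != allowed:
            # Reject before any state change, like a denied security check.
            raise PermissionError(f"{actor} may not trigger {event}")
        self.state = next_state
        return self.state

sm = GuardedStateMachine("draft", {
    ("draft", "submit"): ("review", "milan"),
    ("review", "approve"): ("done", "admin"),
})
```

In Spring terms, the guard check would live in the security layer (e.g. a method-security annotation on the code that sends the event), keeping the state machine definition itself actor-agnostic.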
Parallel machines are on my todo list. It is a big topic, so it will take a while to implement. Follow https://github.com/spring-projects/spring-statemachine/issues/35 and other related tickets. That issue is the foundation for distributed state machines.
