Task distribution across microservices

We are building our first microservice architecture using Spring Boot and Kubernetes. I have a general question about scaling up one of our microservices which processes RSS feeds.
Currently we have about 100 feeds and run one instance of the microservice to process them. The feed sources are stored in a database and once the feeds are processed they are written to a central Kafka queue.
We want to increase the number of feeds and the number of instances of the microservice to process the feeds.
Are there any design patterns I could follow to distribute the RSS feeds across the available instances? How would I dynamically allocate which microservice instance processes which set of feeds?
Any recommendations or best practice advice would be appreciated.

The first approach is to use a messaging system.
You could send a message saying that some "RSS feed must be processed", with the essential information about the task (feed id, link, whatever).
Then have every instance consume from that queue.
This way, the instances will compete for the jobs. The more messages you have in the queue, the more tasks there are to do (obviously). You can then scale out the number of microservice instances.
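Since the question already uses Spring Boot and Kafka, a minimal sketch of such a competing consumer could look like this; the topic name "feed-tasks", the group id "feed-processors" and a plain feed-id payload are assumptions, not anything from the actual setup:

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

// Every instance runs this listener with the same group id, so Kafka assigns each
// partition of the topic to exactly one instance and the instances compete for work.
@Component
public class FeedTaskConsumer {

    @KafkaListener(topics = "feed-tasks", groupId = "feed-processors")
    public void onFeedTask(String feedId) {
        // Look up the feed source by id in the database and run the existing processing here.
        System.out.println("Processing feed " + feedId);
    }
}

Note that with Kafka the parallelism is capped by the number of partitions on the topic, so create the task topic with at least as many partitions as the maximum number of instances you expect to run.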

You can use a hash function to distribute the RSS feeds across your microservice instances. Let's say you have 5 instances of the microservice; you can use the algorithm below to assign a feed to an instance:
int hashCode = hashingAlgorithm(rss);             // e.g. hash the feed id or URL
int nodeId = Math.floorMod(hashCode, numOfNodes); // numOfNodes = 5 in this case; floorMod avoids a negative index
getService(nodeId).send(rss);                     // route the feed to that instance
The assignment process itself can also be scaled easily: you can launch, say, 3 independent processes that read feeds from your DB and assign them to microservices without any coordination, because the hash deterministically maps each feed to the same node.
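A small, self-contained Java sketch of that idea; the instance URLs and the choice of CRC32 are placeholders, not anything prescribed above:

import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

// Deterministic assignment: every assigner process computes the same node for the
// same feed, so several assigners can run in parallel without coordination.
public class FeedAssigner {

    private static final List<String> INSTANCES = List.of(
            "http://feed-service-0", "http://feed-service-1", "http://feed-service-2",
            "http://feed-service-3", "http://feed-service-4");

    static int nodeFor(String feedUrl) {
        CRC32 crc = new CRC32();
        crc.update(feedUrl.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % INSTANCES.size());
    }

    public static void main(String[] args) {
        String feed = "https://example.com/rss.xml";
        System.out.println(feed + " -> " + INSTANCES.get(nodeFor(feed)));
    }
}

One caveat: with a plain modulo, changing the number of instances remaps most feeds to different nodes; if that matters, look at consistent hashing, which moves only a small fraction of keys when a node is added or removed.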

Related

Best way to track/trace a JSON object (time series data) as it flows through a system of microservices on an IoT platform

We are working on an IoT platform which ingests many device parameter values (time series) every second from many devices. Once ingested, each JSON document (a batch of multiple parameter values captured at a particular instant) flows downstream through many microservices. What is the best way to track the JSON as it moves through them in an event-driven way?
We use Spring Boot predominantly and all the services are containerised.
E.g., Option 1 - Is associating a UUID with each object and then updating its state idempotently in Redis as each microservice processes it ideal? The problem is that every microservice will now be tied to Redis, and we have seen Redis performance degrade as the number of API calls to it increases, since it is single-threaded (we can scale it out, though).
Option 2 - Zipkin?
Note: We use Kafka/RabbitMQ to process the messages in a distributed way, as you mentioned here. My question is about a strategy to track each of these messages and its status (to enable replay if needed and achieve exactly-once processing). Let's say message1 is processed by Service A, Service B and Service C. We are having trouble tracking whether the message failed at Service B or at Service C, because we get a lot of messages.
A better approach would be to use Kafka instead of Redis.
Create a topic for every microservice and keep moving the message from one topic to the next after processing:
topic(raw-data) - |MS One| - topic(processed-data-1) - |MS Two| - topic(processed-data-2) ... etc
Keep appending the results to the same object and keep moving it down the line until every microservice has processed it.
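As a rough sketch of one stage of that pipeline, assuming Spring Boot with spring-kafka and the topic names from the diagram above (the "enrichment" step is purely illustrative):

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Component;

// One pipeline stage ("MS One"): consume from the previous topic, append this
// service's result to the payload, and publish to the next topic.
@Component
public class StageOne {

    private final KafkaTemplate<String, String> kafka;

    public StageOne(KafkaTemplate<String, String> kafka) {
        this.kafka = kafka;
    }

    @KafkaListener(topics = "raw-data", groupId = "ms-one")
    public void process(String message) {
        // A real service would parse the JSON and add its own result/status field.
        String enriched = message + " | msOneStatus=processed";
        kafka.send("processed-data-1", enriched);
    }
}

If you key the records by device id (or by the message UUID), all records for the same key stay in order on one partition, which also makes it easier to see where in the chain a given message stopped.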

Message-Based Microservices - API Gateway Performance

I'm in the process of designing a micro-service architecture and I have a performance related question. This is what I am trying out with my design:
I have a several micro-services which perform distinct actions and store those results in their own data-store.
The micro-services receive work via a message queue where they receive requests to run their process for the specific data given. The micro-services do NOT communicate with each other.
I have an API gateway which effectively has three journeys:
1) Receive a request to process data, which it then translates into several messages that it puts on the queue for the micro-services to process in their own time. The processing time can be minutes or longer (not instant).
2) Receives a request for the status of the process, where it returns the progress of the overall process.
3) Receives a request for combined data, which is some combination of all the results from the services.
My problem lies in #3 above and the performance of this process.
Whenever this request is received, the API gateway has to put a message request onto the queue asking every service for its information; it then has to wait for all the services to reply with the latest state of their data, combine the data, and return it to the caller.
This process is obviously rather slow, as it has to wait for every service to respond. What is a good way of speeding this up?
The only way I have thought of to solve this is to have another aggregate service/data-store where duplicate data is stored and queried by my API gateway. I really don't like this approach, as it duplicates data and is extra work/code.
What is the 'correct' and performant way of querying up-to-date data from my micro-services?
You can use these approaches for querying data across microservices. Reference:
Selective data replication
With this approach, we replicate the data needed from other microservices into the database of our microservice. The only coupling between microservices is in the data replication configuration.
Composite service layer
With this approach, you introduce composite services that aggregate data from lower-level microservices.
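As an illustration of the composite-service approach, here is a minimal Java sketch that queries two lower-level services in parallel and combines the results; it assumes those services expose read endpoints, and the URLs are placeholders:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;

// Composite service sketch: query the lower-level services' REST endpoints in
// parallel and combine the results, instead of round-tripping via the queue.
public class CompositeQuery {

    private static final HttpClient client = HttpClient.newHttpClient();

    private static CompletableFuture<String> fetch(String url) {
        HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.sendAsync(req, HttpResponse.BodyHandlers.ofString())
                     .thenApply(HttpResponse::body);
    }

    public static void main(String[] args) {
        CompletableFuture<String> a = fetch("http://service-a/results/42");
        CompletableFuture<String> b = fetch("http://service-b/results/42");
        // Both calls run concurrently; overall latency is that of the slowest call,
        // not the sum of all calls.
        String combined = a.thenCombine(b, (ra, rb) -> "{\"a\":" + ra + ",\"b\":" + rb + "}").join();
        System.out.println(combined);
    }
}

Because the calls run concurrently, the latency of the combined query is roughly that of the slowest service rather than the sum of all of them.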

Transform an IoT application from monolithic to microservices

I have a system composed of 3 sensors (temperature, humidity, camera) attached to an Arduino, 1 cloud, and 1 mobile phone. I developed a monolithic IoT application with different tasks that need to be executed in these three different locations (Arduino, cloud, mobile). All these sensors have common tasks: data detection and data transfer (executed on the Arduino), data saving, data analysis and data notification (on the cloud), and data visualization (on the mobile).
The problem is that I know a microservice is independent and has its own database. How do I transform this application into one using a microservice architecture? The first idea is to represent each task as a microservice.
At first I considered each task as a component and thought about representing each one as a microservice, but they are linked: the output of the previous task is the input of the next one, so I can't do it like this because they aren't independent. Another thing: the data collection microservice should be placed on the Arduino, and the data should be sent to the cloud to be stored there in the database, so we have a remote DB. For data collection I have the same idea as you: since there are different sensors, there will be different microservices (temperature data collection, camera data collection, ...).
First, let me clear up a confusion: if microservices are independent, how can we design microservices in which the output of the previous task is the input of the next one?
When we say microservice, it means it is independently deployable and manageable, but as in any system there are dependencies, so microservices also depend on each other. You can read about reactive microservices.
So you can have microservices which depend on one another, but we want these dependencies to be minimal.
Now let's look at the benefits we want from a microservice architecture (this will help answer your question):
Independently deployable components (which speeds up deployment): in any big application there are components which are relatively independent of each other, so if I want to change something in one component I should be confident that another will not be impacted. In a monolith, since everything is in one binary, the impact would be high.
Independently scalable: since different components require different scale, they can have different types of databases and machine requirements.
There are various other benefits, and also some overhead, that a microservice architecture brings (I can't go into detail here; read up on these things online).
Now let's discuss the approach:
Data collection is independent of how and what kind of analysis happens on the data, so I would have a DataCollectionService in the cloud (it collects data from all sensors; we can create different services for different sensors if their data are completely independent).
DataAnalysis as a separate service (it doesn't need to know a thing about how the data is collected: MQTT, WebSocket, periodic, in batches, or whatever). This service needs data and acts upon it.
Notification Service
DataSendClient on the Arduino: some client which sends data to the DataCollectionService.
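A minimal sketch of the cloud-side DataCollectionService, assuming Spring Boot and a plain HTTP endpoint; the path, payload fields and transport are all assumptions, since the answer above doesn't prescribe any of them:

import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

// The Arduino-side DataSendClient posts readings here; this service persists them
// and/or publishes an event for the DataAnalysis service to consume.
@RestController
public class DataCollectionController {

    public record SensorReading(String sensorId, String type, double value, long timestamp) { }

    @PostMapping("/readings")
    public void collect(@RequestBody SensorReading reading) {
        // Persist the reading and/or hand it to a queue for DataAnalysis.
        System.out.println("Received " + reading.type() + "=" + reading.value()
                + " from sensor " + reading.sensorId());
    }
}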

Microservices: model sharing between bounded contexts

I am currently building a microservices-based application developed with the MEAN stack and am running into several situations where I need to share models between bounded contexts.
As an example, I have a User service that handles the registration process as well as login (generating a JWT), logout, etc. I also have a File service which handles the uploading of profile pics and other images the user happens to upload. Additionally, I have a Friends service that keeps track of the associations between members.
Currently, I am adding the GUID of the user from the user table used by the User service, as well as the first, middle and last name fields, to the File table and the Friend table. This way I can query these fields whenever I need them in the other services (Friend and File) without needing to make any REST calls to get the information every time it is queried.
Here is the caveat:
The downside seems to be that I have to notify the File and Friend tables (I chose Seneca with RabbitMQ) whenever a user updates their information in the User table.
1) Should I be worried about the services getting too chatty?
2) Could this lead to any performance issues if a lot of updates take place over an hour, let's say?
3) In trying to isolate boundaries, I just am not seeing another way of pulling this off. What is the recommended approach to solving this issue, and am I on the right track?
It's a trade off. I would personally not store the user details alongside the user identifier in the dependent services. But neither would I query the users service to get this information. What you probably need is some kind of read-model for the system as a whole, which can store this data in a way which is optimized for your particular needs (reporting, displaying together on a webpage etc).
The read-model is a pattern which is popular in the event-driven architecture space. There is a really good article that talks about these kinds of questions (in two parts):
https://www.infoq.com/articles/microservices-aggregates-events-cqrs-part-1-richardson
https://www.infoq.com/articles/microservices-aggregates-events-cqrs-part-2-richardson
Many common questions about microservices seem to be largely around the decomposition of a domain model, and how to overcome situations where requirements such as querying resist that decomposition. This article spells the options out clearly. Definitely worth the time to read.
In your specific case, it would mean that the File and Friends services would only need to store the primary key for the user. However, all services should publish state changes which can then be aggregated into a read-model.
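To make the read-model idea concrete, here is a small, broker-agnostic Java sketch; the event shape and field names are invented for illustration, and the in-memory map stands in for whatever store you pick for the read-model:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Read-model sketch: the User service publishes "UserUpdated" events; a consumer
// (wire this handler to your RabbitMQ/Kafka subscription) keeps a denormalized
// view of the user fields the File/Friends screens need.
public class UserReadModel {

    public record UserUpdated(String userId, String firstName, String middleName, String lastName) { }

    private final Map<String, UserUpdated> latestByUserId = new ConcurrentHashMap<>();

    // Call this when a UserUpdated event arrives.
    public void on(UserUpdated event) {
        latestByUserId.put(event.userId(), event); // idempotent upsert keyed by user id
    }

    public String displayName(String userId) {
        UserUpdated u = latestByUserId.get(userId);
        return u == null ? null : u.firstName() + " " + u.lastName();
    }
}

The File and Friends services (or a dedicated query service) would then query this read-model instead of storing the name fields in their own tables.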
If you are worried about a high volume of messages and high TPS, for example 100,000 TPS for producing and consuming events, I suggest that instead of RabbitMQ you use Apache Kafka or NATS (the Go version; NATS also has a Ruby version) in order to support a high volume of messages per second.
Regarding database design, you should design each microservice around business capabilities and bounded contexts according to domain-driven design (DDD). Because, unlike SOA, it is suggested that each microservice has its own database, you should not worry about normalization: you may have to repeat many structures, fields, tables and features for each microservice in order to keep them decoupled from each other and let them work independently, which raises availability and scalability.
You can also use event sourcing + CQRS, or transaction log tailing, to circumvent 2PC (two-phase commit), which is not recommended when implementing microservices, in order to exchange events between your microservices and manipulate state with eventual consistency, in line with the CAP theorem.
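One common way to implement the transaction-log-tailing idea is a transactional outbox: the state change and the event row are committed in a single local transaction, and a log tailer (for example Debezium) publishes the outbox rows to Kafka/NATS. A rough JDBC sketch, with made-up table names and connection details:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// The user update and the outbox event are written in ONE local transaction (no 2PC);
// a separate tailer reads the outbox/commit log and publishes the event downstream.
public class UserUpdateWithOutbox {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection("jdbc:postgresql://localhost/users", "app", "secret")) {
            con.setAutoCommit(false);
            try (PreparedStatement upd = con.prepareStatement(
                         "UPDATE users SET first_name = ? WHERE id = ?");
                 PreparedStatement out = con.prepareStatement(
                         "INSERT INTO outbox (aggregate_id, type, payload) VALUES (?, 'UserUpdated', ?)")) {
                upd.setString(1, "Alice");
                upd.setString(2, "user-123");
                upd.executeUpdate();
                out.setString(1, "user-123");
                out.setString(2, "{\"userId\":\"user-123\",\"firstName\":\"Alice\"}");
                out.executeUpdate();
            }
            con.commit(); // both rows become visible atomically
        }
    }
}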

How to avoid the same queue job being processed more than once when scaled across multiple dynos on Heroku

We have a Node.js application running LoopBack, the main purpose of which is to process orders received from the client. Currently the entire order process is handled during the single HTTP request used to place the order, including the payment, insertion into the database, sending confirmation emails, etc.
We are finding that this method, whilst working at the moment, lacks scalability - the application is going to need to process, potentially, thousands of orders per minute as it grows. In addition, our order process currently writes data to our own database; however, we are now looking at third-party integrations (till systems) over whose speed and availability we have no control.
In addition, we also currently have a potential race condition; we have to assign a 'short code' to each order for easy reference by the client - these need to rotate, so if the starting number is 1 and the maximum is 100, the 101st order must be assigned the number 1. At the moment we are looking at the previous order and either incrementing the previous reference by 1 or setting it back to the start - obviously this is fine at the moment due to the low traffic - however as we scale this could result in multiple orders being assigned the same reference number.
Therefore, we want to implement a queue to manage all of this. Our app is currently deployed on Heroku, where we already use a worker process for some of the monthly number crunching our app requires. Whilst having read some of the Heroku articles on implementing a queue (https://devcenter.heroku.com/articles/asynchronous-web-worker-model-using-rabbitmq-in-node, https://devcenter.heroku.com/articles/background-jobs-queueing) it is not clear how, over multiple worker dynos, we would ensure the order in which these queued items are processed and that the same job is not processed more than once by multiple dynos. The order of processing is not so important, however the lack of repetition is extremely important as if two orders are processed concurrently we run the risk of the above race condition.
So essentially my question is this; how do we avoid the same queue job being processed more than once when scaled across multiple dynos on Heroku?
What you need is already provided by RabbitMQ, the message broker used by the CloudAMQP add-on of Heroku.
You don't need to worry about the race condition of multiple workers. A job placed onto the queue is stored until a consumer retrieves it. When a worker consumes a job from the queue, no other workers will be able to consume it.
RabbitMQ manages all such aspects of the message queuing paradigm.
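To illustrate (in Java rather than Node, to match the other sketches in this thread), a worker that lets RabbitMQ hand each job to exactly one dyno might look like this; the queue name is an example, and CLOUDAMQP_URL is the environment variable the CloudAMQP add-on sets:

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;
import java.nio.charset.StandardCharsets;

// Each worker dyno runs this consumer; RabbitMQ delivers every message to exactly one
// consumer, and the manual ack means a job is only redelivered if a worker dies
// before acknowledging it.
public class OrderWorker {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setUri(System.getenv("CLOUDAMQP_URL"));
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();
        channel.queueDeclare("orders", true, false, false, null);
        channel.basicQos(1); // at most one unacknowledged job per worker at a time
        DeliverCallback onDelivery = (consumerTag, delivery) -> {
            String order = new String(delivery.getBody(), StandardCharsets.UTF_8);
            System.out.println("Processing order " + order); // payment, DB writes, emails, ...
            channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
        };
        channel.basicConsume("orders", false, onDelivery, consumerTag -> { });
    }
}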
A couple of links useful for your project:
What is RabbitMQ?
Getting started with RabbitMQ and Node.js
