Kafka ACL: performance and memory impact

I have a Kafka service running in production and I am using the default Kafka ACL authorizer, so the ACLs are stored in ZooKeeper.
Currently around 500 users use the Kafka service, and for each user I add about 2 ACL rules with kafka-acls.sh. This works fine.
But I am worried about what happens as the user count grows. If it reaches, say, 1 million users, I could end up with around 2 million ACL rules in ZooKeeper (2 rules per user).
Will this cause any performance or memory problems for Kafka or ZooKeeper?
Thanks!
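For concreteness, "2 ACL rules per user" typically means something like an allow-read and an allow-write rule on a per-user resource. A minimal sketch of that pattern using Kafka's Java AdminClient (the broker address, principal, and topic name below are placeholders, and this goes through the brokers rather than writing to ZooKeeper directly the way kafka-acls.sh --authorizer-properties does):

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

import java.util.Arrays;
import java.util.Properties;

public class PerUserAcls {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker

        try (AdminClient admin = AdminClient.create(props)) {
            String principal = "User:alice";                 // hypothetical user
            ResourcePattern topic = new ResourcePattern(
                    ResourceType.TOPIC, "alice-topic", PatternType.LITERAL); // hypothetical topic

            // Rule 1: allow the user to read from the topic.
            AclBinding read = new AclBinding(topic,
                    new AccessControlEntry(principal, "*", AclOperation.READ, AclPermissionType.ALLOW));
            // Rule 2: allow the user to write to the topic.
            AclBinding write = new AclBinding(topic,
                    new AccessControlEntry(principal, "*", AclOperation.WRITE, AclPermissionType.ALLOW));

            admin.createAcls(Arrays.asList(read, write)).all().get();
        }
    }
}
```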

Related

10k requests per second with Postgres

I have a Spring worker application that gets messages from RabbitMQ with a concurrency of 50. For each message, the application checks some variables in the DB against the message and inserts the resulting differences into the DB (for one message this means roughly 5-20 SELECTs, 1-5 INSERTs, and 1-5 UPDATEs).
The problem is that the worker application writes to the DB very slowly once concurrency is set to 200 (about 200k messages inserted in two days).
Besides this, I have another Spring application for monitoring, and everything runs very slowly: the DB, the worker app, and the monitoring app.
How can I make this faster and optimize it? Should I use a Postgres cluster, or can I implement it another way?
My Postgres server: Intel Xeon, 10 cores, 60 GB RAM, 1.6 TB SSD.
You can use a shared cache: load your data on application startup and update the cache on create/update operations. Then read whatever you need from the cache instead of going to the DB for every message.
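A minimal per-instance sketch of that idea with Spring; VariableRepository and Variable are hypothetical placeholders for whatever repository and entity hold the variables being checked:

```java
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Service;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

@Service
public class VariableCache {

    private final VariableRepository repository;              // hypothetical Spring Data repository
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    public VariableCache(VariableRepository repository) {
        this.repository = repository;
    }

    // Load all variables once when the application is ready.
    @EventListener(ApplicationReadyEvent.class)
    public void preload() {
        repository.findAll().forEach(v -> cache.put(v.getKey(), v.getValue()));
    }

    // Reads hit the in-memory map instead of issuing a SELECT per message.
    public String get(String key) {
        return cache.get(key);
    }

    // Writes go to the database first, then refresh the cached entry.
    public void save(String key, String value) {
        repository.save(new Variable(key, value));             // hypothetical entity
        cache.put(key, value);
    }
}
```

If several worker instances need to see each other's writes, the same shape applies with an external store such as Redis or Hazelcast in place of the local map.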

Microservices: Simultaneous cache updates

I am developing a microservice. It will be deployed in Docker containers and orchestrated by Kubernetes. I have to implement a caching solution using the Hazelcast distributed cache. My requirements are:
Preload the cache on startup of the microservice: for around 3000 stores, fetch two specific attributes and cache them.
Refresh the cache every 24 hours.
I implemented a Spring @EventListener that, on startup, makes a database call for the 2 attributes and stores them in the cache with @CachePut.
I also have a Spring scheduler with a cron expression to refresh the cache every day at 6 AM.
So far so good.
But what I did not account for is that in a clustered environment, 10-15 instances of my microservice will be running and will try to do the above 2 steps almost simultaneously, creating a stampede on my database and cache. Does anyone know what to do in this scenario? Is there a good design (or even an average one) that I can follow?
Thanks.
You should look at Hazelcast's Loading and Storing Persistent Data mechanism, which offers two options for writing (write-through and write-behind) and read-through for loading data into the cache.
Look at MapLoader and its methods; they let you warm up/preload your cluster, and you are free to do that with your own implementation.
See the documentation for more details: https://docs.hazelcast.org/docs/3.11/manual/html-single/index.html#loading-and-storing-persistent-data
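A minimal MapLoader sketch against the Hazelcast 3.x API referenced above; StoreAttributes and StoreDao are hypothetical stand-ins for the ~3000 stores and their two attributes. Because the load is driven by the cluster through the MapLoader rather than by each application instance calling the database itself, the startup stampede described in the question is avoided:

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.MapStoreConfig;
import com.hazelcast.core.MapLoader;

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

public class StoreAttributeLoader implements MapLoader<String, StoreAttributes> {

    private final StoreDao dao = new StoreDao();    // hypothetical DAO for the store attributes

    @Override
    public StoreAttributes load(String storeId) {
        return dao.findAttributes(storeId);         // hypothetical query for the two attributes
    }

    @Override
    public Map<String, StoreAttributes> loadAll(Collection<String> storeIds) {
        Map<String, StoreAttributes> result = new HashMap<>();
        for (String id : storeIds) {
            result.put(id, dao.findAttributes(id));
        }
        return result;
    }

    @Override
    public Iterable<String> loadAllKeys() {
        return dao.findAllStoreIds();               // drives the warm-up/preload of the map
    }

    // Wiring it into the map configuration; EAGER preloads the map when it is first created.
    public static Config config() {
        MapStoreConfig mapStoreConfig = new MapStoreConfig()
                .setEnabled(true)
                .setImplementation(new StoreAttributeLoader())
                .setInitialLoadMode(MapStoreConfig.InitialLoadMode.EAGER);

        Config config = new Config();
        config.getMapConfig("storeAttributes").setMapStoreConfig(mapStoreConfig);
        return config;
    }
}
```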

Process Laravel/Redis jobs from multiple servers

We are building a reporting app in Laravel that needs to fetch user data from a third-party server that allows 1 request per second.
We need to fetch 100K to 1000K rows per user, and we can fetch at most 250 rows per request.
So the restrictions are:
1. We can send 1 request per second
2. 250 rows per request
That means 400-4000 requests/jobs to fetch one user's data, so loading data for multiple users is very time-consuming and the server slows down.
We are now planning to load the data using multiple servers, say 4-10, so that we can send up to 10 requests per second across 10 servers.
How can we design the system to process jobs from multiple servers?
Is it possible to host Redis on a dedicated server and connect to that Redis server from multiple servers to execute jobs? Can any conflicts or race conditions happen?
Any hints or prior experience related to this would be really helpful.
The short answer is yes, this is absolutely possible and is something I've implemented in production apps many times before.
Redis is just like any other service and can run anywhere, with clients connecting to it from anywhere. It's all up to your configuration of the server to dictate exactly how that happens (adding passwords, configuring spiped, limiting access via the firewall, etc.). I'd recommend reading the documentation in the Administration section here: https://redis.io/documentation
Also, when you do make the move to a dedicated Redis host, with multiple clients accessing it, you'll likely want to look into having more than just one Redis server running for reliability, high availability, etc. Redis has efficient and easy replication available with a few simple configuration commands, which you can read more about here: https://redis.io/topics/replication
Last thing on Redis, if you do end up implementing a master-slave set up, you may want to look into high availability and auto-failover if your Master instance were to go down. Redis has a really great utility built into the application that can monitor your Master and Slaves, detect when the Master is down, and automatically re-configure your servers to promote one of the slaves to the new master. The utility is called Redis Sentinel, and you can read about that here: https://redis.io/topics/sentinel
For your question about race conditions, it depends on how exactly you write the jobs that are pushed onto the queue. For your use case it doesn't sound like this would be too much of an issue, but it really depends on the constraints of the third-party system. Either way, if you are subject to a race condition, you can still implement a solution for it, but you would likely need something like a Redis lock (https://redis.io/topics/distlock). Taylor recently added a feature to the upcoming Laravel 5.6 that I believe implements a version of the Redis lock in the scheduler (https://medium.com/@taylorotwell/laravel-5-6-preview-single-server-scheduling-54df8e0e139b). You can look into how that was implemented and adapt it for your use case if you end up needing it.
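The lock pattern behind that link boils down to a SET with NX/PX plus a guarded delete. A rough sketch of it in Java with the Jedis client (host, key name, and timeout are assumptions, and in a Laravel app you would normally reach for the framework's own Redis/lock facilities rather than hand-rolling this):

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

import java.util.Collections;
import java.util.UUID;

public class SimpleRedisLock {

    // Delete the key only if it still holds our token, so we never release someone else's lock.
    private static final String UNLOCK_SCRIPT =
            "if redis.call('get', KEYS[1]) == ARGV[1] then " +
            "  return redis.call('del', KEYS[1]) " +
            "else return 0 end";

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("redis-host", 6379)) {   // assumed dedicated Redis host
            String lockKey = "lock:third-party-api";          // hypothetical key name
            String token = UUID.randomUUID().toString();      // unique value identifying this owner

            // SET key value NX PX 2000 -> "OK" only if no other worker currently holds the lock.
            String ok = jedis.set(lockKey, token, SetParams.setParams().nx().px(2000));
            if ("OK".equals(ok)) {
                try {
                    // Safe section: call the rate-limited third-party API here.
                } finally {
                    jedis.eval(UNLOCK_SCRIPT,
                            Collections.singletonList(lockKey),
                            Collections.singletonList(token));
                }
            }
        }
    }
}
```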

WebSphere PMI - How to find total number of concurrent users?

We have a clustered WebSphere environment with 4 nodes in the cluster. I am trying to find the number of concurrent users of the application. I have turned on the Performance Monitoring Infrastructure (PMI) in WAS for Servlet, JVM and Thread Pool. When I monitor with the Tivoli Performance Viewer, I believe I need to look at "LiveCount" under "Servlet Session Manager", but the count seems VERY high, more than I was expecting (LiveCount shows 80-100).
Is this the metric to look at when trying to find the total number of concurrent users?
Does it keep a true login count, or is it tracking the number of sessions?
I was told the underlying application creates only 1 session per login until it times out, at which point the user has to log in again. So to me, concurrent user count = session count = this Servlet Session LiveCount.
Can anyone tell me which PMI metric I should look at to get the concurrent user count? We are on WebSphere 7.x.
I believe I need to sum this metric across all 4 nodes to come up with the total number of concurrent users.
You may want to distinguish between logged-on users and concurrent users, depending on what you are really looking for.
Logged-on users have successfully authenticated and have a valid session in the application, but they may currently be doing nothing in your application, or may have abandoned it without logging out.
They are represented by LiveCount - the total number of sessions that are currently live.
Concurrent users are users simultaneously accessing your application. They are represented by ActiveCount - the total number of sessions that are currently being accessed by requests.
And yes, you have to sum this across all your application servers.
See also:
Servlet session counters

Alternative to session replication / Tomcat clustering

We have 3 Tomcats running the same web app and using the same DB.
We want to use non-sticky sessions.
This means we will have to share (replicate) the sessions between the Tomcats (cluster?).
We don't like the idea of the DeltaManager since it is an all-to-all replication with a performance cost.
However, we don't really like the BackupManager either (still multiple copies).
My question is:
Is it possible to define a single Tomcat that acts as a "session manager" while all the other Tomcats keep no sessions themselves?
That way no broadcasting of sessions would be needed.
My reading of the Tomcat docs finds:
... when using the delta manager it will replicate to all nodes, even
nodes that don't have the application deployed.
exactly as you say, but then says:
To get around this problem, you'll want to use the BackupManager. This
manager only replicates the session data to one backup node
You seem to object to "multiple copies", but this doesn't seem very different from your proposed suggestion; the BackupManager is, as far as I can see, acting as a session manager.
When you don't have sticky sessions, you are pretty much guaranteeing that 2 out of every 3 requests will need to get a copy of the session data from somewhere else. With only 3 Tomcats, how much performance cost would all-to-all replication really impose?
I suspect that tuning your session sizes matters more: large sessions tend to be a problem for any sort of replication.
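If you do decide to live with the single backup copy, the BackupManager is enabled per Cluster element in server.xml, roughly like this (the attribute values are illustrative, not tuned, and the channel/membership configuration is omitted):

```xml
<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
         channelSendOptions="6">
  <Manager className="org.apache.catalina.ha.session.BackupManager"
           expireSessionsOnShutdown="false"
           notifyListenersOnReplication="true"
           mapSendOptions="6"/>
  <!-- Channel, membership, and valve configuration omitted; see the Tomcat clustering docs. -->
</Cluster>
```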
