How to measure resource usage on partition level in Service Fabric? - performance

With Service Fabric we get the tools to create custom metrics and capacities, so we can each build our own resource model that the Resource Balancer acts on at runtime. I would like to monitor and use physical resources such as memory, CPU, and disk usage. This works fine as long as we keep using the default load.
But load is not static for a service/actor, so I would like to use the built-in dynamic load reporting. This is where I run into a problem: ReportLoad works at the level of partitions, yet all partitions live within the same process on a node. Every method for monitoring physical resources I have found, such as PerformanceCounter, uses the process as the smallest unit of measurement. If that value were used, there could be hundreds of partitions reporting the same load, and a load that is not representative of the individual partition.
So the question is: How can the resource usage be measured on partition level?

Not only are service instances and replicas hosted in the same process, but they also share a thread pool by default in .NET! Every time you create a new service instance, the platform actually just creates an instance of your service class (the one that derives from StatefulService or StatelessService) inside the host process. This is great because it's fast, cheap, and you can pack a ton of services into a single host process and on to each VM or machine in your cluster.
But it also means resources are shared, so how do you know how much each replica of each partition is using?
The answer is that you report load on virtual resources rather than physical resources. The idea is that you, the service author, can keep track of some measurement about your service, and you formulate metrics from that information. Here is a simple example of a virtual resource that's based on physical resources:
Suppose you have a web service. You run a load test on your web service and you determine the maximum requests per second it can handle on various hardware profiles (using Azure VM sizes and completely made-up numbers as an example):
A2: 500 RPS
D2: 1000 RPS
D4: 1500 RPS
Now when you create your cluster, you set your capacities accordingly based on the hardware profiles you're using. So if you have a cluster of D2s, each node would define a capacity of 1000 RPS.
Then each instance (or replica if stateful) of your web service reports an average RPS value. This is a virtual resource that you can easily calculate per instance/replica. It corresponds to some hardware profile, even though you're not reporting CPU, network, memory, etc. directly. You can apply this to anything that you can measure about your services, e.g., queue length, concurrent user count, etc.
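As a rough sketch of what that reporting could look like from inside a replica, assuming the .NET StatefulService model, a metric named "RPS" configured for the service, and a one-minute averaging window (the class name, metric name, and interval are all illustrative, not anything prescribed by the platform):

```csharp
using System;
using System.Fabric;                              // LoadMetric, StatefulServiceContext
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Runtime;   // StatefulService

internal sealed class WebService : StatefulService
{
    // Incremented by the request-handling code (not shown) for each request this replica serves.
    private long requestCount;

    public WebService(StatefulServiceContext context) : base(context) { }

    protected override async Task RunAsync(CancellationToken cancellationToken)
    {
        while (!cancellationToken.IsCancellationRequested)
        {
            // Average requests per second over the last minute, for THIS replica only.
            int averageRps = (int)(Interlocked.Exchange(ref this.requestCount, 0) / 60);

            // Report the virtual metric. "RPS" must match a metric name defined for the
            // service, with node capacities set according to the hardware profile.
            this.Partition.ReportLoad(new[] { new LoadMetric("RPS", averageRps) });

            await Task.Delay(TimeSpan.FromMinutes(1), cancellationToken);
        }
    }
}
```

The key point is that the number being reported is something the replica itself can measure cheaply, while the capacities defined on the nodes translate it back into hardware terms.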
If you don't want to define a capacity as specific as requests per second, you can take a more general approach by defining physical-ish capacities for common resources, like memory or disk usage. But what you're really doing here is defining usable memory and disk for your services rather than total available. In your services you can keep track of how much of each capacity each instance/replica uses. But it's not a total value, it's just the stuff you know about. So for example if you're keeping track of data stored in memory, it wouldn't necessarily include runtime overhead, temporary heap allocations, etc.
I have an example of this approach in a Reliable Collection wrapper I wrote that reports load metrics strictly on the amount of data you store, by counting bytes: https://github.com/vturecek/metric-reliable-collections. It doesn't report total memory usage, so you have to come up with a reasonable estimate of how much overhead you need and define your capacities accordingly. At the same time, by not reporting temporary heap allocations and other transient memory usage, the reported metrics should be much smoother and more representative of the data you're actually storing (you don't necessarily want to re-balance the cluster simply because the .NET GC hasn't run yet, for example).
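As a rough, hypothetical sketch of that byte-counting idea (this is not the implementation in the repository above; the wrapper class, the "DataSizeKB" metric name, and the choice to count only payload bytes are assumptions for illustration):

```csharp
using System.Fabric;                               // IStatefulServicePartition, LoadMetric
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Data;                // IReliableStateManager, ITransaction
using Microsoft.ServiceFabric.Data.Collections;    // IReliableDictionary

internal sealed class MeteredStore
{
    private readonly IReliableStateManager stateManager;
    private readonly IStatefulServicePartition partition;
    private long storedBytes;   // running count of payload bytes this replica knows it has stored

    public MeteredStore(IReliableStateManager stateManager, IStatefulServicePartition partition)
    {
        this.stateManager = stateManager;
        this.partition = partition;
    }

    public async Task SetAsync(string key, string value)
    {
        var dictionary = await this.stateManager
            .GetOrAddAsync<IReliableDictionary<string, string>>("store");

        using (ITransaction tx = this.stateManager.CreateTransaction())
        {
            await dictionary.SetAsync(tx, key, value);
            await tx.CommitAsync();
        }

        // Simplified: counts only the bytes we wrote (no subtraction on overwrite/delete).
        // Runtime overhead is covered by leaving headroom in the metric's capacity.
        Interlocked.Add(ref this.storedBytes, Encoding.UTF8.GetByteCount(key + value));
    }

    public void ReportDataSize()
    {
        // "DataSizeKB" is an assumed metric name configured with capacities on each node.
        int kilobytes = (int)(Interlocked.Read(ref this.storedBytes) / 1024);
        this.partition.ReportLoad(new[] { new LoadMetric("DataSizeKB", kilobytes) });
    }
}
```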

Related

EC2 host type for a DynamoDB batchWrite call

I have a requirement to bulk upload an Excel sheet to a DynamoDB table, and the maximum number of rows is 200,000. The bulk-upload website will be used infrequently, so we can assume only 1-2 bulk uploads are being processed at a given time. In the backend, I am using the Apache POI API to parse the Excel sheet into DynamoDB items.
Because we can only send up to 25 items in a batchWriteItem call, the current latency is around 15 minutes (900 seconds) to completely upload all 200,000 items. Hence I am planning to implement multi-threading to execute multiple batchWriteItem API calls in parallel. Can you help me understand which EC2 host types are best suited for this kind of multi-threading?
Any references will be really helpful.
Normally, multi-threading benefits from an instance type that has multiple vCPUs.
However, you are describing a workload that spends its time waiting on the network rather than on the CPU, so the operation you describe is likely not heavily impacted by CPU utilization.
The best way to answer your question is to recommend that you experiment with different instance types to find the one that is best for your application's combination of needs:
Pick an instance family (e.g., m5) and try a few different sizes
Compare this against another family (e.g., c5) to see whether the improved performance is worth the extra cost
Monitor the application to find the bottleneck, which will be RAM, CPU, network, or disk access
Please note that smaller instances have less Network bandwidth, so you might need to choose a larger instance type to avoid being throttled on network bandwidth. This might result in excess CPU that isn't being fully utilized.
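Because the work is mostly waiting on the network, keeping many requests in flight matters more than raw core count. A rough sketch of that fan-out is shown below in C# with the AWS SDK for .NET (the question uses Java, so treat this as an illustration of the pattern only; the table name, item shape, and concurrency limit are placeholders):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

public static class BulkLoader
{
    // Writes items in chunks of 25 (the BatchWriteItem limit), several chunks in flight at once.
    public static async Task UploadAsync(
        IAmazonDynamoDB client,
        string tableName,
        IReadOnlyList<Dictionary<string, AttributeValue>> items,
        int maxInFlight = 8)                       // assumed concurrency limit, tune per table capacity
    {
        var chunks = items
            .Select((item, index) => (item, index))
            .GroupBy(x => x.index / 25, x => x.item)
            .Select(g => g.ToList())
            .ToList();

        using var throttle = new SemaphoreSlim(maxInFlight);

        var tasks = chunks.Select(async chunk =>
        {
            await throttle.WaitAsync();
            try
            {
                var request = new BatchWriteItemRequest
                {
                    RequestItems = new Dictionary<string, List<WriteRequest>>
                    {
                        [tableName] = chunk
                            .Select(i => new WriteRequest { PutRequest = new PutRequest { Item = i } })
                            .ToList()
                    }
                };

                var response = await client.BatchWriteItemAsync(request);

                // Unprocessed items come back in the response and must be retried;
                // a production loader would add backoff here.
                while (response.UnprocessedItems.Count > 0)
                {
                    response = await client.BatchWriteItemAsync(
                        new BatchWriteItemRequest { RequestItems = response.UnprocessedItems });
                }
            }
            finally
            {
                throttle.Release();
            }
        });

        await Task.WhenAll(tasks);
    }
}
```

With asynchronous I/O like this, even a modest instance can keep many requests in flight; the experiment described above is still the way to confirm where your real bottleneck is.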

Application Architecture for scalable hyperledger v1.4 with IOT data

I am working on the Hyperledger Application that can store sensor data from IoT.
Using HLF v1.4 with Raft. Each IoT device will provide JSON data at fixed intervals which gets stored in Hyperledger. I have worked with HLF v1.3 which doesn't scale very well.
With v1.4, I am planning to start with 2 organization setup with 5 peers for each organization.
But the limiting factor seems to be that as the number of blocks grows with new transactions, querying the network takes longer and longer.
What steps can be taken to scale HLF with v1.4 and onwards?
What server specs (RAM, CPUs) should be chosen for good performance, e.g., when selecting an EC2 instance?
You can change the block size: if you increase it, the number of blocks produced for the same transaction volume goes down (in Fabric this is governed by the orderer's BatchSize settings, such as MaxMessageCount and PreferredMaxBytes, in configtx.yaml). For better query and invoke performance, limit how much data you store on the blockchain and keep large payloads off-chain. Compute capacity also matters for throughput (TPS), so try instance types like t3.medium, or larger such as t3.large.

Azure Redis cache latency

I am working on an application with a web job and an Azure Function app. The web job populates the Redis cache for the function app to consume. The cache size is around 10 MB. I am using lazy loading and the other recommended practices. I still find that the overall cache operation is slow. Depending on the size of the file I am processing, I may end up calling the Redis cache up to 100,000 times. I am wondering if I need to hold the cache data in a local variable instead of reading it from Redis every time. Has anyone experienced latency in accessing Redis? Does it make sense to create a singleton object in the C# function app and refresh it based on a timer or other logic?
Could you consider these points in your usage? These are some good practices for Azure Redis Cache:
Redis works best with smaller values, so consider chopping up bigger data into multiple keys. In this Redis discussion, 100kb is considered "large". Read this article for an example problem that can be caused by large values.
Use Standard or Premium Tier for Production systems. The Basic Tier is a single node system with no data replication and no SLA. Also, use at least a C1 cache. C0 caches are really meant for simple dev/test scenarios since they have a shared CPU core, very little memory, are prone to "noisy neighbor", etc.
Remember that Redis is an in-memory data store, so be aware of scenarios where data loss can occur.
Reuse connections - Creating new connections is expensive and increases latency, so reuse connections as much as possible (see the sketch after this list). If you choose to create new connections, make sure to close the old connections before you release them (even in managed-memory languages like .NET or Java).
Locate your cache instance and your application in the same region. Connecting to a cache in a different region can significantly increase latency and reduce reliability. Connecting from outside of Azure is supported, but not recommended especially when using Redis as a cache (as opposed to a key/value store where latency may not be the primary concern).
Configure your maxmemory-reserved setting to improve system responsiveness under memory pressure conditions, especially for write-heavy workloads or if you are storing larger values (100KB or more) in Redis. I would recommend starting with 10% of the size of your cache, then increase if you have write-heavy loads. See some considerations when selecting a value.
Avoid expensive commands - Some Redis operations, like the KEYS command, are VERY expensive and should be avoided.
Configure your client library to use a "connect timeout" of at least 10 to 15 seconds, giving the system time to connect even under higher CPU conditions. If your client or server tend to be under high load, use an even larger value. If you use a large number of connections in a single application, consider adding some type of staggered reconnect logic to prevent a flood of connections hitting the server at the same time.
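Putting the connection-reuse and connect-timeout points together, a common pattern in a C# function app is a single, lazily created ConnectionMultiplexer shared by the whole process. A minimal sketch, assuming StackExchange.Redis and with the environment-variable name and timeout values as placeholders:

```csharp
using System;
using StackExchange.Redis;

public static class RedisConnection
{
    // One multiplexer shared by the whole process; StackExchange.Redis is designed
    // to be reused rather than created per call.
    private static readonly Lazy<ConnectionMultiplexer> LazyConnection =
        new Lazy<ConnectionMultiplexer>(() =>
        {
            var options = ConfigurationOptions.Parse(
                Environment.GetEnvironmentVariable("REDIS_CONNECTION_STRING"));   // placeholder setting name
            options.ConnectTimeout = 15000;        // at least 10-15 seconds, per the guidance above
            options.AbortOnConnectFail = false;    // keep retrying instead of failing the first connect
            return ConnectionMultiplexer.Connect(options);
        });

    public static IDatabase Database => LazyConnection.Value.GetDatabase();
}

// Usage inside a function or web job:
//   string value = RedisConnection.Database.StringGet("some-key");
```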

Balancing Redis queries and in-process memory?

I am a software developer (and wannabe architect) who is new to the world of server scalability.
The context is multiple services working with the same data set, aiming to scale for redundancy and load balancing.
The question is: in an idealized system, should services optimize their internal processing to reduce the number of queries made to the remote server cache, for better performance and less bandwidth, at the cost of some local memory and extra code? Or is it better to go all-in and query the remote cache as the single transaction point every time a transaction needs processing done on the data?
When I read about Redis, and even general database usage, online, the latter seems to be the common option: every node of the scaled application keeps no local state and reads and writes directly to the remote cache on every transaction.
But as a developer, I ask whether this isn't a tremendous waste of resources. Whether you are designing at the chip level, or at the inter-thread, inter-process, or inter-machine level, I believe it is the responsibility of each sub-system to do whatever it can to optimize its own processing without depending on the external world, and hence reduce overall operation time.
I mean, if the same data is read hundreds of times by the same service without changes (writes), isn't it more logical to keep a local cache, wait for change notifications (pub/sub), and read only those changes to update the cache, instead of reading the bigger portion of data every time a transaction requires it? On the other hand, I understand that this approach means the same data is duplicated in multiple places (more RAM usage) and requires some sort of expiration system to keep the cache from filling up.
I know Redis is built to be fast. But however fast it is, in my opinion there's still a massive difference between reading directly from local memory and querying an external service: transferring data over the network, allocating memory, deserializing it into proper objects, and garbage-collecting it when you are finished. Does anyone have benchmark numbers comparing an in-process dictionary lookup with a Redis query on localhost? Is the difference negligible in the bigger scheme of things, or is it an important factor?
Now, I believe the real answer to my question until now is "it depends on your usage scenario", so let's elaborate:
Some of our services trigger actions on conditions of data change, others periodically crunch data, others periodically read new data from an external network source, and finally others are responsible for presenting data to users, letting them trigger actions and bring in new data. So it's a bit more complex than a single page-serving web service. We already have a cache-system codebase in most services, and we have a message-broker system to notify data changes and trigger actions. Currently only one service of each type exists (not scaled). They transfer small volatile data over messages and bigger, more persistent (less frequently changing) data over SQL. We are in the process of moving pretty much all data to Redis to ease scalability and performance. Now some colleagues are having a heated discussion about whether we should abandon the cache system altogether and use Redis as the common global cache, or keep our notification/refresh system. We were wondering what the external world thinks about it. Thanks
(damn that's a lot of text)
I would favor utilizing in-process memory as much as possible. Any remote query introduces latency. You can use a hybrid approach and utilize in-process cache for speed (and it is MUCH faster) but put a significantly shorter TTL on it, and then once expired, reach further back to Redis.
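A minimal sketch of that hybrid read path in C#, assuming StackExchange.Redis plus Microsoft.Extensions.Caching.Memory, with a made-up 30-second local TTL:

```csharp
using System;
using Microsoft.Extensions.Caching.Memory;
using StackExchange.Redis;

public sealed class HybridCache
{
    private readonly IDatabase redis;
    private readonly MemoryCache local = new MemoryCache(new MemoryCacheOptions());
    private readonly TimeSpan localTtl = TimeSpan.FromSeconds(30);   // much shorter than any Redis expiry

    public HybridCache(IConnectionMultiplexer connection)
    {
        this.redis = connection.GetDatabase();
    }

    public string Get(string key)
    {
        // Fast path: in-process memory, no network hop.
        if (this.local.TryGetValue(key, out string cached))
        {
            return cached;
        }

        // Slow path: go to Redis, then cache locally for a short time so staleness is bounded.
        string value = this.redis.StringGet(key);
        if (value != null)
        {
            this.local.Set(key, value, this.localTtl);
        }
        return value;
    }
}
```

The short local TTL is the trade-off knob: the longer it is, the fewer Redis round trips you pay for, and the longer a stale value can survive on a node.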

Are Amazon's micro instances (Linux, 64bit) good for MongoDB servers?

Do you think using an EC2 instance (Micro, 64bit) would be good for MongoDB replica sets?
Seems like if that is all they did, and with 600+ MB of RAM, one could use them for a nice replica set.
Also, would they make good primary (write) servers too?
My database is only 1-2 gigs now but I see it growing to 20-40 gigs this year (hopefully).
Thanks
They COULD be good, depending on your data set, but they likely will not be very good.
For starters, you don't get much RAM with those instances. Consider that you will be running an entire operating system and all related services: 613 MB of RAM can get filled up very quickly.
MongoDB tries to keep as much data in RAM as possible, and that won't be possible if your data set is 1-2 GB; it becomes even more of a problem if your data set grows to 20-40 GB.
Secondly, they are labeled as "Low IO performance", so when your data swaps to disk (and it will, given the size of that data set), you are going to suffer from slow disk reads due to low IO throughput.
Be aware that micro instances are designed for spiky CPU usage, and you will be throttled to the "low background level" if you exceed the allotment.
The AWS Micro Documentation has good information on what they are intended for.
Between the CPU limits and the not-very-good IO performance, my experience using micros for development/testing has not been very good (larger instance types have been fine, though), but a micro may work for your use case.
However, there are exceptions for config or arbiter nodes; I believe a micro should be good enough for those types of machines.
There is also some mongodb documentation specific to EC2 which might help.
