How to send an entire Entity Framework Core context from server to client - performance

I have an ASP.NET Core server application that uses Entity Framework Core to serve the data of its SQL Server database.
There are clients that can consume the data via REST API calls.
At the beginning of the communication the clients need all of the data, but with the existing REST calls this takes minutes, as the context contains thousands of entities.
So I looked for ideas and tried the following.
The most promising idea was some kind of serialization, so I created this method:
public byte[] GetData()
{
    string data = Newtonsoft.Json.JsonConvert.SerializeObject(
        this.ChangeTracker,
        new Newtonsoft.Json.JsonSerializerSettings
        {
            ReferenceLoopHandling = Newtonsoft.Json.ReferenceLoopHandling.Ignore
        });
    return CompressAsGZip(data);
}
The results:
Serializing (and then compressing) the ChangeTracker of the context: initially the ChangeTracker is empty, so I can't do this unless I query all of the data first.
Serializing (and then compressing) the entire DbContext: it has so many objects that after about 20% I got an OutOfMemoryException.
Should I create a database backup and send the compressed .bak file? I guess I couldn't restore it anywhere, as the clients use a different database provider (SQLite).
What would be the best way to send all the data to the client as fast as possible?

Related

How to deal with access collision while loading data from external sources?

I have a Spring service that retrieves data from several external APIs very often (every 1000-2000 ms). Loading the data takes 100-200 ms. If a client asks for the data at the moment it is being loaded, the data would not be available, because it takes some time to process the returned data (I am converting it from the format returned by the API to the internal storage format, a java.util.Map instance).
This service is heavily used in the system and clients query the data very often, so a collision (where the data is not available at that moment) is very likely. Maybe it would be ideal to pause the method that clients use to access the data while data is being loaded from an external API. Or use some type of semaphore?
Q: Is there a general method / best practice to avoid such a collision in a Spring application?
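One pattern I have been looking at (just a minimal sketch with made-up names; it assumes scheduling is enabled via @EnableScheduling and that the load-and-convert call is wrapped in a Supplier) is an atomic snapshot swap: the refresh builds the new Map on the side and publishes it in a single step, so clients always read either the previous complete snapshot or the new one, and no pausing or semaphore is needed:

import java.util.Collections;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;

@Service
public class ExternalDataCache {

    // Clients always see a complete, immutable snapshot.
    private final AtomicReference<Map<String, Object>> snapshot =
            new AtomicReference<>(Collections.emptyMap());

    // Stand-in for the 100-200 ms load-and-convert call against the external APIs.
    private final Supplier<Map<String, Object>> loader;

    public ExternalDataCache(Supplier<Map<String, Object>> loader) {
        this.loader = loader;
    }

    // Runs in the background every 1500 ms and never blocks readers.
    @Scheduled(fixedDelay = 1500)
    public void refresh() {
        Map<String, Object> fresh = loader.get();
        snapshot.set(Collections.unmodifiableMap(fresh)); // atomic publish of the new snapshot
    }

    // The method clients call; it returns the last fully loaded snapshot.
    public Map<String, Object> getData() {
        return snapshot.get();
    }
}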

Make sure that data is loaded before the application startup | Spring webflux

I have a Spring WebFlux application.
I am loading a list from the database into a bean. I have two ways of implementing the loading of this bean.
Approach 1: Reactive Way
@Bean
public List<Item> getItemList() throws IOException {
    List<Item> itemList = new ArrayList<>();
    itemRespository.findAll().collectList().subscribe(itemList::addAll);
    return itemList;
}
Approach 2: Blocking way
@Bean
public List<Item> getItemList() throws IOException {
    List<Item> itemList = itemRespository.findAll().collectList().block();
    return itemList;
}
Now as I want my application to be reactive, I don't want to use the blocking way.
But the endpoints which I am exposing through my controller depend on this bean's data.
@RestController
public class SomeController {

    @Autowired
    private List<Item> getItemList;

    @GetMapping("/endpoint")
    public void process() {
        List<Item> list = getItemList; // this may not be initialized yet, as the bean is loaded reactively
        // some more code
    }
}
So with the reactive approach it may happen that somebody calls my endpoint (as the application has already started and is ready to serve requests) while, for some reason (e.g. a slow database server), the list has not yet been retrieved from the database, producing inconsistent results for the users calling my endpoint (which in turn depends on this bean).
I am looking for a solution for this scenario.
EDIT: The more precise question is: should I load beans that my exposed endpoints depend on reactively?
The current application architecture presented is a typical example of a design that is inherently blocking.
If the first request made to the API needs the items to be in place, then we must make sure that they are there before we can take on requests. And the only way to ensure that is to block until the items have actually been fetched and stored.
Since the design is inherently blocking, we need to rethink our approach.
What we want is to make the service available for requests as quickly as possible. We can solve this by using a cache that gets filled when the first request is made.
This means the application starts up with an empty cache. The cache could for instance be a @Component, since Spring beans are singletons by default.
The steps would be:
service starts up, cache is empty
service receives its first request
checks if there is data in the cache
if data is stale, evict the cache
if cache is empty, fetch the data from our source
fill the cache with our fetched data
set a ttl (time to live) on the data placed in the cache
return the data to the calling client
Second request:
request comes in to the service
checks if there is data in the cache
checks if the data is stale
if it is not stale, grab the data and return it to the calling subscriber
There are several cache solutions out there; Spring has its @Cacheable annotation, which by default is just a key-value store but can be paired with an external solution like Redis.
Another option is Google Guava, which has a very good write-up on its GitHub page.
This type of solution is called trading memory for CPU: we gain startup time and fast requests (CPU), but the cost is that we spend some more memory to hold the data in a cache.
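As a rough illustration of the idea (just a sketch; Item and the repository are meant to be the ones from the question, and the TTL value is arbitrary), the cache could be a component holding a Mono that is cached with a time to live, so the first subscriber triggers the database fetch and later subscribers reuse the result until it expires:

import java.time.Duration;
import java.util.List;

import org.springframework.stereotype.Component;
import reactor.core.publisher.Mono;

@Component
public class ItemCache {

    // First subscription fetches from the database; subsequent subscriptions
    // reuse the cached result until the TTL expires, then the next subscriber refetches.
    private final Mono<List<Item>> cachedItems;

    public ItemCache(ItemRepository itemRepository) {
        this.cachedItems = itemRepository.findAll()
                .collectList()
                .cache(Duration.ofMinutes(10)); // TTL: acts as the staleness/eviction rule
    }

    public Mono<List<Item>> getItems() {
        return cachedItems;
    }
}

The controller would then depend on this component and compose on itemCache.getItems() instead of injecting a pre-built List<Item> bean.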

Reactive streaming approach of file upload in Spring (Boot)

We have spent a lot of hours on the internet and on Stack Overflow, but none of the findings satisfied us with regard to the way we planned a file upload in a Spring context.
A few words about our architecture. We have a node.js client which uploads files into a Spring Boot app. Let us call this REST endpoint our "client endpoint". Our Spring Boot application acts as middleware and calls endpoints of a "foreign system", so we call that endpoint the "foreign" one, to distinguish the two. The main purpose is the file handling between these two endpoints, plus some business logic in between.
Actually, the interface to our client looks like this:
public class FileDO {
private String id;
private byte[] file;
...
}
Here we are very flexible because it is our client and our interface definition.
Because our system has sometimes run out of memory under load, we plan to reorganize our code into a more stream-based, reactive approach. When I write "under load", I mean heavily under load, e.g. hundreds of simultaneous file uploads with big files ranging from a few MB up to 1 GB. We know that these tests don't represent real application use cases, but we want to be prepared.
We did some research on our challenge, and profiler tools showed us that in our REST endpoints we store the files completely in memory as byte arrays. That works, but it is not efficient.
Currently we face the requirement to provide a REST endpoint for file upload and to push these files into another REST endpoint of some foreign system. So our application's main intention is to be a middle tier for file uploads. Given this initial situation, we would like to avoid having those files in memory as a whole. Best would be a stream, maybe reactive. We are already partially reactive in some business functions, but we are at the very beginning of getting familiar with all that stuff.
So, what are our steps so far? We introduced a new client (node.js --> Spring Boot) interface, shown below. This works so far. But is it really a stream-based approach? First metrics have shown that it doesn't reduce memory utilization.
@PostMapping(value = "/uploadFile", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
@ResponseStatus(HttpStatus.CREATED)
public Mono<Void> upload(@RequestPart(name = "id") String id, @RequestPart(name = "file") Mono<FilePart> file) {
    fileService.save(id, file);
    ...
}
First question: is this type Mono<> right here? Or should we rather have a Flux of DataBuffer or something? And if so, how should the client behave and deliver the data so that it is really a streaming approach?
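For comparison, this is a sketch of what we think a fully streaming variant of our endpoint might look like (FileService.save taking the FilePart itself, as sketched further below, is our assumption, not something we have verified under load):

import org.springframework.http.HttpStatus;
import org.springframework.http.MediaType;
import org.springframework.http.codec.multipart.FilePart;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestPart;
import org.springframework.web.bind.annotation.ResponseStatus;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;

@RestController
public class UploadController {

    private final FileService fileService;

    public UploadController(FileService fileService) {
        this.fileService = fileService;
    }

    @PostMapping(value = "/uploadFile", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
    @ResponseStatus(HttpStatus.CREATED)
    public Mono<Void> upload(@RequestPart(name = "id") String id,
                             @RequestPart(name = "file") Mono<FilePart> file) {
        // Mono<FilePart> is only a handle to the part; streaming is preserved as long as
        // downstream code consumes filePart.content() (a Flux<DataBuffer>) and never
        // aggregates it into a byte[].
        return file.flatMap(filePart -> fileService.save(id, filePart));
    }
}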
The FileService class then should post this file(s) into the foreign system, perhaps do something else with given data, at least log the id and the file name. :-)
Our code in this FileService.save(..) currently looks roughly like this:
...
MultipartBodyBuilder bodyBuilder = new MultipartBodyBuilder();
bodyBuilder.asyncPart(...take mono somehow...);
bodyBuilder.part("id", id);
return WebClient.create("url-of-foreign-system")
        .post()
        .uri("/uploadFile")
        .syncBody(bodyBuilder.build())
        .retrieve()
        .bodyToMono(Result.class);
...
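If FileService.save received the FilePart itself, this is a sketch of the asyncPart wiring we have in mind (again an assumption on our side, with the Result type replaced by Void and the WebClient created inline for brevity):

import org.springframework.core.io.buffer.DataBuffer;
import org.springframework.http.MediaType;
import org.springframework.http.client.MultipartBodyBuilder;
import org.springframework.http.codec.multipart.FilePart;
import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.BodyInserters;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

@Service
public class FileService {

    private final WebClient webClient = WebClient.create("url-of-foreign-system");

    public Mono<Void> save(String id, FilePart filePart) {
        MultipartBodyBuilder bodyBuilder = new MultipartBodyBuilder();
        bodyBuilder.part("id", id);
        // asyncPart hands the part's Flux<DataBuffer> to the multipart writer,
        // so the bytes are forwarded chunk by chunk instead of being buffered as a byte[].
        bodyBuilder.asyncPart("file", filePart.content(), DataBuffer.class);

        return webClient.post()
                .uri("/uploadFile")
                .contentType(MediaType.MULTIPART_FORM_DATA)
                .body(BodyInserters.fromMultipartData(bodyBuilder.build()))
                .retrieve()
                .bodyToMono(Void.class);
    }
}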
Unfortunately, the second REST endpoint, the one of our foreign system, looks a little different from our first one. It will be enriched with data from another system. It takes some FileDO2 with an id, a byte array and some other metadata specific to the second foreign system.
As said, our approach should be to minimize the memory footprint of the actions between client and foreign system. Sometimes we not only have to deliver data to that system, but also run some business logic that may slow down the whole streaming process.
Any ideas how to do that as a whole? Currently we have no clue how to do all of that...
We appreciate any help or ideas.

How to handle Transaction in CosmosDB - "All or nothing" concept

I am trying to save multiple documents to multiple collections at once, in one transaction, so that if one of the saves fails, all the saved documents are rolled back.
I am using Spring Boot & Spring Data and the MongoDB API to connect to Cosmos DB in Azure. I have read in their portal that this can be done by writing a stored procedure. But is there a way we can do it from code, like Spring's @Transactional annotation?
Any help is really appreciated.
The only way you can write transactionally is with a stored procedure, or via transactional batch operations (SDK-based, in a subset of the language SDKs, currently .NET and Java). But that won't help in your case:
Stored procedures are specific to the core (SQL) API; they're not for the MongoDB API.
Transactions are scoped to a single partition within a single collection. You cannot transactionally write across collections, regardless of whether you use the MongoDB API or the core (SQL) API.
You really need to ask whether transactions are absolutely necessary. Maybe you can use some type of durable-messaging approach to manage your content updates. But there's simply nothing built in that allows you to do this natively in Cosmos DB.
Transactions across partitions and collections are indeed not supported. If you really need a rollback mechanism, it might be worthwhile to check the event sourcing pattern, as you might then be able to capture events instead of updating master entities. These events you could then easily delete, but other processes might still have executed using the incorrect events.
We created a sort of unit of work. We register all changes to the data model, including events and messages that are being sent. Only when we call a commit are the changes persisted to the database, in the following order:
Commit updates
Commit deletes
Commit inserts
Send messages
Send events
Still, it's not watertight, but it avoids sending out messages/events/modifications to the data model as long as the calling process is not ready to do so (i.e. due to an error). This UnitOfWork is passed through our domain services to allow all operations of our command to be handled in one batch. It's then up to the developer to decide whether a certain operation can be committed as part of a bigger operation (same UoW) or independently (new UoW).
We then wrapped our command handlers in a Polly policy to retry in case of update conflicts. Theoretically, though, we could get an update conflict on the 2nd update, which could cause an inconsistent data model, but we keep this in mind when using the UoW.
It's not watertight, but hopefully it helps!
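To make that idea concrete, here is a stripped-down sketch of such a unit of work (the names are illustrative, not our actual code); work is only registered and is executed when commit() is called, in the order listed above:

import java.util.ArrayList;
import java.util.List;

public class UnitOfWork {

    private final List<Runnable> updates = new ArrayList<>();
    private final List<Runnable> deletes = new ArrayList<>();
    private final List<Runnable> inserts = new ArrayList<>();
    private final List<Runnable> messages = new ArrayList<>();
    private final List<Runnable> events = new ArrayList<>();

    public void registerUpdate(Runnable action)  { updates.add(action); }
    public void registerDelete(Runnable action)  { deletes.add(action); }
    public void registerInsert(Runnable action)  { inserts.add(action); }
    public void registerMessage(Runnable action) { messages.add(action); }
    public void registerEvent(Runnable action)   { events.add(action); }

    // Nothing leaves the process until the calling command handler decides to commit.
    public void commit() {
        updates.forEach(Runnable::run);   // 1. commit updates
        deletes.forEach(Runnable::run);   // 2. commit deletes
        inserts.forEach(Runnable::run);   // 3. commit inserts
        messages.forEach(Runnable::run);  // 4. send messages
        events.forEach(Runnable::run);    // 5. send events
    }
}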
Yes, transactions are supported in Cosmos DB with the Mongo API. I believe it's a fairly new addition, but it's simple enough and described here.
I don't know how well it's supported in Spring Boot, but at least it's doable.
// start transaction
var session = db.getMongo().startSession();
var friendsCollection = session.getDatabase("users").friends;
session.startTransaction();
// operations in transaction
try {
    friendsCollection.updateOne({ name: "Tom" }, { $set: { friendOf: "Mike" } });
    friendsCollection.updateOne({ name: "Mike" }, { $set: { friendOf: "Tom" } });
} catch (error) {
    // abort transaction on error
    session.abortTransaction();
    throw error;
}
// commit transaction
session.commitTransaction();
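In Spring Data MongoDB terms, the usual wiring (a sketch only; the entity and service names are made up, and I have not verified that Cosmos DB's Mongo API accepts these transaction commands from Spring) registers a MongoTransactionManager so that @Transactional spans both writes in one Mongo session:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.mongodb.MongoDatabaseFactory;
import org.springframework.data.mongodb.MongoTransactionManager;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Configuration
class MongoTxConfig {

    // Registering a MongoTransactionManager enables @Transactional for MongoDB operations.
    @Bean
    MongoTransactionManager transactionManager(MongoDatabaseFactory factory) {
        return new MongoTransactionManager(factory);
    }
}

@Service
class FriendService {

    private final MongoTemplate mongoTemplate;

    FriendService(MongoTemplate mongoTemplate) {
        this.mongoTemplate = mongoTemplate;
    }

    // Both writes run in one session; if either fails, the transaction is rolled back.
    @Transactional
    public void makeFriends(Object tom, Object mike) {
        mongoTemplate.save(tom, "friends");
        mongoTemplate.save(mike, "friends");
    }
}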

Handling large values in C# REDIS clients

I am developing a component that will provide a GET REST endpoint returning a large (up to 2MB) JSON array of data. I will be using Redis to cache the JSON array, and the REST endpoint is implemented in a Web API 2 project.
I assumed that I could just return the data in the Response Stream so that I don't have to have very large strings in memory, but when I took a look at StackExchange.Redis I couldn't find any methods that return a Stream.
It appears that https://github.com/ctstone/csredis does, but this project looks pretty static now.
Am I missing something, or is there a workaround for this?
