I'm running into a problem where I can't deserialize a string into a Task using System.Text.Json (.NET 5).
JsonSerializer.Deserialize<Task<TItem>>(serializedItem)
Background
I have a local cache and store in it items retrieved from the DB.
I can't store the actual item in the cache, because any subsequent manipulation of the object would also change the cached item, affecting all further uses. I therefore store a serialized copy of the object.
Regarding performance...
I'm using the async/await pattern to call the DB (the whole app is async).
I read an article (it may have been a video) in which Stephen Toub described the performance advantage of caching the Task itself. The SO question When to cache Tasks? goes into the details. Anyhow, I thought I'd try to take advantage of this (it works perfectly without serialization) using the following in my local cache "layer":
If the Task is in my cache, await it and return the result.
Otherwise, call the DB method without awaiting it and add the resulting task to the cache (see the sketch below).
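A minimal sketch of that pattern, assuming a ConcurrentDictionary-based cache; the loadItemFromDbAsync delegate is a hypothetical stand-in for the actual DB call:

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class TaskCache<TItem>
{
    // The cache stores the Task itself, not the materialized result.
    private readonly ConcurrentDictionary<string, Task<TItem>> _cache = new();
    private readonly Func<string, Task<TItem>> _loadItemFromDbAsync;

    public TaskCache(Func<string, Task<TItem>> loadItemFromDbAsync)
    {
        _loadItemFromDbAsync = loadItemFromDbAsync;
    }

    public Task<TItem> GetItemAsync(string key)
    {
        // The DB call starts once; concurrent callers await the same task.
        return _cache.GetOrAdd(key, k => _loadItemFromDbAsync(k));
    }
}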
When I add serialization, the deserialization of the task:
Task<TItem>? cachedItem = JsonSerializer.Deserialize<Task<TItem>>(serializedItem);
results in
Deserialization of types without a parameterless constructor, a
singular parameterized constructor, or a parameterized constructor
annotated with 'JsonConstructorAttribute' is not supported.
The answer here is relatively straightforward - serialize the data inside the Task result, not the task itself.
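For example (a sketch, with a hypothetical loadItemFromDbAsync again standing in for the actual DB call):

// Await the DB call, then cache the serialized result instead of the Task.
TItem item = await loadItemFromDbAsync(key);
string serialized = JsonSerializer.Serialize(item);

// Each consumer then deserializes its own fresh copy:
TItem copy = JsonSerializer.Deserialize<TItem>(serialized);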
If you aren't using something like memcached, but rather a simple in-process Dictionary cache, then serializing and deserializing is a rather heavy approach to cloning objects, though.
We are using the following frameworks and versions:
jOOQ 3.11.1
Spring Boot 2.3.1.RELEASE
Spring 5.2.7.RELEASE
I have an issue where some of our business logic is divided into logical units that look as follows:
Request containing a user transaction is received
This request contains various information, such as the type of transaction, which products are part of this transaction, what kind of payments were done, etc.
These attributes are then stored individually in the database.
In code, this looks approximately as follows:
TransactionRecord transaction = transactionRepository.create();
transaction.create(creationCommand);
In Transaction#create (which runs transactionally), something like the following occurs:
storeTransaction();
storePayments();
storeProducts();
// ... other relevant information
A given transaction can have many different types of products and attributes, all of which are stored. Many of these attributes result in UPDATE statements, while some may result in INSERT statements - it is difficult to fully know in advance.
For example, the storeProducts method looks approximately as follows:
products.forEach(product -> {
    ProductRecord record = productRepository.findProductByX(...);
    if (record == null) {
        record = productRepository.create();
        record.setX(...);
        record.store();
    } else {
        // do something else
    }
});
If the products are new, they are INSERTed. Otherwise, other calculations may take place. Depending on the size of the transaction, this single user transaction could obviously result in up to O(n) database calls/roundtrips, and even more depending on what other attributes are present. In transactions where a large number of attributes are present, this may result in upwards of hundreds of database calls for a single request (!). I would like to bring this down as close as possible to O(1) so as to have more predictable load on our database.
Naturally, batch and bulk inserts/updates come to mind here. What I would like to do is to batch all of these statements into a single batch using jOOQ, and execute after successful method invocation prior to commit. I have found several (SO Post, jOOQ API, jOOQ GitHub Feature Request) posts where this topic is implicitly mentioned, and one user groups post that seemed explicitly related to my issue.
Since I am using Spring together with jOOQ, I believe my ideal solution (preferably declarative) would look something like the following:
@Batched(100) // batch size as parameter, potentially
@Transactional
public void createTransaction(CreationCommand creationCommand) {
    // all inserts/updates above are added to a batch and executed on successful invocation
}
For this to work, I imagine I'd need to manage a scoped (ThreadLocal/Transactional/Session scope) resource which can keep track of the current batch such that:
1. Prior to entering the method, an empty batch is created if the method is @Batched,
2. A custom DSLContext (perhaps extending DefaultDSLContext), made available via DI, has a ThreadLocal flag which keeps track of whether current statements should be batched or not, and if so,
3. The calls are intercepted and added to the current batch instead of being executed immediately.
However, step 3 would necessitate rewriting a large portion of our code from the (IMO) relatively readable:
records.forEach(record -> {
    record.setX(...);
    // ...
    record.store();
});
to:
userObjects.forEach(userObject -> {
    dslContext.insertInto(...).values(userObject.getX(), ...).execute();
});
which would defeat the purpose of having this abstraction in the first place, since the second form can also be rewritten using DSLContext#batchStore or DSLContext#batchInsert. IMO however, batching and bulk insertion should not be up to the individual developer and should be able to be handled transparently at a higher level (e.g. by the framework).
I find the readability of the jOOQ API to be an amazing benefit of using it; however, it seems (as far as I can tell) that it does not lend itself very well to interception/extension for cases such as these. Is it possible, with the jOOQ 3.11.1 (or even the current) API, to get behaviour similar to the former with transparent batch/bulk handling? What would this entail?
EDIT:
One possible but extremely hacky solution that comes to mind for enabling transparent batching of stores would be something like the following:
Create a RecordListener and add it as a default to the Configuration whenever batching is enabled.
In RecordListener#storeStart, add the query to the current Transaction's batch (e.g. in a ThreadLocal<List>)
The AbstractRecord has a changed flag which is checked (org.jooq.impl.UpdatableRecordImpl#store0, org.jooq.impl.TableRecordImpl#addChangedValues) prior to storing. Resetting this (and saving it for later use) makes the store operation a no-op.
Lastly, upon successful method invocation but prior to commit:
Reset the changes flags of the respective records to the correct values
Invoke org.jooq.UpdatableRecord#store, this time without the RecordListener or while skipping the storeStart method (perhaps using another ThreadLocal flag to check whether batching has already been performed).
As far as I can tell, this approach should work in theory. Obviously, it's extremely hacky and prone to breaking, since it depends on library internals (and possibly reflection) that may change at any time.
Does anyone know of a better way, using only the public jOOQ API?
jOOQ 3.14 solution
You've already discovered the relevant feature request #3419, which will solve this on the JDBC level starting from jOOQ 3.14. You can either use the BatchedConnection directly, wrapping your own connection to implement the below, or use this API:
ctx.batched(c -> {
    // Make sure all records are attached to c, not ctx, e.g. by fetching from c.dsl()
    records.forEach(record -> {
        record.setX(...);
        // ...
        record.store();
    });
});
jOOQ 3.13 and before solution
For the time being, until #3419 is implemented (it will be, in jOOQ 3.14), you can implement this yourself as a workaround. You'd have to proxy a JDBC Connection and PreparedStatement and ...
... intercept all:
Calls to Connection.prepareStatement(String), returning a cached proxy statement if the SQL string is the same as for the last prepared statement, or batch execute the last prepared statement and create a new one.
Calls to PreparedStatement.executeUpdate() and execute(), and replace those by calls to PreparedStatement.addBatch()
... delegate all:
Calls to other API, such as e.g. Connection.createStatement(), which should flush the above buffered batches, and then call the delegate API instead.
I wouldn't recommend hacking your way around jOOQ's RecordListener and other SPIs, I think that's the wrong abstraction level to buffer database interactions. Also, you will want to batch other statement types as well.
Do note that by default, jOOQ's UpdatableRecord tries to fetch generated identity values (see Settings.returnIdentityOnUpdatableRecord), which is something that prevents batching. Such store() calls must be executed immediately, because you might expect the identity value to be available.
When you develop an ASP.NET application using the repository pattern, does each of your methods create a new entity container instance (context) in a using block, or do you create a class-level/private instance of the container for use by any of the repository methods until the repository itself is disposed? Other than what I note below, what are the advantages/disadvantages? Is there a way to combine the benefits of each of these that I'm just not seeing? Does your repository implement IDisposable, allowing you to create using blocks for instances of your repo?
Multiple containers (vs. single)
Advantages:
Preventing connections from being auto-closed/disposed (will be closed at the end of the using block).
Helps force you to only pull into memory what you need for a particular view/viewmodel, and in fewer round-trips (you will get a connection error for anything you attempt to lazy load).
Disadvantages:
Access of child entities within the Controller/View is limited to what you called with Include()
For pages like a dashboard index that shows information gathered from many tables (many different repository method calls), we will add the overhead of creating and disposing many entity containers.
If you are instantiating your context in your repository, then you should always do it locally, and wrap it in a using statement.
If you're using Dependency Injection to inject the context, then let your DI container handle calling dispose on the context when the request is done.
Don't instantiate your context directly as a class member, since this will not dispose of the context's resources until garbage collection occurs. If you do, then you will need to implement IDisposable to dispose the context, and make sure that whatever is using your repository properly disposes of your repository.
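As a rough illustration of the first and last options, assuming a classic Entity Framework setup where MyDbContext, Customer and Customers are placeholder names:

using System;

// Option 1: a context local to each method, wrapped in a using statement.
public class CustomerRepository
{
    public Customer GetById(int id)
    {
        // The context is disposed as soon as the method returns.
        using (var context = new MyDbContext())
        {
            return context.Customers.Find(id);
        }
    }
}

// Option 3: a class-level context; the repository itself must then be disposed.
public class DisposableCustomerRepository : IDisposable
{
    private readonly MyDbContext _context = new MyDbContext();

    public Customer GetById(int id)
    {
        return _context.Customers.Find(id);
    }

    public void Dispose()
    {
        _context.Dispose();
    }
}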
I, personally, put my context at the class level in my repository. My primary reason for doing so is because a distinct advantage of the repository pattern is that I can easily swap repositories and take advantage of a different backend. Remember - the purpose of the repository pattern is that you provide an interface that provides back data to some client. If you ever switch your data source, or just want to provide a new data source on the fly (via dependency injection), you've created a much more difficult problem if you do this on a per-method level.
Microsoft's MSDN site has good information on the repository pattern. Hopefully this helps clarify some things.
I disagree with all four points:
Preventing connections from being auto-closed/disposed (will be closed
at the end of the using block).
In my opinion it doesn't matter if you dispose the context at method level, repository instance level, or request level. You do of course have to dispose the context at the end of a single request - either by wrapping the repository method in a using statement, or by implementing IDisposable on the repository class (as you proposed) and wrapping the repository instance in a using statement in the controller action, or by instantiating the repository in the controller constructor and disposing it in the Dispose override of the controller class, or by instantiating the context when the request begins and disposing it when the request ends (some Dependency Injection containers will help to do this work). Why should the context be "auto-disposed"? In a desktop application it is possible and common to have a context per window/view which might be open for hours.
Helps force you to only pull into memory what you need for a
particular view/viewmodel, and in fewer round-trips (you will get a
connection error for anything you attempt to lazy load).
Honestly I would enforce this by disabling lazy loading altogether. I don't see any benefit of lazy loading in a web application where the client is disconnected from the server anyway. In your controller actions you always know what you need to load and can use eager or explicit loading. To avoid memory overhead and improve performance, you can always disable change tracking for GET requests because EF can't track changes on a client's web page anyway.
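In classic Entity Framework that could look like the following sketch (MyDbContext, Orders and Lines are placeholders):

using System.Data.Entity;
using System.Linq;

using (var context = new MyDbContext())
{
    // Disable lazy loading so nothing is pulled in accidentally.
    context.Configuration.LazyLoadingEnabled = false;

    // Eager-load exactly what the view needs, without change tracking.
    var orders = context.Orders
        .Include(o => o.Lines)
        .AsNoTracking()
        .ToList();
}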
Access of child entities within the Controller/View is limited to what
you called with Include()
Which is an advantage rather than a disadvantage, because you don't get the unwanted surprises of lazy loading. If you need to populate child entities later in the controller actions, depending on some condition, you could load them through additional repository methods (LoadNavigationProperty or something) with the same or even a new context.
For pages like a dashboard index that shows information gathered from
many tables (many different repository method calls), we will add the
overhead of creating and disposing many entity containers.
Creating contexts - and I don't think we are talking about hundreds or thousands of instances - is a cheap operation. I would call this a very theoretical overhead which doesn't play a role in practice.
I've used both approaches you mentioned in web applications, and also a third option, namely creating a single context per request and injecting this same context into every repository/service I need in a controller action. All three worked for me.
Of course, if you use multiple contexts you have to be careful to do all the work in the same unit of work, to avoid attaching entities to multiple contexts, which leads to well-known exceptions. Avoiding these situations is usually not a problem, but it requires a bit more attention, especially when processing POST requests.
Lately I have been using a context per request, because it is easier and I just don't see the benefit of having very narrow contexts, nor any reason to use more than a single unit of work for the whole request processing. If I needed multiple contexts - for whatever reason - I could always create specialized methods which act on their own context instead of the "default context" of the request.
In Ruby, the practice is to send an id instead of the object to workers. Isn't that a rather CPU-consuming approach, since we have to retrieve the object from the database again?
Several reasons:
Saves space on the queue, and also transfer time (app => queue, queue => workers).
Often it is better to fetch a fresh object from the database (as opposed to retrieving a cached copy from the queue).
Arguments to Resque.enqueue must be JSON-serializable. Complex objects cannot always be serialized.
If you think about it, the reasons are pretty obvious:
your object may change between the time the action is queued and the time it is handled, and in general you don't want an outdated object.
an id is a lot lighter to transport than a whole object, which you would need to serialize in JSON/YAML or anything else.
if you need the associations, the problem just gets even worse :)
But in the end it depends on your application: if you only need some information, you can just send it to your worker directly without even using the full model.
I am looking into using Enterprise Caching Block for my .NET 3.5 service to cache a bunch of static data from the database.
From everything I have read, it seems that FileDependency is the best option for storing static data that does not expire too often. However, when the file changes and the cache is flushed, I need to get a single callback to do some post-processing for that particular cache. If I implement ICacheItemRefreshAction and register it when adding each item to the cache, I get a callback for every one of them.
Is there a way to register a callback for the entire cache, so that I don't see thousands of callbacks being invoked when the cache flushes?
Thanks
To address your follow-up for a better way than FileDependency: you could wrap a SqlDependency in an ICacheItemExpiration. See SqlCacheDependency with the Caching Application Block for sample code.
That approach would only work with SQL Server and would require setting up Service Broker.
In terms of a cache-level callback, I don't see an out-of-the-box way to achieve that; almost everything is geared to the item level. What you could do is create your own CacheManager implementation that features a cache-level callback.
Another approach might be to have an ICacheItemRefreshAction that only performs its operations when the cache is empty (i.e. when the last item has been removed), as sketched below.
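A rough sketch of that idea; I'm assuming here that the refresh action can reach the cache manager through CacheFactory and that ICacheManager.Count reflects the flush - verify both against your Enterprise Library version:

using System;
using Microsoft.Practices.EnterpriseLibrary.Caching;

[Serializable]
public class PostProcessOnFlushAction : ICacheItemRefreshAction
{
    public void Refresh(string removedKey, object expiredValue,
                        CacheItemRemovedReason removalReason)
    {
        // Only run the post-processing once the flush has emptied the cache.
        ICacheManager cache = CacheFactory.GetCacheManager();
        if (cache.Count == 0)
        {
            // ... do the one-time post-processing here
        }
    }
}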
My C# 3.5 application uses SQL Server 2008 R2, NHibernate and CastleProject ActiveRecord. The application imports emails to the database along with their attachments. Emails and attachments are saved 50 emails at a time, each batch in a new session and transaction scope, to make sure they are not all kept in memory (there can be 100K emails in some mailboxes).
Initially, emails are saved very quickly. However, closer to 20K emails, performance degrades dramatically. Using dotTrace I got the following picture:
Obviously, when I save an attachment, NHibernate tries to determine whether it really should save it, and probably compares it with other attachments in the session. To do so, it compares them byte by byte, which takes almost 500 seconds (for the snapshot in the picture) and 600M enumerator operations.
All this looks crazy, especially since I know for sure that SaveAndFlush should save the attachment without any checks: it is new and should simply be saved.
However, I cannot figure out how to instruct NHibernate to avoid this check (IsUpdateNecessary). Please advise.
P.S. I am not sure, but it might be that the performance degradation closer to 20K is not connected with having older mails in memory: I noticed that in the mailbox I am working with, larger emails are stored later than smaller ones, so the problem may lie only in the attachment comparison.
Update:
Looks like I need something like StatelessSessionScope, but there is no documentation on it, even on the CastleProject site! If I do something like
using (TransactionScope txScope = new TransactionScope())
using (StatelessSessionScope scope = new StatelessSessionScope())
{
    mail.Save();
}
it fails with an exception saying that Save is not supported by a stateless session. I am supposed to insert objects into the session, but I do not have any Session (only SessionScope, which adds to SessionScope just a single OpenSession method that accepts strange parameters).
Maybe I missed it in that long text, but are you using a stateless session for importing the data? Using one avoids a lot of checks and also bypasses the first-level cache, thus using minimal resources.
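With plain NHibernate, the import loop might look like this sketch (sessionFactory and mails are assumed to exist; how this maps onto ActiveRecord's scopes is the difficulty described in the update above):

// Stateless sessions skip the first-level cache and dirty checking entirely.
using (IStatelessSession session = sessionFactory.OpenStatelessSession())
using (ITransaction tx = session.BeginTransaction())
{
    foreach (var mail in mails)
    {
        session.Insert(mail);   // plain INSERT, no IsUpdateNecessary check
    }
    tx.Commit();
}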
Looks like I've found an easy solution: for my Attachment class, which caused the biggest performance penalty, I overrode the following method:
// Returning an empty array reports "no dirty properties" to NHibernate,
// so the expensive per-property (and per-byte) comparison is skipped.
protected override int[] FindDirty(
    object id,
    System.Collections.IDictionary previousState,
    System.Collections.IDictionary currentState,
    NHibernate.Type.IType[] types)
{
    return new int[0];
}
Thus, the dirty check never finds any dirty properties and skips that crazy per-byte comparison. This is safe here because the attachments are always new and only need to be inserted.