BizTalk Debatched Message Value Caching

I get a file with 4000 entries and debatch it, so I don't lose the whole message if one entry has corrupt data.
The BizTalk map accesses a SQL Server. Before I debatched the message I simply cached the SQL data in the map, but now I have 4000 independent maps.
Without caching the process takes about 30 times longer.
Is there a way to cache the data from the SQL Server somewhere outside the map without losing much performance?

Accessing a database in a Map is not a recommended pattern.
Since what you describe sounds like you're retrieving static reference data, another option is to move the process to an Orchestration where the reference data is retrieved one time into a Message.
Then, you can use a dual input Map supplying the reference data and the business message.
In this pattern, you can either debatch in the Orchestration or use a Sequential Convoy.

I would always avoid accessing SQL Server in a map - it gets very easy to inadvertently make many more calls than you intend (whether because of a mistake in the map design or because of unexpected volume or usage of the map on a particular port or set of ports). In fact, I would generally avoid making any kind of call in a map that has to access another system or service, but if you must, then caching can help.
You can cache using, for example, MemoryCache. The pattern I use with that generally involves a custom C# library where you first check the cache for your value, and if there's a miss you check SQL (either for the particular entry or the entire cache), e.g.:
object _syncRoot = new object();
...
public string CheckCache(string key)
{
    string check = MemoryCache.Default.Get(key) as string;
    if (check != null)
    {
        return check;
    }
    lock (_syncRoot)
    {
        // make sure someone else didn't get here before we acquired the lock, avoid duplicate work
        check = MemoryCache.Default.Get(key) as string;
        if (check != null) return check;
        string sql = @"SELECT ...";
        using (SqlConnection conn = new SqlConnection(connStr))
        {
            conn.Open();
            using (SqlCommand cmd = conn.CreateCommand())
            {
                cmd.CommandText = sql;
                cmd.Parameters.AddWithValue(...);
                // ExecuteScalar or ExecuteReader as appropriate, read the value out into check
                // use MemoryCache.Default.Add with sensible expiration to cache your data
            }
        }
    }
    // return the freshly loaded value (still null if SQL had no match)
    return check;
}
A few things to keep in mind:
This will work on a per AppDomain basis, and pipelines and orchestrations run on separate app domains. If you are executing this map in both places, you'll end up with caches in both places. The complexity added in trying to share this across AppDomains is probably not worth it, but if you really need that you should isolate your caching into something like a WCF NetTcp service.
This will use more memory - you shouldn't just throw everything and anything into a cache in BizTalk, and if you're going to cache stuff make sure you have lots of available memory on the machine and that BizTalk is configured to be able to use it.
The MemoryCache can store whatever you want - I'm using strings here, but it could be other primitive types or objects as well.

How to store the updates of state in an offchain database?

I want to store all the blockchain data in offchain database.
The RPC has a method called EXPERIMENTAL_changes. I was told that I can do this by HTTP polling of this method, but I am unable to find out how to use it.
http post https://rpc.testnet.near.org jsonrpc=2.0 id=dontcare method=EXPERIMENTAL_changes \ params:='{ "changes_type": "data_changes", "account_ids": ["guest-book.testnet"], "key_prefix_base64": "", "block_id": 19450732 }'
For example here the results give:
"change": { "account_id": "guest-book.testnet", "key_base64": "bTo6Mzk=", "value_base64": "eyJwcmVtaXVtIjpmYWxzZSwic2VuZGVyIjoiZmhyLnRlc3RuZXQiLCJ0ZXh0IjoiSGkifQ==" }
What is key_base64?
Decoding it to string gives m::39
What is m::39?
For example, I have the following state data in the rust structure.
pub struct Demo {
user_profile_map: TreeMap<u128, User>,
user_products_map: TreeMap<u128, UnorderedSet<u128>>, // (user_id, set<product_id>)
product_reviews_map: TreeMap<u128, UnorderedSet<u128>>, // (product_id, set<review_id>)
product_check_bounty: LookupMap<u128, Vector<u64>>
}
How do I know when anything gets changed in these variables?
Will I have to check every block id from the point the contract is deployed, to know where there is a change?
I want to store all the blockchain data in offchain database.
If so, I recommend you take a look at the Indexer Framework, which allows you to get a stream of blocks and handle them. We use it to build Indexer for Wallet (keeps track of every added and deleted access key, and stores those into Postgres) and Indexer for Explorer (keeps track of every block, chunk, transaction, receipt, execution outcome, state changes, accounts, and access keys, and stores all of that in Postgres)
What is m::39?
Contracts in NEAR Protocol have access to the key-value storage (state), so at the lowest-level, you operate with key-value operations (NEAR SDK for AssemblyScript defines Storage class with get and set operations, and NEAR SDK for Rust has storage_read and storage_write calls to preserve data).
Guest Book example uses a high-level abstraction called PersistentVector, which automatically reads and writes its records from/to NEAR key-value storage (state). As you can see:
export const messages = new PersistentVector<PostedMessage>("m");
Guest Book defines the messages to be stored in the storage with the m prefix, hence you see m::39, which basically means it is messages[39] stored in the key-value storage.
What is key_base64?
As key-value storage implies, the data is stored and accessed by keys, and the key can be binary, so base64 encoding is used to enable JSON-RPC API users with a way to query those binary keys as well (there is no way you can pass a raw binary blob in JSON).
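To make the base64 encoding concrete, here is a minimal Python sketch that decodes the key and value from the response quoted in the question. It assumes the key bytes happen to be valid UTF-8 and the value is JSON, which holds for this Guest Book record but is not guaranteed for other contracts. The helper name decode_change is made up for illustration.

```python
import base64
import json

# Values copied from the EXPERIMENTAL_changes response quoted in the question.
key_base64 = "bTo6Mzk="
value_base64 = "eyJwcmVtaXVtIjpmYWxzZSwic2VuZGVyIjoiZmhyLnRlc3RuZXQiLCJ0ZXh0IjoiSGkifQ=="


def decode_change(key_b64, value_b64):
    """Hypothetical helper: decode one change record.

    Assumes a UTF-8 key and a JSON value; other contracts may store
    binary keys or non-JSON values.
    """
    key = base64.b64decode(key_b64).decode("utf-8")
    value = json.loads(base64.b64decode(value_b64))
    return key, value


key, value = decode_change(key_base64, value_base64)
# Split "m::39" into the collection prefix and the element index.
prefix, _, index = key.partition("::")
print(key, prefix, index, value)
```

Here prefix is the "m" passed to the PersistentVector and index is the position of the message within it.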
How to know anything gets changed in these variables? Will I have to check every block id for the point the contract is deployed, to know where there is the change?
Correct, you need to follow every block, and check the changes. That is why we have built the Indexer Framework in order to enable community building services on top of that (we chose to build applications Indexer for Wallet and Indexer for Explorer, but others may decide to build GraphQL service like TheGraph)
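If you poll the RPC yourself rather than using the Indexer Framework, following every block amounts to issuing one EXPERIMENTAL_changes request per block height. The Python sketch below only builds the request bodies, reusing the parameters from the command in the question. The function names are hypothetical, and sending the requests, handling heights for which the node returns an error, and storing results are left out.

```python
import json


def changes_request(block_id, account_ids, key_prefix_base64=""):
    """Build a JSON-RPC body mirroring the request shown in the question."""
    return {
        "jsonrpc": "2.0",
        "id": "dontcare",
        "method": "EXPERIMENTAL_changes",
        "params": {
            "changes_type": "data_changes",
            "account_ids": list(account_ids),
            "key_prefix_base64": key_prefix_base64,
            "block_id": block_id,
        },
    }


def requests_for_range(first_block, last_block, account_ids):
    """Yield one request body per block height, inclusive of both ends."""
    for block_id in range(first_block, last_block + 1):
        yield changes_request(block_id, account_ids)


bodies = list(requests_for_range(19450732, 19450734, ["guest-book.testnet"]))
print(json.dumps(bodies[0], indent=2))
```

Each body would then be POSTed to the RPC endpoint, for example https://rpc.testnet.near.org as used in the question.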

Waiting for Realm writes to be completed

We are using Realm in a Xamarin app and have some issues refreshing the local database based on a remote source. Data is fetched from a remote endpoint and stored locally using Realm for easier/faster access.
Program flow is as follows:
1. Fetch data from remote source (if possible).
2. Loop through the entities returned by the remote source while keeping track of the IDs we've seen so far. New or updated entities are written to Realm.
3. Loop through the set of locally stored entities, removing entities we haven't seen in step 2 with Realm.Remove(entity); (in a transaction)
4. Return Realm.All<Entity>();
Unfortunately, the entities are returned by step 4 before all "remove" operations have been written. As a result, it takes a couple of refreshes before the local database is completely in sync.
The remove operation is done as follows:
foreach (Entity entity in realm.All<Entity>())
{
if (seenIds.Contains(entity.Id))
{
continue;
}
realm.Write(() => {
realm.Remove(entity);
});
}
Is there a way to have Realm wait till the transaction is completed, before returning the Realm.All<Entity>();?
I am pretty sure this is not particularly a Realm issue - the same pattern would cause problems with a lot of enumerable, mutable containers. You are removing items from a list whilst iterating it so enumeration is moving on too far.
There is no buffering on Realm transactions, so I guarantee the problem is not about having Realm wait until the transaction is completed; it is your list logic.
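The general effect can be reproduced with an ordinary mutable list. The Python sketch below is only an analogy for the pattern described here, not a model of Realm's collections: it shows how removing items during iteration skips neighbours, and how collecting first avoids that. Both function names are made up for the example.

```python
def remove_while_iterating(items, seen):
    """Buggy pattern: mutate the same list that is being iterated."""
    items = list(items)
    for item in items:
        if item not in seen:
            # Removing shifts later elements left, so the next one is skipped.
            items.remove(item)
    return items


def collect_then_remove(items, seen):
    """Safe pattern: decide what to delete first, then delete."""
    items = list(items)
    to_delete = [item for item in items if item not in seen]
    for item in to_delete:
        items.remove(item)
    return items


buggy = remove_while_iterating([1, 2, 3, 4, 5, 6], seen={1})
fixed = collect_then_remove([1, 2, 3, 4, 5, 6], seen={1})
print(buggy)  # [1, 3, 5]: items 3 and 5 were skipped and survive
print(fixed)  # [1]
```

Running the buggy version repeatedly removes a few more items each time, which matches the symptom of needing a couple of refreshes before the data is in sync.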
There are two basic ways to do this differently:
1. Use ToList to get a list of all objects from the All. This is expensive if there are many objects because you will instantiate all of them.
2. Instead of removing objects inside the loop, add them to a list of items to be removed, then iterate that list.
Note that using a transaction per-remove, as you are doing with Write here is relatively slow. You can do many operations in one transaction.
We are also working on other improvements to the Realm API that might give a more efficient way of handling this. It would be very helpful to know the relative data sizes - the number of removals vs records in the loop. We love getting sample data and schemas (can send privately to help@realm.io).
An example of option 2:
var toDelete = new List<Entity>();
foreach (Entity entity in realm.All<Entity>())
{
    if (!seenIds.Contains(entity.Id))
        toDelete.Add(entity);
}
// remove everything in a single transaction
realm.Write(() => {
    foreach (Entity entity in toDelete)
        realm.Remove(entity);
});

Does memcached have a pass-through mode?

I'm wondering if it's typical or even supported to "layer" memcached instances. Say, for example you have two memcached servers, one "local" and one "remote". Is there a way to request something from the "local" server, such that if there is a cache miss, the request is passed through to the remote server? That is, the local server requests the item from the remote server, and caches the result locally, and the next request for the item would fetch from the local cache.
Or is the only way to do this, to do something like this in your application code (pseudocode, I hope it's clear):
item := get(local, id)
if (!isValid(item)) {
item = get(remote, id)
if (isValid(item)) {
set(local, id, item)
} else {
// get the item from somewhere else
}
}
// do something with item
The use that I have in mind would be caching immutable objects, so coherency between the two caches is not an issue.
There is no pass through functionality in memcached. You would have to do your implementation in the application logic as you have shown in your question.
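As a rough illustration of doing this in application logic, here is a Python version of the pseudocode from the question. Plain dicts stand in for the two memcached servers, and layered_get and load_from_source are hypothetical names. A real client would use its own get/set calls and would need a way to tell a miss from a stored empty value.

```python
def layered_get(local, remote, key, load_from_source):
    """Look up key in the local cache, then the remote cache, then the source.

    local and remote are dict stand-ins for the two cache servers (assumption).
    """
    item = local.get(key)
    if item is not None:
        return item

    item = remote.get(key)
    if item is not None:
        # Remote hit: copy into the local cache so the next read stays local.
        local[key] = item
        return item

    # Miss in both layers: get the item from somewhere else, as in the question.
    return load_from_source(key)


local_cache = {}
remote_cache = {"user:1": "alice"}

first = layered_get(local_cache, remote_cache, "user:1", lambda k: None)
del remote_cache["user:1"]  # later reads no longer need the remote layer
second = layered_get(local_cache, remote_cache, "user:1", lambda k: None)
fallback = layered_get(local_cache, remote_cache, "user:2", lambda k: "from-db")
print(first, second, fallback)
```

Because the cached objects are immutable in the use case described, copying a remote hit into the local layer needs no invalidation step.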

Caching Data in Web API

I need to cache a collection of objects that is mostly static (might have changes 1x per day) and that is available in my ASP.NET Web API OData service. This result set is used across calls (meaning not client call specific) so it needs to be cached at the application level.
I did a bunch of searching on 'caching in Web API' but all of the results were about 'output caching'. That is not what I'm looking for here. I want to cache a 'People' collection to be reused on subsequent calls (might have a sliding expiration).
My question is, since this is still just ASP.NET, do I use traditional Application caching techniques for persisting this collection in memory, or is there something else I need to do? This collection is not directly returned to the user, but rather used as the source behind the scenes for OData queries via API calls. There is no reason for me to go out to the database on every call to get the exact same information on every call. Expiring it hourly should suffice.
Does anyone know how to properly cache the data in this scenario?
The solution I ended up using involved MemoryCache in the System.Runtime.Caching namespace. Here is the code that ended up working for caching my collection:
//If the data exists in cache, pull it from there, otherwise make a call to database to get the data
ObjectCache cache = MemoryCache.Default;
var peopleData = cache.Get("PeopleData") as List<People>;
if (peopleData != null)
return peopleData ;
peopleData = GetAllPeople();
CacheItemPolicy policy = new CacheItemPolicy {AbsoluteExpiration = DateTimeOffset.Now.AddMinutes(30)};
cache.Add("PeopleData", peopleData, policy);
return peopleData;
Here is another way I found using Lazy<T> to take into account locking and concurrency. Total credit goes to this post: How to deal with costly building operations using MemoryCache?
private IEnumerable<TEntity> GetFromCache<TEntity>(string key, Func<IEnumerable<TEntity>> valueFactory) where TEntity : class
{
ObjectCache cache = MemoryCache.Default;
var newValue = new Lazy<IEnumerable<TEntity>>(valueFactory);
CacheItemPolicy policy = new CacheItemPolicy { AbsoluteExpiration = DateTimeOffset.Now.AddMinutes(30) };
//The line below returns existing item or adds the new value if it doesn't exist
var value = cache.AddOrGetExisting(key, newValue, policy) as Lazy<IEnumerable<TEntity>>;
return (value ?? newValue).Value; // Lazy<T> handles the locking itself
}
Yes, output caching is not what you are looking for. You can cache the data in memory with MemoryCache for example, http://msdn.microsoft.com/en-us/library/system.runtime.caching.memorycache.aspx . However, you will lose that data if the application pool gets recycled. Another option is to use a distributed cache like AppFabric Cache or Memcached, to name a few.

jdbc batch performance

I'm batching updates with JDBC:
ps = con.prepareStatement("");
ps.addBatch();
ps.executeBatch();
But in the background it seems that the Postgres driver sends the queries one by one to the database.
org.postgresql.core.v3.QueryExecutorImpl:398
for (int i = 0; i < queries.length; ++i)
{
V3Query query = (V3Query)queries[i];
V3ParameterList parameters = (V3ParameterList)parameterLists[i];
if (parameters == null)
parameters = SimpleQuery.NO_PARAMETERS;
sendQuery(query, parameters, maxRows, fetchSize, flags, trackingHandler);
if (trackingHandler.hasErrors())
break;
}
Is there a way to make it send 1000 at a time to speed it up?
AFAIK there is no server-side batching in the fe/be protocol, so PgJDBC can't use it. Update: Well, I was wrong. PgJDBC (accurate as of 9.3) does send batches of queries to the server if it doesn't need to fetch generated keys. It just queues a bunch of queries up in the send buffer without syncing up with the server after each individual query.
See:
Issue #15: Enable batching when returning generated keys
Issue #195: PgJDBC does not pipeline batches that return generated keys
Even when generated keys are requested the extended query protocol is used to ensure that the query text doesn't need to be sent every time, just the parameters.
Frankly, JDBC batching isn't a great solution in any case. It's easy to use for the app developer, but pretty sub-optimal for performance as the server still has to execute every statement individually - though not parse and plan them individually so long as you use prepared statements.
If autocommit is on, performance will be absolutely pathetic because each statement triggers a commit. Even with autocommit off running lots of little statements won't be particularly fast even if you could eliminate the round-trip delays.
A better solution for lots of simple UPDATEs can be to:
COPY new data into a TEMPORARY or UNLOGGED table; and
Use UPDATE ... FROM to UPDATE with a JOIN against the copied table
For COPY, see the PgJDBC docs and the COPY documentation in the server docs.
You'll often find it's possible to tweak things so your app doesn't have to send all those individual UPDATEs at all.
