Transactions in Berkeley DB. Fast? - performance

When there is only one writer to a Berkeley DB database, is it worth using transactions?
Do transactions cause a significant slowdown? (In percent, please.)

You use transactions if you require the atomicity that they provide. Perhaps you need to abort a transaction, undoing everything in it? Or perhaps you need the semantics that, should the application fail, a partially completed transaction is aborted. Your choice of transactions is based on atomicity, not performance. If you need it, you need it.
If you don't need atomicity, you may not need durability either, and running without durability is significantly faster.

Transactions with DB_INIT_TXN in Berkeley DB are not significantly slower than the other models, although maintaining a transactional log generally requires all data to be written to the log before it is written to the database.
For a single writer and multiple readers, try the DB_INIT_CDB model, because the code is much simpler. Locks in the DB_INIT_CDB model are per-table, so overall throughput might be worse than with a DB_INIT_TXN model because of coarse-grained per-table lock contention.
Performance will depend on access patterns more than on whether one uses the DB_INIT_TXN or DB_INIT_CDB model.
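For reference, a rough sketch of how the two models are selected, assuming the com.sleepycat.db Java binding that ships with Berkeley DB; the configuration method names map to the C flags but should be checked against the javadoc of your release, and the environment home directory is hypothetical:

    import java.io.File;
    import com.sleepycat.db.Environment;
    import com.sleepycat.db.EnvironmentConfig;

    public class BdbEnvironments {
        // Transactional Data Store (DB_INIT_TXN): write-ahead logging plus fine-grained locking.
        static Environment openTransactional(File home) throws Exception {
            EnvironmentConfig cfg = new EnvironmentConfig();
            cfg.setAllowCreate(true);
            cfg.setInitializeCache(true);     // DB_INIT_MPOOL
            cfg.setInitializeLocking(true);   // DB_INIT_LOCK
            cfg.setInitializeLogging(true);   // DB_INIT_LOG
            cfg.setTransactional(true);       // DB_INIT_TXN
            return new Environment(home, cfg);
        }

        // Concurrent Data Store (DB_INIT_CDB): single writer, multiple readers, coarse locks.
        static Environment openConcurrent(File home) throws Exception {
            EnvironmentConfig cfg = new EnvironmentConfig();
            cfg.setAllowCreate(true);
            cfg.setInitializeCache(true);     // DB_INIT_MPOOL
            cfg.setInitializeCDB(true);       // DB_INIT_CDB
            return new Environment(home, cfg);
        }
    }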

Related

Which caching mechanism to use in my Spring application in the scenarios below

We are using a Spring Boot application with a MariaDB database. We get data from different services and store it in our database. While calling another service, we need to fetch data from the DB (based on a mapping) and call that service.
To avoid the database hit, we want to cache all mapping data and use the cache to retrieve the data and call the service API.
So our ask is: add data to the cache when it gets created in the database (this could add up to millions of records), and remove it from the cache when the status column reaches a certain value, "xyz" for example, or based on an eviction policy.
Should we use an in-memory cache such as Hazelcast/Ehcache, or Redis/Couchbase?
Please suggest.
Thanks
I mostly agree with Rick in terms of "don't build it until you need it"; however, it is important these days to think early about where this caching layer would fit later and how to integrate it (for example, using interfaces). Adding it to an unprepared system is always possible, but much more expensive (in terms of hours) and complicated.
OK, on to the actual question. Disclaimer: I am a Hazelcast employee.
In general, for caching, Hazelcast, Ehcache, Redis and others are all good candidates. The first question you want to ask yourself, though, is: "Can I hold all necessary records in the memory of a single machine?" With Ehcache in particular you get replication (all machines hold all information), which means every single node needs to keep the full data set in memory. Depending on the size you want to cache, that may not be optimal. In that case Hazelcast might be the better option, as we partition data across a cluster and optimize access down to a single network hop, with minimal overhead beyond network latency.
The second question is about serialization. Do you want to store information in a highly optimized serialization format (which needs code to transform it into something human-readable), or do you want to store it as JSON?
The third question is about the number of clients and threads that will access the data store. Obviously a local cache like Ehcache is always the fastest option, at the trade-off of lots and lots of memory. Apart from that, the most important factor is the threading model the in-memory store uses: either it is multithreaded and scales nicely, or it is a single-thread design that becomes a bottleneck once you exhaust that thread. The latter can be worked around with more processes, but that is a workaround for utilizing today's systems to the fullest.
In more general terms, any of the systems you mention would do the job. The best tool, however, should be selected by a POC/prototype against your real-world use case. The important bit is "real world": a single thread behaves amazingly under low pressure (obviously much faster), but when exhausted it becomes a major bottleneck (again, obviously delaying responses).
I hope this helps a bit since, at least to me, every answer like "yes, we are the best option" would be an immediate no-go for the person who said it.
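To make the Spring side concrete, here is a minimal sketch using Spring's cache abstraction, assuming caching is enabled with @EnableCaching and a Hazelcast (or Redis, Ehcache, ...) CacheManager is on the classpath; Mapping and MappingRepository are hypothetical names:

    import org.springframework.cache.annotation.CacheEvict;
    import org.springframework.cache.annotation.CachePut;
    import org.springframework.cache.annotation.Cacheable;
    import org.springframework.stereotype.Service;

    @Service
    public class MappingService {

        private final MappingRepository repository; // hypothetical Spring Data repository

        public MappingService(MappingRepository repository) {
            this.repository = repository;
        }

        // Put the mapping into the cache as soon as it is created in the database.
        @CachePut(cacheNames = "mappings", key = "#result.id")
        public Mapping create(Mapping mapping) {
            return repository.save(mapping);
        }

        // Read through the cache; the database is hit only on a cache miss.
        @Cacheable(cacheNames = "mappings", key = "#id")
        public Mapping findById(Long id) {
            return repository.findById(id).orElseThrow();
        }

        // Evict when the status column reaches the terminal value ("xyz" in the question).
        @CacheEvict(cacheNames = "mappings", key = "#mapping.id")
        public void markFinished(Mapping mapping) {
            mapping.setStatus("xyz");
            repository.save(mapping);
        }
    }

Time- or size-based eviction (the "eviction policy" part of the question) is configured on the cache provider itself rather than in these annotations.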
Build InnoDB with the memcached Plugin
https://dev.mysql.com/doc/refman/5.7/en/innodb-memcached.html

How to implement distributed transaction in CouchDB?

We are moving our database from Oracle to CouchDB, and one of the use cases is to implement distributed transaction management.
For example: read data from a JMS queue and update it in multiple documents; if anything fails, revert everything and throw an exception back to the JMS queue.
As we know, CouchDB does not support distributed transaction management.
Can you please suggest an alternative strategy to implement this, or any other way out?
More than the technical aspects, I feel you might be interested in the bottom line of this.
As mentioned, distributed transactions are not possible; the notion doesn't even exist, because it is not necessary. Indeed, unlike in the relational world, 95% of the time when you feel that you need them it means you are doing something wrong.
I'll be straightforward with you: dumping your relational data into CouchDB will end up being a nightmare for both writes and reads. For the former you'll ask: how can I do transactions? For the latter: how can I do joins? Both are impossible and are concepts which do not even exist.
The convenient conclusion too many people reach is that "CouchDB is not enterprise-ready or ACID enough". But the truth is that you need to take the time to rethink your data structures.
You need to rethink your data structures and make them document-oriented, because if you don't you are off the intended usage of CouchDB, and as you know that is risky territory.
Read up on DDD and aggregate design, and turn your records into DDD entities and aggregates, so there would be an ETL layer feeding CouchDB. If you don't have the time to do that, I'd recommend not using CouchDB, as much as I love it.
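As a hedged illustration of the aggregate idea (all names here are made up): instead of updating several documents from the JMS listener, fold the pieces that must change together into one document, because a single-document write in CouchDB is atomic. Jackson is used only to show the resulting JSON shape:

    import java.util.List;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class OrderAggregate {
        public String _id;            // CouchDB document id
        public String _rev;           // CouchDB revision, used for optimistic concurrency
        public String status;
        public List<OrderLine> lines; // formerly separate rows/documents, now embedded

        public static class OrderLine {
            public String sku;
            public int quantity;
        }

        public static void main(String[] args) throws Exception {
            OrderAggregate order = new OrderAggregate();
            order._id = "order:42";
            order.status = "received";
            order.lines = List.of(new OrderLine());
            // One PUT of this JSON replaces what would otherwise need a multi-document transaction.
            System.out.println(new ObjectMapper().writeValueAsString(order));
        }
    }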
CouchDB doesn't have the properties that are necessary for distributed transactions, so it's impossible. All major distributed transaction algorithms (the two-phase commit protocol, RAMP, and Percolator-style distributed transactions; you can find details in this answer) require linearizability at the record level. Unfortunately CouchDB is an AP solution (in the CAP-theorem sense), so it can't even guarantee record-level consistency.
Of course you can disable replication to make CouchDB consistent, but then you'll lose fault tolerance. Another option is to use CouchDB as storage and build a consistent database on top of it, but that is overkill for your task and doesn't use any CouchDB-specific feature. The third option is to use CRDTs, but that works only if your transactions are commutative.

How to deal with Java EE concurrency

Please let me know the best practices for providing application concurrency in a software project. I would like to use Hibernate for ORM, Spring to manage the transactions, and MySQL as the database.
My concurrency requirement is to let as many users as possible connect to the database,
perform CRUD operations, and use the services, but I do not want stale data.
How to handle data concurrency issues in the DB?
How to handle application concurrency? If two threads access my object simultaneously, will it corrupt the state of my object?
What are the best practices?
Do you recommend defining isolation levels in Spring methods, e.g.
@Transactional(isolation = Isolation.READ_COMMITTED)? How do I make that decision?
I have come up with the items below and would like your feedback on how to address them:
a. How to handle data concurrency issues in the DB?
Use a version tag and timestamp.
b. How to handle application concurrency?
Provide optimistic locking. No synchronization; instead create objects for each request (prototype scope).
c. What are the best practices?
Cache objects whenever possible.
By using transactions, and @Version fields in your entities if needed.
Spring beans are singletons by default, and are thus shared by threads. But they're usually stateless, and therefore inherently thread-safe. Hibernate entities shouldn't be shared between threads, and they aren't unless you explicitly share them yourself: each transaction, running in its own thread, will have its own Hibernate session, loading its own entity instances.
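A minimal sketch of that combination, assuming a hypothetical Account entity and Spring Data AccountRepository: the @Version column lets Hibernate reject stale concurrent updates, and the isolation attribute shows the READ_COMMITTED setting asked about above.

    // Account.java
    import javax.persistence.Entity;
    import javax.persistence.Id;
    import javax.persistence.Version;

    @Entity
    public class Account {
        @Id
        private Long id;
        private String owner;

        @Version              // incremented on every update; a stale write fails
        private long version; // with an optimistic-lock exception at flush/commit

        public void setOwner(String owner) { this.owner = owner; }
    }

    // AccountService.java
    import org.springframework.stereotype.Service;
    import org.springframework.transaction.annotation.Isolation;
    import org.springframework.transaction.annotation.Transactional;

    @Service
    public class AccountService {
        private final AccountRepository repository; // hypothetical Spring Data repository

        public AccountService(AccountRepository repository) {
            this.repository = repository;
        }

        @Transactional(isolation = Isolation.READ_COMMITTED)
        public void rename(Long id, String newOwner) {
            Account account = repository.findById(id).orElseThrow();
            account.setOwner(newOwner); // if another transaction updated the row first,
                                        // the version check makes this transaction fail
        }
    }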
Much too broad to be answered.
The default isolation level of your database is usually what you want. It is READ_COMMITTED with most databases I know about.
Regarding your point c (cache objects whenever possible), I would say that this is exactly what you shouldn't do. Caching makes your app stateful, difficult to cluster, much more complex, and you'll have to deal with the staleness of the cache. Don't cache anything until
you have a performance problem
you can't tweak your algorithms and queries to solve the performance problem
you have proven that caching will solve the performance problem
you have proven that caching won't cause problems worse than the performance problem
Databases are fast, and already cache data in memory for you.

ORM solutions (JPA, Hibernate) vs. JDBC

I need to be able to insert/update objects at a consistent rate of at least 8000 objects every 5 seconds in an in-memory HSQL database.
I have done some comparative performance testing between Spring/Hibernate/JPA and pure JDBC, and I have found a significant difference in performance using HSQL. With Spring/Hibernate/JPA, I can insert 3,000-4,000 of my 1.5 KB objects (with a one-to-many and a many-to-many relationship) in 5 seconds, while with direct JDBC calls I can insert 10,000-12,000 of those same objects.
I cannot figure out why there is such a huge discrepancy. I have tweaked the Spring/Hibernate/JPA settings a lot trying to get close in performance, without luck. I want to use Spring/Hibernate/JPA for future extensibility, and because the foreign-key relationships (one-to-many and many-to-many) are difficult to maintain by hand; but the performance requirements seem to point towards using pure JDBC.
Any ideas of why there would be such a huge discrepancy?
We have had a similar experience comparing Hibernate with JDBC in batch mode (Statement#executeBatch()). Basically, it seems like Hibernate just doesn't do that well with bulk operations. In our case, the Hibernate implementation was fast enough on our production hardware.
What you may want to do, is to wrap your database calls in a DAO, giving your application a consistent way of accessing your data. Implement your DAOs with Hibernate where it's convenient, and with JDBC where the performance requirements call for it.
At a minimum, you need to do batch inserts in Hibernate: http://www.hibernate.org/hib_docs/reference/en/html/batch.html This saves a lot of round-trip time.
And, as Justice mentioned, the primary goal of Hibernate is not computer performance but developer performance. Having said that, it's usually possible to achieve results comparable to JDBC (not equal, but not that much worse).
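A sketch of the batching pattern from that documentation, assuming hibernate.jdbc.batch_size is set to 50 and MyObject is one of your mapped entities (both names are placeholders here):

    import java.util.List;
    import org.hibernate.Session;
    import org.hibernate.SessionFactory;
    import org.hibernate.Transaction;

    public class BulkInserter {
        private static final int BATCH_SIZE = 50; // match hibernate.jdbc.batch_size

        public void insertAll(SessionFactory sessionFactory, List<MyObject> objects) {
            Session session = sessionFactory.openSession();
            Transaction tx = session.beginTransaction();
            for (int i = 0; i < objects.size(); i++) {
                session.save(objects.get(i));
                if (i % BATCH_SIZE == 0) {
                    session.flush();  // send the current batch of INSERTs to the driver
                    session.clear();  // detach the flushed entities to free memory
                }
            }
            tx.commit();
            session.close();
        }
    }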
Never use one technology for all problems.
Decide which technology to use depending on the problem.
Of course JPA or Hibernate is slower than JDBC; JDBC operates at a lower level than JPA.
Also, a DB professional writing JDBC can produce more optimized SQL than JPA generates.
If you have a critical point where speed is required, JPA is not your choice.
Hibernate maintains a first-level cache of objects to use for dirty checking, as well as to act as a Unit of Work and Identity Map. This adds to the overhead, especially in bulk-type operations. For bulk operations, you may want to investigate StatelessSession, which doesn't maintain this state.
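A hedged sketch of that alternative, reusing the hypothetical MyObject entity from above: a StatelessSession skips the first-level cache and dirty checking, so each insert goes more or less straight to JDBC.

    import java.util.List;
    import org.hibernate.SessionFactory;
    import org.hibernate.StatelessSession;
    import org.hibernate.Transaction;

    public class StatelessBulkInserter {
        public void insertAll(SessionFactory sessionFactory, List<MyObject> objects) {
            StatelessSession session = sessionFactory.openStatelessSession();
            Transaction tx = session.beginTransaction();
            for (MyObject obj : objects) {
                session.insert(obj); // issues an INSERT immediately, no persistence context
            }
            tx.commit();
            session.close();
        }
    }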
All that mapping ... it can get a little bit expensive, with all the arcane logic and all the reflection and consistency-checking that it has to do.
The point of mapping is not to boost performance, of course. Typically, you take a performance hit. But what you lose in performance, you (can) gain many times over in developer productivity, consistency, testability, reliability, and so many more coveted attributes. Typically, when you need the extra performance and you don't want to give up mapping, you drop in some more hardware.

How to minimize transaction overhead in Oracle?

I have to simultaneously load data into a table and run queries on it. Because of the nature of the data, I can trade integrity for performance. How can I minimize the overhead of transactions?
Unfortunately, alternatives like MySQL cannot be used (due to non-technical reasons).
Other than the general optimization practices that apply to all databases, such as eliminating full table scans and removing unused or inefficient indexes, here are a few things you can do.
Run in NOARCHIVELOG mode. This sacrifices recoverability for speed.
For inserts, use the /*+ APPEND */ hint. This puts data into the table above the high-water mark, which does not create UNDO. The disadvantage is that existing free space is not reused. (A JDBC sketch follows after this list.)
On the hardware side, RAID 0 over a larger number of smaller disks will give you the best insert performance, but depending on your usage, RAID 10 with its better read performance may be a better fit.
This said, I don't think you will gain much from any of these changes.
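For completeness, a hedged JDBC sketch of the /*+ APPEND */ direct-path insert mentioned above; the connection string, credentials, and table names are placeholders, and the hint applies to INSERT ... SELECT statements:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class DirectPathLoad {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "password");
                 Statement stmt = conn.createStatement()) {
                conn.setAutoCommit(false);
                // Direct-path insert above the high-water mark; generates minimal UNDO.
                stmt.executeUpdate(
                    "INSERT /*+ APPEND */ INTO target_table " +
                    "SELECT * FROM staging_table");
                // The table cannot be queried in this session until the commit.
                conn.commit();
            }
        }
    }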
Perhaps I'm missing something, but since in Oracle readers don't block writers and writers don't block readers, what exactly is the problem you are trying to solve?
From the perspective of the sessions that are reading the data, sessions that are doing inserts aren't really adding any overhead (updates might add a bit of overhead as the reader would have to look at data in the UNDO tablespace in order to reconstruct a read-consistent view of the data). From the perspective of the sessions that are inserting the data, sessions that are doing reads aren't really adding any overhead. Of course, your system as a whole might have a bottleneck that causes the various sessions to contend for resources (i.e. if your inserts are using up 100% of the available I/O bandwidth, that is going to slow down queries that have to do physical I/O), but that isn't directly related to the type of operations that the different sessions are doing-- you can flood an I/O subsystem with a bunch of reporting users just as easily as with a bunch of insert sessions.
You want transaction isolation read uncommitted. I don't recommend it but that's what you asked for :)
This will allow you to breach transaction isolation and read uncommitted inserted data.
Please read this Ask Tom article: http://www.oracle.com/technology/oramag/oracle/05-nov/o65asktom.html.
UPDATE: I was actually mistaken; Oracle doesn't really support the READ UNCOMMITTED isolation level, they just mention it :).
How about you try disabling all constraints on your table, then inserting all the data, then enabling them again?
i.e. alter session set constraints=deferred;
However, if you did not set the constraints on your table to deferrable during table creation, a slight problem might arise.
What kind of volumes and performance are you looking at? Are the inserts batched, or are they numerous small ones?
Before banging your head against the wall trying to think of clever ways to have good performance, did you create any simple prototypes which would give you a better picture of the out-of-the-box performance? It could easily turn out that you don't need to do anything special to meet the goals.
