How to safely and efficiently connect to a MongoDB replicaset instance with the C# Driver - performance

I am using MongoDB with the C# driver and am wondering what is the most efficient yet safe way to create connections to the database.
Thread Safety
According to the MongoDB C# driver documentation, the MongoClient, MongoServer, MongoDatabase, MongoCollection and MongoGridFS classes are thread safe. Does this mean I can have a singleton instance of MongoClient or MongoDatabase?
The documentation also states that a connection pool is used for MongoClient, so the management of connections to MongoDB is abstracted from the MongoClient class anyway.
Example Scenario
Let's say I have three MongoDB instances in my replicaset; so I create MongoClient and MongoDatabase objects based upon the three server addresses for these instances. Can I create a static singleton for the database and client objects and use them across multiple requests simultaneously? What if one of the instances dies; if I cache the Mongo objects, how can I make sure this scenario is dealt with safely?

In my project I'm using a singleton MongoClient only, then get MongoServer and other stuff from MongoClient.
This is because, as you said, the connection pool is in the MongoClient, and I definitely don't want more than one connection pool. Here's what the documentation says:
When you are connecting to a replica set you will still use only one
instance of MongoClient, which represents the replica set as a whole.
The driver automatically finds all the members of the replica set and
identifies the current primary.
Actually, MongoClient was added to the C# driver in 1.7 to represent the whole replica set and handle failover and load balancing, because MongoServer doesn't have the ability to do that. Thus you shouldn't cache MongoServer, because once a server goes offline you won't know about it.
EDIT: I just had a look at the source code, and I may have made a mistake. MongoClient doesn't handle the connection pool; MongoServer does (at least up to driver 1.7; I haven't looked at the latest driver source yet). This makes sense because MongoServer represents a real Mongo instance, and one connection pool stores connections only to that server.

Related

Which connection pool implementation has the behaviour that i want?

I am running a Spring Boot server which I use to query a MySQL database. So far I have been using the auto-configured HikariCP connection pool with jOOQ, so I had almost nothing to do with the connection pool. But now I need to query two different schemas (on the same server), and it seems I can't auto-configure two connection pools, so I have to tinker with the DataSource myself. I would like to keep the native behavior of the pool, i.e. have a set of persistent connections so that the server can dispatch the queries, and once a query is resolved the connection is still there and free to use again. I have found multiple implementations of connection pools that allow multiple DataSources to query multiple servers, but I don't know if each of them behaves the way I just described.
Implementation #1 :
https://www.ru-rocker.com/2018/01/28/configure-multiple-data-source-spring-boot/
Implementation #2 :
https://www.stubbornjava.com/posts/database-connection-pooling-in-java-with-hikaricp
I feel like #2 is the most straightforward solution, but I am sceptical of the idea of creating a new DataSource every time I want to query. If I don't close it, am I just opening new connections over and over again? So obviously I would have to close them once finished, but then it's not really a connection pool anymore. (Or am I misunderstanding this?)
Meanwhile #1 seems more reliable, but again, I would be calling new HikariDataSource every time, so is that what I am looking for?
(Or is there a simpler solution that I have been missing, given that I need to query two different schemas but still on the same server and dialect?)
OK, so it turns out I don't have to set up multiple connections in my case. As I am querying the same server with the same credentials, I don't have to set up a connection for each schema. I just removed the schema that I had specified in my JDBC URL config:
spring.datasource.url=jdbc:mysql://localhost:5656/db_name?useUnicode=true&serverTimezone=UTC
Becomes
spring.datasource.url=jdbc:mysql://localhost:5656/?useUnicode=true&serverTimezone=UTC
And then, as I had already generated the POJOs with the jOOQ generator, I could reference my table through the schema object, i.e. Client.CLIENT.ID.as("idClient") becomes ClientSchema.CLIENTSCHEMA.CLIENT.ID.as("idClient"). This way I can query multiple schemas without setting up any additional connection.
How to configure Maven and jOOQ to generate sources from multiple schemas:
https://www.jooq.org/doc/3.13/manual/code-generation/codegen-advanced/codegen-config-database/codegen-database-catalog-and-schema-mapping/

Jax rs client pool

I am working on setting up a REST Client using jax-rs 2 client API.
In the API doc it says "Clients are heavy-weight objects that manage the client-side communication infrastructure. Initialization as well as disposal of a Client instance may be a rather expensive operation. It is therefore advised to construct only a small number of Client instances in the application." (https://docs.oracle.com/javaee/7/api/javax/ws/rs/client/Client.html). As per this statement, it sounds like Client is not thread-safe and I should not be using a single Client instance for all requests.
I am using the CXF implementation, and so far I haven't found a way to set up a pool for Client objects.
If anyone has any information regarding this, could you please share it?
Thanks in advance.
By default, CXF uses a transport based on the in-JDK HttpURLConnection object to perform HTTP requests.
Connection pooling is performed allowing persistent connections to reuse the underlying socket connection for multiple http requests.
Set these system properties to configure the pool (default values shown):
http.keepAlive=true
http.maxConnections=5
Increase the value of http.maxConnections to raise the maximum number of idle connections that will be kept alive simultaneously, per destination. See the complete list of properties in this link: properties.html
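As a minimal sketch of setting these properties from code rather than with -D flags on the command line: the class name HttpPoolConfig and the parameter maxIdlePerDestination are made up for this example, and it assumes the call happens before the first HTTP request, since the JDK typically reads these values only once.

```java
// Sketch only: HttpPoolConfig is a hypothetical helper, not part of CXF or the JDK.
final class HttpPoolConfig {
    private HttpPoolConfig() {}

    // Call this early at startup, before any HttpURLConnection is opened.
    // maxIdlePerDestination is an illustrative parameter name.
    static void apply(int maxIdlePerDestination) {
        // Allow persistent connections so sockets can be reused across requests.
        System.setProperty("http.keepAlive", "true");
        // Upper bound on idle connections kept alive per destination.
        System.setProperty("http.maxConnections", Integer.toString(maxIdlePerDestination));
    }
}
```

For example, HttpPoolConfig.apply(20) would ask the JDK to keep up to 20 idle connections per destination.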
This post explains in some detail how it works:
Java HttpURLConnection and pooling
Note also that the JAX-RS client is not thread-safe by default. Check the limitations for proper use here.
When you need many requests executed simultaneously, CXF can also use the asynchronous Apache HttpAsyncClient. See details here:
http://cxf.apache.org/docs/asynchronous-client-http-transport.html

Spring Data when does it connect to the database

I have been researching Spring Data REST, especially for Cassandra, and one of the questions my coworkers and I had was: when does Spring Data connect to the database? We don't always want a REST controller to connect to the database, so when does Spring establish a connection if, say, we had a class extend CrudRepository? Does it connect to the database when the application starts? Is that something we can control?
For example, I implemented this example on Spring's website:
https://spring.io/guides/gs/accessing-data-rest/
At what point in the code does spring connect to the database?
Spring will connect to the DB as soon as the DataSource gets initialized. Basically, the Spring context comes alive somehow (web listeners, or calling it manually) and starts creating beans. As soon as it reaches the DataSource, a connection will be made and the connection pool will be populated.
Of course, the above is based on a normal out-of-the-box configuration, and everything can be set up to your taste.
So unless you decide to control the connections yourself, DB connections will be sitting there waiting to be used.
Disagree with the above answer.
As part of my research, I initialized the DataSource using a bean configuration and then changed my database password (not in my Spring application, but the real DB username and password).
The connection stays alive for a while, and then at some point (maybe after an idle timeout) it stops working and throws a credentials exception.
This is enough to say that JPA does not keep the connection sitting and waiting to be used, but uses some mechanism to acquire and release the DB connection as needed.

Using MongoDB API in a Web application

I'd need to use the MongoClient and DB objects repeatedly in a Web application:
MongoClient mongoClient = new MongoClient();
DB db = mongoClient.getDB( "test" );
Is it safe to cache and re-use these objects among different clients accessing our application?
Thanks
You should create this once and inject it via CDI/Guice, if you can. If you can't do that, you could use a static factory method to return the one instance of your MongoClient. MongoClient maintains a connection pool and is safe to use between different threads. If you create a new MongoClient with each request, not only is it going to be a performance hit to set up that pool and open a new connection, but you'll likely leave dangling connections unless you properly close that MongoClient at the end of the request.
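If injection isn't available, the static factory approach mentioned above could look roughly like this. It is a sketch using the initialization-on-demand holder idiom; ExpensiveClient is a made-up stand-in for MongoClient so the example runs without the driver on the classpath, and ClientHolder is likewise a hypothetical name.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive, thread-safe client such as MongoClient.
final class ExpensiveClient {
    // Counts constructions so the single-instance behaviour can be observed.
    static final AtomicInteger CREATED = new AtomicInteger();

    ExpensiveClient() {
        CREATED.incrementAndGet();
    }
}

final class ClientHolder {
    private ClientHolder() {}

    // The JVM initializes this nested class at most once, on first access,
    // so the instance is created lazily and without explicit locking.
    private static final class Lazy {
        static final ExpensiveClient INSTANCE = new ExpensiveClient();
    }

    // Static factory method: every caller and every thread gets the same instance.
    static ExpensiveClient get() {
        return Lazy.INSTANCE;
    }
}
```

With the real driver you would construct the MongoClient in place of ExpensiveClient and close it once at application shutdown rather than per request.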
Yes. From Getting Started with Java Driver, "you will only need one instance of class MongoClient even with multiple threads".
As a side note, the Mongo Java driver is a pain to use. The dev team I'm part of is very happy with Jongo, a wrapper around the Java driver that allows queries to be written more like shell queries.

jConnect4 pooled connection does not work as documented

Official Sybase jConnect Programmers Reference suggests following way to use pooled connections:
SybConnectionPoolDataSource connectionPoolDataSource = new SybConnectionPoolDataSource();
...
Connection ds = connectionPoolDataSource.getConnection();
...
ds.close();
However, getConnection always throws an exception. I decompiled SybConnectionPoolDataSource and found that the method call explicitly raises an error:
public Connection getConnection() throws SQLException
{
    ErrorMessage.raiseError("JZ0S3", "getConnection()");
    return null;
}
Does anyone have an idea why the documentation contradicts the implementation?
I can't comment specifically for Sybase because 1) I don't use it and 2) your link doesn't work, but I can try to give you a theory based on my own experience maintaining a JDBC driver (Jaybird/Firebird JDBC) and looking at what some of the other implementations do.
The ConnectionPoolDataSource is probably the least understood part of the JDBC API. Contrary to what the naming suggests and how it has been implemented in some JDBC implementations this interface SHOULD NOT provide connection pooling and should not implement DataSource (or at least: doing that can lead to confusion and bugs; my own experience).
The javadoc of the ConnectionPoolDataSource is not very helpful, the javax.sql package documentation provides a little bit more info, but you really need to look at the JDBC 4.1 specification, Chapter 11 Connection Pooling to get a good idea how it should work:
[...] the JDBC driver provides an implementation of ConnectionPoolDataSource that the application server uses to build and manage the connection pool.
In other words: ConnectionPoolDataSource isn't meant for direct use by a developer, but instead is used by an application server for its connection pool; it isn't a connection pool itself.
The application server provides its clients with an implementation of the DataSource interface that makes connection pooling transparent to the client.
So the connection pool is made available to the user by means of a normal DataSource implementation. The user uses this as if it were one that doesn't provide pooling, and uses the connections obtained as if they were normal physical connections instead of ones obtained from a connection pool:
When an application is finished using a connection, it closes the logical connection using the method Connection.close. This closes the logical connection but does not close the physical connection. Instead, the physical connection is returned to the pool so that it can be reused.
Connection pooling is completely transparent to the client: A client obtains a pooled connection and uses it just the same way it obtains and uses a non pooled connection.
This is further supported by the documentation of PooledConnection (the object created by a ConnectionPoolDataSource):
An application programmer does not use the PooledConnection interface directly; rather, it is used by a middle tier infrastructure that manages the pooling of connections.
When an application calls the method DataSource.getConnection, it gets back a Connection object. If connection pooling is being done, that Connection object is actually a handle to a PooledConnection object, which is a physical connection.
The connection pool manager, typically the application server, maintains a pool of PooledConnection objects. If there is a PooledConnection object available in the pool, the connection pool manager returns a Connection object that is a handle to that physical connection. If no PooledConnection object is available, the connection pool manager calls the ConnectionPoolDataSource method getPoolConnection to create a new physical connection. The JDBC driver implementing ConnectionPoolDataSource creates a new PooledConnection object and returns a handle to it.
Unfortunately, some JDBC drivers have created data sources that provide connection pooling by implementing both DataSource and ConnectionPoolDataSource in a single class, instead of following the intent of the JDBC spec of having a DataSource that uses a ConnectionPoolDataSource. This has resulted in implementations that would work if used as a normal DataSource, but would break if used as a ConnectionPoolDataSource (e.g. in the connection pool of an application server), or where the interface was misunderstood and the wrong methods were used to create connections (e.g. calling getPooledConnection().getConnection()).
I have seen implementations (including in Jaybird) where the getPooledConnection() would be used to access a connection pool internal to the implementation, or where only connections obtained from the getConnection() of the implementation would work correctly, leading to all kinds of oddities and incorrect behavior when that implementation was used to fill a connection pool in an application server using the getPooledConnection().
Maybe Sybase did something similar, and then decided that wasn't such a good idea so they changed the DataSource.getConnection() to throw an exception to make sure it wasn't used in this way, but at the same time maintaining the API compatibility by not removing the methods defined by DataSource. Or maybe they extended a normal DataSource to easily create the physical connection (instead of wrapping a normal one), but don't want users to use it as a DataSource.
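To make the logical versus physical distinction from the spec quotes concrete, here is a toy sketch of what a pooling layer does. It is not how Sybase, Jaybird or any application server implements it: TinyPool and TinyPoolDemo are invented names, the "physical" connection is a do-nothing fake, and a real pool would also handle validation, timeouts and size limits.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Toy pool (hypothetical): hands out logical handles backed by reusable physical connections.
final class TinyPool {
    private final Deque<Connection> idle = new ArrayDeque<>();
    private final Supplier<Connection> physicalFactory;

    TinyPool(Supplier<Connection> physicalFactory) {
        this.physicalFactory = physicalFactory;
    }

    // Returns a logical handle; close() on it gives the physical connection back to the pool.
    synchronized Connection getConnection() {
        Connection physical = idle.isEmpty() ? physicalFactory.get() : idle.pop();
        AtomicBoolean closed = new AtomicBoolean();
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals("close")) {
                // Only the first close() releases; repeated calls are ignored.
                if (closed.compareAndSet(false, true)) {
                    release(physical);
                }
                return null;
            }
            try {
                // Everything else is forwarded to the physical connection.
                return method.invoke(physical, args);
            } catch (InvocationTargetException e) {
                throw e.getCause();
            }
        };
        return (Connection) Proxy.newProxyInstance(
                TinyPool.class.getClassLoader(), new Class<?>[] {Connection.class}, handler);
    }

    private synchronized void release(Connection physical) {
        idle.push(physical);
    }
}

final class TinyPoolDemo {
    private TinyPoolDemo() {}

    // Borrows and closes twice; returns how many physical connections were opened.
    static int run() {
        AtomicInteger opened = new AtomicInteger();
        TinyPool pool = new TinyPool(() -> {
            opened.incrementAndGet();
            // Fake physical connection whose methods do nothing (stand-in for a driver connection).
            return (Connection) Proxy.newProxyInstance(
                    TinyPoolDemo.class.getClassLoader(),
                    new Class<?>[] {Connection.class},
                    (proxy, method, args) -> null);
        });
        try {
            pool.getConnection().close(); // opens a physical connection, then returns it to the pool
            pool.getConnection().close(); // reuses the same physical connection
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
        return opened.get();
    }
}
```

The point of the sketch is that the caller only ever sees Connection and calls close() as usual; whether that close is logical or physical is decided by the layer that handed out the connection.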

Resources