need some software architecture insight on this. Which of the following is more efficient in terms of resource (cpu, memory, database)?
Having a single database connection in one flow? (Close connection only after everything is done, including business logic)
Having multiple database connections in one flow? (Open then close the database connection immediately after the query is executed)
By business logic, this is where data returned from the query is sanitized, or manipulated according to business rules.
Attaching here is the diagram for visual representation.
UPDATE:
Programming language: PHP (Laravel for web app, Lumen for API)
Database: MySQL
Host: AWS
opening new connection between your runtime and your database needs your OS to create new socket ( if runtime and database are on the same system this socket probably is linux socket, else this socket is tcp/udp socket)
this socket creation has overhead itself.
so I don't suggest to open and close connections after each database usage.
but there are specific conditions you want do that.
for example your database has a limited number of concurrent connections and you have thousands of long running processes using this connections, maybe in this situation you can use second approach.
Related
Reading this article: http://go-database-sql.org/accessing.html
It says that the sql.DB object is designed to be long-lived and that we should not Open() and Close() databases frequently. But what should I do if I have 10 different MySQL servers and I have sharded them in a way that I have 511 databases in each server for example the way Pinterest shards their data with MySQL?
https://medium.com/#Pinterest_Engineering/sharding-pinterest-how-we-scaled-our-mysql-fleet-3f341e96ca6f
Then would I not need to constantly access new nodes with new databases all the time? As I understand then I have to Open and Close the database connection all the time depending on which node and database I have to access.
It also says that:
If you don’t treat the sql.DB as a long-lived object, you could
experience problems such as poor reuse and sharing of connections,
running out of available network resources, or sporadic failures due
to a lot of TCP connections remaining in TIME_WAIT status. Such
problems are signs that you’re not using database/sql as it was
designed.
Will this be a problem? How should I solve this issue then?
I am also interested in the question. I guess there could be such solution:
Minimize number of idle connection in pool db.SerMaxIdleConns(N)
Make map[serverID]*sql.DB. When you have no such connection - add it to map.
Make Dara more local - so backends usually go to “their” databases. However Pinterest seems not to use it.
Increase number of sockets and files on backend machines so they can keep more open connections.
Provide some reasonable idle timeout so very old unused connections could be closed.
In my project, developers use a single instance of Connection instead of a connection pool on an Oracle 12c.
Using a pool is a common practice and Oracle itself documents it: http://docs.oracle.com/database/121/JJUCP/get_started.htm#JJUCP8120.
But JDBC 4.2 specification says:
13.1.1 Creating Statements
Each Connection object can create multiple Statement objects that may be used concurrently by the program.
Why using a pool of connections instead of a single connection, if it's possible to use statements to manage concurrency?
The Oracle Database Dev Team strongly discourages using a single Connection in multiple threads. That almost always causes problems. As a general rule we will not consider any problem report that does this.
A Connection can have multiple Statements and/or ResultSets open at one time but only one can execute at a time. Connections are strictly single threaded and blocking. We try to prevent multiple threads from accessing a Connection simultaneously but there are a few odd cases where it is possible. These are all but guaranteed to cause problems. (It is not practical to fix or prevent these cases mostly for performance reasons. Just don't share a single Connection across multiple threads.)
If a client connects to the database via a dedicated server connection then that database session will only serve that client . If the client connects to the database via shared server connection, then a given database session may serve multiple clients over its lifetime.
This is documented here.
Also, at any one point in time, a session can only execute one thing at a time. If that wasn't the case, then running things in parallel wouldn't spawn multiple other sessions!
A single connection cannot execute several statements concurrently.
Yes one connection can execute more that one statement. It will be the programmer to chose connection pooling setting or multiple statements when executing over more than one thread. Most databases in the market can handle multiple statements in one connection.
Is there any problem on using many open connections at the same time from different threads?
From what I've read it's thread safe by default, but, can this be hurting performance rather than improving it?
Having multiple connection is not a problem, the only thing to keep in mind is that SQLite does not support concurrency of multiple write transactions. From the SQlite site:
SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time. For many situations, this is not a problem. Writer queue up. Each application does its database work quickly and moves on, and no lock lasts for more than a few dozen milliseconds. But there are some applications that require more concurrency, and those applications may need to seek a different solution.
SQLite is an "untypical" database management system: in practice it is a library that offers SQL as language to access a simple "database-in-a-file", and a few other functionalities of DBMSs. For instance, it has no real concurrency control (it uses the Operating Systems functions to lock the db file).
So, if you need concurrent insertions into a database, you should use something else, for instance PostgreSQL.
The documentation say:
A connection can only be used from within the thread that created it.
Moving connections between threads or creating queries from a
different thread is not supported.
In addition, the third party libraries used by the QSqlDrivers can
impose further restrictions on using the SQL Module in a multithreaded
program. Consult the manual of your database client for more
information.
It is mean you have to create connection to database which will be linking with parent thread. At docs of QSqlDatabase class you can see description:
The QSqlDatabase class represents a connection to a database.
The QSqlDatabase class provides an interface for accessing a database
through a connection. An instance of QSqlDatabase represents the
connection. The connection provides access to the database via one of
the supported database drivers, which are derived from QSqlDriver.
Create a connection (i.e., an instance of QSqlDatabase) by calling one
of the static addDatabase() functions, where you specify the driver or
type of driver to use (i.e., what kind of database will you access?)
and a connection name.
Using static addDatabase() function is way to create connection.
But as Renzo said SQLite does not support multiple write transactions at the same time. So you need some mechanisms(wrapper) for synchronizing threads like task queue using low-level mutex or something like that. More information you can see at docs.
I am using Pentaho-BI server installation in my web application as a third party installation.I am using its saiku analytics and reporting files by embedding their specific links in iframe of my application. Problem is I am not getting how it creates database connections, in terms of numbers?? Because many times it throws error regarding 'No connection is available in pool'. I know there are properties like max available connection, max idle connections , wait and sql validation. But How to release connections?? And if Pentaho handles it in its own way then how?? Because increasing number of max connections available will create load on database server, when many users are using my BI server.
One solution I found is just to restart my BI server, but It's not a valid solution for production environment. Other solution I think is scheduler, but I have no clues about it and not getting proper info on net.
The defaults for max connections are incredibly low. This is standard tomcat connection pooling stuff, I would definitely try increasing the default, see if that helps. you can monitor concurrent connections on the db side - just because you have 100 connections to the db it doesn't necessarily mean they'll be all used at once.
Also; Are you using mysql? You should try the c3po pooling driver it handles timeouts and things better than the standard driver so you shouldnt ever get dead connections sitting in the pool.
I'm developing a large scale website and when I was checking linq queries with SQL Profiler I found that there is about 30 login/logout actions. I want to decrease these actions by using an open connection for all DataContexts but I don't know how to do it. Do you have any suggestion?
That is pretty bad idea - you are creating large scalable website so let Linq-to-sql handle connection itself.
Linq-to-sql internally handles connection opening and releasing in effective way for it usage. It uses default ADO.NET connection pooling so the connection is correctly reused and not opened for every single context.
Using single connection for all context is exactly what makes your application non scalable and not working. Single connection allows only single transaction so once two requests want to make concurrent changes and use their own transaction your application will crash.
Do not share contexts and do not share connections - let ADO.NET handle connection pooling and create new context for every single request or you can expect serious issues.