How can I speed up my AWS RDS Postgres performance? - ruby

I just set up a db.t2.micro instance on Amazon's AWS. I am using Sinatra to serve a page on localhost and Active Record to run roughly 30 queries, and the page takes 92 seconds to load. It's extremely slow. I tried setting custom parameters as listed here: http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_BestPractices.html#CHAP_BestPractices.PostgreSQL
This didn't help speed anything up. I'm not sure how I can speed up this instance. This is my first time hosting a database. Any help would be appreciated.
When I run my Sinatra app it is hosted locally (localhost). This is where the ~30 queries take 92 seconds to load. When I run SELECT * statements directly in Postgres they take only a couple of seconds.

The problem is the latency between you and Amazon's data center.
For example, if you are in New York and your RDS instance is in Amazon's data center on the west coast, the latency between you and the data center is about 80-100 ms. That means when your local application sends a query to the database, it takes about 100 ms before the database even receives the query, and returning the answer takes another 100 ms.
That said: assume a round trip takes 300 ms and you have ~30 queries; then your application loses about 10 seconds doing nothing, just waiting for data to travel over the wire. And other factors can slow this down even more: big packets, lost packets (the sender has to retransmit), bad internet connections, wireless links, or a distance between you and the database even longer than in my example.
Therefore the database should be as close as possible to the application server, ideally in the same data center, to minimize latency.
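If you want to confirm that it really is network latency and not Postgres itself, time a trivial query over the same kind of remote connection. This is only an illustrative sketch in plain JDBC (the question uses Active Record); the endpoint and credentials are placeholders, and the PostgreSQL JDBC driver is assumed to be on the classpath:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class RoundTripCheck {
        public static void main(String[] args) throws Exception {
            // Placeholder RDS endpoint and credentials - replace with your own.
            String url = "jdbc:postgresql://your-instance.xxxxxxxx.us-west-2.rds.amazonaws.com:5432/yourdb";
            try (Connection conn = DriverManager.getConnection(url, "user", "password");
                 Statement stmt = conn.createStatement()) {
                stmt.execute("SELECT 1");             // warm up the connection first
                long start = System.nanoTime();
                stmt.execute("SELECT 1");             // trivial query: almost no server-side work
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                // If this prints something like 80-100 ms, the round trips dominate:
                // multiply by ~30 queries and you get the multi-second stall described above.
                System.out.println("Round trip took " + elapsedMs + " ms");
            }
        }
    }

Running the same check from an EC2 instance in the same region as the RDS instance typically shows a round trip of a millisecond or so, which is the point of the answer above.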

Related

Fetching rows with Snowflake JDBC while the query is running in the server

I have a complex query that runs a long time (e.g. 30 minutes) in Snowflake when I run it in the Snowflake console. I am running the same query from a JVM application using the JDBC driver. What appears to happen is this:
Snowflake processes the query from start to finish, taking 30 minutes.
JVM application receives the rows. The first receive happens 30 minutes after the query started.
What I'd like to happen is that Snowflake starts to send rows to my application while it is still executing the query, as soon as data is ready. This way my application could start processing the rows in the first 30 minutes.
Is this possible with Snowflake and JDBC?
First of all, I would suggest checking the Snowflake warehouse size and tuning it. It's not worth waiting 30 minutes when resizing the warehouse can cut the query time to a quarter or less. With either of the options below your cost stays roughly the same or even lower, because query execution time shrinks roughly linearly as you increase the warehouse size. Refer to the Snowflake documentation:
Scale up by resizing a warehouse.
Scale out by adding clusters to a warehouse (requires Snowflake Enterprise Edition or higher).
As for JDBC, I believe it behaves the same way here as it does with other databases.
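On the JDBC side, the usual pattern is to set a fetch size and iterate the ResultSet, letting the driver pull rows in batches; whether any rows are actually available before the query finishes is decided by the Snowflake engine and driver, not by this code. A rough sketch with plain JDBC (account URL, warehouse, and table names are placeholders; the Snowflake JDBC driver is assumed to be on the classpath):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class SnowflakeFetchSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder account and credentials.
            String url = "jdbc:snowflake://myaccount.snowflakecomputing.com/?warehouse=MY_WH&db=MY_DB&schema=PUBLIC";
            try (Connection conn = DriverManager.getConnection(url, "user", "password");
                 Statement stmt = conn.createStatement()) {
                // The resize advice above is itself just a SQL statement, e.g.:
                // stmt.execute("ALTER WAREHOUSE MY_WH SET WAREHOUSE_SIZE = 'LARGE'");

                // Ask the driver for modest batches instead of buffering the whole result at once.
                stmt.setFetchSize(1000);
                try (ResultSet rs = stmt.executeQuery("SELECT * FROM my_big_table")) {
                    while (rs.next()) {
                        // Each row is processed as soon as the driver hands it over;
                        // executeQuery() itself will not return until the server has results to give.
                        process(rs.getString(1));
                    }
                }
            }
        }

        private static void process(String value) {
            // Application-specific row handling would go here.
        }
    }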

RethinkDB changefeeds performance: architectural advice?

I am building an application with RethinkDB and I'm about to switch to using changefeeds. But I'm facing an architectural choice and I'd like to get some advice.
My application currently loads all user data from several tables on user login (sending all of it to the frontend), and then processes requests from the frontend, altering the database, and preparing and sending changed items to users. I'd like to switch that over to changefeeds. The way I see it, I have two choices:
Set up a single changefeed for each table. Filter by users logged in to a particular server, and distribute the changes to users manually. These changefeeds are never closed, i.e. they have the lifetime of my servers.
When a user logs in, set up an individual changefeed for that user, for that user's data only (using a getAll with a secondary index). Maintain as many changefeeds as there are currently logged in users. Close them when users log out.
Solution #1 has a big disadvantage: RethinkDB changefeeds do not have a concept of time (or version number), like for example Kafka does. This means that there is no way to a) load initial data, and b) get changes that happened since the initial load. There is a time window where changes can be lost: between initial data load (a) and the moment the changefeed is set up (b). I find this worrying.
Solution #2 seems better, because includeInitial can be used to get initial data, and then get subsequent changes without interruption. I'd have to deal with initial load performance (it's faster to load a single dump of all data than process thousands of updates), but it seems more "correct". But what about scaling? I'm planning to handle up to 1k users per server — is RethinkDB prepared to handle thousands of changefeeds, each being essentially a getAll query? The actual activity in these changefeeds will be very low, it's just the number that I'm worried about.
The RethinkDB manual is a bit terse about changefeed scaling, saying that:
Changefeeds perform well as they scale, although they create extra intracluster messages in proportion to the number of servers with open feed connections on each write.
Solution #2 creates many more feeds, but the number of servers with open feed connections is actually the same for both solutions. And "changefeeds perform well as they scale" isn't quite enough to go on :-)
I'd also be interested to know what are recommended practices for handling server restarts/upgrades and disconnections. The way I see it, if anything happens to RethinkDB, clients have to perform a full data load (using includeInitial) after reconnecting, because there is no way to know what changes have been lost during downtime. Is that what people do?
RethinkDB should be able to handle thousands of changefeeds just fine if it's on reasonable hardware. One thing some people do to lower network load in that case is to put a proxy node on the same machine as their app server and connect to that, since the proxy node knows enough to deduplicate the changefeed messages coming in over the network, and it takes a lot of CPU/memory load off of the main cluster.
Currently the only way to recover from a crash is to restart the changefeed using includeInitial. There are plans to add write timestamps in the future, but handling deletes is complicated in that case.
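For what it's worth, a per-user feed with include_initial (option #2 in the question) would look roughly like this with the official Java driver (com.rethinkdb:rethinkdb-driver 2.4.x). The table name, secondary index, and user id are made up, and I haven't verified every detail of the driver API, so treat it as a sketch:

    import com.rethinkdb.RethinkDB;
    import com.rethinkdb.net.Connection;
    import com.rethinkdb.net.Result;

    public class UserFeedSketch {
        private static final RethinkDB r = RethinkDB.r;

        public static void main(String[] args) {
            try (Connection conn = r.connection().hostname("localhost").port(28015).connect()) {
                // Hypothetical table "user_data" with a secondary index "user_id".
                Result<Object> feed = r.table("user_data")
                        .getAll("some-user-id").optArg("index", "user_id")
                        .changes().optArg("include_initial", true)
                        .run(conn);
                // The initial documents arrive first, then live changes without a gap,
                // which is what makes option #2 attractive despite the number of feeds.
                while (feed.hasNext()) {
                    System.out.println(feed.next());
                }
            }
        }
    }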

CouchDB - Mobile application architecture - Replication performance

I built a mobile application based on CouchDB.
For security reasons, I have to make sure that a document can be read only by the users allowed to do so. As I cannot manage access rights at the document level, I create one CouchDB database per user and replicate documents from my main CouchDB database into each user database with a filtered replication.
This model works very well, but today I ran into huge performance issues.
I tried to make all my replications continuous, filtered and bidirectional, but after 80 users (so 81 databases and 160 simultaneous continuous replications) there were too many replications, and my CouchDB service started to slow down and even crashed sometimes. Note that all the databases are on the same server (and I cannot have more than one server).
I tried to put "manual" replication in place, but even this way, when I need to replicate a document from my main database to all 80 user databases, each filtered replication from the main database to a user database takes around 30 seconds.
Maybe I have an issue with my replication filter: for each document I store a list of the users allowed to see it, and since each user has their own database, I replicate into it only the documents that user is allowed to see. Here is my filter function:
function(doc, req) {
  // Replicate a document only if the requesting user is in its access list.
  if (doc.userList) {
    if (doc.userList.indexOf(req.query.username) > -1) {
      return true;
    }
  }
  return false;
}
The goal of my application is to reach around 1,000 users, which is totally impossible with the current architecture/performance.
I have three questions:
1. Even though I suspect it's not possible: is it possible to have about 1,000 databases in continuous replication on the same server?
2. Is there anything wrong with my replication filter? Is there any way to improve it to get fast database replications?
3. If the current architecture is not good at all, what kind of architecture would you advise in my case?
Thank you very much!
We finally changed our global project architecture.
The main server cannot handle more than 100 replicated databases even if the configuration limits are raised; after 80 synchronized databases the CouchDB logs start to explode. I may be wrong, but I think this kind of architecture is not possible on a single server.
Here is the solution we put in place.
We removed all the per-user databases, plugged all our mobile applications directly into the main database, and do a filtered replication against that main database: http://pouchdb.com/api.html#replication, using the approach from "Example 3: filter function inside of a design document".
This new model is now working; we did some stress tests and didn't hit any issues up to 1,000 simultaneous users.
Just be aware that, to replicate a database, PouchDB asks CouchDB for all the modifications applied to the main database since the last synchronization (even for a filtered replication). So when you create a new PouchDB database and synchronize it, if your main CouchDB is old and has a long history (check the CouchDB _changes API), it can take a very (very) long time!
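To see why that history scan hurts, you can hit the _changes feed with the same filter directly over HTTP; a filtered replication is essentially a filtered read of _changes from the last checkpoint. A hedged sketch with Java's built-in HttpClient (host, database, design document, filter, and credentials are placeholders):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.Base64;

    public class FilteredChangesCheck {
        public static void main(String[] args) throws Exception {
            // Placeholder host, database, design doc, filter and user - adjust to your setup.
            String url = "http://localhost:5984/maindb/_changes"
                    + "?filter=app/by_user"       // design doc "app", filter function "by_user"
                    + "&username=alice"           // extra params reach the filter as req.query
                    + "&since=0";                 // since=0 forces a scan of the full history
            String auth = Base64.getEncoder().encodeToString("admin:password".getBytes());

            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                    .header("Authorization", "Basic " + auth)
                    .GET()
                    .build();

            long start = System.nanoTime();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;

            // A slow response here is the same cost a fresh PouchDB client pays on its first sync.
            System.out.println("Status " + response.statusCode() + ", took " + elapsedMs + " ms");
        }
    }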
Step 0 is always to identify the bottleneck. My first guess, based on the scenario you outlined, would be to look at I/O performance. Check out
GET /_stats/couchdb
and
GET /_active_tasks
Each database gets its own read and write file descriptors, so as the number of open databases on the server increases, so do the I/O resources required. Hope this helps.
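If you want to watch those numbers while the replications are running, both endpoints are plain JSON over HTTP. A small sketch with Java's HttpClient (host and admin credentials are placeholders; note that the exact stats path differs between CouchDB 1.x and 2.x):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.Base64;

    public class CouchMonitorSketch {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            String auth = "Basic " + Base64.getEncoder().encodeToString("admin:password".getBytes());
            // Dumping both endpoints every few seconds during a slow replication shows whether
            // replication tasks are piling up and how hard the disks are being hit.
            for (String path : new String[] {"/_active_tasks", "/_stats/couchdb"}) {
                HttpRequest req = HttpRequest.newBuilder(URI.create("http://localhost:5984" + path))
                        .header("Authorization", auth)
                        .GET()
                        .build();
                HttpResponse<String> res = client.send(req, HttpResponse.BodyHandlers.ofString());
                System.out.println(path + " -> " + res.body());
            }
        }
    }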

Azure SQL Data IO 100% for extended periods for no apparent reason

I have an Azure website running about 100K requests/hour and it connects to Azure SQL S2 database with about 8GB throughput/day. I've spent a lot of time optimizing the database indexes, queries, etc. Normally the Data IO, CPU and Log IO percentages are well behaved in the 20% range.
A portion of the recent data is retained to support our customers. I have a nightly maintenance procedure that removes obsolete data to manage the database size. This mostly works well, with the exception of removing image blobs stored in a varbinary(max) field.
The nightly procedure has a loop that sets the varbinary(max) field of 10 records to null at a time, waits a couple of seconds, then does the next 10. The nightly total for this loop is about 2,000 records.
This loop will run for about 45 - 60 minutes and then stop running with no return to my remote Sql Agent job and no error reported. A second and sometimes third running of the procedure is necessary to finish setting the desired blobs to null.
In an attempt to alleviate the load on the nightly procedure, I started running a job once every 30 seconds throughout the day - it sets one blob to null each time.
Normally this trickle job is fine and runs in 1 - 6 seconds. However, once or twice a day something goes wrong and I can find no explanation for it. The Data I/O percentage peaks at 100% and stays there for 30 - 60 minutes or longer. This causes the database responsiveness to suffer and the website performance goes with it. The trickle job also reports running for this extended period of time. If I stop the Sql Agent job, it can take a few minutes to stop but the Data I/O continues at 100% for the 30 - 60 minute period.
The web service requests and database demands are relatively steady throughout the business day - no volatile demands that would explain this. No database deadlocks or other errors are reported. It's as if the database hits some kind of backlog limit where its ability to keep up suddenly drops and then it can't catch up until something that is jammed finally clears. Then the performance will suddenly return to normal.
Do you have any ideas what might be causing this intermittent and unpredictable issue? Any ideas what I could look at when one of these events is happening to determine why the Data I/O is 100% for an extended period of time? Thank you.
If you are on SQL DB V12, you may also consider using the Query Store feature to root cause this performance problem. It's now in public preview.
In order to turn on Query Store just run the following statement:
ALTER DATABASE your_db SET QUERY_STORE = ON;
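Once Query Store has collected some data, the sys.query_store_* views can be queried like ordinary tables to see which statements did the most I/O during one of those 100% windows. A sketch over JDBC (the connection string is a placeholder, and the view/column names are the standard Query Store ones, assuming they are exposed on your tier):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class TopIoQueries {
        public static void main(String[] args) throws Exception {
            // Placeholder connection string - requires the Microsoft SQL Server JDBC driver.
            String url = "jdbc:sqlserver://yourserver.database.windows.net:1433;"
                    + "databaseName=your_db;user=you@yourserver;password=yourPassword;encrypt=true";
            String sql =
                "SELECT TOP 10 qt.query_sql_text, " +
                "SUM(rs.avg_physical_io_reads * rs.count_executions) AS total_physical_reads " +
                "FROM sys.query_store_query_text qt " +
                "JOIN sys.query_store_query q ON q.query_text_id = qt.query_text_id " +
                "JOIN sys.query_store_plan p ON p.query_id = q.query_id " +
                "JOIN sys.query_store_runtime_stats rs ON rs.plan_id = p.plan_id " +
                "GROUP BY qt.query_sql_text " +
                "ORDER BY total_physical_reads DESC";
            try (Connection conn = DriverManager.getConnection(url);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    // The statements at the top of this list are the ones to inspect
                    // the next time Data IO pins at 100%.
                    System.out.printf("%,.0f physical reads: %s%n",
                            rs.getDouble("total_physical_reads"),
                            rs.getString("query_sql_text"));
                }
            }
        }
    }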

JDBC Requests High CPU at JMeter Client

I implemented a JDBC test plan against my database on a web server (I built the web server myself). When I start a simple request from the JMeter client (e.g. SELECT * FROM link d WHERE d.link LIKE '%com%'), JMeter's CPU usage goes very high (90-100%) for a long time (~5 minutes, even though I set my test plan to 6 s :(). On the server side the CPU is high for only a very short time, 5-7 seconds (I think that is the time the query takes in the database). I tried changing the HEAP in jmeter.bat to more than 1024m, but it wasn't successful.
Can you help me to solve this problem?
I'd run EXPLAIN PLAN on that SQL query. You're likely to see a TABLE SCAN because of the way you wrote the WHERE clause: the leading '%' wildcard prevents any index from being used. That takes a lot of time, and more as your table grows, because it requires examining each and every record.
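The question doesn't say which database is behind the web server, so the exact EXPLAIN syntax will differ (EXPLAIN PLAN FOR on Oracle, plain EXPLAIN on MySQL/PostgreSQL), but the check itself is a one-liner over the same JDBC connection the test plan uses. A sketch assuming a MySQL-style EXPLAIN; the URL is a placeholder:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.ResultSetMetaData;
    import java.sql.Statement;

    public class ExplainLike {
        public static void main(String[] args) throws Exception {
            // Placeholder URL - point it at the same database the JMeter test plan uses.
            String url = "jdbc:mysql://localhost:3306/mydb?user=root&password=secret";
            try (Connection conn = DriverManager.getConnection(url);
                 Statement stmt = conn.createStatement();
                 // EXPLAIN syntax varies by database; this is the MySQL/PostgreSQL form.
                 ResultSet rs = stmt.executeQuery(
                         "EXPLAIN SELECT * FROM link d WHERE d.link LIKE '%com%'")) {
                ResultSetMetaData meta = rs.getMetaData();
                while (rs.next()) {
                    // A full table scan here (e.g. type=ALL in MySQL, Seq Scan in Postgres)
                    // confirms the leading '%' keeps any index from being used.
                    StringBuilder row = new StringBuilder();
                    for (int i = 1; i <= meta.getColumnCount(); i++) {
                        row.append(meta.getColumnLabel(i)).append('=').append(rs.getString(i)).append(' ');
                    }
                    System.out.println(row);
                }
            }
        }
    }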
