SQL Server large data - performance

I have SQL Server 2012 installed on a virtual machine with a quad-core processor (2.0 GHz) and 32 GB RAM.
The data volume is about 400 GB, and there is a table with 824 million records in it.
The records are organized by date.
I have to pull data from this table for a start date and end date given by the user; this query generally takes 24 minutes to run.
Is there any way to optimize this?
I have even tried the SQL Server 2012 columnstore index, yet there is no significant improvement in query performance.
Should I ask for a higher machine configuration for such a large volume of data, or is there another way to achieve good performance?
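For context, the usual approach for a date-range workload like this is to combine date-based partitioning with the columnstore index so the engine can skip most of the table. A minimal sketch follows; the table dbo.Sales and its columns SaleDate, CustomerId and Amount are hypothetical placeholders, not the poster's actual schema.

    -- Illustrative only: partition a hypothetical dbo.Sales table by date so that
    -- start-date/end-date queries only touch the relevant partitions.

    -- 1. Partition function with one boundary per year (use months for finer grain).
    CREATE PARTITION FUNCTION pf_SaleDate (date)
    AS RANGE RIGHT FOR VALUES ('2011-01-01', '2012-01-01', '2013-01-01');

    -- 2. Partition scheme; in production, spread partitions over separate filegroups.
    CREATE PARTITION SCHEME ps_SaleDate
    AS PARTITION pf_SaleDate ALL TO ([PRIMARY]);

    -- 3. Cluster the table on the partitioning column so date-range predicates
    --    can eliminate whole partitions.
    CREATE CLUSTERED INDEX cix_Sales_SaleDate
    ON dbo.Sales (SaleDate)
    ON ps_SaleDate (SaleDate);

    -- 4. SQL Server 2012 only offers a NONCLUSTERED columnstore index, and it makes
    --    the table read-only until it is dropped or disabled.
    CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_Sales
    ON dbo.Sales (SaleDate, CustomerId, Amount);

    -- A date-range query that can now benefit from partition elimination
    -- and batch-mode processing on the columnstore index:
    SELECT SaleDate, SUM(Amount) AS Total
    FROM dbo.Sales
    WHERE SaleDate >= '2012-01-01' AND SaleDate < '2012-07-01'
    GROUP BY SaleDate;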

Related

Improve an Insert query with 70 million rows in a stored procedure that sometimes runs longer than usual

I have a stored procedure containing an INSERT INTO ... SELECT statement that dumps around 70 million rows into a temp table by UNIONing multiple tables and views. Normally it takes some 10-12 minutes, but at times the execution time doubles.
It is running on SQL Server 2014 Enterprise Edition with the latest service pack, with 8 cores and 32 GB of memory allocated to the buffer pool. Unfortunately, I'm unable to trace the reason behind the slow execution. No significant waits show up while it runs, but tempdb utilization goes up; however, we have sized tempdb appropriately, with 8 files corresponding to the 8 cores.
Nothing else runs on the server during this batch process. I have the source code of the stored procedure and the actual execution plan, but I'm not sure how I can upload them, as the text exceeds the character limit of the question post. I did try using pastetheplan.com, but the plan was not accepted due to its size (more than 2 MB).
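The original statement could not be posted because of its size, but the pattern described, an INSERT into a temp table fed by a UNION of several sources, looks roughly like the sketch below; every table, view and column name here is a placeholder, not code from the procedure. If the real statement uses UNION rather than UNION ALL, the implicit duplicate-removal sort over 70 million rows is a common cause of variable run times, since that sort can spill to tempdb.

    -- Placeholder sketch of the pattern described above, not the original procedure.
    CREATE TABLE #staging
    (
        id     bigint         NOT NULL,
        amount decimal(18, 2) NULL,
        source varchar(20)    NOT NULL
    );

    -- UNION ALL avoids the sort/hash-distinct that a plain UNION performs to
    -- remove duplicates, which is often the first thing to check when a large
    -- temp-table load is slow or uses a lot of tempdb.
    INSERT INTO #staging (id, amount, source)
    SELECT id, amount, 'TableA' FROM dbo.TableA
    UNION ALL
    SELECT id, amount, 'TableB' FROM dbo.TableB
    UNION ALL
    SELECT id, amount, 'ViewC'  FROM dbo.ViewC;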

How can I improve ClickHouse server's startup time?

I am evaluating ClickHouse's performance for potential use in a project. The write performance has been encouraging so far, but as I was running my tests and had to restart the server a few times, I noticed an issue with the potential of being a hard showstopper: the server startup time fluctuates and is, most of the time, extremely high.
My evaluation server contains 26 databases holding about 54 billion records and taking up 697.32 GB on disk.
With this amount of data I have been getting startup times from as low as 7m35s to almost 3h.
Is this normal? Can it be solved with some fancier configuration? Am I doing something really wrong? Because, as it stands, such a long startup time is a showstopper.
The main cause of slow startup is the gigantic amount of metadata that has to be loaded, which correlates with the number of data files. To reduce startup time, you need to either shrink the file count or add more memory so that all dentry and inode caches can be preserved.
I'd suggest the following:
Adjust your current data partitioning scheme to be coarser (see the sketch below)
Use OPTIMIZE TABLE <table> FINAL to compact all data files
Upgrade the data disk to an SSD or an efficient RAID, or use a file system like btrfs to store metadata separately on fast storage.
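As a hedged illustration of the first two suggestions, assuming a hypothetical MergeTree table (the names and the monthly granularity are placeholders, and this uses the newer PARTITION BY syntax):

    -- A table partitioned by month produces far fewer parts (and far less metadata
    -- to load at startup) than one partitioned by day.
    CREATE TABLE events_by_month
    (
        event_date Date,
        user_id    UInt64,
        payload    String
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMM(event_date)   -- coarser than PARTITION BY event_date
    ORDER BY (event_date, user_id);

    -- Merge the existing parts of a table down to as few as possible per partition.
    -- This is heavy on I/O, so run it during a quiet window.
    OPTIMIZE TABLE events_by_month FINAL;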

How should PostgreSQL be configured for this setup?

I would like to tweak my PostgreSQL server but even after reading a few tutorials online I am not getting good performance out of the database.
I've got a server with the following specs:
Windows Server 2012 R2 Datacenter
Intel CPU E5-2670 v2 @ 2.50 GHz
64-bit Operating System
512 GB RAM
PostgreSQL 9.3
I would like to use postgres as a data storage / aggregation system for the following tasks:
Read data from various data sources, mostly flat files (volumes between 100 GB and 1 TB)
Pre-process / clean data
Aggregate data
Feed aggregated or sampled data into R or python for modelling
Up to 10 concurrent users only
This means I do not really care about the following:
Update speeds (I only bulk-load data)
Failure resistance (in the unlikely event that things break, I can always reload everything from my input files)
Currently, load speeds are fine, but creating indexes and aggregating data takes a very long time and barely uses any memory.
Here is my current postgres.config: http://pastebin.com/KpSi2zSd
I think the obvious step here is to increase work_mem and maintenance_work_mem considerably; the fine detail is how much.
If you have control over how many aggregation queries and/or index creations are running at a time, then you can be pretty aggressive with these settings, but you face the risk that with 10 concurrent users and a 30 GB setting you could put your server under memory pressure.
It would really benefit you to get execution plans for the slow-running queries, as they will show, for example, when a sort spills to disk ("Sort Method: external merge Disk") and therefore needs more memory; you can then adjust your settings while keeping an eye on the total memory usage of the server.
I wouldn't rule out having to re-jig your loads so that the most resource-intensive operations run on their own, while less resource-intensive operations run at the same time.
However, I think at the moment you are lacking some of the hard metrics that would let you make a good choice on memory allocation.
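As a hedged sketch of how those metrics might be gathered, here is what checking a single slow aggregation for disk spills could look like; the query, table and memory values are purely illustrative:

    -- Raise the per-operation sort/hash memory for this session only.
    -- maintenance_work_mem is what CREATE INDEX uses; work_mem is what ordinary
    -- sorts, hashes and aggregates use. Values here are illustrative.
    SET work_mem = '1GB';
    SET maintenance_work_mem = '8GB';

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT some_key, count(*), sum(some_value)
    FROM big_table
    GROUP BY some_key;

    -- In the plan output, look for lines such as:
    --   Sort Method: external merge  Disk: 1048576kB   (spilled to disk)
    --   Sort Method: quicksort  Memory: 204800kB       (fit in work_mem)
    -- If sorts still spill, raise work_mem for that session and re-test.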

PostgreSQL dropping CONSTRAINT on large table extremely slow

We have a database table in PostgreSQL 9.2 running on Windows with approximately 750 million rows in it. We dropped an FK constraint on a column, but the statement has now been running for 48 hours and is still not complete.
The server has 32 GB RAM and 16 CPUs. Unfortunately, we did not increase maintenance_work_mem before running the SQL statement; it is therefore set at 256 MB. The resources on the machine are not even coming close to their maximum: CPU usage is below 3%, 80% of the RAM is free, and disk I/O does not go above 5 MB/s, even though the machine can easily exceed 100 MB/s.
Why does it take this long to drop a FK CONSTRAINT?
Is there a way to increase the performance of this statement execution whilst it is running?
What is the most efficient way of adding a FK CONSTRAINT to a table of this size?
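On the third question, a commonly used two-step pattern is to add the constraint as NOT VALID first and validate it in a separate statement, so the quick catalog change is decoupled from the long full-table check. A sketch with placeholder table and column names:

    -- Step 1: register the constraint without checking existing rows. This is fast
    -- and only needs a brief lock; new and updated rows are checked from now on.
    ALTER TABLE child_table
        ADD CONSTRAINT fk_child_parent
        FOREIGN KEY (parent_id) REFERENCES parent_table (id)
        NOT VALID;

    -- Step 2: check the existing rows in a separate statement during a quiet window.
    -- On 9.2 this still takes a strong lock on the table; later releases relaxed it.
    ALTER TABLE child_table VALIDATE CONSTRAINT fk_child_parent;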

HBase concurrency making it slow

I have 1 master server and 5 region servers, and each server has 200 GB of disk space and 16 GB of RAM. I created a table in HBase which has 10 million records. I am using HBase 0.96 on Hadoop 2.
Table Name - sh_self_profiles
column family - profile
In this table, we have 30 columns in each row.
When I get a single column value from HBase, it takes around 10 ms. My problem is that when I send 100 or more concurrent requests, the time slowly accumulates and increases to more than 400 ms instead of completing in 10 ms. When 100 requests are sent one after another, each one takes only 10 ms.
One thing that you should check is how well distributed your table is.
You can do this by going to the HBase master web console (http://<master-host>:60010), where you will be able to see how many regions you have for your table. If you have not done anything special at table creation, you could easily have only one or two regions, which means that all the requests are being directed to a single region server.
If this is the case, you can recreate your table with pre-split regions (I would suggest a multiple of 5, such as 15 or 20), and make sure that the concurrent gets that you are doing are equally spread over the row-key space.
Also, please check how much RAM you have allocated to the region server - you might need to increase it from the default. If you are not running anything other than the HBase Region Server on those machines, you could probably increase it to 8 GB of RAM.
Other than that, you could also adjust the default for hbase.regionserver.handler.count.
I hope this helps.
Which client are you using? Are you using the standard Java client, the Thrift client, the HTTP REST client, or something else? If your use case is a high amount of random reads of single column values, I highly recommend you try asynchbase as it is much faster than the standard synchronous Java client.
