How to improve write performance in DolphinDB?

I am performing various computations on DolphinDB, and the speed is good. But I have hit a roadblock with write performance: my data has thousands of columns, and it takes a few seconds to write 100 records. How can I improve this?

Have you tried the cache engine in DolphinDB? It should work for your case. To enable the cache engine, add one line to the configuration file as follows:
// in units of GB
chunkCacheEngineMemSize = 2
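
Once the cache engine is enabled, batching many records into a single append also helps, since wide tables pay a fixed per-write cost. A minimal sketch using DolphinDB's Java API, assuming a DFS database dfs://mydb with a table mytable (both hypothetical names) and default admin credentials:

import com.xxdb.DBConnection;

public class DolphinDBBatchWrite {
    public static void main(String[] args) throws Exception {
        DBConnection conn = new DBConnection();
        // Placeholder host/port/credentials; adjust for your deployment.
        conn.connect("localhost", 8848, "admin", "123456");
        // Build a batch of 100 rows server-side and append them in one call,
        // instead of issuing 100 separate writes; with the cache engine on,
        // the flush to disk is batched as well.
        conn.run("t = table(1..100 as id, rand(100.0, 100) as val); " +
                 "loadTable('dfs://mydb', 'mytable').append!(t)");
        conn.close();
    }
}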

Related

How to improve AWS Glue's performance?

I have a simple job on AWS Glue that takes more than 25 minutes. I changed the number of DPUs from 10 to 100 (the max allowed), but the job still takes 13 minutes.
Any other suggestions on improving the performance?
I've noticed the same behavior.
My understanding is that the job time includes spinning up an EMR cluster, which takes several minutes. So if that takes, say, 8 minutes (just a guess), then your effective job time went from 17 minutes to 5.
Unless CPU or memory was a bottleneck for your existing job, adding more DPUs (i.e. more CPU and memory) won't benefit it significantly. At the very least the benefit will not be linear: 10 times more DPUs doesn't mean the job will run 10 times faster.
I suggest that you gradually increase the number of DPUs and watch the performance gains; you will notice that after a certain point adding more DPUs no longer has a major impact, and that is probably the right amount of DPUs for your job.
Can we take a look at your job? Sometimes simple may not be performant. We've found that seemingly simple things like the DynamicFrame.map transformation are really slow, and you might be better off using a temp table and mapping your data with the SQLContext, as in the sketch below.
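
Glue jobs are authored in Python or Scala rather than Java, but the temp-table approach described above is plain Spark SQL underneath. A sketch of that pattern using Spark's Java API, with a hypothetical input path and hypothetical columns (id, name, amount); in Glue you would obtain the DataFrame from a DynamicFrame via toDF() instead of reading directly:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class TempTableMapping {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("temp-table-mapping")
                .getOrCreate();
        // Hypothetical input; in Glue this DataFrame would come from a
        // DynamicFrame via toDF() rather than a direct read.
        Dataset<Row> input = spark.read().parquet("s3://my-bucket/input/");
        // Register a temp view and express the row mapping as SQL, which the
        // optimizer can plan, instead of an opaque per-record map function.
        input.createOrReplaceTempView("tmp_input");
        Dataset<Row> mapped = spark.sql(
                "SELECT id, upper(name) AS name, amount * 1.1 AS adjusted FROM tmp_input");
        mapped.write().parquet("s3://my-bucket/output/");
        spark.stop();
    }
}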

What are the criteria to determine that the web server can handle load using JMeter? [closed]

I have created a test plan using 100 threads. How can we conclude that the web server can handle the load? Which factors should be taken into account for the load test?
I personally think you need to define your own metrics for your test plan in order to call a load test a pass.
Typical metrics I would use:
Each response should come back in less than 250 ms (adjust to what your customer would expect).
All responses should come back with a non-error response code.
The server should be in a 'good state' after the load test (check memory, threads, database connection leaks, etc.).
Too many resources being consumed is also a bad sign: database connections, memory, hard disk for log files. Define your own metrics here.
Successive 'soak tests' to complement your load tests would also be a good idea.
Basically, run a smaller set of JMeter tests every two hours (so the DBAs etc. don't complain) over the weekend and check on Monday. A sketch for checking the first two criteria against a JMeter results file follows below.
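
For the response-time and error criteria, the JMeter results file (.jtl in CSV format) can be checked programmatically. A minimal sketch, assuming the default CSV output with a header row and the standard elapsed and success columns; note the naive comma split will break if a field itself contains commas:

import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class JtlCheck {
    public static void main(String[] args) throws Exception {
        long total = 0, slow = 0, errors = 0;
        try (BufferedReader reader = Files.newBufferedReader(Paths.get("results.jtl"))) {
            // First line of a CSV-format .jtl is the header row by default.
            List<String> header = Arrays.asList(reader.readLine().split(","));
            int elapsed = header.indexOf("elapsed");   // response time in ms
            int success = header.indexOf("success");   // "true" or "false"
            String line;
            while ((line = reader.readLine()) != null) {
                String[] f = line.split(",");          // naive split; breaks on quoted commas
                total++;
                if (Long.parseLong(f[elapsed]) > 250) slow++;
                if (!"true".equals(f[success])) errors++;
            }
        }
        System.out.printf("%d samples, %d over 250 ms, %d errors%n", total, slow, errors);
    }
}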
I would recommend that you first clarify your concepts around performance testing and its types (load test, stress test, soak test, etc.). You can refer to the following resources for a basic understanding of performance testing and its types:
Load vs. Stress testing
http://www.testerlogic.com/performance-testing-types-concepts-issues/
Once you have a better understanding of the concepts, you will be in a better position to ask the right question. For now, you can focus on the following points:
What is the expected load on your web server (in normal and extreme scenarios)?
What are your acceptance criteria for response time, load time, etc.?
Once you know these numbers, you can create a JMeter test which runs for a specific time span (say 1 hour) in which the number of threads increases step by step (100 users in the first 10 minutes, 200 users from 10-20 minutes, 300 users from 20-30 minutes, and so on). Hint: you can use the ramp-up period to achieve this scenario.
Perform these tests, check the reports, and compare the response time and other performance factors during the first 10 minutes (when the load was 100 users) against the last 10 minutes, when the load was at its maximum.
This is just to give you a high-level idea. As I said before, it will be better if you first clarify the basic performance-testing concepts and then design and perform the actual testing.
As rjdkolb said, you have to define your metrics; check what you require from your service/app.
It all depends on the service you are working with: do you have a stable load on the server or peaks? Do you expect around 100 users online or 10,000 at once? Do you need fast answers, or just correct answers in a reasonable time? Maybe the business foresees that the load will build gradually over the next year, starting with just 100 requests per minute and finishing with 1,000 per second?
If, as mentioned in the other answer, you need an answer in less than 250 ms, then gradually increase the load to check how many users/requests you can handle while still responding in time. And maybe you need answers for 1,000 users working simultaneously: then generate that load and check whether they get their answers and how fast the responses come back. A lot to think about, isn't there?
Try to read a bit about the types of performance testing, for example here on SoapUI or this explanation of some metrics. Many articles on the internet can guide you on your way.
Have fun.

Spring Batch Performance Improvement

I am writing a Spring Batch job which needs to read from a database table, process the data (reading from more database tables along the way), and finally write to a database. The performance of the job needs to be improved so that 10 files are written every second.
I followed this post and managed to gain some performance by using multi-threaded steps.
But the desired performance goal still cannot be met. Can anyone guide me on how to get more throughput out of Spring Batch?
Your performance depends on a lot of factors.
For example:
What does your query look like? Are there any joins or subqueries that could slow down your whole job?
What does your processor do?
Did you use indexed tables (with a dedicated index tablespace on a faster drive)?
Parallel processing, multi-threading, and partitioning are only a small part of your performance gain; a sketch of a multi-threaded step follows below.
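
For reference, a minimal sketch of the multi-threaded step mentioned above, in the Spring Batch 4 style. This is not self-contained: the reader(), processor(), and writer() beans and the InputRecord/OutputRecord types are assumed to exist elsewhere, and the reader must be thread-safe (e.g. a JdbcPagingItemReader):

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

// Sketch only: reader(), processor() and writer() are assumed beans defined
// elsewhere; InputRecord/OutputRecord are placeholder types.
public Step multiThreadedStep(StepBuilderFactory steps) {
    return steps.get("multiThreadedStep")
            .<InputRecord, OutputRecord>chunk(100)   // commit interval; tune for your load
            .reader(reader())                        // must be thread-safe
            .processor(processor())
            .writer(writer())
            .taskExecutor(new SimpleAsyncTaskExecutor("batch-"))
            .throttleLimit(8)                        // max concurrent chunk workers
            .build();
}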

Cassandra integration with hadoop for read performance

I am using Apache Cassandra to store around 100 million records. There is a single node with the following specifications:
RAM 32 GB, HDD 2 TB, Intel quad-core processor.
With Cassandra there is a read performance problem: some queries take around 40 minutes to return output. Searching for how to improve read performance, I came to know about the following factors:
compaction strategy, compression techniques, key cache, increasing the heap space, and turning off swap space for Cassandra.
After applying these optimizations, the performance remains the same. Searching further, I came across integrating Hadoop with Cassandra. Is that the correct way to run queries in Cassandra, or are there other factors I am missing?
Thanks.
It looks like your data model could be improved; 40 minutes is unreasonable. I download all the data from 6 million records (around 10 GB) within a few minutes, and that includes converting the data while downloading and storing it. Trivial selects should take milliseconds.
Did you build your model based on the queries that you must run? A sketch of that query-first approach follows below.
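
As an illustration of query-first modeling, a minimal sketch using the DataStax Java driver; the keyspace, table, and column names are all hypothetical:

import com.datastax.oss.driver.api.core.CqlSession;

public class QueryFirstModel {
    public static void main(String[] args) {
        // Connects to localhost:9042 by default; "demo" is a placeholder keyspace.
        try (CqlSession session = CqlSession.builder().withKeyspace("demo").build()) {
            // Partition by the column the query filters on, so a read touches
            // one partition instead of scanning the whole table.
            session.execute("CREATE TABLE IF NOT EXISTS events_by_user ("
                    + "user_id text, event_time timestamp, payload text, "
                    + "PRIMARY KEY (user_id, event_time))");
            // Single-partition read: this is the shape of query Cassandra
            // answers in milliseconds, even with 100 million rows overall.
            session.execute("SELECT payload FROM events_by_user "
                    + "WHERE user_id = 'u42' LIMIT 100");
        }
    }
}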

Performance issue for batch insertion into marklogic

I have a requirement to insert 10,000 docs into MarkLogic in less than 10 seconds.
I tested on a single-node MarkLogic server in the following way:
use xdmp:spawn to pass each doc-insertion task to the task server;
use xdmp:document-insert without specifying a forest explicitly;
the task server has 8 threads to process tasks;
we have enabled CPF.
The performance is very bad: it took 2 minutes to finish creating the 10,000 docs.
I'm sure the performance would be better in a cluster environment, but I'm not sure whether it could finish in less than 10 seconds.
Please advise on ways to improve the performance.
I would start by gathering more information. What version of MarkLogic is this? What OS is it running on? What's the CPU? RAM? What's the storage subsystem? How many forests are attached to the database?
Then gather OS-level metrics, to see if one of the subsystems is an obvious bottleneck. For now I won't speculate beyond that.
If you need a fast load, I wouldn't use xdmp:spawn for each individual document, nor use CPF; a batched alternative is sketched below. But 2 minutes for 10k docs doesn't necessarily sound slow. On the other hand, I have reached up to 3k docs/sec, but without range indexes, transforms, or the like, and with a very fast disk (e.g. SSD).
HTH!
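
A minimal sketch of that batched approach using the Data Movement SDK from the MarkLogic Java Client API (available in MarkLogic 9 and later); host, port, credentials, and URIs are placeholders:

import com.marklogic.client.DatabaseClient;
import com.marklogic.client.DatabaseClientFactory;
import com.marklogic.client.datamovement.DataMovementManager;
import com.marklogic.client.datamovement.WriteBatcher;
import com.marklogic.client.io.Format;
import com.marklogic.client.io.StringHandle;

public class BulkLoad {
    public static void main(String[] args) {
        // Placeholder connection details; adjust host/port/credentials.
        DatabaseClient client = DatabaseClientFactory.newClient("localhost", 8000,
                new DatabaseClientFactory.DigestAuthContext("admin", "admin"));
        DataMovementManager dmm = client.newDataMovementManager();
        WriteBatcher batcher = dmm.newWriteBatcher()
                .withBatchSize(100)     // docs per transaction, not one txn per doc
                .withThreadCount(8);    // parallel writer threads
        dmm.startJob(batcher);
        for (int i = 0; i < 10000; i++) {
            batcher.add("/load/doc-" + i + ".json",
                    new StringHandle("{\"n\":" + i + "}").withFormat(Format.JSON));
        }
        batcher.flushAndWait();         // drain anything still queued
        dmm.stopJob(batcher);
        client.release();
    }
}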
Assuming a 2-socket server, 128-256 GB of RAM, and fast I/O (400-800 MB/sec sustained):
an appropriate number of forests (12 primary, or 6 primary/6 secondary);
more than 8 threads, assuming enough cores;
CPF off.
Turn on performance history, look at the metrics, and you will see where the bottleneck is.
SSD is not required, just I/O throughput, which multiple spinning disks provide without issue.
