MemSQL performance issues

I have a single-node MemSQL install with one master aggregator and two leaves (all on a single box). The machine has 2 cores and 16 GB of RAM, and the MemSQL columnstore data is ~7 GB (loaded from a 21 GB CSV). When running queries on the data, memory usage caps at ~2150 MB (with 11 GB sitting free). I've set maximum_memory = 7000 in the memsql.cnf files for both leaves (memsql-optimize does something similar). During query execution, the master aggregator sits at 100% CPU, while the leaves are at 0-8% CPU.
This does not seem like an efficient use of system resources, but I'm not sure how to configure the system or MemSQL to make more efficient use of CPU or memory. Any help would be greatly appreciated!

If your machine is at 100% CPU on all cores during query execution, it doesn't really matter which MemSQL node is using it: your workload throughput is still bottlenecked on CPU. However, for most queries you wouldn't expect most of the CPU use to be on the aggregator, so you may want to look at the EXPLAIN or PROFILE output for your queries.
Columnstore data is cached in memory as part of the OS file cache. It isn't counted as memory reserved by MemSQL, which is why your reported memory usage is less than the size of the columnstore data.
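On Linux, memory held by the OS file cache is reported in the `Cached` field of /proc/meminfo, not in MemSQL's own memory accounting, so that is where cached columnstore data should show up. A minimal sketch, assuming a Linux host; the helper names are hypothetical and not part of MemSQL:

```python
# Sketch (Linux only): read the OS page-cache size from /proc/meminfo.
# Helper names are hypothetical; /proc/meminfo reports most fields in kB.

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict mapping field name -> integer value."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            # First token is the number; a trailing "kB" unit (if any) is ignored.
            fields[name.strip()] = int(parts[0])
    return fields


def page_cache_mb(text):
    """Size of the page cache (the 'Cached' field) in MB."""
    return parse_meminfo(text).get("Cached", 0) / 1024
```

To use it on the MemSQL host, read /proc/meminfo into a string and pass it to page_cache_mb while a query is running.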

My database was apparently being served from somewhere other than the current MemSQL install (perhaps an older cluster configuration), even though there was only a single MemSQL cluster on the machine. The Databases section of the Web UI showed no databases or tables, yet my queries succeeded and returned the expected answers.
Dropping the database and reloading it from CSV fixed the situation. All cores are now used during queries.

Related

Nifi memory continues to expand

I am running a three-node NiFi cluster (NiFi 1.16.3). Each node has 8 cores, 32 GB of memory and a 2 TB solid-state disk. The OS is CentOS 7.9 on ARM64 hardware.
The initial NiFi heap configuration is -Xms12g and -Xmx12g (in bootstrap.conf).
It is a native installation without Docker, only NiFi is installed on these machines, and it uses the embedded ZooKeeper.
I run 20 workflows every day from 00:00 to 03:00, with a total data size of 1.2 GB. They collect CSV files and load them into a Greenplum database.
My problem is that NiFi's memory usage increases every day, by about 0.2 GB per day, on all three nodes. The memory slowly fills up and then the machine dies. This takes about a month (with the memory set to 12 GB).
That means I need to restart the cluster every month. I only use the built-in processors and workflows.
I can't locate the problem. Can anyone help?
If I have left out any details, please feel free to let me know. Thanks.
I have made the following attempts:
I set the heap to 18 GB or 6 GB, and the workflow processing speed did not change. The only difference is that with 18 GB, it freezes for a shorter time.
I was using OpenJRE 1.8 and tried upgrading to 11, but it didn't help.
I also added the following configuration, which didn't help either:
java.arg.7=-XX:ReservedCodeCacheSize=256m
java.arg.8=-XX:CodeCacheMinimumFreeSpace=10m
java.arg.9=-XX:+UseCodeCacheFlushing
The daily scheduled tasks consume few resources. Even with the memory set to 6 GB and all 20 tasks running at the same time, memory consumption is about 30%, and they finish within half an hour.
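A steady growth of about 0.2 GB per day points to a slow, roughly linear increase rather than load spikes. One way to quantify that from daily memory readings is a least-squares slope and a projection to a memory limit. This is a hypothetical helper, not part of NiFi, and the sample values in the test are illustrative, not measurements from this cluster:

```python
# Hypothetical helper: estimate a linear memory-growth rate from daily samples
# and project when a limit would be reached. Not part of NiFi.

def growth_per_day(samples_gb):
    """Least-squares slope (GB/day) of memory samples taken once per day."""
    n = len(samples_gb)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2                # days are 0, 1, ..., n-1
    mean_y = sum(samples_gb) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples_gb))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


def days_until_limit(samples_gb, limit_gb):
    """Days after the last sample until the trend reaches limit_gb (None if not growing)."""
    slope = growth_per_day(samples_gb)
    if slope <= 0:
        return None
    return (limit_gb - samples_gb[-1]) / slope
```

Comparing the slope of the JVM heap against the slope of the whole process's memory could help show whether the growth is inside or outside the heap.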

AWS ElasticSearch Java Process Limit

AWS documentation makes clear the following:
Java Process Limit
Amazon ES limits Java processes to a heap size of 32 GB. Advanced users can specify the percentage of the heap used for field data. For more information, see Configuring Advanced Options and JVM OutOfMemoryError.
Elasticsearch instance types go right up to 500 GB of memory, so my question (as a Java/JVM amateur) is: how many Java processes does Elasticsearch run? I assume a 500 GB Elasticsearch instance (r4.16xlarge.elasticsearch) somehow makes use of more than 32 GB plus any host system overhead?
Elasticsearch uses one Java process (per node).
As quoted, it is advised not to go above a 32 GB heap for efficiency reasons: beyond roughly 32 GB the JVM can no longer use compressed object pointers and has to use full 64-bit pointers, which reduces performance.
Another recommendation is to keep memory free for the file system cache, which Lucene relies on heavily to load doc values and other index data from disk into memory.
Depending on your workload, it may be better to run multiple VMs on a single 500 GB server. You would be better off using 64-128 GB VMs, each split between 31 GB for Elasticsearch and the rest for the file system cache.
Running multiple VMs on a server means that each VM is its own Elasticsearch node.
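The 32 GB figure comes from compressed ordinary object pointers. With the JVM's default 8-byte object alignment, a 32-bit reference can address 2^32 × 8 bytes = 32 GiB. In practice the usable cutoff is slightly lower, which is why 31 GB is suggested. The Python sketch below works through that arithmetic and the VM split described above. The 50%-of-RAM cap on the heap is an assumption taken from Elastic's general heap-sizing guidance, and the VM sizes are just the examples from the answer:

```python
# Sketch: why the heap cap sits just under 32 GB, and how a large host might be
# split into several Elasticsearch VMs. VM sizes are the answer's examples.

GIB = 1024 ** 3

# Compressed oops: a 32-bit reference scaled by the default 8-byte object alignment.
COMPRESSED_OOPS_LIMIT = 2 ** 32 * 8  # bytes


def split_vm(vm_ram_gb, heap_cap_gb=31):
    """Return (heap_gb, fs_cache_gb): heap capped at heap_cap_gb and at half the VM's RAM."""
    heap = min(heap_cap_gb, vm_ram_gb / 2)
    return heap, vm_ram_gb - heap


def vms_per_host(host_ram_gb, vm_ram_gb):
    """How many whole VMs of vm_ram_gb fit on the host (ignoring hypervisor overhead)."""
    return host_ram_gb // vm_ram_gb
```

For example, a 64 GB VM gets a 31 GB heap and 33 GB of file system cache, and seven such VMs fit in 500 GB before any host overhead.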

Spark SQL performance with Simple Scans

I am using Spark 1.4 on a cluster (standalone mode) across 3 machines, for a workload similar to TPC-H (analytical queries with multiple, multi-way large joins and aggregations). Each machine has 12 GB of memory and 4 cores. My total data size is 150 GB, stored in HDFS as Hive tables, and I am running my queries through Spark SQL using a HiveContext.
After reading the performance tuning documents on the Spark website and some talks from the latest Spark Summit, I decided to set the following configs in my spark-env:
SPARK_WORKER_INSTANCES=4
SPARK_WORKER_CORES=1
SPARK_WORKER_MEMORY=2500M
(My tasks tend to be long, so the overhead of starting multiple JVMs, one per worker, is much smaller than the total query time.) While monitoring job progress, I noticed that although the worker memory is 2.5 GB, the executors (one per worker) have a maximum memory of 512 MB (the default). I increased this value in my application:
conf.set("spark.executor.memory", "2.5g");
I was trying to give all of each worker's available memory to its only executor, but I observed that my queries ran slower than before (with the default 512 MB). Changing 2.5g to 1g improved performance, bringing it close to, but still worse than, the 512 MB case. I guess what I am missing is the relationship between SPARK_WORKER_MEMORY and spark.executor.memory.
Isn't it the case that the worker splits this memory among its executors (in my case, its only executor)? Or does the worker do other things that need memory?
What other important parameters should I look into and tune at this point to get the best response time out of my hardware? (I have read about the Kryo serializer and am about to try it; I am mainly concerned about memory-related settings and the knobs that control the parallelism of my jobs.) As an example, for a simple scan-only query, Spark is worse than Hive (almost 3 times slower) while both are scanning the exact same table and file format. That is why I believe I am missing some parameters by leaving them at their defaults.
Any hint/suggestion would be highly appreciated.
SPARK_WORKER_CORES is shared across the instances. Increase the cores to, say, 8; then you should see the kind of behavior (and performance) that you had anticipated.
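The memory figures in the question can be checked with simple arithmetic. If `2.5g` is read as 2.5 × 1024 MB = 2560 MB, it is larger than the 2500 MB set in SPARK_WORKER_MEMORY, so the requested executor would not fit inside one worker as configured. It may be worth checking how the workers handled that request. The helper below is hypothetical (not a Spark API) and assumes binary units:

```python
# Hypothetical helper, not part of Spark: checks whether a requested executor memory
# fits inside one standalone worker's memory. ASSUMPTION: binary units (1g = 1024m).

UNITS_MB = {"k": 1 / 1024, "m": 1, "g": 1024, "t": 1024 * 1024}


def to_mb(value):
    """Convert strings like '2500M', '512m' or '2.5g' to megabytes."""
    value = value.strip().lower()
    return float(value[:-1]) * UNITS_MB[value[-1]]


def executor_fits(executor_memory, worker_memory):
    """True if one executor of this size fits in the worker's memory."""
    return to_mb(executor_memory) <= to_mb(worker_memory)


if __name__ == "__main__":
    # Values from the question: 4 workers per machine, 2500M each, 12 GB machines.
    workers_per_machine = 4
    reserved = workers_per_machine * to_mb("2500M")
    print(f"worker memory per machine: {reserved:.0f} MB of {to_mb('12g'):.0f} MB")
    for request in ("512m", "1g", "2.5g"):
        verdict = "fits" if executor_fits(request, "2500M") else "exceeds SPARK_WORKER_MEMORY"
        print(request, verdict)
```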

Cassandra - Optimizing hardware in cluster

I have been able to get Cassandra working on a MacBook cluster (for fun). Now I am trying to put it into operation for research.
Currently, I have a single Linux machine running an Intel 3770K (LGA 1150). I would like to create a cluster for running Cassandra. Can I use cheap machines (2-3 nodes, each with an Intel i5, a 4 TB HDD and 8 GB of RAM)? What is the best configuration to get this right the first time?
Is it possible to use the new nodes to run Cassandra and have the current machine only use the data for analysis?
8 GB of RAM is pretty low. I'd recommend a minimum of 16 GB (the more the better) so you can safely allocate an 8 GB heap while leaving room for the off-heap structures. Especially if you want to store multiple TB of data on it, you want more than 8 GB. Some data models are worse than others. If you use non-SSD drives, be sure to have a dedicated drive for the commitlog so it's not competing with data. It will work with what you listed, but you won't get good performance once there's a decent amount of data.
http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAbout_c.html
You can create multiple data centers to separate your different workloads. The DSE workload snitch will do that for you if you are using DataStax Enterprise.
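For context on the 8 GB vs 16 GB advice, the Python sketch below encodes the default maximum-heap heuristic that, to my knowledge, is documented in the comments of cassandra-env.sh in the 2.x era: max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB)). Treat the formula as an assumption to verify against your Cassandra version. The function names are hypothetical:

```python
# Sketch of the default heap heuristic from cassandra-env.sh comments (assumed 2.x):
#   max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB))

def default_max_heap_mb(system_ram_mb):
    """Default maximum heap (MB) the heuristic would pick for this much RAM."""
    half = min(system_ram_mb // 2, 1024)      # 1/2 of RAM, capped at 1 GB
    quarter = min(system_ram_mb // 4, 8192)   # 1/4 of RAM, capped at 8 GB
    return max(half, quarter)


def leftover_mb(system_ram_mb, heap_mb):
    """RAM left for off-heap structures and the OS page cache."""
    return system_ram_mb - heap_mb
```

Under this heuristic, an 8 GB node defaults to a 2 GB heap. The manually chosen 8 GB heap suggested in the answer leaves 8 GB for off-heap use and the page cache only on a 16 GB machine.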

Would HBase/HDFS deployment make sense with 100mbit/s network interfaces?

I guess that a 100 Mbit/s network interface will be a bottleneck for HDFS and will slow down HBase on top of it (max compaction speed of about 10 MB/s, etc.). Would this deployment make sense?
I am thinking that now that SSDs are coming into play, even 1 Gbit/s network interfaces can still be a bottleneck, so maybe a cluster with 100 Mbit/s should never be considered (even with HDDs)?
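The numbers in the question can be checked with simple arithmetic. 100 Mbit/s is 12.5 MB/s before protocol overhead, which is consistent with the ~10 MB/s estimate. HDFS's default replication factor is 3. With the default block placement, the first replica goes on the writer's own node when the writer runs on a DataNode, so roughly two further copies cross the network. A Python sketch that ignores protocol overhead, so real throughput will be lower:

```python
# Back-of-the-envelope network arithmetic for the 100 Mbit/s question.
# Ignores protocol overhead, so real throughput will be lower.

def mbit_to_mbyte_per_s(mbit_per_s):
    """Convert a link speed in Mbit/s to MB/s (8 bits per byte)."""
    return mbit_per_s / 8


def transfer_seconds(data_mb, link_mbit_per_s):
    """Time to push data_mb over a single link at full line rate."""
    return data_mb / mbit_to_mbyte_per_s(link_mbit_per_s)


def replication_network_mb(data_mb, replication=3, local_first_replica=True):
    """Total MB crossing the network to store data_mb with the given replication factor."""
    copies_over_network = replication - 1 if local_first_replica else replication
    return data_mb * copies_over_network
```

For example, moving 1000 MB takes about 80 s on a 100 Mbit/s link versus about 8 s on 1 Gbit/s. Storing it with replication 3 puts about 2000 MB on the cluster network.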
To keep it short:
You should never use SSDs in HDFS: flash memory has a limited number of write cycles, and HDFS does many writes, mainly because of replication. If you are using HBase as a NoSQL DB, this will result in even more writes.
As you said, the bottlenecks are the hard disk and the network. The network is an even bigger bottleneck because you are distributing the data: it has to be replicated, and when you run jobs, data may have to be copied if it is not locally available (reducers have to copy a lot of data).
So you should definitely go for a better network than 10 Mbit or 100 Mbit. That applies to your switch and the NICs on the nodes.
An HDD RAID will not give you higher write bandwidth; several benchmarks have shown that. Have a look at the HDFS wiki, it should be described there.
A 100 Mbit network is not likely to be a good setup for a Hadoop cluster; see Cisco's presentation from Hadoop World for some analysis of network usage. That said, depending on your actual load and cluster size, it might be workable, though in that case you might want to make sure you actually need Hadoop.
Regarding SSDs: they cost more per MB and, depending on your write load, you may have to replace them sooner than HDDs, but they will save you electricity. I guess it wouldn't be cost-effective to use them in a large cluster (I don't know of anyone who has).
You can use SSDs for some of the disks, e.g. for temporary space on the cluster (such as map/reduce intermediate results), to get the I/O benefits.
Whether or not your network will be the bottleneck depends on the kinds of jobs you are running. If you do text processing (e.g. running Stanford NER or coreference suite), then a 100Mbit/s network will be the least of your concerns. However, if you are doing a lot of I/O intensive processing (most jobs with big reduce steps), then it will be. As always, it depends on your workload. But, I think it is safe to say that a 100Mb network is the most likely culprit for a bottleneck given recent processors and nodes with several disks.
