Apache Pinot server component consumes an unexpected amount of memory - apache-pinot

Problem Description:
Docker is used to deploy Apache Pinot on production servers (VMs).
Pinot's official documentation has been followed for this purpose.
What has been done
The Pinot servers consume more memory than our data size and replication factor would suggest.
The following things have been tried:
Defining Xms and Xmx flags for the JVM in the JAVA_OPTS environment variable
Setting up monitoring on the machines to gain observability (see the sketch after this list)
Removing indices (such as the inverted index) from the table definition
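As a small aside on the monitoring point: comparing the JVM heap ceiling (-Xmx20G) with the container's actual resident memory is what makes the off-heap/mmap usage visible. A minimal sketch using the Docker SDK for Python, assuming the container name from the compose file below:

import docker

# Compare the container's resident memory against its limit.
# 'server1' is the container_name from the docker-compose file below.
client = docker.from_env()
stats = client.containers.get('server1').stats(stream=False)

usage_gb = stats['memory_stats']['usage'] / (1024 ** 3)
limit_gb = stats['memory_stats']['limit'] / (1024 ** 3)
print(f"resident: {usage_gb:.1f} GiB / limit: {limit_gb:.1f} GiB")
# With -Xmx20G, memory well above ~20 GiB here is off-heap (mmapped segments, direct buffers).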
System Specification:
We have 3 servers, 2 controllers, and 2 brokers, each with the following specifications:
24 core CPU
64 gigabytes of Memory
738 gigabytes of SSD disk
Sample Docker-compose file on one of the servers:
version: '3.7'
services:
  pinot-server:
    image: apachepinot/pinot:0.11.0
    command: "StartServer -clusterName bigdata-pinot-ansible -zkAddress 172.16.24.14:2181,172.16.24.15:2181 -configFileName /server.conf"
    restart: unless-stopped
    hostname: server1
    container_name: server1
    ports:
      - "8096-8099:8096-8099"
      - "9000:9000"
      - "8008:8008"
    environment:
      JAVA_OPTS: "-Dplugins.dir=/opt/pinot/plugins -Xms4G -Xmx20G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-server.log -javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8008:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml"
    volumes:
      - ./server.conf:/server.conf
      - ./data/server_data/segment:/var/pinot/server/data/segment
      - ./data/server_data/index:/var/pinot/server/data/index
Table config:
{
  "tableName": "<table-name>",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "schemaName": "<schema-name>",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "60",
    "replication": "3",
    "timeColumnName": "date",
    "allowNullTimeValue": false,
    "replicasPerPartition": "3",
    "segmentPushType": "APPEND",
    "completionConfig": {
      "completionMode": "DOWNLOAD"
    }
  },
  "tenants": {
    "broker": "DefaultTenant",
    "server": "DefaultTenant",
    "tagOverrideConfig": {
      "realtimeCompleted": "DefaultTenant_OFFLINE"
    }
  },
  "tableIndexConfig": {
    "noDictionaryColumns": [
      <some-fields>
    ],
    "rangeIndexColumns": [
      <some-fields>
    ],
    "rangeIndexVersion": 1,
    "autoGeneratedInvertedIndex": false,
    "createInvertedIndexDuringSegmentGeneration": false,
    "sortedColumn": [
      "date",
      "id"
    ],
    "bloomFilterColumns": [],
    "loadMode": "MMAP",
    "onHeapDictionaryColumns": [],
    "varLengthDictionaryColumns": [],
    "enableDefaultStarTree": false,
    "enableDynamicStarTreeCreation": false,
    "aggregateMetrics": false,
    "nullHandlingEnabled": false
  },
  "metadata": {},
  "routing": {
    "instanceSelectorType": "strictReplicaGroup"
  },
  "query": {},
  "fieldConfigList": [],
  "upsertConfig": {
    "mode": "FULL",
    "hashFunction": "NONE"
  },
  "ingestionConfig": {
    "streamIngestionConfig": {
      "streamConfigMaps": [
        {
          "streamType": "kafka",
          "stream.kafka.topic.name": "<topic-name>",
          "stream.kafka.broker.list": "<kafka-brokers-list>",
          "stream.kafka.consumer.type": "lowlevel",
          "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
          "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
          "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
          "stream.kafka.decoder.prop.format": "JSON",
          "realtime.segment.flush.threshold.rows": "0",
          "realtime.segment.flush.threshold.time": "1h",
          "realtime.segment.flush.segment.size": "300M"
        }
      ]
    }
  },
  "isDimTable": false
}
server.conf file:
pinot.server.netty.port=8098
pinot.server.adminapi.port=8097
pinot.server.instance.dataDir=/var/pinot/server/data/index
pinot.server.instance.segmentTarDir=/var/pinot/server/data/segment
pinot.set.instance.id.to.hostname=true
After ingesting data from the real-time stream (Kafka in our case), memory usage keeps growing and the containers eventually get OOMKilled.
We have no clue what is happening on the servers. Could someone help us find the root cause of this problem?
P.S. 1: To follow the complete process of how Pinot is deployed, see this repository on GitHub.
P.S. 2: It is known that the size of data in Pinot can be estimated using the following formula:
Data size = daily data size * retention period * replication factor
For example, if we have a retention of 2 days, approximately 2 gigabytes of data per day, and a replication factor of 3, the total data size is about 2 * 2 * 3 = 12 gigabytes.
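As a quick sketch of that formula, using the example numbers above (not our real table sizes):

# Rough on-disk data size estimate for a Pinot table (example values from above).
daily_data_gb = 2          # approximate raw data ingested per day, in GB
retention_days = 2         # segment retention period in days
replication_factor = 3     # "replication" / "replicasPerPartition" in the table config

total_gb = daily_data_gb * retention_days * replication_factor
print(f"Estimated total data size: {total_gb} GB")  # 2 * 2 * 3 = 12 GB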

As described in the question, the problem lies in the table configuration, not in Apache Pinot itself. Apache Pinot keeps the primary keys for the upsert feature on the JVM heap, so heap usage grows with the number of unique keys ingested. To scale this, the number of Kafka partitions needs to be increased, since the upsert metadata is partitioned and spread across consuming servers per Kafka partition.
According to the documentation, the default upsert mode is NONE.
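To make that concrete, here is a rough sketch of why heap grows once upsertConfig.mode is FULL. The bytes-per-key figure is purely an assumption for illustration (the real overhead depends on the key type and Pinot version), not a number from the Pinot docs:

# Very rough estimate of the on-heap map Pinot keeps for upsert primary keys.
# ASSUMPTION: ~150 bytes of heap per tracked primary key (key bytes plus record
# location bookkeeping). Take a heap dump for real numbers.
BYTES_PER_KEY = 150

def upsert_heap_estimate_gb(unique_keys_hosted: int) -> float:
    """Rough heap needed on one server to track upsert metadata for the keys it hosts."""
    return unique_keys_hosted * BYTES_PER_KEY / (1024 ** 3)

print(f"{upsert_heap_estimate_gb(500_000_000):.1f} GB")  # ~70 GB for 500M keys, far beyond -Xmx20G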

Related

How do I change root volume size of AWS Batch at runtime

I have an application which makes requests to AWS to start batch jobs. The jobs vary, and therefore resource requirements change for each job.
It is clear how to change CPUs and memory; however, I cannot figure out how to specify the root volume size, or whether it is even possible.
Here is an example of the code I am running:
import boto3
client = boto3.client('batch')
JOB_QUEUE = "job-queue"
JOB_DEFINITION="job-definition"
container_overrides = {
    'vcpus': 1,
    'memory': 1024,
    'command': ['echo', 'Hello World'],
    # 'volume_size': 50  # this is not valid
    'environment': [  # this just creates env variables
        {
            'name': 'volume_size',
            'value': '50'
        }
    ]
}

response = client.submit_job(
    jobName="volume-size-test",
    jobQueue=JOB_QUEUE,
    jobDefinition=JOB_DEFINITION,
    containerOverrides=container_overrides)
My question is similar to this one. However, I am specifically asking whether this is possible at runtime. I can change the launch template, but that doesn't solve the issue of being able to specify the required resources when making the request. Unless the solution is to create multiple launch templates and then select one at run time, though that seems unnecessarily complicated.
You can use AWS Elastic File System for this. EFS volumes can be mounted into the containers created for your job definition. EFS doesn't require you to provide a specific volume size because it automatically grows and shrinks depending on usage.
You need to specify an Amazon EFS file system in your job definition through the efsVolumeConfiguration property:
{
  "containerProperties": [
    {
      "image": "amazonlinux:2",
      "command": [
        "ls",
        "-la",
        "/mount/efs"
      ],
      "mountPoints": [
        {
          "sourceVolume": "myEfsVolume",
          "containerPath": "/mount/efs",
          "readOnly": true
        }
      ],
      "volumes": [
        {
          "name": "myEfsVolume",
          "efsVolumeConfiguration": {
            "fileSystemId": "fs-12345678",
            "rootDirectory": "/path/to/my/data",
            "transitEncryption": "ENABLED",
            "transitEncryptionPort": integer,
            "authorizationConfig": {
              "accessPointId": "fsap-1234567890abcdef1",
              "iam": "ENABLED"
            }
          }
        }
      ]
    }
  ]
}
Reference: https://docs.aws.amazon.com/batch/latest/userguide/efs-volumes.html
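If you would rather do this from Python than raw JSON, here is a minimal boto3 sketch mirroring the JSON above; the job definition name and the file system / access point IDs are placeholders:

import boto3

batch = boto3.client('batch')

# Register a job definition whose container mounts an EFS volume.
# fs-12345678 / fsap-1234567890abcdef1 are placeholder IDs.
response = batch.register_job_definition(
    jobDefinitionName='efs-volume-test',
    type='container',
    containerProperties={
        'image': 'amazonlinux:2',
        'command': ['ls', '-la', '/mount/efs'],
        'resourceRequirements': [
            {'type': 'VCPU', 'value': '1'},
            {'type': 'MEMORY', 'value': '1024'},
        ],
        'mountPoints': [
            {'sourceVolume': 'myEfsVolume', 'containerPath': '/mount/efs', 'readOnly': True},
        ],
        'volumes': [
            {
                'name': 'myEfsVolume',
                'efsVolumeConfiguration': {
                    'fileSystemId': 'fs-12345678',
                    'rootDirectory': '/path/to/my/data',
                    'transitEncryption': 'ENABLED',
                    'authorizationConfig': {
                        'accessPointId': 'fsap-1234567890abcdef1',
                        'iam': 'ENABLED',
                    },
                },
            },
        ],
    },
)
print(response['jobDefinitionArn'])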

How to create Azure Databricks Job cluster to save some costing compared to Standard cluster?

I have a few pipeline jobs on Azure Databricks that run ETL solutions using standard or high concurrency clusters.
I've noticed on the Azure pricing page that a job cluster is a cheaper option that should do the same thing. https://azure.microsoft.com/en-gb/pricing/calculator/
All Purpose - Standard_DS3_v2: 0.75 DBU × £0.292 per DBU per hour = £0.22 per hour
Job Cluster - Standard_DS3_v2: 0.75 DBU × £0.109 per DBU per hour = £0.08 per hour
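A quick sanity check of those calculator figures (DBU cost per hour for a single Standard_DS3_v2 node; the VM cost itself is the same in both cases and left out):

# DBU cost per hour for one Standard_DS3_v2 node (0.75 DBU/hour), using the
# calculator rates quoted above.
dbu_per_hour = 0.75
rates = {'All Purpose': 0.292, 'Job Cluster': 0.109}  # GBP per DBU-hour

for sku, rate in rates.items():
    print(f"{sku}: {dbu_per_hour * rate:.2f} GBP/hour")
# All Purpose: 0.22 GBP/hour, Job Cluster: 0.08 GBP/hour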
I have configured a job cluster by creating a new job and selecting a new job cluster, as per the tutorial below: https://docs.databricks.com/jobs.html#create-a-job
The job was successful and ran for a couple of days. However, the cost did not really go down. Have I missed anything?
Cluster Config
{
  "autoscale": {
    "min_workers": 2,
    "max_workers": 24
  },
  "cluster_name": "",
  "spark_version": "9.1.x-scala2.12",
  "spark_conf": {
    "spark.databricks.delta.preview.enabled": "true",
    "spark.scheduler.mode": "FAIR",
    "spark.sql.sources.partitionOverwriteMode": "dynamic",
    "spark.databricks.service.server.enabled": "true",
    "spark.databricks.repl.allowedLanguages": "sql,python,r",
    "avro.mapred.ignore.inputs.without.extension": "true",
    "spark.databricks.cluster.profile": "serverless",
    "spark.databricks.service.port": "8787"
  },
  "azure_attributes": {
    "first_on_demand": 1,
    "availability": "ON_DEMAND_AZURE",
    "spot_bid_max_price": -1
  },
  "node_type_id": "Standard_DS3_v2",
  "ssh_public_keys": [],
  "custom_tags": {},
  "spark_env_vars": {
    "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
  },
  "enable_elastic_disk": true,
  "cluster_source": "JOB",
  "init_scripts": []
}
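For reference, this is roughly what creating such a job looks like through the Jobs 2.1 API; a run on the new_cluster defined in the job is what gets billed at the Jobs Compute rate. A minimal sketch, with a placeholder workspace URL, token, and notebook path:

import requests

DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"  # placeholder

job_spec = {
    "name": "etl-job-on-job-cluster",
    "tasks": [
        {
            "task_key": "etl",
            "notebook_task": {"notebook_path": "/Repos/etl/main"},  # placeholder path
            "new_cluster": {  # job cluster: created for the run, billed as Jobs Compute
                "spark_version": "9.1.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "autoscale": {"min_workers": 2, "max_workers": 24},
            },
        }
    ],
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print(resp.json())  # {"job_id": ...}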

How to add "cold data node" to elasticsearch cluster using helm?

I would like to add a COLD data node (NOT a regular data node) to my Elasticsearch cluster using Helm.
My values.yaml:
...
roles:
  master: "false"
  ingest: "false"
  data: "false"
  remote_cluster_client: "false"
  ml: "false"
  data_cold: "true"
...
but when I deploy it, I get this error:
java.lang.IllegalArgumentException: unknown setting [node.data_cold] please check that any required plugins are installed, or check the breaking changes documentation for removed settings
Any ideas, please?
Thank you in advance!
Assuming you're using the Elastic helm charts, I accomplished this by setting the following in my values.yml:
extraEnvs:
  - name: 'node.attr.data'
    value: '{{ ilm_phase }}'
and setting the following in my vars.yml for each individual data tier:
ilm_phase: 'cold' # ...or hot, or whatever...
And then finally, using a custom node attribute in my ILM policy.
It's not ideal, but it works well, even if it's not as nuanced as using node.roles. If someone else has a better method, I'm open to it.
Edit
I forgot that I also added the following template, which applies to all new indices created. This forces all new indices to be created on the hot data nodes.
PUT _template/ilm-set-index-ilm-hot
{
  "order": 127,
  "index_patterns": [ "*" ],
  "settings": {
    "index": {
      "routing": {
        "allocation": {
          "require": {
            "data": "hot"
          }
        }
      }
    }
  },
  "mappings": {},
  "aliases": {}
}
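To double-check that the attribute landed on the right nodes, and to apply the same template without going through Kibana, something like the following works against the cluster's REST API; the host, credentials, and verify=False are placeholder assumptions, and the template body just mirrors the one above:

import requests

ES = "https://elasticsearch.example.com:9200"  # placeholder
AUTH = ("elastic", "<password>")               # placeholder

# 1) Verify each node's custom attribute (should show data=hot / data=cold per tier).
print(requests.get(f"{ES}/_cat/nodeattrs?v&h=node,attr,value", auth=AUTH, verify=False).text)

# 2) Apply the legacy template that pins new indices to the hot tier (same body as above).
template = {
    "order": 127,
    "index_patterns": ["*"],
    "settings": {"index": {"routing": {"allocation": {"require": {"data": "hot"}}}}},
    "mappings": {},
    "aliases": {},
}
r = requests.put(f"{ES}/_template/ilm-set-index-ilm-hot", json=template, auth=AUTH, verify=False)
r.raise_for_status()
print(r.json())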

ECS provisions multiple servers but never runs the task

I have an ECS cluster where the capacity provider is an auto-scaling group of ec2 servers with a Target Tracking scaling policy and Managed Scaling turned on.
The min capacity of the cluster is 0, the max is 100. The instance types it's employing are c5.12xlarge.
I have a task that uses 4 x vCPUs and 4 GiB of memory. When I run a single instance of that task on that cluster, ECS very slowly auto-scales the group to more than 1 server (usually 2 to begin with, then eventually adds a third - I've tried multiple times), but it never actually runs the task, which stays in the PROVISIONING state for ages before I get annoyed and stop it.
Here is a redacted copy of my task description:
{
  "family": "my-task",
  "taskRoleArn": "arn:aws:iam::999999999999:role/My-IAM-Role",
  "executionRoleArn": "arn:aws:iam::999999999999:role/ecsTaskExecutionRole",
  "cpu": "4 vCPU",
  "memory": 4096,
  "containerDefinitions": [
    {
      "name": "my-task",
      "image": "999999999999.dkr.ecr.us-east-1.amazonaws.com/my-container:latest",
      "essential": true,
      "portMappings": [
        {
          "containerPort": 12012,
          "hostPort": 12012,
          "protocol": "tcp"
        }
      ],
      "mountPoints": [
        {
          "sourceVolume": "myEfsVolume",
          "containerPath": "/mnt/efs",
          "readOnly": false
        }
      ]
    }
  ],
  "volumes": [
    {
      "name": "myEfsVolume",
      "efsVolumeConfiguration": {
        "fileSystemId": "fs-1234567",
        "transitEncryption": "ENABLED",
        "authorizationConfig": {
          "iam": "ENABLED"
        }
      }
    }
  ],
  "requiresCompatibilities": [
    "EC2"
  ],
  "tags": [
    ...
  ]
}
My questions are:
Why, if I'm running a single task that would easily run on one instance, is it scaling the group to at least 2 servers?
Why does it never just deploy and run my task?
Where can I look to see what the hell is going on with it (logs, etc)?
So it turns out that, even if you set an ASG to be the capacity provider for an ECS cluster, if you haven't set the User Data up in the launch configuration for that ASG to have something like the following:
#!/bin/bash
echo ECS_CLUSTER=my-cluster-name >> /etc/ecs/ecs.config;echo ECS_BACKEND_HOST= >> /etc/ecs/ecs.config;
then it will never make a single instance available to your cluster. ECS will respond by continuing to increase the desired capacity of the ASG.
Personally, I feel like this is something ECS should ensure happens for you automatically. Maybe there's a good reason why it doesn't.
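On the question of where to look, a quick boto3 check makes the symptom obvious: the ASG keeps growing while the cluster reports zero registered container instances. The cluster name is a placeholder:

import boto3

ecs = boto3.client('ecs')
CLUSTER = 'my-cluster-name'  # placeholder

# If the user data is missing, instances boot but never register with the cluster,
# so this stays at 0 while the ASG's desired capacity keeps climbing.
desc = ecs.describe_clusters(clusters=[CLUSTER])['clusters'][0]
print('registered container instances:', desc['registeredContainerInstancesCount'])
print('pending tasks:', desc['pendingTasksCount'])

# The ECS agent log on each instance (/var/log/ecs/ecs-agent.log) is the other
# place to look for registration errors.
instance_arns = ecs.list_container_instances(cluster=CLUSTER)['containerInstanceArns']
print('container instance ARNs:', instance_arns)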

How to rotate ELK logs?

I have indexes of around 250 GB altogether on each of 3 hosts, i.e. 750 GB of data in the ELK cluster.
How can I rotate the ELK logs so that the cluster keeps three months of data and older logs are pushed somewhere else?
You could create your indexes using the "indexname-%{+YYYY.MM}" naming format. This will create a distinct index every month.
You could then filter these indexes, based on timestamp, using a tool like Curator.
Curator can help you set up a cron job to purge the older indexes or back them up to an S3 repository.
Reference - Backup or Restore using curator
Moreover, you could even restore these backed-up indexes whenever needed directly from the S3 repo for historical analysis.
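For illustration, the kind of monthly purge Curator automates can be sketched in a few lines against the Elasticsearch REST API; the host and index prefix are placeholders, and snapshotting to S3 before deletion is left out:

from datetime import date
import requests

ES = "http://localhost:9200"   # placeholder
PREFIX = "indexname-"          # matches the indexname-%{+YYYY.MM} pattern
KEEP_MONTHS = 3

today = date.today()
# List the monthly indexes and delete those older than the retention window.
for idx in requests.get(f"{ES}/_cat/indices/{PREFIX}*?h=index&format=json").json():
    name = idx["index"]
    year, month = map(int, name[len(PREFIX):].split("."))
    age_months = (today.year - year) * 12 + (today.month - month)
    if age_months >= KEEP_MONTHS:
        print("deleting", name)
        requests.delete(f"{ES}/{name}").raise_for_status()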
The answer by dexter_ is correct, but as it is old, a better answer would be:
Version 7.x of the Elastic Stack provides index lifecycle management (ILM) policies, which can easily be managed from the Kibana GUI and are native to the stack.
PS: you still have to name the indices like "indexname-%{+YYYY.MM}" as suggested by dexter_.
elastic.co/guide/en/elasticsearch/reference/current/index-lifecycle-management.html
It took me a while to figure out the exact syntax and rules, so I'll post the final policy I used to remove old indexes (it's based on the example from https://aws.amazon.com/blogs/big-data/automating-index-state-management-for-amazon-opensearch-service-successor-to-amazon-elasticsearch-service/):
{
  "policy": {
    "description": "Removes old indexes",
    "default_state": "active",
    "states": [
      {
        "name": "active",
        "transitions": [
          {
            "state_name": "delete",
            "conditions": {
              "min_index_age": "14d"
            }
          }
        ]
      },
      {
        "name": "delete",
        "actions": [
          {
            "delete": {}
          }
        ],
        "transitions": []
      }
    ],
    "ism_template": {
      "index_patterns": [
        "mylogs-*"
      ]
    }
  }
}
It will automatically apply the policy for any new mylogs-* indexes, but you'll need to apply it manually for existing ones (under "Index Management" -> "Indices").
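If you prefer creating the policy from a script rather than the UI, a sketch along these lines should work. The endpoint shown is the OpenSearch ISM one (_plugins/_ism); older Amazon Elasticsearch domains use _opendistro/_ism instead, and the host, credentials, and policy ID are placeholders:

import requests

ENDPOINT = "https://my-domain.us-east-1.es.amazonaws.com"  # placeholder
AUTH = ("admin", "<password>")                             # placeholder

# Same policy body as above: delete indexes 14 days after creation.
policy = {
    "policy": {
        "description": "Removes old indexes",
        "default_state": "active",
        "states": [
            {"name": "active",
             "transitions": [{"state_name": "delete",
                              "conditions": {"min_index_age": "14d"}}]},
            {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
        ],
        "ism_template": {"index_patterns": ["mylogs-*"]},
    }
}

# PUT the policy; it then auto-attaches to new indexes matching mylogs-*.
r = requests.put(f"{ENDPOINT}/_plugins/_ism/policies/remove-old-indexes",
                 json=policy, auth=AUTH)
r.raise_for_status()
print(r.json())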
