Proper Sizing for Mainnet Chainlink Node on AWS

This is a sizing question.
The Chainlink docs say a minimum of 2 cores and 4 GB of RAM.
The AWS Chainlink quickstart defaults to 2 cores and 2 GB of RAM.
What is the proper size for a mainnet production Chainlink node?
Thanks
Chris
Anubis Soft

I would recommend following the official Chainlink documentation: https://docs.chain.link/docs/running-a-chainlink-node/
As you already mentioned, 4 GB of RAM is specified there as the minimum requirement.
If too little RAM is left on your machine, the Chainlink node does not start cleanly and gets stuck in the initialization phase, because it cannot execute and process its database queries.
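If you want a quick pre-flight check before starting the node, a sketch like the one below (my own illustration, not part of the Chainlink tooling; it relies on the third-party psutil package) verifies that the host actually meets the documented 2-core / 4 GB minimum:

```python
# A quick pre-flight sketch (my own illustration, not part of the Chainlink
# tooling); it uses the third-party psutil package to check the host against
# the documented minimum of 2 cores and 4 GB of RAM.
import psutil

MIN_CORES = 2
MIN_RAM_GB = 4

cores = psutil.cpu_count(logical=False) or psutil.cpu_count()
ram_gb = psutil.virtual_memory().total / 1024**3

if cores < MIN_CORES or ram_gb < MIN_RAM_GB:
    print(f"Undersized host: {cores} cores / {ram_gb:.1f} GB "
          f"(minimum is {MIN_CORES} cores / {MIN_RAM_GB} GB)")
else:
    print(f"OK: {cores} cores / {ram_gb:.1f} GB meets the documented minimum")
```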
"Of course, you can still try it with the AWS requirements"

Related

High CPU usage on Elasticsearch nodes

We have been using a 3-node Elasticsearch (7.6) cluster running in Docker containers. I have been experiencing very high CPU usage on 2 of the nodes (97%) and moderate CPU load on the other node (55%). The hardware used is m5.xlarge servers.
There are 5 indices, each with 6 shards and 1 replica. Update operations take around 10 seconds even when updating a single field; deletes behave similarly. Querying, however, is quite fast. Is this because of the high CPU load?
2 out of the 5 indices continuously undergo update and write operations as they consume from a Kafka stream. The sizes of those indices are 15 GB and 2 GB; the rest are around 100 MB.
You need to provide more information to find the root cause:
Are all the ES nodes running in separate Docker containers on the same host, or on different hosts?
Do you have resource limits on your ES Docker containers?
How much heap is configured for ES, and is it 50% of the host machine's RAM?
Do the nodes with high CPU hold the 2 write-heavy indices you mentioned?
What is the refresh interval of the indices that receive heavy indexing?
What is the segment count/size of your 15 GB index? Use https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-segments.html to get this info (see the sketch after this list).
What have you debugged so far, and is there any other interesting info you can share to help find the issue?
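To answer a couple of those questions yourself, a sketch along these lines (assuming a node reachable at http://localhost:9200 and an index named "events", both placeholders) pulls per-node CPU/heap figures and the segment breakdown via the _cat APIs:

```python
# Minimal sketch, assuming a node reachable at http://localhost:9200 and an
# index named "events" (both placeholders - substitute your own host/index).
import requests

ES = "http://localhost:9200"

# Per-node CPU and heap usage: confirms whether the hot nodes are also the
# ones holding the write-heavy shards.
nodes = requests.get(
    f"{ES}/_cat/nodes",
    params={"format": "json", "h": "name,cpu,heap.percent,node.role"},
).json()
for n in nodes:
    print(n["name"], "cpu:", n["cpu"], "heap%:", n["heap.percent"])

# Segment breakdown for the 15 GB index: a very large number of small
# segments under constant updates usually points at an aggressive refresh
# interval or merge pressure.
segments = requests.get(
    f"{ES}/_cat/segments/events",
    params={"format": "json", "h": "shard,segment,size,docs.count"},
).json()
print("segment count:", len(segments))
for s in segments[:10]:
    print(s["shard"], s["segment"], s["size"], s["docs.count"])
```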

Suggestions required for increasing utilization of YARN containers on our discovery cluster

Current Setup
We have a 10-node discovery cluster.
Each node of this cluster has 24 cores and 264 GB of RAM. Keeping some memory and CPU aside for background processes, we plan to use 240 GB of memory per node.
Now, when it comes to container setup: since each container may need 1 core, we can have at most 24 containers per node, each with 10 GB of memory.
Usually clusters have containers with 1-2 GB of memory, but we are restricted by the cores available to us, or maybe I am missing something.
Problem statement
As our cluster is used extensively by data scientists and analysts, having just 24 containers per node does not suffice and leads to heavy resource contention.
Is there any way we can increase the number of containers?
Options we are considering
If we ask the team to run many Tez queries together in a single file (rather than separately), then each batch holds at most one container.
Requests
Is there any other possible way to manage our discovery cluster?
Is there any possibility of reducing the container size?
Can a vcore (as it's a logical concept) be shared by multiple containers?
Vcores are just a logical unit and not in any way related to a CPU core unless you are using YARN with CGroups and have yarn.nodemanager.resource.percentage-physical-cpu-limit enabled. Most tasks are rarely CPU-bound and are more typically network-I/O-bound. So if you look at your cluster's overall CPU and memory utilization, you should be able to resize your containers based on the wasted (spare) capacity.
You can measure utilization with a host of tools; sar, Ganglia, and Grafana are the obvious ones, but you can also look at Brendan Gregg's Linux performance tools for more ideas.
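As a rough illustration of that resizing argument, using the numbers from the question (24 cores and 240 GB usable per node), the arithmetic looks something like this; treat it as back-of-the-envelope math, not a tuning recipe:

```python
# Back-of-the-envelope math only, using the numbers from the question
# (24 cores, 240 GB usable per node); not a tuning recipe.

NODE_MEMORY_GB = 240   # roughly yarn.nodemanager.resource.memory-mb
NODE_VCORES = 24       # roughly yarn.nodemanager.resource.cpu-vcores

def containers_per_node(container_mem_gb, container_vcores=1, enforce_vcores=True):
    """How many containers fit on one node for a given container size."""
    by_memory = NODE_MEMORY_GB // container_mem_gb
    if not enforce_vcores:
        # Without CGroups / strict CPU limits, memory is the only real constraint.
        return by_memory
    by_vcores = NODE_VCORES // container_vcores
    return min(by_memory, by_vcores)

print(containers_per_node(10))                       # 24 - the current setup
print(containers_per_node(4))                        # still 24 if vcores are strictly enforced
print(containers_per_node(4, enforce_vcores=False))  # 60 once memory is the only limit
```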

How to work with a group of people using Zeppelin?

I am trying to work with Zeppelin on my Hadoop cluster:
1 edge node
1 name node
1 secondary node
16 data nodes.
Node specification:
CPU: Intel(R) Xeon(R) CPU E5345 @ 2.33GHz, 8 cores
Memory: 32 GB DDR2
I have some issues with this tool when more than 20 people want to use it at the same time.
This happens mainly when I am using pyspark - either 1.6 or 2.0.
Even if I set zeppelin.execution.memory = 512 mb and spark.executor.memory = 512 mb, it is still the same. I have tried a few interpreter options (for pyspark), like Per User in scoped/isolated mode, and others, and it is still the same. It is a little better with the globally shared option, but after a while I still cannot do anything there. I was watching the edge node and saw that its memory usage climbs very fast. I want to use the edge node only as an access point.
If your deploy mode is yarn-client, then your driver will always run on the access point server (the edge node in your case).
Every notebook (per-note mode) or every user (per-user mode) instantiates a Spark context, allocating memory on the driver and on the executors. Reducing spark.executor.memory will relieve the cluster but not the driver. Try reducing spark.driver.memory instead.
The Spark interpreter can be instantiated globally, per note, or per user. I don't think sharing the same interpreter (globally) is a solution in your case, since you can only run one job at a time: users would end up waiting for everyone else's cells to finish before being able to run their own.
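For reference, these are the properties being discussed. The names are real Spark settings, but the values are just the ones from the question; in Zeppelin they belong in the Spark interpreter settings (or spark-defaults.conf) rather than notebook code, since spark.driver.memory is only honored if it is set before the driver JVM starts:

```python
# The property names below are real Spark settings; the values are just the
# ones discussed above. In Zeppelin they belong in the Spark interpreter
# settings (or spark-defaults.conf), not in notebook code, because
# spark.driver.memory is only honored if it is set before the driver JVM starts.
interpreter_properties = {
    "master": "yarn",
    # Caps the per-user/per-note driver that runs on the edge node in
    # yarn-client mode; this is what actually relieves the edge node.
    "spark.driver.memory": "512m",
    "spark.executor.memory": "512m",
    # Optional: lets idle notebooks hand executors back to the cluster.
    "spark.dynamicAllocation.enabled": "true",
    "spark.shuffle.service.enabled": "true",
}

for key, value in interpreter_properties.items():
    print(f"{key} = {value}")
```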

How much data can my Hadoop cluster handle?

I have a 4-node cluster configured with 1 NameNode and 3 DataNodes. I'm performing a TPC-H benchmark and I would like to know how much data you think my cluster can handle without affecting query response times. My total available disk size is about 700 GB, and each node has an 8-core CPU and 16 GB of RAM.
I saw some calculations that can be done to find the volume limit, but I didn't understand them. If someone could explain in a simple way how to calculate the data volume a cluster can handle, it would be very helpful.
Thank you
You can use 70 to 80% of the space in your cluster to store data; the rest will be used for processing and for storing intermediate results.
This way performance will not be impacted.
As you mentioned, you have already configured your 4-node cluster. You can check the Configured Capacity section in the NameNode web UI to find the storage details. Let me know if you run into any difficulties.
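As a back-of-the-envelope sketch using the numbers from the question (and assuming the default HDFS replication factor of 3, so adjust to whatever dfs.replication is set to on your cluster):

```python
# Rough estimate using the numbers from the question; the 3x replication
# factor is an assumption (the HDFS default), so adjust it to whatever
# dfs.replication is set to on your cluster.

total_hdfs_capacity_gb = 700
usable_fraction = 0.75      # keep 20-30% free for processing / intermediate results
replication_factor = 3      # dfs.replication (assumed default)

raw_data_limit_gb = total_hdfs_capacity_gb * usable_fraction / replication_factor
print(f"~{raw_data_limit_gb:.0f} GB of actual (pre-replication) data")  # ~175 GB
```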

Elastic Cloud vs Elasticsearch on a local server

Do Elastic Cloud and an Elasticsearch setup on a local machine deliver the same speed for data gathering?
Well, it depends on your local setup:
How many machines/nodes?
How much CPU and memory per node? One of the most important factors is whether the nodes have SSDs.
It also depends on the network.
In Elastic Cloud you can choose the amount of memory and storage, but not the number of nodes (the node count depends on the amount of memory, because it is better to have one node with 32 GB of memory than three nodes with 10 GB each, for instance). It is also important that the Elastic Cloud setup uses SSDs.
So again, it all depends: a local setup will give you more flexibility and control, but the cloud could simplify your life.
One more option would be to go with AWS or Azure, because you would be able to add and remove nodes on demand, making it a bit easier to experiment and see which setup works better for you.
To sum up: if you have the same setup locally and in the cloud, there will be no difference in terms of performance; the only thing that would differ is latency.
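If you want to quantify that latency difference for your own endpoints, a rough sketch like the one below measures the round-trip time of a simple request against each cluster (both URLs are placeholders, and the cloud endpoint will need authentication in practice):

```python
# Rough sketch for comparing round-trip latency against a local cluster and an
# Elastic Cloud deployment; both URLs are placeholders, and the cloud endpoint
# will need authentication in practice.
import time
import requests

ENDPOINTS = {
    "local": "http://localhost:9200",
    "cloud": "https://my-deployment.es.us-east-1.aws.found.io:9243",  # placeholder
}

for name, url in ENDPOINTS.items():
    samples = []
    for _ in range(10):
        start = time.perf_counter()
        requests.get(url, timeout=10)  # add auth/headers for the cloud endpoint as needed
        samples.append(time.perf_counter() - start)
    median_ms = sorted(samples)[len(samples) // 2] * 1000
    print(f"{name}: median round trip {median_ms:.1f} ms")
```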
