Installing ceph using kolla-ansible for all-in-one setup - ansible

I am trying to deploy the all-in-one configuration using kolla-ansible with ceph enabled
enable_ceph: "yes"
#enable_ceph_mds: "no"
enable_ceph_rgw: "yes"
#enable_ceph_nfs: "no"
enable_ceph_dashboard: "{{ enable_ceph | bool }}"
#enable_chrony: "yes"
enable_cinder: "yes"
enable_cinder_backup: "yes"
glance_backend_ceph: "yes"
gnocchi_backend_storage: "{{ 'ceph' if enable_ceph|bool else 'file' }}"
cinder_backend_ceph: "{{ enable_ceph }}"
cinder_backup_driver: "ceph"
nova_backend_ceph: "{{ enable_ceph }}"
And, my setup consists of a Virtual Box VM with Ubuntu 18.04.4 desktop version with 2 CPU cores, 30 GB Disk (single disk), 2GB RAM, the partitioning type is msdos.
ansible version==2.9.7
kolla-ansible version==9.1.0
In order to install ceph OSD using kolla-ansible i read that a partition should have the name KOLLA_CEPH_OSD_BOOTSTRAP_BS.
Hence, i created root partition with 20 GB i.e /dev/sda1 and then an extended partition /dev/sda2 for the rest 20GB and followed by two logical partitions (/dev/sda5 and /dev/sda6) each of 10GB for OSD. But in msdos type partitioning there is no feature to allocate name to partitions.
So my questions are:
How do I go about labeling the partition in case of msdos type partition in order for kolla-ansible to recognize that /dev/sda5 and /dev/sda6 is designated for Ceph-OSD ?
Is it a must to have a separate storage drive than the one containing Operating System for Ceph OSD (i know its not recommended to have all in single disk) ?
How do I have to provision my single drive HD space in order to install Ceph-OSD using kolla-ansible?
P.S: I also tried to install ceph using kolla-ansible using an OpenStack VM (4 CPU cores, 80GB disk space - single drive, as i didn"t install Cinder in my OpenStack infra.) and Ubuntu 18.04.4 Cloud image, which has GPT partition type and supports naming partitions. The partitions were as follows:
/dev/vda1 for root partition
/dev/vda2 for ceph OSD
/dev/vda3 for ceph OSD
But the drawback was that, kolla-ansible wiped out complete disk and resulted in failure in installation.
Any help is highly appreciated. Thanks a lot in advance.

I also had installed an Kolla-Ansible single-node all-in-one with Ceph as storage backend, so I had the same problem.
Yes, the bluestore installation of the ceph doesn't work with a single partition. I had also tried different ways of labeling, but for me it only worked with a whole disk, instead of a partition. So for your virtual setup create a whole new disk, for example /dev/vdb.
For labeling, I used the following as bash-script:
#!/bin/bash
DEV="/dev/vdb"
(
echo g # create GPT partition table
echo n # new partiton
echo # partition number (automatic)
echo # start sector (automatic)
echo +10G # end sector (use 10G size)
echo w # write changes
) | fdisk $DEV
parted $DEV -- name 1 KOLLA_CEPH_OSD_BOOTSTRAP_BS
Be aware, that DEV at the beginning is correctly set for your setup. This creates a new partiton table and one partition on the new disc with 10GB size. The kolla-ansible deploy-run register the label and wipe the whole disc, so the size-value has nothing to say and is only for the temporary partition on the disc.
One single disc is enough for the Ceph-OSD in kolla-ansible. You don't need a second OSD. For this, add the following config-file in your kolla-ansible setup in this path, when you used the default kolla installation path: /etc/kolla/config/ceph.conf with the content:
[global]
osd pool default size = 1
osd pool default min size = 1
This is to make sure, that there is only one OSD requested by kolla-ansible. If your kolla-directory with the globals.yml is not under /etc/kolla/, you have to change the path of the config-file as well.
Solution for setup with one single disc with multiple partitions is to switch the storage-type of the ceph-storage in the kolla-ansible setup from bluestore to the older filestore OSD type. This also requires different partition-labels like written here: https://docs.openstack.org/kolla-ansible/rocky/reference/ceph-guide.html#using-an-external-journal-drive .
With filestore you need one parition with the label KOLLA_CEPH_OSD_BOOTSTRAP_FOO and a small journal-partition with label KOLLA_CEPH_OSD_BOOTSTRAP_FOO_J (the FOO in the name is really required...). To be able to switch your kolla-installation to filestore OSD type, edit all-in-one file's [storage] section by adding ceph_osd_store_type=filestore next to the host as follows, to override the default bluestore.
[storage]
localhost ansible_connection=local ceph_osd_store_type=filestore
The above method has been tested with ansible==2.9.7 and kolla-ansible==9.1.0 and OpenStack Train release and prior releases.

Related

is it right to limit cleaning /tmp each day in hadoop cluster

We have HDP cluster version – 2.6.4
Cluster installed on redhat machines version – 7.2
We noticed about the following issue on the JournalNodes machines ( master machines )
We have 3 JournalNodes machines , and under /tmp folder we have thousands of empty folders as
drwx------. 2 hive hadoop 6 Dec 20 09:00 a962c02e-4ed8-48a0-b4bb-79c76133c3ca_resources
an also a lot of folders as
drwxr-xr-x. 4 hive hadoop 4096 Dec 12 09:02 hadoop-unjar6426565859280369566
with content as
beeline-log4j.properties BeeLine.properties META-INF org sql-keywords.properties
/tmp should be purged every 10 days according to the configuration file:
more /usr/lib/tmpfiles.d/tmp.conf
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.
# See tmpfiles.d(5) for details
# Clear tmp directories separately, to make them easier to override
v /tmp 1777 root root 10d
v /var/tmp 1777 root root 30d
# Exclude namespace mountpoints created with PrivateTmp=yes
x /tmp/systemd-private-%b-*
X /tmp/systemd-private-%b-*/tmp
x /var/tmp/systemd-private-%b-*
X /var/tmp/systemd-private-%b-*/tmp
You have new mail in /var/spool/mail/root
So we decrease the retention to 1d instead of 10d in order to avoid this issue
Then indeed /tmp have only folders content of one day
But I want to ask the following questions
Is it ok to configure the retention about /tmp in Hadoop cluster to 1day ?
( I almost sure it ok , but want to hear more opinions )
Second
Why HIVE generate thousands of empty folders as XXXX_resources ,
and is it possible to solve it from HIVE service , instead to limit the retention on /tmp
It is quite normal having thousands of folders in /tmp as long as there is still free space available for normal run. Many processes are using /tmp, including Hive, Pig, etc. One day retention period of /tmp maybe too small, because normally Hive or other map-reduce tasks can run more than one day, though it depends on your tasks. HiveServer should remove temp files but when tasks fail or aborted, the files may remain, also it depend on Hive version. Better to configure some retention, because when there is no space left in /tmp, everything stops working.
Read also this Jira about HDFS scratch dir retention.

Mesos slave node unable to restart

I've setup a Mesos cluster using the CloudFormation templates from Mesosphere. Things worked fine after cluster launch.
I recently noticed that none of the slave nodes are listed in the Mesos dashboard. EC2 console shows the slaves are running & pass health checks. I restarted nodes on cluster but that didn't help.
I ssh'ed into one of the slaves and noticed mesos-slave services are not running. Executed sudo systemctl status dcos-mesos-slave.service but that couldn't start the service.
Looked in /var/log/mesos/ and tail -f mesos-slave.xxx.invalid-user.log.ERROR.20151127-051324.31267 and saw the following...
F1127 05:13:24.242182 31270 slave.cpp:4079] CHECK_SOME(state::checkpoint(path, bootId.get())): Failed to create temporary file: No space left on device
But the output of df -h and free show there is plenty of disk space left.
Which leads me to wonder, why is it complaining about no disk space?
Ok I figured it out.
When running Mesos for a long time or under frequent load, the /tmp folder won't have any disk space left since Mesos uses the /tmp/mesos/ as the work_dir. You see, the filesystem can only hold a certain number of file references(inodes). In my case, slaves were collecting large number of file chuncks from image pulls in /var/lib/docker/tmp.
To resolve this issue:
1) Remove files under /tmp
2) Set a different work_dir location
It is good practice to run
docker rmi -f $(docker images | grep "<none>" | awk "{print \$3}")
this way you will free space by deleting unused docker images

Run MapR script (Google Cloud SDK) to create MapR Hadoop cluster on GCP does not work

I downloaded scripts from https://github.com/mapr/gce for run MapR script to create MapR Hadoop cluster on GCP.
I already credential Google account with GCP. gcloud auth list OK.
Run MapR script.
./launch-admin-training-cluster.sh --project stone-cathode-10xxxx --cluster MaprBank10 --config-file 4node_yarn.lst --image centos-6 --machine-type n1-standard-2 --persistent-disks 1x256
This's messages from Cygwin command line.
CHECK: -----
project-id stone-cathode-10xxxx
cluster MaprBank10
config-file 4node_yarn.lst
image centos-6 machine n1-standard-2
zone us-central1-b
OPTIONAL: -----
node-name none
persistent-disks 1x256
----- Proceed {y/N} ? y Launch node1
Creating persistent data volumes first (1x256) seq: not found
Launch node2
Creating persistent data volumes first (1x256) seq: not found
Launch node3
Creating persistent data volumes first (1x256) seq: not found
Launch node4
Creating persistent data volumes first (1x256) seq: not found
NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
How to Investigate and solve issue. Thank you very much.
The course ADM-201 has particular requirement that you must follow.
The config file that you have chosen is 4node_yarn.lst and it supports 3 x 50 persistent-disks for each node in your 4-node MapR cluster. Since you are mentioning only one disk (1 x 256) in your command, it is not able to meet it's requirements.
Also carefully follow the "Set Up a Virtual Cluster" provided by MapR. Below is the screenshot from the guide provided by MapR.

Cloudera installation dfs.datanode.max.locked.memory issue on LXC

I have created virtual box, ubuntu 14.04LTS environment on my mac machine.
In virtual box of ubuntu, I've created cluster of three lxc-containers. One for master and other two nodes for slaves.
On master, I have started installation of CDH5 using following link http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin
I have also made necessary changes in the /etc/hosts including FQDN and hostnames. Also created passwordless user named as "ubuntu".
While setting up the CDH5, during installation I'm constantly facing following error on datanodes. Max locked memory size: dfs.datanode.max.locked.memory of 922746880 bytes is more than the datanode's available RLIMIT_MEMLOCK ulimit of 65536 bytes.
Exception in secureMain: java.lang.RuntimeException: Cannot start datanode because the configured max locked memory size (dfs.datanode.max.locked.memory) of 922746880 bytes is more than the datanode's available RLIMIT_MEMLOCK ulimit of 65536 bytes.
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1050)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:411)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2297)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2184)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2231)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2407)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2431)
Krunal,
This solution will be probably be late for you but maybe it can help somebody else so here it is. Make sure your ulimit is set correctly. But in case its a config issue.
goto:
/run/cloudera-scm-agent/process/
find latest config dir,
in this case:
1016-hdfs-DATANODE
search for parameter in this dir:
grep -rnw . -e "dfs.datanode.max.locked.memory"
./hdfs-site.xml:163: <name>dfs.datanode.max.locked.memory</name>
and edit the value to the one he is expecting in your case(65536)
I solved by opening a seperate tab in Cloudera and set the value from there

Installing Hadoop over 5 hard drives on a desktop

I have been working with installing Hadoop. I followed some instruction on a Udemy course, and I installed Hadoop on pseudo distributed mode, on my laptop. It was fairly straightforward.
After that, I started to wonder if I could set up Hadoop on a desktop computer. So went out and bought an empty case and put in a 64 bit, 8 core AMD processor, along with a 50GB SSD hard drive and 4 inexpensive 500GB hard drives. I installed Ubuntu 14.04 on the SSD drive, and put virtual machines on the other drives.
I'm envisioning using my SSD as the master and using my 4 hard drives as nodes. Again, everything is living in the same case.
Unfortunately, and I've been searching everywhere, and I can't find any tutorials, guides, books, etc, that describe setting up Hadoop in this manner. It seems like most everything I've found that details installation of Hadoop is either a simple pseudo distributed setup (which I've already done), or else the instructions jump straight to large scale commercial applications. I'm still learning the basics, clearly, but I'd like to play in this sort-of in between place.
Has anyone done this before, and/or come across any documentation / tutorials / etc that describe how to set Hadoop up in this way? Many thanks in advance for the help.
You can run hadoop in different VM's which are located in different drives in the same system.
But you need to allocate same configurations for all the master and slave nodes
Also ensure that all the VM's having different ip addresses.
You can get different IP addresses by connecting your master computer to the LAN or you need to disable some functionality in VM machines in order to get different IP addresses.
if you done the hadoop installation in pseduo mode means then follow the below steps this may help you.
MULTINODE :
Configure the hosts in the network using the following settings in the host file. This has to be done in all machine [in namenode too].
sudo vi /etc/hosts
add the following lines in the file:
yourip1 master
yourip2 slave01
yourip3 slave02
yourip4 slave03
yourip5 slave04
[Save and exit – type ESC then :wq ]
Change the hostname for the namenode and datanodes.
sudo vi /etc/hostname
For master machine [namenode ] – master
For other machines – slave01 and slave02 and slave03 and slave04 and slave 05
Restart the machines in order to get the settings related to the network applied.
sudo shutdown -r now
Copy the keys from the master node to all datanodes, so as this will help to access the machines without asking for permissions everytime.
#ssh-copy-id –i ~/.ssh/id_rsa.pub hduser#slave01
#ssh-copy-id –i ~/.ssh/id_rsa.pub hduser#slave02
#ssh-copy-id –i ~/.ssh/id_rsa.pub hduser#slave03
#ssh-copy-id –i ~/.ssh/id_rsa.pub hduser#slave04
Now we are about to configure the hadoop configuration settings, so navigate to the ‘conf’ folder.
cd ~/hadoop/etc
Edit the slaves file within the hadoop directory.
vi ~/hadoop/conf/slaves
And add the below :
master
slave01
slave02
slave03
slave04
Now update localhost to master in core-site.xml,hdfs-site.xml,mapred-site.xml and yarn-site.xml
Now copy the files from the hadoop/etc/hadoop folder from master to slave machines.
then format you namenode in all machines.
and start the hadoop services.
I given you the some clues for how to configure the hadoop multinode cluster.
Never tried, but if you type ifconfig then it gives you same ipaddress on all the vm machines in hard drives. So this may not be the better option to go..
You can try creating Hadoop Cluster on Amazon EC2 for free using this step by step guide HERE
Or Video guide HERE
Hope it helps!

Resources