Creating a Dataproc cluster from a template - cluster-computing

How can I change the cluster image version when creating a Dataproc cluster from an existing cluster's template, i.e. the exported YAML file?
My existing cluster uses an older Dataproc image, but I want the new cluster to use the latest image. Is that possible?

To change the image version in the YAML file, set or change it in the imageVersion field:
config:
  # . . .
  softwareConfig:
    # . . .
    imageVersion: <IMAGE_VERSION>
  # . . .
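A minimal sketch of the round trip, assuming the gcloud CLI: export the old cluster's template with `gcloud dataproc clusters export old-cluster --destination=cluster.yaml`, rewrite imageVersion, then create the new cluster with `gcloud dataproc clusters import new-cluster --source=cluster.yaml`. The snippet below performs the edit step on a locally created stand-in file; the cluster names and the image versions (1.5-debian10, 2.1-debian11) are placeholder assumptions, not values from the question.

```shell
# Stand-in for a template exported with:
#   gcloud dataproc clusters export old-cluster --destination=cluster.yaml
cat > cluster.yaml <<'EOF'
config:
  softwareConfig:
    imageVersion: 1.5-debian10
EOF

# Point the template at the newer image before creating the cluster.
sed -i 's/imageVersion:.*/imageVersion: 2.1-debian11/' cluster.yaml
grep imageVersion cluster.yaml

# The edited file can then be used with:
#   gcloud dataproc clusters import new-cluster --source=cluster.yaml
```

Exporting and re-importing keeps the rest of the old cluster's configuration intact while only the image version changes.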

Related

Apache NiFi creates packer<number>temp files in /tmp folder

After I upgraded my NiFi cluster (4 nodes) from 1.12 to 1.13, NiFi started creating a lot of files in the /tmp folder.
The file names follow the pattern packer<number>temp.
Example files:
packer1000351268325980549temp
packer2431509819824357743temp
NiFi writes these files to disk until it is full, at which point a "No space left on device" error appears.
What should I do to stop these files from being created?
My cluster:
4 nodes
Apache NiFi version: 1.13.2
OS: CentOS 7
Thanks, Tom.
It looks like you are running Packer in your NiFi cluster. Packer creates images and stores them in /tmp. Just reconfigure your Packer installation.

CDH4 : Add new node to existing cluster

I have successfully created a Hadoop cluster with CDH4 on Ubuntu, with one master (master) and one slave (slave1). Now I want to add one more node. For this I just cloned a new node (slave2) and updated the hosts and ssh configuration accordingly. Then I updated the conf/slaves file with all the datanode DNS names on all nodes and restarted everything. But the new datanode is not detected; it only shows the old one, slave1, not slave2. Can anyone please help me with this?
I have used cdh4-repository_1.0_all.deb
@user2009755, you need to create the masters and slaves files only on the master. In the configuration files in $HADOOP_HOME/etc/hadoop, make the necessary changes to the URIs pointing to the master node. NOTE: Try formatting the namenode and deleting the tmp files (usually /tmp/*); if you changed that location in core-site.xml, clear that directory on all nodes and then start all the daemons. That worked for me.
There are many possible reasons:
Have you changed the dfs.replication value to 3 in conf/hdfs-site.xml?
Check from the master with the command hduser@master:~$ ssh slave; it should open a shell on the slave. If it does not, execute this command: hduser@master:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave
For a full walkthrough, see this link:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
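The conf/slaves change the question describes can be sketched concretely. In this sketch a temp directory stands in for $HADOOP_HOME/conf on the master, and the hostnames are the question's own; after the file is updated (and passwordless ssh works), the HDFS daemons need a restart so the namenode re-reads the list.

```shell
# Temp dir stands in for $HADOOP_HOME/conf on the master node.
conf=$(mktemp -d)

printf 'slave1\n' > "$conf/slaves"   # the datanode that already works
echo 'slave2' >> "$conf/slaves"      # register the newly cloned node

cat "$conf/slaves"

# Then restart the daemons on the master, e.g. stop-dfs.sh && start-dfs.sh,
# so the namenode picks up the new datanode.
```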

Nutch 2.0 and Hadoop. How to prevent caching of conf/regex-urlfilter.txt

I have Nutch 2.x and Hadoop 1.2.1 on a single machine.
I configured seed.txt and conf/regex-urlfilter.txt and ran the command
crawl urls/seed.txt TestCrawl http://localhost:8088/solr/ 2
Then I wanted to change the rules in conf/regex-urlfilter.txt.
I changed them in 2 files:
~$ find . -name 'regex-urlfilter.txt'
./webcrawer/apache-nutch-2.2.1/conf/regex-urlfilter.txt
./webcrawer/apache-nutch-2.2.1/runtime/local/conf/regex-urlfilter.txt
Then I ran
crawl urls/seed.txt TestCrawl2 http://localhost:8088/solr/ 2
But the changes to regex-urlfilter.txt have no effect.
Hadoop reports that it uses the file
cat /home/hadoop/data/hadoop-unjar6761544045585295068/regex-urlfilter.txt
and when I look at its contents, I see the old rules.
How can I force Hadoop to use the new config?
These settings are stored in the archive file
/home/hadoop/webcrawer/apache-nutch-2.2.1/build/apache-nutch-2.2.1.job
Run
ant clean
ant runtime
to rebuild it with the new settings, or edit the archive file /home/hadoop/webcrawer/apache-nutch-2.2.1/build/apache-nutch-2.2.1.job directly.
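The reason the edits don't show up is that the .job file is an ordinary zip archive that bundles its own copy of the conf files, and the crawl runs from that bundled copy. A small sketch with a locally built stand-in archive (the demo paths and URL-filter rules are made up for illustration):

```shell
# Build a stand-in "job archive" containing a conf file.
mkdir -p demo/conf && cd demo
echo '+^http://example.com' > conf/regex-urlfilter.txt
zip -q job.zip conf/regex-urlfilter.txt          # "build" bundles the conf

# Edit the rule on disk, as in the question.
echo '+^https://example.org' > conf/regex-urlfilter.txt

# The archive still carries the old rule until it is rebuilt.
unzip -p job.zip conf/regex-urlfilter.txt
```

This is why ant clean && ant runtime is needed: the rebuild repackages the edited conf/regex-urlfilter.txt into the .job archive.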

Where is the configuration file stored in CDH4

I set up CDH4.
Now I can configure Hadoop on the web page.
I want to know where CDH puts the configuration files on the local file system.
For example, where can I find core-site.xml?
By default, the installation of CDH has the conf directory located in
/etc/hadoop/
You could always use the following command to find the file:
$ sudo find / -name "core-site.xml"

Cassandra nodetool snapshot creates snapshot directory but not in the data directory specified

I am using the Cassandra nodetool to take a snapshot.
The way I use it is: cd to $CASSANDRA_HOME and then run bin/nodetool snapshot. It says a snapshot directory has been created, but I can't find it in the data directory.
What am I doing wrong?
Snapshots are in $DATA_DIR/$KEYSPACE/$COLUMN_FAMILY/snapshots/.
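The layout can be sketched with placeholder names; a temp directory stands in for the data directory (commonly /var/lib/cassandra/data, set by data_file_directories in cassandra.yaml), and the keyspace, column family, and snapshot tag below are made-up examples:

```shell
# Temp dir stands in for Cassandra's data_file_directories location.
DATA_DIR=$(mktemp -d)

# nodetool snapshot creates a tag directory per column family, e.g.:
mkdir -p "$DATA_DIR/mykeyspace/users/snapshots/1405158224101"

# So the way to locate snapshots is to search under the data directory,
# not under $CASSANDRA_HOME:
find "$DATA_DIR" -type d -name snapshots
```

On a real node, the same find command pointed at the configured data directory will show every snapshot taken.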
