ClickHouse SET command on a cluster?

I have a Clickhouse cluster.
It's great, but I'm struggling to enable allow_experimental_geo_types on the cluster. I can do it on a single server:
CREATE TABLE t1 (p Polygon, id UInt64) ENGINE=MergeTree() ORDER BY (id);
But as soon as I try to do this on a cluster, I get an error.
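For example, something like this fails (the cluster name "mycluster" here is just a placeholder):
clickhouse-client --query "
    CREATE TABLE t1 ON CLUSTER mycluster (p Polygon, id UInt64)
    ENGINE = MergeTree() ORDER BY (id);"
# the remote nodes reject the Polygon column unless
# allow_experimental_geo_types is enabled on them as well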

Put this file on all servers:
$ cat /etc/clickhouse-server/users.d/allow_experimental_geo_types.xml
<?xml version="1.0"?>
<yandex>
    <profiles>
        <default>
            <allow_experimental_geo_types>1</allow_experimental_geo_types>
        </default>
    </profiles>
</yandex>
There is no other way to make this work for ON CLUSTER queries.
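Once the override file is in place on every server, the setting can be checked per node; a quick sketch:
clickhouse-client --query "SELECT value FROM system.settings WHERE name = 'allow_experimental_geo_types'"
# should print 1 on every node before running the ON CLUSTER DDL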

Related

In an XML file, append an attribute to a node element if that attribute does not exist, using a shell script

I have the below requirement for an XML file, using a shell script.
Standalone.xml file:
<?xml version='1.0' encoding='UTF-8'?>
<server xmlns="urn:jboss:domain:11.0">
    <profile>
        <subsystem xmlns="urn:jboss:domain:deployment-scanner:2.0">
            <deployment-scanner path="deployments" relative-to="jboss.server.base.dir" scan-enabled="false" scan-interval="5000" deployment-timeout="600"/>
        </subsystem>
        ...........Other subsystem tags.......
    </profile>
</server>
I would like to check whether the <deployment-scanner> node contains the attribute "deployment-timeout" or not:
if it exists, read the value, then check:
    if the value is < 900,
        update the value to 900
    else
        leave it as it is
else
    append the attribute with the value, i.e. deployment-timeout="900", to the <deployment-scanner> node at the end, like this:
<deployment-scanner path="deployments" relative-to="jboss.server.base.dir" scan-enabled="false" scan-interval="5000" deployment-timeout="900"/>
Please help me. Thanks in advance.
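A minimal sketch of the requested logic using xmlstarlet (assuming the tool is installed; the file name and namespace come from the snippet above, everything else is untested illustration):
FILE=Standalone.xml
NS="urn:jboss:domain:deployment-scanner:2.0"
# read the current attribute value; empty if the attribute is absent
val=$(xmlstarlet sel -N d="$NS" -t -v '//d:deployment-scanner/@deployment-timeout' "$FILE")
if [ -n "$val" ]; then
    # attribute exists: raise it to 900 only if the current value is lower
    if [ "$val" -lt 900 ]; then
        xmlstarlet ed -L -N d="$NS" -u '//d:deployment-scanner/@deployment-timeout' -v 900 "$FILE"
    fi
else
    # attribute missing: append deployment-timeout="900" to the node
    xmlstarlet ed -L -N d="$NS" -i '//d:deployment-scanner' -t attr -n deployment-timeout -v 900 "$FILE"
fi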

How to make ClickHouse pick up a new users.xml file?

Do I have to restart ClickHouse to make it read any update to users.xml?
Is there a way to just "reload" ClickHouse?
These files are reloaded at runtime; there is no need to restart the server.
As you may notice, the config folder has several files, like:
config-preprocessed.xml
config.xml
users-preprocessed.xml
users.xml
The *-preprocessed.xml files contain the parsed configs, so you can see when a config was loaded and how it was parsed.
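A quick way to see when a change was actually picked up is to look at the preprocessed copies (a sketch; reloading rewrites them, so the timestamps move):
ls -l /etc/clickhouse-server/*-preprocessed.xml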
I wouldn't recommend modifying '/etc/clickhouse-server/config.xml' or '/etc/clickhouse-server/users.xml' directly, because they will be rewritten when ClickHouse is upgraded and you will lose your custom settings.
The subfolders '/etc/clickhouse-server/config.d/' and '/etc/clickhouse-server/users.d/' serve to store overrides for 'config.xml' and 'users.xml' respectively.
Example overrides for 'config.xml':
config.d/config.xml
<?xml version="1.0"?>
<yandex>
    <listen_host replace="replace">::</listen_host>
    <dictionaries_config replace="replace">dictionaries/*.xml</dictionaries_config>
    <openSSL>
        <client>
            <verificationMode replace="replace">none</verificationMode>
        </client>
    </openSSL>
</yandex>
config.d/cluster.xml
<?xml version="1.0"?>
<yandex>
    <remote_servers>
        <your_cluster>
            <!-- topology definition -->
        </your_cluster>
    </remote_servers>
    <zookeeper>
        <!-- .. -->
    </zookeeper>
</yandex>
config.d/kafka.xml
<?xml version="1.0"?>
<yandex>
    <!-- The default configuration for the Kafka engine table (https://clickhouse.yandex/docs/en/operations/table_engines/kafka/#configuration). -->
    <kafka>
        <bootstrap_servers>11.22.33.44:6667,11.22.33.55:6667,11.22.33.66:6667</bootstrap_servers>
        <auto_offset_reset>latest</auto_offset_reset>
    </kafka>
    <!-- The topic configurations. -->
    <kafka_topic_name>
        <group_id>clickhouse-group_id</group_id>
    </kafka_topic_name>
</yandex>
Example overrides for 'users.xml':
users.d/user.xml
<?xml version="1.0"?>
<yandex>
    <users>
        <default>
            <password replace="replace">hello_clickhouse</password>
        </default>
        <readonly>
            <password replace="replace">hello</password>
        </readonly>
    </users>
</yandex>
Here is another example of config overrides; for more details, see ClickHouse Configuration Files.
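One way to confirm that an override was merged is to grep the preprocessed file (a sketch; 'password' is just the setting from the users.d example above):
grep -C 2 'password' /etc/clickhouse-server/users-preprocessed.xml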

OOZIE: properties defined in file referenced in global job-xml not visible in workflow.xml

I'm new to Hadoop, and I'm now testing a simple workflow with just a single Sqoop action. It works if I use plain values -- not global properties.
My objective, however, was to define some global properties in a file referenced in the job-xml tag of the global section.
After a long fight and reading many articles, I still cannot make it work.
I suspect some simple thing is wrong, since I found articles suggesting that this feature works fine.
Hopefully, you can give me a hint.
In short:
I have the properties dbserver, dbuser, and dbpassword defined in /user/dm/conf/environment.xml.
These properties are referenced in my /user/dm/jobs/sqoop-test/workflow.xml.
At runtime, I receive an EL_ERROR saying that the dbserver variable cannot be resolved.
Here are the details:
I'm using the Cloudera 5.7.1 distribution installed on a single node.
The environment.xml file was uploaded into HDFS, into the /user/dm/conf folder.
Here is the content:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>dbserver</name>
        <value>someserver</value>
    </property>
    <property>
        <name>dbuser</name>
        <value>someuser</value>
    </property>
    <property>
        <name>dbpassword</name>
        <value>somepassword</value>
    </property>
</configuration>
workflow.xml file was uploaded into /user/dm/jobs/sqoop-test-job. Here is the content:
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.4" name="sqoop-test">
    <global>
        <job-xml>/user/dm/conf/env.xml</job-xml>
    </global>
    <start to="get-data"/>
    <action name="get-data">
        <sqoop xmlns="uri:oozie:sqoop-action:0.3">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${outputRootPath}"/>
            </prepare>
            <arg>import</arg>
            <arg>--connect</arg>
            <arg>jdbc:sqlserver://${dbserver};user=${dbuser};password=${dbpassword}</arg>
            <arg>--query</arg>
            <arg>select col1 from table where $CONDITIONS</arg>
            <arg>--split-by</arg>
            <arg>main_id</arg>
            <arg>--target-dir</arg>
            <arg>${outputRootPath}/table</arg>
            <arg>-m</arg>
            <arg>1</arg>
        </sqoop>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Sqoop-test failed, error message[${wf:errorMessage()}]</message>
    </kill>
    <end name='end'/>
</workflow-app>
Now, I execute the Oozie workflow from the command line:
sudo -u dm oozie job --oozie http://host:11000/oozie -config job-config.xml -run
Where my job-config.xml is as follows:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
    <property>
        <name>nameNode</name>
        <value>namenode:8020</value>
    </property>
    <property>
        <name>jobTracker</name>
        <value>jobtracker:8021</value>
    </property>
    <property>
        <name>oozie.wf.application.path</name>
        <value>/user/dm/jobs/sqoop-test-job/workflow.xml</value>
    </property>
    <property>
        <name>outputRootPath</name>
        <value>/user/dm/data/sqoop-test</value>
    </property>
</configuration>
OK, you are making two big mistakes.
1. Let's start with a quick exegesis of some parts of the Oozie documentation (V4.2).
The Workflow Functional Specification:
has a section (19) about Global Configuration
has sections (3.2.x) about the core Action types, i.e. MapReduce, Pig, Java, etc.
its XML schema specification clearly shows the <global> element
The Sqoop action Extension:
does not make any mention of global parameters
has its own XML schema specification, which evolves at its own pace and is not up to date with the Workflow schema
In other words: the Sqoop action is a plug-in as far as the Oozie server is concerned. It does not support 100% of the "newer" functionality, including the <global> element that was introduced in Workflow schema V0.4.
2. You don't understand the distinction between properties and parameters -- and I don't blame you, because the Oozie docs are confused and confusing.
Parameters are used by Oozie to run text substitutions in properties, commands, etc. You define their values as literals, either at submission time with the -config argument or in the <parameters> element at Workflow level. And by "literal" I mean that you cannot make reference to a parameter in another parameter: the value is just immutable text, used as-is.
Properties are Java properties passed to the jobs that Oozie starts. You can set them either at submission time with the -config argument -- yes, it's a mess; the Oozie parser has to sort out which params have a well-known property name and which ones are just params -- or in the <global> Workflow element -- but they will not be propagated into all "extensions", as you have discovered the hard way -- or in the <property> Action element, or inside an XML file referenced with a <job-xml> element, either at global Workflow level or at local Action level.
Two things to note:
when properties are defined multiple times with multiple (conflicting) values, there has to be a precedence rule, but I'm not too sure what it is
properties defined explicitly inside Oozie may have their value defined dynamically, using parameters and EL functions; but properties defined inside <job-xml> files must be literals, because Oozie does not have access to them (it just passes the file content to the Hadoop Configuration constructor at run time)
What does it mean for you? Well, your script tells Oozie to pass "hidden" properties to the JVM running the Sqoop job, at run-time, through a <job-xml>.
But you were expecting Oozie to parse a list of parameters and use them, at compile time, to define some properties. That won't happen.
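To make this concrete, here is a sketch of the fix implied above: since ${dbserver} and friends must be parameters rather than job-xml properties, one option is to define them at submission time, e.g. by adding them to the job-config.xml passed via -config (values taken from the environment.xml above):
<property>
    <name>dbserver</name>
    <value>someserver</value>
</property>
<property>
    <name>dbuser</name>
    <value>someuser</value>
</property>
<property>
    <name>dbpassword</name>
    <value>somepassword</value>
</property>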

Hadoop Multi-Node Cluster Installation on Ubuntu Issue - Troubleshoot

I have three Ubuntu 12.04 LTS computers on which I want to install Hadoop in a master/slave configuration, as described here. It says to first install Hadoop as a single node and then proceed to multi-node. The single-node installation works perfectly fine. I made the required changes to the /etc/hosts file and configured everything just as the guide says, but when I start the Hadoop cluster on the master, I get an error.
My computers are aptly named ironman, superman, and batman, with batman (who else?) being the master node. When I do sudo bin/start-dfs.sh, the following shows up.
When I enter the password, I get this:
When I try sudo bin/start-all.sh, I get this:
I can ssh into the different terminals, but there's something that's not quite right. I checked the logs on the superman/slave terminal, and they say that it can't connect to batman:54310, plus some zzz message. I figured my /etc/hosts was wrong, but in fact it is fine.
I tried to open port 54310 by changing iptables, but the output screens shown here are from after I made the changes. I'm at my wits' end. Please tell me where I'm going wrong. Do let me know if you need any more information, and I will update the question accordingly. Thanks!
UPDATE: Here are my conf files.
core-site.xml (note that I originally had batman:54310 instead of the IP address; I only changed it because I thought it would make the binding more explicit):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/app/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://130.65.153.195:54310</value>
        <description>The name of the default file system. A URI whose
        scheme and authority determine the FileSystem implementation. The
        uri's scheme determines the config property (fs.SCHEME.impl) naming
        the FileSystem implementation class. The uri's authority is used to
        determine the host, port, etc. for a filesystem.</description>
    </property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>130.65.153.195:54311</value>
        <description>The host and port that the MapReduce job tracker runs
        at. If "local", then jobs are run in-process as a single map
        and reduce task.
        </description>
    </property>
</configuration>
hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
        <description>Default block replication.
        The actual number of replications can be specified when the file is created.
        The default is used if replication is not specified in create time.
        </description>
    </property>
</configuration>
My conf/masters file is simply batman and my conf/slaves file is just:
batman
superman
ironman
Hope this clarifies things.
First things first: make sure you can ping the master from the slaves and the slaves from the master. Log in to each machine individually and ping the other two hosts, making sure they are reachable via their hostnames. It is possible that you have not added the /etc/hosts entries on the slaves.
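For reference, a sketch of the matching /etc/hosts entries every node would need (the master IP comes from the question's core-site.xml; the slave IPs are placeholders):
130.65.153.195  batman
130.65.153.196  superman
130.65.153.197  ironman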
Secondly, you need to set up passwordless SSH access. You can use ssh-keygen -t rsa and ssh-copy-id for this. This will remove the password prompts. It is a good idea to create a separate user for this (and not use root).
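A sketch of that setup, run on the master as the dedicated user ("hduser" is a placeholder name):
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa    # generate a key pair with no passphrase
ssh-copy-id hduser@batman                   # the master must also reach itself
ssh-copy-id hduser@superman
ssh-copy-id hduser@ironman
ssh hduser@superman hostname                # should print "superman" with no password prompt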
If this doesn't help, please post your log output.

Session Clustering Tomcat + terracotta on single server

I want to set up session clustering with Terracotta and two Tomcats on a single server.
I am following the instructions from:
http://artur.ejsmont.org/blog/content/how-to-setup-terracotta-session-clustering-and-replication-for-apache-tomcat-6
This is my tc-config.xml:
<?xml version="1.0" encoding="UTF-8"?>
<tc:tc-config xmlns:tc="http://www.terracotta.org/config"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://www.terracotta.org/schema/terracotta-4.xsd">
    <servers>
        <server name="nodea" host="localhost">
            <data>/home/meruvian/mydatafolder</data>
            <logs>/home/meruvian/mylogsfolder</logs>
            <l2-group-port>9530</l2-group-port>
        </server>
        <server name="nodeb" host="localhost">
            <data>/home/meruvian/mydatafolder</data>
            <logs>/home/meruvian/mylogsfolder</logs>
            <l2-group-port>9530</l2-group-port>
        </server>
    </servers>
    <clients>
        <logs>/var/log/myclientlogsfolder</logs>
        <modules>
            <module name="tim-tomcat-6.0" version="2.2.0"/>
        </modules>
    </clients>
    <application>
        <dso>
            <instrumented-classes>
                <include>
                    <class-expression>*..*</class-expression>
                </include>
                <exclude>org.apache.coyote..*</exclude>
                <exclude>org.apache.catalina..*</exclude>
                <exclude>org.apache.jasper..*</exclude>
                <exclude>org.apache.tomcat..*</exclude>
            </instrumented-classes>
            <web-applications>
                <web-application>sessionapp</web-application>
            </web-applications>
        </dso>
    </application>
</tc:tc-config>
Then, when I try to execute this command:
/start-tc-server.sh -f ~/Terracotta/terracotta-3.6.2/tc-config.xml
I get an error message like the one below:
Fatal Terracotta startup exception:
*******************************************************************************
You have not specified a name for your Terracotta server, and there are 2 servers defined in the Terracotta configuration file. The script can not automatically choose between the following server names: nodea, nodeb. Pass the desired server name to the script using the -n flag.
*******************************************************************************
What is the meaning of
<web-application>sessionapp</web-application>
Is it the context path of my app?
Can anyone help me get session clustering working with Tomcat + Terracotta?
Thanks
I am by no means an authority on Terracotta, but as far as I can tell there are two problems here:
You have specified two servers running on localhost with no port specifications. Both will try to take port 9510 (the default dso port), which will cause a conflict. You need to specify different dso ports, as sketched after this list.
Assuming you do fix the port configuration, you still have both servers running on localhost, so Terracotta needs to know which server you are trying to start.
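For example, the <servers> section might assign ports like this (a sketch; the port numbers are illustrative, and the distinct l2-group-ports and separate data/log folders are my additions, since both servers share localhost):
<server name="nodea" host="localhost">
    <data>/home/meruvian/nodea/data</data>
    <logs>/home/meruvian/nodea/logs</logs>
    <dso-port>9510</dso-port>
    <l2-group-port>9530</l2-group-port>
</server>
<server name="nodeb" host="localhost">
    <data>/home/meruvian/nodeb/data</data>
    <logs>/home/meruvian/nodeb/logs</logs>
    <dso-port>9511</dso-port>
    <l2-group-port>9531</l2-group-port>
</server>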
Use this command to start nodea:
/start-tc-server.sh -f ~/Terracotta/terracotta-3.6.2/tc-config.xml -n nodea
Similarly for nodeb.
See if that helps.
