I am using Docker 1.12.2 (build bb80604) with swarm mode enabled.
I have:
- a swarm cluster with 2 leader and 3 slave nodes
- a named volume on each of the slave and leader nodes
- Elasticsearch running on 2 master servers
Volume create command:
docker volume create -d local-persist -o mountpoint=/data/docker/swarm/elasticsearch --name esvolume
Now when I run the docker service create command to create 5 replicas of Elasticsearch, 3 replicas start (1 on each slave server) whereas the remaining 2 replicas fail:
docker service create --replicas 5 --name esdata \
--restart-max-attempts 5 --network myesnetwork \
-e CLUSTER_NAME=swarmescluster \
-e MASTER_NODES=esmaster \
--mount type=volume,src=esvolume,dst=/var/lib/elasticsearch \
--mount type=volume,src=esvolume,dst=/var/log/elasticsearch \
myimagename
The error on failure is:
Caused by: java.lang.IllegalStateException: failed to obtain node locks, tried [[/var/lib/elasticsearch/swarmescluster]] with lock id [0]; maybe these locations are not writable or multiple nodes were started without increasing [node.max_local_storage_nodes] (was [1])?
at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:259) ~[elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.node.Node.<init>(Node.java:240) ~[elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.node.Node.<init>(Node.java:220) ~[elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:191) ~[elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:191) ~[elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:286) ~[elasticsearch-5.0.0.jar:5.0.0]
Questions
How can I configure the replicas to write to the same path or to a dynamic path? (I need persistent data.)
If I want to set the value of 'node.max_local_storage_nodes' when a replica is created, how do I do it at runtime?
I am trying to set up a SYBASE cluster on RedHat 7.5 using Pacemaker. I want Active/Passive mode, where SYBASE runs on only a single node at a time. Everything works fine during configuration, but when the standby node reboots, the SYBASE resource tries to start on node 2, which should not happen while it is up and running on node 1.
I have configured Pacemaker as:
Resource Group: sybase-rg
lvm-sybasedev-res (ocf::heartbeat:LVM): Started sdp-1
lvm-databasedev-res (ocf::heartbeat:LVM): Started sdp-1
sybase-IP (ocf::heartbeat:IPaddr2): Started sdp-1
sybase-res (ocf::heartbeat:sybaseASE): Started sdp-1
- lvm-sybasedev-res and lvm-databasedev-res are there to give shared volume (iSCSI) access to the node where SYBASE is running at the time.
- The sybase-res resource was created using the command below:
pcs resource create sybase-res ocf:heartbeat:sybaseASE server_name="SYBASE" db_user="sa" \
db_passwd="password" sybase_home="/global/sdp/sybase" sybase_ase="ASE-15_0" \
sybase_ocs="OCS-15_0" interfaces_file="/global/sdp/sybase/interfaces" \
sybase_user="sybase" --group sybase-rg --disable
I have a colocation constraint set up to keep all resources in the sybase-rg resource group on the same node.
I was expecting that if sybase-rg is up and running on node 1 (sdp-1), a reboot of node 2 (sdp-2) would not affect sybase-res, because it is the inactive node that is rebooting.
Am I missing something? Any help is welcome.
Regards,
I'm trying to use YARN node labels to tag worker nodes, but when I run applications on YARN (Spark or a simple YARN app), those applications cannot start.
- With Spark, when specifying --conf spark.yarn.am.nodeLabelExpression="my-label", the job cannot start (blocked on Submitted application [...], see details below).
- With a YARN application (like distributedshell), when specifying -node_label_expression my-label, the application cannot start either.
Here are the tests I have made so far.
YARN node labels setup
I'm using Google Dataproc to run my cluster (for example: 4 workers, 2 of them on preemptible nodes). My goal is to force any YARN application master to run on a non-preemptible node; otherwise the node can be shut down at any time, making the application fail hard.
I'm creating the cluster using YARN properties (--properties) to enable node labels :
gcloud dataproc clusters create \
my-dataproc-cluster \
--project [PROJECT_ID] \
--zone [ZONE] \
--master-machine-type n1-standard-1 \
--master-boot-disk-size 10 \
--num-workers 2 \
--worker-machine-type n1-standard-1 \
--worker-boot-disk-size 10 \
--num-preemptible-workers 2 \
--properties 'yarn:yarn.node-labels.enabled=true,yarn:yarn.node-labels.fs-store.root-dir=/system/yarn/node-labels'
Versions of packaged Hadoop and Spark :
Hadoop version : 2.8.2
Spark version : 2.2.0
After that, I create a label (my-label), and update the two non-preemptible workers with this label :
yarn rmadmin -addToClusterNodeLabels "my-label(exclusive=false)"
yarn rmadmin -replaceLabelsOnNode "\
[WORKER_0_NAME].c.[PROJECT_ID].internal=my-label \
[WORKER_1_NAME].c.[PROJECT_ID].internal=my-label"
I can see the created label in YARN Web UI :
Spark
When I run a simple example (SparkPi) without specifying info about node labels :
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
/usr/lib/spark/examples/jars/spark-examples.jar \
10
In the Scheduler tab on YARN Web UI, I see the application launched on <DEFAULT_PARTITION>.root.default.
But when I run the job specifying spark.yarn.am.nodeLabelExpression to set the location of the Spark application master :
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
--conf spark.yarn.am.nodeLabelExpression="my-label" \
/usr/lib/spark/examples/jars/spark-examples.jar \
10
The job is not launched. From YARN Web UI, I see :
YarnApplicationState: ACCEPTED: waiting for AM container to be allocated, launched and register with RM.
Diagnostics: Application is Activated, waiting for resources to be assigned for AM. Details : AM Partition = my-label ; Partition Resource = <memory:6144, vCores:2> ; Queue's Absolute capacity = 0.0 % ; Queue's Absolute used capacity = 0.0 % ; Queue's Absolute max capacity = 0.0 % ;
I suspect that the queue related to the label partition (not <DEFAULT_PARTITION>, the other one) does not have sufficient resources to run the job :
Here, Used Application Master Resources is <memory:1024, vCores:1>, but the Max Application Master Resources is <memory:0, vCores:0>. That explains why the application cannot start, but I can't figure out how to change this.
I tried to update different parameters, but without success :
yarn.scheduler.capacity.root.default.accessible-node-labels=my-label
Or increasing those properties :
yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.capacity
yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.maximum-capacity
yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.maximum-am-resource-percent
yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.user-limit-factor
yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.minimum-user-limit-percent
without success either.
YARN Application
The issue is the same when running a YARN application :
hadoop jar \
/usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar \
-shell_command "echo ok" \
-jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar \
-queue default \
-node_label_expression my-label
The application cannot start, and the log keeps repeating :
INFO distributedshell.Client: Got application report from ASM for, appId=6, clientToAMToken=null, appDiagnostics= Application is Activated, waiting for resources to be assigned for AM. Details : AM Partition = my-label ; Partition Resource = <memory:6144, vCores:2> ; Queue's Absolute capacity = 0.0 % ; Queue's Absolute used capacity = 0.0 % ; Queue's Absolute max capacity = 0.0 % ; , appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1520354045946, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, [...]
If I don't specify -node_label_expression my-label, the application starts on <DEFAULT_PARTITION>.root.default and succeeds.
Questions
Am I doing something wrong with the labels? I followed the official documentation and this guide.
Is this a problem specific to Dataproc? The guides above seem to work in other environments.
Maybe I need to create a specific queue and associate it with my label? But since I'm running a "one-shot" cluster for a single Spark job, I don't need specific queues; running jobs on the default root one is fine for my use case.
Thanks for helping
A Google engineer answered us (on a private issue we raised, not in the PIT) and gave us a solution: passing an initialization script at Dataproc cluster creation. I don't think the issue comes from Dataproc; this is basically just YARN configuration. The script sets the following properties in capacity-scheduler.xml, just after creating the node label (my-label) :
<property>
<name>yarn.scheduler.capacity.root.accessible-node-labels</name>
<value>my-label</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.accessible-node-labels.my-label.capacity</name>
<value>100</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.accessible-node-labels</name>
<value>my-label</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.capacity</name>
<value>100</value>
</property>
According to the comment that goes along with the script, these properties "set accessible-node-labels on both root (the root queue) and root.default (the default queue applications actually get run on)". The root.default part is what was missing in my tests. Capacity for both is set to 100.
Then, restarting YARN (systemctl restart hadoop-yarn-resourcemanager.service) is needed to apply the modifications.
After that, I was able to start the jobs that failed to start in my question.
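For anyone who wants to generate these entries for another label instead of typing them by hand, here is a minimal Python sketch. It is not the script Google provided; the function name node_label_properties and its parameters are made up for illustration. It only prints the same four <property> blocks shown above and leaves placing them inside capacity-scheduler.xml to you.

```python
from xml.sax.saxutils import escape


def node_label_properties(label, queues=("root", "root.default"), capacity=100):
    """Build capacity-scheduler.xml <property> blocks for one node label.

    Hypothetical helper: for every queue it emits the accessible-node-labels
    entry and the matching per-label capacity entry.
    """
    blocks = []
    for queue in queues:
        prefix = f"yarn.scheduler.capacity.{queue}.accessible-node-labels"
        entries = (
            (prefix, label),                          # queue may use the label
            (f"{prefix}.{label}.capacity", capacity)  # share of the label partition
        )
        for name, value in entries:
            blocks.append(
                "<property>\n"
                f"<name>{escape(name)}</name>\n"
                f"<value>{escape(str(value))}</value>\n"
                "</property>"
            )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(node_label_properties("my-label"))
```

The ResourceManager still has to be restarted afterwards, as described above.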
I hope this helps people with the same or similar issues.
Hey, I have a cluster ID mismatch for some reason. I had it on one node; it disappeared after I cleared the data dir a few times and changed the cluster token and node names, but then it appeared on another node.
Here is the script I use:
IP0=10.150.0.1
IP1=10.150.0.2
IP2=10.150.0.3
IP3=10.150.0.4
NODENAME0=node0
NODENAME1=node1
NODENAME2=node2
NODENAME3=node3
# changing these on each box
THISIP=$IP2
THISNODENAME=$NODENAME2
etcd --name $THISNODENAME --initial-advertise-peer-urls http://$THISIP:2380 \
--data-dir /root/etcd-data \
--listen-peer-urls http://$THISIP:2380 \
--listen-client-urls http://$THISIP:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://$THISIP:2379 \
--initial-cluster-token etcd-cluster-2 \
--initial-cluster $NODENAME0=http://$IP0:2380,$NODENAME1=http://$IP1:2380,$NODENAME2=http://$IP2:2380,$NODENAME3=http://$IP3:2380 \
--initial-cluster-state new
I get
2016-11-11 22:13:12.090515 I | etcdmain: etcd Version: 2.3.7
2016-11-11 22:13:12.090643 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2016-11-11 22:13:12.090713 I | etcdmain: listening for peers on http://10.150.0.3:2380
2016-11-11 22:13:12.090745 I | etcdmain: listening for client requests on http://10.150.0.3:2379
2016-11-11 22:13:12.090771 I | etcdmain: listening for client requests on http://127.0.0.1:2379
2016-11-11 22:13:12.090960 I | etcdserver: name = node2
2016-11-11 22:13:12.090976 I | etcdserver: data dir = /root/etcd-data
2016-11-11 22:13:12.090983 I | etcdserver: member dir = /root/etcd-data/member
2016-11-11 22:13:12.090990 I | etcdserver: heartbeat = 100ms
2016-11-11 22:13:12.090995 I | etcdserver: election = 1000ms
2016-11-11 22:13:12.091001 I | etcdserver: snapshot count = 10000
2016-11-11 22:13:12.091011 I | etcdserver: advertise client URLs = http://10.150.0.3:2379
2016-11-11 22:13:12.091269 I | etcdserver: restarting member 7fbd572038b372f6 in cluster 4e73d7b9b94fe83b at commit index 4
2016-11-11 22:13:12.091317 I | raft: 7fbd572038b372f6 became follower at term 8
2016-11-11 22:13:12.091346 I | raft: newRaft 7fbd572038b372f6 [peers: [], term: 8, commit: 4, applied: 0, lastindex: 4, lastterm: 1]
2016-11-11 22:13:12.091516 I | etcdserver: starting server... [version: 2.3.7, cluster version: to_be_decided]
2016-11-11 22:13:12.091869 E | etcdmain: failed to notify systemd for readiness: No socket
2016-11-11 22:13:12.091894 E | etcdmain: forgot to set Type=notify in systemd service file?
2016-11-11 22:13:12.096380 N | etcdserver: added member 7508b3e625cfed5 [http://10.150.0.4:2380] to cluster 4e73d7b9b94fe83b
2016-11-11 22:13:12.099800 N | etcdserver: added member 14c76eb5d27acbc5 [http://10.150.0.1:2380] to cluster 4e73d7b9b94fe83b
2016-11-11 22:13:12.100957 N | etcdserver: added local member 7fbd572038b372f6 [http://10.150.0.2:2380] to cluster 4e73d7b9b94fe83b
2016-11-11 22:13:12.102711 N | etcdserver: added member d416fca114f17871 [http://10.150.0.3:2380] to cluster 4e73d7b9b94fe83b
2016-11-11 22:13:12.134330 E | rafthttp: request cluster ID mismatch (got cfd5ef74b3dcf6fe want 4e73d7b9b94fe83b)
The other members are not even running, so how is that possible?
Thank you
For all those who stumble upon this from Google:
The error is about a peer member that tries to join the cluster with the same name as another member (probably an old instance) that already exists in the cluster (same peer name, but a different ID; this is the problem).
You should delete the peer and re-add it, as shown in this helpful post:
In order to fix this it was pretty simple, first we had to log into an existing working server on the rest of the cluster and remove server00 from its member list:
etcdctl member remove <UID>
This frees up the ability to allow the new server00 to join but we needed to simply tell the cluster it could by issuing the add command:
etcdctl member add server00 http://1.2.3.4:2380
If you follow the logs on server00 you'll then see that everything springs into life. You can confirm this with the commands:
etcdctl member list
etcdctl cluster-health
Use "etcdctl member list" to find the IDs of the current members, find the one that tries to join the cluster with the wrong ID, then delete that peer from the members with "etcdctl member remove <UID>" and try to rejoin it.
Hope it helps.
I just ran into this same issue, 2 years later. Dmitry's answer is fine but misses what the OP likely did wrong in the first place when setting up an etcd cluster.
Running an etcd instance with "--initial-cluster-state new" at any point will generate a cluster ID in the data directory. If you later try to join an existing cluster, it will use that old generated cluster ID (which is when the mismatch error occurs). Yes, technically the OP had an "old cluster", but the more likely and very common case is that someone standing up their first cluster doesn't notice that the procedure has to change for the second node. I find that etcd generally fails to provide a good usage model.
So, removing the member (you don't really need to if the new node never joined successfully) and/or deleting the new node's data directory will "fix" the issue, but the real problem is how the OP set up the 2nd cluster node.
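If you want to check which side is holding the stale ID, you can pull the two IDs out of the error line and compare them with the cluster ID the node prints at startup. Below is a rough Python sketch using the log lines from the question; the function names are made up, and the reading of "want" as the local node's cluster ID is only what those log lines suggest (the startup line and the "want" value match).

```python
import re

# Patterns taken from the log lines quoted in the question (etcd 2.3.7).
MISMATCH = re.compile(
    r"request cluster ID mismatch \(got (?P<got>[0-9a-f]+) want (?P<want>[0-9a-f]+)\)"
)
RESTART = re.compile(
    r"restarting member (?P<member>[0-9a-f]+) in cluster (?P<cluster>[0-9a-f]+)"
)


def parse_mismatch(line):
    """Return (got, want) from a cluster ID mismatch line, or None."""
    match = MISMATCH.search(line)
    return (match.group("got"), match.group("want")) if match else None


def local_cluster_id(line):
    """Return the cluster ID from the 'restarting member' startup line, or None."""
    match = RESTART.search(line)
    return match.group("cluster") if match else None


startup = ("2016-11-11 22:13:12.091269 I | etcdserver: restarting member "
           "7fbd572038b372f6 in cluster 4e73d7b9b94fe83b at commit index 4")
error = ("2016-11-11 22:13:12.134330 E | rafthttp: request cluster ID mismatch "
         "(got cfd5ef74b3dcf6fe want 4e73d7b9b94fe83b)")

got, want = parse_mismatch(error)
if __name__ == "__main__":
    print("local cluster:", local_cluster_id(startup))
    print("peer sent:", got, "local expects:", want)
```

In the question's log the local cluster ID equals the "want" value, so the other ID belongs to whatever peer is sending requests.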
Here's an example of the setup nuance: (sigh... thanks for that etcd...)
# On the 1st node (I used Centos7 minimal, with etcd installed)
sudo firewall-cmd --permanent --add-port=2379/tcp
sudo firewall-cmd --permanent --add-port=2380/tcp
sudo firewall-cmd --reload
export CL_NAME=etcd1
export HOST=$(hostname)
export IP_ADDR=$(ip -4 addr show ens33 | grep -oP '(?<=inet\s)\d+(\.\d+){3}')
# turn on etcdctl v3 api support, why is this not default?!
export ETCDCTL_API=3
sudo etcd --name $CL_NAME --data-dir ~/data --advertise-client-urls=http://127.0.0.1:2379,https://$IP_ADDR:2379 --listen-client-urls=https://0.0.0.0:2379 --initial-advertise-peer-urls https://$IP_ADDR:2380 --listen-peer-urls https://$IP_ADDR:2380 --initial-cluster-state new
Ok, the first node is running. The cluster data is in the ~/data directory. In future runs you only need (note that --initial-cluster-state isn't needed):
sudo etcd --name $CL_NAME --data-dir ~/data --advertise-client-urls=http://127.0.0.1:2379,https://$IP_ADDR:2379 --listen-client-urls=https://0.0.0.0:2379 --initial-advertise-peer-urls https://$IP_ADDR:2380 --listen-peer-urls https://$IP_ADDR:2380
Next, add your 2nd node's expected member name and peer URLs:
etcdctl --endpoints="https://127.0.0.1:2379" member add etcd2 --peer-urls="http://<next node's IP address>:2380"
Adding the member is important. You won't be able to successfully join without doing it first.
# Next on the 2nd/new node
export CL_NAME=etcd2
export HOST=$(hostname)
export IP_ADDR=$(ip -4 addr show ens33 | grep -oP '(?<=inet\s)\d+(\.\d+){3}')
sudo etcd --name $CL_NAME --data-dir ~/data --advertise-client-urls=https://127.0.0.1:2379,https://$IP_ADDR:2379 --listen-client-urls=https://0.0.0.0:2379 --initial-advertise-peer-urls https://$IP_ADDR:2380 --listen-peer-urls https://$IP_ADDR:2380 --initial-cluster-state existing --initial-cluster="etcd1=http://<IP of 1st node>:2380,etcd2=http://$IP_ADDR:2380"
Note the annoying extra arguments here. --initial-cluster must list 100% of the nodes in the cluster... which doesn't matter after you join the cluster because cluster data will be replicated anyway... Also "--initial-cluster-state existing" is needed.
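Since --initial-cluster has to name every member, it is easy to get one entry wrong when typing it on each box. A small Python sketch that builds the string from a single mapping; the helper name initial_cluster is made up, and the names and IPs are the ones from the question's script.

```python
def initial_cluster(members, scheme="http", port=2380):
    """Build the value for --initial-cluster from a {name: ip} mapping.

    Hypothetical helper; every member gets a name=scheme://ip:port entry
    and the entries are joined with commas.
    """
    return ",".join(f"{name}={scheme}://{ip}:{port}" for name, ip in members.items())


# Names and addresses from the question's script.
members = {
    "node0": "10.150.0.1",
    "node1": "10.150.0.2",
    "node2": "10.150.0.3",
    "node3": "10.150.0.4",
}

if __name__ == "__main__":
    print(initial_cluster(members))
```

Generating the value once and copying it to every node keeps the member list identical everywhere.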
Again, after the 1st time the 2nd node runs/joins, you can run it without any cluster arguments:
sudo etcd --name $CL_NAME --data-dir ~/data --advertise-client-urls=http://127.0.0.1:2379,https://$IP_ADDR:2379 --listen-client-urls=https://0.0.0.0:2379 --initial-advertise-peer-urls https://$IP_ADDR:2380 --listen-peer-urls https://$IP_ADDR:2380
Sure, you could keep running etcd with all the cluster settings in there, but they "might" get ignored in favor of what's in the data directory. Remember that if you join a 3rd node, knowledge of the new member is replicated to the other nodes, and those "initial" cluster settings could be completely false/misleading in the future when your cluster changes. So run your joined nodes with no initial cluster settings unless you are actually joining a cluster.
Also, last bit to impart: you should run at least 3 nodes in a cluster, otherwise a single failure stops the whole cluster. With 2 nodes, when 1 node goes down or they get disconnected, the remaining node cannot reach a majority, so it cannot elect itself and spins in an election loop. Clients can't talk to an etcd service that's in election mode... Great availability! You need a minimum of 3 nodes to survive 1 going down.
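The arithmetic behind that: Raft needs a majority of the members to agree, so the quorum is floor(n/2) + 1 and the cluster tolerates n minus quorum failures. A quick Python illustration (the function names are just for this example):

```python
def quorum(n):
    """Smallest majority of an n-member cluster."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many members can be lost while a majority is still reachable."""
    return n - quorum(n)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} members: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

With 2 members the quorum is 2, so losing either one stops the cluster; going from 3 to 4 members does not buy any extra tolerance either.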
In my case I got the error
rafthttp: request cluster ID mismatch (got 1b3a88599e79f82b want b33939d80a381a57)
due to an incorrect config on one node.
Two of my nodes had this in their config:
env ETCD_INITIAL_CLUSTER="etcd-01=http://172.16.50.101:2380,etcd-02=http://172.16.50.102:2380,etcd-03=http://172.16.50.103:2380"
and one node had:
env ETCD_INITIAL_CLUSTER="etcd-01=http://172.16.50.101:2380"
To resolve the problem I stopped etcd on all nodes, fixed the incorrect config,
deleted the /var/lib/etcd/member folder on all nodes, restarted etcd on all nodes, and voila!
P.S.
/var/lib/etcd is the folder where etcd saves its data in my case.
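A mismatch like this is easy to spot if you collect the ETCD_INITIAL_CLUSTER value from every node and compare them. A minimal Python sketch; the function name is made up, and which node carried the short value is not stated above, so the sample arbitrarily assigns it to etcd-03.

```python
from collections import Counter


def inconsistent_nodes(configs):
    """Return the nodes whose ETCD_INITIAL_CLUSTER differs from the most common value.

    configs maps a node name to its ETCD_INITIAL_CLUSTER string. Entries are
    compared as sets, so the order of members inside the string does not matter.
    """
    normalized = {node: frozenset(value.split(",")) for node, value in configs.items()}
    majority, _ = Counter(normalized.values()).most_common(1)[0]
    return sorted(node for node, value in normalized.items() if value != majority)


FULL = ("etcd-01=http://172.16.50.101:2380,"
        "etcd-02=http://172.16.50.102:2380,"
        "etcd-03=http://172.16.50.103:2380")

# Assumption for illustration only: the node with the short value is etcd-03.
configs = {
    "etcd-01": FULL,
    "etcd-02": FULL,
    "etcd-03": "etcd-01=http://172.16.50.101:2380",
}

if __name__ == "__main__":
    print(inconsistent_nodes(configs))
```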
My --data-dir is /var/etcd/data; removing and recreating it worked for me. It seems that something from a previous etcd cluster I had made was left in this directory and affected the etcd settings.
I faced the same problem: our leader etcd server went down, and after replacing it with a new one we were getting this error:
rafthttp: request sent was ignored (cluster ID mismatch)
The new server was looking for the old cluster ID and generating a random local cluster because of a misconfiguration.
I followed these steps to fix the issue:
1) Log in to another working member of the cluster and remove the unreachable member from the cluster:
etcdctl cluster-health
etcdctl member remove member-id
2) Log in to the new server and stop the etcd process if it is running: systemctl stop etcd2
3) Remove the data from the data directory: rm -rf /var/etcd2/data (keep a backup of this data in another folder before deleting it).
4) Now start etcd on the new server with the --initial-cluster-state existing parameter; don't use --initial-cluster-state new if you are adding a server to an existing cluster.
5) Now go back to one of the running etcd servers and add the new member to the cluster: etcdctl member add node0 http://$IP:2380
I spent a lot of time debugging this issue, and now my cluster is running healthy with all members. Hope this information helps.
Add a new node to an existing etcd cluster:
etcdctl member add <new_node_name> --peer-urls="http://<new_node_ip>:2380"
Note: if TLS is enabled, replace http with https.
Run etcd on the new node. It is important to add "--initial-cluster-state existing"; it tells the new node to join the existing cluster instead of creating a new one.
etcd --name <new_node_name> --initial-cluster-state existing ...
Check the result:
etcdctl member list
I am setting up a Hadoop testbed that has two nodes / servers (A, B). Server B runs the Docker daemon and other Hadoop-related services such as the DataNode, Secondary NameNode and NodeManager, while server A has the ResourceManager and NameNode. When a container is launched on Server B using DCE (Docker Container Executor), I want to attach a volume to it.
Can somebody kindly suggest how I could do this in the DCE environment?
As far as I know, we can add a volume to a Docker container in 2 ways:
1) Specify the volume in the Dockerfile
FROM ubuntu
RUN mkdir /myvol
RUN echo "hello world" > /myvol/greeting
VOLUME /myvol
2) Specify the volume at run time
docker run -it -v /hdfs/foldername dockerrepositoryname:version /bin/bash
For more details refer to https://docs.docker.com/engine/reference/builder/
Hope this helps!
I am very new to Hive as well as AWS EMR. As per my requirement, I need to create the Hive metastore outside the cluster (from AWS EMR to AWS RDS).
I followed the instructions given in
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-dev-create-metastore-outside.html
I made changes in hive-site.xml and was able to set up the Hive metastore on an Amazon RDS MySQL server. To bring the changes into effect, I am currently rebooting the complete cluster so that Hive starts storing the metastore in AWS RDS. This way it is working.
But I want to avoid rebooting the cluster. Is there any way I can just restart the service?
Just for those who are gonna come from Google
To restart any EMR service
In order to restart a service in EMR, perform the following actions:
Find the name of the service by running the following command:
initctl list
For example, the YARN Resource Manager service is named “hadoop-yarn-resourcemanager”.
Stop the service by running the following command:
sudo stop hadoop-yarn-resourcemanager
Wait a few seconds, then start the service by running the following command:
sudo start hadoop-yarn-resourcemanager
Note: Stop/start is required; do not use the restart command.
Verify that the process is running by running the following command:
sudo status hadoop-yarn-resourcemanager
Check for the process using ps, and then check the log file for any errors in the log directory /var/log/.
Source : https://aws.amazon.com/premiumsupport/knowledge-center/restart-service-emr/
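If you do this often, the stop / wait / start / status sequence can be wrapped in a few lines. This is only a sketch in Python, not something from the AWS article: the function names and the 5 second default wait are my own choices, and it defaults to a dry run that just prints the commands.

```python
import subprocess
import time


def restart_commands(service):
    """Commands for the stop, start and status steps listed above."""
    return [
        ["sudo", "stop", service],
        ["sudo", "start", service],
        ["sudo", "status", service],
    ]


def restart(service, wait_seconds=5, dry_run=True):
    """Run (or, by default, only print) the stop/start/status sequence."""
    for step, command in enumerate(restart_commands(service)):
        if dry_run:
            print(" ".join(command))
            continue
        subprocess.run(command, check=True)
        if step == 0:
            # "Wait a few seconds" between stop and start.
            time.sleep(wait_seconds)


if __name__ == "__main__":
    restart("hadoop-yarn-resourcemanager")
```

Call restart("hadoop-yarn-resourcemanager", dry_run=False) on the master node to actually run the commands.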
sudo stop hive-metastore
sudo start hive-metastore
On EMR 5.x I have found this to work:
hive --service metastore --stop
hive --service metastore --start
For me this approach worked:
1) Get the pid
2) Kill the process
3) Process restarts by itself
Commands for 1 & 2:
ps aux | grep MetaStore
sudo -u hive kill <pid from above>
If you are not familiar with ps, you can use the following command, which shows the header line (including PID) and only the line for the Hive metastore process:
ps aux | egrep "MetaStore|PID" | grep -v grep
The Hive metastore restarted by itself. Validate with ps again; the PID will have changed.
ps aux | grep MetaStore
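If you want to script the "get the pid" step, the PID is the second column of the ps aux output. A small Python sketch that picks it out of captured output; the function name and the sample ps output (user names, PIDs, java command line) are invented for illustration.

```python
def metastore_pids(ps_output):
    """Return the PIDs of lines mentioning MetaStore, skipping the grep itself."""
    pids = []
    for line in ps_output.splitlines():
        if "MetaStore" in line and "grep" not in line:
            fields = line.split(None, 10)  # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
            pids.append(int(fields[1]))
    return pids


# Made-up sample of what `ps aux | grep MetaStore` might print.
SAMPLE = """\
USER       PID %CPU %MEM     VSZ    RSS TTY      STAT START   TIME COMMAND
hive      4242  0.5  3.1 2000000 250000 ?        Sl   10:00   0:30 java -cp /usr/lib/hive/lib/* org.apache.hadoop.hive.metastore.HiveMetaStore
hadoop    5151  0.0  0.0  110000    900 pts/0    S+   10:05   0:00 grep MetaStore
"""

if __name__ == "__main__":
    print(metastore_pids(SAMPLE))
```

Running it before and after the kill lets you confirm that the PID changed.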
You don't have to restart the entire cluster. While launching the cluster, you can specify a hive-site.xml file with the RDS details. If you did not use this option and are making the changes manually after launching the cluster, you still don't need to restart the entire cluster; just restart the hive-metastore service. The Hive metastore runs on the master node only.
You can launch the cluster in several ways:
1) AWS console
2) Using API (Java, Python etc)
3) Using AWS cli
You can keep hive-site.xml in S3 and apply it as a bootstrap action while launching the cluster. The AWS API lets you specify a custom hive-site.xml from S3 instead of the one created by default.
If you are using Hive from the master machine alone, you don't have to make the changes on all the machines.
An example of specifying hive-site.xml while launching EMR using the AWS CLI is given below:
aws emr create-cluster --name "Test cluster" --ami-version 3.3 --applications Name=Hue Name=Hive Name=Pig \
--use-default-roles --ec2-attributes KeyName=myKey \
--instance-type m3.xlarge --instance-count 3 \
--bootstrap-actions Name="Install Hive Site Configuration",Path="s3://elasticmapreduce/libs/hive/hive-script",\
Args=["--base-path","s3://elasticmapreduce/libs/hive","--install-hive-site","--hive-site=s3://mybucket/hive-site.xml","--hive-versions","latest"]