I have installed Hadoop 2.7.1 on an Ubuntu virtual machine. I want to run the K-means workload with HiBench, but when I execute the prepare.sh script I get the following error:
patching args=
Parsing conf: /home/hduser/HiBench/conf/00-default-properties.conf
Parsing conf: /home/hduser/HiBench/conf/01-default-streamingbench.conf
Parsing conf: /home/hduser/HiBench/conf/10-data-scale-profile.conf
Parsing conf: /home/hduser/HiBench/conf/20-samza-common.conf
Parsing conf: /home/hduser/HiBench/conf/30-samza-workloads.conf
Parsing conf: /home/hduser/HiBench/workloads/kmeans/conf/00-kmeans-default.conf
Parsing conf: /home/hduser/HiBench/workloads/kmeans/conf/10-kmeans-userdefine.conf
Probing spark verison, may last long at first time...
probe sleep jar: /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.2-tests.jar
Traceback (most recent call last):
File "/home/hduser/HiBench/bin/functions/load-config.py", line 556, in <module>
load_config(conf_root, workload_root, workload_folder, patching_config)
File "/home/hduser/HiBench/bin/functions/load-config.py", line 165, in load_config
check_config()
File "/home/hduser/HiBench/bin/functions/load-config.py", line 172, in check_config
assert HibenchConf.get(prop_name, None) is not None, "Mandatory configure missing: %s" % prop_name
AssertionError: Mandatory configure missing: hibench.hdfs.master
/home/hduser/HiBench/bin/functions/workload-functions.sh: line 39: .: filename argument required
.: usage: . filename [arguments]
start HadoopPrepareKmeans bench ./prepare.sh: line 25: INPUT_HDFS: unbound variable
I have set the configuration in the file 99-user_defined_properties.conf.template. The settings are:
# Hadoop home
hibench.hadoop.home /usr/local/hadoop/bin
# Spark home
hibench.spark.home /PATH/TO/YOUR/SPARK/ROOT
# HDFS master, set according to hdfs-site.xml
hibench.hdfs.master hdfs://localhost:54310
# Spark master
# standalone mode: `spark://xxx:7077`
# YARN mode: `yarn-client`
# unset: fallback to `local[1]`
hibench.spark.master yarn-client
How can I solve this?
The key error is:
AssertionError: Mandatory configure missing: hibench.hdfs.master
It means HiBench never saw your hibench.hdfs.master setting, so your configuration file is not being picked up at all.
Did you name your file properly? 99-user_defined_properties.conf.template is only the template; the actual configuration file is supposed to be named 99-user_defined_properties.conf.
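A minimal fix is to copy the template to the real name (the path below is assumed from the conf files shown in the question's log):
# path assumed from the question's log output
cp /home/hduser/HiBench/conf/99-user_defined_properties.conf.template \
   /home/hduser/HiBench/conf/99-user_defined_properties.conf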
hibench.hdfs.master sets the address of the HDFS master node. The default value is hdfs://127.0.0.1:8020, but if your cluster uses a different address you have to update it in hadoop.conf. Usually you can find the right address in Hadoop's configuration file core-site.xml.
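For example, with the hdfs://localhost:54310 value from the question, the matching core-site.xml entry would look roughly like this (property name per Hadoop 2.x; older versions use fs.default.name):
<!-- core-site.xml: the address hibench.hdfs.master should mirror -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:54310</value>
</property>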
I faced the same error. I solved it by manually setting hibench.master.hostname and hibench.slaves.hostname in the hibench.conf file, as sketched below. Also make sure the HDFS port in the hadoop.conf file matches what your Hadoop configuration files specify.
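On a single-node setup like the one in the question, that would be roughly (the hostnames are assumptions for your cluster):
# hibench.conf -- hostnames assumed for a single-node setup
hibench.master.hostname localhost
hibench.slaves.hostname localhost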
I am trying to use Apache Rya for some tests (https://rya.apache.org/).
For those who are familiar with Rya and RDF stores: I am trying to do a bulk load, which is explained here: https://github.com/apache/rya/blob/master/extras/rya.manual/src/site/markdown/loaddata.md.
Briefly, I should copy a jar file 'mapreduce/target/rya.mapreduce-<version>-shaded.jar' into an HDFS volume and then run the following command:
hadoop hdfs://volume/rya.mapreduce-<version>-shaded.jar org.apache.rya.accumulo.mr.tools.RdfFileInputTool -Dac.zk=localhost:2181 -Dac.instance=accumulo -Dac.username=root -Dac.pwd=secret -Drdf.tablePrefix=rya_ -Drdf.format=N-Triples hdfs://volume/dir1,hdfs://volume/dir2,hdfs://volume/file1.nt
Well, I copied the needed jar and the input files into HDFS using the bin/hadoop fs -put command and verified that they are really there. My problem is that when I run the command from the official example, I get the following error lines, which I could not understand or resolve:
/project/hadoop/libexec/hadoop-functions.sh: line 2393: HADOOP_HDFS://LOCALHOST:9000/USER/RYA.MAPREDUCE-4.0.0-INCUBATING-SHADED.JAR_USER: invalid variable name
/project/hadoop/libexec/hadoop-functions.sh: line 2358: HADOOP_HDFS://LOCALHOST:9000/USER/RYA.MAPREDUCE-4.0.0-INCUBATING-SHADED.JAR_USER: invalid variable name
/project/hadoop/libexec/hadoop-functions.sh: line 2453: HADOOP_HDFS://LOCALHOST:9000/USER/RYA.MAPREDUCE-4.0.0-INCUBATING-SHADED.JAR_OPTS: invalid variable name
Error: Could not find or load main class hdfs:..localhost:9000.user.rya.mapreduce-4.0.0-incubating-shaded.jar
For information: all environment variables, HADOOP_HOME and HADOOP_PREFIX, are properly set.
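For reference, this is roughly how I staged the files (the version is taken from the error output above; paths are illustrative):
# upload the shaded jar and an input file, then verify they are there
bin/hadoop fs -put mapreduce/target/rya.mapreduce-4.0.0-incubating-shaded.jar hdfs://volume/
bin/hadoop fs -put file1.nt hdfs://volume/
bin/hadoop fs -ls hdfs://volume/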
My project requires forwarding logs to a socket using rsyslog, which provides the omuxsock output module for exactly this. When I try to use it following the standard example, I see the error below.
rsyslogd: could not load module '/usr/lib/rsyslog/omuxsock.so', dlopen: Error loading shared library /usr/lib/rsyslog/omuxsock.so: No such file or directory [v8.34.0 try http://www.rsyslog.com/e/2066 ]
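For reference, my configuration follows the standard omuxsock example from the rsyslog documentation, roughly:
# load the omuxsock output module and forward everything to a local socket
$ModLoad omuxsock
$OMUxSockSocket /var/run/mysock
*.* :omuxsock: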
Could someone please help me in solving this issue?
System Info
Alpine container = v3.8
rsyslog = 8.34.0-r0
Full log:
/ # rsyslogd -N6 | head -10
rsyslogd: version 8.34.0, config validation run (level 6), master config /etc/rsyslog.conf
rsyslogd: could not load module '/usr/lib/rsyslog/omuxsock.so', dlopen: Error loading shared library /usr/lib/rsyslog/omuxsock.so: No such file or directory [v8.34.0 try http://www.rsyslog.com/e/2066 ]
rsyslogd: invalid or yet-unknown config file command 'OMUxSockSocket' - have you forgotten to load a module? [v8.34.0 try http://www.rsyslog.com/e/3003 ]
rsyslogd: error during parsing file /etc/rsyslog.conf, on or before line 30: errors occured in file '/etc/rsyslog.conf' around line 30 [v8.34.0 try http://www.rsyslog.com/e/2207 ]
rsyslogd: error during parsing file /etc/rsyslog.d/rsyslog_stats.conf, on or before line 15: invalid character '\' in expression - is there an invalid escape sequence somewhere? [v8.34.0 try http://www.rsyslog.com/e/2207 ]
I am setting up a Kubernetes cluster with Ansible.
This runs fine.
I usually have two or three clusters that I can test different things with.
It often happens that a cluster/server gets broken at some point. When that happens, I usually recreate the servers and start the playbook again. Because this takes some time, I want to be able to run two or more playbooks in parallel.
But every time I do this, I get the following error:
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: FileNotFoundError: [Errno 2] No such file or directory
I run my playbook like this:
"$ansible_playbook"
-i "${ANSIBLE_HOSTS}"
"${ANSIBLE_YML}"
--flush-cache
--user root
--become
--become-user root
--ask-sudo-pass
What could be the reason for the error?
I can imagine that Ansible creates some files in the background that are shared between the different playbook runs. But which files could those be?
Thanks in advance!
Update: more detailed error log (-vvv)
ansible-playbook 2.7.8
config file = /home/mod/cod/wo/thingylabs/kubernetes-provisioning/playbooks/test1/ansible.cfg
configured module search path = ['/home/mod/cod/wo/thingylabs/kubernetes-provisioning/vendors/kubespray/library']
ansible python module location = /usr/lib/python3.7/site-packages/ansible
executable location = /usr/bin/ansible-playbook
python version = 3.7.2 (default, Jan 10 2019, 23:51:51) [GCC 8.2.1 20181127]
Using /home/mod/cod/wo/thingylabs/kubernetes-provisioning/playbooks/test1/ansible.cfg as config file
SUDO password:
ERROR! Unexpected Exception, this is probably a bug: [Errno 2] No such file or directory
the full traceback was:
Traceback (most recent call last):
File "/usr/bin/ansible-playbook",
exit_code = cli.run()
File "/usr/lib/python3.7/site-packages/ansible/cli/playbook.py", line 104, in run
loader, inventory, variable_manager = self._play_prereqs(self.options)
File "/usr/lib/python3.7/site-packages/ansible/cli/__init__.py", line 786, in _play_prereqs
inventory = InventoryManager(loader=loader, sources=options.inventory)
File "/usr/lib/python3.7/site-packages/ansible/inventory/manager.py", line 148, in __init__
self.parse_sources(cache=True)
File "/usr/lib/python3.7/site-packages/ansible/inventory/manager.py", line 207, in parse_sources
source = unfrackpath(source, follow=False)
File "/usr/lib/python3.7/site-packages/ansible/utils/path.py", line 47, in unfrackpath
basedir = op.getcwd()
FileNotFoundError: [Errno 2] No such file or directory
The following is an attempt to launch a cluster with ten slaves.
12:13:44/sparkup $ec2/spark-ec2 -k sparkeast -i ~/.ssh/myPem.pem \
-s 10 -z us-east-1a -r us-east-1 launch spark2
Here is the output. Note that the same command had been successful with the February master code; today I updated to the latest 1.4.0-SNAPSHOT.
Setting up security groups...
Searching for existing cluster spark2 in region us-east-1...
Spark AMI: ami-5bb18832
Launching instances...
Launched 10 slaves in us-east-1a, regid = r-68a0ae82
Launched master in us-east-1a, regid = r-6ea0ae84
Waiting for AWS to propagate instance metadata...
Waiting for cluster to enter 'ssh-ready' state.........unable to load cexceptions
TypeError
p0
(S''
p1
tp2
Rp3
(dp4
S'child_traceback'
p5
S'Traceback (most recent call last):\n File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1280, in _execute_child\n sys.stderr.write("%s %s (env=%s)\\n" %(executable, \' \'.join(args), \' \'.join(env)))\nTypeError\n'
p6
sb.Traceback (most recent call last):
File "ec2/spark_ec2.py", line 1444, in <module>
main()
File "ec2/spark_ec2.py", line 1436, in main
real_main()
File "ec2/spark_ec2.py", line 1270, in real_main
cluster_state='ssh-ready'
File "ec2/spark_ec2.py", line 869, in wait_for_cluster_state
is_cluster_ssh_available(cluster_instances, opts):
File "ec2/spark_ec2.py", line 833, in is_cluster_ssh_available
if not is_ssh_available(host=dns_name, opts=opts):
File "ec2/spark_ec2.py", line 807, in is_ssh_available
stderr=subprocess.STDOUT # we pipe stderr through stdout to preserve output order
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 709, in __init__
errread, errwrite)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1328, in _execute_child
raise child_exception
TypeError
The AWS console shows that instances are actually running. So it is unclear what actually failed.
Any hints or workarounds appreciated.
UPDATE: This same error occurs when running the login command. It seems to be a problem with the boto API, but the cluster itself appears to be OK.
ec2/spark-ec2 -i ~/.ssh/sparkeast.pem login spark2
Searching for existing cluster spark2 in region us-east-1...
Found 1 master, 10 slaves.
Logging into master ec2-54-87-46-170.compute-1.amazonaws.com...
unable to load cexceptions
TypeError
p0
(.. same exception stacktrace as above )
The issue is that the Python 2.7.6 installation on my Yosemite MacBook appears to have become corrupted.
I reset PATH and PYTHONPATH to point to a custom Homebrew-installed Python version, and then boto and the other Python commands, including building the Spark performance project, work fine.
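For reference, the change amounted to something like this (paths assume a default Homebrew layout):
# put Homebrew's python first on the PATH and point PYTHONPATH at its site-packages
export PATH="/usr/local/bin:$PATH"
export PYTHONPATH="/usr/local/lib/python2.7/site-packages:$PYTHONPATH"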
I'm trying to launch Spark jobs that use Elasticsearch input from the command line using spark-submit, as described in http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/spark.html.
I'm setting the properties in a file, but when launching spark-submit it gives the following warnings:
~/spark-1.0.1-bin-hadoop1/bin/spark-submit --class Main --properties-file spark.conf SparkES.jar
Warning: Ignoring non-spark config property: es.resource=myresource
Warning: Ignoring non-spark config property: es.nodes=mynode
Warning: Ignoring non-spark config property: es.query=myquery
...
Exception in thread "main" org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed
My config file looks like this (with the real values filled in):
es.nodes nodeip:port
es.resource index/type
es.query query
Setting the properties on the Configuration object in the code works (roughly as sketched below), but I need to avoid this workaround.
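For clarity, the workaround looks roughly like this (assuming sc is the SparkContext and the elasticsearch-hadoop input format; values illustrative):
// set the es.* keys on a Hadoop Configuration and read through EsInputFormat
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{MapWritable, Text}
import org.elasticsearch.hadoop.mr.EsInputFormat

val hadoopConf = new Configuration()
hadoopConf.set("es.nodes", "nodeip:port")
hadoopConf.set("es.resource", "index/type")
hadoopConf.set("es.query", "query")
val esRDD = sc.newAPIHadoopRDD(hadoopConf,
  classOf[EsInputFormat[Text, MapWritable]], classOf[Text], classOf[MapWritable])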
Is there a way to set those properties via command line?
I don't know if you resolved your issue (if so, how?), but I found this solution:
import org.elasticsearch.spark.rdd.EsSpark
// pass es.nodes inline instead of via spark-submit properties
EsSpark.saveToEs(rdd, "spark/docs", Map("es.nodes" -> "10.0.5.151"))
Bye
When you pass a config file to spark-submit, it only loads configs that start with 'spark.'
So, in my config I simply use
spark.es.nodes <es-ip>
and in the code itself I have to do
import org.apache.spark.SparkConf

val conf = new SparkConf()
conf.set("es.nodes", conf.get("spark.es.nodes"))