How can I get JMeter info logs while running distributed testing?

I want to run JMeter distributed testing and have JMeter write info logs to a log file, but in distributed mode it only gives me logs related to the connection; it doesn't really give the execution log.
How can I get the actual logs?
Thanks in advance.

The execution log is written on the slave side. If you run the slave via jmeter-server.bat or jmeter-server, you should see a jmeter-server.log file in the folder you launched the slave instance from.
If you don't see the log file, you can specify its name and location via the -j command-line argument, like:
jmeter -s -j jmeter-server.log ......
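If the server log still looks too sparse, you can also raise the verbosity with JMeter's -L flag; a minimal sketch, assuming the default slave setup (the category shown is only an example):
jmeter -s -j jmeter-server.log -Ljmeter.engine=DEBUG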
More information:
Remote Testing
How to Perform Distributed Testing in JMeter
JMeter Distributed Testing Step-by-step

Related

JMeter Distributed Testing - vary thread counts between slaves

I am running JMeter in distributed mode. What JMeter does is distribute the number of threads (users) equally between the slaves. What I want is to distribute it unevenly, e.g. total users: 10, slave 1: 8, slave 2: 2.
JMeter slaves are totally independent, so if you have 10 threads in the Thread Group, the 1st slave will execute 10 threads and the 2nd slave will execute 10 threads, and you will have 20 threads in total.
If you want to distribute the load between slaves unevenly, you can do it as follows:
Define the threads property in the Thread Group using the __P() function, like:
${__P(threads,)}
On each remote slave, set this threads property in the user.properties file (located in JMeter's "bin" folder), like:
on slave 1:
threads=8
on slave 2:
threads=2
Alternatively, you can pass the property value via the -J command-line argument, like:
on slave 1:
jmeter -Jthreads=8 -s .....
on slave 2:
jmeter -Jthreads=2 -s .....
See Apache JMeter Properties Customization Guide for more information on setting and overriding JMeter properties.
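Putting it together, a minimal sketch of the uneven split (the test plan name and host names are placeholders): the Thread Group's "Number of Threads" is set to ${__P(threads,1)} (1 being a fallback default), each slave is started with its own value, and the master simply points at both slaves:
on slave 1: jmeter -s -Jthreads=8
on slave 2: jmeter -s -Jthreads=2
on master: jmeter -n -t test.jmx -R slave1.example.com,slave2.example.com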

How to configure the number of threads to run on slave machines in JMeter?

How can I control the number of threads running on each JMeter slave machine?
i.e. if I have 300 threads in total and 2 slave machines, I want the load to be distributed evenly on both slave machines: 150 threads to run on slave machine A and 150 threads to run on slave machine B.
I have also tried running in non-GUI mode with the command below:
jmeter -n -t TESTING.jmx -R 10.27.30.93 -J 6
to make it run on a specific slave server with 6 threads, but it's not working.
It invokes the same number of threads saved in the test plan.
Set "Number of Threads" for Thread Group(s) using __P() function like
${__P(threads,)}
Amend your JMeter startup script invocation as follows:
jmeter -n -t TESTING.jmx -R 10.27.30.93 -Gthreads=6
As per JMeter command-line help:
-G, --globalproperty <argument>=<value>
Define Global properties (sent to servers)
e.g. -Gport=123
or -Gglobal.properties
Another option is to configure the desired number of threads for each remote engine in the user.properties file (which lives under the /bin folder of the JMeter installation).
See Apache JMeter Properties Customization Guide for more information on setting and/or overriding JMeter Properties.
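For the 300-thread example above, a minimal sketch (the second slave address is a placeholder): with "Number of Threads" set to ${__P(threads,1)} in the test plan, one -G property sent from the master gives every engine the same value:
jmeter -n -t TESTING.jmx -R 10.27.30.93,10.27.30.94 -Gthreads=150
Each of the two slaves then runs 150 threads, i.e. 300 threads in total.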

Spark - Add Worker from Local Machine (standalone spark cluster manager)?

When running Spark 1.4.0 on a single machine, I can add a worker by using the command "./bin/spark-class org.apache.spark.deploy.worker.Worker myhostname:7077". The official documentation points out another way: adding "myhostname:7077" to the "conf/slaves" file and then executing the command "sbin/start-all.sh", which invokes the master and all workers listed in the conf/slaves file. However, the latter method doesn't work for me (it fails with a time-out error). Can anyone help me with this?
Here is my conf/slaves file (assume the master URL is myhostname:700):
myhostname:700
The conf/slaves file should just be a list of hostnames; you don't need to include the port # that Spark runs on (I think if you do, it will try to ssh on that port, which is probably where the timeout comes from).
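For reference, a minimal sketch of the standalone setup, assuming the master's port is left at the default of 7077 (set via SPARK_MASTER_PORT in conf/spark-env.sh if you need something else):
conf/slaves should contain one worker hostname per line, with no port, e.g.:
myhostname
and then the master plus all workers listed there are started with:
sbin/start-all.sh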

Error: Jobflow entered COMPLETED while waiting to ssh

I just started to practice AWS EMR.
I have a sample word-count application set up, run, and completed from the web interface.
Following the guideline here, I have set up the command-line interface.
so when I run the command:
./elastic-mapreduce --list
I receive
j-27PI699U14QHH       COMPLETED      ec2-54-200-169-112.us-west-2.compute.amazonaws.com      Word count
       COMPLETED      Setup hadoop debugging
       COMPLETED      Word count
Now, I want to see the log files. I run the command
./elastic-mapreduce --ssh --jobflow j-27PI699U14QHH
Then I receive the following error:
Error: Jobflow entered COMPLETED while waiting to ssh
Can someone please help me understand what's going on here?
Thanks,
When you set up a job on EMR, it means that Amazon is going to provision a cluster on demand for you for a limited amount of time. During that time, you are free to ssh to your cluster and look at the logs as much as you want, but once your job has finished running, your cluster is going to be taken down! At that point, you won't be able to ssh anymore because your cluster simply won't exist.
The workflow typically looks like this:
Create your jobflow
It will sit in the STARTING status for a few minutes. At that point, if you try to run ./elastic-mapreduce --ssh --jobflow <jobid>, it will simply wait because the cluster is not available yet.
After a while the status will switch to RUNNING. If you had already started the ssh command above it should automatically connect you to your cluster. Otherwise you can initiate your ssh command now and it should connect you directly without any wait.
The RUNNING phase could take a while or be very short, depending on how much data you're processing and the nature of your computations.
Once all your data has been processed, the status will switch to SHUTTING_DOWN. At that point, if you had already sshed in, you will get disconnected. If you try to use the ssh command at that point, it will not connect.
Once the cluster has finished shutting down it will enter a terminal state of either COMPLETED or FAILED depending on whether your job succeeded or not. At that point your cluster is no longer available, and if you try to ssh you will get the error you are seeing.
Of course there are exceptions: you could set up an EMR cluster in interactive mode, for example if you just want to have Hive set up and then ssh in and run Hive queries, and in that case you would have to take your cluster down manually. But if you just want a MapReduce job to run, then you will only be able to ssh for the duration of the job.
That being said, if all you want to do is debugging, there is not even a need to ssh in the first place! When you create your jobflow, you have the option to enable debugging, so you could do something like this:
./elastic-mapreduce --create --enable-debugging --log-uri s3://myawsbucket
What that means is that all the logs for your job will end up being written to the specified S3 bucket (you have to own this bucket, of course, and have permission to write to it). Also, if you do that, you can go into the AWS console afterwards in the EMR section, and you will see a debug button next to your job, which should make your life much easier.
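Two hedged follow-ups (the jobflow ID is a placeholder and the exact log layout under the bucket is an assumption): the old elastic-mapreduce CLI also had an --alive option that keeps the cluster up after the steps finish so you can still ssh, and the logs written via --log-uri can be listed straight from S3, e.g. with the AWS CLI:
./elastic-mapreduce --create --alive --log-uri s3://myawsbucket
./elastic-mapreduce --ssh --jobflow <jobid>
./elastic-mapreduce --terminate --jobflow <jobid> (don't forget this step, as an --alive cluster keeps running, and billing, until terminated)
aws s3 ls s3://myawsbucket/<jobid>/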

How to tell if I am about to run Hadoop streaming job on a cluster or in "local" mode?

Hadoop streaming will run the process in "local" mode when there is no Hadoop instance running on the box. I have a shell script that is controlling a set of Hadoop streaming jobs in sequence, and I need to decide whether to copy files from HDFS to the local filesystem depending on whether the jobs have been running locally or not. Is there a standard way to accomplish this test? I could do a "ps aux | grep something", but that seems ad hoc.
Hadoop streaming will run the process in "local" mode when there is no hadoop instance running on the box.
Can you please point to the reference for this?
A regular or a streaming job will run the way it is configured, so we know ahead of time in which mode a job is run. Check the documentation for configuring Hadoop on a single node and on a cluster in different modes.
Rather than trying to detect at run time which mode the process is operating in, it is probably better to wrap the tool you are developing in a bash script that explicitly selects local vs cluster operation. The O'Reilly Hadoop book describes how to explicitly choose local operation using a configuration file override:
hadoop v2.MaxTemperatureDriver -conf conf/hadoop-local.xml input/ncdc/micro max-temp
where hadoop-local.xml is an XML configuration file set up for local operation.
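A hedged sketch of what such a hadoop-local.xml might contain, using the pre-YARN property names the book relies on:
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>file:///</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>local</value>
  </property>
</configuration>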
I haven't tried this yet, but I think you can just read out the mapred.job.tracker configuration setting.
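A hedged sketch of that idea, assuming the active configuration lives in $HADOOP_CONF_DIR and that mapred-site.xml keeps each name and value on adjacent lines:
tracker=$(grep -A1 'mapred.job.tracker' "${HADOOP_CONF_DIR:-/etc/hadoop/conf}/mapred-site.xml" 2>/dev/null | sed -n 's:.*<value>\(.*\)</value>.*:\1:p')
if [ -z "$tracker" ] || [ "$tracker" = "local" ]; then echo "local mode"; else echo "cluster mode ($tracker)"; fi
An unset or "local" mapred.job.tracker is what triggers the local job runner, so an empty result is treated as local here.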

Resources