How to run Apache Tez Locally? - hadoop

Besides running integrated with Hadoop, Tez can also be executed in local mode. To run it locally, I read this page, understood the changes I have to make, and updated the tez-site.xml configuration. But I don't know how to start it.
I tried running one of the tez-examples (e.g. WordCount) that has a main method, but it stalls and doesn't print anything to stdout. Is there anything I have to start first?
How can I run tez in local mode?

I managed to run it by including the needed libraries on the classpath. I could have changed the pom and built the final jar file with the dependencies, but I preferred not to change the project.
After building it with mvn clean install -DskipTests=true -Dmaven.javadoc.skip=true, I ran it by setting the Java classpath (quoted, so the shell passes the * wildcards through to java unchanged):
java -cp "tez-dist/target/tez-0.7.0/lib/*:tez-dist/target/tez-0.7.0/*" org.apache.tez.examples.OrderedWordCount in.txt out
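If local mode still seems to ignore your settings, it may help to check what the JVM actually sees on its classpath. The following is a minimal sketch, assuming (as with other Hadoop-style configuration classes) that Tez looks up tez-site.xml as a classpath resource; the class name ClasspathCheck is hypothetical. Note that a Java classpath entry ending in /* matches only .jar files, so a directory holding tez-site.xml has to be listed as a plain directory entry.

```java
import java.io.File;
import java.net.URL;

/** Hypothetical helper: reports whether a resource such as tez-site.xml is visible on the classpath. */
public class ClasspathCheck {

    /** Returns the URL the class loader resolves for the resource, or null if it is not on the classpath. */
    static URL findResource(String name) {
        return ClasspathCheck.class.getClassLoader().getResource(name);
    }

    public static void main(String[] args) {
        String resource = args.length > 0 ? args[0] : "tez-site.xml";
        URL url = findResource(resource);
        System.out.println(resource + " -> " + (url == null ? "NOT found on the classpath" : url));

        // List the classpath entries as this JVM reports them, one per line.
        String classPath = System.getProperty("java.class.path");
        for (String entry : classPath.split(File.pathSeparator)) {
            System.out.println("  " + entry);
        }
    }
}
```

For example, compile it into the current directory and run java -cp ".:tez-dist/target/tez-0.7.0/lib/*:/path/to/conf" ClasspathCheck, where /path/to/conf is a placeholder for wherever your tez-site.xml lives.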

Related

Where does Jenkins store the project source

I have a Jenkins job that uses a script to build my project. The script fails on the following line: mvn -e -X -Dgit='$git' release:prepare.
To find the cause, I want to go to the Jenkins server and run mvn -e -X -Dgit='$git' release:prepare from the command line to see if it works.
Does Jenkins store the projects' source code somewhere, such that I can go to that folder and call Maven?
If yes, then where?
Yes, by default it stores the project files for the job at
/var/lib/jenkins/workspace/{your-job-name}
This is where Jenkins expects the project files to be, or where it checks them out from source control before it starts building.
Quote from Andrew M.:
"Hudson/Jenkins doesn't quite work that way. It stores configurations and job information in /var/lib/jenkins by default (if you're using the .deb package). If you want to setup persistence for a specific application, that's something you'll want to handle yourself - Hudson is a continuous integration server, not a test framework.
Check out the Wiki article on Continuous Integration for an overview of what to expect."
From this Question on serverfault.
This worked for me:
/var/jenkins/workspace/JobNameExample
But if your build machine (node) is different from the one where Jenkins is running (manager), you need to specify it:
/var/jenkins/workspace/JobNameExample/label/NodeName
where you can define the label too:
Jenkins currently stores its workspace files in /var/jenkins_home/workspace/project_name
I am running it from Docker, though!
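To pull the answers above together: Jenkins exports a WORKSPACE environment variable to builds, so the simplest way to find the directory is to echo it from the failing script. When you are on the server instead, the workspace of a job on the built-in node usually sits under $JENKINS_HOME/workspace. The sketch below encodes that lookup order; the class WorkspaceLocator is hypothetical, the /var/lib/jenkins fallback assumes the Debian package, and jobs on agents (or with a custom workspace) live elsewhere.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

/** Hypothetical helper that guesses where Jenkins keeps a job's workspace on the built-in node. */
public class WorkspaceLocator {

    // Default JENKINS_HOME of the Debian/Ubuntu package; the Docker image uses /var/jenkins_home.
    static final String DEFAULT_HOME = "/var/lib/jenkins";

    static Path workspaceFor(String jobName, Map<String, String> env) {
        // Inside a running build, Jenkins exports WORKSPACE with the exact directory.
        String ws = env.get("WORKSPACE");
        if (ws != null && !ws.isEmpty()) {
            return Paths.get(ws);
        }
        // Otherwise fall back to $JENKINS_HOME/workspace/<job>.
        String home = env.getOrDefault("JENKINS_HOME", DEFAULT_HOME);
        return Paths.get(home, "workspace", jobName);
    }

    public static void main(String[] args) {
        String job = args.length > 0 ? args[0] : "JobNameExample";
        System.out.println(workspaceFor(job, System.getenv()));
    }
}
```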

problems running state machine examples

Congratulations on Spring Statemachine. I found it yesterday and have been trying it out, specifically the turnstile example running in STS. I found it very easy and intuitive to build an FSM.
Because Spring Shell doesn't work well in STS, I tracked down the instructions in the reference doc for running the examples from the command line,
"java -jar spring-statemachine-samples-turnstile-1.0.0.BUILD-SNAPSHOT.jar",
but running it gave the error
"no main manifest attribute, in spring-statemachine-samples-turnstile-1.0.0.BUILD-SNAPSHOT.jar".
Although I'm not even a novice with Gradle, I tried fixing this by adding this line to the jar section of build.gradle:
"manifest.attributes['Main-Class'] = 'demo.turnstile.Application'"
(which I know doesn't handle the various sub-projects), but got this error:
"NoClassDefFoundError: org/springframework/shell/Bootstrap".
If it is possible to run the samples from Gradle, could you include that in the reference document? I tried running the samples using
gradle run
but there was no interaction with the shell.
Samples are designed to be run as executable jars with the shell, so that you can interact with them without recompiling after every change. Your error indicates that you didn't build the sample jar as described in the docs:
./gradlew clean build -x test
This automatically uses the Spring Boot plugin, which adds the necessary manifest headers to the jar's META-INF/MANIFEST.MF to make it a true executable jar. Essentially, every sample is a Spring Boot app.
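Whether a jar can be started with java -jar comes down to the Main-Class attribute in its META-INF/MANIFEST.MF. This minimal sketch (the class MainClassCheck is hypothetical) prints that attribute using only java.util.jar, which makes it easy to tell a plain jar from a repackaged Spring Boot jar. For a Boot jar I would expect Main-Class to name Boot's launcher class, with the application's own class in a separate Start-Class attribute, but treat that as an assumption to verify against your build.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Hypothetical diagnostic: prints the Main-Class of a jar, which "java -jar" needs. */
public class MainClassCheck {

    /** Returns the Main-Class manifest attribute, or null if the jar has no manifest or no such entry. */
    static String mainClassOf(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest();
            return manifest == null ? null : manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        String mainClass = mainClassOf(args[0]);
        System.out.println(mainClass == null
                ? "no Main-Class attribute: 'java -jar' will refuse to start this jar"
                : "Main-Class: " + mainClass);
    }
}
```

Usage: java MainClassCheck spring-statemachine-samples-turnstile-1.0.0.BUILD-SNAPSHOT.jar, run from the sample's build/libs folder.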
Building SM sample projects in a Windows environment:
Open a command prompt (Windows key + R, type cmd, press Enter) and change directory to the project root folder spring-statemachine-master (inside the extracted folder).
Run gradlew install to get all Spring dependencies copied to the local machine.
Run gradlew clean build -x test to get the Spring Shell jars built. Courtesy Janne
These steps should build all the jars; look in the \build\libs folder of the respective sample project for the jar files.
Run the jar like any other Java jar file: java -jar [jar-file-name.jar] (make sure to change to the jar file's directory first).
One more thing I got stuck on was how to send events to the state machine:
the command is sm event EVENT_NAME_AS_DEFINED_IN_CLASS. Ref
E.g. sm event RINSE for the washer project.

How to start tomcat using maven in debug mode

I have found a Maven plugin to start Tomcat.
Does Maven have a plugin to start Tomcat in debug mode?
If you're using Eclipse and you're running Maven externally (not using M2Eclipse) then you can use whatever command line command you usually use but use mvnDebug instead of mvn.
As an example, I run the tomcat plugin under the "run" profile so my normal command is:
mvn clean install -Prun
This uses the <maven-dir>/bin/mvn script but to run in debug mode, simply substitute <maven-dir>/bin/mvnDebug in.
mvnDebug clean install -Prun
If mvnDebug isn't on your PATH then you might have to use the full path to it (or create a link to it from a directory on your path, like /usr/bin), e.g.:
/path/to/maven-dir/bin/mvnDebug clean install -Prun
I'm using Maven 3.0.5 and the mvnDebug script comes out of the box. If you look inside it then you'll see it basically does what Titi Wangsa Bin Damhore says, but you'll note that suspend=y is used so the JVM waits for you to connect your debugger before continuing:
MAVEN_DEBUG_OPTS="-Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000"
This may or may not be what you want.
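We can cheat and pass the debug options as JVM options.
On *nix:
export MAVEN_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=1044"
then run Maven as usual, and it should start in debug mode. (The mvn script passes MAVEN_OPTS to the JVM; JAVA_OPTS is read by Tomcat's catalina.sh, not by Maven.)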
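Both answers pass the same JDWP agent settings; the only difference is suspend=y (the JVM waits for the debugger to attach, as mvnDebug does) versus suspend=n (it starts right away). The sketch below, with the hypothetical class name DebugOpts, builds the equivalent -agentlib:jdwp form, which supersedes the older -Xdebug -Xrunjdwp pair on current JVMs, and checks whether the running JVM was started with a JDWP agent. On Java 9 and later, address=8000 listens on localhost only; use address=*:8000 if the debugger connects from another machine.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical helper around the JDWP options quoted in the answers above. */
public class DebugOpts {

    /** Builds the -agentlib form of "-Xrunjdwp:transport=dt_socket,server=y,...". */
    static String jdwpAgent(int port, boolean suspend) {
        // suspend=y: wait for a debugger before running (mvnDebug's choice); suspend=n: start immediately.
        return "-agentlib:jdwp=transport=dt_socket,server=y,suspend=" + (suspend ? "y" : "n") + ",address=" + port;
    }

    /** True if any JVM argument loads the JDWP agent, in either the old or the new syntax. */
    static boolean debugAgentLoaded(List<String> jvmArgs) {
        return jvmArgs.stream().anyMatch(a -> a.startsWith("-agentlib:jdwp") || a.startsWith("-Xrunjdwp"));
    }

    public static void main(String[] args) {
        System.out.println("export MAVEN_OPTS=\"" + jdwpAgent(8000, true) + "\"");
        List<String> current = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JDWP agent active in this JVM: " + debugAgentLoaded(current));
    }
}
```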

How to build and execute examples in Mahout in Action

I am learning Mahout in Action now and writing to ask how to build and execute the examples in the book. I can find instructions for Eclipse, but my environment doesn't have a GUI. So I copied the first example (RecommenderIntro) to RecommenderIntro.java and compiled it with javac.
I got an error because the package was not imported. So I am looking for:
1. Approaches to import the missing packages.
2. Even if it compiles successfully and a .class file is generated, how can I execute it? Through "java RecommenderIntro"? I can execute the Mahout examples through sudo -u hdfs hadoop jar mahout-examples-0.7-cdh4.2.0-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job; how can I do something similar for my own example?
3. All my data is saved in HBase tables, but in the book (and even on Google) I cannot find a way to integrate it with HBase. Any suggestions?
For q1 and q2, you need a Java build tool like Maven.
You build the Hadoop job jar with 'mvn clean install'. This creates your Hadoop job in target/mia-job.jar.
You then execute your job with:
hadoop jar target/mia-job.jar RecommenderIntro inputDirIgnored outputDirIgnored
(RecommenderIntro ignores its parameters, but Hadoop forces you to specify at least two parameters, usually the input and output directories.)
q3: You can't, out of the box.
Option 1: export your HBase data to a text file 'intro.csv' with content like "%userId%,%itemId%,%score%", as described in the book, because that's the file RecommenderIntro is looking for.
Option 2: modify the example code to read data from HBase...
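Option 1 can be as small as the sketch below. It assumes the user/item/score rows have already been read out of HBase (the HBase client code is not shown) and writes them in the userId,itemId,score form the book's intro.csv uses, which RecommenderIntro loads with Mahout's FileDataModel. The Pref holder, the class name and the sample values are hypothetical.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

/** Sketch of Option 1: write user/item/score rows (already read from HBase) to intro.csv. */
public class PreferenceExport {

    /** One preference row; how it is read from HBase is deliberately left out. */
    static final class Pref {
        final long userId;
        final long itemId;
        final float score;

        Pref(long userId, long itemId, float score) {
            this.userId = userId;
            this.itemId = itemId;
            this.score = score;
        }
    }

    /** Writes one "userId,itemId,score" line per row, the layout of the book's intro.csv. */
    static void writeCsv(List<Pref> prefs, Path out) throws IOException {
        try (Writer w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (Pref p : prefs) {
                w.write(p.userId + "," + p.itemId + "," + p.score + "\n");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample rows standing in for the result of an HBase scan.
        List<Pref> prefs = Arrays.asList(
                new Pref(1, 101, 5.0f),
                new Pref(1, 102, 3.0f),
                new Pref(2, 101, 2.0f));
        writeCsv(prefs, Paths.get("intro.csv"));
    }
}
```

Writing a flat file keeps RecommenderIntro unchanged; Option 2 avoids the export step but means replacing the FileDataModel with your own DataModel.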
PS1. For developing such an application I'd really advise using an IDE, because it lets you use code completion and run, build, etc. from one place. A simple way to get started is to download a virtual machine image with Hadoop, like Cloudera's or Hortonworks', and install an IDE like Eclipse. You can also configure these images to use your Hadoop cluster, but you don't need to for small data sets.
PS2. The RecommenderIntro code isn't a distributed implementation and thus can't run on large datasets. It also runs locally instead of on a Hadoop cluster.

How run mahout in action example ReutersToSparseVectors?

I want to run ReutersToSparseVectors.java. I can compile it and create a JAR file without problems.
I compiled the file with this command:
javac -classpath hadoop-core-0.20.205.0.jar:lucene-core-3.6.0.jar:mahout-core-0.7.jar:mahout-math-0.7.jar ReutersToSparseVectors.java
and created the JAR file with this command:
jar cvf ReutersToSparseVectors.jar ReutersToSparseVectors.class
When I run it with java -jar ReutersToSparseVectors.jar, I get this error:
Failed to load Main-Class manifest attribute from ReutersToSparseVectors.jar
Can you help me solve this problem?
If this example can run with Hadoop, please tell me how I can run it with Hadoop.
Instead of using the -jar option, it's better to run:
java -cp mahout-core.jar:... mia.clustering.ch09.ReutersToSparseVectors
or you can use the mvn exec:java command, as described in the README for the examples:
mvn exec:java -Dexec.mainClass="mia.clustering.ch09.ReutersToSparseVectors"
Or you can run this file directly from your IDE (assuming, that you correctly imported Maven project).
P.S. Your command doesn't work because, to run with the -jar switch, the .jar file needs a special entry in its manifest (Main-Class) naming the class that should be started by default.
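If you do want java -jar to work, the jar needs that Main-Class entry, plus a Class-Path entry for the Mahout, Lucene and Hadoop jars (otherwise you hit NoClassDefFoundError next). A second problem is likely with jar cvf ReutersToSparseVectors.jar ReutersToSparseVectors.class: since the class is run as mia.clustering.ch09.ReutersToSparseVectors, its .class file presumably has to sit under mia/clustering/ch09/ inside the jar. The sketch below (the ExecutableJar class is hypothetical) handles both using java.util.jar; the JDK's jar tool can also set the entry point with its e option.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Sketch: package compiled classes into a jar that "java -jar" can start. */
public class ExecutableJar {

    /** Builds a manifest with Main-Class and an optional space-separated Class-Path. */
    static Manifest manifest(String mainClass, String classPath) {
        Manifest mf = new Manifest();
        Attributes attrs = mf.getMainAttributes();
        // Without Manifest-Version, java.util.jar does not write the other main attributes.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.MAIN_CLASS, mainClass);
        if (classPath != null) {
            // Jars listed here are resolved relative to the directory holding this jar.
            attrs.put(Attributes.Name.CLASS_PATH, classPath);
        }
        return mf;
    }

    /** Copies every .class file under classesDir into the jar, keeping package directories. */
    static void write(Path jar, Path classesDir, Manifest mf) throws IOException {
        List<Path> classFiles;
        try (Stream<Path> files = Files.walk(classesDir)) {
            classFiles = files.filter(f -> f.toString().endsWith(".class"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar), mf)) {
            for (Path p : classFiles) {
                // Jar entry names always use '/', whatever the platform separator is.
                String name = classesDir.relativize(p).toString().replace('\\', '/');
                out.putNextEntry(new JarEntry(name));
                Files.copy(p, out);
                out.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: ExecutableJar <jar> <classesDir> <mainClass> [Class-Path]
        String classPath = args.length > 3 ? args[3] : null;
        write(Paths.get(args[0]), Paths.get(args[1]), manifest(args[2], classPath));
    }
}
```

For example, compile with javac -d classes -classpath <same jars as before> ReutersToSparseVectors.java so the package directories are created, then pass classes as the second argument and "mahout-core-0.7.jar mahout-math-0.7.jar lucene-core-3.6.0.jar hadoop-core-0.20.205.0.jar" as the Class-Path, with those jars copied next to the new jar.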
P.P.S. It's better to use the book's examples with Mahout 0.5, as they were tested with it. You can use them with version 0.7 if you need to, but then you need to take the source code from the mahout-0.7 branch of the repository with the examples (link is above).
