How to build Mahout /usr/lib resource folders after building with Maven

I am new to this stuff, so I hope someone can help.
I want to build my own Apache Mahout installation from source code. I have Maven 2.2.1. Following the instructions on the Mahout wiki I was able to check out the code (Mahout-0.6-SNAPSHOT) and build Mahout with Maven. At least, that is what I think happened after running "mvn install" from the root of the folder containing the checked-out source code. Tests were run, which took a while.
So I now have all these jars (called artifacts, if I'm not mistaken) in the local Maven repository at ~/.m2/repository.
So my first question is: how do I get from here to an 'installed' package like the one I am used to when I install an RPM on Red Hat? By that I mean a new folder under /usr/lib/ containing lib/, bin/, and similar subfolders.
The second question is about dependency jars. I can see in the repository that Mahout was built with hadoop-core-0.20.204.0.jar, but that is not the jar I want, because I run a Hadoop cluster with a different hadoop-core jar from Cloudera. How would I go about rebuilding Mahout with the right hadoop-core jar? Or would it just be a matter of swapping one hadoop-core jar for another in the lib folder that gets created (once my first question is answered)?
Thanks
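For the second question, a rough sketch of one possible approach (not verified against the Mahout 0.6 build): either change the hadoop-core version in the checked-out top-level pom.xml before running mvn install again, or, if your own project consumes Mahout as a dependency, exclude the transitive hadoop-core and declare the Cloudera artifact yourself. The CDH version below is a placeholder, and the Cloudera jars may require adding Cloudera's Maven repository to the POM:

<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-core</artifactId>
  <version>0.6-SNAPSHOT</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>0.20.2-cdh3u3</version> <!-- placeholder: use the version your cluster runs -->
</dependency>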

Related

How can I add .m2 repository jars or maven dependency jars in a freestyle project in jenkins?

I have a Maven project on my local system, but I created a freestyle project in Jenkins and pointed its workspace at my project's directory.
Being new to Jenkins, I am wondering how to add Maven repository jars to a freestyle project's classpath in the batch file that executes the project, so that all the jars in the recursive directories under
C:\users\xyz\.m2\repository\*
end up on the path while building the project. I don't know whether this is possible without much hassle, but when you create a Maven project in Jenkins it automatically puts all the repository jars on the build path, so there must be a way to do this without adding those libraries manually. I have searched Google extensively but nothing has come up.
Any input would be appreciated.
Thanks
I recommend using the EnvInject Plugin. It prepares the environment before the job runs, so in your case adding the path can also be done with it. If you are running Jenkins on a Windows machine, doing this will do the trick; if you are running on Linux, feel free to explore further or comment here again.
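If the plugin route is not enough on its own, one alternative (a rough sketch, not what the answer above recommends; com.example.Main is a placeholder for your main class) is to let the maven-dependency-plugin write the project's dependency classpath to a file and read it back in the freestyle job's batch step:

mvn dependency:build-classpath -Dmdep.outputFile=cp.txt
set /p CP=<cp.txt
java -cp "target\classes;%CP%" com.example.Main

This picks up exactly the jars declared in the project's pom.xml rather than everything under C:\users\xyz\.m2\repository.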

Use Maven to start programs

I apologize if this sounds too simple (or for the fact that there are other links that cover this problem), but I'm a complete beginner with Maven and even with Java.
All that I'm trying to do is to run this code to see what it does:
https://github.com/semanticvectors/semanticvectors/wiki/GettingStarted
The wiki says that users can either download the .jar file or use the Maven repo. I downloaded their .jar file and tried to run it, but it failed. I used this command:
java -jar /home/user/semanticvectors-5.6.jar
That .jar file didn't work for me, and from other Stack Overflow links it seems that either the .jar file is not set up properly or I have an incompatible Java version.
In any case, I've decided to try using Maven to get this running. I've installed Maven using:
sudo apt-get install maven
It seems to be working, as the setup completed successfully, but now I'm not too sure what to do next. The wiki (linked above) says to go to this Maven repo site (https://oss.sonatype.org/#nexus-search;quick%7Esemanticvectors). To my understanding (and correct me if I'm wrong), Maven is a shared repository that lets developers and testers work from the same code, so I thought I could use Maven as an alternative way to run the program. Anyway, I'm open to any suggestions for getting the program running to see what it does, thanks.
If you're interested in knowing more about me: I'm running a 16.04 Ubuntu system with Java 8.
The idea is that you can either build the JAR yourself, by getting the source from SVN and building it (using Maven commands, since Maven is a build tool), or you can use the existing JAR that is already "prepared" and ready for use in the Maven repository (Nexus, in this case).
The result should be the same whether you use the JAR as a dependency in your code (add it to your pom.xml) or build it yourself.
You can learn more about Maven and things will become much clearer...
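To use it as a dependency, the declaration would look roughly like the sketch below. The coordinates are what the Sonatype listing suggests for semanticvectors 5.6; verify the exact groupId and version against the repository before relying on them.

<dependency>
  <groupId>pitt.search</groupId>
  <artifactId>semanticvectors</artifactId>
  <version>5.6</version>
</dependency>

Once the dependency resolves, Maven downloads the jar into ~/.m2/repository and you can put it on the classpath of whatever program calls into it.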

Why is my AWS SDK jar only 3kb after compiling/building?

I was previously on 1.6.x, and mvn clean install built with no issues; I ended up with a 10 MB jar and was able to run all my code.
Now I want to upgrade to 1.10.x for the new Lambda/API Gateway/DynamoDB support, so I changed the version from 1.6.x to 1.10.x.
But after I build with mvn clean install, the jar is only 3 KB. Why is that? Has something changed in the way the AWS SDK works? From what I've seen, one new option is that you can pick specific services rather than the whole SDK.
Edit:
The 3kb jar has the following:
pom.properties:
version=1.10.66
groupId=com.amazonaws
artifactId=aws-java-sdk
pom.xml, which lists all the aws sdk services
Have you uncompressed the jar file and looked at what is inside? Based on the information in your question, I guess you have just got your own compiled files in there but not the dependencies. The Maven metadata (pom.properties/pom.xml) is usually located under META-INF/maven in the jar. By the way, which Maven plugins do you use? For example, maven-jar-plugin only puts your own code into the jar, while maven-assembly-plugin can also package the Maven dependencies.
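A rough sketch of the maven-assembly-plugin configuration that bundles the dependencies into a single jar (added under build/plugins in the pom.xml; run mvn clean package afterwards):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
    </execution>
  </executions>
</plugin>

Alternatively, as the question notes, the newer SDK lets you depend on individual service modules (for example aws-java-sdk-dynamodb) instead of the whole aws-java-sdk bundle, which keeps the final artifact much smaller.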

Setting spark classpaths on EC2: spark.driver.extraClassPath and spark.executor.extraClassPath

Reducing the size of the application jar by providing a Spark classpath for the Maven dependencies:
My cluster has 3 EC2 instances on which Hadoop and Spark are running. If I build the jar with Maven dependencies it becomes too large (around 100 MB), which I want to avoid because the jar gets replicated to all nodes each time I run the job.
To avoid that I built the application jar with "mvn package". For dependency resolution I downloaded all the Maven dependencies on each node and then provided only the jar paths shown below:
I added the classpaths on each node in spark-defaults.conf as
spark.driver.extraClassPath /home/spark/.m2/repository/com/google/code/gson/gson/2.3.1/gson-2.3.1.jar:/home/spark/.m2/repository/com/datastax/cassandra/cassandra-driver-core/2.1.5/cassandra-driver-core-2.1.5.jar:/home/spark/.m2/repository/com/google/guava/guava/16.0.1/guava-16.0.1.jar:/home/spark/.m2/repository/com/google/collections/google-collections/1.0/google-collections-1.0.jar:/home/spark/.m2/repository/com/datastax/spark/spark-cassandra-connector-java_2.10/1.2.0-rc1/spark-cassandra-connector-java_2.10-1.2.0-rc1.jar:/home/spark/.m2/repository/com/datastax/spark/spark-cassandra-connector_2.10/1.2.0-rc1/spark-cassandra-connector_2.10-1.2.0-rc1.jar:/home/spark/.m2/repository/org/apache/cassandra/cassandra-thrift/2.1.3/cassandra-thrift-2.1.3.jar:/home/spark/.m2/repository/org/joda/joda-convert/1.2/joda-convert-1.2.jar
It worked locally on a single node, but I am still getting this error. Any help would be appreciated.
Finally, I was able to solve the problem. I created the application jar using "mvn package" instead of "mvn clean compile assembly:single", so that the Maven dependencies are not bundled into the jar (they have to be provided at run time instead), which results in a small jar (it contains only references to the dependencies).
Then I added the two parameters below to spark-defaults.conf on each node:
spark.driver.extraClassPath /home/spark/.m2/repository/com/datastax/cassandra/cassandra-driver-core/2.1.7/cassandra-driver-core-2.1.7.jar:/home/spark/.m2/repository/com/googlecode/json-simple/json-simple/1.1/json-simple-1.1.jar:/home/spark/.m2/repository/com/google/code/gson/gson/2.3.1/gson-2.3.1.jar:/home/spark/.m2/repository/com/google/guava/guava/16.0.1/guava-16.0.1.jar
spark.executor.extraClassPath /home/spark/.m2/repository/com/datastax/cassandra/cassandra-driver-core/2.1.7/cassandra-driver-core-2.1.7.jar:/home/spark/.m2/repository/com/googlecode/json-simple/json-simple/1.1/json-simple-1.1.jar:/home/spark/.m2/repository/com/google/code/gson/gson/2.3.1/gson-2.3.1.jar:/home/spark/.m2/repository/com/google/guava/guava/16.0.1/guava-16.0.1.jar
So the question arises: how does the application jar get the Maven dependencies (the required jars) at run time?
For that, I downloaded all the required dependencies to each node in advance using mvn clean compile assembly:single.
You don't need to put all the jar files there; just put your application jar file.
If you get the error again, then add the jar files that are actually needed.
You have to pass the jar files via the setJars() method.
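A minimal sketch of what that looks like in Java, assuming the rest of the configuration (master URL and so on) still comes from spark-defaults.conf; the application name is a placeholder and the jar paths are the ones from the classpath above:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Ship only the jars the job actually needs to the executors
SparkConf conf = new SparkConf()
    .setAppName("my-cassandra-job")  // placeholder name
    .setJars(new String[] {
        "/home/spark/.m2/repository/com/datastax/cassandra/cassandra-driver-core/2.1.7/cassandra-driver-core-2.1.7.jar",
        "/home/spark/.m2/repository/com/google/code/gson/gson/2.3.1/gson-2.3.1.jar"
    });
JavaSparkContext sc = new JavaSparkContext(conf);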

I want to include a jar file in my build using maven

The problem is this:
I already have a Maven build for my project, but I have a requirement to replace a .jar file located in the WEB-INF/lib folder with another .jar file. The new jar file can be downloaded from a link.
What changes do I have to make in the pom.xml to achieve this? I tried to find ways to do it but could not figure out the exact solution, as I am a novice with Maven.
Assuming that the jar file is not available in any public Maven repository, you can install it into your local repository using the install plugin (mvn install:install-file ...) and then reference it like any other dependency; see the sketch below.
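A rough sketch of what that looks like; the coordinates below are placeholders you choose for the downloaded jar:

mvn install:install-file -Dfile=/path/to/downloaded.jar -DgroupId=com.example -DartifactId=replacement-lib -Dversion=1.0 -Dpackaging=jar

Then reference it in the pom.xml like any other dependency, so it ends up in WEB-INF/lib when the war is packaged:

<dependency>
  <groupId>com.example</groupId>
  <artifactId>replacement-lib</artifactId>
  <version>1.0</version>
</dependency>

To keep the old jar out of the build, add an exclusion on whichever dependency pulls it in.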
