How to change a Flink fat jar to a thin jar - Maven

Can I move the dependency jars to HDFS, so that I can run a thin jar without bundling the dependency jars?
The operations and maintenance engineers do not allow me to move jars into the Flink lib folder.

Not sure what problem you are trying to solve, but you might want to consider an application-mode deployment if you are using YARN:
./bin/flink run-application -t yarn-application \
-Dyarn.provided.lib.dirs="hdfs://myhdfs/remote-flink-dist-dir" \
"hdfs://myhdfs/jars/MyApplication.jar"
In this example, MyApplication.jar isn't a thin jar, but the job submission is very lightweight as the needed Flink jars and the application jar are picked up from HDFS rather than being shipped to the cluster by the client. Moreover, the application’s main() method is executed on the JobManager.
Application mode was introduced in Flink 1.11, and is described in detail in this blog post: Application Deployment in Flink: Current State and the new Application Mode.

Related

Should I use spark-submit if using Spring Boot?

What is the purpose of spark-submit? From what I can see, it is just adding properties and jars to the classpath.
If I am using Spring Boot, can I avoid using spark-submit and just package a fat jar with all the properties I want (spark.master etc.)?
Can people see any downside to doing this?
I recently hit the same case and also tried to stick with a Spring Boot executable jar, but it unfortunately failed in the end, although I was close. The state when I gave up: the Spring Boot jar was built without the Spark/Hadoop libs included, and I was running it on the cluster with -Dloader.path='spark/hadoop libs list extracted from SPARK_HOME and HADOOP_HOME on cluster'. I ended up using the second option: build a fat jar with the Shade plugin and run it as a usual jar via spark-submit, which seems a bit strange as a solution but still works fine.
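For reference, a minimal maven-shade-plugin setup along the lines of that second option might look like the sketch below; the plugin version and the transformer choice are illustrative assumptions, not taken from the answer above.
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.4.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- merges META-INF/services entries from all jars, which Spark and Hadoop rely on -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
The resulting shaded jar is then submitted like any other application jar with spark-submit; with Spring Boot in the mix you may also need extra transformers for its META-INF resource files, which is beyond this sketch.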

Understanding the Spark Maven dependency

I am trying to understand how Spark works with Maven.
I have the following question: do I need to have Spark installed on my machine to build a Spark application (in Scala) with Maven?
Or should I just add the Spark dependency to the pom.xml of my Maven project?
Best regards
The short answer is no. At build time all your dependencies will be collected by Maven or sbt. There is no need for an additional Spark installation.
Also at runtime (and this might also include the execution of unit tests during the build) you do not necessarily need a Spark installation. If SPARK_HOME is not set to a valid Spark installation, default values will be used for the runtime configuration of Spark.
However, as soon as you want to start Spark jobs on a remote cluster (by using spark-submit) you will need a Spark installation.
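For illustration, adding the dependency looks something like this in pom.xml; the Spark and Scala versions shown are placeholders, so align them with your environment:
<!-- Spark core for Scala 2.12; the version is an example -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.12</artifactId>
  <version>3.5.1</version>
  <!-- provided: the cluster supplies Spark when you submit with spark-submit;
       use the default compile scope if you want to run or unit-test locally without a cluster -->
  <scope>provided</scope>
</dependency>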

Submit a Spark application in a jar file separate from the uber jar containing all dependencies

I am building a Spark application which has several heavy dependencies (e.g. Stanford NLP with language models), so the uber jar containing the application code plus dependencies comes to ~500 MB. Uploading this fat jar to my test cluster takes a lot of time, so I decided to build my app and its dependencies into separate jar files.
I've created two modules in my parent pom.xml and build the app jar and the uber jar separately with mvn package and mvn assembly:assembly respectively.
However, after I upload these separate jars to my YARN cluster, the application fails with the following error:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.net.unix.DomainSocketWatcher.<init>(I)V
    at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager.<init>(DfsClientShmManager.java:415)
    at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.<init>(ShortCircuitCache.java:379)
    at org.apache.hadoop.hdfs.ClientContext.<init>(ClientContext.java:100)
    at org.apache.hadoop.hdfs.ClientContext.get(ClientContext.java:151)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:690)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:601)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:148)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
When running the application on Spark it also fails with a similar error.
The jar with dependencies is included in the YARN classpath:
<property>
  <name>yarn.application.classpath</name>
  <value>
    $HADOOP_CONF_DIR,
    $HADOOP_COMMON_HOME/*,
    $HADOOP_COMMON_HOME/lib/*,
    $HADOOP_HDFS_HOME/*,
    $HADOOP_HDFS_HOME/lib/*,
    $HADOOP_MAPRED_HOME/*,
    $HADOOP_MAPRED_HOME/lib/*,
    $YARN_HOME/*,
    $YARN_HOME/lib/*,
    /usr/local/myApp/org.myCompany.myApp-dependencies.jar
  </value>
</property>
Is it actually possible to run a Spark application this way? Or do I have to put all dependencies on the YARN (or Spark) classpath as individual jar files?
I encountered the same issue with my Spark job. This is a dependency issue for sure. You have to make sure the correct versions are picked up at runtime. The best way to do this was adding the correct version of hadoop-common-2.6.jar to my application jar. I also upgraded the hadoop-hdfs version in my application jar. This resolved my issue.
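As a side note, a common way to avoid re-uploading a ~500 MB uber jar on every change is to keep the application jar thin and pass the dependency jar to spark-submit via --jars. A sketch, where the main class and HDFS locations are hypothetical:
./bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.myCompany.myApp.Main \
  --jars hdfs:///jars/org.myCompany.myApp-dependencies.jar \
  hdfs:///jars/org.myCompany.myApp.jar
Everything listed in --jars is added to the driver and executor classpaths by Spark itself, so the dependency jar does not have to be wired into yarn.application.classpath.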

How do I add a third-party jar to the classpath on the HDP sandbox?

I have a third-party jar that I am using for MapReduce, and the container that runs the MapReduce job needs my jar. I've tried adding it in yarn-site.xml, in the YARN_USER_CLASSPATH variable, and in a bunch of lib folders in the Hadoop directory, but no luck. Hortonworks did not have much on their site about classpaths, so I am trying here.
You need to set
YARN_USER_CLASSPATH_FIRST
so YARN will search your custom classpath first. I found this in the yarn command script:
https://github.com/apache/hadoop/blob/release-2.6.0/hadoop-yarn-project/hadoop-yarn/bin/yarn#L27
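In practice that means exporting both variables before launching the job, something along these lines (the jar path is a made-up example):
export YARN_USER_CLASSPATH=/usr/local/libs/my-thirdparty.jar
export YARN_USER_CLASSPATH_FIRST=true
The yarn launcher script appends YARN_USER_CLASSPATH to its classpath, and when YARN_USER_CLASSPATH_FIRST is set it prepends it instead, so your jar takes precedence over the bundled versions.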

Dependency issues with app while deploying in Tomcat server

I am using HBase 0.94.7, Hadoop 1.0.4, and Tomcat 7.
I wrote a small REST-based application which performs CRUD operations on HBase.
Earlier I used to run the app using the Maven Tomcat plugin.
Now I am trying to deploy the WAR in a Tomcat server.
Since the Hadoop and HBase jars already contain older versions of the org.mortbay.jetty, jsp-api, and servlet-api jars,
I am getting AbstractMethodError exceptions.
Here's the exception log.
So I added an exclusion of org.mortbay.jetty to both the Hadoop and HBase dependencies in pom.xml, but it started showing more and more issues of this kind, such as jasper.
Then I added scope provided to the Hadoop and HBase dependencies.
Now Tomcat is unable to find the Hadoop and HBase jars.
Can someone help me fix these dependency issues?
Thanks.
Do one thing:
- Right-click on the project
- Go to Properties
- Type "Java Build Path"
- Go to the third tab, Libraries
- Remove the lib and Maven dependencies
- Clean and build your project.
That might solve your problem.
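For reference, the exclusion approach described in the question is usually written in pom.xml roughly as below; the exact list of conflicting artifacts to exclude is an assumption, and a similar block would go on the HBase dependency:
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>1.0.4</version>
  <exclusions>
    <!-- drop the bundled Jetty/JSP jars so the container's own versions are used -->
    <exclusion>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>jetty</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>jetty-util</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>jsp-api-2.1</artifactId>
    </exclusion>
  </exclusions>
</dependency>
Depending on what mvn dependency:tree shows, the servlet-api and jasper artifacts may need similar exclusions.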

Resources