Oracle data export using Liquibase causing OOM - spring-boot

I am getting an exception while exporting data from an Oracle database using Liquibase.
I used the Maven command below:
**mvn -e -X liquibase:generateChangeLog -Dliquibase.diffTypes=data -DargLine="-Xms10G -Xmx20G -XX:-UseGCOverheadLimit"**
The last line I can see in the log before it fails is:
[DEBUG] Executing with the 'jdbc' executor
When I checked the same run in Java VisualVM, I found that the **liquibase.change.ColumnConfig** class is consuming 80% of the heap, which looks like a memory leak.
On top of this, when I checked the Java memory spaces, the Old Gen space is 100% utilized and never goes down, even after 50 minutes of execution.
Is there a way to export data for only a few specific tables? If so, please help with the Maven command. I tried the command below, but it didn't work for me.
**mvn -e -X liquibase:generateChangeLog -Dliquibase.diffTypes=data,table -Dliquibase.includeObjects="table:myTable" -DargLine="-Xms10G -Xmx20G -XX:-UseGCOverheadLimit"**
The version of Liquibase I am using is 4.2.0
Any help is highly appreciated.
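One thing worth checking: as far as I know, the Liquibase Maven plugin runs inside the Maven JVM itself, so heap options passed via -DargLine (which the Surefire/Failsafe plugins read for forked test JVMs) may never reach it. A sketch of the alternative, where MYTABLE is a placeholder for your table name:

```shell
# Sketch: give the Maven JVM itself the large heap, since the Liquibase
# plugin runs in-process; -DargLine is only read by Surefire/Failsafe.
# MYTABLE is a placeholder for your table name.
export MAVEN_OPTS="-Xms4G -Xmx16G"
mvn liquibase:generateChangeLog \
  -Dliquibase.diffTypes=data \
  -Dliquibase.includeObjects="table:MYTABLE"
```

This is only a sketch against a live database, so adjust the heap sizes and property values to your environment.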

Related

Flume Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

arun#arun-admin:/usr/lib/apache-flume-1.6.0-bin/bin$ ./flume-ng agent --conf ./conf/ -f /usr/lib/apache-flume-1.6.0properties -Dflume.root.logger=DEBUG,console -n agent
Info: Including Hadoop libraries found via (/usr/share/hadoop/bin/hadoop) for HDFS access
Info: Excluding /usr/share/hadoop/share/hadoop/common/lib/slf4j-api-1.7.10.jar from classpath
Info: Excluding /usr/share/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar from classpath
Info: Including Hive libraries found via (/usr/lib/apache-hive-3.1.2-bin) for Hive access
+ exec /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xmx20m -Dflume.root.logger=DEBUG,console -cp './conf/:/usr/lib/apache-flume-1.6.0-bin/lib/*:<long Hadoop and Hive classpath elided>:/usr/lib/apache-hive-3.1.2-bin/lib/*' -Djava.library.path=:/usr/share/hadoop/lib org.apache.flume.node.Application -f
It is reporting an OutOfMemoryError, so change your Xmx value when running the application. Currently you are giving it only 20 MB via -Xmx20m, and that is probably not enough memory for this job. Change it to a higher value, say 1000 MB (-Xmx1000m), and see if that helps.
You need to find the right value for this setting. That is possible if you know the size of the data that has to flow; if you cannot anticipate that, trial and error is the only option.
You can try increasing the heap size in your Flume command by passing -Xmx512m. If you still face the same error, increase it further, to -Xmx1000m.
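For what it's worth, a common way to raise the agent heap is through flume-env.sh rather than editing the launcher script; a sketch, assuming the standard Flume layout:

```shell
# conf/flume-env.sh -- sourced by the flume-ng launcher at startup.
# Raises the agent heap from the 20 MB seen in the exec line above to 1 GB.
export JAVA_OPTS="-Xms256m -Xmx1000m"
```

The exact path is an assumption based on a stock apache-flume install; pass --conf pointing at the directory that contains this file.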

JMETER : ERROR - jmeter.JMeter: Uncaught exception: java.lang.OutOfMemoryError: GC overhead limit exceeded

I am running a 6450-user test in a distributed environment on AWS Ubuntu machines.
I am getting the following error when the test reaches peak load:
ERROR - jmeter.JMeter: Uncaught exception: java.lang.OutOfMemoryError: GC overhead limit exceeded
Machine Details:
m4.4xlarge
HEAP="-Xms512m -Xmx20480m" (jmeter.sh file)
I allocated 20 GB for the heap size in jmeter.sh.
But when I run ps -eaf | grep java, it gives the following response:
root 11493 11456 56 15:47 pts/9 00:00:03 java -server -XX:+HeapDumpOnOutOfMemoryError -Xms512m -Xmx512m -XX:MaxTenuringThreshold=2 -XX:PermSize=64m -XX:MaxPermSize=128m -XX:+CMSClassUnloadingEnabled -jar ./ApacheJMeter.jar
I have no idea what changes I need to make now.
Make the change in the jmeter file, not in jmeter.sh: as you can see from ps, your setting is not being applied.
Also with such a heap you may need to add:
-XX:-UseGCOverheadLimit
And switch to G1 garbage collector algorithm.
Also check that you follow these recommendations:
http://jmeter.apache.org/usermanual/best-practices.html
http://www.ubik-ingenierie.com/blog/jmeter_performance_tuning_tips/
First of all, the answer is in your question: you say that ps -eaf | grep java shows this:
XX:+HeapDumpOnOutOfMemoryError -Xms512m -Xmx512m
That heap is still very low. So either you changed jmeter.sh but use some other shell script to actually start JMeter, or you didn't change it in a valid way, so JMeter falls back to its defaults.
On top of that, I really doubt you can run 6450 users on one machine unless your script is very light. An unconfigured machine can usually handle 200-400 users; a well-configured machine can probably deal with up to 2000.
You need to amend the line in the jmeter file, not the jmeter.sh file. Locate the HEAP="-Xms512m -Xmx512m" line and update the Xmx value accordingly.
Also ensure you're starting JMeter using the jmeter file.
If you have an environment which explicitly relies on the jmeter.sh file, you should amend the HEAP size a little differently, like:
export JVM_ARGS="-Xms512m -Xmx20480m" && ./jmeter.sh
or add the relevant line to jmeter.sh file.
See the JMeter Best Practices and 9 Easy Solutions for a JMeter Load Test “Out of Memory” Failure articles for comprehensive information on tuning JMeter.
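A quick way to confirm whether the new heap actually took effect is to pull the -Xmx flag out of the running process's command line. A minimal sketch — the sample string below mimics the ps output quoted above; in practice you would feed it ps -eaf | grep ApacheJMeter:

```shell
# Extract the effective max-heap flag from a java command line.
extract_xmx() {
  echo "$1" | grep -o 'Xmx[0-9]*[mg]'
}

extract_xmx 'java -server -XX:+HeapDumpOnOutOfMemoryError -Xms512m -Xmx512m -jar ./ApacheJMeter.jar'
# prints: Xmx512m
```

If this still prints Xmx512m after your edit, the file you changed is not the one your launcher reads.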

No output files from mahout

I am running a Mahout RecommenderJob on Hadoop in Syncfusion. I get the following, but no output; it seems to run indefinitely.
Does anyone have an idea why I am not getting an output.txt from this? Why does it seem to run indefinitely?
I suspect this could be due to insufficient disk space on your machine; in that case, I'd suggest you clean up your disk space and try again.
Alternatively, I'd suggest using the Syncfusion Cluster Manager, with which you can form a cluster of multiple nodes/machines so that there is sufficient memory available to execute your job.
-Ramkumar
I've tested the same MapReduce job you're trying to execute using Syncfusion BigData Studio, and it worked for me.
Please find the input details I've used below.
Command:
hadoop jar E:\mahout-examples-0.12.2-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -s SIMILARITY_COOCCURRENCE --input=/Input.txt --output=output
Sample input (Input.txt):
For input data, I've used the data available on the Apache Mahout site (see the link below) and saved it in a text file.
http://mahout.apache.org/users/recommender/userbased-5-minutes.html
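For reference, the user-based example data on that page is plain CSV in userID,itemID,rating form; an illustrative fragment (these particular values are made up, not the actual dataset):

```text
1,10,1.0
1,11,2.0
2,10,1.0
2,12,5.0
```

RecommenderJob expects one such preference triple per line in the input file.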
I've also seen the misspelled word "COOCCURRENCE" used in your command. Please correct it, or you could face a ClassNotFoundException.
Output:
Please find the generated output from below.
-Ramkumar :)

Error: Failed to create Data Storage while running embedded pig in java

I wrote a simple program to test the embedded pig in java to run in mapreduce mode.
The hadoop version in the server I am running is 0.20.2-cdh3u4a, and pig version is 0.10.0-cdh3u4a.
When I try to run it in local mode, it runs successfully, but in mapreduce mode it gives me an error.
I compile and run my program using the following commands, as shown in http://pig.apache.org/docs/r0.9.1/cont.html#embed-java:
javac -cp pig.jar EmbedPigTest.java
java -cp pig.jar:.:/etc/hadoop/conf EmbedPigTest input.txt
My program gives the following error:
Exception in thread "main" java.lang.RuntimeException: Failed to create DataStorage
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:214)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:134)
at org.apache.pig.impl.PigContext.connect(PigContext.java:183)
at org.apache.pig.PigServer.<init>(PigServer.java:226)
at org.apache.pig.PigServer.<init>(PigServer.java:215)
at org.apache.pig.PigServer.<init>(PigServer.java:211)
at org.apache.pig.PigServer.<init>(PigServer.java:207)
at WordCount.main(EmbedPigTest.java:9)
Some online resources say that this problem occurs due to a Hadoop version mismatch, but I don't understand what I should do. Suggestions, please!
This is happening because you are linking against the wrong jar. Please see the link below; it describes this issue very well.
http://localsteve.wordpress.com/2012/09/30/embedding-pig-for-cdh4-java-apps-fer-realz/
I faced the same kind of issue when I tried to use Pig in mapreduce mode without starting the Hadoop services.
Please check that all services are running (using jps) before using Pig in mapreduce mode.
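A small sketch of that check — the daemon names assume a classic Hadoop 0.20.x setup (NameNode, DataNode, JobTracker, TaskTracker):

```shell
# Report any expected Hadoop daemon missing from jps output.
check_daemons() {
  for d in NameNode DataNode JobTracker TaskTracker; do
    echo "$1" | grep -q "$d" || echo "missing: $d"
  done
}

# Typical usage before launching Pig in mapreduce mode:
#   check_daemons "$(jps)"
check_daemons "12345 NameNode
12399 Jps"
```

If anything is reported missing, start the corresponding daemon before launching your embedded Pig program.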

Hue Hive -- Beeswax Server Can't Find JDBC Driver for MySQL

We're using Cloudera 3.7.5 and having a tough time configuring the Beeswax server so that Hue can access the Hive databases. I followed all the instructions in the Cloudera documentation for setting up MySQL as Hive's metastore, but when I restart the Hue services and check the Beeswax server's StdErr logs, I still see the painful "javax.jdo.JDOFatalInternalException: Error creating transactional connection factory", which is caused by:
org.datanucleus.exceptions.NucleusException: Attempt to invoke the "DBCP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
This is bizarre to me, because the logs also indicate that the environment variable HIVE_HOME is "/usr/lib/hive", and sure enough I have copied "mysql-connector-java-5.1.15-bin.jar" into the /usr/lib/hive/lib directory, as the documentation dictates.
I have also tried the instructions in the blog post http://hadoopchallenges.blogspot.com/2011/03/hue-120-upgrade-and-beeswax.html, which involved copying the mysql-connector jar into "/usr/share/hue/apps/beeswax/hive/lib/". Unfortunately, I did not have a hive/lib subdirectory in the beeswax folder, so I attempted to create one. This also did not work.
Any advice on how I can get the MySQL JDBC library onto Beeswax's classpath?
We finally decided to just bite the bullet and upgrade to CDH4. Placing the JDBC jar in /usr/share/hive/lib allowed the Beeswax server to function without issue.
If anyone else is experiencing this issue, I recommend upgrading from CDH3 to CDH4: the UI is much cleaner and smoother, and we had far fewer installation and maintenance bugs with CDH4.
You have to paste your mysql connector into HUE_HOME/apps/beeswax/hive/lib.
If this path doesn't exist, create hive/lib and then paste the mysql connector there. I hope your problem will be solved.
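Spelled out as commands — a sketch only, since the HUE_HOME value and connector version here are assumptions; adjust both to your install:

```shell
# Placeholder paths: adjust HUE_HOME and the connector jar to your install.
HUE_HOME=/usr/share/hue
mkdir -p "$HUE_HOME/apps/beeswax/hive/lib"
cp mysql-connector-java-5.1.15-bin.jar "$HUE_HOME/apps/beeswax/hive/lib/"
```

Restart the Beeswax/Hue services afterwards so the new jar is picked up.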
When you start using Cloudera 4.5, everything moves into parcels, so this exact problem on my Hive metastore server was fixed by the command below. Essentially you're just re-adding the modules. I'm sure you can modify the extra classpath in the Hive config file to make this robust to parcel updates.
cp /usr/lib/hive/lib/mysql-connector-java-5.1.17-bin.jar /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hive/lib/.
So a real fix might be something like this:
cp `locate mysql-connector | grep jar | head -n 1` /opt/cloudera/parcels/*/lib/hive/lib/.
which would copy the jar into every parcel.
