Invoke only the OpenIE module from the CoreNLP server

I'd like to invoke only the OpenIE module once the CoreNLP server is up. I tried this from the shell:
$ java -mx4g -cp "$HOME/corenlp/*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer &
$ java -cp "$CORE/*" -Xmx1g edu.stanford.nlp.pipeline.StanfordCoreNLPClient edu.stanford.nlp.naturalli.OpenIE -file inputfile.txt
After a few seconds the logs freeze and nothing happens. Can someone help me, please?

You can't do that with the StanfordCoreNLPClient; you need to run a pipeline. You can find full instructions for using the client here:
http://stanfordnlp.github.io/CoreNLP/corenlp-server.html
For example:
java -Xmx4g edu.stanford.nlp.pipeline.StanfordCoreNLPClient -cp "*" -annotators tokenize,ssplit,pos,lemma,ner,depparse,natlog,openie -file input.txt -backends localhost:9000
Note that the OpenIE extractor requires everything in the pipeline before it, so there is no extra cost to running this part of the pipeline: tokenize,ssplit,pos,lemma,ner,depparse,natlog
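If you just want to query the already running server over HTTP instead of going through the client class, something along these lines should also work (a sketch, assuming the server is listening on its default port 9000; the example sentence is arbitrary):
# POST raw text to the server and request the OpenIE pipeline as JSON
wget --post-data 'Obama was born in Hawaii.' \
  'localhost:9000/?properties={"annotators":"tokenize,ssplit,pos,lemma,ner,depparse,natlog,openie","outputFormat":"json"}' -O -
The JSON response should contain an openie field per sentence with the extracted triples.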

Related

Jenkins restart Tomcat

I'm trying to restart Tomcat after a deploy. I wrote a shell script:
PID=$(ps -aux | grep tomcat-7.0.72 | grep java | awk ' { print $2 } ');
cd /var/lib/apache-tomcat-7.0.72/bin
kill -9 $PID
./startup.sh
which should kill the Tomcat process and run startup.sh. After running this job, in the console I can see:
Using JAVA_OPTS: -server -Xms2g -Xmx8g -XX:PermSize=1024m -XX:+UseParallelGC -XX:NewRatio=3 -Djavax.servlet.request.encoding=UTF-8 -Dfile.encoding=UTF-8 -Dcom.sun.management.jmxremote.port=9090 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Djava.rmi.server.hostname=192.168.1.30 -Djsse.enableSNIExtension=false
Tomcat started.
Finished: SUCCESS
This log suggests that everything worked correctly, but Tomcat is not started.
When I run ./startup.sh from a console I can see:
Using JAVA_OPTS: -server -Xms2g -Xmx8g -XX:PermSize=1024m -XX:+UseParallelGC -XX:NewRatio=3 -Djavax.servlet.request.encoding=UTF-8 -Dfile.encoding=UTF-8 -Dcom.sun.management.jmxremote.port=9090 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Djava.rmi.server.hostname=192.168.1.30 -Djsse.enableSNIExtension=false
Using CATALINA_BASE: /var/lib/apache-tomcat-7.0.72
Using CATALINA_HOME: /var/lib/apache-tomcat-7.0.72
Using CATALINA_TMPDIR: /var/lib/apache-tomcat-7.0.72/temp
Using JRE_HOME: /usr/lib/jvm/java-8-openjdk-amd64
Using CLASSPATH: /var/lib/apache-tomcat-7.0.72/bin/bootstrap.jar:/var/lib/apache-tomcat-7.0.72/bin/tomcat-juli.jar
Tomcat started.
I found what the problem was. Jenkins kills processes that were started during the job. To turn this off I added "export BUILD_ID=dontKillMe" at the start of the shell script.
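For reference, a minimal sketch of the adjusted build step, based on the script above:
#!/bin/bash
# Tell Jenkins' process tree killer not to clean up processes started by this build
export BUILD_ID=dontKillMe

PID=$(ps aux | grep tomcat-7.0.72 | grep java | awk '{ print $2 }')
cd /var/lib/apache-tomcat-7.0.72/bin
kill -9 "$PID"
./startup.sh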

How can I fix a ClassNotFoundException when executing an HBase Java application from the command line?

I don't know anything about bash, but I put together a script to help me run my HBase Java application:
#!/bin/bash
HADOOP_CLASSPATH="$(hbase classpath)"
hadoop jar my.jar my_pkg.my_class
When I run it I get:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy
When I echo out the HADOOP_CLASSPATH I see that hbase-server-1.2.0-cdh5.8.0.jar is there...
Is the hadoop jar command ignoring the HADOOP_CLASSPATH?
I have also tried running the commands from the command line instead of using my script, and I get the same error.
The approach was inspired by this Cloudera question.
The solution was to set HADOOP_CLASSPATH on the same line as the hadoop command. I am not certain what the difference is, but this works:
HADOOP_CLASSPATH="$(hbase classpath)" hadoop jar my.jar my_pkg.my_class

Spark submit with master as yarn-client (windows) gives Error "Could not find or load main class"

I have installed Hadoop 2.7.1 with Spark 1.4.1 on Windows 8.1.
When I execute the command below
cd spark
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client lib/spark-examples*.jar 10
I get the error below in the JobHistoryServer log:
Error: Could not find or load main class '-Dspark.externalBlockStore.folderName=spark-262c4697-ef0c-4042-af0c-8106b08574fb'
I did further debugging (along with searching the net) and got hold of the container cmd script, in which the sections below appear (other lines are omitted):
...
#set CLASSPATH=C:/tmp/hadoop-xyz/nm-local-dir/usercache/xyz/appcache/application_1487502025818_0003/container_1487502025818_0003_02_000001/classpath-3207656532274684591.jar
...
#call %JAVA_HOME%/bin/java -server -Xmx512m -Djava.io.tmpdir=%PWD%/tmp '-Dspark.fileserver.uri=http://192.168.1.2:34814' '-Dspark.app.name=Spark shell' '-Dspark.driver.port=34810' '-Dspark.repl.class.uri=http://192.168.1.2:34785' '-Dspark.driver.host=192.168.1.2' '-Dspark.externalBlockStore.folderName=spark-dd9f3f84-6cf4-4ff8-b0f6-7ff84daf74bc' '-Dspark.master=yarn-client' '-Dspark.driver.appUIAddress=http://192.168.1.2:4040' '-Dspark.jars=' '-Dspark.executor.id=driver' -Dspark.yarn.app.container.log.dir=/dep/logs/userlogs/application_1487502025818_0003/container_1487502025818_0003_02_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg '192.168.1.2:34810' --executor-memory 1024m --executor-cores 1 --num-executors 2 1> /dep/logs/userlogs/application_1487502025818_0003/container_1487502025818_0003_02_000001/stdout 2> /dep/logs/userlogs/application_1487502025818_0003/container_1487502025818_0003_02_000001/stderr
I checked the relevant files for the CLASSPATH and they look OK. The main class org.apache.spark.deploy.yarn.ExecutorLauncher is available in the Spark assembly jar, which is included in the container's classpath jar.
So, what could be the issue here? I searched the net and found many discussions, but they are all for Unix variants; there is not much for Windows. I am wondering whether spark-submit really works on Windows in yarn-client mode without any special setup (standalone cluster mode works).
By the way, if I run the above java command from a cmd.exe prompt, I get the same error, since all the command-line arguments are quoted with single quotes instead of double quotes (only double quotes act as quoting there), so is this a bug?
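To illustrate the quoting point with a minimal, hypothetical example (not taken from the actual container script): cmd.exe does not treat single quotes as quoting characters, so the JVM receives a literal token beginning with ' and interprets it as the main class name, which is exactly the error shown above; double quotes work as expected:
REM fails: java treats the single-quoted token as the main class name
java '-Dspark.externalBlockStore.folderName=foo' -version
REM works: cmd.exe strips the double quotes and java sees a normal -D option
java "-Dspark.externalBlockStore.folderName=foo" -version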
Note that spark-shell also fails in yarn mode, but the yarn jar ... command works.
It looks like this was a defect in the earlier versions. With the latest Hadoop 2.7.3 and Spark 2.1.0 it works correctly. I could not find any reference for the fix, though.

Windows: Apache Spark History Server Config

I wanted to use Spark's History Server to make use of the logging mechanisms of my Web UI, but I am having difficulty running it on my Windows machine.
I have done the following:
Set my spark-defaults.conf file to reflect
spark.eventLog.enabled=true
spark.eventLog.dir=file://C:/spark-1.6.2-bin-hadoop2.6/logs
spark.history.fs.logDirectory=file://C:/spark-1.6.2-bin-hadoop2.6/logs
My spark-env.sh to reflect:
SPARK_LOG_DIR "file://C:/spark-1.6.2-bin-hadoop2.6/logs"
SPARK_HISTORY_OPTS "-Dspark.history.fs.logDirectory=file://C:/spark-1.6.2-bin-hadoop2.6/logs"
I am using Git-BASH to run the start-history-server.sh file, like this:
USERA#SYUHUH MINGW64 /c/spark-1.6.2-bin-hadoop2.6/sbin
$ sh start-history-server.sh
And, I get this error:
USERA#SYUHUH MINGW64 /c/spark-1.6.2-bin-hadoop2.6/sbin
$ sh start-history-server.sh
C:\spark-1.6.2-bin-hadoop2.6/conf/spark-env.sh: line 69: SPARK_LOG_DIR: command not found
C:\spark-1.6.2-bin-hadoop2.6/conf/spark-env.sh: line 70: SPARK_HISTORY_OPTS: command not found
ps: unknown option -- o
Try `ps --help' for more information.
starting org.apache.spark.deploy.history.HistoryServer, logging to C:\spark-1.6.2-bin-hadoop2.6/logs/spark--org.apache.spark.deploy.history.HistoryServer-1-SGPF02M9ZB.out
ps: unknown option -- o
Try `ps --help' for more information.
failed to launch org.apache.spark.deploy.history.HistoryServer:
Spark Command: C:\Program Files (x86)\Java\jdk1.8.0_91\bin\java -cp C:\spark-1.6.2-bin-hadoop2.6/conf\;C:\spark-1.6.2-bin-hadoop2.6/lib/spark-assembly-1.6.2-hadoop2.6.0.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-api-jdo-3.2.6.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-core-3.2.10.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-rdbms-3.2.9.jar -Xms1g -Xmx1g org.apache.spark.deploy.history.HistoryServer
========================================
full log in C:\spark-1.6.2-bin-hadoop2.6/logs/spark--org.apache.spark.deploy.history.HistoryServer-1-SGPF02M9ZB.out
The full log from the output can be found below:
Spark Command: C:\Program Files (x86)\Java\jdk1.8.0_91\bin\java -cp C:\spark-1.6.2-bin-hadoop2.6/conf\;C:\spark-1.6.2-bin-hadoop2.6/lib/spark-assembly-1.6.2-hadoop2.6.0.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-api-jdo-3.2.6.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-core-3.2.10.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-rdbms-3.2.9.jar -Xms1g -Xmx1g org.apache.spark.deploy.history.HistoryServer
========================================
I am running a SparkR script where I initialize my Spark context and then call init().
Please advise whether I should be running the history server before I run my Spark script.
Pointers and tips on how to proceed (with respect to logging) would be greatly appreciated.
On Windows you'll need to run Spark's .cmd files, not the .sh ones. From what I saw, there is no .cmd script for the Spark history server, so it basically needs to be started manually.
I followed the history server's Linux script, and to run it manually on Windows you need to take the following steps:
All history server configuration should be set in the spark-defaults.conf file (remove the .template suffix), as described below.
Go to the Spark conf directory and add the spark.history.* configuration to %SPARK_HOME%/conf/spark-defaults.conf, as follows:
spark.eventLog.enabled true
spark.history.fs.logDirectory file:///c:/logs/dir/path
After the configuration is finished, run the following command from %SPARK_HOME%:
bin\spark-class.cmd org.apache.spark.deploy.history.HistoryServer
The output should look something like this:
16/07/22 18:51:23 INFO Utils: Successfully started service on port 18080.
16/07/22 18:51:23 INFO HistoryServer: Started HistoryServer at http://10.0.240.108:18080
16/07/22 18:52:09 INFO ShutdownHookManager: Shutdown hook called
Hope that it helps! :-)
In case anyone gets the following exception:
17/05/12 20:27:50 ERROR FsHistoryProvider: Exception encountered when attempting to load application log file:/C:/Spark/Logs/spark--org.apache.spark.deploy.history.HistoryServer-1-Arsalan-PC.out
java.lang.IllegalArgumentException: Codec [out] is not available. Consider setting spark.io.compression.codec=snappy
        at org.apache.spark.io.CompressionCodec$$anonfun$createCodec$1.apply(Com
just go to %SPARK_HOME%/conf/spark-defaults.conf and set:
spark.eventLog.compress false
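Putting the two answers together, a minimal spark-defaults.conf for this setup might look like the following (the log directory path is only an example and must exist):
spark.eventLog.enabled           true
spark.eventLog.compress          false
spark.eventLog.dir               file:///c:/logs/dir/path
spark.history.fs.logDirectory    file:///c:/logs/dir/path
with the history server then started via bin\spark-class.cmd org.apache.spark.deploy.history.HistoryServer as described above.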

JVM Args in java web start application

I have a JavaFX application which is deployed via Java Web Start.
I need to pass GC VM args for the application, and I am having issues doing so.
I have the following in my JNLP:
<j2se version="1.8+" href="http://java.sun.com/products/autodl/j2se" initial-heap-size="1024m" max-heap-size="1024m" java-vm-args="-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC"/>
When the application starts, it looks like most of them are not passed to the VM.
ps -ef | grep java gives the output below:
133768645 2448 1 0 4:31PM ttys020 0:37.80 /Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home/bin/java -XX:+DisableExplicitGC -XX:CMSInitiatingOccupancyFraction=75 -Xmx1g -Xms1g
The min & max heap gets set as expected but not all the other VM arguments.
Can u please let me know why the other vm args are not being passed to the VM ?
Am I doing something wrong ?
Appreciate your help.
Thanks
Make sure the JNLP file you changed is the one javaws is actually using. If there is an href attribute in the jnlp file's header, Java Web Start will take the JNLP file from there even if you launch it from your local machine.
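For reference, the href in question is the one on the root jnlp element; a hypothetical skeleton (codebase, href, jar, and class names are placeholders):
<!-- If this href points at a copy on the server, javaws reloads that copy,
     and local edits to java-vm-args are silently ignored -->
<jnlp spec="1.0+" codebase="http://example.com/app" href="app.jnlp">
  <information>
    <title>My App</title>
    <vendor>Example</vendor>
  </information>
  <resources>
    <j2se version="1.8+" initial-heap-size="1024m" max-heap-size="1024m"
          java-vm-args="-XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError"/>
    <jar href="app.jar" main="true"/>
  </resources>
  <application-desc main-class="com.example.Main"/>
</jnlp>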
