Elasticsearch standalone JDBC river feeder missing main class

I'm trying to set up the feeder following these instructions: https://github.com/jprante/elasticsearch-jdbc#installation
I downloaded and unzipped the feeder.
I don't quite understand this step:
run script with a command that starts org.xbib.tools.JDBCImporter with the lib directory on the classpath
What am I supposed to do?
If I try to run a sample script from bin I get:
Bad substitution
Error: Could not find or load main class org.xbib.elasticsearch.plugin.jdbc.feeder.Runner
Where do I get the Java classes org.xbib.elasticsearch.plugin.jdbc.feeder.Runner and
org.xbib.elasticsearch.plugin.jdbc.feeder.JDBCFeeder?

I figured out the solution:
it was to set the installation folder in the script (not the Elasticsearch folder but the JDBC importer folder!)
#!/bin/bash
#JDBC Directory -> important, change accordingly!
export JDBC_IMPORTER_HOME=~/Downloads/elasticsearch-jdbc-1.6.0.0
bin=$JDBC_IMPORTER_HOME/bin
lib=$JDBC_IMPORTER_HOME/lib
echo '{
...
...
}
}' | java \
-cp "${lib}/*" \
-Dlog4j.configurationFile=${bin}/log4j2.xml \
org.xbib.tools.Runner \
org.xbib.tools.JDBCImporter
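For reference, here is a minimal sketch of what the piped-in definition might look like; the JDBC URL, credentials, SQL statement, and index/type names are placeholder assumptions (not the settings elided above), so substitute your own:
#!/bin/bash
# Hypothetical importer definition, assuming a MySQL source; adjust the URL,
# credentials, SQL, and target index/type to your own setup.
export JDBC_IMPORTER_HOME=~/Downloads/elasticsearch-jdbc-1.6.0.0
bin=$JDBC_IMPORTER_HOME/bin
lib=$JDBC_IMPORTER_HOME/lib
echo '{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mysql://localhost:3306/mydb",
        "user" : "dbuser",
        "password" : "dbpass",
        "sql" : "select * from products",
        "index" : "myindex",
        "type" : "mytype"
    }
}' | java \
    -cp "${lib}/*" \
    -Dlog4j.configurationFile=${bin}/log4j2.xml \
    org.xbib.tools.Runner \
    org.xbib.tools.JDBCImporter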

Related

Grpc Go Generated.pb.go import was not formatted

I imported a proto file (validator.proto) from one of my projects, https://github.com/maanasasubrahmanyam-sd/customValidation, into another project (test.proto), https://github.com/maanasasubrahmanyam-sd/customeValTest/tree/master/interfaces/test_server:
go get github.com/maanasasubrahmanyam-sd/customValidation
protoc \
-I. \
-I $GOPATH/src/ \
--proto_path=${GOPATH}/pkg/mod/github.com/envoyproxy/protoc-gen-validate@v0.1.0 \
--proto_path=${GOPATH}/src/github.com/google/protobuf/src \
--go_out="plugins=grpc:./generated" \
--validate_out="lang=go:./generated" \
--govalidators_out=. \
./interfaces/test_server/*.proto
The correct import should come out as github.com/maanasasubrahmanyam-sd/customValidation/validator, but the test.pb.go import was coming out as _ "./validator", which shows a red underline in GoLand.
EDIT - All the pb.go files are showing errors in them. I suspect it is due to the bad import.
I googled it but did not find any relevant information. Any suggestions, experts?
You can address the proto path in two ways.
One: if your imported proto file is local, move it into your parent directory and address it from the parent path, like this:
- parentDirectory
-- directory1
--- proto1.proto
-- importDirectory
--- proto2.proto
You can build this file (proto1.proto) with this command:
protoc --proto_path=parentDirectory/directory1 --proto_path=parentDirectory --go-grpc_out=***your output path*** --go_out=***your output path*** parentDirectory/directory1/proto1.proto
Also, if you use GoLand you need to add parentDirectory to your settings (File | Settings | Languages & Frameworks | Protocol Buffers), then uncheck Configure automatically and add your parent path.
Two: if your imported proto file lives at a URL (an external module path), you can map it in your build command like this:
protoc --proto_path=src \
--go_opt=Mprotos/buzz.proto=example.com/project/protos/fizz \
--go_opt=Mprotos/bar.proto=example.com/project/protos/foo \
protos/buzz.proto protos/bar.proto
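Applied to the original question, a hedged sketch along the same lines (the mapping key and output directory are assumptions; the key must match the path exactly as it appears in test.proto's import statement, and the other *_out flags from the original command are omitted for brevity):
protoc \
-I. \
-I $GOPATH/src/ \
--go_out="plugins=grpc,Mvalidator.proto=github.com/maanasasubrahmanyam-sd/customValidation/validator:./generated" \
./interfaces/test_server/*.proto
Equivalently, declaring option go_package = "github.com/maanasasubrahmanyam-sd/customValidation/validator"; inside validator.proto makes the generated import absolute without any M flag.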

How to add extensions when running NetLogo headlessly on a cluster?

I am using a common NetLogo extension, "CSV", to read a table. The job fails because it cannot find the extension (although I am sure the extension file is present).
How do I specify that I want to use an extension when working with NetLogo headlessly?
Here is my script:
#!/bin/bash
module load jdk-13.0.2
java -Xmx1024m -Dfile.encoding=UTF-8 -cp \
/opt/software/uoa/2019/apps/netlogo/netlogo-6.1.0/app/netlogo-6.1.0.jar \
org.nlogo.headless.Main \
--model /uoa/home/s11as6/Desktop/SABM-v.8.4-NL6.1.0.nlogo \
--experiment dqi_stability_exp \
--table SABM-table-results.csv \
--threads 1
Here is the error log:
Exception in thread "main" Can't find extension: csv at position 12 in
at org.nlogo.core.ErrorSource.signalError(ErrorSource.scala:11)
at org.nlogo.workspace.ExtensionManager.importExtension(ExtensionManager.scala:178)
at org.nlogo.parse.StructureParser$.$anonfun$parsingWithExtensions$1(StructureParser.scala:74)
at org.nlogo.parse.StructureParser$.$anonfun$parsingWithExtensions$1$adapted(StructureParser.scala:68)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.nlogo.parse.StructureParser$.parsingWithExtensions(StructureParser.scala:68)
at org.nlogo.parse.StructureParser$.parseSources(StructureParser.scala:33)
at org.nlogo.parse.NetLogoParser.basicParse(NetLogoParser.scala:17)
at org.nlogo.parse.NetLogoParser.basicParse$(NetLogoParser.scala:15)
at org.nlogo.parse.FrontEnd$.basicParse(FrontEnd.scala:10)
at org.nlogo.parse.FrontEndMain.frontEnd(FrontEnd.scala:26)
at org.nlogo.parse.FrontEndMain.frontEnd$(FrontEnd.scala:25)
at org.nlogo.parse.FrontEnd$.frontEnd(FrontEnd.scala:10)
at org.nlogo.compile.CompilerMain$.compile(CompilerMain.scala:43)
at org.nlogo.compile.Compiler.compileProgram(Compiler.scala:54)
at org.nlogo.headless.HeadlessModelOpener.openFromModel(HeadlessModelOpener.scala:50)
at org.nlogo.headless.HeadlessWorkspace.openModel(HeadlessWorkspace.scala:539)
at org.nlogo.headless.HeadlessWorkspace.open(HeadlessWorkspace.scala:506)
at org.nlogo.headless.Main$.newWorkspace$1(Main.scala:18)
at org.nlogo.headless.Main$.runExperiment(Main.scala:21)
at org.nlogo.headless.Main$.$anonfun$main$1(Main.scala:12)
at org.nlogo.headless.Main$.$anonfun$main$1$adapted(Main.scala:12)
at scala.Option.foreach(Option.scala:274)
at org.nlogo.headless.Main$.main(Main.scala:12)
at org.nlogo.headless.Main.main(Main.scala)
slurmstepd: error: *** JOB 414910 ON hmem05 CANCELLED AT 2020-04-16T18:15:09 DUE TO TIME LIMIT ***
The simplest solution was to place the folder of the CSV extension (along with its files) in the same directory as the model.
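As a hedged sketch of that layout (the destination mirrors the paths in the job script above; NETLOGO_HOME is a placeholder for wherever NetLogo 6.1.0 is installed, e.g. the /opt/software/... path in the script, so locate the csv folder in your own install):
# Copy the bundled csv extension folder next to the model so the headless parser finds it.
cp -r $NETLOGO_HOME/app/extensions/csv /uoa/home/s11as6/Desktop/
# Expected layout afterwards:
# /uoa/home/s11as6/Desktop/SABM-v.8.4-NL6.1.0.nlogo
# /uoa/home/s11as6/Desktop/csv/csv.jar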

Slowdown when using Mutect2 container inside Nextflow

I'm trying to run MuTect2 on a sample, which on my machine using Java takes about 27 minutes to run.
If I use virtually the same code, but inside Nextflow and using the GATK3:3.6 docker container to run MuTect, it takes 7 minutes longer, for no apparent reason.
Running on Ubuntu 18.04, the tumor and normal samples are from an Oncomine panel. Tumor is 4.1G, normal is 1.1G. I thought the time might be spent copying data into the container, but 7-8 minutes seems far too long for that. Could it be from copying in reference files too?
bai_ch is the channel that brings in the tumor and normal index files
process MuTect2 {
    label 'mutect'
    stageInMode 'copy'
    publishDir './output', mode : 'copy', overwrite : true

    input:
    file tumor_bam_mu from tumor_mu
    file normal_bam_mu from normal_mu
    file "*" from bai_ch
    file mutect2_ref
    file ref_index from ref_fasta_i_m
    file ref_dict from Channel.fromPath(params.ref_fast_dict)
    file regions_file from Channel.fromPath(params.regions)
    file cosmic_vcf from Channel.fromPath(params.cosmic_vcf)
    file dbsnp_vcf from Channel.fromPath(params.dbsnp_vcf)
    file normal_vcf from Channel.fromPath(params.normal_vcf)

    output:
    file '*' into mutect_ch

    script:
    """
    ls
    echo MuTect2 task path: \$PWD
    java -jar /usr/GenomeAnalysisTK.jar \
        --analysis_type MuTect2 \
        --reference_sequence hg19.fa \
        -L designed.bed \
        --normal_panel normal_panel.vcf \
        --cosmic Cosmic.vcf \
        --dbsnp dbsnp.vcf \
        --input_file:tumor $tumor_bam_mu \
        -o mutect2.somatic.unfiltered.vcf \
        --input_file:normal $normal_bam_mu \
        --max_alt_allele_in_normal_fraction 0.1 \
        --minPruning 10 \
        --kmerSize 60
    """
}
My only thought is to create my own Docker image that has the reference files baked in, which would probably save the time spent copying them in. I'd expect the Nextflow+container version to run only slightly slower than the CLI version.
Check the task Bash wrapper in the task work directory to assess the performance issue.
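As a hedged sketch (the work-directory hash below is a placeholder; take the real path from .nextflow.log or the nextflow log command), the wrapper and its companion files show where the extra minutes go, e.g. input staging versus the java call itself:
cd work/ab/1234567890abcdef   # placeholder: the MuTect2 task work dir from your run
less .command.run             # the Bash wrapper Nextflow generates (input staging, container launch)
less .command.sh              # the actual script block that was executed
less .command.out             # the task's stdout, useful for timing the GATK step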

Launch TDCH to load data from a Hive Parquet table to Teradata

I need to load data from Hive tables, which are stored as Parquet files, into a Teradata database using TDCH (Teradata Connector for Hadoop). I use TDCH 1.5.3, CDH 5.8.3, and Hive 1.1.0.
I try to start TDCH using the hadoop jar command and get the error:
java.lang.ClassNotFoundException:
org.apache.parquet.hadoop.util.ContextUtil
Does anybody have any idea why this happens?
Looking at your problem, you might not have all the Hive libraries needed to be able to upload to Teradata.
Here is an example of a script that could be used for exporting from Hive to TD.
#!/bin/bash
## Declare Hive Source and Teradata Target
Source_Database="???"
Source_Table="???"
Target_Database="???"
Target_Table="???"
JDBC="???"
## Format
Format="???"
## Declare User used to Connect and Load Data
MYUSER="???"
MYPASSWORD="???"
## Display configuration libraries.
echo $USERLIBTDCH
echo $LIB_JARS
## Define the connection option
hadoop jar $USERLIBTDCH \
com.teradata.connector.common.tool.ConnectorExportTool \
-libjars $LIB_JARS \
-url jdbc:teradata://$JDBC/logmech=ldap,database=$Target_Database,charset=UTF16 \
-username $MYUSER \
-password $MYPASSWORD \
-jobtype hive \
-fileformat $Format \
-sourcedatabase $Source_Database \
-sourcetable $Source_Table \
-targettable $Target_Table \
-method internal.fastload \
-nummappers 1
Before using this script you need to check that the libraries you pass to the hadoop jar command are configured, i.e. that all path variables are set. You can verify this by calling (use your own variable name):
echo $USERLIBTDCH
Expected output of the variable (this is how it looks in a Cloudera environment):
/opt/cloudera/parcels/CDH/lib/avro/avro.jar,
/opt/cloudera/parcels/CDH/lib/avro/avro-mapred-hadoop2.jar,
/opt/cloudera/parcels/CDH/lib/hive/conf,
/opt/cloudera/parcels/CDH/lib/hive/lib/antlr-runtime-3.4.jar,
/opt/cloudera/parcels/CDH/lib/hive/lib/commons-dbcp-1.4.jar,
/opt/cloudera/parcels/CDH/lib/hive/lib/commons-pool-1.5.4.jar,
/opt/cloudera/parcels/CDH/lib/hive/lib/datanucleus-api-jdo-3.2.6.jar,
/opt/cloudera/parcels/CDH/lib/hive/lib/datanucleus-core-3.2.10.jar,
/opt/cloudera/parcels/CDH/lib/hive/lib/datanucleus-rdbms-3.2.9.jar,
/opt/cloudera/parcels/CDH/lib/hive/lib/hive-cli.jar,
/opt/cloudera/parcels/CDH/lib/hive/lib/hive-exec.jar,
/opt/cloudera/parcels/CDH/lib/hive/lib/hive-jdbc.jar,
/opt/cloudera/parcels/CDH/lib/hive/lib/hive-metastore.jar,
/opt/cloudera/parcels/CDH/lib/hive/lib/jdo-api-3.0.1.jar,
/opt/cloudera/parcels/CDH/lib/hive/lib/libfb303-0.9.2.jar,
/opt/cloudera/parcels/CDH/lib/hive/lib/libthrift-0.9.2.jar,
/opt/cloudera/parcels/CDH/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar,
/opt/jars/parquet-hadoop-bundle.jar
I would expect that the path variable is not properly set. To create all the necessary paths, you could use the following commands:
PATH=$PATH:~/opt/bin
PATH=~/opt/bin:$PATH
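For instance, a minimal sketch of setting the two variables used in the script above (the TDCH jar location and library versions are assumptions; substitute the paths from your own install, including a Parquet bundle that provides the missing org.apache.parquet.hadoop.util.ContextUtil class):
# Hypothetical locations; adjust to your TDCH and CDH installation.
export USERLIBTDCH=/usr/lib/tdch/1.5/lib/teradata-connector-1.5.3.jar
export LIB_JARS=/opt/cloudera/parcels/CDH/lib/hive/lib/hive-metastore.jar,\
/opt/cloudera/parcels/CDH/lib/hive/lib/hive-exec.jar,\
/opt/cloudera/parcels/CDH/lib/hive/lib/hive-cli.jar,\
/opt/cloudera/parcels/CDH/lib/hive/lib/libthrift-0.9.2.jar,\
/opt/cloudera/parcels/CDH/lib/hive/lib/libfb303-0.9.2.jar,\
/opt/jars/parquet-hadoop-bundle.jar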
If you look at the Teradata Connector documentation, you need to specify the following libraries:
Hive job (version 0.11.0 as an example):
a) hive-metastore-0.11.0.jar
b) hive-exec-0.11.0.jar
c) hive-cli-0.11.0.jar
d) libthrift-0.9.0.jar
e) libfb303-0.9.0.jar
f) jdo2-api-2.3-ec.jar
g) slf4j-api-1.6.1.jar
h) datanucleus-core-3.0.9.jar
i) datanucleus-rdbms-3.0.8.jar
j) commons-dbcp-1.4.jar
k) commons-pool-1.5.4.jar
l) antlr-runtime-3.4.jar
m) datanucleus-api-jdo-3.0.7.jar
HCatalog Job:
a) above Hive required jar files
b) hcatalog-core-0.11.0.jar
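One more hedged note: if the ClassNotFoundException for org.apache.parquet.hadoop.util.ContextUtil is raised on the client side at submission time, -libjars alone is not enough, since it only ships jars to the cluster tasks; the jar also has to be on the client classpath, for example:
# Assumption: the Parquet classes live in this bundle; point HADOOP_CLASSPATH at
# wherever parquet-hadoop(-bundle).jar actually sits in your CDH install.
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/jars/parquet-hadoop-bundle.jar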
Hope this helps.

Universal Wiki Converter throws Out Of Memory Error

I am trying to use UWC 4.0 to convert a MoinMoin site, but it runs out of heap space no matter how much memory I give it. Currently it's (run_cmdline.sh):
# BEGIN
#!/bin/bash
MYCWD=`pwd`
CLASSPATHORIG=$CLASSPATH
CLASSPATH="uwc.jar"
for file in lib/*.jar ; do
CLASSPATH=$MYCWD/$file:$CLASSPATH
done
CLASSPATH=$CLASSPATH:$CLASSPATHORIG
export CLASSPATH
# run out of the sample_files dir
#cd sample_files
java -Xdebug -Xms2G -Xmx4G $APPLE_ARGS -classpath $CLASSPATH com.atlassian.uwc.ui.UWCCommandLineInterface $1 $2 $3 $4
## END
I run the following on the command line:
sudo ./run_cmdline.sh conf/confluenceSettings.properties conf/converter.moinmoin.properties /opt/atlassian/moin/
P.S. If I use just ONE small folder from the MoinMoin pages directory and try to export it, I get:
java.lang.NullPointerException
at java.util.Hashtable.put(Hashtable.java:542)
at com.atlassian.uwc.ui.ConverterEngine.createPageTable(ConverterEngine.java:2112)
at com.atlassian.uwc.ui.ConverterEngine.sendPage(ConverterEngine.java:2014)
at com.atlassian.uwc.ui.ConverterEngine.sendPage(ConverterEngine.java:1719)
at com.atlassian.uwc.ui.ConverterEngine.writePages(ConverterEngine.java:1356)
at com.atlassian.uwc.ui.ConverterEngine.convert(ConverterEngine.java:421)
at com.atlassian.uwc.ui.ConverterEngine.convert(ConverterEngine.java:215)
at com.atlassian.uwc.ui.UWCCommandLineInterface.convert(UWCCommandLineInterface.java:175)
at com.atlassian.uwc.ui.UWCCommandLineInterface.main(UWCCommandLineInterface.java:61)
Only Confluence 3.5 and lower versions are supported by UWC; no Confluence version above 3.5 is supported.
