org.apache.flink.api.java.io.jdbc.JDBCInputFormat NOT INSIDE FLINK JARS - jdbc

I have created a new Java project in eclipse-jee-kepler-SR2-win32-x86_64.
I have included the jars in flink-0.8.1\lib.
I have created the standard WordCount and it works.
I have modified my WordCount to take input from text files and csv files and it works.
All the imports work perfectly.
Then I tried to import org.apache.flink.api.java.io.jdbc.JDBCInputFormat, and Eclipse doesn't find it.
Why does Eclipse not find the import?
Because inside the jar flink-java-0.8.1.jar there is no directory io/jdbc.
I tried the same thing with flink-0.9.0-bin-hadoop27, and in the jar flink-dist-0.9.0.jar there is no org/apache/flink/api/java/io/jdbc directory. I uncompressed the jar and searched for the string "jdbcinputformat" with 0 results. I searched for the string "jdbc" and it is only mentioned in org/apache/log4j, org/eclipse/jetty, and in other places that are not org.apache.flink.api.java.io.
So my question is: Where do I find the class JDBCInputFormat?
What can I do to access SqlServer2012 in Flink, apart from accessing it outside Flink, creating csv files, and then reading them in Flink? (That sounds horrible to me, since there should be a class specifically for that.)

The corresponding module is not included in the binary distribution. In order to use it, you need to build Flink from source. Run the following commands:
git clone https://github.com/apache/flink.git
cd flink
mvn -DskipTests clean install
This builds the latest snapshot, flink-0.10-SNAPSHOT. If you want to use the stable 0.9 version, run a different git clone command:
git clone -b release-0.9 https://github.com/apache/flink.git
In your current project, you need to change the Flink version used in your pom file accordingly, e.g., to 0.10-SNAPSHOT or 0.9-SNAPSHOT.
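For completeness, once you have built Flink and the JDBC module is on your project's classpath, reading from SQL Server with JDBCInputFormat typically looks roughly like the sketch below. Treat it as an illustration only: the driver class, connection URL, table, and column types are assumptions, the Microsoft JDBC driver jar has to be added as a separate dependency, and the exact builder generics varied across these early Flink versions.

import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.jdbc.JDBCInputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.TupleTypeInfo;

public class SqlServerReadJob {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical connection settings; adjust driver, URL, and query to your database.
        DataSet<Tuple2<Integer, String>> rows = env.createInput(
                JDBCInputFormat.buildJDBCInputFormat()
                        .setDrivername("com.microsoft.sqlserver.jdbc.SQLServerDriver")
                        .setDBUrl("jdbc:sqlserver://localhost:1433;databaseName=mydb")
                        .setQuery("SELECT id, name FROM some_table")
                        .finish(),
                // The type information must match the columns selected in the query.
                new TupleTypeInfo<Tuple2<Integer, String>>(
                        BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO));

        rows.print();
    }
}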

Related

Build/Run Elasticsearch Locally with plugins

(On Elasticsearch version 6.5.1)
How can I build/run Elasticsearch from source with local plugins?
I've tried the following command to install the plugins:
./distribution/build/cluster/run\ node0/elasticsearch-6.5.1-SNAPSHOT/bin/elasticsearch-plugin install file:/<path_to_plugin_zip>, and that says it successfully installed the plugin.
However, when I run elasticsearch via ./gradlew run --debug-jvm, it cleans out the contents of that directory before running ES.
The reason I installed the plugin into that particular directory is that I put a debugger in the PluginsService.java file, and saw that the Path pluginsDirectory parameter in the constructor was set to /Users/jreback/Desktop/elasticsearch/distribution/build/cluster/run node0/elasticsearch-6.5.1-SNAPSHOT/plugins.
So, how can I get my plugin installed on my local ES version and run ES such that the plugin code doesn't get removed as the process starts up? Many thanks in advance!
FWIW, I got this working with some manual code changes (there may well be a more recommended way to do this, but this worked for me).
In my ES checkout, I made the following code change to server/src/main/java/org/elasticsearch/env/Environment.java:
replace this line: pluginsFile = homeFile.resolve("plugins"); with pluginsFile = Paths.get("<path to plugin directory>");
(Also, you must import java.nio.file.Paths at the top of that file).
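Put together, the edit described above amounts to something like the following sketch inside Environment.java (the path string is just an example; use your own plugin parent directory):

import java.nio.file.Paths;  // added at the top of Environment.java

// In the Environment constructor, the original resolution of the plugins directory
// pluginsFile = homeFile.resolve("plugins");
// is replaced with a hard-coded path:
pluginsFile = Paths.get("/path/to/plugin/parent/directory");  // example path, not a real location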
The directory structure for the directory you listed above should look like this:
- plugin parent directory (should be whatever you put in the Environment.java file)
- plugin directory (name of the plugin)
- plugin-descriptor.properties file
- plugin jar file (generated from building the plugin in some prior step)
Then, when you start up ES again, you should see in the logs that it loaded the plugin you've just added.

Using parquet tools on files in hdfs

I downloaded and built parquet-1.5.0 from https://github.com/apache/parquet-mr.
I now want to run some commands on my parquet files that are in hdfs. I tried this:
cd ~/parquet-mr/parquet-tools/src/main/scripts
./parquet-tools meta hdfs://localhost/my_parquet_file.parquet
and I got:
Error: Could not find or load main class parquet.tools.Main
Download jar
Download the jar from the Maven repo, or any location of your choice. Just google it. At the time of this post, I can get parquet-tools from here.
If you're logged in to the Hadoop box:
wget http://central.maven.org/maven2/org/apache/parquet/parquet-tools/1.9.0/parquet-tools-1.9.0.jar
This link might stop working after a few days, so get the current link from the Maven repo.
Build jar
If you are unable to download the jar, you can also build it from source. Clone the parquet-mr repo and build the jar from source:
git clone https://github.com/apache/parquet-mr
mvn clean package
Note: you need Maven on your box to build from source.
Read parquet file
You can use these commands to view the contents of a parquet file:
Check schema for s3/hdfs file:
hadoop jar parquet-tools-1.9.0.jar schema s3://path/to/file.snappy.parquet
hadoop jar parquet-tools-1.9.0.jar schema hdfs://path/to/file.snappy.parquet
Head file contents:
hadoop jar parquet-tools-1.9.0.jar head -n5 s3://path/to/file.snappy.parquet
Check contents of local file:
java -jar parquet-tools-1.9.0.jar head -n5 /tmp/path/to/file.snappy.parquet
java -jar parquet-tools-1.9.0.jar schema /tmp/path/to/file.snappy.parquet
More commands:
hadoop jar parquet-tools-1.9.0.jar --help
The script is built on the assumption that parquet-tools-<version>.jar is located in a directory called lib next to the script file itself, like so:
$ find -type f
./parquet-tools
./lib/parquet-tools-1.10.1-SNAPSHOT.jar
You can set up such a file layout by issuing the following commands from the root of the parquet-mr git repo (of course many alternative ways and installation locations are possible):
mkdir -p ~/.local/share/parquet-tools/lib
cp parquet-tools/src/main/scripts/parquet-tools ~/.local/share/parquet-tools/
cp parquet-tools/target/parquet-tools-1.5.0.jar ~/.local/share/parquet-tools/lib
After this you can run ~/.local/share/parquet-tools/parquet-tools. (I tested this with version 1.10.1-SNAPSHOT though instead of 1.5.0.)

How to add a new module to Spark source and make it work in spark-shell?

I'm using IDEA 15 and I want to add a new module to the Spark source.
I clicked File->new->module and chose a maven module. Then I set the "Add as module to..." option and the "Parent" to "Spark Project Parent POM". After typing in the module name I clicked "Finish".
Then I added some code to my new module and built it using the following command:
"build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package"
The project was built successfully but in the spark-shell I can't import my newly added classes.
I wonder what's wrong with what I've done, and how I can add a new module and then import it in the spark-shell?
Thanks a lot!
PS: I'm sure there's no problem with my code. I added my code in the mllib module and it worked.
Maybe some dependency is missing but I don't know how to fix it.
The Maven build created a jar file for your module (it should be in the target/ directory inside your module folder).
When you start the Spark shell, you can define the jar files to include in your shell. You can include your jar there like this:
spark-shell --jars /path/to/your/project.jar
You could also try installing your project into your local Maven repository; it is possible that the Spark shell can pick it up from there (so you don't have to specify --jars each time you run it).
For this, run your Maven build command with clean install instead of clean package at the end.
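As a quick sanity check that the shell really sees your module (all names below are hypothetical), a trivial class in the new module should become importable once the built jar is passed in:

// src/main/java/org/example/mymodule/Greeter.java in the new module (hypothetical package and class)
package org.example.mymodule;

public final class Greeter {
    // Simple method used only to confirm the class is visible from the spark-shell.
    public static String greet(String name) {
        return "Hello, " + name + ", from my new Spark module!";
    }
}

After rebuilding, start the shell with spark-shell --jars pointing at the module's jar under target/, and import org.example.mymodule.Greeter should then resolve.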

problems running state machine examples

Congratulations on the Spring state machine; I found it yesterday and have been trying it out, specifically the turnstile example running in STS. I found it very easy and intuitive to build an FSM.
Because Spring Shell doesn't work well in STS, I tracked down the instructions in the reference doc for running the examples from the command line,
"java -jar
spring-statemachine-samples-turnstile-1.0.0.BUILD-SNAPSHOT.jar"
,
but running it got an error
"no main manifest attribute, in spring-statemachine-samples-turnstile-1.0.0.BUILD-SNAPSHOT.jar".
Although not even a novice at using Gradle, I tried fixing this by adding this line to build.gradle in the jar section:
"manifest.attributes['Main-Class'] = 'demo.turnstile.Application'"
(which I know doesn't handle the various sub-projects), but got this error:
"NoClassDefFoundError: org/springframework/shell/Bootstrap".
If it is possible to run the samples from Gradle, could you include the instructions in the reference document? I tried running the samples using
gradle run
but there was no interaction with the shell scripts.
Samples are designed to be run as executable jars and with the shell, so that you can interact without needing to recompile after every change. Your error indicates that you didn't build the sample jar as mentioned in the docs.
./gradlew clean build -x test
This will automatically use the Spring Boot plugin, which adds the necessary manifest headers to the jar meta info to make it a true executable jar. Essentially, every sample is a Spring Boot app.
Building the SM sample projects in a Windows environment:
Open a command prompt (Windows key + R --> cmd --> Enter) and change directory to the project root folder spring-statemachine-master (inside the extracted folder).
Run gradlew install to get all Spring dependencies copied to your local machine.
Run gradlew clean build -x test to get the Spring Shell jars built. Courtesy of Janne.
These steps should get all the jars built; look in the \build\libs folder of the respective sample project for the jar files.
Run it like any other Java jar file: java -jar [jar-file-name.jar] (make sure to change directory to the jar file's location first).
One more thing where I was stuck: how to send events to the SM.
It's like this: sm event EVENT_NAME_AS_DEFINED_IN_CLASS. Ref
E.g.: sm event RINSE for the washer project

How to index a Maven repo without Nexus/Artifactory/etc?

I run my own little Maven repo for some open source. I have no dedicated server, so I use a Google Code repository, deploy to the file system, and then commit and push. Works perfectly for me.
But some Maven tools are looking for a nexus-maven-repository-index.properties file and the index (in GZ). I would like to generate this index so that:
- I get rid of the warning that it's not there
- Maven doesn't query the repo for artifacts that are not there
How can I do that? Is there a tool (a Java main) that is able to generate an index? Tips on how to use the proper Nexus jars with a little command-line tool are also welcome.
I came across this post while searching for a solution to add a local repository to my Maven project using IntelliJ IDEA.
Since Sonatype has changed their paths and reorganized the downloads since the last post, here is an updated step-by-step tutorial to get your repository indexed for use with IntelliJ IDEA:
Download the latest stand-alone indexer from here.
Extract it somewhere and go into that directory.
From the console, run this command: export REPODIR=/path/to/your/local/repo/ && java org.sonatype.nexus.index.cli.NexusIndexerCli -r $REPODIR -i $REPODIR/.index -d $REPODIR/.index -n localrepo
In the .index directory within the repository directory, some files will be created, including "nexus-maven-repository-index.gz", which is the file IntelliJ looks for.
You can use the Maven Indexer CLI to produce the index directly, but why bother hosting your own repo when OSS projects can use a hosted one for free?
http://nexus.sonatype.org/oss-repository-hosting.html
I was looking at the Maven indexer... but I am not sure what the last parameter, indexDir, in this method is for:
public RepositoryIndexer createRepositoryIndexer(String repositoryId, File repositoryBasedir, File indexDir)
Is it like a starting point in the repositoryBasedir?
