PyFlink reading Parquet files - parquet

I'm happily reading text files via env.read_text_file(file_path), but how can I read a parquet file in PyFlink?
I'm aware of
https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/connectors/dataset/formats/parquet/ for Java/Scala, but is it possible with the Python version to somehow include the flink-parquet_2.11 dependency, and then have a FileSource with the parquetInputFormat?

Related

Error generating .tagger file in Stanford NLP?

Can you please tell me where I am making a mistake while generating a .tagger file? I have created a .props properties file and also have a .csv file that contains about 50,000 words for the Macedonian language and their grammatical meanings.
These two files, .props and .csv, are located in the root of the Stanford model that I downloaded. To generate the .tagger file I use the command prompt with the following command:
java -mx1g edu.stanford.nlp.tagger.maxent.MaxentTagger -props mkbase.csv.props
But when it starts generating, I get a reflection exception:
Exception ReflectionLoading$Reflection Loading Reflection Exception:Error loading edu.stanford.nlp.optimization.OWLQNMinimazer
and other additional lines related to MaxentTagger.
In your properties file, change search = owlqn to search = qn and this issue should go away.
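For anyone who wants to apply that change to several properties files, here is a minimal Python sketch. It assumes the line is written as search = owlqn (spacing may vary); the function name switch_search_to_qn is made up for illustration and is not part of the Stanford tools.

```python
import re

# Matches a line such as "search = owlqn" (any spacing around "="),
# capturing everything up to the value so it can be kept as written.
_SEARCH_LINE = re.compile(r"^([ \t]*search[ \t]*=[ \t]*)owlqn[ \t]*$", re.MULTILINE)


def switch_search_to_qn(props_text):
    """Return the properties text with `search = owlqn` changed to `search = qn`.

    All other lines are returned unchanged.
    """
    return _SEARCH_LINE.sub(r"\1qn", props_text)
```

Read the .props file as text, pass it through the function, and write the result back before rerunning the MaxentTagger command.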

Include docx file in asciidoc?

I am using asciidoc with asciidoctor to create documentation for a current project.
I notice there is a markup to include files in the documentation like so:
link:index.html
or
link:protocol.json[Open the JSON file]
Is it possible to include a docx file as a link so that it opens externally or can be downloaded?
Also, can I put this file in a folder inside my asciidoc directory (for the sake of organization) and still be able to reference it properly?
You can write something like this:
Open this link:somefile.docx[Word file].
Or this link:file:///C:/Users/xxx/docs/otherfile.docx[second file].
It works with a relative or an absolute path.
You need to ensure that the path to your file will be correct for the reader of your document.
Example: if you put the HTML files produced by Asciidoctor on a web server (public or intranet), a path that references your local C: drive is not a good idea.
It is hard to tell you what to do without knowledge of your publication/distribution toolchain.
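If the .docx lives in a subfolder, the link target only has to be relative to the document that contains the link. As a rough illustration, the Python sketch below builds such a link macro from two paths. The helper name asciidoc_link and the folder names are hypothetical, and it assumes the generated HTML keeps the same relative layout as the sources.

```python
import posixpath


def asciidoc_link(target, label, doc_dir="."):
    """Build a `link:` macro whose target is relative to the document's folder.

    target  -- path to the file being linked (forward slashes)
    label   -- link text shown to the reader
    doc_dir -- folder containing the .adoc file (assumed to be the
               folder the HTML output is served from as well)
    """
    relative = posixpath.relpath(target, start=doc_dir)
    return f"link:{relative}[{label}]"


if __name__ == "__main__":
    # Hypothetical layout: docs/index.adoc linking to docs/attachments/spec.docx
    print(asciidoc_link("docs/attachments/spec.docx", "Word file", doc_dir="docs"))
```

A relative target like this keeps working when the whole folder is copied to a web server, which is the concern raised above about absolute file:/// paths.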

Opening JSON file in JRuby throws exception when invoking the Java class files generated

I use jrubyc to compile Ruby into class files. One of the Ruby files contains
dat = File.open "data.json", "r"
And there's a "data.json" file alongside. This program runs well if I directly use the jruby command.
After I compiled the Ruby files and put them into a jar, the following error appears when I run java -jar:
Exception in thread "main" org.jruby.exceptions.RaiseException: (Errno::ENOENT) data.json
at org.jruby.RubyFile.initialize(org/jruby/RubyFile.java:334)
at org.jruby.RubyIO.open(org/jruby/RubyIO.java:1144)
at RUBY.(root)(file:/Users/x5lai/Documents/rqrcode.jar!/read.rb:2)
...
To make sure that I have not put data.json in the wrong place, I have copied data.json all over the jar file, but the same error occurs.
Is there any way to do this? Is JRuby unable to open JSON files once I have compiled the code?
I don't think that it is going to look inside the JAR by default. I created a small test and was able to reproduce your issue. I then did touch data.json and the code no longer had an error. I'm not sure how to specify that you want to look inside the JAR for your data file.
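That behaviour fits the fact that a jar is a zip archive: a file packed inside it is an archive entry, not a file on disk, so an ordinary open by name looks in the working directory instead. The Python sketch below only illustrates that distinction; it is not JRuby code. The archive is built in memory and the names used are made up.

```python
import io
import zipfile


def build_archive():
    """Create an in-memory zip (standing in for a jar) containing data.json."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("data.json", '{"ok": true}')
    buffer.seek(0)
    return buffer


def can_open_on_disk(name):
    """Try a plain file open, which only consults the file system."""
    try:
        with open(name, "r"):
            return True
    except FileNotFoundError:
        return False


def read_entry(archive, name):
    """Read an entry by going through the archive itself."""
    with zipfile.ZipFile(archive) as jar:
        return jar.read(name).decode("utf-8")
```

The plain open fails unless a file with that name also exists on disk, while reading through the archive succeeds. Loading the file as a resource from inside the jar would be the analogous step on the JRuby side; the exact call is not covered here.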

Where is the (meta) .proto file which describes .desc files?

I make .desc files with:
protoc --descriptor_set_out=foo.desc --include_imports foo.proto
Am I correct in believing that the .desc files are in protobuf format?
If so, where can I get the .proto file which describes their format?
The format is FileDescriptorSet as defined in descriptor.proto:
https://code.google.com/p/protobuf/source/browse/trunk/src/google/protobuf/descriptor.proto
descriptor.proto is typically installed to /usr/include/google/protobuf/descriptor.proto or /usr/local/include/google/protobuf/descriptor.proto on Unix systems. descriptor.pb.h is installed with the protobuf headers and descriptor.pb.cc is compiled into the protobuf library, so you don't have to generate them yourself if you are using C++. Similarly, in Java, the com.google.protobuf.DescriptorProtos class is compiled into the base library.
If you install protocol buffers, the definition is in
<PB install directory>/src/google/protobuf/descriptor.proto
Some or most of the installation processes (e.g. for Java) will generate pb classes from this definition.
As keton said, it is also available at
https://code.google.com/p/protobuf/source/browse/trunk/src/google/protobuf/descriptor.proto
Presumably, it should be in the reference documentation, here:
https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.descriptor.pb#FileDescriptorSet
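To see that a .desc file really is an ordinary serialized protobuf message, here is a small standard-library-only Python sketch that walks the wire format and lists the .proto file names it contains. It relies on two field numbers from descriptor.proto: FileDescriptorSet.file is field 1 and FileDescriptorProto.name is field 1. The function names are made up for illustration. In practice you would normally parse the bytes with the FileDescriptorSet class shipped in the protobuf runtime rather than decode them by hand.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_fields(buf):
    """Yield (field_number, wire_type, value) for each field in a message."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited (strings, nested messages)
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field_number, wire_type, value


def proto_file_names(desc_bytes):
    """Return the name of every FileDescriptorProto in a FileDescriptorSet."""
    names = []
    for field_number, wire_type, value in iter_fields(desc_bytes):
        if field_number == 1 and wire_type == 2:  # FileDescriptorSet.file
            for inner_number, inner_type, inner_value in iter_fields(value):
                if inner_number == 1 and inner_type == 2:  # FileDescriptorProto.name
                    names.append(inner_value.decode("utf-8"))
    return names
```

Calling proto_file_names on the bytes of foo.desc (read in binary mode) should list foo.proto and, because of --include_imports, the files it imports.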

Separating header and source files after compiling .proto with Protocol Buffers

I'm working on a project with a structure similar to the following:
root/inc/foo/bar/
root/src
I've just started to use Google Protocol Buffers, and when I compile the code I find that I need to add foo/bar/file.h to the file.cc file in order for the code to find the header. I don't plan to commit the .h and .cc files to the repo since they are generated automatically. Is there a parameter I can give protoc to separate the header and source files into different directories and add the correct path to the #includes in the source file?
Maybe you could run a script such as "mv foo.h foofolder/" after executing protoc.
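The move step suggested above could be sketched in Python roughly as follows. This assumes protoc has already written its output into a scratch directory and that generated files end in .pb.h and .pb.cc. The function name split_generated and the directory arguments are hypothetical, and the sketch does not rewrite any #include lines; whether they resolve still depends on the include paths passed to the compiler.

```python
import shutil
from pathlib import Path


def split_generated(out_dir, inc_dir, src_dir):
    """Move protoc output: headers into inc_dir, sources into src_dir.

    Headers keep their sub-directory (e.g. foo/bar/) under inc_dir,
    while sources are placed directly in src_dir.
    Returns the list of destination paths.
    """
    out_dir, inc_dir, src_dir = Path(out_dir), Path(inc_dir), Path(src_dir)
    moved = []
    # sorted() materializes the listing before any file is moved.
    for path in sorted(out_dir.rglob("*.pb.*")):
        if path.suffix == ".h":
            dest = inc_dir / path.relative_to(out_dir)
        elif path.suffix == ".cc":
            dest = src_dir / path.name
        else:
            continue
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(dest))
        moved.append(dest)
    return moved
```

With the layout from the question, split_generated("gen", "root/inc", "root/src") would place gen/foo/bar/file.pb.h under root/inc/foo/bar/ and file.pb.cc under root/src, where gen is an assumed scratch output directory.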
