spaCy: collapsed dependencies

Is there a way to get collapsed dependencies from the parser using spaCy? I mean the Stanford definition of collapsed dependencies (CSD), viz.
In the collapsed representation, dependencies involving prepositions,
conjuncts, as well as information about the referent of relative
clauses are collapsed to get direct dependencies between content words.
Thanks

There has been a discussion of this issue on spaCy's GitHub page. It seems the current API doesn't provide this.
You can probably use a constituency parser along with the dependency parse and write rules to get the collapsed dependencies.
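If prepositions are your main concern, you can approximate Stanford-style collapsing with hand-written rules over spaCy's own dependency labels. Below is a minimal sketch of that idea, not a spaCy feature: it assumes an English model such as en_core_web_sm is installed, and it only collapses the prep/pobj pattern into prep_* relations.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumption: this model is installed

def collapsed_triples(text):
    # Emit (head, relation, dependent) triples, collapsing prepositions:
    # gov --prep--> "with" --pobj--> obj  becomes  gov --prep_with--> obj
    triples = []
    for tok in nlp(text):
        if tok.dep_ == "pobj" and tok.head.dep_ == "prep":
            prep = tok.head
            triples.append((prep.head.text, "prep_" + prep.text.lower(), tok.text))
        elif tok.dep_ not in ("prep", "punct"):
            triples.append((tok.head.text, tok.dep_, tok.text))
    return triples

print(collapsed_triples("I saw the man with the telescope"))
Conjunct and relative-clause collapsing can be handled with analogous rules, but they are fiddlier; the JoBimText technical report mentioned below describes a complete rule set.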

JoBimText Dependency Collapsing may do what you want.
You can use it to produce collapsed dependencies from raw text:
java -jar org.jobimtext.collapsing.jar -i corpus/en -o output -f t
Or you can give it CoNLL input (where tagging/parsing have already been applied) and it will collapse the dependencies for you:
java -jar org.jobimtext.collapsing.jar -i corpus_tagged -o output -f c -np -nt
If you just want to understand their process for collapsing the dependencies, see the technical report in this repository.


Aggregating `bazel test` reports when testing many targets

I am trying to aggregate all the test.xml reports generated after a bazel test run. The idea is to then upload this full report to a CI platform with a nicer interface.
Consider the following example:
$ find .
foo/BUILD
bar/BUILD
$ bazel test //...
This might generate
./bazel-testlogs/foo/tests/test.xml
./bazel-testlogs/foo/tests/... # more
./bazel-testlogs/bar/tests/test.xml
./bazel-testlogs/bar/tests/... # more
I would love to know if there is a better way to aggregate these test.xml files into a single report.xml file (or the equivalent). That way I only need to publish one report file.
Current solution
The following is totally viable, I just want to make sure I am not missing some obvious built in feature.
find ./bazel-testlogs | grep 'test.xml' | xargs [publish command]
In addition, I will check out the JUnit output format, and see if just concatenating the reports is sufficient. This might work much better.
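If plain concatenation turns out not to produce valid JUnit XML (each test.xml has its own root element), a small script can merge the suites properly. Here is a minimal sketch using the third-party junitparser package; the package and the report.xml name are my assumptions, not anything built into Bazel:
import glob

from junitparser import JUnitXml  # assumption: pip install junitparser

merged = JUnitXml()
for path in glob.glob("bazel-testlogs/**/test.xml", recursive=True):
    merged += JUnitXml.fromfile(path)  # append this file's test suites
merged.write("report.xml")  # single report to hand to the CI publisher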

pandoc does not produce bibliography when biblio file is in YAML-metadata only

I assumed that referencing a BibTeX bibliography in the YAML metadata would be sufficient for the references to be produced. This is similar to "pandoc does not print references when .bib file is in YAML", which was perhaps misunderstood and which has no accepted answer yet.
I have the example input file:
---
title: Ontologies of what?
author: auf
date: 2010-07-29
keywords: homepage
abstract: |
  What are the objects ontologists talk about.
  Inconsistencies become visible if one models real objects (cats) and children's playthings.
bibliography: "BibTexExample.bib"
---
An example post. With a reference to [@Frank2010a] and more.
## References
I invoke the conversion to LaTeX with:
pandoc -f markdown -t pdf postWithReference.markdown -s --verbose -o postWR.pdf -w latex
The PDF is produced, but it contains no references and the text is rendered literally as "With a reference to [@Frank2010a] and more.", demonstrating that the reference file was not used. The title and author are inserted in the PDF, so the YAML metadata is read. If I add the bibliography file on the command line, the output is correctly produced with the reference list.
What am I doing wrong? I want to avoid specifying the bibliography file on the command line (it duplicates information, violating DRY). Is there a general switch to demand bibliography processing while leaving the selection of the bibliography file to the document's YAML metadata?
The bibliography is inserted by the pandoc-citeproc filter. It is run automatically when the bibliography is set via the command line, but it has to be invoked explicitly in cases such as yours. Adding --filter pandoc-citeproc will make it work as expected.
Note that more recent versions of pandoc require --citeproc instead of --filter pandoc-citeproc.
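For example, with a recent pandoc the question's invocation becomes (same file names as above):
pandoc -f markdown postWithReference.markdown -s --citeproc -o postWR.pdf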

How to handle case-sensitive import collisions

I am using a third-party library in Go whose import path has changed case over time: initially a letter was lowercase, then the author changed it to uppercase.
Some plugin authors updated their libraries and others didn't. In the meantime, the original library author reverted the case change.
Now I find myself in a state where my application won't build due to case-sensitive import collisions.
How can one go about fixing this?
Many thanks
You could vendor the dependencies, then go into the vendor/ directory and fix the import paths manually (try grepping or sedding through the dependencies).
For an introduction to vendoring, try here, https://blog.gopheracademy.com/advent-2015/vendor-folder/
The original repo can still live in your GOPATH, but the 'corrected' version can go in the vendor folder, which the compiler checks first when resolving dependencies.
There are many tools for vendoring, I use govendor
Edit
As mkopriva mentions in the comments, you can refactor import names using the gofmt tool:
gofmt -w -r '"path/to/PackageName" -> "path/to/packagename"' ./
gofmt -w -r 'PackageName.x -> packagename.x' ./
In the second rule, the lowercase single-character identifier x acts as a wildcard.
From the docs:
The rewrite rule specified with the -r flag must be a string of the form:
pattern -> replacement
Both pattern and replacement must be valid Go expressions. In the pattern, single-character lowercase identifiers serve as wildcards matching arbitrary sub-expressions; those expressions will be substituted for the same identifiers in the replacement.
In case anyone wonders why this error might occur in your project: make sure all imports use either the lowercase or the uppercase path, but not a mix.
So like this:
one.go -> "github.com/name/app/login"
another.go -> "github.com/name/app/login"
And not like this:
one.go -> "github.com/name/app/Login"
another.go -> "github.com/name/app/login"

Including a PostScript file in another one?

I wonder if there is a standard way to include one PostScript file in another.
For example, say I have got one file of data generated by a 3rd party program:
%!PS
/mydata [ 1 2 3 4 5 6
(...)
1098098
1098099
] def
and I would like to include it into a main PS document
%!PS
/processData
{
mydata { (..) } foreach
}
(...)
(data.ps) include %<=== ???
Thanks
The operator you want is run.
string run -
execute contents of named file
Unfortunately, run is not allowed if the interpreter has the SAFER option set.
Edit: Bill Casselman, author of *Mathematical Illustrations*, has a Perl script called psinc you can use to "preprocess" your PostScript files, inlining all (...) run files.
The standard way to include PostScript is to make the code to be included an EPS (Encapsulated PostScript) file. There are rules on how Encapsulated PostScript must be created and how to include it; see Adobe Tech Note 5002, "Encapsulated PostScript File Format Specification".
Simply executing 'run' on a PostScript file may well work, but it might also cause problems. Many PostScript files (especially those produced by 3rd parties) will include procedure definitions which may clash with your own names, and also the included program may leave the interpreter in a state different from the one it was in when the included file was executed. At the very least you should execute a save/restore pair around the code included via 'run'.
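A minimal sketch of that precaution, using the hypothetical data.ps from the question (beware: restore also discards any definitions data.ps makes, so this form suits included files that draw rather than define data you need afterwards):
/snap save def    % remember the current interpreter state
(data.ps) run     % execute the included file
snap restore      % put the state back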
I would suggest a meta-solution: use the C preprocessor or the M4 preprocessor. They are powerful tools, and their power may find use in other ways as well, not only file inclusion. Though this was not asked, using a Makefile would also be wise, to automate the whole workflow. By combining a preprocessor with a Makefile you can elegantly automate complex inclusion processing and beyond.
C Preprocessor
Including a file:
#include "other.ps"
Commandline for preprocessing:
cpp -P main.pps main.ps
M4 Preprocessor
Including a file:
include(other.ps)
Commandline for preprocessing:
m4 main.pps > main.ps
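For the Makefile automation mentioned above, a minimal sketch for the cpp variant could look like this (file names follow the question, and the recipe line must start with a tab):
# rebuild main.ps whenever the source or the included data changes
main.ps: main.pps data.ps
	cpp -P main.pps main.ps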

Join multiple CoffeeScript files into one file? (Multiple subdirectories)

I've got a bunch of .coffee files that I need to join into one file.
I have folders set up like a Rails app:
/src/controller/log_controller.coffee
/src/model/log.coffee
/src/views/logs/new.coffee
CoffeeScript has a command that lets you join multiple CoffeeScript files into one file, but it only seems to work with one directory. For example, this works fine:
coffee --output app/controllers.js --join --compile src/controllers/*.coffee
But I need to be able to include a bunch of subdirectories kind of like this non-working command:
coffee --output app/all.js --join --compile src/*/*.coffee
Is there a way to do this? Is there a UNIXy way to pass in a list of all the files in the subdirectories?
I'm using Terminal on OS X.
They all have to be joined into one file because otherwise each separate file gets compiled & wrapped with this:
(function() { }).call(this);
Which breaks the scope of some function calls.
From the CoffeeScript documentation:
-j, --join [FILE] : Before compiling, concatenate all scripts together in the order they were passed, and write them into the specified file. Useful for building large projects.
So, you can achieve your goal at the command line (I use bash) like this:
coffee -cj path/to/compiled/file.js file1 file2 file3 file4
where file1 through fileN are the paths to the CoffeeScript files you want to compile.
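To feed every file under your subdirectories to that command, one UNIXy option is command substitution with find; a sketch, assuming no spaces in the file names and that the resulting order matches your dependency order:
# compile and join everything under src/ into a single file
coffee -cj app/all.js $(find src -type f -name '*.coffee')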
You could write a shell script or Rake task to combine them together first, then compile. Something like:
find . -type f -name '*.coffee' -print0 | xargs -0 cat > output.coffee
Then compile output.coffee
Adjust the paths to your needs. Also make sure that the output.coffee file is not in the same path you're searching with find or you will get into an infinite loop.
http://man.cx/find
http://www.rubyrake.org/tutorial/index.html
Additionally, you may be interested in these other posts on Stack Overflow concerning searching across directories:
How to count lines of code including sub-directories
Bash script to find a file in directory tree and append it to another file
Unix script to find all folders in the directory
I've just released an alpha version of CoffeeToaster; I think it may help you.
http://github.com/serpentem/coffee-toaster
The easiest way is to use the coffee command-line tool:
coffee --output public --join --compile app
Here app is my working directory holding multiple subdirectories, and public is where the output .js file will be placed. It's easy to automate this process if you're writing your app in Node.js.
This helped me (-o output directory, -j join to project.js, -cw compile and watch coffeescript directory in full depth):
coffee -o web/js -j project.js -cw coffeescript
Use cake to compile them all into one (or more) resulting .js file(s). The Cakefile is used as configuration, controlling the order in which your coffee scripts are compiled; quite handy with bigger projects.
Cake is quite easy to install and set up. Invoking cake from vim while you are editing your project is then simply
:!cake build
and you can refresh your browser and see results.
As I'm also busy learning the best way of structuring files and using CoffeeScript in combination with Backbone and cake, I have created a small project on GitHub to keep as a reference for myself; maybe it will help you too around cake and some basic things. All compiled files are in the www folder so that you can open them in your browser, and all source files (except for the cake configuration) are in the src folder. In this example, all .coffee files are compiled and combined into one output .js file which is then included in the HTML.
Alternatively, you could use the --bare flag, compile to JavaScript, and then perhaps wrap the JS if necessary. But this would likely create problems; for instance, if you have one file with the code
i = 0
foo = -> i++
...
foo()
then there's only one var i declaration in the resulting JavaScript, and i will be incremented. But if you moved the foo function declaration to another CoffeeScript file, then its i would live in the foo scope, and the outer i would be unaffected.
So concatenating the CoffeeScript is a wiser solution, but there's still potential for confusion there; the order in which you concatenate your code is almost certainly going to matter. I strongly recommend modularizing your code instead.
