Figuring out different CoNLL format

I am trying to generate a conll file from Stanford Core NLP, which then can be used as an input to Semafor (as semafor accepts conll file only).
The generated file looks like this:
1 My my PRP$ O 2 nmod:poss
2 kitchen kitchen NN O 5 nsubj
3 no no RB O 4 neg
4 longer longer RB O 5 advmod
5 smells smell VBZ O 0 ROOT
6 . . . O 5 punct
When I use this file, the Semafor server returns an IllegalArgumentException because the format is slightly different. Their example conll file looks like this:
1 My _ PRP$ PRP$ _ 2 NMOD _ _
2 kitchen _ NN NN _ 5 SBJ _ _
3 no _ RB RB _ 5 ADV _ _
4 longer _ RB RB _ 3 AMOD _ _
5 smells _ VBZ VBZ _ 0 ROOT _ _
6 . _ . . _ 5 P _ _
It seems that I can control the output by defining the keys. The default keys are ID, FORM, LEMMA, POSTAG, NER, HEAD, DEPREL. However, I don't know the keys for the example conll file provided by Semafor. How can I convert the generated file into the format of the Semafor example file?

I believe that Semafor can generate its own conll file in the format that it needs. We use Stanford Core NLP just to split a document into sentences per line, and then use Semafor itself to generate the conll file.
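If you still want to reshape the Stanford output yourself, the column layout can be sketched as below. This is a minimal sketch, not Semafor's own converter: the function names are made up for illustration, the tab separator is an assumption (the example above is shown with spaces), and the target layout is only inferred from the ten columns in the Semafor example. It does not translate the dependency labels (for example nmod:poss versus NMOD), so the result may still be rejected.

```python
def stanford_to_conllx(line: str) -> str:
    """Map one 7-column Stanford CoreNLP row to a 10-column row.

    Input columns:  ID FORM LEMMA POSTAG NER HEAD DEPREL
    Output columns: ID FORM _ POSTAG POSTAG _ HEAD DEPREL _ _
    (layout inferred from the Semafor example; an assumption)
    """
    token_id, form, _lemma, pos, _ner, head, deprel = line.split()
    columns = [token_id, form, "_", pos, pos, "_", head, deprel, "_", "_"]
    # Tab-separated output is an assumption about what Semafor expects.
    return "\t".join(columns)


def convert_sentence(rows):
    """Convert every non-empty row of one sentence."""
    return [stanford_to_conllx(row) for row in rows if row.strip()]


stanford_rows = [
    "1 My my PRP$ O 2 nmod:poss",
    "2 kitchen kitchen NN O 5 nsubj",
    "3 no no RB O 4 neg",
    "4 longer longer RB O 5 advmod",
    "5 smells smell VBZ O 0 ROOT",
    "6 . . . O 5 punct",
]

converted = convert_sentence(stanford_rows)
print("\n".join(converted))
```

Because the two parsers use different label sets and may attach tokens differently, letting Semafor produce its own conll file, as suggested above, is likely the safer route.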

Related

How to change comments in excel file with go

I'm using the excelize library.
newfile, _ := excelize.OpenFile("filename.xlsx")
println(newfile.GetComments())
//map[Sheet1:[{Author 0 A2 comment1}]]
_ = newfile.InsertRow("Sheet1", 1)
println(newfile.GetComments())
//map[Sheet1:[{Author 0 A2 comment1}]]
The coordinates of my comment comment1 have not changed, even though a row was inserted above it. How can I solve this problem?

How to Make _ _ LINE _ _ and _ _ FILE_ _ run in perl? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
(The Script)
#!/usr/bin/perl
# Program, named literals.perl
written to test special literals
1 print "We are on line number ", _ _LINE_ _, ".\n";
2 print "The name of this file is ",_ _FILE_ _,".\n";
3 _ _END_ _
And this stuff is just a bunch of chitter–chatter that is to be
ignored by Perl. The _ _END_ _ literal is like Ctrl–d or \004.[a]
(Output)
1 We are on line number 3.
2 The name of this file is literals.perl.
Explanation
The special literal _ _LINE_ _ cannot be enclosed in quotes if it is to be interpreted. It holds the current line number of the Perl script.
The name of this script is literals.perl. The special literal _ _FILE_ _ holds the name of the current Perl script.
The special literal _ _END_ _ represents the logical end of the script. It tells Perl to ignore any characters that follow it.
print "The script is called", _ _FILE_ _, "and we are on line number ",
_ _LINE_ _,"\n";
(Output)
The script is called ./testing.plx and we are on line number 2
I need help getting this example to work; I am having trouble running it. When I run it in Console2, I get this error:
"cant locate object method "_" via package "LINE" (perhaps you
forgot to load "LINE"?) at C:\users\john\desktop\console2\test.pl
line 5.
Any ideas on how to fix this would be most appreciated. Thanks!

You need to use names without spaces between the underscores:
#!/usr/bin/env perl
use strict;
use warnings;

print "We are on line number ", __LINE__, ".\n";
print "The name of this file is ", __FILE__, ".\n";
__END__
print "This is not part of the script\n";
When saved in fileline.pl and run, this produces:
We are on line number 5.
The name of this file is fileline.pl.
Note that there are no spaces between the consecutive underscores. And note that the final line containing a print statement is not part of the script because it comes after the __END__. (There's also a __DATA__ directive that can sometimes be useful.)

There's no space in the literals __FILE__ and __LINE__ (or __END__). Just 2 underscores in a row, the word, and another 2 underscores.

Using sphinx for very basic structure

This is my file hierarchy:
InfoRescue
|
|_ src
|
|_ _ _ includes
|
|_ _ _ _ _ i1.py
|_ _ _ _ _ i2.py
|_ _ _ _ _ __init__.py
|
|_ _ _ utils
|
|_ _ _ _ _ u1.py
|_ _ _ _ _ u2.py
|_ _ _ _ _ __init__.py
|
|_ _ _ doc
|
|_ _ _ _ _ index.rst
|_ _ _ _ _ project.rst
|_ _ _ _ _ contact.rst
|_ _ _ _ _ api
|
|_ _ _ _ _ _ _ api.rst
|_ _ _ _ _ _ _ includes.rst
|_ _ _ _ _ _ _ utils.rst
I am using Sphinx to generate the documentation. Everything related to sphinx is in doc directory.
My index.rst:
.. InfoRescue documentation master file, created by
   sphinx-quickstart on Sun Sep 15 13:52:12 2013.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to InfoRescue's documentation!
======================================

Contents:
========

.. toctree::
   :maxdepth: 2

   project
   api/api
   contact

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
api.rst:
InfoRescue API
**********

.. toctree::
   :glob:
   :maxdepth: 1

   **
Now, inside utils there are two .py files. Neither contains classes or top-level (direct) code; both contain only functions. To document a function I can use .. autofunction:: utils.u1.functionName. This works correctly, but I have to write it for every function. Is there a shorter way to include all the functions?
Suppose both files in the includes directory contain no classes or functions, only some (direct) code. How do I generate documentation for them, i.e. which auto-directive should I use?
Also, both __init__.py files inside the utils and includes directories are empty. I created them only so that I can access the files inside those directories from the .rst files. Is there another approach that does not require creating __init__.py files?

Sphinx ships with an extension called autosummary (sphinx.ext.autosummary) that can scan your source code and automatically generate the Sphinx input files containing the necessary autofunction directives.
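As a rough illustration, enabling the extension in conf.py might look like the fragment below. It is a sketch only: it assumes autodoc is wanted alongside autosummary and that stub pages should be generated at build time, and it leaves the rest of the quickstart-generated conf.py untouched.

```python
# conf.py (fragment): enable the bundled autodoc and autosummary extensions.
extensions = [
    "sphinx.ext.autodoc",      # provides autofunction / automodule
    "sphinx.ext.autosummary",  # provides the autosummary directive
]

# Ask autosummary to generate stub .rst pages for the entries
# listed in autosummary directives when the docs are built.
autosummary_generate = True
```

With this in place, an autosummary directive that uses the :toctree: option and lists utils.u1 and utils.u2 should produce one generated page per listed module, provided the modules are importable from conf.py.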

For the (direct) code, you need to provide that documentation in the first docstring of the file.

Presence of the "__init__.py" file marks the directory as a Python package. You don't need to do that for Sphinx. Instead, you can place the directory contents on the Python path by editing the "src/doc/conf.py" file, adding line(s) after the "import sys, os" line such as:
sys.path.insert(0, os.path.abspath(os.path.join('..', '..', 'utils')))
sys.path.insert(0, os.path.abspath(os.path.join('..', '..', 'includes')))
Of course, if you place docstrings into both "utils/__init__.py" and "includes/__init__.py" and try to document them both using Sphinx with these path additions, then you will have to do more work.

autosummary generates a table.
You may rather wish to use something like automodule in your .rst file:
.. automodule:: i1
   :members:

Building on @BarryPie's answer above: to avoid adding a sys.path.insert call for every subpackage, I'm using this code in my conf.py:
for root, dirs, files in os.walk('../../src'):  # path to my source code
    if '__pycache__' not in root:  # exclude __pycache__ folders
        sys.path.insert(0, os.path.abspath(root))
With this, all the subpackages can be imported as needed.

WIA 2.0 HP ScanJet 7650 specific problems

I'm having WIA 2.0 problems on Windows 7. On Windows XP with WIA 2.0 (the version from Windows 7), everything works OK.
One device, HP ScanJet 7650 refuses to have its scanning resolution set to anything above 100. When I try to set either of WIA properties
6147 _ Horizontal Resolution
6148 _ Vertical Resolution
to anything above 100 I get:
A first chance exception of type 'System.ArgumentException' occurred
Value does not fall within the expected range.
After that, the value of the property is 850 (?), and the scanner ignores it and scans at 100 dpi.
On this same WIA 2.0 and Lexmark X340 MFP I can set scanning resolution without any problems.
Using the same scanner (HP ScanJet 7650) on WIA 1.0 I had no problems. Also, scanning from this scanner using Windows scan applet (from Devices and printers) it can scan in DPIs well above 100. So, I must be doing something wrong.
Here is complete list of properties available on WIA 2.0 for HP ScanJet 7650:
4098 _ Item Name
4099 _ Full Item Name
4101 _ Item Flags
4120 _ Color Profile Name
6154 _ Brightness
6155 _ Contrast
71692 _ Private Highlight Level
71694 _ Private Midtone Level
71693 _ Private Shadow Level
71695 _ Private Gamma
71699 _ Private Saturation
71696 _ Private Hue X
71697 _ Private Hue Y
71698 _ Private Sharpen Level
6159 _ Threshold
6147 _ Horizontal Resolution
6148 _ Vertical Resolution
71687 _ Private Default Resolution
71688 _ Private Quality Resolution
6149 _ Horizontal Start Position
6150 _ Vertical Start Position
6151 _ Horizontal Extent
6152 _ Vertical Extent
4112 _ Pixels Per Line
4113 _ Bytes Per Line
4114 _ Number of Lines
4116 _ Item Size
4118 _ Minimum Buffer Size
6146 _ Current Intent
4103 _ Data Type
4104 _ Bits Per Pixel
4110 _ Bits Per Channel
4109 _ Channels Per Pixel
4111 _ Planar
4107 _ Compression
4108 _ Media Type
4106 _ Format
4105 _ Preferred Format
4123 _ Filename extension
4102 _ Access Rights
6153 _ Photometric Interpretation
71686 _ Private Source Depth
71683 _ Private Preview
71689 _ Private Exposure Method
71722 _ Private Smoothing
71723 _ Private Color Enhanced
71685 _ Private TMA Method
71701 _ Private Defaults
71702 _ 71702
71703 _ 71703
71704 _ 71704
71711 _ 71711
71712 _ 71712
71705 _ 71705
71706 _ 71706
71707 _ 71707
71708 _ 71708
71709 _ 71709
71710 _ 71710
71721 _ 71721
71713 _ 71713
71714 _ 71714
71715 _ 71715
71716 _ 71716
71717 _ 71717
71718 _ 71718
71719 _ 71719
71720 _ Private Property

Have you looked at this question?
Try setting WiaImageBias.MaximizeQuality.

The Property Object has two properties, SubTypeMax and SubTypeMin, that you may want to check before setting the value property.
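The idea of checking the range first can be sketched as follows. This is only an illustration in Python: RangeProperty is a hypothetical stand-in for the WIA Property object (it is not the WIA automation API), and the limits used in the example are invented, not values read from the ScanJet 7650.

```python
from dataclasses import dataclass


@dataclass
class RangeProperty:
    """Hypothetical stand-in for a WIA property with a range subtype."""
    name: str
    value: int
    sub_type_min: int  # plays the role of SubTypeMin
    sub_type_max: int  # plays the role of SubTypeMax


def set_checked(prop: RangeProperty, requested: int) -> int:
    """Assign the value only if it lies inside the advertised range."""
    if not prop.sub_type_min <= requested <= prop.sub_type_max:
        raise ValueError(
            f"{prop.name}: {requested} is outside "
            f"[{prop.sub_type_min}, {prop.sub_type_max}]"
        )
    prop.value = requested
    return prop.value


# Invented limits, for illustration only.
resolution = RangeProperty("Horizontal Resolution", 100, 50, 1200)
print(set_checked(resolution, 300))
```

Logging the reported minimum and maximum for properties 6147 and 6148 before assigning a value should show whether the driver really advertises a range that stops at 100.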

XDocument producing invalid XML

I have this code
Dim doc As XDocument = New XDocument( _
    New XDeclaration("1.0", "utf-8", "yes"), _
    New XElement("transaction", _
        New XElement("realm", wcRealm), _
        New XElement("password", wcPassword), _
        New XElement("confirmation_email", wcConfEmail), _
        New XElement("force_subscribe", wcSubscribe), _
        New XElement("optout", wcOptOut), _
        New XElement("command", _
            New XElement("type", wcType), _
            New XElement("list_id", wcListId), _
            From trans As DataRow In table.Rows _
            Order By trans("last") _
            Select New XElement("record", _
                New XElement("email", trans("email")), _
                New XElement("first", trans("first")), _
                New XElement("last", trans("last")), _
                New XElement("company", trans("company")), _
                New XElement("address_1", trans("address_1")), _
                New XElement("address_2", ""), _
                New XElement("city", trans("city")), _
                New XElement("state", trans("state")), _
                New XElement("zip", trans("zip")), _
                New XElement("country", trans("country")), _
                New XElement("phone", trans("phone")), _
                New XElement("fax", trans("fax")), _
                New XElement("custom_source", trans("source")), _
                New XElement("custom_vmail_expire_date", "")))))
' Save XML document at root.
doc.Save("c:\vj" & saveDate & ".xml")
which works fine and produces the proper XML file, but when I run it through a validator I get this error:
Sorry, I am unable to validate this document because on line 1 it contained one or more bytes that I cannot interpret as us-ascii (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.
The error was: ascii "\xEF" does not map to Unicode
What could be causing that?

The problem is that you have a UTF-8 file that you are trying to validate as ASCII. The bytes at the start of the file (0xEF 0xBB 0xBF) are the UTF-8 byte order mark.

The validator doesn't support UTF8/UCS-2. Either save the file as ascii (which will break, as the xml says it's utf-8) or find a validator that was created within the last 5 years.
EDIT:
Note: If you want to save it as US Ascii, use new XDeclaration("1.0", "us-ascii", "yes")

The file is saved as UTF-8 with the byte-order-mark character at the start (this character begins with the octet 0xEF).
Your validator for some reason seems not to like this character. Strictly speaking this character is whitespace and it is invalid to have whitespace preceding the XML declaration. However, most parsers I know will skip it as being simply an indicator of unicode encoding and not treat it as content.
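To confirm that the byte order mark is what the validator trips over, the saved file can be inspected outside .NET. The following is a small Python sketch using the standard codecs module; the sample bytes are made up to resemble the start of the generated file, and the helper names are illustrative.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 byte order mark, if present."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


# Made-up sample resembling the beginning of the saved document.
sample = codecs.BOM_UTF8 + (
    b'<?xml version="1.0" encoding="utf-8" standalone="yes"?><transaction/>'
)

print(has_utf8_bom(sample))
print(has_utf8_bom(strip_utf8_bom(sample)))
```

Reading the real file in binary mode and passing its contents to has_utf8_bom shows whether the first three bytes are EF BB BF; stripping them yields a copy that an ASCII-only validator can at least start parsing, as long as the content itself is plain ASCII.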
