CoreNLP: How can I get only collapsed dependencies? - stanford-nlp

I'm parsing over 60,000 sentences with CoreNLP to get dependency relations.
Because I only need collapsed dependencies, the other dependency types -- basic and collapsed-cc-processed -- are redundant for my use, and they make it harder to build my own code, which takes the XML output as input.
Can I get only collapsed dependencies?
If so, please let me know.
Thanks.

There is currently no way to do this. Computing the additional representations takes very little extra computation, so they are always reported. They should be marked distinctly in the XML output, however, so hopefully it is not hard to filter out the representation you need in your downstream code.
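For what it's worth, a minimal downstream filter in Java might look like the sketch below. It assumes the XML output marks each representation with a type attribute on the dependencies element (e.g. collapsed-dependencies) and lists relations as dep elements; check those names against your own output, since they are assumptions here, and the file name is hypothetical.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class CollapsedOnly {
    public static void main(String[] args) throws Exception {
        // Hypothetical file name; point this at your CoreNLP XML output.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new File("corenlp-output.xml"));

        NodeList all = doc.getElementsByTagName("dependencies");
        for (int i = 0; i < all.getLength(); i++) {
            Element deps = (Element) all.item(i);
            // Keep only the collapsed representation; skip basic and cc-processed.
            if (!"collapsed-dependencies".equals(deps.getAttribute("type"))) {
                continue;
            }
            NodeList rels = deps.getElementsByTagName("dep");
            for (int j = 0; j < rels.getLength(); j++) {
                Element dep = (Element) rels.item(j);
                System.out.println(dep.getAttribute("type"));  // relation label, e.g. nsubj
            }
        }
    }
}
```

The same idea works with a streaming XML reader or XPath if DOM is too heavy for 60,000 sentences.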

Related

Autofix order of selectors

We use SonarQube against our application. One of the SonarQube rules says:
Selectors of lower specificity should come before overriding selectors of higher specificity
The details are here. As my application has many violations, changing the order by hand isn't really feasible. I'm wondering if there's a way to use scss-lint, stylelint or something else in a "fix" mode that could change the order of the selectors. I looked but couldn't find anything in stylelint. Maybe it can't safely be done automatically, as changing the order could affect specificity and therefore change the application behaviour...
As far as I personally know, there is no linter that provides that. (I am curious about it myself.) But here are some thoughts about the need to follow that 'rule':
Indeed, writing Sass/CSS so that selectors with lower specificity come first is good practice. The CSS structure becomes more readable, and it is easier to build up your code structure because there is a clearer system in your head (and in the code).
But from the way CSS mechanically works, there is REALLY NO NEED to do it this way. The code simply doesn't become safer or less safe by doing so, and the pages don't load more slowly if you don't. That is exactly what the mechanism of specificity is for: it is the specificity, not the order of the selectors, that counts, so you are free to write your code in the order you need it. Only when the specificity is the same does the order matter.
So, maybe this rule leads to 'better' code. But: NOT ALL RULES NEED TO BE FULFILLED. Not every rule Google tries to establish with the best-practice checks they offer in their browser, nor every rule other analysis tools provide, needs to be followed.
And if it isn't done in this project, since it takes resources to correct it ... it could, but doesn't have to, become a target for the next project ;-)

How to verify and validate parsed Google Protobuf v2 file

First, I'll couch this in an acknowledgement: yes, I am aware of protoc, but I have a specific requirement to derive some specialized target-language artifacts from the outcome of a .proto file parser.
That being established, I've already got the parser itself working. I am working on resolving imported .proto dependencies. Not a terribly difficult endeavor on the surface, in and of itself.
The next steps after that, I think, are to perform a kind of "transitive linkage", as I've learned it is called, but I am curious what I should be aware of. Prima facie, I think I should be collating a set (most likely a map) of element paths to field numbers, as well as collating the reserved numbers and the extensions, then perhaps verifying as I traverse the .proto dependency tree.
However, I'd like to get an idea of others' experience, guidance, feedback, along these lines.
For what I'm wanting to accomplish, I do not think this verification step needs to be that elaborate, only enough to rule out invalid .proto, etc.
Oh, and last but not least, I need to handle this for the Protobuf v2 language spec.
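For what it's worth, here is a minimal sketch of the bookkeeping described above: a map from fully qualified element paths to field numbers, checked against duplicates and reserved numbers while walking the parsed messages. The node types (MessageNode, FieldNode) are hypothetical stand-ins for whatever your parser actually produces, so treat this as a shape rather than a spec-complete validator; extensions and imported files would need the same treatment.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical parse-tree types; substitute your parser's own node classes.
record FieldNode(String name, int number) {}
record MessageNode(String fqName, List<FieldNode> fields, Set<Integer> reservedNumbers,
                   List<MessageNode> nested) {}

public class ProtoVerifier {
    // element path (e.g. "pkg.Outer.Inner.field") -> field number
    private final Map<String, Integer> fieldNumbers = new HashMap<>();

    public void verify(MessageNode msg) {
        Set<Integer> seen = new HashSet<>();
        for (FieldNode f : msg.fields()) {
            String path = msg.fqName() + "." + f.name();
            if (msg.reservedNumbers().contains(f.number())) {
                throw new IllegalStateException(path + " uses reserved number " + f.number());
            }
            if (!seen.add(f.number())) {
                throw new IllegalStateException(path + " reuses field number " + f.number());
            }
            fieldNumbers.put(path, f.number());
        }
        for (MessageNode nested : msg.nested()) {
            verify(nested);   // recurse into nested messages along the dependency tree
        }
    }
}
```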

Boost Spirit Qi: Is it a suitable language/tool to analyse/cut a "multiline" data file?

I want to apply various operations to data files: set algebra, statistics, reporting, changes. But the format of the files is far from the usual code examples and a bit weird. There are different sorts of items and item types, and some of them are grouped together as a collection. There is a simplified example below.
I'm new to boost::spirit and have tried writing code to split the items and get the basic information (name, version, date) required for most of the processing. So far it seems tricky to me. Is the problem my lack of skill, or is boost::spirit not suited to this format?
Studying boost::spirit is not a waste of time; I am sure I will use it later. But I didn't find examples of code like mine, so I may not be going about this the right way.
>>>process_type_A
//name(typeA_1)
//version(A.1.99)
//date(2016.01.01)
//property1 "pA11"
//property2 "pA12"
//etc_A_1 (thousands of lines - many are "multiline" and/or multiline sub-records)
<<<process_type_A
>>>process_type_A
//name(typeA_2)
//version(A.2.99)
//date(2016.01.02)
//property1 "pA21"
//property2 "pA22"
//etc_A_2 (hundreds or thousands of lines)
<<<process_type_A
>>>process_type_B
//name(typeB_1)
//version(B.1.99)
//date(2016.02.01)
//property1 "pB11"
//property2 "pB12"
//etc_B_1 (hundreds or thousands of lines)
<<<process_type_B
>>>paramset_type_C
//>>paramlist
////name(typeC_1)
////version(C.1.99)
////date(2016.03.01)
////property1 "pC11"
////property2 "pC12"
////etc_C_1 (hundreds or thousands of lines)
//<<paramlist
//>>paramlist
////name(typeC_2)
////version(C.2.99)
////date(2016.04.01)
////property1 "pC21"
////property2 "pC22"
////etc_C_2 (hundreds or thousands of lines)
//<<paramlist
<<<paramset_type_C
Code::Blocks
Boost 1.60.0
GCC Compiler on Windows and Linux
I think @Orient is right: regex with captures is enough here.
However, Spirit has the upside of coming without a linker dependency. Here are some approaches (using seek[] and raw[]) for inspiration:
Boost spirit revert parsing
rule to extract key+phrases from a text document
Parsing text file with binary envelope using boost Spirit (binary content)
much more involved logic: How to implement #ifdef in a boost::spirit::qi grammar?
Note that Spirit X3 (still experimental) also has a seek[] directive, and it compiles much faster.
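For comparison, here is a rough sketch of the regex-with-captures route, written in Java only because it is self-contained; the same two patterns work with std::regex or Boost.Regex. The field layout (//name(...), //version(...), //date(...)) is taken from the sample above, so adjust the patterns if your real files differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BlockScan {
    public static void main(String[] args) {
        String data = String.join("\n",
            ">>>process_type_A",
            "//name(typeA_1)",
            "//version(A.1.99)",
            "//date(2016.01.01)",
            "//property1 \"pA11\"",
            "<<<process_type_A");

        // One block: >>>TYPE ... <<<TYPE ; DOTALL so '.' spans the inner lines.
        Pattern block = Pattern.compile(">>>(\\w+)(.*?)<<<\\1", Pattern.DOTALL);
        // Header fields inside a block: //name(...), //version(...), //date(...)
        Pattern field = Pattern.compile("//(name|version|date)\\(([^)]*)\\)");

        Matcher b = block.matcher(data);
        while (b.find()) {
            System.out.println("item type: " + b.group(1));
            Matcher f = field.matcher(b.group(2));
            while (f.find()) {
                System.out.println("  " + f.group(1) + " = " + f.group(2));
            }
        }
    }
}
```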
The main advice I would give about Qi is that it is a very powerful and flexible tool for parsing. You can define quite complicated, possibly recursive structures, using boost::variant, boost::optional, etc., and associate these types with qi rules and it seemingly magically does the right thing, giving you a nice AST for your data.
The biggest source of difficulty in my (limited) experience is when you try to make it do more than that and also process the data. It's sometimes tempting to do some processing "eagerly" at the same time that you are parsing the data, often in a semantic action or something. Don't do it! It usually makes things harder to read in the end, a bit harder to debug, and sometimes you can be surprised by what happens if the grammar has to backtrack across a semantic action it has already executed.
qi should work great if you can write a nice grammar for your data. If you can't write an unambiguous grammar, you might be able to use qi::eps to make it parseable but you don't want to have to do that too often IMO. I don't think "hundreds or thousands" of items will pose any particular problem.
Right now the question is rather opinion-oriented -- if you can post a more complete description of the data format you have, or better, a complete code example which is failing, it might make it easier to give precise answers.

Java Bytecode manipulation libraries

I am starting to work on a project and for one of the tasks I need to analyze the source code in order to gather information about the classes and their methods. More specifically, for each method I need to know which internal attributes and external objects (references) it uses throughout the entire method body.
I discussed it with my supervisors and they think that Bytecode manipulation libraries is the way to go. I already looked at BCEL, ASM and Javassist but I'm not sure which one I need to use. Do they all provide access to the method body where I can see all the instructions and get the information I need?
Any advice would be appreciated. Thank you!
If you really “need to analyze the source code”, then libraries which allow to inspect the bytecode are not the way to go.
Otherwise, you really need to define your task precisely. Either you want to analyze classes, regardless of whether you look at their source code or bytecode, or you want to analyze source code and are considering doing it by compiling first and then analyzing the compiled result. In the latter case, you have to compare the effort of both steps with alternative solutions, which may, e.g., involve direct source code analysis.
Parsing bytecode is rather easy, easier than analyzing source code, which is the reason why bytecode is produced prior to the execution of Java programs. To answer your concrete question: yes, all three libraries offer you a way to analyze the instructions and the associated information. Which one best fits your needs is a question that is beyond the scope of Stack Overflow.
Whether analyzing the bytecode helps depends on your exact requirements. When it comes to field and method accesses, you can get most of them precisely with that approach; only inlined compile-time constants lose their origins. When it comes to type use, you have to consider that not every source code artifact has a counterpart in the bytecode. For example, widening casts produce no actual code, and local variables usually don't have a declared type (debugging information aside), only an implied type that depends on how they are actually used. They also carry no information about generics unless debugging information has been included.
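To make the "analyze the instructions" route concrete, here is a small sketch using ASM's visitor API that reports every field and method access inside each method body. The class name is hypothetical, and you should double-check the API version constant and signatures against the ASM release you actually use.

```java
import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

public class AccessScanner {
    public static void main(String[] args) throws Exception {
        // Reads the class file for the named class from the classpath.
        ClassReader reader = new ClassReader("com.example.SomeClass"); // hypothetical class
        reader.accept(new ClassVisitor(Opcodes.ASM9) {
            @Override
            public MethodVisitor visitMethod(int access, String name, String desc,
                                             String sig, String[] exceptions) {
                System.out.println("method " + name + desc);
                return new MethodVisitor(Opcodes.ASM9) {
                    @Override
                    public void visitFieldInsn(int op, String owner, String fname, String fdesc) {
                        // GETFIELD/PUTFIELD/GETSTATIC/PUTSTATIC: field reads and writes
                        System.out.println("  field  " + owner + "." + fname);
                    }
                    @Override
                    public void visitMethodInsn(int op, String owner, String mname,
                                                String mdesc, boolean itf) {
                        // INVOKE* instructions: calls on this or other objects
                        System.out.println("  call   " + owner + "." + mname + mdesc);
                    }
                };
            }
        }, 0);
    }
}
```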

Complicated Algorithm - How to store rules separate from processing code?

I'm working on a project that will do some complicated analysis of user-supplied input. There will be three parts to the code:
1) Input supplied by user, such as keywords
2) Rules, such as if keyword 1 is repeated 3 times in keyword 5, do this, etc.
3) And the analysis itself, which executes the rules, processes the user input, and generates the necessary output based on that processing.
Naturally this will lead to a lot of spaghetti code and many, many if statements in the processing code. I want to avoid that, and keep the rules (i.e. the if statements) separate from the code that loops through the user input and generates the output.
How can I do that, i.e. what is the best way?
If you have enough rules that you want to externalize, you could try using a business rules engine, like Drools in Java.
A business rules engine is a software system that executes one or more business rules in a runtime production environment. The rules might come from legal regulation ("An employee can be fired for any reason or no reason but not for an illegal reason"), company policy ("All customers that spend more than $100 at one time will receive a 10% discount"), or other sources. (Wikipedia)
It could be a bit of overhead depending on what you're trying to do. At my company we use this kind of tool for our quality-analysis tool.
Store it in XML. Easy to parse and update.
I once designed a code generator that could be controlled from an XML file.
For each command I had an entry in the XML, and I processed that node to generate the opcode for the command. The node itself contained the actions I needed to perform to obtain the opcode. For some commands I had to look into a database; all of those things I put into this XML file.
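Along those lines, here is a minimal sketch of loading rule definitions from XML, so that only data, not code, changes when a rule changes. The file layout (rule elements with keyword, minCount and action attributes) is invented for the example.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// One externalized rule: "if KEYWORD occurs at least MINCOUNT times, trigger ACTION".
record KeywordRule(String keyword, int minCount, String action) {}

public class RuleLoader {
    public static List<KeywordRule> load(String path) throws Exception {
        List<KeywordRule> rules = new ArrayList<>();
        NodeList nodes = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new File(path))            // e.g. rules.xml (hypothetical)
                .getElementsByTagName("rule");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            rules.add(new KeywordRule(
                    e.getAttribute("keyword"),
                    Integer.parseInt(e.getAttribute("minCount")),
                    e.getAttribute("action")));
        }
        return rules;
    }
}
```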
Well, I doubt that huge if statements are necessary if polymorphism is applied correctly.
Actually, you need a proper domain model for your rules. This goes somewhat in the direction of the command pattern, depending on the complexity of your code perhaps in combination with the state machine pattern.
Once you have your model, defining rules means instantiating it correctly.
This could be done with an XML definition that is parsed and transformed into your model. But the more modern and even fancier way would be to use DSLs. If you program in Java and have some freedom in your choice of libraries, this would be a proper use case for an embedded DSL with Groovy. Basically you would need a builder that constructs your model, and that's all.
You can always implement a factory that creates particular strategies according to the parameters passed in. Then you use those strategies in your code without any ifs.
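For illustration, a minimal sketch of that factory-of-strategies idea (all names are made up), which also shows the polymorphism point from the previous answer: the processing loop never branches on rule types.

```java
import java.util.Map;

// Each rule is a strategy: it inspects the input and produces part of the output.
interface Rule {
    String apply(String userInput);
}

class RuleFactory {
    // Maps a rule name (e.g. read from configuration) to a strategy instance,
    // so the processing loop never needs an if/else chain over rule types.
    private static final Map<String, Rule> RULES = Map.of(
            "uppercase", input -> input.toUpperCase(),
            "truncate",  input -> input.length() > 10 ? input.substring(0, 10) : input);

    static Rule forName(String name) {
        Rule rule = RULES.get(name);
        if (rule == null) {
            throw new IllegalArgumentException("Unknown rule: " + name);
        }
        return rule;
    }
}

public class Processor {
    public static void main(String[] args) {
        // The rule to apply comes from data, not from hard-coded branches.
        Rule rule = RuleFactory.forName("uppercase");
        System.out.println(rule.apply("some user-supplied keywords"));
    }
}
```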
If it's just detecting keywords, use a finite state machine or similar. If it's doing more, use other pattern-matching systems, such as rules engines.
Adding an embedded scripting language to your application might help. The rules would then be expressed in scripts, executed by the application during processing.
The idea is that scripts are easy to change and contain the high-level logic, while your application executes the details.
There are a lot of scripting languages available for this: Lua, Python, Falcon, Squirrel, AngelScript, etc.
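On the JVM you can get this effect without embedding a separate runtime by using the standard javax.script API; a minimal sketch follows. Engine availability depends on your JDK (the bundled Nashorn JavaScript engine was removed in recent releases), so treat the engine name as an assumption.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class ScriptedRules {
    public static void main(String[] args) throws Exception {
        // Pick an installed engine; "JavaScript" works on JDKs that still bundle Nashorn,
        // or with a GraalJS/other engine on the classpath.
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");

        // The rule lives in a script, easy to edit without recompiling the application.
        String rule = "input.split(' ').length >= 3 ? 'long input' : 'short input'";

        engine.put("input", "some user supplied keywords");
        System.out.println(engine.eval(rule));   // the application executes the details
    }
}
```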
Have a look at rule engines!
The approach from Lars may also be worth considering.
