Stanford NLP Parser Model Jar too large - stanford-nlp

I use Maven to manage dependencies. I need to use the Stanford NLP Parser to get universal dependencies for English sentences. I'm using the edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz model. My pom.xml contains the following:
<dependency>
<groupId>edu.stanford.nlp</groupId>
<artifactId>stanford-parser</artifactId>
<version>3.6.0</version>
</dependency>
<dependency>
<groupId>edu.stanford.nlp</groupId>
<artifactId>stanford-parser</artifactId>
<version>3.6.0</version>
<classifier>models</classifier>
</dependency>
Adding the models dependency increases the jar size by around 300MB. I need the jar to be as small as possible. Is there any way to handle this in Maven?

You can make the jar smaller by not including the models and referencing them from elsewhere (e.g., by specifying a custom path in the parse.model property), but in general, if you want to produce parse trees, the parser model has to be accessible somewhere. CoreNLP ships it in the models jar by default to make it easier to run the code independently of your particular directory structure.
The other option is to run the CoreNLP Server, and then you only need the client library (the server includes the models jar).
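If the 300MB comes from building a fat jar (for example with the maven-shade-plugin or the assembly plugin's jar-with-dependencies descriptor, which is an assumption about your build), one way to keep the models out while still compiling and testing against them is to give the models dependency provided scope. Both of those plugins skip provided-scope dependencies. This is only a sketch: you then have to supply the models at runtime yourself, either by putting the models jar on the classpath or by pointing parse.model at a copy of englishPCFG.ser.gz on disk.

```xml
<!-- Parser code: bundled into the packaged jar as usual -->
<dependency>
  <groupId>edu.stanford.nlp</groupId>
  <artifactId>stanford-parser</artifactId>
  <version>3.6.0</version>
</dependency>
<!-- Models (around 300MB): on the compile and test classpath, but
     "provided" scope keeps shade/assembly from bundling them.
     They must be made available separately at runtime. -->
<dependency>
  <groupId>edu.stanford.nlp</groupId>
  <artifactId>stanford-parser</artifactId>
  <version>3.6.0</version>
  <classifier>models</classifier>
  <scope>provided</scope>
</dependency>
```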

Related

How does a runtime instance deal with a provided-scope dependency?

I am interested in understanding Maven scopes during the build lifecycle.
I understand that when working with a dependency like this one:
<dependency>
<groupId>javax.servlet</groupId>
<artifactId>javax.servlet-api</artifactId>
<scope>provided</scope>
</dependency>
the javax.servlet-api jar will not be included in the final executable jar,
because the server is supposed to already have the dependency.
OK, but how does it work?
Where is the jar (javax.servlet-api.jar) physically located?
Last question:
when we build a jar, how can we be sure that a dependency can be marked with provided scope,
i.e. that the server already has it at runtime?
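It is actually up to you to make sure of that; Maven does not know about it.
So if you know that your server provides certain dependencies (e.g. because you read the server manual), then you can mark them as provided. As for where the jar physically lives: at build time, Maven resolves it from your local repository (~/.m2/repository by default) and puts it only on the compile and test classpath; at runtime, the classes come from the server's own copy (Tomcat, for example, ships the Servlet API in its lib directory).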
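To see the effect in a typical web application, here is a minimal sketch of a WAR project. The project coordinates and the servlet API version (3.1.0) are placeholders; the version should match the Servlet spec your server actually implements. With war packaging, the maven-war-plugin leaves provided-scope jars out of WEB-INF/lib, so the application compiles against the API but uses the container's copy at runtime.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <!-- Placeholder coordinates for your own application -->
  <groupId>com.example</groupId>
  <artifactId>my-webapp</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>war</packaging>
  <dependencies>
    <!-- Compiled against, but not copied into WEB-INF/lib:
         the servlet container supplies its own copy at runtime -->
    <dependency>
      <groupId>javax.servlet</groupId>
      <artifactId>javax.servlet-api</artifactId>
      <version>3.1.0</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>
</project>
```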

What is correct Maven scope of findbugs annotations?

I want to use a library that has the following dependency:
<dependency>
<groupId>com.google.code.findbugs</groupId>
<artifactId>annotations</artifactId>
<version>2.0.3</version>
</dependency>
I read that FindBugs is for static analysis of Java code, so I thought it isn't necessary to include it in the application. Is it safe to exclude the jar with <scope>provided</scope> or with an <exclusion>...</exclusion>?
One reason to exclude it is that there is a company policy against (L)GPL licences.
Yes, you can safely exclude this library. It contains only annotations which do not need to be present at runtime. Take care to have them available for the FindBugs analysis, though.
Note that you should also list jsr305.jar, like this:
<dependency>
<groupId>com.google.code.findbugs</groupId>
<artifactId>annotations</artifactId>
<version>3.0.2</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.google.code.findbugs</groupId>
<artifactId>jsr305</artifactId>
<version>3.0.2</version>
<scope>provided</scope>
</dependency>
Both JARs are required to make these annotations work.
Check the most recent findbugs version in Maven Central.
FindBugs is provided under the LGPL, so there should not be any problems for your company. Also, you are merely using FindBugs; you are not developing something derived from FindBugs.
In theory, it should be entirely safe (as defined in the OP's clarifying comment) to exclude the Findbugs transitive dependency. If used correctly, Findbugs should only be used when building the library, not using it. It's likely that someone forgot to add <scope>test</scope> to the Findbugs dependency.
So - go ahead and try the exclusion. Run the application. Do you get classpath errors, application functionality related to the library that doesn't work, or see messages in the logs that seem to be due to not having Findbugs available? If the answer is yes I personally would rethink using this particular library in my application, and would try to find an alternative.
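Trying the exclusion means adding an <exclusions> block to the dependency that pulls FindBugs in. The library coordinates below are hypothetical placeholders for whichever library you are using; an exclusion names only groupId and artifactId, never a version.

```xml
<dependency>
  <!-- Hypothetical coordinates of the library that brings in FindBugs -->
  <groupId>com.example</groupId>
  <artifactId>some-library</artifactId>
  <version>1.0</version>
  <exclusions>
    <!-- Drop the transitive FindBugs annotations jar -->
    <exclusion>
      <groupId>com.google.code.findbugs</groupId>
      <artifactId>annotations</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Running mvn dependency:tree before and after the change shows whether the annotations jar has actually disappeared from the resolved dependencies.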
Also, congratulations on doing the classpath check up front! As a general practice, it is a great idea to do what you have done every time you include a library in your application: add the library, then check what other transitive dependencies it brings, and do any necessary classpath clean-up at the start. When I do this I find it makes my debugging sessions much shorter.

Removing a JAR from deployment

I am working on a Spring project on a JBoss server. I am facing a situation where I think removing a jar from the deployment may solve all the issues. But I want to keep the JAR at compile time so that I can use it in my classes.
I want to know how I can remove a jar from the deployment only, but keep it at compile time.
This may not be the usual kind of question for SO, but then again, SO is all about coders and its main purpose is to help us solve problems.
So, can anyone tell me how to do this?
If you are using Maven, then you need to mark the dependency as provided.
For example:
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
<scope>provided</scope>
</dependency>
If you are using Gradle, the corresponding configuration is providedCompile (added by the war plugin). The code would look like:
providedCompile 'log4j:log4j:1.2.17'
For an Eclipse based build, check out this SO post

Maven 3 modules

I am about to start a project. We'll be using Spring MVC, RestEasy, Spring Batch and Spring Security.
Does it make sense to have a module for each of these, e.g.:
Main_Project
---pom.xml
---Module_Project
------pom.xml
---Module_MVC
------pom.xml
---Module_Rest
------pom.xml
---Module_Batch
------pom.xml
---Module_Security
------pom.xml
I'm not sure what the best practice is.
Or should I be using a single module?
At first sight, it doesn't make sense.
Since you already know what technologies you need, I guess you already have an idea on how to organize your own code. And this is your own code organization that must drive your modules (not the frameworks you are using).
A general approach that can work (at least as a starting point for designing the architecture of a traditional web-based application):
one module with your model (i.e. database layer, dao, persistent beans,...) - packaging jar
one module with your controllers (i.e. access to database layer, transaction management, business logic, ...) - packaging jar
one module (front layer) with your view files (if any) (jsp, ...) - packaging war
one module (front layer) with your webservices definition (if any) - packaging war
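The four-module layout above could be tied together by an aggregator (parent) POM. This is a sketch: the groupId reuses com.ourproject from the example further down, while the parent artifactId and the module directory names are hypothetical. The parent itself must use pom packaging; each module declares its own jar or war packaging in its own pom.xml.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.ourproject</groupId>
  <artifactId>parent</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <!-- A parent/aggregator POM must use pom packaging -->
  <packaging>pom</packaging>
  <modules>
    <module>model</module>        <!-- jar: database layer, DAOs, beans -->
    <module>controllers</module>  <!-- jar: transactions, business logic -->
    <module>web</module>          <!-- war: view files -->
    <module>webservices</module>  <!-- war: web service definitions -->
  </modules>
</project>
```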
Ignore the frameworks. Split your modules until you can answer "no" to these 2 questions for each module:
"Am I mixing view/controller logic with business logic?"
"Am I mixing features?"
Remember to declare the frameworks in the parent pom.xml so that all modules share exactly the same dependency versions.
Do not organize your modules by framework. Frameworks are dependencies that you add to your modules where you need them, maybe like this:
<project>
<groupId>com.ourproject</groupId>
<artifactId>myfeature</artifactId>
<version>0.0.1-SNAPSHOT</version>
...
<dependencies>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-webmvc</artifactId>
<version>3.2.2.RELEASE</version>
</dependency>
</dependencies>
</project>
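The usual Maven mechanism for declaring frameworks once in the parent pom.xml is <dependencyManagement>: the parent pins the version, and each module lists the dependency without one. A sketch with the spring-webmvc coordinates from the example above; the two fragments belong in two different files, and the module is assumed to reference the parent in its <parent> element.

```xml
<!-- Parent pom.xml: pin the framework version once for all modules -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.springframework</groupId>
      <artifactId>spring-webmvc</artifactId>
      <version>3.2.2.RELEASE</version>
    </dependency>
  </dependencies>
</dependencyManagement>

<!-- Module pom.xml: the version is inherited from the parent -->
<dependencies>
  <dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-webmvc</artifactId>
  </dependency>
</dependencies>
```

Upgrading Spring then means changing a single line in the parent, and no module can drift to a different version by accident.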
There are many different approaches to how to organize your project.
The approach I am currently using organizes software along features. Each feature is then "subdivided" via good old java packages. This way I pack business logic, data access objects and specific resources, all belonging to a specific feature, into one module.
The upside is that you don't have to stretch yourself very hard looking for everything that belongs to the feature, allowing you to introduce or remove features as you wish.
The downside is that you'll have to pack objects that are used by all feature modules (cross-cutting concerns or even common parent classes) into separate modules and add them as yet another dependency in each of your modules.
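With a feature-based layout, each feature module depends on the shared module explicitly. A sketch, assuming a hypothetical parent POM (artifactId parent) and a hypothetical shared module named common; ${project.version} keeps the shared module's version in step with the feature module's.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <!-- Hypothetical parent; groupId and version are inherited from it -->
  <parent>
    <groupId>com.ourproject</groupId>
    <artifactId>parent</artifactId>
    <version>0.0.1-SNAPSHOT</version>
  </parent>
  <artifactId>myfeature</artifactId>
  <dependencies>
    <!-- Cross-cutting code shared by all feature modules -->
    <dependency>
      <groupId>com.ourproject</groupId>
      <artifactId>common</artifactId>
      <version>${project.version}</version>
    </dependency>
  </dependencies>
</project>
```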

Maven Learning Questions

In Eclipse, I've installed the m2e plugin. I am trying to understand how you know which names to use for the dependencies you add to the pom file.
For instance, how do you know which artifactId to use?
<dependency>
<groupId>org.aspectj</groupId>
<artifactId>aspectjrt</artifactId>
<version>${org.aspectj-version}</version>
</dependency>
Why does the artifactId say aspectjrt here?
<dependency>
<groupId>javax.inject</groupId>
<artifactId>javax.inject</artifactId>
<version>1</version>
</dependency>
And here it is the same as the groupId?
Is there any pattern?
Say my project is missing
org.hibernate.Query;
org.hibernate.Session;
org.hibernate.SessionFactory;
and in the Maven Dependencies folder I have hibernate-core-3.6.0.Final.jar, which apparently contains everything except those three.
Do you write these POM entries yourself?
I am betting that for a student project I will have to stick with manually adding hundreds of libraries... otherwise I will fail trying to learn it :-)
You know what groupId:artifactId:version (we call these coordinates) to use by which artifact you want. Most often, you find out which artifact you want by reading it in the project's documentation, especially for projects with a large number of artifacts, some of which might contain optional addons.
aspectjrt is short for AspectJ RunTime.
GroupId and artifactId are defined by the creators of the library in question. There's no universal pattern because there's no one central coordinator. There are some conventions that have evolved, though. Generally, the groupId is at least partly the reversed domain name, like the first part of a Java package name: org.hibernate, org.apache, org.springframework... The artifactId distinctly identifies the role of that particular artifact in the group it belongs to, like spring-core, spring-tx, spring-jms, etc. You can get an idea of what groupIds and artifactIds look like by searching Maven Central for some of the libraries you know.
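Applied to the aspectjrt snippet from the question, the coordinates break down as below. The ${org.aspectj-version} placeholder only works if a property of that name is defined somewhere in the POM; the value 1.6.10 here is an assumed example, so check Maven Central for the release you actually want.

```xml
<properties>
  <!-- Assumed example value; referenced below as ${org.aspectj-version} -->
  <org.aspectj-version>1.6.10</org.aspectj-version>
</properties>
<dependencies>
  <dependency>
    <!-- groupId: chosen by the AspectJ project, in reversed-domain style -->
    <groupId>org.aspectj</groupId>
    <!-- artifactId: which AspectJ piece; "rt" stands for RunTime -->
    <artifactId>aspectjrt</artifactId>
    <version>${org.aspectj-version}</version>
  </dependency>
</dependencies>
```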
If you're missing org.hibernate.SessionFactory, then you don't have hibernate-core-3.6.0.Final on your classpath. If you have that on your classpath, then you're not missing the SessionFactory class. Those three classes you mentioned are most definitely in that artifact, as you can see from searching for the class in Central. If you still doubt, do a jar -tf hibernate-core-3.6.0.Final.jar and check out the contents yourself. I promise it has those classes.
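A jar in the Maven repository is named artifactId-version.jar, so hibernate-core-3.6.0.Final.jar corresponds to the declaration below. If the three classes still appear to be missing, it is worth confirming that this entry is really in the pom.xml of the project that fails to compile (mvn dependency:tree lists what Maven actually resolved).

```xml
<!-- Coordinates matching hibernate-core-3.6.0.Final.jar;
     contains org.hibernate.Query, Session and SessionFactory -->
<dependency>
  <groupId>org.hibernate</groupId>
  <artifactId>hibernate-core</artifactId>
  <version>3.6.0.Final</version>
</dependency>
```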
