Memory problems when running Stanford NLP (Stanford Segmenter) - stanford-nlp

I downloaded the Stanford Segmenter and I am following the instructions, but I am getting a memory error. The full message is here:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.regex.Pattern.matcher(Pattern.java:1093)
at edu.stanford.nlp.wordseg.Sighan2005DocumentReaderAndWriter.shapeOf(Sighan2005DocumentReaderAndWriter.java:230)
at edu.stanford.nlp.wordseg.Sighan2005DocumentReaderAndWriter.access$300(Sighan2005DocumentReaderAndWriter.java:49)
at edu.stanford.nlp.wordseg.Sighan2005DocumentReaderAndWriter$CTBDocumentParser.apply(Sighan2005DocumentReaderAndWriter.java:169)
at edu.stanford.nlp.wordseg.Sighan2005DocumentReaderAndWriter$CTBDocumentParser.apply(Sighan2005DocumentReaderAndWriter.java:114)
at edu.stanford.nlp.objectbank.LineIterator.setNext(LineIterator.java:42)
at edu.stanford.nlp.objectbank.LineIterator.<init>(LineIterator.java:31)
at edu.stanford.nlp.objectbank.LineIterator$LineIteratorFactory.getIterator(LineIterator.java:108)
at edu.stanford.nlp.wordseg.Sighan2005DocumentReaderAndWriter.getIterator(Sighan2005DocumentReaderAndWriter.java:86)
at edu.stanford.nlp.objectbank.ObjectBank$OBIterator.setNextObjectHelper(ObjectBank.java:435)
at edu.stanford.nlp.objectbank.ObjectBank$OBIterator.setNextObject(ObjectBank.java:419)
at edu.stanford.nlp.objectbank.ObjectBank$OBIterator.<init>(ObjectBank.java:412)
at edu.stanford.nlp.objectbank.ObjectBank.iterator(ObjectBank.java:250)
at edu.stanford.nlp.sequences.ObjectBankWrapper.iterator(ObjectBankWrapper.java:45)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1193)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1137)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1091)
at edu.stanford.nlp.ie.crf.CRFClassifier.main(CRFClassifier.java:3023)
Before executing the file I tried increasing the heap space with export JAVA_OPTS=-Xmx4000m. I also tried splitting the input file into 8 chunks of around 15 MB each, but I still got the same error. What should I do to fix the memory problem?

The segment.sh script that ships with the segmenter limits the memory to 2G, which is probably the cause of the error. Editing that file will hopefully fix the issue for you.
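In recent versions of the segmenter the cap is hard-coded as a -mx2g flag on the java command inside segment.sh (the exact flag is an assumption - check your copy), which would also explain why the exported JAVA_OPTS had no effect. A minimal way to raise it to, say, 8 GB on Linux:

sed -i 's/-mx2g/-mx8g/' segment.sh

or simply open segment.sh in an editor and change the 2g value by hand.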

Related

Maven project getting a heap size error - Maven or Java?

When I run my program, load a 10 MB Excel file, and calculate something, I get this error:
Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.HashMap.resize(HashMap.java:705)
at java.base/java.util.HashMap.putVal(HashMap.java:630)
at java.base/java.util.HashMap.put(HashMap.java:613)
at java.base/java.util.HashSet.add(HashSet.java:221)
at java.base/java.util.Collections.addAll(Collections.java:5593)
at org.logicng.formulas.FormulaFactory.or(FormulaFactory.java:532)
at org.logicng.formulas.FormulaFactory.naryOperator(FormulaFactory.java:372)
at org.logicng.formulas.FormulaFactory.naryOperator(FormulaFactory.java:359)
at org.logicng.formulas.NAryOperator.restrict(NAryOperator.java:130)
at org.logicng.formulas.NAryOperator.restrict(NAryOperator.java:129)
at org.logicng.formulas.NAryOperator.restrict(NAryOperator.java:129)
at org.logicng.transformations.qe.ExistentialQuantifierElimination.apply(ExistentialQuantifierElimination.java:74)
at ToPue.calculatePueForPos(ToPue.java:59)
at PosvHandler.calculatePosv(PosvHandler.java:21)
at PueChecker$EqualBtnClicked.actionPerformed(PueChecker.java:192)
at java.desktop/javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:1967)
at java.desktop/javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2308)
at java.desktop/javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:405)
at java.desktop/javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:262)
at java.desktop/javax.swing.plaf.basic.BasicButtonListener.mouseReleased(BasicButtonListener.java:279)
at java.desktop/java.awt.Component.processMouseEvent(Component.java:6636)
at java.desktop/javax.swing.JComponent.processMouseEvent(JComponent.java:3342)
at java.desktop/java.awt.Component.processEvent(Component.java:6401)
at java.desktop/java.awt.Container.processEvent(Container.java:2263)
at java.desktop/java.awt.Component.dispatchEventImpl(Component.java:5012)
at java.desktop/java.awt.Container.dispatchEventImpl(Container.java:2321)
at java.desktop/java.awt.Component.dispatchEvent(Component.java:4844)
at java.desktop/java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4919)
at java.desktop/java.awt.LightweightDispatcher.processMouseEvent(Container.java:4548)
at java.desktop/java.awt.LightweightDispatcher.dispatchEvent(Container.java:4489)
at java.desktop/java.awt.Container.dispatchEventImpl(Container.java:2307)
at java.desktop/java.awt.Window.dispatchEventImpl(Window.java:2764)
I searched everywhere and tried setting the Maven memory higher.
I set the Java heap space to 12 GB.
The maximum usage of the heap space is 201 MB. Why isn't it using the whole memory?
Can somebody help?
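For context, a minimal sketch of where such heap settings usually go (the values and jar name are placeholders, not taken from the question). Note that MAVEN_OPTS only raises the heap of the Maven JVM itself; an application launched in its own JVM needs the -Xmx flag on its own command line:

export MAVEN_OPTS="-Xmx12g"          # heap for the Maven process (build and plugins)
java -Xmx12g -jar target/app.jar     # heap for the application JVM itself

If the program is started from an IDE instead, -Xmx12g has to go into that run configuration's VM options.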

Julia 1.1 with JLD HDF5 package and memory release in Windows

I'm using Julia 1.1 with JLD and HDF5 to save a file to disk, and I ran into a couple of questions about memory usage.
Issue 1:
First, I defined a 4 GB matrix A.
A = zeros(ComplexF64,(243,243,4000));
When I set it to nothing and look at the Windows Task Manager:
A = nothing
it takes several minutes for Julia to release that memory back to me. Most of the time (in Task Manager), Julia just doesn't release the memory at all, even though varinfo() instantly reports that A occupies 0 bytes:
varinfo()
name                    size summary
–––––––––––––––– ––––––––––– –––––––
A                    0 bytes Nothing
Base                         Module
Core                         Module
InteractiveUtils 162.930 KiB Module
Main                         Module
ans                  0 bytes Nothing
Issue 2:
Further, I tried to use JLD and HDF5 to save the file to disk. This time, Task Manager showed that the save("test.jld", "A", A) command used an extra 4 GB of memory.
using JLD,HDF5
A = zeros(ComplexF64,(243,243,4000));
save("test.jld", "A", A)
Further, after I typed
A=nothing
Julia wouldn't release the 8 GB of memory back to me.
Finding 3:
An interesting thing I found was that, if I retype the command
A = zeros(ComplexF64,(243,243,4000));
The Task Manager showed that the cached memory was released, and the total memory usage was again only 4 GB.
Question 1:
What's going on with memory management in Julia? Is this just a mistake by Windows, or is it caused by some command in Julia? How can I check Julia's memory usage instantly?
Question 2:
How can I tell Julia to release the memory immediately?
Question 3:
Is there a way to tell the JLD package not to use that extra 4 GB of memory?
(Better yet, could someone tell me how to create A directly on disk without even creating it in memory? I know there's memory-mapped I/O in the JLD package. I have tried it, but it seemed to require creating matrix A in memory and saving it to disk first, before I could access the memory-mapped A again.)
This is a long question, so thanks in advance!
Julia uses a garbage collector to deallocate memory. Usually the garbage collector does not run after every line of code but only when needed.
Try to force garbage collection by running the command:
GC.gc()
This releases memory space for unreferenced Julia objects. In this way you can check whether the memory actually has been released.
Side note: JLD used to be somewhat unreliable (I do not know its current status). Hence your first consideration for non-cross-platform object persistence should always be the serialize function from the built-in Serialization package - check the documentation at https://docs.julialang.org/en/v1/stdlib/Serialization/index.html#Serialization.serialize
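As a rough sketch of that suggestion (the file name is a placeholder), saving and reloading the array from the question with the standard library could look like this:

using Serialization

A = zeros(ComplexF64, (243, 243, 4000))   # same ~4 GB array as in the question

open("A.bin", "w") do io                  # write the object to disk
    serialize(io, A)
end

A = nothing                               # drop the reference ...
GC.gc()                                   # ... and ask the GC to reclaim the memory

A = open(deserialize, "A.bin")            # read it back later

Keep in mind that serialize output is generally only readable by the same Julia version, so it is suited to caching rather than long-term or cross-platform storage.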

Xilinx FPGA Error: FPGA Programming Failed due to errors while initializing bitstream

I have a problem loading my program onto the FPGA; I got this error:
FATAL:Data2MEM:44 - Out of memory allocating 'getMemory' object of 960000000 bytes.
Total memory already in use is 14823 bytes.
Source file "../s/DeviceTableUtils.c", line number 5692.
FPGA Programming Failed due to errors while initializing bitstream.
This just happened to me today, with an error message starting with:
FATAL:Data2MEM:44 - Out of memory allocating 'getMemory' object of 960000000 bytes.
The solution for me was simply to reboot my computer (this has been mentioned in a few places on the web too, including this Xilinx Forums thread).

umfpack: An error occurred: numeric factorization: not enough memory

I'm having a problem while running some Scilab code.
As the title suggests, I get the error numeric factorization: not enough memory, related to the umfpack function.
In Task Manager I see memory usage of around 3 GB (my system has 16 GB).
Can anyone help me with this issue?
My guess is that you are attempting to use umfpack with matrices that are too big, so it is unable to allocate the memory required (probably more than the 13 GB you still have available).
See also https://scicomp.stackexchange.com/a/8972/21926

How to avoid Parquet MemoryManager exception

I'm generating some Parquet (v1.6.0) output from a Pig (v0.15.0) script. My script takes several input sources and joins them with some nesting. The script runs without error, but then during the STORE operation I get:
2016-04-19 17:24:36,299 [PigTezLauncher-0] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG Status: status=FAILED, progress=TotalTasks: 249 Succeeded: 220 Running: 0 Failed: 1 Killed: 28 FailedTaskAttempts: 43, diagnostics=Vertex failed, vertexName=scope-1446, vertexId=vertex_1460657535752_15030_1_18, diagnostics=[Task failed, taskId=task_1460657535752_15030_1_18_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:parquet.hadoop.MemoryManager$1: New Memory allocation 134217728 exceeds minimum allocation size 1048576 with largest schema having 132 columns
at parquet.hadoop.MemoryManager.updateAllocation(MemoryManager.java:125)
at parquet.hadoop.MemoryManager.addWriter(MemoryManager.java:82)
at parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:104)
at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:309)
at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:262)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:81)
at org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:398)
...
The above exception was thrown when I executed the script using -x tez, but I get the same exception when using mapreduce. I have tried to increase parallelization using SET default_parallel, as well as adding an (unnecessary with respect to my real objectives) ORDER BY operation just prior to my STORE operations, to ensure Pig has an opportunity to ship data off to different reducers and minimize the memory required on any given reducer. Finally, I've tried pushing up the available memory using SET mapred.child.java.opts. None of this has helped, however.
Is there something I'm just missing? Are there known strategies for avoiding the issue of one reducer carrying too much of the load and causing things to fail during the write? I've experienced similar issues writing Avro output that appear to be caused by insufficient memory to execute the compression step.
EDIT: per this source file, the issue seems to boil down to the fact that memAllocation / nCols < minMemAllocation. However, the memory allocation seems unaffected by the mapred.child.java.opts setting I tried.
I finally solved this using the parameter parquet.block.size. The default value (see the source) is big enough to write a file 128 columns wide, but no bigger. The solution in Pig was to use SET parquet.block.size x;, where x >= y * 1024^2 and y is the number of columns in your output.
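As a sketch for the 132-column schema from the error message above (the relation, path, and storer class are placeholders; keep whatever your STORE statement already uses):

-- 132 columns * 1024^2 bytes per column = 138412032
SET parquet.block.size 138412032;
STORE joined_data INTO 'output' USING parquet.pig.ParquetStorer;

Roughly speaking, any value at or above y * 1024^2 keeps the per-column share of the block above the 1 MB minimum allocation that triggers the MemoryManager check.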
