Getting Java heap space error while using Carrot2

I have all my search results formatted as XML and am trying to run the Lingo algorithm in the Carrot2 Workbench, but I keep running into a Java heap space error.
The XML follows the format Carrot2 expects. I am running the Carrot2 Workbench on a Mac.
Two questions:
Is there a way to increase the Java heap space for the application, e.g. via some setting?
Is there a limit on the number of documents I can pass to the application for clustering? (I have around 10k documents.)
The exact error is:
An internal error occurred during: "Searching for 'gene therapy'...". Java heap space

To set the maximum Java heap space, you can pass a suitable -Xmx JVM parameter value at startup:
carrot2-workbench -vmargs -Xmx256m
Carrot2 is designed for small to medium collections of documents (a few hundred); the practical limit depends on the algorithm. See "Got java heap size error when trying to cluster 15980 documents via carrot2workbench" for more details.
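If you still want to try clustering your ~10k documents, you can raise the limit further; the value below is only an illustration, and the memory actually required depends on the algorithm and on how long the documents are:
carrot2-workbench -vmargs -Xmx2g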

Related

On downloading 1GB file, Uncaught Exception java.lang.OutOfMemoryError in JMeter

I'm using JMeter 5.2.1 with an HTTP Request sampler to measure the performance of a 1GB file download. When executing the script in non-GUI mode, I receive the error below:
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid7536.hprof ...
Heap dump file created [555071109 bytes in 17.048 secs]
Uncaught Exception java.lang.OutOfMemoryError: Java heap space in thread Thread[Thread Group 1-1,5,main]
jmeter.bat has the property:
set HEAP=-Xms1g -Xmx1g -XX:MaxMetaspaceSize=256m
I don't have any listeners in the .jmx file and I am testing in non-GUI mode. What changes do I need to make to get a successful response?
When you download a file, it is temporarily stored in memory, which means you need at least 1 GB of heap for each concurrent JMeter thread (virtual user) in order to be able to retrieve the file.
Edit the jmeter.bat file and raise the upper heap limit (-Xmx) to a higher value. For 1 virtual user, 2 gigabytes will be sufficient, for example:
set HEAP=-Xms1g -Xmx2g -XX:MaxMetaspaceSize=256m
A JMeter restart will be required to pick up the change.
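If you later need to test several parallel downloads, scale the heap with the number of threads. Going by the rough 1 GB-per-thread figure above, 5 virtual users would need something on the order of the following (the exact value is an estimate, not a JMeter requirement):
set HEAP=-Xms1g -Xmx6g -XX:MaxMetaspaceSize=256m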
References:
How to control Java heap size (memory) allocation (xmx, xms)
9 Easy Solutions for a JMeter Load Test “Out of Memory” Failure
Ideally, JMeter's heap consumption should stay between 30% and 80% of the JVM heap. The actual number varies with the nature of your test, so monitor heap usage with a tool such as JVisualVM to make sure JMeter doesn't spend too much time doing garbage collection.
In addition to Dmitri's suggestions, I would add that this method is OK for one or maybe two downloads, but it does not scale at all if you want to test ten or more parallel downloads.
IMHO, a better solution would be to check "Save response as MD5 hash?" on the HTTP Request sampler.
If this option is selected, the response body is not stored in the sample result; "the MD5 hash of the data is calculated and stored instead" (from the manual).
This is intended for testing large amounts of data.
The advantage is that the response data is never held in memory.

Neo4j in Docker - Max Heap Size Causes Hard crash 137

I'm trying to spin up a Neo4j 3.1 instance in a Docker container (through Docker-Compose), running on OSX (El Capitan). All is well, unless I try to increase the max-heap space available to Neo above the default of 512MB.
According to the docs, this can be achieved by adding the environment variable NEO4J_dbms_memory_heap_maxSize, which then causes the server wrapper script to update the neo4j.conf file accordingly. I've checked and it is being updated as one would expect.
The problem is, when I run docker-compose up to spin up the container, the Neo4j instance crashes out with a 137 status code. A little research tells me this is a Linux hard kill (137 = 128 + SIGKILL), typically the kernel's out-of-memory killer enforcing a memory limit.
$ docker-compose up
Starting elasticsearch
Recreating neo4j31
Attaching to elasticsearch, neo4j31
neo4j31 | Starting Neo4j.
neo4j31 exited with code 137
My questions:
Is this due to a Docker or an OSX limitation?
Is there a way I can modify these limits? If I drop the requested limit to 1GB, it will spin up, but still crashes once I run my heavy query (which is what caused the need for increased Heap space anyway).
The query that I'm running is a large-scale update across a lot of nodes (>150k) containing full-text attributes, so that they can be synchronised to ElasticSearch using the plug-in. Is there a way I can get Neo to step through doing, say, 500 nodes at a time, using only Cypher? (I'd rather avoid writing a script if I can; it feels a little dirty for this.)
My docker-compose.yml is as follows:
---
version: '2'
services:
  # ---<SNIP>
  neo4j:
    image: neo4j:3.1
    container_name: neo4j31
    volumes:
      - ./docker/neo4j/conf:/var/lib/neo4j/conf
      - ./docker/neo4j/mnt:/var/lib/neo4j/import
      - ./docker/neo4j/plugins:/plugins
      - ./docker/neo4j/data:/data
      - ./docker/neo4j/logs:/var/lib/neo4j/logs
    ports:
      - "7474:7474"
      - "7687:7687"
    environment:
      - NEO4J_dbms_memory_heap_maxSize=4G
  # ---<SNIP>
Is this due to a Docker or an OSX limitation?
No. Increase the amount of RAM available to Docker to resolve this issue.
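As a rough sketch of what that means in practice (assuming Docker for Mac, whose default VM memory allocation is small): raise the memory slider in the Docker preferences to comfortably above the 4G heap, and optionally give the container an explicit limit in the version '2' compose file so it is not killed by the out-of-memory killer. The 5g value below is purely illustrative; it has to cover the heap plus Neo4j's page cache and other overhead:
  neo4j:
    image: neo4j:3.1
    mem_limit: 5g
    environment:
      - NEO4J_dbms_memory_heap_maxSize=4G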
Is there a way I can modify these limits? If I drop the requested limit to 1GB, it will spin up, but still crashes once I run my heavy query (which is what caused the need for increased heap space anyway).
The query that I'm running is a large-scale update across a lot of nodes (>150k) containing full-text attributes, so that they can be synchronised to ElasticSearch using the plug-in. Is there a way I can get Neo to step through doing, say, 500 nodes at a time, using only Cypher (I'd rather avoid writing a script if I can, feels a little dirty for this).
N/A. This is a Neo4j-specific question; it might be better to separate it from the Docker questions listed above.
3. The query that I'm running is a large-scale update across a lot of nodes (>150k) containing full-text attributes, so that they can be synchronised to ElasticSearch using the plug-in. Is there a way I can get Neo to step through doing, say, 500 nodes at a time, using only Cypher (I'd rather avoid writing a script if I can, feels a little dirty for this).
You can do this with the help of the APOC plugin for Neo4j, more specifically apoc.periodic.iterate or apoc.periodic.commit.
If you use apoc.periodic.commit, your first MATCH should be specific: as in the example, mark which nodes you have already synced, because otherwise the procedure can end up looping over the same nodes forever:
call apoc.periodic.commit("
match (user:User) WHERE user.synced = false
with user limit {limit}
MERGE (city:City {name:user.city})
MERGE (user)-[:LIVES_IN]->(city)
SET user.synced =true
RETURN count(*)
",{limit:10000})
If you use apoc.periodic.iterate, you can run it in parallel mode:
CALL apoc.periodic.iterate(
"MATCH (o:Order) WHERE o.date > '2016-10-13' RETURN o",
"with {o} as o MATCH (o)-[:HAS_ITEM]->(i) WITH o, sum(i.value) as value
CALL apoc.es.post(host-or-port,index-or-null,type-or-null,
query-or-null,payload-or-null) yield value return *", {batchSize:100, parallel:true})
Note that there is no need for a second MATCH clause, and that apoc.es.post is an APOC procedure that can send POST requests to Elasticsearch.
See the documentation for more info.

Hadoop Error: Java heap space

After running about a percent or so of the job, I get an error that says "Error: Java heap space" and then something along the lines of "Application container killed".
I am literally running an empty map and reduce job. However, the job does take an input that is roughly 100 GB. For whatever reason, I run out of heap space, even though the job does nothing.
I am using the default configuration on a single machine, running Hadoop 2.2 on Ubuntu. The machine has 4 GB of RAM.
Thanks!
Note: got it figured out.
It turns out I was setting the configuration to use a different terminating token/string. The format of the data had changed, so that token/string no longer existed, and Hadoop was trying to read all 100 GB into RAM for a single key.
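For reference, a minimal sketch of the kind of setting involved, assuming the job reads its input with TextInputFormat; the delimiter value, class name, and job name below are illustrative, not taken from the original job:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class EmptyJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // If this delimiter never occurs in the data, the whole 100 GB input
        // becomes a single record that must fit in the heap.
        conf.set("textinputformat.record.delimiter", "\n"); // must match the data's real record terminator
        Job job = Job.getInstance(conf, "empty-mapreduce-job");
        // ... set input/output paths and the (empty) mapper/reducer as in the original job
    }
}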

Unable to execute Sonar: Caused by: Java heap space

I am using Sonar Runner 2.2 and set SONAR_RUNNER_OPTS=-Xmx8000m, but I am getting the following error:
Final Memory: 17M/5389M
INFO: ------------------------------------------------------------------------
ERROR: Error during Sonar runner execution
ERROR: Unable to execute Sonar
ERROR: Caused by: Java heap space
How can this be?
I had the same problem and found a very different solution, perhaps because I don't believe any of the previous answers/comments. With 10 million lines of code (that's more code than is in an F-16 fighter jet), if you have 100 characters per line (a crazy size), you could load the whole code base into 1 GB of memory. Simple math. Why would 8 GB of memory fail?
Answer: because the community Sonar C++ scanner seems to have a bug where it picks up ANY file with the letter 'c' in its extension. That includes .doc, .docx, .ipch, etc. It runs out of memory because it's trying to read some file that it thinks is 300 MB of pure code when really it should be ignored.
Solution: Find the extensions used by all of the files in your project (see here).
Then add these other extensions as exclusions in your sonar.properties file:
sonar.exclusions=**/*.doc,**/*.docx,**/*.ipch
Then set your memory limits back to regular amounts:
%JAVA_EXEC% -Xmx1024m -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=128m %SONAR_RUNNER_OPTS% ...
Allowing the heap to grow up to 8000m does not mean you will always have enough physical memory to get there, since other processes running on your operating system also consume memory. For instance, if you have "only" 8 GB of RAM on your machine, it's likely the heap will never be able to reach the maximum you've set.
BTW, I don't know what you're trying to analyse, but I've never seen anyone require so much memory to analyse a project.
I faced the same problem while running test cases. With the help of VisualVM, I analysed the minimum and maximum memory allocated to PermGen during test execution and found that only 80 MB was being allocated to PermGen. The same can be managed through the <properties> section of pom.xml as follows:
<properties>
  <argLine>-XX:PermSize=256m -XX:MaxPermSize=256m</argLine>
</properties>
This <argLine> tag can be used either in <maven-surefire-plugin> or in <properties>. The advantage of using it in the <properties> section is that the same configuration can be used by both the test cases and Sonar. Please find a reference here.

Page error 0xc0000006 with VC++

I have a VS 2005 application using C++. It basically imports a large XML file of around 9 GB into the application. After running for more than 18 hours it gave the exception 0xc0000006 (in-page error). The virtual memory consumed was 2.6 GB (I have set the /3GB flag).
Does anyone have a clue as to what caused this error and what the solution could be?
Instead of loading the whole file into memory, you can use a SAX parser to load only part of the file into memory at a time.
9 GB seems overly large to read in; I would say that even 3 GB is too much in one go.
Is your OS 64bit?
What is the maximum pagefile size set to?
How much RAM do you have?
Were you running this in debug or release mode?
I would suggest that you try reading the XML in smaller chunks.
Why are you trying to read in such a large file in one go?
I would imagine that your application took so long to run before failing because it started to copy the file into virtual memory, which is basically a large file on the hard disk; the OS ends up reading the XML from disk and writing it back to a different area of the disk.
Edit (added text below):
Having had a quick peek at the Expat XML parser, it does look as if you're running into problems with stack or event handling; most likely you are adding too much to the stack.
Do you really need 3 GB of data on the stack? At a guess, I would say that you are trying to process an XML database file, but I can't imagine that you have a table row that is so large.
I think that really you should use it to search for the key areas and discard what is not wanted.
I know nothing about the Expat XML parser other than what I have just read, but I would suggest that you are not using it in the most efficient manner.
