SOLR 4.1 Out Of Memory error After commit of a few thousand Solr Docs - tomcat7

we are testing solr 4.1 running inside tomcat 7 and java 7 with following options
JAVA_OPTS="-Xms256m -Xmx2048m -XX:MaxPermSize=1024m -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+ParallelRefProcEnabled -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/ubuntu/OOM_HeapDump"
our source code looks like following:
/**** START *****/
int noOfSolrDocumentsInBatch = 0;
for(int i=0 ; i<5000 ; i++) {
SolrInputDocument solrInputDocument = getNextSolrInputDocument();
server.add(solrInputDocument);
noOfSolrDocumentsInBatch += 1;
if(noOfSolrDocumentsInBatch == 10) {
server.commit();
noOfSolrDocumentsInBatch = 0;
}
}
/**** END *****/
the method "getNextSolrInputDocument()" generates a solr document with 100 fields (average). Around 50 of the fields are of "text_general" type.
Some of the "test_general" fields consist of approx 1000 words rest consists of few words. Ouf of total fields there are around 35-40 multivalued fields (not of type "text_general").
We are indexing all the fields but storing only 8 fields. Out of these 8 fields two are string type, five are long and one is boolean. So our index size is only 394 MB. But the RAM occupied at time of OOM is around 2.5 GB. Why the memory is so high even though the index size is small?
What is being stored in the memory? Our understanding is that after every commit documents are flushed to the disk.So nothing should remain in RAM after commit.
We are using the following settings:
server.commit() set waitForSearcher=true and waitForFlush=true
solrConfig.xml has following properties set:
directoryFactory = solr.MMapDirectoryFactory
maxWarmingSearchers = 1
text_general data type is being used as supplied in the schema.xml with the solr setup.
maxIndexingThreads = 8(default)
<autoCommit>
<maxTime>15000</maxTime>
<openSearcher>false</openSearcher>
</autoCommit>
We get Java heap Out Of Memory Error after commiting around 3990 solr documents.Some of the snapshots of memory dump from profiler are uploaded at following links.
http://s9.postimage.org/w7589t9e7/memorydump1.png
http://s7.postimage.org/p3abs6nuj/memorydump2.png
can somebody please suggest what should we do to minimize/optimize the memory consumption in our case with the reasons? also suggest what should be optimal values and reason for following parameters of solrConfig.xml
- useColdSearcher - true/false?
- maxwarmingsearchers- number
- spellcheck-on/off?
- omitNorms=true/false?
- omitTermFreqAndPositions?
- mergefactor? we are using default value 10
- java garbage collection tuning parameters ?

Related

Kafka Stream BoundedMemoryRocksDBConfig

I'm trying to understand how the internals of Kafka Streams works with respects to cache and RocksDB (state store).
KTable<Windowed<EligibilityKey>, String> kTable = kStreamMapValues
.groupByKey(Grouped.with(keySpecificAvroSerde, Serdes.String())).windowedBy(timeWindows)
.reduce((a, b) -> b, materialized.withLoggingDisabled().withRetention(Duration.ofSeconds(retention)))
.suppress(Suppressed.untilTimeLimit(Duration.ofSeconds(timeToWaitForMoreEvents),
Suppressed.BufferConfig.unbounded().withLoggingDisabled()));
With the above portion of my topology, I'm consuming from a Kafka topic with 300 partitions. The application is deployed on OpenShift with a memory allocation of 4GB. I noticed the memory of the application constantly increasing until eventually an OOMKILLED occurs. After some research I've read that a custom RocksDB config was something I should implement, because the default size is too big for my application. Records first enter a cache (configured by CACHE_MAX_BYTES_BUFFERING_CONFIG and COMMIT_INTERVAL_MS_CONFIG) and then enter a state store.
public class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {
private static org.rocksdb.Cache cache = new org.rocksdb.LRUCache(1 * 1024 * 1024L, -1, false, 0);
private static org.rocksdb.WriteBufferManager writeBufferManager = new org.rocksdb.WriteBufferManager(1 * 1024 * 1024L, cache);
#Override
public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();
// These three options in combination will limit the memory used by RocksDB to the size passed to the block cache (TOTAL_OFF_HEAP_MEMORY)
tableConfig.setBlockCache(cache);
tableConfig.setCacheIndexAndFilterBlocks(true);
options.setWriteBufferManager(writeBufferManager);
// These options are recommended to be set when bounding the total memory
tableConfig.setCacheIndexAndFilterBlocksWithHighPriority(true);
tableConfig.setPinTopLevelIndexAndFilter(true);
tableConfig.setBlockSize(2048L);
options.setMaxWriteBufferNumber(2);
options.setWriteBufferSize(1 * 1024 * 1024L);
options.setTableFormatConfig(tableConfig);
}
#Override
public void close(final String storeName, final Options options) {
// Cache and WriteBufferManager should not be closed here, as the same objects are shared by every store instance.
}
}
With each time window segment, three segments are created by default. If I'm consuming from 300 partitions, since 3 time window segments will be created for each, 900 instances of RocksDB are created. Is my understanding correct that the following is true?
Memory allocated in OpenShift / RocksDB instances => 4096MB / 900 => 4.55 MB
(WriteBufferSize * MaxWriteBufferNumber) + BlockCache + WriteBufferManager => (1MB * 2) + 1MB + 1MB => 4MB
Is the BoundedMemoryRocksDBConfig.java for each instance of RocksDB, or for all?
If you consume from a topic with 300 partitions and you use a segmented state store, i.e., you use a time window in the DSL, you will end up with 900 RocksDB instances. If you only use one Kafka Streams client, i.e., you do not scale out, all 900 RocksDB instances will end up on the same computing node.
The BoundedMemoryRocksDBConfig limits the memory RocksDB uses per Kafka Streams client. That means, if you only use one Kafka Streams client, BoundedMemoryRocksDBConfig limits the memory for all 900 instances.
Is my understanding correct that the following is true?
Memory allocated in OpenShift / RocksDB instances => 4096MB / 900 => 4.55 MB
(WriteBufferSize * MaxWriteBufferNumber) + BlockCache + WriteBufferManager => (1MB * 2) + 1MB + 1MB => 4MB
No, that is not correct.
If you pass the Cache to WriteBufferManager, the size needed for memtables are also counted against the cache (see footnote 1 in the docs of the BoundedMemoryRocksDBConfig and the RocksDB docs). So, the size that you pass to the cache is the limit for memtables and block cache. Since you pass the cache and the write buffer manager to all of your instances on the same computing node, all 900 instances are bounded by the size you pass to the cache. For example, if you specify a size of 4 GB the total memory used by all 900 instances (assuming one Kafka Streams client) is bounded to 4 GB.
Be aware that the size passed to the cache is not a strict limit. Although the boolean parameter in the constructor of the cache gives you the option to enforce a strict limit, the enforcement does not work if the write buffer memory is also counted against the cache due to a bug in the RocksDB version Kafka Streams uses.
With Kafka Streams 2.7.0, you will have the possibility to monitor RocksDB memory consumption with metrics that are exposed via JMX. Have a look at KIP-607 for more details.

How to partition Gobblin output to 30 min partitions?

We are planning to migrate from Camus to Gobblin. In Camus we were using below mentioned configs:
etl.partitioner.class=com.linkedin.camus.etl.kafka.partitioner.TimeBasedPartitioner
etl.destination.path.topic.sub.dirformat=YYYY/MM/dd/HH/mm
etl.output.file.time.partition.mins=30
But in Gobblin we have configs as:
writer.file.path.type=tablename
writer.partition.level=minute (other options: daily,hourly..)
writer.partition.pattern=YYYY/MM/dd/HH/mm
This creates directories on a minute level, but we need 30 min partitions.
Couldn't find much help in the official doc: http://gobblin.readthedocs.io/en/latest/miscellaneous/Camus-to-Gobblin-Migration/
Are there any other configs which can be used to achieve this?
Got a workaround by implementing a partitionerMethod inside custom WriterPartitioner :
While fetching the record level timestamp in the partitioner we just need to send the processed timestamp millis using below mentioned method.
public static long getPartition(long timeGranularityMs, long timestamp, DateTimeZone outputDateTimeZone) {
long adjustedTimeStamp = outputDateTimeZone.convertUTCToLocal(timestamp);
long partitionedTime = (adjustedTimeStamp / timeGranularityMs) * timeGranularityMs;
return outputDateTimeZone.convertLocalToUTC(partitionedTime, false);
}
Now partitions are getting generated at required time granularity.

Timeout and out of memory errors reading large table using jdbc drivers

I am attempting to read a large table into a spark dataframe from an Oracle database using spark's native read.jdbc in scala. I have tested this with small and medium sized tables (up to 11M rows) and it works just fine. However, when attempting to bring in a larger table (~70M rows) I keep getting errors.
Sample code to show how I am reading this in:
val df = sparkSession.read.jdbc(
url = jdbcUrl,
table = "( SELECT * FROM keyspace.table WHERE EXTRACT(year FROM date_column) BETWEEN 2012 AND 2016)"
columnName = "id_column", // numeric column, 40% NULL
lowerBound = 1L,
upperBound = 100000L,
numPartitions = 60, // same as number of cores
connectionProperties = connectionProperties) // this contains login & password
I am attempting to parallelise the operation, as I am using a cluster with 60 cores and 6 x 32GB RAM dedicated to this app. However, I still keep getting errors relating to timeouts and out of memory issues, such as:
17/08/16 14:01:18 WARN Executor: Issue communicating with driver in heartbeater
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10 seconds]. This timeout is controlled by spark.executor.heartbeatInterval
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
....
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10 seconds}
...
17/08/16 14:17:14 ERROR RetryingBlockFetcher: Failed to fetch block rdd_2_89, and will not retry (0 retries)
org.apache.spark.network.client.ChunkFetchFailureException: Failure while fetching StreamChunkId{streamId=398908024000, chunkIndex=0}: java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:869)
at org.apache.spark.storage.DiskStore$$anonfun$getBytes$4.apply(DiskStore.scala:125)
...
17/08/16 14:17:14 WARN BlockManager: Failed to fetch block after 1 fetch failures. Most recent failure cause:
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
There should be more than enough RAM across the cluster for a table of this size (I've read in local tables 10x bigger), so I have a feeling that for some reason the data read may not be happening in parallel? Looking at the timeline in the spark UI, I can see that one executor hangs and is 'computing' for very long periods of time. Now, the partitioning column has a lot of NULL values in it (about 40%), but it is the only numeric column (other's are dates and strings) - could this make a difference? Is there another way to parallelise a jdbc read?
the partitioning column has a lot of NULL values in it (about 40%), but it is the only numeric column (other's are dates and strings) - could this make a difference?
It makes a huge difference. All values with NULL will go to the last partition:
val whereClause =
if (uBound == null) {
lBound
} else if (lBound == null) {
s"$uBound or $column is null"
} else {
s"$lBound AND $uBound"
}
Is there another way to parallelise a jdbc read?
You can use predicates with other columns than numeric ones. You could for example use ROWID pseudocoulmn in table and use a series of predicates based on prefix.

What does virtual core in YARN vcore mean?

Yarn is using the concept of virtual core to manage CPU resources. I would ask what's the benefit to use virtual core, is there some reason here that YARN uses vcore?
Here is what the documentation states (emphasis mine)
A node's capacity should be configured with virtual cores equal to its
number of physical cores. A container should be requested with the
number of cores it can saturate, i.e. the average number of threads it
expects to have runnable at a time.
Unless the CPU core is hyper-threaded it can run only one thread at a time (in case of hyper threaded OS actually sees 2 cores for one physical core and can run two threads - of course it's a bit of cheating and no-where as efficient as having actual physical core). Essentially what it means to end user is that a core can run a single thread so theoretically if I want parallelism using java threads then a reasonably good approximation is number of threads equal to number of core. So if your container process ( which is a JVM)
will require 2 threads then it's better to map it to 2 vcore - that what the last line means. And as total capacity of node the vcore should be equal to number of physical cores.
The most important thing to remember is still that it's actually the OS which will schedule the threads to be executed in different cores as it happens in any other application and
YARN in itself does not have control on it except the fact that what is the best possible approximation for how many thread to allocate for each container. And that's why it is important to take into consideration other applications running on OS, CPU cycles used by kernel etc., as all of cores will not be available to YARN application all the time.
EDIT: Further research
Yarn does not influence hard limits on CPU but Going through the code I can see how it tries to influence the CPU scheduling or cpu rate. Technically Yarn can launch different container processes - java, python , custom shell command etc. The responsibility of launching containers in Yarn belongs to the ContainerExecutor component of Node manager and I can see code for launching the container etc., along with some hints (depending on platform). For example in case of DefaultContainerExecutor ( which extends ContainerExecutor) - for windows it uses "-c" parameter for cpu restriction and on linux it uses process niceness to influence it. There is another implementation LinuxContainerExecutor (or better still CgroupsLCEResourcesHandler as former does not force the usage of cgroups) which tries to use Linux cgroups to limit the Yarn CPU resources on that node. More details can be found here.
ContainerExecutor {
.......
.......
protected String[] getRunCommand(String command, String groupId,
String userName, Path pidFile, Configuration conf, Resource resource) {
boolean containerSchedPriorityIsSet = false;
int containerSchedPriorityAdjustment =
YarnConfiguration.DEFAULT_NM_CONTAINER_EXECUTOR_SCHED_PRIORITY;
if (conf.get(YarnConfiguration.NM_CONTAINER_EXECUTOR_SCHED_PRIORITY) !=
null) {
containerSchedPriorityIsSet = true;
containerSchedPriorityAdjustment = conf
.getInt(YarnConfiguration.NM_CONTAINER_EXECUTOR_SCHED_PRIORITY,
YarnConfiguration.DEFAULT_NM_CONTAINER_EXECUTOR_SCHED_PRIORITY);
}
if (Shell.WINDOWS) {
int cpuRate = -1;
int memory = -1;
if (resource != null) {
if (conf
.getBoolean(
YarnConfiguration.NM_WINDOWS_CONTAINER_MEMORY_LIMIT_ENABLED,
YarnConfiguration.DEFAULT_NM_WINDOWS_CONTAINER_MEMORY_LIMIT_ENABLED)) {
memory = resource.getMemory();
}
if (conf.getBoolean(
YarnConfiguration.NM_WINDOWS_CONTAINER_CPU_LIMIT_ENABLED,
YarnConfiguration.DEFAULT_NM_WINDOWS_CONTAINER_CPU_LIMIT_ENABLED)) {
int containerVCores = resource.getVirtualCores();
int nodeVCores = conf.getInt(YarnConfiguration.NM_VCORES,
YarnConfiguration.DEFAULT_NM_VCORES);
// cap overall usage to the number of cores allocated to YARN
int nodeCpuPercentage = Math
.min(
conf.getInt(
YarnConfiguration.NM_RESOURCE_PERCENTAGE_PHYSICAL_CPU_LIMIT,
YarnConfiguration.DEFAULT_NM_RESOURCE_PERCENTAGE_PHYSICAL_CPU_LIMIT),
100);
nodeCpuPercentage = Math.max(0, nodeCpuPercentage);
if (nodeCpuPercentage == 0) {
String message = "Illegal value for "
+ YarnConfiguration.NM_RESOURCE_PERCENTAGE_PHYSICAL_CPU_LIMIT
+ ". Value cannot be less than or equal to 0.";
throw new IllegalArgumentException(message);
}
float yarnVCores = (nodeCpuPercentage * nodeVCores) / 100.0f;
// CPU should be set to a percentage * 100, e.g. 20% cpu rate limit
// should be set as 20 * 100. The following setting is equal to:
// 100 * (100 * (vcores / Total # of cores allocated to YARN))
cpuRate = Math.min(10000,
(int) ((containerVCores * 10000) / yarnVCores));
}
}
return new String[] { Shell.WINUTILS, "task", "create", "-m",
String.valueOf(memory), "-c", String.valueOf(cpuRate), groupId,
"cmd /c " + command };
} else {
List<String> retCommand = new ArrayList<String>();
if (containerSchedPriorityIsSet) {
retCommand.addAll(Arrays.asList("nice", "-n",
Integer.toString(containerSchedPriorityAdjustment)));
}
retCommand.addAll(Arrays.asList("bash", command));
return retCommand.toArray(new String[retCommand.size()]);
}
}
}
For windows (it utilizes winutils.exe) , it uses cpu rate
For Linux it uses niceness as a parameter to control the CPU priority
"Virtual cores" are merely an abstraction of actual cores. This abstraction or "lie" (as i like to call it), allows YARN (and others) to dynamically spin threads (parallel process) based on availability. Take for example running map reduce on an "elastic" cluster with a processing limit constrained only by your wallet... The cloud baby... The. Cloud.
you can read more here

Apache Spark 1.2.1 standalone cluster giving java heap space error

I need information about, how to figure out how much heap space(memory) would be needed to operate on x mb(suppose x means 600 mb) in spark standalone cluster.
Scenario:
I have standalone cluster with 14gb memory and 8 cores. I want to operate(Reading data from files and writing it to Cassandra) on 600 MB of data.
For this task I have SparkConfig as:
.set("spark.cassandra.output.throughput_mb_per_sec","800")
.set("spark.storage.memoryFraction", "0.3")
And --executor-memory=5g --total-executor-cores 6 --driver-memory 6g at the time of submitting task.
In spite of above configuration,I getting java heap space error while writing data to Cassandra.
Below is the java code:
public static void main(String[] args) throws Exception {
String fileName = args[0];
Long now = new Date().getTime();
SparkConf conf = new SparkConf(true)
.setAppName("JavaSparkSQL_" +now)
.set("spark.cassandra.connection.host", "192.168.1.65")
.set("spark.cassandra.connection.native.port", "9042")
.set("spark.cassandra.connection.rpc.port", "9160")
.set("spark.cassandra.output.throughput_mb_per_sec","800")
.set("spark.storage.memoryFraction", "0.3");
JavaSparkContext ctx = new JavaSparkContext(conf);
JavaRDD<String> input =ctx.textFile
("hdfs://abc.xyz.net:9000/figmd/resources/" + fileName, 12);
JavaRDD<PlanOfCare> result = input.mapPartitions(new
ParseJson()).filter(new PickInputData());
System.out.print("Count --> "+result.count());
System.out.println(StringUtils.join(result.collect(), ","));
javaFunctions(result).writerBuilder("ks","pt_planofcarelarge",
mapToRow(PlanOfCare.class)).saveToCassandra();
}
What configuration I am suppose to do?Am I missing anything?
Thanks in advance.
JavaRDD collect method return an array that contains all of the elements in this RDD.
So in your case, it will creates an array with 340000 elements which will result in a Java Heap Error, you may want to take a small sample of your data and collect it or you may want to save it directly to your disk.
For more information about JavaRDD, you can always refer to the official documentation.

Resources