mod_tile cache size limit - caching

I have a TMS server with Apache mod_tile, Mapnik and renderd, and 400 GB of free space in my cache folder.
I want to pre-render 11 or 12 zoom levels.
I tried the command "render_list -a -z 0 -Z 10 -v -n 4".
But my cache folder doesn't grow beyond 2.6 GB, and render_list says it finished with no error message.
Even when I use my map (OpenLayers), missing tiles are rendered on the fly but not stored in the cache. Before I pre-rendered my tiles, they were stored in the cache.
I searched unsuccessfully, so I'm asking here: is there any option in mod_tile to manage the cache size and the cache replacement strategy?
Thanks for your answers.
Update: strangely, when I request tiles from level 11, they are stored in the cache and the cache grows. So is there a size limit per level?
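For completeness, this is the kind of command I plan to run for the remaining levels; the map name, tile directory and renderd socket path below are assumptions based on a default renderd.conf, so adjust them to your setup (-f forces re-rendering even if mod_tile considers a tile up to date):

# hypothetical map name and paths; render levels 11-12 with 4 threads
render_list -a -f -z 11 -Z 12 -n 4 -v -m default -t /var/lib/mod_tile/ -s /var/run/renderd/renderd.sock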

Related

janusgraph-0.5.3 memory configuration

I am using janusgraph-0.5.3 (with Cassandra) and I want to know how to configure memory allocation so that the default memory allocated to the Gremlin Server process is increased to 2 GB.
I am trying to bulk-load data through my gremlin-server, but it is failing with an error. I would like to know if there is a way to check and increase the default memory allocation.
I need help locating the .yaml configuration files as well as the values in those files that would need to change.
Thanks
I changed the gremlin-server.sh file to allocate additional memory:
# Set Java options
if [ "$JAVA_OPTIONS" = "" ] ; then
    echo "Setting xmx and xss"
    JAVA_OPTIONS="-Xms1024m -Xmx3074m -Xss2048k -javaagent:$JANUSGRAPH_LIB/jamm-0.3.0.jar -Dgremlin.io.kryoShimService=org.janusgraph.hadoop.serialize.JanusGraphKryoShimService"
fi
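To check that the new flags actually took effect after restarting Gremlin Server, something like this should work (jps ships with the JDK and prints the arguments passed to each running JVM):

# look for the Gremlin Server JVM and its -Xms/-Xmx/-Xss settings
jps -v | grep -i gremlin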

Kafka Streams app fails to start with “what(): Resource temporarily unavailable” in Cloud Foundry

Dear Stackoverflowians,
I'm having a problem with a Spring Cloud Stream app using a Kafka Streams binder. The issue occurs only in our own Pivotal Cloud Foundry (CF) environment. I have kind of hit a wall at this point, so I turn to you and your wisdom!
When the application starts up I see the following error:
<snip>
2019-08-07T15:17:58.36-0700 [APP/PROC/WEB/0]OUT current active tasks: [0_3, 1_3, 2_3, 3_3, 4_3, 0_7, 1_7, 5_3, 2_7, 3_7, 4_7, 0_11, 1_11, 5_7, 2_11, 3_11, 4_11, 0_15, 1_15, 5_11, 2_15, 3_15, 4_15, 0_19, 1_19, 5_15, 2_19, 3_19, 4_19, 0_23, 1_23, 5_19, 2_23, 3_23, 4_23, 5_23]
2019-08-07T15:17:58.36-0700 [APP/PROC/WEB/0]OUT current standby tasks: []
2019-08-07T15:17:58.36-0700 [APP/PROC/WEB/0]OUT previous active tasks: []
2019-08-07T15:18:02.67-0700 [API/0] OUT Updated app with guid 2db4a719-53ee-4d4a-9573-fe958fae1b4f ({"state"=>"STOPPED"})
2019-08-07T15:18:02.64-0700 [APP/PROC/WEB/0]ERR terminate called after throwing an instance of 'std::system_error'
2019-08-07T15:18:02.64-0700 [APP/PROC/WEB/0]ERR what(): Resource temporarily unavailable
2019-08-07T15:18:02.67-0700 [CELL/0] OUT Stopping instance 516eca4f-ea73-4684-7e48-e43c
2019-08-07T15:18:02.67-0700 [CELL/SSHD/0]OUT Exit status 0
2019-08-07T15:18:02.71-0700 [APP/PROC/WEB/0]OUT Exit status 134
2019-08-07T15:18:02.71-0700 [CELL/0] OUT Destroying container
2019-08-07T15:18:03.62-0700 [CELL/0] OUT Successfully destroyed container
The key line here is
what(): Resource temporarily unavailable
The error is related to the number of partitions. If I set the partition count to 12 or fewer, things work. If I double it, the process fails to start with this error.
This doesn't happen on my local Windows dev machine. It also doesn't happen in my local Docker environment when I wrap the app in a Docker image and run it. But whether I take that same image and push it to CF or push the app as a plain Java app, I get this error.
Here is some information about the Kafka Streams app. We have an input topic with a number of partitions. The topic is the output of a Debezium connector; basically it's a change log of a bunch of database tables. The topology is not super complex, but it's not trivial either. Its job is to aggregate the table update information back into our aggregates. We end up with 17 local stores in the topology. I have a strong suspicion this issue has something to do with RocksDB and the resources available to the CF container the app is in, but I haven't the faintest idea what the resource is that's "temporarily unavailable".
As I mentioned, I tried deploying it as a Docker container with various JDK 8 JVMs and different base images (CentOS, Debian), I tried various CF Java buildpacks, and I tried limiting the Java heap in relation to the max container memory size (thinking that maybe it has something to do with native memory allocation), all to no avail.
I've also asked our ops folks to raise some limits on the containers, and the open files limit was changed from the initial 16k to 500k+. I saw some file-lock related errors like the one below, but they went away after this change.
2019-08-01T15:46:23.69-0700 [APP/PROC/WEB/0]ERR Caused by: org.rocksdb.RocksDBException: lock : /home/vcap/tmp/kafka-streams/cms-cdc/0_7/rocksdb/input/LOCK: No locks available
2019-08-01T15:46:23.69-0700 [APP/PROC/WEB/0]ERR at org.rocksdb.RocksDB.open(Native Method)
However, the error what(): Resource temporarily unavailable persists with a higher number of partitions.
ulimit -a on the container looks like this:
~$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 1007531
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 524288
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
I really do need to understand what the root of this error is. It's hard to plan in this case without knowing what limit we're hitting.
Hope to hear your ideas. Thanks!
Edit:
Is there maybe a way to get more verbose error messages from the RocksDB library, or a way to build it so it outputs more info?
Edit 2
I have also tried to customize the RocksDB memory settings using org.apache.kafka.streams.state.RocksDBConfigSetter.
The defaults are in org.apache.kafka.streams.state.internals.RocksDBStore#openDB(org.apache.kafka.streams.processor.ProcessorContext).
First I made sure the Java heap settings were well below the container process size limit, and left nothing to the memory calculator by setting:
JAVA_OPTS: -XX:MaxDirectMemorySize=100m -XX:ReservedCodeCacheSize=240m -XX:MaxMetaspaceSize=145m -Xmx1000m
With this I tried:
1. Lowering the write buffer size:
org.rocksdb.Options#setWriteBufferSize(long)
org.rocksdb.Options#setMaxWriteBufferNumber(int)
2. Setting max_open_files to half the limit for the container (the total for all db instances):
org.rocksdb.Options#setMaxOpenFiles(int)
3. Turning off the block cache altogether:
org.rocksdb.BlockBasedTableConfig#setNoBlockCache
4. Setting cache_index_and_filter_blocks = true after re-enabling the block cache:
https://github.com/facebook/rocksdb/wiki/Block-Cache#caching-index-and-filter-blocks
All to no avail. The above issue still happens when I set a higher number of partitions (24) on the input topic. Now that I have a RocksDBConfigSetter with logging in it, I can see that the error happens exactly when RocksDB is being configured.
Edit 3
I still haven't gotten to the bottom of this. I have asked the question on https://www.facebook.com/groups/rocksdb.dev and was advised to trace system calls with strace or similar, but I was not able to obtain the required permissions to do that in our environment.
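For anyone who does have the permissions, this is roughly the kind of tracing that was suggested; the PID is a placeholder and the syscall filter is only my guess at what would be relevant:

# follow all threads of the JVM and log syscalls that can fail with EAGAIN
# ("Resource temporarily unavailable"); <pid> is the Java process id in the container
strace -f -e trace=clone,mmap,openat,flock -p <pid> -o /tmp/streams-strace.log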
It has eaten up so much time that I had to settle for a workaround for now. What I ended up doing was refactoring the topology to
1) minimize the number of materialized KTables (and the number of resulting RocksDB instances), and
2) break up the topology among multiple processes.
This allowed me to turn topology parts on and off in separate deployments with Spring profiles, and it has given me some limited way forward for now.

What part of the RAM is used by the system file cache in Windows?

According to general notions about the page cache and this answer, the system file cache essentially uses all the RAM not used by any other process. This is, as far as I know, the case for the page cache in Linux.
Since the notion of "free RAM" is a bit blurry in Windows, my question is: what part of the RAM does the system file cache use? For example, is it the same as "Available" RAM in the Task Manager?
Yes, the RAM used by the file cache is essentially the RAM displayed as available in the Task Manager. But not exactly. I'll go into detail and explain how to measure it more precisely.
The file cache is not a process listed in the Task Manager's process list. However, since Vista, its memory is managed like that of a process. So I'll explain a bit about memory management for processes, the file cache being a special case.
In Windows, the RAM used by a process has essentially two states: "Active" and "Standby":
"Active" RAM is displayed in the Task Manager and resource monitor as "In Use". It is also the RAM displayed for each process in the Task Manager.
"Standby" RAM is visible in the Resource monitor globally and for each process with RAMMap.
"Standby" + "Free" RAM is what is called "Available" in the task manager. "Free" RAM tends to be near 0 in Windows but you can meaningfully consider Standby RAM is free as well.
Standby RAM is considered "not used for a while by the process". It is the part of the RAM that will be used to give new memory to processes that need it. But it still belongs to the process and could be used directly if the owning process suddenly accesses it (which the system considers unlikely).
Thus the file cache has "Active" RAM and "Standby" RAM. The "Active" RAM is, roughly, the cache for data accessed recently; the "Standby" RAM is the cache for data accessed a while ago. The "Active" RAM of the file cache is usually relatively small. The Standby RAM of the file cache is most often almost all the RAM of your computer: total RAM minus the Active RAM of all processes. Indeed, other processes rarely have much Standby RAM, because it tends to go to the file cache if you do disk I/O quite a bit.
This is the info displayed by RAMMap for a busy server doing a lot of I/O and computation:
The file cache is the second row called "Mapped file". See that most of the 32 GB is either in the Active part of other processes, or in the Standby part of the file cache.
So finally, yes, the RAM used by the file cache is essentially the RAM displayed as available in the Task Manager. If you want to measure with more certainty, you can use RAMMap.
Your answer is not entirely true.
The file cache, also called the system cache, describes a range of virtual addresses. It has a physical working set that is tracked by MmSystemCacheWs, and that working set is a subset of all the mapped-file physical pages on the system.
The system cache is a range of virtual addresses, hence PTEs, that point to mapped file pages. The mapped file pages are brought in by a process creating a mapping or brought in by the system cache manager in response to a file read.
Existing pages that are needed by the file cache in response to a read become part of the system working set. If a page in a mapped file is not present, it is paged in and becomes part of the system working set. When a page is in more than one working set (i.e. the system and a process, or one process and another), it is shown as being in a shared working set by programs like VMMap.
The actual mapped-file pages themselves are controlled by a section object (one per file), a data control area for the file, subsection objects for the file, and a segment object for the file with prototype PTEs. These get created the first time a process creates a mapping object for the file, or the first time the system cache manager creates the mapping object (section object) for the file because it needs to access the file in response to a file I/O operation performed by a process.
When the system cache manager needs to read from the file, it maps 256 KiB views of the file at a time and keeps track of each view in a VACB object. A process maps a variable-sized view of a file, typically the size of the whole file, and keeps track of this view in the process VAD. Mapping the view is simply a matter of filling in PTEs to point to the physical pages containing the file that are already resident, by looking at the prototype PTE for that range of the file and seeing what it contains. If the prototype PTE does not point to a physical page, the PTE is initialised to point to the prototype PTE instead of the page it points to, and the PTE is left invalid; this fault is resolved on demand, page by page, when the read from the view is actually performed.
The VACBs keep track of the 256 KiB views of files that the cache manager has opened and the virtual address range of each view, which describes the range of 64 PTEs that service that range of virtual addresses. There is no virtual or page-table external fragmentation, because all views are the same size, and there is no physical external fragmentation, because all pages in a view are 4 KiB. 256 KiB is the size chosen because if it were smaller there would be too many VACB objects (64 times as many, taking up space), and if it were larger there would effectively be a lot of internal fragmentation from reads and hence heavy virtual address pollution. Also, the VACB uses the lower bits of the virtual address to store the number of I/O operations currently being performed on that range, so the VACB size would have to be increased by a few bits or it would be able to handle fewer concurrent I/O operations.
If the view were the whole size of the file, there would quickly be a lot of virtual address pollution, because the whole of every file that is read would be mapped in; file mappings are meant for user processes that knowingly map a whole-file view into their virtual address space, expecting the whole of the file to be accessed. There would also be a lot of virtual external fragmentation, because the views wouldn't be the same size.
As for executable images, they are mapped in separately, with separate prototype PTEs, separate physical pages, a separate control area, and separate segment and subsection objects from the data-file mapping of the file. The process maps the image in, but the kernel also maps the images for ntoskrnl.exe and hal.dll in large pages, and driver images are in the system PTE working set.

ext4 commit= mount option and dirty_writeback_centisecs

I'm trying to understand the way bytes go from write() to the physical disk platter, so I can tune my picture server's performance.
The thing I don't understand is the difference between these two: the commit= mount option and dirty_writeback_centisecs. They look like they are about the same process of writing changes to the storage device, but they are still different.
It's not clear to me which one fires first on my bytes' way to the disk.
Yeah, I just ran into this investigating mount options for an SDCard Ubuntu install on an ARM Chromebook. Here's what I can tell you...
Here's how to see the dirty and writeback amounts:
user#chrubuntu:~$ cat /proc/meminfo | grep "Dirty" -A1
Dirty: 14232 kB
Writeback: 4608 kB
(Edit: these dirty and writeback figures are rather high; I had a compile running when I took this snapshot.)
So data waiting to be written out is "dirty". Dirty data can still be eliminated (if, say, a temporary file is created, used, and deleted before it goes to writeback, it will never have to be written out). As dirty data is moved into writeback, the kernel tries to combine smaller requests that may be in dirty into single larger I/O requests; this is one reason why dirty_expire_centisecs is usually not set too low. Dirty data is usually put into writeback when a) enough data is cached to get up to vm.dirty_background_ratio, or b) the data gets to be vm.dirty_expire_centisecs centiseconds old (the default of 3000 is 30 seconds). Per vm.dirty_writeback_centisecs, a writeback daemon runs by default every 500 centiseconds (5 seconds) to actually flush out anything in writeback.
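If you want to see where these knobs sit on your own system, sysctl can read them all at once (these are the standard sysctl names; defaults vary a bit by distribution):

# typical defaults: expire 3000 (30 s), writeback 500 (5 s); the two ratios vary by distro
sysctl vm.dirty_background_ratio vm.dirty_ratio vm.dirty_expire_centisecs vm.dirty_writeback_centisecs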
fsync will flush out an individual file (force it from dirty into writeback and wait until it's flushed out of writeback), and sync does that for everything. As far as I know it does this ASAP, bypassing any attempt to balance disk reads and writes; it stalls the device doing 100% writes until the sync completes.
The default ext4 mount option commit=5 actually forces a sync of that filesystem every 5 seconds. This is intended to ensure that writes are not unduly delayed if there's heavy read activity (ideally losing a maximum of 5 seconds of data if power is cut or whatever). What I found with an Ubuntu install on an SD card (in a Chromebook) is that this actually just leads to massive filesystem stalls roughly every 5 seconds if you're writing much to the card. ChromeOS uses commit=600, and I applied that on the Ubuntu side to good effect.
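If you want to try the same thing, this is roughly what applying it looks like; the mount point is an assumption, and to make the change permanent you would put commit=600 in the options column of the filesystem's fstab entry instead:

# temporarily switch the journal commit interval on an already-mounted ext4 filesystem
sudo mount -o remount,commit=600 /media/sdcard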
dirty_writeback_centisecs configures the Linux kernel daemons related to virtual memory (that's why the vm prefix), which are in charge of writing back from RAM to all the storage devices. So if you configure dirty_writeback_centisecs and you have 25 different storage devices mounted on your system, the same writeback interval applies to all 25 of them.
commit, on the other hand, is set per storage device (actually per filesystem) and is related to the sync process rather than to the virtual memory daemons.
So you can see it as:
dirty_writeback_centisecs: writing from RAM to all filesystems
commit: each individual filesystem fetches its data from RAM

Oracle 11g direct I/O and write complete wait

Last week my database was having high wait times (write complete waits) on disk. I decided to stop all applications from using this database. After that I shut the database down and decided to use Oracle's direct I/O option, as advised by my vendor last year.
*I cannot put a screenshot here because I don't have enough reputation points.
So I changed the value of filesystemio_options from ASYNCH to SETALL and started the database back up. All applications are using this database again. But...
My disk seems busier than before, although no waits are detected. Is this normal?
*I cannot put a screenshot here because I don't have enough reputation points.
For disk I'm using MegaRAID.
c0t1d0 consists of 6 physical disks.
Virtual Drive: 1 (Target Id: 1)
Name :
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
Size : 835.394 GB
Mirror Data : 835.394 GB
State : Optimal
Strip Size : 64 KB
Number Of Drives per span:2
Span Depth : 3
Default Cache Policy: WriteBack, ReadAheadNone, Cached, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Cached, No Write Cache if Bad BBU
Access Policy : Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
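For reference, this is roughly what the filesystemio_options change described above looks like when applied from SQL*Plus; the parameter is static, so it has to go to the spfile and the instance must be restarted (run as a DBA):

# change the parameter in the spfile, then bounce the instance
sqlplus / as sysdba <<'EOF'
ALTER SYSTEM SET filesystemio_options = SETALL SCOPE = SPFILE;
SHUTDOWN IMMEDIATE
STARTUP
EOF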
