Speeding up "Mapcache_seed" with Mapserver - caching

I'm using mapcache_seed from the MapCache package to create a large image cache by calling my MapServer WMS, which renders vector layers.
Currently, this is the command I'm using:
sudo -u www-data mapcache_seed -c mapcache.xml -g WGS84 -n 8 -t Test -e Foo,Bar,Baz,Fwee -M 8,8 -z 12,13 --thread-delay 0 --rate-limit 10000
Here www-data is my Nginx system user, mapcache.xml is my config, WGS84 is my SRS, -n 8 is my logical thread count (on an i7-6700HQ at 3200 MHz), -z 12,13 is the zoom range currently being seeded, thread delay is off, and the tile creation rate limit is 10000.
However, I get at most 50% total CPU utilization, and most of the time only a single core goes above 50% -- with an average of 500 tiles per second, independent of how many threads or processes I specify. I've been trying to get all zoom levels (4 to 27) seeded for the last couple of days, but I've only managed to get through levels 4-12 before being severely bottlenecked, with a mere 3 GB of cache from a couple million tiles.
Memory utilization is stable at 2.4% of 8 GB PC4-2133 for mapcache_seed (0.5% for the WMS). Write speeds are 100 MB/s, unbuffered writes are also 100 MB/s, while buffered+cached writes reach 6.7-8.7 GB/s on a 1 TB SATA III HDD. I have another SSD in the machine that gets 6 GB/s write and 8 GB/s read, but it's too small for storage and I'm afraid of wearing it out with this many writes.
The cached tiles are around 4 KB each, so at roughly 500 tiles/s I'm writing only about 2 MB of tiles per second -- nowhere near the 100 MB/s the disk can sustain. The majority of them aren't even real tiles, but symlinks to a catch-all blank tile for empty areas.
How would I go about speeding this process up? Messing with the thread and rate-limit options of mapcache_seed makes no discernible difference. This is on a Debian Wheezy machine.
This is all run through FastCGI, using 256x256 px images and a disk cache with the extent restricted to a single country (otherwise mapcache generates nothing but symlinks to blank tiles, because more than 90% of the world is blank!).
Mapserver mapfile (redacted):
MAP
  NAME "MAP"
  SIZE 1200 800
  EXTENT Foo Bar Baz Fwee
  UNITS DD
  SHAPEPATH "."
  IMAGECOLOR 255 255 255
  IMAGETYPE PNG
  WEB
    IMAGEPATH "/path/to/image"
    IMAGEURL "/path/to/imageurl"
    METADATA
      "wms_title" "MAP"
      "wms_onlineresource" "http://localhost/cgi-bin/mapserv?MAP=/path/to/map.map"
      "wms_srs" "EPSG:4326"
      "wms_feature_info_mime_type" "text/plain"
      "wms_abstract" "Lorem ipsum"
      "ows_enable_request" "*"
      "wms_enable_request" "*"
    END
  END
  PROJECTION
    "init=epsg:4326"
  END
  LAYER
    NAME base
    TYPE POLYGON
    STATUS OFF
    DATA polygon.shp
    CLASS
      NAME "Polygon"
      STYLE
        COLOR 0 0 0
        OUTLINECOLOR 255 255 255
      END
    END
  END
  LAYER
    NAME outline
    TYPE LINE
    STATUS OFF
    DATA line.shp
    CLASS
      NAME "Line"
      STYLE
        OUTLINECOLOR 255 255 255
      END
    END
  END
END
mapcache.xml (redacted):
<?xml version="1.0" encoding="UTF-8"?>
<mapcache>
<source name="ms4wserver" type="wms">
<getmap>
<params>
<LAYERS>base</LAYERS>
<MAP>/path/to/map.map</MAP>
</params>
</getmap>
<http>
<url>http://localhost/wms/</url>
</http>
</source>
<cache name="disk" type="disk">
<base>/path/to/cache/</base>
<symlink_blank/>
</cache>
<tileset name="test">
<source>ms4wserver</source>
<cache>disk</cache>
<format>PNG</format>
<grid>WGS84</grid>
<metatile>5 5</metatile>
<metabuffer>10</metabuffer>
<expires>3600</expires>
</tileset>
<default_format>JPEG</default_format>
<service type="wms" enabled="true">
<full_wms>assemble</full_wms>
<resample_mode>bilinear</resample_mode>
<format>JPEG</format>
<maxsize>4096</maxsize>
</service>
<service type="wmts" enabled="false"/>
<service type="tms" enabled="false"/>
<service type="kml" enabled="false"/>
<service type="gmaps" enabled="false"/>
<service type="ve" enabled="false"/>
<service type="mapguide" enabled="false"/>
<service type="demo" enabled="false"/>
<errors>report</errors>
<locker type="disk">
<directory>/path/</directory>
<timeout>300</timeout>
</locker>
</mapcache>

So, for anyone coming across this ten years later, when what's left of the sparse documentation for these tools has rotted away: I messed with my mapcache and mapfile settings and got something better. It wasn't that my generation was too slow, it was that I was generating TOO MANY GODDAMN SYMLINKS to blank tiles. First, the mapfile extent was incorrect. Second, I was using the stock "WGS84" grid, which by default seeds the entire global extent. That meant over 90% of my tiles were just symlinks to a blank.png, and they ate up ALL of my inodes. For quickly clearing out that mess, I recommend mkdir blank; rsync -a --delete blank/ /path/to/cache.
I fixed this by taking the WGS84 grid specification and changing its extent to the one I specified in my mapfile, so now only my map's area gets seeded. Lastly, I amended the grid XML element like so:
<grid restricted_extent="MAP FILE EXTENT HERE">GRIDNAME</grid>
With restricted_extent it's now certain that only my map gets seeded. I had over 100 million tiles, but they were all goddamn symlinks! Before the fix I kept getting a "ran out of space" error or something of the sort, even though df showed the partition wasn't full -- which is misleading, because symlinks consume inodes, not data blocks. To see inode usage, run df -hi. I was at 100% of my inodes but only 1% of the logical space on a 1 TB drive, filled with goddamn symlinks!
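Putting the clean-up and the check together as plain shell (the cache path is a placeholder for your own):
# compare inode usage with block usage -- in my case inodes were at 100%
# while the drive was only 1% full
df -hi /path/to/cache
df -h /path/to/cache
# fastest way I found to delete millions of symlinks: rsync an empty
# directory over the cache instead of rm -rf
mkdir blank
rsync -a --delete blank/ /path/to/cache/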

Related

How can you stop Kristen from implementing excess FPGA registers?

I am using Kristen to generate a Verilog FPGA host interface for a neuromorphic processor. I have implemented the basic host as follows:
<module name= "nmp" duplicate="1000">
<register name="start0" type="rdconst" mask="0xFFFFFFFF" default="0x00000000" description="Lower 64 bit start pointer of persitant NMP storage."></register>
<register name="start1" type="rdconst" mask="0xFFFFFFFF" default="0x00000020" description="Upper 64 bit start pointer of persitant NMP storage."></register>
<register name="size" type="rdconst" mask="0xFFFFFFFF" default="0x10000000" description="Size of NMP persitant storage in Mbytes."></register>
<register name="c_start0" type="rdconst" mask="0xFFFFFFFF" default="0x10000000" description="Lower 64 bit start pointer of cached shared storage."></register>
<register name="c_start1" type="rdconst" mask="0xFFFFFFFF" default="0x00000020" description="Upper 64 bit start pointer of cached shared storage."></register>
<register name="c_size" type="rdconst" mask="0xFFFFFFFF" default="0x10000000" description="Size of cached shared storage in Mbytes."></register>
<register name="row" type="rdwr" mask="0xFFFFFFFF" default="0x00000000" description="Configurable row location for this NMP."></register>
<register name="col" type="rdwr" mask="0xFFFFFFFF" default="0x00000000" description="Configurable col location for this NMP."></register>
<register name="threshold" type="rdwr" mask="0xFFFFFFFF" default="0x00000000" description="Configurable synaptic sum threshold for this instance."></register>
<memory name="learn" memsize="0x00004000" type="mem_ack" description="Learning interface - Map input synapsys to node intensity">
<field name="input_id" size="22b" description="Input ID this map data is intended for."></field>
<field name="scale" size="16b" description="The intensity scale for this input ID."></field>
</memory>
</module>
The end result is that I am seeing a huge number of registers being generated, and I have to scale my NMP array down to fit within the constraints of my FPGA. Is there a way to control the number of registers being generated here? Obviously I need to store settings for these different fields. Am I missing something?
I should add that I am trying to get to 2048 NMP instances, but the best I can do is just over 1000, and not quite 1024. If I implement without PCIe or host control, I can get to 2048 without issue.
If I understand correctly, each NMP instance has been coded with an internal register to store its data, and the configuration you have shown will result in Kristen creating Verilog registers as well, so there is effectively double-buffered storage going on.
Because of this, the number of registers is roughly double what it needs to be. One way of dealing with the situation described is to use another RAM interface, 32 bits wide. I do note that your config calls for 9 x 32-bit words, which is an odd size for a memory, so there will be some wasted address space: Kristen creates RAMs on binary boundaries, so you can get a 16 x 32-bit memory region that you overlay on that interface, and then a second RAM just like the one you already have for the learn memory.
<module>
  <memory name="regs" memsize="0x10" type="mem_ack" description="Register mapping per NMP instance">
    <field name="start0" size="32b" description="Start0"></field>
    <field name="start1" size="32b" description="Start1"></field>
    ....
    <field name="threshold" size="32b" description="Threshold"></field>
  </memory>
  <memory name="learn" memsize="0x00004000" type="mem_ack" description="Learning interface - map input synapses to node intensity">
    <field name="input_id" size="22b" description="Input ID this map data is intended for."></field>
    <field name="scale" size="16b" description="The intensity scale for this input ID."></field>
  </memory>
</module>
Generate this and take a look at the new interface. That should reduce the number of registers generated in your Verilog code and subsequent synthesis.

OpenZFS on Windows: less available space than capacity in single disk pool

Creating a new pool using the instructions from the readme, as follows:
zpool create -O casesensitivity=insensitive -O compression=lz4 -O atime=off -o ashift=12 tank PHYSICALDRIVE1
I get less available space showing up in File Explorer and zfs than the disk's capacity: 1.76 TiB vs 1.81 TiB.
zpool list and zfs list -r poolname show the difference:
zpool list
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank  1,81T   360K  1,81T        -         -     0%     0%  1.00x  ONLINE  -
zfs list -r tank
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank   300K  1,76T    96K  /tank
I'm not sure of the reason. Is there something that ZFS uses the space for?
Does it ever become available for use, or is it reserved, e.g. for root like on ext4?
Because it is copy on write, even deleting stuff requires using a tiny bit of extra storage in ZFS: until the old data has been marked as free (which requires writing newly-created metadata), you can’t start allocating the space it was using to store new data. A small amount of storage is reserved so that if you completely fill your pool it’s still possible to delete stuff to free up space. If it didn’t do this, you could get wedged, which wouldn’t be fixable unless you added more disks to your pool / made one of the disks larger if you are using virtualized storage.
There are also other small overheads (metadata storage, etc.) but I think most of the holdback you’re seeing is related to the above since it doesn’t look like you’ve written anything into the pool yet.
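If it helps to see where the gap sits, compare the pool-level and dataset-level views. As a rough sanity check (an assumption based on the OpenZFS default of holding back about 1/32 of the pool as slop space), 1.81 TiB / 32 ≈ 0.06 TiB, which is close to the difference you're seeing:
# raw pool capacity as the pool layer sees it
zpool list tank
# allocatable space as the filesystem layer sees it, with a usage breakdown
zfs list -o space -r tank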

Memory not being released back to OS

I've created an image resizing server that creates a few different thumbnails of an image that you upload to it. I'm using the package https://github.com/h2non/bimg for resizing, which uses libvips through C bindings.
Before going to production I started stress testing my app with JMeter, uploading 100 images to it concurrently several times in a row, and noticed that the memory is not released back to the OS.
To illustrate the problem I've written a few lines of code that read 100 images and resize them (without saving them anywhere) and then wait for 10 minutes. This repeats five times.
My code and memory/CPU graph can be found here:
https://github.com/hamochi/bimg-memory-issue
It's clear that the memory is being reused for every cycle, otherwise it should have doubled (I think). But it's never released back to the OS.
Is this general behaviour for cgo? Is bimg doing something weird? Or is my code at fault?
Thank you very much for any help you can give!
There's a libvips thing to track and debug reference counts -- you could try enabling that and see if you have any leaks.
https://libvips.github.io/libvips/API/current/libvips-vips.html#vips-leak-set
Though from your comment above about bimg memory stats, it sounds like it's probably all OK.
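If you do want to try it, one way that avoids touching the Go code (assuming your libvips build honours the VIPS_LEAK environment variable -- check the docs for your version, or call vips_leak_set() through the bindings instead) is:
# start the resize server with libvips leak reporting enabled; any vips
# objects still alive at shutdown get reported
# "your-resize-server" is a placeholder for your binary
VIPS_LEAK=1 ./your-resize-server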
It's easy to test libvips memory from Python. I made this small program:
#!/usr/bin/python3
import pyvips
import sys
# disable libvips operation caching ... without this, it'll cache all the
# thumbnail operations and we'll just be testing the jpg write
pyvips.cache_set_max(0)
for i in range(0, 10000):
    print("loop {} ...".format(i))
    for filename in sys.argv[1:]:
        # thumbnail to fit 128x128 box
        image = pyvips.Image.thumbnail(filename, 128)
        thumb = image.write_to_buffer(".jpg")
i.e. repeatedly thumbnail a set of source images. I ran it like this:
$ for i in {1..100}; do cp ~/pics/k2.jpg $i.jpg; done
$ ../fing.py *
And watched RES in top. I saw:
loop | RES (kb)
-- | --
100 | 39220
250 | 39324
300 | 39276
400 | 39316
500 | 39396
600 | 39464
700 | 39404
1000 | 39420
As long as you have no refcount leaks, I think what you are seeing is expected behaviour. Linux processes can only release pages at the end of the heap back to the OS (have a look at the brk and sbrk sys calls):
https://en.wikipedia.org/wiki/Sbrk
Now imagine if 1) libvips allocates 6GB, 2) the Go runtime allocates 100kb, 3) libvips releases 6GB. Your libc (the thing in your process that will call sbrk and brk on your behalf) can't hand the 6GB back to the OS because of the 100kb alloc at the end of the heap. Some malloc implementations have better memory fragmentation behaviour than others, but the default linux one is pretty good.
In practice, it doesn't matter. malloc will reuse holes in your memory space, and even if it doesn't, they will get paged out anyway under memory pressure and won't end up eating RAM. Try running your process for a few hours, and watch RES. You should see it creep up, but then stabilize.
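A minimal way to watch that over a few hours, assuming your server binary is called resizer (a placeholder -- substitute your process name), is just to sample RSS periodically:
# log the resident set size (kB) of the resizer process once a minute
while true; do
    date +%T
    ps -o rss= -C resizer
    sleep 60
done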
(I'm not at all a kernel person, the above is just my understanding, corrections very welcome of course)
The problem is in the resize code:
_, err = bimg.NewImage(buffer).Resize(width, height)
The image is a GObject and needs an explicit unref to release the memory; try:
image, err := bimg.NewImage(buffer).Resize(width, height)
defer C.g_object_unref(C.gpointer(image))

Performance Issue on Nginx Static File Serving 10Gbps Server

I'm using Nginx to serve static files on dedicated servers.
The server hosts no website; it is only a file download server. File sizes range from MBs to GBs.
Previously I had 8 dedicated servers with 500 Mbps unmetered at unmetered.com, and each of them performed great.
I thought I'd buy a 10 Gbps server from FDCServers, because one server is easier to manage than many.
Below are the specs of the server:
Dual Xeon E5-2640 (15M Cache, 2.50 GHz, 7.20 GT/s Intel® QPI) - 24 Cores
128 GB RAM
10 Gbit/s Network Unmetered
Ubuntu 14.04 LTS
1.5 TB SATA
But my big new server is not giving more than 500 to 600 Mbps. I installed nload to monitor traffic and upload/download speed, and it reports almost the same as the previous unmetered.com servers.
Then I thought it might be due to the read-rate limitation of the SATA hard disk.
So I purchased and installed 3 x 240 GB SSD drives in the new server.
I moved a file onto an SSD drive and downloaded it for testing. The speed is still not good: I'm getting only 250 to 300 Kbps, whereas it should be at least 2 Mbps (which is the per-IP speed limit I set in the Nginx configuration files).
I then searched for 10-Gigabit Ethernet tuning settings and found a couple of sysctl settings that need to be tuned for a 10 Gbps network:
http://www.nas.nasa.gov/hecc/support/kb/Optional-Advanced-Tuning-for-Linux_138.html
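(The settings in question are along these lines; the values below are illustrative examples in the spirit of that guide, not necessarily exactly what I applied.)
# larger socket buffers and TCP windows, as suggested for 10 Gbps links
sysctl -w net.core.rmem_max=67108864
sysctl -w net.core.wmem_max=67108864
sysctl -w net.ipv4.tcp_rmem="4096 87380 33554432"
sysctl -w net.ipv4.tcp_wmem="4096 65536 33554432"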
I implemented them, but throughput is still the same as on my previous 500 Mbps servers.
Can you please help me improve the network throughput of this server? I asked the FDCServers support team and they confirmed that their servers can easily give 3 to 5 Gbps, but they can't help me tune it.
After all the tuning and settings I'm getting only 700 Mbit at most.
Let me know if you need more details.
First, test the memory:
For DDR3 1333 MHz (PC3-10600) it should look roughly like this:
$ dd if=/dev/zero bs=1024k count=512 > /dev/null
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 0.0444859 s, 12.1 GB/s
Test disk I/O:
$ pv ./100MB.bin > /dev/null
100MiB 0:00:00 [3.36GiB/s] [=================================================================================================================================================================================>] 100%
Test CPU speed through a pipe:
$ dd if=/dev/zero bs=1024k count=512 2> /dev/null| pv > /dev/null
512MiB 0:00:00 [2.24GiB/s] [ <=> ]
An nginx download from localhost should run at about 1.5-2 GB/s.
Checking:
$ wget -O /dev/null http://127.0.0.1:8080/100MB.bin
--2014-12-10 09:08:57-- http://127.0.0.1:8080/100MB.bin
Connecting to 127.0.0.1:8080... connected.
HTTP request sent, awaiting response... 200 OK
Length: 104857600 (100M) [application/octet-stream]
Saving to: ‘/dev/null’
100%[=======================================================================================================================================================================================>] 104,857,600 --.-K/s in 0.06s
2014-12-10 09:08:57 (1.63 GB/s) - ‘/dev/null’ saved [104857600/104857600]
Check this solution.
Remove the lines:
output_buffers 1 512k;
aio on;
directio 512;
and change
sendfile off;
tcp_nopush off;
tcp_nodelay off;
to
sendfile on;
tcp_nopush on;
tcp_nodelay on;
Good luck!
I think you need to split the issues and test independently to determine the real problem - it's no use guessing it's the disk and spending hundreds, or thousands, on new disks if it is the network. You have too many variables to just change randomly - you need to divide and conquer.
1) To test the disks, use a disk performance tool or good old dd to measure throughput in bytes/sec and latency in milliseconds. Read data blocks from disk and write to /dev/null to test read speed. Read data blocks from /dev/zero and write to disk to test write speed - if necessary.
Are your disks RAIDed by the way? And split over how many controllers?
2) To test the network, use nc (a.k.a. netcat) and thrash the network to see what throughput and latency you measure. Read data blocks from /dev/zero and send across network with nc. Read data blocks from the network and discard to /dev/null for testing in the other direction.
3) To test your nginx server, put some static files on a RAMdisk and then you will be independent of the physical disks.
Only then will you know what needs tuning; a rough sketch of these checks is given below.
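A rough sketch of those three checks, with placeholder paths and host names (adjust to your own setup, and note that netcat flag syntax varies between variants):
# 1) Disk read speed: sequential read of a large existing file, bypassing the page cache
dd if=/path/to/largefile.bin of=/dev/null bs=1M iflag=direct
# 2) Raw network throughput with netcat -- on the server:
nc -l -p 5001 > /dev/null
#    ...and on a client with a fast enough link:
dd if=/dev/zero bs=1M count=10000 | nc server.example.com 5001
# 3) Nginx with disks out of the picture: serve a test file from a tmpfs RAM disk
mkdir -p /mnt/ramdisk
mount -t tmpfs -o size=2G tmpfs /mnt/ramdisk
cp /path/to/100MB.bin /mnt/ramdisk/
# point an nginx location at /mnt/ramdisk and repeat the wget test against it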

mod_tile cache size limit

I have a TMS server with Apache, mod_tile, Mapnik and renderd. I have 400 GB of free space on my cache folder.
I want to pre-render 11 or 12 levels.
I tried the command "render_list -a -z 0 -Z 10 -v -n 4".
But my cache folder doesn't grow beyond 2.6 GB, and render_list says it finished with no error message.
Even when I use my map (OpenLayers), missing tiles are rendered on the fly but not stored in the cache. Before I pre-rendered my tiles, they were stored in the cache.
I searched unsuccessfully, so I ask here: is there any option in mod_tile to manage the cache size and the cache replacement strategy?
Thanks for your answers.
Update: strangely, when I request tiles from level 11, they are stored in the cache and my cache grows. So is there a size limit per level?
