Heroku: need help handling database bloat and vacuuming - laravel

Today, I started getting timeout errors from heroku. I eventually ran this ...
heroku pg:diagnose -a myapp
and got ...
RED: Bloat
Type Object Bloat Waste
───── ───────────────────────────────────────────── ───── ───────
table public.files 776 1326 MB
index public.files::files__lft__rgt_parent_id_index 63 106 MB
RED: Hit Rate
Name Ratio
────────────────────── ──────────────────
overall cache hit rate 0.8246404842342929
public.files 0.8508127886460272
I ran the VACUUM command and it did nothing to address the bloat. How do I address this?

I know this is an old question. For anyone else facing the same issue, you might try the following form of vacuum:
VACUUM (ANALYZE, VERBOSE, FULL) your-table-name;
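Some context on why the plain VACUUM didn't help: a regular VACUUM only marks dead rows as reusable inside the table's existing files, so it won't shrink a table that is already bloated. The FULL option rewrites the table (and its indexes) into fresh files and hands the space back to the OS, but it takes an ACCESS EXCLUSIVE lock on the table for the duration, so run it in a quiet window. On Heroku you would run it from a psql session; using the app and table names from the diagnose output above, something like:
heroku pg:psql -a myapp
VACUUM (ANALYZE, VERBOSE, FULL) public.files;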

Related

runtime: failed to create new OS thread

On a 54-core machine, I use os/exec to spawn hundreds of client processes, and manage them with an abundance of goroutines.
Sometimes, but not always, I get this:
runtime: failed to create new OS thread (have 1306 already; errno=11)
runtime: may need to increase max user processes (ulimit -u)
fatal error: newosproc
My ulimit is pretty high already:
$ ulimit -u
1828079
There's never a problem if I limit myself to, say, 54 clients.
Is there a way I can handle this situation more gracefully? E.g. not bomb out with a fatal error, and just do less/delayed work instead? Or query the system ahead of time and anticipate the maximum amount of stuff I can do (I don't just want to limit to the number of cores though)?
Given my large ulimit, should this error even be happening? grep -c goroutine on the stack output following the fatal error only gives 6087. Each client process (of which there are certainly less than 2000) might have a few goroutines of their own, but nothing crazy.
Edit: the problem only occurs on high-core machines (~60 cores). Keeping everything else constant and reducing the core count to 30 (this is an OpenStack environment, so the same underlying hardware is still being used), these runtime errors don't occur.
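One way to implement the "do less / delayed work" idea is to cap how many clients are in flight with a buffered-channel semaphore. A minimal sketch, where the limit of 54 and the "client" command are placeholders:

package main

import (
    "fmt"
    "log"
    "os/exec"
    "sync"
)

func main() {
    // Placeholder limit: the question notes ~54 concurrent clients never fail.
    const maxConcurrent = 54
    sem := make(chan struct{}, maxConcurrent)
    var wg sync.WaitGroup

    for i := 0; i < 500; i++ {
        sem <- struct{}{} // blocks here once maxConcurrent clients are running
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            defer func() { <-sem }()
            // "client" stands in for whatever binary is being spawned.
            if err := exec.Command("client", fmt.Sprint(id)).Run(); err != nil {
                log.Printf("client %d: %v", id, err)
            }
        }(i)
    }
    wg.Wait()
}

Each client blocked in a wait syscall ties up an OS thread, so bounding the number of live clients also bounds the thread count regardless of how many cores the machine has.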

Memory not being released back to OS

I've created an image resizing server that creates a few different thumbnails of an image that you upload to it. I'm using the package https://github.com/h2non/bimg for resizing, which uses libvips via C bindings.
Before going to production I started stress testing my app with jmeter, uploading 100 images to it concurrently several times in a row, and noticed that the memory is not being released back to the OS.
To illustrate the problem I've written a few lines of code that read 100 images, resize them (without saving them anywhere), and then wait for 10 minutes. This repeats 5 times.
My code and memory/CPU graph can be found here:
https://github.com/hamochi/bimg-memory-issue
It's clear that the memory is being reused for every cycle, otherwise it should have doubled (I think). But it's never released back to the OS.
Is this general cgo behaviour? Is bimg doing something weird? Or is it just my code that is faulty?
Thank you very much for any help you can give!
There's a libvips facility for tracking and debugging reference counts -- you could try enabling that and see if you have any leaks.
https://libvips.github.io/libvips/API/current/libvips-vips.html#vips-leak-set
Though from your comment above about bimg memory stats, it sounds like it's probably all OK.
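For reference, that switch is the vips_leak_set() call documented at the link above; I believe pyvips exposes the same thing as pyvips.leak_set(True), if you want to experiment with it from Python.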
It's easy to test libvips memory from Python. I made this small program:
#!/usr/bin/python3

import pyvips
import sys

# disable libvips operation caching ... without this, it'll cache all the
# thumbnail operations and we'll just be testing the jpg write
pyvips.cache_set_max(0)

for i in range(0, 10000):
    print("loop {} ...".format(i))
    for filename in sys.argv[1:]:
        # thumbnail to fit 128x128 box
        image = pyvips.Image.thumbnail(filename, 128)
        thumb = image.write_to_buffer(".jpg")
i.e. repeatedly thumbnail a set of source images. I ran it like this:
$ for i in {1..100}; do cp ~/pics/k2.jpg $i.jpg; done
$ ../fing.py *
And watched RES in top. I saw:
loop | RES (kb)
-- | --
100 | 39220
250 | 39324
300 | 39276
400 | 39316
500 | 39396
600 | 39464
700 | 39404
1000 | 39420
As long as you have no refcount leaks, I think what you are seeing is expected behaviour. Linux processes can only release pages at the end of the heap back to the OS (have a look at the brk and sbrk sys calls):
https://en.wikipedia.org/wiki/Sbrk
Now imagine if 1) libvips allocates 6GB, 2) the Go runtime allocates 100kb, 3) libvips releases 6GB. Your libc (the thing in your process that will call sbrk and brk on your behalf) can't hand the 6GB back to the OS because of the 100kb alloc at the end of the heap. Some malloc implementations have better memory fragmentation behaviour than others, but the default linux one is pretty good.
In practice, it doesn't matter. malloc will reuse holes in your memory space, and even if it doesn't, they will get paged out anyway under memory pressure and won't end up eating RAM. Try running your process for a few hours, and watch RES. You should see it creep up, but then stabilize.
(I'm not at all a kernel person, the above is just my understanding, corrections very welcome of course)
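One quick way to see how much of RES the Go side itself is holding is to dump the runtime's own counters from inside the server; this is just a sketch using the standard runtime package, and anything libvips allocates through C malloc will not appear in these numbers:

package main

import (
    "fmt"
    "runtime"
)

func main() {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    // These counters only cover memory the Go runtime manages, so the gap
    // between Sys here and RES in top is roughly the C/libvips side plus
    // allocator overhead.
    fmt.Printf("HeapAlloc=%d MiB HeapIdle=%d MiB HeapReleased=%d MiB Sys=%d MiB\n",
        m.HeapAlloc>>20, m.HeapIdle>>20, m.HeapReleased>>20, m.Sys>>20)
}

If these stay small while RES keeps growing, the growth is on the C/libvips side of the fence, which is consistent with the sbrk explanation above.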
The problem is in the resize code:
_, err = bimg.NewImage(buffer).Resize(width, height)
The image is a GObject and needs to be unref'd explicitly to release the memory; try:
image, err = bimg.NewImage(buffer).Resize(width, height)
defer C.g_object_unref(C.gpointer(image))

Query failing with "ERROR: Canceling query because of high VMEM usage"

We have a small Greenplum (gpdb) cluster, and a few queries are failing on it.
System-related information:
TOTAL RAM = 30G
SWAP = 15G
gp_vmem_protect_limit = 2700MB
TOTAL SEGMENTS = 8 primary + 8 mirror = 16
SEGMENT HOSTS = 2
vm.overcommit_ratio = 72
Used this calculator: http://greenplum.org/calc/#
SYMPTOM
The query failed with the error message shown below:
ERROR: XX000: Canceling query because of high VMEM usage. Used: 2433MB, available 266MB, red zone: 2430MB (runaway_cleaner.c:135) (seg2 slice74 DATANODE01:40002 pid=11294) (cdbdisp.c:1320)
We tried changing the following parameters:
statement_mem from 125 MB to 8 GB
max_statement_mem from 200 MB to 16 GB
Not sure what exactly needs to change here; we're still trying to understand the root cause of the error.
Any help would be much appreciated.
gp_vmem_protect_limit is a per-segment limit. You have 16 segments, so based on your segment count and gp_vmem_protect_limit you need 2700MB x 16 of memory in total.
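To put numbers on that (assuming the 30G RAM / 15G swap figures are per segment host):
2700 MB x 16 segment instances = 43,200 MB ≈ 42 GB in the worst case, or roughly 21 GB per host if the 16 segments are split evenly across your 2 hosts.
Also note that the "red zone: 2430MB" in the error is simply 90% of your 2700MB gp_vmem_protect_limit (the runaway detector's activation threshold), so that segment was already at its ceiling. Raising statement_mem / max_statement_mem does not raise this per-segment ceiling; if anything it lets a single query reach it sooner.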

Redis high memory usage for almost no keys

I have a Redis instance hosted by Heroku ( https://elements.heroku.com/addons/heroku-redis ) on the "Premium 1" plan.
This Redis is used only to host a small queue system called Bull ( https://www.npmjs.com/package/bull ).
The memory usage is now almost at 100% (of the 100 MB allowed) even though there are barely any jobs stored in Redis.
I ran an INFO command on this instance and here are the important parts (I can post more if needed):
# Server
redis_version:3.2.4
# Memory
used_memory:98123632
used_memory_human:93.58M
used_memory_rss:470360064
used_memory_rss_human:448.57M
used_memory_peak:105616528
used_memory_peak_human:100.72M
total_system_memory:16040415232
total_system_memory_human:14.94G
used_memory_lua:280863744
used_memory_lua_human:267.85M
maxmemory:104857600
maxmemory_human:100.00M
maxmemory_policy:noeviction
mem_fragmentation_ratio:4.79
mem_allocator:jemalloc-4.0.3
# Keyspace
db0:keys=45,expires=0,avg_ttl=0
# Replication
role:master
connected_slaves:1
master_repl_offset:25687582196
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:25686533621
repl_backlog_histlen:1048576
I have a really hard time figuring out how I can be using 95 MB with barely 50 objects stored. These objects are really small, usually JSON with 2-3 fields containing small strings and IDs.
I've tried https://github.com/gamenet/redis-memory-analyzer but it crashes when I try to run it.
I can't get a dump because Heroku does not allow it.
I'm a bit lost here; there might be something obvious I've missed, but I'm reaching the limit of my understanding of Redis.
Thanks in advance for any tips / pointers.
EDIT
We had to upgrade our Redis instance to keep everything running, but it seems the issue is still here. Currently sitting at 34 keys / 34 MB.
I've tried redis-cli --bigkeys:
Sampled 34 keys in the keyspace!
Total key length in bytes is 743 (avg len 21.85)
9 strings with 43 bytes (26.47% of keys, avg size 4.78)
0 lists with 0 items (00.00% of keys, avg size 0.00)
0 sets with 0 members (00.00% of keys, avg size 0.00)
24 hashs with 227 fields (70.59% of keys, avg size 9.46)
1 zsets with 23 members (02.94% of keys, avg size 23.00)
I'm pretty sure there is some overhead building up somewhere, but I can't find what it is.
EDIT 2
I'm actually blind: used_memory_lua_human was 267.85M in the INFO output when I first created this post, and it is now 89.25M on the new instance.
This seems super high, and might explain the memory usage.
You have just 45 keys in the database, so what you can do is:
List all keys with the KEYS * command.
Run the DEBUG OBJECT <key> command for each (or several) keys; it returns the serialized length, so you will get a better understanding of which keys consume a lot of space.
An alternative is to run redis-cli --bigkeys, which shows the biggest keys. You can then see the content of a key with the command specific to its data type: for strings it's GET, for hashes it's HGETALL, and so on.
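For example, to check one of the Bull hashes (the key name here is just a placeholder; use one of your real keys):
redis-cli DEBUG OBJECT bull:myqueue:123
and compare the serializedlength field in the reply across keys.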
After a lot of digging, the issue is not coming from Redis or Heroku in any way.
The queue system we use has a somewhat recent bug where Redis ends up repeatedly caching a Lua script, eating up memory as time goes on.
More info here : https://github.com/OptimalBits/bull/issues/426
Thanks for those who took the time to reply.
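If anyone else hits this before the Bull fix lands: since the growth shows up under used_memory_lua, flushing Redis's script cache should reclaim it, and the queue library will re-load its scripts as needed. Assuming a redis-cli new enough to accept -u, and Heroku's usual REDIS_URL config var, that would be:
redis-cli -u $REDIS_URL SCRIPT FLUSH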

Memory Size Errors When Running Magento's cron.sh

We have set the nginx user to run the following command at 5 minute intervals.
sh /var/www/magento/cron.sh
This executed fairly successfully for some time. Within the last month or so, it has begun to error out each time due to the memory limit. I've increased the memory limit, but that only results in a higher limit being reached. It seems that there must be a bigger problem. The error is consistently as follows.
Fatal error: Allowed memory size of 262144 bytes exhausted
(tried to allocate 7680 bytes) in /var/www/magento/app/Mage.php on line 589
Below are the crons that magento has set to run every five minutes:
job: xmlconnect_notification_send_all
model: xmlconnect/observer::scheduledSend
file: /var/www/magento/app/code/core/Mage/XmlConnect/etc/config.xml
job: newsletter_send_all
model: newsletter/observer::scheduledSend
file: /var/www/magento/app/code/core/Mage/Newsletter/etc/config.xml
job: enterprise_staging_automates
model: enterprise_staging/observer::automates
file: /var/www/magento/app/code/core/Enterprise/Staging/etc/config.xml
Your error message has all the answers
Allowed memory size of 262144 bytes exhausted
While 262144 bytes seems like a big number, it's only 256 KB, or around 0.25 MB.
I believe the Magento documentation recommends a memory limit of 256MB, with 512MB being far more common in the wild. You'll need to make sure the command-line PHP launched by cron.sh has its memory_limit ini setting configured correctly. One common pitfall here is to omit the M, or to use MB:
; Will not do what you want it to
memory_limit=256
memory_limit=256MB
So make sure your configuration file is set something like this
memory_limit=256M
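To confirm what the command-line PHP actually picked up (the CLI often reads a different php.ini than the web server does), you can check from the shell, for example:
php --ini
php -r "echo ini_get('memory_limit'), PHP_EOL;"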
This is a server setup issue. Your php CLI memory_limit is set too low.
Bounce it up to 256M.
If you're not running your own server, then you need to contact your hosting provider.
