Postgres errors on ARM-based M1 Mac w/ Big Sur

Ever since I got a new ARM-based M1 MacBook Pro, I've been experiencing severe and consistent PostgreSQL issues (psql 13.1). Whether I use a Rails server or Foreman, I receive errors in both my browser and terminal like PG::InternalError: ERROR: could not read block 15 in file "base/147456/148555": Bad address or PG::Error (invalid encoding name: unicode) or Error during failsafe response: PG::UnableToSend: no connection to the server. The strange thing is that I can often refresh the browser repeatedly in order to get things to work (until they inevitably don't again).
I'm aware of all the configuration challenges related to ARM-based M1 Macs, which is why I've uninstalled and reinstalled everything from Homebrew to Postgres multiple times in numerous ways (with Rosetta, without Rosetta, using arch -x86_64 brew commands, using the Postgres app instead of the Homebrew install). I've encountered a couple other people on random message boards who are experiencing the same issue (also on new Macs) and not having any luck, which is why I'm reluctant to believe that it's a drive corruption issue. (I've also run the Disk Utility FirstAid check multiple times; it says everything's healthy, but I have no idea how reliable that is.)
I'm using thoughtbot parity to sync up my dev environment database with what's currently in production. When I run development restore production, I get hundreds of lines in my terminal that look like the output below (this is immediately after the download completes but before it goes on to create defaults, process data, sequence sets, etc.). I believe it's at the root of the issue, but I'm not sure what the solution would be:
pg_restore: dropping TABLE [table name1]
pg_restore: from TOC entry 442; 1259 15829269 TABLE [table name1] u1oi0d2o8cha8f
pg_restore: error: could not execute query: ERROR: table "[table name1]" does not exist
Command was: DROP TABLE "public"."[table name1]";
pg_restore: dropping TABLE [table name2]
pg_restore: from TOC entry 277; 1259 16955 TABLE [table name2] u1oi0d2o8cha8f
pg_restore: error: could not execute query: ERROR: table "[table name2]" does not exist
Command was: DROP TABLE "public"."[table name2]";
pg_restore: dropping TABLE [table name3]
pg_restore: from TOC entry 463; 1259 15830702 TABLE [table name3] u1oi0d2o8cha8f
pg_restore: error: could not execute query: ERROR: table "[table name3]" does not exist
Command was: DROP TABLE "public"."[table name3]";
pg_restore: dropping TABLE [table name4]
pg_restore: from TOC entry 445; 1259 15830421 TABLE [table name4] u1oi0d2o8cha8f
pg_restore: error: could not execute query: ERROR: table "[table name4]" does not exist
Command was: DROP TABLE "public"."[table name4]";
Has anyone else experienced this? Any solution ideas would be much appreciated. Thanks!
EDIT: I was able to reproduce the same issue on an older MacBook Pro (also running Big Sur), so it seems unrelated to M1 but potentially related to Big Sur.

Definitive workaround for this:
After trying all the workarounds in the other answer, I was STILL getting this error occasionally, even after dumping and restoring the database, switching to M1-native Postgres, running all manner of maintenance scripts, etc.
After much tinkering with postgresql.conf, the only thing that has reliably worked around this issue is the following (I have not seen the error since):
In postgresql.conf, change:
max_worker_processes = 8
to
max_worker_processes = 1
After making this change, I have thrown every test at my previously error-ridden database and it hasn't displayed the same error once. Previously an extraction routine I run on a database of about 20M records would give the bad address error after processing 1-2 million records. Now it completes the whole process.
Obviously there is a performance penalty to reducing the number of parallel workers, but this is the only way I've found to reliably and permanently resolve this issue.
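A minimal sketch of applying that edit from the shell, assuming a Homebrew-style install (the conf path is an assumption; adjust it to your setup, and note the setting only takes effect after a server restart):

```shell
set_worker_processes() {
  # Rewrite (possibly commented-out) max_worker_processes in a conf file,
  # keeping a .bak copy of the original.
  local file="$1" value="$2"
  sed -i.bak -E "s/^#?max_worker_processes *= *[0-9]+/max_worker_processes = ${value}/" "$file"
}

# Assumed default Homebrew data directory; override via PGDATA if different.
CONF="${PGDATA:-/usr/local/var/postgres}/postgresql.conf"
# set_worker_processes "$CONF" 1
# brew services restart postgresql
```

The commands that touch a live install are left commented so the sketch is safe to paste and inspect first.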

UPDATE #2:
WAL buffer and other adjustments extended the time between errors but didn't eliminate them completely. I ended up installing a fresh Apple Silicon build of Postgres via Homebrew, then doing a pg_dump of my existing (error-prone) database and restoring it to the new installation/cluster.
Here's the interesting bit: pg_restore failed to restore one of the indexes in the database, and said so during the restore process (which otherwise completed). My hunch is that corruption or some other issue with this index was causing the Bad Address errors. So my final suggestion on this issue is to dump with pg_dump and then restore with pg_restore; the pg_restore step flagged the problem where the dump step didn't, writing a clean DB sans the faulty index.
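The dump-and-restore cycle described here can be sketched as follows. Database names and paths are placeholders; -Fc writes the custom-format archive that pg_restore reads, and the live-server commands are left commented:

```shell
DB=mydb                      # hypothetical source database (the error-prone one)
NEWDB=mydb_clean             # fresh database on the new cluster
DUMP="/tmp/${DB}.dump"

# pg_dump -Fc "$DB" -f "$DUMP"     # dump in custom format
# createdb "$NEWDB"                # create the target database
# pg_restore -d "$NEWDB" "$DUMP"   # restore; watch stderr, since failed
#                                  # objects (e.g. a bad index) are reported
#                                  # there and skipped
echo "pg_restore -d $NEWDB $DUMP"
```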
UPDATE:
Continued to experience this issue after attempting several workarounds, including a full pg_dump and restore of the affected database. And while some of the fixes seem to extend the time between occurrences (particularly increasing shared buffer memory), none have proven a permanent fix.
That said, some more digging on postgres mailing lists revealed that this "Bad Address" error can occur in conjunction with WAL (write-ahead-log) issues. As such, I've now set the following in my postgresql.conf file, significantly increasing the WAL buffer size:
wal_buffers = 4MB
and have not experienced the issue since (knock on wood, again).
It makes sense that this would have some effect: by default, wal_buffers scales with the shared buffer size, and as mentioned above, increasing the shared buffer size provided temporary relief. Anyway, something else to try until we get definitive word on what's causing this bug.
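After restarting, the effective value can be confirmed; shown here as a command string only, since it needs a live server (the database name is a placeholder):

```shell
# wal_buffers defaults to 1/32 of shared_buffers (capped at 16MB), which is
# why raising shared_buffers also raised it implicitly.
CHECK='psql -d postgres -c "SHOW wal_buffers;"'
# eval "$CHECK"
echo "$CHECK"
```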
Was having this exact issue sporadically on an M1 MacBook Air: ERROR: could not read block and Bad Address in various permutations.
I read on a Postgres forum that this issue can occur in virtual-machine setups, so I assume this is somehow caused by Rosetta. Even if you're using the Universal version of Postgres, you're likely still running an x86 binary for some adjunct process (e.g. Python in my case).
Regardless, here's what has solved the issue (so far): reindexing the database
Note: you need to reindex from the command line, not using SQL commands. When I attempted to reindex using SQL, I encountered the same Bad Address error over and over, and the reindexing never completed.
When I reindexed using the command line, the process finished, and the Bad Address error has not recurred (knock on wood).
For me, it was just:
reindexdb name_of_database
Took 20-30 minutes for a 12GB DB. Not only am I not getting these errors anymore, but the database seems snappier to boot. I only hope the issue doesn't return with repeated reads/writes/index creation under Rosetta. I'm not sure why this works. Maybe indices created on M1 Macs are prone to corruption? Maybe the indices become corrupted during writes or access under Rosetta?
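For larger databases it may help to reindex with progress output, or one table at a time. A hedged sketch (database and table names are placeholders, and the commands are shown as strings because they need a live server):

```shell
# reindexdb ships with the Postgres client tools; --verbose prints each index
# as it is rebuilt, and --table restricts the rebuild to one table.
CMD_ALL="reindexdb --verbose name_of_database"
CMD_ONE="reindexdb --table=users name_of_database"   # hypothetical table name
# eval "$CMD_ALL"
echo "$CMD_ALL"
```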

Is it possible that something in the Big Sur Beta 11.3 fixed this issue?
I've been having the same issues as OP since installing PostgreSQL 13 using MacPorts on my Mac mini M1 (now on PostgreSQL 13.2).
I would see could not read block errors:
1. Occasionally when running ad hoc queries
2. Always when compiling a book in R Markdown that makes several queries
3. Always when running VACUUM FULL on my main database (there's about 620 GB in the instance on this machine, and the error would be thrown very quickly relative to how long a VACUUM FULL would take)
(My "fix" so far has been to point my Mac to the Ubuntu server I have running in the corner of my office, so no real problem for me.)
But I've managed to do 2 and 3 without the error since upgrading to Big Sur Beta 11.3 today (both failed immediately prior to upgrading). Is it possible that something in the OS fixed this issue?

I restored postgresql.conf from postgresql.conf.sample (and restarted the DB server), and it has worked fine since.
To be clear, I had tried both the wal_buffers and max_worker_processes changes here and they didn't help. I discovered this accidentally: I had tried so many things that I just needed to go back to a clean slate. I did not reinitialize the whole database or anything like that, just the config file.
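The reset can be sketched like this. The paths are assumptions (a Homebrew-style data directory, and the sample file from Postgres's share directory; locations vary by install), and the commands touching a live install are commented:

```shell
# Back up the current config, then replace it with the pristine sample.
DATADIR="${PGDATA:-/usr/local/var/postgres}"
SAMPLE="$(pg_config --sharedir 2>/dev/null)/postgresql.conf.sample"
# cp "$DATADIR/postgresql.conf" "$DATADIR/postgresql.conf.bak"
# cp "$SAMPLE" "$DATADIR/postgresql.conf"
# brew services restart postgresql
```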

Related

Why is pgAdmin 4 so slow?

The pgAdmin 4 GUI for PostgreSQL is very slow. It takes too much time to even expand a server tree or a database tree; each took almost 30 seconds to expand. It also hangs while creating a new database or table. Even after loading, it took more than a minute just to create and save a new database. This happens almost every time I load pgAdmin. Is this problem just on my machine, or is something wrong?
My system specifications: PostgreSQL 12.3, Firefox 77.0, Windows 10 64-bit, 8th Gen Quad Core i5 8250u processor, 8GB RAM and 2GB dedicated graphics memory.
In the picture you can see:
The database tree is still loading.
The right-click menu and the Create Database window hung.
It hung on clicking Save; it took more than a minute to save a new database.
I had the very same issue. It seems to be related to Windows 10 preferring IPv6 over IPv4.
The following fix worked for me:
https://dba.stackexchange.com/questions/201646/
Modify the listen_addresses setting in the postgresql.conf file, usually located under {installation folder}/data/postgresql.conf (unless another data folder is specified), to:
listen_addresses = '127.0.0.1,::1'
The default value for listen_addresses is localhost.
Click the title bar of pgAdmin and drag it around (slowly, otherwise other windows will minimize). I don't know how, but it works for me; everything loads faster.
I had the same problem. It turns out that the PostgreSQL binary path was pointing to $DIR/../runtime, so I had to set the binary path manually and restart pgAdmin; after that, everything worked as expected.
pgAdmin 4 is just a frontend for PostgreSQL; there are around 25 frontends like this that you can use for Postgres. I personally use DBeaver for PostgreSQL. I also had many issues with pgAdmin 4: it's super slow and will give you trouble if you're working with Node.js.
Install DBeaver, go to "New Database Connection", and connect with PostgreSQL.
Thank you
FILE >> PREFERENCES >> BINARY PATHS:
I moved the '$DIR/runtime' path from Pg14 to Pg13.
The GUI is much faster now.
I also had the same issue when I installed pgAdmin 4: server-tree loading and query execution took longer than in older versions.
I tried changing many things, but the thing that worked was changing the binary path. I had originally added the binary path of the PostgreSQL bin folder, i.e. C:\Program Files\PostgreSQL\14\bin, for PostgreSQL 14. I changed that and added the path to PostgreSQL 13 instead.
Open the Preferences dialog box from 'File' in the menu bar, then go to the Binary Paths tab; it should look similar to this image.
PostgreSQL 14 is the latest version and should work, but I don't know why it doesn't. Maybe the developers will solve this issue in the future; for now, I hope this solution helps you.
Programming languages are not made equal. Python is four times slower than C++ or Java when no efficient external libraries are used.
The current pgAdmin 4 has been rewritten in Python, as you can check on the pgadmin4 GitHub and the pgAdmin download page. The older pgAdmin 3 was written in C++, as you can check in its source code. Part of the user interface has also moved to JavaScript.
Each programming language also comes with its own default data structures. A move to a different language can mean that less effective structures are used, and this can cause a manyfold decrease in performance if great care is not taken.
These changes could explain why pgAdmin 4 is slow.

DB synchronization in Visual Studio 2015 hangs

I tried to sync database on Visual Studio 2015 after creating a project, EDT, Enum and a Table in order to create a new screen on Dynamics 365.
When I try to synchronize it, it stops in the middle of the schema-checking process. Though the DB synchronization seems fine for the first few minutes, it always stops during this process, as I describe below.
Log Details:
"Schema has not changed between new table 'DPT_TableDT' and old table
'DPT_TableDT' with table id '3997'. Returning from
ManagedSyncTableWorker.ExecuteModifyTable() Syncing Table Finished:
DPT_TableDT. Time elapsed: 0:00:00:00.0010010"
Could you tell me how to solve this issue?
Thanks in advance.
Full database synchronization log
DB Sync Log
From what you've described and also shown in your screenshot, this does not look like an error but is simply describing X++ and Dynamics AX/365FO behaviour.
When you say that it "doesn't have a problem for the first few minutes", I'm guessing you're just not being patient enough. Full database syncs should generally take between 10 and 30 minutes, but can take less or more time depending on a variety of factors, such as how much horsepower your development environment has, how many changes are being sync'd, etc. I would wait at least one hour before considering the possibility that the sync engine has errors (or even run it overnight and see what information it has for you in the morning).
The message you've posted from the log ("Schema has not changed") isn't an error message; it is just an informational log from the sync engine. It is simply letting you know that the table did not have any changes to propagate to SQL Server.
Solution: Run the sync overnight and post a screenshot of the results or the error list window in Visual Studio.
I've recently been stymied by a long running application where Access v2003 replicas refused to synchronize. The message returned was "not enough memory". This was on machines running Windows 10. The only way I was able to force synchronizing was to move the replicas onto an old machine still running Windows 98 with Office XP, which allowed synchronizing and conflict resolution. When I moved the synchronized files back to the Windows 10 machine they still would not synchronize.
I finally had to create a blank database and link to a replica, then use make-table queries to select only data fields to create new tables. I was then able to create new replicas that would synchronize.
From this I've come to suspect the following:
Something in Windows 10 has changed and caused the problem with synchronizing/conflict resolution.
Something in the hidden/protected fields added to the replica sets is seen as a problem under Windows 10 that is not a problem under Windows 98.
One thing I noticed is that over the years the number of replicas in the synchronizing list had grown to over 900 sets, but the only way to clear the table was to create a new clean database.

Mac w/PostgreSQL flush/empty cache for performance tuning

This question is going to be a bit specific because I have tried A LOT of things out there and none of it has worked for me. I'm hoping someone out there might have another idea.
I am working with PostgreSQL on a Mac (OS High Sierra) and I am trying to improve the performance for generating a materialized view, but can't compare my changes anymore because it seems PostgreSQL has cached the materialized view. It used to take ~12 minutes to generate the materialized view, and now it's taking less than 10 seconds (same code, I reverted the changes).
I used EXPLAIN (ANALYZE, BUFFERS) to confirm that almost all of the data getting fetched by the query to generate the materialized view is a hit (cached), and there were almost no disk reads.
I do not know if the information is cached in PostgreSQL's shared buffers or in the OS cache because at this point I've done things that I thought would have cleared both.
Here is what I have tried for emptying the PostgreSQL cache:
Restarted PostgreSQL server using brew services stop postgres, and then brew services start postgres (also tried calling sync && sudo purge in between). I confirmed with top as well as grep that postgres was no longer running.
Used DISCARD ALL, as well as DISCARD with its other options.
Set the shared_buffers setting in postgresql.conf to the minimum (128k).
Installed, compiled, and used pg_dropcache.
I looked at pg_ctl for a bit but I'll admit I couldn't figure out how to use it. I got the error no database directory specified and environment variable PGDATA unset, and I am not sure what to set the -D/pgdata option to for my case.
VACUUM. I know this shouldn't have had an effect, but I tried it anyway.
Here is what I have tried for emptying the operating system's cache:
Restarted computer.
Emptied ~/Library/Caches and /Library/Caches.
sync && sudo purge as well as sync && purge.
Booted up in Safe Mode.
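The restart-and-purge steps from the two lists above can be sketched together as one cycle, assuming a Homebrew install on macOS (the live commands are left commented so the sketch is safe to inspect first):

```shell
# Stop Postgres (drops shared_buffers), purge the OS page cache, restart.
STOP="brew services stop postgres"
PURGE="sync && sudo purge"
START="brew services start postgres"
# eval "$STOP" && eval "$PURGE" && eval "$START"
printf '%s\n%s\n%s\n' "$STOP" "$PURGE" "$START"
```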
I have also tried a few other things that I thought would force PostgreSQL to generate the materialized view from scratch (these would have been fine since I only need to test performance in dev for now):
Cloned the main table used in the materialized view, and generated the materialized view from the clone. It still generated within 10 seconds.
Scrambled some column values (first_name, last_name, mem_id (not the primary key)). It still generated within 10 seconds (and the materialized view was generated correctly with the newly scrambled values).
I am stuck and do not know what to try anymore. Any ideas/help would be appreciated!
Rebooting your computer clears both of the caches (unless you use something like autoprewarm from pg_prewarm, but that code has not been released yet). If the reboot doesn't cause the problem to reappear, then you have either fixed the problem permanently or didn't correctly understand it in the first place.
One possibility is that an ANALYZE (either manual, or auto) fixed some outdated statistics which was causing a poor plan to be used by the materialized view refresh. Another possibility is that a VACUUM means that now index-only scans no longer have to access the table pages, because they are marked as all-visible. If either of these is the case, and if you wanted to recreate the problem for some reason, you would have to restore the database to the state before VACUUM or ANALYZE was run.
EXPLAIN (ANALYZE, BUFFERS) only knows about shared_buffers. If something is a hit in the OS cache only, it will still be reported as a miss by EXPLAIN (ANALYZE, BUFFERS). If you freshly restarted PostgreSQL and the very first query run shows mostly buffer hits and only a few misses, that indicates your query is hitting the same buffers over and over again. This is common in index-only scans, for example, because for every row it consults one of just a handful of visibility map pages.
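The distinction above can be checked like this; the database and table names are placeholders, and the query is shown as a string since it needs a live server:

```shell
# In the plan's "Buffers:" lines, "shared hit" counts pages found in
# shared_buffers, while "read" counts pages fetched from outside it, even
# when the OS page cache supplies them without touching disk.
QUERY="EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM my_table;"
# psql -d mydb -c "$QUERY"
echo "$QUERY"
```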

Is there a way to cap the file size of slony log shipping files?

I am working with a SuSE machine (cat /etc/issue: SUSE Linux Enterprise Server 11 SP1 (i586)) running Postgresql 8.1.3 and the Slony-I replication system (slon version 1.1.5). We have a working replication setup going between two databases on this server, which is generating log shipping files to be sent to the remote machines we are tasked to maintain. As of this morning, we ran into a problem with this.
For a while now, we've had strange memory problems on this machine - the oom-killer seems to be striking even when there is plenty of free memory left. That has set the stage for our current issue to occur - we ran a massive update on our system last night, while replication was turned off. Now, as things currently stand, we cannot replicate the changes out - slony is attempting to compile all the changes into a single massive log file, and after about half an hour or so of running, it trips over the oom-killer issue, which appears to restart the replication package. Since it is constantly trying to rebuild that same package, it never gets anywhere.
My first question is this: Is there a way to cap the size of Slony log shipping files, so that it writes out no more than 'X' bytes (or K, or Meg, etc.) and after going over that size, closes the current log shipping file and starts a new one? We've been able to hit about four megs in size before the oom-killer hits with fair regularity, so if I could cap it there, I could at least start generating the smaller files and hopefully eventually get through this.
My second question, I guess, is this: Does anyone have a better solution for this issue than the one I'm asking about? It's quite possible I'm getting tunnel vision looking at the problem, and all I really need is -a- solution, not necessarily -my- solution.

Core Data: Updating max pk failed

I have a cocoa app which uses core data. Everything seems to be working fine.
However, in a very specific scenario the app was behaving very strangely for our client.
In particular the logs shows this appearing in the output many times (which I've never seen in my testing):
Core Data: annotation: -executeRequest: encountered exception = Updating max pk failed: with userInfo = {
NSSQLiteErrorDomain = 14;
}
Has anyone ever seen this message and do you know what it means? I've tried googling it but found no information other than a few message boards regarding the Growl app having similar problems, with no solution yet available.
Sorry that I can't be more specific regarding what causes this as I'm not even sure myself. I know how to reproduce this on the client's machine but this message seems very random.
I was hoping someone could give me some more information as to what this error means exactly so that I can maybe narrow it down some more. Right now I'm pretty clueless.
Note: This appears on a macbook pro running 10.7.2 (if that matters).
Thanks for any kind of help you can provide, even something vague would help me at this point.
Update:
The managed context "save" method also fails with the following error:
The operation couldn’t be completed. (Cocoa error 134030.)
This is not really a Core Data problem as such, but more an issue of your process running out of file descriptors. (SQLite error 14 is SQLITE_CANTOPEN, i.e. the database file could not be opened.)
Each process has a limited number of file descriptors. If you run out, Core Data (and many other things) will stop working, because they can no longer open files, and they'll fail.
First of all, make sure you're not leaking file descriptors, i.e. make sure you close files when you no longer need them.
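A quick way to check whether descriptors are leaking (the PID is a placeholder, and the lsof line is commented since it targets a running app):

```shell
# Show the soft limit on open files for new processes, then count how many
# descriptors a given process currently holds.
ulimit -n
# lsof -p <PID> | wc -l    # if this climbs toward the limit, you're leaking
```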
I'm not sure what kind of changes you're trying to track. Take a look at Tracking File-System Changes.
If you're on 10.7, take a look at dispatch sources and DISPATCH_SOURCE_TYPE_VNODE for a very powerful tool to track file system changes (corresponds to kqueue, but is easier to use).
Core Data also gives this error in a sandboxed app when it tries to save the DB to a location it doesn't have full read/write access to (if a user opens a file, for example, Core Data will be able to read/write that file, but not anything else in the same folder).
Core Data then fails to write the temporary _journal file to this folder and reports this error.
