Magento 2 going down for no reason

I have a server with Magento 2 installed, and the website goes down randomly for no apparent reason. When it goes down, I restart the server and the website comes back up. I've been running the top command, and it shows no sign of unusual resource consumption while the website is going down.
It is an AWS instance with 8 cores and 32 GB of RAM, using Provisioned IOPS SSD storage.
I'm completely dumbfounded by this. I've already cleared all the Magento logs and the database logs to free up space, because I was running low on disk space.
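For what it's worth, the next time it goes down I plan to check whether the kernel's OOM killer or a crashed service is responsible before restarting (a rough sketch; service names assume a typical Linux setup):
# check whether the kernel killed a process for lack of memory
dmesg -T | grep -iE 'oom|out of memory|killed process'
# and whether the web/PHP/DB services are actually still up
systemctl status nginx php-fpm mysql    # service names vary by distro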

Related

Unexplained memory usage on Azure Windows App Service Plan - Drill down missing

We have a memory problem with our Azure Windows App Service Plan (service level is P1v3 with 1 instance – this means 8 GB memory).
We are running two small .NET 6 App Services on it (some web APIs) that use custom containers – without problems.
They’re not in production and receive a very low number of requests.
However, when looking at the service plan's memory usage in Diagnose and Solve Problems / Memory Analysis, we consistently see an unexplained memory usage of around 80%.
And the real problem occurs when we try to start a third App Service on the plan. We get this "out of memory" error in our log stream:
ERROR - Site: app-name-dev - Unable to start container.
Error message: Docker API responded with status code=InternalServerError,
response={"message":"hcsshim::CreateComputeSystem xxxx:
The paging file is too small for this operation to complete."}
So it looks like Docker doesn't have enough memory to start the container. Maybe because of the 80% memory usage?
But our apps actually have very low memory needs. When running them locally on dev machines, we see about 50-150 MB memory usage (when no requests occur).
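For reference, this is roughly how we measure that locally (a sketch; both containers are running on a dev machine):
# one-off snapshot of memory/CPU usage for the running containers
docker stats --no-stream
# idle, both apps sit somewhere around 50-150 MB each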
In Azure, the private bytes graph in "Availability and Performance" shows very moderate consumption for the larger of the two apps.
Unfortunately, the "Memory drill down" view is unavailable.
(Needless to say, waiting hours doesn't change the message…)
Even more strange: after stopping all App Services in the App Service Plan, the plan still shows a Memory Percentage of 60%.
Obviously some memory is being retained by something...
So the questions are:
Is it normal to have a 60% memory percentage in an App Service Plan with no App Services running?
If not, could this be due to a memory leak in our app? But App Services are run in supposedly isolated containers, so I'm not sure this is possible. Any other explanation is of course welcome :-)
Why can't we access the memory drill down?
Any tips on the best way to fit "small" Docker containers with low memory usage into Azure App Service? (Or maybe into another Azure resource type...) It's a bit frustrating to be able to use only 3 GB out of an 8 GB machine...
Further details:
The first app is .NET 6-based, with its Docker image based on aspnet:6.0-nanoserver-ltsc2022
The second app is also .NET 6-based, but has some Windows DLL dependencies, and is therefore based on aspnet:6.0-windowsservercore-ltsc2022
Thanks in advance!
EDIT:
I added more details and changed the questions a bit since I was able to stop all app services tonight.

Magento Performance slow even with one user only on EC2 instance and RDS

I currently have a Magento website on a dedicated server.
I'm not really happy with the TTFB (around 2.5 s on the home page).
I have only 3,000 visitors per day, 15,000 pages/day.
I have 30,000 products, 1 store, 1 language, 1 currency.
I thought I could give Amazon EC2/RDS a try, so I copied my website to a new EC2 instance (web server) + 1 RDS instance for MySQL, for testing purposes.
I started with a small one (t2.small), same for RDS.
I set up nginx + php5-fpm and imported my Magento (files + DB).
I was very surprised: around 5 s for TTFB!!
And that's for 1 user (myself!) accessing only the home page; I never went anywhere else.
I get the same poor TTFB when I access a CMS page that only displays the newsletter form (plus header and footer), with no products displayed at all.
I migrated to a better RDS (db.r3.large, 2 CPUs, 15 GB, 110 GB SSD), and it was still 5 s.
Still a lot.
I upgraded the EC2 instance too (c3.2xlarge, 8 CPUs, 15 GB).
Now it is 3.5s, still more than my current dedicated server, with 1 user only.
I know there are options like optimizing the Magento code, but my point is more about why an EC2 instance with only 1 user connected performs worse than my current prod server (which is only 8 CPUs / 8 GB, so half the size, and it hosts everything, even the DB, while my EC2 setup has 2 servers!).
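For what it's worth, here is roughly how I measure the TTFB on both servers (a sketch; the URL is a placeholder):
# time to first byte and total time for the home page, bypassing the browser
curl -o /dev/null -s -w 'TTFB: %{time_starttransfer}s  total: %{time_total}s\n' https://www.example.com/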
My nginx vhosts have been configured the same as on my prod server (but nginx.conf might differ).
Same with redis.
BTW, on EC2 I noticed zero difference with or without Redis (same TTFB); I assume Redis makes a difference when there are a lot of users and a lot of cached entries. (I'm sure Redis is working, because I can see keys being added, and the var/cache folder remains empty once it's enabled.)
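This is roughly how I check that Redis is really being used (a sketch, assuming the default Redis database from the cache configuration):
# how many keys Magento has written into each Redis database
redis-cli info keyspace
# watch commands arrive live while reloading the home page
redis-cli monitor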
I haven't tried to optimize the MySQL config; I'm assuming the AWS/RDS defaults are good enough.
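One sanity check I could still do on the RDS side (a sketch; the endpoint and user are placeholders) is to confirm how much memory InnoDB is actually allowed to use:
# innodb_buffer_pool_size should be a large fraction of the RDS instance's RAM
mysql -h mydb.xxxxxx.eu-west-1.rds.amazonaws.com -u admin -p \
  -e "SHOW VARIABLES LIKE 'innodb_buffer_pool_size';"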
From the monitoring I can read (last 1 hour):
RDS: ReadIOPS peak: 1.5 - WriteIOPS peak: 1.15 - CPU max: 0.80%
EC2: CPU max: 1.5%
Everything seems to be idle, so I'm wondering whether I'm missing something important. Maybe I can't expect a better TTFB simply because EC2 instances are shared (and not reserved)? Does AWS add some latency somewhere because of that?
EDIT 1:
I just upgraded RDS to db.r3.2xlarge (8 CPUs, 64 GB); now it's 3.1 s to display the home page (2.8 s for the newsletter CMS page). Still more than my server...
EDIT 2:
I just upgraded the RDS SSD to Provisioned IOPS instead of General Purpose. Now it's 3.0 s, so no improvement...
Can you confirm that these servers are really oversized for my website (1 user accessing the home page only!!)?
EDIT 3:
Now I have 0.8 s :-) Thanks to the amazing AOE_Profile I found the bottleneck: Cmsmart_Megamenu. I have more than 100 categories and it does something strange: 127 queries per category!! That's more than 10,000 queries (on every page)! The 127 queries per category are almost identical; this one, for example, is repeated 127 times:
SELECT main_table.* FROM admin_menutop AS main_table WHERE (category_id = '356')
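For anyone who wants to reproduce the diagnosis without a profiler (a rough sketch, assuming a default Magento 1 install): Magento's DB adapter can log every query itself if you temporarily set $_debug and $_logAllQueries to true in lib/Varien/Db/Adapter/Pdo/Mysql.php, then:
# load the home page once, then count how often the menu table is hit
grep -c 'admin_menutop' var/debug/pdo_mysql.log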
Actually, this does NOT change anything about this topic; it's even more important now to get help. I was not looking to fix that issue (that could be another topic). I still wonder why the EC2 setup performs worse; in fact, I will keep this module in place on the EC2 server until I understand why it performs worse than my less powerful current prod server (both have this DB/CPU-consuming module).
Could it be that because my prod server always has several visitors connected, the MySQL caching performs better, while on EC2 the cache is emptied when all active connections are closed, so it has to hit the DB?
This is the type of hint I'm looking for :-)
Thanks
I'm having the same issue with the admin_menutop table (I'm using Cmsmart_Megamenu too); New Relic told me about the hundreds of queries per category as well.
You said it was the bottleneck, but didn't mention what the fix was :). Can you please share that with me?

High memory and CPU consumption for rails application on google cloud

I have a Compute Engine instance on Google Cloud with a 4-core Ivy Bridge CPU and 15 GB RAM, and on it I have deployed my Rails application.
Before this I had hosted my Rails application on DigitalOcean, and there I was getting good throughput; the CPU and memory consumption were also minimal.
It never crossed 3 GB memory consumption on DigitalOcean, and the CPU consumption maxed out at around 50% - 55%.
On DigitalOcean I had a single instance with a 4-core CPU and 8 GB RAM, and even though I was running MySQL, Redis and Sidekiq on the same instance, it could still handle the load easily.
But as I moved to Google Cloud, I started facing problems with the same code.
Actually, I was expecting more throughput from Google Cloud, as Google has data centers in Asia, but instead I started facing issues.
When I restart Apache everything comes back to normal, and then after 2 - 3 hours it goes back to consuming memory and CPU until the instance finally stops responding to requests.
I checked the logs... and there is no real increase in traffic; I also checked the logs during the high-load periods to see whether someone was attacking the servers.
But all the requests I found are from valid browsers with valid user agents.
I don't understand why this is happening.
At first I thought it might be a DDoS/DoS attack, but I didn't find anything suspicious in the logs (Apache access logs and Rails logs).
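In case it helps, this is roughly what I run when the instance starts degrading (a sketch; log paths assume a default Apache setup on Ubuntu):
# top memory consumers: is it the Rails workers, MySQL, Redis, or something else?
ps aux --sort=-rss | head -n 15
# requests per client IP, to rule out a single abusive client
awk '{print $1}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head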
Please help me.
Hoping for some good solution that I can try and debug the issue.
Thanks :)

Coldfusion CFC creation timeouts

After searching to no end, as well as countless hours of trial and error with different settings, I've come up completely empty on why my server is performing so slowly.
Here are the basics. I've switched hosting from a local server (CF8 running on Ubuntu) to a better-equipped hosting company (CF10 running on Windows Server 2008). Both servers ran Xeon processors. My old Linux server ran on 8 GB RAM; Windows is running on 9 GB. Both are running 64-bit. The problem I am having is with a very simple task: initial CFC creation.
I have a custom-built CMS that runs 2 sets of CFCs (application and session scoped). For the general public, only application-scoped CFCs are created; when a user logs into the site, additional session-scoped CFCs are created (anywhere from 8 to 16, depending on the number of modules the site contains).
On the linux box this worked great, fast with no issues. However, since switching to the Windows server and CF10, the creation process has become dreadful. When I go to log into a site, authentication is done, and the CFC's are created. When I first log into a site, this process can take anywhere from 15 - 50 seconds. When I log out, the session scope variables are all killed. If I was to log in a 2nd time, within a short period of time, my login time runs about 1 - 5 seconds depending on server load.
My initial thinking is that it's a memory allocation issue, but I'm running out of ideas. Here are some of the relevant specs:
JVM - 1.7.0_40
JVM Heap Size 1280 MB
PermSize 256m
Simultaneous request limit 100
CFC request limit 60
cfthread pool size 50
trusted cache is currently off
I've set the worker threads in IIS to 5 per application. Each worker process runs at about 12,000 K (roughly 12 MB).
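For completeness, the corresponding part of ColdFusion 10's jvm.config looks roughly like this (a sketch; the exact GC flag and minimum-heap value are assumptions, as I'm quoting from memory):
# cfusion/bin/jvm.config - heap and perm gen sizes as listed above
java.args=-server -Xms1280m -Xmx1280m -XX:MaxPermSize=256m -XX:+UseParallelGC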
If anyone could help, it would be greatly appreciated.

AWS: EC2 micro, not enough for a .NET MVC 3 application?

I used elastic beanstalk to manage/deploy my .NET MVC 3 application on an EC2 micro instance (has 613 MB memory). It's mainly a static site for now as it is in Beta with registration (including email confirmation) and some error logging (ELMAH).
It was doing fine until recently; now I keep getting notifications of CPU utilization greater than 95.00%.
Is the micro instance with 613MB memory not enough to run an MVC application for Production use?
Added info: Windows Server 2008 R2, running IIS7.5
Thanks!
I've tried running JetBrains TeamCity (which uses Tomcat, I think) on a Linux box using an EC2 micro instance, and there wasn't enough memory available to support what it needed.
I did try running a Server 2008/2012 box on a micro instance as well, and it was pointless; it took minutes to open anything.
I think you're going to find that running Windows on one of those boxes isn't really a viable option unless you start disabling services like crazy and get really creative with your tweaking.
A micro instance is clearly not enough for Production.
The micro instances have a low I/O limit, and once this limit is reached (for the month I think), all later operations are throttled.
So, I advise you to use at least a small instance for production. And keep your micro for your dev/test/preprod environments!
Edit: I got this info from an Amazon guy.
Make sure your load balancer is pinging a blank HTML file. I got that message because it was pinging my home page, which had DB loads. When I set it to ping a blank HTML file, it ran smoothly.
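For example, with a classic ELB the health check can be pointed at a static file instead of the home page (a sketch; the load balancer name and file name are placeholders):
# drop an empty healthcheck.html into the site root, then:
aws elb configure-health-check --load-balancer-name my-app-elb \
  --health-check Target=HTTP:80/healthcheck.html,Interval=30,Timeout=5,UnhealthyThreshold=2,HealthyThreshold=2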
