How fast is the go 1.5 gc with terabytes of RAM?

How fast is the go 1.5 gc with terabytes of RAM? - go

Java cannot use terabytes of RAM because the GC pause is way too long (minutes). With the recent update to the Go GC, I'm wondering if its GC pauses are short enough for use with huge amounts of RAM, such as a couple of terabytes.
Are there any benchmarks of this yet? Can we use a garbage-collected language with this much RAM now?

tl;dr:
You can't use TBs of RAM with a single Go process right now. Max is 512 GB on Linux, and most that I've seen tested is 240 GB.
With the current background GC, GC workload tends to be more important than GC pauses.
You can understand GC workload as pointers * allocation rate / spare RAM. Of apps using tons of RAM, only those with few pointers or little allocation will have a low GC workload.
I agree with inf's comment that huge heaps are worth asking other folks about (or testing). JimB notes that Go heaps have a hard limit of 512 GB right now, and 18 240 GB is the most I've seen tested.
Some things we know about huge heaps, from the design document and the GopherCon 2015 slides:
The 1.5 collector doesn't aim to cut GC work, just cut pauses by working in the background.
Your code is paused while the GC scans pointers on the stack and in globals.
The 1.5 GC has a short pause on a GC benchmark with a roughly 18GB heap, as shown by the rightmost yellow dot along the bottom of this graph from the GopherCon talk:
Folks running a couple production apps that initially had about 300ms pauses reported drops to ~4ms and ~20ms. Another app reported their 95th percentile GC time went from 279ms to ~10ms.
Go 1.6 added polish and pushed some of the remaining work to the background. As a result, tests with heaps up to a bit over 200GB still saw a max pause time of 20ms, as shown in a slide in an early 2016 State of Go talk:
The same application that had 20ms pause times under 1.5 had 3-4ms pauses under 1.6, with about an 8GB heap and 150M allocations/minute.
Twitch, who use Go for their chat service, reported that by Go 1.7 pause times had been reduced to 1ms with lots of running goroutines.
1.8 took stack scanning out of the stop-the-world phase, bringing most pauses well under 1ms, even on large heaps. Early numbers look good. Occasionally applications still have code patterns that make a goroutine hard to pause, effectively lengthening the pause for all other threads, but generally it's fair to say the GC's background work is now usually much more important than GC pauses.
Some general observations on garbage collection, not specific to Go:
The frequency of collections depends on how quickly you use up the RAM you're willing to give to the process.
The amount of work each collection does depends in part on how many pointers are in use.
(That includes the pointers within slices, interface values, strings, etc.)
Rephrased, an application accessing lots of memory might still not have a GC problem if it only has a few pointers (e.g., it handles relatively few large []byte buffers), and collections happen less often if the allocation rate is low (e.g., because you applied sync.Pool to reuse memory wherever you were chewing through RAM most quickly).
So if you're looking at something involving heaps of hundreds of GB that's not naturally GC-friendly, I'd suggest you consider any of
writing in C or such
moving the bulky data out of the object graph. For example, you could manage data in an embedded DB like bolt, put it in an outside DB service, or use something like groupcache or memcache if you want more of a cache than a DB
running a set of smaller-heap'd processes instead of one big one
just carefully prototyping, testing, and optimizing to avoid memory issues.

The new Java ZGC garbage collector can now use 16 Terrabytes of memory and garbage collect in under 10ms.

Related

Why Go can lower GC pauses to sub 1ms and JVM has not?

So there's that: https://groups.google.com/forum/?fromgroups#!topic/golang-dev/Ab1sFeoZg_8:
Today I submitted changes to the garbage collector that make typical worst-case stop-the-world times less than 100 microseconds. This should particularly improve pauses for applications with many active goroutines, which could previously inflate pause times significantly.
High GC pauses are one of the things JVM users struggle with for a long time.
What are the (architectural?) constraints which prevent JVM from lowering GC pauses to Go levels, but are not affecting Go?

2021 Update: With OpenJDK 16 ZGC now has a max pause time of <1ms and average pause times 50µs
It achieves these goals while still performing compaction, unlike Go's collector.
Update: With OpenJDK 17 Shenandoah exploits the same techniques introduced by ZGC and achieves similar results.
What are the (architectural?) constraints which prevent JVM from lowering GC pauses to golang levels
There aren't any fundamental ones as low-pause GCs have existed for a while (see below). So this may be more a difference of impressions either from historic experience or out-of-the-box configuration rather than what is possible.
High GC pauses are one if the things JVM users struggle with for a long time.
A little googling shows that similar solutions are available for java too
Azul offers a pauseless collector that scales even to 100GB+
Redhat is contributing shenandoah to openjdk and oracle zgc.
IBM offers metronome, also aiming for microsecond pause times
various other realtime JVMs
The other collectors in openjdk are, unlike Go's, compacting generational collectors. That is to avoid fragmentation problems and to provide higher throughput on server-class machines with large heaps by enabling bump pointer allocation and reducing the CPU time spent in GC. And at least under good conditions CMS can achieve single-digit millisecond pauses, despite being paired with a moving young-generation collector.
Go's collector is non-generational, non-compacting and requires write barriers (see this other SO question), which results in lower throughput/more CPU overhead for collections, higher memory footprint (due to fragmentation and needing more headroom) and less cache-efficient placement of objects on the heap (non-compact memory layout).
So GoGC is mostly optimized for pause time while staying relatively simple (by GC standards) at the expense of several other performance and scalability goals.
JVM GCs make different tradeoffs. The older ones often focused on throughput. The more recent ones achieve low pause times and several other goals at the expense of higher complexity.

According to this presentation, Getting to Go: The Journey of Go's Garbage Collector, the Go collectors only utilize half of the heap for live data:
Heap 2X live heap
My impression is that Java GCs generally aim for higher heap utilization, so they make a very different trade-off here.

How can too much Virtual Memory negatively impact performance? [duplicate]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
Does anyone have experience with using very large heaps, 12 GB or higher in Java?
Does the GC make the program unusable?
What GC params do you use?
Which JVM, Sun or BEA would be better suited for this?
Which platform, Linux or Windows, performs better under such conditions?
In the case of Windows is there any performance difference to be had between 64 bit Vista and XP under such high memory loads?

If your application is not interactive, and GC pauses are not an issue for you, there shouldn't be any problem for 64-bit Java to handle very large heaps, even in hundreds of GBs. We also haven't noticed any stability issues on either Windows or Linux.
However, when you need to keep GC pauses low, things get really nasty:
Forget the default throughput, stop-the-world GC. It will pause you application for several tens of seconds for moderate heaps (< ~30 GB) and several minutes for large ones (> ~30 GB). And buying faster DIMMs won't help.
The best bet is probably the CMS collector, enabled by -XX:+UseConcMarkSweepGC. The CMS garbage collector stops the application only for the initial marking phase and remarking phases. For very small heaps like < 4 GB this is usually not a problem, but for an application that creates a lot of garbage and a large heap, the remarking phase can take quite a long time - usually much less then full stop-the-world, but still can be a problem for very large heaps.
When the CMS garbage collector is not fast enough to finish operation before the tenured generation fills up, it falls back to standard stop-the-world GC. Expect ~30 or more second long pauses for heaps of size 16 GB. You can try to avoid this keeping the long-lived garbage production rate of you application as low as possible. Note that the higher the number of the cores running your application is, the bigger is getting this problem, because the CMS utilizes only one core. Obviously, beware there is no guarantee the CMS does not fall back to the STW collector. And when it does, it usually happens at the peak loads, and your application is dead for several seconds. You would probably not want to sign an SLA for such a configuration.
Well, there is that new G1 thing. It is theoretically designed to avoid the problems with CMS, but we have tried it and observed that:
Its throughput is worse than that of CMS.
It theoretically should avoid collecting the popular blocks of memory first, however it soon reaches a state where almost all blocks are "popular", and the assumptions it is based on simply stop working.
Finally, the stop-the-world fallback still exists for G1; ask Oracle, when that code is supposed to be run. If they say "never", ask them, why the code is there. So IMHO G1 really doesn't make the huge heap problem of Java go away, it only makes it (arguably) a little smaller.
If you have bucks for a big server with big memory, you have probably also bucks for a good, commercial hardware accelerated, pauseless GC technology, like the one offered by Azul. We have one of their servers with 384 GB RAM and it really works fine - no pauses, 0-lines of stop-the-world code in the GC.
Write the damn part of your application that requires lots of memory in C++, like LinkedIn did with social graph processing. You still won't avoid all the problems by doing this (e.g. heap fragmentation), but it would be definitely easier to keep the pauses low.

I am CEO of Azul Systems so I am obviously biased in my opinion on this topic! :) That being said...
Azul's CTO, Gil Tene, has a nice overview of the problems associated with Garbage Collection and a review of various solutions in his Understanding Java Garbage Collection and What You Can Do about It presentation, and there's additional detail in this article: http://www.infoq.com/articles/azul_gc_in_detail.
Azul's C4 Garbage Collector in our Zing JVM is both parallel and concurrent, and uses the same GC mechanism for both the new and old generations, working concurrently and compacting in both cases. Most importantly, C4 has no stop-the-world fall back. All compaction is performed concurrently with the running application. We have customers running very large (hundreds of GBytes) with worse case GC pause times of <10 msec, and depending on the application often times less than 1-2 msec.
The problem with CMS and G1 is that at some point Java heap memory must be compacted, and both of those garbage collectors stop-the-world/STW (i.e. pause the application) to perform compaction. So while CMS and G1 can push out STW pauses, they don't eliminate them. Azul's C4, however, does completely eliminate STW pauses and that's why Zing has such low GC pauses even for gigantic heap sizes.

We have an application that we allocate 12-16 Gb for but it really only reaches 8-10 during normal operation. We use the Sun JVM (tried IBMs and it was a bit of a disaster but that just might have been ignorance on our part...I have friends that swear by it--that work at IBM). As long as you give your app breathing room, the JVM can handle large heap sizes with not too much GC. Plenty of 'extra' memory is key.
Linux is almost always more stable than Windows and when it is not stable it is a hell of a lot easier to figure out why. Solaris is rock solid as well and you get DTrace too :)
With these kind of loads, why on earth would you be using Vista or XP? You are just asking for trouble.
We don't do anything fancy with the GC params. We do set the minimum allocation to be equal to the maximum so it is not constantly trying to resize but that is it.

I have used over 60 GB heap sizes on two different applications under Linux and Solaris respectively using 64-bit versions (obviously) of the Sun 1.6 JVM.
I never encountered garbage collection problems with the Linux-based application except when pushing up near the heap size limit. To avoid the thrashing problems inherent to that scenario (too much time spent doing garbage collection), I simply optimized memory usage throughout the program so that peak usage was about 5-10% below a 64 GB heap size limit.
With a different application running under Solaris, however, I encountered significant garbage-collection problems which made it necessary to do a lot of tweaking. This consisted primarily of three steps:
Enabling/forcing use of the parallel garbage collector via the -XX:+UseParallelGC -XX:+UseParallelOldGC JVM options, as well as controlling the number of GC threads used via the -XX:ParallelGCThreads option. See "Java SE 6 HotSpot Virtual Machine Garbage Collection Tuning" for more details.
Extensive and seemingly ridiculous setting of local variables to "null" after they are no longer needed. Most of these were variables that should have been eligible for garbage collection after going out of scope, and they were not memory leak situations since the references were not copied. However, this "hand-holding" strategy to aid garbage collection was inexplicably necessary for some reason for this application under the Solaris platform in question.
Selective use of the System.gc() method call in key code sections after extensive periods of temporary object allocation. I'm aware of the standard caveats against using these calls, and the argument that they should normally be unnecessary, but I found them to be critical in taming garbage collection when running this memory-intensive application.
The three above steps made it feasible to keep this application contained and running productively at around 60 GB heap usage instead of growing out of control up into the 128 GB heap size limit that was in place. The parallel garbage collector in particular was very helpful since major garbage-collection cycles are expensive when there are a lot of objects, i.e., the time required for major garbage collection is a function of the number of objects in the heap.
I cannot comment on other platform-specific issues at this scale, nor have I used non-Sun (Oracle) JVMs.

12Gb should be no problem with a decent JVM implementation such as Sun's Hotspot.
I would advice you to use the Concurrent Mark and Sweep colllector ( -XX:+UseConcMarkSweepGC) when using a SUN VM.Otherwies you may face long "stop the world" phases, were all threads are stopped during a GC.
The OS should not make a big difference for the GC performance.
You will need of course a 64 bit OS and a machine with enough physical RAM.

I recommend also considering taking a heap dump and see where memory usage can be improved in your app and analyzing the dump in something such as Eclipse's MAT . There are a few articles on the MAT page on getting started in looking for memory leaks. You can use jmap to obtain the dump with something such as ...
jmap -heap:format=b pid

As mentioned above, if you have a non-interactive program, the default (compacting) garbage collector (GC) should work well. If you have an interactive program, and you (1) don't allocate memory faster than the GC can keep up, and (2) don't create temporary objects (or collections of objects) that are too big (relative to the total maximum JVM memory) for the GC to work around, then CMS is for you.
You run into trouble if you have an interactive program where the GC doesn't have enough breathing room. That's true regardless of how much memory you have, but the more memory you have, the worse it gets. That's because when you get too low on memory, CMS will run out of memory, whereas the compacting GCs (including G1) will pause everything until all the memory has been checked for garbage. This stop-the-world pause gets bigger the more memory you have. Trust me, you don't want your servlets to pause for over a minute. I wrote a detailed StackOverflow answer about these pauses in G1.
Since then, my company has switched to Azul Zing. It still can't handle the case where your app really needs more memory than you've got, but up until that very moment it runs like a dream.
But, of course, Zing isn't free and its special sauce is patented. If you have far more time than money, try rewriting your app to use a cluster of JVMs.
On the horizon, Oracle is working on a high-performance GC for multi-gigabyte heaps. However, as of today that's not an option.

If you switch to 64-bit you will use more memory. Pointers become 8 bytes instead of 4. If you are creating lots of objects this can be noticeable seeing as every object is a reference (pointer).
I have recently allocated 15GB of memory in Java using the Sun 1.6 JVM with no problems. Though it is all only allocated once. Not much more memory is allocated or released after the initial amount. This was on a Linux but I imagine the Sun JVM will work just as well on 64-bit Windows.

You should try running visualgc against your app. It´s a heap visualization tool that´s part of the jvmstat download at http://java.sun.com/performance/jvmstat/
It is a lot easier than reading GC logs.
It quickly helps you understand how the parts (generations) of the heap are working. While your total heap may be 10GB, the various parts of the heap will be much smaller. GCs in the Eden portion of the heap are relatively cheap, while full GCs in the old generation are expensive. Sizing your heap so that that the Eden is large and the old generation is hardly ever touched is a good strategy. This may result in a very large overall heap, but what the heck, if the JVM never touches the page, it´s just a virtual page, and doesn´t have to take up RAM.

A couple of years ago, I compared JRockit and the Sun JVM for a 12G heap. JRockit won, and Linux hugepages support made our test run 20% faster. YMMV as our test was very processor/memory intensive and was primarily single-threaded.

here's an article on gc FROM one of Java Champions --
http://kirk.blog-city.com/is_your_concurrent_collector_failing_you.htm
Kirk, the author writes
"Send me your GC logs
I'm currently interested in studying Sun JVM produced GC logs. Since these logs contain no business relevent information it should be ease concerns about protecting proriatary information. All I ask that with the log you mention the OS, complete version information for the JRE, and any heap/gc related command line switches that you have set. I'd also like to know if you are running Grails/Groovey, JRuby, Scala or something other than or along side Java. The best setting is -Xloggc:. Please be aware that this log does not roll over when it reaches your OS size limit. If I find anything interesting I'll be happy to give you a very quick synopsis in return. "

An article from Sun on Java 6 can help you: https://www.oracle.com/java/technologies/javase/troubleshooting-javase.html

The max memory that XP can address is 4 gig(here). So you may not want to use XP for that(use a 64 bit os).

sun has had an itanium 64-bit jvm for a while although itanium is not a popular destination. The solaris and linux 64-bit JVMs should be what you should be after.
Some questions
1) is your application stable ?
2) have you already tested the app in a 32 bit JVM ?
3) is it OK to run multiple JVMs on the same box ?
I would expect the 64-bit OS from windows to get stable in about a year or so but until then, solaris/linux might be better bet.

How to gain control of a 5GB heap in Haskell?

Currently I'm experimenting with a little Haskell web-server written in Snap that loads and makes available to the client a lot of data. And I have a very, very hard time gaining control over the server process. At random moments the process uses a lot of CPU for seconds to minutes and becomes irresponsive to client requests. Sometimes memory usage spikes (and sometimes drops) hundreds of megabytes within seconds.
Hopefully someone has more experience with long running Haskell processes that use lots of memory and can give me some pointers to make the thing more stable. I've been debugging the thing for days now and I'm starting to get a bit desperate here.
A little overview of my setup:
On server startup I read about 5 gigabytes of data into a big (nested) Data.Map-alike structure in memory. The nested map is value strict and all values inside the map are of datatypes with all their field made strict as well. I've put a lot of time in ensuring no unevaluated thunks are left. The import (depending on my system load) takes around 5-30 minutes. The strange thing is the fluctuation in consecutive runs is way bigger than I would expect, but that's a different problem.
The big data structure lives inside a 'TVar' that is shared by all client threads spawned by the Snap server. Clients can request arbitrary parts of the data using a small query language. The amount of data request usually is small (upto 300kb or so) and only touches a small part of the data structure. All read-only request are done using a 'readTVarIO', so they don't require any STM transactions.
The server is started with the following flags: +RTS -N -I0 -qg -qb. This starts the server in multi-threaded mode, disable idle-time and parallel GC. This seems to speed up the process a lot.
The server mostly runs without any problem. However, every now and then a client request times out and the CPU spikes to 100% (or even over 100%) and keeps doing this for a long while. Meanwhile the server does not respond to request anymore.
There are few reasons I can think of that might cause the CPU usage:
The request just takes a lot of time because there is a lot of work to be done. This is somewhat unlikely because sometimes it happens for requests that have proven to be very fast in previous runs (with fast I mean 20-80ms or so).
There are still some unevaluated thunks that need to be computed before the data can be processed and sent to the client. This is also unlikely, with the same reason as the previous point.
Somehow garbage collection kicks in and start scanning my entire 5GB heap. I can imagine this can take up a lot of time.
The problem is that I have no clue how to figure out what is going on exactly and what to do about this. Because the import process takes such a long time profiling results don't show me anything useful. There seems to be no way to conditionally turn on and off the profiler from within code.
I personally suspect the GC is the problem here. I'm using GHC7 which seems to have a lot of options to tweak how GC works.
What GC settings do you recommend when using large heaps with generally very stable data?

Large memory usage and occasional CPU spikes is almost certainly the GC kicking in. You can see if this is indeed the case by using RTS options like -B, which causes GHC to beep whenever there is a major collection, -t which will tell you statistics after the fact (in particular, see if the GC times are really long) or -Dg, which turns on debugging info for GC calls (though you need to compile with -debug).
There are several things you can do to alleviate this problem:
On the initial import of the data, GHC is wasting a lot of time growing the heap. You can tell it to grab all of the memory you need at once by specifying a large -H.
A large heap with stable data will get promoted to an old generation. If you increase the number of generations with -G, you may be able to get the stable data to be in the oldest, very rarely GC'd generation, whereas you have the more traditional young and old heaps above it.
Depending the on the memory usage of the rest of the application, you can use -F to tweak how much GHC will let the old generation grow before collecting it again. You may be able to tweak this parameter to make this un-garbage collected.
If there are no writes, and you have a well-defined interface, it may be worthwhile making this memory un-managed by GHC (use the C FFI) so that there is no chance of a super-GC ever.
These are all speculation, so please test with your particular application.

I had a very similar issue with a 1.5GB heap of nested Maps. With the idle GC on by default I would get 3-4 secs of freeze on every GC, and with the idle GC off (+RTS -I0), I would get 17 secs of freeze after a few hundred queries, causing a client time-out.
My "solution" was first to increase the client time-out and asking that people tolerate that while 98% of queries were about 500ms, about 2% of the queries would be dead slow. However, wanting a better solution, I ended up running two load-balanced servers and taking them offline from the cluster for performGC every 200 queries, then back in action.
Adding insult to injury, this was a rewrite of an original Python program, which never had such problems. In fairness, we did get about 40% performance increase, dead-easy parallelization and a more stable codebase. But this pesky GC problem...

Java heap bottleneck - how to identify the cause?

I have a J2EE project running on JBoss, with a maximum heap size of 2048m, which is giving strange results under load testing. I've benchmarked the heap and cpu usage and received the following results (series 1 is heap usage, series 2 is cpu usage):
It seems as if the heap is being used properly and getting garbage collected properly around A. When it gets to B however, there appears to be some kind of a bottleneck as there is heap space available, but it never breaks that imaginary line. At the same time, at C, the cpu usage drops dramatically. During this period we also receive an "OutOfMemoryError (GC overhead limit exceeded)," which does not make much sense to me as there is heap space available.
My guess is that there is some kind of bottleneck, but what exactly I can't even imagine. How would you suggest going about finding the cause of the issue? I've profiled the memory usage and noticed that there are quite a few instances of the one class (around a million), but the total size of these instances is fairly small (around 50MB if I remember correctly).
Edit: The server is dedicated to to this application and the CPU usage given is only for the JVM (there should not be any significant CPU usage outside of the JVM). The memory usage is only for the heap, it does not include the permgen space. This problem is reproducible. My main concern is surrounding the limit encountered around B, for which I have not found a plausible explanation yet.
Conclusion: Turns out this was caused by a bunch of long running SQL queries being called concurrently. The returned ResultSets were also very large, possibly explaining the OOME. I still have no reasonable explanation for why there appears to be some limit at B.

From the error message it appears that the JVM is using the parallel scavenger algorithm for garbage collection. The message is dumped along with an OOME error when a lot of time is spent on GC, but not a lot of the heap is recovered.
The document from Sun does not specify if the 98% of the total time consumed is to be read as 98% of the CPU utilization of the process or that of the CPU itself. In either case, I have to draw the following inferences (with limited information):
The garbage collector or the JVM process does not have enough CPU utilization, most likely due to other processes consuming CPU at the same time.
The garbage collector does not have enough CPU utilization since it is a low priority thread, and another memory intensive (but not CPU intensive) thread in the JVM is doing work at the same time, which results in the failure to de-allocate memory.
Based on the above inferences (all, one or none of them could be true), it would be worthwhile to correlate the graph that you're obtained with the runtime behavior of the application as far as users are concerned. In other words, you might find it useful to determine if other processes are kicked off (when your problem occurs), or the part of the application that is in operation (again, when the problem occurs).
In any case, the page referenced above, does give an option to disable the GC overhead limit used by the GC algorithm.
EDIT: If the problem occurs periodically, and can be reproduced, it might turn out to be a memory leak, otherwise (i.e. it occurs sporadically), you are better off tuning the GC algorithm or even changing it.

If I want to know where the "bottlenecks" are, I just get a few stackshots. There's no need to wonder and guess and play detective. They will just tell you.
Usually memory problems and performance problems go hand in hand, so if you fix the performance problems, you will also fix the memory problems (not for certain, though).

Is it reasonable for modern applications to consume large amounts of memory?

Applications like Microsoft Outlook and the Eclipse IDE consume RAM, as much as 200MB. Is it OK for a modern application to consume that much memory, given that few years back we had only 256MB of RAM? Also, why this is happening? Are we taking the resources for granted?

Is it acceptable when most people have 1 or 2 gigabytes of RAM on their PCS?
Think of this - although your 200mb is small and nothing to worry about given a 2Gb limit, everyone else also has apps that take masses of RAM. Add them together and you find that the 2Gb I have very quickly gets all used up. End result - your app appears slow, resource hungry and takes a long time to startup.
I think people will start to rebel against resource-hungry applications unless they get 'value for ram'. you can see this starting to happen on servers, as virtualised systems gain popularity - people are complaining about resource requirements and corresponding server costs.
As a real-world example, I used to code with VC6 on my old 512Mb 1.7GHz machine, and things were fine - I could open 4 or 5 copies along with Outlook, Word and a web browser and my machine was responsive.
Today I have a dual-processor 2.8Ghz server box with 3Gb RAM, but I cannot realistically run more than 2 copies of Visual Studio 2008, they both take ages to start up (as all that RAM still has to be copied in and set up, along with all the other startup costs we now have), and even Word take ages to load a document.
So if you can reduce memory usage you should. Don't think that you can just use whatever bloated framework/library/practice you want with impunity.

http://en.wikipedia.org/wiki/Moore%27s_law
also:
http://en.wikipedia.org/wiki/Wirth%27s_law

There's a couple of things you need to think about.
1/ Do you have 256M now? I wouldn't think so - my smallest memory machine is 2G so a 200M application is not much of a problem.
2a/ That 200M you talk about might not be "real" memory. It may just be address space in which case it might not all be in physical memory at once. Some bits may only be pulled in to physical memory when you choose to do esoteric things.
2b/ It may also be shared between other processes (such as a DLL). This means it could be only held in physical memory as one copy but be present in the address space of many processes. That way, the usage is amortized over those many processes. Both 2a and 2b depend on where your figure of 200M actually came from (which I don't know and, running Linux, I'm unlikel to find out without you telling me :-).
3/ Even if it is physical memory, modern operating systems aren't like the old DOS or Windows 3.1 - they have virtual memory where bits of applications can be paged out (data) or thrown away completely (code, since it can always reload from the executable). Virtual memory gives you the ability to use far more memory than your actual physical memory.

Many modern apps will take advantage of the existance of more memory to cache more. Some like firefox and SQL server have explicit settings for how much memory they will use. In my opinion, it's foolish to not use available memory - what's the point of having 2GB of RAM if your apps all sit around at 10MB leaving 90% of your physical memory unused. Of course, if your app does use caching like this, it better be good at releasing that memory if page file thrashing starts, or allow the user to limit the cache size manually.
You can see the advantage of this by running a decent-sized query against SQL server. The first time you run the query, it may take 10 seconds. But when you run that exact query again, it takes less than a second - why? The query plan was only compiled the first time and cached for use later. The database pages that needed to be read were only loaded from disk the first time - the second time, they were still cached in RAM. If done right, the more memory you use for caching (until you run into paging) the faster you can re-access data. You'll see the same thing in large documents (e.g. in Word and Acrobat) - when you scroll to new areas of a document, things are slow, but once it's been rendered and cached, things speed up. If you don't have enough memory, that cache starts to get overwritten and going to the old parts of the document gets slow again.

If you can make good use of the RAM, it is your responsability to use it.

Yes, it is perfectly normal. Also something big was changed since 256MB were normal... and do not forget that before that 640Kb were supposed to be enough for everybody!
Now most software solutions are build with a garbage collector: C#, Java, Ruby, Python... everybody love them because certainly development can be faster, however there is one glitch.
The same program can be memory leak free with either manual or automatic memory deallocation. However in the second case it is likely for the memory consumption to grow. Why? In the first case memory is deallocated and kept clean immediately after something becomes useless (garbage). However it takes time and computing power to detect that automatically, hence most collectors (except for reference counting) wait for garbage to accumulate in order to make worth the cost of the exploration. The more you wait the more garbage you can sweep with the cost of one blow, but more memory is needed to accumulate that garbage. If you try to force the collector constantly, your program would spend more time exploring memory than working on your problems.
You can be completely sure than as long as programmers get more resources, they will sacrifice them using heavier tools in exchange for more freedom, abstraction and faster development.

A few years ago 256 MB was the norm for a PC, then Outlook consumed about 30 - 35 MB or so of memory, that's around 10% of the available memory, Now PC's have 2 GB or more as a norm, and outlook consumes 200 MB of memory, that's about 10% also.
The 1st conclusion: as more memory is available applications use more of it.
The 2nd conclusion: no matter what time frame you pick there are applications that are true memory hogs (like Outlook) and applications that are very efficient memory wise.
The 3rd conclusion: memory consumption of a app can't go down with time, else 640K would have been enough even today.

It completely depends on the application.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio