Analytics server consuming large amounts of storage and memory - Elasticsearch

Our application went live just a few months ago. We have configured two mobile analytics servers with 8 GB of RAM and 50 GB of SAN space each. We have observed that the analytics servers are using a huge amount of SAN space; it is already 85% consumed on each server. Here are a few more details on how it is configured.
Number of active shards: 24
Number of nodes: 2
Number of data nodes: 2
MFP server version: 7.1.0.00.20160801-2314
I have also noticed that the document count is huge, almost 500K, and it is taking up 28 GB.
Is this expected, or is this some sort of configuration issue? Is there any way to clean up and release some memory?

Elasticsearch (on which MobileFirst Operational Analytics is built) is very memory-intensive, and the memory usage is a function of how much data you have stored. 500K documents is not very much in the grand scheme of things, but the amount of SAN space and memory that uses depends on what is in the documents. You don't mention what version (and iFix level) of MobileFirst Platform Foundation you're using, and it's difficult to guide you without knowing that information.
But, as a start, if you are collecting server logs in Operational Analytics, I'd recommend you stop doing that unless you truly need the server logs in Operational Analytics for some reason - in your application runtime, set the JNDI property "wl.analytics.logs.forward" to "false" (assuming you're using MobileFirst Platform Foundation 7.1). Then, in the Analytics Dashboard, set the TTL value for "TTL_ServerLogs" to a very small value, and check the box to apply the TTL value to existing documents (to do this, you must be running a more recent iFix level of MobileFirst Platform Foundation, as older builds didn't include this checkbox - again, assuming you're using 7.1). This should purge the existing server logs, which should free up some memory and SAN space.
While you're in that panel, you may wish to set the other TTL values to a value appropriate for your environment, if you have not done so already.
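For reference, a minimal sketch of how that JNDI property is often set in a WebSphere Liberty server.xml for MFP 7.1; the worklight/ prefix stands for your runtime's context root and is an assumption here, so adjust it to your runtime name:

<jndiEntry jndiName="worklight/wl.analytics.logs.forward" value='"false"'/>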
If you're running something older than 7.1, or the build you're running doesn't have the checkbox to apply TTL values retroactively, the process to purge existing data is more complicated - in that case, please open a PMR and the support team can guide you.
If you can't purge data (e.g., if you have to keep collecting server logs, or need to keep old data for a long time), you should add more nodes to your Elasticsearch cluster to distribute the load, so that the resource utilization of each node is lower.
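Purely as an illustration of what joining an Elasticsearch cluster involves at the Elasticsearch level (how these settings are exposed in MobileFirst depends on your version and iFix, so treat the names and values below as assumptions), a new node essentially needs the shared cluster name and the address of at least one existing node, for example in elasticsearch.yml:

cluster.name: mfp-analytics                  # must match the existing analytics nodes (example name)
node.name: analytics-node-3                  # example name for the new node
discovery.zen.ping.unicast.hosts: ["analytics-node-1.example.com", "analytics-node-2.example.com"]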

Related

How to increase resource allocation to RavenDB

I'm trying to process a document and store many documents into RavenDB, which I have running locally.
I'm getting the error:
Tried to send *ravendb.BatchCommand request via POST http://127.0.0.1:8080/databases/mydb/bulk_docs to all configured nodes in the topology, all of them seem to be down or not responding. I've tried to access the following nodes: http://127.0.0.1:8080
I was able to fetch mydb topology from http://127.0.0.1:8080.
Fetched topology: ( url: http://127.0.0.1:8080, clusterTag: A, serverRole: Member)
exit status 1
To me, it sounds like maybe my local cluster is running out of compute to process the large amount of data I'm trying to store.
RavenDB says I'm using 3 of 12 available cores, and I'd also like to make sure it's using a reasonable amount of the RAM I have available on the machine (I'd even be happy to give it swap).
But reading around online, I'm not finding much helpful information on making sure RavenDB is able to use what it needs. I found settings.json, so I can add configuration that should theoretically be picked up by the server, but I'm not making much progress.
I also found some settings and changed "reassign cores" to 12, but it still says that 3/12 cores and 6/31.1 GB of memory are being used.
If an alternative solution is recommended, I'm all ears. I just need to run things locally, and storing everything as JSON files doesn't enable fast enough retrieval for my use case.
Update
I was able to install MongoDB and set up a local database. It hasn't given me any problems yet. RavenDB looks appealing, and would more so if I understood it better, but I guess I'll stick with the tried and true for this project.
It is highly unlikely that you managed to run out of resources on the server with 3 cores / 6 GB unless you are pushing hundreds of millions of documents and doing very heavy work.
Do you get any error on the server? There should be more details on the error or in the server log.

Elasticsearch load is not distributed evenly

I am facing a strange issue with Elasticsearch. I have 8 nodes with the same configuration (16 GB RAM and an 8-core CPU).
One node, "es53node6", always has a high load, as shown in the screenshot below. Also, yesterday 5-6 nodes were getting stopped automatically every 3-4 hours.
What could be the reason?
ES version: 5.3
There can be a fair share of reasons. Maybe all the data is stored on that node (which should not happen by default), or maybe you are sending all your requests to this single node.
Also, there is no automatic stopping of Elasticsearch built in. You can configure Elasticsearch to stop the JVM process when an out-of-memory error occurs, but this is not enabled by default, as it relies on a more recent JVM.
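(For what it's worth, I believe the setting referred to here is the JVM option -XX:+ExitOnOutOfMemoryError, available from JDK 8u92 onward, which you can add to Elasticsearch's config/jvm.options; treat the exact flag name as an assumption.)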
You can use the hot threads API to check where the CPU time is spent in Elasticsearch.
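For example, assuming the default HTTP port 9200 and that any node is reachable, requests along these lines show how shards and disk usage are spread across nodes and what the busy node is doing:

curl -s 'http://localhost:9200/_cat/allocation?v'            # disk used and shard count per node
curl -s 'http://localhost:9200/_cat/shards?v'                # where each index's shards live
curl -s 'http://localhost:9200/_nodes/es53node6/hot_threads' # hot threads on the overloaded node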

MarkLogic latency: Document not found

I am working in a clustered MarkLogic environment where we have 10 nodes. All nodes are shared E&D nodes (evaluator and data).
The problem we are facing:
When a page is written to MarkLogic, it takes some time (up to 3 seconds) for all the nodes in the cluster to get updated, and if I do a read operation during that window to fetch the previously written page, it is not found.
Has anyone experienced this latency issue and looked at eliminating it? If so, please let me know.
Thanks
It's normal for a new document to only appear after the database transaction commits. But it is not normal for a commit to take 3 seconds.
Which version of MarkLogic Server?
Which OS and version?
Can you describe the hardware configuration?
How large are these documents? All other things equal, update time should be proportional to document size.
Can you reproduce this with a standalone host? That should eliminate cluster-related network latency from the transaction, which might tell you something. Possibly your cluster network has problems, or possibly one or more of the hosts has problems.
If you can reproduce the problem with a standalone host, use system monitoring to see what that host is doing at the time. On Linux I favor something like iostat -Mxz 5 and top, but other tools can also help. The problem could be disk I/O, though it would have to be really slow to result in 3-second commits. Or it might be that your servers are low on RAM, so they are paging during the commit phase.
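As a sketch of that kind of monitoring (the exact flags are just common choices):

iostat -Mxz 5   # extended per-device stats in MB, every 5 seconds, skipping idle devices
top             # overall CPU and memory, plus the busiest processes
vmstat 5        # the si/so columns reveal swap-in/swap-out, i.e. paging pressure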
If you can't reproduce it with a standalone host, then I think you'll have to run similar system monitoring on all the hosts in the cluster. That's harder, but for 10 hosts it is just barely manageable.

Memory footprint for large systems in Vaadin

I'm working in the financial sector and we are about to select Vaadin 7 for the development of a large, heavily loaded system.
But I'm a bit worried about Vaadin's memory footprint for large systems, since Vaadin keeps all state in the session. That means that for every new user, all application state will be stored in memory, won't it?
We cannot afford to build a monolithic system - the system must be scalable and agile instead. Since we have a huge client base, it must be easy to customize and ready to grow.
Could anyone please share their experience and possible workarounds for minimizing or eliminating these problems in Vaadin?
During the development of our products we faced the problem of a large memory footprint when using the default Vaadin architecture.
The Vaadin architecture is based on components driven by events. With components it is fairly easy to end up with a tightly coupled application, because components are structured into a hierarchy. It's like a pyramid: the larger the application you build, the larger the pyramid that is stored in the session for each user.
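To make that concrete, here is a tiny illustrative Vaadin 7 sketch (class name and captions are made up); every component instance created below is server-side state kept in the user's HTTP session for as long as the UI lives, so a deep component tree multiplies per-user heap usage:

import com.vaadin.server.VaadinRequest;
import com.vaadin.ui.Button;
import com.vaadin.ui.UI;
import com.vaadin.ui.VerticalLayout;

// Hypothetical UI: everything added to the layout is a server-side object
// retained in the session for this user.
public class DashboardUI extends UI {
    @Override
    protected void init(VaadinRequest request) {
        VerticalLayout root = new VerticalLayout();
        for (int i = 0; i < 200; i++) {
            // Each Button instance adds to this user's share of the heap.
            root.addComponent(new Button("Row " + i));
        }
        setContent(root);
    }
}

A page-based approach like the one described next avoids keeping that whole tree in the session at once.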
To significantly reduce memory allocation, we created a page-based approach for the application, with a comprehensive event model in the background using old-school state management. It is based on Statechart notation in XML format.
As a result, the session keeps only the pages visited during the user workflow, as described by the Statechart configuration. When the user finishes the workflow, all the pages are released to be collected by the garbage collector.
To see the difference, we ran some tests comparing the memory allocated per user working with the application.
The applications developed:
with the tightly coupled approach: from 5 to 15 MB of heap per user
with the loosely coupled approach: up to 2 MB
We are quite happy with the results, since this lets us scale a large system with 4 GB of RAM up to 1000-1500 concurrent users per server.
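As a rough sanity check on those figures (my arithmetic, not the original poster's): 1500 concurrent users at about 2 MB of session state each is roughly 3 GB of heap, which just fits on a 4 GB server, whereas at 10 MB per user the same box would top out at around 400 concurrent users.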
Almost forgot: we used the Lexaden Web Flow library. It is under the Apache license.
I think you should have a look here: https://vaadin.com/blog/-/blogs/vaadin-scalability-study-quicktickets
Plus, I have found the following info from people who run Vaadin in production.
Balázs Hódossy:
We have a back office system with more than 10,000 users. The daily user number is about 3000, but half of them use the system 8 hours without logging out. We use a Liferay 6.0.5 Tomcat bundle and Vaadin as a portlet. Our two servers have 48 GB of RAM and we give Tomcat a 24 GB heap. The DB got 18 GB and the system the rest. Measure the heap against the session size, the number of concurrent users, and the activity. More memory causes less frequent but longer full GCs. We plan to increase the number of Tomcat workers and reduce the heap. When you measure your server, try to add a little bit more memory. If cost is important, then decrease the processor cost and buy more RAM. Most of the time it is worthwhile with a little tuning.
Pierre-Emmanuel Gros:
For 1000 daily users, heavily used, a pure Vaadin application: a server with 3 GB and 2 cores, Jetty with ulimit set to 50000, and PostgreSQL 9 with 50 concurrent users (a connection pool is used). On the software side, I also used Ehcache to cache DTO objects, and plain JDBC.

How does the Windows Azure platform scale for my app?

Just a question about Azure.
Yes, I know roughly about Azure and cloud computing. I will put it this way:
Say, in the normal way, I build a program listening on a TCP port and run this server program on a server. I also build a client program, which connects to the server through the specified port. Once a client is connected, my server program computes something and returns the result to the client.
The above is the normal model, or rather my program's model.
Now I want to use Azure. I want to use it because I have too many clients, let's say 1 million a day (just an assumption for the number of clients), and I don't want to rent 1000 servers and maintain them.
I have looked at the Azure pricing plan. It talks about CPU and about small, medium, and large instances.
I don't know what they mean. For example, in my assumed case above, how many instances do I need? Or is the most I can get from Azure an extra large instance (8 small instances)?
How does Azure scale for my program? If I choose a small instance (my server program is very small; it just computes some data and returns it to clients), will Azure scale for me? Or will Azure just give me one virtual server and let it overload?
Please consider the CPU only, not storage or network traffic.
You choose two things: what size of VM to run (small, medium, large) and how many of those VMs to run. That means you could choose a small VM (single processor) and run 100 "instances" of it (100 VMs), or you could choose a large VM (eight processors on the same server) and run 10 instances of it (10 VMs).
Today, Windows Azure doesn't automatically adjust your scale, so it's up to you to use the web portal or the Service Management API to increase the number of instances as your need increases.
One factor to consider is whether your app can take advantage of multi-core environments (multi-threading, shared memory, etc.) to improve its scale. If it can, it may be better to use five 2-core (i.e., medium) VMs than ten 1-core (small) VMs. You may find in some cases that two 4-core VMs perform better than five 2-core ones.
If your app is not parallel/multi-core, then you could just run some number 'x' of small VMs. The charges are linear anyway, i.e. a 2-core VM is twice the cost of a single-core one.
Other factors would include the scratch disk size and the memory available in the VM.
One other suggestion: you may want to look into leveraging Azure queues (i.e., have the client post to a queue and have the workers pull from there). This would allow you to transparently (to the client) increase or decrease the number of workers without worrying about connections, etc. Also, if a processing step failed and crashed your instance, the message would persist and be picked up by one of the other workers.
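As a rough sketch of that pattern (this uses the Azure Storage SDK for Java, a later library than what was available when this question was asked; the queue name and environment variable are placeholders):

import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.queue.CloudQueue;
import com.microsoft.azure.storage.queue.CloudQueueClient;
import com.microsoft.azure.storage.queue.CloudQueueMessage;

public class QueueWorkerSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string; use your own storage account credentials.
        CloudStorageAccount account =
                CloudStorageAccount.parse(System.getenv("AZURE_STORAGE_CONNECTION_STRING"));
        CloudQueueClient queueClient = account.createCloudQueueClient();
        CloudQueue queue = queueClient.getQueueReference("work-items"); // hypothetical queue name
        queue.createIfNotExists();

        // Client side: enqueue a request instead of opening a TCP connection to a worker.
        queue.addMessage(new CloudQueueMessage("compute-request-payload"));

        // Worker side: pull the next request, process it, then delete it.
        CloudQueueMessage msg = queue.retrieveMessage();
        if (msg != null) {
            String payload = msg.getMessageContentAsString();
            // ... compute the result for this payload ...
            queue.deleteMessage(msg);
        }
    }
}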
I suggest you also monitor, evaluate, and perfect the results of your Azure configuration.
For "Monitoring Applications in Windows Azure" (and performance) please reference
http://channel9.msdn.com/learn/courses/Azure/Deployment/DeployingApplicationsinWindowsAzure/Exercise-3-Monitoring-Applications-in-Windows-Azure/
There is also a good blog entry titled "Visualizing Windows Azure diagnostic data"
Check out http://www.paraleap.com, a simple service for automatically adjusting the number of instances you have according to demand.
