How many nodes/shards for elasticsearch, and how much RAM

So I've really got two questions here.
If I have about 100 GB of documents that I want to make searchable with Elasticsearch, is it bad to just stick it all in a single node / shard? (I can figure out replicas later when we start looking at production.)
Also, how much RAM do I need? Is it possible to run this ES instance on a machine with only 8 GB of RAM or so (just during development) and just have it run slower, or do I need to shell out now for a system with more memory?
My use case is that I am prototyping a system and need to get our full document set indexed so we can compare it apples to apples in usability testing against the existing system. Performance isn't a huge concern right now. My dev machine is just an i7 ultrabook with 8 GB of RAM, and for the first, smaller version of the prototype, which only had about 30 MB of documents, my machine was just fine. Is it even possible for me to use this machine for dev with the next version of the prototype, or do I need to shell out now for a more powerful machine?
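In case it matters, here is a rough sketch of the kind of dev setup I have in mind (the index name docs and the 4g heap value are just placeholders, and this assumes an Elasticsearch 1.x node running locally):

# Sketch only: one primary shard, no replicas, for a dev-only index.
curl -XPUT 'http://localhost:9200/docs' -d '{
  "settings": { "number_of_shards": 1, "number_of_replicas": 0 }
}'
# Cap the JVM heap so ES leaves RAM for the OS page cache on an 8 GB laptop.
# ES_HEAP_SIZE is read by the stock 1.x startup scripts.
export ES_HEAP_SIZE=4g
./bin/elasticsearch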

Related

Postgres on Windows: efficient memory usage

I'm using Postgres 9.6 (64-bit) on Windows 10, on a laptop with 8 GB of RAM, for dev purposes. The application is batch mass-data processing, with the largest table having 10 million records.
I've read various Postgres tuning guides, and also previous questions/answers raised here, and I tried several of the suggestions, but without great success.
I know my laptop is not large, but watching the performance monitor while a query runs, I see Postgres mostly writing to disk, with a tiny bit of reading, and one of the cores mostly utilized. What I'm interested in is memory. I'm wondering why Postgres doesn't make use of it; it stays at 5.7 GB "used" while 8 GB are available. My conclusion is that Postgres decides to write temp data to a file (memory-mapped file) rather than use the memory. If that is true, maybe I can tune Windows and allow more (file) pages in memory. Anyhow, my gut feeling is that this has something to do with Postgres on Windows, rather than being a generic Postgres question.
Does anybody know how I can configure Postgres and/or Windows so that Postgres makes better use of the (free) memory available?
Thanks a lot for your help
Juergen
If it is a dedicated database machine, set shared_buffers to – say – 2GB and increase work_mem so that most sorts and hashes can be performed in memory.
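A minimal sketch of how that could be applied on 9.6 from a shell, assuming the postgres superuser and treating the 2GB / 256MB values as starting points rather than tuned recommendations:

# Sketch only: ALTER SYSTEM (available since 9.4) writes these to postgresql.auto.conf.
psql -U postgres -c "ALTER SYSTEM SET shared_buffers = '2GB';"
psql -U postgres -c "ALTER SYSTEM SET work_mem = '256MB';"
# work_mem only needs a configuration reload...
psql -U postgres -c "SELECT pg_reload_conf();"
# ...but shared_buffers only takes effect after restarting the PostgreSQL Windows service.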
For more specific questions, ask with more detail.

5.6 GB not enough for Cloudera?

I am running Cloudera Hadoop in an Oracle VirtualBox VM on my laptop.
I have given it 5.6 GB out of my 8 GB of RAM, and six of my eight cores as well.
And still I am not able to keep it up and running.
Even without load, the services would not stay up and running, and when I try a query, at least Hive will be down within 20 minutes. And sometimes they go down like dominoes: one after another.
More memory seemed to help some: with 3 GB and all services running, Hue was blinking with red warnings whenever Hue itself managed to come up. And after rebooting it would take 30-60 minutes before I managed to get the system up enough to even try running anything on it.
There have been two sensible notes (that I have managed to find):
- Warning of swapping.
- A crash note when the system had used 26 GB of virtual memory, which was not enough.
My dataset is less than one megabyte, so it is hard to understand why the system would go up to dozens of gigabytes, but whatever the reason for that was, it has passed: the system now runs more steadily within the 5.6 GB that I have given it, after closing down a few services (see my own answer below).
Still, it is only somewhat more stable: right after that I got a warning of swapping, and Hive went down again. What could be the reason for more or less all Hadoop services going down when the VM starts to swap?
I don't have enough reputation to post the picture here, but when Hive went down again it was swapping 13 pages per second and utilizing 5.9 GB out of 5.6 GB. So basically my system starts crashing more or less right after it starts to swap. "428 pages were swapped to disk in the previous 15 minute(s)"
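For anyone wanting to check the same thing, a rough sketch of watching swap activity from a shell inside the VM (standard Linux tools, nothing Cloudera-specific):

# Sketch only: watch memory and swap pressure inside the VM.
free -m        # current RAM and swap usage in megabytes
vmstat 5       # the si/so columns show pages swapped in/out, sampled every 5 seconds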
I have used default installation options as far as hard drive is concerned.
The only addition is a shared folder between Windows and the VM. That works somewhat strangely, locking files all the time, so I use it just like FTP, only for passing files from one system to the other. I can go days without using it and the services still crash, so that is not the cause either.
Now that the system is mostly up, services still crash about twice a day: Service Monitor and Hive are about even in their crashing frequency. After those come Activity Monitor and Event Server, which appear to always crash together. I believe YARN crashes as well, but it comes back up on its own. Last time Hive crashed first, followed by Service Monitor, Hive (a second time), Activity Monitor and Event Server.
As swap lives on disk, perhaps the problem is with the disk:
# cat /etc/fstab
# swapoff -a
# badblocks -v /dev/VolGroup/lv_swap
Checking blocks 0 to 8388607
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found.
# badblocks -vw /dev/VolGroup/lv_swap
Checking for bad blocks in read-write mode
From block 0 to 8388607
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: done
Reading and comparing: done
Testing with pattern 0x00: done
Reading and comparing: done
Pass completed, 0 bad blocks found.
So there is nothing wrong with the swap disk, and I have not noticed any disk errors anywhere else either.
Note that you could also check the file system from the Windows side. But I expect that if you let Windows try to fix your Linux file system, you have a good chance of destroying your Linux install, so I did my checks somewhat cautiously; AFAIK the commands above are safe to execute.
About half of the services kept going down, so giving more specifics would be a long story.
I succeeded in getting the system more stable by closing down flume, hbase, impala, ks_indexer, oozie, spark and sqoop, and by giving more memory to some of the remaining services that complained they had not been given enough.
I also fixed a couple of things on the Windows side; I am not sure which of these helped:
- MsMpEng.exe kept my hard drive busy. I didn't have permission to kill it, but I decreased its priority to the lowest possible.
- CcmExec.exe got stuck looping on my DVD and kept reading it forever. I solved this by taking the DVD out of the drive; later on I killed the process tree to keep it from interfering for a while.
I found these using the Windows Resource Monitor.
The VM requires 4GB: http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html You should use that.
I am not clear whether you are using the QuickStart VM though. It's set up to run just the essential services and tuned to conserve memory rather than exploit lots of memory.
It sounds like you are running your own installation, on one virtual machine, on your Windows machine. You may be running an entire cluster's worth of services on one desktop machine. Each of these services has master, worker processes, monitoring processes, etc. You don't need most of them.
You have also probably left memory settings at defaults suitable for a server-class machine with 16+ GB of RAM. Remember these services usually run across many machines, not all on one.
Finally, you're clearly swapping, and that makes things incredibly slow. Remember this is all through a VM too!
Bottom line, use the QuickStart VM if you really want a 1-machine cluster tuned correctly. If you want a real cluster or more services, you need more hardware.
Also consider: cloudera.com/live contains a full CDH 5.1 cluster + sample data, running on demand on AWS. Of course, the advantage of the VM is that you can BYOD, but if you're simply looking for a hands-on Hadoop experience, Live is a great option.

Decrease VS201X build / load time

Here is my conundrum.
On the project I work on, I need to switch branches a few times a day. When I do, my machine grinds to a halt, and I'm really unproductive for 10 minutes while Visual Studio reloads the solution.
Compilation is really slow as well: each build is 2-5 minutes of coffee-making time, because my machine is totally unusable while it runs.
Now, my work machine is no beast, but it's no desk clerk's POS either: a high-spec i5 with 8 GB of RAM. The HDD is possibly a cheap junker.
Our solution has roughly 11K files, and it's going to keep growing.
What can I do to speed things up?
I was thinking an SSD, possibly 4 GB more RAM, or setting up a RAM drive?
Any suggestions welcome. If I do go the SSD route, any suggestions on what goes on the SSD and what doesn't?
- Turn off anti-virus
- More RAM
- SSD
In that order.
An SSD will make a huge difference, as will more RAM, but in the meantime if you have a second HDD around, try installing that and putting the source on it. In the days before SSDs we found a huge improvement in build times and machine usability when the source disk was separate from the OS disk. When it was all on one disk the whole machine ground to a halt on big builds, but with separate disks the machine stayed usable even while building.
An SSD is the way to go; disk read/write performance for small files in a random access pattern is what matters most for making Visual Studio build faster. So go shopping for an SSD, but be careful to check some benchmarks to find a drive that has good performance for small-file, random-access reads and writes.
I would recommend something like the Samsung 840 Pro or similar, Intel also has some drives with good performance.
Another thing about your solution: the number of projects has an impact on build performance. Keep the number of output assemblies small, make sure all the projects use one and the same output folder, change all references between projects to assembly references, and make sure the Copy Local setting is set to false. This will improve build times dramatically, as it eliminates lots of assembly copy operations.
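As a rough command-line illustration of the shared output folder idea (a sketch only; MySolution.sln and the output path are placeholders, and Copy Local itself is still set per reference in each project):

REM Sketch only: build the whole solution into one shared output folder, in parallel.
msbuild MySolution.sln /m /p:Configuration=Debug /p:OutDir=C:\build\bin\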

SQL Server 2008 enterprise setup and virtual memory

Hi, we have a server with 32 cores and 256 GB of RAM, and we are using it with SQL Server 2008 Enterprise on Windows 2008 R2 Enterprise.
Currently Windows has automatically allocated a 256 GB swap file, which seems excessive. Is it advisable to hard-limit the swap file to something smaller, like 32 GB, to force it to use the physical RAM?
Is it the swap file or is it the hibernate file?
The answer depends upon the work the machine is expected to do. You might find that Windows doesn't touch the swap file much because you have adequate physical memory available. One approach would be to cut the swap file allocation in half, then use the built-in performance monitoring tools to make sure it is still running OK, and after a period of stable running look to halve the swap allocation again.
But is it really a problem? With a machine like that you probably have a good chunk of hard drive space available, and I doubt that they would be slow old 5400 rpm drives :)
An ideally set up OLTP SQL Server should never need to use the swap file. It depends what you are using this server for.
But unless you are short of disk space, I wouldn't worry too much. 32 GB sounds like a better size, though.
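If you do decide to cap it, here is a sketch of one way from an elevated command prompt (32 GB is just the size mentioned above, expressed as 32768 MB, and C:\pagefile.sys is assumed to be the page file location):

REM Sketch only: stop Windows from managing the page file and pin it at 32 GB.
wmic computersystem where name="%computername%" set AutomaticManagedPagefile=False
wmic pagefileset where name="C:\\pagefile.sys" set InitialSize=32768,MaximumSize=32768
REM A reboot is required for the new size to take effect.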

How can I force SQL Server to use more CPU

I have a data transformation query which takes a long time to run on my development machine (a Core i7 920 running at 3.9 GHz, with 12 GB of RAM, under Windows Server 2003 x86, and with two 300 GB VelociRaptors in RAID 0).
When I look at the task manager, the CPU stays around 26%, with the third (out of 4) core being the most active.
As this is not a production environment, is there any way to tell SQL Server 2008 that I am fine with it using more of my CPU, or is it that my query cannot be parallelized for some reason?
If it can be, shouldn't SQL Server be smart enough to cut the query into smaller chunks and run it across several threads so each core can take a piece?
Thanks.
Optimize your query. Chances are that the issue is with it and not SQL Server.
It already knows that it's okay unless you specifically limited it to use only a certain number of CPUs either through configuration or through setting the MAXDOP parameter.
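A quick sketch of how to check that setting from a command prompt (0 means SQL Server may use all available CPUs; adjust the server name and authentication to your setup):

REM Sketch only: show the current "max degree of parallelism" server setting.
sqlcmd -S localhost -E -Q "EXEC sp_configure 'show advanced options', 1; RECONFIGURE; EXEC sp_configure 'max degree of parallelism';"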
It sounds like you may be constrained by your hard drives or memory more than anything.
Note that because you are running an x86 version of Windows (and by extension SQL Server), you may be limited to around 3 GB of RAM. And even with PAE (Physical Address Extension) turned on, it's going to be far slower than if you had an x64 OS and SQL Server to begin with.
In other words, you might consider reinstalling the machine from the ground up to take advantage of all the x64 goodness you have.
