I have a single-node Elasticsearch instance (version 0.90) running on a single machine (8GB RAM, dual-core CPU) with RHEL 5.6.
After indexing close to 2 million documents, it runs fine for a few hours and then restarts on its own, wiping out the index in the process. I then have to reindex all the documents again.
Any ideas on why this happens? The maximum number of file descriptors is set to 32k and the number of open file descriptors at any time does not come close to that, so it can't be that.
Here are the modifications I made to the default elasticsearch.yml file:
index.number_of_shards: 5
index.cache.field.type: soft
index.fielddata.cache: soft
index.cache.field.expire: 5m
indices.fielddata.cache.size: 10%
indices.fielddata.cache.expire: 5m
index.store.type: mmapfs
bootstrap.mlockall: true
discovery.zen.ping.multicast.enabled: false
action.disable_delete_all_indices: true
script.disable_dynamic: true
I use the Elasticsearch service wrapper to start and stop the instance. In the elasticsearch.conf file, I have set the heap size to 2GB:
set.default.ES_HEAP_SIZE=2048
Any help in diagnosing the problem will be appreciated.
Thanks guys!
Related
I have a large dataset in MySQL (around 2.2 million rows), and my import into Elasticsearch via Logstash works, but it has become incredibly slow.
On my local machine, in Vagrant instances with 4GB RAM each, it went relatively quickly (took 3 days) compared to an estimated 80+ days for a server-to-server transfer.
The query is quite complex (using a subquery, etc.).
I switched the MySQL server from using the /tmp directory to using /data/tmp_mysql for its temporary files, but even then I was occasionally running out of temporary space.
e.g: I was getting the error:
message=>"Exception when executing JDBC query,
exception Sequel::DatabaseError: Java::JavaSql::SQLException
Error writing file '/data/tmp_mysql/MYHPf8X5' (Errcode: 28)
I updated my query to have this limit (200):
UPDATE p_results set computed_at="0000-00-00 00:00:00" WHERE computed_at IS NULL LIMIT 200;
My configuration file looks like this (notice that I'm using paging with a page size of 10,000):
input {
jdbc {
jdbc_connection_string => "jdbc:mysql://xxx.xxx.xxx.xxx:3306/xxx_production"
jdbc_user => "xxx"
jdbc_password => "xxx"
jdbc_driver_library => "/usr/share/java/mysql.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
statement_filepath => "./sql/req.sql"
jdbc_paging_enabled => "true"
jdbc_page_size => 10000
}
}
output {
elasticsearch {
index => "xxx_resultats_preprod2"
document_type => "resultats"
hosts => ["localhost:9200"]
codec => "plain"
template => "./resultats_template.json"
template_name => "xxx_resultats"
template_overwrite => true
document_id => "%{result_id}"
}
}
I've looked at some of the relevant documentation.
Running free -m on my Logstash/Elasticsearch server, I see this:
total used free shared buffers cached
Mem: 3951 2507 1444 0 148 724
-/+ buffers/cache: 1634 2316
Swap: 4093 173 3920
So total RAM = 4GB, and 2.5GB (63.4%) of it is used. So RAM on the Elasticsearch server doesn't seem to be the issue.
Running free -m on my MySQL server, I see this:
total used free shared buffers cached
Mem: 3951 3836 115 0 2 1154
-/+ cache: 2679 1271
swap: 4093 813 3280
So total RAM = 4GB and ~3.8GB (97%) is used. This looks like a problem.
My theories are that I'm occasionally swapping to disk and that is part of the reason it's slow. Or maybe I'm using BOTH paging and a limit and that's slowing things down?
The load average on the MySQL server is relatively low right now.
top
load average: 1,00, 1,00, 1,00
under /data I see:
sudo du -h -d 1
13G ./tmp_mysql
4,5G ./production
Using df -h I see:
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sdb1 32G 6,2G 24G 21% /data
If someone can help me make my queries execute much faster I'd very much appreciate it!
Edit:
Thank you all for your helpful feedback. It turns out my Logstash import had crashed (due to running out of /tmp space in MySQL for the subquery), and I had assumed I could just keep running the same import job. I could run it, and it did load into Elasticsearch, but very, very slowly. When I re-implemented the loading of the index from scratch and started running it against a new index, the load time became pretty much on par with what it was in the past. I estimate it will take 55 hours to load the data, which is a long time, but at least it's working reasonably now.
I also did an EXPLAIN on my MySQL subquery and found some indexing issues I could address/improve.
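For illustration, the kind of check and fix I mean looks roughly like this (the index name is made up for the sketch; the table and column come from the update statement above):
-- inspect how the subquery is executed
EXPLAIN SELECT * FROM p_results WHERE computed_at IS NULL;
-- add an index on the column the subquery filters on (hypothetical index name)
ALTER TABLE p_results ADD INDEX idx_computed_at (computed_at);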
You point to two potential problems here:
slow MySQL reads
slow Elasticsearch writes
You need to rule one of them out. Try writing the output to stdout to see whether Elasticsearch is the bottleneck.
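As a minimal sketch, a test output for that could look like this (the dots codec prints one dot per event, so you can watch the raw read throughput):
output {
  # temporarily replace the elasticsearch output to measure pure MySQL read speed
  stdout { codec => "dots" }
}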
If it is, you can play with some ES settings to improve ingestion (see the sketch after this list):
refresh_interval => -1 (disable refresh)
remove replica when doing the import (number_of_replicas:0)
Use more shards and more nodes
(more at https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-indexing-speed.html)
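For reference, a hedged sketch of applying the first two settings with curl, reusing the index name from the config above (restore the values once the import is done):
# before the bulk import: disable refresh and replicas
curl -XPUT 'localhost:9200/xxx_resultats_preprod2/_settings' -d '{"index": {"refresh_interval": "-1", "number_of_replicas": 0}}'
# after the import: re-enable them
curl -XPUT 'localhost:9200/xxx_resultats_preprod2/_settings' -d '{"index": {"refresh_interval": "1s", "number_of_replicas": 1}}'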
I've seen a good bit about Docker setups and the like using unprivileged containers to run ES. Basically, I want to set up a simple "prod cluster": a total of two nodes, one physical (for data) and one for ingest/master (an LXD container).
The issue I've run into is using bootstrap.memory_lock: true as a config option to lock memory (and avoid swapping) on my containerized master/ingest node.
[2018-02-07T23:28:51,623][WARN ][o.e.b.JNANatives ] Unable to lock JVM Memory: error=12, reason=Cannot allocate memory
[2018-02-07T23:28:51,624][WARN ][o.e.b.JNANatives ] This can result in part of the JVM being swapped out.
[2018-02-07T23:28:51,625][WARN ][o.e.b.JNANatives ] Increase RLIMIT_MEMLOCK, soft limit: 65536, hard limit: 65536
[2018-02-07T23:28:51,625][WARN ][o.e.b.JNANatives ] These can be adjusted by modifying /etc/security/limits.conf, for example:
# allow user 'elasticsearch' mlockall
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited
...
[1]: memory locking requested for elasticsearch process but memory is not locked
Now, this makes sense given that the ES user can't adjust ulimits on the host. Given that I know just enough about this to be dangerous: how do I ensure that my unprivileged container can lock the memory it needs, given that there is no ES user on the host?
I'll just call this resolved: I turned swap off on the parent host and left the setting at its default in the container. Not what I would call "the right way", as asked in my question, but good/close enough.
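For anyone curious, the workaround amounted to roughly this (a sketch, not a polished recipe):
# on the LXD host, not in the container: turn swap off entirely
sudo swapoff -a
# comment out any swap entries in /etc/fstab so it stays off after a reboot
# in the container's elasticsearch.yml, leave bootstrap.memory_lock at its default (false)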
I would like to tweak the Postgres config for use on a Windows server. Here is my current postgresql.conf file: http://pastebin.com/KpSi2zSd
I would like to increase work_mem and maintenance_work_mem, but if I raise the values above 1GB I get this error when starting the service:
Nothing is added to the log files (at least not in data\pg_log). How can I figure out what is causing the issue (e.g. by increasing logging)? Could this have anything to do with memory management issues between Windows and Postgres?
Here are my server specs:
Windows Server 2012 R2 Datacenter (64 bit)
Intel CPU E5-2670 v2 @ 2.50 GHz
512 GB RAM
PostgreSQL 9.3
Under Windows the value for work_mem is limited to 2GB (even on a 64bit system) - there is no workaround as far as I know.
I don't know why you couldn't set it to 1GB though. Maybe the sum of work_mem and maintenance_work_mem has another limit I am not aware of.
Setting work_mem that high by default is usually not a good idea. With 512GB RAM and just 10 users this might work, but keep in mind that the amount of work_mem is requested by a statement for every sort, group or hash operation in a single query. So you could have a statement requesting this amount of memory 15 or 20 times.
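To put rough, purely illustrative numbers on it: a query with 20 sort/hash operations at work_mem = 1GB can request around 20 x 1GB = 20GB, and ten such queries running concurrently would ask for roughly 200GB of the 512GB RAM.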
You don't need to change this in postgresql.conf - this can be changed dynamically if you know that the following query will benefit from a large work_mem, by running:
set session work_mem='2097151';
If you use a higher number, you'll get an error message telling you the limit:
ERROR: 2097152 is outside the valid range for parameter "work_mem" (64 .. 2097151)
Even if Postgres isn't using all the memory, it still benefits from it. Postgres (unlike e.g. Oracle) relies heavily on the filesystem cache rather than doing all the caching itself. Values for shared_buffers beyond roughly 8GB rarely show any benefit.
What you do need to tell Postgres is how much memory the operating system usually uses for caching, by setting effective_cache_size to the appropriate value. Postgres doesn't use that for caching, but it influences the planner's choice to e.g. prefer an index scan over a seq scan if the index is likely to be in the file system cache.
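As a sketch only (the 400GB figure is an assumed value for a 512GB box, not something measured on your machine), the relevant postgresql.conf lines might look like:
# illustrative values; set effective_cache_size to what the OS cache actually shows
effective_cache_size = 400GB
shared_buffers = 8GB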
You can see the current size of the file system cache in the Windows Task Manager (or e.g. Process Explorer).
As described above, on Windows it is more beneficial to rely on the OS cache.
If you use RAMMap from Sysinternals (Microsoft) you can see exactly what is being used by Postgres in the OS cache, and hence how much of it is actually cached.
I have a Neo4j Enterprise database running on a DigitalOcean VPS with 8GB RAM and an 80GB SSD.
The performance of the Neo4J instance is awful at the moment:
match (n) where n.gram='0gram' AND n.word=~'a.' return n.word LIMIT 5 # 349ms
match (n) where n.gram='0gram' AND n.word=~'a.*' return n.word LIMIT 25 # 1588ms
I understand regexes are expensive, but on similar queries where I replace the 'a.' or 'a.*' part with any other letter, Neo4j simply crashes. I can see a huge build-up in memory before that (towards 90%), and the CPU skyrocketing.
My Neo4j is populated as follows:
Number Of Relationship Type Ids In Use: 1,
Number Of Node Ids In Use: 172412046,
Number Of Relationship Ids In Use: 172219328,
Number Of Property Ids In Use: 344453742
The VPS only runs Neo4j (on Debian 7/amd64). I use the NUMA and parallelGC flags as they're supposed to be faster. I've been tweaking my RAM settings, and although it doesn't crash as often now, I have a feeling there are still gains to be made:
neostore.nodestore.db.mapped_memory=1024M
neostore.relationshipstore.db.mapped_memory=2048M
neostore.propertystore.db.mapped_memory=6144M
neostore.propertystore.db.strings.mapped_memory=512M
neostore.propertystore.db.arrays.mapped_memory=512M
# caching
cache_type=hpc
node_cache_array_fraction=7
relationship_cache_array_fraction=5
# node_cache_size=3G
# relationship_cache_size=1G --> these throw a not-enough-heap-mem error
The data is essentially a series of trees, where on node0 only a full-text search is needed; the following nodes are searched by a property with floating-point values.
node0 -REL-> node0.1 -REL-> node0.1.1 ... node0.1.1.1.1
\
-REL-> node0.2 -REL-> node0.2.1 ... node0.2.1.1
There are approx. 5,000 top nodes like node0.
Should I reconfigure my memory/cache usage, or should I just add more RAM?
--- Edit on Indexes ---
Because all trees of nodes are always 4 levels deep, each level has a label for quick lookup. In this case all node0 nodes have a label (called 0gram); the n.gram='0gram' clause should use the index coupled to that label.
--- Edit on new Config ---
I upgraded the VPS to 16GB. On the SSD, the nodeStore takes 2.3GB (11%), the propertyStore 13.8GB (64%) and the relationshipStore 5.6GB (26%).
On this basis I created a new config (detailed above).
I'm waiting for the full set of queries and will do some additional testing in the meantime.
Yes, you need to create an index. What's your label called? Imagine it's called :NGram:
create index on :NGram(gram);
match (n:NGram) where n.gram='0gram' AND n.word=~'a.' return n.word LIMIT 5
match (n:NGram) where n.gram='0gram' AND n.word=~'a.*' return n.word LIMIT 25
What you're doing is not a graph search but just a lookup via a full scan plus a property comparison with a regexp, which is not a very efficient operation. What you need is full-text search (not supported by the new schema indexes, but still available via the legacy indexes).
Could you run this query (after you created the index) and say how many nodes it returns?
match (n:NGram) where n.gram='0gram' return count(*)
which is the equivalent to
match (n:NGram {gram:'0gram'}) return count(*)
I wrote a blog post about it a few days ago, please read it and see if it applies to your case.
How big is your Neo4j database on disk?
What is the configured heap size? (in neo4j-wrapper.conf?)
As you can see, you use more RAM than your machine has (not even counting the OS or filesystem caches).
So you would have to reduce the mmio sizes, e.g. to 500M for nodes, 2G for relationships and 1G for properties.
Look at your store-file sizes and set mmio accordingly.
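A sketch of those reduced values in neo4j.properties, using the numbers suggested above (tune them to your actual store-file sizes):
neostore.nodestore.db.mapped_memory=500M
neostore.relationshipstore.db.mapped_memory=2G
neostore.propertystore.db.mapped_memory=1G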
Depending on the number of nodes having n.gram='0gram', you might benefit a lot from setting a label on them and indexing the gram property. With that in place, an index lookup will directly return all 0gram nodes and apply the regex matching only to those. Your current statement loads each and every node from the db and inspects its properties.
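A hedged sketch of that, reusing the :NGram label name from the other answer (for a store with 172 million nodes you would want to batch the SET):
// add the label to the existing 0gram nodes, then index the property
MATCH (n) WHERE n.gram='0gram' SET n:NGram;
CREATE INDEX ON :NGram(gram);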
We are currently using Solr 1.4 in master/slave mode and want to improve query performance on the slave.
The biggest issue for us is that the index is about 30GB.
The slave server configuration is as follows:
Dell PC Server: 48G memory and 2 CPU;
RedHat 64 Linux;
JDK64 1.6.0_22;
Tomcat 6.18.
Our current JAVA_OPTS is "-Xms2048M -Xmx20480M -server -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ParallelGCThreads=20 -XX:SurvivorRatio=2".
Do you have more suggestion for JAVA_OPTS?
The JAVA_OPTS seem fine. Quite a few questions:
Is your 20GB max heap peaking out? Can you check the memory stats to see what the maximum utilization is?
Is there any heavy processing happening on the slave? CPU stats?
What do the queries look like? Are you using highlighting?
What is the number of results you are returning for a single query?
What do your cache stats say? Are the caches utilized properly?
Is your index optimized?
Do you use warming queries to improve performance of the slow-running queries?
If the above seems fine, consider enabling HTTP caching.
Also use the following options:
-XX:+UseCompressedOops
(this will help reduce the heap size)
-XX:+DoEscapeAnalysis
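Putting it together, the JAVA_OPTS might look roughly like this (heap sizes kept from the question; a sketch, not a tuned recommendation):
JAVA_OPTS="-Xms2048M -Xmx20480M -server -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ParallelGCThreads=20 -XX:SurvivorRatio=2 -XX:+UseCompressedOops -XX:+DoEscapeAnalysis"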