Searching during Pentaho ElasticSearch update runs - elasticsearch

We are using ES to index ~1.5mil records from the database. To populate the index we are using Pentaho ES component which is set to “Overwrite if exists” (runs ~15 min). Also, individual indexed documents can be retrieved, updated or deleted via Java services.
The question is, what will ES return during full Pentaho update run? For example, we have 1.5mil indexed documents with version = 1. Next update will change this version to 2. If we request a document while Pentaho is updating it – will we receive the old version of it? Will service will be unavailable for that particular document? Also, if we receive an old version, will the new version be available immediately after update or will it wait till full batch is updated (pentaho component is sending rows in batches of 5k)?
Pentaho - 4.4
ElasticSearch - 0.19.4
Lucene - 3.6.0

You will receive the old version of a document if the new one isn't committed yet. The service will continue to be available.
The new versions will be available depending on the refresh_interval setting in elasticsearch. This defaults to every 1s.
It's possible that pentaho may twiddle the refresh_interval during the data load. If that's the case then you'll have to wait until pentaho calls the refresh method directly or until it resets the parameter.
You might simply start the run and then check the setting for the refresh_interval via:
curl -XGET "http://my-es-server:9200/my-index-name/_settings"

Related

How to bet notified when an Elastic Search Index has changed [duplicate]

I am using Elasticsearch, and I am building a client (using the Java Client API) to export logs indexed via Logstash.
I would like to be able to be notified (by adding a listener somewhere) when a new document is index (= a new log line have been added) instead of querying the last X documents.
Is it possible ?
This is what you're looking for: https://github.com/ForgeRock/es-change-feed-plugin
Using this plugin, you can register to a websocket channel to receive indexation/deletion events as they happen. It has some limitations, though.
Back in the days, it was possible to install river plugins to stream documents to ES. The river feature has been removed, but this plugin above is like a "reverse river", where outside clients are notified by ES as documents get indexed.
Very useful and seemingly up-to-date with ES 6.x
UPDATE (April 14th, 2019):
According to what was said at Elastic{ON} Zurich 2019, at some point in the 7.x series, there will be a Changes API that will provide index changes notifications (document creation, update, deletion and more).
UPDATE (July 22nd, 2022):
ES 8.x is out and the Changes API is still nowhere in sight ... Good to know, though, that's it's still open at least.

ELK - Removing old logs viewable in Kibana

I have managed to process log files using the ELK kit and I can now see my logs on Kibana.
I have scoured the internet and can't seem to find a way to remove all the old logs, viewable in Kibana, from months ago. (Well an explaination that I understand). I just want to clear my Kibana and start a fresh by loading new logs and them being the only ones displayed. Does anyone know how I would do that?
Note: Even if I remove all the Index Patterns (in Management section), the processed logs are still there.
Context: I have been looking at using ELK to analyse testing logs in my work. For that reason, I am using ElasticSearch, Kibana and Logstatsh v5.4, and I am unable to download a newer version due to company restrictions.
Any help would be much appreciated!
Kibana screenshot displaying logs
Update:
I've typed "GET /_cat/indices/*?v&s=index" into the Dev Tools>Console and got a list of indices.
I initially used the "DELETE" function, and it didn't appear to be working. However, after restarting everything, it worked the seond time and I was able to remove all the existing indices which subsiquently removed all logs being displayed in Kibana.
SUCCESS!
Kibana is just the visualization part of the elastic stack, your data is stored in elasticsearch, to get rid of it you need to delete your index.
The 5.4 version is very old and already passed the EOL date, it does not have any UI to delete the index, you will need to use the elasticsearch REST API to delete it.
You can do it from kibana, just click in Dev Tools, first you will need to list your index using the cat indices endpoint.
GET "/_cat/indices?v&s=index&pretty"
After that you will need to use the delete api endpoint to delete your index.
DELETE /name-of-your-index
On the newer versions you can do it using the Index Management UI, you should try to talk with your company to get the new version.

Integration of elasticsearch with neo4j database

Am trying to use elasticsearch with my neo4j database for fast querying.I tried many sites but they are all old articles so i didn't get any clear idea. Steps I followed until now,
Installed neo4j
Installed elasticsearch
Copy pasted elastic search plugins into neo4j plugins folder
added this line into neo4j. properties file
elasticsearch.host_name=http://localhost:9200
elasticsearch.index_spec=people:Person(first_name,last_name), places:Place(name)
Here my question is,
How elasticsearch and neo4j are integrated. Please clarify me on this.
I followed this ,
Link
You have to install Apoc procedures plugin (https://github.com/neo4j-contrib/neo4j-apoc-procedures). The documentation about ES integration is here : ES Integration with Apoc procedures
[edit]
download and drop apoc.jar in plugins's Neo4j directory, regarding the targetted Neo4j version
restart Neo4j
in Neo4j Web browser, launch the following Cypher query to show all ES procedures:
CALL apoc.help("apoc.es")
Sample query for logs:
CALL apoc.es.getRaw("localhost","_search?q=level:ERROR",null)
YIELD value
UNWIND value.hits.hits as hits
RETURN hits LIMIT 100
The recommanded way is to store the ES host in neo4j.conf by adding a key (after restart of Neo4j):
apoc.es.myKey.url=localhost
Then the query looks like:
CALL apoc.es.getRaw("myKey","_search?q=level:ERROR",null)
YIELD value
UNWIND value.hits.hits as hits
RETURN hits LIMIT 100
For those of you who already have APOC plugin installed and accessible, but don't have access to the neo4j.properties file (or are more comfortable working with ES through curl) you can do this without using apoc.es.getRaw and can instead use the JSON returned with apoc.load.json:
WITH "http://myelasticurl:9200/my_index/_search?q=level:ERROR" as search_url
CALL apoc.load.json(search_url) YIELD value
UNWIND value.hits.hits as hit
WITH hit._source as source
...
# do work
...

Elastic search next steps

I'm new to elasticsearch and am still trying to set it up. I have installed elasticsearch 5.5.1 using default values I have also installed Kibana 5.5.1 using the default values. I've also installed the ingest-attachment plugin with the latest x-pack plugin. I have elasticsearch running as a service and I have Kibana open in my browser. On the Kibana dashboardI have an error stating that it is unable to fetch mappings. I guess this is because I havn't set up any indices or pipelines yet. This is where I need some steer, all the documentation I've found so far on-line isn't particularly clear. I have a directory with a mixture of document types such as pdf and doc files. My ultimate goal is to be able to search these documents with values that a user will enter via an app. I'm guessing I need to use the Dev Tools/console window in Kibana using the 'PUT' command to create a pipeline next, but I'm unsure of how I should do this so that it points to my directory with the documents. Can anybody provide me an example of this for this version please.
If I understand you correctly, let's first set some basic understanding about elasticsearch:
Elasticsearch in it's simple definition is a "Search engine". so you need to store some data, and then elastic will help you to search using a search criteria, and it will retrieve relevant data back
You need a "Container" to save your data to, and elastic has this thing like any database engine to store your data, but the terms are somehow different. for example a "Database" in sql-like systems is called "Index", and what you know as "table" is called "Type" in elastic.
from my understanding, you will need to create your index (with or without mappings) to have a starting point, and I recommend you to start without mappings just to "start" and get things working, but later on it's highly recommend to work with "mappings" if applicable, because elastic is smart, but it cannot know more about your data than you do
Because Kibana has failed to find a proper index to start with, it has complained and asked you to either provide a syntax for index names, or a specific index name so it can infer the inline mappings and give you the nice features of querying, displaying charts, etc of your data, so once you create your index, you will provide that to the starting page of Kibana, and you will be ready to go.
Let me know if you need something more specific to your needs :)

Quality profile weirdness (active/inactive rules) after sonarqube upgrade 6.3.1

I have upgraded sonarqube server from 6.2 to 6.3.1 and since then I see a weird behaviour regarding the quality profile (it might have occurred before, it is only now I see it).
When I click on the Quality Profile SonarWay (Java) I see
so it seems, that all rules are inactive.
When I click Activate More I see the following
so it looks, that there are rules are active (I assume due to the "Deactivate" option").
But when switching in the left bar to "active" under Quality Profile results in this
so clearly, no rules are active.
What is the second image then showing, what does the "Deactivate" mean, although it is inactive ?
How could this happen that suddenly all rules seem to be inactivated ?
This specific behaviour is a common symptom of a corrupted Elastic Search index (no longer in sync with SonarQube database).
Solution
Rebuild the SonarQube ElasticSearch index:
stop your SonarQube server
delete the ElasticSearch index # sonar_install_dir/data/es
start your SonarQube server
(reminder: ElasticSearch is a search engine used by SonarQube to index issues, rules etc. so that it can access this data rapidly without having to query the database all the time, see SonarQube Architecture)
Root-cause
Why did that happen ? A common case is an ElasticSearch index not being properly rebuilt after upgrading and/or changing database. Here's a typical scenario: you first start SonarQube on embedded H2 database, experiment a bit with it, then plug it to a full-fledged database. If the ElasticSearch index does not get scratched/rebuilt in between, then the index gets corrupted as the database/dataset it used to be in synch with just changed all of the sudden.
FYI there's an improvement planned to handle this more gracefully: SONAR-5681 .
Note: independently from above solution, do not take ElasticSearch index rebuild as a lightweight operation that should be performed regularly. SonarQube does self-manage its ElasticSearch index, so any issue must be investigated first.

Resources