Spring Boot with spring-data-elastic connecting to Elasticsearch 7.4.0 on AWS server

I have 2 questions:
1. Can I run spring-data-elastic v4.0.1.RELEASE (with org.elasticsearch:elasticsearch 7.6.2) with ES client running on 7.4.0? If not, what combination can I use for a 7.4.0 client? We are migrating to AWS and I need to use the 7.4.0 version of the client.
2. I have a parent/child relationship (configured as a join datatype field). Could somebody please point me to documentation or explain how to use either ElasticsearchRestTemplate or ElasticsearchOperations to correctly insert/update both parent and child records?
Thank you.
Best regards,
Robert

Ad 1): At the moment I can't find anything in the breaking changes sections of the Elasticsearch documentation that would prevent using a 7.4.0 client library, but that does not mean that there aren't any. Recently there was a breaking change in the Java classes (from 7.7 to 7.8), and when I asked about it I got this information:
our compatability focus is on the HTTP APIs and we don’t offer any guarantees on the code itself. There’s more background here: https://github.com/elastic/elasticsearch/issues/22707#issuecomment-274163711
So I'd say: write a small test app with the corresponding libraries, start a local ES 7.4 and test it.
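A first step for such a test app could be to confirm which server version the local node actually reports before exercising the client library. The following is only a minimal sketch using the JDK's built-in HTTP client (Java 11+); the address http://localhost:9200, the class name EsVersionCheck and the regex-based parsing are assumptions for illustration, and a real test would go on to index and search documents through the library versions under test.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EsVersionCheck {

    // Matches the "number" field of the "version" object in the root (GET /) response.
    private static final Pattern VERSION_NUMBER =
            Pattern.compile("\"number\"\\s*:\\s*\"([^\"]+)\"");

    /** Extracts the version number from the JSON body of GET /, or returns null if absent. */
    static String parseVersion(String rootResponseJson) {
        Matcher m = VERSION_NUMBER.matcher(rootResponseJson);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws Exception {
        // Assumed address of the local test node; pass another URL as the first argument.
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:9200";

        HttpClient http = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/")).GET().build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println("HTTP status: " + response.statusCode());
        System.out.println("Server version: " + parseVersion(response.body()));
    }
}
```

If the reported version is the expected 7.4.x, the same small project can then run a few inserts and searches through Spring Data Elasticsearch to see whether anything breaks.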
Ad 2): Adding the join-type mapping and implementing the corresponding inserts etc. is currently being worked on and will hopefully be available in version 4.1.
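Until then, it may help to know what Elasticsearch itself expects for a join field at the HTTP level: a parent document carries only its relation name, a child document carries its relation name plus the parent's id, and the child has to be indexed with routing set to the parent id so both land on the same shard. The sketch below only builds those request paths and bodies as strings; it is not the Spring Data API, and the index name "qa", the field name "my_join", the relation names "question"/"answer" and the helper names are all hypothetical.

```java
public class JoinFieldDocs {

    // NOTE: values are inserted as-is; this sketch assumes they need no JSON escaping.

    /** Body of a parent document: the join field names only the parent relation. */
    static String parentBody(String joinField, String parentRelation, String text) {
        return String.format("{\"text\":\"%s\",\"%s\":{\"name\":\"%s\"}}",
                text, joinField, parentRelation);
    }

    /** Body of a child document: the join field names the child relation and the parent id. */
    static String childBody(String joinField, String childRelation, String parentId, String text) {
        return String.format("{\"text\":\"%s\",\"%s\":{\"name\":\"%s\",\"parent\":\"%s\"}}",
                text, joinField, childRelation, parentId);
    }

    /** Path for indexing a child: routing must equal the parent id. */
    static String childPath(String index, String childId, String parentId) {
        return "/" + index + "/_doc/" + childId + "?routing=" + parentId;
    }

    public static void main(String[] args) {
        // Hypothetical index, field and relation names.
        System.out.println("PUT /qa/_doc/1");
        System.out.println(parentBody("my_join", "question", "What is a join field?"));
        System.out.println("PUT " + childPath("qa", "2", "1"));
        System.out.println(childBody("my_join", "answer", "1", "A parent/child mapping."));
    }
}
```

These requests could be sent with any HTTP client or the low-level REST client; updating a child works the same way, again with the parent id as routing.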

Related

Superset Blank Dataset field

My company currently uses several pieces of software for our KPIs:
Nifi
Elasticsearch 1.3.2
Superset 7.16.2
Since I updated the Elasticsearch database to 7.16.2, I have had a problem in Superset: I cannot see any index. When I try to add a new dataset, the field is blank, although I can see the indexes in the logs. When I try to sync the old dataset, I get this message
I'm using the latest version of elasticsearch-dbapi.
I'm not sure that the problem comes from the Elasticsearch upgrade, but it's strange that the Superset UI does not display our indexes.
Has anyone already had the same issue?
I'd recommend opening a detailed Github issue in the Apache Superset repo:
https://github.com/apache/superset/issues/new?assignees=&labels=%23bug&template=bug_report.md
Stack Overflow is better utilized for questions and answers!

How to be notified when an Elasticsearch index has changed [duplicate]

I am using Elasticsearch, and I am building a client (using the Java Client API) to export logs indexed via Logstash.
I would like to be notified (by adding a listener somewhere) when a new document is indexed (= a new log line has been added) instead of querying the last X documents.
Is it possible?
This is what you're looking for: https://github.com/ForgeRock/es-change-feed-plugin
Using this plugin, you can register on a websocket channel to receive indexing/deletion events as they happen. It has some limitations, though.
Back in the day, it was possible to install river plugins to stream documents to ES. The river feature has been removed, but the plugin above is like a "reverse river", where outside clients are notified by ES as documents get indexed.
Very useful and seemingly up to date with ES 6.x.
UPDATE (April 14th, 2019):
According to what was said at Elastic{ON} Zurich 2019, at some point in the 7.x series, there will be a Changes API that will provide index changes notifications (document creation, update, deletion and more).
UPDATE (July 22nd, 2022):
ES 8.x is out and the Changes API is still nowhere in sight... Good to know, though, that it's still open at least.

Is it a good idea to use Serilog to write logs directly to Elasticsearch?

I'm evaluating different options for a distributed log server.
In the Java world, as far as I can see, the most popular solution is Filebeat + Kafka + Logstash + Elasticsearch + Kibana.
However, in the .NET world there is Serilog, which can send structured logs directly to Elasticsearch. So the only required components are Elasticsearch + Kibana.
I searched a lot, but there's not much information about this solution in production. I have no idea whether it is enough to handle large volumes of logs.
Can anyone give me some suggestions? Thanks.
I had exactly the same issue. Our system worked with the "classic" ELK stack architecture, i.e. Filebeat -> Logstash -> Elasticsearch (-> Kibana).
But as we found out in big projects with a lot of logs, Serilog is a much better solution, for the following reasons:
CI/CD - when you have different types of logs with different structures that you want stored as different types, Serilog's power comes in handy. In Logstash you need to create a different filter to break down each message according to its pattern, which implies tight coupling between the log structure and the Logstash configuration. That is very bug-prone.
Maintenance - because of the easy CI/CD and the single point of change, it is easier to maintain a large amount of logs.
Scalability - Filebeat has trouble handling big chunks of data because of the registry file, which has a tendency to "explode" (from personal experience; see the referenced Stack Overflow question and Elastic forum question).
Fewer failure points - with Serilog the logs are sent directly to Elasticsearch, whereas with Filebeat they have to pass through Logstash, which is one more place to fail.
Hope it helps you with your evaluation.
Update (Dec 2021):
The Elasticsearch logger provider has been moved to the Elastic ECS DotNet project.
Find the latest version here: https://github.com/elastic/ecs-dotnet/blob/master/src/Elasticsearch.Extensions.Logging/ReadMe.md
The nuget package is here: https://www.nuget.org/packages/Elasticsearch.Extensions.Logging/1.6.0-alpha1
It is still labelled an alpha release (although it has more functionality than the Essential's version), so currently (Dec 2021) you need to specify the version when adding the package:
dotnet add package Elasticsearch.Extensions.Logging --version 1.6.0-alpha1
Disclaimer: I am the author
ORIGINAL ANSWER
There is now also a stand alone logger provider that will write .NET Core logging direct to Elasticsearch, following the Elasticsearch Common Schema (ECS) field specifications, https://github.com/sgryphon/essential-logging/tree/master/src/Essential.LoggerProvider.Elasticsearch
To use this from your .NET Core application, add a reference to the Essential.LoggerProvider.Elasticsearch package:
dotnet add package Essential.LoggerProvider.Elasticsearch
Then, add the provider to the loggingBuilder during host construction, using the provided extension method.
using Essential.LoggerProvider;
// ...
    .ConfigureLogging((hostContext, loggingBuilder) =>
    {
        // Register the Elasticsearch logger provider with its default settings.
        loggingBuilder.AddElasticsearch();
    })
The default configuration will write to a local Elasticsearch running at http://localhost:9200/.
Once you have sent some log events, open Kibana (e.g. http://localhost:5601/) and define an index pattern for "dotnet-*" with the time filter "@timestamp".
This reduces the dependencies even more, as rather than pull in the entire Serilog infrastructure (App -> Microsoft ILogger -> Serilog provider/adapter -> Elasticsearch sink -> Elasticsearch) you now only have (App -> Microsoft ILogger -> Elasticsearch provider -> Elasticsearch).
The ElasticsearchLoggerProvider also writes events following the Elasticsearch Common Schema (ECS) conventions, so is compatible with events logged from other sources, e.g. Beats.

Spring Data Couchbase - Search without having admin rights on the cluster

I'm currently working on a POC with Couchbase, using Spring Data to put and get documents in a bucket on a cluster.
As I'm working in a big company, I'm lucky they gave me a bucket, but I don't have admin rights on the cluster, so I only have access to the bucket.
Digging into the Spring Data documentation, I'm not able to find a way to retrieve documents without creating views on the server (I'm getting errors like "Unknown query param"). With the Couchbase Java SDK I am able to do so through N1QL queries, but using the Spring Data layer is mandatory.
The answers I found always point me in the direction of server-side functions,
ex : https://stackoverflow.com/a/30928169/3744307
What I would like to find is a way to add a repository method like
List findReceiptByAccount(String Account)
without having to specifically declare the function server-side.
Is this possible, or do I have to send a request to the administrators to create functions for me every time I need to add a findByX method?
Thanks for your time.
What version of CB is it?
I think that prior to 4.5, N1QL access (which you seem to have) is enough to build your index yourself!
With Spring Data Couchbase 2.x, that would use a N1QL index in the background, and it would work with a single primary index (although having one index per repository entity class would be best for performance). Maybe you can ask your admin to create that index once?

Document Management/Content Management with Search

I have a requirement for a document management system that handles PDF, Word, XLS and PPT files with semantic search.
I started looking into Elasticsearch for this and stumbled on Apache Jackrabbit and subsequently on OpenKM and Hippo. Even though core features like versioning exist in Jackrabbit, I need some pointers on how to go about this.
I need help navigating through the following concerns:
Should I just use Elasticsearch with the Elasticsearch attachment plugin, or use Jackrabbit with a MySQL backend and use Elasticsearch to index the documents?
Or should I use OpenKM?
Any pointers would be greatly appreciated. This will eventually require app integration.
Update: Logically, using Elasticsearch for search makes sense, but I figure that I cannot use it as the primary data source. What are the best options for primary storage? Apache Jackrabbit with MySQL? As all features are prebuilt in OpenKM, would this be a better option?
What is it you want to achieve? Are you looking to manage making the documents available, or is it about managing the content in the documents? ES, or any search engine, is generally not a primary data source.
I can't give you any advice with respect to OpenKM (neither for nor against). Whether Hippo is a match depends on your use case, which I would need to know more about.
