Elasticsearch: How to add "created_at" and "updated_at" timestamps? - elasticsearch

I have a database where I should store created_at and updated_at fields for each document.
The created_at field should be created once on first document insert.
The updated_at field should be created on first document insert and should be updated via Bulk API on each update, even if none of the document fields are changed.
The question is: how to add those timestamps?

I believe that Elasticsearch used to have a feature to add these automatically (the _timestamp metadata field), but it was removed in later versions to improve performance. You now have to add these fields to your mapping yourself and implement a process that sets them. One way to do this is with an ingest pipeline, which someone explains in the ES forum. Check the docs for how to implement pipelines with your particular version of Elasticsearch.
Suggestion: always check the forums for Elasticsearch questions. The community seems to be more active there, and devs often respond to questions as well.
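Since the question mentions the Bulk API, here is a minimal sketch of one way to set both fields from the client side instead of a pipeline, assuming every write goes through a Bulk "update" action with a Painless script and an upsert document. The function name, index name, and field values are hypothetical; only the request shape is the point.

```python
import json
from datetime import datetime, timezone

# Painless script: merge the incoming fields into the stored document, then
# overwrite updated_at. created_at is never touched by the script.
SCRIPT = "ctx._source.putAll(params.doc); ctx._source.updated_at = params.now"


def bulk_upsert_lines(index, doc_id, fields, now=None):
    """Return the two NDJSON lines for one Bulk API 'update' action (a sketch)."""
    now = now or datetime.now(timezone.utc).isoformat()
    action = {"update": {"_index": index, "_id": doc_id}}
    body = {
        # Used when the document already exists.
        "script": {
            "lang": "painless",
            "source": SCRIPT,
            "params": {"doc": fields, "now": now},
        },
        # Used as the initial document on first insert, so both timestamps start equal.
        "upsert": {**fields, "created_at": now, "updated_at": now},
    }
    return json.dumps(action) + "\n" + json.dumps(body) + "\n"


if __name__ == "__main__":
    # The resulting text would be sent as the body of POST /_bulk.
    print(bulk_upsert_lines("articles", "1", {"title": "Hello"}), end="")
```

Because the script assigns updated_at on every call, the field changes even when the other fields are identical to what is already stored.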

Related

Map multiple values to a unique column in Elasticsearch

I want to work with Elasticsearch to process some WhatsApp chats, so I am initially planning the data load.
The problem is that the data exported from WhatsApp doesn't contain a real unique ID per user. It only contains the name of the user taken from the contact directory of the device where the chat is exported (i.e. a user can change their number or have two numbers in the same group).
Because of that, I need to create a custom explicit mapping table between the user names and a self-generated unique ID that gets populated in an additional column.
So my question is: "How can I implement this kind of explicit mapping in Elasticsearch to generate an additional unique column?" Alternatively, a valid answer could be a totally different approach to the problem.
PS. As I write this, I think the solution could be in the ingestion process, e.g. in a Python script, but I still want to post the question to understand whether this is something Elasticsearch can do by itself.
Yes, do it during the indexing process.
If the data that maps the name to the ID is stored in a separate index, you can use an enrich processor in an ingest pipeline to add whichever value you want to each document as it is indexed.
Also, Elasticsearch doesn't have columns, only fields.
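For comparison with the enrich processor, here is a rough sketch of the client-side variant the question's PS hints at: resolving names to IDs in a Python script before indexing. The alias table, the field names ("sender", "user_id"), and the UUID namespace string are all assumptions made for illustration.

```python
import uuid


def assign_user_ids(messages, aliases):
    """Add a stable 'user_id' field to each message (a sketch).

    aliases is a hypothetical hand-maintained table that maps a contact name
    as exported to the canonical name of the same person.
    """
    ids = {}
    enriched = []
    for msg in messages:
        # Fall back to the exported name when no alias is known.
        person = aliases.get(msg["sender"], msg["sender"])
        if person not in ids:
            # uuid5 is deterministic, so re-running the load yields the same IDs.
            ids[person] = str(uuid.uuid5(uuid.NAMESPACE_URL, "whatsapp-user:" + person))
        enriched.append({**msg, "user_id": ids[person]})
    return enriched


if __name__ == "__main__":
    chat = [
        {"sender": "Ann", "text": "hi"},
        {"sender": "Ann (work)", "text": "on my other phone"},
    ]
    for doc in assign_user_ids(chat, {"Ann (work)": "Ann"}):
        print(doc)
```

The enriched dictionaries can then be indexed as ordinary documents, with user_id as just another field.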

ElasticSearch Updating Document

General help question, but I wanted to ask a clarifying question on how updating documents in ES works.
When sending a document request to Elasticsearch for indexing, do we have to include all fields for that document or just the ones we want to update?
If there already exists a document with the same document id, would our new document request override all data in that document or just update the fields listed in this document request? In other words, do I need to supply all the fields in this document request or just the ones I want to update? Thanks!
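The docs: "The update API also supports passing a partial document, which is merged into the existing document."
So an index request with the same ID replaces the whole document, while an update request only needs the fields you want to change.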
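To make the difference concrete, here is a small in-memory simulation of the two behaviours. It is not Elasticsearch code: the store is a plain dict and the function names are made up, and the recursive merge of nested objects is modelled on how the update API's partial-document merge is documented to behave.

```python
def index_document(store, doc_id, doc):
    """Index semantics: the new document replaces whatever was stored."""
    store[doc_id] = dict(doc)


def merge(existing, partial):
    """Merge a partial document: nested objects merge, other values are replaced."""
    merged = dict(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def update_document(store, doc_id, partial):
    """Update semantics: only the supplied fields change."""
    store[doc_id] = merge(store[doc_id], partial)


if __name__ == "__main__":
    store = {}
    index_document(store, "1", {"title": "Hello", "views": 1})
    update_document(store, "1", {"views": 2})
    print(store["1"])  # title is kept, views is changed
    index_document(store, "1", {"views": 3})
    print(store["1"])  # title is gone: the whole document was replaced
```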

Is it possible to query an AppSync GraphQL type by timestamps?

I want to query all instances of a model by the most recently created.
Reading the official docs, they suggest a way of querying by the default timestamps (updatedAt/createdAt) but only when also querying by another key. So I know I could query a hypothetical User model by name and createdAt, but I can't query all instances of User by createdAt.
Is there an established way of doing this?
I have tried adding a @key directive to sort by updatedAt, but that results in an error because updatedAt is added automatically and is not described in my schema. If I then add the timestamps to my schema, this creates problems when mutating from clients, because the API then expects me to supply the timestamps, which I obviously don't do because they are added automatically by DynamoDB.
Thanks
You could try using a Global Secondary Index on the field you want to query. In your AppSync resolver, you need to specify the index you want to use for the query.
Another way would be to run a scan operation against your DB (you don't need to specify a key in this case), although that would be far less efficient than a GSI.
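As a sketch of what the GSI query could look like, the function below builds the parameters for a DynamoDB Query in the low-level client format. It assumes a hypothetical index whose partition key is a constant "type" attribute written on every item and whose sort key is createdAt; the table name, index name, and attribute names are placeholders, not something AppSync creates for you.

```python
def latest_items_query(table, index, type_name, limit=20):
    """Build DynamoDB Query parameters for 'newest first' on a GSI (a sketch)."""
    return {
        "TableName": table,
        "IndexName": index,
        # Every item of the model shares the same partition key value.
        "KeyConditionExpression": "#t = :t",
        "ExpressionAttributeNames": {"#t": "type"},
        "ExpressionAttributeValues": {":t": {"S": type_name}},
        # Descending order on the sort key (createdAt in this assumed index).
        "ScanIndexForward": False,
        "Limit": limit,
    }


if __name__ == "__main__":
    params = latest_items_query("UserTable", "byTypeCreatedAt", "User", limit=10)
    print(params)
```

With boto3 installed, these parameters could be passed as boto3.client("dynamodb").query(**params); an AppSync resolver for the same index would express the same key condition in its own request format.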

What's the difference between index and update document in Elasticsearch?

As we know, when we update an existing document, Elasticsearch reindexes the document and marks the previous version as deleted. But for the RESTful API, the call is the same. So I guess Elasticsearch checks whether the document exists by its unique document ID and then either updates or indexes it.
So my question is: we don't need to care about the difference between index and update, because both the RESTful API and the Java client PUT to the same endpoint. Am I right?
The main difference between PUT and POST for documents in Elasticsearch:
POST without an ID creates a new document with an auto-generated unique ID.
PUT with an ID creates the document if it does not exist, or replaces the whole existing document while keeping the same ID.
So if the ID matters to you, for example because other systems refer to it, use PUT with that ID so the document keeps it.
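The request shapes involved can be summarised in a few lines. This is only a sketch that returns the HTTP method and path for each case, assuming the typeless endpoints of Elasticsearch 7 and later; the helper names are made up for illustration.

```python
def index_request(index, doc_id=None):
    """Method and path for indexing a full document."""
    if doc_id is None:
        # No ID supplied: Elasticsearch generates one.
        return ("POST", f"/{index}/_doc")
    # ID supplied: create the document or replace the existing one.
    return ("PUT", f"/{index}/_doc/{doc_id}")


def update_request(index, doc_id):
    """Method and path for a partial update of an existing document."""
    return ("POST", f"/{index}/_update/{doc_id}")


if __name__ == "__main__":
    print(index_request("articles"))
    print(index_request("articles", "42"))
    print(update_request("articles", "42"))
```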

Possible to filter audit log by recently changed field

Morning-
I have an entity with auditing enabled, and would like to be able to search that entity based on whether or not a field value has been changed in the past week.
In the audit summary view in Dynamics online, it does not appear that I can achieve that. Is there any other way I can get the functionality short of coding?
You can't do this exactly, but you have a number of options to approximate this:
1) Only turn on auditing for that particular field
2) Filter for Update Event and you can filter down to that particular entity
3) Set up a workflow that runs on the update of that field only - then look for those records where that workflow is a related record
