Elasticsearch DSL Python, natural key for document?

I have a document which looks like
{
  "date_at": "2020-10-01",
  "foo_id": 3,
  "value": 5
}
Together, date_at and foo_id uniquely identify a document.
So I'd like to do something like
MyDocument.update_or_create(date_at=date_at, foo_id=foo_id, defaults={"value": some_value})
If a document with the given date_at and foo_id exists, update it; otherwise create it.

In order to update or create a document (what ES calls an "upsert"), you need to go through the update API, and that API requires a document ID.
Selecting a document with a specific date_at and foo_id would be the job of the update by query API, but that API doesn't support upserting (i.e. create or update).
So, if your documents are uniquely defined by date_at and foo_id, I'd suggest giving them IDs that combine those two values, for instance 2020-10-01:3. Doing so would allow you to leverage the update API like this:
POST your-index/_update/2020-10-01:3
{
  "doc": {
    "value": "some_value",
    "date_at": "2020-10-01",
    "foo_id": 3
  },
  "doc_as_upsert": true
}
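If you're driving this from Python, the same pattern maps directly onto the client. Here's a minimal sketch using elasticsearch-py 8.x, assuming the composite-ID scheme above; the index name and client setup are placeholders:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def update_or_create(date_at, foo_id, value):
    # The natural key becomes the document _id, e.g. "2020-10-01:3".
    doc_id = f"{date_at}:{foo_id}"
    es.update(
        index="your-index",
        id=doc_id,
        doc={"date_at": date_at, "foo_id": foo_id, "value": value},
        doc_as_upsert=True,
    )

update_or_create("2020-10-01", 3, 5)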

An alternative approach would be creating daily indices and using foo_id as the document ID. Then upserting would be as simple as:
PUT your-index-2020-10-01/_doc/3
{
  "value": "some_value",
  "date_at": "2020-10-01",
  "foo_id": 3
}
foo_id would always be unique within each daily index.
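For completeness, a sketch of the daily-index variant with the same client as above; note that indexing by ID is a full create-or-replace of the document, not a partial update, so no doc_as_upsert is needed:

def put_daily(date_at, foo_id, value):
    # One index per day, e.g. "your-index-2020-10-01"; foo_id is the _id.
    es.index(
        index=f"your-index-{date_at}",
        id=foo_id,
        document={"value": value, "date_at": date_at, "foo_id": foo_id},
    )

put_daily("2020-10-01", 3, 5)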

Related

Query does not return some items from DynamoDB via GraphQL

May I know why some items in DynamoDB are not being fetched by GraphQL?
When searching via the DynamoDB console I can easily see and query the item, but through GraphQL some items are not showing. Mind you, this isn't a connection problem, because I can query other items; it's just that a specific item is not being returned.
For example, if I query all Posts, it returns all posts in an array, but the item is not among them. However, when I query a Post directly by its ID, it works well.
Example code that is not working:
listPosts(filter: {groupID: {eq: "25"}}) {
  items {
    id
    content
  }
}
but when I do this, it is working well:
getPost(id: "c59ce7e9") {
  id
  content
}
I had this same issue and can share what I found, which worked for me.
The default resolver for the list operation has a limit of 20 built in:
{
  "version": "2017-02-28",
  "operation": "Scan",
  "filter": #if($context.args.filter) $util.transform.toDynamoDBFilterExpression($ctx.args.filter) #else null #end,
  "limit": $util.defaultIfNull($ctx.args.limit, 20),
  "nextToken": $util.toJson($util.defaultIfNullOrEmpty($ctx.args.nextToken, null))
}
I imagine you could change this or you could add a limit filter to your query like this:
listPosts(filter: {groupID: {eq: "25"}}, limit: 100) {
  items {
    id
    content
  }
}
The limit should be higher than the number of records.
You can see why this is an issue: the resolver uses the Scan operation, meaning it inspects each record for a match, which hurts performance as the table grows. You could add pagination (see the sketch after the link below) or craft a dedicated query; you will need to look into pagination, relations, and connections.
https://docs.aws.amazon.com/appsync/latest/devguide/designing-your-schema.html#advanced-relations-and-pagination
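To give a concrete idea, here is a minimal Python sketch of paginating listPosts with nextToken. It assumes an Amplify-generated schema (the ModelPostFilterInput type), plus a hypothetical endpoint URL and API key:

import requests

APPSYNC_URL = "https://example.appsync-api.us-east-1.amazonaws.com/graphql"  # hypothetical
API_KEY = "da2-xxxxxxxxxxxx"  # hypothetical

QUERY = """
query ListPosts($filter: ModelPostFilterInput, $limit: Int, $nextToken: String) {
  listPosts(filter: $filter, limit: $limit, nextToken: $nextToken) {
    items { id content }
    nextToken
  }
}
"""

def fetch_all_posts(group_id):
    posts, next_token = [], None
    while True:
        resp = requests.post(
            APPSYNC_URL,
            headers={"x-api-key": API_KEY},
            json={"query": QUERY, "variables": {
                "filter": {"groupID": {"eq": group_id}},
                "limit": 100,
                "nextToken": next_token,
            }},
        )
        page = resp.json()["data"]["listPosts"]
        posts.extend(page["items"])
        next_token = page.get("nextToken")
        if not next_token:  # no more pages
            return posts

print(len(fetch_all_posts("25")))

With a loop like this, the per-page limit no longer determines what you can see, only how many round trips you make.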

Issues with Quickbase API call

I am using the Quickbase JSON API documentation below:
Quickbase API
I am trying to update records with Quickbase via recordId as per below, and it's working fine:
{
  "to": "my-table-id-goes-here",
  "data": [
    {
      "6": {
        "value": "nancy more is the value to be updated"
      },
      "3": {
        "value": "recordId_to_be_used_to_make_updates"
      }
    }
  ]
}
My issue: I want to update records where email and userid equal certain values.
E.g. in normal SQL, something like update mytable_name set name = 'nancy more' where email = 'nancy#gmail.com' and userid = 70.
Is it possible with Quickbase? Is there a way to achieve that based on the code above, assuming the email field is 7 and the userid field is 8 or whatever?
The end result is possible but not through a single API call. The insert/update records API call for Quick Base only updates records when the key field is included in the record payload (the key field is the record ID by default but can be changed to another field in the table). If you don't already know the value of the key field, you'll need to query for the matching records first and then use the returned record ID/key field to perform that update.
For example, you could query for records where email is "nancy#gmail.com" and userid is 70:
POST https://api.quickbase.com/v1/records/query
QB-Realm-Hostname: host
Authorization: QB-USER-TOKEN userToken
Content-Type: application/json
{
  "from": "tableId",
  "where": "{7.EX.'nancy#gmail.com'}AND{8.EX.70}"
}
You can then use the IDs of the returned set of records to perform your update. How you go about reading the response and making the upsert request will depend on the language you're using; a Python sketch follows.
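For instance, in Python with requests, the two-step flow could look like this. This is a minimal sketch assuming the field IDs from the question (3 = record ID, 6 = name, 7 = email, 8 = userid) and placeholder credentials:

import requests

HEADERS = {
    "QB-Realm-Hostname": "host",                 # placeholder
    "Authorization": "QB-USER-TOKEN userToken",  # placeholder
    "Content-Type": "application/json",
}
TABLE_ID = "my-table-id-goes-here"

# Step 1: find the record IDs of the matching records.
matches = requests.post(
    "https://api.quickbase.com/v1/records/query",
    headers=HEADERS,
    json={
        "from": TABLE_ID,
        "select": [3],  # field 3 is the built-in record ID
        "where": "{7.EX.'nancy#gmail.com'}AND{8.EX.70}",
    },
).json()["data"]

# Step 2: update field 6 on each match, keyed by record ID (field 3).
updates = [
    {"3": {"value": rec["3"]["value"]}, "6": {"value": "nancy more"}}
    for rec in matches
]
requests.post(
    "https://api.quickbase.com/v1/records",
    headers=HEADERS,
    json={"to": TABLE_ID, "data": updates},
)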

Filtering collapsed results in Elasticsearch

I have an Elasticsearch index containing documents that represent entities at a given point in time. When an entity changes state, a new document is created with a timestamp. When I need to get the current state of all entities, I can do the following:
GET https://127.0.0.1:9200/myindex/_search
{
  "collapse": {
    "field": "entity_id"
  },
  "sort": [{
    "timestamp": {
      "order": "desc"
    }
  }]
}
However, I would like to further filter the result of the collapse. When entities are deleted I create a new document that includes an is_deleted flag along with the timestamp in a nested metadata field. I would like to extend the above query to entirely filter out those entities that have been deleted. Using a term filter on entity_metadata.is_deleted: true obviously does not work, because then my result just includes the last document with that entity_id before it got marked as deleted. How can I filter my results after the collapse is done to exclude any tombstoned entities?
What I would suggest is that instead of adding an is_deleted flag to all entity_id documents, you could add a date_deleted field with the date of the deletion to all documents of that entity, and then when you view a document, given its date and the date_deleted, you'd know whether the document was LIVE or deleted at that date.
In addition, it would allow you to consider:
all documents that don't have a date_deleted field (i.e. not deleted), and
all documents that have a date_deleted before/after a given date (see the sketch below).
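Putting that together, here is a minimal sketch of the filtered collapse in Python (elasticsearch-py 8.x), assuming every document of a deleted entity has been back-filled with a date_deleted field as described above:

from elasticsearch import Elasticsearch

es = Elasticsearch("https://127.0.0.1:9200")

resp = es.search(
    index="myindex",
    # Only documents of entities that were never deleted; swap in a
    # range filter on date_deleted for "as of a given date" queries.
    query={"bool": {"must_not": {"exists": {"field": "date_deleted"}}}},
    collapse={"field": "entity_id"},
    sort=[{"timestamp": {"order": "desc"}}],
)
for hit in resp["hits"]["hits"]:
    print(hit["_source"])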

Filter documents based on value of an attribute inside an array of objects

RethinkDB newb here and I can't figure this one out.
Let's say I have a table named mydata with documents that have the following basic structure:
{
  "SomeAttribute": "SomeValue",
  "team": [
    {
      "name": "john",
      "other": "stuff"
    },
    {
      "name": "jane",
      "other": "junk"
    }
  ],
  ...
}
How do I get all documents in the mydata table that have john for a value of the name attribute for any of the elements in the team array?
This is pretty easy and requires a simple ReQL expression. In JavaScript it would be something like this:
const name = 'john';
...
r.db('q50732045')
  .table('mydata')
  // The predicate below can be literally read as:
  // a document whose `team` property is a sequence
  // that contains any element with a property `name`
  // that equals the given name
  .filter(doc => doc('team').contains(member => member('name').eq(name)))
  // No need to invoke the run method in Data Explorer
;
I do believe it can be easily re-written in Python.
I think this is what you are looking for:
r.db(insert_database_name).table("mydata").filter(
    # doc["team"]["name"] maps getField over the array, yielding the list of names
    lambda doc: doc["team"]["name"].contains("john")
).run(con)
or:
r.db(insert_database_name).table("mydata").filter(
    r.row["team"]["name"].contains("john")
).run(con)
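For a self-contained version, here's a minimal runnable sketch with the current Python driver; the host, port, and database name are placeholders:

from rethinkdb import RethinkDB

r = RethinkDB()
con = r.connect(host="localhost", port=28015)

cursor = (
    r.db("test")  # database name is a placeholder
    .table("mydata")
    # Keep documents whose team array contains a member named "john".
    .filter(lambda doc: doc["team"].contains(lambda member: member["name"] == "john"))
    .run(con)
)
for doc in cursor:
    print(doc)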

Elasticsearch: document relationship

I'm building Elasticsearch autocomplete-as-you-type.
I'm using features like ngrams and other analysis tooling to create the needed analyzer.
Currently I'm breaking my head over indexing the following data.
Let's say I have a Payments type;
each document in this type looks like this:
{
  ..elastic metadata..
  "paymentId": 123453425342,
  "providerAccount": {
    "id": 123456,
    "firstName": "Alex",
    "lastName": "Web"
  },
  "consumerAccount": {
    "id": 7575757,
    "firstName": "John",
    "lastName": "Doe"
  },
  "amount": 556,
  "date": 342523454235345 (some unix timestamp)
}
So basically this document represents not only the payment itself but also the relationships of the payment: the two entities related to it.
A payment always has its provider and consumer.
I need this data in the payment document because I want to show it in the UI.
Indexing it like this might make handling updates of a Consumer or Provider a big pain, because each time one of them changes its properties I have to update all the payments that contain this entity.
Another possible solution is to store only the IDs of these consumers/providers and run a query on payments followed by 2 queries for the entities to retrieve the needed fields, but I'm not sure about this because I'm doing AJAX requests each time a character is entered, so there's a performance question.
I have also looked into the parent/child relationship solution, which basically fits my case, but I wasn't able to figure out whether I can also retrieve the parent (consumer/provider) fields while querying the child (payment).
What would you suggest?
Thanks!
Yes, you can retrieve the parent while querying the child using has_child.
Considering payment as the child and consumer as the parent, you can search all the consumers by:
GET /index_name/consumer/_search
{
  "query": {
    "has_child": {
      "type": "payment",
      "query": {
        // any query on the payment type
      },
      "inner_hits": {}
    }
  }
}
This would fetch all the consumers based on the query on the child, i.e. payment in your case.
inner_hits is what you are looking for: it retrieves the matching children as well. It was introduced in Elasticsearch 1.5.0, so your version should be 1.5.0 or greater.
You can refer to https://www.elastic.co/blog/elasticsearch-1-5-0-released.
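As a rough illustration, here's how reading those parents and their matching children could look from Python (elasticsearch-py 8.x). Note that recent Elasticsearch versions model parent/child with a join field rather than mapping types, so this assumes a child relation named payment; the match_all is just a stand-in for any child query:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="index_name",
    query={
        "has_child": {
            "type": "payment",
            "query": {"match_all": {}},  # stand-in for any query on payment
            "inner_hits": {},
        }
    },
)
for consumer in resp["hits"]["hits"]:
    print(consumer["_source"])  # parent (consumer) fields
    for payment in consumer["inner_hits"]["payment"]["hits"]["hits"]:
        print(payment["_source"])  # matching child (payment) fields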
Your problem is not an issue. I suppose you want to freeze the data after the payment, right? So you don't need to update the account data in existing payment documents.
Further: parent/child is easy for updating but less efficient for querying. For autocomplete, stick with your current mapping!
