How to create a join relation using the Elasticsearch Python client

I am looking for any examples that implement the parent-child relationship using the python interface.
I can define a mapping such as
es.indices.create(
    index="docpage",
    body={
        "mappings": {
            "properties": {
                "my_join_field": {
                    "type": "join",
                    "relations": {
                        "my_document": "my_page"
                    }
                }
            }
        }
    }
)
I am then indexing the parent document using
res = es.index(index="docpage", doc_type="_doc", id=1, body=jsonDict)
where jsonDict is a dict containing the document's text, jsonDict['my_join_field'] = 'my_document', and other relevant info.
Reference example.
I tried adding pageDict, where page is a string containing the text of one page in a document:
pageDict['content'] = page
pageDict['my_join_field'] = {}
pageDict['my_join_field']['parent'] = "1"
pageDict['my_join_field']['name'] = "page"
res = es.index(index="docpage", doc_type="_doc", body=pageDict)
but I get a parser error:
RequestError(400, 'mapper_parsing_exception', 'failed to parse')
Any ideas?

This worked for me:
res = es.index(index="docpage", doc_type="_doc", body={
    "content": page,
    "my_join_field": {
        "name": "my_page",
        "parent": "1"
    }
})

The initial syntax can work if the parent ID is also passed as the routing parameter of the index call:
res = es.index(index="docpage", doc_type="_doc", body=pageDict, routing=1)
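Putting the pieces together, here is a minimal end-to-end sketch rather than the exact code from the question: the index name, IDs, and page text are placeholders, and doc_type may or may not be needed depending on your Elasticsearch and client versions. The child's join name must match the "my_page" side of the relation, and the child must be routed to its parent so both land on the same shard:

from elasticsearch import Elasticsearch

es = Elasticsearch()

# Parent document: the join field value is simply the parent relation name.
parent_doc = {
    "content": "full document text",
    "my_join_field": "my_document"
}
es.index(index="docpage", id=1, body=parent_doc)

# Child document: the join field names the child relation and points at the parent.
# routing must equal the parent's ID so parent and child live on the same shard.
child_doc = {
    "content": "text of one page",
    "my_join_field": {"name": "my_page", "parent": "1"}
}
es.index(index="docpage", body=child_doc, routing="1")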

Related

AppSync/GraphQL filter nested objects

I have a DynamoDB table with the following structure
site (String)
addresses (List)
|-> address (String)
|-> isCurrent (Boolean)
I want to filter a specific site for either current or all addresses.
query MyQuery {
  getSite(site: "site1", isCurrent: true) {
    site
    addresses {
      address
      isCurrent
    }
  }
}
The schema looks like:
type Sites {
  site: String!
  addresses: [Address]
}
type Address {
  address: String
  isCurrent: Boolean
}
type Query {
  getSite(site: String!, isCurrent: Boolean): Sites
}
The Resolver I have
#if($ctx.args.isCurrent)
{
    "version": "2017-02-28",
    "operation": "Query",
    "query": { // Filter for specific Site
        "expression": "#siteName = :siteNameByUser",
        "expressionNames": {
            "#siteName": "site"
        },
        "expressionValues": {
            ":siteNameByUser": {"S": $util.toJson($ctx.args.site)}
        }
    },
    // Filter Current Address(es)
    "filter": {
        "expression": "addresses.isCurrent = :isActiveByUser",
        "expressionValues": {
            ":isActiveByUser": $util.dynamodb.toDynamoDBJson($ctx.args.isCurrent)
        }
    }
}
#else
{
    "version": "2017-02-28",
    "operation": "GetItem",
    "key": {
        "site": $util.dynamodb.toDynamoDBJson($ctx.args.site)
    }
}
#end
I'm not getting any results when I add the filter (it works without the filter, or with isCurrent=false).
I am trying to filter the inner objects in the addresses list based on the value the user sends for isCurrent. Any help is much appreciated!
I tried writing a resolver with a filter condition on an inner value (addresses.isCurrent), as shown above.
Apparently, DynamoDB does not let you filter on complex object types like a list of maps (your case); see a related question: DynamoDB: How to store a list of items.
I'd suggest changing your DynamoDB table data model, if possible, to site, address, isCurrentAddress to achieve what you are trying to do. Alternatively, you can write logic in the VTL response mapping template to filter your result set based on the isCurrentAddress value. By the way, AppSync recently launched JavaScript resolvers; take a look and see if that makes writing your resolver logic simpler.

How to change the field type in an ElasticSearch Index?

I have index_A, which includes a number field "foo".
I copy the mapping for index_A, and make a dev tools call PUT /index_B with the field foo changed to text, so the mapping portion of that is:
"foo": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
I then reindex index_A to index_B with:
POST _reindex
{
    "source": {
        "index": "index_A"
    },
    "dest": {
        "index": "index_B"
    }
}
When I go to view any document for index_B, the entry for the "foo" field is still a number. (I was expecting for example: "foo": 30 to become "foo" : "30" in the new document's source).
As much as I've read on Mappings and reindexing, I'm still at a loss on how to accomplish this. What specifically do I need to run in order to get this new index with "foo" as a text field, and all number entries for foo in the original index changed to text entries in the new index?
There's a distinction between how a field is stored vs indexed in ES. What you see inside of _source is stored and it's the "original" document that you've ingested. But there's no explicit casting based on the mapping type -- ES stores what it receives but then proceeds to index it as defined in the mapping.
In order to verify how a field was indexed, you can inspect the script stack returned in:
GET index_b/_search
{
    "script_fields": {
        "debugging_foo": {
            "script": {
                "source": "Debug.explain(doc['foo'])"
            }
        }
    }
}
as opposed to how a field was stored:
GET index_b/_search
{
    "script_fields": {
        "debugging_foo": {
            "script": {
                "source": "Debug.explain(params._source['foo'])"
            }
        }
    }
}
So in other words, rest assured that foo was indeed indexed as text + keyword.
If you'd like to explicitly cast a field value into a different data type in the _source, you can apply a script along the lines of:
POST _reindex
{
    "source": {
        "index": "index_a"
    },
    "dest": {
        "index": "index_b"
    },
    "script": {
        "source": "ctx._source.foo = '' + ctx._source.foo"
    }
}
I'm not overly familiar with Java, but I think ... = ctx._source.foo.toString() would work too.
FYI there's a coerce mapping parameter which sounds like it could be of use here but it only works the other way around -- casting/parsing from strings to numerical types etc.
FYI#2 There's a pipeline processor called convert that does exactly what I did in the above script, and more. (A pipeline is a pre-processor that runs before the fields are indexed in ES.) The good thing about pipelines is that they can be run as part of the _reindex process too.
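For example, a rough sketch of that pipeline approach with the Python client (the pipeline id "foo-to-string" is made up for illustration, and exact client method signatures vary a bit between client versions):

from elasticsearch import Elasticsearch

es = Elasticsearch()

# Ingest pipeline that casts foo to a string before the document is indexed.
es.ingest.put_pipeline(
    id="foo-to-string",
    body={
        "description": "cast foo to string",
        "processors": [
            {"convert": {"field": "foo", "type": "string"}}
        ]
    }
)

# Reindex through the pipeline so every document is converted on the way in.
es.reindex(body={
    "source": {"index": "index_a"},
    "dest": {"index": "index_b", "pipeline": "foo-to-string"}
})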

Elasticsearch: append non-matched docs at the end of the search result

Is there any way to append non-matched docs at the end of the search result?
I have been working on a project where we need to search docs by geolocation data, but some docs don't have geolocation data available. As a result, these docs are not returned in the search results.
Example mapping:
PUT /my_locations
{
    "mappings": {
        "_doc": {
            "properties": {
                "address": {
                    "properties": {
                        "city": {
                            "type": "text"
                        },
                        "location": {
                            "type": "geo_point"
                        }
                    }
                }
            }
        }
    }
}
Data with geo location:
PUT /my_locations/_doc/1
{
    "address": {
        "city": "XYZ",
        "location": {
            "lat": 40.12,
            "lon": -71.34
        }
    }
}
Data without geo location:
PUT /my_locations/_doc/2
{
    "address": {
        "city": "ABC"
    }
}
Is there any way to perform a geo-distance query that selects the docs with geolocation data and also appends the non-geo docs at the end of the result?
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-geo-distance-query.html#query-dsl-geo-distance-query
You have two separate queries:
Get documents within the area
Get other documents
Getting both of these in one search would mean that all of the documents appear in one result and share ranking. It would be difficult to create a relevancy model which returns the first 9 documents with a location and one without.
But you can just run two queries at once: one for, say, the first 9 documents with a location, and one for those without any.
Example:
GET my_locations/_msearch
{}
{"size":9,"query":{"geo_distance":{"distance":"200km","address.location":{"lat":40,"lon":-70}}}}
{}
{"size":1,"query":{"bool":{"must_not":[{"exists":{"field":"address.location"}}]}}}

How to merge split FlowFiles with the data from Elasticsearch?

I have a problem with merging split FlowFiles. Let me explain the problem step by step.
This is my sequence of processors.
In Elasticsearch I have this index and mapping:
PUT /myindex
{
    "mappings": {
        "myentries": {
            "_all": {
                "enabled": false
            },
            "properties": {
                "yid": {"type": "keyword"},
                "days": {
                    "properties": {
                        "Type1": { "type": "date" },
                        "Type2": { "type": "date" }
                    }
                },
                "directions": {
                    "properties": {
                        "name": {"type": "keyword"},
                        "recorder": { "type": "keyword" },
                        "direction": { "type": "integer" }
                    }
                }
            }
        }
    }
}
I get directions from Elasticsearch using QueryElasticsearchHTTP and then split them using SplitJson in order to get 10 FlowFiles. Each FlowFile has this content: {"name": "X", "recorder": "X", "direction": "X"}
After this, for each of the 10 FlowFiles I generate a filename attribute using UpdateAttribute and ${UUID()}. Then I enrich each FlowFile with some constant data from Elasticsearch. In fact, the data that I merge into each FlowFile is the same, so ideally I would like to run Get constants from Elastic only once instead of running it 10 times.
But anyway, the key problem is different: the FlowFiles that come from Get constants from Elastic have different values of filename, so they cannot be merged with the files that come from Set the attribute "filename". I also tried to use EvaluateJsonPath, but had the same problem. Any idea how to solve this?
UPDATE:
The Groovy code used in Merge inputs... I am not sure it works when batches of 10 and 10 files that should be merged arrive in the input queues:
import org.apache.nifi.processor.FlowFileFilter
import org.apache.nifi.processor.io.OutputStreamCallback
import org.apache.nifi.flowfile.FlowFile
import groovy.json.JsonSlurper
import groovy.json.JsonBuilder
//get first flow file
def ff0 = session.get()
if(!ff0)return
def filename = ff0.getAttribute('filename')
//try to find files with same attribute in the incoming queue
def ffList = session.get(new FlowFileFilter(){
public FlowFileFilterResult filter(FlowFile ff) {
if( filename == ff.getAttribute('filename') )return FlowFileFilterResult.ACCEPT_AND_CONTINUE
return FlowFileFilterResult.REJECT_AND_CONTINUE
}
})
//let's assume you require at least one additional file in the queue with the same attribute
if( !ffList || ffList.size()<1 ){
session.rollback(false)
return
}
//let's put all in one list to simplify later iterations
ffList.add(ff0)
//create empty map (aka json object)
def json = [:]
//iterate through files parse and merge attributes
ffList.each{ff->
session.read(ff).withStream{rawIn->
def fjson = new JsonSlurper().parse(rawIn)
json.putAll(fjson)
}
}
//create new flow file and write merged json as a content
def ffOut = session.create()
ffOut = session.write(ffOut,{rawOut->
rawOut.withWriter("UTF-8"){writer->
new JsonBuilder(json).writeTo(writer)
}
} as OutputStreamCallback )
//set mime-type
ffOut = session.putAttribute(ffOut, "mime.type", "application/json")
session.remove(ffList)
session.transfer(ffOut, REL_SUCCESS)

Just a few hours with CouchDB and... when I create a view, how do I get the key?

I am trying to learn CouchDB and have a very, very newbie question. I have the following two documents:
{
    "type": "type1",
    "code": "10",
    "name": "ten"
},
{
    "type": "type2",
    "code": "20",
    "name": "twenty"
}
I have created a view as follows:
function(doc) {
emit(doc.type, {"code":doc.code, "name":doc.name});
}
The above function works fine, but I would like to derive the key from the field itself instead of writing it out, as in the following example, which doesn't work:
function(doc) {
emit(doc.type, {key(doc.code):doc.code, key(doc.name):doc.name});
}
How do I do that???
Simple solution
I'm not sure this is what you're after but you can do this:
function(doc) {
emit(doc.type, doc);
}
Then all the fields (including type but also _id, _rev…) are available without having to type them explicitly.
Full solution
key(doc.code):doc.code does not look better than "code":doc.code to me, but if you really want to avoid duplication, you can do:
function(doc) {
var elem = {}, keys = ["code", "name"];
for (var i in keys) {
elem[keys[i]] = doc[keys[i]];
}
emit(doc.type, elem);
}
It seems overkill unless you have a long list of keys.
