Indices with nested property in both Kibana vizualization and index queries - elasticsearch

So I have following problem which I'm trying to solve last two days. I have python script which parses logs and inserts data in elastic search, dynamically creating indices via bulk function.
Problem is my mapping has one "type": "nested" property, something like "users" field. And particularly when I'm only adding "type": "nested" in this property I can't query objects from Kibana nor creating any vizualization (because nested objects are separate documents If I'm not making mistakes). First think I tried: adding aditional "include_in_parent": true parameter to users field, but as result I got "wrong" queries (i.e. running something like +users.name: 'test' +users.age: 30) would result in ANY document which has those two fields, not exactly referring to ONE user object. Also vizualization was obviously wrong too.
Second solution I found was adding parent-child relationship. But this could be potentially be waste of time as I don't know will it result in correct queries. So I'm asking, if it will be normal solution to my problem?

Found out that Kibana doesn't support nested objects.
But ppadovani made this fork which supports this feature.
https://github.com/homeaway/kibana/tree/nestedSupport-4.5.4

Related

Control which multi fields are queried by default

I have a preexisting index that contains field mappings and is currently being queried by many applications. I would like to add additional ways for the data to be queried, specifically, support full text search via analysis. Multi-fields seemed like the obvious way to do this, but I found that adding new multi-fields actually changes the existing query behavior.
For example, I have an "id" field that is a keyword. Applications are already using this field to query on. After I add a new multi-field, like "txt" (using the standard analyzer), new documents can be found by querying with just a partial value match. Values for "id" look like this: "123-abc" so now a query with just "abc" will match when querying against the "id" field. This is not how it worked previously (the keyword only field would require the entire value "123-abc").
Ideally, the top-level "id" field would be keyword only, and if a "full text" search was required, the query would need to specify "id.txt". So my question is... is there a way to disable multi-fields and require that the query explicitly set a sub field when needed?
My only other thought on how to solve this, was to use copy_to so that these fields are completely distinct... but that is a bit more work and there are many many fields to deal with that would require this.

How to perform a Join query on Elastic Search via springboot/java?

I have a springboot application that interacts with elastic search (or as it know now OpenSearch). It can perform basic operations such as search, index etc. I used this as my base (although I replaced high level client since it is deprecated) and to perform queries, I am using #Query annotation mostly (as described in section 2.2 here, although I also used QueryBuilders).
Now, I have an interesting use case - I would like to perform 2 queries at the same time. First query would find a file in elastic search that would contain 3 ids. These 3 ids are ids of other files in the same elastic search. The 2nd query would look for these 3 files and finally return them to me. Now, I can easily do it in 2 steps:
Have a query to find a file containing 3 ids and return it
Have a second query (multisearch query can do bulk search as I understand) to search
for 3 files using info from the first query.
However, I need them to happen within the same query - so within the same query I need to search for a file containing the 3 ids and then perform a search for these 3 files.
So currently my files in elastic search look like so:
{
"docId": "docId57",
"relatedDocs": [
{
"relatedId": "docId1",
"type": "apple"
},
{
"relatedId": "docId2",
"type": "orange"
},
{
"relatedId": "docId3",
"type": "banana"
}
]
}
and my goal is to have a query that will accept docId57 as an arg (so a method findFilesViaJoin(docId57) or something) and return a list of 3 files: file for docId1, file docId2 and file for docId3.
I know it is possible either via nested queries, child/parent queries or good old SQL queries (via jpa/hibarnate).
I attempted to use all of these and was unsuccessful for reasons described below.
Child/parent queries
So for child/parent queries, I attempted to use DSL with #Query but couldn't quite get it since I don't have a solid documentation to refer to (the one that actually helps with java not curls). After some time I found this and this articles - I maybe can figure out how to make it work with child/parent but neither explain how to do mapping. If this approach can do what I want, my question is: how to set up & map parent/child in springboot.
Using SQL queries
So for this one, I need to change my set up to use hibarnate. I used this as my base. It works, the only problem I have is that my SQL queries get ignored. Instead, the search is done based of a method's name, not the content of #Query. So it is as if I don't have an annotation used at all. So using the structure mentioned above, the following method in my app:
#Query("select t from MyModel t where t.docId = ?1")
findByRelatedDocsRelatedId(String id)
will return files that has a relatedId that matches the id passed via method ard id (as oppose to reading query from #Query that tells method to search all docs based on docId). Now, I don't mind using method name as a query to search for something. But then why would I use #Query for? (not to mention how do I create a name that does join). It might be possible that my hibernate is set up wrong (never used it before this week). So question here is, does anybody have a nice complete example of hibarnate being used with elastic search that does join query?
Nested queries
For these queries, I assume that I just need to figure out what to put inside the #Query but due to limited documentation about how to compose nested query I didn't manage to make it even remotely to work. Any concreate documentation on how to create DSL nested query would be appreciated.
Any of the ways I described will work for me. I think child/parent seems the best choice (seeing as they kind created for this purpose) but any will do.

Custom query with Deepstackai haystack

I am exploring deepset haystack and found it very interesting for multiple use cases like a chatbot, search engine, document search, etc
But have not found any reference where I can create multiple indexes for different documents and search based on indexes. I thought of using meta tags for conditional search(on a particular area) by tagging the documents first and then using the params parameter of query API but the same doesn't seem to work and throws an error(I used its vanilla docker-compose based setup)
You can use multiple indices in the same document store if you want to support multiple use cases, indeed. The write_documents method of the document store has a parameter index so that you can store documents for your different use cases in different indices. In the same way, you can pass an index parameter to the query method.
As you expected, there is an alternative solution that uses the meta field of documents. However, the format needs to be slightly different. Your query needs to have the following format:
{"query": "What's the capital town?", "params": {"filters": {"name": "75_Algeria75.txt"}}}
and your documents need to have the following format:
{'text': 'Algeria is...', 'meta':{'name': "75_Algeria75.txt"}}

How to overcome maxClauseCount error when using multi_match query

I have 10+ Indexes on my Elasticsearch server.
Each Index has 1 or more fields with different kind of Analyzers: keyword, standard, ngram and etc...
For Global search I am using multi_match without specifying any explicit fields.
For querying I am using using elasticsearch-dsl library, the code is bellow:
def search_for_index(indice, term, num_of_result=10):
s = Search(index=indice).sort({"_score": "desc"})
s = s[:num_of_result]
s = s.query('multi_match', query=term, operator='and')
response = s.execute()
return response.to_dict()['hits']['hits']
I get very good result, and search is working just fine, but sometimes someone enters a bit longer text, and I am getting maxClauseCount error.
For example, search that raises an error when search term term is equal to:
term=We are working on your request and will keep you posted at the earliest.
Or any other little longer text raises the same error.
Can you help me figure it out maybe some better approach for my Global search so that I can avoid this kind of error?
First of all - this limitation is there for a reason. The more boolean clauses you have - the heavier search would be. Think of it as crossing (AND) or joining (OR) subset of document ids for each of the clause. This is very heavy operation, that is why initially it has a limit of 1024 clauses.
General recommendation would be to try reduce number of fields you're searching. Maybe you have fields which consist no text data or just have some internal ids. You could cross them out during multi_match query by specifying fields section explicitly.
If you're still decided to go with current approach and you're using Elasticsearch 5.5+ and higher you could alter those by adding following line in elasticsearch.yml and restart your instance.
indices.query.bool.max_clause_count: 250000
If you're using pre-5 version of Elasticsearch the setting is called index.query.bool.max_clause_count

How to retrieve all document ids matching a search, in elastic search?

I'm working on a simple side project, and have a tech stack that involves both a SQL database and ElasticSearch. I only have ElasticSearch because I assumed that as my project grows, my full text searching would be most efficiently performed by ES. My ES schema is very simple - documents that I insert into ES have 2 fields, one being the id and the other being the field with the body of text to search. The id being inserted into ES corresponds to that document's primary key id from the SQL database.
insert record into SQL -> insert record into ES using PK from SQL
Searching would be the reverse of that. Query ES and grab all the matching ids, and then turn around and use those ids to get records from SQL.
search ES can get all PK ids -> use those ids to get documents from SQL
The problem that I am facing though, is that ES can only return documents in a paginated manner. This is a problem because I also have a WHERE clause on my SQL query, beyond just the ids. My SQL query might look like this ...
SELECT * FROM foo WHERE id IN (1,2,3,4,5) AND bar != 'baz'
Well, with ES paginating the results, my WHERE clause will always only be querying a subset of the full results from ES. Even if I utilize ES' skip and take, I'm still only querying SQL using a subset of document ids.
Is there a way to get Elastic Search to only return the entire list of matching document ids? I realize this is here to not allow me to shoot myself in the foot, because doing this across all shards and many many documents is not efficient. Is there no way, though?
After putting in some hours on this project, I've only now realized that I've poorly engineered this, unless I can get all of these ids from ES. Some alternative implementations that I've thought of would be to store the things that I'm filtering on, in SQL, in ES as well. A problem there is that I'd have to update the ES document every time I update the document in SQL. This would require a pretty big rewrite to some of my data access code. I could scrap ElasticSearch all together and just perform searching in Postgres, for now, until I can think of a better way to structure this.
The elasticsearch not support return each and every doc match to you queries. Because it Ll overload the system. Instead of this.. Use scroll concept in elasticsearch.. It's lik cursor concept in db's..
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scan-scroll.html
For more examples refer the Github repo. https://github.com/sidharthancr/elasticsearch-java-client
Hope it helps..
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-fields.html
please have a look into the elastic search document where you can specify only particular fields that return from the match documents
hope this resolves your problem
{
"fields" : ["user", "postDate"],
"query" : {
"term" : { "user" : "kimchy" }
}
}

Resources