How to retrieve all document ids matching a search, in elastic search? - elasticsearch

I'm working on a simple side project, and have a tech stack that involves both a SQL database and ElasticSearch. I only have ElasticSearch because I assumed that as my project grows, my full text searching would be most efficiently performed by ES. My ES schema is very simple - documents that I insert into ES have 2 fields, one being the id and the other being the field with the body of text to search. The id being inserted into ES corresponds to that document's primary key id from the SQL database.
insert record into SQL -> insert record into ES using PK from SQL
Searching would be the reverse of that. Query ES and grab all the matching ids, and then turn around and use those ids to get records from SQL.
search ES can get all PK ids -> use those ids to get documents from SQL
The problem that I am facing though, is that ES can only return documents in a paginated manner. This is a problem because I also have a WHERE clause on my SQL query, beyond just the ids. My SQL query might look like this ...
SELECT * FROM foo WHERE id IN (1,2,3,4,5) AND bar != 'baz'
Well, with ES paginating the results, my WHERE clause will always only be querying a subset of the full results from ES. Even if I utilize ES' skip and take, I'm still only querying SQL using a subset of document ids.
Is there a way to get Elastic Search to only return the entire list of matching document ids? I realize this is here to not allow me to shoot myself in the foot, because doing this across all shards and many many documents is not efficient. Is there no way, though?
After putting in some hours on this project, I've only now realized that I've poorly engineered this, unless I can get all of these ids from ES. Some alternative implementations that I've thought of would be to store the things that I'm filtering on, in SQL, in ES as well. A problem there is that I'd have to update the ES document every time I update the document in SQL. This would require a pretty big rewrite to some of my data access code. I could scrap ElasticSearch all together and just perform searching in Postgres, for now, until I can think of a better way to structure this.

The elasticsearch not support return each and every doc match to you queries. Because it Ll overload the system. Instead of this.. Use scroll concept in elasticsearch.. It's lik cursor concept in db's..
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scan-scroll.html
For more examples refer the Github repo. https://github.com/sidharthancr/elasticsearch-java-client
Hope it helps..

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-fields.html
please have a look into the elastic search document where you can specify only particular fields that return from the match documents
hope this resolves your problem
{
"fields" : ["user", "postDate"],
"query" : {
"term" : { "user" : "kimchy" }
}
}

Related

How to perform a Join query on Elastic Search via springboot/java?

I have a springboot application that interacts with elastic search (or as it know now OpenSearch). It can perform basic operations such as search, index etc. I used this as my base (although I replaced high level client since it is deprecated) and to perform queries, I am using #Query annotation mostly (as described in section 2.2 here, although I also used QueryBuilders).
Now, I have an interesting use case - I would like to perform 2 queries at the same time. First query would find a file in elastic search that would contain 3 ids. These 3 ids are ids of other files in the same elastic search. The 2nd query would look for these 3 files and finally return them to me. Now, I can easily do it in 2 steps:
Have a query to find a file containing 3 ids and return it
Have a second query (multisearch query can do bulk search as I understand) to search
for 3 files using info from the first query.
However, I need them to happen within the same query - so within the same query I need to search for a file containing the 3 ids and then perform a search for these 3 files.
So currently my files in elastic search look like so:
{
"docId": "docId57",
"relatedDocs": [
{
"relatedId": "docId1",
"type": "apple"
},
{
"relatedId": "docId2",
"type": "orange"
},
{
"relatedId": "docId3",
"type": "banana"
}
]
}
and my goal is to have a query that will accept docId57 as an arg (so a method findFilesViaJoin(docId57) or something) and return a list of 3 files: file for docId1, file docId2 and file for docId3.
I know it is possible either via nested queries, child/parent queries or good old SQL queries (via jpa/hibarnate).
I attempted to use all of these and was unsuccessful for reasons described below.
Child/parent queries
So for child/parent queries, I attempted to use DSL with #Query but couldn't quite get it since I don't have a solid documentation to refer to (the one that actually helps with java not curls). After some time I found this and this articles - I maybe can figure out how to make it work with child/parent but neither explain how to do mapping. If this approach can do what I want, my question is: how to set up & map parent/child in springboot.
Using SQL queries
So for this one, I need to change my set up to use hibarnate. I used this as my base. It works, the only problem I have is that my SQL queries get ignored. Instead, the search is done based of a method's name, not the content of #Query. So it is as if I don't have an annotation used at all. So using the structure mentioned above, the following method in my app:
#Query("select t from MyModel t where t.docId = ?1")
findByRelatedDocsRelatedId(String id)
will return files that has a relatedId that matches the id passed via method ard id (as oppose to reading query from #Query that tells method to search all docs based on docId). Now, I don't mind using method name as a query to search for something. But then why would I use #Query for? (not to mention how do I create a name that does join). It might be possible that my hibernate is set up wrong (never used it before this week). So question here is, does anybody have a nice complete example of hibarnate being used with elastic search that does join query?
Nested queries
For these queries, I assume that I just need to figure out what to put inside the #Query but due to limited documentation about how to compose nested query I didn't manage to make it even remotely to work. Any concreate documentation on how to create DSL nested query would be appreciated.
Any of the ways I described will work for me. I think child/parent seems the best choice (seeing as they kind created for this purpose) but any will do.

Reverse searching with Elastic Search

We have a site and want to give the users the oportunity to save a search query and be notified once an object have been added that would have been a hit they might be interested in.
We Have an index that contains search queries that the users have saved. Every time a new object is added to the object index, we want to do a reverse search in order to find the search queries that would have resulted in a hit for that object. This is in order to avoid doing one search for each saved query every time an object is added.
The problem is that the object contains all data, but the search queries only contain the properties that are interesting. So we are getting zero hits for most queries.
Example:
Search query:
{
"make": "foo",
"model": "bar
}
Newly added object:
{
"make": "foo",
"model: "bar",
"type": "jazz"
}
As you can see, the user is interested in any object with make "foo" and model "bar", and we want a query that would result in a hit because type "jazz" is missing in the index. What we get is zero hits.
We use the nest client version 7.13.0 in a dotnet6 application and Elastic Search version 7.13.4.
Would it be possible to reverse search so that a null in the index would be considered as a hit for any search query?
Thank you
You can achieve this with Percolate Query in Elasticsearch.
I have recently written blog on Percolate Query where I have explained with an example.
You can save a user query with Percolate query and when you index document at that time you can call search API and check if any query is matched the document or not. As you are using Nest client this will be easy to implement.

Messages aggregation in elasticsearch

For example I have next documents.
{sourceIP:1.1.1.1, destIP:2.2.2.2}
{sourceIP:1.1.1.1, destIP:3.3.3.3}
{sourceIP:1.1.1.1, destIP:4.4.4.4}
Is there anyway to automatically aggregate them into one document which will contain next data?
{sourceIP:1.1.1.1, destIP:{2.2.2.2,3.3.3.3,4.4.4.4}}
So it looks like group by in SQL, but generate new documents in elasticsearch instead of old one.
I dont think there is anyway to do indexing time auto-merging of documents.
However , it should be possible to acheive whatever result you are planning to query should be possible by using one of querying options offered by Elasticsearch - while indexing one document for ,
Like ..
You can index seperate documents, query by sourceIP and use aggregations to give dest_ip
Take count of documents if its just to find dest_ips for a source_ip
Also if you want to avoid duplicate source_id + dest_id combinations , you can concat and use it as _id of document
Hope this helps.

Updating filtered documents in elasticsearch

I want to know if there is a way to update elasticsearch documents after filtering them out.
Let's say I have a user collection with following documents:
[
{ "name":"u1","age":23},
{ "name":"u2","age":31},
{ "name":"u3","age":27},
{ "name":"u4","age":33}
]
Now what I need to do is update the names of all the users who have ages above 30.
Looking at a lot of documentation and searching for hours on google, including the following document
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_updating_documents.html
I couldn't find a way to do it. So if we look into the docs, we are providing the id of the document, so it doesn't suite my need. Is there a way to do this sort do this sort of stuff in Elasticsearch?
From the link you provided:
Note that as of this writing, updates can only be performed on a
single document at a time. In the future, Elasticsearch will provide
the ability to update multiple documents given a query condition (like
an SQL UPDATE-WHERE statement).
So, this is not supported at the moment. But you can consider taking a look at this plugin: https://github.com/yakaz/elasticsearch-action-updatebyquery/.

Do mongo find queries perform faster with more criteria?

Does performance improve by limiting the find (or findOne) with more criteria?
An example:
db.users.find({_id : ObjectId("111111111111111111111111")})
db.users.find({_id : ObjectId("111111111111111111111111"), accountId : ObjectId("22222222222222222222222")})
Another example:
db.users.find({full_name: 'Lionel Messi'})
db.users.find({full_name : 'Lionel Messi', first_name : 'Lionel', last_name : 'Messi' })
Typically, no. Because mongoDB tends to return a cursor of the first N values found, if you're being more specific, it will take longer to find values matching that criteria.
If you want to see what could be effecting the speed of your query, its a good idea to use the explain() method.
See here for more details: http://docs.mongodb.org/manual/tutorial/analyze-query-plan/
No since you are using _id which is unique.
As for making the query slower: it could be slower by nanoseconds at most if there is not a compound index on {_id, accountId} since once the documents by the _id index have been found they will be loaded into memory to match the accountId field.
MongoDB will find by index before looking at fields which are not witin the selected index.
However since your query (being uncovered) will load the document prior to returning anyway the only thing slowing the query down is that final match which is basically negliable in speed.
In this case no. _id is indexed automatically and it uniquely identifies documents. The first criteria
{_id : ObjectId("111111111111111111111111")}
will find the document using the index. Checking the value of accountId will actually make the query slower because MongoDB has to check an other value.

Resources