JanusGraph with Elasticsearch index is not working

I have added a mixed index in JanusGraph to support full-text search with Elasticsearch.
My mixed index looks like this:
myindex = mgmt.buildIndex("myesindex", Vertex.class)
    .addKey("name", Mapping.TEXTSTRING.asParameter())
    .addKey("sabindex", Mapping.TEXTSTRING.asParameter())
    .buildMixedIndex("search");
I am able to load data into Elasticsearch, and I can execute queries successfully.
The issue is this. When I run the query:
g.V().has('code','abc').valueMap()
==>{str=[some text], code=[abc], sab=[sab], sabindex=[sabindex], name=[[some tex]]}
I get the result successfully, but when I try to search on both name and code:
g.V().has('name', textContains('some text')).has('code','abc').valueMap()
(the code field is also indexed, with a composite index)
I get no results, even though the data is present in both the graph and Elasticsearch.
In another scenario, the same query with a different name and code works. I have also rebuilt the graph several times, without success.

The first query shows the value is name=[[some tex]]. It is missing the final t in text, which explains why the query doesn't match on some text.
If you instead use textContains('some tex'), you get the same result as the first query. Using the profile() step would show that the mixed index (myesindex) was used.
See this gist, which recreates the scenario.
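The mismatch can be illustrated with a simplified model of textContains. This sketch assumes the stored name is the string "some tex", that both value and query are lowercased and split on non-alphanumeric characters, and that every query token must appear among the stored value's tokens. JanusGraph's real tokenizer and the Elasticsearch analyzer may differ in detail.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    # Lowercase, then split on anything that is not a letter or digit
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def text_contains(value: str, query: str) -> bool:
    # Simplified model: every query token must be a token of the stored value
    tokens = set(tokenize(value))
    return all(term in tokens for term in tokenize(query))


stored = "some tex"  # what valueMap() showed for name
print(text_contains(stored, "some text"))  # False: "text" is not a token of "some tex"
print(text_contains(stored, "some tex"))   # True
```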

Related

Cannot use "OR" with "NOT _exists_" in Kibana 6.8.0 search bar

I am trying to write a query in the Kibana search bar to retrieve some specific documents.
The goal is to get the documents where the field "myDate" is on or before 2019-10-08, or where "myDate" does not exist.
I have documents that meet one or the other condition.
I started with this query:
myDate:<=2019-10-08 OR NOT _exists_:myDate
But no documents were returned.
Since that did not work, I tried some other forms I found online:
myDate:<=2019-10-08 OR NOT (_exists_:myDate)
myDate:<=2019-10-08 OR !(_exists_:myDate)
myDate:<=2019-10-08 OR NOT (myDate:*)
But still, no results.
When I use either half of the "OR" condition on its own, it works perfectly: I get either the documents that have myDate <= 2019-10-08 or the ones that have no "myDate" field.
But when I combine both conditions, I get no documents.
I have to use only the search bar to find these documents, not an Elasticsearch REST query or Kibana filters.
Thank you for your help :)
The query below works. Use the Inspect button in Kibana to see which query is actually being sent, and make sure you are using the correct index pattern as well.
(myDate:<=2019-12-31) OR (NOT _exists_:myDate)
Take a look at the Query DSL documentation on Boolean operators for a better understanding of the different use cases.
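A likely reason the unparenthesised form returns nothing is that the query parser turns A OR NOT B into one boolean query with A as a should clause and B as a must_not clause, so a document would need a myDate on or before the date and also no myDate at all. Parentheses make NOT _exists_:myDate its own sub-query. For comparison, here is a rough Query DSL equivalent of the working query, sketched as a Python dict; the field name and date come from the question, and the search bar itself still needs the query-string form.

```python
import json


def date_or_missing_query(field, upper_bound):
    """Bool query: field <= upper_bound OR field does not exist."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Documents whose date is on or before the bound
                    {"range": {field: {"lte": upper_bound}}},
                    # Documents with no value for the field, as its own sub-query
                    {"bool": {"must_not": [{"exists": {"field": field}}]}},
                ],
                # At least one of the two branches has to match
                "minimum_should_match": 1,
            }
        }
    }


print(json.dumps(date_or_missing_query("myDate", "2019-10-08"), indent=2))
```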

How to overcome maxClauseCount error when using multi_match query

I have more than 10 indices on my Elasticsearch server.
Each index has one or more fields with different kinds of analyzers: keyword, standard, ngram, etc.
For global search I am using multi_match without specifying any explicit fields.
For querying I am using the elasticsearch-dsl library; the code is below:
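from elasticsearch_dsl import Search

def search_for_index(indice, term, num_of_result=10):
    # Sort by relevance and keep only the top num_of_result hits
    s = Search(index=indice).sort({"_score": "desc"})
    s = s[:num_of_result]
    # No fields given, so multi_match searches the index's default fields
    s = s.query('multi_match', query=term, operator='and')
    response = s.execute()
    return response.to_dict()['hits']['hits']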
I get very good results and search works fine, but sometimes someone enters slightly longer text and I get a maxClauseCount error.
For example, this search term raises the error:
term=We are working on your request and will keep you posted at the earliest.
Any other slightly longer text raises the same error.
Can you suggest a better approach for my global search so that I can avoid this kind of error?
First of all, this limitation exists for a reason. The more boolean clauses you have, the heavier the search. Think of it as intersecting (AND) or unioning (OR) the set of document ids for each clause. This is a very heavy operation, which is why the default limit is 1024 clauses.
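A rough way to see why longer input trips the limit: without explicit fields, multi_match falls back to the index's default field setting, which in recent versions covers all eligible fields, and each analyzed token tends to become at least one clause per field. The sketch below is a back-of-the-envelope estimate under that assumption, not Lucene's exact accounting; the field names and ngram sizes are hypothetical.

```python
import re


def ngram_count(word, min_gram, max_gram):
    # How many n-grams an ngram tokenizer emits for a single word
    n = len(word)
    return sum(max(n - g + 1, 0) for g in range(min_gram, max_gram + 1))


def estimate_clauses(text, fields):
    """fields maps a field name to None (one token per word, e.g. standard)
    or to a (min_gram, max_gram) tuple for an ngram-analyzed field."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    total = 0
    for analyzer in fields.values():
        for word in words:
            total += 1 if analyzer is None else ngram_count(word, *analyzer)
    return total


# Hypothetical fields; real indices will differ
fields = {"title": None, "body": None, "title_ngram": (2, 3)}
term = "We are working on your request and will keep you posted at the earliest."
print(estimate_clauses(term, fields))
```

Each ngram-analyzed field multiplies the count per word, so a sentence of a dozen words across several such fields can plausibly approach the 1024 default.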
The general recommendation is to reduce the number of fields you're searching. You may have fields that contain no text data or just hold internal ids; you can exclude them from the multi_match query by specifying the fields section explicitly.
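A minimal sketch of that recommendation, written as a plain request body so it runs without a cluster; the field names title and description are hypothetical. With elasticsearch-dsl the same restriction can be expressed by passing fields=[...] to s.query('multi_match', ...).

```python
def global_search_body(term, fields, num_of_result=10):
    """Request body for a multi_match limited to an explicit list of fields."""
    return {
        "size": num_of_result,
        "sort": [{"_score": "desc"}],
        "query": {
            "multi_match": {
                "query": term,
                "fields": fields,  # only the text fields worth searching
                "operator": "and",
            }
        },
    }


# Hypothetical field names; list only the fields that actually hold searchable text
body = global_search_body("We are working on your request", ["title", "description"])
print(body)
```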
If you still decide to keep the current approach and you're using Elasticsearch 5.5 or later, you can raise the limit by adding the following line to elasticsearch.yml and restarting your instance.
indices.query.bool.max_clause_count: 250000
If you're using a pre-5 version of Elasticsearch, the setting is called index.query.bool.max_clause_count.

Aggregation value error in Elastic Search

I am trying to create a date histogram and aggregate a particular field to find its maximum value. The field is mapped as long in my Elasticsearch index, but I get the result as a floating-point number.
For example:
instead of 31032832 I am getting 3.1032832E7.
However, I get 31032832 correctly when I query my Elasticsearch index through the Sense Chrome plugin.
I found the issue: the aggregation returns a double value.
When accessing the result, I called myResult.getMax().longValue(), which solved my problem.
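Metric aggregations such as max compute their result as a double, so the client sees 3.1032832E7 even for a long field. The same conversion in Python might look like this, assuming a raw JSON response body and a hypothetical aggregation named max_size. Doubles only represent integers exactly up to 2**53, so the sketch refuses to convert anything larger.

```python
import json

MAX_EXACT_INT = 2 ** 53  # every integer up to this is exactly representable as a double


def max_as_long(response_body, agg_name):
    """Read a max aggregation result (a double) back as an int."""
    value = json.loads(response_body)["aggregations"][agg_name]["value"]
    if value is None:
        return None  # no document had a value for the field
    if abs(value) > MAX_EXACT_INT:
        raise ValueError("value exceeds 2**53; the double may have lost precision")
    return int(value)


body = '{"aggregations": {"max_size": {"value": 3.1032832E7}}}'
print(max_as_long(body, "max_size"))  # 31032832
```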

How to retrieve all document ids matching a search, in elastic search?

I'm working on a simple side project, and have a tech stack that involves both a SQL database and ElasticSearch. I only have ElasticSearch because I assumed that as my project grows, my full text searching would be most efficiently performed by ES. My ES schema is very simple - documents that I insert into ES have 2 fields, one being the id and the other being the field with the body of text to search. The id being inserted into ES corresponds to that document's primary key id from the SQL database.
insert record into SQL -> insert record into ES using PK from SQL
Searching would be the reverse of that. Query ES and grab all the matching ids, and then turn around and use those ids to get records from SQL.
search ES can get all PK ids -> use those ids to get documents from SQL
The problem I am facing, though, is that ES only returns documents in a paginated manner. This is a problem because I also have a WHERE clause on my SQL query, beyond just the ids. My SQL query might look like this ...
SELECT * FROM foo WHERE id IN (1,2,3,4,5) AND bar != 'baz'
Well, with ES paginating the results, my WHERE clause will always only be querying a subset of the full results from ES. Even if I utilize ES' skip and take, I'm still only querying SQL using a subset of document ids.
Is there a way to get Elasticsearch to return the entire list of matching document ids? I realize this limit exists to stop me from shooting myself in the foot, because doing this across all shards and many, many documents is not efficient. Is there no way, though?
After putting some hours into this project, I've only now realized that I've engineered this poorly, unless I can get all of these ids from ES. One alternative I've thought of is to store the things I'm filtering on in SQL in ES as well. The problem there is that I'd have to update the ES document every time I update the document in SQL, which would require a fairly big rewrite of some of my data access code. I could also scrap Elasticsearch altogether and just perform searching in Postgres for now, until I can think of a better way to structure this.
Elasticsearch does not support returning every document that matches your query in a single response, because that would overload the system. Instead, use the scroll API in Elasticsearch; it is similar to a cursor in a database.
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scan-scroll.html
For more examples, refer to this GitHub repo: https://github.com/sidharthancr/elasticsearch-java-client
Hope it helps.
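A minimal sketch of the scroll loop in Python, assuming a client object whose search, scroll and clear_scroll methods accept the keyword arguments used by elasticsearch-py 7.x (newer client versions rename some of them). It disables _source, since only the ids are needed, and follows the cursor until a page comes back empty. elasticsearch-py also ships elasticsearch.helpers.scan, which wraps this same loop.

```python
def all_matching_ids(client, index, query, page_size=1000, keep_alive="2m"):
    """Collect every matching _id by following the scroll cursor to the end."""
    ids = []
    resp = client.search(index=index, scroll=keep_alive, size=page_size,
                         body={"query": query, "_source": False})
    scroll_id = resp.get("_scroll_id")
    try:
        while True:
            hits = resp["hits"]["hits"]
            if not hits:
                break  # an empty page means the cursor is exhausted
            ids.extend(hit["_id"] for hit in hits)
            resp = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        if scroll_id:
            # Release the server-side scroll context instead of waiting for it to expire
            client.clear_scroll(scroll_id=scroll_id)
    return ids
```

The resulting id list can then be passed to the SQL query in one go, so the WHERE clause sees the full result set rather than one page of it.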
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-fields.html
Please have a look at the Elasticsearch documentation, where you can specify which fields are returned from the matching documents.
Hope this resolves your problem.
{
  "fields" : ["user", "postDate"],
  "query" : {
    "term" : { "user" : "kimchy" }
  }
}

Elasticsearch not searching some fields

I have just updated a website; the update adds new fields to Elasticsearch.
In my dev environment it all works fine, but on the live site the new fields are not being found.
For example, I have added a new field with the value 1.
However, when I add a filtered query of
{"field":1}
it does not find any matching results.
When I look in the documents, I can see docs with the field set to 1.
Could the reason be that the new field was added after the mapping was set? I am not very familiar with Elasticsearch, so I am not really sure where to start looking to fix it.
Any help would be appreciated.
Update:
Querying from the URL shows nothing either:
_search/?pretty=true&size=50&q=field1:*
However, there is another field that was added at the same time which I can search on.
I can see field1 in the result set, but it just won't let me search on it.
The only difference I see in the mapping is that the working field is set to type:long, whereas the one that does not work is set to type:string.
Is it a length issue with the ngram? What is your "min_gram" setting?
You can check your index settings like this:
GET <host>/<index_name>/_settings
Does it work when you filter for a two digit field?
Are all the field values one digit?
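To illustrate the min_gram question: if the string field is analyzed with an ngram tokenizer whose min_gram is larger than the value's length, a one-character value such as 1 produces no tokens, so nothing is indexed for it and nothing can match. A minimal sketch, assuming a hypothetical min_gram of 2 and max_gram of 3 (token order is ignored here):

```python
def ngrams(value, min_gram, max_gram):
    """Return the n-grams an ngram tokenizer would produce for one token."""
    return [value[i:i + n]
            for n in range(min_gram, max_gram + 1)
            for i in range(len(value) - n + 1)]


print(ngrams("1", 2, 3))   # []: a one-digit value yields no tokens at all
print(ngrams("12", 2, 3))  # ['12']: a two-digit value is indexed
```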
It's OK to add a field after the mapping was set; Elasticsearch will guess the mapping for you. (In fact, it's one of its selling points: no need to define the mapping, just throw the data at it.)
There are a few things that can go wrong:
Verify that the data is actually in the index. To do that, navigate to the _search URL with no parameters; you should see the field if it is indexed.
Look at your mapping. Could it be that the field is explicitly set not to be indexed?
Another possibility is that your query is wrong (but that is unlikely, since you say it works in the development environment).
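For the mapping check, a small helper can scan the JSON returned by GET <host>/<index_name>/_mapping for the field and report whether indexing was switched off. This is a sketch under the assumption that a disabled field shows "index": "no" (Elasticsearch 1.x/2.x, where type:string exists) or "index": false (5.x and later); the sample mapping is hypothetical.

```python
def find_field_mappings(mapping, field):
    """Yield every mapping block for `field` found anywhere under "properties"."""
    if isinstance(mapping, dict):
        props = mapping.get("properties")
        if isinstance(props, dict) and field in props:
            yield props[field]
        for value in mapping.values():
            yield from find_field_mappings(value, field)


def is_indexed(field_mapping):
    # "index": "no" (1.x/2.x) and "index": false (5.x+) both disable indexing
    return field_mapping.get("index") not in ("no", False)


# Hypothetical _mapping response with a doc type level, as in 1.x/2.x
sample = {"myindex": {"mappings": {"doc": {"properties": {
    "field1": {"type": "string", "index": "no"},
    "field2": {"type": "long"},
}}}}}
for name in ("field1", "field2"):
    print(name, [is_indexed(m) for m in find_field_mappings(sample, name)])
```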
