Elasticsearch query with wildcards

Using the data from the Elasticsearch tutorials as an example, the following URI search hits 9 records:
curl -XGET 'remotehost:9200/bank/_search?q=city:R*d&_source_include=city&pretty'
while the following request body search hits 0 records:
curl -XGET 'remotehost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": { "wildcard": { "city": "R*d" } },
  "_source": ["city"]
}
'
But the two methods should be equivalent to each other. Any idea why this is happening? I use Elasticsearch 5.5.1 in Docker.

You can get your expected result with the command below. It adds an extra .keyword suffix to the city field in your query.
curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": { "wildcard": { "city.keyword": "R*d" } },
  "_source": ["city"]
}
'
Reason for adding .keyword
When you index data into Elasticsearch, you will notice a .keyword sub-field, and that field is not_analyzed. By default, the field you indexed into is analyzed with the standard analyzer, and a multi-field named .keyword is added alongside it. So if you create a field city with data, Elasticsearch creates city with the standard analyzer and adds a multi-field city.keyword, which is not_analyzed.
In your case a wildcard query needs a not_analyzed field, so your query should target city.keyword, which is not_analyzed by default.
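You can confirm this by inspecting the index mapping. A quick check (the excerpt below is what I would expect from the default dynamic mapping of the tutorial data):
curl -XGET 'remotehost:9200/bank/_mapping?pretty'
The response should contain something like:
"city": {
  "type": "text",
  "fields": {
    "keyword": { "type": "keyword", "ignore_above": 256 }
  }
}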
In the first case, you sent a GET request to Elasticsearch with the query as a URI parameter. Elasticsearch parses the q= parameter into a query_string query internally, which treats the wildcard term differently from a raw wildcard query on the analyzed city field.
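For illustration, the URI search above is roughly equivalent to the following request body search (a sketch of that internal conversion):
curl -XGET 'remotehost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": { "query_string": { "query": "city:R*d" } },
  "_source": ["city"]
}
'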
For an authoritative reference, see the official docs:
The string field has split into two new types: text, which should be
used for full-text search, and keyword, which should be used for
keyword search.
To make things better, Elasticsearch decided to borrow an idea that
initially stemmed from Logstash: strings will now be mapped both as
text and keyword by default. For instance, if you index the
following simple document:
{
  "foo": "bar"
}
Then the following dynamic mappings will be created:
{
  "foo": {
    "type": "text",
    "fields": {
      "keyword": {
        "type": "keyword",
        "ignore_above": 256
      }
    }
  }
}
As a consequence, it will both be possible to perform full-text search
on foo, and keyword search and aggregations using the foo.keyword
field.
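Putting this together, here is a sketch of both kinds of search against the document above (my_index is a placeholder for whichever index it was indexed into):
# full-text search on the analyzed field
curl -XGET 'localhost:9200/my_index/_search?pretty' -H 'Content-Type: application/json' -d'
{ "query": { "match": { "foo": "bar" } } }
'
# exact match on the not-analyzed sub-field
curl -XGET 'localhost:9200/my_index/_search?pretty' -H 'Content-Type: application/json' -d'
{ "query": { "term": { "foo.keyword": "bar" } } }
'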

Related

Trino/Presto with Elastic: how to search nested objects?

I'm new to Trino and I'm trying to use it to query nested objects in Elasticsearch.
This is my mapping in Elasticsearch:
{
  "product_index": {
    "mappings": {
      "properties": {
        "id": { "type": "keyword" },
        "name": { "type": "keyword" },
        "linked_products": {
          "type": "nested",
          "properties": {
            "id": { "type": "keyword" }
          }
        }
      }
    }
  }
}
I need to perform a query on the id field under linked_products.
What is the syntax in Trino for a query on that field?
Do I need any special definitions on the target index mapping in Elastic to map the nested section for Trino?
=========================================================
Hi,
I will try to add some clarifications to my question.
We are trying to query the data according to the id field.
This is the query in Elastic:
GET product_index/_search
{
  "query": {
    "nested": {
      "path": "linked_products",
      "query": {
        "bool": {
          "should": [
            { "match": { "linked_products.id": 123 } }
          ]
        }
      }
    }
  }
}
We tried to query the id field in two ways:
Trino query:
select count(*)
from es_table aaa
where any_match(aaa.linked_products, x -> x.id = 123)
When we query on the id field, pushdown to Elastic doesn't happen and the connector retrieves all the documents into Trino (this only happens with queries on nested documents).
Sending an ES query from Trino to Elastic:
SELECT * FROM es.default."$query:"
It works, but when we try to retrieve ids that match many documents we get a timeout from the Elastic client.
I can't tell from the documentation whether scrolling is possible with the es-query approach, to avoid the timeout problem.
Trino maps the nested object type to a ROW the same way it maps a standard object type during a read. The nested designation itself serves no purpose to Trino, since it only determines how the object is stored in Elasticsearch.
Assume we push the following document to your index.
curl -X POST "localhost:9200/product_index/_doc?pretty" \
-H 'Content-Type: application/json' -d'
{
  "id": "1",
  "name": "foo",
  "linked_products": {
    "id": "123"
  }
}
'
The way you would read this out in Trino would just be to use the standard ROW syntax.
SELECT
id,
name,
linked_products.id
FROM elasticsearch.default.product_index;
Result:
|id |name|id |
|---|----|---|
|1 |foo |123|
This is fine and well, but judging from the fact that the name of your nested object is plural, I'll assume you want to store an array of objects like so.
curl -X POST "localhost:9200/product_index/_doc?pretty" -H 'Content-Type: application/json' -d'
{
  "id": "2",
  "name": "bar",
  "linked_products": [
    { "id": "123" },
    { "id": "456" }
  ]
}
'
If you run the same query as above, with the second document inserted, you'll get the following error.
SQL Error [58]: Query failed (#20210604_202723_00009_nskc4): Expected object for field 'linked_products' of type ROW: [{id=123}, {id=456}] [ArrayList]
This is because Trino has no way of knowing which fields are arrays from the default Elasticsearch mapping. To enable querying over this array, you'll need to follow the instructions in the docs to explicitly identify that field as an Array type in Trino using the _meta field. Here is the command that would be used in this example to identify linked_products as an ARRAY.
curl --request PUT \
  --url localhost:9200/product_index/_mapping \
  --header 'content-type: application/json' \
  --data '
{
  "_meta": {
    "presto": {
      "linked_products": {
        "isArray": true
      }
    }
  }
}'
Now you will need to account for the fact that linked_products is an ARRAY of ROW in your SELECT statement. Not all of the indexes will have values, so you should use the index-safe element_at function to avoid errors.
SELECT
id,
name,
element_at(linked_products, 1).id AS id1,
element_at(linked_products, 2).id AS id2
FROM elasticsearch.default.product_index;
Result:
|id |name|id1|id2 |
|---|----|---|----|
|1 |foo |123|NULL|
|2 |bar |123|456 |
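As an aside, if you want one output row per linked product instead of addressing fixed positions, Trino's CROSS JOIN UNNEST expands an ARRAY of ROW into columns of the ROW's fields. A sketch against the same table:
SELECT
  p.id,
  p.name,
  t.linked_id
FROM elasticsearch.default.product_index AS p
CROSS JOIN UNNEST(p.linked_products) AS t(linked_id)
With the two documents above, this would return (1, foo, 123), (2, bar, 123), and (2, bar, 456).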
=========================================================
Update to answer @gil bob's updated question.
There is currently no support for pushdown aggregates in the Elasticsearch connector, but this is being added in PR 7131.
As a workaround until pushdown support lands, you can set the elasticsearch.request-timeout property in your elasticsearch.properties file to increase the request timeout. If it's taking Elasticsearch this long to return the data, this will need to be set whether you run the aggregation in Trino or in Elasticsearch.
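For example, in the catalog's elasticsearch.properties file (the timeout value here is purely illustrative):
elasticsearch.request-timeout=2m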

How to create a document in Elasticsearch to save data and to search it?

Here is my requirement. I have three levels of data, shown below, which I am getting from a DB. When I search for Developer, I should get all of Developer's data2 values (GEO and GRAPH) in a list; likewise for Support, the list should contain SERVER and Data. Then, based on the selection from data1, it should be possible to search data3: for example, when Developer is selected, GeoPos and GraphPos should be searchable.
The logic I need to implement here is in Elasticsearch.
data1      data2   data3
Developer  GEO     GeoPos
Developer  GRAPH   GraphPos
Support    SERVER  ServerPos
Support    Data    DataPos
This is what I have done to create the index and get the values:
curl -X PUT http://localhost:9200/mapping_log -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "data1": { "type": "text", "fields": { "keyword": { "type": "keyword" } } },
      "data2": { "type": "text", "fields": { "keyword": { "type": "keyword" } } },
      "data3": { "type": "text", "fields": { "keyword": { "type": "keyword" } } }
    }
  }
}
'
For searching the values, I am not sure what I am going to get. Can you please help with the search DSL query too?
curl -X GET "localhost:9200/mapping_log/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "data1.data2": "product"
    }
  }
}
'
How do I create documents for this type of data? Can I create the JSON and post it through Postman or curl?
If your documents are not yet indexed in Elasticsearch, you first need to ingest them into an existing index with the aid of Logstash; you can find many configuration files for your input database.
Before ingesting your documents, create an index in Elastic with a multi-field mapping. You could also use dynamic mapping (Elastic's default) and adapt your DSL query, but I recommend a multi-field mapping like the following:
PUT /mapping
{
  "mappings": {
    "properties": {
      "rating": { "type": "float" },
      "content": { "type": "text" },
      "author": {
        "properties": {
          "name": { "type": "text" },
          "email": { "type": "keyword" }
        }
      }
    }
  }
}
Then you can query the fields in Kibana Dev Tools with a DSL query like the one below:
GET /mapping/_search
{
  "query": {
    "match": { "author.email": "SOMEMAIL" }
  }
}
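To answer the last part of the question: yes, you can create the document as JSON and POST it with curl or Postman. A minimal sketch, assuming the mapping_log index from the question:
curl -X POST "localhost:9200/mapping_log/_doc?pretty" -H 'Content-Type: application/json' -d'
{
  "data1": "Developer",
  "data2": "GEO",
  "data3": "GeoPos"
}
'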

Elasticsearch 6.2: terms query requires lowercase input when searching on keyword

I've created an example index, with the following mapping:
{
  "_doc": {
    "_source": {
      "enabled": false
    },
    "properties": {
      "status": { "type": "keyword" }
    }
  }
}
And indexed a document:
{"status": "CMP"}
When searching for documents with this status using a terms query, I find no results:
{
  "query": {
    "terms": { "status": ["CMP"] }
  }
}
However, if I make the same query by putting the input in lowercase, I will find my document:
{
  "query": {
    "terms": { "status": ["cmp"] }
  }
}
Why is that? Since I'm searching on a keyword field, the indexed content should not be analyzed and should match the uppercase value.
To add to @Oliver Charlesworth's answer: now, in Elastic 6.x, you can continue to use a keyword datatype while lowercasing your text with a normalizer (see the docs). In either case you would have to change your index mapping and reindex your docs.
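A minimal sketch of such a mapping (the index and normalizer names are illustrative):
PUT /example-index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "status": {
          "type": "keyword",
          "normalizer": "lowercase_normalizer"
        }
      }
    }
  }
}
Since the normalizer is applied both at index time and to term-level queries, a terms query for either "CMP" or "cmp" would then match.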
The index and mapping creation and the search were part of a test suite. It seems the setup part of the test suite was not executed, so the mapping was not applied to the index.
The index was then using the default dynamic types instead of the mapped types, resulting in analyzed string fields instead of keywords.
After fixing the setup method of the automated tests, the mappings are applied to the index correctly, and the uppercase value "CMP" for the status now matches documents.
The symptoms you're seeing shouldn't occur unless something else is wrong.
A keyword field is not analysed, so your index should contain only CMP. A terms query is likewise not analysed, so the index is searched only for CMP. Hence there should be a match.

Is it possible to return the analyzed fields in an ElasticSearch >2.0 search?

This question feels very similar to an old question posted here: Retrieve analyzed tokens from ElasticSearch documents, but to see whether anything has changed I thought it would make sense to post it again for the latest version of ElasticSearch.
We are searching bodies of text in ElasticSearch, with the search query and field mapping using the snowball stemmer built into ElasticSearch. The performance and results are great, but because we need the stemmed text body for post-analysis, we would like the search result to return the actual stemmed tokens for the text field of each document.
The mapping for the field currently looks like:
"TitleEnglish": {
"type": "string",
"analyzer": "standard",
"fields": {
"english": {
"type": "string",
"analyzer": "english"
},
"stemming": {
"type": "string",
"analyzer": "snowball"
}
}
}
The search query is performed specifically on TitleEnglish.stemming. Ideally I would like it to return that field, but requesting it returns the original value, not the analyzed tokens.
Does anybody know of a way to do this? We have looked at term vectors, but they only seem to be returnable for individual documents or a body of documents, not for a search result.
Or perhaps other solutions like Solr or Sphinx do offer this option?
To add some extra information: if we run the following query:
GET /_analyze?analyzer=snowball&text=Eight issue of Industrial Lorestan eliminate barriers to facilitate the Committees review of
it returns the stemmed words: eight, issu, industri, etc. This is exactly the result we would like back for each matching document, for all of the words in the text (so not just the matches).
Unless I'm missing something evident, why not simply return a terms aggregation on the TitleEnglish.stemming field?
{
  "query": {...},
  "aggs": {
    "stems": {
      "terms": {
        "field": "TitleEnglish.stemming",
        "size": 50
      }
    }
  }
}
Adding that aggregation to your query, you'd get a breakdown of all the stemmed terms in the TitleEnglish.stemming sub-field from the documents that matched your query.
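For completeness, the per-document route mentioned in the question (term vectors) looks like this; the index, type, and document id are illustrative:
GET /my_index/my_type/1/_termvectors?fields=TitleEnglish.stemming
This returns the analyzed tokens for a single document, so it has to be issued once per document rather than as part of the search response.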

Elasticsearch hyphen issue with term filter

I have the following Elasticsearch query with only a term filter. My query is much more complex, but I am just trying to show the issue here.
{
  "filter": {
    "term": {
      "field": "update-time"
    }
  }
}
When I pass a hyphenated value to the filter, I get zero results back, but with an unhyphenated value I do get results back. I am not sure the hyphen itself is the issue, but my scenario makes me believe so.
Is there a way to escape the hyphen so the filter returns results? I have tried escaping the hyphen with a backslash, which I read about on the Lucene forums, but that didn't help.
Also, if I pass a GUID value into this field which is hyphenated and surrounded by curly braces, something like {ASD23-34SD-DFE1-42FWW}, would I need to lowercase the alphabetic characters, and would I need to escape the curly braces too?
Thanks
I would guess that your field is analyzed, which is the default setting for string fields in Elasticsearch. As a result, the value is not indexed as the single term "update-time" but as two terms: "update" and "time". That's why your term search cannot find it. If the field will always contain values that have to be matched completely as-is, it would be best to define it in the mapping as not_analyzed. You can do that by recreating the index with a new mapping:
curl -XPUT http://localhost:9200/your-index -d '{
  "mappings": {
    "your-type": {
      "properties": {
        "field": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}'
curl -XPUT http://localhost:9200/your-index/your-type/1 -d '{
  "field": "update-time"
}'
curl -XPOST http://localhost:9200/your-index/your-type/_search -d '{
  "filter": {
    "term": {
      "field": "update-time"
    }
  }
}'
Alternatively, if you want some flexibility in finding records based on this field, you can keep it analyzed and use text queries instead:
curl -XPOST http://localhost:9200/your-index/your-type/_search -d '{
  "query": {
    "text": {
      "field": "update-time"
    }
  }
}'
Please keep in mind that if your field is analyzed, then this record will also be found by searching for just the word "update" or the word "time".
The accepted answer didn't work for me with Elastic 6.1. I solved it using the keyword sub-field that Elastic provides by default on string fields.
{
  "filter": {
    "term": {
      "field.keyword": "update-time"
    }
  }
}
Based on the answer by @imotov: if you're using spring-data-elasticsearch, all you need to do is mark your field as
@Field(type = FieldType.String, index = FieldIndex.not_analyzed)
instead of
@Field(type = FieldType.String)
Note that you need to drop the index and re-create it with the new mappings, though.
