How to store URLs in Elasticsearch for fast access? - elasticsearch

I would like to use Elasticsearch for a website and need a fast way to retrieve documents via the page's URL strings - actually paths (e.g. /shoes/sneakers/nike). The paths are unique.
The following solutions come to mind:
Store as string, indexed, not analyzed
Store in the _id field
Which one would be the better solution, and are there perhaps better methods?
Thanks!

You can store the url field as the keyword datatype and use the query below to get the results.
https://www.elastic.co/guide/en/elasticsearch/reference/master/keyword.html
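For reference, a minimal sketch of such a mapping plus a sample document (the index name index is just a placeholder that matches the search example below):
PUT index
{
  "mappings": {
    "properties": {
      "url": {
        "type": "keyword"
      }
    }
  }
}
PUT index/_doc/1
{
  "url": "/shoes/sneakers/nike"
}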
POST index/_search
{
  "query": {
    "term": {
      "url": "/shoes/sneakers/nike"
    }
  }
}
If you store it as the text datatype, Elasticsearch (via its default dynamic mapping) will automatically create a keyword sub-field for you, and you can use the query below to get the results.
Mapping created by Elasticsearch:
"url": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
Query to search:
POST index/_search
{
  "query": {
    "term": {
      "url.keyword": "/shoes/sneakers/nike"
    }
  }
}

{
  "query": {
    "match_phrase": {
      "url": "/shoes/sneakers/nike"
    }
  }
}
You need to store the url field with its values; this match_phrase query will solve your problem.
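A caveat, assuming the url field is analyzed with the default standard analyzer: match_phrase compares token sequences, so a longer path containing the same tokens in order would also match. For example, given another document:
PUT index/_doc/2
{
  "url": "/shoes/sneakers/nike/air"
}
the match_phrase query above would return it as well, because the slashes are stripped during analysis and the token sequence shoes, sneakers, nike occurs in both paths; the term query on a keyword field does not have this problem.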

Related

Elastic Search Aggregations not working with subfield keyword

I am new to ES and am trying to work with facet creation, but I am stuck at the first step. I have an index with a "text" field that has a sub-field of type "keyword", shown below as an example.
PUT my-index
{
  "mappings": {
    "properties": {
      "flavor": {
        "type": "text",
        "fields": {
          "text": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
I have 5 documents indexed, and I am trying to create buckets using the query below:
GET my-index/_search
{
  "aggs": {
    "my_buckets": {
      "terms": {
        "field": "flavor.keyword"
      }
    }
  }
}
I do not get any error, but I also do not get any buckets. I also tried it the reverse way, i.e. making flavor a keyword field with text as the sub-field. That method works and gives me the desired result, but the problem is that I cannot change the original mapping. I was expecting to get results by using flavor.keyword, but I have hit a dead end. Can someone please explain this behavior and how I can make it work?
You named the sub-field text, so you have to refer to it as flavor.text.
GET my-index/_search
{
  "aggs": {
    "my_buckets": {
      "terms": {
        "field": "flavor.text"
      }
    }
  }
}
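If in doubt about how a sub-field is actually named, the existing mapping can be inspected; a quick check:
GET my-index/_mapping
For the mapping above this shows flavor with a single sub-field named text of type keyword, which is why flavor.keyword does not exist and flavor.text has to be used.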

Find all entries on a list within Kibana via Elasticsearch Query DSL

Could you please help me with this? My Kibana database within "Discover" contains a list of trades. I now want to find all trades within this DB that have been done in specific instruments (ISIN numbers). When I add a filter manually and switch to Elasticsearch Query DSL, I find the following:
{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "obdetails.isin": "CH0253592783"
          }
        },
        {
          "match_phrase": {
            "obdetails.isin": "CH0315622966"
          }
        },
        {
          "match_phrase": {
            "obdetails.isin": "CH0357659488"
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
Since I want to check the DB for more than 200 ISINs, this seems inefficient. Is there a way in which I could just say "show me the trade if it contains one of the following 200 ISINs"?
I already googled and tried this, which did not work:
{
  "query": {
    "terms": {
      "obdetails.isin": [ "CH0357659488", "CH0315622966" ],
      "boost": 1.0
    }
  }
}
The query works, but does not show any results.
To conclude: a field of type text is analyzed, which basically converts the given data into a list of terms using the configured analyzers, rather than keeping it as a single term.
This behavior causes the terms query not to match these values.
Rather than changing the type of the field, one may add an additional sub-field of type keyword. That way a terms query can be performed while still having the ability to run match queries on the field.
{
  "isin": {
    "type": "text",
    "fields": {
      "keyword": {
        "type": "keyword",
        "ignore_above": 256
      }
    }
  }
}
The above example will add an extra field called obdetails.isin.keyword, which can be used for terms queries, while still being able to run match queries on obdetails.isin.
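With that sub-field in place, the 200+ ISINs can go into a single terms query; a sketch using the ISINs from the question (the index name trades is only a placeholder):
POST trades/_search
{
  "query": {
    "terms": {
      "obdetails.isin.keyword": [ "CH0253592783", "CH0315622966", "CH0357659488" ]
    }
  }
}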

handling both exact and partial search on the same search string

I want to define a schema that can handle both partial and exact search for the same search value.
The exact search should always return the exact match; ES should not break the search string into tokens in this case.
For partial matching the data type of the property should be text, and for exact matching it should be keyword. To support both partial and exact search without indexing the data into separate properties, you can leverage fields (multi-fields), which index the same data in different ways.
So, let's say you want to index names of persons and have the ability for both partial and exact search. In that case the mapping would be:
PUT test
{
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
Let's index a few docs:
PUT test/_doc/1
{
  "name": "Nishant Saini"
}
PUT test/_doc/2
{
  "name": "Nishant Kumar"
}
For partial search we query the name field, which is of type text.
GET test/_doc/_search
{
  "query": {
    "query_string": {
      "query": "Nishant Saini",
      "fields": [
        "name"
      ]
    }
  }
}
The above query will return both docs (1 and 2) because one token, i.e. Nishant, appears in the name field of both documents.
For exact search we need to query name.keyword. To perform an exact match we can use a term query as below:
{
  "query": {
    "term": {
      "name.keyword": "Nishant Saini"
    }
  }
}
This would match doc 1 only.
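If both behaviors are needed in a single request, one possible sketch (an assumption, not part of the original answer) is to combine the two clauses in a bool query and boost the exact keyword match so exact hits rank first:
GET test/_doc/_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "name.keyword": { "value": "Nishant Saini", "boost": 2.0 } } },
        { "match": { "name": "Nishant Saini" } }
      ]
    }
  }
}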

elasticsearch query string (full text search) which can't be searched

We have the document below. It can't be found by searching for financialmarkets, but it can be found by searching for industry_icon_financialmarkets.png. Can anyone tell me what the reason is?
content is a field of type text.
Document:
{
  "title": "test",
  "content": "industry_icon_financialmarkets.png"
}
Query:
{
  "from": 0,
  "size": 2,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "\"industry_icon_financialmarkets.png\""
          }
        }
      ]
    }
  }
}
The default analyzer for a text field is standard, which won't break industry_icon_financialmarkets into tokens using _ as a delimiter. I would suggest using the simple analyzer instead, which breaks text into terms whenever it encounters a character that is not a letter.
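The tokenization can be verified with the _analyze API; a quick sketch:
POST _analyze
{
  "analyzer": "simple",
  "text": "industry_icon_financialmarkets.png"
}
With the simple analyzer this yields the tokens industry, icon, financialmarkets and png, so a search for financialmarkets will then match.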
You can also add a sub-field of type keyword to retain the original value.
So the mapping of the field should be:
{
  "content": {
    "type": "text",
    "analyzer": "simple",
    "fields": {
      "keyword": {
        "type": "keyword"
      }
    }
  }
}
At index creation time, we should define our own mapping for each field based on its type to get the expected results.
Mapping:
PUT relevance
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 2,
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 30,
          "token_chars": [ "letter", "digit" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "ID": { "type": "long" },
      "title": { "type": "keyword" },
      "content": {
        "type": "text",
        "analyzer": "my_analyzer",
        "search_analyzer": "my_analyzer"
      }
    }
  }
}
Then start inserting documents (replace special characters with spaces in your application before indexing into the ES index):
POST relevance/_doc/1
{
  "title": "1elastic",
  "content": "working fine"
}
Query:
GET relevance/_search
{
  "size": 20,
  "query": {
    "bool": {
      "must": [ { "match": { "content": { "query": "fine", "fuzziness": 1 } } } ]
    }
  }
}

Multiple Paths in Nested Queries

I'm cross-posting this from the elasticsearch forums (https://discuss.elastic.co/t/multiple-paths-in-nested-query/96851/1)
Below is an example, but first I'll tell you about my use case, because I'm not sure if this is a good approach. I'm trying to automatically index a large collection of typed data. What this means is that I'm trying to generate mappings, and queries on those mappings, automatically based on information about my data. A lot of my data is relational, and I'm interested in being able to search across the relations, thus I'm also interested in using nested data types.
However, the issue is that many of these types have on the order of 10 relations, and I've got a feeling it's not a good idea to pass 10 identical copies of a nested query to Elasticsearch just to query 10 different nested paths the same way. Thus, I'm wondering if it's possible to instead pass multiple paths into a single query? Better yet, is it possible to search over all fields in the current document and in all its nested documents and their fields in a single query? I'm aware of object fields, and they're not a good fit because I want to retrieve some data from matched nested documents.
In this example, I create an index with multiple nested types and some of its own types, upload a document, and attempt to query the document and all its nested documents, but fail. Is there some way to do this without duplicating the query for each nested document, or is that actually a performant way to do this? Thanks
PUT /my_index
{
  "mappings": {
    "type1": {
      "properties": {
        "obj1": {
          "type": "nested",
          "properties": {
            "name": { "type": "text" },
            "number": { "type": "text" }
          }
        },
        "obj2": {
          "type": "nested",
          "properties": {
            "color": { "type": "text" },
            "food": { "type": "text" }
          }
        },
        "lul": { "type": "text" },
        "pucci": { "type": "text" }
      }
    }
  }
}
PUT /my_index/type1/1
{
  "obj1": [
    { "name": "liar", "number": "deer dog" },
    { "name": "one two three", "number": "you can call on me" },
    { "name": "ricky gervais", "number": "user 123" }
  ],
  "obj2": [
    { "color": "red green blue", "food": "meatball and spaghetti" },
    { "color": "orange", "food": "pineapple, fish, goat" },
    { "color": "none", "food": "none" }
  ],
  "lul": "lul its me user123",
  "field": "one dog"
}
POST /my_index/_search
{
  "query": {
    "nested": {
      "path": ["obj1", "obj2"],
      "query": {
        "query_string": {
          "query": "ricky",
          "all_fields": true
        }
      }
    }
  }
}
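The nested query's path parameter takes a single path, so a commonly used workaround (sketched here for the mapping above, not taken from the original thread) is one nested clause per path inside a bool should, plus a clause for the top-level fields:
POST /my_index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "obj1",
            "query": { "query_string": { "query": "ricky" } },
            "inner_hits": {}
          }
        },
        {
          "nested": {
            "path": "obj2",
            "query": { "query_string": { "query": "ricky" } },
            "inner_hits": {}
          }
        },
        {
          "query_string": {
            "query": "ricky",
            "fields": ["lul", "pucci"]
          }
        }
      ]
    }
  }
}
The inner_hits blocks return the matched nested documents, which addresses the need to retrieve data from them.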
