How does a JSON object get tokenized and indexed in Elasticsearch?

I recently started working with Elasticsearch and cannot figure out how a JSON object gets tokenized and stored in the inverted index.
Consider that the JSON below has been inserted:
{
  "city": "Seattle",
  "state": "WA",
  "location": {
    "lat": "47.6062095",
    "lon": "-122.3320708"
  }
}
I can perform a URI search like this:
GET /my_index/_search?q=city:seattle
This search returns the above document, but how is Elasticsearch able to search for 'seattle' only in the 'city' field? If it tokenized the complete JSON, all the keys and values would be separated, so how would the mapping between key tokens and value tokens be maintained?

Elasticsearch does not tokenize the JSON as a whole; it flattens the document into fields and builds a separate inverted index per field, so tokens from 'city' are never mixed with tokens from 'state'. The indexed tokens then point back to the original document, which is also stored.
Have a look at Inverted Index in the Elastic docs.
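As an illustration (assuming the index from the question is named my_index), you can inspect how a value is tokenized for a specific field with the _analyze API; the resulting tokens live in that field's own inverted index:
GET /my_index/_analyze
{
  "field": "city",
  "text": "Seattle"
}
With the default standard analyzer this returns a single lowercased token, seattle, which is why the query q=city:seattle matches.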

Related

Is there a way to get ElasticSearch to create n-gram tokens from truncated field?

Documents contain a url field with a full URL. Users should be able to search for documents containing a given URL by supplying a portion of the URL string. The search string can be 3-15 characters long. An n-gram token filter with min_gram of 3 and max_gram of 15 would work, but it generates a large number of tokens for long URLs. Is it possible to have Elasticsearch only generate tokens for the first 100 characters of the url field?
For example, the user should be able to search for documents containing the following URL using a search string such as 'example.com' or '/foo/bar'.
https://click.example.com/foo/bar/55gft/?qs=1952934d0ee8e2368ec7f7a921e3c6202b39365b9a2d26774c8122b8555ca21fce9d2344fc08a8ba40caede5e6901a112c6e89ead40892109eb8290d70571eab
There are two ways to achieve what you want.
Option 1: Keep using ngrams as you do now, but insert a truncate token filter before the ngram one, so the URL is limited to its first 100 characters before it gets ngrammed; see the sketch below.
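A minimal sketch of option 1 (the names url_truncate, url_ngrams, and url_analyzer are illustrative; recent Elasticsearch versions also require raising index.max_ngram_diff, since max_gram - min_gram here exceeds the default of 1):
PUT test
{
  "settings": {
    "index": {
      "max_ngram_diff": 12
    },
    "analysis": {
      "filter": {
        "url_truncate": {
          "type": "truncate",
          "length": 100
        },
        "url_ngrams": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 15
        }
      },
      "analyzer": {
        "url_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase", "url_truncate", "url_ngrams"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "url": {
        "type": "text",
        "analyzer": "url_analyzer"
      }
    }
  }
}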
Option 2: Use the wildcard field type, which has been created exactly for cases like this.
In your index, you should first change the type of the URL field to wildcard:
PUT test
{
  "mappings": {
    "properties": {
      "url": {
        "type": "wildcard"
      }
    }
  }
}
Then, you can search on that field, using the wildcard query, like this:
POST test/_search
{
  "query": {
    "wildcard": {
      "url": "*foo/bar*"
    }
  }
}
Also, read the related blog post, which shows in detail how the wildcard field type performs.

How to make _source field dynamic in elasticsearch search template?

When using a search query in Elasticsearch, we define which fields we want in the response:
"_source": ["name", "age"]
When working with search templates, we have to set the _source field's value when inserting the search template into the ES cluster:
"_source": ["name", "age"]
The problem with the search template is that it will always return name and age; to get other fields, we have to change the search template accordingly.
Is there any way to pass the fields from the client so that the response only contains the fields the user asked for?
I have achieved that for just one field: if you do this
"_source": "{{field}}"
then while search index via template you can do this
POST index_name/_search/template
{
  "id": template_id,
  "params": {
    "field": "name"
  }
}
This search query returns the name field in the response, but I could not find a way to pass the fields as an array or in another format so that I can get multiple fields.
Absolutely!!
Your search template should look like this:
"_source": {{#toJson}}fields{{/toJson}}
And then you can call it like this:
POST index_name/_search/template
{
  "id": template_id,
  "params": {
    "fields": ["name"]
  }
}
What it does is transform the params.fields array into JSON, so the generated query will look like this:
"_source": ["name"]

Can't create a visualization in Kibana (no compatible fields) - but I have compatible fields

I'd appreciate any help with this, I'm really stuck.
I am trying to create a simple visualization in Kibana, a line graph based on a number value in my data (origin_file_size_bytes). When I try to add a Visualization graph, I get this error:
No Compatible Fields: The "test*" index pattern does not contain any of the following field types: number or date
My actual index does contain a number field, as does my data.
Thank you for any help!
Andrew
Here's a sample entry from the Discover Menu:
{
  "_index": "lambda-index",
  "_type": "lambda-type",
  "_id": "LC08_L1TP_166077.TIF",
  "_version": 1,
  "_score": 2,
  "_source": {
    "metadata_processed": {
      "BOOL": true
    },
    "origin_file_name": {
      "S": "LC08_L1TP_166077.TIF"
    },
    "origin_file_size_bytes": {
      "N": "61667800"
    }
  }
}
My index pattern classifies it as a string, even though it isn't:
origin_file_size_bytes.N string
You cannot aggregate on a string field. As seen above, your field has been indexed as a string, NOT as a number. Elasticsearch dynamically determines the mapping type of data if it is not explicitly defined. Since you ingested the field value as a string (note the quotes in "N": "61667800"), ES correctly determined that the field is of type string. See this link.
For example, if you run the below to index a document with two fields without an explicit mapping, ES creates the message field as type 'string' and the size field as type 'number' (long):
POST my_index/_doc/1
{
  "message": "100",
  "size": 100
}
Index your field into ES as a number instead, and you should be able to aggregate on it. One way is to define an explicit mapping up front, as sketched below.
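A minimal sketch of such a mapping, assuming the index and field names from the sample above (with an explicit numeric mapping, ES will by default coerce the quoted string "61667800" into a number at ingest time):
PUT lambda-index
{
  "mappings": {
    "properties": {
      "origin_file_size_bytes": {
        "properties": {
          "N": { "type": "long" }
        }
      }
    }
  }
}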

Elasticsearch create doc with custom _id

Is there any way to create a document without indicating the _id in the URL? I understand that there is an option to create a document using http://localhost:9200/[index_name]/[index_type]/[_id], but I have issues creating a document with this option, as my _id is not auto-generated and might contain special characters such as # or &. I am currently using Elasticsearch version 1.3.
One way to create documents without having to add the URL-encoded ID in the URL is to use the _bulk API. It simply goes like this:
POST index/doc/_bulk
{"index": {"_id": "1234#5678"}}
{"field": "value", "number": 34}
{"index": {"_id": "5555#7896"}}
{"field": "another", "number": 45}
As you can see, you can index several documents, and their IDs are simply given within quotes inside the body of the bulk request. The URL itself simply invokes the _bulk endpoint.
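Alternatively, if you do want to use the single-document index API, the special characters in the _id can be percent-encoded in the URL, e.g. # becomes %23:
PUT index/doc/1234%235678
{"field": "value", "number": 34}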

JSON geo-point mapping with fields not matching Elasticsearch documentation format

I'm fairly new to Elasticsearch, and I'm trying to visualize some JSON data with Kibana. The trouble I have is with the geo-point mapping format.
The JSON object containing the relevant location fields (lon/lat) looks like this:
"geoNetwork": {
"city": "Test City",
"cityId": "1234567",
"continent": "Americas",
"country": "Canada",
"latitude": "44.1234",
"longitude": "-63.6940",
"metro": "(not set)",
"networkDomain": "bellaliant.net",
"networkLocation": "bell aliant regional communications inc.",
"region": "Nova Scotia",
"subContinent": "Northern America"
},
This does not match the geo-point format in the Elasticsearch documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-point.html, first example), as longitude and latitude are not the only two keys in the geoNetwork (location) object.
Is there a way to define the geo-point mapping so that I can use the JSON as-is, or would I have to modify the JSON schema to have an object that matches one of the geo-point formats in the documentation?
Sadly what you are asking is impossible :(
Elasticsearch requires a geo_point field for any geographic operation.
So from there, I would recommend updating your JSON with a new geo_point field.
If for some technical reason this is not possible, I would look into Elasticsearch ingest pipelines (though I haven't used them myself). They act similarly to a SQL trigger: they let you add new fields dynamically before the indexing process, meaning that on insertion you can create a new geo_point field from your latitude and longitude fields, as sketched after the links below.
GeoPoint: https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-point.html
Ingestion Pipeline: https://www.elastic.co/guide/en/elasticsearch/reference/master/ingest.html
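A hedged sketch of such a pipeline (the pipeline name build_location and the target field location are illustrative; since geo_point accepts a "lat,lon" string, a set processor with a mustache template is enough):
PUT _ingest/pipeline/build_location
{
  "description": "Combine geoNetwork.latitude/longitude into a geo_point",
  "processors": [
    {
      "set": {
        "field": "location",
        "value": "{{geoNetwork.latitude}},{{geoNetwork.longitude}}"
      }
    }
  ]
}
You would then map location as geo_point in the index mapping and index documents with ?pipeline=build_location so the combined value is created at ingest time.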
