Hive to Elasticsearch to Kibana: No Fields in the Available Fields column

I am following the steps below:
Step 1:
create table tutorials_tbl(submission_date DATE, tutorial_id INT, tutorial_title STRING, tutorial_author STRING) ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe';
Step 2:
INSERT INTO tutorials_tbl (submission_date, tutorial_title, tutorial_author) VALUES ('2016-03-19 18:00:00', "Mark Smith", "John Paul");
Step 3:
CREATE EXTERNAL TABLE tutorials_tbl_es(submission_date DATE, tutorial_id INT, tutorial_title STRING, tutorial_author STRING) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource'='tutor/tutors','es.nodes'='saturn:9200');
Step 4:
INSERT INTO tutorials_tbl_es SELECT * FROM tutorials_tbl LIMIT 1;
Now I selected the index in Kibana > Settings. I have configured _timestamp in the advanced settings, so _timestamp is the only option I get under Time-field name, even though the data has a submission_date column.
Query 1: Why am I not getting submission_date under Time-field name?
Query 2: When I select _timestamp and click 'Create', I get nothing under Available Fields on the Discover tab. Why is that?

Load data into tutorials_tbl, then try the following steps.
Step 1: create a "tutor" index template with settings and mappings.
{
  "order": 0,
  "template": "tutor*",
  "settings": {
    "index": {
      "number_of_shards": "4",
      "number_of_replicas": "1",
      "refresh_interval": "30s"
    }
  },
  "mappings": {
    "tutors": {
      "dynamic": "true",
      "_all": {
        "enabled": true
      },
      "_timestamp": {
        "enabled": true,
        "format": "yyyy-MM-dd HH:mm:ss"
      },
      "dynamic_templates": [
        {
          "disable_string_index": {
            "mapping": {
              "index": "not_analyzed",
              "type": "string"
            },
            "match_mapping_type": "string",
            "match": "*"
          }
        }
      ],
      "date_detection": false,
      "properties": {
        "submission_date": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss"
        },
        "tutorial_id": {
          "index": "not_analyzed",
          "type": "integer"
        },
        "tutorial_title": {
          "index": "not_analyzed",
          "type": "string"
        },
        "tutorial_author": {
          "index": "not_analyzed",
          "type": "string"
        }
      }
    }
  }
}
Step 2: create the ES index "tutor" based on the tutor* template (from Step 1). Note the template pattern is tutor* so that it matches the index name "tutor" used in es.resource.
I usually use the elasticsearch-head "Index" tab / "Any request" to create it.
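If you prefer the REST API, a minimal sketch (assuming Elasticsearch is reachable at saturn:9200, as in the table definition, and that the Step 1 JSON is saved to a file; the file name is just an example):
# store the template; "tutor" here is the template name
curl -XPUT 'http://saturn:9200/_template/tutor' -d @tutor-template.json
# create the index; the template applies because "tutor" matches the "tutor*" pattern
curl -XPUT 'http://saturn:9200/tutor'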
Step 3: create the ES Hive table with the timestamp mapping.
CREATE EXTERNAL TABLE tutorials_tbl_es(submission_date STRING, tutorial_id INT, tutorial_title STRING, tutorial_author STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource'='tutor/tutors','es.nodes'='saturn:9200','es.mapping.timestamp'='submission_date');
Step 4: insert data from tutorials_tbl to tutorials_tbl_es
INSERT INTO tutorials_tbl_es SELECT * FROM tutorials_tbl LIMIT 1;
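To confirm the row reached Elasticsearch with its _timestamp set, you can query the index directly. _timestamp is index metadata rather than part of _source, so it has to be requested explicitly; a quick check, assuming the same host as above:
curl 'http://saturn:9200/tutor/tutors/_search?fields=_timestamp,_source&pretty'
With _timestamp populated from submission_date, Kibana should now offer it as the Time-field name and list the document's fields under Available Fields in Discover.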

Related

Elasticsearch - Using copy_to on the fields of a nested type

Elasticsearch version 7.17
Below I've pasted a simplified version of my mappings, which represent a nested object structure. One top-level-object will have one or more second-level-objects, and a second-level-object will have one or more third-level-objects. Fields field_a, field_b, and field_c on third-level-object are all related to each other, so I'd like to copy them into a single field that can be partially matched against. I've done this on a lot of attributes at the top-level-object level, so I know it works.
{
  "mappings": {
    "_doc": { // one top-level-object
      "dynamic": "false",
      "properties": {
        "second-level-objects": { // one or more second-level-objects
          "type": "nested",
          "dynamic": "false",
          "properties": {
            "third-level-objects": { // one or more third-level-objects
              "type": "nested",
              "dynamic": "false",
              "properties": {
                "my_copy_to_field": { // should have the values from field_a, field_b, and field_c
                  "type": "text",
                  "index": true
                },
                "field_a": {
                  "type": "keyword",
                  "index": false,
                  "copy_to": "my_copy_to_field"
                },
                "field_b": {
                  "type": "long",
                  "index": false,
                  "copy_to": "my_copy_to_field"
                },
                "field_c": {
                  "type": "keyword",
                  "index": false,
                  "copy_to": "my_copy_to_field"
                },
                "field_d": {
                  "type": "keyword",
                  "index": true
                }
              }
            }
          }
        }
      }
    }
  }
}
However, when I run a nested query against my_copy_to_field I get no results, because the field is never populated, even though I know my documents have data in the 3 fields with copy_to. If I perform a nested query against field_d, which is not part of the copied info, I get the expected results, so it seems like there's something going on with nested (or double-nested, in my case) usage of copy_to that I'm overlooking. Here is my query, which returns nothing:
GET /my_index/_search
{
  "query": {
    "nested": {
      "inner_hits": {},
      "path": "second-level-objects",
      "query": {
        "nested": {
          "inner_hits": {},
          "path": "second-level-objects.third-level-objects",
          "query": {
            "bool": {
              "should": [
                { "match": { "second-level-objects.third-level-objects.my_copy_to_field": "my search value" } }
              ]
            }
          }
        }
      }
    }
  }
}
I've tried adding include_in_root:true to the third-level-objects, but that didn't make any difference. If I could just get the field to populate with the copied data then I'm sure I can work through getting the query working. Is there something I'm missing about using copy_to with nested fields?
Additionally, when I view my data in Kibana -> Discover, I see second-level-objects as an available "Nested" field, but I don't see anything for third-level-objects, even though KQL recognizes it as a field. Is that symptomatic of an issue?
You must give copy_to the complete nested path; a bare field name is not resolved relative to the enclosing nested object, so the copied values never reach the target. Like this:
"field_a": {
  "type": "keyword",
  "copy_to": "second-level-objects.third-level-objects.my_copy_to_field"
},
"field_b": {
  "type": "long",
  "copy_to": "second-level-objects.third-level-objects.my_copy_to_field"
},
"field_c": {
  "type": "keyword",
  "copy_to": "second-level-objects.third-level-objects.my_copy_to_field"
}
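Note that copy_to is applied at index time, so after updating the mapping you will need to reindex the existing documents before queries against my_copy_to_field return any hits.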

Lucene search using Kibana returns all results regardless of my query

Using Kibana, I have created the following index:
PUT newsindex
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2
  },
  "mappings": {
    "news": {
      "properties": {
        "NewsID": { "type": "integer" },
        "NewsType": { "type": "text" },
        "BodyText": { "type": "text" },
        "Caption": { "type": "text" },
        "HeadLine": { "type": "text" },
        "Approved": { "type": "text" },
        "Author": { "type": "text" },
        "Contact": { "type": "text" },
        "DateCreated": { "type": "date", "format": "date_time" },
        "DateSubmitted": { "type": "date", "format": "date_time" },
        "LastModifiedDate": { "type": "date", "format": "date_time" }
      }
    }
  }
}
I have populated the index with Logstash. If I just perform a match_all query, all my records are returned as you'd expect. However, when I try to perform a targeted query such as:
GET newsindex/_search
{
  "query": {
    "match": { "headline": "construct abnomolies" }
  }
}
I can see headline as a property of _source, but my query is ignored, i.e. I still receive everything regardless of what's in the headline. How do I need to change my index to make headline searchable? I'm using Elasticsearch 5.6.3.
I needed to change the property names in my index to be lowercase. I noticed in the output window that the properties under _source were lowercase, and in Kibana the predictive text was offering both my notation and the lowercase variant. I dropped my index, re-populated it, and it now works.
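In other words, field names in Elasticsearch are case-sensitive: Logstash was writing lowercase field names, so a query on headline can only match once the mapping uses the lowercase names too. A sketch of the corrected index under that assumption (same settings, lowercase property names, only a few fields shown):
PUT newsindex
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2
  },
  "mappings": {
    "news": {
      "properties": {
        "newsid": { "type": "integer" },
        "headline": { "type": "text" },
        "bodytext": { "type": "text" }
      }
    }
  }
}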

[ElasticSearch]: Get all indices containing a specific value

Following is the use case:
I am trying to list all indices in elasticsearch that contain a particular value.
For illustration purposes let us consider following to be the index template:
{
  "order": 0,
  "template": "sample-*",
  "settings": {
    "index.refresh_interval": "300",
    "index.number_of_replicas": "1",
    "index.number_of_shards": "10"
  },
  "mappings": {
    "digital": {
      "_source": {
        "enabled": false
      },
      "_all": {
        "enabled": false
      },
      "properties": {
        "website": {
          "index": "not_analyzed",
          "store": false,
          "type": "string",
          "doc_values": true
        },
        "iab_codes": {
          "store": false,
          "type": "long",
          "doc_values": true
        },
        "audiences": {
          "store": false,
          "type": "long",
          "doc_values": true
        }
      }
    }
  },
  "aliases": {}
}
The audiences field in this template is a sequence of longs, e.g. [1,2,3]. I create one index per day based on this template. How can I get the list of all the indices that contain a specific value in the audiences array field?
Something like: list all the indices where the audiences array contains the value 3.
Thank you.
You can make a search query similar to
http://localhost:9200/sample-*/_search?q=audiences:3&pretty
and using the Java API, you can try getting the value of hits.hits._index
Or rather, use filter_path:
http://localhost:9200/sample-*/_search?q=audiences:3&filter_path=hits.hits._index&pretty
The result will look like this:
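With daily indices such as sample-2016.03.18 (the names below are illustrative), everything except the index of each hit is stripped away:
{
  "hits" : {
    "hits" : [
      { "_index" : "sample-2016.03.18" },
      { "_index" : "sample-2016.03.19" }
    ]
  }
}
Note there is one entry per matching document, not per index, so an index name can appear multiple times; deduplicate on the client side.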

ElasticSearch - string concat aggregation?

I've got the following simple mapping:
"element": {
  "dynamic": "false",
  "properties": {
    "id": { "type": "string", "index": "not_analyzed" },
    "group": { "type": "string", "index": "not_analyzed" },
    "type": { "type": "string", "index": "not_analyzed" }
  }
}
This is basically a way to store a Group object:
{
  id: "...",
  elements: [
    { id: "...", type: "..." },
    ...
    { id: "...", type: "..." }
  ]
}
I want to find how many different groups exist sharing the same set of element types (ordered, including repetitions).
An obvious solution would be to change the schema to:
"element": {
  "dynamic": "false",
  "properties": {
    "group": { "type": "string", "index": "not_analyzed" },
    "concatenated_list_of_types": { "type": "string", "index": "not_analyzed" }
  }
}
But, due to the requirements, we need to be able to exclude some types from group by (aggregation) :(
All fields of the document are mongo ids, so in SQL I would do something like this:
SELECT COUNT(group_id), concat_value FROM (
  SELECT GROUP_CONCAT(type_id) AS concat_value, group_id
  FROM table
  WHERE type_id != 'some_filtered_out_type_id'
  GROUP BY group_id
) T GROUP BY concat_value
In Elastic, with the given mapping it's really easy to filter out types, and counting is also not a problem assuming we have a concatenated value. Needless to say, the sum aggregation does not work for strings.
How can I get this working? :)
Thanks!
Finally I solved this problem with scripting and by changing the mapping.
{
  "mappings": {
    "group": {
      "dynamic": "false",
      "properties": {
        "id": { "type": "string", "index": "not_analyzed" },
        "elements": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}
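Under this mapping, each document stores the group's ordered type ids as a flat array. For illustration, a hypothetical document that would land in the "5639abfb5cba47087e8b4571-5639abfb5cba47087e8b457b" bucket of the results below:
{
  "id": "...",
  "elements": [ "5639abfb5cba47087e8b4571", "5639abfb5cba47087e8b457b" ]
}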
There are still some issues with duplicate elements in the array (for some reason ScriptDocValues.Strings strips out dups), but here's an aggregation that counts by string concat:
{
  "aggs": {
    "path": {
      "scripted_metric": {
        "map_script": "key = doc['elements'].join('-'); _agg[key] = _agg[key] ? _agg[key] + 1 : 1",
        "combine_script": "_agg",
        "reduce_script": "_aggs.collectMany { it.entrySet() }.inject( [:] ) { result, e -> result << [ (e.key):e.value + ( result[ e.key ] ?: 0 ) ]}"
      }
    }
  }
}
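To also meet the original requirement of excluding some types from the grouping, the map_script can filter before joining. A sketch in the same Groovy style, reusing 'some_filtered_out_type_id' from the SQL above as the id to drop:
{
  "aggs": {
    "path": {
      "scripted_metric": {
        "map_script": "key = doc['elements'].values.findAll { it != 'some_filtered_out_type_id' }.join('-'); _agg[key] = _agg[key] ? _agg[key] + 1 : 1",
        "combine_script": "_agg",
        "reduce_script": "_aggs.collectMany { it.entrySet() }.inject( [:] ) { result, e -> result << [ (e.key):e.value + ( result[ e.key ] ?: 0 ) ]}"
      }
    }
  }
}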
The result would be as follows:
"aggregations" : {
  "path" : {
    "value" : {
      "5639abfb5cba47087e8b457e" : 362,
      "568bfc495cba47fc308b4567" : 3695,
      "5666d9d65cba47701c413c53" : 14,
      "5639abfb5cba47087e8b4571-5639abfb5cba47087e8b457b" : 1,
      "570eb97abe529e83498b473d" : 1
    }
  }
}

Changing elasticsearch index's shard-count on the next index-rotation

I have an ELK (Elasticsearch-Logstash-Kibana) stack in which the Elasticsearch node has the default shard count of 5. Logs are pushed to it in Logstash format (logstash-YYYY.MM.DD), which, correct me if I am wrong, means they are indexed date-wise.
Since I cannot change the shard count of an existing index without reindexing, I want to increase the number of shards to 8 when the next index is created. I figured that the ES-API allows on-the-fly persistent changes.
How do I go about doing this?
You can use the "Template Management" features in Elasticsearch: http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/indices-templates.html
Create a new logstash template by using:
curl -XPUT localhost:9200/_template/logstash -d '
{
  "template": "logstash-*",
  "settings": {
    "number_of_replicas": 1,
    "number_of_shards": 8,
    "index.refresh_interval": "5s"
  },
  "mappings": {
    "_default_": {
      "_all": {
        "enabled": true
      },
      "dynamic_templates": [
        {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "analyzed",
              "omit_norms": true,
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      ],
      "properties": {
        "@version": {
          "type": "string",
          "index": "not_analyzed"
        },
        "geoip": {
          "type": "object",
          "dynamic": true,
          "path": "full",
          "properties": {
            "location": {
              "type": "geo_point"
            }
          }
        }
      }
    }
  }
}'
The next time the index that matches your pattern is created, it will be created with your new settings.
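You can verify that the template was stored, and inspect it later, with a plain GET against the same endpoint:
curl -XGET localhost:9200/_template/logstash?pretty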
Alternatively, the setting lives in Elasticsearch itself. Edit the config file config/elasticsearch.yml, set index.number_of_shards: 8, and restart Elasticsearch. The new configuration applies to indices created from then on, which will be created with 8 shards as you want.
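For reference, the change is a single line in the node configuration; note that it only affects indices created after the restart:
# config/elasticsearch.yml
index.number_of_shards: 8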
Best would be to use templates, and to add one I would recommend the Kopf plugin, found here: https://github.com/lmenezes/elasticsearch-kopf
You can of course use the API:
curl -XPUT $ELASTICSEARCH-MASTER$:9200/_template/$TEMPLATE-NAME$ -d '$TEMPLATE-CONTENT$'
In the plugin, click "more" -> "Index templates" in the top left corner, then create a new template and make sure you have the following settings as part of your template:
{
  "order": 0,
  "template": "logstash*",
  "settings": {
    "index": {
      "number_of_shards": "5",
      "number_of_replicas": "1"
    }
  },
  "mappings": {### your mapping ####},
  "aliases": {}
}
The above settings will make sure that when a new index whose name matches logstash* is created, it has 5 shards and 1 replica.
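Whichever approach you take, once the next daily index has been created you can confirm that the new shard count took effect (the index name below is illustrative):
curl 'localhost:9200/logstash-2016.03.20/_settings?pretty'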
