Elasticsearch river - no _meta document found after 5 attempts - elasticsearch

I am using elasticsearch version 1.3.0. when I create a river using wikipedia plugin version 2.3.0 as thus
PUT _river/my_river/_meta -d
{
"type" : "wikipedia",
"wikipedia" : {
"url" : "http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2"
},
"index" : {
"index" : "wikipedia",
"type" : "wiki",
"bulk_size" : 1000,
"max_concurrent_bulk" : 3
}
}
the server responds with this message
{
"_index": "_river",
"_type": "my_river",
"_id": "_meta -d",
"_version": 1,
"created": true
}
however, I don't see the wikipedia documents when I run a search. also, when I restart my server I get river-routing no _meta document found after 5 attempts

Remove the -d at the end as it creates a document named _meta -d and not _meta.
PUT _river/my_river/_meta
{
"type" : "wikipedia",
"wikipedia" : {
"url" : "http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2"
},
"index" : {
"index" : "wikipedia",
"type" : "wiki",
"bulk_size" : 1000,
"max_concurrent_bulk" : 3
}
}

Related

How to identify timefield of the index?

Is there a es query or some way to ask Elasticsearch that which field is being used as time field for a specific index?
You can use Kibana to choose the right time field (Step 5):
In Kibana, open Management, and then click Index Patterns.
If this is your first index pattern, the Create index pattern page opens automatically. Otherwise, click Create index pattern in the upper left.
Enter "your_index_name*" in the Index pattern field.
Click Next step
In Configure settings, select "#your_timestamp_field" in the Time Filter field name dropdown menu.
Click Create index pattern.
Kibana User Guide: Defining your index patterns
Or search in your index mapping for an field with "type: date"
curl 'http://localhost:9200/your_index/_mapping?pretty'
{
"your_index" : {
"mappings" : {
"your_index" : {
"properties" : {
"#**timestamp**" : {
"type" : "date"
},
"#version" : {
"type" : "text"
},
"clock" : {
"type" : "long"
},
"host" : {
"type" : "text"
},
"type" : {
"type" : "text"
}
}
}
}
}
}
Get Mapping
Or look into your indexed documents:
curl 'http://localhost:9200/your_index/_search?pretty'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [
{
"_index" : "your_index",
"_type" : "your_index",
"_id" : "logstash-01.kvm.local",
"_score" : 1.0,
"_source" : {
"#timestamp" : "2018-11-10T18:03:22.822Z",
"host" : "logstash-01.kvm.local",
"#version" : "1",
"clock" : 558753,
"type" : "your_index"
}
}
]
}
}
Search API
When you have index already created and you want to check which field is used as a time field, navigate down to Management/Stack management/Index patterns, select your index and search through the fields. The field that is used as a time field has time (clock) icon next to it.

Elasticsearch Top 10 Most Frequent Values In Array Across All Records

I have an index "test". Document structure is as shown below. Each document has an array of "tags". I am not able to figure out how to query this index to get top 10 most frequently occurring tags?
Also, what are the best practices one should follow if we have more than 2mil docs in this index?
{
"_index" : "test",
"_type" : "data",
"_id" : "1412879673545024927_1373991666",
"_score" : 1.0,
"_source" : {
"instagramuserid" : "1373991666",
"likes_count" : 163,
"#timestamp" : "2017-06-08T08:52:41.803Z",
"post" : {
"created_time" : "1482648403",
"comments" : {
"count" : 9
},
"user_has_liked" : true,
"link" : "https://www.instagram.com/p/BObjpPMBWWf/",
"caption" : {
"created_time" : "1482648403",
"from" : {
"full_name" : "PARAMSahib ™",
"profile_picture" : "https://scontent.cdninstagram.com/t51.2885-19/s150x150/12750236_1692144537739696_350427084_a.jpg",
"id" : "1373991666",
"username" : "parambanana"
},
"id" : "17845953787172829",
"text" : "This feature talks about how to work pastels .\n\nDull gold pullover + saffron khadi kurta + baby pink pants + Deep purple patka and white sneakers - Perfect colours for a Happy sunday christmas morning . \n#paramsahib #men #menswear #mensfashion #mensfashionblog #mensfashionblogger #menswearofficial #menstyle #fashion #fashionfashion #fashionblog #blog #blogger #designer #fashiondesigner #streetstyle #streetfashion #sikh #sikhfashion #singhstreetstyle #sikhdesigner #bearded #indian #indianfashionblog #indiandesigner #international #ootd #lookbook #delhistyleblog #delhifashionblog"
},
"type" : "image",
"tags" : [
"men",
"delhifashionblog",
"menswearofficial",
"fashiondesigner",
"singhstreetstyle",
"fashionblog",
"mensfashion",
"fashion",
"sikhfashion",
"delhistyleblog",
"sikhdesigner",
"indianfashionblog",
"lookbook",
"fashionfashion",
"designer",
"streetfashion",
"international",
"paramsahib",
"mensfashionblogger",
"indian",
"blog",
"mensfashionblog",
"menstyle",
"ootd",
"indiandesigner",
"menswear",
"blogger",
"sikh",
"streetstyle",
"bearded"
],
"filter" : "Normal",
"attribution" : null,
"location" : null,
"id" : "1412879673545024927_1373991666",
"likes" : {
"count" : 163
}
}
}
},
If your tags type in mapping is object (which is by default) you can use an aggregation query like this:
{
"size": 0,
"aggs": {
"frequent_tags": {
"terms": {"field": "post.tags"}
}
}
}

Elastic search csv river module not working

I am trying to index csv file in elasticsearch
curl -XPUT localhost:9200/_river/my_csv_river/_meta -d '
{
"type" : "csv",
"csv_file" : {
"folder" : "/tmp",
"filename_pattern" : ".*\\.csv$",
"first_line_is_header" : "true",
"field_separator" : ",",
"field_id" : "_id"
},
"index" : {
"index" : "my_csv_data_1",
"type" : "csv_type_1",
"bulk_size" : 100,
"bulk_threshold" : 10
}
}'
after indexing while searching http://localhost:9200/my_csv_data_1/_search
got
{
error: "IndexMissingException[[my_csv_data_1] missing]",
status: 404
}
any thoughts or i missed any thing?

Specify Routing on Index Alias's Term Lookup Filter

I am using Logstash, ElasticSearch and Kibana to allow multiple users to log in and view the log data they have forwarded. I have created index aliases for each user. These restrict their results to contain only their own data.
I'd like to assign users to groups, and allow users to view data for the computers in their group. I created a parent-child relationship between the groups and the users, and I created a term lookup filter on the alias.
My problem is, I receive a RoutingMissingException when I try to apply the alias.
Is there a way to specify the routing for the term lookup filter? How can I lookup terms on a parent document?
I posted the mapping and alias below, but a full gist recreation is available at this link.
curl -XPUT 'http://localhost:9200/accesscontrol/' -d '{
"mappings" : {
"group" : {
"properties" : {
"name" : { "type" : "string" },
"hosts" : { "type" : "string" }
}
},
"user" : {
"_parent" : { "type" : "group" },
"_routing" : { "required" : true, "path" : "group_id" },
"properties" : {
"name" : { "type" : "string" },
"group_id" : { "type" : "string" }
}
}
}
}'
# Create the logstash alias for cvializ
curl -XPOST 'http://localhost:9200/_aliases' -d '
{
"actions" : [
{ "remove" : { "index" : "logstash-2014.04.25", "alias" : "cvializ-logstash-2014.04.25" } },
{
"add" : {
"index" : "logstash-2014.04.25",
"alias" : "cvializ-logstash-2014.04.25",
"routing" : "intern",
"filter": {
"terms" : {
"host" : {
"index" : "accesscontrol",
"type" : "user",
"id" : "cvializ",
"path" : "group.hosts"
},
"_cache_key" : "cvializ_hosts"
}
}
}
}
]
}'
In attempting to find a workaround for this error, I submitted a bug to the ElasticSearch team, and received an answer from them. It was a bug in ElasticSearch where the filter is applied before the dynamic mapping, causing some erroneous output. I've included their workaround below:
PUT /accesscontrol/group/admin
{
"name" : "admin",
"hosts" : ["computer1","computer2","computer3"]
}
PUT /_template/admin_group
{
"template" : "logstash-*",
"aliases" : {
"template-admin-{index}" : {
"filter" : {
"terms" : {
"host" : {
"index" : "accesscontrol",
"type" : "group",
"id" : "admin",
"path" : "hosts"
}
}
}
}
},
"mappings": {
"example" : {
"properties": {
"host" : {
"type" : "string"
}
}
}
}
}
POST /logstash-2014.05.09/example/1
{
"message":"my sample data",
"#version":"1",
"#timestamp":"2014-05-09T16:25:45.613Z",
"type":"example",
"host":"computer1"
}
GET /template-admin-logstash-2014.05.09/_search

ElasticSearch CouchDB Geo location

I am trying to get elasticsearch to index a couchdb river without luck.
I have a database 'pl1' with only one document '1' in it.
This is a printout of the entire document pretty-printed:
curl -XGET localhost:5984/pl1/1 | python -mjson.tool
{
"_id": "1",
"_rev": "1-0442f3962cffedc2238fcdb28dd77557",
"location": {
"geo_json": {
"coordinates": [
59.70141999133738,
14.162789164118708
],
"type": "point"
},
"lat": 14.162789164118708,
"lon": 59.70141999133738
}
}
I create a couchdb river and index with a catch-all type called all_entries the following way:
curl -XPUT 'localhost:9200/_river/pl1/_meta' -d '
{
"type" : "couchdb",
"couchdb" : {
"host" : "localhost",
"port" : 5984,
"filter" : null,
"db" : "pl1"
},
"index" : {
"index" : "pl1",
"type" : "all_entries",
"bulk_size" : "100",
"bulk_timeout" : "10ms"
}
}'
{"ok":true,"_index":"_river","_type":"pl1","_id":"_meta","_version":1}
To test whether the document was indexed I perform the following query:
curl -XGET localhost:9200/pl1/all_entries/_count?pretty=true
{
"count" : 1,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
}
}
But then nothing. I can't figure out how to index the location using a geo_shape type (I have also tried with the different geo_point format for the data, and indexing that, but also no results)
How do I specify a mapper and query for this?

Resources