Does anybody know the process for uploading bulk data to Elasticsearch? I am not able to upload data to Elasticsearch.
Thanks
Gaurav Singh
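You can use the helpers.bulk API of the official elasticsearch-py client. Below is a minimal sketch; IP is a placeholder for your cluster address, and note that an update action must carry the partial document under a doc key: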
from elasticsearch import helpers, Elasticsearch

data = [{'id': 1, 'content': 'some content'}]
INDEX_NAME = 'index'
TYPE = 'list'

def get_actions():
    actions = []
    for d in data:
        action = {
            '_op_type': 'update',
            '_index': INDEX_NAME,
            '_type': TYPE,
            '_id': d['id'],
            'doc': d,  # an update action needs a partial document (or script)
        }
        actions.append(action)
    return actions

# IP is your Elasticsearch host, e.g. 'localhost:9200'
result = helpers.bulk(Elasticsearch(IP), get_actions())
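helpers.bulk returns a tuple of the number of successfully executed actions and the errors, so you can check the result. Also note that an update op fails for ids that don't exist in the index yet; if you want to create-or-replace documents, use the default index op type instead.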
I'm very new to this, so I would very much appreciate any help.
The code is below:
import pandas as pd
from elasticsearch import Elasticsearch
from elasticsearch import helpers

elastic_user = "elastic"
elastic_password = "pass"
SOURCE = 'netflix_titles.csv'

netflix_df = pd.read_csv(SOURCE)

elastic_client = Elasticsearch(
    "https://localhost:9200",
    verify_certs=False,
    basic_auth=(elastic_user, elastic_password)
)

def doc_generator(df):
    df_iter = df.iterrows()
    for index, document in df_iter:
        yield {
            "_index": "netflix_shows",
            "_source": document,
        }

helpers.bulk(elastic_client, doc_generator(netflix_df))
When I try to push the df into the index I get:
elasticsearch.helpers.BulkIndexError: 500 document(s) failed to index.
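A quick way to see why the documents failed is to catch the BulkIndexError and inspect the per-document errors it carries; with CSV data, a common culprit is NaN values, which don't serialize to JSON. A minimal sketch, assuming the elastic_client and doc_generator defined above:

from elasticsearch.helpers import BulkIndexError

try:
    helpers.bulk(elastic_client, doc_generator(netflix_df))
except BulkIndexError as e:
    # each entry pairs a failed action with the server's error reason
    for error in e.errors[:3]:
        print(error)

If NaN turns out to be the problem, yielding document.dropna().to_dict() instead of the raw Series is one way around it.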
I'm using Elasticsearch with NEST.
I have logs in Elasticsearch. I have no problem querying everything with client.Query(...), but I'm having problems getting one specific document by its _id using client.Get.
I'm using:
_el_client.Get<SystemLog>(id); // This does not work (_id = QUrLVXgB1uALlflB_-oF)
But the object/record is not returned. What is the way to query a concrete Elasticsearch _id from the NEST client?
This is the beginning of the document (just for the info):

{
    "_index": "webapi-development-2021-03",
    "_type": "_doc",
    "_id": "QUrLVXgB1uALlflB_-oF",
    "_version": 1,
    "_score": null,
    "_source": {
        "@timestamp": "2021-03-21T18:18:55.2173785+01:00",
        "level": "Information",
        "messageTemplate": "{HostingRequestFinishedLog:l}",
        // etc., etc.
Thanks for your help.
OK, after many tests I found the solution. I must say the official docs are lacking; this should be in the startup examples, as it covers one of the most common needs.
When using Get, I need to specify the concrete index, not just an index pattern ending with *.
Example:
GetResponse<SystemLog> result = _el_client.Get<SystemLog>(request.id, idx => idx.Index("webapi-development-2021-03"));
This requires the app to build an identifier containing both the _id and the _index.
Alternatively, use Search with an ids query (easier to write, but slower, since it goes through the search phase instead of a direct real-time get):
var response = _el_client.Search<SystemLog>(s => s
    .Query(q => q
        .Ids(i => i
            .Values(request.id)
        )
    )
);
I'm trying to do a bulk insert using elasticsearch-py, but I don't want to specify a type. However, it won't allow me to specify None or "" as the value of type. How can I get around this?
bulk_data = []
this_merchant_product = {'field1': 'value1'}

op_dict = {
    "index": {
        "_index": "product",
        "_type": None,
        "_id": str(this_merchant_product_id)
    }
}
bulk_data.append(op_dict)
bulk_data.append(this_merchant_product)

es = Elasticsearch()
res = es.bulk(index='product', body=bulk_data)
I've also tried to set _type to "", but that doesn't work either.
These are the error messages.
This is the error when I set _type to None:
elasticsearch.exceptions.RequestError: RequestError(400, 'action_request_validation_exception', 'Validation Failed: 1: type is missing;
and this is the error I get when I set _type to "":
java.lang.IllegalArgumentException: name cannot be empty string
Each index has exactly one mapping type in Elasticsearch 6.x. In Elasticsearch 7.x+, types are removed entirely. In versions 2.x through 5.6 you could have more than one mapping type per index. Assuming that you're using version 6.x, you need to give the documents in the index a type.
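A minimal sketch of a workaround, reusing the op_dict from the question and assuming Elasticsearch 6.x: since each index can hold only one type anyway, use a single placeholder name such as _doc rather than trying to omit the type:

op_dict = {
    "index": {
        "_index": "product",
        "_type": "_doc",  # one type per index in 6.x; types are removed in 7.x+
        "_id": str(this_merchant_product_id)
    }
}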
I have articles and tags collections. Articles contain tags, which is an array of ObjectIds. I want to fetch tagName as well, so I unwind (this gives me multiple rows, one per tag array entry) => lookup (joins with the tags collection) => group (combines it back into the original result set).
My MongoDB query is as follows, and it gives me the correct result:
db.articles.aggregate([
    {"$unwind": "$tags"},
    {
        "$lookup": {
            "localField": "tags",
            "from": "tags",
            "foreignField": "_id",
            "as": "materialTags"
        }
    },
    {
        "$group": {
            "_id": "$_id",
            "title": {"$first": "$title"},
            "materialTags": {"$push": "$materialTags"}
        }
    }
])
My corresponding Spring code:
UnwindOperation unwindOperation = Aggregation.unwind("tags");

LookupOperation lookupOperation1 = LookupOperation.newLookup()
    .from("tags")
    .localField("tags")
    .foreignField("_id")
    .as("materialTags");

// I also want to add a group operation, but I'm unable to find the proper syntax.
Aggregation aggregation = Aggregation.newAggregation(unwindOperation,
    lookupOperation1, ??groupOperation?? );

AggregationResults<Article> resultList
    = mongoTemplate.aggregate(aggregation, "articles", Article.class);
I tried to play around with the group operation but without much luck. How can I add the group operation to match the original query?
Thanks in advance.
The group query syntax in Spring for

{
    "$group": {
        "_id": "$_id",
        "title": {"$first": "$title"},
        "materialTags": {"$push": "$materialTags"}
    }
}

is

Aggregation.group("_id").first("title").as("title").push("materialTags").as("materialTags")
Final query:

UnwindOperation unwindOperation = Aggregation.unwind("tags");

LookupOperation lookupOperation1 = LookupOperation.newLookup()
    .from("tags")
    .localField("tags")
    .foreignField("_id")
    .as("materialTags");

Aggregation aggregation = Aggregation.newAggregation(
    unwindOperation,
    lookupOperation1,
    Aggregation.group("_id").first("title").as("title")
        .push("materialTags").as("materialTags"));

AggregationResults<Article> resultList
    = mongoTemplate.aggregate(aggregation, "articles", Article.class);
For more information, please go through the references below:
http://www.baeldung.com/spring-data-mongodb-projections-aggregations
spring data mongodb group by
Create Spring Data Aggregation from MongoDb aggregation query
https://www.javacodegeeks.com/2016/04/data-aggregation-spring-data-mongodb-spring-boot.html
I wrote my own filter for Logstash and I'm trying to calculate my own document_id, something like this:

docIdClean = "%d %s %s" % [ event["@timestamp"].to_f * 1000, event["type"], event["message"] ]
event["docId"] = Digest::MD5.hexdigest(docIdClean)
And the Logstash configuration looks like this:
output {
    elasticsearch {
        ...
        index => "analysis-%{+YYYY.MM.dd}"
        document_id => "%{docId}"
        template_name => "logstash_per_index"
    }
}
The more or less minor downside is that every document in Elasticsearch ends up containing both _id and docId with the same value. Since docId is completely pointless (nobody searches for an MD5 hash), I want to remove it, but I don't know how.
The docId has to exist when the event hits the output, otherwise the output can't refer to it, so I can't remove it beforehand. And since I can't remove it afterwards, the docId sits there occupying space.
I tried to set the event field _id instead, but that only causes an exception in Elasticsearch saying the id of the document is different.
For illustration, here is one document:
{
    "_index": "analysis-2014.09.16",
    "_type": "access",
    "_id": "022d9055423cdd0756b6cfa06886f866",
    "_score": 1,
    "_source": {
        "@timestamp": "2014-09-16T19:36:31.000+02:00",
        "type": "access",
        "tags": [
            "personalized"
        ],
        "importDate": "2014/09/17",
        "docId": "022d9055423cdd0756b6cfa06886f866"
    }
}
EDIT:
This is about Logstash 1.3
There's nothing you can do about this in Logstash 1.4.
In Logstash 1.5, you can use @metadata fields, which are not passed to Elasticsearch.
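A minimal sketch of the 1.5+ approach, assuming a ruby filter computes the hash the same way as the original filter; the field lives under @metadata, so it never reaches Elasticsearch:

filter {
    ruby {
        code => "
            require 'digest/md5'
            # same hash as before, but stored under @metadata
            docIdClean = '%d %s %s' % [ event['@timestamp'].to_f * 1000, event['type'], event['message'] ]
            event['@metadata']['docId'] = Digest::MD5.hexdigest(docIdClean)
        "
    }
}

output {
    elasticsearch {
        index => "analysis-%{+YYYY.MM.dd}"
        document_id => "%{[@metadata][docId]}"
        template_name => "logstash_per_index"
    }
}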