Elasticsearch appends random strings to source data inside indexes

I am new to Elasticsearch and have a peculiar problem. I am using Elasticsearch with Kibana to store and visualize most of the events in my application. For example, to store a user login with a user_id of 123, I would write to the index user/login/123 with the following JSON document as data:
{
    "details" : {
        "fname" : "John",
        "lname" : "Smith",
        "click" : "login-button",
        etc...
    },
    "ip_address" : "127.0.0.1",
    "browser_type" : "Chrome",
    "browser_version" : "17"
}
However, the problem I encountered is that some records show up with a random string after the "details" object: see the screenshot. Can anyone suggest what I am doing wrong, and how I can fix the existing indexes?
Screenshot

I think you should have something like this in your data:
{
    "details" : {
        "28d211adbf" : {
            "stats" : {
                "merge_field_count" : 6,
                "unsubscribe_count_since_send" : 3
            }
        },
        "555cd3bcba" : {
            "stats" : {
                "merge_field_count" : 6,
                "unsubscribe_count_since_send" : 3
            }
        }
    },
    "ip_address" : "127.0.0.1",
    "browser_type" : "Chrome",
    "browser_version" : "17"
}
Using arbitrary keys like these is actually not good practice when indexing documents in Elasticsearch: every new key becomes a new field in the mapping.
Read about mapping explosion for more info about this:
https://www.elastic.co/blog/found-crash-elasticsearch#mapping-explosion
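One common way to avoid this (a sketch; the field name "id" is my assumption) is to turn the random keys into values of a fixed field, e.g. an array of objects with a stable schema:

```json
{
    "details" : [
        {
            "id" : "28d211adbf",
            "stats" : {
                "merge_field_count" : 6,
                "unsubscribe_count_since_send" : 3
            }
        },
        {
            "id" : "555cd3bcba",
            "stats" : {
                "merge_field_count" : 6,
                "unsubscribe_count_since_send" : 3
            }
        }
    ]
}
```

With this shape the mapping only ever contains details.id and details.stats.*, no matter how many entries you index; if you need to query an id together with its stats, map details as a nested type.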

Related

How to find similar documents in Elasticsearch

My documents are made up of various fields. Now, given an input document, I want to find similar documents using the input document's fields. How can I achieve this?
{
    "query": {
        "more_like_this" : {
            "ids" : ["12345"],
            "fields" : ["field_1", "field_2"],
            "min_term_freq" : 1,
            "max_query_terms" : 12
        }
    }
}
With this you will get documents similar to the document with id 12345. Here you only need to specify the ids and the field names, like title, category, name, etc., not their values.
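As a side note, in more recent Elasticsearch versions the ids parameter was folded into like, which accepts document references; the equivalent query would look roughly like this (a sketch; verify against your version):

```json
{
    "query": {
        "more_like_this" : {
            "like" : [
                { "_id" : "12345" }
            ],
            "fields" : ["field_1", "field_2"],
            "min_term_freq" : 1,
            "max_query_terms" : 12
        }
    }
}
```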
Here is another way to do it without ids; instead, you specify fields together with the values to match. For example, to get documents whose title is similar to:
elasticsearch is fast
{
    "query": {
        "more_like_this" : {
            "fields" : ["title"],
            "like" : "elasticsearch is fast",
            "min_term_freq" : 1,
            "max_query_terms" : 12
        }
    }
}
You can add more fields and their values.
You haven't mentioned the types of your fields. A general approach is to use a catch-all field (using copy_to) with the more_like_this query.
{
    "query": {
        "more_like_this" : {
            "fields" : ["first name", "last name", "address", "etc"],
            "like" : "your_query",
            "min_term_freq" : 1,
            "max_query_terms" : 12
        }
    }
}
Put everything in your_query. You can increase or decrease min_term_freq and max_query_terms to tune the results.
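A catch-all field can be set up in the mapping like this (a minimal sketch; the field names, including all_content, are assumptions):

```json
{
    "mappings" : {
        "properties" : {
            "first_name" : { "type" : "text", "copy_to" : "all_content" },
            "last_name"  : { "type" : "text", "copy_to" : "all_content" },
            "address"    : { "type" : "text", "copy_to" : "all_content" },
            "all_content" : { "type" : "text" }
        }
    }
}
```

The more_like_this query can then target just ["all_content"] instead of listing every field.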

How to use slice and count in the same query in MongoDB

I need to combine slice and count in the same query; let me explain how.
I have a collection which stores comments, each with its replies embedded in an array:
{
    "_id" : ObjectId("5a6b14796ede6d5169ad68a7"),
    "_class" : "com.social.model.comment.FirstLevelComment",
    "contentId" : "5a12996de7e84e0001b93a91",
    "replies" : [
        {
            "_id" : ObjectId("5a6b151a6ede6d5169ad68b1"),
            "date" : ISODate("2018-01-26T11:46:34.202Z"),
            "text" : "Reply 1"
        },
        {
            "_id" : ObjectId("5a6b151d6ede6d5169ad68b2"),
            "date" : ISODate("2018-01-26T11:46:37.357Z"),
            "text" : "Reply 2"
        },
        {
            "_id" : ObjectId("5a6b15206ede6d5169ad68b3"),
            "date" : ISODate("2018-01-26T11:46:40.170Z"),
            "text" : "Reply 3"
        },
        {
            "_id" : ObjectId("5a6b15236ede6d5169ad68b4"),
            "date" : ISODate("2018-01-26T11:46:43.025Z"),
            "text" : "Reply 4"
        },
        {
            "_id" : ObjectId("5a6b15256ede6d5169ad68b5"),
            "date" : ISODate("2018-01-26T11:46:45.931Z"),
            "text" : "Reply 5"
        }
    ],
    "date" : ISODate("2018-01-26T11:43:53.578Z"),
    "text" : "This is the comment text"
}
Every first-level comment is stored in a separate document, so to retrieve all comments belonging to a piece of content, I have to query by the "contentId" field.
However, I only want to retrieve the first two replies of every comment, so I have to use the $slice operator.
I also need to retrieve the total number of replies each comment has. Can I do that in the same query?
I'm using Spring Boot with Mongo repositories, so for now my query looks like this:
@Query(value = "{}", fields = "{ replies : { $slice : 2 } }")
public Page<FirstLevelComment> findByContentId(String contentId, Pageable page);
But I don't know how to add the number of replies to that query.
EDIT:
Added the aggregation query as Alex P. suggested:
db.comment.aggregate([
    { $match : { contentId : "5a12996de7e84e0001b93a91" } },
    {
        $project : {
            _id : 1,
            _class : 1,
            contentId : 1,
            date : 1,
            text : 1,
            countSize : { $size : "$replies" },
            sl : { $slice : ["$replies", 2] }
        }
    }
])
If you aggregated directly on your MongoDB server, you would have to do this:
db.collection.aggregate([
    {
        $project : {
            _id : 1,
            _class : 1,
            contentId : 1,
            date : 1,
            text : 1,
            countSize : { $size : "$replies" },
            sl : { $slice : ["$replies", 2] }
        }
    }
])
When using the aggregation framework in your Java application with Spring Data, you can't use MongoRepository; you have to use MongoTemplate instead.
Have a look at the documentation for more details.
You can try the below aggregation using MongoTemplate:
ProjectionOperation project = Aggregation.project()
    .and("replies").slice(2).as("first 2 comments")
    .and("replies").size().as("count");
SkipOperation skip = Aggregation.skip(2L);
LimitOperation limit = Aggregation.limit(5);
Aggregation aggregation = Aggregation.newAggregation(project, skip, limit);
AggregationResults<Document> result = mongoTemplate.aggregate(aggregation, collectionname, Document.class);

Elasticsearch on concatenation of multiple fields

I have data where the phone number is stored in parts, so I modelled it as an object with separate properties. But now I want to search on the complete phone number.
"Phone" : {
    "type" : "object",
    "properties" : {
        "first" : {
            "type" : "text"
        },
        "second" : {
            "type" : "text"
        }
    }
}
Now suppose I have three records: [{"first" : "123", "second" : "456"}, {"first" : "456", "second" : "123"}, {"first" : "412", "second" : "356"}]. The search should work on the complete numbers "123456", "456123", "412356", and a query for "123" should return all three records.
Take a look at copy_to fields, or create an ingest pipeline that builds a single field from those separate numbers and enriches the JSON.
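For the ingest-pipeline route, a minimal sketch could look like this (the pipeline name phone-concat and the target field full_number are assumptions; the set processor supports mustache templates referencing existing fields):

```json
PUT _ingest/pipeline/phone-concat
{
    "processors" : [
        {
            "set" : {
                "field" : "full_number",
                "value" : "{{Phone.first}}{{Phone.second}}"
            }
        }
    ]
}
```

Indexing with ?pipeline=phone-concat would then store e.g. "123456" in full_number, which you can query directly. Note that plain copy_to would copy "123" and "456" as two separate tokens rather than concatenating them, which is why the pipeline approach fits this particular requirement better.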

Elasticsearch slow results with IN query and Scoring

I have text document data (approximately 500k documents) saved in Elasticsearch, where the document text is mapped to its corresponding document number.
I am trying to fetch results in batches for "Sample Text" within a particular set of document numbers (approximately 300k), with scoring, and the results are extremely slow.
Here is the mapping:
PUT my_index
{
    "mappings" : {
        "doc_repo" : {
            "properties" : {
                "doc_number" : {
                    "type" : "integer"
                },
                "document" : {
                    "type" : "string",
                    "term_vector" : "with_positions_offsets_payloads"
                }
            }
        }
    }
}
Here is the request query:
{
    "query" : {
        "bool" : {
            "must" : [
                {
                    "terms" : {
                        "document" : [
                            "sample text"
                        ]
                    }
                },
                {
                    "terms" : {
                        "doc_number" : [1,2,3....,300K] //ArrayOf_300K_DocNumbers
                    }
                }
            ]
        }
    },
    "fields" : [
        "doc_number"
    ],
    "size" : 500,
    "from" : 0
}
I tried fetching results in two other ways:
Result without scoring, within the particular set of document numbers (I used filtering for this)
Result with scoring, but without any particular set of document numbers (in batches)
Both of these were pretty quick, but the problem comes when I try to achieve both at once.
Do I need to change the mapping, the search query, or something else to achieve this?
Thanks in advance.
The issue was specific to Elasticsearch 2.x; upgrading Elasticsearch solves it.
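Independent of the version, one common mitigation (a sketch, not tested against this dataset) is to move the large doc_number terms clause into the bool query's filter context, so it can be cached and excluded from scoring while the text clause still scores:

```json
{
    "query" : {
        "bool" : {
            "must" : [
                { "match" : { "document" : "sample text" } }
            ],
            "filter" : [
                { "terms" : { "doc_number" : [1, 2, 3] } }
            ]
        }
    },
    "size" : 500,
    "from" : 0
}
```

Note that match is used here instead of terms on the analyzed document field, since terms only matches exact (unanalyzed) tokens; that swap is my suggestion rather than part of the original answer.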

Elasticsearch: Comparing two fields of the same document, where one of the fields is inside a nested document

Consider the following document:
{
    "group" : "fans",
    "preferredFanId" : 1,
    "user" : [
        {
            "fanId" : 1,
            "first" : "John",
            "last" : "Smith"
        },
        {
            "fanId" : 2,
            "first" : "Alice",
            "last" : "White"
        }
    ]
}
where "user" is a nested document. I want to get inner_hits (available from 2.0.0-SNAPSHOT) where preferredFanId == user.fanId, so that only the John Smith record is returned in the inner_hits.
Is this possible? I've tried several approaches, like using "include_in_parent" or "_source", but nothing seems to work.
