Group By (Aggregation) to get only latest fields value - elasticsearch

I write a query to read the metricbeat file. This gives me the whatever I want but it repeats the value multiple time.
I want to group by this on latest timestamp so I can get only latest record.
Below is my query
string indexName = "metricbeat-7.4.2-" + DateTime.Now.Year.ToString() + "." + DateTime.Now.Month.ToString("00") + "." + DateTime.Now.Day.ToString("00");
connectionSettings = new ConnectionSettings(connectionPool).DefaultIndex(indexName);
elasticClient = new ElasticClient(connectionSettings);
string[] systemFields = new string[]
{
"system.memory.actual.used.pct",
"system.cpu.total.norm.pct"
};
var elasticResponse = elasticClient.Search<object>(s => s
.DocValueFields(dvf => dvf.Fields(systemFields))
);
DSL query
get /metricbeat*/_search?pretty=true
{
"query" : {
"match_all": {}
},
"docvalue_fields" : [
"system.memory.actual.used.pct",
"system.cpu.total.norm.pct",
"system.load.5",
"docker.diskio.summary.bytes"
]
}

Related

What is the nest way to bulk index(around 40 k files of type .docx) using ingest-attachment?

I am fairly familiar with the ELK stack and currently using Elastic search 6.6. Our use case is content search for about 40K .docx files
(uploaded by Portfolio managers as research reports.
Max file size allowed is 10 MB, but mostly file sizes are in few Kb).
I have used the ingest attachment plug in to index sample test files and I am able to also search the content using KIBANA
for ex: POST /attachment_test/my_type/_search?pretty=true
{
"query": {
"match": {
"attachment.content": "JP Morgan"
}
}
}
returns me the expected results.
My doubts:
Using the ingest plug in, we need to push data to the plug in. I am using VS 2017 and elastic NEST dll. Which means, I have to programmatically read the 40K documents and push them to ES using the NEST commands?
I have gone through the Fscrawler project and know that it can achieve the purpose but I am keeping it as my last resort
If I were to use approach 1 (code), is there any bulk upload API available for posting number of attachments together to ES (in batches)?
Finally, I uploaded 40K files in to the elastic index using C# code:
private static void PopulateIndex(ElasticClient client)
{
var directory =System.Configuration.ConfigurationManager.AppSettings["CallReportPath"].ToString();
var callReportsCollection = Directory.GetFiles(directory, "*.doc"); //this will fetch both doc and docx
//callReportsCollection.ToList().AddRange(Directory.GetFiles(directory, "*.doc"));
ConcurrentBag<string> reportsBag = new ConcurrentBag<string>(callReportsCollection);
int i = 0;
var callReportElasticDataSet = new DLCallReportSearch().GetCallReportDetailsForElastic();//.AsEnumerable();//.Take(50).CopyToDataTable();
try
{
Parallel.ForEach(reportsBag, callReport =>
//Array.ForEach(callReportsCollection,callReport=>
{
var base64File = Convert.ToBase64String(File.ReadAllBytes(callReport));
var fileSavedName = callReport.Replace(directory, "");
// var dt = dLCallReportSearch.GetCallFileName(fileSavedName.Replace("'", "''"));//replace the ' in a file name with '';
var rows = callReportElasticDataSet.Select("CALL_SAVE_FILE like '%" + fileSavedName.Replace("'", "''") + "'");
if (rows != null && rows.Count() > 0)
{
var row = rows.FirstOrDefault();
//foreach (DataRow row in rows)
//{
i++;
client.Index(new Document
{
Id = i,
DocId = Convert.ToInt32(row["CALL_ID"].ToString()),
Path = row["CALL_SAVE_FILE"].ToString().Replace(CallReportPath, ""),
Title = row["CALL_FILE"].ToString().Replace(CallReportPath, ""),
Author = row["USER_NAME"].ToString(),
DateOfMeeting = string.IsNullOrEmpty(row["CALL_DT"].ToString()) ? (DateTime?)null : Convert.ToDateTime(row["CALL_DT"].ToString()),
Location = row["CALL_LOCATION"].ToString(),
UploadDate = string.IsNullOrEmpty(row["CALL_REPORT_DT"].ToString()) ? (DateTime?)null : Convert.ToDateTime(row["CALL_REPORT_DT"].ToString()),
CompanyName = row["COMP_NAME"].ToString(),
CompanyId = Convert.ToInt32(row["COMP_ID"].ToString()),
Country = row["COU_NAME"].ToString(),
CountryCode = row["COU_CD"].ToString(),
RegionCode = row["REGION_CODE"].ToString(),
RegionName = row["REGION_NAME"].ToString(),
SectorCode = row["SECTOR_CD"].ToString(),
SectorName = row["SECTOR_NAME"].ToString(),
Content = base64File
}, p => p.Pipeline("attachments"));
//}
}
});
}
catch (Exception ex)
{
throw ex;
}
}

How to read distance from Elasticsearch response

I am using Elasticsearch V6, NEST V6.
I am searching ES as below and I am using ScriptFields to calculate the distance and include it the result.
var searchResponse = _elasticClient.Search<MyDocument>(new SearchRequest<MyDocument>
{
Query = new BoolQuery
{
Must = new QueryContainer[] { matchQuery },
Filter = new QueryContainer[] { filterQuery },
},
Source = new SourceFilter
{
Includes = resultFields // fields to be included in the result
},
ScriptFields = new ScriptField
{
Script = new InlineScript("doc['geoLocation'].planeDistance(params.lat, params.lng) * 0.001") // divide by 1000 to convert to km
{
Lang = "painless",
Params = new FluentDictionary<string, object>
{
{ "lat", _center.Latitude },
{ "lng", _center.Longitude }
}
}
}
});
Now, I am trying to read the search result and I am not sure how to read the distance from the response, this is what I have tried:
// this is how I read the Document, all OK here
var docs = searchResponse.Documents.ToList<MyDocument>();
// this is my attempt to read the distance from the result
var hits = searchResponse.Hits;
foreach (var h in hits)
{
var d = h.Fields["distance"];
// d is of type Nest.LazyDocument
// I am not sure how to get the distance value from object of type LazyDocument
}
While debugging I can see the distance value, I am just not sure how to read the value?
I found the answer here
To read search document and distance:
foreach (var hit in searchResponse.Hits)
{
MyDocument doc = hit.Source;
double distance = hit.Fields.Value<double>("distance");
}
And if you are only interested in distance:
foreach (var fieldValues in searchResponse.Fields)
{
var distance = fieldValues.Value<double>("distance");
}

ElasticSearch NEST 5.6.1 Query for unit test

I wrote a bunch of queries to elastic search and I wanted to write a unit test for them. using this post moq an elastic connection I was able to preform a general mocking. But When I tried to view the Json which is being generated from my query I didn't manage to get it in any way.
I tried to follow this post elsatic query moq, but it is relevant only to older versions of Nest because the method ConnectionStatus and RequestInformation is no longer available for an ISearchResponse object.
My test look as follow:
[TestMethod]
public void VerifyElasticFuncJson()
{
//Arrange
var elasticService = new Mock<IElasticService>();
var elasticClient = new Mock<IElasticClient>();
var clinet = new ElasticClient();
var searchResponse = new Mock<ISearchResponse<ElasticLog>>();
elasticService.Setup(es => es.GetConnection())
.Returns(elasticClient.Object);
elasticClient.Setup(ec => ec.Search(It.IsAny<Func<SearchDescriptor<ElasticLog>,
ISearchRequest>>())).
Returns(searchResponse.Object);
//Act
var service = new ElasticCusipInfoQuery(elasticService.Object);
var FindFunc = service.MatchCusip("CusipA", HostName.GSMSIMPAPPR01,
LogType.Serilog);
var con = GetConnection();
var search = con.Search<ElasticLog>(sd => sd
.Type(LogType.Serilog)
.Index("logstash-*")
.Query(q => q
.Bool(b => b
.Must(FindFunc)
)
)
);
**HERE I want to get the JSON** and assert it look as expected**
}
Is there any other way to achieve what I ask?
The best way to do this would be to use the InMemoryConnection to capture the request bytes and compare this to the expected JSON. This is what the unit tests for NEST do. Something like
private static void Main()
{
var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
var connectionSettings = new ConnectionSettings(pool, new InMemoryConnection())
.DefaultIndex("default")
.DisableDirectStreaming();
var client = new ElasticClient(connectionSettings);
// Act
var searchResponse = client.Search<Question>(s => s
.Query(q => (q
.Match(m => m
.Field(f => f.Title)
.Query("Kibana")
) || q
.Match(m => m
.Field(f => f.Title)
.Query("Elasticsearch")
.Boost(2)
)) && +q
.Range(t => t
.Field(f => f.Score)
.GreaterThan(0)
)
)
);
var actual = searchResponse.RequestJson();
var expected = new
{
query = new {
#bool = new {
must = new object[] {
new {
#bool = new {
should = new object[] {
new {
match = new {
title = new {
query = "Kibana"
}
}
},
new {
match = new {
title = new {
query = "Elasticsearch",
boost = 2d
}
}
}
},
}
},
new {
#bool = new {
filter = new [] {
new {
range = new {
score = new {
gt = 0d
}
}
}
}
}
}
}
}
}
};
// Assert
Console.WriteLine(JObject.DeepEquals(JToken.FromObject(expected), JToken.Parse(actual)));
}
public static class Extensions
{
public static string RequestJson(this IResponse response) =>
Encoding.UTF8.GetString(response.ApiCall.RequestBodyInBytes);
}
I've used an anonymous type for the expected JSON as it's easier to work with than an escaped JSON string.
One thing to note is that Json.NET's JObject.DeepEquals(...) will return true even when there are repeated object keys in a JSON object (so long as the last key/value matches). It's not likely something you'll encounter if you're only serializing NEST searches though, but something to be aware of.
If you're going to have many tests checking serialization, you'll want to create a single instance of ConnectionSettings and share with all, so that you can take advantage of the internal caches within it and your tests will run quicker than instantiating a new instance in each test.

elasticsearch nest : get number results of SearchRequest

I am looking for how to do an elasticsearch _count for nest :
in elastic seatch it would be:
i am looking for the equivalent of:
var request = new SearchRequest<type>()
{
Query = new BoolQuery
{
//Should = ...
//Must = ...
},
MinScore = 1
//....
};
var nbResult = client.Count(request);
If you know how to do it and if you have a tip for having a count of results with the fastest way it would help me a lot.
Use client.Count<T>( ... )
var request = new CountRequest<Document>
{
Query = new MatchAllQuery()
};
var nbResult = client.Count<Document>(request);
which yields the following request
POST http://localhost:9200/default-index/document/_count
{
"query": {
"match_all": {}
}
}
I found in sources. Its not solution because i cant test it locally but at least direction.
Look to this test and client source

elasticsearch nest stopword filter does not work

I am tiring to implement elasticsearch NEST client and indexing documents and SQL data and able to search these perfectly. But I am not able to apply stopwords on these records. Below is the code. Please note I put "abc" as my stopword.
public IndexSettings GetIndexSettings()
{
var stopTokenFilter = new StopTokenFilter();
string stopwordsfilePath = Convert.ToString(ConfigurationManager.AppSettings["Stopwords"]);
string[] stopwordsLines = System.IO.File.ReadAllLines(stopwordsfilePath);
List<string> words = new List<string>();
foreach (string line in stopwordsLines)
{
words.Add(line);
}
stopTokenFilter.Stopwords = words;
var settings = new IndexSettings { NumberOfReplicas = 0, NumberOfShards = 5 };
settings.Settings.Add("merge.policy.merge_factor", "10");
settings.Settings.Add("search.slowlog.threshold.fetch.warn", "1s");
settings.Analysis.Analyzers.Add("xyz", new StandardAnalyzer { StopWords = words });
settings.Analysis.Tokenizers.Add("keyword", new KeywordTokenizer());
settings.Analysis.Tokenizers.Add("standard", new StandardTokenizer());
settings.Analysis.TokenFilters.Add("standard", new StandardTokenFilter());
settings.Analysis.TokenFilters.Add("lowercase", new LowercaseTokenFilter());
settings.Analysis.TokenFilters.Add("stop", stopTokenFilter);
settings.Analysis.TokenFilters.Add("asciifolding", new AsciiFoldingTokenFilter());
settings.Analysis.TokenFilters.Add("word_delimiter", new WordDelimiterTokenFilter());
return settings;
}
public void CreateDocumentIndex(string indexName = null)
{
IndexSettings settings = GetIndexSettings();
if (!this.client.IndexExists(indexName).Exists)
{
this.client.CreateIndex(indexName, c => c
.InitializeUsing(settings)
.AddMapping<Document>
(m => m.Properties(ps => ps.Attachment
(a => a.Name(o => o.Documents)
.TitleField(t => t.Name(x => x.Name)
.TermVector(TermVectorOption.WithPositionsOffsets))))));
}
var r = this.client.GetIndexSettings(i => i.Index(indexName));
}
Indexing Data
var documents = GetDocuments();
documents.ForEach((document) =>
{
indexRepository.IndexData<Document>(document, DOCindexName, DOCtypeName);
});
public bool IndexData<T>(T data, string indexName = null, string mappingType = null)
where T : class, new()
{
if (client == null)
{
throw new ArgumentNullException("data");
}
var result = this.client.Index<T>(data, c => c.Index(indexName).Type(mappingType));
return result.IsValid;
}
In one of my document I have put a single line "abc" and I do not expect this to be returned as "abc" is in my stopword list. But On Searching Document It is also returning the above document. Below is the search query.
public IEnumerable<dynamic> GetAll(string queryTerm)
{
var queryResult = this.client.Search<dynamic>(d => d
.Analyzer("xyz")
.AllIndices()
.AllTypes()
.QueryString(queryTerm)).Documents;
return queryResult;
}
Please suggest where I am going wrong.

Resources