Elasticsearch.net Index Settings + Analyzer

I am using Elasticsearch 2.3.0 with C# (NEST).
I want to configure analysis on an index,
but the index settings don't change and I don't know why.
Here's my code:
private void button1_Click_1(object sender, EventArgs e)
{
var conn = new Uri("http://localhost:9200");
var config = new ConnectionSettings(conn);
var client = new ElasticClient(config);
string server = cmb_Serv.Text.Trim();
if (server.Length > 0)
{
string ser = server;
string uid = util.getConfigValue("SetUid");
string pwd = util.getConfigValue("SetPwd");
string dbn = cmb_Db.Text;
string tbl = cmb_Tbl.Text;
setWorkDbConnection(ser, uid, pwd, dbn);
string query = util.getConfigValue("SelectMC");
query = query.Replace("###tbl###",tbl);
using (SqlCommand cmd1 = new SqlCommand())
{
using (SqlConnection con1 = new SqlConnection())
{
con1.ConnectionString = util.WorkConnectionString;
con1.Open();
cmd1.CommandTimeout = 0; cmd1.Connection = con1;
cmd1.CommandText = query;
int id_num =0;
SqlDataReader reader = cmd1.ExecuteReader();
while (reader.Read())
{ id_num++;
Console.Write("\r" + id_num);
var mc = new mc
{
Id = id_num,
code = reader[0].ToString(),
mainclass = reader[1].ToString().Trim()
};
client.Index(mc, idx => idx.Index("mctest_ilhee"));
client.Alias(x => x.Add(a => a.Alias("mcAlias").Index("mctest_ilhee")));
client.Map<mc>(d => d
.Properties(props => props
.String(s => s
.Name(p => p.mainclass)
.Name(p2 => p2.code).Index(FieldIndexOption.Analyzed).Analyzer("whitespace"))));
} reader.Dispose();
reader.Close();
}
IndexSettings Is = new IndexSettings();
Is.Analysis.Analyzers.Add("snowball", new SnowballAnalyzer());
Is.Analysis.Analyzers.Add("whitespace", new WhitespaceAnalyzer());
}
}
}

Well, first of all, your code is strange.
Why are you doing the mapping inside the while loop? Do the mapping only once.
It's impossible to help you because you haven't provided the error you get. I would recommend adding a simple debug method.
protected void ValidateResponse(IResponse response)
{
    if (!response.IsValid ||
        (response is IIndicesOperationResponse && !((IIndicesOperationResponse) response).Acknowledged))
    {
        var error = string.Format("Request to ES failed with error: {0} ", response.ServerError != null ? response.ServerError.Error : "Unknown");
        var esRequest = string.Format("URL: {0}\n Method: {1}\n Request: {2}\n",
            response.ConnectionStatus.RequestUrl,
            response.ConnectionStatus.RequestMethod,
            response.ConnectionStatus.Request != null
                ? Encoding.UTF8.GetString(response.ConnectionStatus.Request)
                : string.Empty);
        // Print both strings so the failure is actually visible.
        Console.WriteLine(error);
        Console.WriteLine(esRequest);
    }
}
All requests, such as client.Alias and client.Map, return a response with a status. So you can do
var result = client.Map<mc>(.....YOUR_CODE_HERE....);
ValidateResponse(result);
Then you will see two things: the proper error that ES returns, plus the request that NEST sends to ES.
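As a rough illustration of the "map only once" advice, here is a minimal sketch that creates the index together with its mapping a single time, before the indexing loop starts. It assumes the NEST 2.x fluent API; the EnsureIndex helper and the shape of the mc class are hypothetical, inferred from the question. Note also that in the question the IndexSettings object is built but never sent to Elasticsearch, so it cannot change anything, and the chained .Name(...).Name(...) call appears to keep only the last field name. Since whitespace is a built-in analyzer, no custom analysis settings should be needed to reference it.

```csharp
using System;
using Nest;

// Hypothetical document type, inferred from the question's object initializer.
public class mc
{
    public int Id { get; set; }
    public string code { get; set; }
    public string mainclass { get; set; }
}

public static class IndexSetup
{
    // Creates the index and its mapping once; call this before the read loop.
    public static void EnsureIndex(IElasticClient client, string indexName)
    {
        if (client.IndexExists(indexName).Exists)
            return;

        var response = client.CreateIndex(indexName, c => c
            .Mappings(ms => ms
                .Map<mc>(m => m
                    .Properties(props => props
                        // One String(...) call per field, so neither name is overwritten.
                        .String(s => s
                            .Name(p => p.code)
                            .Index(FieldIndexOption.Analyzed)
                            .Analyzer("whitespace"))
                        .String(s => s
                            .Name(p => p.mainclass)
                            .Index(FieldIndexOption.Analyzed)
                            .Analyzer("whitespace"))))));

        if (!response.IsValid)
            throw new InvalidOperationException(
                "Index creation failed: " + response.DebugInformation);
    }
}
```

With something like this in place, the loop would only call client.Index(...), and the alias could likewise be added once after the loop.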

Related

ElasticSearch NEST 5.6.1 Query for unit test

I wrote a bunch of queries against Elasticsearch and wanted to write unit tests for them. Using this post, moq an elastic connection, I was able to perform general mocking. But when I tried to view the JSON being generated from my query, I couldn't get it in any way.
I tried to follow this post, elastic query moq, but it is relevant only to older versions of NEST, because the ConnectionStatus and RequestInformation members are no longer available on an ISearchResponse object.
My test looks as follows:
[TestMethod]
public void VerifyElasticFuncJson()
{
//Arrange
var elasticService = new Mock<IElasticService>();
var elasticClient = new Mock<IElasticClient>();
var clinet = new ElasticClient();
var searchResponse = new Mock<ISearchResponse<ElasticLog>>();
elasticService.Setup(es => es.GetConnection())
.Returns(elasticClient.Object);
elasticClient.Setup(ec => ec.Search(It.IsAny<Func<SearchDescriptor<ElasticLog>,
ISearchRequest>>())).
Returns(searchResponse.Object);
//Act
var service = new ElasticCusipInfoQuery(elasticService.Object);
var FindFunc = service.MatchCusip("CusipA", HostName.GSMSIMPAPPR01,
LogType.Serilog);
var con = GetConnection();
var search = con.Search<ElasticLog>(sd => sd
.Type(LogType.Serilog)
.Index("logstash-*")
.Query(q => q
.Bool(b => b
.Must(FindFunc)
)
)
);
// HERE I want to get the JSON and assert that it looks as expected
}
Is there any other way to achieve what I ask?
The best way to do this would be to use the InMemoryConnection to capture the request bytes and compare this to the expected JSON. This is what the unit tests for NEST do. Something like
private static void Main()
{
var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
var connectionSettings = new ConnectionSettings(pool, new InMemoryConnection())
.DefaultIndex("default")
.DisableDirectStreaming();
var client = new ElasticClient(connectionSettings);
// Act
var searchResponse = client.Search<Question>(s => s
.Query(q => (q
.Match(m => m
.Field(f => f.Title)
.Query("Kibana")
) || q
.Match(m => m
.Field(f => f.Title)
.Query("Elasticsearch")
.Boost(2)
)) && +q
.Range(t => t
.Field(f => f.Score)
.GreaterThan(0)
)
)
);
var actual = searchResponse.RequestJson();
var expected = new
{
    query = new {
        @bool = new {
            must = new object[] {
                new {
                    @bool = new {
                        should = new object[] {
                            new {
                                match = new {
                                    title = new {
                                        query = "Kibana"
                                    }
                                }
                            },
                            new {
                                match = new {
                                    title = new {
                                        query = "Elasticsearch",
                                        boost = 2d
                                    }
                                }
                            }
                        },
                    }
                },
                new {
                    @bool = new {
                        filter = new [] {
                            new {
                                range = new {
                                    score = new {
                                        gt = 0d
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
};
// Assert
Console.WriteLine(JObject.DeepEquals(JToken.FromObject(expected), JToken.Parse(actual)));
}
public static class Extensions
{
public static string RequestJson(this IResponse response) =>
Encoding.UTF8.GetString(response.ApiCall.RequestBodyInBytes);
}
I've used an anonymous type for the expected JSON as it's easier to work with than an escaped JSON string.
One thing to note is that Json.NET's JObject.DeepEquals(...) will return true even when there are repeated object keys in a JSON object (so long as the last key/value matches). It's not likely something you'll encounter if you're only serializing NEST searches though, but something to be aware of.
If you're going to have many tests checking serialization, you'll want to create a single instance of ConnectionSettings and share with all, so that you can take advantage of the internal caches within it and your tests will run quicker than instantiating a new instance in each test.
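One possible way to share a single ConnectionSettings instance across serialization tests is a lazily initialized static client, sketched below. The TestClient class name and its Instance property are hypothetical; the settings calls are the same ones used in the example above.

```csharp
using System;
using Elasticsearch.Net;
using Nest;

// Hypothetical helper: every test reads TestClient.Instance instead of
// constructing its own ConnectionSettings.
public static class TestClient
{
    private static readonly Lazy<IElasticClient> Shared = new Lazy<IElasticClient>(() =>
    {
        var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
        var settings = new ConnectionSettings(pool, new InMemoryConnection())
            .DefaultIndex("default")
            .DisableDirectStreaming();
        return new ElasticClient(settings);
    });

    // Built on first access and reused afterwards, so the settings' internal
    // caches are warmed once for the whole test run.
    public static IElasticClient Instance => Shared.Value;
}
```

A test would then call TestClient.Instance.Search<Question>(...) and compare searchResponse.RequestJson() with the expected JSON as before.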

Scroll example in ElasticSearch NEST API

I am using the .From() and .Size() methods to retrieve all documents from Elasticsearch results.
Below is an example:
ISearchResponse<dynamic> bResponse = ObjElasticClient.Search<dynamic>(s => s.From(0).Size(25000).Index("accounts").AllTypes().Query(Query));
Recently I came across the scroll feature of Elasticsearch. It looks like a better approach than From() and Size(), specifically for fetching large amounts of data.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
I am looking for an example of the scroll feature in the NEST API.
Can someone please provide a NEST example?
Thanks,
Sameer
Here's an example of using scroll with NEST and C#. Works with 5.x and 6.x
public IEnumerable<T> GetAllDocumentsInIndex<T>(string indexName, string scrollTimeout = "2m", int scrollSize = 1000) where T : class
{
ISearchResponse<T> initialResponse = this.ElasticClient.Search<T>
(scr => scr.Index(indexName)
.From(0)
.Take(scrollSize)
.MatchAll()
.Scroll(scrollTimeout));
List<T> results = new List<T>();
if (!initialResponse.IsValid || string.IsNullOrEmpty(initialResponse.ScrollId))
throw new Exception(initialResponse.ServerError.Error.Reason);
if (initialResponse.Documents.Any())
results.AddRange(initialResponse.Documents);
string scrollid = initialResponse.ScrollId;
bool isScrollSetHasData = true;
while (isScrollSetHasData)
{
ISearchResponse<T> loopingResponse = this.ElasticClient.Scroll<T>(scrollTimeout, scrollid);
if (loopingResponse.IsValid)
{
results.AddRange(loopingResponse.Documents);
scrollid = loopingResponse.ScrollId;
}
isScrollSetHasData = loopingResponse.Documents.Any();
}
this.ElasticClient.ClearScroll(new ClearScrollRequest(scrollid));
return results;
}
It's from: http://telegraphrepaircompany.com/elasticsearch-nest-scroll-api-c/
The internal implementation of NEST's Reindex uses scroll to move documents from one index to another.
It should be a good starting point.
Below is the relevant code from GitHub.
var page = 0;
var searchResult = this.CurrentClient.Search<T>(
s => s
.Index(fromIndex)
.AllTypes()
.From(0)
.Size(size)
.Query(this._reindexDescriptor._QuerySelector ?? (q=>q.MatchAll()))
.SearchType(SearchType.Scan)
.Scroll(scroll)
);
if (searchResult.Total <= 0)
throw new ReindexException(searchResult.ConnectionStatus, "index " + fromIndex + " has no documents!");
IBulkResponse indexResult = null;
do
{
var result = searchResult;
searchResult = this.CurrentClient.Scroll<T>(s => s
.Scroll(scroll)
.ScrollId(result.ScrollId)
);
if (searchResult.Documents.HasAny())
indexResult = this.IndexSearchResults(searchResult, observer, toIndex, page);
page++;
} while (searchResult.IsValid && indexResult != null && indexResult.IsValid && searchResult.Documents.HasAny());
You can also take a look at the integration test for Scroll:
[Test]
public void SearchTypeScan()
{
var scanResults = this.Client.Search<ElasticsearchProject>(s => s
.From(0)
.Size(1)
.MatchAll()
.Fields(f => f.Name)
.SearchType(SearchType.Scan)
.Scroll("2s")
);
Assert.True(scanResults.IsValid);
Assert.False(scanResults.FieldSelections.Any());
Assert.IsNotNullOrEmpty(scanResults.ScrollId);
var results = this.Client.Scroll<ElasticsearchProject>(s=>s
.Scroll("4s")
.ScrollId(scanResults.ScrollId)
);
var hitCount = results.Hits.Count();
while (results.FieldSelections.Any())
{
Assert.True(results.IsValid);
Assert.True(results.FieldSelections.Any());
Assert.IsNotNullOrEmpty(results.ScrollId);
var localResults = results;
results = this.Client.Scroll<ElasticsearchProject>(s=>s
.Scroll("4s")
.ScrollId(localResults.ScrollId));
hitCount += results.Hits.Count();
}
Assert.AreEqual(scanResults.Total, hitCount);
}
I took the liberty of rewriting the fine answer from Michael to async and a bit less verbose (v. 6.x Nest):
public async Task<IList<T>> RockAndScroll<T>(
string indexName,
string scrollTimeoutMinutes = "2m",
int scrollPageSize = 1000
) where T : class
{
var searchResponse = await this.ElasticClient.SearchAsync<T>(sd => sd
.Index(indexName)
.From(0)
.Take(scrollPageSize)
.MatchAll()
.Scroll(scrollTimeoutMinutes));
var results = new List<T>();
while (true)
{
if (!searchResponse.IsValid || string.IsNullOrEmpty(searchResponse.ScrollId))
throw new Exception($"Search error: {searchResponse.ServerError.Error.Reason}");
if (!searchResponse.Documents.Any())
break;
results.AddRange(searchResponse.Documents);
searchResponse = await ElasticClient.ScrollAsync<T>(scrollTimeoutMinutes, searchResponse.ScrollId);
}
await this.ElasticClient.ClearScrollAsync(new ClearScrollRequest(searchResponse.ScrollId));
return results;
}
I took Frederick's answer and made it an extension method that leverages IAsyncEnumerable:
// Adopted from https://stackoverflow.com/a/56261657/1072030
public static async IAsyncEnumerable<T> ScrollAllAsync<T>(
this IElasticClient elasticClient,
string indexName,
string scrollTimeoutMinutes = "2m",
int scrollPageSize = 1000,
[EnumeratorCancellation] CancellationToken ct = default
) where T : class
{
var searchResponse = await elasticClient.SearchAsync<T>(
sd => sd
.Index(indexName)
.From(0)
.Take(scrollPageSize)
.MatchAll()
.Scroll(scrollTimeoutMinutes),
ct);
try
{
while (true)
{
if (!searchResponse.IsValid || string.IsNullOrEmpty(searchResponse.ScrollId))
throw new Exception($"Search error: {searchResponse.ServerError.Error.Reason}");
if (!searchResponse.Documents.Any())
break;
foreach(var item in searchResponse.Documents)
{
yield return item;
}
searchResponse = await elasticClient.ScrollAsync<T>(scrollTimeoutMinutes, searchResponse.ScrollId, ct: ct);
}
}
finally
{
await elasticClient.ClearScrollAsync(new ClearScrollRequest(searchResponse.ScrollId), ct: ct);
}
}
I'm guessing we should really be using search_after instead of the scroll api, but meh.

elasticsearch nest stopword filter does not work

I am trying to use the Elasticsearch NEST client to index documents and SQL data, and I am able to search these perfectly. But I am not able to apply stopwords to these records. Below is the code. Please note that I put "abc" as my stopword.
public IndexSettings GetIndexSettings()
{
var stopTokenFilter = new StopTokenFilter();
string stopwordsfilePath = Convert.ToString(ConfigurationManager.AppSettings["Stopwords"]);
string[] stopwordsLines = System.IO.File.ReadAllLines(stopwordsfilePath);
List<string> words = new List<string>();
foreach (string line in stopwordsLines)
{
words.Add(line);
}
stopTokenFilter.Stopwords = words;
var settings = new IndexSettings { NumberOfReplicas = 0, NumberOfShards = 5 };
settings.Settings.Add("merge.policy.merge_factor", "10");
settings.Settings.Add("search.slowlog.threshold.fetch.warn", "1s");
settings.Analysis.Analyzers.Add("xyz", new StandardAnalyzer { StopWords = words });
settings.Analysis.Tokenizers.Add("keyword", new KeywordTokenizer());
settings.Analysis.Tokenizers.Add("standard", new StandardTokenizer());
settings.Analysis.TokenFilters.Add("standard", new StandardTokenFilter());
settings.Analysis.TokenFilters.Add("lowercase", new LowercaseTokenFilter());
settings.Analysis.TokenFilters.Add("stop", stopTokenFilter);
settings.Analysis.TokenFilters.Add("asciifolding", new AsciiFoldingTokenFilter());
settings.Analysis.TokenFilters.Add("word_delimiter", new WordDelimiterTokenFilter());
return settings;
}
public void CreateDocumentIndex(string indexName = null)
{
IndexSettings settings = GetIndexSettings();
if (!this.client.IndexExists(indexName).Exists)
{
this.client.CreateIndex(indexName, c => c
.InitializeUsing(settings)
.AddMapping<Document>
(m => m.Properties(ps => ps.Attachment
(a => a.Name(o => o.Documents)
.TitleField(t => t.Name(x => x.Name)
.TermVector(TermVectorOption.WithPositionsOffsets))))));
}
var r = this.client.GetIndexSettings(i => i.Index(indexName));
}
Indexing Data
var documents = GetDocuments();
documents.ForEach((document) =>
{
indexRepository.IndexData<Document>(document, DOCindexName, DOCtypeName);
});
public bool IndexData<T>(T data, string indexName = null, string mappingType = null)
where T : class, new()
{
if (client == null)
{
throw new ArgumentNullException("data");
}
var result = this.client.Index<T>(data, c => c.Index(indexName).Type(mappingType));
return result.IsValid;
}
In one of my documents I have put a single line, "abc", and I do not expect it to be returned because "abc" is in my stopword list. But when searching, this document is returned as well. Below is the search query.
public IEnumerable<dynamic> GetAll(string queryTerm)
{
var queryResult = this.client.Search<dynamic>(d => d
.Analyzer("xyz")
.AllIndices()
.AllTypes()
.QueryString(queryTerm)).Documents;
return queryResult;
}
Please suggest where I am going wrong.

How can I improve the performance of this LINQ query expression?

public bool SaveValidTicketNos(string id,string[] ticketNos, string checkType, string checkMan)
{
bool result = false;
List<Carstartlistticket>enties=new List<Carstartlistticket>();
using (var context = new MiniSysDataContext())
{
try
{
foreach (var ticketNo in ticketNos)
{
Orderticket temp = context.Orderticket.ByTicketNo(ticketNo).SingleOrDefault();
if (temp != null)
{
Ticketline ticketline= temp.Ticketline;
string currencyType = temp.CurrencyType;
float personAllowance=GetPersonCountAllowance(context,ticketline, currencyType);
Carstartlistticket carstartlistticket = new Carstartlistticket()
{
CsltId = Guid.NewGuid().ToString(),
Carstartlist = new Carstartlist(){CslId = id},
LeaveDate = temp.LeaveDate,
OnPointName = temp.OnpointName,
OffPointName = temp.OffpointName,
OutTicketMan = temp.OutBy,
TicketNo = temp.TicketNo,
ChekMan = checkMan,
Type = string.IsNullOrEmpty(checkType)?(short?)null:Convert.ToInt16(checkType),
CreatedOn = DateTime.Now,
CreatedBy = checkMan,
NumbserAllowance = personAllowance
};
enties.Add(carstartlistticket);
}
}
context.BeginTransaction();
context.Carstartlistticket.InsertAllOnSubmit(enties);
context.SubmitChanges();
bool changeStateResult=ChangeTicketState(context, ticketNos,checkMan);
if(changeStateResult)
{
context.CommitTransaction();
result = true;
}
else
{
context.RollbackTransaction();
}
}
catch (Exception e)
{
LogHelper.WriteLog(string.Format("CarstartlistService.SaveValidTicketNos({0},{1},{2},{3})",id,ticketNos,checkType,checkMan),e);
context.RollbackTransaction();
}
}
return result;
}
My code is above. I suspect this code performs terribly. The slow point is
Orderticket temp = context.Orderticket.ByTicketNo(ticketNo).SingleOrDefault();
I receive a string array through the method arguments and want to get all the matching data from the database by ticketNos. Here I use a loop, and I know that written this way it causes a performance problem, because every iteration makes one more database access. How can I avoid this and improve performance, for example by getting all the data in a single database access?
I forgot to mention the ORM I use: it is PLINQO, based on NHibernate.
I look forward to your answers. Thank you.
Using plain NHibernate:
var tickets = session.QueryOver<OrderTicket>()
.WhereRestrictionOn(x => x.TicketNo).IsIn(ticketNos)
.List();
short? type = null;
short typeValue;
if (!string.IsNullOrEmpty(checkType) && short.TryParse(checkType, out typeValue))
type = typeValue;
var entitiesToSave = tickets.Select(ticket => new Carstartlistticket
{
CsltId = Guid.NewGuid().ToString(),
Carstartlist = new Carstartlist() { CslId = id },
LeaveDate = ticket.LeaveDate,
OnPointName = ticket.OnpointName,
OffPointName = ticket.OffpointName,
OutTicketMan = ticket.OutBy,
TicketNo = ticket.TicketNo,
ChekMan = checkMan,
CreatedOn = DateTime.Now,
CreatedBy = checkMan,
Type = type,
NumbserAllowance = GetPersonCountAllowance(context, ticket.Ticketline, ticket.CurrencyType)
});
foreach (var entity in entitiesToSave)
{
session.Save(entity);
}
To enhance this further, try to preload all needed PersonCountAllowances.
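One way to approximate that idea is to remember each allowance the first time it is looked up, so that tickets sharing a ticket line and currency trigger only one lookup. The sketch below is a plain memoizing cache, not a true bulk preload. The AllowanceCache class is hypothetical, and it assumes a ticket line can be identified by a string id and that the real lookup (GetPersonCountAllowance in the question) can be wrapped in a delegate.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: caches allowances per (ticket line id, currency type).
public sealed class AllowanceCache
{
    private readonly Dictionary<(string LineId, string Currency), float> _cache =
        new Dictionary<(string LineId, string Currency), float>();
    private readonly Func<string, string, float> _load;

    // 'load' stands in for the real database lookup.
    public AllowanceCache(Func<string, string, float> load)
    {
        _load = load ?? throw new ArgumentNullException(nameof(load));
    }

    public float Get(string lineId, string currency)
    {
        var key = (lineId, currency);
        if (!_cache.TryGetValue(key, out var value))
        {
            // First request for this pair: hit the underlying lookup once.
            value = _load(lineId, currency);
            _cache[key] = value;
        }
        return value;
    }
}
```

Inside the Select, the projection would then call cache.Get(...) instead of querying for every ticket. A real preload would go one step further and fetch all required allowances in a single query up front.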

Issue in reading google text document

I could get a handle to the Google text doc I needed. I am now stuck on how to read the contents.
My code looks like:
GoogleOAuthParameters oauthParameters = new GoogleOAuthParameters();
oauthParameters.setOAuthConsumerKey(Constants.CONSUMER_KEY);
oauthParameters.setOAuthConsumerSecret(Constants.CONSUMER_SECRET);
oauthParameters.setOAuthToken(Constants.ACCESS_TOKEN);
oauthParameters.setOAuthTokenSecret(Constants.ACCESS_TOKEN_SECRET);
DocsService client = new DocsService("sakshum-YourAppName-v1");
client.setOAuthCredentials(oauthParameters, new OAuthHmacSha1Signer());
URL feedUrl = new URL("https://docs.google.com/feeds/default/private/full/");
DocumentQuery dquery = new DocumentQuery(feedUrl);
dquery.setTitleQuery("blood_donor_verification_template_dev");
dquery.setTitleExact(true);
dquery.setMaxResults(10);
DocumentListFeed resultFeed = client.getFeed(dquery, DocumentListFeed.class);
System.out.println("feed size:" + resultFeed.getEntries().size());
String emailBody = "";
for (DocumentListEntry entry : resultFeed.getEntries()) {
System.out.println(entry.getPlainTextContent());
emailBody = entry.getPlainTextContent();
}
Please note that entry.getPlainTextContent() does not work; it throws an exception saying the object is not of TextContent type.
Finally, I solved it as follows (emailBody is a StringBuilder here, since String has no append method):
StringBuilder emailBody = new StringBuilder();
for (DocumentListEntry entry : resultFeed.getEntries()) {
    String docId = entry.getDocId();
    String docType = entry.getType();
    // Build the export URL for this document and request it as HTML.
    URL exportUrl =
        new URL("https://docs.google.com/feeds/download/" + docType
            + "s/Export?docID=" + docId + "&exportFormat=html");
    MediaContent mc = new MediaContent();
    mc.setUri(exportUrl.toString());
    MediaSource ms = client.getMedia(mc);
    InputStream inStream = null;
    try {
        inStream = ms.getInputStream();
        int c;
        // Read the exported document one byte at a time.
        while ((c = inStream.read()) != -1) {
            emailBody.append((char) c);
        }
    } finally {
        if (inStream != null) {
            inStream.close();
        }
    }
}
