Renaming and Deleting Elasticsearch Indexes - elasticsearch

I'm using a C# .NET application with NEST to create an index.
I've created an Elasticsearch index that customers can query, called index_1. I then build another version of the index using a different instance of the application and call it index_1_temp.
What is the safest way for me to rename index_1_temp to index_1 and then delete the original index_1?
I know ES has aliases, but I'm not sure how to use them for this task.
EDIT: The original index does not have an alias associated with it.

I would recommend always using aliases in scenarios where you may create incrementally differing versions of an index, as may be the case when refining the model of signals within your search strategy.
You can add an alias at the point of creating an index:
var client = new ElasticClient(connectionSettings);

var indices = new[] { "index-v1", "index-v2" };
var alias = "index-alias";

// delete index-v1 and index-v2 if they exist, to
// allow this example to be repeatable
foreach (var index in indices)
{
    if (client.IndexExists(index).Exists)
    {
        client.DeleteIndex(index);
    }
}

var createIndexResponse = client.CreateIndex(indices[0], c => c
    .Aliases(a => a
        .Alias(alias)
    )
);
Then when you create a new index, you can remove the alias from the current indices and add it to the new index. This alias swap operation is atomic:
createIndexResponse = client.CreateIndex(indices[1]);

// wait for index-v2 to be operable
var clusterHealthResponse = client.ClusterHealth(c => c
    .WaitForStatus(WaitForStatus.Yellow)
    .Index(indices[1]));

// swap the alias
var bulkAliasResponse = client.Alias(ba => ba
    .Add(add => add.Alias(alias).Index(indices[1]))
    .Remove(remove => remove.Alias(alias).Index("*"))
);

// verify that the alias only exists on index-v2
var aliasResponse = client.GetAlias(a => a.Name(alias));
The output of the last response is:

{
  "index-v2" : {
    "aliases" : {
      "index-alias" : { }
    }
  }
}
When searching, the consumers would always use the alias. Since the alias points to a single index only, you can also use it to index new documents and update existing documents.
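To make that concrete, here is a minimal sketch (MyDocument is a hypothetical POCO standing in for your own document type):

// Hypothetical document type, for illustration only
public class MyDocument
{
    public string Title { get; set; }
}

// Consumers search against the alias, not a concrete index name
var searchResponse = client.Search<MyDocument>(s => s
    .Index(alias)
    .Query(q => q.MatchAll())
);

// Since the alias points to a single index, it can be used for writes too
var indexResponse = client.Index(
    new MyDocument { Title = "example" },
    i => i.Index(alias)
);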

Related

ElasticSearch NEST recreate index with zero downtime

I am writing a backend in C# for a website. I'd like to recreate an index with little downtime.
After reading these two posts:
Nest Client c# 7.0 for elastic search removing Aliases
Recreate ElasticSearch Index with Nest 7.x
I came up with this:
var alias_exist = await _client.Indices.ExistsAsync(index_string_alias);
if (alias_exist.Exists)
{
    var oldIndices = await _client.GetIndicesPointingToAliasAsync(index_string_alias);
    var oldIndexName = oldIndices.First().ToString();
    await _client.Indices.BulkAliasAsync(new BulkAliasRequest
    {
        Actions = new List<IAliasAction>
        {
            new AliasRemoveAction { Remove = new AliasRemoveOperation { Index = oldIndexName, Alias = index_string_alias } },
            new AliasAddAction { Add = new AliasAddOperation { Index = index_string_unique, Alias = index_string_alias } }
        }
    });
}
else
{
    var putAliasResponse = await _client.Indices.PutAliasAsync(new PutAliasRequest(index_string_unique, index_string_alias));
}
I'd like to remove index_string_alias if it exists and assign the alias to the newly created index_string_unique.
Also, I'd like to confirm that I can treat the alias as the index name in my other queries.
I am really new to Elasticsearch and wonder how people figure out these things. I searched through the official documentation and found little information about the async functions in NEST. Where should I look for explanations of these functions?

"Clone" index mappings

I have an index which I will be reindexing. At the moment I want to create a new index, which should contain the exact same mappings that can be found in the original index.
I've got this:
var srcMappings = client.GetMapping(new GetMappingRequest((Indices)sourceIndexName)).Mappings;
And I try to create an index:
var response = client.CreateIndex(destinationIndex, c => c
    .Settings(...my settings ...)
    .Mappings(... what here? ...)
);
What exactly should I pass to the .Mappings(...) above so that the mappings from the source index are replicated into the target index? I don't want to explicitly 'know' about the types.
I am trying to use Nest.
Alternatively, is there a Reindex API which would take the destination index name and create the index for me, together with the mappings of the source?
You can get the mappings from one index and use them to create the mappings in another index with:

var client = new ElasticClient();

var getIndexResponse = client.GetIndex("assignments");

var createIndexResponse = client.CreateIndex("assignments2", c => c
    .Mappings(m => Promise.Create(getIndexResponse.Indices["assignments"].Mappings))
);
You'll need an IPromise<T> implementation to do so:

public class Promise
{
    public static IPromise<TValue> Create<TValue>(TValue value) where TValue : class =>
        new Promise<TValue>(value);
}

public class Promise<T> : IPromise<T> where T : class
{
    public T Value { get; }

    public Promise(T value) => Value = value;
}
The Promise is needed in some places in NEST's fluent API implementation where values are additive and a final value needs to be returned at a later point.
You can also do the same using the object initializer syntax and no Promise<T>:

var createIndexResponse = client.CreateIndex(new CreateIndexRequest("assignments2")
{
    Mappings = getIndexResponse.Indices["assignments"].Mappings
});
Alternatively, is there a Reindex API which would take the destination index name and create the index for me, together with the mappings of the source?
There are two Reindex APIs within NEST; an Observable implementation that has been around since NEST 1.x, and the Reindex API as available within Elasticsearch since 2.3 (known as ReindexOnServer in NEST). The former Observable implementation can create the destination index for you, although it will copy all settings, mappings and aliases. The latter Reindex API does not create the destination index as part of the operation, so it needs to be set up before starting the reindex process.
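For illustration, a minimal sketch of the latter with NEST (assuming the destination index and its mappings have already been created, e.g. as shown above):

// Reindex on the server from "assignments" into the pre-created
// "assignments2", blocking until the task completes
var reindexResponse = client.ReindexOnServer(r => r
    .Source(s => s.Index("assignments"))
    .Destination(d => d.Index("assignments2"))
    .WaitForCompletion()
);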

OrderBy used with LINQ returns no rows in DocumentDb

I have been trying to use OrderBy in LINQ. I have changed several fields, but it always returns null (no rows) while my documents are present in the collection.
Here's my simple query:
var rests = _client.CreateDocumentQuery<Restraunt>(_collectionUri)
    .OrderBy(x => x.RestName);
You have to create an indexing policy with a range index on the type of the property you use in the OrderBy (here I assume a string).
If you keep the default policy, the OrderBy returns no results.
You need an indexing policy like this:
restrauntsCollection.IndexingPolicy.IncludedPaths.Add(
    new IncludedPath
    {
        Path = "/RestName/?",
        Indexes = new Collection<Index>
        {
            new RangeIndex(DataType.String) { Precision = -1 }
        }
    });
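Note that changing the IndexingPolicy on the local object alone is not enough; the updated collection definition has to be sent back to the service. A rough sketch, assuming client is your DocumentClient and restrauntsCollection was read from it:

// Persist the updated indexing policy; the service then rebuilds the
// index according to the new policy.
await client.ReplaceDocumentCollectionAsync(restrauntsCollection);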
See this page for more information: https://azure.microsoft.com/en-us/documentation/articles/documentdb-orderby/
Hope this helps

Scalable Contains method for LINQ against a SQL backend

I'm looking for an elegant way to execute a Contains() statement in a scalable way. Please allow me to give some background before I come to the actual question.
The IN statement
In Entity Framework and LINQ to SQL the Contains statement is translated as a SQL IN statement. For instance, from this statement:
var ids = Enumerable.Range(1,10);
var courses = Courses.Where(c => ids.Contains(c.CourseID)).ToList();
Entity Framework will generate:

SELECT
    [Extent1].[CourseID] AS [CourseID],
    [Extent1].[Title] AS [Title],
    [Extent1].[Credits] AS [Credits],
    [Extent1].[DepartmentID] AS [DepartmentID]
FROM [dbo].[Course] AS [Extent1]
WHERE [Extent1].[CourseID] IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
Unfortunately, the IN statement is not scalable. As per MSDN:
Including an extremely large number of values (many thousands) in an IN clause can consume resources and return errors 8623 or 8632
which has to do with running out of resources or exceeding expression limits.
But before these errors occur, the IN statement becomes increasingly slow as the number of items grows. I can't find documentation about its growth rate, but it performs well up to a few thousand items; beyond that it gets dramatically slow. (Based on SQL Server experience.)
Scalable
We can't always avoid this statement. A JOIN with the source data instead would generally perform much better, but that's only possible when the source data is in the same context. Here I'm dealing with data coming from a client in a disconnected scenario. So I have been looking for a scalable solution. A satisfactory approach turned out to be cutting the operation into chunks:
var courses = ids.ToChunks(1000)
    .Select(chunk => Courses.Where(c => chunk.Contains(c.CourseID)))
    .SelectMany(x => x).ToList();
(where ToChunks is this little extension method).
This executes the query in chunks of 1000 that all perform well enough. With e.g. 5000 items, 5 queries will run that together are likely to be faster than one query with 5000 items.
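For reference, such a ToChunks extension might look something like this (a sketch; the implementation linked above may differ):

// A sketch of a ToChunks extension method: lazily yields the source
// sequence in lists of at most chunkSize items.
public static IEnumerable<List<T>> ToChunks<T>(this IEnumerable<T> source, int chunkSize)
{
    var chunk = new List<T>(chunkSize);
    foreach (var item in source)
    {
        chunk.Add(item);
        if (chunk.Count == chunkSize)
        {
            yield return chunk;
            chunk = new List<T>(chunkSize);
        }
    }
    if (chunk.Count > 0)
        yield return chunk;
}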
But not DRY
But of course I don't want to scatter this construct all over my code. I am looking for an extension method by which any IQueryable<T> can be transformed into a chunky executing statement. Ideally something like this:
var courses = Courses.Where(c => ids.Contains(c.CourseID))
    .AsChunky(1000)
    .ToList();
But maybe this:

var courses = Courses.ChunkyContains(c => c.CourseID, ids, 1000)
    .ToList();
I've given the latter solution a first shot:
public static IEnumerable<TEntity> ChunkyContains<TEntity, TContains>(
    this IQueryable<TEntity> query,
    Expression<Func<TEntity, TContains>> match,
    IEnumerable<TContains> containList,
    int chunkSize = 500)
{
    return containList.ToChunks(chunkSize)
        .Select(chunk => query.Where(x => chunk.Contains(match)))
        .SelectMany(x => x);
}
Obviously, the part x => chunk.Contains(match) doesn't compile. But I don't know how to manipulate the match expression into a Contains expression.
Maybe someone can help me make this solution work. And of course I'm open to other approaches to make this statement scalable.
I solved this problem with a slightly different approach a few months ago. Maybe it's a good solution for you too.
I didn't want my solution to change the query itself, so an ids.ChunkContains(p.Id) or a special WhereContains method was unfeasible. The solution should also be able to combine a Contains with another filter, as well as use the same collection multiple times.
db.TestEntities.Where(p => (ids.Contains(p.Id) || ids.Contains(p.ParentId)) && p.Name.StartsWith("Test"))
So I tried to encapsulate the logic in a special ToList method that could rewrite the Expression for a specified collection to be queried in chunks.
var ids = Enumerable.Range(1, 11);
var result = db.TestEntities.Where(p => ids.Contains(p.Id) && p.Name.StartsWith("Test"))
    .ToChunkedList(ids, 4);
To rewrite the expression tree, I discover all Contains method calls on local collections in the query with a few helper classes.
private class ContainsExpression
{
    public ContainsExpression(MethodCallExpression methodCall)
    {
        this.MethodCall = methodCall;
    }

    public MethodCallExpression MethodCall { get; private set; }

    public object GetValue()
    {
        var parent = MethodCall.Object ?? MethodCall.Arguments.FirstOrDefault();
        return Expression.Lambda<Func<object>>(parent).Compile()();
    }

    public bool IsLocalList()
    {
        Expression parent = MethodCall.Object ?? MethodCall.Arguments.FirstOrDefault();
        while (parent != null)
        {
            if (parent is ConstantExpression)
                return true;
            var member = parent as MemberExpression;
            if (member != null)
            {
                parent = member.Expression;
            }
            else
            {
                parent = null;
            }
        }
        return false;
    }
}
private class FindExpressionVisitor<T> : ExpressionVisitor where T : Expression
{
    public List<T> FoundItems { get; private set; }

    public FindExpressionVisitor()
    {
        this.FoundItems = new List<T>();
    }

    public override Expression Visit(Expression node)
    {
        var found = node as T;
        if (found != null)
        {
            this.FoundItems.Add(found);
        }
        return base.Visit(node);
    }
}
public static List<T> ToChunkedList<T, TValue>(this IQueryable<T> query, IEnumerable<TValue> list, int chunkSize)
{
    var finder = new FindExpressionVisitor<MethodCallExpression>();
    finder.Visit(query.Expression);

    var methodCalls = finder.FoundItems
        .Where(p => p.Method.Name == "Contains")
        .Select(p => new ContainsExpression(p))
        .Where(p => p.IsLocalList())
        .ToList();
    var localLists = methodCalls.Where(p => p.GetValue() == list).ToList();
If the local collection passed into the ToChunkedList method was found in the query expression, I replace the Contains call on the original list with a new call on a temporary list containing the ids for one batch.
    if (localLists.Any())
    {
        var result = new List<T>();
        var valueList = new List<TValue>();
        var containsMethod = typeof(Enumerable).GetMethods(BindingFlags.Static | BindingFlags.Public)
            .Single(p => p.Name == "Contains" && p.GetParameters().Count() == 2)
            .MakeGenericMethod(typeof(TValue));
        var queryExpression = query.Expression;

        foreach (var item in localLists)
        {
            var parameter = new List<Expression>();
            parameter.Add(Expression.Constant(valueList));
            if (item.MethodCall.Object == null)
            {
                parameter.AddRange(item.MethodCall.Arguments.Skip(1));
            }
            else
            {
                parameter.AddRange(item.MethodCall.Arguments);
            }
            var call = Expression.Call(containsMethod, parameter.ToArray());
            var replacer = new ExpressionReplacer(item.MethodCall, call);
            queryExpression = replacer.Visit(queryExpression);
        }

        var chunkQuery = query.Provider.CreateQuery<T>(queryExpression);
        for (int i = 0; i < Math.Ceiling((decimal)list.Count() / chunkSize); i++)
        {
            valueList.Clear();
            valueList.AddRange(list.Skip(i * chunkSize).Take(chunkSize));
            result.AddRange(chunkQuery.ToList());
        }
        return result;
    }

    // if the collection was not found, return query.ToList()
    return query.ToList();
}
Expression Replacer:
private class ExpressionReplacer : ExpressionVisitor
{
    private Expression find, replace;

    public ExpressionReplacer(Expression find, Expression replace)
    {
        this.find = find;
        this.replace = replace;
    }

    public override Expression Visit(Expression node)
    {
        if (node == this.find)
            return this.replace;
        return base.Visit(node);
    }
}
Please allow me to provide an alternative to the Chunky approach.
The technique involving Contains in your predicate works well for:
A constant list of values (non-volatile).
A small list of values.
Contains will do great if your local data has those two characteristics, because this small set of values will be hardcoded into the final SQL query.
The problem begins when your list of values has entropy (is non-constant). As of this writing, Entity Framework (Classic and Core) does not try to parameterize these values in any way, which forces SQL Server to generate a query plan every time it sees a new combination of values in your query. This operation is expensive and is aggravated by the overall complexity of your query (e.g. many tables, a lot of values in the list, etc.).
The Chunky approach still suffers from this SQL Server query plan cache pollution problem, because it does not parameterize the query; it just splits the cost of creating one big execution plan into smaller ones that are easier for SQL Server to compute (and discard). Furthermore, every chunk adds an additional round-trip to the database, which increases the time needed to resolve the query.
An Efficient Solution for EF Core
🎉 NEW! QueryableValues EF6 Edition has arrived!
For EF Core keep reading below.
Wouldn't it be nice to have a way of composing local data in your query in a way that's SQL Server friendly? Enter QueryableValues.
I designed this library with these two main goals:
It MUST solve the SQL Server's query plan cache pollution problem ✅
It MUST be fast! ⚡
It has a flexible API that allows you to compose local data provided by an IEnumerable<T> and you get back an IQueryable<T>; just use it as if it were another entity of your DbContext (really), e.g.:
// Sample values.
IEnumerable<int> values = Enumerable.Range(1, 1000);

// Using a Join (query syntax).
var query1 =
    from e in dbContext.MyEntities
    join v in dbContext.AsQueryableValues(values) on e.Id equals v
    select new
    {
        e.Id,
        e.Name
    };

// Using Contains (method syntax).
var query2 = dbContext.MyEntities
    .Where(e => dbContext.AsQueryableValues(values).Contains(e.Id))
    .Select(e => new
    {
        e.Id,
        e.Name
    });
You can also compose complex types!
It goes without saying that the provided IEnumerable<T> is only enumerated at the time that your query is materialized (not before), preserving the same behavior of EF Core in this regard.
How Does It Work?
Internally QueryableValues creates a parameterized query and provides your values in a serialized format that is natively understood by SQL Server. This allows your query to be resolved with a single round-trip to the database and avoids creating a new query plan on subsequent executions due to the parameterized nature of it.
Useful Links
Nuget Package
GitHub Repository
Benchmarks
SQL Server Cache Pollution Problem
QueryableValues is distributed under the MIT license
LinqKit to the rescue! There might be a better way that does it directly, but this seems to work fine and makes it pretty clear what's being done. The addition is AsExpandable(), which lets you use the Invoke extension.
using LinqKit;

public static IEnumerable<TEntity> ChunkyContains<TEntity, TContains>(
    this IQueryable<TEntity> query,
    Expression<Func<TEntity, TContains>> match,
    IEnumerable<TContains> containList,
    int chunkSize = 500)
{
    return containList
        .ToChunks(chunkSize)
        .Select(chunk => query.AsExpandable()
            .Where(x => chunk.Contains(match.Invoke(x))))
        .SelectMany(x => x);
}
You might also want to do this:

containList.Distinct()
    .ToChunks(chunkSize)

...or something similar, so you don't get duplicate results if something like this occurs:

query.ChunkyContains(x => x.Id, new List<int> { 1, 1 }, 1);
Another way would be to build the predicate this way (of course, some parts should be improved, just giving the idea).
public static Expression<Func<TEntity, bool>> ContainsPredicate<TEntity, TContains>(
    this IEnumerable<TContains> chunk, Expression<Func<TEntity, TContains>> match)
{
    return Expression.Lambda<Func<TEntity, bool>>(
        Expression.Call(
            typeof(Enumerable),
            "Contains",
            new[] { typeof(TContains) },
            Expression.Constant(chunk, typeof(IEnumerable<TContains>)),
            match.Body),
        match.Parameters);
}
which you could call in your ChunkyContains method:
return containList.ToChunks(chunkSize)
    .Select(chunk => query.Where(ContainsPredicate(chunk, match)))
    .SelectMany(x => x);
Using a stored procedure with a table-valued parameter could also work well. In effect, you write a join in the stored procedure between your table or view and the table-valued parameter.
https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/sql/table-valued-parameters
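As a rough sketch of the client side (the table type dbo.IdList and the procedure dbo.GetCoursesByIds are illustrative names; both must already exist in the database):

// Assumes: CREATE TYPE dbo.IdList AS TABLE (Id INT PRIMARY KEY) and a
// procedure dbo.GetCoursesByIds(@Ids dbo.IdList READONLY) that joins the
// parameter against dbo.Course. Both names are hypothetical.
var table = new DataTable();
table.Columns.Add("Id", typeof(int));
foreach (var id in ids)
    table.Rows.Add(id);

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("dbo.GetCoursesByIds", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    var parameter = command.Parameters.AddWithValue("@Ids", table);
    parameter.SqlDbType = SqlDbType.Structured;
    parameter.TypeName = "dbo.IdList";

    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            // map each row to a Course object
        }
    }
}

This sends all the ids in a single parameterized round-trip instead of inlining them into an IN list.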

Complex foreach loop possible to shorten to LINQ?

I have a cluttered piece of code that I would like to shorten using LINQ. It's about the part in the foreach() loop that performs an additional grouping on the result set and builds a nested Dictionary.
Is this possible using a shorter LINQ syntax?
var q = from entity in this.Context.Entities
        join text in this.Context.Texts
            on new { ObjectType = 1, ObjectId = entity.EntityId }
            equals new { ObjectType = text.ObjectType, ObjectId = text.ObjectId }
            into texts
        select new { entity, texts };
foreach (var result in q)
{
    // Can this grouping be performed in the LINQ query above?
    var grouped = from tx in result.texts
                  group tx by tx.Language into langGroup
                  select new
                  {
                      langGroup.Key,
                      langGroup
                  };
    // End grouping

    var byLanguage = grouped.ToDictionary(
        x => x.Key,
        x => x.langGroup.ToDictionary(y => y.PropertyName, y => y.Text));

    result.entity.Apply(x => x.Texts = byLanguage);
}

return q.Select(x => x.entity);
Sideinfo:
What basically happens is that "texts" for every language and for every property of a certain object type (in this case hardcoded 1) are selected and grouped by language. A dictionary of dictionaries is created for every language and then for every property.
Entities have a property called Texts (the dictionary of dictionaries). Apply is a custom extension method which looks like this:
public static T Apply<T>(this T subject, Action<T> action)
{
    action(subject);
    return subject;
}
Isn't this far simpler?
foreach (var entity in Context.Entities)
{
    // Create the result dictionary.
    entity.Texts = new Dictionary<Language, Dictionary<PropertyName, Text>>();

    // Loop through each text we want to classify.
    foreach (var text in Context.Texts.Where(t => t.ObjectType == 1
                                               && t.ObjectId == entity.EntityId))
    {
        var language = text.Language;
        var property = text.PropertyName;

        // Create the sub-level dictionary, if required.
        if (!entity.Texts.ContainsKey(language))
            entity.Texts[language] = new Dictionary<PropertyName, Text>();

        entity.Texts[language][property] = text;
    }
}
Sometimes good old foreach loops do the job much better.
Language, PropertyName and Text have no type in your code, so I named my types after the names...
