"Clone" index mappings - elasticsearch

I have an index which I will be reindexing. First, I want to create a new index that contains exactly the same mappings as the original index.
I've got this:
var srcMappings = client.GetMapping(new GetMappingRequest((Indices)sourceIndexName)).Mappings;
And I try to create an index:
var response = client.CreateIndex(destinationIndex, c => c
.Settings(...my settings ...)
.Mappings(... what here? ...)
);
What exactly should I pass to the .Mappings(...) above so that the mappings from the source index are replicated into the target index? I don't want to explicitly 'know' about the types.
I am trying to use Nest.
Alternatively, is there a Reindex API which would take the destination index name and create the index for me, together with the mappings of the source?

You can get the mappings from one index and use them to create the mappings in another index with
var client = new ElasticClient();
var getIndexResponse = client.GetIndex("assignments");
var createIndexResponse = client.CreateIndex("assignments2", c => c
.Mappings(m => Promise.Create(getIndexResponse.Indices["assignments"].Mappings))
);
You'll need an IPromise<T> implementation to do so
public class Promise
{
public static IPromise<TValue> Create<TValue>(TValue value) where TValue : class =>
new Promise<TValue>(value);
}
public class Promise<T> : IPromise<T> where T : class
{
public T Value { get; }
public Promise(T value) => Value = value;
}
The Promise is needed in some places in NEST's fluent API implementation where values are additive and a final value needs to be returned at a later point.
You can also do the same using the object initializer syntax and no Promise<T>
var createIndexResponse = client.CreateIndex(new CreateIndexRequest("assignments2")
{
Mappings = getIndexResponse.Indices["assignments"].Mappings
});
Alternatively, is there a Reindex API which would take the destination index name and create the index for me, together with the mappings of the source?
There are two Reindex APIs within NEST; an Observable implementation that has been around since NEST 1.x, and the Reindex API as available within Elasticsearch since 2.3 (known as ReindexOnServer in NEST). The former Observable implementation can create the destination index for you, although it will copy all settings, mappings and aliases. The latter Reindex API does not create the destination index as part of the operation, so it needs to be set up before starting the reindex process.
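For the latter, a minimal sketch of ReindexOnServer might look like the following, assuming the destination index "assignments2" has already been created with the settings and mappings copied from "assignments" as shown above:
// Server-side reindex; the destination index must already exist
var reindexResponse = client.ReindexOnServer(r => r
    .Source(s => s.Index("assignments"))
    .Destination(d => d.Index("assignments2"))
    .WaitForCompletion() // block until the reindex has finished
);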

Related

Nest ElasticClient with multiple indexes to index a document

At first I had one index, and my ElasticClient was set up like below in my Startup.cs
public static IServiceCollection AddElasticClient(this IServiceCollection services)
{
var elasticSettings = services.BuildServiceProvider().GetService<IOptions<ElasticSettings>>().Value;
var settings = new ConnectionSettings(new Uri(elasticSettings.Uri));
settings
.ThrowExceptions(elasticSettings.ThrowExceptions)
.PrettyJson(elasticSettings.PrettyJson)
.DefaultIndex(elasticSettings.Index)
.BasicAuthentication(elasticSettings.Username, elasticSettings.Password)
.DefaultMappingFor<CorrelationContext>(ms => ms.Ignore(p => p.DgpHeader));
var client = new ElasticClient(settings);
services.AddSingleton<IElasticClient>(client);
return services;
}
My writer looks like
public class ElasticWriter : IElasticWriter
{
private readonly IElasticClient _elasticClient;
public ElasticWriter(IElasticClient elasticClient)
{
_elasticClient = elasticClient ?? throw new ArgumentNullException(nameof(elasticClient));
}
public void Write(AuditElasticDoc doc)
{
var indexResponse = _elasticClient.IndexDocument(doc);
if (!indexResponse.IsValid)
{
throw indexResponse.OriginalException ?? new Exception("Invalid Elastic response when writing document.");
}
}
}
Now there is a new requirement: callers can provide the name of the index to write to.
All authentication data of the different indexes are provided through config settings, so I have everything available at startup.
The document type is always the same.
I found examples of specifying the index when querying but not when indexing.
Can I provide multiple indexes in my ElasticClient and specify the index when executing the IndexDocument?
Or do I need a separate client for each index?
If the latter, is there a way I can still use DI to inject the client in my writer or do I have to create one there at the spot?
Thx.
I'm using Nest 7.6.1
Instead of using IndexDocument, you can use the IndexAsync method, which allows you to control additional request parameters:
var indexResponse = await _elasticClient.IndexAsync(doc, descriptor => descriptor.Index("other"));
IndexDocument is a wrapper method that hides the complexity of indexing documents from clients.
Request auth configuration
var indexResponse = await _elasticClient.IndexAsync(doc,
descriptor => descriptor
.Index("other")
.RequestConfiguration(rq => rq.BasicAuthentication("user", "pass")));
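Applied to the writer above, a sketch might look like the following; the async signature and the indexName parameter are assumptions on top of the original IElasticWriter interface, with the index name resolved by the caller (e.g. from config settings):
public class ElasticWriter : IElasticWriter
{
    private readonly IElasticClient _elasticClient;
    public ElasticWriter(IElasticClient elasticClient)
    {
        _elasticClient = elasticClient ?? throw new ArgumentNullException(nameof(elasticClient));
    }
    // The target index is chosen per call, so one injected client serves all indexes
    public async Task WriteAsync(AuditElasticDoc doc, string indexName)
    {
        var indexResponse = await _elasticClient.IndexAsync(doc, i => i.Index(indexName));
        if (!indexResponse.IsValid)
        {
            throw indexResponse.OriginalException ?? new Exception($"Invalid Elastic response when writing to index '{indexName}'.");
        }
    }
}
A single registered client is enough here: the index is specified per request, so there is no need for a separate client per index, and the existing DI registration can stay as it is.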

Set custom type name in mapping

I need to create a document mapping that has a custom name. Currently I have the following mapping for my document on the CreateIndexDescriptor object:
.Mappings(m => m
.Map<MyDocType>(mDetails => mDetails.AutoMap()));
Which creates a document mapping called mydoctype. How can I modify this so it creates a document whose type name is my_doctype?
In NEST 7.x, this is not possible - the document type will be _doc, in line with the roadmap for the removal of mapping types.
In NEST 6.x, you can specify the type name to use in a few different ways:
Using ElasticsearchTypeAttribute on the POCO
[ElasticsearchType(Name = "my_doctype")]
public class MyDocType{ }
Using DataContractAttribute on the POCO
[DataContract(Name = "my_doctype")]
public class MyDocType{ }
Using .DefaultMappingFor<T>() on ConnectionSettings
var settings = new ConnectionSettings()
.DefaultMappingFor<MyDocType>(m => m
.IndexName("my_doc_type_default_index")
.TypeName("my_doctype")
);
var client = new ElasticClient(settings);

Elastic NEST De-serializing the wrong field

Using ElasticSearch.Net v6.0.2
Given the Indexed item
{
"PurchaseFrequency": 76,
"purchaseFrequency": 80
}
and the POCO Object
public class Product
{
public int PurchaseFrequency { get; set; }
}
and the setting
this.DefaultFieldNameInferrer(x => x);
Nest is returning a PurchaseFrequency = 80 even though this is the wrong field.
How can I get NEST to pull the correct cased field from ElasticSearch?
I don't think that this is going to be easily possible because this behaviour is defined in Json.NET, which NEST uses internally (not a direct dependency in 6.x, it's IL-merged into the assembly).
For example,
JsonConvert.DeserializeAnonymousType("{\"a\":1, \"A\":2}", new { a = 0 })
deserializes the anonymous type property a value to 2. But
JsonConvert.DeserializeAnonymousType("{\"A\":2, \"a\":1}", new { a = 0 })
deserializes the anonymous type property a value to 1 i.e. the order of properties as they appear in the returned JSON has a bearing on the final value assigned to a property on an instance of a type.
If you can, avoid JSON property names that differ only in case. If you can't, then you'd need to hook up the JsonNetSerializer in the NEST.JsonSerializer nuget package and write a custom JsonConverter for your type which only honours the exact casing expected.
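A rough sketch of such a converter, assuming JsonNetSerializer from the NEST.JsonSerializer package is hooked up on ConnectionSettings; the converter below is illustrative, not a built-in NEST feature:
using System;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public class ExactCaseProductConverter : JsonConverter<Product>
{
    public override Product ReadJson(JsonReader reader, Type objectType,
        Product existingValue, bool hasExistingValue, JsonSerializer serializer)
    {
        var json = JObject.Load(reader);
        var product = new Product();
        // Ordinal comparison honours "PurchaseFrequency" only, never "purchaseFrequency"
        if (json.TryGetValue(nameof(Product.PurchaseFrequency), StringComparison.Ordinal, out var token))
            product.PurchaseFrequency = token.Value<int>();
        return product;
    }

    public override bool CanWrite => false;

    public override void WriteJson(JsonWriter writer, Product value, JsonSerializer serializer) =>
        throw new NotSupportedException();
}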

Renaming and Deleting Elasticsearch Indexes

I'm using C# .NET application with NEST to create an index.
I've created an elasticsearch index that customers can query called index_1. I then build another version of the index using a different instance of the application and call it index_1_temp.
What is the safest way for me to rename index_1_temp to index_1 then delete the original index_1?
I know ES has aliases, but I'm not sure how to use them for this task.
EDIT: The original index does not have an Alias associated with it.
I would recommend always using aliases in scenarios where you may create incrementally differing versions of an index, as may be the case when refining the model of signals within your search strategy.
You can add an alias at the point of creating an index
var client = new ElasticClient(connectionSettings);
var indices = new[] { "index-v1", "index-v2" };
var alias = "index-alias";
// delete index-v1 and index-v2 if they exist, to
// allow this example to be repeatable
foreach (var index in indices)
{
if (client.IndexExists(index).Exists)
{
client.DeleteIndex(index);
}
}
var createIndexResponse = client.CreateIndex(indices[0], c => c
.Aliases(a => a
.Alias(alias)
)
);
Then, when you create a new index, you can remove the alias from current indices and add it to the new index. This alias swap operation is atomic.
createIndexResponse = client.CreateIndex(indices[1]);
// wait for index-v2 to be operable
var clusterHealthResponse = client.ClusterHealth(c => c
.WaitForStatus(WaitForStatus.Yellow)
.Index(indices[1]));
// swap the alias
var bulkAliasResponse = client.Alias(ba => ba
.Add(add => add.Alias(alias).Index(indices[1]))
.Remove(remove => remove.Alias(alias).Index("*"))
);
// verify that the alias only exists on index-v2
var aliasResponse = client.GetAlias(a => a.Name(alias));
The output of the last response is
{
"index-v2" : {
"aliases" : {
"index-alias" : { }
}
}
}
When searching, the consumers would always use the alias. Since the alias points to a single index only, you can also use it to index new documents and update existing documents.
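For example, a consumer would search through the alias rather than a concrete index name (MyDocument here is a stand-in for your own document POCO):
// Swapping the underlying index is invisible to this search
var searchResponse = client.Search<MyDocument>(s => s
    .Index(alias)
    .Query(q => q.MatchAll())
);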

Elasticsearch and Spark: Updating existing entities

What is the correct way, when using Elasticsearch with Spark, to update existing entities?
I wanted to do something like the following:
Get existing data as a map.
Create a new map, and populate it with the updated fields.
Persist the new map.
However, there are several issues:
The list of returned fields cannot contain the _id, as it is not part of the source.
If, for testing, I hardcode an existing _id in the map of new values, the following exception is thrown:
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest
How should the _id be retrieved, and how should it be passed back to Spark?
I include the following code below to better illustrate what I was trying to do:
JavaRDD<Map<String, Object>> esRDD = JavaEsSpark.esRDD(jsc, INDEX_NAME+"/"+TYPE_NAME,
"?source=,field1,field2).values();
Iterator<Map<String, Object>> iter = esRDD.toLocalIterator();
List<Map<String, Object>> listToPersist = new ArrayList<Map<String, Object>>();
while(iter.hasNext()){
Map<String, Object> map = iter.next();
// Get existing values, and do transformation logic
Map<String, Object> newMap = new HashMap<String, Object>();
newMap.put("_id", ??????);
newMap.put("field1", new_value);
listToPersist.add(newMap);
}
JavaRDD javaRDD = jsc.parallelize(ImmutableList.copyOf(listToPersist));
JavaEsSpark.saveToEs(javaRDD, INDEX_NAME+"/"+TYPE_NAME);
Ideally, I would want to update the existing map in place, rather than create a new one.
Does anyone have any example code to show, when using Spark, the correct way to update existing entities in elasticsearch?
Thanks
This is how I've done it (Scala/Spark 2.3/Elastic-Hadoop v6.5).
To read (id or other metadata):
spark
.read
.format("org.elasticsearch.spark.sql")
.option("es.read.metadata",true) // allow to read metadata
.load("yourindex/yourtype")
.select(col("_metadata._id").as("myId"),...)
To update particular columns in ES:
myDataFrame
.select("myId","columnToUpdate")
.saveToEs(
"yourindex/yourtype",
Map(
"es.mapping.id" -> "myId",
"es.write.operation" -> "update", // important to change operation to partial update
"es.mapping.exclude" -> "myId"
)
)
Try adding this upsert to your Spark:
.config("es.write.operation", "upsert")
that will let you add new fields to existing documents
According to the Elasticsearch for Hadoop configuration documentation, you can get document metadata like _id by setting the read metadata option to true:
.config("es.read.metadata", "true")
And I think you cannot use '_id' as a field name.
But you can create a new field with a different name, like:
newMap.put("idfield", yourId);
Then set the name of the new field as the value of the mapping id option, to inform Elasticsearch that this field holds the document id:
.config("es.mapping.id", "idfield")
BTW, don't forget to set the write operation to update:
.config("es.write.operation", "update")
