TermsAggregation on Array - elasticsearch

I have a problem with an index field that stores a list of strings, for example
["abc_def_fgh", "abc_def_fgh", "123_345_456"]
I'm trying to use a TermsAggregation to get
"abc_def_fgh" (2)
"123_345_456" (1)
But I can't get it working: it returns a count for each individual token (abc (2), def (2), etc.).
Any idea?
Many thanks

Try something like
var result = await client.SearchAsync<Document>(s => s
    .Size(0)
    .Aggregations(a => a
        .Terms("tags", t => t.Field(f => f.Tags.Suffix("keyword")))));
foreach (var bucket in result.Aggregations.Terms("tags").Buckets)
{
    System.Console.WriteLine($"Tag {bucket.Key}, doc count: {bucket.DocCount}");
}
public class Document
{
    public string Id { get; set; }
    public string[] Tags { get; set; } = { };
}
for these three documents in my index
new Document { Id = "1", Tags = new[] { "a", "b" } }
new Document { Id = "2", Tags = new[] { "a" } }
new Document { Id = "3", Tags = new[] { "c" } }
it will output
Tag a, doc count: 2
Tag b, doc count: 1
Tag c, doc count: 1
Hope that helps.

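This sounds like a mapping issue. Make sure the field is mapped as keyword as opposed to text. A text field is analyzed into separate tokens, and the aggregation then buckets those tokens instead of the whole strings, so you do not want the field to be analyzed when using aggregations.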

Related

Insert a Dynamic into Elastic Document through nest

var doc = new ESDoc {
Field1 = "test1",
Field2 = 3,
ExtraData = 'dynamic object',
Index = "myindex"
};
ElasticClient.Index(doc, s => s.Index(doc.Index));
This is in essence what I am trying to do. I have a document object, and I want to add to it a dynamic object that allows us to throw whatever customer-specific data we want in there. I never need to search or query it; I just need to hold it for CYA purposes.
This results in ExtraData having a value_kind = 1.
I tried to JsonSerializer.Serialize the data and it came out in a triple escaped string.
I have seen people trying to create an entire document of dynamic data using an object cast, but I feel that isn't the answer here, because I have a typed document that I want to add a dynamic object to.
NEST and Elasticsearch.Net 7.16.0
You can do this with the following:
You need to have a type to model the dynamic behaviour of the field. Now, you could specify it as dynamic on the ESDoc class, but you'll need an instance of a type to assign to it and Nest deserializes JSON to a Dictionary<string, object> for a dynamic member. Because of this, you might want to use the DynamicDictionary type in Elasticsearch.Net (a dependency of Nest) that is a dictionary with dynamic access behaviour
public class ESDoc
{
public string Field1 { get; set; }
public int Field2 { get; set; }
public DynamicDictionary ExtraData { get; set; }
}
Map ExtraData as object data type with enabled: false, so that it is not parsed and indexed
var createIndexResponse = client.Indices.Create("myindex", c => c
    .Map<ESDoc>(m => m
        .Properties(p => p
            .Text(t => t
                .Name(f => f.Field1)
            )
            .Number(t => t
                .Name(f => f.Field2)
                .Type(NumberType.Integer)
            )
            .Object<object>(o => o
                .Name(f => f.ExtraData)
                .Enabled(false)
            )
        )
    )
);
Now index a doc, refresh the index, and search for it to ensure it deserializes as expected
dynamic extraData = new DynamicDictionary();
extraData.foo = "Foo";
extraData.bar = new DynamicDictionary
{
    ["baz"] = new DynamicValue("Baz")
};
var doc = new ESDoc
{
    Field1 = "test1",
    Field2 = 3,
    ExtraData = extraData
};
client.Index(doc, s => s.Index("myindex"));
client.Indices.Refresh("myindex");
var searchResponse = client.Search<ESDoc>(s => s.Index("myindex"));
var firstDoc = searchResponse.Documents.First();
var baz = firstDoc.ExtraData["bar"]["baz"];
Console.WriteLine($"{baz}");

LINQ GroupBy on single property

I am just not understanding the LINQ non-query syntax for GroupBy.
I have a collection of objects that I want to group by a single property. In this case Name
{ Id="1", Name="Bob", Age="23" }
{ Id="2", Name="Sally", Age="41" }
{ Id="3", Name="Bob", Age="73" }
{ Id="4", Name="Bob", Age="34" }
I would like to end up with a collection of all the unique names
{ Name="Bob" }
{ Name="Sally" }
Based on some examples I looked at I thought this would be the way to do it
var uniqueNameCollection = Persons.GroupBy(x => x.Name).Select(y => y.Key).ToList();
But I ended up with a collection with one item. So I thought maybe I was overcomplicating things with the projection. I tried this
var uniqueNameCollection = Persons.GroupBy(x => x.Name).ToList();
Same result. I ended up with a single item in the collection. What am I doing wrong here? I am just looking to GroupBy the Name property.
If you just want the names:
var names = Persons.Select(p => p.Name).Distinct().ToList();
LINQ's GroupBy doesn't work the same way that SQL's GROUP BY does.
GroupBy takes a sequence and a function that selects the field to group by, and returns a sequence of IGroupings. Each IGrouping has a Key (the field value that was grouped by) and is itself the sequence of elements in that group.
IEnumerable<IGrouping<TKey, TSource>> GroupBy<TSource, TKey>(
    IEnumerable<TSource> sequence,
    Func<TSource, TKey> keySelector)
{ ... }
So if you start with a list like this:
class Person
{
    public string Name;
}
var people = new List<Person> {
    new Person { Name = "Adam" },
    new Person { Name = "Eve" }
};
Grouping by name will look like this
IEnumerable<IGrouping<string, Person>> groups = people.GroupBy(person => person.Name);
You could then select the key from each group like this:
IEnumerable<string> names = groups.Select(group => group.Key);
names will be distinct because if there were multiple people with the same name, they would have been in the same group and there would only be one group with that name.
For what you need, it would probably be more efficient to just select the names and then use Distinct
var names = people.Select(p => p.Name).Distinct();
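To make the shape of the result concrete, here is a small self-contained sketch that runs both approaches against the sample data from the question. The names GroupByDemo, SamplePeople, CountByName and UniqueNames are made up for this illustration; only GroupBy, Select, Distinct and ToDictionary come from LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Age { get; set; }
}

public static class GroupByDemo
{
    // Sample data copied from the question.
    public static List<Person> SamplePeople() => new List<Person>
    {
        new Person { Id = "1", Name = "Bob", Age = "23" },
        new Person { Id = "2", Name = "Sally", Age = "41" },
        new Person { Id = "3", Name = "Bob", Age = "73" },
        new Person { Id = "4", Name = "Bob", Age = "34" }
    };

    // One entry per distinct name; the value is the size of that group.
    public static Dictionary<string, int> CountByName(IEnumerable<Person> people) =>
        people.GroupBy(p => p.Name)
              .ToDictionary(g => g.Key, g => g.Count());

    // The cheaper route when only the names are needed.
    public static List<string> UniqueNames(IEnumerable<Person> people) =>
        people.Select(p => p.Name).Distinct().ToList();

    public static void Main()
    {
        var people = SamplePeople();
        foreach (var pair in CountByName(people))
            Console.WriteLine($"{pair.Key}: {pair.Value}");
        Console.WriteLine(string.Join(", ", UniqueNames(people)));
    }
}
```

With this data both approaches yield two names. If you see a single item instead, it is worth checking what the source collection actually contains before blaming GroupBy.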
var uniqueNameCollection = Persons.GroupBy(x => x.Name).Select(y => y.Key).ToList();
That appears valid to me. Here is a .NET Fiddle showing the expected outcome: https://dotnetfiddle.net/2hqOvt
Using your data I ran the following code statement
var uniqueNameCollection = people.GroupBy(x => x.Name).Select(y => y.Key).ToList();
The result was a List<string> with 2 items:
Bob
Sally
Run the following statement and your count should be 2.
people.GroupBy(x => x.Name).Select(y => y.Key).ToList().Count();
This works for me: install the MoreLinq NuGet package.
using MoreLinq;
var distinctitems = list.DistinctBy(u => u.Name);

NEST mapping of Dictionary<string,object>

I'm trying to use NEST and can't figure out how to use it together with this class
public class Metric {
public DateTime Timestamp { get; set; }
public Dictionary<string,object> Measurement { get; set; }
}
How do I use the new fluent mapping with a class like this?
I'm planning to use it like this:
var measurements = new Dictionary<string, object>();
measurements["visits"] = 1;
measurements["url"] = new string[] { "/help", "/about" };
connection.Index(new Metric() {
    Timestamp = DateTime.UtcNow,
    Measurement = measurements
});
Will it be possible to write a query against the dictionary? If I wanted to get all Metrics from yesterday that have a measurement with the key name "visits", what would that look like?
You don't have to use mapping, you can rely on elasticsearch's schemaless nature really well in this case.
The json serializer will write that out as:
{
"timestamp" : "[datestring]",
"measurement" : {
"visits" : 1,
"url" : [ "/help", "/about"]
}
}
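If you want to check that shape locally without a cluster, here is a rough sketch that serializes the same Metric with System.Text.Json. That choice is an assumption for illustration only: NEST uses its own serializer, and the camelCase naming policy below merely imitates the field names shown above. MetricJsonDemo, Sample and ToJson are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public class Metric
{
    public DateTime Timestamp { get; set; }
    public Dictionary<string, object> Measurement { get; set; }
}

public static class MetricJsonDemo
{
    // Builds the Metric from the question with a fixed timestamp.
    public static Metric Sample() => new Metric
    {
        Timestamp = new DateTime(2024, 1, 1, 0, 0, 0, DateTimeKind.Utc),
        Measurement = new Dictionary<string, object>
        {
            ["visits"] = 1,
            ["url"] = new[] { "/help", "/about" }
        }
    };

    // Assumption: camelCase property names stand in for NEST's default field naming.
    public static string ToJson(Metric metric) =>
        JsonSerializer.Serialize(metric, new JsonSerializerOptions
        {
            PropertyNamingPolicy = JsonNamingPolicy.CamelCase
        });

    public static void Main()
    {
        // The dictionary becomes a nested JSON object; each value is
        // written according to its runtime type (number, array, ...).
        Console.WriteLine(ToJson(Sample()));
    }
}
```

The point of the sketch is only that a Dictionary<string, object> turns into a plain nested object, which is why its keys can be addressed with dotted paths such as measurement.visits.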
You can query for the existence of the "measurement.visits" field like so using NEST.
var result = client.Search<Metric>(s => s
    .From(0)
    .Size(10)
    .Filter(filter => filter
        .Exists("measurement.visits")
    )
);
result.Documents now hold the first 10 metrics with a visits key in the Measurement dictionary.
If you do want to explicitly map possible keys in that dictionary using the new fluent mapping:
var result = client.MapFluent<Metric>(m => m
    .Properties(props => props
        .Object<Dictionary<string,object>>(s => s
            .Name(p => p.Measurement)
            .Properties(pprops => pprops
                .Number(ps => ps
                    .Name("visits")
                    .Type(NumberType.@integer)
                )
                .String(ps => ps
                    .Name("url")
                    .Index(FieldIndexOption.not_analyzed)
                )
            )
        )
    )
);
Remember that this mapping does not turn off dynamic mapping, so you can still insert other keys into your dictionary without upsetting Elasticsearch. Only now Elasticsearch will know that visits is an actual integer and that we don't want to analyze the url values.
Since we are not using any typed accessors here (the .Name() call is typed to Metric), .Object<Dictionary<string,object>> could be .Object<object> too.

Getting all tags from my different records

I use EntityFramework on my ASP.NET MVC project.
Let's say I have the entity below:
public class Project
{
public int ProjectID { get; set; }
public string Description { get; set; }
public string Tags { get; set; }
}
Let's say I have the following data in my DB:
ProjectID: 1
Description: "My first element"
Tags: "one, three, five, seven"
ProjectID: 2
Description: "My second element"
Tags: "one, two, three, six"
ProjectID: 3
Description: "My third element"
Tags: "two, three, four"
I would like to collect all tags from all my records. So: "one, two, three, four, five, six, seven"
How can I do this? This may seem a stupid question, but I don't know how to proceed.
Thanks.
You need to use string.Split() to dig out each tag in your list.
HashSet<string> allTags = new HashSet<string>();
foreach (Project project in context.Projects)
{
    string tagsList = project.Tags;
    // the string[] overload of Split also works on .NET Framework
    string[] separateTags = tagsList.Split(new[] { ", " }, StringSplitOptions.RemoveEmptyEntries);
    foreach (string separateTag in separateTags)
    {
        allTags.Add(separateTag);
    }
}
Then allTags will contain all your tags. If you want to put them in one big string again, use string.Join.
After splitting the strings you can use SelectMany() to concatenate the collections, and then use Distinct() to remove duplicates.
var tags = context.Projects
    .SelectMany(p => p.Tags.Split(new[] { ", " }, StringSplitOptions.RemoveEmptyEntries))
    .Distinct();
Here is the query syntax version, which gets translated to a SelectMany() call behind the scenes:
var tags = (from p in context.Projects
            from tag in p.Tags.Split(new[] { ", " }, StringSplitOptions.RemoveEmptyEntries)
            select tag).Distinct();
Unfortunately, Split() won't translate to SQL, so you have to do that in memory. I'd recommend the following:
var tags = context.Projects
    // pull only the tags string into memory
    .Select(p => p.Tags)
    // operate in memory from this point on
    .AsEnumerable()
    // remove empty entries so you don't get "" interpreted as one empty tag
    .SelectMany(tagList => tagList.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
    // assuming you don't want leading or trailing whitespace
    .Select(tag => tag.Trim())
    // keep each tag only once
    .Distinct()
    .ToList();
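The in-memory part of that pipeline can be tried without a database. The sketch below feeds it the three Tags strings from the question; TagCollector and CollectTags are hypothetical names, and the plain string array stands in for the values that would come back from context.Projects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TagCollector
{
    // Splits comma-separated tag strings, trims each tag,
    // and returns every tag once.
    public static List<string> CollectTags(IEnumerable<string> tagStrings) =>
        tagStrings
            // skip rows where the Tags column is null or blank
            .Where(s => !string.IsNullOrWhiteSpace(s))
            .SelectMany(s => s.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
            .Select(tag => tag.Trim())
            // drop entries that were only whitespace before trimming
            .Where(tag => tag.Length > 0)
            .Distinct()
            .ToList();

    public static void Main()
    {
        // Stand-in for the Tags column of the three Project rows.
        var rows = new[]
        {
            "one, three, five, seven",
            "one, two, three, six",
            "two, three, four"
        };
        Console.WriteLine(string.Join(", ", CollectTags(rows)));
    }
}
```

For the sample rows this yields the seven distinct tags. Sort the list afterwards if you need them in a particular order.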

Using a list in a query in entity framework

I am trying to find a way to pass in an optional string list to a query. What I am trying to do is filter a list of tags by the relationship between them. For example if c# was selected my program would suggest only tags that appear in documents with a c# tag and then on the selection of the next, say SQL, the tags that are linked to docs for those two tags together would be shown, whittling it down so that the user can get closer and closer to his goal.
At the moment all I have is:
List<Tag> _tags = (from t in Tags
where t.allocateTagDoc.Count > 0
select t).ToList();
This is in a method that would be called repeatedly with the optional args as tags were selected.
I think I have been coming at it arse-backwards. If I make two(or more) queries one for each supplied tag, find the docs where they all appear together and then bring out all the tags that go with them... Or would that be too many hits on the db? Can I do it entirely through an entity context variable and just query the model?
Thanks again for any help!
You can try this.
First, collect the tags to search for in a list of strings.
List<string> tagStrings = new List<string> { "c#", "sql" };
Pass this list to your query and check whether it is empty. If it is empty, the query returns all the tags; otherwise it returns only the tags that match tagStrings.
var _tags = (from t in Tags
             where t.allocateTagDoc.Count > 0
                   && (tagStrings.Count == 0 || tagStrings.Contains(t.tagName))
             select t).ToList();
You can also try this. The Dictionary maps the ID of each document to its tags:
Dictionary<int, string[]> documents =
new Dictionary<int, string[]>();
documents.Add(1, new string[] { "C#", "SQL", "EF" });
documents.Add(2, new string[] { "C#", "Interop" });
documents.Add(3, new string[] { "Javascript", "ASP.NET" });
documents.Add(4, new string[] { });
// returns tags belonging to documents with IDs 1, 2
string[] filterTags = new string[] { "C#" };
var relatedTags = GetRelatedTags(documents, filterTags);
Debug.WriteLine(string.Join(",", relatedTags));
// returns tags belonging to document with ID 1
filterTags = new string[] { "C#", "SQL" };
relatedTags = GetRelatedTags(documents, filterTags);
Debug.WriteLine(string.Join(",", relatedTags));
// returns tags belonging to all documents
// since no filtering tags are specified
filterTags = new string[] { };
relatedTags = GetRelatedTags(documents, filterTags);
Debug.WriteLine(string.Join(",", relatedTags));
public static string[] GetRelatedTags(
    Dictionary<int, string[]> documents,
    string[] filterTags)
{
    var documentsWithFilterTags = documents.Where(o =>
        filterTags
            .Intersect(o.Value).Count() == filterTags.Length);
    string[] relatedTags = new string[0];
    foreach (string[] tags in documentsWithFilterTags.Select(o => o.Value))
        relatedTags = relatedTags
            .Concat(tags)
            .Distinct()
            .ToArray();
    return relatedTags;
}
Thought I would pop back and share my solution which was completely different to what I first had in mind.
First I altered the database a little getting rid of a useless field in the allocateDocumentTag table which enabled me to use the entity framework model much more efficiently by allowing me to leave that table out and access it purely through the relationship between Tag and Document.
When I fill my form the first time, I just display all the tags that have a relationship with a document. After that, when I use my search filter and one or more Tags are selected in a checkedListBox, the IDs of the Documents associated with those Tags are returned and fed back in to fill the used-tag listbox.
public static List<Tag> fillUsed(List<int> docIds = null)
{
    List<Tag> used = new List<Tag>();
    if (docIds == null || docIds.Count() < 1)
    {
        used = (from t in frmFocus._context.Tags
                where t.Documents.Count >= 1
                select t).ToList();
    }
    else
    {
        used = (from t in frmFocus._context.Tags
                where t.Documents.Any(d => docIds.Contains(d.id))
                select t).ToList();
    }
    return used;
}
From there the tags feed into the doc search and vice versa. Hope this can help someone else, if the answer is unclear or you need more code then just leave a comment and I'll try and sort it.

Resources