NEST (ElasticSearch) matching Highlights to documents - elasticsearch

I'm using C# NEST with ElasticSearch. I'm able to query an index of Products and look in their Name and CategoryName fields for matches. I can also extend the query using Highlights.
Now in my IQueryResponse response I have two collections: (1) .Documents and (2) .Highlights.
e.g.: Consider the search for: "cat" which has 3 document results:
{
{ Name: "Cat product", CategoryName: "Category1" },
{ Name: "Some product", CategoryName: "Category2" },
{ Name: "Some product2", CategoryName: "Category3" }
}
But now I have 4 highlight results:
{
{ Field: "name", Highlights: ['"<u>Cat</u> product"'] },
{ Field: "categoryName", Highlights: ['"<u>Cat</u>egory1"'] },
{ Field: "categoryName", Highlights: ['"<u>Cat</u>egory2"'] },
{ Field: "categoryName", Highlights: ['"<u>Cat</u>egory3"'] }
}
They seem to be in no way related to each other. How do I know which Highlight item belongs to which Document item?

IQueryResponse also exposes .DocumentsWithMetaData, of type IEnumerable<IHit<T>>, where T is the type of your document.
This is basically the unwrapped view of the results as returned by Elasticsearch. IHit<T> has many useful properties, such as Highlights.
I've added a DocumentId property to the Highlight class, so that no matter how you get to a highlight you can easily relate it back to its hit.
So use .DocumentsWithMetaData for now; the next release will have a more logical API for highlights.

Here is an updated answer for version 7.x. You receive two collections as before, .Documents and .Hits.
Each entry in .Hits has an .Id that matches the document's _id in the Elasticsearch index. Note: if you request more than one fragment per field (.NumberOfFragments) in your query, the loop below keeps overwriting result.title and result.content with each fragment, so only the last one survives. Treat this as a loose example of how to match a highlight result to the correct document and then overwrite the document field with the highlighted version.
if (response.Documents.Count > 0)
{
    foreach (MyResultClass result in response.Documents) // cycle through your results
    {
        foreach (var hit in response.Hits) // cycle through your hits to look for a match
        {
            if (hit.Id == result.id) // you found the hit that matches your document
            {
                foreach (var highlightField in hit.Highlight)
                {
                    if (highlightField.Key == "title")
                    {
                        foreach (var highlight in highlightField.Value)
                        {
                            result.title = highlight.ToString();
                        }
                    }
                    else if (highlightField.Key == "content")
                    {
                        foreach (var highlight in highlightField.Value)
                        {
                            result.content = highlight.ToString();
                        }
                    }
                }
            }
        }
    }
}
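The same matching can also be sketched against the raw JSON search response, where every entry in hits.hits carries its own _id, _source and (when highlighting was requested) highlight object, so no cross-lookup is needed. This is only an illustration in JavaScript, not NEST code: mergeHighlights is a hypothetical helper, and it assumes you want the first fragment of each highlighted field to replace the stored value.

```javascript
// Hypothetical helper: overlay highlight fragments onto each hit's _source.
// Assumes the usual search response shape: response.hits.hits[] with _id,
// _source and an optional highlight object (field name -> array of fragments).
function mergeHighlights(response) {
  return response.hits.hits.map((hit) => {
    const doc = { id: hit._id, ...hit._source };
    for (const [field, fragments] of Object.entries(hit.highlight || {})) {
      // Keep only the first fragment so later fragments cannot overwrite it.
      if (fragments.length > 0) doc[field] = fragments[0];
    }
    return doc;
  });
}

// Hand-written sample data, shaped like the "cat" example in the question.
const sample = {
  hits: {
    hits: [
      {
        _id: "1",
        _source: { name: "Cat product", categoryName: "Category1" },
        highlight: {
          name: ["<u>Cat</u> product"],
          categoryName: ["<u>Cat</u>egory1"],
        },
      },
      {
        _id: "2",
        _source: { name: "Some product", categoryName: "Category2" },
        highlight: { categoryName: ["<u>Cat</u>egory2"] },
      },
    ],
  },
};

const merged = mergeHighlights(sample);
console.log(merged);
```

Fields without a highlight (such as name on the second hit) keep their stored value.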

Related

Dynamically create pages with Gatsby based on many Contentful references

I am currently using Gatsby's collection routes API to create pages for a simple blog with data coming from Contentful.
For example, creating a page for each blog post category:
-- src/pages/categories/{contentfulBlogPost.category}.js
export const query = graphql`
query categoriesQuery($category: String = "") {
allContentfulBlogPost(filter: { category: { eq: $category } }) {
edges {
node {
title
category
description {
description
}
...
}
}
}
}
...
[React component mapping all blogposts from each category in a list]
...
This is working fine.
But now I would like to have multiple categories per blog post, so I switched to Contentful's "References, many" field type, which allows multiple entries for a field.
Now the result of my GraphQL query on the category2 field is an array of categories for each blog post:
Query :
query categoriesQuery {
allContentfulBlogPost {
edges {
node {
category2 {
id
name
slug
}
}
}
}
}
Output :
{
"data": {
"allContentfulBlogPost": {
"edges": [
{
"node": {
"category2": [
{
"id": "75b89e48-a8c9-54fd-9742-cdf70c416b0e",
"name": "Test",
"slug": "test"
},
{
"id": "568r9e48-t1i8-sx4t8-9742-cdf70c4ed789vtu",
"name": "Test2",
"slug": "test-2"
}
]
}
},
{
"node": {
"category2": [
{
"id": "75b89e48-a8c9-54fd-9742-cdf70c416b0e",
"name": "Test",
"slug": "test"
}
]
}
},
...
Now that the categories are inside an array, I don't know how to:
write a query variable to filter on category names;
use the slug field as a route to dynamically create the pages.
For blogposts authors I was doing :
query authorsQuery($author__slug: String = "") {
allContentfulBlogPost(filter: { author: { slug: { eq: $author__slug } } }) {
edges {
node {
id
author {
slug
name
}
...
}
...
}
And creating pages with src/pages/authors/{contentfulBlogPost.author__slug}.js
I guess I'll have to use the createPages API instead.
You can achieve this with the File System Route API; something like this may work:
src/pages/category/{contentfulBlogPost.category2__name}.js
However, this approach has a caveat: because posts can contain multiple and repeated categories, you may end up creating duplicate pages with the same URL (slug).
I think it's cleaner to use the createPages API, as you said, keeping in mind that you will need to deduplicate the categories, because posts and categories are in a one-to-many relationship.
const path = require("path")

exports.createPages = async ({ graphql, actions }) => {
  const { createPage } = actions
  const result = await graphql(`
    query {
      allContentfulBlogPost {
        edges {
          node {
            category2 {
              id
              name
              slug
            }
          }
        }
      }
    }
  `)
  let categories = { slugs: [], names: [] };
  const seen = new Set(); // slugs already collected, used to drop duplicates
  result.data.allContentfulBlogPost.edges.forEach(({ node }) => {
    // category2 is an array of categories on each post
    (node.category2 || []).forEach(({ name, slug }) => {
      // make some checks if needed here
      if (seen.has(slug)) return;
      seen.add(slug);
      categories.slugs.push(slug);
      categories.names.push(name);
    });
  });
  categories.slugs.forEach((category, index) => {
    let name = categories.names[index];
    createPage({
      path: `category/${category}`,
      component: path.resolve(`./src/templates/your-category-template.js`),
      context: {
        name
      }
    });
  });
}
The code's quite self-explanatory. Basically you are defining an empty object (categories) that contains two arrays, slugs and names:
let categories= { slugs: [], names: [] };
After that, you only need to loop through the result of the query (result) and push the field values (name, slug, and others if needed) to those arrays, making any checks you need along the way (for example, to skip empty values or values that match some regular expression), and use a Set to drop the duplicates.
Then, you only need to loop through the slugs to create pages using createPage API and pass the needed data via context:
context: {
name
}
Thanks to shorthand property syntax, this is the same as:
context: {
name: name
}
So, in your template, you will get the name in the pageContext prop. Replace it with the slug if needed, depending on your use case; the approach is exactly the same.
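The deduplication step can be isolated and tried outside Gatsby. The sketch below is a minimal illustration, assuming the edges have the shape shown in the question's query output; collectCategories is a hypothetical helper name, not part of the Gatsby API.

```javascript
// Hypothetical helper: flatten the category2 arrays from the query result
// and keep a single { name, slug } entry per slug.
function collectCategories(edges) {
  const bySlug = new Map();
  for (const { node } of edges) {
    // category2 may be null when a post has no categories assigned.
    for (const { name, slug } of node.category2 || []) {
      if (!bySlug.has(slug)) bySlug.set(slug, { name, slug });
    }
  }
  return [...bySlug.values()];
}

// Sample data modeled on the output in the question.
const edges = [
  {
    node: {
      category2: [
        { name: "Test", slug: "test" },
        { name: "Test2", slug: "test-2" },
      ],
    },
  },
  { node: { category2: [{ name: "Test", slug: "test" }] } },
];

const uniqueCategories = collectCategories(edges);
console.log(uniqueCategories.map((category) => category.slug)); // [ 'test', 'test-2' ]
```

Each entry can then be handed to createPage. In the template's page query, a filter along the lines of category2: { elemMatch: { slug: { eq: $slug } } } should select the posts for that category, but check it against your own schema in GraphiQL first.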

Gatsby.js with Ghost CMS: How to query a list of post including same tags of current post

I want to display a list of posts that share one or more tags with the current post.
I cannot find a way to query the right information.
I am able to run this query in GraphiQL, but I cannot reproduce it in my code because I want to replace the $slug variable.
allGhostPost(filter: {tags: {elemMatch: {name: {eq: $slug }}}}) {
nodes {
title
tags {
name
}
}
}
}
As I have access to my current post's tags inside the post.js file, I would like to be able to replace $slug with a variable in my component, like post.tags.map(tag =>tag, but this doesn't seem to be possible.
Do you know a way?
You might benefit from checking out our GraphQL recipes in our docs, they might be good reference points for you: https://ghost.org/docs/api/v3/gatsby/graphql-recipes-for-ghost/
It is possible to add a variable to the query when looking for tags.
As tags is an array, if you want to filter articles by excluding or including tags, you can pass an array to your variable. But you need to declare the variable with the list type [String] rather than String.
Here is an example query:
likePosts: allGhostPost(
limit: 3
filter: {
id: { ne: $id }
published_at: { lt: $published_at }
tags: { elemMatch: { slug: { in: $tags } } }
}
sort: { fields: published_at, order: DESC }
) {
edges {
node {
title
date: published_at(formatString: "DD MMM YYYY")
}
}
}
When the variable has the list type [String], you have to use in or nin instead of eq or ne:
tags: { elemMatch: { slug: { in: $tags } } }
Here we are looking for all posts that have any of the same tags.
To test it out in GraphiQL, we could add the following in the query variables section:
{
"id": "Ghost__Post__5f8809143f185d9a1c72c199",
"published_at": "2020-10-15T08:32:21.000+00:00",
"tags":["media-buying", "case-study"],
"slug": "affiliate-media-buy-case-study-how-to-deal-with-very-large-scale-networks-2"
}
Here we pass an array to the tags variable.
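Outside GraphiQL, those variables have to come from somewhere, typically the context passed to createPage in gatsby-node.js. A minimal sketch of building that context from a post node follows; buildRelatedPostsContext is a hypothetical helper, and it assumes the node exposes id, published_at and a tags array whose items have a slug.

```javascript
// Hypothetical helper: turn a post node into the variables used by the
// likePosts query ($id, $published_at and $tags as a list of strings).
function buildRelatedPostsContext(post) {
  return {
    id: post.id,
    published_at: post.published_at,
    // Map the tag objects to plain slug strings for the [String] variable.
    tags: (post.tags || []).map((tag) => tag.slug),
  };
}

// Sample node using the values from the query variables above.
const post = {
  id: "Ghost__Post__5f8809143f185d9a1c72c199",
  published_at: "2020-10-15T08:32:21.000+00:00",
  tags: [{ slug: "media-buying" }, { slug: "case-study" }],
};

const context = buildRelatedPostsContext(post);
console.log(context.tags); // [ 'media-buying', 'case-study' ]
```

Passing this object as the context of createPage makes each key available as a variable in that page's query.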

How to adapt query to API?

I'm trying to wrap my head around GraphQL.
Right now I'm just playing with the public API of Artsy (an art website, playground at https://metaphysics-production.artsy.net). What I want to achieve is the following:
I want to get all node type entities without declaring them by hand (is there a shortcut for this?).
I want every node to have a type field from which I can read the type, without parsing through imageUrl etc. to find that out.
What I constructed as of right now is this:
{
search(query: "Berlin", first: 100, page: 1, entities: [ARTIST, ARTWORK, ARTICLE]) {
edges {
node {
displayLabel
imageUrl
href
}
}
}}
Very primitive I guess. Can you guys help me?
TL;DR:
1) There is no shortcut, it's not something GraphQL offers out of the box. Nor is it something I was able to find via their Schema.
2) Their returned node of type Searchable does not contain a property for type that you're looking for. But you can access it via the ... on SearchableItem (union) syntax.
Explanation:
For question 1):
Looking at their schema, you can see that their search query has the following type details:
search(
query: String!
entities: [SearchEntity]
mode: SearchMode
aggregations: [SearchAggregation]
page: Int
after: String
first: Int
before: String
last: Int
): SearchableConnection
The query accepts an entities property of type SearchEntity which looks like this:
enum SearchEntity {
ARTIST
ARTWORK
ARTICLE
CITY
COLLECTION
FAIR
FEATURE
GALLERY
GENE
INSTITUTION
PROFILE
SALE
SHOW
TAG
}
Depending on your use case, if you're constructing this query via some code, you can first find out which SearchEntity values they have:
{
__type(name: "SearchEntity") {
name
enumValues {
name
}
}
}
Which returns:
{
"data": {
"__type": {
"name": "SearchEntity",
"enumValues": [
{
"name": "ARTIST"
},
{
"name": "ARTWORK"
},
...
}
}
}
then store them in an array and pass that array back to the original query as a variable. Note that the quotation marks are only omitted when an enum value is written inline in the query; in the JSON query variables, enum values are written as strings.
Something along the lines of this:
query search($entities: [SearchEntity]) {
  search(query: "Berlin", first: 100, page: 1, entities: $entities) {
    edges {
      node {
        displayLabel
        imageUrl
        href
      }
    }
  }
}
and in your query variables section, you just need to add:
{
  "entities": ["ARTIST", "ARTWORK", ...]
}
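If the query is built from code, the step from the introspection result to the variables object is a one-liner. A minimal sketch, assuming the response has the shape shown above; buildEntityVariables is a hypothetical helper name. In a JSON variables payload the enum values travel as plain strings.

```javascript
// Hypothetical helper: turn the __type introspection result into the
// variables object for the search query.
function buildEntityVariables(introspection) {
  const names = introspection.data.__type.enumValues.map((value) => value.name);
  return { entities: names };
}

// Shortened sample of the introspection response shown above.
const introspection = {
  data: {
    __type: {
      name: "SearchEntity",
      enumValues: [{ name: "ARTIST" }, { name: "ARTWORK" }],
    },
  },
};

const variables = buildEntityVariables(introspection);
console.log(JSON.stringify(variables)); // {"entities":["ARTIST","ARTWORK"]}
```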
As for question 2)
The query itself returns a SearchableConnection object.
type SearchableConnection {
pageInfo: PageInfo!
edges: [SearchableEdge]
pageCursors: PageCursors
totalCount: Int
aggregations: [SearchAggregationResults]
}
Digging deeper, we can see that they have edges, of type SearchableEdge - which is what you're querying.
type SearchableEdge {
node: Searchable
cursor: String!
}
and finally, node of type Searchable which contains the data you're trying to access.
Now, the type Searchable doesn't contain type:
type Searchable {
displayLabel: String
imageUrl: String
href: String
}
However, if you look at the types that implement Searchable, you will find SearchableItem, which has a displayType field that Searchable itself lacks.
You can reach that field with an inline fragment on SearchableItem, like so:
{
search(query: "Berlin", first: 100, page: 1, entities: [ARTIST, ARTWORK, ARTICLE]) {
edges {
node {
displayLabel
imageUrl
href
... on SearchableItem {
displayType
}
}
}
}
}
and your result will look like this:
{
"data": {
"search": {
"edges": [
{
"node": {
"displayLabel": "Boris Berlin",
"imageUrl": "https://d32dm0rphc51dk.cloudfront.net/CRxSPNyhHKDIonwLKIVmIA/square.jpg",
"href": "/artist/boris-berlin",
"displayType": "Artist"
}
},
...

GraphQL disable filtering if filter variable is empty

I have a Gatsby GraphQL query for a list of posts ordered by date and filtered by category.
{
posts: allContentfulPost(
sort: {fields: [date], order: DESC},
filter: {category: {slug: {eq: $slug}}}
) {
edges {
node {
title {
title
}
date
}
}
}
}
Right now when $slug is the empty string "", I get
{
"data": {
"posts": null
}
}
Is there a way to get all posts instead?
You can use the regex filter to your advantage. If you pass an empty expression, then all posts will be returned because everything will match.
query Posts($slugRegex: String = "//") {
  posts: allContentfulPost(
    sort: {fields: [date], order: DESC},
    filter: {category: {slug: {regex: $slugRegex}}}
  ) {
    # Rest of the query.
  }
}
By default, all posts will be returned (the $slugRegex is an empty regex if nothing was passed). When the $slugRegex becomes a meaningful expression, then only matching posts will show up.
As for passing the value, I'm assuming you're using gatsby-node.js to create pages. In that case, it's as simple as this:
// gatsby-node.js
exports.createPages = async ({ actions }) => {
const { createPage } = actions
// Create a page with only "some-slug" posts.
createPage({
// ...
context: {
slugRegex: "/some-slug/"
}
})
// Create a page with all posts.
createPage({
// ...
context: {
// Nothing here. Or at least no `slugRegex`.
}
})
}
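When the slug comes from data rather than being typed by hand, it helps to build the regex string in one place. The sketch below is an assumption-laden illustration: slugToRegex and escapeRegex are hypothetical helpers, and the pattern is anchored with ^ and $ (unlike the "/some-slug/" example above) so that "some-slug" does not also match "some-slug-2".

```javascript
// Escape characters that have a special meaning in a regular expression,
// including the forward slash used as the delimiter.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");
}

// Hypothetical helper: build the string passed as the slugRegex context value.
// An empty or missing slug yields the empty regex "//", which matches everything.
function slugToRegex(slug) {
  if (!slug) return "//";
  return `/^${escapeRegex(slug)}$/`;
}

console.log(slugToRegex("some-slug")); // /^some-slug$/
console.log(slugToRegex("")); // //
```

The result would go into the context of createPage as slugRegex. Drop the anchors if substring matching is what you want.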
It's not possible with this query; even the @skip/@include directives won't help, because you can't apply them to input fields.
I would suggest either adjusting the server-side logic so that null in the 'eq' field makes it ignore this filter, or editing the query being sent (less favorable, in my opinion).
It seems that the GraphQL schema you are working against lacks the filtering support you need.
If anyone needs a solution for systems other than Gatsby, this can be accomplished using @skip and @include.
fragment EventSearchResult on EventsConnection {
  edges {
    cursor
    node {
      id
      name
    }
  }
  totalCount
}
query Events($organizationId: UUID!, $isSearch: Boolean!, $search: String!) {
  events(condition: { organizationId: $organizationId }, first: 100)
    @skip(if: $isSearch) {
    ...EventSearchResult
  }
  eventsSearch: events(
    condition: { organizationId: $organizationId }
    filter: { name: { likeInsensitive: $search } }
    first: 100
  ) @include(if: $isSearch) {
    ...EventSearchResult
  }
}
Then in your client code you would provide search and isSearch to the query and get your events like:
const events = data.eventsSearch || data.events
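A minimal sketch of that client-side glue, assuming the variable names from the query above; buildEventsVariables and pickEvents are hypothetical helpers. Because $search is declared as String!, a string is always sent, even when no search is active.

```javascript
// Hypothetical helper: derive the query variables from the user's input.
function buildEventsVariables(organizationId, searchText) {
  const search = (searchText || "").trim();
  return {
    organizationId,
    // The two directives key off this flag, so only one field is fetched.
    isSearch: search.length > 0,
    search,
  };
}

// Hypothetical helper: whichever field was included holds the result.
function pickEvents(data) {
  return data.eventsSearch || data.events;
}

const withSearch = buildEventsVariables("org-1", "  launch ");
const withoutSearch = buildEventsVariables("org-1", "");
console.log(withSearch.isSearch, withoutSearch.isSearch); // true false
```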

Tire search return terms by first letter

I'm using Tire/ElasticSearch to create an alphabetical browse of all the tags in my database. However, the tire search returns the tag I want as well as all the other tags associated to the same item. So, for example, if my letter was "A" and an item had the tags 'aardvark' and 'biscuit', both 'aardvark' and 'biscuit' would show up as results for the 'A' query. How can I construct this so that I only get 'aardvark'?
def explore
#get alphabetical tire results with term and count only
my_letter = "A"
self.search_result = Tire.search index_name, 'tags' => 'count' do
query {string 'tags:' + my_letter + '*'}
facet 'tags' do
terms 'tags', :order => 'term'
end
end.results
end
Mapping:
{
items: {
item: {
properties: {
tags: {
type: "string",
index_name: "tag",
index: "not_analyzed",
omit_norms: true,
index_options: "docs"
},
}
}
}
}
You'll need to change the following things:
Mapping
You need to map the tags properly in order to search through them. As your tags are inside your item document, you need to map tags as nested (with each tag indexed as an object that has a value field), so that you can apply your search query in the facets too. Here is the mapping that you need to set:
{
  items: {
    item: {
      properties: {
        tags: {
          type: "nested",
          properties: {
            value: {
              type: "string",
              index: "not_analyzed"
            }
          }
        }
      }
    }
  }
}
Query
Now you can use a prefix query to search for the tags that start with a certain letter and get the facets. Here is the complete query:
{
  query: {
    nested: {
      path: "tags",
      query: {
        prefix: {
          'tags.value': 'A'
        }
      }
    }
  },
  facets: {
    words: {
      terms: { field: 'tags.value' },
      nested: 'tags',
      facet_filter: {
        prefix: {
          'tags.value': 'A'
        }
      }
    }
  }
}
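The facet filter is applied while computing the facets, so you'll only get the facets that match your criteria. I preferred the prefix query over a regular expression query for performance reasons. Keep in mind that a prefix query is not analyzed, so against a not_analyzed field the match is case-sensitive: 'A' will not match 'aardvark'. I am not quite sure whether the prefix query works for your problem; let me know if it doesn't.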

Resources