ArangoSearch support for multiple fields search - full-text-search

Does ArangoSearch support search on multiple/all fields of a collection. I want to be able to search a text on all fields of a given collection. Does ArangoSearch support such a thing?

You can let a View index all fields (attributes) of your documents very easily:
{
"links": {
"yourCollection": {
"includeAllFields": true
}
},
…
}
In queries you need to be explicit about which fields to search in however:
FOR doc IN yourView
SEARCH doc.field1 == "foo" OR doc.field2 == "foo" OR doc.nested.field == "foo"
RETURN doc
It is not possible (yet) to express this using a wildcard, like SEARCH doc.* == "foo". Possible workarounds are to maintain a separate attribute which combines the content of all the individual fields you want to search in (but you need to make sure that it stays in sync with the source attributes), or to use a query builder of sorts to generate a disjunction like above.

Related

Type of field for prefix search in Elastic Search

I'm confused on what index type I should apply for my field for prefix search, many show search_as_you_type but I think auto complete is not what I'm going for.
I have a UUID field:
id: 34y72ca1-3739-41ff-bbec-f6d17479384c
The following terms should return the doc above:
3
34
34y72ca1
34y72ca1-3739
34y72ca1-3739-41ff-bbec-f6d17479384c
Using 3739 should not return it as it doesn't start with 3739. Initially this is what I was going for but then the wildcard field is not supported by Amazon AWS, so I compromise for prefix search instead of partial search.
I tried search_as_you_type field but it doesn't return the result when I use the whole ID. Actually, my use case is when user click enter, the results will be shown, instead of real-live when they type, so if speed is compromised its OK, just that I hope for something that will be good for many rows of data.
Thanks
If you have not explicitly defined any index mapping, then you need to use id.keyword field instead of the id field for the prefix query to show the appropriate results. This uses the keyword analyzer instead of the standard analyzer
{
"query": {
"prefix": {
"id.keyword": {
"value": "34y72ca1"
}
}
}
}
Otherwise, you can modify your index mapping, by adding multi fields for id field

How to query object with contains specific key in array in Graphql

I have object to query:
query demo {
demo {
name # string
words # array of strings
}
}
I am sending this query to API, property words is array of strings.
I would like to query all object which contains hello in words (array of strings) property.
It is possible on client side using for example #include ?
And how?
The #skip and #include directives are used to explicitly skip or include a particular field based on the provided if argument. They are normally used in conjunction with variables, so that the same document can be used to effectively generate queries with different selection sets (sets of fields) based on the provided variable.
For example, you can write a single query like this
query demo($includeName: Boolean!) {
demo {
name #include(if: $includeName)
words
}
}
and just pass in a different value for includeName instead of writing two separate queries:
query demoWithName {
demo {
name
words
}
}
query demoWithoutName {
demo {
words
}
}
GraphQL provides built-in mechanism for specifying which fields to return like fragments and the #skip and #include directives. It does not, however, have any built-in mechanism for filtering (or, for that matter, sorting, pagination, etc.). The server has to implement a way to filter fields by adding the appropriate arguments to fields and changing the field resolution logic to account for the values of those arguments.

What is the difference between a field and a property in Elasticsearch?

I'm currently trying to understand the difference between fields (https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html) and properties (https://www.elastic.co/guide/en/elasticsearch/reference/current/properties.html).
They are both somehow defined as a "subfield/subproperty" of a type/mapping property, both can have separate types and analyzers (as far as I understood it), both are accessed by the dot notation (mappingProperty.subField or mappingProperty.property).
The docs are using the terms "field" and "property" randomly, I have the feeling, for example:
Type mappings, object fields and nested fields contain sub-fields,
called properties.
What is the difference between properties and (sub-)fields? How do I decide if I have a property or a field?
In other words, how do I decide if I use
{
"mappings": {
"_doc": {
"properties": {
"myProperty": {
"properties": {
}
}
}
}
}
}
or
{
"mappings": {
"_doc": {
"properties": {
"myProperty": {
"fields": {
}
}
}
}
}
}
Subfields are indexed from the parent property source. While sub-properties need to have a "real" value in the document's source.
If your source contains a real object, you need to create properties. Each property will correspond to a different value from your source.
If you only want to index the same value but with different analyzers then use subfields.
It is often useful to index the same field in different ways for
different purposes. This is the purpose of multi-fields. For instance,
a string field could be mapped as a text field for full-text search,
and as a keyword field for sorting or aggregations:
(sorry I find its hard to explain =| )
Note: This is an explanation from my current understanding. It may not be 100% accurate.
A property is what we used to call field in a RDBMS (a standard relationship db like MySQL). It stores properties of an object and provides the high-level structure for an index (which we can compare to a table in a relational DB).
A field, which is linked (or included) into the property concept, is a way to index that property using a specific analyzer.
So lets say you have:
One analyzer (A) to uppercase
One analyzer (B) to lowercase
One analyzer (C) to translate to Spanish (this doesn't even exist, just to give you an idea)
What an analyzer does is transform the input (the text on a property) into a series of tokens that will be indexed. When you do a search the same analyzer is used so the text is transformed into those tokens, it gives each one a score and then those tokens are used to grab documents from the index.
(A) Dog = DOG
(B) Dog = dog
(C) Dog = perro
To search using a specific field configuration you call it using a dot:
The text field uses the standard analyzer.
The text.english field uses the English analyzer.
So the fields basically allow you to perform searches using different token generation models.

Simple query without a specified field searching in whole ElasticSearch index

Say we have an ElasticSearch instance and one index. I now want to search the whole index for documents that contain a specific value. It's relevant to the search for this query over multiple fields, so I don't want to specify every field to search in.
My attempt so far (using NEST) is the following:
var res2 = client.Search<ElasticCompanyModelDTO>(s => s.Index("cvr-permanent").AllTypes().
Query(q => q
.Bool(bo => bo
.Must( sh => sh
.Term(c=>c.Value(query))
)
)
));
However, the query above results in an empty query:
I get the following output, ### ES REQEUST ### {} , after applying the following debug on my connectionstring:
.DisableDirectStreaming()
.OnRequestCompleted(details =>
{
Debug.WriteLine("### ES REQEUST ###");
if (details.RequestBodyInBytes != null) Debug.WriteLine(Encoding.UTF8.GetString(details.RequestBodyInBytes));
})
.PrettyJson();
How do I do this? Why is my query wrong?
Your problem is that you must specify a single field to search as part of a TermQuery. In fact, all ElasticSearch queries require a field or fields to be specified as part of the query. If you want to search every field in your document, you can use the built-in "_all" field (unless you've disabled it in your mapping.)
You should be sure you really want a TermQuery, too, since that will only match exact strings in the text. This type of query is typically used when querying short, unanalyzed string fields (for example, a field containing an enumeration of known values like US state abbreviations.)
If you'd like to query longer full-text fields, consider the MultiMatchQuery (it lets you specify multiple fields, too.)
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html
Try this
var res2 = client.Search<ElasticCompanyModelDTO>(s =>
s.Index("cvr-permanent").AllTypes()
.Query(qry => qry
.Bool(b => b
.Must(m => m
.QueryString(qs => qs
.DefaultField("_all")
.Query(query))))));
The existing answers rely on the presence of _all. In case anyone comes across this question at a later date, it is worth knowing that _all was removed in ElasticSearch 6.0
There's a really good video explaining the reasons behind this and the way the replacements work from ElasticOn starting at around 07:30 in.
In short, the _all query can be replaced by a simple_query_string and it will work with same way. The form for the _search API would be;
GET <index>/_search
{
"query": {
"simple_query_string" : {
"query": "<queryTerm>"
}
}
}
The NEST pages on Elastic's documentation for this query are here;

Group by field in found document

The best way to explain what I want to accomplish is by example.
Let us say that I have an object with fields name and color and transaction_id. I want to search for documents where name and color match the specified value and that I can accomplish easily with boolean queries.
But, I do not want only documents which were found with search query. I also want transaction to which those documents belong, and that is specified with transaction_id. For example, if a document has been found with transaction_idequal to 123, I want my query to return all documents with transaction_idequal to 123.
Of course, I can do that with two queries, first one to fetch all documents that match criteria, and the second one that will return all documents that have one of transaction_idvalues found in first query.
But is there any way to do it in a single query?
You can use parent-child relation ship between transaction and your object. Or nest the denormalize your data to include the objects in the transactions. Otherwise you'll have to do an application side join, meaning 2 queries.
Try an index mapping similar to the following, and include a parent_id in the objects.
{
"mappings": {
"transaction": {},
"object": {
"_parent": {
"type": "transaction"
}
}
}
}
Further reading:
https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child-mapping.html

Resources