I'm trying to mimic a query that I wrote in Sense (Chrome plugin) using NEST in C#. I can't figure out what the difference between the two queries is. The Sense query returns records while the NEST query does not. The queries are as follows:
var searchResults = client.Search<File>(s => s.Query(q => q.Term(p => p.fileContents, "int")));
and
{
"query": {
"term": {
"fileContents": {
"value": "int"
}
}
}
}
What is the difference between these two queries? Why would one return records and the other not?
You can find out what query NEST uses with the following code:
var json = System.Text.Encoding.UTF8.GetString(searchResults.RequestInformation.Request);
Then you can compare the output.
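One way to do that comparison without eyeballing whitespace is to parse both bodies and compare them structurally. Here is a minimal Python sketch, assuming you have pasted the body from Sense and the body captured from NEST into two strings. The helper name same_query and the variable names are made up for illustration.

```python
import json


def same_query(body_a: str, body_b: str) -> bool:
    """Return True when two JSON request bodies are structurally identical."""
    return json.loads(body_a) == json.loads(body_b)


# Body copied from Sense
sense_body = '{"query": {"term": {"fileContents": {"value": "int"}}}}'

# Body captured from the NEST request (formatting differs, content does not)
nest_body = """
{
  "query": {
    "term": {
      "fileContents": { "value": "int" }
    }
  }
}
"""

print(same_query(sense_body, nest_body))
```

If the bodies turn out to be identical, the next thing worth comparing is the request URL, since the index and type are part of it.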
I prefer this slightly simpler version, which I usually just type in .NET Immediate window:
searchResults.ConnectionStatus;
Besides being shorter, it also gives the url, which can be quite helpful.
? searchResults.ConnectionStatus;
{StatusCode: 200,
Method: POST,
Url: http://localhost:9200/_all/filecontent/_search,
Request: {
"query": {
"term": {
"fileContents": {
"value": "int"
}
}
}
}
Try this:
var searchResults2 = client.Search<File>(s => s
.Query(q => q
.Term(p => p.Field(r => r.fileContents).Value("int")
)
));
Followup:
RequestInformation is not available in newer versions of NEST.
I'd suggest breaking your code down into steps: don't build the query directly inside the client.Search() call.
client.Search() takes a Func<SearchDescriptor<T>, ISearchRequest> as its parameter, so the SearchDescriptor<T> can be built separately first.
My answer from a similar post:
SearchDescriptor<T> sd = new SearchDescriptor<T>()
.From(0).Size(100)
.Query(q => q
.Bool(t => t
.Must(u => u
.Bool(v => v
.Should(
...
)
)
)
)
);
And got the serialized JSON like this:
{
"from": 0,
"size": 100,
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
...
]
}
}
]
}
}
}
It was annoying; the NEST library should have something that spits out the JSON for a request. However, this worked for me:
using (MemoryStream mStream = new MemoryStream()) {
client.Serializer.Serialize(sd, mStream);
// Decode as UTF-8 so non-ASCII characters in the query are not mangled
Console.WriteLine(Encoding.UTF8.GetString(mStream.ToArray()));
}
NEST library version: 2.0.0.0.
Newer versions may have an easier way to get this (hopefully).
I've been trying to build this Elasticsearch query against the Danish CVR database API, so far without success. Basically, I'm trying to find companies where:
The company has a relationship with a "deltager" (participant) whose "enhedsNummer" (ID) equals NUMBER
The relationship is still active, i.e. the "end of period" field is null
How do I construct a query that has multiple conditions like this?
'query': {
'bool': {
'must': [
{
'term': {'Vrvirksomhed.deltagerRelation.deltager.enhedsNummer': NUMBER},
AND
'term': {'Vrvirksomhed.deltagerRelation.organisationer.attributter.vaerdier.periode.gyldigTil': null}
},
],
},
},
}
FYI: database mapping may be found at http://distribution.virk.dk/cvr-permanent/_mapping
You can try:
GET /cvr-permanent/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"Vrvirksomhed.deltagerRelation.deltager.enhedsNummer": {
"value": "your_value_here"
}
}
}
],
"must_not": [
{
"exists": {
"field": "Vrvirksomhed.deltagerRelation.organisationer.attributter.vaerdier.periode.gyldigTil"
}
}
]
}
}
}
The trick here is to use must_not with exists to match null (or missing) values.
P.S. I cannot check it because it requires authorisation.
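For anyone who wants to build that request body programmatically, here is a minimal Python sketch. The function name active_relation_query and the ID 12345 are placeholders made up for illustration. The field names are the ones from the query above, and the sketch only builds the body; it does not call the API.

```python
import json


def active_relation_query(enheds_nummer, end_field):
    """Build a bool query: participant ID must match, end_field must be null or missing."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"Vrvirksomhed.deltagerRelation.deltager.enhedsNummer": enheds_nummer}}
                ],
                # must_not + exists keeps only documents where the field has no value
                "must_not": [
                    {"exists": {"field": end_field}}
                ],
            }
        }
    }


END_FIELD = "Vrvirksomhed.deltagerRelation.organisationer.attributter.vaerdier.periode.gyldigTil"
body = active_relation_query(12345, END_FIELD)  # 12345 is a placeholder ID
print(json.dumps(body, indent=2))
```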
It doesn't appear that Elasticsearch queries are as dynamic as I had wanted (or I don't know how to use them). Instead, the Python code below seems to be the best way to generate the desired outcome:
import requests
import pandas as pd

# Placeholder: replace with the "enhedsNummer" (ID) of the participant to look up
some_value = 0

# Creation of empty lists:
virksomhedsnavne = []
virksomhedscvr = []
relation_fra = []
relation_til = []

headers = {
    'Content-Type': 'application/json',
}

# Pulling data (apparently limited to 3000 elements at a time):
for i in range(20):
    # Only ask for CVR numbers above the highest one fetched so far
    highestcvrnummer = max(virksomhedscvr) if virksomhedscvr else 0
    json_data = {
        "_source": ["Vrvirksomhed.cvrNummer", "Vrvirksomhed.navne", "Vrvirksomhed.virksomhedMetadata.nyesteNavn.navn", "Vrvirksomhed.deltagerRelation"],
        "sort": [{"Vrvirksomhed.cvrNummer": {"order": "asc"}}],
        "query": {
            "bool": {
                "must": [
                    {
                        "term": {
                            "Vrvirksomhed.deltagerRelation.deltager.enhedsNummer": some_value
                        }
                    },
                    {
                        "range": {
                            "Vrvirksomhed.cvrNummer": {
                                "gt": highestcvrnummer
                            }
                        }
                    }
                ]
            }
        },
        'size': 3000
    }
    response = requests.post('http://distribution.virk.dk/cvr-permanent/virksomhed/_search', headers=headers, json=json_data, auth=('USERNAME', 'PASSWORD'))
    hits = response.json()['hits']['hits']
    if not hits:
        break  # an empty page means everything has been fetched

    # Aggregate and format data neatly
    for data in hits:
        virksomhed_data = data['_source']['Vrvirksomhed']
        virksomhedscvr.append(virksomhed_data['cvrNummer'])
        try:
            virksomhedsnavne.append(virksomhed_data['virksomhedMetadata']['nyesteNavn']['navn'])
        except (KeyError, TypeError):
            virksomhedsnavne.append(virksomhed_data['navne'][0]['navn'])

        # Loop through all "deltagere" and find match with value
        for relation in virksomhed_data['deltagerRelation']:
            # If match found
            if relation['deltager']['enhedsNummer'] == some_value:
                # Make sure most recent period is chosen
                relation_gyldig = relation['organisationer'][-1]['medlemsData'][0]['attributter'][0]['vaerdier'][0]['periode']
                relation_fra.append(relation_gyldig['gyldigFra'])
                relation_til.append(relation_gyldig['gyldigTil'])
                break
        else:
            # No match found: pad so all lists keep the same length for the DataFrame
            relation_fra.append(None)
            relation_til.append(None)

# Export to Excel
columns = {'CVR nummer': virksomhedscvr, 'navn': virksomhedsnavne, 'Relation fra': relation_fra, 'Relation til': relation_til}
df = pd.DataFrame(columns)
df.to_excel("output.xlsx")
If anyone else is working with the Danish CVR register's API, I hope this helps!
Also, if you find a better solution, please let me know :)
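The paging idea in the script above (sort ascending, then ask only for values greater than the highest one already seen) can be tried without network access. The sketch below is a simplified stand-in, not the CVR API: fetch_page is a hypothetical function that imitates one _search call over an in-memory list, and the numbers are made up.

```python
def fetch_page(records, after, size):
    """Stand-in for one _search call: values greater than `after`, ascending, at most `size`."""
    matching = sorted(r for r in records if r > after)
    return matching[:size]


def fetch_all(records, size):
    """Collect every record by repeatedly asking for values above the last one seen."""
    collected = []
    highest = 0
    while True:
        page = fetch_page(records, highest, size)
        if not page:
            break  # an empty page means nothing is left
        collected.extend(page)
        highest = page[-1]  # the page is sorted, so the last item is the highest
    return collected


fake_cvr_numbers = [40, 10, 30, 20, 50, 70, 60]  # made-up data
print(fetch_all(fake_cvr_numbers, size=3))
```

Stopping on the first empty page avoids having to guess a fixed number of iterations up front.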
I have some data and I'm trying to add an extra filter that will exclude any results where foo.IsMarried == true.
Now, there are heaps of documents that don't have this field. If the field doesn't exist, I'm assuming that foo.IsMarried = false, so those documents should be included in the result set.
Can anyone provide any clues, please?
I'm also using the .NET 'NEST' nuget client library - so I'll be really appreciative if the answer could be targeting that, but just happy with any answer, really.
Generally, within Elasticsearch, if a boolean field doesn't exist, that doesn't mean its value is false; there may simply be no value for it.
But based on the assumption you are making in this case, we can check whether the field foo.IsMarried is explicitly false OR does not exist in the document.
The query presented by Rahul in the other answer does the job. However, since you wanted a NEST version of the same, the query can be constructed using the snippet below.
// Notice the use of not exists here. If you do not want to check for the 'false' value,
// you can omit the first term filter here. 'T' is the type to which you are mapping your index.
// You should pass the field based on the structure of 'T'.
private static QueryContainer BuildNotExistsQuery()
{
return new QueryContainerDescriptor<T>().Bool(
b => b.Should(
s => s.Term(t => t.Field(f => f.foo.IsMarried).Value(false)),
s => !s.Exists(ne => ne.Field(f => f.foo.IsMarried))
)
);
}
You can trigger the search through the NEST client within your project as shown below.
var result = client.Search<T>(s => s
.From(0)
.Size(20)
.Query(q => BuildNotExistsQuery())
// other methods that you want to chain go here
);
You can use a should query with the following conditions:
IsMarried = false
IsMarried must not exist
POST test/person/
{"name": "p1", "IsMarried": false}
POST test/person/
{"name": "p2", "IsMarried": true}
POST test/person/
{"name": "p3"}
Raw DSL query
POST test/person/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"IsMarried": false
}
},
{
"bool": {
"must_not": {
"exists": {
"field": "IsMarried"
}
}
}
}
]
}
}
}
I hope you can convert this raw DSL query to NEST!
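To see which of the three sample documents that query is meant to return, the should logic can be evaluated locally. This is a plain-Python sketch of the intended semantics only, not a call to Elasticsearch. The function name matches is made up for illustration.

```python
# The three sample documents indexed above
docs = [
    {"name": "p1", "IsMarried": False},
    {"name": "p2", "IsMarried": True},
    {"name": "p3"},
]


def matches(doc):
    """Mirror the bool/should query: IsMarried is false OR the field does not exist."""
    is_false = doc.get("IsMarried") is False   # first should clause
    is_missing = doc.get("IsMarried") is None  # second should clause (must_not exists)
    return is_false or is_missing


print([d["name"] for d in docs if matches(d)])
```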
I am using nest for elasticsearch querying. Here is my query.
Client.Search<Model>(
a =>
a.Query(
b => b.Bool(c => c.Must(d => d.Script(e => e.Inline("doc['firstname'].value == doc['lastname'].value"))))));
My intention here is to get the records whose first name and last name are equal. The equivalent Elasticsearch query works in Sense. Here is that query:
"query": {
"filtered": {
"filter": {
"script": {
"script": "doc['firstname'].value == doc['lastname'].value"
}
}
}
}
But with NEST I am getting a "script doesn't support inline" response.
Filtered queries are deprecated in Elasticsearch 2.x; queries and filters have been merged into one concept called queries, which act as either queries or filters depending on the context in which they are used (much simpler!)
You can rewrite your query using NEST 2.x (which is only compatible with Elasticsearch 2.x):
client.Search<Model>(a => a
.Query(b => b
.Bool(c => c
.Filter(d => d
.Script(e => e
.Inline("doc['firstname'].value == doc['lastname'].value")
)
)
)
)
);
which can be shortened even further using the + unary operator on the query descriptor
client.Search<Model>(a => a
.Query(b => +b
.Script(e => e
.Inline("doc['firstname'].value == doc['lastname'].value")
)
)
);
Both produce the following query json
{
"query": {
"bool": {
"filter": [
{
"script": {
"inline": "doc['firstname'].value == doc['lastname'].value"
}
}
]
}
}
}
Groovy (the default) dynamic scripting is off by default since Elasticsearch 1.4.3, so you will need to enable it for inline scripts.
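If you have older raw queries lying around, the same filtered-to-bool rewrite can be done mechanically. Below is a minimal Python sketch. The helper name filtered_to_bool is made up, it only handles a top-level filtered query, and it leaves the script body untouched (note that NEST 2.x emits "inline" where the Sense query above uses "script").

```python
def filtered_to_bool(query):
    """Rewrite {"filtered": {"query": q, "filter": f}} as {"bool": {"must": [q], "filter": [f]}}."""
    if "filtered" not in query:
        return query  # nothing to rewrite
    filtered = query["filtered"]
    bool_query = {}
    if "query" in filtered:
        bool_query["must"] = [filtered["query"]]     # scoring part
    if "filter" in filtered:
        bool_query["filter"] = [filtered["filter"]]  # non-scoring part
    return {"bool": bool_query}


legacy = {
    "filtered": {
        "filter": {
            "script": {"script": "doc['firstname'].value == doc['lastname'].value"}
        }
    }
}
print(filtered_to_bool(legacy))
```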
I'm using the object initializer syntax with NEST to form a search query. When I include the second pdfQuery with the logical OR operator, I get no results. If I exclude it, I get results.
QueryContainer titleQuery = new MatchQuery
{
Field = Property.Path<ElasticBook>(p => p.Title),
Query = query,
Boost = 50,
Slop = 2,
MinimumShouldMatch = "55%"
};
QueryContainer pdfQuery = new MatchQuery
{
Field = Property.Path<ElasticBook>(p => p.Pdf),
Query = query,
CutoffFrequency = 0.001
};
var result = _client.Search<ElasticBook>(new SearchRequest("bookswithstop", "en")
{
From = 0,
Size = 10,
Query = titleQuery || pdfQuery,
Timeout = "20000",
Fields = new []
{
Property.Path<ElasticBook>(p => p.Title)
}
});
If I debug and inspect the result var, I copy-value one of request properties to get:
{
"timeout": "20000",
"from": 0,
"size": 10,
"fields": [
"title"
],
"query": {
"bool": {
"should": [
{
"match": {
"title": {
"query": "Proper Guide To Excel 2010",
"slop": 2,
"boost": 50.0,
"minimum_should_match": "55%"
}
}
},
{
"match": {
"pdf": {
"query": "Proper Guide To Excel 2010",
"cutoff_frequency": 0.001
}
}
}
]
}
}
}
The problem is that if I copy that query into sense - it returns about 100 results (albeit slowly). I've checked the header info and that seems to be correct from NEST as well:
ConnectionStatus = {StatusCode: 200,
Method: POST,
Url: http://elasticsearch-blablablamrfreeman/bookswithstop/en/_search,
Request: {
"timeout": "20000",
"from": 0,
"size": 10,
"fields": [
"title"
],
"query": {
"bool": {
"shoul...
The pdf field uses the Elasticsearch attachment plugin (located at https://github.com/elastic/elasticsearch-mapper-attachments), and I was getting Newtonsoft.Json System.OutOfMemoryException errors before (but not now, for some reason).
My only guess, therefore, is that there's some serialization issue between my query and NEST. If that were the case, though, I'm not sure why it would execute successfully with a 200 code and give 0 documents in the Documents property.
Could anyone please explain how I would go about troubleshooting this? It clearly doesn't like my second search query (pdfQuery), but I'm not sure why, and the resulting JSON request syntax seems to be correct as well!
I think this part is causing problems
Fields = new []
{
Property.Path<ElasticBook>(p => p.Title)
}
When you use the Fields option, Elasticsearch does not return the _source field, so you can't access results through result.Documents. Instead, you have to use result.FieldSelections, which is quite unpleasant.
If you want to return only specific fields from elasticsearch and still be able to use result.Documents you can take advantage of source includes / excludes. With NEST you can do this as follows:
var searchResponse = client.Search<Document>(s => s
.Source(source => source.Include(f => f.Number))
.Query(q => q.MatchAll()));
Hope this helps you.
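To illustrate why Documents can come back empty even though the search matched, here is a rough Python sketch of the two response shapes. The hits are made up, and the documents helper is only an assumption about the general idea (documents are built from _source); NEST's real deserialization is more involved.

```python
def documents(hits):
    """Collect _source from each hit, skipping hits that carry none."""
    return [hit["_source"] for hit in hits if "_source" in hit]


# Made-up hit in the shape returned when "fields" is requested: no _source at all
hits_with_fields = [
    {"_id": "1", "fields": {"title": ["Proper Guide To Excel 2010"]}}
]

# Made-up hit in the shape returned when source filtering is used instead
hits_with_source = [
    {"_id": "1", "_source": {"title": "Proper Guide To Excel 2010"}}
]

print(documents(hits_with_fields))
print(documents(hits_with_source))
```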
I am currently trying to implement a "function_score" query in NEST, with functions that are only applied when a filter matches.
It doesn't look like FunctionScoreFunctionsDescriptor supports adding a filter yet. Is this functionality going to be added any time soon?
Here's a super basic example of what I'd like to be able to implement:
Runs an ES query, with basic scores
Goes through a list of functions, and adds to it the first score where the filter matches
"function_score": {
"query": {...}, // base ES query
"functions": [
{
"filter": {...},
"script_score": {"script": "25"}
},
{
"filter": {...},
"script_score": {"script": "15"}
}
],
"score_mode": "first", // take the first script_score where the filter matches
"boost_mode": "sum" // and add this to the base ES query score
}
I am currently using Elasticsearch v1.1.0, and NEST v1.0.0-beta1 prerelease.
Thanks!
It's already implemented:
_client.Search<ElasticsearchProject>(s =>
s.Query(q=>q
.FunctionScore(fs=>fs.Functions(
f=>f
.ScriptScore(ss=>ss.Script("25"))
.Filter(ff=>ff.Term(t=>t.Country, "A")),
f=> f
.ScriptScore(ss=>ss.Script("15"))
.Filter(ff=>ff.Term("a","b")))
.ScoreMode(FunctionScoreMode.first)
.BoostMode(FunctionBoostMode.sum))));
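For reference, the scoring the question describes (score_mode "first", boost_mode "sum") can be sketched in a few lines of Python. This is an illustration of the arithmetic only, with made-up scores and predicates standing in for the filters. The no_match_score default of 1.0 is an assumption about what happens when no filter matches, so check it against your Elasticsearch version.

```python
def final_score(base_score, doc, functions, no_match_score=1.0):
    """Combine scores as score_mode=first, boost_mode=sum would."""
    function_score = no_match_score  # assumption: neutral value when nothing matches
    for predicate, score in functions:
        if predicate(doc):
            function_score = score  # "first": take the first matching function only
            break
    return base_score + function_score  # "sum": add it to the base query score


# Predicates standing in for the two filters in the answer above
functions = [
    (lambda d: d.get("country") == "A", 25.0),
    (lambda d: d.get("a") == "b", 15.0),
]

print(final_score(1.5, {"country": "A", "a": "b"}, functions))
print(final_score(1.5, {"a": "b"}, functions))
```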
Udi's answer didn't work for me. It seems that in the newer version (v2.3, C#) there's no Filter() method on the ScoreFunctionsDescriptor class.
But I found a solution. You can provide an array of IScoreFunction. To do that you can use new FunctionScoreFunction() or use my helper class:
class CustomFunctionScore<T> : FunctionScoreFunction
where T: class
{
public CustomFunctionScore(Func<QueryContainerDescriptor<T>, QueryContainer> selector, double? weight = null)
{
this.Filter = selector.Invoke(new QueryContainerDescriptor<T>());
this.Weight = weight;
}
}
With this class, filter can be applied this way (this is just an example):
SearchDescriptor<BlobPost> searchDescriptor = new SearchDescriptor<BlobPost>()
.Query(qr => qr
.FunctionScore(fs => fs
.Query(q => q.Bool(b => b.Should(s => s.Match(a => a.Field(f => f.FirstName).Query("john")))))
.ScoreMode(FunctionScoreMode.Max)
.BoostMode(FunctionBoostMode.Sum)
.Functions(
new[]
{
new CustomFunctionScore<BlobPost>(q => q.Match(a => a.Field(f => f.Id).Query("my_id")), 10),
new CustomFunctionScore<BlobPost>(q => q.Match(a => a.Field(f => f.FirstName).Query("john")), 10),
}
)
)
);