GraphQL on clause with enum type

GraphQL on clause with enum type - graphql

I have a question regarding GraphQL because I do not know if it is possible or not.
I have a simple scheme like this:
enum Range{
D,
D_1,
D_7
}
type Data {
id: Int!
levels(range: [Range!]):[LevelEntry]
}
type LevelEntry{
range: Range!
levelData: LevelData
}
type LevelData {
range: Range!
users: Int
name: String
stairs: Int
money: Float
}
Basically I want to do a query so I can retrieve different attributes for the different entries on the levelData property of levels array which can be filtered by some levels range.
For instance:
data {
"id": 1,
"levels": [
{
"range": D,
"levelData": {
"range": D,
"users": 1
}
},
{
"range": D_1,
"levelData": {
"range": D_1,
"users": 1,
"name": "somename"
}
}
]
This means i want for D "range, users" properties and for D_1 "range,users,name" properties
I have done an example of query but I do not know if this is possible:
query data(range: [D,D_1]){
id,
levels {
range
... on D {
range,
users
}
... on D_1 {
range,
users,
name
}
}
}
Is it possible? If it is how can i do it?

Related

Go Template - remove specific field

I am having following data format:
{"time":"2022-08-24T06:00:00Z","duration":0,"level":"OK","data":{"series":[{"name":"gnb_kpi","tags":{"ID":"1017","_field":"Success_rate%","cluster_id":"ec17-1017","swversion":"6.0"},"columns":["time","_value"],"values":[["2022-08-24T06:00:00Z","100"]]}]},"previousLevel":"CRITICAL","recoverable":true}
I want to remove the _time field from the columns array and similarily the timestamp from values array. The output I want is like this:
{"time":"2022-08-24T06:00:00Z","duration":0,"level":"OK","data":{"series":[{"name":"gnb_kpi","tags":{"ID":"1017","_field":"Success_rate%","cluster_id":"ec17-1017","swVersion":"6.0"},"columns":["_value"],"values":[["100"]]}]},"previousLevel":"CRITICAL","recoverable":true}

I used the online service JSON-to-Go to generate a data structure that corresponds to your input. It produced
type AutoGenerated struct {
Time time.Time `json:"time"`
Duration int `json:"duration"`
Level string `json:"level"`
Data struct {
Series []struct {
Name string `json:"name"`
Tags struct {
ID string `json:"ID"`
Field string `json:"_field"`
ClusterID string `json:"cluster_id"`
Swversion string `json:"swversion"`
} `json:"tags"`
Columns []string `json:"columns"`
Values [][]interface{} `json:"values"`
} `json:"series"`
} `json:"data"`
PreviousLevel string `json:"previousLevel"`
Recoverable bool `json:"recoverable"`
}
The algorithm is simple:
parse JSON into the generated structure
iterate over series
find the position of time field in columns
remove the corresponding data elements from values
https://go.dev/play/p/rjnvmdBXCE4
Output is (beautified)
{
"time": "2022-08-24T06:00:00Z",
"duration": 0,
"level": "OK",
"data": {
"series": [
{
"name": "gnb_kpi",
"tags": {
"ID": "1017",
"_field": "Success_rate%",
"cluster_id": "ec17-1017",
"swversion": "6.0"
},
"columns": [
"_value"
],
"values": [
[
"100"
]
]
}
]
},
"previousLevel": "CRITICAL",
"recoverable": true
}
As you see, no time

Finding common values of array field of two documents in ElasticSearch

I have two documents in Elastic search with the following values
uid preferences
1 [10,20,30,40,50,60,70,80,100]
2 [20,70,30,100,1000,77,45]
Is there any way we can do array intersect on preferences for these two records and get the result [20,70,30,100] ? Currently we are getting these two records to app server and doing intersect , but wanted to check if there is any direct way of getting the intersect values from Elasticsearch directly .Thank You .

I'd solve this using a parameterized scripted metric aggregation. Here's a more readable version:
{
"size": 0,
"query": {
"terms": {
"id": [
1,
2
]
}
},
"aggs": {
"preferences_intersection": {
"scripted_metric": {
"init_script": "state.shared_vals = [];",
"map_script": "state.shared_vals.addAll(new ArrayList(doc['preferences']));",
"combine_script": """
return state.shared_vals.stream()
.filter(i -> Collections.frequency(state.shared_vals, i) >= params.compared_docs_count)
.sorted((o1, o2) -> o1.compareTo(o2))
.collect(Collectors.toCollection(TreeSet::new))
""",
"reduce_script": "return states[0]",
"params": {
"compared_docs_count": 2
}
}
}
}
}
Notice how the terms query was applied along with params.compared_docs_count so we can check the # of occurrences of the common values.
Here's the compact version of the query without triple quotes:
{"size":0,"query":{"terms":{"id":[1,2]}},"aggs":{"preferences_intersection":{"scripted_metric":{"init_script":"state.shared_vals = [];","map_script":"state.shared_vals.addAll(new ArrayList(doc['preferences']));","combine_script":" return state.shared_vals.stream()\n .filter(i -> Collections.frequency(state.shared_vals, i) >= params.compared_docs_count)\n .sorted((o1, o2) -> o1.compareTo(o2))\n .collect(Collectors.toCollection(TreeSet::new))","reduce_script":"return states[0]","params":{"compared_docs_count":2}}}}}

Elasticsearch aggregation float type losing precision

If you use Elasticsearch 5.5 with Dynamic field mapping
and use double values. These values are getting the float type when I check in the mappings. When you are using an aggregation than the key in the buckets will be losing precision. the Value 0.62 would be something like 0.6200000047683716.
Code fragment
"aggregations": {
"float_numbers": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 0.6200000047683716,
"doc_count": 1
}
]
}
}
Here is the same issue described.
link
I am posting this issue because I found an appropriate solution which I not yet have seen but it helped me a lot.
The solution is to make the float a double. This can be achieved with Dynamic templates.
dynamic templates
dynamic field mapping
Example solution:
Add dynamic_templates in index there are no items yet.
PUT term-test
{
"mappings": {
"demo_typ": {
"dynamic_templates": [
{
"all_to_double": {
"match_mapping_type": "double",
"mapping": {
"type": "double"
}
}
}
]
}
}
}
Add data
POST term-test/demo_typ
{
"numeric_field": 0.62,
"long_filed": 44
}
Check mapping
GET term-test/_mapping
Do aggregation
GET term-test/_search
{
"query": {
"match_all": {}
},
"aggs": {
"float_numbers": {
"terms": {
"field": "numeric_field"
}
}
}
}
In the Java Api you can do the following
1: First create the index
elasticClient.admin()
.indices()
.prepareCreate(indexName)
.execute()
.actionGet();
2: Update the mapping
JSON
{
"dynamic_templates": [
{
"all_to_double": {
"match_mapping_type": "double",
"mapping": {
"type": "double"
}
}
}
]
}
Json to XContentBuilder I got the code from link
public XContentBuilder getXContentBuilderFromJson(final String json) {
try {
Map<String, Object> map = new ObjectMapper().readValue(json, new TypeReference<Map<String, Object>>() {});
return XContentFactory.jsonBuilder().map(map);
} catch (IOException e) {
e.printStackTrace();
return null;
}
}
Update mapping
elasticClient.admin().indices()
.preparePutMapping(indexName)
.setType(yourType)
.setSource(getXContentBuilderFromJson(json))
.execute()
.actionGet();
3: Insert data

Numbers lose precision. This is because of how floating-point numbers work: 9.62 can't be expressed as a * 2 ^ b so neither doubles nor floats can represent it accurately.
Because floats and doubles cannot accurately represent a value, it is generally a bad idea to run terms aggregations on them.
As a workaround you can do Math.round after you did the aggregation

GraphQL fallback query if no results

I have the following query:
{
entity(id: "theId") {
source1: media(source: 1){
images{
src, alt
}
}
source2: media(source: 2){
images{
src, alt
}
}
}
}
That give me a result like:
{
"entity": [
{
"source1": {
"images": [{"src": "", "alt": ""}]
},
"source2": {
"images": [{"src": "", "alt": ""}]
}
}
]
}
Is there a way to have a single result of source1 and source2, executing source1 and if it has no result it use source2 as fallback?

You are querying two fields (source1, source2) so something has to come back for both of them (null being a possible option). If you want to check them in a sequence you should probably break the query in two and run them one at the time from the client.
Could you perhaps change so you only query a single source field and have the resolver (on the server) return what makes sense based on what is available, so to speak? Like this:
{
entity(id: "theId") {
source: media(sourcesList: [1, 2]){
images{
src, alt
}
}
}
}
where sourceList is the sources to try, in order. So the resolver (server) can then check if source 1 is available and if not return source 2.
You could also add a field to let the client know which source was actually returned from the proposed list (sourceNumberReturned below would return 1 if source 1 was returned, otherwise 2).
{
entity(id: "theId") {
source: media(sourcesList: [1, 2]){
images{
src, alt
}
sourceNumberReturned
}
}
}

How can I do this in painless script Elasticsearch 5.3

We're trying to replicate this ES plugin https://github.com/MLnick/elasticsearch-vector-scoring. The reason is AWS ES doesn't allow any custom plugin to be installed. The plugin is just doing dot product and cosine similarity so I'm guessing it should be really simple to replicate that in painless script. It looks like groovy scripting is deprecated in 5.0.
Here's the source code of the plugin.
/**
* #param params index that a scored are placed in this parameter. Initialize them here.
*/
#SuppressWarnings("unchecked")
private PayloadVectorScoreScript(Map<String, Object> params) {
params.entrySet();
// get field to score
field = (String) params.get("field");
// get query vector
vector = (List<Double>) params.get("vector");
// cosine flag
Object cosineParam = params.get("cosine");
if (cosineParam != null) {
cosine = (boolean) cosineParam;
}
if (field == null || vector == null) {
throw new IllegalArgumentException("cannot initialize " + SCRIPT_NAME + ": field or vector parameter missing!");
}
// init index
index = new ArrayList<>(vector.size());
for (int i = 0; i < vector.size(); i++) {
index.add(String.valueOf(i));
}
if (vector.size() != index.size()) {
throw new IllegalArgumentException("cannot initialize " + SCRIPT_NAME + ": index and vector array must have same length!");
}
if (cosine) {
// compute query vector norm once
for (double v: vector) {
queryVectorNorm += Math.pow(v, 2.0);
}
}
}
#Override
public Object run() {
float score = 0;
// first, get the ShardTerms object for the field.
IndexField indexField = this.indexLookup().get(field);
double docVectorNorm = 0.0f;
for (int i = 0; i < index.size(); i++) {
// get the vector value stored in the term payload
IndexFieldTerm indexTermField = indexField.get(index.get(i), IndexLookup.FLAG_PAYLOADS);
float payload = 0f;
if (indexTermField != null) {
Iterator<TermPosition> iter = indexTermField.iterator();
if (iter.hasNext()) {
payload = iter.next().payloadAsFloat(0f);
if (cosine) {
// doc vector norm
docVectorNorm += Math.pow(payload, 2.0);
}
}
}
// dot product
score += payload * vector.get(i);
}
if (cosine) {
// cosine similarity score
if (docVectorNorm == 0 || queryVectorNorm == 0) return 0f;
return score / (Math.sqrt(docVectorNorm) * Math.sqrt(queryVectorNorm));
} else {
// dot product score
return score;
}
}
I'm trying to start with just getting a field from index. But I'm getting error.
Here's the shape of my index.
I've enabled delimited_payload_filter
"settings" : {
"analysis": {
"analyzer": {
"payload_analyzer": {
"type": "custom",
"tokenizer":"whitespace",
"filter":"delimited_payload_filter"
}
}
}
}
And I have a field called #model_factor to store a vector.
{
"movies" : {
"properties" : {
"#model_factor": {
"type": "text",
"term_vector": "with_positions_offsets_payloads",
"analyzer" : "payload_analyzer"
}
}
}
}
And this is the shape of the document
{
"#model_factor":"0|1.2 1|0.1 2|0.4 3|-0.2 4|0.3",
"name": "Test 1"
}
Here's how I use the script
{
"query": {
"function_score": {
"query" : {
"query_string": {
"query": "*"
}
},
"script_score": {
"script": {
"inline": "def termInfo = doc['_index']['#model_factor'].get('1', 4);",
"lang": "painless",
"params": {
"field": "#model_factor",
"vector": [0.1,2.3,-1.6,0.7,-1.3],
"cosine" : true
}
}
},
"boost_mode": "replace"
}
}
}
And this is the error I got.
"failures": [
{
"shard": 2,
"index": "test",
"node": "ShL2G7B_Q_CMII5OvuFJNQ",
"reason": {
"type": "script_exception",
"reason": "runtime error",
"caused_by": {
"type": "wrong_method_type_exception",
"reason": "wrong_method_type_exception: cannot convert MethodHandle(List,int)int to (Object,String)String"
},
"script_stack": [
"termInfo = doc['_index']['#model_factor'].get('1',4);",
" ^---- HERE"
],
"script": "def termInfo = doc['_index']['#model_factor'].get('1',4);",
"lang": "painless"
}
}
]
The question is how do I access the index field to get #model_factor in painless scripting?

Option 1
Due to the fact that #model_factor is a text field, in painless scripting, it would be possible to access it, setting fielddata=true in the mapping. So the mapping should be:
{
"movies" : {
"properties" : {
"#model_factor": {
"type": "text",
"term_vector": "with_positions_offsets_payloads",
"analyzer" : "payload_analyzer",
"fielddata" : true
}
}
}
}
And then it can be scored accessing doc-values:
{
"query": {
"function_score": {
"query" : {
"query_string": {
"query": "*"
}
},
"script_score": {
"script": {
"inline": "return Double.parseDouble(doc['#model_factor'].get(1)) * params.vector[1];",
"lang": "painless",
"params": {
"vector": [0.1,2.3,-1.6,0.7,-1.3]
}
}
},
"boost_mode": "replace"
}
}
}
Problems with Option 1
So it is possible to access the field data value setting fielddata=true, but in this case, the value is the vector index as a term, not the value of the vector which is stored in the payload. Unfortunately, it looks like there is no way to access the Token Payload (where the real vector index value is stored) using painless scripting and doc-values. See the source code for elasticsearch and another similar question re: accessing term info.
So the answer is that using painless scripting is NOT possible to access the payload.
I tried also to store the vector values with a simple pattern tokenizer but when accessing the term vector values the order is not preserved, and this is probably the reason for which the author of the plugin decided to use the term as a string and then retrieve the position 0 of the vector as the term "0" and then find the real vector value in the payload.
Option 2
A very simple alternative is to use n fields in the documents, each of them represents a position in the vector, so in your example, we have a 5 dim vector with values stored in v0...v4 directly as double:
{
"#model_factor":"0|1.2 1|0.1 2|0.4 3|-0.2 4|0.3",
"name": "Test 1",
"v0" : 1.2,
"v1" : 0.1,
"v2" : 0.4,
"v3" : -0.2,
"v4" : 0.3
}
and then the painless scripting should be:
{
"query": {
"function_score": {
"query" : {
"query_string": {
"query": "*"
}
},
"script_score": {
"script": {
"inline": "return doc['v0'].getValue() * params.vector[0];",
"lang": "painless",
"params": {
"vector": [0.1,2.3,-1.6,0.7,-1.3]
}
}
},
"boost_mode": "replace"
}
}
}
It should be easily possible to iterate on the input vector length and get the fields dynamically to calculate the dot product modifying doc['v0'].getValue() * params.vector[0] that I wrote for simplicity.
Problems with Option2
Option 2 is viable as long as the vector dimension remains not big. I think that default Elasticsearch max number of fields per document is 1000, but it can be changed also in AWS environment:
curl -X PUT \
'https://.../indexName/_settings' \
-H 'cache-control: no-cache' \
-H 'content-type: application/json'
-d '{
"index.mapping.total_fields.limit": 2000
}'
Moreover, it should be tested also the script speed on a large number of documents.
Maybe in re-scoring / re-ranking scenarios, it is a viable solution.
Option 3
The third option is really an experiment and the most fascinating in my opinion.
It tries to exploit the internal Elasticsearch representation of the Vector Space Model and does not use any scripting to score but reuse the default similarity score based on tf/idf.
Lucene, that seats at Elasticsearch core, is already using internally a modification of the cosine similarity to calculate the similarity score between documents in his Vector Space Model representation of terms as the formula below, taken from the TFIDFSImilarity javadoc, shows:
In particular, the weights of the vector representing the field are the tf/idf values of the terms of that field.
So we could index a document with termvectors, using as term the index of the vector. If we repeat it N times, we represent the value of the vector, exploiting the tf part of the scoring formula.
This means that the domain of the vector should be transformed and rescaled in {1.. Infinite} Positive Integer numbers domain. We start from 1 so that we are sure that all the documents contain all the terms, it will make it easier to exploit the formula.
For example, the vector: [21, 54, 45] can be indexed as a field in a document using a simple whitespace analyzer and the following value:
{
"#model_factor" : "0<repeated 21 times> 1<repeated 54 times> 2<repeated 45 times>",
"name": "Test 1"
}
then to query, i.e. calculate the dot product, we boost the single terms that represent the index position of the vector.
So using the same example above the input vector: [45, 1, 1] will be transformed in the query:
"should": [
{
"term": {
"#model_factor": {
"value": "0",
"boost": 45
}
}
},
{
"term": {
"#model_factor": "1" // boost:1 by default
}
},
{
"term": {
"#model_factor": "2" // boost:1 by default
}
}
]
norm(t,d) should be disabled in the mapping so that it is not used in the formula above. The idf part is constant for all the documents because all of them contains all the terms (having all the vectors the same dimension).
queryNorm(q) is the same for all the documents in the formula above so it is not a problem.
coord(q,d) is a constant because all the documents contain all the terms.
Problems with Option 3
Need to be tested.
It works only for positive numbers vectors, see this question in math stackoverflow for making it works also for negative numbers.
It is not the exact same of a dot product but very close to find similar documents based on raw vectors.
Scalability on large vector dimension can be an issue at querying time because this means we need to do a N dim terms query with different boosting.
I will try it in a test index and edit this question with the results.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

GraphQL on clause with enum type - graphql

Related

Go Template - remove specific field

Finding common values of array field of two documents in ElasticSearch

Elasticsearch aggregation float type losing precision

GraphQL fallback query if no results

How can I do this in painless script Elasticsearch 5.3

Categories

Resources