Couchbase returns null when a document is read with N1QL right after saving it (Spring Boot)

I insert a document into Couchbase using repository.save(), and right after that I run a query to find duplicates against another document.
The query is:
SELECT ARRAY_AGG(i.serialnumber) serialNumbers
FROM default tempItem
UNNEST items i
WHERE tempItem.class = "com.inventory.model.item.TempItem"
AND META(tempItem).id = '4390dd9e-e392-4432-939f-ebf046570086'
and i.serialnumber in (select raw serialnumber from default where class = 'com.inventory.model.item.Item'
AND status != 'DELETED' and serialnumber is not missing)
The result of the query is:
[
{
"serialNumbers": [
"9121945901",
"9121955901",
"9211965901"
]
}
]
The saved document looks like this:
[
{
"tempItem": {
"class": "com.inventory.model.item.TempItem",
"items": [
{
"categoryId": "67aaca7b-90b1-43e4-a6c6-0e9567bf283e",
"clientIds": [
"919d0ca7-c8d4-4283-8b0a-b6f2a7b39753"
],
"description": "bla bla",
"initial": 1,
"productId": "db5c81c4-0fec-407e-8703-6f5fb69a070c",
"serialnumber": "9121945901",
"simType": "PREPAID",
"status": "ACTIVE",
"stock": 1,
"title": "bla bla"
}
]
}
}
]
And the other document to check against is:
{
"categoryId": "67aaca7b-90b1-43e4-a6c6-0e9567bf283e",
"class": "com.inventory.model.item.Item",
"clientIds": [
"919d0ca7-c8d4-4283-8b0a-b6f2a7b39753"
],
"createdts": 1601801989176,
"creator": "919d0ca7-c8d4-4283-8b0a-b6f2a7b39753",
"description": "bla bla",
"initial": 1,
"prefix1": "912",
"prefix2": "194",
"productId": "db5c81c4-0fec-407e-8703-6f5fb69a070c",
"serialnumber": "9121945901",
"simType": "PREPAID",
"status": "ACTIVE",
"stock": 1,
"title": "bla bla"
}
In Spring Boot, when I run the query immediately after saving the document, it returns a null result. If I sleep for a few milliseconds after the save and before running the query, it returns the value.
What is the problem? Can anybody help with this issue?

This is probably because of scan consistency. Indexes in Couchbase are updated asynchronously, so if you use the default NOT_BOUNDED consistency and query the data with N1QL immediately after writing it, the document may not be indexed yet.
I don't know how to change this in Spring, but the other options are:
REQUEST_PLUS: will likely take a bit longer to return results, but the query engine makes sure the index is as up to date as possible before answering.
consistentWith(MutationState), a.k.a. AT_PLUS: a more narrowed-down scan consistency; depending on the index update rate, this may give a speedier response.
Again, I'm not sure about Spring, but you don't have to set this globally: each query can use a different scan consistency. If you value maximum performance over up-to-the-second accuracy, go with the default. If you value up-to-the-second accuracy over maximum performance, go with REQUEST_PLUS or AT_PLUS.
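To make the effect concrete, here is a toy Python model of a store whose index lags behind writes. It is not the Couchbase SDK: the class ToyBucket and its methods save, run_indexer, and query are made up for illustration, and the "indexer" is drained by hand instead of running in the background.

```python
from collections import deque


class ToyBucket:
    """Toy key-value store with a secondary index that lags behind writes."""

    def __init__(self):
        self.docs = {}          # key -> document (always up to date)
        self.index = {}         # serialnumber -> key (updated lazily)
        self.pending = deque()  # mutations the indexer has not processed yet

    def save(self, key, doc):
        # The write is visible by key immediately, but the index only
        # learns about it once the pending mutation is processed.
        self.docs[key] = doc
        self.pending.append(key)

    def run_indexer(self):
        # Stand-in for the background indexer catching up.
        while self.pending:
            key = self.pending.popleft()
            self.index[self.docs[key]["serialnumber"]] = key

    def query(self, serialnumber, consistency="not_bounded"):
        # "request_plus" waits for the index to catch up before answering;
        # "not_bounded" answers from whatever the index holds right now.
        if consistency == "request_plus":
            self.run_indexer()
        key = self.index.get(serialnumber)
        return self.docs.get(key) if key is not None else None


bucket = ToyBucket()
bucket.save("item-1", {"serialnumber": "9121945901", "status": "ACTIVE"})

stale = bucket.query("9121945901")                  # index not updated yet
fresh = bucket.query("9121945901", "request_plus")  # waits for the index
print(stale, fresh)
```

In this model the first query misses the document for the same reason the N1QL query in the question does, and the sleep in the question plays the role of run_indexer happening in the meantime.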

When using a CouchbaseRepository, you can annotate each query method with @ScanConsistency(query = QueryScanConsistency.REQUEST_PLUS) to enforce the desired scan consistency for that query. Have a look at the official documentation, "Querying with consistency".

Related

How do I use FreeFormTextRecordSetWriter

In my NiFi controller I want to configure the FreeFormTextRecordSetWriter, but I have no idea what I should put in the "Text" field. I'm getting the text from my source (in my case GetSolr) and just want to write it out, period.
The documentation and mailing list do not seem to explain how this is done; any help is appreciated.
EDIT: Here is the sample input and the output I want to achieve (as you can see, no transformation is needed: plain text, no JSON input).
EDIT: I now realize that I can't tell GetSolr to return just CSV data; I have to use JSON.
So referencing with an attribute seems to be fine. What the documentation omits is that the ${flowFile} attribute should contain the complete flowfile that is returned.
Sample input:
{
"responseHeader": {
"zkConnected": true,
"status": 0,
"QTime": 0,
"params": {
"q": "*:*",
"_": "1553686715465"
}
},
"response": {
"numFound": 3194,
"start": 0,
"docs": [
{
"id": "{402EBE69-0000-CD1D-8FFF-D07756271B4E}",
"MimeType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
"FileName": "Test.docx",
"DateLastModified": "2019-03-27T08:05:00.103Z",
"_version_": 1629145864291221504,
"LAST_UPDATE": "2019-03-27T08:16:08.451Z"
}
]
}
}
Wanted output
{402EBE69-0000-CD1D-8FFF-D07756271B4E}
BTW: The documentation says this:
The text to use when writing the results. This property will evaluate the Expression Language using any of the fields available in a Record.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)
I want to use my source's text, so I'm confused
You need to use expression language as if the record's fields are the FlowFile's attributes.
Example:
Input:
{
"t1": "test",
"t2": "ttt",
"hello": true,
"testN": 1
}
Text property in FreeFormTextRecordSetWriter:
${t1} k!${t2} ${hello}:boolean
${testN}Num
Output (using ConvertRecord):
test k!ttt true:boolean
1Num
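As a rough illustration of that substitution rule, the Python sketch below fills a template from a record's fields. It only mimics plain ${name} references and is not how NiFi evaluates Expression Language; the function name render_text is made up for this example.

```python
import re


def render_text(template: str, record: dict) -> str:
    """Replace each ${field} in the template with the record's value."""

    def lookup(match: re.Match) -> str:
        value = record.get(match.group(1), "")
        # Render booleans as JSON does (true/false), not as Python's True/False.
        if isinstance(value, bool):
            return str(value).lower()
        return str(value)

    return re.sub(r"\$\{(\w+)\}", lookup, template)


record = {"t1": "test", "t2": "ttt", "hello": True, "testN": 1}
template = "${t1} k!${t2} ${hello}:boolean\n${testN}Num"
output = render_text(template, record)
print(output)
```

Run against the input above, this prints the same two lines as the ConvertRecord output.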
EDIT:
It seems that what you needed was to read from Solr and write a single-column CSV. For that you need to use CSVRecordSetWriter.
As for the schema, I'd suggest upgrading to 1.9.1: starting from 1.9.0, the schema can be inferred for you.
Otherwise, set Schema Access Strategy to Use 'Schema Text' Property,
then use the following schema in Schema Text:
{
"name": "MyClass",
"type": "record",
"namespace": "com.acme.avro",
"fields": [
{
"name": "id",
"type": "string"
}
]
}
this should work
I'll edit it into my answer. If it works for you, please choose my answer :)

Query performance when applying the "Great mapping refactoring"

Our applications' entities are dynamic, we don't know how many properties they'll have or what their type will be.
Up until now, we've indexed our data in the following way:
{
"message": "some string",
"count": 1,
"date": "2015-06-01"
}
After reading the following blog:
We've understood that it's better to index the data like this:
{
"data": [
{
"key": "message",
"str_val": "some string"
},
{
"key": "count",
"int_val": 1
},
{
"key": "date",
"date_val": "2015-06-01"
}
]
}
We were wondering how this index would behave with nested aggregations.
Will the mapping refactoring above hurt indexing time (and/or query and aggregation time), given that every entity will now be nested one level deeper?
We have thousands of different object types, so our mapping file is huge. That slows down indexing, so a mapping refactoring is highly necessary.
Are you aware of any disadvantages to refactoring our mapping as explained in the blog above?
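For reference, the transformation from the flat document to the key/value layout can be sketched in Python as below. This is only an assumption about how the conversion might be done: the helper names to_key_value and is_iso_date are made up, and bool_val and float_val are hypothetical field names that do not appear in the example above.

```python
from datetime import date


def is_iso_date(value) -> bool:
    """True if value is a string that parses as an ISO date such as 2015-06-01."""
    if not isinstance(value, str):
        return False
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def to_key_value(doc: dict) -> dict:
    """Turn {"field": value} pairs into {"data": [{"key": ..., "<type>_val": ...}]}."""
    entries = []
    for key, value in doc.items():
        # bool must be checked before int, because bool is a subclass of int.
        if isinstance(value, bool):
            typed_field = "bool_val"
        elif isinstance(value, int):
            typed_field = "int_val"
        elif isinstance(value, float):
            typed_field = "float_val"
        elif is_iso_date(value):
            typed_field = "date_val"
        else:
            typed_field = "str_val"
            value = str(value)
        entries.append({"key": key, typed_field: value})
    return {"data": entries}


flat = {"message": "some string", "count": 1, "date": "2015-06-01"}
refactored = to_key_value(flat)
print(refactored)
```

The mapping then only needs the fixed set of typed fields, however many distinct keys the entities have.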

How to get name/confidence individually from classify_text?

Most of the other methods in the language api, such as analyze_syntax, analyze_sentiment etc, have the ability to return the constituent elements like
sentiment.score
sentiment.magnitude
token.part_of_speech.tag
etc.
but I have not found a way to return name and confidence in isolation from classify_text. It doesn't look like it's possible, but that seems weird. Am I missing something? Thanks
The language.documents.classifyText method returns a ClassificationCategory object which contains name and confidence. If you only want one of the fields you can filter by categories/name or categories/confidence. As an example I executed:
POST https://language.googleapis.com/v1/documents:classifyText?fields=categories%2Fname&key={YOUR_API_KEY}
{
"document": {
"content": "this is a test for a StackOverflow question. I get an error because I need more words in the document and I don't know what else to say",
"type": "PLAIN_TEXT"
}
}
Which returns:
{
"categories": [
{
"name": "/Science/Computer Science"
},
{
"name": "/Computers & Electronics/Programming"
},
{
"name": "/Jobs & Education"
}
]
}
Direct link to API explorer for interactive testing of my example (change content, filters, etc.)
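If the unfiltered response is already in hand, the two fields can also be separated on the client side. The Python sketch below works on the raw JSON rather than on the client library's objects; the category names are the ones returned above, while the confidence numbers are placeholders and not real API output.

```python
import json

# Shape of an unfiltered classifyText response. The names come from the
# example above; the confidence values are placeholders, not real output.
response_text = """
{
  "categories": [
    {"name": "/Science/Computer Science", "confidence": 0.5},
    {"name": "/Computers & Electronics/Programming", "confidence": 0.5},
    {"name": "/Jobs & Education", "confidence": 0.5}
  ]
}
"""

categories = json.loads(response_text).get("categories", [])
names = [category["name"] for category in categories]
confidences = [category["confidence"] for category in categories]
print(names)
print(confidences)
```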

Index main-object, sub-objects, and do a search on sub-objects (that returns sub-objects)

I have an object like the one below (simplified here). Each strain has many chromosomes, which have many loci, which have many features, which have many products, and so on. Here I just put one of each.
The structure in json is:
{
"name": "my strain",
"public": false,
"authorized_users": [1, 23, 51],
"chromosomes": [
{
"name": "C1",
"locus": [
{
"name": "locus1",
"features": [
{
"name": "feature1",
"products": [
{
"name": "product1"
//...
}
]
}
]
}
]
}
]
}
I want to add this object to Elasticsearch. For the moment I have added the objects separately: locus, features, and products. Searching works fine (I want to type a keyword and look in the names of loci, features, and products), but I need to duplicate data such as public and authorized_users in each sub-object.
Can I index the whole object in Elasticsearch and search at each level (locus, features, and products), getting the sub-objects back individually (without returning the Strain object)?
Yes, you can search at any level (i.e., with a query on a field like "chromosomes.locus.name").
But as you have arrays at each level, you will have to use nested objects (and nested queries) to get exactly what you want, which is a bit more complex:
https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html
https://www.elastic.co/guide/en/elasticsearch/reference/5.3/query-dsl-nested-query.html
As for your last question: no, you cannot get sub-objects individually; Elasticsearch returns the whole JSON source object.
If you want only data from sub-objects, you will have to use nested aggregations.
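A request body for such a multi-level nested query could be assembled as in the Python sketch below. It assumes that both chromosomes and chromosomes.locus are mapped with type nested, and the helper name nested_name_query is made up for this example; it only builds the JSON body and does not talk to a cluster.

```python
import json


def nested_name_query(levels: list, keyword: str) -> dict:
    """Build a nested query matching keyword against name at the deepest level."""
    path = ".".join(levels)
    query = {"match": {f"{path}.name": keyword}}
    # Wrap from the innermost path outward: every nested level on the way
    # down needs its own nested clause.
    for depth in range(len(levels), 0, -1):
        query = {"nested": {"path": ".".join(levels[:depth]), "query": query}}
    return {"query": query}


body = nested_name_query(["chromosomes", "locus"], "locus1")
print(json.dumps(body, indent=2))
```

The same helper would cover features and products by extending the list of levels, provided those levels are mapped as nested too.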

Which is the better design for this API response

I'm trying to decide upon the best format of response for my API. I need to return a reports response which provides information on the report itself and the fields contained on it. Fields can be of differing types, so there can be: SelectList; TextArea; Location etc..
They each use different properties, so "SelectList" might use "Value" to store its string value and "Location" might use "ChildItems" to hold "Longitude" "Latitude" etc.
Here's what I mean:
"ReportList": [
{
"Fields": [
{
"Id": {},
"Label": "",
"Value": "",
"FieldType": "",
"FieldBankFieldId": {},
"ChildItems": [
{
"Item": "",
"Value": ""
}
]
}
]
}
]
The problem with this is I'm expecting the users to know when a value is supposed to be null. So I'm expecting a person looking to extract the value from "Location" to extract it from "ChildItems" and not "Value". The benefit to this however, is it's much easier to query for things than the alternative which is the following:
"ReportList": [
{
"Fields": [
{
"SelectList": [
{
"Id": {},
"Label": "",
"Value": ""
}
],
"Location": [
{
"Id": {},
"Label": "",
"Latitude": "",
"Longitude": "",
"etc": ""
}
]
}
]
}
]
So this one is a report list containing a list of fields, each of which in turn contains one list per field type I have (15 or so). This is opposed to just having a list of reports, each with a list of fields carrying a "FieldType" enum, which I think is fairly easy to manipulate.
So the question: which format is best for a response? Any alternatives and comments are appreciated.
EDIT:
To query all fields by field type in a report and get their values, the first way would go something like this:
foreach (var field in fields)
{
    switch (field.FieldType)
    {
        // Location keeps its data in ChildItems, not in Value.
        case FieldType.Location:
            var locationValue = field.ChildItems;
            break;
        // SelectList keeps its data in Value.
        case FieldType.SelectList:
            var selectListValue = field.Value;
            break;
    }
}
The second one would be like:
foreach (var field in fields)
{
    // One typed list per field type, so no switch is needed.
    foreach (var location in field.Locations)
    {
        var latitude = location.Latitude;
    }
    foreach (var selectList in field.SelectLists)
    {
        var value = selectList.Value;
    }
}
I think the right answer is the first one, with the switch statement. It makes it easier to query for things like "get me the value of the field with this GUID as its id". It just means putting it through a big switch statement.
I went with the first one because it's easier to query for the most common use case. I expect client code to map it into its own schema if it wants to change it.
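A client-side version of that lookup against the first format might look like the Python sketch below. The sample field values and the helper names field_value and find_value_by_id are invented for illustration; only the property names (Id, Label, Value, FieldType, ChildItems, Item) come from the format above.

```python
def field_value(field: dict):
    """Return the payload of a field, reading the property its FieldType uses."""
    field_type = field["FieldType"]
    if field_type == "Location":
        # Location keeps its parts (Latitude, Longitude, ...) in ChildItems.
        return {item["Item"]: item["Value"] for item in field["ChildItems"]}
    if field_type == "SelectList":
        return field["Value"]
    raise ValueError(f"unhandled field type: {field_type}")


def find_value_by_id(fields: list, field_id: str):
    """Look up one field by Id and return its value, or None if it is absent."""
    for field in fields:
        if field["Id"] == field_id:
            return field_value(field)
    return None


# Made-up sample data in the shape of the first response format.
fields = [
    {"Id": "a", "Label": "Colour", "FieldType": "SelectList",
     "Value": "Red", "ChildItems": []},
    {"Id": "b", "Label": "Site", "FieldType": "Location", "Value": None,
     "ChildItems": [{"Item": "Latitude", "Value": "51.5"},
                    {"Item": "Longitude", "Value": "-0.1"}]},
]

print(find_value_by_id(fields, "a"))
print(find_value_by_id(fields, "b"))
```

The lookup by id stays a single loop over one list, which is the "most common use case" argument for the first format; the per-type branching is confined to field_value.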
