Null pointer exception while consuming streams - java-8

{
  "rules": [
    {
      "rank": 1,
      "grades": [
        {
          "id": 100,
          "hierarchyCode": 32
        },
        {
          "id": 200,
          "hierarchyCode": 33
        }
      ]
    },
    {
      "rank": 2,
      "grades": []
    }
  ]
}
I have a JSON like the one above and I'm using streams to return the "hierarchyCode" based on some condition. For example, if I pass "200", my result should print 33. So far I have something like this:
request.getRules().stream()
    .flatMap(ruleDTO -> ruleDTO.getGrades().stream())
    .map(gradeDTO -> gradeDTO.getHierarchyCode())
    .forEach(hierarchyCode -> {
        // I'm doing some business logic here
        Optional<SomePojo> dsf = someList.stream()
            .filter(pojo -> hierarchyCode.equals(pojo.getId())) // lets say pojo.getId() returns 200
            .findFirst();
        System.out.println(dsf.get().getCode());
    });
So in the first iteration it returns the expected 33, but in the second iteration it fails with a NullPointerException instead of just skipping the loop, since the "grades" array is empty this time. How do I handle the null pointer exception here?

You can use the code snippet below (Java 8):
int result;
int valueToFilter = 200;
List<Grade> gradeList = data.getRules().stream()
        .map(Rule::getGrades)
        .filter(x -> x != null && !x.isEmpty())
        .flatMap(Collection::stream)
        .collect(Collectors.toList());
Optional<Grade> optional = gradeList.stream()
        .filter(x -> x.getId() == valueToFilter)
        .findFirst();
if (optional.isPresent()) {
    result = optional.get().getHierarchyCode();
    System.out.println(result);
}
I have created POJOs according to my code; you can try this approach with your own code structure.
If you need the POJOs for this code, a rough sketch follows.
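These are hypothetical POJOs inferred from the accessors used above (getRules, getGrades, getId, getHierarchyCode); adjust the class names, field names, and types to your actual DTOs:

import java.util.List;

class Request {
    private List<Rule> rules;
    public List<Rule> getRules() { return rules; }
}

class Rule {
    private int rank;
    private List<Grade> grades;           // may be null or empty, hence the filter above
    public int getRank() { return rank; }
    public List<Grade> getGrades() { return grades; }
}

class Grade {
    private int id;
    private int hierarchyCode;
    public int getId() { return id; }
    public int getHierarchyCode() { return hierarchyCode; }
}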
Thanks,
Girdhar

Related

ElasticSearch Painless: using vector functions in for loops bug

I ran into what seems to be a bug in Painless where, if a vector function is used, say l2norm(), the outcome remains the same as in the first iteration. I'm using the Painless script in a function score; I hope the query below sheds some light. I'm using the "exception" to see what the value is in each iteration, and it is every time the score of the first vector. I know this because I cycled the parameters a couple of times, and the score is every time "stuck" on the first one. So what I think is happening is that l2norm() (and all vector functions?) are object instances that can only be instantiated one time? If that is the case, what would be a workaround?
Link to the ES discussion: https://discuss.elastic.co/t/painless-bug-using-for-loops-and-vector-functions/267263
{
  "query": {
    "nested": {
      "path": "media",
      "query": {
        "function_score": {
          "boost_mode": "replace",
          "query": {
            "bool": {
              "filter": [{
                "exists": {
                  "field": "media.full_body_dense_vector"
                }
              }]
            }
          },
          "functions": [{
            "script_score": {
              "script": {
                "source": "if (params.filterVectors.size() > 0 && params.filterCutOffScore >= 0) {\n for (int i=0; i < params.filterVectors.size();i++) {\n def c = params.filterVectors[i]; double euDistance = l2norm(c, doc['media.full_body_dense_vector']);\n if (i==1) { throw new Exception(euDistance + ''); } \n }\n return 1.0f;",
                "params": {
                  "filterVectors": [
                    [1.0, 2.0, 3.0], [0.1, 0.4, 0.5]
                  ],
                  "filterCutOffScore": 1.04
                },
                "lang": "painless"
              }
            }
          }]
        }
      }
    }
  },
  "size": 500,
  "from": 0,
  "track_scores": true
}
While l2norm is a static method, it certainly shouldn't keep returning the same value for different input vectors!
I've investigated a bit and the bug seems to only show up inside loops. When you call l2norm outside of a loop, with either parametrized or hard-coded vectors, the results are always different -- as they should be. But not within the for loop (I've tested a while loop too -- same result). Here's a minimal reproducible example that could be used to report a bug on GitHub:
"script": {
"source": """
def field = doc['media.full_body_dense_vector'];
def hardcodedVectors = [ [1,2,3], [0.1,0.4,0.5] ];
def noLoopDistances = [
l2norm(hardcodedVectors[0], field),
l2norm(hardcodedVectors[1], field)
];
def hardcodedDistances = [];
for (vector in hardcodedVectors) {
double euDistance = l2norm(vector, field);
hardcodedDistances.add(euDistance);
}
def parametrizedDistances = [];
for (vector in params.filterVectors) {
double euDistance = l2norm(vector, field);
parametrizedDistances.add(euDistance);
}
def comparisonMap = [
"no-loop": noLoopDistances,
"hardcoded": hardcodedDistances,
"parametrized": parametrizedDistances
];
Debug.explain(comparisonMap);
""",
"params": {
"filterVectors": [ [1,2,3], [0.1,0.4,0.5] ]
},
"lang": "painless"
}
which yields
{
  "no-loop": [
    8.558621384311845,  // <-- the only time two different l2norm calls behave correctly
    11.071133967619906
  ],
  "parametrized": [
    8.558621384311845,
    8.558621384311845
  ],
  "hardcoded": [
    8.558621384311845,
    8.558621384311845
  ]
}
What this tells me is that it's not a matter of runtime caching but rather something else that should be investigated further by the Elastic team.
The workaround, for now, would be to keep using the parametrized vectors but instead of looping perform stone-age-like checks:
if (params.filterVectors.size() == 0) {
  // default to something
} else if (params.filterVectors.size() == 1) {
  // call l2norm once
} else if (params.filterVectors.size() == 2) {
  // call l2norm twice, separately
}
P.S. Throwing a new Exception() in order to debug Painless is fine. Using Debug.explain is even better for reasons explained in this sub-chapter on Debugging of my Elasticsearch Handbook.
First off, thanks to Joe for confirming I wasn't imagining things and that it is indeed a bug. Second, the lovely Elasticsearch team has been triaging the issue and confirmed it is a bug, so the answer to this post is a link to the GitHub issue, so that in the future people can track in which Elasticsearch version this behaviour is patched.

Get top level parent node for any node

Given the format at the end of the question, what's the best way to get the top-level name for a given item?
Top-level names are the ones with parentId = 1.
def getTopLevel(name: String): String = {
  // Environment(150) -> Environment(150) - since its parentId is 1
  // Assassination -> Security - since Assassination(12) -> Terrorism(10) -> Security(2)
}
Here's my current approach but is there something better?
unmapped = categories.size
Loop through the list while there are still unmapped items:
- build a Map(Int, String) for top levels.
- build a Map(Int, Int) that maps an id to its top-level id.
- keep track of unmapped items.
Once the loop exits, I can use both Maps to get the job done.
[
  {
    "name": "Destination Overview",
    "id": 1,
    "parentId": null
  },
  {
    "name": "Environment",
    "id": 150,
    "parentId": 1
  },
  {
    "name": "Security",
    "id": 2,
    "parentId": 1
  },
  {
    "name": "Armed Conflict",
    "id": 10223,
    "parentId": 2
  },
  {
    "name": "Civil Unrest",
    "id": 21,
    "parentId": 2
  },
  {
    "name": "Terrorism",
    "id": 10,
    "parentId": 2
  },
  {
    "name": "Assassination",
    "id": 12,
    "parentId": 10
  }
]
This is actually two questions:
1. parsing the JSON into a Scala collection, and
2. using that collection to trace items back to the top-level parent.
For the first question, you can use play-json. The second part can be handled with a tail-recursive function. Here is the full program that solves both problems:
import play.api.libs.json.{Json, Reads}

case class Node(name: String, id: Int, parentId: Option[Int])

object JsonParentFinder {

  def main(args: Array[String]): Unit = {
    val s =
      """
        |[
        |  {
        |    "name": "Destination Overview",
        |    "id": 1,
        |    "parentId": null
        |  },
        |  {
        |    "name": "Environment",
        |    "id": 150,
        |    "parentId": 1
        |  },
        // rest of the json
        |]
        |""".stripMargin

    implicit val NodeReads: Reads[Node] = Json.reads[Node]

    val r = Json.parse(s).as[Seq[Node]]
      .map(x => x.id -> x).toMap

    println(getTopLevelNode(150, r))
    println(getTopLevelNode(12, r))
  }

  def getTopLevelNode(itemId: Int, nodes: Map[Int, Node], path: List[Node] = List.empty[Node]): List[Node] = {
    if (nodes(itemId).id == 1)
      nodes(itemId) +: path
    else
      getTopLevelNode(nodes(nodes(itemId).parentId.get).id, nodes, nodes(itemId) +: path)
  }
}
Output will be:
List(Node(Destination Overview,1,None), Node(Environment,150,Some(1)))
List(Node(Destination Overview,1,None), Node(Security,2,Some(1)), Node(Terrorism,10,Some(2)), Node(Assassination,12,Some(10)))
A few notes:
I have not implemented comprehensive error-handling logic. The implicit assumption is that the only item with parentId==None is the root node. nodes(itemId).parentId.get could lead to failure.
Also, in creating the map, the assumption is that all items have unique ids.
Another assumption is that all nodes eventually have a path to the root node. If that is not the case, this will fail. But it should be straightforward to fix these cases by adding more stop conditions (see the sketch after these notes).
I am prepending items to the accumulator list (named path here) because the prepend operation on Scala's lists takes constant time. You can just reverse the resulting list or use another data structure like Vector to efficiently build the path.
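To illustrate those extra stop conditions, here is a minimal defensive sketch (my own naming, not part of the answer above) that could sit next to getTopLevelNode inside JsonParentFinder. It returns None for ids that are missing from the map instead of throwing, and stops at any node whose parentId is None rather than assuming the root always has id == 1:

import scala.annotation.tailrec

@tailrec
def getTopLevelNodeSafe(itemId: Int, nodes: Map[Int, Node], path: List[Node] = Nil): Option[List[Node]] =
  nodes.get(itemId) match {
    case None => None                        // unknown id: no path to a root
    case Some(node) => node.parentId match {
      case None      => Some(node :: path)   // reached a root node
      case Some(pid) => getTopLevelNodeSafe(pid, nodes, node :: path)
    }
  }

Calling it with an id that is not in the map then yields None instead of a NoSuchElementException.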

Elasticsearch aggregation float type losing precision

If you use Elasticsearch 5.5 with dynamic field mapping and index double values, these values get the float type when I check the mappings. When you then run an aggregation, the keys in the buckets lose precision: the value 0.62 comes back as something like 0.6200000047683716.
Code fragment
"aggregations": {
"float_numbers": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 0.6200000047683716,
"doc_count": 1
}
]
}
}
Here is the same issue described.
link
I am posting this because I found an appropriate solution that I have not yet seen elsewhere, and it helped me a lot.
The solution is to make the float a double. This can be achieved with Dynamic templates.
dynamic templates
dynamic field mapping
Example solution:
Add dynamic_templates to the index while there are no documents in it yet.
PUT term-test
{
  "mappings": {
    "demo_typ": {
      "dynamic_templates": [
        {
          "all_to_double": {
            "match_mapping_type": "double",
            "mapping": {
              "type": "double"
            }
          }
        }
      ]
    }
  }
}
Add data
POST term-test/demo_typ
{
  "numeric_field": 0.62,
  "long_filed": 44
}
Check mapping
GET term-test/_mapping
Do aggregation
GET term-test/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "float_numbers": {
      "terms": {
        "field": "numeric_field"
      }
    }
  }
}
In the Java API you can do the following:
1: First create the index
elasticClient.admin()
    .indices()
    .prepareCreate(indexName)
    .execute()
    .actionGet();
2: Update the mapping
JSON
{
  "dynamic_templates": [
    {
      "all_to_double": {
        "match_mapping_type": "double",
        "mapping": {
          "type": "double"
        }
      }
    }
  ]
}
To convert the JSON to an XContentBuilder, I used the code from link:
public XContentBuilder getXContentBuilderFromJson(final String json) {
    try {
        Map<String, Object> map = new ObjectMapper().readValue(json, new TypeReference<Map<String, Object>>() {});
        return XContentFactory.jsonBuilder().map(map);
    } catch (IOException e) {
        e.printStackTrace();
        return null;
    }
}
Update mapping
elasticClient.admin().indices()
    .preparePutMapping(indexName)
    .setType(yourType)
    .setSource(getXContentBuilderFromJson(json))
    .execute()
    .actionGet();
3: Insert data
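A minimal sketch of this step, assuming the same 5.x transport client as above and Elasticsearch 5.3+ (where setSource accepts an XContentType); indexName and yourType are the same variables as in the previous snippets, and the document JSON is just the example data from earlier:

// XContentType comes from org.elasticsearch.common.xcontent.XContentType
String docJson = "{\"numeric_field\": 0.62, \"long_filed\": 44}";

elasticClient.prepareIndex(indexName, yourType)
    .setSource(docJson, XContentType.JSON)
    .execute()
    .actionGet();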
Numbers lose precision because of how binary floating-point works: a value like 9.62 cannot be expressed exactly as a * 2^b (with integers a and b), so neither doubles nor floats can represent it accurately.
Because floats and doubles cannot accurately represent a value, it is generally a bad idea to run terms aggregations on them.
As a workaround you can apply Math.round after you have run the aggregation.
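For example, rounding the bucket key from above to two decimals in plain Java:

double key = 0.6200000047683716;                   // bucket key returned by the terms aggregation
double rounded = Math.round(key * 100.0) / 100.0;  // 0.62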

GraphQL fallback query if no results

I have the following query:
{
  entity(id: "theId") {
    source1: media(source: 1) {
      images {
        src, alt
      }
    }
    source2: media(source: 2) {
      images {
        src, alt
      }
    }
  }
}
That gives me a result like:
{
  "entity": [
    {
      "source1": {
        "images": [{"src": "", "alt": ""}]
      },
      "source2": {
        "images": [{"src": "", "alt": ""}]
      }
    }
  ]
}
Is there a way to get a single result from source1 and source2: execute source1 first and, if it has no result, use source2 as a fallback?
You are querying two fields (source1, source2), so something has to come back for both of them (null being a possible option). If you want to check them in sequence, you should probably break the query in two and run them one at a time from the client.
Could you perhaps change it so you only query a single source field and have the resolver (on the server) return what makes sense based on what is available, so to speak? Like this:
{
  entity(id: "theId") {
    source: media(sourcesList: [1, 2]) {
      images {
        src, alt
      }
    }
  }
}
where sourcesList is the list of sources to try, in order. The resolver (server) can then check whether source 1 is available and, if not, return source 2.
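For illustration, a resolver along those lines could look roughly like this (a TypeScript sketch for a JavaScript GraphQL server; fetchMediaForSource is a hypothetical data-access helper, not part of your schema):

const resolvers = {
  Entity: {
    media: async (entity: { id: string }, args: { sourcesList: number[] }, context: any) => {
      // Try each source in the requested order and return the first one
      // that actually has images.
      for (const source of args.sourcesList) {
        const media = await context.fetchMediaForSource(entity.id, source); // hypothetical helper
        if (media && media.images && media.images.length > 0) {
          return { ...media, sourceNumberReturned: source };
        }
      }
      return null; // no source had images
    },
  },
};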
You could also add a field to let the client know which source was actually returned from the proposed list (sourceNumberReturned below would return 1 if source 1 was returned, otherwise 2).
{
  entity(id: "theId") {
    source: media(sourcesList: [1, 2]) {
      images {
        src, alt
      }
      sourceNumberReturned
    }
  }
}

Unmarshal custom types with jsonpb

What's the best way to convert this json object to protobuf?
JSON:
{
  "name": "test",
  "_list": {
    "some1": { "value": 1 },
    "some2": [
      { "value": 2 },
      { "value": 3 }
    ]
  }
}
Proto:
message Something {
  string name = 1;

  message ListType {
    repeated string values = 1;
  }

  map<string, ListType> _list = 2;
}
Without having the _list in the message I would use jsonpb.Unmarshal, but I can't think of a way to define the Unmarshaler interface on a type that is generated in a different package.
I also thought of having _list as an Any (json.RawMessage) and handling it after the Unmarshal, but I can't make this work; the error message is: Any JSON doesn't have '@type'.
With _list being inconsistent (not just a list of strings, a map of values, etc.) and since you mention you looked into using Any, you could consider making your message:
message Something {
  string name = 1;
  google.protobuf.Struct _list = 2;
}
https://github.com/golang/protobuf/blob/master/ptypes/struct/struct.proto
With that you can marshal/unmarshal JSON to/from proto messages using github.com/golang/protobuf/jsonpb, which was actually designed for use with the grpc-gateway, but you can use it here too.
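A rough sketch of how that could look (the import path and the generated Something type are assumptions based on the message definition above):

package main

import (
	"fmt"
	"strings"

	"github.com/golang/protobuf/jsonpb"

	// Hypothetical import path for the code generated from the proto above,
	// with _list declared as a google.protobuf.Struct.
	pb "example.com/yourapp/somethingpb"
)

func main() {
	data := `{"name":"test","_list":{"some1":{"value":1},"some2":[{"value":2},{"value":3}]}}`

	var msg pb.Something
	// jsonpb understands google.protobuf.Struct, so the arbitrary JSON under
	// _list is unmarshalled without a custom Unmarshaler.
	if err := jsonpb.Unmarshal(strings.NewReader(data), &msg); err != nil {
		panic(err)
	}
	fmt.Println(msg.GetName()) // "test"
}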
