Why doesn't JaVers report the correct row(s) that were added when comparing two objects?

When comparing two objects of the same size, JaVers compares them 1-to-1. However, if something new is added, such as a new row in one of the objects, the comparison reports changes that are NOT real changes. Is it possible to have JaVers ignore the addition/deletion for the sake of comparing only like objects?
Basically, the indices get out of sync.
Row  Name  Age  Phone(Cell/Work)
1    Jo    20   123
2    Sam   25   133
3    Rick  30   152
4    Rick  30   145
New list:
Row  Name  Age  Phone(Cell/Work)
1    Jo    20   123
2    Sam   25   133
3    Bill  30   170
4    Rick  30   152
5    Rick  30   145
Because Bill is added, the new comparison result says that rows 4 and 5 have changed when they actually didn't.
Thanks.

I'm guessing that your 'rows' are objects representing rows in an Excel table, that you have mapped them as ValueObjects, and that you have put them into a list.
Since ValueObjects don't have their own identity, it's unclear, even for a human, what the actual change was. Take a look at your row 4:
Row  Name  Age  Phone(Cell/Work)
before:
4    Rick  30   145
after:
4    Rick  30   152
Did you change the phone at row 4 from 145 to 152? Or did you insert new data at row 4? How can we know?
We can't. By default, JaVers chooses the simplest answer, so it reports a value change at index 4.
If you don't care about the indices, you can change the list comparison algorithm from Simple to Levenshtein distance. See https://javers.org/documentation/diff-configuration/#list-algorithms
The SIMPLE algorithm generates changes for shifted elements (when elements are inserted or removed in the middle of a list). In contrast, the Levenshtein algorithm calculates a short and clear change list even when elements are shifted; it doesn't care about index changes for shifted elements.
But I'm not sure whether Levenshtein is implemented for ValueObjects; if it isn't yet, that's a feature request for javers-core.
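The effect of a Levenshtein-style comparison can be sketched in plain Python with the standard library's difflib (this illustrates the alignment idea only; it is not JaVers code, where the switch is made through the list-compare configuration described at the link above). Aligning the two lists by content instead of by index yields a single insertion and leaves the shifted rows untouched:

```python
from difflib import SequenceMatcher

# Rows from the question, modeled as (name, age, phone) tuples.
old = [("Jo", 20, 123), ("Sam", 25, 133), ("Rick", 30, 152), ("Rick", 30, 145)]
new = [("Jo", 20, 123), ("Sam", 25, 133), ("Bill", 30, 170),
       ("Rick", 30, 152), ("Rick", 30, 145)]

# Keep only the non-trivial edit operations from the alignment.
ops = [op for op in SequenceMatcher(None, old, new).get_opcodes()
       if op[0] != "equal"]
print(ops)  # [('insert', 2, 2, 2, 3)]
```

The edit script contains one insert at index 2 and no spurious value changes at rows 4 and 5, which is exactly the behavior the question asks for.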

Related

Generate a team from a list of unique players based on priority number. 3 Arrays of signups for different role

I'm trying to create an application that forms a team of 4 people in a shooter game.
There are 3 roles for 4 players. We need 2x assault, 1x sniper and 1x medic in a team.
I would be choosing players from 3 arrays, each array contains signups for that role (playername and priority number). Players can signup for multiple roles.
Sniper[0] John 100
Sniper[1] Mort 91
Sniper[2] Stef 70
Medic[0] Jerry 92
Medic[1] Mort 91
Medic[2] Jambo 19
Assault[0] Jerry 92
Assault[1] Haler 91
Assault[2] Gowgow 79
Assault[3] Jambo 19
This is how the three arrays would look.
The selection in this case should be:
Sniper - John 100
Medic - Mort 91
Assault1 - Jerry 92
Assault2 - Haler 91
The application should always try to select the people with the highest priority for the available roles.
If anyone could at least point me in the right direction on how to solve this, I'd appreciate it. I'm really stuck, as I have no idea how to do it and I don't even know what to search for online to learn.
I solved my selection problem with the Hungarian algorithm (also known as Munkres).
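The Hungarian/Munkres algorithm is the scalable answer, but at this size the idea can also be shown with a brute-force sketch in Python (hypothetical code using the signup data from the question, not the actual solution): enumerate every role assignment with four distinct players and keep the one with the highest total priority.

```python
from itertools import product

# Signups per role as (name, priority), taken from the question.
sniper  = [("John", 100), ("Mort", 91), ("Stef", 70)]
medic   = [("Jerry", 92), ("Mort", 91), ("Jambo", 19)]
assault = [("Jerry", 92), ("Haler", 91), ("Gowgow", 79), ("Jambo", 19)]

def best_team():
    """Try every (sniper, medic, assault, assault) combination with four
    distinct players and keep the one with the highest total priority."""
    best, best_total = None, -1
    for s, m, a1, a2 in product(sniper, medic, assault, assault):
        if a1[0] >= a2[0]:
            continue                # skip same-player and unordered assault pairs
        if len({s[0], m[0], a1[0], a2[0]}) < 4:
            continue                # each player may fill only one role
        total = s[1] + m[1] + a1[1] + a2[1]
        if total > best_total:
            best = {"sniper": s[0], "medic": m[0],
                    "assault": [a1[0], a2[0]]}
            best_total = total
    return best, best_total

team, total = best_team()
print(team, total)
```

For the sample data this picks John as sniper, Mort as medic, and Jerry plus Haler as assaults, matching the expected selection; for large rosters the Hungarian algorithm avoids the combinatorial blow-up.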

What is the reason behind splitting data using mix and max?

I know how Sqoop splits the work among the mappers; it essentially uses this logic:
SELECT MIN(id), MAX(id) FROM (Select * From myTable WHERE (1 = 1) ) t1
where id is the column given in --split-by. I also know that I can change this logic using --boundary-query.
I am trying to understand the reason behind this logic, because what happens if, for example, the values of the key column are not uniformly distributed? Say I have these records and I want to run with 5 mappers (OK, it's just an example):
id_column: 1, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211
splits: (211 - 1) / 5 = 42
mapper1 = from 1 to 42 ==> 1 record processed
mapper2 = from 42 to 84 ==> 0 records processed
mapper3 = from 84 to 126 ==> 0 records processed
mapper4 = from 126 to 168 ==> 0 records processed
mapper5 = from 168 to 211 ==> 12 records processed
Maybe I made a mistake in the example, but the point is that the work will be unbalanced among the mappers. With a few records that's not a big deal, but when we are talking about millions of records it will definitely impact performance.
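The skew in the example can be reproduced with a short Python sketch of the min/max splitting logic (an illustration of the idea, not Sqoop's actual code):

```python
# Skewed key column from the example: 1, 200, 201, ..., 211
ids = [1] + list(range(200, 212))
num_mappers = 5

lo, hi = min(ids), max(ids)        # what SELECT MIN(id), MAX(id) returns
size = (hi - lo) / num_mappers     # (211 - 1) / 5 = 42.0
bounds = [lo + size * i for i in range(num_mappers + 1)]

counts = []
for i in range(num_mappers):
    last = (i == num_mappers - 1)  # the last split includes the upper bound
    counts.append(sum(
        1 for v in ids
        if bounds[i] <= v and (v <= bounds[i + 1] if last else v < bounds[i + 1])
    ))
print(counts)  # most mappers get nothing to do
```

Because the boundaries come only from MIN and MAX, the middle mappers receive zero records while the last one gets almost everything, which is exactly the imbalance described above.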
That being said, I want to know two things:
What is the idea behind this logic? (Maybe there is something I am not seeing.)
Do you have an idea how I can build a workaround when the id column is not uniformly distributed, as in the example?
What is the idea behind the logic mentioned?
The idea is to use the primary key as the split-by column (if available). Generally, primary keys are uniformly distributed, so splitting the data into equal parts solves the problem in a generic way. Also, the MIN() and MAX() functions are available in almost every RDBMS.
Say I come up with a new property which solves your problem with 2 mappers.
--mapper-range m1=1-10,m2=200-220
mapper1 = from 1 to 10 ==> 1 record processed
mapper2 = from 200 to 220 ==> 12 records processed
It would not be very difficult for the Sqoop developers to override their range query for mappers using my new magical property.
But as we are talking about big data here, suppose you have 1 billion records. It is very costly to find the distribution of the split-by column's values, because you would need to process the whole data set to do it. I guess nobody is interested in buying my magical property at that cost.
Please share your thoughts if you have anything better in mind.

TreeModel JS logic architecture (theoretical)

I'm really new to tree structures and linked lists, and I'm facing a theoretical problem. Let's say I decide to use TreeModel; looking at the sample, you basically order the tree like:
Tree
  node 1
    11
    12
      121
      122
  node 2
    21
      211
...and so on
Considering the numbers are node ids, how would I manage them once they happen to have 2 or more digits?
node 10
  101
    1011
    1012
  102
    1021
    1022
      10221
And so on? In pseudocode, how do I keep track of this? Meaning, how do I get all third-level nodes of a node? (>100 for the first 9 and >1000 for the rest???) This is actually my question.
I would appreciate any clarification.
TreeModel does not depend on any specific node id format; the numbers shown on the library demo page are just for illustration. Would it cause less confusion if, instead of those numbers, you had string ids separated by underscores?
1
1_1
1_2
...
10
10_1
10_2
Also note that TreeModel was not designed for binary trees and so it does not support in-order traversal.
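To show that the id format is irrelevant, here is a plain Python sketch (not TreeModel's API, and the tree shape is hypothetical) that collects all nodes a given number of levels below a node by following parent/child links; the ids are opaque strings:

```python
# Hypothetical tree; depth comes from the child links, never from the ids.
tree = {"id": "10", "children": [
    {"id": "10_1", "children": [
        {"id": "10_1_1", "children": []},
        {"id": "10_1_2", "children": []},
    ]},
    {"id": "10_2", "children": [
        {"id": "10_2_1", "children": []},
        {"id": "10_2_2", "children": [
            {"id": "10_2_2_1", "children": []},
        ]},
    ]},
]}

def nodes_at_depth(node, depth, current=0):
    """Collect ids of all nodes exactly `depth` levels below `node`."""
    if current == depth:
        return [node["id"]]
    result = []
    for child in node["children"]:
        result.extend(nodes_at_depth(child, depth, current + 1))
    return result

level_two = nodes_at_depth(tree, 2)
print(level_two)
```

However long the ids get, the traversal never parses them; in TreeModel you would do the same walk over each node's children.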

Calculating an average metric in GoodData

Based on GoodData's excellent suggestion for implementing fact tables, I have been able to design a model that meets our client's requirements for joining different attributes across different tables. The issue I have now is that the model's metrics are highly denormalized, with data repeating itself. I am currently trying to figure out a way to dedupe the results.
For example, I have two tables—the first is a NAMES table and the second is my fact table:
NAMES
Val2  Name
35    John
36    Bill
37    Sally

FACT
VAL1  VAL2  SCORE  COURSEGRADE
1     35    50     90%
2     35    50     80%
3     35    50     60%
4     36    10     75%
5     37    40     95%
What I am trying to do is write a metric so that we can get an average of SCORE that eliminates the duplicate values. GoodData is excellent in that it can actually give me back the unique results using the COUNT(VARIABLE1, RECORD) metric, but I can't seem to get the average score to stick when eliminating the breakout information. If I keep all fields (including VAL2), it shows me everything:
VAL2  SCORE(AVG)
35    50
36    10
37    40
AVG: 33.33
But when I remove VAL2, I suddenly lose the "uniqueness" of the record.
SCORE(AVG)
40
What I want is the score of 33.33 we got above.
I’ve tried using a BY statement in my SELECT AVG(SCORE), but this doesn’t seem to work. It’s almost like I need some kind of DISTINCT clause. Any thoughts on how to get that rollup value shown in my first example above?
Happy to help here. I would try the following:
Create an intermediate metric (let's call it Score by Employee):
SELECT MIN( SCORE ) BY ID ALL IN ALL OTHER DIMENSIONS
Then, once you have this metric defined you should be able to create a metric for the average score as follows:
SELECT AVG( Score by Employee )
The reason we create the first metric is to force the table to normalize score around the ID attribute which gets rid of duplicates when we use this in the next metric (we could have used MAX or AVG also, it doesn't matter).
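A plain Python sketch of what the two metrics accomplish, using the fact rows from the question (this mimics the MAQL logic, it is not GoodData code): normalize SCORE per id first, then average the normalized values.

```python
# Fact rows from the question: (VAL1, VAL2, SCORE)
fact = [
    (1, 35, 50), (2, 35, 50), (3, 35, 50),
    (4, 36, 10), (5, 37, 40),
]

# Step 1: MIN(SCORE) BY id -- since the duplicates carry identical scores,
# MIN, MAX, or AVG would all give the same per-id value.
score_by_id = {}
for _, val2, score in fact:
    score_by_id[val2] = min(score, score_by_id.get(val2, score))

# Step 2: average the deduplicated per-id scores.
average = sum(score_by_id.values()) / len(score_by_id)
print(round(average, 2))  # 33.33
```

Collapsing to one score per id before averaging is what recovers the 33.33 rollup instead of the duplicate-weighted 40.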
Hopefully this solves your issue, let me know if it doesn't work and I'll be happy to help out more. Also feel free to check out GoodData's Developer Portal for more information about reporting:
https://developer.gooddata.com/docs/reporting
Best,
JT
You should definitely check out the "How to build a metric in a metric" presentation by Petr Olmer (http://www.slideshare.net/petrolmer/in10-how-to-build-a-metric-in-a-metric).
It can help you understand this better.
Cheers,
Peter

Sorting multiple NSMutableArrays

I have high scores (name, score, time) stored in a single file, and I divide them into three separate arrays after reading them. The only problem is I can't figure out a way to sort all three by score and time, from least to greatest, while still keeping each row's values together.
For example:
Name    Score  Time
-------------------
nathan  123    01:12
bob     321    55:32
frank   222    44:44

turns to:

bob     123    01:12
frank   222    44:44
nathan  321    55:32
Encapsulate the data into a single object (HighScore) with three properties: name, time, and score. Then store the objects in a single array and sort that one array.
Welcome to object-oriented programming.
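A minimal sketch of that idea in Python (the Objective-C version would sort an array of HighScore objects with a comparator, but the principle is identical): each record is one object, so sorting can never scramble the columns.

```python
from dataclasses import dataclass

@dataclass
class HighScore:
    name: str
    score: int
    time: str  # "MM:SS", zero-padded, so string comparison matches time order

scores = [
    HighScore("nathan", 123, "01:12"),
    HighScore("bob",    321, "55:32"),
    HighScore("frank",  222, "44:44"),
]

# One array, one sort key: each row stays intact.
scores.sort(key=lambda h: (h.score, h.time))
print([h.name for h in scores])  # ['nathan', 'frank', 'bob']
```

With three parallel arrays there is no way to express "move these three values together"; with one array of objects the problem disappears.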
