openrefine - Sorting column contents within a record? - sorting

I scoured the internet as best I could but couldn't find an answer. Is there some way to sort the contents of a column within each record? For example, take the following table:
Key | Row to sort | Other row
a   | bca         | A
    | cab         | cab
    | abc         | f
b   | zyx         |
    | yxz         | u
c   | def         | h
    | fed         | h
and turn it into:
Key | Row to sort | Other row
a   | abc         | A
    | bca         | cab
    | cab         | f
b   | yxz         |
    | zyx         | u
c   | def         | h
    | fed         | h
The ultimate goal is to sort all of the columns for each record alphabetically, and then blank up so that each record is a single row.
I've tried sorting the column itself in records mode, but that reorders whole records by whichever record has an entry that comes first alphabetically (regardless of whether that entry is the record's first one or not, interestingly).

Here is a solution using sort
Prerequisite: the values in the "Key" column are assumed to be unique.
Switch to rows mode
Fill down the "Key" column via Key=> Edit cells => Fill down.
Sort the "Key" column via Key=> Sort...
Sort the "Row to sort" column via Row to sort => Sort... as an additional sort
Make the sorting permanent by selecting Reorder rows permanently in the sort menu.
Blank down the "Key" and "Row to sort" columns.
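The row operations above can be modelled on plain Python lists to check the resulting order. This sketch covers only the key and "Row to sort" columns; fill_down, blank_down and sort_within_records are hypothetical helper names, and sorting on (key, value) pairs stands in for the primary sort on "Key" plus the additional sort on "Row to sort".

```python
# Sketch of "fill down, sort by key then value, blank down" on plain lists.
# Rows are (key, value) pairs; None stands for an empty cell.

def fill_down(rows):
    """Copy the last non-empty key into the empty key cells below it."""
    filled, last = [], None
    for key, value in rows:
        if key is not None:
            last = key
        filled.append((last, value))
    return filled


def blank_down(rows):
    """Empty each key cell that repeats the key directly above it."""
    out, prev = [], object()  # sentinel never equals a real key
    for key, value in rows:
        out.append((None if key == prev else key, value))
        prev = key
    return out


def sort_within_records(rows):
    # Sorting on (key, value) keeps each record together (keys assumed unique)
    # and orders the values inside each record.
    return blank_down(sorted(fill_down(rows)))


rows = [("a", "bca"), (None, "cab"), (None, "abc"),
        ("b", "zyx"), (None, "yxz"),
        ("c", "def"), (None, "fed")]
for row in sort_within_records(rows):
    print(row)
```

Because the sort is on the filled-down key first, whole records also end up in key order, just as the sort on the "Key" column does in OpenRefine.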
Here is a solution using GREL
As deduplicating and sorting records is quite a common task, I have a GREL expression that reduces this task to two steps:
Transform the "Row to sort" column with the following GREL expression:
if(
row.index - row.record.fromRowIndex == 0,
row.record.cells[columnName].value.uniques().sort().join(","),
null
)
Split the multi-valued cells in the "Row to sort" column on the separator ",".
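The GREL expression takes all the cells of the current column in the current record, extracts their values into an array, removes duplicate values, sorts the remaining values and joins them into a string using , as the separator. Because of the if(), only the first row of each record receives this string; the other rows of the record are set to null.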
The joining into a string is necessary as OpenRefine currently has no support for displaying arrays in the GUI.
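The GREL logic can be mirrored in Python to see what each row of the column receives. collapse_record and transform_column are hypothetical names; the sketch assumes that empty cells are ignored when a record's values are collected, and it uses Python's default (case-sensitive) string order, which may differ from GREL's sort() for mixed-case values.

```python
# Python mirror of the GREL transform for one column (a sketch, not OpenRefine code).
# Each record is given as the list of its cells in that column; None = empty cell.

def collapse_record(values, sep=","):
    """Like uniques().sort().join(sep) over the record's non-empty values."""
    present = [v for v in values if v is not None]
    return sep.join(sorted(set(present)))


def transform_column(records, sep=","):
    """Only a record's first row gets the joined string; its other rows get None."""
    out = []
    for values in records:
        for offset in range(len(values)):
            # offset plays the role of row.index - row.record.fromRowIndex
            out.append(collapse_record(values, sep) if offset == 0 else None)
    return out


records = [["bca", "cab", "abc"], ["zyx", "yxz"], ["def", "fed"]]
column = transform_column(records)
print(column)
# Splitting the first cell on "," gives back the sorted, de-duplicated values.
print(column[0].split(","))
```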

I would do it as follows:
For all columns except the key column, use the Edit cells > Join multi-valued cells operation, with a separator that is not present in the cell values
Transform all columns except the key column with: value.split(',').sort().join(',')
Split back your columns with Edit cells > Split multi-valued cells
Then you can blank down / fill down as you wish.
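For a single record, the join, transform and split steps can be sketched in Python as follows. join_sort_split is a hypothetical name; the sketch assumes that joining skips empty cells and that the split values refill the record's rows from the top, adding rows when needed.

```python
# Sketch of "join multi-valued cells -> value.split(',').sort().join(',') ->
# split multi-valued cells" for one record, all non-key columns at once.
# Rows are dicts; None marks an empty cell.

def join_sort_split(record, key_col, sep=","):
    cols = [c for c in record[0] if c != key_col]
    sorted_values = {}
    for c in cols:
        # Join: one string per record (empty cells assumed to be skipped).
        joined = sep.join(row[c] for row in record if row[c] is not None)
        # Transform: split and sort; splitting back keeps the sorted list.
        sorted_values[c] = sorted(joined.split(sep)) if joined else []
    height = max([1] + [len(v) for v in sorted_values.values()])
    out = []
    for i in range(height):
        row = {key_col: record[0][key_col] if i == 0 else None}
        for c in cols:
            vals = sorted_values[c]
            row[c] = vals[i] if i < len(vals) else None
        out.append(row)
    return out


record_a = [
    {"Key": "a", "Row to sort": "bca", "Other row": "A"},
    {"Key": None, "Row to sort": "cab", "Other row": "cab"},
    {"Key": None, "Row to sort": "abc", "Other row": "f"},
]
for row in join_sort_split(record_a, "Key"):
    print(row)
```

With record b (zyx / yxz in "Row to sort", only u in "Other row"), the lone u moves up to the record's first row, so a column with fewer values than rows can end up laid out slightly differently from the desired table above. The join step is also why the separator must not occur inside any cell value.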
Here is the JSON representation of the workflow for your example:
[
{
"op": "core/multivalued-cell-join",
"columnName": "Row to sort",
"keyColumnName": "Key",
"separator": ",",
"description": "Join multi-valued cells in column Row to sort"
},
{
"op": "core/multivalued-cell-join",
"columnName": "Other row",
"keyColumnName": "Key",
"separator": ",",
"description": "Join multi-valued cells in column Other row"
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "record-based"
},
"columnName": "Row to sort",
"expression": "grel:value.split(',').sort().join(',')",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column Row to sort using expression grel:value.split(',').sort().join(',')"
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "record-based"
},
"columnName": "Other row",
"expression": "grel:value.split(',').sort().join(',')",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10,
"description": "Text transform on cells in column Other row using expression grel:value.split(',').sort().join(',')"
},
{
"op": "core/multivalued-cell-split",
"columnName": "Row to sort",
"keyColumnName": "Key",
"mode": "separator",
"separator": ",",
"regex": false,
"description": "Split multi-valued cells in column Row to sort"
},
{
"op": "core/multivalued-cell-split",
"columnName": "Other row",
"keyColumnName": "Key",
"mode": "separator",
"separator": ",",
"regex": false,
"description": "Split multi-valued cells in column Other row"
}
]
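Because the recipe applies to every non-key column, the operation list above can be generated for any set of columns and then pasted into Undo / Redo > Apply. A minimal Python sketch; sort_within_records_ops is a hypothetical helper, and it only reproduces the fields already used in the JSON above.

```python
import json


def sort_within_records_ops(columns, key_column, sep=","):
    """Build the join / transform / split history for the given non-key columns.

    sep must not occur in the cell values and, since it is pasted into a GREL
    string literal, must not contain a single quote.
    """
    expr = f"grel:value.split('{sep}').sort().join('{sep}')"
    joins = [{
        "op": "core/multivalued-cell-join",
        "columnName": col,
        "keyColumnName": key_column,
        "separator": sep,
        "description": f"Join multi-valued cells in column {col}",
    } for col in columns]
    transforms = [{
        "op": "core/text-transform",
        "engineConfig": {"facets": [], "mode": "record-based"},
        "columnName": col,
        "expression": expr,
        "onError": "keep-original",
        "repeat": False,
        "repeatCount": 10,
        "description": f"Text transform on cells in column {col} using expression {expr}",
    } for col in columns]
    splits = [{
        "op": "core/multivalued-cell-split",
        "columnName": col,
        "keyColumnName": key_column,
        "mode": "separator",
        "separator": sep,
        "regex": False,
        "description": f"Split multi-valued cells in column {col}",
    } for col in columns]
    # Same order as above: all joins, then all transforms, then all splits.
    return joins + transforms + splits


ops = sort_within_records_ops(["Row to sort", "Other row"], "Key")
print(json.dumps(ops, indent=2))
```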

Related

How do I split a comma-separated field and concatenate field1 and field2 [3 word first] in a NiFi processor?

I want to split a comma-separated field into field 1 and field 2, and then concatenate field1 and field2 [3 word first].
Example:
2022-09-05T00:00:10,677 abc.1 ,
After splitting and concatenating:
2022-09-05T00:00:10:677,abc.1,
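You can use UpdateRecord and add a user-defined property such as /field3 set to concat(/field1, /field2), with Replacement Value Strategy set to Record Path Value so that the property value is evaluated as a RecordPath. To get the colon shown in the expected output, include it as a string literal: concat(/field1, ':', /field2). You can change /field3 to whatever you want the output field name to be, and if you want to remove the other fields, you can specify a schema in your Record Writer that only has the field(s) you want, such as: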
{
"type": "record",
"name": "nifiRecord",
"namespace": "org.apache.nifi",
"fields": [{
"name": "field3",
"type": ["string", "null"]
}]
}
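The intended string transformation can be checked outside NiFi with a small Python sketch. It assumes, from the single example line, that fields are separated by runs of commas and/or whitespace and that the first two fields (the timestamp and its milliseconds) are joined with a colon; real input may need a stricter rule. split_and_concat is a hypothetical name.

```python
import re


def split_and_concat(line):
    # Assumption: fields are separated by any run of commas and/or whitespace.
    fields = re.split(r"[,\s]+", line)
    # Join the first two fields with ":", then re-join everything with ",".
    return ",".join([":".join(fields[:2])] + fields[2:])


print(split_and_concat("2022-09-05T00:00:10,677 abc.1 ,"))
```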

Timelion statement: How to filter data from an array in a Timelion visualization query

A field in one of my Kibana indices contains an array of objects.
For example, here is a sample of the blocked_by field:
"blocked_by": [
{
"error_category_name": "Record is not a new one",
"error_category_owner": "AB",
"created_when": "2022-05-18T09:52:44.000Z",
"name": "ERROR IN RCS: Delete Subscriber",
"resolved_when": "2022-05-18T10:52:55.963+01:00",
"id": "8163578639440138764"
},
{
"error_category_name": "NM-1009 Phone Number is not in appropriate state",
"error_category_owner": "AB",
"created_when": "2022-05-18T09:52:45.000Z",
"name": "ERROR IN NC NM: Change MSISDN status",
"resolved_when": "2022-05-18T10:53:16.230+01:00",
"id": "8163578637640138764"
}
]
I want to extract only the latest record from this field in my Timelion expression.
Is this possible in Timelion, and if so, how?
My expression:
.es(index=sales_order,timefield=created_when,q='blocked_by.error_category_owner.keyword:(AB OR Undefined OR null OR "") AND _exists_:blocked_by').divide(.es(index=sales_order,timefield=created_when)).yaxis(2,position=right,units=percent).label(Fallout)
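Independently of Timelion, the selection itself can be pinned down with a short Python sketch over the sample array, assuming "latest" means the greatest created_when (resolved_when would be another reasonable choice). The entries are trimmed to a few fields, and parse_ts and latest_entry are hypothetical names.

```python
from datetime import datetime

blocked_by = [
    {"name": "ERROR IN RCS: Delete Subscriber",
     "created_when": "2022-05-18T09:52:44.000Z", "id": "8163578639440138764"},
    {"name": "ERROR IN NC NM: Change MSISDN status",
     "created_when": "2022-05-18T09:52:45.000Z", "id": "8163578637640138764"},
]


def parse_ts(ts):
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so rewrite it as an explicit UTC offset.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def latest_entry(entries):
    """Entry with the greatest created_when (assumed meaning of "latest")."""
    return max(entries, key=lambda e: parse_ts(e["created_when"]))


print(latest_entry(blocked_by)["id"])
```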

Data Operation - Select (JSON array)

I have a JSON array whose elements have the following structure:
{
"InvoiceNumber": "11111",
"AccountName": "Hospital",
"items": {
"item": [
{
"Quantity": "48.000000",
"Rate": "0.330667",
"Total": "15.87"
},
{
"Quantity": "1.000000",
"Rate": "25.000000",
"Total": "25.00"
}
]
}
}
I would like to use the Data Operation "Select" to select invoice numbers together with their invoice details.
Select:
From: body('Parse_Json')?['invoices']?['invoice']
Key: Invoice Number; Map: item()['InvoiceNumber'] (this line works)
Key: Rate; Map: item()['InvoiceNumber']?['items']?['item']?['Rate'] (this line doesn't work)
The error message says "Array elements can only be selected using an integer index". Is it possible to select the Invoice Number AND all the invoice details, such as the rate? Thank you in advance! Also, I am trying not to use "Apply to each".
You have to use a loop in some form, because the data resides in an array. The only way you can avoid looping is if you know that the array will always have a certain length.
Without looping, you can't be sure that you've processed each item.
To answer your question though, if you want to select a specific item in an array, as the error describes, you need to provide the index.
This is the sort of expression you need. In this one, I am selecting the item at position 1 (arrays start at 0) ...
body('Parse_JSON')?['items']?['item'][1]['Rate']
You can always extract just the items object on its own, but you'll still need a loop to process each item unless the array always has a fixed length (for example, exactly two items).
To extract the items, select the items object from the dynamic content.
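The difference between indexing one array element and mapping over every element can be sketched in Python, as a stand-in for the flow rather than Power Automate syntax, using the sample JSON from the question. select_rows is a hypothetical name; the list comprehension is the loop that a Select over the items array performs for you.

```python
invoice = {
    "InvoiceNumber": "11111",
    "AccountName": "Hospital",
    "items": {"item": [
        {"Quantity": "48.000000", "Rate": "0.330667", "Total": "15.87"},
        {"Quantity": "1.000000", "Rate": "25.000000", "Total": "25.00"},
    ]},
}

# Index access, like body('Parse_JSON')?['items']?['item'][1]['Rate']:
# arrays start at 0, so [1] is the second item.
second_rate = invoice["items"]["item"][1]["Rate"]


def select_rows(invoice):
    """One output row per item, each tagged with its invoice number."""
    return [
        {"InvoiceNumber": invoice["InvoiceNumber"],
         "Rate": item["Rate"], "Total": item["Total"]}
        for item in invoice["items"]["item"]
    ]


print(second_rate)
print(select_rows(invoice))
```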

Elasticsearch sort by inner field

I have documents in which one of the fields looks like the following:
"ingredients": [{
"unit": "MG",
"value": 123,
"key": "abc"
}]
I would like to sort the records by the value of a specific ingredient, in ascending order. That is, if I have 2 records that both use the ingredient with key "abc", one with value 1 and one with value 2, the one with value 1 should appear first.
Each of those records may have more than one ingredient.
Thank you in advance!
The search query to sort will be:
{
"sort":{
"ingredients.value":{
"order":"asc"}
}}
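The query above sorts on every value in ingredients.value, not only the "abc" one: for an array field sorted in ascending order, Elasticsearch's default sort mode is min, so each document is ranked by its smallest ingredient value. The Python sketch below contrasts that with ranking by the "abc" ingredient only, using hypothetical documents. It also holds a nested-sort request body, which assumes ingredients is mapped with the nested type and a recent Elasticsearch version; with a plain object mapping, key and value are not kept paired, so a filter on ingredients.key cannot pick out the matching value.

```python
# Hypothetical documents: doc 1 has a low value on a non-"abc" ingredient.
docs = [
    {"id": 1, "ingredients": [{"key": "abc", "value": 2}, {"key": "xyz", "value": 0}]},
    {"id": 2, "ingredients": [{"key": "abc", "value": 1}]},
]


def min_value_order(docs):
    """Ranks by the smallest value of any ingredient (sort mode "min")."""
    return [d["id"] for d in sorted(
        docs, key=lambda d: min(i["value"] for i in d["ingredients"]))]


def ingredient_order(docs, key):
    """Ranks by the value of the ingredient with the given key; docs without it go last."""
    def sort_value(d):
        vals = [i["value"] for i in d["ingredients"] if i["key"] == key]
        return (0, min(vals)) if vals else (1, 0)
    return [d["id"] for d in sorted(docs, key=sort_value)]


# Request body for the second behaviour (assumes a nested mapping for "ingredients").
NESTED_SORT_QUERY = {
    "sort": [{
        "ingredients.value": {
            "order": "asc",
            "nested": {
                "path": "ingredients",
                "filter": {"term": {"ingredients.key": "abc"}},
            },
        }
    }]
}

print(min_value_order(docs))          # plain sort on ingredients.value
print(ingredient_order(docs, "abc"))  # sort on the "abc" ingredient only
```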

Grouping non-null fields together in Kibana

Given the following three user entries in an Elasticsearch index:
"user": [
{
"userId": "100",
"hobby": "chess"
}
]
"user": [
{
"userId": "200",
"hobby": "music"
}
]
"user": [
{
"userId": "300",
"hobby": ""
}
]
I want to create a vertical bar chart to compare the number of users who have a hobby as opposed to those who do not. Individual hobbies should not be shown separately, but grouped together.
If split along the Y axis, one block would take up two thirds of the height (the two users with hobbies) and one block one third of the height (the one user with no hobbies).
How could one achieve this grouping in Kibana?
Thanks
You'll need to choose Split Bars and then the Filters aggregation. Once you have that selected, you should see Query 1 with * in it. Change the * to hobby:*. Next, hit Add Filter and put in NOT hobby:*.
The filters aggregation lets you bucket things pretty much any way you can search for things.
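The same split can be written as a filters aggregation request body, shown here as a Python dict with hypothetical aggregation and bucket names, alongside a small function that counts the three sample users the way the question intends. Elasticsearch's exists semantics treat an empty string as a value, so depending on the mapping, an entry like "hobby": "" may land in the hobby:* bucket rather than the NOT bucket; the function encodes the intended grouping, not Elasticsearch's own evaluation.

```python
# Filters aggregation matching the two queries typed into Kibana.
# "hobby_buckets", "has_hobby" and "no_hobby" are hypothetical names.
FILTERS_AGG = {
    "size": 0,
    "aggs": {
        "hobby_buckets": {
            "filters": {
                "filters": {
                    "has_hobby": {"query_string": {"query": "hobby:*"}},
                    "no_hobby": {"query_string": {"query": "NOT hobby:*"}},
                }
            }
        }
    },
}

users = [
    {"userId": "100", "hobby": "chess"},
    {"userId": "200", "hobby": "music"},
    {"userId": "300", "hobby": ""},
]


def bucket_counts(users):
    """Intended grouping: a missing or empty hobby counts as "no hobby"."""
    has = sum(1 for u in users if u.get("hobby"))
    return {"has_hobby": has, "no_hobby": len(users) - has}


print(bucket_counts(users))
```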
