Loop over attribute values for executing SQL in Nifi - apache-nifi

I would like to know how may I accomplish the following use case in Nifi Flow:
I would like to execute SQL query for date range over a loop. The date range are provided from list of attribute values.
For example: If my list of attributes are : 2013-01-01 2013-02-01 2013-03-01, I would like to execute SQL operation over a loop such that:
select * from where startdate>=2013-01-01 and enddate<2013-02-01
followed by:
select * from where startdate>=2013-02-01 and enddate<2013-03-01
Therefore, for the same, I roughly know the idea but cant implement concretely:
UpdateAttribute (containing list of date values) -> SplitText-> RouteOnAttribute -> ExecuteSQL
Thanks

In NiFi 1.8.0, you can use DuplicateFlowFile for this (via NIFI-5454). You can start with UpdateAttribute to add the count of delineated values in your list (let's assume it is an attribute called datelist), perhaps set list.count to
${allDelineatedValues(${datelist}, " "):count()}
Then in DuplicateFlowFile you can set Number of Copies to ${list.count:minus(1)}. Each flow file downstream will have a copy.index attribute set (the original having index 0), so you can use that in ReplaceText in conjunction with getDelimitedValue(), perhaps setting the content to the following:
select * from myTable where
startdate >= ${datelist:getDelimitedField(${copy.index:plus(1)})} and
enddate < ${datelist:getDelimitedField(${copy.index:plus(2)})}

Related

How to Select multiple related columns in add calculated fields in Quicksight parameter using ifelse?

I have a parameter 'type' in a table and it can have multiple values as follows -
human
chimpanzee
orangutan
I have 3 columns related to each type in the table -
human_avg_height, human_avg_weight, human_avg_lifespan
chimpanzee_avg_height, chimpanzee_avg_weight, chimpanzee_avg_lifespan
orangutan_avg_height, orangutan_avg_weight, orangutan_avg_lifespan
So if i select the type as human, the quicksight dashboard should only display the three columns -
human_avg_height, human_avg_weight, human_avg_lifespan
and should not display the following columns -
chimpanzee_avg_height, chimpanzee_avg_weight, chimpanzee_avg_lifespan
orangutan_avg_height, orangutan_avg_weight, orangutan_avg_lifespan
I created the parameter type and in the add calculated fields I am trying to use ifelse to select the columns based on the parameter selected as follows -
ifelse(${type}='human',{human_avg_height}, {human_avg_weight}, {human_avg_lifespan},{function})
I also tried -
ifelse(${type}='human',{{human_avg_height}, {human_avg_weight}, {human_avg_lifespan},{function}})
And -
ifelse(${type}='human',{human_avg_height, human_avg_weight, human_avg_lifespan},{function}})
But none of it is working. What am i doing wrong ?
One way to do this would be to use three different calculated fields, one for all the heights, one for weights and one for lifespan. The heights one would look like this:
ifelse(
${type}='human',{human_avg_height}, ifelse(
${type}='chimpanzee', { chimpanzee_avg_height}, ifelse(
${type}='orangutan',{ orangutan_avg_height},
NULL
)))
Make another calculated field for weights and lifespan and then add these calculated fields to your table, and filter by type.
To make it clear to the viewer what data is present, edit the Title of the visual to include the type:
${type} Data
You have to create one calculated field for each measure using the ifelse with the type to choose the correct vale, but is not necessary to create inner ifelse as skabo did, the if else syntax is ifelse(if, then [, if, then ...], else) so you can define the calculated fields as follows:
avg_height = ifelse(${type}='human', {human_avg_height}, ${type}='chimpanzee', {chimpanzee_avg_height},${type}='orangutan', {orangutan_avg_height}, NULL)
avg_weight = ifelse(${type}='human', {human_avg_weight}, ${type}='chimpanzee', {chimpanzee_avg_weight},${type}='orangutan', {orangutan_avg_weight}, NULL)
avg_lifespan = ifelse(${type}='human', {human_avg_lifespan}, ${type}='chimpanzee', {chimpanzee_avg_lifespan},${type}='orangutan', {orangutan_avg_lifespan}, NULL)
Then use those calculated fields in your visuals.

Power BI DAX measure: Count occurences of a value in a column considering the filter context of the visual

I want to count the occurrences of values in a column. In my case the value I want to count is TRUE().
Lets say my table is called Table and has two columns:
boolean value
TRUE() A
FALSE() B
TRUE() A
TRUE() B
All solutions I found so far are like this:
count_true = COUNTROWS(FILTER(Table, Table[boolean] = TRUE()))
The problem is that I still want the visual (card), that displays the measure, to consider the filters (coming from the slicers) to reduce the table. So if I have a slicer that is set to value = A, the card with the count_true measure should show 2 and not 3.
As far as I understand the FILTER function always overwrites the visuals filter context.
To further explain my intent: At an earlier point the TRUE/FALSE column had the values 1/0 and I could achieve my goal by just using the SUM function that does not specify a filter context and just acts within the visuals filter context.
I think the DAX you gave should work as long as it's a measure, not a calculated column. (Calculated columns cannot read filter context from the report.)
When evaluating the measure,
count_true = COUNTROWS ( FILTER ( Table, Table[boolean] = TRUE() ) )
the first argument inside FILTER is not necessarily the full table but that table already filtered by the local filter context (including report/page/visual filters along with slicer selections and local context from e.g. rows/column a matrix visual).
So if you select Value = "A" via slicer, then the table in FILTER is already filtered to only include "A" values.
I do not know for sure if this will fix your problem but it is more efficient dax in my opinion:
count_true = CALCULATE(COUNTROWS(Table), Table[boolean])
If you still have the issue after changing your measure to use this format, you may have an underlying issue with the model. There is also the function KEEPFILTERS that may apply here but I think using KEEPFILTERS is overcomplicating your case.

Hive get_json_object(): How to check if JSON field exists?

I am using Hive and the get_json_object() function to query data stored as JSON. The JSON has a coordinate key and two fields (latitude and longitude) that look like the following:
"coordinate":{
"center":{
"lat":36.123413127558536,
"lng":-115.17381648045654
},
"precision":10
}
I am running my Hive query to retrieve data within some geocoordinate box, like this:
INSERT OVERWRITE LOCAL DIRECTORY '/home/user.name/sample/sample1.txt'
SELECT * FROM mytable
WHERE
get_json_object(mytable.`value`, '$.coordinate.center.lat') > 36.115767
AND get_json_object(mytable.`value`, '$.coordinate.center.lng') > -115.314051
AND get_json_object(mytable.`value`, '$.coordinate.center.lat') < 36.285595
AND get_json_object(mytable.`value`, '$.coordinate.center.lng') < -115.085399
DISTRIBUTE BY rand()
SORT by rand()
LIMIT 10000;
However, the problem is that for some rows, the coordinate field is missing, or the center field is missing, or the lat and/or lng field is missing. How can I modify my Hive SELECT query to only get rows that have a complete valid coordinate entry with existing lat and lng?
I would make a separate VIEW for the table where you do
WHERE get_json_object(...) IS NOT NULL
for each field you're interested in.
Then run the given query over that view
Alternatively, fix your input source to generate some consistent data using Avro instead, for example

How do I use the Hive "test in(val1, val2)" built in function?

The Programming Hive book lists a test in built in function in Hive, but it is not obvious how to use it and I've been unable to find examples
Here is the information from Programming Hive:
Return type Signature Description
----------- --------- -----------
BOOLEAN test in(val1, val2, …) Return true if testequals one of the values in the list.
I want to know if it can be used to say whether a value is in a Hive array.
For example if I do the query:
hive > select id, mcn from patients limit 2;
id mcn
68900015 ["7382771"]
68900016 ["8847332","60015163","63605102","63251683"]
I'd like to be able to test whether one of those numbers, say "60015163" is in the mcn list for a given patient.
Not sure how to do it.
I've tried a number of variations, all of which fail to parse. Here are two examples that don't work:
select id, test in (mcn, "60015163") from patients where id = '68900016';
select id, mcn from patients where id = '68900016' and test mcn in('60015163');
The function is not test in bu instead in. In the table 6-5 test is a colum name.
So in order to know whether a value is in a Hive array, you need first to use explode on your array.
Instead of explode the array column, you can create an UDF, as it is explain here http://souravgulati.webs.com/apps/forums/topics/show/8863080-hive-accessing-hive-array-custom-udf-

set filter to - INLIST - Visual Foxpro 7

I am working on some legacy code, and I have the following wonderful issue. I am hoping some FoxPro experts can help!
The infrastructure of this legacy system is setup so that I have to use the built-in expression engine to return a result set, so no go on the SQL (i know that would be so much easier!)
Here is the issue.
I need to be able to do something like
PUBLIC ARRAY(20) ArrayOfValuesToFilterBy
SELECT dataTable
SET FILTER TO logicalField = .T. and otherField NOT INLIST(ArrayOfValuesToFilterBy)
However, I know this wont work, I just need the equivalency...not using SQL.
I can generate the list of values to filter by via SQL, just not the final record select due to the legacy infrastructure constraint.
Thanks!
First, a logical field you do not have to do explicit
set filter to Logicalfield = .t.
you can just do
set filter to LogicalField
or
set filter to NOT LogicalField
Next, on the array. VFP has a function ASCAN() which will scan an array for a value, if found, will return the row number within the array that matches what you are looking for.
As for arrays... ex:
DIMENSION MyArray[3]
MyArray[1] = "test1"
MyArray[2] = "something"
MyArray[3] = "anything else"
? ASCAN( MyArray, "else" ) && this will return 0
? ASCAN( MyArray, "anything else" ) && this will return 3
If you are doing a "set filter", the array needs to be "in scope" for the duration of the filter. If you set filter in a procedure the array exists, leave the procedure and the array is gone, you're done.
So, you could do
set filter to LogicalField and ASCAN( YourArray, StringColumnFromTable ) > 0
Now, if you want a subset to WORK WITH, you can do a SQL-Select and pull the data into a CURSOR (temporary read-write table) that has the same capabilities of the original table (except auto-increment when adding)...
I typically name my temporary cursors prefixed with "C_" for "CURSOR OF" so when I'm working with tables, I know if its production data, or just available for temp purposes for quicker display, presentation, extractions from other origins as needed.
use in select( "C_FinalRecords" )
select * from YourTable ;
where LogicalField ;
and ASCAN( YourArray, StringColumnFromTable ) > 0;
into cursor C_FinalRecords READWRITE
Then, you can just use that...
select C_FinalRecords
scan
do something with the record, or values of it...
endscan
or.. bind to a grid in a form, etc...
The INLIST() function takes an expression to search for and up to 24 expressions of the same data type to search.
SELECT dataTable
SET FILTER TO logicalField = .T. AND NOT INLIST(otherField, 'Value1', 'Value2', 'Value3', 'Value4')
I am making some assumptions here, that what you want to do is create a filter with a dynamic in list statement ? If this is correct have a play with this example :-
lcList1="ABCD"
lcList2="EFGH"
lcList3="IJKL"
lcList4="MNOP"
lcList5="QRST"
lcFullList=""
lcFullList=lcFullList+"'"+lcList1+"',"
lcFullList=lcFullList+"'"+lcList2+"',"
lcFullList=lcFullList+"'"+lcList3+"',"
lcFullList=lcFullList+"'"+lcList4+"',"
lcFullList=lcFullList+"'"+lcList5+"'"
lcField="PCode"
lcFilter="SET FILTER TO INLIST ("+lcField+","+lcFullList+")"
The results of the above would create the following filter statement and store it in lcFilter
SET FILTER TO INLIST (PCode,'ABCD','EFGH','IJKL','MNOP','QRST')
you can then use macro substitution
Select dataTable
&lcFilter
Bear in mind that there is likely to be some limitations on how many items you can define in a INLIST() statement

Resources