How to eliminate identifier while cogrouping in Pig - hadoop

I have 10 datasets to join.
When I use cogroup, what I get is below
(50637,22284),{(50637,22284,278)},{(50637,22284,308)},{},{},{},{},{},{(50637,22284,17)},{(50637,22284,5)},{(50637,22284,1)}
As you can see, the grouping identifier is duplicated inside each tuple.
How can I get the output in the format below?
(50637,22284,278,308,0,0,0,0,0,17,5,1)

Use FOREACH/GENERATE to select out the fields you want to keep. See http://pig.apache.org/docs/r0.12.1/basic.html#foreach - Nested Projection
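To make the intended reshaping concrete, here is a minimal Python sketch of the transformation (not Pig itself). It assumes each bag holds at most one tuple, that the wanted value is the last field of that tuple, and that an empty bag should become 0; the function name flatten_cogroup is hypothetical.

```python
def flatten_cogroup(record, default=0):
    """Turn (key, bag1, bag2, ...) into one flat tuple: key fields, then one value per bag."""
    key, *bags = record
    # Assumption: each bag is empty or holds exactly one tuple whose last field is the value.
    values = [bag[0][-1] if bag else default for bag in bags]
    return (*key, *values)


# The cogroup output from the question, with bags written as Python lists.
record = (
    (50637, 22284),
    [(50637, 22284, 278)], [(50637, 22284, 308)],
    [], [], [], [], [],
    [(50637, 22284, 17)], [(50637, 22284, 5)], [(50637, 22284, 1)],
)
flat = flatten_cogroup(record)
print(flat)
```

In Pig the same idea would be a FOREACH/GENERATE that projects only the value field out of each bag and substitutes a default for empty bags.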


How to search for multiple values in a string column in a Cognos Oracle Query

I need to search an Oracle table column for multiple word strings in a Cognos Oracle query.
For example:
If Focus parameter returns multiple values as below
TRAINING
OMNIA
COUNTER
PROGRAM
And I need to search project.proj_name column like '%TRAINING%' or '%OMNIA%' or '%COUNTER%' or '%PROGRAM%'
I am trying the expression below, but I know it matches only a single value, not multiple values. I want to know how to match multiple values here.
'-99' in (#promptmany('Focus', 'string','-99')#) OR REGEXP_LIKE(proj_name, #promptmany('Focus', 'string','-99')#))
Working from Cognos Paul's solution to use output from promptmany as a table:
Assuming your query is named Q1...
Add a query. (Q2)
Add a SQL object to that query.
Set the Data source property for the SQL object.
Change the SQL Syntax property to IBM Cognos.
Define the query as
SELECT
parameterValue
FROM (VALUES
(#join('),(',split(',',promptmany('Scenarios','string',sq('N/A'))))#)
) query(parameterValue)
(change the names for your own use case)
Add a query. (Q3)
Add a join to the new query.
Add Q1 and Q2 to the empty boxes for the join leading to Q3.
Set the join as
[Q1].[proj_name] like '%' || [Q2].[parameterValue] || '%'
Add the required data items to Q3.
Since two keywords (from your parameter -> Q2) could be found in a single value (in Q1), you'll likely end up with duplicate rows. Cognos will probably handle this with its default aggregations, but keep a lookout.
Be careful with this. The new query (Q2) will probably be joined on the Cognos server, not on the database server. Be sure you have sufficient filters leading into this structure so Cognos is not trying to process your entire database.
This worked for me with SQL Server. I don't have an Oracle database to test against, but using IBM Cognos as the SQL Syntax should handle that.
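The duplicate-row caveat can be illustrated with a small Python sketch of the same LIKE-style join. The project names are made up for illustration; only the keywords come from the question.

```python
# Hypothetical project names standing in for Q1; keywords standing in for Q2.
projects = ["OMNIA TRAINING 2020", "COUNTER UPGRADE", "BUDGET REVIEW"]
keywords = ["TRAINING", "OMNIA", "COUNTER", "PROGRAM"]

# Equivalent of: [Q1].[proj_name] like '%' || [Q2].[parameterValue] || '%'
joined = [(name, kw) for name in projects for kw in keywords if kw in name]

# A project that matches two keywords appears twice, so de-duplicate while keeping order.
distinct_names = list(dict.fromkeys(name for name, _ in joined))

print(joined)
print(distinct_names)
```

Here "OMNIA TRAINING 2020" is matched by both TRAINING and OMNIA, which is exactly the duplication to watch for in Q3.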
To use REGEXP_LIKE to solve this problem, you'll need to get the second argument correct. I can't see any reason to see the error message ORA-00996: the concatenate operator is ||, not |, but I'm not working with your code in your system.
You don't specify which version of Cognos, or even which Cognos product, you are using. I'll assume Cognos Analytics 11.1.7.
To determine what Cognos Analytics is doing with your macro, create a very simple query with one item from the database (preferably from a very small table) and another data item that contains the macro. So the data item expression is:
#sq(join('|',split(',',promptmany('Focus','string','-99'))))#
When you run this, you may not be prompted. You'll see the value is -99. So to test this we'll need to remove the default so that the prompt becomes required.
#sq(join('|',split(',',promptmany('Focus','string'))))#
Be sure to enter more than one value when you test.
In my environment, the parameter returns a value that is my values surrounded by quotes (') and delimited by semicolons (;). So my tests produced the following:
expression: #sq(promptmany('Focus','string'))#
value: 'PROGRAM';'COUNTER';'TRAINING'
expression: #sq(join('|',split(',',promptmany('Focus','string'))))#
value: 'PROGRAM';'COUNTER';'TRAINING'
expression: #sq(join('|',split(';',promptmany('Focus','string'))))#
value: 'PROGRAM'|'COUNTER'|'TRAINING'
expression: replace(#sq(join('|',split(';',promptmany('Focus','string'))))#, '''', '')
value: PROGRAM|COUNTER|TRAINING
Your mileage may vary.
At this point, you know which macro to use in the REGEXP_LIKE function.
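As a rough illustration of what the final macro produces and how the resulting pattern behaves, here is a Python sketch. It assumes the prompt returns quoted values delimited by semicolons, as in the tests above; build_pattern is a hypothetical helper, and the re.escape call is an extra safeguard that is not part of the macro.

```python
import re


def build_pattern(prompt_value):
    """Mimic split(';'), join('|') and quote removal on the raw prompt value."""
    keywords = [part.strip("'") for part in prompt_value.split(";")]
    # re.escape is an addition: it keeps keywords with regex metacharacters literal.
    return "|".join(re.escape(keyword) for keyword in keywords)


pattern = build_pattern("'PROGRAM';'COUNTER';'TRAINING'")
print(pattern)

# Equivalent in spirit to REGEXP_LIKE(proj_name, pattern): match if any keyword occurs.
hit = re.search(pattern, "OMNIA TRAINING 2020") is not None
miss = re.search(pattern, "BUDGET REVIEW") is not None
print(hit, miss)
```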

Fetching only distinct data from table using hasura-graphql data provider in react-admin

I am working on a project where I have to fetch only distinct data from a table to display in a dropdown. How can I do that? I am using the hasura-graphql data provider. How can I get only the distinct values of a particular column?
Thanks in advance.
I think passing a default "distinct_on" filter with the column name as its value will do the job. Hasura also recommends sorting by this column first:
It is typically recommended to use order_by along with distinct_on to
ensure we get predictable results (otherwise any arbitrary row with a
distinct value of the column may be returned). Note that the
distinct_on column needs to be the first column in the order_by
expression.
So I set a default sort:
<ReferenceInput
    reference="yourTable"
    source="yourDistinctColumn"
    sort={{field: "yourDistinctColumn", order: "ASC"}} // or DESC, your choice
    filter={{distinct_on: "yourDistinctColumn"}}
>
    <SelectInput optionText="yourDistinctColumn"/>
</ReferenceInput>
https://hasura.io/docs/1.0/graphql/manual/queries/distinct-queries.html
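To see why the ordering matters, here is a small Python sketch that imitates the distinct-on behaviour: sort first, then keep the first row for each distinct value. The row data and the function name distinct_on are illustrative assumptions, not part of Hasura or react-admin.

```python
from itertools import groupby
from operator import itemgetter


def distinct_on(rows, column, tie_breaker):
    """Keep one row per distinct value of `column`, chosen predictably by sort order."""
    # Sorting by the distinct column first (then a tie-breaker) fixes which row survives.
    ordered = sorted(rows, key=lambda row: (row[column], row[tie_breaker]))
    return [next(group) for _, group in groupby(ordered, key=itemgetter(column))]


rows = [
    {"id": 3, "city": "Paris"},
    {"id": 2, "city": "Oslo"},
    {"id": 1, "city": "Paris"},
]
result = distinct_on(rows, "city", "id")
print(result)
```

Without the sort, which of the two "Paris" rows is returned would depend on the incoming row order.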

Laravel 5 Eloquent where clause

How can I operate on a column inside the where clause?
For example I have this query
Product::where('purchase_data', '=', $data_inp)->get();
The problem is that I need to apply some formatting to the content of the purchase_data column before it is compared. What can I do?
Edit to be more specific
The purchase_data column holds date values in one format, and the $data_inp variable is also a date but in a different format. To compare them, they must use the same format. Since I can't convert $data_inp to the format used in purchase_data, I need to change the format of the purchase_data column's content before comparing it with $data_inp.
Try using whereRaw
https://laravel.com/docs/5.5/queries#raw-expressions
Then you can transform the column to compare.
You can use
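Product::whereRaw('purchase_data = ?', [$data_inp])->get();
Then wrap purchase_data in whatever SQL formatting function you need inside the raw expression. Passing $data_inp as a binding rather than concatenating it keeps the value quoted correctly and avoids SQL injection.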
Read more about it here
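The underlying idea (format the column inside the WHERE clause and pass the input as a bound parameter) can be sketched in Python with an in-memory SQLite database. The table name, sample rows, and both date formats are assumptions made for illustration; your database will have its own date formatting function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, purchase_data TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("widget", "2017-03-15"), ("gadget", "2017-04-01")],  # stored as YYYY-MM-DD
)

data_inp = "15/03/2017"  # input arrives as DD/MM/YYYY

# Reformat the column on the database side, then compare against the bound value.
rows = conn.execute(
    "SELECT name FROM products WHERE strftime('%d/%m/%Y', purchase_data) = ?",
    (data_inp,),
).fetchall()
print(rows)
conn.close()
```

Note that applying a function to the column usually prevents the database from using a plain index on it, so this is best kept for modest table sizes.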

How can I do a double delimiter in Hive?

Let's say I have some sample rows of data:
site1^http://article1.com?datacoll=5|4|3|2|1&test=yes
site1^http://article1.com?test=yes
site1^http://article1.com?datacoll=5|4|3|2|1&test=yes
I want to create a table like so
create table clicklogs (sitename string, url string)
ROW format delimited fields terminated by '^';
As you can see I have some data in the url parameter I'd like to extract, namely
datacoll=5|4|3|2|1
I also want to work with the individual elements separated by pipes so I can group by them, for example to show how many URLs had "4" in the 2nd position (2 rows in this case). So the "url" field contains additional data that I'd like to parse out and use in my queries.
The question is, what is the best way to do that in hive?
thanks!
First, use parse_url(string urlString, string partToExtract [, string keyToExtract]) to grab the data in question:
parse_url('http://article1.com?datacoll=5|4|3|2|1&test=yes', 'QUERY', 'datacoll')
This returns '5|4|3|2|1', which gets us halfway there. Now, use split(string str, string pat) to break those out of each sub-delimiter into an array:
split(parse_url(url, 'QUERY', 'datacoll'), '\\|')
With the result of this, you should be able to grab the columns that you want.
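For reference, the same two steps (pull one query parameter out of the URL, then split it on the pipe) look like this in Python. This only illustrates the logic and is not Hive code; the helper name datacoll_parts is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def datacoll_parts(url):
    """Return the pipe-separated datacoll values as a list, or [] if the key is absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("datacoll")
    return values[0].split("|") if values else []


urls = [
    "http://article1.com?datacoll=5|4|3|2|1&test=yes",
    "http://article1.com?test=yes",
    "http://article1.com?datacoll=5|4|3|2|1&test=yes",
]
parsed = [datacoll_parts(url) for url in urls]

# How many URLs have "4" in the 2nd position?
second_is_four = sum(1 for parts in parsed if len(parts) > 1 and parts[1] == "4")
print(parsed[0], second_is_four)
```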
See the UDF documentation for more built-in functions.
Note: I wasn't able to verify this works in Hive from where I am, sorry if there are some minor issues.
This looks very similar to something I did a couple of weeks ago. I think the best approach in your case would be to apply a pre-processing step (possibly with Hadoop streaming) and change the definition of your table to:
create table clicklogs(sitename string, datacol Array<int>) row format delimited fields terminated by '^' collection items terminated by '|'
Once you have that you can easily manipulate your data in Hive using lateral views and the builtin explode. The following code should help you get the counts of URLs per col.
select col, count(1) from clicklogs lateral view explode(datacol) dataTable as col group by col
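What the lateral view with explode computes can be sketched in Python: every array element becomes its own row, and the rows are then counted per value. The in-memory rows below mirror the three sample lines from the question after the assumed pre-processing step.

```python
from collections import Counter

# (sitename, datacol) rows as they might look after pre-processing;
# the second row had no datacoll parameter, so its array is empty.
clicklogs = [
    ("site1", [5, 4, 3, 2, 1]),
    ("site1", []),
    ("site1", [5, 4, 3, 2, 1]),
]

# explode(datacol): one output row per array element, then group by col and count.
counts = Counter(col for _, datacol in clicklogs for col in datacol)
print(sorted(counts.items()))
```

Note that this counts values regardless of position; counting by position would need the element index as well.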

Use an Aggregate function in Sort Expression

I have a report which uses a dataset returned from a stored procedure. There are two key columns: Name and Value
I am using this dataset for two tablixes. The first is just a straightforward tablix displaying the data.
The second groups the data based on a Name column. I need to order this data based on the Sum of Value column
However I get the following error:
[rsAggregateInDataRowSortExpression] A sort expression for the tablix 'table1' includes an aggregate function. Aggregate functions cannot be used in data row sort expressions.
Is there another way I can show the data grouped by name and still order it by Sum(Value)?
Instead of sorting on the tablix, you need to sort on the row group. Remove the sort from the tablix, then open the row group properties and put the same sort expression in the Sorting section there. This should then work.
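The difference between the two sort levels can be sketched in Python: an aggregate only exists once rows are grouped, so the sort has to be applied to the groups rather than to individual rows. The sample rows and the descending order are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical (Name, Value) rows as returned by the stored procedure.
rows = [("A", 5), ("B", 20), ("A", 3), ("C", 10)]

# Group by Name and compute Sum(Value) per group.
totals = defaultdict(int)
for name, value in rows:
    totals[name] += value

# Sort the groups (not the detail rows) by their aggregate.
groups = sorted(totals.items(), key=lambda item: item[1], reverse=True)
print(groups)
```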
OK, I just had to add an extra column for the Sum value to my query and then use that. Not ideal, but it works.
