Is it possible, in a DAX measure, to check if the current query contains a particular column?
For example, I have a column named "Time" - is it possible to detect if a user in a self service environment has included this in their report, from the measure?
Edit: adding an example of the output.
An example output is shown below:
+---------+---------+------+--------------+
| Col1 | Col2 | Col3 | ContainsCol3 |
+---------+---------+------+--------------+
| Value 1 | Value 2 | 123 | True |
+---------+---------+------+--------------+
+---------+---------+------+--------------+
| Col1 | Col2 | Col4 | ContainsCol3 |
+---------+---------+------+--------------+
| Value 1 | Value 2 | 123 | False |
+---------+---------+------+--------------+
The query that contains Col3 returns True; the query that does not include Col3 returns False.
Not exactly, but you can use functions such as ISCROSSFILTERED, ISFILTERED, HASONEFILTER and HASONEVALUE, which might be sufficient depending on your end goal.
This has been asked and answered for SQL (Convert multiple rows into one with comma as separator). Would any of the approaches mentioned there work in Hive, e.g. to go from this:
+------+------+
| Col1 | Col2 |
+------+------+
| a | 1 |
| a | 5 |
| a | 6 |
| b | 2 |
| b | 6 |
+------+------+
to this:
+------+-------+
| Col1 | Col2 |
+------+-------+
| a | 1,5,6 |
| b | 2,6 |
+------+-------+
The aggregate function collect_set can achieve what you are trying to do (it is described in the Hive documentation), so you can write a query like:
SELECT Col1, collect_set(Col2)
FROM your_table
GROUP BY Col1;
However, there is one striking difference between MySQL's GROUP_CONCAT and Hive's collect_set: GROUP_CONCAT retains duplicates in the result, whereas collect_set removes duplicate values from the resulting array. In the example you show there are no repeated Col2 values within a group, so you can go ahead and use it.
There is also collect_list, which returns the full list (including duplicates).
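To make the difference concrete, here is a small Python sketch that mimics the two aggregates on in-memory (Col1, Col2) pairs. It is only an illustration of the semantics, not Hive code; the function names mirror the Hive aggregates but are our own. Keeping first-seen order in collect_set is an assumption made for deterministic output, since Hive does not guarantee element order.

```python
from typing import Dict, Iterable, List, Tuple


def collect_list(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group Col2 values by Col1, keeping duplicates (like Hive's collect_list)."""
    groups: Dict[str, List[str]] = {}
    for key, value in rows:
        groups.setdefault(key, []).append(value)
    return groups


def collect_set(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group Col2 values by Col1, dropping duplicates (like Hive's collect_set).

    First-seen order is kept only to make this sketch deterministic;
    Hive itself does not promise any element order.
    """
    return {key: list(dict.fromkeys(values))
            for key, values in collect_list(rows).items()}


rows = [("a", "1"), ("a", "5"), ("a", "5"), ("b", "2")]
print(collect_list(rows))  # {'a': ['1', '5', '5'], 'b': ['2']}
print(collect_set(rows))   # {'a': ['1', '5'], 'b': ['2']}
```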
Try this (the CAST is only needed if Col2 is numeric, because concat_ws expects strings):
SELECT Col1, concat_ws(',', collect_set(CAST(Col2 AS STRING))) AS col2
FROM your_table
GROUP BY Col1;
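As a rough Python model of what concat_ws(',', collect_set(...)) produces per group, the sketch below groups the example table and joins each group's distinct values with a comma. The function name group_concat is hypothetical, and the str() call stands in for converting Col2 to a string before joining.

```python
from typing import Dict, Iterable, List, Tuple


def group_concat(rows: Iterable[Tuple[str, object]], sep: str = ",") -> Dict[str, str]:
    """Model of concat_ws(sep, collect_set(Col2)) ... GROUP BY Col1."""
    groups: Dict[str, List[str]] = {}
    for col1, col2 in rows:
        values = groups.setdefault(col1, [])
        text = str(col2)        # join needs strings, so convert numeric values first
        if text not in values:  # set semantics: skip duplicates
            values.append(text)
    return {col1: sep.join(values) for col1, values in groups.items()}


table = [("a", 1), ("a", 5), ("a", 6), ("b", 2), ("b", 6)]
for col1, col2 in group_concat(table).items():
    print(col1, col2)
# a 1,5,6
# b 2,6
```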
See the apache.org documentation.
I have a huge data set, say 15-20 GB, in a tab-delimited file. While I could do this in either Python or SQL, it would be easier and simpler to do it in a shell script to avoid moving the CSV files.
For example, take this pipe-delimited input file:
----------------------------------------
Col1 | Col2 | Col3 | Col4 | Col5 | Col6
----------------------------------------
A | H1 | 123 | abcd | a1 | b1
----------------------------------------
B | H1 | 124 | abcd | a2 | b1
----------------------------------------
C | H2 | 127 | abd | a3 | b1
----------------------------------------
D | H1 | 128 | acd | a4 | b1
----------------------------------------
The SQL query would look like this (your_table is a placeholder for the input table):
SELECT Col1, Col4, Col5, Col6 FROM your_table WHERE Col2 = 'H1'
Output:
--------------------------
Col1 | Col4 | Col5 | Col6
--------------------------
A | abcd | a1 | b1
--------------------------
B | abcd | a2 | b1
--------------------------
D | acd | a4 | b1
--------------------------
Then I need to take only Col4 from this result, do some string parsing on it, and produce the output below:
OutputFile1:
--------------------------------
Col1 | Col4 | Col5 | Col6 | New1
--------------------------------
A | abcd | a1 | b1 | a,b,c,d
--------------------------------
B | abcd | a2 | b1 | a,b,c,d
--------------------------------
D | acd | a4 | b1 | a,c,d
--------------------------------
Col4 is a URL, and I need to parse its parameters (see the question "How to parse URL params in shell script").
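The filter-and-parse steps above can be sketched as a single streaming pass, which keeps memory use flat regardless of file size. This is a minimal Python sketch under stated assumptions: rows are delimited by one character (pipe or tab), the header and dashed separator lines can simply be skipped, and new1 is a hypothetical stand-in that reproduces the example output ("abcd" -> "a,b,c,d") rather than real URL-parameter parsing. For actual URLs, something like urllib.parse.urlsplit plus urllib.parse.parse_qsl could replace new1.

```python
from typing import Iterable, Iterator, List


def new1(value: str) -> str:
    """Hypothetical stand-in for URL parsing: 'abcd' -> 'a,b,c,d' as in the example."""
    return ",".join(value)


def filter_and_parse(lines: Iterable[str], delim: str = "\t") -> Iterator[List[str]]:
    """Yield [Col1, Col4, Col5, Col6, New1] for rows where Col2 == 'H1'.

    Reads one line at a time, so it can run over a 15-20 GB file
    (e.g. by passing an open file object instead of a list).
    """
    for line in lines:
        cols = [c.strip() for c in line.rstrip("\n").split(delim)]
        if len(cols) < 6 or cols[1] != "H1":
            continue  # skips the header, separator lines and non-H1 rows
        yield [cols[0], cols[3], cols[4], cols[5], new1(cols[3])]


sample = [
    "Col1|Col2|Col3|Col4|Col5|Col6",
    "A|H1|123|abcd|a1|b1",
    "B|H1|124|abcd|a2|b1",
    "C|H2|127|abd|a3|b1",
    "D|H1|128|acd|a4|b1",
]
for row in filter_and_parse(sample, delim="|"):
    print("|".join(row))
# A|abcd|a1|b1|a,b,c,d
# B|abcd|a2|b1|a,b,c,d
# D|acd|a4|b1|a,c,d
```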
I also have another file:
File2:
--------------
ColA | ColB |
--------------
A | abcd |
--------------
B | abcd |
--------------
D | qst |
--------------
I need to generate a similar parsed output for ColB.
OutputFile2:
--------------
ColA | ColB | New1
--------------
A | abcd | a,b,c,d
--------------
B | abcd | a,b,c,d
--------------
D | qst | q,s,t
--------------
The SQL query to merge OutputFile1 and OutputFile2 would do an inner join on:
OutputFile1.Col1 = OutputFile2.ColA and OutputFile1.New1 = OutputFile2.New1
Final Output:
--------------------------------
Col1 | Col4 | Col5 | Col6 | New1
--------------------------------
A | abcd | a1 | b1 | a,b,c,d
--------------------------------
B | abcd | a2 | b1 | a,b,c,d
--------------------------------
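A common way to do this join without a database is a hash join: load the smaller file into an in-memory lookup table and stream the larger one past it. The Python sketch below assumes OutputFile2 fits in memory and that rows are already split into lists; inner_join is a hypothetical name. If neither file fits in memory, sorting both on the join key and merging them (for example with sort and join) is the usual alternative.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Sequence


def inner_join(out1: Iterable[Sequence[str]],
               out2: Iterable[Sequence[str]]) -> Iterator[List[str]]:
    """Inner join on out1.Col1 = out2.ColA AND out1.New1 = out2.New1.

    out1 rows: [Col1, Col4, Col5, Col6, New1]; out2 rows: [ColA, ColB, New1].
    out2 is held in memory; out1 is streamed one row at a time.
    """
    # Count keys so that duplicate keys in out2 repeat rows, as an SQL inner join would.
    keys = Counter((row[0], row[2]) for row in out2)
    for row in out1:
        for _ in range(keys[(row[0], row[4])]):  # Counter yields 0 for missing keys
            yield list(row)


output1 = [
    ["A", "abcd", "a1", "b1", "a,b,c,d"],
    ["B", "abcd", "a2", "b1", "a,b,c,d"],
    ["D", "acd", "a4", "b1", "a,c,d"],
]
output2 = [
    ["A", "abcd", "a,b,c,d"],
    ["B", "abcd", "a,b,c,d"],
    ["D", "qst", "q,s,t"],
]
for row in inner_join(output1, output2):
    print("|".join(row))
# A|abcd|a1|b1|a,b,c,d
# B|abcd|a2|b1|a,b,c,d
```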
Please share suggestions for implementing this.
The major constraint is the size of the file.
Thanks
There's a very simple database management program named "unity" available for UNIX at http://open-innovation.alcatel-lucent.com/projects/unity/. In unity you have 2 main files:
a data file named whatever you like, e.g. "foo", and
a descriptor file with the same base name as the data file but prefixed with "D" for Descriptor, e.g. "Dfoo"
These are both simple text files that you can edit with whatever editor you like (unity also has its own database-aware editor, uedit).
Dfoo would have one row for each column in foo, describing the attributes of the data in that column and its separator from the next column.
foo would have the data.
It's been a while since I used unity in the raw (I have scripts that use it behind the scenes) but for the first table you show above:
----------------------------------------
Col1 | Col2 | Col3 | Col4 | Col5 | Col6
----------------------------------------
A | H1 | 123 | abcd | a1 | b1
----------------------------------------
B | H1 | 124 | abcd | a2 | b1
----------------------------------------
C | H2 | 127 | abd | a3 | b1
----------------------------------------
D | H1 | 128 | acd | a4 | b1
----------------------------------------
the Descriptor file (Dfoo) would be something like:
Col1 | 5c
Col2 | 6c
Col3 | 6c
Col4 | 6c
Col5 | 6c
Col6 \n 6c
and the data file (foo) would be:
A|H1|123|abcd|a1|b1
B|H1|124|abcd|a2|b1
C|H2|127|abd|a3|b1
D|H1|128|acd|a4|b1
You can then run unity commands like:
uprint -d- foo
to print the table with rows separated by lines of underscores and cells of the width specified in your descriptor file (e.g. 6c = 6 characters Centered while 6r = 6 characters Right-justified).
uselect Col2 from foo where Col4 leq abd
to select the values from column Col2 where the corresponding value in Col4 is Lexically EQual to the string "abd".
There are unity commands to let you do joins, merges, inserts, deletes, etc. - basically whatever you'd expect to be able to do with a relational database but it's all just based on simple text files.
In unity you can specify different separators between each column but if all of the separators are the same (except the final one which will be '\n') then you can run awk scripts on the file too just by using awk -F with the separator.
A couple of other toolsets you could look at, which might be easier to install but probably don't have as much functionality as unity (which has been around since the 1970s!), are recutils (from GNU) and csvDB. So your full homework/research list is:
unity: http://open-innovation.alcatel-lucent.com/projects/unity
recutils: http://www.gnu.org/software/recutils
csvDB: http://freecode.com/projects/csvdb
Note that recutils has rec2csv and csv2rec tools for converting between the recutils and CSV formats.
For a pipe delimited file:
awk '$2=="H1"{y="";x=$4;for(i=1;i<=length($4);i++)y=y?y","substr(x,i,1):substr(x,i,1);print $1,$4,$5,$6,y;}' FS="|" OFS="|" file
For a tab-delimited file, set FS to a tab (awk's default FS also splits on spaces and collapses empty fields):
awk '$2=="H1"{y="";x=$4;for(i=1;i<=length($4);i++)y=y?y","substr(x,i,1):substr(x,i,1);print $1,$4,$5,$6,y;}' FS="\t" OFS="\t" file
I want to count similar values into a map, where the key is the value in a Hive table column and the corresponding value is its count.
For example, for the table below:
+-------+-------+
| Col 1 | Col 2 |
+-------+-------+
| Key1 | Val1 |
| Key1 | Val2 |
| Key2 | Val1 |
+-------+-------+
So the Hive query should return something like:
Key1=2
Key2=1
It looks like you are looking for a simple GROUP BY (your_table is a placeholder; TABLE itself is a reserved word in Hive):
SELECT Col1, COUNT(*) FROM your_table GROUP BY Col1;
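For intuition, the GROUP BY above computes the same thing as counting keys into a map. The Python sketch below uses collections.Counter on the example rows; count_by_key is a hypothetical helper name, and the Key=count printing just reproduces the format requested in the question.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_by_key(rows: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Same result as SELECT Col1, COUNT(*) ... GROUP BY Col1, returned as a map."""
    return dict(Counter(col1 for col1, _ in rows))


rows = [("Key1", "Val1"), ("Key1", "Val2"), ("Key2", "Val1")]
for key, count in count_by_key(rows).items():
    print(f"{key}={count}")
# Key1=2
# Key2=1
```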
Is it possible to set the footerrow option on a jqGrid subgrid? I have a main grid that contains many rows, and every row has a subgrid.
Or must I use the Master-Detail approach to do this, like here ("Advanced" -> "Master Detail")?
The grid should look like this:
| Col1 | Col2 | Col3
_______|__________|___________|___________________________
- | value1 | value 2 | 10
|__________|___________|___________________________
| Subgrid col1 | Subgrid col2 | Subcol3
|__________________|__________________|___________
| subgridvalue1 | subgridvalue2 | 15
|__________________|__________________|___________
|another subridval1|another subridval2| 5
|__________________|__________________|___________
| | Totals: | 20
_______|__________________|__________________|___________
+ |oter value|oter value2| 12
_______|__________|___________|___________________________
footer:| | Totals: | 22
I hope you can understand what I mean.
If you use the "Subgrid as Grid" approach, you can have any elements in the subgrids, including the footerrow option.