BIRT: Sort a Table based on Multiple Columns

I have a BIRT report generated from a Java POJO data source.
| Interface | column1 | column2 |
+-----------+---------+---------+
|           |         |         |
I want to sort the rows of the table above by the sum of two columns, column1 and column2, and then display the top 10 rows.
In the Sorting tab, I used the expression row["column1"]+row["column2"]. In the Filters tab, I used the same expression, row["column1"]+row["column2"], with Operator set to Top n and Value1 set to 10.
But I'm not getting the desired result.

I am posting this answer in case it is useful to other BIRT beginners.
Create a new column binding with the expression: ( row["column1"]+row["column2"] )
Sort the table on that binding.
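For readers who want to see the logic outside the report designer, here is a minimal Python sketch of what the binding plus sort amounts to: compute the sum once per row, sort descending on it, keep the first 10. This is not BIRT API; the sample rows, the "total" key and the helper name top_n_by_sum are all made up for illustration.

```python
# Hypothetical rows standing in for the POJO data set.
rows = [
    {"interface": f"eth{i}", "column1": i, "column2": 20 - 2 * i}
    for i in range(15)
]

def top_n_by_sum(rows, n=10):
    # Compute the combined value once per row (the role the column binding plays),
    # then sort descending on it and keep the first n rows.
    bound = [dict(r, total=r["column1"] + r["column2"]) for r in rows]
    return sorted(bound, key=lambda r: r["total"], reverse=True)[:n]

top10 = top_n_by_sum(rows)
```

The point is that the sort and the "top n" cut both work from the same precomputed value instead of re-evaluating the expression in two places.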

Related

How to groupBy on one column in laravel?

I have a section table and a class table.
The class table is designed like this:
(id, class_name, section_id)
One class has many sections, for example:
--------------------------------
| SN | ClassName | Section_id |
--------------------------------
| 1  | ClassOne  | 1          |
| 2  | ClassOne  | 2          |
| 3  | ClassOne  | 3          |
| 4  | ClassOne  | 4          |
--------------------------------
Now I want to group by ClassName only and display all the sections of that class.
$data['classes'] = SectionClass::groupBy('class_name')->paginate(10);
I have grouped it like this, but it only gives me one section ID.
Try this way...
$things = SectionClass::paginate(10);
$data['classes']= $things->groupBy('class_name');
You are getting just one row because that is what GROUP BY does: it groups a set of rows into summary rows and returns one row per group. In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause. For example, in SQL Server, if you try the following query
SELECT * FROM [Class] GROUP BY [ClassName]
you'll get the following error:
"Column 'SN' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause"
Think about it: you are grouping by ClassName, and with your sample data this returns just one row. Your SELECT clause includes the column ClassName, which is easy to return because it is the same in every row of the group. But when you select another column, which of its values should be returned if only one can be selected?
Things are a little different in MySQL. MySQL extends the standard SQL use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause, which means the preceding query is legal in MySQL. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same within each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are nondeterministic. (This describes MySQL 5.6; from MySQL 5.7.5 the default SQL mode includes ONLY_FULL_GROUP_BY, which rejects queries like this one unless that mode is disabled.) A complete explanation is here: https://dev.mysql.com/doc/refman/5.6/en/group-by-handling.html
If you want the result in one row, you can use the GROUP_CONCAT() function to get something like:
--------------------------------
| ClassName | Sections |
--------------------------------
| ClassOne | 1,2,3,4 |
--------------------------------
Your query would be something like:
select `ClassName`, group_concat(Section_id) from `class` group by `ClassName`
You can run this as a raw query in Laravel, or it's up to you to find a way to get the same result using the query builder ;)
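If you want to try the one-row-per-class idea without a MySQL server, here is a rough sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here (it also has a group_concat() aggregate); the table and column names follow the sample above, and the in-memory database is an assumption for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL table in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE class (SN INTEGER, ClassName TEXT, Section_id INTEGER)")
conn.executemany(
    "INSERT INTO class VALUES (?, ?, ?)",
    [(1, "ClassOne", 1), (2, "ClassOne", 2), (3, "ClassOne", 3), (4, "ClassOne", 4)],
)

# One summary row per class, with all section ids collapsed into one string.
rows = conn.execute(
    "SELECT ClassName, group_concat(Section_id) AS Sections "
    "FROM class GROUP BY ClassName"
).fetchall()

# SQLite does not promise the concatenation order, so split and sort before use.
sections = {name: sorted(int(s) for s in joined.split(",")) for name, joined in rows}
```

Here sections maps each class name to its list of section ids, which is usually easier to loop over in a view than the comma-separated string.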

cassandra query on map in select clause

I am new to Cassandra and I am trying to read a row from a table that contains these values:
siteId | country | someMap
1 | US | {a:b, x:z}
2 | PR | {a:b, x:z}
I have also created an index on the table using create index on columnfamily(keys(someMap));
But when I query with select * from table where siteId=1 and someMap contains key 'a'
it still returns the entire map:
1 | US | {a:b, x:z}
Can somebody tell me what I should do to get only the matching entry, like this:
1 | US | {a:b}
You cannot. Even though each entry of a Map, List, or Set is stored internally as a column, you can only retrieve the whole collection, not part of it. Your query does not ask Cassandra for the map entry with key X; it asks for the rows whose map contains key X.
HTH,
Carlo
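If the server only hands back the whole collection, as described above, one workaround is to trim the map on the client after the query returns. A minimal Python sketch, assuming the driver gives you each row as a plain dict (the row shape and the helper name keep_keys are hypothetical):

```python
def keep_keys(row, wanted):
    # Copy the row so the original result is left untouched,
    # then keep only the requested entries of the map column.
    trimmed = dict(row)
    trimmed["someMap"] = {k: v for k, v in row["someMap"].items() if k in wanted}
    return trimmed

# Stand-in for a row returned by:
#   select * from table where siteId=1 and someMap contains key 'a'
row = {"siteId": 1, "country": "US", "someMap": {"a": "b", "x": "z"}}
result = keep_keys(row, {"a"})
```

The index on keys(someMap) still does its job of selecting the right rows; the trimming just happens in application code.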

How to display two sums in the same query in Hive

I have a Hive table with the following fields:
id STRING, x STRING
where x can have values such as 'c'.
I need a query that displays the number of rows where column x has the value 'c' and the number of rows where x has any other value.
id | count(x='c') | count(x<>'c')
---|--------------|--------------
1  | 3            | 7
I don't know if it's possible.
You can try:
SELECT sum(if(x='c',1,0)), sum(if(x!='c',1,0)) FROM table_name;
This returns two columns. Rows where x is NULL are counted in neither sum. I didn't understand the id field in your sample output.
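To see the conditional-sum pattern run end to end, here is a small sketch using Python's sqlite3 module as a stand-in for Hive. SQLite has no two-branch if(), so the sketch uses the equivalent CASE WHEN form; the sample data (3 rows with 'c', 7 with other values) is made up to match the expected output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id TEXT, x TEXT)")

# Hypothetical data: three rows with 'c' and seven rows with other values.
values = ["c"] * 3 + ["a"] * 4 + ["b"] * 3
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [(str(i), x) for i, x in enumerate(values)],
)

# Each SUM adds 1 for rows matching its condition and 0 otherwise,
# so both counts come out of a single pass over the table.
count_c, count_other = conn.execute(
    "SELECT SUM(CASE WHEN x = 'c' THEN 1 ELSE 0 END), "
    "SUM(CASE WHEN x <> 'c' THEN 1 ELSE 0 END) FROM table_name"
).fetchone()
```

CASE WHEN is also valid in Hive, so the same SELECT should carry over if you prefer it to if().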

Hive: SemanticException [Error 10002]: Line 3:21 Invalid column reference 'name'

I am using the following Hive query script with Hive version 0.13.0:
DROP TABLE IF EXISTS movies.movierating;
DROP TABLE IF EXISTS movies.list;
DROP TABLE IF EXISTS movies.rating;
DROP DATABASE IF EXISTS movies;
ADD JAR /usr/local/hadoop/hive/hive/lib/RegexLoader.jar;
CREATE DATABASE IF NOT EXISTS movies;
CREATE EXTERNAL TABLE IF NOT EXISTS movies.list (id STRING, name STRING, genre STRING)
ROW FORMAT SERDE 'com.cisco.hadoop.loaders.RegexSerDe' with SERDEPROPERTIES(
"input.regex"="^(.*)\\:\\:(.*)\\:\\:(.*)$",
"output.format.string"="%1$s %2$s %3$s");
CREATE EXTERNAL TABLE IF NOT EXISTS movies.rating (id STRING, userid STRING, rating STRING, timestamp STRING)
ROW FORMAT SERDE 'com.cisco.hadoop.loaders.RegexSerDe'
with SERDEPROPERTIES(
"input.regex"="^(.*)\\:\\:(.*)\\:\\:(.*)\\:\\:(.*)$",
"output.format.string"="%1$s %2$s %3$s %4$s");
LOAD DATA LOCAL INPATH 'ml-10M100K/movies.dat' into TABLE movies.list;
LOAD DATA LOCAL INPATH 'ml-10M100K/ratings.dat' into TABLE movies.rating;
CREATE TABLE movies.movierating(id STRING, name STRING, genre STRING, rating STRING);
INSERT OVERWRITE TABLE movies.movierating
SELECT list.id, list.name, list.genre, rating.rating from movies.list list LEFT JOIN movies.rating rating ON (list.id=rating.id) GROUP BY list.id;
The issue is that the script works fine when I execute it without the GROUP BY clause.
But when I execute it with the GROUP BY clause, I get the following error:
FAILED: SemanticException [Error 10002]: Line 4:21 Invalid column reference 'name'
Any ideas what is happening here?
I appreciate your help.
Thanks!
If you group by a column, your select statement can only select a) that column, b) columns derived only from that column, or c) a UDAF applied to other columns.
In this case, you're only grouping by list.id, so when you try to select list.name, that's invalid. Think about it this way: what if your list table contained the following two entries:
id|name |genre
--+-----+------
01|name1|comedy
01|name2|horror
What would you expect this query to return:
select list.id, list.name, list.genre from list group by list.id;
In this case it's nonsensical. I'm guessing that id is in reality a primary key, but note that Hive does not know this, so the above data set is perfectly valid.
With all that in mind, it's not clear to me how to fix it because I don't know the desired output. For example, let's say without the group by (just the join), you have as output:
id|name |genre |rating
--+-----+------+-------
01|name1|comedy|'pretty good'
01|name1|comedy|'bad'
02|name2|horror|'9/10'
03|name3|action|NULL
What would you want the output to be with the group by? What are you trying to accomplish by doing the group by?
OK, let me see if I can ask this in a better way.
Here are my two tables.
The movies list table holds the movie information:
ID | Movie Name | Genre
1  | Movie 1    | comedy
2  | Movie 2    | action
3  | Movie 3    | thriller
And I have a ratings table:
MOVIE_ID | USER ID | RATING on 5 | TIMESTAMP
1        | xyz     | 5           | 12345612
1        | abc     | 4           | 23232312
2        | zvc     | 1           | 12321123
2        | zyx     | 2           | 12312312
What I would like to get is output like this:
Movie ID | Movie Name | Genre  | Rating Average
1        | Movie 1    | comedy | 4.5
2        | Movie 2    | action | 1.5
I am not a DB expert, but as I understand it, when you group rows together you need to either reduce the multiple values in each group to a single scalar value, or the values (if they are strings) must all be the same, right?
For example, in my previous attempt I was grouping everything as strings. That is fine for list.id, list.name and list.genre, but rating.rating is always going to cause a problem here. (I just learned Pig along with Hive, and grouping works differently there.)
So to tackle the problem, I cast the rating to FLOAT, averaged it, and stored the result in a FLOAT column. Have a look at my code below:
CREATE TABLE movies.movierating(id STRING, name STRING, genre STRING, rating FLOAT);
INSERT OVERWRITE TABLE movies.movierating
SELECT list.id, list.name, list.genre, AVG(cast(rating.rating as FLOAT)) from movies.list list LEFT JOIN movies.rating rating ON (list.id=rating.id) GROUP BY list.id, list.name,list.genre order by list.id DESC;
Thank you for your explanation. I might save the following question for another thread, but here is my observation:
Performance of the overall job is worse when grouping and joining are done together than when they are done in two separate queries. For the same job, I changed the code to perform the grouping first and then join the data, and the overall time dropped by 40 seconds: earlier it took 140 seconds, and now it takes 100 seconds. Any reason for that?
Once again, thank you for your explanation.
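For anyone who wants to check the grouped join against the sample tables from this thread, here is a rough sketch using Python's sqlite3 module. SQLite is only a stand-in for Hive; the tables are created without the movies database prefix, the timestamp column is renamed ts, and REAL replaces FLOAT. All of those are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE list (id TEXT, name TEXT, genre TEXT);
CREATE TABLE rating (id TEXT, userid TEXT, rating TEXT, ts TEXT);
INSERT INTO list VALUES
  ('1', 'Movie 1', 'comedy'), ('2', 'Movie 2', 'action'), ('3', 'Movie 3', 'thriller');
INSERT INTO rating VALUES
  ('1', 'xyz', '5', '12345612'), ('1', 'abc', '4', '23232312'),
  ('2', 'zvc', '1', '12321123'), ('2', 'zyx', '2', '12312312');
""")

# Every selected column is either named in GROUP BY or wrapped in an aggregate,
# which is the rule the SemanticException was enforcing.
rows = conn.execute("""
    SELECT list.id, list.name, list.genre, AVG(CAST(rating.rating AS REAL))
    FROM list LEFT JOIN rating ON list.id = rating.id
    GROUP BY list.id, list.name, list.genre
    ORDER BY list.id
""").fetchall()
```

Because of the LEFT JOIN, Movie 3 still appears in the result, with a NULL (None in Python) average since it has no ratings.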
I came across the same issue:
org.apache.hadoop.hive.ql.parse.SemanticException: Invalid column reference "charge_province"
After I added "charge_province" to the GROUP BY clause, the issue went away. I don't know why.

How should I range partition an index with a varchar2 column in Oracle? Is it a bad idea?

I am using Oracle 10g Enterprise edition.
A table in our Oracle database stores the soundex value representation of another text column. We are using a custom soundex implementation in which the soundex values are longer than are generated by traditional soundex algorithms (such as the one Oracle uses). That's really beside the point.
Basically I have a varchar2 column that has values containing a single character followed by a dynamic number of numeric values (e.g. 'A12345', 'S382771', etc). The table is partitioned by another column, but I'd like to add a partitioned index to the soundex column since it is often searched. When trying to add a range partitioned index using the first character of the soundex column it worked great:
create index IDX_NAMES_SOUNDEX on NAMES_SOUNDEX (soundex)
global partition by range (soundex) (
partition IDX_NAMES_SOUNDEX_PART_A values less than ('B'), -- 'A%'
partition IDX_NAMES_SOUNDEX_PART_B values less than ('C'), -- 'B%'
...
);
However, in order to distribute the size of the partitions more evenly, I want to define some partitions by the first two characters, like so:
create index IDX_NAMES_SOUNDEX on NAMES_SOUNDEX (soundex)
global partition by range (soundex) (
partition IDX_NAMES_SOUNDEX_PART_A5 values less than ('A5'), -- 'A0% - A4%'
partition IDX_NAMES_SOUNDEX_PART_A values less than ('B'), -- 'A5% - A9%'
partition IDX_NAMES_SOUNDEX_PART_B values less than ('C'), -- 'B%'
...
);
I'm not sure how to properly range partition using varchar2 columns. I'm sure this is a less than ideal choice, so perhaps someone can recommend a better solution. Here's a distribution of the soundex data in my table:
-----------------------------------
| SUBSTR(SOUNDEX,1,1) | COUNT |
-----------------------------------
| A | 6476349 |
| B | 854880 |
| D | 520676 |
| F | 1200045 |
| G | 280647 |
| H | 3048637 |
| J | 711031 |
| K | 1336522 |
| L | 348743 |
| M | 3259464 |
| N | 1510070 |
| Q | 276769 |
| R | 1263008 |
| S | 3396223 |
| V | 533844 |
| W | 555007 |
| Y | 348504 |
| Z | 1079179 |
-----------------------------------
As you can see, the distribution is not evenly spread, which is why I want to define range partitions using the first two characters instead of just the first character.
Suggestions?
Thanks!
What exactly is your question?
Are you asking how to split your table into n equal parts to avoid skew?
You can do that with the percentile_disc() function.
Here is a SQL*Plus example with n=100. I admit it isn't very sophisticated, but it will do the job.
set pages 0
set lines 200
drop table random_strings;
create table random_strings
as
select upper(dbms_random.string('A', 12)) rndmstr
from dual
connect by level < 1000;
spool parts
select 'select '||level||'/100,percentile_disc('||level||
'/100) within group (order by RNDMSTR) from random_strings;'
sql_statement
from dual
connect by level <= 100
/
spool off
This will output in file parts.lst:
select 1/100,percentile_disc(1/100) within group (order by RNDMSTR) from random_strings;
select 2/100,percentile_disc(2/100) within group (order by RNDMSTR) from random_strings;
select 3/100,percentile_disc(3/100) within group (order by RNDMSTR) from random_strings;
...
select 100/100,percentile_disc(100/100) within group (order by RNDMSTR) from random_strings;
Now you can run the script parts.lst to get the partition boundary values. Each partition will initially contain 1% of the data.
Running parts.lst outputs:
,01 AJUDRRSPGMNP
,02 AOMJZQPZASQZ
,03 AWDQXVGLLUSJ
,04 BIEPUHAEMELR
....
,99 ZTMHDWTXUJAR
1 ZYVJLNATVLOY
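To make the idea behind the generated queries concrete, here is a small Python sketch of the same boundary calculation: sort the values and, for each k/n, pick the first value whose cumulative share of rows reaches that fraction. It mimics what percentile_disc does but is not Oracle code; the function name partition_bounds and the sample soundex-like values are made up, and it assumes a non-empty input.

```python
def partition_bounds(values, parts):
    # For k = 1..parts, pick the smallest value whose cumulative share of the
    # sorted rows is at least k/parts (the discrete percentile).
    ordered = sorted(values)
    n = len(ordered)
    bounds = []
    for k in range(1, parts + 1):
        rank = (k * n + parts - 1) // parts  # ceil(k * n / parts) in integer math
        bounds.append(ordered[max(rank, 1) - 1])
    return bounds

# Hypothetical skewed data: 600 values starting with 'A', 400 starting with 'S'.
sample = [f"A{n:05d}" for n in range(600)] + [f"S{n:05d}" for n in range(400)]
bounds = partition_bounds(sample, 4)
```

With this sample, two of the four boundaries fall inside the 'A' values, which is the kind of split the question was after. Note that a "values less than" bound in a partition definition is exclusive, so the values found this way would still need to be turned into suitable upper bounds.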
Is the table being searched by the partitioning key in addition to the SOUNDEX value? Or is it being searched by the SOUNDEX column alone?
If you are just trying to achieve an even distribution of data among partitions, have you considered using hash partitions rather than range partitions? Assuming you choose a power of 2 for the number of partitions, that should give you a pretty even distribution of data between partitions.
Talk to me!
Can you tell me what your reason is for partitioning this table? It sounds like it is an OLTP table and may not need to be partitioned. We don’t want to partition just to say we are partitioned. Tell me what you are trying to accomplish by partitioning this table and I can help you pick a suitable partitioning scheme. Partitioning does not equal faster queries. It can actually make your queries slower in some cases.
I see some of your additional thoughts above and I don’t believe you need to partition your table. If your queries are going to compute aggregates over entire partitions, then you may want to partition. If you are going to have hundreds of millions of rows of data, you may want to partition to help with DBA maintenance. If you just want your queries to run fast, then the primary key index will suffice. Please let me know.
Just create a global index on your desired columns.

Resources