My records in 'answers table':
id  question_id  answer
20  12           app/Http/routes.ph
21  13           uri
22  13           closure
23  14           controller
24  15           class name
25  15           App\Http\Controllers
26  16           for displayh
My code to retrieve the data:
$qas = DB::table('answers')
    ->groupBy('question_id')
    ->get();
dump($qas);
I get only 5 records instead of 7. Where are the other 2 records?
It behaves like DISTINCT, which is not what I want.
How do I write code to get all 7 records, grouped by 'question_id'?
You need to understand the concept of GROUP BY: it collapses rows that share the same value into a single row, e.g. 13, 13 -> 13 | 15, 15 -> 15. If you want to get every answer, you have to write this:
DB::table('answers')->get();
Edit:
DISTINCT and GROUP BY have the same effect in your case.
If you can explain your problem in more detail, there may be a possible solution.
You're talking about a result with 7 rows, but that's not grouping. Maybe you mean ordering instead of grouping? If so, you'd want to use ->orderBy('question_id', 'asc'), which will give you 7 rows sorted by question_id.
In SQL, the GROUP BY clause is useful when you need to aggregate the result in some way, for example to compute a SUM or an AVERAGE.
Your case is different: you don't need to aggregate data with a function, you simply need a data structure other than a flat result list.
For your needs, you should avoid GROUP BY and regroup the data programmatically in your code.
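As a rough illustration of regrouping in code rather than in SQL, here is a minimal sketch in Python (the thread itself uses Laravel/PHP; the `rows` list and variable names are hypothetical stand-ins for the result of a plain fetch without GROUP BY). It keeps all 7 rows and nests them under their question_id.

```python
from collections import defaultdict

# Hypothetical copy of the 'answers' rows as (id, question_id, answer) tuples,
# i.e. what a plain fetch without GROUP BY would return.
rows = [
    (20, 12, "app/Http/routes.ph"),
    (21, 13, "uri"),
    (22, 13, "closure"),
    (23, 14, "controller"),
    (24, 15, "class name"),
    (25, 15, "App\\Http\\Controllers"),
    (26, 16, "for displayh"),
]

# Nest every answer under its question_id; no row is dropped.
grouped = defaultdict(list)
for row_id, question_id, answer in rows:
    grouped[question_id].append(answer)

for question_id, answers in sorted(grouped.items()):
    print(question_id, answers)
```

In recent Laravel versions, get() returns a collection, and the collection's own groupBy method should give a similar nested structure without touching the SQL.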
I have a table like this:
a  b  c
1  2  abc
2  3  4.00
Note that C2 is text while C3 is a number.
When I do
=QUERY(A1:C,"select *")
The result is like
a  b  c
1  2
2  3  4.00
The text in C2 is missing from the result. You can see the live sheet here:
https://docs.google.com/spreadsheets/d/1UOiP1JILUwgyYUsmy5RzQrpGj7opvPEXE46B3xfvHoQ/edit?usp=sharing
How to deal with this issue?
QUERY is very useful, but it has one main limitation: it can only handle one type of data per column. Values of any other type are left blank. There are usually ways to try to work around this from inside QUERY, but I've found them unfruitful. What you can do instead is simply use:
={A:C}
You can work with FILTER on its own, but here is a step-by-step way to reproduce the main features of QUERY. If you need to add conditions, use LAMBDA, INDEX and FILTER.
For example, to check where A is not null:
=LAMBDA(quer,FILTER(quer,INDEX(quer,,1)<>""))({A:C})
Here INDEX(quer,,1) accesses the first column.
Where B is greater than one cell (D1) and less than another (D2):
=LAMBDA(quer,FILTER(quer,INDEX(quer,,2)>D1,INDEX(quer,,2)<D2))({A:C})
For sorting and limiting the number of items, use SORTN. For example, to sort by the 3rd column and keep only the 5 highest values in that column:
=LAMBDA(quer,SORTN(FILTER(quer,INDEX(quer,,1)<>""),5,1,3,0))({A:C})
Or, to limit the result to 5 rows without sorting, use ARRAY_CONSTRAIN (the last argument is the number of columns to keep, 3 for A:C):
=ARRAY_CONSTRAIN(LAMBDA(quer,FILTER(quer,INDEX(quer,,1)<>""))({A:C}),5,3)
There are other options: you can use REGEXMATCH and similar functions to emulate QUERY's features without losing data. Let me know!
shenkwen,
If you are comfortable adding a Google Apps Script to your sheet to get a custom function, I have a QUERY replacement function that supports all standard SQL SELECT syntax. It does not analyze the column data to force a single type based on the most common data in the column, so this is not an issue.
The custom function code is a single file, available at:
https://github.com/demmings/gsSQL/tree/main/dist
After you save it, a new function is available in your sheet. For your example, the syntax would be:
=gsSQL("select a,b,c from testTable", {{"testTable", "F150:H152", 60, true}})
If your data is on a separate tab called 'testTable' (or whatever you want), the second parameter is not required.
I have typed your example data into my test sheet (see line 150):
https://docs.google.com/spreadsheets/d/1Zmyk7a7u0xvICrxen-c0CdpssrLTkHwYx6XL00Tb1ws/edit?usp=sharing
I have a couple of joined Athena tables in Quicksight. The data looks something like this:
Ans_Count | ID | Alias
10 | 1 | A
10 | 1 | B
10 | 1 | C
20 | 2 | D
20 | 2 | E
20 | 2 | F
I want to create a calculated field that sums the Ans_Count column over distinct IDs only, i.e. in the example above the result should be 30.
How do I do that? Thanks!
Are you looking for the sum before or after applying a filter?
Sumif(Ans_Count,ID) may be what you're looking for.
If you need to always return the result of the sum, regardless of the filter on the visual, look at the sumOver() function.
You can use distinctCountOver at PRE_AGG level to count unique number of values for a given partition. You could use that count to drive the sumIf condition as well.
Example : distinctCountOver(operand, [partition fields], PRE_AGG)
More details about the visual's group-by specification, plus an example that contains duplicate IDs, would help in giving a specific solution.
It might even be as simple as minOver(Ans_Count, [ID], PRE_AGG) and using SUM aggregation on top of it in the visual.
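To make the intended arithmetic concrete, here is a minimal sketch in plain Python (not QuickSight syntax; the `rows` list simply mirrors the example table and the variable names are hypothetical). It keeps one Ans_Count per ID, taking the minimum as minOver would, and then sums those values.

```python
# Hypothetical rows mirroring the example table: (Ans_Count, ID, Alias).
rows = [
    (10, 1, "A"), (10, 1, "B"), (10, 1, "C"),
    (20, 2, "D"), (20, 2, "E"), (20, 2, "F"),
]

# Keep a single Ans_Count per ID (the minimum within each ID partition).
min_per_id = {}
for ans_count, id_, _alias in rows:
    min_per_id[id_] = min(ans_count, min_per_id.get(id_, ans_count))

# Sum one value per distinct ID instead of one value per row.
total = sum(min_per_id.values())
print(total)  # 30
```

Summing every row instead would give 90, which is why the per-ID reduction has to happen before the final sum.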
If you want another column with the values repeated, use sumOver(Ans_Count, [ID], PRE_AGG). Or, if you want to aggregate via QuickSight, you would use sumOver(sum(Ans_Count), [ID]).
I agree with the above suggestions to use sumOver(sum(Ans_Count), [ID]).
I have yet to understand the use cases for PRE_AGG, so if anyone has concrete examples, please share them!
Another suggestion would be to do a SUM OVER with PARTITION BY in your table (if possible) before uploading the dataset, and then check whether the results match QuickSight's aggregations. I find QuickSight can be tricky with calculated fields, aggregations, and nested ifs, so I've been doing calculations in SQL where possible before bringing the data into QuickSight, to get a better grasp of what the outputs should look like. This is obviously an extra step, but it helps in understanding how QuickSight performs calculations and arrives at its figures (the documentation doesn't always say much), and in spotting things that don't look right (I've had a few) before you share your analysis with a wider group.
I have data in table A as below
Assetid  attribute  value
1546     Ins_date   05062011
1546     status     active
1546     X          10.4567
1546     Y          27.56
1546     size       17
675      X          4.778
675      Y          53.676
675      depth      5
675      st_date    06092010
Table A contains many asset IDs (1546, 675, etc.), and the attributes may vary from asset to asset.
I want output as below:
assetid  ins_date  status  X        Y       Size  depth  st_date
1546     05062011  active  10.4567  27.56   17    null   null
675      null      null    4.778    53.676  null  5      06092010
I created a stored procedure and called it from Informatica to produce this output. However, since I have a large volume of data, it takes a long time to load.
Please suggest an easier and better way to load it.
Use a router to split the rows into separate groups depending on attribute and then use a set of joiners to merge the rows with the same assetid values.
Use an Aggregator transformation to condense the records into one record per assetid. Then for each attribute, create a port that returns MAX(value) where the attribute matches. Note that this method assumes that you know all possible attributes ahead of time.
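The aggregation idea can be sketched outside Informatica as follows. This is a minimal Python illustration, not a PowerCenter mapping: the `rows` list copies the sample data, the `attributes` list stands in for the attributes known ahead of time, and all names are hypothetical. Each asset gets one record, and each attribute column holds the maximum value seen for that attribute, or None when the asset has no such attribute.

```python
# Hypothetical sample of table A as (assetid, attribute, value) rows.
rows = [
    (1546, "Ins_date", "05062011"),
    (1546, "status", "active"),
    (1546, "X", "10.4567"),
    (1546, "Y", "27.56"),
    (1546, "size", "17"),
    (675, "X", "4.778"),
    (675, "Y", "53.676"),
    (675, "depth", "5"),
    (675, "st_date", "06092010"),
]

# The full set of attributes must be known up front, as the answer notes.
attributes = ["Ins_date", "status", "X", "Y", "size", "depth", "st_date"]

# One record per assetid; every attribute starts as None (null).
pivoted = {}
for assetid, attribute, value in rows:
    record = pivoted.setdefault(assetid, dict.fromkeys(attributes))
    current = record[attribute]
    # Equivalent of MAX(value) restricted to rows with this attribute.
    record[attribute] = value if current is None else max(current, value)

for assetid, record in pivoted.items():
    print(assetid, [record[name] for name in attributes])
```

With one row per asset and attribute, as in the sample, the MAX simply picks the only value present; it is there to collapse the group into a single record.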
As suggested in the previous answer, you can use the Aggregator. Since your data set is large, you can also use a technique based on a variable port in an Expression transformation, provided the data is sorted before it reaches the expression.
You can download sample mappings that demonstrate both techniques from the Informatica Marketplace app titled "PowerCenter Mapping: Convert Rows Into Columns".
Is there any way to limit the number of columns returned for a particular row in HBase? I have seen methods to limit rows. I wonder if there is any way I can limit the column family values.
Like,
row columnfamily(page) value
1 page:1 1
1 page:2 2
1 page:3 3
I need to retrieve row1 values for column families page:1 and page:2
Is it possible?
There are a number of different ways that you can go with this problem. Basically, you want a server-side filter to limit your return data in a Get/Scan. Normally, this would be done with a co-processor, but that is still under development, so you really want to apply a filter to your query.
Example Filters: http://svn.apache.org/repos/asf/hbase/branches/0.90/src/main/java/org/apache/hadoop/hbase/filter/
The easiest example would be a prefix filter (although it looks like you want some sort of range filter). Just to give you a rough idea of how this would work, here's how you apply a ColumnPrefixFilter to a Scan:
HTable myTable; // predefined
Scan scan; // predefined
scan.setFilter(new ColumnPrefixFilter(Bytes.toBytes("myprefix")));
return myTable.getScanner(scan);
It is possible.
When scanning, use http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#addColumn(byte[], byte[])
When getting, use http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#addColumn(byte[], byte[])
If the column keys are predictable (for example, the key is an index), you can add them by iterating up to a particular value. You could also use filters if the condition is arbitrary or complicated, for example > 1 and < 3, or key in (3, 10, 11). There is a host of pre-implemented filters; you would probably be interested in the qualifier filter.
Hope this helps.
Greetings,
In my Reporting Services report I would like to add sorting. Is there any way I can sort by two fields inside one column's sort expression? Something like:
=Fields!SomeValue1.Value
=Fields!Somevalue2.Value
When I use this sort expression, the values are not sorted correctly.
The values I would like to sort look something like this:
SomeValue1 SomeValue2
10 11
9 1
20 21
13 12
13 7
17 6
The thing is that SomeValue1 and SomeValue2 are derived from another value, which looks as follows:
10-11
9-1
20-21
13-12
13-7
17-6
Any help would be appreciated.
I can think of three possibilities:
1. (Simplest) Include an order by SomeValue1, SomeValue2 clause at the end of your SQL query.
2. Sort by two expressions in Table Properties, i.e. in the Sorting tab of the Table Properties dialog, enter =Fields!SomeValue1.Value as the expression on the first line, then click on the line below and enter =Fields!SomeValue2.Value.
3. (Hardest) Sort on a single expression in Table Properties consisting of SomeValue1 and SomeValue2 converted to 0-padded, fixed-length strings and concatenated together, similar to the original value but formatted consistently, like so: 0000000001-0000000001.
I recommend the first approach.
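To show why the combined values need either two numeric keys or zero-padding, here is a minimal sketch in Python (not an SSRS expression; the helper names `numeric_key` and `padded_key` are hypothetical). It sorts the sample values once by the two numbers and once by a 0-padded, fixed-length string, and both give the same order, whereas a plain string sort does not.

```python
# The combined values from the question.
values = ["10-11", "9-1", "20-21", "13-12", "13-7", "17-6"]

def numeric_key(value):
    """Sort key made of the two parts as integers (two sort expressions)."""
    first, second = value.split("-")
    return int(first), int(second)

def padded_key(value):
    """Sort key as one 0-padded, fixed-length string (single sort expression)."""
    first, second = value.split("-")
    return f"{int(first):010d}-{int(second):010d}"

by_numbers = sorted(values, key=numeric_key)
by_padded = sorted(values, key=padded_key)
as_text = sorted(values)  # plain string comparison, for contrast

print(by_numbers)  # ['9-1', '10-11', '13-7', '13-12', '17-6', '20-21']
print(as_text)     # ['10-11', '13-12', '13-7', '17-6', '20-21', '9-1']
```

The plain text sort places 9-1 last and 13-12 before 13-7, which is the kind of incorrect ordering described in the question.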