Does column name length count toward each cell's size when calculating column size in Google BigQuery?

I know that in HBase, for example, you need to keep column names as short as possible to minimize storage size.
Is it the same in Google BigQuery? Should I keep column names as short as possible?

Good news: in BigQuery you don't need to worry about column name length. Be as descriptive as you like, since the column name is part of the table description and not of each record.

Related

Generate a 3 column table by flattening the given multi column table using google sheets formula

I want to generate a 3 column table by flattening the given multi column table using a Google Sheets formula. The formula should scale to an input table of any size.
The sample file is here
https://docs.google.com/spreadsheets/d/1qTvQ58hh1jJEMVWqhQ833jjDBpuudJambmvdJemESPI/edit?usp=sharing
I have tried with
={ARRAYFORMULA(B2:B6),ARRAYFORMULA(D2:D6)}
Disaggregating is not a simple process, and you can do it in different ways. Here is one with REDUCE that doesn't involve joining and splitting. I've moved your input table some columns to the right so the formula is "scalable" to as many companies as you may have:
=QUERY(
REDUCE({"","",""},SEQUENCE(COUNTA(D2:D)),LAMBDA(a,c,{a;
REDUCE({"","",""},SEQUENCE(COUNTA(F1:1)),LAMBDA(d,e,{d;INDEX(D2:D,c),INDEX(F1:1,1,e),INDEX(F2:1000,c,e)}))})),
"Where Col1 is not null",)
You may try:
=hstack(tocol(map(B2:B6,lambda(z,wraprows(z,counta(D1:I1),z))),1,1),tocol(bycol(D1:I1,lambda(z,index(text(SEQUENCE(counta(B2:B6)),"")&z))),1,1),tocol(D2:I6,,1))

Writing a formula in a cell in Google Sheets that averages the results from a column derived from expected values in multiple columns

I'm an average user of Google Sheets, and I've tried writing and looking up the formula I'm going for, but I haven't had any luck yet.
I have a spreadsheet with many values, and I need a single cell to display the average of one column, restricted to the rows that match specific values in several other columns.
The flow of information would look something along the lines of:
if value in Column D=L
then
if value in Column J<$1.20
then
Find Avg of all Values in Column N
I'd need the formula to narrow its field of data each time, so that the final result is the average of all the values in Column N whose rows have a value in Column J<$1.20 and a value in Column D=L.
I feel like a dummy over here because I just can't narrow down how I should write this flow and get it to work right without adding multiple extra hidden columns. Can anyone help on this one?
I've tried writing the formula multiple different ways but haven't kept it written down to pass on.

Power Query - conditional replace/clear entire cell in multiple columns

I'm trying to clear the entire cell if it doesn't contain a given keyword.
I've managed to do this for one column:
Table.ReplaceValue(#"PrevStep",each [#"My Column"], each if Text.PositionOf([#"My Column"],"keyword")>-1 then [#"My Column"] else null,Replacer.ReplaceValue,{"My Column"})
The problem is I need to iterate/repeat that step for a number of columns... the number of columns may vary and column names also may be different every time. I can have all those column names put into a list but I'm not able to use it.
The solution I'm looking for may look like this
for each ColNam in MyColumnsList
Table.ReplaceValue(#"PrevStep",each [#"ColNam"], each if Text.PositionOf([#"ColNam"],"keyword")>-1 then [#"ColNam"] else null,Replacer.ReplaceValue,MyColumnsList)
next
but this is Power Query M, not VBA, and of course the problem is with #"PrevStep": it looks to me like a kind of recursion, and again I do not know how to handle it.
Is the path I am following correct, or should it be done some other way?
Thanks
Andrew
Unpivot your columns to turn all of them into two columns (attribute and value). Apply your replacement to the single value column, then pivot it back into the original format.
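The unpivot, replace, pivot idea can be sketched outside Power Query. The following is only an illustration in plain Python, not M code: the function name `clear_cells_without_keyword` and the sample rows are hypothetical, and rows are modeled as dictionaries rather than a Power Query table.

```python
def clear_cells_without_keyword(rows, columns, keyword):
    """Set a cell to None unless it is a string containing `keyword`.

    rows:    list of dicts, one per table row (hypothetical stand-in for a table)
    columns: names of the columns to process; other columns are left untouched
    """
    # "Unpivot": one (row index, column name, value) triple per selected cell.
    long_form = [
        (row_id, col, row.get(col))
        for row_id, row in enumerate(rows)
        for col in columns
    ]

    # Apply the replacement once, to the single value field.
    cleaned = [
        (row_id, col, value if isinstance(value, str) and keyword in value else None)
        for row_id, col, value in long_form
    ]

    # "Pivot" back: write each cleaned value into a copy of its original row.
    result = [dict(row) for row in rows]
    for row_id, col, value in cleaned:
        result[row_id][col] = value
    return result


sample = [
    {"id": 1, "A": "has keyword here", "B": "nothing"},
    {"id": 2, "A": "nope", "B": "keyword too"},
]
print(clear_cells_without_keyword(sample, ["A", "B"], "keyword"))
```

Because the replacement runs over one value field, the list of column names can change from run to run without touching the replacement logic.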

How to get the sum of values of a column in tmap?

I have 2 columns: Matches (Integer) and Accounts_type (String). I want to create a third column that holds the proportion of matches played by each account type. I am new to Talend and have been stuck on this for the past 2 days; I did a lot of research, but to no avail. Please help.
You can do it like this:
You need to read your source data twice (I used tFixedFlowInput_1 and tFixedFlowInput_2 with the same data). The idea is to calculate the total of your matches in tAggregateRow_1, it simply does a sum of all Matches without a group by column, then use that as a lookup.
The tMap then joins your source data with the calculated total. Since the total will always be one record, you don't need any join column. You then simply divide Matches by Total as required.
This is supposing you have unique values in Account_type; if you don't, you need to add another tAggregateRow between your source and tMap_1, in order to get sum of Matches for each Account_type (group by Account_type).
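The arithmetic behind that job (sum per account type, compute the grand total, then divide) can be sketched in plain Python. This is not Talend code; the function name `match_proportions` and the sample data are made up for illustration.

```python
from collections import defaultdict


def match_proportions(rows):
    """Return {account_type: share of all matches}.

    rows: iterable of (matches, account_type) pairs.
    """
    # Equivalent of the extra tAggregateRow: sum Matches per account type,
    # which also covers account types that appear on several rows.
    per_type = defaultdict(int)
    for matches, account_type in rows:
        per_type[account_type] += matches

    # Equivalent of the lookup flow: a single grand total with no group-by.
    grand_total = sum(per_type.values())
    if grand_total == 0:
        raise ValueError("total number of matches is zero")

    # Equivalent of the tMap expression: Matches / Total.
    return {
        account_type: matches / grand_total
        for account_type, matches in per_type.items()
    }


rows = [(30, "free"), (50, "premium"), (20, "free")]
print(match_proportions(rows))
```

In the Talend job, the division is best done on a floating-point type, since integer division in Java would truncate the proportion to 0.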

Fastest way to find records that end with key

I'm looking for an optimal way to search through millions of records for serial numbers, stored in a varchar column, that end with a specified string key.
I was using EndsWith; however, performance is rather poor when several queries are sent.
Is there a better way to do it?
EDIT:
Since the search key is of variable length, I can't create a column that holds a cut-off value of the serial number. However, I've done some tests using Substring and Equals instead of EndsWith, and I've reduced execution time to 40% of that of EndsWith.
I'm still looking for better solution though :)
Unfortunately, searching for strings ending with a particular pattern is difficult on most databases+, because searching for string suffixes cannot use an index. This results in full table scans, which may be slow on tables with millions of rows.
If your database supports reverse indexes, add one for your string key column; otherwise, you can improve performance by simulating reverse indexes:
- Add a column for storing your string key in reverse:
  - If your RDBMS supports computed columns, add one for the reversed key.
  - Otherwise, define a trigger that populates the reversed column from the key column.
- Create an index on the reversed column.
- Use the reversed column for your searches by passing in the reversed suffix that you are looking for.
For example, if you have data like this
key
-----------
01-02-3-xyz
07-12-8-abc
then the augmented table would have
key rev_key
----------- -----------
01-02-3-xyz zyx-3-20-10
07-12-8-abc cba-8-21-70
and your search for ENDS_WITH(key, '3-xyz') would ask for STARTS_WITH(rev_key, 'zyx-3'). Since string indexes speed up lookups by prefix, the "starts with" lookup would go much faster.
+ One notable exception is Oracle, which provides reverse key indexes specifically for situations like this.
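A minimal sketch of the simulated reverse index, using Python's built-in sqlite3 module. The table name `serials`, the index name, and the helper functions are assumptions made for illustration. The reversed column is filled by the application here, because SQLite has no built-in string reverse function; a computed column or trigger would do that job in other systems. The prefix search is written as a range comparison so that it can use the index.

```python
import sqlite3


def build_table(keys):
    """Create an in-memory table holding each key plus its reversed copy."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE serials (key TEXT NOT NULL, rev_key TEXT NOT NULL)")
    # Application-side stand-in for a computed column or trigger.
    conn.executemany(
        "INSERT INTO serials (key, rev_key) VALUES (?, ?)",
        [(k, k[::-1]) for k in keys],
    )
    conn.execute("CREATE INDEX idx_serials_rev_key ON serials (rev_key)")
    return conn


def find_by_suffix(conn, suffix):
    """Return keys ending with `suffix` via a prefix range scan on rev_key."""
    if not suffix:
        raise ValueError("suffix must not be empty")
    prefix = suffix[::-1]
    # Exclusive upper bound: the prefix with its last character bumped by one.
    # Assumes plain ASCII keys, as in the sample data.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    rows = conn.execute(
        "SELECT key FROM serials WHERE rev_key >= ? AND rev_key < ? ORDER BY key",
        (prefix, upper),
    )
    return [row[0] for row in rows]


conn = build_table(["01-02-3-xyz", "07-12-8-abc"])
print(find_by_suffix(conn, "3-xyz"))  # ['01-02-3-xyz']
```

The search for keys ending in '3-xyz' becomes a search for rev_key values in the range from 'zyx-3' up to, but not including, 'zyx-4', which is the "starts with" lookup described above.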
