Get names of columns in column family without data - hadoop

I have column family with generated columns and heavy data in it.
Every column has name like PREFIX + TIMESTAMP.
I need to extract timestamp from column name before than I make a decision whether I need to get data from it.
How can I get only names of columns in CF?

Related

D365/Dataverse - Create Calculated/Look Up Column that is set to the highest date in another table

I have Table 1. It is filled with dates a inspection is going on. Plumbing or Garden inspections for example.
Table 2 links to these appointments and has additional columns with details such as a Person assigned to the inspection, and what property the inspection is at. I need these two tables to be separate as described, and they are linked by a simple ID column.
Is it possible at all to add a column to Table 2 called 'Last Date of Plumbing Inspection'. The idea is for any given Property in Table 2, there can be multiple inspection entries in Table 1 for it. The point of this column is that it should look in Table 1, find the matching ID, find the latest inspection date out of all the Plumbing-related inspections, and then set the column value to that.
The problem I am having with this is it seems like calculated columns can ONLY implement logic using the columns of the table the calculated column was created in. In Table 2, I can't create a calculated column that interacts with Table 1 at all. I could create a look up column, but I can't combine calculated columns with look up columns. Is there a way to build this latest inspection date column without too much complexity?
Actually you can create a Rollup field and put a MAX aggregate function for achieving your requirement from related table. Read more

NiFi : Create file name using the field values from the csv flowfile

I have a csv flowfile with single record. I need to create its file name based on couple of column values in the csv file. Can you please let me know how we can do it by using the column name only not the position of the column as column position may change. Example 
CSV File
Name , City, State, Country, Gender
John, Dallas, Texas, USA, M
File name should be John_USA.csv
I am trying extract text processor and pulling the first data row using -
row = ^.\r?\n(.)
And then updateattribute processor I am pulling the values from the columns using below expression
${row:getDelimitedField(1)}_${row:getDelimitedField(4)}.csv
But this use the position of the column not the column name. How can I build it using the column name not the position of columns
The way I will do it (maybe be not the efficient one):
Convert the CSV to json
Pass content to attributes (so you can access the field you want like dictionnary (key-value))
Update Attributes
Convert it back to CSV (thus you can control the schema, and the position of the fields).

Is there a way to dynamically identify and expand an embedded table's columns?

If I want to expand this embedded table...
...and I click on the expand button, I'm presented with the dropdown to select which columns I want to expand:
However, if I choose '(Select All Columns)' to expand them all, Power Query turns that into hard-coded column names of all the columns at the time I do that. Like this:
= Table.ExpandTableColumn(Source, "AllData", {"Column1", "Column2", "Column3", "Column4", "Custom"}, {"Column1", "Column2", "Column3", "Column4", "Custom"})
After that, if the underlying embedded table's columns change, the hard-coded column names will no longer be relevant and the query will "break."
So how can I tell it to dynamically identify and extract all of the current columns of the embedded table?
You can do something like this to get the list of column names:
List.Accumulate(Source[AllData], {}, (state, current) => List.Union({state, Table.ColumnNames(current)}))
This goes through each cell in the column, gets the column names from the table in that cell, and adds the new names to the result. It's easier to store this in a new step and then reference that in your next step.
Keep in mind that this method can be much slower than passing in the list of names you know about because it has to scan through the entire table to get the column names. You may also have problems if you use this for the third parameter in Table.ExpandTableColumn because it could use a column name that already exists.
Try using Table.Join which joins and expands the second table in one step.
"Merged Queries" = Table.Join(Source,{"Index.1"},Table2,{"Index.2"},JoinKind.LeftOuter)
You just need to make sure that the columns between the tables are unique.
Use Table.PrefixColumns to ensure column names are unique

How to find columns count of csv(Excel) sheet in ETL?

To count the rows of csv file we can use Get Files Rows Count Input in etl. How to find the number columns of a csv file?
Just read the first row of the CSV file using Text-File-Input setting header rows to 0. Usually, the first row contains field names. If you read the whole row into a single field, you can use Split-Field-To-Rows to have a single fieldname per row and the number of rows tells you the number of fields. There are other ways, but this one easily prepares for a subsequent metadata injection - if that's what you have in mind.
No Need of Metadata injection , In Split-Field-To-Rows, check "Include rownum in output" and give some name to that Variable. Then apply sort rows on that Variable, use Sample rows, then you will get number of fields which are present in the file.

Does column name length count for each cell size when counting the column size in Google bigquery?

I know in HBase for example, you need to put column names as small as possible to minimize the size.
Is it the same in google bigquery? Should I put column names as small as possible?
Good news: In BigQuery you don't need to worry about the column names length. Be as descriptive as you'd like to, since the column name is part of the table description, and not of each record.

Resources