HBase : Confusion between COLUMN and FILTER (SingleColumnValueFilter) - shell

I have installed HBase and I have access to the HBase shell.
I have a table with 2 column families, created like this:
create 'arbres', 'emplacement', 'propriete'
This request works fine:
scan 'arbres',{FILTER=>"SingleColumnValueFilter('emplacement', 'lieu_adresse', =,'binary:VOIE INCONNUE')", COLUMNS=>['emplacement'], COLUMN=>15}
But this second one lists all rows, as if the filter were not applied:
scan 'arbres',{FILTER=>"SingleColumnValueFilter('emplacement', 'lieu_adresse', =,'binary:VOIE INCONNUE')", COLUMNS=>['propriete'], COLUMN=>15}
I don't understand why, and I can't find the reason in the documentation.
Can you please explain the reason?
Regards

The second command filters on a column family and column that you are not accessing.
The filter push-down requires the filtered column to be read, meaning the column family and column must be included in COLUMNS=>[].
The reason one would have two different column families is to keep access light-weight, since each column family is stored in its own file.
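For example, a minimal sketch that keeps the filter on emplacement while also returning propriete simply lists both families in COLUMNS (everything else copied from the question):
scan 'arbres', {FILTER=>"SingleColumnValueFilter('emplacement', 'lieu_adresse', =, 'binary:VOIE INCONNUE')", COLUMNS=>['emplacement', 'propriete']}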

Related

Power Query - conditional replace/clear entire cell in multiple columns

I'm trying to clear the entire cell if it doesn't contain a given keyword.
I've managed to do this for one column:
Table.ReplaceValue(#"PrevStep",each [#"My Column"], each if Text.PositionOf([#"My Column"],"keyword")>-1 then [#"My Column"] else null,Replacer.ReplaceValue,{"My Column"})
The problem is I need to repeat that step for a number of columns... the number of columns may vary, and the column names may also be different every time. I can put all those column names into a list, but I'm not able to use it.
The solution I'm looking for might look like this:
for each ColNam in MyColumnsList
Table.ReplaceValue(#"PrevStep",each [#"ColNam"], each if Text.PositionOf([#"ColNam"],"keyword")>-1 then [#"ColNam"] else null,Replacer.ReplaceValue,MyColumnsList)
next
but this is VBA-style pseudocode, not Power Query M - and of course there is also the problem with #"PrevStep", since each step would have to reference the previous one, like a recursion... again, I don't know how to handle that.
Is the path I'm following correct, or should this be done some other way?
Thanks
Andrew
Unpivot your columns to turn them all into two columns (attribute and value). Apply your replacement to the single value column, then pivot it back into the original format.
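A minimal sketch of that approach in M, assuming the previous step is #"PrevStep", the key column that should stay untouched is named "ID", and the keyword is the same as in the question (the step and column names are placeholders):
let
    // Turn every column except the key into Attribute/Value pairs
    Unpivoted = Table.UnpivotOtherColumns(#"PrevStep", {"ID"}, "Attribute", "Value"),
    // Clear values that do not contain the keyword
    Replaced = Table.ReplaceValue(
        Unpivoted,
        each [Value],
        each if Text.PositionOf([Value], "keyword") > -1 then [Value] else null,
        Replacer.ReplaceValue,
        {"Value"}
    ),
    // Pivot back to the original column layout
    Pivoted = Table.Pivot(Replaced, List.Distinct(Replaced[Attribute]), "Attribute", "Value")
in
    Pivoted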

Power Query Create conditional new column

I am having a little challenge.
In the attached sample data you can see that for a single AMI two different Product names exist. I know that when that occurs, only Linux/UNIX is correct. I would like to create a new column that checks, for each AMI, whether there are 2 distinct Product values and, if so, always contains Linux/UNIX.
Image
For the time being, I am using Text.Contains in Power Query and specifying those AMIs manually, but this is not very productive. I am not aware whether I can run loops in Power Query.
Thanks in advance for your help.
Filippos
Group the data on AMI and count the distinct Product names; merge the result back into the table; then add a custom column that checks the new count column, e.g. = if [newcolumn] > 1 then "Linux/UNIX" else null.
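A minimal M sketch of that approach, assuming the source step is #"PrevStep" and the columns are literally named AMI and Product (all names are assumptions); the else branch keeps the original Product value instead of null:
let
    // Count distinct Product names per AMI
    Grouped = Table.Group(#"PrevStep", {"AMI"},
        {{"ProductCount", each List.Count(List.Distinct([Product])), Int64.Type}}),
    // Bring the count back onto the original rows
    Merged = Table.NestedJoin(#"PrevStep", {"AMI"}, Grouped, {"AMI"}, "Counts", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "Counts", {"ProductCount"}),
    // If an AMI appears with more than one Product name, force Linux/UNIX
    Result = Table.AddColumn(Expanded, "CorrectedProduct",
        each if [ProductCount] > 1 then "Linux/UNIX" else [Product])
in
    Result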
Another possibility: sort the table, select the Product and type columns, then right-click and Remove Duplicates, which keeps only the first instance. If you sorted correctly, all the non-Linux rows will be removed.

Queries in DynamoDB

I have an application written in Node.js that needs to find ONE row based on a city name (this could just be the table's name; different cities will be categorized as different tables) and a numeric field named "currentJobLoads". For example, a user might want to find ONE row with the city name "Chicago" and the lowest currentJobLoads. How can I achieve this in DynamoDB without scan operations (since a scan would be slower and can only read so much data before it gets terminated)? Any suggestions would be highly appreciated.
You didn't specify what your current partition key and sort key for the table are, but I'm guessing the currentJobLoads field isn't one of them. So you would need to create a Global Secondary Index on the currentJobLoads field, at which point you will be able to run query operations against that field.
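A rough sketch with the AWS SDK for JavaScript v3, assuming a single table named "Jobs" with a GSI called "city-currentJobLoads-index" whose partition key is city and sort key is currentJobLoads (table, index, and attribute names are all assumptions): querying that index in ascending order with Limit: 1 returns the row with the lowest currentJobLoads for the given city.
// Sketch only: table, index, and attribute names are assumptions.
const { DynamoDBClient } = require("@aws-sdk/client-dynamodb");
const { DynamoDBDocumentClient, QueryCommand } = require("@aws-sdk/lib-dynamodb");

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));

async function lowestJobLoad(city) {
  const result = await client.send(new QueryCommand({
    TableName: "Jobs",
    IndexName: "city-currentJobLoads-index",
    KeyConditionExpression: "#c = :city",
    ExpressionAttributeNames: { "#c": "city" },
    ExpressionAttributeValues: { ":city": city },
    ScanIndexForward: true, // ascending by the sort key currentJobLoads
    Limit: 1                // so only the lowest row is returned
  }));
  return result.Items[0];
}

lowestJobLoad("Chicago").then(console.log);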

Get data's source in Kettle

When I use Kettle, I was wondering how to get a table column's source column. Just as an example: after I have merged two tables into one based on the primary key, given any column in the output table, I'd like to determine which table it belongs to and get the original column name in the original table. Thank you for helping, and sorry for my poor English...
http://i.stack.imgur.com/xoR0s.png
Given any field in table3 (suppose a field named A in table3), I want to know where it comes from without the graphical view (from Java code or other means), i.e. the original table name (here input1 or input2) and the original column name (maybe B in input1, which represents A in table3). Besides, I use MySQL.
There are a couple of ways to do this:
1) Manually. If you right-click on the output step and choose Show Output fields (or whatever it's called), you will see the "origin step" for each of the outgoing fields. You can do the same for input fields. Then you can trace them back to those origin steps, and repeat the process of viewing the input fields at those steps, and seeing those fields' origins, and so on. This is probably not what you're looking for.
2) With code. Prior to 6.0, you'd need to programmatically perform the same operations as are listed in option 1 above. In 6.0 there is the Data Lineage capability, which offers the LineageClient API that can find the origin fields for the specified output fields. For more information see my blog post describing the Data Lineage capability. Also I put a Gremlin Console in the PDI Marketplace, to make the use of LineageClient easier (and you can visually see the lineage graph too).
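As a rough illustration of the pre-6.0 programmatic route (a sketch only; the .ktr path and step name are placeholders, and it uses the classic Kettle API rather than the LineageClient), you can load the transformation and read each incoming field's origin step:
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.row.RowMetaInterface;
import org.pentaho.di.core.row.ValueMetaInterface;
import org.pentaho.di.trans.TransMeta;

public class FieldOrigins {
    public static void main(String[] args) throws Exception {
        KettleEnvironment.init();
        // Load the transformation definition (path is a placeholder)
        TransMeta transMeta = new TransMeta("merge_tables.ktr");
        // Fields arriving at the output step (step name is a placeholder)
        RowMetaInterface fields = transMeta.getStepFields("table3");
        for (ValueMetaInterface field : fields.getValueMetaList()) {
            // getOrigin() reports the step that created or last renamed the field
            System.out.println(field.getName() + " <- " + field.getOrigin());
        }
    }
}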

Altova Mapforce: Joining XML Input and conditional SQL Join using two tables

I'm trying to get the following done: Using Altova Mapforce, I use an XML file with schema as a source. I want to map it to exactly the same output, but only add data to one field.
The value of the field (it's Tax) is determined using a two table SQL join with a WHERE clause over both tables. The tables are joined using foreign keys, the relation is recognized by Mapforce.
The first field of the WHERE clause comes from the first table (a header-type table); the second and third fields come from the second table (a lines-type table).
However, I cannot seem to create the logical and correct equivalent of what I am describing here. I've tried complex AND constructions, but then the single field I need gets inserted multiple times. I've tried WHERE clauses, but they fail because they never see both tables at the same time, and there seems to be no way to use a pre-specified join of two tables as a source; the WHERE clause then recognizes only the fields from the first table, not the second.
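In SQL terms, the lookup I am describing is roughly the following (all table and column names here are invented for illustration):
-- Sketch only: names are invented for illustration
SELECT l.tax
FROM   header h
JOIN   lines  l ON l.header_id = h.id    -- the FK relation MapForce already recognizes
WHERE  h.doc_number  = ?                 -- first field, from the header table
  AND  l.line_number = ?                 -- second field, from the lines table
  AND  l.item_code   = ?;                -- third field, from the lines table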
Is there an example for this? Joining two (or more) tables, using WHERE to determine the exact row, then using a value from that row?
Best wishes.
