Is there any way to limit the number of columns under a particular row in Hbase? I have seen methods to limit rows. I wonder if there is any ways i can limit column family values
Like,
row columnfamily(page) value
1 page:1 1
1 page:2 2
1 page:3 3
I need to retrieve row1 values for column families page:1 and page:2
Is it possible?
There are a number of different ways that you can go with this problem. Basically, you want a server-side filter to limit your return data in a Get/Scan. Normally, this would be done with a co-processor, but that is still under development, so you really want to apply a filter to your query.
Example Filters: http://svn.apache.org/repos/asf/hbase/branches/0.90/src/main/java/org/apache/hadoop/hbase/filter/
The easiest example would be a prefix filter (although it looks like you want some sort of range filter). Just to give you a rough idea of how this would work, here's how you apply a PrefixFilter to a Get:
HTable myTable; // predefined
Scan scan; // predefined
scan.setFilter(new ColumnPrefixFilter(Bytes.toBytes("myprefix")));
return myTable.getScanner(scan);
It is possible.
When scan-ning use http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#addColumn(byte[], byte[])
When get-ting use
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#addColumn(byte[], byte[])
If the column key is predictable, for example, key is an index, then based on a particular value the keys could be added by iterating. Besides you could use filters as well if the conditioning could be random and complicated for example > 1 and < 3, key in (3, 10, 11) etc. For filter use this. There are host of pre-implemented filters. You would probably be interested in the qualifier filter.
Hope this helps.
Related
I have a table like this:
a
b
c
1
2
abc
2
3
4.00
note c2 is text while c3 is a number.
When I do
=QUERY(A1:C,"select *")
The result is like
a
b
c
1
2
2
3
4.00
The "text" in C2 has been missed. You can see the live sheet here:
https://docs.google.com/spreadsheets/d/1UOiP1JILUwgyYUsmy5RzQrpGj7opvPEXE46B3xfvHoQ/edit?usp=sharing
How to deal with this issue?
QUERY is very useful, but it has a main limitation: only can handle one kind of data per column. The other data is left as blank. There are usually ways to try to overcome this from inside the QUERY, but I've found them unfruitful. What you can do is just to use:
={A:C}
You can work with filters by its own, but as a step-by-step to adapt the main features of query: If you need to add conditions, use LAMBDA INDEX and FILTER
For example, to check where A is not null:
=LAMBDA(quer,FILTER(quer,INDEX(quer,,1)<>""))({A:C}) --> with INDEX(quer,,1), I've accesed the first column
Where B is more than one cell and less than other:
=LAMBDA(quer,FILTER(quer,INDEX(quer,,2)>D1,INDEX(quer,,2)<D2))({A:C})
For sorting and limiting an amount of items, use SORTN. For example, you want to sort by 3rd column and limit to 5 higher values in that column:
=LAMBDA(quer,SORTN(FILTER(quer,INDEX(quer,,1)<>""),5,1,3,0))({A:C})
Or, to limit to 5 elements without sorting use ARRAY_CONSTRAIN:
=ARRAY_CONSTRAIN(LAMBDA(quer,FILTER(quer,INDEX(quer,,1)<>""))({A:C}),5)
There are other options, you can use REGEXMATCH and other options, and emulate QUERYs functions without missing data. Let me know!
shenkwen,
If you are comfortable with adding an Google App Script in your sheet to give you a custom function, I have a QUERY replacement function that supports all standard SQL SELECT syntax. I don't analyze the column data to try and force to one type based on which is the most common data in the column - so this is not an issue.
The custom function code - is one file and is at:
https://github.com/demmings/gsSQL/tree/main/dist
After you save, you have a new function from your sheet. In your example, the syntax would be
=gsSQL("select a,b,c from testTable", {{"testTable", "F150:H152", 60, true}})
If your data is on a separate tab called 'testTable'(or whatever you want), the second parameter is not required.
I have typed in your example data into my test sheet (see line 150)
https://docs.google.com/spreadsheets/d/1Zmyk7a7u0xvICrxen-c0CdpssrLTkHwYx6XL00Tb1ws/edit?usp=sharing
is it possible to restrict what values are allowed for a column in informatica. For eg, there is a column called Item_number, for which I want values to be in range of 1 to 10, is it possible to implement that, if so which transformation I can use for that?
You need to use FILTER transformation. In properties you can mention -
Item_number >=1 AND Item_number <=10
You can try using a lookup transformation in your mapping and that lookup can be a flat file or table depending on your requirement.
Keep the valid values in that lookup file/table and verify the input data for that column.
If lookup returns a value, the data is valid; else incorrect data in case null is returned.
If you want to restrict the data in your column without removing any rows, use a simple function combination:
IIF(in_val>10, 10, IIF(in_val<1, 1, in_val))
Check if that works fine with your scenario - come back if you need further help.
I'm trying to clear the entire cell if it doesn't contain a given keyword.
I've managed to do this for one column:
Table.ReplaceValue(#"PrevStep",each [#"My Column"], each if Text.PositionOf([#"My Column"],"keyword")>-1 then [#"My Column"] else null,Replacer.ReplaceValue,{"My Column"})
The problem is I need to iterate/repeat that step for a number of columns... the number of columns may vary and column names also may be different every time. I can have all those column names put into a list but I'm not able to use it.
The solution I'm looking for may look like this
for each ColNam in MyColumnsList
Table.ReplaceValue(#"PrevStep",each [#"ColNam"], each if Text.PositionOf([#"ColNam"],"keyword")>-1 then [#"ColNam"] else null,Replacer.ReplaceValue,MyColumnsList)
next
but this is not the VBA code but Power Query M - and of course the problem is with #PrevStep as I would see it like a recursions... again... do not know how to process.
Is the path I follow correct or should it be done some other way
Thanks
Andrew
Unpivot your columns to turn all the columns into two columns. Apply your replacement to the single value column then pivot it back into the original format
I have 2 columns - Matches(Integer), Accounts_type(String). And i want to create a third column where i want to get proportions of matches played by different account types. I am new to Talend & am facing issue with this for past 2 days & did a lot of research but to no avail. Please help..
You can do it like this:
You need to read your source data twice (I used tFixedFlowInput_1 and tFixedFlowInput_2 with the same data). The idea is to calculate the total of your matches in tAggregateRow_1, it simply does a sum of all Matches without a group by column, then use that as a lookup.
The tMap then joins your source data with the calculated total. Since the total will always be one record, you don't need any join column. You then simply divide Matches by Total as required.
This is supposing you have unique values in Account_type; if you don't, you need to add another tAggregateRow between your source and tMap_1, in order to get sum of Matches for each Account_type (group by Account_type).
Current application use JPA to auto generate table/entity id. Now a requirement wants to get a query to manually insert data in to the database using SQL queries
So the questions are:
Is it worth to create a sequence in this schema just for this little requirement?
If answer to 1 is no, then what could be a plan b?
Yes. A sequence is trivial - why would you not do it?
N/A
Few ways:
Use a UUID. UUIDs are pseudo-random, large alphanumeric strings which are guaranteed to be unique once generated.
Does the data have something unique? Like a timestamp, or IP address, etc? If so, use that
Combination of current timestamp + some less unique value in the data
Combination of current timestamp + some integer i that you keep incrementing
There are others (including generating a checksum, custom random numbers instead of UUIDs, etc) - but those have the possibility of overlaps, so not mentioning them.
Edit: Minor clarifications
Are you just doing a single data load into an empty table, and there are no other users concurrently inserting data? If so, you can just use ROWNUM to generate the IDs starting from 1, e.g.
INSERT INTO mytable
SELECT ROWNUM AS ID
,etc AS etc
FROM ...