In HBase 1.2.4, what is the difference between checkAndPut and checkAndMutate?
checkAndPut - compares the expected value with the current value stored in HBase according to the passed CompareOp. With CompareOp.EQUAL, the Put is applied only if the stored value equals the expected value.
checkAndMutate - compares the expected value with the current value stored in HBase according to the passed CompareOp. With CompareOp.EQUAL, the RowMutations object is applied only if the stored value equals the expected value.
You can add multiple Put and Delete objects to the RowMutations object, in the order in which you want the mutations to be executed in HBase.
In RowMutations the order of the Puts and Deletes matters.
RowMutations mutations = new RowMutations(row);

// add new columns
Put put = new Put(row);
put.addColumn(cf, col1, v1);
put.addColumn(cf, col2, v2);

// delete the column family, then re-add the new columns to the same family
Delete delete = new Delete(row);
delete.addFamily(cf, now);

// the order matters: the delete is applied before the put
mutations.add(delete);
mutations.add(put);

table.mutateRow(mutations);
checkAndMutate
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#checkAndMutate-byte:A-byte:A-byte:A-org.apache.hadoop.hbase.filter.CompareFilter.CompareOp-byte:A-org.apache.hadoop.hbase.client.RowMutations-
checkAndPut
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#checkAndPut-byte:A-byte:A-byte:A-org.apache.hadoop.hbase.filter.CompareFilter.CompareOp-byte:A-org.apache.hadoop.hbase.client.Put-
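For example, a minimal sketch of both calls against the Table interface (row, cf, col1, put and mutations are the variables from the snippet above; expectedValue is an assumed byte[] holding the value you expect to find in cf:col1; CompareOp is org.apache.hadoop.hbase.filter.CompareFilter.CompareOp):

// apply the single Put only if column cf:col1 currently holds expectedValue
boolean putApplied = table.checkAndPut(row, cf, col1, CompareOp.EQUAL, expectedValue, put);

// apply the whole RowMutations (the delete + put built above) atomically,
// but only if column cf:col1 currently holds expectedValue
boolean mutated = table.checkAndMutate(row, cf, col1, CompareOp.EQUAL, expectedValue, mutations);

Both methods return true if the check passed and the mutation was applied, and false otherwise.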
I'm learning about PowerBuilder, and I don't know how to use these: DWItemStatus, GetNextModified, ModifiedCount, GetItemStatus, NotModified!, DataModified!, New!, NewModified!.
Please help me.
Thanks for reading!
These relate to the status of rows in a datawindow. Generally the rows are retrieved from a database but this doesn't always have to be the case - data can be imported from a text file, XML, JSON, etc. as well.
DWItemStatus - these values are constants that describe how the data will be changed in the database.
Values are:
NotModified! - data unchanged since retrieved
DataModified! - data in one or more columns has changed
New! - row is new but no values have been assigned
NewModified! - row is new and at least one value has been assigned to a column.
So in terms of SQL, a row which is not modified would not generate any SQL to the DBMS. A DataModified! row would typically generate an UPDATE statement. New! and NewModified! rows would typically generate INSERT statements.
GetNextModified is a method to search a set of rows in a DataWindow to find the modified rows within that set. The method takes a buffer parameter and a row parameter. The DataWindow buffers are Primary!, Filter!, and Delete!. In general you would only look at the Primary buffer.
ModifiedCount is a method to determine the number of rows which have been modified in a DataWindow. Note that deleting a row is not considered a modification. To find the number of rows deleted, use the DeletedCount method.
GetItemStatus is a method to get the status of a column within a row in a data set in a DataWindow. It takes the parameters row, column (name or number), and DWBuffer.
So now an example of using this:
// loop through rows checking for changes
long ll
datawindow ldw

IF dw_dash.ModifiedCount() > 0 THEN
	ll = dw_dash.GetNextModified(0, Primary!)
	ldw = dw_dash
	DO WHILE ll > 0
		// watch value changed
		IF ldw.GetItemStatus(ll, 'watch', Primary!) = DataModified! THEN
			event we_post_item(ll, 'watch', ldw)
		END IF
		// followup value changed
		IF ldw.GetItemStatus(ll, 'followupdate', Primary!) = DataModified! THEN
			event we_post_item(ll, 'followupdate', ldw)
		END IF
		ll = ldw.GetNextModified(ll, Primary!)
	LOOP
	ldw.ResetUpdate() // reset the modified flags
END IF
In this example we first check whether any row in the DataWindow has been modified. Then we get the first modified row and check if either the 'watch' or 'followupdate' column was changed. If it was, we trigger an event to do something. We then loop to the next modified row, and so on. Finally we reset the modified flags, so the rows would now show as not being modified.
I have a Table called Records with the following four columns:
ID | StartChainage | EndChainage | DistanceTraveled
The DistanceTraveled is the difference between EndChainage and StartChainage. For each new record, the StartChainage should be equal to the EndChainage of the previous one.
I have created a Form called Record1 where I can only add values in the field called EndChainage, while in the field StartChainage I use the following expression:
=IIf([ID]=1,0,DLookUp("[EndChainage]","Record","[ID]=Forms![Record1]![ID]-1"))
Where I actually say that for the first record in the Table "Records" (i.e. ID=1) the value in the StartChainage must be 0, else it should obtain the value of the EndChainage field of the previous record.
This works fine and I have a Form with fields where I only input the value of the EndChainage and the Form sets the value of the StartChainage for the next record and it also calculates the DistanceTraveled.
The problem is that the calculated fields are NOT updating the relevant fields of the Table. In the Table the only updated fields are the EndChainage ones, i.e. the ones I only type manually the values.
How can I make the Table get updated automatically by the calculated fields of the Form?
Maybe I could use calculated fields in the Table itself, but this is not what I really want.
Try with:
=IIf([ID]=1,0,DLookUp("[EndChainage]","Record","[ID]=" & Forms![Record1]![ID]-1 & ""))
Check if you get the ID:
=Forms![Record1]![ID]-1
My datasource has a column that contains a comma-separated list of numbers.
I want to create a dataset that takes those numbers and turns them into groupings to use in a bar chart.
Requirements:
numbers will be between 0-17 inclusive
groupings: 0-2,3-5,6-10,11-17
x-axis labels have to be the groupings
y-axis is the percent of rows that contain that grouping
note that, because each row can contribute to multiple columns, the percentages can add up to > 100%
Any help you can offer would be awesome... I'm very new to BIRT and have been stuck on this for a couple of days now.
Not sure that I understand the requirements exactly, but your basic question "split dataset column into multiple rows" can be solved either using a scripted dataset or with pure SQL (depending on your DB).
Either way, you will need a second dataset (i.e. your data model is master-detail), and in your layout you will need something like:
Table/List "Master" bound to the master DS
Table/List "Detail" bound to the detail DS
The detail DS needs the comma-separated result column from the master DS as an input parameter of type "String".
Doing this with a scripted dataset is quite easy if (and only if) you understand JavaScript and how scripted datasets work: create a report variable "myValues" of type Object with a default value of null, and a second report variable "myValuesIndex" of type Integer with a default value of 0.
(Note: this is all untested!)
Create the dataset "detail" as a scripted DS, with one input parameter "csv" of type String and one output parameter "value" of type String.
In the open event of the scripted DS, code:
vars["myValues"] = this.getInputParameterValue("csv").split(",");
vars["myValuesIndex"] = 0;
In the fetch event, code:
var i = vars["myValuesIndex"];
var len = vars["myValues"].length;
if (i < len) {
    row["value"] = vars["myValues"][i];
    vars["myValuesIndex"] = i + 1;
    return true;
} else {
    return false;
}
For example, for the master DS result row with csv = "1,2,3-4,foo", the detail DS will result in 4 rows with
value = "1"
value = "2"
value = "3-4"
value = "foo"
Using an Oracle DB, this can be done without JavaScript. The detail DS (with the same input parameter as above) would then look like:
select t.value as value from table(split(?)) t
For the definition of the split function, see RedFilter's answer on
Is there a function to split a string in PL/SQL?
If you get ORA-22813, you should change the original definition
create or replace type split_tbl as table of varchar2(32767);
to
create or replace type split_tbl as table of varchar2(4000);
as mentioned on https://community.oracle.com/thread/2288603?tstart=0
It's also possible with pure SQL in 11g using regexp_substr (see the same page).
Create parameters in the scripted data set. You have to pass or link the actual dataset values to the scripted dataset parameters through the DataSet Parameter Binding, after assigning the scripted data set to the table.
I am loading initial data (a URL list for a crawler) into Cassandra with status crawled=0. Then, using Hadoop, I crawl all the links and try to change crawled from 0 to something else, for example 1, 2, or 3. When I check in the Cassandra CLI interface with get ColumnFamily['www.somedomain.com'], the value of the crawled column remains the same. If I do not set the crawled column during the initial import at all, the update adds it correctly. This is only one part of the algorithm, and I need further updates of this column with other Map/Reduce jobs, etc.
In Thrift and the Cassandra API it is said that we have only inserts and deletions; an insert should work as an update.
The crawled column is of UTF8 type.
The method that builds the Mutation looks like this:
private static Mutation getMutationCrawled(Text crawledVal)
{
    Text column = new Text();
    column.set("crawled");

    Column c = new Column();
    c.setName(ByteBuffer.wrap(Arrays.copyOf(column.getBytes(), column.getLength())));
    c.setValue(ByteBuffer.wrap(crawledVal.getBytes()));
    c.setTimestamp(System.currentTimeMillis());

    Mutation m = new Mutation();
    m.setColumn_or_supercolumn(new ColumnOrSuperColumn());
    m.column_or_supercolumn.setColumn(c);
    return m;
}
Cassandra resolves conflicts using the timestamp of the mutation, with the largest timestamp winning. You can set the timestamp value to whatever you want, but the convention is to set the timestamp in microseconds. In the example above, you set the timestamp with:
c.setTimestamp(System.currentTimeMillis());
Most likely the initial import code that populated the values set the timestamps in microseconds. Those microsecond timestamp values are larger than your millisecond timestamp values, so your updates are being ignored.
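For example, a minimal fix (a sketch, assuming the rest of the mutation-building method stays exactly as above) is to convert the millisecond clock to microseconds when setting the timestamp:

// use microseconds so the update's timestamp is comparable to (and newer than)
// the microsecond timestamps written by the initial import
long timestampMicros = System.currentTimeMillis() * 1000L;
c.setTimestamp(timestampMicros);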
Is there an efficient way to delete multiple rows in HBase or does my use case smell like not suitable for HBase?
There is a table, say 'chart', which contains items that are in charts. Row keys are in the following format:
chart|date_reversed|ranked_attribute_value_reversed|content_id
Sometimes I want to regenerate chart for a given date, so I want to delete all rows starting from 'chart|date_reversed_1' till 'chart|date_reversed_2'. Is there a better way than to issue a Delete for each row found by a Scan? All the rows to be deleted are going to be close to each other.
I need to delete the rows because I don't want one item (one content_id) to have multiple entries, which it will have if its ranked_attribute_value has changed (that change is the reason why the chart needs to be regenerated).
Being an HBase beginner, perhaps I am misusing rows for something that columns would be better for -- if you have design suggestions, cool! Or maybe the charts are better generated in a file (i.e. no HBase for output)? I'm using MapReduce.
Firstly, coming to the point of range deletes: there is no range delete yet in HBase, AFAIK. But there is a way to delete more than one row at a time through the HTableInterface API. Simply form a Delete object for each row key returned by a Scan, collect them in a List, and pass the list to the API, done! To make the scan faster, do not include any column data in the scan result, as all you need is the row key for deleting whole rows.
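A minimal sketch of that approach, assuming the usual org.apache.hadoop.hbase.client imports and a table handle named table; the 'chart|...' start/stop keys are placeholders for your reversed-date boundaries (the stop row is exclusive), and the FirstKeyOnlyFilter/KeyOnlyFilter pair just trims the scan results down to bare row keys:

Scan scan = new Scan();
scan.setStartRow(Bytes.toBytes("chart|" + dateReversed1));
scan.setStopRow(Bytes.toBytes("chart|" + dateReversed2)); // exclusive
// return only row keys, no cell data
scan.setFilter(new FilterList(Arrays.<Filter>asList(new FirstKeyOnlyFilter(), new KeyOnlyFilter())));

List<Delete> deletes = new ArrayList<Delete>();
ResultScanner scanner = table.getScanner(scan);
try {
    for (Result result : scanner) {
        deletes.add(new Delete(result.getRow()));
    }
} finally {
    scanner.close();
}
table.delete(deletes); // batch delete of all scanned rows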
Secondly, about the design. My understanding of the requirement is this: there are contents with a content id, each content has charts generated against it, and that data is stored; there can be multiple charts per content across dates, depending on the rank. In addition, we want the most recently generated content's chart to show at the top of the table.
For my assumption of the requirement I would suggest using three tables: auto_id, content_charts and generated_order. The row key for content_charts would be the content id, and the row key for generated_order would be a long which is auto-decremented using the HTableInterface API. For decrementing, use -1 as the amount to offset, and initialize the value to Long.MAX_VALUE in the auto_id table at the first start-up of the app (or manually). So now, if you want to delete the chart data, simply clean the column family using a delete, then put back the new data, and then put into the generated_order table. This way the latest insertion will also be at the top of the generated_order table, which holds the content id as a cell value. If you want to ensure generated_order has only one entry per content, save the generated_order id first, take the value and save it into content_charts when putting, and before deleting the column family, first delete the row from generated_order. This way you could look up the charts for a content with at most 2 gets, and no scan is required for the charts.
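A rough sketch of the auto-decrement part of this design (the row, family and qualifier names here are hypothetical; incrementColumnValue with a negative amount is the standard HTableInterface way to maintain an atomic counter):

// atomically decrement the shared counter to get the next generated_order row key;
// the counter cell in auto_id is assumed to have been seeded with Long.MAX_VALUE
long orderId = autoIdTable.incrementColumnValue(
        Bytes.toBytes("counter"),   // hypothetical counter row key
        Bytes.toBytes("cf"),        // hypothetical column family
        Bytes.toBytes("next"),      // hypothetical qualifier
        -1L);

// newer charts get smaller keys, so they sort to the top of generated_order
Put orderPut = new Put(Bytes.toBytes(orderId));
orderPut.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("content_id"), Bytes.toBytes(contentId));
generatedOrderTable.put(orderPut);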
I hope this is helpful.
You can use the BulkDeleteProtocol which uses a Scan that defines the relevant range (start row, end row, filters).
See here
I ran into the same situation, and this is my code to implement what you want:
Scan scan = new Scan();
scan.addFamily(Bytes.toBytes("Family"));
scan.setStartRow(structuredKeyMaker.key(startDate));
scan.setStopRow(structuredKeyMaker.key(endDate + 1));

try {
    ResultScanner scanner = table.getScanner(scan);
    // a simple iterator that maps rows to my own entity type, not so important!
    Iterator<Entity> entityIterator = new EntityIteratorWrapper(scanner.iterator(), EntityMapper.create());

    List<Delete> deletes = new ArrayList<Delete>();
    // needed so I don't run out of memory, as I have a huge amount of data; a simple in-memory buffer
    int bufferSize = 10000000;
    int counter = 0;

    while (entityIterator.hasNext()) {
        if (counter < bufferSize) {
            // the key maker is used to extract the key as byte[] from my entity
            deletes.add(new Delete(KeyMaker.key(entityIterator.next())));
            counter++;
        } else {
            // flush the buffered deletes and start filling the buffer again
            table.delete(deletes);
            deletes.clear();
            counter = 0;
        }
    }

    // delete whatever is left in the buffer
    if (deletes.size() > 0) {
        table.delete(deletes);
        deletes.clear();
    }
} catch (IOException e) {
    e.printStackTrace();
}