How to insert Double[] into HBase?

For example, if I want to insert a double[] such as
Double[] dArr = {10.23, 25.1, 30.5, 45.3};
into an HBase table, how do I do it?

You can store anything you want; you just have to serialize it to a byte[]:
Double[] dArr = {10.23, 25.1, 30.5, 45.3};
byte[] value = new byte[0];
byte[] family = "f".getBytes();
byte[] column = "d".getBytes();
// concatenate the 8-byte encoding of each Double into one value
for (Double d : dArr) {
    value = Bytes.add(value, Bytes.toBytes(d));
}
Put put = new Put(rowKey);
put.add(family, column, value); // value is already a byte[]
...
You'll have to deserialize the data when you read the value back (convert the byte[] into a Double[]).
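A minimal read-back sketch for that layout, assuming a Table instance named table and the same rowKey, family and column used above (each element was written as 8 bytes via Bytes.toBytes(double)):
Get get = new Get(rowKey);
Result result = table.get(get);
byte[] raw = result.getValue(family, column);
// split the concatenated bytes back into doubles, 8 bytes (Bytes.SIZEOF_DOUBLE) each
Double[] restored = new Double[raw.length / Bytes.SIZEOF_DOUBLE];
for (int i = 0; i < restored.length; i++) {
    restored[i] = Bytes.toDouble(raw, i * Bytes.SIZEOF_DOUBLE);
}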
Another thing you can do is store each Double (as byte[]) in its own column (d0 to dX)
Double[] dArr = {10.23, 25.1, 30.5, 45.3};
Put put = new Put(rowKey);
byte[] family = "f".getBytes();
int i = 0;
for (Double d : dArr) {
    // qualifiers "d0", "d1", ... so the column names read as d0..dX
    put.add(family, Bytes.toBytes("d" + i), Bytes.toBytes(d));
    i++;
}
...
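A corresponding read sketch for the one-column-per-value layout (again assuming a Table instance named table; java.util.NavigableMap is needed):
Get get = new Get(rowKey);
Result result = table.get(get);
// all columns of family "f" for this row, keyed by qualifier (d0, d1, ...)
NavigableMap<byte[], byte[]> cols = result.getFamilyMap(family);
Double[] restored = new Double[cols.size()];
int idx = 0;
for (byte[] cellValue : cols.values()) {
    restored[idx++] = Bytes.toDouble(cellValue);
}
// note: qualifiers sort lexicographically, so with more than ten values
// zero-pad the index (d00, d01, ...) if the original order matters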

Related

Using Bulk Insert dramatically slows down processing?

I'm fairly new to Oracle, but I have used bulk insert on a couple of other applications. Most seem to go faster using it, but I've had a couple where it slows down the application. This is my second one where it slowed things down significantly, so I'm wondering if I have something set up incorrectly or maybe I need to set it up differently. In this case I have a console application that processes ~1,900 records. Inserting them individually takes ~2.5 hours, and when I switched over to the bulk insert it jumped to 5 hours.
The article I based this off of is http://www.oracle.com/technetwork/issue-archive/2009/09-sep/o59odpnet-085168.html
Here is what I'm doing: I retrieve some records from the DB, do calculations, and then write the results out to a text file. After the calculations are done I have to write those results back to a different table in the DB so we can look back at those calculations later if needed.
As I make each calculation I add the result to a List. Once I'm done writing out the file I look at that List and, if there are any records, I do the bulk insert.
With the bulk insert I have a setting in the App.config for the number of records I want to insert per batch. In this case I'm using 250 records; I assumed it would be better to limit my in-memory arrays to, say, 250 records versus the 1,900. I loop through that list up to the count from the App.config and create an array for each column. Those arrays are then passed as parameters to Oracle.
App.config
<add key="UpdateBatchCount" value="250" />
Class
class EligibleHours
{
    public string EmployeeID { get; set; }
    public decimal Hours { get; set; }
    public string HoursSource { get; set; }
}
Data Manager
public static void SaveEligibleHours(List<EligibleHours> listHours)
{
    //set the number of records to batch on from the config file. Subtract one because of 0-based index
    int batchCount = int.Parse(ConfigurationManager.AppSettings["UpdateBatchCount"]);

    //create the arrays to add values to
    string[] arrEmployeeId = new string[batchCount];
    decimal[] arrHours = new decimal[batchCount];
    string[] arrHoursSource = new string[batchCount];

    int i = 0;
    foreach (var item in listHours)
    {
        //Create an array of employee numbers that will be used for a batch update.
        //update after every X amount of records. Add 1 to i to compensate for 0-based indexing.
        if (i + 1 <= batchCount)
        {
            arrEmployeeId[i] = item.EmployeeID;
            arrHours[i] = item.Hours;
            arrHoursSource[i] = item.HoursSource;
            i++;
        }
        else
        {
            UpdateDbWithEligibleHours(arrEmployeeId, arrHours, arrHoursSource);
            //reset counter and arrays
            i = 0;
            arrEmployeeId = new string[batchCount];
            arrHours = new decimal[batchCount];
            arrHoursSource = new string[batchCount];
        }
    }

    //process last array
    if (arrEmployeeId.Length > 0)
    {
        UpdateDbWithEligibleHours(arrEmployeeId, arrHours, arrHoursSource);
    }
}
private static void UpdateDbWithEligibleHours(string[] arrEmployeeId, decimal[] arrHours, string[] arrHoursSource)
{
    StringBuilder sbQuery = new StringBuilder();
    sbQuery.Append("insert into ELIGIBLE_HOURS ");
    sbQuery.Append("(EMP_ID, HOURS_SOURCE, TOT_ELIG_HRS, REPORT_DATE) ");
    sbQuery.Append("values ");
    sbQuery.Append("(:1, :2, :3, SYSDATE) ");

    string connectionString = ConfigurationManager.ConnectionStrings["Server_Connection"].ToString();
    using (OracleConnection dbConn = new OracleConnection(connectionString))
    {
        dbConn.Open();

        //create Oracle parameters and pass arrays of data
        OracleParameter p_employee_id = new OracleParameter();
        p_employee_id.OracleDbType = OracleDbType.Char;
        p_employee_id.Value = arrEmployeeId;

        OracleParameter p_hoursSource = new OracleParameter();
        p_hoursSource.OracleDbType = OracleDbType.Char;
        p_hoursSource.Value = arrHoursSource;

        OracleParameter p_hours = new OracleParameter();
        p_hours.OracleDbType = OracleDbType.Decimal;
        p_hours.Value = arrHours;

        OracleCommand objCmd = dbConn.CreateCommand();
        objCmd.CommandText = sbQuery.ToString();
        objCmd.ArrayBindCount = arrEmployeeId.Length;
        objCmd.Parameters.Add(p_employee_id);
        objCmd.Parameters.Add(p_hoursSource);
        objCmd.Parameters.Add(p_hours);
        objCmd.ExecuteNonQuery();
    }
}

Requested row out of range for doMiniBatchMutation on HRegion

The error occurs when the HBase client batches data. At first it is fine; some time later it fails. The detailed error is:
: 1 time, org.apache.hadoop.hbase.exceptions.FailedSanityCheckException: Requested row out of range for doMiniBatchMutation on HRegion idcard,bfef6945ac273d83\x00\x00\x00\x00\x00\x17\xCC$,1461584032622.dadb8843fe441dac4a3d4d7669597ef5., startKey='bfef6945ac273d83\x00\x00\x00\x00\x00\x17\xCC$', getEndKey()='', row='9a6ec957205e1d74\x00\x00\x00\x00\x01\x90\x1F\xF5'
at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:712)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:662)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2046)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32393)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2117)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
at java.lang.Thread.run(Thread.java:745)
The environment is:
HBase 1.1.3
Hadoop 2.6
hbase-client 1.2.0
The HBase client code is:
public static void batchPutData(Connection connection, long startNum, long count) throws IOException, ParseException {
    //table
    Table table = connection.getTable(TableName.valueOf(TABLE_NAME));
    //index table
    Table index_table = connection.getTable(TableName.valueOf(INDEX_TABLE_NAME));
    //random name
    RandomChineseName randomChineseName = new RandomChineseName();
    //random car
    RandomCar randomCar = new RandomCar();

    List<Put> puts = new ArrayList<Put>();
    List<Put> indexPlateputs = new ArrayList<Put>();

    for (long i = 0; i < count; i++) {
        long index = startNum + i;
        Date birthdate = RandomUtils.randomDate();
        String birthdateStr = DateUtil.dateToStr(birthdate, "yyyy-MM-dd");
        boolean isBoy = i % 2 == 0 ? true : false;
        String name = isBoy ? randomChineseName.randomBoyName() : randomChineseName.randomGirlName();
        String nation = RandomUtils.randomNation();
        String plate = randomCar.randomPlate();

        byte[] idbuff = Bytes.toBytes(index);
        String hashPrefix = MD5Hash.getMD5AsHex(idbuff).substring(0, 16);

        //create a put for the data table
        Put put = new Put(Bytes.add(Bytes.toBytes(hashPrefix), idbuff));
        put.addColumn(Bytes.toBytes("idcard"), Bytes.toBytes("name"), Bytes.toBytes(name));
        put.addColumn(Bytes.toBytes("idcard"), Bytes.toBytes("sex"), Bytes.toBytes(isBoy ? 1 : 0));
        put.addColumn(Bytes.toBytes("idcard"), Bytes.toBytes("birthdate"), Bytes.toBytes(birthdateStr));
        put.addColumn(Bytes.toBytes("idcard"), Bytes.toBytes("nation"), Bytes.toBytes(nation));
        put.addColumn(Bytes.toBytes("idcard"), Bytes.toBytes("plate"), Bytes.toBytes(plate));
        puts.add(put);

        //create a put for the index table
        String namehashPrefix = MD5Hash.getMD5AsHex(Bytes.toBytes(name)).substring(0, 16);
        byte[] bprf = Bytes.add(Bytes.toBytes(namehashPrefix), Bytes.toBytes(name));
        bprf = Bytes.add(bprf, Bytes.toBytes(SPLIT), Bytes.toBytes(birthdateStr));
        Put namePut = new Put(Bytes.add(bprf, Bytes.toBytes(SPLIT), Bytes.toBytes(index)));
        namePut.addColumn(Bytes.toBytes("index"), Bytes.toBytes("idcard"), Bytes.toBytes(0));
        indexPlateputs.add(namePut);

        //insert every ten thousand rows
        if (i % 10000 == 0) {
            table.put(puts);
            index_table.put(indexPlateputs);
            puts.clear();
            indexPlateputs.clear();
        }
    }
}
It seems like a version conflict between the HBase client and the server (hbase-client 1.2.0 against HBase 1.1.3). Change the HBase version to 1.1.4 or 1.0.0 or another stable version and try again.
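If the client is pulled in through Maven, aligning it with the server's release line might look like this (just a sketch; use the version that matches your cluster):
<!-- pom.xml: pin hbase-client to the server's release line -->
<dependency>
    <groupId>org.apache.hadoop.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.1.4</version>
</dependency>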

Am I approaching this correctly?

JavaFX 8 – FXML – TableView – TableColumn
I have the following objects
CourseResult
    SimpleStringProperty CourseID
    SimpleStringProperty CourseName
    SimpleStringProperty Grade
    SimpleIntegerProperty Credits

ATableRow
    SimpleStringProperty StudentID
    SimpleStringProperty StudentName
    CourseResult[] AResult   // occurs 1 to 20
In my program
javafx.collections.ObservableList<ATableRow> TableData = javafx.collections.FXCollections.observableArrayList();
I populate this ObservableList from the database and I can see all the values perfectly in the debugger.
I create the TableView and add columns.
public void createTableForThisSemester(int thisSemester, int numberOfCourses, javafx.collections.ObservableList<AResultRow> TableRows) {
    TableView<AResultRow> thisTable = new TableView<>();

    TableColumn<AResultRow, String> tcolRollNo = new TableColumn<>("Roll Number");
    tcolRollNo.setEditable(false);
    tcolRollNo.setPrefWidth(120);

    TableColumn<AResultRow, String> tcolName = new TableColumn<>("Student Name");
    tcolName.setEditable(false);
    tcolName.setPrefWidth(350);

    tcolRollNo.setCellValueFactory(cellData -> cellData.getValue().StudentIDProperty());
    tcolName.setCellValueFactory(cellData -> cellData.getValue().StudentNameProperty());
    boolean xyz = thisTable.getColumns().addAll(tcolRollNo, tcolName);

    // TableColumn[] courseColumn = new TableColumn[numberOfCourses];
    for (int courseNo = 0; courseNo < numberOfCourses; courseNo++) {
        String colName = getASemesterCourse(thisSemester, courseNo).getCourseID();
        TableColumn<AResultRow, String> thisColumn = new TableColumn<>(colName);
        thisColumn.setPrefWidth(80);
        thisColumn.setStyle("-fx-alignment: CENTER; font-weight:bold;");
        thisColumn.setCellValueFactory(cellData -> cellData.getValue().courseGradeProperty(courseNo));
        boolean retVal = thisTable.getColumns().addAll(thisColumn);
    }

    // System.out.println("# of Rows in Table [" + thisSemester + "] = " + TableRows.size());
    thisTable.getSelectionModel().setSelectionMode(SelectionMode.SINGLE);
    thisTable.setItems(TableRows);

    ScrollPane thisScrollPane = new ScrollPane();
    thisScrollPane.setFitToWidth(true);
    thisScrollPane.setFitToHeight(true);
    thisScrollPane.setMinHeight((theDetails.getHeight() - 25));
    thisScrollPane.setMaxHeight((theDetails.getHeight() - 25));
    thisScrollPane.setMinWidth((theDetails.getWidth() - 25));
    thisScrollPane.setHbarPolicy(ScrollPane.ScrollBarPolicy.ALWAYS);

    Tab thisTab = tabs.getTabs().get(thisSemester);
    thisTab.setContent(thisScrollPane);
    thisScrollPane.setContent(thisTable);
}
The StudentID and Name columns are populated perfectly, but the results are not being populated by
thisColumn.setCellValueFactory(cellData -> cellData.getValue().courseGradeProperty(courseNo));
I get this error in NetBeans for the line shown above: "Local variables referenced from a lambda expression must be final or effectively final".
Remember, the GRADE[courseNo] field is a String and is populated. How do I show this value in the table?
I have been trying various methods, like storing the value in a temp String, and so on.
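For what it's worth, that compiler error usually goes away once the loop index is copied into an effectively final local variable before the lambda captures it. A minimal sketch of the column loop, assuming courseGradeProperty(int) exists on AResultRow as in the code above:
for (int courseNo = 0; courseNo < numberOfCourses; courseNo++) {
    String colName = getASemesterCourse(thisSemester, courseNo).getCourseID();
    TableColumn<AResultRow, String> thisColumn = new TableColumn<>(colName);
    // copy the loop variable into an effectively final local so the lambda can capture it
    final int col = courseNo;
    thisColumn.setCellValueFactory(cellData -> cellData.getValue().courseGradeProperty(col));
    thisTable.getColumns().add(thisColumn);
}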

Dynamically choose which properties to get using Linq

I have an MVC application with a dynamic table on one of the pages, where the user defines how many columns the table has, the column order, and where to get the data for each field.
I have written some very bad code in order to keep it dynamic, and now I would like it to be more efficient.
My problem is that I don't know how to define, at runtime, which columns I should get back into my IEnumerable. My main issue is that I don't know how many columns I might have.
I have a reference to a class which gets the field's text. I also have a dictionary of each field's order with the exact property it should get the data from.
My code should look something like this:
var docsRes3 = from d in docs
               select new[]
               {
                   for (int i = 0; i < numOfCols; i++)
                   {
                       gen.getFieldText(d, res.FieldSourceDic[i]);
                   }
               };
where:
docs = List from which I would like to get only specific fields
res.FieldSourceDic = Dictionary in which the key is the order of the column and the value is the property
gen.getFieldText = The function which gets the entity and the property and returns the value
Obviously, it doesn't work.
I also tried
StringBuilder fieldsSB = new StringBuilder();
for (int i = 0; i < numOfCols; i++)
{
    string field = "d." + res.FieldSourceDic[i] + ".ToString()";
    if (!string.IsNullOrEmpty(fieldsSB.ToString()))
    {
        fieldsSB.Append(",");
    }
    fieldsSB.Append(field);
}

var docsRes2 = from d in docs
               select new[] { fieldsSB.ToString() };
It also didn't work.
The only thing that worked for me so far was:
List<string[]> docsRes = new List<string[]>();
foreach (NewOriginDocumentManagment d in docs)
{
    string[] row = new string[numOfCols];
    for (int i = 0; i < numOfCols; i++)
    {
        row[i] = gen.getFieldText(d, res.FieldSourceDic[i]);
    }
    docsRes.Add(row);
}
Any idea how I can pass LINQ the list of fields so that it cuts the needed data out of it efficiently?
Thanks, hope I was clear about what I need.
Try the following:
var docsRes3 = from d in docs
               select (
                   from k in res.FieldSourceDic.Keys.Take(numOfCols)
                   select gen.getFieldText(d, res.FieldSourceDic[k]));
I got my answer with some help from the following link:
http://www.codeproject.com/Questions/141367/Dynamic-Columns-from-List-using-LINQ
First I created a string array of all properties:
//Creates a string of all properties as defined in the XML.
//Column order must start at 0. No skips are allowed.
StringBuilder fieldsSB = new StringBuilder();
for (int i = 0; i < numOfCols; i++)
{
    string field = res.FieldSourceDic[i];
    if (!string.IsNullOrEmpty(fieldsSB.ToString()))
    {
        fieldsSB.Append(",");
    }
    fieldsSB.Append(field);
}
var cols = fieldsSB.ToString().Split(',');

//Gets the data for each row dynamically
var docsRes = docs.Select(d => GetProps(d, cols));
Then I created the GetProps function, which uses my own function as described in the question:
private static dynamic GetProps(object d, IEnumerable<string> props)
{
    if (d == null)
    {
        return null;
    }

    DynamicGridGenerator gen = new DynamicGridGenerator();
    List<string> res = new List<string>();
    foreach (var p in props)
    {
        res.Add(gen.getFieldText(d, p));
    }
    return res;
}

Convert System.Data.Linq.Binary to byte[]

I am storing bytes in a database table. When I retrieve them with LINQ to SQL, the return type is System.Data.Linq.Binary.
I am not able to convert the System.Data.Linq.Binary to a byte array (byte[]).
How do I convert it?
//my datacontext
var db = new db();
//key is a value from the user
var img = from i in db.images
          where i.id == key
          select i.data;
i.data is a System.Data.Linq.Binary and I want it as a byte[].
I tried (byte[])img but it did not work.
Have you tried calling ToArray() on i.data?
var img = from i in db.images
          where i.id == key
          select i.data.ToArray();
System.Data.Linq.Binary has a ToArray method just for that purpose.
Probably it's too late by now, but this may help others :)
//testTable PK: ID, binaryData: binary(32)
public void insertDummyData()
{
    DBML.testTable v = new DBML.testTable();
    v.ID = 1;
    System.Text.UTF8Encoding encoding = new System.Text.UTF8Encoding();
    v.binaryData = new System.Data.Linq.Binary(encoding.GetBytes("11111111000000001111111100000000"));
    db.testTable.InsertOnSubmit(v);
    db.SubmitChanges();
}
Alternatively, click on the Binary field in the .dbml file, open its properties and then change the field type from Binary to byte[], as found here.
(byte[])linqBinaryField.ToArray()
You can try MemoryStream. I wrote a function in my project to convert an image to a byte array, like the following:
public static byte[] Image2ByteArr(string filename)
{
    Bitmap bm = new Bitmap(getPath(filename));
    MemoryStream ms = new MemoryStream();
    bm.Save(ms, System.Drawing.Imaging.ImageFormat.Jpeg);
    return ms.ToArray();
}
Hope that is helpful for you!
