How to insert Double[] into HBase?

For example, if I want to insert a double[] such as
Double[] dArr = {10.23, 25.1, 30.5, 45.3};
into an HBase table, how do I do it?

You can store anything you want; you just have to serialize it to a byte[]:
Double[] dArr = {10.23, 25.1, 30.5, 45.3};
byte[] value = new byte[0];
byte[] family = "f".getBytes();
byte[] column = "d".getBytes();
// concatenate the 8-byte encoding of each Double into one value
for (Double d : dArr) {
    value = Bytes.add(value, Bytes.toBytes(d));
}
Put put = new Put(rowKey);
put.add(family, column, value); // value is already a byte[]
...
You'll have to deserialize the data when you read the value back (convert the byte[] into a Double[]).
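A minimal read-back sketch for that layout, assuming a Table instance named table and the same rowKey, family and column used above (each element was written as 8 bytes via Bytes.toBytes(double)):
Get get = new Get(rowKey);
Result result = table.get(get);
byte[] raw = result.getValue(family, column);
// split the concatenated bytes back into doubles, 8 bytes (Bytes.SIZEOF_DOUBLE) each
Double[] restored = new Double[raw.length / Bytes.SIZEOF_DOUBLE];
for (int i = 0; i < restored.length; i++) {
    restored[i] = Bytes.toDouble(raw, i * Bytes.SIZEOF_DOUBLE);
}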
Another thing you can do is store each Double (as byte[]) in its own column (d0 to dX)
Double[] dArr = {10.23, 25.1, 30.5, 45.3};
Put put = new Put(rowKey);
byte[] family = "f".getBytes();
int i = 0;
for (Double d : dArr) {
    // qualifiers "d0", "d1", ... so the column names read as d0..dX
    put.add(family, Bytes.toBytes("d" + i), Bytes.toBytes(d));
    i++;
}
...
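A corresponding read sketch for the one-column-per-value layout (again assuming a Table instance named table; java.util.NavigableMap is needed):
Get get = new Get(rowKey);
Result result = table.get(get);
// all columns of family "f" for this row, keyed by qualifier (d0, d1, ...)
NavigableMap<byte[], byte[]> cols = result.getFamilyMap(family);
Double[] restored = new Double[cols.size()];
int idx = 0;
for (byte[] cellValue : cols.values()) {
    restored[idx++] = Bytes.toDouble(cellValue);
}
// note: qualifiers sort lexicographically, so with more than ten values
// zero-pad the index (d00, d01, ...) if the original order matters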

Related

Using Bulk Insert dramatically slows down processing?

I'm fairly new to Oracle, but I have used bulk insert on a couple of other applications. Most seem to go faster using it, but I've had a couple where it slows down the application. This is my second one where it slowed things down significantly, so I'm wondering if I have something set up incorrectly or maybe I need to set it up differently. In this case I have a console application that processes ~1,900 records. Inserting them individually takes ~2.5 hours, and when I switched over to the bulk insert it jumped to 5 hours.
The article I based this off of is http://www.oracle.com/technetwork/issue-archive/2009/09-sep/o59odpnet-085168.html
Here is what I'm doing: I retrieve some records from the DB, do calculations, and then write the results out to a text file. After the calculations are done I have to write those results back to a different table in the DB so we can look back at those calculations later if needed.
As I make each calculation I add the result to a List. Once I'm done writing out the file I look at that List and, if there are any records, I do the bulk insert.
With the bulk insert I have a setting in the App.config for the number of records I want to insert per batch. In this case I'm using 250 records; I assumed it would be better to limit my in-memory arrays to, say, 250 records versus the 1,900. I loop through that list up to the count from the App.config and create an array for each column. Those arrays are then passed as parameters to Oracle.
App.config
<add key="UpdateBatchCount" value="250" />
Class
class EligibleHours
{
    public string EmployeeID { get; set; }
    public decimal Hours { get; set; }
    public string HoursSource { get; set; }
}
Data Manager
public static void SaveEligibleHours(List<EligibleHours> listHours)
{
    //set the number of records to batch on from the config file. Subtract one because of 0-based index
    int batchCount = int.Parse(ConfigurationManager.AppSettings["UpdateBatchCount"]);

    //create the arrays to add values to
    string[] arrEmployeeId = new string[batchCount];
    decimal[] arrHours = new decimal[batchCount];
    string[] arrHoursSource = new string[batchCount];

    int i = 0;
    foreach (var item in listHours)
    {
        //Create an array of employee numbers that will be used for a batch update.
        //update after every X amount of records. Add 1 to i to compensate for 0-based indexing.
        if (i + 1 <= batchCount)
        {
            arrEmployeeId[i] = item.EmployeeID;
            arrHours[i] = item.Hours;
            arrHoursSource[i] = item.HoursSource;
            i++;
        }
        else
        {
            UpdateDbWithEligibleHours(arrEmployeeId, arrHours, arrHoursSource);
            //reset counter and arrays
            i = 0;
            arrEmployeeId = new string[batchCount];
            arrHours = new decimal[batchCount];
            arrHoursSource = new string[batchCount];
        }
    }

    //process last array
    if (arrEmployeeId.Length > 0)
    {
        UpdateDbWithEligibleHours(arrEmployeeId, arrHours, arrHoursSource);
    }
}
private static void UpdateDbWithEligibleHours(string[] arrEmployeeId, decimal[] arrHours, string[] arrHoursSource)
{
    StringBuilder sbQuery = new StringBuilder();
    sbQuery.Append("insert into ELIGIBLE_HOURS ");
    sbQuery.Append("(EMP_ID, HOURS_SOURCE, TOT_ELIG_HRS, REPORT_DATE) ");
    sbQuery.Append("values ");
    sbQuery.Append("(:1, :2, :3, SYSDATE) ");

    string connectionString = ConfigurationManager.ConnectionStrings["Server_Connection"].ToString();
    using (OracleConnection dbConn = new OracleConnection(connectionString))
    {
        dbConn.Open();

        //create Oracle parameters and pass arrays of data
        OracleParameter p_employee_id = new OracleParameter();
        p_employee_id.OracleDbType = OracleDbType.Char;
        p_employee_id.Value = arrEmployeeId;

        OracleParameter p_hoursSource = new OracleParameter();
        p_hoursSource.OracleDbType = OracleDbType.Char;
        p_hoursSource.Value = arrHoursSource;

        OracleParameter p_hours = new OracleParameter();
        p_hours.OracleDbType = OracleDbType.Decimal;
        p_hours.Value = arrHours;

        OracleCommand objCmd = dbConn.CreateCommand();
        objCmd.CommandText = sbQuery.ToString();
        objCmd.ArrayBindCount = arrEmployeeId.Length;
        objCmd.Parameters.Add(p_employee_id);
        objCmd.Parameters.Add(p_hoursSource);
        objCmd.Parameters.Add(p_hours);
        objCmd.ExecuteNonQuery();
    }
}

Requested row out of range for doMiniBatchMutation on HRegion

The error occurs when the HBase client batches data. At first it is fine; some time later it fails. The detailed error is:
: 1 time, org.apache.hadoop.hbase.exceptions.FailedSanityCheckException: Requested row out of range for doMiniBatchMutation on HRegion idcard,bfef6945ac273d83\x00\x00\x00\x00\x00\x17\xCC$,1461584032622.dadb8843fe441dac4a3d4d7669597ef5., startKey='bfef6945ac273d83\x00\x00\x00\x00\x00\x17\xCC$', getEndKey()='', row='9a6ec957205e1d74\x00\x00\x00\x00\x01\x90\x1F\xF5'
at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:712)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:662)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2046)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32393)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2117)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
at java.lang.Thread.run(Thread.java:745)
The environment is:
HBase 1.1.3
Hadoop 2.6
hbase-client 1.2.0
The HBase client code is:
public static void batchPutData(Connection connection, long startNum, long count) throws IOException, ParseException {
    //table
    Table table = connection.getTable(TableName.valueOf(TABLE_NAME));
    //index table
    Table index_table = connection.getTable(TableName.valueOf(INDEX_TABLE_NAME));
    //random name
    RandomChineseName randomChineseName = new RandomChineseName();
    //random car
    RandomCar randomCar = new RandomCar();

    List<Put> puts = new ArrayList<Put>();
    List<Put> indexPlateputs = new ArrayList<Put>();

    for (long i = 0; i < count; i++) {
        long index = startNum + i;
        Date birthdate = RandomUtils.randomDate();
        String birthdateStr = DateUtil.dateToStr(birthdate, "yyyy-MM-dd");
        boolean isBoy = i % 2 == 0 ? true : false;
        String name = isBoy ? randomChineseName.randomBoyName() : randomChineseName.randomGirlName();
        String nation = RandomUtils.randomNation();
        String plate = randomCar.randomPlate();

        byte[] idbuff = Bytes.toBytes(index);
        String hashPrefix = MD5Hash.getMD5AsHex(idbuff).substring(0, 16);

        //create a put for the data table
        Put put = new Put(Bytes.add(Bytes.toBytes(hashPrefix), idbuff));
        put.addColumn(Bytes.toBytes("idcard"), Bytes.toBytes("name"), Bytes.toBytes(name));
        put.addColumn(Bytes.toBytes("idcard"), Bytes.toBytes("sex"), Bytes.toBytes(isBoy ? 1 : 0));
        put.addColumn(Bytes.toBytes("idcard"), Bytes.toBytes("birthdate"), Bytes.toBytes(birthdateStr));
        put.addColumn(Bytes.toBytes("idcard"), Bytes.toBytes("nation"), Bytes.toBytes(nation));
        put.addColumn(Bytes.toBytes("idcard"), Bytes.toBytes("plate"), Bytes.toBytes(plate));
        puts.add(put);

        //create a put for the index table
        String namehashPrefix = MD5Hash.getMD5AsHex(Bytes.toBytes(name)).substring(0, 16);
        byte[] bprf = Bytes.add(Bytes.toBytes(namehashPrefix), Bytes.toBytes(name));
        bprf = Bytes.add(bprf, Bytes.toBytes(SPLIT), Bytes.toBytes(birthdateStr));
        Put namePut = new Put(Bytes.add(bprf, Bytes.toBytes(SPLIT), Bytes.toBytes(index)));
        namePut.addColumn(Bytes.toBytes("index"), Bytes.toBytes("idcard"), Bytes.toBytes(0));
        indexPlateputs.add(namePut);

        //insert every ten thousand rows
        if (i % 10000 == 0) {
            table.put(puts);
            index_table.put(indexPlateputs);
            puts.clear();
            indexPlateputs.clear();
        }
    }
}
It seems like a version conflict between the HBase client and the server (hbase-client 1.2.0 against HBase 1.1.3). Change the HBase version to 1.1.4 or 1.0.0 or another stable version and try again.
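If the client is pulled in through Maven, aligning it with the server's release line might look like this (just a sketch; use the version that matches your cluster):
<!-- pom.xml: pin hbase-client to the server's release line -->
<dependency>
    <groupId>org.apache.hadoop.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.1.4</version>
</dependency>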

Am I approaching this correctly?

JavaFX 8 – FXML – TableView – TableColumn
I have the following objects
CourseResult
    SimpleStringProperty CourseID
    SimpleStringProperty CourseName
    SimpleStringProperty Grade
    SimpleIntegerProperty Credits

ATableRow
    SimpleStringProperty StudentID
    SimpleStringProperty StudentName
    CourseResult[] AResult   // occurs 1 to 20
In my program
javafx.collections.ObservableList<ATableRow> TableData = javafx.collections.FXCollections.observableArrayList();
I populate this ObservableList from the database and I can see all the values perfectly in the debugger.
I create the TableView and add columns.
public void createTableForThisSemester(int thisSemester, int numberOfCourses, javafx.collections.ObservableList<AResultRow> TableRows) {
    TableView<AResultRow> thisTable = new TableView<>();

    TableColumn<AResultRow, String> tcolRollNo = new TableColumn<>("Roll Number");
    tcolRollNo.setEditable(false);
    tcolRollNo.setPrefWidth(120);

    TableColumn<AResultRow, String> tcolName = new TableColumn<>("Student Name");
    tcolName.setEditable(false);
    tcolName.setPrefWidth(350);

    tcolRollNo.setCellValueFactory(cellData -> cellData.getValue().StudentIDProperty());
    tcolName.setCellValueFactory(cellData -> cellData.getValue().StudentNameProperty());
    boolean xyz = thisTable.getColumns().addAll(tcolRollNo, tcolName);

    // TableColumn[] courseColumn = new TableColumn[numberOfCourses];
    for (int courseNo = 0; courseNo < numberOfCourses; courseNo++) {
        String colName = getASemesterCourse(thisSemester, courseNo).getCourseID();
        TableColumn<AResultRow, String> thisColumn = new TableColumn<>(colName);
        thisColumn.setPrefWidth(80);
        thisColumn.setStyle("-fx-alignment: CENTER; font-weight:bold;");
        thisColumn.setCellValueFactory(cellData -> cellData.getValue().courseGradeProperty(courseNo));
        boolean retVal = thisTable.getColumns().addAll(thisColumn);
    }

    // System.out.println("# of Rows in Table [" + thisSemester + "] = " + TableRows.size());
    thisTable.getSelectionModel().setSelectionMode(SelectionMode.SINGLE);
    thisTable.setItems(TableRows);

    ScrollPane thisScrollPane = new ScrollPane();
    thisScrollPane.setFitToWidth(true);
    thisScrollPane.setFitToHeight(true);
    thisScrollPane.setMinHeight((theDetails.getHeight() - 25));
    thisScrollPane.setMaxHeight((theDetails.getHeight() - 25));
    thisScrollPane.setMinWidth((theDetails.getWidth() - 25));
    thisScrollPane.setHbarPolicy(ScrollPane.ScrollBarPolicy.ALWAYS);

    Tab thisTab = tabs.getTabs().get(thisSemester);
    thisTab.setContent(thisScrollPane);
    thisScrollPane.setContent(thisTable);
}
The StudentID and Name columns are populated perfectly, but the results are not being populated by
thisColumn.setCellValueFactory(cellData -> cellData.getValue().courseGradeProperty(courseNo));
I get this error in NetBeans for the line shown above: "Local variables referenced from a lambda expression must be final or effectively final".
Remember, the GRADE[courseNo] field is a String and is populated. How do I show this value in the table?
I have been trying various methods, like storing the value in a temp String, and so on.
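For what it's worth, that compiler error usually goes away once the loop index is copied into an effectively final local variable before the lambda captures it. A minimal sketch of the column loop, assuming courseGradeProperty(int) exists on AResultRow as in the code above:
for (int courseNo = 0; courseNo < numberOfCourses; courseNo++) {
    String colName = getASemesterCourse(thisSemester, courseNo).getCourseID();
    TableColumn<AResultRow, String> thisColumn = new TableColumn<>(colName);
    // copy the loop variable into an effectively final local so the lambda can capture it
    final int col = courseNo;
    thisColumn.setCellValueFactory(cellData -> cellData.getValue().courseGradeProperty(col));
    thisTable.getColumns().add(thisColumn);
}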

Dynamically choose which properties to get using Linq

I have an MVC application with a dynamic table on one of the pages, where the user defines how many columns the table has, the column order, and where to get the data for each field.
I have written some very bad code in order to keep it dynamic, and now I would like it to be more efficient.
My problem is that I don't know how to define, at runtime, which columns I should get back into my IEnumerable. My main issue is that I don't know how many columns I might have.
I have a reference to a class which gets the field's text. I also have a dictionary of each field's order with the exact property it should get the data from.
My code should look something like this:
var docsRes3 = from d in docs
               select new[]
               {
                   for (int i = 0; i < numOfCols; i++)
                   {
                       gen.getFieldText(d, res.FieldSourceDic[i]);
                   }
               };
where:
docs = List from which I would like to get only specific fields
res.FieldSourceDic = Dictionary in which the key is the order of the column and the value is the property
gen.getFieldText = The function which gets the entity and the property and returns the value
Obviously, it doesn't work.
I also tried
StringBuilder fieldsSB = new StringBuilder();
for (int i = 0; i < numOfCols; i++)
{
    string field = "d." + res.FieldSourceDic[i] + ".ToString()";
    if (!string.IsNullOrEmpty(fieldsSB.ToString()))
    {
        fieldsSB.Append(",");
    }
    fieldsSB.Append(field);
}

var docsRes2 = from d in docs
               select new[] { fieldsSB.ToString() };
It also didn't work.
The only thing that worked for me so far was:
List<string[]> docsRes = new List<string[]>();
foreach (NewOriginDocumentManagment d in docs)
{
    string[] row = new string[numOfCols];
    for (int i = 0; i < numOfCols; i++)
    {
        row[i] = gen.getFieldText(d, res.FieldSourceDic[i]);
    }
    docsRes.Add(row);
}
Any idea how I can pass LINQ the list of fields so that it cuts the needed data out of it efficiently?
Thanks, hope I was clear about what I need.
Try the following:
var docsRes3 = from d in docs
               select (
                   from k in res.FieldSourceDic.Keys.Take(numOfCols)
                   select gen.getFieldText(d, res.FieldSourceDic[k]));
I got my answer with some help from the following link:
http://www.codeproject.com/Questions/141367/Dynamic-Columns-from-List-using-LINQ
First I created a string array of all properties:
//Creates a string of all properties as defined in the XML.
//Column order must start at 0. No skips are allowed.
StringBuilder fieldsSB = new StringBuilder();
for (int i = 0; i < numOfCols; i++)
{
    string field = res.FieldSourceDic[i];
    if (!string.IsNullOrEmpty(fieldsSB.ToString()))
    {
        fieldsSB.Append(",");
    }
    fieldsSB.Append(field);
}
var cols = fieldsSB.ToString().Split(',');

//Gets the data for each row dynamically
var docsRes = docs.Select(d => GetProps(d, cols));
Then I created the GetProps function, which uses my own function as described in the question:
private static dynamic GetProps(object d, IEnumerable<string> props)
{
    if (d == null)
    {
        return null;
    }

    DynamicGridGenerator gen = new DynamicGridGenerator();
    List<string> res = new List<string>();
    foreach (var p in props)
    {
        res.Add(gen.getFieldText(d, p));
    }
    return res;
}

Convert System.Data.Linq.Binary to byte[]

I am storing bytes in a database table. When I retrieve them with LINQ to SQL, the return type is System.Data.Linq.Binary.
I am not able to convert the System.Data.Linq.Binary to a byte array (byte[]).
How do I convert it?
//my datacontext
var db = new db();
//key is a value from the user
var img = from i in db.images
          where i.id == key
          select i.data;
i.data is a System.Data.Linq.Binary and I want it as a byte[].
I tried (byte[])img but it did not work.
Have you tried calling ToArray() on i.data?
var img = from i in db.images
          where i.id == key
          select i.data.ToArray();
System.Data.Linq.Binary has a ToArray method just for that purpose.
Probably it's too late by now, but this may help others :)
//testTable PK: ID, binaryData: binary(32)
public void insertDummyData()
{
    DBML.testTable v = new DBML.testTable();
    v.ID = 1;
    System.Text.UTF8Encoding encoding = new System.Text.UTF8Encoding();
    v.binaryData = new System.Data.Linq.Binary(encoding.GetBytes("11111111000000001111111100000000"));
    db.testTable.InsertOnSubmit(v);
    db.SubmitChanges();
}
Alternatively, click on the Binary field in the .dbml file, open its properties and then change the field type from Binary to byte[], as found here.
(byte[])linqBinaryField.ToArray()
You can try MemoryStream. I wrote a function in my project to convert an image to a byte array, like the following:
public static byte[] Image2ByteArr(string filename)
{
    Bitmap bm = new Bitmap(getPath(filename));
    MemoryStream ms = new MemoryStream();
    bm.Save(ms, System.Drawing.Imaging.ImageFormat.Jpeg);
    return ms.ToArray();
}
Hope that is helpful for you!
