CSV load and manipulate in Processing

I have the following code:
import processing.pdf.*;

Table table;

void setup() {
  size(2000, 2000, PDF, "my.pdf");
  background(255);

  table = loadTable("my.csv", "header");
  println(table.getRowCount() + " total rows in table");

  for (TableRow row : table.rows()) {
    String ind = row.getString("Individual");
    float value = row.getFloat("Value");
    if (ind == "An") {
      stroke(0);
      noFill();
      ellipse(1000, 1000, value, value);
      println("Dracula");
    }
  }
  println("Finished.");
  exit();
}
The CSV header and one row look like this:
Individual, Value
An, 34.56
If I run the above code I get
200 total rows in table
Finished.
Why does it not extract all the An data and plot it (like I expected it to do)?
Thanks.

Related

How to find the average of each column of a DataTable using C#

I have a .csv file containing names, roll numbers, and marks for each subject. I parsed it into a DataTable and calculated the highest mark for each subject. All I want now is the average of each subject. Can anyone help me with this?
This was my output:
Highest mark for ComputerScience:
Name : Manoj
Roll Number : 1212334556
Mark : 94
Highest Mark for Biology:
Name : Sandeep
Roll Number : 1223456477
Mark : 90
Highest Mark for Commerce:
Name : BarathRam
Roll Number : 1212345664
Mark : 97
And the CSV file contains Names, Rollno, Computer, Biology, Commerce.
Now all I need is the average of each subject.
My code:
static DataTable table;

static void Main(string[] args)
{
    StreamReader r = new StreamReader(@"C:\Users\GOPINATH\Desktop\stud1.csv");
    string line = r.ReadLine(); //reads first line - column header
    string[] part = line.Split(','); //splits the line by comma
    createDataTable(part);

    //copy from CSV to DataTable<String,String,int,int,int>
    while ((line = r.ReadLine()) != null)
    {
        try
        {
            part = line.Split(',');
            table.Rows.Add(part[0], part[1], Convert.ToInt32(part[2]), Convert.ToInt32(part[3]), Convert.ToInt32(part[4]));
        }
        catch (Exception e)
        {
            Console.WriteLine(e.Message);
        }
    }
    r.Close();

    int mark1_index = 0, mark2_index = 0, mark3_index = 0; //initialize index value 0 for highest marks

    //finding the index of the highest mark for each subject
    for (int i = 0; i < table.Rows.Count; i++)
    {
        if (Convert.ToInt32(table.Rows[i][2]) > Convert.ToInt32(table.Rows[mark1_index][2])) //subject1
        {
            mark1_index = i;
        }
        if (Convert.ToInt32(table.Rows[i][3]) > Convert.ToInt32(table.Rows[mark2_index][3])) //subject2
        {
            mark2_index = i;
        }
        if (Convert.ToInt32(table.Rows[i][4]) > Convert.ToInt32(table.Rows[mark3_index][4])) //subject3
        {
            mark3_index = i;
        }
    }

    printmark(table, mark1_index, 2);
    printmark(table, mark2_index, 3);
    printmark(table, mark3_index, 4);
    Console.Read();
}

public static void createDataTable(string[] columnName)
{
    //create DataTable<String,String,int,int,int>
    table = new DataTable();
    table.Columns.Add(columnName[0], typeof(String));
    table.Columns.Add(columnName[1], typeof(String));
    table.Columns.Add(columnName[2], typeof(int));
    table.Columns.Add(columnName[3], typeof(int));
    table.Columns.Add(columnName[4], typeof(int));
}

public static void printmark(DataTable t, int rowIndex, int columnIndex)
{
    Console.WriteLine("Highest mark for " + t.Columns[columnIndex].ColumnName + ":");
    Console.WriteLine("\tName: " + (string)t.Rows[rowIndex][0]);
    Console.WriteLine("\tRoll Number: " + (string)t.Rows[rowIndex][1]);
    Console.WriteLine("\tMark: " + (int)t.Rows[rowIndex][columnIndex]);
}
You could use LINQ and do this:
DataTable t;
var average = t.AsEnumerable().Average(x => x.Field<int>("columnname"));

var result = table.AsEnumerable()
    .GroupBy(x => x.Field<string>("Subject"))
    .Select(g => new
    {
        Subject = g.Key,
        Average = g.Average(x => x.Field<int>("Mark"))
    }).ToList();
In order to calculate the average mark by Subject, first you need to group by Subject, then calculate the average of each group. (This assumes a long-format table with Subject and Mark columns; the table in the question is wide, with one column per subject.)
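For the wide layout in the question, a minimal sketch that averages each subject column directly; this assumes the DataTable built by createDataTable above, with the int-typed subject marks in columns 2 through 4:
//requires System.Data.DataSetExtensions for AsEnumerable()/Field<T>()
for (int c = 2; c <= 4; c++)
{
    DataColumn col = table.Columns[c];
    //average the int marks in this subject column
    double avg = table.AsEnumerable().Average(r => r.Field<int>(col));
    Console.WriteLine("Average for " + col.ColumnName + ": " + avg);
}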

Using Bulk Insert dramatically slows down processing?

I'm fairly new to Oracle, but I have used bulk insert on a couple of other applications. Most seem to go faster using it, but I've had a couple where it slows down the application. This is my second one where it slowed things down significantly, so I'm wondering if I have something set up incorrectly or maybe I need to set it up differently. In this case I have a console application that processed ~1,900 records. Inserting them individually takes ~2.5 hours, and when I switched over to the bulk insert it jumped to 5 hours.
The article I based this off of is http://www.oracle.com/technetwork/issue-archive/2009/09-sep/o59odpnet-085168.html
Here is what I'm doing: I retrieve some records from the DB, do calculations, and then write the results out to a text file. After the calculations are done I have to write those results back to a different table in the DB, so we can look back at those calculations later if needed.
When I make the calculations I add the results to a List. Once I'm done writing out the file I look at that List, and if there are any records I do the bulk insert.
With the bulk insert I have a setting in the App.config for the number of records I want to insert per batch. In this case I'm using 250 records. I assumed it would be better to limit my in-memory arrays to, say, 250 records versus the full 1,900. I loop through that list up to the count from the App.config and create an array for each column. Those arrays are then passed as parameters to Oracle.
App.config
<add key="UpdateBatchCount" value="250" />
Class
class EligibleHours
{
    public string EmployeeID { get; set; }
    public decimal Hours { get; set; }
    public string HoursSource { get; set; }
}
Data Manager
public static void SaveEligibleHours(List<EligibleHours> listHours)
{
    //number of records per batch, from the config file
    int batchCount = int.Parse(ConfigurationManager.AppSettings["UpdateBatchCount"]);

    //create the arrays to add values to
    string[] arrEmployeeId = new string[batchCount];
    decimal[] arrHours = new decimal[batchCount];
    string[] arrHoursSource = new string[batchCount];

    int i = 0;
    foreach (var item in listHours)
    {
        //fill the arrays; once a batch is full, send it to the DB and start a new one
        arrEmployeeId[i] = item.EmployeeID;
        arrHours[i] = item.Hours;
        arrHoursSource[i] = item.HoursSource;
        i++;

        if (i == batchCount)
        {
            UpdateDbWithEligibleHours(arrEmployeeId, arrHours, arrHoursSource);
            //reset counter and arrays
            i = 0;
            arrEmployeeId = new string[batchCount];
            arrHours = new decimal[batchCount];
            arrHoursSource = new string[batchCount];
        }
    }

    //process the last, partially filled batch (trim the arrays so no empty slots are inserted)
    if (i > 0)
    {
        Array.Resize(ref arrEmployeeId, i);
        Array.Resize(ref arrHours, i);
        Array.Resize(ref arrHoursSource, i);
        UpdateDbWithEligibleHours(arrEmployeeId, arrHours, arrHoursSource);
    }
}
private static void UpdateDbWithEligibleHours(string[] arrEmployeeId, decimal[] arrHours, string[] arrHoursSource)
{
    StringBuilder sbQuery = new StringBuilder();
    sbQuery.Append("insert into ELIGIBLE_HOURS ");
    sbQuery.Append("(EMP_ID, HOURS_SOURCE, TOT_ELIG_HRS, REPORT_DATE) ");
    sbQuery.Append("values ");
    sbQuery.Append("(:1, :2, :3, SYSDATE) ");

    string connectionString = ConfigurationManager.ConnectionStrings["Server_Connection"].ToString();
    using (OracleConnection dbConn = new OracleConnection(connectionString))
    {
        dbConn.Open();

        //create Oracle parameters and pass arrays of data
        OracleParameter p_employee_id = new OracleParameter();
        p_employee_id.OracleDbType = OracleDbType.Char;
        p_employee_id.Value = arrEmployeeId;

        OracleParameter p_hoursSource = new OracleParameter();
        p_hoursSource.OracleDbType = OracleDbType.Char;
        p_hoursSource.Value = arrHoursSource;

        OracleParameter p_hours = new OracleParameter();
        p_hours.OracleDbType = OracleDbType.Decimal;
        p_hours.Value = arrHours;

        OracleCommand objCmd = dbConn.CreateCommand();
        objCmd.CommandText = sbQuery.ToString();
        objCmd.ArrayBindCount = arrEmployeeId.Length;
        objCmd.Parameters.Add(p_employee_id);
        objCmd.Parameters.Add(p_hoursSource);
        objCmd.Parameters.Add(p_hours);
        objCmd.ExecuteNonQuery();
    }
}

How to insert Double[] into HBase?

For example, if I want to insert a double[] like
Double[] dArr = {10.23, 25.1, 30.5, 45.3};
into an HBase table, could you please tell me how to insert it into HBase?
You can store anything you want, you just have to serialize it to a byte[]:
Double[] dArr = {10.23, 25.1, 30.5, 45.3};

byte[] value = new byte[0];
byte[] family = "f".getBytes();
byte[] column = "d".getBytes();

//concatenate the 8-byte encoding of each double into one value
for (Double d : dArr) {
    value = Bytes.add(value, Bytes.toBytes(d));
}

Put put = new Put(rowKey);
put.add(family, column, value); //value is already a byte[]
...
You'll have to deserialize the data when you read the value back (walk the byte[] eight bytes at a time, e.g. with Bytes.toDouble(value, offset), to rebuild the Double[]).
Another thing you can do is store each Double (as a byte[]) in its own column (d0 to dX):
Double[] dArr = {10.23, 25.1, 30.5, 45.3};

Put put = new Put(rowKey);
byte[] family = "f".getBytes();

int i = 0;
for (Double d : dArr) {
    //column qualifiers d0, d1, d2, ... so each double lands in its own column
    put.add(family, Bytes.toBytes("d" + i), Bytes.toBytes(d));
    i++;
}
...

Retrieve records by page size

I have a big table, and if I use a normal query it hits a timeout exception. So I want to select the top 1000 rows and output them, then retrieve rows 1001 to 2000 and log those, and so on.
I am not sure how to add a paging parameter to my query.
int pageNumber = 0;
var query = DBContext.MyTable.Where(c => c.FacilityID == facilityID)
    .OrderBy(c => c.FilePath)
    .Skip(pageNumber * 1000)
    .Take(1000);
foreach (var x in query)
{
    // Console.WriteLine(x.Name);
}
// I want pageNumber to be incremented until it reaches the bottom of the table.
// I don't know how many records are in the table.
Try this out:
int pageNumber = 0;
bool hasHitEnd = false;
while (!hasHitEnd)
{
    //materialize the page so the query runs only once per iteration
    var batch = DBContext.MyTable.Where(c => c.FacilityID == facilityID)
        .OrderBy(c => c.FilePath)
        .Skip(pageNumber * 1000)
        .Take(1000)
        .ToList();

    foreach (var x in batch)
    {
        // Do something
    }

    //a short page means we've reached the bottom of the table
    if (batch.Count < 1000)
    {
        hasHitEnd = true;
    }
    pageNumber++;
}
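If Skip/Take still times out on a very big table (Skip has to scan past all earlier rows), a keyset-style variant can help. This is a sketch under the assumption that FilePath is unique within a facility, so it can serve as the paging key; with Entity Framework, String.Compare in the predicate is expected to translate to a SQL comparison:
const int pageSize = 1000;
string lastPath = null;
while (true)
{
    var pageQuery = DBContext.MyTable.Where(c => c.FacilityID == facilityID);
    //resume after the last key seen instead of skipping rows
    if (lastPath != null)
        pageQuery = pageQuery.Where(c => string.Compare(c.FilePath, lastPath) > 0);

    var page = pageQuery.OrderBy(c => c.FilePath).Take(pageSize).ToList();

    foreach (var x in page)
    {
        // Do something
    }

    if (page.Count < pageSize)
        break; //short page: bottom of the table

    lastPath = page[page.Count - 1].FilePath;
}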

How to create a PowerPoint presentation in OpenXML format with Apache POI and XSLF?

If I go to Apache POI XSLF there should be samples for both the OLE2 and OpenXML specs, but there are only the OLE2-based Horrible Slide Layout Format (HSLF) examples.
Could anybody please help me out with an XML Slide Layout Format example? The API is quite different.
It is not like with spreadsheets, where one just changes the implementation from HSSFWorkbook to XSSFWorkbook.
How would the following look with the XSLF implementation? POI apparently can't create a document from scratch, so we need an existing empty dummy document, right?
//table data
String[][] data = {
    {"INPUT FILE", "NUMBER OF RECORDS"},
    {"Item File", "11,559"},
    {"Vendor File", "300"},
    {"Purchase History File", "10,000"},
    {"Total # of requisitions", "10,200,038"}
};

SlideShow ppt = new SlideShow();
Slide slide = ppt.createSlide();

//create a table of 5 rows and 2 columns
Table table = new Table(5, 2);
for (int i = 0; i < data.length; i++) {
    for (int j = 0; j < data[i].length; j++) {
        TableCell cell = table.getCell(i, j);
        cell.setText(data[i][j]);
        RichTextRun rt = cell.getTextRun().getRichTextRuns()[0];
        rt.setFontName("Arial");
        rt.setFontSize(10);
        cell.setVerticalAlignment(TextBox.AnchorMiddle);
        cell.setHorizontalAlignment(TextBox.AlignCenter);
    }
}

//set table borders
Line border = table.createBorder();
border.setLineColor(Color.black);
border.setLineWidth(1.0);
table.setAllBorders(border);

//set width of the 1st column
table.setColumnWidth(0, 300);
//set width of the 2nd column
table.setColumnWidth(1, 150);

slide.addShape(table);
table.moveTo(100, 100);

FileOutputStream out = new FileOutputStream(file);
ppt.write(out);
out.close();
It is not implemented yet as of org.apache.poi version 3.8-beta3; when it will be implemented is unknown to me. From XMLSlideShow.java:
public MasterSheet createMasterSheet() throws IOException {
    throw new IllegalStateException("Not implemented yet!");
}

public Slide createSlide() throws IOException {
    throw new IllegalStateException("Not implemented yet!");
}
