Processing: How can I find the number of times two fields are equal in a CSV file? - processing

I'm learning Processing for the first time and I've been tasked to deal with data but it's been terribly confusing for me.
For every line of a CSV file (apart from the header), I want to compare two specific columns of each. i.e. ListA vs ListB
For example, with the data below:
ListA,ListB
Male,Yes
Male,No
Female,Yes
Male,Yes
And for example, I want to check for all instances that a value in ListA is "Male" AND that the corresponding value in ListB is "Yes". In this scenario, I should get the value "2" for the two rows this is true.
How would I do that?
For now, I have a 2D String array of the data in the CSV file. From that I managed to assign specific columns as ListA and ListB. I tried using sort but it would only sort one list and not both at the same time.
Current relevant code:
for (int i=1; i<lines.length; i++) {
listA[i-1] = csv[i][int(whichA)];
listB[i-1] = csv[i][int(whichB)];
}
lA = Arrays.asList(listA);
lB = Arrays.asList(listB);
Not sure if this code really helps makes things clearer though. :/
Any help would be appreciated. Thank you.

So something like this should do what you need it to. Pseudocode:
int numRows = 0;
for (int i = 0; i < length; ++i) {
if (array1[i] equals "Male" AND array2[i] equals "Yes") {
++numRows;
//add to new collection here if you need the data
}
}

Related

Slow loop over data table collection

I am updating values in an Excel workbook using values from a MySQL database. There are just eleven rows in the WorkbookMap list, and six DataTables in the RptValueSet DataSet. I've proven that the problem is in this loop, not in the communication with the database. Getting the results is fast, but writing them to the workbook is slow. The code below works fine when the DataTables in the DataSet are small - such as a 3-column, 7-row results set, and I receive the updated Excel workbook almost instantly. However, the loop slows down noticeably when the results set increases; a 3-column, 50-row DataTable results in a 7 - 10 second delay in returning the updated Excel workbook. I'm not sure I really need to put the DataTables into a collection, but that's the only way I could figure out how to iterate over them. Any tips on optimizing this loop would be much appreciated!
// Create a list to contain the destination for the data in the workbook
List<WorkbookMap> wbMap = new List<WorkbookMap>();
// Create a new data set to contain results from database
DataSet RptValuesSet = new DataSet();
// RptValuesSet populated from database here....
// Create a collection so we can loop thru the dataset
DataTableCollection RptValuesColl = RptValuesSet.Tables;
for (int i = 0; i < RptValuesColl.Count; i++)
{
DataTable tbl = RptValuesColl[i];
// Find the correct entry in the workbook map
for (int j = 0; j < wbMap.Count; j++)
{
if (wbMap[j].SPCall == tbl.TableName)
{
// Write the results to the correct location in the workbook
MovingColumnRef = wbMap[j].StartColumn;
for (int c = 1; c < tbl.Columns.Count; c++)
{
row = wbMap[j].StartRow; // start at the top row for each new column
for (int r = 0; r < tbl.Rows.Count; r++)
{
// Write the database value to the workbook given the sheetName and cell address
UpdateValue(wbMap[j].SheetName, MovingColumnRef + row, tbl.Rows[r][c].ToString(), 0, wbMap[j].String);
row++;
}
MovingColumnRef = IncrementColRef(MovingColumnRef);
}
}
}
}
Without delving deeply into your code. I noticed you said that you believe the slowness is coming from writing to the sheet. Try putting the data into an Array first before updating the workbook. E.g. you would write to the sheet like this. anchorRangeName is the name of a range which would be only one cell in the workbook.
private void WriteResultToRange(Excel.Workbook wb, string anchorRangeName, object[,] resultArray)
{
Excel.Range resultRange = GetRange(anchorRangeName, wb).get_Resize(resultArray.GetLength(0), resultArray.GetLength(1));
resultRange.Value2 = resultArray;
}
You'll still need to get the data from the database into an array.

LinqToExcel - Need to start at a specific row

I'm using the LinqToExcel library. Working great so far, except that I need to start the query at a specific row. This is because the excel spreadsheet from the client uses some images and "header" information at the top of the excel file before the data actually starts.
The data itself will be simple to read and is fairly generic, I just need to know how to tell the ExcelQueryFactory to start at a specific row.
I am aware of the WorksheetRange<Company>("B3", "G10") option, but I don't want to specify an ending row, just where to start reading the file.
Using the latest v. of LinqToExcel with C#
I just tried this code and it seemed to work just fine:
var book = new LinqToExcel.ExcelQueryFactory(#"E:\Temporary\Book1.xlsx");
var query =
from row in book.WorksheetRange("A4", "B16384")
select new
{
Name = row["Name"].Cast<string>(),
Age = row["Age"].Cast<int>(),
};
I only got back the rows with data.
I suppose that you already solved this, but maybe for others - looks like you can use
var excel = new ExcelQueryFactory(path);
var allRows = excel.WorksheetNoHeader();
//start from 3rd row (zero-based indexing), length = allRows.Count() or computed range of rows you want
for (int i = 2; i < length; i++)
{
RowNoHeader row = allRows.ElementAtOrDefault(i);
//process the row - access columns as you want - also zero-based indexing
}
Not as simple as specifying some Range("B3", ...), but also the way.
Hope this helps at least somebody ;)
I had tried this, works fine for my scenario.
//get the sheets info
var faceWrksheet = excel.Worksheet(facemechSheetName);
// get the total rows count.
int _faceMechRows = faceWrksheet.Count();
// append with End Range.
var faceMechResult = excel.WorksheetRange<ExcelFaceMech>("A5", "AS" + _faceMechRows.ToString(), SheetName).
Where(i => i.WorkOrder != null).Select(x => x).ToList();
Have you tried WorksheetRange<Company>("B3", "G")
Unforunatly, at this moment and iteration in the LinqToExcel framework, there does not appear to be any way to do this.
To get around this we are requiring the client to have the data to be uploaded in it's own "sheet" within the excel document. The header row at the first row and the data under it. If they want any "meta data" they will need to include this in another sheet. Below is an example from the LinqToExcel documentation on how to query off a specific sheet.
var excel = new ExcelQueryFactory("excelFileName");
var oldCompanies = from c in repo.Worksheet<Company>("US Companies") //worksheet name = 'US Companies'
where c.LaunchDate < new DateTime(1900, 0, 0)
select c;

How can I get the "actual" count of element in a IEnumerable?

If I wrote :
for (int i = 0; i < Strutture.Count(); i++)
{
}
and Strutture is an IEnumerable with 200 elements, IIS crash. That's because I see every time I do Strutture.Count() it executes all LINQ queries linked with that IEnumerable.
So, how can I get the "current" number of elements? I need a list?
"That's because I see every time I do Strutture.Count() it executes all LINQ queries linked with that IEnumerable."
Without doing such, how is it going to know how many elements there are?
For example:
Enumerable.Range(0,1000).Where(i => i % 2==0).Skip(100).Take(5).Count();
Without executing the LINQ, how could you know how many elements there are?
If you want to know how many elements there are in the source (e.g. Enumerable.Range) then I suggest you use a reference to that source and query it directly. E.g.
var numbers = Enumerable.Range(0,1000);
numbers.Count();
Also keep in mind some data sources don't really have a concept of 'Count' or if they do it involves going through every single item and counting them.
Lastly, if you're using .Count() repetitively [and you don't expect the value to actually change] it can be a good idea to cache:
var count = numbers.Count();
for (int i =0; i<count; i++) // Do Something
Supplemental:
"At first Count(), LINQ queries are executes. Than, for the next, it just "check" the value :) Not "execute the LINQ query again..." :)" - Markzzz
Then why don't we do that?
var query = Enumerable.Range(0,1000).Where(i => i % 2==0).Skip(100).Take(5).Count();
var result = query.ToArray() //Gets and stores the result!
result.Length;
:)
"But when I do the first "count", it should store (after the LINQ queries) the new IEnumerable (the state is changed). If I do again .Count(), why LINQ need to execute again ALL queries." - Markzzz
Because you're creating a query that gets compiled down into X,Y,Z. You're running the same query twice however the result may vary.
For example, check this out:
static void Main(string[] args)
{
var dataSource = Enumerable.Range(0, 100).ToList();
var query = dataSource.Where(i => i % 2 == 0);
//Run the query once and return the count:
Console.WriteLine(query.Count()); //50
//Now lets modify the datasource - remembering this could be a table in a db etc.
dataSource.AddRange(Enumerable.Range(100, 100));
//Run the query again and return the count:
Console.WriteLine(query.Count()); //100
Console.ReadLine();
}
This is why I recommended storing the results of the query above!
Materialize the number:
int number = Strutture.Count();
for (int i = 0; i < number; i++)
{
}
or materialize the list:
var list = Strutture.ToList();
for (int i = 0; i < list.Count; i++)
{
}
or use a foreach
foreach(var item in Strutture)
{
}

Truncating a collection using Linq query

I want to extract part of a collection to another collection.
I can easily do the same using a for loop, but my linq query is not working for the same.
I am a neophyte in Linq, so please help me correcting the query (if possible with explanation / beginners tutorial link)
Legacy way of doing :
Collection<string> testColl1 = new Collection<string> {"t1", "t2", "t3", "t4"};
Collection<string> testColl2 = new Collection<string>();
for (int i = 0; i < newLength; i++)
{
testColl2.Add(testColl1[i]);
}
Where testColl1 is the source & testColl2 is the desired truncated collection of count = newLength.
I have used the following linq queries, but none of them are working ...
var result = from t in testColl1 where t.Count() <= newLength select t;
var res = testColl1.Where(t => t.Count() <= newLength);
Use Enumerable.Take:
var testColl2 = testColl1.Take(newLength).ToList();
Note that there's a semantic difference between your for loop and the version using Take. The for loop will throw with IndexOutOfRangeException exception if there are less than newLength items in testColl1, whereas the Take version will silently ignore this fact and just return as many items up to newLength items.
The correct way is by using Take:
var result = testColl1.Take(newLength);
An equivalent way using Where is:
var result = testColl1.Where((i, item) => i < newLength);
These expressions will produce an IEnumerable, so you might also want to attach a .ToList() or .ToArray() at the end.
Both ways return one less item than your original implementation does because it is more natural (e.g. if newLength == 0 no items should be returned).
You could convert to for loop to something like this:
testColl1.Take(newLength)
Use Take:
var result = testColl1.Take(newLength);
This extension method returns the first N elements from the collection where N is the parameter you pass, in this case newLength.

How to order integers according to size and track their positions by variable name

I have a program with multiple int variables where individual counts are added to the specific variable each time a set fail condition is encountered. I want the user to be able to track how many failures of each category they have encountered by a button click. I want to display the range on a datagridview in order from highest value integer down to lowest. I also need to display in the adjacent column the name of the test step that relates to the value. My plan was to use Array.sort to order the integers but i then lose track of their names so cant assign the adjacent string column. I tried using a hashtable but if i use the string as a key it sorts alphabetically not numerically and if i use the integer as a key i get duplicate entries which dont get added to the hash table. here is some of the examples i tried but they have the aforementioned problems. essentially i want to end with two arrays where the order matches the naming and value convention. FYI the variables were declared before this section of code, variables ending in x are the string name for the (non x) value of the integer.
Hashtable sorter = new Hashtable();
sorter[download] = downloadx;
sorter[power] = powerx;
sorter[phase] = phasex;
sorter[eeprom] = eepromx;
sorter[upulse] = upulsex;
sorter[vpulse] = vpulsex;
sorter[wpulse] = wpulsex;
sorter[volts] = voltsx;
sorter[current] = currentx;
sorter[ad] = adx;
sorter[comms] = commsx;
sorter[ntc] = ntcx;
sorter[prt] = prtx;
string list = "";
string[] names = new string[13];
foreach (DictionaryEntry child in sorter)
{
list += child.Value.ToString() + "z";
}
int[] ordered = new int[] { download, power, phase, eeprom, upulse, vpulse, wpulse, volts, current, ad, comms, ntc, prt };
Array.Sort(ordered);
Array.Reverse(ordered);
for (int i = 0; i < sorter.Count; i++)
{
int pos = list.IndexOf("z");
names[i] = list.Substring(0, pos);
list = list.Substring(pos + 1);
}
First question here so hope its not too longwinded.
Thanks
Use a Dictionary. And you can order it by the value : myDico.OrderBy(x => x.Value).Reverse(), the sort will be numerical descending. You just have to enumerate the result.
I hope I understand your need. Otherwise ignore me.
You want to be using a
Dictionary <string, int>
to store your numbers.I'm not clear on how you're displaying results at the end - do you have a grid or a list control?
You ask about usings. Which ones do you already have?
EDIT for .NET 2.0
There might be a more elegant solution, but you could implement the logic by putting your rows in a DataTable. Then you can make a DataView of that table and sort by whichever column you like, ascending or descending.
See http://msdn.microsoft.com/en-us/library/system.data.dataview(v=VS.80).aspx for example.
EDIT for .NET 3.5 and higher
As far as sorting a Dictionary by its values:
var sortedEntries = myDictionary.OrderBy(pair => pair.Value);
If you need the results to be a Dictionary, you can call .ToDictionary() on that. For reverse order, use .OrderByDescending(pair => pair.Value).

Resources