Reading the next line using LINQ and File.ReadAllLines() - linq

I have a file that represents items: one line contains the item GUID, followed by 5 lines describing the item.
Example:
Line 1: Guid=8e2803d1-444a-4893-a23d-d3b4ba51baee name= line1
Line 2: Item details = bla bla
.
.
Line 7: Guid=79e5e39d-0c17-42aa-a7c4-c5fa9bfe7309 name= line7
Line 8: Item details = bla bla
.
.
I am trying to read this file first to get the GUIDs of the items that meet a given criterion using LINQ, e.g. where line.Contains("line1"). This gives me the whole line, from which I extract the GUID. I then want to pass this GUID to another function that reads the file again, finds that line (where line.Contains("line1") && line.Contains("8e2803d1-444a-4893-a23d-d3b4ba51baee")) and reads the next 5 lines starting from that line.
Is there any efficient way to do so?

I don't think it really makes sense to use LINQ for the whole thing, given what you need to do and given that the index of the line in the array is fairly integral. I would also recommend doing everything in one pass: opening the file multiple times won't be as efficient as reading everything once and processing it immediately. As long as the file is structured as well as you describe, this won't be terribly difficult:
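private Dictionary<Guid, string[]> GetStuff()
{
    // Assumes each record is exactly 6 lines: a GUID line followed by 5 detail lines.
    var lines = File.ReadAllLines("foo.txt");
    var result = new Dictionary<Guid, string[]>();
    for (var index = 0; index + 5 < lines.Length; index += 6)
    {
        // new Guid(...) expects the line to hold only the GUID;
        // extract it first if the line carries other text.
        var guid = new Guid(lines[index]);
        var description = new string[5];
        Array.Copy(lines, index + 1, description, 0, 5);
        result.Add(guid, description);
    }
    return result;
}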
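For comparison, here is a minimal Python sketch of the same single-pass idea. It assumes the header line looks like `Guid=<guid> name= <name>`, as in the question, and that exactly five detail lines follow each header. The function name `read_items`, the `HEADER` pattern and the `DETAIL_LINES` constant are illustrative assumptions, not part of the original answer.

```python
import re
import uuid

# Hypothetical header format taken from the question: "Guid=<guid> name= <name>"
HEADER = re.compile(r"Guid=([0-9a-fA-F-]{36})\s+name=\s*(\S*)")
DETAIL_LINES = 5  # assumption: five detail lines follow each header


def read_items(lines):
    """Map each GUID to (name, detail lines) in one pass over the lines."""
    items = {}
    index = 0
    while index < len(lines):
        match = HEADER.search(lines[index])
        if match is None:
            index += 1  # not a header line; skip it
            continue
        guid = uuid.UUID(match.group(1))
        details = lines[index + 1:index + 1 + DETAIL_LINES]
        items[guid] = (match.group(2), details)
        index += 1 + DETAIL_LINES  # jump past the header and its details
    return items
```

Once the dictionary is built, looking up the details for a GUID is a single key access, so there is no need to open the file a second time.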

I tried a couple different ways to do this with LINQ but nothing allowed me to do a single scan of the file. For this scenario you're talking about I would go down to the Enumerable level and use the GetEnumerator like this:
public IEnumerable<LogData> GetLogData(string filename)
{
var line1Regex = @"Line\s(\d+):\sGuid=([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})\sname=\s(\w*)";
int detailLines = 5; // the question describes 5 detail lines per item
var lines = File.ReadAllLines(filename).GetEnumerator();
while (lines.MoveNext())
{
var line = (string)lines.Current;
var match = Regex.Match(line, line1Regex);
if (!match.Success)
continue;
var details = new string[detailLines];
for (int i = 0; i < detailLines && lines.MoveNext(); i++)
{
details[i] = (string)lines.Current;
}
yield return new LogData
{
Id = new Guid(match.Groups[2].Value),
Name = match.Groups[3].Value,
LineNumber = int.Parse(match.Groups[1].Value),
Details = details
};
}
}

Related

For loop to check values from two spreadsheets

I have two spreadsheets:
Column A on sheet 6th&7thRoster lists all IDs in a sample, contains 853 items.
Column C on sheet alreadySubmitted contains the IDs of users who've completed a task. Contains 632 items.
I'm trying to parse through both columns. If a user from Column A of sheet 6th&7thRoster matches a user from Column C of sheet alreadySubmitted, I want to write the word "Yes" in Column I of the current row of sheet 6th&7thRoster. When using the code below, I'm not seeing any instances of the word "Yes" in Column I of 6th&7thRoster, even though I know there are multiple places where that should be the case.
function checkRoster() {
var mainSheet = SpreadsheetApp.openById('XXXXXXX');
var roster = mainSheet.getSheetByName('6th&7thRoster');
var submissions = mainSheet.getSheetByName('alreadySubmitted');
var rosterLastRow = roster.getLastRow();
var submissionsLastRow = submissions.getLastRow();
var rosterArray = roster.getRange('A2:A853').getValues();
var submissionsArray = submissions.getRange('C2:C632').getValues;
var i;
var x;
for (i = 1; i < 853; i++) {
for (x = 1; x < 632; x++){
if (rosterArray[i] == submissionsArray[x]){
roster.getRange(i, 9).setValue("Yes");
}
}
}
}
Feedback on how to solve and achieve this task will be much appreciated. For confidentiality, I cannot share the original sheets.
You want to compare the values of A2:A853 of 6th&7thRoster with C2:C632 of alreadySubmitted.
When a value in C2:C632 of alreadySubmitted matches a value in A2:A853 of 6th&7thRoster, you want to put Yes in column "I" of that roster row.
If my understanding is correct, how about this modification? Please think of this as just one of several possible answers.
Modified script:
function checkRoster() {
var mainSheet = SpreadsheetApp.openById('XXXXXXX');
var roster = mainSheet.getSheetByName('6th&7thRoster');
var submissions = mainSheet.getSheetByName('alreadySubmitted');
var rosterLastRow = roster.getLastRow();
var submissionsLastRow = submissions.getLastRow();
var rosterArray = roster.getRange('A2:A853').getValues();
var submissionsArray = submissions.getRange('C2:C632').getValues(); // Modified
// I modified below script.
var obj = submissionsArray.reduce(function(o, [v]) {
if (v) o[v] = true;
return o;
}, {});
var values = rosterArray.map(function([v]) {return [obj[v] ? "Yes" : ""]});
roster.getRange(2, 9, values.length, values[0].length).setValues(values);
}
Flow:
Retrieve values from A2:A853 of 6th&7thRoster and C2:C632 of alreadySubmitted.
Create an object for searching the values from the values of alreadySubmitted.
Create the row values for putting to 6th&7thRoster.
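The flow above is a build-a-lookup-then-map pattern. Below is a minimal Python sketch of that pattern on plain lists, assuming the IDs have already been read out of the two sheets; the function name `mark_submitted` is hypothetical and nothing here calls the Apps Script API.

```python
def mark_submitted(roster_ids, submitted_ids):
    """Return one "Yes"/"" marker per roster ID, in roster order."""
    # Build the lookup once so each membership check is O(1) on average.
    # Empty cells are skipped, mirroring the `if (v)` guard in the script.
    submitted = {value for value in submitted_ids if value}
    return ["Yes" if value in submitted else "" for value in roster_ids]
```

Compared with the nested loops in the question, this does one pass over each column and writes all results back in a single batch.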
References:
reduce()
map()
If I misunderstood your question and this was not the direction you want, I apologize.

Use of regEx with multiple categories

I need to rotate the match through the variables Cat1 to Catx for as long as there is data in Cat'x'. Whenever I do this, the result is not treated as a variable: when I build the new category label, I get the literal string Cat4 or Cat5 instead of the variables Cat4 and Cat5. For example, the following, with i increasing until the variable searched for has no value (i.e. Cat57 has nothing assigned):
category = "Cat"+i
This is the portion of my code I believe that needs to be adjusted .. essentially based on the category I am going to assign it a specific column in my spreadsheet (this part hasn't been added yet) .. still stuck on the matching through multiple categories
if(studentmarks && studentmarks.length > 0 && assign.maxPoints > 0){
for (d = 0; d < studentmarks.length; d++){
var marks = studentmarks[d];
if(student.userId == marks.userId){
var ss = SpreadsheetApp.openByUrl(url).getSheetByName(shet);
var re = RegExp(Cat1);
if (assign.title.match(re))
ss.appendRow([category, assign.title, marks.assignedGrade,
assign.maxPoints]);
As we discussed in the comments, there is no need to actually use RegEx in this case.
Just define an array and iterate through it:
// Define a new array
var Cat = [];
//
// Do stuff to populate the Cat array
//
// Access your data
for (var i = 0; i < Cat.length; i++) {
    var element = Cat[i];
}
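To make the array idea concrete, here is a small Python sketch that walks a list of categories and returns the first one found in an assignment title. The function name `find_category` and the plain substring test are assumptions for illustration; the original code matched with a RegExp built from each category.

```python
def find_category(title, categories):
    """Return the first category contained in the title, or None."""
    for category in categories:
        # Skip empty entries so an unassigned category never matches.
        if category and category in title:
            return category
    return None
```

Because the categories live in one list, the loop ends naturally when the list does, so there is no need to build variable names such as "Cat" + i at run time.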

Group lines of log-file using Linq

I have an array of strings from a log file with the following format:
var lines = new []
{
"--------",
"TimeStamp: 12:45",
"Message: Message #1",
"--------",
"--------",
"TimeStamp: 12:54",
"Message: Message #2",
"--------",
"--------",
"Message: Message #3",
"TimeStamp: 12:55",
"--------"
}
I want to group each set of lines (as delimited by "--------") into a list using LINQ. Basically, I want a List<List<string>> or similar where each inner list contains 4 strings - 2 separators, a timestamp and a message.
I should add that I would like to make this as generic as possible, as the log-file format could change.
Can this be done?
Will this work?
var result = Enumerable.Range(0, lines.Length / 4)
.Select(l => lines.Skip(l * 4).Take(4).ToList())
.ToList();
EDIT:
This looks a little hacky but I'm sure it can be cleaned up
IEnumerable<List<String>> GetLogGroups(string[] lines)
{
var list = new List<String>();
foreach (var line in lines)
{
list.Add(line);
if (list.Count(l => l.All(c => c == '-')) == 2)
{
yield return list;
list = new List<string>();
}
}
}
You should be able to actually do better than returning a List<List<string>>. If you're using C# 4, you could project each set of values into a dynamic type where the string before the colon becomes the property name and the value is the text on the right-hand side. You then create a custom iterator which reads the lines until the end "------" appears in each set and then yield returns that row. On MoveNext, you read the next set of lines. Rinse and repeat until EOF. I don't have time at the moment to write up a full implementation, but my sample on reading in CSV and using LINQ over the dynamic objects may give you an idea of what you can do. See http://www.thinqlinq.com/Post.aspx/Title/LINQ-to-CSV-using-DynamicObject. (Note this sample is in VB, but the same can be done in C# with some modifications.)
The iterator implementation has the added benefit of not having to load the entire document into memory before parsing. With this version, you only load the amount for one set of blocks at a time. It allows you to handle really large files.
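A rough Python sketch of such a streaming iterator follows. It assumes every block opens and closes with the same delimiter line, as in the question's sample. The name `iter_log_groups` is hypothetical, and the sketch yields plain lists of lines, not the dynamic objects described above.

```python
def iter_log_groups(lines, delimiter="--------"):
    """Yield the lines between each opening and closing delimiter."""
    group = None  # None means we are currently outside a block
    for raw in lines:
        line = raw.rstrip("\n")
        if line == delimiter:
            if group is None:
                group = []    # opening delimiter: start a new block
            else:
                yield group   # closing delimiter: hand the block out
                group = None
        elif group is not None:
            group.append(line)
```

Passing an open file object as `lines` keeps only one block in memory at a time, and the number of lines per block is not fixed, so the format can change without touching the grouping logic.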
Assuming that your structure is always
delimiter
TimeStamp
Message
delimiter
public List<List<String>> ConvertLog(String[] log)
{
    var logSet = new List<List<String>>();
    for (int i = 0; i + 3 < log.Length; i += 4)
    {
        // Each record is four lines: delimiter, two fields, delimiter.
        var set = new List<String>() { log[i], log[i+1], log[i+2], log[i+3] };
        logSet.Add(set);
    }
    return logSet;
}
Or in Linq
public List<List<String>> ConvertLog(String[] log)
{
    return Enumerable.Range(0, log.Length / 4)
        .Select(l => log.Skip(l * 4).Take(4).ToList())
        .ToList();
}

Reading Text Files with LINQ

I have a file that I want to read into an array.
string[] allLines = File.ReadAllLines(@"path to file");
I know that I can iterate through the array and find each line that contains a pattern and display the line number and the line itself.
My question is:
Is it possible to do the same thing with LINQ?
Well yes - using the Select() overload that takes an index we can do this by projecting to an anonymous type that contains the line itself as well as its line number:
var targetLines = File.ReadAllLines(@"foo.txt")
.Select((x, i) => new { Line = x, LineNumber = i })
.Where( x => x.Line.Contains("pattern"))
.ToList();
foreach (var line in targetLines)
{
Console.WriteLine("{0} : {1}", line.LineNumber, line.Line);
}
Since the console output is a side effect it should be separate from the LINQ query itself.
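The same index-plus-filter projection can be sketched in Python with `enumerate`. The function name `find_lines` is hypothetical, and the line numbers are zero-based, like the index passed to the `Select` overload.

```python
def find_lines(lines, pattern):
    """Return (line_number, line) pairs for lines containing pattern."""
    # enumerate pairs each line with its zero-based position, so the
    # position survives the filtering step.
    return [(number, line) for number, line in enumerate(lines) if pattern in line]
```

As in the C# version, the query only produces the pairs; printing them stays a separate step.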
Using LINQ is possible. However, since you want the line number as well, the code will likely be more readable by iterating yourself:
const string pattern = "foo";
for (int lineNumber = 1; lineNumber <= allLines.Length; lineNumber++)
{
if (allLines[lineNumber-1].Contains(pattern))
{
Console.WriteLine("{0}. {1}", lineNumber, allLines[lineNumber-1]);
}
}
Something like this should work:
var result = from line in File.ReadAllLines(@"path")
where line.StartsWith("a") // put your criteria here
select line;

Need an algorithm to group several parameters of a person under the persons name

I have a bunch of names in alphabetical order, with multiple instances of the same name, so that identical names are grouped together. Beside each name, after a comma, I have a role that has been assigned to that person, one name-role pair per line, something like what's shown below
name1,role1
name1,role2
name1,role3
name1,role8
name2,role8
name2,role2
name2,role4
name3,role1
name4,role5
name4,role1
...
..
.
I am looking for an algorithm that takes the above .csv file as input and creates an output .csv file in the following format
name1,role1,role2,role3,role8
name2,role8,role2,role4
name3,role1
name4,role5,role1
...
..
.
So basically I want each name to appear only once and then the roles to be printed in csv format next to the names for all names and roles in the input file.
The algorithm should be language independent. I would appreciate it if it does NOT use OOP principles :-) I am a newbie.
Obviously has some formatting bugs but this will get you started.
var lastName = "";
do{
var name = readName();
var role = readRole();
if(lastName!=name){
print("\n"+name+",");
lastName = name;
}
print(role+",");
}while(reader.isReady());
This is easy to do if your language has associative arrays: arrays that can be indexed by anything (such as a string) rather than just numbers. Some languages call them "hashes," "maps," or "dictionaries."
On the other hand, if you can guarantee that the names are grouped together as in your sample data, Stefan's solution works quite well.
It's kind of a pity you said it had to be language-agnostic because Python is rather well-qualified for this:
import itertools

filename = 'roles.csv'  # hypothetical input path

def split(s):
    # Split "name,role" into [name, role]
    return s.strip().split(',', 1)

with open(filename, 'r') as f:
    for name, lines in itertools.groupby(f, lambda s: split(s)[0]):
        print(name + ',' + ','.join(split(s)[1] for s in lines))
Basically the groupby call takes all consecutive lines with the same name and groups them together.
Now that I think about it, though, Stefan's answer is probably more efficient.
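The associative-array approach mentioned above can be sketched in a few lines of Python. This is an illustration under the assumption that every row has the form `name,role`; the function name `group_roles` is hypothetical. Unlike the grouping by consecutive lines, it does not need the names to be sorted.

```python
def group_roles(rows):
    """Collapse "name,role" rows into one "name,role1,role2,..." row per name."""
    roles_by_name = {}  # dicts keep insertion order in Python 3.7+
    for row in rows:
        name, role = row.strip().split(",", 1)
        # Create the list on first sight of a name, then append the role.
        roles_by_name.setdefault(name, []).append(role)
    return [",".join([name] + roles) for name, roles in roles_by_name.items()]
```

The trade-off is memory: every name and role is held in the dictionary until the end, whereas the last-name comparison only remembers the previous line.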
Here is a solution in Java:
Scanner sc = new Scanner (new File(fileName));
Map<String, List<String>> nameRoles = new HashMap<String, List<String>> ();
while (sc.hasNextLine()) {
String line = sc.nextLine();
String args[] = line.split (",");
if (nameRoles.containsKey(args[0])) {
nameRoles.get(args[0]).add(args[1]);
} else {
List<String> roles = new ArrayList<String>();
roles.add(args[1]);
nameRoles.put(args[0], roles);
}
}
// then print it out
for (String name : nameRoles.keySet()) {
List<String> roles = nameRoles.get(name);
System.out.print(name + ",");
for (String role : roles) {
System.out.print(role + ",");
}
System.out.println();
}
With this approach, you can handle input in arbitrary order, such as:
name1,role1
name3,role1
name2,role8
name1,role2
name2,role2
name4,role5
name4,role1
Here it is in C# using nothing fancy. It should be self-explanatory:
static void Main(string[] args)
{
using (StreamReader file = new StreamReader("input.txt"))
{
string prevName = "";
while (!file.EndOfStream)
{
string line = file.ReadLine(); // read a line
string[] tokens = line.Split(','); // split the name and the parameter
string name = tokens[0]; // this is the name
string param = tokens[1]; // this is the parameter
if (name == prevName) // if the name is the same as the previous name we read, we add the current param to that name. This works right because the names are sorted.
{
Console.Write(param + " ");
}
else // otherwise, we are definitely done with the previous name, and have printed all of its parameters (due to the sorting).
{
if (prevName != "") // make sure we don't print an extra newline the first time around
{
Console.WriteLine();
}
Console.Write(name + ": " + param + " "); // write the name followed by the first parameter. The output format can easily be tweaked to print commas.
prevName = name; // store the new name as the previous name.
}
}
}
}

Resources