Group lines of a log file using LINQ

I have an array of strings from a log file with the following format:
var lines = new []
{
    "--------",
    "TimeStamp: 12:45",
    "Message: Message #1",
    "--------",
    "--------",
    "TimeStamp: 12:54",
    "Message: Message #2",
    "--------",
    "--------",
    "Message: Message #3",
    "TimeStamp: 12:55",
    "--------"
};
I want to group each set of lines (as delimited by "--------") into a list using LINQ. Basically, I want a List<List<string>> or similar where each inner list contains 4 strings - 2 separators, a timestamp and a message.
I should add that I would like to make this as generic as possible, as the log-file format could change.
Can this be done?

Will this work?
var result = Enumerable.Range(0, lines.Length / 4)
    .Select(l => lines.Skip(l * 4).Take(4).ToList())
    .ToList();
EDIT:
This looks a little hacky, but I'm sure it can be cleaned up:
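IEnumerable<List<string>> GetLogGroups(string[] lines)
{
    var list = new List<string>();
    foreach (var line in lines)
    {
        list.Add(line);
        // A separator is a non-empty line made up only of '-' characters
        // (All() returns true for an empty string, hence the Length check).
        // Once the current group holds two separators, it is complete.
        if (list.Count(l => l.Length > 0 && l.All(c => c == '-')) == 2)
        {
            yield return list;
            list = new List<string>();
        }
    }
}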

You should be able to do better than returning a List<List<string>>. If you're using C# 4, you could project each set of values into a dynamic type where the string before the colon becomes the property name and the text after the colon becomes its value. You then create a custom iterator which reads lines until the closing "--------" of each set appears and then yield returns that row. On MoveNext, you read the next set of lines. Rinse and repeat until EOF. I don't have time at the moment to write up a full implementation, but my sample on reading in CSV and using LINQ over dynamic objects may give you an idea of what you can do. See http://www.thinqlinq.com/Post.aspx/Title/LINQ-to-CSV-using-DynamicObject (note this sample is in VB, but the same can be done in C# with some modifications).
The iterator implementation has the added benefit of not having to load the entire document into memory before parsing. With this version, you only hold one block of lines in memory at a time, which allows you to handle really large files.
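A minimal C# sketch of the streaming iterator described here, under a few assumptions: every field line has the form "Key: Value", and separator lines consist only of dashes. Instead of a DynamicObject it builds a plain Dictionary<string, string> per record, keyed by the text before the first colon. Because it takes an IEnumerable<string>, passing File.ReadLines(path) streams the file one record at a time. The names LogBlockReader and ReadBlocks are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LogBlockReader
{
    // Streams "Key: Value" records separated by lines made only of '-' characters.
    // Each record is yielded as a dictionary keyed by the text before the first colon,
    // so the field order inside a block (TimeStamp/Message or Message/TimeStamp) does not matter.
    public static IEnumerable<Dictionary<string, string>> ReadBlocks(IEnumerable<string> lines)
    {
        var current = new Dictionary<string, string>();
        foreach (var line in lines)
        {
            bool isSeparator = line.Length > 0 && line.All(c => c == '-');
            if (isSeparator)
            {
                // A separator closes the current record, if it has collected any fields.
                // Back-to-back separators ("--------" twice) therefore produce no empty records.
                if (current.Count > 0)
                {
                    yield return current;
                    current = new Dictionary<string, string>();
                }
                continue;
            }

            int colon = line.IndexOf(':');
            if (colon < 0)
                continue; // skip lines that are not "Key: Value" pairs

            string key = line.Substring(0, colon).Trim();
            string value = line.Substring(colon + 1).Trim(); // "12:45" keeps its own colon
            current[key] = value;
        }

        // Emit a trailing record that was not closed by a separator.
        if (current.Count > 0)
            yield return current;
    }
}
```

Usage would look like foreach (var record in LogBlockReader.ReadBlocks(File.ReadLines("app.log"))) Console.WriteLine(record["Message"]); where "app.log" is a placeholder path. If the log format gains new fields, they simply show up as new dictionary keys, which addresses the "as generic as possible" requirement without hard-coding four lines per record.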

Assuming that your structure is always:
delimiter
TimeStamp
Message
delimiter
public List<List<String>> ConvertLog(String[] log)
{
    var logSet = new List<List<String>>();
    // Step through the log four lines at a time; the loop condition
    // skips an incomplete group at the end of the array.
    for (int i = 0; i + 3 < log.Length; i += 4)
    {
        var set = new List<String>() { log[i], log[i+1], log[i+2], log[i+3] };
        logSet.Add(set);
    }
    return logSet;
}
Or in LINQ:
public List<List<String>> ConvertLog(String[] log)
{
    return Enumerable.Range(0, log.Length / 4)
        .Select(l => log.Skip(l * 4).Take(4).ToList())
        .ToList();
}
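On .NET 6 or later, the same fixed-size grouping can be written with the built-in Enumerable.Chunk method, which expresses "groups of four" directly instead of computing offsets with Range/Skip/Take. This sketch still assumes the strict four-line layout (delimiter, two fields, delimiter). The class name LogChunker is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LogChunker
{
    // Splits the log into consecutive groups of four lines
    // (separator, field, field, separator). Requires .NET 6+ for Enumerable.Chunk.
    public static List<List<string>> ConvertLog(string[] log)
    {
        return log
            .Chunk(4)                             // yields string[] of up to 4 lines each
            .Where(chunk => chunk.Length == 4)    // drop an incomplete trailing group
            .Select(chunk => chunk.ToList())
            .ToList();
    }
}
```

Like the Range/Skip/Take version, this breaks as soon as a record has a different number of lines, so it only suits a format that is guaranteed not to change.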

Related

At least one object must implement IComparable

I am attempting to get unique values in a list of similar values, distinguished only by one element in a pipe-delimited string... I keep getting "At least one object must implement IComparable", and I don't understand why. I am able to GroupBy that value... Why can't I find the max? I guess it is looking for something to compare it with. If I get the integer version, will it stop yelling at me? This is the last time I am going to try using LINQ...
var queryResults = PatientList.GroupBy(x => x.Value.Split('|')[1]).Select(x => x.Max());
I know I can get the unique values some other way; I am just having a hard time figuring it out. In that list, I know that the string with the highest value amongst its similar brethren is the one that I want to add to the list. How can I do that? I am totally drawing a blank because I have been trying to get this to work in LINQ for the last few days with no luck...
foreach (XmlNode node in nodeList)
{
XmlDocument xDoc = new XmlDocument();
xDoc.LoadXml(node.OuterXml);
string popPatInfo = xDoc.SelectSingleNode("./template/elements/element[@name=\"FirstName\"]").Attributes["value"].Value + ", " + xDoc.SelectSingleNode("./template/elements/element[@name=\"LastName\"]").Attributes["value"].Value + " | " + DateTime.Parse(xDoc.SelectSingleNode("./template/elements/element[@name=\"DateOfBirth\"]").Attributes["value"].Value.Split('T')[0]).ToString("dd-MMM-yyyy");
string patientInfo = xDoc.SelectSingleNode("./template/elements/element[@name=\"PatientId\"]").Attributes["value"].Value + "|" + xDoc.SelectSingleNode("./template/elements/element[@name=\"PopulationPatientID\"]").Attributes["enc"].Value;// +"|" + xDoc.SelectSingleNode("./template/elements/element[@name=\"AdminDate\"]").Attributes["value"].Value;
int enc = Int32.Parse(patientInfo.Split('|')[1]);
if (enc > temp)
{
lastEncounter.Add(enc, patientInfo);
temp = enc;
}
//lastEncounter.Add(Int32.Parse(patientInfo.Split('|')[1]));
PatientList.Add( new SelectListItem { Text = popPatInfo, Value = patientInfo });
}
I was thinking about using some kind of temp variable to find out what is the highest value and then add that string to the List. I am totally drawing a blank however...
Here I get the IDs in an anonymous type to make it readable.
var patientEncounters= from patient in PatientList
let PatientID=Int32.Parse(patient.Value.Split('|')[0])
let EncounterID=Int32.Parse(patient.Value.Split('|')[1])
select new { PatientID, EncounterID };
Then we group by PatientID and get the last encounter:
var lastEncounterForEachUser=from pe in patientEncounters
group pe by pe.PatientID into grouped
select new
{
PatientID=grouped.Key,
LastEncounterID=grouped.Max(g=>g.EncounterID)
};
LINQ doesn't know how to compare two Patient objects, so it can't determine which one is the "greatest". You need to make the Patient class implement IComparable<Patient> to define how Patient objects are compared.
// Compare objects by Id value
public int CompareTo(Patient other)
{
    return this.Id.CompareTo(other.Id);
}
Another option is to use the MaxBy extension method available in Jon Skeet's MoreLinq project:
var queryResults = PatientList.GroupBy(x => x.Value.Split('|')[1])
.Select(x => x.MaxBy(p => p.Id));
EDIT: I assumed there was a Patient class, but reading your code again, I realize it's not the case. PatientList is actually a collection of SelectListItem, so you need to implement IComparable in that class.
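A sketch that avoids IComparable entirely: parse the two IDs out of each item's Value, group by patient ID, and keep the whole item with the highest encounter ID. It assumes Value has the form "PatientId|EncounterId", as built in the question's loop, and uses a hypothetical ListItem class with Text and Value properties in place of SelectListItem so that it compiles without ASP.NET MVC. Because Max() is never asked to compare whole items, the "At least one object must implement IComparable" exception cannot arise.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for SelectListItem: only Text and Value are used here.
public class ListItem
{
    public string Text { get; set; } = "";
    public string Value { get; set; } = "";   // assumed format: "PatientId|EncounterId"
}

public static class LatestEncounter
{
    // For each patient ID, keep the original item whose encounter ID is highest.
    // Ordering is done on a parsed int key, so the items themselves are never compared.
    public static List<ListItem> LatestPerPatient(IEnumerable<ListItem> items)
    {
        return items
            .Select(item =>
            {
                string[] parts = item.Value.Split('|');
                return new
                {
                    Item = item,
                    PatientId = int.Parse(parts[0]),
                    EncounterId = int.Parse(parts[1])
                };
            })
            .GroupBy(x => x.PatientId)
            .Select(g => g.OrderByDescending(x => x.EncounterId).First().Item)
            .ToList();
    }
}
```

Sorting each group just to take its first element is more work than strictly needed; on .NET 6 or later, g.MaxBy(x => x.EncounterId) returns the same element in a single pass.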

Reading Text Files with LINQ

I have a file that I want to read into an array.
string[] allLines = File.ReadAllLines(@"path to file");
I know that I can iterate through the array and find each line that contains a pattern and display the line number and the line itself.
My question is:
Is it possible to do the same thing with LINQ?
Well yes - using the Select() overload that takes an index we can do this by projecting to an anonymous type that contains the line itself as well as its line number:
var targetLines = File.ReadAllLines(@"foo.txt")
    .Select((x, i) => new { Line = x, LineNumber = i + 1 }) // i is zero-based; +1 gives human-friendly line numbers
    .Where(x => x.Line.Contains("pattern"))
    .ToList();
foreach (var line in targetLines)
{
Console.WriteLine("{0} : {1}", line.LineNumber, line.Line);
}
Since the console output is a side effect, it should be kept separate from the LINQ query itself.
Using LINQ is possible. However, since you want the line number as well, the code will likely be more readable by iterating yourself:
const string pattern = "foo";
for (int lineNumber = 1; lineNumber <= allLines.Length; lineNumber++)
{
    if (allLines[lineNumber - 1].Contains(pattern))
    {
        Console.WriteLine("{0}. {1}", lineNumber, allLines[lineNumber - 1]);
    }
}
Something like this should work:
var result = from line in File.ReadAllLines(@"path")
             where line.StartsWith("a") // put your criteria here; Substring(0, 1) would throw on empty lines
             select line;
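A variant of the Select((line, index) => ...) approach that reports 1-based line numbers and accepts any IEnumerable<string>. Passing File.ReadLines(path) instead of File.ReadAllLines streams the file lazily rather than loading it into an array first. The class and method names are hypothetical, and it uses C# 7 value tuples instead of an anonymous type so that the result can be returned from a method.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LineSearch
{
    // Returns (1-based line number, line text) for every line containing the pattern.
    // The query itself has no side effects; printing is left to the caller.
    public static List<(int LineNumber, string Line)> FindLines(IEnumerable<string> lines, string pattern)
    {
        return lines
            .Select((line, index) => (LineNumber: index + 1, Line: line))
            .Where(x => x.Line.Contains(pattern))
            .ToList();
    }
}
```

A caller could then write foreach (var (number, text) in LineSearch.FindLines(File.ReadLines(@"foo.txt"), "pattern")) Console.WriteLine("{0} : {1}", number, text); which keeps the console output separate from the query, as suggested above.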

Need an algorithm to group several parameters of a person under the persons name

I have a bunch of names in alphabetical order, with multiple instances of the same name, so that the names are all grouped together. Beside each name, after a comma, I have a role that has been assigned to them, one name-role pair per line, something like what's shown below:
name1,role1
name1,role2
name1,role3
name1,role8
name2,role8
name2,role2
name2,role4
name3,role1
name4,role5
name4,role1
...
..
.
I am looking for an algorithm to take the above .csv file as input and create an output .csv file in the following format:
name1,role1,role2,role3,role8
name2,role8,role2,role4
name3,role1
name4,role5,role1
...
..
.
So basically I want each name to appear only once and then the roles to be printed in csv format next to the names for all names and roles in the input file.
The algorithm should be language independent. I would appreciate it if it does NOT use OOP principles :-) I am a newbie.
It obviously has some formatting bugs, but this will get you started.
var lastName = "";
do{
var name = readName();
var role = readRole();
if(lastName!=name){
print("\n"+name+",");
lastName = name;
}
print(role+",");
}while(reader.isReady());
This is easy to do if your language has associative arrays: arrays that can be indexed by anything (such as a string) rather than just numbers. Some languages call them "hashes," "maps," or "dictionaries."
On the other hand, if you can guarantee that the names are grouped together as in your sample data, Stefan's solution works quite well.
It's kind of a pity you said it had to be language-agnostic because Python is rather well-qualified for this:
import itertools

def split(s):
    return s.strip().split(',', 1)

with open(filename, 'r') as f:
    for name, lines in itertools.groupby(f, lambda s: split(s)[0]):
        print(name + ',' + ','.join(split(s)[1] for s in lines))
Basically the groupby call takes all consecutive lines with the same name and groups them together.
Now that I think about it, though, Stefan's answer is probably more efficient.
Here is a solution in Java:
import java.io.File;
import java.util.*;

Scanner sc = new Scanner(new File(fileName)); // throws FileNotFoundException
// HashMap does not keep insertion order; use LinkedHashMap to print names in input order
Map<String, List<String>> nameRoles = new HashMap<String, List<String>>();
while (sc.hasNextLine()) {
    String line = sc.nextLine();
    String args[] = line.split(",");
    if (nameRoles.containsKey(args[0])) {
        nameRoles.get(args[0]).add(args[1]);
    } else {
        List<String> roles = new ArrayList<String>();
        roles.add(args[1]);
        nameRoles.put(args[0], roles);
    }
}
// then print it out
for (String name : nameRoles.keySet()) {
    List<String> roles = nameRoles.get(name);
    System.out.print(name + ",");
    for (String role : roles) {
        System.out.print(role + ",");
    }
    System.out.println();
}
With this approach, you can work with unsorted input like:
name1,role1
name3,role1
name2,role8
name1,role2
name2,role2
name4,role5
name4,role1
Here it is in C# using nothing fancy. It should be self-explanatory:
static void Main(string[] args)
{
using (StreamReader file = new StreamReader("input.txt"))
{
string prevName = "";
while (!file.EndOfStream)
{
string line = file.ReadLine(); // read a line
string[] tokens = line.Split(','); // split the name and the parameter
string name = tokens[0]; // this is the name
string param = tokens[1]; // this is the parameter
if (name == prevName) // if the name is the same as the previous name we read, we add the current param to that name. This works right because the names are sorted.
{
Console.Write(param + " ");
}
else // otherwise, we are definitely done with the previous name, and have printed all of its parameters (due to the sorting).
{
if (prevName != "") // make sure we don't print an extra newline the first time around
{
Console.WriteLine();
}
Console.Write(name + ": " + param + " "); // write the name followed by the first parameter. The output format can easily be tweaked to print commas.
prevName = name; // store the new name as the previous name.
}
}
}
}
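For comparison, here is a LINQ sketch of the same grouping. GroupBy keeps groups in the order their keys first appear and keeps each group's roles in input order, so, like the map-based Java answer, it does not need the input to be sorted, and string.Join avoids the trailing comma. It assumes each line is name,role with no comma inside the name; the class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class RoleGrouper
{
    // Turns "name,role" lines into "name,role1,role2,..." lines.
    public static List<string> GroupRoles(IEnumerable<string> lines)
    {
        return lines
            .Where(line => line.Contains(','))            // skip blank or malformed lines
            .Select(line => line.Split(new[] { ',' }, 2))  // [name, role]
            .GroupBy(parts => parts[0].Trim(), parts => parts[1].Trim())
            .Select(g => g.Key + "," + string.Join(",", g))
            .ToList();
    }
}
```

Writing the result out is then a single call such as File.WriteAllLines("output.csv", RoleGrouper.GroupRoles(File.ReadLines("input.csv"))), with both file names as placeholders.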

Reading the next line using LINQ and File.ReadAllLines()

I have a file which represents items: each item is a line with the item's GUID, followed by 5 lines describing the item.
Example:
Line 1: Guid=8e2803d1-444a-4893-a23d-d3b4ba51baee name= line1
Line 2: Item details = bla bla
.
.
Line 7: Guid=79e5e39d-0c17-42aa-a7c4-c5fa9bfe7309 name= line7
Line 8: Item details = bla bla
.
.
I am trying to read this file first to get the GUIDs of the items that meet the criteria provided using LINQ, e.g. where line.Contains("line1"). This way I will get the whole line and extract the GUID from it. I then want to pass this GUID to another function, which should access the file "again", find that line (where line.Contains("line1") && line.Contains("8e2803d1-444a-4893-a23d-d3b4ba51baee")) and read the next 5 lines starting from that line.
Is there any efficient way to do so?
I don't think it really makes sense to use LINQ entirely, given the requirements of what you need to do and given that the index of the line in the array is fairly integral. I would also recommend doing everything in one pass: opening the file multiple times won't be as efficient as just reading everything once and processing it immediately. As long as the file is structured as well as you describe, this won't be terribly difficult:
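private Dictionary<Guid, String[]> GetStuff()
{
    var lines = File.ReadAllLines("foo.txt");
    var result = new Dictionary<Guid, String[]>();
    // Each item is one header line followed by 5 description lines;
    // the loop condition skips an incomplete item at the end of the file.
    for (var index = 0; index + 5 < lines.Length; index += 6)
    {
        // The header looks like "Guid=<guid> name= ...", so cut out the 36-character GUID
        var header = lines[index];
        var guidText = header.Substring(header.IndexOf("Guid=") + "Guid=".Length, 36);
        var item = new
        {
            Guid = new Guid(guidText),
            Description = lines.Skip(index + 1).Take(5).ToArray()
        };
        result.Add(item.Guid, item.Description);
    }
    return result;
}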
I tried a couple of different ways to do this with LINQ, but nothing allowed me to do a single scan of the file. For the scenario you're describing, I would drop down to the enumerator level and use GetEnumerator like this:
public IEnumerable<LogData> GetLogData(string filename)
{
var line1Regex = @"Line\s(\d+):\sGuid=([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})\sname=\s(\w*)";
int detailLines = 5; // each GUID line is followed by 5 description lines
var lines = File.ReadAllLines(filename).GetEnumerator();
while (lines.MoveNext())
{
var line = (string)lines.Current;
var match = Regex.Match(line, line1Regex);
if (!match.Success)
continue;
var details = new string[detailLines];
for (int i = 0; i < detailLines && lines.MoveNext(); i++)
{
details[i] = (string)lines.Current;
}
yield return new LogData
{
Id = new Guid(match.Groups[2].Value),
Name = match.Groups[3].Value,
LineNumber = int.Parse(match.Groups[1].Value),
Details = details
};
}
}
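A testable single-pass sketch of the same idea: it scans the lines once, and whenever a header line both contains a GUID and passes a caller-supplied filter (for example l => l.Contains("line1")), it takes the following detailCount lines as that item's description. The GUID is located with a hex-only regular expression instead of assuming the whole line is a GUID. ItemReader, FindItems, and the default of 5 detail lines are assumptions based on the question's description.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ItemReader
{
    // Matches "Guid=<36-character GUID>" anywhere on a header line.
    private static readonly Regex GuidPattern = new Regex(
        @"Guid=([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})");

    // Single pass: when a header line matches the filter, the next detailCount
    // lines become that item's description and the scan jumps past them.
    public static Dictionary<Guid, string[]> FindItems(
        IReadOnlyList<string> lines, Func<string, bool> headerFilter, int detailCount = 5)
    {
        var result = new Dictionary<Guid, string[]>();
        for (int i = 0; i < lines.Count; i++)
        {
            Match m = GuidPattern.Match(lines[i]);
            if (!m.Success || !headerFilter(lines[i]))
                continue;

            string[] details = lines.Skip(i + 1).Take(detailCount).ToArray();
            result[Guid.Parse(m.Groups[1].Value)] = details;
            i += details.Length; // skip the description lines already consumed
        }
        return result;
    }
}
```

Called as ItemReader.FindItems(File.ReadAllLines("foo.txt"), l => l.Contains("line1")), it returns the GUID and its description together, so the second pass over the file described in the question is no longer needed.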

Reproduce a "DELETE NOT IN" SQL Statement via LINQ/Subsonic

I want to do something like DELETE FROM TABLE WHERE ID NOT IN (1,2,3) AND PAGEID = 9
I have a List of IDs, but that could be changed if need be. I can't work out how to get a boolean result for the LINQ parser.
Here is what I think SubSonic expects:
db.Delete(content => content.PageID == ID).Execute();
I can't work out how to do the NOT IN statement. I've tried the List.Contains method, but something is not quite right.
UPDATE: One alternative is to do:
var items = TABLE.Find(x => x.PageID == ID);
foreach(var item in items)
{
item.Delete();
}
This hits the database a lot more, though.
When you say "something not quite right" what exactly do you mean?
I'd expect to write:
List<int> excluded = new List<int> { 1, 2, 3 };
db.Delete(content => content.PageID == ID && !excluded.Contains(content.ID)).Execute();
Note that you need to call Contains on the array of excluded values, not on your candidate. In other words, instead of saying "item not in collection" you're saying "collection doesn't contain item."
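To make the direction of Contains concrete, here is the same predicate run with LINQ to Objects against a hypothetical ContentRow class. It only demonstrates which rows the condition selects; whether SubSonic's query provider translates !list.Contains(x) into a SQL NOT IN is not shown here.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type standing in for the SubSonic table class.
public class ContentRow
{
    public int ID { get; set; }
    public int PageID { get; set; }
}

public static class NotInFilter
{
    // In-memory equivalent of: DELETE FROM TABLE WHERE ID NOT IN (excluded) AND PAGEID = pageId
    // The collection is asked whether it contains the row's ID, not the other way round.
    public static List<ContentRow> RowsToDelete(IEnumerable<ContentRow> rows, List<int> excluded, int pageId)
    {
        return rows
            .Where(r => r.PageID == pageId && !excluded.Contains(r.ID))
            .ToList();
    }
}
```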
Try .Contains:
db.Delete(content => <Array containing IDs>.Contains(content.PageID)).Execute();
(the above is just an example, might need some polishing for your specific situation)
I have found that this works, but it's not via LINQ:
var table = new WebPageContentTable(_db.DataProvider);
var g = new SubSonic.Query.Delete<WebPageContent>(_db.DataProvider)
.From(table)
.Where(table.ID)
.NotIn(usedID)
.Execute();
I have found that this does work via LINQ; however, it hits the database multiple times:
var f = WebPageContent.Find(x => !usedID.Any(e => e == x.ID));
if (f.Count > 0)
{
var repo = WebPageContent.GetRepo();
repo.Delete(f);
}
I imagine this would work in a single hit to the database, but I get an exception thrown in QueryVisitor::VisitUnary:
WebPageContent.Delete(x => !usedID.Any(e => e == x.ID));
