Parsing complex XML in Ruby with SAX parsing

I've been searching and searching for a couple of days on how to do this, but I can't seem to understand sax parsing in a way that will help me accomplish what I want to accomplish. I understand sax parsing on a basic level, but I can't wrap my mind around how to use it to extract the data I need to extract.
I'm currently using:
xml data
ruby
the saxerator gem (I'm not sold on this, it's just the easiest I've found so far that I'm able to understand clearly enough)
Here's a sample of the xml structure:
<result created="2015-08-26T09:42:35-05:00" host="testdata" status="
<items>
<client>
<clientid>00001</clientid>
<name>
<![CDATA[ ABC Company ]]>
</name>
<site>
<siteid>222222</siteid>
<name>
<![CDATA[ 123 Blvd ]]>
</name>
<workstations/>
<servers>
<server>
<id>333333</id>
<name>
<![CDATA[ 123BLVD-SRV ]]>
</name>
<failed_checks>
<check>
<checkid>4444444</checkid>
<check_type>0001</check_type>
<description>
<![CDATA[Critical Events Check - Application log]]>
</description>
<dsc_247>2</dsc_247>
<date>2015-08-26</date>
<time>06:03:44</time>
<consecutive_fails>2</consecutive_fails>
<startdate>2015-08-25</startdate>
<starttime>10:43:51</starttime>
<formatted_output>
<![CDATA[Event log issues[CLIENT:]]>
</formatted_output>
<checkstatus>
<![CDATA[ Status ]]>
</checkstatus>
</check>
</failed_checks>
</server>
</servers>
</site>
</client>
What I'm trying to extract is an array of clients. Each client will have a name, a clientid, an array of its workstations (and their properties), and an array of its servers (and their properties). Something like this:
clients_array = [
{
:name => 'ABC Company',
:clientid => '00001',
:workstations => [
{
:name => 'hostname',
:id => '00002',
:failed_checks => [
{
:description => 'description', :cause => 'cause'
}
]
},
{
:name => 'hostname2',
:id => '00003',
...
}
]
},
{
:name => 'Second Company',
:clientid => '...',
...
}
]
The problem I'm running into is I can extract the client node's information easily enough, but extracting the workstation and server information for each client node is difficult.
Side note: I would just use DOM parsing, which I've done in the past with great success, but the XML I'm working with is far too large and has crashed the server.
Here's what I've been working with so far. I keep getting stuck at the site/workstations/servers nodes because sometimes there will be one site (hash element) and sometimes there are multiple sites (array element). The same goes for workstations and servers.
Since this is sax parsing, I don't understand how I can point the workstations and servers back to each client. I don't need the site data, just the workstations and servers for each client:
require 'saxerator'
def parse_sax
clients_array = []
parser = Saxerator.parser(File.new("data.xml"))
parser.for_tag(:client).each do |client|
# Create a hash to store 'this' client's data in
client_hash = Hash.new
# Grab some data
client_hash[:name] = client['name']
client_hash[:clientid] = client['clientid']
# Here's where the workstation/server code would go
parser.for_tag(:site).each do |site|
# This just goes through and finds ALL sites
end
clients_array << client_hash
end
clients_array
end
I thought I had figured it out when I thought about parsing clients, workstations, and servers separately:
parser.for_tag(:client).each do |client|
...
end
parser.for_tag(:workstation).each do |ws|
...
end
parser.for_tag(:server).each do |srv|
...
end
But then I end up with a bunch of separate client, workstation, and server objects with no way of relating the devices back to their respective clients.
It's very possible my grasp of sax parsing is such that I'm just missing something trivial that will accomplish what I want, but I can't seem to discover the solution.
I'm more than happy to provide clarification where needed and any help is more than appreciated.

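Use XmlTextReader for huge XML files. Note that this is a .NET class, so the example is C#, not Ruby. It streams through the file and builds a small in-memory tree for one client element at a time: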
namespace ConsoleApplication1
{
class Program
{
const string FILENAME = @"c:\temp\test.txt";
static void Main(string[] args)
{
List<Client> clients = new List<Client>();
XmlTextReader reader = new XmlTextReader(FILENAME);
while (!reader.EOF)
{
if (reader.Name == "client")
{
string xmlClient = reader.ReadOuterXml();
XElement xClient = XElement.Parse(xmlClient);
Client newClient = new Client();
clients.Add(newClient);
newClient.name = xClient.Element("name").Value;
newClient.clientid = xClient.Element("clientid").Value;
newClient.workstations = xClient.Descendants("server").Select(x => new WorkStation
{
name = x.Element("name").Value,
id = x.Element("id").Value
}).ToList();
}
else
{
reader.ReadToFollowing("client");
}
}
}
}
public class Client
{
public string name { get; set;}
public string clientid { get; set; }
public List<WorkStation> workstations { get; set; }
}
public class WorkStation
{
public string name { get; set; }
public string id { get; set; }
}
}
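For a Ruby take on the original question, one way to tie workstations and servers back to their client is to keep track of where you are in the document while streaming. Below is a minimal sketch, not a definitive implementation. It swaps Saxerator for REXML's stream parser (REXML::Document.parse_stream with a REXML::StreamListener), and the class name ClientListener, its instance variables, and the trimmed sample XML are my own assumptions for illustration. It also assumes that check elements only ever appear inside a workstation or server, as in the sample.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: builds one hash per client while streaming.
class ClientListener
  include REXML::StreamListener

  DEVICE_TAGS = %w[workstation server].freeze

  attr_reader :clients

  def initialize
    @clients = []
    @stack = []    # names of the elements that are currently open
    @buffer = +''  # text collected for the innermost open element
  end

  def tag_start(name, _attrs)
    @stack.push(name)
    @buffer = +''
    case name
    when 'client' then @client = { workstations: [], servers: [] }
    when *DEVICE_TAGS then @device = { failed_checks: [] }
    when 'check' then @check = {}
    end
  end

  def text(value)
    @buffer << value
  end
  alias cdata text # CDATA sections are handled like ordinary text

  def tag_end(name)
    value = @buffer.strip
    parent = @stack[-2]
    @stack.pop
    @buffer = +''

    if parent == 'client' && %w[name clientid].include?(name)
      @client[name.to_sym] = value
    elsif DEVICE_TAGS.include?(parent) && %w[name id].include?(name)
      @device[name.to_sym] = value
    elsif parent == 'check' && name == 'description'
      @check[:description] = value
    elsif name == 'check'
      @device[:failed_checks] << @check
    elsif DEVICE_TAGS.include?(name)
      # A finished device belongs to whichever client is currently open.
      @client["#{name}s".to_sym] << @device
    elsif name == 'client'
      @clients << @client
    end
  end
end

# Trimmed sample (assumption: same shape as the XML in the question).
xml = <<~XML
  <result>
    <items>
      <client>
        <clientid>00001</clientid>
        <name><![CDATA[ ABC Company ]]></name>
        <site>
          <siteid>222222</siteid>
          <name><![CDATA[ 123 Blvd ]]></name>
          <workstations/>
          <servers>
            <server>
              <id>333333</id>
              <name><![CDATA[ 123BLVD-SRV ]]></name>
              <failed_checks>
                <check>
                  <checkid>4444444</checkid>
                  <description><![CDATA[Critical Events Check - Application log]]></description>
                </check>
              </failed_checks>
            </server>
          </servers>
        </site>
      </client>
    </items>
  </result>
XML

listener = ClientListener.new
REXML::Document.parse_stream(xml, listener)
client = listener.clients.first
puts client[:name]                # ABC Company
puts client[:servers].first[:name] # 123BLVD-SRV
```

For the real file, pass an open File instead of a string, for example File.open('data.xml') { |f| REXML::Document.parse_stream(f, listener) }, so the document is never loaded whole. Because each device is attached to whichever client is currently open, it doesn't matter whether a client has one site or several; the site's own name and id are skipped because their parent is neither a client nor a device.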

Related

Attribute based index hints change my results

I have this query that hasn't changed since I first got it working:
ISearchResponse<Series> response = await IndexManager.GetClient()
.SearchAsync<Series>(r => r
.Filter(f => f.Term<Role>(t => t.ReleasableTo.First(), Role.Visitor))
.SortDescending(ser => ser.EndDate)
.Size(1));
My IndexManager.GetClient() is simply responsible for setting up my connection to ElasticSearch, and ensuring that the indexes are built properly. The rest of the code gets the most recent article series that is releasable to the general public.
Inside the IndexManager I set up explicit index mapping, and when I did that I got results from my query every time. The code looked like this:
client.Map<Series>(m => m.Dynamic(DynamicMappingOption.Allow)
.DynamicTemplates(t => t
.Add(a => a.Name("releasableTo").Match("*releasableTo").MatchMappingType("string").Mapping(map => map.String(s => s.Index(FieldIndexOption.NotAnalyzed))))
.Add(a => a.Name("id").Match("*id").MatchMappingType("string").Mapping(map => map.String(s => s.Index(FieldIndexOption.NotAnalyzed))))
.Add(a => a.Name("services").Match("*amPm").MatchMappingType("string").Mapping(map => map.String(s => s.Index(FieldIndexOption.NotAnalyzed)))
.Match("*dayOfWeek").MatchMappingType("string").Mapping(map => map.String(s => s.Index(FieldIndexOption.NotAnalyzed))))
.Add(a => a.Name("urls").Match("*Url").MatchMappingType("string").Mapping(map => map.String(s => s.Index(FieldIndexOption.NotAnalyzed))))
));
While all well and good, doing this for every type we stored wasn't really going to scale well. So I made a conscious decision to use the attributes and map it that way:
// In IndexManager
client.Map<T>(m => m.MapFromAttributes());
// In the type definition
class Series
{
// ....
[DataMember]
[ElasticProperty(Index = FieldIndexOption.NotAnalyzed, Store = true)]
public HashSet<Role> ReleasableTo { get; set; }
// ....
}
As soon as I do this, I no longer get results. When I look at my indexes in Kibana, I see my 'releasableTo' field is not analyzed and it is indexed. However the query I wrote no longer works. If I remove the filter clause I get results, but I really need that to work.
What am I missing? How do I get my query to work again?
It appears that the ElasticSearch attributes to provide indexing hints don't know what to do with enums.
The problem turned out to be the fact that the Role type was an enumeration. The client.Map<Series>(m => m.MapFromAttributes()) call skipped that property. At run time, it dynamically maps the property to a string.
// In the type definition
class Series
{
// ....
[DataMember]
[ElasticProperty(Index = FieldIndexOption.NotAnalyzed, Store = true)]
public HashSet<Role> ReleasableTo { get; set; }
// ....
}
To get the field properly indexed I had to explicitly set its type in the ElasticProperty attribute. Changing the code to this:
// In the type definition
class Series
{
// ....
[DataMember]
[ElasticProperty(Index = FieldIndexOption.NotAnalyzed, Type = FieldType.String, Store = true)]
public HashSet<Role> ReleasableTo { get; set; }
// ....
}
made my query work again. The moral of the story is that unless it's a primitive type, be explicit when setting the field type.

nHibernate join and take one record

I'm doing the join below. There are many BookingAction records per booking, but I want only one BookingAction record per Booking record: the one with the highest primary key value.
How would I do this?
var bookingLocationsQuery = (
from
booking in session.Query<Booking>()
join
bookingActions in session.Query<BookingAction>() on booking.Id equals bookingActions.bookingId
where
(booking.bookingAdminID == userId)
select new { booking, bookingActions }
);
A couple of suggestions. First, you should be leveraging NHibernate's many-to-one to do the join for you instead of doing it manually. It looks like you currently have something like this...
public class BookingAction
{
// ... other properties ...
public virtual int bookingId { get; set; }
}
<class name="BookingAction">
<!-- ... other properties ... -->
<property name="bookingId" />
</class>
Don't do that. Instead, you should have:
public class BookingAction
{
// ... other properties ...
public virtual Booking Booking { get; set; }
}
<class name="BookingAction">
<!-- ... other properties ... -->
<many-to-one name="Booking" column="bookingId" />
</class>
Similar advice for Booking.bookingAdminID. It should be a many-to-one to User, not just a simple property.
Second, after you make those changes, you should be able to accomplish your goal with a query like this:
var subquery = session.Query<BookingAction>()
.Where(a => a.Booking.Admin.Id == userId)
.GroupBy(a => a.Booking.Id)
.Select(g => g.Max(a => a.Id));
var bookingActions = session.Query<BookingAction>()
.Fetch(a => a.Booking)
.Where(a => subquery.Contains(a.Id));
Sorry about switching it to the chained extension method syntax - that's easier for me to work with. It's exactly equivalent to the from ... select syntax in execution.
Try using the Max() method, for example:
var bookingLocation = session.Query<Booking>()
.Where(booking => booking.bookingAdminID == userId)
.Max(booking => booking.bookingAdminID);

Get single value from XML and bind it to a textblock?

I'm trying to create a prayer time app for prayer times in Oslo. I have an XML file located in the app.
What I want to do:
Based on the month and the day, get the value for morning prayer, evening prayer, and so on.
I want one value at a time, shown in a TextBlock. How do I do it?
I am currently getting the info into a ListBox, but I would rather show the single value in a TextBlock. Or should I use something else?
public class PrayerTime
{
public string Fajr { get; set; }
public string Sunrise { get; set; }
}
To get the value:
XDocument loadedCustomData = XDocument.Load("WimPrayerTime.xml");
var filteredData = from c in loadedCustomData.Descendants("PrayerTime")
where c.Attribute("Day").Value == myDay.Day.ToString()
&& c.Attribute("Month").Value == myDay.Month.ToString()
select new PrayerTime()
{
Fajr = c.Attribute("Fajr").Value,
Sunrise = c.Attribute("Sunrise").Value,
};
listBox1.ItemsSource = filteredData;
I also want to know how best to structure the XML for this purpose.
Like this:
<PrayerTime>
<Day>1</Day>
<Month>5</Month>
<Fajr>07:00</Fajr>
<Sunrise>09:00</Sunrise>
</PrayerTime>
Or like this:
<PrayerTime
Day ="1"
Month="5"
Fajr="07:00"
Sunrise="09:00"
/>
yourTextBox.Text = filteredData.First().Fajr;
As for whether it's best to put information in an XML file as attributes or as elements, that's a recurring question with no definitive answer. In most cases, it's just a matter of taste.

How can I create an Expression within another Expression?

Forgive me if this has been asked already. I've only just started using LINQ. I have the following Expression:
public static Expression<Func<TblCustomer, CustomerSummary>> SelectToSummary()
{
return m => (new CustomerSummary()
{
ID = m.ID,
CustomerName = m.CustomerName,
LastSalesContact = // This is a Person entity, no idea how to create it
});
}
I want to be able to populate LastSalesContact, which is a Person entity.
The details that I wish to populate come from m.LatestPerson, so how can I map the fields from m.LatestPerson over to LastSalesContact? I want the mapping to be reusable, i.e. I do not want to do this:
LastSalesContact = new Person()
{
// Etc
}
Can I use a static Expression, such as this:
public static Expression<Func<TblUser, User>> SelectToUser()
{
return x => (new User()
{
// Populate
});
}
UPDATE:
This is what I need to do:
return m => (new CustomerSummary()
{
ID = m.ID,
CustomerName = m.CustomerName,
LastSalesContact = new Person()
{
PersonId = m.LatestPerson.PersonId,
PersonName = m.LatestPerson.PersonName,
Company = new Company()
{
CompanyId = m.LatestPerson.Company.CompanyId,
etc
}
}
});
But I will be re-using the Person() creation in about 10-15 different classes, so I don't want exactly the same code duplicated X amount of times. I'd probably also want to do the same for Company.
Can't you just use automapper for that?
public static Expression<Func<TblCustomer, CustomerSummary>> SelectToSummary()
{
return m => Mapper.Map<TblCustomer, CustomerSummary>(m);
}
You'd have to do some bootstrapping, but then it's very reusable.
UPDATE:
I may be missing something, but what is the purpose of this function? If you just want to map one Tbl object, or a collection of them, to other objects, why have the expression?
You could just have something like this:
var customers = _customerRepository.GetAll(); // returns IEnumerable<TblCustomer>
var summaries = Mapper.Map<IEnumerable<TblCustomer>, IEnumerable<CustomerSummary>>(customers);
Or is there something I missed?
I don't think you'll be able to use a lambda expression to do this... you'll need to build up the expression tree by hand using the factory methods in Expression. It's unlikely to be pleasant, to be honest.
My generally preferred way of working out how to build up expression trees is to start with a simple example of what you want to do written as a lambda expression, and then decompile it. That should show you how the expression tree is built - although the C# compiler gets to use the metadata associated with properties more easily than we can (we have to use Type.GetProperty).
This is always assuming I've understood you correctly... it's quite possible that I haven't.
How about this:
public static Person CreatePerson(TblPerson data)
{
// ...
}
public static Expression<Func<TblPerson, Person>> CreatePersonExpression()
{
return d => CreatePerson(d);
}
return m => (new CustomerSummary()
{
ID = m.ID,
CustomerName = m.CustomerName,
LastSalesContact = CreatePerson(m.LatestPerson)
});

Entity framework linq query Include() multiple children entities

This may be a really elementary question, but what's a nice way to include multiple child entities when writing a query that spans THREE levels (or more)?
i.e. I have 4 tables: Company, Employee, Employee_Car and Employee_Country
Company has a 1:m relationship with Employee.
Employee has a 1:m relationship with both Employee_Car and Employee_Country.
If I want to write a query that returns the data from all 4 tables, I am currently writing:
Company company = context.Companies
.Include("Employee.Employee_Car")
.Include("Employee.Employee_Country")
.FirstOrDefault(c => c.Id == companyID);
There has to be a more elegant way! This is long-winded and generates horrendous SQL.
I am using EF4 with VS 2010
Use extension methods.
Replace NameOfContext with the name of your object context.
public static class Extensions{
public static IQueryable<Company> CompleteCompanies(this NameOfContext context){
return context.Companies
.Include("Employee.Employee_Car")
.Include("Employee.Employee_Country") ;
}
public static Company CompanyById(this NameOfContext context, int companyID){
return context.Companies
.Include("Employee.Employee_Car")
.Include("Employee.Employee_Country")
.FirstOrDefault(c => c.Id == companyID) ;
}
}
Then your code becomes
Company company =
context.CompleteCompanies().FirstOrDefault(c => c.Id == companyID);
//or if you want even more
Company company =
context.CompanyById(companyID);
EF Core
For eager loading relationships more than one navigation away (e.g. grand child or grand parent relations), where the intermediate relation is a collection (i.e. 1 to many with the original 'subject'), EF Core has a new extension method, .ThenInclude(), and the syntax is slightly different to the older EF 4-6 syntax:
using Microsoft.EntityFrameworkCore;
...
var company = context.Companies
.Include(co => co.Employees)
.ThenInclude(emp => emp.Employee_Car)
.Include(co => co.Employees)
.ThenInclude(emp => emp.Employee_Country)
With some notes
As per above (Employees.Employee_Car and Employees.Employee_Country), if you need to include 2 or more child properties of an intermediate child collection, you'll need to repeat the .Include navigation for the collection for each child of the collection.
Personally, I would keep the extra 'indent' in the .ThenInclude to preserve your sanity.
For serialization of intermediaries which are 1:1 (or N:1) with the original subject, the dot syntax is also supported, e.g.
var company = context.Companies
.Include(co => co.City.Country);
This is functionally equivalent to:
var company = context.Companies
.Include(co => co.City)
.ThenInclude(ci => ci.Country);
However, in EFCore, the old EF4 / 6 syntax of using 'Select' to chain through an intermediary which is 1:N with the subject is not supported, i.e.
var company = context.Companies
.Include(co => co.Employee.Select(emp => emp.Address));
Will typically result in obscure errors like
Serialization and deserialization of 'System.IntPtr' instances are not supported
EF 4.1 to EF 6
There is a strongly typed .Include which allows the required depth of eager loading to be specified by providing Select expressions to the appropriate depth:
using System.Data.Entity; // NB!
var company = context.Companies
.Include(co => co.Employees.Select(emp => emp.Employee_Car))
.Include(co => co.Employees.Select(emp => emp.Employee_Country))
.FirstOrDefault(co => co.companyID == companyID);
The SQL generated is by no means intuitive, but seems performant enough. I've put a small example on GitHub here
You might find this article of interest which is available at codeplex.com.
Improving Entity Framework Query Performance Using Graph-Based Querying.
The article presents a new way of expressing queries that span multiple tables in the form of declarative graph shapes.
Moreover, the article contains a thorough performance comparison of this new approach with EF queries. This analysis shows that GBQ quickly outperforms EF queries.
How do you construct a LINQ to Entities query to load child objects directly, instead of calling a Reference property or Load()
There is no other way - except implementing lazy loading.
Or manual loading....
myobj = context.MyObjects.First();
myobj.ChildA.Load();
myobj.ChildB.Load();
...
This might help someone: four levels, with two children on each level:
Library.Include(a => a.Library.Select(b => b.Library.Select(c => c.Library)))
.Include(d => d.Book)
.Include(g => g.Library.Select(h => h.Book))
.Include(j => j.Library.Select(k => k.Library.Select(l => l.Book)))
To do this:
namespace Application.Test
{
using Utils.Extensions;
public class Test
{
public DbSet<User> Users { get; set; }
public DbSet<Room> Rooms { get; set; }
public DbSet<Post> Posts { get; set; }
public DbSet<Comment> Comments { get; set; }
public void Foo()
{
DB.Users.Include(x => x.Posts, x => x.Rooms, x => x.Members);
//OR
DB.Users.Include(x => x.Posts, x => x.Rooms, x => x.Members)
.ThenInclude(x => x.Posts, y => y.Owner, y => y.Comments);
}
}
}
this extension might be helpful:
namespace Utils.Extensions
{
using Microsoft.EntityFrameworkCore;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
public static partial class LinqExtension
{
public static IQueryable<TEntity> Include<TEntity>(
this IQueryable<TEntity> sources,
params Expression<Func<TEntity, object>>[] properties)
where TEntity : class
{
System.Text.RegularExpressions.Regex regex = new(@"^\w+[.]");
IQueryable<TEntity> _sources = sources;
foreach (var property in properties)
_sources = _sources.Include($"{regex.Replace(property.Body.ToString(), "")}");
return _sources;
}
public static IQueryable<TEntity> ThenInclude<TEntity, TProperty>(
this IQueryable<TEntity> sources,
Expression<Func<TEntity, IEnumerable<TProperty>>> predicate,
params Expression<Func<TProperty, object>>[] properties)
where TEntity : class
{
System.Text.RegularExpressions.Regex regex = new(@"^\w+[.]");
IQueryable<TEntity> _sources = sources;
foreach (var property in properties)
_sources = _sources.Include($"{regex.Replace(predicate.Body.ToString(), "")}.{regex.Replace(property.Body.ToString(), "")}");
return _sources;
}
}
}
