HtmlAgilityPack: Is there a way to clone an HtmlDocument?

I need to clone an HtmlDocument object, and I'd like to avoid serializing/deserializing.
HtmlNode has a clone (deep) capability, so I thought to clone the document node, but I have no way to recreate an HtmlDocument from a document node.
Any suggestions?

In the end I did this, as an extension method:
// Requires: using System.Reflection; using HtmlAgilityPack;
// Extension methods must be declared inside a static class.
public static HtmlDocument Clone(this HtmlDocument source)
{
    if (source == null)
    {
        return null;
    }
    HtmlDocument clone = new HtmlDocument();
    // Copy every public property that can be both read and written.
    PropertyInfo[] infos = typeof(HtmlDocument).GetProperties();
    foreach (PropertyInfo pinfo in infos)
    {
        if (pinfo.CanRead && pinfo.CanWrite)
        {
            pinfo.SetValue(clone, pinfo.GetValue(source));
        }
    }
    // Deep-copy the node tree into the new document.
    clone.DocumentNode.CopyFrom(source.DocumentNode, true);
    return clone;
}
It seems to do the trick.
Comments are welcome.

Related

Programmatically cloning Octopus Deploy Process steps and modifying the cloned steps

We are developing a Pipeline for which we have to add over 100 steps and modify two things for each step: Step Name and PackageID.
Rather than going through the pain of doing this via the UI, we’d like to do this programmatically.
Below is some C# I’ve sketched out for this (I’m a C# developer with extremely limited PowerShell skills, that’s why I did this in C#).
The lines above the comment “From here on is where I'm fuzzy” are working code, but the lines below the comment are just pseudocode.
Can someone explain to me how to write the lines below the comment (or, the PowerShell equivalent)?
I wasn’t able to find API calls for this.
Thanks
using System.Collections.Generic;
using Octopus.Client;
namespace ODClientExample
{
    class Program
    {
        static void Main(string[] args)
        {
            List<string> ListOfWindowsServices = new List<string>();
            ListOfWindowsServices.Add("svc1");
            ListOfWindowsServices.Add("svc2");
            ListOfWindowsServices.Add("svc3");
            var server = "https://mysite.whatever/";
            var apiKey = "API-xxxxxxxxxxxxxxxxxx"; // I generated this via the Octopus UI
            var endpoint = new OctopusServerEndpoint(server, apiKey);
            var repository = new OctopusRepository(endpoint);
            var project = repository.Projects.FindByName("Windows Services");
            // From here on is where I'm fuzzy:
            // (pseudocode: GetProcessSteps, GetProcessStepByName, StepName and PackageID are placeholders)
            var processSteps = GetProcessSteps(project);
            var processStepToClone = GetProcessStepByName(processSteps, "SomeProcessStep");
            foreach (string svcName in ListOfWindowsServices)
            {
                processStepToClone.StepName = svcName;
                processStepToClone.PackageID = svcName;
            }
        }
    }
}
I've made a little more progress. I'm now able to access the Steps in the Process, and add a Step. However, when my code calls repository.DeploymentProcesses.Modify, I get this exception:
Please provide a value for the package ID.
Please select the feed that this package will be downloaded from.
Please select one or more roles that 'svc1' step will apply to.
Here's my latest code:
static void Main(string[] args)
{
    List<string> ListOfFexWindowsServices = new List<string>();
    ListOfFexWindowsServices.Add("svc2");
    ListOfFexWindowsServices.Add("svc3");
    ListOfFexWindowsServices.Add("svc4");
    string server = "https://mysite.stuff/";
    string apiKey = "API-xxxxxxxxxxxxxxxxxxxxxxxx"; // I generated this via the Octopus UI
    OctopusServerEndpoint endpoint = new OctopusServerEndpoint(server, apiKey);
    OctopusRepository repository = new OctopusRepository(endpoint);
    ProjectResource projectResource = repository.Projects.FindByName("MyProject");
    DeploymentProcessResource deploymentProcess = repository.DeploymentProcesses.Get(projectResource.DeploymentProcessId);
    var projectSteps = deploymentProcess.Steps;
    DeploymentStepResource stepToClone = new DeploymentStepResource();
    foreach (DeploymentStepResource step in projectSteps)
    {
        if (step.Name == "svc1")
        {
            stepToClone = step;
            break;
        }
    }
    foreach (string serviceName in ListOfFexWindowsServices)
    {
        DeploymentStepResource newStep = new DeploymentStepResource();
        PopulateNewStep(newStep, stepToClone, serviceName);
        deploymentProcess.Steps.Add(newStep);
        repository.DeploymentProcesses.Modify(deploymentProcess);
    }
}
static void PopulateNewStep(DeploymentStepResource newStep, DeploymentStepResource stepToClone, string serviceName)
{
    newStep.Name = serviceName;
    newStep.Id = Guid.NewGuid().ToString();
    newStep.StartTrigger = stepToClone.StartTrigger;
    newStep.Condition = stepToClone.Condition;
    DeploymentActionResource action = new DeploymentActionResource
    {
        Name = newStep.Name,
        ActionType = "Octopus.TentaclePackage",
        Id = Guid.NewGuid().ToString(),
    };
    PopulateActionProperties(action);
    newStep.Actions.Add(action);
    // ISSUE: Anything else to do (eg, any other things from stepToClone to copy, or other stuff to create)?
    newStep.PackageRequirement = stepToClone.PackageRequirement;
}
static void PopulateActionProperties(DeploymentActionResource action)
{
    action.Properties.Add(new KeyValuePair<string, PropertyValueResource>("Octopus.Action.WindowsService.CustomAccountPassword", "#{WindowsService.Password}"));
    // TODO: Repeat this sort of thing for each Action Property you see in stepToClone.
}
The following sample, which uses the Octopus.Client library, copies a step together with its actions and properties from one project's deployment process to another:
void Main()
{
    var sourceProjectName = "<source project name>";
    var targetProjectName = "<target project name>";
    var stepToCopyName = "<step name to copy>";
    var repo = GetOctopusRepository();
    var sourceProject = repo.Projects.FindByName(sourceProjectName);
    var targetProject = repo.Projects.FindByName(targetProjectName);
    if (sourceProject != null && targetProject != null)
    {
        var sourceDeploymentProcess = repo.DeploymentProcesses.Get(sourceProject.DeploymentProcessId);
        var targetDeploymentProcess = repo.DeploymentProcesses.Get(targetProject.DeploymentProcessId);
        if (sourceDeploymentProcess != null && targetDeploymentProcess != null)
        {
            Console.WriteLine($"Start copy from project '{sourceProjectName}' to project '{targetProjectName}'");
            CopyStepToTarget(sourceDeploymentProcess, targetDeploymentProcess, stepToCopyName);
            // Update or add the target deployment process
            repo.DeploymentProcesses.Modify(targetDeploymentProcess);
            Console.WriteLine($"End copy from project '{sourceProjectName}' to project '{targetProjectName}'");
        }
    }
}
private OctopusRepository GetOctopusRepository()
{
    var octopusServer = Environment.GetEnvironmentVariable("OCTOPUS_CLI_SERVER");
    var octopusApiKey = Environment.GetEnvironmentVariable("OCTOPUS_CLI_API_KEY");
    var endPoint = new OctopusServerEndpoint(octopusServer, octopusApiKey);
    return new OctopusRepository(endPoint);
}
private void CopyStepToTarget(DeploymentProcessResource sourceProcess, DeploymentProcessResource targetProcess, string sourceStepName, bool includeChannels = false, bool includeEnvironments = false)
{
    var sourceStep = sourceProcess.FindStep(sourceStepName);
    if (sourceStep == null)
    {
        Console.WriteLine($"{sourceStepName} not found in {sourceProcess.ProjectId}");
        return;
    }
    Console.WriteLine($"-> copy step '{sourceStep.Name}'");
    var stepToAdd = targetProcess.AddOrUpdateStep(sourceStep.Name);
    stepToAdd.RequirePackagesToBeAcquired(sourceStep.RequiresPackagesToBeAcquired);
    stepToAdd.WithCondition(sourceStep.Condition);
    stepToAdd.WithStartTrigger(sourceStep.StartTrigger);
    foreach (var property in sourceStep.Properties)
    {
        if (stepToAdd.Properties.ContainsKey(property.Key))
        {
            stepToAdd.Properties[property.Key] = property.Value;
        }
        else
        {
            stepToAdd.Properties.Add(property.Key, property.Value);
        }
    }
    foreach (var sourceAction in sourceStep.Actions)
    {
        Console.WriteLine($"-> copy action '{sourceAction.Name}'");
        var targetAction = stepToAdd.AddOrUpdateAction(sourceAction.Name);
        targetAction.ActionType = sourceAction.ActionType;
        targetAction.IsDisabled = sourceAction.IsDisabled;
        if (includeChannels)
        {
            foreach (var sourceChannel in sourceAction.Channels)
            {
                targetAction.Channels.Add(sourceChannel);
            }
        }
        if (includeEnvironments)
        {
            foreach (var sourceEnvironment in sourceAction.Environments)
            {
                targetAction.Environments.Add(sourceEnvironment);
            }
        }
        foreach (var actionProperty in sourceAction.Properties)
        {
            if (targetAction.Properties.ContainsKey(actionProperty.Key))
            {
                targetAction.Properties[actionProperty.Key] = actionProperty.Value;
            }
            else
            {
                targetAction.Properties.Add(actionProperty.Key, actionProperty.Value);
            }
        }
    }
}
The above code sample is available in the Octopus Client API Samples.

NPOI - Determine heading before a paragraph

I'm attempting to write a parser to extract details from a Word document using NPOI. I'm able to retrieve details from each table in the document, but I need to identify which section of the document each table comes from in order to differentiate between them. While I can identify all of the lines that have the specific heading type I need, I can't work out how to tell which heading precedes which table.
Can anybody offer any advice? If it's not possible with NPOI, can anybody recommend another way to do it?
If you are parsing a Word document, I suggest using Open XML PowerTools by Eric White. You can install it through the NuGet package manager or download it directly.
Here is the code snippet I used to parse a document. It is small and has been stable for me. Step through it in a debugger first to understand how it works, which will help you customize it. It reads all the text, paragraphs, bullets and so on. See Eric White's documentation for more detail, but the snippet below covers most of what you need for parsing, and you can build your own functionality on top of it.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;
private static WordprocessingDocument _wordDocument;
// In the constructor, open the document from the wordFileStream stream:
_wordDocument = WordprocessingDocument.Open(wordFileStream, false);
// To get header and footer use this
var headerList = _wordDocument.MainDocumentPart.HeaderParts.ToList();
var footerList = _wordDocument.MainDocumentPart.FooterParts.ToList();
private void GetDocumentBodyContents()
{
    List<string> allList = new List<string>();
    List<string> allListText = new List<string>();
    try
    {
        //RevisionAccepter.AcceptRevisions(_wordDocument);
        XElement root = _wordDocument.MainDocumentPart.GetXDocument().Root;
        XElement body = root.LogicalChildrenContent().First();
        OutputBlockLevelContent(_wordDocument, body);
    }
    catch (Exception ex)
    {
        // Errors are swallowed here; log or rethrow in real code.
    }
}
private void OutputBlockLevelContent(WordprocessingDocument wordDoc, XElement blockLevelContentContainer)
{
    try
    {
        string currentItem = string.Empty, currentItemText = string.Empty, numberText = string.Empty;
        foreach (XElement blockLevelContentElement in
            blockLevelContentContainer.LogicalChildrenContent())
        {
            if (blockLevelContentElement.Name == W.p)
            {
                // List numbering or bullet text for this paragraph, if any.
                currentItem = ListItemRetriever.RetrieveListItem(wordDoc, blockLevelContentElement);
                //currentItemText = blockLevelContentElement
                //    .LogicalChildrenContent(W.r)
                //    .LogicalChildrenContent(W.t)
                //    .Select(t => (string)t)
                //    .StringConcatenate();
                currentItemText = blockLevelContentElement
                    .LogicalChildrenContent(W.r)
                    .Select(t =>
                    {
                        if (t.LogicalChildrenContent(W.br).Count() > 0)
                        {
                            // Add an explicit line break, because it is lost when the run is cast to string.
                            t.SetElementValue(W.br, "<br />");
                        }
                        return (string)t;
                    }
                    ).StringConcatenate();
                continue;
            }
            // If element is not a paragraph, it must be a table.
            foreach (var row in blockLevelContentElement.LogicalChildrenContent())
            {
                foreach (var cell in row.LogicalChildrenContent())
                {
                    // Cells are a block-level content container, so can call this method recursively.
                    OutputBlockLevelContent(wordDoc, cell);
                }
            }
        }
    }
    catch (Exception ex)
    {
        // Errors are swallowed here; log or rethrow in real code.
    }
}
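Neither the question nor the snippet above shows how to tell which heading precedes a table. One common approach is to walk the body elements in document order and remember the last heading seen. The sketch below is library-neutral and only illustrates that bookkeeping: BodyBlock and HeadingTracker are hypothetical names, and you would fill the list from whatever your library gives you in document order (the loop in OutputBlockLevelContent visits paragraphs and tables in order; in NPOI I believe XWPFDocument.BodyElements does the same, but treat that as an assumption to verify). Heading style names also vary between documents, so the prefix is a parameter.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, library-neutral view of one body element.
public class BodyBlock
{
    public bool IsTable;
    public string Style; // paragraph style name; null for tables
    public string Text;  // paragraph text, or a label identifying the table
}

public static class HeadingTracker
{
    // Walks the blocks in document order and pairs each table (Value) with the
    // text of the most recent heading paragraph (Key). The key is null for
    // tables that appear before any heading.
    public static List<KeyValuePair<string, string>> PairTablesWithHeadings(
        IEnumerable<BodyBlock> blocks, string headingStylePrefix)
    {
        var result = new List<KeyValuePair<string, string>>();
        string currentHeading = null;
        foreach (BodyBlock block in blocks)
        {
            if (block.IsTable)
            {
                result.Add(new KeyValuePair<string, string>(currentHeading, block.Text));
            }
            else if (block.Style != null &&
                     block.Style.StartsWith(headingStylePrefix, StringComparison.OrdinalIgnoreCase))
            {
                currentHeading = block.Text;
            }
        }
        return result;
    }
}
```

Ordinary paragraphs are ignored, so a table is attributed to the nearest heading above it no matter how much body text sits in between.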

HTML Agility Pack throws "An unhandled exception of type 'System.ArgumentNullException' occurred in mscorlib.dll"

I want to parse the HTML content of a website. I use the following code:
private async void loadButton_Click(object sender, RoutedEventArgs e)
{
    string address = "http://www.iiees.ac.ir/fa/eqcatalog";
    loadButton.IsEnabled = false;
    //string htmlCode = await DownloadStringAsync(address, Encoding.UTF8);
    string htmlCode = await GetDataAsync(address, Encoding.UTF8);
    paragraph1.Inlines.Add(new Run(htmlCode));
    loadButton.IsEnabled = true;
}
private async Task<string> GetDataAsync(string address, Encoding encoding)
{
    string htmlCode = await DownloadStringAsync(address, encoding);
    List<string> options = new List<string>();
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(htmlCode);
    foreach (var table in doc.DocumentNode.SelectNodes("//table"))
    {
        if (table.Id == "CatalogForm")
        {
            foreach (HtmlNode row in table.SelectNodes("tr"))
            {
                if (row.Attributes.Any(a => a.Value == "hidecircle"))
                {
                    foreach (HtmlNode td in row.SelectNodes("TD"))
                    {
                        foreach (HtmlNode option in td.SelectNodes("option"))
                        {
                            options.Add(option.InnerText);
                        }
                    }
                }
            }
        }
    }
    var sb = new StringBuilder();
    options.ForEach(str => sb.AppendLine(str));
    return sb.ToString();
}
But it throws the exception "An unhandled exception of type 'System.ArgumentNullException' occurred in mscorlib.dll".
For more information, I uploaded my project.
I created a sample from your code and found the following:
I am not particularly familiar with XPath, but it seems to be case sensitive: searching for "TD" returns nothing, while searching for "td" does.
SelectNodes returns null whenever nothing is found, and a foreach over that null result throws an exception. So it is better to check that the result of SelectNodes is not null before looping over it.
foreach (HtmlNode td in row.SelectNodes("td"))
{
    var ops = td.SelectNodes("option");
    if (ops != null)
    {
        foreach (HtmlNode option in ops)
        {
            options.Add(option.InnerText);
        }
    }
}
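The same null check is needed for every SelectNodes call in the original code, including the ones for "//table", "tr" and "td". One way to avoid repeating it is a small extension method that turns a null sequence into an empty one. This is only a sketch: EnumerableNullExtensions and OrEmpty are hypothetical names, not part of HtmlAgilityPack, and the demo uses a plain list in place of a SelectNodes result so that it runs without the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableNullExtensions
{
    // Returns the sequence itself, or an empty sequence when it is null,
    // so a foreach over the result never has to deal with null.
    public static IEnumerable<T> OrEmpty<T>(this IEnumerable<T> source)
    {
        return source ?? Enumerable.Empty<T>();
    }
}

public static class Demo
{
    public static void Main()
    {
        // Stands in for a SelectNodes call that found nothing.
        List<string> missing = null;
        int visited = 0;
        foreach (string item in missing.OrEmpty())
        {
            visited++;
        }
        Console.WriteLine(visited); // prints 0; no exception is thrown
    }
}
```

With this in place the inner loops could be written as foreach (HtmlNode td in row.SelectNodes("td").OrEmpty()), assuming the collection returned by SelectNodes can be enumerated as HtmlNode items.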

Agnostic Screen scraper using HtmlAgilityPack

Let's say I want a screen scraper that doesn't care whether you pass it the URL of an HTML page, an XML document, or a text file.
Examples:
http://tonto.eia.doe.gov/oog/info/wohdp/dslpriwk.txt
http://google.com
This will work if the page is HTML or a text file:
public class ScreenScrapingService : IScreenScrapingService
{
    public XDocument Scrape(string url)
    {
        var scraper = new HtmlWeb();
        var stringWriter = new StringWriter();
        var xml = new XmlTextWriter(stringWriter);
        scraper.LoadHtmlAsXml(url, xml);
        var text = stringWriter.ToString();
        return XDocument.Parse(text);
    }
}
However, if it is an XML file such as:
http://www.eia.gov/petroleum/gasdiesel/includes/gas_diesel_rss.xml
[Test]
public void Scrape_ShouldScrapeSomething()
{
    //arrange
    var sut = new ScreenScrapingService();
    //act
    var result = sut.Scrape("http://www.eia.gov/petroleum/gasdiesel/includes/gas_diesel_rss.xml");
    //assert
}
Then I get the error:
An exception of type 'System.Xml.XmlException' occurred in System.Xml.dll but was not handled in user code
Is it possible to write this so that it doesn't care what the URL ultimately is?
To see the exact exception in Visual Studio, press Ctrl+Alt+E and enable Common Language Runtime Exceptions. It seems that LoadHtmlAsXml expects HTML, so your best bet is probably to download the content with WebClient.DownloadString(url) and load it into an HtmlDocument with the OptionOutputAsXml property set to true, as follows. If parsing that output fails, catch the exception and parse the downloaded string as XML directly:
// Requires: using System.IO; using System.Net; using System.Xml.Linq; using HtmlAgilityPack;
public XDocument Scrape(string url)
{
    var wc = new WebClient();
    var htmlorxml = wc.DownloadString(url);
    var doc = new HtmlDocument() { OptionOutputAsXml = true };
    // Load the downloaded text; without this the document stays empty.
    doc.LoadHtml(htmlorxml);
    var stringWriter = new StringWriter();
    doc.Save(stringWriter);
    try
    {
        return XDocument.Parse(stringWriter.ToString());
    }
    catch
    {
        // It only gets here when the string is XML already.
        try
        {
            return XDocument.Parse(htmlorxml);
        }
        catch
        {
            return null;
        }
    }
}
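A variation on the fallback above is to test first whether the download is already well-formed XML, and only hand it to HtmlAgilityPack when it is not. The sketch below shows just that test, using only System.Xml.Linq. XmlSniffer and TryParseXml are hypothetical names, and catching only XmlException (rather than everything) is my assumption about what you want to treat as "not XML".

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class XmlSniffer
{
    // Returns the parsed document when text is well-formed XML, otherwise null.
    // Only XmlException is caught, so other failures (for example a null
    // argument) still surface to the caller.
    public static XDocument TryParseXml(string text)
    {
        try
        {
            return XDocument.Parse(text);
        }
        catch (XmlException)
        {
            return null;
        }
    }
}
```

In Scrape you would call XmlSniffer.TryParseXml(htmlorxml) right after the download and return its result when it is not null, falling through to the HtmlDocument path otherwise. Note that a well-formed XHTML page would also be returned as-is by this check.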

How to convert this EF Cloning Code to use DBContext instead of EntityObject?

I need to clone Master and Child Entities. I have come across this solution on CodeProject which seems to do the job, see: http://www.codeproject.com/Tips/474296/Clone-an-Entity-in-Entity-Framework-4.
However I am using EF5 and DBContext whereas this code is using EF4 and EntityObject, so I am wondering what changes I need to make to it?
The code is:
public static T CopyEntity<T>(MyContext ctx, T entity, bool copyKeys = false) where T : EntityObject
{
    T clone = ctx.CreateObject<T>();
    PropertyInfo[] pis = entity.GetType().GetProperties();
    foreach (PropertyInfo pi in pis)
    {
        EdmScalarPropertyAttribute[] attrs = (EdmScalarPropertyAttribute[])
            pi.GetCustomAttributes(typeof(EdmScalarPropertyAttribute), false);
        foreach (EdmScalarPropertyAttribute attr in attrs)
        {
            if (!copyKeys && attr.EntityKeyProperty)
                continue;
            pi.SetValue(clone, pi.GetValue(entity, null), null);
        }
    }
    return clone;
}
The calling code is:
Customer newCustomer = CopyEntity(myObjectContext, myCustomer, false);
foreach (Order order in myCustomer.Orders)
{
    Order newOrder = CopyEntity(myObjectContext, order, true);
    newCustomer.Orders.Add(newOrder);
}
I am posting here because the comments on that article look inactive, and I am sure this is a question that any EF pro could answer.
Many thanks in advance.
If you want a clone of an entity using the EF5 DbContext, the simplest way is this:
// clone of the current entity values
Object currentValClone = context.Entry(entityToClone)
    .CurrentValues.ToObject();
// clone of the original entity values
Object originalValueClone = context.Entry(entityToClone)
    .OriginalValues.ToObject();
// clone of the current entity database values (results in a db hit)
Object dbValueClone = context.Entry(entityToClone)
    .GetDatabaseValues().ToObject();
Your code will work only if the properties of your entity carry EdmScalarPropertyAttribute.
Alternatively, you can use MetadataWorkspace to find the key properties of the entity:
public static class EntityExtensions
{
    public static TEntity CopyEntity<TEntity>(DbContext context, TEntity entity, bool copyKeys = false)
        where TEntity : class
    {
        ObjectContext ctx = ((IObjectContextAdapter)context).ObjectContext;
        TEntity clone = null;
        if (ctx != null)
        {
            context.Configuration.AutoDetectChangesEnabled = false;
            try
            {
                clone = ctx.CreateObject<TEntity>();
                var objectEntityType = ctx.MetadataWorkspace.GetItems(DataSpace.OSpace)
                    .Where(x => x.BuiltInTypeKind == BuiltInTypeKind.EntityType)
                    .OfType<EntityType>()
                    .Where(x => x.Name == clone.GetType().Name)
                    .Single();
                var pis = entity.GetType().GetProperties().Where(t => t.CanWrite);
                foreach (PropertyInfo pi in pis)
                {
                    var key = objectEntityType.KeyProperties.Where(t => t.Name == pi.Name).FirstOrDefault();
                    if (key != null && !copyKeys)
                        continue;
                    pi.SetValue(clone, pi.GetValue(entity, null), null);
                }
            }
            finally
            {
                context.Configuration.AutoDetectChangesEnabled = true;
            }
        }
        return clone;
    }
}
