How to terminate a while loop when an Xpath query returns a null reference html agility pack - xpath

I'm trying to loop through every row of a variable length table on the a webpage (http://www.oddschecker.com/golf/the-masters/winner) and extract some data
The problem is I can't seem to catch the null reference and terminate the loop without it throwing an exception!
int i = 1;
bool test = string.IsNullOrEmpty(doc.DocumentNode.SelectNodes(String.Format("//*[#id='t1']/tr[{0}]/td[3]/a[2]", i))[0].InnerText);
while (test != true)
{
string name = doc.DocumentNode.SelectNodes(String.Format("//*[#id='t1']/tr[{0}]/td[3]/a[2]", i))[0].InnerText;
//extract data
i++;
}
try-catch statements don't catch it either:
bool test = false;
try
{
string golfersName = doc.DocumentNode.SelectNodes(String.Format("//*[#id='t1']/tr[{0}]/td[3]/a[2]", i))[0].InnerText;
}
catch
{
test = true;
}
while (test != true)
{
...

The code logic is a bit off. With the original code, if test evaluated true the loop will never terminates. It seems that you want to do checking in every loop iteration instead of only once at the beginning.
Anyway, there is a better way around. You can select all relevant nodes without specifying each <tr> indices, and use foreach to loop through the node set :
var nodes = doc.DocumentNode.SelectNodes("//*[#id='t1']/tr/td[3]/a[2]");
foreach(HtmlNode node in nodes)
{
string name = node.InnerText;
//extract data
}
or using for loop instead of foreach, if index of each node is necessary for the "extract data" process :
for(i=1; i<=nodes.Count; i++)
{
//array index starts from 0, unlike XPath element index
string name = nodes[i-1].InnerText;
//extract data
}
Side note : To query single element you can use SelectSingleNode("...") instead of SelectNodes("...")[0]. Both methods return null if no nodes match XPath criteria, so you can do checking against the original value returned instead of against InnerText property to avoid exception :
var node = doc.DocumentNode.SelectSingleNode("...");
if(node != null)
{
//do something
}

Related

Algorithm / data structure for resolving nested interpolated values in this example?

I am working on a compiler and one aspect currently is how to wait for interpolated variable names to be resolved. So I am wondering how to take a nested interpolated variable string and build some sort of simple data model/schema for unwrapping the evaluated string so to speak. Let me demonstrate.
Say we have a string like this:
foo{a{x}-{y}}-{baz{one}-{two}}-foo{c}
That has 1, 2, and 3 levels of nested interpolations in it. So essentially it should resolve something like this:
wait for x, y, one, two, and c to resolve.
when both x and y resolve, then resolve a{x}-{y} immediately.
when both one and two resolve, resolve baz{one}-{two}.
when a{x}-{y}, baz{one}-{two}, and c all resolve, then finally resolve the whole expression.
I am shaky on my understanding of the logic flow for handling something like this, wondering if you could help solidify/clarify the general algorithm (high level pseudocode or something like that). Mainly just looking for how I would structure the data model and algorithm so as to progressively evaluate when the pieces are ready.
I'm starting out trying and it's not clear what to do next:
{
dependencies: [
{
path: [x]
},
{
path: [y]
}
],
parent: {
dependency: a{x}-{y} // interpolated term
parent: {
dependencies: [
{
}
]
}
}
}
Some sort of tree is probably necessary, but I am having trouble figuring out what it might look like, wondering if you could shed some light on that with some pseudocode (or JavaScript even).
watch the leaf nodes at first
then, when the children of a node are completed, propagate upward to resolving the next parent node. This would mean once x and y are done, it could resolve a{x}-{y}, but then wait until the other nodes are ready before doing the final top-level evaluation.
You can just simulate it by sending "events" to the system theoretically, like:
ready('y')
ready('c')
ready('x')
ready('a{x}-{y}')
function ready(variable) {
if ()
}
...actually that may not work, not sure how to handle the interpolated nodes in a hacky way like that. But even a high level description of how to solve this would be helpful.
export type SiteDependencyObserverParentType = {
observer: SiteDependencyObserverType
remaining: number
}
export type SiteDependencyObserverType = {
children: Array<SiteDependencyObserverType>
node: LinkNodeType
parent?: SiteDependencyObserverParentType
path: Array<string>
}
(What I'm currently thinking, some TypeScript)
Here is an approach in JavaScript:
Parse the input string to create a Node instance for each {} term, and create parent-child dependencies between the nodes.
Collect the leaf Nodes of this tree as the tree is being constructed: group these leaf nodes by their identifier. Note that the same identifier could occur multiple times in the input string, leading to multiple Nodes. If a variable x is resolved, then all Nodes with that name (the group) will be resolved.
Each node has a resolve method to set its final value
Each node has a notify method that any of its child nodes can call in order to notify it that the child has been resolved with a value. This may (or may not yet) lead to a cascading call of resolve.
In a demo, a timer is set up that at every tick will resolve a randomly picked variable to some number
I think that in your example, foo, and a might be functions that need to be called, but I didn't elaborate on that, and just considered them as literal text that does not need further treatment. It should not be difficult to extend the algorithm with such function-calling features.
class Node {
constructor(parent) {
this.source = ""; // The slice of the input string that maps to this node
this.texts = []; // Literal text that's not part of interpolation
this.children = []; // Node instances corresponding to interpolation
this.parent = parent; // Link to parent that should get notified when this node resolves
this.value = undefined; // Not yet resolved
}
isResolved() {
return this.value !== undefined;
}
resolve(value) {
if (this.isResolved()) return; // A node is not allowed to resolve twice: ignore
console.log(`Resolving "${this.source}" to "${value}"`);
this.value = value;
if (this.parent) this.parent.notify();
}
notify() {
// Check if all dependencies have been resolved
let value = "";
for (let i = 0; i < this.children.length; i++) {
const child = this.children[i];
if (!child.isResolved()) { // Not ready yet
console.log(`"${this.source}" is getting notified, but not all dependecies are ready yet`);
return;
}
value += this.texts[i] + child.value;
}
console.log(`"${this.source}" is getting notified, and all dependecies are ready:`);
this.resolve(value + this.texts.at(-1));
}
}
function makeTree(s) {
const leaves = {}; // nodes keyed by atomic names (like "x" "y" in the example)
const tokens = s.split(/([{}])/);
let i = 0; // Index in s
function dfs(parent=null) {
const node = new Node(parent);
const start = i;
while (tokens.length) {
const token = tokens.shift();
i += token.length;
if (token == "}") break;
if (token == "{") {
node.children.push(dfs(node));
} else {
node.texts.push(token);
}
}
node.source = s.slice(start, i - (tokens.length ? 1 : 0));
if (node.children.length == 0) { // It's a leaf
const label = node.texts[0];
leaves[label] ??= []; // Define as empty array if not yet defined
leaves[label].push(node);
}
return node;
}
dfs();
return leaves;
}
// ------------------- DEMO --------------------
let s = "foo{a{x}-{y}}-{baz{one}-{two}}-foo{c}";
const leaves = makeTree(s);
// Create a random order in which to resolve the atomic variables:
function shuffle(array) {
for (var i = array.length - 1; i > 0; i--) {
var j = Math.floor(Math.random() * (i + 1));
[array[j], array[i]] = [array[i], array[j]];
}
return array;
}
const names = shuffle(Object.keys(leaves));
// Use a timer to resolve the variables one by one in the given random order
let index = 0;
function resolveRandomVariable() {
if (index >= names.length) return; // all done
console.log("\n---------------- timer tick --------------");
const name = names[index++];
console.log(`Variable ${name} gets a value: "${index}". Calling resolve() on the connected node instance(s):`);
for (const node of leaves[name]) node.resolve(index);
setTimeout(resolveRandomVariable, 1000);
}
setTimeout(resolveRandomVariable, 1000);
your idea of building a dependency tree it's really likeable.
Anyway I tryed to find a solution as simplest possible.
Even if it already works, there are many optimizations possible, take this just as proof of concept.
The background idea it's produce a List of Strings which you can read in order where each element it's what you need to solve progressively. Each element might be mandatory to solve something that come next in the List, hence for the overall expression. Once you solved all the chunks you have all pieces to solve your original expression.
It's written in Java, I hope it's understandable.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
public class StackOverflow {
public static void main(String[] args) {
String exp = "foo{a{x}-{y}}-{baz{one}-{two}}-foo{c}";
List<String> chunks = expToChunks(exp);
//it just reverse the order of the list
Collections.reverse(chunks);
System.out.println(chunks);
//output -> [c, two, one, baz{one}-{two}, y, x, a{x}-{y}]
}
public static List<String> expToChunks(String exp) {
List<String> chunks = new ArrayList<>();
//this first piece just find the first inner open parenthesys and its relative close parenthesys
int begin = exp.indexOf("{") + 1;
int numberOfParenthesys = 1;
int end = -1;
for(int i = begin; i < exp.length(); i++) {
char c = exp.charAt(i);
if (c == '{') numberOfParenthesys ++;
if (c == '}') numberOfParenthesys --;
if (numberOfParenthesys == 0) {
end = i;
break;
}
}
//this if put an end to recursive calls
if(begin > 0 && begin < exp.length() && end > 0) {
//add the chunk to the final list
String substring = exp.substring(begin, end);
chunks.add(substring);
//remove from the starting expression the already considered chunk
String newExp = exp.replace("{" + substring + "}", "");
//recursive call for inner element on the chunk found
chunks.addAll(Objects.requireNonNull(expToChunks(substring)));
//calculate other chunks on the remained expression
chunks.addAll(Objects.requireNonNull(expToChunks(newExp)));
}
return chunks;
}
}
Some details on the code:
The following piece find the begin and the end index of the first outer chunk of expression. The background idea is: in a valid expression the number of open parenthesys must be equal to the number of closing parenthesys. The count of open(+1) and close(-1) parenthesys can't ever be negative.
So using that simple loop once I find the count of parenthesys to be 0, I also found the first chunk of the expression.
int begin = exp.indexOf("{") + 1;
int numberOfParenthesys = 1;
int end = -1;
for(int i = begin; i < exp.length(); i++) {
char c = exp.charAt(i);
if (c == '{') numberOfParenthesys ++;
if (c == '}') numberOfParenthesys --;
if (numberOfParenthesys == 0) {
end = i;
break;
}
}
The if condition provide validation on the begin and end indexes and stop the recursive call when no more chunks can be found on the remained expression.
if(begin > 0 && begin < exp.length() && end > 0) {
...
}

Null Reference Exception despite checking for Null

I'm reading from an XML file with C# that has a dynamically generated number of nodes named Col1, Col2, etc. When I attempt to run a while loop on these nodes and check for null I still get a NullReferenceException. Can anyone suggest on how to handle this to avoid the exception?
int col = 1;
string colCount = col.ToString();
colCount = "Col" + colCount;
while (nodes[0][colCount].InnerText != null)
{
timeToFillValues.Add(double.Parse(nodes[0][colCount].InnerText));
col++;
colCount = "Col" + col.ToString();
}
Since you mentioned in comments the error happens when checking condition, there are only few cases where you should be getting an error:
if(nodes == null)
{
// was nodes not populated? Then populate it
}
if(nodes[0][colCount] == null)
{
// does element actually exist or is it null?
}
Check for these conditions before calling the while loop and you should find out wha tis causing the issue.

data structure for sorted browsing history

Suppose i want to implement the browser history functionality. If i visit the the url for this first time it goes into the history , if i visit the same page again it comes up in the history list.
lets say that i only display the top 20 sites, but i can choose to see history say for the last month , last week and so on .
what is the best approach for this ? i would use hash map for inserting / checking if it is visited earlier , but how do i sort efficiently for recently visited, i don't want to use tree map or tree set . also, how can i store history of weeks and months. Is it written on disk when browser closes ? and when i click clear history , how is the data structure deleted ?
This is in Java-ish code.
You'll need two data structures: a hash map and a doubly linked list. The doubly linked list contains History objects (which contain a url string and a timestamp) in order sorted by timestamp; the hash map is a Map<String, History>, with urls as the key.
class History {
History prev
History next
String url
Long timestamp
void remove() {
prev.next = next
next.prev = prev
next = null
prev = null
}
}
When you add a url to the history, check to see if it's in the hash map; if it is then update its timestamp, remove it from the linked list, and add it to the end of the linked list. If it's not in the hash map then add it to the hash map and also add it to the end of the linked list. Adding a url (whether or not it's already in the hash map) is a constant time operation.
class Main {
History first // first element of the linked list
History last // last element of the linked list
HashMap<String, History> map
void add(String url) {
History hist = map.get(url)
if(hist != null) {
hist.remove()
hist.timestamp = System.currenttimemillis()
} else {
hist = new History(url, System.currenttimemillis())
map.add(url, hist)
}
last.next = hist
hist.prev = last
last = hist
}
}
To get the history from e.g. the last week, traverse the linked list backwards until you hit the correct timestamp.
If thread-safety is a concern, then use a thread-safe queue for urls to be added to the history, and use a single thread to process this queue; this way your map and linked list don't need to be thread-safe i.e. you don't need to worry about locks etc.
For persistence you can serialize / deserialize the linked list; when you deserialize the linked list, reconstruct the hash map by traversing it and adding its elements to the map. Then to clear the history you'd null the list and map in memory and delete the file you serialized the data to.
A more efficient solution in terms of memory consumption and IO (i.e. (de)serialization cost) is to use a serverless database like SQLite; this way you won't need to keep the history in memory, and if you want to get the history from e.g. the last week you'd just need to query the database rather than traversing the linked list. However, SQLite is essentially a treemap (specifically a B-Tree, which is optimized for data stored on disk).
Here is a Swift 4.0 implementation based on Zim-Zam O'Pootertoot's answer, including an iterator for traversing the history:
import Foundation
class SearchHistory: Sequence {
var first: SearchHistoryItem
var last: SearchHistoryItem
var map = [String: SearchHistoryItem]()
var count = 0
var limit: Int
init(limit: Int) {
first = SearchHistoryItem(name: "")
last = first
self.limit = Swift.max(limit, 2)
}
func add(name: String) {
var item: SearchHistoryItem! = map[name]
if item != nil {
if item.name == last.name {
last = last.prev!
}
item.remove()
item.timestamp = Date()
} else {
item = SearchHistoryItem(name: name)
count += 1
map[name] = item
if count > limit {
first.next!.remove()
count -= 1
}
}
last.next = item
item.prev = last
last = item
}
func makeIterator() -> SearchHistory.SearchHistoryIterator {
return SearchHistoryIterator(item: last)
}
struct SearchHistoryIterator: IteratorProtocol {
var currentItem: SearchHistoryItem
init(item: SearchHistoryItem) {
currentItem = item
}
mutating func next() -> SearchHistoryItem? {
var item: SearchHistoryItem? = nil
if let prev = currentItem.prev {
item = currentItem
currentItem = prev
}
return item
}
}
}
class SearchHistoryItem {
var prev: SearchHistoryItem?
var next: SearchHistoryItem?
var name: String
var timestamp: Date
init(name: String) {
self.name = name
timestamp = Date()
}
func remove() {
prev?.next = next
next?.prev = prev
next = nil
prev = nil
}
}

Linq to Objects - query objects for any non-numeric data

I am trying to write some logic to determine if all values of a certain property of an object in a collection are numeric and greater than zero. I can easily write this using ForEach but I'd like to do it using Linq to Object. I tried this:
var result = entity.Reports.Any(
x =>
x.QuestionBlock == _question.QuestionBlock
&& (!string.IsNullOrEmpty(x.Data)) && Int32.TryParse(x.Data, out tempVal)
&& Int32.Parse(x.Data) > 0);
It does not work correctly. I also tried this, hoping that the TryParse() on Int32 will return false the first time it encounter a string that cannot be parsed into an int. But it appears the out param will contain the first value string value that can be parsed into an int.
var result = entity.GranteeReportDataModels.Any(
x =>
x.QuestionBlock == _question.QuestionBlock
&& (!string.IsNullOrEmpty(x.Data)) && Int32.TryParse(x.Data, out tempVal));
Any help is greatly appreciated!
If you want to test if "all" values meet a condition, you should use the All extension method off IEnumerable<T>, not Any. I would write it like this:
var result = entity.Reports.All(x =>
{
int result = 0;
return int.TryParse(x.Data, out result) && result > 0;
});
I don't believe you need to test for an null or empty string, because int.TryPrase will return false if you pass in a null or empty string.
var allDataIsNatural = entity.Reports.All(r =>
{
int i;
if (!int.TryParse(r.Data, out i))
{
return false;
}
return i > 0;
});
Any will return when the first row is true but, you clearly say you would like to check them all.
You can use this extension which tries to parse a string to int and returns a int?:
public static int? TryGetInt(this string item)
{
int i;
bool success = int.TryParse(item, out i);
return success ? (int?)i : (int?)null;
}
Then this query works:
bool all = entity.Reports.All(x => {
if(x.QuestionBlock != _question.QuestionBlockint)
return false;
int? data = x.Data.TryGetInt();
return data.HasValue && data.Value > 0;
});
or more readable (a little bit less efficient):
bool all = entityReports
.All(x => x.Data.TryGetInt().HasValue && x.Data.TryGetInt() > 0
&& x.QuestionBlock == _question.QuestionBlockint);
This approach avoids using a local variable as out parameter which is an undocumented behaviour in Linq-To-Objects and might stop working in future. It's also more readable.

why doesn't it terminate?

var m_root : Node = root
private def insert(key: Int, value: Int): Node = {
if(m_root == null) {
m_root = Node(key, value, null, null)
}
var t : Node = m_root
var flag : Int = 1
while (t != null && flag == 1) {
if(key == t.key) {
t
}
else if(key < t.key) {
if(t.left == null) {
t.left = Node(key, value, null, null)
flag = 0
} else {
t = t.left
}
} else {
if(t.right == null) {
t.right = Node(key, value, null, null)
flag = 0
} else {
t = t.right
}
}
}
t
}
I wrote iterative version insert a node to binary search tree. I want to terminate when node is created, but it doesn't stop, because I think I didn't assign terminating condition. How to I edit my code to terminate when a node inserted in?
I'm not sure exactly what behaviour you want, but the cause is quite clear.
Your loop is a while condition, which will loop until t is null. So while t is non-null the loop will continue.
You only ever assign t to non-null values - in fact you're specifically checking for the null case and stopping it happening by creating a new node.
So either you need to reconsider your loop condition, or ensure t does in fact become null in some cases, depending on what your actual algorithm requirements are.
And since you're returning t at the bottom, I suggest the while condition is wrong; the only possible way this could terminate is if t is null at this point, so it would be pointless to return this anyway.
The first clause of your "if" statement in the loop
if(key == t.key) {
t
}
... does nothing if the comparison is true. It doesn't terminate the loop. The statement t is not synonymous with return t here. You can set flag = 0 at that point to terminate the loop.

Resources