I am building a Java application that extracts the values inside the table tags using XPath.
Please suggest an efficient way to get all 200 values from the page. My code works fine for the 100 rows within the first dataTable, but I have no way to get to the second dataTable.
I extract the rows with the Java class shown below the markup.
The expected output:
http://a.com/ data for a 526735 Z
http://b.com/ data for b 522273 Z
.
.
.
.
http://c.com/ data for c 578335 Z
http://d.com/ data for d 513445 Z
<table>
<tbody>
<tr>
<td style="padding-right">
<table class="dataTable">
<tbody>
<tr>
<td>data for a</td>
<td class="numericalColumn">526735</td>
<td class="numericalColumn">Z</td></tr>
<tr>
<td>data for b</td>
<td class="numericalColumn">522273</td>
<td class="numericalColumn">B</td></tr>
<!-- ... 100 <tr> rows in total ... -->
</tbody>
</table>
</td>
<td style="padding-right">
<table class="dataTable">
<tbody>
<tr>
<td>data for c</td>
<td class="numericalColumn">526735</td>
<td class="numericalColumn">Z</td></tr>
<tr>
<td>data for d</td>
<td class="numericalColumn">522273</td>
<td class="numericalColumn">B</td></tr>
<!-- ... 100 rows in total ... -->
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
This is the class used to get the data.
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.tidy.Tidy;

public class CompaniesGetter {
    public static void main(String[] args) throws Exception {
        String name, link, scripcode, group, key;
        int a = 1;
        int count = 1;
        URL oracle = new URL("http://money.rediff.com/companies");
        URLConnection yc = oracle.openConnection();
        InputStream is = yc.getInputStream();
        // JTidy cleans up the page's HTML into a DOM that XPath can query
        Tidy tidy = new Tidy();
        tidy.setQuiet(true);
        tidy.setShowWarnings(false);
        Document tidyDOM = tidy.parseDOM(is, null);
        is.close();
        XPathFactory xPathFactory = XPathFactory.newInstance();
        XPath xPath = xPathFactory.newXPath();
        Map<String, String> mLink = new HashMap<String, String>();
        Map<String, String> mCode = new HashMap<String, String>();
        Map<String, String> mGroup = new HashMap<String, String>();
        ArrayList<String> aName = new ArrayList<String>();
        for (int i = 1; i <= 200; i++) {
            if (i == 100) {
                a = 2;
            }
            link = "//table[@class='dataTable']/tbody/tr[" + i + "]/td/a/@href";
            name = "//table[@class='dataTable']/tbody/tr[" + i + "]/td/a";
            scripcode = "//table[@class='dataTable']/tbody/tr[" + i + "]/td[2]";
            group = "//table[@class='dataTable']/tbody/tr[" + i + "]/td[3]";
            String linkValue = (String) xPath.evaluate(link, tidyDOM, XPathConstants.STRING);
            String nameValue = (String) xPath.evaluate(name, tidyDOM, XPathConstants.STRING);
            String scripValue = (String) xPath.evaluate(scripcode, tidyDOM, XPathConstants.STRING);
            String groupValue = (String) xPath.evaluate(group, tidyDOM, XPathConstants.STRING);
            aName.add(nameValue);
            mLink.put(nameValue, linkValue);
            mCode.put(nameValue, scripValue);
            mGroup.put(nameValue, groupValue);
        }
        Iterator<String> itr = aName.iterator();
        while (itr.hasNext()) {
            key = itr.next();
            System.out.println("::" + (count++) + " " + key + " " + mLink.get(key) + " " + mCode.get(key) + " " + mGroup.get(key) + " ::");
        }
    }
}
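Hm. Just a tip: you never use the variable "a" in your XPath expressions.
link = "//table[@class='dataTable']/tbody/tr["+i+"]/td/a/@href";
should be something like
link = "(//table[@class='dataTable'])[" + a + "]/tbody/tr["+i+"]/td/a/@href";
The parentheses matter: //table[@class='dataTable'][2] means "a dataTable that is the second such table inside its own parent", and here each table sits in its own td, so it matches nothing. (//table[@class='dataTable'])[2] is the second dataTable in the whole document. The row index also has to restart at 1 for the second table (for example tr[i - 100] for i from 101 to 200), because that table has no tr[101].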
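An alternative that avoids positional indexes altogether: select the rows of every dataTable with a single expression, then read each cell with a path relative to the row. The sketch below is an illustration under assumptions: it parses well-formed sample markup with the JDK's DocumentBuilder (on the live page the JTidy DOM would be used instead), it assumes the name cell contains an <a href> link as the question's XPath implies, and `extractRows` and `parse` are hypothetical helper names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DataTableRows {
    // Parses well-formed markup; the real page would still go through JTidy first.
    static Document parse(String markup) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
    }

    // Returns "link name code group" for every row of every dataTable, in document order.
    static List<String> extractRows(Document doc) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        // One query selects the rows of all matching tables, so no row counts are hard-coded.
        NodeList rows = (NodeList) xPath.evaluate(
                "//table[@class='dataTable']/tbody/tr", doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < rows.getLength(); i++) {
            Node tr = rows.item(i);
            // Relative paths (no leading //) are evaluated against the current row only.
            String link = xPath.evaluate("td[1]/a/@href", tr);
            String name = xPath.evaluate("td[1]", tr);
            String code = xPath.evaluate("td[2]", tr);
            String group = xPath.evaluate("td[3]", tr);
            result.add(link + " " + name + " " + code + " " + group);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String markup = "<table><tbody><tr>"
                + "<td><table class='dataTable'><tbody>"
                + "<tr><td><a href='http://a.com/'>data for a</a></td><td>526735</td><td>Z</td></tr>"
                + "</tbody></table></td>"
                + "<td><table class='dataTable'><tbody>"
                + "<tr><td><a href='http://c.com/'>data for c</a></td><td>578335</td><td>Z</td></tr>"
                + "</tbody></table></td>"
                + "</tr></tbody></table>";
        for (String row : extractRows(parse(markup))) {
            System.out.println(row);
        }
    }
}
```

Because the rows come back in document order, the first table's 100 rows are followed by the second table's 100 rows without any special case at i == 100.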
How do I remove the inline height style from HTML?
<tr style="height:2px;">
</tr>
<tr style="height:2px;">
</tr>
I want to remove only the height declaration from the style attribute of every tr tag.
Thanks a lot in advance,
You can:
If your trs have no styles other than height, you can simply remove the whole style attribute (the line I commented out).
Otherwise, you can write something like the snippet below (it uses HtmlAgilityPack) to filter out the style keys you want to remove:
string html = @"<tr style='height:2px;'>
</tr>
<tr style='height:2px;'>
</tr>";
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
var trs = doc.DocumentNode.SelectNodes("//tr");
foreach (var tr in trs)
{
    Console.WriteLine(tr.OuterHtml);
    //tr.Attributes.Remove("style");
    var filteredStyles = GetStyles(tr.GetAttributeValue("style", ""), "height");
    // Declarations are re-joined with ';' (joining with ':' would corrupt them)
    tr.SetAttributeValue("style", string.Join(";", filteredStyles));
    Console.WriteLine(tr.OuterHtml);
}
Helper function:
// Needs: using System; using System.Collections.Generic; using System.Linq;
private static List<string> GetStyles(string style, params string[] keysToRemove)
{
    List<string> styles = new List<string>();
    var stylesKeyPairs = style.Split(new char[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
    if (keysToRemove != null)
    {
        foreach (var styleKeyPair in stylesKeyPairs)
        {
            var styleKeys = styleKeyPair.Split(new char[] { ':' }, StringSplitOptions.RemoveEmptyEntries);
            // Trim so that "color:red; height:2px" still matches "height"; blank pieces are skipped
            var key = (styleKeys.FirstOrDefault() ?? "").Trim();
            if (key.Length > 0 && !keysToRemove.Contains(key))
                styles.Add(styleKeyPair.Trim());
        }
    }
    else
        styles.AddRange(stylesKeyPairs);
    return styles;
}
Output (for both solutions, in this case):
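For readers not on .NET, here is the same filtering idea as a plain Java method. It is a sketch, and `removeProperties` is a hypothetical name rather than part of any library. It works on the style string only (it does not parse HTML) and splits naively on ';', so a value that itself contains a semicolon, for example inside url(...), would be split incorrectly.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StyleFilter {
    // Removes the given CSS properties from a style attribute value and returns the rest,
    // re-joined with ';'. Property names are compared case-insensitively after trimming.
    static String removeProperties(String style, String... keysToRemove) {
        Set<String> remove = new HashSet<>();
        for (String key : keysToRemove) {
            remove.add(key.trim().toLowerCase(Locale.ROOT));
        }
        List<String> kept = new ArrayList<>();
        for (String declaration : style.split(";")) {
            String trimmed = declaration.trim();
            if (trimmed.isEmpty()) {
                continue; // skips blank pieces such as "a;;b" or a trailing "; "
            }
            int colon = trimmed.indexOf(':');
            String key = (colon >= 0 ? trimmed.substring(0, colon) : trimmed).trim().toLowerCase(Locale.ROOT);
            if (!remove.contains(key)) {
                kept.add(trimmed);
            }
        }
        return String.join(";", kept);
    }

    public static void main(String[] args) {
        System.out.println(removeProperties("height:2px;", "height"));            // prints an empty line
        System.out.println(removeProperties("height:2px; color:red;", "height")); // prints color:red
    }
}
```

When the result is empty, removing the style attribute entirely (as in the commented-out line) is cleaner than leaving style="".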
I'm doing a little experiment that might grow into something bigger, but I've hit a snag. Something isn't working right with the variable "placement" and getElementById. I know you can use variables with getElementById, but for some reason I can't get it to work. Here's my code so far. Thanks for the help!
<script>
var chord = [3, 2, 0, 0, 3, 3];
for(i=0; i<chord.length; i++){
switch(chord[i]){
case 0:
var note = '0';
break;
case 1:
var note = '1';
break;
case 2:
var note = '2';
break;
case 3:
var note = '3';
break;
case 4:
var note = '4';
break;
case 5:
var note = '5';
break;
}
var placement = 'note' + i + note;
var placement = placement.toString();
document.getElementById(placement).innerHTML = 'o';
}
</script>
<table class="chord">
<tr style="border-top:5px solid gray;">
<td id="note00"></td><td id="note10"></td><td id="note20"></td><td id="note30"></td><td id="note40"></td><td id="note50"></td>
</tr>
<tr>
<td id="note01"></td><td id="note11"></td><td id="note21"></td><td id="note31"></td><td id="note41"></td><td id="note51"></td>
</tr>
<tr>
<td id="note02"></td><td id="note12"></td><td id="note22"></td><td id="note32"></td><td id="note42"></td><td id="note52"></td>
</tr>
<tr>
<td id="note03"></td><td id="note13"></td><td id="note23"></td><td id="note33"></td><td id="note43"></td><td id="note53"></td>
</tr>
<tr>
<td id="note04"></td><td id="note14"></td><td id="note24"></td><td id="note34"></td><td id="note44"></td><td id="note54"></td>
</tr>
</table>
http://jsfiddle.net/L59s4924/
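It works.
One cleanup: remove the second "var" from the "placement" declaration (redeclaring a variable with var is legal in JavaScript, just redundant):
var placement = 'note' + i + note;
placement = placement.toString();
If it still fails in your own page, check where the script runs: in the code above, the <script> comes before the <table>, so when it executes the td elements don't exist yet and getElementById returns null. Move the script below the table, or run it once the page has loaded.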
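The switch only turns each number into the same digit as a string, so the id can be built by plain concatenation ('note' + i + chord[i] in JavaScript). The sketch below shows the same id arithmetic in Java for the chord in the question; it is not browser code, and `cellIds` is a hypothetical helper name. Every id it produces exists in the table, which has rows 0 to 4; a fret value of 5 or more would produce an id with no matching cell, and getElementById would then return null.

```java
import java.util.ArrayList;
import java.util.List;

public class ChordIds {
    // Builds the cell ids used by the table: "note" + string index + fret number.
    // Concatenation runs left to right, so "note" + 0 + 3 gives "note03", not "note3".
    static List<String> cellIds(int[] chord) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < chord.length; i++) {
            ids.add("note" + i + chord[i]);
        }
        return ids;
    }

    public static void main(String[] args) {
        // prints [note03, note12, note20, note30, note43, note53]
        System.out.println(cellIds(new int[] {3, 2, 0, 0, 3, 3}));
    }
}
```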
What is the best way to parse XML like this:
<FindLicensesResponse xmlns="http://abc.com">
<FindLicensesResult>
<Licensies>
<ActivityLicense>
<id>1</id>
<DateIssue>2011-12-29T00:00:00</DateIssue>
<ActivityType xmlns:s01="http://www.w3.org/2001/XMLSchema-instance" s01:type="ActivityType">
<code>somecode1</code>
</ActivityType>
<ActivityTerritory xmlns:s02="http://www.w3.org/2001/XMLSchema-instance" s02:type="Territory">
<code>somecode2</code>
</ActivityTerritory>
<ActivityLicenseAttachments />
</ActivityLicense>
<ActivityLicense>
<id>2</id>
<DateIssue>2011-12-21T00:00:00</DateIssue>
<ActivityType xmlns:s01="http://www.w3.org/2001/XMLSchema-instance" s01:type="ActivityType">
<code>somecode3</code>
</ActivityType>
<ActivityTerritory xmlns:s02="http://www.w3.org/2001/XMLSchema-instance" s02:type="Territory">
<code>somecode4</code>
</ActivityTerritory>
<ActivityLicenseAttachments />
</ActivityLicense>
</Licensies>
</FindLicensesResult>
</FindLicensesResponse>
From each ActivityLicense I need the values of id and DateIssue, plus the code inside ActivityType and the code inside ActivityTerritory.
Currently I do it like this:
CachedXPathAPI xpathAPI = new CachedXPathAPI();
Element nsctx = result.getSOAPPart().createElementNS(null, "nsctx");
nsctx.setAttributeNS("http://www.w3.org/2000/xmlns/","xmlns:el","http://abc.com");
NodeList activityLicenses = xpathAPI.selectNodeList(result.getSOAPPart(),"//el:ActivityLicense", nsctx);
for (int i = 0; i < activityLicenses.getLength(); i++) {
Node id = xpathAPI.selectSingleNode(activityLicenses.item(i), "//el:id", nsctx);
Node dateIssue = xpathAPI.selectSingleNode(activityLicenses.item(i), "//el:DateIssue",nsctx);
System.out.println("id: " + id.getTextContent());
System.out.println("dateIssue: " + dateIssue.getTextContent());
}
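But I can't get the values of ActivityType/code and ActivityTerritory/code.
Check out this solution. Note that the paths are relative to each ActivityLicense node (no leading //). A path that starts with // searches the whole document from the root, so in your loop "//el:id" returns the first id every time.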
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class StringTest {
    public static void main(String[] args) throws Exception {
        // Parse the file directly; there is no need to read it into a String first.
        // The default factory is not namespace-aware, which is what lets the unprefixed
        // names below match even though the document declares xmlns="http://abc.com".
        DocumentBuilder dBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = dBuilder.parse(new File("xml.xml"));
        doc.getDocumentElement().normalize();
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodeList = doc.getElementsByTagName("ActivityLicense");
        for (int i = 0; i < nodeList.getLength(); i++) {
            Node node = nodeList.item(i);
            // Relative paths (no leading //) are evaluated from the current ActivityLicense
            System.out.println("id: " + xpath.evaluate("id", node));
            System.out.println("dateIssue: " + xpath.evaluate("DateIssue", node));
            System.out.println("activityType: " + xpath.evaluate("ActivityType/code", node));
            System.out.println("activityTerritory: " + xpath.evaluate("ActivityTerritory/code", node));
        }
    }
}
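If you'd rather match on namespaces explicitly, as the question's code does with the el prefix, a minimal sketch with the standard JAXP API parses namespace-aware, binds a prefix through a NamespaceContext and uses relative paths for each ActivityLicense. The prefix `el`, the class and method names, and the trimmed sample XML (the xsi type attributes are left out) are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LicenseReader {
    // Binds the prefix "el" (an arbitrary choice) to the response's default namespace.
    static final NamespaceContext NS = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "el".equals(prefix) ? "http://abc.com" : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            throw new UnsupportedOperationException();
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            throw new UnsupportedOperationException();
        }
    };

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for prefix-based XPath matching
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Returns "id dateIssue activityTypeCode territoryCode" for each ActivityLicense.
    static List<String> read(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(NS);
        NodeList licenses = (NodeList) xpath.evaluate("//el:ActivityLicense", doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < licenses.getLength(); i++) {
            Node license = licenses.item(i);
            // No leading //: each path is evaluated from the current ActivityLicense.
            result.add(xpath.evaluate("el:id", license) + " "
                    + xpath.evaluate("el:DateIssue", license) + " "
                    + xpath.evaluate("el:ActivityType/el:code", license) + " "
                    + xpath.evaluate("el:ActivityTerritory/el:code", license));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<FindLicensesResponse xmlns='http://abc.com'><FindLicensesResult><Licensies>"
                + "<ActivityLicense><id>1</id><DateIssue>2011-12-29T00:00:00</DateIssue>"
                + "<ActivityType><code>somecode1</code></ActivityType>"
                + "<ActivityTerritory><code>somecode2</code></ActivityTerritory></ActivityLicense>"
                + "</Licensies></FindLicensesResult></FindLicensesResponse>";
        read(parse(xml)).forEach(System.out::println);
    }
}
```

The inner code elements inherit the default namespace http://abc.com (the xmlns:s01 and xmlns:s02 declarations only add prefixes), which is why they need the el: prefix as well.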
I have been struggling to get a partial View working in Razor. The View engine cannot make sense of the code below but it is simple using the ASPX View engine. Can anyone show me how to get this to work with Razor? Note that I am just writing out a calendar so the <tr> tag happens at the end of every week. The first sign of a problem is that the Razor code will not format in the VS editor and it complains that the 'while' block is missing its closing brace. I have tried all kinds of combinations, even using a delegate. (I think the cause of the problem may be the conditional TR tag because it is highlighted as an error because it is not closed.)
Razor (doesn't work)
<table class="calendarGrid">
<tr class="calendarDayNames">
<th>Monday</th>
<th>Tuesday</th>
<th>Wednesday</th>
<th>Thursday</th>
<th>Friday</th>
<th>Saturday</th>
<th>Sunday</th>
</tr>
@{
var loopDate = gridStartDate;
}
@while (loopDate <= gridEndDate)
{
if (loopDate.DayOfWeek == DayOfWeek.Monday)
{
<tr class="calendarWeek">
}
<td class="calendarDay">
<span class="calendarDayNumber">@loopDate.Day</span>
@if (Model.AllCalendarDays.ContainsKey(loopDate.Date))
{
foreach (var ev in Model.AllCalendarDays[loopDate.Date])
{
<span class="calendarEvent">@ev.Venue</span>
}
}
</td>
@{
loopDate = loopDate.AddDays(1);
@if (loopDate.DayOfWeek == DayOfWeek.Monday)
{
</tr>
}
}
}
ASPX (works)
<table class="calendarGrid">
<tr class="calendarDayNames">
<th>Monday</th>
<th>Tuesday</th>
<th>Wednesday</th>
<th>Thursday</th>
<th>Friday</th>
<th>Saturday</th>
<th>Sunday</th>
</tr>
<%
var loopDate = gridStartDate;
while (loopDate <= gridEndDate)
{
if (loopDate.DayOfWeek == DayOfWeek.Monday)
{
%>
<tr class="calendarWeek">
<%} %>
<td class="calendarDay">
<span class="calendarDayNumber">
<%: loopDate.Day %></span>
<% if (Model.AllCalendarDays.ContainsKey(loopDate.Date))
{
foreach (var ev in Model.AllCalendarDays[loopDate.Date])
{ %>
<span class="calendarEvent">
<%: ev.Venue %></span>
<% }
} %>
</td>
<% {
loopDate = loopDate.AddDays(1);
if (loopDate.DayOfWeek == DayOfWeek.Monday)
{ %>
</tr>
<% }
}
} %>
</table>
Working solution in Razor, based on @jgauffin's view model suggestion and @dommer's ugly raw HTML solution. Combined, they're almost aesthetically acceptable. :)
The view model now has an iterator:
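public IEnumerable<Tuple<DateTime, IList<CalendarEventDto>>> GridItems()
{
    var loopDate = GridStartDate;
    while (loopDate <= GridEndDate)
    {
        // Days without events get an empty list instead of throwing KeyNotFoundException
        IList<CalendarEventDto> events = AllCalendarDays.ContainsKey(loopDate.Date)
            ? AllCalendarDays[loopDate.Date]
            : new List<CalendarEventDto>();
        yield return new Tuple<DateTime, IList<CalendarEventDto>>(loopDate.Date, events);
        loopDate = loopDate.AddDays(1);
    }
}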
Okay, the Tuple is lazy but I will probably create another model to hold more complex information about the date and events (IsPast/greyed, etc).
The pesky View
@foreach (var item in Model.GridItems())
{
if (item.Item1.DayOfWeek == DayOfWeek.Monday)
{
@Html.Raw("<tr class=\"calendarWeek\">");
}
@Html.Raw("<td class=\"calendarDay\">");
@Html.Raw(string.Format("<span class=\"calendarDayNumber\">{0}</span>", item.Item1.Day));
foreach (var ev in item.Item2)
{
@Html.Raw(string.Format("<span class=\"calendarEvent\">{0}</span>", Server.HtmlEncode(ev.Venue)));
}
@Html.Raw("</td>");
if (item.Item1.DayOfWeek == DayOfWeek.Sunday)
{
@Html.Raw("</tr>");
}
}
Note that when I reformat the view source in VS, it gets egregiously indented, with the if statement about 10 tabs in, but there are no compilation warnings and it does what I want. Not nice or easy, though. I think the Razor devs should provide explicit syntax for breaking out of code into markup and back again, so that when the parser cannot parse something unambiguously we can tell it what we intended.
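The week-grouping logic itself does not depend on Razor. As a language-neutral illustration (a plain Java sketch, not ASP.NET code), the loop below opens a row on Monday and closes it after Sunday, which keeps the tags balanced as long as the grid starts on a Monday and ends on a Sunday. The date range, the map of venue names and the `escape` helper are assumptions for illustration; `escape` plays the role of Server.HtmlEncode in the view above.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class CalendarGrid {
    // Minimal HTML escaping for venue names (&, <, >, ").
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Assumes gridStart is a Monday and gridEnd a Sunday, so every <tr> opened
    // on a Monday is closed after the following Sunday and the tags stay balanced.
    static String render(LocalDate gridStart, LocalDate gridEnd, Map<LocalDate, List<String>> venuesByDay) {
        StringBuilder html = new StringBuilder();
        for (LocalDate day = gridStart; !day.isAfter(gridEnd); day = day.plusDays(1)) {
            if (day.getDayOfWeek() == DayOfWeek.MONDAY) {
                html.append("<tr class=\"calendarWeek\">");
            }
            html.append("<td class=\"calendarDay\"><span class=\"calendarDayNumber\">")
                .append(day.getDayOfMonth()).append("</span>");
            // Days without events simply get no event spans
            for (String venue : venuesByDay.getOrDefault(day, Collections.emptyList())) {
                html.append("<span class=\"calendarEvent\">").append(escape(venue)).append("</span>");
            }
            html.append("</td>");
            if (day.getDayOfWeek() == DayOfWeek.SUNDAY) {
                html.append("</tr>");
            }
        }
        return html.toString();
    }

    public static void main(String[] args) {
        // 2024-01-01 is a Monday and 2024-01-14 a Sunday: two complete weeks
        Map<LocalDate, List<String>> venues =
                Collections.singletonMap(LocalDate.of(2024, 1, 3), Collections.singletonList("Town Hall"));
        System.out.println(render(LocalDate.of(2024, 1, 1), LocalDate.of(2024, 1, 14), venues));
    }
}
```

If the grid could start mid-week, the first row would need its own opening tag (and the last its own closing tag), which is exactly the unbalanced-tag situation that confuses the Razor parser.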
@Andrew Nurse's solution
Andrew 'works on the ASP.NET team building the Razor parser!'. His solution runs okay, but it still produces compiler warnings and obviously confuses Visual Studio: the code cannot be reformatted without ending up as a big glob on a few lines:
<tbody>
@foreach (var calDay in Model.GridItems())
{
if (calDay.DayOfWeek == DayOfWeek.Monday)
{
@:<tr class="calendarWeek">
}
<td class="calendarDay">
<span class="calendarDayNumber">@calDay.Day</span>
@foreach (var ev in calDay.CalendarEvents)
{
<span class="calendarEvent">@ev.Venue</span>
}
</td>
if (calDay.DayOfWeek == DayOfWeek.Sunday)
{
@:</tr>
}
}
</tbody>
The primary issues here were these lines:
if (loopDate.DayOfWeek == DayOfWeek.Monday)
{
<tr class="calendarWeek">
}
...
@if (loopDate.DayOfWeek == DayOfWeek.Monday)
{
</tr>
}
The problem is that Razor uses tags to detect the start and end of markup. Since you didn't close the "tr" tag inside the first if, Razor never switches back to code, so it doesn't see the "}" as code. The solution is to use "@:", which lets you write a single line of markup without regard for tags. Replacing those lines with the following should work, and it is more concise than using Html.Raw:
if (loopDate.DayOfWeek == DayOfWeek.Monday)
{
#:<tr class="calendarWeek">
}
...
@if (loopDate.DayOfWeek == DayOfWeek.Monday)
{
@:</tr>
}
I would move all the logic to the view model, which leaves the following code in your view:
@while (Model.MoveNext())
{
@Model.WeekHeader
<td class="calendarDay">
<span class="calendarDayNumber">@Model.DayNumber</span>
@foreach (var ev in Model.CurrentDayEvents)
{
<span class="calendarEvent">@ev.Venue</span>
}
</td>
@Model.WeekFooter
}
And the new model:
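// Needs: using System; using System.Collections.Generic; using System.Web;
// DayEvent and the contents of AllCalendarDays are assumed to come from your existing model
public class CalendarViewModel
{
    private DateTime _currentDate;

    public DateTime GridStartDate { get; set; }
    public DateTime GridEndDate { get; set; }
    public IDictionary<DateTime, IList<DayEvent>> AllCalendarDays { get; set; }

    // IHtmlString so that Razor writes the tag as markup instead of HTML-encoding it
    public IHtmlString WeekHeader
    {
        get
        {
            return new HtmlString(_currentDate.DayOfWeek == DayOfWeek.Monday ? "<tr class=\"calendarWeek\">" : "");
        }
    }

    // The row is closed after the last day of the week (Sunday), not on Monday
    public IHtmlString WeekFooter
    {
        get
        {
            return new HtmlString(_currentDate.DayOfWeek == DayOfWeek.Sunday ? "</tr>" : "");
        }
    }

    public int DayNumber
    {
        get { return _currentDate.Day; }
    }

    public IEnumerable<DayEvent> CurrentDayEvents
    {
        get
        {
            return AllCalendarDays.ContainsKey(_currentDate.Date) ? AllCalendarDays[_currentDate.Date] : new List<DayEvent>();
        }
    }

    public bool MoveNext()
    {
        if (_currentDate == DateTime.MinValue)
        {
            _currentDate = GridStartDate;
            return _currentDate <= GridEndDate;
        }
        _currentDate = _currentDate.AddDays(1);
        return _currentDate <= GridEndDate;
    }
}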
MAJOR EDIT: Okay, what happens if you do this?
<table class="calendarGrid">
<tr class="calendarDayNames">
<th>Monday</th>
<th>Tuesday</th>
<th>Wednesday</th>
<th>Thursday</th>
<th>Friday</th>
<th>Saturday</th>
<th>Sunday</th>
</tr>
@{
var loopDate = gridStartDate;
while (loopDate <= gridEndDate)
{
if (loopDate.DayOfWeek == DayOfWeek.Monday)
{
@Html.Raw("<tr class=\"calendarWeek\">");
}
@Html.Raw("<td class=\"calendarDay\">");
@Html.Raw("<span class=\"calendarDayNumber\">" + loopDate.Day + "</span>");
if (Model.AllCalendarDays.ContainsKey(loopDate.Date))
{
foreach (var ev in Model.AllCalendarDays[loopDate.Date])
{
@Html.Raw("<span class=\"calendarEvent\">" + ev.Venue + "</span>");
}
}
@Html.Raw("</td>");
loopDate = loopDate.AddDays(1);
if (loopDate.DayOfWeek == DayOfWeek.Monday)
{
@Html.Raw("</tr>");
}
}
}
Have you tried adding <text> tags around the contents of the blocks?
I think the Razor parser only works when it's obvious where the blocks end. It may be getting confused by the fact that you have an if, a td and then more code, all inside the same block.
There's more info on this here: http://weblogs.asp.net/scottgu/archive/2010/12/15/asp-net-mvc-3-razor-s-and-lt-text-gt-syntax.aspx
This seems like it should be simple, but I can't figure out how to make it work.
My data model has a "Server" table, and a "ServerType" table. PKs for both tables are ints, and Server has a field ServerTypeId which is a fk to ServerType.Id.
I have a Razor view, List.cshtml, that is typed to IEnumerable<Server>:
@model IEnumerable<Server>
<table border="1">
<tr>
<th>
Server Type
</th>
<th>
Name
</th>
</tr>
@foreach (var item in Model)
{
<tr>
<td>
@Html.DropDownListFor(modelItem => item.ServerTypeId, (IEnumerable<SelectListItem>)ViewData["ServerType"])
</td>
<td>
@Html.DisplayFor(modelItem => item.Name)
</td>
</tr>
}
</table>
My controller has:
public ActionResult List()
{
var s = GetServers();
ViewData["ServerType"] = GetServerTypes();
return View("List", s);
}
private List<SelectListItem> GetServerTypes()
{
string id;
SelectListItem si;
List<SelectListItem> sl = new List<SelectListItem>();
IQueryable<ServerType> items = (from t in _entities.ServerTypes select t);
foreach (var item in items)
{
id = item.Id.ToString();
si = new SelectListItem { Value = id, Text = item.Description };
sl.Add(si);
}
return sl;
}
This displays the data, but the value in the dropdown is not selected. I've tried both Html.DropDownList and Html.DropDownListFor, with different permutations of names for the ViewData key, with and without Id at the end.
Do I need to create a view model that has copies of the ServerType in order to set the Selected property? Or is the problem that my ids are ints while the SelectListItem Value property is a string?
For anyone else still looking for the answer: I had to do three things to get this to work.
1. Use @for instead of @foreach, so that there is an index to work with when naming things.
2. Don't give the ViewBag variable the same name as the property. MVC tries to help you out and binds things too early if they have the same name.
3. Pass the current value to the SelectList constructor. My code ended up as:
@for (int i = 0; i < Model.Count; i++)
{
<tr>
<td>
@Html.DropDownListFor(modelItem => Model[i].operatorToken,
new SelectList(ViewBag.operatorTokensList, "Value", "Text", Model[i].operatorToken),
"Select", htmlAttributes: new { @class = "form-control" })
...etc...
}
Notice the 4th parameter to SelectList() sets the selected value.
My operatorTokensList was valued with:
new[] { new { Value = ">", Text = ">" },
new { Value = ">=", Text = ">=" },
new { Value = "=", Text = "=" },
new { Value = "<=", Text = "<=" },
new { Value = "<", Text = "<" } };
(The user was selecting "greater than", "greater than or equal", etc.)
At no point in your population of the List in GetServerTypes() do you specify that any of the items are selected. This is something you need to do manually, as MVC3 isn't smart enough to infer it for you in the DropDownListFor method. This is further complicated by the fact that you are not using a single model.
A better way to do this might be:
(Keep in mind in the below code, I'm assuming that the Server class has a primary id called "Id")
For the controller code:
public ActionResult List()
{
IEnumerable<Server> s = GetServers();
ViewData["ServerTypes"] = GetServerTypes(s);
return View("List", s);
}
private Dictionary<int, SelectList> GetServerTypes(IEnumerable<Server> s)
{
Dictionary<int, SelectList> sl = new Dictionary<int, SelectList>();
IEnumerable<ServerType> items = (from t in _entities.ServerTypes select t);
foreach (Server srv in s) {
sl.Add(srv.Id, new SelectList(items, "Id", "Description", srv.ServerTypeId));
}
return sl;
}
For the view code:
(Also note below that the lambda parameter is the view's model, IEnumerable<Server>, so the expressions refer to the loop variable item, and that ViewData["ServerTypes"] has to be cast back to the dictionary before it can be indexed.)
@model IEnumerable<Server>
<table border="1">
<tr>
<th>
Server Type
</th>
<th>
Name
</th>
</tr>
@foreach (var item in Model)
{
<tr>
<td>
@Html.DropDownListFor(modelItem => item.ServerTypeId, ((Dictionary<int, SelectList>)ViewData["ServerTypes"])[item.Id])
</td>
<td>
@Html.DisplayFor(modelItem => item.Name)
</td>
</tr>
}
</table>
@TreyE correctly points out that you never specify which select list item should be selected when it is displayed in the view.
There are several ways to do this. The first is to use the SelectList class and its constructor that takes the value that should be selected: the overload SelectList(IEnumerable, String, String, Object) (see SelectList on MSDN).
SelectList is only supported on .NET 3.5+ though FYI.
Second, in GetServerTypes() you could write:
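private List<SelectListItem> GetServerTypes()
{
    List<SelectListItem> sl = new List<SelectListItem>();
    IQueryable<ServerType> items = (from t in _entities.ServerTypes select t);
    foreach (var item in items)
    {
        // isSelected stands for whatever boolean on ServerType marks the current choice;
        // SelectListItem.Value is a string, so the int key is converted
        sl.Add(new SelectListItem { Value = item.Id.ToString(), Text = item.Description, Selected = item.isSelected });
    }
    return sl;
}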
Also remember that only one item should be selected, so make sure that if you do try to use some boolean property it is not possible that more than one item could have its isSelected property set to true.
Alternatively, if you need to use some type of if statement to decide if Selected = true (i.e. your item has no isSelected boolean) then you can add that in the foreach loop.
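foreach (var item in items)
{
    // Replace this condition with your own; selectedServerTypeId is a placeholder for the id to pre-select
    if (item.Id == selectedServerTypeId)
        sl.Add(new SelectListItem { Value = item.Id.ToString(), Text = item.Description, Selected = true });
    else
        sl.Add(new SelectListItem { Value = item.Id.ToString(), Text = item.Description, Selected = false });
}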
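The rule in the answers above (compare each item's id with the current value, and let at most one match) can be sketched outside MVC. This is a plain Java illustration, not ASP.NET code: `ServerType` and `Option` are hypothetical stand-ins for the entity and for SelectListItem, and `selectedId` plays the role of the current ServerTypeId. Comparing the int ids before converting them to strings also sidesteps the int-versus-string question raised in the post.

```java
import java.util.ArrayList;
import java.util.List;

public class SelectOptions {
    // Hypothetical stand-in for the ServerType entity (int primary key + description).
    static final class ServerType {
        final int id;
        final String description;

        ServerType(int id, String description) {
            this.id = id;
            this.description = description;
        }
    }

    // Hypothetical stand-in for SelectListItem: value and text are strings, plus a selected flag.
    static final class Option {
        final String value;
        final String text;
        final boolean selected;

        Option(String value, String text, boolean selected) {
            this.value = value;
            this.text = text;
            this.selected = selected;
        }
    }

    // Builds the options for one dropdown. Ids are compared as ints and only then converted
    // to strings; with unique primary keys at most one option (id == selectedId) is selected.
    static List<Option> build(List<ServerType> types, int selectedId) {
        List<Option> options = new ArrayList<>();
        for (ServerType t : types) {
            options.add(new Option(Integer.toString(t.id), t.description, t.id == selectedId));
        }
        return options;
    }

    public static void main(String[] args) {
        List<ServerType> types = new ArrayList<>();
        types.add(new ServerType(1, "Web"));
        types.add(new ServerType(2, "Database"));
        for (Option o : build(types, 2)) {
            System.out.println(o.value + " " + o.text + (o.selected ? " (selected)" : ""));
        }
    }
}
```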