ImportXML "post_name" from an article, having trouble finding proper XPath - xpath

I've been having trouble finding the proper or accurate Xpath for google sheets ImportXML.
Article in question:
https://www.digitaltrends.com/news/this-is-what-a-birthday-party-on-the-iss-looks-like/
Result i'm looking for:
'post_name': 'this-is-what-a-birthday-party-on-the-iss-looks-like'
Using the "copy full XPath feature in Chrome Inspect feature, i'm getting:
/html/head/script[43]/text()
Which does not work with Google Sheet's ImportXML feature. Can someone guide me through how will i be able to pull this section of the site?
EDIT: I'm trying to retrieve anything within these parameters such as "post name, post title, post id." [View Source1

The page is built in javascript on the client side, not the server side. It is therefore impossible to retrieve information with the IMPORTXML function. You need to read what is included in the script ...
function extract(){
var url='https://www.digitaltrends.com/news/this-is-what-a-birthday-party-on-the-iss-looks-like/'
var source = UrlFetchApp.fetch(url).getContentText()
var data = source.split('<script>')
//Logger.log(data[3])
info = "'post_name" + data[3].split('post_name')[1].split(',')[0]
Logger.log(info)
}
Now if you want to retrieve all the information contained in the JSON
function extract(){
var url='https://www.digitaltrends.com/news/this-is-what-a-birthday-party-on-the-iss-looks-like/'
var source = UrlFetchApp.fetch(url).getContentText()
var data = source.split('<script>')
//Logger.log(data[3])
info = data[3].replace(/(\t)/gm,"").replace(/(\n)/gm,"").replace(/(')/gm,"\"").replace(/(: )/gm,":")
info = info.split('({')[1].split(',});}')[0]
//Logger.log(info)
var myData = JSON.parse('{' + info + '}')
getPairs(eval(myData),'myData')
}
function getPairs(obj,id) {
const regex = new RegExp('[^0-9]+');
const fullPath = true
var sheet = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet()
for (let p in obj) {
var newid = (regex.test(p)) ? id + '.' + p : id + '[' + p + ']';
if (obj[p]!=null){
if (typeof obj[p] != 'object' && typeof obj[p] != 'function'){
sheet.appendRow([fullPath?newid:p, obj[p]]);
}
if (typeof obj[p] == 'object') {
getPairs( obj[p], newid );
}
}
}
}

Related

What is the nest way to bulk index(around 40 k files of type .docx) using ingest-attachment?

I am fairly familiar with the ELK stack and currently using Elastic search 6.6. Our use case is content search for about 40K .docx files
(uploaded by Portfolio managers as research reports.
Max file size allowed is 10 MB, but mostly file sizes are in few Kb).
I have used the ingest attachment plug in to index sample test files and I am able to also search the content using KIBANA
for ex: POST /attachment_test/my_type/_search?pretty=true
{
"query": {
"match": {
"attachment.content": "JP Morgan"
}
}
}
returns me the expected results.
My doubts:
Using the ingest plug in, we need to push data to the plug in. I am using VS 2017 and elastic NEST dll. Which means, I have to programmatically read the 40K documents and push them to ES using the NEST commands?
I have gone through the Fscrawler project and know that it can achieve the purpose but I am keeping it as my last resort
If I were to use approach 1 (code), is there any bulk upload API available for posting number of attachments together to ES (in batches)?
Finally, I uploaded 40K files in to the elastic index using C# code:
private static void PopulateIndex(ElasticClient client)
{
var directory =System.Configuration.ConfigurationManager.AppSettings["CallReportPath"].ToString();
var callReportsCollection = Directory.GetFiles(directory, "*.doc"); //this will fetch both doc and docx
//callReportsCollection.ToList().AddRange(Directory.GetFiles(directory, "*.doc"));
ConcurrentBag<string> reportsBag = new ConcurrentBag<string>(callReportsCollection);
int i = 0;
var callReportElasticDataSet = new DLCallReportSearch().GetCallReportDetailsForElastic();//.AsEnumerable();//.Take(50).CopyToDataTable();
try
{
Parallel.ForEach(reportsBag, callReport =>
//Array.ForEach(callReportsCollection,callReport=>
{
var base64File = Convert.ToBase64String(File.ReadAllBytes(callReport));
var fileSavedName = callReport.Replace(directory, "");
// var dt = dLCallReportSearch.GetCallFileName(fileSavedName.Replace("'", "''"));//replace the ' in a file name with '';
var rows = callReportElasticDataSet.Select("CALL_SAVE_FILE like '%" + fileSavedName.Replace("'", "''") + "'");
if (rows != null && rows.Count() > 0)
{
var row = rows.FirstOrDefault();
//foreach (DataRow row in rows)
//{
i++;
client.Index(new Document
{
Id = i,
DocId = Convert.ToInt32(row["CALL_ID"].ToString()),
Path = row["CALL_SAVE_FILE"].ToString().Replace(CallReportPath, ""),
Title = row["CALL_FILE"].ToString().Replace(CallReportPath, ""),
Author = row["USER_NAME"].ToString(),
DateOfMeeting = string.IsNullOrEmpty(row["CALL_DT"].ToString()) ? (DateTime?)null : Convert.ToDateTime(row["CALL_DT"].ToString()),
Location = row["CALL_LOCATION"].ToString(),
UploadDate = string.IsNullOrEmpty(row["CALL_REPORT_DT"].ToString()) ? (DateTime?)null : Convert.ToDateTime(row["CALL_REPORT_DT"].ToString()),
CompanyName = row["COMP_NAME"].ToString(),
CompanyId = Convert.ToInt32(row["COMP_ID"].ToString()),
Country = row["COU_NAME"].ToString(),
CountryCode = row["COU_CD"].ToString(),
RegionCode = row["REGION_CODE"].ToString(),
RegionName = row["REGION_NAME"].ToString(),
SectorCode = row["SECTOR_CD"].ToString(),
SectorName = row["SECTOR_NAME"].ToString(),
Content = base64File
}, p => p.Pipeline("attachments"));
//}
}
});
}
catch (Exception ex)
{
throw ex;
}
}

is my if statement doing what i think its doing?

Here I have tis function that is querying data and returning it to me and im putting that data in to html elements to make a post.my if statement at the bottom is where im having a bit of problem i trying to only apply my comment window once to the new clones once they have been pushed over to the new div called story board, i believe im telling my if statement that if the class already exists in that new clone then do nothing else apply it there.. to seee what i am talking about...here is my test domain...http://subdomain.jason-c.com/
sign in is "kio" pass is the same and when you hit publish on the stories, everytime a nw one hits it will apply comment box to a post in the storyboard window that already has a comment text area. what am i doing wrong.
function publishWindowHandler(){
var query = new Parse.Query('Post');
console.log(currentUser);
query.equalTo("User", currentUser);
query.include("User");
query.descending("createdAt")
console.log(user.get('username'));
query.find({
success:function(results){
document.getElementById("publishCenter").textContent = "";
for(var i =0; i < results.length; i++){
var userPost = results[i];
//console.log(userPost.get("User") + " / " + userPost.get("Author") + " / " + userPost.get("Story") + " / " + userPost.get("objectId"));
var authorTitle = document.createElement("p");
var newPost = document.createElement("P");
var title = document.createElement("P");
var userLabel = document.createElement("p");
var postId = userPost.id;
var postBtn = document.createElement("INPUT");
postBtn.className ="publishBtn";
postBtn.id ="publishBtn";
postBtn.setAttribute("Type", "button");
postBtn.setAttribute("value", "Publish");
title.textContent = "Story: " + userPost.get("Title");
authorTitle.textContent = "Author: " + userPost.get("Author");
newPost.textContent = userPost.get("Story");
userLabel.textContent = "Published by: " +userPost.get("User").get ("username");
var postWrapper = document.createElement("DIV");
postWrapper.className = "postWrapper";
postWrapper.id = postId;
document.getElementById("publishCenter").appendChild(postWrapper);
postWrapper.appendChild(title);
postWrapper.appendChild(authorTitle);
postWrapper.appendChild(newPost);
postWrapper.appendChild(userLabel);
postWrapper.appendChild(postBtn);
postBtn.addEventListener("click", publicViewHandler);
function publicViewHandler(){
$(this).parent(".postWrapper").clone().appendTo(".storyBoard");
function testWindow(){
if($(publicBoard).children().hasClass(".commentWindow")){
}
else
{
$(".storyBoard").children().append(commentWindow);
}
}
testWindow();
}
}
}
})
}
According to the documentation, jquery hasClass doesn't need '.' prefixing the passed in class name.
https://api.jquery.com/hasclass/
Try removing that and see if that get's you anywhere.
Also, where is the variable commentWindow defined? Is it global?
var myClone = $(this).parent().clone(true);
myClone.appendTo(".storyBoard");
console.log(publicBoard);
console.log("hello",$(this));
console.log($(publicBoard).find('.postWrapper').find("commentWindow"));
myClone.append($(commentWindow).clone());
this is what i ended up doing to solve my issue took me a while and a little help from a friend.

C#/ Html agility pack, is there a more eloquent way to screen scrape?

I'm working on an app in C# that gathers web data from a few different pages daily and saves it in SQL Server. I'm using html agility pack... at the moment I have an xpath for each field/ column in the database. There are 62 columns in the table, and with checking for proper values and formatting, the code below is VERY verbose and repetitive (specifically, xpath expressions and associated blocks). I was wondering if there was a nicer, more concise way, perhaps using LINQ? (which I haven't used much yet but would like to) Here's just the first couple fields set below, this repeats .... 62 cols. I'm not looking for a rewrite, just any suggestions I can get.
List<IDataPoint> list = new List<IDataPoint>();
HtmlWeb hwObject = new HtmlWeb();
HtmlDocument htmlDoc = hwObject.Load(AddressString);
if (htmlDoc.DocumentNode != null && !htmlDoc.DocumentNode.InnerHtml.Contains("There is no key statistics data available"))
{
var symbolNode = htmlDoc.DocumentNode.SelectSingleNode("/html/body/div[3]/div[4] /div/div/div/div/div/div/h2");
if (symbolNode != null)
{
KeyStatsDP keyStatsDp = new KeyStatsDP();
String symb = "";
symb = symbolNode.InnerHtml;
symb = symb.Substring(symb.LastIndexOf("(") + 1);
symb = symb.Substring(0, symb.Length - 1);
keyStatsDp.Symbol = symb;
String mktCapXPath = "//*[#id=\"yfs_j10_" + symb.ToLower() + "\"]";
var mktCapNode = htmlDoc.DocumentNode.SelectSingleNode(mktCapXPath);
if (mktCapNode != null)
{
String mktCap = mktCapNode.InnerHtml;
keyStatsDp.MarketCapIntraDay = ConvertMoneyInStrToInt(mktCap);
}
var entValNode = htmlDoc.DocumentNode.SelectSingleNode("//html/body/div[3]/div[4]/table[2]/tr[2]/td/table[2]/tr/td/table/tr[2]/td[2]");
if (entValNode != null)
{
if (!entValNode.InnerHtml.Contains("N"))
{
String entVal = entValNode.InnerHtml;
keyStatsDp.EntValue = ConvertMoneyInStrToInt(entVal);
}
}

Magento jQuery Ajax search issue

I am trying to implement a customize Magento Advanced Search, using jQuery and Ajax.
The search works something like this:
1) There are 2/3 options to select size
2) 6 options to narrow down search result
3) A text box to enter keywords etc
There is no submit button, that means, search starts when user clicks on any size and/or Options or enter more than 3 char in search text box, search starts, an ajax request is sent to server.
If user clicks on another option or say type in another character, I use to abort previous search using search_request.abort() and set previous search to null.
When you search for first couple of times, it works fine, search returns result and if no result is found, proper message is displayed.But, after after couple of requests, it starts failing. Like if I click on options in short interval (I mean, search is fired frequently)). Some times, it fails without any reason and sometimes it fails, and when I click on the same option a couple of seconds later, it display the results.
Sometimes, when the search fails or there is no result.
It displays the incomplete message in result area like:
Where as it should look like :
Search when it returns result:
Search when it fails
My question is:
1) do you think it might be failing because too many requests are submitted too frequently? I mean is there something related to Magento settings?
2) What can I do to correct this?
Here is the jQuery code:
<script type="text/javascript">
function search(textfield)
{
j('#slides').html("<img src='<?php echo Mage::getBaseUrl(Mage_Core_Model_Store::URL_TYPE_WEB); ?>skin/frontend/luckybloke/luckybloke/images/ajax-loader.gif' class='no-results-found' style='width:100px !important; padding-top:10px !important;text-align:center;margin-left: 43%;' />");
var form = j("#form-validate");
var action = form.attr("action");
var fieldAdded = Array();
var searchData = '';
var searchVal = '';
var sizeSelected = '';
var searchArray = new Array();
var searchCtr = 0;
var tmp = new Array();
if(typeof search_request =='object' && search_request!=null){
search_request.abort();
search_request = null;
j('#slides').html("<img src='<?php echo Mage::getBaseUrl(Mage_Core_Model_Store::URL_TYPE_WEB); ?>skin/frontend/luckybloke/luckybloke/images/ajax-loader.gif' class='no-results-found' style='width:100px !important; padding-top:10px !important;text-align:center;margin-left: 43%;' />");
}
j('.input-search').each(function(index,domEle){
var eleID = "#"+j(domEle).attr("id");
var curEle= j(domEle).attr("name");
fieldAdded.push(curEle.replace('[]',''));
////consloe.log(eleID);
if((j(eleID).val()!='' || !jQuery.inArray(curEle,fieldAdded))&& eleID!='#name'){
if(searchVal==''){
searchVal=searchVal+j(eleID).val();
}
if(searchData==''){
searchData = searchData+""+curEle+"="+j(eleID).val();
}else{
searchData = searchData+"&"+curEle+"="+j(eleID).val();
}
searchArray[searchCtr] = j(eleID).val();
searchCtr++;
}
//add description field to search query
tmp[curEle] = j(eleID).val();
}
});
if(searchVal==''){
}
if(j("#name").val()=='brand, style, keyword'){
var val = '';
}else{
val = j("#name").val();
}
searchData = searchData+"&search_keywords="+val;
//toggleFields(sizeSelected, searchArray);
search_request = j.ajax({
type:'get',
url:action,
data:searchData,
cache: false,
dataType:'html',
success:searchComplete
});
return;
}
function searchComplete(responseText, statusText, xhr, $form)
{
var no_item_msg = 'No product found matching your search criteria.';
if(statusText=='success'){
var isFound = responseText.toString().search(new RegExp(/no items found/i));
//alert(isFound+''+responseText.toString());
if((isFound>0 || responseText.length == 0) && search_request ){
j("div#slides").html("<span class='no-results-found'>"+no_item_msg+"</span>");
// j("div#slides").html("<span class='no-results-found'>"+no_item_msg+"</span>");
/*j("div#lb-product-list").block({message:no_item_msg});
setTimeout('j("div#lb-product-list").unblock()', 2000);*/
}else{
var dataToFill = j("div#lb-product-list");
var isFound = responseText.toString().search(new RegExp(/JUST_ONE/i));
if(isFound>=0){
responseText = responseText.replace(/just_one/i,'');
}else{
j.getScript("<?php echo Mage::getBaseUrl(Mage_Core_Model_Store::URL_TYPE_WEB); ?>js/carousol.js", function() {});
}
dataToFill.html(j(responseText).children());
search_request = null;
}
//reset the css attribute's position value, as ui blocking seems to be affecting positioning of elements are unblocking
j("div#lb-product-list").css("position",'');
return true;
}
}
</script>

What exactly am I sending through the parameters?

When doing a XMLHttpRequest and using POST as the form method, what exactly am I sending? I know it should be like send(parameters), parameters = "variable1=Hello", for example. But what if I want to do this:
parameters = "variable1=" + encodeURIComponent(document.getElementById("variable1").value);
variable1 being the id of the HTML form input.
Can I do it like this or do I need to assign the encodeURIComponent value to a javascript variable and send that variable:
var variable2;
parameters = "variable2=" + encodeURIComponent(document.getElementById("variable1").value);
You're suppose to send the object and it's value, but, is it an object from the HTML form, a javascript object or a php object? The problem is I already tried it and I still can't get the encoded input in my database, all I get is the raw input from the user.
BTW, I know it's a pretty dull question, but I feel the need to understand exactly what I'm doing if I want to come up with a solution.
g
function createObject()
{
var request_type;
var browser = navigator.appName;
if(browser == "Microsoft Internet Explorer")
{
request_type = new ActiveXObject("Microsoft.XMLHTTP");
}
else
{
request_type = new XMLHttpRequest();
}
return request_type;
}
var http = createObject();
//INSERT
function insert()
{
var Faculty2 = encodeURIComponent(document.getElementById("Faculty").value);
var Major2 = encodeURIComponent(document.getElementById("Major").value);
var Professor2 = encodeURIComponent(document.getElementById("Professor").value);
var Lastname2 = encodeURIComponent(document.getElementById("Lastname").value);
var Course2 = encodeURIComponent(document.getElementById("Course").value);
var Comments2 = encodeURIComponent(document.getElementById("Comments").value);
var Grade2 = encodeURIComponent(document.getElementById("Grade").value);
var Redflag2 = encodeURIComponent(document.getElementById("Redflag").value);
var Owner2 = encodeURIComponent(document.getElementById("Owner").value);
//Location and parameters of data about to be sent are defined
//Required: verify that all fields are not empty. Use encode URI() to solve some issues about character encoding.
var params = "Faculty=" + Faculty2 + "&Major=" + Major2 + "&Professor=" + Professor2 + "&Lastname=" + Lastname2 + "&Course=" + Course2 + "&Comments=" + Comments2 + "&Grade=" + Grade2 + "&Redflag=" + Redflag2 + "&Owner=" + Owner2;
var url = "prehp/insert.php";
http.open("POST", url, true);
//Technical information about the data
http.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
http.setRequestHeader("Content-length", params.length);
//Now, we send the data
http.onreadystatechange = function()
{
if(http.readyState == 4)
{ var answer = http.responseText;
document.getElementById('insert_response').innerHTML = "Ready!" + answer;
}
else
{document.getElementById('insert_response').innerHTML = "Error";
}}
http.send(params);
}
PHP code:
$insertAccounts_sql = "INSERT INTO Information (Faculty, Major, Professor, Lastname, Course, Comments, Grade, Redflag, Owner)
VALUES('$_POST[Faculty]','$_POST[Major]','$_POST[Professor]','$_POST[Lastname]','$_POST[Course]','$_POST[Comments]','$_POST[Grade]','$_POST[Redflag]','$_POST[Owner]')";
$dbq = mysql_query($insertAccounts_sql, $dbc);
if($dbq)
{
print "1 record added: Works very well!";
}
else
if(!$dbq)
{
die('Error: ' . mysql_error());
}
$dbk = mysql_close($dbc);
if($dbk)
{
print "Database closed!";
}
else
if(!$dbk)
{
print "Database not closed!";
}
I did that but the value that the database got was the raw input and not the encoded input. I'm running out of ideas, don't know what else to try. Could it be the settings of the database, can the database be decoding the input before storing it? That seems far-fetched to me, but I've been looking at this from all angles and still can't come up with a fresh answer.
PS: Sorry for posting my comments on the answer area, first timer here.
when creating query strings, it has really nothing to do with objects or anything like that. All you want to be sending is key/value pairs. how you construct that is up to you, but it often neater and more manageable to assign your values to variables first. i.e.
var myVar1Value = encodeURIComponent(document.getElementById('variable1').value);
var myVar2Value = encodeURIComponent(document.getElementById('variable2').value);
var url = "http://www.mydomain.com?" + "var1=" + myVar1Value + "&var2=" + myVar2Value;
It's called a query string, so it's just a string. what you do with it on the server side is what makes the 'magic' happen.
edit: If you're having problems with values, then you should print them to console to verify you are getting what you expect.

Resources