Designing a web crawler - data-structures

I have come across the interview question "If you were designing a web crawler, how would you avoid getting into infinite loops?" and I am trying to answer it.
How does it all begin?
Say Google started with some hub pages, hundreds of them (how these hub pages were found in the first place is a different sub-question).
As Google follows links from a page, does it keep a hash table to make sure it doesn't revisit pages it has already seen?
What if the same page has two names (URLs), as happens these days with URL shorteners etc.?
I have taken Google as an example. Google doesn't reveal how its web-crawler algorithms, page ranking, etc. work, but any guesses?

If you want a detailed answer, take a look at section 3.8 of this paper, which describes the URL-seen test of a modern crawler:
In the course of extracting links, any Web crawler will encounter multiple links to the same document. To avoid downloading and processing a document multiple times, a URL-seen test must be performed on each extracted link before adding it to the URL frontier. (An alternative design would be to instead perform the URL-seen test when the URL is removed from the frontier, but this approach would result in a much larger frontier.)
To perform the URL-seen test, we store all of the URLs seen by Mercator in canonical form in a large table called the URL set. Again, there are too many entries for them all to fit in memory, so like the document fingerprint set, the URL set is stored mostly on disk.
To save space, we do not store the textual representation of each URL in the URL set, but rather a fixed-sized checksum. Unlike the fingerprints presented to the content-seen test's document fingerprint set, the stream of URLs tested against the URL set has a non-trivial amount of locality. To reduce the number of operations on the backing disk file, we therefore keep an in-memory cache of popular URLs. The intuition for this cache is that links to some URLs are quite common, so caching the popular ones in memory will lead to a high in-memory hit rate.
In fact, using an in-memory cache of 2^18 entries and the LRU-like clock replacement policy, we achieve an overall hit rate on the in-memory cache of 66.2%, and a hit rate of 9.5% on the table of recently-added URLs, for a net hit rate of 75.7%. Moreover, of the 24.3% of requests that miss in both the cache of popular URLs and the table of recently-added URLs, about 1/3 produce hits on the buffer in our random access file implementation, which also resides in user-space. The net result of all this buffering is that each membership test we perform on the URL set results in an average of 0.16 seek and 0.17 read kernel calls (some fraction of which are served out of the kernel's file system buffers). So, each URL set membership test induces one-sixth as many kernel calls as a membership test on the document fingerprint set. These savings are purely due to the amount of URL locality (i.e., repetition of popular URLs) inherent in the stream of URLs encountered during a crawl.
Basically they hash all of the URLs with a hashing function that produces an effectively unique fingerprint for each URL, and due to the locality of URLs it becomes very fast to look them up. Google has even open-sourced one of its hashing functions: CityHash
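For illustration, here is a minimal Python sketch of that design: canonical URLs are reduced to fixed-size checksums, with an in-memory LRU cache in front of the URL set. SHA-1 and the plain in-memory set standing in for the on-disk table are assumptions made for brevity, not Mercator's actual choices:

import hashlib
from collections import OrderedDict

class UrlSeenTest:
    """Sketch of a Mercator-style URL-seen test: fixed-size checksums for
    canonical URLs, fronted by an LRU cache of popular URLs."""

    def __init__(self, cache_size=2**18):
        self.cache = OrderedDict()   # in-memory LRU cache of popular URLs
        self.cache_size = cache_size
        self.disk_set = set()        # stand-in for the disk-resident URL set

    @staticmethod
    def checksum(canonical_url):
        # store a fixed-size checksum instead of the URL text itself
        return hashlib.sha1(canonical_url.encode("utf-8")).digest()

    def seen(self, canonical_url):
        key = self.checksum(canonical_url)
        if key in self.cache:        # cache hit: refresh its LRU position
            self.cache.move_to_end(key)
            return True
        hit = key in self.disk_set   # cache miss: consult the backing store
        self.disk_set.add(key)
        self.cache[key] = True
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)   # evict the least-recently-used entry
        return hit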
WARNING!
They might also be talking about bot traps!!! A bot trap is a section of a page that keeps generating new links with unique URLs and you will essentially get trapped in an "infinite loop" by following the links that are being served by that page. This is not exactly a loop, because a loop would be the result of visiting the same URL, but it's an infinite chain of URLs which you should avoid crawling.
Update 12/13/2012 - the day after the world was supposed to end :)
Per Fr0zenFyr's comment: if one uses the AOPIC algorithm for selecting pages, then it's fairly easy to avoid bot-traps of the infinite loop kind. Here is a summary of how AOPIC works:
1. Get a set of N seed pages.
2. Allocate X amount of credit to each page, such that each page has X/N credit (i.e. an equal amount of credit) before crawling has started.
3. Select the page P with the highest amount of credit (or, if all pages have the same amount of credit, crawl a random page).
4. Crawl page P (let's say that P had 100 credits when it was crawled).
5. Extract all the links from page P (let's say there are 10 of them).
6. Set the credits of P to 0.
7. Take a 10% "tax" and allocate it to a Lambda page.
8. Allocate an equal amount of credits to each link found on page P out of P's original credit minus the tax: (100 (P's credits) - 10 (10% tax)) / 10 (links) = 9 credits per link.
9. Repeat from step 3.
Since the Lambda page continuously collects tax, eventually it will be the page with the largest amount of credit and we'll have to "crawl" it. I say "crawl" in quotes, because we don't actually make an HTTP request for the Lambda page, we just take its credits and distribute them equally to all of the pages in our database.
Since bot traps only give internal links credits, and they rarely get credit from the outside, they will continually leak credits (from taxation) to the Lambda page. The Lambda page will distribute those credits out to all of the pages in the database evenly, and upon each cycle the bot-trap page will lose more and more credits, until it has so few credits that it almost never gets crawled again. This will not happen with good pages, because they often get credits from backlinks found on other pages. This also results in a dynamic page rank: any time you take a snapshot of your database and order the pages by the amount of credit they have, they will most likely be ordered roughly according to their true page rank.
This only avoids bot traps of the infinite-loop kind, but there are many other bot traps which you should watch out for, and there are ways to get around them too.
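A minimal Python sketch of that credit scheme, assuming a hypothetical fetch_links(url) helper that crawls a page and returns its extracted links (the details of the real AOPIC algorithm are in the paper):

def aopic_crawl(seed_urls, fetch_links, iterations, total_credit=100.0, tax_rate=0.10):
    # fetch_links(url) is a hypothetical stand-in: crawl a page, return its links
    credits = {url: total_credit / len(seed_urls) for url in seed_urls}
    lambda_credit = 0.0
    for _ in range(iterations):
        page = max(credits, key=credits.get)     # pick the highest-credit page
        if lambda_credit > credits[page]:
            # "crawl" the Lambda page: spread its credit evenly over all pages
            share = lambda_credit / len(credits)
            for url in credits:
                credits[url] += share
            lambda_credit = 0.0
            continue
        links = fetch_links(page)                # crawl P and extract its links
        budget = credits[page]
        credits[page] = 0.0                      # set the credits of P to 0
        lambda_credit += tax_rate * budget       # the 10% tax goes to Lambda
        if links:
            share = budget * (1 - tax_rate) / len(links)
            for url in links:                    # split the remainder evenly
                credits[url] = credits.get(url, 0.0) + share
    return credits

Bot-trap pages keep receiving only their own leaked credit, so their share shrinks every cycle, exactly as described above.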

While everybody here has already suggested how to create your web crawler, here is how Google ranks pages.
Google gives each page a rank based on the number of backlinks (how many links on other websites point to a specific website/page). This is called the relevance score. It is based on the fact that if a page has many other pages linking to it, it's probably an important page.
Each site/page is viewed as a node in a graph. Links to other pages are directed edges. The in-degree of a vertex is the number of incoming edges. Nodes with a higher number of incoming edges are ranked higher.
Here's how the PageRank is determined. Suppose that page Pj has Lj links. If one of those links is to page Pi, then Pj will pass on 1/Lj of its importance to Pi. The importance ranking of Pi is then the sum of all the contributions made by pages linking to it. So if we denote the set of pages linking to Pi by Bi, then we have this formula:
Importance(Pi) = sum( Importance(Pj)/Lj ) over all pages Pj in Bi
The ranks are placed in a matrix called the hyperlink matrix: H[i,j]
An entry in this matrix is either 0, or 1/Lj if there is a link from page Pj to page Pi. Another property of this matrix is that if we sum all the entries in a column we get 1.
Now we need to find a vector I (the eigenvector of H with eigenvalue 1) such that:
I = H*I
We find it by iterating: H*I, H^2*I, H^3*I, ..., H^k*I until the solution converges, i.e. we get pretty much the same numbers in the vector at steps k and k+1.
Now whatever is left in the I vector is the importance of each page.
For a simple class homework example see http://www.math.cornell.edu/~mec/Winter2009/RalucaRemus/Lecture3/lecture3.html
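To make the iteration concrete, here is a small Python sketch of the power iteration; the 3-page matrix is a made-up example, and H is assumed to be column-stochastic as described above:

import numpy as np

def pagerank(H, tol=1e-9, max_iter=1000):
    """Power iteration on a column-stochastic hyperlink matrix H, where
    H[i][j] = 1/L_j if page j links to page i, else 0. Returns the
    importance vector I satisfying I = H @ I (eigenvalue 1)."""
    n = H.shape[0]
    I = np.full(n, 1.0 / n)          # start from a uniform vector
    for _ in range(max_iter):
        I_next = H @ I               # one iteration step: I <- H*I
        if np.abs(I_next - I).sum() < tol:
            return I_next            # converged: I barely changes any more
        I = I_next
    return I

# Tiny example: page 0 links to 1; page 1 links to 0 and 2; page 2 links to 0.
H = np.array([[0.0, 0.5, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.5, 0.0]])
print(pagerank(H))   # pages 0 and 1 converge to 0.4 each, page 2 to 0.2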
As for solving the duplicate issue in your interview question: do a checksum of the entire page and use either that or a hash of the checksum as your key in a map to keep track of visited pages.

Depends on how deep their question was intended to be. If they were just trying to avoid following the same links back and forth, then hashing the URLs would be sufficient.
What about content where literally thousands of URLs lead to the same page? Like a query-string parameter that doesn't affect anything but can have an infinite number of variations. I suppose you could hash the contents of the page as well and compare URLs to catch content that is served under multiple URLs. See, for example, the bot traps mentioned in @Lirik's post.
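A quick sketch of that idea in Python: fingerprint the page body, and treat any URL whose content hashes to an already-seen digest as a duplicate. This only catches exact duplicates; near-duplicate content needs fuzzier fingerprints like those discussed in a later answer:

import hashlib

seen_content = {}   # content fingerprint -> first URL seen with that content

def is_duplicate(url, page_bytes):
    """Exact-duplicate check: two URLs serving byte-identical content
    collide on the same digest, so only the first is treated as new."""
    digest = hashlib.sha256(page_bytes).hexdigest()
    if digest in seen_content:
        return True, seen_content[digest]
    seen_content[digest] = url
    return False, None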

You'd have to have some sort of hash table to store the results in; you'd just have to check it before each page load.

The problem here is not crawling duplicated URLs, which is solved with an index using a hash obtained from the URLs. The problem is crawling DUPLICATED CONTENT. Each URL of a "crawler trap" is different (year, day, SessionID...).
There is no "perfect" solution... but you can use some of these strategies:
• Keep a field for the level at which the URL sits inside the website. For each cycle of getting URLs from a page, increase the level. It will be like a tree. You can stop crawling at a certain level, like 10 (I think Google uses this).
• You can try to create a kind of HASH which can be compared to find similar documents, since you can't compare against every document in your database. There is SimHash from Google, but I could not find any implementation to use, so I created my own. My hash counts low- and high-frequency characters inside the HTML code and generates a 20-byte hash, which is compared against a small cache of recently crawled pages inside an AVL tree with a nearest-neighbor search with some tolerance (about 2). You can't use any reference to character locations in this hash. After "recognizing" the trap, you can record the URL pattern of the duplicate content and start ignoring pages with that too.
• Like Google, you can create a ranking for each website and "trust" one more than others.
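For reference, the SimHash mentioned above can be sketched in a few lines of Python. This is a generic word-level simhash, not the author's character-frequency hash; documents whose fingerprints differ in only a few bits are treated as near-duplicates:

import hashlib

def simhash(text, bits=64):
    """Classic simhash over word features: similar documents produce
    fingerprints that differ in only a few bit positions."""
    v = [0] * bits
    for word in text.split():
        h = int.from_bytes(hashlib.md5(word.encode()).digest()[:8], "big")
        for i in range(bits):
            v[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if v[i] > 0)

def hamming(a, b):
    # number of differing bits between two fingerprints
    return bin(a ^ b).count("1")

# Pages whose fingerprints differ by at most a small tolerance (e.g. 3 bits)
# are treated as near-duplicates, much like the tolerance-2 search above.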

A web crawler is a program that collects key values (HREF links, image links, metadata, etc.) starting from a given website URL. It is designed to follow the HREF links it has already fetched from previous URLs, and in this way the crawler can jump from one website to other websites. It is usually called a web spider or web bot. This mechanism always acts as the backbone of a web search engine.
Please find the source code from my tech blog - http://www.algonuts.info/how-to-built-a-simple-web-crawler-in-php.html
<?php
class webCrawler
{
public $siteURL;
public $error;
public $linkBuffer;
function __construct()
{
$this->siteURL = "";
$this->error = "";
// initialize the result buckets so the in_array() checks in parser() work
$this->linkBuffer = array("domain" => array(), "image" => array(), "document" => array(), "unknown" => array());
}
function parser()
{
global $hrefTag,$hrefTagCountStart,$hrefTagCountFinal,$hrefTagLengthStart,$hrefTagLengthFinal,$hrefTagPointer;
global $imgTag,$imgTagCountStart,$imgTagCountFinal,$imgTagLengthStart,$imgTagLengthFinal,$imgTagPointer;
global $Url_Extensions,$Document_Extensions,$Image_Extensions,$crawlOptions;
$dotCount = 0;
$slashCount = 0;
$singleSlashCount = 0;
$doubleSlashCount = 0;
$parentDirectoryCount = 0;
// initialize the URL accumulators used by the href/image filters below
$hrefURL = "";
$imgURL = "";
if(($url = trim($this->siteURL)) != "")
{
$crawlURL = rtrim($url,"/");
if(($directoryURL = dirname($crawlURL)) == "http:")
{ $directoryURL = $crawlURL; }
$urlParser = preg_split("/\//",$crawlURL);
//-- Curl Start --
$curlObject = curl_init($crawlURL);
curl_setopt_array($curlObject,$crawlOptions);
$webPageContent = curl_exec($curlObject);
$errorNumber = curl_errno($curlObject);
curl_close($curlObject);
//-- Curl End --
if($errorNumber == 0)
{
$webPageCounter = 0;
$webPageLength = strlen($webPageContent);
while($webPageCounter < $webPageLength)
{
$character = $webPageContent[$webPageCounter];
if($character == "")
{
$webPageCounter++;
continue;
}
$character = strtolower($character);
//-- Href Filter Start --
if($hrefTagPointer[$hrefTagLengthStart] == $character)
{
$hrefTagLengthStart++;
if($hrefTagLengthStart == $hrefTagLengthFinal)
{
$hrefTagCountStart++;
if($hrefTagCountStart == $hrefTagCountFinal)
{
if($hrefURL != "")
{
if($parentDirectoryCount >= 1 || $singleSlashCount >= 1 || $doubleSlashCount >= 1)
{
if($doubleSlashCount >= 1)
{ $hrefURL = "http://".$hrefURL; }
else if($parentDirectoryCount >= 1)
{
$tempData = 0;
$tempString = "";
$tempTotal = count($urlParser) - $parentDirectoryCount;
while($tempData < $tempTotal)
{
$tempString .= $urlParser[$tempData]."/";
$tempData++;
}
$hrefURL = $tempString."".$hrefURL;
}
else if($singleSlashCount >= 1)
{ $hrefURL = $urlParser[0]."/".$urlParser[1]."/".$urlParser[2]."/".$hrefURL; }
}
$host = "";
$hrefURL = urldecode($hrefURL);
$hrefURL = rtrim($hrefURL,"/");
if(filter_var($hrefURL,FILTER_VALIDATE_URL) == true)
{
$dump = parse_url($hrefURL);
if(isset($dump["host"]))
{ $host = trim(strtolower($dump["host"])); }
}
else
{
$hrefURL = $directoryURL."/".$hrefURL;
if(filter_var($hrefURL,FILTER_VALIDATE_URL) == true)
{
$dump = parse_url($hrefURL);
if(isset($dump["host"]))
{ $host = trim(strtolower($dump["host"])); }
}
}
if($host != "")
{
$extension = pathinfo($hrefURL,PATHINFO_EXTENSION);
if($extension != "")
{
$tempBuffer ="";
$extensionlength = strlen($extension);
for($tempData = 0; $tempData < $extensionlength; $tempData++)
{
if($extension[$tempData] != "?")
{
$tempBuffer = $tempBuffer.$extension[$tempData];
continue;
}
else
{
$extension = trim($tempBuffer);
break;
}
}
if(in_array($extension,$Url_Extensions))
{ $type = "domain"; }
else if(in_array($extension,$Image_Extensions))
{ $type = "image"; }
else if(in_array($extension,$Document_Extensions))
{ $type = "document"; }
else
{ $type = "unknown"; }
}
else
{ $type = "domain"; }
if($hrefURL != "")
{
if($type == "domain" && !in_array($hrefURL,$this->linkBuffer["domain"]))
{ $this->linkBuffer["domain"][] = $hrefURL; }
if($type == "image" && !in_array($hrefURL,$this->linkBuffer["image"]))
{ $this->linkBuffer["image"][] = $hrefURL; }
if($type == "document" && !in_array($hrefURL,$this->linkBuffer["document"]))
{ $this->linkBuffer["document"][] = $hrefURL; }
if($type == "unknown" && !in_array($hrefURL,$this->linkBuffer["unknown"]))
{ $this->linkBuffer["unknown"][] = $hrefURL; }
}
}
}
$hrefTagCountStart = 0;
}
if($hrefTagCountStart == 3)
{
$hrefURL = "";
$dotCount = 0;
$slashCount = 0;
$singleSlashCount = 0;
$doubleSlashCount = 0;
$parentDirectoryCount = 0;
$webPageCounter++;
while($webPageCounter < $webPageLength)
{
$character = $webPageContent[$webPageCounter];
if($character == "")
{
$webPageCounter++;
continue;
}
if($character == "\"" || $character == "'")
{
$webPageCounter++;
while($webPageCounter < $webPageLength)
{
$character = $webPageContent[$webPageCounter];
if($character == "")
{
$webPageCounter++;
continue;
}
if($character == "\"" || $character == "'" || $character == "#")
{
$webPageCounter--;
break;
}
else if($hrefURL != "")
{ $hrefURL .= $character; }
else if($character == "." || $character == "/")
{
if($character == ".")
{
$dotCount++;
$slashCount = 0;
}
else if($character == "/")
{
$slashCount++;
if($dotCount == 2 && $slashCount == 1)
$parentDirectoryCount++;
else if($dotCount == 0 && $slashCount == 1)
$singleSlashCount++;
else if($dotCount == 0 && $slashCount == 2)
$doubleSlashCount++;
$dotCount = 0;
}
}
else
{ $hrefURL .= $character; }
$webPageCounter++;
}
break;
}
$webPageCounter++;
}
}
$hrefTagLengthStart = 0;
$hrefTagLengthFinal = strlen($hrefTag[$hrefTagCountStart]);
$hrefTagPointer =& $hrefTag[$hrefTagCountStart];
}
}
else
{ $hrefTagLengthStart = 0; }
//-- Href Filter End --
//-- Image Filter Start --
if($imgTagPointer[$imgTagLengthStart] == $character)
{
$imgTagLengthStart++;
if($imgTagLengthStart == $imgTagLengthFinal)
{
$imgTagCountStart++;
if($imgTagCountStart == $imgTagCountFinal)
{
if($imgURL != "")
{
if($parentDirectoryCount >= 1 || $singleSlashCount >= 1 || $doubleSlashCount >= 1)
{
if($doubleSlashCount >= 1)
{ $imgURL = "http://".$imgURL; }
else if($parentDirectoryCount >= 1)
{
$tempData = 0;
$tempString = "";
$tempTotal = count($urlParser) - $parentDirectoryCount;
while($tempData < $tempTotal)
{
$tempString .= $urlParser[$tempData]."/";
$tempData++;
}
$imgURL = $tempString."".$imgURL;
}
else if($singleSlashCount >= 1)
{ $imgURL = $urlParser[0]."/".$urlParser[1]."/".$urlParser[2]."/".$imgURL; }
}
$host = "";
$imgURL = urldecode($imgURL);
$imgURL = rtrim($imgURL,"/");
if(filter_var($imgURL,FILTER_VALIDATE_URL) == true)
{
$dump = parse_url($imgURL);
$host = trim(strtolower($dump["host"]));
}
else
{
$imgURL = $directoryURL."/".$imgURL;
if(filter_var($imgURL,FILTER_VALIDATE_URL) == true)
{
$dump = parse_url($imgURL);
$host = trim(strtolower($dump["host"]));
}
}
if($host != "")
{
$extension = pathinfo($imgURL,PATHINFO_EXTENSION);
if($extension != "")
{
$tempBuffer ="";
$extensionlength = strlen($extension);
for($tempData = 0; $tempData < $extensionlength; $tempData++)
{
if($extension[$tempData] != "?")
{
$tempBuffer = $tempBuffer.$extension[$tempData];
continue;
}
else
{
$extension = trim($tempBuffer);
break;
}
}
if(in_array($extension,$Url_Extensions))
{ $type = "domain"; }
else if(in_array($extension,$Image_Extensions))
{ $type = "image"; }
else if(in_array($extension,$Document_Extensions))
{ $type = "document"; }
else
{ $type = "unknown"; }
}
else
{ $type = "domain"; }
if($imgURL != "")
{
if($type == "domain" && !in_array($imgURL,$this->linkBuffer["domain"]))
{ $this->linkBuffer["domain"][] = $imgURL; }
if($type == "image" && !in_array($imgURL,$this->linkBuffer["image"]))
{ $this->linkBuffer["image"][] = $imgURL; }
if($type == "document" && !in_array($imgURL,$this->linkBuffer["document"]))
{ $this->linkBuffer["document"][] = $imgURL; }
if($type == "unknown" && !in_array($imgURL,$this->linkBuffer["unknown"]))
{ $this->linkBuffer["unknown"][] = $imgURL; }
}
}
}
$imgTagCountStart = 0;
}
if($imgTagCountStart == 3)
{
$imgURL = "";
$dotCount = 0;
$slashCount = 0;
$singleSlashCount = 0;
$doubleSlashCount = 0;
$parentDirectoryCount = 0;
$webPageCounter++;
while($webPageCounter < $webPageLength)
{
$character = $webPageContent[$webPageCounter];
if($character == "")
{
$webPageCounter++;
continue;
}
if($character == "\"" || $character == "'")
{
$webPageCounter++;
while($webPageCounter < $webPageLength)
{
$character = $webPageContent[$webPageCounter];
if($character == "")
{
$webPageCounter++;
continue;
}
if($character == "\"" || $character == "'" || $character == "#")
{
$webPageCounter--;
break;
}
else if($imgURL != "")
{ $imgURL .= $character; }
else if($character == "." || $character == "/")
{
if($character == ".")
{
$dotCount++;
$slashCount = 0;
}
else if($character == "/")
{
$slashCount++;
if($dotCount == 2 && $slashCount == 1)
$parentDirectoryCount++;
else if($dotCount == 0 && $slashCount == 1)
$singleSlashCount++;
else if($dotCount == 0 && $slashCount == 2)
$doubleSlashCount++;
$dotCount = 0;
}
}
else
{ $imgURL .= $character; }
$webPageCounter++;
}
break;
}
$webPageCounter++;
}
}
$imgTagLengthStart = 0;
$imgTagLengthFinal = strlen($imgTag[$imgTagCountStart]);
$imgTagPointer =& $imgTag[$imgTagCountStart];
}
}
else
{ $imgTagLengthStart = 0; }
//-- Image Filter End --
$webPageCounter++;
}
}
else
{ $this->error = "Unable to proceed, permission denied"; }
}
else
{ $this->error = "Please enter url"; }
if($this->error != "")
{ $this->linkBuffer["error"] = $this->error; }
return $this->linkBuffer;
}
}
?>

Well, the web is basically a directed graph, so you can construct a graph out of the URLs and then do a BFS or DFS traversal while marking the visited nodes so you don't visit the same page twice.
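A minimal Python sketch of that traversal, with a hypothetical get_links(url) standing in for fetching and parsing a page; the visited set is what prevents infinite loops on cycles:

from collections import deque

def crawl_bfs(seed, get_links):
    """BFS over the web graph with a visited set, so no URL is fetched twice."""
    visited = {seed}
    frontier = deque([seed])
    while frontier:
        url = frontier.popleft()
        for link in get_links(url):
            if link not in visited:   # the visited check breaks cycles
                visited.add(link)
                frontier.append(link)
    return visited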

This is a web crawler example which can be used to collect MAC addresses for MAC spoofing.
#!/usr/bin/env python
import os
from urllib.request import urlopen
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def mac_addr_str(f_data):
    global fptr
    global mac_list
    word_array = f_data.split(" ")
    for word in word_array:
        # a MAC address is 17 characters with ':' at positions 2, 5, 8, 11 and 14
        if len(word) == 17 and ':' in word[2] and ':' in word[5] and ':' in word[8] and ':' in word[11] and ':' in word[14]:
            if word not in mac_list:
                mac_list.append(word)
                fptr.writelines(word + "\n")
                print(word)

url = "http://stackoverflow.com/questions/tagged/mac-address"
url_list = [url]
visited = [url]
pwd = os.getcwd() + "/internet_mac.txt"
fptr = open(pwd, "a")
mac_list = []

while len(url_list) > 0:
    try:
        htmltext = urlopen(url_list[0]).read().decode("utf-8", "ignore")
    except Exception:
        url_list.pop(0)  # skip URLs that fail to load
        continue
    mac_addr_str(htmltext)
    soup = BeautifulSoup(htmltext, "html.parser")
    url_list.pop(0)
    # stay on the same site and never queue a URL twice
    for tag in soup.findAll('a', href=True):
        tag['href'] = urljoin(url, tag['href'])
        if url in tag['href'] and tag['href'] not in visited:
            url_list.append(tag['href'])
            visited.append(tag['href'])
Change the url to crawl more sites... good luck!

Related

Nested for loop exits after 22 of 27 items, running on Node server side

[Modified]
A little help here would be appreciated.
I ran into a problem with a for loop. I have one for loop nested in another. The outer loop has only a few elements, while the nested one contains 27 items.
The problem is that the setup only runs through 22 items of the nested array before exiting.
The setup runs on the server side in Node.js. The client is React, which communicates with the server through a websocket. From the console report, Node.js runs the code 3 times, and each time only 22 of the expected 27 items are looped.
Join the Repl for the full Node.js directory:
https://replit.com/join/uabhesdiif-emexrevolarter
The code is below. I appreciate any help. Thank you.
VerifyResult.js
let data = {};
let error = [];
let feedback = [];
if(typeof dataJson == 'string') {
data = JSON.parse(dataJson)
} else {
data = dataJson;
}
try {
// get variables
const getAvatarState = data.settings.isAvatar;
const getIdState = data.settings.isAdminId;
const getStudents = data.students[0]['Students Data'];
const getClassInfo = data.students[0]['Class Info'][0];
const getAvatars = data.images;
const getLogo = data.logo;
const getFilenames = data.filenames;
const getFirstTermState = data.settings.isFirstTerm;
const getSecondTermState = data.settings.isSecondTerm;
const getThirdTermState = data.settings.isThirdTerm;
const getSubjectList = data.subjects;
let getFirstTerm = [];
let getSecondTerm =[];
let getThirdTerm = [];
// verify #1: if avatars are permitted & same number of images included
if(getAvatarState == '1') {
if(getStudents.length > getAvatars.length) {
feedback.push(`${getAvatars.length} Images supplied for Avatar is lesser than the number of ${getStudents.length} students`);
feedback.push('Some Results may not include Avatar')
}
if(getStudents.length < getAvatars.length) {
feedback.push(`${getAvatars.length} Images supplied for Avatar is more than the number of ${getStudents.length} students`);
feedback.push('This may not pose any problem except possible mismatch')
}
} else {
feedback.push('Avatar will not be included in any result')
}
// verify #2: if logo is included
if(getLogo == null) {
feedback.push('No image was added as logo, and hence, will not be included')
}
// verify #3: if empty value for Admin Numbers, Names, in Students file
let idList = [];
let countItem = 0;
let innerCount = 0;
for(let k = 0; k < getStudents.length; k++) {
let d = getStudents[k];
countItem = k + 1;
let fName, lName;
if(getIdState == '1') {
if(d.Admin_Number != undefined && d.Admin_Number != '') {
idList.push(d.Admin_Number);
} else {
error.push(`Admin Number for Student at row "${countItem}" is empty, in Students workbook`)
}
}
if(d.Firstname != undefined && d.Firstname != '') {
fName = d.Firstname
} else {
error.push(`Firstname for Student at row "${countItem}" is empty, in Students workbook`)
}
if(d.Surname != undefined && d.Surname != '') {
lName = d.Surname
} else {
error.push(`Surname for Student at row "${countItem}" is empty, in Students workbook`)
}
if(fName != undefined && lName != undefined) {
(getIdState == '0') && idList.push(`${lName}/${fName}`);
}
};
// verify #4: if IDs for Students file is unique
if(!IsArrayUnique(idList)) {
(getIdState == '1') && error.push(`One or more Student Admin Numbers are not unique, in Students file`);
(getIdState == '0') && error.push(`One or more Student names are not unique, in Students file`);
}
// verify #5: if empty value for Class Info, in Students file
if(getClassInfo.Class_Name == undefined) {
error.push('Class Name of Class Info worksheet is empty, in Students file')
}
if(getClassInfo.School == undefined) {
error.push('School Name of Class Info worksheet is empty, in Students file')
}
if(getClassInfo.Session == undefined) {
error.push('Session of Class Info worksheet is empty, in Students file')
}
if(getClassInfo.Times_School_Opened == undefined) {
error.push('No of Times School Opened of Class Info worksheet is empty, in Students file')
}
countItem = 0;
for(let i = 0; i < getSubjectList.length; i++) {
let sub = getSubjectList[i];
let gFile = getFilenames[i];
countItem = i + 1;
let innError = false;
let list = [];
let subj, fName, lName, sName, termName;
if(getFirstTermState == '1') {
// verify #6: if empty value in Subject files for First Term
innError = false;
termName = 'First Term';
subj = sub['First Term'];
if(sub['Class Info'][0].Class_Name == undefined) {
innError = true;
error.push(`Class Name in ${gFile} is empty`)
}
if(sub['Class Info'][0].Subject == undefined) {
innError = true;
error.push(`Subject Name in ${gFile} is empty`)
}
if(subj < 1) {
innError = true;
error.push(`Subject scores for ${termName} cannot be found in ${gFile}`)
}
if(!innError) {
console.log('==== STUDENTS TOTAL: ', subj.length);
innerCount = 0;
for(let j = 0; j < subj.length; j++) {
let item = subj[j];
innerCount = j + 1;
if(item.Surname == undefined) {
innError = true;
error.push(`Surname for a Student at row ${innerCount} is empty, for ${termName} in ${gFile}`)
}
if(item.Firstname == undefined) {
innError = true;
error.push(`Firstname for a Student at row ${innerCount} is empty, for ${termName} in ${gFile}`)
}
if(item.Admin_Number == undefined && getIdState == '1') {
innError = true;
error.push(`Admin Number for a Student at row ${innerCount} is empty, for ${termName} in ${gFile}`)
}
if(!innError) {
fName = item.Firstname;
lName = item.Surname;
if(item.CA == undefined && !innError) {
innError = true;
error.push(`CA for "${lName} ${fName}" at row ${innerCount} is empty, for ${termName} in ${gFile}`)
}
if(item.Exam == undefined && !innError) {
innError = true;
error.push(`Exam for "${lName} ${fName}" at row ${innerCount} is empty, for ${termName} in ${gFile}`)
}
}
// verify #7: if both CA & Exam are numbers for First Term
if(!innError) {
console.log('====== TOTAL READ: ', innerCount + ' | ' + countItem);
console.log('====== IS CA A NUMBER?: ', item.CA + ' | ' + IsNumber(item.CA));
console.log('====== IS Exam A NUMBER?: ', item.Exam + ' | ' + IsNumber(item.Exam));
if(!IsNumber(item.CA)) {
innError = true;
error.push(`CA for "${lName} ${fName}" at row ${innerCount} is not a number (${item.CA}), for ${termName} in ${gFile}`)
}
if(!IsNumber(item.Exam)) {
innError = true;
error.push(`Exam for "${lName} ${fName}" at row ${innerCount} is not a number (${item.Exam}), for ${termName} in ${gFile}`)
}
}
// verify #8: if Total is > 45 & < 50
if(!innError) {
const itemTotal = Number(item.CA) + Number(item.Exam);
if(itemTotal > 45 && itemTotal < 50) {
innError = true;
error.push(`The CA (CA: ${item.CA}; Exam: ${item.Exam}; Total: ${itemTotal}) for "${lName} ${fName}" at row ${innerCount}, should be upgraded to 50, for ${termName} in ${gFile}`)
}
}
if(!innError) {
if(getIdState == '1'){
sName = item.Admin_Number;
list.push(sName);
} else {
sName = lName + '/' + fName;
list.push(`${sName}`);
}
// verify #9: match IDs as Admin Number or Names for First Term
if(idList.indexOf(sName) === -1) {
innError = true;
(getIdState == '1') && error.push(`Student Admin Number "${sName}" is mismatched, for ${termName} in ${gFile}`);
(getIdState == '0') && error.push(`Student name "${lName} ${fName}" is mismatched, for ${termName} in ${gFile}`);
}
}
}
// verify #10: if unique IDs for First Term
if(!innError) {
if(idList.length == list.length) {
if(!IsArrayUnique(list)){
innError = true;
(getIdState == '1') && error.push(`One or more Student Admin Numbers are not unique, for ${termName} in ${gFile}`)
(getIdState == '0') && error.push(`One or more Student names are not unique, for ${termName} in ${gFile}`);
}
} else {
innError = true;
error.push(`Found ${list.length} Students against expected ${idList.length} number of Students, for ${termName} in ${gFile}`)
}
}
}
}
}
// verify #3: match admin Id is set as active, else match students names
} catch (err) {
error.push(err)
}
const res = {
data: data,
error: error,
feedback: feedback
}
data.json
{"settings":{"isAdminId":"1","isFirstTerm":"1","isSecondTerm":"0","isThirdTerm":"0","isAvatar":"1","isVerify":"1"},"subjects":[{"First Term":[{"Surname":"ADEOLA","Firstname":"TOLUWALASE","CA":38,"Exam":56,"Admin_Number":"AND/J/0271"},{"Surname":"ADEMOLA","Firstname":"SAMUEL","CA":35,"Exam":55,"Admin_Number":"ADM/J/0273"},{"Surname":"AGBONIRO","Firstname":"JESSE","CA":40,"Exam":54,"Admin_Number":"ADM/J/0284"},{"Surname":"AJAYI","Firstname":"ABIGAIL","CA":23,"Exam":33,"Admin_Number":"ADM/J/0269"},{"Surname":"AKANDE","Firstname":"MICHAEL","CA":29,"Exam":41,"Admin_Number":"ADM/J/0267"},{"Surname":"AKINTOKUN","Firstname":"MOFE","CA":27,"Exam":47,"Admin_Number":"ADM/J/0272"},{"Surname":"ARTHUR","Firstname":"DESTINY","CA":33,"Exam":55,"Admin_Number":"ADM/J/0266"},{"Surname":"AYANWENU","Firstname":"AKOREDE","CA":22,"Exam":35,"Admin_Number":"ADM/J/0270"},{"Surname":"BETHEL","Firstname":"MOYINULUWA","CA":40,"Exam":58,"Admin_Number":"ADM/J/0274"},{"Surname":"CHIJIOKE","Firstname":"JOY","CA":26,"Exam":44,"Admin_Number":"ADM/J/0276"},{"Surname":"CHINWUBA","Firstname":"CHARLES","CA":33,"Exam":40,"Admin_Number":"ADM/J/0275"},{"Surname":"DAVID","Firstname":"BEST","CA":40,"Exam":59,"Admin_Number":"ADM/J/0277"},{"Surname":"DUYILE","Firstname":"IFEOLUWA","CA":29,"Exam":41,"Admin_Number":"ADM/J/0256"},{"Surname":"EZEADILI","Firstname":"CHIBUNDU","CA":35,"Exam":49,"Admin_Number":"ADM/J/0278"},{"Surname":"FAGITE","Firstname":"TOBILOBA","CA":40,"Exam":57,"Admin_Number":"ADM/J/0279"},{"Surname":"IDOGEN","Firstname":"EHIZOFUA","CA":31,"Exam":51,"Admin_Number":"ADM/J/0282"},{"Surname":"IFADA","Firstname":"HOSSONA","CA":33,"Exam":50,"Admin_Number":"ADM/J/0255"},{"Surname":"IHEDIOHA","Firstname":"DELIGHT","CA":34,"Exam":53,"Admin_Number":"ADM/J/0280"},{"Surname":"JOSIAH","Firstname":"MATTHEW","CA":29,"Exam":49,"Admin_Number":"ADM/J/0283"},{"Surname":"MBONU","Firstname":"MMESOMA","CA":32,"Exam":45,"Admin_Number":"ADM/J/0285"},{"Surname":"NWONU","Firstname":"DAVID","CA":27,"Exam":41,"Admin_Number":"ADM/J/0260"},{"Surname":"OBI","Firstname":"LUCIA","CA":20,"Exam":26,"Admin_Number":"ADM/J/0289"},{"Surname":"ODOZOR","Firstname":"AUSTIN","CA":19,"Exam":29,"Admin_Number":"ADM/J/0249"},{"Surname":"OLAWALE","Firstname":"MOYINOLUWA","CA":29,"Admin_Number":"ADM/J/0290"},{"Surname":"OKERENITE","Firstname":"SPLENDID","Exam":53,"Admin_Number":"ADM/J/0287"},{"Surname":"OKUSHI","Firstname":"DANIEL","CA":32,"Exam":"x","Admin_Number":"ADM/J/0288"},{"Surname":"DABRINZE","Firstname":"POSSIBLE","CA":"y","Exam":40,"Admin_Number":"ADM/J/0250"}],"Second Term":[],"Third Term":[],"Class Info":[{"Class_Name":"JS2 Goodness","Subject":"BST (Information Tech.)"}],"ID":[{"Id":"Subjects","Start_Row":3}]}],"filenames":["subject_ict.xlsx"],"students":[{"Students Data":[{"Surname":"ADEOLA","Firstname":"TOLUWALASE","House":"Green","Admin_Number":"AND/J/0271","DOB":"26 FEB 2010","Sex":"M","Times_Present":126,"Override_Teacher_Comment":"very 
naughty","Override_Skills_Musical":3,"Override_Skills_Painting":4,"Override_Skills_Craft":3,"Override_Skills_Tools":3,"Override_Skills_Fluency":4,"Override_Sports_Indoor":4,"Override_Sports_Ball":4,"Override_Sports_Combative":3,"Override_Sports_Track":3,"Override_Sports_Gymnastics":4,"Override_Curricular_Jets":5,"Override_Curricular_Farmers":3,"Override_Curricular_Debating":4,"Override_Curricular_Homemaker":5,"Override_Curricular_Drama":5,"Override_Curricular_Voluntary":4,"Override_Curricular_Others":3,"Override_Behaviour_Reliability":5,"Override_Behaviour_Neatness":5,"Override_Behaviour_Politeness":5,"Override_Behaviour_Honesty":5,"Override_Behaviour_Creativity":4,"Override_Behaviour_Leadership":5,"Override_Behaviour_Spirituality":5,"Override_Behaviour_Cooporation":5},{"Surname":"ADEMOLA","Firstname":"SAMUEL","House":"Green","Admin_Number":"ADM/J/0273","DOB":"27 MAY 2009","Sex":"M","Times_Present":126},{"Surname":"AGBONIRO","Firstname":"JESSE","House":"Red","Admin_Number":"ADM/J/0284","DOB":"9 SEPT 2010","Sex":"M","Times_Present":126},{"Surname":"AJAYI","Firstname":"ABIGAIL","House":"Green","Admin_Number":"ADM/J/0269","DOB":"JUNE 2010","Sex":"F","Times_Present":120},{"Surname":"AKANDE","Firstname":"MICHAEL","House":"Red","Admin_Number":"ADM/J/0267","DOB":"29 DEC 2010","Sex":"M","Times_Present":126},{"Surname":"AKINTOKUN","Firstname":"MOFE","House":"Yellow","Admin_Number":"ADM/J/0272","DOB":"21 JULY 2010","Sex":"M","Times_Present":126},{"Surname":"ARTHUR","Firstname":"DESTINY","House":"Green","Admin_Number":"ADM/J/0266","DOB":"23 AUG 2009","Sex":"M","Times_Present":126},{"Surname":"AYANWENU","Firstname":"AKOREDE","House":"Blue","Admin_Number":"ADM/J/0270","DOB":"17 MAY 2010","Sex":"F","Times_Present":126},{"Surname":"BETHEL","Firstname":"MOYINULUWA","House":"Blue","Admin_Number":"ADM/J/0274","DOB":"13 MAR 2010","Sex":"F","Times_Present":126},{"Surname":"CHIJIOKE","Firstname":"JOY","House":"Yellow","Admin_Number":"ADM/J/0276","DOB":"27 FEB 2010","Sex":"F","Times_Present":126},{"Surname":"CHINWUBA","Firstname":"CHARLES","House":"Yellow","Admin_Number":"ADM/J/0275","DOB":"15 MAY 2010","Sex":"M","Times_Present":122},{"Surname":"DAVID","Firstname":"BEST","House":"Yellow","Admin_Number":"ADM/J/0277","DOB":"3 MAR 2010","Sex":"F","Times_Present":126},{"Surname":"DUYILE","Firstname":"IFEOLUWA","House":"Green","Admin_Number":"ADM/J/0256","DOB":"8 JAN 2010","Sex":"M","Times_Present":126},{"Surname":"EZEADILI","Firstname":"CHIBUNDU","House":"Yellow","Admin_Number":"ADM/J/0278","DOB":"14 FEB 2010","Sex":"M","Times_Present":126},{"Surname":"FAGITE","Firstname":"TOBILOBA","House":"Blue","Admin_Number":"ADM/J/0279","DOB":"26 AUG 2010","Sex":"M","Times_Present":126},{"Surname":"IDOGEN","Firstname":"EHIZOFUA","House":"Blue","Admin_Number":"ADM/J/0282","DOB":"6 JUNE 2010","Sex":"F","Times_Present":126},{"Surname":"IFADA","Firstname":"HOSSONA","House":"Red","Admin_Number":"ADM/J/0255","DOB":"12 OCT 2010","Sex":"M","Times_Present":126},{"Surname":"IHEDIOHA","Firstname":"DELIGHT","House":"Green","Admin_Number":"ADM/J/0280","DOB":"4 NOV 2010","Sex":"M","Times_Present":126},{"Surname":"JOSIAH","Firstname":"MATTHEW","House":"Red","Admin_Number":"ADM/J/0283","DOB":"13 AUG 2010","Sex":"F","Times_Present":126},{"Surname":"MBONU","Firstname":"MMESOMA","House":"Green","Admin_Number":"ADM/J/0285","DOB":"17 MAY 2009","Sex":"F","Times_Present":126},{"Surname":"NWONU","Firstname":"DAVID","House":"Yellow","Admin_Number":"ADM/J/0260","DOB":"29 AUG 
2010","Sex":"M","Times_Present":124},{"Surname":"OBI","Firstname":"LUCIA","House":"Red","Admin_Number":"ADM/J/0289","DOB":"29 DEC 2010","Sex":"F","Times_Present":126},{"Surname":"ODOZOR","Firstname":"AUSTIN","House":"Green","Admin_Number":"ADM/J/0249","DOB":"9 AUG 2008","Sex":"M","Times_Present":126},{"Surname":"OLAWALE","Firstname":"MOYINOLUWA","House":"Red","Admin_Number":"ADM/J/0290","DOB":"19 SEP 2010","Sex":"F","Times_Present":108},{"Surname":"OKERENITE","Firstname":"SPLENDID","House":"Green","Admin_Number":"ADM/J/0287","DOB":"14 DEC 2009","Sex":"M","Times_Present":126},{"Surname":"OKUSHI","Firstname":"DANIEL","House":"Blue","Admin_Number":"ADM/J/0288","DOB":"29 SEP 2010","Sex":"M","Times_Present":126},{"Surname":"DABRINZE","Firstname":"POSSIBLE","House":"Yellow","Admin_Number":"ADM/J/0250","DOB":"20 JUN 2010","Sex":"M","Times_Present":126}],"Class Info":[{"Class_Name":"JS2 Goodness","School":"Junior Secondary","Session":"2021/2022","Times_School_Opened":126,"Next_Term_Begins":"4th January, 2022"}],"Settings":[{"Id_By_Admin_No":"Yes","First_Term_Active":"Yes","Second_Term_Active":"No","Third_Term_Active":"No","Students_Photos":"Yes","Verify":"Yes"}],"ID":[{"Id":"Students","Start_Row":3}]}],"keys":[{"Junior Grade Keys":[{"Grade_Name":"A","Min_Value":70,"Max_Value":100},{"Grade_Name":"B","Min_Value":60,"Max_Value":69.9},{"Grade_Name":"C","Min_Value":50,"Max_Value":59.9},{"Grade_Name":"P","Min_Value":40,"Max_Value":49.9},{"Grade_Name":"F","Min_Value":0,"Max_Value":39.9}],"Senior Grade Keys":[{"Grade_Name":"A","Min_Value":75,"Max_Value":100},{"Grade_Name":"B2","Min_Value":70,"Max_Value":74.9},{"Grade_Name":"B3","Min_Value":65,"Max_Value":69.9},{"Grade_Name":"C4","Min_Value":60,"Max_Value":64.9},{"Grade_Name":"C5","Min_Value":55,"Max_Value":59.9},{"Grade_Name":"C6","Min_Value":50,"Max_Value":54.9},{"Grade_Name":"D7","Min_Value":45,"Max_Value":49.9},{"Grade_Name":"E8","Min_Value":40,"Max_Value":44.9},{"Grade_Name":"F9","Min_Value":0,"Max_Value":39.9}],"Rating Keys":[{"Rating_Name":"Skill","Min_Value":2,"Max_Value":5},{"Rating_Name":"Sports","Min_Value":2,"Max_Value":5},{"Rating_Name":"Curricular","Min_Value":2,"Max_Value":5},{"Rating_Name":"Behaviour","Min_Value":2,"Max_Value":5}],"School Info":[{"School_Name":"SAMUEL ADEGBITE ANGLICAN COLLEGE","Extra_Info":"(A Day-Co-educational Institution Of The Anglican Diocese Of Lagos West)","Address":"St. 
Peters Anglican Church, Ikotun Road, Idumu, Lagos.","Title":"SECONDARY SCHOOL STUDENT'S INTERNAL ACADEMIC REPORT SHEET","Watermark":"SAMUEL ADEGBITE ANGLICAN COLLEGE"},{"Address":null,"Title":null},{"Address":null,"Title":null},{"Address":null,"Title":null}],"ID":[{"Id":"DON'T CHANGE!!!","Start_Row":null},{"Id":"File ID","Start_Row":"Start Row"},{"Id":"Keys","Start_Row":1},{"Id":null}]}],"pool":[{"Principal Comment":[{"Pool":"Good performance, keep it up"},{"Pool":"A wonderful performance, keep it up"},{"Pool":"A highly impressive result, keep it up"},{"Pool":"Good result but you need to improve in your weak area(s)"},{"Pool":"Average result, there is still room for improvement"},{"Pool":"Average result, you can still do better"},{"Pool":"A fair result, you really need to improve in your weak subject(s)"},{"Pool":"Weak performance, you really need to improve in your weak subjects"},{"Pool":"A very good performance, keep it up"},{"Pool":"Beautiful performance, keep the flag flying"},{"Pool":"Weak result, you need to concentrate more on your studies"},{"Pool":"Excellent performance, keep it up"},{"Pool":"A brilliant performance, keep it up"},{"Pool":"Good result, there is room for improvement"},{"Pool":"An average result, you really need to study harder"},{"Pool":"A very good result, weldone."},{"Pool":"A beautiful result, keep it up"},{"Pool":"An impressive result, keep it up"},{"Pool":"A very good performance, keep the flag flying"},{"Pool":"An average result, there is still room for improvement,"},{"Pool":"A fair result, you can still do better."},{"Pool":"A weak result, there is room for you to work harder."},{"Pool":"A very weak performance, you seriously need to work harder."},{"Pool":"Poor performance, your need to work on your weak subjects"},{"Pool":"Excellent performance, keep it up."},{"Pool":null}],"Teacher Comment":[{"Pool":"Always willing to take up responsiblities"},{"Pool":"Very friendly and humorous"},{"Pool":"A highly talented student"},{"Pool":"Very hardworking and conscioustious"},{"Pool":"Friendly but needs to put in more efforts"},{"Pool":"Easy-going and very neat"},{"Pool":"Always punctual to class and all other activities"},{"Pool":"Highly dependable and trustworthy"},{"Pool":"Very reliable and hardworking"},{"Pool":"Active and always willing to take up responsibilities"},{"Pool":"Friendly but needs to pay more attention to his/her studies"},{"Pool":"Friendly but needs to settle down for serious work"},{"Pool":"Easy-going but needs to be more punctual"},{"Pool":"Very studious and always attentive in class"},{"Pool":"Very reliable and willing to assist others"},{"Pool":"Highly cooperative and dependable"},{"Pool":"Friendly but needs to be less playful"},{"Pool":"Friendly but restless, needs to always observe siesta"},{"Pool":"Highly organized and conscientious"},{"Pool":"Friendly and highly willing to learn"},{"Pool":"A consistent and handworking student"},{"Pool":"Very reliable and ready to take to correction"},{"Pool":null},{"Pool":null},{"Pool":null}],"Skills":[{"Pool":"Musical Instrument"},{"Pool":"Drawing and Painting"},{"Pool":"Craft"},{"Pool":"Handling of Workshop Tools"},{"Pool":"Handwriting/Fluency"}],"Sports":[{"Pool":"Indoor Games"},{"Pool":"Ball Games"},{"Pool":"Combative Games"},{"Pool":"Track/Throws"},{"Pool":"Gymnastics"}],"Curricular":[{"Pool":"JETS"},{"Pool":"Young Farmers Club"},{"Pool":"Literary & Debating"},{"Pool":"Home Maker"},{"Pool":"Drama/Culture/Music/Craft"},{"Pool":"Voluntary Service 
Org."},{"Pool":"Others"}],"Behaviour":[{"Pool":"Reliability"},{"Pool":"Neatness"},{"Pool":"Politeness"},{"Pool":"Honesty"},{"Pool":"Initiative/Creativity"},{"Pool":"Leadership Role"},{"Pool":"Level of Spiritual Dev.","Override":3},{"Pool":"Spirit of Co-operation"}],"ID":[{"Id":"DON'T CHANGE!!!","Start_Row":null},{"Id":"File ID","Start_Row":"Start Row"},{"Id":"Pool","Start_Row":1}]}],"logo":null,"images":["stunt toy car.jpg"],"errors":[]}
I tried your code with your data and it is, as expected, iterating over all 27 items. But because some condition isn't met, you are setting innError = true for the 22nd record and never setting it back to false.
So all items after that are treated as if they have an error too (which probably makes you believe they are not iterated). Just try adding a console.log(j) as the first statement in your inner loop, and you will see it print all values from 0 to 26.

InDesign Text Modification Script Skips Content

This InDesign JavaScript iterates over textStyleRanges, converts text set in a few specific appliedFonts, and then assigns a new appliedFont:
var textStyleRanges = [];
for (var j = app.activeDocument.stories.length-1; j >= 0; j--)
    for (var k = app.activeDocument.stories.item(j).textStyleRanges.length-1; k >= 0; k--)
        textStyleRanges.push(app.activeDocument.stories.item(j).textStyleRanges.item(k));

for (var i = textStyleRanges.length-1; i >= 0; i--) {
    var myText = textStyleRanges[i];
    var converted = C2Unic(myText.contents, myText.appliedFont.fontFamily);
    if (myText.contents != converted)
        myText.contents = converted;
    if (myText.appliedFont.fontFamily == 'Chanakya'
        || myText.appliedFont.fontFamily == 'DevLys 010'
        || myText.appliedFont.fontFamily == 'Walkman-Chanakya-905') {
        myText.appliedFont = app.fonts.item("Utsaah");
        myText.composer = "Adobe World-Ready Paragraph Composer";
    }
}
But there are always some ranges where this doesn't happen. I tried iterating in the forward direction and in the backward direction, putting the elements in an array before conversion, and updating the appliedFont in the same iteration and in a different one. Some ranges are still not converted completely.
I am doing this to convert Devanagari text encoded in a glyph-based non-Unicode encoding to Unicode. Some of this involves repositioning vowel signs etc., and changing the code to work with the find/replace mechanism may be possible but is a lot of rework.
What is happening?
See also: http://cssdk.s3-website-us-east-1.amazonaws.com/sdk/1.0/docs/WebHelp/app_notes/indesign_text_frames.htm#Finding_and_changing_text
Sample here: https://www.dropbox.com/sh/7y10i6cyx5m5k3c/AAB74PXtavO5_0dD4_6sNn8ka?dl=0
This is untested since I'm not able to test against your document, but try using getElements() like below:
var doc = app.activeDocument;
var stories = doc.stories;
var textStyleRanges = stories.everyItem().textStyleRanges.everyItem().getElements();
for (var i = textStyleRanges.length-1; i >= 0; i--) {
    var myText = textStyleRanges[i];
    var converted = C2Unic(myText.contents, myText.appliedFont.fontFamily);
    if (myText.contents != converted)
        myText.contents = converted;
    if (myText.appliedFont.fontFamily == 'Chanakya'
        || myText.appliedFont.fontFamily == 'DevLys 010'
        || myText.appliedFont.fontFamily == 'Walkman-Chanakya-905') {
        myText.appliedFont = app.fonts.item("Utsaah");
        myText.composer = "Adobe World-Ready Paragraph Composer";
    }
}
A valid approach is to use hyperlink text sources as they stick to the genuine text object. Then you can edit those source texts even if they were actually moved elsewhere in the flow.
// Main routine
var main = function() {
    // VARS
    var doc = app.properties.activeDocument,
        fgp = app.findGrepPreferences.properties,
        cgp = app.changeGrepPreferences.properties,
        fcgo = app.findChangeGrepOptions.properties,
        text, str,
        found = [], srcs = [], n = 0;
    // Exit if no documents
    if ( !doc ) return;
    app.findChangeGrepOptions = app.findGrepPreferences = app.changeGrepPreferences = null;
    // Set the find/change props
    app.findChangeGrepOptions.properties = {
        includeHiddenLayers: true,
        includeLockedLayersForFind: true,
        includeLockedStoriesForFind: true,
        includeMasterPages: true,
    };
    app.findGrepPreferences.properties = {
        findWhat: "\\w",
    };
    // Find the text instances
    found = doc.findGrep();
    n = found.length;
    // Loop through the instances and add hyperlink text sources;
    // that's all we do at this stage
    while ( n-- ) {
        srcs.push( doc.hyperlinkTextSources.add( found[n] ) );
    }
    // Then we edit the contents of the stored hyperlink text sources' text objects
    n = srcs.length;
    while ( n-- ) {
        text = srcs[n].sourceText;
        str = text.contents;
        text.contents = str+str+str+str;
    }
    // Eventually we remove the added hyperlink text sources
    n = srcs.length;
    while ( n-- ) srcs[n].remove();
    // And reset the initial properties
    app.findGrepPreferences.properties = fgp;
    app.changeGrepPreferences.properties = cgp;
    app.findChangeGrepOptions.properties = fcgo;
}
// Run the script in an easily cancelable mode
var u;
app.doScript( "main()", u, u, UndoModes.ENTIRE_SCRIPT, "The Script" );

Making a game with 4 possible answer buttons

I am new to ActionScript 3 and I am attempting to make a game to learn more.
For every picture that is displayed I want there to be 4 choices (buttons), with only one of them correct. But how can I make it so that the text on the buttons is random?
As you can see, I've made it so the 4th button is always the correct answer. I don't want to write all of this for every picture that is displayed... too much pointless code.
Can anybody help me? If you need extra information I will gladly provide it.
var k:int;
for (k = 1; k <= 3; k++)
{
    GAME.variante.buttonMode = true;
    GAME.variante.addEventListener(MouseEvent.MOUSE_OVER, mouse_over_variante);
    GAME.variante.addEventListener(MouseEvent.MOUSE_OUT, mouse_out_variante);
    GAME.variante.varianta_corecta.addEventListener(MouseEvent.CLICK, variante);
    GAME.variante.varianta_gresita1.addEventListener(MouseEvent.CLICK, variante_gresiteunu);
    GAME.variante.varianta_gresita2.addEventListener(MouseEvent.CLICK, variante_gresitedoi);
    GAME.variante.varianta_gresita3.addEventListener(MouseEvent.CLICK, variante_gresitetrei);
    GAME.varianta1.text = "Cameleon";
    GAME.varianta2.text = "Snake";
    GAME.varianta3.text = "Frog";
    GAME.varianta4.text = "Snail";

    function variante_gresiteunu(e:MouseEvent) {
        if (varianta_gresita_apasata1 == 1) {
            totalScore -= score_variante_gresite;
            GAME.text1.text = totalScore;
            varianta_gresita_apasata1 = 2;
        }
    }
    function variante_gresitedoi(e:MouseEvent) {
        if (varianta_gresita_apasata2 == 1) {
            totalScore -= score_variante_gresite;
            GAME.text1.text = totalScore;
            varianta_gresita_apasata2 = 2;
        }
    }
    function variante_gresitetrei(e:MouseEvent) {
        if (varianta_gresita_apasata3 == 1) {
            totalScore -= score_variante_gresite;
            GAME.text1.text = totalScore;
            varianta_gresita_apasata3 = 2;
        }
    }
}
GAME.extra_points.visible = false;
function variante(e:MouseEvent) {
    if (GAME.stichere.sticker1.currentFrame == 1) {
        GAME.extra_points.visible = true;
        GAME.extra_points.plus_ten1.gotoAndPlay(1);
    }
    // go to the great-job screen
    GAME.greatJob.stars.gotoAndPlay(1);
    GAME.greatJob.visible = true;
}
function mouse_over_variante(e:MouseEvent) {
    trace(e.target.name);
    e.target.gotoAndPlay(1);
}
function mouse_out_variante(e:MouseEvent) {
    e.target.gotoAndStop(1);
}
You'd like to have 4 images and have them be tested, right? The text below the images will be randomized. I saw your code and I confess I was confused, so I made a different one.
I understand that this code is a little different from what you asked, but I think it will give you some new ideas and help you with your app...
// start button added on the screen, named f3toc; I gave it the handler f3roll
f3toc.addEventListener(MouseEvent.CLICK, f3roll);
function f3roll(e:MouseEvent):void {
    // creating variables for the pictures
    var bola:Number;
    var quadrado:Number;
    var pentagono:Number;
    // just a randomization step; change what follows the = to whatever you want to use
    bola = Math.ceil(Math.random() * 10);
    pentagono = Math.ceil(Math.random() * 10);
    f3res1_txt.text = String(bola + 8 + 8);
    f3res2_txt.text = String(bola - 1 + bola);
    f3res3_txt.text = String(pentagono + 10 - bola);
    // converting the numbers to strings so we can compare them with the text fields
    var pentagonotring:String = pentagono.toString();
    var bolastring:String = bola.toString();
    // check when the answer is correct: each wrong answer does nothing, each correct
    // one adds 1 to a counter; when the counter reaches the target, advance the frame
    f3check_bnt.addEventListener(MouseEvent.CLICK, f3check);
    function f3check(e:MouseEvent):void {
        if (f3inp2_txt.text == pentagonotring) {
            f3ver_ext2.text = "Correct";
        } else {
            f3ver_ext2.text = "Wrong";
        }
        if (f3inp1_txt.text == bolastring) {
            f3ver_ext1.text = "Correct";
        } else {
            f3ver_ext1.text = "Wrong";
        }
        // count how many fields are correct
        var pass:int = 0;
        if (f3ver_ext1.text == "Correct") {
            pass++;
        }
        if (f3ver_ext2.text == "Correct") {
            pass++;
            if (pass == 3) {
                nextFrame();
            }
        }
    }
}

How do I shuffle nodes in a linked list?

I just started a project for my Java 2 class and I've come to a complete stop. I just can't get my head around this method, especially since the assignment does NOT let us use any other DATA STRUCTURE or any of Java's shuffle methods.
So I have a Deck class in which I've already created a linked list containing 52 nodes that hold 52 cards.
public class Deck {
    private Node theDeck;
    private int numCards;

    public Deck() {
        while (numCards < 52) {
            theDeck = new Node(new Card(numCards), theDeck);
            numCards++;
        }
    }

    public void shuffleDeck() {
        int rNum;
        int count = 0;
        Node current = theDeck;
        Card tCard;
        while (count != 51) {
            // store whatever is inside the current node in a temp variable
            tCard = current.getItem();
            // generate a random number between 0 and 50
            rNum = (int) (Math.random() * 51);
            // send current on a loop a random amount of times
            for (int i = 0; i < rNum; i++)
                current = current.getNext(); // <-- this is the line where I get my error; I sort of know why, but I don't know how to stop it
            // wherever current landed, get the item stored in that node and store it in the first one
            theDeck.setItem(current.getItem());
            // now use the temp variable from the beginning and store it where current landed
            current.setItem(tCard);
            // send current back to the beginning of the deck
            current = theDeck;
            // counter for another loop I want to do
            count++;
            // advance current "count" times so it doesn't shuffle the cards that have
            // already been shuffled. Not sure about this last loop: if I don't shuffle
            // the already-shuffled cards, will it count as a legitimate shuffle?
            // Also, this is where I sometimes get a NullPointerException.
            for (int i = 0; i < count; i++)
                current = current.getNext();
        }
    }
}
Now I get different kinds of errors when I call this method:
It will sometimes shuffle just 2 cards, but at times it will shuffle 3-5 cards and then give me a NullPointerException. I've pointed out where it throws this error with comments in the code above.
At one point I got it to shuffle 13 cards, but it didn't quite shuffle them the right way: one card kept repeating.
At another point I got all 52 cards through the while loop, but again it repeated one card various times.
So I really need some input on what I'm doing wrong. Towards the end of my code I think my logic is completely wrong, but I can't seem to figure out a way around it.
Seems pretty long-winded.
I'd go with something like the following:
public void shuffleDeck() {
    for (int i = 0; i < 52; i++) {
        int card = (int) (Math.random() * (52 - i));
        deck.addLast(deck.remove(card));
    }
}
So each card just gets moved to the back of the deck in a random order.
If you are allowed to use a secondary data structure, one way is simply to compute a random number within the number of remaining cards, select that card, and move it to the end of the secondary structure until the original list is empty; then replace your list with the secondary list. A sketch follows.
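In language-neutral terms (Python here for brevity, with a plain list standing in for the linked list), that selection shuffle looks something like this:

import random

def shuffle_cards(cards):
    """Shuffle via a secondary structure: repeatedly pick a random
    remaining card and append it to the new list until none are left."""
    remaining = list(cards)
    shuffled = []
    while remaining:
        shuffled.append(remaining.pop(random.randrange(len(remaining))))
    return shuffled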
My implementation shuffles a linked list using a divide-and-conquer algorithm
public class LinkedListShuffle
{
    public static DataStructures.Linear.LinkedListNode<T> Shuffle<T>(DataStructures.Linear.LinkedListNode<T> firstNode) where T : IComparable<T>
    {
        if (firstNode == null)
            throw new ArgumentNullException();
        if (firstNode.Next == null)
            return firstNode;
        // split the list in half, shuffle each half, then merge them randomly
        var middle = GetMiddle(firstNode);
        var rightNode = middle.Next;
        middle.Next = null;
        var mergedResult = ShuffledMerge(Shuffle(firstNode), Shuffle(rightNode));
        return mergedResult;
    }

    private static DataStructures.Linear.LinkedListNode<T> ShuffledMerge<T>(DataStructures.Linear.LinkedListNode<T> leftNode, DataStructures.Linear.LinkedListNode<T> rightNode) where T : IComparable<T>
    {
        var dummyHead = new DataStructures.Linear.LinkedListNode<T>();
        DataStructures.Linear.LinkedListNode<T> curNode = dummyHead;
        var rnd = new Random((int)DateTime.Now.Ticks);
        while (leftNode != null || rightNode != null)
        {
            // flip a coin to decide which half supplies the next node
            var rndRes = rnd.Next(0, 2);
            if (rndRes == 0)
            {
                if (leftNode != null)
                {
                    curNode.Next = leftNode;
                    leftNode = leftNode.Next;
                }
                else
                {
                    curNode.Next = rightNode;
                    rightNode = rightNode.Next;
                }
            }
            else
            {
                if (rightNode != null)
                {
                    curNode.Next = rightNode;
                    rightNode = rightNode.Next;
                }
                else
                {
                    curNode.Next = leftNode;
                    leftNode = leftNode.Next;
                }
            }
            curNode = curNode.Next;
        }
        return dummyHead.Next;
    }

    private static DataStructures.Linear.LinkedListNode<T> GetMiddle<T>(DataStructures.Linear.LinkedListNode<T> firstNode) where T : IComparable<T>
    {
        if (firstNode.Next == null)
            return firstNode;
        // fast/slow pointer walk: when fast reaches the end, slow is at the middle
        DataStructures.Linear.LinkedListNode<T> fast, slow;
        fast = slow = firstNode;
        while (fast.Next != null && fast.Next.Next != null)
        {
            slow = slow.Next;
            fast = fast.Next.Next;
        }
        return slow;
    }
}
Just came across this and decided to post a more concise solution, which allows you to specify how much shuffling you want to do.
For the purposes of the answer, assume you have a linked list containing PlayingCard objects:
LinkedList<PlayingCard> deck = new LinkedList<PlayingCard>();
And to shuffle them, use something like this:
public void shuffle(Integer swaps) {
    for (int i = 0; i < swaps; i++) {
        deck.add(deck.remove((int) (Math.random() * deck.size())));
    }
}
The more swaps you do, the more randomised the list will be.

text box percentage validation in javascript

How can we do validation for percentage numbers in a textbox?
I need to catch invalid input like these examples: 12-3, 33.44a, 44. , a3.56, 123
Thanks in advance,
sri
Add the textbox:
<asp:TextBox ID="PERCENTAGE" runat="server"
    onkeypress="return ispercentage(this, event, true, false);"
    MaxLength="18" size="17"></asp:TextBox>
Copy the function below as-is and paste it inside a <script> tag:
<script type="text/javascript">
function ispercentage(obj, e, allowDecimal, allowNegative)
{
    var key;
    var isCtrl = false;
    var keychar;
    var reg;
    if (window.event)
    {
        key = e.keyCode;
        isCtrl = window.event.ctrlKey;
    }
    else if (e.which)
    {
        key = e.which;
        isCtrl = e.ctrlKey;
    }
    if (isNaN(key)) return true;
    keychar = String.fromCharCode(key);
    // check for backspace or delete, or if Ctrl was pressed
    if (key == 8 || isCtrl)
    {
        return true;
    }
    var ctemp = obj.value;
    var index = ctemp.indexOf(".");
    var length = ctemp.length;
    ctemp = ctemp.substring(index, length);
    if (index < 0 && length > 1 && keychar != '.' && keychar != '0')
    {
        obj.focus();
        return false;
    }
    if (ctemp.length > 2)
    {
        obj.focus();
        return false;
    }
    if (keychar == '0' && length >= 2 && keychar != '.' && ctemp != '10')
    {
        obj.focus();
        return false;
    }
    reg = /\d/;
    var isFirstN = allowNegative ? keychar == '-' && obj.value.indexOf('-') == -1 : false;
    var isFirstD = allowDecimal ? keychar == '.' && obj.value.indexOf('.') == -1 : false;
    return isFirstN || isFirstD || reg.test(keychar);
}
</script>
You can further optimize this expression. Currently it works for all the given patterns:
^\d*[aA]?[\-.]?\d*[aA]?[\-.]?\d*$
If you're talking about checking that a given text is a valid percentage, you can do one of a few things:
1. Validate it with a regex like ^[0-9]+\.?[0-9]*$, then just convert it to a floating point value and check that it's between 0 and 100 (that particular regex requires a zero before the decimal for values less than one, but you can adapt it to handle otherwise).
2. Convert it to a float using a method that raises an exception on invalid data (rather than one that just stops at the first bad character).
3. Use a convoluted regex which checks for valid entries without having to convert to a float.
4. Just run through the text character by character, counting numerics (a), decimal points (b) and non-numerics (c). Provided a is at least one, b is at most one, and c is zero, convert to a float.
I have no idea whether your environment supports any of those options, since you haven't actually specified what it is :-)
However, my preference is to go for option 1, 2, 4 then 3 (in that order), since I'm not a big fan of convoluted regexes. I tend to think they do more harm than good once they become too complex to understand in less than three seconds. Option 1 is sketched below.
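Here is a minimal Python sketch of option 1, using a slightly stricter regex that requires digits after any decimal point:

import re

def is_valid_percentage(text):
    """Option 1 above: shape-check with a regex, then range-check the value.
    Requiring digits after the decimal point makes '44.' fail too."""
    if not re.fullmatch(r"[0-9]+(\.[0-9]+)?", text):
        return False                      # rejects '12-3', '33.44a', '44.', 'a3.56'
    return 0.0 <= float(text) <= 100.0    # rejects '123' (out of range)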
Finally, I tried a simple validation and it works well:
function validate() {
    var str = document.getElementById('percentage').value;
    if (isNaN(str))
    {
        //alert("value out of range or too much decimal");
    }
    else if (str > 100)
    {
        //alert("value exceeded");
    }
    else if (str < 0)
    {
        //alert("value not valid");
    }
}
