Word Interop spellchecking a document word by word is too slow

Word Interop spellchecking a document word by word is too slow - interop

With the code below basically I'm checking a document word by word and separate the correct ones. I works, but too slow that I have to wait for more than 10 minutes for 40 thousand words. When I open the same document word checks it in a few seconds. What am I doing wrong?
var application = new Application();
application.Visible = false;
Document document = application.Documents.Open("C:\\Users\\hrzafer\\Desktop\\spellcheck.docx");
// Loop through all words in the document.
int count = document.Words.Count;
IList<string> corrects = new List<string>();
for (int i = 1; i <= count; i++)
{
if (document.Words[i].SpellingErrors.Count == 0)
{
string text = document.Words[i].Text;
corrects.Add(text);
}
}
File.WriteAllLines("corrects.txt", corrects);
application.Quit();

a workaround, as it seems you only want the correct words in your document:
for i=ActiveDocument.Range.SpellingErrors.Count to 1 step -1
ActiveDocument.Range.SpellingErrors(i).Delete
next
then save the document with a new name
ActiveDocument.SaveAs FileName:="C:\MyWordDocs\NewFile.docx", _
FileFormat:=wdFormatXMLDocument

Related

For loop not looping through spreadsheet

I made a small search box to retrieve values from a datasheet. For some reason it only loops once. Also this is the first time I'm trying something in google sheets, so if I want to search a string in column B, then the SEARCH_COL_IDX should be 1 right?
I put in a few loggers to check the process. I do see Logger.log(str), Logger.log("Check") and Logger.log("Check2). The latter two only once, where it should be multiple times imo. The Logger.log(row[#]) I don't get to see at all, which means it doesn't find anything.
I also tried doing if(row[SEARCH_COL_IDX] = str) { to see where it is looking. Than the Logger.log(row[#]) do come back, but from different rows AND columns. I am doing something wrong, but I can't find it.. any suggestions?
var SPREADSHEET_NAME = "Database";
var SEARCH_COL_IDX = 1;
var RETURN_COL_IDX= 1;
function searchStr(){
var ss = SpreadsheetApp.getActiveSpreadsheet();
var formSS = ss.getSheetByName("Userform");
var str = formSS.getRange("d17").getValue();
Logger.log(str);
var values = ss.getSheetByName("Database").getDataRange().getValues();
for (var i = 0; i < values.length; i++) {
Logger.log("check");
var row = values[i];
Logger.log("check2");
if(row[SEARCH_COL_IDX] == str) {
formSS.getRange("d3").setValue(row[0]);
Logger.log(row[0]);
formSS.getRange("d5").setValue(row[1]);
Logger.log(row[1]);
formSS.getRange("d7").setValue(row[2]);
Logger.log(row[2]);
formSS.getRange("d9").setValue(row[3]);
Logger.log(row[3]);
formSS.getRange("d11").setValue(row[4]);
Logger.log(row[4]);
formSS.getRange("d13").setValue(row[5]);
Logger.log(row[5]);
}
Logger.log("nothing found");
return row[RETURN_COL_IDX];
}

Your function has a return command in the last line of your loop block.

InDesign Text Modification Script Skips Content

This InDesign Javascript iterates over textStyleRanges and converts text with a few specific appliedFont's and later assigns a new appliedFont:-
var textStyleRanges = [];
for (var j = app.activeDocument.stories.length-1; j >= 0 ; j--)
for (var k = app.activeDocument.stories.item(j).textStyleRanges.length-1; k >= 0; k--)
textStyleRanges.push(app.activeDocument.stories.item(j).textStyleRanges.item(k));
for (var i = textStyleRanges.length-1; i >= 0; i--) {
var myText = textStyleRanges[i];
var converted = C2Unic(myText.contents, myText.appliedFont.fontFamily);
if (myText.contents != converted)
myText.contents = converted;
if (myText.appliedFont.fontFamily == 'Chanakya'
|| myText.appliedFont.fontFamily == 'DevLys 010'
|| myText.appliedFont.fontFamily == 'Walkman-Chanakya-905') {
myText.appliedFont = app.fonts.item("Utsaah");
myText.composer="Adobe World-Ready Paragraph Composer";
}
}
But there are always some ranges where this doesn't happen. I tried iterating in the forward direction OR in the backward direction OR putting the elements in an array before conversion OR updating the appliedFont in the same iteration OR updating it a different one. Some ranges are still not converted completely.
I am doing this to convert the Devanagari text encoded in glyph based non-Unicode encoding to Unicode. Some of this involves repositioning vowel signs etc and changing the code to work with find/replace mechanism may be possible but is a lot of rework.
What is happening?
See also: http://cssdk.s3-website-us-east-1.amazonaws.com/sdk/1.0/docs/WebHelp/app_notes/indesign_text_frames.htm#Finding_and_changing_text
Sample here: https://www.dropbox.com/sh/7y10i6cyx5m5k3c/AAB74PXtavO5_0dD4_6sNn8ka?dl=0

This is untested since I'm not able to test against your document, but try using getElements() like below:
var doc = app.activeDocument;
var stories = doc.stories;
var textStyleRanges = stories.everyItem().textStyleRanges.everyItem().getElements();
for (var i = textStyleRanges.length-1; i >= 0; i--) {
var myText = textStyleRanges[i];
var converted = C2Unic(myText.contents, myText.appliedFont.fontFamily);
if (myText.contents != converted)
myText.contents = converted;
if (myText.appliedFont.fontFamily == 'Chanakya'
|| myText.appliedFont.fontFamily == 'DevLys 010'
|| myText.appliedFont.fontFamily == 'Walkman-Chanakya-905') {
myText.appliedFont = app.fonts.item("Utsaah");
myText.composer="Adobe World-Ready Paragraph Composer";
}
}

A valid approach is to use hyperlink text sources as they stick to the genuine text object. Then you can edit those source texts even if they were actually moved elsewhere in the flow.
//Main routine
var main = function() {
//VARS
var doc = app.properties.activeDocument,
fgp = app.findGrepPreferences.properties,
cgp = app.changeGrepPreferences.properties,
fcgo = app.findChangeGrepOptions.properties,
text, str,
found = [], srcs = [], n = 0;
//Exit if no documents
if ( !doc ) return;
app.findChangeGrepOptions = app.findGrepPreferences = app.changeGrepPreferences = null;
//Settings props
app.findChangeGrepOptions.properties = {
includeHiddenLayers:true,
includeLockedLayersForFind:true,
includeLockedStoriesForFind:true,
includeMasterPages:true,
}
app.findGrepPreferences.properties = {
findWhat:"\\w",
}
//Finding text instances
found = doc.findGrep();
n = found.length;
//Looping through instances and adding hyperlink text sources
//That's all we do at this stage
while ( n-- ) {
srcs.push ( doc.hyperlinkTextSources.add(found[n] ) );
}
//Then we edit the stored hyperlinks text sources 's texts objects contents
n = srcs.length;
while ( n-- ) {
text = srcs[n].sourceText;
str = text.contents;
text.contents = str+str+str+str;
}
//Eventually we remove the added hyperlinks text sources
n = srcs.length;
while ( n-- ) srcs[n].remove();
//And reset initial properties
app.findGrepPreferences.properties = fgp;
app.changeGrepPreferences.properties = cgp;
app.findChangeGrepOptions.properties =fcgo;
}
//Running script in a easily cancelable mode
var u;
app.doScript ( "main()",u,u,UndoModes.ENTIRE_SCRIPT, "The Script" );

splitting up the contents of a single line

I just went through a problem, where input is a string which is a single word.
This line is not readable,
Like, I want to leave is written as Iwanttoleave.
The problem is of separating out each of the tokens(words, numbers, abbreviations, etc)
I have no idea where to start
The first thought that came to my mind is making a dictionary and then mapping accordingly but I think making a dictionary is not at all a good idea.
Can anyone suggest some algorithm to do it ?

First of all, create a dictionary which helps you to identify if some string is a valid word or not.
bool isValidString(String s){
if(dictionary.contains(s))
return true;
return false;
}
Now, you can write a recursive code to split the string and create an array of actually useful words.
ArrayList usefulWords = new ArrayList<String>; //global declaration
void split(String s){
int l = s.length();
int i,j;
for(i = l-1; i >= 0; i--){
if(isValidString(s.substr(i,l)){ //s.substr(i,l) will return substring starting from index `i` and ending at `l-1`
usefulWords.add(s.substr(i,l));
split(s.substr(0,i));
}
}
}
Now, use these usefulWords to generate all possible strings. Maybe something like this:
ArrayList<String> splits = new ArrayList<String>[10]; //assuming max 10 possible outputs
ArrayList<String>[] allPossibleStrings(String s, int level){
for(int i = 0; i < s.length(); i++){
if(usefulWords.contains(s.substr(0,i)){
splits[level].add(s.substr(0,i));
allPossibleStrings(s.substr(i,s.length()),level);
level++;
}
}
}
Now, this code gives you all possible splits in a somewhat arbitrary manner. eg.
dictionary = {cat, dog, i, am, pro, gram, program, programmer, grammer}
input:
string = program
output:
splits[0] = {pro, gram}
splits[1] = {program}
input:
string = iamprogram
output:
splits[0] = {i, am, pro, gram} //since `mer` is not in dictionary
splits[1] = {program}
I did not give much thought to the last part, but I think you should be able to formulate a code from there as per your requirement.
Also, since no language is tagged, I've taken the liberty of writing the code in JAVA-like syntax as it is really easy to understand.

Instead of using a Dictionary, I'd suggest you use a Trie with all your valid words (the whole English dictionary?). Then you can start moving one letter at a time in your input line and the trie at the same time. If the letter leads to more results in the trie, you can continue expanding the current word, and if not, you can start looking for a new word in the trie.
This won't be a forward only search for sure, so you'll need some sort of backtracking.
// This method Generates a list with all the matching phrases for the given input
List<string> CandidatePhrases(string input) {
Trie validWords = BuildTheTrieWithAllValidWords();
List<string> currentWords = new List<string>();
List<string> possiblePhrases = new List<string>();
// The root of the trie has an empty key that points to all the first letters of all words
Trie currentWord = validWords;
int currentLetter = -1;
// Calls a backtracking method that creates all possible phrases
FindPossiblePhrases(input, validWords, currentWords, currentWord, currentLetter, possiblePhrases);
return possiblePhrases;
}
// The Trie structure could be something like
class Trie {
char key;
bool valid;
List<Trie> children;
Trie parent;
Trie Next(char nextLetter) {
return children.FirstOrDefault(c => c.key == nextLetter);
}
string WholeWord() {
Debug.Assert(valid);
string word = "";
Trie current = this;
while (current.Key != '\0')
{
word = current.Key + word;
current = current.parent;
}
}
}
void FindPossiblePhrases(string input, Trie validWords, List<string> currentWords, Trie currentWord, int currentLetter, List<string> possiblePhrases) {
if (currentLetter == input.Length - 1) {
if (currentWord.valid) {
string phrase = ""
foreach (string word in currentWords) {
phrase += word;
phrase += " ";
}
phrase += currentWord.WholeWord();
possiblePhrases.Add(phrase);
}
}
else {
// The currentWord may be a valid word. If that's the case, the next letter could be the first of a new word, or could be the next letter of a bigger word that begins with currentWord
if (currentWord.valid) {
// Try to match phrases when the currentWord is a valid word
currentWords.Add(currentWord.WholeWord());
FindPossiblePhrases(input, validWords, currentWords, validWords, currentLetter, possiblePhrases);
currentWords.RemoveAt(currentWords.Length - 1);
}
// If either the currentWord is a valid word, or not, try to match a longer word that begins with current word
int nextLetter = currentLetter + 1;
Trie nextWord = currentWord.Next(input[nextLetter]);
// If the nextWord is null, there was no matching word that begins with currentWord and has input[nextLetter] as the following letter.
if (nextWord != null) {
FindPossiblePhrases(input, validWords, currentWords, nextWord, nextLetter, possiblePhrases);
}
}
}

Finding highest number in text file

I have a text file that contains 50 student names and scores for each student in the format.
foreName.Surname:Mark
I have figured out how to split up each line into a forename, surname and mark using this code.
string[] Lines = File.ReadAllLines(#"StudentExamMarks.txt");
int i = 0;
var items = from line in Lines
where i++ != 0
let words = line.Split(' ', '.', ':')
select new
{
foreName = words[0],
Surname = words[1],
Mark = words[2]
};
I am unsure of how i would incorporate a findMax algorithm into to find the highest mark and display the pupil with the highest mark. this as i have not used text files that often.

You can use any sorting algorithm there is a Pseudo Code available to find maximum number in any list or array..

Try this code, required just parse all files.
string[] lines = File.ReadAllLines(#"StudentExamMarks.txt");
string maxForeName = null;
string maxSurName = null;
var maxMark = 0;
for (int i = 0; i < lines.Length; i++)
{
var tmp = lines[i].Split(new char[] { ' ', '.', ':' }, StringSplitOptions.RemoveEmptyEntries);
if (tmp.Length == 3)
{
int value = int.Parse(tmp[2]);
if (i == 0 || value > maxMark)
{
maxMark = value;
maxForeName = tmp[0];
maxSurName = tmp[1];
}
}
}

Creating a unique filename from a list of alphanumeric strings

I apologize for creating a similar thread to many that are out there now, but I mainly wanted to also get some insight on some methods.
I have a list of Strings (could be just 1 or over a 1000)
Format = XXX-XXXXX-XX where each one is alphanumeric
I am trying to generate a unique string (currently 18 in length but probably could be longer ensuring not to maximize file length or path length) that I could reproduce if I have that same list. Order doesn't matter; although I may be interested if its easier to restrict the order as well.
My current Java code is follows (which failed today, hence why I am here):
public String createOutputFileName(ArrayList alInput, EnumFPFunction efpf, boolean pHeaders) {
/* create file name based on input list */
String sFileName = "";
long partNum = 0;
for (String sGPN : alInput) {
sGPN = sGPN.replaceAll("-", ""); //remove dashes
partNum += Long.parseLong(sGPN, 36); //(base 36)
}
sFileName = Long.toString(partNum);
if (sFileName.length() > 19) {
sFileName.substring(0, 18); //Max length of 19
}
return alInput;
}
So obviously just adding them did not work out so well I found out (also think I should take last 18 digits and not first 18)
Are there any good methods out there (possibly CRC related) that would work?
To assist with my key creation:
The first 3 characters are almost always numeric and would probably have many duplicate (out of 100, there may only be 10 different starting numbers)
These characters are not allowed - I,O
There will never be a character then a number in the last two alphachar subset.

I would use the system time. Here's how you might do it in Java:
public String createOutputFileName() {
long mills = System.currentTimeMillis();
long nanos = System.nanoTime();
return mills + " " + nanos;
}
If you want to add some information about the items and their part numbers, you can, of course!
======== EDIT: "What do I mean by batch object" =========
class Batch {
ArrayList<Item> itemsToProcess;
String inputFilename; // input to external process
boolean processingFinished;
public Batch(ArrayList<Item> itemsToProcess) {
this.itemsToProcess = itemsToProcess;
inputFilename = null;
processingFinished = false;
}
public void processWithExternal() {
if(inputFilename != null || processingFinished) {
throw new IllegalStateException("Cannot initiate process more than once!");
}
String base = System.currentTimeMillis() + " " + System.nanoTime();
this.inputFilename = base + "_input";
writeItemsToFile();
// however you build your process, do it here
Process p = new ProcessBuilder("myProcess","myargs", inputFilename);
p.start();
p.waitFor();
processingFinished = true;
}
private void writeItemsToFile() {
PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter(inputFilename)));
int flushcount = 0;
for(Item item : itemsToProcess) {
String output = item.getFileRepresentation();
out.println(output);
if(++flushcount % 10 == 0) out.flush();
}
out.flush();
out.close();
}
}

In addition to GlowCoder's response, I have thought of another "decent one" that would work.
Instead of just adding the list in base 36, I would do two separate things to the same list.
In this case, since there is no way for negative or decimal numbers, adding every number and multiplying every number separately and concatenating these base36 number strings isn't a bad way either.
In my case, I would take the last nine digits of the added number and last nine of the multiplied number. This would eliminate my previous errors and make it quite robust. It obviously is still possible for errors once overflow starts occurring, but could also work in this case. Extending the allowable string length would make it more robust as well.
Sample code:
public String createOutputFileName(ArrayList alInput, EnumFPFunction efpf, boolean pHeaders) {
/* create file name based on input list */
String sFileName1 = "";
String sFileName2 = "";
long partNum1 = 0; // Starting point for addition
long partNum2 = 1; // Starting point for multiplication
for (String sGPN : alInput) {
//remove dashes
sGPN = sGPN.replaceAll("-", "");
partNum1 += Long.parseLong(sGPN, 36); //(base 36)
partNum2 *= Long.parseLong(sGPN, 36); //(base 36)
}
// Initial strings
sFileName1 = "000000000" + Long.toString(partNum1, 36); // base 36
sFileName2 = "000000000" + Long.toString(partNum2, 36); // base 36
// Cropped strings
sFileName1 = sFileName1.substring(sFileName1.length()-9, sFileName1.length());
sFileName2 = sFileName2.substring(sFileName2.length()-9, sFileName2.length());
return sFileName1 + sFileName2;
}

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Word Interop spellchecking a document word by word is too slow - interop

Related

For loop not looping through spreadsheet

InDesign Text Modification Script Skips Content

splitting up the contents of a single line

Finding highest number in text file

Creating a unique filename from a list of alphanumeric strings

Categories

Resources