AppleScript Duplicate Word Count

How might I create an AppleScript that counts duplicate words in a PDF and then displays the results as a ranked list, with the most duplicated word (and its count) at the top, the second most duplicated next, and so on? I'd like to use this in school, so that after converting PPTs to PDF I can run the script to see what is most important in the presentation.
Ideally it would filter out words such as: the, so, it, etc.

The last part you are looking for is simple.
Just set up a list and check if the word is in it or not.
set ignoreList to {"to", "is"}
set reportFile to "/Users/USERNAME/Desktop/Word Frequencies.txt"
set theTextFile to "/Users/USERNAME/Desktop/foo.txt"
set word_list to every word of (do shell script "cat " & quoted form of theTextFile)
set word_frequency_list to {}
repeat with the_word_ref in word_list
set the_current_word to contents of the_word_ref
if the_current_word is not in ignoreList then
set word_info to missing value
repeat with record_ref in word_frequency_list
if the_word of record_ref = the_current_word then
set word_info to contents of record_ref
exit repeat
end if
end repeat
if word_info = missing value then
set word_info to {the_word:the_current_word, the_count:1}
set end of word_frequency_list to word_info
else
set the_count of word_info to (the_count of word_info) + 1
end if
end if
end repeat
--return word_frequency_list
set the_report_list to {}
repeat with word_info in word_frequency_list
set end of the_report_list to quote & the_word of word_info & ¬
quote & " - appears " & the_count of word_info & " times."
end repeat
set AppleScript's text item delimiters to return
set the_report to the_report_list as text
do shell script "echo " & quoted form of the_report & " > " & quoted form of reportFile
set AppleScript's text item delimiters to ""
delay 1
do shell script " open " & quoted form of reportFile
I have also changed some of the code to use shell scripts to read and write the file, only because I prefer that to using TextEdit.

While it is doable in AppleScript, as markhunte shows, it is very slow, especially if you are processing larger pieces of text or lots of files. In my tests I gave up on it. So here is a short shell script, which you can call from AppleScript if you need to, that is very fast.
#!/bin/sh
[ "$1" = "" ] || [ "$2" = "" ] && echo "$0 [wordsfile] [textfile]" && exit 1
INFILE="$2"
WORDS="${2}.words"
EXWORDS="$1"
echo "File $INFILE has $(wc -w < "$INFILE") words."
echo "Excluding the $(wc -w < "$EXWORDS") words in $EXWORDS."
echo "Extracting words from file and removing common words..."
grep -o -E '\w{3,}' "$INFILE" | grep -x -i -v -f "$EXWORDS" > "$WORDS"
echo "Top 10 most frequent words in $INFILE are..."
tr '[:upper:]' '[:lower:]' < "$WORDS" | sort | uniq -c | sort -rn | head -10
# Clean up
rm "$WORDS"
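For comparison, the same counting pipeline can be written without the temporary .words file. This is only a sketch under a few assumptions: the function name word_freq is made up, the three-letter minimum mirrors the \w{3,} pattern above, and the stop-word file is expected to hold one lowercase word per line.

```shell
# word_freq STOPWORDS_FILE
# Reads text on stdin and prints "count word" lines, most frequent first.
# Hypothetical helper, not part of the script above.
word_freq() {
    tr -cs '[:alpha:]' '\n' |       # split the text into one word per line
    tr '[:upper:]' '[:lower:]' |    # fold case so "The" and "the" are one word
    awk 'length($0) >= 3' |         # drop empty lines and words under 3 letters
    grep -v -x -F -f "$1" |         # remove stop words (whole-line matches)
    sort | uniq -c | sort -rn       # count duplicates, highest count first
}
```

It could be called as word_freq stopwords.txt < foo.txt | head -10, where both file names are placeholders.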

Related

Applescript Error: "Can’t make Can’t make some object into type some object."

I just updated to Catalina and one of my applescripts no longer works. Wondering if anyone has any ideas as to why? All of the folders selected (single or batch) are on a work server.
The result I get from the Script Editor after selecting a folder is:
-- 'ascr''err '{ '----':'utxt'("Can’t make Can’t make some object
into type some object."), 'errn':-1700,
'erob':'alis'("file:///System/Volumes/Data/data/ARLSCAN2/DIGI/ORIG/FRETSCHEL/Auto/DONE/F2509")
}
--Script Name:
set ScriptName to "jhoveValidation"
--Define user name
set userName to do shell script "whoami"
--Initialize errorCount
set errorCount to 0
--Initialize stampCount
set stampCount to 0
-- BEGIN SCRIPT!
--Option to change location mode
set locationModeChoice to display dialog "Would you like to check a single box folder or batch check a directory of box folders?" buttons ["Exit", "Single", "Batch"] default button 3 with title ScriptName with icon caution
set locationModeChoice to button returned of locationModeChoice
if locationModeChoice = "Exit" then
return
end if
if locationModeChoice = "Single" then
set imageDirectory to choose folder with prompt "Select Single Box Folder"
else if locationModeChoice = "Batch" then
set customFolder to choose folder with prompt "Please select a folder of box folders with standard ICT hierarchy:"
set masterFolder to customFolder
end if
-- BEGIN PREFLIGHT
--Populates list of files in finder
set progress description to "jhoveValidation"
set progress additional description to "Loading box and file information..."
set progress total steps to -1
delay 1
if locationModeChoice = "Batch" then
tell application "Finder"
set masterList to folders in masterFolder
if (count of items in masterList) is 0 then
display dialog "ERROR: No files were detected" buttons ["Quit"]
return
end if
end tell
else if locationModeChoice = "Single" then
set masterList to {}
set masterList to masterList & imageDirectory
end if
-- BEGIN IMAGE PROCESSING
set folderCount to count masterList
set folderCounter to 1
repeat with aFolder in masterList
set currentWorkingBox to getLastPathItem(aFolder)
tell application "Finder"
try
set inputFolder to (every folder in aFolder whose name begins with "TIFF") as alias
on error
display dialog "There was an error finding your TIFF folder for box:
" & currentWorkingBox & "
Please place the images you want to inspect in a completed box folder with a TIFF folder inside." buttons ["Quit"]
return
end try
end tell
set progress description to (currentWorkingBox as text) & ": jhoveValidation - Box " & folderCounter & " of " & folderCount
set progress additional description to "Loading TIFF files..."
set progress total steps to -1
tell application "Finder"
try
set filesList to files in inputFolder
set filesCount to count filesList
on error
display dialog "There was an error finding your TIFF folder for box:
" & currentWorkingBox & "
Please place the images you want to inspect in a completed box folder with a TIFF folder inside." buttons ["Quit"]
return
end try
end tell
--***BEGIN LOOP***
--Individual file processing begins here
set loggingCounter to 0
set totalSteps to count filesList
set progress total steps to totalSteps
repeat with aFile in filesList
set loggingCounter to loggingCounter + 1
log loggingCounter
set progress additional description to "Processing file " & loggingCounter & " of " & totalSteps
set progress completed steps to loggingCounter
set AppleScript's text item delimiters to "."
if (the last text item of (aFile as text) is "tif") then
tell application "Finder"
set theFile to aFile as alias
set theFilePath to POSIX path of theFile
log theFilePath
end tell
do shell script "cd ~/Desktop/jhove; ./jhove -c conf/jhove.conf -l SEVERE -o " & (theFilePath as text) & ".txt -m TIFF-hul " & theFilePath
else
end if
end repeat
set progress additional description to "Processing Results & Generating Reports..."
set progress total steps to -1
set currentWorkingFolderPath to POSIX path of (aFolder as alias)
try
do shell script "mkdir " & currentWorkingFolderPath & "VALIDATION"
end try
do shell script "mv " & currentWorkingFolderPath & "TIFF/*.txt " & currentWorkingFolderPath & "VALIDATION"
do shell script "grep -H 'Status' " & currentWorkingFolderPath & "VALIDATION/*.txt | sed 's/:/,/' > " & currentWorkingFolderPath & "VALIDATION/results.csv"
set theResults to {}
set theResults to theResults & paragraphs of (do shell script "cat " & currentWorkingFolderPath & "VALIDATION/results.csv")
set resultsCount to count theResults
set rejectList to {}
set reportStatus to "PASS"
set passCounter to 0
set failCounter to 0
repeat with aResult in theResults
set AppleScript's text item delimiters to ","
log text item 2 of aResult
set thisVar to text item 2 of aResult
if (text item 2 of aResult as text) = " Status: Well-Formed and valid" then
set passCounter to passCounter + 1
log "PASS"
else
set failCounter to failCounter + 1
set rejectList to rejectList & aResult
set reportStatus to "FAIL"
log "FAIL"
end if
end repeat
set totalCount to passCounter + failCounter
if reportStatus = "PASS" then
do shell script "echo \"All files have passed JHOVE Validation.\" > " & currentWorkingFolderPath & "jhoveReport.txt"
do shell script "echo \" \" >> " & currentWorkingFolderPath & "jhoveReport.txt"
do shell script "echo \"" & passCounter & " files were validated.\" >> " & currentWorkingFolderPath & "jhoveReport.txt"
do shell script "echo \" \" >> " & currentWorkingFolderPath & "jhoveReport.txt"
do shell script "date >> " & currentWorkingFolderPath & "jhoveReport.txt"
else
do shell script "echo \"!!! ATTENTION - some files have failed JHOVE Validation!!! \" > " & currentWorkingFolderPath & "jhoveReport.txt"
do shell script "echo \"" & passCounter & " files passed.\" >> " & currentWorkingFolderPath & "jhoveReport.txt"
do shell script "echo \"" & failCounter & " files failed.\" >> " & currentWorkingFolderPath & "jhoveReport.txt"
do shell script "echo \" \" >> " & currentWorkingFolderPath & "jhoveReport.txt"
do shell script "echo \"----- \" >> " & currentWorkingFolderPath & "jhoveReport.txt"
do shell script "echo \"The following have reported failed JHOVE Validation: \" >> " & currentWorkingFolderPath & "jhoveReport.txt"
repeat with aReject in rejectList
do shell script "echo \"" & aReject & "\" >> " & currentWorkingFolderPath & "jhoveReport.txt"
end repeat
do shell script "echo \" \" >> " & currentWorkingFolderPath & "jhoveReport.txt"
do shell script "echo \"----- \" >> " & currentWorkingFolderPath & "jhoveReport.txt"
do shell script "date >> " & currentWorkingFolderPath & "jhoveReport.txt"
end if
set folderCounter to folderCounter + 1
end repeat
--***END LOOP***
-- END SCRIPT
-- Function: Returns the document name without extension (if present)
on getBaseName(fName)
set baseName to fName
repeat with idx from 1 to (length of fName)
if (item idx of fName = ".") then
set baseName to (items 1 thru (idx - 1) of fName) as string
end if
end repeat
return baseName
end getBaseName
on getLastPathItem(thePathToParse)
set thePathToParse to (thePathToParse as text)
set oldDelims to AppleScript's text item delimiters
set AppleScript's text item delimiters to ":"
set lastPathLength to the number of text items in thePathToParse
set lastPathTarget to lastPathLength - 1
set the lastPathItemItem to text item lastPathTarget of thePathToParse
set AppleScript's text item delimiters to oldDelims
return lastPathItemItem
end getLastPathItem
on getLastPathItemFile(thePathToParse)
set thePathToParse to (thePathToParse as text)
set oldDelims to AppleScript's text item delimiters
set AppleScript's text item delimiters to ":"
set the lastPathItemItem to the last text item of thePathToParse
set AppleScript's text item delimiters to oldDelims
return lastPathItemItem
end getLastPathItemFile

Applescript: Break a set of lines into blocks based on a condition

I have an Automator service for uploading images and returning URLs to a text editor.
It works very well, but there's something I want to improve:
When the workflow starts, Automator will get all images in a folder and upload them to an FTP server.
Then it returns a result like below:
("http://url/0530/pic01/pic-01.jpg",
"http://url/0530/pic01/pic-02.jpg",
"http://url/0530/pic01/pic-03.jpg",
"http://url/0530/pic01/pic-04.jpg",
"http://url/0530/pic02/pic-01.jpg",
"http://url/0530/pic02/pic-02.jpg",
"http://url/0530/pic02/pic-03.jpg",
"http://url/0530/pic02/pic-04.jpg",
"http://url/0530/pic03/pic-01.jpg",
"http://url/0530/pic03/pic-02.jpg",
"http://url/0530/pic03/pic-03.jpg",
"http://url/0530/pic03/pic-04.jpg")
And the workflow gets these strings and puts them in a new text document.
Finally, here is the result in a text editor:
http://url/0530/pic01/pic-01.jpg
http://url/0530/pic01/pic-02.jpg
http://url/0530/pic01/pic-03.jpg
http://url/0530/pic01/pic-04.jpg
http://url/0530/pic02/pic-01.jpg
http://url/0530/pic02/pic-02.jpg
http://url/0530/pic02/pic-03.jpg
http://url/0530/pic02/pic-04.jpg
http://url/0530/pic03/pic-01.jpg
http://url/0530/pic03/pic-02.jpg
http://url/0530/pic03/pic-03.jpg
http://url/0530/pic03/pic-04.jpg
But I want it to be:
http://url/0530/pic01/pic-01.jpg
http://url/0530/pic01/pic-02.jpg
http://url/0530/pic01/pic-03.jpg
http://url/0530/pic01/pic-04.jpg

http://url/0530/pic02/pic-01.jpg
http://url/0530/pic02/pic-02.jpg
http://url/0530/pic02/pic-03.jpg
http://url/0530/pic02/pic-04.jpg

http://url/0530/pic03/pic-01.jpg
http://url/0530/pic03/pic-02.jpg
http://url/0530/pic03/pic-03.jpg
http://url/0530/pic03/pic-04.jpg
I think I can use AppleScript in Automator to do this before the result is sent to the text editor.
Do you have any coding advice?
===================================================
Thanks so much, @user309603 and @mklement0!
I chose the pure AppleScript method.
Here is the code I use in the Run AppleScript action of my Automator workflow:
on run {input, parameters}
set txtG to first item of input
repeat with txtLine in items 2 thru -1 of input
if txtLine contains "-01" then set txtG to txtG & linefeed
set txtG to txtG & linefeed & txtLine
end repeat
return txtG
end run
Here is a pure AppleScript solution.
That said, I would prefer mklement0's awk solution: it is much faster if you have long URL lists.
set txt to "http://url/0530/pic01/pic-01.jpg
http://url/0530/pic01/pic-02.jpg
http://url/0530/pic01/pic-03.jpg
http://url/0530/pic02/pic-01.jpg
http://url/0530/pic03/pic-01.jpg
http://url/0530/pic03/pic-02.jpg
http://url/0530/pic03/pic-03.jpg
http://url/0530/pic03/pic-04.jpg
http://url/0530/pic04/pic-01.jpg"
set txtG to first paragraph of txt
repeat with txtLine in paragraphs 2 thru -1 of txt
if txtLine contains "-01" then set txtG to txtG & linefeed
set txtG to txtG & linefeed & txtLine
end repeat
return txtG
The following AppleScript snippet demonstrates the approach you can use in principle - I'm unclear on the exact circumstances:
If you must process the raw result:
# Sample input text.
set txt to "(\"http://url/0530/pic01/pic-01.jpg\",
\"http://url/0530/pic01/pic-02.jpg\",
\"http://url/0530/pic01/pic-03.jpg\",
\"http://url/0530/pic01/pic-04.jpg\",
\"http://url/0530/pic02/pic-01.jpg\",
\"http://url/0530/pic02/pic-02.jpg\",
\"http://url/0530/pic02/pic-03.jpg\",
\"http://url/0530/pic02/pic-04.jpg\",
\"http://url/0530/pic03/pic-01.jpg\",
\"http://url/0530/pic03/pic-02.jpg\",
\"http://url/0530/pic03/pic-03.jpg\",
\"http://url/0530/pic03/pic-04.jpg\")"
# Group (break into blocks) by files containing "-01."
set txtGrouped to do shell script ¬
"printf %s " & quoted form of txt & " | tr -d '()\"'" & ¬
" | awk -F, 'NR>1 && /-01\\./ { print \"\" } { print $1 }'" ¬
without altering line endings
If you already have a cleaned-up set of URL-only lines:
# Sample input text.
set txt to "http://url/0530/pic01/pic-01.jpg
http://url/0530/pic01/pic-02.jpg
http://url/0530/pic01/pic-03.jpg
http://url/0530/pic01/pic-04.jpg
http://url/0530/pic02/pic-01.jpg
http://url/0530/pic02/pic-02.jpg
http://url/0530/pic02/pic-03.jpg
http://url/0530/pic02/pic-04.jpg
http://url/0530/pic03/pic-01.jpg
http://url/0530/pic03/pic-02.jpg
http://url/0530/pic03/pic-03.jpg
http://url/0530/pic03/pic-04.jpg"
# Group (break into blocks) by files containing "-01."
set txtGrouped to do shell script ¬
"printf %s " & quoted form of txt & ¬
" | awk 'NR>1 && /-01\\./ { print \"\" } { print }'" ¬
without altering line endings
Uses do shell script to have awk perform the desired grouping.
Note:
without altering line endings is required to prevent AppleScript from replacing \n chars in the output with Mac-style \r line endings.
The result will have a trailing \n char, even if the input didn't. To fix this, use:
set txtGrouped to text 1 thru ((length of txtGrouped) - 1) of txtGrouped
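To try the awk grouping outside AppleScript, the same awk program can be run directly in a terminal. A minimal sketch; the function name group_on_first is made up for illustration:

```shell
# group_on_first: reads URL lines on stdin and prints a blank line before
# every line containing "-01." except when it is the very first line.
group_on_first() {
    awk 'NR > 1 && /-01\./ { print "" } { print }'
}
```

Note that the backslash is written once here; the doubled \\ in the AppleScript version is only needed because of AppleScript's own string escaping.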
If the goal is to always divide the list items into paragraphs grouped by four, here is an applescript you can put in an automator action:
set l to input as list
set r to ""
repeat with n from 1 to (count l) by 4
set r to r & item n of l & return & item (n + 1) of l & return & item (n + 2) of l & return & item (n + 3) of l & return & return
end repeat
return r
Please clarify if what you need instead is to group all the versions of one pic together, where there won't always be four of each and the script has to check the file names to decide.
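One caveat with the fixed-size loop: it reads items n + 1 through n + 3 unconditionally, so it raises an error when the item count is not a multiple of four. If that can happen, one option is to do the chunking in the shell instead, for example from a Run Shell Script action that receives the lines on stdin. A sketch, with group_by_n as a hypothetical name:

```shell
# group_by_n N: reads lines on stdin and prints a blank line between
# consecutive groups of N lines. A short final group is printed as-is.
group_by_n() {
    awk -v n="$1" 'NR > 1 && (NR - 1) % n == 0 { print "" } { print }'
}
```

For example, group_by_n 4 would split twelve URLs into three blocks, and thirteen URLs into three blocks plus a final block of one.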
Here is a solution: this script checks the name of the parent folder of each file and, whenever the folder name changes, adds a blank line.
Add a Run Shell Script action to your workflow, choose the "/bin/bash" shell, and set "Pass input" to "to stdin".
Put this script in the Run Shell Script action:
awk -F/ '{f=$(NF - 1); if (NR==1) {fName=f} else if (f != fName) {print ""; fName=f} print}' < "/dev/stdin"
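Before putting it into the workflow, the parent-folder grouping can be checked in a terminal. The sketch below wraps the same awk program in a function (group_by_parent is an illustrative name, not something Automator provides) so it can be fed sample lines with groups of different sizes:

```shell
# group_by_parent: reads URL or path lines on stdin and prints a blank line
# whenever the second-to-last "/"-separated field (the parent folder) changes.
group_by_parent() {
    awk -F/ '{f=$(NF - 1); if (NR==1) {fName=f} else if (f != fName) {print ""; fName=f} print}'
}
```

Unlike a fixed group size or a test for "-01", this keeps working when a folder holds an arbitrary number of pictures, as long as the input is already ordered by folder.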

Move Files X Per Folder

I've put a script together that moves a predefined number of files into folders that are created sequentially.
It seems somewhat sluggish and being new to this I'm wondering if there's a more elegant do shell script command to aid in this.
set filesPerFolder to 100
set zeroPad to 3
tell application "Finder" to set chosenFolder to (target of Finder window 1) as text
set thisDir to POSIX path of chosenFolder
set folderCount to 1
repeat
set folderCount to zero_pad(folderCount, zeroPad)
set filesToMove to (do shell script "ls -1 " & thisDir & " | wc -l") as integer
if filesToMove is 0 then
return
end if
set theNewFolder to thisDir & folderCount
set asDir to POSIX file theNewFolder
tell application "Finder"
if exists asDir then
-- do nothing
else
do shell script "mkdir -p " & theNewFolder
end if
end tell
tell application "Finder" to set firstFile to first file of folder chosenFolder as alias
set fileToMove to POSIX path of firstFile
set theMove to quoted form of fileToMove & " '" & theNewFolder & "/'"
do shell script "mv -f " & theMove
set filesInFolder to (do shell script "ls -1 " & theNewFolder & " | wc -l") as integer
if filesInFolder ≥ filesPerFolder then
set folderCount to folderCount + 1
end if
end repeat
on zero_pad(value, string_length)
set string_zeroes to ""
set digits_to_pad to string_length - (length of (value as string))
if digits_to_pad > 0 then
repeat digits_to_pad times
set string_zeroes to string_zeroes & "0" as string
end repeat
end if
set padded_value to string_zeroes & value as string
return padded_value
end zero_pad
Thanks to Lri's shell command, the script is leaner and more efficient.
tell application "Finder" to set thisDir to (target of Finder window 1) as string
set rootDirectory to quoted form of POSIX path of thisDir
set counTed to (do shell script "ls -1 " & rootDirectory & " | wc -l") as integer
set filesToMove to (do shell script "ls -1p " & rootDirectory & " | grep -v / | wc -l") as integer
if filesToMove is 0 then
display alert "There are no files in the root of this directory to move."
return
end if
set filesPerFolder to text returned of (display dialog "There are " & counTed & " files in this folder.
How many files would you like to move per folder: " default answer "100")
set fileCount to (do shell script "cd " & rootDirectory & " && i=0;for f in *;do d=$(printf %03d $((i/" & filesPerFolder & "+1)));let i++;mkdir -p $d;mv \"$f\" $d;done")
set filesLeft to (do shell script "ls -1p " & rootDirectory & " | grep -v / | wc -l") as integer
if filesLeft is 0 then
display alert "Completed."
return
end if
i=0;for f in *;do d=$(printf %03d $((i/100+1)));let i++;mkdir -p $d;mv "$f" $d;done
Or using GNU parallel:
ls|parallel -k -N100 x=\$\(printf %03d {#}\)\;mkdir -p \$x\;mv {} \$x
-k keeps the order of the lines and {#} is the sequence number.
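The for-loop one-liner relies on bash's let and leaves $d unquoted. A more defensive variant might look like the sketch below, which assumes plain POSIX sh, skips anything that is not a regular file (so existing subfolders are left alone), and uses a made-up function name, split_into_folders:

```shell
# split_into_folders DIR PER_FOLDER
# Moves the regular files in DIR into numbered subfolders 001, 002, ...
# with at most PER_FOLDER files each. Hypothetical helper for illustration.
split_into_folders() {
    dir=$1
    per=$2
    i=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue                          # skip folders and other non-files
        d=$(printf '%s/%03d' "$dir" $((i / per + 1)))    # 001 for the first PER_FOLDER files, and so on
        mkdir -p "$d" && mv "$f" "$d"/
        i=$((i + 1))
    done
}
```

From AppleScript it could be invoked through do shell script with the directory passed via quoted form of, in the same way as the commands above.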

Get part of string in Applescript

I'm trying to write an AppleScript that reads the files in a folder and takes only part of each filename. The files look like this: the.name.of.a.tv.show.s01e01
I could search for s01, but then I would have to make a rule for every season that might come.
Is there some way to look for s--e-- and then take the part of the filename before that?
Try:
set xxx to "the.name.of.a.tv.show.s01e01 etc etc"
set yyy to (do shell script "echo " & quoted form of xxx & " | sed 's/\\.s[0-9][0-9]e[0-9][0-9].*//'")
return yyy
Incorporating your previous question:
set seasonList to {}
repeat with aShow in listOfShows
set aShow to aShow as string
set end of seasonList to (do shell script "echo " & quoted form of aShow & " | sed 's/\\.s[0-9][0-9]e[0-9][0-9].*//'")
end repeat
return seasonList
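The sed expression can be tried on its own in a terminal. The sketch below is a slightly more tolerant variant, under the assumption that season and episode markers may be upper-case and may have more than two digits; the function name show_name is invented for the example:

```shell
# show_name FILENAME
# Prints the part of FILENAME before the first ".sNNeNN" style marker.
# Names without such a marker are printed unchanged.
show_name() {
    printf '%s\n' "$1" | sed 's/\.[sS][0-9][0-9]*[eE][0-9][0-9]*.*//'
}
```

Here show_name the.name.of.a.tv.show.s01e01 prints the.name.of.a.tv.show. The dot before the marker is escaped so that it only matches a literal ".".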

Applescript- For every item that grep finds

This code obviously won't work but I'm looking for a syntax change to make it work.
for every item that grep finds
set myCommand to do shell script "grep -w word test.log"
set mySecondCommand to do shell script "grep -w end test.log"
end
The following output should be correct (What I want):
word
end
word
end
Instead, because I do not have this theoretical "for every item that grep finds" statement, I get the following output (which I don't want):
word
word
end
end
Your initial grep results will come back as text, i.e. one long string. To iterate over them you need to turn the string into a list, which is why I use "paragraphs". Once the initial grep results are in list form, you can use a repeat loop to process the items. As you process them, store the results in a new list so that at the end of the script you can view the results in total. Something like this...
set firstWord to "word"
set secondWord to "end"
-- use the ls command and grep to find all the txt documents in your documents folder
set aFolder to (path to documents folder) as text
set grepResults to do shell script "ls " & quoted form of POSIX path of aFolder & " | grep \"txt\""
set grepResultsList to paragraphs of grepResults
-- search the found txt documents for the words
set totalResults to {}
repeat with aResult in grepResultsList
set thisPath to aFolder & aResult
try
set myCommand to paragraphs of (do shell script "grep -w " & firstWord & space & quoted form of POSIX path of thisPath)
set myCommandCount to count of myCommand
set end of totalResults to {thisPath, firstWord, myCommandCount, myCommand}
end try
try
set mySecondCommand to paragraphs of (do shell script "grep -w " & secondWord & space & quoted form of POSIX path of thisPath)
set mySecondCommandCount to count of mySecondCommand
set end of totalResults to {thisPath, secondWord, mySecondCommandCount, mySecondCommand}
end try
end repeat
return totalResults
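If the aim is only to see the matches for both words in the order they occur within a single file, which is one possible reading of the question, a single grep call with two patterns may be enough, because grep prints matching lines in file order. A sketch under that assumption, with grep_in_order as a hypothetical name:

```shell
# grep_in_order FILE WORD1 WORD2
# Prints every line of FILE that contains WORD1 or WORD2 as a whole word,
# in the order the lines appear in the file.
grep_in_order() {
    grep -w -e "$2" -e "$3" "$1"
}
```

From AppleScript this could be called with do shell script, and paragraphs of the result would give a list to loop over, as in the script above.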

Resources