How do I take raw PDF data and convert it into another PDF file using a bash script? - bash

I am trying to convert raw PDF data into a PDF file to run pdftotext on.
The data from file1.pdf is the data I want. If I call the following:
cat file1.pdf > file2.pdf
pdftotext works fine on file2.pdf.
However, if I try to run the following:
VAR1=$(cat file1.pdf)
echo $VAR1 > file2.pdf
pdftotext file2.pdf -
I end up with the following errors:
Syntax Error (99): Illegal character ')'
Syntax Error: Couldn't find trailer dictionary
Syntax Error: Couldn't find trailer dictionary
Syntax Error: Couldn't read xref table
Is there any way I can use the latter structure? I need to do it this way because the bash script will receive the contents of a PDF file, not the PDF file itself.
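The second form loses data in two ways. The unquoted $VAR1 undergoes word splitting, so runs of whitespace and newlines collapse. Bash variables also cannot hold NUL bytes, which binary PDF streams commonly contain. Below is a minimal sketch, assuming the script can receive the PDF bytes on standard input instead of in a variable. The helper name save_stdin_to_tmp and the sample bytes are hypothetical, and the pdftotext call is left as a comment so the sketch runs without it.

```shell
#!/usr/bin/env bash
# Sketch: keep binary data out of shell variables by streaming it
# from standard input into a temporary file.

# Hypothetical helper: copies stdin byte for byte to a new temp file
# and prints the temp file's path.
save_stdin_to_tmp() {
    local tmp
    tmp=$(mktemp) || return 1
    cat > "$tmp"
    printf '%s\n' "$tmp"
}

# Stand-in for real PDF content: it includes a NUL byte, repeated
# spaces, and newlines, all of which `echo $VAR1` would mangle or drop.
sample=$(mktemp)
trap 'rm -f "$sample" "$copy"' EXIT
printf '%%PDF-1.4\n(a  b)\000\nendobj\n' > "$sample"

copy=$(save_stdin_to_tmp < "$sample")

# cmp exits 0 only if both files are identical byte for byte.
if cmp -s "$sample" "$copy"; then
    result="identical"
else
    result="different"
fi
echo "$result"

# With a real PDF, pdftotext could then read the temp file:
#   pdftotext "$copy" -
```

Only the temp file's path passes through a variable here, so the bytes themselves are never subject to quoting or NUL truncation.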

Related

pandoc: "No such file or directory" when converting file with Unicode characters in filename

Using the pandoc tool on Windows 11, I am trying to convert a Markdown file with Unicode characters in its name to HTML with the following command:
pandoc -f markdown_phpextra -o 'Ahoj světe.html' 'Ahoj světe.md'
But pandoc complains with the following error:
[WARNING] Could not deduce format from file extension
Defaulting to html
pandoc.exe: svÄ>te.html': openBinaryFile: does not exist (No such file or directory)
Any ideas how to make pandoc understand the filenames containing Unicode characters correctly?
Executing chcp 65001, which switches the console code page to UTF-8, before running the pandoc command solved the issue. Thanks @tarleb for providing this suggestion.

Add part of filename as PDF metadata using bash script and exiftool

I have about 600 books in PDF format where the filename is in the format:
AuthorForename AuthorSurname - Title (Date).pdf
For example:
Foo Z. Bar - Writing Scripts for Idiots (2017)
Bar Foo - Fun with PDFs (2016)
The metadata is unfortunately missing for pretty much all of them so when I import them into Calibre the Author field is blank.
I'm trying to write a script that takes everything that appears before the '-', removes the trailing space, and then adds it as the author in the PDF metadata using exiftool.
So far I have the following:
for i in "*.pdf";
do exiftool -author=$(echo $i | sed 's/-.*//' | sed 's/[ \t]*$//') "$i";
done
When trying to run it, however, the following is returned:
Error: File not found - Z.
Error: File not found - Bar
Error: File not found - *.pdf
0 image files updated
3 files weren't updated due to errors
What about the -author= phrase is breaking here? Please could someone enlighten me?
You don't need to script this. In fact, doing so will be much slower than letting exiftool do it by itself, because a loop makes exiftool start up once for every file.
Try this:
exiftool -ext pdf '-author<${filename;s/\s+-.*//}' /path/to/target/directory
Breakdown:
-ext pdf: process only PDF files.
-author: the tag to copy to.
<: the copy-from-another-tag operator. In this case, the filename is treated as a pseudo-tag.
${filename;s/\s+-.*//}: copy from the filename, but first apply a regex to it. Here the regex looks for one or more spaces, a dash, and the rest of the name, and removes them.
Add -r if you want to recurse into subdirectories. Add -overwrite_original to avoid making backup files with _original added to the filename.
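The error with your first command was that the value you wanted to assign had spaces in it and needed to be enclosed in quotes. Without quotes, the shell split it into separate words, so exiftool treated "Z." and "Bar" as filenames. In addition, quoting the glob as "*.pdf" kept the for loop from expanding it, which is why exiftool was also handed the literal name *.pdf.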
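For comparison, here is a minimal sketch of the original loop with its quoting repaired: the glob is left unquoted so that it expands, and the author value is quoted so that exiftool receives it as one argument. The helper name author_from_filename is hypothetical, and the exiftool loop is left as a comment so the sketch runs without exiftool or any PDFs present.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the part of a filename before " - ".
author_from_filename() {
    local name=$1
    # ${name%% - *} removes the longest suffix that starts at " - ",
    # leaving the author with no trailing space.
    printf '%s\n' "${name%% - *}"
}

author=$(author_from_filename "Foo Z. Bar - Writing Scripts for Idiots (2017).pdf")
echo "$author"

# The loop from the question with the quoting fixed:
#   for f in *.pdf; do
#       exiftool "-author=$(author_from_filename "$f")" "$f"
#   done
```

This still starts exiftool once per file, so the single exiftool command above remains the faster option for 600 books.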

Running lua file: unexpected symbol near char(226)

I'm following a tutorial on learning Lua: https://www.lua.org/pil/1.html
I'm trying to run a simple file called hello.lua that I created with TextEdit, located in the folder "luaProjects". The file contains the following line:
print("Hello World")
However, I get an error when I try to run the hello world script like this:
luaProjects username$ lua hello.lua
lua: hello.lua:1: unexpected symbol near char(226)
I think that lua is installed correctly:
User-MacBook-Air:~ username$ lua -v
Lua 5.2.4 Copyright (C) 1994-2015 Lua.org, PUC-Rio
And I think that I have set the folder and file up correctly:
User-MacBook-Air:luaProjects username$ tree
.
└── hello.lua
0 directories, 1 file
Q: Does anyone know how to fix this?
It could be that your double quotation marks are not ASCII but Unicode left/right double quotation marks. In UTF-8 those start with the byte 0xE2, which is 226 in decimal, exactly the value in your error message.
Try a simpler editor, or explicitly save the file as plain ASCII text.
Avoid saving Lua code files as Unicode, and convert your existing files via:
iconv -f utf-8 -t ascii YOURFILE
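To confirm the diagnosis before converting anything, one option is to count the bytes that fall outside the ASCII range. The sketch below uses only tr, wc, and sed. The helper name count_non_ascii is hypothetical, and the sample file is a recreation of the suspected content with the curly quotes written as octal escapes, not the asker's actual file.

```shell
#!/usr/bin/env bash
# Recreate the suspected file: the quotes are U+201C and U+201D,
# written as their UTF-8 bytes in octal (first byte 0xE2 = 226).
file=$(mktemp)
fixed=$(mktemp)
trap 'rm -f "$file" "$fixed"' EXIT
printf 'print(\342\200\234Hello World\342\200\235)\n' > "$file"

# Hypothetical helper: count bytes outside 7-bit ASCII by deleting
# every ASCII byte and counting what is left.
count_non_ascii() {
    LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}

before=$(count_non_ascii "$file")
echo "non-ASCII bytes before: $before"

# Replace both curly quotes with a plain ASCII double quote.
LC_ALL=C sed "s/$(printf '\342\200\234')/\"/g; s/$(printf '\342\200\235')/\"/g" "$file" > "$fixed"

after=$(count_non_ascii "$fixed")
echo "non-ASCII bytes after: $after"
cat "$fixed"
```

A count of zero means the file is pure ASCII and the quotes are not the cause. Any other count points to characters that Lua 5.2 will reject outside of string contents and comments.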

Converting from ANSI to UTF-8 using script

I have created a script (.sh file) to convert a CSV file from ANSI encoding to UTF-8.
The command I used is:
iconv -f "windows-1252" -t "UTF-8" $csvname -o $newcsvname
I got this from another Stack Overflow post, but the iconv command doesn't seem to be working.
[Screenshots not included: input file contents in Notepad++, the first CSV file, and the second CSV file.]
EDIT: I tried reducing the problematic input CSV file contents to a few lines (similar to the first file), and now it gets converted fine. Is there something wrong with the file contents itself then? How do I check that?
You can use the Python chardet character encoding detector to confirm the file's existing encoding.
iconv -f {character encoding} -t utf-8 {FileName} > {Output FileName}
This should work. Also check whether the file contains any junk characters, since those can cause errors during conversion.
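As a rough way to answer "how do I check that?" without extra tools, iconv itself can test whether a file is already valid UTF-8 before converting it. This is a minimal sketch, assuming iconv is installed and that a file failing the check really is Windows-1252. The sample data is made up, and the variable names simply mirror those in the question.

```shell
#!/usr/bin/env bash
# Sample input: "café" with é stored as the single Windows-1252 byte 0xE9.
csvname=$(mktemp)
newcsvname=$(mktemp)
trap 'rm -f "$csvname" "$newcsvname"' EXIT
printf 'name\ncaf\351\n' > "$csvname"

# A file that iconv can read as UTF-8 without error is already valid
# UTF-8 (or plain ASCII) and should not be converted a second time.
if iconv -f UTF-8 -t UTF-8 "$csvname" > /dev/null 2>&1; then
    status="already valid UTF-8"
    cp "$csvname" "$newcsvname"
else
    status="converted"
    # Quote the variables so paths with spaces survive, and redirect
    # stdout rather than rely on -o, which POSIX iconv does not define.
    iconv -f WINDOWS-1252 -t UTF-8 "$csvname" > "$newcsvname"
fi
echo "$status"
```

If the conversion step itself stops with an error, iconv usually reports the position of the offending byte, which helps locate the junk characters.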

INI file parsing with KSH

I found a bash script here on Stack Overflow that parses an INI file, and it works great. I'd like to convert it to ksh, but I get these messages when running it with ksh:
ini_test02.ksh[24]: eval: syntax error at line 7: `end of file' unexpected
ini_test02.ksh[51]: cfg.section.DEFAULT: not found [No such file or directory]
How can you convert this into a ksh script?
If you do not have a lot of variables, just parse them one by one. When the values do not contain an equal sign, the following might do:
keyx="$(grep "^keyx=" my.ini | cut -d= -f2 | sed 's/ *$//')"
Or put this in a function and call the function like this:
keyx="$(readini my.ini keyx)"
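The answer does not define readini, so the following is only one possible sketch of it, built from the one-liner above. It uses POSIX function syntax so that it should behave the same in ksh and bash. The sample INI content is made up for the demonstration.

```shell
#!/bin/sh
# Hypothetical readini: print the value of key $2 from INI file $1.
# Like the one-liner it is built from, it ignores [section] headers
# and assumes values contain no equal sign.
readini() {
    grep "^$2=" "$1" | cut -d= -f2 | sed 's/ *$//'
}

# Sample file for demonstration only.
ini=$(mktemp)
trap 'rm -f "$ini"' EXIT
printf '[DEFAULT]\nkeyx=hello world  \nkeyy=42\n' > "$ini"

keyx="$(readini "$ini" keyx)"
keyy="$(readini "$ini" keyy)"
echo "keyx=[$keyx] keyy=[$keyy]"
```

Because sections are ignored, a key that appears in more than one section yields several lines. A section-aware parser is needed in that case.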
Since this is one of the top search results on Google for ksh and INI parsing, I'd like to point to https://github.com/wallyhall/shini for INI parsing in ksh.
The only thing that needs to be done is to implement the function __shini_parsed and, optionally, __shini_parsed_section.
