Pandoc convert docx to markdown with embedded images - pandoc

When converting .docx file to markdown, the embedded image is not extracted from the docx archive, yet the output contains ![](media/image1.png){width="6.291666666666667in"
height="3.1083333333333334in"}
Is there a parameter that needs to be set in order to get the embedded pictures extracted?

pandoc --extract-media ./myMediaFolder input.docx -o output.md
From the manual:
--extract-media=DIR Extract images and other media contained in or linked from the source document to the path DIR, creating it if necessary, and adjust the images references in the document so they point to the extracted files. Media are downloaded, read from the file system, or extracted from a binary container (e.g. docx), as needed. The original file paths are used if they are relative paths not containing ... Otherwise filenames are constructed from the SHA1 hash of the contents.

Referring to the comment by gridtrak and the problem of an unnecessarily deep directory strucutre (e.g. media/media/image2.jpeg), use the current directory as path DIR, then a folder media is created within the current directory (e.g. media/image2.jpeg):
pandoc --extract-media=. input.docx -o output.md

Related

Batch copy metadata from one file to another (EXIFTOOL)

Im currently using tags such as exiftool -FileModifyDate(<)datetimeoriginal, etc. in terminal/cmd...
Im switching from icloud and the dates in the metadata are exif (meaning finder and windows explorer just see the date they were downloaded)..
It's working but for any sloMo videos that are M4V, they dont change.. I have the originals which do have the right dates and was wondering if there is a way to match file names (123.mp4 = 123.m4v) and copy the metadata over... But I also want to do it in batches. (since every month I will be offloading my iphone every month or so) Thanks!
It will depend upon your directory structure, but your command should be something like this:
exiftool -TagsFromFile %d%f.mp4 "-FileModifyDate<datetimeoriginal" -ext m4v DIR
This assumes the m4v files are in the same directory as the mp4 files. If not, change the %d to the directory path to the mp4 files.
Breakdown:
-TagsFromFile: Instructs exiftool that it will be copying tags from one file to another.
%d%f.mp4: This is the source file for the copy. %d is a exiftool variable for the directory of the current m4v file being processed. %f is the filename of the current m4v file being processed, not including the extension. The thing to remember is that you are processing m4v files that are in DIR and this arguments tells exiftool how to find the source mp4 file for the tag copy. A common mistake is to think that exiftool is finding the source files (mp4 in this case) to copy to the target files (m4v) when exiftool is doing the reverse.
"-FileModifyDate<datetimeoriginal": The tag copy operation you want to do. Copies the DateTimeOriginal tag in the file to the system FileModifyDate.
-ext m4v: Process only m4v files.
Replace DIR with the filenames/directory paths you want to process. Add -r to recurse into sub-directories. If this command is run under Unix/Mac, reverse any double/single quotes to avoid bash interpretation.

Extracting specific files with file extension from a .tar.xz archive using MacOS terminal

I have a number of compressed archives with the extension .tar.xz. I am advised that, when decompressed, the total size required is around 2TB.
Within the archives are a number of images that I am solely after.
Is there a method to solely extract files for example with the extensions .jpeg, .jpeg and .gif from the compressed archives without having to extract every file?
Thanks
It's trivial to just extract one of the file types; for example:
tar -xjf archive.tar.xz '*.jpeg'
will extract all files with the .jpeg extension. It's important to quote the *, as otherwise the shell would attempt to expand it, and would only try to match only the files that were found (or fail because there were no files with that name).
You can similarly use other patterns like '*.gif', or both together:
tar -xjf archive.tar.xz '*.jpeg' '*.gif'
Because you tag that you're using OSX, I'll skip the need to use the --wildcards option, which is needed when trying to extract only those files under linux.

How to read images from folders in matlab

I have six folders like this >> Images
and each folder contains some images. I know how to read images in matlab BUT my question is how I can traverse through these folders and read images in abc.m file (this file is shown in this image)
So basically you want to read images in different folders without putting all of the images into one folder and using imread()? Because you could just copy all of the images (and name them in a way that lets you know which folder they came from) into a your MATLAB working directory and then load them that way.
Use the cd command to change directories (like in *nix) and then load/read the images as you traverse through each folder. You might need absolute path names.
The easiest way is certainly a right clic on the forlder in matlab and "Add to Path" >> "Selected Folders and Subfolders"
Then you can just get images with imread without specifying the path.
if you know the path to the image containing directory, you can use dir on it to list all the files (and directories) in it. Filter the files with the image extension you want and voila, you have an array with all the images in the directory you specified:
dirname = 'images';
ext = '.jpg';
sDir= dir( fullfile(dirname ,['*' ext]) );;
sDir([sDir.isdir])=[]; % remove directories
% following is obsolete because wildcarded dir ^^
b=arrayfun(#(x) strcmpi(x.name(end-length(ext)+1:end),ext),sDir); % filter on extension
sFiles = sDir(b);
You probably want to prefix the name of each file with the directory before using:
sFileName(ii) = fullfile(dirname, sFiles(ii));
You can process this resulting files as you want. Loading all the files for example:
for ii=1:numel(sFiles)
data{i}=imread(sFiles(ii).name)
end
If you also want to recurse the subdirectories, I suggest you take a look at:
How to get all files under a specific directory in MATLAB?
or other solutions on the FEX:
http://www.mathworks.com/matlabcentral/fileexchange/8682-dirr-find-files-recursively-filtering-name-date-or-bytes
http://www.mathworks.com/matlabcentral/fileexchange/15505-recursive-dir
EDIT: added Amro's suggestion of wildcarding the dir call

dblatex ignore --texstyle or -s command

I want to write an asciidoc document and convert it into a pdf document. However, I want to use a format style different than the default ones. To do so I convert the txt file to docbook using asciidoc and then try to convert the resulting docbook xml to a pdf file using dblatex.
The idea is to set a particular tex style for dblatex to obtain the desired pdf result. I've copied the existing docbook.sty style as it is recommended here to do a small style modification. The only change done in the ./docbook file is \setlength{\textwidth}{18cm} to \setlength{\textwidth}{12cm}. However, when I run the command
dblatex --texstyle=./docbook.sty test.txt
Or the command
dblatex -s ./docbook.sty test.txt
Both produce the same result in the style change: none. I mean, no matter which modification I do to ./docbook.sty file, these modifications are not applied to the output. I obtain always the same result, a pdf with the default formatting. Do you guys have any idea where is the problem?
Thanks in advance.
I would recommend:
Copy the Dblatex docbook.sty to a new filename in your working directory which is "obviously yours" (e.g., mydbstyle.sty).
Continue to supply a full or relative path argument to the --texstyle option (e.g., /path/to/mydbstyle.sty or ./mydbstyle.sty). Failing to do so requires that mydbstyle.sty be in a directory enumerated by the TEXINPUTS environment variable (which you likely have not explicitly set).
Within mydbstyle.sty, use the following directives to initialize your style:
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{mydbstyle}[2013/02/15 DocBook Style]
\RequirePackageWithOptions{docbook}
% ...
% your LaTeX commands here
Pass a DocBook 4.5 XML file as an argument to Dblatex (in your example you are passing test.txt which makes me uncertain whether you're passing an AsciiDoc source file).
dblatex --texstyle=./mydbstyle.sty mybook.xml

Listing the contents of a LZMA compressed file?

Is it possible to list the contents of a LZMA file (.7zip) without uncompressing the whole file? Also, can I extract a single file from the LZMA file?
My problem: I have a 30GB .7z file that uncompresses to >5TB. I would like to manipulate the original .7z file without needing to do a full uncompress.
Yes. Start with XZ Utils. There are Perl and Python APIs.
You can find the file you want from the headers. Each file is compressed separately, so you can extract just the one you want.
Download lzma922.tar.bz2 from the LZMA SDK files page on Sourceforge, then extract the files and open up C/Util/7z/7zMain.c. There, you will find routines to extract a specific archive file from a .7z archive. You don't need to extract all the data from all the entries, the example code shows how to extract just the one you are interested in. This same code has logic to list the entries without extracting all the compressed data.
I solved this problem by installing 7zip (https://www.7-zip.org/) and using the parameter l. For example:
7z l file.7z
The output has some descriptive information and the list of files in the compressed files. Then, I call this inside python using the subprocess library:
import subprocess
output = subprocess.Popen(["7z","l", "file.7z"], stdout=subprocess.PIPE)
output = output.stdout.read().decode("utf-8")
Don't forget to make sure the program 7z is accessible in your PATH variable. I had to do this manually in Windows.

Resources