Sejda-console split by text: set the output file name to the changed value?

I'm looking to use Sejda to burst a PDF file with payslips into individual payslip files. The split by text option perfectly splits the files per employee number (changing value on the page).
I would like to include this changing value in the output filenames, so I can identify the payslips for each employee.
Any idea how this can be achieved?

You can use the [TEXT] prefix, which is part of Sejda's output prefix functionality. It is not in the documentation yet because version 2 of Sejda is still under development and subject to change.
So if you add -p [TEXT], that should do the trick.

Summary
The [TEXT] prefix names each output file after the text value used by splitbytext.
Example
The flag -p [TEXT] successfully named the output documents in a number of my projects. The following is an example of a successful splitbytext extraction performed on Windows:
sejda-console splitbytext -f input.pdf --top 514 --left 61 --width 75 --height 22 -p [TEXT] -o .\outdirectory
FYI: the text I was using to split the document was in the rectangle specified by --top 514 --left 61 (note that the options give top and left offsets, not (x,y) coordinates).
Thanks to Andrea Vacondio for their answer.

Related

Populate a value in a particular column in csv

I have a folder containing 50 Excel sheets in CSV format. I have to populate a particular value, say "XYZ", in column I of all the sheets in that folder.
I am new to Unix and have looked at a couple of pages here and here. Can anyone please provide a sample script to begin with?
For example :
Let's say column C in this case:
A B C
ASFD 2535
BDFG 64486
DFGC 336846
I want to update column C to value "XYZ".
Thanks.
I would export those files into CSV format:
- with semicolon as field separator
- optionally leaving out the column headers (otherwise see the note below)
Then the following combination of a shell loop and a sed script should more or less do the trick:
#! /bin/sh
# Append the value "XYZ" as a new field at the end of every line
for i in *.csv
do
    sed -i -e 's/$/;XYZ/' "$i"
done
-i means edit the file in place; here the substitution appends the value to all lines.
-e specifies the regular expression used for the substitution.
If the CSV files also contain column headers, you might use a similar script to replace "XYZ" with the column name "C" in the first line only.
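As a small sketch of that header fix (the file name and column labels are made up for illustration), GNU sed can patch the first line after the append pass:

```shell
# Create a sample file: a header row plus two data rows
printf 'A;B\nASFD;2535\nBDFG;64486\n' > sample.csv

# Append ";XYZ" to every line, header included (GNU sed's -i edits in place)
sed -i -e 's/$/;XYZ/' sample.csv

# Then fix up line 1 only: rename the appended header cell to "C"
sed -i -e '1s/;XYZ$/;C/' sample.csv

cat sample.csv
```

After the two passes the file reads A;B;C, then ASFD;2535;XYZ and BDFG;64486;XYZ. Note that -i without a backup suffix is GNU sed syntax; BSD/macOS sed would need -i ''.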

Wrong number format when opening ASCII with LabTalk, Origin 9.1

I have a problem reading ASCII files into Origin 9.1. My ASCII file looks like the one below (note that there is 1 space before, 2 spaces between, and 1 space after the numbers):
C:\amiX_TimeHist_P1.dat:
0,19325E-02 0,10000E+00
0,97679E-11 0,99997E-11
0,19769E+10 0,10025E+00
0,39169E+00 0,11636E+00
0,47918E+00 0,13156E+00
Later I want to do the following with a scr file, but for now I write the following in the Script (LabTalk) window in Origin 2015:
open -w C:\amiX_TimeHist_P1.dat;
That command works, but the numbers I get are in the wrong format:
When I read the file with the Import Wizard or with ASCII import, I can choose several options to fit the numbers correctly into my columns. But this has to be done automatically.
Is there a way to read an ASCII file, including setting such parameters, from a script?
Instead of open you should use impASC to import ASCII data. Then you can pass options to the command. In your case, the following should work:
impASC fname:=C:\amiX_TimeHist_P1.dat options.FileStruct.DataStruct:=2 options.FileStruct.MultipleDelimiters:=" " options.FileStruct.NumericSeparator:=1;
If you just type impASC in your script window, a dialog box opens in which you can edit the import options and display the corresponding script command.

Can I set command line arguments using the YAML metadata?

Pandoc supports a YAML metadata block in markdown documents. This can set the title and author, etc. It can also manipulate the appearance of the PDF output by changing the font size, margin width and the frame sizes given to figures that are included. Lots of details are given here.
I'd like to use the metadata block to remember the command line arguments that I'm supposed to be using, such as --toc and --number-sections. I tried this, adding the following to the top of my markdown:
---
title: My Title
toc: yes
number-sections: yes
---
Then I used the command line:
pandoc -o guide.pdf articheck_guide.md
This did produce a table of contents, but didn't number the sections. I wondered why this was, and if there is a way I can specify this kind of thing from the document so that I don't need to add it on the command line.
YAML metadata are not passed to pandoc as arguments, but as variables. When you call pandoc on your MWE, it does not produce this:
pandoc -o guide.pdf articheck_guide.md --toc --number-sections
as we might think it would. Rather, it calls:
pandoc -o guide.pdf articheck_guide.md -V toc:yes -V number-sections:yes
Why, then, does your MWE produce a toc? Because the default LaTeX template makes use of a toc variable:
~$ pandoc -D latex | grep toc
$if(toc)$
\setcounter{tocdepth}{$toc-depth$}
So setting toc to any value should produce a table of contents, at least in LaTeX output. In this template there is no number-sections variable, so that one doesn't work. However, there is a numbersections variable:
~$ pandoc -D latex | grep number
$if(numbersections)$
Setting numbersections to any value will produce numbering in LaTeX output with the default template:
---
title: My Title
toc: yes
numbersections: yes
---
The trouble with this solution is that it only works with some output formats. I thought I had read somewhere on the pandoc mailing list that we would soon be able to use metadata in YAML blocks as intended (i.e. as arguments rather than variables), but I can't find it anymore, so maybe it won't happen very soon.
Have a look at panzer (GitHub repository).
This was recently announced and released by Mark Sprevak: a piece of software that adds the notion of 'styles' to Pandoc.
It's basically a wrapper around Pandoc that exploits the concept of YAML metadata blocks to the maximum.
The 'styles' provide a way to set all options for a Pandoc document conversion process with one line ("I want this document to be an article/CV/notes/letter.").
You can regard this as a more general abstraction than Pandoc templates. Styles are combinations of...
...Pandoc command line options,
...metadata settings,
...templates,
...instructions to run filters, and
...instructions to run pre/postprocessors.
These settings can be customized on a per-output-type as well as a per-document basis. Styles can be...
...combined and
...can bear inheritance relations to each other.
panzer styles simplify Makefiles: they bundle everything concerning the look of a document in one place -- the YAML metadata (a block in the Markdown file, or a separate file).
You just add one line of metadata (style: ...) to your document, and it will be treated as a letter/article/CV/notebook or whatever.

JMeter testdata for distributed testing

Here is my JMeter setup:
testing web services
distributed testing: 1 master, 20 slaves (potentially 100 if we decide to go with BlazeMeter)
a file containing test data, one integer per line; see [1] for an example
a thread group with 20 users (20x20=400 requests)
CSV Data Set Config, with \n as separator
[1] Example of testdata file, each line represents an id that will be used as Web Service parameter:
23
8677
10029
29957
1001
My question is: how do I distribute the data among the slaves so that each machine uses a distinct part of the test file and selects test data items in a random manner? One way would be to split the test file into separate parts, but is it possible to make it more dynamic? I am thinking along the lines of "machine x will read lines 0-20, machine y lines 21-40, and so on". In the answer to this question it is mentioned that CSVs are local, but is it possible to dynamically read different lines of the CSV?
If you do go with BlazeMeter, they have a built in function that does exactly this. In advanced options there is a checkbox that says:
[ ] Split any CSV file to unique files and distribute among the load servers.
Have you looked at the split command?
$ split --help
Usage: split [OPTION] [INPUT [PREFIX]]
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default
size is 1000 lines, and default PREFIX is `x'. With no INPUT, or when INPUT
is -, read standard input.
Mandatory arguments to long options are mandatory for short options too.
-a, --suffix-length=N use suffixes of length N (default 2)
-b, --bytes=SIZE put SIZE bytes per output file
-C, --line-bytes=SIZE put at most SIZE bytes of lines per output file
-d, --numeric-suffixes use numeric suffixes instead of alphabetic
-l, --lines=NUMBER put NUMBER lines per output file
--verbose print a diagnostic to standard error just
before each output file is opened
--help display this help and exit
--version output version information and exit
You could do something like:
split -l 20 filename
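As a small sketch (the file names here are made up for illustration), split with numeric suffixes produces one distinct 20-line chunk per slave machine:

```shell
# Create a sample test-data file with one id per line (100 ids in total)
seq 1 100 > testdata.txt

# Cut it into 20-line chunks with numeric suffixes:
# testdata_00, testdata_01, ..., testdata_04
split -d -l 20 testdata.txt testdata_

# Each slave then receives exactly one chunk, e.g.
#   scp testdata_00 slave1:/path/to/jmeter/bin/testdata.csv
ls testdata_*
```

If every slave stores its chunk under the same local path, the CSV Data Set Config in the test plan needs no per-machine changes.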

Compile multiple files into one with title blocks

I'd like to know how to compile multiple pandoc files into one output document, where each input file has a title block.
E.g. suppose I have two files:
ch1.md:
% Chapter 1
% John Doe
% 1 Jan 2014
Here is chapter 1.
ch2.md:
% Chapter 2
% Jane Smith
% 3 Jan 2014
Here is chapter 2.
Typically with multiple input files you can compile them by providing them to pandoc:
pandoc ch1.md ch2.md --standalone -o output.html
However pandoc concatenates the input files before compiling, meaning only the first title block (from ch1.md) is styled appropriately.
I would like each title block to be styled appropriately (e.g. in html, the first line of the title block is styled with <h1 class="title">, the second <h2 class="author"> and so on).
(Note: I have also tried compiling each chapter separately as a standalone document and then concatenating the results using pandoc. This removes the title styling for chapters after the first, though it keeps the styling for the authors/date.)
Why? I can:
compile each chapter in its own separate document and the author/title/date is marked up appropriately
compile the entire document together and author/title/date is marked up appropriately for each chapter (can use the --chapters option)
I could just specify the title with '#' (h1), the author with '##' (h2), and the date with '###' (h3) in each chapter file directly, but then pandoc doesn't "know" what the title/author/date of my document are, so (e.g.) if I compile to LaTeX it won't use the \date{} or \author{} commands appropriately.
I wrote a pandoc filter that, when run on each individual chapter's file, inserts the title block as headings (level 1 for the title, level 2 for the author, level 3 for the date; this is what the HTML writer does).
This lets you run pandoc on each chapter individually (to produce the pandoc'd output plus the formatted title block), and then run pandoc on all the chapters together to compile the single document.
The filter is here on gist (I take no responsibility for malfunctioning code, etc): https://gist.github.com/mathematicalcoffee/e4f25350449e6004014f
You could modify it if you wanted it to format differently (for example like this the author/date appear in the table of contents since they are headings, which is not quite right... but that's a different problem as it happens with the default HTML writer too).
My workflow is now something like this:
FORMAT=latex # as understood by -t <format> in pandoc
FLAGS=--toc # other flags for pandoc, --smart, etc
OUT=pdf # output extension
for f in Chapter*.md; do \
pandoc $FLAGS -t $FORMAT --filter ./chapter.hs $f; \
echo ""; \
done | pandoc $FLAGS --standalone -o thesis.$OUT
where I've chmod +x chapter.hs and it's in the current directory.
(I additionally have a title.txt that I stick out the front with the entire thesis' title block (as opposed to each chapter's title block)).
I received some help from the pandoc google group which was great.
You can't do this with the % title blocks, but you can do it with the new YAML title blocks.
Start each document like this:
---
title: Chapter One
author: Me
date: June 4
...
When the documents are concatenated together, the first value set takes precedence over the others, so subsequent YAML lines using the same field (e.g. "title:") are ignored. (See the README under "Extension: yaml_metadata_block".)
