Pass Every Line of Input as stdin for Invocation of Utility - xpath

I have a file containing valid XML documents (one per line), and I want to run a utility (xpath) on each line, one at a time.
I tried xargs, but it doesn't seem to have an option to pass the line as stdin:
% cat <xmls-file> | xargs -p -t -L1 xpath -p "//Path/to/node"
Cannot open file '//Path/to/node' at /System/Library/Perl/Extras/5.12/XML/XPath.pm line 53.
I also tried parallel --spreadstdin, but that doesn't seem to work either:
% cat <xmls-file> | parallel --spreadstdin xpath -p "//Path/to/node"
junk after document element at line 2, column 0, byte 1607

If you want every line of a file to be split off and fed to a utility as its stdin, you could use a while loop in the bash shell:
while IFS= read -r f
do
    echo "$f" > /tmp/input$$                  # write the current line to a temp file
    xpath -p "//Path/to/node" < /tmp/input$$  # and give that file to xpath as stdin
    rm -f /tmp/input$$
done < xmls-file
The $$ expands to the shell's process ID, which gives the temporary file a unique name.
I assume xmls-file contains, on each line, what you want read into $f, and that you want it as stdin for a command, not as a parameter to the command.
On the other hand, your specification may be incorrect, and you may instead need each line to be part of a command. In that case, delete the echo and rm lines, and change the xpath command to include "$f" wherever the line from the file is needed.
I haven't done much with XML, so the command inside the do block may need adjusting.
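The temporary file is not strictly needed: piping each line straight into the command gives it the same stdin. Below is a minimal bash sketch of that variant; each_line_as_stdin is a hypothetical helper name, not an existing utility, and the xpath invocation in the usage comment is simply copied from the question.

```shell
# each_line_as_stdin (hypothetical helper): run "$@" once per input line,
# with that single line as the command's standard input.
each_line_as_stdin() {
    local line
    # IFS= and -r keep each line verbatim; the || test also handles a last line without a newline
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line" | "$@"
    done
}

# Usage (query copied from the question):
#   each_line_as_stdin xpath -p "//Path/to/node" < xmls-file
```

Because the command's stdin is the pipe from printf, it cannot accidentally consume the remaining lines of xmls-file.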

You are very close with the GNU Parallel version; only -N1 is missing (with --spreadstdin, -N1 passes one record, i.e. one line, to each job):
cat <xmls-file> | parallel -N1 --spreadstdin xpath -p "//Path/to/node"

Related

Writing a Bash script that takes a text file as input and pipes the text file through several commands

I keep text files with definitions in a folder. I like to convert them to spoken word so I can listen to them. I already do this manually by running a few commands to insert some pre-processing codes into the text files and then convert the text to spoken word like so:
sed 's/\..*$/[[slnc 2000]]/' input.txt    # inserts a control code after the first period
sed 's/$/[[slnc 2000]]/' input.txt        # inserts a control code at the end of each line
cat input.txt | say -v Alex -o input.aiff
Instead of having to retype these each time, I would like to create a Bash script that pipes the output of these commands to the final product. I want to call the script with the script name, followed by an input file argument for the text file. I want to preserve the original text file so that if I open it again, none of the control codes are actually inserted, as the only purpose of the control codes is to insert pauses in the audio file.
I've tried writing
#!/bin/bash
FILE=$1
sed 's/$/ [[slnc 2000]]/' FILE -o FILE
But I get hung up immediately as it says sed: -o: No such file or directory. Can anyone help out?
If you just want to use foo.txt to generate foo.aiff with control characters, you can do:
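#!/bin/sh
# Usage: ./myscript file1.txt file2.txt ...
for file; do
    test "${file%.txt}" = "${file}" && continue    # skip arguments that don't end in .txt
    # The .txt file is only read; the control codes go down the pipe, not into the file
    sed -e 's/\..*$/[[slnc 2000]]/' "$file" |
    sed -e 's/$/[[slnc 2000]]/' |
    say -v Alex -o "${file%.txt}".aiff             # foo.txt -> foo.aiff
done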
Call the script with your .txt files as arguments (e.g., ./myscript *.txt) and it will generate the .aiff files; the .txt files themselves are never modified. Be warned: if say overwrites existing files, this script will as well. You don't really need two sed invocations, and the sed that you're calling can be cleaned up, but I don't want to distract from the core issue here, so I'm leaving that as you have it.
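As a sketch of the single-sed version mentioned above, both substitutions can go in one sed call with two -e expressions. This assumes the first control code is meant to go right after the first period, as the question describes; note that the question's s/\..*$/.../ instead replaces everything from the first period to the end of the line. add_pause_codes is a hypothetical name.

```shell
# add_pause_codes (hypothetical helper): print the text with pause codes added,
# leaving the input file(s) unchanged. Reads stdin when no file is given.
add_pause_codes() {
    # 1st expression: insert a pause code after the first period on each line
    #   (assumption: this is the intent; s/\..*$/.../ would delete the rest of the line)
    # 2nd expression: append a pause code at the end of each line
    sed -e 's/\./.[[slnc 2000]]/' -e 's/$/[[slnc 2000]]/' "$@"
}

# Usage with macOS say, as in the question:
#   add_pause_codes input.txt | say -v Alex -o input.aiff
```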
This will:
a) Make a list of the text files to process in the current directory, with find.
b) Apply your sed commands to each text file in the list, only for the current use, leaving the files themselves intact.
c) Call say with the edited text.
I don't have say, so I can't test that or the control codes; but as long as you have ed, the loop works. I've used it many times. I learned it as a result of exposure to FORTH, a language that still permits unterminated loops. I used to have trouble remembering to invoke next at the end of the script to start it, but I got over that by defining my words (functions) first, in FORTH style, and always placing my single-use commands at the end.
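#!/bin/sh
# "stack" is a work queue of file names; ed scripts peek at and pop its first line.
next() {
    [ -s stack ] && main   # process the next file while the stack is non-empty
    end
}
main() {
    line=$(ed -s stack < edprint+.txt)   # peek: print the first line of stack
    infile=$(sed 's/\..*$/[[slnc 2000]]/' "${line}" | sed 's/$/[[slnc 2000]]/')
    say -v Alex -o "${line%.txt}.aiff" "${infile}"   # one .aiff per .txt file
    ed -s stack < edpop+.txt             # pop: delete the first line of stack
    next
}
end() {
    rm -v ./stack
    rm -v ./edprint+.txt
    rm -v ./edpop+.txt
    exit 0
}
find *.txt -type f > stack   # list the .txt files in the current directory
cat > edprint+.txt << EOF
1
q
EOF
cat > edpop+.txt << EOF
1d
wq
EOF
next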

Sending file contents to another command bash

I have a plain text file with two columns. I need to take each line which contains two columns and send them to a command.
The source file looks like this:
potato potato2
The line needs to be sent to another command, so it looks like this:
command potato potato2
The output can just go to stdout.
It's been such a long time since I've written even a simple bash script...
I assume that your file contains two columns per line, separated by either spaces or tabs.
xargs -n 2 command < file.txt
See: man xargs
It looks like you just need to read the file line by line, so the following code should do it:
while read -r line
do
echo "$line" | xargs your-other-command #Use xargs to convert input into arguments
done < source-file.txt
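If you would rather not depend on xargs's word splitting, read can split each line into the two columns itself. A minimal bash sketch, assuming whitespace-separated columns as above; run_per_pair is a hypothetical helper name and command stands for whatever you want to run.

```shell
# run_per_pair (hypothetical helper): for each "col1 col2" line on stdin,
# run "$@" col1 col2. Any extra fields end up in the second argument.
run_per_pair() {
    local first second
    while read -r first second; do
        [ -n "$first" ] || continue          # skip blank lines
        # </dev/null stops the command from swallowing the rest of the input file
        "$@" "$first" "$second" </dev/null
    done
}

# Usage:
#   run_per_pair command < file.txt     # runs: command potato potato2
```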

Concatenate a string to each file line

I have a text file (delete_names.txt) with one name per line. The file has around 200 lines, like this:
ABAA742_2012-01-13_decont.fa
ABAA1502_2014-08-08_decont.fa
I want to read each line of the file and append the string ".faa" to the end of each line, to get:
ABAA742_2012-01-13_decont.fa.faa
ABAA1502_2014-08-08_decont.fa.faa
I tried to build the final string, but the substring always ends up at the start.
while read line; do
a='.faa'
echo -E "$line$a";
done < delete_names.txt
output from the code above:
.faa742_2012-01-13_decont.fa
.faa1502_2014-08-08_decont.fa
The final goal, after getting the concatenated name, is to delete that file from the directory.
Short sed + xargs approach:
sed 's/$/.faa/' delete_names.txt | xargs -I {} rm {}
Or just with single xargs command:
xargs -a delete_names.txt -I {} rm {}.faa
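-a file: read items from file instead of standard input. -a is a GNU xargs option; BSD/macOS xargs doesn't have it, but redirecting works the same way there: xargs -I {} rm {}.faa < delete_names.txt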
I frequently use GNU Parallel for this kind of job; it could be something like:
cat delete_names.txt | parallel rm -f {}.faa
This pipes each name into parallel, which runs rm in one or more processes, replacing {} with the input line, so {}.faa gives names like ABAA742_2012-01-13_decont.fa.faa. If the files are actually named ABAA742_2012-01-13_decont.faa instead, use {.}.faa ({.} is the input line with its extension removed) or simply {}a.
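The odd output in the question, where ".faa" seems to overwrite the first four characters (ABAA742... becomes .faa742...), most likely means the file has Windows (CRLF) line endings: each line read ends with an invisible carriage return, so the terminal jumps back to the start of the line before printing ".faa". That carriage return would also end up inside the file names passed to rm. A minimal bash sketch that strips it first; append_suffix is a hypothetical helper name.

```shell
# append_suffix (hypothetical helper): print each stdin line with SUFFIX appended,
# after removing a trailing carriage return left by CRLF (Windows) line endings.
append_suffix() {
    local suffix=$1 line
    while IFS= read -r line || [ -n "$line" ]; do
        line=${line%$'\r'}                 # bash: drop one trailing \r, if present
        printf '%s%s\n' "$line" "$suffix"
    done
}

# Usage, e.g. to delete the files afterwards:
#   append_suffix .faa < delete_names.txt | xargs -I {} rm {}
```

Running the file through tr -d '\r' before the sed or xargs commands above has the same effect.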

Search text and append to each end of line of text file - OSX

I'm new to OSX command line tools.
I am trying to find a block of text in a file and append this text at the end of all lines in another text file. At run time I don't know what this text will be; I just know it will be located between "BEGINHMM" and "ENDHMM". Also, I don't know the makeup of the destination file, except that it will not be an empty text file.
The command which finds the block of text of interest is:
sed -n '/<BEGINHMM>/,/<ENDHMM>/p' proto
where "proto" is a text file containing the text of interest.
I've been trying to pipe the output of the above command to another 'sed' command, in the following manner:
xargs -I '{}' sed -i .bak 's/$/{}/' monophones0.txt
but I am getting some bizarre results; for example, I see "{}" inserted in the text.
I've also tried piping to:
xargs -0 sed -i .bak 's/$/&/' monophones0.txt
but I just get the printout (similar to terminal echo) of the text I am trying to grab.
Ultimately I want to loop over several 'proto' files in multiple directories and copy the text between the "BEGINHMM", "ENDHMM" block in each directory, and append the selected text to that directory's monophones.txt lines.
I am running the commands in the terminal (bash) on OSX 10.12.2.
Any help would be appreciated.
(1) Your sed command is of the form sed -n '/A/,/B/p'; this will include the lines on which A and B occur, even if these strings do not appear at the beginning of the line. This form may have other surprises in store for you as well (what do you expect to happen if B is missing or repeated?), but the remainder of this post assumes that's what you want.
(2) It's not clear how you intend to specify the "proto" files, but you do indicate they might be in several directories, so for the remainder of this post, I'll assume they are listed, one per line, in a file named proto.txt in each directory. This will ensure that you don't run into any limitations on command-line length, but the following can easily be modified if you don't want to create such a file.
(3) Here is a script which will use the sed command you've mentioned to copy segments from each of the "proto" files specified in a directory to monophones0.txt in the directory in which the script is executed.
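#!/bin/bash
OUT=monophones0.txt
while IFS= read -r file        # proto.txt lists one "proto" file per line
do
    if [ -r "$file" ] ; then
        sed -n '/<BEGINHMM>/,/<ENDHMM>/p' "$file" >> "$OUT"   # append this file's block
    elif [ -n "$file" ] ; then
        echo "NOT FOUND: $file" >&2   # report unreadable names; blank lines are ignored
    fi
done < proto.txt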
Just like what you did before. tmpfile=$(mktemp); sed -n '/<BEGINHMM>/,/<ENDHMM>/p' proto >$tmpfile; sed -i .bak "r $tmpfile" monophones0.txt; rm $tmpfile. This is the basic idea; there are other checks you need to perform to make this a robust script.
– 4ae1e1
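For the final goal of handling several directories, the same sed command can be wrapped in a loop over directories. A minimal bash sketch that, like the script above, appends each block to the end of monophones0.txt; it assumes each directory contains a file literally named proto, and append_hmm_blocks is a hypothetical name.

```shell
# append_hmm_blocks (hypothetical helper): for each directory argument, append the
# <BEGINHMM>...<ENDHMM> block from DIR/proto to the end of DIR/monophones0.txt.
append_hmm_blocks() {
    local dir
    for dir in "$@"; do
        if [ -r "$dir/proto" ]; then
            sed -n '/<BEGINHMM>/,/<ENDHMM>/p' "$dir/proto" >> "$dir/monophones0.txt"
        else
            echo "NOT FOUND: $dir/proto" >&2
        fi
    done
}

# Usage, e.g. for every subdirectory of the current directory:
#   append_hmm_blocks */
```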

appending file contents as parameter for unix shell command

I'm looking for a unix shell command to append the contents of a file as the parameters of another shell command. For example:
command << commandArguments.txt
xargs was built specifically for this:
cat commandArguments.txt | xargs mycommand
If you have multiple lines in the file, you can use xargs -L1 -P10 to run your command once per line, with up to ten copies running at a time in parallel.
xargs takes its standard in and formats it as positional parameters for a shell command. It was originally meant to deal with short command line limits, but it is useful for other purposes as well.
For example, within the last minute I've used it to connect to 10 servers in parallel and check their uptimes:
echo server{1..10} | tr ' ' '\n' | xargs -n 1 -P 50 -I ^ ssh ^ uptime
Some interesting aspects of this command pipeline:
The names of the servers to connect to were taken from the incoming pipe
The tr is needed to put each name on its own line, because with -I, xargs treats each whole input line as a single argument.
The -n option controls how many arguments are used per command invocation; -n 1 says to start a new ssh process for each one. (With -I, xargs already runs one command per input line.)
By default, the parameters are appended to the end of the command. With -I, one can specify a token (^) that will be replaced with the argument instead.
The -P option controls how many child processes run concurrently, greatly widening the space of interesting possibilities.
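A small sketch comparing how xargs splits the same input with and without -I, using echo in place of ssh so it can run anywhere (demo_xargs_splitting is a hypothetical name): without -I, input is split on whitespace, so -n 1 gives one invocation per word; with -I, each whole line becomes one argument.

```shell
# demo_xargs_splitting (hypothetical): show xargs's two splitting modes side by side.
demo_xargs_splitting() {
    # Without -I: input is split on blanks and newlines, so -n 1 runs echo once per word
    printf 'server1 server2\nserver3\n' | xargs -n 1 echo
    # With -I ^: each whole input line is one argument, substituted for ^
    printf 'server1 server2\nserver3\n' | xargs -I ^ echo '<^>'
}
```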
command `cat commandArguments.txt`
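The backticks (command substitution) are replaced by the output of the enclosed command; the shell then splits that output on whitespace (and expands any glob characters), so each word becomes a separate argument to the outer command. $(cat commandArguments.txt) is the more modern spelling of the same thing.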
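Note that unquoted backtick output is split on whitespace, so a line such as my file.txt turns into two arguments. If each line of the file should stay one argument, a minimal bash sketch is to read the lines into an array first (run_with_line_args is a hypothetical helper; the loop avoids mapfile so it also works with the bash 3.2 shipped on macOS).

```shell
# run_with_line_args (hypothetical helper): run "$@" with every line of FILE
# appended as one argument each, so spaces inside a line are preserved.
run_with_line_args() {
    local file=$1 line
    local -a args=()
    shift
    while IFS= read -r line || [ -n "$line" ]; do
        args+=("$line")
    done < "$file"
    "$@" "${args[@]}"
}

# Usage:
#   run_with_line_args commandArguments.txt mycommand
```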
