I have a Python script which needs a --file xyz.json argument.
The thing is, my JSON is immense, so it is gzipped. Of course I could un-gzip it and then run the script, but that seems wasteful. Is there a clever way I can get this to work while doing something like zcat xyz.json.gz | myscript.py --file ?????. I don't want to modify myscript.py to read stdin instead of a file unless there's no other way to get this done.
Thanks!
Try:
myscript.py --file <(zcat xyz.json.gz)
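The shell replaces <(zcat xyz.json.gz) with a file name (such as /dev/fd/63 or /proc/self/fd/11) that refers to a pipe carrying the command's output. Provided that the script just reads the file sequentially, and does not seek forward and backward, this should work.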
The <( ... ) is called process substitution.
As an elaboration on what happens:
% awk 'BEGIN{print "filename:", ARGV[1]};1' <(echo first; sleep 1; echo second)
filename: /proc/self/fd/11
first
second
The second gets printed after a delay. So: Awk gets the filename /proc/self/fd/11, and starts to process it. It will immediately see the first line, and print it out. Then, after the sleep, it will see the second line, and print it as well.
You can use /dev/stdin or (equivalently) /dev/fd/0:
zcat xyz.json.gz | myscript.py --file /dev/stdin
zcat xyz.json.gz | myscript.py --file /dev/fd/0
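To check this idea without touching the real script, here is a minimal sketch. read_file_arg is a hypothetical stand-in for myscript.py (it just prints whatever path follows --file), the sample JSON is made up, and it assumes a system that provides /dev/stdin (Linux and macOS do):

```shell
# Hypothetical stand-in for myscript.py: prints the file named after --file.
read_file_arg() {
    [ "$1" = "--file" ] || return 2
    cat -- "$2"
}

# Build a small gzipped JSON file to play with (made-up content).
workdir=$(mktemp -d)
printf '{"key": "value"}\n' | gzip -c > "$workdir/xyz.json.gz"

# Decompress on the fly; the reader opens /dev/stdin like any other path.
result=$(gzip -dc "$workdir/xyz.json.gz" | read_file_arg --file /dev/stdin)
printf '%s\n' "$result"

rm -r "$workdir"
```

As with process substitution, this only works if the script reads the file sequentially, since a pipe cannot be rewound.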
Related
I keep text files with definitions in a folder. I like to convert them to spoken word so I can listen to them. I already do this manually by running a few commands to insert some pre-processing codes into the text files and then convert the text to spoken word like so:
sed 's/\..*$/[[slnc 2000]]/' input.txt inserts a control code after first period
sed 's/$/[[slnc 2000]]/' input.txt inserts a control code at end of each line
cat input.txt | say -v Alex -o input.aiff
Instead of having to retype these each time, I would like to create a Bash script that pipes the output of these commands to the final product. I want to call the script with the script name, followed by an input file argument for the text file. I want to preserve the original text file so that if I open it again, none of the control codes are actually inserted, as the only purpose of the control codes is to insert pauses in the audio file.
I've tried writing
#!/bin/bash
FILE=$1
sed 's/$/ [[slnc 2000]]/' FILE -o FILE
But I get hung up immediately as it says sed: -o: No such file or directory. Can anyone help out?
If you just want to use foo.txt to generate foo.aiff with control characters, you can do:
#!/bin/sh
for file; do
    # Skip arguments that do not end in .txt
    test "${file%.txt}" = "${file}" && continue
    sed -e 's/\..*$/[[slnc 2000]]/' "$file" |
        sed -e 's/$/[[slnc 2000]]/' |
        say -v Alex -o "${file%.txt}".aiff
done
Call the script with your .txt files as arguments (eg, ./myscript *.txt) and it will generate the .aiff files. Be warned, if say overwrites files, then this will as well. You don't really need two sed invocations, and the sed that you're calling can be cleaned up, but I don't want to distract from the core issue here, so I'm leaving that as you have it.
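For reference, the two substitutions can be combined into one sed process. This is only a sketch: add_pauses is a hypothetical helper name and the sample text is made up, so it runs without say or any input file.

```shell
# add_pauses is a hypothetical helper name; it applies both substitutions
# from the question in a single sed process.
add_pauses() {
    sed -e 's/\..*$/[[slnc 2000]]/' -e 's/$/[[slnc 2000]]/' "$@"
}

# Made-up sample text, so the sketch runs without say or any input file.
sample=$(printf 'Word. Its definition.\nNo period here\n' | add_pauses)
printf '%s\n' "$sample"
```

In the loop above, the pipeline would then become add_pauses "$file" | say -v Alex -o "${file%.txt}".aiff.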
This will:-
a} Make a list of your text files to process in the current directory, with find.
b} Apply your sed commands to each text file in the list, but only for the current use, allowing you to preserve them intact.
c} Call "say" with the edited files.
I don't have say, so I can't test that or the control codes; but as long as you have Ed, the loop works. I've used it many times. I learned it as a result of exposure to FORTH, which is a language that still permits unterminated loops. I used to have problems with remembering to invoke next at the end of the script in order to start it, but I got over that by defining my words (functions) first, in FORTH style, and then always placing my single-use commands at the end.
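#!/bin/bash
# bash rather than sh, because of [[ ... ]] below

# Process the next entry while the stack file is non-empty, then clean up.
next() {
    [[ -s stack ]] && main
    end
}

main() {
    # Print the first line of the stack (the next file name).
    line=$(ed -s stack < edprint+.txt)
    infile=$(cat "${line}" | sed 's/\..*$/[[slnc 2000]]/' | sed 's/$/[[slnc 2000]]/')
    # One output file per input file, instead of overwriting input.aiff.
    say "${infile}" -v Alex -o "${line%.txt}.aiff"
    # Delete the first line of the stack (pop).
    ed -s stack < edpop+.txt
    next
}

end() {
    rm -v ./stack
    rm -v ./edprint+.txt
    rm -v ./edpop+.txt
    exit 0
}

find . -maxdepth 1 -type f -name '*.txt' > stack
cat > edprint+.txt << EOF
1
q
EOF
cat > edpop+.txt << EOF
1d
wq
EOF
next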
I am trying to provide a file as input to my shell script, which in turn should test whether the file contains a specific word and decide which command to execute. I have not figured out yet where the mistake might lie. Please find the shell script that I wrote:
#!/bin/(shell)
input_file="$1"
output_file="$2"
grep "val1" | awk -f ./path/to/script.awk $input_file > $output_file
grep "val2" | sh ./path/to/script.sh $input_file > $output_file
When I input the file that uses awk, everything gets executed as expected, but for the second command I don't even get an output file. Any help is much appreciated.
Cheers,
You haven't specified this in your question, but I'm guessing you have a file with the keyword, e.g. file cmdfile that contains x-g301. And then you run your script like:
./script "input_file" "output_file" < cmdfile
If so, the first grep command will consume the whole cmdfile on stdin while searching for the first pattern, and nothing will be left for the second grep. That's why the second grep, and then your second script, produces no output.
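A small self-contained demonstration of that effect, using made-up input and grep -c (which prints the number of matching lines) so that the result is visible:

```shell
# Both greps share one stdin. The first reads it to end-of-file, so the
# second finds nothing left to read.
counts=$(printf 'val1\nval2\n' | {
    grep -c "val1"
    grep -c "val2" || true   # exits non-zero because it matched nothing
})
printf '%s\n' "$counts"
```

The first count is 1 and the second is 0, even though the input contains a val2 line.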
There are many ways to fix this, but choosing the right one depends on what exactly you are trying to do, and what that cmdfile looks like. Assuming that's a larger file with other things than just the command pattern, you could pass that file as a third argument to your script, like this:
./script "input_file" "output_file" "cmdfile"
And have your script handle it like this:
#!/bin/bash
input_file="$1"
output_file="$2"
cmdfile="$3"
if grep -q "X-G303" "$cmdfile"; then
    awk -f ./mno/script.awk "$input_file" > t1.json
fi
if grep -q "x-g301" "$cmdfile"; then
    sh ./mno/tm.sh "$input_file" > t2.json
fi
Here I'm also assuming that your awk and sh scripts don't really need the output from grep, since you're giving them the name of the input file.
Note the proper way to use grep for existence search is via its exit code (and the muted output with -q). Instead of the if we could have used shortcircuiting (grep ... && awk ...), but this way is probably more readable.
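For comparison, here is a minimal sketch of the short-circuit form. The cmdfile contents are made up, and chosen is a hypothetical variable that only records which branch would run, so nothing from ./mno is needed to try it:

```shell
# Made-up command file containing only the second keyword.
cmdfile=$(mktemp)
printf 'some header\nx-g301\n' > "$cmdfile"

chosen=""
# The right-hand side of && runs only when grep -q exits with status 0.
grep -q "X-G303" "$cmdfile" && chosen="${chosen}awk "
grep -q "x-g301" "$cmdfile" && chosen="${chosen}sh "
printf 'would run: %s\n' "$chosen"

rm -f "$cmdfile"
```

Only the sh branch is selected here, because only x-g301 appears in the file.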
I am looking for a bash one-liner that duplicates stdin to stdout without interleaving. The only solution I have found so far is to use tee, but that produces interleaved output. What I mean by this:
If e.g. a file f reads
a
b
I would like to execute
cat f | HERE_BE_COMMAND
to obtain
a
b
a
b
If I use tee - as the command, the output typically looks something like
a
a
b
b
Any suggestions for a clean solution?
Clarification
The cat f command is just an example of where the input can come from. In reality, it is a command that can (should) only be executed once. I also want to refrain from using temporary files, as the processed data is sort of sensitive and temporary files are always error-prone when the executed command gets interrupted. Furthermore, I am not interested in a solution that involves additional scripts (as stated above, it should be a one-liner) or preparatory commands that need to be executed prior to the actual duplication command.
Solution 1:
<command_which_produces_output> | { a="$(</dev/stdin)"; echo "$a"; echo "$a"; }
In this way, you're saving the content from the standard input in a (choose a better name please), and then echo'ing twice.
Notice $(</dev/stdin) is a similar but more efficient way to do $(cat /dev/stdin).
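A portable sketch of the same idea, wrapped in a function so it can be tried on its own. dup_stdin is a hypothetical name, and it uses $(cat) instead of $(</dev/stdin) so that it also runs in plain sh. Keep in mind that command substitution drops trailing newlines, so trailing empty lines in the input are not reproduced exactly.

```shell
# dup_stdin is a hypothetical name. It buffers all of stdin in memory,
# then prints the buffered text twice, one copy after the other.
dup_stdin() {
    buffered=$(cat)          # portable equivalent of $(</dev/stdin)
    printf '%s\n%s\n' "$buffered" "$buffered"
}

doubled=$(printf 'a\nb\n' | dup_stdin)
printf '%s\n' "$doubled"
```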
Solution 2:
Use tee in the following way:
<command_which_produces_output> | tee >(echo "$(</dev/stdin)")
Here, you're firstly writing to the standard output (that's what tee does), and also writing to a FIFO file created by process substitution:
>(echo "$(</dev/stdin)")
See for example the file it creates in my system:
$ echo >(echo "$(</dev/stdin)")
/dev/fd/63
Now, the echo "$(</dev/stdin)" part is just the way I found to firstly read the entire file before printing it. It echo'es the content read from the process substitution's standard input, but once all the input is read (not like cat that prints line by line).
Store the second input in a temp file.
cat f | tee /tmp/showlater
cat /tmp/showlater
rm /tmp/showlater
Update:
As shown in the comments (#j.a.), the solution above will need to be adjusted to the OP's real needs. Calling it will be easier from a function, and you need to decide what to do with errors in your initial commands and in the tee/cat/rm steps.
I recommend tee /dev/stdout.
cat f | tee /dev/stdout
One possible solution I found is the following awk command:
awk '{d[NR] = $0} END {for (i=1;i<=NR;i++) print d[i]; for (i=1;i<=NR;i++) print d[i]}'
However, I feel there must be a more "canonical" way of doing this using.
A simple bash script?
Note that this stores all of stdin in memory. Why not store the output in a file and read the file twice if you need to?
full=""
while IFS= read -r line
do
    printf '%s\n' "$line"
    full="$full$line"$'\n'
done
printf '%s' "$full"
The best way would be to store the output in a file and show it later on. Using tee has the advantage of showing the output as it comes:
if tmpfile=$(mktemp); then
    commands | tee "$tmpfile"
    cat "$tmpfile"
    rm "$tmpfile"
else
    echo "Error creating temporary file" >&2
    exit 1
fi
If the amount of output is limited, you can do this:
output=$(commands); printf '%s\n%s\n' "$output" "$output"
Specifically, I'm using a combination of >> and tee in a custom alias to store new Homebrew updates in a text file, as well as output on screen:
alias bu="echo `date "+%Y-%m-%d at %H:%M"` \
>> ~/Documents/Homebrew\ Updates.txt && \
brew update | tee -a ~/Documents/Homebrew\ Updates.txt"
Question: What if I wish to prepend this output in my textfile, i.e. placed at the beginning of the file as opposed to appending it to the end?
Edit1: As someone reported in the answers below, the use of temp files might be a good approach, which at least helped me partially:
targetLog="~/Documents/Homebrew\ Updates.txt"
alias bu="(brew update | cat - $targetLog \
> /tmp/out1 && mv /tmp/out1 $targetLog \
&& echo `date "+%Y-%m-%d at %H:%M":%S` | \
cat - $targetLog > /tmp/out2 \
&& mv /tmp/out2 $targetLog)"
But the problem is the output to STDOUT (previously made possible by tee), which I'm not sure can be incorporated in this tempfile approach …?
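sed will happily do that for you, using -i to edit in place (GNU sed syntax shown; BSD/macOS sed needs an explicit suffix argument after -i and a different form of the i command), e.g.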
sed -i -e "1i `date "+%Y-%m-%d at %H:%M"`" some_file
This works by creating an output file:
Let's say we have the initial contents on file.txt
echo "first line" > file.txt
echo "second line" >> file.txt
So, file.txt is our 'bottom' text file. Now prepend into a new 'output' file
echo "add new first line" | cat - file.txt > output.txt # <--- Just this command
Now, output has the contents the way we want. If you need your old name:
mv output.txt file.txt
cat file.txt
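The steps above can be wrapped in a small function. This is a sketch under assumptions: prepend_text is a hypothetical name, and it relies on mktemp accepting a template argument (GNU and BSD mktemp both do). The temporary file is created next to the target so that mv is a rename on the same filesystem.

```shell
# prepend_text is a hypothetical helper. It writes stdin followed by the
# old contents of "$1" to a temp file next to it, then renames that file
# over the original.
prepend_text() {
    target=$1
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    cat - "$target" > "$tmp" && mv "$tmp" "$target"
}

# Try it on a scratch file with the same contents as the example above.
log=$(mktemp)
printf 'first line\nsecond line\n' > "$log"
echo "add new first line" | prepend_text "$log"
contents=$(cat "$log")
printf '%s\n' "$contents"
rm -f "$log"
```

Note that the resulting file takes on the permissions of the temp file, and that a symlink given as the target is replaced by a regular file.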
The only simple and safe way to modify an input file using bash tools, is to use a temp file, eg. sed -i uses a temp file behind the scenes (but to be robust sed needs more).
Some of the methods used have a subtle "can break things" trap: it is triggered when, rather than running your command on the real data file, you run it on a symbolic link to the file you intend to modify. Unless catered for correctly, this can replace the link with a regular file that receives the modifications, leaving the original file unmodified and the symlink gone (and no error exit code results).
To avoid this with sed, you need to use the --follow-symlinks option.
For other methods, just be aware that it needs to follow symlinks (when you act on such a link)
Using a temp file, then rm temp file works only if "file" is not a symlink.
One safe way is to use sponge from package moreutils
Unlike a shell redirect, sponge soaks up all its input before opening the output file. This allows for constructing pipelines that read from and write to the same file.
sponge is a good general way to handle this type of situation.
Here is an example, using sponge
hbu=~/'Documents/Homebrew Updates.txt'
{ date "+%Y-%m-%d at %H:%M"; cat "$hbu"; } | sponge "$hbu"
Simplest way IMO would be to use echo and cat:
echo "Prepend" | cat - inputfile > outputfile
Or for your example basically replace the tee -a ~/Documents/Homebrew\ Updates.txt with cat - ~/Documents/Homebrew\ Updates.txt > ~/Documents/Homebrew\ Updates.txt
Edit: As stated by hasturkun this won't work, try:
echo "Prepend" | cat - file | tee file
But this isn't the most efficient way of doing it any more, and it is racy: tee may truncate file before cat has finished reading it, so data can be lost.
Similar to the accepted answer, however if you are coming here because you want to prepend to the first line - rather than prepend an entirely new line - then use this command.
sed -i "1 s/^/string_replacement/" some_file
The -i flag will do a replacement within the file (rather than creating a new file).
Then the 1 will only do the replacement on line 1.
Finally, the s command is used which has the following syntax s/find/replacement/flags.
In our case we don't need any flags. The ^ is called a caret and it is used to represent the very start of a string.
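To preview the result before letting -i rewrite the file, the same expression can be run as a filter. A minimal sketch with made-up input:

```shell
# Same sed expression as above, but reading stdin and writing stdout,
# so no file is modified. Only line 1 gets the prefix.
preview=$(printf 'one\ntwo\n' | sed '1 s/^/string_replacement/')
printf '%s\n' "$preview"
```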
Try this http://www.unix.com/shell-programming-scripting/42200-add-text-beginning-file.html
There is no direct operator or command AFAIK. You use echo, cat, and mv to get the effect.
{ date; brew update |tee /dev/tty; cat updates.txt; } >updates.txt.new
mv updates.txt.new updates.txt
I've no idea why you want to do this. It's pretty standard that logs like this have later entries appearing, well, later in the file.
How to run the first process from a list of processes stored in a file and immediately delete the first line as if the file was a queue and I called "pop"?
I'd like to call the first command listed in a simple text file with \n as the separator in a pop-like fashion:
Figure 1:
cmdqueue.lst :
proc_C1
proc_C2
proc_C3
.
.
Figure 2:
Pop the first command via popcmd:
proc_A | proc_B | popcmd cmdqueue.lst | proc_D
Figure 3:
cmdqueue.lst :
proc_C2
proc_C3
proc_C4
.
.
Ooh, that's an amusing one-liner.
Okay, here's the deal. What you want is a program that, when called, prints the first line of the file to stdout, then deletes that line from the file. Sounds like a job for sed(1).
Try
proc_A | proc_B | `(head -1 cmdstack.lst; sed -i -e '1d' cmdstack.lst)` | proc_D
I'm sure that someone who had already had their coffee could change the sed program to not need the head(1) call, but that works, and shows off using a subshell ("( foo )" runs in a sub-process.)
pop-cmd.py:
#!/usr/bin/env python
import os, shlex, sys
from subprocess import call

filename = sys.argv[1]
lines = open(filename).readlines()
if lines:
    # Pop the first line and write the remaining lines back.
    command = lines[0].rstrip()
    open(filename, "w").writelines(lines[1:])
    if command:
        # Run the popped command, passing along any extra arguments.
        sys.exit(call(shlex.split(command) + sys.argv[2:]))
Example:
proc_A | proc_B | python pop-cmd.py cmdstack.lst | proc_D
I assume that you are constantly appending to the file also, so rewriting the file puts you in danger of overwriting data. For this type of task I think you would be better using individual files for each queue entry, using date/time to determine order, and then as you process each file you could append the data to a log file and then delete the trigger file.
Really need more information in order to suggest a good solution. It's important to know how the file is getting updated. Is it a lot of separate processes, just one process, etc.
I think you would need to rewrite the file - e.g. run a command to list all lines but the first, write that to a temporary file and rename it to the original. That could be done using tail or awk or perl depending on the commands you have available.
If you want to treat a file like a stack, then a better approach would be to have the top of the stack at the end of the file.
Thus you can easily cut off the file at the beginning of the last line (= pop), and simply append to the file as you push.
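A minimal sketch of that layout, under several assumptions: push_cmd and pop_cmd are hypothetical names, every line ends with a newline, the text is single-byte (ASCII), only one process touches the file, and truncate from GNU coreutils is available (it is not part of POSIX).

```shell
# push_cmd and pop_cmd are hypothetical helper names.
# Push: append one line to the end of the file named by "$1".
push_cmd() {
    printf '%s\n' "$2" >> "$1"
}

# Pop: print the last line, then cut it off the end of the file.
pop_cmd() {
    [ -s "$1" ] || return 1                  # empty stack: nothing to pop
    last=$(tail -n 1 "$1")
    # Keep everything except the last line and its trailing newline.
    # ${#last} counts characters, so this assumes single-byte text.
    keep=$(( $(wc -c < "$1") - ${#last} - 1 ))
    truncate -s "$keep" "$1"                 # GNU coreutils, not POSIX
    printf '%s\n' "$last"
}

stack=$(mktemp)
push_cmd "$stack" "proc_C1"
push_cmd "$stack" "proc_C2"
popped=$(pop_cmd "$stack")
remaining=$(cat "$stack")
rm -f "$stack"
printf 'popped: %s, remaining: %s\n' "$popped" "$remaining"
```

Because only the tail of the file changes, nothing before the popped line has to be rewritten.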
You can use a little bash script; name it "popcmd":
#!/bin/bash
cmd=$(head -n 1 "$1")
tail -n +2 "$1" > ~tmp~
mv -f ~tmp~ "$1"
# Left unquoted on purpose so the line is split into command and arguments
$cmd
edit: Using sed for the middle two lines, like Charlie Martin showed, is much more elegant, of course:
#!/bin/bash
cmd=$(head -n 1 "$1")
sed -i -e '1d' "$1"
# Left unquoted on purpose so the line is split into command and arguments
$cmd
edit: You can use this exactly as in your example usage code:
proc_A | proc_B | popcmd cmdstack.lst | proc_D
You can't write to the beginning of a file, so cutting out line 1 would be a lot of work (rewrite the rest of the file (which isn't actually that much work for the programmer (it's what every other answer post has written for you :) ) ) ).
I'd recommend keeping the whole thing in memory and using a classic stack rather than a file.