Read from STDIN and output to a file - bash

I'm having trouble with what I thought would be a very basic script, but it has turned out to be more complicated than I imagined. I want to read data from STDIN and then write that data out to a file.
After much mucking about, I have a script that kind of works; it seems to work fine for text files (at least the MD5 sums match), but it creates an unparseable file if you try it with a JPEG image.
# Start with a clean slate
rm file1
# http://unix.stackexchange.com/q/194388/5769
IFS=
#while read -r -N 8192 data; do
while read -r -N 40 data; do # Reduced bytesize for debugging
echo -n "$data" >> file1
done;
# Some data still remains because of how 'read' uses exit codes
echo -n "$data" >> file1
And the usage*:
$ curl -s "http://loripsum.net/api/plaintext/5/" | ./save.sh # Sucess
$ curl -s "http://lorempixel.com/400/200/food/" | ./save.sh # Failure: No error messages, but the file can't be opened with an image viewer
What's wrong with my code, and why doesn't it work for binary files?
* Yes, in this example, I could just use > to redirect the data directly to a file, but I'm eventually using this code to save POST data from an HTTP form coming in from busybox's httpd through STDIN.

If you want to accept from STDIN and output to a file, this works pretty well...
#!/usr/bin/bash
cat >file1

echo does not deal with binary data properly, and a bash variable cannot hold the NUL bytes that binary files contain, so they are silently dropped; that is why text survives the round trip but a JPEG does not. You may be better advised to use a scripting language like perl if you want to do anything more than simple redirection.
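As a quick check (a sketch, not part of the original answer; expected is just a scratch file used for comparison), you can verify that the cat version preserves binary data where the read/echo loop did not:
$ curl -s "http://lorempixel.com/400/200/food/" | tee expected | ./save.sh
$ md5sum expected file1   # the two sums should match, even for a JPEG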

Related

Bash script using gzip and bcftools running out of memory with large files

This bash script is meant to be part of a pipeline that processes zipped .vcf files containing genomes from multiple patients (which means the files are huge even when zipped, around 3-5 GB).
My problem is that I keep running out of memory when running this script. It is being run on a GCP high-mem VM.
I am hoping there is a way to optimize the memory usage so that this doesn't fail. I looked into it but found nothing.
#!/bin/bash
for filename in ./*.vcf.gz; do
    [ -e "$filename" ] || continue
    name=${filename##*/}
    base=${name%.vcf.gz}
    bcftools query -l "$filename" >> ${base}_list.txt
    for line in `cat ${base}_list.txt`; do
        bcftools view -s "$line" "$filename" -o ${line}.vcf.gz
        gzip ${line}.vcf
    done
done
If you run out of memory when using bcftools query/view or gzip look for options in the manual that might reduce the memory footprint. In case of gzip you might also switch to an alternative implementation. You could even consider switching the compression algorithm altogether (zstd is pretty good).
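As a sketch of such a swap inside the inner loop (assuming zstd is installed; bcftools writes a plain VCF here and zstd compresses it afterwards):
bcftools view -s "$line" "$filename" -o "$line.vcf"
zstd --rm "$line.vcf"   # produces $line.vcf.zst and removes the uncompressed file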
However, I have a feeling that the problem could be for line in `cat ${base}_list.txt`;. The whole file ..._list.txt is loaded into memory before the loop even starts. Also, reading lines that way has all kinds of problems, like splitting lines at whitespace, expanding globs like * and so on. Use this instead:
while read -r line; do
bcftools view -s "$line" "$filename" -o "$line.vcf.gz"
gzip "$line.vcf"
done < "${base}_list.txt"
By the way: are you sure you want bcftools query -l "$filename" >> ${base}_list.txt to append? The file ${base}_list.txt will keep growing each time the script is executed. Consider overwriting the file using > instead of >>.
However, in that case you might not need the file at all as you could use this instead:
bcftools query -l "$filename" |
while read -r line; do
bcftools view -s "$line" "$filename" -o "$line.vcf.gz"
gzip "$line.vcf"
done
You can try to use split on each file (into chunks of a constant size) and then gzip the resulting chunks.
https://man7.org/linux/man-pages/man1/split.1.html
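For example, a rough sketch (the chunk size is arbitrary, and $filename / $base are reused from the question's script):
split -b 500M "$filename" "${base}_part_"   # cut the file into 500 MB pieces
gzip "${base}_part_"*                       # compress each piece separately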

How do I use `sed` to alter a variable in a bash script?

I'm trying to use enscript to print PDFs from Mutt, and hitting character encoding issues. One way around them seems to be to just use sed to replace the problem characters: sed -ir 's/[“”]/"/g' {input}
My test input file is this:
“very dirty”
we’re
I'm hoping to get "very dirty" and we're but instead I'm still getting
â\200\234very dirtyâ\200\235
weâ\200\231re
I found a nice little post on printing to PDFs from Mutt that I used as a starting point. I have a bash script that I point to from my .muttrc with set print_command="$HOME/.mutt/print.sh" -- the script currently reads about like this:
#!/bin/bash
input="$1" pdir="$HOME/Desktop" open_pdf=evince
# Straighten out curly quotes
sed -ir 's/[“”]/"/g' $input
sed -ir "s/[’]/'/g" $input
tmpfile="`mktemp $pdir/mutt_XXXXXXXX.pdf`"
enscript --font=Courier8 $input -2r --word-wrap --fancy-header=mutt -p - 2>/dev/null | ps2pdf - $tmpfile
$open_pdf $tmpfile >/dev/null 2>&1 &
sleep 1
rm $tmpfile
It does a fine job of creating a PDF (and works fine if you give it a file as an argument) but I can't figure out how to fix the curly quotes.
I've tried a bunch of variations on the sed line:
input=sed -r 's/[“”]/"/g' $input
$input=sed -ir "s/[’]/'/g" $input
Per the suggestion at Can I use sed to manipulate a variable in bash? I also tried input=$(sed -r 's/[“”]/"/g' <<< $input) and I get an error: "Syntax error: redirection unexpected"
But none manages to actually change $input -- what is the correct syntax to change $input with sed?
Note: I accepted an answer that resolved the question I asked, but as you can see from the comments there are a couple of other issues here. enscript is taking in a whole file as a variable, not just the text of the file. So trying to tweak the text inside the file is going to take a few extra steps. I'm still learning.
On Editing Variables In General
BashFAQ #21 is a comprehensive reference on performing search-and-replace operations in bash, including within variables, and is thus recommended reading. In this particular case:
Use the shell's native string manipulation instead; this is far higher performance than forking off a subshell, launching an external process inside it, and reading that external process's output. BashFAQ #100 covers this topic in detail, and is well worth reading.
Depending on your version of bash and configured locale, it might be possible to use a bracket expression (i.e. [“”], as your original code did). However, the most portable approach is to treat “ and ” separately, which will work even without multi-byte character support available.
input='“hello ’cruel’ world”'
input=${input//'“'/'"'}
input=${input//'”'/'"'}
input=${input//'’'/"'"}
printf '%s\n' "$input"
...correctly outputs:
"hello 'cruel' world"
On Using sed
To provide a literal answer -- you almost had a working sed-based approach in your question.
input=$(sed -r 's/[“”]/"/g' <<<"$input")
...adds the missing syntactic double quotes around the parameter expansion of $input, ensuring that it's treated as a single token regardless of how it might be string-split or glob-expanded.
But All That May Not Help...
The below is mentioned because your test script is manipulating content passed on the command line; if that's not the case in production, you can probably disregard the below.
If your script is invoked as ./yourscript “hello * ’cruel’ * world”, then information about exactly what the user entered is lost before the script is started, and nothing you can do here will fix that.
This is because $1, in that scenario, will contain only “hello; ’cruel’ and world” will be in argv positions of their own, and each * will have been replaced with a list of files in the current directory (each such file substituted as a separate argument) before the script was even started. Because the shell responsible for parsing the user's command line (which is not the same shell running your script!) did not recognize the curly quotes as quoting characters when it did that parsing, by the time the script is running there is nothing you can do to recover the original data.
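A tiny illustration of that point (the showargs script name is made up here): a script that prints each argument it actually receives, one per line, shows the words already split and the globs already expanded by the calling shell.
#!/usr/bin/env bash
# showargs: print each received argument on its own line
printf '<%s>\n' "$@"
Calling ./showargs “hello * ’cruel’ * world” prints “hello, ’cruel’ and world” as separate entries, with each * replaced by the names of whatever files are in the current directory.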
Abstract: How to use sed to change a variable is covered first, but what you really need is a way to read and edit a file; that is covered further down.
Sed
The two sed lines can be combined into one like this (note that -i is not used, since we are editing a value, not a file):
input='“very dirty”
we’re'
sed 's/[“”]/\"/g;s/’/'\''/g' <<<"$input"
But it should be faster (for small strings) to use the internals of the shell:
input='“very dirty”
we’re'
input=${input//[“”]/\"}
input=${input//[’]/\'}
printf '%s\n' "$input"
$1
But there is an underlying problem with your script: you are trying to clean input received from the command line. You are using $1 as the source of the string. Once somebody writes:
./script “very dirty”
we’re
That input is lost. It is broken into the shell's tokens, and "$1" will be “very only.
But I do not believe that is what you really have.
file
However, you are also saying that the input comes from a file. If that is the case, then read it in with:
input="$(<infile)" # not $1
sed 's/[“”]/\"/g;s/’/'\''/g' <<<"$input"
Or, if you don't mind editing (changing) the file, do this instead:
sed -i 's/[“”]/\"/g;s/’/'\''/g' infile
input="$(<infile)"
Or, if you are clear and certain that what is being given to the script is a filename, like:
./script infile
You can use:
infile="$1"
sed -i 's/[“”]/\"/g;s/’/'\''/g' "$infile"
input="$(<"$infile")"
Other comments:
Quote your variables.
Do not use the very old `…` syntax, use $(…) instead.
Do not use variables in UPPER case; those are conventionally reserved for environment variables.
And (unless you actually meant sh) use a shebang (first line) that targets bash.
The command enscript most definitely requires a file, not a variable.
Maybe you should use evince to open the PS file directly; there is no need for the PDF conversion step unless you know you really need it.
I believe it is better to use a file to store the output of enscript and ps2pdf.
Do not hide the errors printed by the commands until everything is working as desired; then just call the script as:
./script infile 2>/dev/null
Or as required to make it less verbose.
Final script.
If you call the script with the name of the file that enscript is going to use, something like:
./script infile
Then the whole script will look like this (it runs in both bash and sh):
#!/usr/bin/env bash
Usage(){ echo "$0; This script require a source file"; exit 1; }
[ $# -lt 1 ] && Usage
[ ! -e $1 ] && Usage
infile="$1"
pdir="$HOME/Desktop"
open_pdf=evince
# Straighten out curly quotes
sed -i 's/[“”]/\"/g;s/’/'\''/g' "$infile"
tmpfile="$(mktemp "$pdir"/mutt_XXXXXXXX.pdf)"
outfile="${tmpfile%.*}.ps"
enscript --font=Courier10 "$infile" -2r \
--word-wrap --fancy-header=mutt -p "$outfile"
ps2pdf "$outfile" "$tmpfile"
"$open_pdf" "$tmpfile" >/dev/null 2>&1 &
sleep 5
rm "$tmpfile" "$outfile"

Bash read from ttyUSB0 and send to URL

I am a bash novice and I am struggling with putting it all together.
What I am trying to do is:
1) Set Port (stty)
2) Read from dev/ttyUSB0 - data should look like 000118110000101 (cat or Gawk?)
3) Set read data into a variable eg DATA and create a URL eg http://domain.com/get_data.php?data=$DATA
4) load the URL with wget?
5) Wait for more data from ttyUSB0 (polling or loop?)
I have tried the PHP DIO extension, which does work but is not reliable because it stops/starts for some reason.
Any suggestions would be much appreciated; I will be very grateful if anyone can advise the best way to do this.
Thanks
Brent
This is what I used.
#!/bin/bash
# Set permissions first (one-time setup, run separately):
#   sudo chmod o+rwx /dev/ttyUSB0
# Port setting
stty -F /dev/ttyUSB0 cs7 cstopb -ixon raw speed 1200
# Loop
while true
do
    echo 'LOADING...'
    READ=$(dd if=/dev/ttyUSB0 count=22 | sed 's/ //g')
    echo "$READ"
    wget "http://localhost/BASHtest/test.php?signal=$READ"
    echo '[PRESS Ctrl + C TO EXIT]'
done
For the first step I would advise reading into a file and then using od to get an octal representation (there is no binary output format as far as I can see), because standard awk doesn't cope with NULs (I think gawk has the same limitation). So after you get the bytes, you pipe them through a sed script to turn the octal values into the form you need, grab the output with $() (or backticks), build the URL, and feed it to wget.
The only problem I can see is blocking/non-blocking reads from the USB device. Please report back if you run into one.
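A rough sketch of that loop (assuming 22-byte reads as in the answer above, and hex-encoding the bytes with od so they are safe to put in a URL; the endpoint is the example one from the question):
stty -F /dev/ttyUSB0 cs7 cstopb -ixon raw speed 1200
while true; do
    # read one 22-byte chunk and hex-encode it
    DATA=$(dd if=/dev/ttyUSB0 bs=22 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ -n "$DATA" ] && wget -q -O /dev/null "http://domain.com/get_data.php?data=$DATA"
done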

ftp script in bash

I have the following script that pushes files to remote location:
#!/usr/bin/bash
HOST1='a.b.c.d'
USER1='load'
PASSWD1='load'
DATE=`date +%Y%m%d%H%M`
DATE2=`date +%Y%m%d%H`
DATE3=`date +%Y%m%d`
FTPLOGFILE=/logs/Done.$DATE2.log
D_FOLDER='/dir/load01/input'
PUTFILE='file*un'
ls $PUTFILE | while read file
do
echo "${file} transfered at $DATE" >> /logs/$DATE3.log
done
ftp -n -v $HOST1 <<SCRIPT >> ${FTPLOGFILE} 2>&1
quote USER $USER1
quote PASS $PASSWD1
cd $D_FOLDER
ascii
prompt off
mput /data/file*un
quit
SCRIPT
mv *un test/
ls test/*un | awk '{print("mv "$1" "$1)}' | sed 's/\.un/\.processed/2' |sh
rm *unl
I am getting this error output:
200 PORT command successful.
553 /data/file1.un: A file or directory in the path name does not exist.
200 PORT command successful.
Some improvements:
#!/usr/bin/bash
HOST1='a.b.c.d'
USER1='load'
PASSWD1='load'
read Y m d H M <<<$(date "+%Y %m %d %H %M") # only one call to date
DATE="$Y$m$d$H$M"
DATE2="$Y$m$d$H"
DATE3="$Y$m$d"
FTPLOGFILE=/logs/Done.$DATE2.log
D_FOLDER='/dir/load01/input'
PUTFILE='file*un'
for file in $PUTFILE # no need for ls
do
echo "${file} transfered at $DATE"
done >> /logs/$DATE3.log # output can be done all at once at the end of the loop.
ftp -n -v $HOST1 <<SCRIPT >> ${FTPLOGFILE} 2>&1
quote USER $USER1
quote PASS $PASSWD1
cd $D_FOLDER
ascii
prompt off
mput /data/file*un
quit
SCRIPT
mv *un test/
for f in test/*un # no need for ls and awk
do
mv "$f" "${f/%.un/.processed}"
done
rm *unl
I recommend using lower case or mixed case variables to reduce the chance of name collisions with shell variables.
Are all those directories really directly off the root directory?
Ftp to the remote site and execute the ftp commands by hand. When the error occurs, look around to see what the cause is. (Use "help" if you don't know the ftp command line.)
Probably the /data directory does not exist. Has anyone reorganized the upload directory recently, or maybe moved the root directory of the ftp server?
The problem with scripting an FTP session is that the ftp client considers itself successful as long as it has reported its errors to stdout; it only returns a failing exit status on something catastrophic. Consequently, it's devilishly hard to pick up errors. If you need anything more than the most simple of command lists, you should really be using something like expect, or a java or perl program that can easily test the result of each action.
That said, you can run ftp as a coprocess, or set it up so that it runs in the background with its stdin and stdout attached to named pipes, or some structure like that where you can read and parse the output from one command before deciding what to pass in for the next one.
A read loop that cycles on a case statement which tests for known responses and behaves accordingly is a passably acceptable all-bash version. If you always terminate every command block with something like an image command that returns a fixed and known value, you can scan for known errors, check for the return from that command in the case statement, and when you get the "sentinel" reply, loop back and read the next input. This makes for a largish and fairly complicated shell script, though.
Also, when you match (for example) a 5[0-9][0-9] *) reply, you need to test that it isn't actually "553 bytes*", because ftp screws you that way too.
Apologies for the length of the answer without including a code example - I just wanted to mention some ideas and caveats that wouldn't fit readably in a comment.
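As a rough illustration of that idea, a sketch only (assuming bash 4+ for coproc, and reusing the host and credentials from the question; not tested against a real server):
#!/usr/bin/env bash
HOST1='a.b.c.d'; USER1='load'; PASSWD1='load'
# Run ftp as a coprocess so its output can be parsed reply by reply
coproc FTP { ftp -n -v "$HOST1" 2>&1; }
send() { printf '%s\n' "$*" >&"${FTP[1]}"; }
send "quote USER $USER1"
send "quote PASS $PASSWD1"
send "cd /dir/load01/input"
send "quit"
# Scan the replies; treat 5xx codes as errors, but skip lines such as
# "553 bytes sent" that merely start with a number
while IFS= read -r reply <&"${FTP[0]}"; do
    case $reply in
        [0-9][0-9][0-9]" bytes"*) ;;                  # transfer summary, not a status code
        5[0-9][0-9]*) echo "FTP error: $reply" >&2 ;;
    esac
done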

Shell script takes a list of commands as input, tries to execute them, and fails

I am, like many non-engineers or non-mathematicians who try writing algorithms, an intuitive. My exact psychological typology makes it quite difficult for me to learn anything serious like computers or math. Generally, I prefer audio, because I can engage my imagination more effectively in the learning process.
That said, I am trying to write a shell script that will help me master Linux. To that end, I copied and pasted a list of Linux commands from the O'Reilly website's index to the book Python In a Nutshell. I doubt they'll mind, and I thank them for providing it. They are in the text file massivelistoflinuxcommands, not included in full below in order to save space...
OK, now comes the fun part. How do I get this script to work?
#/bin/sh
read -d 'massivelistoflinuxcommands' commands <<EOF
accept
bison
bzcmp
bzdiff
bzgrep
bzip2
bzless
bzmore
c++
lastb
lastlog
strace
strfile
zmore
znew
EOF
for i in $commands
do
$i --help | less | cat > masterlinuxnow
text2wave masterlinuxnow -o ml.wav
done
It really helps when you include error messages or specific ways that something deviates from expected behavior.
However, your problem is here:
read -d 'massivelistoflinuxcommands' commands <<EOF
It should be:
read -d '' commands <<EOF
The -d option to read uses only the first character of its argument as the delimiter, so reading stops at the first character that matches it. Here it stops after "bzc" because the next character is "m", which matches the "m" at the beginning of "massive...".
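A quick check of the corrected form (a small sketch; read returns a non-zero status here because no NUL is found, but the variable is still filled):
read -r -d '' commands <<'EOF'
accept
bison
bzcmp
EOF
printf '%s\n' "$commands"   # all three names survive; nothing is cut at the first "m"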
Also, I have no idea what this is supposed to do:
$i --help | less | cat > masterlinuxnow
but it probably should be:
$i --help > masterlinuxnow
However, you should be able to pipe directly into text2wave and skip creating an intermediate file:
$i --help | text2wave -o ml.wav
Also, you may want to prevent each file from overwriting the previous one:
$i --help | text2wave -o ml-$i.wav
That will create files named like "ml-accept.wav" and "ml-bison.wav".
I would point out that if you're learning Linux commands, you should prioritize them by frequency of use and/or applicability to a beginner. For example, you probably won't be using bison right away.
The first problem here is that not every command has a --help option! In fact the very first command, accept, has no such option! A better approach might be executing man on each command, since a manual page is more likely to exist for each of the commands. Thus change:
$i --help | less | cat > masterlinuxnow
to
man $i >> masterlinuxnow
Note that it is essential you use the append output operator ">>" instead of the create output operator ">" in this loop. Using the create output operator would recreate the file "masterlinuxnow" on each iteration, so it would contain only the output of the last "man $i" processed.
You also need to worry about whether the command exists on your version of Linux (many commands are not included in the standard distribution or may have different names). Thus you probably want something more like the following, where the -n in the head command should be replaced by the number of lines you want; if you want only the first 2 lines of the --help output, you would replace -n with -2:
if [ $(which $i) ]
then
$i --help | head -n >> masterlinuxnow
fi
And instead of the read command, simply define the variable commands like so:
commands="
bison
bzcmp
bzdiff
bzgrep
bzip2
bzless
bzmore
c++
lastb
lastlog
strace
strfile
zmore
znew
"
Putting this all together, the following script works quite nicely:
commands="
bison
bzcmp
bzdiff
bzgrep
bzip2
bzless
bzmore
c++
lastb
lastlog
strace
strfile
zmore
znew
"
for i in $commands
do
    if [ "$(which $i)" ]
    then
        $i --help 2>/dev/null | head -1 >> masterlinuxnow
    fi
done
You're going to learn to use Linux by listening to help descriptions? I really think that's a bad idea.
Those help commands usually list every obscure option to a command, including many that you will never use-- especially as a beginner.
A guided tutorial or book would be much better. It would only present the commands and options that will be most useful. For example, that list of commands you gave has many that I don't know-- and I've been using Linux/Unix extensively for 10 years.