Sending script and file content via STDIN - bash

I generate (dynamically) a script concatenating the following files:
testscript1:
echo Writing File
cat > /tmp/test_file <<EOF
testcontent:
line1
second line
testscript2:
EOF
echo File is written
And I execute by calling
$ cat testscript1 testcontent testscript2 | ssh remote_host bash -s --
The effect is that the file /tmp/test_file is filled with the desired content.
Is there a conceivable variant in which binary files can be supplied in a similar fashion? Instead of cat, dd or other tools could of course be used, but the problem I see is 'telling' them that STDIN has now ended (can I send ^D through that stream?)
I am not able to get my head around that problem, but there is likely no comparable solution. However, I might be wrong, so I'd be happy to hear from you.
Regards,
Mazze

can I send ^D through that stream
Yes but you don't want to.
Control+D, commonly notated ^D, is just a character -- or to be pedantic (as I often am), a codepoint in the usual character code (ASCII or a superset like UTF-8) that we treat as a character. You can send that character/byte by a number of methods, most simply printf '\004', but the receiving system won't treat it as end-of-file; it will instead be stored in the destination file, just like any other data byte, followed by the subsequent data that you meant to be a new command and file etc.
^D only causes end-of-file when input from a terminal (more exactly, a 'tty' device) -- and then only in 'cooked' mode (which is why programs like vi and less can do things very different from ending a file when you type ^D). The form of ssh you used doesn't make the input a 'tty' device. ssh can make the input (and output) a 'tty' (more exactly a subclass of 'tty' called a pseudo-tty or 'pty', but that doesn't matter here) if you add the -t option (in some situations you may need to repeat it as -t -t or -tt). But then if your binary file contains any byte with the value \004 -- or several other special values -- which is quite possible, then your data will be corrupted and garbage commands executed (sometimes), which definitely won't do what you want and may damage your system.
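A quick way to see that ^D is just a byte when no tty is involved is to push it through a pipe and count what arrives. This is only a local sketch; the sample strings are made up for illustration.

```shell
#!/bin/bash
# ^D (octal 004) travels through a pipe as an ordinary byte:
# the reader does not see end-of-file until the writer closes the pipe.
count=$(printf 'before\004after' | wc -c)
echo "bytes received: $((count))"    # prints: bytes received: 12

# Dump the stream to show the 004 byte sitting between the two words.
printf 'before\004after' | od -An -c
```

All 12 bytes arrive (6 + 1 + 5), so the data after the ^D is not cut off.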
The traditional approach to what you are trying to do, back in the 1980s and 1990s, was 'shar' (shell archive), and the usual solution for handling binary data was 'uuencode', which converts binary data into only printable characters that can safely go through a link like this, matched by 'uudecode', which converts it back. See this surviving example from GNU. uuencode and uudecode themselves were part of a communication protocol, 'uucp', used mostly for email and Usenet, which are (all) mostly obsolete and forgotten.
However, nearly all systems today contain a 'base64' program which provides equivalent (though not identical) functionality. Within a single system you can do:
base64 <infile | base64 -d >outfile
to get the same effect as cp infile outfile. In your case you can do something like:
{ echo "base64 -d <<END# >outfile"; base64 <infile; echo "END#"; otherstuff; } | ssh remote bash
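The idea above can be tried locally before involving ssh. The following is a minimal sketch, assuming a base64 that accepts -d (as GNU coreutils does); the helper name emit_file_transfer, the delimiter END_OF_B64 and the temporary file names are made up for illustration, and plain bash stands in for "ssh remote_host bash -s --".

```shell
#!/bin/bash
# emit_file_transfer SRC DEST: print shell commands that recreate SRC at DEST.
# (Hypothetical helper name; DEST must not contain single quotes.)
emit_file_transfer() {
    local src=$1 dest=$2
    # The quoted delimiter keeps the receiving shell from expanding anything.
    printf "base64 -d > '%s' <<'END_OF_B64'\n" "$dest"
    base64 < "$src"
    printf 'END_OF_B64\n'
}

# Local dry run: plain "bash" stands in for "ssh remote_host bash -s --".
workdir=$(mktemp -d)
printf 'abc\000\004\377xyz\n' > "$workdir/in.bin"   # includes NUL and ^D bytes
{
    echo 'echo Writing file'
    emit_file_transfer "$workdir/in.bin" "$workdir/out.bin"
    echo 'echo File is written'
} | bash
cmp "$workdir/in.bin" "$workdir/out.bin" && echo "round trip OK"
```

Because the encoded text contains only printable characters, the here-document delimiter cannot appear inside the data by accident.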

You can also try:
cat testscript1 testcontent testscript2 | base64 | ssh <options> "base64 --decode | bash"
Don't worry about ^D, because when your input is exhausted, the next processes of the pipeline will notice that they have reached the end of the input file.

Related

Why does bash use/need so many input redirect symbols?

I am curious as to the nature and purpose of using multiple "<" characters to satisfy certain bash redirections. When is each of the <, <<, <<<, syntax correct/preferred? And under what conditions? Shouldn't a single "<" be sufficient for a properly written command, function, or subroutine? In unix 'everything' is a file. So why mask this with process-substitution? Isn't that already just a mask for the natural (grouping) capability of any shell? Or in some cases just a matter of proper order of execution?
Efficiency and performance always have trade-offs, as do ease of read/write ability or ease of usability. I'm an old dog trying to learn new tricks. 10 lines of code I understand, to perform the same task as one line of code that I do not understand, is worth the trade-off to me. In my years of scripting, I have had very few situations require writing to non-volatile storage, unless it was intended to be left there "permanently."
I have not seen such reference for output. A single ">" will create/overwrite a file. A double ">>" will create/append a file. Is there a ">>>" for output too? This is a redundant question. I am only interested in the input redirect.
In simple words, they all have different meanings.
< Redirection of input
<< Here Document
<<< Here String (variant of here document)
Examples
< Redirection of input
grep foo < a-file.txt
This redirects the contents of a-file.txt to grep's standard input. grep searches for occurrences of string 'foo' in file a-file.txt.
<< Here Document
grep foo <<EOF
foo
foobar
baz
bar
EOF
Notice the EOF right after << and in the last line. From man bash:
This type of redirection instructs the shell to read input from the current source until a line containing only delimiter (with no trailing blanks) is seen.
So effectively, grep gets the string enclosed by the two EOFs as input.
<<< Here String (variant of here document)
grep foo <<<"foobar"
You could see this as a "single line" here document (<<). grep gets the string "foobar" as input.
Shouldn't a single "<" be sufficient for a properly written command, function, or subroutine?
So, which variant is the correct one to use depends on your use case and is independent of the command you're using, as your shell (most likely bash) will take care of them.
I recommend section 3.6 Redirection of bash's manual for further reading. The sections concerning <, << and <<< are 3.6.1, 3.6.6, 3.6.7: https://www.gnu.org/software/bash/manual/bash.html#Redirections
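To compare the three forms side by side, here is a small sketch that feeds the same command from a file, a here document and a here string. The helper name count_foo and the temporary file are only for illustration.

```shell
#!/bin/bash
# count_foo: count the stdin lines that contain "foo".
count_foo() {
    grep -c foo
}

tmpfile=$(mktemp)
printf 'foo\nfoobar\nbaz\nbar\n' > "$tmpfile"

# Input from a file: prints 2
count_foo < "$tmpfile"

# Input from a here document: prints 2
count_foo <<EOF
foo
foobar
baz
bar
EOF

# Input from a here string: prints 1
count_foo <<<"foobar"

rm -f "$tmpfile"
```

In all three cases the command itself is unchanged; only the shell's way of supplying standard input differs.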

Bash: Trying to append to a variable name in the output of a function

this is my very first post on Stackoverflow, and I should probably point out that I am EXTREMELY new to a lot of programming. I'm currently a postgraduate student doing projects involving a lot of coding in various programs, everything from LaTeX to bash, MATLAB etc etc.
If you could explicitly explain your answers that would be much appreciated as I'm trying to learn as I go. I apologise if there is an answer else where that does what I'm trying to do, but I have spent a couple of days looking now.
So to the problem I'm trying to solve: I'm currently using a selection of bioinformatics tools to analyse a range of genomes, and I'm trying to somewhat automate the process.
I have a few sequences with names that look like this for instance (all contained in folders of their own currently as paired files):
SOL2511_S5_L001_R1_001.fastq
SOL2511_S5_L001_R2_001.fastq
SOL2510_S4_L001_R1_001.fastq
SOL2510_S4_L001_R2_001.fastq
...and so on...
I basically wish to automate the process by turning these into variables and passing these variables to each of the programs I use in turn. So for example my idea thus far was to assign them as wildcards, using the R1 and R2 (which appear in all the file names, as they represent each strand of DNA) as follows:
#!/bin/bash
seq1=*R1_001*
seq2=*R2_001*
On a rudimentary level this works, as it returns the correct files, so now I pass these variables to my first function which trims the DNA sequences down by a specified amount, like so:
# seqtk is the program suite, trimfq is a function within it,
# and the options -b -e specify how many bases to trim from the beginning and end of
# the DNA sequence respectively.
seqtk trimfq -b 10 -e 20 $seq1 >
seqtk trimfq -b 10 -e 20 $seq2 >
So now my problem is I wish to be able to append something like "_trim" to the output file which appears after the >, but I can't find anything that seems like it will work online.
Alternatively, I've been hunting for a script that will take the name of the folder that the files are in, and create a variable for the folder name which I can then give to the functions in question so that all the output files are named correctly for use later on.
Many thanks in advance for any help, and I apologise that this isn't really much of a minimum working example to go on, as I'm only just getting going on all this stuff!
Joe
EDIT
So I modified @ghoti's for loop (does the job wonderfully I might add, rep for you :D ) and now I prepend trim_, because the loop as it was before ended up giving me a .fastq.trim, which will cause errors later.
Is there any way I can append _trim to the end of the filename, but before the extension?
Explicit is usually better than implied, when matching filenames. Your wildcards may match more than you expect, especially if you have versions of the files with "_trim" appended to the end!
I would be more precise with the wildcards, and use for loops to process the files instead of relying on seqtk to handle multiple files. That way, you can do your own processing on the filenames.
Here's an example:
#!/bin/bash
# Define an array of sequences
sequences=(R1_001 R2_001)
# Step through the array...
for seq in "${sequences[@]}"; do
    # Step through the files in this sequence...
    for file in SOL*_${seq}.fastq; do
        seqtk trimfq -b 10 -e 20 "$file" > "${file}.trim"
    done
done
I don't know how your folders are set up, so I haven't addressed that in this script. But the basic idea is that if you want the script to be able to manipulate individual filenames, you need something like a for loop to handle that manipulation on a per-filename basis.
Does this help?
UPDATE:
To put _trim before the extension, replace the seqtk line with the following:
seqtk trimfq -b 10 -e 20 "$file" > "${file%.fastq}_trim.fastq"
This uses something documented in the Bash man page under Parameter Expansion if you want to read up on it. Basically, the ${file%.fastq} takes the $file variable and strips off a suffix. Then we add your extra text, along with the suffix.
You could also strip an extension using basename(1), but there's no need to call something external when you can use something built in to the shell.
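The suffix-stripping expansion can be tried on its own, without seqtk. This is a small sketch; the helper name trimmed_name is made up for illustration, and the last lines show the same family of expansions applied to the folder name the question also asked about.

```shell
#!/bin/bash
# trimmed_name FILE: print FILE with "_trim" inserted before ".fastq".
# (Hypothetical helper name, for illustration only.)
trimmed_name() {
    local file=$1
    # ${file%.fastq} removes the shortest trailing match of ".fastq".
    printf '%s\n' "${file%.fastq}_trim.fastq"
}

trimmed_name "SOL2511_S5_L001_R1_001.fastq"   # prints SOL2511_S5_L001_R1_001_trim.fastq

# The name of the current folder can be derived in a similar way, by
# stripping everything up to the last slash from $PWD.
folder=${PWD##*/}
echo "current folder: $folder"
```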
Instead of setting variables with the filenames, you could pipe the output of ls to the command you want to run with these filenames, like this:
ls *R{1,2}_001* | xargs -I@ sh -c 'seqtk trimfq -b 10 -e 20 "$1" > "${1}_trim"' -- @
xargs -I@ takes each line of the previous command's output and substitutes it for @, which is then passed to sh as $1 for seqtk to use.

script to find similar email users

We have a mail server and I am trying to write a script that will find all users with similar names, to prevent malicious users from impersonating legitimate users. For example, a legit user may have the name james2014@domain.com but a malicious user may register as james20l4@domain.com. The difference, if you look carefully, is that I replaced the number 'one' with the letter 'l' (el). So I am trying to write something that can consult my /var/vmail/domain/* and find similar names and alert me (the administrator). I will then take the necessary steps to do what I need. Really appreciate any help.
One hacky way to do this is to derive "normalized" versions of your usernames, put those in an associative array as keys mapping to the original input, and use those to find problems.
The example I posted below uses bash associative arrays to store the mapping from normalized name to original name, and tr to switch some characters for other characters (and delete other characters entirely).
I'm assuming that your list of users will fit into memory; you'll also need to tweak the mapping of modified and removed characters to hit your favorite balance between effectiveness and false positives. If your list can't fit in memory, you can use a single file or the filesystem to approximate it, but honestly if you're processing that many names you're probably better off with a non-shell programming language.
Input:
doc
dopey
james2014
happy
bashful
grumpy
james20l4
sleepy
james.2014
sneezy
Script:
#!/bin/bash
# stdin: A list of usernames. stdout: Pairs of names that match.
CHARS_TO_REMOVE="._\\- "
CHARS_TO_MAP_FROM="OISZql"
CHARS_TO_MAP_TO="0152g1"
normalize() {
    # stdin: A word. stdout: A modified version of the same word.
    exec tr "$CHARS_TO_MAP_FROM" "$CHARS_TO_MAP_TO" \
        | tr --delete "$CHARS_TO_REMOVE" \
        | tr "A-Z" "a-z"
}
declare -A NORMALIZED_NAMES
while read -r NAME; do
    NORMALIZED_NAME=$(normalize <<< "$NAME")
    # -n tests for non-empty strings, as it would be if the name were set already.
    if [[ -n ${NORMALIZED_NAMES[$NORMALIZED_NAME]} ]]; then
        # This name has been seen before! Print both of them.
        echo "${NORMALIZED_NAMES[$NORMALIZED_NAME]} $NAME"
    else
        # This name has not been seen before. Store it.
        NORMALIZED_NAMES["$NORMALIZED_NAME"]="$NAME"
    fi
done
Output:
james2014 james20l4
james2014 james.2014

Stream processing lots of stuff to OVA

So one of our developers needs me to batch a bunch of information and process it into an OVA to be presented back for download. This is an easy process using the long method (ie writing to the filesystem), but the developers want a cleaner, streamlined solution that will scale better. They have therefore requested that I stream the entire process, which is proving difficult. Can someone please give me some direction? Here are the steps that need to be accomplished:
Get input from webserver (Webserver will pass these as stream eventually.)
Random password
XML file
Modify boot script on file system (ie insert random password generated by server)
Create ISO of XML file and boot script
Calculate the SHA1 sum of ISO
Append SHA1 sum of ISO to manifest file in OVF directory
Create OVA from OVF directory
Here is an example directory structure (I outlined this in / just for simplicity)
/--
|
|--ISO/
| |
| |--boot.sh (Where the random password gets inserted)
| |--config.xml (This is handed from the web server. Needs to stream from server)
|
|--OVF/
|
|--disk.vmdk
|--ovf.xml
|--manifest.mf (Contains SHA1 of all files in OVF directory)
|--boot.iso (This file will exist once created from ISO directory)
Here is what I have so far (I'll explain the issues afterwards. Yes... there are a lot of issues):
cat /ISO/boot.sh | sed "s%DEFAULT%RANDOM%" | mkisofs /ISO/* | echo "SHA1(boot.iso)= " && sha1sum >> manifest.mf | tar -cvf success.ova /OVF/*
NOTE
In boot.sh there is a variable set to DEFAULT like this (Just for testing purposes):
PASSWORD="DEFAULT"
NOTE
This is what a line in the manifest file should look like:
SHA1(boot.iso)= 5fbc0d70 BLAH BLAH BLAH a91c9121bb
So I've never tried to write an entire script in one stream before. Usually I write to the filesystem a lot as I go. The first issue I see with this is that sed is replacing the string, but what it's piping over to mkisofs will not be used, as mkisofs is just going to make an ISO of what it finds in /ISO. I don't even know if you can pass something like that to mkisofs. Piping is sometimes weird to think about.
Next, I think mkisofs is ok because I didn't specify a file output, therefore it should output to stdout, which will be passed to sha1sum. But here is the next problem I see: I need to append some additional text to the file before the SHA1 sum gets added, which kind of interrupts the stream.
Finally, the last problem I see is how to pass everything to tar to build the OVA without writing to the filesystem first (writing to manifest.mf).
Oh, and the last BIG problem, which I should have mentioned first, is the config.xml file. Right now I'm dealing with it as just a file. The dev guys want to pass it to this script as a stream as well. I don't have a clue how to handle that.
Any help would be greatly appreciated. These concepts are a little beyond my knowledge.
Thanks!
UPDATE 12/11/13 2:11PM EST
Testing each part individually right now. Will report findings below soon.
UPDATE 12/11/13 2:14PM EST
The following works:
cat /ISO/boot.sh | sed "s%DEFAULT%RANDOM%"
and produces the following output:
PASSWORD="RANDOM"
Exactly as expected.
You are correct NeronLeVelu, I will have to come back later and look at sed more carefully when real random passwords are being generated. ie. Making sure proper characters are escaped. Right now though, I'm just testing the logic. I will worry about regex and escaping later. We have not even decided on random password yet. It's only temporary and will most likely be alphanumeric.
Moving onto next part. Still not sure how to take the output from sed (stdout) and use it to include in ISO creation without actually creating a file that gets written to the file system. It may not be possible without writing to file system. More to come soon
# Escape the password in case it contains &, \ or the separator used in your sed command (/ here)
Password4Sed=$(printf '%s\n' "$PASSWORD" | sed 's/[\\\/&]/\\&/g')
# no need for cat with sed
sed "s/DEFAULT/${Password4Sed}/" /ISO/boot.sh > /tmp/mkisofs.input
Process the rest from this input, and add a test to validate each step (for example, check for an empty checksum value or an empty mkisofs.input). This will help at runtime when production errors occur.
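For the manifest step described in the question, the label and the checksum can be produced together from a stream, so the extra text does not have to interrupt the pipe. This is a minimal sketch, assuming GNU coreutils' sha1sum is available; the helper name manifest_line is made up, and the demonstration hashes stand-in data rather than a real ISO.

```shell
#!/bin/bash
# manifest_line NAME: read a file's bytes on stdin and print its manifest entry.
# (Hypothetical helper; assumes GNU coreutils' sha1sum is installed.)
manifest_line() {
    local name=$1 digest
    # sha1sum prints "<hex digest>  -" for stdin; keep only the digest.
    digest=$(sha1sum | cut -d' ' -f1)
    printf 'SHA1(%s)= %s\n' "$name" "$digest"
}

# Demonstration with stand-in data instead of a real ISO image.
printf 'abc' | manifest_line boot.iso
# prints: SHA1(boot.iso)= a9993e364706816aba3e25717850c26c9cd0d89d
```

In a real pipeline the image could be saved and hashed in one pass by placing tee between the ISO generator and manifest_line, with the output appended to manifest.mf. That still writes the ISO to disk once, which tar needs anyway to build the OVA.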

Arduino returning more responses than queries have been sent

I have a problem when using Arduino to post data to Pachube. The Arduino is configured to return JSON data for the temperature when you send a 't' and return JSON data for the light level when you send an 'l'. This works perfectly through the Arduino Serial Monitor. I then created two bash scripts. One regularly sends the 't' and 'l' commands to Arduino and waits 10 seconds in between each request.
while true; do
echo -n t > /dev/ttyACM0
echo "$(date): Queried Arduino for temperature."
sleep 10
echo -n l > /dev/ttyACM0
echo "$(date): Queried Arduino for light."
sleep 10
done
This works fine. I get an echo message every 10 seconds. The other script reads the generated JSON from serial port (I basically copied it from some Web page).
ARDUINO_PORT=/dev/ttyACM0
ARDUINO_SPEED=9600
API_KEY='MY_PACHUBE_KEY'
FEED_ID='MY_FEED_ID'
# Set speed for usb
stty -F $ARDUINO_PORT ispeed $ARDUINO_SPEED ospeed $ARDUINO_SPEED raw
exec 6<$ARDUINO_PORT
# Read data from Arduino
while read -u 6 f ;do
# Remove trailing carriage return character added
# by println to satisfy stupid MS-DOS Computers
f=${f:0:${#f} - 1}
curl --request PUT --header "X-PachubeApiKey: $API_KEY" --data-binary "{ \"version\":\"1.0.0\", \"datastreams\":[ $f ] }" "http://api.pachube.com/v2/feeds/MY_FEED_ID"
echo "$(date) $f was read."
done
Unfortunately, this script goes crazy with echo messages telling me several times per 10 seconds that it posted data to Pachube although it should only do it every 10 seconds (whenever the first script told Arduino to create a JSON message). I thought it might be an issue with buffered messages on the Arduino but even when switching it off and on again the problem remains. Any thoughts? Thanks in advance.
I am completely unfamiliar with Arduino and a handful of other things you're doing here but here are a few general things I see:
Bash is almost entirely incapable of handling binary data reliably. There is no way to store a NUL byte in a Bash string. Looks like you're trying to pull some trickery to make arbitrary data readable - hopefully you're sending nothing but character data into read, otherwise this isn't likely going to work.
read reads newline-delimited input (or the given value of -d if your bash is new enough). I don't know the format the while loop is reading, but it has to be a newline delimited string of characters.
Use read -r unless you want escape sequences interpreted. (You almost always want -r with read.)
Unconditionally stripping a character off the end of each string isn't the greatest. I'd use: f=${f%%+($'\r')}, which removes 1 or more adjacent \r's from the end of f. Remember to shopt -s extglob at the top of your script if this isn't the default.
This shouldn't be actually causing an issue, but I prefer not using exec unless it's really required - which it isn't here. Just put done <$ARDUINO_PORT to terminate the while loop and remove the -u 6 argument from read (unless something inside the loop is specifically reading from stdin and can't conflict, which doesn't appear to be the case). The open FD will automatically close when exiting the loop.
Don't create your own all-caps variable names in scripts because they are reserved and can conflict with variables from the environment. Use at least one lower-case letter. This of course doesn't apply if those variables are set by something in your system and you're only using or modifying them.
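Putting several of these points together, the read loop could look roughly like the sketch below. It is not tested against a real Arduino: the function name read_lines and the sample JSON lines are made up, and a printf pipe stands in for the serial port.

```shell
#!/bin/bash
shopt -s extglob   # needed for the +(...) pattern below

# read_lines: read newline-delimited records from stdin, strip trailing CRs.
# (Hypothetical helper name; lower-case variable names on purpose.)
read_lines() {
    local line
    while IFS= read -r line; do
        # Remove one or more trailing carriage returns, if any.
        line=${line%%+($'\r')}
        printf 'read: %s\n' "$line"
    done
}

# Stand-in input; with real hardware this would be: read_lines < "$arduino_port"
printf '{"id":"t","value":21}\r\n{"id":"l","value":300}\r\r\n' | read_lines
```

Redirecting the port into the function replaces the exec 6< and read -u 6 pair, and the file descriptor is closed automatically when the loop ends.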
