Stream processing lots of stuff to OVA - bash

One of our developers needs me to batch a bunch of information and process it into an OVA to be presented back for download. This is easy using the long method (i.e. writing to the filesystem), but the developers want a cleaner, streamlined solution that will scale better. They have therefore asked me to stream the entire process, which is proving difficult. Can someone please give me some direction? Here are the steps that need to be accomplished:
1. Get input from webserver (Webserver will pass these as stream eventually.)
   - Random password
   - XML file
2. Modify boot script on file system (i.e. insert random password generated by server)
3. Create ISO of XML file and boot script
4. Calculate the SHA1 sum of ISO
5. Append SHA1 sum of ISO to manifest file in OVF directory
6. Create OVA from OVF directory
Here is an example directory structure (I outlined this in / just for simplicity)
/
|
|--ISO/
|   |
|   |--boot.sh (Where the random password gets inserted)
|   |--config.xml (This is handed from the web server. Needs to stream from server)
|
|--OVF/
    |
    |--disk.vmdk
    |--ovf.xml
    |--manifest.mf (Contains SHA1 of all files in OVF directory)
    |--boot.iso (This file will exist once created from ISO directory)
Here is what I have so far (I'll explain the issues afterwards. Yes... there are a lot of issues):
cat /ISO/boot.sh | sed "s%DEFAULT%RANDOM%" | mkisofs /ISO/* | echo "SHA1(boot.iso)= " && sha1sum >> manifest.mf | tar -cvf success.ova /OVF/*
NOTE
In boot.sh there is a variable set to DEFAULT like this (Just for testing purposes):
PASSWORD="DEFAULT"
NOTE
This is what a line in the manifest file should look like:
SHA1(boot.iso)= 5fbc0d70 BLAH BLAH BLAH a91c9121bb
I've never tried to write an entire script as one stream before. Usually I write to the filesystem a lot as I go. The first issue I see is that sed replaces the string, but what it pipes to mkisofs will not be used, as mkisofs is just going to make an ISO of what it finds in /ISO. I don't even know if you can pass something like that to mkisofs. Piping is sometimes weird to think about.
Next, I think mkisofs is OK because I didn't specify an output file, so it should write to stdout, which will be passed to sha1sum. But here is the next problem I see: I need to add some additional text to the manifest line before the SHA1 sum, which kind of interrupts the stream.
Finally, the last problem I see is how to pass everything to tar to build the OVA without first writing to the filesystem (writing to manifest.mf).
Oh, and the last BIG problem, which I should have mentioned first, is the config.xml file. Right now I'm dealing with it as just a file. The dev guys want to pass it to this script as a stream as well. I don't have a clue how to handle that.
Any help would be greatly appreciated. These concepts are a little beyond my knowledge.
Thanks!
UPDATE 12/11/13 2:11PM EST
Testing each part individually right now. Will report findings below soon.
UPDATE 12/11/13 2:14PM EST
The following works:
cat /ISO/boot.sh | sed "s%DEFAULT%RANDOM%"
and produces the following output:
PASSWORD="RANDOM"
Exactly as expected.
You are correct NeronLeVelu, I will have to come back later and look at sed more carefully when real random passwords are being generated. ie. Making sure proper characters are escaped. Right now though, I'm just testing the logic. I will worry about regex and escaping later. We have not even decided on random password yet. It's only temporary and will most likely be alphanumeric.
Moving onto next part. Still not sure how to take the output from sed (stdout) and use it to include in ISO creation without actually creating a file that gets written to the file system. It may not be possible without writing to file system. More to come soon

# Escape the characters in the password that are special in a sed replacement:
# & and \ and the separator used in your sed command (/ by default)
Password4Sed=$(printf '%s\n' "$PASSWORD" | sed 's|[\\/&]|\\&|g')
# no need for cat with sed; it can read the file itself
sed "s/DEFAULT/${Password4Sed}/" /ISO/boot.sh > /tmp/mkisofs.input
Process the rest from this input, and add some tests to validate each step, such as checking for an empty checksum value or an empty mkisofs.input. This will help at runtime when errors occur in production.
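As a rough sketch of the checksum step with that kind of validation: the function below reads a stream on stdin and prints a manifest line in the format shown in the question. The name manifest_line is made up for illustration, and rejecting the SHA1 of empty input is an assumption about what counts as a failed step.

```shell
# manifest_line NAME: read a stream on stdin and print "SHA1(NAME)= <hex digest>".
# Hypothetical helper, not part of any existing tool.
manifest_line() {
    local digest
    digest=$(sha1sum | cut -d' ' -f1)
    case $digest in
        ''|*[!0-9a-f]*)
            # sha1sum produced nothing usable
            echo "manifest_line: no valid digest for $1" >&2
            return 1 ;;
        da39a3ee5e6b4b0d3255bfef95601890afd80709)
            # This is the SHA1 of zero bytes: the previous step sent no data
            echo "manifest_line: empty input for $1" >&2
            return 1 ;;
    esac
    printf 'SHA1(%s)= %s\n' "$1" "$digest"
}
```

If mkisofs really does write the image to stdout when no output file is given, as the question supposes, something like mkisofs /ISO | tee /OVF/boot.iso | manifest_line boot.iso >> /OVF/manifest.mf would hash the image while it is being written. Note that tee still writes boot.iso to the filesystem, so this only removes one intermediate read, not the file itself.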

Related

Sending script and file content via STDIN

I generate (dynamically) a script concatenating the following files:
testscript1:
echo Writing File
cat > /tmp/test_file <<EOF
testcontent:
line1
second line
testscript2:
EOF
echo File is written
And I execute by calling
$ cat testscript1 testcontent testscript2 | ssh remote_host bash -s --
The effect is that the file /tmp/test_file is filled with the desired content.
Is there a conceivable variant where binary files can be supplied in a similar fashion? Instead of cat, dd or other tools could of course be used, but the problem I see is 'telling' them that STDIN has now ended (can I send ^D through that stream?)
I am not able to get my head around that problem, but there is likely no comparable solution. However, I might be wrong, so I'd be happy to hear from you.
Regards,
Mazze
can I send ^D through that stream
Yes but you don't want to.
Control+D, commonly notated ^D, is just a character -- or to be pedantic (as I often am), a codepoint in the usual character code (ASCII or a superset like UTF-8) that we treat as a character. You can send that character/byte by a number of methods, most simply printf '\004', but the receiving system won't treat it as end-of-file; it will instead be stored in the destination file, just like any other data byte, followed by the subsequent data that you meant to be a new command and file etc.
^D only causes end-of-file when input from a terminal (more exactly, a 'tty' device) -- and then only in 'cooked' mode (which is why programs like vi and less can do things very different from ending a file when you type ^D). The form of ssh you used doesn't make the input a 'tty' device. ssh can make the input (and output) a 'tty' (more exactly a subclass of 'tty' called a pseudo-tty or 'pty', but that doesn't matter here) if you add the -t option (in some situations you may need to repeat it as -t -t or -tt). But then if your binary file contains any byte with the value \004 -- or several other special values -- which is quite possible, then your data will be corrupted and garbage commands executed (sometimes), which definitely won't do what you want and may damage your system.
The traditional approach to what you are trying to do, back in the 1980s and 1990s, was 'shar' (shell archive), and the usual solution to handling binary data was 'uuencode', which converts binary data into only printable characters that could safely go through a link like this, matched by 'uudecode', which converts it back. See this surviving example from GNU. uuencode and uudecode themselves were part of a communication protocol, 'uucp', used mostly for email and Usenet, which are (all) mostly obsolete and forgotten.
However, nearly all systems today contain a 'base64' program which provides equivalent (though not identical) functionality. Within a single system you can do:
base64 <infile | base64 -d >outfile
to get the same effect as cp infile outfile. In your case you can do something like:
{ echo "base64 -d <<END# >outfile"; base64 <infile; echo "END#"; otherstuff; } | ssh remote bash
You can also try:
cat testscript1 testcontent testscript2 | base64 | ssh <options> "base64 --decode | bash"
Don't worry about ^D, because when your input is exhausted, the next processes of the pipeline will notice that they have reached the end of the input file.
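To make the here-document variant a bit more concrete, here is a minimal sketch of a helper that emits the shell commands needed to recreate one binary file on the receiving side. The function name emit_file_script is invented for this example, and it assumes the remote side has a base64 that accepts -d, as in the commands above.

```shell
# emit_file_script SRC DEST: print shell commands that recreate SRC as DEST.
# Hypothetical helper. Base64 output never contains '#', so END# is a safe
# here-document terminator; quoting it stops the remote shell expanding the body.
emit_file_script() {
    local src=$1 dest=$2
    # %q (bash printf) quotes DEST so spaces or metacharacters survive
    printf "base64 -d <<'END#' >%q\n" "$dest"
    base64 <"$src"
    printf 'END#\n'
}
```

It could then be combined with ordinary commands in one stream, for example { echo 'echo Writing File'; emit_file_script infile /tmp/test_file; echo 'echo File is written'; } | ssh remote_host bash -s --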

Update the tar.bz2 compressed file

We have hundreds of files in a compressed archive, trx_date.tar.bz2, which contain requests and responses. The structure is: trx_date.tar.bz2 contains trx_date.tar, which contains the directory trx_date with the files log1, log2, log3. These files hold XML requests with some sensitive information that I would like to mask with a default value. For example, a request has a tag with the value 1234567 and I want to update the log file so that it reads 3333333.
I am able to grep for it using:
Number1=bzcat $LOGDIR/$LOG_FORMAT | grep "<number>[0-2,4-9][0-2,4-9][0-2,4-9][0-2,4-9][0-2,4-9][0-2,4-9][0-2,4-9]"
How can we overwrite those values in the log files using a shell script?
The log files contain requests and responses, with tags like 123456 as well as other tags. I want to read every line of the log file, replace that specific tag's value with 333333, and save the result to the same file. There is also an info tag with 333333, but I don't want to touch that one.
In principle, you cannot do directly what you want (without extracting the file from your .tar.bz2 compressed archive), since a .tar.bz2 file is a bzip2-ed compression of a tar archive. So the only good solution would be to extract files from the archive, do the modification on the extracted files (e.g. with sed(1) or awk), and recreate an archive from it. Using sed on one particular textual file to replace a pattern like <number>[0-9]*</number> by <number>0000000</number> is easy. Writing a bash for loop to iterate that on several files is easy. So combine both approaches, or write a tiny shell or Python script doing that (on the extracted files).
In practice (but that is risky and I don't recommend it) you could hope that <number> digits </number> occurs only in the file contents of the tar archive you want to modify in place, and then you could perhaps replace such sequences (directly in the uncompressed tar archive), using e.g. sed(1), with other sequences of the same byte length (read more about the tar format: metadata such as file sizes appears in textual form, padded with NUL bytes).
You might also consider using tardy, a tar post-processor (that you need to install).
I strongly recommend extracting the tar archive, operating on the extracted files, then recreating the archive. Of course, you need enough disk space, and you have to estimate it. But tell your manager that disk space is cheap, generally cheaper than your labor costs.
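The extract, edit and re-archive approach can be sketched roughly as follows. This is only an illustration under several assumptions: the function name mask_archive is invented, the tag is assumed to be <number> with exactly seven digits (taken from the grep in the question), and sed -i is the GNU form.

```shell
# mask_archive ARCHIVE: rewrite every <number>NNNNNNN</number> inside a
# .tar.bz2 archive to <number>3333333</number>. Hypothetical helper.
mask_archive() {
    local archive=$1 workdir status
    workdir=$(mktemp -d) || return 1
    # Extract, edit the extracted files in place, then build a new archive.
    # Member names in the new archive gain a leading "./".
    tar -xjf "$archive" -C "$workdir" &&
    find "$workdir" -type f -exec \
        sed -i 's|<number>[0-9]\{7\}</number>|<number>3333333</number>|g' {} + &&
    tar -cjf "$archive.new" -C "$workdir" . &&
    mv "$archive.new" "$archive"
    status=$?
    rm -rf "$workdir"
    return "$status"
}
```

Because the pattern is anchored on the <number> tag, other tags that happen to hold 333333 are left alone. The original archive is only replaced once the new one has been written successfully.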
PS. The command given in your question is really wrong and does not do what you dream of. Read more about redirection, pipelines, globbing, unix shells. Read carefully the documentation of Bash (notably basic shell features, shell expansion, command substitution). Read also the documentation of each command that you want to use, e.g. tar(1), grep(1), sed(1), etc. Read the relevant man-pages(7), perhaps with the man(1) command.

Replace/sync only certain lines using Bash, SSH and rsync

I am looking for a quick and dirty one-liner to sync only certain settings in remote config files. Need to preserve what's unique and sync generic settings. Example:
Config1.conf:
HOSTNAME=COMP1
IP=10.10.13.10
LOCATION=SITE_A
BUILDING=DEPT_IT
ROOM=COMP_LAB1
Remote-Config2.txt:
HOSTNAME=COMP2
IP=10.10.13.11
LOCATION=FOO
BUILDING=BAR
ROOM=BAZ
I need to sync or copy replace only the bottom 3 lines over ssh. The line numbers are predictable, by the way. Always lines 4,5 and 6 in this case.
Here's a working idea that is missing one piece (a standard replacement for the non-standard utility I used to replace the vars in the local conf):
for var in $(ssh root@10.10.8.12 'sed -n "4,6p" /etc/conf1.conf');do <missing piece> ${var/=*}=${var/*=} local-conf.conf; done
So this uses variable expansion and a non-standard utility, but it needs something like a sed or Perl routine to replace the info in the local conf.
Update
The last line of code actually works. Tested and works! However -- the missing piece is a custom non-standard utility. I'm asking if someone can think of something, using standard Linux tools, to replace that.
One solution would be to take the left side and match, then replace the right side. This is basically what that utility does. Looks for the variable in the conf then sets it. Using variable expansion is one way (shown).
Here's an alternative solution that does not require the command to have special knowledge of the file contents:
Take a copy of the files you want to sync. Then, in the copy, deliberately vandalise (arbitrarily modify) the lines you do not want synced. It doesn't matter what they say as long as there are the same number of lines and they'll never match the actual file contents. Have some fun. This becomes your base version. Your example might look like this:
HOSTNAME=foo
IP=bar
LOCATION=SITE_A
BUILDING=DEPT_IT
ROOM=COMP_LAB1
rsync the remote files into a temporary location. This is the remote version.
For each file, take a three-way diff.
diff3 -3 <localfile> <basefile> <remotefile>
The output of diff3 is an "ed script" that describes what edits to make to the local file so that it would look like the remote file.
The -3 option tells it to only output the non-conflicting differences. This is why we vandalised the base files in the first place: so those lines would have conflicts.
Once you have the ed script for a file, you can visually check it, if you choose, and then apply the update using patch:
cat <ed-script> | patch --ed <localfile>
So, to do this recursively, you might have:
cd "$localdir" || exit 1
# read names line by line so paths containing spaces survive
find . -type f | while IFS= read -r file; do
diff3 -3 "$file" "$basedir/$file" "$remotedir/$file" | patch --ed "$file"
done
You probably need to add some checks that the base and remote files actually exist.
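For comparison, the direct approach the question describes (match the left side, replace the right side) can also be sketched with standard tools. The function name apply_settings is invented here, and the sketch assumes simple KEY=VALUE lines with one setting per line.

```shell
# apply_settings LOCAL: read KEY=VALUE lines on stdin and print LOCAL with
# the lines for those keys replaced. Hypothetical helper using POSIX awk.
apply_settings() {
    # The updates travel through the environment, so awk does no escape
    # processing on the values.
    UPDATES=$(cat) awk -F= '
        BEGIN {
            n = split(ENVIRON["UPDATES"], lines, "\n")
            for (i = 1; i <= n; i++) {
                key = lines[i]; sub(/=.*/, "", key)
                if (key != "") new[key] = lines[i]
            }
        }
        ($1 in new) { print new[$1]; next }   # key supplied: emit the new line
        { print }                             # everything else unchanged
    ' "$1"
}
```

A possible use would be ssh root@10.10.8.12 'sed -n "4,6p" /etc/conf1.conf' | apply_settings local-conf.conf > local-conf.conf.new && mv local-conf.conf.new local-conf.conf, which writes to a temporary file first so a failed ssh does not truncate the local config.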

redirecting email text from procmail into bash script

I am trying to redirect emails that match a particular pattern to a shell script which will create files containing the texts, with datestamped filenames.
First, here is the routine from .procmailrc that hands the emails off to the script:
:0c:
* Subject: ^Ingest_q.*
| /home/myname/procmail/process
and here is the script 'process':
#!/bin/bash
DATE=`date +%F_%N`
FILE=/home/myname/procmail/${DATE}_email.txt
while read line
do
echo "$line" 1>>"$FILE";
done
I have gotten very frustrated with this because I can pipe text to this script on the command line and it works fine:
mybox-248: echo 'foo' | process
mybox-249: ls
2013-07-31_856743000_email.txt process
The file contains the word 'foo.'
I have been trying to get an email text to get output as a date-stamped file for hours now, and nothing has worked.
(I've also turned logging on in my .procmailrc and that isn't working either -- I'm not trying to ask a second question by mentioning that, just wondering if that might provide some hint as to what I might be doing wrong ...).
Thanks,
GB
Quoting your attempt:
:0c:
* Subject: ^Ingest_q.*
| /home/myname/procmail/process
The regex is wrong, ^ only matches at beginning of line, so it cannot occur after Subject:. Try this instead.
:0c:process.lock
* ^Subject: Ingest_q
| /home/myname/procmail/process
I also specified a named lockfile; I do not believe Procmail can infer a lock file name from just a script name. As you might have multiple email messages being delivered at the same time, and you don't want their logging intermingled in the log file, using a lock file is required here.
Finally, the trailing .* in the regex is completely redundant, so I removed it.
(The olde Procmail mini-FAQ also addresses both of these issues.)
I realize your recipe is probably just a quick test before you start on something bigger, but the entire recipe invoking the process script can be completely replaced by something like
MAILDIR=/home/myname/procmail
DATE=`date +%F_%N`
:0c:
${DATE}_email.txt
This will generate Berkeley mbox format, i.e. each message should have a From_ pseudo-header before the real headers; if you are not sure whether this is already the case, you should probably use procmail -Yf- to make sure it is (otherwise there is really no way to tell where one message ends and another begins; this applies both to your original solution and to this replacement).
Because Procmail sees the file name you are delivering to, it can infer a lockfile name now, as a minor bonus.
Using MAILDIR to specify the directory is the conventional way to do this, but you can specify a complete path to an mbox file if you prefer, of course.
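If you do keep a script, note that a plain while read line loop strips leading and trailing whitespace from each line and treats backslashes as escapes, so the saved text may differ from the message. A minimal sketch that copies stdin unchanged instead (the function name save_message is invented; date +%N is the GNU extension already used in the question):

```shell
# save_message DIR: write stdin, byte for byte, to DIR/<date>_email.txt
# and print the path of the file that was written. Hypothetical helper.
save_message() {
    local file
    file="$1/$(date +%F_%N)_email.txt"
    cat >"$file" && printf '%s\n' "$file"
}
```

The process script would then reduce to defining this function and calling save_message /home/myname/procmail.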

List only files that are unencrypted

First off, I am not a Unix expert by any stretch, so please forgive a little naivety in my question.
I have a requirement to list the unencrypted files in a given directory that potentially contains both encrypted and unencrypted files.
I cannot reliably identify these files by file extension alone and was hoping someone in the SO community might be able to help me out.
I can run:
file * | egrep -w 'text|XML'
but that will only identify the files that are either text or XML. I could possibly use this if I can't do much better, as currently the only other files in the directory are text or XML files, but I really wanted to identify all unencrypted files, whatever type they may be.
Is this possible in a single line command?
EDIT: the encrypted files are encrypted via openSSL
The command I use to unencrypt the files is:
openssl enc -d -aes128 -in <encrypted_filename> -out <unencrypted_filename>
Your problem is not a trivial one. The Solaris file command uses "magic" - /etc/magic. This is a set of rules that attempt to determine what flavor a file is. It is not perfect.
If you read the /etc/magic file, note that the last column is verbiage that is in the output of the file command when it recognizes something, some structure in a file.
Basically the file command looks at the first few bytes of a file, just like the exec() family of system calls does. So, #!/bin/sh in the very first line of a file, in the first characters of the line, identifies to exec() the "command interpreter" that exec() needs to invoke to "run" the file. file gets the same idea and says "command text" "awk text" etc.
Your issues are that you have to work out what types of files you are going to see as output from file. You need to spend time delving into the non-encrypted files to see what "answers" you can expect from file. Otherwise you can run file over the whole directory tree and sort out all of what you think are correct answers.
find /path/to/files -type f -exec file {} \; | nawk -F':' '!arr[$2]++' > outputfile
This gives you a list of distinct answers about what file thinks you have. Copy the lines whose answers you like into a file, call it good.txt
find /path/to/files -type f -exec file {} \; > bigfile
nawk -F':' 'FILENAME=="good.txt" {arr[$2]++}
FILENAME=="bigfile" {if($2 in arr) {print $1}} ' good.txt bigfile > nonencryptedfiles.txt
THIS IS NOT 100% guaranteed. file can be fooled.
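Since the question says the files were encrypted with the openssl command-line tool, one more heuristic may be worth trying. Password-based openssl enc output normally starts with the eight bytes Salted__ followed by the salt. The sketch below checks for that marker; it assumes the files were not written with -nosalt, with a raw key (-K), or base64-encoded (-a), and the function name is invented.

```shell
# is_openssl_salted FILE: succeed if FILE starts with the "Salted__" marker
# that password-based "openssl enc" writes by default. Hypothetical helper.
is_openssl_salted() {
    # tr drops NUL bytes so the command substitution does not warn on binaries
    [ "$(head -c 8 "$1" | tr -d '\0')" = "Salted__" ]
}
```

Listing the files that lack the marker could then look like: for f in *; do [ -f "$f" ] && ! is_openssl_salted "$f" && printf '%s\n' "$f"; done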
The way to identify encrypted files is by the amount of randomness, or entropy, they contain. Files that are encrypted (or at least files that are encrypted well) should look random in the statistical sense. Files that contain unencrypted information—whether text, graphics, binary data, or machine code—are not statistically random.
A standard way to calculate randomness is with an autocorrelation function. You'd probably need to autocorrelate only the first few hundred bytes of each file, so the process can be fairly quick.
It's a hack, but you might be able to take advantage of one of the properties of compression algorithms: they work by removing randomness from data. Encrypted files cannot be compressed (or again, at least not much), so you might try compressing some portion of each file and comparing the compression ratios.
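A minimal sketch of that compression test, with the caveat that the 4096-byte sample and the 10% threshold are arbitrary choices for illustration, and the function name is made up:

```shell
# looks_compressible FILE: succeed if the first 4096 bytes of FILE shrink by
# more than 10% under gzip. Hypothetical helper; thresholds are assumptions.
looks_compressible() {
    local raw packed
    raw=$(head -c 4096 "$1" | wc -c)
    packed=$(head -c 4096 "$1" | gzip -c | wc -c)
    # Integer arithmetic: packed < 0.9 * raw
    (( raw > 0 && packed * 10 < raw * 9 ))
}
```

Files for which this fails are candidates for being encrypted, but very short files and already-compressed formats (zip, JPEG and so on) will also fail the test, so the result is only a hint.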
SO has several other questions about finding randomness or entropy, and many of them have good suggestions, like this one:
How can I determine the statistical randomness of a binary string?
Good luck!
