Hashing chunks of big file

Hashing chunks of big file - bash

I have a big audio file (2 hours), foo.mp3. I want to enable a third party P to verify the date of its creation. However, P shall only be able to view (hear) certain small parts of foo.mp3. On the date of creation it is unclear which parts will be provided to P.
So my plan is to split foo.mp3 into small chunks, hash these chunks and publish the hashes somewhere on the net for later verification.
What would a bash script look like that splits foo.mp3 and creates the hashes?

You don't have to split the file into chunks. Just publish the SHA256 of the whole file now and when the other party gets access to the file they will be able to see that it has the same SHA256 sum that you published earlier. (I'd recommend using at least SHA256 as the hash algorithm.)
Now if you still decide you have to do it like this for some reason, I suggest using split(1) and then running sha256sum:
mkdir split
split -b 8192 foo.mp3 split/
sha256sum split/*
This splits the file into 8 KiB chunks in the split/ directory and prints the SHA256 hash of each chunk.
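A minimal sketch of how the split-and-hash steps above could be wrapped in a script, together with a check that a third party could run on a single chunk later. The function names, the chunk_ prefix and the hashes.sha file name are assumptions for illustration; the -d option (numeric suffixes) is specific to GNU split. Note that byte-based chunks of an MP3 are not guaranteed to be playable on their own.

```bash
#!/usr/bin/env bash
# Split a file into fixed-size chunks and print one SHA-256 line per chunk.
# Usage (hypothetical names): hash_chunks foo.mp3 chunks 8192 > hashes.sha
hash_chunks() {
    local file=$1 outdir=$2 size=${3:-8192}
    mkdir -p "$outdir" || return 1
    # -d: numeric suffixes (GNU split); -a 6: room for up to a million chunks
    split -b "$size" -d -a 6 "$file" "$outdir/chunk_" || return 1
    # Hash from inside the directory so the published names stay relative
    ( cd "$outdir" && sha256sum chunk_* )
}

# Succeeds if the hash of the given chunk appears in the published list.
verify_chunk() {
    local chunk=$1 hashlist=$2 sum
    sum=$(sha256sum < "$chunk" | cut -d' ' -f1)
    grep -q "^$sum " "$hashlist"
}
```

Publishing the SHA256 of hashes.sha itself, rather than the whole list, would keep the public record to a single line while still committing to every chunk.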

1. Splitting the file into several chunks
mp3splt -a -t 0.01 -o foo-#n -d outputDir foo.mp3
2. Create hashes
sha256sum outputDir/* > hashes.sha
3. Document results on the web
A published file containing the sha256 hashes, and/or the sha256 hashes of big sha256 hash files, could look like this:
5b2718841d2f610ce264191760383dca309fd5a6f8c745e7466aa6c157e0b279 Files1.sha
0b568a4e802e9bc29740e8904c09f361c844dc4bd11d8855a188e81672f453ba Files2.sha
d8fec62a53228cf372ce81f4e3fd220b2df7bb5d5f8b144c8ffa4e41a4aeaad4 Camera.zip
ac0e59caa3f91f49824540fa452e285b9e47843f24dda309955a5287db4c89b5 Camerasmall.zip
e1f9b37f15eed99a8814f066021e6d370fd4e346b2779fc6494ee685b9e02467 SoundsSmall.zip
d054a2ac8adf069c2051659c68e38a7b6e26172afb2a6c377f7d067c666f9f6e Sounds.zip

Related

Cannot mount a split ISO after concatenation

I have a split ISO (7 files of 7GB) which I concatenate in the terminal with
cat /Volumes/Blah/*.iso > /Volumes/Blah/concatenated.iso
I can see that the concatenated.iso file has a size of 7 x 7GB, but when I then mount it with any mounting software on Mac OS X (I tried Keka and Disk Utility), the mounted disk shows a size of only 7GB and seems to contain only the first part. What am I doing wrong here?

Assuming the split parts do not have headers of their own, and can simply be concatenated:
You need to ensure they are concatenated in the correct order. The shell expands /Volumes/Blah/*.iso in sorted order, but the sort is lexicographic rather than numeric, so foo_10.iso would come before foo_2.iso, and any other .iso file in that directory (including a concatenated.iso left over from an earlier run) is picked up as well. Run echo /Volumes/Blah/*.iso to see what order you actually end up with.
So list all the files manually in the correct order, like cat /Volumes/Blah/foo_1.iso /Volumes/Blah/foo_2.iso /Volumes/Blah/foo_3.iso > /Volumes/Blah/concatenated.iso. If you have a very large number of files you can employ a sub-shell with a for loop, but if this is a one-time job you're probably faster just copying, pasting and modifying the path of each file manually.
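For a larger number of parts, the for loop mentioned above could look roughly like this. It is a sketch that assumes the parts are named with a common prefix, a running number starting at 1, and a common suffix (foo_1.iso, foo_2.iso, ...); concat_parts is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Concatenate numbered parts in numeric order into one output file.
# Assumes parts are named <prefix>1<suffix> ... <prefix>N<suffix>.
concat_parts() {
    local prefix=$1 suffix=$2 count=$3 out=$4
    local i
    for (( i = 1; i <= count; i++ )); do
        # Stop if a part is missing rather than silently skipping it
        [ -f "${prefix}${i}${suffix}" ] || { echo "missing part $i" >&2; return 1; }
        cat "${prefix}${i}${suffix}"
    done > "$out"
}
```

For example, concat_parts /Volumes/Blah/foo_ .iso 7 /Volumes/Blah/concatenated.img. Giving the output a name that does not match the parts' pattern keeps it out of any later glob over the directory.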

Update the tar.bz2 compressed file

We have hundreds of files in a trx_date.tar.bz2 compressed archive, containing requests and responses. The structure is as follows: trx_date.tar.bz2 contains trx_date.tar, which contains the directory trx_date with the files log1, log2 and log3. These hold XML requests with some sensitive information that I would like to mask with a default value. A request has a tag like <number>1234567</number>, and I want to update it in the log file to <number>3333333</number>.
I am able to grep for it using:
Number1=bzcat $LOGDIR/$LOG_FORMAT | grep "<number>[0-2,4-9][0-2,4-9][0-2,4-9][0-2,4-9][0-2,4-9][0-2,4-9][0-2,4-9]"
How can we overwrite those values in the log files using a shell script?
The log files contain requests and responses. They have the tag shown above, with a value like 123456, as well as other tags. I want to read every line of a log file, replace the value of that specific tag with 333333, and save the result into the same file. There is also an info tag containing 333333, but I don't want to touch that one.

In principle, you cannot do directly what you want (without extracting the file from your .tar.bz2 compressed archive), since a .tar.bz2 file is a bzip2-ed compression of a tar archive. So the only good solution would be to extract files from the archive, do the modification on the extracted files (e.g. with sed(1) or awk), and recreate an archive from it. Using sed on one particular textual file to replace a pattern like <number>[0-9]*</number> by <number>0000000</number> is easy. Writing a bash for loop to iterate that on several files is easy. So combine both approaches, or write a tiny shell or Python script doing that (on the extracted files).
In practice (but that is risky and I don't recommend it) you could hope that <number> digits </number> occurs only in the file contents of the tar archive you want to modify in place, and then you could perhaps replace such sequences (directly in the uncompressed tar archive, using e.g. sed(1)) with other sequences of the same byte length (read more about the tar format: metadata such as file sizes is stored in textual form, padded with NUL bytes).
You might also consider using tardy, a tar post-processor (that you need to install).
I strongly recommend extracting the tar archive, operate on the extracted files, then recreate that archive again. Of course, you need enough disk space, and you have to estimate it. But tell your manager that disk space is cheap, generally cheaper than your labor costs.
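A rough sketch of the extract, modify, re-archive approach recommended above. The function names are hypothetical, the replacement value 3333333 is taken from the question, and the pattern assumes the tag is literally <number>digits</number> on a single line. The rebuilt archive stores paths with a leading ./ and the original is replaced only if every step succeeds, but it is still worth testing on a copy first.

```bash
#!/usr/bin/env bash
# Mask every <number>...</number> value in all files below a directory.
mask_numbers() {
    local dir=$1 f tmp
    tmp=$(mktemp) || return 1
    find "$dir" -type f | while IFS= read -r f; do
        # Filter into a scratch file, then write back (avoids non-portable sed -i)
        sed 's|<number>[0-9]*</number>|<number>3333333</number>|g' "$f" > "$tmp" &&
            cat "$tmp" > "$f"
    done
    rm -f "$tmp"
}

# Extract the .tar.bz2 archive, mask the logs, and rebuild the archive.
remask_archive() {
    local archive=$1 work
    work=$(mktemp -d) || return 1
    tar -xjf "$archive" -C "$work" &&
        mask_numbers "$work" &&
        tar -cjf "$archive.new" -C "$work" . &&
        mv "$archive.new" "$archive"
    local status=$?
    rm -rf "$work"
    return "$status"
}
```

Only the number tag is rewritten, so an info tag that happens to contain 3333333 is left alone.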
PS. The command given in your question is wrong and does not do what you expect. Read more about redirection, pipelines, globbing and Unix shells. Read carefully the documentation of Bash (notably basic shell features, shell expansion, command substitution). Read also the documentation of each command that you want to use, e.g. tar(1), grep(1), sed(1), etc. Read the relevant man-pages(7), perhaps with the man(1) command.

Stream processing lots of stuff to OVA

So one of our developers needs me to batch a bunch of information and process it into an OVA to be presented back for download. This is an easy process using the long method (i.e. writing to the filesystem), but the developers want a cleaner, streamlined solution that will scale better. They have therefore requested that I stream the entire process, which is proving difficult. Can someone please give me some direction? Here are the steps that need to be accomplished:
Get input from webserver (Webserver will pass these as stream eventually.)
Random password
XML file
Modify boot script on file system (ie insert random password generated by server)
Create ISO of XML file and boot script
Calculate the SHA1 sum of ISO
Append SHA1 sum of ISO to manifest file in OVF directory
Create OVA from OVF directory
Here is an example directory structure (I outlined this in / just for simplicity)
/--
|
|--ISO/
| |
| |--boot.sh (Where the random password gets inserted)
| |--config.xml (This is handed from the web server. Needs to stream from server)
|
|--OVF/
|
|--disk.vmdk
|--ovf.xml
|--manifest.mf (Contains SHA1 of all files in OVF directory)
|--boot.iso (This file will exist once created from ISO directory)
Here is what I have so far (I'll explain the issues afterwards. Yes... there are a lot of issues):
cat /ISO/boot.sh | sed "s%DEFAULT%RANDOM%" | mkisofs /ISO/* | echo "SHA1(boot.iso)= " && sha1sum >> manifest.mf | tar -cvf success.ova /OVF/*
NOTE
In boot.sh there is a variable set to DEFAULT like this (Just for testing purposes):
PASSWORD="DEFAULT"
NOTE
This is what a line in the manifest file should look like:
SHA1(boot.iso)= 5fbc0d70 BLAH BLAH BLAH a91c9121bb
So I've never tried to write an entire script in one stream before. Usually I write to the filesystem a lot as I go. The first issue I see with this is that sed is replacing the string, but what it pipes over to mkisofs will not be used, as mkisofs is just going to make an ISO of what it finds in /ISO. I don't even know if you can pass something like that to mkisofs. Piping is sometimes weird to think about.
Next, I think mkisofs is OK because I didn't specify an output file, so it should write to stdout, which will be passed to sha1sum. But here is the next problem I see: I need to append some additional text to the file before the SHA1 sum gets added, which kind of interrupts the stream.
Finally, the last problem I see is how to pass everything to tar to build the OVA without writing to the filesystem first (writing to manifest.mf).
Oh, and the last BIG problem, which I should have mentioned first, is the config.xml file. Right now I'm dealing with it as just a file. The dev guys want to pass it to this script as a stream as well. I don't have a clue how to handle that.
Any help would be greatly appreciated. These concepts are a little beyond my knowledge.
Thanks!
UPDATE 12/11/13 2:11PM EST
Testing each part individually right now. Will report findings below soon.
UPDATE 12/11/13 2:14PM EST
The following works:
cat /ISO/boot.sh | sed "s%DEFAULT%RANDOM%"
and produces the following output:
PASSWORD="RANDOM"
Exactly as expected.
You are correct NeronLeVelu, I will have to come back later and look at sed more carefully when real random passwords are being generated. ie. Making sure proper characters are escaped. Right now though, I'm just testing the logic. I will worry about regex and escaping later. We have not even decided on random password yet. It's only temporary and will most likely be alphanumeric.
Moving onto next part. Still not sure how to take the output from sed (stdout) and use it to include in ISO creation without actually creating a file that gets written to the file system. It may not be possible without writing to file system. More to come soon

# escape the password in case it contains &, \ or the separator used in your sed command (default is /)
Password4Sed=$(printf '%s\n' "$PASSWORD" | sed 's|[\\/&]|\\&|g')
# no need for cat with sed
sed "s/DEFAULT/${Password4Sed}/" /ISO/boot.sh > /tmp/mkisofs.input
Process the rest from this input, and add tests that validate each step (for example, an empty checksum value or an empty mkisofs.input). This will help at runtime when errors occur in production.
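One of the steps from the question, producing the manifest line from a stream, can be sketched as follows. manifest_line is a hypothetical helper; it reads the data from stdin, so it can sit at the end of a pipe, and it prints the line in the SHA1(name)= digest form the question shows.

```bash
#!/usr/bin/env bash
# Print one manifest line in the "SHA1(name)= digest" form.
# The data to hash is read from stdin.
manifest_line() {
    local name=$1 digest
    digest=$(sha1sum | cut -d' ' -f1) || return 1
    printf 'SHA1(%s)= %s\n' "$name" "$digest"
}
```

Assuming mkisofs writes the image to stdout when no output file is given, something like mkisofs /ISO | tee /OVF/boot.iso | manifest_line boot.iso >> /OVF/manifest.mf would hash the image in the same pass that writes it, so the ISO is not read a second time. It still lands on disk, which tar needs anyway to build the OVA.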

GNU Privacy Guard(GPG) in Shell/bash

I have been trying to do some more shell scripting over the last few days, and I am trying to calculate the value of a function.
To start with I have:
echo "blah"
read blub
md5=`md5sum $blub | cut -d ' ' -f 1`
echo $md5
echo "secretkey"
read sk
Suppose I get the values like that.
Is it possible at all to do a calculation with GPG like this:
GPG(VALUE1,VALUE2), in my case GPG(md5,sk)
Later on I would like to apply a modulo function to that value, which is possible in a shell script.
If it's possible, may I ask how to do it in GPG? Or is there any other, better public algorithm that can do this?
Thanks!

If I understand you correctly, you are trying to obtain a digest (hash, checksum) of the MD5 + secret key pair (much like the MD5 is a digest of the blub).
In any case, GPG won't help you here. It's mostly a tool for encryption and signing with PGP. It can do symmetric encryption of files as well but AFAICS it doesn't do stand-alone digests, and certainly doesn't do them in decimal form.
The simplest solution seems to be using one of the common shell checksumming tools which actually have numeric output — for example cksum.
Those tools take only a single blob as input, so you probably want to simply concatenate the two values.
I think a working example would be:
gpg=$(echo "$md5$sk" | cksum | cut -d' ' -f1)
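Then gpg will contain a decimal value that is always the same for a given md5 + secret key pair. It is not guaranteed to be unique, though: cksum produces a 32-bit CRC, so different pairs can collide.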
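Since the question mentions applying a modulo operation afterwards, here is a small sketch that combines both steps. combine_mod and its parameters are hypothetical names; it uses printf instead of echo, so the resulting value differs from the echo-based one-liner above because no trailing newline is hashed. cksum is a CRC and offers no cryptographic protection.

```bash
#!/usr/bin/env bash
# Derive a decimal value from an MD5 string plus a secret, reduced modulo m.
combine_mod() {
    local md5=$1 sk=$2 m=$3 value
    # cksum prints "<crc> <byte count>"; keep only the CRC
    value=$(printf '%s%s' "$md5" "$sk" | cksum | cut -d' ' -f1)
    echo $(( value % m ))
}
```

For example, combine_mod "$md5" "$sk" 97 prints a number between 0 and 96.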

List only files that are unencrypted

First off, I am not a Unix expert by any stretch, so please forgive a little naivety in my question.
I have a requirement to list the unencrypted files in a given directory that potentially contains both encrypted and unencrypted files.
I cannot reliably identify these files by file extension alone and was hoping someone in the SO community might be able to help me out.
I can run:
file * | egrep -w 'text|XML'
but that will only identify the files that are either text or XML. I could possibly use this if I can't do much better, as currently the only other files in the directory are text or XML files, but I really wanted to identify all unencrypted files, whatever type they may be.
Is this possible in a single line command?
EDIT: the encrypted files are encrypted via openSSL
The command I use to unencrypt the files is:
openssl enc -d -aes128 -in <encrypted_filename> -out <unencrypted_filename>

Your problem is not a trivial one. The Solaris file command uses "magic" - /etc/magic. This is a set of rules that attempts to determine what flavor a file is. It is not perfect.
If you read the /etc/magic file, note that the last column is the wording that appears in the output of the file command when it recognizes some structure in a file.
Basically the file command looks at the first few bytes of a file, just like the exec() family of system calls does. So, #!/bin/sh at the very start of the first line of a file identifies to exec() the "command interpreter" that exec() needs to invoke to "run" the file. file gets the same idea and says "command text", "awk text" etc.
Your issue is that you have to work out what types of files you are going to see as output from file. You need to spend time delving into the non-encrypted files to see what "answers" you can expect from file. Otherwise you can run file over the whole directory tree and sort out all of what you think are correct answers.
find /path/to/files -type f -exec file {} \; | nawk -F':' '!arr[$2]++ {print $2}' > outputfile
This gives you a list of the distinct answers that file gives for your files. Put the ones you like, unchanged, in a file and call it good.txt
find /path/to/files -type f -exec file {} \; > bigfile
nawk -F':' 'FILENAME=="good.txt" {arr[$1]++}
FILENAME=="bigfile" {if($2 in arr) {print $1}} ' good.txt bigfile > nonencryptedfiles.txt
THIS IS NOT 100% guaranteed. file can be fooled.
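In the same "look at the first few bytes" spirit, and given that the question says the files were encrypted with openssl, a complementary check is possible. When openssl enc derives the key from a password with a salt, it starts the output with the 8 ASCII bytes Salted__. The sketch below tests for that marker; the function names are hypothetical, and files encrypted with -nosalt, with an explicit key, or with base64 output would not match, so treat the result as a hint only.

```bash
#!/usr/bin/env bash
# Succeeds if the file starts with the "Salted__" marker written by
# openssl enc for salted, password-based encryption.
is_openssl_salted() {
    # tr strips NUL bytes so the shell does not warn about binary input
    [ "$(head -c 8 "$1" 2>/dev/null | tr -d '\0')" = "Salted__" ]
}

# Print the regular files in a directory that do NOT carry the marker.
list_unsalted() {
    local f
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        is_openssl_salted "$f" || printf '%s\n' "$f"
    done
}
```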

The way to identify encrypted files is by the amount of randomness, or entropy, they contain. Files that are encrypted (or at least files that are encrypted well) should look random in the statistical sense. Files that contain unencrypted information—whether text, graphics, binary data, or machine code—are not statistically random.
A standard way to calculate randomness is with an autocorrelation function. You'd probably need to autocorrelate only the first few hundred bytes of each file, so the process can be fairly quick.
It's a hack, but you might be able to take advantage of one of the properties of compression algorithms: they work by removing redundancy from data. Well-encrypted files look random and contain almost no redundancy, so they cannot be compressed (or again, at least not much). You might therefore try compressing some portion of each file and comparing the compression ratios.
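A sketch of that compression test, assuming gzip is available. compress_percent is a hypothetical helper and the 4096-byte sample size is an arbitrary choice. Files that are already compressed (zip, mp3, jpeg) will score as high as encrypted ones, so this detects "random-looking" rather than "encrypted".

```bash
#!/usr/bin/env bash
# Print the gzip-compressed size of the first N bytes of a file as a
# percentage of the bytes actually read. Values near or above 100
# suggest random-looking data; low values suggest plain content.
compress_percent() {
    local file=$1 n=${2:-4096} raw packed
    raw=$(head -c "$n" "$file" | wc -c)
    # Empty file: nothing to measure
    [ "$raw" -gt 0 ] || { echo 0; return; }
    packed=$(head -c "$n" "$file" | gzip -c | wc -c)
    echo $(( packed * 100 / raw ))
}
```

A loop such as for f in *; do echo "$(compress_percent "$f") $f"; done | sort -n would list the most compressible (most likely unencrypted) files first.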
SO has several other questions about finding randomness or entropy, and many of them have good suggestions, like this one:
How can I determine the statistical randomness of a binary string?
Good luck!
