Convert substring through command - bash

Basically, how do I make a string substitution in which the substituted string is transformed by an external command?
For example, given the line 5aaecdab287c90c50da70455de03fd1e ./2015/01/26/GOPR0083.MP4, how to pipe the second part of the line (./2015/01/26/GOPR0083.MP4) to command xargs stat -c %.6Y and then replace it with the result so that we end up with 5aaecdab287c90c50da70455de03fd1e 1422296624.010000?
This can be done with a script, but a one-liner would be nice.

#!/bin/bash
hashtime()
{
    while read -r longhex fname; do
        echo "$longhex $(stat -c %.6Y "$fname")"
    done
}
if [ $# -ne 1 ]; then
    echo "Usage: ${0##*/} infile" 1>&2
    exit 1
fi
hashtime < "$1"
exit 0
# one-liner
awk 'BEGIN { args="stat -c %.6Y " } { printf "%s ", $1; cmd=args $2; system(cmd); }' infile
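If you want awk itself to capture the command's output instead of relying on system() writing straight to stdout (which can interleave badly with buffered output), a variant using getline works too; a sketch, not tested against exotic file names:
awk '{
    cmd = "stat -c %.6Y \"" $2 "\""   # quote the path for the shell
    cmd | getline mtime               # read the command's single output line
    close(cmd)
    print $1, mtime
}' infile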

A one-liner using GNU sed, which will process the whole file:
sed -E "s/([[:xdigit:]]+) +(.*)/stat -c '\1 %.6Y' '\2'/e" file
or, using plain bash
while read -r hash pathname; do stat -c "$hash %.6Y" "$pathname"; done < file
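A quick check on the sample line from the question (the result shown assumes the file exists with the mtime from the question):
echo '5aaecdab287c90c50da70455de03fd1e ./2015/01/26/GOPR0083.MP4' |
  sed -E "s/([[:xdigit:]]+) +(.*)/stat -c '\1 %.6Y' '\2'/e"
# 5aaecdab287c90c50da70455de03fd1e 1422296624.010000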

It's typical to use awk, sed, or cut to reformat input. For example:
line="5aaecdab287c90c50da70455de03fd1e ./2015/01/26/GOPR0083.MP4"
echo "$line" |
cut -d' ' -f2- |
xargs stat -c %.6Y
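Note that this prints only the converted timestamp; to keep the hash column you can recombine the two fields with paste (a sketch, assuming bash for process substitution and GNU xargs for -d):
paste -d' ' <(cut -d' ' -f1 file) \
            <(cut -d' ' -f2- file | xargs -d'\n' stat -c %.6Y)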

Related

Bash - Counter for multiple parameters in file

I created a command which works, but not exactly as I want, so I would like to improve it to produce the right output.
My command:
awk '{print $1}' ios-example.com.access | sort | uniq -c | sort -nr
Output of my command:
8 192.27.69.191
2 82.202.69.253
Input file:
https://pajda.fit.vutbr.cz/ios/ios-19-1-logs/blob/master/ios-example.com.access.log
Output I need (hashtags instead of numbers):
192.27.69.191 (8): ########
82.202.69.253 (2): ##
cat ios-example.com.access | sort | uniq -c | awk 'ht="#"{for(i=1;i<$1;i++){ht=ht"#"} str=sprintf("%s (%d): %s", $2,$1, ht); print str}'
expecting a file with content like:
ipaddress1
ipaddress1
ipaddress1
ipaddress2
ipaddress2
ipaddress1
ipaddress2
ipaddress1
Using xargs with sh and printf, with comments between the lines. Live version at tutorialspoint.
# sorry cat
cat <<EOF |
8 192.27.69.191
2 82.202.69.253
EOF
# for each 2 arguments
xargs -n2 sh -c '
    # format the output as "$2 ($1): "
    printf "%s (%s): " "$2" "$1"
    # repeat the character `#` $1 times
    seq "$1" | xargs printf "#%.0s"
    # lastly a newline
    printf "\n"
' --
I think we could shorten that a bit with:
xargs -n2 sh -c 'printf "%s (%s): %s\n" "$2" "$1" $(printf "#%.0s" $(seq $1))' --
or maybe just echo, if the input is sufficiently safe:
xargs -n2 sh -c 'echo "$2 ($1): $(printf "#%.0s" $(seq $1))"' --
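The bar of # characters can also be built without seq, using printf padding and bash parameter expansion; a sketch:
while read -r n ip; do
    bar=$(printf "%${n}s" "")      # n spaces
    echo "$ip ($n): ${bar// /#}"   # turn each space into '#'
done <<EOF
8 192.27.69.191
2 82.202.69.253
EOF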
You can upgrade your command by adding another awk to the list, or you can just use a single awk for the whole thing:
awk '{ a[$1]++ }
END {
    for (i in a) {
        printf "%s (%d): ", i, a[i]
        for (j = 0; j < a[i]; ++j) printf "#"
        printf "\n"
    }
}' file
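A plain for (i in a) loop visits keys in unspecified order; if you also want the lines sorted by count, a GNU-awk-only sketch:
gawk '{ a[$1]++ }
END {
    PROCINFO["sorted_in"] = "@val_num_desc"   # GNU awk: iterate by value, descending
    for (i in a) {
        printf "%s (%d): ", i, a[i]
        for (j = 0; j < a[i]; ++j) printf "#"
        print ""
    }
}' file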

Splitting out a large file

I would like to process a 200 GB file with lines like the following:
...
{"captureTime": "1534303617.738","ua": "..."}
...
The objective is to split this file into multiple files grouped by hours.
Here is my basic script:
#!/bin/sh
echo "Splitting files"
echo "Total lines"
sed -n '$=' $1
echo "First Date"
head -n1 $1 | jq '.captureTime' | xargs -i date -d '@{}' '+%Y%m%d%H'
echo "Last Date"
tail -n1 $1 | jq '.captureTime' | xargs -i date -d '@{}' '+%Y%m%d%H'
while read p; do
    date=$(echo "$p" | sed 's/{"captureTime": "//' | sed 's/","ua":.*//' | xargs -i date -d '@{}' '+%Y%m%d%H')
    echo $p >> split.$date
done <$1
Some facts:
80 000 000 lines to process
jq doesn't work well since some JSON lines are invalid.
Could you help me optimize this bash script?
Thank you
This awk solution might come to your rescue:
awk -F'"' '{ file = "split." strftime("%Y%m%d%H", $4); print >> file; close(file) }' "$1"
It essentially replaces your while-loop. Closing each file right after writing keeps the script under the open-file-descriptor limit, at the cost of reopening the file for every line.
Furthermore, you can replace the complete script with:
# Start AWK file
BEGIN { FS = "\"" }
(NR == 1) { tmin = tmax = $4 }
($4 > tmax) { tmax = $4 }
($4 < tmin) { tmin = $4 }
{ file = "split." strftime("%Y%m%d%H", $4); print >> file; close(file) }
END {
    print "Total lines processed: ", NR
    print "First date: " strftime("%Y%m%d%H", tmin)
    print "Last date: " strftime("%Y%m%d%H", tmax)
}
You can then run it as:
awk -f <awk_file.awk> <jq-file>
Note: the usage of strftime indicates that you need to use GNU awk.
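If the input is roughly ordered by time, you can also avoid the per-line close() by only switching files when the hour changes; a sketch along the same lines (GNU awk for strftime):
awk -F'"' '{
    f = "split." strftime("%Y%m%d%H", $4)
    if (f != cur) { if (cur != "") close(cur); cur = f }
    print >> f
}' "$1"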
You can start optimizing by replacing this:
sed 's/{"captureTime": "//' | sed 's/","ua":.*//'
with this:
sed -nE 's/(\{"captureTime": ")([0-9\.]+)(.*)/\2/p'
-n suppresses automatic printing of the pattern space
-E uses extended regular expressions in the script
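Plugged into the original loop, the date extraction then costs one sed process per line instead of two (the awk answer above still wins, since it spawns no per-line processes at all):
date=$(echo "$p" | sed -nE 's/(\{"captureTime": ")([0-9\.]+)(.*)/\2/p' | xargs -i date -d '@{}' '+%Y%m%d%H')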

copy text in another file and append different strings shell script

file=$2
isHeader=$true
while read -r line;
do
    if [ $isHeader ]
    then
        sed "1i$line",\"BATCH_ID\"\n >> $file
    else
        sed "$line,1"\a >> $file
    fi
    isHeader=$false
done < $1
echo $file
echo $file
I want to append one string to the first (header) line and a different string to all the remaining lines. I tried this, but it doesn't work. I don't have any ideas; can somebody help me, please?
Not entirely clear to me what you want to do, but if you simply want to append text at the end of each line, use echo in place of sed:
file=$2
isHeader=1
while read -r line;
do
    if [ "$isHeader" -eq 1 ]
    then
        #sed "1i$line",\"BATCH_ID\"\n >> $file
        echo "${line},\"BATCH_ID\"\n" > $file
    else
        #sed "$line,1"\a >> $file
        echo "${line},1\a" >> $file
    fi
    isHeader=0
done < $1
cat $file
cat $file
The accepted answer is slightly wrong, because echo...\a produces a bell. Also, awk and sed support regular expressions and are roughly 10x faster at line-by-line processing. Here it is in awk:
#! /bin/sh
script='NR == 1 { print $0 ",\"BATCH_ID\"" }
NR > 1 { print $0 ",1" }'
awk "$script" $1 > $2
In sed it's even simpler:
sed '1 s/$/,"BATCH_ID"/; 2,$ s/$/,1/' $1 > $2
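For example, with a hypothetical two-line input:
$ printf 'h1,h2\nv1,v2\n' | sed '1 s/$/,"BATCH_ID"/; 2,$ s/$/,1/'
h1,h2,"BATCH_ID"
v1,v2,1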
To convince yourself of the speed difference, try this:
$ time seq 100000 | while read f; do echo ${f}foo; done > /dev/null

real    0m2.068s
user    0m1.708s
sys     0m0.364s

$ time seq 100000 | sed 's/$/foo/' > /dev/null

real    0m0.166s
user    0m0.156s
sys     0m0.017s

Shell sed command

I have paths.txt like:
pathO1/:pathD1/
pathO2/:pathD2/
...
pathON/:pathDN/
How can I use sed to insert '* ' after each pathOX/ (replacing the colon)?
The script is:
while read line
do
    cp $(echo $line | tr ':' ' ')
done < "paths.txt"
which I want to replace with:
while read line
do
    cp $(echo $line | sed 's/:/* /1')
done < "paths.txt"
This looks similar to a question you asked earlier: Shell Script: Read line in file
Just apply the trick of removing the additional '*' before applying the substitution, like:
cp $(echo $line | sed 's/\*//1; s/:/* /1')
while read line
do
    path=`echo "$line" | sed 's/:/ /g'`
    cmd="cp $path"
    echo $cmd
    eval $cmd
done < "./paths.txt"
A quick and dirty awk one-liner, no loop needed, to do the job:
awk -F: '$1="cp "$1' paths.txt
this will output:
cp /home/Documents/shellscripts/Origen/* /home/Documents/shellscripts/Destino/
cp /home/Documents/shellscripts/Origen2/* /home/Documents/shellscripts/Destino2/
...
if you want the cmds to get executed:
awk -F: '$1="cp "$1' paths.txt|sh
I said it's quick & dirty because:
the format must be path1:path2
your paths cannot contain special characters (like spaces) or :
Using pure shell:
while IFS=: read -r p1 p2
do
    cp "$p1"* "$p2"
done < file
The unquoted * after "$p1" is what glob-expands to the source directory's contents, playing the role of the '* ' the question wants inserted.

assign stat|grep|awk to a variable in bash

I have a file of filenames, and I need to be able to get the size of these files using bash.
I have the following script which does that, but it prints the filename and the size on different lines; I'd prefer it all on one line if possible.
#!/bin/sh
filename="$1"
while read -r line
do
    name=$line
    vars=(`echo $name | tr '.' ' '`)
    echo $name
    stat -x $name | grep Size: | awk '{ print $2 }'
done < "$filename"
I'd love to have it of the form:
filename: $size
How can I do this?
(I am using OSX hence the slightly odd version of stat.)
Pass -n to the echo to prevent a trailing newline from being added. So change
echo $name
to
echo -n $name
and, to add the : separator between the file name and file size:
echo -n ${name}": "
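Putting both pieces into the original loop, a sketch of the amended script:
while read -r name; do
    echo -n "${name}: "
    stat -x "$name" | grep Size: | awk '{ print $2 }'
done < "$filename"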
This should do the trick:
while read f
do
    echo "${f} : $(stat -L -c %s ${f})"
done < "${filename}"
Note that -c %s is GNU stat syntax; on OSX the equivalent is stat -L -f %z "${f}".
echo "$name: $(stat -x "$name" | sed -n 's/^ *Size: *\([0-9]*\).*/\1/p')"
