printing comma-separated integer size of directory and contents - bash

The following will print out the size in bytes of a directory and its contents:
ls -lR | grep -v '^d' | awk '{bytes += $5} END {print "Total bytes: " bytes}'
The output looks like this:
Total bytes: 1088328265
How can I most simply modify my command so that the output uses comma-separated thousands, like this:
Total bytes: 1,088,328,265

$ awk 'BEGIN{printf "Total bytes: %\047d\n", 1088328265}'
Total bytes: 1,088,328,265
So, putting aside the usual advice not to parse the output of ls, and getting rid of the grep since you never need grep when you're using awk, we can make your whole command:
ls -lR | awk '!/^d/{bytes += $5} END{printf "Total bytes: %\047d\n", bytes}'
\047 is how to represent a single quote inside a single-quote-delimited awk script. From the GNU awk manual:
A single quote or apostrophe character is a POSIX extension to ISO C. It indicates that the integer part of a floating-point value, or the entire part of an integer decimal value, should have a thousands-separator character in it. This only works in locales that support such characters. For example:
$ cat thousands.awk                          # show source program
-| BEGIN { printf "%'d\n", 1234567 }
$ LC_ALL=C gawk -f thousands.awk
-| 1234567                                   # results in "C" locale
$ LC_ALL=en_US.UTF-8 gawk -f thousands.awk
-| 1,234,567                                 # results in US English UTF locale
For more information about locales and internationalization issues, see the Locales section of the GNU awk manual.
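Because %\047d only groups digits in a locale that defines a thousands separator, here is a locale-independent sketch that also avoids parsing ls. It assumes find, cat, wc and sed are available; dir_bytes and add_commas are hypothetical helper names. dir_bytes reads every regular file, so it is slow on large trees and will not match the ls -lR total exactly (symlinks and directory entries are not counted). add_commas groups every run of four or more digits on a line, so it suits integer output such as "Total bytes: N".

```bash
# Hypothetical helper: total bytes of the regular files under a directory.
# It reads every file (slow on big trees) instead of parsing ls output.
dir_bytes() {
    find "${1:-.}" -type f -exec cat {} + | wc -c | tr -d ' '
}

# Hypothetical helper: insert a comma before each group of three digits,
# working from the right, independent of the current locale.
# The :a ... ta loop repeats the substitution until nothing changes.
add_commas() {
    sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'
}

# Usage:
#   printf 'Total bytes: %s\n' "$(dir_bytes /path/to/dir)" | add_commas
```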

Using Perl instead of awk:
perl -lane '$bytes += $F[4];
END { substr $bytes, -3 * $_, 0, ","
for reverse 1 .. (length($bytes)-1)/3;
print "Total bytes: $bytes"}'
-l removes newlines from input and adds them to prints
-n reads the input line by line
-a splits the input on whitespace into the @F array
substr inserts a comma at each position; we use negative positions, which count from the right, but start from the leftmost one so that the positions still to be used don't change as commas are added
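The same leftmost-first insertion at negative offsets can be sketched in bash alone, assuming a plain non-negative integer with no sign or decimal point. commify is a hypothetical name, and the function is bash-only (it uses local, the (( )) loop and ${var:offset:length}).

```bash
# Hypothetical bash-only helper mirroring the Perl answer: insert commas at
# right-relative offsets, leftmost group first, so the offsets still to be
# used are unaffected by the commas already inserted.
commify() {
    local n=$1 i
    for (( i = (${#n} - 1) / 3; i >= 1; i-- )); do
        # split 3*i characters from the right and join the halves with a comma
        n="${n:0:${#n}-3*i},${n: -3*i}"
    done
    printf '%s\n' "$n"
}

commify 1088328265    # prints 1,088,328,265
```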

Related

Multiplying all values in a txt file with another value

My aim is to multiply all values in a text file by a number. In my case it is 1000.
Original text in file:
0.00493293814
0.0438981727
0.149746656
0.443125129
0.882018387
0.975789607
0.995755374
1
I want the output to look like:
(so, changing the contents of the file to...)
4.93293814
43.8981727
149.746656
443.125129
882.018387
975.789607
995.755374
1000
Or, even better:
4.9
43.8
149.7
443.1
882.0
975.7
995.7
1000
I am using bash on macOS in the terminal.
If you have dc:
cat infile | dc -f - -e '1k1000sa[la*Sdz0!=Z]sZzsclZx[Ld1/psblcd1-sc1<Y]sYlYx'
Using Perl
perl -lpe ' $_=$_*1000 '
With sample input and in-place editing (-i):
$ cat andy.txt
0.00493293814
0.0438981727
0.149746656
0.443125129
0.882018387
0.975789607
0.995755374
1
$ perl -i -lpe ' $_=$_*1000 ' andy.txt
$ cat andy.txt
4.93293814
43.8981727
149.746656
443.125129
882.018387
975.789607
995.755374
1000
$
One decimal place (rounded):
perl -lpe ' $_=sprintf("%0.1f",$_*1000 ) '
Zero decimal places, rounded:
perl -lpe ' $_=sprintf("%0.0f",$_*1000 ) '
Zero decimal places, truncated:
perl -lpe ' $_=sprintf("%0.0f",int($_*1000) ) '
awk to the rescue!
$ awk '{printf "%.1f\n", $1*1000}' file > tmp && mv tmp file
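Note that %.1f rounds, so 0.0438981727 becomes 43.9, while the second desired output in the question truncates (43.8) and shows 1000 without a decimal. A minimal awk sketch that truncates instead, assuming ordinary floating-point precision is acceptable for your data (a product that lands a hair below a digit boundary could lose a tenth); mult_trunc is a hypothetical name.

```bash
# Hypothetical helper: multiply the first field by 1000 and truncate
# (not round) to one decimal place; exact integers print without ".0".
mult_trunc() {
    awk '{
        v = $1 * 1000
        if (v == int(v))
            printf "%d\n", v                   # e.g. 1 -> 1000
        else
            printf "%.1f\n", int(v * 10) / 10  # e.g. 0.0438981727 -> 43.8
    }' "$@"
}

# Usage, with the same temp-file pattern as the awk answer:
#   mult_trunc file > tmp && mv tmp file
```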
Using num-utils. For answers to 8 decimal places:
numprocess '/*1000/' n.txt
For rounded answers to 1 decimal place:
numprocess '/*1000/' n.txt | numround -n '.1'
Use sed to prefix each line with 1000*, then process the resulting mathematical expressions with bc. To show only the first digit after the decimal point you can use sed again.
sed 's/^/1000*/' yourFile | bc | sed -E 's/(.*\..).*/\1/'
This will print the latter of your expected outputs. Just as you wanted, decimals are cut rather than rounded (1.36 is converted to 1.3).
To remove all decimal digits, either replace the last … | sed … with sed -E 's/\..*//', or use the following command:
sed 's:^.*$:1000*&/1:' yourFile | bc
With these commands overwriting the file directly is not possible. You have to write to a temporary file (append > tmp && mv tmp yourFile) or use the sponge command from the package moreutils (append | sponge yourFile).
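The > tmp && mv tmp yourFile pattern can be wrapped in a small function. A sketch, assuming mktemp accepts a template (GNU and BSD mktemp both do); in_place is a hypothetical name. It replaces the file only if the command succeeds, but the replaced file ends up with the temporary file's permissions (often 0600) rather than the original's.

```bash
# Hypothetical helper: run a filter command on a file and replace the file
# with the result only if the command succeeds.
in_place() {
    local file=$1 tmp
    shift
    # create the temp file next to the target so mv is a simple rename
    tmp=$(mktemp "$file.XXXXXX") || return
    if "$@" < "$file" > "$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Usage:
#   in_place yourFile sh -c "sed 's/^/1000*/' | bc"
```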
However, if you want to remove all decimal digits after the multiplication, there is a trick. Instead of actually multiplying by 1000, we can textually shift the decimal point three places to the right. This can be done in a single sed command, and sed has the -i option to overwrite input files.
sed -i.bak -E 's/\..*/&000/;s/^[^.]*$/&.000/;s/\.(...).*/\1/;s/^(-?)0*(.)/\1\2/' yourFile
The command changes yourFile's content to
4
43
149
443
882
975
995
1000
A backup yourFile.bak of the original is created.
The single sed command should also work with other plain decimal input formats (even for things like -.1 → -100), though not with scientific notation such as 1e-3.

add zeros as a filling character to the left of your hexadecimal representations

I use this command to convert a bunch of decimal numbers to hexadecimal:
( echo "obase=16" ; cat file.txt ) | bc
However, in the output some values have less than 8 digits:
FFF95E13
7613
EE13
16613
6686E13
I would like to pad the hexadecimal representation with zeros on the left if it has fewer than 8 digits, so that it ends up in this format:
FFF95E13
00007613
0000EE13
00016613
06686E13
I could probably do it in Python, but I was hoping it's possible with awk or sed. If there is a way to convert to hexadecimal and also add leading zeros so that all values are 8 digits, that would be even better.
Perhaps you could just use awk instead:
awk '{ printf "%08X\n", $1 }' file
This prints the first column on each line in uppercase hexadecimal, zero-padded up to 8 characters.
$ cat file
111
222
333
444
$ awk '{ printf "%08X\n", $1 }' file
0000006F
000000DE
0000014D
000001BC
You can achieve the same with printf directly:
$ seq 100 25 200 | xargs printf "%08X\n"
00000064
0000007D
00000096
000000AF
000000C8
printf doesn't read from a file, but you can pipe the data in from one:
$ cat file | xargs printf "%08X\n"
or redirect:
$ <file xargs printf "%08X\n"
or, placed at the end:
$ xargs printf "%08X\n" <file
doesn't add much to the awk solution though...
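If you would rather keep the bc conversion from the question and only pad its output, a sed loop can prepend zeros until each line is 8 characters long. A sketch; pad_hex8 is a hypothetical name, and lines that already have 8 or more characters (or are empty) are left alone.

```bash
# Hypothetical helper: left-pad each line with zeros to 8 characters.
# The :a ... ta loop prepends one 0 per pass while a line has 1-7 chars.
pad_hex8() {
    sed -e :a -e 's/^.\{1,7\}$/0&/;ta' "$@"
}

# Usage with the bc pipeline from the question:
#   ( echo "obase=16" ; cat file.txt ) | bc | pad_hex8
```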

Count number of Special Character in Unix Shell

I have a delimited file that is separated by octal \036, i.e. hexadecimal value 1e.
I need to count the number of delimiters on each line using a bash shell script.
I was trying to use awk, but I'm not sure it's the best way.
Sample Input (| is a representation of \036)
Example|Running|123|
Expected output:
3
awk -F'|' '{print NF-1}' file
Change | to whatever separator you like. If your file can have empty lines then you need to tweak it to:
awk -F'|' '{print (NF ? NF-1 : 0)}' file
You can try
awk '{print gsub(/\|/,"")}'
Simply try
awk -F"|" '{print substr($3,length($3))}' OFS="|" Input_file
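Explanation: this sets the field separator (-F) to |, takes the 3rd field ($3) and prints its last character; OFS (the output field separator) is also set to |, and Input_file is the file to read. Note that this does not actually count delimiters: it prints 3 for the sample only because the 3rd field, 123, happens to end in 3.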
This works for a single line (note that tr and wc count over the whole input, not line by line):
echo "Example|Running|123|" | tr -cd '|' | wc -c
Output
3
This should work for you:
awk -F '\036' '{print NF-1}' file
3
-F '\036' sets input field delimiter as octal value 036
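Combining the \036 separator with the empty-line guard from the first answer gives a small reusable filter. A sketch; count_delims is a hypothetical name, and the sample data is generated with printf so that real 0x1e bytes are present.

```bash
# Hypothetical helper: print how many \036 (0x1e) delimiters each line has.
# NF fields are separated by NF-1 delimiters; an empty line has NF == 0.
count_delims() {
    awk -F '\036' '{ print (NF ? NF - 1 : 0) }' "$@"
}

# Usage: build the sample line with real 0x1e bytes, then count.
printf 'Example\036Running\036123\036\n' | count_delims    # prints 3
```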
Awk may not be the best tool for this. GNU grep has a handy -o option that prints each match on a separate line. You can then count how many matching lines are generated for each input line, and that's the count of your delimiters. For example (where ^^ in the file is actually hex 1e):
$ cat -v i
a^^b^^c
d^^e^^f^^g
$ grep -n -o $'\x1e' i | uniq -c
2 1:
3 2:
If you remove the uniq -c you can see how it works: you'll get "1" printed twice because there are two matches on the first line. Or try it with some regular ASCII characters, and it becomes clearer what the -o and -n options are doing.
If you want to print the line number followed by the field count for that line, I'd do something like:
$ grep -n -o $'\x1e' i | tr -d ':' | uniq -c | awk '{print $2 " " $1}'
1 2
2 3
This assumes that every line in the file contains at least one delimiter. If that's not the case, here's another approach that's probably faster too:
$ tr -d -c $'\x1e\n' < i | awk '{print length}'
2
3
0
0
0
This uses tr to delete (-d) all characters that are not (-c) 1e or \n. It then pipes that stream of data to awk which just counts how many characters are left on each line. If you want the line number, add " | cat -n" to the end.
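The $'\x1e' quoting is a bash feature. In a plain POSIX sh the same tr approach can name the byte in octal, which tr itself understands, and printing NR in awk gives the line number without cat -n. A sketch; delims_per_line is a hypothetical name.

```bash
# Hypothetical helper: print "line-number count" for every line of stdin,
# counting 0x1e bytes. tr keeps only \036 and newlines, so the remaining
# length of each line equals its delimiter count.
delims_per_line() {
    tr -d -c '\036\n' | awk '{ print NR, length($0) }'
}

# Usage:
#   delims_per_line < i
```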

Get last four characters from a string

I am trying to parse the last 4 characters of Mac serial numbers from the terminal. I can grab the serial with this command:
serial=$(ioreg -l |grep "IOPlatformSerialNumber"|cut -d ""="" -f 2|sed -e s/[^[:alnum:]]//g)
but I need to output just the last 4 characters.
Found it in a Linux forum: echo "${serial:(-4)}"
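${serial:(-4)} is a bash/ksh substring expansion. If you need the same thing in a plain POSIX sh, the standard prefix- and suffix-removal expansions can do it. A sketch; last4 is a hypothetical name, it sets a global variable s (POSIX sh has no local), and it prints an empty line for strings shorter than 4 characters.

```bash
# Hypothetical helper using only POSIX parameter expansion:
# ${s%????} drops the last 4 chars; removing that prefix leaves those 4.
last4() {
    s=$1
    printf '%s\n' "${s#"${s%????}"}"
}

last4 A02UV13KDNMJ    # prints DNMJ
```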
Using a shell parameter expansion to extract the last 4 characters after the fact works, but you could do it all in one step:
ioreg -k IOPlatformSerialNumber | sed -En 's/^.*"IOPlatformSerialNumber".*(.{4})"$/\1/p'
ioreg -k IOPlatformSerialNumber returns far fewer lines than ioreg -l, so it speeds up the operation considerably (about 80% faster on my machine).
The sed command matches the entire line of interest and replaces it with the last 4 characters before the " that ends the line; that is, it returns the last 4 characters of the value.
Note: The ioreg output line of interest looks something like this:
| "IOPlatformSerialNumber" = "A02UV13KDNMJ"
As for your original command: cut -d ""="" is the same as cut -d = - the shell simply removes the empty strings around the = before cut sees the value. Note that cut only accepts a single delimiter char.
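Because the one-step sed command only depends on the shape of the IOPlatformSerialNumber line, you can try it without a Mac by feeding it the sample line above. A sketch; serial_last4 is a hypothetical wrapper name, and the input is assumed to follow that ioreg format.

```bash
# Hypothetical wrapper around the one-step sed from this answer: reads
# ioreg-style text on stdin and prints the last 4 chars of the serial.
serial_last4() {
    sed -En 's/^.*"IOPlatformSerialNumber".*(.{4})"$/\1/p'
}

# On a Mac: ioreg -k IOPlatformSerialNumber | serial_last4
printf '%s\n' '    | "IOPlatformSerialNumber" = "A02UV13KDNMJ"' | serial_last4   # prints DNMJ
```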
You can also do: grep -Eo '.{4}$' <<< "$serial"
I don't know what the output of ioreg -l looks like, but it looks to me like you are using a lot of pipes to do something that awk alone could handle, using = as the field separator (-F=):
ioreg -l | awk -F= '
/IOPlatformSerialNumber/ {                      # match lines containing IOPlatform...
    gsub(/[^[:alnum:]]/, "", $2)                # remove all non-alphanumeric chars from the 2nd field
    print substr($2, length($2)-3, length($2))  # print the last 4 characters
}'
Or even sed (a bit ugly because the same sub-pattern is repeated four times), which catches the first 4 alphanumeric characters occurring after the first =:
sed -rn '/IOPlatformSerialNumber/{
s/^[^=]*=[^a-zA-Z0-9]*([a-zA-Z0-9])[^a-zA-Z0-9]*([a-zA-Z0-9])[^a-zA-Z0-9]*([a-zA-Z0-9])[^a-zA-Z0-9]*([a-zA-Z0-9]).*$/\1\2\3\4/;p
}'
Test
$ cat a
aaa
bbIOPlatformSerialNumber=A_+23B/44C//55=ttt
IOPlatformSerialNumber=A_+23B/44C55=ttt
asdfasd
The last 4 alphanumeric characters between the 1st and 2nd = are 4C55:
$ awk -F= '/IOPlatformSerialNumber/ {gsub(/[^[:alnum:]]/, "", $2); print substr($2, length($2)-3, length($2))}' a
4C55
4C55
Without you posting some sample output of ioreg -l, this is untested and a guess, but it looks like all you need is something like:
ioreg -l | sed -r -n 's/.*IOPlatformSerialNumber=[[:alnum:]]+([[:alnum:]]{4}).*/\1/p'

cut string in a specific column in bash

How can I strip the leading zeros from the third field so that it is only 6 characters long?
xxx,aaa,00000000cc
rrr,ttt,0000000yhh
Desired output:
xxx,aaa,0000cc
rrr,ttt,000yhh
Here's a solution using awk:
echo "xxx,aaa,00000000cc
rrr,ttt,0000000yhh"|awk -F, -v OFS=, '{sub(/^0000/, "", $3)}1'
output
xxx,aaa,0000cc
rrr,ttt,000yhh
awk uses -F (or FS, the field separator) to split the input, and you must also set OFS (the output field separator), because changing $3 makes awk rebuild the line with OFS between the fields.
sub(/srchtarget/, "replacementstring", stringToFix) uses a regular expression, here to look for four 0s at the front (^) of the third field ($3).
The 1 is a shorthand for the print statement. A longhand version of the script would be
echo "xxx,aaa,00000000cc
rrr,ttt,0000000yhh"|awk -F, -v OFS=, '{sub(/^0000/, "", $3);print}'
# ---------------------------------------------------------^^^^^^
It's all related to awk's /pattern/{action} idiom.
IHTH
If you can assume there are always three fields and you want to strip off the first four zeros in the third field you could use a monstrosity like this:
$ cat data
xxx,0000aaa,00000000cc
rrr,0000ttt,0000000yhh
$ cat data | sed 's/\([^,]\+\),\([^,]\+\),0000\([^,]\+\)/\1,\2,\3/'
xxx,0000aaa,0000cc
rrr,0000ttt,000yhh
Another more flexible solution if you don't mind piping into Python:
cat data | python -c '
import sys
for line in sys.stdin:
    print(",".join([f[4:] if i == 2 else f for i, f in enumerate(line.strip().split(","))]))
'
This says "remove the first four characters of the third field but leave all other fields unchanged".
Using awk's substr should also work:
awk -F, -v OFS=, '{$3=substr($3,5,6)}1' file
xxx,aaa,0000cc
rrr,ttt,000yhh
It just takes 6 characters starting at position 5 of field 3 and assigns them back to field 3.
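Both awk answers assume a fixed number of leading zeros (sub removes exactly four, and substr always starts at position 5). If the field lengths vary, a sketch that strips leading zeros only while the third field is longer than 6 characters might be safer; trim_field3 is a hypothetical name, and it never removes a non-zero character.

```bash
# Hypothetical helper: drop leading 0s from field 3 until it is 6 chars long,
# stopping early if a non-zero character reaches the front.
trim_field3() {
    awk -F, -v OFS=, '{
        while (length($3) > 6 && substr($3, 1, 1) == "0")
            $3 = substr($3, 2)
    } 1' "$@"
}

# Usage: trim_field3 file
```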
