How to detect and remove indentation of a piped text - bash

I'm looking for a way to remove the indentation of a piped text. Below is a solution using cut -c 9-, which assumes the indentation is 8 characters wide.
I'm looking for a solution which can detect the number of spaces to remove. This implies going through the whole (piped) file to know the minimum number of spaces (tabs?) used to indent it, then remove them on each line.
run.sh
me=$0 # this script's own path, so help() can read it
help() {
    awk '
        /esac/{b=0}
        b
        /case "\$arg" in/{b=1}' \
        "$me" \
        | cut -c 9-
}
while [[ $# -ge 1 ]]
do
    arg="$1"
    shift
    case "$arg" in
        help|h|?|--help|-h|'-?')
            # Show this help
            help;;
    esac
done
$ ./run.sh --help
help|h|?|--help|-h|'-?')
    # Show this help
    help;;
Note: echo $' 4\n 2\n 3' | python3 -c 'import sys; import textwrap as tw; print(tw.dedent(sys.stdin.read()), end="")' works, but I expect there is a better way (I mean, one which depends only on software more common than Python; maybe awk?). I wouldn't mind seeing a perl solution either.
Note2: echo $' 4\n 2\n 3' | python -c 'import sys; import textwrap as tw; print tw.dedent(sys.stdin.read()),' also works (Python 2.7.15rc1).
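Since a perl solution was explicitly welcomed, here is a minimal sketch (mine, not from the thread; spaces-only, and lines without any non-space character are ignored when computing the minimum):
echo $'    4\n  2\n   3' | perl -e '
    my @lines = <STDIN>;
    my $min;
    for (@lines) {
        if (/^( *)\S/) {                  # leading spaces of non-blank lines
            my $n = length $1;
            $min = $n if !defined $min || $n < $min;
        }
    }
    $min //= 0;                           # no indented line at all
    s/^ {$min}// for @lines;              # strip the common prefix
    print @lines;
'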

The following is pure bash, with no external tools or command substitutions:
#!/usr/bin/env bash
all_lines=( )
min_spaces=9999 # start with something arbitrarily high
while IFS= read -r line; do
    all_lines+=( "$line" )
    if [[ ${line:0:$min_spaces} =~ ^[[:space:]]*$ ]]; then
        continue # this line has at least as much whitespace as those preceding it
    fi
    # this line has *less* whitespace than those preceding it; we need to know how much.
    [[ $line =~ ^([[:space:]]*) ]]
    line_whitespace=${BASH_REMATCH[1]}
    min_spaces=${#line_whitespace}
done
for line in "${all_lines[@]}"; do
    printf '%s\n' "${line:$min_spaces}"
done
Its output is:
4
2
3

Suppose you have:
$ echo $' 4\n 2\n 3\n\ttab'
4
2
3
tab
You can use the Unix expand utility to expand the tabs to spaces. Then run it through awk to find the minimum number of leading spaces on a line:
$ echo $' 4\n 2\n 3\n\ttab' |
expand |
awk 'BEGIN{min_indent=9999999}
    {lines[++cnt]=$0
     match($0, /^[ ]*/)
     if(RLENGTH<min_indent) min_indent=RLENGTH
    }
    END{for (i=1;i<=cnt;i++)
            print substr(lines[i], min_indent+1)}'
4
2
3
tab

Here's the (semi-) obvious temp file solution.
#!/bin/sh
t=$(mktemp -t dedent.XXXXXXXXXX) || exit
trap 'rm -f "$t"' EXIT
awk '{ n = match($0, /[^ ]/); if (n && (!min || n < min)) min = n }1
    END { exit min }' >"$t"
cut -c $?- "$t"
This obviously fails if all lines have more than 254 leading whitespace characters, because then the start column won't fit into the exit code from Awk.
This has the advantage that we are not restricting ourselves to the available memory. Instead, we are restricting ourselves to the available disk space. The drawback is that disk might be slower, but the advantage of not reading big files into memory will IMHO trump that.
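If that limit is a concern, a two-pass variant (a sketch in the same spirit, not from the thread) can pass the start column through stdout instead of the exit status:
#!/bin/sh
t=$(mktemp -t dedent.XXXXXXXXXX) || exit
trap 'rm -f "$t"' EXIT
cat >"$t"                                # first pass: spool stdin to disk
min=$(awk '{ n = match($0, /[^ ]/); if (n && (!min || n < min)) min = n }
    END { print min + 0 }' "$t")         # 1-based column of the least-indented line
[ "$min" -gt 0 ] && cut -c "$min"- "$t"  # second pass: dedent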

echo $' 4\n 2\n 3\n \n more spaces in the line\n ...' | \
(text="$(cat)"; echo "$text" \
| cut -c "$(echo "$text" | sed 's/[^ ].*$//' | awk 'NR == 1 {a = length} length < a {a = length} END {print a + 1}')-"\
)
With explanations:
echo $' 4\n 2\n 3\n \n more spaces in the line\n ...' | \
(
text="$(cat)" # Obtain the input in a varibale
echo "$text" | cut -c "$(
# `cut` removes the n-1 first characters of each line of the input, where n is:
echo "$text" | \
sed 's/[^ ].*$//' | \
awk 'NR == 1 || length < a {a = length} END {print a + 1}'
# sed: keep only the initial spaces, remove the rest
# awk:
# At the first line `NR == 1`, get the length of the line `a = length`.
# For any line with fewer leading spaces (`length < a`), update the length (`a = length`).
# At the end of the piped input, print the shortest length + 1.
# ... we add 1 because in `cut`, characters of the line are indexed at 1.
)-"
)
Update:
It is possible to avoid spawning sed. As per tripleee's comment, sed's s/// can be replaced by awk's sub(). Here is an even shorter option, using n = match() as in tripleee's answer.
echo $' 4\n 2\n 3\n \n more spaces in the line\n ...' | \
(
text="$(cat)" # Obtain the input in a varibale
echo "$text" | cut -c "$(
# `cut` removes the a-1 first characters of each line of the input, where a is:
echo "$text" | \
awk '
{n = match($0, /[^ ]/); if (n == 0) n = length($0) + 1}
NR == 1 || n < a {a = n}
a == 1 {exit}
END {print a}'
# awk:
# At every line, get the position of the first non-space character
# (a line made only of spaces counts as pure indentation).
# At the first line `NR == 1`, copy that position to `a`.
# For any line with fewer leading spaces (`n < a`), update `a` (`a = n`).
# If `a` drops to 1 there is no common indentation to remove, so stop
# reading early; `exit` in a main rule still runs the END block, which
# prints `a`, so cut still receives its start column and "$text" is
# never written to the script stdout.
# `a` is the 1-based column of the first non-space character on the
# least indented line, which is exactly the start column `cut -c` expects.
)-"
)
Apparently, it's also possible in Perl 6, with the function my &f = *.indent(*);.

Another solution with awk, based on dawg’s answer. Major differences include:
No need to set an arbitrarily large number for indentation, which feels hacky.
Works on text with empty lines, by not considering them when gathering the lowest indented line.
awk '
{
    lines[++count] = $0
    if (NF == 0) next
    match($0, /[^ ]/)
    if (length(min) == 0 || RSTART < min) min = RSTART
}
END {
    for (i = 1; i <= count; i++) print substr(lines[i], min)
}
' <<< $' 4\n 2\n 3'
Or all on the same line
awk '{ lines[++count] = $0; if (NF == 0) next; match($0, /[^ ]/); if (length(min) == 0 || RSTART < min) min = RSTART; } END { for (i = 1; i <= count; i++) print substr(lines[i], min) }' <<< $' 4\n 2\n 3'
Explanation:
Add current line to an array, and increment count variable
{
lines[++count] = $0
If line is empty, skip to next iteration
if (NF == 0) next
Set RSTART to the start index of the first non-space character.
match($0, /[^ ]/)
If min isn’t set or is higher than RSTART, set the former to the latter.
if (length(min) == 0 || RSTART < min) min = RSTART
}
Run after all input is read.
END {
Loop over the array, and for each line print only a substring going from the index set in min to the end of the line.
for (i = 1; i <= count; i++) print substr(lines[i], min)
}

solution using bash
#!/usr/bin/env bash
cb=$(xclip -selection clipboard -o)
firstchar=${cb::1}
if [ "$firstchar" == $'\t' ];then
tocut=$(echo "$cb" | awk -F$'\t' '{print NF-1;}' | sort -n | head -n1)
else
tocut=$(echo "$cb" | awk -F '[^ ].*' '{print length($1)}' | sort -n | head -n1)
fi
echo "$cb" | cut -c$((tocut+1))- | xclip -selection clipboard
Note: assumes first line has the left-most indent
Works for both spaces and tabs
Ctrl+V some text, run that bash script, and now the dedented text is saved to your clipboard
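The same minimum-indent logic also works on a plain pipe, without xclip; a sketch for the spaces-only case (blank lines aside; the dedent name is made up):
dedent() {
    local text tocut
    text=$(cat)
    # length of the leading-space run on each line; keep the smallest
    tocut=$(printf '%s\n' "$text" | awk -F '[^ ].*' '{print length($1)}' | sort -n | head -n1)
    printf '%s\n' "$text" | cut -c$((tocut+1))-
}
printf '    4\n  2\n   3\n' | dedent # prints "  4", "2", " 3"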
solution using python
detab.py
import sys
import textwrap
data = sys.stdin.readlines()
data = "".join(data)
print(textwrap.dedent(data))
use with pipes
xclip -selection clipboard -o | python detab.py | xclip -selection clipboard

Related

Accept filename as argument and calculate repeated words along with count

I need to find the number of repeated characters in a text file, and I need to pass the filename as an argument.
Example:
test.txt data contains
Zoom
Output should be like:
z 1
o 2
m 1
I need a command that accepts a filename as argument and then lists the character counts for that file. In my example I have a test.txt which contains the word Zoom, so the output shows how many times each letter is repeated.
My attempt:
vi test.sh
#!/bin/bash
FILE="$1" --to pass filename as argument
sort file1.txt | uniq -c --to count the number of letters
Just a guess?
cat test.txt |
tr '[:upper:]' '[:lower:]' |
fold -w 1 |
sort |
uniq -c |
awk '{print $2, $1}'
m 1
o 2
z 1
Here is an awk script that counts all kinds of chars:
awk '
BEGIN{FS = ""} # make each char a field
{
    for (i = 1; i <= NF; i++) { # iterate over all fields in the line
        ++charsArr[$i]; # count each field occurrence in the array
    }
}
END {
    for (char in charsArr) { # iterate over the chars array
        printf("%3d %s\n", charsArr[char], char); # print count and the char
    }
}' input.1.txt | sort -n
Or in one line:
awk '{for(i=1;i<=NF;i++)++arr[$i]}END{for(char in arr)printf("%3d %s\n",arr[char],char)}' FS="" input.1.txt|sort -n
#!/bin/bash
# get the argument for further processing
inputfile="$1"
# check if file exists
if [ -f "$inputfile" ]
then
    # convert file to a usable format:
    # convert all characters to lowercase,
    # put each character on a new line,
    # output to temporary file
    cat "$inputfile" | tr '[:upper:]' '[:lower:]' | sed -e 's/\(.\)/\1\n/g' > tmp.txt
    # loop over every character from a-z
    for char in {a..z}
    do
        # count how many times a character occurs
        count=$(grep -c "$char" tmp.txt)
        # print if count > 0
        if [ "$count" -gt "0" ]
        then
            echo -e "$char" "$count"
        fi
    done
    rm tmp.txt
else
    echo "file not found!"
    exit 1
fi

Combine expressions and parameter expansion in bash

Is it possible to combine parameter expansion with arithmetic expressions in bash? For example, could I do a one-liner to evaluate lineNum or numChar here?
echo "Some lines here
Here is another
Oh look! Yet another" > $1
lineNum=$( grep -n -m1 'Oh look!' $1 | cut -d : -f 1 ) #Get line number of "Oh look!"
(( lineNum-- )) # Correct for array indexing
readarray -t lines < $1
substr=${lines[lineNum]%%Y*} # Get the substring "Oh look! "
numChar=${#substr} # Get the number of characters in the substring
(( numChar -= 2 )) # Get the position of "!" based on the position of "Y"
echo $lineNum
echo $numChar
2
8
In other words, can I get the position of one character in a string based on the position of another in a one-line expression?
As far as getting the position of ! in a line that matches the Oh look! regex, just:
awk -F'!' '/Oh look!/{ print length($1) + 1; exit }' "$file"
You can also do the calculation to your liking, so with your original code I think that would be:
awk -F':' '/^[[:space:]][A-Z]/{ print length($1) - 2; exit }' "$file"
Is it possible to combine parameter expansion with arithmetic expressions in bash?
For computing ${#substr} you have to have the substring. So you could:
substr=${lines[lineNum-1]%%Y*}; numChar=$((${#substr} - 2))
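As a self-contained illustration of the same idea (a made-up string, not the OP's file), the parameter expansion produces the substring and the arithmetic expansion then works on its length:
s="  hello world"
prefix=${s%%w*}            # parameter expansion: "  hello "
echo $(( ${#prefix} - 2 )) # arithmetic on its length: prints 6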
You could also edit your grep and have the filtering from Y done by bash, but awk is going to be magnitudes faster:
IFS=Y read -r line _ < <(grep -m1 'Oh look!' "$file")
numChar=$((${#line} - 2))
Still you could merge the 3 lines into just:
numChar=$(( $(<<<${lines[lineNum - 1]%%Y*} wc -c) - 1))

How to pad out values line by line while mainting overall record length in a Unix Shell script ksh

IFS=$'\n'
while read -r line
do
    # header/trailer record
    if echo ${line} | grep -e '000000000000000' -e '999999999999999' >/dev/null 2>&1
    then
        echo ${line} >> outfile.01.DAT.sampleNEW
    elif echo ${line} | grep '+0' >/dev/null 2>&1
    then
        echo ${line} | sed -e 's/+/+00000000/; s/        X/X/' >> outfile.01.DAT.sampleNEW
    else
        echo ${line} | sed -e 's/-/-00000000/; s/        X/X/' >> outfile.01.DAT.sampleNEW
    fi
done < Inputfile.01.DAT
I have a large file in which I need to pad out the amount fields (signed) but retain the overall record length, so I have to remove some filler spaces at the end (each line ends with X). The file has a header/trailer that does not need to change. I have come up with a way, but it is very slow with a large input file. I am sure the use of grep here is not good.
Sample records. end with X - Overall length 107 bytes
000000000000000PPPPPPPPP Information INV TRANSACTION 0120160505201605052154HI203.SEQ 01 X
000000000000001PPPPP14PA 000YYYYYY488 -0001235.2520150319 X
000000000000002PPPMS PA 000RRRRR4539 +0008285.0020160301 X
000000000000003PPPP506 000TTTTTT605 -0000225.0020150608 X
9999999999999990000000000000439.940000000079802782.180000005 X
I suspect you want something like this, but it is very hard to tell given the way you have presented your question:
awk '
/000000000000000/ || /999999999999999/ {print; next}
/\+0/ {sub(/\+/,"+00000000"); sub(/        X/,"X"); print; next}
/-0/  {sub(/-/,"-00000000");  sub(/        X/,"X"); print; next}
' Inputfile.01.DAT
That says... "if the line contains a string of 15 zeroes or 15 nines, print it and move to the next line. If the line contains +0, insert eight zeros after the + and delete eight spaces before the final X so the record length is preserved, then print. Likewise for -0."
You could also maybe use Perl, and do something like this:
perl -nle '/0{15}|9{15}/ && print; s/([+-])0/${1}000000000/ && s/ {8}X/X/ && print' Inputfile.01.DAT

send the unix output to a csv file

I want to put the output data from a unix command into a csv file.
Suppose the output which I am getting is:
A
B
C
I want to put this data in .csv file as
A B C
in three different columns but same row.
Try this:
printf '%s\n' A B C | paste -sd ' ' >> file.csv
or, more classically for a CSV (delimited with a ,):
printf '%s\n' A B C | paste -sd ',' >> file.csv
printf '%s\n' A B C is just an example to have the same sample input as you. My solution works with spaces within a line too.
EDIT: from your comments, you seem to need a for loop, so:
for i in {0..5}; do printf '%s\n' {A..C} | paste -sd " " >> file.csv; done
or in pseudo code:
for ...:
unix_command | paste -sd " " >> file.csv
endfor
unix_command | tr "\n" " " > file.csv
or
unix_command | awk 'ORS=FS' > file.csv
Disadvantage in both cases: one trailing space
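If the trailing space matters, one way (a sketch) is to trim it off afterwards:
unix_command | tr '\n' ' ' | sed 's/ $//' > file.csv
Or simply prefer paste -s as in the first answer, which never emits one.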
As I understand it, @Django needs three lines joined into one line.
paste -d ' ' - - - < infile
If you need output as csv format (split by ,), you can use this
paste -d ',' - - - < infile
Here is the test result
$ cat infile
Manoj Mishra
Japan
Environment.
Michael Jackson
America
Environment.
$ paste -d ',' - - - < infile
Manoj Mishra,Japan,Environment.
Michael Jackson,America,Environment.
A more general answer
If the output of your command is multi-line and you want to put the
quoted output in csv format, n items per line, the following script
could be handy.
The groupby program reads from stdin and
quotes each input line
groups n quoted input lines in a csv record, using a comma as a separator
optionally, using the -s optional argument, the program discards the
last line of its output if said last line doesn't contain exactly n
items.
The -h option, as usual, echoes a usage line and exits.
Any other option makes the program print the usage line and exit with an error.
The code
% cat groupby
#!/bin/sh
usage () { echo Usage: $0 [-s] n --- -s is for \"strict\", outputs only records of n items. ; exit $1 ; }
s=0
while getopts :sh o ; do
case "${o}" in
s) s=1 ; shift ;;
h) usage 0 ;;
*) usage 1 ;;
esac
done
awk -v n=$1 -v s=$s -v q='"' '
NR==1 {buf = q $0 q ; next}
NR%n==1 {print buf; buf = q $0 q ; next}
{buf = buf "," q $0 q}
END {if(!s||NR%n==0)print buf}'
%
An example of usage
% chmod +x groupby
% echo -e "1\n2\n3\n4\n5" | ./groupby 3
"1","2","3"
"4","5"
% echo -e "1\n2\n3\n4\n5\n6" | ./groupby 3
"1","2","3"
"4","5","6"
echo -e "1\n2\n3\n4\n5\n6\n7" | ./groupby 3
"1","2","3"
"4","5","6"
"7"
% echo -e "1\n2\n3\n4\n5\n6\n7\n8" | ./groupby -s 4
"1","2","3","4"
"5","6","7","8"
% echo -e "1\n2\n3\n4\n5\n6\n7" | ./groupby -s 4
"1","2","3","4"
%
A different angle
I changed the defaults to best suit the OP's requirements, and introduced other options; see the usage string for details:
#!/bin/sh
usage () { echo 'Usage: '$0' [-s] [-q quote_char] [-c separator_char] n
Reads lines from stdin and prints them grouped by n and separated by spaces.
Optional arguments:
-s is for "strict", outputs only records of n items;
-q quote_char, forces quoting of each input line;
-c separator_char, changes the field separator,
interesting alternatives are tab, comma, semicolon etc;
-h prints this help and exits.' ; exit $1 ; }
# Default options
s=0 ; q='' ; c=' '
# Treatment of optional arguments
while getopts :shc:q: o ; do
case "${o}" in
s) s=1 ; ;;
c) c="${OPTARG}" ;;
q) q="${OPTARG}" ;;
h) usage 0 ;;
*) usage 1 ;;
esac
done
shift $(($OPTIND-1))
# awk code
awk -v n=$1 -v s=$s -v q="$q" -v c="$c" '
NR==1 {buf = q $0 q ; next}
NR%n==1 {print buf; buf = q $0 q ; next}
{buf = buf c q $0 q}
END {if(!s||NR%n==0)print buf}'
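For instance, a hypothetical run (assuming this variant is saved as groupby in place of the first script and made executable):
% echo -e "A\nB\nC\nD\nE\nF" | ./groupby 3
A B C
D E F
% echo -e "A\nB\nC" | ./groupby -q '"' -c ',' 3
"A","B","C"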
Just use xargs, e.g.:
less filename| xargs >> filename.csv

adding numbers without grep -c option

I have a txt file like
Peugeot:406:1999:Silver:1
Ford:Fiesta:1995:Red:2
Peugeot:206:2000:Black:1
Ford:Fiesta:1995:Red:2
I am looking for a command that counts the number of red Ford Fiesta cars.
The last number in each line is the amount of that particular car.
The command I am looking for CANNOT use the -c option of grep.
so this command should just output the number 4.
Any help would be welcome, thank you.
A simple bit of awk would do the trick:
awk -F: '$1=="Ford" && $4=="Red" { c+=$5 } END { print c }' file
Output:
4
Explanation:
The -F: switch means that the input field separator is a colon, so the car manufacturer is $1 (the 1st field), the model is $2, etc.
If the 1st field is "Ford" and the 4th field is "Red", then add the value of the 5th (last) field to the variable c. Once the whole file has been processed, print out the value of c.
For a native bash solution:
c=0
while IFS=":" read -ra col; do
[[ ${col[0]} == Ford ]] && [[ ${col[3]} == Red ]] && (( c += col[4] ))
done < file && echo $c
Effectively applies the same logic as the awk one above, without any additional dependencies.
Methods:
1.) Use a scripting language for counting, like awk or perl. An awk solution is already posted, so here is a perl solution.
perl -F: -lane '$s+=$F[4] if m/Ford:.*:Red/}{print $s' < carfile
#or
perl -F: -lane '$s+=$F[4] if ($F[0]=~m/Ford/ && $F[3]=~/Red/)}{print $s' < carfile
Both examples print
4
2.) The second method is based on shell pipelining:
filter out the right rows
extract the column with the count
sum the numbers
e.g. some examples:
grep 'Ford:.*:Red:' carfile | cut -d: -f5 | paste -sd+ | bc
the grep filters out the right rows
the cut extracts the last column
the paste creates a line like 2+2, which can then be evaluated by
the bc, which does the arithmetic
Another example:
sed -n 's/\(Ford:.*:Red\):\(.*\)/\2/p' carfile | paste -sd+ | bc
the sed does both the filtering and the extracting
Another example, with a different way of counting:
(echo 0 ; sed -n 's/\(Ford:.*:Red\):\(.*\)/\2+/p' carfile ;echo p )| dc
the numbers are summed by the RPN calculator dc; in RPN the values come first and the operation last, e.g. 0 2 +.
the first echo pushes 0 onto the stack
the sed creates a stream of numbers like 2+ 2+
the last echo p prints the stack
Many other possibilities exist for summing a stream of numbers, e.g. counting in bash:
sum=0
while read -r num
do
    sum=$(( sum + num ))
done < <(sed -n 's/\(Ford:.*:Red\):\(.*\)/\2/p' carfile)
echo $sum
and pure bash:
while IFS=: read -r maker model year color count
do
    if [[ "$maker" == "Ford" && "$color" == "Red" ]]
    then
        (( sum += count ))
    fi
done < carfile
echo $sum
