Indent long line stdout - bash

Let's say I have a standard 80-column terminal and run a command whose output lines are too long (e.g. stdout from ls), so they wrap onto two or more lines. I want to indent the continuation lines of all my bash stdout.
The indent should be configurable: 1, 2, 3 or however many spaces.
From this:
lrwxrwxrwx 1 root root 24 Feb 19 1970 sdcard -> /storage/emula
ted/legacy/
To this:
lrwxrwxrwx 1 root root 24 Feb 19 1970 sdcard -> /storage/emula
   ted/legacy/
I read the question Indenting multi-line output in a shell script, so I tried piping to | sed 's/^/ /', but that gives me the exact opposite: it indents the first lines and not the continuations.
Ideally I would put a script in profile.rc or similar, so that whenever I open an interactive shell and run any command, long output gets indented.

I'd use awk for this.
awk -v width="$COLUMNS" -v spaces=4 '
BEGIN {
pad = sprintf("%*s", spaces, "") # generate `spaces` spaces
}
NF { # if current line is not empty
while (length > width) { # while length of current line is greater than `width`
print substr($0, 1, width) # print `width` chars from the beginning of it
$0 = pad substr($0, width + 1) # and leave `pad` + remaining chars
}
if ($0 != "") # if there are remaining chars
print # print them too
next
} 1' file
In one line:
awk -v w="$COLUMNS" -v s=4 'BEGIN{p=sprintf("%*s",s,"")} NF{while(length>w){print substr($0,1,w);$0=p substr($0,w+1)} if($0!="") print;next} 1'
As @Mark suggested in comments, you can put this in a function and add it to .bashrc for ease of use.
function wrap() {
awk -v w="$COLUMNS" -v s=4 'BEGIN{p=sprintf("%*s",s,"")} NF{while(length>w){print substr($0,1,w);$0=p substr($0,w+1)} if($0!="") print;next} 1'
}
Usage:
ls -l | wrap
Edit by Ed Morton per request:
Very similar to oguz ismail's script above, but it should work with Busybox or any other awk:
$ cat tst.awk
BEGIN { pad = sprintf("%" spaces "s","") }
{
while ( length($0) > width ) {
printf "%s", substr($0,1,width)
$0 = substr($0,width+1)
if ( $0 != "" ) {
print ""
$0 = pad $0
}
}
print
}
$
$ echo '123456789012345678901234567890' | awk -v spaces=3 -v width=30 -f tst.awk
123456789012345678901234567890
$ echo '123456789012345678901234567890' | awk -v spaces=3 -v width=15 -f tst.awk
123456789012345
   678901234567
   890
$ echo '' | awk -v spaces=3 -v width=15 -f tst.awk
$
That first test case is to show that you don't get a blank line printed after a full-width input line and the third is to show that it doesn't delete blank rows. Normally I'd have used sprintf("%*s",spaces,"") to create the pad string but I see in a comment that that doesn't work in the apparently non-POSIX awk you're using.
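Tying this back to the question's wish for a configurable indent that is available in every interactive shell, the portable awk approach above could be wrapped in a function placed in .bashrc. This is only a sketch: the name wrap_indent, its argument order, and its defaults (4 spaces, $COLUMNS or 80 columns) are assumptions made for illustration and do not come from the answers above.

```shell
# wrap_indent [spaces] [width]  (hypothetical helper; reads stdin)
# The body runs in a subshell, so its variables do not leak into the session.
wrap_indent() (
    spaces=${1:-4}
    width=${2:-${COLUMNS:-80}}
    # The indent must be narrower than the line, or wrapping would never finish.
    if [ "$spaces" -ge "$width" ]; then
        echo "wrap_indent: indent must be smaller than width" >&2
        exit 1
    fi
    awk -v spaces="$spaces" -v width="$width" '
        BEGIN { pad = sprintf("%" spaces "s", "") }   # string of `spaces` blanks
        {
            while (length($0) > width) {
                print substr($0, 1, width)             # emit one full-width chunk
                $0 = pad substr($0, width + 1)         # indent what is left
            }
            print                                      # last (or only) chunk
        }'
)

printf '%s\n' '123456789012345678901234567890' | wrap_indent 3 15
```

With the function defined, something like ls -l | wrap_indent 2 would wrap at the terminal width. It still has to be piped explicitly; it does not change the output of every command automatically.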

This might work for you (GNU sed):
sed 's/./&\n  /80;P;D' file
This splits lines at 80 characters and indents each continuation line by 2 spaces.
Or, if you prefer to parameterize it:
s='  ' c=80
sed "s/./&\n$s/$c;P;D" file
To prevent empty lines being printed, use:
sed 's/./&\n/80;s/\n$//;s/\n/&  /;P;D' file
or more simply:
sed 's/./\n  &/81;P;D' file

One possible solution using pure bash string manipulation.
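You can make the script read stdin and format whatever it reads.
MAX_LEN=5 # length of the first/longest line
IND_LEN=2 # number of indentation spaces
short_len=$((MAX_LEN - IND_LEN)) # room left on a continuation line
while IFS= read -r line; do # IFS= and -r keep whitespace and backslashes intact
    printf '%s\n' "${line:0:MAX_LEN}" # never use the data itself as the printf format
    # step through the rest of the line, stopping before its end
    for ((i = MAX_LEN; i < ${#line}; i += short_len)); do
        printf '%*s%s\n' "$IND_LEN" '' "${line:i:short_len}"
    done
done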
Usage: (assuming the script is saved as indent.sh)
$ echo '0123456789' | ./indent.sh
01234
  567
  89

Related

Selectively reformatting a file with spaces and \n

I have multiple files in the following format. This one has 3 sequences (number of sequences vary in all files, but always end in ".") with 40 positions each, as indicated by the numbers in the first line. From the beginning of the lines (except the first one) there are the names of the sequences:
3 40
00076284. ATGTCTGTGG TTCTTTAACC
00892634. TTGTCTGAGG TTCGTAAACC
00055673. TTGTCTGAGG TCCGTGAACC

GCCGGGAACA TCCGCAAAAA
ACCGTGAAAC GGGGTGAACT
TCCCCCGAAC TCCCTGAACG
I need to convert it to this format, where each sequence is continuous, with no spaces or \n, and starts on a new line after its name. The only spaces that should remain are between the two numbers in the first line.
3 40
00076284.
ATGTCTGTGGTTCTTTAACCGCCGGGAACATCCGCAAAAA
00892634.
TTGTCTGAGGTTCGTAAACCACCGTGAAACGGGGTGAACT
00055673.
TTGTCTGAGGTCCGTGAACCTCCCCCGAACTCCCTGAACG
Tried sed to delete spaces and \n's but don't know how to apply it after the first line and how to avoid making one huge line.
Thanks
Here's a shell script that may provide what you need:
head -1 input
awk '
NR == 1 { sequences = $1 ; positions = $2 ; next }
{
if ( $1 ~ /^[0-9]/ ) {
sid = $1 ; $1 = "" ; sequence_name[ NR - 1 ] = sid
sequence[ NR - 1 ] = $0
} else {
sequence[ ( NR - 1 ) % ( sequences + 1 ) ] = sequence[ (NR-1) % ( sequences + 1 ) ] " " $0
}
}
END {
for ( x = 1 ; x <= length( sequence_name ) ; x++ )
{
print sequence_name[x]
print sequence[x]
}
}' input | tr -d ' '
I added head -1 to the top of the shell just to get the first line out of your file. I couldn't output the first line within the awk script because of the pipe to tr -d ' '.
I think this should work, but my output is longer since if I actually concat all the last "orphan" sequences I get a way longer line.
cat input.txt | awk '/^[0-9]+ [0-9]+$/{printf("%s\n",$0); next} /[0-9]+[.]/{ printf("\n%s\n",$1);for(i=2; i<=NF;i++){printf("%s",$i)}; next} /^ */{ for(i=1; i<=NF;i++){printf("%s",$i)}; next;}'
3 40
Please try and let me know.
Remember the position of empty line and merge the lines before empty line with those after:
awk '
NR==1{print;next}
NR!=1 && !empty{arr[NR]=$1 "\n" $2 $3}
/^$/{empty=NR-1;next}
NR!=1 && empty{printf "%s%s%s\n", arr[NR-empty], $1, $2}
' file
My second solution without awk: Merge the file with itself using empty line as separator
cat >file <<EOF
3 40
00076284. ATGTCTGTGG TTCTTTAACC
00892634. TTGTCTGAGG TTCGTAAACC
00055673. TTGTCTGAGG TCCGTGAACC

GCCGGGAACA TCCGCAAAAA
ACCGTGAAAC GGGGTGAACT
TCCCCCGAAC TCCCTGAACG
EOF
head -n1 file
paste <(sed -n '1!{ /^$/q;p; }' file) <(sed -n '1!{ /^$/,//{/^$/!p}; }' file) |
sed 's/[[:space:]]//g; s/\./.\n/'
Will output:
3 40
00076284.
ATGTCTGTGGTTCTTTAACCGCCGGGAACATCCGCAAAAA
00892634.
TTGTCTGAGGTTCGTAAACCACCGTGAAACGGGGTGAACT
00055673.
TTGTCTGAGGTCCGTGAACCTCCCCCGAACTCCCTGAACG
Explanation:
head -n1 file - output the first line
sed -n '1!{ /^$/q;p; }' file
1! - skip the first line
/^$/q - quit at the first empty line
p - print everything else
sed -n '1!{ /^$/,//{/^$/!p}; }' file
1! - skip the first line
/^$/,// - from the first empty line until the next empty line, or the end of the file if there is none
/^$/!p - print the line if it is not empty
paste <(...) <(...) - merge the output of the two seds line by line, separated by a tab
sed 's/[[:space:]]//g; s/\./.\n/'
s/[[:space:]]//g - remove all whitespace
s/\./.\n/ - replace the first period with a period and a newline
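The answers above depend on the blank line between blocks. As an alternative sketch, a single awk pass can instead use the sequence count from the first line and print the header itself, so neither head nor tr is needed. The function name join_sequences is hypothetical, and the code assumes that every name ends with "." and that each block lists the sequences in the same order.

```shell
# join_sequences [file...]  (hypothetical helper; reads stdin if no file is given)
join_sequences() {
    awk '
        NR == 1 { n = $1; print; next }    # header: sequence count and length
        NF == 0 { next }                   # ignore blank separator lines
        {
            i = (rows++ % n) + 1           # rows cycle through the n sequences
            first = 1
            if ($1 ~ /\.$/) {              # assumed: sequence names end with "."
                name[i] = $1
                first = 2
            }
            for (f = first; f <= NF; f++)  # glue the chunks together, no spaces
                seq[i] = seq[i] $f
        }
        END {
            for (i = 1; i <= n; i++) { print name[i]; print seq[i] }
        }
    ' "$@"
}

join_sequences <<'EOF'
3 40
00076284. ATGTCTGTGG TTCTTTAACC
00892634. TTGTCTGAGG TTCGTAAACC
00055673. TTGTCTGAGG TCCGTGAACC

GCCGGGAACA TCCGCAAAAA
ACCGTGAAAC GGGGTGAACT
TCCCCCGAAC TCCCTGAACG
EOF
```

Because blank lines are skipped rather than relied on, this should behave the same whether or not the blocks are separated by an empty line.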

convert txt to columnated file

I need to convert the test.txt file to a columnated file.
I know how to do it with awk when every keyword is followed by the same number of lines, but in this example the counts differ.
awk 'NR % 5 {printf "%s ", $0; next}1' test.txt
That code works when the number of lines is always the same, but it will not work with this input file.
Is there any way to convert this? Please advise.
test.txt
"abc"
4
21
22
25
"standard"
1
"test"
4
5
10
11
12
Expected Output:
"abc" 4 21 22 25
"standard" 1
"test" 4 5 10 11 12
$ awk '{printf "%s%s", (/^"/ ? ors : OFS), $0; ors=ORS} END{print ""}' file
"abc" 4 21 22 25
"standard" 1
"test" 4 5 10 11 12
A bit magic, but works in this case:
sed -z 's/\n"/\n\x01"/g' |
tr '\n' ' ' |
tr $'\x01' '\n'
Each "header" is a string between " ... ". So:
Using sed, I put a delimiter (I chose the byte 0x01) between a newline and a ", everywhere in the file. Note that -z is a GNU extension.
Then I replace every newline with a space.
Then I replace every 0x01 byte with a newline.
This method is a little tricky, but it is simple and works whenever the header starts with a known character at the beginning of the line.
Live version available at tutorialspoint.
You can do it without the -z GNU extension by using, for example:
sed '2,$s/^"/\x01"/'
i.e. from the second line onward, if a line starts with a ", add the 0x01 byte at the beginning of the line.
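The three steps of the marker trick can be put together as one function. This is a sketch under assumptions: the name columnate is made up, the 0x01 byte is assumed never to occur in the data, and the byte is produced with printf rather than the \x01 escape, since that escape may not be understood by every sed.

```shell
# columnate [file...]  (hypothetical helper; reads stdin if no file is given)
columnate() {
    marker=$(printf '\001')        # the 0x01 byte, assumed absent from the data
    {
        # From line 2 on, put the marker in front of every line starting with ".
        sed "2,\$s/^\"/${marker}\"/" "$@" |
            tr '\n' ' ' |          # join everything into one long line
            tr "$marker" '\n'      # break it again in front of each header
        echo                       # terminate the last line
    } | sed 's/ $//'               # drop the space left at the end of each line
}

printf '%s\n' '"abc"' 4 21 22 25 '"standard"' 1 '"test"' 4 5 10 11 12 | columnate
```

The final sed is only cosmetic: without it each output line would end in a space, because the newline that preceded every header had already been turned into one.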
with GNU awk
$ awk -v RS='\n"' '{$1=$1; printf "%s", rt $0; rt=RT}' file
"abc" 4 21 22 25
"standard" 1
"test" 4 5 10 11 12
POSIX awk:
$ awk '/^"/{if (s) print s; s=$0; next} {s=s OFS $0} END{print s}' file
"abc" 4 21 22 25
"standard" 1
"test" 4 5 10 11 12
Or with perl:
$ perl -0777 -lnE 'for (/^"[^"]+"\R(?:[\s\S]+?)(?=^"|\z)/mg) {tr /\n/ /; say} ' file
If your fields do not have spaces in them, you can use a simple tr and sed pipe:
$ cat file | tr '\n' ' ' | sed -E 's/ ("[^"]*")/\
\1/g'
Or GNU sed:
$ cat file | tr '\n' ' ' | sed -E 's/ ("[^"]*")/\n\1/g'
While an awk or sed solution is advisable, since the question is also tagged bash, you can do all that is needed with a simple read loop and a flag variable to control the newline output for the first iteration. Essentially, you are reading each line and using the string indexing parameter expansion to test whether the first character is a non-digit, and on the 1st iteration simply output the string, for all additional iterations, output the string preceded by a '\n'. If the line begins with a digit, simply output it with a space preceding.
For example:
#!/bin/bash
declare -i n=0 ## simple flag to omit '\n' on first string output
while read -r line; do ## read each line
[[ ${line:0:1} =~ [^0-9] ]] && { ## begins with non-digit
## 1st iteration, just output $line, rest output '\n$line'
((n == 0)) && printf "%s" "$line" || printf "\n%s" "$line"
} || printf " %s" "$line" ## begins with digit - output " $line"
n=1 ## set flag
done < "$1"
echo "" ## tidy up with newline
Example Use/Output
$ bash fmtlines test.txt
"abc" 4 21 22 25
"standard" 1
"test" 4 5 10 11 12
While awk and sed will generally be faster (as a general rule), here with nothing more than a while read loop and a few conditionals and parameter expansions, the native bash solution would not be bad by comparison.
Look things over and let me know if you have questions.

sed - removing last comma from listed value after doing a replace

I'm using sed to replace the newlines (\n) in my file with commas, which works fine; however, I don't want the , after my last item.
How can I remove this?
Example:
sed 's/\n/,/g' myfile.out > myfile.csv
Output:
1,2,3,4,5,6,
Well you can use labels:
$ cat file
1
2
3
4
5
6
$ sed ':a;N;s/\n/,/;ba' file
1,2,3,4,5,6
You can also use paste command:
$ paste -sd, file
1,2,3,4,5,6
Consider jaypal singh's paste solution, which is the most efficient and elegant.
An awk alternative, which doesn't require reading the entire file into memory first:
awk '{ printf "%s%s", sep, $0; sep = "," }' myfile.out > myfile.csv
If the output should have a trailing newline (thanks, Ed Morton):
awk '{ printf "%s%s", sep, $0; sep = "," } END { printf "\n" }' myfile.out > myfile.csv
For the first input line, sep, due to being an uninitialized variable, defaults to the empty string, effectively printing just $0, the input line.
Setting sep to "," after the first print ensures that all remaining lines have a , prepended.
END { printf "\n" } prints a trailing newline after all input lines have been processed. (print "" would work too, given that print appends the output record separator (ORS), which defaults to a newline).
The net effect is that , is only placed between input lines, so the output won't have a trailing comma.
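The same separator idiom extends to any delimiter. The sketch below wraps it in a function; the name join_lines and the optional delimiter argument are assumptions added for illustration. Note that awk interprets backslash escapes in values passed with -v, so a delimiter containing backslashes may not come through literally.

```shell
# join_lines [delimiter]  (hypothetical helper; reads stdin, default delimiter is a comma)
join_lines() {
    awk -v d="${1:-,}" '
        { printf "%s%s", sep, $0; sep = d }   # sep is empty for the first line only
        END { if (NR) print "" }              # trailing newline only if there was input
    '
}

printf '%s\n' 1 2 3 4 5 6 | join_lines
printf '%s\n' a b c | join_lines ' | '
```

For the question's file this would be used as join_lines < myfile.out > myfile.csv.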
You could add a second s command after the first: sed -z 's/\n/,/g ; s/,$//'. This removes a comma at the end. (The -z option is a GNU sed extension; I needed it to get the first s command working.)

Extract specified lines from a file

I have a file and I want to extract specific lines from it, such as lines 2, 10, 15, 21, and so on. There are around 200 thousand lines to be extracted from the file. How can I do this efficiently in bash?
Maybe looking for:
sed -n -e 1p -e 4p afile
Put the linenumbers of the lines you want in a file called "wanted", like this:
2
10
15
21
Then run this script:
#!/bin/bash
while read -r w
do
sed -n "${w}p" yourfile
done < wanted
TOTALLY ALTERNATIVE METHOD
Or you could let "awk" do it all for you, like this which is probably miles faster since you won't have to create 200,000 sed processes:
awk 'FNR==NR{a[$1]=1;next}{if(FNR in a){print;}}' wanted yourfile
The FNR==NR portion detects when awk is reading the file called "wanted" and if so, it sets element "$1" of array "a" to "1" so we know that this line number is wanted. The stuff in the second set of curly braces is active when processing your bigger file only and it prints the current line if its linenumber is in the array "a" we created when reading the "wanted" file.
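To see the two-file idiom in action, here is a small self-contained run on throwaway files. The function name extract_lines and the sample data are made up for the demonstration. One caveat worth knowing: if the file of wanted numbers is empty, FNR==NR also holds while the data file is read, so the data lines would be treated as line numbers and nothing would be printed.

```shell
# extract_lines WANTED DATA  (hypothetical helper)
# WANTED holds one line number per row; DATA is the file to extract from.
extract_lines() {
    awk '
        FNR == NR { want[$1]; next }   # first file: remember each wanted line number
        FNR in want                    # second file: print lines whose number is wanted
    ' "$1" "$2"
}

tmp=$(mktemp -d)
printf '%s\n' 2 4 > "$tmp/wanted"
printf '%s\n' alpha beta gamma delta epsilon > "$tmp/data"
extract_lines "$tmp/wanted" "$tmp/data"    # prints beta and delta
rm -r "$tmp"
```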
$ gawk 'ARGIND==1 { L[$0]++ }; ARGIND==2 && FNR in L' lines file > file.lines
The wanted line numbers have to be stored one per line, and they may safely be in random order. It is almost exactly the same as @Mark Setchell's second method, but uses a slightly clearer way to determine which file is current. ARGIND is a GNU extension, though, hence gawk. If you are limited to original AWK or mawk, you can write it as:
$ awk 'FILENAME==ARGV[1] { L[$0]++ }; FILENAME==ARGV[2] && FNR in L' lines file > file.lines
Efficiency test:
$ awk 'BEGIN { for (i=1; i<=1000000; i++) print i }' > file
$ shuf -i 1-1000000 -n 200000 > lines
$ time gawk 'ARGIND==1 { L[$0]++ }; ARGIND==2 && FNR in L' lines file > file.lines
real 0m1.734s
user 0m1.460s
sys 0m0.052s
UPD:
As @Costi Ciudatu pointed out, there is room for improvement in the case where all wanted lines are near the head of the file.
#!/usr/bin/gawk -f
ARGIND==1 { L[$0]++ }
ENDFILE { L_COUNT = FNR }
ARGIND==2 && FNR in L { L_PRINTED++; print }
ARGIND==2 && L_PRINTED == L_COUNT { exit 0 }
The script stops as soon as the last wanted line is printed, so it now takes a few milliseconds to filter out 2000 random lines from the first 1% of a one-million-line file.
$ time ./getlines.awk lines file > file.lines
real 0m0.016s
user 0m0.012s
sys 0m0.000s
While reading a whole file still takes about a second.
$ time gawk 'ARGIND==1 { L[$0]++ }; ARGIND==2 && FNR in L' lines file > file.lines
real 0m0.780s
user 0m0.756s
sys 0m0.016s
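The early-exit idea does not strictly need gawk's ARGIND and ENDFILE. As a sketch, it can be written with the portable FNR==NR test by counting the distinct wanted numbers while the first file is read. The name extract_lines_early is hypothetical; like any FNR==NR script, it assumes the file of line numbers is not empty.

```shell
# extract_lines_early WANTED DATA  (hypothetical helper)
extract_lines_early() {
    awk '
        FNR == NR {                          # first file: the wanted line numbers
            if (!($1 in want)) { want[$1]; left++ }   # count each number once
            next
        }
        FNR in want {                        # second file: print wanted lines
            print
            if (--left == 0) exit            # nothing more to find, stop reading
        }
    ' "$1" "$2"
}

tmp=$(mktemp -d)
printf '%s\n' 3 1 3 > "$tmp/wanted"
printf '%s\n' one two three four five > "$tmp/data"
extract_lines_early "$tmp/wanted" "$tmp/data"    # prints one and three
rm -r "$tmp"
```

Counting distinct numbers rather than rows means a duplicated entry in the wanted file does not prevent the early exit.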
Provided your system supports sed -f - (i.e. for sed to read its script on standard input; it works on Linux, but not on some other platforms) you can turn the file of line numbers into a sed script, naturally using sed:
sed 's/$/p/' lines | sed -n -f - inputfile >output
If the lines you're interested in are close to the beginning of the file, you can make use of head and tail to efficiently extract specific lines.
For your example line numbers (assuming that list doesn't go on until close to 200,000), a naive but still efficient approach to read those lines would be the following:
for n in 2 10 15 21; do
head -n "$n" /your/large/file | tail -n 1
done
sed example:
sed -n '2p' file
awk example:
awk 'NR==2' file
Either command prints the 2nd line of the file.
Use the same logic in a loop, for example a for loop over the line numbers you want:
for VARIABLE in 2 10 15 21
do
awk "NR==$VARIABLE" file
done

get Nth line in file after parsing another file

One of my large files looks like this:
foo:43:sdfasd:daasf
bar:51:werrwr:asdfa
qux:34:werdfs:asdfa
foo:234:dfasdf:dasf
qux:345:dsfasd:erwe
...............
Here the 1st column (foo, bar, qux, etc.) holds file names and the 2nd column (43, 51, 34, etc.) holds line numbers. I want to print the Nth line (given by the 2nd column) of each file (given by the 1st column).
How can I automate this in a Unix shell?
The file above is actually generated during compilation, and I want to print the line of code each warning refers to.
-Thanks,
while IFS=: read -r name line rest
do
head -n "$line" "$name" | tail -n 1
done < input.txt
while IFS=: read file line message; do
echo "$file:$line - $message:"
sed -n "${line}p" "$file"
done <yourfilehere
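A slightly more defensive sketch of the same loop is shown below. The function name show_warning_lines and the sample files are invented for the example. It skips rows whose second field is not a number (such as the row of dots in the sample), reports unreadable files on stderr, and tells sed to quit right after the requested line so large files are not read to the end.

```shell
# show_warning_lines  (hypothetical helper; reads file:line:... rows on stdin)
show_warning_lines() {
    while IFS=: read -r file line rest; do
        case $line in
            ''|*[!0-9]*) continue ;;            # skip rows without a numeric line field
        esac
        if [ ! -r "$file" ]; then
            printf '%s: cannot read file\n' "$file" >&2
            continue
        fi
        text=$(sed -n "${line}{p;q;}" "$file")  # print line N, then stop reading
        printf '%s:%s: %s\n' "$file" "$line" "$text"
    done
}

tmp=$(mktemp -d)
printf '%s\n' 'int a;' 'int b;' 'int c;' > "$tmp/foo"
printf '%s\n' 'x = 1' 'y = 2' > "$tmp/bar"
printf '%s\n' "$tmp/foo:2:sdfasd:daasf" '...............' "$tmp/bar:1:werrwr:asdfa" |
    show_warning_lines
rm -r "$tmp"
```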
awk 'NR==4 {print}' yourfilename
or
cat yourfilename | awk 'NR==4 {print}'
The above prints the 4th line of your file. You can change the number as required.
Just in awk, but probably with worse performance than the answers by @kev or @MarkReed.
However, it does process each file just once. Requires GNU awk.
gawk -F: '
BEGIN {OFS=FS}
{
files[$1] = 1
lines[$1] = lines[$1] " " $2
msgs[$1, $2] = $3
}
END {
for (file in files) {
split(lines[file], l, " ")
n = asort(l)
count = 0
for (i=1; i<=n; i++) {
while (count < l[i] && (getline line < file) > 0) count++ # read forward until line l[i] has been read
print file, l[i], msgs[file, l[i]]
print line
}
close(file)
}
}
'
This might work for you:
sed 's/^\([^,]*\),\([^,]*\).*/sed -n "\2p" \1/' file |
sort -k4,4 |
sed ':a;$!N;s/^\(.*\)\(".*\)\n.*"\(.*\)\2/\1;\3\2/;ta;P;D' |
sh
sed -nr '3{s/^([^:]*):([^:]*):.*$/\1 \2/;p}' namesNnumbers.txt
qux 34
-n: no output by default
-r: extended regular expressions (so the parentheses need no backslashes)
3{...;p}: on line 3, run the commands in braces and print at the end
s: substitute, turning name:number:rest into name number
So to work with the values:
fnUln=$(sed -nr '3{s/^([^:]*):([^:]*):.*$/\1 \2/;p}' namesNnumbers.txt)
fn=${fnUln%% *} # file name: everything before the first space
ln=${fnUln##* } # line number: everything after the last space
sed -n "${ln}p" "$fn"

Resources