Output a file in two columns in BASH - bash

I'd like to rearrange a file in two columns after the nth line.
For example, say I have a file like this here:
This is a bunch
of text
that I'd like to print
as two
columns starting
at line number 7
and separated by four spaces.
Here are some
more lines so I can
demonstrate
what I'm talking about.
And I'd like to print it out like this:
This is a bunch           and separated by four spaces.
of text                   Here are some
that I'd like to print    more lines so I can
as two                    demonstrate
columns starting          what I'm talking about.
at line number 7
How could I do that with a bash command or function?

Actually, pr can do almost exactly this:
pr --output-tabs=' 1' -2 -t tmp1
↓
This is a bunch                     and separated by four spaces.
of text                             Here are some
that I'd like to print              more lines so I can
as two                              demonstrate
columns starting                    what I'm talking about.
at line number 7
-2 for two columns; -t to omit page headers; and without the --output-tabs=' 1', it will insert a tab for every 8 spaces it adds. You can also set the page width and length (useful if your actual files are much longer than 100 lines); check out man pr for the other options.
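For longer files you can be explicit about both; a rough sketch (the numbers here are placeholders, not taken from the question):
# -w sets the page width, -l the page length (lines per "page");
# for a single two-column split, -l must be at least half the file's line count
pr -2 -t -w 120 -l 500 --output-tabs=' 1' tmp1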
If you're set on “four spaces more than the longest line on the left,” then you may need something a bit more complex.
The following works with your test input, but it's getting to the point where the correct answer would be, “just use Perl, already”:
#!/bin/sh
infile=${1:-tmp1}
longest=$( longest=0;
    head -n $(( $( wc -l $infile | cut -d ' ' -f 1 ) / 2 )) $infile | \
    while read line
    do
        current="$( echo $line | wc -c | cut -d ' ' -f 1 )"
        if [ $current -gt $longest ]
        then
            echo $current
            longest=$current
        fi
    done | tail -n 1 )
pr -t -2 -w$(( $longest * 2 + 6 )) --output-tabs=' 1' $infile
↓
This is a bunch           and separated by four spa
of text                   Here are some
that I'd like to print    more lines so I can
as two                    demonstrate
columns starting          what I'm talking about.
at line number 7
… re-reading your question, I wonder if you meant that you were going to literally specify the nth line to the program, in which case, neither of the above will work unless that line happens to be halfway down.

Thank you chatraed and BRPocock (and your colleague). Your answers helped me think up this solution, which answers my need.
function make_cols
{
    file=$1         # input file
    line=$2         # line to break at
    pad=$(($3-1))   # spaces between cols - 1
    len=$( wc -l < $file )
    max=$(( $( wc -L < <(head -$(( line - 1 )) $file ) ) + $pad ))
    SAVEIFS=$IFS; IFS=$(echo -en "\n\b")
    paste -d" " <( for l in $( cat <(head -$(( line - 1 )) $file ) )
                   do
                       printf "%-""$max""s\n" $l
                   done ) \
                <(tail -$(( len - line + 1 )) $file )
    IFS=$SAVEIFS
}
make_cols tmp1 7 4

Could be optimized in many ways, but does its job as requested.
Input data (configurable):
file
num of rows borrowed from file for the first column
num of spaces between columns
format.sh:
#!/bin/bash
file=$1
if [[ ! -f $file ]]; then
    echo "File not found!"
    exit 1
fi
spaces_col1_col2=4
rows_col1=6
rows_col2=$(($(cat $file | wc -l) - $rows_col1))
IFS=$'\n'
ar1=($(head -$rows_col1 $file))
ar2=($(tail -$rows_col2 $file))
maxlen_col1=0
for i in "${ar1[@]}"; do
    if [[ $maxlen_col1 -lt ${#i} ]]; then
        maxlen_col1=${#i}
    fi
done
maxlen_col1=$(($maxlen_col1+$spaces_col1_col2))
if [[ $rows_col1 -lt $rows_col2 ]]; then
    rows=$rows_col2
else
    rows=$rows_col1
fi
ar=()
for i in $(seq 0 $(($rows-1))); do
    line=$(printf "%-${maxlen_col1}s\n" ${ar1[$i]})
    line="$line${ar2[$i]}"
    ar+=("$line")
done
printf '%s\n' "${ar[@]}"
Output:
$ bash format.sh myfile
This is a bunch           and separated by four spaces.
of text                   Here are some
that I'd like to print    more lines so I can
as two                    demonstrate
columns starting          what I'm talking about.
at line number 7
$

Related

Bash script that checks between two CSV files (old and new) whether the new file's line count is within x% of the old file's?

As of now, the way I am writing the script is to count the number of lines of the two files.
Then I put it through a condition to check whether the new count is greater than the old one.
However, I am not sure how to compare it based on a percentage of the old file.
Is there a better way to design the script?
#!/bin/bash
declare -i new=$(< "$(ls -t file name*.csv | head -n 1)" wc -l)
declare -i old=$(< "$(ls -t file name*.csv | head -n 2)" wc -l)
echo $new
echo $old
if [ $new -gt $old ];
then
    echo "okay";
else
    echo "fail";
fi
If you need to check for an x% maximum of differing lines, you can count the number of '<' lines in the diff output. Recall that the diff output will look like:
+ diff node001.html node002.html
2,3c2,3
< 4
< 7
---
> 2
> 3
So that code will look like:
old=$(wc -l < file1)
diff1=$(diff file1 file2 | grep -c '^<')
pct=$((diff1*100/(old-1)))
# Check Percent
if [ "$pct" -gt 60 ] ; then
...
fi
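Putting the pieces together with the two-newest-files logic from the question, a rough sketch (untested; the 60% threshold and the file-name glob are assumptions) could be:
#!/bin/bash
new_file=$(ls -t file name*.csv | head -n 1)                # newest file
old_file=$(ls -t file name*.csv | head -n 2 | tail -n 1)    # second newest

old=$(wc -l < "$old_file")
diff1=$(diff "$old_file" "$new_file" | grep -c '^<')        # old-file lines changed or removed
pct=$((diff1 * 100 / old))                                  # the snippet above uses (old - 1); adjust as needed

if [ "$pct" -gt 60 ]; then
    echo "fail: $pct% of the old file differs"
else
    echo "okay: $pct% of the old file differs"
fi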

How to check that a file has more than 1 line in a BASH conditional?

I need to check if a file has more than 1 line. I tried this:
if [ `wc -l file.txt` -ge "2" ]
then
echo "This has more than 1 line."
fi
if [ `wc -l file.txt` >= 2 ]
then
echo "This has more than 1 line."
fi
These just report errors. How can I check if a file has more than 1 line in a BASH conditional?
The command:
wc -l file.txt
will generate output like:
42 file.txt
with wc helpfully telling you the file name as well. It does this in case you're checking out a lot of files at once and want individual as well as total stats:
pax> wc -l *.txt
973 list_of_people_i_must_kill_if_i_find_out_i_have_cancer.txt
2 major_acheivements_of_my_life.txt
975 total
You can stop wc from doing this by providing its data on standard input, so it doesn't know the file name:
if [[ $(wc -l <file.txt) -ge 2 ]]
The following transcript shows this in action:
pax> wc -l qq.c
26 qq.c
pax> wc -l <qq.c
26
As an aside, you'll notice I've also switched to using [[ ]] and $().
I prefer the former because it has fewer issues stemming from backward compatibility (mostly to do with string splitting), and the latter because it's far easier to nest command substitutions.
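For example (a trivial illustration, not taken from the question), nested command substitutions need no escaping with $(), whereas nested backticks must be escaped:
newest=$(basename $(ls -t *.txt | head -n 1))      # $() nests cleanly
newest=`basename \`ls -t *.txt | head -n 1\``      # backticks need escaping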
A pure bash (≥4) possibility using mapfile:
#!/bin/bash
mapfile -n 2 < file.txt
if ((${#MAPFILE[@]}>1)); then
echo "This file has more than 1 line."
fi
The mapfile builtin stores what it reads from stdin in an array (MAPFILE by default), one line per field. Using -n 2 makes it read at most two lines (for efficiency). After that, you only need to check whether the array MAPFILE has more than one field. This method is very efficient.
As a byproduct, the first line of the file is stored in ${MAPFILE[0]}, in case you need it. You'll find out that the trailing newline character is not trimmed. If you need to remove the trailing newline character, use the -t option:
mapfile -t -n 2 < file.txt
if [ `wc -l file.txt | awk '{print $1}'` -ge "2" ]
...
You should always check what each subcommand returns. Command wc -l file.txt returns output in the following format:
12 file.txt
You need the first column; you can extract it with awk, cut, or any other utility of your choice.
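For example, with cut (assuming GNU wc, which prints the count with no leading spaces when given a file name):
if [ `wc -l file.txt | cut -d' ' -f1` -ge 2 ]
then
    echo "This has more than 1 line."
fi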
How about:
if read -r && read -r
then
echo "This has more than 1 line."
fi < file.txt
The -r flag is needed to ensure line continuation characters don't fold two lines into one, which would cause the following file to report one line only:
This is a file with _two_ lines, \
but will be seen as one.
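A quick way to see the difference (hypothetical transcript using the two-line file above):
$ printf 'This is a file with _two_ lines, \\\nbut will be seen as one.\n' > twolines.txt
$ while read l; do echo "<$l>"; done < twolines.txt
<This is a file with _two_ lines, but will be seen as one.>
$ while read -r l; do echo "<$l>"; done < twolines.txt
<This is a file with _two_ lines, \>
<but will be seen as one.>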
change
if [ `wc -l file.txt` -ge "2" ]
to
if [ `cat file.txt | wc -l` -ge "2" ]
If you're dealing with large files, this awk command is much faster than using wc:
awk 'BEGIN{x=0}{if(NR>1){x=1;exit}}END{if(x>0){print FILENAME,"has more than one line"}else{print FILENAME,"has one or less lines"}}' file.txt

Reading a file in a shell script and selecting a section of the line

This is probably pretty basic. I want to read in an occurrence file.
Then the program should find all occurrences of "CallTilEdb" in the file Hendelse.logg:
CallTilEdb 8
CallCustomer 9
CallTilEdb 4
CustomerChk 10
CustomerChk 15
CallTilEdb 16
and sum up the right column. In this case that would be 8 + 4 + 16, so the output I want is 28.
I'm not sure how to do this, and this is as far as I have gotten with vistid.sh:
#!/bin/bash
declare -t filename=hendelse.logg
declare -t occurance="$1"
declare -i sumTime=0
while read -r line
do
    if [ "$occurance" = $(cut -f1 line) ]   #line 10
    then
        sumTime+=$(cut -f2 line)
    fi
done < "$filename"
so the execution in terminal would be
vistid.sh CallTilEdb
but the error I get now is:
/home/user/bin/vistid.sh: line 10: [: unary operator expected
You have a nice approach, but maybe you could use awk to do the same thing... much faster!
$ awk -v par="CallTilEdb" '$1==par {sum+=$2} END {print sum+0}' hendelse.logg
28
It may look a bit weird if you haven't used awk so far, but here is what it does:
-v par="CallTilEdb" provide an argument to awk, so that we can use par as a variable in the script. You could also do -v par="$1" if you want to use a variable provided to the script as parameter.
$1==par {sum+=$2} this means: if the first field is the same as the content of the variable par, then add the second column's value into the counter sum.
END {print sum+0} this means: once you are done processing the file, print the content of sum. The +0 makes awk print 0 in case sum was not set... that is, if nothing was found.
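For instance, with an id that never occurs in the log (NoSuchEntry is made up here), the +0 guarantees a numeric result:
$ awk -v par="NoSuchEntry" '$1==par {sum+=$2} END {print sum+0}' hendelse.logg
0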
In case you really want to make it with bash, you can use read with two parameters, so that you don't have to make use of cut to handle the values, together with some arithmetic operations to sum the values:
#!/bin/bash
declare -t filename=hendelse.logg
declare -t occurance="$1"
declare -i sumTime=0
while read -r name value   # read both values with -r for safety
do
    if [ "$occurance" == "$name" ]; then   # string comparison
        ((sumTime+=$value))                # sum
    fi
done < "$filename"
echo "sum: $sumTime"
So that it works like this:
$ ./vistid.sh CallTilEdb
sum: 28
$ ./vistid.sh CustomerChk
sum: 25
First of all, you need to change the way you call cut:
$( echo $line | cut -f1 )
In line 10 you are missing the evaluation:
if [ "$occurance" = $( echo $line | cut -f1 ) ]
you can then sum by doing:
sumTime=$[ $sumTime + $( echo $line | cut -f2 ) ]
But you can also use a different approach and put the line values in an array, the final script will look like:
#!/bin/bash
declare -t filename=prova
declare -t occurance="$1"
declare -i sumTime=0
while read -a line
do
    if [ "$occurance" = ${line[0]} ]
    then
        sumTime=$[ $sumTime + ${line[1]} ]
    fi
done < "$filename"
echo $sumTime
For reference,
id="CallTilEdb"
file="Hendelse.logg"
sum=$(echo "0 $(sed -n "s/^$id[^0-9]*\([0-9]*\)/\1 +/p" < "$file") p" | dc)
echo SUM: $sum
prints
SUM: 28
the sed extracts the numbers from lines containing the given id, such as CallTilEdb,
and prints them in the format number +
the echo prepares a string such as 0 8 + 4 + 16 + p, which is a calculation in RPN (reverse Polish notation) format
dc does the calculation
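You can see the RPN string being evaluated by feeding it to dc directly:
$ echo "0 8 + 4 + 16 + p" | dc
28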
another variant:
sum=$(sed -n "s/^$id[^0-9]*\([0-9]*\)/\1/p" < "$file" | paste -sd+ - | bc)
#or
sum=$(grep -oP "^$id\D*\K\d+" < "$file" | paste -sd+ - | bc)
the sed (or the grep) extracts and prints only the numbers
the paste makes a string like number+number+number (-d+ sets + as the delimiter, -s joins all lines into one)
the bc does the calculation
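Again, the intermediate steps are easy to inspect on their own:
$ printf '8\n4\n16\n' | paste -sd+ -
8+4+16
$ printf '8\n4\n16\n' | paste -sd+ - | bc
28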
or perl
sum=$(perl -slanE '$s+=$F[1] if /^$id/}{say $s' -- -id="$id" "$file")
sum=$(ID="CallTilEdb" perl -lanE '$s+=$F[1] if /^$ENV{ID}/}{say $s' "$file")
Awk translation to script:
#!/bin/bash
declare -t filename=hendelse.logg
declare -t occurance="$1"
declare -i sumTime=0
sumTime=$(awk -v entry="$occurance" '
    $1==entry{time+=$NF+0}
    END{print time+0}' $filename)
echo "sum: $sumTime"

replacing multiple lines in shell script with only one output file

I have one file, Length.txt, containing multiple names (40 of them), one per line.
I want to write a small shell script that counts the characters in each line of the file and, if the count is less than 9, replaces that line by adding 8 extra spaces and a 1 at the end of the line.
For example, if the name is
XXXXXX
replace it with
XXXXXX        1
I tried the code below. It works for me; however, whenever it replaces a line it displays all the lines again.
So if I have 40 lines in Length.txt and 4 of them have fewer than 9 characters, my output has 160 lines.
Can anyone help me get just a 40-line output with the 4 changed lines?
#!/usr/bin/sh
#set -x
while read line;
do
    count=`echo $line|wc -m`
    if [ $count -lt 9 ]
    then
        Number=`sed -n "/$line/=" Length.txt`;
        sed -e ""$Number"s/$line/$line 1/" Length4.txt
    fi
done < Length.txt
A single sed command can do that:
sed -E 's/^.{,8}$/&        1/' file
To modify the contents of the file add -i:
sed -E -i 's/^.{,8}$/&        1/' file
Partial output:
94605320        1
105018263
2475218231
7728563        1
        1
* Fixed to add only 8 spaces, not 9, and to include empty lines. If you don't want to process empty lines, use {1,8}.
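In other words, the variant that leaves empty lines untouched would be:
sed -E 's/^.{1,8}$/&        1/' file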
$ cat foo.input
I am longer than 9 characters
I am also longer than 9 characters
I am not
Another long line
short
$ while read line; do printf "$line"; (( ${#line} < 9 )) && printf " 1"; echo; done < foo.input
I am longer than 9 characters
I am also longer than 9 characters
I am not 1
Another long line
short 1
Let me show you what is wrong with your script. The only thing missing is that you need to use sed -i to edit the file and re-save it after making the replacement.
I'm assuming Length4.txt is just a copy of Length.txt?
I added sed -i to your script and it should work now:
cp Length.txt Length4.txt
while read line;
do
    count=`echo $line|wc -m`
    if [ $count -lt 9 ]
    then
        Number=`sed -n "/$line/=" Length.txt`
        sed -ie ""$Number"s/$line/$line 1/" Length4.txt
    fi
done < Length.txt
However, you don't need sed or wc. You can simplify your script as follows:
while IFS= read -r line
do
    count=${#line}
    if (( count < 9 ))
    then
        echo "$line 1"
    else
        echo "$line"
    fi
done < Length.txt > Length4.txt
$ awk -v FS= 'NF<9{$0=sprintf("%s%*s1",$0,8,"")} 1' file
XXXXXX        1
Note how simple it would be to check for a number other than 9 characters and to print some sequence of blanks other than 8.
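For instance, a parameterized sketch (len and pad are made-up variable names):
awk -v len=9 -v pad=8 -v FS= 'NF<len{$0=sprintf("%s%*s1",$0,pad,"")} 1' file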

How to delete all lines containing more than three characters in the second column of a CSV file?

How can I delete all of the lines in a CSV file which contain more than 3 characters in the second column? E.g.:
cave,ape,1
tree,monkey,2
The second line contains more than 3 characters in the second column, so it will be deleted.
awk -F, 'length($2)<=3' input.txt
You can use this command:
grep -vE "^[^,]+,[^,]{4,}," test.csv > filtered.csv
Breakdown of the grep syntax:
-v = remove lines matching
-E = extended regular expression syntax (also -P is perl syntax)
bash stuff:
> filename = overwrite/create a file and fill it with the standard out
Breakdown of the regex syntax:
"^[^,]+,[^,]{4,},"
^ = beginning of line
[^,] = anything except commas
[^,]+ = 1 or more of anything except commas
, = comma
[^,]{4,} = 4 or more of anything except commas
And please note that the above is simplified and would not work if the first two columns contained commas in the data (it does not know the difference between escaped commas and raw ones).
No one has supplied a sed answer yet, so here it is:
sed -e '/^[^,]*,[^,]\{4\}/d' animal.csv
And here's some test data.
>animal.csv cat <<'.'
cave,ape,0
,cat,1
,orangutan,2
large,wolf,3
,dog,4,happy
tree,monkey,5,sad
.
And now to test:
sed -i'' -e '/^[^,]*,[^,]\{4\}/d' animal.csv
cat animal.csv
Only ape, cat and dog should appear in the output.
This is a filter script for your type of data. It assumes your data is in UTF-8.
#!/bin/bash
function px {
    local a="$@"
    local i=0
    while [ $i -lt ${#a} ]
    do
        printf \\x${a:$i:2}
        i=$(($i+2))
    done
}
(iconv -f UTF8 -t UTF16 | od -x | cut -b 9- | xargs -n 1) |
if read utf16header
then
    px $utf16header
    cnt=0
    out=''
    st=0
    while read line
    do
        if [ "$st" -eq 1 ] ; then
            cnt=$(($cnt+1))
        fi
        if [ "$line" == "002c" ] ; then
            st=$(($st+1))
        fi
        if [ "$line" == "000a" ]
        then
            out=$out$line
            if [[ $cnt -le 3+1 ]] ; then
                px $out
            fi
            cnt=0
            out=''
            st=0
        else
            out=$out$line
        fi
    done
fi | iconv -f UTF16 -t UTF8
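Assuming the script is saved as filter.sh (the name is made up here), it reads the CSV on standard input and writes the filtered result on standard output:
bash filter.sh < animal.csv > filtered.csv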
