Read content of a file and put particular portions of it into separate files using bash

I would like to extract specific portions of content from a single file and put them into separate files via bash. With the code below I was able to extract the test1 file content, but I failed to get everything into the respective files.
Code tried:
reportFile=/report.txt
test1File=/test1.txt
test2File=/test2.txt
test3File=/test3.txt
totalLineNo=$(wc -l < ${reportFile})
test1LineNo=$(grep -n "Test1 file content:" ${reportFile} | grep -Eo '^[^:]+')
test2LineNo=$(grep -n "Test2 file content:" ${reportFile} | grep -Eo '^[^:]+')
test3LineNo=$(grep -n "Test3 file content:" ${reportFile} | grep -Eo '^[^:]+')
exactTest1LineNo=$(( test1LineNo - 1 ))
exactTest2LineNo=$(( test2LineNo - 1 ))
exactTest3LineNo=$(( test3LineNo - 1 ))
test1Content=$(head -n ${exactTest1LineNo} ${reportFile})
test3Content=$(tail -n ${exactTest3LineNo} ${reportFile})
echo -e "${test1Content}\r" >> ${test1File}
echo -e "${test3Content}\r" >> ${test3File}
report.txt:
-------------------------------------
My Report:
Test1 file content:
1
2
3
4
5
6
Test2 file content:
7
8
9
10
Test3 file content:
11
12
13
14
15
Note: Find my report above.
-------------------------------------
test1.txt (expected):
1
2
3
4
5
6
test2.txt (expected):
7
8
9
10
test3.txt (expected):
11
12
13
14
15

With a single awk command:
awk '/^Test[0-9] file content:/{ f=1; fn=tolower($1)".txt"; next }
f && NF{ print > fn }!NF{ f=0 }' report.txt
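The same program, expanded and commented for readability (functionally identical):
awk '
    /^Test[0-9] file content:/ {      # a section header such as "Test1 file content:"
        f = 1                         # start collecting lines
        fn = tolower($1) ".txt"       # "Test1" -> "test1.txt"
        next                          # do not print the header itself
    }
    f && NF { print > fn }            # inside a section, write non-empty lines
    !NF     { f = 0 }                 # a blank line ends the current section
' report.txt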
Viewing results:
$ head test[0-9].txt
==> test1.txt <==
1
2
3
4
5
6
==> test2.txt <==
7
8
9
10
==> test3.txt <==
11
12
13
14
15

If I understand you correctly: you have a long file report.txt and you want to extract short files from it. The name of each file is followed by the string " file content:" in the file report.txt.
This is my solution:
#!/bin/bash
reportFile=report.txt
Files=$(grep 'file content' "$reportFile" | sed 's/ .*$//')
for F in $Files ; do
    f=${F,}.txt                 # first letter lowercase, append .txt
    awk "/$F file content/,/^\$/ {print}" "$reportFile" |
        tail -n +2 |            # remove the first line, "Test* file content:"
        head -n -1 > "$f"       # remove the trailing blank line
done

Related

Sorting tab-delimited numbers by column with a pure bash script

I'm stuck on some homework. The requirements of the assignment are to accept an input file and perform some statistics on the values. The user may specify whether to calculate the statistics by row or by column. The shell script must be pure bash, so I can't use awk, sed, perl, python, etc.
sample input:
1 1 1 1 1 1 1
39 43 4 3225 5 2 2
6 57 8 9 7 3 4
3 36 8 9 14 4 3
3 4 2 1 4 5 5
6 4 4814 7 7 6 6
I can't figure out how to sort and process the data by column. My code for processing the rows works fine.
# CODE FOR ROWS
while read -r line; do
    echo $(printf "%d\n" $line | sort -n) | tr ' ' '\t' > sorted.txt
    ....
    # I perform the stats calculations for each
    # row by working with the temp file sorted.txt
done
How could I process this data by column? I've never worked with shell scripts, so I've been staring at this for hours.
If you want to analyze by columns, you'll need the cols value first (the number of columns). head -n 1 gives you the first row, and awk's NF counts the number of fields, giving us the number of columns.
cols=$(head -n 1 input.txt | awk '{print NF}')
Then you can use cut with the '\t' delimiter to grab every column from input.txt, and run it through sort -n, as you did in your original post.
$ for i in $(seq 1 "$cols"); do cut -f"$i" -d$'\t' input.txt; done | sort -n > output.txt
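If each column needs to be sorted separately rather than pooled into one stream, you can write one sorted file per column instead (a sketch; the col_N.txt names are only illustrative):
for i in $(seq 1 "$cols"); do
    cut -f"$i" -d$'\t' input.txt | sort -n > "col_$i.txt"   # one sorted file per column
done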
For rows, you can use the shell builtin printf with the format specifier %d for integers. The sort command works on lines of input, so we replace spaces ' ' with newlines \n using the tr command:
$ cat input.txt | while read line; do echo $(printf "%d\n" $line); done | tr ' ' '\n' | sort -n > output.txt
Now take the output file to gather our statistics:
Min: cat output.txt | head -n 1
Max: cat output.txt | tail -n 1
Sum: (courtesy of Dimitre Radoulov): cat output.txt | paste -sd+ - | bc
Mean: (courtesy of porges): cat output.txt | awk '{ total += $1 } END { print total/NR }'
Median: (courtesy of maxschlepzig): cat output.txt | awk ' { a[i++]=$1; } END { print a[int(i/2)]; }'
Histogram: cat output.txt | uniq -c
8 1
3 2
4 3
6 4
3 5
4 6
3 7
2 8
2 9
1 14
1 36
1 39
1 43
1 57
1 3225
1 4814
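The same statistics can also be gathered in a single awk pass over the sorted output, if you prefer (a sketch consolidating the one-liners above):
awk '
    { a[NR] = $1; sum += $1 }           # collect values; input is already sorted
    END {
        print "Min:",    a[1]
        print "Max:",    a[NR]
        print "Sum:",    sum
        print "Mean:",   sum / NR
        print "Median:", a[int(NR/2)]
    }
' output.txt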

Dividing one file into separate files based on line numbers

I have the following test file:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
I want to separate it in a way that each file contains the last line of the previous file as the first line. The example would be:
file 1:
1
2
3
4
5
file2:
5
6
7
8
9
file3:
9
10
11
12
13
file4:
13
14
15
16
17
file5:
17
18
19
20
That would make 4 files with 5 lines and 1 file with 4 lines.
As a first step, I tried to test the following commands I wrote to get only the first file, which contains the first 5 lines. I can't figure out why the awk command in the if statement prints the whole 20 lines instead of the first 5.
d=$(wc test)
a=$(echo $d | cut -f1 -d " ")
lines=$(echo $a/5 | bc -l)
integer=$(echo $lines | cut -f1 -d ".")
for i in $(seq 1 $integer); do
    start=$(echo $i*5 | bc -l)
    var=$((var+=1))
    echo start $start
    echo $var
    if [[ $var = 1 ]]; then
        awk 'NR<=$start' test
    fi
done
Thanks!
Why not just use the split utility available in your POSIX toolkit? It has an option to split by number of lines, which you can give as 5:
split -l 5 input-file
From the split man page:
-l, --lines=NUMBER
put NUMBER lines/records per output file
Note that -l is also POSIX-compliant.
$ ls
$
$ seq 20 | awk 'NR%4==1{ if (out) { print > out; close(out) } out="file"++c } {print > out}'
$
$ ls
file1 file2 file3 file4 file5
$ cat file1
1
2
3
4
5
$ cat file2
5
6
7
8
9
$ cat file3
9
10
11
12
13
$ cat file4
13
14
15
16
17
$ cat file5
17
18
19
20
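How it works, with comments (the same program, just expanded):
seq 20 | awk '
    NR % 4 == 1 {             # lines 1, 5, 9, ... start a new file
        if (out) {            # except at line 1: first repeat the boundary
            print > out       # line into the file about to be closed
            close(out)
        }
        out = "file" ++c      # file1, file2, ...
    }
    { print > out }           # every line also goes to the current file
'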
If you're ever tempted to use a shell loop to manipulate text again, make sure to read https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice first to understand at least some of the reasons to use awk instead. To learn awk, get the book Effective Awk Programming, 4th Edition, by Arnold Robbins.
Oh, and as for why your awk command awk 'NR<=$start' test didn't work: awk is not shell; it has no more access to shell variables (or vice versa) than a C program does. To initialize an awk variable named awkstart with the value of a shell variable named start, and then use that awk variable in your script, you'd write awk -v awkstart="$start" 'NR<=awkstart' test. The awk variable could just as well be named start or anything else sensible; it is completely unrelated to the name of the shell variable.
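Applied to your loop, the if branch would become something like this (a sketch):
if [[ $var = 1 ]]; then
    awk -v awkstart="$start" 'NR <= awkstart' test   # shell value passed in via -v
fi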
You could improve your code by removing the unnecessary echo, cut, and bc, and doing it like this:
#!/bin/bash
for i in $(seq $(wc -l < test)); do
    (( i % 4 != 1 )) && continue
    tail -n +"$i" test | head -5 > "file$(( 1 + i/4 ))"
done
But still, the awk solution is much better. Reading the file only once and taking actions based on readily available information (like the line number) is the way to go. In shell you have to count the lines first; there is no way around it. awk gives you that (and a lot of other things) for free.
Use split:
$ seq 20 | split -l 5
$ for fn in x*; do echo "$fn"; cat "$fn"; done
xaa
1
2
3
4
5
xab
6
7
8
9
10
xac
11
12
13
14
15
xad
16
17
18
19
20
Or, if you have a file:
$ split -l 5 test_file

Merge two files having the same value in bash

I am trying to merge 2 files into a single one.
FILE1
2015-09-30T13:30:57+01:00 6 1
2015-09-30T13:30:58+01:00 6 1
2015-09-30T13:30:59+01:00 6 1
2015-09-30T13:31:00+01:00 6 1
2015-09-30T13:31:01+01:00 6 1
2015-09-30T13:31:02+01:00 6 1
2015-09-30T13:31:04+01:00 6 1
FILE2
2015-09-30T13:16:19+01:00 4
2015-09-30T13:16:20+01:00 7
2015-09-30T13:16:21+01:00 7
2015-09-30T13:16:22+01:00 8
2015-09-30T13:16:23+01:00 8
2015-09-30T13:16:24+01:00 7
2015-09-30T13:16:25+01:00 2
2015-09-30T13:16:26+01:00 4
2015-09-30T13:16:27+01:00 1
2015-09-30T13:30:58+01:00 1
The result I am trying to get is column 2 of FILE2 added to FILE1 as a fourth column wherever the timestamps match:
2015-09-30T13:30:57+01:00 6 1 4
2015-09-30T13:16:23+01:00 8 3 1
Thank you for your help,
Al.
Use cut to extract the first column and a nested while loop to compare the first columns:
#!/usr/bin/bash
printf "" > FILE3
while read -r line1; do
    file1_first_col=$(printf '%s' "${line1}" | cut -f1 -d' ')
    printf '%s' "${line1}" >> FILE3
    while read -r line2; do
        file2_first_col=$(printf '%s' "${line2}" | cut -f1 -d' ')
        if [[ "${file1_first_col}" == "${file2_first_col}" ]]; then
            file2_second_col=$(printf '%s' "${line2}" | cut -f2 -d' ')
            printf ' %s' "${file2_second_col}" >> FILE3    # append the matching value
        fi
    done < FILE2
    printf '\n' >> FILE3
done < FILE1
The result is printed to a file called FILE3.
NOTE that for large files this may be very slow.
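If both files can be sorted on the timestamp column, the standard join utility does the same matching in a single pass and is much faster (a sketch):
# Join on the first (timestamp) field; join requires both inputs sorted on it.
# -a 1 keeps FILE1 lines that have no match in FILE2.
join -a 1 <(sort -k1,1 FILE1) <(sort -k1,1 FILE2) > FILE3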

Add column to csv file

I have two files, and I need to take the last column of one file and append it to the other file.
file1
1 2 3
1 2 3
1 2 3
file2
5 5
5 5
5 5
Initial proposal
#!/usr/bin/env bash
column=$(awk '{print $(NF)}' $file1)
paste -d',' $file2 < $column
Expected result
file2
5 5 3
5 5 3
5 5 3
But this script does not work yet.
Note: I do not know how many columns the file has; I need a more generic solution.
You can use this paste command:
paste -d " " file2 <(awk '{print $NF}' file1)
5 5 3
5 5 3
5 5 3
To append the last column of file1 to file2:
paste -d " " file2 <(rev file1 | cut -d " " -f 1 | rev)
Output:
5 5 3
5 5 3
5 5 3
To paste the second column of file1 to file2:
while read -r line; do
    read -r -u 3 c1 c2 c3       # read the corresponding line of file1 from fd 3
    echo $line $c2
done < file2 3< file1
You can use Perl too:
$ paste -d ' ' file2.txt <(perl -lne 'print $1 if m/(\S+)\s*$/' file1.txt)
5 5 3
5 5 3
5 5 3
Or grep:
$ paste -d ' ' file2.txt <(grep -Eo '(\S+)\s*$' file1.txt)
5 5 3
5 5 3
5 5 3

How to replace [10-15] with 10 11 12..15 in bash

I have a file/string containing the following:
[1-9]
[11-12]
[10-15]
I then want to expand that to become this:
1 2 3 4 5 6 7 8 9
11 12
10 11 12 13 14 15
I know how to do it in a very long way (first capture the two numbers and then expand them using a for loop).
I would like to know if there is a faster/smarter way of achieving the same.
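For reference, the long way might look something like this (a sketch using a bash regex to capture the two numbers):
while IFS= read -r line; do
    if [[ $line =~ \[([0-9]+)-([0-9]+)\] ]]; then
        for (( n = BASH_REMATCH[1]; n <= BASH_REMATCH[2]; n++ )); do
            printf '%s ' "$n"          # expand the captured range
        done
        printf '\n'
    fi
done < file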
One way (a pure bash solution):
while IFS=- read l1 l2; do
    eval echo ${l1/[/{}".."${l2/]/}}
done < file
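Given the three bracketed ranges above in file, this prints:
1 2 3 4 5 6 7 8 9
11 12
10 11 12 13 14 15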
There are several solutions.
Solution 1:
sed 's/^/echo /; s/[[]/{/; s/]/}/; s/-/../' | bash
Example:
$ cat 1.txt | sed 's/^/echo /; s/[[]/{/; s/]/}/; s/-/../' | bash
1 2 3 4 5 6 7 8 9
11 12
10 11 12 13 14 15
Solution 2:
tr '[]-' ' ' | sed "s/^/seq -s' '/" | bash
Example:
$ cat 1.txt | tr '[]-' ' ' | sed "s/^/seq -s' '/" | bash
1 2 3 4 5 6 7 8 9
11 12
10 11 12 13 14 15
If you're confident that your input all matches that pattern:
while read a; do
    seq -s' ' $(echo "$a" | tr '[]-' ' ')
done
Add error checking as appropriate.
Here's a one-liner:
cat lines | sed -E -e 's/\[|]//g' -e 's/-/ /g' | xargs -n 2 seq -s ' ' -t '\n'
As in:
$ cat <<EOF | sed -E -e 's/\[|]//g' -e 's/-/ /g' | xargs -n 2 seq -s ' ' -t '\n'
> [1-9]
> [11-12]
> [10-15]
> EOF
1 2 3 4 5 6 7 8 9
11 12
10 11 12 13 14 15
