Merge two files having the same value in bash

I am trying to merge two files into a single one.
FILE1
2015-09-30T13:30:57+01:00 6 1
2015-09-30T13:30:58+01:00 6 1
2015-09-30T13:30:59+01:00 6 1
2015-09-30T13:31:00+01:00 6 1
2015-09-30T13:31:01+01:00 6 1
2015-09-30T13:31:02+01:00 6 1
2015-09-30T13:31:04+01:00 6 1
FILE2
2015-09-30T13:16:19+01:00 4
2015-09-30T13:16:20+01:00 7
2015-09-30T13:16:21+01:00 7
2015-09-30T13:16:22+01:00 8
2015-09-30T13:16:23+01:00 8
2015-09-30T13:16:24+01:00 7
2015-09-30T13:16:25+01:00 2
2015-09-30T13:16:26+01:00 4
2015-09-30T13:16:27+01:00 1
2015-09-30T13:30:58+01:00 1
The result I am trying to get is FILE1 with column 2 of FILE2 appended as a fourth column wherever the timestamps match:
2015-09-30T13:30:57+01:00 6 1 4
2015-09-30T13:16:23+01:00 8 3 1
Thank you for your help,
Al.

Use cut to extract the first column, and a nested while loop to compare the first columns of the two files:
#!/usr/bin/bash
printf "" > FILE3   # start with an empty output file
while read -r line1; do
    # Pass the line as an argument, not as the format string, so % and \ are safe
    file1_first_col=$(printf '%s\n' "${line1}" | cut -f1 -d' ')
    printf '%s' "${line1}" >> FILE3
    # Scan FILE2 for rows whose first column matches
    while read -r line2; do
        file2_first_col=$(printf '%s\n' "${line2}" | cut -f1 -d' ')
        if [[ "${file1_first_col}" == "${file2_first_col}" ]]; then
            file2_second_col=$(printf '%s\n' "${line2}" | cut -f2 -d' ')
            printf ' %s' "${file2_second_col}" >> FILE3
        fi
    done < FILE2
    printf '\n' >> FILE3
done < FILE1
The result is written to a file called FILE3.
NOTE that FILE2 is reread for every line of FILE1, so this may be very slow for large files.
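If the nested loop is too slow, one alternative is a single awk pass that first loads FILE2 into an associative array and then appends the stored value to matching lines of FILE1. This is only a sketch: the function name merge_by_timestamp is made up for illustration, and it assumes whitespace-separated columns with the timestamp in column 1. Unlike the nested loop, it keeps only the last value when FILE2 repeats a timestamp.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original answer).
# Usage: merge_by_timestamp FILE1 FILE2
merge_by_timestamp() {
    # awk reads FILE2 ($2) first, then FILE1 ($1).
    awk '
        NR == FNR { extra[$1] = $2; next }            # first file read: store column 2 by timestamp
        ($1 in extra) { print $0, extra[$1]; next }   # timestamp found: append the stored value
        { print }                                     # no match: print the line unchanged
    ' "$2" "$1"
}
```

It could be called as merge_by_timestamp FILE1 FILE2 > FILE3.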

Related

Read content of file and put particular portion of content in separate files using bash

I would like to extract specific sections from a single file and put each into a separate file using bash. With the code below I was able to get the content for the test1 file, but I failed to get every section into its respective file.
Tried code:
reportFile=/report.txt
test1File=/test1.txt
test2File=/test2.txt
test3File=/test3.txt
totalLineNo=`cat ${reportFile} | wc -l`
test1LineNo=`grep -n "Test1 file content :" ${reportFile} | grep -Eo '^[^:]+'`
test2LineNo=`grep -n "Test2 file content :" ${reportFile} | grep -Eo '^[^:]+'`
test3LineNo=`grep -n "Test3 file content :" ${reportFile} | grep -Eo '^[^:]+'`
exactTest1LineNo=`echo $(( ${test1LineNo} - 1 ))`
exactTest2LineNo=`echo $(( ${test2LineNo} -1 ))`
exactTest3LineNo=`echo $(( ${test3LineNo} -1 ))`
test1Content=`cat ${reportFile} | head -n ${exactTest1LineNo}`
test3Content=`cat ${reportFile} | tail -n ${exactTest3LineNo}`
echo -e "${test1Content}\r" >> ${test1File}
echo -e "${test3Content}\r" >> ${test3File}
report.txt:
-------------------------------------
My Report:
Test1 file content:
1
2
3
4
5
6
Test2 file content:
7
8
9
10
Test3 file content:
11
12
13
14
15
Note: Find my report above.
-------------------------------------
test1.txt (expected):
1
2
3
4
5
6
test2.txt (expected):
7
8
9
10
test3.txt (expected):
11
12
13
14
15
With a single awk command (sections are assumed to end at a blank line):
awk '/^Test[0-9] file content:/{ f=1; fn=tolower($1)".txt"; next }
f && NF{ print > fn }
!NF{ f=0 }' report.txt
Viewing results:
$ head test[0-9].txt
==> test1.txt <==
1
2
3
4
5
6
==> test2.txt <==
7
8
9
10
==> test3.txt <==
11
12
13
14
15
If I understand you correctly: you have a long file report.txt and you want to extract short files from it. The name of each file is followed by the string " file content:" in the file report.txt.
This is my solution:
#!/bin/bash
reportFile=report.txt
Files=`grep 'file content' $reportFile | sed 's/ .*$//'`
for F in $Files ; do
f=${F,}.txt # first letter lowercase and append .txt
awk "/$F file content/,/^\$/ {print}" $reportFile |
tail -n +2 | # remove first line with "Test* file content:"
head -n -1 > $f # remove last blank line
done

Sorting tab delimited numbers by column with pure bash script.

I'm stuck on some homework. The requirements of the assignment are to accept an input file and perform some statistics on the values. The user may specify whether to calculate the statistics by row or by value. The shell script must be pure bash, so I can't use awk, sed, perl, python, etc.
sample input:
1 1 1 1 1 1 1
39 43 4 3225 5 2 2
6 57 8 9 7 3 4
3 36 8 9 14 4 3
3 4 2 1 4 5 5
6 4 4814 7 7 6 6
I can't figure out how to sort and process the data by column. My code for processing the rows works fine.
# CODE FOR ROWS
while read -r line; do
echo $(printf "%d\n" $line | sort -n) | tr ' ' \\t > sorted.txt
....
#I perform the stats calculations
# for row line by working with the temp file sorted.txt
done
How could I process this data by column? I've never worked with shell script so I've been staring at this for hours.
If you want to analyze by columns, you'll need the number of columns (cols) first. head -n 1 gives you the first row, and awk's NF counts the fields in it, giving us the number of columns.
cols=$(head -n 1 input.txt | awk '{print NF}')
Then you can use cut with the '\t' delimiter to grab every column from input.txt and run the values through sort -n, as you did in your original post. Note that this pools the values of all columns into one sorted list:
$ for i in $(seq 1 "$cols"); do cut -f"$i" -d$'\t' input.txt; done | sort -n > output.txt
For rows, you can use the shell built-in printf with the format specifier %d for integers. The sort command works on lines of input, so we replace spaces ' ' with newlines \n using the tr command:
$ cat input.txt | while read line; do echo $(printf "%d\n" $line); done | tr ' ' '\n' | sort -n > output.txt
Now take the output file to gather our statistics:
Min: cat output.txt | head -n 1
Max: cat output.txt | tail -n 1
Sum: (courtesy of Dimitre Radoulov): cat output.txt | paste -sd+ - | bc
Mean: (courtesy of porges): cat output.txt | awk '{ total += $1 } END { print total/NR }'
Median: (courtesy of maxschlepzig): cat output.txt | awk ' { a[i++]=$1; } END { print a[int(i/2)]; }'
Histogram: cat output.txt | uniq -c
8 1
3 2
4 3
6 4
3 5
4 6
3 7
2 8
2 9
1 14
1 36
1 39
1 43
1 57
1 3225
1 4814
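Since the assignment asks for pure bash, the per-column statistics can also be sketched with builtins only (read, printf, arrays and arithmetic). This is an illustration under stated assumptions, not the original answer's method: the function name column_stats is invented, the values are assumed to be integers without leading zeros, every row is assumed to have the same number of fields, and the median for an even count is taken as the upper middle element, the same convention as the awk median above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print min, max, sum and median for each column of FILE.
# Usage: column_stats FILE
column_stats() {
    local file=$1 i j k n tmp sum
    local -a fields cols vals
    # Gather each column into one space-separated string per column index.
    while read -r -a fields || [ "${#fields[@]}" -gt 0 ]; do
        for i in "${!fields[@]}"; do
            cols[i]+="${fields[i]} "
        done
    done < "$file"
    for i in "${!cols[@]}"; do
        read -r -a vals <<< "${cols[i]}"
        n=${#vals[@]}
        # Insertion sort (ascending), since the external sort command is off limits.
        for ((j = 1; j < n; j++)); do
            tmp=${vals[j]}
            k=$j
            while ((k > 0 && vals[k-1] > tmp)); do
                vals[k]=${vals[k-1]}
                k=$((k - 1))
            done
            vals[k]=$tmp
        done
        sum=0
        for tmp in "${vals[@]}"; do
            sum=$((sum + tmp))
        done
        printf 'col %d: min=%d max=%d sum=%d median=%d\n' \
            "$((i + 1))" "${vals[0]}" "${vals[n-1]}" "$sum" "${vals[n/2]}"
    done
}
```

Calling column_stats input.txt prints one line per column. The same loop structure works per row if the fields of each line are sorted directly instead of being gathered by column first.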

Writing shell script to print a certain number of lines with certain arguments

I have five variables, and each variable contains five values. I want to print five lines, each taking one value from each of the five variables in turn.
For example
$a=1 2 3 4 5
$b=4 2 3 4 5
$c=8 9 7 6 5
$d= 8 7 6 5 4
$e=5 6 7 3 3
I want to print five lines in this format
My options was a=1,b=4,c=8,d=8 and e=5
My options was a=2,b=2,c=9,d=7 and e=6
and so on, up to five lines.
I got confused by the loops. Can anyone help me write the loops in a script to obtain the output above?
a="1 2 3 4 5"
b="4 2 3 4 5"
c="8 9 7 6 5"
d="8 7 6 5 4"
e="5 6 7 3 3"
for i in $(seq 1 5); do
echo -e "My options was \c"
echo -e "a=$(echo $a | cut -f$i -d' '),\c"
echo -e "b=$(echo $b | cut -f$i -d' '),\c"
echo -e "c=$(echo $c | cut -f$i -d' '),\c"
echo -e "d=$(echo $d | cut -f$i -d' ') and \c"
echo -e "e=$(echo $e | cut -f$i -d' ')"
done
Using this awk command with a bash loop:
for i in {1..5}; do
awk '{printf "My options was a=%d, b=%d, c=%d, d=%d and e=%d\n", $1, $2, $3, $4, $5}' <<< $(awk '{print $'$i'}' <(echo -e "$a\n$b\n$c\n$d\n$e") | tr $'\n' ' '); done
Output:
$ a='1 2 3 4 5'
$ b='4 2 3 4 5'
$ c='8 9 7 6 5'
$ d='8 7 6 5 4'
$ e='5 6 7 3 3'
$ for i in {1..5}; do
awk '{printf "My options was a=%d, b=%d, c=%d, d=%d and e=%d\n", $1, $2, $3, $4, $5}' <<< $(awk '{print $'$i'}' <(echo -e "$a\n$b\n$c\n$d\n$e") | tr $'\n' ' '); done
My options was a=1, b=4, c=8, d=8 and e=5
My options was a=2, b=2, c=9, d=7 and e=6
My options was a=3, b=3, c=7, d=6 and e=7
My options was a=4, b=4, c=6, d=5 and e=3
My options was a=5, b=5, c=5, d=4 and e=3
If you transpose the matrix, this is really simple, portable, and idiomatic.
while read -r a b c d e; do
: stuff with "$a", "$b", etc
done <<____
1 4 8 8 5
2 2 9 7 6
3 3 7 6 7
4 4 6 5 3
5 5 5 4 3
____
Notice how the first column enumerates the a values, the second, the bs, etc.
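If the data has to stay in the five variables as given, another option is to split each string into a bash array and index all five arrays with the same loop counter. This is a minimal sketch that assumes bash (not plain sh); the function name print_options and the uppercase array names are made up for illustration.

```shell
#!/usr/bin/env bash
a="1 2 3 4 5"
b="4 2 3 4 5"
c="8 9 7 6 5"
d="8 7 6 5 4"
e="5 6 7 3 3"

# Hypothetical helper: print one line per position across the five variables.
print_options() {
    local -a A B C D E
    local i
    # read -a splits each string on whitespace into an indexed array.
    read -r -a A <<< "$a"
    read -r -a B <<< "$b"
    read -r -a C <<< "$c"
    read -r -a D <<< "$d"
    read -r -a E <<< "$e"
    for i in "${!A[@]}"; do
        printf 'My options was a=%s,b=%s,c=%s,d=%s and e=%s\n' \
            "${A[i]}" "${B[i]}" "${C[i]}" "${D[i]}" "${E[i]}"
    done
}

print_options
```

This avoids starting echo and cut subprocesses for every value, at the cost of portability to shells without arrays.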

Add column to csv file

I have two files, and I need to take the last column of one file and append it to the other file.
file1
1 2 3
1 2 3
1 2 3
file2
5 5
5 5
5 5
Initial proposal
#!/usr/bin/env bash
column=$(awk '{print $(NF)}' $file1)
paste -d',' $file2 < $column
Expected result
file2
5 5 3
5 5 3
5 5 3
But this script does not work yet.
OBS: I do not know how many columns the file has, so I need a more generic solution.
You can use this paste command:
paste -d " " file2 <(awk '{print $NF}' file1)
5 5 3
5 5 3
5 5 3
To append last column of file1 to file2:
paste -d " " file2 <(rev file1 | cut -d " " -f 1 | rev)
Output:
5 5 3
5 5 3
5 5 3
To paste the second column of file 1 to file 2:
while read line; do
read -u 3 c1 c2 c3;
echo $line $c2;
done < file2 3< file1
You can use Perl too:
$ paste -d ' ' file2.txt <(perl -lne 'print $1 if m/(\S+)\s*$/' file1.txt)
5 5 3
5 5 3
5 5 3
Or grep:
$ paste -d ' ' file2.txt <(grep -Eo '(\S+)\s*$' file1.txt)
5 5 3
5 5 3
5 5 3

Search replace string in a file based on column in other file

If we have the first file like below:
(a.txt)
1 asm
2 assert
3 bio
4 Bootasm
5 bootmain
6 buf
7 cat
8 console
9 defs
10 echo
and the second like:
(b.txt)
bio cat BIO bootasm
bio defs cat
Bio console
bio BiO
bIo assert
bootasm asm
bootasm echo
bootasm console
bootmain buf
bootmain bio
bootmain bootmain
bootmain defs
cat cat
cat assert
cat assert
and we want the output will be like this:
3 7 3 4
3 9 7
3 8
3 3
3 2
4 1
4 10
4 8
5 6
5 3
5 5
5 9
7 7
7 2
7 2
For each word in the second column of the first file, we search for it in every column of every line of the second file; where it exists, we replace it with the number from the first column of the first file. I managed to do this only for the first column; I couldn't do it for the rest.
Here is the command I use:
awk 'NR==FNR{a[$2]=$1;next}{$1=a[$1];}1' a.txt b.txt
3 cat bio bootasm
3 defs cat
3 console
3 bio
3 assert
4 asm
4 echo
4 console
5 buf
5 bio
5 bootmain
5 defs
7 cat
7 assert
7 assert
How should I do this for the other columns?
Thank you
awk 'NR==FNR{h[$2]=$1;next} {for (i=1; i<=NF;i++) $i=h[$i];}1' a.txt b.txt
NR is the global record number (the line number, by default) across all files. FNR is the line number within the current file. The NR==FNR block specifies what action to take when the global line number equals the current file's line number, which is only true for the first file, i.e., a.txt. The next statement in this block skips the rest of the code, so the for loop only runs for the second file, i.e., b.txt.
First, we process the first file in order to store the word ids in an associative array: NR==FNR{h[$2]=$1;next}. After that, we can use these ids to map the words in the second file. The for loop (for (i=1; i<=NF;i++) $i=h[$i];) iterates over all columns, and $i=h[$i] replaces the word in the ith column with its id. Finally, the 1 at the end of the script causes all lines to be printed.
Produces:
3 7 3 4
3 9 7
3 8
3 3
3 2
4 1
4 10
4 8
5 6
5 3
5 5
5 9
7 7
7 2
7 2
To make the script case-insensitive, add tolower calls into the array indices:
awk 'NR==FNR{h[tolower($2)]=$1;next} {for (i=1; i<=NF;i++) $i=h[tolower($i)];}1' a.txt b.txt
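One caveat with $i=h[$i] is that a word missing from a.txt is replaced by an empty string. A possible variant, sketched below, tests membership with awk's in operator and leaves unknown words untouched. The function name map_words is hypothetical, and keeping unknown words is an assumption about the desired behavior rather than something the question states.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace known words in the second file by their ids from the first.
# Usage: map_words a.txt b.txt
map_words() {
    awk '
        NR == FNR { id[tolower($2)] = $1; next }   # first file: map lowercased word to its id
        {
            for (i = 1; i <= NF; i++) {
                key = tolower($i)
                if (key in id) $i = id[key]        # replace only words that have an id
            }
            print
        }
    ' "$1" "$2"
}
```

Testing with in also avoids creating empty array entries for every unknown word, which a plain h[$i] lookup does as a side effect.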
Divide and conquer! A bit archaic, but it does the job =)
awk 'NR==FNR{a[$2]=$0;next}{$1=a[$1];}1' a.txt b.txt | tr ' ' ',' | awk '{ print $1 }' FS="," > 1
awk 'NR==FNR{a[$2]=$0;next}{$1=a[$2];}1' a.txt b.txt | tr ' ' ',' | awk '{ print $1 }' FS="," > 2
awk 'NR==FNR{a[$2]=$0;next}{$1=a[$3];}1' a.txt b.txt | tr ' ' ',' | awk '{ print $1 }' FS="," > 3
awk 'NR==FNR{a[$2]=$0;next}{$1=a[$4];}1' a.txt b.txt | tr ' ' ',' | awk '{ print $1 }' FS="," > 4
paste 1 2 3 4 | tr '\t' ' '
gives:
3 7 3 4
3 9 7
3 8
3 3
3 2
4 1
4 10
4 8
5 6
5 3
5 5
5 9
7 7
7 2
7 2
In this case I just changed the column number in each command and pasted the results together, with a bit of editing in between.
{
cat a.txt; echo "--EndA--";cat b.txt
} | sed -n '1 h
1 !H
$ {
x
: loop
s/^ *\([[:digit:]]\{1,\}\) *\([^[:cntrl:]]*\)\(\n\)\(.*\)\2/\1 \2\3\4\1/
t loop
s/^ *[[:digit:]]\{1,\} *[^[:cntrl:]]*\n//
t loop
s/^[[:space:]]*--EndA--\n//
p
}
'
"--EndA--" could be replaced by something else if there is a chance that it appears in one of the files (mainly a.txt).

Resources