How to add multi-column data using Bash

I have a data file, say input.dat, which looks like the following:
1 2 4 6
2 3 6 9
3 4 8 12
I want the data in the 2nd, 3rd and 4th columns to be added and printed to an output.dat file like the following:
1 12
2 18
3 24
How can this be achieved in bash?

Using awk you can do this:
awk '{print $1, $2+$3+$4}' input.dat
and if you prefer bash, it can be done like this (at least if the numbers are integers): run bash sum.sh < input.dat, where sum.sh is:
#!/bin/bash
# read the four whitespace-separated columns of each input line
while read -r v1 v2 v3 v4
do
echo "$v1" $(( v2 + v3 + v4 ))
done
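Either way, redirect standard output to get the requested output.dat file (assuming that is the desired file name):
awk '{print $1, $2+$3+$4}' input.dat > output.dat
bash sum.sh < input.dat > output.dat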


Cannot print in awk command in bash script

I am trying to read values from a file and print specific items into a variable which I will use later.
cat /dir1/file1 | while read blmbline2
do
BLMBFILE2=`print $blmbline2 | awk '{$1=""; print $0}'`
echo $BLMBFILE2
done
When I run that same code at the command line, it runs as expected, but, when I run it in a bash script called testme.sh, I get this error:
./testme.sh: line 3: print: command not found
If I run print by itself at the command prompt, I don't get an error (just a blank line).
If I run "bash" and then print at the command prompt, I get command not found.
I can't figure out what I'm doing wrong. Can someone help?
Update: I see some other posts that say to use echo or printf. Is there a difference I need to be concerned about when using one of those in bash?
Since awk can read files, you may be able to do away with the cat | while read and just use awk. Using a sample file containing:
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
Declare your bash array variable and populate with the output from awk:
arr=() ; arr=($(awk '{$1=""; print $0}' /dir1/file1))
Use the following to display array size and contents:
printf "array length: %d\narray contents: %s\n" "${#arr[#]}" "${arr[*]}"
Output:
array length: 30
array contents: 2 3 4 5 6 2 3 4 5 6 2 3 4 5 6 2 3 4 5 6 2 3 4 5 6 2 3 4 5 6
Change print to echo in your shell script. With printf you can format the data; with echo it will print the entire line of the file. Also, create an array so you can store multiple items:
BLMBFILE2=()
while IFS= read -r line
do
# unquoted expansion splits the remainder of the line into separate array items
BLMBFILE2+=($(echo "$line" | awk '{$1=""; print $0}'))
done < /dir1/file1
echo "Items found:"
for value in "${BLMBFILE2[@]}"
do
echo "$value"
done
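Regarding the update about echo versus printf: both write to standard output, but printf gives explicit control over formatting (no automatic newline, plus padding and numeric formats), while echo just prints its arguments followed by a newline. A quick illustration:
line="a b c"
echo "$line"                       # a b c (newline appended automatically)
printf '%s\n' "$line"              # same output, but the newline is explicit
printf '%-6s|%6.2f|\n' pi 3.14159  # pi    |  3.14|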

select columns from the files and save the output

I am new to programming. I have many files in a directory, as shown below; each file consists of two-column data.
TS.TST_X.1990-11-22
TS.TST_Y.1990-11-22
TS.TST_Z.1990-11-22
TS.TST_X.1990-12-30
TS.TST_Y.1990-12-30
TS.TST_Z.1990-12-30
At first I want to choose only the second column of all files having the same name (differing only in the X, Y, Z strings): TS.TST_X.1990-11-22, TS.TST_Y.1990-11-22, TS.TST_Z.1990-11-22, and save the output in a file like TSTST19901122.
Similarly for the TS.TST_X.1990-12-30, TS.TST_Y.1990-12-30, TS.TST_Z.1990-12-30 files, saving the output as TSTST19901230.
For example, if the files contain the following:
TS.TST_X.1990-11-22 TS.TST_Y.1990-11-22 TS.TST_Z.1990-11-22
1 2 1 3.4 1 2.1
2 5 2 2.4 2 4.2
3 2 3 1.2 3 1.0
4 4 4 2.4 4 3.5
5 8 5 6.3 5 1.8
Then the output file TSTST19901122 would look like:
2 3.4 2.1
5 2.4 4.2
2 1.2 1.0
4 2.4 3.5
8 6.3 1.8
I tried this code:
#!/bin/sh
for file in /home/min/data/*
do
awk '{print $2}' $file
done
But my code only prints a column of each file; it doesn't give the expected output. So I need the experts' help here.
Hope the example below helps you get started. Next time you post on SO, make sure you post the input properly, so that it is easy for readers to help you:
[akshay#db1 tmp]$ cat test.sh
#!/usr/bin/env bash
# sort the file names so that each unique date
# (the dot-separated third field) is handled once
while IFS= read -r f; do
# creates a glob pattern like TS.TST_*.1990-11-22
i=$(sed 's/_[^.]/_*/' <<<"$f");
# modify outfile if you want any extension suffix etc
outfile=$(sed 's/[^[:alnum:]]//g' <<<"$i")".txt";
# filename expansion with unquoted variable
# finally use awk to print whatever you want
paste $i | awk 'NR>1{for(i=2; i<=NF; i+=2)printf "%s%s", $(i), (i<NF ? OFS : ORS)}' >"$outfile"
done < <(printf '%s\n' TS.TST* | sort -t'.' -u -nk3)
[akshay#db1 tmp]$ bash test.sh
[akshay#db1 tmp]$ cat TSTST19901122.txt
2 3.4 2.1
5 2.4 4.2
2 1.2 1.0
4 2.4 3.5
8 6.3 1.8
Input:
[akshay#db1 tmp]$ ls TS.TST* -1
TS.TST_X.1990-11-22
TS.TST_Y.1990-11-22
TS.TST_Z.1990-11-22
[akshay#db1 tmp]$ for i in TS.TST*; do cat "$i"; done
TS.TST_X.1990-11-22
1 2
2 5
3 2
4 4
5 8
TS.TST_Y.1990-11-22
1 3.4
2 2.4
3 1.2
4 2.4
5 6.3
TS.TST_Z.1990-11-22
1 2.1
2 4.2
3 1.0
4 3.5
5 1.8
EDIT: Since the OP has mentioned in comments that the actual file names are a little different, adding a solution as per that here (as per the OP there are only 3 types of files, differing in year and month).
for file in TS.TST_BHE*
do
year=${file/*\./}
year=${year//-/}
yfile=${file/BHE/BHN}
zfile=${file/BHE/BHZ}
outfile="TSTST.$year"
##echo $file $yfile $zfile
paste "$file" "$yfile" "$zfile" | awk '{print $2,$4,$6}' > "$outfile"
done
Explanation: a detailed explanation of the above follows.
for file in TS.TST_BHE*
##Going through the TS.TST_BHE* files in a for loop; variable file holds each name.
do
year=${file/*\./}
##Creating year by removing everything up to the last dot.
year=${year//-/}
##Removing all - characters from the year variable.
yfile=${file/BHE/BHN}
##Substituting BHE with BHN in file and saving it to yfile.
zfile=${file/BHE/BHZ}
##Substituting BHE with BHZ in file and saving it to zfile.
outfile="TSTST.$year"
##Creating outfile, which is TSTST. followed by the year value.
##echo $file $yfile $zfile
paste "$file" "$yfile" "$zfile" | awk '{print $2,$4,$6}' > "$outfile"
##Using paste to concatenate the three files (BHE, BHN and BHZ) side by side, printing only the 2nd, 4th and 6th fields.
done
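For reference, here is how those parameter expansions behave on a single sample name (a quick illustration you can paste into a shell):
file="TS.TST_BHE.1990-11-22"
echo "${file/*\./}"    # 1990-11-22 (everything up to the last dot removed)
year=${file/*\./}
echo "${year//-/}"     # 19901122 (all dashes removed)
echo "${file/BHE/BHN}" # TS.TST_BHN.1990-11-22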
Could you please try the following, based on the OP's comment that we could simply concatenate the Input_files without checking the 1st column's value:
for file in TS.TST_X*
do
year=${file/*\./}
year=${year//-/}
yfile=${file/X/Y}
zfile=${file/X/Z}
outfile="TSTST.$year"
###echo $file $yfile $zfile ##Just to print variable values(optional)
paste "$file" "$yfile" "$zfile" | awk '{print $2,$4,$6}' > "$outfile"
done
For the samples shown, this will generate a file named TSTST.19901122:
cat TSTST.19901122
2 3.4 2.1
5 2.4 4.2
2 1.2 1.0
4 2.4 3.5
8 6.3 1.8
The following recreates the input files:
cat <<EOF >TS.TST_X.2000-11-22
1 2
2 5
3 2
4 4
5 8
EOF
cat <<EOF >TS.TST_Y.2000-11-22
1 3.4
2 2.4
3 1.2
4 2.4
5 6.3
EOF
cat <<EOF >TS.TST_Z.2000-11-22
1 2.1
2 4.2
3 1.0
4 3.5
5 1.8
EOF
cat <<EOF >TS.TST_X.1990-11-22
1 2
2 5
3 2
4 4
5 8
EOF
cat <<EOF >TS.TST_Y.1990-11-22
1 3.4
2 2.4
3 1.2
4 2.4
5 6.3
EOF
cat <<EOF >TS.TST_Z.1990-11-22
1 2.1
2 4.2
3 1.0
4 3.5
5 1.8
EOF
When run with the following script on repl:
# get the filenames
find . -maxdepth 1 -name "TS.TST*" -printf "%f\n" |
# meh, sort them, so it looks nice
sort |
# group files according to suffix after the dot
awk -F. '
{ a[$3]=a[$3]" "$0 }
END{ for (i in a) print i, a[i] }
' |
# here we have: YYYY-MM-DD filename1 filename2 filename3
# let's transform it into TSTSTYYYYMMDD filename{1,2,3}
sed -E 's/^([0-9]{4})-([0-9]{2})-([0-9]{2})/TSTST\1\2\3/' |
while IFS=' ' read -r new f1 f2 f3; do
# get second column from all files
# if your awk doesn't sort files, they would have to be sorted here
paste "$f1" "$f2" "$f3" | awk '{print $2,$4,$6}' > "$new"
done
# just output
for i in TSTST*; do echo "$i"; cat "$i"; done
Generates the following output:
TSTST19901122
2 3.4 2.1
5 2.4 4.2
2 1.2 1.0
4 2.4 3.5
8 6.3 1.8
TSTST20001122
2 3.4 2.1
5 2.4 4.2
2 1.2 1.0
4 2.4 3.5
8 6.3 1.8
I would advise doing some research on basic shell commands. Read the documentation for find. Read an introduction to awk and sed scripting. Read a good introduction to bash and get to know how to iterate over, sort, merge and filter lists of files in bash. Also read up on how to read a stream line by line.

How to print rows 1-10, 11-20 and so on of a file in a loop using shell? [closed]

I have a file consisting of 4000 rows. I need to iterate over the records of that file in a shell script, extracting the first 10 rows and sending them to the java code I have already written, then the next 10 rows, and so on.
To pass 10 lines at a time as arguments to your script:
< file xargs -d$'\n' -n 10 myscript
To pipe 10 lines at a time as input to your script:
< file xargs -d$'\n' -n 10 sh -c 'printf "%s\n" "$#" | myscript' {}
Assuming your input is in a file named file, which I'm creating here with 30 instead of 4000 lines:
$ seq 30 > file
and modifying it so that some lines contain spaces, some contain shell variables, and some contain regexp and globbing chars, to show that no type of shell expansion is being done:
$ head -10 file
1
here is a multi-field line
3
4
$HOME
6
.*
8
9
10
Here's 10 args at a time being passed to an awk script:
$ < file xargs -d$'\n' -n 10 awk 'BEGIN{for (i=1; i<ARGC; i++) print i, "<" ARGV[i] ">"; exit} END{print "---"}'
1 <1>
2 <here is a multi-field line>
3 <3>
4 <4>
5 <$HOME>
6 <6>
7 <.*>
8 <8>
9 <9>
10 <10>
---
1 <11>
2 <12>
3 <13>
4 <14>
5 <15>
6 <16>
7 <17>
8 <18>
9 <19>
10 <20>
---
1 <21>
2 <22>
3 <23>
4 <24>
5 <25>
6 <26>
7 <27>
8 <28>
9 <29>
10 <30>
---
and here's 10 lines of input at a time being passed to an awk script:
$ < file xargs -d$'\n' -n 10 sh -c 'printf "%s\n" "$#" | awk '\''{print NR, "<" $0 ">"} END{print "---"}'\''' {}
1 <1>
2 <here is a multi-field line>
3 <3>
4 <4>
5 <$HOME>
6 <6>
7 <.*>
8 <8>
9 <9>
10 <10>
---
1 <11>
2 <12>
3 <13>
4 <14>
5 <15>
6 <16>
7 <17>
8 <18>
9 <19>
10 <20>
---
1 <21>
2 <22>
3 <23>
4 <24>
5 <25>
6 <26>
7 <27>
8 <28>
9 <29>
10 <30>
---
Considering that the OP wants to pass lines as arguments to their code: if that is the case, could you please try the following (I haven't tested it by running it, since I don't have the OP's java code):
awk '
FNR%10==0{
system("your_java_code " value OFS $0)
value=""
next
# next skips to the following record, so the 10th line is not also added to the next batch
}
{
value=(value?value OFS:"")$0
}
END{
if(value){
system("your_java_code " value)
}
}
' Input_file
OR
awk '
{
value=(value?value OFS:"")$0
}
FNR%10==0{
system("your_java_code " value)
value=""
}
END{
if(value){
system("your_java_code " value)
}
}
' Input_file
PS: Just to be on the safe side, I kept the END section of the awk code so that, in case there are leftover lines (say the total number of lines is not exactly divisible by 10), it will still call the java program with the remaining lines.
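To sanity-check the batching without the java program, you can substitute echo for it (a quick test with 25 input lines):
seq 25 | awk '
{
value=(value?value OFS:"")$0
}
FNR%10==0{
system("echo " value)
value=""
}
END{
if(value){
system("echo " value)
}
}
'
This prints 1 through 10 on one line, 11 through 20 on the next, and the leftover 21 through 25 on the last.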
This might work for you (GNU parallel):
parallel -kN10 javaProgram :::: file
This will pass lines 1-10, 11-20, ... as arguments to the program javaProgram.
If you want to pass 10 lines at a time, use:
parallel -kN10 --cat javaProgram :::: file
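To see the grouping without the java program, substitute echo for it (assuming GNU parallel is installed):
seq 30 > file
parallel -kN10 echo :::: file
Each of the three output lines shows the 10 arguments passed to one invocation.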
Sounds to me like you want to slice out rows from a file, then pipe those rows to java. This interpretation differs from the other answers, so let me know if I'm not understanding you:
$ file=/etc/services
$ count=$(wc -l < "${file}")
$ start=1
$ stride=10
$ for ((i=start; i<=count; i+=stride)); do
awk -v i="${i}" -v stride="${stride}" \
'NR > (i+stride) { exit } NR >= i && NR < (i + stride)' "${file}" \
| java ...
done
file holds the path to the data rows. count is the total count of rows in that file. start is the first row, stride is how many you want to slice out in each iteration.
The for loop then performs the stride addition, while awk slices out the rows so numbered. We pipe them to the java program on standard in.
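As a quick way to watch the slicing work, point the same loop at a small generated file (a sketch, with tr and echo standing in for the java invocation):
seq 30 > file
count=$(wc -l < file)
for ((i=1; i<=count; i+=10)); do
  awk -v i="${i}" -v stride=10 'NR >= i && NR < (i + stride)' file | tr '\n' ' '
  echo
done
This prints 1 through 10, 11 through 20, and 21 through 30 on three separate lines.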
Assuming that you are passing the 10-line groups from your file to your script as command-line arguments, here is an answer:
rows=4000     # the number of rows in file
groupsize=10  # the size of the line groups
OIFS="$IFS"; IFS=$'\n' # use newline as the input field separator to avoid `for` splitting on spaces
groups=$(($rows / $groupsize)) # the number of groups of lines
for i in $(seq 1 $groups); do # loop through each group of lines
  from=$((($i * $groupsize) - $groupsize + 1))
  to=$(($i * $groupsize))
  arguments="" # reset the argument list for each invocation
  # build the arguments for each script invocation by concatenating each group of lines
  for line in $(sed -n -e ${from},${to}p file); do # 'file' is your input file name
    arguments="$arguments \"$line\""
  done
  echo script $arguments # remove echo and change 'script' to your script name
done
IFS="$OIFS" # restore the original input field separator
Like this:
for ((i=0; i<=4000; i+=10)); do
  arr=() # create a new empty array
  for ((j=i; j<i+10; j++)); do
    arr+=( "$j" ) # add the id to the array
  done
  printf '%s\n' "${arr[@]}" # or execute a command with all the ids
done

Replace a value in a file by another one (bash/awk)

I have a file (a coordinates file for those who know what it is) like following :
1 C 1
2 C 1 1 1.60000
3 H 5 1 1.10000 2 109.4700
4 H 5 1 1.10000 2 109.4700 3 109.4700 1
and so on. My idea is to replace the value "1.60000" in the second line with other values, using a for loop.
I would like the value to start at, let's say, 0, and stop at 2.0, with an increment step of 0.05.
Here is what I already tried:
#! /bin/bash
a=0;
for ((i=0; i<=10 (for example); i++)); do
awk '{if ((NR==2) && ($5=="1.60000")) {($5=a)} print $0 }' file.dat > ${i}_file.dat
a=$((a+0.05))
done
But unfortunately it doesn't work. I tried a lot of combinations for the {$5=a} statement, but without conclusive results.
Here is what I obtained:
1 C 1
2 C 1 1
3 H 5 1 1.10000 2 109.4700
4 H 5 1 1.10000 2 109.4700 3 109.4700 1
The value 1.60000 simply disappears, or at least is replaced by a blank.
Any advice?
Thanks a lot,
Pierre-Louis
For this, perhaps sed is a better alternative:
v=0.00
for ((i=0; i<=40; i++)); do
  sed '2s/1.60/'"$v"'/' file > file_"$i"
  v=$(echo "$v + 0.05" | bc | xargs printf "%.2f\n")
done
Explanation
sed '2s/1.60/'"$v"'/' file changes the value 1.60 on the second line to the value of variable v.
Floating-point arithmetic in bash is hard; this adds 0.05 to the value and formats it (0.05 instead of .05) so that we can use it in the substitution with sed.
Exercise for you: in bash, try to add 0.05 to 0.05 and format the output as 0.10 with a leading zero.
Example with awk (glenn's suggestion):
for ((i=0; i<=40; i++)); do
  awk -v i="$i" '
    FNR==2 { $5 = sprintf("%.5f", i*0.05) } # overwrite the 5th field on line 2
    { print } # print every line, modified or not
  ' file.dat # > "${i}_file.dat" # uncomment for file output
done
Advantage: awk manages the floating-point arithmetic.
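Another option, assuming GNU seq is available: let seq generate the formatted values directly, so neither bash nor bc has to do the floating-point arithmetic (a sketch using the same file layout as the question):
i=0
for v in $(seq -f '%.5f' 0 0.05 2.0); do
  sed "2s/1\.60000/$v/" file.dat > "${i}_file.dat"
  i=$((i+1))
done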

Replace the nth field of every mth line using awk or bash

For a file that contains entries similar to as follows:
foo 1 6 0
fam 5 11 3
wam 7 23 8
woo 2 8 4
kaz 6 4 9
faz 5 8 8
How would you replace the nth field of every mth line with the same element using bash or awk?
For example, if n = 1 and m = 3 and the element = wot, the output would be:
foo 1 6 0
fam 5 11 3
wot 7 23 8
woo 2 8 4
kaz 6 4 9
wot 5 8 8
I understand you can select / print every mth line using, e.g.:
awk 'NR%7==0' file
So far I have tried to keep this in memory but to no avail... I need to keep the rest of the file as well.
I would prefer answers using bash or awk, but sed solutions would also be helpful. I'm a beginner in all three. Please explain your solution.
awk -v m=3 -v n=1 -v el='wot' 'NR % m == 0 { $n = el } 1' file
Note, however, that the inter-field whitespace is not guaranteed to be preserved as-is, because awk splits a line into fields by any run of whitespace; as written, the output fields of modified lines will be separated by a single space.
If your input fields are consistently separated by 2 spaces, however, you can effectively preserve the input whitespace by adding -F'  ' -v OFS='  ' to the awk invocation.
-v m=3 -v n=1 -v el='wot' defines Awk variables m, n, and el
NR % m == 0 is a pattern (condition) that evaluates to true for every m-th line.
{ $n = el } is the associated action that replaces the nth field of the input line with variable el, causing the line to be rebuilt, implicitly using OFS, the output-field separator, which defaults to a space.
1 is a common Awk shorthand for printing the (possibly modified) input line at hand.
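For instance, here is the two-space variant on a small sample (a quick illustration):
$ printf 'foo  1\nfam  5\nwam  7\n' | awk -F'  ' -v OFS='  ' -v m=3 -v n=1 -v el='wot' 'NR % m == 0 { $n = el } 1'
foo  1
fam  5
wot  7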
Great little exercise. While I would probably lean toward an awk solution, in bash you can also rely on parameter expansion with substring replacement to replace the nth field of every mth line. Essentially, you can read every line, preserving whitespace, then check your line count, e.g. if c is your line counter and m your variable for the mth line, you could use:
if (( $((c % m )) == 0)) ## test for mth line
If the line is a replacement line, you can read each word into an array after restoring default word-splitting and then use your array element index n-1 to provide the replacement (e.g. ${line/find/replace} with ${line/"${array[$((n-1))]}"/replace}).
If it isn't a replacement line, simply output the line unchanged. A short example could be similar to the following (to which you can add additional validations as required):
#!/bin/bash
[ -n "$1" -a -r "$1" ] || { ## filename given an readable
printf "error: insufficient or unreadable input.\n"
exit 1
}
n=${2:-1} ## variables with default n=1, m=3, e=wot
m=${3:-3}
e=${4:-wot}
c=1 ## line count
while IFS= read -r line; do
if (( $((c % m )) == 0)) ## test for mth line
then
IFS=$' \t\n'
a=( $line ) ## split into array
IFS=
echo "${line/"${a[$((n-1))]}"/$e}" ## nth replaced with e
else
echo "$line" ## otherwise just output line
fi
((c++)) ## advance counter
done <"$1"
Example Use/Output
n=1, m=3, e=wot
$ bash replmn.sh dat/repl.txt
foo 1 6 0
fam 5 11 3
wot 7 23 8
woo 2 8 4
kaz 6 4 9
wot 5 8 8
n=1, m=2, e=baz
$ bash replmn.sh dat/repl.txt 1 2 baz
foo 1 6 0
baz 5 11 3
wam 7 23 8
baz 2 8 4
kaz 6 4 9
baz 5 8 8
n=3, m=2, e=99
$ bash replmn.sh dat/repl.txt 3 2 99
foo 1 6 0
fam 5 99 3
wam 7 23 8
woo 2 99 4
kaz 6 4 9
faz 5 99 8
An awk solution is shorter (and avoids problems with duplicate occurrences of the replacement string in $line), but both would need similar validation of field existence, etc. Learn from both and let me know if you have any questions.
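Since sed solutions were also welcome: GNU sed can address every mth line with its first~step syntax, although replacing an arbitrary nth field is clumsier than in awk. A sketch for the n=1, m=3 case (GNU sed only):
$ sed '3~3s/^[^ ]*/wot/' file
foo 1 6 0
fam 5 11 3
wot 7 23 8
woo 2 8 4
kaz 6 4 9
wot 5 8 8
Here 3~3 selects lines 3, 6, 9, … and the substitution replaces the first space-delimited field.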
