How to iterate over text file having multiple-words-per-line using shell script? - bash

I know how to iterate over lines of text when the text file has contents as below:
abc
pqr
xyz
However, what if the contents of my text file are as below,
abc xyz
cdf pqr
lmn rst
and I need to get the value "abc" stored in one variable and "xyz" stored in another variable. How would I do that?

read splits the line by $IFS as many times as you pass variables to it:
while read var1 var2 ; do
echo "var1: ${var1} var2: ${var2}"
done
As you can see, if you pass var1 and var2, the two columns go to separate variables. Note, however, that if the line contains more columns, var2 receives the whole remainder of the line, not just the second column.
Type help read for more info.
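To see the splitting rule in action, here is a small sketch. The sample input and the extra variable name rest are assumptions made for illustration; they are not part of the question.

```bash
#!/bin/bash
# Hypothetical sample input: the second line has a third column.
input=$'abc xyz\ncdf pqr extra'

two_vars=()
while read -r var1 var2; do
    # With two variables, var2 absorbs everything after the first column.
    two_vars+=("$var2")
done <<< "$input"

three_vars=()
while read -r var1 var2 rest; do
    # The extra variable "rest" soaks up any further columns.
    three_vars+=("$var2")
done <<< "$input"

printf 'two variables:   %s\n' "${two_vars[@]}"
printf 'three variables: %s\n' "${three_vars[@]}"
```

With two variables the second line yields "pqr extra" in var2; with the third variable it yields just "pqr".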

If the delimiter is a space then you can do:
#!/bin/bash
ALLVALUES=()
while read line
do
ALLVALUES+=( $line )
done < "/path/to/your/file"
Afterwards, you can reference an element with ${ALLVALUES[0]}, ${ALLVALUES[1]}, and so on.

If you want to read every word in a file into a single array you can do it like this:
arr=()
while read -r -a _a; do
arr+=("${_a[@]}")
done < infile
This uses -r to keep read from interpreting backslashes in the input and -a to have it split the words (on $IFS) into an array. It then appends all the elements of that array to the accumulating array, while being safe against globbing and other metacharacters.
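The globbing safety can be checked with a quick sketch. The temporary input file and its contents (including the literal *) are assumptions chosen to trigger the problem; the array name words is likewise hypothetical.

```bash
#!/bin/bash
# Hypothetical input file containing a glob character.
infile=$(mktemp)
printf '%s\n' 'abc *' 'cdf pqr' > "$infile"

arr=()
while read -r -a words; do
    # Quoted "${words[@]}" appends each word as-is, with no glob expansion.
    arr+=("${words[@]}")
done < "$infile"
rm -f "$infile"

declare -p arr
```

The * stays a literal array element here, whereas an unquoted expansion such as arr+=( $line ) would replace it with the file names in the current directory.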

This awk command reads the input word by word:
awk -v RS='[[:space:]]+' '1' file
abc
xyz
cdf
pqr
lmn
rst
To populate a shell array, use the awk command in a process substitution:
arr=()
while read -r w; do
arr+=("$w")
done < <(awk -v RS='[[:space:]]+' '1' file)
And print the array content:
declare -p arr
declare -a arr='([0]="abc" [1]="xyz" [2]="cdf" [3]="pqr" [4]="lmn" [5]="rst")'
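For comparison, a similar array can probably be built without awk. This is a sketch that assumes the file is plain text without NUL bytes; the temporary file stands in for the file used above.

```bash
#!/bin/bash
# Hypothetical stand-in for the input file from the question.
file=$(mktemp)
printf '%s\n' 'abc xyz' 'cdf pqr' 'lmn rst' > "$file"

# -d '' makes read consume the whole file instead of stopping at a newline,
# and -a splits the result on $IFS (space, tab, newline) into an array.
# read returns non-zero when it hits end of file, hence the "|| true".
read -r -d '' -a arr < "$file" || true
rm -f "$file"

declare -p arr
```

This avoids starting an external process, at the cost of reading the whole file into memory at once.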

Related

read a file line by line and assign the values to variable as comma separated

I have the following a.txt file:
abc,
def,
ghi
I want to read it line by line and store it in a variable as comma-separated values:
var1=abc,def,ghi
I am new to shell scripting, please help.
My try:
name="file.txt"
while IFS= read -r line
do
names=`echo $line`
done < "$name"
It stores only the value ghi in the variable.
You're not concatenating, you're replacing the names variable each time through the loop.
There's no need to use echo when assigning the variable.
name="file.txt"
names=
while IFS= read -r line
do
names="$names$line"
done < "$name"
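A runnable sketch of the same idea follows. It assumes a temporary file with the contents from the question, written without a final newline, and adds a guard for that case, because read returns non-zero on an unterminated last line even though it has filled the variable.

```bash
#!/bin/bash
# Hypothetical stand-in for the file; note the missing final newline.
name=$(mktemp)
printf 'abc,\ndef,\nghi' > "$name"

names=
# "|| [[ -n $line ]]" keeps the last line even if it lacks a newline.
while IFS= read -r line || [[ -n $line ]]; do
    names+=$line        # append instead of overwriting
done < "$name"
rm -f "$name"

echo "var1=$names"
```

Without the guard, the loop would stop before appending ghi for a file like this one.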

Extracting file content using a for loop [duplicate]

I'm working on a long Bash script. I want to read cells from a CSV file into Bash variables. I can parse lines and the first column, but not any other column. Here's my code so far:
cat myfile.csv|while read line
do
read -d, col1 col2 < <(echo $line)
echo "I got:$col1|$col2"
done
It's only printing the first column. As an additional test, I tried the following:
read -d, x y < <(echo a,b,)
And $y is empty. So I tried:
read x y < <(echo a b)
And $y is b. Why?
You need to use IFS instead of -d:
while IFS=, read -r col1 col2
do
echo "I got:$col1|$col2"
done < myfile.csv
To skip a given number of header lines:
skip_headers=3
while IFS=, read -r col1 col2
do
if ((skip_headers))
then
((skip_headers--))
else
echo "I got:$col1|$col2"
fi
done < myfile.csv
Note that for general purpose CSV parsing you should use a specialized tool which can handle quoted fields with internal commas, among other issues that Bash can't handle by itself. Examples of such tools are csvtool and csvkit.
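The limitation is easy to reproduce. In this sketch the sample row is an assumption; it contains one quoted field with an internal comma.

```bash
#!/bin/bash
# Hypothetical CSV row whose second field contains a comma inside quotes.
row='12,"Hello, world",42'

# IFS=, knows nothing about quoting, so the quoted field is cut in two
# and the last variable receives whatever is left over.
IFS=, read -r col1 col2 col3 <<< "$row"

printf '%s\n' "col1=[$col1]" "col2=[$col2]" "col3=[$col3]"
```

Here col2 ends up holding only the first half of the quoted field, quote character included, and col3 holds the rest of the line.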
How to parse a CSV file in Bash?
I'm coming late to this question, but bash now offers new features, the question is specifically about bash, and none of the answers already posted shows this powerful and standards-compliant way of doing precisely this.
Parsing CSV files under bash, using loadable module
Conforming to RFC 4180, a string like this sample CSV row:
12,22.45,"Hello, ""man"".","A, b.",42
should be split as
1 12
2 22.45
3 Hello, "man".
4 A, b.
5 42
Bash loadable modules compiled from C.
Under bash, you can create, edit, and use loadable modules compiled from C. Once loaded, they work like any other builtin! (You may find more information in the source tree. ;)
The current source tree (Oct 15 2021, bash V5.1-rc3) contains a bunch of samples:
accept listen for and accept a remote network connection on a given port
asort Sort arrays in-place
basename Return non-directory portion of pathname.
cat cat(1) replacement with no options - the way cat was intended.
csv process one line of csv data and populate an indexed array.
dirname Return directory portion of pathname.
fdflags Change the flag associated with one of bash's open file descriptors.
finfo Print file info.
head Copy first part of files.
hello Obligatory "Hello World" / sample loadable.
...
tee Duplicate standard input.
template Example template for loadable builtin.
truefalse True and false builtins.
tty Return terminal name.
uname Print system information.
unlink Remove a directory entry.
whoami Print out username of current user.
There is a fully working CSV parser ready to use in the examples/loadables directory: csv.c!
On a Debian GNU/Linux based system, you may have to install the bash-builtins package with
apt install bash-builtins
Using loadable bash-builtins:
Then:
enable -f /usr/lib/bash/csv csv
From there, you could use csv as a bash builtin.
With my sample: 12,22.45,"Hello, ""man"".","A, b.",42
csv -a myArray '12,22.45,"Hello, ""man"".","A, b.",42'
printf "%s\n" "${myArray[@]}" | cat -n
1 12
2 22.45
3 Hello, "man".
4 A, b.
5 42
Then in a loop, processing a file.
while IFS= read -r line;do
csv -a aVar "$line"
printf "First two columns are: [ '%s' - '%s' ]\n" "${aVar[0]}" "${aVar[1]}"
done <myfile.csv
This approach is clearly quicker and more robust than any other combination of bash builtins or forking to an external binary.
Unfortunately, depending on your system, if your version of bash was compiled without support for loadable builtins, this may not work...
Complete sample with multiline CSV fields.
Conforming to RFC 4180, a string like this single CSV row:
12,22.45,"Hello ""man"",
This is a good day, today!","A, b.",42
should be split as
1 12
2 22.45
3 Hello "man",
This is a good day, today!
4 A, b.
5 42
Full sample script for parsing CSV containing multilines fields
Here is a small sample file with 1 header line, 4 columns and 3 rows. Because two fields contain a newline, the file is 6 lines long.
Id,Name,Desc,Value
1234,Cpt1023,"Energy counter",34213
2343,Sns2123,"Temperatur sensor
to trigg for alarm",48.4
42,Eye1412,"Solar sensor ""Day /
Night""",12199.21
And a small script able to parse this file correctly:
#!/bin/bash
enable -f /usr/lib/bash/csv csv
file="sample.csv"
exec {FD}<"$file"
read -ru $FD line
csv -a headline "$line"
printf -v fieldfmt '%-8s: "%%q"\\n' "${headline[@]}"
numcols=${#headline[@]}
while read -ru $FD line;do
while csv -a row "$line" ; (( ${#row[@]} < numcols )) ;do
read -ru $FD sline || break
line+=$'\n'"$sline"
done
printf "$fieldfmt\\n" "${row[@]}"
done
This may render: (I've used printf "%q" to represent non-printable characters like newlines as $'\n')
Id : "1234"
Name : "Cpt1023"
Desc : "Energy\ counter"
Value : "34213"
Id : "2343"
Name : "Sns2123"
Desc : "$'Temperatur sensor\nto trigg for alarm'"
Value : "48.4"
Id : "42"
Name : "Eye1412"
Desc : "$'Solar sensor "Day /\nNight"'"
Value : "12199.21"
You could find a full working sample there: csvsample.sh.txt or
csvsample.sh.
Note:
In this sample, I use the header line to determine the row width (number of columns). If your header line could contain newlines (or if your CSV uses more than one header line), you will have to pass the number of columns as an argument to your script (and the number of header lines).
Warning:
Of course, parsing CSV this way is not perfect! It works for many simple CSV files, but take care with encoding and security! For example, this module won't be able to handle binary fields!
Read the csv.c source code comments and RFC 4180 carefully!
From the man page:
-d delim
The first character of delim is used to terminate the input line,
rather than newline.
You are using -d, which will terminate the input line on the comma. It will not read the rest of the line. That's why $y is empty.
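The difference between the two options can be sketched side by side. The variable names p and q are assumptions introduced only to keep the two cases apart.

```bash
#!/bin/bash
# -d, ends the "line" at the first comma, so only "a" is read. That text
# is then split on whitespace: x gets "a" and nothing is left for y.
read -r -d, x y <<< 'a,b,'
echo "x=$x y=$y"

# IFS=, keeps newline as the line terminator and splits the line on commas.
IFS=, read -r p q <<< 'a,b'
echo "p=$p q=$q"
```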
We can parse CSV files with quoted strings, delimited by, say, |, with the following code:
while read -r line
do
field1=$(echo "$line" | awk -F'|' '{printf "%s", $1}' | tr -d '"')
field2=$(echo "$line" | awk -F'|' '{printf "%s", $2}' | tr -d '"')
echo "$field1 $field2"
done < "$csvFile"
awk parses the string fields to variables and tr removes the quote.
Slightly slower as awk is executed for each field.
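If the per-field awk calls are too slow, the same result can probably be had with builtins only. This sketch assumes, like the code above, that the quotes never enclose a | character; the temporary file, its contents, and the names _rest and out are hypothetical.

```bash
#!/bin/bash
# Hypothetical pipe-delimited file with quoted fields.
csvFile=$(mktemp)
printf '%s\n' '"abc"|"x y"|"extra"' '"def"|"z"' > "$csvFile"

out=()
# _rest collects any columns beyond the second so field2 stays clean.
while IFS='|' read -r field1 field2 _rest; do
    field1=${field1//\"/}    # remove every double quote, like tr -d '"'
    field2=${field2//\"/}
    out+=("$field1 $field2")
done < "$csvFile"
rm -f "$csvFile"

printf '%s\n' "${out[@]}"
```

No external process is started per line, since both the splitting and the quote removal are done by the shell itself.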
In addition to the answer from @Dennis Williamson, it may be helpful to skip the first line when it contains the header of the CSV:
{
read
while IFS=, read -r col1 col2
do
echo "I got:$col1|$col2"
done
} < myfile.csv
If you want to read a CSV file while skipping its first (header) line, this is one solution:
i=1
while IFS=, read -ra line
do
test "$i" -eq 1 && ((i=i+1)) && continue
for col_val in "${line[@]}"
do
echo -n "$col_val|"
done
echo
done < "$csvFile"

Arithmetic operations using numbers from grep

I have FILE from which I can extract two numbers using grep. The numbers appear in the last column.
$ grep number FILE
number1: 123
number2: 456
I would like to assign the numbers to variables, e.g. $num1 and $num2, and do some arithmetic operations using the variables.
How can I do this using bash commands?
Assumptions:
we want to match on lines that start with the string number
we will always find 2 matches for ^number from the input file
not interested in storing values in an array
Sample data:
$ cat file.dat
number1: 123
not a number: abc
number: 456
We'll use awk to find the desired values and print all to a single line of output:
$ awk '/^number/ { printf "%s ",$2 }' file.dat
123 456
From here we can use read to load the variables:
$ read -r num1 num2 < <(awk '/^number/ { printf "%s ",$2 }' file.dat)
$ typeset -p num1 num2
declare -- num1="123"
declare -- num2="456"
$ echo ".${num1}.${num2}."
.123.456.
NOTE: periods added as visual delimiters
Firstly, you need to extract the numbers from the file. Assuming that the file is always in the format stated, you can use a while loop combined with the read command to read the numbers into a named variable, one row at a time.
You can then use the $(( )) operator to perform integer arithmetic to keep a running total of the incoming numbers.
For example:
#!/bin/bash
declare -i total=0 # -i declares an integer.
while read discard number; do # read returns false at EOF. discard is ignored.
total=$((total+number)) # Variables don't need '$' prefix in this case.
done < FILE # while loop passes STDIN to the 'read' command.
echo "Total is: ${total}"
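To get the two named variables the question asks for, the grep output can be read directly. This is a sketch assuming exactly two matching lines; the temporary file, its extra non-matching line, and the name _label are hypothetical.

```bash
#!/bin/bash
# Hypothetical stand-in for FILE, with one non-matching line added.
file=$(mktemp)
printf '%s\n' 'number1: 123' 'other: 1' 'number2: 456' > "$file"

# Each read consumes one line of grep output; the label goes to _label
# and the value in the last column goes to num1 or num2.
{ read -r _label num1; read -r _label num2; } < <(grep number "$file")
rm -f "$file"

sum=$((num1 + num2))
product=$((num1 * num2))
echo "sum=$sum product=$product"
```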

Generate a column for each file matching a glob

I'm having difficulties with something that sounds relatively simple. I have a few data files with single values in them as shown below:
data1.txt:
100
data2.txt
200
data3.txt
300
I have another file called header.txt; it's a template file that contains the header, as shown below:
Data_1 Data2 Data3
- - -
I'm trying to add the data from the data*.txt files to the last line of Master.txt
The desired output would be something like this:
Data_1 Data2 Data3
- - -
100 200 300
I'm actively working this so I'm not sure where to begin. This doesn't need to be implemented in pure shell -- use of standard UNIX tools such as awk or sed is entirely reasonable.
paste is the key tool:
#!/bin/bash
exec >>Master.txt
cat header.txt
paste $'-d\n' data1.txt data2.txt data3.txt |
while read line1
do
read line2
read line3
printf '%-10s %-10s %-10s\n' "$line1" "$line2" "$line3"
done
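Since each data file in the question holds a single value, paste with its default tab delimiter may already be enough to build the row. The sketch below assumes that one-value-per-file layout and uses a temporary directory in place of the real files.

```bash
#!/bin/bash
# Hypothetical copies of the three data files in a scratch directory.
dir=$(mktemp -d)
echo 100 > "$dir/data1.txt"
echo 200 > "$dir/data2.txt"
echo 300 > "$dir/data3.txt"

# paste joins line N of every file with tabs; with one line per file
# this produces exactly one row.
row=$(paste "$dir/data1.txt" "$dir/data2.txt" "$dir/data3.txt")
rm -rf "$dir"

printf '%s\n' "$row"
```

The columns come out tab-separated rather than padded to a fixed width, so the printf formatting above is still needed if alignment matters.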
As a native-bash implementation:
#!/usr/bin/env bash
case $BASH_VERSION in ''|[123].*) echo "ERROR: Bash 4.0+ needed" >&2; exit 1;; esac
declare -A keys=( ) # define an associative array (a string->string map)
for f in data*.txt; do # iterate over data*.txt files
name=${f%.txt} # for each, remove the ".txt" extension to get our name...
keys[${name^}]=$(<"$f") # capitalize the first letter, and read the file to get the value
done
{ # start a group so we can redirect output just once
printf '%s\t' "${!keys[@]}"; echo # first line: keys in our associative array
printf '%s\t' "${keys[@]//*/-}"; echo # second line: convert values to dashes
printf '%s\t' "${keys[@]}"; echo # third line: print the values unmodified
} >>Master.txt # all the above with output redirected to Master.txt
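The parameter expansions used in the script can be tried in isolation. This sketch assumes Bash 4.0 or later, as the script does; the sample values and the helper names cap, dashes, and names are hypothetical.

```bash
#!/bin/bash
f=data1.txt
name=${f%.txt}      # strip the ".txt" suffix
cap=${name^}        # upper-case the first character
echo "$name $cap"

declare -A keys=( [Data1]=100 [Data2]=200 )
dashes=( "${keys[@]//*/-}" )   # every value replaced by a single dash
names=( "${!keys[@]}" )        # the keys (iteration order is unspecified)

echo "${#names[@]} keys, dashes: ${dashes[*]}"
```

Because the iteration order of an associative array is unspecified, the columns written by the script are not guaranteed to come out in data1, data2, data3 order.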
Most of the magic here is performed by parameter expansions:
${f%.txt} trims the .txt extension from the end of $f
${name^} capitalizes the first letter of $name
"${keys[@]}" expands to all values in the array named keys
"${keys[@]//*/-}" replaces * (everything) in each value with the fixed string -.
"${!keys[@]}" expands to the names of entries in the associative array keys.

Read multi variable csv bash build multi line file from it

I had what I thought was a simple concept which I could easily do as I did something similar.
I have an input file input.csv
1a,1b
2a,2b
I would like the following output
Output file 1
This is variable 1 named 1a ok
This is variable 2 named 1b ok
Output file 2
This is variable 1 named 2a ok
This is variable 2 named 2b ok
I thought I could do something similar to below
i=1
while IFS=, read var1 var2; do
echo This is variable 1 named "var1" > filenamei
echo This is variable 2 named "var2" >> filenamei
i=i+1
done </inputfile.csv
I previously wrote code to take a single variable from a long file and write output to a single file and it worked fine. Like below
Input file
a
b
Single output file
This is A
This is B
Script was
while read p;do
echo this is "$p" >>output file
done < input file
Been through lots of different errors but getting nowhere.
This is easy with a double loop: the outer loop iterates over the lines and the inner one over the comma-separated fields. How about:
#!/bin/bash
i=1
while read -r line; do
ifs_back="$IFS"
IFS=","
set -- $line
for ((j=1; j<=$#; j++)); do
echo This is variable "$j" named "${!j}" >> "filename${i}"
done
IFS="$ifs_back"
i=$((i+1))
done < "inputfile.csv"
Explanations:
In order to split the input line with commas, we temporarily set IFS to "," then assign the fields to positional parameters $1, $2.
The loop counter j for the inner loop starts at 1 and ends at $#, the number of fields.
We can access the value of the positional parameter via ${!j}.
To clean up after the inner loop, we restore IFS and increment i for the next line.
The code above is flexible with respect to the number of lines and fields, so it would work with the input:
1a,1b
2a,2b
3a,3b
as well as with:
1a,1b,1c
2a,2b,2c
3a,3b,3c
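A variation on the same double loop is sketched below. It sets IFS for the read command only, so nothing has to be saved and restored, and it keeps the fields in an array instead of the positional parameters. The temporary input file, the output directory, and the array name fields are assumptions; the trailing "ok" follows the output requested in the question.

```bash
#!/bin/bash
# Hypothetical input and a scratch directory for the output files.
input=$(mktemp)
outdir=$(mktemp -d)
printf '%s\n' '1a,1b' '2a,2b' > "$input"

i=1
# IFS=, applies to this read only; -a stores the fields in an array.
while IFS=, read -r -a fields; do
    for j in "${!fields[@]}"; do
        # Array indices start at 0, so add 1 for the printed number.
        echo "This is variable $((j + 1)) named ${fields[j]} ok"
    done > "$outdir/filename${i}"
    i=$((i + 1))
done < "$input"
rm -f "$input"

cat "$outdir/filename1" "$outdir/filename2"
```

Redirecting the whole inner loop with > also means a rerun overwrites each output file instead of appending to it.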
Hope this helps.
