IFS not parsing CSV correctly - bash

I am trying to parse a file so I can obtain the first column. The command I'm using is:
while IFS=',' read -r a; do echo "$a"; done < test.csv
However it is still outputting the whole csv instead of the first column. An example of the csv is as follows:
NOM,CODI,DATA,SEXE,GRUP_EDAT,RESIDENCIA,CASOS_CONFIRMAT,PCR,INGRESSOS_TOTAL,INGRESSOS_CRITIC,INGRESSATS_TOTAL,INGRESSATS_CRITIC,EXITUS
MOIANÈS,42,24/08/2020,Home,Majors de 74,No,0,2,0,0,0,0,0
ALT CAMP,01,30/07/2020,Dona,Entre 15 i 64,Si,0,0,0,0,0,0,0
ALT CAMP,01,30/07/2020,Dona,Entre 65 i 74,No,0,1,0,0,0,0,0
ALT CAMP,01,30/07/2020,Dona,Entre 65 i 74,Si,0,0,0,0,0,0,0
I've been looking elsewhere and all sources seem to agree that this should be the correct approach to parsing CSV with IFS. One thing I've noticed is that if I add another variable to the read command, say b, it outputs the first column instead of everything.
while IFS=',' read -r a b; do echo "$a"; done < test.csv
I don't understand this behaviour, and it does not seem to work beyond printing the first column. For example, if I were to add c and echo $c, it wouldn't print the third column, and so on.
Can you please explain this behaviour and why this is happening?
Thank you

read is working correctly. It splits on IFS and assigns each field to a variable, with the remainder of the line going to the last variable. If you only give one variable, the whole line goes to it.
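A quick demonstration of this, a minimal sketch using one line of your sample data:

$ IFS=',' read -r a <<< 'ALT CAMP,01,30/07/2020'
$ echo "$a"
ALT CAMP,01,30/07/2020
$ IFS=',' read -r a b <<< 'ALT CAMP,01,30/07/2020'
$ echo "$a"
ALT CAMP

With a single variable, the whole line ends up in a; with two, a gets only the first field and b takes the remainder.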

bash is not the right tool for parsing a CSV file; you should consider awk for this. E.g., to print the first 2 columns, use this super simple awk command:
awk -F, '{print $1, $2}' file.csv
Just to highlight your issue with the bash loop: it is better to use an array, reading all comma-separated columns into it:
while IFS=, read -ra arr; do
    # print first 2 columns
    echo "col1=${arr[0]}, col2=${arr[1]}"
done < file.csv

For simple CSV files you can just split on every comma, but you will want to read the input into an array unless you know the number of columns in every row.
For example, if you know there are going to be (at most) 10 columns, you can use
while IFS=, read -r f1 f2 f3 f4 f5 f6 f7 f8 f9 f10; do
However, in bash it is simpler to read the entire split line into a single array:
while IFS=, read -ra f; do
The first field would be "${f[0]}", the second "${f[1]}", etc.
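Putting that together for the original question, a minimal sketch that prints only the first column of test.csv:

while IFS=, read -ra f; do
    echo "${f[0]}"
done < test.csv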

Related

How to loop a variable range in cut command

I have a file with 2 columns, and I want to use the values from the second column to set the range in the cut command to select a range of characters from another file. The range I desire is the character at the position given by the value in the second column plus the next 10 characters. I will give an example below.
My files are something like that:
File with 2 columns and no blank lines between lines (file1.txt):
NAME1 10
NAME2 25
NAME3 48
NAME4 66
File from which I want to extract the variable range of characters (just one very long line with no spaces) (file2.txt):
GATCGAGCGGGATTCTTTTTTTTTAGGCGAGTCAGCTAGCATCAGCTACGAGAGGCGAGGGCGGGCTATCACGACTACGACTACGACTACAGCATCAGCATCAGCGCACTAGAGCGAGGCTAGCTAGCTACGACTACGATCAGCATCGCACATCGACTACGATCAGCATCAGCTACGCATCGAAGAGAGAGC
Desired resulting file, one sequence per line (result.txt):
GATTCTTTTT
GGCGAGTCAG
CGAGAGGCGA
TATCACGACT
The resulting file would have the characters from 10-20, 25-35, 48-58 and 66-76, each range in a new line. So, it would always keep the range of 10, but in different start points and those start points are set by the values in the second column from the first file.
I tried the command:
for i in $(awk '{print $2}' file1.txt);
do
p1=$i;
p2=`expr "$1" + 10`
cut -c$p1-$2 file2.txt > result.txt;
done
I don't get any output or error message.
I also tried:
while read line; do
set $line
p2=`expr "$2" + 10`
cut -c$2-$p2 file2.txt > result.txt;
done <file1.txt
This last command gives me an error message:
cut: invalid range with no endpoint: -
Try 'cut --help' for more information.
expr: non-integer argument
There's no need for cut here; dd can do the job of indexing into a file and reading only the number of bytes you want. (Note that status=none is a GNUism; on other platforms you may need to leave it out and redirect stderr instead if you want to suppress informational logging.)
while read -r name index _; do
    dd if=file2.txt bs=1 skip="$index" count=10 status=none
    printf '\n'
done <file1.txt >result.txt
This approach avoids excessive memory requirements (as present when reading the whole of file2 -- assuming it's large), and has bounded performance requirements (overhead is equal to starting one copy of dd per sequence to extract).
Using awk
$ awk 'FNR==NR{a=$0; next} {print substr(a,$2+1,10)}' file2 file1
GATTCTTTTT
GGCGAGTCAG
CGAGAGGCGA
TATCACGACT
If file2.txt is not too large, then you can read it in memory,
and use Bash sub-strings to extract the desired ranges:
data=$(<file2.txt)
while read -r name index _; do
    echo "${data:$index:10}"
done <file1.txt >result.txt
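With the sample files above, the first iteration (index 10) resolves to characters 11 through 20:

$ data=$(<file2.txt)
$ echo "${data:10:10}"
GATTCTTTTT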
This will be much more efficient than running cut or another process for every single range definition.
(Thanks to @CharlesDuffy for the tip to read data without a useless cat, and the while loop.)
One way to solve it:
#!/bin/bash
while read line; do
    pos=$(echo "$line" | cut -f2 -d' ')
    x=$(head -c $(( $pos + 10 )) file2.txt | tail -c 10)
    echo "$x"
done < file1.txt > result.txt
It's not the solution an experienced bash hacker would use, but it is very good for someone who is new to bash. It uses tools that are very versatile, although a poor fit if you need high performance. Shell scripting is commonly done by people who rarely write shell scripts but know a few commands and just want to get the job done. That's why I'm including this solution, even if the other answers are superior for more experienced people.
The first line of the loop body is pretty easy: it just extracts the number from file1.txt. The second line uses the very nice tools head and tail. Usually they are used with lines, but with -c they work on characters: I print the first pos + 10 characters with head, and the result is piped into tail, which prints the last 10 characters.
Thanks to @CharlesDuffy for improvements.
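For instance, with pos=10 from the first line of file1.txt, the pipeline works out to:

$ head -c 20 file2.txt | tail -c 10
GATTCTTTTT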

Scripting username creation from text file?

I'm really new at Bash and scripting in general.
I have to create usernames formed of first letter of first name followed by last name. To do it, I use a provided text file that looks like this:
doe,john
smith,mike
...
I declared the following variables:
fname=$(cut -d, -f2 "file.txt" | cut -c1)
lname=$(cut -d, -f1 "file.txt")
But how do I put the elements together to form the names jdoe and msmith? I tried the methods I know to concatenate strings and variables, but nothing works.
I think I found a method using awk that is supposed to work, but is there any other way to "concatenate" the elements of 2 lists?
Thank you
There are a million ways to do it; this is the simplest:
$ awk -F, '{print substr($2,1,1) $1}' file
jdoe
msmith
Ed Morton's awk-based answer is simplest (and probably fastest), but since you asked for a different solution:
#!/usr/bin/env bash
while IFS=, read -r last first _; do
    username=${first:0:1}${last}
    echo "username: $username"
done < file.txt
IFS=, read -r last first _ reads the first 2 ,-separated fields from each input line (_ is a dummy variable that receives the rest of the input line, if any; -r prevents interpretation of \ chars. in the input, which is usually what you want).
username=${first:0:1}${last} concatenates the 1st char. of variable $first's value with variable $last's value, simply by placing the two variable references next to each other.
${first:0:1} - extract 1 character from $first at position 0 - is an example of parameter expansion, specifically: substring expansion
< file.txt is an input redirection that sends file.txt's contents via stdin to the while loop.
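A quick interactive check of the substring expansion:

$ first=john last=doe
$ echo "${first:0:1}${last}"
jdoe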
This looks a bit too much like homework, so I'll just drop some hints.
To read the lastname and firstname into separate variables for each line of the file, see BashFAQ 1. It should not involve cut.
To grab the first character of a variable, see BashFAQ 100.

First line of text file not read in shell

Okay, so I've put together this code to read a text file; however, it successfully finds the sum of every column needed except for the first line! Hence it gives me the wrong summation, which excludes the value on the first line it reads in. It sets the value $line = ddsdfj:jdskf:1:fjf but never extracts the 1 from the first line. Any clues would be appreciated.
FILE=$1
while read line
do
    awk -F: '{summation += $3;}END{print summation;}'
done < $FILE
The while loop is completely superfluous. It looks like what you want is
awk -F: '{s+=$3}END{print s}' "$1"
quite simply.
The code you had would read the first line with read, then the other lines as standard input to awk; hence, the behavior you were observing. Something like
while read line; do
    awk -F: '{s+=$3}END{print s}' <<<"$line"
done <"$1"
would have actually used the value from line for something, but of course, that would just extract the third field from each line individually, rather than performing any actual addition of values across lines.
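A minimal reproduction (using a hypothetical nums.txt) makes the original problem visible:

$ printf 'a:b:1\nc:d:2\ne:f:3\n' > nums.txt
$ while read line; do awk -F: '{s+=$3}END{print s}'; done < nums.txt
5

read consumes the first line (a:b:1), awk then sums only the remaining lines (2+3), and the 1 is lost, which is exactly the symptom described.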

awk to write different columns from different lines into single line of output file?

I am using a while-do loop to read in from a file that contains a list of hostnames, run a command against each host, and write specific data from the results into a second file. I need the output to be from line 33 column 3 and line 224 column 7, output to a single line in the second file. I can do it for either one or the other, but I'm having trouble getting it to work for both. Example:
while read i; do
    /usr/openv/netbackup/bin/admincmd/bpgetconfig -M $i |\
        awk -v j=33 -v k=3 'FNR == j {print $k}' > /tmp/clientversion.txt
done < /tmp/clientlist.txt
Any hints or help is greatly appreciated!
You could use something like this:
awk 'NR==33{a=$3}NR==224{print a,$7}'
This saves the value in the third column of line 33 to the variable a, then prints it out along with the seventh column of line 224.
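You can sanity-check that pattern with generated input (not the real bpgetconfig output):

$ seq 1000 | awk 'NR==33{a=$0} NR==224{print a, $0}'
33 224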
However, you're currently overwriting the file /tmp/clientversion.txt on every iteration of the while loop. Assuming you want the file to contain all of the output once the loop has run, you should move the redirection outside the loop:
while read -r i; do
    /usr/openv/netbackup/bin/admincmd/bpgetconfig -M $i |\
        awk 'NR==33{a=$3}NR==224{print a,$7}'
done < /tmp/clientlist.txt > /tmp/clientversion.txt
As a bonus, I have added the -r switch to read, which stops backslashes in the input from being treated as escape characters. Depending on the contents of your input file, you might also want to use double quotes around "$i" as well.

Replacing a column in CSV file with another in bash

I have a csv file with a number of columns. I am trying to replace the second column with the second to last column from the same file.
For example, if I have a file, sample.csv
1,2,3,4,5,6
a,b,c,d,e,f
g,h,i,j,k,l
I want to output:
1,5,3,4,5,6
a,e,c,d,e,f
g,k,i,j,k,l
Can anyone help me with this task? Also note that I will be discarding the last two columns afterwards with the cut function so I am open to separating the csv file to begin with so that I can replace the column in one csv file with another column from another csv file. Whichever is easier to implement. Thanks in advance for any help.
How about this simpler awk:
awk 'BEGIN{FS=OFS=","} {$2=$(NF-1)}'1 sample.csv
EDIT: Noticed that you also want to discard last 2 columns. Use this awk one-liner:
awk 'BEGIN{FS=OFS=","} {$2=$(NF-1); NF=NF-2}'1 sample.csv
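With the sample.csv above, this prints (note that decrementing NF works in GNU awk and most modern awks, though POSIX does not guarantee it):

$ awk 'BEGIN{FS=OFS=","} {$2=$(NF-1); NF=NF-2}'1 sample.csv
1,5,3,4
a,e,c,d
g,k,i,j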
In bash
while IFS=, read -r -a arr; do
    arr[1]="${arr[4]}"
    printf -v output "%s," "${arr[@]}"
    printf "%s\n" "${output%,}"
done < sample.csv
Pure bash solution, using IFS in a funny way:
# Set the IFS globally; you'll see why it's funny
IFS=,
while read -ra a; do
    a[1]=${a[@]: -2:1}
    echo "${a[*]}"
done < file.csv
The globally set IFS variable is used twice: once in the read statement, so that each field is split on a comma, and once in the line echo "${a[*]}", where "${a[*]}" expands to the fields of the array a separated by IFS... which is a comma!
Another special thing: you mentioned the second-to-last field, and that's exactly what ${a[@]: -2:1} expands to (mind the space between : and -2), so you don't have to count your number of fields.
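Both expansions are easy to check interactively:

$ IFS=,
$ a=(1 2 3 4 5 6)
$ echo "${a[@]: -2:1}"
5
$ echo "${a[*]}"
1,2,3,4,5,6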
Caveat: CSV files really need a proper CSV parser, which is difficult to implement in pure bash. This answer (and, I guess, all the other answers that don't use a genuine CSV parser) might break if a field contains a comma, e.g.,
1,2,3,4,"a field, with a comma",5
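For example, naive comma splitting miscounts the fields of that line:

$ IFS=, read -ra a <<< '1,2,3,4,"a field, with a comma",5'
$ echo "${#a[@]}"
7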
If you want to discard the last two columns, don't use cut; use this instead:
IFS=,
while read -ra a; do
    (( ${#a[@]} >= 2 )) || continue  # skip lines with fewer than two fields
    a[1]=${a[@]: -2:1}
    echo "${a[*]::${#a[@]}-2}"
done < file.csv
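Assuming file.csv contains the sample rows from the question, the loop prints:

1,5,3,4
a,e,c,d
g,k,i,j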
