How to read a single column CSV file in bash? - bash

I am relatively new to bash/programming in general.
I have a single column CSV that looks like this:
domain1.com
domain2.com
domain3.com
domain4.com
I want to run through each entry and do something with it. Here is my code:
foo(){
i=0
while read -a line;
do
echo ${line[i]}
((i++))
done < myfile.csv
}
And nothing happens. I have figured out that if I change the file I'm pointing at to:
done< <(grep '' myfile.csv)
it will work, but only spit out the very last line of the CSV, like this:
domain4.com
Again, I am a beginner and teaching myself this stuff, so any explanations you want to give with your answers would be GREATLY appreciated!
EDIT So it appears that my new problem is removing the ^M character from my CSV file. Once I figure out how to do this, I will mark the answer here that works for me.
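For the ^M problem mentioned in the edit: those are Windows-style carriage returns, and one common way to strip them before reading is with tr. A minimal sketch (the sample file is written inline here so the example is self-contained, standing in for your real myfile.csv):

```shell
#!/bin/bash
# Sample input with Windows line endings (stands in for myfile.csv)
printf 'domain1.com\r\ndomain2.com\r\n' > myfile.csv

# tr -d '\r' deletes every carriage return before the loop sees the stream
while IFS= read -r line; do
    echo "Got: $line"
done < <(tr -d '\r' < myfile.csv)
```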

If you want to store your lines on an array you'd simply do:
readarray -t lines < file
And, if you want to try processing those lines you can have something like
for line in "${lines[@]}"; do
echo "$line"
done
Or by index (mind the !):
for i in "${!lines[@]}"; do
echo "${lines[i]}"
done
Indices start with 0.
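Putting both pieces together into one runnable sketch (the sample file is created inline to stand in for the single-column CSV from the question):

```shell
#!/bin/bash
# Sample single-column CSV standing in for the asker's myfile.csv
printf 'domain1.com\ndomain2.com\ndomain3.com\n' > file

# -t strips the trailing newline from each stored line
readarray -t lines < file

# walk the array by index
for i in "${!lines[@]}"; do
    echo "line $i: ${lines[i]}"
done
```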

while read p; do
echo $p
done < myfile.csv

Looks like you have 2 issues:
Your lines are all ending with \r
There is no new line or \r at the end of last line
To fix this issue use this script:
echo >> file.csv
while read -r line; do echo "$line"; done < <(tr '\r' '\n' < file.csv)

You can also simply read the file into the array with:
array=( `<file` )
If you have need to use numerical indexes, then you can access the elements with:
for ((i=0; i<${#array[@]}; i++)); do
printf " array [%2d]: %s\n" "$i" "${array[$i]}"
done

Related

choosing column name in .csv file

I'm really new to bash programming. I want to write the results of two variables into a .csv file. I use this command:
while IFS= read -r line; do
ip=$(dig +short $line)
echo "${line}, ${ip}" >> file.csv
done < domains
It works fine. It creates two columns in file.csv and writes the result of $line in the first column and the result of $ip in the 2nd column.
I wanted to know if there is a way to choose a name for these columns. For example
column1 : $line & column2:$ip
In CSV files column names are the contents of the first row, so (before your loop) you can write:
echo "Line,Ip" > file.csv.tmp # Write the header into a new temporary file
cat file.csv >> file.csv.tmp # Append all the data of the original file
rm file.csv # Remove the original file
mv file.csv.tmp file.csv # Rename the temporary file
Or you can also simply use this other method:
echo "Line,Ip
$(cat file.csv)" > file.csv
I hope it helps.
As helen pointed out in the comments, if the file should be overwritten with every run then you can simply add echo "Line,Ip" > file.csv before the loop.
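A minimal sketch of the header-first pattern; the dig lookup is replaced with a placeholder value here so the example runs without network access, and the domains file is created inline:

```shell
#!/bin/bash
# Sample domains file (stands in for the asker's input)
printf 'example.com\nexample.org\n' > domains

echo "Line,Ip" > file.csv            # header row goes in first
while IFS= read -r line; do
    ip="0.0.0.0"                     # placeholder for: ip=$(dig +short "$line")
    echo "${line},${ip}" >> file.csv
done < domains
```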

bash while loop "eats" my space characters

I am trying to parse a huge text file, say 200mb.
the text file contains some strings
123
1234
12345
12345
so my script looked like
while read line ; do
echo "$line"
done <textfile
however using this above method, my string " 12345" gets truncated to "12345"
I tried using
sed -n "$i"p textfile
but then the throughput drops from 27 to 0.2 lines per second, which is unacceptable ;-)
Any idea how to solve this?
You want to echo the lines without a fieldsep:
while IFS="" read line; do
echo "$line"
done <<< " 12345"
When you also want to skip interpretation of special characters, use
while IFS="" read -r line; do
echo "$line"
done <<< " 12345"
You can write the IFS without double quotes:
while IFS= read -r line; do
echo "$line"
done <<< " 12345"
This seems to be what you're looking for:
while IFS= read line; do
echo "$line"
done < textfile
The safest method is to use read -r rather than plain read, so that backslashes are not interpreted as escape characters (thanks Walter A):
while IFS= read -r line; do
echo "$line"
done < textfile
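To see what the empty IFS actually changes, compare the two reads side by side:

```shell
#!/bin/bash
# Default IFS: read strips leading/trailing whitespace from the line
read -r line <<< "   12345"
echo "[$line]"

# Empty IFS: the whitespace survives
IFS= read -r line <<< "   12345"
echo "[$line]"
```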
OPTION 1:
#!/bin/bash
# read whole file into array
readarray -t aMyArray < <(cat textfile)
# echo each line of the array
# this will preserve spaces
for i in "${aMyArray[@]}"; do echo "$i"; done
readarray -- read lines from standard input
-t -- omit trailing newline character
aMyArray -- name of array to store file in
< <() -- execute command; redirect stdout into array
cat textfile -- file you want to store in variable
for i in "${aMyArray[@]}" -- for every element in aMyArray
"" -- needed to maintain spaces in elements
${ [@]} -- reference all elements in array
do echo "$i"; -- for every iteration of "$i" echo it
"" -- to maintain variable spaces
$i -- equals each element of the array aMyArray as it cycles through
done -- close for loop
OPTION 2:
In order to accommodate your larger file you could do this to help alleviate the work and speed up the processing.
#!/bin/bash
sSearchFile=textfile
sSearchStrings="1|2|3|space"
while IFS= read -r line; do
echo "${line}"
done < <(egrep "${sSearchStrings}" "${sSearchFile}")
This will grep the file (faster) before it cycles it through the while command. Let me know how this works for you. Notice you can add multiple search strings to the $sSearchStrings variable.
OPTION 3:
and an all in one solution to have a text file with your search criteria and everything else combined...
#!/bin/bash
# identify file containing search strings
sStringsFile="searchstrings.file"
sSearchStrings=""
while IFS= read -r string; do
# if $sSearchStrings is empty, seed it with $string
[[ -z $sSearchStrings ]] && sSearchStrings="${string}" && continue
# otherwise append "|" and $string
sSearchStrings="${sSearchStrings}|${string}"
# read search criteria in from file
done <"${sStringsFile}"
# identify file to be searched
sSearchFile="text.file"
while IFS= read -r line; do
echo "${line}"
done < <(egrep "${sSearchStrings}" "${sSearchFile}")

how to read file from line x to the end of a file in bash

I would like know how I can read each line of a csv file from the second line to the end of file in a bash script.
I know how to read a file in bash:
while read line
do
echo -e "$line\n"
done < file.csv
But, I want to read the file starting from the second line to the end of the file. How can I achieve this?
tail -n +2 file.csv
From the man page:
-n, --lines=N
output the last N lines, instead of the last 10
...
If the first character of N (the number of bytes or lines) is a '+',
print beginning with the Nth item from the start of each file, other-
wise, print the last N items in the file.
In English this means that:
tail -n 100 prints the last 100 lines
tail -n +100 prints all lines starting from line 100
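A quick illustration of the difference between the two forms, using a throwaway sample file:

```shell
#!/bin/bash
# Tiny file to show the difference between -n N and -n +N
printf 'header\nrow1\nrow2\n' > sample.csv

tail -n 1 sample.csv     # last line only
tail -n +2 sample.csv    # everything from line 2 onward
```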
Simple solution with sed:
sed -n '2,$p' <thefile
where 2 is the number of line you wish to read from.
Or else (pure bash)...
{ for ((i=1;i--;)); do read; done; while read line; do echo "$line"; done; } < file.csv
Better written:
linesToSkip=1
{
for ((i=$linesToSkip;i--;)) ;do
read
done
while read line ;do
echo "$line"
done
} < file.csv
This works even if linesToSkip == 0 or linesToSkip exceeds the number of lines in file.csv.
Edit:
Changed () to {} as gniourf_gniourf suggested I consider: the first syntax spawns a sub-shell, while {} doesn't.
Of course, for skipping only one line (as in the original question's title), the loop for ((i=1;i--;)); do read; done can simply be replaced by a single read:
{ read; while read line; do echo "$line"; done; } < file.csv
There are many solutions to this. One of my favorite is:
(head -2 > /dev/null; whatever_you_want_to_do) < file.txt
You can also use tail to skip the lines you want:
tail -n +2 file.txt | whatever_you_want_to_do
Depending on what you want to do with your lines: if you want to store each selected line in an array, the best choice is definitely the builtin mapfile:
numberoflinestoskip=1
mapfile -s $numberoflinestoskip -t linesarray < file
will store each line of file file, starting from line 2, in the array linesarray.
help mapfile for more info.
If you don't want to store each line in an array, well, there are other very good answers.
As F. Hauri suggests in a comment, this is only applicable if you need to store the whole file in memory.
Otherwise, you best bet is:
{
read; # Just a scratch read to get rid (pun!) of the first line
while read line; do
echo "$line"
done
} < file.csv
Notice: there's no subshell involved/needed.
This will work
i=1
while read line
do
test $i -eq 1 && ((i=i+1)) && continue
echo -e "$line\n"
done < file.csv
I would just get a variable.
#!/bin/bash
i=0
while read line
do
if [ $i != 0 ]; then
echo -e $line
fi
i=$((i+1))
done < "file.csv"
UPDATE The above checks the $i variable on every line of the csv. So if you have a very large csv file with millions of lines, it will eat a significant amount of CPU cycles, no good for Mother Nature.
Following one liner can be used to delete the very first line of CSV file using sed and then output the remaining file to while loop.
sed 1d file.csv | while read d; do echo $d; done

Bash script get item from array

I'm trying to read file line by line in bash.
Every line has format as follows text|number.
I want to produce file with format as follows text,text,text etc. so new file would have just text from previous file separated by comma.
Here is what I've tried and couldn't get it to work :
FILENAME=$1
OLD_IFS=$IFS
IFS=$'\n'
i=0
for line in $(cat "$FILENAME"); do
array=(`echo $line | sed -e 's/|/,/g'`)
echo ${array[0]}
i=i+1;
done
IFS=$OLD_IFS
But this prints both text and number but in different format text number
here is sample input :
dsadadq-2321dsad-dasdas|4212
dsadadq-2321dsad-d22as|4322
here is sample output:
dsadadq-2321dsad-dasdas,dsadadq-2321dsad-d22as
What did I do wrong?
Not pure bash, but you could do this in awk:
awk -F'|' 'NR>1{printf(",")} {printf("%s",$1)}'
Alternately, in pure bash and without having to strip the final comma:
#!/bin/bash
# You can get your input from somewhere else if you like. Even stdin to the script.
input=$'dsadadq-2321dsad-dasdas|4212\ndsadadq-2321dsad-d22as|4322\n'
# Output should be reset to empty, for safety.
output=""
# Step through our input. (I don't know your column names.)
while IFS='|' read left right; do
# Only add a field if it exists. Salt to taste.
if [[ -n "$left" ]]; then
# Append data to output string
output="${output:+$output,}$left"
fi
done <<< "$input"
echo "$output"
No need for arrays and sed:
while IFS='' read line ; do
echo -n "${line%|*}",
done < "$FILENAME"
You just have to remove the last comma :-)
Using sed:
$ sed ':a;N;$!ba;s/|[0-9]*\n*/,/g;s/,$//' file
dsadadq-2321dsad-dasdas,dsadadq-2321dsad-d22as
Alternatively, here is a bit more readable sed with tr:
$ sed 's/|.*$/,/g' file | tr -d '\n' | sed 's/,$//'
dsadadq-2321dsad-dasdas,dsadadq-2321dsad-d22as
Choroba has the best answer (imho) except that it does not handle blank lines and it adds a trailing comma. Also, mucking with IFS is unnecessary.
This is a modification of his answer that solves those problems:
while read line ; do
if [ -n "$line" ]; then
if [ -n "$afterfirst" ]; then echo -n ,; fi
afterfirst=1
echo -n "${line%|*}"
fi
done < "$FILENAME"
The first if is just to filter out blank lines. The second if and the $afterfirst stuff prevent the extra comma: a comma is echoed before every entry except the first one. ${line%|*} is bash parameter expansion that deletes the end of a parameter if it matches some pattern. line is the parameter, % is the symbol that indicates a trailing pattern should be deleted, and |* is the pattern to delete.
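You can check that expansion in isolation with the sample data from the question:

```shell
#!/bin/bash
line='dsadadq-2321dsad-dasdas|4212'

# %|* deletes the shortest trailing match of the pattern |*,
# i.e. the pipe character and everything after it
echo "${line%|*}"
```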

How to concatenate all lines from a file in Bash? [duplicate]

I have a file csv :
data1,data2,data2
data3,data4,data5
data6,data7,data8
I want to convert it to (Contained in a variable):
variable=data1,data2,data2%0D%0Adata3,data4,data5%0D%0Adata6,data7,data8
My attempt :
data=''
cat csv | while read line
do
data="${data}%0D%0A${line}"
done
echo $data # Fails, since data remains empty (the pipeline runs the loop in a sub-shell and loses the data)
Please help..
Simpler to just strip newlines from the file:
tr -d '\n' < yourfile.txt > concatfile.txt
In bash,
data=$(
while read line
do
echo -n "%0D%0A${line}"
done < csv)
In non-bash shells, you can use `...` instead of $(...). Also, echo -n, which suppresses the newline, is unfortunately not completely portable, but again this will work in bash.
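If portability matters, the same loop can be written with printf, which behaves consistently where echo -n does not. A sketch with inline sample data (standing in for the question's csv file):

```shell
#!/bin/sh
# Sample input standing in for the question's csv
printf 'data1,data2,data2\ndata3,data4,data5\n' > csv

# printf '%s' emits its argument with no trailing newline, portably
data=$(
    while IFS= read -r line; do
        printf '%s' "%0D%0A${line}"
    done < csv
)
echo "$data"
```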
Some of these answers are incredibly complicated. How about this.
data="$(xargs printf ',%s' < csv | cut -b 2-)"
or
data="$(tr '\n' ',' < csv | cut -b 2-)"
Too "external utility" for you?
IFS=$'\n' read -d '' -r -a data < csv
Now you have an array! Output it however you like, perhaps with
data="$(tr ' ' , <<<"${data[@]}")"
Still too "external utility?" Well fine,
data="$(printf '%s' "${data[0]}" ; printf ',%s' "${data[@]:1}")"
Yes, printf can be a builtin. If it isn't but your echo is and it supports -n, use echo -n instead:
data="$(echo -n "${data[0]}" ; for d in "${data[@]:1}" ; do echo -n ,"$d" ; done)"
Okay, now I admit that I am getting a bit silly. Andrew's answer is perfectly correct.
I would much prefer a loop:
for line in $(cat file.txt); do echo -n $line; done
Note: This solution relies on word splitting, so any run of whitespace inside a line is collapsed, and glob characters in the file will be expanded.
Another short bash solution
variable=$(
RS=""
while read line; do
printf "%s%s" "$RS" "$line"
RS='%0D%0A'
done < filename
)
awk 'END { print r }
{ r = r ? r OFS $0 : $0 }
' OFS='%0D%0A' infile
With shell:
data=
while IFS= read -r; do
[ -n "$data" ] &&
data=$data%0D%0A$REPLY ||
data=$REPLY
done < infile
printf '%s\n' "$data"
Recent bash versions:
data=
while IFS= read -r; do
[[ -n $data ]] &&
data+=%0D%0A$REPLY ||
data=$REPLY
done < infile
printf '%s\n' "$data"
A very simple single-line solution which requires no extra files as its quite easy to understand (I think, just cat the file together and perform sed-replace):
output=$(echo $(cat ./myFile.txt) | sed 's/ /%0D%0A/g')
Useless use of cat, punished! You want to feed the CSV into the loop
while read line; do
# ...
done < csv
