bash while loop "eats" my space characters - bash

I am trying to parse a huge text file, say 200 MB. The text file contains some strings:
123
1234
 12345
  12345
So my script looked like:
while read line ; do
echo "$line"
done <textfile
However, using the above method, my string " 12345" gets truncated to "12345".
I tried using
sed -n "$i"p textfile
but then the throughput is reduced from 27 to 0.2 lines per second, which is unacceptable ;-)
Any idea how to solve this?

You want to read the lines without a field separator, so the leading spaces are preserved:
while IFS="" read line; do
echo "$line"
done <<< " 12345"
When you also want to skip interpretation of backslash escapes, use read -r:
while IFS="" read -r line; do
echo "$line"
done <<< " 12345"
You can write the IFS without double quotes:
while IFS= read -r line; do
echo "$line"
done <<< " 12345"

This seems to be what you're looking for:
while IFS= read line; do
echo "$line"
done < textfile
The safest method is to use read -r instead of plain read, since -r skips interpretation of backslash escapes (thanks Walter A):
while IFS= read -r line; do
echo "$line"
done < textfile
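If the file can also contain lines that look like echo options (such as a bare -n) or backslash sequences, printf is a more robust way to write the lines back out; this is a general substitution, not something the question requires:
while IFS= read -r line; do
printf '%s\n' "$line"
done < textfile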

OPTION 1:
#!/bin/bash
# read whole file into array
readarray -t aMyArray < <(cat textfile)
# echo each line of the array
# this will preserve spaces
for i in "${aMyArray[#]}"; do echo "$i"; done
readarray -- read lines from standard input
-t -- omit trailing newline character
aMyArray -- name of array to store file in
< <() -- execute command; redirect stdout into array
cat textfile -- file you want to store in variable
for i in "${aMyArray[@]}" -- for every element in aMyArray
"" -- needed to maintain spaces in elements
${ [@]} -- reference all elements in array
do echo "$i"; -- for every iteration of "$i" echo it
"" -- to maintain variable spaces
$i -- equals each element of the array aMyArray as it cycles through
done -- close for loop
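As an aside, the process substitution around cat is not strictly needed here; reading the file directly gives the same result:
readarray -t aMyArray < textfile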
OPTION 2:
To accommodate your larger file, you could do the following to cut down on the work and speed up processing.
#!/bin/bash
sSearchFile=textfile
sSearchStrings="1|2|3|space"
while IFS= read -r line; do
echo "${line}"
done < <(egrep "${sSearchStrings}" "${sSearchFile}")
This greps the file first (which is faster) before cycling the matches through the while loop. Let me know how this works for you. Notice that you can add multiple search strings to the $sSearchStrings variable, as in the example below.
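For example, with a hypothetical set of patterns (not taken from the question), the variable could look like this, and egrep would then pass through any line matching at least one of the alternatives:
sSearchStrings="error|warning|critical"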
OPTION 3:
And an all-in-one solution: keep your search criteria in a text file, then combine everything as follows...
#!/bin/bash
# identify file containing search strings
sSearchStrings="searchstrings.file"
while IFS= read -r string; do
# if $sSearchStrings empty read in strings
[[ -z $sSearchStrings ]] && sSearchStrings="${string}"
# if $sSearchStrings not empty read in $sSearchStrings "|" $string
[[ ! -z $sSearchStrings ]] && sSearchStrings="${sSearchStrings}|${string}"
# read search criteria in from file
done <"${sSearchStrings}"
# identify file to be searched
sSearchFile="text.file"
while IFS= read -r line; do
echo "${line}"
done < <(egrep "${sSearchStrings}" "${sSearchFile}")

Related

Loop through table and parse multiple arguments to scripts in Bash

I am in a situation similar to this one and am having difficulty implementing this kind of solution for my case.
I have file.tsv formatted as follows:
x y
dog woof
CAT meow
loud_goose honk-honk
duck quack
with a fixed number of columns (but a variable number of rows), and I need to loop over those pairs of values, skipping the first (header) row, in a script like the following (pseudocode):
for elements in list; do
./script1 elements[1] elements[2]
./script2 elements[1] elements[2]
done
so that script* can take the arguments from the pair and run with it.
Is there a way to do it in Bash?
I was thinking I could do something like this:
list1={`awk 'NR > 1{print $1}' file.tsv`}
list2={`awk 'NR > 1{print $2}' file.tsv`}
and then to call them in the loop based on their position, but I am not sure on how.
Thanks!
Shell arrays are not multi-dimensional, so an array element cannot store the two arguments for your scripts. However, since you are processing lines from file.tsv, you can iterate over each line, reading both elements at once like this:
#!/usr/bin/env sh
# Populate tab with a literal tab character; command substitution
# only strips trailing newlines, so the tab is preserved
tab="$(printf '\t')"
{
# Read first line in dummy variable _ to skip header
read -r _
# Iterate reading tab delimited x and y from each line
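# The || [ -n "$x" ] below keeps the last line even if the file has no final newline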
while IFS="$tab" read -r x y || [ -n "$x" ]; do
./script1 "$x" "$y"
./script2 "$x" "$y"
done
} < file.tsv # from this file
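For a quick dry run, script1 and script2 can be simple stand-ins (hypothetical stubs, not part of the question) that just print the pair they receive:
#!/usr/bin/env sh
# save as ./script1 (and a copy as ./script2), then: chmod +x script1 script2
printf '%s called with x=%s y=%s\n' "$0" "$1" "$2"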
You could try just a while + read loop with the -a flag and IFS.
#!/usr/bin/env bash
while IFS=$' \t' read -ra line; do
echo ./script1 "${line[0]}" "${line[1]}"
echo ./script2 "${line[0]}" "${line[1]}"
done < <(tail -n +2 file.tsv)
Or without the tail
#!/usr/bin/env bash
skip=0 start=-1
while IFS=$' \t' read -ra line; do
if ((start++ >= skip)); then
echo ./script1 "${line[0]}" "${line[1]}"
echo ./script2 "${line[0]}" "${line[1]}"
fi
done < file.tsv
Remove the echos once you're satisfied with the output.
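With the sample file.tsv above, either dry run should print something like:
./script1 dog woof
./script2 dog woof
./script1 CAT meow
./script2 CAT meow
./script1 loud_goose honk-honk
./script2 loud_goose honk-honk
./script1 duck quack
./script2 duck quack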

Substitute a variable in a line read from a file

I have read the config file which has the below variable:
export BASE_DIR="\usr\usr1"
In the same script I read a file line by line and I wanted to substitute the ${BASE_DIR} with \usr\usr1.
In the script:
while read line; do
echo $line
done <file.txt
${BASE_DIR}\path1 should be printed as \usr\usr1\path1
Tried eval echo and $(( )).
You can use sed. This command searches for the value and replaces it; the dollar sign is used as the separator:
sed -i -e 's$\${BASE_DIR}$\\usr\\usr1$' hello.txt
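For example, if hello.txt (standing in for your file.txt) contains the line:
${BASE_DIR}\path1
then after running the command the file reads:
\usr\usr1\path1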
You need to set the variable when you read the line that contains the assignment. Then you can replace it later.
#!/bin/bash
while IFS= read -r line; do
if [[ $line =~ ^BASE_DIR= ]]
then basedir=${line#BASE_DIR=}
fi
line=${line/'${BASE_DIR}'/$basedir}
printf "%s\n" "$line"
done < file.txt > newfile.txt

bash while read loop arguments from variable issue

I have a bash script with following variable:
operators_list=$'andrii,bogdan,eios,fre,kuz,pvm,sebastian,tester,tester2,vincent,ykosogon'
while IFS=, read -r tech_login; do
echo "... $tech_login ..."
done <<< "$operators_list"
I need to read the arguments from the variable and work with them in a loop. But the echo runs only once, with all the items together:
+ IFS=,
+ read -r tech_login
+ echo '... andrii,bogdan,eios,fre,kuz,pvm,sebastian,tester,tester2,vincent,ykosogon ...'
... andrii,bogdan,eios,fre,kuz,pvm,sebastian,tester,tester2,vincent,ykosogon ...
+ IFS=,
+ read -r tech_login
What am I doing wrong? How can I rework the script so it handles one item at a time?
operators_list=$'andrii,bogdan,eios,fre,kuz,pvm,sebastian,tester,tester2,vincent,ykosogon'
So you have strings separated by commas. You can handle that in multiple ways:
Using bash arrays:
IFS=, read -ra operators <<< "$operators_list"
for op in "${operators[@]}"; do
echo "$op"
done
Using a while loop, like you wanted:
while IFS= read -d, -r op; do
echo "$op"
done <<<$operators_list
Using xargs, because why not:
<<<$operators_list xargs -d, -n1 echo
The thing with IFS and the read delimiter is: read reads until the delimiter specified with -d. Then, after read has read a full string (usually a whole line, since the default delimiter is a newline), the string is split into parts using IFS as the delimiter. So you can:
while IFS=: read -d, -r op1 op2; do
echo "$op1" "$op2"
done <<<"op11:op12,op12:op22"

Why does the outer while loop in Bash not finish?

I don't understand why this outer loop exits as soon as the inner loop finishes.
The $1 refers to a file with a lot of pattern/replacement lines, and $2 is a list of words. The problem is that the outer loop already exits after the first pattern/replacement line; I want it to exit only after all the lines in $1 have been read.
#!/bin/bash
#Receive SED SCRIPT WORDLIST
if [ -f temp.txt ];
then
> temp.txt
else
touch temp.txt
fi
while IFS='' read -r line || [[ -n "$line" ]];
do
echo -e "s/$line/p" >> temp.txt
while IFS='' read -r line || [[ -n "$line" ]];
do
sed -nf temp.txt $2
done
> temp.txt
done < $1
I understand that you want to build the sed expressions, write them to a file, and then apply those expressions to another file.
This is much easier than the way you are doing it.
First of all, you don't need to check whether temp.txt already exists: when you redirect the output of a command to a file, the file is created if it does not exist. But if you want to reset the file, I recommend using the truncate command.
In the body of the script, I don't understand why you added a second while loop that reads from a file but never gave it a file to read from.
I think what you need is something like this:
truncate -s 0 sed_expressions.txt
while IFS='' read -r line || [[ -n "$line" ]]; do
echo -e "s/$line/p" >> sed_expressions.txt
done < $1
sed -nf sed_expressions.txt $2 > out_file.txt
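To make the intermediate step concrete: assuming a line of $1 looks like foo/bar, the generated sed_expressions.txt contains commands such as:
s/foo/bar/p
and sed -n applies every such command to $2, printing only the lines where a substitution was made.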
Try it and tell me if this is what you need.
Bye!

Skip line in text file which starts with '#' via KornShell (ksh)

I am trying to write a script which reads a text file and saves each line to a string. I would also like the script to skip any lines which start with a hash symbol. Any suggestions?
You should not leave skipping lines to ksh. E.g. do this:
grep -v '^#' INPUTFILE | while IFS="" read line ; do echo "$line" ; done
And instead of the echo part do whatever you want.
Or if ksh does not support this syntax:
grep -v '^#' INPUTFILE > tmpfile
while IFS="" read line ; do echo $line ; done < tmpfile
rm tmpfile
while read -r line; do
[[ "$line" = *( )#* ]] && continue
# do something with "$line"
done < filename
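The *( ) in the pattern is ksh's "zero or more occurrences" of a space, so lines whose first character after optional leading spaces is # get skipped. For example, with an input file like:
# a comment line, skipped
   # an indented comment, also skipped
value=42
only the value=42 line reaches the body of the loop.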
Look for "File Name Patterns" or "File Name Generation" in the ksh man page.
