How can I read words (instead of lines) from a file? - bash

I've read this question about how to read n characters from a text file using bash. I would like to know how to read a word at a time from a file that looks like:
example text
example1 text1
example2 text2
example3 text3
Can anyone explain that to me, or show me an easy example?
Thanks!

The read command by default reads whole lines. So the solution is probably to read the whole line and then split it on whitespace, e.g. with a for loop:
#!/bin/sh
while read line; do
for word in $line; do
echo "word = '$word'"
done
done <"myfile.txt"

The way to do this with standard input is by passing the -a flag to read:
read -a words
echo "${words[@]}"
This will read your entire line into an indexed array variable, in this case named words. You can then perform any array operations you like on words with shell parameter expansions.
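For instance, a quick sketch of a few such array operations (the sample words are made up, and the line is fed from a here string rather than standard input):

```shell
# Split one line into an array, then inspect it with parameter expansions
read -r -a words <<< "example text extra"
echo "${#words[@]}"    # number of words: 3
echo "${words[0]}"     # first word: example
echo "${words[@]:1}"   # all words from index 1 onward: text extra
```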
For file-oriented operations, current versions of Bash also support the mapfile built-in. For example:
mapfile < /etc/passwd
echo ${MAPFILE[0]}
Either way, arrays are the way to go. It's worth your time to familiarize yourself with Bash array syntax to make the most of this feature.
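For example, a small sketch of mapfile with the commonly used -t flag (input supplied inline here instead of from a file):

```shell
# mapfile reads each line of input into one array element;
# -t strips the trailing newline from each element
mapfile -t lines <<< $'first line\nsecond line'
echo "${#lines[@]}"   # 2
echo "${lines[1]}"    # second line
```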

Ordinarily, you should read from a file using a while read -r line loop. To do this and parse the words on the lines requires nesting a for loop inside the while loop.
Here is a technique that works without requiring nested loops:
for word in $(<inputfile)
do
echo "$word"
done

In the context given, where the number of words is known:
while read -r word1 word2 _; do
echo "Read a line with word1 of $word1 and word2 of $word2"
done
If you want to read each line into an array, read -a will put the first word into element 0 of your array, the second into element 1, etc:
while read -r -a words; do
echo "First word is ${words[0]}; second word is ${words[1]}"
declare -p words # print the whole array
done
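A self-contained version of that loop, fed from a here-document (using the sample data from the question) instead of standard input:

```shell
while read -r -a words; do
  echo "First word is ${words[0]}; second word is ${words[1]}"
done <<'EOF'
example text
example1 text1
EOF
```

This prints one summary line per input line.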

In bash, just use space as delimiter (read -d ' '). This method requires some preprocessing to translate newlines into spaces (using tr) and to merge several spaces into a single one (using sed):
{
tr '\n' ' ' | sed 's/  */ /g' | while read -d ' ' WORD
do
echo -n "<${WORD}> "
done
echo
} << EOF
Here you have some words, including * wildcards
that don't get expanded,
multiple spaces between words,
and lines with spaces at the beginning.
EOF
The main advantage of this method is that you don't need to worry about the array syntax and just work as with a for loop, but without wildcard expansion.

I came across this question and the proposed answers, but I don't see this simple possible solution listed:
for word in `cat inputfile`
do
echo $word
done

This can be done using AWK too:
awk '{for(i=1;i<=NF;i++) {print $i}}' text_file

You can combine xargs, which reads words delimited by spaces or newlines, with echo to print one word per line:
<some-file xargs -n1 echo
some-command | xargs -n1 echo
That also works well for large or slow streams of data because it does not need to read the whole input at once.
I’ve used this to read 1 table name at a time from SQLite which prints table names in a column layout:
sqlite3 db.sqlite .tables | xargs -n1 echo | while read table; do echo "1 table: $table"; done

Related

Why, after removing duplicates, is the array length not 3?

I tried to print the "unique_words" array, and it prints "hostname1 hostname2 hostname3", which is correct. However, when I check its size, it is 1 instead of 3. Why did that happen?
#!/bin/bash
#Define the string value
text="hostname1 hostname2 hostname2 hostname3"
RANDOM=$$$(date +%s)
declare -i x=1
# Set space as the delimiter
IFS=' '
#Read the split words into an array based on space delimiter
read -a hostArray <<< "$text"
unique_words=($(echo ${hostArray[@]} | tr ' ' '\n' | sort | uniq))
echo ${#unique_words[@]}
Depending on word splitting and IFS to convert strings to arrays is difficult to do safely, and is best avoided. Consider this (Shellcheck-clean) alternative:
#!/bin/bash -p
words=( hostname1 hostname2 hostname2 hostname3 )
sort_output=$(printf '%s\n' "${words[@]}" | sort -u)
readarray -t unique_words <<<"$sort_output"
declare -p unique_words
In Bash code, it's much better to use arrays (like words) instead of strings to hold lists. In general, there are significant difficulties both in looping over lists in strings and in converting them to arrays. Using only arrays is much easier.
echo is not a reliable way to output variable data. Use printf instead. See the accepted, and excellent, answer to Why is printf better than echo?.
readarray (aka mapfile) is a reliable and efficient way to convert lines of text to arrays without using word splitting.
declare -p is an easy and reliable way to display the value, and attributes, of any Bash variable (including arrays and associative arrays). echo "$var" is broken in general, and the output of printf '%s\n' "$var" can hide important details.
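To see the difference, compare declare -p with echo on an array whose middle element contains a space (a minimal illustration; the exact declare -p formatting varies slightly between Bash versions):

```shell
arr=(a "b c" d)
declare -p arr      # shows each element with its index, quoting preserved
echo "${arr[@]}"    # prints: a b c d  - the element boundary is lost
```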
When you assign the output of uniq to the array unique_words, IFS is still set to a space. However, the output of uniq consists of several lines, i.e. words separated by newline characters, not spaces. Therefore, when you define your array, you get one single multiline string.
If you would do a
IFS=$'\n'
unique_words=($(echo ${hostArray[@]} | tr ' ' '\n' | sort | uniq))
you would get 3 array elements.
Alternatively, you could save the old value of IFS before changing it:
oldifs=$IFS
IFS=' '
read ....
IFS=$oldifs
unique_words=....
UPDATE:
As was pointed out in a comment, saving IFS in this way is not a good solution and in particular can't be reliably used if IFS was unset in the beginning. Therefore, if you just want to restore IFS to its default behaviour, you can do a
IFS=' '
read ....
unset IFS
unique_words=....
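Putting it together, a complete sketch of the corrected script using this approach:

```shell
text="hostname1 hostname2 hostname2 hostname3"
IFS=' '
read -r -a hostArray <<< "$text"
unset IFS   # restore default word splitting (space, tab, newline)
# sort -u combines sort and uniq; the unquoted expansion splits on newlines
unique_words=($(printf '%s\n' "${hostArray[@]}" | sort -u))
echo "${#unique_words[@]}"   # 3
```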

Using sed with a regex to replace strings

I want to replace some string which contain specific words with another word.
Here is my code
#!/bin/bash
arr='foo/foo/baz foo/bar/baz foo/baz/baz';
for i in ${arr[@]}; do
echo $i | sed -e 's|foo/(bar\|baz)/baz|test|g'
done
Result
foo/foo/baz
foo/bar/baz
foo/baz/baz
Expected
foo/foo/baz
foo/test/baz
foo/test/baz
There are several things you can improve. You chose '|' as the alternate delimiter for the sed substitution expression (to avoid the "picket fence" appearance of \/\/\/), but that complicates using '|' as the OR (alternation) regex operator. Choose an alternate delimiter that does not also serve as part of the regular expression; '#' works fine.
Next, there is no reason to loop: simply use a here string to redirect the contents of arr to sed, and place it all in a command substitution with printf and the "%s\n" format specifier to provide newline-separated output. (That's a mouthful, but it is actually nothing more than:)
arr='foo/foo/baz foo/bar/baz foo/baz/baz'
printf "%s\n" $(sed 's#/\(bar\|baz\)/#/test/#g' <<< "$arr")
Example Use/Output
To test it out, just select the expressions above and middle-mouse paste the selection into your terminal, e.g.
$ arr='foo/foo/baz foo/bar/baz foo/baz/baz'
> printf "%s\n" $(sed 's#/\(bar\|baz\)/#/test/#g' <<< $arr)
foo/foo/baz
foo/test/baz
foo/test/baz
Look things over and let me know if you have further questions.
How about something like this:
sed -e 's/\(bar\|baz\)\//test\//g'

shell: prefixing output with spaces with paste

A lot of the time one needs to prefix 4 spaces to some shell output to transform it into valid markdown code. E.g. when posting a question or answer here on Stack Overflow.
It's actually quite easy to do with sed:
some_command | sed -e 's/^/    /'
But I'd like to do it with paste if possible. Because paste takes 2 files as input, all I came up with was this:
some_command | paste 4_space_file -
where 4_space_file is actually a file whose whole content was 4 spaces.
Is there a neater way to achieve this with paste without having an actual file on the hard drive?
Literal Answers Using Paste
First, to answer your literal question:
some_command | paste <(printf '    \n') -
...yields the same output as passing paste the name of a file with a single line having four spaces and a newline as its content. However, the output from paste in this case is not four-character indents for each line; the first line has four spaces and a tab prepended, subsequent lines are prefixed with only a tab.
If you wanted to generate an input of the appropriate length while still using paste, then you'd end up with something uglier. Say (with bash 4.0 or newer):
ls | {
mapfile -t lines # read output from ls into an array
# our answer, here, is to move to three spaces in the input, and use paste -d' ' to
# ...add a fourth space during processing.
paste -d' ' \
<(yes '   ' | head -n "${#lines[@]}") \
<(printf '%s\n' "${lines[@]}")
}
<() is process substitution syntax, which expands to a filename which, when read from, will yield the output from the code contained.
Better Answers
For a native bash approach, you might also consider defining a function:
indent4() { while IFS= read -r line; do printf '    %s\n' "$line"; done; }
...for later use:
some_command | indent4
Unlike paste, this actually inserts exactly four spaces (with no intervening tab) on every line, for the exact number of lines in your input (no need to synthesize the correct length).
Also consider awk:
awk '{ print "    " $0; }'

Add prefix to each word of each line in bash

I have a variable called deps:
deps='word1 word2'
I want to add a prefix to each word of the variable.
I tried with:
echo $deps | while read word do \ echo "prefix-$word" \ done
but i get:
bash: syntax error near unexpected token `done'
any help? thanks
With sed :
$ deps='word1 word2'
$ echo "$deps" | sed 's/[^ ]* */prefix-&/g'
prefix-word1 prefix-word2
For well behaved strings, the best answer is:
printf "prefix-%s\n" $deps
as suggested by 123 in the comments to fedorqui's answer.
Explanation:
Without quoting, bash will split the contents of $deps according to $IFS (which defaults to " \n\t") before calling printf
printf evaluates the pattern for each of the provided arguments and writes the output to stdout.
printf is a shell built-in (at least for bash) and does not fork another process, so this is faster than sed-based solutions.
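A minimal demonstration of that format-reuse behaviour (sample words made up):

```shell
deps='word1 word2 word3'
# unquoted $deps is split into three arguments, and printf
# repeats its format string once for each of them
printf 'prefix-%s\n' $deps
```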
In another question I just came across the markers for the beginning (\<) and end (\>) of words. With those you can shorten SLePort's solution above somewhat. The approach also extends nicely to appending a suffix, which I needed in addition to the prefix but couldn't manage with the above solution, since the & also includes the possible trailing whitespace after the word.
So my solution is this:
$ deps='word1 word2'
# add prefix:
$ echo "$deps" | sed 's/\</prefix-/g'
prefix-word1 prefix-word2
# add suffix:
$ echo "$deps" | sed 's/\>/-suffix/g'
word1-suffix word2-suffix
Explanation: \< matches the beginning of every word, and \> matches the end of each word. You can simply "replace" these by the prefix/suffix, resulting in them being prepended/appended. There is no need to reference them anymore in the replacement, as these are not "real" characters anyway!
You can read the string into an array and then prepend the string to every item:
$ IFS=' ' read -r -a myarray <<< "word1 word2"
$ printf "%s\n" "${myarray[@]}"
word1
word2
$ printf "prefix-%s\n" "${myarray[@]}"
prefix-word1
prefix-word2

How can I split a string in shell?

I have two strings and I want to split with space and use them two by two:
namespaces="Calc Fs"
files="calc.hpp fs.hpp"
for example, I want to use like this: command -q namespace[i] -l files[j]
I'm a noob in Bourne Shell.
Put them into an array like so:
#!/bin/bash
namespaces="Calc Fs"
files="calc.hpp fs.hpp"
i=1
j=0
name_arr=( $namespaces )
file_arr=( $files )
command -q "${name_arr[i]}" -l "${file_arr[j]}"
echo "hello world" | awk '{split($0, array, " ")} END{print array[2]}'
is how you would split a simple string.
if what you want to do is loop through combinations of the two split strings, then you want something like this:
for namespace in $namespaces
do
for file in $files
do
command -q $namespace -l $file
done
done
EDIT:
or to expand on the awk solution that was posted, you could also just do:
echo $foo | awk '{print $'$i'}'
EDIT 2:
Disclaimer: I do not profess to be any kind of expert in awk at all, so there may be small errors in this explanation.
Basically, what the snippet above does is pipe the contents of $foo into the standard input of awk. Awk reads from its standard input line by line, separating each line into fields based on a field separator, which is any number of spaces by default. Awk executes the program that it is given as an argument. In this case, the shell expands '{print $'$i'}' so that the value of the shell variable i becomes the awk field number; if i is 1, for example, awk sees { print $1 }, which simply tells awk to print field number 1 of each line of its input.
If you want to learn more I think that this blog post does a pretty good job of describing the basics (as well as the basics of sed and grep) if you skip past the more theoretical stuff at the start (unless you're into that kind of thing).
I wanted to find a way to do it without arrays, here it is:
paste -d " " <(tr " " "\n" <<< $namespaces) <(tr " " "\n" <<< $files) |
while read namespace file; do
command -q $namespace -l $file
done
Two special features are used here: process substitution (<(...)) and here strings (<<<). A here string is a shortcut for echo $namespaces | tr " " "\n". Process substitution is a shortcut for FIFO creation; it allows paste to be run on the output of commands instead of files.
If you are using zsh this could be very easy:
files="calc.hpp fs.hpp"
# all elements
print -l ${(s/ /)files}
# just the first one
echo ${${(s/ /)files}[1]}
