Why, after removing duplicates, is the array length wrong? - bash

I tried printing "unique_words", and it prints "hostname1 hostname2 hostname3", which is correct. However, when I check its size, it is 1 instead of 3. Why did that happen?
#!/bin/bash
#Define the string value
text="hostname1 hostname2 hostname2 hostname3"
RANDOM=$$$(date +%s)
declare -i x=1
# Set space as the delimiter
IFS=' '
#Read the split words into an array based on space delimiter
read -a hostArray <<< "$text"
unique_words=($(echo ${hostArray[@]} | tr ' ' '\n' | sort | uniq))
echo ${#unique_words[@]}

Relying on word splitting and IFS to convert strings to arrays is difficult to do safely and is best avoided. Consider this (Shellcheck-clean) alternative:
#!/bin/bash -p
words=( hostname1 hostname2 hostname2 hostname3 )
sort_output=$(printf '%s\n' "${words[@]}" | sort -u)
readarray -t unique_words <<<"$sort_output"
declare -p unique_words
In Bash code, it's much better to use arrays (like words) instead of strings to hold lists. In general, there are significant difficulties both in looping over lists in strings and in converting them to arrays. Using only arrays is much easier.
echo is not a reliable way to output variable data. Use printf instead. See the accepted, and excellent, answer to Why is printf better than echo?.
readarray (aka mapfile) is a reliable and efficient way to convert lines of text to arrays without using word splitting.
declare -p is an easy and reliable way to display the value, and attributes, of any Bash variable (including arrays and associative arrays). echo "$var" is broken in general, and the output of printf '%s\n' "$var" can hide important details.

When you assign the output of uniq to the array unique_words, IFS is still set to a space. However, the output of uniq consists of several lines, i.e. words separated by newline characters, not spaces. Therefore, when you define your array, you get one single multiline string.
If you instead did
IFS=$'\n'
unique_words=($(echo ${hostArray[@]} | tr ' ' '\n' | sort | uniq))
you would get 3 array elements.
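As a self-contained sketch of that fix (with the array subscripts written as `[@]`):

```shell
#!/bin/bash
text="hostname1 hostname2 hostname2 hostname3"
read -a hostArray <<< "$text"
IFS=$'\n'   # make the command substitution split on newlines, not spaces
unique_words=($(echo ${hostArray[@]} | tr ' ' '\n' | sort | uniq))
echo ${#unique_words[@]}   # prints 3
```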
Alternatively, you could save the old value of IFS before changing it:
oldifs=$IFS
IFS=' '
read ....
IFS=$oldifs
unique_words=....
UPDATE:
As was pointed out in a comment, saving IFS this way is not a good solution; in particular, it cannot restore the original state if IFS was unset to begin with. Therefore, if you just want to restore IFS to its default behaviour, you can do a
IFS=' '
read ....
unset IFS
unique_words=....
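Putting that together, a minimal end-to-end sketch using `unset IFS`:

```shell
#!/bin/bash
text="hostname1 hostname2 hostname2 hostname3"
IFS=' '
read -a hostArray <<< "$text"
unset IFS   # back to the default split-on-whitespace behaviour
unique_words=($(printf '%s\n' "${hostArray[@]}" | sort -u))
echo "${#unique_words[@]}"   # prints 3
```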

Cut string from position to character

I'd like to cut a string from a given position up to a specific character "/":
It would cut this line :
Export text H8X7IS5G.FIC NB regs COLOLO 4138/4138
To this one :
4138
What I tried is to use cut -c with the position and the character, but of course it doesn't work:
cut -c 57-'/'
If you want to stick with cut then this might be what you want:
echo 'Export text H8X7IS5G.FIC NB regs COLOLO 4138/4138' |
cut -c41- | cut -d/ -f1
There are many other ways to accomplish this task. If you have a grep which supports perl-compatible regular expressions, for instance, then I'd suggest something along this line:
grep -Po '.{40}\K[^/]*'
Or, a sed one-liner:
sed 's/.\{40\}//; s|/.*||'
Or, using pure bash
[[ $line =~ .{40}([^/]*) ]] && printf '%s\n' "${BASH_REMATCH[1]}"
Assuming you're trying to process a single variable at a time (rather than a stream with hundreds or thousands of lines), you don't need cut for this at all.
input='Export text H8X7IS5G.FIC NB regs COLOLO 4138/4138'
result=${input:40}
echo "${result%%/*}"
...emits 4138.
Both ${var:start:len} (and its shorter synonym ${var:start}) and ${var%%PATTERN} are examples of parameter expansion syntax; the former takes only a subset of a string starting at a given position; the latter trims the longest possible match of PATTERN (${var%PATTERN} trims the shortest possible match of PATTERN instead).
These and other string manipulations in bash are also documented in BashFAQ #100.
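A quick illustration of the `%` / `%%` (and `#` / `##`) difference, using a pattern that can match at more than one length:

```shell
path='a/b/c.txt'
echo "${path%/*}"    # a/b     (shortest suffix matching /* removed)
echo "${path%%/*}"   # a       (longest suffix matching /* removed)
echo "${path##*/}"   # c.txt   (longest prefix matching */ removed)
```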

Using sed with a regex to replace strings

I want to replace some string which contain specific words with another word.
Here is my code
#!/bin/bash
arr='foo/foo/baz foo/bar/baz foo/baz/baz';
for i in ${arr[@]}; do
echo $i | sed -e 's|foo/(bar\|baz)/baz|test|g'
done
Result
foo/foo/baz
foo/bar/baz
foo/baz/baz
Expected
foo/foo/baz
foo/test/baz
foo/test/baz
There are several things you can improve. Using '|' as the alternate delimiter for the sed substitution expression (to avoid the "picket fence" appearance of \/\/\/) complicates using '|' as the OR (alternation) regex operator. Choose an alternative delimiter that does not also serve as part of the regular expression; '#' works fine.
Next, there is no reason to loop: simply use a here-string to redirect the contents of arr to sed, and place it all in a command substitution with the "%s\n" format specifier to provide newline-separated output. (That's a mouthful, but it is actually nothing more than:)
arr='foo/foo/baz foo/bar/baz foo/baz/baz'
printf "%s\n" $(sed 's#/\(bar\|baz\)/#/test/#g' <<< $arr)
Example Use/Output
To test it out, just select the expressions above and middle-mouse paste the selection into your terminal, e.g.
$ arr='foo/foo/baz foo/bar/baz foo/baz/baz'
> printf "%s\n" $(sed 's#/\(bar\|baz\)/#/test/#g' <<< $arr)
foo/foo/baz
foo/test/baz
foo/test/baz
Look things over and let me know if you have further questions.
How about something like this:
sed -e 's/\(bar\|baz\)\//test\//g'
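If your sed supports -E (both GNU and BSD sed do), extended regular expressions let the alternation be written without backslashes; a sketch with the sample data:

```shell
arr='foo/foo/baz foo/bar/baz foo/baz/baz'
printf '%s\n' $arr | sed -E 's#foo/(bar|baz)/baz#foo/test/baz#'
# foo/foo/baz
# foo/test/baz
# foo/test/baz
```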

how to read one line to calculate the md5

I am using Linux bash version 4.1.2
I have a tab-delimited input_file having 5 fields and I want to calculate the MD5 for each line and put the md5sum at the end of each line.
The expected output_file should therefore have 6 fields on each line.
Here is my coding:
cat input_file | while read ONELINE
do
THEMD5=`echo "$ONELINE" | md5sum | awk '{print $1}'`
echo -e "${ONELINE}\t${THEMD5}"
done > output_file
The coding works well most of the time.
However, if ONELINE ends with one or two tabs, the trailing tab(s) disappear!
As a result, the output_file sometimes contains lines of 4 or 5 fields, due to the missing tab(s).
I have tried adding IFS=, IFS='', IFS=$'\n', and IFS=$'\012' in the while statement, but still cannot solve the problem.
Please help.
Alvin SIU
The following is quite certainly correct, if you want trailing newlines included in your md5sums (as your original code does):
while IFS= read -r line; do
read sum _ < <(printf '%s\n' "$line" | md5sum -)
printf '%s\t%s\n' "$line" "$sum"
done <input_file
Notes:
Characters inside IFS are stripped by read; setting IFS= is sufficient to prevent this effect.
Without the -r argument, read also interprets backslash literals, stripping them.
Using echo -e is dangerous: It interprets escape sequences inside your line, rather than emitting them as literals.
Using all-uppercase variable names is bad form. See the relevant spec (particularly the fourth paragraph), keeping in mind that shell variables and environment variables share a namespace.
Using echo in general is bad form when dealing with uncontrolled data (particularly including data which can contain backslash literals). See the relevant POSIX spec, particularly the APPLICATION USAGE and RATIONALE sections.
If you want to print the lines in a way that makes hidden characters visible, consider using '%q\t%s\n' instead of '%s\t%s\n' as a format string.
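For instance, %q makes otherwise-invisible trailing tabs explicit:

```shell
line=$'field1\tfield2\t'   # a line ending in a tab
printf '%q\n' "$line"      # prints the tabs as \t escapes
```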

Shell cut delimiter before last

I'm trying to cut a string (the name of a file) in which I have to get a variable part of the name.
The problem is that I have to put it in a shell variable; up to that point it is OK.
Here is an example of what I have to do.
NAME_OF_THE_FILE_VARIABLEiWANTtoGET_DATE
NAMEfile_VARIABLEiWANT_DATE
NAME_FILE_VARIABLEiWANT_DATE
The position of the variable I want can always change, but it will always be one before last. The delimiter is the "_".
Is there a way to count the size of the array to get size-1 or something like that?
Note: when I cut strings, I usually use something like this:
VARIABLEiWANT=`echo "$FILENAME" | cut -f 1 -d "_"`
awk -F'_' '{print $(NF-1)}' file
or, if you have a string:
awk -F'_' '{print $(NF-1)}' <<< "$FILENAME"
Save the output of the above one-liner into your variable.
IFS=_ read -a array <<< "$FILENAME"
variable_i_want=${array[${#array[@]}-2]}
It's a bit of a mess visually, but it's more efficient than starting a new process. ${#array[@]} is the number of elements read from FILENAME, so the indices for the array range from 0 to ${#array[@]}-1.
As of bash 4.3, though, you can use a negative index instead of computing it.
variable_i_want=${array[-2]}
If you need POSIX compatibility (no arrays), then
tmp=${FILENAME%_${FILENAME##*_}} # FILENAME with last field removed
variable_i_want=${tmp##*_} # last field of tmp
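For example, with the second sample name from the question (a slightly simplified variant: `${FILENAME%_*}` drops the last `_`-separated field directly):

```shell
FILENAME='NAMEfile_VARIABLEiWANT_DATE'
tmp=${FILENAME%_*}            # NAMEfile_VARIABLEiWANT
variable_i_want=${tmp##*_}    # VARIABLEiWANT
echo "$variable_i_want"
```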
Just got it... I found someone using cat; I got it to work with echo... and rev. I didn't understand the rev thing at first, but I think it reverses the string, so the field count effectively starts from the end.
CODIGO=`echo "$ARQ_NAME" | rev | cut -d "_" -f 2 | rev `

How can I read words (instead of lines) from a file?

I've read this question about how to read n characters from a text file using bash. I would like to know how to read a word at a time from a file that looks like:
example text
example1 text1
example2 text2
example3 text3
Can anyone explain that to me, or show me an easy example?
Thanks!
The read command by default reads whole lines. So the solution is probably to read the whole line and then split it on whitespace with e.g. for:
#!/bin/sh
while read line; do
for word in $line; do
echo "word = '$word'"
done
done <"myfile.txt"
The way to do this with standard input is by passing the -a flag to read:
read -a words
echo "${words[@]}"
This will read your entire line into an indexed array variable, in this case named words. You can then perform any array operations you like on words with shell parameter expansions.
For file-oriented operations, current versions of Bash also support the mapfile built-in. For example:
mapfile < /etc/passwd
echo ${MAPFILE[0]}
Either way, arrays are the way to go. It's worth your time to familiarize yourself with Bash array syntax to make the most of this feature.
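Combining the two ideas, here's a sketch that gathers every word of a file into one array (it writes a small sample file, words.txt, in the current directory just for demonstration):

```shell
#!/bin/bash
printf '%s\n' 'example text' 'example1 text1' > words.txt
all_words=()
while read -r -a line_words; do
    all_words+=( "${line_words[@]}" )   # append this line's words
done < words.txt
declare -p all_words                    # shows all four words
rm -f words.txt
```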
Ordinarily, you should read from a file using a while read -r line loop. To do this and parse the words on the lines requires nesting a for loop inside the while loop.
Here is a technique that works without requiring nested loops:
for word in $(<inputfile)
do
echo "$word"
done
In the context given, where the number of words is known:
while read -r word1 word2 _; do
echo "Read a line with word1 of $word1 and word2 of $word2"
done
If you want to read each line into an array, read -a will put the first word into element 0 of your array, the second into element 1, etc:
while read -r -a words; do
echo "First word is ${words[0]}; second word is ${words[1]}"
declare -p words # print the whole array
done
In bash, just use space as delimiter (read -d ' '). This method requires some preprocessing to translate newlines into spaces (using tr) and to merge several spaces into a single one (using sed):
{
tr '\n' ' ' | sed 's/ */ /g' | while read -d ' ' WORD
do
echo -n "<${WORD}> "
done
echo
} << EOF
Here you have some words, including * wildcards
that don't get expanded,
multiple spaces between words,
and lines with spaces at the beginning.
EOF
The main advantage of this method is that you don't need to worry about the array syntax and just work as with a for loop, but without wildcard expansion.
I came across this question and the proposed answers, but I don't see this simple possible solution listed:
for word in `cat inputfile`
do
echo $word
done
This can be done using AWK too:
awk '{for(i=1;i<=NF;i++) {print $i}}' text_file
You can combine xargs, which reads words delimited by spaces or newlines, with echo to print one word per line:
<some-file xargs -n1 echo
some-command | xargs -n1 echo
That also works well for large or slow streams of data because it does not need to read the whole input at once.
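For example, three whitespace-separated words become three lines:

```shell
printf 'alpha beta\ngamma\n' | xargs -n1 echo
# alpha
# beta
# gamma
```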
I’ve used this to read 1 table name at a time from SQLite which prints table names in a column layout:
sqlite3 db.sqlite .tables | xargs -n1 echo | while read table; do echo "1 table: $table"; done
