I'm attempting to store the output of a series of beeline HQL queries into an array, so that I can parse it to pull out the interesting bits. Here's the relevant code:
#!/usr/bin/env ksh
ext_output=()
while IFS= read -r line; do
ext_output+=( "$line" )
done < <( bee --hiveconf hive.auto.convert.join=false -f temp.hql)
bee is just an alias to the full beeline command with the JDBC URL, etc. temp.hql contains multiple HQL queries.
And here's a snippet of what the output of each query looks like:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
| tableName:myTable |
| owner:foo |
| location:hdfs://<server>/<path>...
<big snip>
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
15 rows selected (0.187 seconds)
The problem is, my array is only getting the last line from each result (15 rows selected (0.187 seconds)).
Am I doing something wrong here? The exact same approach is working in other instances, so I really don't understand.
Hmmmm, I'm not having any problems with the code you've posted.
I can reproduce what I think you may be seeing (ie, array contains a single value consisting of the last line of output) if I make the following change in your code:
# current/correct code - from your post
ext_output+=( "$line" )
# modified/wrong code
ext_output=+( "$line" )
Notice the placement of the plus sign (+):
when on the left side of the equal sign (+=) each $line is appended to the end of the array (see sample run - below)
when on the right side of the equal sign (=+) each $line is assigned to the first slot in the array (index=0); the plus sign (+) and parens (()) are treated as part of the data to be stored in the array (see sample run - at bottom of this post)
Could there be a typo between what you're running (with 'wrong' results) vs what you've posted here in this thread (and what you've mentioned generates the correct results in other instances)?
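A minimal bash sketch of the same symptom, for comparison. It does not use =+ at all; it contrasts appending with += against plain reassignment with =, which also leaves only the last line in the array. The fake_query function is a hypothetical stand-in for the real bee call.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `bee ... -f temp.hql`; prints three sample lines.
fake_query() {
  printf '%s\n' '| tableName:myTable' '| owner:foo' '15 rows selected (0.187 seconds)'
}

appended=()
overwritten=()
while IFS= read -r line; do
  appended+=( "$line" )      # += adds one new element per line
  overwritten=( "$line" )    # =  replaces the whole array on every pass
done < <(fake_query)

echo "appended    : ${#appended[@]} element(s)"
echo "overwritten : ${#overwritten[@]} element(s), holding: ${overwritten[0]}"
```

If the real script shows the second pattern, the assignment operator is the first thing worth checking.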
Here's what I get when I run your posted code (plus sign on the left of the equal sign : +=) ...
NOTE: I've replaced the bee/HQL call with an output file containing your sample lines plus a couple of (bogus) data lines; I've also cut down the longer lines for readability:
$ cat temp.out
-----------------------------------------+--+
| tableName:myTable
| owner:foo
| location:hdfs://<server>/<path>...

abc def ghi
123 456 789

-----------------------------------------+--+
15 rows selected (0.187 seconds)
Then I ran your code against temp.out:
ext_output=()
while IFS= read -r line
do
ext_output+=( "$line" )
done < temp.out
Some stats on the array:
$ echo "array size : ${#ext_output[*]}"
array size : 10
$ echo "array indx : ${!ext_output[*]}"
array indx : 0 1 2 3 4 5 6 7 8 9
$ echo "array vals : ${ext_output[*]}"
array vals : -----------------------------------------+--+ | tableName:myTable | owner:foo | location:hdfs://<server>/<path>... abc def ghi 123 456 789 -----------------------------------------+--+ 15 rows selected (0.187 seconds)
And a dump of the array's contents:
$ for i in ${!ext_output[*]}
> do
> echo "${i} : ${ext_output[$i]}"
> done
0 : -----------------------------------------+--+
1 : | tableName:myTable
2 : | owner:foo
3 : | location:hdfs://<server>/<path>...
4 :
5 : abc def ghi
6 : 123 456 789
7 :
8 : -----------------------------------------+--+
9 : 15 rows selected (0.187 seconds)
If I modify your code to place the plus sign on the right side of the equal sign (=+) ...
ext_output=()
while IFS= read -r line
do
ext_output=+( "$line" )
done < temp.out
... the array stats:
$ echo "array size : ${#ext_output[*]}"
array size : 1
$ echo "array indx : ${!ext_output[*]}"
array indx : 0
$ echo "array vals : ${ext_output[*]}"
array vals : +( 15 rows selected (0.187 seconds) )
... and the contents of the array:
$ for i in ${!ext_output[*]}
> do
> echo "${i} : ${ext_output[$i]}"
> done
0 : +( 15 rows selected (0.187 seconds) )
!! Notice that the plus sign and parens are part of the string stored in ext_output[0]
Related
I have a file log.txt as below
1 0.694003 5.326995 7.500997 6.263974 0.633941 36.556128
2 2.221990 4.422010 4.652992 5.964420 0.660997 51.874905
3 4.376005 7.440002 6.260000 6.238917 0.728308 10.927455
4 1.914000 5.451991 0.668012 6.355688 0.634081 106.733134
5 2.530005 0.000000 8.084005 3.916278 0.687023 2252.538670
6 1.997993 1.406001 7.977006 3.923551 0.517551 37.611894
7 0.971998 1.823007 8.804005 4.110159 0.567905 905.995133
8 0.480005 3.109009 8.711002 4.060954 0.508963 553.712280
9 1.015001 3.996992 7.781004 3.547329 0.396635 16.883011
I want to read 6th column of this file into an array myArray so that it will give below:
echo ${myArray[9]} = 0.396635
Thank you.
Here's a way to do this (bash 4+), assuming log.txt's first column starts at 1 and doesn't skip any numbers.
readarray -t myArray < <(tr -s ' ' < log.txt | cut -d' ' -f6)
echo ${myArray[8]}
tr -s ' ' collapses the whitespace, for easier manipulation
cut -d' ' -f6 selects the 6th space separated column
<(...) is process substitution: it exposes the subcommand's output as a file that can be read from
readarray reads lines from the file into the variable myArray
Note that the array is 0-indexed, so the value from row 9 is at index [8] rather than [9].
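A self-contained sketch of the same idea, assuming awk is acceptable in place of tr | cut (awk splits on any run of whitespace, so no collapsing step is needed). The temporary file is a stand-in for log.txt and holds only the first three rows from the question.

```shell
#!/usr/bin/env bash
# Stand-in for log.txt: the first three rows from the question.
log=$(mktemp)
cat > "$log" <<'EOF'
1 0.694003 5.326995 7.500997 6.263974 0.633941 36.556128
2 2.221990 4.422010 4.652992 5.964420 0.660997 51.874905
3 4.376005 7.440002 6.260000 6.238917 0.728308 10.927455
EOF

# Read the 6th column, one value per array element.
readarray -t myArray < <(awk '{ print $6 }' "$log")
rm -f "$log"

echo "rows read : ${#myArray[@]}"
echo "row 3     : ${myArray[2]}"   # row N lives at index N-1
```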
Assumption:
first column of file is an integer
first column of file may not be sequential
OP wants (needs?) the array index to match the value in the first column
Sample data file:
$ cat log.txt
3 4.376005 7.440002 6.260000 6.238917 0.728308 10.927455
5 2.530005 0.000000 8.084005 3.916278 0.687023 2252.538670
7 0.971998 1.823007 8.804005 4.110159 0.567905 905.995133
9 1.015001 3.996992 7.781004 3.547329 0.396635 16.883011
23 0.480005 3.109009 8.711002 4.060954 0.508963 553.712280
One idea using awk (to parse the input file):
$ awk '{print $1,$6}' log.txt
3 0.728308
5 0.687023
7 0.567905
9 0.396635
23 0.508963
We can then feed this into a while loop to build the array:
unset myArray
while read -r ndx value
do
myArray["${ndx}"]="${value}"
done < <(awk '{print $1,$6}' log.txt)
Verify contents of array:
$ typeset -p myArray
declare -a myArray=([3]="0.728308" [5]="0.687023" [7]="0.567905" [9]="0.396635" [23]="0.508963")
$ for ndx in "${!myArray[@]}"
do
echo "index = ${ndx} ; value = ${myArray[${ndx}]}"
done
index = 3 ; value = 0.728308
index = 5 ; value = 0.687023
index = 7 ; value = 0.567905
index = 9 ; value = 0.396635
index = 23 ; value = 0.508963
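Because the array is sparse, a lookup for a row number that is not in the file returns an empty string. A small sketch of one way to tell "missing" from "present", using the ${var+word} expansion; the lookup function name is made up for this example.

```shell
#!/usr/bin/env bash
# The same sparse array as in the typeset output, written out literally.
myArray=([3]="0.728308" [5]="0.687023" [7]="0.567905" [9]="0.396635" [23]="0.508963")

# Hypothetical helper: print the value for a row number, or a notice if absent.
lookup() {
  local ndx=$1
  if [[ -n "${myArray[ndx]+set}" ]]; then   # expands to "set" only if the element exists
    echo "${myArray[ndx]}"
  else
    echo "no row ${ndx}"
  fi
}

lookup 9    # a row that exists
lookup 4    # a row number that was skipped in the file
```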
Another approach using just bash4+ builtins. (if acceptable)
#!/usr/bin/env bash
mapfile -t rows < log.txt
read -ra column <<< "${rows[8]}"
echo "${column[5]}"
This is my simple shell script
root@Ubuntu:/tmp# cat -n script.sh
1 echo
2 while x= read -n 1 char
3 do
4 echo -e "Original value = $char"
5 echo -e "Plus one = `expr $char + 1`\n"
6 done < number.txt
7 echo
root@Ubuntu:/tmp#
And this is the content of number.txt
root@Ubuntu:/tmp# cat number.txt
12345
root@Ubuntu:/tmp#
As you can see on the code, I'm trying to read each number and process it separately. In this case, I would like to add one to each of them and print it on a new line.
root@Ubuntu:/tmp# ./script.sh
Original value = 1
Plus one = 2
Original value = 2
Plus one = 3
Original value = 3
Plus one = 4
Original value = 4
Plus one = 5
Original value = 5
Plus one = 6
Original value =
Plus one = 1
root@Ubuntu:/tmp#
Everything looks fine except for the last line. I only have 5 numbers, but it seems the code is processing an additional one.
Original value =
Plus one = 1
Question is how does this happen and how to fix it?
It seems the input file number.txt contains a complete line, which is terminated by a line feed character (LF). (You can verify that the input file is longer than 5 bytes using ls -l.) read eventually encounters the LF and gives you an empty char (stripping the terminating LF from the input, as it would without the -n option). This will give you expr + 1, resulting in 1. You can explicitly test for the empty char and terminate the while loop using the test -n for non-zero-length strings:
echo "12345" | while read -n 1 char && [ -n "$char" ]; do echo "$char" ; done
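An alternative sketch that sidesteps the trailing line feed: hold the digits in a variable and walk it character by character with substring expansion, using arithmetic expansion instead of expr. The digits variable is a stand-in for the file contents; in the real script it could be filled with digits=$(< number.txt), since command substitution strips trailing newlines.

```shell
#!/usr/bin/env bash
# Stand-in for the contents of number.txt, without the trailing newline.
digits=12345

results=()
for (( i = 0; i < ${#digits}; i++ )); do
  char=${digits:i:1}                 # one character at offset i
  results+=( "$(( char + 1 ))" )     # arithmetic expansion replaces expr
  echo "Original value = $char"
  echo "Plus one = $(( char + 1 ))"
  echo
done
```

This assumes every character is a digit; any other character would need its own handling.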
This is a follow up from my other post:
Printing all palindromes from text file
I want to be able to print the number of palindromes that I have found in my text file, similar to a frequency table. It should show the count followed by the word, in this format:
100 did
32 sas
17 madam
My code right now is:
#!/usr/bin/env bash
function search
{
grep -oiE '[a-z]{3,}' "$1" | sort -n | tr '[:upper:]' '[:lower:]' | while read -r word; do
[[ $word == $(rev <<< "$word") ]] && echo "$word" | uniq -c
done
}
search "$1"
Compared with my last post (Printing all palindromes from text file), I have added "sort -n" and "uniq -c". As far as I know, "sort -n" sorts the palindromes found in alphabetical order, and "uniq -c" prints the number of occurrences of each word found.
Just to test script I have a testing file named: "testingfile.txt" . This contains:
testing words testing words testing words
palindromes
Sas
Sas
Sas
sas
bob
Sas
Sas
Sas Sas madam
midim poop goog tot sas did i want to go to the movies did
otuikkiuto
pop
poop
This file is just so I can test before trying this script on a much larger file in which it'll take much longer.
When typing in the console: (also to note "palindrome" is the name of my script)
source palindrome testingfile.txt
The output appears like this:
1 bob
1 did
1 did
1 goog
1 madam
1 midim
1 otuikkiuto
1 poop
1 poop
1 pop
1 sas
1 sas
1 sas
1 sas
1 sas
1 sas
1 sas
1 sas
1 sas
1 tot
Is there something I am missing to get the result that I want:
9 sas
2 did
2 poop
1 bob
1 goog
1 madam
1 midim
1 otuikkiuto
1 pop
1 tot
Solutions to this would be greatly appreciated! If a solution needs other commands, an explanation of the reasoning behind them would also be greatly appreciated.
Thank you
You missed two important details:
You need to pass all input at once to uniq -c to count them, not one by one to one uniq each
uniq expects its input to be sorted. The sort you had in the grep pipeline is ineffective, because after the transformation to lowercase, the values would need to be sorted again
You can apply sort | uniq -c to the output of an entire loop,
by piping the loop itself:
grep -oiE '[a-z]{3,}' "$1" | tr '[:upper:]' '[:lower:]' | while read -r word; do
[[ $word == $(rev <<< "$word") ]] && echo "$word"
done | sort | uniq -c
Finally, to get an output sorted in descending order by count,
you need to further pipe the output to sort -nr.
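Putting those pieces together, a self-contained sketch of the whole pipeline might look like this. The function name count_palindromes and the sample text are made up for the example; rev is assumed to be installed, as in the question.

```shell
#!/usr/bin/env bash
# Small stand-in for testingfile.txt.
sample=$(mktemp)
printf '%s\n' 'Sas sas did' 'madam did Sas hello' > "$sample"

# Hypothetical wrapper around the pipeline described above.
count_palindromes() {
  grep -oiE '[a-z]{3,}' "$1" | tr '[:upper:]' '[:lower:]' | while read -r word; do
    # Quote the right-hand side so it is compared literally, not as a pattern.
    [[ $word == "$(rev <<< "$word")" ]] && echo "$word"
  done | sort | uniq -c | sort -nr   # group, count, then order by count descending
}

result=$(count_palindromes "$sample")
rm -f "$sample"
echo "$result"
```

Words with the same count may appear in any relative order, since sort -nr only compares the counts numerically.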
I have 2 scripts, #1 and #2. Each works OK by itself. I want to read a 15-row file, row by row, and process it. Script #2 selects rows. Row 0 is indicated as firstline=0, lastline=1. Row 14 would be firstline=14, lastline=15. I see good results from echo. I want to do the same with script #1. I can't get my head around nesting them correctly. Code below.
#!/bin/bash
# script 1
filename=slash
firstline=0
lastline=1
i=0
exec <${filename}
while read ; do
i=$(( $i + 1 ))
if [ "$i" -ge "${firstline}" ] ; then
if [ "$i" -gt "${lastline}" ] ; then
break
else
echo "${REPLY}" > slash1
fold -w 21 -s slash1 > news1
sleep 5
fi
fi
done
# script2
firstline=(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14)
lastline=(1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)
for ((i=0;i<${#firstline[@]};i++))
do
echo ${firstline[$i]} ${lastline[$i]};
done
Your question is very unclear, but perhaps you are simply looking for some simple function calls:
#!/bin/bash
script_1() {
filename=slash
firstline=$1
lastline=$2
i=0
exec <${filename}
while read ; do
i=$(( $i + 1 ))
if [ "$i" -ge "${firstline}" ] ; then
if [ "$i" -gt "${lastline}" ] ; then
break
else
echo "${REPLY}" > slash1
fold -w 21 -s slash1 > news1
sleep 5
fi
fi
done
}
# script2
firstline=(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14)
lastline=(1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)
for ((i=0;i<${#firstline[@]};i++))
do
script_1 ${firstline[$i]} ${lastline[$i]};
done
Note that reading the file this way is extremely inefficient, and there are undoubtedly better ways to handle this, but I am trying to minimize the changes from your code.
Update: Based on your later comments, the following idiomatic Bash code that uses sed to extract the line of interest in each iteration solves your problem much more simply:
Note:
- If the input file does not change between loop iterations, and the input file is small enough (as it is in the case at hand), it's more efficient to buffer the file contents in a variable up front, as is demonstrated in the original answer below.
- As tripleee points out in a comment: If simply reading the input lines sequentially is sufficient (as opposed to extracting lines by specific line numbers), then a single, simple while read -r line; do ... # fold and output, then sleep ... done < "$filename" is enough.
# Determine the input filename.
filename='slash'
# Count its number of lines.
lineCount=$(wc -l < "$filename")
# Loop over the line numbers of the file.
for (( lineNum = 1; lineNum <= lineCount; ++lineNum )); do
# Use `sed` to extract the line with the line number at hand,
# reformat it, and output to the target file.
fold -w 21 -s <(sed -n "$lineNum {p;q;}" "$filename") > 'news1'
sleep 5
done
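For the sequential case mentioned in the note, a sketch of the plain while read loop could look like the following. The temporary files stand in for slash and news1, and the delay variable is an assumption added so the example finishes immediately (the original pauses for 5 seconds per line).

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the input file `slash` and the output file `news1`.
filename=$(mktemp)
outfile=$(mktemp)
delay=0   # assumption: set to 5 to match the original script
printf '%s\n' 'first headline that is longer than twenty-one columns' 'second' > "$filename"

count=0
while IFS= read -r line; do
  # Reformat the current line and overwrite the output file with it.
  printf '%s\n' "$line" | fold -w 21 -s > "$outfile"
  count=$(( count + 1 ))
  sleep "$delay"
done < "$filename"

echo "processed $count line(s)"
last=$(cat "$outfile")   # the output file holds only the most recent line
rm -f "$filename" "$outfile"
```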
A simplified version of what I think you're trying to achieve:
#!/bin/bash
# Split fields by newlines on input,
# and separate array items by newlines on output.
IFS=$'\n'
# Read all input lines up front, into array ${lines[@]}
# In terms of your code, you'd use
# read -d '' -ra lines < "$filename"
read -d '' -ra lines <<<$'line 1\nline 2\nline 3\nline 4\nline 5\nline 6\nline 7\nline 8\nline 9\nline 10\nline 11\nline 12\nline 13\nline 14\nline 15'
# Define the arrays specifying the line ranges to select.
firstline=(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14)
lastline=(1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)
# Loop over the ranges and select a range of lines in each iteration.
for ((i=0; i<${#firstline[@]}; i++)); do
extractedLines="${lines[*]: ${firstline[i]}: 1 + ${lastline[i]} - ${firstline[i]}}"
# Process the extracted lines.
# In terms of your code, the `> slash1` and `fold ...` commands would go here.
echo "$extractedLines"
echo '------'
done
Note:
The name of the array variable filled with read -ra is lines; ${lines[@]} is Bash syntax for returning all array elements as separate words (${lines[*]} also refers to all elements, but with slightly different semantics), and this syntax is used in the comments to illustrate that lines is indeed an array variable (note that if you were to use simply $lines to reference the variable, you'd implicitly get only the item with index 0, which is the same as ${lines[0]}).
<<<$'line 1\n...' uses a here-string (<<<) to read an ad-hoc sample document (expressed as an ANSI C-quoted string ($'...')) in the interest of making my example code self-contained.
As stated in the comment, you'd read from $filename instead:
read -d '' -ra lines <"$filename"
extractedLines="${lines[*]: ${firstline[i]}: 1 + ${lastline[i]} - ${firstline[i]}}" extracts the lines of interest; ${firstline[i]} references the current element (index i) from array ${firstline[@]}; since the last token in Bash's array-slicing syntax (${lines[*]: <startIndex>: <elementCount>}) is the count of elements to return, we must perform a calculation to determine the count, which is what 1 + ${lastline[i]} - ${firstline[i]} does.
By virtue of using "${lines[*]...}" rather than "${lines[@]...}", the extracted array elements are joined by the first character in $IFS, which in our case is a newline ($'\n') (when extracting a single line, that doesn't really matter).
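A compact sketch of the two slicing forms discussed in these notes, using a five-element sample array; the variable names pair and joined are only for illustration.

```shell
#!/usr/bin/env bash
lines=( 'line 1' 'line 2' 'line 3' 'line 4' 'line 5' )

# ${array[@]:start:count} yields `count` separate elements beginning at index `start`.
pair=( "${lines[@]:1:2}" )

# "${array[*]:start:count}" yields one string, joined by the first character of IFS.
# IFS is changed inside the command substitution only, so the outer shell is unaffected.
joined=$( IFS=$'\n'; echo "${lines[*]:1:2}" )

echo "elements in pair : ${#pair[@]}"
echo "joined slice:"
echo "$joined"
```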
I'm trying to write a bash script that calculates the average of numbers by rows and columns. An example of a text file that I'm reading in is:
1 2 3 4 5
4 6 7 8 0
There is an unknown number of rows and unknown number of columns. Currently, I'm just trying to sum each row with a while loop. The desired output is:
1 2 3 4 5 Sum = 15
4 6 7 8 0 Sum = 25
And so on and so forth with each row. Currently this is the code I have:
while read i
do
echo "num: $i"
(( sum=$sum+$i ))
echo "sum: $sum"
done < $2
To call the program it's stats -r test_file. "-r" indicates rows--I haven't started columns quite yet. My current code actually just takes the first number of each column and adds them together and then the rest of the numbers error out as a syntax error. It says the error comes from line 16, which is the (( sum=$sum+$i )) line, but I honestly can't figure out what the problem is. I should tell you I'm extremely new to bash scripting and I have googled and searched high and low for the answer for this and can't find it. Any help is greatly appreciated.
You are reading the file line by line, so $i holds a whole line (for example "1 2 3 4 5"), and adding a whole line to sum is not a valid arithmetic expression. Try this:
while read i
do
sum=0
for num in $i
do
sum=$(($sum + $num))
done
echo "$i Sum: $sum"
done < $2
This just splits each line into its individual numbers using a for loop. I hope this helps.
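A variant sketch of the same row-sum loop that splits each row into an array with read -ra, so nothing relies on unquoted expansion. The temporary file stands in for the script's $2 argument, and the sums array is only there to make the results easy to inspect. Like the loop above, it handles integers only.

```shell
#!/usr/bin/env bash
# Stand-in for the input file normally passed as $2.
input=$(mktemp)
printf '%s\n' '1 2 3 4 5' '4 6 7 8 0' > "$input"

sums=()
while read -ra nums; do          # split the row on whitespace into nums[]
  sum=0
  for n in "${nums[@]}"; do
    sum=$(( sum + n ))
  done
  sums+=( "$sum" )
  echo "${nums[*]} Sum = $sum"
done < "$input"
rm -f "$input"
```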
Another non bash way (con: OP asked for bash, pro: does not depend on bashisms, works with floats).
awk '{c=0;for(i=1;i<=NF;++i){c+=$i};print $0, "Sum:", c}'
Another way (not a pure bash):
while read line
do
sum=$(sed 's/[ ]\+/+/g' <<< "$line" | bc -q)
echo "$line Sum = $sum"
done < filename
Using the numsum -r util covers the row addition, but the output format needs a little glue, by inefficiently paste-ing a few utils:
paste "$2" \
<(yes "Sum =" | head -$(wc -l < "$2") ) \
<(numsum -r "$2")
Output:
1 2 3 4 5 Sum = 15
4 6 7 8 0 Sum = 25
Note -- to run the above line on a given file foo, first initialize $2 like so:
set -- "" foo
paste "$2" <(yes "Sum =" | head -$(wc -l < "$2") ) <(numsum -r "$2")