Is there any better solution to reverse uniq count - bash

I want to reverse the uniq -c output from:
1 hi
2 test
3 try
to:
hi
test
test
try
try
try
My solution now is to use a loop:
while read a b; do yes $b |head -n $a ;done <test.txt
I wonder if there are any simpler commands to achieve that?

another awk
awk '{while ($1--) print $2}' file

Here's another solution
echo "\
1 hi
2 test
3 try" | awk '{for(i=1;i<=$1;i++){print($2)}}'
output
hi
test
test
try
try
try
This will work the same way, with
awk '{for(i=1;i<=$1;i++){print($2)}}' uniq_counts.txt
The core of the script is, of course,
{ for (i=1;i<=$1;i++) { print $2 } }
where awk has parsed the input into 2 fields, $1 being the number (1,2,3) and $2 is the value, (hi,test,try).
The condition i<=$1 tests that the counter i has not incremented beyond the count supplied in field $1, and the print $2 prints the value each time that i<=$1 condition is true.
IHTH
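The awk answers above print only $2, so a counted line that itself contains spaces would be cut off after its first word. A minimal sketch that keeps the whole line, assuming the input has the uniq -c layout (optional leading blanks, the count, one blank, then the text). The function name expand_counts is made up for illustration.

```shell
#!/usr/bin/env bash
# expand_counts: hypothetical helper that reverses `uniq -c` output read from stdin.
expand_counts() {
    awk '{
        n = $1                            # the count is the first field
        sub(/^[ \t]*[0-9]+[ \t]/, "")     # strip the count, keep the rest of the line intact
        for (i = 0; i < n; i++) print     # print the remaining text n times
    }'
}

printf '      1 hi\n      2 hello world\n' | expand_counts
```

Saving the count in n before calling sub matters, because modifying $0 makes awk re-split the fields.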

You don't need awk or any other command, you can do it entirely in bash
while read n s
do
for ((i=0; i<n; i++))
do
echo "$s"
done
done < test.txt

Here my solution uses the bash brace expansion and the printf internal command.
while read a b
do
eval printf "'%.${#b}s\n'" "'$b'"{1..$a}
done <test.txt
The following simple example
printf '%s\n' test{1..2}
prints two lines which contain the string test followed by a number:
test1
test2
but we can specify the exact number of characters to print by using the precision field of the printf command:
printf '%.4s\n' test{1..2}
to display:
test
test
The number of characters to print is given by the length of the text to print (${#b}).
Finally, the eval command must be used in order to use variables in the brace expansion.
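The need for eval comes from the order of expansions: bash performs brace expansion before it expands variables. A small sketch of that behaviour (the variable names plain and evaled are only for illustration):

```shell
#!/usr/bin/env bash
a=3
plain=$(echo {1..$a})           # braces are examined before $a is expanded, so no sequence is generated
evaled=$(eval "echo {1..$a}")   # eval re-parses the line after $a has already become 3
echo "$plain"                   # prints {1..3}
echo "$evaled"                  # prints 1 2 3
```

Since eval executes whatever text it is given, this approach is only reasonable when the contents of the variables are trusted; a C-style for loop avoids the issue entirely.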

Related

Unix bash script grep loop counter (for)

I am looping over a grep result. The result contains 10 lines (every line has different content), so the body of the loop gets executed 10 times.
I need to get the index, 0-9, on each iteration so I can do actions based on the index.
ABC=(cat test.log | grep "stuff")
counter=0
for x in $ABC
do
echo $x
((counter++))
echo "COUNTER $counter"
done
Currently the counter won't really change.
Output:
51209
120049
148480
1211441
373948
0
0
0
728304
0
COUNTER: 1
If your requirement is only to print the counter (as in the samples shown), you could use awk, if you are OK with it. A single awk command can do both the search and the counter printing, without creating a variable and then running grep as you do currently.
awk -v counter=0 '/stuff/{print "counter=" counter++}' Input_file
Replace stuff above with the actual string you are looking for, and Input_file with your actual file name.
Because counter starts at 0 and is incremented after printing, this should print something like:
counter=0
counter=1
........and so on
Your shell script contains what should be an obvious syntax error.
ABC=(cat test.log | grep "stuff")
This fails with
-bash: syntax error near unexpected token `|'
There is no need to save the output in a variable if you only want to process one at a time (and obviously no need for the useless cat).
grep "stuff" test.log | nl
gets you numbered lines, though the index will be 1-based, not zero-based.
If you absolutely need zero-based, refactoring to Awk should solve it easily:
awk '/stuff/ { print n++, $0 }' test.log
If you want to loop over this and do something more with this information,
awk '/stuff/ { print n++, $0 }' test.log |
while read -r index output; do
echo index is "$index"
echo output is "$output"
done
Because the while loop executes in a subshell the value of index will not be visible outside of the loop. (I guess that's what your real code did with the counter as well. I don't think that part of the code you posted will repro either.)
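If the counter has to survive the loop, one option is to feed the loop through process substitution instead of a pipe, so that the loop body runs in the current shell. A minimal sketch, assuming bash; the temporary file and its contents are made up to stand in for test.log:

```shell
#!/usr/bin/env bash
# Sample data standing in for test.log (made up for illustration).
log=$(mktemp)
printf 'stuff one\nother\nstuff two\n' > "$log"

count=0
while IFS= read -r line; do
    echo "$count: $line"        # zero-based index of the matching line
    count=$((count + 1))
done < <(grep 'stuff' "$log")   # process substitution: the loop is not part of a pipeline

echo "matches: $count"          # still visible here, after the loop
rm -f "$log"
```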
Do not store the result of grep in a scalar variable $ABC.
If the line of the log file contains whitespaces, the variable $x
is split on them due to the word splitting of bash.
(BTW the statement ABC=(cat test.log | grep "stuff") causes a syntax error.)
Please try something like:
readarray -t abc < <(grep "stuff" test.log)
for x in "${abc[@]}"
do
echo "$x"
echo "COUNTER $((++counter))"
done
or
readarray -t abc < <(grep "stuff" test.log)
for i in "${!abc[@]}"
do
echo "${abc[i]}"
echo "COUNTER $((i + 1))"
done
You can use the increment statement below:
counter=$(( $counter + 1));

Print character n times based on column value

I am trying to print the character "N" multiple times, based on the value of a column in several lines of a table
For example, the output of this table:
AATTGGCC A 1.7 4
CCGGTTAA T 0.8 3
AAGGTTCC G 2.4 7
Would ideally return:
NNNN
NNN
NNNNNNN
based on the value in column 4.
Currently, I have been using:
while read line
do
a=$4
printf "$0.sN\n" {a}
done < table.txt
But this only returns three 'N's, each on new lines.
Ideally I would also like to print out the letter from column 2 on the end of each line of output.
Can anyone please help me?
With bash:
while read x x x end; do
for ((i=0; i<$end; i++)); do
echo -n "N"
done
echo
done < file
Output:
NNNN
NNN
NNNNNNN
Building off the example you gave, I would change it to this:
while read line
do
a=$(echo ${line} | awk '{print $4}')
printf "N%.0s" $(seq $a)
echo
done < table.txt
It's not clear to me how you were able to get the 3 N's from your original solution (I could not reproduce), but maybe we have a different configuration. The solution I posted above is based on BASH. Inside the while loop, the first line of code,
a=$(echo ${line} | awk '{print $4}')
echoes the line and pipes it into an awk statement that prints only the fourth column; the result is saved in a variable called "a". The $(...) notation is command substitution: it runs the command inside the parentheses (in this case an echo piped into an awk command) and substitutes its output. On the second line,
printf "N%.0s" $(seq $a)
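I print out $a Ns via the printf command. Once again, the $(...) runs a command, which in this case is seq printing the numbers 1 through $a. The format N%.0s prints an N followed by zero characters of its argument, and printf reuses the format once per argument, so one N is printed for each number. With this in a script, I was able to get the following result for your example table.txt file: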
NNNN
NNN
NNNNNNN
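The question also asked for the letter from column 2 at the end of each output line, which the answers above do not show. A possible sketch in pure bash, assuming whitespace-separated columns as in the sample table. The function name n_row is hypothetical.

```shell
#!/usr/bin/env bash
# n_row: hypothetical helper; reads the table on stdin and prints N repeated
# (column 4) times, followed by the letter from column 2.
n_row() {
    local letter count pad
    while read -r _ letter _ count; do
        printf -v pad '%*s' "$count" ''           # build a string of $count spaces
        printf '%s%s\n' "${pad// /N}" "$letter"   # turn each space into N, append the letter
    done
}

n_row <<'EOF'
AATTGGCC A 1.7 4
CCGGTTAA T 0.8 3
AAGGTTCC G 2.4 7
EOF
```

For the sample table this prints NNNNA, NNNT and NNNNNNNG on separate lines.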

Column separation inside shell script

If I have file.txt with the data:
abcd!1023!92
efgh!9873!xk
and a basic tutorial.sh file which goes through each line
while read line
do
name = $line
done < $1
How do I split each line on "!" into columns, select the second column, and add its values up? (I am aware of the "sed -k 2 | bc " function, but I do not understand how to get it to work in a shell script.)
You can use awk:
awk -F '!' '{sum += $2} END{print sum}' file
10896
To adjust your while loop:
while IFS='!' read -r a b c
do
((sum += b))
done < "$1" # always quote "$vars"
echo "$sum"
IFS is the shell's "internal field separator" used for splitting strings into words. It's normally "whitespace" but you can use it for your specific needs.
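Writing IFS='!' directly in front of read changes the separator for that one command only, so the rest of the script keeps the default splitting. A sketch along the same lines that reads the fields into an array and skips rows whose second field is not a number; the function name sum_second_column is made up for illustration.

```shell
#!/usr/bin/env bash
# sum_second_column: hypothetical helper; sums field 2 of '!'-separated lines on stdin.
sum_second_column() {
    local sum=0 fields
    while IFS='!' read -r -a fields; do
        # Skip rows whose second field is not a plain non-negative integer.
        [[ ${fields[1]} =~ ^[0-9]+$ ]] || continue
        sum=$((sum + 10#${fields[1]}))   # 10# forces base 10 so leading zeros are not read as octal
    done
    echo "$sum"
}

sum_second_column <<'EOF'
abcd!1023!92
efgh!9873!xk
EOF
```

With the sample file.txt data this prints 10896, the same result as the awk command.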

Set variable in current shell from awk

Is there a way to set a variable in my current shell from within awk?
I'd like to do some processing on a file and print out some data; since I'll read the whole file through, I'd like to save the number of lines -- in this case, FNR.
However, I can't seem to find a way to set a shell variable to the value of FNR; failing that, I'd have to read FNR back from my output file to set, say, num_lines.
I've tried some combinations using awk 'END{system(...)}', but could not get it to work. Is there any way around this?
Here's another way.
This is especially useful when you've got the values of your variables in a single variable and you want to split them up. For example, you have a list of values from a single row in a database that you want to create variables out of.
val="hello|beautiful|world" # assume this string comes from a database query
read a b c <<< $( echo ${val} | awk -F"|" '{print $1" "$2" "$3}' )
echo $a #hello
echo $b #beautiful
echo $c #world
We need the 'here string', i.e. <<<, in this case because each command in a pipeline runs in a subshell: read would still receive the piped data, but the variables it set would be gone as soon as the pipeline finished.
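For a simple delimiter like this one, read can do the splitting by itself. The sketch below contrasts the pipe and the here-string, assuming bash with its default settings (that is, without shopt -s lastpipe):

```shell
#!/usr/bin/env bash
val="hello|beautiful|world"

# In a pipeline, read runs in a subshell, so a, b and c set there are lost.
unset a b c
echo "$val" | IFS='|' read -r a b c
echo "after pipe: a='${a-}'"

# A here-string keeps read in the current shell; IFS='|' splits without awk.
IFS='|' read -r a b c <<< "$val"
echo "after here-string: a='$a' b='$b' c='$c'"
```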
$ echo "$var"
$ declare $( awk 'BEGIN{print "var=17"}' )
$ echo "$var"
17
Here's why you should use declare instead of eval:
$ eval $( awk 'BEGIN{print "echo \"removing all of your files, ha ha ha....\""}' )
removing all of your files, ha ha ha....
$ declare $( awk 'BEGIN{print "echo \"removing all of your files\""}' )
bash: declare: `"removing': not a valid identifier
bash: declare: `files"': not a valid identifier
Note in the first case that eval executes whatever string awk prints, which could accidentally be a very bad thing!
You can't export variables from a subshell to its parent shell. You have some other choices, though, including:
Make another pass of the file using AWK to count records, and use command substitution to capture the result. For example:
FNR=$(awk 'END {print FNR}' filename)
Print FNR in the subshell, and parse the output in your other process.
If FNR is the same as number of lines, you can call wc -l < filename to get your count.
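The second option can be done in a single pass if awk writes the processed data to a file itself and prints only the count on stdout. A rough sketch, with made-up temporary files and toupper standing in for the real processing; NR is used because it equals FNR when there is a single input file:

```shell
#!/usr/bin/env bash
# Made-up sample input and output paths for illustration.
in=$(mktemp); out=$(mktemp)
printf 'alpha\nbeta\ngamma\n' > "$in"

# awk sends the processed lines to "$out" and prints only the record
# count on stdout, which command substitution captures.
num_lines=$(awk -v out="$out" '{ print toupper($0) > out } END { print NR }' "$in")

echo "num_lines=$num_lines"
cat "$out"
rm -f "$in" "$out"
```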
A warning for anyone trying to use declare as suggested by several answers.
eval does not have this problem.
If the awk (or other expression) provided to declare results in an empty string then declare will dump the current environment.
This is almost certainly not what you would want.
eg: if your awk pattern doesn't exist in the input you will never print an output, therefore you will end up with unexpected behaviour.
An example of this....
unset var
var=99
declare $( echo "foobar" | awk '/fail/ {print "var=17"}' )
echo "var=$var"
var=99
The current environment as seen by declare is printed
and $var is not changed
A minor change to store the value to set in an awk variable and print it at the end solves this....
unset var
var=99
declare $( echo "foobar" | awk '/fail/ {tmp="17"} END {print "var="tmp}' )
echo "var=$var"
var=
This time $var is emptied, i.e. set to the null string (var=''),
and there is no unwanted output.
To show this working with a matching pattern
unset var
var=99
declare $( echo "foobar" | awk '/foo/ {tmp="17"} END {print "var="tmp}' )
echo "var=$var"
var=17
This time $var is set to 17
and there is no unwanted output.
Make awk print out the assignment statement:
MYVAR=NewValue
Then in your shell script, eval the output of your awk script:
eval $(awk ....)
# then use $MYVAR
EDIT: people recommend using declare instead of eval, to be slightly less error-prone if something other than the assignment is printed by the inner script. It's bash-only, but it's okay when the shell is bash and the script has #!/bin/bash, correctly stating this dependency.
The eval $(...) variant is widely used, with existing programs generating output suitable for eval but not for declare (lesspipe is an example); that's why it's important to understand it, and the bash-only variant is "too localized".
To synthesize everything here so far I'll share what I find is useful to set a shell environment variable from a script that reads a one-line file using awk. Obviously a /pattern/ could be used instead of NR==1 to find the needed variable.
# export a variable from a script (such as in a .dotfile)
declare $( awk 'NR==1 {tmp=$1} END {print "SHELL_VAR=" tmp}' /path/to/file )
export SHELL_VAR
This will avoid a massive output of variables if a declare command is issued with no argument, as well as the security risks of a blind eval.
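Another way to guard against both problems is to capture the awk output first and check its shape before handing it to declare. A minimal sketch, assuming bash 4.2 or later for declare -g; the function name set_from_awk is hypothetical.

```shell
#!/usr/bin/env bash
# set_from_awk: hypothetical helper; runs awk with the given arguments and
# declares the result only if it starts with NAME= (a plain identifier).
set_from_awk() {
    local assignment
    assignment=$(awk "$@") || return 1
    # Reject empty output (which would make declare list every variable)
    # and anything that does not begin with an identifier followed by '='.
    [[ $assignment =~ ^[A-Za-z_][A-Za-z0-9_]*= ]] || return 1
    declare -g "$assignment"   # quoted: the whole string is one assignment; -g makes it global
}

var=99
set_from_awk 'BEGIN { print "var=17" }' && echo "var=$var"
set_from_awk 'BEGIN { }' || echo "no assignment produced; var is still $var"
```

The first call sets var to 17; the second produces no output, so the function returns non-zero and var is left alone.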
echo "First arg: $1"
for ((i=0 ; i < $1 ; i++)); do
echo "inside"
echo "Welcome $i times."
cat man.xml | awk '{ x[NR] = $0 } END { for ( i=2 ; i<=NR ; i++ ) { if (x[i] ~ // ) {x[i+1]=" '$i'"}print x[i] }} ' > $i.xml
done
echo "completed"

In bash, how can I print the first n elements of a list?

In bash, how can I print the first n elements of a list?
For example, the first 10 files in this list:
FILES=$(ls)
UPDATE: I forgot to say that I want to print the elements on one line, just like when you print the whole list with echo $FILES.
FILES=(*)
echo "${FILES[@]:0:10}"
Should work correctly even if there are spaces in filenames.
FILES=$(ls) creates a string variable. FILES=(*) creates an array. See this page for more examples on using arrays in bash. (thanks lhunath)
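The same array slice works with a variable count. A small sketch, assuming bash; the function name first_n is made up for illustration:

```shell
#!/usr/bin/env bash
# first_n: hypothetical helper; prints the first N of its remaining arguments on one line.
first_n() {
    local n=$1
    shift
    local items=("$@")
    echo "${items[@]:0:n}"   # the length part is an arithmetic expression, so a bare n works
}

first_n 2 "a file.txt" b.txt c.txt   # prints: a file.txt b.txt
```

Calling it as first_n 10 * would print the first ten names in the current directory on one line.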
Why not just this to print the first 50 files:
ls -1 | head -50
FILE="$(ls | head -1)"
Handled spaces in filenames correctly too when I tried it.
My way would be:
ls | head -10 | tr "\n" " "
This will print the first 10 lines returned by ls, and then tr replaces all line breaks with spaces. Output will be on a single line.
echo $FILES | awk '{for (i = 1; i <= 10; i++) {print $i}}'
Edit: AAh, missed your comment that you needed them on one line...
echo $FILES | awk '{for (i = 1; i <= 10; i++) {printf "%s ", $i}}'
That one does that.
to do it interactively:
set $FILES && eval eval echo \\\${1..10}
to run it as a script, create foo.sh with contents
N=$1; shift; eval eval echo \\\${1..$N}
and run it as
bash foo.sh 10 $FILES
An addition to the answer of "Ayman Hourieh" and "Shawn Chin", in case it is needed for something else than content of a directory.
In newer versions of bash (4.0 and later) you can use mapfile to store the directory listing in an array. See help mapfile
mapfile -t files_in_dir < <( ls )
If you want it completely in bash use printf "%s\n" * instead of ls, or just replace ls with any other command you need.
Now you can access the array as usual and get the data you need.
First element:
${files_in_dir[0]}
Last element (do not forget the space after ":"):
${files_in_dir[@]: -1}
Range, e.g. indices 10 to 20 (the second number is a length, not an end index):
${files_in_dir[@]:10:11}
Be careful with large directories: this uses far more memory than the other solutions.
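In these array expansions the two numbers are an offset and a length. A short sketch with a made-up array to show the difference:

```shell
#!/usr/bin/env bash
letters=(a b c d e f g h)

echo "${letters[@]:2:3}"       # offset 2, length 3: c d e
echo "${letters[@]: -1}"       # last element: h (the space keeps ':-' from being read as a default-value expansion)
echo "${letters[@]:2:5-2+1}"   # indices 2 through 5 inclusive, written as end-start+1: c d e f
```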
FILES=$(ls)
echo $FILES | fmt -1 | head -10
