Column separation inside shell script - bash

If I have file.txt with the data:
abcd!1023!92
efgh!9873!xk
and a basic tutorial.sh file which goes through each line
while read line
do
name = $line
done < $1
How do I separate the data between the "!" into columns, select the second column, and add up its values? (I am aware of the "sed -k 2 | bc" approach, but I do not understand how to get it to work within a shell script.)

You can use awk:
awk -F '!' '{sum += $2} END{print sum}' file
10896

To adjust your while loop:
while IFS='!' read -r a b c
do
((sum += b))
done < "$1" # always quote "$vars"
echo "$sum"
IFS is the shell's "internal field separator", used for splitting strings into words. By default it contains whitespace (space, tab, and newline), but you can set it to suit your specific needs.
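To see the splitting in isolation, here is a minimal sketch. The helper name split_first is hypothetical and not part of the answer above. It illustrates two points: the IFS='!' prefix applies only to that one read command, and when there are fewer variables than fields, the last variable receives the unsplit remainder.

```shell
# split_first (hypothetical helper): print the first '!'-separated field of $1,
# then the unsplit remainder, one per line.
split_first() {
    # IFS='!' applies only to this read; the shell's own IFS is left alone.
    IFS='!' read -r first rest <<EOF
$1
EOF
    printf '%s\n%s\n' "$first" "$rest"
}

# Prints "abcd" and then "1023!92".
split_first 'abcd!1023!92'
```

Adding a third variable, as in the loop above, splits the remainder one step further.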

Related

How to filter text data in bash more efficiently

I have a data file that I need to filter with a bash script. Here is a sample of the data:
name=pencils
name=apples
value=10
name=rocks
value=3
name=tables
value=6
name=beds
name=cups
value=89
I need to group name/value pairs like so: apples=10. If the current line starts with name and the next line also starts with name, the first line should be omitted entirely. So the result file should look like this:
apples=10
rocks=3
tables=6
cups=89
I came up with this simple solution, which works but is very slow: it takes 5 minutes to complete for a file with 2000 lines.
VALUES=$(cat input.txt)
for x in $VALUES; do
if [[ -n $(echo $x | grep 'name=') ]]; then
name=$(echo $x | sed "s/name=//")
elif [[ -n $(echo $x | grep 'value=') ]]; then
value=$(echo $x | sed "s/value=//")
echo "${name}=${value}" >> output.txt
fi
done
I'm aware that this kind of task is not very suitable for bash, but the script is already written and this is just a small part of it.
How can I optimize this task in bash?
Do not run any commands in subshells; it slows your script down a lot. You can do everything in the current shell.
#! /bin/bash
while IFS== read k v ; do
if [[ $k == name ]] ; then
name=$v
elif [[ $k == value ]] ; then
printf '%s=%s\n' "$name" "$v"
fi
done
There are three easy optimizations you can make that will greatly speed up the script without requiring a major rethink.
1. Replace for with while read
Loading input.txt into a string, and then looping over that string with for x in $VALUES is slow. It requires the whole file to be read into memory even though this task could be done in a streaming fashion, reading a line at a time.
A common replacement for for line in $(cat file) is while read line; do ... done < file. It turns out that loops are compound commands, and like the normal one-line commands we're used to, compound commands can have < and > redirections. Redirecting a file into a loop means that for the duration of the loop, stdin comes from the file. So if you call read line inside the loop then it will read one line each iteration.
while IFS= read -r x; do
if [[ -n $(echo $x | grep 'name=') ]]; then
name=$(echo $x | sed "s/name=//")
elif [[ -n $(echo $x | grep 'value=') ]]; then
value=$(echo $x | sed "s/value=//")
echo "${name}=${value}" >> output.txt
fi
done < input.txt
2. Redirect output outside loop
It's not just input that can be redirected. We can do the same thing for the >> output.txt redirection. Here's where you'll see the biggest speedup. When >> output.txt is inside the loop output.txt must be opened and closed every iteration, which is crazy slow. Moving it to the outside means it only needs to be opened once. Much, much faster.
while IFS= read -r x; do
if [[ -n $(echo $x | grep 'name=') ]]; then
name=$(echo $x | sed "s/name=//")
elif [[ -n $(echo $x | grep 'value=') ]]; then
value=$(echo $x | sed "s/value=//")
echo "${name}=${value}"
fi
done < input.txt > output.txt
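As a small self-contained illustration of the point above, the following sketch writes the same three lines twice, once with the redirection inside the loop and once with it on the loop itself. The scratch files created with mktemp are purely illustrative.

```shell
# Scratch files for the comparison (illustrative only).
inside=$(mktemp)
outside=$(mktemp)

# Redirection inside the loop: the file is reopened for appending on every iteration.
for n in 1 2 3; do
    echo "line $n" >> "$inside"
done

# Redirection on the loop itself: the file is opened once for the whole loop.
for n in 1 2 3; do
    echo "line $n"
done > "$outside"

# The contents are the same; only the number of open/close operations differs.
cmp "$inside" "$outside" && echo "identical output"
```

To measure the difference on your own machine, raise the iteration count and time each loop separately.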
3. Shell string processing
One final improvement is to use faster string processing. Calling grep and sed requires forking a subprocess every time just to do a simple string split. It'd be a lot faster if we could do the string splitting using just shell constructs. Well, as it happens that's easy now that we've switched to read. read can do more than read whole lines; it can also split on a delimiter from the variable $IFS (internal field separator).
while IFS='=' read -r key value; do
case "$key" in
name) name="$value";;
value) echo "$name=$value";;
esac
done < input.txt > output.txt
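Another in-shell option, sketched here as an alternative rather than as part of the answer above, is prefix stripping with parameter expansion. ${x#name=} is a direct replacement for the sed "s/name=//" calls and needs no subprocess. The function name filter_pairs is hypothetical.

```shell
# filter_pairs (hypothetical helper): read name=/value= lines on stdin and
# print one "name=value" line for each value line.
filter_pairs() {
    name=
    while IFS= read -r x; do
        case $x in
            name=*)  name=${x#name=} ;;                             # strip the "name=" prefix
            value=*) printf '%s=%s\n' "$name" "${x#value=}" ;;      # strip "value=" and print the pair
        esac
    done
}

# Prints "apples=10" and "rocks=3".
filter_pairs <<'EOF'
name=pencils
name=apples
value=10
name=rocks
value=3
EOF
```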
Further reading
BashFAQ/001 - How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
This explains why I have IFS= read -r in the first two iterations.
BashFAQ/024 - I set variables in a loop that's in a pipeline. Why do they disappear after the loop terminates? Or, why can't I pipe data to read?
cmd | while read; do ... done is another popular use of while read, but it has unique pitfalls.
BashFAQ/100 - How do I do string manipulations in bash?
More in-shell string processing options.
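The pipeline pitfall mentioned under BashFAQ/024 can be reproduced with a short sketch. It assumes bash with default settings (or another shell that runs every pipeline element in a subshell); the variable names are illustrative.

```shell
# Loop fed by a pipeline: the loop runs in a subshell, so its changes to count are lost.
count=0
printf 'a\nb\n' | while IFS= read -r line; do
    count=$((count + 1))
done
piped=$count
echo "after pipeline: $piped"

# Loop fed by a redirection: the loop runs in the current shell, so count survives.
count=0
while IFS= read -r line; do
    count=$((count + 1))
done <<'EOF'
a
b
EOF
redirected=$count
echo "after redirection: $redirected"
```

Under those assumptions the first echo reports 0 and the second reports 2, which is why the loops above use done < input.txt rather than cat input.txt | while read.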
If you have performance issues do not use bash at all. Use a text processing tool like, for instance, awk:
$ awk -F= '$1 == "value" {print name "=" $2} {name = $2}' data.txt
apples=10
rocks=3
tables=6
cups=89
Explanation: -F= defines the field separator as character =. The first block is executed only if the first field of a line ($1) is equal to string value. It prints variable name followed by character = and the second field ($2). The second block is executed on each line and it stores the second field ($2) in variable name.
Normally, if your input resembles what you show, this should automatically skip the first line. Otherwise, we can exclude it explicitly using a test on the NR variable, whose value is the current line number, starting at 1:
awk -F= 'NR != 1 && $1 == "value" {print name "=" $2}
NR != 1 {name = $2}' data.txt
All this works on inputs like the one you show but not on inputs where you would have other types of lines or several value=... consecutive lines. If you really want to test that the name/value pair is on two consecutive lines we need something more. For instance, test if the first field is name and use another variable n to store the line number of the last encountered name=... line. With all these tests we can now put the 2 blocks in a slightly more intuitive order (but the opposite would work the same):
awk -F= 'NR != 1 && $1 == "name" {name = $2; n = NR}
NR != 1 && NR == n+1 && $1 == "value" {print name "=" $2}' data.txt
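To check the "consecutive lines" behaviour on an input with stray lines, here is a sketch of a variant. It is not the command above: it replaces the NR != 1 tests with a check that a name line has already been seen (n is non-zero), so that a pair starting on the very first line is also printed. The function name strict_pairs is hypothetical.

```shell
# strict_pairs (hypothetical helper): print name=value only when the value line
# immediately follows a name line. Reads stdin.
strict_pairs() {
    awk -F= '
        $1 == "name"                      { name = $2; n = NR }     # remember the name and its line number
        $1 == "value" && n && NR == n + 1 { print name "=" $2 }     # value directly after a name
    '
}

# Prints "apples=10" and "cups=89"; the second value line and the value
# after the comment line are skipped.
strict_pairs <<'EOF'
name=apples
value=10
value=11
name=rocks
comment=ignored
value=3
name=cups
value=89
EOF
```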
With awk there might be a more elegant solution but you can have:
awk 'BEGIN{RS="\n?name=";FS="\nvalue="} {if($2) printf "%s=%s\n",$1,$2}' inputs.txt
RS="\n?name=" says that the record separator is name=, optionally preceded by a newline
FS="\nvalue=" says that the field separator within each record is a newline followed by value=
if($2) says to run the printf only if the second field exists

BASH Script : Explode string and save to file

I need some help. I have this
info.txt
[test.local]
user=test
group=test
;
[game.local]
user=game
group=game
;
This is my objective: I want the content to be split at each ; and each part put in a file whose name is based on the value inside [ ],
like this
test.local.txt
[test.local]
user=test
group=test
game.local.txt
[game.local]
user=game
group=game
And here is my current code, files.sh:
#!/bin/bash
value=$(<info.txt)
SAVEIFS=$IFS
IFS=$';'
val=($value)
IFS=$SAVEIFS
for (( i=0; i<${#val[@]}; i++ ))
do
echo "${val[$i]}"
done
At this point I only have the array and I'm stuck. How can I achieve this?
You may use this gnu-awk command:
awk -v RS=';\n' 'NF{f=$1; gsub(/[][]/, "", f); printf "%s", $0 > (f ".txt")}' info.txt
Details:
-v RS=';\n': sets input record separator to ; followed by newline
NF{...}: Execute only for non-empty records
f=$1: Save $1 which is [...] line in variable f
gsub(/[][]/, "", f): Removes [ and ] from variable f
printf "%s", $0 > (f ".txt"): Writes the current record to a file whose name is the value of f followed by ".txt"
You can use
fn=$(echo "${val[$i]}"|head -n 1| tr -d '[]').txt
to find the name of the file to create and
echo "${val[$i]}"|tail -n +2
to produce the content of the file.
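For comparison, a line-by-line sketch in plain shell that needs neither the array nor gawk is shown below. It assumes that each section starts with a [name] line and ends with a line containing only ;. The helper name split_sections and its two parameters are hypothetical.

```shell
# split_sections (hypothetical helper): $1 = input file, $2 = output directory.
split_sections() {
    out=
    while IFS= read -r line; do
        case $line in
            ';')
                out= ;;                              # end of section: stop writing
            '['*']')
                name=${line#\[}                      # strip the leading [
                name=${name%\]}                      # strip the trailing ]
                out=$2/$name.txt
                printf '%s\n' "$line" > "$out" ;;    # start the file with the header line
            *)
                if [ -n "$out" ]; then
                    printf '%s\n' "$line" >> "$out"  # append body lines to the current file
                fi ;;
        esac
    done < "$1"
}

# Demo in a scratch directory (illustrative only).
dir=$(mktemp -d)
cat > "$dir/info.txt" <<'EOF'
[test.local]
user=test
group=test
;
[game.local]
user=game
group=game
;
EOF
split_sections "$dir/info.txt" "$dir"
cat "$dir/game.local.txt"
```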

Bash - Transpose a single field keeping the rest same and repeat it across

I have a file with pipe separated fields.
eg.
1,2,3|xyz|abc
I need the output in below format:
1|xyz|abc
2|xyz|abc
3|xyz|abc
I have a working code in bash:
while read i
do
f1=`echo $i | cut -d'|' -f1`
f2=`echo $i | cut -d'|' -f2-`
echo $f1 | tr ',' '\n' | sed "s:$:|$f2:" >> output.txt
done < pipe_delimited_file.txt
Can anyone suggest a way to achieve this without using a loop?
The file contains a large number of records.
Uses a loop, but it's inside awk, so very fast:
awk -F\| 'BEGIN{OFS="|"}{n = split($1, a, ","); $1=""; for(i=1; i<=n; i++) {print a[i] $0}}' pipe_delimited_file.txt
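The one-liner relies on awk rebuilding $0 with OFS whenever a field is assigned. A minimal sketch of just that step follows; the function name drop_first is hypothetical.

```shell
# drop_first (hypothetical helper): blank out the first '|'-separated field.
# Assigning to $1 makes awk rebuild $0, joining the fields with OFS.
drop_first() {
    awk -F'|' 'BEGIN { OFS = "|" } { $1 = ""; print }'
}

# Prints "|xyz|abc": the leading separator stays, ready to be prefixed by a[i].
echo '1,2,3|xyz|abc' | drop_first
```

Without the OFS assignment, the rebuilt record would be joined with single spaces instead of pipes.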
Perl may be a bit faster than awk:
perl -F'[|]' -ane 'for $n (split /,/, $F[0]) {$F[0] = $n; print join "|", @F}' file
bash is very slow, but here's a quicker way to use it. This uses plain bash without calling any external programs:
( # in a subshell:
IFS=, # use comma as field separator
set -f # turn off filename generation
while IFS='|' read -r f1 rest; do # temporarily using pipe as field separator,
# read first field and rest of line
for word in $f1; do # iterate over comma-separated words
echo "$word|$rest"
done
done
) < file

Is there any better solution to reverse uniq count

I want to reverse the uniq -c output from:
1 hi
2 test
3 try
to:
hi
test
test
try
try
try
My solution now is to use a loop:
while read a b; do yes $b |head -n $a ;done <test.txt
I wonder if there are any simpler commands to achieve that?
another awk
awk '{while ($1--) print $2}' file
Here's another solution
echo "\
1 hi
2 test
3 try" | awk '{for(i=1;i<=$1;i++){print($2)}}'
output
hi
test
test
try
try
try
This will work the same way, with
awk '{for(i=1;i<=$1;i++){print($2)}}' uniq_counts.txt
The core of the script is, of course,
{ for (i=1;i<=$1;i++) { print $2 } }
where awk has parsed the input into 2 fields, $1 being the number (1,2,3) and $2 is the value, (hi,test,try).
The condition i<=$1 tests that the counter i has not incremented beyond the count supplied in field $1, and the print $2 prints the value each time that i<=$1 condition is true.
IHTH
You don't need awk or any other command, you can do it entirely in bash
while read n s
do
for ((i=0; i<n; i++))
do
echo $s;
done
done < test.txt
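A variant of the same loop, sketched under the assumption that the value may contain spaces (as real uniq -c output can): read puts everything after the count into the last variable, and quoting it on output keeps it intact. The function name expand_counts is hypothetical.

```shell
# expand_counts (hypothetical helper): read "count value" lines on stdin and
# print each value count times. Leading blanks, as produced by uniq -c, are
# stripped by read's default word splitting.
expand_counts() {
    while read -r count value; do
        i=0
        while [ "$i" -lt "$count" ]; do
            printf '%s\n' "$value"      # quoted, so inner spaces survive
            i=$((i + 1))
        done
    done
}

# Prints "hello world" twice, then "hi" once.
expand_counts <<'EOF'
      2 hello world
      1 hi
EOF
```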
My solution uses bash brace expansion and the printf builtin.
while read a b
do
eval printf "'%.${#b}s\n'" "'$b'"{1..$a}
done <test.txt
The following simple example
printf '%s\n' test{1..2}
prints two lines which contain the string test followed by a number:
test1
test2
but we can specify the exact number of characters to print by using the precision field of the printf command:
printf '%.4s\n' test{1..2}
to display:
test
test
The number of characters to print is given by the length of the text to print (${#b}).
Finally, the eval command must be used in order to use variables in the brace expansion.
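The reason eval is needed can be shown with a short bash-specific sketch (brace expansion is not part of POSIX sh). Bash performs brace expansion before parameter expansion, so {1..$a} is not a valid sequence at the time braces are processed. The variable names are illustrative.

```shell
# bash-specific: brace expansion happens before $a is expanded.
a=3

plain=$(echo {1..$a})             # braces are scanned first and left alone, then $a becomes 3
evaled=$(eval "echo {1..$a}")     # eval re-parses the text after $a has already become 3

echo "without eval: $plain"       # without eval: {1..3}
echo "with eval:    $evaled"      # with eval:    1 2 3
```

Since eval executes whatever ends up in the string, this approach is only reasonable when the count is known to be a plain number.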

In Bash, how to extract a word and a following number from a file?

I've got a list which has many entries of two different formats:
Generated Request {some text} easy level group X
---or---
easy level group X {some text}
where X is a number between 1-6 digits long.
I'm trying to go through that file line by line and reduce down everything to just "group X" on each line (so that I can then compare it to another file).
I'll post my attempt below so you can join me in laughing at it, but I'm just picking up the basics of bash, awk and sed, so I apologize now for this assault on good scripting...
for line in $(< abc.txt);do
if [ ${line:0:2} == "Ge" ] then
awk '{print $8,$9}' $line >> allgood.txt
elif [ ${line:0:2} == "ea" ] then
awk '{print $3,$4}' $line >> allgood.txt
fi
done
The attempted logic was, if it starts with "Ge", then extract phrases $8 and $9 and append to a file. If it starts with "ea", then extract phrases $3 and $4 and append to the same file. However, this doesn't work at all.
Any thoughts?
The simplest approach for this problem is to use grep:
grep -o 'group [0-9]*' file
The -o option displays only the matching part of the line.
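A slightly stricter sketch of the same idea: the pattern [0-9]* also matches zero digits, so a line containing "group " followed by a non-digit would yield a bare "group ". Using an extended regular expression with + requires at least one digit. The function name extract_groups and the sample lines are hypothetical, and -o is assumed to be available (it is in GNU and BSD grep).

```shell
# extract_groups (hypothetical helper): print every "group <number>" match on stdin.
extract_groups() {
    grep -oE 'group [0-9]+'
}

# Prints "group 42" and "group 123456"; the third line has no digits after
# "group " and produces no output.
extract_groups <<'EOF'
Generated Request some text easy level group 42
easy level group 123456 some text
a line about group theory
EOF
```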
You never have to use bash to loop over every line in a file and then pass each line to awk, because this is exactly how awk works: it iterates over each line and applies the relevant blocks. Here is an approach using your logic in pure awk:
awk '/^Ge/{print $8,$9}/^ea/{print $3,$4}' file
You can do this with "while read" and avoid awk if you prefer:
while read a b c d e f g h i; do
if [ ${a:0:2} == "Ge" ]; then
echo $h $i >> allgood.txt;
elif [ ${a:0:2} == "ea" ]; then
echo $c $d >> allgood.txt;
fi;
done < abc.txt
The letters represent each column, so you'll need as many as you have columns. After that you just output the letters you need.
