Efficient way of indexing a specific number from a text file - bash

I have a text file containing a line of various numbers (e.g. 2 4 1 7 12 1 4 4 3 1 1 2).
I'm trying to get the index of each occurrence of 1. This is the code I'm currently using (subtracting 1 from each count, since my indexing starts at 0).
eq='1'
gradvec=()
count=0
length=0
for item in `cat file`
do
((count++))
if (("$item"=="$eq"))
then
((length++))
if (("$length"=='1'))
then
gradvec=$((count -1))
else
gradvec=$gradvec' '$((count - 1))
fi
fi
done
Although the code works, I was wondering if there was a shorter way of doing this? The result is the gradvec variable being
2 5 9 10

Consider this as the input file:
$ cat file
2 4 1 7 12 1
4 4 3 1 1 2
To get the indices of every occurrence of 1 in the input file:
$ awk '$1==1 {print NR-1}' RS='[[:space:]]+' file
2
5
9
10
How it works:
$1==1 {print NR-1}
If the value in any record is 1, print the record number minus 1.
RS='[[:space:]]+'
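Define the record separator as one or more of any kind of space, so that every number becomes its own record. Note that a regular expression as RS is supported by gawk and mawk; POSIX leaves the behavior of a multi-character RS unspecified.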
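If the result is needed on a single line in a shell variable, as with gradvec in the question, one option is to let awk loop over the fields itself, which also avoids depending on a regex RS. This is only a minimal sketch: the sample data is inlined, and the names input, target, idx and sep are illustrative, not from the original answer.

```shell
#!/usr/bin/env bash
# Sample data standing in for the contents of "file" (assumption).
input='2 4 1 7 12 1
4 4 3 1 1 2'

# Visit every field of every line, keeping a running zero-based index (idx).
gradvec=$(printf '%s\n' "$input" | awk -v target=1 '
    BEGIN { idx = 0 }
    {
        for (i = 1; i <= NF; i++) {
            if ($i == target) { out = out sep idx; sep = " " }
            idx++
        }
    }
    END { print out }')

echo "$gradvec"
```

Changing -v target=1 selects a different value to look for.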

Related

Cannot print in awk command in bash script

I am trying to read values from a file and print specific items into a variable which I will use later.
cat /dir1/file1 | while read blmbline2
do
BLMBFILE2=`print $blmbline2 | awk '{$1=""; print $0}'`
echo $BLMBFILE2
done
When I run that same code at the command line, it runs as expected, but, when I run it in a bash script called testme.sh, I get this error:
./testme.sh: line 3: print: command not found
If I run print by itself at the command prompt, I don't get an error (just a blank line).
If I run "bash" and then print at the command prompt, I get command not found.
I can't figure out what I'm doing wrong. Can someone suggest?
updated: I see some other posts that say to use echo or printf? Is there a difference I need to be concerned with in using one of those in bash?
Since awk can read files, you may be able to do away with the cat | while read and just use awk. Using a sample file containing:
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
Declare your bash array variable and populate with the output from awk:
arr=() ; arr=($(awk '{$1=""; print $0}' /dir1/file1))
Use the following to display array size and contents:
printf "array length: %d\narray contents: %s\n" "${#arr[@]}" "${arr[*]}"
Output:
array length: 30
array contents: 2 3 4 5 6 2 3 4 5 6 2 3 4 5 6 2 3 4 5 6 2 3 4 5 6 2 3 4 5 6
Change print to echo in your shell script: print is a ksh builtin that bash does not provide. echo writes its arguments followed by a newline, while printf lets you control the output format. Also, create an array so you can store multiple items:
BLMBFILE2=()
while IFS= read -r line
do
    # word splitting is intentional here: each remaining field becomes one array element
    BLMBFILE2+=($(echo "$line" | awk '{$1=""; print $0}'))
    echo "${BLMBFILE2[@]}"
done < /dir1/file1
echo "Items found:"
for value in "${BLMBFILE2[@]}"
do
    echo "$value"
done
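For this particular task the awk call may not be needed at all, because read can split off the first field itself. Here is a rough sketch, assuming a throwaway sample file in place of /dir1/file1; the names sample, first, rest and rest_of_lines are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample data standing in for /dir1/file1.
sample=$(mktemp)
printf '%s\n' 'a 2 3' 'b 5 6' > "$sample"

rest_of_lines=()
# read stores the first word in "first" and the remainder of the line in "rest".
while read -r first rest; do
    rest_of_lines+=("$rest")
done < "$sample"
rm -f "$sample"

# printf repeats its format for every argument: one stored line per output line.
printf '%s\n' "${rest_of_lines[@]}"
```

Unlike the word-splitting approach, this keeps one array element per input line.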

Replace the nth field of every mth line using awk or bash

For a file that contains entries similar to as follows:
foo 1 6 0
fam 5 11 3
wam 7 23 8
woo 2 8 4
kaz 6 4 9
faz 5 8 8
How would you replace the nth field of every mth line with the same element using bash or awk?
For example, if n = 1 and m = 3 and the element = wot, the output would be:
foo 1 6 0
fam 5 11 3
wot 7 23 8
woo 2 8 4
kaz 6 4 9
wot 5 8 8
I understand you can call / print every mth line using e.g.
awk 'NR%7==0' file
So far I have tried to keep this in memory but to no avail... I need to keep the rest of the file as well.
I would prefer answers using bash or awk, but sed solutions would also be helpful. I'm a beginner in all three. Please explain your solution.
awk -v m=3 -v n=1 -v el='wot' 'NR % m == 0 { $n = el } 1' file
Note, however, that the inter-field whitespace is not guaranteed to be preserved as-is, because awk splits a line into fields by any run of whitespace; as written, the output fields of modified lines will be separated by a single space.
If your input fields are consistently separated by 2 spaces, however, you can effectively preserve the input whitespace by adding -F'  ' -v OFS='  ' (two spaces between each pair of quotes) to the awk invocation.
-v m=3 -v n=1 -v el='wot' defines Awk variables m, n, and el
NR % m == 0 is a pattern (condition) that evaluates to true for every m-th line.
{ $n = el } is the associated action that replaces the nth field of the input line with variable el, causing the line to be rebuilt, implicitly using OFS, the output-field separator, which defaults to a space.
1 is a common Awk shorthand for printing the (possibly modified) input line at hand.
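The pieces described above can be tried end to end on the question's sample data. The sketch below writes that data to a temporary file and adds one thing the original command does not have: an NF >= n guard, so that lines with fewer than n fields are left untouched. The variable names sample and result are illustrative.

```shell
#!/usr/bin/env bash
# Temporary copy of the sample input from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
foo 1 6 0
fam 5 11 3
wam 7 23 8
woo 2 8 4
kaz 6 4 9
faz 5 8 8
EOF

# NF >= n is an added guard (an assumption, not part of the command above):
# assigning to a field beyond NF would extend the line with empty fields.
result=$(awk -v m=3 -v n=1 -v el='wot' \
    'NR % m == 0 && NF >= n { $n = el } 1' "$sample")
rm -f "$sample"

printf '%s\n' "$result"
```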
Great little exercise. While I would probably lean toward an awk solution, in bash you can also rely on parameter expansion with substring replacement to replace the nth field of every mth line. Essentially, you can read every line, preserving whitespace, then check your line count, e.g. if c is your line counter and m your variable for mth line, you could use:
if (( $((c % m )) == 0)) ## test for mth line
If the line is a replacement line, you can read each word into an array after restoring default word-splitting and then use your array element index n-1 to provide the replacement (e.g. ${line/find/replace} with ${line/"${array[$((n-1))]}"/replace}).
If it isn't a replacement line, simply output the line unchanged. A short example could be similar to the following (to which you can add additional validations as required)
#!/bin/bash
[ -n "$1" -a -r "$1" ] || { ## filename given and readable
printf "error: insufficient or unreadable input.\n"
exit 1
}
n=${2:-1} ## variables with default n=1, m=3, e=wot
m=${3:-3}
e=${4:-wot}
c=1 ## line count
while IFS= read -r line; do
if (( $((c % m )) == 0)) ## test for mth line
then
IFS=$' \t\n'
a=( $line ) ## split into array
IFS=
echo "${line/"${a[$((n-1))]}"/$e}" ## nth replaced with e
else
echo "$line" ## otherwise just output line
fi
((c++)) ## advance counter
done <"$1"
Example Use/Output
n=1, m=3, e=wot
$ bash replmn.sh dat/repl.txt
foo 1 6 0
fam 5 11 3
wot 7 23 8
woo 2 8 4
kaz 6 4 9
wot 5 8 8
n=1, m=2, e=baz
$ bash replmn.sh dat/repl.txt 1 2 baz
foo 1 6 0
baz 5 11 3
wam 7 23 8
baz 2 8 4
kaz 6 4 9
baz 5 8 8
n=3, m=2, e=99
$ bash replmn.sh dat/repl.txt 3 2 99
foo 1 6 0
fam 5 99 3
wam 7 23 8
woo 2 99 4
kaz 6 4 9
faz 5 99 8
An awk solution is shorter (and avoids replacing the wrong text when the value of the nth field also occurs earlier in $line), but both would need similar validation of field existence, etc. Learn from both and let me know if you have any questions.
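The duplicate-value problem can also be sidestepped in bash by assigning to the array element and rejoining the array, at the cost of collapsing the original whitespace to single spaces (the same trade-off as the awk version). This is a sketch under those assumptions; replace_field is a hypothetical helper name, not part of the script above.

```shell
#!/usr/bin/env bash
# replace_field n m element  (reads lines from stdin)
replace_field() {
    local n=$1 m=$2 e=$3 c=1 line
    local -a a
    while IFS= read -r line; do
        if (( c % m == 0 )); then
            read -r -a a <<< "$line"            # split the line into fields
            (( n <= ${#a[@]} )) && a[n-1]=$e    # replace only if field n exists
            printf '%s\n' "${a[*]}"             # rejoin with single spaces
        else
            printf '%s\n' "$line"               # other lines pass through unchanged
        fi
        (( c++ ))
    done
}

# The third field is replaced even though its value also appears as the second field.
printf '%s\n' 'fam 11 11 3' | replace_field 3 1 99
```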

sort a line with bunch of numbers

I have a line that goes like:
string 2 2 3 3 1 4
where the 2nd, 4th and 6th columns represent an ID (assuming each ID number is unique) and 3rd, 5th and 7th columns represent some data associated with respective ID.
How can I re-arrange the line so that it will be sorted by the ID?
string 1 4 2 2 3 3
Note: a line may have any number of IDs, unlike the example.
Using shell script, I'm thinking something like
while read n
do
echo $(echo $n | sort -k (... stuck here) )
done < infile
Another bash alternative which does not rely on how many ids there are:
#!/usr/bin/env bash
x='string 2 2 3 3 1 4'
out="${x%% *}"
in=($x)
for (( i = 1; i < ${#in[*]}; i += 2 ))
do
new[${in[i]}]=${in[i+1]}
done
for i in ${!new[@]}
do
out="$out $i ${new[i]}"
done
echo $out
You can put a loop around the lot if you then want to read a file
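One way to put a loop around the lot is to move the logic into a function and call it once per line. The following is a sketch rather than the answerer's own code: sort_pairs and its local variable names are hypothetical, and the input lines are supplied inline instead of being read from a file.

```shell
#!/usr/bin/env bash
# sort_pairs "label id value id value ..." prints the pairs ordered by id.
sort_pairs() {
    local -a fields new=()
    local out i
    read -r -a fields <<< "$1"
    out=${fields[0]}
    for (( i = 1; i < ${#fields[@]}; i += 2 )); do
        new[${fields[i]}]=${fields[i+1]}    # the id becomes the array index
    done
    # Indexed arrays expand their keys in ascending numeric order.
    for i in "${!new[@]}"; do
        out+=" $i ${new[i]}"
    done
    printf '%s\n' "$out"
}

result=$(printf '%s\n' 'string 2 2 3 3 1 4' 'other 5 1 20 9 3 7' |
    while IFS= read -r line; do sort_pairs "$line"; done)
printf '%s\n' "$result"
```

To process a file, the printf pipeline can be replaced by redirecting the loop's input, e.g. done < infile. As in the original, this relies on the ids being unique non-negative integers.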
I'll add a gawk solution to your long list of options.
This is a standalone script:
#!/usr/bin/env gawk -f
{
line=$1
# Collect the tuples into values of an array,
for (i=2;i<NF;i+=2) a[i]=$i FS $(i+1)
# This sorts the array "a" by value, numerically, ascending...
asort(a, a, "@val_num_asc")
# And this for loop gathers the result (asort() reindexes the array from 1).
for (i=1; i<=length(a); i++) line=line FS a[i]
# Finally, print the line,
print line
# and clear the array for the next round.
delete a
}
This works by copying your tuples into an array, sorting the array, then reassembling the sorted tuples in a for loop that prints the array elements.
Note that it's gawk-only (not traditional awk) because of the use of asort().
$ cat infile
string 2 2 3 3 1 4
other 5 1 20 9 3 7
$ ./sorttuples infile
string 1 4 2 2 3 3
other 3 7 5 1 20 9
As a bash script this can be done with:
Code:
#!/usr/bin/env bash
# send field pairs as separate lines
function emit_line() {
while [ $# -gt 0 ] ; do
echo "$1" "$2"
shift; shift
done
}
# break the line into pieces and send to sort
function sort_line() {
echo $1
shift
emit_line $* | sort
}
# loop through the lines in the file and sort by key-value pairs
while read n; do
echo $(sort_line $n)
done < infile
File infile:
string 2 2 3 3 1 4
string 2 2 0 3 4 4 1 7
string 2 2 0 3 2 1
Output:
string 1 4 2 2 3 3
string 0 3 1 7 2 2 4 4
string 0 3 2 1 2 2
Update:
Cribbing the sort from grail's version, to remove the (much slower) external sort:
function sort_line() {
    local line="$1" i
    local -a data=()
    shift
    while [ $# -gt 0 ] ; do
        data[$1]=$2
        shift; shift
    done
    # indexed arrays expand their keys in ascending numeric order
    for i in "${!data[@]}"; do
        line="$line $i ${data[i]}"
    done
    echo "$line"
}
while read n; do
sort_line $n
done < infile
You can use python for this. This function breaks the line up into a list of (key, value) tuples that can then be sorted. itertools.chain is then used to re-assemble the key-value pairs.
Code:
import itertools as it
def sort_line(line):
    # split the line on white space
    x = line.split()
    # make a tuple of key value pairs
    as_tuples = [tuple(x[i:i+2]) for i in range(1, len(x), 2)]
    # sort the tuples, and flatten them with chain
    sorted_kv = list(it.chain(*sorted(as_tuples)))
    # join the results back into a string
    return ' '.join([x[0]] + sorted_kv)
Test Code:
data = [
    "string 2 2 3 3 1 4",
    "string 2 2 0 3 4 4 1 7",
]
for line in data:
    print(sort_line(line))
Results:
string 1 4 2 2 3 3
string 0 3 1 7 2 2 4 4

How to sequence lines in files if some lines are strings

I ran into a problem with bash, which I started using only recently.
I realize that a lot of magic can be done in just one line, which is how my previous question was solved.
This time the question is simple:
I have a file in this format:
2 2 10
custom
8 10
3 5 18
custom
1 5
Some of the lines are equal to the string custom (it can be any line!) and the other lines contain 2 or 3 numbers.
I want a file in which each line of numbers is expanded into a sequence while the custom lines are kept (the order must also stay the same), so the desired output is
2 4 6 8 10
custom
8 9 10
3 8 13 18
custom
1 2 3 4 5
I would also like to overwrite the input file with the result.
I know that seq can do the sequencing, but I would like an elegant way to apply it to a file.
You can use awk like this:
awk '/^([[:blank:]]*[[:digit:]]+){2,3}[[:blank:]]*$/ {
j = (NF==3) ? $2 : 1
s=""
for(i=$1; i<=$NF; i+=j)
s = sprintf("%s%s%s", s, (i==$1)?"":OFS, i)
$0=s
} 1' file
2 4 6 8 10
custom
8 9 10
3 8 13 18
custom
1 2 3 4 5
Explanation:
/^([[:blank:]]*[[:digit:]]+){2,3}[[:blank:]]*$/ - match only lines with 2 or 3 numbers.
j = (NF==3) ? $2 : 1 - set variable j to $2 if there are 3 columns otherwise set j to 1
for(i=$1; i<=$NF; i+=j) run a loop from 1st col to last col, increment by j
sprintf is used for formatting the generated sequence
1 is an always-true pattern with no action, so awk applies its default action and prints each (possibly modified) line
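One edge case the loop above does not cover is a step of 0 (for example a line such as 3 0 18), which would never terminate. The sketch below adds a guard for that. It is an illustration rather than the answer's own code: it assumes space-separated non-negative integers, swaps the regex for simpler NF tests, and uses the made-up names step and s.

```shell
#!/usr/bin/env bash
result=$(printf '%s\n' '2 2 10' 'custom' '3 0 18' '1 3' |
    awk 'NF >= 2 && NF <= 3 && $0 !~ /[^0-9 ]/ {
        step = (NF == 3) ? $2 : 1
        if (step > 0) {                      # skip lines whose step would loop forever
            s = ""
            for (i = $1; i <= $NF; i += step)
                s = s (s == "" ? "" : OFS) i
            $0 = s
        }
    } 1')
printf '%s\n' "$result"
```

Lines that fail the guard are printed unchanged, just like the custom lines.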
This might work for you (GNU sed, seq and paste):
sed '/^[0-9]/s/.*/seq & | paste -sd\\ /e' file
If a line begins with a digit use the lines values as parameters for the seq command which is then piped to paste command. The RHS of the substitute command is evaluated using the e flag (GNU sed specific).
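The question also asks for the input file to be overwritten. A common pattern is to write to a temporary file and move it over the original once processing succeeds. The sketch below shows that pattern with a plain bash loop around seq; the sample file, the regex re and the variable names are assumptions for illustration, and it expects lines without leading blanks.

```shell
#!/usr/bin/env bash
# Throwaway input file standing in for the real one.
file=$(mktemp)
printf '%s\n' '2 2 10' 'custom' '8 10' > "$file"

tmp=$(mktemp)
re='^[0-9]+([[:blank:]]+[0-9]+){1,2}$'     # a line of 2 or 3 integers
while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
        # Unquoted on purpose: $line splits into seq arguments, and the
        # command substitution turns the newlines of seq into spaces.
        echo $(seq $line)
    else
        printf '%s\n' "$line"
    fi
done < "$file" > "$tmp" && mv "$tmp" "$file"   # replace the original only on success

rewritten=$(cat "$file")
rm -f "$file"
printf '%s\n' "$rewritten"
```

With GNU sed, the -i option applied to the command above may achieve the same in-place effect.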

reset row number count in awk

I have a file like this
file.txt
0 1 a
1 1 b
2 1 d
3 1 d
4 2 g
5 2 a
6 3 b
7 3 d
8 4 d
9 5 g
10 5 g
.
.
.
I want reset row number count to 0 in first column $1 whenever value of field in second column $2 changes, using awk or bash script.
result
0 1 a
1 1 b
2 1 d
3 1 d
0 2 g
1 2 a
0 3 b
1 3 d
0 4 d
0 5 g
1 5 g
.
.
.
As long as you don't mind a bit of excess memory usage, and the second column is sorted, I think this is the most fun:
awk '{$1=a[$2]+++0;print}' input.txt
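The expression a[$2]+++0 is parsed as (a[$2]++) + 0: a counter kept per value of the second column is post-incremented, and its previous value becomes the new first field. Spelled out with spaces and a hypothetical array name seen, on a few inline sample lines:

```shell
#!/usr/bin/env bash
result=$(printf '%s\n' '0 1 a' '1 1 b' '2 2 g' '3 2 a' '4 3 b' |
    awk '{
        # seen[$2]++ yields the count so far for this key, then increments it;
        # adding 0 forces a number even for a key seen for the first time.
        $1 = seen[$2]++ + 0
        print
    }')
printf '%s\n' "$result"
```

Because the counters are never reset, a key that reappears later in unsorted input continues counting where it left off, which is why the second column needs to be sorted.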
This awk one-liner seems to work for me:
[ghoti@pc ~]$ awk 'prev!=$2{first=0;prev=$2} {$1=first;first++} 1' input.txt
0 1 a
1 1 b
2 1 d
3 1 d
0 2 g
1 2 a
0 3 b
1 3 d
0 4 d
0 5 g
1 5 g
Let's break apart the script and see what it does.
prev!=$2 {first=0;prev=$2} -- This is what resets your counter. Since the initial state of prev is empty, we reset on the first line of input, which is fine.
{$1=first;first++} -- For every line, set the first field, then increment variable we're using to set the first field.
1 -- this is awk short-hand for "print the line". It's really a condition that always evaluates to "true", and when a condition/statement pair is missing a statement, the statement defaults to "print".
Pretty basic, really.
The one catch of course is that when you change the value of any field in awk, it rewrites the line using whatever field separators are set, which by default is just a space. If you want to adjust this, you can set your OFS variable:
[ghoti@pc ~]$ awk -vOFS=" " 'p!=$2{f=0;p=$2}{$1=f;f++}1' input.txt | head -2
0 1 a
1 1 b
Salt to taste.
A pure bash solution :
file="/PATH/TO/YOUR/OWN/INPUT/FILE"
count=0
old_trigger=0
while read a b c; do
if ((b == old_trigger)); then
echo "$((count++)) $b $c"
else
count=0
echo "$((count++)) $b $c"
old_trigger=$b
fi
done < "$file"
This solution (IMHO) has the advantage of using a readable algorithm. I like what the other answers offer, but they are not that comprehensible for beginners.
NOTE:
((...)) is an arithmetic command, which returns an exit status of 0 if the expression is nonzero, or 1 if the expression is zero. Also used as a synonym for let, if side effects (assignments) are needed. See http://mywiki.wooledge.org/ArithmeticExpression
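The exit-status rule is easy to check directly. The small sketch below records the status of a true expression, a false one, and a post-increment from zero; the variable names are illustrative only.

```shell
#!/usr/bin/env bash
(( 5 > 3 )) && true_status=$?      # expression is nonzero (true): status 0
(( 5 < 3 )) || false_status=$?     # expression is zero (false): status 1

count=0
(( count++ )) || incr_status=$?    # post-increment yields the old value 0: status 1

echo "$true_status $false_status $incr_status $count"
```

The last case matters mostly in scripts that run under set -e, where a bare ((count++)) starting from 0 is treated as a failed command.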
Perl solution:
perl -naE '
$dec = $F[0] if defined $old and $F[1] != $old;
$F[0] -= $dec;
$old = $F[1];
say join "\t", @F[0,1,2];'
$dec is subtracted from the first column each time. When the second column changes (its previous value is stored in $old), $dec increases to set the first column to zero again. The defined condition is needed for the first line to work.
