Linux bash - break a file into 2-word terms - bash

I have put together this one-liner that prints all the words in a file on different lines:
sed -e 's/[^a-zA-Z]/\n/g' test_input | grep -v "^$"
If test_input contains "My bike is fast and clean", the one-liner's output will be:
My
bike
is
fast
and
clean
What I need now is a different version that prints all the 2-word terms in the text, like this (still in Bash):
My bike
bike is
is fast
fast and
and clean
Would you know how to do it?

Pipe your word file to this script's standard input.
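#!/bin/bash
# Print each word together with the word that preceded it.
last_word=""
while read -r word
do
    # Quote the variable: an unquoted empty value breaks the test.
    if [ -n "$last_word" ] ; then
        echo "$last_word $word"
    fi
    last_word=$word
done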

This also works, given a file with one word per line (head -n -1 requires GNU head):
paste -d' ' <(head -n -1 test.dat) <(tail -n +2 test.dat)

Use awk for this; nothing else is needed:
$ echo "My bike is fast and clean" | awk '{for(i=1;i<NF;i++){printf "%s %s\n",$i,$(i+1) } }'
My bike
bike is
is fast
fast and
and clean

This probably requires GNU sed and there's probably a simpler way:
sed 's/[[:blank:]]*\<\(\w\+\)\>/\1 \1\n/g; s/[^ ]* \([^\n]*\)\n\([^ ]*\)/\1 \2\n/g; s/ \n//; s/\n[^ ]\+$//' inputfile

Append this to your command:
| awk '(PREV!="") {printf "%s %s\n", PREV, $1} {PREV=$1}'
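As a sketch of how the pieces in this thread fit together, the word splitting from the question and the "remember the previous word" idea from the answers can be combined in one small function. The name bigrams is made up for illustration, and tr -cs is swapped in for the question's sed | grep -v pair. Like the original one-liner, it treats anything that is not an ASCII letter as a separator and pairs words across line breaks.

```shell
# bigrams: hypothetical helper; reads text on stdin, prints adjacent word pairs.
bigrams() {
    # tr -cs turns every run of non-letters into a single newline (one word per line).
    # awk then prints each word after the previous one, skipping any empty line.
    tr -cs 'a-zA-Z' '\n' |
        awk 'NF { if (prev != "") print prev, $0; prev = $0 }'
}

printf 'My bike is fast and clean\n' | bigrams
```

For the sample sentence this should print the five pairs listed in the question, from "My bike" to "and clean".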

Related

How to add multiple line of output one by one to a variable in Bash?

This might be a very basic question, but I was not able to find a solution. I have a script.
If I run w | awk '{print $1}' on the command line on my server I get:
f931
smk591
sc271
bx972
gaw844
mbihk988
laid640
smk59
ycc951
Now I need to use this list in my bash script, one entry at a time, and perform some operations on each. I need to check each user's groups and print the users that are in a specific group. The command to check the groups is id username. How can I save the names, or iterate through them one by one in a loop?
What I have so far is
tmp=$(w | awk '{print $1}')
But it only returns the first record! I'd appreciate any help.
Populate an array with the output of the command:
$ tmp=( $(printf "a\nb\nc\n") )
$ echo "${tmp[0]}"
a
$ echo "${tmp[1]}"
b
$ echo "${tmp[2]}"
c
Replace the printf with your command (i.e. tmp=( $(w | awk '{print $1}') )) and see man bash for how to work with bash arrays.
For a lengthier, more robust and complete example:
$ cat ./tstarrays.sh
# saving multi-line awk output in a bash array, one element per line
# See http://www.thegeekstuff.com/2010/06/bash-array-tutorial/ for
# more operations you can perform on an array and its elements.
oSET="$-"; set -f # save original set flags and turn off globbing
oIFS="$IFS"; IFS=$'\n' # save original IFS and make IFS a newline
array=( $(
awk 'BEGIN{
print "the quick brown"
print " fox jumped\tover\tthe"
print "lazy dogs back "
}'
) )
IFS="$oIFS" # restore original IFS value
set +f -$oSET # restore original set flags
for (( i=0; i < ${#array[@]}; i++ ));
do
printf "array[%d] of length=%d: \"%s\"\n" "$i" "${#array[$i]}" "${array[$i]}"
done
printf -- "----------\n"
printf -- "array[@]=\n\"%s\"\n" "${array[@]}"
printf -- "----------\n"
printf -- "array[*]=\n\"%s\"\n" "${array[*]}"
$ ./tstarrays.sh
array[0] of length=22: "the quick brown"
array[1] of length=23: " fox jumped over the"
array[2] of length=21: "lazy dogs back "
----------
array[@]=
"the quick brown"
array[@]=
" fox jumped over the"
array[@]=
"lazy dogs back "
----------
array[*]=
"the quick brown fox jumped over the lazy dogs back "
A couple of non-obvious key points to make sure your array gets populated with exactly what your command outputs:
If your command output can contain globbing characters then you should disable globbing before the command (oSET="$-"; set -f) and re-enable it afterwards (set +f -$oSET).
If your command output can contain spaces then set IFS to a newline before the command (oIFS="$IFS"; IFS=$'\n') and set it back to its old value after the command (IFS="$oIFS").
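On bash 4 or later, the mapfile builtin is one way to get the same "one element per line" result without saving and restoring IFS or the globbing flag. The following is only a minimal sketch: list_users is a made-up stand-in for the real command (for example w | awk '{print $1}'), and its sample lines are invented to show that spaces and glob characters survive intact.

```shell
#!/bin/bash
# list_users is a hypothetical stand-in for the real command.
list_users() {
    printf '%s\n' 'alice' 'bob smith' '*'
}

# mapfile -t reads stdin into an array, one line per element, dropping the
# trailing newlines. No word splitting or globbing is applied to the lines.
mapfile -t users < <(list_users)

printf 'got %d entries\n' "${#users[@]}"
for user in "${users[@]}"; do
    # Replace this printf with the real work, e.g. id "$user".
    printf 'user: "%s"\n' "$user"
done
```

The process substitution < <(...) is used instead of a pipe so that the array is filled in the current shell rather than in a subshell.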
tmp=$(w | awk '{print $1}')
# IFS= and -r keep each line exactly as it was produced
while IFS= read -r i
do
    echo "$i"
done <<< "$tmp"
You can use a for loop, i.e.
for user in $(w | awk '{print $1}'); do echo "$user"; done
which in a script would look nicer as:
for user in $(w | awk '{print $1}')
do
    echo "$user"
done
You can use the xargs command to do this:
w | awk '{print $1}' | xargs -I '{}' id '{}'
With the -I switch, xargs takes each line of its standard input separately, then constructs and executes a command line by replacing the specified string '{}' in the command template with the input line.
I guess you should use who instead of w. Try this out,
who | awk '{print $1}' | xargs -n 1 id

Split String in Unix Shell Script

I have a String like this
//ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf
and want to get the last part:
00000000957481f9-08d035805a5c94bf
Let's say you have
text="//ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf"
If you know the position, i.e. in this case the 9th, you can go with
echo "$text" | cut -d'/' -f9
However, if the position is dynamic and you want to split at "/", it's safer to go with:
echo "${text##*/}"
This removes everything from the beginning to the last occurrence of "/" and should be the shortest form to do it.
For more information on this see: Bash Reference manual
For more information on cut see: cut man page
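A short sketch of the related expansions may help here. The variable names last, parent and suffix are only illustrative; the operators themselves (#, ##, %, %%) are standard shell parameter expansion, where a doubled character means "longest match" and a single one means "shortest match".

```shell
text="//ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf"

last="${text##*/}"      # remove the longest prefix matching */  -> last component
parent="${text%/*}"     # remove the shortest suffix matching /* -> everything before it
suffix="${last#*-}"     # remove the shortest prefix matching *- -> text after the first "-"

echo "$last"
echo "$parent"
echo "$suffix"
```

No external process is started for any of these, which is why this form tends to be preferred over cut or awk when the value is already in a variable.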
The tool basename does exactly that:
$ basename //ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf
00000000957481f9-08d035805a5c94bf
I would use bash string function:
$ string="//ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf"
$ echo "${string##*/}"
00000000957481f9-08d035805a5c94bf
But following are some other options:
$ awk -F'/' '$0=$NF' <<< "$string"
00000000957481f9-08d035805a5c94bf
$ sed 's#.*/##g' <<< "$string"
00000000957481f9-08d035805a5c94bf
Note: <<< is here-string notation. Here-strings do not create a subshell; however, they are NOT portable to POSIX sh (as implemented by shells such as ash or dash).
In case you want more than just the last part of the path,
you could do something like this:
echo $PWD | rev | cut -d'/' -f1-2 | rev
You can use this BASH regex:
s='//ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf'
[[ "$s" =~ [^/]+$ ]] && echo "${BASH_REMATCH[0]}"
00000000957481f9-08d035805a5c94bf
This can be done easily in awk:
string="//ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf"
echo "${string}" | awk -v FS="/" '{ print $NF }'
Use "/" as field separator and print the last field.
You can try this...
echo //ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf |awk -F "/" '{print $NF}'

unix command to get lines from in between first and last occurrence of a word and write to a file

I want a unix command to find the lines between the first and last occurrence of a word.
For example:
let's imagine we have 1000 lines. Tenth line contains word "stackoverflow", thirty fifth line also contains word "stackoverflow".
I want to print lines between 10 and 35 and write it to a new file.
You can do it in two steps. The basic idea is to:
1) get the line numbers of the first and last match.
2) print the lines in that range.
$ read first last <<< $(grep -n stackoverflow your_file | awk -F: 'NR==1 {printf "%d ", $1}; END{print $1}')
$ awk -v f=$first -v l=$last 'NR>=f && NR<=l' your_file
Explanation
read first last reads two values and stores them in $first and $last.
grep -n stackoverflow your_file greps and shows the output like this: number_of_line:output
awk -F: 'NR==1 {printf "%d ", $1}; END{print $1}' prints the line numbers of the first and last match of stackoverflow in the file.
And
awk -v f=$first -v l=$last 'NR>=f && NR<=l' your_file prints all lines from $first line number till $last line number.
Test
$ cat a
here we
have some text
stackoverflow
and other things
bla
bla
bla bla
stackoverflow
and whatever else
stackoverflow
to make more fun
blablabla
$ read first last <<< $(grep -n stackoverflow a | awk -F: 'NR==1 {printf "%d ", $1}; END{print $1}')
$ awk -v f=$first -v l=$last 'NR>=f && NR<=l' a
stackoverflow
and other things
bla
bla
bla bla
stackoverflow
and whatever else
stackoverflow
By steps:
$ grep -n stackoverflow a
3:stackoverflow
9:stackoverflow
11:stackoverflow
$ grep -n stackoverflow a | awk -F: 'NR==1 {printf "%d ", $1}; END{print $1}'
3 11
$ read first last <<< $(grep -n stackoverflow a | awk -F: 'NR==1 {printf "%d ", $1}; END{print $1}')
$ echo "first=$first, last=$last"
first=3, last=11
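The same two steps can also be sketched as a single awk call that reads the file twice. This is a variation on the approach above, not the answer's own code: between_matches is a hypothetical helper name, and index() is used so the word is matched as a literal string rather than as a regular expression.

```shell
# between_matches is a hypothetical helper: between_matches WORD FILE
between_matches() {
    awk -v w="$1" '
        # First pass (NR == FNR): remember the first and last matching line numbers.
        NR == FNR {
            if (index($0, w)) {
                if (!first) first = FNR
                last = FNR
            }
            next
        }
        # Second pass: print the lines inside that range, inclusive.
        FNR >= first && FNR <= last
    ' "$2" "$2"
}

# Small demonstration on a throwaway file.
sample=$(mktemp)
printf '%s\n' 'here we' 'stackoverflow' 'bla' 'stackoverflow' 'the end' > "$sample"
between_matches stackoverflow "$sample"
rm -f "$sample"
```

Redirecting the output (between_matches stackoverflow your_file > new_file) writes the range to a new file, as the question asks. Because the file is named twice on the command line, this only works on a regular file, not on a pipe.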
If you know an upper bound of how many lines there can be (say, a million), then you can use this simple abusive script:
(grep -A 1000000 stackoverflow | grep -B 1000000 stackoverflow) < file
You can append | tail -n +2 | head -n -1 to strip the border lines as well:
(grep -A 1000000 stackoverflow | grep -B 1000000 stackoverflow |
tail -n +2 | head -n -1) < file
I'm not 100% sure from the question whether the output should be inclusive of the first and last matching lines, so I'm assuming it is. But this can be easily changed if we want exclusive instead.
This pure-bash solution does it all in one step - i.e. the file (or pipe) is only read once:
#!/bin/bash
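function midgrep {
    # IFS= and -r keep leading whitespace and backslashes in each line
    while IFS= read -r ln; do
        [ "$saveline" ] && linea[$((i++))]=$ln
        if [[ $ln =~ $1 ]]; then
            if [ "$saveline" ]; then
                # flush everything buffered since the previous match
                for ((j=0; j<i; j++)); do echo "${linea[$j]}"; done
                i=0
            else
                saveline=1
                linea[$((i++))]=$ln
            fi
        fi
    done
}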
midgrep "$1"
Save this as a script (e.g. midgrep.sh) and pipe whatever output you like to it as follows:
$ cat input.txt | ./midgrep.sh stackoverflow
This works as follows:
find the first matching line and buffer in the first element of an array
continue reading lines until the next match, buffering to the array as we go
on each subsequent match, flush the buffer array to output
continue reading file to the end. If there are no more matches, then the last buffer is simply discarded.
The advantage of this approach is that we read through the input only once. The disadvantage is that we buffer everything between matches: if there are many lines between two matches, they are all held in memory until we hit the next match.
Also this uses the bash =~ regular expression operator to keep this pure bash. But you could replace this with a grep instead, if you are more comfortable with that.
Using perl :
perl -00 -lne '
chomp(my @arr = split /stackoverflow/);
print join "\nstackoverflow", @arr[1 .. $#arr -1 ]
' file.txt | tee newfile.txt
The idea is to split the input into chunks at each occurrence of the string "stackoverflow". We then print the chunks from the second through the second-to-last, rejoined with "stackoverflow".

Awk: Drop last record separator in one-liner

I have a simple command (part of a bash script) that I'm piping through awk but can't seem to suppress the final record separator without then piping to sed. (Yes, I have many choices and mine is sed.) Is there a simpler way without needing the last pipe?
dolls=$(egrep -o 'alpha|echo|november|sierra|victor|whiskey' /etc/passwd \
| uniq | awk '{IRS="\n"; ORS=","; print}'| sed s/,$//);
Without the sed, this produces output like echo,sierra,victor, and I'm just trying to drop the last comma.
You don't need awk, try:
egrep -o ....uniq|paste -d, -s
Here is another example:
kent$ echo "a
b
c"|paste -d, -s
a,b,c
Also, I think your chained command could be simplified. awk could do everything in a one-liner.
Instead of egrep, uniq, awk, sed etc, all this can be done in one single awk command:
awk -F":" '!($1 in a){l=l $1 ","; a[$1]} END{sub(/,$/, "", l); print l}' /etc/passwd
Here is a small and quite straightforward one-liner in awk that suppresses the final record separator:
echo -e "alpha\necho\nnovember" | awk 'y {print s} {s=$0;y=1} END {ORS=""; print s}' ORS=","
Gives:
alpha,echo,november
So, your example becomes:
dolls=$(egrep -o 'alpha|echo|november|sierra|victor|whiskey' /etc/passwd | uniq | awk 'y {print s} {s=$0;y=1} END {ORS=""; print s}' ORS=",");
The benefit of using awk over paste or tr is that this also works with a multi-character ORS.
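To illustrate the multi-character case, here is a minimal sketch of a join helper. The name join_lines is invented for the example, and it prints the separator before every line except the first instead of delaying output by one record as the one-liner above does; the visible result should be the same.

```shell
# join_lines is a hypothetical helper: join_lines SEP  (reads lines on stdin)
join_lines() {
    # Print the separator before every line except the first, then finish
    # with a single newline if there was any input at all.
    awk -v sep="$1" '
        NR > 1 { printf "%s", sep }
               { printf "%s", $0 }
        END    { if (NR) print "" }
    '
}

printf 'alpha\necho\nnovember\n' | join_lines ', '
```

This should print alpha, echo, november on one line. Note that awk -v interprets backslash escapes in the value, so a separator containing a backslash would need extra care.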
Since you tagged it bash here is one way of doing it:
#!/bin/bash
# Read the /etc/passwd file in to an array called names
while IFS=':' read -r name _; do
names+=("$name");
done < /etc/passwd
# Assign the content of the array to a variable
dolls=$( IFS=, ; echo "${names[*]}")
# Display the value of the variable
echo "$dolls"
echo "a
b
c" |
mawk 'NF-= _==$NF' FS='\n' OFS=, RS=
a,b,c

how to concatenate lines into one string

I have a function in bash that outputs a bunch of lines to stdout. I want to combine them into a single line with some delimiter between them.
Before:
one
two
three
After:
one:two:three
What is an easy way to do this?
Use paste
$ echo -e 'one\ntwo\nthree' | paste -s -d':'
one:two:three
And another way (note that this leaves a trailing ":" because the final newline is translated too):
cat file | tr -s "\n" ":"
This might work for you:
paste -sd':' file
For fun, here's a bash-only way:
echo $'one\n2 and 3\nfour' | { mapfile -t lines; IFS=:; echo "${lines[*]}"; }
outputs
one:2 and 3:four
The {} grouping is to ensure all the commands that refer to the array variable are executed in the same subshell. The variable will not exist once the pipeline ends.
http://www.gnu.org/software/bash/manual/bashref.html#index-mapfile-140
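The subshell behaviour described above can be seen in a small experiment. This sketch assumes bash 4 or later with default settings (in particular, the lastpipe option is off); the array names lost and kept are only illustrative.

```shell
#!/bin/bash
# In a pipeline, mapfile runs in a subshell by default, so "lost" is not set afterwards.
printf 'one\ntwo\nthree\n' | mapfile -t lost
echo "after the pipeline: ${#lost[@]} elements"

# Process substitution keeps mapfile in the current shell, so "kept" survives.
mapfile -t kept < <(printf 'one\ntwo\nthree\n')
joined=$(IFS=:; echo "${kept[*]}")   # IFS is changed only inside the command substitution
echo "$joined"
```

The first echo should report 0 elements and the second should print one:two:three. Using process substitution is an alternative to the {} grouping when the joined value is needed later in the script.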
Taking @glennJackman's corrections verbatim
awk '{printf("%s%s", sep, $0); sep=":"} END {print ""}' file
Or, as you specified bash:
while read -r line ; do printf "%s:" "$line" ; done < file | sed 's/:$//'
I hope this helps
Input.txt
one
two
three
Perl Solution : dummy.pl
@a = `cat /home/Input.txt`;
foreach my $x (@a)
{
    chomp($x);
    push(@array,"$x");
}
chomp(@array);
print "@array";
Run the script as :
$> perl dummy.pl | sed 's/ /:/g' > Output.txt
Output.txt
one:two:three
