sed: interpolating variables in timestamp format - bash

I would like to use sed to extract all the lines between two specific strings from a file.
I need to do this in a script, and my two strings are variables.
The strings will be in a sort of timestamp format, which means they can be something like:
2014/01/01 or 2014/01/01 08:01
I was trying with something like:
sed -n '/$1/,/$2/p' $file
or even
sed -n '/"$1"/,/"$2"/p' $file
with no luck; I also tried replacing the / delimiter with ;.
I'm pretty sure the problem is due to the / and the blank in the input variables, but I can't figure out the proper syntax.

The syntax to use alternate regex delimiters is:
\cregexpc
Match lines matching the regular expression regexp. The c may be any character.
https://www.gnu.org/software/sed/manual/sed.html#Addresses
So, pick one of
sed -n '\#'"$1"'#,\#'"$2"'#p' "$file"
sed -n "\\#$1#,\\#$2#p" "$file"
sed -n "$( printf '\#%s#,\#%s#p' "$1" "$2" )" "$file"
or awk
awk -v start="$1" -v end="$2" '$0 ~ start {p=1}; p; $0 ~ end {p=0}' "$file"
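For instance, a minimal sketch of the first form above, using a made-up log.txt and boundary values like the ones in the question:
$ cat log.txt
2013/12/31 23:59 boot
2014/01/01 00:12 start
2014/01/01 08:01 job done
2014/01/02 09:00 shutdown
$ set -- '2014/01/01' '2014/01/01 08:01'
$ sed -n '\#'"$1"'#,\#'"$2"'#p' log.txt
2014/01/01 00:12 start
2014/01/01 08:01 job done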
From the first $1 to the last $2:
sed -n "\\#$1#,\$p" "$file" | tac | sed -n "\\#$2#,\$p" | tac
This prints from the first $1 to the end, reverses the lines, prints from the first $2 to the new end, and reverses the lines again.
An example: from the first "5" to the last "7"
$ set -- 5 7
$ seq 20 | sed -n "\\#$1#,\$p" | tac | sed -n "\\#$2#,\$p" | tac
5
6
7
8
9
10
11
12
13
14
15
16
17

Try using double quotes instead of single ones, so the shell expands the variables (note that this only works while the variables contain no /, since / is the address delimiter; quote $file as well):
sed -n "/$1/,/$2/p" "$file"
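This does work when the values contain no slashes, for example:
$ set -- start end
$ printf 'a\nstart here\nb\nend here\nc\n' | sed -n "/$1/,/$2/p"
start here
b
end here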

Related

Create pattern list dynamically from file for running in grep -f

Let's say I have this input:
11
12
31
40
42
90
I need to get all the number lines from the file and add the | character after each, so the result can be used as a pattern list with grep -f.
I was thinking of using xargs, but I have no idea how to use | as a separator or how to put the args in the middle of a string.
grep -P "(11|12|31|40|42|90)\|" o.txt
You can use
grep -E -f <(sed -E '/^[0-9]+$/!d;s/.*/&\\|/' numbers.txt) file.txt
First, get all lines that contain only digits from numbers.txt, append \| to each matched line, and then use the result as a pattern file for grep via the -f option. file.txt is the file where you are looking for matches.
See an online demo:
#!/bin/bash
text='Some 11 number and 11| that we need
Redundant line'
s="11
12
31
40
42
90"
grep -E -f <(sed -E '/^[0-9]+$/!d;s/.*/&\\|/' <<< "$s") <<< "$text"
# => Some 11 number and 11| that we need
grep -oE -f <(sed -E '/^[0-9]+$/!d;s/.*/&\\|/' <<< "$s") <<< "$text"
# => 11|

How to merge multiple files in order and append filename at the end in bash

I have multiple files like this:
BOB_1.brother_bob12.txt
BOB_2.brother_bob12.txt
..
BOB_35.brother_bob12.txt
How can I join these files in order, from {1..36}, and append the filename at the end of each row? I have tried:
for i in *.txt; do sed 's/$/ '"$i"'/' $i; done > outfile #joins but not in order
cat $(for((i=1;i<38;i++)); do echo -n "BOB_${i}.brother_bob12.txt "; done) # joins in order but no filename at the end
file sample:
1 345 378 1 3 4 5 C T
1 456 789 -1 2 3 4 A T
Do not do cat $(....). You may simply do:
for ((i=1;i<38;i++)); do
f="BOB_${i}.brother_bob12.txt"
sed "s/$/ $f/" "$f"
done
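With the sample rows shown above, each output line simply gets the file name appended after a space (BOB_1.brother_bob12.txt is used here only for illustration):
1 345 378 1 3 4 5 C T BOB_1.brother_bob12.txt
1 456 789 -1 2 3 4 A T BOB_1.brother_bob12.txt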
You may also do:
printf "%s\n" bob.txt BOB_{1..38}.brother_bob12.txt |
xargs -d'\n' -i sed 's/$/ {}/' '{}'
You may use:
for i in {1..36}; do
fn="BOB_${i}.brother_bob12.txt"
[[ -f $fn ]] && awk -v OFS='\t' '{print $0, FILENAME}' "$fn"
done > output
Note that it will insert FILENAME as the last field in every record. If this is not what you want, then show your expected output in the question.
This might work for you (GNU sed):
sed -n 'p;F' BOB_{1..36}.brother_bob12.txt | sed 'N;s/\n/ /' >newFile
This uses two invocations of sed: the first prints the file name on a separate line after each line of each file, and the second replaces the newline between each pair of lines with a space.
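A quick illustration of the same pipeline on a throwaway two-line file (x.txt is just a stand-in name):
$ printf 'a\nb\n' > x.txt
$ sed -n 'p;F' x.txt | sed 'N;s/\n/ /'
a x.txt
b x.txt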

Grep search on ONE LINE only?

I am using Bash to find the dimensions of a matrix. Here is my code to get the number of elements in one row; however, it prints the count for the whole file. I just need the number of elements in ONE ROW.
grep -oP "\^I" $1 | wc -l
Here is what the $1 is referring to:
1^I2^I3^I4$
5^I6^I7^I8$
For some reason, it is printing out 9 instead of 3.
Thanks in advance!
Use:
cat $1 | head -n 1 | sed 's/\^I/\n/g' | wc -l
I take only the first row using head, replace every column delimiter with a newline using sed, then pipe that to wc.
You can use sed before calling grep to isolate one specific line of your file:
sed -n '1p' file | grep -oP "^I" | wc -l
# '1p' will print the 1st line, 2p will print the second line, etc.
Using awk:
$ awk -F'\\^I' 'NR==1{print NF-1}' $1
3
-F'\\^I' uses ^I as the field separator
NR==1 restricts this to the first line only
print NF-1 prints the number of fields minus one, since the question is about counting the number of ^I
Also, if $1 is an argument being passed to a shell script, quote it as "$1" as good practice.
And a guess: this is the actual data the OP is working with:
$ cat ip.txt
1 2 3 4
5 6 7 8
$ cat -A ip.txt
1^I2^I3^I4$
5^I6^I7^I8$
$ # exit to avoid unnecessary processing of other lines
$ awk -F'\t' 'NR==1{print NF-1; exit}' ip.txt
3
sed 's:\^I:\n:g; q' | wc -l
# s:\^I:\n:g changes all ^I to \n
# q quits after the first line

Bash : How to check in a file if there are any word duplicates

I have a file with 6 character words in every line and I want to check if there are any duplicate words. I did the following but something isn't right:
#!/bin/bash
while read line
do
name=$line
d=$( grep '$name' chain.txt | wc -w )
if [ $d -gt '1' ]; then
echo $d $name
fi
done <$1
Assuming each word is on a new line, you can achieve this without looping:
$ cat chain.txt | sort | uniq -c | grep -v " 1 " | cut -c9-
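A quick sketch on a made-up three-line file; with GNU uniq the count occupies the first eight columns for small counts, which is what cut -c9- strips:
$ printf 'abcdef\nghijkl\nabcdef\n' | sort | uniq -c | grep -v " 1 " | cut -c9-
abcdef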
You can use awk for that:
awk -F'\n' 'found[$1] {print}; {found[$1]++}' chain.txt
Set the field separator to newline, so that we look at the whole line. Then, if the line already exists in the array found, print the line. Finally, add the line to the found array.
Note: a line is only suppressed once, so if the same line appears, say, 6 times, it will be printed 5 times.
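The same made-up input through the awk version:
$ printf 'abcdef\nghijkl\nabcdef\n' | awk -F'\n' 'found[$1] {print}; {found[$1]++}'
abcdef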

Split from 40900000 to 409-00-000

Does anybody know a way to convert "40900000" to "409-00-000" with a single command, sed or awk?
I already tried a couple of ways with sed but no luck at all. I need to do this in bulk; there are around 40k lines, and some of these lines are not proper, so they need to be fixed.
Thanks in advance
Using GNU sed, I would do it like this:
sed -r 's/([0-9]{3})([0-9]{2})([0-9]{3})/\1-\2-\3/' filename
# or, equivalently
sed -E 's/([0-9]{3})([0-9]{2})([0-9]{3})/\1-\2-\3/' filename
The -r or -E enables extended regex mode, which avoids the need to escape all the parentheses
\1 is the first capture group (the bits in between the ( ))
[0-9] means the range zero to nine
{3} means three of the preceding character or range
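For example, with the value from the question:
$ echo 40900000 | sed -E 's/([0-9]{3})([0-9]{2})([0-9]{3})/\1-\2-\3/'
409-00-000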
edit: Thanks for all the comments.
On other systems that lack the -r switch, or its alias -E, you have to escape the ( ) and { } above. That leaves you with:
sed 's/\([0-9]\{3\}\)\([0-9]\{2\}\)\([0-9]\{3\}\)/\1-\2-\3/' filename
At the expense of repetition, you can avoid some of the escapes by simply repeating the [0-9]:
sed 's/\([0-9][0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9][0-9]\)/\1-\2-\3/' filename
For the record, Perl is equally capable of doing this sort of thing:
perl -pwe 's/(\d{3})(\d{2})(\d{3})/$1-$2-$3/' filename
-p means loop over the input and print each line
-w means enable warnings
-e means execute the one-line program that follows
\d is the "digit" character class (zero to nine)
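Again with the value from the question:
$ echo 40900000 | perl -pwe 's/(\d{3})(\d{2})(\d{3})/$1-$2-$3/'
409-00-000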
No need to run external commands; bash or ksh can do it themselves.
$ a=12345678
$ [ ${#a} = 8 ] && { b=${a:0:3}-${a:3:2}-${a:5};a=$b;}
$ echo $a
123-45-678
$ a=abc-de-fgh
$ [ ${#a} = 8 ] && { b=${a:0:3}-${a:3:2}-${a:5};a=$b;}
$ echo $a
abc-de-fgh
You can use sed, like this:
sed 's/\([0-9][0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9][0-9]\)/\1-\2-\3/'
or more succinctly, with extended regex syntax:
sed -E 's/([0-9]{3})([0-9]{2})([0-9]{3})/\1-\2-\3/'
For golfing:
$ echo "40900000" | awk '$1=$1' FIELDWIDTHS='3 2 3' OFS='-'
409-00-000
With sed:
sed 's/\(...\)\(..\)\(...\)/\1-\2-\3/'
The dot matches any character, and surrounding a part of the pattern with \( and \) makes it a group. The \1 references the first group.
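For example, on the value from the question:
$ echo 40900000 | sed 's/\(...\)\(..\)\(...\)/\1-\2-\3/'
409-00-000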
Just for the fun of it, an awk version:
echo "40900000" | awk '{a=$0+0} length(a)==8 {$0=substr(a,1,3)"-"substr(a,4,2)"-"substr(a,6)}1'
409-00-000
This tests whether there are 8 digits.
A more complex version (needs GNU awk due to gensub):
echo "40900000" | awk --re-interval '{print gensub(/([0-9]{3})([0-9]{2})([0-9]{3})/,"\\1-\\2-\\3","g")}'
409-00-000
echo "409-00-000" | awk --re-interval '{print gensub(/([0-9]{3})([0-9]{2})([0-9]{3})/,"\\1-\\2-\\3","g")}'
409-00-000
Turnaround from STDIN:
echo "40900000" | grep -E "[0-9]{8}" | cut -c "1-3,4-5,6-8" --output-delimiter=-
from file:
grep -E "[0-9]{8}" filename | cut -c "1-3,4-5,6-8" --output-delimiter=-
But I prefer Tom Fenech's solution.
