Substitute Date issue/ and unterminated error - bash

I need some help, and maybe some knowledge.
I am trying to change all the dates in a .text document from dd/mm/yyyy to dd.mm.yyyy. I am not going to lie, using sed confuses me so much! Can any of you help me?
`# DAY SEP #2Month SEP #3year min2,max4
sed 's/\[0-3]?[0-9\][.\/]\([0-1]*[0-9]\)[-\/.]\([0-9]\{2,4\}\)/\2.\1\3/'`
Here is my error: sed: file Frank_Alvarado_hw2.sed line 5: unterminated `s' command.
Presidency ,President ,Wikipedia Entry,Took office ,Left office ,Party ,Portrait ,Thumbnail,Home State
1,George Washington,http://en.wikipedia.org/wiki/George_Washington,30/04/1789,4/03/1797,Independent ,GeorgeWashington.jpg,thmb_GeorgeWashington.jpg,Virginia

If those escapes are confusing in sed then use:
sed -r 's~([0-9]{1,2})/([0-9]{1,2})/([0-9]{2,4})~\1.\2.\3~g' file
i.e.
Use of -r option for extended regex
Use of alternate regex delimiters like ~ to avoid escaping / in your pattern
PS: On OSX use sed -E instead of sed -r
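As a quick check, the same idea can be run against a row shaped like the sample data. This is only a minimal sketch: the function name dots_for_slashes is made up for illustration, and the pattern assumes one- or two-digit days and months and two- to four-digit years.

```shell
# Hypothetical helper: rewrite dd/mm/yyyy dates read from stdin as dd.mm.yyyy.
dots_for_slashes() {
  # -E enables extended regex; ~ as the delimiter avoids escaping each /
  sed -E 's~([0-9]{1,2})/([0-9]{1,2})/([0-9]{2,4})~\1.\2.\3~g'
}

printf '%s\n' '1,George Washington,30/04/1789,4/03/1797,Independent' | dots_for_slashes
# prints: 1,George Washington,30.04.1789,4.03.1797,Independent
```

The g flag matters here because each row holds two dates.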

sed 's/\[0-3]?[0-9\][.\/]\([0-1]*[0-9]\)[-\/.]\([0-9]\{2,4\}\)/\2.\1\3/'
       ^     ^^         ^^            ^       ^             ^       ^
       |     ||         ||            |       |             |       L 3rd group (missing)
       |     ||         ||            |       |             L end of group 2
       |     ||         ||            |       L start of group 2
       |     ||         ||            L end of group 1
       |     ||         |L start of group 1
       |     ||         L class close, so any of `0123456789\\][./`
       |     |L class open
       |     L 0 or 1 occurrence (of `]`)
       L escapes the `[` (not a class open), so it is a literal character
So the main problem: the replacement refers to a third group (\3), but the pattern defines only two.
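For comparison, here is a minimal sketch of the basic-regex form with all three groups actually captured. The function name fix_dates_bre is hypothetical, and the digit ranges are one reading of what the question intended, not a definitive fix.

```shell
# Hypothetical helper: basic-regex (no -E) version with three capture groups.
fix_dates_bre() {
  # \( \) capture a group, \{m,n\} is the BRE interval,
  # and # is used as the delimiter so / needs no escaping.
  sed 's#\([0-3]\{0,1\}[0-9]\)/\([01]\{0,1\}[0-9]\)/\([0-9]\{2,4\}\)#\1.\2.\3#g'
}

printf '%s\n' 'Took office 30/04/1789, left office 4/03/1797' | fix_dates_bre
# prints: Took office 30.04.1789, left office 4.03.1797
```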


Inconsistency in output field separator

We have to find the difference (d) between the last two numbers and display the three rows with the highest value of d, in ascending order.
INPUT
1 | Latha | Third | Vikas | 90 | 91
2 | Neethu | Second | Meridian | 92 | 94
3 | Sethu | First | DAV | 86 | 98
4 | Theekshana | Second | DAV | 97 | 100
5 | Teju | First | Sangamithra | 89 | 100
6 | Theekshitha | Second | Sangamithra | 99 |100
Required OUTPUT
4$Theekshana$Second$DAV$97$100$3
5$Teju$First$Sangamithra$89$100$11
3$Sethu$First$DAV$86$98$12
awk 'BEGIN{FS="|";OFS="$";}{
avg=sqrt(($5-$6)^2)
print $1,$2,$3,$4,$5,$6,avg
}'|sort -nk7 -t "$"| tail -3
Output:
4 $ Theekshana $ Second $ DAV $ 97 $ 100$3
5 $ Teju $ First $ Sangamithra $ 89 $ 100$11
3 $ Sethu $ First $ DAV $ 86 $ 98$12
As you can see, there is a space before and after each $ sign, but for the last column (avg) there is no space. Please explain why this is happening.
2)
awk 'BEGIN{FS=" | ";OFS="$";}{
avg=sqrt(($5-$6)^2)
print $1,$2,$3,$4,$5,$6,avg
}'|sort -nk7 -t "$"| tail -3
OUTPUT
4$|$Theekshana$|$Second$|$0
5$|$Teju$|$First$|$0
6$|$Theekshitha$|$Second$|$0
I have not mentioned | as the output field separator, but it still appears. Why is this happening, and why is the difference zero too?
I am just 6 days old in Unix, so please answer even if it is easy.
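Your field separator is only the pipe symbol, so the surrounding whitespace is part of each field, and that is what you see in the output. The computed value avg has no surrounding whitespace, so it is joined directly with $. When FS is longer than one character it is treated as a regex, where the pipe has a special meaning (alternation) and needs to be escaped. In your second case, FS means "space or space", so a single space is the field separator. The pipes then become fields of their own, and $5 and $6 are non-numeric strings, which is why the difference is zero.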
$ awk 'BEGIN {FS=" *\\| *"; OFS="$"}
{d=sqrt(($NF-$(NF-1))^2); $1=$1;
print d "\t" $0,d}' file | sort -n | tail -3 | cut -f2-
4$Theekshana$Second$DAV$97$100$3
5$Teju$First$Sangamithra$89$100$11
3$Sethu$First$DAV$86$98$12
This slight rewrite removes the dependency on the number of fields and fixes the format.
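The three separator behaviours can be compared side by side on a single row. This is a minimal sketch: the helper name rejoin is made up for illustration, and ' *[|] *' is just another way of writing the escaped-pipe separator shown above.

```shell
# Hypothetical helper: split one sample row with the given FS and re-join it with "$".
rejoin() {
  printf '%s\n' '3 | Sethu | First | DAV | 86 | 98' |
    awk -v fs="$1" 'BEGIN { FS = fs; OFS = "$" } { $1 = $1; print }'
}

rejoin '|'        # literal pipe only:  3 $ Sethu $ First $ DAV $ 86 $ 98
rejoin ' | '      # "space or space":   3$|$Sethu$|$First$|$DAV$|$86$|$98
rejoin ' *[|] *'  # pipe plus padding:  3$Sethu$First$DAV$86$98
```

The assignment $1 = $1 forces awk to rebuild the record with OFS, which is why the separators change even though no field is modified.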

Bash extract strings between two characters

I have the output of a query stored in a bash variable, as a single line.
-------------------------------- | NAME | TEST_DATE | ----------------
--------------------- | TESTTT_1 | 2019-01-15 | | TEST_2 | 2018-02-16 | | TEST_NAME_3 | 2020-03-17 | -------------------------------------
I would like to ignore the column names(NAME | TEST_DATE) and store actual values of each name and test_date as a tuple in an array.
So here is the logic I am thinking, I would like to extract third string onwards between two '|' characters. These strings are comma separated and when a space is encountered we start the next tuple in the array.
Expected output:
array=(TESTTT_1,2019-01-15 TEST_2,2018-02-16 TEST_NAME_3,2020-03-17)
Any help is appreciated. Thanks.
Let's say your string is stored in variable a (or pipe your query output to the command below):
echo "$a"
-------------------------------- | NAME | TEST_DATE | ----------------
--------------------- | TESTTT_1 | 2019-01-15 | | TEST_2 | 2018-02-16 | | TEST_NAME_3 | 2020-03-17 | ------------------------------------
Command to obtain desired results is:
array="$(echo "$a" | cut -d '|' -f2,3,5,6,8,9 | tail -n1 | sed 's/ | /,/g')"
The above stores the output in a variable named array, as you expected.
Output of above command is:
echo "$array"
TESTTT_1,2019-01-15,TEST_2,2018-02-16,TEST_NAME_3,2020-03-17
Explanation of the command: the output of echo "$a" is piped into cut, which uses '|' as the delimiter and keeps fields 2,3,5,6,8,9. That output is piped into tail to drop the unwanted NAME and TEST_DATE header line and keep only the values. Finally, as per your expected output, sed converts each | to a comma.
This string contains only three dates. If you have more, just add more field numbers to the cut command; given the format of your string, the field numbers follow the pattern 2,3,5,6,8,9,11,12,14,15 ... and so on.
Hope this solves your problem.
echo "$a" | awk -F "|" '{ for(i=2; i<=NF; i++){ print $i }}' | sed -e '1,3d' -e '$d' | tr ' ' '\n' | sed '/^$/d' | sed 's/^/,/g' | sed -e 'N;s/\n/ /' | sed 's/^.//g' | xargs | sed 's/ ,/, /g'
The above is an awk-based solution.
Output:
TESTTT_1, 2019-01-15 TEST_2, 2018-02-16 TEST_NAME_3, 2020-03-17
Is that OK?
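Neither command above produces a real bash array with one NAME,DATE tuple per element, which is what the question asks for. The following is a minimal sketch of one way to get there, not the answerers' method. It assumes names consist of letters, digits and underscores, that dates look like yyyy-mm-dd, and that the input is laid out as in the question; the helper name extract_tuples is made up for illustration.

```shell
#!/usr/bin/env bash
# Sample input copied from the question (header line, then the values line).
a='-------------------------------- | NAME | TEST_DATE | ----------------
--------------------- | TESTTT_1 | 2019-01-15 | | TEST_2 | 2018-02-16 | | TEST_NAME_3 | 2020-03-17 | -------------------------------------'

# Hypothetical helper: print one NAME,DATE tuple per line.
extract_tuples() {
  # Keep only "name | yyyy-mm-dd" pairs, then turn the padded pipe into a comma.
  grep -Eo '[A-Za-z0-9_]+ *\| *[0-9]{4}-[0-9]{2}-[0-9]{2}' | sed 's/ *| */,/'
}

# Read the tuples into an array, one element per line.
array=()
while IFS= read -r tuple; do
  array+=("$tuple")
done < <(printf '%s\n' "$a" | extract_tuples)

printf '%s\n' "${array[@]}"
# prints:
# TESTTT_1,2019-01-15
# TEST_2,2018-02-16
# TEST_NAME_3,2020-03-17
```

Because the header cells are not followed by a date, they never match, so no field counting or tail is needed.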

Replace multiple lines using sed in a bash script

I have a file with multiple lines. I'm looking for help to modify only the lines that match a regex pattern and then add some text after each match.
I use a Mac, but the bash script will run on Linux; I don't know if that is relevant.
For example:
someText
StringToSearch:
| isoCode |
someOthertext
StringToSearch:
| isoCode |
againSomOtherText
StringToSearch:
| isoCode |
after matching "StringToSearch:" I need to add "| uk |" after each "| isoCode |" so the result will be something like:
someText
StringToSearch:
| isoCode |
| uk |
someOtherText
StringToSearch:
| isoCode |
| uk |
againSomeOtherText
StringToSearch:
| isoCode |
| uk |
My regex is ^\s*StringToSearch:\n[^\n]+ and a full working example is available at regex101 following the link
I can't figure out how to implement it in bash using sed.
Actually my sed looks like this: sed -E 's,\^\s*StringToSearch:\n\([^\n]+\),| uk |,' < inputFile
$ awk '1; p~/StringToSearch/ && /isoCode/{print " | uk |"} {p=$0}' ip.txt
someText
StringToSearch:
| isoCode |
| uk |
someOthertext
StringToSearch:
| isoCode |
| uk |
againSomOtherText
StringToSearch:
| isoCode |
| uk |
1 idiomatic way to print contents of $0 which contains current record
{p=$0} saves the current record in p variable
p~/StringToSearch/ && /isoCode/ this checks if previous line contains StringToSearch and current line contains isoCode
if the condition is satisfied, print " | uk |" will add the new content you need
As far as I know, this should work on all versions of awk. So mac/linux will not affect you.
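A variant of the same previous-line idea can keep whatever indentation the isoCode line has, by copying that line and swapping the word. This is only a sketch: the function name add_iso and its code parameter are made up for illustration.

```shell
# Hypothetical helper: after each isoCode line that follows StringToSearch:,
# print a copy of that line with "isoCode" replaced by the given code.
add_iso() {
  awk -v code="$1" '
    { print }                                  # always print the current line
    prev ~ /StringToSearch:/ && /isoCode/ {    # previous line was the marker
      extra = $0
      sub(/isoCode/, code, extra)              # reuse the layout of the current line
      print extra
    }
    { prev = $0 }                              # remember the line for the next record
  '
}

printf '%s\n' 'someText' 'StringToSearch:' '  | isoCode |' 'someOtherText' | add_iso uk
# prints:
# someText
# StringToSearch:
#   | isoCode |
#   | uk |
# someOtherText
```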
If you insist on sed, you can use
sed '/StringToSearch/{N; s/$/\n | uk |/}' ip.txt
I tested this on GNU sed and am not sure whether the syntax or features vary with other implementations. The N command appends the next line of input to the current pattern space. s/$/\n | uk |/ adds the new content after the two lines. By default, sed prints the pattern space when the -n option is not used.
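A small self-contained run of the same N technique, as a sketch that assumes GNU sed (the \n in the replacement is a GNU extension); the function name append_after_pair is made up for illustration.

```shell
# Hypothetical helper; assumes GNU sed because of \n in the replacement text.
append_after_pair() {
  # N joins the marker line and the following line in the pattern space,
  # then $ matches the end of that two-line pattern space.
  sed '/StringToSearch/{N; s/$/\n| uk |/}'
}

printf '%s\n' 'StringToSearch:' '| isoCode |' 'someOtherText' | append_after_pair
# prints:
# StringToSearch:
# | isoCode |
# | uk |
# someOtherText
```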
sed -E 's,\^\s*StringToSearch:\n\([^\n]+\),| uk |,'
\(...\) saves a backreference in basic regular expressions. In extended regex, use (...). Also, you do not use the backreference anywhere.
\n - sed parses one line at a time, so it can't match \n unless you append multiple lines to the pattern space with N commands.
\^ is strange - it matches a literal ^ character. There is no such character in your text...
You can easily match across multiple lines with GNU sed by using the -z option. Note that it will load the whole file into sed's memory, so it will be memory consuming. Then write a proper regex that will globally match your expression.
Also, so as not to remove the replaced string, use & to restore it. Then suffix it with the string you want to add.
The command:
$ sed -z -E 's,\nStringToSearch:\n[^\n]+\n,& | uk |\n,g' <<EOF
someText
StringToSearch:
| isoCode |
someOthertext
StringToSearch:
| isoCode |
againSomOtherText
StringToSearch:
| isoCode |
EOF
outputs:
someText
StringToSearch:
| isoCode |
| uk |
someOthertext
StringToSearch:
| isoCode |
| uk |
againSomOtherText
StringToSearch:
| isoCode |
| uk |
Use sed:
sed -e '/^[[:space:]]*StringToSearch:/{' -e n -e n -e 'i\
\ \ \ \ | uk |' -e '}' file > outputfile
Output:
someText
StringToSearch:
| isoCode |
| uk |
someOthertext
StringToSearch:
| isoCode |
| uk |
againSomOtherText
StringToSearch:
| isoCode |
| uk |
This matches any line with optional leading whitespace followed by StringToSearch:. Each -e n then prints the current line and reads the next one into the pattern space, so after two of them both lines have been printed. Then -e 'i\
\ \ \ \ | uk |' inserts the line of your choice, and -e '}' closes the block.
This might work for you (GNU sed):
sed '/StringToSearch/{n;p;s/[^| ]\+/uk/}' file
Match on a line containing StringToSearch.
Print that line and fetch the next.
Print that line and substitute uk for the isoCode (this line will also be printed as part of the normal sed flow).
See here for demo.

Number of integers in a file using Command Line Interface

How can I count the number of integers in a file using egrep?
I tried to solve it as a pattern-finding problem. My difficulty is how to represent a continuous run of characters in the range [0-9] that has a space before the beginning and a space or dot after the end. I think the latter can be solved by using \< and \> respectively. Also, it should not include a dot in between, otherwise it is not an integer. I am unable to convert this logic into a regular expression using the available tools and techniques.
My name is 2322.
33 is my sister.
I am blessed with a son named 55.
Why are you so 69. Is everything 33.
66.88 is not an integer
55whereareyou?
The right answer should be 5 i.e. for 2322, 33, 55, 69 and 33.
grep -Eo '(^| )([0-9]+[\.\?\=\:]?( |$))+' | wc -w

-E             extended regex
-o             extract only what was found
(^| )          starts at the beginning of a line or with a space
[0-9]+         digits
[\.\?\=\:]?    optional dot, question mark, etc.
( |$)          ends with a space or at the end of the line
(...)+         repeat 1 time or more (to detect integers like "123 456")
wc -w          count words
Note: 123. 123? 123: are also counted as integer
Test:
#!/bin/bash
exec 3<<EOF
My name is 2322.
33 is my sister.
I am blessed with a son named 55.
Why are you so 69. Is everything 33.
66.88 is not an integer
55whereareyou?
two integers 123 456.
how many tables in room 400? 50.
50? oh I thought it was 40.
23: It's late, 23:00 already
EOF
grep -Eo '(^| )([0-9]+[\.\?\=\:]?( |$))+' <&3 | \
tee >(sleep 0.5; echo -n "integer counted: "; wc -w; )
Outputs:
2322.
33
55.
69.
33.
123 456.
400? 50.
50?
40.
23:
integer counted: 12
Based on the observation that you want 66.88 excluded, I'm guessing
grep -Ec '[0-9]\.?( |$)' file
which finds a digit, optionally followed by a dot, followed by either a space or end of line.
The -c option says to report the number of lines which contain a match (so not strictly the number of matches, if there are lines which contain multiple matches), and the -E option enables extended regular expression syntax, i.e. what was traditionally called egrep (though that command name is now obsolescent).
If you need to count matches, the -o option prints each match on a separate line, which you can then pass to wc -l (or in lucky cases combine with grep -c, but check first; this doesn't work e.g. with GNU grep currently).
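Putting -o and wc -l together, a minimal sketch that counts matches rather than lines could look like this. The function name count_integers is made up, and the pattern is one guess at the intended rule: an integer starts at the beginning of a line or after a space, and ends with an optional dot followed by a space or the end of the line.

```shell
# Hypothetical helper: count integers on stdin (or in the files given as arguments).
count_integers() {
  # -o prints every match on its own line, so wc -l counts matches, not lines.
  grep -Eo '(^| )[0-9]+\.?( |$)' "$@" | wc -l
}

printf '%s\n' \
  'My name is 2322.' \
  '33 is my sister.' \
  'I am blessed with a son named 55.' \
  'Why are you so 69. Is everything 33.' \
  '66.88 is not an integer' \
  '55whereareyou?' | count_integers
# prints: 5
```

One limitation of this sketch: each match consumes its trailing space, so two integers separated by a single space ("123 456") would be counted as one.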
On my Ubuntu this code works fine:
grep -P '((^)|(\s+))[-+]?\d+\.?((\s+)|($))' test

Finding all punctuation in a text file & print count

I have come close to counting all occurrences of punctuation, however punctuation characters that are right next to each other get counted as one.
Like so:
cat filename.txt |
tr -sc '[:punct:]' '\n' |
sort |
uniq -c |
sort -bnr
Which prints something like this:
15 ,
9 !
5 .
2 ;
2 !"
2 '
1 -
1 --
1 :
1 ?
It is clearly only counting punctuation, but how would I separate those that are right next to each other?
This:
tr -sc '[:punct:]' '\n'
Basically, what you do here is replace every run of non-punctuation characters with a single \n. So when there is no such character between two punctuation characters, they stay next to each other.
You want something like this:
cat filename.txt | tr -cd '[:punct:]' | fold -w 1 | sort | uniq -c | sort -bnr
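The pipeline can be tried on a couple of lines that contain adjacent punctuation. This is a minimal sketch; the function name count_punct is made up for illustration, and the order of rows with equal counts depends on the locale.

```shell
# Hypothetical helper: count each punctuation character on stdin separately.
count_punct() {
  # tr -cd deletes everything that is NOT punctuation (including newlines),
  # fold -w 1 then puts every remaining character on its own line.
  tr -cd '[:punct:]' | fold -w 1 | sort | uniq -c | sort -bnr
}

printf '%s\n' 'Wait -- what?!' 'Oh, "that".' | count_punct
```

Here the two hyphens of "--" are counted as two occurrences of "-" instead of one occurrence of "--".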
