BASH: grep text in a long string - bash

Can anyone explain how to write a regex to get a value from a very long text file full of metadata? The whole file has no newline separators; it is just one very long string, which is hard to read or analyze.
I need to grep the values after the key username. Can anyone help? I seem to be stuck writing a proper regular expression for this case.
.."somevalue\";s:7:\"text1\";s:8:\"username\";s:9:\"USER1\";s:7:\"company\";s:3:\"text2\";s:5:\ "somevalue\";s:11:\"text11\";s:8:\"username\";s:15:\"USER2\";s:7:\"company\";s:17:\"XXXX\";s:5:\... "somevalue\";s:15:\"text110000\";s:8:\"username\";s:12:\"USER3_HERE\";s:7:\"company\";s:18:\"yyyyy\";s:
In the above example I need the following output
USER1
USER2
USER3_HERE

With Perl it is
perl -wn -le 'print for /\\"username\\";.*?\\"([^\\"]+)/g' filename
-w - enable warnings
-n - process the file line by line, but don't print anything automatically
-l - handle line endings
-e - run the following code
print for /\\"username\\";.*?\\"([^\\"]+)/g
Print the captured output whenever you see \"username\"; followed by something followed by \" .
Output
$ perl -wn -le 'print for /\\"username\\";.*?\\"([^\\"]+)/g'
.."somevalue\";s:7:\"text1\";s:8:\"username\";s:9:\"USER1\";s:7:\"company\";s:3:\"text2\";s:5:\ "somevalue\";s:11:\"text11\";s:8:\"username\";s:15:\"USER2\";s:7:\"company\";s:17:\"XXXX\";s:5:\... "somevalue\";s:15:\"text110000\";s:8:\"username\";s:12:\"USER3_HERE\";s:7:\"company\";s:18:\"yyyyy\";s:
USER1
USER2
USER3_HERE
See also
perlrun for the command line switches
perlre for the regular expression used
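For readers without Perl, the same idea can be sketched with grep -o and sed. This is a minimal sketch, not part of the answer above: the function name extract_usernames is made up for illustration, it assumes a grep that supports -o (GNU and BSD grep do), and it assumes usernames never contain a backslash or a double quote.

```shell
# Hypothetical helper: print every value stored under the "username" key.
# Reads the long single-line input on stdin.
extract_usernames() {
  # grep -o emits each match on its own line, e.g. username\";s:9:\"USER1
  # sed then deletes everything up to and including the last double quote.
  grep -o 'username\\";s:[0-9]*:\\"[^\\]*' | sed 's/.*"//'
}

# Demo on a shortened version of the sample input.
extract_usernames <<'EOF'
s:7:\"text1\";s:8:\"username\";s:9:\"USER1\";s:7:\"company\";s:8:\"username\";s:15:\"USER2\";
EOF
```

With a real file, the call would be extract_usernames < filename.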

For input that looks like this:
cat <<EOF >file
s:7:\"text1\";s:8:\"username\";s:9:\"USER1\";s:7:\"company\";s:3:\"text2\";s:5:\ "somevalue\";s:11:\"text11\";s:8:\"username\";s:15:\"USER2\";s:7:\"company\";s:17:\"XXXX\";s:5:\... "somevalue\";s:15:\"text110000\";s:8:\"username\";s:12:\"USER3_HERE\";s:7:\"company\";s:18:\"yyyyy\";
EOF
We can:
< file \
tr ';' '\n' |
sed 's/^.*:\\"\(.*\)\\"$/\1/' |
grep -x "USER1\|USER2\|USER3_HERE"
substitute each ; with a newline
keep only the text between :\" and \" on each line
grep only for the strings USER1, USER2 or USER3_HERE
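The final grep step only works when the usernames are known in advance. A variation on the same tr-based split that prints whatever value follows each username key might look like the sketch below. The function name values_after_username is hypothetical, and the sketch assumes keys and values alternate, separated by ;, exactly as in the sample.

```shell
# Hypothetical helper: split the input on ';' and print the value that
# comes right after each "username" key. Reads stdin.
values_after_username() {
  tr ';' '\n' |
    awk '
      # The previous line was the username key: strip the s:N:\" prefix
      # and the trailing \" from this line, then print what is left.
      want { gsub(/^.*:\\"|\\"$/, ""); print; want = 0; next }
      # Remember that the next line holds the value we are after.
      /\\"username\\"$/ { want = 1 }
    '
}

# Demo on a shortened version of the sample input.
values_after_username <<'EOF'
s:7:\"text1\";s:8:\"username\";s:9:\"USER1\";s:7:\"company\";s:3:\"text2\";s:8:\"username\";s:12:\"USER3_HERE\";
EOF
```

With the file created above, the call would be values_after_username < file.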

With GNU awk (I added the printout of the field number for clarity here with printing i in front of $i):
$ gawk 'BEGIN{FS="\\\\\""} {for (i=1;i<=NF;i++) if (match($i, /USER/)) print i, $i}' file
7 USER1
18 USER2
29 USER3_HERE
If you want the field following those fields:
$ gawk 'BEGIN{FS="\\\\\""} {for (i=1;i<=NF;i++) if (match($i, /USER/)) print $i, $(i+1)}' file
USER1 ;s:7:
USER2 ;s:7:
USER3_HERE ;s:7:
You can use GNU grep:
$ ggrep -oP 'USER[^;]*;([^\\]*)\\"company' file
USER1\";s:7:\"company
USER2\";s:7:\"company
USER3_HERE\";s:7:\"company
Or Perl if you just want the match group:
$ perl -lnE 'say for /USER[^;]*;([^\\]*)\\"company/g' file
s:7:
s:7:
s:7:

Related

Read line by line from a text file and print how I want in shell scripting

I want to read the file below line by line and print each line in a specific format using a shell script.
Text file content:
zero#123456
one#123
two#12345678
I want to print this as:
zero#1-6
one#1-3
two#1-8
I tried the following:
file="readFile.txt"
while IFS= read -r line
do echo "$line"
done <printf '%s\n' "$file"
Create a script like below: my_print.sh
file="readFile.txt"
while IFS= read -r line
do
one=$(echo $line| awk -F'#' '{print $1}') ## This splits the line based on '#' and picks the 1st value. So, we get zero from 'zero#123456 '
len=$(echo $line| awk -F'#' '{print $2}'|wc -c) ## This takes the 2nd value which is 123456 and counts the number of characters
two=$(echo $line| awk -F'#' '{print $2}'| cut -c 1) ## This picks the 1st character from '123456' which is 1
three=$(echo $line| awk -F'#' '{print $2}'| cut -c $((len-1))) ## This picks the last character from '123456' which is 6
echo $one#$two-$three ## This is basically printing the output in the format you wanted 'zero#1-6'
done <"$file"
Run it like:
mayankp@mayank:~/$ sh my_print.sh
mayankp@mayank:~/$ cat output.txt
zero#1-6
one#1-3
two#1-8
Let me know if this helps.
It's not shell scripting (missed that at first, sorry), but here is Perl with a combined lookbehind and lookahead for a digit:
$ perl -pe 's/(?<=[0-9]).*(?=[0-9])/-/' file
Text file content:
zero#1-6
one#1-3
two#1-8
Explained some:
s//-/ replace with a -
(?<=[0-9]) positive lookbehind: matches only if preceded by a digit
(?=[0-9]) positive lookahead: matches only if followed by a digit
With sed:
sed -r 's/^(.+)#([0-9])[0-9]*([0-9])\s*$/\1#\2-\3/' readFile.txt
-r: use extended regular expressions (so that groups and quantifiers need no backslash escaping)
s/expr1/expr2/: substitute expr1 with expr2
expr1 is a regular expression; the relevant parts of the match are caught by three capturing groups (the parenthesized ones).
expr2 retrieves the captured strings (\1, \2, \3) and inserts them into the formatted output (the one you wanted).
Regular-Expressions.info is a good place to start learning them. You can also check your own regexes with regex101.com.
Update: Also you could do that with awk:
awk -F'#' '{ \
gsub(/\s*/,"", $2) ; \
print $1 "#" substr($2, 1, 1) "-" substr($2, length($2), 1) \
}' < test.txt
I added a gsub() call because your file seems to have trailing blank characters.
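Since the question asks for shell scripting specifically, here is a minimal sketch that does the same job with POSIX parameter expansion only, without starting awk, cut, or wc for every line. The function name first_last is made up for illustration, and the sketch assumes each line has the form name#digits, possibly followed by trailing blanks.

```shell
# Hypothetical helper: turn "zero#123456" into "zero#1-6". Reads stdin.
first_last() {
  while IFS= read -r line || [ -n "$line" ]; do
    name=${line%%#*}                 # text before the first '#'
    digits=${line#*#}                # text after the first '#'
    digits=${digits%%[!0-9]*}        # keep only the leading run of digits
    [ -n "$digits" ] || continue     # skip lines without a number
    first=${digits%"${digits#?}"}    # remove all but the first character
    last=${digits#"${digits%?}"}     # remove all but the last character
    printf '%s#%s-%s\n' "$name" "$first" "$last"
  done
}

# Demo with the sample data (the last line has a trailing blank).
printf '%s\n' 'zero#123456' 'one#123' 'two#12345678 ' | first_last
```

With the file from the question, the call would be first_last < readFile.txt.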

Reformat date in text file (.csv) with sed and date

This is the input .csv file
"item1","10/11/2017 2:10pm",1,2, ...
"item2","10/12/2017 3:10pm",3,4, ...
.
.
.
Now, I want to convert the second column (date) to this specific format
date -d '10/12/2017 2:10pm' +'%Y/%m/%d %H:%M:%S', so that "10/12/2017 2:10pm" converts to "2017/10/12 14:10:00"
Expecting output file
"item1","2017/10/11 14:10:00",1,2, ...
"item2","2017/10/12 15:10:00",3,4, ...
.
.
.
I know it can be done with bash or Python, but I want to do it in a one-line command. Any ideas? Is there a way to pass the result of date to sed?
One-liner awk approach.
awk -F',' '{gsub(/"/,"",$2); cmd="date -d\""$2"\" +\\\"%Y/%m/%d\\ %T\\\"";
cmd |getline $2; close(cmd) }1' OFS=, infile #>>outfile
"item1","2017/10/11 14:10:00",1,2, ...
"item2","2017/10/12 15:10:00",3,4, ...
This prints the result to your terminal. Redirect it to a file if you need to keep it, or write to FILENAME to send the output back into the input file itself (awk is still reading that file while it writes, so try this on a copy first).
awk -F',' '{gsub(/"/,"",$2); cmd="date -d\""$2"\" +\\\"%Y/%m/%d\\ %T\\\"";
cmd |getline $2; close(cmd); print >FILENAME }' OFS=, infile
Or use a GNU awk version that supports the -i inplace option for in-place editing; see 'awk' save modifications in place.
You can do it in one line, but that begs the question -- "How long of a line do you want?" Since you have it labeled 'shell' and not bash, etc., you are a bit limited in your string handling. POSIX shell provides enough to do what you want, but it isn't the speediest remedy. You are either going to end up with an awk or sed solution that calls date or a shell solution that calls awk or sed to parse old date from the original file and feeds the result to date to get your new date. You will have to work out which provides the most efficient remedy.
As far as the one-liner goes, you can do something similar to the following while remaining POSIX compliant. It simply uses awk to get the 2nd field from the file, pipes the result to a while loop which uses expr length "$field" to get the length and uses that within expr substr "$field" "2" <length expression - 2> to chop the double-quotes from the end of the original date olddt, followed by date -d "$olddt" +'%Y/%m/%d %H:%M:%S' to get newdt and finally sed -i "s;$olddt;$newdt;" to perform the substitution in place. Your one-liner (shown with auto line-continuations for readability)
$ awk -F, '{print $2}' timefile.txt |
while read -r field; do
olddt="$(expr substr "$field" "2" "$(($(expr length "$field") - 2))")";
newdt=$(date -d "$olddt" +'%Y/%m/%d %H:%M:%S');
sed -i "s;$olddt;$newdt;" timefile.txt; done
Example Input File
$ cat timefile.txt
"item1","10/11/2017 2:10pm",1,2, ...
"item2","10/12/2017 3:10pm",3,4, ...
Resulting File
$ cat timefile.txt
"item1","2017/10/11 14:10:00",1,2, ...
"item2","2017/10/12 15:10:00",3,4, ...
There are probably faster ways to do it, but this is a reasonable length one-liner (relatively speaking).
Revised less ugly sed method:
sed 's/^.*,"\|",.*//g;h;s#.*#date "+%Y/%m/%d %T" -d "&"#e;H;g;s#\n\|$#,#g;s/^/s,/' input.csv | sed -f - input.csv
Spread out, (it works the same):
sed 's/^.*,"\|",.*//g
h;
s#.*#date "+%Y/%m/%d %T" -d "&"#e;
H;
g;
s#\n\|$#,#g;
s/^/s,/' input.csv | sed -f - input.csv
Output:
"item1","2017/10/11 14:10:00",1,2, ...
"item2","2017/10/12 15:10:00",3,4, ...
How it works:
The first sed block uses the evaluate command to run date, the output of which is used to generate some new sed substitute commands. To show the new s commands, temporarily replace the shell script | pipe with a # comment:
s,10/11/2017 2:10pm,2017/10/11 14:10:00,
s,10/12/2017 3:10pm,2017/10/12 15:10:00,
These are piped to the second sed.
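All of the approaches above start one date process per row, which can be slow on large files. If the input always uses the M/D/YYYY h:mmam/pm layout shown in the question, the conversion can also be done with plain string handling inside awk. This is a sketch under that assumption, not a replacement for the general parsing that date does; reformat_dates is a hypothetical name, and like the awk answer above it assumes there are no commas inside quoted fields.

```shell
# Hypothetical helper: rewrite column 2 from "10/11/2017 2:10pm"
# to "2017/10/11 14:10:00". Reads CSV on stdin.
reformat_dates() {
  awk -F',' -v OFS=',' '
    {
      d = $2
      gsub(/"/, "", d)                  # drop the surrounding quotes
      n = split(d, p, "[/ :]")          # p[1]=MM p[2]=DD p[3]=YYYY p[4]=hh p[5]=mm+am/pm
      if (n == 5) {
        ampm = tolower(substr(p[5], length(p[5]) - 1))
        min  = substr(p[5], 1, length(p[5]) - 2)
        hour = p[4] % 12                # 12am becomes 0, 12pm becomes 12 below
        if (ampm == "pm") hour += 12
        $2 = sprintf("\"%04d/%02d/%02d %02d:%02d:00\"", p[3], p[1], p[2], hour, min)
      }
      print                             # other lines pass through unchanged
    }
  '
}

# Demo with the first row from the question.
printf '%s\n' '"item1","10/11/2017 2:10pm",1,2' | reformat_dates
```

With the file from the question, the call would be reformat_dates < input.csv > output.csv.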

How to write a bash script that dumps itself out to stdout (for use as a help file)?

Sometimes I want a bash script that's mostly a help file. There are probably better ways to do things, but sometimes I want to just have a file called "awk_help" that I run, and it dumps my awk notes to the terminal.
How can I do this easily?
Another idea, use #!/bin/cat -- this will literally answer the title of your question since the shebang line will be displayed as well.
Turns out it can be done as pretty much a one-liner, thanks to @CharlesDuffy for the suggestions!
Just put the following at the top of the file, and you're done (the exit keeps bash from trying to run the notes that follow):
grep -v EZREMOVEHEADER "$BASH_SOURCE"; exit
So for my awk_help example, it'd be:
grep -v EZREMOVEHEADER "$BASH_SOURCE"; exit
# Basic form of all awk commands
awk search pattern { program actions }
# advanced awk
awk 'BEGIN {init} search1 {actions} search2 {actions} END { final actions }' file
# awk boolean example for matching "(me OR you) OR (john AND ! doe)"
awk '( /me|you/ ) || (/john/ && ! /doe/ )' /path/to/file
# awk - print # of lines in file
awk 'END {print NR,"coins"}' coins.txt
# Sum up gold ounces in column 2, and find out value at $425/ounce
awk '/gold/ {ounces += $2} END {print "value = $" 425*ounces}' coins.txt
# Print the last column of each line in a file, using a comma (instead of space) as a field separator:
awk -F ',' '{print $NF}' filename
# Sum the values in the first column and pretty-print the values and then the total:
awk '{s+=$1; print $1} END {print "--------"; print s}' filename
# functions available
length($0) > 72, toupper,tolower
# count the # of times the word PASSED shows up in the file /tmp/out
cat /tmp/out | awk 'BEGIN {X=0} /PASSED/{X+=1; print $1 X}'
# awk regex operators
https://www.gnu.org/software/gawk/manual/html_node/Regexp-Operators.html
I found another solution that works on Mac/Linux and works exactly as one would hope.
Just use the following as your "shebang" line, and it'll output everything from line 2 on down:
test.sh
#!/usr/bin/tail -n+2
hi there
how are you
Running this gives you what you'd expect:
$ ./test.sh
hi there
how are you
Another possible solution: use less as the interpreter, so that the file opens in a searchable pager:
#!/usr/bin/less
And with this approach you can still grep the output for something, e.g.
$ ./test.sh | grep something
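A quoted here-document is another common way to keep notes inside a script without the shell trying to execute them. A minimal sketch follows; show_notes is a hypothetical function name, and the notes are shortened from the awk_help example above.

```shell
#!/bin/sh
# Print the notes verbatim. Quoting the delimiter ('EOF') stops the shell
# from expanding $NF, $1 and similar inside the notes.
show_notes() {
  cat <<'EOF'
# awk - print # of lines in file
awk 'END {print NR,"coins"}' coins.txt
# Print the last column of each line, using a comma as the field separator:
awk -F ',' '{print $NF}' filename
EOF
}

show_notes
```

Because the output goes to stdout, it can still be piped through grep or less.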

how to pass in a variable to awk commandline

I'm having some trouble passing bash script variables into awk command-line.
Here is pseudocode:
for FILE in $INPUT_DIR/*.txt; do
filename=`echo $FILE | sed -n 's/^.*\(chr[0-9A-Z]*\).*.vcf$/\1/p'`
OUTPUT_FILE=$OUTPUT_DIR/$filename.snps.txt
egrep -v "^#" $FILE | awk '{print $2,$4,$5}' > $OUTPUT_FILE
done
In the final line, where I awk the columns, I would like the column list to be flexible or user-supplied. For example, the user might want columns 6, 7, and 8 as well, or columns 133 and 138, or columns 245 through 248. How do I customize this so that the 'print $2 .... $5' part comes from user input? For example, the user would run the script as: bash script.sh input_dir output_dir [whatever string of columns the user wants], and I would get those columns in the output. I tried passing it in, but I guess I'm not getting the syntax right.
With awk, you should declare the variable with -v before using it. This is better than the escape method (awk '{print $'$var'}'):
awk -v var1="$col1" -v var2="$col2" 'BEGIN {print var1,var2 }'
Where $col1 and $col2 would be the input variables.
Maybe you can try an input variable as string with "$2,$4,$5" and print this variable to get the values (I am not sure if this works)
The following test works for me:
A="\$3" ; ls -l | awk "{ print $A }"
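For the original request (a user-supplied list of columns), one option is to pass the list as a single awk variable and split it inside awk. This is a minimal sketch, assuming the list is a comma-separated string of column numbers such as 2,4,5; print_columns is a hypothetical name, and ranges like 245-248 are not handled here.

```shell
# Hypothetical helper: print the columns named in $1 (e.g. "2,4,5").
# Reads the data on stdin.
print_columns() {
  awk -v cols="$1" '
    BEGIN { n = split(cols, c, ",") }   # c[1..n] hold the column numbers
    {
      out = ""
      for (i = 1; i <= n; i++)
        out = out (i > 1 ? OFS : "") $(c[i])
      print out
    }
  '
}

printf 'a b c d e\n' | print_columns "2,4,5"   # prints: b d e
```

In the script from the question, the third positional parameter could be passed straight through, for example: egrep -v "^#" "$FILE" | print_columns "$3" > "$OUTPUT_FILE".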

Shell: Substitute a string between 2 Known strings

I wish to replace the version between the abc_def_APP and application1.war strings in file1 with the contents of the new_version variable (13.2.0/8).
Script :
#!/bin/ksh
new_version="13.0.5/8"
old_version=($(grep -r "location=.*application1.war" /path/file1| awk '{print ($1)}'| cut -f8- -d"/"|sed 's/.\{1\}$//'))
echo "$old_version" # This gives me the version number from file1 which needs to be replaced (13.2.0/9)
File1 Contents:
location="cc://view/blah/blah/blah/abc_def_APP/13.2.0/9/application1.war"
Use the following sed command to do the replacement:
sed -i.bak -r "s#^(.*/abc_def_APP/).*(/application1\.war.*)#\1$new_version\2#" /path/file1
With GNU awk (for gensub()):
$ cat file
location="cc://view/blah/blah/blah/abc_def_APP/13.2.0/9/application1.war"
$ new_version="13.2.0/8"
$ gawk -v nv="$new_version" '{$0=gensub(/^(location.*abc_def_APP\/).*(\/application1.war.*)/,"\\1" nv "\\2","")}1' file
location="cc://view/blah/blah/blah/abc_def_APP/13.2.0/8/application1.war"
The difference between this and a sed solution is that awk doesn't require you to jump through hoops due to your new_version variable containing a "/" (or any other character).
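If gensub() is not available, a similar result can be sketched in any POSIX awk by locating the two marker strings with index() and rebuilding the line with substr(). The name replace_between is hypothetical; the markers are taken from the question. Because the new version is only concatenated, never parsed as a regex or a replacement pattern, characters such as / and & need no escaping (awk still interprets backslash escapes in a -v value).

```shell
# Hypothetical helper: replace whatever sits between the two markers
# with the string given in $1. Reads stdin.
replace_between() {
  awk -v nv="$1" -v pre="/abc_def_APP/" -v post="/application1.war" '
    {
      s = index($0, pre)     # where the left marker starts (0 if absent)
      e = index($0, post)    # where the right marker starts (0 if absent)
      if (s && e >= s + length(pre)) {
        $0 = substr($0, 1, s + length(pre) - 1) nv substr($0, e)
      }
      print
    }
  '
}

echo 'location="cc://view/blah/blah/blah/abc_def_APP/13.2.0/9/application1.war"' |
  replace_between "13.2.0/8"
```

Lines that do not contain both markers are printed unchanged.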
