Remove starting substring http from strings using AWK? - bash

I'm wondering whether there is a better and cleaner way to remove one string from the beginning and another from the end of each line in a file, using AWK only.
Here's what I have so far:
cat results.txt | awk '{gsub("https://", "") ;print}' | tr -d ":443"
File: results.txt
https://www.google.com:443
https://www.tiktok.com:443
https://www.instagram.com:443
To get the result
www.google.com
www.tiktok.com
www.instagram.com

With GNU awk.
Use / and : as field separators and print fourth column:
awk -F '[/:]' '{print $4}' results.txt
Or use https:// and : as field separators and print second column:
awk -F 'https://|:' '{print $2}' results.txt
Output:
www.google.com
www.tiktok.com
www.instagram.com

If it's a list of URLs like that, you could take advantage of the fact that the field separator in awk can be a regular expression:
awk -F':(//)?' '{print $2}'
This says that your field separator is ": optionally followed by //", which would split each line into:
[$1] https
[$2] www.google.com
[$3] 443
And then we print out only field $2.
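A quick way to check how a given separator splits a line is to print every field together with its index. This is only a minimal sketch; show_fields is a hypothetical helper name, not something awk provides.

```shell
# Hypothetical helper: split each input line on the regex given in $1
# and print every resulting field together with its index.
show_fields() {
  awk -F"$1" '{ for (i = 1; i <= NF; i++) printf "[$%d] %s\n", i, $i }'
}

printf '%s\n' 'https://www.google.com:443' | show_fields ':(//)?'
# prints:
# [$1] https
# [$2] www.google.com
# [$3] 443
```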

cat results.txt | awk '{gsub("https://", "") ;print}' | tr -d ":443"
I think you are misunderstanding what tr -d does: it deletes every character in the given set, not a substring. It only seems to do what you want because your test input
https://www.google.com:443
https://www.tiktok.com:443
https://www.instagram.com:443
does not contain any :, 4 or 3 that should be kept. For a test case that shows the malfunction, try
https://www.normandy1944.info:443
Also, the code above features the anti-pattern known as useless use of cat, since AWK can read the file on its own; that is,
cat results.txt | awk '{gsub("https://", "") ;print}'
can be written more succinctly as
awk '{gsub("https://", "") ;print}' results.txt
I would rewrite your whole pipeline (cat, awk, tr) as a single awk command:
awk '{gsub("^https://|:443$","");print}' results.txt
Explanation: replace https:// at the start of the line (^) or (|) :443 at the end of the line ($) with the empty string (i.e. delete these parts), then print. Note that ^ and $ prevent deleting https:// and :443 in the middle of a string, though feel free to remove ^ and $ if you consider such occurrences unlikely.
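The difference between deleting a set of characters and deleting an anchored substring can be checked directly on the problematic test case. The following is a minimal sketch; strip_url is a hypothetical helper name chosen for illustration.

```shell
# Hypothetical helper: strip a leading "https://" and a trailing ":443"
# from every input line, leaving the rest of the line untouched.
strip_url() {
  awk '{ gsub(/^https:\/\/|:443$/, ""); print }'
}

printf '%s\n' 'https://www.normandy1944.info:443' | strip_url
# prints: www.normandy1944.info

# For comparison, tr -d deletes every ':', '4' and '3' character:
printf '%s\n' 'www.normandy1944.info:443' | tr -d ':443'
# prints: www.normandy19.info
```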

Related

grep text after keyword with unknown spaces and remove comments

I am having trouble saving variables from file using grep/sed/awk.
The text in file.txt is on the form:
NUM_ITER = 1000 # Number of iterations
NUM_STEP = 1000
And I would like to save these to bash variables without the comments.
So far, I have attempted this:
grep -oP "^NUM_ITER[ ]*=\K.*#" file.txt
which yields
1000 #
Any suggestions?
I would use awk, like this:
awk -F'[=[:blank:]#]+' '$1 == "NUM_ITER" {print $2}' file
To store it in a variable:
NUM_ITER=$(awk -F'[=[:blank:]#]+' '$1 == "NUM_ITER" {print $2}' file)
As long as a line can only contain a single match, this is easy with sed.
sed -n '# Remove comments
s/[ ]*#.*//
# If keyword found, remove keyword and print value
s/^NUM_ITER[ ]*=[ ]*//p' file.txt
This can be trimmed down to a one-liner if you remove the comments.
sed -n 's/[ ]*#.*//;s/^NUM_ITER[ ]*=[ ]*//p' file.txt
The -n option turns off automatic printing, and the p flag after the final substitution prints the line only if that substitution was successful.
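To read several settings this way, the sed one-liner can be wrapped in a small function. This is a sketch under assumptions: get_param is a hypothetical name, the key is assumed to contain no regex metacharacters, and the sample file is recreated in a temporary location.

```shell
# Hypothetical helper: print the value of key $1 from file $2, where
# lines look like "KEY = value # optional comment".
get_param() {
  sed -n "s/[ ]*#.*//;s/^$1[ ]*=[ ]*//p" "$2"
}

cfg=$(mktemp)
printf '%s\n' 'NUM_ITER = 1000 # Number of iterations' 'NUM_STEP = 1000' > "$cfg"

NUM_ITER=$(get_param NUM_ITER "$cfg")
NUM_STEP=$(get_param NUM_STEP "$cfg")
echo "$NUM_ITER $NUM_STEP"
# prints: 1000 1000

rm -f "$cfg"
```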

awk to ignore leading and trailing space and blank lines and commented lines if any from a file

I need help with awk: how can I ignore leading and trailing spaces, blank lines, and commented lines (if any) when reading a file?
Here you go,
grep "MyText" FromMyLog.log |awk -F " " '{print $2}'|awk -F "#" '{print $1}'
Here MyText is the pattern to search for in the file FromMyLog.log.
-F sets the field separator, here the single space between the quotes.
'{print $2}' prints the 2nd field of each matching line; use $1, $2, etc. as required.
awk -F "#" '{print $1}' splits on # and keeps only the text before it, which drops a trailing comment.
This is just a hint; modify the code to suit your requirements. It works for me in combination with grep.
grep -v '^$\|^\s*#' <filename>, or grep -Ev '^[[:space:]]*$|^[[:space:]]*#' <filename> if blank lines or comment lines may contain leading whitespace
I think this is what you were asking for:
$> echo -e ' abc \t
\t efg
# alskdjfl
#
awk
# askdfh
' |
awk '
# match if the first non-space character exists and is not a hash sign
/^[[:space:]]*[^#[:space:]]/ {
# delete any spaces from start and end of line
sub(/^[[:space:]]+/, "");
sub(/[[:space:]]+$/, ""); # no third argument, so sub() works on the whole line ($0)
print
}'
abc
efg
awk
This can be folded onto a single line if so needed. Any problems, an actual example of the input (in a code block in your question) would be helpful.
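The same idea can also be written by trimming first and filtering afterwards, which keeps each regex simple. This is a minimal sketch; clean_lines is a hypothetical helper name, and only spaces and tabs are treated as whitespace.

```shell
# Hypothetical helper: trim leading/trailing spaces and tabs, then print
# only the lines that are neither empty nor comments starting with '#'.
clean_lines() {
  awk '
    { sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "") }   # trim both ends of the line
    $0 != "" && $0 !~ /^#/                       # default action: print the line
  '
}

printf ' abc \t\n\t efg\n  # indented comment\n\n#\nawk\n' | clean_lines
# prints:
# abc
# efg
# awk
```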
Here's one way to extract required content ignoring spaces
FILE CONTENT
Server: 192.168.XX.XX
Address 1: 192.168.YY.YY
Name: central.google.com
Now to extract the server's address without spaces.
COMMAND
awk -F':' '/Server/ {print $2}' YOURFILENAME | tr -s " "
option -s for squeezing the repetition of spaces.
which gives,
192.168.XX.XX
Here, notice that there is one leading space in the address.
To completely ignore spaces you can change that to,
awk -F':' '/Server/ {print $2}' YOURFILENAME | tr -d '[:space:]'
option -d deletes the listed characters, here the [:space:] class (which also includes the trailing newline).
which gives,
192.168.XX.XX
without any leading or trailing spaces.
tr is a UNIX utility for translating, deleting, or squeezing repeated characters; the name is short for translate.
Examples:
tr '[:lower:]' '[:upper:]'
gives,
YOUAREAWESOME
for
youareawesome
Hope that helps.
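If a second process is not wanted, the spaces can also be removed inside awk itself. This is a sketch under assumptions: server_address is a hypothetical helper name, and the sample file below recreates the content shown above with an assumed amount of padding after the colons.

```shell
# Hypothetical helper: print the value of the "Server" line from file $1
# with all spaces and tabs removed, using awk only.
server_address() {
  awk -F':' '/^Server/ { gsub(/[ \t]+/, "", $2); print $2 }' "$1"
}

f=$(mktemp)
printf '%s\n' 'Server:    192.168.XX.XX' \
              'Address 1: 192.168.YY.YY' \
              'Name:      central.google.com' > "$f"
server_address "$f"
# prints: 192.168.XX.XX
rm -f "$f"
```

Unlike tr -d '[:space:]', this keeps the newline at the end of the output.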

Concatenating characters on each field of CSV file

I am dealing with a CSV file which has the following form:
Dates;A;B;C;D;E
"1999-01-04";1391.12;3034.53;66.515625;86.2;441.39
"1999-01-05";1404.86;3072.41;66.3125;86.17;440.63
"1999-01-06";1435.12;3156.59;66.4375;86.32;441
Since the BLAS routine I need to implement on such data takes double-floats only, I guess the easiest way is to concatenate d0 at the end of each field, so that each line looks like:
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
In pseudo-code, that would be:
For every line except the first line
For every field except the first field
Substitute ; with d0; and Substitute newline with d0 newline
My guess is that it should be something like
cat file.csv | awk -F; 'NR>1 & NF>1'{print line} | sed 's/;/d0\n/g' | sed 's/\n/d0\n/g'
Any input?
Could use this sed
sed '1!{s/\(;[^;]*\)/\1d0/g}' file
This skips the first line, then replaces each field that begins with ; (which excludes the first field) with itself followed by d0.
Output
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
I would say:
$ awk 'BEGIN{FS=OFS=";"} NR>1 {for (i=2;i<=NF;i++) $i=$i"d0"} 1' file
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
That is, set the field separator to ;. Starting on line 2, loop through all the fields from the 2nd one appending d0. Then, use 1 to print the line.
Your data format looks a bit odd. Enclosing the first column in double quotes makes me think that it can contain the delimiter, the semicolon, itself. I don't know the application that produces the data, but if this is the case, you can use the following GNU awk command:
awk 'NR>1{for(i=2;i<=NF;i++){$i=$i"d0"}}1' OFS=\; FPAT='("[^"]+")|([^;]+)' file
The key here is the FPAT variable. With it you can define what a field looks like, instead of being limited to specifying a set of field delimiters.
big-prices.csv
Dates;A;B;C;D;E
"1999-01-04";1391.12;3034.53;66.515625;86.2;441.39
"1999-01-05";1404.86;3072.41;66.3125;86.17;440.63
"1999-01-06";1435.12;3156.59;66.4375;86.32;441
preprocess script
head -n 1 big-prices.csv 1>output.txt; \
tail -n +2 big-prices.csv | \
sed 's/;/d0;/g' | \
sed 's/$/d0/g' | \
sed 's/"d0/"/g' 1>>output.txt;
output.txt
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
Note: the second sed would need a minor modification if the file has trailing whitespace at the end of lines.
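One possible form of that modification is to strip trailing whitespace before appending d0, all within a single sed call. This is a minimal sketch, not the script above; append_d0 is a hypothetical helper name.

```shell
# Hypothetical helper: on every line except the first, remove trailing
# whitespace, then append d0 to each field that follows a semicolon.
append_d0() {
  sed '1!{
    s/[[:space:]]*$//
    s/;[^;]*/&d0/g
  }' "$1"
}

csv=$(mktemp)
printf '%s\n' 'Dates;A;B' '"1999-01-04";1391.12;441  ' > "$csv"
append_d0 "$csv"
# prints:
# Dates;A;B
# "1999-01-04";1391.12d0;441d0
rm -f "$csv"
```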
Using awk
Input
$ cat file
Dates;A;B;C;D;E
"1999-01-04";1391.12;3034.53;66.515625;86.2;441.39
"1999-01-05";1404.86;3072.41;66.3125;86.17;440.63
"1999-01-06";1435.12;3156.59;66.4375;86.32;441
gsub (any awk)
$ awk 'FNR>1{ gsub(/;[^;]*/,"&d0")}1' file
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
gensub (gawk)
$ awk 'FNR>1{ print gensub(/(;[^;]*)/,"\\1d0","g"); next }1' file
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0

how to extract string appears after one particular string in Shell

I am working on a script in which I grep for lines that contain -abc_1.
I need to extract the string that appears just after this string, as follows:
option : -abc_1 <some_path>
I have used the following code:
grep "abc_1" | awk -F " " '{print $4}'
This code fails if more spaces are used between the strings, e.g.:
option :    -abc_1     <some_path>
It would be helpful if I could extract the path regardless of the number of spaces.
thanks
This should do:
echo 'option : -abc_1 <some_path>' | awk '/abc_1/ {print $4}'
<some_path>
If you do not specify a field separator, awk uses runs of one or more blanks as the separator.
PS: you do not need both grep and awk.
With sed you can do the search and the filter in one step:
sed -n 's/^.*-abc_1  *\([^ ]*\).*$/\1/p'
The -n option suppresses automatic printing, but the p flag at the end still prints the line if the substitution was successful.
perl -lne ' print $1 if(/-abc_1 (.*)/)' your_file
Or if you want to use awk:
awk '{for(i=1;i<=NF;i++)if($i=="-abc_1")print $(i+1)}' your_file
try this grep only way:
grep -Po '^option\s*:\s*-abc_1\s*\K.*' file
or if the white spaces were fixed:
grep -Po '^option : -abc_1 \K.*' file
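Because awk's default field splitting already collapses runs of blanks, looking up the field after the flag works regardless of spacing. This is a minimal sketch; path_after_flag is a hypothetical helper name and /some/path is a made-up value standing in for <some_path>.

```shell
# Hypothetical helper: print the field that immediately follows the
# flag given in $1, no matter how many blanks separate the fields.
path_after_flag() {
  awk -v flag="$1" '{ for (i = 1; i < NF; i++) if ($i == flag) print $(i + 1) }'
}

printf '%s\n' 'option :    -abc_1     /some/path' | path_after_flag -abc_1
# prints: /some/path
```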

How to retrieve digits including the separator "."

I am using grep to get a string like this: ANS_LENGTH=266.50 then I use sed to only get the digits: 266.50
This is my full command: grep --text 'ANS_LENGTH=' log.txt | sed -e 's/[^[[:digit:]]]*//g'
The result is : 26650
How can this line be changed so the result still shows the separator: 266.50
You don't need grep if you are going to use sed. Just use sed's /pattern/ address to match the lines you need to print.
sed -n '/ANS_LENGTH/s/[^=]*=\(.*\)/\1/p' log.txt
-n suppresses automatic printing, so lines that do not match /ANS_LENGTH/ produce no output.
Using a captured group, we keep only the value after the = sign.
The p flag at the end prints the lines on which the substitution succeeded.
If your grep happens to support -P option then you can do:
grep -oP '(?<=ANS_LENGTH=).*' log.txt
(?<=...) is a look-behind construct: it requires ANS_LENGTH= to appear immediately before the match without making it part of the match. This requires the -P option.
-o prints only the matched part, i.e. the value.
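You need to keep the literal dot as well as the digits.
Try sed -e 's/[^[:digit:].]*//g'
Outside a bracket expression the dot matches any single character and must be escaped with a backslash to match a literal dot. Inside a bracket expression it is already literal, so no backslash is needed there.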
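Deleting everything except digits and dots only works when the line holds nothing else numeric. The sketch below shows that case and a variant anchored to the key; ans_length is a hypothetical helper name.

```shell
# Delete every character that is neither a digit nor a dot.
printf '%s\n' 'ANS_LENGTH=266.50' | sed 's/[^[:digit:].]//g'
# prints: 266.50

# Hypothetical helper: anchor on the key so that other numbers on the
# same line are not swept into the result.
ans_length() {
  sed -n 's/.*ANS_LENGTH=\([0-9.]*\).*/\1/p' "$@"
}

printf '%s\n' 'some data ANS_LENGTH=266.50 other=22' | ans_length
# prints: 266.50
```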
Here are some awk examples:
cat file:
some data ANS_LENGTH=266.50 other=22
not mye data=43
gnu awk (due to RS)
awk '/ANS_LENGTH/ {f=NR} f&&NR-1==f' RS="[ =]" file
266.50
awk '/ANS_LENGTH/ {getline;print}' RS="[ =]" file
266.50
Plain awk
awk -F"[ =]" '{for(i=1;i<=NF;i++) if ($i=="ANS_LENGTH") print $(i+1)}' file
266.50
awk '{for(i=1;i<=NF;i++) if ($i~"ANS_LENGTH") {split($i,a,"=");print a[2]}}' file
266.50

Resources