grep text after keyword with unknown spaces and remove comments - bash

I am having trouble saving variables from a file using grep/sed/awk.
The text in file.txt is of the form:
NUM_ITER = 1000 # Number of iterations
NUM_STEP = 1000
And I would like to save these to bash variables without the comments.
So far, I have attempted this:
grep -oP "^NUM_ITER[ ]*=\K.*#" file.txt
which yields
1000 #
Any suggestions?

I would use awk, like this:
awk -F'[=[:blank:]#]+' '$1 == "NUM_ITER" {print $2}' file
To store it in a variable:
NUM_ITER=$(awk -F'[=[:blank:]#]+' '$1 == "NUM_ITER" {print $2}' file)
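To pick up both variables in one pass, you could let awk emit the assignments and eval them (a sketch; eval executes whatever the file yields, so only use it on trusted input):
eval "$(awk -F'[=[:blank:]#]+' 'NF>=2 && $1 ~ /^[A-Z_]+$/ {print $1 "=" $2}' file.txt)"
echo "$NUM_ITER $NUM_STEP"
1000 1000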

As long as a line can only contain a single match, this is easy with sed.
sed -n '# Remove comments
s/[ ]*#.*//
# If keyword found, remove keyword and print value
s/^NUM_ITER[ ]*=[ ]*//p' file.txt
This can be trimmed down to a one-liner if you remove the comments.
sed -n 's/[ ]*#.*//;s/^NUM_ITER[ ]*=[ ]*//p' file.txt
The -n option turns off automatic printing, and the p flag after the final substitution prints the line only if that substitution was successful.
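Your original grep attempt is also close; with GNU grep, \K can drop the keyword prefix and a negated class can stop before the comment (a sketch):
grep -oP '^NUM_ITER\s*=\s*\K[^#\s]*' file.txt
1000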

Related

Writing the output of a command to specific columns of a csv file, unix

I wanted to write the output of a command to specific columns (3rd and 5th) of the CSV file.
#!/bin/bash
echo -e "Value,1\nCount,1" >> file.csv
echo "Header1,Header2,Path,Header4,Value,Header6" >> file.csv
sed 'y/ /,/' input.csv >> file.csv
input.csv in the above snippet will look something like this
1234567890 /training/folder
0325435287 /training/newfolder
Current output of file.csv
Value,1
Count,1
Header1,Header2,Path,Header4,Value,Header6
1234567890,/training/folder
0325435287,/training/newfolder
Expected Output of file.csv
Value,1
Count,1
Header1,Header2,Path,Header4,Value,Header6
,,/training/folder,,1234567890,
,,/training/newfolder,,0325435287,
All the operations can be done in a single awk:
awk -v OFS=, -v pre="Value,1\nCount,1" -v hdr="Header1,Header2,Path,Header4,Value,Header6" '
BEGIN {print pre; print hdr}
{print "", "", $2, "", $1, ""}
' input.csv
Value,1
Count,1
Header1,Header2,Path,Header4,Value,Header6
,,/training/folder,,1234567890,
,,/training/newfolder,,0325435287,
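To write file.csv directly, the same command can simply be redirected (a sketch reusing the question's names):
awk -v OFS=, -v pre="Value,1\nCount,1" -v hdr="Header1,Header2,Path,Header4,Value,Header6" 'BEGIN {print pre; print hdr} {print "", "", $2, "", $1, ""}' input.csv > file.csv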
With sed you could try the following code, which uses sed's back-reference capability.
sed -E 's/(^[^ ]*) +(.*$)/,,\2,,\1,/' Input_file
Explanation: the -E option enables EREs (extended regular expressions). The s command captures the first space-delimited field as \1 and the rest of the line as \2 (back references held in a temporary buffer), then rewrites the whole line as two commas, the path (\2), two more commas, the value (\1), and a trailing comma.
You can use awk instead of sed:
cat input.csv | awk '{print ",," $2 ",," $1 ","}' >> file.csv
awk processes its input line by line; each whitespace-separated word becomes a field ($1 and $2 here), and adjacent print arguments are concatenated, so the literal ",," and "," strings are interleaved with the fields.
You can trivially add empty columns as part of your sed script.
sed 'y/ /,/;s/,/,,/;s/^/,,/;s/$/,/' input.csv >> file.csv
This replaces the first comma with two, then adds two up front and one at the end. Note it keeps the input field order (value in column 3, path in column 5); if you need the path in column 3, use capture groups as in the sed answer above.
Your expected output does not look like valid CSV, though. This is also brittle in that it will fail for any file names which contain a space or a comma.
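If the paths may contain spaces, a hedged awk variant treats everything after the first field as the path (commas in a path would still break the CSV):
awk '{path=$0; sub(/^[^ ]+ +/, "", path); print "", "", path, "", $1, ""}' OFS=, input.csv >> file.csv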

Concatenating characters on each field of CSV file

I am dealing with a CSV file which has the following form:
Dates;A;B;C;D;E
"1999-01-04";1391.12;3034.53;66.515625;86.2;441.39
"1999-01-05";1404.86;3072.41;66.3125;86.17;440.63
"1999-01-06";1435.12;3156.59;66.4375;86.32;441
Since the BLAS routine I need to implement on such data takes double-floats only, I guess the easiest way is to concatenate d0 at the end of each field, so that each line looks like:
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
In pseudo-code, that would be:
For every line except the first line
For every field except the first field
Substitute ; with d0; and Substitute newline with d0 newline
My imagination suggests it should be something like
cat file.csv | awk -F; 'NR>1 & NF>1'{print line} | sed 's/;/d0\n/g' | sed 's/\n/d0\n/g'
Any input?
You could use this sed:
sed '1!{s/\(;[^;]*\)/\1d0/g}' file
It skips the first line, then replaces each ;-prefixed field (so the first field is skipped) with itself followed by d0.
Output
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
I would say:
$ awk 'BEGIN{FS=OFS=";"} NR>1 {for (i=2;i<=NF;i++) $i=$i"d0"} 1' file
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
That is, set the field separator to ;. Starting on line 2, loop through all the fields from the 2nd one appending d0. Then, use 1 to print the line.
Your data format looks a bit weird. Enclosing the first column in double quotes makes me think that it can contain the delimiter, the semicolon, itself. I don't know the application which produces that data, but if this is the case, you can use the following GNU awk command:
awk 'NR>1{for(i=2;i<=NF;i++){$i=$i"d0"}}1' OFS=\; FPAT='("[^"]+")|([^;]+)' file
The key here is the FPAT variable. With it you define what a field looks like instead of being limited to specifying a set of field delimiters.
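As a quick demonstration of that point, here is a hypothetical line with a semicolon inside the quoted field (GNU awk only):
echo '"1999;01-04";1391.12' | awk 'BEGIN{FPAT="(\"[^\"]+\")|([^;]+)"} {print $1}'
"1999;01-04"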
big-prices.csv
Dates;A;B;C;D;E
"1999-01-04";1391.12;3034.53;66.515625;86.2;441.39
"1999-01-05";1404.86;3072.41;66.3125;86.17;440.63
"1999-01-06";1435.12;3156.59;66.4375;86.32;441
preprocess script
head -n 1 big-prices.csv 1>output.txt; \
tail -n +2 big-prices.csv | \
sed 's/;/d0;/g' | \
sed 's/$/d0/g' | \
sed 's/"d0/"/g' 1>>output.txt;
output.txt
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
Note: the second sed would need a minor modification if the file has trailing whitespace at the ends of lines.
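For reference, that modification might look like this (a sketch: strip trailing blanks before appending d0):
sed 's/[[:blank:]]*$//; s/$/d0/'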
Using awk
Input
$ cat file
Dates;A;B;C;D;E
"1999-01-04";1391.12;3034.53;66.515625;86.2;441.39
"1999-01-05";1404.86;3072.41;66.3125;86.17;440.63
"1999-01-06";1435.12;3156.59;66.4375;86.32;441
gsub (any awk)
$ awk 'FNR>1{ gsub(/;[^;]*/,"&d0")}1' file
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
gensub (gawk)
$ awk 'FNR>1{ print gensub(/(;[^;]*)/,"\\1d0","g"); next }1' file
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0

how to extract a string that appears after a particular string in shell

I am working on a script where I am grepping lines that contain -abc_1.
I need to extract the string that appears just after it, as follows:
option : -abc_1 <some_path>
I have used the following code:
grep "abc_1" | awk -F " " '{print $4}'
This code fails if more spaces are used between the strings, e.g.:
option :    -abc_1        <some_path>
It would be helpful if I could extract the path somehow without worrying about the spaces.
thanks
This should do:
echo 'option : -abc_1 <some_path>' | awk '/abc_1/ {print $4}'
<some_path>
If you do not specify a field separator, awk uses one or more blanks as the separator.
PS: you do not need both grep and awk.
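For example, a long run of blanks still collapses to a single separator under the default field splitting:
echo 'option :    -abc_1        <some_path>' | awk '/abc_1/ {print $4}'
<some_path>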
With sed you can do the search and the filter in one step:
sed -n 's/^.*-abc_1 *\([^ ]*\).*$/\1/p'
The -n option suppresses printing, but the p command at the end still prints if a successful substitution was made.
perl -lne 'print $1 if /-abc_1\s+(\S+)/' your_file
Or if you want to use awk:
awk '{for(i=1;i<=NF;i++)if($i=="-abc_1")print $(i+1)}' your_file
Try this grep-only way:
grep -Po '^option\s*:\s*-abc_1\s*\K.*' file
Or if the whitespace is fixed:
grep -Po '^option : -abc_1 \K.*' file

How to retrieve digits including the separator "."

I am using grep to get a string like ANS_LENGTH=266.50, then I use sed to get only the digits: 266.50.
This is my full command: grep --text 'ANS_LENGTH=' log.txt | sed -e 's/[^[[:digit:]]]*//g'
The result is: 26650
How can this line be changed so the result still shows the separator: 266.50?
You don't need grep if you are going to use sed. Just use sed's // address to match the lines you need to print.
sed -n '/ANS_LENGTH/s/[^=]*=\(.*\)/\1/p' log.txt
-n suppresses automatic printing of lines.
Using the captured group we print the value after the = sign.
The p flag at the end prints only the lines where our // address matched and the substitution succeeded.
If your grep happens to support -P option then you can do:
grep -oP '(?<=ANS_LENGTH=).*' log.txt
(?<=...) is a look-behind construct that lets us match only what follows ANS_LENGTH=. This requires the -P option.
-o allows us to print only the value part.
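If your grep lacks -P, a portable sketch with plain -o and cut works too (assumes the value contains only digits and dots):
grep -o 'ANS_LENGTH=[0-9.]*' log.txt | cut -d= -f2
266.50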
You need to match a literal dot as well as the digits.
Try sed -e 's/[^[:digit:].]*//g'
Inside a bracket expression the dot is already literal, so it needs no escaping; the expression now deletes everything except digits and dots.
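A quick check of the corrected expression:
echo 'ANS_LENGTH=266.50' | sed 's/[^[:digit:].]*//g'
266.50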
Here is an awk example:
cat file:
some data ANS_LENGTH=266.50 other=22
not mye data=43
gnu awk (due to RS)
awk '/ANS_LENGTH/ {f=NR} f&&NR-1==f' RS="[ =]" file
266.50
awk '/ANS_LENGTH/ {getline;print}' RS="[ =]" file
266.50
Plain awk
awk -F"[ =]" '{for(i=1;i<=NF;i++) if ($i=="ANS_LENGTH") print $(i+1)}' file
266.50
awk '{for(i=1;i<=NF;i++) if ($i~"ANS_LENGTH") {split($i,a,"=");print a[2]}}' file
266.50

Explode to Array

I put together this shell script to do two things:
Change the delimiters in a data file ('::' to ',' in this case)
Select the columns I want and append them to a new file
It works but I want a better way to do this. I specifically want to find an alternative method for exploding each line into an array. Using command line arguments doesn't seem like the way to go. ANY COMMENTS ARE WELCOME.
# Takes a ::-separated file as the 1st parameter
SOURCE=$1
# create csv target file
TARGET=${SOURCE/dat/csv}
touch $TARGET
echo "#userId,itemId" > "$TARGET"
IFS=","
while read LINE
do
# Replaces all matches of :: with a ,
CSV_LINE=${LINE//::/,}
set -- $CSV_LINE
echo "$1,$2" >> $TARGET
done < $SOURCE
Instead of set, you can use an array:
arr=($CSV_LINE)
echo "${arr[0]},${arr[1]}"
The following would print columns 1 and 2 from infile.dat. Replace $1, $2 with
a comma-separated list of the numbered columns you do want.
awk 'BEGIN { FS="::"; OFS="," } { print $1, $2 }' infile.dat > infile.csv
Perl probably has a one-liner to do it.
Awk can probably do it easily too.
My first reaction is a combination of awk and sed:
Sed to convert the delimiters
Awk to process specific columns
cat inputfile | sed -e 's/::/,/g' | awk -F, '{print $1, $2}'
# Or to avoid a UUOC award (and prolong the life of your keyboard by 3 characters)
sed -e 's/::/,/g' inputfile | awk -F, '{print $1, $2}'
awk is indeed the right tool for the job here; it's a simple one-liner.
$ cat test.in
a::b::c
d::e::f
g::h::i
$ awk -F:: -v OFS=, '{$1=$1;print;print $2,$3 >> "altfile"}' test.in
a,b,c
d,e,f
g,h,i
$ cat altfile
b,c
e,f
h,i
$
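Note: the $1=$1 assignment is what forces awk to rebuild the record with OFS, so the bare print emits a,b,c instead of the original a::b::c.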
