Substituting the '*' Character in AWK using 'gsub' - shell

I'm trying to use awk in a Unix shell script to substitute every instance of one pattern in a file with another, writing the output to a new file.
Specifically, if the file name is MYFILE.pc, then I'm looking to replace instances of '*MYFILE' with 'g_MYFILE' (without the quotes). For this, I'm using awk's gsub function.
I've successfully got the output file written out and all instances replaced as required; however, the script is also replacing instances of 'MYFILE' (i.e. without the star) with 'g_MYFILE'.
Here is the script:
awk -v MODNAM=${OUTPUT_FILE%.pc} '
{
gsub("\*"MODNAM, "g_" MODNAM);
print
}' ${INPUT_FILE} > ${FULL_OUTPUT_FILENAME}
To clarify the script performs the following conversion:
'*MYFILE' --> 'g_MYFILE'
'MYFILE' --> 'g_MYFILE'
I only want the first conversion to be performed. Does anyone have any suggestions?

You may need to double escape the * because you are using a dynamic regexp instead of a regexp constant as the first argument to gsub. See section 3.8 of the GAWK manual for more information.
awk -v MODNAM=${OUTPUT_FILE%.pc} '
{
gsub("\\*"MODNAM, "g_" MODNAM);
print
}' ${INPUT_FILE} > ${FULL_OUTPUT_FILENAME}
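To see the difference, here's a minimal before/after sketch (MYFILE is just an illustrative name). With "\\*" the regexp awk receives is \*MYFILE, so the star is matched literally and plain MYFILE is left alone:

```shell
# With "\\*" the dynamic regexp is \*MYFILE: only the starred line is rewritten.
printf 'MYFILE\n*MYFILE\n' | awk -v M=MYFILE '{gsub("\\*" M, "g_" M); print}'
# MYFILE
# g_MYFILE
```

With a single backslash, the string awk builds may end up as *MYFILE, and some awks then also match plain MYFILE, which is exactly the symptom described (mawk happens to keep the backslash, which is why the original works there, as the next answer notes).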

Your code actually works in my mawk 1.3.3 and zsh shell. This might be a shell escaping issue - have you tried writing the awk script to a file and calling it via -f?

For simple substitutions there is no need for awk at all. Try:
/home/sirch> file="*a*a*a"
/home/sirch> echo ${file//\*/g_}
g_ag_ag_a
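Applied to the question's actual pattern (the names here are illustrative), escaping the star in the expansion replaces only literal-star occurrences:

```shell
# Bash ${var//pattern/replacement}; \* matches a literal star, so plain
# MYFILE is untouched.
line='*MYFILE plus plain MYFILE'
echo "${line//\*MYFILE/g_MYFILE}"
# g_MYFILE plus plain MYFILE
```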

Related

Combine multiple sed commands into one

I have a file, example.txt, and I want to delete and replace fields in it.
The following commands do the job, but in a very messy way; unfortunately I'm a rookie with the sed command.
The commands I used:
sed 's/\-I\.\.\/\.\.\/\.\./\n/g' example.txt > example.txt1
sed 's/\-I/\n/g' example.txt1 > example.txt2
sed '/^[[:space:]]*$/d' example.txt2 > example.txt3
sed 's/\.\.\/\.\.\/\.\.//g' example.txt3 > example.txt
and then I'm deleting all the unnecessary files.
I'm trying to get the following result:
Common/Components/Component
Common/Components/Component1
Common/Components/Component2
Common/Components/Component3
Common/Components/Component4
Common/Components/Component5
Common/Components/Component6
Comp
App
The file looks like this:
-I../../../Common/Component -I../../../Common/Component1 -I../../../Common/Component2 -I../../../Common/Component3 -I../../../Common/Component4 -I../../../Common/Component5 -I../../../Common/Component6 -IComp -IApp ../../../
I want to know the best way to transform the input format into the output format with a standard text-processing tool, in a single call to sed or awk.
With the samples you've shown, please try the following awk code, written and tested in GNU awk.
awk -v RS='-I\\S+' 'RT{sub(/^-I.*Common\//,"Common/Components/",RT);sub(/^-I/,"",RT);print RT}' Input_file
output with samples will be as follows:
Common/Components/Component
Common/Components/Component1
Common/Components/Component2
Common/Components/Component3
Common/Components/Component4
Common/Components/Component5
Common/Components/Component6
Comp
App
Explanation: In GNU awk we set RS (the record separator) to the regexp -I\\S+, i.e. -I followed by everything up to the next space. In the main program we check that RT (the text that actually matched RS) is non-null, substitute the leading -I...Common/ in RT with Common/Components/, then strip any remaining leading -I, and finally print RT.
If you don't REALLY want the string /Components to be added in the middle of some output lines then this may be what you want, using any awk in any shell on every Unix box:
$ awk -v RS=' ' 'sub("^-I[./]*","")' file
Common/Component
Common/Component1
Common/Component2
Common/Component3
Common/Component4
Common/Component5
Common/Component6
Comp
App
That would fail if any of the paths in your input contained blanks but you don't show that as a possibility in your question so I assume it can't happen.
What about
sed -i 's/\-I\.\.\/\.\.\/\.\./\n/g
s/\-I/\n/g
/^[[:space:]]*$/d
s/\.\.\/\.\.\/\.\.//g' example.txt
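More generally, independent sed expressions can be combined into one invocation either with multiple -e flags or with semicolons (the substitutions below are just placeholders):

```shell
# Two equivalent ways to run several sed commands in a single call.
printf 'foo bar\n' | sed -e 's/foo/1/' -e 's/bar/2/'
printf 'foo bar\n' | sed 's/foo/1/; s/bar/2/'
# both print: 1 2
```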

Parameter expansion not working when used inside Awk on one of the column entries

System: Linux. Bash 4.
I have the following file, which will be read into a script as a variable:
/path/sample_A.bam A 1
/path/sample_B.bam B 1
/path/sample_C1.bam C 1
/path/sample_C2.bam C 2
I want to append "_string" at the end of the filename in the first column, but before the extension (.bam). It's a bit trickier because the filename is preceded by its path.
Desired output:
/path/sample_A_string.bam A 1
/path/sample_B_string.bam B 1
/path/sample_C1_string.bam C 1
/path/sample_C2_string.bam C 2
My attempt:
I did the following script (I ran: bash script.sh):
List=${1};
awk -F'\t' -vOFS='\t' '{ $1 = "${1%.bam}" "_string.bam" }1' < ${List} ;
And its output was:
${1%.bam}_string.bam
${1%.bam}_string.bam
${1%.bam}_string.bam
${1%.bam}_string.bam
Problem:
I followed the idea of using awk for this substitution as in this thread https://unix.stackexchange.com/questions/148114/how-to-add-words-to-an-existing-column , but the parameter expansion ${1%.bam} is clearly not being recognised by awk as I intend. Does someone know the correct syntax for that part of the code? It was meant to mean "the first column's entry, minus the trailing .bam". I used ${1%.bam} because it works in Bash, but awk is another language and the syntax probably differs. Thank you!
Note that the parameter expansion you applied to $1 won't work inside awk: the entire body of the awk command is passed in single quotes '..', which sends the content to awk literally, without any shell parsing. Hence the string "${1%.bam}" is assigned as-is to the first column.
You can do this completely in Awk
awk -F'\t' 'BEGIN { OFS = FS }{ n=split($1, arr, "."); $1 = arr[1]"_string."arr[2] }1' file
The code splits the content of $1 on the delimiter . into the array arr, inside Awk. The part of the string up to the first . is stored in arr[1] and the subsequent fields in the following indices. We then reconstruct the filename by concatenating the array entries, with _string appended to the extension-less name part.
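A quick illustration of what split() returns here (sample name taken from the question):

```shell
# split() returns the number of pieces; they land in arr[1], arr[2], ...
echo '/path/sample_A.bam' | awk '{n = split($1, arr, "."); print n, arr[1], arr[2]}'
# 2 /path/sample_A bam
```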
If I understood your requirement correctly, could you please try following.
val="_string"
awk -v value="$val" '{sub(".bam",value"&")} 1' Input_file
Brief explanation: -v value="$val" passes the shell variable named val into the awk variable named value. Then awk's sub function substitutes the string .bam with value followed by the matched text (.bam itself), which is what & denotes. The trailing 1 prints every line, edited or not.
Why OP's attempt didn't work: in awk we can't use shell constructs directly; unless a shell variable is passed in explicitly, awk treats what you wrote as a literal string and prints it as-is. The explanation above shows how to pass shell variables into awk.
NOTE: In case you have multiple occurrences of .bam, change sub to gsub in the above code. Also, if your Input_file is TAB-delimited, add -F'\t' to the awk command.
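A minimal end-to-end run of this idea on one sample line from the question:

```shell
# Pass a shell variable into awk with -v; in sub()'s replacement,
# & stands for the text the pattern matched (.bam).
val="_string"
echo '/path/sample_A.bam A 1' | awk -v value="$val" '{sub(".bam", value "&")} 1'
# /path/sample_A_string.bam A 1
```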
sed -i 's/\.bam/_string\.bam/g' myfile.txt
It's a single line with sed: just replace .bam with _string.bam.
You can try this way with awk :
awk -v a='_string' 'BEGIN{FS=OFS="."}{$1=$1 a}1' infile

Bash Script: Grabbing First Item Per Line, Throwing Into Array

I'm fairly new to the world of writing Bash scripts and am needing some guidance. I've begun writing a script for work, and so far so good. However, I'm now at a part that needs to collect database names. The names are actually stored in a file, and I can grep them.
The command I was given is cat /etc/oratab which produces something like this:
# This file is used by ORACLE utilities. It is created by root.sh
# and updated by the Database Configuration Assistant when creating
# a database.
# A colon, ':', is used as the field terminator. A new line terminates
# the entry. Lines beginning with a pound sign, '#', are comments.
#
# The first and second fields are the system identifier and home
# directory of the database respectively. The third field indicates
# to the dbstart utility that the database should , "Y", or should not,
# "N", be brought up at system boot time.
#
OEM:/software/oracle/agent/agent12c/core/12.1.0.3.0:N
*:/software/oracle/agent/agent11g:N
dev068:/software/oracle/ora-10.02.00.04.11:Y
dev299:/software/oracle/ora-10.02.00.04.11:Y
xtst036:/software/oracle/ora-10.02.00.04.11:Y
xtst161:/software/oracle/ora-10.02.00.04.11:Y
dev360:/software/oracle/ora-11.02.00.04.02:Y
dev361:/software/oracle/ora-11.02.00.04.02:Y
xtst215:/software/oracle/ora-11.02.00.04.02:Y
xtst216:/software/oracle/ora-11.02.00.04.02:Y
dev298:/software/oracle/ora-11.02.00.04.03:Y
xtst160:/software/oracle/ora-11.02.00.04.03:Y
I turned around and wrote grep ":/software/oracle/ora" /etc/oratab so it grabs everything I need, which is 10 databases. Not the most elegant way, but it gets what I need:
dev068:/software/oracle/ora-10.02.00.04.11:Y
dev299:/software/oracle/ora-10.02.00.04.11:Y
xtst036:/software/oracle/ora-10.02.00.04.11:Y
xtst161:/software/oracle/ora-10.02.00.04.11:Y
dev360:/software/oracle/ora-11.02.00.04.02:Y
dev361:/software/oracle/ora-11.02.00.04.02:Y
xtst215:/software/oracle/ora-11.02.00.04.02:Y
xtst216:/software/oracle/ora-11.02.00.04.02:Y
dev298:/software/oracle/ora-11.02.00.04.03:Y
xtst160:/software/oracle/ora-11.02.00.04.03:Y
So, if I want to grab the name, such as dev068 or xtst161, how do I? I think for what I need to do with this project moving forward, is storing them in an array. As mentioned in the documentation, a colon is the field terminator. How could I whip this together so I have an array, something like:
dev068
dev299
xtst036
xtst161
dev360
dev361
xtst215
xtst216
dev298
xtst160
I feel like I may be asking for too much assistance here but I'm truly at a loss. I would be happy to clarify if need be.
It is much simpler using awk:
awk -F: -v key='/software/oracle/ora' '$2 ~ key{print $1}' /etc/oratab
dev068
dev299
xtst036
xtst161
dev360
dev361
xtst215
xtst216
dev298
xtst160
To populate a BASH array with above output use:
mapfile -t arr < <(awk -F: -v key='/software/oracle/ora' '$2 ~ key{print $1}' /etc/oratab)
To check output:
declare -p arr
declare -a arr='([0]="dev068" [1]="dev299" [2]="xtst036" [3]="xtst161" [4]="dev360" [5]="dev361" [6]="xtst215" [7]="xtst216" [8]="dev298" [9]="xtst160")'
We can pipe the output of grep to the cut utility to extract the first field, taking colon as the field separator.
Then, assuming there are no whitespace or glob characters in any of the names (which would be subject to word splitting and filename expansion), we can use a command substitution to run the pipeline, and capture the output in an array by assigning it within the parentheses.
names=($(grep ':/software/oracle/ora' /etc/oratab | cut -d: -f1))
Note that the above command actually makes use of word splitting on the command substitution output to split the names into separate elements of the resulting array. That is why we must be sure that no whitespace occurs within any single database name, otherwise that name would be internally split into separate elements of the array. The only characters within the command substitution output that we want to be taken as word splitting delimiters are the line feeds that delimit each line of output coming off the cut utility.
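If you can't rule out spaces entirely, one mitigation sketch is to restrict word splitting to newlines while the array is built (the sample data below is made up; glob characters in names would additionally need set -f):

```shell
# Limit word splitting to newlines so a name containing a space stays
# one array element. Needs bash, not plain sh.
sample='dev 068:/software/oracle/ora-10:Y
dev299:/software/oracle/ora-10:Y'
IFS=$'\n'
names=($(printf '%s\n' "$sample" | cut -d: -f1))
unset IFS
echo "${#names[@]}"    # 2
echo "${names[0]}"     # dev 068
```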
You could also use awk for this:
awk -F: '!/^#/ && $2 ~ /^\/software\/oracle\/ora-/ {print $1}' /etc/oratab
The first pattern excludes any commented-out lines (starting with a #). The second pattern looks for your expected directory pattern in the second field. If both conditions are met it prints the first field, which is the Oracle SID. The -F: flag sets the field delimiter to a colon.
With your file that gets:
dev068
dev299
xtst036
xtst161
dev360
dev361
xtst215
xtst216
dev298
xtst160
Depending on what you're doing you could finesse it further and check that the last flag is set to Y; although that is really to indicate automatic start-up, it can sometimes be used to indicate that a database isn't active at all.
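That refinement is a one-condition addition (the sample lines piped in below are illustrative):

```shell
# Additionally require the third field (start-at-boot flag) to be Y.
printf 'dev068:/software/oracle/ora-10:Y\nold01:/software/oracle/ora-9:N\n' |
  awk -F: '!/^#/ && $2 ~ /\/software\/oracle\/ora-/ && $3 == "Y" {print $1}'
# dev068
```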
And you can put the results into an array with:
declare -a DBS=($(awk -F: -v key='/software/oracle/ora' '$2 ~ key{print $1}' /etc/oratab))
and then refer to ${DBS[1]} (which evaluates to dev299) etc.
If you'd like them into a Bash array:
$ cat > toarr.bash
#!/bin/bash
while read -r line
do
if [[ $line =~ .*Y$ ]] # they seem to end in a "Y"
then
arr[$((i++))]=${line%%:*}
fi
done < file
echo ${arr[*]} # here we print the array arr
$ bash toarr.bash
dev068 dev299 xtst036 xtst161 dev360 dev361 xtst215 xtst216 dev298 xtst160

What is the correct syntax for a bash multi line Heredoc (w/ Sed)?

While using sed to search/insert into a config file, I'm greeted by errors. What's causing them, and how can I fix them?
The Heredoc I'm looking to insert can be defined as follows:
read -d '' APPLICATION_ENV_STATE <<'EOF'
defined('APPLICATION_ENV') || define('APPLICATION_ENV',(getenv('APPLICATION_ENV')
? getenv('APPLICATION_ENV') : 'production'));
EOF
While my Sed command uses the variable like this:
sed -i "/\/\/ \*\* MySQL settings \*\* \/\//i$APPLICATION_ENV_STATE" wp-config.php
Which results in:
sed: -e expression #1, char 1: unknown command: `?'
In addition to an extra characters after command error.
However, the following Heredoc works, but results in some less than pretty formatting in my text file:
read -d '' APPLICATION_ENV_STATE <<'EOF'
defined('APPLICATION_ENV') || define('APPLICATION_ENV', (getenv('APPLICATION_ENV') ? getenv('APPLICATION_ENV') : 'production'));
EOF
How do I get the first example to work?
AIUI, it's not the heredoc that's the problem, it's understanding which process is doing what at various times.
In your script that runs the sed command, Bash is substituting the variable before sed even sees it. Being a multi-line string, it would need escaping for sed. From the man page for sed, under the i command:
i \ Insert text, which has each embedded newline preceded by a back-slash.
Personally, I'd recommend using cat or echo if you can (or a scripting language like Python / Ruby / PHP), having broken the template up into atomic elements, so you can simply concatenate the relevant pieces together.
If you do want to continue with the current method though, you'll at least need to replace the newlines with backslashed newlines (quoting the variable so the shell preserves them; $! skips the last line, which must not end in a backslash) - try something like:
echo "$APPLICATION_ENV_STATE" | sed '$!s/$/\\/'
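Putting that together, a sketch with placeholder text (GNU sed; the body variable and MARK line stand in for the question's PHP snippet and MySQL-settings comment):

```shell
# Backslash-escape every embedded newline except the last ($! = not-last-line),
# then hand the result to sed's i command.
body='line1
line2'
escaped=$(printf '%s\n' "$body" | sed '$!s/$/\\/')
printf 'foo\nMARK\nbar\n' | sed "/MARK/i\\
$escaped"
# foo
# line1
# line2
# MARK
# bar
```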
You're using the wrong tool. The only constructs you should be using in sed are s, g, and p (with -n). Just use awk and avoid all the quoting/escaping nonsense:
$ cat file
foo
// ** MySQL settings ** //
bar
$ awk -v app="defined('APPLICATION_ENV') || define('APPLICATION_ENV',(getenv('APPLICATION_ENV')
? getenv('APPLICATION_ENV') : 'production'));" '
{print} index($0,"// ** MySQL settings ** //"){print app}' file
foo
// ** MySQL settings ** //
defined('APPLICATION_ENV') || define('APPLICATION_ENV',(getenv('APPLICATION_ENV')
? getenv('APPLICATION_ENV') : 'production'));
bar
Notice that you don't need to escape the RE metachars in the string you want to search for, because awk can treat a string as a literal string; you don't need to escape newlines in the string you're adding; and you don't need a here-doc with a shell variable, etc.
Your read/sed command as written would fail for various character combinations in your search and/or replacement strings - see Is it possible to escape regex metacharacters reliably with sed for how to robustly search/replace "strings" in sed but then just use awk so you don't have to worry about any of it.

Remove a line from a csv file bash, sed, bash

I'm looking for a way to remove lines within multiple csv files, in bash using sed, awk or anything appropriate, where the line ends in 0.
So there are multiple csv files, their format is:
EXAMPLEfoo,60,6
EXAMPLEbar,30,10
EXAMPLElong,60,0
EXAMPLEcon,120,6
EXAMPLEdev,60,0
EXAMPLErandom,30,6
So the file will be amended to:
EXAMPLEfoo,60,6
EXAMPLEbar,30,10
EXAMPLEcon,120,6
EXAMPLErandom,30,6
A problem which I can see arising is distinguishing between double digits that end in zero and 0 itself.
So any ideas?
Using your file, something like this?
$ sed '/,0$/d' test.txt
EXAMPLEfoo,60,6
EXAMPLEbar,30,10
EXAMPLEcon,120,6
EXAMPLErandom,30,6
For this particular problem, sed is perfect, as the others have pointed out. However, awk is more flexible, i.e. you can filter on an arbitrary column:
awk -F, '$3!=0' test.csv
This will print the entire line if column 3 is not 0.
Use sed to remove only lines ending with ",0":
sed '/,0$/d'
you can also use awk,
$ awk -F"," '$NF!=0' file
EXAMPLEfoo,60,6
EXAMPLEbar,30,10
EXAMPLEcon,120,6
EXAMPLErandom,30,6
this just says: check the last field for 0, and don't print the line if it's found.
sed '/,[ \t]*0$/d' file
I would tend to sed, but there is an egrep (or: grep -E) solution too:
egrep -v ",0$" example.csv
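Since the question mentions multiple csv files, here's a sketch applying the same deletion in place across several files (GNU sed's -i assumed; the file names and sample rows are made up):

```shell
# Create two throwaway samples in a temp dir, then delete ",0" lines in place.
tmp=$(mktemp -d)
printf 'EXAMPLEfoo,60,6\nEXAMPLElong,60,0\n' > "$tmp/a.csv"
printf 'EXAMPLEdev,60,0\nEXAMPLErandom,30,6\n' > "$tmp/b.csv"
for f in "$tmp"/*.csv; do
  sed -i '/,0$/d' "$f"
done
cat "$tmp/a.csv" "$tmp/b.csv"
# EXAMPLEfoo,60,6
# EXAMPLErandom,30,6
```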
