Unix Shell scripting in AIX(Sed command) - shell

I have a text file which consists of jobname,business name and time in min seperated with '-'(SfdcDataGovSeq-IntegraterJob-43).There are many jobs in this text file. I want to search with the jobname and change the time from 43 to 0 only for that particular row and update the same text file. Kindly advise what needs to be done.
Query that i am using : (cat test.txt | grep "SfdcDataGovSeq" | sed -e 's/43/0/' > test.txt) but the whole file is getting replaced with only one line.

sed -e '/SfdcDataGovSeq/ s/43/0/' test.txt
This will only replace if the search is positive.
Agreed with Ed, Here is a workaround to put word boundaries Although Equality with awk is robust.
sed -e '/SfdcDataGovSeq/ s/\<43\>/0/g' test.txt

You should be using awk instead of sed:
awk 'BEGIN{FS=OFS="-"} $1=="SfdcDataGovSeq" && $3==43{$3=0} 1' file
Since it does full string or numeric (not regexp) matches on specific fields, the above is far more robust than the currently accepted sed answer which would wreak havoc on your input file given various possible input values.

Related

bash how to extract a field based on its content from a delimited string

Problem - I have a set of strings that essentially look like this:
|AAAAAA|BBBBBB|CCCCCCC|...|XXXXXXXXX|...|ZZZZZZZZZ|
The '...' denotes omitted fields.
Please note that the fields between the pipes ('|') can appear in ANY ORDER and not all fields are necessarily present. My task is to find the "XXXXXXX" field and extract it from the string; I can specify that field with a regex and find it with grep/awk/etc., but once I have that one line extracted from the file, I am at a loss as to how to extract just that text between the pipes.
My searches have turned up splitting the line into individual fields and then extracting the Nth field, however, I do not know what N is, that is the trick.
I've thought of splitting the string by the delimiter, substituting the delimiter with a newline, piping those lines into a grep for the field, but that involves running another program and this will be run on a production server through near-TB of data, so I wanted to minimize program invocations. And I cannot copy the files to another machine nor do I have the benefit of languages like Python, Perl, etc., I'm stuck with the "standard" UNIX commands on SunOS. I think I'm being punished.
Thanks
As an example, let's extract the field that matches MyField:
Using sed
$ s='|AAAAAA|BBBBBB|CCCCCCC|...|XXXXXXXXX|12MyField34|ZZZZZZZZZ|'
$ sed -E 's/.*[|]([^|]*MyField[^|]*)[|].*/\1/' <<<"$s"
12MyField34
Using awk
$ awk -F\| -v re="MyField" '{for (i=1;i<=NF;i++) if ($i~re) print $i}' <<<"$s"
12MyField34
Using grep -P
$ grep -Po '(?<=\|)[^|]*MyField[^|]*' <<<"$s"
12MyField34
The -P option requires GNU grep.
$ sed -e 's/^.*|\(XXXXXXXXX\)|.*$/\1/'
Naturally, this only makes sense if XXXXXXXXX is a regular expression.
This should be really fast if used something like:
$ grep '|XXXXXXXXX|' somefile | sed -e ...
One hackish way -
sed 's/^.*|\(<whatever your regex is>\)|.*$/\1/'
but that might be too slow for your production server since it may involve a fair amount of regex backtracking.

Shell scripting to find the delimiter

I have a file with three columns, which has pipe as a delimiter. Now some lines in the file can have a "," instead of "|", due to some error. I want to output all such erroneous rows.
You can also use grep, it is more complicated:
egrep "\|.*\|.*\|" input
echo No pipe
egrep "^[^\|]*$" input
echo One pipe
egrep "^[^\|]*\|[^\|\]*$" input
echo 3+ pipe
egrep "\|[^\|]*\|[^\|\]*\|" input
Before combining the greps, first introduce new variables
p (pipe) and n (no pipe)
p="\|"
n="[^\|]*"
echo "p=$p, n=$n"
echo No pipe
egrep "^$n$" input
echo One pipe
egrep "^$n$p$n$" input
echo 3+ pipe
egrep "$p$n$p$n$p" input
Now bring all together
egrep "^$n$|^$n$p$n$|$p$n$p$n$p" input
Edit: The comments and variable names were about "slashes", but they are pipes (with backslashes). That was a bit confusing.
To count the number of columns with awk you can use the NF variable:
$ cat file
ABC|12345|EAR
PQRST|123|TWOEYES
ssdf|fdas,sdfsf
$ awk -F\| 'NF!=3' file
ssdf|fdas,sdfsf
However, this does not seem to cover all the possible ways the data could be corrupted based on the various revisions of the question and the comments.
A better approach would be to define the exact format that the data must follow. For example, assuming that a line is "correct" if it is three columns, with the first and third letters only, and the second numeric, you could write the following script to match all non conforming lines:
awk -F\| '!(NF==3 && $1$3 ~ /^[a-zA-Z]+$/ && $2+0==$2)' file
Test (notice that only the second line (which is conforming) does not get printed):
$ cat file
A,BC|12345|EAR
PQRST|123|TWOEYES
ssdf|fdas,sdfsf
ABC|3983|MAKE,
sf dl lfsdklf |kldsamfklmadkfmask |mfkmadskfmdslafmka
ABC|abs|EWE
sdf|123|123
$ awk -F\| '!(NF==3&&$1$3~/^[a-zA-Z]+$/&&$2+0==$2)' file
A,BC|12345|EAR
ssdf|fdas,sdfsf
ABC|3983|MAKE,
sf dl lfsdklf |kldsamfklmadkfmask |mfkmadskfmdslafmka
ABC|abs|EWE
sdf|123|12
You can adapt the above command to your specific needs, based on what you think is a valid input. For example, if you wanted to also restrict the length of each line to 50 characters, you could do
awk -F\| '!(NF==3 && $1$3 ~ /^[a-zA-Z]+$/ && $2+0==$2 && length($0)<50)' file

Use awk to extract value from a line

I have these two lines within a file:
<first-value system-property="unique.setting.limit">3</first-value>
<second-value-limit>50000</second-value-limit>
where I'd like to get the following as output using awk or sed:
3
50000
Using this sed command does not work as I had hoped, and I suspect this is due to the presence of the quotes and delimiters in my line entry.
sed -n '/WORD1/,/WORD2/p' /path/to/file
How can I extract the values I want from the file?
awk -F'[<>]' '{print $3}' input.txt
input.txt:
<first-value system-property="unique.setting.limit">3</first-value>
<second-value-limit>50000</second-value-limit>
Output:
3
50000
sed -e 's/[a-zA-Z.<\/>= \-]//g' file
Using sed:
sed -E 's/.*limit"*>([0-9]+)<.*/\1/' file
Explanation:
.* takes care of everything that comes before the string limit
limit"* takes care of both the lines, one with limit" and the other one with just limit
([0-9]+) takes care of matching numbers and only numbers as stated in your requirement.
\1 is actually a shortcut for capturing pattern. When a pattern groups all or part of its content into a pair of parentheses, it captures that content and stores it temporarily in memory. For more details, please refer https://www.inkling.com/read/introducing-regular-expressions-michael-fitzgerald-1st/chapter-4/capturing-groups-and
The script solution with parameter expansion:
#!/bin/bash
while read line || test -n "$line" ; do
value="${line%<*}"
printf "%s\n" "${value##*\>}"
done <"$1"
output:
$ ./ltags.sh dat/ltags.txt
3
50000
Looks like XML to me, so assuming it forms part of some valid XML, e.g.
<root>
<first-value system-property="unique.setting.limit">3</first-value>
<second-value-limit>50000</second-value-limit>
</root>
You can use Perl's XML::Simple and do something like this:
perl -MXML::Simple -E '$xml = XMLin("file"); say $xml->{"first-value"}->{"content"}; say $xml->{"second-value-limit"}'
Output:
3
50000
If the XML structure is more complicated, then you may have to drill down a bit deeper to get to the values you want. If that's the case, you should edit the question to show the bigger picture.
Ashkan's awk solution is straightforward, but let me suggest a sed solution that accepts non-integer numbers:
sed -n 's/[^>]*>\([.[:digit:]]*\)<.*/\1/p' input.txt
This extracts the number between the first > character of the line and the following <. In my RE this "number" can be the empty string, if you don't want to accept an empty string please add the -r option to sed and replace \([.[:digit:]]*\) by ([.[:digit:]]+).

How to parse a config file using sed

I've never used sed apart from the few hours trying to solve this. I have a config file with parameters like:
test.us.param=value
test.eu.param=value
prod.us.param=value
prod.eu.param=value
I need to parse these and output this if REGIONID is US:
test.param=value
prod.param=value
Any help on how to do this (with sed or otherwise) would be great.
This works for me:
sed -n 's/\.us\././p'
i.e. if the ".us." can be replaced by a dot, print the result.
If there are hundreds and hundreds of lines it might be more efficient to first search for lines containing .us. and then do the string replacement... AWK is another good choice or pipe grep into sed
cat INPUT_FILE | grep "\.us\." | sed 's/\.us\./\./g'
Of course if '.us.' can be in the value this isn't sufficient.
You could also do with with the address syntax (technically you can embed the second sed into the first statement as well just can't remember syntax)
sed -n '/\(prod\|test\).us.[^=]*=/p' FILE | sed 's/\.us\./\./g'
We should probably do something cleaner. If the format is always environment.region.param we could look at forcing this only to occur on the text PRIOR to the equal sign.
sed -n 's/^\([^,]*\)\.us\.\([^=]\)=/\1.\2=/g'
This will only work on lines starting with any number of chars followed by '.' then 'us', then '.' and then anynumber prior to '=' sign. This way we won't potentially modify '.us.' if found within a "value"

Remove a line from a csv file bash, sed, bash

I'm looking for a way to remove lines within multiple csv files, in bash using sed, awk or anything appropriate where the file ends in 0.
So there are multiple csv files, their format is:
EXAMPLEfoo,60,6
EXAMPLEbar,30,10
EXAMPLElong,60,0
EXAMPLEcon,120,6
EXAMPLEdev,60,0
EXAMPLErandom,30,6
So the file will be amended to:
EXAMPLEfoo,60,6
EXAMPLEbar,30,10
EXAMPLEcon,120,6
EXAMPLErandom,30,6
A problem which I can see arising is distinguishing between double digits that end in zero and 0 itself.
So any ideas?
Using your file, something like this?
$ sed '/,0$/d' test.txt
EXAMPLEfoo,60,6
EXAMPLEbar,30,10
EXAMPLEcon,120,6
EXAMPLErandom,30,6
For this particular problem, sed is perfect, as the others have pointed out. However, awk is more flexible, i.e. you can filter on an arbitrary column:
awk -F, '$3!=0' test.csv
This will print the entire line is column 3 is not 0.
use sed to only remove lines ending with ",0":
sed '/,0$/d'
you can also use awk,
$ awk -F"," '$NF!=0' file
EXAMPLEfoo,60,6
EXAMPLEbar,30,10
EXAMPLEcon,120,6
EXAMPLErandom,30,6
this just says check the last field for 0 and don't print if its found.
sed '/,[ \t]*0$/d' file
I would tend to sed, but there is an egrep (or: grep -e) -solution too:
egrep -v ",0$" example.csv

Resources