Combine multiple sed commands into one - bash

I have a file example.txt in which I want to delete and replace fields.
The following commands work, but in a very messy way; unfortunately I'm a rookie with the sed command.
The commands I used:
sed 's/\-I\.\.\/\.\.\/\.\./\n/g' example.txt > example.txt1
sed 's/\-I/\n/g' example.txt1 > example.txt2
sed '/^[[:space:]]*$/d' example.txt2 > example.txt3
sed 's/\.\.\/\.\.\/\.\.//g' example.txt3 > example.txt
and then I'm deleting all the unnecessary files.
I'm trying to get the following result:
Common/Components/Component
Common/Components/Component1
Common/Components/Component2
Common/Components/Component3
Common/Components/Component4
Common/Components/Component5
Common/Components/Component6
Comp
App
The file looks like this:
-I../../../Common/Component -I../../../Common/Component1 -I../../../Common/Component2 -I../../../Common/Component3 -I../../../Common/Component4 -I../../../Common/Component5 -I../../../Common/Component6 -IComp -IApp ../../../
I want to know the best way to transform the input format into the output format with a single call to a standard text-processing tool such as sed or awk.

With your shown samples, please try the following awk code, written and tested in GNU awk.
awk -v RS='-I\\S+' 'RT{sub(/^-I.*Common\//,"Common/Components/",RT);sub(/^-I/,"",RT);print RT}' Input_file
The output with the shown samples will be as follows:
Common/Components/Component
Common/Components/Component1
Common/Components/Component2
Common/Components/Component3
Common/Components/Component4
Common/Components/Component5
Common/Components/Component6
Comp
App
Explanation: a simple explanation would be: in GNU awk, set RS (the record separator) to -I\\S+, i.e. -I up to the next space. In the main awk program, check that RT is not null, substitute the leading -I up to Common/ with Common/Components/ in RT, then substitute a leading -I with the null string in RT, and finally print RT.

If you don't REALLY want the string /Components added in the middle of some output lines, then this may be what you want, using any awk in any shell on every Unix box:
$ awk -v RS=' ' 'sub("^-I[./]*","")' file
Common/Component
Common/Component1
Common/Component2
Common/Component3
Common/Component4
Common/Component5
Common/Component6
Comp
App
That would fail if any of the paths in your input contained blanks, but you don't show that as a possibility in your question, so I assume it can't happen.

What about:
sed -i 's/\-I\.\.\/\.\.\/\.\./\n/g
s/\-I/\n/g
/^[[:space:]]*$/d
s/\.\.\/\.\.\/\.\.//g' example.txt
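Note that merging the four commands into one invocation changes their behaviour slightly: the \n inserted by s/// stays inside the pattern space, so the blank-line d never sees the embedded lines as separate lines. A reordered one-invocation sketch (GNU sed assumed, since \n in the replacement is a GNU extension; it also skips the /Components/ rename, which the input doesn't contain):

```shell
# One GNU sed invocation: strip the ../../../ prefixes first, then split on
# -I, then tidy the leading newline and trailing spaces in the pattern space.
printf '%s\n' '-I../../../Common/Component -IComp -IApp ../../../' |
sed 's/\.\.\/\.\.\/\.\.\///g; s/ *-I/\n/g; s/^\n//; s/ *$//'
# prints:
# Common/Component
# Comp
# App
```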

Related

How can I append a long text to a specific line in bash?

I have a bash script that copies and pastes a txt file (T1, T2, ..., T100) a hundred times. I need to append a specific line of code to the txt file at the 454th line.
I tried to use sed and awk but couldn't get them to work.
I was trying to do something like this, but it didn't work:
awk 'NR==454{print "try:
Mesh_1.ExportMED( r'/home/students/gbroilo/Desktop/Script/Mesh_1.med', 0, SMESH.MED_V2_2, 1, None ,1)
pass
except:
print 'ExportToMEDX() failed. Invalid file name?'"}1'
It looks like your formatting of the code is not correct, which is causing syntax issues. To be on the safe side, I used \047 for ' in the code and \n for the newlines which you need in your output.
awk 'NR==454{print "try:\n Mesh_1.ExportMED( r\047/home/students/gbroilo/Desktop/Script/Mesh_1.med\047, 0, SMESH.MED_V2_2, 1, None ,1)\n pass\nexcept:\n print \047ExportToMEDX() failed. Invalid file name?\047"}1' Input_file
Or, in non-one-liner form of the above code:
awk 'NR==454{print "try:\n Mesh_1.ExportMED( r\047/home/students/gbroilo/Desktop/Script/Mesh_1.med\047,\
0, SMESH.MED_V2_2, 1, None ,1)\n pass\nexcept:\n print \047ExportToMEDX() failed. \
Invalid file name?\047"}1' Input_file
Where the trailing \ lets the terminal know that the command continues on the next line.
This might work for you (GNU sed):
sed -i '454a\line one with a quote '\''\n'\''another'\''\nlast line' fileT*
Another way:
cat <<\!>fileInsert
line one with a quote '
'another'
last line
!
sed -i '454r fileInsert' fileT*
Yet another:
cat <<\! | sed -i '454r /dev/stdin' fileT*
line one with a quote '
'another'
last line
!
If you want to insert specific text to specific files use GNU parallel, as so:
parallel -q sed -i '454r fileInsert{}' fileT{} ::: {1..100}
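The r command behaves the same at any line number; here is a toy run of the technique (illustrative file names, with line 2 standing in for 454):

```shell
# sed's r command queues a file's contents to be printed after the
# addressed line; here after line 2 of a 3-line file.
tmp=$(mktemp -d)
printf 'a\nb\nc\n'        > "$tmp/fileT1"
printf 'try:\n    pass\n' > "$tmp/fileInsert"
sed '2r '"$tmp/fileInsert" "$tmp/fileT1"
rm -r "$tmp"
# prints:
# a
# b
# try:
#     pass
# c
```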

Searching non-printable characters using hexadecimal notation using gawk and/or sed

In the Windows command line, I am trying to fix broken lines that occur in a certain field of records separated by "|". In some business systems, free-text fields allow users to enter returns, and these sometimes break the record line when the transaction is extracted.
I have GAWK(GNU Awk 3.1.0) and SED(GNU sed version 4.2.1) from UnxUtils and GnuWin. My data is as follows:
smith|Login|command line is my friend|2
oliver|Login|I have no idea
why I am here|10
dennis|Payroll|are we there yet?|100
smith|Time|going to have some fun|200
smith|Logout|here I come|10
The second line is broken for the reason explained in the first paragraph. The return at the end of broken line 2 is a regular Windows return and looks like x0D x0A in a hex editor.
When removing it with sed or gawk, instead of \n or \r style notations I would like to be able to use hex values (more than one is the case) to add more flexibility. The code should replace the return with something only if it appears in the third column. Only sed or (g)awk should be used. For gawk, a "sed style" on-the-fly replacement method (as with the -i parameter), if possible, would be helpful.
I tried the following, but it does not capture anything:
gawk -F "|" "$3 ~ /\x0D\x0A/" data.txt
Also tried replacing with
gawk -F "|" "{gsub(/\x0d\x0a/, \x20, $3); print }" OFS="|" data.txt
or
sed "s/\x0dx0a/\x20/g" data.txt
(was able to capture x20(space) with sed but no luck with returns)
It's not entirely clear what you're trying to do (why would you want to replace line endings with a blank char???) but this might get you on the right path:
awk -v RS='\r\n' -v ORS=' ' '1' file
and if you want inplace editing just add -i inplace up front.
This is all gawk-only (in-place editing and a multi-char RS are gawk extensions). You may also need to add -v BINMODE=3 (also gawk-only), depending on the platform you're running on, to stop the underlying C primitives from stripping the \rs before gawk sees them.
Hang on, I see you're on gawk 3.1.0 - that is 5+ years out of date, upgrade your gawk version to get access to the latest bug fixes and features (including -i inplace).
Hang on 2 - are you actually trying to replace the newlines within records with a blank char? That's even simpler:
awk 'BEGIN{RS=ORS="\r\n"} {gsub(/\n/," ")} 1' file
For example (I added a \s* before the \n, as your input has trailing white space that I assume you also want removed):
$ cat -v file
smith|Login|command line is my friend|2^M
oliver|Login|I have no idea
why I am here|10^M
dennis|Payroll|are we there yet?|100^M
smith|Time|going to have some fun|200^M
smith|Logout|here I come|10^M
$ awk 'BEGIN{RS=ORS="\r\n"} {gsub(/\s*\n/," ")} 1' file | cat -v
smith|Login|command line is my friend|2^M
oliver|Login|I have no idea why I am here|10^M
dennis|Payroll|are we there yet?|100^M
smith|Time|going to have some fun|200^M
smith|Logout|here I come|10^M
or to use UNIX line endings in the output instead of DOS just don't set ORS:
$ awk 'BEGIN{RS="\r\n"} {gsub(/\s*\n/," ")} 1' file | cat -v
smith|Login|command line is my friend|2
oliver|Login|I have no idea why I am here|10
dennis|Payroll|are we there yet?|100
smith|Time|going to have some fun|200
smith|Logout|here I come|10

Unix Shell scripting in AIX(Sed command)

I have a text file which consists of job name, business name and time in minutes, separated with '-' (SfdcDataGovSeq-IntegraterJob-43). There are many jobs in this text file. I want to search by job name and change the time from 43 to 0 only for that particular row, and update the same text file. Kindly advise what needs to be done.
The query that I am using: (cat test.txt | grep "SfdcDataGovSeq" | sed -e 's/43/0/' > test.txt), but the whole file is getting replaced with only one line.
sed -e '/SfdcDataGovSeq/ s/43/0/' test.txt
This will only replace on lines where the search is positive, and it keeps the other lines. (Your original command lost them because grep discards all non-matching lines, and > test.txt truncates the file before cat even reads it.)
Agreed with Ed; here is a workaround that adds word boundaries, although equality with awk is more robust:
sed -e '/SfdcDataGovSeq/ s/\<43\>/0/g' test.txt
You should be using awk instead of sed:
awk 'BEGIN{FS=OFS="-"} $1=="SfdcDataGovSeq" && $3==43{$3=0} 1' file
Since it does full string or numeric (not regexp) matches on specific fields, the above is far more robust than the currently accepted sed answer which would wreak havoc on your input file given various possible input values.
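A quick check of the exact-match behaviour on made-up sample rows (note that a 43 embedded in another number or name is left alone):

```shell
# Equality on $1 and $3 only touches the intended row; a 43 embedded in
# another field or number is ignored.
printf '%s\n' 'SfdcDataGovSeq-IntegraterJob-43' \
              'OtherJob43-Foo-143' \
              'SfdcDataGovSeq-Nightly-50' |
awk 'BEGIN{FS=OFS="-"} $1=="SfdcDataGovSeq" && $3==43{$3=0} 1'
# prints:
# SfdcDataGovSeq-IntegraterJob-0
# OtherJob43-Foo-143
# SfdcDataGovSeq-Nightly-50
```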

bash how to extract a field based on its content from a delimited string

Problem - I have a set of strings that essentially look like this:
|AAAAAA|BBBBBB|CCCCCCC|...|XXXXXXXXX|...|ZZZZZZZZZ|
The '...' denotes omitted fields.
Please note that the fields between the pipes ('|') can appear in ANY ORDER and not all fields are necessarily present. My task is to find the "XXXXXXX" field and extract it from the string; I can specify that field with a regex and find it with grep/awk/etc., but once I have that one line extracted from the file, I am at a loss as to how to extract just that text between the pipes.
My searches have turned up splitting the line into individual fields and then extracting the Nth field; however, I do not know what N is, and that is the trick.
I've thought of splitting the string by the delimiter, substituting the delimiter with a newline, piping those lines into a grep for the field, but that involves running another program and this will be run on a production server through near-TB of data, so I wanted to minimize program invocations. And I cannot copy the files to another machine nor do I have the benefit of languages like Python, Perl, etc., I'm stuck with the "standard" UNIX commands on SunOS. I think I'm being punished.
Thanks
As an example, let's extract the field that matches MyField:
Using sed
$ s='|AAAAAA|BBBBBB|CCCCCCC|...|XXXXXXXXX|12MyField34|ZZZZZZZZZ|'
$ sed -E 's/.*[|]([^|]*MyField[^|]*)[|].*/\1/' <<<"$s"
12MyField34
Using awk
$ awk -F\| -v re="MyField" '{for (i=1;i<=NF;i++) if ($i~re) print $i}' <<<"$s"
12MyField34
Using grep -P
$ grep -Po '(?<=\|)[^|]*MyField[^|]*' <<<"$s"
12MyField34
The -P option requires GNU grep.
$ sed -e 's/^.*|\(XXXXXXXXX\)|.*$/\1/'
Naturally, this only makes sense if XXXXXXXXX is a regular expression.
This should be really fast if used something like:
$ grep '|XXXXXXXXX|' somefile | sed -e ...
One hackish way -
sed 's/^.*|\(<whatever your regex is>\)|.*$/\1/'
but that might be too slow for your production server since it may involve a fair amount of regex backtracking.

Remove a line from a csv file bash, sed, bash

I'm looking for a way to remove lines within multiple csv files, using bash with sed, awk or anything appropriate, where the line ends in 0.
So there are multiple csv files, their format is:
EXAMPLEfoo,60,6
EXAMPLEbar,30,10
EXAMPLElong,60,0
EXAMPLEcon,120,6
EXAMPLEdev,60,0
EXAMPLErandom,30,6
So the file will be amended to:
EXAMPLEfoo,60,6
EXAMPLEbar,30,10
EXAMPLEcon,120,6
EXAMPLErandom,30,6
A problem which I can see arising is distinguishing between double digits that end in zero and 0 itself.
So any ideas?
Using your file, something like this?
$ sed '/,0$/d' test.txt
EXAMPLEfoo,60,6
EXAMPLEbar,30,10
EXAMPLEcon,120,6
EXAMPLErandom,30,6
For this particular problem, sed is perfect, as the others have pointed out. However, awk is more flexible, i.e. you can filter on an arbitrary column:
awk -F, '$3!=0' test.csv
This will print the entire line if column 3 is not 0.
Use sed to remove only the lines ending with ",0":
sed '/,0$/d'
You can also use awk:
$ awk -F"," '$NF!=0' file
EXAMPLEfoo,60,6
EXAMPLEbar,30,10
EXAMPLEcon,120,6
EXAMPLErandom,30,6
This just says: check whether the last field is 0, and don't print the line if it is.
sed '/,[ \t]*0$/d' file
I would tend to sed, but there is an egrep (or grep -E) solution too:
egrep -v ",0$" example.csv
