grep line and line after into one line - bash

I have a text file with lots of information. I'm interested in getting only the aliases.
Each alias and its port information are separated by a space, and the ports are separated by a semicolon.
This command:
cat ~/Desktop/brocade_output.txt |grep -A1 alias
Gives me this output. All aliases start with a_ prefix.
alias: a_computer_1
40:01:00:00:ab:00:00:aj; 60:01:00:0e:1e:d0:b5:fd
--
alias: a_helpdesk
41:00:00:24:fh:5c:99:9e; 81:00:00:24:ff:5c:48:9f
--
alias: a_library
91:00:00:24:fh:5c:99:9g; 91:00:00:24:ff:5c:48:9g
--
Desired output
a_computer_1 40:01:00:00:ab:00:00:aj 60:01:00:0e:1e:d0:b5:fd
a_helpdesk 41:00:00:24:fh:5c:99:9e 81:00:00:24:ff:5c:48:9f
a_library 91:00:00:24:fh:5c:99:9g 91:00:00:24:ff:5c:48:9g

In awk:
$ awk '/alias/ {f=$2;next} f{$1=$1; print f, $0; f=0 }' file
a_computer_1 40:01:00:00:ab:00:00:aj; 60:01:00:0e:1e:d0:b5:fd
a_helpdesk 41:00:00:24:fh:5c:99:9e; 81:00:00:24:ff:5c:48:9f
a_library 91:00:00:24:fh:5c:99:9g; 91:00:00:24:ff:5c:48:9g
Explained:
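/alias/ {f=$2;next} when a record contains alias, store its second field in f and skip to the next record
f{$1=$1; print f, $0; f=0 } when f is set, rebuild the record to squeeze its whitespace, print f followed by the record, and reset f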
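The output above still carries the semicolons that the desired output drops. A minimal sketch of the same idea with a gsub added to remove them; the sample data, the join_aliases function and the variable names are made up for illustration, and the address lines are assumed to be indented as in typical switch output:

```shell
# Sample input modeled on the question (two aliases only).
sample='alias: a_computer_1
    40:01:00:00:ab:00:00:aj; 60:01:00:0e:1e:d0:b5:fd
alias: a_helpdesk
    41:00:00:24:fh:5c:99:9e; 81:00:00:24:ff:5c:48:9f'

join_aliases() {
    awk '
        /alias:/ { name = $2; next }   # remember the alias name
        name != "" {
            gsub(/;/, "")              # drop the semicolons
            $1 = $1                    # collapse the whitespace between fields
            print name, $0
            name = ""                  # ready for the next alias
        }
    '
}

result=$(printf '%s\n' "$sample" | join_aliases)
printf '%s\n' "$result"
```

Because the alias name is kept in a variable, no grep -A1 step is needed; awk can read the file directly.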

Try this:
grep -A1 alias ~/Desktop/brocade_output.txt | cut -f2- -d":"
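My understanding is that you want the part of the line that follows the first :, so using : as the delimiter for cut and taking all fields from 2 onward should do the trick for the alias lines. Be aware that the address lines contain colons too, so cut also removes their first octet.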

This can also be done in Perl with a few more characters:
perl -ane 'if (/alias/){$_=<>; s/^\s*//; print "$F[1] $_"}' file
These command-line options are used:
- -n loop around each line of the input file
- -a autosplit mode – split input lines into the @F array
- -e execute the perl code
/alias/ matches the current line
$F[1] is the second element in @F
$_=<> assigns the next line from the input file to the default variable $_
s/^\s*// removes the leading spaces from $_

With sed:
sed -n '/alias:/N;{s/.*\(a_[^ ]* \)[[:space:]]*\(.*\)/\1\2/p;}' file
Add the -i flag to edit the file in place.
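A sketch of the same N-based approach, written with POSIX character classes, that also drops the semicolons to match the desired output. The sample data and variable names are made up for illustration, and the alias lines are assumed to start with alias: at the beginning of the line:

```shell
# Sample input modeled on the question (two aliases only).
sample='alias: a_computer_1
    40:01:00:00:ab:00:00:aj; 60:01:00:0e:1e:d0:b5:fd
alias: a_helpdesk
    41:00:00:24:fh:5c:99:9e; 81:00:00:24:ff:5c:48:9f'

# For every alias line: append the next line (N), keep the alias name,
# replace the newline and indentation with one space, strip semicolons, print.
joined=$(printf '%s\n' "$sample" |
    sed -n '/^alias:/{N;s/^alias: *\([^[:space:]]*\)[[:space:]]*/\1 /;s/;//g;p;}')
printf '%s\n' "$joined"
```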

Related

Remove a substring from lines starting with a specific character

I am trying to change long names in rows starting with >, so that I only keep the part till Stage_V_sporulation_protein...:
>tr_A0A024P1W8_A0A024P1W8_9BACI_Stage_V_sporulation_protein_AE_OS=Halobacillus_karajensis_OX=195088_GN=BN983_00096_PE=4_SV=1
MTFLWAFLVGGGICVIGQILLDVFKLTPAHVMSSFVVAGAVLDAFDLYDNLIRFAGGGATVPITSFGHSLLHGAMEQADEHGVIGVAIGIFELTSAGIASAILFGFIVAVIFKPKG
>tr_A0A060LWV2_A0A060LWV2_9BACI_SpoIVAD_sporulation_protein_AEB_OS=Alkalihalobacillus_lehensis_G1_OX=1246626_GN=BleG1_2089_PE=4_SV=1
MIFLWAFLVGGVICVIGQLLMDVVKLTPAHTMSTLVVSGAVLAGFGLYEPLVDFAGAGATVPITSFGNSLVQGAMEEANQVGLIGIITGIFEITSAGISAAIIFGFIAALIFKPKG
I am doing a loop:
cat file.txt | while read line; do
if [[ $line = \>* ]] ; then
cut -d_ -f1-4 $line;
fi;
done
but it addresses files, not rows in the file (I get cut: >>tr_A0A024P1W8_A0A024P1W8_9BACI_Stage_V_sporulation_protein_AE_OS=Halobacillus_karajensis_OX=195088_GN=BN983_00096_PE=4_SV=1: No such file or directory).
My desired output is:
>tr_A0A024P1W8_A0A024P1W8_9BACI
MTFLWAFLVGGGICVIGQILLDVFKLTPAHVMSSFVVAGAVLDAFDLYDNLIRFAGGGATVPITSFGHSLLHGAMEQADEHGVIGVAIGIFELTSAGIASAILFGFIVAVIFKPKG
>tr_A0A060LWV2_A0A060LWV2_9BACI
MIFLWAFLVGGVICVIGQLLMDVVKLTPAHTMSTLVVSGAVLAGFGLYEPLVDFAGAGATVPITSFGNSLVQGAMEEANQVGLIGIITGIFEITSAGISAAIIFGFIAALIFKPKG
How do I change actual rows?
With the current state of the question, it seems easiest to do:
awk '/^>/ {print $1,$2,$3,$4; next}1' FS=_ OFS=_ file.txt
Lines that match the > at the beginning of the line get only the first four fields printed, separated by _ (the value of OFS). Lines that do not match are printed unchanged.
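The loop in the question fails because cut treats its argument as a file name rather than as text. A minimal sketch that keeps the loop but feeds each header line to cut on standard input; the headers and sequences are shortened copies of the question's data, and trim_headers is a made-up name:

```shell
fasta='>tr_A0A024P1W8_A0A024P1W8_9BACI_Stage_V_sporulation_protein_AE_OS=Halobacillus_karajensis
MTFLWAFLVGGGICVIGQ
>tr_A0A060LWV2_A0A060LWV2_9BACI_SpoIVAD_sporulation_protein_AEB_OS=Alkalihalobacillus_lehensis_G1
MIFLWAFLVGGVICVIGQ'

trim_headers() {
    while IFS= read -r line; do
        case $line in
            '>'*)
                # Header line: keep the first four underscore-separated fields.
                printf '%s\n' "$line" | cut -d_ -f1-4
                ;;
            *)
                # Sequence line: pass through unchanged.
                printf '%s\n' "$line"
                ;;
        esac
    done
}

trimmed=$(printf '%s\n' "$fasta" | trim_headers)
printf '%s\n' "$trimmed"
```

This starts one cut process per header, so the single awk call is likely the better choice for large files.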
One way using sed:
sed -E '/^>/s/(.*)_Stage_V_sporulation_protein/\1/' file
A sed one-liner would be:
sed '/^>/s/^\(\([^_]*_\)\{3\}[^_]*\).*/\1/' file
Use this Perl one-liner to process the headers in your FASTA file:
perl -lpe 'if ( m{^>} ) { @f = split m{_}, $_; splice @f, 4; $_ = join "_", @f; }' file.txt > out.txt
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
The one-liner uses split to split the input string on underscore into the array @f.
Then splice is used to remove from the array all elements except for the first 4 elements.
Finally, join joins these elements on an underscore.
All of the above is wrapped inside if ( m{^>} ) { ... } in order to limit the costly string manipulations only to the FASTA headers (the lines that start with >).
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches

Cut-n-paste while preserving last blank line for empty match (awk or sed)

I have a two-line "keyword=keyvalue" line pattern (selectively excised from systemd/networkd.conf file):
DNS=0.0.0.0
DNS=
and need the following 2-line answer:
0.0.0.0
But all attempts using sed or awk resulted in omitting the newline if the last line pattern matching resulted in an empty match.
EDIT:
Oh, one last thing: this multiline cut result has to be stored back into a bash variable that contains this same "last blank line" as well, so this is a two-step operation of preserving the last blank line:
- cut out everything up to and including the equals sign (=) on each line while preserving the newline, even for an empty result (this is the key here), or possibly jerry-rig a weak fixup that attaches a newline when the last line yields an empty match
- save the multi-line result back into a bash variable
sed Approach
When cutting up to and including the matched character in a bash shell, sed appears to remove any blank line produced by an empty match:
raw="DNS=0.0.0.0
DNS=
"
rawp="$(printf "%s\n" "$raw")"
kvs="$(echo "$rawp"| sed -e '/^[^=]*=/s///')"
echo "result: '${kvs}'"
gives the result:
0.0.0.0
without the corresponding blank line.
awk Approach
Awk has the same problem:
raw="DNS=0.0.0.0
DNS=
"
rawp="$(printf "%s\n" "$raw")"
kvs="$(echo "$rawp"| awk -F '=' -v OFS="" '{$1=""; print}')"
echo "result: '${kvs}'"
gives the same answer (it removed the blank line).
Please Advise
Somehow, I need the following answer:
0.0.0.0
in form of a two-line output containing 0.0.0.0 and a blank line.
Other Observations Made
I also noticed that if I provide three lines of data as follows (two with a keyvalue and the middle one without):
DNS=0.0.0.0
DNS=
DNS=999.999.999.999
Both sed and awk provided the correct answer:
0.0.0.0
999.999.999.999
Weird, uh?
The above regex (both sed and awk) works for:
a one-line with its keyvalue,
any-line provided that any lines have its non-empty keyvalue, BUT
last line MUST have a keyvalue.
Just doesn't work when the last-line has an empty keyvalue.
:-/
You can use this awk:
raw="DNS=0.0.0.0
DNS=
"
awk -F= 'NF == 2 {print $2}' <<< "$raw"
0.0.0.0
Following cut should also work:
cut -d= -f2 <<< "${raw%$'\n'}"
0.0.0.0
To store output including trailing line breaks use read with process substitution:
IFS= read -rd '' kvs < <(awk -F= 'NF == 2 {print $2}' <<< "$raw")
declare -p kvs
declare -- kvs="0.0.0.0
"
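Both attempts in the question lose the blank line at the $(...) step rather than in sed or awk: command substitution removes all trailing newlines. A minimal POSIX-shell sketch of a common workaround, appending a sentinel character and stripping it afterwards; the variable names plain and kept are made up for this example:

```shell
raw='DNS=0.0.0.0
DNS='

# Plain command substitution: every trailing newline is stripped.
plain=$(printf '%s\n' "$raw" | sed 's/^[^=]*=//')

# Sentinel trick: print an extra "x" so the newlines are no longer trailing,
# then remove the sentinel again.
kept=$(printf '%s\n' "$raw" | sed 's/^[^=]*=//'; printf x)
kept=${kept%x}

printf 'plain: [%s]\n' "$plain"
printf 'kept:  [%s]\n' "$kept"
```

Printing the variable later with echo "$kept" adds one more newline, so printf '%s' "$kept" is the safer way to reproduce it exactly.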

Concatenating characters on each field of CSV file

I am dealing with a CSV file which has the following form:
Dates;A;B;C;D;E
"1999-01-04";1391.12;3034.53;66.515625;86.2;441.39
"1999-01-05";1404.86;3072.41;66.3125;86.17;440.63
"1999-01-06";1435.12;3156.59;66.4375;86.32;441
Since the BLAS routine I need to implement on such data takes double-floats only, I guess the easiest way is to concatenate d0 at the end of each field, so that each line looks like:
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
In pseudo-code, that would be:
For every line except the first line
For every field except the first field
Substitute ; with d0; and Substitute newline with d0 newline
My guess is that it should be something like
cat file.csv | awk -F; 'NR>1 & NF>1'{print line} | sed 's/;/d0\n/g' | sed 's/\n/d0\n/g'
Any input?
Could use this sed
sed '1!{s/\(;[^;]*\)/\1d0/g}' file
This skips the first line, then replaces each field that begins with ; (which excludes the first field) with itself followed by d0.
Output
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
I would say:
$ awk 'BEGIN{FS=OFS=";"} NR>1 {for (i=2;i<=NF;i++) $i=$i"d0"} 1' file
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
That is, set the field separator to ;. Starting on line 2, loop through all the fields from the 2nd one appending d0. Then, use 1 to print the line.
Your data format looks a bit weird. Enclosing the first column in double quotes makes me think that it can contain the delimiter, the semicolon, itself. However, I don't know the application which produces that data but if this is the case, then you can use the following GNU awk command:
awk 'NR>1{for(i=2;i<=NF;i++){$i=$i"d0"}}1' OFS=\; FPAT='("[^"]+")|([^;]+)' file
The key here is the FPAT variable. With it, you can define what a field looks like instead of being limited to specifying a set of field delimiters.
big-prices.csv
Dates;A;B;C;D;E
"1999-01-04";1391.12;3034.53;66.515625;86.2;441.39
"1999-01-05";1404.86;3072.41;66.3125;86.17;440.63
"1999-01-06";1435.12;3156.59;66.4375;86.32;441
preprocess script
head -n 1 big-prices.csv 1>output.txt; \
tail -n +2 big-prices.csv | \
sed 's/;/d0;/g' | \
sed 's/$/d0/g' | \
sed 's/"d0/"/g' 1>>output.txt;
output.txt
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
note: would have to make minor modification to second sed if file has trailing whitespaces at end of lines..
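One way that modification might look, as a sketch: let the second substitution swallow any trailing whitespace before appending d0. The two sample rows are shortened from the question's data, and the second one carries trailing blanks on purpose:

```shell
# printf builds the sample so the trailing blanks on row two are explicit.
rows=$(printf '%s\n' '"1999-01-04";1391.12;3034.53' '"1999-01-05";1404.86;3072.41   ')

converted=$(printf '%s\n' "$rows" |
    sed -e 's/;/d0;/g' \
        -e 's/[[:space:]]*$/d0/' \
        -e 's/"d0/"/')
printf '%s\n' "$converted"
```

The [[:space:]] class also matches a carriage return, so the same expression should cope with CRLF line endings.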
Using awk
Input
$ cat file
Dates;A;B;C;D;E
"1999-01-04";1391.12;3034.53;66.515625;86.2;441.39
"1999-01-05";1404.86;3072.41;66.3125;86.17;440.63
"1999-01-06";1435.12;3156.59;66.4375;86.32;441
gsub (any awk)
$ awk 'FNR>1{ gsub(/;[^;]*/,"&d0")}1' file
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
gensub (gawk)
$ awk 'FNR>1{ print gensub(/(;[^;]*)/,"\\1d0","g"); next }1' file
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0

output csv with lines that contains only one column

with input csv file
sid,storeNo,latitude,longitude
2,1,-28.03720000,153.42921670
9
I wish to output only the lines with one column, in this example it's line 3.
how can this be done in bash shell script?
Using awk
The following awk command would be useful:
$ awk -F, 'NF==1' inputFile
9
What does it do?
-F, sets the field separator to ,
NF==1 matches lines whose number of fields (NF) is 1. No action is given, so the default action, printing the entire record, is taken. It is equivalent to NF==1{print $0}
inputFile is the input csv file passed to awk
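To see what NF holds for each record, a small sketch that prints the field count in front of every line before applying the filter; the sample is the question's data and the variable names are made up:

```shell
csv='sid,storeNo,latitude,longitude
2,1,-28.03720000,153.42921670
9'

# Show the number of comma-separated fields awk sees on each line.
counts=$(printf '%s\n' "$csv" | awk -F, '{ print NF ": " $0 }')
printf '%s\n' "$counts"

# Keep only the records with exactly one field.
single=$(printf '%s\n' "$csv" | awk -F, 'NF == 1')
printf '%s\n' "$single"
```

An empty line has NF equal to 0, so NF==1 skips it, whereas a filter that only rejects lines containing a comma would let it through.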
Using grep
The same can also be done using grep:
$ grep -v ',' inputFile
9
The -v option prints the lines that do not match the pattern
With , as the pattern, grep -v therefore prints the lines that do not contain the field separator
Using sed
$ sed -n '/^[^,]*$/p' inputFile
9
What does it do?
-n suppresses the automatic printing of the pattern space
/^[^,]*$/ selects lines that match the pattern, that is, lines without any ,
^ anchors the regex at the start of the line
[^,]* matches any run of characters other than ,
$ anchors the regex at the end of the line
p prints the current pattern space, that is, the selected line
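Try this bash script:
#!/bin/bash
# Print only the lines that split into exactly one comma-separated field.
while IFS= read -r line
do
    # Split into an array; IFS is changed for this read only.
    IFS=',' read -ra fields <<< "$line"
    if [ "${#fields[@]}" -eq 1 ]; then
        printf '%s\n' "$line"
    fi
done < file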

How to output only text after a match with sed

I am using sed to find a certain match in a text file and then put this value into a variable. My problem is that I only want the text after the match, not the entire line.
Ans=$(sed -n '/^'$1':/,/~/{/:/{p;n};/~/q;p}' $file.txt)
Text File
q1:answer1
~
q2:answer2
~
q3:answer3
~
Actual Output
q1:answer1
Expected Output
answer1
With grep :
Ans=$(grep -oP "^$1:\K.*" file)
or with perl if your grep version doesn't support -P switch :
Ans=$(var=$1 perl -lne '/^$ENV{var}:\K.*/ and print $&' file)
In case a sed solution is needed - e.g., if answers could span multiple lines:
Ans=$(sed -r -n '/^'$1':(.*)/,/^(~)$/ { s//\1/; /^~$/q; p; }' file.txt)
(OSX users: use -E instead of -r).
Uses a backreference (\1) to replace the first matching line with its portion of interest; any other lines between the first matching one and the terminating ~ line are unaffected by the replacement (assuming they don't also start with $1:) and also printed.
Replace q with d if you don't want to quit after the first matching range.
By contrast, if the string of interest is limited to the line starting with $1:, there's no need to also match the ~ line, and the command can be simplified to:
Ans=$(sed -r -n '/^'$1':(.*)/ { s//\1/p; q; }' file.txt)
Remove q; if you don't want to quit after the first match.
However, the single-line case is more easily handled with a grep or awk solution - see @sputnick's and @anubhava's answers. If you wanted those to quit after the first match -- as in the snippets above and the code in the OP -- you'd need to add option -m 1 to the grep solution and ; exit to the awk solution (before the }).
Better use awk for this:
ans=$(awk -F':' -v s='q1' '$1 == s {print $2}' file)
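A sketch of the awk approach wrapped in a function that takes the key as an argument, stops after the first match, and returns everything after the first colon so that answers containing a colon stay intact. The lookup name and the q3 sample line are made up for illustration:

```shell
answers='q1:answer1
~
q2:answer2
~
q3:see: page 4
~'

lookup() {
    # $1 is the key to look for; the text is read from standard input.
    # substr skips the key and the colon that follows it.
    awk -F: -v s="$1" '$1 == s { print substr($0, length(s) + 2); exit }'
}

ans=$(printf '%s\n' "$answers" | lookup q2)
printf '%s\n' "$ans"
```

Comparing $1 with == does a literal string comparison, so a key containing regex metacharacters needs no escaping.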
