Separating onto a new line based on a delimiter - shell

I have some rows in my file that look like this
ENSG00000003096:E4.2|E5.1
ENSG00000035115:E14.2|E15.1
ENSG00000140987:E5.2|ENSG00000140987:E6.1
ENSG00000154358:E46.1|E47.1
I would like to separate them onto a new line based on the delimiter "|" , such that it becomes
ENSG00000003096:E4.2
ENSG00000003096:E5.1
ENSG00000035115:E14.2
ENSG00000035115:E15.1
ENSG00000140987:E5.2
ENSG00000140987:E6.1
ENSG00000154358:E46.1
ENSG00000154358:E47.1

With input data as advised in your question, this seems to work with gnu awk:
awk -F: -v RS="[|]|\n" 'NF==1{print p FS $0;next}NF!=1{p=$1}1' file1
#Output
ENSG00000003096:E4.2
ENSG00000003096:E5.1
ENSG00000035115:E14.2
ENSG00000035115:E15.1
ENSG00000140987:E5.2
ENSG00000140987:E6.1
ENSG00000154358:E46.1
ENSG00000154358:E47.1
Logic:
| or \n are used as record separator RS
: is used as field separator FS
If a line has more than one fields then keep the first field in a variable p
if a line has only one field then print previous $1 = variable p and the line $0

You may mean something like
awk 'BEGIN{FS=":"}{ split($2, fields, "|"); print $1 ":" fields[1]; print $1 ":" fields[2]; }' my_file.txt

Related

Add string to columns in bash

I have a comma-delimited file to which I want to append a string in specific columns. I am trying to do something like this, but couldn't do it until now.
re1,1,a1e,a2e,AGT
re2,2,a1w,a2w,AGT
re3,3,a1t,a2t,ACGTCA
re12,4,b1e,b2e,ACGTACT
And I want to append 'some_string' to columns 3 and 4:
re1,1,some_stringa1e,some_stringa2e,AGT
re2,2,some_stringa1w,some_stringa2w,AGT
re3,3,some_stringa1t,some_stringa2t,ACGTCA
re12,4,some_stringb1e,some_stringb2e,ACGTACT
I was trying something similar to the suggestion solution, but to no avail:
awk -v OFS=$'\,' '{ $3="some_string" $3; print}' $lookup_file
Also, I would like my string to be added to both columns. How would you do this with awk or bash?
Thanks a lot in advance
You can do that with (almost) what you have:
pax> echo 're1,1,a1e,a2e,AGT
re2,2,a1w,a2w,AGT
re3,3,a1t,a2t,ACGTCA
re12,4,b1e,b2e,ACGTACT' | awk 'BEGIN{FS=OFS=","}{$3 = "pre3:"$3; $4 = "pre4:"$4; print}'
re1,1,pre3:a1e,pre4:a2e,AGT
re2,2,pre3:a1w,pre4:a2w,AGT
re3,3,pre3:a1t,pre4:a2t,ACGTCA
re12,4,pre3:b1e,pre4:b2e,ACGTACT
The begin block sets the input and output field separators, the two assignments massage fields 3 and 4, and the print outputs the modified line.
You need to set FS to comma, not just OFS. There's a shortcut for setting FS, it's the -F option.
awk -F, -v OFS=',' '{ $3="some_string" $3; $4 = "some_string" $4; print}' "$lookup_file"
awk's default action is to concatenate, so you can simply place strings next to each other and they'll be treated as one. 1 means true, so with no {action} it will assume "print". You can use Bash's Brace Expansion to assign multiple variables after the script.
awk '{$3 = "three" $3; $4 = "four" $4} 1' {O,}FS=,

Shell script to add values to a specific column

I have semicolon-separated columns, and I would like to add some characters to a specific column.
aaa;111;bbb
ccc;222;ddd
eee;333;fff
to the second column I want to add '#', so the output should be;
aaa;#111;bbb
ccc;#222;ddd
eee;#333;fff
I tried
awk -F';' -OFS=';' '{ $2 = "#" $2}1' file
It adds the character but removes all semicolons with space.
You could use sed to do your job:
# replaces just the first occurrence of ';', note the absence of `g` that
# would have made it a global replacement
sed 's/;/;#/' file > file.out
or, to do it in place:
sed -i 's/;/;#/' file
Or, use awk:
awk -F';' '{$2 = "#"$2}1' OFS=';' file
All the above commands result in the same output for your example file:
aaa;#111;bbb
ccc;#222;ddd
eee;#333;fff
#atb: Try:
1st:
awk -F";" '{print $1 FS "#" $2 FS $3}' Input_file
Above will work only when your Input_file has 3 fields only.
2nd:
awk -F";" -vfield=2 '{$field="#"$field} 1' OFS=";" Input_file
Above code you could put any field number and could make it as per your request.
Here I am making field separator as ";" and then taking a variable named field which will have the field number in it and then that concatenating "#" in it's value and 1 is for making condition TRUE and not making and action so by default print action will happen of current line.
You just misunderstood how to set variables. Change -OFS to -v OFS:
awk -F';' -v OFS=';' '{ $2 = "#" $2 }1' file
but in reality you should set them both to the same value at one time:
awk 'BEGIN{FS=OFS=";"} { $2 = "#" $2 }1' file

Awk, Shell Scripting

I have a file which has the following form:
#id|firstName|lastName|gender|birthday|creationDate|locationIP|browserUsed
111|Arkas|Sarkas|male|1995-09-11|2010-03-17T13:32:10.447+0000|192.248.2.123|Midori
Every field is separated with "|". I am writing a shell script and my goal is to remove the "-" from the fifth field (birthday), in order to make comparisons as if they were numbers.
For example i want the fifth field to be like |19950911|
The only solution I have reached so far, deletes all the "-" from each line which is not what I want using sed.
i would be extremely grateful if you show me a solution to my problem using awk.
If this is a homework writing the complete script will be a disservice. Some hints: the function you should be using is gsub in awk. The fifth field is $5 and you can set the field separator by -F'|' or in BEGIN block as FS="|"
Also, line numbers are in NR variable, to skip first line for example, you can add a condition NR>1
An awk one liner:
awk 'BEGIN { FS="|" } { gsub("-","",$5); print }' infile.txt
To keep "|" as output separator, it is better to define OFS value as "|" :
... | awk 'BEGIN { FS="|"; OFS="|"} {gsub("-","",$5); print $0 }'

replace a pipe delimited column using awk

I have a column seperated using pipe delimited , I have to replace the entire column with some other value .
Example :
A|B|C
I want to replace second column with "Z" as ,
A|Z|C
Replacing the second field can be done by setting up the input and output field separators and simply changing the second field before printing:
awk 'BEGIN {FS = OFS = "|"} {$2 = "Z"; print}' inputFileName
as per the following transcript:
pax$ printf 'A|B|C\nD|E|F\n' | awk 'BEGIN{FS=OFS="|"}{$2="Z";print}'
A|Z|C
D|Z|F
Below command provided the expected solution,
awk -F'|' '{$2="string";print}' file_name > new_file_name
$2 --> denotes the second position of the delimiter, |

How to set FS to eof?

I want to read whole file not per lines. How to change field separator to eof-symbol?
I do:
awk "^[0-9]+∆DD$1[PS].*$" $(ls -tr)
$1 - param (some integer), .* - message that I want to print. There is a problem: message can contains \n. In that way this code prints only first line of file. How can I scan whole file not per lines?
Can I do this using awk, sed, grep? Script must have length <= 60 characters (include spaces).
Assuming you mean record separator, not field separator, with GNU awk you'd do:
gawk -v RS='^$' '{ print "<" $0 ">" }' file
Replace the print with whatever you really want to do and update your question with some sample input and expected output if you want help with that part too.
The portable way to do this, by the way, is to build up the record line by line and then process it in the END section:
awk '{rec = rec (NR>1?RS:"") $0} END{ print "<" rec ">" }' file
using nf = split(rec,flds) to create fields if necessary.

Resources