How to trim strings in a file - shell

I have a file like this:
10.123.214.214:445
124.4235.123:443
124.34352.124.1:80
2142354.1341:80
12435.12412:70
Is there a way I can print everything before the :? I am thinking awk or sed would be the best way, but I am not sure how to come up with the right command.
Expected output:
10.123.214.214
124.4235.123
124.34352.124.1
2142354.1341
12435.12412

$ cut -d: -f1 file
10.123.214.214
124.4235.123
124.34352.124.1
2142354.1341
12435.12412
It's the only thing cut exists to do - don't take away its raison d'être :-).

In AWK, as follows:
BEGIN { FS=":" }
{ print $1 }
This makes the colon your Field Separator, so the first field ($1) is everything before the first colon on the line.

Another awk proposal.
awk -F: '{print $1}' file
10.123.214.214
124.4235.123
124.34352.124.1
2142354.1341
12435.12412

Just for completeness, this is the sed equivalent to keep the first column of a : delimited file:
sed 's/:.*//g' file1
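For comparison, here is a minimal sketch of the same trimming in plain shell, with no external command. It assumes a POSIX shell and one entry per line. The function name strip_port is made up for this illustration; the actual work is done by the ${line%%:*} parameter expansion.

```shell
# strip_port: hypothetical helper, prints each input line up to the first colon.
strip_port() {
    # "|| [ -n "$line" ]" also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # %%:* deletes the longest suffix matching ":*",
        # i.e. everything from the first colon onward.
        printf '%s\n' "${line%%:*}"
    done
}

printf '10.123.214.214:445\n12435.12412:70\n' | strip_port
```

Lines without a colon pass through unchanged. For large files cut or awk will likely be faster, since the shell loop processes one line at a time.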

Related

How to replace all occurrence of a symbol with awk

From the cmd (awk 'some expression') I got a result in the format
Key:(white_space)Value
Key:(white_space)Value
...
How to manipulate the result to be in the format:
Key=Value
I need this because I want to put the information into .properties file format which is key=value
In other words I need to replace : with = and remove the whitespace.
Is there a command in awk that can achieve this ?
You ask for awk, while sed provides just as easy a solution. However, awk makes it trivial with sub as well:
awk '{ sub(/:[ \t]*/,"=") }1'
Example
$ echo "Key: Value" | awk '{ sub(/:[ \t]*/,"=") }1'
Key=Value
Another awk approach (it assumes the value itself contains no colon or space):
awk -F'[: ]' '{print $1 "=" $NF}' file.txt
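The answer above mentions that sed is just as easy but does not show it. A possible sketch, assuming the key itself contains no colon; colon_to_equals is a hypothetical wrapper name used only here:

```shell
# colon_to_equals: hypothetical name; rewrites "Key: Value" as "Key=Value".
colon_to_equals() {
    # Replace only the first colon on each line, plus any spaces or tabs after it.
    sed 's/:[[:space:]]*/=/'
}

printf 'Key: Value\nurl:\thttp://example.com:8080\n' | colon_to_equals
```

Because only the first match is replaced, colons and spaces inside the value are left alone.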

How to place comma separated values into newline and remove all the id's before including colon

The command below, run on a Linux system, prints all the account names of a group as one comma-separated list. I want each account name on its own line, with the commas and everything else removed.
$ getent group pi_infra
pi_infra:*:5899:pxf59093,pxv07744,pxa02374,pxa07513,pxa08599,pxa11102,pxa30995,pxa34158,pxf07822,pxf29346,pxf30902,pxf31604,pxf31606,pxf31953,pxf34985,pxf41740,pxf41778,pxf43236,pxf43917,pxf45518,pxf46461,pxf49051,pxf58440,pxf58523,pxf58621,pxf60794,pxf60938,pxf61299,pxf63061,pxp08000,pxp25916,pxp42841,pxp68003,pxp69833,pxp87972
$ cat pi_in| sed 's/,/\n/g'
$ cat pi_in| tr ',' '\n'
Result of the above:
pi_infra:*:5899:pxf59093
pxv07744
pxa02374
pxa07513
pxa08599
pxa11102
pxa30995
pxa34158
pxf07822
pxf29346
pxf30902
pxf31604
pxf31606
pxf31953
pxf34985
pxf41740
pxf41778
pxf43236
pxf43917
pxf45518
pxf46461
pxf49051
pxf58440
pxf58523
pxf58621
pxf60794
pxf60938
pxf61299
pxf63061
pxp08000
pxp25916
pxp42841
pxp68003
pxp69833
pxp87972
Since I want to remove everything before the : and print only the IDs, I have chosen to use the command below.
$ cat pi_in| cut -d":" -f4 | tr ',' '\n'
pxf59093
pxv07744
pxa02374
pxa07513
pxa08599
pxa11102
pxa30995
pxa34158
pxf07822
pxf29346
pxf30902
pxf31604
pxf31606
pxf31953
pxf34985
pxf41740
pxf41778
pxf43236
pxf43917
pxf45518
pxf46461
pxf49051
pxf58440
pxf58523
pxf58621
pxf60794
pxf60938
pxf61299
pxf63061
pxp08000
pxp25916
pxp42841
pxp68003
pxp69833
pxp87972
This works fine, but it looks like it could all be done in a single command rather than using cut and tr separately.
Preferably with awk or sed.
Thanks.
In awk, you could try the following:
awk -F':' '{gsub(",",ORS,$4);print $4}' Input_file
2nd solution:
awk '{sub(/.*:/,"");gsub(/,/,ORS)} 1' Input_file
$ sed 's/.*://; y/,/\n/' file
pxf59093
pxv07744
pxa02374
...
s/.*:// removes everything preceding the last colon, and the colon itself, and y/,/\n/ does what tr does in your approach.
This might work for you (GNU sed):
sed 'y/,/\n/;/:/!P;D' file
Translate commas to newlines and don't print any line with a : in it. Note that the first account name shares a line with the last colon, so it is dropped as well.
N.B. The solution by @oguz ismail is more efficient and faster (with regards to a sed solution).
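A variant that avoids embedding newlines in sed is to let awk split the fourth field itself. This is only a sketch, assuming getent-style input with the member list in field 4; group_members is a made-up name:

```shell
# group_members: hypothetical helper; prints one member per line from
# getent-style "name:passwd:gid:member,member,..." input.
group_members() {
    awk -F: '{
        n = split($4, members, ",")   # field 4 holds the comma-separated list
        for (i = 1; i <= n; i++) print members[i]
    }'
}

printf 'pi_infra:*:5899:pxf59093,pxv07744,pxa02374\n' | group_members
```

With real data it would be used as: getent group pi_infra | group_members. A group with no members prints nothing.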

Extract the last three columns from a text file with awk

I have a .txt file like this:
ENST00000000442 64073050 64074640 64073208 64074651 ESRRA
ENST00000000233 127228399 127228552 ARF5
ENST00000003100 91763679 91763844 CYP51A1
I want to get only the last 3 columns of each line.
As you can see, there are sometimes empty lines between two lines, which must be ignored. Here is the output that I want:
64073208 64074651 ESRRA
127228399 127228552 ARF5
91763679 91763844 CYP51A1
awk  '/a/ {print $1- "\t" $-2 "\t" $-3}'  file.txt.
It does not return what I want. Do you know how to correct the command?
The following awk may help:
awk 'NF{print $(NF-2),$(NF-1),$NF}' OFS="\t" Input_file
Output will be as follows.
64073208 64074651 ESRRA
127228399 127228552 ARF5
91763679 91763844 CYP51A1
EDIT: Adding an explanation of the command. (Note: the annotated version below is for explanation only; run the command above to get the results.)
awk 'NF ###NF is a built-in awk variable holding the number of fields in the current line of Input_file.
###Used as a condition, NF is true only when the line has at least one field, so empty lines are skipped.
{
print $(NF-2),$(NF-1),$NF ###Print $(NF-2), the third-to-last field of the current line, then $(NF-1), the second-to-last field, then $NF, the last field.
}
' OFS="\t" Input_file ###Set OFS (the output field separator) to a tab and name the Input_file.
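The same NF arithmetic extends to any number of trailing columns. Below is a rough sketch rather than part of the original answer: last_fields and its count argument are hypothetical, and lines with fewer fields than requested are printed whole.

```shell
# last_fields: hypothetical helper; prints the last N whitespace-separated
# fields of every non-empty line, joined by tabs.
last_fields() {
    awk -v n="$1" 'NF {
        start = (NF > n) ? NF - n + 1 : 1   # do not go below field 1 on short lines
        out = $start
        for (i = start + 1; i <= NF; i++) out = out "\t" $i
        print out
    }'
}

printf 'ENST00000000442 64073050 64074640 64073208 64074651 ESRRA\n\nENST00000000233 127228399 127228552 ARF5\n' | last_fields 3
```

The empty line in the sample input is skipped by the NF condition, just as in the one-liner above.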
You can use sed too (GNU sed, assuming the fields are tab-separated):
sed -E '/^$/d;s/.*\t(([^\t]*[\t|$]){2})/\1/' infile
With some piping:
$ cat file | tr -s '\n' | rev | cut -f 1-3 | rev
64073208 64074651 ESRRA
127228399 127228552 ARF5
91763679 91763844 CYP51A1
First, cat the file to tr to squeeze repeated \ns and get rid of the empty lines. Then reverse each line, cut the first three fields, and reverse again. You could drop the useless cat by moving the first rev to the front: rev file | tr -s '\n' | cut -f 1-3 | rev.

Edit data removing line breaks and putting everything in a row

Hi, I'm new to shell scripting and I have been unable to do this:
My data looks like this (much bigger actually):
>SampleName_ZN189A
01000001000000000000100011100000000111000000001000
00110000100000000000010000000000001100000010000000
00110000000000001110000010010011111000000100010000
00000110000001000000010100000000010000001000001110
0011
>SampleName_ZN189B
00110000001101000001011100000000000000000000010001
00010000000000000010010000000000100100000001000000
00000000000000000000000010000000000010111010000000
01000110000000110000001010010000001111110101000000
1100
Note: There is a line break after every 50 characters, but sometimes after fewer, when a sample's data ends and a new sample name starts.
I would like the line breaks after every 50 characters to be removed, so that my data looks like this:
>SampleName_ZN189A
0100000100000000000010001110000000011100000000100000110000100000000000010000000000001100000010000000...
>SampleName_ZN189B
0011000000110100000101110000000000000000000001000100010000000000000010010000000000100100000001000000...
I tried using tr but I got an error:
tr '\n' '' < my_file
tr: empty string2
Thanks in advance
tr with -d deletes the specified character:
$ cat input.txt
00110000001101000001011100000000000000000000010001
00010000000000000010010000000000100100000001000000
00000000000000000000000010000000000010111010000000
01000110000000110000001010010000001111110101000000
1100
$ cat input.txt | tr -d "\n"
001100000011010000010111000000000000000000000100010001000000000000001001000000000010010000000100000000000000000000000000000010000000000010111010000000010001100000001100000010100100000011111101010000001100
You can use this awk:
awk '/^ *>/{if (s) print s; print; s="";next} {s=s $0;next} END {print s}' file
>SampleName_ZN189A
010000010000000000001000111000000001110000000010000011000010000000000001000000000000110000001000000000110000000000001110000010010011111000000100010000000001100000010000000101000000000100000010000011100011
>SampleName_ZN189B
001100000011010000010111000000000000000000000100010001000000000000001001000000000010010000000100000000000000000000000000000010000000000010111010000000010001100000001100000010100100000011111101010000001100
Using awk:
awk '/>/{print ((NR==1) ? $0 : RS $0); next} {printf "%s", $0} END{print ""}' file
If you don't mind an extra empty line at the top of the output, here is a shorter one:
awk '{printf "%s", (/>/ ? RS $0 RS : $0)} END{print ""}' file
This might work for you (GNU sed):
sed '/^\s*>/!{H;$!d};x;s/\n\s*//2gp;x;h;d' file
Build up the record in the hold space and when encountering the start of the next record or the end-of-file remove the newlines and print out.
You can use this sed:
sed '/^>Sample/!{ :loop; N; /\n>Sample/{n}; s/\n//; b loop; }' file.txt
Try this
cat SampleName_ZN189A | tr -d '\r'
# tr -d deletes the specified character from the input
Using simple awk, the same is achievable:
awk 'BEGIN{ORS=""} {print}' SampleName_ZN189A # The output has no carriage return at the end.
If you want a line break at the end, this works:
awk 'BEGIN{ORS=""} {print}END{print "\r"}' SampleName_ZN189A
# Select the correct line-break character, i.e. \r, \n, or \r\n, depending on the file format.
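Putting the idea from the awk answers above into a reusable form: collect the sequence lines of each sample and print them as one line when the next header or the end of the file is reached. This is a sketch that assumes header lines start with > in the first column; unwrap_fasta is a made-up name.

```shell
# unwrap_fasta: hypothetical helper; joins the wrapped lines of each record
# while keeping every ">" header on its own line.
unwrap_fasta() {
    awk '
        /^>/ {                        # a header line starts a new record
            if (seq != "") print seq  # flush the previous sequence, if any
            print
            seq = ""
            next
        }
        { seq = seq $0 }              # append this chunk without a newline
        END { if (seq != "") print seq }
    '
}

printf '>A\n0100\n0011\n>B\n11\n00\n' | unwrap_fasta
```

Unlike tr -d '\n' on the whole file, this keeps the line breaks around the sample names.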

How to extract the first column from a file with tab-separated columns

I have a file with the following format
EMAIL[TAB]NAME[TAB]ADRESSE[TAB]...
test@test.ma[TAB]toto[TAB]tatatata....
test@test.com[TAB]toto[TAB]tatatata....
[TAB] ===> tabulation
I would like a bash script that removes every column except the email.
The final output should be something like:
EMAIL
test@test.ma
test@test.com
You could use
cut -f1 < inputfile.txt
awk 'BEGIN { FS="\t" } { print $1 }' a
Try this:
sed -e 's/\t.*//' yourfile
(The answer using cut would be my first choice though.)
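A small self-contained check of the cut approach, using made-up sample data. first_column is only an illustrative wrapper name; the point is that cut splits on tabs by default, so no -d option is needed.

```shell
# first_column: hypothetical helper; keeps only the text before the first tab.
first_column() {
    cut -f1   # cut splits on tab by default, so spaces inside a field survive
}

printf 'EMAIL ADDRESS\tNAME\nuser@example.com\tJohn Doe\n' | first_column
```

With awk the field separator has to be set to a tab explicitly, as in the answer above; otherwise a space inside the first column would split it.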
