How to place comma separated values into newline and remove all the id's before including colon - bash

I have below command output from the Linux System where it fetches the all the account names by comma separated which I want to be placed into newline's, so remove all the command and place individual account name into newline.
$ getent group pi_infra
pi_infra:*:5899:pxf59093,pxv07744,pxa02374,pxa07513,pxa08599,pxa11102,pxa30995,pxa34158,pxf07822,pxf29346,pxf30902,pxf31604,pxf31606,pxf31953,pxf34985,pxf41740,pxf41778,pxf43236,pxf43917,pxf45518,pxf46461,pxf49051,pxf58440,pxf58523,pxf58621,pxf60794,pxf60938,pxf61299,pxf63061,pxp08000,pxp25916,pxp42841,pxp68003,pxp69833,pxp87972
$ cat pi_in| sed 's/,/\n/g'
$ cat pi_in| tr ',' '\n'
Result From the above.
pi_infra:*:5899:pxf59093
pxv07744
pxa02374
pxa07513
pxa08599
pxa11102
pxa30995
pxa34158
pxf07822
pxf29346
pxf30902
pxf31604
pxf31606
pxf31953
pxf34985
pxf41740
pxf41778
pxf43236
pxf43917
pxf45518
pxf46461
pxf49051
pxf58440
pxf58523
pxf58621
pxf60794
pxf60938
pxf61299
pxf63061
pxp08000
pxp25916
pxp42841
pxp68003
pxp69833
pxp87972
As i want to remove all the stuff before : and only want ID printed hence i've chosen to use below.
$ cat pi_in| cut -d":" -f4 | tr ',' '\n'
pxf59093
pxv07744
pxa02374
pxa07513
pxa08599
pxa11102
pxa30995
pxa34158
pxf07822
pxf29346
pxf30902
pxf31604
pxf31606
pxf31953
pxf34985
pxf41740
pxf41778
pxf43236
pxf43917
pxf45518
pxf46461
pxf49051
pxf58440
pxf58523
pxf58621
pxf60794
pxf60938
pxf61299
pxf63061
pxp08000
pxp25916
pxp42841
pxp68003
pxp69833
pxp87972
This above works fine but looking it this all can be integrated into one rather using tr and cut two times distinctly.
Preferably awk or sed would be appropriate.
Thanks.

In awk could you please try following.
awk -F':' '{gsub(",",ORS,$4);print $4}' Input_file
2nd solution:
awk '{sub(/.*:/,"");gsub(/,/,ORS)} 1' Input_file

$ sed 's/.*://; y/,/\n/' file
pxf59093
pxv07744
pxa02374
...
s/.*:// removes everything preceding the last colon, and the colon itself, and y/,/\n/ does what tr does in your approach.

This might work for you (GNU sed):
sed 'y/,/\n/;/:/!P;D' file
Translate ,'s to newlines and don't print any line with a : in it.
N.B. The solution by #oguz ismail is more efficient and faster (with regards to a sed solution).

Related

Getting last X fields from a specific line in a CSV file using bash

I'm trying to get as bash variable list of users which are in my csv file. Problem is that number of users is random and can be from 1-5.
Example CSV file:
"record1_data1","record1_data2","record1_data3","user1","user2"
"record2_data1","record2_data2","record2_data3","user1","user2","user3","user4"
"record3_data1","record3_data2","record3_data3","user1"
I would like to get something like
list_of_users="cat file.csv | grep "record2_data2" | <something> "
echo $list_of_users
user1,user2,user3,user4
I'm trying this:
cat file.csv | grep "record2_data2" | awk -F, -v OFS=',' '{print $4,$5,$6,$7,$8 }' | sed 's/"//g'
My result is:
user2,user3,user4,,
Question:
How to remove all "," from the end of my result? Sometimes it is just one but sometimes can be user1,,,,
Can I do it in better way? Users always starts after 3rd column in my file.
This will do what your code seems to be trying to do (print the users for a given string record2_data2 which only exists in the 2nd field):
$ awk -F',' '{gsub(/"/,"")} $2=="record2_data2"{sub(/([^,]*,){3}/,""); print}' file.csv
user1,user2,user3,user4
but I don't see how that's related to your question subject of Getting last X records from CSV file using bash so idk if it's what you really want or not.
Better to use a bash array, and join it into a CSV string when needed:
#!/usr/bin/env bash
readarray -t listofusers < <(cut -d, -f4- file.csv | tr -d '"' | tr ',' $'\n' | sort -u))
IFS=,
printf "%s\n" "${listofusers[*]}"
cut -d, -f4- file.csv | tr -d '"' | tr ',' $'\n' | sort -u is the important bit - it first only prints out the fourth and following fields of the CSV input file, removes quotes, turns commas into newlines, and then sorts the resulting usernames, removing duplicates. That output is then read into an array with the readarray builtin, and you can manipulate it and the individual elements however you need.
GNU sed solution, let file.csv content be
"record1_data1","record1_data2","record1_data3","user1","user2"
"record2_data1","record2_data2","record2_data3","user1","user2","user3","user4"
"record3_data1","record3_data2","record3_data3","user1"
then
sed -n -e 's/"//g' -e '/record2_data/ s/[^,]*,[^,]*,[^,]*,// p' file.csv
gives output
user1,user2,user3,user4
Explanation: -n turns off automatic printing, expressions meaning is as follow: 1st substitute globally " using empty string i.e. delete them, 2nd for line containing record2_data substitute (s) everything up to and including 3rd , with empty string i.e. delete it and print (p) such changed line.
(tested in GNU sed 4.2.2)
awk -F',' '
/record2_data2/{
for(i=4;i<=NF;i++) o=sprintf("%s%s,",o,$i);
gsub(/"|,$/,"",o);
print o
}' file.csv
user1,user2,user3,user4
This might work for you (GNU sed):
sed -E '/record2_data/!d;s/"([^"]*)"(,)?/\1\2/4g;s///g' file
Delete all records except for that containing record2_data.
Remove double quotes from the fourth field onward.
Remove any double quoted fields.

Writing the output of a command to specific columns of a csv file, unix

I wanted to write the output of command to specific columns (3rd and 5th) of the csv file.
#!/bin/bash
echo -e "Value,1\nCount,1" >> file.csv
echo "Header1,Header2,Path,Header4,Value,Header6" >> file.csv
sed 'y/ /,/' input.csv >> file.csv
input.csv in the above snippet will look something like this
1234567890 /training/folder
0325435287 /training/newfolder
Current output of file.csv
Value,1
Count,1
Header1,Header2,Path,Header4,Value,Header6
1234567890,/training/folder
0325435287,/training/newfolder
Expected Output of file.csv
Value,1
Count,1
Header1,Header2,Path,Header4,Value,Header6
,,/training/folder,,1234567890,
,,/training/newfolder,,0325435287,
All the operations can be done in a single awk:
awk -v OFS=, -v pre="Value,1\nCount,1" -v hdr="Header1,Header2,Path,Header4,Value,Header6" '
BEGIN {print pre; print hdr}
{print "", "", $1, "", $2, ""}
' input.csv
Value,1
Count,1
Header1,Header2,Path,Header4,Value,Header6
,,i1234567890,,/training/folder,
,,0325435287,,/training/newfolder,
With sed you could try following code. Which is using sed's capability of back reference.
sed -E 's/(^[^ ]*) +(.*$)/,,\2,,\1,/' Input_file
Explanation: Using -E option of sed to enable ERE(extended regular expressions) first. Then in main program using s option to perform substitution operation. In 1st part of substitution creating 2 back references(capability to catch values by using regex and keep them in temp buffer memory to be used later on while substituting it with in 2nd part of substitution). In 2nd part of substitution substituting whole line with 2 commas followed by 2nd capturing group\2 followed by 2 commas followed by 1st capturing group \1 following by ,.
You can use awk instead of sed
cat input.csv | awk '{print ",," $1 "," $2 ","}' >> file.csv
awk can process a stdin input by line to line. It implements a print function and each word is processed as a argument (in your case, $1 and $2). In the above example, I added ,, and , as an inline argument.
You can trivially add empty columns as part of your sed script.
sed 'y/ /,/;s/,/,,/;s/^/,,/;s/$/,/' input.csv >> file.csv
This replaces the first comma with two, then adds two up front and one at the end.
Your expected output does not look like valid CSV, though. This is also brittle in that it will fail for any file names which contain a space or a comma.

How to trim strings in a file

I have a file like this:
10.123.214.214:445
124.4235.123:443
124.34352.124.1:80
2142354.1341:80
12435.12412:70
Is there a way I can print everything before the :? I am thinking awk or sed would be the best way, but I am not sure how to come up with the right command
Expected output:
10.123.214.214
124.4235.123
124.34352.124.1
2142354.1341
12435.12412
$ cut -d: -f1 file
10.123.214.214
124.4235.123
124.34352.124.1
2142354.1341
12435.12412
It's the only thing cut exists to do - don't take away it's raison d'ĂȘtre :-).
In AWK, as follows:
BEGIN { FS=":" }
{ print $1 }
This makes the colon your Field Separator, so the first word ($1) is everything before the first colon on the line.
Another awk proposal.
awk -F: '{print $1}' file
10.123.214.214
124.4235.123
124.34352.124.1
2142354.1341
12435.12412
Just for completeness, this is the sed equivalent to keep the first column of a : delimited file:
sed 's/:.*//g' file1

Concatenating characters on each field of CSV file

I am dealing with a CSV file which has the following form:
Dates;A;B;C;D;E
"1999-01-04";1391.12;3034.53;66.515625;86.2;441.39
"1999-01-05";1404.86;3072.41;66.3125;86.17;440.63
"1999-01-06";1435.12;3156.59;66.4375;86.32;441
Since the BLAS routine I need to implement on such data takes double-floats only, I guess the easiest way is to concatenate d0 at the end of each field, so that each line looks like:
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
In pseudo-code, that would be:
For every line except the first line
For every field except the first field
Substitute ; with d0; and Substitute newline with d0 newline
My imagination suggests me it should be something like
cat file.csv | awk -F; 'NR>1 & NF>1'{print line} | sed 's/;/d0\n/g' | sed 's/\n/d0\n/g'
Any input?
Could use this sed
sed '1!{s/\(;[^;]*\)/\1d0/g}' file
Skips the first line then replaces each field beginning with ;(skipping the first) with itself and d0.
Output
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
I would say:
$ awk 'BEGIN{FS=OFS=";"} NR>1 {for (i=2;i<=NF;i++) $i=$i"d0"} 1' file
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
That is, set the field separator to ;. Starting on line 2, loop through all the fields from the 2nd one appending d0. Then, use 1 to print the line.
Your data format looks a bit weird. Enclosing the first column in double quotes makes me think that it can contain the delimiter, the semicolon, itself. However, I don't know the application which produces that data but if this is the case, then you can use the following GNU awk command:
awk 'NR>1{for(i=2;i<=NF;i++){$i=$i"d0"}}1' OFS=\; FPAT='("[^"]+")|([^;]+)' file
The key here is the FPAT variable. Using it use are able to define how a field can look like instead of being limited to specify a set of field delimiters.
big-prices.csv
Dates;A;B;C;D;E
"1999-01-04";1391.12;3034.53;66.515625;86.2;441.39
"1999-01-05";1404.86;3072.41;66.3125;86.17;440.63
"1999-01-06";1435.12;3156.59;66.4375;86.32;441
preprocess script
head -n 1 big-prices.csv 1>output.txt; \
tail -n +2 big-prices.csv | \
sed 's/;/d0;/g' | \
sed 's/$/d0/g' | \
sed 's/"d0/"/g' 1>>output.txt;
output.txt
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
note: would have to make minor modification to second sed if file has trailing whitespaces at end of lines..
Using awk
Input
$ cat file
Dates;A;B;C;D;E
"1999-01-04";1391.12;3034.53;66.515625;86.2;441.39
"1999-01-05";1404.86;3072.41;66.3125;86.17;440.63
"1999-01-06";1435.12;3156.59;66.4375;86.32;441
gsub (any awk)
$ awk 'FNR>1{ gsub(/;[^;]*/,"&d0")}1' file
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
gensub (gawk)
$ awk 'FNR>1{ print gensub(/(;[^;]*)/,"\\1d0","g"); next }1' file
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0

Edit data removing line breaks and putting everything in a row

Hi I'm new in shell scripting and I have been unable to do this:
My data looks like this (much bigger actually):
>SampleName_ZN189A
01000001000000000000100011100000000111000000001000
00110000100000000000010000000000001100000010000000
00110000000000001110000010010011111000000100010000
00000110000001000000010100000000010000001000001110
0011
>SampleName_ZN189B
00110000001101000001011100000000000000000000010001
00010000000000000010010000000000100100000001000000
00000000000000000000000010000000000010111010000000
01000110000000110000001010010000001111110101000000
1100
Note: After every 50 characters there is a line break, but sometimes less when the data finishes and there's a new sample name
I would like that after every 50 characters, the line break would be removed, so my data would look like this:
>SampleName_ZN189A
0100000100000000000010001110000000011100000000100000110000100000000000010000000000001100000010000000...
>SampleName_ZN189B
0011000000110100000101110000000000000000000001000100010000000000000010010000000000100100000001000000...
I tried using tr but I got an error:
tr '\n' '' < my_file
tr: empty string2
Thanks in advance
tr with "-d" deletes specified character
$ cat input.txt
00110000001101000001011100000000000000000000010001
00010000000000000010010000000000100100000001000000
00000000000000000000000010000000000010111010000000
01000110000000110000001010010000001111110101000000
1100
$ cat input.txt | tr -d "\n"
001100000011010000010111000000000000000000000100010001000000000000001001000000000010010000000100000000000000000000000000000010000000000010111010000000010001100000001100000010100100000011111101010000001100
You can use this awk:
awk '/^ *>/{if (s) print s; print; s="";next} {s=s $0;next} END {print s}' file
>SampleName_ZN189A
010000010000000000001000111000000001110000000010000011000010000000000001000000000000110000001000000000110000000000001110000010010011111000000100010000000001100000010000000101000000000100000010000011100011
>SampleName_ZN189B
001100000011010000010111000000000000000000000100010001000000000000001001000000000010010000000100000000000000000000000000000010000000000010111010000000010001100000001100000010100100000011111101010000001100
Using awk
awk '/>/{print (NR==1)?$0:RS $0;next}{printf $0}' file
if you don't care of the result which has additional new line on first line, here is shorter one
awk '{printf (/>/?RS $0 RS:$0)}' file
This might work for you (GNU sed):
sed '/^\s*>/!{H;$!d};x;s/\n\s*//2gp;x;h;d' file
Build up the record in the hold space and when encountering the start of the next record or the end-of-file remove the newlines and print out.
you can use this sed,
sed '/^>Sample/!{ :loop; N; /\n>Sample/{n}; s/\n//; b loop; }' file.txt
Try this
cat SampleName_ZN189A | tr -d '\r'
# tr -d deletes the given/specified character from the input
Using simple awk, Same will be achievable.
awk 'BEGIN{ORS=""} {print}' SampleName_ZN189A #Output doesn't contains an carriage return
at the end, If u want an line break at the end this works.
awk 'BEGIN{ORS=""} {print}END{print "\r"}' SampleName_ZN189A
# select the correct line break charachter (i.e) \r (or) \n (\r\n) depends upon the file format.

Resources