In shell script, can I specify more than one output delimiter in cut? - bash

I have a file with the following data
abcdefghijklmnopqrst
I need an output using command cut as the following
bcd|klm.no|pq.rst
I need to include both the delimiters | and . in my output using cut.
I've got the output with one delimiter using this code
cat file.txt |cut -c2-4,11-13,14-15,16-17,18-20 --output-delimiter='|'
as
bcd|klm|no|pq|rst
Can I get 2 output delimiters by changing something in the command?

cut supports a single delimiter at a time, but you can easily replace this with a simple sed or Awk script.
awk '{ print substr($0, 2, 4) "|" substr($0, 11, 13) "."
substr($0, 14, 15) "|" substr($0, 16, 17) "."
substr($0,18, 20) }' file.txt
Coincidentally the cat is useless.

With GNU sed:
$ sed -r 's/.(.{3}).{6}(.{3})(.{2})(.{2})/\1|\2.\3|\4./' file.txt
bcd|klm.no|pq.rst

Two cuts (suggested by Gilez's comment to OP):
cut -c2-4,11-15,16-20 --output-delimiter='|' file.txt | \
cut -c1-7,8-13,14-15 --output-delimiter='.'
Output:
bcd|klm.no|pqr.st
Given such regular input, tr can also "answer" this with no pipe:
tr -s 'a-t' 'b-d|k-m.no|p-r.st' < file.txt
Note: tr can't insert, so it sounds impossible. Generally is is impossible, but the cheat is to include the answer in SET2, and squeeze out the rest.

Related

Getting last X fields from a specific line in a CSV file using bash

I'm trying to get as bash variable list of users which are in my csv file. Problem is that number of users is random and can be from 1-5.
Example CSV file:
"record1_data1","record1_data2","record1_data3","user1","user2"
"record2_data1","record2_data2","record2_data3","user1","user2","user3","user4"
"record3_data1","record3_data2","record3_data3","user1"
I would like to get something like
list_of_users="cat file.csv | grep "record2_data2" | <something> "
echo $list_of_users
user1,user2,user3,user4
I'm trying this:
cat file.csv | grep "record2_data2" | awk -F, -v OFS=',' '{print $4,$5,$6,$7,$8 }' | sed 's/"//g'
My result is:
user2,user3,user4,,
Question:
How to remove all "," from the end of my result? Sometimes it is just one but sometimes can be user1,,,,
Can I do it in better way? Users always starts after 3rd column in my file.
This will do what your code seems to be trying to do (print the users for a given string record2_data2 which only exists in the 2nd field):
$ awk -F',' '{gsub(/"/,"")} $2=="record2_data2"{sub(/([^,]*,){3}/,""); print}' file.csv
user1,user2,user3,user4
but I don't see how that's related to your question subject of Getting last X records from CSV file using bash so idk if it's what you really want or not.
Better to use a bash array, and join it into a CSV string when needed:
#!/usr/bin/env bash
readarray -t listofusers < <(cut -d, -f4- file.csv | tr -d '"' | tr ',' $'\n' | sort -u))
IFS=,
printf "%s\n" "${listofusers[*]}"
cut -d, -f4- file.csv | tr -d '"' | tr ',' $'\n' | sort -u is the important bit - it first only prints out the fourth and following fields of the CSV input file, removes quotes, turns commas into newlines, and then sorts the resulting usernames, removing duplicates. That output is then read into an array with the readarray builtin, and you can manipulate it and the individual elements however you need.
GNU sed solution, let file.csv content be
"record1_data1","record1_data2","record1_data3","user1","user2"
"record2_data1","record2_data2","record2_data3","user1","user2","user3","user4"
"record3_data1","record3_data2","record3_data3","user1"
then
sed -n -e 's/"//g' -e '/record2_data/ s/[^,]*,[^,]*,[^,]*,// p' file.csv
gives output
user1,user2,user3,user4
Explanation: -n turns off automatic printing, expressions meaning is as follow: 1st substitute globally " using empty string i.e. delete them, 2nd for line containing record2_data substitute (s) everything up to and including 3rd , with empty string i.e. delete it and print (p) such changed line.
(tested in GNU sed 4.2.2)
awk -F',' '
/record2_data2/{
for(i=4;i<=NF;i++) o=sprintf("%s%s,",o,$i);
gsub(/"|,$/,"",o);
print o
}' file.csv
user1,user2,user3,user4
This might work for you (GNU sed):
sed -E '/record2_data/!d;s/"([^"]*)"(,)?/\1\2/4g;s///g' file
Delete all records except for that containing record2_data.
Remove double quotes from the fourth field onward.
Remove any double quoted fields.

Writing the output of a command to specific columns of a csv file, unix

I wanted to write the output of command to specific columns (3rd and 5th) of the csv file.
#!/bin/bash
echo -e "Value,1\nCount,1" >> file.csv
echo "Header1,Header2,Path,Header4,Value,Header6" >> file.csv
sed 'y/ /,/' input.csv >> file.csv
input.csv in the above snippet will look something like this
1234567890 /training/folder
0325435287 /training/newfolder
Current output of file.csv
Value,1
Count,1
Header1,Header2,Path,Header4,Value,Header6
1234567890,/training/folder
0325435287,/training/newfolder
Expected Output of file.csv
Value,1
Count,1
Header1,Header2,Path,Header4,Value,Header6
,,/training/folder,,1234567890,
,,/training/newfolder,,0325435287,
All the operations can be done in a single awk:
awk -v OFS=, -v pre="Value,1\nCount,1" -v hdr="Header1,Header2,Path,Header4,Value,Header6" '
BEGIN {print pre; print hdr}
{print "", "", $1, "", $2, ""}
' input.csv
Value,1
Count,1
Header1,Header2,Path,Header4,Value,Header6
,,i1234567890,,/training/folder,
,,0325435287,,/training/newfolder,
With sed you could try following code. Which is using sed's capability of back reference.
sed -E 's/(^[^ ]*) +(.*$)/,,\2,,\1,/' Input_file
Explanation: Using -E option of sed to enable ERE(extended regular expressions) first. Then in main program using s option to perform substitution operation. In 1st part of substitution creating 2 back references(capability to catch values by using regex and keep them in temp buffer memory to be used later on while substituting it with in 2nd part of substitution). In 2nd part of substitution substituting whole line with 2 commas followed by 2nd capturing group\2 followed by 2 commas followed by 1st capturing group \1 following by ,.
You can use awk instead of sed
cat input.csv | awk '{print ",," $1 "," $2 ","}' >> file.csv
awk can process a stdin input by line to line. It implements a print function and each word is processed as a argument (in your case, $1 and $2). In the above example, I added ,, and , as an inline argument.
You can trivially add empty columns as part of your sed script.
sed 'y/ /,/;s/,/,,/;s/^/,,/;s/$/,/' input.csv >> file.csv
This replaces the first comma with two, then adds two up front and one at the end.
Your expected output does not look like valid CSV, though. This is also brittle in that it will fail for any file names which contain a space or a comma.

How to do text processing using awk to cut last field in a line?

I am having this scenario and need if I can improvise the awk output.
cat example.txt
"id": "/subscriptions/fbfa3437-c63c-4ed7-b9d3-fe595221950d/resourceGroups/rg-ooty/providers/Microsoft.Compute/virtualMachines/fb11b768-4d9f-4e83-b7dc-ee677f496fc9",
"id": "/subscriptions/fbfa3437-c63c-4ed7-b9d3-fe595221950d/resourceGroups/rg-ooty/providers/Microsoft.Compute/virtualMachines/fbee83e8-a84a-4b22-8197-fc9cc924801f",
"id": "/subscriptions/fbfa3437-c63c-4ed7-b9d3-fe595221950d/resourceGroups/rg-ooty/providers/Microsoft.Compute/virtualMachines/fc224f83-57f4-41eb-aee3-78f18d055704",
I am looking to cut the pattern after /virtualMachines/
Hence, used the below awk command to get the output.
cat example.txt | awk '{print $2}' | awk -F"/" '{print $(NF)}' | awk -F'",' '{print $1}'
fb11b768-4d9f-4e83-b7dc-ee677f496fc9
fbee83e8-a84a-4b22-8197-fc9cc924801f
fc224f83-57f4-41eb-aee3-78f18d055704
Is there any way I can use some options like 'getline' or multiple awk options in single awk execution or better ways to improve the command to get the output?
Please suggest.
Use " and / as field separators and print second last field:
awk -F '["/]' '{print $(NF-1)}' file
Output:
fb11b768-4d9f-4e83-b7dc-ee677f496fc9
fbee83e8-a84a-4b22-8197-fc9cc924801f
fc224f83-57f4-41eb-aee3-78f18d055704
If the spacing of example.txt is as consistent as it seems, then it's simpler to use cut with the -characters count option:
cut -c 127-162 example.txt
Output:
fb11b768-4d9f-4e83-b7dc-ee677f496fc9
fbee83e8-a84a-4b22-8197-fc9cc924801f
fc224f83-57f4-41eb-aee3-78f18d055704
You could also use sed for this:
sed 's#.*/\([^/]*\)",#\1#' example.txt
Matches anything .* forwardslash / then captures \( any number of non-forwardslash characters [^/]*, ends the capture \) followed by a quote & comma to end ",, and replaces this with the captured group (anything between the forwardslash and the ", at the end.

Bash + sed/awk/cut to delete nth character

I trying to delete 6,7 and 8th character for each line.
Below is the file containing text format.
Actual output..
#cat test
18:40:12,172.16.70.217,UP
18:42:15,172.16.70.218,DOWN
Expecting below, after formatting.
#cat test
18:40,172.16.70.217,UP
18:42,172.16.70.218,DOWN
Even I tried with below , no luck
#awk -F ":" '{print $1":"$2","$3}' test
18:40,12,172.16.70.217,UP
#sed 's/^\(.\{7\}\).\(.*\)/\1\2/' test { Here I can remove only one character }
18:40:1,172.16.70.217,UP
Even with cut also failed
#cut -d ":" -f1,2,3 test
18:40:12,172.16.70.217,UP
Need to delete character in each line like 6th , 7th , 8th
Suggestion please
With GNU cut you can use the --complement switch to remove characters 6 to 8:
cut --complement -c6-8 file
Otherwise, you can just select the rest of the characters yourself:
cut -c1-5,9- file
i.e. characters 1 to 5, then 9 to the end of each line.
With awk you could use substrings:
awk '{ print substr($0, 1, 5) substr($0, 9) }' file
Or you could write a regular expression, but the result will be more complex.
For example, to remove the last three characters from the first comma-separated field:
awk -F, -v OFS=, '{ sub(/...$/, "", $1) } 1' file
Or, using sed with a capture group:
sed -E 's/(.{5}).{3}/\1/' file
Capture the first 5 characters and use them in the replacement, dropping the next 3.
it's a structured text, why count the chars if you can describe them?
$ awk '{sub(":..,",",")}1' file
18:40,172.16.70.217,UP
18:42,172.16.70.218,DOWN
remove the seconds.
The solutions below are generic and assume no knowledge of any format. They just delete character 6,7 and 8 of any line.
sed:
sed 's/.//8;s/.//7;s/.//6' <file> # from high to low
sed 's/.//6;s/.//6;s/.//6' <file> # from low to high (subtract 1)
sed 's/\(.....\).../\1/' <file>
sed 's/\(.{5}\).../\1/' <file>
s/BRE/replacement/n :: substitute nth occurrence of BRE with replacement
awk:
awk 'BEGIN{OFS=FS=""}{$6=$7=$8="";print $0}' <file>
awk -F "" '{OFS=$6=$7=$8="";print}' <file>
awk -F "" '{OFS=$6=$7=$8=""}1' <file>
This is 3 times the same, removing the field separator FS let awk assume a field to be a character. We empty field 6,7 and 8, and reprint the line with an output field separator OFS which is empty.
cut:
cut -c -5,9- <file>
cut --complement -c 6-8 <file>
Just for fun, perl, where you can assign to a substring
perl -pe 'substr($_,5,3)=""' file
With awk :
echo "18:40:12,172.16.70.217,UP" | awk '{ $0 = ( substr($0,1,5) substr($0,9) ) ; print $0}'
Regards!
If you are running on bash, you can use the string manipulation functionality of it instead of having to call awk, sed, cut or whatever binary:
while read STRING
do
echo ${STRING:0:5}${STRING:9}
done < myfile.txt
${STRING:0:5} represents the first five characters of your string, ${STRING:9} represents the 9th character and all remaining characters until the end of the line. This way you cut out characters 6,7 and 8 ...

Concatenating characters on each field of CSV file

I am dealing with a CSV file which has the following form:
Dates;A;B;C;D;E
"1999-01-04";1391.12;3034.53;66.515625;86.2;441.39
"1999-01-05";1404.86;3072.41;66.3125;86.17;440.63
"1999-01-06";1435.12;3156.59;66.4375;86.32;441
Since the BLAS routine I need to implement on such data takes double-floats only, I guess the easiest way is to concatenate d0 at the end of each field, so that each line looks like:
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
In pseudo-code, that would be:
For every line except the first line
For every field except the first field
Substitute ; with d0; and Substitute newline with d0 newline
My imagination suggests me it should be something like
cat file.csv | awk -F; 'NR>1 & NF>1'{print line} | sed 's/;/d0\n/g' | sed 's/\n/d0\n/g'
Any input?
Could use this sed
sed '1!{s/\(;[^;]*\)/\1d0/g}' file
Skips the first line then replaces each field beginning with ;(skipping the first) with itself and d0.
Output
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
I would say:
$ awk 'BEGIN{FS=OFS=";"} NR>1 {for (i=2;i<=NF;i++) $i=$i"d0"} 1' file
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
That is, set the field separator to ;. Starting on line 2, loop through all the fields from the 2nd one appending d0. Then, use 1 to print the line.
Your data format looks a bit weird. Enclosing the first column in double quotes makes me think that it can contain the delimiter, the semicolon, itself. However, I don't know the application which produces that data but if this is the case, then you can use the following GNU awk command:
awk 'NR>1{for(i=2;i<=NF;i++){$i=$i"d0"}}1' OFS=\; FPAT='("[^"]+")|([^;]+)' file
The key here is the FPAT variable. Using it use are able to define how a field can look like instead of being limited to specify a set of field delimiters.
big-prices.csv
Dates;A;B;C;D;E
"1999-01-04";1391.12;3034.53;66.515625;86.2;441.39
"1999-01-05";1404.86;3072.41;66.3125;86.17;440.63
"1999-01-06";1435.12;3156.59;66.4375;86.32;441
preprocess script
head -n 1 big-prices.csv 1>output.txt; \
tail -n +2 big-prices.csv | \
sed 's/;/d0;/g' | \
sed 's/$/d0/g' | \
sed 's/"d0/"/g' 1>>output.txt;
output.txt
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
note: would have to make minor modification to second sed if file has trailing whitespaces at end of lines..
Using awk
Input
$ cat file
Dates;A;B;C;D;E
"1999-01-04";1391.12;3034.53;66.515625;86.2;441.39
"1999-01-05";1404.86;3072.41;66.3125;86.17;440.63
"1999-01-06";1435.12;3156.59;66.4375;86.32;441
gsub (any awk)
$ awk 'FNR>1{ gsub(/;[^;]*/,"&d0")}1' file
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
gensub (gawk)
$ awk 'FNR>1{ print gensub(/(;[^;]*)/,"\\1d0","g"); next }1' file
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0

Resources