Bash + sed/awk/cut to delete nth character - bash

I trying to delete 6,7 and 8th character for each line.
Below is the file containing text format.
Actual output..
#cat test
18:40:12,172.16.70.217,UP
18:42:15,172.16.70.218,DOWN
Expecting below, after formatting.
#cat test
18:40,172.16.70.217,UP
18:42,172.16.70.218,DOWN
Even I tried with below , no luck
#awk -F ":" '{print $1":"$2","$3}' test
18:40,12,172.16.70.217,UP
#sed 's/^\(.\{7\}\).\(.*\)/\1\2/' test { Here I can remove only one character }
18:40:1,172.16.70.217,UP
Even with cut also failed
#cut -d ":" -f1,2,3 test
18:40:12,172.16.70.217,UP
Need to delete character in each line like 6th , 7th , 8th
Suggestion please

With GNU cut you can use the --complement switch to remove characters 6 to 8:
cut --complement -c6-8 file
Otherwise, you can just select the rest of the characters yourself:
cut -c1-5,9- file
i.e. characters 1 to 5, then 9 to the end of each line.
With awk you could use substrings:
awk '{ print substr($0, 1, 5) substr($0, 9) }' file
Or you could write a regular expression, but the result will be more complex.
For example, to remove the last three characters from the first comma-separated field:
awk -F, -v OFS=, '{ sub(/...$/, "", $1) } 1' file
Or, using sed with a capture group:
sed -E 's/(.{5}).{3}/\1/' file
Capture the first 5 characters and use them in the replacement, dropping the next 3.

it's a structured text, why count the chars if you can describe them?
$ awk '{sub(":..,",",")}1' file
18:40,172.16.70.217,UP
18:42,172.16.70.218,DOWN
remove the seconds.

The solutions below are generic and assume no knowledge of any format. They just delete character 6,7 and 8 of any line.
sed:
sed 's/.//8;s/.//7;s/.//6' <file> # from high to low
sed 's/.//6;s/.//6;s/.//6' <file> # from low to high (subtract 1)
sed 's/\(.....\).../\1/' <file>
sed 's/\(.{5}\).../\1/' <file>
s/BRE/replacement/n :: substitute nth occurrence of BRE with replacement
awk:
awk 'BEGIN{OFS=FS=""}{$6=$7=$8="";print $0}' <file>
awk -F "" '{OFS=$6=$7=$8="";print}' <file>
awk -F "" '{OFS=$6=$7=$8=""}1' <file>
This is 3 times the same, removing the field separator FS let awk assume a field to be a character. We empty field 6,7 and 8, and reprint the line with an output field separator OFS which is empty.
cut:
cut -c -5,9- <file>
cut --complement -c 6-8 <file>

Just for fun, perl, where you can assign to a substring
perl -pe 'substr($_,5,3)=""' file

With awk :
echo "18:40:12,172.16.70.217,UP" | awk '{ $0 = ( substr($0,1,5) substr($0,9) ) ; print $0}'
Regards!

If you are running on bash, you can use the string manipulation functionality of it instead of having to call awk, sed, cut or whatever binary:
while read STRING
do
echo ${STRING:0:5}${STRING:9}
done < myfile.txt
${STRING:0:5} represents the first five characters of your string, ${STRING:9} represents the 9th character and all remaining characters until the end of the line. This way you cut out characters 6,7 and 8 ...

Related

Count number of Special Character in Unix Shell

I have a delimited file that is separated by octal \036 or Hexadecimal value 1e.
I need to count the number of delimiters on each line using a bash shell script.
I was trying to use awk, not sure if this is the best way.
Sample Input (| is a representation of \036)
Example|Running|123|
Expected output:
3
awk -F'|' '{print NF-1}' file
Change | to whatever separator you like. If your file can have empty lines then you need to tweak it to:
awk -F'|' '{print (NF ? NF-1 : 0)}' file
You can try
awk '{print gsub(/\|/,"")}'
Simply try
awk -F"|" '{print substr($3,length($3))}' OFS="|" Input_file
Explanation: Making field separator -F as | and then printing the 3rd column by doing $3 only as per your need. Then setting OFS(output field separator) to |. Finally mentioning Input_file name here.
This will work as far as I know
echo "Example|Running|123|" | tr -cd '|' | wc -c
Output
3
This should work for you:
awk -F '\036' '{print NF-1}' file
3
-F '\036' sets input field delimiter as octal value 036
Awk may not be the best tool for this. Gnu grep has a cool -o option that prints each matching pattern on a separate line. You can then count how many matching lines are generated for each input line, and that's the count of your delimiters. E.g. (where ^^ in the file is actually hex 1e)
$ cat -v i
a^^b^^c
d^^e^^f^^g
$ grep -n -o $'\x1e' i | uniq -c
2 1:
3 2:
if you remove the uniq -c you can see how it's working. You'll get "1" printed twice because there are two matching patterns on the first line. Or try it with some regular ascii characters and it becomes clearer what the -o and -n options are doing.
If you want to print the line number followed by the field count for that line, I'd do something like:
$grep -n -o $'\x1e' i | tr -d ':' | uniq -c | awk '{print $2 " " $1}'
1 2
2 3
This assumes that every line in the file contains at least one delimiter. If that's not the case, here's another approach that's probably faster too:
$ tr -d -c $'\x1e\n' < i | awk '{print length}'
2
3
0
0
0
This uses tr to delete (-d) all characters that are not (-c) 1e or \n. It then pipes that stream of data to awk which just counts how many characters are left on each line. If you want the line number, add " | cat -n" to the end.

print 1st string of a line if last 5 strings match input

I have a requirement to print the first string of a line if last 5 strings match specific input.
Example: Specified input is 2
India;1;2;3;4;5;6
Japan;1;2;2;2;2;2
China;2;2;2;2
England;2;2;2;2;2
Expected Output:
Japan
England
As you can see, China is excluded as it doesn't meet the requirement (last 5 digits have to be matched with the input).
grep ';2;2;2;2;2$' file | cut -d';' -f1
$ in a regex stands for "end of line", so grep will print all the lines that end in the given string
-d';' tells cut to delimit columns by semicolons
-f1 outputs the first column
You could use awk:
awk -F';' -v v="2" -v count=5 '
{
c=0;
for(i=2;i<=NF;i++){
if($i == v) c++
if(c>=count){print $1;next}
}
}' file
where
v is the value to match
count is the maximum number of value to print the wanted string
the for loop is parsing all fields delimited with a ; in order to find a match
This script doesn't need the 5 values 2 to be consecutive.
With sed:
sed -n 's/^\([^;]*\).*;2;2;2;2;2$/\1/p' file
It captures and output non ; first characters in lines ending with ;2;2;2;2;2
It can be shortened with GNU sed to:
sed -nE 's/^([^;]*).*(;2){5}$/\1/p' file
awk -F\; '/;2;2;2;2;2$/{print $1}' file
Japan
England

In shell script, can I specify more than one output delimiter in cut?

I have a file with the following data
abcdefghijklmnopqrst
I need an output using command cut as the following
bcd|klm.no|pq.rst
I need to include both the delimiters | and . in my output using cut.
I've got the output with one delimiter using this code
cat file.txt |cut -c2-4,11-13,14-15,16-17,18-20 --output-delimiter='|'
as
bcd|klm|no|pq|rst
Can I get 2 output delimiters by changing something in the command?
cut supports a single delimiter at a time, but you can easily replace this with a simple sed or Awk script.
awk '{ print substr($0, 2, 4) "|" substr($0, 11, 13) "."
substr($0, 14, 15) "|" substr($0, 16, 17) "."
substr($0,18, 20) }' file.txt
Coincidentally the cat is useless.
With GNU sed:
$ sed -r 's/.(.{3}).{6}(.{3})(.{2})(.{2})/\1|\2.\3|\4./' file.txt
bcd|klm.no|pq.rst
Two cuts (suggested by Gilez's comment to OP):
cut -c2-4,11-15,16-20 --output-delimiter='|' file.txt | \
cut -c1-7,8-13,14-15 --output-delimiter='.'
Output:
bcd|klm.no|pqr.st
Given such regular input, tr can also "answer" this with no pipe:
tr -s 'a-t' 'b-d|k-m.no|p-r.st' < file.txt
Note: tr can't insert, so it sounds impossible. Generally is is impossible, but the cheat is to include the answer in SET2, and squeeze out the rest.

Concatenating characters on each field of CSV file

I am dealing with a CSV file which has the following form:
Dates;A;B;C;D;E
"1999-01-04";1391.12;3034.53;66.515625;86.2;441.39
"1999-01-05";1404.86;3072.41;66.3125;86.17;440.63
"1999-01-06";1435.12;3156.59;66.4375;86.32;441
Since the BLAS routine I need to implement on such data takes double-floats only, I guess the easiest way is to concatenate d0 at the end of each field, so that each line looks like:
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
In pseudo-code, that would be:
For every line except the first line
For every field except the first field
Substitute ; with d0; and Substitute newline with d0 newline
My imagination suggests me it should be something like
cat file.csv | awk -F; 'NR>1 & NF>1'{print line} | sed 's/;/d0\n/g' | sed 's/\n/d0\n/g'
Any input?
Could use this sed
sed '1!{s/\(;[^;]*\)/\1d0/g}' file
Skips the first line then replaces each field beginning with ;(skipping the first) with itself and d0.
Output
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
I would say:
$ awk 'BEGIN{FS=OFS=";"} NR>1 {for (i=2;i<=NF;i++) $i=$i"d0"} 1' file
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
That is, set the field separator to ;. Starting on line 2, loop through all the fields from the 2nd one appending d0. Then, use 1 to print the line.
Your data format looks a bit weird. Enclosing the first column in double quotes makes me think that it can contain the delimiter, the semicolon, itself. However, I don't know the application which produces that data but if this is the case, then you can use the following GNU awk command:
awk 'NR>1{for(i=2;i<=NF;i++){$i=$i"d0"}}1' OFS=\; FPAT='("[^"]+")|([^;]+)' file
The key here is the FPAT variable. Using it use are able to define how a field can look like instead of being limited to specify a set of field delimiters.
big-prices.csv
Dates;A;B;C;D;E
"1999-01-04";1391.12;3034.53;66.515625;86.2;441.39
"1999-01-05";1404.86;3072.41;66.3125;86.17;440.63
"1999-01-06";1435.12;3156.59;66.4375;86.32;441
preprocess script
head -n 1 big-prices.csv 1>output.txt; \
tail -n +2 big-prices.csv | \
sed 's/;/d0;/g' | \
sed 's/$/d0/g' | \
sed 's/"d0/"/g' 1>>output.txt;
output.txt
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
note: would have to make minor modification to second sed if file has trailing whitespaces at end of lines..
Using awk
Input
$ cat file
Dates;A;B;C;D;E
"1999-01-04";1391.12;3034.53;66.515625;86.2;441.39
"1999-01-05";1404.86;3072.41;66.3125;86.17;440.63
"1999-01-06";1435.12;3156.59;66.4375;86.32;441
gsub (any awk)
$ awk 'FNR>1{ gsub(/;[^;]*/,"&d0")}1' file
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
gensub (gawk)
$ awk 'FNR>1{ print gensub(/(;[^;]*)/,"\\1d0","g"); next }1' file
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0

Bash Text file formatting

I have some files with the following format:
555584280113;01-04-2013 00:00:11;0,22;889;30008;1501;sms;/xxx/yyy/zzz
552185022741;01-04-2013 00:00:13;0,22;889;30008;1501;sms;/xxx/yyy/zzz
5511965271852;01-04-2013 00:00:14;0,22;889;30008;1501;sms;/xxx/yyy/zzz
5511980644500;01-04-2013 00:00:22;0,22;889;30008;1501;sms;/xxx/yyy/zzz
553186398559;01-04-2013 00:00:31;0,22;889;30008;1501;sms;/xxx/yyy/zzz
555584280113;01-04-2013 00:00:41;0,22;889;30008;1501;sms;/xxx/yyy/zzz
558487839822;01-04-2013 00:01:09;0,22;889;30008;1501;sms;/xxx/yyy/zzz
I need to have them with a sequence of 10 digits long at the beginning, removed the prefix 55 on the second column (which I have done with a simple sed 's/^55//g') and reformat the date to look like this:
0000000001;555584280113;20130401 00:00:11;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000002;552185022741;20130401 00:00:13;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000003;5511965271852;20130401 00:00:14;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000004;5511980644500;20130401 00:00:22;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000005;553186398559;20130401 00:00:31;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000006;555584280113;01-04-2013 00:00:41;0,22;889;30008;1501;sms;/xxx/yyy/zzz
I have the date part in a separate way:
cat file.txt | cut -d\; -f2 | awk '{print $1}' |awk -v OFS="-" -F"-" '{print $3$2$1}'
And it works, but I don't know how to put all of them together, the sequence + sed for the prefix + change the date format. The sequence part I'm not even sure how to do it.
Thanks for the help.
awk is one of the best tool out there used for text parsing and formatting. Here is one way of meeting your requirements:
awk '
BEGIN { FS = OFS = ";" }
{
printf "%010d;", NR
$1 = substr($1,3)
split($2, tmp, /[- ]/)
$2=tmp[3]tmp[2]tmp[1]" "tmp[4]
}1' file
We set the input and output field separator to ;
We use printf to format your first column number requirement
We use substr function to remove the first two characters of column 1
We use split function to format the time
Using 1 we print rest of the statement as is.
Output:
0000000001;5584280113;20130401 00:00:11;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000002;2185022741;20130401 00:00:13;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000003;11965271852;20130401 00:00:14;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000004;11980644500;20130401 00:00:22;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000005;3186398559;20130401 00:00:31;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000006;5584280113;20130401 00:00:41;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000007;8487839822;20130401 00:01:09;0,22;889;30008;1501;sms;/xxx/yyy/zzz
If the name of the input file is input, then the following command removes the 55, adds a 10-digit line number, and rearranges the date. With GNU sed:
nl -nrz -w10 -s\; input | sed -r 's/55//; s/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3\2\1/'
If one is using Mac OSX (or another OS without GNU sed), then a slight change is required:
nl -nrz -w10 -s\; input | sed -E 's/55//; s/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3\2\1/'
Sample output:
0000000001;5584280113;20130401 00:00:11;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000002;2185022741;20130401 00:00:13;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000003;11965271852;20130401 00:00:14;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000004;11980644500;20130401 00:00:22;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000005;3186398559;20130401 00:00:31;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000006;5584280113;20130401 00:00:41;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000007;8487839822;20130401 00:01:09;0,22;889;30008;1501;sms;/xxx/yyy/zzz
How it works: nl is a handy *nix utility for adding line numbers. -w10 tells nl that we want 10 digit line numbers. -nrz tells nl to pad the line numbers with zeros, and -s\; tells nl to add a semicolon after the line number. (We have to escape the semicolon so that the shell ignores it.)
The remaining changes are handled by sed. The sed command s/55// removes the first occurrence of 55. The rearrangement of the date is handled by s/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3\2\1/.
You could actually use a Bash loop to do this.
i=0
while read f1 f2; do
((++i))
IFS=\; read n d <<< $f1
d=${d:6:4}${d:3:2}${d:0:2}
printf "%010d;%d;%d %s\n" $i $n $d $f2
done < file.txt

Resources