Allow only specific characters, else transfer null, in Unix shell

The characters allowed in the 2nd column are 0 to 9, A to Z, and the symbols "+" and "-". If the 2nd column contains only allowed characters, the complete record should be transferred; otherwise null should be transferred in the 2nd column.
Input
- 1|89+
- 2|-AB
- 3|XY*
- 4|PR%
Output
- 1|89+
- 2|-AB
- 3|<null>
- 4|<null>
grep -E '^[a-zA-Z0-9\+\-\|]+$' file > file1
But the above command discards the complete record when no match is found. I need all records: if a match is found the record should be transferred as is, otherwise null should be transferred in the 2nd column.

Use sed to replace everything after the pipe with nothing when the 2nd column consists of zero or more characters in the class of digits, letters, plus or minus, followed by at least one character not in that class. The pipe is left unescaped: in a basic regular expression a bare | is a literal pipe, whereas \| is alternation in GNU sed.
sed 's/|[0-9a-zA-Z+-]*[^0-9a-zA-Z+-].*$/|/' file

Using awk and character classes where supported:
$ awk 'BEGIN{FS=OFS="|"}$2~/[^[:alnum:]+-]/{$2=""}1' file
1|89+
2|-AB
3|
4|
Where not supported (such as mawk) use:
$ awk 'BEGIN{FS=OFS="|"}$2~/[^A-Za-z0-9+-]/{$2=""}1' file
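The same rule can also be stated as a whitelist, which is closer to the grep attempt in the question. This is a minimal sketch, assuming uppercase letters only (as the question words it) and using illustrative variable names `input` and `result`; it blanks the 2nd column unless every character in it is allowed.

```shell
# Sample records from the question.
input='1|89+
2|-AB
3|XY*
4|PR%'

# Keep every record; empty the 2nd field unless it is made up
# entirely of 0-9, A-Z, "+" and "-" (assumption: uppercase only).
result=$(printf '%s\n' "$input" |
  awk 'BEGIN { FS = OFS = "|" } $2 !~ /^[A-Z0-9+-]+$/ { $2 = "" } 1')

printf '%s\n' "$result"
```

With this form a 2nd column that is already empty stays empty, since it does not match the whitelist either.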

Related

Sed command to delete characters on specific location?

I have this sed command which adds 3 zeros to an id (this occurs only if the id is 13 characters long):
sed 's/^\(.\{14\}\)\([0-9]\{13\}[^0-9]\)/\1000\2/' file
My input looks like this:
A:AAAA:AA: :A:**0123456789ABC **:AAA:AAA : :AA: : :
And my output is this one:
A:AAAA:AA: :A:**0000123456789ABC **:AAA:AAA : :AA: : :
I want to get rid of the 3 whitespaces after the id number. I can't delete the entire column because I have different data in other records, so I want to delete the spaces only in the records/lines I expanded previously. So maybe I just need to add something to the existing command.
As you can see, there are other whitespaces in the record, but I just want to delete the ones next to the ID (the bold one).
I only found ways to delete entire columns, but I haven't been able to find a way to delete specific characters.
Just add three spaces after the closing \):
sed 's/^\(.\{14\}\)\([0-9]\{13\}[^0-9]\) /\1000\2/'
To make it work for your example, you also need to extend [0-9] to [0-9A-C].
You can use
sed 's/^\(.\{14\}\)\([[:alnum:]]\{13\}\)[[:space:]]*:/\1000\2:/' file
See the online demo:
#!/bin/bash
s='A:AAAA:AA: :A:0123456789ABC :AAA:AAA : :AA: : :'
sed 's/^\(.\{14\}\)\([[:alnum:]]\{13\}\)[[:space:]]*:/\1000\2:/' <<< "$s"
Output:
A:AAAA:AA: :A:0000123456789ABC:AAA:AAA : :AA: : :
Notes:
[[:alnum:]]\{13\} - matches 13 alphanumeric chars, not just digits
[[:space:]]*: matches zero or more whitespaces and a : (hence, the : must be added into the replacement pattern).
Since you are working with delimited fields, counting a fixed length from the start of the line will break as soon as one of the fields before the one you are working with changes in size.
Consider using awk instead and work solely with the 6th field. First strip out spaces, then check the length. If 13, add the leading 3 zeroes. Lastly print out the line.
$ awk -F ':' 'BEGIN { OFS=":"}{ gsub(" ", "", $6) };{if(length($6) == 13)$6="000"$6;print $0}' file.txt
A:AAAA:AA: :A:0000123456789ABC:AAA:AAA : :AA: : :
$
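A variation on the awk approach above, sketched under the assumption that only trailing blanks should go and that any space inside the id must be kept. The sample line uses three trailing spaces as described in the question; the names `line`, `id` and `result` are illustrative.

```shell
# Sample record with three blanks after the 13-character id.
line='A:AAAA:AA: :A:0123456789ABC   :AAA:AAA : :AA: : :'

result=$(printf '%s\n' "$line" |
  awk 'BEGIN { FS = OFS = ":" }
       {
         id = $6
         sub(/[ \t]+$/, "", id)                # strip trailing blanks only
         if (length(id) == 13) $6 = "000" id   # pad only the 13-character ids
         print
       }')

printf '%s\n' "$result"
```

Records whose 6th field is not 13 characters long after trimming are printed unchanged, trailing blanks included.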

Remove multiple file extensions when using gnu parallel and cat in bash

I have a csv file (separated by comma), which contains
file1a.extension.extension,file1b.extension.extension
file2a.extension.extension,file2b.extension.extension
Problem is, these files are named such as file.extension.extension
I'm trying to feed both columns to parallel and remove all extensions
I tried some variations of:
cat /home/filepairs.csv | sed 's/\..*//' | parallel --colsep ',' echo column 1 = {1}.extension.extension column 2 = {2}
Which I expected to output
column 1 = file1a.extension.extension column 2 = file1b
column 1 = file2a.extension.extension column 2 = file2b
But outputs:
column 1 = file1a.extension.extension column 2 =
column 1 = file2a.extension.extension column 2 =
The sed command is working but is feeding only column 1 to parallel
As currently written the sed only prints one name per line:
$ sed 's/\..*//' filepairs.csv
file1a
file2a
Where:
\. matches on first literal period (.)
.* matches rest of line (ie, everything after the first literal period to the end of the line)
// says to remove everything from the first literal period to the end of the line
I'm guessing what you really want is two names per line ... one sed idea:
$ sed 's/\.[^,]*//g' filepairs.csv
file1a,file1b
file2a,file2b
Where:
\. matches on first literal period (.)
[^,]* matches on everything up to a comma (or end of line)
//g says to remove the literal period, everything afterwards (up to a comma or end of line), and the g says to do it repeatedly (in this case the replacement occurs twice)
NOTE: I don't have parallel on my system so unable to test that portion of OP's code
Use --plus:
$ cat filepairs.csv | parallel --plus --colsep , echo {1..} {2..}
file1a file1b
file2a file2b
If the input is CSV:
$ cat filepairs.csv | parallel --plus --csv echo {1..} {2..}
file1a file1b
file2a file2b
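If the stripping itself is all that is needed, it can also be sketched in plain shell without sed or parallel. This swaps in a different technique (POSIX parameter expansion) and assumes the base names contain no periods; `pairs`, `first` and `second` are illustrative names.

```shell
# Sample rows from the question.
pairs='file1a.extension.extension,file1b.extension.extension
file2a.extension.extension,file2b.extension.extension'

result=$(printf '%s\n' "$pairs" |
  while IFS=, read -r first second; do
    # ${var%%.*} removes the longest suffix that starts at a period,
    # i.e. everything from the first period onwards.
    printf 'column 1 = %s column 2 = %s\n' "${first%%.*}" "${second%%.*}"
  done)

printf '%s\n' "$result"
```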

How to convert a line into camel case?

This picks all the text on a single line after a pattern match and converts it to camel case, using non-alphanumeric characters as separators and removing the spaces at the beginning and end of the resulting string. Two questions: (1) it does not replace runs of 2 consecutive non-alphanumeric chars, e.g. "2, " in the example below; how can that be fixed? (2) Is there a way to do everything with a single sed command instead of using grep, cut, sed and tr?
$ echo " hello
world
title: this is-the_test string with number 2, to-test CAMEL String
end! " | grep -o 'title:.*' | cut -f2 -d: | sed -r 's/([^[:alnum:]])([0-9a-zA-Z])/\U\2/g' | tr -d ' '
ThisIsTheTestStringWithNumber2,ToTestCAMELString
To answer your first question, change [^[:alnum:]] to [^[:alnum:]]+ to match one or more non-alnum chars.
You may combine all the commands into a GNU sed solution like
sed -En '/.*title: *(.*[[:alnum:]]).*/{s//\1/;s/([^[:alnum:]]+|^)([0-9a-zA-Z])/\U\2/gp}'
See the online demo
Details
-En - POSIX ERE syntax is on (E) and default line output is suppressed with n
/.*title: *(.*[[:alnum:]]).*/ - matches a line having title: capturing all after it up to the last alnum char into Group 1 and matching the rest of the line
{s//\1/;s/([^[:alnum:]]+|^)([0-9a-zA-Z])/\U\2/gp} - if the line is matched,
s//\1/ - remove all but Group 1 pattern (received above)
s/([^[:alnum:]]+|^)([0-9a-zA-Z])/\U\2/ - match and capture start of string or 1+ non-alnum chars into Group 1 (with ([^[:alnum:]]+|^)) and then capture an alnum char into Group 2 (with ([0-9a-zA-Z])) and replace with uppercased Group 2 contents (with \U\2).
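The \U case conversion is a GNU sed extension. Where it is not available, the same steps can be sketched in awk; this is an alternative under that assumption, not the method from the answer above, and it uses explicit A-Za-z0-9 ranges so that it does not depend on character class support. The names `text`, `words` and `out` are illustrative.

```shell
# Sample input from the question.
text=' hello
world
title: this is-the_test string with number 2, to-test CAMEL String
end! '

result=$(printf '%s\n' "$text" |
  awk '/title:/ {
         sub(/.*title:/, "")                     # keep only the text after the marker
         n = split($0, words, /[^A-Za-z0-9]+/)   # any run of non-alphanumerics separates words
         out = ""
         for (i = 1; i <= n; i++)                # uppercase the first char of each word
           out = out toupper(substr(words[i], 1, 1)) substr(words[i], 2)
         print out
       }')

printf '%s\n' "$result"
```

Because the separator is a run of one or more characters, "2, " is treated as a single break, so the comma no longer survives in the result.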

Fetch values from particular file and display swapped values in terminal

I have a file named input.txt which contains students data in StudentName|Class|SchoolName format.
Shriii|Fourth|ADCET
Chaitraliii|Fourth|ADCET
Shubhangi|Fourth|ADCET
Prathamesh|Third|RIT
I want to display these values in reverse order for a particular college. Example:
ADCET|Fourth|Shriii
ADCET|Fourth|Chaitraliii
I used grep 'ADCET$' input.txt which gives output
Shriii|Fourth|ADCET
Chaitraliii|Fourth|ADCET
But I want it in reverse order. I also used grep 'ADCET$' input.txt | sort -r but didn't get required output
You may use either of the following sed or awk solutions:
grep 'ADCET$' input.txt | sed 's/^\([^|]*\)\(|.*|\)\([^|]*\)$/\3\2\1/'
grep 'ADCET$' input.txt | awk 'BEGIN {OFS=FS="|"} {temp=$NF;$NF=$1;$1=temp;}1'
See the online demo
awk details
BEGIN {OFS=FS="|"} - the field separator is set to | and the same char will be used for output
{temp=$NF;$NF=$1;$1=temp;}1:
temp=$NF; - the last field value is assigned to a temp variable
$NF=$1; - the last field is set to Field 1 value
$1=temp; - the value of Field 1 is set to temp
1 - makes the awk write the output.
sed details
^ - start of the line
\([^|]*\) - Capturing group 1: any 0+ chars other than |
\(|.*|\) - Capturing group 2: |, then any 0+ chars and then a |
\([^|]*\) - Capturing group 3: any 0+ chars other than |
$ - end of line.
The \3\2\1 in the replacement are backreferences to the values captured into Groups 3, 2 and 1, emitted in that order so that the first and last fields trade places.
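The grep step can also be folded into awk by testing the last field directly. A minimal sketch, assuming the school name is passed in through a hypothetical awk variable named `school`:

```shell
# Sample records from the question.
records='Shriii|Fourth|ADCET
Chaitraliii|Fourth|ADCET
Shubhangi|Fourth|ADCET
Prathamesh|Third|RIT'

result=$(printf '%s\n' "$records" |
  awk -v school='ADCET' 'BEGIN { FS = OFS = "|" }
       # Only lines whose last field equals the school are swapped and printed.
       $NF == school { tmp = $1; $1 = $NF; $NF = tmp; print }')

printf '%s\n' "$result"
```

Comparing $NF as a string avoids matching a school name that merely ends in the same letters, which a pattern such as 'ADCET$' would accept.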

awk sed backreference csv file

A question to extend a previous one here. (I prefer asking a new question rather than editing the first one. I may be wrong.)
EDIT : ok, I was wrong, I should edit my first question. My bad (SO question is an art, difficult to master)
I have a csv file, with a semicolon as field delimiter. Here is an extract of the csv file:
...;field;(:);10000(n,d);(:);field;....
...;field;123.12(b);123(a);123.00(:);....
Here is the desired output :
...;field;(:);(n,d) 10000;(:);field;....
...;field;(b) 123.12;(a) 123;(:) 123.00;....
I am looking for a solution to swap 2 patterns in each field.
pattern 1 : any digits, with an optional decimal mark (.) and optional decimal digits
e.g : 1 / 1111.00 / 444444444.3 / 32 / 32.6666666 / 1.0 / ....
pattern 2 : any string that begins with a left parenthesis, followed by one or more characters, ending with a right parenthesis
e.g : (n,a,p) / (:) / (llll) / (d) / (123) / (1;2;3) ...
Solutions provided in first question are right for simple file that contain only one column. If I try the solution within csv file, I face multiple failures.
So I try awk similar solution, which is (I think) more "column-oriented".
I tried
awk -F";" '{print gensub(/([[:digit:].]*)(\(.*\))/, "\\2 \\1", "g")}' file
I thought that by setting the field delimiter (;), my "regex swap" would succeed in every field. It was a mistake.
Here is an example of a failure
;(:);7320000(n,d);(:)
desired output --> ;(:);(n,d) 7320000;(:)
My questions (finally): why does awk fail here when it succeeds with a one-column file? What is the best tool for this challenge?
sed with very long regex ?
awk with very long regex ?
for loop ?
other tools ?
PS : I know I am not clear. I have 2 problems (English language, technical limitations). Sorry.
Your "question" is far too long, cluttered, and containing too many separate questions to wade through but here's how to get the output you want from the input you provided with any sed:
$ sed 's/\([0-9][0-9.]*\)\(([^)]*)\)/\2 \1/g' file
...;field;(:);(n,d) 10000;(:);field;....
...;field;(b) 123.12;(a) 123;(:) 123.00;....
Well, when parsing simple delimited files without any quoted values, awk usually comes to the rescue:
awk -F';' -v OFS=';' '{
    # visit every field, including the last one
    for (i = 1; i <= NF; i++) {
        # t[1] is the text before the first "(", t[2] the text after it
        split($i, t, "(")
        if (length(t[1]) != 0 && length(t[2]) != 0) {
            # put the parenthesised part first, restoring the "(" that split removed
            $i = "(" t[2] " " t[1]
        }
    }
    print
}' <<EOF
...;field;(:);10000(n,d);(:);field;....
...;field;123.12(b);123(a);123.00(:);....
EOF
However, this will fail if fields are quoted, i.e. if the separator ; appears inside the values.
First we set the input and output separator to ;
We iterate through all the fields in the line with for (i = 1; i <= NF; i++)
We split the field on the ( character
If the part before the ( has nonzero length and the part after it also has nonzero length
We swap the two parts and add a space (we also restore the ( that split removed from the beginning).
And then the line gets printed.
A solution using sed and xargs, but you need to know the number of fields in advance:
{
sed 's/;/\n/g' |
sed 's/\([^(]\{1,\}\)\((.*)\)/\2 \1/' |
xargs -d '\n' -n7 -- printf "%s;%s;%s;%s;%s;%s;%s\n"
} <<EOF
...;field;(:);10000(n,d);(:);field;....
...;field;123.12(b);123(a);123.00(:);....
EOF
Each ; is turned into a newline, so every field sits on its own line.
On each line I swap the string of at least one character before the ( with the parenthesised string that follows it.
I then merge every 7 lines back into one, using ; as separator, with xargs and printf.
This might work for you (GNU sed):
sed -r 's/([0-9]+(\.[0-9]+)?)(\([^)]*\))/\3 \1/g' file
Look for a group of numbers (possibly with a decimal point) followed by a pair of parens and rearrange them in the desired fashion, globally throughout each line.
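The difference between the attempt in the question and the bracket-expression answers can be seen on the failing example. The sketch below transliterates the question's gensub pattern to plain sed (an assumption made so that it runs without gawk) and compares it with the bounded pattern; `greedy` and `bounded` are illustrative names.

```shell
# The failing example from the question.
line=';(:);7320000(n,d);(:)'

# Greedy form: the number part may be empty and (.*) runs from the
# first "(" on the line to the last ")", swallowing the ; separators.
greedy=$(printf '%s\n' "$line" | sed 's/\([0-9.]*\)\((.*)\)/\2 \1/')

# Bounded form: [0-9][0-9.]* needs at least one digit and [^)]* cannot
# cross a ")", so each match stays inside a single field.
bounded=$(printf '%s\n' "$line" | sed 's/\([0-9][0-9.]*\)\(([^)]*)\)/\2 \1/g')

printf 'greedy:  [%s]\nbounded: [%s]\n' "$greedy" "$bounded"
```

The field separator given to awk does not help here, because gensub is applied to the whole record ($0) when no target is passed.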

Resources