sed or awk, removing curly brackets but only if there are no commas inside brackets - bash

I have this string:
ex00/{ft_strdup.c} ex04/{ft_convert_base.c,ft_convert_base2.c} ex05/{ft_split.c}
I need to use sed to remove the curly brackets when there is no comma inside them, so the desired output is:
ex00/ft_strdup.c ex04/{ft_convert_base.c,ft_convert_base2.c} ex05/ft_split.c

Using any sed:
$ sed 's/{\([^,}]*\)}/\1/g' file
ex00/ft_strdup.c ex04/{ft_convert_base.c,ft_convert_base2.c} ex05/ft_split.c
Note that the above will work no matter which characters other than a comma, {, }, or a newline exist in your file names, e.g. these are all valid file names:
$ cat file
ex00/{ft_strdup1.c} ex05/{ft-split.c} ex05/{ft=s&pl#it.c}
$ sed 's/{\([^,}]*\)}/\1/g' file
ex00/ft_strdup1.c ex05/ft-split.c ex05/ft=s&pl#it.c
If your file names can contain any of the characters I mentioned above as excluded then ask a new question including those in your sample input/output.

With your shown samples, please try the following awk code. Written and tested in GNU awk.
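awk -v RS='{[^}]*}' '
RT{                          # RT holds the text that matched RS, i.e. the {...} group
  if(!sub(/,/,"&",RT)){      # sub() returns 0 when the group contains no comma
    gsub(/^{|}$/,"",RT)      # ...so drop the surrounding braces
  }
}
{ ORS=RT }                   # print each record followed by its (possibly modified) group
1
END{ print "" }
' Input_file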
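The RS/RT approach above needs GNU awk. For other awks, here is a minimal sketch under the assumption that brace groups are never nested and never span lines: it walks each line with match() and strips the braces only from groups that contain no comma. The function name unbrace is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: remove {} around a group only if the group has no comma.
unbrace() {
  awk '{
    out = ""
    rest = $0
    while (match(rest, /[{][^}]*[}]/)) {      # next {...} group on the line
      grp = substr(rest, RSTART, RLENGTH)
      if (index(grp, ",") == 0)               # no comma inside the group
        grp = substr(grp, 2, RLENGTH - 2)     # so drop the braces
      out = out substr(rest, 1, RSTART - 1) grp
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
  }' "$@"
}

# prints: ex00/ft_strdup.c ex04/{ft_convert_base.c,ft_convert_base2.c} ex05/ft_split.c
printf '%s\n' 'ex00/{ft_strdup.c} ex04/{ft_convert_base.c,ft_convert_base2.c} ex05/{ft_split.c}' | unbrace
```

Using only match(), substr() and index() keeps it within POSIX awk, at the cost of rebuilding each line by hand instead of letting RS do the splitting.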

Using sed
$ sed -E 's/\{([[:alpha:]_.]+)}/\1/g' input_file
ex00/ft_strdup.c ex04/{ft_convert_base.c,ft_convert_base2.c} ex05/ft_split.c
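Note that the bracket expression [[:alpha:]_.] contains no digits or hyphens, so a name such as ft_strdup1.c keeps its braces with this version. A quick comparison of the two sed commands from this thread, wrapped in made-up function names:

```shell
#!/bin/sh
# Hypothetical wrappers around the two sed answers in this thread.
strip_alpha()   { sed -E 's/\{([[:alpha:]_.]+)}/\1/g'; }   # only letters, _ and . allowed inside {}
strip_nocomma() { sed 's/{\([^,}]*\)}/\1/g'; }             # anything except , and } allowed inside {}

line='ex00/{ft_strdup1.c} ex05/{ft-split.c} ex04/{a.c,b.c}'
printf '%s\n' "$line" | strip_alpha     # prints the line unchanged
printf '%s\n' "$line" | strip_nocomma   # prints: ex00/ft_strdup1.c ex05/ft-split.c ex04/{a.c,b.c}
```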

Related

How to ignore case when using awk or sed [duplicate]

sed -i '/first/i This line to be added'
In this case, how do I ignore case while searching for the pattern first?
You can use the following:
sed 's/[Ff][Ii][Rr][Ss][Tt]/last/g' file
Otherwise, GNU sed has the I flag (for the s command, a lowercase i works too):
sed 's/first/last/Ig' file
From man sed:
I
i
The I modifier to regular-expression matching is a GNU extension which
makes sed match regexp in a case-insensitive manner.
Test
$ cat file
first
FiRst
FIRST
fir3st
$ sed 's/[Ff][Ii][Rr][Ss][Tt]/last/g' file
last
last
last
fir3st
$ sed 's/first/last/Ig' file
last
last
last
fir3st
GNU sed
sed '/first/Ii This line to be added' file
You can try
sed 's/first/somethingelse/gI'
If you want to save some typing, try GNU awk. POSIX sed has no case-insensitive option; the I flag shown above is a GNU sed extension.
awk -v IGNORECASE="1" '/first/{your logic}' file
For versions of awk that don't understand the IGNORECASE special variable, you can use something like this:
awk 'toupper($0) ~ /PATTERN/ { print "string to insert" } 1' file
Convert each line to uppercase before testing whether it matches the pattern and if it does, print the string. 1 is the shortest true condition, so awk does the default thing: { print }.
To use a variable, you could go with this:
awk -v var="$foo" 'BEGIN { pattern = toupper(var) } toupper($0) ~ pattern { print "string to insert" } 1' file
This passes the shell variable $foo and transforms it to uppercase before the file is processed.
Slightly shorter with bash would be to use -v pattern="${foo^^}" and skip the BEGIN block.
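Putting the toupper() approach together for the original goal (inserting a line before each case-insensitive match), a minimal sketch in portable awk; the helper name insert_before_ci is hypothetical. One design caveat: uppercasing a regex can change its escapes (in gawk, for example, \s would become \S), so this is best kept to plain words.

```shell
#!/bin/sh
# Hypothetical helper: print the text in $2 before every line matching the regex in $1, ignoring case.
insert_before_ci() {
  awk -v pat="$1" -v text="$2" '
    BEGIN { pat = toupper(pat) }      # uppercase the pattern once
    toupper($0) ~ pat { print text }  # test an uppercased copy of the line
    1                                 # then print the original line unchanged
  '
}

printf '%s\n' first FiRst fir3st | insert_before_ci first 'This line to be added'
```

With that input, the inserted line appears before first and FiRst but not before fir3st.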
Use the following, with \b for a word boundary (GNU sed):
sed 's/\bfirst\b/This line to be added/Ig' file

How can I prefix the output of each match in grep with some text?

I have a file with a list of phrases
apples
banananas
oranges
I'm running cat file.txt | xargs -I% sh -c "grep -Eio '(an)' >> output.txt"
What I can't figure out is how to make the output contain the original line as well, for example:
banananas,an
oranges,an
How can I prefix the output of grep to also include the value being piped to it?
This is a task for awk; could you please try the following:
awk '/an/{print $0",an"}' Input_file
This looks for the string an in every line of Input_file and appends ,an to each matching line.
Solution with sed:
sed '/an/s/$/,an/' input_file
This finds lines that match the pattern /an/ and appends ,an at the end of the pattern space ($).
Use awk instead of grep:
$ awk -v s="an" ' # search string
BEGIN {
OFS="," # separating comma
}
match($0,s) { # when there is a match
print $0,substr($0,RSTART,RLENGTH) # output
}' file
Output:
banananas,an
oranges,an
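The answers above print one line per matching input line. If you want grep -o's behaviour instead (one output line per match, so banananas appears three times), a minimal sketch in portable awk that keeps calling match() on the rest of the line; the helper name prefix_matches is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: like grep -o, but prefix every match with the whole line and a comma.
prefix_matches() {
  awk -v re="$1" '{
    rest = $0
    while (match(rest, re)) {
      print $0 "," substr(rest, RSTART, RLENGTH)
      if (RLENGTH == 0) break                  # guard against empty matches looping forever
      rest = substr(rest, RSTART + RLENGTH)    # continue after this match
    }
  }'
}

printf '%s\n' apples banananas oranges | prefix_matches an
```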

Ignore comma after backslash in a line in a text file using awk or sed

I have a text file containing several lines of the following format:
name,list_of_subjects,list_of_sports,school
Eg1: john,science\,social,football,florence_school
Eg2: james,painting,tennis\,ping_pong\,chess,highmount_school
I need to parse the text file and print the fields, treating the escaped commas as part of the field rather than as separators. Here those are fields 2 or 3, like this:
science, social
tennis, ping_pong, chess
I do not know how to ignore escaped characters. How can I do it with awk or sed in terminal?
Substitute \, with a character that your records do not contain normally (e.g. \n), and restore it before printing. For example:
$ awk -F',' 'NR>1{ if(gsub(/\\,/,"\n")) gsub(/\n/,",",$2); print $2 }' file
science,social
painting
Since the first gsub is performed on the whole record (i.e. $0), awk is forced to recompute the fields. The second one is performed only on the second field (i.e. $2), so it does not affect the other fields. See: Changing Fields.
To extract multiple fields with the escaped commas intact, you need to restore the \ns in all fields with a for loop, as in the following example:
$ awk 'BEGIN{ FS=OFS="," } NR>1{ if(gsub(/\\,/,"\n")) for(i=1;i<=NF;++i) gsub(/\n/,"\\,",$i); print $2,$3 }' file
science\,social,football
painting,tennis\,ping_pong\,chess
See also: What's the most robust way to efficiently parse CSV using awk?.
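The same hide-and-restore idea can print exactly what the question asked for: only the fields that contained escaped commas, with the commas restored. A minimal POSIX-awk sketch, assuming the control character \034 (0x1C) never occurs in the input; the helper name escaped_fields is hypothetical.

```shell
#!/bin/sh
# Sketch: print every field that contained an escaped comma, with each \, restored to a plain comma.
# Assumes the control character \034 (0x1C) never appears in the input; it is only a placeholder.
escaped_fields() {
  awk -F',' '
    NR > 1 {                           # skip the header line
      gsub(/\\,/, "\034")              # hide escaped commas; $0 is re-split on the real ones
      for (i = 1; i <= NF; i++)
        if (index($i, "\034")) {       # this field held at least one escaped comma
          f = $i
          gsub("\034", ",", f)         # put the commas back, without the backslashes
          print f
        }
    }
  ' "$@"
}

printf '%s\n' 'name,list_of_subjects,list_of_sports,school' \
  'Eg1: john,science\,social,football,florence_school' \
  'Eg2: james,painting,tennis\,ping_pong\,chess,highmount_school' | escaped_fields
```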
You could replace the \, sequences with another character that won't appear in your text, split the text on the remaining commas, and then replace the chosen character with commas:
sed $'s/\\\,/\037/g' input | awk -F, '{ printf "Name: %s\nSubjects : %s\nSports: %s\nSchool: %s\n\n", $1, $2, $3, $4 }' | tr $'\037' ','
In this case the ASCII control character "Unit Separator" (octal \037, hex 0x1F) is used, which I'm pretty sure your input won't contain.
Why awk when bash with coreutils (and one sed call) is enough:
# Sorry my cat. Using `cat` as input pipe
cat <<EOF |
name,list_of_subjects,list_of_sports,school
Eg1: john,science\,social,football,florence_school
Eg2: james,painting,tennis\,ping_pong\,chess,highmount_school
EOF
# remove first line!
tail -n+2 |
# substitute `\,` by an unreadable character:
sed 's/\\\,/\xff/g' |
# read the comma separated list
while IFS=, read -r name list_of_subjects list_of_sports school; do
# read the \xff separated list into an array
IFS=$'\xff' read -r -d '' -a list_of_subjects < <(printf "%s" "$list_of_subjects")
# read the \xff separated list into an array
IFS=$'\xff' read -r -d '' -a list_of_sports < <(printf "%s" "$list_of_sports")
echo "list_of_subjects : ${list_of_subjects[@]}"
echo "list_of_sports : ${list_of_sports[@]}"
done
will output:
list_of_subjects : science social
list_of_sports : football
list_of_subjects : painting
list_of_sports : tennis ping_pong chess
Note that this will most probably be slower than a solution using awk.
Note that the principle of operation is the same as in the other answers: substitute the \, string with some other unique character, and then use that character to iterate over the elements of the second and third fields.
This might work for you (GNU sed):
sed -E 's/\\,/\n/g;y/,\n/\n,/;s/^[^,]*$//Mg;s/\n//g;/^$/d' file
Replace the escaped commas with newlines, then swap newlines and commas (y command). Empty every line (in multi-line mode) that does not contain a comma, delete the remaining newlines, and delete the pattern space if it is empty.
Using Perl: change each \, to some control character, say \x01, and then replace it with , again:
$ cat laxman.txt
john,science\,social,football,florence_school
james,painting,tennis\,ping_pong\,chess,highmount_school
$ perl -ne ' s/\\,/\x01/g and print ' laxman.txt | perl -F, -lane ' for(@F) { if( /\x01/ ) { s/\x01/,/g ; print } } '
science,social
tennis,ping_pong,chess
You can perhaps join columns with a function.
function joincol(col, i) {
$col=$col FS $(col+1)
for (i=col+1; i<NF; i++) {
$i=$(i+1)
}
NF--
}
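This might get used thusly (with FS and OFS set to a comma; while rather than if, so that a chain such as tennis\,ping_pong\,chess is joined completely, and col < NF so that a trailing backslash in the last field is left alone):
BEGIN { FS = OFS = "," }
{
for (col=1; col<=NF; col++) {
while (col < NF && $col ~ /\\$/) {
joincol(col)
}
}
print
}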
Note that decrementing NF is undefined behaviour in POSIX. It may delete the last field, or it may not, and still be POSIX compliant. This works for me in BSD awk and gawk. YMMV. May contain nuts.
Use gawk's FPAT:
awk -v FPAT='(\\\\.|[^,\\\\]*)+' '{print $3}' file
#list_of_sports
#football
#tennis\,ping_pong\,chess
then use gensub to remove the backslashes:
awk -v FPAT='(\\\\.|[^,\\\\]*)+' '{print gensub("\\\\", "", "g", $3)}' file
#list_of_sports
#football
#tennis,ping_pong,chess

Find the pattern (YYYY-MM-DD) and replace it with the same value concatenating with apostrophes

I have this kind of data:
1,1990-01-01,2,A,2015-02-09
1,NULL,2,A,2015-02-09
1,1990-01-01,2,A,NULL
I'm looking for a solution that replaces each date in the file with the same value wrapped in apostrophes. The expected result for the example is:
1,'1990-01-01',2,A,'2015-02-09'
1,NULL,2,A,'2015-02-09'
1,'1990-01-01',2,A,NULL
I have found a pattern that matches my dates, but I can't figure out what to replace it with:
sed 's/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/????/' a.txt > b.txt
Capture the date in a group by surrounding the pattern with parentheses (). Then you can refer to the captured group with \1 (a second group would be \2, etc.).
sed "s/\([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\)/'\1'/g"
Note the g at the end, which ensures that all matches are replaced (if there are more than one in one line).
If you add the -r switch to sed (-E in BSD sed; GNU sed accepts both), the awkward backslashes before () can be omitted:
sed -r "s/([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9])/'\1'/g"
This can be further simplified using quantifiers:
sed -r "s/([0-9]{4}-[0-9]{2}-[0-9]{2})/'\1'/g"
Or even:
sed -r "s/([0-9]{4}(-[0-9]{2}){2})/'\1'/g"
As mentioned in the comments, in this particular case you may also use & instead of \1: & stands for the whole matched expression, so the () can be omitted:
sed -r "s/[0-9]{4}(-[0-9]{2}){2}/'&'/g"
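Quantifiers do not strictly need -r: POSIX basic regular expressions also support intervals, written \{n\}. A minimal sketch, with a made-up function name, that should work with sed implementations lacking -r/-E:

```shell
#!/bin/sh
# Sketch: quote every YYYY-MM-DD without -r/-E, using POSIX BRE intervals \{n\}.
quote_dates() {
  # Double quotes let the script contain literal single quotes; & is the whole match.
  sed "s/[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}/'&'/g" "$@"
}

printf '%s\n' '1,1990-01-01,2,A,2015-02-09' '1,NULL,2,A,2015-02-09' '1,1990-01-01,2,A,NULL' | quote_dates
```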
You need to use a capture group, as well as replace all matching occurrences with the g flag.
sed 's/\([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\)/'"'"'\1'"'"'/g' a.txt > b.txt
The replacement text is a bit confusing because a single-quoted string in the shell cannot contain a single quote, so you have to close the single-quoted string, then use a double-quoted single quote. Using $'...'-style quoting in bash simplifies it a bit, at the cost of needing to escape the backslashes (including the one in \1, which bash would otherwise read as an octal escape):
sed $'s/\\([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\\)/\'\\1\'/g' a.txt > b.txt
Or, you can simply double-quote the script, since there's nothing currently in it that is subject to expansion:
sed "s/\([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\)/'\1'/g" a.txt > b.txt
There is also the special & replacement text, which expands to whatever the regular expression matched, so you can avoid an explicit capture group:
sed "s/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/'&'/g" a.txt > b.txt
With GNU sed:
sed -E 's/([0-9]{2,4}-?){3}/'\''&'\''/g' file
Depending on your file content, the dates may also be described as 1 or 2 followed by a combination of nine dashes or digits:
sed -E 's/[12][-0-9]{9}/'\''&'\''/g' file
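To see why the file content matters here: [12][-0-9]{9} accepts any ten characters of digits and dashes starting with 1 or 2, so a ten-digit number (a hypothetical ID such as 1234567890) gets quoted too. A small comparison, with made-up function names:

```shell
#!/bin/sh
# Hypothetical wrappers: the loose pattern from the answer above vs. an explicit date pattern.
loose_quote()  { sed -E "s/[12][-0-9]{9}/'&'/g"; }
strict_quote() { sed -E "s/[0-9]{4}-[0-9]{2}-[0-9]{2}/'&'/g"; }

printf '%s\n' '1990-01-01,1234567890' | loose_quote    # prints: '1990-01-01','1234567890'
printf '%s\n' '1990-01-01,1234567890' | strict_quote   # prints: '1990-01-01',1234567890
```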
Here is one in awk:
$ awk -v q="'" '
BEGIN { FS=OFS="," } # set delimiters
{
for(i=1;i<=NF;i++) # loop all fields
if($i~/[0-9]{4}-[0-9]{2}-[0-9]{2}/) # if field has a date looking string
$i=q $i q # quote it
}1' file
Output:
1,'1990-01-01',2,A,'2015-02-09'
1,NULL,2,A,'2015-02-09'
1,'1990-01-01',2,A,NULL
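A variation on the loop above, as a sketch with a made-up function name: anchoring the test with ^ and $ quotes a field only when the whole field is a date (so a field like id1990-01-01 is left alone), and spelling the digits out avoids relying on {n} interval support, which some older awks lack.

```shell
#!/bin/sh
# Sketch: quote a field only when the whole field is a date (anchored with ^ and $).
# The pattern is spelled out digit by digit so it also works in awks without {n} support.
quote_date_fields() {
  awk -v q="'" '
    BEGIN { FS = OFS = "," }
    {
      for (i = 1; i <= NF; i++)
        if ($i ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/)
          $i = q $i q
      print
    }
  ' "$@"
}

printf '%s\n' '1,1990-01-01,2,A,2015-02-09' 'id1990-01-01,1990-01-01' | quote_date_fields
```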
Could you please try the following. (The regex inside match() could also be written as [0-9]{4}-[0-9]{2}-[0-9]{2}, but my awk is an old version, so I couldn't test that; you could try it.)
awk -v s1="'" '
{
while(match($0,/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/)){
val=val substr($0,1,RSTART-1) s1 substr($0,RSTART,RLENGTH) s1
$0=substr($0,RSTART+RLENGTH)
}
print val $0
val=""
}' Input_file
Output will be as follows.
1,'1990-01-01',2,A,'2015-02-09'
1,NULL,2,A,'2015-02-09'
1,'1990-01-01',2,A,NULL
With Perl, it is simple:
perl -pe ' s/(\d{4}-\d\d-\d\d)/\x27$1\x27/g '
With your inputs (\x27 is used for the single quotes):
$ cat liubo.txt
1,1990-01-01,2,A,2015-02-09
1,NULL,2,A,2015-02-09
1,1990-01-01,2,A,NULL
$ perl -pe ' s/(\d{4}-\d\d-\d\d)/\x27$1\x27/g ' liubo.txt
1,'1990-01-01',2,A,'2015-02-09'
1,NULL,2,A,'2015-02-09'
1,'1990-01-01',2,A,NULL
$
If you want to write the single quotes literally, escape the $ and wrap the command in double quotes:
$ perl -pe " s/(\d{4}-\d\d-\d\d)/\'\$1\'/g " liubo.txt
1,'1990-01-01',2,A,'2015-02-09'
1,NULL,2,A,'2015-02-09'
1,'1990-01-01',2,A,NULL
$

awk: copy from A to B and output..?

My file is a bookmarks backup, backup-6.session.
Its contents are very long lines of text, and I need to copy all the URLs (there are many). Here is an example of what is inside:
......"charset":"UTF-8","ID":3602197775,"docshellID":0,"originalURI":"https://www.youtube.com/watch?v=axxxxxxxxsxsx","docIdentifier":470,"structuredCloneState":"AAAAA.....
The result should go to text.txt:
https://www.youtube.com/watch?v=axxxxxxxxsxsx
https://www.youtube.com/watch?v=bxxxxxxxxsxsx
https://www.youtube.com/watch?v=cxxxxxxxxsxsx
https://www.youtube.com/watch?v=dxxxxxxxxsxsx
....
....
Each URL starts right after "originalURI":" and ends at the next ".
The command can be awk, sed, ... (I don't know which command is best for this).
Thank you.
With GNU awk for multi-char RS and RT:
$ awk -v RS='"originalURI":"[^"]+' 'sub(/.*"/,"",RT){print RT}' file
https://www.youtube.com/watch?v=axxxxxxxxsxsx
You could also use grep, for example:
grep -oh "https://www\.youtube\.com/watch?v=[A-Za-z0-9]*" backup-6.session > text.txt
That is, if the axxxxxxxxsxsx part contains only the letters A-Z, a-z, or the digits 0-9; the match stops at the first character outside that set.
Notice the flags for grep:
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.
-h, --no-filename
Suppress the prefixing of file names on output. This is the default
when there is only one file (or only standard input) to search.
The awk solution would be as follows:
awk -F, '{ for (i=1;i<=NF;i++) { if ( $i ~ "originalURI") { split($i,add,":");print gensub("\"","","g",add[2])":"gensub("\"","","g",add[3])} } }' filename
We loop through each field separated by "," and pattern-match it against "originalURI". Then we split that field on ":" with the split function and remove the quotation marks with gensub (a GNU awk function).
The sed solution would be as follows:
sed -rn 's/^.*originalURI":"(.*)","docIdentifier.*$/\1/p' filename
Run sed with extended regular expressions (-r) and suppress the default output (-n). The substitution replaces the whole line with the part captured in parentheses (\1), and the p flag prints the result.
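Because the leading ^.* in that sed command is greedy, it keeps only one URL per line, while a session file may hold many URLs on a single long line. A minimal sketch that prints every value, assuming the URLs themselves never contain a double quote; grep -o is not POSIX, but GNU, BSD and BusyBox grep support it. The helper name extract_uris is hypothetical.

```shell
#!/bin/sh
# Sketch: print every "originalURI" value, even when many of them share one long line.
extract_uris() {
  # grep -o prints each match on its own line; sed strips the key and the closing quote.
  grep -o '"originalURI":"[^"]*"' "$@" |
    sed 's/^"originalURI":"//; s/"$//'
}

printf '%s\n' '{"a":1,"originalURI":"https://x/1","docIdentifier":4,"originalURI":"https://x/2","z":0}' | extract_uris
```

For the file in the question, that would be: extract_uris backup-6.session > text.txt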
