Replacing newlines with commas at every third occurrence using AWK? - shell

For example: a given file has the following lines:
1
alpha
beta
2
charlie
delta
10
text
test
I'm trying to get the following output using awk:
1,alpha,beta
2,charlie,delta
10,text,test

Fairly simple. Use the output record separator as follows. Specify the comma delimiter when the line number is not divisible by 3 and the newline otherwise:
awk 'ORS=NR%3?",":"\n"' file

awk can handle this easily by manipulating ORS:
awk '{ORS=","} !(NR%3){ORS="\n"} 1' file
1,alpha,beta
2,charlie,delta
10,text,test

there is a tool for this kind of text processing pr
$ pr -3ats, file
1,alpha,beta
2,charlie,delta
10,text,test

You can also use xargs with sed to coalesce multiple lines into single lines, useful to know:
cat file|xargs -n3|sed 's/ /,/g'

Related

3 Is there anyway to insert new lines in-between two patterns

Is there anyway to insert new lines in-between 2 specific patterns of characters? I want to insert a new line every time "butterfly" occurs in a text file, however I want this new line to be inserted between the "butter" and "fly". For example butter\nfly
I also want to find the length of each line after splitting.
Eg:
if textfile contains:
fgsccgewvdhbejbecbecboubutterflybvdcvhkebcjl
vdjchvhecbihbutterflyglehblejkbedkbutterflyr
Then, I want a result like the following:
29 fgsccgewvdhbejbecbecboubutter
33 flybvdcvhkebcjlvdjchvhecbihbutter
22 flyglehblejkbedkbutter
4 flyr
I believe one way to tackle it would be to insert a new line using "sed" everywhere "butter" occurs and is followed by "fly". Strip out all blank line using grep with a -v flag. Then get the length of each line. However, even after trying a lot, I am unable to get the correct answer.
The Sed 's' sub-command + awk can work together:
sed -e "s/butterfly/butter\\nfly/g" < input.txt | awk '{ print length, $0 }'
This might work for you (GNU sed & bash):
sed -Ez 's/\n//g;s/(butter)(fly)/\1\n\2/g;s/^.*$/l=&;printf "%d %s\n" ${#l} &/meg' file
Slurp the file into memory using the -z sed option. Remove all existing newlines and then insert new ones between butter and fly. Using the m, g and e flags of the sed substitute command, split into separate lines and using bash make a variable l and via printf print the required format.

Adding a new line to a text file after 5 occurrences of a comma in Bash

I have a text file that is basically one giant excel file on one line in a text file. An example would be like this:
Name,Age,Year,Michael,27,2018,Carl,19,2018
I need to change the third occurance of a comma into a new line so that I get
Name,Age,Year
Michael,27,2018
Carl,19,2018
Please let me know if that is too ambiguous and as always thank you in advance for all the help!
With Gnu sed:
sed -E 's/(([^,]*,){2}[^,]*),/\1\n/g'
To change the number of fields per line, change {2} to one less than the number of fields. For example, to change every fifth comma (as in the title of your question), you would use:
sed -E 's/(([^,]*,){4}[^,]*),/\1\n/g'
In the regular expression, [^,]*, is "zero or more characters other than , followed by a ,; in other words, it is a single comma-delimited field. This won't work if the fields are quoted strings with internal commas or newlines.
Regardless of what Linux's man sed says, the -E flag is an extension to Posix sed, which causes sed to use extended regular expressions (EREs) rather than basic regular expressions (see man 7 regex). -E also works on BSD sed, used by default on Mac OS X. (Thanks to #EdMorton for the note.)
With GNU awk for multi-char RS:
$ awk -v RS='[,\n]' '{ORS=(NR%3 ? "," : "\n")} 1' file
Name,Age,Year
Michael,27,2018
Carl,19,2018
With any awk:
$ awk -v RS=',' '{sub(/\n$/,""); ORS=(NR%3 ? "," : "\n")} 1' file
Name,Age,Year
Michael,27,2018
Carl,19,2018
Try this:
$ cat /tmp/22.txt
Name,Age,Year,Michael,27,2018,Carl,19,2018,Nooka,35,1945,Name1,11,19811
$ echo "Name,Age,Year"; grep -o "[a-zA-Z][a-zA-Z0-9]*,[1-9][0-9]*,[1-9][0-9]\{3\}" /tmp/22.txt
Michael,27,2018
Carl,19,2018
Nooka,35,1945
Name1,11,1981
Or, ,[1-9][0-9]\{3\} if you don't want to put [0-9] 3 more times for the YYYY part.
PS: This solution will give you only YYYY for the year (even if the data for YYYY is 19811 (typo mistakes if any), you'll still get 1981
You are looking for 3 fragments, each without a comma and separated by a comma.
The last fields can give problems (not ending with a comma and mayby only two fields.
The next command looks fine.
grep -Eo "([^,]*[,]{0,1}){0,3}" inputfile
This might work for you (GNU sed):
sed 's/,/\n/3;P;D' file
Replace every third , with a newline, print ,delete the first line and repeat.

removing the first character in a text file using Sed

I have 100s of enormous text files in the form of
p 127210 3240293 23423234 3242323423
3242323 23423423 23423234 32423423
which I want to turn to
127210 3240293 23423234 3242323423
3242323 23423423 23423234 32423423
I've tried using
sed '1 s/^.//' input > output
but that gives me
127210 3240293 23423234 3242323423
3242323 23423423 23423234 32423423
i.e. an annoying space where the p was. Can anyone help me modify the sed expression to get the output without the space?
Thanks
Following awk may help you in same.
awk '!/^[0-9]+/{$1="";$0=$0;$1=$1} 1' Input_file
Explanation: Checking condition !/^[0-9]+/ which will check if a line doesn't start from digits then do following, then I am nullifying first field here(because you don't want p in output here), then I am re-arranging $0(current line) and re-arranging $1 too so that it could remove that initial space as per your request here.
Output will be as follows.
127210 3240293 23423234 3242323423
3242323 23423423 23423234 32423423
sed 's/^[[:alpha:]]?//;s/^[[:space:]]*//' infile >outifle
should do it.
You can remove non-numeric characters at the beginning of the lines:
sed 's/^[^0-9]*//' input > output
sed 's/^. //' foo.txt > foo2.txt

Use awk to extract value from a line

I have these two lines within a file:
<first-value system-property="unique.setting.limit">3</first-value>
<second-value-limit>50000</second-value-limit>
where I'd like to get the following as output using awk or sed:
3
50000
Using this sed command does not work as I had hoped, and I suspect this is due to the presence of the quotes and delimiters in my line entry.
sed -n '/WORD1/,/WORD2/p' /path/to/file
How can I extract the values I want from the file?
awk -F'[<>]' '{print $3}' input.txt
input.txt:
<first-value system-property="unique.setting.limit">3</first-value>
<second-value-limit>50000</second-value-limit>
Output:
3
50000
sed -e 's/[a-zA-Z.<\/>= \-]//g' file
Using sed:
sed -E 's/.*limit"*>([0-9]+)<.*/\1/' file
Explanation:
.* takes care of everything that comes before the string limit
limit"* takes care of both the lines, one with limit" and the other one with just limit
([0-9]+) takes care of matching numbers and only numbers as stated in your requirement.
\1 is actually a shortcut for capturing pattern. When a pattern groups all or part of its content into a pair of parentheses, it captures that content and stores it temporarily in memory. For more details, please refer https://www.inkling.com/read/introducing-regular-expressions-michael-fitzgerald-1st/chapter-4/capturing-groups-and
The script solution with parameter expansion:
#!/bin/bash
while read line || test -n "$line" ; do
value="${line%<*}"
printf "%s\n" "${value##*\>}"
done <"$1"
output:
$ ./ltags.sh dat/ltags.txt
3
50000
Looks like XML to me, so assuming it forms part of some valid XML, e.g.
<root>
<first-value system-property="unique.setting.limit">3</first-value>
<second-value-limit>50000</second-value-limit>
</root>
You can use Perl's XML::Simple and do something like this:
perl -MXML::Simple -E '$xml = XMLin("file"); say $xml->{"first-value"}->{"content"}; say $xml->{"second-value-limit"}'
Output:
3
50000
If the XML structure is more complicated, then you may have to drill down a bit deeper to get to the values you want. If that's the case, you should edit the question to show the bigger picture.
Ashkan's awk solution is straightforward, but let me suggest a sed solution that accepts non-integer numbers:
sed -n 's/[^>]*>\([.[:digit:]]*\)<.*/\1/p' input.txt
This extracts the number between the first > character of the line and the following <. In my RE this "number" can be the empty string, if you don't want to accept an empty string please add the -r option to sed and replace \([.[:digit:]]*\) by ([.[:digit:]]+).

Remove a line from a csv file bash, sed, bash

I'm looking for a way to remove lines within multiple csv files, in bash using sed, awk or anything appropriate where the file ends in 0.
So there are multiple csv files, their format is:
EXAMPLEfoo,60,6
EXAMPLEbar,30,10
EXAMPLElong,60,0
EXAMPLEcon,120,6
EXAMPLEdev,60,0
EXAMPLErandom,30,6
So the file will be amended to:
EXAMPLEfoo,60,6
EXAMPLEbar,30,10
EXAMPLEcon,120,6
EXAMPLErandom,30,6
A problem which I can see arising is distinguishing between double digits that end in zero and 0 itself.
So any ideas?
Using your file, something like this?
$ sed '/,0$/d' test.txt
EXAMPLEfoo,60,6
EXAMPLEbar,30,10
EXAMPLEcon,120,6
EXAMPLErandom,30,6
For this particular problem, sed is perfect, as the others have pointed out. However, awk is more flexible, i.e. you can filter on an arbitrary column:
awk -F, '$3!=0' test.csv
This will print the entire line is column 3 is not 0.
use sed to only remove lines ending with ",0":
sed '/,0$/d'
you can also use awk,
$ awk -F"," '$NF!=0' file
EXAMPLEfoo,60,6
EXAMPLEbar,30,10
EXAMPLEcon,120,6
EXAMPLErandom,30,6
this just says check the last field for 0 and don't print if its found.
sed '/,[ \t]*0$/d' file
I would tend to sed, but there is an egrep (or: grep -e) -solution too:
egrep -v ",0$" example.csv

Resources