Sort for specific lines in text - sorting

I have the given text:
# Blub
Hello this is a blub text.
# Bla
This is the bla text.
# Abba
Another text.
Is it possible to sort for the lines with the #? So that the resulting text is:
# Abba
Another text.
# Bla
This is the bla text.
# Blub
Hello this is a blub text.
Preferably using vim or emacs.

In Emacs,
M-x sort-regexp-fields
Enter: #[^#]*
Enter: \&
The first regexp delimits the record, and the second specifies the key for sorting.
If you're at liberty to choose the marker character and use * instead of #, you may use org-mode's command org-sort-entries instead, which saves you from entering the regexps.

Something like:
:sort! /^#.+\n.+\n$/
I'm not sure about line block order.

You didn't tag it as such, but I think awk is the best tool for the job. Using gawk the following works:
gawk -v RS='\n\n' '{
gsub("\n$", "")
gsub("\n", "#")
print
}' file_to_be_sorted | sort | sed -e 's/$/\n/' -e 's/#/\n/'
Explanation
By setting the record separator (RS) to '\n\n' gawk creates records from each block. Each record is converted to be on one line with # as separator (gsub("\n", "#")), at this point normal sort works. sed is then used to recreate the blocks. gsub("\n$", "") fixes a whitespace issue with the last record.
Note: if any of the blocks contains # you need to choose a different separator.
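If gawk is not available, the same join, sort, split idea can be sketched with any POSIX awk plus sort and tr. This is only a minimal sketch, not the command from the answer above: the function name sort_blocks is made up, and it assumes that the file starts with a # line, that every block starts with a # line, and that the control character \001 never occurs in the text.

```shell
# Hypothetical helper: sort "# heading" blocks of a file by their heading line.
# Assumptions: the file starts with a "#" line, and \001 is absent from the text.
sort_blocks() {
  awk '
    # A heading line starts a new record (one record per output line).
    /^#/ { if (NR > 1) printf "\n"; printf "%s", $0; next }
    # Body lines are glued onto the current record with \001.
    { printf "\001%s", $0 }
    END { if (NR > 0) printf "\n" }
  ' "$1" | LC_ALL=C sort | tr '\001' '\n'   # sort the records, then unfold them
}
```

Usage would be sort_blocks file_to_be_sorted; LC_ALL=C is there so that sort compares plain bytes and is not confused by the control character.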

Related

Replace spaces between two strings with symbol using sed

I have string like this:
20.07.2010|Berlin|id 100|bd-22.10.94|Marry Scott Robinson|msc#gmail.com
I need to replace the whitespace only within "Marry Scott Robinson" with "|", so as to get bd-22.10.94|Marry|Scott|Robinson|
There are many such rows, so the problem is to replace whitespace only between "bd-" and the vertical bar after the name.
I'll assume that the name is always in the fifth column:
awk 'BEGIN{FS=OFS="|"}{gsub(/ /,OFS,$5)}1' file
If that is not the case, you can do:
awk 'BEGIN{FS=OFS="|"}{for(i=1;i<=NF;i++){if($i ~ /bd-/){break}};gsub(/ /,OFS,$(i+1))}1' file
Returns:
20.07.2010|Berlin|id 100|bd-22.10.94|Marry|Scott|Robinson|msc#gmail.com
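One thing to watch with the loop variant: if no field matches bd-, i runs past the last field and the gsub touches a field that does not exist. A hedged sketch of a guarded version follows; the function name split_name is invented for illustration, and it assumes the name is the field right after the first field that starts with bd-.

```shell
# Hypothetical helper: replace spaces with | in the field that follows "bd-...".
# Lines without a bd- field are printed unchanged.
split_name() {
  awk 'BEGIN { FS = OFS = "|" }
  {
    # Stop one short of NF so that $(i + 1) always exists.
    for (i = 1; i < NF; i++)
      if ($i ~ /^bd-/) { gsub(/ /, OFS, $(i + 1)); break }
    print
  }' "$@"
}
```

It reads stdin or the files given as arguments, e.g. split_name file.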
Perl to the rescue!
perl -lne '($before, $change, $after) = /(.*\|bd-.*?\|)(.*?)(\|.*)/;
print $before, $change =~ s/ /|/gr, $after' -- file
-n reads the input line by line, running the code for each line
-l removes newlines from input and adds them to output
The first line populates three variables with values captured from the line: $before contains everything up to the first | after bd-, $change contains what follows up to the next |, and $after contains the rest.
s/ /|/gr replaces spaces by pipes (/g for "all of them") and returns (/r) the result.
This might work for you (GNU sed):
sed 's/[^|]*/\n&\n/5;:a;s/\(\n[^\n ]*\) /\1\|/;ta;s/\n//g' file
Sometimes to fix a problem we must erect scaffolding, then fix the original problem and finally remove the scaffolding.
Here we need to isolate the field by surrounding it by newlines.
Remove the spaces between the newlines by looping until failure.
Finally, remove the scaffolding i.e. the introduced newlines.
Another perl version:
$ perl -F'\|' -ne '$F[4] =~ tr/ /|/; print join("|", @F)' foo.txt
20.07.2010|Berlin|id 100|bd-22.10.94|Marry|Scott|Robinson|msc#gmail.com
Same basic idea as Corentin's first awk example. Split each line into columns based on |, replace spaces in the 5th one with |'s, print the re-joined lines.

Adding a new line to a text file after 5 occurrences of a comma in Bash

I have a text file that is basically one giant excel file on one line in a text file. An example would be like this:
Name,Age,Year,Michael,27,2018,Carl,19,2018
I need to change every third occurrence of a comma into a newline so that I get
Name,Age,Year
Michael,27,2018
Carl,19,2018
Please let me know if that is too ambiguous and as always thank you in advance for all the help!
With GNU sed:
sed -E 's/(([^,]*,){2}[^,]*),/\1\n/g'
To change the number of fields per line, change {2} to one less than the number of fields. For example, to change every fifth comma (as in the title of your question), you would use:
sed -E 's/(([^,]*,){4}[^,]*),/\1\n/g'
In the regular expression, [^,]*, is "zero or more characters other than , followed by a ,"; in other words, it is a single comma-delimited field. This won't work if the fields are quoted strings with internal commas or newlines.
Regardless of what Linux's man sed says, the -E flag is an extension to POSIX sed, which causes sed to use extended regular expressions (EREs) rather than basic regular expressions (see man 7 regex). -E also works on BSD sed, used by default on Mac OS X. (Thanks to @EdMorton for the note.)
With GNU awk for multi-char RS:
$ awk -v RS='[,\n]' '{ORS=(NR%3 ? "," : "\n")} 1' file
Name,Age,Year
Michael,27,2018
Carl,19,2018
With any awk:
$ awk -v RS=',' '{sub(/\n$/,""); ORS=(NR%3 ? "," : "\n")} 1' file
Name,Age,Year
Michael,27,2018
Carl,19,2018
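To make the number of fields per line a parameter (3 in the example, 5 in the title), the "any awk" version can be wrapped roughly as below. This is a sketch under assumptions: wrap_fields is a made-up name, the input is a single line, and the total number of fields is a multiple of n (otherwise the last output line ends with a comma and no newline).

```shell
# Hypothetical helper: wrap_fields N [file...]
# Turns every N-th comma of a one-line CSV stream into a newline.
wrap_fields() {
  n=$1
  shift
  awk -v n="$n" -v RS=',' '
    {
      sub(/\n$/, "")                # drop the newline that ends the input
      ORS = (NR % n ? "," : "\n")   # newline after every n-th field, comma otherwise
    }
    1                               # print the record followed by ORS
  ' "$@"
}
```

For the sample data that would be wrap_fields 3 file.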
Try this:
$ cat /tmp/22.txt
Name,Age,Year,Michael,27,2018,Carl,19,2018,Nooka,35,1945,Name1,11,19811
$ echo "Name,Age,Year"; grep -o "[a-zA-Z][a-zA-Z0-9]*,[1-9][0-9]*,[1-9][0-9]\{3\}" /tmp/22.txt
Name,Age,Year
Michael,27,2018
Carl,19,2018
Nooka,35,1945
Name1,11,1981
Or, ,[1-9][0-9]\{3\} if you don't want to put [0-9] 3 more times for the YYYY part.
PS: This solution gives you only YYYY for the year: even if the data for YYYY is 19811 (a typo, say), you'll still get 1981.
You are looking for 3 fragments, each without a comma and separated by a comma.
The last fields can cause problems (not ending with a comma, and maybe only two fields).
The next command looks fine.
grep -Eo "([^,]*[,]{0,1}){0,3}" inputfile
This might work for you (GNU sed):
sed 's/,/\n/3;P;D' file
Replace the third , with a newline, print and delete the first line, then repeat on what remains.

Use sed to extract ascii hex string from a file

I have a file that looks like this:
$ some random
$ text
00ab2c3f03$ and more
random text
1a2bf04$ more text
blah blah
and the code that looks like this:
sed -ne 's/\(.*\)$ and.*/\1/p' "file.txt" > "output1.txt"
sed -ne 's/\(.*\)$ more.*/\1/p' "file.txt" > "output2.txt"
That gives me this 00ab2c3f03 and this 1a2bf04
So it extracts anything from the beginning of the line to the shell prompt and stores it in the file, twice for two different instances.
The problem is that the file sometimes looks like this:
/dir # some random
/dir # text
00ab2c3f03/dir # and more
random text
345fabd0067234234/dir # more text
blah blah
And I want to make a universal extractor that either:
extracts data from the beginning of the line to the '$' OR '/' characters
intelligently extracts random amount of random hex data from the beginning of the line up to the first non-hex digit
But I'm not so good with sed to actually think of an easy solution by myself...
I think you want the output like this,
$ cat file
$ some random
$ text
00ab2c3f03$ and more
random text
1a2bf04$ more text
blah blah
/dir # some random
/dir # text
00ab2c3f03/dir # and more
random text
345fabd0067234234/dir # more text
blah blah
$ sed -ne 's/\([a-f0-9]*\).* and more.*/\1/p' file
00ab2c3f03
00ab2c3f03
$ sed -ne 's/\([a-f0-9]*\).* more text.*/\1/p' file
1a2bf04
345fabd0067234234
You could also try the GNU sed commands below. Because / is present in your input, I changed the sed delimiter to ~,
$ sed -nr 's~([a-f0-9]*)\/*\$*.* and more.*~\1~p' file
00ab2c3f03
00ab2c3f03
$ sed -nr 's~([a-f0-9]*)\/*\$*.* more text.*~\1~p' file
1a2bf04
345fabd0067234234
Explanation:
([a-f0-9]*) - Captures the hex digits and stores them in a group.
The OP said a / or $ symbol may appear just after the hex digits, so the regex has \/*\$* (/ zero or more times, $ zero or more times) after the capturing group.
The first command only acts on lines that contain the string and more.
The second one only acts on lines that contain more text, because the OP wants the two outputs in two different files.
This seems better to me:
sed -nr 's#([[:xdigit:]]+)[$/].*#\1#p' file
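For seds without -r, roughly the same extraction can be written with a basic regular expression. A minimal sketch, assuming the hex string always starts at the beginning of the line and is followed immediately by $ or /; extract_hex is an invented name, and | is used as the delimiter so that / needs no escaping.

```shell
# Hypothetical helper: print the leading hex string of every line where it is
# directly followed by "$" or "/"; all other lines are skipped.
extract_hex() {
  sed -n 's|^\([[:xdigit:]]\{1,\}\)[$/].*|\1|p' "$@"
}
```

Unlike the two-command version in the question, this prints all matches to one stream, so splitting them into output1.txt and output2.txt would still need a separate step.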

Need help in displaying the output using shell script [duplicate]

Please help me in solving the below issue. I have a file:
mat rat
mat dog
mat matress
I need to display
rat
dog
matress
I have coded with sed command to display the output: sed "s/$up//g"
($up will contain mat) . But using this command, I am getting the output as
rat
dog
ress
What do I do to resolve this?
Please help.
The /g flag tries to apply the substitution command multiple times for each line. First two lines are fine because the word only appears once, but for the third line it will remove both.
You can solve it being more specific using zero-width assertions, like ^, or the GNU extension \b, like:
sed "s/^$up//g"
or
sed "s/$up\b//g"
Although the easiest fix may be to remove the flag, like:
sed "s/$up//"
In all three cases the result is the same, at least for this kind of simple examples.
Using awk
awk '{print $NF}' inputFile
Test:
$ cat text
mat rat
mat dog
mat matress
$ awk '{print $NF}' text
rat
dog
matress
Your current command will remove all instances of $up anywhere, including multiple occurrences in a line and occurrences in the middle of a line.
If you want to match only $up at the very beginning of a line, and only when it is a whole (whitespace-delimited) word, try the following command:
sed "s/^$up\>//"
In GNU sed, ^ matches the beginning of a line, and \> matches the end of a word (the zero-width position between a word character and a non-word character, such as whitespace).
If there might be whitespace before $up, you can use
sed "s/\(\s*\)$up\>/\1/"
This will remove just the $up and preserve all whitespace.
If you don't want to keep the whitespace between $up and the text after it, you can replace \> with \s\+, which matches one or more (\+) whitespace characters (\s); i.e.,
sed "s/^$up\s\+//"
sed "s/\(\s*\)$up\s\+/\1/"
sed 's/^mat //' /path/to/file should do the trick. Note that there is no g: it's s/foo/bar/, not s/foo/bar/g. Also, the ^ anchors the replacement to the beginning of each line.
If you are indeed assigning a variable such as $up, you can use sed "s/^$up//" /path/to/file.
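Pulling the suggestions above together, a portable variant that avoids the GNU-only \s, \b and \> might look like this. It is a sketch only: strip_leading_word is a made-up name, and the word is assumed to contain no regex metacharacters and no / character.

```shell
# Hypothetical helper: remove WORD plus the whitespace after it,
# but only when WORD is a whole word at the very start of the line.
strip_leading_word() {
  # ${1} is the word; [[:space:]]\{1,\} requires whitespace right after it,
  # so "mat" is not stripped from the front of "matress".
  sed "s/^${1}[[:space:]]\{1,\}//"
}
```

With up=mat, strip_leading_word "$up" < file prints rat, dog and matress for the sample input.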

Split a line of text in bash using SOH delimiter

How do I split a line of text in bash using SOH delimiter?
You can use the octal value of SOH as a delimiter using awk.
$ awk -F"\001" '{print $1}' file
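To look at every field rather than just the first, the same -F setting can be combined with a loop over NF. A small sketch, with soh_fields as an invented name; it assumes SOH is the byte with octal code 001.

```shell
# Hypothetical helper: print each SOH-separated field of every input line
# on its own line, prefixed with its position.
soh_fields() {
  awk -F '\001' '{ for (i = 1; i <= NF; i++) print i ": " $i }' "$@"
}
```

For a quick try without a real input file, printf can generate the delimiter: printf 'a\001b c\001d\n' | soh_fields.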
You can set the IFS variable to change the delimiter between words. (Don't forget to save the old value so you can restore it when you're done.)
Based on a quick google I assume "SOH delimiter" is the character with code 1, so you need to get that odd character into IFS:
IFS=`echo -e '\01'`
If that's not enough, you probably need to expand on "split a line of text". What do you want to split it into?

Resources