Split line from the end (cross-platform) - bash

I have text that needs to be split; namely, I want to insert a space before the last two characters of the line, so that "4.20GB" becomes "4.20 GB". I know it can be done with sed, awk, etc., but I am looking for a lighter and more cross-platform method (for Linux/Unix/BSD).
Is it possible to do it with bash and its functions? For some reason, I thought printf could do it, but a quick check didn't yield anything positive.

You are looking for "more cross-platform method than sed" and then asking "Is it possible to do it with bash and its functions?"
It's a safe bet that sed is installed (or "easily installable") on more computer architectures than bash, so using sed should be more "cross-platform" than using bash.

If I understand you correctly, each line that ends with some digits followed by GB needs a space added before GB. I wouldn't use the word "split", which suggests you want to split one line into two lines.
Try:
sed -i 's/GB$/ GB/' [filenames ...]
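I think that sed is more "cross-platform" than bash, because wherever you have bash you will easily have sed too, as @fuxoft says in his answer. Note that the -i option itself is not standardized: GNU sed accepts it bare, while BSD sed expects a backup suffix argument (possibly empty, as in -i '').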
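If the goal is to avoid external tools altogether, plain parameter expansion can do the job. A minimal sketch, assuming the unit is always exactly the last two characters and the input has at least two characters; the function name split_unit and the variable names are made up for illustration:

```shell
# Insert a space before the last two characters of a string.
# Only POSIX parameter expansion is used, so this should behave the
# same in bash, dash, ksh and other sh-compatible shells.
split_unit() {
  s=$1
  number=${s%??}          # everything except the last two characters
  unit=${s#"$number"}     # what is left: the last two characters
  printf '%s %s\n' "$number" "$unit"
}

result=$(split_unit "4.20GB")
echo "$result"
```

Because no process other than the shell is involved, this is about as light as it gets, at the cost of handling one string at a time rather than a whole file.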


Combine two expression in Bash

I did check the ABS, but it was hard to find a reference to my problem/question there.
Here it is. Consider the following code (which extracts the first character of OtherVar and then converts MyChar to uppercase):
OtherVar=foobar
MyChar=${OtherVar:0:1} # get first character of OtherVar string variable
MyChar=${MyChar^} # first character to upper case
Could I somehow condense the second and third line into one statement?
P.S.: As was pointed out below, one needs to have a named variable. I should add that I would like to avoid sub-shells and the like, and I would also accept a somewhat hacky way to achieve the desired result.
P.P.S.: The question is purely educational.
You could do it all-in-one without forking sub-shell or running external command:
printf -v MyChar %.1s "${OtherVar^}"
Or:
read -n1 MyChar <<<"${OtherVar^}"
Another option:
declare -u MyChar=${OtherVar:0:1}
But I can't see the point in such optimization in a bash script.
There are more suitable text processing interpreters, like awk, sed, even perl or python if performance matters.
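For comparison, the three one-step variants can be run side by side. This is only a sketch: it assumes bash 4 or newer (for ${var^} and declare -u), and the variable names A, B and C are arbitrary. Note that it is the precision in %.1s that truncates; a plain width such as %1s only pads and would print the whole string.

```shell
OtherVar=foobar

# printf: the precision (.1) truncates the argument to one character
printf -v A '%.1s' "${OtherVar^}"

# read -n1 stops after one character of the here-string
read -r -n1 B <<<"${OtherVar^}"

# declare -u upper-cases whatever gets assigned to the variable
declare -u C=${OtherVar:0:1}

echo "$A $B $C"
```

None of the three forks a sub-shell or runs an external command.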
You could use the cut command inside a command substitution to get it on one line (this does fork a sub-shell), but I'm not sure it makes the code much clearer:
OtherVar=foobar
MyChar=$(echo "${OtherVar^}" | cut -c1) # uppercase first character and cut string

Understanding 'sed' command

I am currently trying to install GCC-4.1.2 on my machine: Fedora 20.
In the instruction, the first three commands involve using 'sed' commands, for Makefile modification. However, I am having difficulty in using those commands properly for my case. The website link for GCC-4.1.2.
The commands are:
sed -i 's/install_to_$(INSTALL_DEST) //' libiberty/Makefile.in &&
sed -i 's#\./fixinc\.sh#-c true#' gcc/Makefile.in &&
sed -i 's/#have_mktemp_command#/yes/' gcc/gccbug.in &&
I am trying to understand them by reading the 'sed' man page, but it is not so easy to do so. Any help/tip would be appreciated!
First, the shell part: &&. That just chains the commands together, so each subsequent line will only be run if the prior one is run successfully.
sed -i means "edit the file in place", that is, modify the file directly instead of printing the changed contents to STDOUT. Each sed command here (the string) is a substitute command, which we can tell because the command starts with s.
Substitute looks for a piece of text in the file, and then replaces it. So the order is always s/needle/replacement/. See how the first and last lines have those same forward-slashes? That's the traditional delimiter between the command (substitute), the needle to find in the haystack (install_to_$(INSTALL_DEST) followed by a space), and the text to replace it with (nothing, in this case).
So, the first one looks for the string and deletes it (the empty replacement). The last one looks for #have_mktemp_command# and replaces it with yes.
The middle one is a bit weird. See how it starts with s# instead of s/? Well, sed will let you use any delimiter you like to separate the needle from the replacement. Since this needle had a / in it (\./fixinc\.sh), it made sense to use a different delimiter than /. It will replace the text ./fixinc.sh with -c true.
Last note: Why does the second needle have \. instead of .? Well, the needle is a regular expression (even though your examples barely use any regex features), and in a regular expression some characters are magical and do magical fairy dust operations. One of those magic characters is the period. To avoid the magic, we put a \ in front of it, escaping away from the magic. (The magic is "match any character", and we want a literal period. That's why.)
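To see each substitution in action without touching any Makefile, the same sed commands can be fed sample text on standard input. The input lines below are invented for illustration and are not taken from the GCC sources:

```shell
# 1. Empty replacement: the needle is simply deleted
r1=$(printf '%s\n' 'install: install_to_$(INSTALL_DEST) install-subdir' |
     sed 's/install_to_$(INSTALL_DEST) //')

# 2. '#' as delimiter, because the needle itself contains a '/'
r2=$(printf '%s\n' 'SHELL ./fixinc.sh' | sed 's#\./fixinc\.sh#-c true#')

# 3. Here '/' is the delimiter, so '#' is just ordinary text
r3=$(printf '%s\n' 'HAVE_MKTEMP=#have_mktemp_command#' |
     sed 's/#have_mktemp_command#/yes/')

# 4. An unescaped '.' matches any character; '\.' matches only a period
r4=$(printf '%s\n' 'aXb a.b' | sed 's/a.b/HIT/')
r5=$(printf '%s\n' 'aXb a.b' | sed 's/a\.b/HIT/')

printf '%s\n' "$r1" "$r2" "$r3" "$r4" "$r5"
```

Dropping -i like this is a handy way to preview a substitution before letting sed rewrite a real file.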

search&replace on huge txt files

I need a text processing tool that can perform search and replace operations PER LINE on HUGE TEXT FILES (>0.5 GB). It can be either Windows or Linux based. (I don't know if there is anything like a streamreader/writer in Linux, but I have a feeling that it would be the ideal solution. The editors I have tried so far load the whole file into memory.)
Bonus question: a tool that can MERGE two huge texts on a per line basis, separated with e.g. tabs
Sounds like you want sed. For example,
sed 's/foo/bar/' < big-input-file > big-output-file
should replace the first occurrence of foo by bar in each line of big-input-file, writing the results to big-output-file.
Bonus answer: I just learned about paste, which seems to be exactly what you want for your bonus question.
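A small sketch of both tools, using two tiny throwaway files as stand-ins for the huge ones (the file names and contents are made up for illustration):

```shell
# Two small stand-in files in a temporary directory
dir=$(mktemp -d)
printf 'a1\na2\n' > "$dir/fileA"
printf 'b1\nb2\n' > "$dir/fileB"

# paste reads both files in step and joins matching lines with a tab
merged=$(paste "$dir/fileA" "$dir/fileB")
printf '%s\n' "$merged"

# sed likewise works through its input one line at a time
replaced=$(sed 's/a/X/' < "$dir/fileA")
printf '%s\n' "$replaced"

rm -r "$dir"
```

With real data you would redirect the output to a file instead of capturing it in a variable, so that nothing large is held in memory.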
'sed' is built into Linux/Unix, and is available for Windows. I believe that it only loads a buffer at a time (not the whole file) -- you might try that.
What would you be trying to do with the merge -- interleaved in some way, rather than just concatenating?
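Add: interleave.pl
use strict;
use warnings;
# Open both inputs with lexical handles and stop if either cannot be opened
open my $in_a, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
open my $in_b, '<', $ARGV[1] or die "Cannot open $ARGV[1]: $!";
while (my $line_a = <$in_a>) {
    print $line_a;
    # Read the matching line from the second file, if there is one
    my $line_b = <$in_b>;
    print $line_b if defined $line_b;
}
close $in_a;
close $in_b;
run: perl interleave.pl fileA fileB > mergedFile
Note that this is a very bare-bones utility. It only checks that both files can be opened, and it expects them to have the same number of lines (extra lines in fileB are ignored).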
I would use perl for this. It is easy to read a file line by line, it has great search/replace available using regular expressions, and it will let you merge, since your perl script can be aware of both files.

Why does sed not replace overlapping patterns

I have a database unload file with fields separated by the <TAB> character. I am running this file through sed to replace any occurrences of <TAB><TAB> with <TAB>\N<TAB>. This is so that when the file is loaded into MySQL the \N is interpreted as NULL.
The sed command 's/\t\t/\t\N\t/g;' almost works except that it only replaces the first instance e.g. "...<TAB><TAB><TAB>..." becomes "...<TAB>\N<TAB><TAB>...".
If I use 's/\t\t/\t\N\t/g;s/\t\t/\t\N\t/g;' it replaces more instances.
I have a notion that despite the /g modifier this is something to do with the end of one match being the start of another.
Could anyone explain what is happening and suggest a sed command that would work or do I need to loop.
I know I could probably switch to awk, perl, python but I want to know what is happening in sed.
Not dissimilar to the perl solution, this works for me using pure sed
With @Robin A. Meade's improvement
sed ':repeat;
s|\t\t|\t\n\t|g;
t repeat'
Explanation
:repeat is a label, used for branch commands, similar to labels in batch files
s|\t\t|\t\n\t|g; - Standard replace of 2 tabs with tab-newline-tab (to insert the literal \N that the question asks for, write \\N in the replacement instead of \n). I still use the global flag because if you have, say, 15 tabs, you will only need to loop twice, rather than 14 times.
t repeat means if the "s" command did any replaces, then goto the label repeat, else it goes onto the next line and starts over again.
So it goes like this. Keep repeating (goto repeat) as long as there is a match for the pattern of 2 tabs.
While the argument can be made that you could just do two identical global replaces and call it good, this same technique could work in more complicated scenarios.
As @thorn-blake points out, sed just doesn't support advanced features like lookahead, so you need to do a loop like this.
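Here is a sketch of the same label-and-branch loop, adapted to insert the literal \N marker from the question rather than a newline. It makes two assumptions for portability: the tab is passed to sed as a literal character (held in the shell variable tab), and each sed command is given as its own -e argument so that the label and the branch end cleanly. The function name fill_nulls is made up for illustration.

```shell
# A literal tab character, so the script does not depend on sed
# understanding \t
tab=$(printf '\t')

fill_nulls() {
  # :r   defines the label
  # s    replaces tab-tab with tab-\N-tab (\\N is a literal backslash + N)
  # t r  jumps back to :r whenever the s command changed something
  sed -e ':r' -e "s/${tab}${tab}/${tab}\\\\N${tab}/g" -e 't r'
}

result=$(printf 'a\t\t\tb\n' | fill_nulls)
printf '%s\n' "$result"
```

For the three consecutive tabs in the sample line, the first pass fills one gap, the second pass fills the other, and the third pass finds nothing left to replace, so sed moves on.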
Original Answer
sed ':repeat;
/\t\t/{
s|\t\t|\t\n\t|g;
b repeat
}'
Explanation
:repeat is a label, used for branch commands, similar to labels in batch files
/\t\t/ means match the pattern of 2 tabs. If the pattern is matched, the command following the second / is executed.
{} - In this case the command following the match command is a group. So all of the commands in the group are executed if the match pattern is met.
s|\t\t|\t\n\t|g; - Standard replace 2 tabs with tab-newline-tab. I still use the global because if you have say 15 tabs, you will only need to loop twice, rather than 14 times.
b repeat means always goto (branch) the label repeat
Short version
Which can be shortened to
sed ':r;s|\t\t|\t\n\t|g; t r'
# Original answer
# sed ':r;/\t\t/{s|\t\t|\t\n\t|g; b r}'
MacOS
And the Mac (yet still Linux/Windows compatible) version:
sed $':r\ns|\t\t|\t\\\n\t|g; t r'
# Original answer
# sed $':r\n/\t\t/{ s|\t\t|\t\\\n\t|g; b r\n}'
Tabs need to be literal in BSD sed
Newlines need to be both literal and escaped at the same time, hence the single backslash (that's \\ before it is processed by the $'...' quoting, which turns it into a single literal backslash) plus the \n, which becomes an actual newline
Both label names (:r) and branch commands (b r when not the end of the expression) must end in a newline. Special characters like semicolons and spaces are consumed by the label name/branch command in BSD, which makes it all very confusing.
I know you want sed, but sed doesn't like this at all, it seems that it specifically (see here) won't do what you want. However, perl will do it (AFAIK):
perl -pe 'while (s#\t\t#\t\n\t#) {}' <filename>
As a workaround, replace every tab with tab + \N; then remove all occurrences of \N which are not immediately followed by a tab.
sed -e 's/\t/\t\\N/g' -e 's/\\N\([^\t]\)/\1/g'
... provided your sed uses backslash before grouping parentheses (there are sed dialects which don't want the backslashes; try without them if this doesn't work for you.)
Right, even with /g, sed will not rescan the text it has just replaced. Thus, it has read <TAB><TAB> and output <TAB>\N<TAB>, and then it reads the next thing in from the input stream. See http://www.grymoire.com/Unix/Sed.html#uh-7
In a regex language that supports lookaheads, you can get around this with a lookahead.
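The effect is easier to see with a visible character. In this sketch a colon stands in for the tab and a plain N for the marker (both substitutions are made purely for readability):

```shell
# One global pass: the first two colons are replaced, and scanning
# resumes at the third colon, which no longer has a partner.
once=$(printf 'x:::y\n' | sed 's/::/:N:/g')

# A second pass sees the colon written by the first pass and pairs it
# with the one that was left over.
twice=$(printf 'x:::y\n' | sed 's/::/:N:/g;s/::/:N:/g')

printf '%s\n' "$once" "$twice"
```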
Well, sed simply works as designed. The input line is scanned once, not multiple times. Maybe it helps to look at the consequences if sed rescanned the input line by default to deal with overlapping patterns: in that case even simple substitutions would work quite differently (some might say counter-intuitively), e.g.
s/^/ / inserting a space at the beginning of a line would never terminate
s/$/foo/ appending foo to each line - likewise
s/[A-Z][A-Z]*/CENSORED/ replacing uppercase words with CENSORED - likewise
There are probably many other situations. Of course these could all be remedied with, say, a substitution modifier, but at the time sed was designed, the current behavior was chosen.

Search and replace in Shell

I am writing a shell (bash) script and I'm trying to figure out an easy way to accomplish a simple task.
I have some string in a variable.
I don't know if this is relevant, but it can contain spaces, newlines, because actually this string is the content of a whole text file.
I want to replace the last occurrence of a certain substring with something else.
Perhaps I could use a regexp for that, but there are two points that confuse me:
I need to match from the end, not from the start
the substring that I want to scan for is fixed, not variable.
For truncating at the start: ${var#pattern}
For truncating at the end: ${var%pattern}
For general replacement: ${var/pattern/repl}
The patterns are 'filename'-style (glob) patterns, and in the last form the pattern can be prefixed with # or % to match only at the start or end (respectively).
It's all in the (long) bash manpage. Check the "Parameter Expansion" section.
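Combining the two truncation forms gives a way to replace only the last occurrence of a fixed substring. This is a sketch rather than a tested library routine; the function name replace_last and its argument order are made up for illustration:

```shell
# replace_last TEXT NEEDLE REPLACEMENT
# Replaces the last occurrence of the fixed string NEEDLE in TEXT.
replace_last() {
  case $1 in
    *"$2"*)
      # ${1%"$2"*}  : cut the shortest suffix starting at NEEDLE,
      #               leaving everything before the last occurrence
      # ${1##*"$2"} : cut the longest prefix ending in NEEDLE,
      #               leaving everything after the last occurrence
      printf '%s\n' "${1%"$2"*}$3${1##*"$2"}" ;;
    *)
      # NEEDLE does not occur: return the text unchanged
      printf '%s\n' "$1" ;;
  esac
}

MyString="stuff ttto be edittted"
NewString=$(replace_last "$MyString" ttt xxx)
echo "$NewString"
```

Quoting the needle inside the expansion makes the shell treat it literally rather than as a glob, and since * in a shell pattern also matches newlines, the text may span several lines.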
An expression like this
s/match string here$/new string/
should do the trick: s is for substitute, the / characters separate the parts of the command, and $ is the end-of-line marker. You can try this in vi to see if it does what you need.
I would look up the man pages for awk or sed.
Javier's answer is shell specific and won't work in all shells.
The sed answers that MrTelly and epochwolf alluded to are incomplete and should look something like this:
MyString="stuff ttto be edittted"
NewString=`echo "$MyString" | sed -e 's/\(.*\)ttt\(.*\)/\1xxx\2/'`
The reason this works without having to use the $ to mark the end is that the first '.*' is greedy and will attempt to gather up as much as possible while allowing the rest of the regular expression to be true.
This sed command should work fine in any shell context used.
Usually when I get stuck with Sed I use this page,
http://sed.sourceforge.net/sed1line.txt
