How to edit file content using zsh terminal? - terminal

I created an empty directory on zsh and added a file
called hello.rb by doing the following:
echo 'Hello, world.' >hello.rb
If I want to make changes in this file using the terminal
what's the proper way of doing it without opening the file
itself using let's say TextEditor?
I want to be able to make changes in the file hello.rb strictly
by using my zsh terminal, is this at all possible?

Zsh is not a terminal but a shell. The terminal is the window in which the shell runs. The shell is the text program that prompts you for commands and executes them.
If you want to edit the file within the terminal, then using vim, nano, emacs -nw or any other text-mode text editor will do it. They are not Zsh commands, but external commands that you can call from Zsh or from any other shell.
If you want to edit the file within Zsh, then use zed. You will need to run once (in ~/.zshrc)
autoload zed
and then you can edit hello.rb using:
zed hello.rb
(exit and save with Control-j)

You have already created and edited the file.
To edit it again, you can use the >> to append.
For example
echo "\nAnd you too!\n" >> hello.rb
This would edit the file by concatenating the additional string.
Edit: of course, whether this counts depends on what you mean by 'changing' a file, but it is the simplest way to do so using only the shell.
For ordinary editing, though, you probably want to use a terminal editor.
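As a minimal sketch of editing purely with redirection, the snippet below creates the file, appends a line, and then replaces a word by writing to a temporary file and moving it into place. The scratch directory and the sed substitution are illustrative assumptions, not part of the original question; the commands are plain POSIX shell and behave the same in zsh.

```shell
# Work in a scratch directory so no real file is touched (assumption for the demo).
workdir=$(mktemp -d)
file="$workdir/hello.rb"

echo 'Hello, world.' > "$file"             # '>' creates or overwrites the file
printf '%s\n' 'And you too!' >> "$file"    # '>>' appends one line

# Change existing text: write the edited copy to a temp file, then replace the original.
sed 's/world/zsh/' "$file" > "$file.tmp" && mv "$file.tmp" "$file"

cat "$file"
```

Writing to a separate file first matters: redirecting sed's output straight back onto its own input would truncate the file before sed reads it.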

Zed is a great answer, but to be even more stripped down (for a level of editing that even a script can do), zsh can handle all 256 byte values (including null) in variables. This means you can edit almost any kind of file data, line by line or chunk by chunk, directly from the command line. This is approximately what zed/vared does. If you have a current version with all the standard modules included, it is a great benefit to have zsh/mapfile or zsh/system loaded so that you can capture any of the characters that are left out by command substitution (zed uses $(<$file) to read a file into memory). Here is an example of how you could use this variable manipulation method:
% typeset -T Buffer buffer $'\n'
% typeset -T Edit edit $'\n'
It is most common to use newline to divide a text file one wishes to edit.
This handy feature will make zsh give you full access to one line or a range of lines at a time without unintentionally messing with the data.
% zmodload zsh/mapfile
% Buffer=$mapfile[path/to/file]
Here, I use the handy mapfile module because I can load the contents of a file byte-for-byte. Alternately you can use % Buffer="$(<path/to/file)", like zed does, but you will always have the trailing newlines removed and other word splitting is possible with a typo or environment variation, so the simplicity of the module's method is best. When finished, you save the changes by simply assigning the $Buffer value back to the $mapfile[file] or use a more classic command like printf '%s' $Buffer >path/to/file (this is exact string writing, byte-for-byte, so any newlines or formatting you added back will be written).
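The trailing-newline loss mentioned above is easy to see in isolation. This is a small sketch in plain POSIX shell, using cat inside the command substitution instead of zsh's $(<file) (the stripping behaviour is the same); the temporary file names are assumptions made for the demonstration.

```shell
tmp=$(mktemp)
printf 'one\n\n\n' > "$tmp"            # 3 characters of text plus 3 newlines = 6 bytes

content=$(cat "$tmp")                  # command substitution drops ALL trailing newlines
printf '%s' "$content" > "$tmp.copy"   # writes back exactly what the variable holds

orig_size=$(wc -c < "$tmp" | tr -d ' ')
copy_size=$(wc -c < "$tmp.copy" | tr -d ' ')
echo "original: $orig_size bytes, copy: $copy_size bytes"   # original: 6 bytes, copy: 3 bytes
```

A round trip through command substitution is therefore not byte-exact, which is the case for preferring a byte-for-byte read when the file must be written back unchanged.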
You transfer the lines between Buffer and Edit using the mapped arrays as follows. Remember, however, that in its simplest form assigning one array to another drops elements that are completely empty (one \n \n two \n three becomes one \n two \n three). You can suppress this empty-element removal by quoting the input array and adding an '@' symbol to its index, "$buffer[@]", if using the whole array, or by adding the '@' symbol to the flags if using a range of the array, "${(@)buffer[2,50]}". Preserving empty lines can be a bit troublesome for typing, but these multiple arrays should only be needed in a script or function, since from the command line you can just edit one line at a time with buffer[54]="echo This is a newly written line."
% edit=($buffer[50,70])
...
% buffer[50,70]=($edit)
This is standard Zsh syntax, which means that in the ... area you can edit and manipulate the $edit array of lines or the $Edit scalar block of text all you want, including adding more lines or taking some away. When you add the lines back into $buffer, they replace the specified block of lines (50-70), and the other array elements are automatically shifted up or down to accommodate the reintegrated lines. Because of this dynamic resizing, you can also just insert whatever you need as a new line like this: buffer[40]=("new string as new line" "$buffer[40]"). This inserts it before the index given, while swapping the order of the elements ("$buffer[40]" "new string as new line") inserts the new line after the index given. Either will move all following elements, including totally empty elements, to their current index plus one.
If you wanted to re-write the zed function to use this method in some complex way like: newzed /path/to/file [start-line] [end-line], that would be great and handy too.
Before I leave, I wanted to mention that when using vared directly, once you have these commands typed on the interactive terminal, you may find it frustrating that you can't use "Enter" for inserting or appending new lines. I found that with my terminal and Zsh version ESC-ENTER worked well, but I don't know about older versions (Mac usually comes stocked with a not-most-recent version, if my memory is right). If that doesn't work, you may have to do some documentation digging to learn how to set up your ZLE (Zsh Line Editor, a component of Zsh) or acquire a newer version of Zsh.
Also, some other shells may count by the byte when indexing a scalar variable, because in ASCII and C a byte is the same as a character. Zsh supports UTF-8 and will index a scalar string by the UTF-8 character unless you turn off the shell option multibyte (on by default). Turning it off will help with manipulating each line if you need the old byte-per-character indexing.
Finally, if you have a version of Zsh that for whatever reason was not compiled with zsh/mapfile or zsh/system, you can achieve a similar effect using a number of options to the read builtin, like <path/to/file |read -u 0 -k $[5 * 2**20] -r -s Buffer ||(($#Buffer)). As you can see here, you have to make the read length big enough to accommodate the file's size or it will leave off part of the file, and the read return code will nearly always be an error because it cannot read the full requested length. We fix this with ||(($#Buffer)), but this builtin was simply not meant to handle large-scale byte manipulation efficiently, so what you see is what you can get from it.

Related

Can't seem to use more than one -c argument for tesseract

I'm just using tesseract through bash scripting. I've finally come up with all the settings that recognize my text for my images nearly perfectly; however, I can't seem to use all of the options together. My command is as follows:
$ tesseract infile.tif outputbase --psm 6 -c tosp_min_sane_kn_sp=0.0;tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-+&/\
I need the whitelist, because tesseract is picking up some lowercase characters, strange characters (such as yen sign), and other oddities. My images do not contain those characters, and since my document is quite simple I figured it would just be easier to whitelist the ones that do exist. Additionally, the image is in a "table" format (without any lines or borders), and tesseract only picks up the large spaces (which separate columns) and not individual spaces in between words within a column. Setting the tosp value to 0 seemed to fix that problem.
Now the issue is that tesseract won't process both of those -c arguments at the same time, but the man page explicitly states that you can use multiple -c arguments!
I've also tried to work around in the following way:
my_config_file
tosp_min_sane_kn_sp 0.0
tessedit_char_whitelist ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-+&/\
$ tesseract infile.tif outputbase --psm 6 my_config_file
The config file is saved in the correct directory, but again only one of the options will work at a time. If both options are in the config file, it seems like it ignores the tosp_min_sane_kn_sp 0.0. If I remove one, then the other works.
I'm pulling out my hair here, and I'm about to just work around this issue by running the OCR twice and then merging the two files with an awk script. I really don't want to do that, however, because it's obviously less efficient, and I don't like the idea of relying on awk when the OCR output isn't guaranteed to be formatted 100% the way my potential awk script would have to assume.
Please help!
EDIT:
I forgot to mention that I have indeed tried to pass multiple -c options. Instead of guessing various field separators in between variables semicolon made the most sense to me because I understand that tesseract is written in C++ which uses semicolons to signify the end of a line. I know C++ isn't interpreted, but it just seemed to make sense. Now I'm digressing . . .
Additionally, I've tried the advice of putting the whitelist in quotation marks, but that has made no difference. I was really excited because that didn't even occur to me, but it doesn't seem that tesseract even recognizes quotations even if I run that one -c argument by itself.
You can't pass multiple arguments to a single -c option, especially not separated by semicolons. I don't have tesseract, but I'm pretty sure you need to pass a separate -c option for each config variable you want to set:
tesseract infile.tif outputbase --psm 6 -c tosp_min_sane_kn_sp=0.0 -c 'tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-+&/\'
(I also enclosed the second variable setting in single-quotes, so the shell doesn't try to interpret the backslash. Without the quotes, it'd escape the newline, so the next line would be treated as a continuation of this one.)
Explanation of the original problem: When the shell sees a semicolon (and it isn't in quotes or escaped), the shell treats it as a command separator. So it treated the line as two completely separate commands (with the next line combined, because of the backslash):
tesseract infile.tif outputbase --psm 6 -c tosp_min_sane_kn_sp=0.0
tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-+&/ <whatever's on the next line of the file>
The first runs tesseract with one -c option, and the second one creates a shell variable named tessedit_char_whitelist. And even if you quoted or escaped it, so the semicolon got passed to tesseract, I suspect it wouldn't treat it as a separator the way you want it to.
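The splitting described above can be reproduced without tesseract. In this sketch, show_args is a hypothetical stand-in that only prints the arguments it receives, and the whitelist is shortened to ABC for readability; nothing here calls the real tesseract binary.

```shell
# show_args is a hypothetical stand-in for tesseract: it prints each argument it receives.
show_args() { for arg in "$@"; do printf '[%s]\n' "$arg"; done; }

# Unquoted ';' ends the command. The text after it runs as a separate
# command, which here is just a shell variable assignment.
show_args -c tosp_min_sane_kn_sp=0.0;tessedit_char_whitelist=ABC
echo "shell variable was set to: $tessedit_char_whitelist"

# One -c per setting, with the value in single quotes so that '&' and the
# trailing backslash reach the program untouched.
show_args -c tosp_min_sane_kn_sp=0.0 -c 'tessedit_char_whitelist=ABC-+&/\'
```

The first call reports only two arguments (-c and the tosp setting), while the second reports all four.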

Weird txt behavior

I have a centos server. I cloned a GitHub repository. And I have .txt file in that repository which contains 1 line. For some reason it does that:
[root@0-0-0-0 Some]# cat some.txt
some text[root@0-0-0-0 Some]#
Also, while read i; do echo "$i"; done < some.txt doesn't see that line. What could cause that, and how can I avoid it? If I edit the file with vim, adding a new line and then deleting that new line (so it still contains only one line), it starts to work properly.
The text file has no newline character at the end of it. Some programs will treat it as a valid text file whose last line doesn't happen to end in a newline. Others are stricter: bash's built-in read still stores the unterminated text in the variable, but it returns a nonzero status on reaching end-of-file, so a while read loop exits without running its body for that last "line" (which isn't considered a line because it's not terminated as one).
vim's default behavior is to quietly add a newline to the end of a file if you modify and save it.
You can add a newline to a file that lacks one by editing it with vim (or another editor that behaves similarly), or by adding it from the shell:
echo '' >> some.txt
In general, it's a good idea to ensure that text files end in a newline character in the first place, at least if they're intended to be used on UNIX-like systems.
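If the file cannot be fixed up front, a common idiom is to make the loop tolerate a missing final newline. The sketch below uses a temporary file as a stand-in for some.txt (an assumption for the demo) and also shows one way to append the newline only when it is actually missing.

```shell
f=$(mktemp)
printf 'some text' > "$f"      # one line, deliberately without a trailing newline

count=0
while IFS= read -r line; do count=$((count + 1)); done < "$f"
echo "plain loop saw $count line(s)"      # plain loop saw 0 line(s)

# read fails at end-of-file but still fills $line, so also accept a non-empty $line.
count=0
while IFS= read -r line || [ -n "$line" ]; do count=$((count + 1)); done < "$f"
echo "guarded loop saw $count line(s)"    # guarded loop saw 1 line(s)

# Append a newline only if the last byte is not already one
# (command substitution strips a trailing newline, leaving an empty string).
if [ -n "$(tail -c 1 "$f")" ]; then echo >> "$f"; fi
```

Unlike an unconditional echo '' >> some.txt, the final check can be run repeatedly without adding blank lines to a file that is already well-formed.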

parameter expansion using bang dollar (`!$`)

Is there any way to use !$ in a parameter expansion context? The desired usage that motivates this question is rapid (in terms of keystrokes) alteration of the name of a file (e.g., instead of saving the file name in a variable and executing rsvg-convert $svg > ${svg/.svg/.png}, one could instead use rsvg-convert !$ > !${/.svg/.png}, where !${/.svg/.png} is erroneous syntax intimating the desired effect; when the file in question was the last token on the preceding line, such a command can often be typed more quickly than alternatives like using tab completion in the presence of files sharing prefixes of varying length, or copying and pasting the file name by selecting with a mouse). As far as I can tell, there is no way to employ !$ in such a context, but perhaps through some chicanery a similar effect could be achieved.
Depending on how sophisticated you want the substitution, history expansion does support replacing the first occurrence of a string with another. You just precede the substitution with : like:
rsvg-convert !$ > !$:s/.svg/.png
You can see all the history modifiers here
At least in emacs-mode bash will also put the last argument of the previous command inline (not for expansion when you run the command) if you press alt+.. So in this case it might be fastest to type:
rsvg-convert
then press alt+., type >, press alt+. again, delete the extension it just inserted with alt+bksp, and type the new extension: png
If you look further into the modifiers in Eric's example, you could also do:
rsvg-convert !$ > !$:r.png
Assuming .svg is a suffix of course
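History expansion is normally only active in interactive shells, so inside a script the same suffix swap has to come from parameter expansion, as in the question's own ${svg/.svg/.png}. A minimal sketch of the closest equivalent to :r.png, with made-up file names:

```shell
svg='drawing.svg'
png="${svg%.svg}.png"      # strip a trailing ".svg", then append ".png"
echo "$png"                # drawing.png

# Only the suffix is touched, even when ".svg" also appears earlier in the name.
odd='a.svg.backup.svg'
echo "${odd%.svg}.png"     # a.svg.backup.png
```

The % form is anchored to the end of the value, which matches the "assuming .svg is a suffix" caveat, whereas a first-occurrence substitution would rewrite the earlier .svg in the second name.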

Assign BASH variable from file with specific criteria

I have a config file whose last line contains the data I need. I want to assign everything to the RIGHT of the = sign to a variable that I can display and use later in the script.
Example: /path/to/magic.conf:
foo
bar
ThisOption=foo.bar.address:location.555
What would be the best method in a bash shell script to read the last line of the file and assign everything to the right of the equal sign? In this case, foo.bar.address:location.555.
The last line always has what I want to target and there will only ever be a single = sign in the file that happens to be the last line.
Google and searching here yielded many close but not quite relevant results using sed/awk, but I couldn't come up with exactly what I'm looking for.
Use sed:
variable=$(sed -n 's/^ThisOption=//p' /path/to/magic.conf)
echo "The option is: $variable"
This works by finding and removing the ThisOption= marker at the start of the line, and printing the result.
IMPORTANT: This method absolutely requires that the file be trusted 100%. As mentioned in the comments, anytime you "eval" code without any sanitization there are grave risks (a la "rm -rf /" magnitude - don't run that...)
Pure, simple bash. (well...using the tail utility :-) )
The advantage of this method is that you only need to know that the assignment is on the last line of the file. You don't need to know anything else about that line, such as the name of the variable to the left of the = sign, which you would need in order to use the sed option.
assignment_line=$(tail -n 1 /path/to/magic.conf)
eval "${assignment_line}"
var_name=${assignment_line%%=*}
var_to_give_that_value=${!var_name}
Of course, if the var that you want to have the value is the one that is listed on the left side of the "=" in the file then you can skip the last assignment and just use "${!var_name}" wherever you need it.
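If the file is not fully trusted, the eval step can be avoided altogether by splitting the last line on its first = with parameter expansion. This is a sketch of that alternative rather than the method above; the temporary file stands in for /path/to/magic.conf, and the variable names are illustrative.

```shell
# Stand-in for /path/to/magic.conf (assumption for the demo).
conf=$(mktemp)
printf '%s\n' 'foo' 'bar' 'ThisOption=foo.bar.address:location.555' > "$conf"

last_line=$(tail -n 1 "$conf")
name=${last_line%%=*}      # everything left of the first '='
value=${last_line#*=}      # everything right of the first '='

echo "$name -> $value"     # ThisOption -> foo.bar.address:location.555
```

Nothing from the file is ever executed here, so a hostile last line can at worst produce an odd string in $value.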

jamplus: link command line too long for osx

I'm using jamplus to build a vendor's cross-platform project. On osx, the C tool's command line (fed via clang to ld) is too long.
Response files are the classic answer to command lines that are too long: jamplus states in the manual that one can generate them on the fly.
The example in the manual looks like this:
actions response C++
{
$(C++) ##(-filelist #($(2)))
}
Almost there! If I specifically blow out the C.Link command, like this:
actions response C.Link
{
"$(C.LINK)" $(LINKFLAGS) -o $(<[1]:C) -Wl,-filelist,#($(2:TC)) $(NEEDLIBS:TC) $(LINKLIBS:TC))
}
in my jamfile, I get the command line I need that passes through to the linker, but the response file isn't newline terminated, so link fails (osx ld requires newline-separated entries).
Is there a way to expand a jamplus list joined with newlines? I've tried using the join expansion $(LIST:TCJ=\n) without luck. $(LIST:TCJ=#(\n)) doesn't work either. If I can do this, the generated file would hopefully be correct.
If not, what jamplus code can I use to override the link command for clang and generate the contents on the fly from a list? I'm looking for the least invasive way of handling this: ideally, modifying or overriding the tool directly instead of adding new indirect targets wherever a link is required. Since it's our vendor's codebase, as few edits as possible are desired.
The syntax you are looking for is:
newLine = "
" ;
actions response C.Link
{
"$(C.LINK)" $(LINKFLAGS) -o $(<[1]:C) -Wl,-filelist,#($(2:TCJ=$(newLine))) $(NEEDLIBS:TC) $(LINKLIBS:TC))
}
To be clear (I'm not sure how StackOverflow will format the above), the newLine variable should be defined by typing:
newLine = "" ;
And then placing the caret between the two quotes and hitting enter. You can use this same technique for certain other characters, e.g.
tab = " " ;
Again, start with newLine = "" and then place the caret between the quotes and hit tab. In the above it is actually 4 spaces, which is wrong, but hopefully you get the idea. Another useful one to have is:
dollar = "$" ;
The last one is useful as $ is used to specify variables typically, so having a dollar variable is useful when you actually want to specify a dollar literal. For what it is worth, the Jambase I am using (the one that ships with the JamPlus I am using), has this:
SPACE = " " ;
TAB = " " ;
NEWLINE = "
" ;
Around line 28...
I gave up on trying to use escaped newlines and other language-specific characters within string joins. Maybe there's an awesome way to do that, that was too thorny to discover.
Use a multi-step shell command with multiple temp files.
For jamplus (and maybe other jam variants), the section of the actions response {} between the curly braces becomes an inline shell script. And the response file syntax #(<value>) returns a filename that can be assigned within the shell script, with the contents set to <value>.
Thus, code like:
actions response C.Link
{
_RESP1=#($(2:TCJ=#)#$(NEEDLIBS:TCJ=#)#$(LINKLIBS:TCJ=#))
_RESP2=#()
perl -pe "s/[#]/\n/g" < $_RESP1 > $_RESP2
"$(C.LINK)" $(LINKFLAGS) -o $(<[1]:C) -Wl,-filelist,$_RESP2
}
creates a pair of temp files, assigned to shell variable names _RESP1 and _RESP2. File at path _RESP1 is assigned the contents of the expanded sequence joined with a # character. Search and replace is done with a perl one liner into _RESP2. And link proceeds as planned, and jamplus cleans up the intermediate files.
I wasn't able to do this with characters like :;\n, but # worked as long as it had no adjacent whitespace. Not completely satisfied, but moving on.
