jamplus: link command line too long for osx - macos

I'm using jamplus to build a vendor's cross-platform project. On osx, the C tool's command line (fed via clang to ld) is too long.
Response files are the classic answer to command lines that are too long: jamplus states in the manual that one can generate them on the fly.
The example in the manual looks like this:
actions response C++
{
$(C++) ##(-filelist #($(2)))
}
Almost there! If I specifically blow out the C.Link command, like this:
actions response C.Link
{
"$(C.LINK)" $(LINKFLAGS) -o $(<[1]:C) -Wl,-filelist,#($(2:TC)) $(NEEDLIBS:TC) $(LINKLIBS:TC))
}
in my jamfile, I get the command line I need that passes through to the linker, but the response file isn't newline terminated, so link fails (osx ld requires newline-separated entries).
Is there a way to expand a jamplus list joined with newlines? I've tried using the join expansion $(LIST:TCJ=\n) without luck. $(LIST:TCJ=#(\n)) doesn't work either. If I can do this, the generated file would hopefully be correct.
If not, what jamplus code can I use to override the link command for clang, and generate the contents on the fly from a list? I'm looking for the least invasive way of handling this - ideally, modifying/overriding the tool directly, instead of adding new indirect targets wherever a link is required - since it's our vendor's codebase, as little edit as possible is desired.

The syntax you are looking for is:
newLine = "
" ;
actions response C.Link
{
"$(C.LINK)" $(LINKFLAGS) -o $(<[1]:C) -Wl,-filelist,#($(2:TCJ=$(newLine))) $(NEEDLIBS:TC) $(LINKLIBS:TC))
}
To be clear (I'm not sure how StackOverflow will format the above), the newLine variable should be defined by typing:
newLine = "" ;
And then placing the carat between the two quotes and hitting enter. You can use this same technique for certain other characters, i.e.
tab = " " ;
Again, start with newLine = "" and then place carat between the quotes and hit tab. In the above it is actually 4 spaces which is wrong, but hopefully you get the idea. Another useful one to have is:
dollar = "$" ;
The last one is useful as $ is used to specify variables typically, so having a dollar variable is useful when you actually want to specify a dollar literal. For what it is worth, the Jambase I am using (the one that ships with the JamPlus I am using), has this:
SPACE = " " ;
TAB = " " ;
NEWLINE = "
" ;
Around line 28...

I gave up on trying to use escaped newlines and other language-specific characters within string joins. Maybe there's an awesome way to do that, that was too thorny to discover.
Use a multi-step shell command with multiple temp files.
For jamplus (and maybe other jam variants), the section of the actions response {} between the curly braces becomes an inline shell script. And the response file syntax #(<value>) returns a filename that can be assigned within the shell script, with the contents set to <value>.
Thus, code like:
actions response C.Link
{
_RESP1=#($(2:TCJ=#)#$(NEEDLIBS:TCJ=#)#$(LINKLIBS:TCJ=#))
_RESP2=#()
perl -pe "s/[#]/\n/g" < $_RESP1 > $_RESP2
"$(C.LINK)" $(LINKFLAGS) -o $(<[1]:C) -Wl,-filelist,$_RESP2
}
creates a pair of temp files, assigned to shell variable names _RESP1 and _RESP2. File at path _RESP1 is assigned the contents of the expanded sequence joined with a # character. Search and replace is done with a perl one liner into _RESP2. And link proceeds as planned, and jamplus cleans up the intermediate files.
I wasn't able to do this with characters like :;\n, but # worked as long as it had no adjacent whitespace. Not completely satisfied, but moving on.

Related

Can't seem to use more than one -c argument for tesseract

I'm just using tesseract through bash scripting. I've finally come up with all the settings that recognize my text for my images nearly perfectly; however, I can't seem to use all of the options together. My command is as follows:
$ tesseract infile.tif outputbase --psm 6 -c tosp_min_sane_kn_sp=0.0;tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-+&/\
I need the whitelist, because tesseract is picking up some lowercase characters, strange characters (such as yen sign), and other oddities. My images do not contain those characters, and since my document is quite simple I figured it would just be easier to whitelist the ones that do exist. Additionally, the image is in a "table" format (without any lines or borders), and tesseract only picks up the large spaces (which separate columns) and not individual spaces in between words within a column. Setting the tosp value to 0 seemed to fix that problem.
Now the issue is that tesseract won't process with both of those -c arguments at the same time, but the man pages explicitly states that you can use multiple -c arguments!
I've also tried to work around in the following way:
my_config_file
tosp_min_sane_kn_sp 0.0
tessedit_char_whitelist ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-+&/\
$ tesseract infile.tif outputbase --psm 6 my_config_file
The config file is saved in the correct directory, but again only one of the options will work at a time. If both options are in the config file, it seems like it ignores the tosp_min_sane_kn_sp 0.0. If I remove one, then the other works.
I'm pulling out my hair here, and I'm about to just work around this issue by running the OCR twice and then just merging the two files with an awk script. I really don't want to do that, however, because its obviously less efficient and I don't really like the idea of trying to use awk when the OCR isn't guaranteed to be formatted 100% in the way that I'm going to have to assume in my potential awk script.
Please help!
EDIT:
I forgot to mention that I have indeed tried to pass multiple -c options. Instead of guessing various field separators in between variables semicolon made the most sense to me because I understand that tesseract is written in C++ which uses semicolons to signify the end of a line. I know C++ isn't interpreted, but it just seemed to make sense. Now I'm digressing . . .
Additionally, I've tried the advice of putting the whitelist in quotation marks, but that has made no difference. I was really excited because that didn't even occur to me, but it doesn't seem that tesseract even recognizes quotations even if I run that one -c argument by itself.
You can't pass multiple arguments to a single -c option, especially not separated by semicolons. I don't have tesseract, but I'm pretty sure you need to pass a separate -c option for each config variable you want to set:
tesseract infile.tif outputbase --psm 6 -c tosp_min_sane_kn_sp=0.0 -c 'tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-+&/\'
(I also enclosed the second variable setting in single-quotes, so the shell doesn't try to interpret the backslash. Without the quotes, it'd escape the newline, so the next line would be treated as a continuation of this one.)
Explanation of the original problem: When the shell sees a semicolon (and it isn't in quotes or escaped), the shell treats it as a command separator. So it treated the line as two completely separate commands (with the next line combined, because of the backslash):
tesseract infile.tif outputbase --psm 6 -c tosp_min_sane_kn_sp=0.0
tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-+&/ <whatever's on the next line of the file>
The first runs tesseract with one -c option, and the second one creates a shell variable named tessedit_char_whitelist. And even if you quoted or escaped it, so the semicolon got passed to tesseract, I suspect it wouldn't treat it as a separator the way you want it to.

change file extension with a ruby script

I'm trying to change the exstension of a file passing the arguments by console
system = "rename" , "'s/\#{ARGV[0]}$/\#{ARGV[1]}'", "*#{ARGV[1]}"
The code is correct because it works on console but when I put it in a script I have trouble with
s/\#
because it appears in pink and the console does not get it.
you don't want to send literal single quotes, so remove them.
you want to remove the backslashes so you let Ruby evaluate those expressions.
you're missing the trailing slash.
what's that equal sign doing?
did you want ARGV[0] in the last argument to rename, instead of ARGV[1]?
you want to use * wildcard, which requires a shell to expand into a list of files, which means you can't use the list form of system
Try
system "/usr/bin/rename -n 's/#{ARGV[0]}$/#{ARGV[1]}/' *#{ARGV[0]}"
Remove the -n option if it looks like you're going to rename the way you want.
And, of course, you don't need to call out to the shell for this:
Dir.glob("*#{ARGV[0]}").each {|fname|
newname = fname.sub(/#{ARGV[0]}$/, ARGV[1])
File.rename(fname, newname)
}

Bash script " & " symbol creating an issues

I have a tag
<string name="currencysym">$</string>
in string.xml And what to change the $ symbol dynamically. Use the below command but didn't work:
currencysym=₹
sed -i '' 's|<string name="currencysym">\(.*\)<\/string>|<string name="currencysym">'"<!\[CDATA\[${currencysym}\]\]>"'<\/string>|g'
Getting OUTPUT:
<string name="currencysym"><![CDATA[<string name="currencysym">$</string>#x20B9;]]></string>
" & " Has Removed...
But I need:
<string name="currencysym"><![CDATA[₹]]></string>
using xml-parser/tool to handle xml is first choice
instead of <..>\(.*\)<..> better use <..>\([^<]*\)<..> in case you have many tags in one line
& in replacement has special meaning, it indicates the whole match (\0) of the pattern. That's why you see <...>..</..> came to your output. If you want it to be literal, you should escape it -> \&
First problem is the line
currencysym=₹
This actually reads as "assign empty to currencysym and start no process in the background":
In bash you can set an environment variable (or variables) just or one run of a process by doing VAR=value command. This is how currencysym= is being interpreted.
The & symbol means start process in the background, except there is no command specified, so nothing happens.
Everything after # is interpreted as a comment, so #x20B9; is just whitespace from Bash's point of view.
Also, ; is a command separator, like &, which means "run in foreground". It is not used here because it is commented out by #.
You have to either escape &, # and ;, or just put your string into single quotes: currencysym=\&\#x20B9\; or currencysym='₹'.
Now on top of that, & has a special meaning in sed, so you will need to escape it before using it in the sed command. You can do this directly in the definition like currencysym=\\\&\#x20B9\; or currencysym='\₹', or you can do it in your call to sed using builtin bash functionality. Instead of accessing ${currencysym}, reference ${currencysym/&/\&}.
You should use double-quotes around variables in your sed command to ensure that your environment variables are expanded, but you should not double-quote exclamation marks without escaping them.
Finally, you do not need to capture the original currency symbol since you are going to replace it. You should make your pattern more specific though since the * quantifier is greedy and will go to the last closing tag on the line if there is more than one:
sed 's|<string name="currencysym">[^<]*</string>|<string name="currencysym"><![CDATA['"${currencysym/&/\&}"']]></string>|' test.xml
Yields
<string name="currencysym"><![CDATA[₹]]></string>
EDIT
As #fedorqui points out, you can use this example to show off correct use of capture groups. You could capture the parts that you want to repeat exactly (the tags), and place them back into the output as-is:
sed 's|\(<string name="currencysym">\)[^<]*\(</string>\)|\1<![CDATA['"${currencysym/&/\&}"']]>\2|' test.xml
sed -i '' 's|\(<string name="currencysym">\)[^<]*<|\1<![CDATA[\₹]]><|g' YourFile
the group you keep in buffer is the wrong one in your code, i keep first part not the &
grouping a .* is not the same as all untl first < you need. especially with the goption meaning several occurence could occur and in this case everithing between first string name and last is the middle part (your group).
carrefull with & alone (not escaped) that mean 'whole search pattern find' in replacement part

Printf splits a string at spaces using Bash [duplicate]

This question already has answers here:
Why a variable assignment replaces tabs with spaces
(2 answers)
Closed 7 years ago.
I'm having some troubles with the printf function in bash.
I wrote a little script on which I pass a name and two letters (such as "sh", "py", "ht") and it creates a file in the current working directory named "name.extension".
For instance, if I execute seed test py a file named test.py is created in the current working dir with the shebang #!/usr/bin/python3.
So far, so good, nothing fancy: I'm learning shell scripting and I thought this could be a simple exercise to test the knowledge gained so far.
The problem is when I want to create an HTML file. This is the function that I use:
creaHtml(){
head='<!--DOCTYPE html-->\n<html>\n\t<head>\n\t\t<meta charset=\"UTF-8\">\n\t</head>\n\t<body>\n\t</body>\n</html>'
percorso=$CARTELLA_CORRENTE/$NOME_FILE.html
printf $head>>$percorso
chmod 755 $percorso
}
If I run, for instance, seed test ht the correct function (creaHtml) is called, test.html is created but if I try to look into it I only see:
<!--DOCTYPE
And nothing else.
This is the trace for that function:
[sviluppo:~/bin]$ seed test ht
+ creaHtml
+ head='<!--DOCTYPE html-->\n<html>\n\t<head>\n\t\t<meta charset=\"UTF-8\">\n\t</head>\n\t<body>\n\t</body>\n</html>'
+ percorso=/home/sviluppo/bin/test.html
+ printf '<!--DOCTYPE' 'html-->\n<html>\n\t<head>\n\t\t<meta' 'charset=\"UTF-8\">\n\t</head>\n\t<body>\n\t</body>\n</html>'
+ chmod 755 /home/sviluppo/bin/test.html
+ set +x
However, if I try to run printf '<!--DOCTYPE html-->\n<html>\n\t<head>\n\t\t<meta charset=\"UTF-8\">\n\t</head>\n\t<body>\n\t</body>\n</html>' from the terminal, I see the correct output: the "skeleton" of an HTML file neatly displayed with indentation and everything. What am I missing here?
Try echo -e instead of printf. printf is for printing formatted strings. Since you didn't protect $head with quotes, bash splits the string to form the command. The first word (before first white space) forms the format string. The rest are just arguments for things you didn't specify to print.
echo -e "$head" > "$percorso"
The -e evaluates your \n into newlines. I changed your >> to > since it looks like you want this to be the whole file, rather than append to any existing file you might have.
You have to be careful with quotes in bash. One thing can become many things. This actually makes it more powerful, but it can be confusing for people learning. Notice that I also put the file name "$percorso" in double quotes too. This evaluates the variable and makes sure that it ends up as one thing. If you use single quotes, it will be one word, but not evaluated. Unlike Python, there is a big difference between single and double quotes.
If you want to use printf for compatibility as #chepner pointed out, just be sure to quote it:
printf "$head" > "$percorso"
Actually that is much simpler anyway.

Assign BASH variable from file with specific criteria

A config file that the last line contains data that I want to assign everything to the RIGHT of the = sign into a variable that I can display and call later in the script.
Example: /path/to/magic.conf:
foo
bar
ThisOption=foo.bar.address:location.555
What would be the best method in a bash shell script to read the last line of the file and assign everything to the right of the equal sign? In this case, foo.bar.address:location.555.
The last line always has what I want to target and there will only ever be a single = sign in the file that happens to be the last line.
Google and searching here yielded many close but non-relative results with using sed/awk but I couldn't come up with exactly what I'm looking for.
Use sed:
variable=$(sed -n 's/^ThisOption=//p' /path/to/magic.conf)
echo "The option is: $variable")
This works by finding and removing the ThisOption= marker at the start of the line, and printing the result.
IMPORTANT: This method absolutely requires that the file be trusted 100%. As mentioned in the comments, anytime you "eval" code without any sanitization there are grave risks (a la "rm -rf /" magnitude - don't run that...)
Pure, simple bash. (well...using the tail utility :-) )
The advantage of this method, is that it only requires you to know that it will be the last line of the file, it does not require you to know any information about that line (such as what the variable to the left of the = sign will be - information that you'd need in order to use the sed option)
assignment_line=$(tail -n 1 /path/to/magic.conf)
eval ${assignment_line}
var_name=${assignment_line%%=*}
var_to_give_that_value=${!var_name}
Of course, if the var that you want to have the value is the one that is listed on the left side of the "=" in the file then you can skip the last assignment and just use "${!var_name}" wherever you need it.

Resources