I like to have a command-line calculator handy. The requirements are:
Support all the basic arithmetic operators: +, -, /, *, ^ for exponentiation, plus parentheses for grouping.
Require minimal typing: I don't want to have to start a program, interact with it, and then ask it to exit.
Ideally only one character and a space in addition to the expression itself should be entered into the command line.
It should know how to ignore commas and dollar signs (or other currency symbols) in numbers, so that I can copy/paste from the web without having to clean every number before pasting it into the calculator.
Be white-space tolerant: the presence or absence of spaces shouldn't cause errors.
No need for quoting anything in the expression to protect it from the shell - again for the benefit of minimal typing
Since tcsh supports alias positional arguments, and since alias expansion precedes all other expansions except history expansion, it was straightforward to implement something close to my ideal in tcsh.
I used this:
alias C 'echo '\''\!*'\'' |tr -d '\'',\042-\047'\'' |bc -l'
Now I can do stuff like the following with minimal typing:
# the basic stuff:
tcsh> C 1+2
3
# dollar signs, multiplication, exponentiation:
tcsh> C $8 * 1.07^10
15.73721085831652257992
# parentheses, mixed spacing, zero power:
tcsh> C ( 2+5 ) / 8 * 2^0
.87500000000000000000
# commas in numbers, no problem here either:
tcsh> C 1,250.21 * 1.5
1875.315
As you can see there's no need to quote anything to make all these work.
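To see what the tr stage of the alias contributes on its own, here is a minimal sketch of just that filter. The name strip_decorations is made up for illustration. The character set is the one from the alias: the comma plus the octal range \042-\047, which covers " # $ % & and '. The arguments are quoted here only because this is a standalone demonstration of the filter, not of the alias mechanism.

```shell
# Hypothetical helper: apply only the tr filter from the alias above.
# It deletes commas and the characters with octal codes 042-047 (" # $ % & ').
strip_decorations() {
    printf '%s\n' "$*" | tr -d ',\042-\047'
}

strip_decorations '$1,250.21 * 1.5'    # prints: 1250.21 * 1.5
strip_decorations '$8 * 1.07^10'       # prints: 8 * 1.07^10
```

One side effect of this range is that % is removed as well, so bc's modulo operator is not available through such a filter.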
Now comes the problem. Trying to do the same in bash, where aliases don't take parameters, forces me to implement the calculator as a shell function and pass the parameters using "$@":
function C () { echo "$@" | tr -d ', \042-\047' | bc -l; }
This breaks in various ways, e.g.:
# works:
bash$ C 1+2
3
# works:
bash$ C 1*2
2
# Spaces around '*' lead to file expansion with everything falling apart:
bash$ C 1 * 2
(standard_in) 1: syntax error
(standard_in) 1: illegal character: P
(standard_in) 1: illegal character: S
(standard_in) 1: syntax error
...
# Non-leading parentheses seem to work:
bash$ C 2*(2+1)
6
# but leading-parentheses don't:
bash$ C (2+1)*2
bash: syntax error near unexpected token `2+1'
Of course, adding quotes around the expression solves these issues, but is against the original requirements.
I understand why things break in bash. I'm not looking for explanations. Rather, I'm looking for a solution which doesn't require manually quoting the arguments. My question to bash wizards: is there any way to make bash support the handy minimal-typing calculator alias without requiring quoting, like tcsh does? Or is this impossible? Thanks!
If you're prepared to type C Enter instead of C Space, the sky's the limit. The C command can take input in whatever form you desire, unrelated to the shell syntax.
C () {
local line
read -p "Arithmetic: " -e line
echo "$line" | tr -d \"-\', | bc -l
}
In zsh:
function C {
local line=
vared -p "Arithmetic: " line
echo $line | tr -d \"-\', | bc -l
}
In zsh, you can turn off globbing for the arguments of a specific command with the noglob modifier. It is commonly hidden in an alias. This makes *^() be taken literally instead of being interpreted as glob characters, but it does not affect quotes or $.
quickie_arithmetic () {
echo "$*" | tr -d \"-\', | bc -l
}
alias C='noglob quickie_arithmetic'
At least preventing the expansion of * is possible using 'set -f' (following someone's blog post):
alias C='set -f -B; Cf '
function Cf () { echo "$@" | tr -d ', \042-\047' | bc -l; set +f; };
This turns globbing off in the alias, before the arguments are expanded, and back on at the end of the function:
$ C 2 * 3
6
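The effect of set -f can be checked outside the alias. The following sketch (the function name and the file names are invented for the demonstration) runs the same echo twice in a scratch directory, once with pathname expansion disabled and once with it enabled.

```shell
# Hypothetical demonstration of set -f / set +f; not part of the calculator.
# The function body is a subshell, so the cd and the set -f do not leak out.
demo_noglob() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    touch file_a file_b
    set -f          # disable pathname expansion
    echo 2 * 3      # the * stays as typed
    set +f          # re-enable pathname expansion
    echo 2 * 3      # the * is replaced by the file names
    cd / && rm -rf "$dir"
)

demo_noglob
# prints:
#   2 * 3
#   2 file_a file_b 3
```

This only covers *, as noted above. Parentheses are rejected earlier, while the command line is parsed, so set -f cannot help with them.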
I downloaded the bash sources and looked very closely. It seems the parenthesis error occurs directly during the parsing of the command line, before any command is run or any alias is expanded, and there is no flag to turn it off.
So it would be impossible to do it from a bash script.
This means it is time to bring out the heavy weapons. Before it is parsed, the command line is read from stdin using readline. Therefore, if we intercept the call to readline, we can do whatever we want with the command line.
Unfortunately bash is statically linked against readline, so the call cannot be intercepted directly. But at least readline is a global symbol, so we can get the address of the function using dlsym, and with that address we can insert arbitrary instructions into readline.
Modifying readline directly is prone to errors if readline changes between bash versions, so we modify the function calling readline instead, leading to the following plan:
Locate readline with dlsym
Replace readline with our own function that uses the current stack to locate the function calling readline (yy_readline_get) on its first call and then restores the original readline
Modify yy_readline_get to call our wrapper function
Within the wrapper function: Replace the parentheses with non problematic symbols, if the input starts with "C "
Written in C for amd64, we get:
#include <string.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#ifndef __USE_GNU
#define __USE_GNU
#endif
#ifndef __USE_MISC
#define __USE_MISC
#endif
#include <dlfcn.h>
#include <unistd.h>
#include <sys/mman.h>
#include <errno.h>
//-----------Assembler helpers----------
#if (defined(x86_64) || defined(__x86_64__))
//assembler instruction to read rbp, which we need to walk the stack
#define MOV_EBP_OUT "mov %%rbp, %0"
//size of a call instruction
#define RELATIVE_CALL_INSTRUCTION_SIZE 5
#define IS64BIT (1)
/*
To replace a function with a new one, we use the push-ret trick, pushing the destination address on the stack and let ret jump "back" to it
This has the advantage that we can set an additional return address in the same way, if the jump goes to a function
This struct corresponds to the following assembler fragment:
68 ???? push <low_dword (address)>
C7442404 ???? mov DWORD PTR [rsp+4], <high_dword (address)>
C3 ret
*/
typedef struct __attribute__((__packed__)) LongJump {
char push; unsigned int destinationLow;
unsigned int mov_dword_ptr_rsp4; unsigned int destinationHigh;
char ret;
// char nopFiller[16];
} LongJump;
void makeLongJump(void* destination, LongJump* res) {
res->push = 0x68;
res->destinationLow = (uintptr_t)destination & 0xFFFFFFFF;
res->mov_dword_ptr_rsp4 = 0x042444C7;
res->destinationHigh = ((uintptr_t)(destination) >> 32) & 0xFFFFFFFF;
res->ret = 0xC3;
}
//Macros to save and restore the rdi register, which is used to pass an address to readline (standard amd64 calling convention)
typedef unsigned long SavedParameter;
#define SAVE_PARAMETERS SavedParameter savedParameters; __asm__("mov %%rdi, %0": "=r"(savedParameters));
#define RESTORE_PARAMETERS __asm__("mov %0, %%rdi": : "r"(savedParameters));
#else
#error only implemented for amd64...
#endif
//Simulates the effect of the POP instructions, popping from a passed "stack pointer" and returning the popped value
static void * pop(void** stack){
void* temp = *(void**)(*stack);
*stack += sizeof(void*);
return temp;
}
//Disables the write protection of an address, so we can override it
static int unprotect(void * POINTER){
const int PAGESIZE = sysconf(_SC_PAGE_SIZE);
if (mprotect((void*)(((uintptr_t)POINTER & ~(PAGESIZE-1))), PAGESIZE, PROT_READ | PROT_WRITE | PROT_EXEC)) {
fprintf(stderr, "Failed to set permission on %p\n", POINTER);
return 1;
}
return 0;
}
//Debug stuff
static void fprintfhex(FILE* f, void * hash, int len) {
for (int i=0;i<len;i++) {
if ((uintptr_t)hash % 8 == 0 && (uintptr_t)i % 8 == 0 && i ) fprintf(f, " ");
fprintf(f, "%.2x", ((unsigned char*)(hash))[i]);
}
fprintf(f, "\n");
}
//---------------------------------------
//Address of the original readline function
static char* (*real_readline)(const char*)=0;
//The wrapper around readline we want to inject.
//It replaces () with [], if the command line starts with "C "
static char* readline_wrapper(const char* prompt){
if (!real_readline) return 0;
char* result = real_readline(prompt);
if (!result) return result; //readline returns NULL on EOF, nothing to rewrite
char* temp = result; while (*temp == ' ') temp++;
if (temp[0] == 'C' && temp[1] == ' ')
for (int len = strlen(temp), i=0;i<len;i++)
if (temp[i] == '(') temp[i] = '[';
else if (temp[i] == ')') temp[i] = ']';
return result;
}
//Backup of the changed readline part
static unsigned char oldreadline[2*sizeof(LongJump)] = {0x90};
//A wrapper around the readline wrapper, needed on amd64 (see below)
static LongJump* readline_wrapper_wrapper = 0;
static void readline_initwrapper(){
SAVE_PARAMETERS
if (readline_wrapper_wrapper) { fprintf(stderr, "ERROR!\n"); return; }
//restore readline
memcpy(real_readline, oldreadline, 2*sizeof(LongJump));
//find call in yy_readline_get
void * frame;
__asm__(MOV_EBP_OUT: "=r"(frame)); //current stackframe
pop(&frame); //pop current stackframe (??)
void * returnToFrame = frame;
if (pop(&frame) != real_readline) {
//now points to current return address
fprintf(stderr, "Got %p instead of %p=readline, when searching caller\n", frame, real_readline);
return;
}
void * caller = pop(&frame); //now points to the instruction following the call to readline
caller -= RELATIVE_CALL_INSTRUCTION_SIZE; //now points to the call instruction
//fprintf(stderr, "CALLER: %p\n", caller);
//caller should point to 0x00000000004229e1 <+145>: e8 4a e3 06 00 call 0x490d30 <readline>
if (*(unsigned char*)caller != 0xE8) { fprintf(stderr, "Expected CALL, got: "); fprintfhex(stderr, caller, 16); return; }
if (unprotect(caller)) return;
//We can now override caller to call an arbitrary function instead of readline.
//However, the CALL instruction only accepts a 32-bit relative offset, so the called function has to be within 32-bit reach of the call site
//Solution: Allocate memory at an address close to that CALL instruction and put a long jump to our real function there
void * hint = caller;
readline_wrapper_wrapper = 0;
do {
if (readline_wrapper_wrapper) munmap(readline_wrapper_wrapper, 2*sizeof(LongJump));
readline_wrapper_wrapper = mmap(hint, 2*sizeof(LongJump), PROT_EXEC | PROT_READ | PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
if (readline_wrapper_wrapper == MAP_FAILED) { fprintf(stderr, "mmap failed: %i\n", errno); return; }
hint += 0x100000;
} while ( IS64BIT && ( (uintptr_t)readline_wrapper_wrapper >= 0xFFFFFFFF + ((uintptr_t) caller) ) ); //repeat until we get an address really close to caller
//fprintf(stderr, "X:%p\n", readline_wrapper_wrapper);
makeLongJump(readline_wrapper, readline_wrapper_wrapper); //Write the long jump in the newly allocated space
//fprintfhex(stderr, readline_wrapper_wrapper, 16);
//fprintfhex(stderr, caller, 16);
//patch caller to become call <readline_wrapper_wrapper>
//called address is relative to address of CALL instruction
*(uint32_t*)(caller+1) = (uint32_t) ((uintptr_t)readline_wrapper_wrapper - (uintptr_t)(caller + RELATIVE_CALL_INSTRUCTION_SIZE) );
//fprintfhex(stderr, caller, 16);
*(void**)(returnToFrame) = readline_wrapper_wrapper; //change stack to jump to wrapper instead real_readline (or it would not work on the first entered command)
RESTORE_PARAMETERS
}
static void _calc_init(void) __attribute__ ((constructor));
static void _calc_init(void){
if (!real_readline) {
//Find readline
real_readline = (char* (*)(const char*)) dlsym(RTLD_DEFAULT, "readline");
if (!real_readline) return;
//fprintf(stdout, "loaded %p\n", real_readline);
//fprintf(stdout, " => %x\n", * ((int*) real_readline));
if (unprotect(real_readline)) { fprintf(stderr, "Failed to unprotect readline\n"); return; }
memcpy(oldreadline, real_readline, 2*sizeof(LongJump)); //backup readline's instructions
//Replace readline with readline_initwrapper
makeLongJump(real_readline, (LongJump*)real_readline); //add a push/ret long jump from readline to readline, to have readline's address on the stack in readline_initwrapper
makeLongJump(readline_initwrapper, (LongJump*)((char*)real_readline + sizeof(LongJump) - 1)); //add a push/ret long jump from readline to readline_initwrapper, overriding the previous RET
}
}
This can be compiled to an intercepting library with:
gcc -g -std=c99 -shared -fPIC -o calc.so calc.c -ldl
and then loaded in bash with:
gdb --batch-silent -ex "attach $BASHPID" -ex 'print dlopen("calc.so", 0x101)'
Now, with the previous alias extended by the parenthesis replacement loaded:
alias C='set -f -B; Cf '
function Cf () { echo "$@" | tr -d ', \042-\047' | tr [ '(' | tr ] ')' | bc -l; set +f; };
We can write:
$ C 1 * 2
2
$ C 2*(2+1)
6
$ C (2+1)*2
6
It gets even better if we switch from bc to qalculate:
alias C='set -f -B; Cf '
function Cf () { echo "$@" | tr -d ', \042-\047' | tr [ '(' | tr ] ')' | xargs qalc ; set +f; };
Then we can do:
$ C e ^ (i * pi)
e^(i * pi) = -1
$ C 3 c
3 * speed_of_light = approx. 899.37737(km / ms)
Is it possible to use a variable as the count in a "memory read" lldb command?
A minimal example: With a breakpoint at the return statement of the following C program
#include <stdio.h>
#include <string.h>
int main(int argc, const char * argv[]) {
char *str = "Hello";
size_t len = strlen(str);
return 0; // <-- Breakpoint here
}
I can dump the contents of the string variable with
(lldb) memory read --count 5 str
0x100000fae: 48 65 6c 6c 6f Hello
but not with
(lldb) memory read --count len str
error: invalid uint64_t string value: 'len'
How can I use the value of the len variable as the count of the "memory read" command?
lldb's command line doesn't have much syntax, but one useful bit that it does have is that if you surround an argument or option value in backticks, the string inside the backticks gets passed to the expression parser, and the result of the expression evaluation gets substituted for the backtick value before being passed to the command. So you want to do:
(lldb) memory read --count `len` str
I'm trying to connect simple flex and bison code that would just recognize a character for now. Yet I'm facing this error. I've read through a lot of answers to figure out what is wrong but am lost. Any help would be highly appreciated as I'm just starting out to explore this and could not find a lot of resources for it.
This is my .l file
%{
#include <stdlib.h>
#include <stdio.h>
#include "MiniJSC.tab.h"
void yyerror (char *s);
int yylex();
%}
%%
[0-9]+ { yylval.num = atoi(yytext); return T_INT_VAL; }
%%
int yywrap (void) {return 1;}
my .y file
%{
void yyerror (char *s);
int yylex();
#include <stdio.h> /* C declarations used in actions */
#include <stdlib.h>
%}
%union {int num; char id;} /* Yacc definitions */
%start line
%token print
%token T_INT_VAL
%type <num> line
%type <num> term
%type <num> T_INT_VAL
%%
/* descriptions of expected inputs corresponding actions (in C) */
line : print term ';' {printf("Printing %d\n", $2);}
;
term : T_INT_VAL {$$ = $1;}
;
%% /* C code */
void yyerror (char *s) {
fprintf (stderr, "%s\n", s);
}
int main (void) {
return yyparse ( );
}
The compilation and output:
$ bison MiniJSC.y -d
$ lex MiniJSC.l
$ gcc lex.yy.c MiniJSC.tab.c
$ ./a.out
10
syntax error
$
line : print term ';'
According to this, a valid line consists of a print token, followed by a term, followed by a ';'. Since a term must be a T_INT_VAL token, that means a valid line is a print token followed by a T_INT_VAL token and a ';'.
Your input consists only of a T_INT_VAL token, so it is not a valid line and that's why you get a syntax error.
Also note that your lexer never produces a print token, so even if you entered print 10; as the input, it'd be an error since the lexer isn't going to recognize print as a token. So you should add a pattern for that as well, and likewise one that returns ';'.
You should also rename print to match your naming convention for tokens (i.e. ALL_CAPS).
I have a text file and want to remove the first line (header), to read the file without header into a pipeline. This seems like a trivial question that has been answered many times, but due to the size of the files I'm facing, the solutions I found so far were not working. For my test runs I used echo "$(tail -n +2 "$FILE_NAME")" > "$FILE_NAME", but running this with a bigger file results in the following error: bash: xrealloc: cannot allocate 18446744071562067968 bytes (1679360 bytes allocated) Is there any method that edits the file in place? Loading them into memory won't work; some of my files are up to 400 GB in size.
Thanks for the help!
You can use code like this:
awk 'NR!=1 {print}' input_file >output_file
This sends everything but the first line to output_file. You can use the same construction to feed your operations:
awk 'NR!=1 {print}' input_file|operation1|operation2...
Changing your command this way does the job:
tail -n +2 "$FILE_NAME" > "${FILE_NAME}.new"
This needs twice the disk space.
Tail is reasonably efficient for this operation.
The issue is with you wanting to overwrite the original file.
Using bash "$()" to defer the creation of the output file means bash has to hold the content in memory, hence the error message. For large files you would be better off writing the output to a temporary file, then use mv to move that over the original.
When sed is used in overwrite mode it does exactly this (for anything over a few lines).
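The temporary-file approach described above might look like the following sketch. The function name strip_header is an assumption. The temporary file is created next to the original so that the final mv stays on the same filesystem. It still needs room for a second copy of the data while it runs.

```shell
# Hypothetical helper: drop the first line of a file via a temporary copy.
strip_header() {
    file=$1
    # Create the temporary file beside the original (same filesystem).
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if tail -n +2 "$file" > "$tmp"; then
        mv "$tmp" "$file"        # replace the original only on success
    else
        rm -f "$tmp"
        return 1
    fi
}
```

It would be called as strip_header "$FILE_NAME". Because tail streams the data, memory use stays small regardless of file size. Note that the replacement file gets the permissions mktemp assigns, which may differ from those of the original.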
sed -i 1d "$FILE_NAME"
It runs sed with the very simple script 1d, which picks the first line (the 1 selector) and deletes it (the d command). Thanks to the in-place option -i, your file is overwritten without you having to manage an intermediate file.
Even though you do not have to bother with an intermediate file, sed uses its own intermediate file internally. Your disk usage can grow to up to twice the file size during this operation.
I'm just going to address the "edit the file in place" portion of the question, although it appears that was not really what you were looking for. You will find many solutions describing features that claim to do in-place editing, but usually those solutions don't actually edit the file at all. Instead, they write to a temporary file and then overwrite the original with the temporary file. (eg, sed --in-place is a common solution which writes to a temporary file). Editing the file in place is something that you almost never actually want to do, since mutating a file is dangerous. Truly, if you believe you want to edit a file in place, give it serious thought and assume that you are wrong. However, if for some reason you really do need to do it, it's probably safest to just do it:
#include <err.h>
#include <stdio.h>
#include <sys/stat.h>
#include <stdlib.h>
#include <unistd.h>
FILE * xfopen(const char *path, const char *mode);
int is_regular(int, const char *);
int
main(int argc, char **argv)
{
const char *rpath = argc > 1 ? argv[1] : "stdin";
const char *wpath = argc > 1 ? argv[1] : "stdout";
FILE *fr = argc > 1 ? xfopen(rpath, "r") : stdin;
FILE *fw = argc > 1 ? xfopen(wpath, "r+") : stdout;
char buf[BUFSIZ];
int c;
size_t rc;
off_t length = 0;
/* Discard the first line */
while( (c = getc(fr)) != EOF && c != '\n' ) {
;
}
if( c != EOF) while( (rc = fread(buf, 1, BUFSIZ, fr)) > 0) {
size_t wc;
wc = fwrite(buf, 1, rc, fw);
length += wc;
if( wc!= rc) {
break;
}
}
if( fclose(fr) ) {
err(EXIT_FAILURE, "%s", rpath);
}
if( is_regular(fileno(fw), wpath) && ftruncate(fileno(fw), length)) {
err(EXIT_FAILURE, "%s", wpath);
}
if( fclose(fw)) {
err(EXIT_FAILURE, "%s", wpath);
}
return EXIT_SUCCESS;
}
FILE *
xfopen(const char *path, const char *mode)
{
FILE *fp = fopen(path, mode);
if( fp == NULL ) {
perror(path);
exit(EXIT_FAILURE);
}
return fp;
}
int
is_regular(int fd, const char *name)
{
struct stat s;
if( fstat(fd, &s) == -1 ) {
perror(name);
exit(EXIT_FAILURE);
}
return S_ISREG(s.st_mode);
}
By being explicit, it's pretty clear that you can easily lose data in the file. But if you want to avoid reading the entire file into memory, or avoid having two copies on some backing media at the same time, there's no way to avoid doing that and any solution which obscures that risk is fooling you. So making it explicit and knowing where the dangers lie is the right thing to do.
We can use the -i (in-place) option with sed to write the change back to the input file instead of printing the result to stdout:
sed -i '1d' FILE
I would like to parse long options in a shell script. POSIX only provides getopts to parse single letter options. Does anyone know of a portable (POSIX) way to implement long option parsing in the shell? I've looked at what autoconf does when generating configure scripts, but the result is far from elegant. I can live with accepting only the full spellings of long options. Single letter options should still be allowed, possibly in groups.
I'm thinking of a shell function taking a space separated list of args of the form option[=flags], where the flags indicate that the option takes an arg or can be specified multiple times. Unlike its C counterpart there is no need to distinguish between strings, integers and floats.
Design notes towards a portable shell getopt_long command
I have a program getoptx which works with single-letter options (hence it is not the answer to your problem), but it handles arguments with spaces correctly, which the original getopt command (as opposed to the shell built-in getopts) does not. The specification in the source code says:
/*
** Usage: eval set -- $(getoptx abc: "$@")
** eval set -- $(getoptx abc: -a -c 'a b c' -b abc 'd e f')
** The positional parameters are:
** $1 = "-a"
** $2 = "-c"
** $3 = "a b c"
** $4 = "-b"
** $5 = "--"
** $6 = "abc"
** $7 = "d e f"
**
** The advantage of this over the standard getopt program is that it handles
** spaces and other metacharacters (including single quotes) in the option
** values and other arguments. The standard code does not! The downside is
** the non-standard invocation syntax compared with:
**
** set -- $(getopt abc: "$@")
*/
I recommend the eval set -- $(getopt_long "$optspec" "$@") notation for your getopt_long.
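To illustrate why the eval set -- notation can preserve arguments that contain spaces, here is a minimal sketch. quote_args is a hypothetical stand-in for getoptx or getopt_long: it does no option parsing at all and only shows the kind of quoting such a program would have to apply to its output. The sketch also quotes the command substitution so that runs of whitespace inside an argument survive.

```shell
# Hypothetical stand-in for getoptx/getopt_long: print every argument
# wrapped in single quotes so that `eval set --` restores it unchanged.
quote_args() {
    for arg do
        # Turn each embedded ' into '\'' before wrapping the argument.
        printf "'%s' " "$(printf '%s' "$arg" | sed "s/'/'\\\\''/g")"
    done
    printf '\n'
}

eval set -- "$(quote_args -a -c 'a b c' "it's")"
printf '%s\n' "$#" "$3" "$4"
# prints:
#   4
#   a b c
#   it's
```

One known limitation of this sketch is that trailing newlines inside an argument are lost, because command substitution strips them.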
One major issue with getopt_long is the complexity of the argument specification — the $optspec in the example.
You may want to look at the notation used in the Solaris CLIP (Command Line Interface Paradigm); it uses a single string (like the original getopt() function) to describe the options. (Google: 'solaris clip command line interface paradigm'; using just 'solaris clip' gets you to video clips.)
This material is a partial example derived from Sun's getopt_clip():
/*
Example 2: Check Options and Arguments.
The following example parses a set of command line options and prints
messages to standard output for each option and argument that it
encounters.
This example can be expanded to be CLIP-compliant by substituting the
long string for the optstring argument:
While not encouraged by the CLIP specification, multiple long-option
aliases can also be assigned as shown in the following example:
:a(ascii)b(binary):(in-file)(input)o:(outfile)(output)V(version)?(help)
*/
static const char *arg0 = 0;
static void print_help(void)
{
printf("Usage: %s [-a][-b][-V][-?][-f file][-o file][path ...]\n", arg0);
printf("Usage: %s [-ascii][-binary][-version][-in-file file][-out-file file][path ...]\n", arg0);
exit(0);
}
static const char optstr[] =
":a(ascii)b(binary)f:(in-file)o:(out-file)V(version)?(help)";
int main(int argc, char **argv)
{
int c;
char *filename;
arg0 = argv[0];
while ((c = getopt_clip(argc, argv, optstr)) != -1)
{
...
}
...
}
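For comparison, when only the full spellings of a few long options are needed, a hand-written while/case loop is a common portable fallback. The sketch below is not the getopt_long design discussed above, and the option names (--verbose, --output) and the function name are invented for illustration. It accepts both --output=FILE and --output FILE, but it does not handle grouped single-letter options.

```shell
# Hypothetical example: parse -v/--verbose and -o/--output in POSIX sh.
# Results are left in the variables verbose, output and remaining.
parse_args() {
    verbose=0
    output=
    while [ "$#" -gt 0 ]; do
        case $1 in
            -v|--verbose)
                verbose=1 ;;
            -o|--output)
                if [ "$#" -lt 2 ]; then
                    echo "missing argument for $1" >&2
                    return 2
                fi
                output=$2
                shift ;;
            --output=*)
                output=${1#--output=} ;;
            --)
                shift
                break ;;
            -*)
                echo "unknown option: $1" >&2
                return 2 ;;
            *)
                break ;;
        esac
        shift
    done
    remaining=$#    # number of non-option arguments left over
}

parse_args --verbose --output='my file.txt' -- input.txt
echo "verbose=$verbose output=$output remaining=$remaining"
# prints: verbose=1 output=my file.txt remaining=1
```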
I'm looking to pipe some String input to a small C program in Windows's command prompt. In bash I could use
$ printf "AAAAA\x86\x08\x04\xed" | ./program
Essentially, I need something to escape those hexadecimal numbers in command prompt.
Is there an equivalent or similar command for printf in command prompt/powershell?
Thanks
In PowerShell, you would do it this way:
"AAAAA{0}{1}{2}{3}" -f [char]0x86,[char]0x08,[char]0x04,[char]0xed | ./program
I recently came up with the same question myself and decided that for someone developing Windows exploits it is worth installing cygwin :)
Otherwise one could build a small C program mimicking printf's functionality:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[])
{
int i;
char tmp[3];
tmp[2] = '\0';
if (argc > 1) {
for (i = 2; i < strlen(argv[1]); i += 4) {
strncpy(tmp, argv[1]+i, 2);
printf("%c", (char)strtol(tmp, NULL, 16));
}
}
else {
printf("USAGE: printf.exe SHELLCODE\n");
return 1;
}
return 0;
}
The program only handles "\xAB\xCD" strings, but it shouldn't be difficult to extend it to handle "AAAAA\xAB\xCD" strings if one needs it.