sed: Argument list too long when running sed -n - bash

I am running this command from "Why is my git repository so big?" on a very large git repository such as https://github.com/python/cpython:
git rev-list --all --objects | sed -n $(git rev-list --objects --all | cut -f1 -d' ' | git cat-file --batch-check | grep blob | sort -n -k 3 | tail -n800 | while read hash type size; do size_in_kibibytes=$(echo $size | awk '{ foo = $1 / 1024 ; print foo "KiB" }'); echo -n "-e s/$hash/$size_in_kibibytes/p "; done) | sort -n -k1;
It works fine if I replace tail -n800 with tail -n40:
1160.94KiB Lib/ensurepip/_bundled/pip-8.0.2-py2.py3-none-any.whl
1169.59KiB Lib/ensurepip/_bundled/pip-8.1.1-py2.py3-none-any.whl
1170.86KiB Lib/ensurepip/_bundled/pip-8.1.2-py2.py3-none-any.whl
1225.24KiB Lib/ensurepip/_bundled/pip-9.0.0-py2.py3-none-any.whl
...
I found this question, Bash : sed -n arguments, which says I could use awk instead of sed.
Do you know how to fix this sed: Argument list too long error when tail is -n800 instead of -n40?

It seems you have used this answer from the linked question: "Some scripts I use:...". There is a telling comment on that answer:
This function is great, but it's unimaginably slow. It can't even finish on my computer if I remove the 40 line limit. FYI, I just added an answer with a more efficient version of this function. Check it out if you want to use this logic on a big repository, or if you want to see the sizes summed per file or per folder. – piojo Jul 28 '17 at 7:59
And luckily piojo has written another answer addressing this. Just use his code.

As an alternative, check if git sizer would work on your repository: it would help isolate what takes up space in your repository.
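For example (a hedged illustration, assuming git-sizer is installed and run from inside the clone):
git-sizer --verbose    # prints size statistics and flags unusually large objects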
If not, you have other commands in "How to find/identify large commits in git history?" which loop over each object and avoid the sed -n xx part.
Another alternative would be to redirect your result/command to a file, then run sed on that file, as shown here.
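A minimal sketch of that file-based approach (assuming GNU tools): write the generated sed expressions into a script file and pass it with sed -f, so they never count against the kernel's argument-length limit.
script=$(mktemp)
git rev-list --objects --all | cut -f1 -d' ' | git cat-file --batch-check |
  grep blob | sort -n -k 3 | tail -n800 |
  while read hash type size; do
    # one print-substitution per blob hash, appended to the script file
    echo "s/$hash/$(echo "$size" | awk '{ print $1 / 1024 "KiB" }')/p" >> "$script"
  done
git rev-list --all --objects | sed -n -f "$script" | sort -n -k1
rm -f "$script"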

Related

Alias in bash_profile executes by itself

I have set up an alias in ~/.bash_profile as follows:
alias lcmt="git show $(git log --oneline | awk '{print $1;}' | head -n 1)"
However, whenever I open a terminal window, I see:
fatal: Not a git repository (or any of the parent directories): .git
I have been able to narrow it down to that particular alias because when I comment it out, there's no error message. Why does it evaluate by itself on OS X? Can I prevent it from doing so?
The $(...) inside a double-quoted expression gets executed at the time of the assignment, i.e. when the alias is created. You can avoid that by escaping the $ of the $(...). You want to do the same for the $1 inside the awk command:
alias lcmt="git show \$(git log --oneline | awk '{print \$1}' | head -n 1)"
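To see why the escaping matters, note what happens when the unescaped alias is defined outside any repository (a hedged illustration; /tmp stands in for any non-repository directory): the command substitution runs immediately, at definition time, reproducing the error from the question.
$ cd /tmp
$ alias lcmt="git show $(git log --oneline | awk '{print $1;}' | head -n 1)"
fatal: Not a git repository (or any of the parent directories): .git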
Shell functions are better than aliases in a number of ways, including that there's no quoting weirdness like there is with aliases. Defining a shell function to do this is easy:
lcmt() { git show $(git log --oneline | awk '{print $1;}' | head -n 1); }
I'd make two other recommendations, though: put double-quotes around the $( ) expression, and have awk take care of stopping after the first line:
lcmd() { git show "$(git log --oneline | awk '{print $1; exit}')"; }

Prepending branch name to git commit

I've been reading and trying to figure out how to get this to work. I want to prepend the branch name to the commit message so I can just use git commit -m "message" and get a commit message of the form branch message. The closest I got was using the following code in .git/hooks/commit-msg, but I get sed: 1: ".git/COMMIT_EDITMSG": invalid command code . on OS X 10.8.5.
I read that it has something to do with OS X sed behaving differently, but I can't find a solution that works. I probably just don't know enough about OS X/Linux.
ticket=$(git symbolic-ref HEAD | awk -F'/' '{print $3}')
if [ -n "$ticket" ]; then
sed -i "1i $ticket " $1
fi
Yes, OS X is different. I tested this and it works OK, but it may need some additional minor tweaks for your setup. Note that the -i flag on OS X requires a filename extension under which to save the backup file, and to avoid sed insisting that text added with 1i be escaped with \ followed by another line, I used 1s instead.
ticket=$(git symbolic-ref HEAD | awk -F'/' '{print $3}')
if [ -n "$ticket" ]; then
sed -i '.bak' "1s/^/$ticket /" "$1"
fi
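If you need the hook to work under both GNU and BSD sed, here is a sketch of a variant (hypothetical, not from the original answer) that sidesteps the -i incompatibility entirely by writing to a temporary file:
ticket=$(git symbolic-ref --short HEAD | awk -F'/' '{print $NF}')
if [ -n "$ticket" ]; then
  tmp=$(mktemp)
  # prepend the branch name to the first line of the commit message
  sed "1s/^/$ticket /" "$1" > "$tmp" && mv "$tmp" "$1"
fi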

How to get all revisions in subversion URL (trunk/branch) based on a string in svn comments?

I need some help with a shell command to get all revisions in a Subversion trunk URL based on a string in the svn comments.
I figured out how to get it for one file, but not for a URL.
I tried svn log URL --stop-on-copy and svn log URL --xml to get the revisions, but was unsuccessful.
Thanks!
Another way using sed. It's probably not perfect, but it also works with multiline comments. Replace SEARCH_STRING with your search string.
svn log -l100 | sed -n '/^r/{h;d};/SEARCH_STRING/{g;s/^r\([[:digit:]]*\).*/\1/p}'
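To see how it works, here is the same sed program spread over several lines with comments (a sketch; the comment placement assumes GNU sed):
svn log -l100 | sed -n '
  # stash each revision header line (r1234 | ...) in the hold space
  /^r/ { h; d }
  # when a log-message line matches, recall the stashed header
  # and print just its revision number
  /SEARCH_STRING/ {
    g
    s/^r\([[:digit:]]*\).*/\1/p
  }'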
For Subversion 1.8 it's
svn log URL --search STRING
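A hedged example of extracting just the matching revision numbers with it (assumes the default English "rNNN | author | date" header format):
svn log URL --search "refactoring" --quiet | awk '/^r[0-9]+ \|/ { sub(/^r/, "", $1); print $1 }'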
Try the following.
x="refactoring"; svn log --limit 10 | egrep -i --color=none "($x|^r[0-9]+ \|.*lines$)" | egrep -B 1 -i --color=none $x | egrep --color=none "^r[0-9]+ \|.*lines$" | awk '{print $1}' | sed 's/^r//g'
Replace refactoring with your search string.
Change the svn log parameters to suit your needs.
Case-insensitive matching is used (egrep -i).
Edit based on a comment:
x="ILIES-113493"; svn log | egrep -i --color=none "($x|^r[0-9]+ \|.*lines$)" | egrep -B 1 -i --color=none $x | egrep --color=none "^r[0-9]+ \|.*lines$" | awk '{print $1}' | sed 's/^r//g'
Notes:
x is the variable containing the search string, and it is used in two places in the command.
To use x as a shell variable, you need to put the entire command on a single line (from x=".."; through sed '...'). A semicolon (;) separates multiple commands on the same line.
I used --limit 10 in the example to limit the number of log entries; change that, and the other svn log parameters, to suit your needs. Using --limit 10 restricts the search to the 10 most recent log entries.
Thanks all for the help! This worked for me:
svn log $URL --stop-on-copy | grep -B 2 $STRING | grep "^r" | cut -d"r" -f2 | cut -d" " -f1
Use "--stop-on-copy" or "--limit" options depending on the requirement.

Get the newest file based on timestamp

I am new to shell scripting, so I need some help with how to go about this problem.
I have a directory, /incoming/external/data, which contains files in the following format:
AA_20100806.dat
AA_20100807.dat
AA_20100808.dat
AA_20100809.dat
AA_20100810.dat
AA_20100811.dat
AA_20100812.dat
As you can see, the filename includes a timestamp, i.e. [RANGE]_[YYYYMMDD].dat.
What I need to do is find out which of these files has the newest date using the timestamp in the filename (not the system timestamp), store that filename in a variable, move that file to one directory, and move the rest to a different directory.
For those who just want an answer, here it is:
ls | sort -n -t _ -k 2 | tail -1
Here's the thought process that led me here.
I'm going to assume the [RANGE] portion could be anything.
Start with what we know.
Working Directory: /incoming/external/data
Format of the Files: [RANGE]_[YYYYMMDD].dat
We need to find the most recent [YYYYMMDD] file in the directory, and we need to store that filename.
Available tools (I'm only listing the relevant tools for this problem ... identifying them becomes easier with practice):
ls
sed
awk (or nawk)
sort
tail
I guess we don't need sed, since we can work with the entire output of the ls command. Using ls, awk, sort, and tail we can get the correct file like so (bear in mind that you'll have to check the syntax against what your OS will accept):
NEWESTFILE=`ls | awk -F_ '{print $1 $2}' | sort -n -k 2,2 | tail -1`
Then it's just a matter of putting the underscore back in, which shouldn't be too hard.
EDIT: I had a little time, so I got around to fixing the command, at least for use in Solaris.
Here's the convoluted first pass (this assumes that ALL files in the directory are in the same format: [RANGE]_[yyyymmdd].dat). I'm betting there are better ways to do this, but this works with my own test data (in fact, I found a better way just now; see below):
ls | awk -F_ '{print $1 " " $2}' | sort -n -k 2 | tail -1 | sed 's/ /_/'
... while writing this out, I discovered that you can just do this:
ls | sort -n -t _ -k 2 | tail -1
I'll break it down into parts.
ls
Simple enough ... gets the directory listing, just filenames. Now I can pipe that into the next command.
awk -F_ '{print $1 " " $2}'
This is the awk command. It allows you to take an input line and modify it in a specific way. Here, all I'm doing is specifying that awk should break the input wherever there is an underscore (_). I do this with the -F option. This gives me the two halves of each filename. I then tell awk to output the first half ($1), followed by a space (" "), followed by the second half ($2). Note that the space was the part missing from my initial suggestion. Also, this step is unnecessary, since you can specify a separator in the sort command below.
Now the output is split into [RANGE] [yyyymmdd].dat on each line. Now we can sort this:
sort -n -k 2
This takes the input and sorts it based on the 2nd field. The sort command uses whitespace as a separator by default. While writing this update, I found the documentation for sort, which allows you to specify the separator, so AWK and SED are unnecessary. Take the ls and pipe it through the following sort:
sort -n -t _ -k 2
This achieves the same result. Now you only want the last file, so:
tail -1
If you used awk to separate the filename (which just adds extra complexity, so don't do it), you can replace the space with an underscore again with sed:
sed 's/ /_/'
Some good info here, but I'm sure most people aren't going to read down to the bottom like this.
This should work:
newest=$(ls | sort -t _ -k 2,2 | tail -n 1)
others=($(ls | sort -t _ -k 2,2 | head -n -1))
mv "$newest" newdir
mv "${others[#]}" otherdir
It won't work if there are spaces in the filenames although you could modify the IFS variable to affect that.
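If spaces in filenames are a concern, here is a hedged sketch of a whitespace-safe variant (assumes bash 4.4+ and GNU sort; newdir and otherdir are hypothetical destination names) that passes NUL-delimited names throughout:
# sort the matching files by the date field, NUL-delimited
mapfile -d '' -t files < <(printf '%s\0' *_*.dat | sort -z -t _ -k 2,2)
newest=${files[-1]}                      # last element = newest date
unset 'files[-1]'                        # drop it, leaving the rest
mv -- "$newest" newdir/
(( ${#files[@]} )) && mv -- "${files[@]}" otherdir/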
Try:
$ ls -lr
Hope it helps.
Use:
ls -r -1 AA_*.dat | head -n 1
(assuming there are no other files matching AA_*.dat)
ls -1 AA* | sort -r | head -1
Due to the naming convention of the files, alphabetical order is the same as date order. In bash, '*' expands alphabetically (pathname expansion results are sorted), and ls certainly sorts its output, so the file with the newest date would be the last one alphabetically.
Therefore, in bash
mv $(ls | tail -1) first-directory
mv * second-directory
Should do the trick.
If you want to be more specific about the choice of file, then replace * with something else - for example AA_*.dat
My solution to this is similar to others, but a little simpler.
ls -tr | tail -1
What it actually does is rely on ls to sort the output (by modification time, with -t -r), then uses tail to get the last listed file name.
This solution will not work if the filename you require has a leading dot (e.g. .profile).
This solution does work if the file name contains a space.

How to reverse lines of a text file?

I'm writing a small shell script that needs to reverse the lines of a text file. Is there a standard filter command to do this sort of thing?
My specific application is that I'm getting a list of Git commit identifiers, and I want to process them in reverse order:
git log --pretty=oneline work...master | grep -v DEBUG: | cut -d' ' -f1 | reverse
The best I've come up with is to implement reverse like this:
... | cat -b | sort -rn | cut -f2-
This uses cat to number every line, then sort to sort them in descending numeric order (which ends up reversing the whole file), then cut to remove the unneeded line number.
The above works for my application, but may fail in the general case because cat -b only numbers nonblank lines.
Is there a better, more general way to do this?
In GNU coreutils, there's tac(1)
There is a command for your purpose:
tail -r file.txt
Prints the lines of file.txt in reverse order!
The -r flag is non-standard and may not work on all systems; it works e.g. on macOS.
Beware: the number of lines tail -r can handle may be limited. It works in most cases, but when working with huge files, be careful and verify the output.
The answer is not 42 but tac.
Edit: slower and more memory-consuming, using sed:
sed 'x;1!H;$!d;x'
and even longer
perl -e'print reverse<>'
Similar to the sed example above, using perl - maybe more memorable (depending on how your brain is wired):
perl -e 'print reverse <>'
"cat -b only numbers nonblank lines"
If that's the only issue you want to avoid, then why not use cat -n to number all the lines?
: "#(#)$Id: reverse.sh,v 1.2 1997/06/02 21:45:00 johnl Exp $"
#
# Reverse the order of the lines in each file
awk '{ printf("%d:%s\n", NR, $0); }' "$@" |
sort -t: -k 1,1nr |
sed 's/^[0-9][0-9]*://'
Works like a charm for me...
In this case, just use --reverse:
$ git log --reverse --pretty=oneline work...master | grep -v DEBUG: | cut -d' ' -f1
rev <name of your text file.txt>
You can even do this:
echo <whatever you want to type> | rev
Note, however, that rev reverses the characters within each line, not the order of the lines themselves.
awk '{a[i++]=$0}END{for(;i-->0;)print a[i]}'
It is faster than sed and works on embedded devices like OpenWrt.
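The same awk program, expanded with comments (a sketch of how it works):
awk '
  { a[i++] = $0 }            # store every line in an array, counting with i
  END {
    for (; i-- > 0; )        # walk the array backwards
      print a[i]             # and print the lines in reverse order
  }' file.txt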
