How to reverse lines of a text file? - shell

I'm writing a small shell script that needs to reverse the lines of a text file. Is there a standard filter command to do this sort of thing?
My specific application is that I'm getting a list of Git commit identifiers, and I want to process them in reverse order:
git log --pretty=oneline work...master | grep -v DEBUG: | cut -d' ' -f1 | reverse
The best I've come up with is to implement reverse like this:
... | cat -b | sort -rn | cut -f2-
This uses cat to number every line, then sort to sort them in descending numeric order (which ends up reversing the whole file), then cut to remove the unneeded line number.
The above works for my application, but may fail in the general case because cat -b only numbers nonblank lines.
Is there a better, more general way to do this?

In GNU coreutils, there's tac(1)

There is a command for your purpose:
tail -r file.txt
Prints the lines of file.txt in reverse order!
The -r flag is non-standard, may not work on all systems, works e.g. on macOS.
Beware: Amount of lines limited. Works mostly, but when working with huge files be careful and check.

Answer is not 42 but tac.
Edit: Slower but more memory consuming using sed
sed 'x;1!H;$!d;x'
and even longer
perl -e'print reverse<>'

Similar to the sed example above, using perl - maybe more memorable (depending on how your brain is wired):
perl -e 'print reverse <>'

cat -b only numbers nonblank lines"
If that's the only issue you want to avoid, then why not use "cat -n" to number all the lines?

: "#(#)$Id: reverse.sh,v 1.2 1997/06/02 21:45:00 johnl Exp $"
#
# Reverse the order of the lines in each file
awk ' { printf("%d:%s\n", NR, $0);}' $* |
sort -t: +0nr -1 |
sed 's/^[0-9][0-9]*://'
Works like a charm for me...

In this case, just use --reverse:
$ git log --reverse --pretty=oneline work...master | grep -v DEBUG: | cut -d' ' -f1

rev <name of your text file.txt>
You can even do this:
echo <whatever you want to type>|rev

awk '{a[i++]=$0}END{for(;i-->0;)print a[i]}'
More faster than sed and compatible for embed devices like openwrt.

Related

"grep"ing first 12 of last 24 character from a line

I am trying to extract "first 12 of last 24 character" from a line, i.e.,
for a line:
species,subl,cmp= 1 4 1 s1,torque= 0.41207E-09-0.45586E-13
I need to extract "0.41207E-0".
(I have not written the code, so don't curse me for its formatting. )
I have managed to do this via:
var_s=`grep "species,subl,cmp= $3 $4 $5" $tfile |sed -n '$s/.*\(........................\)$/\1/p'|sed -n '$s/\(............\).*$/\1/p'`
but, is there any more readable way of doing this, rather then counting dots?
EDIT
Thanks to both of you;
so, I have sed,awk grep and bash.
I will run that in loop, for 100's of file.
so, can you also suggest me which one is most efficient, wrt time?
One way with GNU sed (without counting dots):
$ sed -r 's/.*(.{11}).{12}/\1/' file
0.41207E-09
Similarly with GNU grep:
$ grep -Po '.{11}(?=.{12}$)' file
0.41207E-09
Perhaps a python solution may also be helpful:
python -c 'import sys;print "\n".join([a[-24:-13] for a in sys.stdin])' < file
0.41207E-09
I'm not sure your example data and question match up so just change the values in the {n} quantifier accordingly.
Simplest is using pure bash:
echo "${str:(-24):12}"
OR awk can also do that:
awk '{print substr($0, length($0)-23, 12)}' <<< $str
OUTPUT:
0.41207E-09
EDIT: For using bash solution on a file:
while read l; do echo "${l:(-24):12}"; done < file
Another one, less efficient but has the advantage of making you discover new tools
`echo "$str" | rev | cut -b 1-24 | rev | cut -b 1-12
You can use awk to get first 12 characters of last 24 characters from a line:
awk '{substr($0,(length($0)-23))};{print substr($0,(length($0)-10))}' myfile.txt

bash shell script for mac to generate word list from a file?

Is there a shell script that runs on a mac to generate a word list from a text file, listing the unique words? Even better if it could sort by frequency....
sorry forgot to mention, yeah i prefer a bash one as i'm using mac now...
oh, my file is in french... (basically i'm reading a novel and learning french, so i try to generate a word list help myself). hope this is not a problem?
If I understood you correctly, you need something like that:
cat <filename> | sed -e 's/ /\n/g' | sort | uniq -c
This command will do
cat file.txt | tr "\"' " '\n' | sort -u
Here sort -u will not work on Macintosh machines. In that case use sort | uniq -c instead. (Thanks to Hank Gay)
cat file.txt | tr "\"' " '\n' | sort | uniq -c
Just answer my question to dot down the final version i'm using:
tr -cs "[:alpha:]" "\n" < FileIn.txt | sort | uniq -c | awk '{print $2","$1}' >> FileOut.csv
some notes:
tr can be used directly to do replacement.
since i'm interested creating a word list for my french vocabulary, i used [:alpha:]
awk is used to insert a comma, so that the output is a csv file, easier for me to upload...
thanks again for everyone helping me.
sorry i didn't put it clearly at the beginning that i'm using a mac and expect a bash script.
cheers.

How to remove the last character from a bash grep output

COMPANY_NAME=`cat file.txt | grep "company_name" | cut -d '=' -f 2`
outputs something like this
"Abc Inc";
What I want to do is I want to remove the trailing ";" as well. How can i do that? I am a beginner to bash. Any thoughts or suggestions would be helpful.
This will remove the last character contained in your COMPANY_NAME var regardless if it is or not a semicolon:
echo "$COMPANY_NAME" | rev | cut -c 2- | rev
I'd use sed 's/;$//'. eg:
COMPANY_NAME=`cat file.txt | grep "company_name" | cut -d '=' -f 2 | sed 's/;$//'`
foo="hello world"
echo ${foo%?}
hello worl
I'd use head --bytes -1, or head -c-1 for short.
COMPANY_NAME=`cat file.txt | grep "company_name" | cut -d '=' -f 2 | head --bytes -1`
head outputs only the beginning of a stream or file. Typically it counts lines, but it can be made to count characters/bytes instead. head --bytes 10 will output the first ten characters, but head --bytes -10 will output everything except the last ten.
NB: you may have issues if the final character is multi-byte, but a semi-colon isn't
I'd recommend this solution over sed or cut because
It's exactly what head was designed to do, thus less command-line options and an easier-to-read command
It saves you having to think about regular expressions, which are cool/powerful but often overkill
It saves your machine having to think about regular expressions, so will be imperceptibly faster
I believe the cleanest way to strip a single character from a string with bash is:
echo ${COMPANY_NAME:: -1}
but I haven't been able to embed the grep piece within the curly braces, so your particular task becomes a two-liner:
COMPANY_NAME=$(grep "company_name" file.txt); COMPANY_NAME=${COMPANY_NAME:: -1}
This will strip any character, semicolon or not, but can get rid of the semicolon specifically, too.
To remove ALL semicolons, wherever they may fall:
echo ${COMPANY_NAME/;/}
To remove only a semicolon at the end:
echo ${COMPANY_NAME%;}
Or, to remove multiple semicolons from the end:
echo ${COMPANY_NAME%%;}
For great detail and more on this approach, The Linux Documentation Project covers a lot of ground at http://tldp.org/LDP/abs/html/string-manipulation.html
Using sed, if you don't know what the last character actually is:
$ grep company_name file.txt | cut -d '=' -f2 | sed 's/.$//'
"Abc Inc"
Don't abuse cats. Did you know that grep can read files, too?
The canonical approach would be this:
grep "company_name" file.txt | cut -d '=' -f 2 | sed -e 's/;$//'
the smarter approach would use a single perl or awk statement, which can do filter and different transformations at once. For example something like this:
COMPANY_NAME=$( perl -ne '/company_name=(.*);/ && print $1' file.txt )
don't have to chain so many tools. Just one awk command does the job
COMPANY_NAME=$(awk -F"=" '/company_name/{gsub(/;$/,"",$2) ;print $2}' file.txt)
In Bash using only one external utility:
IFS='= ' read -r discard COMPANY_NAME <<< $(grep "company_name" file.txt)
COMPANY_NAME=${COMPANY_NAME/%?}
Assuming the quotation marks are actually part of the output, couldn't you just use the -o switch to return everything between the quote marks?
COMPANY_NAME="\"ABC Inc\";" | echo $COMPANY_NAME | grep -o "\"*.*\""
you can strip the beginnings and ends of a string by N characters using this bash construct, as someone said already
$ fred=abcdefg.rpm
$ echo ${fred:1:-4}
bcdefg
HOWEVER, this is not supported in older versions of bash.. as I discovered just now writing a script for a Red hat EL6 install process. This is the sole reason for posting here.
A hacky way to achieve this is to use sed with extended regex like this:
$ fred=abcdefg.rpm
$ echo $fred | sed -re 's/^.(.*)....$/\1/g'
bcdefg
Some refinements to answer above. To remove more than one char you add multiple question marks. For example, to remove last two chars from variable $SRC_IP_MSG, you can use:
SRC_IP_MSG=${SRC_IP_MSG%??}
cat file.txt | grep "company_name" | cut -d '=' -f 2 | cut -d ';' -f 1
I am not finding that sed 's/;$//' works. It doesn't trim anything, though I'm wondering whether it's because the character I'm trying to trim off happens to be a "$". What does work for me is sed 's/.\{1\}$//'.

Get the newest file based on timestamp

I am new to shell scripting so i need some help need how to go about with this problem.
I have a directory which contains files in the following format. The files are in a diretory called /incoming/external/data
AA_20100806.dat
AA_20100807.dat
AA_20100808.dat
AA_20100809.dat
AA_20100810.dat
AA_20100811.dat
AA_20100812.dat
As you can see the filename of the file includes a timestamp. i.e. [RANGE]_[YYYYMMDD].dat
What i need to do is find out which of these files has the newest date using the timestamp on the filename not the system timestamp and store the filename in a variable and move it to another directory and move the rest to a different directory.
For those who just want an answer, here it is:
ls | sort -n -t _ -k 2 | tail -1
Here's the thought process that led me here.
I'm going to assume the [RANGE] portion could be anything.
Start with what we know.
Working Directory: /incoming/external/data
Format of the Files: [RANGE]_[YYYYMMDD].dat
We need to find the most recent [YYYYMMDD] file in the directory, and we need to store that filename.
Available tools (I'm only listing the relevant tools for this problem ... identifying them becomes easier with practice):
ls
sed
awk (or nawk)
sort
tail
I guess we don't need sed, since we can work with the entire output of ls command. Using ls, awk, sort, and tail we can get the correct file like so (bear in mind that you'll have to check the syntax against what your OS will accept):
NEWESTFILE=`ls | awk -F_ '{print $1 $2}' | sort -n -k 2,2 | tail -1`
Then it's just a matter of putting the underscore back in, which shouldn't be too hard.
EDIT: I had a little time, so I got around to fixing the command, at least for use in Solaris.
Here's the convoluted first pass (this assumes that ALL files in the directory are in the same format: [RANGE]_[yyyymmdd].dat). I'm betting there are better ways to do this, but this works with my own test data (in fact, I found a better way just now; see below):
ls | awk -F_ '{print $1 " " $2}' | sort -n -k 2 | tail -1 | sed 's/ /_/'
... while writing this out, I discovered that you can just do this:
ls | sort -n -t _ -k 2 | tail -1
I'll break it down into parts.
ls
Simple enough ... gets the directory listing, just filenames. Now I can pipe that into the next command.
awk -F_ '{print $1 " " $2}'
This is the AWK command. it allows you to take an input line and modify it in a specific way. Here, all I'm doing is specifying that awk should break the input wherever there is an underscord (_). I do this with the -F option. This gives me two halves of each filename. I then tell awk to output the first half ($1), followed by a space (" ")
, followed by the second half ($2). Note that the space was the part that was missing from my initial suggestion. Also, this is unnecessary, since you can specify a separator in the sort command below.
Now the output is split into [RANGE] [yyyymmdd].dat on each line. Now we can sort this:
sort -n -k 2
This takes the input and sorts it based on the 2nd field. The sort command uses whitespace as a separator by default. While writing this update, I found the documentation for sort, which allows you to specify the separator, so AWK and SED are unnecessary. Take the ls and pipe it through the following sort:
sort -n -t _ -k 2
This achieves the same result. Now you only want the last file, so:
tail -1
If you used awk to separate the file (which is just adding extra complexity, so don't do it sheepish), you can replace the space with an underscore again with sed:
sed 's/ /_/'
Some good info here, but I'm sure most people aren't going to read down to the bottom like this.
This should work:
newest=$(ls | sort -t _ -k 2,2 | tail -n 1)
others=($(ls | sort -t _ -k 2,2 | head -n -1))
mv "$newest" newdir
mv "${others[#]}" otherdir
It won't work if there are spaces in the filenames although you could modify the IFS variable to affect that.
Try:
$ ls -lr
Hope it helps.
Use:
ls -r -1 AA_*.dat | head -n 1
(assuming there are no other files matching AA_*.dat)
ls -1 AA* |sort -r|tail -1
Due to the naming convention of the files, alphabetical order is the same as date order. I'm pretty sure that in bash '*' expands out alphabetically (but can not find any evidence in the manual page), ls certainly does, so the file with the newest date, would be the last one alphabetically.
Therefore, in bash
mv $(ls | tail -1) first-directory
mv * second-directory
Should do the trick.
If you want to be more specific about the choice of file, then replace * with something else - for example AA_*.dat
My solution to this is similar to others, but a little simpler.
ls -tr | tail -1
What is actually does is to rely on ls to sort the output, then uses tail to get the last listed file name.
This solution will not work if the filename you require has a leading dot (e.g. .profile).
This solution does work if the file name contains a space.

How do you pipe input through grep to another utility?

I am using 'tail -f' to follow a log file as it's updated; next I pipe the output of that to grep to show only the lines containing a search term ("org.springframework" in this case); finally I'd like to make is piping the output from grep to a third command, 'cut':
tail -f logfile | grep org.springframework | cut -c 25-
The cut command would remove the first 25 characters of each line for me if it could get the input from grep! (It works as expected if I eliminate 'grep' from the chain.)
I'm using cygwin with bash.
Actual results: When I add the second pipe to connect to the 'cut' command, the result is that it hangs, as if it's waiting for input (in case you were wondering).
Assuming GNU grep, add --line-buffered to your command line, eg.
tail -f logfile | grep --line-buffered org.springframework | cut -c 25-
Edit:
I see grep buffering isn't the only problem here, as cut doesn't allow linewise buffering.
you might want to try replacing it with something you can control, such as sed:
tail -f logfile | sed -u -n -e '/org\.springframework/ s/\(.\{0,25\}\).*$/\1/p'
or awk
tail -f logfile | awk '/org\.springframework/ {print substr($0, 0, 25);fflush("")}'
On my system, about 8K was buffered before I got any output. This sequence worked to follow the file immediately:
tail -f logfile | while read line ; do echo "$line"| grep 'org.springframework'|cut -c 25- ; done
What you have should work fine -- that's the whole idea of pipelines. The only problem I see is that, in the version of cut I have (GNU coreutiles 6.10), you should use the syntax cut -c 25- (i.e. use a minus sign instead of a plus sign) to remove the first 24 characters.
You're also searching for different patterns in your two examples, in case that's relevant.

Resources