echo last character of text file in Unix/Bash

I need to see the last characters of a bunch of text files (or, alternatively, test whether they are "}" and get a list of the files that test negative). Is there an easy way to do this from the command line?
(Ideally, the solution works without reading the whole file from the start, because in addition to there being many files, they can also be quite large.)
P.S.: Any answer would be great, but I would really appreciate it if the function and syntax of everything in the answer could be fully explained.

It can be done fairly easily with tail and then string indexing in bash. For example, you obtain the last line of a file with tail -n1 file. You will need to store the line in a variable using command substitution, e.g.
lastln=$(tail -n1 file)
Then it is simply a matter of indexing the last characters, e.g.
echo ${lastln:(-1)}
(Note: when indexing from the end of the string, you must either put the offset in parentheses, e.g. (-1), or leave a space before the -1; echo ${lastln: -1} is also valid.)
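Putting both steps together to address the second half of the question (listing files whose last line does not end in }), here is a minimal sketch; the *.txt glob is only an illustration:
for f in *.txt; do
    lastln=$(tail -n1 "$f")                    # read only the last line
    [[ ${lastln: -1} == "}" ]] || echo "$f"    # report files failing the test
done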

You can try this:
for file in file1 file2; do tail -n 1 "$file" | grep -q '}$' || echo "$file"; done
where you should replace file1 file2 with the list of files you want to analyze, e.g. * or the like. Now what happens here? The outer part
for file in file1 file2; do ...; done
is a simple loop over the files, where inside the loop, you can refer to the current file as $file. Then,
tail -n 1 "$file"
prints the last line of the given file and
| grep -q '}$'
redirects the output to grep (turned into silent mode with -q), which looks for '}' immediately followed by the end of the line ($). The return value of this command can be used to chain another action: when grep returns non-zero (indicating failure, i.e., the pattern is not matched), the last part
|| echo "$file"
is executed, resulting in the list of files you need.
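For example, if a.json ends in } and b.json does not (hypothetical file names), the loop reports only the offending file:
$ for file in a.json b.json; do tail -n 1 "$file" | grep -q '}$' || echo "$file"; done
b.json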

Related

Can I do a Bash wildcard expansion (*) on an entire pipeline of commands?

I am using Linux. I have a directory of many files, and I want to use grep, tail and wildcard expansion * in tandem to print the last occurrence of <pattern> in each file:
Input: <some command>
Expected Output:
<last occurrence of pattern in file 1>
<last occurrence of pattern in file 2>
...
<last occurrence of pattern in file N>
What I am trying now is grep "pattern" * | tail -n 1, but the output contains only one line, which is the last occurrence of the pattern in the last file. I assume the reason is that the * wildcard expansion happens before the pipelining of commands, so tail runs only once.
Does there exist some Bash syntax so that I can achieve the expected outcome, i.e. let tail run for each file?
I know I can always use a for-loop to solve the problem. I'm just curious if the problem can be solved with a more condensed command.
I've also tried grep -m1 "pattern" <(tac *), and it seems like the aforementioned reasoning still applies: wildcard expansion applies only to the immediate command it is associated with, and the "outer" command runs only once.
Wildcards are expanded on the command line before any command runs. For example, if you have files foo and bar in your directory and run grep pattern * | tail -n1, then bash transforms this into grep pattern foo bar | tail -n1 and runs that. Since there's only one stream of output from grep, there's only one stream of input to tail, and it prints the last line of that stream.
If you want to search each file and print the last line of grep's output separately you can use a loop:
for file in * ; do
    grep pattern "${file}" | tail -n1
done
The problem with non-loop solutions is that tail doesn't inherently know where the output of one file ends and the output of another begins, or indeed that there are even files involved on the other end of the pipe. It just knows input is coming in from somewhere and that it has to print the last line of that input. If you didn't want a loop, you'd have to use a more powerful tool like awk, perhaps exploiting the fact that grep prepends the names of matched files (when multiple files are matched, or with -H) to delimit the start and end of the output from each file. But the work of writing an awk program that keeps track of the current file to know when its output ends, and then prints its last line, is probably more effort than it's worth when the loop solution is so simple.
You can achieve what you want using xargs. For your example it would be:
ls * | xargs -n 1 sh -c 'grep "pattern" "$0" | tail -n 1'
Can save you from having to write a loop.
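Note that parsing ls output breaks on filenames containing whitespace or glob characters. A NUL-separated variant (a sketch assuming GNU xargs, whose -0 option reads NUL-delimited input) is more robust:
printf '%s\0' * | xargs -0 -n 1 sh -c 'grep "pattern" "$0" | tail -n 1'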
You can do this with awk, although (as tjm3772 pointed out in their answer) it's actually more complicated than the shell for loop. For the record, here's what I came up with:
awk -v pattern="YourPatternHere" '(FNR==1 && line!="") {print line; line=""}; $0~pattern {line=$0}; END {if (line!="") print line}' *
Explanation: when it finds a matching line ($0~pattern), it stores that line in the line variable ({line=$0}). This means that at the end of the file, line will hold the last matching line.
(Note: if you want to just include a literal pattern in the program, remove the -v pattern="YourPatternHere" part and replace $0~pattern with just /YourPatternHere/)
There's no simple trigger to print a match at the end of each file, so that part's split into two pieces: if it's the first line of a file AND line is set because of a match in the previous file ((FNR==1 && line!="")), print line and then clear it so it's not mistaken for a match in the current file ({print line; line=""}). Finally, at the end of the final file (END), print a match found in that last file if there was one ({if (line!="") print line}).
Also, note that the print-at-beginning-of-new-file test must be before the check for a matching line, or else it'll get very confused if the first line of the new file matches.
So... yeah, a shell for loop is simpler (and much easier to get right).

Sed through files without using for loop?

I have a small script which basically generates a menu of all the scripts in my ~/scripts folder and, next to each of them, displays a sentence describing it, that sentence being a comment on the third line of each script. I then plan to pipe this into fzf or dmenu to select one and start editing it or whatever.
1 #!/bin/bash
2
3 # a script to do
So it would look something like this
foo.sh a script to do X
bar.sh a script to do Y
Currently I have it run a for loop over all the files in the scripts folder and then run sed -n 3p on all of them.
for i in $(ls -1 ~/scripts); do
    echo -n "$i"
    sed -n 3p ~/scripts/"$i"
    echo
done | column -t -s '#' | ...
I was wondering if there is a more efficient way of doing this that did not involve a for loop and only used sed. Any help will be appreciated. Thanks!
Instead of a loop that parses ls output and calls sed, you may try this awk command:
awk 'FNR == 3 {
    f = FILENAME; sub(/^.*\//, "", f); print f, $0; nextfile
}' ~/scripts/* | column -t -s '#' | ...
Yes, there is a more efficient way, but no, it doesn't only use sed. This is probably a silly optimization for your use case, but it may be worthwhile nonetheless.
The inefficiency is that you're using ls to read the directory and then parsing its output. For large directories, that causes a lot of overhead for keeping the list in memory, even though you only traverse it once. It's also not done correctly: consider filenames with special characters that the shell interprets.
The more efficient way is to use find in combination with its -exec option, which starts a second program with each found file in turn.
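A sketch of that approach (the -maxdepth level and the output formatting are my assumptions, not a prescription):
find ~/scripts -maxdepth 1 -type f -exec sh -c '
    printf "%s" "${1##*/}"    # filename without the directory part
    sed -n 3p "$1"            # print the third line (the description)
' _ {} \;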
BTW: If you didn't rely on line numbers but maybe a tag to mark the description, you could also use grep -r, which avoids an additional process per file altogether.
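For instance, if every script carried a hypothetical tag line such as # DESC: a script to do X, a single invocation of GNU grep could collect all descriptions, stopping at the first match per file:
grep -rH -m1 '^# DESC:' ~/scripts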
This might work for you (GNU sed):
sed -sn '1h;3{H;g;s/\n/ /p}' ~/scripts/*
Use the -s option to reset the line number addresses for each file.
Copy line 1 to the hold space.
Append line 3 to the hold space.
Swap the hold space for the pattern space.
Replace the newline with a space and print the result.
All files in the directory ~/scripts will be processed.
N.B. You may wish to replace the space delimiter with a tab, or pipe the results to the column command.
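For instance, with a tab as the delimiter (only the replacement character changes):
sed -sn '1h;3{H;g;s/\n/\t/p}' ~/scripts/*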

Grep lines between two patterns, one unique and one repeated

I have a text file which looks like this
1
bbbbb
aaa
END
2
ttttt
mmmm
uu
END
3
....
END
The number of lines between the single-number patterns (1, 2, 3) and END is variable. So the upper delimiting pattern changes, but the final one does not. Using some bash commands, I would like to grep the lines between a specified upper pattern and the corresponding END; for example, a command that takes as input 2 and returns
2
ttttt
mmmm
uu
END
I've tried various solutions with sed and awk, but still can't figure it out. The main problem is that I may need to grep an entry in the middle of the file, so I can't use sed with /pattern/q. Any help will be greatly appreciated!
With awk, we set a flag f when matching the start pattern, which is passed in as an input argument. From that row on, the flag is on and every line is printed. When reaching "END" (AND the flag is on!), it prints that line and exits.
awk -v p=2 '$0~p{f=1} f{print} f&&/END/{exit}' file
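For example, with the sample file from the question:
$ awk -v p=2 '$0~p{f=1} f{print} f&&/END/{exit}' file
2
ttttt
mmmm
uu
END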
Use sed and its addresses to only print a part of the file between the patterns:
#!/bin/bash
start=x
while [[ $start = *[^0-9]* ]] ; do
    read -p 'Enter the start pattern: ' start
done
sed -n "/^$start$/,/^END$/p" file
You can use sed with an address range. Modify the first regular expression (RE1) in /RE1/,/RE2/ to suit your input:
sed -n '/^[[:space:]]*2$/,/^[[:space:]]*END$/p' file
Or,
sed '
/^[[:space:]]*2$/,/^[[:space:]]*END$/!d
/^[[:space:]]*END$/q
' file
This quits upon reading END, and thus may be more efficient.
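Run against the sample file from the question, it prints just the requested block:
$ sed '/^[[:space:]]*2$/,/^[[:space:]]*END$/!d; /^[[:space:]]*END$/q' file
2
ttttt
mmmm
uu
END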
Another option/solution using just bash
#!/usr/bin/env bash
start=$1
while IFS= read -r lines; do
    if [[ ${lines##* } == $start ]]; then
        print=on
    elif [[ ${lines##* } == [0-9] ]]; then
        print=off
    fi
    case $print in on) printf '%s\n' "$lines";; esac
done < file.txt
Run the script with the number as the argument: 1, 2, 3, and so on.
./myscript 1
This might work for you (GNU sed):
sed -n '/^\s*2$/{:a;N;/^\s*END$/M!ba;p;q}' file
Switch off implicit printing by setting the -n option.
Gather up the lines beginning with a line starting with 2 and ending in a line starting with END, print the collection and quit.
N.B. The second regexp uses the M flag, which allows ^ and $ to match the start and end of individual lines when multiple lines are in the pattern space. Another thing to bear in mind is that a range, i.e. sed -n '/start/,/end/p' file, will start printing lines the moment the first condition is met, and if the second match never materialises, it will continue printing to the end of the file.

How to get line WITH tab character using tail and head

I have made a script to practice my Bash, only to realize that this script does not take tabulation into account, which is a problem since it is designed to find and replace a pattern in a Python script (which obviously needs tabulation to work).
Here is my code. Is there a simple way to get around this problem?
pressure=1
nline=$(cat /myfile.py | wc -l) # count the lines in the file
echo $nline
for ((c=0;c<=${nline};c++))
do
    res=$( tail -n $(($(($nline+1))-$c)) myfile.py | head -n 1 | awk 'gsub("="," ",$1){print $1}' | awk '{print$1}')
    #echo $res
    if [ $res == 'pressure_run' ]
    then
        echo "pressure_run='${pressure}'" >> myfile_mod.py
    else
        echo $( tail -n $(($nline-$c)) myfile.py | head -n 1) >> myfile_mod.py
    fi
done
Basically, it finds the line that has pressure_run=something and replaces it by pressure_run=$pressure. The rest of the file should be untouched. But in this case, all tabulation is deleted.
If you want to just do the replacement as quickly as possible, sed is the way to go as pointed out in shellter's comment:
sed "s/\(pressure_run=\).*/\1$pressure/" myfile.py
For Bash training, as you say, you may want to loop manually over your file. A few remarks for your current version:
Is /myfile.py really in the root directory? Later, you don't refer to it at that location.
cat ... | wc -l is a useless use of cat and better written as wc -l < myfile.py.
Your for loop is executed one more time than you have lines.
To get the next line, you do "show me all lines, but counting from the back, don't show me c lines, and then show me the first line of these". There must be a simpler way, right?
To get what's on the left-hand side of an assignment, you say "in the first space-separated field, replace = with a space, then show me the first space-separated field of the result". There must be a simpler way, right? This is, by the way, where you strip out the leading tabs (your first awk command does it).
To print the unchanged line, you do the same complicated thing as before.
A band-aid solution
A minimal change that would get you the result you want would be to modify the awk command: instead of
awk 'gsub("="," ",$1){print $1}' | awk '{print$1}'
you could use
awk -F '=' '{ print $1 }'
"Fields are separated by =; give me the first one". This preserves leading tabs.
The replacements have to be adjusted a little bit as well; you now want to match something that ends in pressure_run:
if [[ $res == *pressure_run ]]
I've used the more flexible [[ ]] instead of [ ] and added a * to pressure_run (which must not be quoted): "if $res ends in pressure_run, then..."
The replacement has to use $res, which has the proper amount of tabs:
echo "$res='${pressure}'" >> myfile_mod.py
Instead of appending each line each loop (and opening the file each time), you could just redirect output of your whole loop with done > myfile_mod.py.
This prints literally ${pressure} as in your version, because it's single quoted. If you want to replace that by the value of $pressure, you have to remove the single quotes (and the braces aren't needed here, but don't hurt):
echo "$res=$pressure" >> myfile_mod.py
This fixes your example, but it should be pointed out that enumerating lines and then getting one at a time with tail | head is a really bad idea. You traverse the file for every single line twice, it's very error prone and hard to read. (Thanks to tripleee for suggesting to mention this more clearly.)
A proper solution
This all being said, there are preferred ways of doing what you did. You essentially loop over a file, and if a line matches pressure_run=, you want to replace what's on the right-hand side with $pressure (or the value of that variable). Here is how I would do it:
#!/bin/bash
pressure=1
# Regular expression to match lines we want to change
re='^[[:space:]]*pressure_run='
# Read lines from myfile.py
while IFS= read -r line; do
    # If the line matches the regular expression
    if [[ $line =~ $re ]]; then
        # Print what we matched (with whitespace!), then the value of $pressure
        line="${BASH_REMATCH[0]}"$pressure
    fi
    # Print the (potentially modified) line
    echo "$line"
# Read from myfile.py, write to myfile_mod.py
done < myfile.py > myfile_mod.py
For a test file that looks like
blah
test
pressure_run=no_tab
blah
something
	pressure_run=one_tab
		pressure_run=two_tabs
the result is
blah
test
pressure_run=1
blah
something
	pressure_run=1
		pressure_run=1
Recommended reading
How to read a file line-by-line (explains the IFS= and -r business, which is quite essential to preserve whitespace)
BashGuide

How to find/replace and increment a matched number with sed/awk?

Straight to the point, I'm wondering how to use grep/find/sed/awk to match a certain string (that ends with a number) and increment that number by 1. The closest I've come is to concatenate a 1 to the end (which works well enough) because the main point is to simply change the value. Here's what I'm currently doing:
find . -type f | xargs sed -i 's/\(\?cache_version\=[0-9]\+\)/\11/g'
Since I couldn't figure out how to increment the number, I captured the whole thing and just appended a "1". Before, I had something like this:
find . -type f | xargs sed -i 's/\?cache_version\=\([0-9]\+\)/?cache_version=\11/g'
So at least I understand how to capture what I need.
Instead of explaining what this is for, I'll just explain what I want it to do. It should find text in any file, recursively, based on the current directory (isn't important, it could be any directory, so I'd configure that later), that matches "?cache_version=" with a number. It will then increment that number and replace it in the file.
Currently the stuff I have above works, it's just that I can't increment that found number at the end. It would be nicer to be able to increment instead of appending a "1" so that the future values wouldn't be "11", "111", "1111", "11111", and so on.
I've gone through dozens of articles/explanations, and often enough, the suggestion is to use awk, but I cannot for the life of me mix them. The closest I came to using awk, which doesn't actually replace anything, is:
grep -Pro '(?<=\?cache_version=)[0-9]+' . | awk -F: '{ print "match is", $2+1 }'
I'm wondering if there's some way to pipe a sed at the end and pass the original file name so that sed can have the file name and incremented number (from the awk), or whatever it needs that xargs has.
Technically, this number has no importance; this replacement is mainly to make sure there is a new number there, 100% for sure different than the last. So as I was writing this question, I realized I might as well use the system time - seconds since epoch (the technique often used by AJAX to eliminate caching for subsequent "identical" requests). I ended up with this, and it seems perfect:
CXREPLACETIME=`date +%s`; find . -type f | xargs sed -i "s/\(\?cache_version\=\)[0-9]\+/\1$CXREPLACETIME/g"
(I store the value first so all files get the same value, in case it spans multiple seconds for whatever reason)
But I would still love to know the answer to the original question about incrementing a matched number. I'm guessing an easy solution would be to make it a bash script, but still, I thought there would be an easier way than looping through every file recursively, checking its contents for a match, and then replacing, since it's simply incrementing a matched number, not much else logic. I just don't want to write to any other files or anything like that; it should do it in place, like sed does with the -i option.
I think finding the files isn't the difficult part for you. I'll therefore go straight to the point: doing the +1 calculation. If you have GNU sed, it can be done in this way:
sed -r 's/(.*)(\?cache_version=)([0-9]+)(.*)/echo "\1\2$((\3+1))\4"/ge' file
let's take an example:
kent$ cat test
ello
barbaz?cache_version=3fooooo
bye
kent$ sed -r 's/(.*)(\?cache_version=)([0-9]+)(.*)/echo "\1\2$((\3+1))\4"/ge' test
ello
barbaz?cache_version=4fooooo
bye
you could add -i option if you like.
edit
/e allows you to pass matched part to external command, and do substitution with the execution result. Gnu sed only.
see this example, where the external commands/tools echo and bc are used:
kent$ echo "result:3*3"|sed -r 's/(result:)(.*)/echo \1$(echo "\2"\|bc)/ge'
gives output:
result:9
you could use other powerful external command, like cut, sed (again), awk...
Pure sed version:
This version has no dependencies on other commands or environment variables.
It uses explicit carrying. For the carry I use the # symbol, but another symbol can be used if you like. Use something that is not present in your input file.
First it finds SEARCHSTRING<number> and appends a # to it.
It repeats incrementing digits that have a pending carry (that is, have a carry symbol after it: [0-9]#)
If 9 was incremented, this increment yields a carry itself, and the process will repeat until there are no more pending carries.
Finally, carries that were yielded but not added to a digit yet are replaced by 1.
sed "s/SEARCHSTRING[0-9]*[0-9]/&#/g;:a {s/0#/1/g;s/1#/2/g;s/2#/3/g;s/3#/4/g;s/4#/5/g;s/5#/6/g;s/6#/7/g;s/7#/8/g;s/8#/9/g;s/9#/#0/g;t a};s/#/1/g" numbers.txt
This perl command will search all files in the current directory (without traversing subdirectories; you would need the File::Find module or similar for that more complex task) and will increment the number on lines that match cache_version=. It uses the /e flag of the regular expression, which evaluates the replacement part.
perl -i.bak -lpe 'BEGIN { sub inc { my ($num) = @_; ++$num } } s/(cache_version=)(\d+)/$1 . (inc($2))/eg' *
I tested it with a file in the current directory containing the following data:
hello
cache_version=3
bye
It backs up the original file (ls -1):
file
file.bak
And the file now contains:
hello
cache_version=4
bye
I hope it can be useful for what you are looking for.
UPDATE: to traverse directories, use File::Find. The command accepts * as arguments but discards them, replacing them with the files found by File::Find. The directory where the search begins is the current directory at execution time; it is hardcoded in the line find( \&wanted, "." ).
perl -MFile::Find -i.bak -lpe '
    BEGIN {
        sub inc {
            my ($num) = @_;
            ++$num
        }
        sub wanted {
            if ( -f && ! -l ) {
                push @ARGV, $File::Find::name;
            }
        }
        @ARGV = ();
        find( \&wanted, "." );
    }
    s/(cache_version=)(\d+)/$1 . (inc($2))/eg
' *
This is ugly (I'm a little rusty), but here's a start using sed:
orig="something1" ;
text=`echo $orig | sed "s/\([^0-9]*\)\([0-9]*\)/\1/"` ;
num=`echo $orig | sed "s/\([^0-9]*\)\([0-9]*\)/\2/"` ;
echo $text$(($num + 1))
With an original filename ($orig) of "something1", sed splits off the text and numeric portions into $text and $num, then these are combined in the final section with an incremented number, resulting in something2.
Just a start since it doesn't consider cases with numbers within the file name or names with no number at the end, but hopefully helps with your original goal of using sed.
This can actually be simplified within sed by using buffers, I believe (sed can operate recursively), but I'm really rusty with that aspect of it.
perl -pi -e 's/(\?cache_version=)(\d+)/$1.($2+1)/ge' FILE [FILE...]
or for a complete solution:
find . -type f | xargs perl -pi -e 's/(\?cache_version=)(\d+)/$1.($2+1)/ge'
perl substitution operator
/e modifier evaluates the replacement as if it were a Perl statement, using its return value as the replacement text.
. operator concatenates strings in Perl. The parentheses ensure that the arithmetic operation $2+1 takes precedence over concatenation.
/g modifier applies substitution to all matched strings within line
perl options
-p ensures that perl will execute the command on every line of each file
-i ensures that each file will be edited inplace
-e specifies the perl command(s) that are executed (in this case, the substitution operation)
