Exclude wildcard containing string 'x' - bash

Is there a way to modify egrep -rha 'part1.*part2' to add something like: "if .* contains (not necessarily equals) string_x, then the pattern is not a match"? The problem is that string_x is present in every line, so I can't -v it. It's okay to have this string before or after the pattern, just not in the middle of it.
I assume a double .* with a negated string_x between them would get the job done, but it would take a lot of time; plus I sometimes use the .{n,m} wildcard, and in that case it would double the desired wildcard length. Maybe some sort of search termination every time it encounters string_x before part2?

Forget you ever heard about -r or any other option to let grep find files. There's a perfectly good tool for finding files with an extremely obvious name - find. Keep grep for what it's good at which is doing g/re/p. I can't imagine what the GNU guys were smoking when they decided to give grep options to find files, but hopefully they aren't now plotting to add options to sort files or pull content from web sites or print process data or do anything else that existing tools do perfectly well!
In this case you're looking for more than just g/re/p though so you should use awk:
awk '/part1.*part2/ && !/part1.*string_x.*part2/'
So the full script would be something like (untested since no sample input/output provided):
find . -type f -exec awk '/part1.*part2/ && !/part1.*string_x.*part2/' {} +
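Here's a quick illustration of how that filter behaves, using made-up input lines (part1, part2 and string_x stand in for your real patterns):
printf '%s\n' 'part1 foo part2' 'part1 string_x part2' 'string_x part1 foo part2' |
awk '/part1.*part2/ && !/part1.*string_x.*part2/'
This prints the first and third lines but drops the second: string_x before or after the part1...part2 match is fine, but string_x between them disqualifies the line.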

Related

How to batch replace part of filenames with the name of their parent directory in a Bash script?

All of my file names follow this pattern:
abc_001.jpg
def_002.jpg
ghi_003.jpg
I want to replace the characters before the numbers and the underscore (not necessarily letters) with the name of the directory in which those files are located. Let's say this directory is called 'Pictures'. So, it would be:
Pictures_001.jpg
Pictures_002.jpg
Pictures_003.jpg
Normally, the way this website works is that you show what you have done and what problem you have, and we give you a hint on how to solve it. You didn't show us anything, so I will give you a starting point, but not the complete solution.
You need to know what to replace: you have given the examples abc_001 and def_002; are you sure that the length of the to-be-replaced part is always 3? In that case, you might use the basic cut command to delete it. Otherwise, you might use the position of the '_' character, or grep -o, as in this simple example:
ls -ltra | grep -o "_[0-9][0-9][0-9]\.jpg"
As for the current directory, you can get it from the environment variable $PWD (if Pictures is the deepest subdirectory, you might use cut with '/' as a separator and take the last field).
You can see the current directory with pwd, but also with echo "${PWD}".
With ${x#something} you can delete something from the beginning of the variable. something can have wildcards, in which case # deletes the smallest, and ## the largest match.
First try the next command for understanding above explanation:
echo "The last part of the current directory `pwd` is ${PWD##*/}"
The same construction can be used for cutting the filename, so you can do
for f in *_*.jpg; do
mv "$f" "${PWD##*/}_${f#*_}"
done
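If you're nervous about the rename, a dry run that merely prints the mv commands is a cheap safeguard; remove the echo once the output looks right:
for f in *_*.jpg; do
echo mv "$f" "${PWD##*/}_${f#*_}"
done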

Python: Search for String in Filenames

I have been trying to search the root file system for a certain string (in the middle of the filename or wherever). I read about grep and I have tried this code here:
grep -rnw /home/pi/music -e "Maroon"
and something strange happens: there are three filenames with Maroon in them (same capitalization and spacing), but only two show up in the terminal. Any ideas why that is? Are there any other, easier ways to do this?
I would also like to say that I saw this StackOverflow post here, but I could not get it to work. I believe that was focusing on specific filenames, while I would like to do a general search.
All help is very much appreciated!
grep reads through the contents of the files on your disk and searches for the word "Maroon" inside them; it does not look at filenames.
What I think you want (when searching for file names) is:
find /home/pi/music -iname "*maroon*"
This will display all files whose names match *maroon* (case-insensitively). If you want a case-sensitive match, take a look at -name.
man find
will list all options for find.
The correct (or rather, the more common) way to search for files matching a certain pattern is to use the find command, like this:
find /home/pi/music -type f -iname "*maroon*" -ls
-type limits the search to a particular type, in this case f for regular files (so it will ignore directories, pipes, sockets, etc.)
-iname does a case-insensitive name search
-ls lists the files found
grep is used to search within files for matching content.
You want to use find to search for filenames:
find /home/pi/music -maxdepth 1 -name \*Maroon\*
This will find files whose names contain the string. You need to quote (or escape) the pattern so the shell doesn't glob it. -maxdepth 1 (GNU find's spelling; plain -depth means something else) makes it search only the top level of that directory.
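Any of these spellings works; the only requirement is that the shell itself must not expand the *:
find /home/pi/music -maxdepth 1 -name \*Maroon\*
find /home/pi/music -maxdepth 1 -name '*Maroon*'
find /home/pi/music -maxdepth 1 -name "*Maroon*"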

Bash: find references to filenames in other files

Problem:
I have a list of filenames, filenames.txt:
E.g.
/usr/share/important-library.c
/usr/share/youneedthis-header.h
/lib/delete/this-at-your-peril.c
I need to rename or delete these files and I need to find references to these files in a project directory tree: /home/noob/my-project/ so I can remove or correct them.
My thought is to use bash to extract the filename: basename filename, then grep for it in the project directory using a for loop.
FILELISTING=filenames.txt
PROJECTDIR=/home/noob/my-project/
for f in $(cat "$FILELISTING"); do
extension=$(basename ${f##*.})
filename=$(basename ${f%.*})
pattern="$filename"\\."$extension"
grep -r "$pattern" "$PROJECTDIR"
done
I could royally screw up this project -- does anyone see a flaw in my logic? Better yet: do you see a more reliable, scalable way to do this over a huge directory tree? Let's assume that revision control is off the table (it is, in fact).
A few comments:
Instead of
for f in $(cat "$FILELISTING") ; do
...
done
it's somewhat safer to write
while IFS= read -r f ; do
...
done < "$FILELISTING"
That way, your code will have no problem with spaces, tabs, asterisks, and so on in the filenames (though it still won't support newlines).
Your goal in separating f into extension and filename, and then reassembling them with \., seems to be that you want the filename to be treated as a literal string; right? Like, you're worried that grep will treat the . as meaning "any character" rather than as "one dot". A more general solution is to use grep's -F option, which tells it to treat the pattern as a fixed string rather than a regex:
grep -r -F "$f" "$PROJECTDIR"
Your introduction mentions using basename, but then you don't actually use it. Is that intentional?
If your non-use of basename is intentional, then filenames.txt really just contains a list of patterns to search for; you don't even need to write a loop, in this case, since grep's -f option tells it to take a newline-separated list of patterns from a file:
grep -r -F -f "$FILELISTING" "$PROJECTDIR"
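Each line of the file is one fixed-string pattern, and a line in a searched file matches if it contains any of them. As a tiny illustration using the paths from the question:
printf '%s\n' '/usr/share/important-library.c' '/lib/delete/this-at-your-peril.c' > patterns.txt
grep -r -F -f patterns.txt "$PROJECTDIR"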
You should back up your project, using something like tar -czf backup.tar.gz "$PROJECTDIR". "Revision control is off the table" doesn't mean you can't have a rollback strategy!
Edited to add:
To pass all your base-names to grep at once, in the hope that it can do something smarter with them than looping over them as though the calls were separate, you can write something like:
grep -r -F "$(sed 's#.*/##g' "$FILELISTING")" "$PROJECTDIR"
(I used sed rather than while+basename for brevity's sake, but you can put an entire loop inside the "$(...)" if you prefer.)
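Spelled out with while+basename instead of sed, that might look like:
grep -r -F "$(while IFS= read -r f; do basename "$f"; done < "$FILELISTING")" "$PROJECTDIR"
grep treats each newline-separated line inside that single quoted argument as a separate fixed-string pattern.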
This is a job for an IDE.
You're right that this is a perilous task, and unless you know the build process and the search directories and the order of the directories, you really can't say what header is with which file.
Let's take something as simple as this:
# include "sql.h"
You have a file in the project, headers/sql.h. Is that file needed? Maybe it is. Maybe not. There's also a /usr/include/sql.h. Maybe that's the one that's actually used. You can't tell which is which without looking at the Makefile and seeing the order of the include directories.
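For example (a hypothetical layout where both headers/sql.h and third_party/sql.h exist), the -I order alone decides which header the quoted include picks up:
cc -Iheaders -Ithird_party -c main.c    # include "sql.h" resolves to headers/sql.h
cc -Ithird_party -Iheaders -c main.c    # now third_party/sql.h is found first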
Then there are the libraries that get included, which may need their own header files in order to compile. And once you get to the C preprocessor, you really will have a hard time.
This is a task for an IDE (Integrated Development Environment). An IDE builds the project and tracks file and other resource dependencies. In the Java world, most people use Eclipse, and there is a C/C++ plugin for those developers. However, there are over 2 dozen listed in Wikipedia and almost all of them are open source. The best one will depend upon your environment.

For loop in shell script - colons and hash marks?

I am trying to make heads or tails of a shell script. Could someone please explain this line?
$FILEDIR is a directory containing files. f is a marker in an array of files that is returned from this command:
files=$( find $FILEDIR -type f | grep -v .rpmsave\$ | grep -v .swp\$ )
The confusing line is within a for loop.
for f in $files; do
target=${f:${#FILEDIR}}
<<do some more stuff>>
done
I've never seen the colon and the hash before in a shell script for loop. I haven't been able to find any documentation on them... could someone try to enlighten me? I'd appreciate it.
There are no arrays involved here. POSIX sh doesn't have arrays (assuming you're not using another shell based upon the tags).
The colon indicates a Bash/Ksh substring expansion. These are also not POSIX. The # prefix expands to the number of characters in the parameter. I imagine they intended to chop off the directory part and assign it to target.
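Concretely, with made-up paths:
FILEDIR=/var/data
f=/var/data/sub/file.txt
echo "${f:${#FILEDIR}}"    # prints /sub/file.txt -- f with the first ${#FILEDIR} (here 9) characters removed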
To explain the rest of that: first find is run and hilariously piped into two greps which do what could have been done with find alone (except for breaking on filenames that contain newlines), and the output is saved in files. This is also something that can't really be done correctly when restricted to POSIX tools only, but there are better ways.
Next, files is expanded unquoted and mutilated by the shell in more ridiculous ways for the for loop to iterate over the meaningless results. If the rest of the script is this bad, probably throw it out and start over. There's no way it will do what's expected.
The colon can be used for substring expansion. So:
A=abcdefg
echo ${A:4}
will print the output:
efg
I'm not sure why they would use the length of a directory as the 2nd parameter though...
If you are having problems understanding the for loop section, try http://www.dreamsyssoft.com/unix-shell-scripting/loop-tutorial.php

how much should I worry about argument list too long?

I have a shell script which uses * wildcards. For example:
mv /someplace/*.DAT /someotherplace
And
for file in /someplace/*.DAT
do
echo $file
done
Then, when I think about error handling, I worry about the infamous "argument list too long" error.
How much should I worry about it? How much can the shell actually hold? For example, will it die at 500 files or 1000 files? Does it depend on the length of the filenames?
EDIT:
I have found out that the argument maximum is 131072 bytes. I am not looking for a solution to overcome the "argument list too long" problem. What I really want to know is: how does that translate to an ordinary command string? I.e., how "long" can the command be? Do spaces count?
pardon my ignorance
If I remember correctly, it is capped at 32 KB of data.
first command
find /someplace -name '*.DAT' -print0 | xargs -r0 mv --target-directory='/someotherplace'
second command
find /someplace -type f -name "*.DAT"
Yes, it depends on filename length. The command line maximum is a single hardcoded limit, so long filenames will exhaust it faster. And it's usually a kernel limitation, so there is no way around it within bash. And yes, this is serious: errors that occur only infrequently are always more serious than obvious errors, because quality assurance will probably miss them, and when they do happen it is almost guaranteed to be with a nightmarish unreadable command line that you can't even reconstruct properly!
For all these reasons: deal with the problem now rather than later.
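You don't have to guess at the limit, either; you can ask the system (getconf is POSIX, --show-limits is GNU xargs):
getconf ARG_MAX                   # limit in bytes, e.g. 131072 on older Linux kernels
xargs --show-limits < /dev/null   # how xargs will split command lines on this system
Note that the limit is counted in bytes across all arguments (spaces and terminating NULs included) plus the environment, not in number of files.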
Whether
How much should you worry about it? You may as well ask "What is the lifespan of my code?"
I would urge you to always worry about the argument list limit. This limit is set at compile time and can easily differ between systems, shells, etc. Do you know for sure that your code will always run in its original environment, with expected input and that environment's original limit?
If the expansion of a glob could produce an unknown number of files, or files with names of unknown length, or could exceed whatever limit is in effect in some unknown future environment, then you should write your code from day one to avoid this bug.
How
There are three find-based solutions to this problem. The classic one uses xargs:
find ... | xargs command
xargs will execute command with as many matches as it can without overflowing the argument list, then repeat that invocation as necessary until there are no more results from find.
This solution is problematic because file names may contain newlines. If you're lucky, you have a nicer version of find that supports null-terminated file names via -print0, and you can use the safer solution:
find ... -print0 | xargs -0 command
This is the same as the first find except it's safe for all legal file names.
Newer versions of find may support -exec with the + terminator, which allows for another solution:
find ... -exec command {} +
This is functionally identical to the second find command above: safe for all file names, splits invocations of command into chunks that won't overflow the argument list. I prefer this form, when available.
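Applied to the mv from the original question, the -exec form would look something like this (assuming GNU mv, which has -t to name the target directory first):
find /someplace -maxdepth 1 -type f -name '*.DAT' -exec mv -t /someotherplace {} +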
