Let's say I have a folder with a layout like the one below:
.
|__ components
| |__ index.js
| |__ _index.js
| |__ _index.en.js
|
|__ _.js
What I want is to search in components/index.js and in _.js and ignore all the components/_*.js files.
Basically, I need something like ack's --ignore-file=match:_.*\.js, but a version that also matches against the file's path.
Is it possible in ack 2?
Unfortunately, ack doesn't have any direct support for ignoring certain files in subdirectories.
One option is to combine ack with grep, with something like the following:
ack -f | grep -v -x 'components/_.*\.js' | ack -x mystr
To explain:
ack -f prints the files selected, without actually searching.
grep -v -x 'components/_.*\.js' filters out the paths of the _*.js files under the components directory (-x requires the whole line to match, and -v inverts the match).
ack -x then reads the list of files to search from grep.
Of course, you can tailor the directories to ignore with grep.
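For instance, if you also wanted to skip a second directory, the grep stage can take several patterns; a rough sketch (the vendor/ directory here is just a made-up example):
ack -f | grep -v -x -e 'components/_.*\.js' -e 'vendor/.*' | ack -x mystr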
I want to copy files from multiple subdirectories to a new directory within the main directory called copiedFiles/. I only want to copy files that can be matched to strings in the file strs2bMatchd.csv. The names of the subdirectories also match the first part of the strings to be matched (see the example below).
The main directory with its subdirectories looks like this:
main_dir/
    strs2bMatchd.csv
    1111/
        1111_aaa1_x873.csv
        1111_aaa2_x874.csv
        1111_ddd1_x443.csv
        1111_ddd2_x444.csv
    1112/
        1112_bbb1_x912.csv
        1112_bbb2_x913.csv
        1112_fff1_x664.csv
        1112_fff2_x665.csv
    1113/
        1113_ccc1_x912.csv
        1113_ccc2_x913.csv
The files to be copied should match the strings in the strs2bMatchd.csv file:
cat strs2bMatchd.csv
1111_aaa1
1111_aaa2
1112_bbb1
1112_bbb2
1113_ccc1
1113_ccc2
This is the expected result
main_dir/
    strs2bMatchd.csv
    1111/
        1111_aaa1_x873.csv
        1111_aaa2_x874.csv
        1111_ddd1_x443.csv
        1111_ddd2_x444.csv
    1112/
        1112_bbb1_x912.csv
        1112_bbb2_x913.csv
        1112_fff1_x664.csv
        1112_fff2_x665.csv
    1113/
        1113_ccc1_x912.csv
        1113_ccc2_x913.csv
    copiedFiles/
        1111_aaa1_x873.csv
        1111_aaa2_x874.csv
        1112_bbb1_x912.csv
        1112_bbb2_x913.csv
        1113_ccc1_x912.csv
        1113_ccc2_x913.csv
As an alternative, consider
M=main_dir
mkdir -p $M/copiedFiles
find $M | grep -F -v "$M/copiedFiles" | grep -Ff $M/strs2bMatchd.csv | xargs cp -t $M/copiedFiles/
It only executes find once.
If the file names may contain spaces or other special characters, consider using the NUL-safe version (NUL-terminated strings): -print0 for find, -z for grep, and -0 for xargs.
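An untested sketch of that NUL-safe pipeline, assuming GNU find, grep, and xargs:
find "$M" -print0 | grep -zF -v "$M/copiedFiles" | grep -zFf "$M/strs2bMatchd.csv" | xargs -0 cp -t "$M/copiedFiles/"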
Update 1: The OP indicates that their version of cp does not have -t. An alternative solution without this option is posted below.
I cannot test it, so please check the details against the man pages, etc.
M=main_dir
mkdir -p $M/copiedFiles
find $M | grep -F -v "$M/copiedFiles" | grep -Ff $M/strs2bMatchd.csv | xargs -I{} cp {} $M/copiedFiles/
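If xargs -I is also unavailable or awkward, a plain while-read loop is another rough option; it handles spaces in names, though not newlines (-type f is added here to skip the directories themselves):
find "$M" -type f | grep -Fv "$M/copiedFiles" | grep -Ff "$M/strs2bMatchd.csv" |
while IFS= read -r f; do
    cp "$f" "$M/copiedFiles/"
done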
First of all, I'm a newbie with bash scripting, so forgive me if I'm making easy mistakes.
Here's my problem. I needed to download my company's website. I accomplished this using wget with no problems, but because some filenames contain the ? symbol and Windows doesn't like filenames with ?, I had to create a script that renames the files and also updates the source code of every file that references the renamed file.
To accomplish this I use the following code:
find . -type f -name '*\?*' | while read -r file ; do
SUBSTRING=$(echo $file | rev | cut -d/ -f1 | rev)
NEWSTRING=$(echo $SUBSTRING | sed 's/?/-/g')
mv "$file" "${file//\?/-}"
grep -rl "$SUBSTRING" * | xargs sed -i '' "s/$SUBSTRING/$NEWSTRING/g"
done
This has two problems:
1. It is taking way too long; I've waited more than 5 hours and it is still going.
2. It looks like it is appending in the source code, because when I stop the script and check the changes, the URL is repeated 4 times (or more).
Thanks all for your comments. I will try the two separate steps and see. Also, just as an FYI, there were 3291 files downloaded with wget. Do you still think bash scripting is preferable to other tools for this?
It seems odd that a file would have ? in it. Website URLs use ? to indicate the passing of parameters. wget from a website also doesn't guarantee you're getting the site's actual source, especially if server-side execution takes place, as with PHP files. So I suspect that as wget recurses, it's finding URLs that pass parameters and thus creating those filenames for you.
To really get the site, you should have direct access to the files.
If I were you, I'd start over and not use wget.
You may also be having issues with files or directories with spaces in their name.
Instead of that line with xargs: you're already handling one file at a time, yet grepping through everything recursively. Just run the sed on the new file itself, as in the sketch below.
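A minimal sketch of that, reusing the loop's own variables and keeping the BSD-style sed -i '' that the question already uses:
sed -i '' "s/$SUBSTRING/$NEWSTRING/g" "${file//\?/-}"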
Ok, here's the idea (untested):
in the first loop, just move the files and compose a global sed replacement file
once it is done, just scan all the files and apply sed with all the patterns at once, thus saving a lot of read/write operations which are likely to be the cause of the performance issue here
I would avoid putting the script itself in the directory being processed, or it will be rewritten by sed, so I assume all the files to be processed are not in the current directory but in a data directory.
code:
sedfile=/tmp/tmp.sed
data=data
rm -f $sedfile
# locate ourselves in the subdir to preserve the naming logic
cd $data
# rename the files and compose the big sedfile
find . -type f -name '*\?*' | while read -r file ; do
SUBSTRING=$(echo $file | rev | cut -d/ -f1 | rev)
NEWSTRING=$(echo $SUBSTRING | sed 's/?/-/g')
mv "$file" "${file//\?/-}"
echo "s/$SUBSTRING/$NEWSTRING/g" >> $sedfile
done
# now apply the big sedfile once on all the files:
# if you need to go recursive:
find . -type f | xargs sed -i -f $sedfile
# if you don't:
sed -i -f $sedfile *
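One possible refinement, since the original names can contain characters that are special to sed (dots from file extensions, ampersands from query strings): escape them before writing each rule. A hedged sketch of what could replace the echo "s/..." line above:
# escape regex metacharacters in the search string, and '\', '&', '/' in the replacement
ESC_OLD=$(printf '%s\n' "$SUBSTRING" | sed 's/[][\.*^$/]/\\&/g')
ESC_NEW=$(printf '%s\n' "$NEWSTRING" | sed 's/[\&/]/\\&/g')
echo "s/$ESC_OLD/$ESC_NEW/g" >> $sedfile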
Instead of using grep, you can use the find command or ls command to list the files and then operate directly on them.
For example, you could do:
ls -1 /path/to/files/* | xargs sed -i '' "s/$SUBSTRING/$NEWSTRING/g"
Here's where I got the idea based on another question where grep took too long:
Linux - How to find files changed in last 12 hours without find command
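A hedged variant of that ls idea using find instead, which copes better with unusual file names (the sed -i '' form assumes BSD/macOS sed, as in the question; /path/to/files is a placeholder):
find /path/to/files -maxdepth 1 -type f -exec sed -i '' "s/$SUBSTRING/$NEWSTRING/g" {} +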
I have a file.gz (not a .tar.gz!) or a file.zip file. It contains one file (a 20 GB text file with tens of millions of lines) named 1.txt.
Without saving 1.txt to disk as a whole (this requirement is the same as in my previous question), I want to extract all its lines that match some regular expression and don't match another regex.
The resulting .txt files must not exceed a predefined limit, say, one million lines.
That is, if there are 3.5M lines in 1.txt that match those conditions, I want to get 4 output files: part1.txt, part2.txt, part3.txt, part4.txt (the latter will contain 500K lines), that's all.
I tried to make use of something like
gzip -c path/to/test/file.gz | grep -P --regexp='my regex' | split -l1000000
But the above code doesn't work. Maybe Bash can do it, as in my previous question, but I don't know how.
You can perhaps use zgrep.
zgrep [ grep_options ] [ -e ] pattern filename.gz ...
NOTE: zgrep is a wrapper script (installed with the gzip package), which essentially uses the same command internally as mentioned in the other answers.
However, it looks more readable in a script and the command is easier to write manually.
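Putting it together with the splitting requirement, an untested sketch with zgrep might look like this (assuming GNU split; part is just a chosen prefix):
zgrep -P 'my regex' path/to/test/file.gz | grep -vP 'other regex' | split -d -l 1000000 - part
GNU split also supports --additional-suffix=.txt if the parts should keep a .txt extension.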
I'm afraid it's impossible; quoting from the gzip man page:
If you wish to create a single archive file with multiple members so
that members can later be extracted independently, use an archiver
such as tar or zip.
UPDATE: After the edit: since the .gz only contains one file, a one-step tool like awk should be fine:
gzip -cd path/to/test/file.gz | awk 'BEGIN{global=1}/my regex/{count+=1;print $0 >"part"global".txt";if (count==1000000){count=0;global+=1}}'
split is also a good choice, but you will have to rename the files afterwards.
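Since the question also wants to drop lines that match a second regex, the awk above can be extended; a hedged, untested sketch ('my regex' and 'other regex' are the placeholders from the question):
gzip -cd path/to/test/file.gz | awk '
    BEGIN { global = 1 }
    /my regex/ && !/other regex/ {
        count++
        print $0 > ("part" global ".txt")
        if (count == 1000000) { close("part" global ".txt"); count = 0; global++ }
    }'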
Your solution is almost good. The problem is that you have to tell gzip what to do: to decompress, use -d. So try:
gzip -dc path/to/test/file.gz | grep -P --regexp='my regex' | split -l1000000
But with this you will get a bunch of files named xaa, xab, xac, and so on. I suggest using the PREFIX argument and numeric suffixes to create better output:
gzip -dc path/to/test/file.gz | grep -P --regexp='my regex' | split -dl1000000 - file
In this case the resulting files will be named file00, file01, file02, etc.
If you want to filter out lines matching another Perl-style regex, you can try something like this:
gzip -dc path/to/test/file.gz | grep -P 'my regex' | grep -vP 'other regex' | split -dl1000000 - file
I hope this helps.
I have a directory containing a bunch of header files from a library. I would like to see how "Uint32" is defined.
So, I need a way to scan over all those header files and print out lines with "Uint32".
I guess grep could help, but I'm new to shell scripts.
What should I do?
There are a couple of ways.
grep -r --include="*.h" Uint32
is one way.
Another is:
find . -name "*.h" | xargs grep Uint32
If you have spaces in the file names, the latter can be problematic.
find . -name "*.h" -print0 | xargs -0 grep Uint32
will solve that typically.
Just simple grep will be fine:
grep "Uint32" *.h*
This will search both *.h and *.hpp header files.
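If the headers are spread across subdirectories, a recursive variant of the same idea (assuming GNU grep's --include option) could be:
grep -rn --include="*.h" --include="*.hpp" "Uint32" .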
Whilst using grep is fine, for navigating code you may also want to investigate ack (a source-code aware grep variant), and/or ctags (which integrates with vi or emacs and allows navigation through code in your editor)
ack in particular is very nice, since it'll navigate through directory hierarchies, and only work on specific types of files (so for C it'll interrogate .c and .h files, but ignore SCM revision directories, backup files etc.)
Of course, you really need some form of IDE to give you complete navigation over the codebase.
I have two directories which have a similar structure.
./projectA
/directory1
/file1
/directory2
/file2
/file3
/file4
/directory3
/directoryA
./projectB
/directory1
/directory2
/file3
/directory3
/directoryB
I would like to merge projectA into projectB. If a directory or file exists in A but not in B, do an svn copy from A to B to the corresponding target. If a file in A has a corresponding file in B, echo a warning. How can I do this automatically? It can be shell scripts or tools....
Thanks
With this command you get all files that are in projectA but not in projectB, ignoring the .svn folder:
diff -qr projectA projectB --exclude=.svn | grep "^Only in projectA:" | cut -d: -f2 | sed 's/^ *//g'
With this command you get all files that exist in both folders and that differ (i.e. files you might need to check before copying):
diff -qr projectA projectB --exclude=.svn | grep "^Files " | cut -d" " -f2 | sed 's!projectA!!g'
The second command will not work with files that have spaces in them, though.
Now that you've got two lists with the file names you need, you can easily write a small script that does the right thing with them.
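For example, a rough, untested sketch, assuming the two commands above have been redirected into onlyInA.txt and differing.txt (both file names are just placeholders):
# svn copy everything that only exists in projectA
while IFS= read -r name; do
    svn copy "projectA/$name" "projectB/$name"
done < onlyInA.txt

# warn about files that exist in both projects but differ
while IFS= read -r path; do
    echo "warning: projectA$path and projectB$path differ" >&2
done < differing.txt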