Manipulating a file - bash

I need some guidance manipulating a text file that is the result of a diff. I only want those results listed after the > delimiter (which are file names) and then I will add a path to the file name for further work.
I am not dealing with large files.
I am hoping to do it all in place.
Essentially I want to take something like this
96a97,98
> SCR-33333.sql
> SCR-33333-WEB.sql
and create an action like
cp /add/this/path/SCR-33333.sql /to/somewhere/else
Can anyone please give me a quick example I can run with?

Well, you could try this, bearing in mind that it'll only work if filenames do not contain spaces...
diff this that | awk '/^>/{print "/add/this/path/" $2}' | xargs -I{} cp {} /to/somewhere/else

grep ">" dummy.txt | cut -f 2 -d ' ' | xargs -I{} cp /add/this/path/{} somewhere
where 'dummy.txt' is your diff file.
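If the filenames may contain spaces, a plain while read loop sidesteps the word-splitting caveat above; a minimal sketch, keeping the placeholder paths from the question:
diff this that | while read -r marker filename
do
    # only lines added in the second file start with ">"
    [ "$marker" = ">" ] || continue
    cp "/add/this/path/$filename" /to/somewhere/else/
done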

Related

I want to pipe grep output to sed for input

I'm trying to pipe the output of grep to sed so it will only edit specific files. I don't want sed to rewrite files it doesn't actually change, since that updates their modification date.
I'm searching with grep and writing with sed. That's it.
The character I am trying to change is a dash, but not the ordinary kind: "-" is a normal hyphen, while "–" (an en dash) is the one I want to replace.
The code I currently have:
sed -i 's/– foobar/- foobar/g' * ; perl-rename 's/– foobar/- foobar/' *'– foobar'*
Sorry about the trouble, I'm inexperienced.
Are you sure about what you want to achieve? Let me explain:
grep "string_in_file" <filelist> | sed <sed_script>
This first shows the "string_in_file", preceded by the filename.
If you launch a sed on this, it will just show you the result of that sed script on screen, but it will not change the files themselves. In order to do that, you need the following:
grep -l "string_in_file" <filelist> | sed <sed_script_on_file>
The grep -l gives you just the filenames, and the new sed_script_on_file needs to be a script that reads each file and alters it.
Thank you all for helping; I'm sorry about not being faster in responding.
After a bit of fiddling with the command, I got it:
grep -l 'old' * | xargs -d '\n' sed -i 's/old/new/'
This should only touch files that contain old and leave all other files.
This might be what you're trying to do if your file names don't contain newlines:
grep -l -- 'old' * | xargs sed -i 's/old/new/'
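If any matched filename might contain spaces or newlines, GNU grep and xargs can hand the names over NUL-separated instead; a sketch, assuming GNU versions of both tools:
grep -lZ -- '– foobar' * | xargs -0 sed -i 's/– foobar/- foobar/g'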

Faster grep in many files from several strings in a file

I have the following working script to grep, in a directory of many files, for some specific strings previously saved into a file.
I grep by file extension, since the file names are random, and note that every string from my strings file should be searched in all the files.
Also, I cut grep's output, as it returns 2 or 3 lines per matched file and I only want the specific part that shows the filename.
I might be doing something redundant; how could it be faster?
#!/bin/bash
#working but slow
cd /var/FILES_DIRECTORY
while read line
do
    LC_ALL=C fgrep "$line" *.cps | cut -c1-27 >> /var/tmp/test_OUT.txt
done < "/var/tmp/test_STRINGS.txt"
grep -F -f /var/tmp/test_STRINGS.txt *.cps | cut -c1-27
Isn't this what you're looking for?
This should speed up your script:
#!/bin/bash
#working fast
cd /var/FILES_DIRECTORY
export LC_ALL=C
grep -F -f /var/tmp/test_STRINGS.txt *.cps | cut -c1-27 > /var/tmp/test_OUT.txt
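If one line per matching file is enough (the question only wants the part that identifies the file), grep can print the filenames directly, which avoids the fixed-width cut and stops reading each file at its first match; a sketch under the same paths:
grep -lF -f /var/tmp/test_STRINGS.txt *.cps > /var/tmp/test_OUT.txt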

Remove files less than a certain size and extract filenames

I am working on a cluster remotely and submit a few thousand jobs. Some jobs crash early. I need to move the output files of those jobs (smaller than 1KB) to another folder and start them again. I guess find can move them with something like:
find . -size -1000c -exec mv {} ../crashed \;
but I also need to restart these crashed jobs. The output files sit in a bunch of folders inside the output folder, and I need the folder name and the file name (without extension) separately.
I guess sed and/or awk can do this easily, but I am not sure how. By the way, I am working in the bash shell.
I am trying to use cut, which seems to be working:
for i in $( find . -size -1000c )
do
    FOLDER=$(echo "${i%.*}" | cut -d'/' -f2)
    FILENAME=$(echo "${i%.*}" | cut -d'/' -f3)
done
But wouldn't it be better to use sed or awk? And how?
Sed is a stream editor and since you're not changing anything I wouldn't use it in this case. You could use awk instead of cut like this:
FOLDER=$(echo "${i%.*}" | awk -v FS="/" '{ print $2 }')
where -v FS="/" sets the variable FS (the field separator) to a slash, much like the -d option in cut, and print $2 tells awk to print only the second field.
Same goes for the other instruction you have there. In your case what you have to do is simple enough, so cut actually cuts it :D
I usually use awk for more complicated tasks, involving multiple files and/or mathematical computations.
Edit:
note that I'm using gawk here (the awk implementation by GNU). I'm not sure you can pass a variable value with the -v option in other implementations; they'll have their own way to do it.
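For this particular split, bash parameter expansion can do the whole job without cut or awk, and a while read loop copes better with odd paths; a minimal sketch, reusing the find criteria from the question:
find . -size -1000c | while read -r i
do
    path=${i%.*}            # drop the extension
    FOLDER=${path%/*}       # everything before the last slash
    FOLDER=${FOLDER#./}     # strip the leading ./ that find prints
    FILENAME=${path##*/}    # everything after the last slash
    echo "folder: $FOLDER  file: $FILENAME"
done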

To show only file name without the entire directory path

ls /home/user/new/*.txt prints all txt files in that directory. However it prints the output as follows:
[me#comp]$ ls /home/user/new/*.txt
/home/user/new/file1.txt /home/user/new/file2.txt /home/user/new/file3.txt
and so on.
I want to run the ls command from outside the /home/user/new/ directory, so I have to give the full directory path, yet I want the output to be only:
[me#comp]$ ls /home/user/new/*.txt
file1.txt file2.txt file3.txt
I don't want the entire path; only the filename is needed. This issue has to be solved using the ls command, as its output is meant for another program.
ls whateveryouwant | xargs -n 1 basename
Does that work for you?
Otherwise you can (cd /the/directory && ls) (yes, parentheses intended)
No need for xargs and all that; ls is more than enough.
ls -1 *.txt
displays one filename per line (when run from inside the directory).
There are several ways you can achieve this. One would be something like:
for filepath in /path/to/dir/*
do
    filename=$(basename "$filepath")
    # ... whatever you want to do with the file here
done
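The basename in the loop above can also be replaced with parameter expansion, which saves spawning a process per file; a sketch:
for filepath in /path/to/dir/*
do
    filename=${filepath##*/}    # strip everything up to the last slash
    echo "$filename"
done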
Use the basename command:
basename -a /home/user/new/*.txt
(the -a option, available in GNU coreutils, is needed for multiple arguments; plain basename takes a single path, treating a second argument as a suffix to strip).
(cd dir && ls)
will only output filenames in dir. Use ls -1 if you want one per line.
(Changed ; to && as per Sactiw's comment).
You could add a sed script to your command line:
ls /home/user/new/*.txt | sed -r 's/^.+\///'
A fancy way to solve it is by using "rev" twice, together with "cut":
find ./ -name "*.txt" | rev | cut -d '/' -f1 | rev
The selected answer did not work for me, as I had spaces, quotes and other strange characters in my filenames. To quote the input for basename, you should use:
ls /path/to/my/directory | xargs -n1 -I{} basename "{}"
This works for spaces, quotes and similar characters, though filenames containing newlines will still break the ls | xargs pipeline.
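If newlines in filenames are a genuine concern, GNU find can print the base names itself, with no ls, xargs, or basename in the pipeline; a sketch:
find /home/user/new -maxdepth 1 -name '*.txt' -printf '%f\n'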
I prefer the basename approach, which fge has already answered with.
Another way is:
ls /home/user/new/*.txt|awk -F"/" '{print $NF}'
One more, ugly, way is:
ls /home/user/new/*.txt | perl -pe 's/.*\///'
Just hoping to be helpful to someone, as old problems seem to come back every now and again and I always find good tips here.
My problem was to list, in a text file, all the names of the "*.txt" files in a certain directory, without path and without extension, from a Datastage 7.5 sequence.
The solution we used is:
ls /home/user/new/*.txt | xargs -n 1 basename | cut -d '.' -f1 > name_list.txt
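basename can also strip a known suffix by itself, which saves the cut; a sketch using its classic two-argument form:
ls /home/user/new/*.txt | xargs -n 1 -I{} basename {} .txt > name_list.txt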
There are lots of ways we can do this; simply try the following:
ls /home/user/new | grep '\.txt$'
Another method:
cd /home/user/new && ls *.txt
Here is another way:
ls -1 /home/user/new/*.txt|rev|cut -d'/' -f1|rev
You could also pipe to grep and pull everything after the last forward slash. It looks goofy, but I think a defensive grep should be fine unless (like some kind of maniac) you have forward slashes within your filenames.
ls folderpathwithcriteria | grep -P -o -e "[^/]*$"
When you want to list names in a path that have different file extensions:
me#server:/var/backups$ ls -1 *.zip && ls -1 *.gz
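When several extensions are involved, the subshell-with-cd trick suggested earlier keeps the output down to bare names too; a sketch, assuming both patterns match at least one file:
(cd /var/backups && ls -1 *.zip *.gz)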

Get the newest file based on timestamp

I am new to shell scripting, so I need some help with how to go about this problem.
I have a directory which contains files in the following format. The files are in a directory called /incoming/external/data:
AA_20100806.dat
AA_20100807.dat
AA_20100808.dat
AA_20100809.dat
AA_20100810.dat
AA_20100811.dat
AA_20100812.dat
As you can see, the filename includes a timestamp, i.e. [RANGE]_[YYYYMMDD].dat.
What I need to do is find out which of these files has the newest date, using the timestamp in the filename rather than the system timestamp, store that filename in a variable, move it to another directory, and move the rest to a different directory.
For those who just want an answer, here it is:
ls | sort -n -t _ -k 2 | tail -1
Here's the thought process that led me here.
I'm going to assume the [RANGE] portion could be anything.
Start with what we know.
Working Directory: /incoming/external/data
Format of the Files: [RANGE]_[YYYYMMDD].dat
We need to find the most recent [YYYYMMDD] file in the directory, and we need to store that filename.
Available tools (I'm only listing the relevant tools for this problem ... identifying them becomes easier with practice):
ls
sed
awk (or nawk)
sort
tail
I guess we don't need sed, since we can work with the entire output of the ls command. Using ls, awk, sort, and tail we can get the correct file like so (bear in mind that you'll have to check the syntax against what your OS will accept):
NEWESTFILE=`ls | awk -F_ '{print $1 $2}' | sort -n -k 2,2 | tail -1`
Then it's just a matter of putting the underscore back in, which shouldn't be too hard.
EDIT: I had a little time, so I got around to fixing the command, at least for use in Solaris.
Here's the convoluted first pass (this assumes that ALL files in the directory are in the same format: [RANGE]_[yyyymmdd].dat). I'm betting there are better ways to do this, but this works with my own test data (in fact, I found a better way just now; see below):
ls | awk -F_ '{print $1 " " $2}' | sort -n -k 2 | tail -1 | sed 's/ /_/'
... while writing this out, I discovered that you can just do this:
ls | sort -n -t _ -k 2 | tail -1
I'll break it down into parts.
ls
Simple enough ... gets the directory listing, just filenames. Now I can pipe that into the next command.
awk -F_ '{print $1 " " $2}'
This is the AWK command. It allows you to take an input line and modify it in a specific way. Here, all I'm doing is specifying that awk should break the input wherever there is an underscore (_). I do this with the -F option. This gives me two halves of each filename. I then tell awk to output the first half ($1), followed by a space (" "), followed by the second half ($2). Note that the space was the part that was missing from my initial suggestion. Also, this is unnecessary, since you can specify a separator in the sort command below.
Now the output is split into [RANGE] [yyyymmdd].dat on each line. Now we can sort this:
sort -n -k 2
This takes the input and sorts it based on the 2nd field. The sort command uses whitespace as a separator by default. While writing this update, I found the documentation for sort, which allows you to specify the separator, so AWK and SED are unnecessary. Take the ls and pipe it through the following sort:
sort -n -t _ -k 2
This achieves the same result. Now you only want the last file, so:
tail -1
If you used awk to separate the file (which just adds extra complexity, so don't do it *sheepish*), you can replace the space with an underscore again with sed:
sed 's/ /_/'
Some good info here, but I'm sure most people aren't going to read down to the bottom like this.
This should work:
newest=$(ls | sort -t _ -k 2,2 | tail -n 1)
others=($(ls | sort -t _ -k 2,2 | head -n -1))
mv "$newest" newdir
mv "${others[#]}" otherdir
It won't work if there are spaces in the filenames although you could modify the IFS variable to affect that.
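A variant that avoids parsing ls altogether, so spaces in filenames are safe; a sketch assuming a fixed AA_ prefix (so that alphabetical glob order matches date order) and bash 4.3 or later for the negative array index:
files=(AA_*.dat)            # the glob expands in sorted order
newest=${files[-1]}         # last element carries the newest date
unset 'files[-1]'
mv -- "$newest" newdir
mv -- "${files[@]}" otherdir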
Try:
$ ls -lr
Hope it helps.
Use:
ls -r -1 AA_*.dat | head -n 1
(assuming there are no other files matching AA_*.dat)
ls -1 AA* | sort -r | head -1
Due to the naming convention of the files, alphabetical order is the same as date order. In bash, '*' does expand alphabetically (the manual documents this under Pathname Expansion), and ls certainly sorts that way, so the file with the newest date will be the last one alphabetically.
Therefore, in bash
mv "$(ls | tail -1)" first-directory
mv * second-directory
Should do the trick.
If you want to be more specific about the choice of file, then replace * with something else - for example AA_*.dat
My solution to this is similar to others, but a little simpler.
ls -tr | tail -1
What it actually does is rely on ls -t to sort the output by modification time, with -r reversing the order so the newest file comes last, then use tail to get that last listed file name. Note that this sorts by the system timestamp, not the date embedded in the filename, so it only meets the requirement if the files were created in date order.
This solution will not work if the filename you require has a leading dot (e.g. .profile).
This solution does work if the file name contains a space.
