Makefile: loop over folder and merge files into a new file - bash

I have a folder /jobs and I am trying to merge every file in that folder into a new file called workflows.yaml, with a new line between each merged file.
I am able to loop over the directory using
for FILE in jobs/*; do awk '{print}' $$FILE > workflows.yaml; done
And I am also able to merge
awk '{print}' jobs/a.yaml jobs/b.yaml > workflows.yaml
What I tried but did not work:
for FILE in jobs/*; do echo $$FILE; done

You don't need awk, ed, etc.; at least I don't see why, based on the question. Isn't this good enough?
all:
	for f in jobs/*; do cat $$f; echo; done > workflows.yaml
If not, perhaps you could be clearer in your question about exactly what you want to do. When you say "a new line", do you mean a blank line?
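For context, here is a minimal sketch of how that recipe could sit in a complete Makefile (the target name and prerequisite list are illustrative assumptions): each recipe line must start with a tab, the whole loop has to stay on one logical line (or use backslash continuations) because make runs each recipe line in its own shell, and $$ is how make passes a literal $ through to the shell.

workflows.yaml: $(wildcard jobs/*)
	for f in jobs/*; do cat "$$f"; echo; done > $@

With a rule like this, running make would rebuild workflows.yaml whenever a file under jobs/ is newer than it.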

Related

find specific text in a directory and delete the lines from the files

I want to find specific text in a directory, and then delete the lines from the files that include the specific text.
Now I have two questions:
How can I accomplish this?
What is wrong with what I have tried? I tried the methods below, but they failed. The details follow:
grep -rnw "./" -e "webdesign"
This searches the current directory for the pattern "webdesign", and I get this result:
.//pages/index.html:1:{% load webdesign %}
.//pages/pricing.html:1:{% load webdesign %}
.//prototypes.py:16: 'django.contrib.webdesign',
Then I use sed to remove the lines from those files, which doesn't work; I only get a blank file (I mean it deletes all of my file content):
sed -i "/webdesign/d" ./pages/index.html
or
sed "/webdesign/d" ./pages/index.html > ./pages/index.html
My software environment is: OS X Yosemite, Mac Terminal, Bash
A loop in bash will do the trick, provided that there are no filenames with spaces (in which case other solutions are possible, but this is the simplest):
for i in $(grep -lrw "yourdirectory/" -e "webdesign")
do
    sed "/webdesign/d" "$i" > "$i.tmp"
    # safety: avoid destroying the file if a problem arises (disk full?)
    if [ $? -eq 0 ] ; then
        mv -f "$i.tmp" "$i"
    fi
done
Note that you should not put this script inside the directory you are processing, because the script itself contains "webdesign" and would get modified as well :)
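For completeness, a hedged variant of the same loop that also tolerates spaces in file names (it keeps the answer's write-to-temp-then-move approach, so it works with the stock sed on OS X as well):

grep -lrw "yourdirectory/" -e "webdesign" | while IFS= read -r f
do
    sed "/webdesign/d" "$f" > "$f.tmp" && mv -f "$f.tmp" "$f"
done

File names containing newlines would still break this, but those are already excluded above.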
Thanks to choroba, I now know that the -i option doesn't work the way I wished on OS X. It has another meaning there (it takes a backup-suffix argument), or it would be rejected by the option parser; that's why the problem is difficult to see at first.
Without -i you cannot edit a file in place, and redirecting the output to the input just destroys the input file. That's why your solution did not work.
You can install GNU sed, which supports the -i option as you expect; then
sed -i '/webdesign/d' files
should work. Note that it's safer to use -i~ to create a backup.
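For example, applied to the files from the grep output above (GNU sed; the -i~ form leaves backups such as index.html~ next to each edited file):

sed -i~ '/webdesign/d' ./pages/index.html ./pages/pricing.html ./prototypes.py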
You cannot write to the same file you are reading from; that's why
sed '/webdesign/d' file > file
doesn't work (the shell truncates the output file before sed gets to read anything from it). Write to a temporary file instead and move it into place:
sed '/webdesign/d' file > file.tmp
mv file.tmp file

How do I write a bash script to copy files into a new folder based on name?

I have a folder filled with ~300 files, named in the form username#mail.com.pdf. I need about 40 of them, and I have a list of the usernames I need (saved in a file called names.txt, one username per line). I would like to copy the files I need into a new folder that contains only those files.
Each line of names.txt holds the username only (e.g. eternalmothra); the PDF file I want to copy over is then named eternalmothra#mail.com.pdf.
while read p; do
ls | grep $p > file_names.txt
done <names.txt
This seems like it should read the list and, for each username, find the matching username#mail.com.pdf file name and write it to file_names.txt. Unfortunately, only the last match ends up in file_names.txt.
The second part of this is to copy all the files over:
while read p; do
mv $p foldername
done <file_names.txt
(I haven't tried that second part yet because the first part isn't working).
I'm doing all this with Cygwin, by the way.
1) What is wrong with the first script, so that it doesn't collect all the file names?
2) If I get that to work, will the second script correctly move them over? (Actually, I think it's preferable if they just get copied, not moved.)
Edit:
I would like to add that I figured out how to read lines from a txt file from here: Looping through content of a file in bash
Solution from a comment: Your problem is just that > file_names.txt overwrites the file on every iteration, while >> file_names.txt appends to it, so replace
ls | grep $p > file_names.txt
with
ls | grep $p >> file_names.txt
There might be more efficient solutions if the task ran every day, but for a one-shot over 300 files your script is good.
Assuming you don't have file names with newlines in them (in which case your original approach would not have a chance of working anyway), try this.
printf '%s\n' * | grep -f names.txt | xargs cp -t foldername
The printf is necessary to work around the various issues with ls; passing the list of all the file names to grep in one go produces a list of all the matches, one per line; and passing that to xargs cp performs the copying. (To move instead of copy, use mv instead of cp, obviously; both support the -t option so as to make it convenient to run them under xargs.) The function of xargs is to convert standard input into arguments to the program you run as the argument to xargs.
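If you would rather skip the intermediate file_names.txt entirely, a minimal sketch of a direct loop could be the following (it assumes the target folder already exists and that every username in names.txt corresponds to exactly one username#mail.com.pdf file):

while IFS= read -r p
do
    cp -- "${p}#mail.com.pdf" foldername/
done < names.txt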

Shell one-liner to add a line to a sorted file

I want to add a line to a text file so that the result is sorted, where the text file was originally sorted. For example:
cp file tmp; echo "new line" >> tmp; sort tmp > file; rm -f tmp
I'd REALLY like to do it w/o the temp file and w/o the semicolons (using pipes instead?); using sed would be acceptable. Is this possible, and if so, how?
echo "New Line" | sort -o file - file
The -o file means write result to file (and it is explicitly safe to have any of the input files as the output file). The - on its own means 'read standard input' which contains the new line of information. The file at the end means 'also read file'. This would work with any Unix sort from (at least) 7th Edition UNIX™ circa 1978 onwards, and possibly even before that. There are no temporary files or dependencies on other utilities.
Given that a single line is 'sorted' and the file is also in sorted order, you can probably speed the process up by just merging the two sorted inputs:
echo "New Line" | sort -o file -m - file
That also would have worked with even really old sort commands.
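A quick illustration with made-up contents (a hypothetical file, purely to show that the input file can safely be named as the output):

printf 'apple\ncherry\n' > file
echo "banana" | sort -o file -m - file
cat file
# apple
# banana
# cherry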
This is the shortest one liner I can think of without any temporary files:
$ echo "something" >> file; sort file -o file
Yep, you'll either need to re-sort, or comm the two together if they're already sorted, assuming they contain no tabs; that saves you the sort (which can create temp files and overhead, depending on file size).
Alternative:
comm -3 file <(echo "new line") |tr -d '\t'
This might be the "shortest":
sort -m file <(echo "new line")
You can do this without any semicolons and without a temp file, but probably not without depending on some utilities that might not be everywhere (like awk with in-place file modification, or perl).
Why don't you want to use temp files or semicolons?
Edit: since semicolons are ok, how about:
val=$(cat file); { echo "$val"; echo "new line"; } | sort > file
Large files / performance:
Convert your file to an SQLite database with a single indexed column and query that.
Or re-implement a file-based B-tree (which is how SQLite implements indexes) or a hash map yourself...
I think it is impossible to insert into a sorted text file efficiently: even if you do a binary search to find the insertion point, you still have to copy everything that comes after it, and that disk operation will be the bottleneck: https://unix.stackexchange.com/questions/87772/add-lines-to-the-beginning-and-end-of-the-huge-file
For search, sgrep might work: https://askubuntu.com/questions/423886/efficiently-search-sorted-file/701237#701237

Process loop over multiple file sets

In order to simplify my work I usually do this:
for FILE in ./*.txt;
do ID=`echo ${FILE} | sed 's/^.*\///'`;
bin/Tool ${FILE} > ${ID}_output.txt;
done
Hence, the process loops over all *.txt files.
Now I have two file groups, and my Tool takes two inputs (-a & -b). Is there any command to run Tool for every FILE_A against every FILE_B and name the output file as a combination of both of them?
I imagine it should look like something like this:
for FILE_A in ./filesA/*.txt; do
    for FILE_B in ./filesB/*.txt; do
        bin/Tool -a ${FILE_A} -b ${FILE_B} > output.txt;
    done
done
So the process would run every *.txt in filesA against every *.txt in filesB.
And there is also the naming issue; I don't even know where that would go in the loop.
I hope it is clear what I am asking. I have never had to do such a task before, and a command line would be really helpful.
Looking forward!
NEWNAME="${FILE_A##*/}_${FILE_B##*/}_output.txt"
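Putting that together with the nested loop from the question gives something like the following sketch (bin/Tool, filesA/ and filesB/ are as described in the question):

for FILE_A in ./filesA/*.txt; do
    for FILE_B in ./filesB/*.txt; do
        NEWNAME="${FILE_A##*/}_${FILE_B##*/}_output.txt"
        bin/Tool -a "${FILE_A}" -b "${FILE_B}" > "${NEWNAME}"
    done
done

The ${FILE_A##*/} expansion strips everything up to the last slash, so the output name is built from the two base file names (still including their .txt extensions).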

Merging, then splitting files

Using a for loop, I can merge all of the files in a directory that end with *.txt:
for filename in *.txt; do
cat "${filename}"
echo
done > output.txt
After doing this, I will run output.txt through various scripts, in which the text will be changed considerably. After that, I want to split the file back apart, at the same places at which the inputs were merged, into separate files (output01.txt, output02.txt, etc.).
How can I split the files at the same place they were merged?
This cannot be based on line number, because the scripts will add \t in places.
I think a solution that might work is to place "#########" at the end of each of the initial *.txt files before merging them, but I don't know how to get BASH to split the files again at that mark.
Instead of that for loop for concatenating, you can just use cat *.txt.
Anyway, why don't you just perform the scripts on each file independently within the for loop?
If you really want to combine and re-segregate, you can use:
for filename in *.txt; do
cat "${filename}"
echo "#####"
done > output.txt
# Pass output.txt through whatever
awk 'BEGIN { fileno = 1; file = sprintf("output%02d.txt", fileno) }
     {
       if ($1 ~ /#####/) {
         fileno++
         file = sprintf("output%02d.txt", fileno)
         next
       }
       else
         print > file
     }' output.txt
The canonical answer would be:
tar cf - *.txt > output.txt
You could split/unmerge them exactly by doing
tar xf output.txt # in the current directory
tar x -C /tmp/splitfiles/ -f output.txt
Now if you really want to do stuff like that in a loop and extract to stdout/a pipe, you could:
while read -r fname
do
    # extract the named member to stdout and pipe it into your program
    tar -xOf output.txt "$fname" | myprogram "$fname"
done < <(tar tf output.txt)
However, that would possibly not be very efficient. You could consider just doing
while read -r fname
do
    # handle the extracted file
    myprogram "/tmp/splitfiles/$fname"
    unlink "/tmp/splitfiles/$fname"  # drop the temp file
done < <(tar x -v -C /tmp/splitfiles/ -f output.txt)
This will be completely asynchronous (so if extraction or even the transmission of the archive is slow, the first files can already be processed while waiting for more data to arrive).
See also my other answer https://stackoverflow.com/a/8341221/85371 (look for the older answer part, since that question was changed to be very specific later)
As Fredrik wrote here you can use csplit to split your merged file.
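For reference, a hedged csplit invocation matching the ##### markers could look like this (GNU csplit options: -z drops empty pieces, -f and -b control the output names, and '{*}' repeats the pattern as often as it matches):

csplit -z -f output -b '%02d.txt' output.txt '/#####/' '{*}'

Each piece after the first will still begin with its ##### marker line (and the trailing marker produces one extra piece), so you may want to strip the markers afterwards, e.g. with sed -i '/^#####/d' output[0-9]*.txt if GNU sed is available.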
