Merge all files in a directory into one using bash

I have a directory with several *.js files. The number of files and their names are unknown. Something like this:
js/
|- 1.js
|- 2.js
|- blabla.js
I need to merge all the files in this directory into one file named merged_dmYHis.js. For example, if the files' contents are:
1.js
aaa
bbb
2.js
ccc
ddd
eee
blabla.js
fff
The merged_280120111257.js would contain:
aaa
bbb
ccc
ddd
eee
fff
Is there a way to do it using bash, or does such a task require a higher-level programming language, like Python?

cat 1.js 2.js blabla.js > merged_280120111257.js
A general solution would be:
cat *.js > merged_`date +%d%m%Y%H%M`.js
Just out of interest - do you think it is a good idea to name the files with DDMMYYYYHHMM? It may be difficult to sort the files chronologically (within the shell). How about the YYYYMMDDHHMM pattern?
cat *.js > merged_`date +%Y%m%d%H%M`.js
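With the YYYYMMDDHHMM pattern, lexicographic order is chronological, so a plain listing already comes out oldest-first (the file names below are invented for illustration):
$ ls merged_*.js
merged_201101281257.js
merged_201102011430.js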

You can sort the input files as well. The default glob expansion is alphabetical, but this example concatenates them from oldest to newest by file modification timestamp:
cat `ls -tr *.js` > merged_`date +%Y%m%d%H%M`.js
In this example cat takes its list of files from the ls command, where -t sorts by modification time (newest first) and -r reverses that order, so the oldest file comes first.
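Note that parsing ls output breaks on file names containing whitespace. A more robust sketch in alphabetical order, assuming GNU find, sort, and xargs (the extra ! -name test keeps the output file out of its own input, since find runs after the redirection has created it):
find . -maxdepth 1 -name '*.js' ! -name 'merged_*' -print0 | sort -z | xargs -0 cat > "merged_$(date +%Y%m%d%H%M).js"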

Related

Is it possible to work with 'for loop grep' commands?

I have lots of files, one in each year directory, and each file contains long lines of text, for example:
home/2001/2001ab.txt
the AAAS kill every one not but me and you and etc
the A1CF maybe color of full fill zombie
home/2002/2002ab.txt
we maybe know some how what
home/2003/2003ab.txt
Mr, Miss boston, whatever
aaas will will will long long
In the home directory I also have home/reference.txt (a file listing words):
A1BG
A1CF
A2M
AAAS
I'd like to count how many of the words in reference.txt occur in each year's file. This is the code I currently run in every year directory (home/2001/, home/2002/, home/2003/):
# search: write every line matching the given pattern into a file named after it
search () {
    awk -v pattern="$1" '$0 ~ pattern {print}' *.txt > "$1"
}
# loop over each word in reference.txt
for i in $(cat reference.txt)
do
    search "$i"
done
# count the matching lines per word
wc -l * > line-count.txt
This is my result:
home/2001/A1BG
$ cat A1BG
0
home/2001/A1CF
$ cat A1CF
1
home/2001/A2M
$ cat A2M
0
home/2001/AAAS
$ cat AAAS
1
home/2001/line-count.txt
$ cat line-count.txt
2 2001ab.txt
0 A1BG
1 A1CF
0 A2M
1 AAAS
The resulting line-count.txt file has all the information I want,
but I have to repeat this work manually:
cd into a directory,
run my code,
then cd into the next directory.
I have around 500 directories and files, so it is not easy.
The second problem is the wasteful bunch of files: the script creates lots of files and takes too much time.
Because of this, at first I wanted to use the grep command, but I didn't know how to use a list of patterns from a file instead of a single word; that is why I used awk.
How can I do it more simply?
at first I wanted to use the grep command, but I didn't know how to use a list of patterns from a file instead of a single word
You might use the --file=FILE option for that purpose; the given file should hold one pattern per line.
How can I do it more simply?
You might use the --count option to avoid the need for wc -l. Consider the following simple example: let the content of file.txt be
123
456
789
and the content of file1.txt be
abc123
def456
and the content of file2.txt be
ghi789
xyz000
and the content of file3.txt be
xyz000
xyz000
then
grep --count --file=file.txt file1.txt file2.txt file3.txt
gives output
file1.txt:2
file2.txt:1
file3.txt:0
Observe that no files are created, and a file without matches still appears in the output. Disclaimer: this solution assumes file.txt does not contain characters with special meaning for GNU grep; if that does not hold, do not use this solution.
(tested in GNU grep 3.4)
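To cover all 500 year directories in one go, without cd-ing into each one, a minimal sketch, assuming the home/<year>/ layout from the question:
# grep -c prints one "path:count" line per input file
for dir in home/*/; do
    grep --count --file=home/reference.txt "$dir"*.txt
done > line-count.txt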

diff: how to use '--ignore-matching-lines' option

I have two files:
$ cat xx
aaa
bbb
ccc
ddd
eee
$ cat zz
aaa
bbb
ccc
#ddd
eee
I want to diff them, while ignoring comments.
I tried all possible permutations, but nothing works:
diff --ignore-matching-lines='#' -u xx zz
diff --ignore-matching-lines='#.*' -u xx zz
diff --ignore-matching-lines='^#.*' -u xx zz
How can I diff two files while ignoring a given regex, such as anything starting with #?
That's not how the -I option of diff works; see this comment by Gilles on Unix.SE and also the man page: 1.4 Suppressing Differences Whose Lines All Match a Regular Expression.
In short, the -I option only suppresses a difference if all the differing lines (insertions/deletions or changes) match the given RE. In your case, the diff between your two files, as seen in the output
diff f1 f2
4c4
< ddd
---
> #ddd
i.e. a change on the 4th line of both files. The lines ddd and #ddd together form the "hunk" as defined in the man page, and not every line of it matches your REs #, #.* or ^#.* (ddd matches none of them). So when such a non-ignorable change exists, diff prints both the matching and the non-matching lines. Quoting the manual,
for each nonignorable change, diff prints the complete set of changes in its vicinity, including the ignorable ones.
The same would have worked if the file f1 did not contain the line ddd at all, i.e.
f1
aaa
bbb
ccc
eee
f2
aaa
bbb
ccc
#ddd
eee
where doing
diff f1 f2
3a4
> #ddd
would result in just one "hunk", #ddd, which can be marked for ignoring with a pattern like ^#, i.e. ignore any lines starting with a #. As you can see, this produces the desired output (no lines):
diff -u -I '^#' f1 f2
So given that your input contains the uncommented line ddd in f1, it is not straightforward to define an RE that matches both a commented and an uncommented line. diff does support multiple -I flags, as in
diff -I '^#' -I 'ddd' f1 f2
but that is not a general solution, as you cannot know the changed lines beforehand to include them in the ignore pattern.
As a workaround, you can simply strip the lines starting with # from both files before passing them to diff, i.e.
diff <(grep -v '^#' f1) <(grep -v '^#' f2)
4d3
< ddd
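The same workaround works in unified mode too; the GNU diff --label option (not part of the original answer) restores readable file names in the header, since process substitutions otherwise show up as /dev/fd paths:
diff -u --label f1 --label f2 <(grep -v '^#' f1) <(grep -v '^#' f2)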

Extract unmatched files in directory using a text file

I have 100 files in a directory, and a text file that lists out 35 of these files.
Directory:
apple carrot orange pears bananas
Text file:
apple
carrot
orange
I would like to compare this text file of filenames against the directory and write the unmatched filenames into a separate file, which would list them out like below:
Unmatched text file:
pears
bananas
I know how to do this using find if the search term were a particular string, but I could not figure out this case.
Assume that the text file contains a subset of the files in the directory. Also assume that the file is called list.txt and the directory is called dir1; then the following will work:
(cat list.txt; ls -1 dir1) | sort | uniq -u
Explanation:
The command (cat list.txt; ls -1 dir1) starts a subshell and executes the cat and ls commands.
The combined output is then sorted, and uniq -u picks out those lines that are unique (not duplicated).
I believe this is what you want. If that works, you can redirect into another file:
(cat list.txt; ls -1 dir1) | sort | uniq -u > list2.txt
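An alternative sketch uses comm, which compares two sorted inputs directly (same list.txt and dir1 names as above); -13 suppresses the lines unique to the first input and the lines common to both, leaving exactly the directory entries missing from the list:
comm -13 <(sort list.txt) <(ls dir1 | sort) > list2.txt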

Compare two folders, find the files which are different, and move only those files to another folder using a Unix command

I have 2 folders AA and BB. AA contains 2 files:
1.txt
2.txt
BB contains 3 files:
1.txt
2.txt
3.txt
I need to move any files which are only in one of these directories to another directory CC using a Unix shell script.
In a real scenario I need to be able to handle lots of files.
This should move the files which are only in one of the directories (untested; it will only work if you have simple paths; that is, no whitespace or special characters):
for path in $(diff -qr AA BB | grep 'Only in' | sed -e 's/^Only in //;s/: /\//')
do
    echo "$path" CC/
done
If this prints the right paths, replace echo with mv.
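A whitespace-safe sketch (assuming GNU diff's "Only in DIR: FILE" message format) parses each line whole instead of word-splitting; as above, keep the echo until the output looks right:
diff -qr AA BB | grep '^Only in ' | while IFS= read -r line; do
    line=${line#Only in }    # strip the message prefix
    dir=${line%%: *}         # directory part, before the ": "
    file=${line#*: }         # file name part, after the ": "
    echo mv -- "$dir/$file" CC/
done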

Transform Data with repeating attribute in each row to ARFF

I have a dataset as a text file, and the data format is as follows:
ID: 1
Name: a
ID: 2
Name: b
ID: 3
Name: c
I want to convert this data to ARFF format as follows:
ID Name
1 a
2 b
3 c
Which tools should I use? It is a large dataset, 1 GB with many rows. I got this dataset from snap.stanford.edu to practice large data handling.
How about using the programming language of your choice?
The input format is text, and the output format (ARFF) is also effectively text.
Why don't you write a program to convert between the formats?
You can get the desired result with simple command line tools. If you have the data in one file called x.txt, use:
grep ID: x.txt | sed 's/^[^ ]\+ //' > a.txt
grep Name: x.txt | sed 's/^[^ ]\+ //' > b.txt
to get the data in two different files named a.txt and b.txt.
The files will have:
$ cat a.txt
1
2
3
$ cat b.txt
a
b
c
Then join the files with the paste command:
$ paste a.txt b.txt
1 a
2 b
3 c
This solution is very efficient, even if the files are quite large, as you said.
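The same transformation can also be done in a single pass with awk, with no temporary files; a minimal sketch, assuming every ID: line is immediately followed by its Name: line as in the sample:
awk '/^ID:/ {id=$2} /^Name:/ {print id "\t" $2}' x.txt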
