concatenate files with similar names using shell - shell

I have very limited knowledge of shell scripting. For example, if I have the following files in a folder:
abcd_1_1.txt
abcd_1_2.txt
def_2_1.txt
def_2_2.txt
I want the output as abcd_1.txt and def_2.txt. For each pattern in the file names, concatenate the matching files and generate 'pattern'.txt as the output.
patterns list <-?
for i in patterns; do echo cat "$i"* > "$i".txt; done
I am not sure how to code this in a shell script; any help is appreciated.

Maybe something like this (assumes bash 4 or later for associative arrays, and I didn't test it).
declare -A prefix               # keys will be the unique prefixes; the values are unused
files=(*.txt)
for f in "${files[@]}"; do
    prefix[${f%_*}]=            # ${f%_*} strips the last _N.txt, e.g. abcd_1_1.txt -> abcd_1
done
for key in "${!prefix[@]}"; do
    echo "$key".txt             # the output file each prefix would produce
done
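Combining the two steps, a full sketch might look like this (untested; assumes bash 4+ and that every input file is named like prefix_N_M.txt):
#!/bin/bash
declare -A prefix
for f in *_*_*.txt; do          # two underscores, so generated files like abcd_1.txt are skipped on re-runs
    prefix[${f%_*}]=
done
for key in "${!prefix[@]}"; do
    cat "$key"_*.txt > "$key".txt
done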

for i in abcd_1 def_2
do
cat "$i"*.txt > "$i".txt
done
The above will work in any POSIX shell, such as dash or bash.
If, for some reason, you want to maintain a list of patterns and then loop through them, then it is appropriate to use an array:
#!/bin/bash
patterns=(abcd_1 def_2)
for i in "${patterns[#]}"
do
cat "$i"*.txt > "$i".txt
done
Arrays require an advanced shell such as bash.
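If you prefer to stay strictly POSIX, a plain space-separated string of patterns works too (a sketch; assumes none of the patterns contain spaces or glob characters):
#!/bin/sh
patterns="abcd_1 def_2"
for i in $patterns      # deliberately unquoted so the string splits into words
do
cat "$i"*.txt > "$i".txt
done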
Related Issue: File Order
Does the order in which files are added to abcd_1.txt or def_2.txt matter to you? The * wildcard results in lexical ordering. This can conflict with numeric ordering. For example:
$ echo def_2_*.txt
def_2_10.txt def_2_11.txt def_2_12.txt def_2_1.txt def_2_2.txt def_2_3.txt def_2_4.txt def_2_5.txt def_2_6.txt def_2_7.txt def_2_8.txt def_2_9.txt
Observe that def_2_12.txt appears in the list ahead of def_2_1.txt. Is this a problem? If so, we can explicitly force numeric ordering. One method to do this is bash's brace expansion:
$ echo def_2_{1..12}.txt
def_2_1.txt def_2_2.txt def_2_3.txt def_2_4.txt def_2_5.txt def_2_6.txt def_2_7.txt def_2_8.txt def_2_9.txt def_2_10.txt def_2_11.txt def_2_12.txt
In the above, the files are numerically ordered.
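Applied to the concatenation task, that could look like this (assuming the files are numbered 1 through 12 with no gaps; brace expansion does not check that the files actually exist):
cat def_2_{1..12}.txt > def_2.txt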


BASH Shell Find Multiple Files with Wildcard and Perform Loop with Action

I have a script that I call from an application; I can't run it from the command line. I derive the directory where the script is called, and in the next variable go up one level to where my files are stored. From there I have 3 variables with the full path and file names (with wildcards), which I will refer to as "masks".
I need to find and "do something with" (copy them, write their names to a new file, whatever else) each of these masks. The "do something" part isn't my obstacle, as I've done this fine when working with a single mask, but I would like to do it cleanly in a single loop instead of duplicating the loop and just referencing each mask separately, if possible.
Assume in my $FILESFOLDER directory below that I have 2 existing files, aaa0.csv & bbb0.csv, but no file matching the ccc*.csv mask.
#!/bin/bash
SCRIPTFOLDER=${0%/*}
FILESFOLDER="$(dirname "$SCRIPTFOLDER")"
ARCHIVEFOLDER="$FILESFOLDER"/archive
LOGFILE="$SCRIPTFOLDER"/log.txt
FILES1="$FILESFOLDER"/"aaa*.csv"
FILES2="$FILESFOLDER"/"bbb*.csv"
FILES3="$FILESFOLDER"/"ccc*.csv"
ALLFILES="$FILES1
$FILES2
$FILES3"
#here as an example I would like to do a loop through $ALLFILES and copy anything that matches to $ARCHIVEFOLDER.
for f in $ALLFILES; do
cp -v "$f" "$ARCHIVEFOLDER" > "$LOGFILE"
done
echo "$ALLFILES" >> "$LOGFILE"
The thing that really spins my head is that when I run something like this (I haven't done it with the copy command in place), the log file at the end shows:
filesfolder/aaa0.csv filesfolder/bbb0.csv filesfolder/ccc*.csv
Where I would expect echoing $ALLFILES just to show me the masks
filesfolder/aaa*.csv filesfolder/bbb*.csv filesfolder/ccc*.csv
In my "do something" area, I need to be able to use whatever method to find the files by their full path/name with the wildcard if at all possible. Sometimes my network is down for maintenance and I don't want to risk failing a change directory. I rarely work in linux (primarily SQL background) so feel free to poke holes in everything I've done wrong. Thanks in advance!
Here's a light refactoring with significantly fewer distracting variables.
#!/bin/bash
script=${0%/*}
folder="$(dirname "$script")"
archive="$folder"/archive
log="$folder"/log.txt # you would certainly want this in the folder, not $script/log.txt
shopt -s nullglob
all=()
for prefix in aaa bbb ccc; do
cp -v "$folder/$prefix"*.csv "$archive" >>"$log" # append, don't overwrite
all+=("$folder/$prefix"*.csv)
done
echo "${all[#]}" >> "$log"
The change in the loop to append the output of cp -v instead of overwriting it is a bug fix; otherwise the log would only contain the output from the last loop iteration.
I would probably prefer to have the files echoed from inside the loop as well, one per line, instead of collecting them all on one humongous line. Then you can remove the array all and instead simply
printf '%s\n' "$folder/$prefix"*.csv >>"$log"
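Putting that together, the loop might then look like this (same assumptions as the refactoring above):
for prefix in aaa bbb ccc; do
cp -v "$folder/$prefix"*.csv "$archive" >>"$log"
printf '%s\n' "$folder/$prefix"*.csv >>"$log"   # log the matching file names, one per line
done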
shopt -s nullglob is a Bash extension (so won't work with sh) which says to discard any wildcard which doesn't match any files (the default behavior is to leave globs unexpanded if they don't match anything). If you want a different solution, perhaps see Test whether a glob has any matches in Bash
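A quick way to see the difference (hypothetical interactive session, assuming no file matches ccc*.csv):
$ echo ccc*.csv          # default behaviour: the unmatched glob is left as-is
ccc*.csv
$ shopt -s nullglob
$ echo ccc*.csv          # with nullglob the pattern expands to nothing, so echo prints an empty line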
You should use lower case for your private variables so I changed that, too. Notice also how the script variable doesn't actually contain a folder name (or "directory" as we adults prefer to call it); fixing that uncovered a bug in your attempt.
If your wildcards are more complex, you might want to create an array for each pattern.
tmpspaces=(/tmp/*\ *)
homequest=($HOME/*\?*)
for file in "${tmpspaces[#]}" "${homequest[#]}"; do
: stuff with "$file", with proper quoting
done
The only robust way to handle file names which could contain shell metacharacters is to use an array variable; using string variables for file names is notoriously brittle.
Perhaps see also https://mywiki.wooledge.org/BashFAQ/020

How to place files containing increasing numeric names consecutively in the terminal

I have certain files named something like file_1.txt, file_2.txt, ..., file_40.txt and I want to plot them in the terminal using xmgrace like this:
xmgrace file_01.txt file_02.txt [...] file_40.txt
What would be the bash code, maybe a for loop, so that I don't have to write them out one by one from 1 to 40, please?
[Edit:]
I should mention that I tried to use the for loop as follows: for i in {00-40}; do xmgrace file_$i.txt; done, but it didn't help as it opens each file separately.
Depending on the tool you use:
xmgrace file_*.txt
using a glob (this handles all files matching the pattern in a single call)
or as Jetchisel wrote in comments:
xmgrace file_{1..40}.txt
This is brace expansion
For general purposes, if the tool requires a loop:
for i in {1..40}; do something "$i"; done
or
for ((i=1; i<=40; i++)); do something "$i"; done
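If you do need a loop (say, to check each file first) but still want a single xmgrace invocation, one option is to collect the names in a bash array and expand it once at the end (a sketch; assumes all 40 files exist):
files=()
for i in {1..40}; do
files+=("file_$i.txt")
done
xmgrace "${files[@]}"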

Bash scripting print list of files

It's my first time using Bash scripting and I've been looking at some tutorials but can't figure out some of the code. I just want to list all the files in a folder, but I can't do it.
Here's my code so far.
#!/bin/bash
# My first script
echo "Printing files..."
FILES="/Bash/sample/*"
for f in $FILES
do
echo "this is $f"
done
and here is my output:
Printing files...
this is /Bash/sample/*
What is wrong with my code?
You misunderstood what bash means by the word "in". The statement for f in $FILES simply iterates over the (space-delimited) words in the string $FILES, whose value is "/Bash/sample/*" (one word); since that pattern matches no existing files, it is left unexpanded. You seemingly want the files that are "in" the named directory, a spatial metaphor that bash's syntax doesn't assume, so you would have to explicitly tell it to list the files.
for f in `ls $FILES` # illustrates the problem - but don't actually do this (see below)
...
might do it. This converts the output of the ls command into a string, "in" which there will be one word per file.
NB: this example is to help understand what "in" means but is not a good general solution. It will run into trouble as soon as one of the files has a space in its name; such files will contribute two or more words to the list, each of which taken alone may not be a valid filename. This highlights (a) that you should always take extra steps to program around the whitespace problem in bash and similar shells, and (b) that you should avoid spaces in your own file and directory names, because you'll come across plenty of otherwise useful third-party scripts and utilities that have not made the effort to comply with (a). Unfortunately, proper compliance can often lead to quite obfuscated syntax in bash.
I think the problem is the path "/Bash/sample/*".
You need to change this to an absolute path, for example:
/home/username/Bash/sample/*
Or use a relative path, for example:
~/Bash/sample/*
On most systems this is fully equivalent to:
/home/username/Bash/sample/*
where username is your current username; use whoami to see your current username.
Best place for learning Bash: http://www.tldp.org/LDP/abs/html/index.html
This should work:
echo "Printing files..."
FILES=(/Bash/sample/*) # create an array.
# Works with filenames containing spaces.
# String variable does not work for that case.
for f in "${FILES[#]}" # iterate over the array.
do
echo "this is $f"
done
& you should not parse ls output.
Take a list of your files:
If you want to take a list of your files and see them:
ls          ### lists the files ###
ls -sh      ### lists the files with their sizes ###
...
If you want to send the list of files to a file so you can read and check them later:
ls > FileName.Format          ### sends the list to a file ###
ls -sh > FileName.Format      ### sends the list with file sizes to a file ###

For loop in shell script - colons and hash marks?

I am trying to make heads or tails of a shell script. Could someone please explain this line?
$FILEDIR is a directory containing files. F is a marker in an array of files that is returned from this command:
files=$( find $FILEDIR -type f | grep -v .rpmsave\$ | grep -v .swp\$ )
The confusing line is within a for loop.
for f in $files; do
target=${f:${#FILEDIR}}
<<do some more stuff>>
done
I've never seen the colon and the hash used like this in a shell script for loop before. I haven't been able to find any documentation on them... could someone try to enlighten me? I'd appreciate it.
There are no arrays involved here. POSIX sh doesn't have arrays (assuming you're not using another shell based upon the tags).
The colon indicates a Bash/Ksh substring expansion. These are also not POSIX. The # prefix expands to the number of characters in the parameter. I imagine they intended to chop off the directory part and assign it to target.
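A small illustration of that expansion with hypothetical values (bash or ksh):
FILEDIR=/var/data               # ${#FILEDIR} is 9
f=/var/data/sub/file.txt
target=${f:${#FILEDIR}}         # skip the first 9 characters of $f
echo "$target"                  # prints /sub/file.txt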
To explain the rest of that: first find is run and hilariously piped into two greps which do what could have been done with find alone (except breaking on possible filenames containing newlines), and the output saved into files. This is also something that can't really be done correctly if restricted only to POSIX tools, but there are better ways.
Next, files is expanded unquoted and mutilated by the shell in more ridiculous ways for the for loop to iterate over the meaningless results. If the rest of the script is this bad, probably throw it out and start over. There's no way that will do what's expected.
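For reference, one of those better ways might look like this (a sketch, not taken from the original script; it copes with spaces in names, though still not newlines):
find "$FILEDIR" -type f ! -name '*.rpmsave' ! -name '*.swp' |
while IFS= read -r f; do
target=${f:${#FILEDIR}}
# <<do some more stuff>> with "$target"
done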
The colon can be used for a substring expansion. So:
A=abcdefg
echo ${A:4}
will print the output:
efg
I'm not sure why they would use the length of a directory name as the offset parameter though...
If you are having problems understanding the for loop section, try http://www.dreamsyssoft.com/unix-shell-scripting/loop-tutorial.php

Tricky brace expansion in shell

When using a POSIX shell, the following
touch {quick,man,strong}ly
expands to
touch quickly manly strongly
Which will touch the files quickly, manly, and strongly, but is it possible to dynamically create the expansion? For example, the following illustrates what I want to do, but does not work because of the order of expansion:
TEST=quick,man,strong #possibly output from a program
echo {$TEST}ly
Is there any way to achieve this? I do not mind constricting myself to Bash if need be. I would also like to avoid loops. The expansion should be given as complete arguments to any arbitrary program (i.e. the program cannot be called once for each file, it can only be called once for all files). I know about xargs but I'm hoping it can all be done from the shell somehow.
... There is so much wrong with using eval. What you're asking is only possible with eval, BUT what you might want is easily possible without having to resort to bash bug-central.
Use arrays! Whenever you need to keep multiple items in one datatype, you need (or, should use) an array.
TEST=(quick man strong)
touch "${TEST[#]/%/ly}"
That does exactly what you want without the thousand bugs and security issues introduced and concealed in the other suggestions here.
The way it works is:
"${foo[#]}": Expands the array named foo by expanding each of its elements, properly quoted. Don't forget the quotes!
${foo/a/b}: This is a type of parameter expansion that replaces the first a in foo's expansion by a b. In this type of expansion you can use % to signify the end of the expanded value, sort of like $ in regular expressions.
Put all that together and "${foo[@]/%/ly}" will expand each element of foo, properly quote it as a separate argument, and replace each element's end by ly.
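And if the list really does arrive as a comma-separated string from another program, you can still get it into an array without eval (a bash sketch; assumes the items themselves contain no commas):
TEST=quick,man,strong
IFS=, read -r -a parts <<< "$TEST"    # split the string on commas into an array
touch "${parts[@]/%/ly}"              # creates quickly, manly and strongly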
In bash, you can do this:
#!/bin/bash
TEST=quick,man,strong
eval echo $(echo {$TEST}ly)
#eval touch $(echo {$TEST}ly)
That last line is commented out but will touch the specified files.
Zsh can easily do that:
TEST=quick,man,strong
print ${(s:,:)^TEST}ly
The variable's content is split at commas, then each element is distributed into the string around the braces:
quickly manly strongly
Taking inspiration from the answers above:
$ TEST=quick,man,strong
$ touch $(eval echo {$TEST}ly)
