How to pass multiple files with variable filenames to the paste command

Usually, when we want to concatenate several files column-wise and the filenames are just consecutive increasing integers, we can do the following:
#Imagine I have 10 files
paste {1..10} > out
However, I'm currently working on a script in which the ranges are variables, so I want to be able to do something like this
first=1
last=10
paste {"${first}".."${last}"} > out
This doesn't work because bash performs brace expansion before variable expansion, so the braces never see the numeric values. Is there an alternative syntax I can use to achieve the same result?

If you don't want to use eval, you can use seq(1):
seq -s ' ' "$first" "$last"
Like so:
paste $(seq -s ' ' "$first" "$last") > out
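Another option that avoids both eval and the external seq is to build the argument list in a bash array with a C-style loop. This is a minimal sketch rather than part of the answer above; the helper name range_args and the array name files are made up for illustration, and both bounds are assumed to be integers.

```bash
# range_args FIRST LAST: fill the global array "files" with FIRST..LAST.
# Hypothetical helper; assumes both arguments are integers.
range_args() {
    local first=$1 last=$2 i
    files=()
    for ((i = first; i <= last; i++)); do
        files+=("$i")
    done
}

# Usage, assuming files named 1..10 exist in the current directory:
#   range_args "$first" "$last"
#   paste "${files[@]}" > out
```

Because the names are kept in an array, the same loop also works when a prefix or suffix such as "sum_$i.txt" is added to each element.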

Once upon a time I needed a seq-like function that was much faster, so I made this:
# Create sequence like {0..X}
cnt () { printf -v N %$1s; N=(${N// / 1}); printf "${!N[*]}"; }
$ cnt 5
0 1 2 3 4
And if we modify it a bit
# Create sequence like {X..Y}
cnt () { printf -v N %$2s; N=(${N// / 1}); N=(${!N[@]}); printf "${N[*]:$1} ${#N[@]}"; }
$ cnt 7 11
7 8 9 10 11

Related

Convert range to string

If I run the
echo {0..9}
command, then I get the following output:
0 1 2 3 4 5 6 7 8 9
Can I somehow put the string "0 1 2 3 4 5 6 7 8 9" into a variable inside bash script? I only found a way using echo:
x=`echo {0..9}`
But this method implies the execution of an external program. Is it possible to somehow manage only with bash?
I'm interested not only in converting a range to a string, but also in concatenating it with other text, for example:
datafiles=`echo data{0..9}.txt`

First of all,
x=`echo {0..9}`
doesn't call an external program (echo is a built-in) but creates a subshell. If it isn't desired you can use printf (a built-in as well) with -v option:
printf -v x ' %s' {0..9}
x=${x:1} # strip off the leading space
or
printf -v datafiles ' data%s.txt' {0..9}
datafiles=${datafiles:1}
or you may want to store them in an array:
datafiles=(data{0..9}.txt)
echo "${datafiles[@]}"
This last method will work correctly even if filenames contain whitespace characters:
datafiles=(data\ {0..9}\ .txt)
printf '%s\n' "${datafiles[@]}"
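When a separator other than a space is needed, the words can also be joined without a subshell by combining printf -v with "$*" and a function-local IFS. A small sketch under stated assumptions: join_into is a hypothetical helper name, the separator is a single character, and the target variable is not called __var.

```bash
# join_into VAR SEP ITEM...: store the ITEMs, joined by SEP, in variable VAR.
# "$*" joins the positional parameters with the first character of IFS,
# and IFS is changed only inside this function.
join_into() {
    local __var=$1 IFS="$2"
    shift 2
    printf -v "$__var" '%s' "$*"
}

join_into datafiles ' ' data{0..3}.txt
join_into csv , data{0..3}.txt
```

After these two calls, datafiles should hold the space-separated names and csv the comma-separated ones.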

Is there a command for substituting a set of characters by a set of strings?

I would like to substitute a set of single-byte characters with a set of literal strings in a stream, without any constraint on the line size.
#!/bin/bash
for (( i = 1; i <= 0x7FFFFFFFFFFFFFFF; i++ ))
do
printf '\a,\b,\t,\v'
done |
chars_to_strings $'\a\b\t\v' '<bell>' '<backspace>' '<horizontal-tab>' '<vertical-tab>'
The expected output would be:
<bell>,<backspace>,<horizontal-tab>,<vertical-tab><bell>,<backspace>,<horizontal-tab>,<vertical-tab><bell>...
I can think of a bash function that would do that, something like:
chars_to_strings() {
local delim buffer
while true
do
delim=''
IFS='' read -r -d '.' -n 4096 buffer && (( ${#buffer} != 4096 )) && delim='.'
if [[ -n "${delim:+_}" ]] || [[ -n "${buffer:+_}" ]]
then
# Do the replacements in "$buffer"
# ...
printf "%s%s" "$buffer" "$delim"
else
break
fi
done
}
But I'm looking for a more efficient way, any thoughts?
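For reference, the replacement step left as a comment in the function above could be filled in with a per-character lookup in pure bash. This is only a sketch: replace_chars is a hypothetical name, it assumes exactly one replacement string per character, and it is likely to be slow on large buffers, which is the inefficiency the question is trying to avoid.

```bash
# replace_chars STRING CHARS REPL...: map each character of CHARS to the
# REPL at the same position; the converted string is returned in REPLY.
# Each input character is looked up once, so text produced by one
# replacement is never rewritten by another.
replace_chars() {
    local s=$1 chars=$2 out='' c i j
    shift 2
    local -a repl=("$@")
    for ((i = 0; i < ${#s}; i++)); do
        c=${s:i:1}
        for ((j = 0; j < ${#chars}; j++)); do
            if [[ $c == "${chars:j:1}" ]]; then
                c=${repl[j]}
                break
            fi
        done
        out+=$c
    done
    REPLY=$out
}

# Inside the read loop this might be used as:
#   replace_chars "$buffer" "$chars" "${strings[@]}"; buffer=$REPLY
```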

Since you seem to be okay with using ANSI C quoting via $'...' strings, then maybe use sed?
sed $'s/\a/<bell>/g; s/\b/<backspace>/g; s/\t/<horizontal-tab>/g; s/\v/<vertical-tab>/g'
Or, via separate commands:
sed -e $'s/\a/<bell>/g' \
-e $'s/\b/<backspace>/g' \
-e $'s/\t/<horizontal-tab>/g' \
-e $'s/\v/<vertical-tab>/g'
Or, using awk, which replaces newline characters too (by customizing the Output Record Separator, i.e., the ORS variable):
$ printf '\a,\b,\t,\v\n' | awk -vORS='<newline>' '
{
gsub(/\a/, "<bell>")
gsub(/\b/, "<backspace>")
gsub(/\t/, "<horizontal-tab>")
gsub(/\v/, "<vertical-tab>")
print $0
}
'
<bell>,<backspace>,<horizontal-tab>,<vertical-tab><newline>

For a simple one-liner with reasonable portability, try Perl.
for (( i = 1; i <= 0x7FFFFFFFFFFFFFFF; i++ ))
do
printf '\a,\b,\t,\v'
done |
perl -pe 's/\a/<bell>/g;
s/\b/<backspace>/g;s/\t/<horizontal-tab>/g;s/\v/<vertical-tab>/g'
Perl internally does some intelligent optimizations so it's not encumbered by lines which are longer than its input buffer or whatever.
Perl by itself is not POSIX, of course; but it can be expected to be installed on any even remotely modern platform (short of perhaps embedded systems etc).

Assuming the overall objective is to provide the ability to process a stream of data in real time without having to wait for an EOL/End-of-buffer occurrence to trigger processing ...
A few items:
continue to use the while/read -n loop to read a chunk of data from the incoming stream and store in buffer variable
push the conversion code into something that's better suited to string manipulation (ie, something other than bash); for sake of discussion we'll choose awk
within the while/read -n loop printf "%s\n" "${buffer}" and pipe the output from the while loop into awk; NOTE: the key item is to introduce an explicit \n into the stream so as to trigger awk processing for each new 'line' of input; OP can decide if this additional \n must be distinguished from a \n occurring in the original stream of data
awk then parses each line of input as per the replacement logic, making sure to append anything leftover to the front of the next line of input (ie, for when the while/read -n breaks an item in the 'middle')
General idea:
chars_to_strings() {
while read -r -n 15 buffer # using '15' for demo purposes otherwise replace with '4096' or whatever OP wants
do
printf "%s\n" "${buffer}"
done | awk '{print NR,FNR,length($0)}' # replace 'print ...' with OP's replacement logic
}
Take for a test drive:
for (( i = 1; i <= 20; i++ ))
do
printf '\a,\b,\t,\v'
sleep 0.1 # add some delay to data being streamed to chars_to_strings()
done | chars_to_strings
1 1 15 # output starts printing right away
2 2 15 # instead of waiting for the 'for'
3 3 15 # loop to complete
4 4 15
5 5 13
6 6 15
7 7 15
8 8 15
9 9 15
A variation on this idea using a named pipe:
mkfifo /tmp/pipeX
sleep infinity > /tmp/pipeX & # keep pipe open so awk does not exit; backgrounded, otherwise opening the pipe blocks here
awk '{print NR,FNR,length($0)}' < /tmp/pipeX &
chars_to_strings() {
while read -r -n 15 buffer
do
printf "%s\n" "${buffer}"
done > /tmp/pipeX
}
Take for a test drive:
for (( i = 1; i <= 20; i++ ))
do
printf '\a,\b,\t,\v'
sleep 0.1
done | chars_to_strings
1 1 15 # output starts printing right away
2 2 15 # instead of waiting for the 'for'
3 3 15 # loop to complete
4 4 15
5 5 13
6 6 15
7 7 15
8 8 15
9 9 15
# kill background 'awk' and/or 'sleep infinity' when no longer needed

Don't waste FS/OFS: use the built-in variables to take care of 2 of the 5 replacements needed:
echo $' \t abc xyz \t \a \n\n ' |
mawk 'gsub(/\7/, "<bell>", $!(NF = NF)) + gsub(/\10/,"<bs>") +\
gsub(/\11/,"<h-tab>")^_' OFS='<v-tab>' FS='\13' ORS='<newline>'
<h-tab> abc xyz <h-tab> <bell> <newline><newline> <newline>

To have NO constraint on the line length you could do something like this with GNU awk:
awk -v RS='.{1,100}' -v ORS= '{
$0 = RT
gsub(foo,bar)
print
}'
That will read and process the input 100 chars at a time no matter which chars are present, whether it has newlines or not, and even if the input was one multi-terabyte line.
Replace gsub(foo,bar) with whatever substitution(s) you have in mind, e.g.:
$ printf '\a,\b,\t,\v' |
awk -v RS='.{1,100}' -v ORS= '{
$0 = RT
gsub(/\a/,"<bell>")
gsub(/\b/,"<backspace>")
gsub(/\t/,"<horizontal-tab>")
gsub(/\v/,"<vertical-tab>")
print
}'
<bell>,<backspace>,<horizontal-tab>,<vertical-tab>
and of course it'd be trivial to pass a list of old and new strings to awk rather than hardcoding them, you'd just have to sanitize any regexp or backreference metachars before calling gsub().

Paste hundreds of file with specific pattern name in bash/awk/c

I have 500 files and I want to merge them by adding columns.
My first file
3
4
1
5
My second file
7
1
4
2
Output should look like
3 7
4 1
1 4
5 2
But I have 500 files (sum_1.txt, sum_501.txt, and so on up to sum_249501.txt), so the output must have 500 columns, and it would be very frustrating to type 500 file names.
Is there an easier way to do this? I tried the following, but instead of 500 columns it produces a lot of rows:
#!/bin/bash
file_name="sum"
tmp=$(mktemp) || exit 1
touch ${file_name}_calosc.txt
for first in {1..249501..500}
do
paste -d ${file_name}_calosc.txt ${file_name}_$first.txt >> ${file_name}_calosc.txt
done

Something like this (untested) should work regardless of how many files you have:
awk '
BEGIN {
for (i=1; i<=249501; i+=500) {
ARGV[ARGC++] = "sum_" i ".txt"
}
}
{ vals[FNR] = (NR==FNR ? "" : vals[FNR] OFS) $0 }
END {
for (i=1; i<=FNR; i++) {
print vals[i]
}
}
'
It'd only fail if the total content of all the files was too big to fit in memory.

Your command says to paste two files together; to paste more files, give more files as arguments to paste.
You can paste a number of files together like
paste sum_{1..249501..500}.txt > sum_calosc.txt
but if the number of files is too large for paste, or the resulting command line is too long, you may still have to resort to temporary files.
Here's an attempt to paste 25 files at a time, then combine the resulting 20 files in a final big paste.
#!/bin/bash
d=$(mktemp -d -t pastemanyXXXXXXXXXXX) || exit
# Clean up when done
trap 'rm -rf "$d"; exit' ERR EXIT
for ((i=1; i<= 249501; i+=500*25)); do
printf -v dest "paste%06i.txt" "$i"
for ((j=1, k=i; j<=25; j++, k+=500)); do # 25 files per batch, file numbers 500 apart
printf "sum_%i.txt\n" "$k"
done |
xargs paste >"$d/$dest"
done
paste "$d"/* >sum_calosc.txt
The function of xargs is to combine its inputs into a single command line (or more than one if it would otherwise be too long; but we are specifically trying to avoid that here, because we want to control exactly how many files we pass to paste).
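The same two-stage idea can be sketched as a reusable function that takes the batch size and the file names as arguments. This is an illustration, not the answer's script: paste_in_chunks is a hypothetical name, at least one input file is assumed, and all inputs are assumed to have the same number of lines (otherwise the padding of short columns may differ from a single direct paste).

```bash
# paste_in_chunks SIZE FILE...: paste FILEs in groups of SIZE, then paste
# the intermediate results together; the merged table goes to stdout.
paste_in_chunks() {
    local size=$1 tmp part status i n=0
    shift
    local -a files=("$@")
    tmp=$(mktemp -d) || return
    for ((i = 0; i < ${#files[@]}; i += size)); do
        # Zero-padded names keep the glob below in the original order.
        printf -v part '%s/part%06d' "$tmp" "$n"
        paste "${files[@]:i:size}" > "$part" || { rm -rf "$tmp"; return 1; }
        n=$((n + 1))
    done
    paste "$tmp"/part*
    status=$?
    rm -rf "$tmp"
    return "$status"
}

# Usage for the files in the question:
#   paste_in_chunks 25 sum_{1..249501..500}.txt > sum_calosc.txt
```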

Detect if a series of numbers is sequential in bash/awk

So I have a series of scripts that generate intermediary text files along the way as a means of storing information across different scripts. Essentially the scripts detect rows within data that have been approved by the user for removal. The line numbers that are to be removed from the source file are stored in a file.
For example, say I have a source data file like this:
a1,b1,c1,d1
a2,b2,c2,d2
a3,b3,c3,d3
a4,b4,c4,d4
a5,b5,c5,d5
a6,b6,c6,d6
a7,b7,c7,d7
And the intermediary file would contain something like this:
1 3 4 5 6
Which would result, when the script is run, in an output data file as follows:
a2,b2,c2,d2
a7,b7,c7,d7
This all works fine; there is nothing to fix in this code. The problem is that when I'm dealing with actual data files, there are sometimes literally thousands of numbers stored in the intermediary file for removal. This means I can't use a loop, because it would take a massive amount of time, and my current method of using sed gets overloaded with an error: too many arguments. Many of the line numbers are consecutive, so here's where I get to my question:
Is there a way in bash or awk to detect whether a series of space-separated numbers are consecutive?
I can sort out everything beyond that, I'm just stumped on how I could do this in one/a series of step(s). My plan, if I can detect consecutive values, is to change the intermediary file from:
1 3 4 5 6
To:
1 3-6
And then I'll be able to write code that will run on each range of values in a more manageable way.
If possible I'd like to avoid looping through each value and checking individually whether or not it's one step above the previous value, since I'm dealing with tens of thousands of numbers in a list.
If this is not possible in bash/awk, is there another way to accomplish this task to reduce the overall number of arguments passed to my script and greatly reduce the chances of encountering an error for too many arguments?

What about this?
BEGIN {
getline < "intermediate.txt"
split($0, skippedlines, " ")
skipindex = 1
}
{
if (skippedlines[skipindex] == NR)
++skipindex;
else
print
}

Use cat, join, and cut:
Files infile and ids:
a1,b1,c1,d1 1
a2,b2,c2,d2 3
a3,b3,c3,d3 4
a4,b4,c4,d4 5
a5,b5,c5,d5 6
a6,b6,c6,d6
a7,b7,c7,d7
Removal of selected lines:
$ join -v 2 ids <(cat -n infile) | cut -f 2 -d ' '
a2,b2,c2,d2
a7,b7,c7,d7
What's going on:
First, the initial file receives an id on each line, with cat -n infile;
then, the resulting file is joined on the first column with the file holding the ids;
only non-matching lines from second file are printed -- join -v 2;
the first column, with the ids, is removed;
and, it's a neat shell one-liner (:
If your file with ids is written as a single line, you can still use the above one-liner by adding a translation step on the ids file (tr reads standard input, so the file is redirected into it), as follows:
$ join -v 2 <(tr ' ' '\n' < ids) <(cat -n infile) | cut -f 2 -d ' '

@jmihalicza's answer nicely uses awk to solve the whole problem of selecting the lines from source file that match those in the intermediate file. For completeness, the following awk program reduces the list of individual line numbers to ranges, where possible, which I think answers the original question:
{ for (j = 1; j <= NF; j++) {
lin[i++] = $j;
}
}
END {
start = lin[0];
j = 1;
while (j <= i) {
end = start
while (lin[j] == (lin[j-1]+1)) {
end = lin[j++];
}
if ((end+0) > (start+0)) {
printf "%d-%d ",start,end
} else {
printf "%d ",start
}
start = lin[j++];
}
}
Given this script, which I've called merge.awk and a file testlin.txt as follows:
1 3 4 5 6 9 10 11 13 15
... we can do this:
$ awk -f merge.awk <testlin.txt
1 3-6 9-11 13 15

This might work for you (GNU sed):
sed -r 's/\S+/&d/g;s/\s+/\n/g' intermediate_file | sed -f - source_file
Change the intermediate file into a sed script.
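Combining this with the range idea from the question, consecutive line numbers can be collapsed into sed address ranges such as 3,6d, which keeps the generated script short. A rough sketch in bash; the function names are invented here, and the numbers are assumed to be sorted in ascending order without duplicates.

```bash
# emit_range START END: print one sed delete command for the range.
emit_range() {
    if (( $1 == $2 )); then
        printf '%sd\n' "$1"
    else
        printf '%s,%sd\n' "$1" "$2"
    fi
}

# numbers_to_sed N...: turn sorted line numbers into sed delete commands,
# merging each run of consecutive numbers into a single "first,last d".
numbers_to_sed() {
    local start='' prev='' n
    for n in "$@"; do
        if [[ -n $start ]] && (( n == prev + 1 )); then
            prev=$n            # still inside the current run
            continue
        fi
        [[ -n $start ]] && emit_range "$start" "$prev"
        start=$n prev=$n       # begin a new run
    done
    [[ -n $start ]] && emit_range "$start" "$prev"
    return 0
}

# Usage with the files from the question:
#   sed -f <(numbers_to_sed $(<intermediate_file)) source_file
```

For the sample list 1 3 4 5 6 this produces the two commands 1d and 3,6d.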

KornShell Script: List all even numbers in a range

What I am trying to do is list all the numbers that are even, between the two numbers the user enters via a KornShell (ksh) script. So if user enters for the first digit 2 then the second digit 25 it would display
2,4,6,8,10,12,14,16,18,20,22,24

first=2 # from user
last=25 # from user
seq $first 2 $last

This should work with ksh93 and bash, doesn't require seq or perl which might not be installed depending on the OS used.
function evens {
# print every even number below the last one with a trailing comma, then the last one
for((i=($1+($1%2));i<($2-1);i+=2));do printf "%s," $i;done
echo $i
}
$ evens 2 25
2,4,6,8,10,12,14,16,18,20,22,24
$ evens 3 24
4,6,8,10,12,14,16,18,20,22,24
$ evens 0 9
0,2,4,6,8

In ksh, assuming you have used variables start and end:
set -A evens # use an array to store the numbers
n=0
i=$start
(( i % 2 == 1 )) && (( i+=1 )) # start at an even number
while (( i <= end )); do
evens[n]=$i
(( n+=1 ))
(( i+=2 ))
done
IFS=,
echo "${evens[*]}" # output comma separated string
outputs
2,4,6,8,10,12,14,16,18,20,22,24

There are many ways to do it: in the shell, a shell script, awk, seq, etc.
Since you tagged the question with vi, I added one with Vim:
fun! GetEven(f,t)
let ff=a:f%2?a:f+1:a:f
echom join(range(ff,a:t,2),",")
endf
Source that function and type :call GetEven(2,25) to see your expected output.
It currently echoes in the command area; if you want the result inserted into the file, use put or setline instead.

Using perl:
perl -e 'print join q{,}, grep { $_ % 2 == 0 } (shift .. shift)' 2 25
It yields:
2,4,6,8,10,12,14,16,18,20,22,24
EDIT to fix the trailing newline:
perl -e 'print join( q{,}, grep { $_ % 2 == 0 } (shift .. shift) ), "\n"' 2 25

By setting first=$(($1+($1%2))) and using the -s option to format the output you can use seq:
first=$(($1+($1%2)))
last=$2
seq -s, $first 2 $last
Save as a script called evens and call with even values of $first:
$ ./evens 2 25
2,4,6,8,10,12,14,16,18,20,22,24
Or odd values of $first:
$ ./evens 3 25
4,6,8,10,12,14,16,18,20,22,24
