Sort text file in a shell script [duplicate] - bash

This question already has answers here:
How to merge two files consistently line by line
(6 answers)
How can I extract a predetermined range of lines from a text file on Unix?
(28 answers)
Closed 2 years ago.
I've got a text file given, and the results and the counts vary (date, link and id can be anything). However, the count of dates, links and id's is always the same (so n - n - n for any positive integer n). Is n a positive integer, then the lines n, (n + k/3) and (n+2(k/3)), where k is the number of all lines, belong together.
As en example, I picked out n=3. So lines (1 | 4 | 7), (2 | 5 | 8) and (3 | 6 | 9) belong together:
Today, 17:09
Yesterday, 09:44
08.09.2020
07.09.2020
06.09.2020
/s-show/Link111...
/s-show/Link211...
/s-show/Link311...
/s-show/Link411...
/s-show/Link511...
id="1222222222"
id="2222222222"
id="3222222222"
id="4222222222"
id="5222222222"
I would like to sort the text file as the following:
id="1222222222"Today, 17:09/s-show/Link111...
id="2222222222"Yesterday, 09:44/s-show/Link211
id="3222222222"08.09.2020/s-show/Link311
id="4222222222"07.09.2020/s-show/Link411
id="5222222222"06.09.2020/s-show/Link511
In a former question, I only had two categories (date and link) and was adviced to do it like the following:
lc=$(wc -l <Textfile); paste -d '' <(head -n $((lc/2)) Textfile) <(tail -n
$((lc/2)) Textfile)
However, here I have 3 categories and the head and tail command won't let me read only the lines in the middle.
How could this be solved?

Leveraging the techniques taught in How can I extract a predetermined range of lines from a text file on Unix? --
#!/usr/bin/env bash
input=$1
total_lines=$(wc -l <"$1")
sections=$2
lines_per_section=$(( total_lines / sections ))
if (( lines_per_section * sections != total_lines )); then
echo "ERROR: ${total_lines} does not evenly divide into ${sections} sections" >&2
exit 1
fi
start=0
ranges=( )
for (( i=0; i<sections; i++ )); do
ranges+=( "$start:$(( start + lines_per_section ))" )
(( start += lines_per_section ))
done
get_range() { sed -n "$(( $1 + 1 )),$(( $2 ))p;$(( $2 + 1 ))q" <"$input"; }
consolidate_input() {
if (( $# )); then
current=$1; shift
paste <(get_range "${current%:*}" "${current#*:}") <(consolidate_input "$#")
fi
}
consolidate_input "${ranges[#]}"
But don't do that. Just put your three sections in three separate files, so you can use paste file1 file2 file3.

Related

How to sort 2 arrays in bash

I want to sort 2 arrays at the same time. The arrays are the following: wordArray and numArray. Both are global.
These 2 arrays contain all the words (without duplicates) and the number of the appearances of each word from a text file.
Right now I am using Bubble Sort to sort both of them at the same time:
# Bubble Sort function
function bubble_sort {
local max=${#numArray[#]}
size=${#numArray[#]}
while ((max > 0))
do
local i=0
while ((i < max))
do
if [ "$i" != "$(($size-1))" ]
then
if [ ${numArray[$i]} \< ${numArray[$((i + 1))]} ]
then
local temp=${numArray[$i]}
numArray[$i]=${numArray[$((i + 1))]}
numArray[$((i + 1))]=$temp
local temp2=${wordArray[$i]}
wordArray[$i]=${wordArray[$((i + 1))]}
wordArray[$((i + 1))]=$temp2
fi
fi
((i += 1))
done
((max -= 1))
done
}
#Calling Bubble Sort function
bubble_sort "${numArray[#]}" "${wordArray[#]}"
But for some reason it won't sort them properly when large arrays are in place.
Does anyone knows what's wrong with it or an other approach to sort the words with the corresponding number of appearance with or without arrays?
This:
wordArray = (because, maybe, why, the)
numArray = (5, 12, 20, 13)
Must turn to this:
wordArray = (why, the, maybe, because)
numArray = (20, 13, 12, 5)
Someone recommended to write the two arrays side by side in a text file and sort the file.
How will it work for this input:
1 Arthur
21 Zebra
to turn to this output:
21 Zebra
1 Arthur
Assuming the arrays do not contain tab character or newline character, how about:
#!/bin/bash
wordArray=(why the maybe because)
numArray=(20 13 12 5)
tmp1=$(mktemp tmp.XXXXXX) # file to be sorted
tmp2=$(mktemp tmp.XXXXXX) # sorted result
for (( i = 0; i < ${#wordArray[#]}; i++ )); do
echo "${numArray[i]}"$'\t'"${wordArray[i]}" # write the number and word delimited by a tab character
done > "$tmp1"
sort -nrk1,1 "$tmp1" > "$tmp2" # sort the file by number in descending order
while IFS=$'\t' read -r num word; do # read the lines splitting by the tab character
numArray_sorted+=("$num") # add the number to the array
wordArray_sorted+=("$word") # add the word to the array
done < "$tmp2"
rm -- "$tmp1" # unlink the temp file
rm -- "$tmp2" # same as above
echo "${wordArray_sorted[#]}" # same as above
echo "${numArray_sorted[#]}" # see the result
Output:
why the maybe because
20 13 12 5
If you prefer not to create temp files, here is the process substitution version, which will run faster without writing/reading temp files.
#!/bin/bash
wordArray=(why the maybe because)
numArray=(20 13 12 5)
while IFS=$'\t' read -r num word; do
numArray_sorted+=("$num")
wordArray_sorted+=("$word")
done < <(
sort -nrk1,1 < <(
for (( i = 0; i < ${#wordArray[#]}; i++ )); do
echo "${numArray[i]}"$'\t'"${wordArray[i]}"
done
)
)
echo "${wordArray_sorted[#]}"
echo "${numArray_sorted[#]}"
Or simpler (using the suggestion by KamilCuk):
#!/bin/bash
wordArray=(why the maybe because)
numArray=(20 13 12 5)
while IFS=$'\t' read -r num word; do
numArray_sorted+=("$num")
wordArray_sorted+=("$word")
done < <(
paste <(printf "%s\n" "${numArray[#]}") <(printf "%s\n" "${wordArray[#]}") | sort -nrk1,1
)
echo "${wordArray_sorted[#]}"
echo "${numArray_sorted[#]}"
You need numeric sort for the numbers. You can sort an array like this:
mapfile -t wordArray <(printf '%s\n' "${wordArray[#]}" | sort -n)
But what you actually need is something like:
for num in "${numArray[#]}"; do
echo "$num: ${wordArray[j++]}"
done |
sort -n k1,1
But, earlier in the process, you should have used only one array, where the word and frequency (or vice versa) are key value pairs. Then they always have a direct relationship, and can be printed similarly to the for loop above.

Bash - creating multiple lists files from one file list

The directory contains x files. I get a list of files. I want to split this list into a larger number of n lists, which would have a limited number of elements.
Examples:
files=$( ls -d /*.csv | sort )
echo $files
/100347_111111.csv
/111301_111111.csv
/111301_222222.csv
/256467_111111.csv
/256467_222222.csv
/256467_333333.csv
/256467_444444.csv
/256467_555555.csv
/256467_666666.csv
/256467_777777.csv
From the resulting list I want to create 3 lists. The lists must not have more than 4 elements. The first list should be composed of the first 4 elements from the files, the other list should contain the following 4 elements, the third list should contain the remaining elements.
n1
/100347_111111.csv
/111301_111111.csv
/111301_222222.csv
/256467_111111.csv
n2
/256467_222222.csv
/256467_333333.csv
/256467_444444.csv
/256467_555555.csv
n3
/256467_666666.csv
/256467_777777.csv
Does someone can help, how to create lists as described above?
FILES=( `ls -d * | sort`)
echo "${FILES[#]:0:4}"
Loop of 4
count=4
for i in $(seq 0 $(( ${#FILES[#]}/$count - 1 ))) ;
do
echo "######## Set" $i "#######";
echo "${FILES[#]:$(( i * $count )):$count }" ;
done
An example which may be reinventing the wheel:
\ls -1 |
{
n=0
cr=""
pack=1
while read -r l
do
mod=$(($n % 4))
if [[ "$mod" == "0" ]]
then
echo -e "$cr"n"$pack:"
fi
echo $l
n=$((n + 1))
pack=$((pack + 1))
cr="\n";
done
}
Here, we use the modulo operator to check if a new pack is about to be displayed (n modulo 4 = 0 if n is a multiple of 4).
I used curly brackets {} to put var initialization and the while loop in the same environment (other wise while won't be able to retrieve n, pack and cr variables).
Try split:
ls -d /*.csv | sort | split -l 4 -d
this will create files x01 x02... containing maximum 4 lines.

How to nest loops correctly

I have 2 scripts, #1 and #2. Each work OK by themselves. I want to read a 15 row file, row by row, and process it. Script #2 selects rows. Row 0 is is indicated as firstline=0, lastline=1. Row 14 would be firstline=14, lastline=15. I see good results from echo. I want to do the same with script #1. Can't get my head around nesting correctly. Code below.
#!/bin/bash
# script 1
filename=slash
firstline=0
lastline=1
i=0
exec <${filename}
while read ; do
i=$(( $i + 1 ))
if [ "$i" -ge "${firstline}" ] ; then
if [ "$i" -gt "${lastline}" ] ; then
break
else
echo "${REPLY}" > slash1
fold -w 21 -s slash1 > news1
sleep 5
fi
fi
done
# script2
firstline=(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14)
lastline=(1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)
for ((i=0;i<${#firstline[#]};i++))
do
echo ${firstline[$i]} ${lastline[$i]};
done
Your question is very unclear, but perhaps you are simply looking for some simple function calls:
#!/bin/bash
script_1() {
filename=slash
firstline=$1
lastline=$2
i=0
exec <${filename}
while read ; do
i=$(( $i + 1 ))
if [ "$i" -ge "${firstline}" ] ; then
if [ "$i" -gt "${lastline}" ] ; then
break
else
echo "${REPLY}" > slash1
fold -w 21 -s slash1 > news1
sleep 5
fi
fi
done
}
# script2
firstline=(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14)
lastline=(1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)
for ((i=0;i<${#firstline[#]};i++))
do
script_1 ${firstline[$i]} ${lastline[$i]};
done
Note that reading the file this way is extremely inefficient, and there are undoubtedly better ways to handle this, but I am trying to minimize the changes from your code.
Update: Based on your later comments, the following idiomatic Bash code that uses sed to extract the line of interest in each iteration solves your problem much more simply:
Note:
- If the input file does not change between loop iterations, and the input file is small enough (as it is in the case at hand), it's more efficient to buffer the file contents in a variable up front, as is demonstrated in the original answer below.
- As tripleee points out in a comment: If simply reading the input lines sequentially is sufficient (as opposed to extracting lines by specific line numbers, then a single, simple while read -r line; do ... # fold and output, then sleep ... done < "$filename" is enough.
# Determine the input filename.
filename='slash'
# Count its number of lines.
lineCount=$(wc -l < "$filename")
# Loop over the line numbers of the file.
for (( lineNum = 1; lineNum <= lineCount; ++lineNum )); do
# Use `sed` to extract the line with the line number at hand,
# reformat it, and output to the target file.
fold -w 21 -s <(sed -n "$lineNum {p;q;}" "$filename") > 'news1'
sleep 5
done
A simplified version of what I think you're trying to achieve:
#!/bin/bash
# Split fields by newlines on input,
# and separate array items by newlines on output.
IFS=$'\n'
# Read all input lines up front, into array ${lines[#]}
# In terms of your code, you'd use
# read -d '' -ra lines < "$filename"
read -d '' -ra lines <<<$'line 1\nline 2\nline 3\nline 4\nline 5\nline 6\nline 7\nline 8\nline 9\nline 10\nline 11\nline 12\nline 13\nline 14\nline 15'
# Define the arrays specifying the line ranges to select.
firstline=(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14)
lastline=(1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)
# Loop over the ranges and select a range of lines in each iteration.
for ((i=0; i<${#firstline[#]}; i++)); do
extractedLines="${lines[*]: ${firstline[i]}: 1 + ${lastline[i]} - ${firstline[i]}}"
# Process the extracted lines.
# In terms of your code, the `> slash1` and `fold ...` commands would go here.
echo "$extractedLines"
echo '------'
done
Note:
The name of the array variable filled with read -ra is lines; ${lines[#]} is Bash syntax for returning all array elements as separate words (${lines[*]} also refers to all elements, but with slightly different semantics), and this syntax is used in the comments to illustrate that lines is indeed an array variable (note that if you were to use simply $lines to reference the variable, you'd implicitly get only the item with index 0, which is the same as: ${lines[0]}.
<<<$'line 1\n...' uses a here-string (<<<) to read an ad-hoc sample document (expressed as an ANSI C-quoted string ($'...')) in the interest of making my example code self-contained.
As stated in the comment, you'd read from $filename instead:
read -d '' -ra lines <"$filename"
extractedLines="${lines[*]: ${firstline[i]}: 1 + ${lastline[i]} - ${firstline[i]}}" extracts the lines of interest; ${firstline[i]} references the current element (index i) from array ${firstline[#]}; since the last token in Bash's array-slicing syntax
(${lines[*]: <startIndex>: <elementCount>}) is the count of elements to return, we must perform a calculation to determine the count, which is what 1 + ${lastline[i]} - ${firstline[i]} does.
By virtue of using "${lines[*]...}" rather than "${lines[#]...}", the extracted array elements are joined by the first character in $IFS, which in our case is a newline ($'\n') (when extracting a single line, that doesn't really matter).

How to sum a row of numbers from text file-- Bash Shell Scripting

I'm trying to write a bash script that calculates the average of numbers by rows and columns. An example of a text file that I'm reading in is:
1 2 3 4 5
4 6 7 8 0
There is an unknown number of rows and unknown number of columns. Currently, I'm just trying to sum each row with a while loop. The desired output is:
1 2 3 4 5 Sum = 15
4 6 7 8 0 Sum = 25
And so on and so forth with each row. Currently this is the code I have:
while read i
do
echo "num: $i"
(( sum=$sum+$i ))
echo "sum: $sum"
done < $2
To call the program it's stats -r test_file. "-r" indicates rows--I haven't started columns quite yet. My current code actually just takes the first number of each column and adds them together and then the rest of the numbers error out as a syntax error. It says the error comes from like 16, which is the (( sum=$sum+$i )) line but I honestly can't figure out what the problem is. I should tell you I'm extremely new to bash scripting and I have googled and searched high and low for the answer for this and can't find it. Any help is greatly appreciated.
You are reading the file line by line, and summing line is not an arithmetic operation. Try this:
while read i
do
sum=0
for num in $i
do
sum=$(($sum + $num))
done
echo "$i Sum: $sum"
done < $2
just split each number from every line using for loop. I hope this helps.
Another non bash way (con: OP asked for bash, pro: does not depend on bashisms, works with floats).
awk '{c=0;for(i=1;i<=NF;++i){c+=$i};print $0, "Sum:", c}'
Another way (not a pure bash):
while read line
do
sum=$(sed 's/[ ]\+/+/g' <<< "$line" | bc -q)
echo "$line Sum = $sum"
done < filename
Using the numsum -r util covers the row addition, but the output format needs a little glue, by inefficiently paste-ing a few utils:
paste "$2" \
<(yes "Sum =" | head -$(wc -l < "$2") ) \
<(numsum -r "$2")
Output:
1 2 3 4 5 Sum = 15
4 6 7 8 0 Sum = 25
Note -- to run the above line on a given file foo, first initialize $2 like so:
set -- "" foo
paste "$2" <(yes "Sum =" | head -$(wc -l < "$2") ) <(numsum -r "$2")

Bash for loop - naming after n (which is user's input) [duplicate]

This question already has answers here:
How do I iterate over a range of numbers defined by variables in Bash?
(20 answers)
Closed 10 years ago.
I am looping over the commands with for i in {1..n} loop and want output files to have n extension.
For example:
for i in {1..2}
do cat FILE > ${i}_output
done
However n is user's input:
echo 'Please enter n'
read number
for i in {1.."$number"}
do
commands > ${i}_output
done
Loop rolls over n times - this works fine, but my output looks like this {1..n}_output.
How can I name my files in such loop?
Edit
Also tried this
for i in {1.."$number"}
do
k=`echo ${n} | tr -d '}' | cut -d "." -f 3`
commands > ${k}_output
done
But it's not working.
Use a "C-style" for-loop:
echo 'Please enter n'
read number
for ((i = 1; i <= number; i++))
do
commands > ${i}_output
done
Note that the $ is not required ahead of number or i in the for-loop header but double-parentheses are required.
The range parameter in for loop works only with constant values. So replace {1..$num} with a value like: {1..10}.
OR
Change the for loop to:
for((i=1;i<=number;i++))
You can use a simple for loop ( similar to the ones found in langaues like C, C++ etc):
echo 'Please enter n'
read number
for (( i=1; i <= $number; i++ ))
do
commands > ${i}_output
done
Try using seq (1) instead. As in for i in $(seq 1 $number).

Resources