I have a file like this:
abak 1 2 3 4
b.b 2 3 4 5
abak 2 5 6 2
b.b -1.2 3 4 6
cc 3 4 5 6
And I want
abak 1 2 3 4
b.b -1.2 3 4 6
cc 3 4 5 6
That is, for each name in column 1, only the line with the minimum value in column 2 (found by sorting on column 2).
As a first step I tried to sort the lines with:
set file [open "[lindex $argv 0]" "r"]
foreach line [split [read $file] "\n"] {
lappend records [split $line " "]
}
set records [lsort -index 1 -real $records]
foreach record $records {
puts [join $record " "]
}
but I got the error:
expected floating-point number but got ""
while executing
"lsort -index 1 -real $records"
Not every value in column 2 is written as a floating-point number, but they are all real numbers.
Why doesn't this work?
Thanks
This is very much a question about creating and manipulating a data structure. This is how I would approach it:
set fid [open filename r]
set data [dict create]
while {[gets $fid line] != -1} {
set fields [regexp -inline -all {\S+} $line]
dict lappend data [lindex $fields 0] [lrange $fields 1 end]
}
dict for {key values} $data {
puts [format "%-5s %s" $key [lindex [lsort -real -index 0 $values] 0]]
}
outputs
abak 1 2 3 4
b.b -1.2 3 4 6
cc 3 4 5 6
Your key problem is that split is not the right way to extract those records: it converts multi-space sequences into empty elements. Instead, you want to use this:
lappend records [regexp -all -inline {\S+} $line]
That will convert the line into its list of non-space sequences. (Yes, you lose the spaces when you reconvert; that's usually not too big a problem but you can handle it if you need to.) The rest of your code looks fine enough.
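For comparison, here is a minimal Python sketch of the same idea: split each line on runs of whitespace (so no empty fields appear) and keep, per key, the row with the smallest second column. The function name `min_rows` and the sample list are made up for illustration; this is not a translation of the Tcl code above.

```python
def min_rows(lines, key_col=0, value_col=1):
    """Keep, for each key, the row whose value column is numerically smallest."""
    best = {}
    for line in lines:
        fields = line.split()  # splits on runs of whitespace, never yields ""
        if len(fields) <= max(key_col, value_col):
            continue  # skip blank or too-short lines instead of failing on them
        key = fields[key_col]
        if key not in best or float(fields[value_col]) < float(best[key][value_col]):
            best[key] = fields
    # dicts keep insertion order, so keys come out in first-seen order
    return [" ".join(row) for row in best.values()]


# Hypothetical sample: note the double space and the trailing empty line
sample = ["abak 1 2 3 4", "b.b  2 3 4 5", "abak 2 5 6 2",
          "b.b -1.2 3 4 6", "cc 3 4 5 6", ""]
for row in min_rows(sample):
    print(row)
```

The blank-line check matters because a file that ends in a newline produces one empty trailing record when split on "\n".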
Hello, I wanted to ask how to sum the odd numbers on every line of an input file.
The input.txt file looks like this:
4 1 8 3 7
2 5 8 2 7
4 7 2 5 2
0 2 5 3 5
3 6 3 1 6
The output must be:
11
12
12
13
7
The code starts like this:
read -p "Enter file name:" filename
while read line
do
...
This is my code. What is wrong with it?
#!/bin/sh
read -p "Enter file name:" filename
while read line
do
sum = 0
if ($_ % 2 -nq 0){
sum = sum + $_
}
echo $sum
sum = 0
done <$filename
The logic in your question seems correct, so I'll assume you're not sure how to do this line by line, as stated in your comment.
if ($_ % 2 -nq 0){
sum = sum + $_
}
I think this is a good place for a function. It takes a string containing integers as input and returns the sum of all odd numbers in that string, or -1 if there are no integers or they are all even.
function Sum-OddNumbers {
[cmdletbinding()]
param(
[parameter(mandatory,ValueFromPipeline)]
[string]$Line
)
process {
[regex]::Matches($Line,'\d+').Value.ForEach{
begin {
$result = 0
}
process {
if($_ % 2) {
$result += $_
}
}
end {
if(-not $result) {
return -1
}
return $result
}
}
}
}
Usage
@'
4 1 8 3 7
2 5 8 2 7
4 7 2 5 2
0 2 5 3 5
3 6 3 1 6
2 4 6 8 10
asd asd asd
'@ -split '\r?\n' | Sum-OddNumbers
Result
11
12
12
13
7
-1
-1
If that's how your txt file is set up, you can use Get-Content and a bit of logic to accomplish this.
Get-Content reads the file line by line (unless -Raw is specified), so we can pipe it to ForEach-Object and split the current line on white space.
Splitting leaves just the numbers in an array, which we can then filter for odd values.
Finally, just get the sum of the odd numbers.
Get-Content -Path .\input.txt | ForEach-Object {
# Split the current line into an array of just #'s
$OddNumbers = $_.Split(' ').Trim() | Foreach {
if ($_ % 2 -eq 1) { $_ } # odd number filter
}
# Add the filtered results
($OddNumbers | Measure-Object -Sum).Sum
}
"what's wrong here":
while read line
do
sum = 0
if ($_ % 2 -nq 0){
sum = sum + $_
}
echo $sum
sum = 0
done <$filename
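First, in sh, spaces are not allowed around the = in an assignment: write sum=0, not sum = 0.
Next, the if syntax is wrong: ($_ % 2 -nq 0){ ... } is not shell syntax, and $_ does not hold the numbers on the current line. See https://www.gnu.org/software/bash/manual/bash.html#index-if
For the arithmetic itself, see also https://www.gnu.org/software/bash/manual/bash.html#Shell-Arithmetic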
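To make the intended logic concrete, here is a small Python sketch of "sum the odd numbers on each line". It assumes every whitespace-separated token is an integer; `sum_odd` is a name chosen only for this example.

```python
def sum_odd(line):
    """Return the sum of the odd integers on one whitespace-separated line."""
    numbers = [int(tok) for tok in line.split()]
    return sum(n for n in numbers if n % 2 == 1)


lines = ["4 1 8 3 7", "2 5 8 2 7", "4 7 2 5 2", "0 2 5 3 5", "3 6 3 1 6"]
for line in lines:
    print(sum_odd(line))  # prints 11, 12, 12, 13, 7 (one per line)
```

The shell version needs the same three steps: split the line into numbers, test each with the modulo operator, and accumulate inside an arithmetic expansion.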
I have a file with the output of an SQL query like this:
DG_DATA 9 DG_FRA 0 OCR002 3 OCR 3
I use the following to extract the columns with numbers:
awk '{for(x=1;x<=NF;++x)if(x % 2 == 0)printf $x "\t"}'
and I get an output with this format:
9 0 3 3
but I need some help to compare all the integers and print "CRITICAL" if any integer is greater than 80.
This is a way to do it:
{
n=split($0, arr, " ");
i=2;
while(i<=n){
str=str arr[i]"\t";
if(arr[i] > 80){
f=1
};
i+=2;
}
print str;
if(f){
print "CRITICAL";
}
}
EXAMPLES
All values ok:
[llamazing@pc ~]$ echo "DG_DATA 9 DG_FRA 0 OCR002 3 OCR 3" | awk '{n=split($0, a, " ");i=2;while(i<=n){str=str a[i]"\t";if(a[i] > 80){f=1};i+=2;}print str;if(f){print "CRITICAL";}}'
9 0 3 3
With critical message:
[llamazing@pc ~]$ echo "DG_DATA 9 DG_FRA 81 OCR002 3 OCR 3" | awk '{n=split($0, a, " ");i=2;while(i<=n){str=str a[i]"\t";if(a[i] > 80){f=1};i+=2;}print str;if(f){print "CRITICAL";}}'
9 81 3 3
CRITICAL
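The same check can be sketched in Python, assuming the line always alternates name and integer value as in the sample. The function name `check` and the default limit of 80 are just for illustration.

```python
def check(line, limit=80):
    """Return the values in the even-numbered fields and whether any exceeds limit."""
    values = [int(v) for v in line.split()[1::2]]  # fields 2, 4, 6, ...
    return values, any(v > limit for v in values)


for text in ("DG_DATA 9 DG_FRA 0 OCR002 3 OCR 3",
             "DG_DATA 9 DG_FRA 81 OCR002 3 OCR 3"):
    values, critical = check(text)
    print("\t".join(str(v) for v in values))
    if critical:
        print("CRITICAL")
```

Because the values and the flag are computed fresh for each line, this version also behaves correctly when the input has more than one line.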
I have the following shell script that reads in data from a file inputted at the command line. The file is a matrix of numbers, and I need to separate the file by columns and then sort the columns. Right now I can read the file and output the individual columns but I am getting lost on how to sort. I have inputted a sort statement, but it only sorts the first column.
EDIT:
I have decided to take another route and actually transpose the matrix to turn the columns into rows. I have to calculate the mean and median later, and I have already done this successfully row-wise earlier in the script, so it was suggested that I "spin" the matrix, if you will, to turn the columns into rows.
Here is my UPDATED code
declare -a col=( )
read -a line < "$1"
numCols=${#line[@]} # save number of columns
index=0
while read -a line ; do
for (( colCount=0; colCount<${#line[@]}; colCount++ )); do
col[$index]=${line[$colCount]}
((index++))
done
done < "$1"
for (( width = 0; width < numCols; width++ )); do
for (( colCount = width; colCount < ${#col[@]}; colCount += numCols )); do
printf "%s\t" "${col[$colCount]}"
done
printf "\n"
done
This gives me the following output:
1 9 6 3 3 6
1 3 7 6 4 4
1 4 8 8 2 4
1 5 9 9 1 7
1 5 7 1 4 7
Though I'm now looking for:
1 3 3 6 6 9
1 3 4 4 6 7
1 2 4 4 8 8
1 1 5 7 9 9
1 1 4 5 7 7
To try and sort the data, I have tried the following:
sortCol=${col[$colCount]}
eval col[$colCount]='($(sort <<<"${'$sortCol'[*]}"))'
Also: (which is how I sorted the row after reading in from line)
sortCol=( $(printf '%s\t' "${col[$colCount]}" | sort -n) )
If you could provide any insight on this, it would be greatly appreciated!
Note, as mentioned in the comments, a pure bash solution isn't pretty. There are a number of ways to do it, but this is probably the most straightforward. The following reads all values of each line into one flat array and saves the matrix stride (the number of columns), so that each column can be read out as a row and sorted. All sorted columns are appended to a new array a2. Transposing that array yields your original matrix back with each column sorted.
Note this will work for any rank of column matrix in your file.
#!/bin/bash
test -z "$1" && { ## validate number of input
printf "insufficient input. usage: %s <filename>\n" "${0//*\//}"
exit 1;
}
test -r "$1" || { ## validate file was readable
printf "error: file not readable '%s'. usage: %s <filename>\n" "$1" "${0//*\//}"
exit 1;
}
## function: my sort integer array - accepts array and returns sorted array
## Usage: array=( $(msia "${array[@]}") )
msia() {
local a=( "$@" )
local sz=${#a[@]}
local _tmp
[[ $sz -lt 2 ]] && { echo "Warning: array not passed to fxn 'msia'"; return 1; }
for((i=0;i<$sz;i++)); do
for((j=$((sz-1));j>i;j--)); do
[[ ${a[$i]} -gt ${a[$j]} ]] && {
_tmp=${a[$i]}
a[$i]=${a[$j]}
a[$j]=$_tmp
}
done
done
echo "${a[@]}"
unset _tmp
unset sz
return 0
}
declare -a a1 ## declare arrays and matrix variables
declare -a a2
declare -i cnt=0
declare -i stride=0
declare -i sz=0
while read line; do ## read all lines into array
a1+=( $line );
(( cnt == 0 )) && stride=${#a1[@]} ## calculate matrix stride
(( cnt++ ))
done < "$1"
sz=${#a1[@]} ## calculate matrix size
## print original array
printf "\noriginal array:\n\n"
for ((i = 0; i < sz; i += stride)); do
for ((j = 0; j < stride; j++)); do
printf " %s" ${a1[i+j]}
done
printf "\n"
done
## sort columns from stride array
for ((j = 0; j < stride; j++)); do
for ((i = 0; i < sz; i += stride)); do
arow+=( ${a1[i+j]} )
done
a2+=( $(msia "${arow[@]}") ) ## create sorted array
unset arow
done
## print the sorted array
printf "\nsorted array:\n\n"
for ((j = 0; j < cnt; j++)); do
for ((i = 0; i < sz; i += cnt)); do
printf " %s" ${a2[i+j]}
done
printf "\n"
done
exit 0
Output
$ bash sort_cols2.sh dat/matrix.txt
original array:
1 1 1 1 1
9 3 4 5 5
6 7 8 9 7
3 6 8 9 1
3 4 2 1 4
6 4 4 7 7
sorted array:
1 1 1 1 1
3 3 2 1 1
3 4 4 5 4
6 4 4 7 5
6 6 8 9 7
9 7 8 9 7
Awk script (asort requires GNU awk)
awk '
{for(i=1;i<=NF;i++)a[i]=a[i]" "$i} #Add to column array
END{
for(i=1;i<=NF;i++){
split(a[i],b) #Split column
x=asort(b) #sort column
for(j=1;j<=x;j++){ #loop through sort
d[j]=d[j](d[j]~/./?" ":"")b[j] #Recreate lines
}
}
for(i=1;i<=NR;i++)print d[i] #Print lines
}' file
Output
1 1 1 1 1
3 3 2 1 1
3 4 4 5 4
6 4 4 7 5
6 6 8 9 7
9 7 8 9 7
Here's my entry in this little exercise. Should handle an arbitrary number of columns. I assume they're space-separated:
#!/bin/bash
linenumber=0
while read line; do
i=0
# Create an array for each column.
for number in $line; do
[ $linenumber == 0 ] && eval "array$i=()"
eval "array$i+=($number)"
(( i++ ))
done
(( linenumber++ ))
done <$1
IFS=$'\n'
# Sort each column numerically (i now holds the number of columns)
for j in $(seq 0 $((i - 1))); do
thisarray=array$j
eval array$j='($(sort -n <<<"${'$thisarray'[*]}"))'
done
# Print each array's 0'th entry, then 1, then 2, etc...
for k in $(seq 0 $((${#array0[@]} - 1))); do
for j in $(seq 0 $((i - 1))); do
eval 'printf "%s " "${array'$j'['$k']}"'
done
echo ""
done
This is not bash, but I think this Python code is worth a look, since it shows how the task can be done with built-in functions.
From the interpreter:
$ cat matrix.txt
1 1 1 1 1
9 3 4 5 5
6 7 8 9 7
3 6 8 9 1
3 4 2 1 4
6 4 4 7 7
$ python
Python 2.7.3 (default, Jun 19 2012, 17:11:17)
[GCC 4.4.3] on hp-ux11
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> f = open('./matrix.txt')
>>> for row in zip(*[sorted(list(a))
...         for a in zip(*[a.split() for a in f.readlines()])]):
...     print ' '.join(row)
...
1 1 1 1 1
3 3 2 1 1
3 4 4 5 4
6 4 4 7 5
6 6 8 9 7
9 7 8 9 7
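As a variation on the session above, here is a Python 3 sketch that converts the fields to integers first, so that multi-digit values sort numerically rather than as strings. The helper name `sort_columns` is an assumption for this example, and the matrix is the sample from the question.

```python
def sort_columns(rows):
    """Sort every column of a rectangular matrix independently."""
    columns = [sorted(col) for col in zip(*rows)]  # transpose, sort each column
    return [list(row) for row in zip(*columns)]    # transpose back to rows


text = """1 1 1 1 1
9 3 4 5 5
6 7 8 9 7
3 6 8 9 1
3 4 2 1 4
6 4 4 7 7"""

matrix = [[int(n) for n in line.split()] for line in text.splitlines()]
for row in sort_columns(matrix):
    print(" ".join(str(n) for n in row))
```

The double transpose is the same "spin the matrix" trick discussed in the question: once columns are rows, ordinary row sorting applies.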
Hello stackoverflow users!
In general, I would like to tune up a script I am using to make it less sensitive to missing data.
My example data looks like this (tab-delimited CSV file with headers):
ColA ColB ColC
6 0 0
3 5.16551 12.1099
1 10.2288 19.4769
6 20.0249 30.6543
3 30.0499 40.382
1 59.9363 53.2281
2 74.9415 57.1477
2 89.9462 61.3308
6 119.855 64.0319
4 0 0
8 5.06819 46.8086
6 10.0511 60.1357
9 20.0363 71.679
6 30.0228 82.1852
6 59.8738 98.4446
3 74.871 100.648
1 89.9973 102.111
6 119.866 104.148
3 0 0
1 5.07248 51.9168
2 9.92203 77.3546
2 19.9233 93.0228
6 29.9373 98.7797
6 59.8709 100.518
6 74.7751 100.056
3 89.9363 99.5933
1 119.872 100
I use an awk script found elsewhere, as follows:
awk 'BEGIN { fn=0 }
NR==1 { next }
NR==2 { delim=$2 }
$2 == delim {
f=sprintf("file_no%02d.txt",fn++);
print "Creating " f
}
{ print $0 > f }'
This gives me the output I want: it skips the 1st line, takes the value in the 2nd column of the 2nd line as the delimiter (in this example '0'), and starts a new file each time that value appears:
file_no00.txt
6 0 0
3 5.16551 12.1099
1 10.2288 19.4769
6 20.0249 30.6543
3 30.0499 40.382
1 59.9363 53.2281
2 74.9415 57.1477
2 89.9462 61.3308
6 119.855 64.0319
file_no01.txt
4 0 0
8 5.06819 46.8086
6 10.0511 60.1357
9 20.0363 71.679
6 30.0228 82.1852
6 59.8738 98.4446
3 74.871 100.648
1 89.9973 102.111
6 119.866 104.148
file_no02.txt
3 0 0
1 5.07248 51.9168
2 9.92203 77.3546
2 19.9233 93.0228
6 29.9373 98.7797
6 59.8709 100.518
6 74.7751 100.056
3 89.9363 99.5933
1 119.872 100
To make the script more robust (imagine that the rows with 0's are deleted), I would need to split the file based on the difference between consecutive rows in column 2: if value(row n+1) - value(row n) < 0, start a new file. Of course, I also need to keep the same file naming. The preferred way is bash with awk. Any advice? Thanks in advance!
Cheers!
Here is an awk command that you can use:
cat file
ColA ColB ColC
3 5.16551 12.1099
1 10.2288 19.4769
6 20.0249 30.6543
3 30.0499 40.382
1 59.9363 53.2281
2 74.9415 57.1477
2 89.9462 61.3308
6 119.855 64.0319
8 5.06819 46.8086
6 10.0511 60.1357
9 20.0363 71.679
6 30.0228 82.1852
6 59.8738 98.4446
3 74.871 100.648
1 89.9973 102.111
6 119.866 104.148
1 5.07248 51.9168
2 9.92203 77.3546
2 19.9233 93.0228
6 29.9373 98.7797
6 59.8709 100.518
6 74.7751 100.056
3 89.9363 99.5933
1 119.872 100
awk 'NR == 1 {
next
}
NR == 2 || $2 < p {   # first data row, or column 2 decreased
f = sprintf("file_no%02d.txt",fn++);
print "Creating " f
}
{
p = $2;
print $0 > f
}' file
I suggest small modifications to your current script:
awk 'BEGIN { fn=0; f=sprintf("file_no%02d.txt",fn++); print "Creating " f }
NR==1 { next }
NR==2 { delim=$2 }
$2 - delim < 0 {
f=sprintf("file_no%02d.txt",fn++);
print "Creating " f
}
{ print $0 > f; delim = $2 }' infile
First, create the first file name in BEGIN, before processing starts.
Second, in the last rule, save the current line's value so it can be compared with the next line's.
Third, instead of comparing with zero, subtract the previous value from the current one and check whether the result is less than zero.
It yields:
==> file_no00.txt <==
6 0 0
3 5.16551 12.1099
1 10.2288 19.4769
6 20.0249 30.6543
3 30.0499 40.382
1 59.9363 53.2281
2 74.9415 57.1477
2 89.9462 61.3308
6 119.855 64.0319
==> file_no01.txt <==
4 0 0
8 5.06819 46.8086
6 10.0511 60.1357
9 20.0363 71.679
6 30.0228 82.1852
6 59.8738 98.4446
3 74.871 100.648
1 89.9973 102.111
6 119.866 104.148
==> file_no02.txt <==
3 0 0
1 5.07248 51.9168
2 9.92203 77.3546
2 19.9233 93.0228
6 29.9373 98.7797
6 59.8709 100.518
6 74.7751 100.056
3 89.9363 99.5933
1 119.872 100
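The splitting rule itself (start a new chunk whenever column 2 goes down) can be sketched in Python as follows. This only groups the rows and builds the file names; it does not write any files. The function name `split_on_decrease` and the shortened sample rows are assumptions for the example.

```python
def split_on_decrease(rows, col=1):
    """Group rows into chunks, starting a new chunk when the column value drops."""
    chunks = []
    prev = None
    for row in rows:
        value = float(row.split()[col])
        if prev is None or value < prev:  # first row, or the sequence restarted
            chunks.append([])
        chunks[-1].append(row)
        prev = value
    return chunks


# Hypothetical shortened data (header already removed)
rows = ["3 5.16551 12.1099", "1 10.2288 19.4769",
        "8 5.06819 46.8086", "6 10.0511 60.1357",
        "3 0 0", "1 5.07248 51.9168"]
chunks = split_on_decrease(rows)
names = ["file_no%02d.txt" % n for n in range(len(chunks))]
for name, chunk in zip(names, chunks):
    print(name, len(chunk))
```

Tracking "no previous value yet" separately from the value itself avoids treating a legitimate 0 in column 2 as if nothing had been seen.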
I have a large dataset that looks like this:
5 6 5 6 3 5
2 5 3 7 1 6
4 8 1 8 6 9
1 5 2 9 4 5
For every line, I want to subtract the first field from the second, the third from the fourth, and so on, depending on the number of fields (always even). Then I want to report the lines for which the difference of every pair exceeds a certain limit (say 2). I should also be able to report the next best lines, i.e. lines in which exactly one pairwise comparison fails to meet the limit but all other pairs meet it.
From the above example, if I set the limit to 2, my output file should contain
best lines:
2 5 3 7 1 6 # because (5-2), (7-3), (6-1) are all > 2
4 8 1 8 6 9 # because (8-4), (8-1), (9-6) are all > 2
next best line(s)
1 5 2 9 4 5 # because except (5-4), both (5-1) and (9-2) are > 2
My current approach is to read every line, save each field as a variable, do subtraction.
But I don't know how to proceed further.
Thanks,
Prints "best" lines to the file "best", and prints "next best" lines to the file "nextbest"
awk '
{
fail_count=0
for (i=1; i<NF; i+=2){
if ( ($(i+1) - $i) <= threshold )
fail_count++
}
if (fail_count == 0)
print $0 > "best"
else if (fail_count == 1)
print $0 > "nextbest"
}
' threshold=2 inputfile
Pretty straightforward stuff.
Loop through fields 2 at a time.
If (next field - current field) does not exceed threshold, increment fail_count
If that line's fail_count is zero, that means it belongs to "best" lines.
Else if that line's fail_count is one, it belongs to "next best" lines.
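The steps above can be sketched in Python as a per-line classifier. The name `classify` and the returned labels are chosen to mirror the awk output file names; treat it as an illustration of the fail-count idea rather than a drop-in replacement.

```python
def classify(line, threshold=2):
    """Return "best", "nextbest", or None based on how many pairs fail."""
    nums = [int(n) for n in line.split()]
    pairs = zip(nums[::2], nums[1::2])  # (field1, field2), (field3, field4), ...
    fail_count = sum(1 for first, second in pairs if second - first <= threshold)
    return {0: "best", 1: "nextbest"}.get(fail_count)


data = ["5 6 5 6 3 5", "2 5 3 7 1 6", "4 8 1 8 6 9", "1 5 2 9 4 5"]
for line in data:
    print(line, "->", classify(line))
```

For the sample data this labels the second and third lines "best", the fourth "nextbest", and leaves the first unlabeled because all three of its pairs fail.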
Here's a bash-way to do it:
#!/bin/bash
threshold=$1
shift
file="$1"
a=($(cat "$file"))
b=$(( ${#a[@]} / $(wc -l < "$file") ))
for ((r=0; r<${#a[@]}/b; r++)); do
br=$((b*r))
for ((c=0; c<b; c+=2)); do
if (( ${a[br + c+1]} - ${a[br + c]} <= threshold )); then
break; fi
if (( c + 2 == b )); then
echo "${a[@]:$br:$b}"; fi
done
done
Usage:
$ ./script.sh 2 yourFile.txt
2 5 3 7 1 6
4 8 1 8 6 9
This output can then easily be redirected:
$ ./script.sh 2 yourFile.txt > output.txt
NOTE: this does not work properly if you have those empty lines between each line...But I'm sure the above will get you well on your way.
I probably wouldn't do that in bash. Personally, I'd do it in Python, which is generally good for those small quick-and-dirty scripts.
If you have your data in a text file, you can read here about how to get that data into Python as a list of lines. Then you can use a for-loop to process each line:
threshold = 2
results = []
for line in content:  # content: the list of lines read from the file
    numbers = [int(n) for n in line.split()]  # Split it into a list of numbers
    pairs = zip(numbers[::2], numbers[1::2])  # Pair up the numbers two and two.
    result = [y - x for (x, y) in pairs]  # Subtract the first number in each pair from the second.
    if result and all(d > threshold for d in result):  # Every pair must exceed the threshold
        results.append(numbers)
Yet another bash version:
First, a check function that returns nothing but a result code:
function getLimit() {
local pairs=0 count=0 limit=$1 wantdiff=$2
shift 2
while [ "$1" ] ;do
[ $(( $2-$1 )) -gt $limit ] && : $((count++))
: $((pairs++))
shift 2
done
test $((pairs-count)) -eq $wantdiff
}
Then:
while read line ;do getLimit 2 0 $line && echo $line;done <file
2 5 3 7 1 6
4 8 1 8 6 9
and
while read line ;do getLimit 2 1 $line && echo $line;done <file
1 5 2 9 4 5
If you can use awk
$ cat del1
5 6 5 6 3 5
2 5 3 7 1 6
4 8 1 8 6 9
1 5 2 9 4 5
1 5 2 9 4 5 3 9
$ cat del1 | awk '{
> printf "%s _ ",$0;
> for(i=1; i<=NF; i+=2){
> printf "%d ",($(i+1)-$i)};
> print NF
> }' | awk '{
> upper=0;
> for(i=1; i<=($NF/2); i++){
> if($(NF-i)>threshold) upper++
> };
> printf "%d _ %s\n", upper, $0}' threshold=2 | sort -nr
3 _ 4 8 1 8 6 9 _ 4 7 3 6
3 _ 2 5 3 7 1 6 _ 3 4 5 6
3 _ 1 5 2 9 4 5 3 9 _ 4 7 1 6 8
2 _ 1 5 2 9 4 5 _ 4 7 1 6
0 _ 5 6 5 6 3 5 _ 1 1 2 6
You can process result further according to your needs. The result is sorted by ‘goodness’ order.