jq label and break to break out of a control structure

I'm trying to learn jq label and break.
Here's my attempt:
https://jqplay.org/s/YHxn1dRlQO
Filter:
label $out | reduce .[] as $item (0; if $item==1 then break $out else .+$item end)
Input:
[3,2,1]
Output:
empty
I was expecting output 5.
I don't really know how label and break works.
Can you explain how to correctly use them with examples?

Given your input array, . (the accumulator inside reduce, which takes the values 0, 3 and 5) never equals 1, so .==1 will never evaluate to true. $item==3, $item==2 and $item==1, as well as .==0, .==3 and .==5, however, would.
And even when the test is true, it still won't output 5, because break is treated like an error that is caught at the label.
From the manual:
The break $label_name expression will cause the program to act as though the nearest (to the left) label $label_name produced empty.
Therefore
label $out | reduce .[] as $item (0;
if $item==1 then break $out else .+$item end
)
will correctly produce empty.
You can still catch the empty result, though, and fall back to an alternative, for instance:
label $out | reduce .[] as $item (0;
if $item == 1 then break $out else . + $item end
)? // (.[:index(1)] | add)
5

break does not work with reduce in the way one might reasonably expect (*). Whether that is by design or not, I cannot say, but perhaps it's not surprising, as reduce is intended for reductions.
Apart from reduce, though, break behaves properly elsewhere, notably with foreach.
(*) Consider for example:
label $out
| reduce range(0;10) as $i (0; .+$i | if $i == 4 then ., break $out else . end)
One might reasonably expect the accumulated sum (0+1+2+3+4) to be emitted, but instead the result is empty.
By contrast, if you replace reduce with foreach, the result is as expected:
0
1
3
6
10
Notice that to achieve the effect that is presumably wanted with reduce, one can simply use last in conjunction with foreach:
last(label $out
| foreach range(0;10) as $i (0; .+$i | if $i == 4 then ., break $out else . end))


Fibonacci & for loop: how are the commands executed step by step?

#!/bin/bash
a=0
b=1
echo "give a number:"
read n
clear
echo "the fibonacci sequence until $n:"
for (( i=0; i<n; i++ ))
do
    echo -n "$a "
    c=$((a + b))
    a=$b
    b=$c
done
If I understand it correctly, this code echoes the value of $a on every iteration (each i++), then shifts the variables as you can see, and repeats this on the next iteration until i reaches n.
Question: if on every iteration we want the value of the new c, why do we echo $a? I see the connection (a=$b, b=$c, and c=$((a + b))), but I don't understand why we refer to $a in the echo.
Is there a more elegant solution?
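Tracing the variables shows why `$a` is the value that gets echoed: at the top of each iteration `a` holds the current Fibonacci number and `b` the next one, while `c` is only a temporary that is already two terms ahead of `a`. The sketch below (the function name `fib_trace` is hypothetical) runs the same loop as the question but prints `a` and `b` at the start of every iteration:

```shell
#!/bin/bash
# Hypothetical helper: the loop from the question, printing the state
# of a and b at the start of every iteration instead of only $a.
fib_trace() {
    local -i n=$1 a=0 b=1 c i
    for (( i = 0; i < n; i++ )); do
        echo "i=$i a=$a b=$b"   # a is the i-th Fibonacci number
        c=$(( a + b ))          # c is two terms ahead of a
        a=$b                    # shift the window by one term
        b=$c
    done
}

fib_trace 5
```

For `n=5` this shows `a` taking the values 0, 1, 1, 2, 3, i.e. exactly the first five Fibonacci numbers. Echoing `$c` right after computing it would instead print `1 2 3 5 8`: the same sequence, but starting two terms later.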
You mean, “never ever calculate anything needlessly, ever”? It is possible, of course, but it depends on how much ugliness in the control logic you are willing to tolerate. In the example below, fibonacci1 calculates at most one extra element of the series that may not get printed out and fibonacci2 never calculates any extra series elements and everything makes it to the standard output.
Is any of that “elegant”? Probably not. This is actually a common problem most people encounter when coding (in languages other than purely functional ones): Most high(er)-level languages (unlike e.g. assemblers) provide predefined control structures that work great in typical and obvious cases (e.g. one control variable and one operation per iteration) but may become “suboptimal” in more complex scenarios.
A notoriously common example is a variable that stores a value from the previous iteration. Let’s assume you assign it at the very end of the loop. That works fine, but… Could you avoid the very last assignment (because it is useless), instead of leaving it to the compiler’s wisdom? Yes, you could, but then (e.g.) for ((init; condition; step)); do ...; ((previous = current)); done becomes (e.g.) for ((init;;)); do ...; ((step)); ((condition)) || break; ((previous = current)); done.
On one hand, a tiny bit of something (such as thin air) may have been “saved”. On the other hand, the code became assembler-like and harder to write, read and maintain.
To find a balance there^^^ and {not,} optimize when it {doesn’t,does} matter is a lifelong struggle. It may be something like CDO, which is like OCD, but sorted correctly.
fibonacci1() {
    local -ai fib=(0 1)
    local -i i
    # Print two elements per iteration while more than two remain.
    for ((i = $1; i > 2; i -= 2)) {
        printf '%d %d ' "${fib[@]}"
        fib=($((fib[0] + fib[1])) $((fib[0] + 2 * fib[1])))
    }
    # Print the remaining i elements (0, 1 or 2).
    echo "${fib[@]::i}"
}
fibonacci2() {
    # Print a newline when the function returns, then remove this trap.
    trap 'trap - return; echo' return
    local -i a=0 b=1 i="$1"
    ((i)) || return 0
    printf '%d' "$a"
    ((--i)) || return 0
    printf ' %d' "$b"
    for ((;;)); do
        ((--i)) || return 0
        printf ' %d' "$((a += b))"
        ((--i)) || return 0
        printf ' %d' "$((b += a))"
    done
}
for ((i = 0; i <= 30; ++i)); do
    for fibonacci in fibonacci{1,2}; do
        echo -n "${fibonacci}(${i}): "
        "$fibonacci" "$i"
    done
done
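As a concrete, made-up instance of the loop transformation described in this answer, the two functions below (names hypothetical) print the differences between consecutive squares 1..n. `diffs_plain` uses the usual `for ((init; condition; step))` form and reassigns `previous` on every pass, including the useless final one; `diffs_restructured` follows the `for ((init;;)); do ...; ((step)); ((condition)) || break; ((previous = current)); done` pattern so that the last assignment is skipped:

```shell
#!/bin/bash
# Hypothetical example: differences between consecutive squares 1..n.

# Usual form: previous is reassigned on every pass, including the last one.
diffs_plain() {
    local -i n=$1 i previous=0 current
    for ((i = 1; i <= n; i++)); do
        ((current = i * i))
        printf '%d ' "$((current - previous))"
        ((previous = current))          # useless on the final pass
    done
    echo
}

# Restructured form: step and condition moved into the body.
diffs_restructured() {
    local -i n=$1 i previous=0 current
    ((n >= 1)) || { echo; return; }     # this form always runs the body once
    for ((i = 1;;)); do
        ((current = i * i))
        printf '%d ' "$((current - previous))"
        ((++i))                         # step
        ((i <= n)) || break             # condition, tested before the assignment
        ((previous = current))          # skipped on the final pass
    done
    echo
}
```

Both produce `1 3 5 7 ` for `n=4`. The second needs an extra guard for `n < 1`, because moving the condition into the body means the body always runs at least once; that guard is exactly the kind of control-logic ugliness the answer warns about.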

TCSH - most compact syntax for checking if a value is in an array/list?

I'm trying to find out whether there's a more compact way to write tcsh code of the following form, ideally still in tcsh. I spent a while searching with different keywords but couldn't find anything helpful.
Essentially, the snippet loops through a number sequence and, if the current value is within a set of numbers, assigns a variable. For simplicity I've kept the lists of numbers that $VAR is compared against relatively short, and the number of comparisons small, but the actual problem is double the size in both respects.
foreach VAR (`seq 1 24`)
    if ($VAR == 1 || $VAR == 2 || $VAR == 3 || $VAR == 4) then
        set cat = small
    else if ($VAR == 5 || $VAR == 6 || $VAR == 7 || $VAR == 8) then
        set cat = medium
    else
        set cat = large
    endif
end
I suppose I was thinking more along the lines of python where you can just say "if x in [...]" etc., rather than needing to compare $VAR to every number individually as is the case above. I'd considered the following type of setup, but one ends up with more lines overall.
foreach VAR (`seq 1 24`)
    foreach C (1 2 3 4)
        if ($C == $VAR) then
            set cat = small
        endif
    end
    ...
end
If the provided code is as simple as it gets in tcsh, is there a more succinct way in, say, bash? Thanks for any tips.
I'm afraid that in tcsh you won't save any lines, though you can replace the chains of equality tests with range comparisons:
#!/bin/tcsh
foreach VAR (`seq 1 24`)
    if (${VAR} < 5) then
        set cat = "small"
    else if (${VAR} > 8) then
        set cat = "large"
    else
        set cat = "medium"
    endif
end
exit(0)
You could also hand the logic to awk (I don't know whether that helps in your case):
#!/bin/tcsh
foreach VAR (`seq 1 24`)
    set cat = (`echo ${VAR} | awk '$0<5 {print("small"); exit;} $0>8 {print("large"); exit;} {print("medium");}'`)
end
exit (0)
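The question also asks about bash. There, a `case` statement gives something close to Python's `if x in [...]`, because each pattern can list alternatives separated by `|`. A minimal sketch mirroring the tcsh loop, with a hypothetical helper name `classify`:

```shell
#!/bin/bash
# Hypothetical helper: each "a|b|c)" pattern acts like a membership test.
classify() {
    case $1 in
        1|2|3|4) echo small ;;
        5|6|7|8) echo medium ;;
        *)       echo large ;;
    esac
}

for var in {1..24}; do
    cat=$(classify "$var")
    echo "$var $cat"
done
```

Since the patterns are matched as strings, a non-contiguous list such as `3|17|21)` works the same way, and a range of single digits can also be written as `[1-4])`.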

Using nested loops in bash to process huge datasets

I am currently working on big datasets (typically 10 GB each) that prevent me from using R (RStudio) and working with data frames as I used to.
To cope with a limited amount of memory (and CPU power), I've tried Julia and Bash (shell script) to process these files.
My question is the following: I've concatenated my files (roughly 1 million individual files merged into one big file), and I would like to process the big file as follows. Let's say I have something like:
id,latitude,longitude,value
18,1,2,100
18,1,2,200
23,3,5,132
23,3,5,144
23,3,5,150
I would like to process my file so that for id = 18 it computes the max (200), the min (100) or some other properties, then goes to the next id and does the same. I guess some sort of nested loop in bash would work, but I'm having trouble doing it elegantly, and the answers I've found on the Internet so far haven't really helped. I cannot process it in Julia because it's too big/heavy, which is why I'm mostly looking for answers in bash.
I concatenated the files because I thought processing one huge file would be faster than repeatedly opening a file, computing, closing it and moving on to the next one. I'm not sure at all, though!
Finally, which would be better to use: Julia or Bash? Or something else?
Thank you!
Julia or Bash?
If you are talking about using plain bash, and not some commands that could be executed from any other shell, then the answer is obviously Julia. Plain bash is orders of magnitude slower than Julia.
However, I would recommend using an existing tool instead of writing your own.
GNU datamash could be what you need. You can call it from bash or any other shell.
for id = 18, compute the max (200), the min (100) [...] then go to next id and do the same
With datamash you could use the following bash command
< input.csv datamash -Ht, -g 1 min 4 max 4
Which would print
GroupBy(id),min(value),max(value)
18,100,200
23,132,150
Loops in bash are slow; I think Julia is a much better fit in this case. Here is what I would do:
1. (Ideally) convert your data into a binary format, like NetCDF or HDF5.
2. Load a chunk of data (e.g. 100 000 rows, not all of it, unless all the data fits into RAM) and compute the min/max per id as you propose.
3. Go to the next chunk and update the min/max for every id.
Do not load all the data into memory at once if you can avoid it. For simple statistics like the minimum, maximum, sum, mean and standard deviation, this can be done.
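For min and max, the per-chunk update only needs a running minimum and maximum per id, so rows can be folded in as they are read. The sketch below shows that update logic in bash (4 or later, for associative arrays) rather than Julia, purely to illustrate the idea; as this answer says, a bash loop like this will be slow on 10 GB files. It assumes integer values (bash arithmetic is integer-only) and the column layout from the question; the function name `minmax_by_id` is hypothetical.

```shell
#!/bin/bash
# Hypothetical sketch: running min/max of the "value" column per id.
# Reads CSV (id,latitude,longitude,value) with a header line on stdin.
# Assumes integer values; needs bash 4+ for associative arrays.
minmax_by_id() {
    local -A min max
    local -a order=()            # ids in first-seen order
    local id lat lon value
    read -r _                    # skip the header line
    while IFS=, read -r id lat lon value; do
        if [[ -z ${min[$id]+set} ]]; then
            # First row for this id: it is both the min and the max so far.
            min[$id]=$value
            max[$id]=$value
            order+=("$id")
        else
            # Fold this row into the running statistics.
            (( value < min[$id] )) && min[$id]=$value
            (( value > max[$id] )) && max[$id]=$value
        fi
    done
    for id in "${order[@]}"; do
        echo "$id,${min[$id]},${max[$id]}"
    done
}
```

Usage would be `minmax_by_id < input.csv`. Because the state for every id is kept in the two arrays, this version does not need rows with the same id to be adjacent.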
In my opinion, the memory overhead of Julia (versus bash) is probably quite small given the size of the problem.
Be sure to read the performance tips in the Julia manual, in particular: put hot loops inside functions, not in global scope.
https://docs.julialang.org/en/v1/manual/performance-tips/index.html
Alternatively, such operations can also be done with specific queries in a SQL database.
Bash is definitely not the best option. (Fortran, baby!)
Anyway, the following can be translated to any language you want.
#!/bin/bash
function postprocess(){
    # Do whatever statistics you want on the arrays.
    echo "id: $last_id"
    echo "lats: ${lat[@]}"
    echo "lons: ${lon[@]}"
    echo "vals: ${val[@]}"
}
# Set dummy start variable
last_id="not a valid id"
count=0
while IFS= read -r line; do
    id=$( echo $line | cut -d, -f1 )
    # Ignore first line
    [ "$id" == "id" ] && continue
    # If this is a new id, post-process the old one.
    # (The dummy last_id makes [ fail on the first row; 2> /dev/null hides that error.)
    if [ $id -ne $last_id -a $count -ne 0 ] 2> /dev/null; then
        # Do post processing of data
        postprocess
        # Reset counter
        count=0
        # Reset value arrays
        unset lat
        unset lon
        unset val
    fi
    # Increment counter
    (( count++ ))
    # Set last_id
    last_id=$id
    # Get values into arrays
    lat+=($( echo $line | cut -d, -f2 ))
    lon+=($( echo $line | cut -d, -f3 ))
    val+=($( echo $line | cut -d, -f4 ))
done < test.txt
[ $count -gt 0 ] && postprocess
For this kind of problem, I'd be wary of using bash, because it isn't suited to line-by-line processing. And awk is too line-oriented for this kind of job, making the code complicated.
Something like this in perl might do the job, with a loop of loops grouping lines together by their id field.
IT070137 ~/tmp $ cat foo.pl
#!/usr/bin/perl -w
use strict;
my ($id, $latitude, $longitude, $value) = read_data();
while (defined($id)) {
    my $group_id = $id;
    my $min = $value;
    my $max = $value;
    ($id, $latitude, $longitude, $value) = read_data();
    while (defined($id) && $id eq $group_id) {
        if ($value < $min) {
            $min = $value;
        }
        if ($value > $max) {
            $max = $value;
        }
        ($id, $latitude, $longitude, $value) = read_data();
    }
    print $group_id, " ", $min, " ", $max, "\n";
}
sub read_data {
    my $line = <>;
    if (!defined($line)) {
        return (undef, undef, undef, undef);
    }
    chomp($line);
    my ($id, $latitude, $longitude, $value) = split(/,/, $line);
    return ($id, $latitude, $longitude, $value);
}
IT070137 ~/tmp $ cat foo.txt
id,latitude,longitude,value
18,1,2,100
18,1,2,200
23,3,5,132
23,3,5,144
23,3,5,150
IT070137 ~/tmp $ perl -w foo.pl foo.txt
id value value
18 100 200
23 132 150
Or if you prefer Python:
#!/usr/bin/env python3
import fileinput
def main():
    data = fileinput.input()
    (id, latitude, longitude, value) = read(data)
    while id:
        group_id = id
        min_value = value
        (id, latitude, longitude, value) = read(data)
        while id and group_id == id:
            # Compare numerically: the fields are strings, and "100" < "99" as strings.
            if float(value) < float(min_value):
                min_value = value
            (id, latitude, longitude, value) = read(data)
        print(group_id, min_value)
def read(data):
    line = data.readline()
    if line == '':
        return (None, None, None, None)
    line = line.rstrip()
    (id, latitude, longitude, value) = line.split(',')
    return (id, latitude, longitude, value)
main()

How to pass a variable modified with perl in find | xargs to the bash variables?

COUNT=0
export COUNT
find "$SRCROOT" \( -name "*.h" -or -name "*.m" \) -print0 | xargs -0 perl -ne 'if (/$ENV{KEYWORDS}/){$COUNT++; print "Iteration number = $COUNT\n"}'
echo "Total Count= " $COUNT
This gives output as
Iteration number = 0
Iteration number = 1
...
Iteration number = 25
Total Count= 0
The expected total count is 25 (or some number depending on the result set), but it always shows 0: the value of COUNT modified inside the loop is not visible afterwards.
How can I get the correct value of COUNT?
It is not possible for a child process (in this case perl) to change the environment of the parent process (your shell). What you can do though, is capture the output of the child process. Assuming the rest of the code is working as you expect, you can do this:
# replace ...'s with code from your sample
COUNT=$(find ... | xargs cat | perl -ne'if (...) { $COUNT++ } END { print $COUNT }' )
Notice that $(..) is used to capture the output of a command. Also notice that the only string printed is the $COUNT value in the end.
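The same behaviour can be reproduced without find or perl. In this sketch, `bash -c` plays the role of the child process, and `grep -c` stands in for the perl one-liner that prints a count at the end:

```shell
#!/bin/bash
# A child process receives a copy of the exported environment;
# any change it makes to that copy is lost when it exits.
COUNT=0
export COUNT
bash -c 'COUNT=25'           # the child changes only its own copy
echo "after child: $COUNT"   # the parent still sees 0

# Instead, let the child print the result and capture it with $(...).
COUNT=$(printf 'foo\nbar\nfoo\n' | grep -c foo)
echo "captured: $COUNT"
```

The first `echo` prints `after child: 0` and the second prints `captured: 2`.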

How can I shift digits in bash?

I have a homework assignment that asks me to shift a decimal number by a specified number of digits. More precisely, this bash script takes two input arguments: the first is the number (maximum 9 digits) to shift, and the second is the number of digits (-9 to 9) to shift by. Another requirement is that when a digit is shifted off one end, it must be attached to the other end of the number. One headache of a requirement is that we cannot use control statements of any kind: no loops, no if statements, and no case statements.
Example: 12345 3 should come out to 345000012 and 12345 -3 should be 12345000
I know that if I take 12345 mod 10^3 I get 345, and if I divide 12345 by 10^3 I get 12; then I can just concatenate those two values to get 34512. I'm not sure that is exactly right, but it is the closest I've got so far. As for the -3 shift, I know that 10^-3 is .001 and would work, but when I try using 10^-3 in bash I get an error.
I am just lost at this point, any tips would be greatly appreciated.
EDIT: After several hours of bashing (pun intended) my head against this problem, I finally came up with a script that for the most part works. I would post the code right now but I fear another student hopelessly lost might stumble upon it. I will check back and post what I came up with in a week or two. I was able to do it with mods and division. Thank you all for the responses, it really helped me to open up and think about the problem from different angles.
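The mod-and-divide idea from the question does work once the number is treated as a fixed 9-digit field: the k low digits obtained with `% 10**k` have to be multiplied by `10**(9 - k)` to land at the top, rather than being concatenated as text (which gives 34512 instead of 345000012). Negative shifts need no `10^-3` either, since shifting left by 3 is the same as shifting right by 9 - 3 = 6. A minimal sketch using only arithmetic expansion (no loops, no if, no case); the name `rotate_digits` is hypothetical, and leading zeros are dropped, which matches the two examples in the question:

```shell
#!/bin/bash
# Hypothetical rotate_digits NUMBER SHIFT: rotate a number of at most
# 9 digits right by SHIFT places (negative SHIFT rotates left),
# treating it as 9 digits wide. Arithmetic only: no loops, if or case.
rotate_digits() {
    local -i n=$(( 10#$1 ))              # force base 10 even with leading zeros
    local -i k=$(( ($2 % 9 + 9) % 9 ))   # map -9..9 onto a right shift of 0..8
    # The k low digits move to the top; the other 9-k digits move down.
    echo $(( (n % 10**k) * 10**(9 - k) + n / 10**k ))
}

rotate_digits 12345 3
rotate_digits 12345 -3
```

The two calls print `345000012` and `12345000`.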
Here's a hint:
echo ${string:0:3}
echo ${#string}
Edit (2011-02-11):
Here's my solution. I added some additional parameters with defaults.
rotate-string ()
{
    local s=${1:-1} p=${2:--1} w=${3:-8} c=${4:-0} r l
    printf -vr '%0*d' $w 0 # save $w zeros in $r
    r=${r//0/$c}$s # change the zeros to the character in $c, append the string
    r=${r: -w} # save the last $w characters of $r
    l=${r: -p%w} # get the last part of $r ($p mod $w characters)
    echo "$l${r::w-${#l}}" # output the Last part on the Left and the Right part which starts at the beginning and goes for ($w minus the_length_of_the_Left_part) characters
}
usage: rotate-string string positions-to-rotate width fill-character
example: rotate-string abc -4 9 =
result: ==abc====
Arguments can be omitted starting from the end and these defaults will be used:
fill-character: "0"
width: 8
positions-to-rotate: -1
string: "1"
More examples:
$ rotate-string
00000010
$ rotate-string 123 4
01230000
Fun stuff:
$ for i in {126..6}; do printf '%s\r' "$(rotate-string Dennis $i 20 .)"; sleep .05; done; printf '\n'
$ while true; do for i in {10..1} {1..10}; do printf '%s\r' "$(rotate-string : $i 10 .)"; sleep .1; done; done
$ while true; do for i in {40..2} {2..40}; do printf '%s\r' "$(rotate-string '/\' $i 40 '_')"; sleep .02; done; done
$ d=0; while true; do for i in {1..10} {10..1}; do printf '%s\r' "$(rotate-string $d $i 10 '_')"; sleep .02; done; ((d=++d%10)); done
$ d=0; while true; do for i in {1..10}; do printf '%s\r' "$(rotate-string $d $i 10 '_')"; sleep .2; ((d=++d%10)); done; done
$ shape='▁▂▃▄▅▆▇█▇▆▅▄▃▂▁'; while true; do for ((i=1; i<=COLUMNS; i++)); do printf '%s\r' "$(rotate-string "$shape" $i $COLUMNS ' ')"; done; done
In the absence of control structures, you need to use recursion, with index values as "choice selections", which is how functional programming often works.
#!/bin/bash
#
# cshift NUMBER N
cshift() {
    let num=10#$1
    num=`printf '%09d' $num`
    lshift="${num:1:8}${num:0:1}"
    rshift="${num:8:1}${num:0:8}"
    next=( "cshift $lshift $(($2 + 1))" "echo $num" "cshift $rshift $(( $2 - 1 ))" )
    x=$(( $2 == 0 ? 1 : $2 < 0 ? 0 : 2 ))
    eval "${next[x]}"
}
cshift $1 $2
and, the testing:
$ for ((i=-9;i<=9;i++)); do cshift 12345 $i ; done
000012345
500001234
450000123
345000012
234500001
123450000
012345000
001234500
000123450
000012345
500001234
450000123
345000012
234500001
123450000
012345000
001234500
000123450
000012345
You can also do some math on the indexes and avoid the recursion, but I don't mind making the computer work harder so I don't have to. It's easy to think of how to do the shift by one in either direction, and then I use an evaluated choice that is selected by the signum of the shift value, outputting a value and stopping when the shift value is zero.
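Doing the math on the indexes, as mentioned, removes the recursion: pad the number to 9 digits once, reduce the shift to a right rotation by 0..8 places, and splice two substrings. A minimal sketch (the name `cshift2` is hypothetical) that should print the same 9-digit values as `cshift`:

```shell
#!/bin/bash
# Hypothetical non-recursive variant of cshift, same 9-digit output format.
cshift2() {
    local num
    printf -v num '%09d' "$((10#$1))"    # zero-pad to 9 digits
    local -i k=$(( ($2 % 9 + 9) % 9 ))   # right rotation by 0..8 places
    # The last k characters go to the front, followed by the first 9-k.
    echo "${num:9-k}${num:0:9-k}"
}

cshift2 12345 3
cshift2 12345 -3
```

The two calls print `345000012` and `012345000`, matching the corresponding lines of the `cshift` test run above.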
