I have a tab-separated file with two columns, like this:
5 6 14 22 23 25 27 84 85 88 89 94 95 98 100 6 94
6 8 17 20 193 205 209 284 294 295 299 304 305 307 406 205 284 307 406
2 10 13 40 47 58 2 13 40 87
and the desired output would look like this:
5 6 14 22 23 25 27 84 85 88 89 94 95 98 100 14 27
6 8 17 20 193 205 209 284 294 295 299 304 305 307 406 6 209 299 305
2 10 13 23 40 47 58 87 10 23 40 58
I would like to replace the numbers in the 2nd column with random numbers drawn from the 1st column, keeping the same count of numbers in the 2nd column. For example, if the 2nd column of a row has four numbers, the output for that row must have four random numbers from its 1st column, and so on.
I tried creating two arrays in AWK with split and replacing every number in the 2nd column with numbers from the 1st column, but I could not do it randomly. I have seen the rand() function, but I don't know how to combine the two in a script. Is this possible in a Bash environment, or is there a better way to do it? Thanks in advance.
awk to the rescue!
$ awk -F'\t' 'function shuf(a,n)
{for(i=1;i<n;i++)
{j=i+int(rand()*(n+1-i));
t=a[i]; a[i]=a[j]; a[j]=t}}
function join(a,n,x,s)
{for(i=1;i<=n;i++) {x=x s a[i]; s=" "}
return x}
BEGIN{srand()}
{an=split($1,a," ");
shuf(a,an);
bn=split($2,b," ");
delete m; delete c; j=0;
for(i=1;i<=bn;i++) m[b[i]];
# pull elements from a up to the required sample size,
# not intersecting with the previous sample set
for(i=1;i<=an && j<bn;i++) if(!(a[i] in m)) c[++j]=a[i];
cn=asort(c);
print $1 FS join(c,cn)}' file
5 6 14 22 23 25 27 84 85 88 89 94 95 98 100 85 94
6 8 17 20 193 205 209 284 294 295 299 304 305 307 406 20 205 294 295
2 10 13 23 40 47 58 87 10 13 47 87
This shuffles the first-column array (the standard Fisher-Yates algorithm) and then takes the required number of elements from it, with the additional requirement that they do not intersect the existing sample set. The helper map m holds the existing sample set and is used for the in tests. Note that asort() is a GNU awk extension. The rest should be easy to read.
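For comparison, the same idea can be sketched in Python. This is not a translation of the awk script, only an illustration of "sample without replacement, excluding the current second column". The function name resample_row and the optional rng argument are made up for this example, and the row is assumed to be "col1<TAB>col2" with space-separated numbers in each column.

```python
import random

def resample_row(line, rng=random):
    """Replace column 2 with a random sample drawn from column 1.

    Hypothetical helper: the sample has as many items as the old
    column 2 and avoids the values that were already in it.
    """
    col1, col2 = line.rstrip("\n").split("\t")
    old = col2.split()
    # Candidates are the column-1 values not in the old sample set.
    candidates = [v for v in col1.split() if v not in set(old)]
    if len(old) > len(candidates):
        raise ValueError("not enough distinct candidates in column 1")
    # random.sample draws without replacement, like taking the first
    # few elements of a shuffled array.
    picked = sorted(rng.sample(candidates, len(old)), key=int)
    return col1 + "\t" + " ".join(picked)

row = "5 6 14 22 23 25 27 84 85 88 89 94 95 98 100\t6 94"
print(resample_row(row))
```

Dropping the candidates filter gives plain sampling without replacement from the whole first column.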
Assuming that there is a tab delimiting the two columns, and each column is a space delimited list:
awk 'BEGIN{srand()}
{n=split($1,a," ");
m=split($2,b," ");
printf "%s\t",$1;
for (i=1;i<=m;i++)
printf "%d%c", a[int(rand() * n) +1], (i == m) ? "\n" : " "
}' FS=\\t input
Try this:
# This can be an external file of course
# Note: COL1 and COL2 are separated by a hard TAB
cat <<EOF > d1.txt
5 6 14 22 23 25 27 84 85 88 89 94 95 98 100 6 94
6 8 17 20 193 205 209 284 294 295 299 304 305 307 406 205 284 307 406
2 10 13 40 47 58 2 13 40 87
EOF
# Loop over each line, converting the TAB to ':' first (IFS could have been used instead)
tr '\t' ':' < d1.txt | while read -r LINE
do
# Get the 1st column data
COL1=$( echo ${LINE} | cut -d':' -f1 )
# Get col1 number of items
NUM_COL1=$( echo ${COL1} | wc -w )
# Get col2 number of items
NUM_COL2=$( echo ${LINE} | cut -d':' -f2 | wc -w )
# Now split col1 items into an array
read -r -a COL1_NUMS <<< "${COL1}"
COL2=" "
# This loop runs once for each COL2 item
COUNT=0
while [ ${COUNT} -lt ${NUM_COL2} ]
do
# Generate a random number to use as the random index for COL1
COL1_IDX=${RANDOM}
let "COL1_IDX %= ${NUM_COL1}"
NEW_NUM=${COL1_NUMS[${COL1_IDX}]}
# Check for a duplicate (-w matches whole words, so 6 does not match 16)
DUP_FOUND=$( echo "${COL2}" | grep -w -- "${NEW_NUM}" )
if [ -z "${DUP_FOUND}" ]
then
# Not a duplicate, increment loop counter and do next one
let "COUNT = COUNT + 1 "
# Add the random COL1 item to COL2
COL2="${COL2} ${COL1_NUMS[${COL1_IDX}]}"
fi
done
# Sort COL2
COL2=$( echo ${COL2} | tr ' ' '\012' | sort -n | tr '\012' ' ' )
# Print
echo ${COL1} :: ${COL2}
done
Output:
5 6 14 22 23 25 27 84 85 88 89 94 95 98 100 :: 88 95
6 8 17 20 193 205 209 284 294 295 299 304 305 307 406 :: 20 299 304 305
2 10 13 40 47 58 :: 2 10 40 58
I have been working on a Nurse scheduling problem in AMPL for the following conditions:
Total number of nurses: 20
Total number of shifts: 3 (morning, day, night)
Planning horizon: 7 days, say M T W R F Sa Su
Along with following constraints:
Max no. of working days in a week: 5
A rest day after 4 consecutive night shifts.
Consecutive night and morning shifts are not allowed.
Demand per shift is 7 nurses.
A nurse can work only one shift per day, i.e. morning, day, or night.
Cost scenarios:
Morning shift: $12
Day shift: $13
Night shift : $15
Objective function is to minimize the cost of operation as per Nurse preferences.
Can anyone give me an idea of how this problem can be formulated ?
First, a few unusual things in your problem definition:
This is not a real optimization problem, since the objective value is fixed by definition (every shift has exactly 7 nurses, and every nurse costs the same for a given shift).
You defined 7 nurses per shift with a maximum of 5 working days. So you need 7 nurses on three shifts on seven days, which equals 147 nurse-shifts. But with the cap of five working days and only one shift per day, you only have 20 nurses with 5 shifts each, which equals 100 nurse-shifts.
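The counting argument can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are illustrative and not part of any model.

```python
# Feasibility arithmetic for the nurse scheduling question.
nurses, days, shifts = 20, 7, 3
demand_per_shift = 7
max_days_per_nurse = 5  # one shift per day, so also the max shifts per nurse

required = demand_per_shift * shifts * days       # nurse-shifts needed
capacity = nurses * max_days_per_nurse            # nurse-shifts available
max_uniform_demand = capacity // (shifts * days)  # largest equal demand per shift

print(required, capacity, max_uniform_demand)     # prints: 147 100 4
```

With 20 nurses, the largest demand that can be met uniformly on all 21 shifts is therefore 4 nurses per shift.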
I've built the problem in MathProg, but the code should be more or less the same in AMPL. I started with three sets for the nurses, days and shifts.
set shifts := {1,2,3};
set days := {1,2,3,4,5,6,7};
set nurses := {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20};
The schedule is defined as a set of binary variables:
var schedule{nurses, days, shifts}, binary;
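The simple objective is the sum of all nurse-shifts in the week, weighted by the related prices (declared here as parameters, with the values from the question):
param c_morning := 12; param c_day := 13; param c_night := 15;
minimize cost: sum{i in nurses, j in days}(schedule[i,j,1]*c_morning+schedule[i,j,2]*c_day+schedule[i,j,3]*c_night);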
For your first constraint, the sum of all shifts per nurse can be limited to five, since only one shift per day is possible:
s.t. working_days{n in nurses}:
sum{i in days, j in shifts}(schedule[n,i,j]) <= 5;
The rest day is the hardest part of the problem. For simplicity I've created another set which contains only the days on which a nurse could have completed four night shifts in a row. You could also formulate the constraint with the original set of days and exclude the first four days.
set nigth_days := {5,6,7};
s.t. rest{n in nurses,i in nigth_days}:
(schedule[n,i-4,3]+schedule[n,i-3,3]+schedule[n,i-2,3]+schedule[n,i-1,3]+sum{j in shifts}(schedule[n,i,j])) <= 4;
To prevent a morning shift right after a night shift, I used the same approach as for the rest days. The seventh day is excluded, since there is no eighth day on which to look for a morning shift.
set yester_days := {1,2,3,4,5,6};
s.t. night_morning{i in yester_days, n in nurses}:
(schedule[n,i,3]+schedule[n,i+1,1]) <= 1;
The demand of four nurses per shift must be met (I've reduced the number, since more than 4 nurses per shift is infeasible due to the 5-shift limit):
s.t. demand_shift{i in days, j in shifts}:
sum{n in nurses}(schedule[n,i,j]) = 4;
The fifth constraint is to limit the shifts per day to a max of one.
s.t. one_shift{n in nurses, i in days}:
sum{ j in shifts}(schedule[n,i,j]) <= 1;
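To sanity-check a candidate schedule against these five constraints outside the solver, a small checker can help. The following Python sketch is not a solver and is not part of the MathProg model; the function name violations and the representation of a schedule as a set of (nurse, day, shift) triples are assumptions made for illustration. The constraint names mirror the ones used in the model.

```python
def violations(schedule, nurses=range(1, 21), days=range(1, 8),
               shifts=(1, 2, 3), demand=4):
    """Return the names of the constraints that `schedule` violates.

    `schedule` is a set of (nurse, day, shift) triples that are worked;
    shift 1 = morning, 2 = day, 3 = night.
    """
    def x(n, d, s):
        return 1 if (n, d, s) in schedule else 0

    bad = set()
    for n in nurses:
        # At most five shifts per week.
        if sum(x(n, d, s) for d in days for s in shifts) > 5:
            bad.add("working_days")
        # At most one shift per day.
        for d in days:
            if sum(x(n, d, s) for s in shifts) > 1:
                bad.add("one_shift")
        # Four nights in a row must be followed by a free day.
        for d in (5, 6, 7):
            nights = sum(x(n, d - k, 3) for k in (1, 2, 3, 4))
            if nights + sum(x(n, d, s) for s in shifts) > 4:
                bad.add("rest")
        # No morning shift directly after a night shift.
        for d in range(1, 7):
            if x(n, d, 3) + x(n, d + 1, 1) > 1:
                bad.add("night_morning")
    # Every shift on every day needs exactly `demand` nurses.
    for d in days:
        for s in shifts:
            if sum(x(n, d, s) for n in nurses) != demand:
                bad.add("demand_shift")
    return bad

print(sorted(violations({(1, 1, 3), (1, 2, 1)})))
# prints: ['demand_shift', 'night_morning']
```

An empty result means the schedule satisfies all five constraints.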
set nurse; #no. of full time employees working in the facility
set days; #planning horizon
set shift; #no. of shift in a day
set S; #shift corresponding to the outsourced nurses
set D;#day corresponding to the outsourced nurses
set N;#
# ith nurse working on day j
# j starts from Monday (j=1), Tuesday( j=2), Wednesday (j=3), Thursday(j=4), Friday(j=5), Saturday(j=6), Sunday(j=7)
#s be the shift as morning, day and night
param availability{i in nurse, j in days};
param costpershift{i in nurse, j in days, s in shift};
param outcost{n in N, l in D, m in S};
var nurseavailability{i in nurse,j in days,s in shift} binary; # = 1 if nurse i is available on jth day working on sth shift, 0 otherwise
var outsourced{n in N, l in D, m in S} integer;
#Objective function
minimize Cost: sum{i in nurse, j in days, s in shift} costpershift[i,j,s]*nurseavailability[i,j,s]+ sum{ n in N, l in D, m in S}outcost[n,l,m]*outsourced[n,l,m];
#constraints
#maximum no. of shifts per day
subject to maximum_shifts_perday {i in nurse,j in days}:
sum{s in shift} nurseavailability[i,j,s]*availability[i,j] <= 1;
#maximum no. of working days a week
subject to maximum_days_of_work {i in nurse}:
sum{j in days,s in shift} availability[i,j]*nurseavailability[i,j,s]<=5; #maximum working days irrespective of shifts
# rest days after night shifts
subject to rest_days_after_night_shift{i in nurse}:
sum{j in days} availability[i,j]*nurseavailability[i,j,3]<=4;
#demand per shift
subject to supply{j in days, s in shift}:
sum{i in nurse} availability[i,j]*nurseavailability[i,j,s] + sum{n in N} outsourced[n,j,s]=7; #assumes D and S hold the same elements as days and shift
#outsourcing only works well when there is more variability in supply.
#increasing the staff no. would be effective for reducing the cost variability in demand.
#considering a budget of $16,000 per week
#outsourcing constraints: a maximum of 20 nurses can be outsourced per shift
# no. of fulltime employees=30
#demand is 7 nurses per shift
#the average variability
#all nurses are paid equally # $12 per hour.
#cost of an outsourced shift is $144.
#cost of morning shift is $96.
#cost of day shift is $104.
#cost of night shift is $120.
data;
#set nurse ordered:= nurse1 nurse2 nurse3 nurse4 nurse5 nurse6 nurse7 nurse8
#nurse9 nurse10 nurse11 nurse12 nurse13 nurse14 nurse15 nurse16 nurse17
#nurse18 nurse19 nurse20;
set nurse:= 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30;
#set days ordered:= Monday Tuesday Wednesday Thursday Friday Saturday Sunday;
set days:= 1 2 3 4 5 6 7;
#set shift ordered:= Morning Day Night;
set shift:= 1 2 3;
set D:= 1 2 3 4 5 6 7; #outsourced days
set S:=1 2 3; #outsourced shifts
set N := 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20;
param outcost
[*,*,1]:
1 2 3 4 5 6 7:=
1 144 144 144 144 144 144 144
2 144 144 144 144 144 144 144
3 144 144 144 144 144 144 144
4 144 144 144 144 144 144 144
5 144 144 144 144 144 144 144
6 144 144 144 144 144 144 144
7 144 144 144 144 144 144 144
8 144 144 144 144 144 144 144
9 144 144 144 144 144 144 144
10 144 144 144 144 144 144 144
11 144 144 144 144 144 144 144
12 144 144 144 144 144 144 144
13 144 144 144 144 144 144 144
14 144 144 144 144 144 144 144
15 144 144 144 144 144 144 144
16 144 144 144 144 144 144 144
17 144 144 144 144 144 144 144
18 144 144 144 144 144 144 144
19 144 144 144 144 144 144 144
20 144 144 144 144 144 144 144
[*,*,2]:
1 2 3 4 5 6 7:=
1 144 144 144 144 144 144 144
2 144 144 144 144 144 144 144
3 144 144 144 144 144 144 144
4 144 144 144 144 144 144 144
5 144 144 144 144 144 144 144
6 144 144 144 144 144 144 144
7 144 144 144 144 144 144 144
8 144 144 144 144 144 144 144
9 144 144 144 144 144 144 144
10 144 144 144 144 144 144 144
11 144 144 144 144 144 144 144
12 144 144 144 144 144 144 144
13 144 144 144 144 144 144 144
14 144 144 144 144 144 144 144
15 144 144 144 144 144 144 144
16 144 144 144 144 144 144 144
17 144 144 144 144 144 144 144
18 144 144 144 144 144 144 144
19 144 144 144 144 144 144 144
20 144 144 144 144 144 144 144
[*,*,3]:
1 2 3 4 5 6 7:=
1 144 144 144 144 144 144 144
2 144 144 144 144 144 144 144
3 144 144 144 144 144 144 144
4 144 144 144 144 144 144 144
5 144 144 144 144 144 144 144
6 144 144 144 144 144 144 144
7 144 144 144 144 144 144 144
8 144 144 144 144 144 144 144
9 144 144 144 144 144 144 144
10 144 144 144 144 144 144 144
11 144 144 144 144 144 144 144
12 144 144 144 144 144 144 144
13 144 144 144 144 144 144 144
14 144 144 144 144 144 144 144
15 144 144 144 144 144 144 144
16 144 144 144 144 144 144 144
17 144 144 144 144 144 144 144
18 144 144 144 144 144 144 144
19 144 144 144 144 144 144 144
20 144 144 144 144 144 144 144;
param availability:
1 2 3 4 5 6 7 :=
1 0 0 0 0 0 0 0
2 1 1 1 1 1 1 1
3 1 1 1 1 1 1 1
4 1 1 1 1 1 1 1
5 1 1 1 1 1 1 1
6 1 1 1 1 1 1 1
7 1 0 1 1 1 1 1
8 1 1 1 1 1 1 1
9 1 1 1 1 1 1 1
10 1 1 1 1 1 1 1
11 1 1 1 1 1 1 1
12 1 1 1 1 1 1 1
13 1 1 1 1 1 1 1
14 1 1 1 1 1 1 1
15 1 1 1 1 1 1 1
16 1 1 1 1 1 1 1
17 0 1 1 1 1 1 1
18 1 1 1 1 1 1 1
19 1 1 1 1 1 1 1
20 1 1 1 1 1 1 1
21 1 1 1 1 1 1 1
22 1 1 1 1 1 1 1
23 1 1 1 1 1 1 1
24 1 1 1 1 1 1 1
25 1 1 1 1 1 1 1
26 1 1 1 1 1 1 1
27 1 1 1 1 1 1 1
28 1 1 1 1 1 1 1
29 1 1 1 1 1 1 1
30 1 1 1 1 1 1 1;
param costpershift:=
[*,*,1]: 1 2 3 4 5 6 7 :=
1 96 96 96 96 96 96 96
2 96 96 96 96 96 96 96
3 96 96 96 96 96 96 96
4 96 96 96 96 96 96 96
5 96 96 96 96 96 96 96
6 96 96 96 96 96 96 96
7 96 96 96 96 96 96 96
8 96 96 96 96 96 96 96
9 96 96 96 96 96 96 96
10 96 96 96 96 96 96 96
11 96 96 96 96 96 96 96
12 96 96 96 96 96 96 96
13 96 96 96 96 96 96 96
14 96 96 96 96 96 96 96
15 96 96 96 96 96 96 96
16 96 96 96 96 96 96 96
17 96 96 96 96 96 96 96
18 96 96 96 96 96 96 96
19 96 96 96 96 96 96 96
20 96 96 96 96 96 96 96
21 96 96 96 96 96 96 96
22 96 96 96 96 96 96 96
23 96 96 96 96 96 96 96
24 96 96 96 96 96 96 96
25 96 96 96 96 96 96 96
26 96 96 96 96 96 96 96
27 96 96 96 96 96 96 96
28 96 96 96 96 96 96 96
29 96 96 96 96 96 96 96
30 96 96 96 96 96 96 96
[*,*,2] : 1 2 3 4 5 6 7 :=
1 104 104 104 104 104 104 104
2 104 104 104 104 104 104 104
3 104 104 104 104 104 104 104
4 104 104 104 104 104 104 104
5 104 104 104 104 104 104 104
6 104 104 104 104 104 104 104
7 104 104 104 104 104 104 104
8 104 104 104 104 104 104 104
9 104 104 104 104 104 104 104
10 104 104 104 104 104 104 104
11 104 104 104 104 104 104 104
12 104 104 104 104 104 104 104
13 104 104 104 104 104 104 104
14 104 104 104 104 104 104 104
15 104 104 104 104 104 104 104
16 104 104 104 104 104 104 104
17 104 104 104 104 104 104 104
18 104 104 104 104 104 104 104
19 104 104 104 104 104 104 104
20 104 104 104 104 104 104 104
21 104 104 104 104 104 104 104
22 104 104 104 104 104 104 104
23 104 104 104 104 104 104 104
24 104 104 104 104 104 104 104
25 104 104 104 104 104 104 104
26 104 104 104 104 104 104 104
27 104 104 104 104 104 104 104
28 104 104 104 104 104 104 104
29 104 104 104 104 104 104 104
30 104 104 104 104 104 104 104
[*,*,3] : 1 2 3 4 5 6 7 :=
1 120 120 120 120 120 120 120
2 120 120 120 120 120 120 120
3 120 120 120 120 120 120 120
4 120 120 120 120 120 120 120
5 120 120 120 120 120 120 120
6 120 120 120 120 120 120 120
7 120 120 120 120 120 120 120
8 120 120 120 120 120 120 120
9 120 120 120 120 120 120 120
10 120 120 120 120 120 120 120
11 120 120 120 120 120 120 120
12 120 120 120 120 120 120 120
13 120 120 120 120 120 120 120
14 120 120 120 120 120 120 120
15 120 120 120 120 120 120 120
16 120 120 120 120 120 120 120
17 120 120 120 120 120 120 120
18 120 120 120 120 120 120 120
19 120 120 120 120 120 120 120
20 120 120 120 120 120 120 120
21 120 120 120 120 120 120 120
22 120 120 120 120 120 120 120
23 120 120 120 120 120 120 120
24 120 120 120 120 120 120 120
25 120 120 120 120 120 120 120
26 120 120 120 120 120 120 120
27 120 120 120 120 120 120 120
28 120 120 120 120 120 120 120
29 120 120 120 120 120 120 120
30 120 120 120 120 120 120 120;
I've managed to extract data (from an html page) that goes into a table, and I've isolated the columns of said table into a text file that contains the lines below:
[30,30,32,35,34,43,52,68,88,97,105,107,107,105,101,93,88,80,69,55],
[28,6,6,50,58,56,64,87,99,110,116,119,120,117,114,113,103,82,6,47],
[-7,,,43,71,30,23,28,13,13,10,11,12,11,13,22,17,3,,-15,-20,,38,71],
[0,,,3,5,1.5,1,1.5,0.5,0.5,0,0.5,0.5,0.5,0.5,1,0.5,0,-0.5,-0.5,2.5]
Each bracketed list of numbers represents a column. What I'd like to do is turn these lists into actual columns that I can work with in different data formats. I'd also like to be sure to include the blank parts of these lists (i.e., "[,,,]").
This is basically what I'm trying to accomplish:
30 28 -7 0
30 6
32 6
35 50 43 3
34 58 71 5
43 56 30 1.5
52 64 23 1
. . . .
. . . .
. . . .
I'm parsing data from a web page, and ultimately planning to make the process as automated as possible so I can easily work with the data after I output it to a nice format.
Anyone know how to do this, have any suggestions, or thoughts on scripting this?
Since you have your lists in Python, just do it in Python:
l=[["30", "30", "32"], ["28","6","6"], ["-7", "", ""], ["0", "", ""]]
for i in zip(*l):
    print("\t".join(i))
produces
30 28 -7 0
30 6
32 6
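Note that zip stops at the shortest list, and the lists in the question have different lengths. A minimal sketch that reads the bracketed lines as text and pads the short columns with blanks, assuming the input format shown in the question (one bracketed, comma-separated list per line, with an optional trailing comma); the function name columns_to_rows is made up for illustration:

```python
from itertools import zip_longest

def columns_to_rows(lines):
    """Turn lines like '[30,30,32],' into tab-separated rows."""
    cols = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Drop the trailing comma and the brackets, keep empty fields.
        cols.append(line.rstrip(",").strip("[]").split(","))
    # zip_longest pads the shorter columns instead of truncating.
    return ["\t".join(row) for row in zip_longest(*cols, fillvalue="")]

sample = ["[30,30,32],", "[28,6,6],", "[-7,,,43],", "[0,,]"]
for row in columns_to_rows(sample):
    print(row)
```

To read from a file instead, pass the open file object, since iterating over it yields the lines.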
awk based solution:
awk -F, '{gsub(/\[|\]/, ""); for (i=1; i<=NF; i++) a[i]=a[i] ? a[i] OFS $i: $i}
END {for (i=1; i<=NF; i++) print a[i]}' file
30 28 -7 0
30 6
32 6
35 50 43 3
34 58 71 5
43 56 30 1.5
52 64 23 1
..........
..........
Another solution, but it works only for a file with 4 lines:
$ paste \
<(sed -n '1{s,\[,,g;s,\],,g;s|,|\n|g;p}' t) \
<(sed -n '2{s,\[,,g;s,\],,g;s|,|\n|g;p}' t) \
<(sed -n '3{s,\[,,g;s,\],,g;s|,|\n|g;p}' t) \
<(sed -n '4{s,\[,,g;s,\],,g;s|,|\n|g;p}' t)
30 28 -7 0
30 6
32 6
35 50 43 3
34 58 71 5
43 56 30 1.5
52 64 23 1
68 87 28 1.5
88 99 13 0.5
97 110 13 0.5
105 116 10 0
107 119 11 0.5
107 120 12 0.5
105 117 11 0.5
101 114 13 0.5
93 113 22 1
88 103 17 0.5
80 82 3 0
69 6 -0.5
55 47 -15 -0.5
-20 2.5
38
71
Updated: or another version with preprocessing:
$ sed 's|\[||;s|\][,]\?||' t >t2
$ paste \
<(sed -n '1{s|,|\n|g;p}' t2) \
<(sed -n '2{s|,|\n|g;p}' t2) \
<(sed -n '3{s|,|\n|g;p}' t2) \
<(sed -n '4{s|,|\n|g;p}' t2)
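If a file named data contains the data given in the problem (exactly as shown above), then the following bash command line will transpose it into columns. Note that because the commas are turned into spaces, empty entries are collapsed and the later values in that column shift up, as the example output shows: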
$ sed -e 's/\[//' -e 's/\]//' -e 's/,/ /g' <data | rs -T
Example:
$ cat data
[30,30,32,35,34,43,52,68,88,97,105,107,107,105,101,93,88,80,69,55],
[28,6,6,50,58,56,64,87,99,110,116,119,120,117,114,113,103,82,6,47],
[-7,,,43,71,30,23,28,13,13,10,11,12,11,13,22,17,3,,-15,-20,,38,71],
[0,,,3,5,1.5,1,1.5,0.5,0.5,0,0.5,0.5,0.5,0.5,1,0.5,0,-0.5,-0.5,2.5]
$ sed -e 's/\[//' -e 's/\]//' -e 's/,/ /g' <data | rs -T
30 28 -7 0
30 6 43 3
32 6 71 5
35 50 30 1.5
34 58 23 1
43 56 28 1.5
52 64 13 0.5
68 87 13 0.5
88 99 10 0
97 110 11 0.5
105 116 12 0.5
107 119 11 0.5
107 120 13 0.5
105 117 22 1
101 114 17 0.5
93 113 3 0
88 103 -15 -0.5
80 82 -20 -0.5
69 6 38 2.5
55 47 71