Bash scripting: using sed and cut to output a specific format

I am working on a bash script using sed and cut that takes times entered in various formats and outputs them in a specific format. Here is an example line:
timeinhour=$(cut -d" " -f2<<<"$line" | sed 's/p/ /' | sed 's/a/ /' | sed 's/am/ /' | sed 's/pm/ /' | sed 's/AM/ /' | sed 's/PM/ /' )
As you can see, I am just removing any trailing am or pm from a time entry that might be formatted in various ways, leaving only the numbers.
So I want this line to output just the hour of the day (timeinhour), i.e. "1000AM" should give "10", as should "10a" and "10am".
The problem I am running into is the varying length of the time entries. If I tell sed or cut to remove the last two characters, "1000" correctly gives the hour I need, "10", but an entry that is already "10" obviously results in blank output.
I have been experimenting with a line like this:
sed 's/\(.*\)../\1/'
If anyone has any advice, I would appreciate it.
For example, this input:
1p
1032AM
419pm
1202a
would produce:
1
10
4
12

sed 's/[^0-9]//g;s/^[0-9]\{1,2\}$/&00/;s/^\(.*\)..$/\1/'
The steps:
1p -> 1 -> 100 -> 1
10a -> 10 -> 1000 -> 10
419pm -> 419 -> 419 -> 4
1202a -> 1202 -> 1202 -> 12
delete everything that is not a digit
expand a 1- or 2-digit hour into a 4-digit HHmm
ignore the last two characters (the minutes)
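To check the one-liner against the sample input, it can be wrapped in a small function and fed one time per line. The function name `to_hour` is only for illustration; the sed expression is the answer's, unchanged. `\{1,2\}` is a POSIX basic-regex interval, so this should behave the same with GNU and BSD sed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the answer's three sed steps to each input line.
to_hour() {
  # 1) delete non-digits
  # 2) pad a bare 1- or 2-digit hour with "00" so it becomes HHmm
  # 3) drop the last two characters (the minutes)
  sed 's/[^0-9]//g;s/^[0-9]\{1,2\}$/&00/;s/^\(.*\)..$/\1/'
}

# Sample times from the question, plus the "10a"/"10am" cases
printf '%s\n' 1p 1032AM 419pm 1202a 10a 10am | to_hour
```

Each input line should produce exactly one hour on its own line (1, 10, 4, 12, 10, 10 for the list above).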

Try:
timeinhour=$(cut -d" " -f2<<<"$line" | sed 's/p/ /;s/a/ /;s/am/ /;s/pm/ /;s/AM/ /;s/PM/ /' | sed 's/\(.*\)../\1/')  # Using your example.

Related

How can I count and display only the words that are repeated more than once using unix commands?

I am trying to count and display only the words that are repeated more than once in a file. The basic idea is:
You are given a file with names and characters like commas, colons, slashes, etc.
Use the cut command to display only the first names in the file (other commands are also allowed).
Count and then display only the names repeated more than once.
I got to the point of counting and displaying all the names. However, I haven't found a way to count and display only the names repeated more than once.
Here is a section of the file:
user1:x:80:200:Mia,Spurs:/home/user1:/bin/bash
user2:x:80:200:Martha,Dalton:/home/user2:/bin/bash
user3:x:80:200:Lucy,Carlson:/home/user3:/bin/bash
user4:x:80:200:Carl,Bingo:/home/user4:/bin/bash
Here is what I have been able to do:
Daniel#Daniel-MacBook-Pro Files % cut -d ":" -f 5-5 file1 | cut -d "," -f 1-1 | sort -n | uniq -c
1 Mia
3 Martha
1 Lucy
1 Carl
1 Jessi
1 Joke
1 Jim
2 Race
1 Sem
1 Shirly
1 Susan
1 Tim
You can filter out the rows with count 1 with grep.
cut -d ":" -f 5 file1 | cut -d "," -f 1 | sort | uniq -c | grep -v '^ *1 '
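A quick way to check the grep filter is to run the answer's pipeline on a few made-up records. The sample records and the function names `sample_file` and `repeated_names` are hypothetical; only the pipeline itself comes from the answer.

```shell
#!/usr/bin/env bash
# Hypothetical sample data in the question's format (field 5 = "First,Last")
sample_file() {
  printf '%s\n' \
    'user1:x:80:200:Mia,Spurs:/home/user1:/bin/bash' \
    'user2:x:80:200:Martha,Dalton:/home/user2:/bin/bash' \
    'user3:x:80:200:Lucy,Carlson:/home/user3:/bin/bash' \
    'user4:x:80:200:Martha,Stewart:/home/user4:/bin/bash' \
    'user5:x:80:200:Race,Miller:/home/user5:/bin/bash' \
    'user6:x:80:200:Race,Jones:/home/user6:/bin/bash'
}

# Keep the first name, count identical names, then drop lines whose count is 1.
# grep -v '^ *1 ' only matches a count of exactly 1, not 10, 12, etc.
repeated_names() {
  cut -d ":" -f 5 | cut -d "," -f 1 | sort | uniq -c | grep -v '^ *1 '
}

sample_file | repeated_names
```

With this data only Martha and Race should be listed, each with a count of 2. If you prefer a numeric test over a text match, `awk '$1 > 1'` can replace the `grep -v` step.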

Inconsistency in output field separator

We have to find the difference (d) between the last two numbers and display the rows with the highest values of d, in ascending order.
INPUT
1 | Latha | Third | Vikas | 90 | 91
2 | Neethu | Second | Meridian | 92 | 94
3 | Sethu | First | DAV | 86 | 98
4 | Theekshana | Second | DAV | 97 | 100
5 | Teju | First | Sangamithra | 89 | 100
6 | Theekshitha | Second | Sangamithra | 99 |100
Required OUTPUT
4$Theekshana$Second$DAV$97$100$3
5$Teju$First$Sangamithra$89$100$11
3$Sethu$First$DAV$86$98$12
awk 'BEGIN{FS="|";OFS="$";}{
avg=sqrt(($5-$6)^2)
print $1,$2,$3,$4,$5,$6,avg
}'|sort -nk7 -t "$"| tail -3
Output:
4 $ Theekshana $ Second $ DAV $ 97 $ 100$3
5 $ Teju $ First $ Sangamithra $ 89 $ 100$11
3 $ Sethu $ First $ DAV $ 86 $ 98$12
As you can see, there is a space before and after each $ sign, except before the last column (avg). Please explain why this is happening.
2)
awk 'BEGIN{FS=" | ";OFS="$";}{
avg=sqrt(($5-$6)^2)
print $1,$2,$3,$4,$5,$6,avg
}'|sort -nk7 -t "$"| tail -3
OUTPUT
4$|$Theekshana$|$Second$|$0
5$|$Teju$|$First$|$0
6$|$Theekshitha$|$Second$|$0
I have not specified | as the output field separator, but it still appears. Why is this happening, and why is the difference zero too?
I have only been using Unix for six days, so please answer even if it's easy.
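Your field separator is only the pipe symbol, so the whitespace around each pipe is part of the fields, and that is what you see in the output. The final $ has no space before it because $6 runs to the end of the line (there is no trailing whitespace) and avg is a computed number with no whitespace of its own. When the pipe is combined with other characters in FS, FS is treated as a regular expression in which | means alternation, so it needs to be escaped. In your second case, FS=" | " means "space or space", so every single space is a field separator: the | characters become fields of their own, $5 and $6 are no longer the two numbers, and the difference comes out as zero.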
$ awk 'BEGIN {FS=" *\\| *"; OFS="$"}
{d=sqrt(($NF-$(NF-1))^2); $1=$1;
print d "\t" $0,d}' file | sort -n | tail -3 | cut -f2-
4$Theekshana$Second$DAV$97$100$3
5$Teju$First$Sangamithra$89$100$11
3$Sethu$First$DAV$86$98$12
This slight rewrite removes the dependency on the number of fields (it uses $NF and $(NF-1) instead of $5 and $6), and $1=$1 rebuilds each record with OFS, which fixes the format.
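The sketch below reproduces both behaviours. The first awk call shows how FS=" | " splits a sample line on single spaces (it should report 11 fields, with $5 and $6 equal to Second and |, both of which evaluate to 0 in arithmetic). The second call shows the answer's separator, and the rest runs the answer's fixed command on the question's input. The function names `sample_marks` and `top3` are only for illustration.

```shell
#!/usr/bin/env bash
line='4 | Theekshana | Second | DAV | 97 | 100'

# FS=" | " is a regex meaning "space or space": awk splits on every single
# space, so the pipes become fields and $5/$6 are "Second" and "|".
printf '%s\n' "$line" | awk 'BEGIN { FS = " | " } { print NF, "[" $5 "]", "[" $6 "]" }'

# The answer's FS=" *\\| *" consumes the spaces around each escaped pipe.
printf '%s\n' "$line" | awk 'BEGIN { FS = " *\\| *" } { print NF, "[" $5 "]", "[" $6 "]" }'

# The question's input, reproduced verbatim
sample_marks() {
  cat <<'EOF'
1 | Latha | Third | Vikas | 90 | 91
2 | Neethu | Second | Meridian | 92 | 94
3 | Sethu | First | DAV | 86 | 98
4 | Theekshana | Second | DAV | 97 | 100
5 | Teju | First | Sangamithra | 89 | 100
6 | Theekshitha | Second | Sangamithra | 99 |100
EOF
}

# The answer's pipeline: d = |last - second-to-last|, $1=$1 rebuilds the
# record with OFS="$", d plus a tab is prefixed so sort -n can order rows,
# tail keeps the three largest, and cut removes the prefix again.
top3() {
  awk 'BEGIN {FS=" *\\| *"; OFS="$"}
       {d=sqrt(($NF-$(NF-1))^2); $1=$1; print d "\t" $0,d}' |
    sort -n | tail -3 | cut -f2-
}

sample_marks | top3
```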

Is there a way to print lines n to m of a file and then reverse their order?

I'm trying to print lines 10 to 20 of a file and then reverse their order.
I've tried this:
sed '10!G;h;$!d' file.txt
But it prints from line 10 to the end of the file. Is there any way to stop at line 20 using only one sed command?
Almost there: you just need to replace $!d with the 'until' line number (20p) and run sed with -n so that nothing else is printed:
sed -n '10,20p' tst.txt
# Prints lines 10 to 20
sed -n '10!G;h;20p' tst.txt
# Prints lines 10 to 20 in reverse order
output:
20
19
18
17
16
15
14
13
12
11
10
tst.txt:
1
2
3
4
...
19
20
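To see the hold-space trick end to end, here is a self-contained run against a generated 30-line file; a temporary file (the variable `tmp` is hypothetical) stands in for tst.txt. The `20{p;q;}` variant is an addition not in the answer: it quits right after printing instead of reading the rest of the file. Because the `}` is preceded by `;`, it should be accepted by both GNU and BSD sed.

```shell
#!/usr/bin/env bash
# A temporary 30-line file stands in for tst.txt (one number per line)
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
seq 1 30 > "$tmp"

# 10!G : on every line except 10, append the hold space to the current line
# h    : copy the result back to the hold space, so it grows in reverse order
#        (line 10 has no G, so the block restarts there and lines 1-9 are lost)
# 20p  : print the accumulated block once, at line 20; -n suppresses the rest
sed -n '10!G;h;20p' "$tmp"

# Variant: print and quit at line 20 instead of reading to the end of the file
sed -n '10!G;h;20{p;q;}' "$tmp"
```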
You can use this to print a range of lines:
sed -n -e 10,20p file.txt | tac
tac reverses the order of the lines.
And for those of you without tac (like Mac users), tail -r does the same:
sed -n -e 10,20p file.txt | tail -r
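If neither tac nor tail -r is available, a plain POSIX awk script can buffer the range and print it backwards. This is a swapped-in technique rather than part of either answer, and the function name `reverse_range` is hypothetical.

```shell
#!/usr/bin/env bash
# Print lines n..m of the given files (or stdin) in reverse order with POSIX awk.
reverse_range() {
  local n=$1 m=$2
  shift 2
  awk -v n="$n" -v m="$m" '
    NR >= n && NR <= m { buf[NR] = $0 }   # remember lines n..m
    NR == m            { exit }           # stop reading; END still runs
    END { for (i = m; i >= n; i--) if (i in buf) print buf[i] }
  ' "$@"
}

seq 1 30 | reverse_range 10 20
```

The `i in buf` check keeps the output correct when the input has fewer than m lines; in that case only the lines that exist are printed, still in reverse order.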

awk length is counting +1

I'm trying, as an exercise, to output how many words exist in the dictionary for each possible length.
Here is my code:
$ awk '{print length}' dico.txt | sort -nr | uniq -c
Here is the output:
...
1799 5
427 4
81 3
1 2
My problem is that awk's length counts one more character for each word in my file. The correct output should have been:
1799 4
427 3
81 2
1 1
I checked my file, and it does not contain any spaces after the words:
ABAISSA
ABAISSABLE
ABAISSABLES
ABAISSAI
...
So I guess awk is counting the newline as a character, even though it is not supposed to.
Is there a solution, or am I doing something wrong?
I'm going to venture a guess: your awk expects Unix-style newlines (LF), but your dico.txt has Windows-style line endings (CR+LF). awk strips only the LF, so the leftover CR is counted by length, which easily gives you the +1 on all lengths.
I took your four words:
$ cat dico.txt
ABAISSA
ABAISSABLE
ABAISSABLES
ABAISSAI
And ran your line:
$ awk '{print length}' dico.txt | sort -nr | uniq -c
1 11
1 10
1 8
1 7
So far so good. Now the same, but with dico.txt converted to Windows newlines:
$ cat dico.txt | todos > dico_win.txt
$ awk '{print length}' dico_win.txt | sort -nr | uniq -c
1 12
1 11
1 9
1 8
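Following that diagnosis, a minimal fix is to drop the carriage return before measuring. Two options are sketched below: strip a trailing \r inside awk, or delete every \r with tr before awk sees the data. The function name `word_lengths` is hypothetical; both `\r` in an awk regex and `tr -d '\r'` are standard POSIX escapes.

```shell
#!/usr/bin/env bash
# Print the length of each line after removing a trailing carriage return,
# so CR+LF (Windows) and LF (Unix) files give the same result.
word_lengths() {
  awk '{ sub(/\r$/, ""); print length }' "$@"
}

# The four words with Windows line endings, as in dico_win.txt above
printf 'ABAISSA\r\nABAISSABLE\r\nABAISSABLES\r\nABAISSAI\r\n' | word_lengths | sort -nr | uniq -c

# Alternative: delete every CR with tr before awk sees the data
printf 'ABAISSA\r\nABAISSABLE\r\n' | tr -d '\r' | awk '{ print length }'
```

Either way the lengths should match the Unix-newline run (11, 10, 8, 7).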

Sorting tab-delimited numbers by column with a pure bash script

I'm stuck on some homework. The assignment requires accepting an input file and computing some statistics on the values. The user may specify whether to calculate the statistics by row or by column. The script must be pure bash, so I can't use awk, sed, perl, python, etc.
sample input:
1 1 1 1 1 1 1
39 43 4 3225 5 2 2
6 57 8 9 7 3 4
3 36 8 9 14 4 3
3 4 2 1 4 5 5
6 4 4814 7 7 6 6
I can't figure out how to sort and process the data by column. My code for processing the rows works fine.
# CODE FOR ROWS
while read -r line; do
  echo $(printf "%d\n" $line | sort -n) | tr ' ' \\t > sorted.txt
  ....
  # I perform the stats calculations for each row
  # by working with the temp file sorted.txt
done
How could I process this data by column? I've never worked with shell scripts, so I've been staring at this for hours.
If you want to analyze by columns, you'll first need the number of columns (cols). head -n 1 gives you the first row, and awk's NF counts its fields, giving the number of columns.
cols=$(head -n 1 input.txt | awk '{print NF}')
Then you can use cut with the tab delimiter to grab every column (1 to cols) from input.txt, and run it through sort -n, as you did in your original post.
$ for i in $(seq 1 "$cols"); do cut -f"$i" -d$'\t' input.txt; done | sort -n > output.txt
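Since the assignment rules out awk, here is a sketch of the column step using only bash builtins plus sort, which the question already uses. It assumes whitespace-separated fields; with the default IFS, read splits on both tabs and spaces. The function names `count_columns` and `column_values` are hypothetical.

```shell
#!/usr/bin/env bash
# Count the columns of the first input line with read -a (replaces awk's NF).
count_columns() {
  local -a fields
  read -ra fields
  echo "${#fields[@]}"
}

# Print column $1 (1-based) of every input line, one value per line.
column_values() {
  local col=$1 line
  local -a fields
  # "|| [[ -n $line ]]" also processes a last line without a trailing newline
  while IFS= read -r line || [[ -n $line ]]; do
    read -ra fields <<<"$line"
    if (( col <= ${#fields[@]} )); then
      printf '%s\n' "${fields[col-1]}"
    fi
  done
}

# The question's sample input
sample=$'1 1 1 1 1 1 1\n39 43 4 3225 5 2 2\n6 57 8 9 7 3 4\n3 36 8 9 14 4 3\n3 4 2 1 4 5 5\n6 4 4814 7 7 6 6'

cols=$(count_columns <<<"$sample")
for (( i = 1; i <= cols; i++ )); do
  # sort -n as in the question; the unquoted $(...) joins values with spaces
  echo "column $i:" $(column_values "$i" <<<"$sample" | sort -n)
done
```

Each column is printed and sorted separately here, so per-column statistics can be computed on its output instead of on all values at once.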
For rows, you can use the shell built-in printf with the format specifier %d for integers. The sort command works on lines of input, so we replace spaces ' ' with newlines \n using the tr command:
$ cat input.txt | while read line; do echo $(printf "%d\n" $line); done | tr ' ' '\n' | sort -n > output.txt
Now use the output file to gather the statistics:
Min: cat output.txt | head -n 1
Max: cat output.txt | tail -n 1
Sum: (courtesy of Dimitre Radoulov): cat output.txt | paste -sd+ - | bc
Mean: (courtesy of porges): cat output.txt | awk '{ total += $1 } END { print total/NR }'
Median: (courtesy of maxschlepzig): cat output.txt | awk ' { a[i++]=$1; } END { print a[int(i/2)]; }'
Histogram: cat output.txt | uniq -c
8 1
3 2
4 3
6 4
3 5
4 6
3 7
2 8
2 9
1 14
1 36
1 39
1 43
1 57
1 3225
1 4814
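The Mean and Median lines above rely on awk, which the assignment forbids. A pure-bash sketch of the same statistics could look like this, assuming bash 4 or later for mapfile (the stock bash 3.2 on macOS lacks it) and a sorted, one-number-per-line input such as output.txt. The function name `stats` is hypothetical. Bash arithmetic is integer-only, so the mean and an even-count median are truncated.

```shell
#!/usr/bin/env bash
# Read one integer per line from stdin (already sorted with sort -n) and print
# min, max, sum, mean and median using bash arithmetic only.
stats() {
  local -a v
  mapfile -t v                    # bash 4+: one array element per input line
  local n=${#v[@]} sum=0 x
  (( n > 0 )) || return 1         # nothing to summarize
  for x in "${v[@]}"; do
    (( sum += x ))
  done
  # Integer division: the mean is truncated toward zero
  echo "min=${v[0]} max=${v[n-1]} sum=$sum mean=$(( sum / n ))"
  if (( n % 2 )); then
    echo "median=${v[n/2]}"       # odd count: the middle element
  else
    # even count: average of the two middle elements (also truncated)
    echo "median=$(( (v[n/2-1] + v[n/2]) / 2 ))"
  fi
}

# The question's sample rows, flattened to one value per line and sorted
sample='1 1 1 1 1 1 1
39 43 4 3225 5 2 2
6 57 8 9 7 3 4
3 36 8 9 14 4 3
3 4 2 1 4 5 5
6 4 4814 7 7 6 6'
printf '%s\n' $sample | sort -n | stats
```

With the question's data this should report min=1, max=4814, sum=8372 and mean=199 (the exact mean is about 199.3).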
