how to use cut command -f flag as reverse - bash

This is a text file called a.txt
ok.google.com
abc.google.com
I want to select each part of the domain separately.
cat a.txt | cut -d "." -f1 (selects ok, counting from the left)
cat a.txt | cut -d "." -f2 (selects google, counting from the left)
Is there any way to count the fields from the right side instead?
cat a.txt | cut (so that it selects com, counting from the right)

There are several ways to do this. One is a rev + cut + rev pipeline: rev reverses each input line, cut splits on . and picks fields counting from the left (which, because the line is reversed, really counts from the right), and a second rev restores the selected text to its original order.
rev Input_file | cut -d'.' -f 1 | rev
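As a minimal sketch of the same idea, the pipeline can be wrapped in a function that takes the field number counted from the right. The name field_from_right is hypothetical, and rev (part of util-linux on most Linux systems) is assumed to be installed:

```shell
#!/bin/bash
# field_from_right N: print the Nth dot-separated field, counted from the right.
# (Hypothetical helper name; reads lines from standard input.)
field_from_right() {
    rev | cut -d'.' -f "$1" | rev
}

# The second field from the right of each sample line
printf 'ok.google.com\nabc.google.com\n' | field_from_right 2
```

With the sample input this prints google twice, once per line.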

You can use awk to print the last field:
awk -F. '{print $NF}' a.txt
-F. sets the field separator to "."
$NF is the last field
And you can give your file directly as an argument, so you can avoid the famous "Useless use of cat"
For other fields counted from the end, you can use expressions, as suggested in the comment by @sundeep or described in the user's guide under
4.3 Nonconstant Field Numbers. For example, to get the domain before the TLD, you can subtract 1 from the number of fields, NF:
awk -F. '{ print $(NF-1) }' a.txt
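The expression generalizes to any position counted from the right. The following is only a sketch: the function name nth_from_right is hypothetical, and the NF >= n guard is an added assumption so that lines with too few fields are skipped instead of printing the wrong field.

```shell
#!/bin/bash
# nth_from_right N [FILE...]: print the Nth dot-separated field from the right,
# skipping lines that have fewer than N fields. (Hypothetical helper name.)
nth_from_right() {
    local n=$1
    shift
    awk -F. -v n="$n" 'NF >= n { print $(NF - n + 1) }' "$@"
}

# "localhost" has only one field, so it is skipped when N is 2
printf 'ok.google.com\nlocalhost\n' | nth_from_right 2
```

Called with a file name, for example nth_from_right 1 a.txt, it reads the file directly instead of standard input.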

You might use sed with a quantifier for the grouped value repeated till the end of the string.
( Start group
\.[^[:space:].]+ Match 1 dot and 1+ occurrences of any char except a space or dot
){1} Close the group followed by a quantifier
$ End of string
Example
sed -E 's/(\.[^[:space:].]+){1}$//' file
Output
ok.google
abc.google
If the quantifier is {2} the output will be
ok
abc
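To avoid editing the quantifier by hand, the number can be passed in as a parameter. This is a sketch only; strip_right is a hypothetical name, and the argument is assumed to be a positive integer:

```shell
#!/bin/bash
# strip_right N: remove the last N dot-separated parts from each input line.
# (Hypothetical helper name; N becomes the {N} quantifier in the regex.)
strip_right() {
    sed -E "s/(\.[^[:space:].]+){$1}\$//"
}

printf 'ok.google.com\nabc.google.com\n' | strip_right 2
```

The script is in double quotes so that $1 is expanded by the shell, which is why the end-of-line anchor is written as \$.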

Depending on what you want to do with the values afterwards, you could use bash to split your domain into an array of its components:
#!/bin/bash
IFS=. read -ra comps <<< "ok.google.com"
echo "${comps[-2]}"
# or for bash < 4.2
echo "${comps[${#comps[@]}-2]}"
google
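The same splitting can be applied to every line of a file. A possible sketch, in which last_two_parts is a hypothetical name and every line is assumed to have at least two components:

```shell
#!/bin/bash
# last_two_parts: for each dot-separated name read from stdin, print the last
# component followed by the second-to-last one.
# (Hypothetical helper name; assumes at least two components per line.)
last_two_parts() {
    local comps n
    while IFS=. read -ra comps; do
        n=${#comps[@]}
        echo "${comps[n-1]} ${comps[n-2]}"
    done
}

printf 'ok.google.com\nabc.google.com\n' | last_two_parts
```

Indexing with n-1 and n-2 rather than negative subscripts keeps the sketch usable on older bash versions.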

Related

check if column has more than one value in unix [duplicate]

I have a text file with a large amount of data which is tab delimited. I want to have a look at the data such that I can see the unique values in a column. For example,
Red Ball 1 Sold
Blue Bat 5 OnSale
...............
So the first column has colors, and I want to know how many different unique values there are in that column; I want to be able to do that for each column.
I need to do this in a Linux command line, so probably using some bash script, sed, awk or something.
What if I wanted a count of these unique values as well?
Update: I guess I didn't put the second part clearly enough. What I want is a count of "each" of these unique values, not the number of unique values. For instance, in the first column I want to know how many Red, Blue, Green etc. coloured objects there are.
You can make use of cut, sort and uniq commands as follows:
cat input_file | cut -f 1 | sort | uniq
This gets the unique values in field 1; replacing 1 with 2 gives the unique values in field 2.
Avoiding UUOC :)
cut -f 1 input_file | sort | uniq
EDIT:
To count the number of unique values, you can add the wc command to the chain:
cut -f 1 input_file | sort | uniq | wc -l
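The update in the question asks for a count of each value rather than the number of distinct values. A minimal sketch of that variant uses uniq -c in place of wc -l; the name count_values is hypothetical, and the sed step only trims the leading padding that uniq -c adds:

```shell
#!/bin/bash
# count_values COLUMN: count how often each value occurs in a tab-separated
# column read from stdin. (Hypothetical helper name.)
count_values() {
    cut -f "$1" | sort | uniq -c | sed 's/^ *//'
}

printf 'Red\tBall\t1\tSold\nBlue\tBat\t5\tOnSale\nRed\tBat\t2\tSold\n' | count_values 1
```

For the three sample rows this prints 1 Blue and 2 Red on separate lines.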
awk -F '\t' '{ a[$1]++ } END { for (n in a) print n, a[n] } ' test.csv
You can use awk, sort & uniq to do this, for example to list all the unique values in the first column
awk < test.txt '{print $1}' | sort | uniq
As posted elsewhere, if you want to count the number of instances of something you can pipe the unique list into wc -l
Assuming the data file is actually Tab separated, not space aligned:
<test.tsv awk '{print $4}' | sort | uniq
Where $4 will be:
$1 - Red
$2 - Ball
$3 - 1
$4 - Sold
# COLUMN is integer column number
# INPUT_FILE is input file name
cut -f ${COLUMN} < ${INPUT_FILE} | sort -u | wc -l
Here is a bash script that fully answers the (revised) original question. That is, given any .tsv file, it provides the synopsis for each of the columns in turn. Apart from bash itself, it only uses standard *ix/Mac tools: sed tr wc cut sort uniq.
#!/bin/bash
# Syntax: $0 filename
# The input is assumed to be a .tsv file
FILE="$1"
cols=$(sed -n 1p "$FILE" | tr -cd '\t' | wc -c)
cols=$((cols + 2 ))
i=0
for ((i=1; i < $cols; i++))
do
echo Column $i ::
cut -f $i < "$FILE" | sort | uniq -c
echo
done
This script outputs, for each column of a given file, every distinct value together with its count. It assumes that the first line of the file is a header line. There is no need to define the number of fields. Simply save the script in a bash file (.sh) and provide the tab-delimited file as a parameter to this script. It relies on arrays of arrays, which require GNU awk 4.0 or later.
Code
#!/bin/bash
awk '
(NR==1){
for(fi=1; fi<=NF; fi++)
fname[fi]=$fi;
}
(NR!=1){
for(fi=1; fi<=NF; fi++)
arr[fname[fi]][$fi]++;
}
END{
for(fi=1; fi<=NF; fi++){
out=fname[fi];
for (item in arr[fname[fi]])
out=out"\t"item"_"arr[fname[fi]][item];
print(out);
}
}
' "$1"
Execution Example:
bash> ./script.sh <path to tab-delimited file>
Output Example
isRef A_15 C_42 G_24 T_18
isCar YEA_10 NO_40 NA_50
isTv FALSE_33 TRUE_66
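The arr[a][b] syntax in the script above is specific to GNU awk. As a rough sketch of an alternative that should also run in other awk implementations, the two subscripts can be combined into one key with SUBSEP. The name column_counts is hypothetical, and the output layout (one line per column/value pair) differs from the script above:

```shell
#!/bin/bash
# column_counts FILE: for a tab-separated file with a header line, print
# "column<TAB>value<TAB>count" for every distinct value in every column.
# (Hypothetical helper name; the output is sorted only to make it stable.)
column_counts() {
    awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; next }
    { for (i = 1; i <= NF; i++) count[i, $i]++ }   # key is i SUBSEP value
    END {
        for (key in count) {
            split(key, part, SUBSEP)
            print name[part[1]] "\t" part[2] "\t" count[key]
        }
    }' "$1" | LC_ALL=C sort
}
```

It would be called as column_counts followed by the path to the tab-delimited file.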

Getting last X fields from a specific line in a CSV file using bash

I'm trying to get the list of users in my CSV file into a bash variable. The problem is that the number of users varies and can be anywhere from 1 to 5.
Example CSV file:
"record1_data1","record1_data2","record1_data3","user1","user2"
"record2_data1","record2_data2","record2_data3","user1","user2","user3","user4"
"record3_data1","record3_data2","record3_data3","user1"
I would like to get something like
list_of_users="cat file.csv | grep "record2_data2" | <something> "
echo $list_of_users
user1,user2,user3,user4
I'm trying this:
cat file.csv | grep "record2_data2" | awk -F, -v OFS=',' '{print $4,$5,$6,$7,$8 }' | sed 's/"//g'
My result is:
user2,user3,user4,,
Question:
How to remove all "," from the end of my result? Sometimes it is just one but sometimes can be user1,,,,
Can I do it in better way? Users always starts after 3rd column in my file.
This will do what your code seems to be trying to do (print the users for a given string record2_data2 which only exists in the 2nd field):
$ awk -F',' '{gsub(/"/,"")} $2=="record2_data2"{sub(/([^,]*,){3}/,""); print}' file.csv
user1,user2,user3,user4
but I don't see how that's related to your question subject of Getting last X records from CSV file using bash so idk if it's what you really want or not.
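Because the users always start in the fourth column, a sketch that stays closer to the pipeline in the question can take everything from field 4 onward with cut -f4-, which never produces trailing commas. The name users_for is hypothetical, and the sketch assumes that no field contains an embedded comma or double quote:

```shell
#!/bin/bash
# users_for KEY FILE: print the comma-separated users (field 4 onward) of the
# line that has KEY as one of its quoted fields.
# (Hypothetical helper name; assumes no embedded commas or quotes in fields.)
users_for() {
    grep -F "\"$1\"" "$2" | cut -d, -f4- | tr -d '"'
}

# Usage, capturing the result in a variable as the question asks:
# list_of_users=$(users_for record2_data2 file.csv)
```

Unlike the awk command above, grep -F matches the quoted key in any field, not only the second one.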
Better to use a bash array, and join it into a CSV string when needed:
#!/usr/bin/env bash
readarray -t listofusers < <(cut -d, -f4- file.csv | tr -d '"' | tr ',' $'\n' | sort -u)
IFS=,
printf "%s\n" "${listofusers[*]}"
cut -d, -f4- file.csv | tr -d '"' | tr ',' $'\n' | sort -u is the important bit - it first only prints out the fourth and following fields of the CSV input file, removes quotes, turns commas into newlines, and then sorts the resulting usernames, removing duplicates. That output is then read into an array with the readarray builtin, and you can manipulate it and the individual elements however you need.
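Setting IFS at the top level, as in the script above, changes word splitting for the rest of the script. One way to limit that, sketched here with a hypothetical function name join_by_comma, is to make IFS local to a small helper:

```shell
#!/bin/bash
# join_by_comma ITEM...: print the arguments joined with commas.
# "local IFS" limits the IFS change to this function. (Hypothetical helper name.)
join_by_comma() {
    local IFS=,
    printf '%s\n' "$*"
}

users=(user1 user2 user3)
join_by_comma "${users[@]}"
```

With the sample array this prints user1,user2,user3.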
GNU sed solution, let file.csv content be
"record1_data1","record1_data2","record1_data3","user1","user2"
"record2_data1","record2_data2","record2_data3","user1","user2","user3","user4"
"record3_data1","record3_data2","record3_data3","user1"
then
sed -n -e 's/"//g' -e '/record2_data/ s/[^,]*,[^,]*,[^,]*,// p' file.csv
gives output
user1,user2,user3,user4
Explanation: -n turns off automatic printing. The first expression deletes every " by globally substituting it with the empty string. The second expression applies only to lines containing record2_data: it substitutes (s) everything up to and including the third , with the empty string and prints (p) the changed line.
(tested in GNU sed 4.2.2)
awk -F',' '
/record2_data2/{
for(i=4;i<=NF;i++) o=sprintf("%s%s,",o,$i);
gsub(/"|,$/,"",o);
print o
}' file.csv
user1,user2,user3,user4
This might work for you (GNU sed):
sed -E '/record2_data/!d;s/"([^"]*)"(,)?/\1\2/4g;s///g' file
Delete all records except for that containing record2_data.
Remove double quotes from the fourth field onward.
Remove any double quoted fields.

Show with star symbols how many times a user have logged in

I'm trying to create a simple shell script showing how many times a user has logged in to their linux machine for at least one week. The output of the shell script should be like this:
2021-12-16
****
2021-12-15
**
2021-12-14
*******
I have tried this so far, but it only shows the counts as numbers, and I want to show * symbols instead.
user="$1"
last -F | grep "${user}" | sed -E "s/${user}.*(Mon|Tue|Wed|Thu|Fri|Sat|Sun) //" | awk '{print $1"-"$2"-"$4}' | uniq -c
Any help?
You might want to refactor all of this into a simple Awk script, where repeating a string n times is also easy.
user="$1"
last -F |
awk -v user="$1" 'BEGIN { split("Jan:Feb:Mar:Apr:May:Jun:Jul:Aug:Sep:Oct:Nov:Dec", m, ":");
for(i=1; i<=12; i++) mon[m[i]] = sprintf("%02i", i) }
$1 == user { ++count[$8 "-" mon[$5] "-" sprintf("%02i", $6)] }
END { for (date in count) {
padded = sprintf("%-" count[date] "s", "*");
gsub(/ /, "*", padded);
print date, padded } }'
The BEGIN block creates an associative array mon which maps English month abbreviations to month numbers.
sprintf("%02i", number) produces the value of number with zero padding to two digits (i.e. adds a leading zero if number is a single digit).
The $1 == user condition matches the lines where the first field is equal to the user name we passed in. (Your original attempt had two related bugs here; it would look for the user name anywhere in the line, so if the user name happened to match on another field, it would erroneously match on that; and the regex you used would match a substring of a longer field).
When that matches, we just update the value in the associative array count whose key is the current date.
Finally, in the END block, we simply loop over the values in count and print them out. Again, we use sprintf to produce a field with a suitable length. We play a little trick here by space-padding to the specified width, because sprintf does that out of the box, and then replace the spaces with more asterisks.
Your desired output shows the asterisks on a separate line from the date; obviously, it's easy to change that if you like, but I would advise against it in favor of a format which is easy to sort, grep, etc (perhaps to then reformat into your desired final human-readable form).
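The pad-then-replace trick can be tried in isolation. This is a small sketch with a hypothetical function name stars, assuming the argument is a positive integer:

```shell
#!/bin/bash
# stars N: print N asterisks using the pad-then-replace trick in awk.
# (Hypothetical helper name; N is assumed to be a positive integer.)
stars() {
    awk -v n="$1" 'BEGIN {
        padded = sprintf("%-" n "s", "*")   # "*" left-aligned in a field n wide
        gsub(/ /, "*", padded)              # turn the padding into asterisks
        print padded
    }'
}

stars 4
```

The call stars 4 prints ****.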
If you have GNU sed you're almost there. Just pipe the output of uniq -c to this GNU sed command:
sed -En 's/^\s*(\S+)\s+(\S+).*/printf "\2\n%\1s" ""/e;s/ /*/g;p'
Explanation: in the output of uniq -c we substitute a line like:
6 Dec-15-2021
by:
printf "Dec-15-2021\n%6s" ""
and we use the e GNU sed flag (this is a GNU sed extension so you need GNU sed) to pass this to the shell. The output is:
Dec-15-2021
where the second line contains 6 spaces. This output is copied back into the sed pattern space. We finish by a global substitution of spaces by stars and print:
Dec-15-2021
******
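Where GNU sed is not available, the same conversion can be sketched with a plain bash loop over the output of uniq -c. The name counts_to_stars is hypothetical, and the sketch relies on the bash printf builtin accepting a * field width:

```shell
#!/bin/bash
# counts_to_stars: read "COUNT DATE" lines (the output of uniq -c) from stdin
# and print each date followed by a line of COUNT asterisks.
# (Hypothetical helper name; relies on printf accepting a * field width.)
counts_to_stars() {
    local count date
    while read -r count date; do
        printf '%s\n' "$date"
        printf '%*s\n' "$count" '' | tr ' ' '*'
    done
}

printf '      6 Dec-15-2021\n      2 Dec-14-2021\n' | counts_to_stars
```

Here read strips the leading spaces that uniq -c adds, so count receives the number and date receives the rest of the line.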
A simple solution, using a temp file
#!/bin/bash
user="$1"
tempfile="/tmp/last.txt"
IFS='
'
last -F | grep "${user}" | sed -E "s/"${user}".*(Mon|Tue|Wed|Thu|Fri|Sat|Sun) //" | awk '{print $1"-"$2"-"$4}' | uniq -c > $tempfile
for LINE in $(cat $tempfile)
do
qtde=$(echo "$LINE" | awk '{print $1}')
data=$(echo "$LINE" | awk '{print $2}')
echo -e "$data "
for ((i=1; i<=qtde; i++))
do
echo -e "*\c"
done
echo -e "\n"
done

grep - how to display another word instead of the matching of grep

Given input like:
ID VALUE
technique lol
technology case
london knife
ocean sky
I'm currently using
grep -Eo '^[^ ]+' FILE | grep "tech"
to match every word containing "tech" in the ID column.
In this case, it displays:
technique
technology
However, can anyone tell me how to display the word from the second column that corresponds to the matching word in the first column?
For example how to display the word:
lol
case
(display the value instead of the key)
Also, how can I display the key (as above) and the value separated by "=" (without any spaces), like this?
key=value
Thanks
You can grep for lines starting with "tech" and then just display the second column. The exact format depends on how your input file columns are separated. If they are tab separated:
grep '^tech' FILE | cut -f 2
If they are space separated:
grep '^tech' FILE | tr -s ' ' $'\t' | cut -f 2
This "squeezes" repeated spaces and replaces them with a single tab character.
For your second question, you can use
sed -n '/^tech/ s/[[:space:]]\+/=/p' FILE
This means "don't print (-n); on lines matching ^tech, make the substitution and print".
Using awk:
awk '$1 ~ "tech" {print $2}' < inputfile
or with key=value
awk '$1 ~ "tech" {print $1"="$2}' < inputfile
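Note that $1 ~ "tech" matches tech anywhere in the first column, not only at the start. A sketch that only accepts keys beginning with the given text, using the awk index function and a hypothetical function name key_value, might look like this:

```shell
#!/bin/bash
# key_value PREFIX: print key=value for rows whose first column starts with
# PREFIX. (Hypothetical helper name; PREFIX is compared as a literal string.)
key_value() {
    awk -v p="$1" 'index($1, p) == 1 { print $1 "=" $2 }'
}

printf 'ID VALUE\ntechnique lol\ntechnology case\nlondon knife\nocean sky\n' | key_value tech
```

With the sample input from the question this prints technique=lol and technology=case.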

Oneliner to calculate complete size of all messages in maillog

Ok guys I'm really at a dead end here, don't know what else to try...
I am writing a script for some e-mail statistics, one of the things it needs to do is calculate the complete size of all messages in the maillog, this is what I wrote so far:
egrep ' HOSTNAME sendmail\[.*.from=.*., size=' maillog | awk '{print $8}' |
tr "," "+" | tr -cd '[:digit:][=+=]' | sed 's/^/(/;s/+$/)\/1048576/' |
bc -ql | awk -F "." '{print $1}'
And here is a sample line from my maillog:
Nov 15 09:08:48 HOSTNAME sendmail[3226]: oAF88gWb003226:
from=<name.lastname@domain.com>, size=40992, class=0, nrcpts=24,
msgid=<E08A679A54DA4913B25ADC48CC31DD7F@domain.com>, proto=ESMTP,
daemon=MTA1, relay=[1.1.1.1]
So I'll try to explain it step by step:
First I grep through the file to find all the lines containing the actual "size"; next I print the 8th field, in this case "size=40992,".
Next I replace all the comma characters with a plus sign.
Then I delete everything except the digits and the plus sign.
Then I replace the beginning of the line with a "(", and I replace the last extra plus sign with a ")" followed by "/1048576". So I get a huge expression looking like this:
"(1+2+3+4+5...+n)/1048576"
Because I want to add up all the individual message sizes and divide the total so I get the result in MB.
The last awk command handles a decimal result: I don't really care about precision, so I just print the part before the decimal point.
The problem is, this doesn't work... And I could swear it was working at one point, could it be my expression is too long for bc to handle?
Thanks if you took the time to read through :)
I think a one-line awk script will work too. It matches any line that your egrep pattern matches, then for those lines it splits the eighth field by the = sign and adds the second part (the number) to the SUM variable. When it reaches the END of the file it prints out the value of SUM/1048576 (the byte count in mebibytes).
awk '/ HOSTNAME sendmail\[.*.from=.*., size=/{ split($8,a,"=") ; SUM += a[2] } END { print SUM/1048576 }' maillog
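As a variation, the size can be looked up by its size= prefix rather than by field position, and the result truncated to whole mebibytes inside awk, so no trailing awk is needed. This is only a sketch: total_size_mib is a hypothetical name, and the pattern leaves out the HOSTNAME placeholder on the assumption that all sendmail lines in the log should be counted.

```shell
#!/bin/bash
# total_size_mib: sum the size= values of sendmail "from=" lines read from
# stdin and print the total in whole mebibytes. (Hypothetical helper name.)
total_size_mib() {
    awk '/ sendmail\[[0-9]+\]: .*from=.*, size=/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^size=[0-9]+,?$/) {
                split($i, a, "=")
                sum += a[2]        # "40992," converts to the number 40992
            }
    }
    END { printf "%d\n", sum / 1048576 }'
}
```

It reads the log on standard input, for example total_size_mib < maillog.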
bc chokes if there is no newline in its input, as happens with your expression. You have to change the sed part to:
sed 's/^/(/;s/+$/)\/1048576\n/'
The final awk will happily eat all your output if the total size is less than 1MB and bc outputs something like .03333334234. If you are not interested in the decimal part remove that last awk command and the -l parameter from bc.
I'd do it with this one-liner:
grep ' HOSTNAME sendmail[[0-9][0-9]*]:..*:.*from=..*, size=' maillog | sed 's|.*, size=\([0-9][0-9]*\), .*|\1+|' | tr -d '\n' | sed 's|^|(|; s|$|0)/1048576\n|' | bc
