Extract number in every line of TSV file - bash

I have a file with tab-separated-values and also with blank spaces like this:
! (desambiguación) http://es.dbpedia.org/resource/!_(desambiguación) 5
! (álbum) http://es.dbpedia.org/resource/!_(álbum_de_Trippie_Redd) 2
!! http://es.dbpedia.org/resource/!! 4
$9.99 http://es.dbpedia.org/resource/$9.99 6
Tomlinson http://es.dbpedia.org/resource/(10108)_Tomlinson 20
102 Miriam http://es.dbpedia.org/resource/(102)_Miriam 2
2003 QQ47 http://es.dbpedia.org/resource/(143649)_2003_QQ47 2
I want to extract the last number of every line:
5
2
4
6
20
2
2
For that, I have done this:
while read line; do
    NUMBER=$(echo $line | cut -f 3 -d ' ')
    echo $NUMBER
done < $PAIRCOUNTS_FILE
The main problem is that some lines have more spaces than others, and cut doesn't work for me with the default delimiter (tab). I don't know why; maybe it is because I am using WSL.
I have tried cut with several options, but none of them work:
NUMBER=$(echo $line | cut -f 3 -d ' ')
NUMBER=$(echo $line | cut -f 4 -d ' ')
NUMBER=$(echo $line | cut -f 2)
NUMBER=$(echo $line | cut -f 3)
Hope you can help me with this. Thanks in advance.

I want to extract the last number of every line:
You could use grep
grep -Eo '[[:digit:]]+$' file
Or mapfile aka readarray, which is a bash4+ feature.
mapfile -t array < file
printf '%s\n' "${array[@]##*	}" # that is a literal TAB before the closing brace

You can use awk:
awk '{print $NF}' file
With cut (if it is truly TAB separated and 3 fields per line):
cat file | cut -f3
If you have some variable number of fields per line, use rev|cut|rev to get the last field:
cat file | rev | cut -f1 | rev
Or with pure Bash and parameter expansion:
while IFS= read -r line; do
    last=${line##*	} # that is a literal TAB in the parameter expansion
    printf "%s\n" "$last"
done < file
Or, read into a bash array and echo the last field:
while IFS=$'\t' read -r -a arr; do
    echo "${arr[${#arr[@]}-1]}"
done < file
If you have a mixture of tabs and spaces, you can do what is usually a mistake: let an unquoted Bash variable be word-split on whitespace in general (tabs and spaces) into an array:
while IFS= read -r line; do
    arr=($line) # unquoted, so it breaks on either tab or space
    echo "${arr[${#arr[@]}-1]}"
done < file
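As a quick check, here is a sketch using a two-line stand-in for the question's file (real tabs are written with printf, since the file in the question is tab-separated):

```shell
# Recreate a small tab-separated sample like the question's file
printf '%s\t%s\t%s\n' '!!' 'http://es.dbpedia.org/resource/!!' 4 > sample.tsv
printf '%s\t%s\t%s\n' '102 Miriam' 'http://es.dbpedia.org/resource/(102)_Miriam' 2 >> sample.tsv

# Both approaches print only the trailing number of each line,
# even though "102 Miriam" contains an embedded space
awk '{print $NF}' sample.tsv          # -> 4, then 2
grep -Eo '[[:digit:]]+$' sample.tsv   # -> 4, then 2
```

Note that `awk '{print $NF}'` splits on runs of any whitespace, which is exactly why embedded spaces in the first column don't matter here.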

Related

how to awk pattern as variable and loop the result?

I assign a keyword to a variable and need to awk from a file using this variable in a loop. The file has millions of lines.
I have tried the code below.
DEVICE="DEV2"
while read -r line
do
    echo $line
    X_keyword=`echo $line | cut -d ',' -f 2 | grep -w "X" | cut -d '=' -f2`
    echo $X_keyword
done <<< "$(grep -w $DEVICE $config)"
log="Dev2_PRT.log"
while read -r file
do
    VALUE=`echo $file | cut -d '|' -f 1`
    HEADER=`echo $VALUE | cut -c 1-4`
    echo $file
    if [[ $HEADER = 'PTR:' ]]; then
        VALUE=`echo $file | cut -d '|' -f 4`
        echo $VALUE
        XCOORD+=($VALUE)
        ((X++))
    fi
done <<< "awk /$X_keyword/ $log"
expected result:
the log files content lots of below:
PTR:1|2|3|4|X_keyword
PTR:1|2|3|4|Y_rest .....
Filter on X_keyword and get field number 4.
Unfortunately your shell script is simply the wrong approach to this problem (see https://unix.stackexchange.com/q/169716/133219 for some of the reasons why) so you should set it aside and start over.
To demonstrate the solution, let's create a sample input file:
$ seq 10 | tee file
1
2
3
4
5
6
7
8
9
10
and a shell variable to hold a regexp that's a character list of the chars 5, 6, or 7:
$ var='[567]'
Now, given the above input, here is the solution for how to g/re/p pattern as variable and count how many results:
$ awk -v re="$var" '$0~re{print; c++} END{print "---" ORS c+0}' file
5
6
7
---
3
If that's not all you need then please edit your question to clarify your requirements and provide concise, testable sample input and expected output.
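Applying the same `-v` idea directly to the PTR log format from the question (the log lines below are hypothetical samples in the asker's stated format; field 4 is what they wanted):

```shell
# Hypothetical log in the question's "PTR:a|b|c|d|keyword" format
printf 'PTR:1|2|3|4|X_keyword\nPTR:1|2|3|9|Y_rest\n' > Dev2_PRT.log

# Match lines containing the keyword held in a shell variable,
# then print field 4 of the |-separated record
kw="X_keyword"
awk -F'|' -v re="$kw" '$0 ~ re {print $4}' Dev2_PRT.log   # -> 4
```

This replaces the whole `while read | cut | grep` pipeline with a single awk invocation, which matters at millions of lines.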

bash scripting to add users

I created a bash script that reads information such as username, group, etc. from a text file and creates users in Linux based on it. The code seems to work and creates the users as desired, but the user information in the last line of the text file always gets misinterpreted. Even if I delete that line, the new last line gets misread in turn.
#!/bin/bash
userfile="users.txt"
IFS=$'\n'
if [ ! -f "$userfile" ]
then
echo "File does not exist. Specify a valid file and try again. "
exit
fi
groups=(`cut -f 4 "$userfile" | sed 's/ //'`)
fullnames=(`cut -f 1 "$userfile" | sed 's/,//' | sed 's/"//g'`)
username1=(`cut -f 1 "$userfile" |sed 's/,//' | sed 's/"//' | tr [A-Z] [a-z] | awk '{print substr($2,1,1) substr($3,1,1) substr($1,1,1)}'`)
username2=(`cut -f 4 "$userfile" | tr [A-Z] [a-z] | awk '{print substr($1,1,1)}'`)
i=0
n=${#username1[@]}
for (( q=0; q<n; q++ ))
do
    usernames[$q]=${username1[$q]}"${username2[$q]}"
done
declare -a usernames
x=0
created=0
for user in ${usernames[*]}
do
    adduser -c ${fullnames[$x]} -p 123456789 -f 15 -m -d /home/${groups[$x]}/$user -K LOGIN_RETRIES=3 -K PASS_MAX_DAYS=30 -K PASS_WARN_AGE=3 -N -s /bin/bash $user 2> /dev/null
    usermod -g ${groups[$x]} $user
    chage -d 0 $user
    let created=$created+1
    x=$x+1
    echo -e "User $user created "
done
echo "$created Users created"
#!/bin/bash
userfile="./users.txt"; # <-- Config
while read line; do
    # FULL NAME
    # Capture everything between quotes as the full name
    fullname=$(printf '%s' "${line}" | sed 's/^"\(.*\)".*/\1/')
    # Remove spaces and punctuation:
    fullname=$(printf '%s' "${fullname}" | tr -d '[:punct:][:blank:]')
    # Right-side fields:
    partb=$(printf '%s' "${line}" | sed "s/^\".*\"//g")
    # CODE 1, capture the second field
    code1=$(printf '%s' "${partb}" | cut -f 2)
    # CODE 2, capture the third field
    code2=$(printf '%s' "${partb}" | cut -f 3)
    # GROUP, capture the fourth field
    group=$(printf '%s' "${partb}" | cut -f 4)
    # Print only for the report (printf, because plain echo does not expand \n)
    printf 'fullname: %s\n code 1: %s\n code 2: %s\n group: %s\n' "${fullname}" "${code1}" "${code2}" "${group}"
done < "${userfile}"
Maybe these are the fields that you want; now you have them in variables to manipulate: $fullname, $code1, $code2 and $group.
That said, the failure you observed may have been caused by a misplaced quotation mark in the text file or by the line breaks; in the attached screenshot I can see one missing quote.
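A minimal sketch of that extraction on one record (the real users.txt layout isn't shown in the question, so this assumes a quoted full name followed by tab-separated fields; the name and group below are made up):

```shell
# Hypothetical record: quoted full name, then tab-separated codes and group
line=$'"Doe, John A."\t1001\t2002\tstaff'

# Strip the surrounding quotes, then drop punctuation and blanks
fullname=$(printf '%s' "$line" | sed 's/^"\(.*\)".*/\1/' | tr -d '[:punct:][:blank:]')
# Everything after the quoted part is tab-separated; take field 4
group=$(printf '%s' "$line" | sed 's/^".*"//' | cut -f 4)

printf '%s %s\n' "$fullname" "$group"   # -> DoeJohnA staff
```

Parsing record by record like this avoids the "last line misread" symptom that slicing whole columns with four separate `cut` invocations can produce when one line is malformed.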

Creating a bash array, separated by new lines

I am reading in from a .txt file which looks something like this:
:DRIVES
name,server,share_1
other_name,other_server,share_2
new_name,new_server,share 3
:NAME
which is information for mounting drives. I want to load the lines into a bash array to cycle through and mount them; however, my current code breaks at the third line because the array is split on any whitespace. Instead of reading
new_name,new_server,share 3
as one line, it reads it as 2 lines
new_name,new_server,share
3
I have tried changing the value of IFS to
IFS=$'\n' #and
IFS='
'
however neither work. How can I create an array from the above file separated by newlines. My code is below.
file_formatted=$(cat ~/location/to/file/test.txt)
IFS='
' # also tried $'\n'
drives=($(sed 's/^.*:DRIVES //; s/:.*$//' <<< $file_formatted))
for line in "${drives[@]}"
do
    # separates each line into individual parts
    drive="$(echo $line | cut -d, -f2)"
    server="$(echo $line | cut -d, -f3)"
    share="$(echo $line | cut -d, -f4 | tr '\' '/' | tr '[:upper:]' '[:lower:]')"
    #mount //$server/$share using osascript
    #script breaks because it tries to mount /server/share instead of /server/share 3
EDIT:
tried this and got the same output as before:
drives=$(sed 's/^.*:DRIVES //; s/:.*$//' <<< $file_formatted)
while IFS= read -r line; do
    printf '%s\n' "$line"
done <<< "$drives"
This is the correct way to iterate over your file; no arrays needed.
{
    # Skip over lines until we read :DRIVES
    while IFS= read -r line; do
        [[ $line = :DRIVES ]] && break
    done
    # Split each comma-separated line into the desired variables,
    # until we read :NAME, at which point we break out of this loop
    while IFS=, read -r drive server share; do
        [[ $drive == :NAME ]] && break
        # Use $drive, $server, and $share
    done
    # No need to read the rest of the file, if any
} < ~/location/to/file/test.txt
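Run against a stand-in for the question's file, that loop keeps "share 3" intact (a sketch; printf stands in for the actual osascript mount command):

```shell
# Recreate the question's file format
cat > test.txt <<'EOF'
:DRIVES
name,server,share_1
new_name,new_server,share 3
:NAME
EOF

{
    # Skip until the :DRIVES marker
    while IFS= read -r line; do
        [[ $line = :DRIVES ]] && break
    done
    # Split on commas only; embedded spaces survive in $share
    while IFS=, read -r drive server share; do
        [[ $drive == :NAME ]] && break
        printf '//%s/%s\n' "$server" "$share"
    done
} < test.txt
# -> //server/share_1
# -> //new_server/share 3
```

Because the split happens in `read` with IFS set to a comma (not via unquoted expansion), the space in "share 3" is never treated as a separator.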

bash assign variable to another after operation

I'm trying to print domain and topLeveldomain variables (example.com)
$line = example.com
domain =$line | cut -d. -f 1
topLeveldomain = $line | cut -d. -f 2
However when I try and echo $domain, it doesn't display desired value
test.sh: line 4: domain: command not found
test.sh: line 5: topLeveldomain: command not found
I suggest:
line="example.com"
domain=$(echo "$line" | cut -d. -f 1)
topLeveldomain=$(echo "$line" | cut -d. -f 2)
The right code for this should be:
line="example.com"
domain=$(echo "$line" | cut -d. -f 1)
topLeveldomain=$(echo "$line" | cut -d. -f 2)
Consider the right syntax of bash:
variable=value
(there are no blanks allowed)
if you want to use the content of the variable you have to add a leading $
e.g.
echo $variable
You don't need external tools for this, just do this in bash
$ string="example.com"
# print everything up to the first delimiter '.'
$ printf "${string%%.*}\n"
example
# print everything after the first delimiter '.'
$ printf "${string#*.}\n"
com
Remove spaces around =:
line=example.com # YES
line = example.com # NO
When you create a variable, do not prepend $ to the variable name:
line=example.com # YES
$line=example.com # NO
When using pipes, you need to pass standard output to the next command. That means you usually need to echo variables or cat files:
echo $line | cut -d. -f1 # YES
$line | cut -d. -f1 # NO
Use the $() syntax to get the output of a command into a variable:
new_variable=$(echo $line | cut -d. -f1) # YES
new_variable=echo $line | cut -d. -f1 # NO
I would rather use AWK:
domain="abc.def.hij.example.com"
awk -F. '{printf "TLD:%s\n2:%s\n3:%s\n", $NF, $(NF-1), $(NF-2)}' <<< "$domain"
Output
TLD:com
2:example
3:hij
In the command above, -F option specifies the field separator; NF is a built-in variable that keeps the number of input fields.
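For completeness, the same last-fields split can be done in pure Bash by reading the dot-separated parts into an array (a sketch, no external tools needed):

```shell
domain="abc.def.hij.example.com"
# Split on dots into an array; read does the word-splitting, not the shell
IFS=. read -r -a parts <<< "$domain"
echo "TLD: ${parts[${#parts[@]}-1]}"   # -> TLD: com
echo "2:   ${parts[${#parts[@]}-2]}"   # -> 2:   example
```

Setting IFS only for the `read` command keeps the change local, so the rest of the script's word-splitting is unaffected.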
Issues with Your Code
The issues with your code are due to invalid syntax.
To set a variable in the shell, use
VARNAME="value"
Putting spaces around the equal sign will cause errors. It is a good
habit to quote content strings when assigning values to variables:
this will reduce the chance that you make errors.
Refer to the Bash Guide for Beginners.
this also works:
line="example.com"
domain=$(echo $line | cut -d. -f1)
toplevel=$(cut -d. -f2 <<<$line)
echo "domain name=" $domain
echo "Top Level=" $toplevel
You need to remove the $ from line at the beginning, fix the spaces, and echo $line in order to pipe its value to cut. Alternatively, feed cut with $line directly (e.g. via a here-string).

hash each line in text file

I'm trying to write a little script which will open a text file and give me an md5 hash for each line of text. For example I have a file with:
123
213
312
I want output to be:
ba1f2511fc30423bdbb183fe33f3dd0f
6f36dfd82a1b64f668d9957ad81199ff
390d29f732f024a4ebd58645781dfa5a
I'm trying to do this part in bash which will read each line:
#!/bin/bash
#read.file.line.by.line.sh
while read line
do
    echo $line
done
later on I do:
$ more 123.txt | ./read.line.by.line.sh | md5sum | cut -d ' ' -f 1
but I'm missing something here; it does not work :(
Maybe there is an easier way...
Almost there, try this:
while read -r line; do printf %s "$line" | md5sum | cut -f1 -d' '; done < 123.txt
Unless you also want to hash the newline character at the end of every line, you should use printf or echo -n instead of plain echo.
In a script:
#! /bin/bash
cat "$#" | while read -r line; do
printf %s "$line" | md5sum | cut -f1 -d' '
done
The script can be called with multiple files as parameters.
You can just call md5sum directly in the script:
#!/bin/bash
#read.file.line.by.line.sh
while read line
do
    echo -n $line | md5sum | awk '{print $1}' # -n: do not hash the trailing newline
done
That way the script spits out directly what you want: the md5 hash of each line.
This worked for me:
cat $file | while read line; do printf %s "$line" | tr -d '\r\n' | md5 >> hashes.csv; done
