Trim line to the first comma (bash)

I have a line from which I need to cut the branch name to the first comma:
commit 2bea9e0351dae65f18d2de11621049b465b1e868 (HEAD, origin/MGB-322, refs/pipelines/36877)
I need to cut out MGB-322.
The number of characters in a line is always different.
awk -F "origin/" '{print $2}' - this is how I cut out
MGB-322, refs/pipelines/36877)
But how to tell it to trim to the first comma?
I tried doing it via substr,
awk -F "origin/" '{print substr ($2,1, index $2 ,)}'
But it is not clear how to correctly specify the comma in index

With any awk. Use / and , as field separator:
awk '{print $3}' FS='[/,]' file
Output:
MGB-322
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
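As for the substr attempt in the question: index takes its arguments in parentheses, and the comma is passed as a quoted string. A minimal sketch of that approach, assuming the text after origin/ contains a comma (the function name branch_substr is made up for illustration):

```shell
# Hypothetical helper: print the text between "origin/" and the next comma.
branch_substr() {
    printf '%s\n' "$1" |
        awk -F 'origin/' '{
            end = index($2, ",")          # position of the first comma in $2, 0 if none
            if (end > 0)
                print substr($2, 1, end - 1)
        }'
}

line='commit 2bea9e0351dae65f18d2de11621049b465b1e868 (HEAD, origin/MGB-322, refs/pipelines/36877)'
branch_substr "$line"
```

If the line has no comma after origin/, this sketch prints nothing rather than guessing.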

Fixing OP's code: this assumes origin/ occurs only once in the line; if it occurs more than once, change $NF to $2 in the following code. Written and tested in https://ideone.com/xjv2we
awk -F"origin/" '{sub(/,.*/, "", $NF); print $NF}' Input_file
sed could also help here. This generic solution is based on the first comma and the first / after it, as per OP's thread title. I wrote this on mobile, so I couldn't test it yet; it should work, and I will test it later.
sed 's/[^,]*,[^/]*\/\([^,]*\).*/\1/' Input_file

"I need to cut out MGB-322."
You can use cut in two steps:
echo "${line}" | cut -d"/" -f2 | cut -d"," -f1
I would prefer one step with awk (already answered by others) or sed
echo "${line}" | sed -r 's/.*origin.(.*), refs.*/\1/'

Why spawn procs? bash's built-in parameter parsing will handle this.
If
$: line="commit 2bea9e0351dae65f18d2de11621049b465b1e868 (HEAD, origin/MGB-322, refs/pipelines/36877)"
then
$: [[ "$line" =~ .*origin.(.*), ]] && echo "${BASH_REMATCH[1]}"
MGB-322
or maybe
$: tmp=${line#*, origin/}; echo ${tmp%,*}
MGB-322
or even
$: IFS=",/" read _ _ x _ <<< "$line" && echo $x
MGB-322
c.f. https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html
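Note that ${tmp%,*} removes from the last comma onward, which works here only because a single comma follows the branch name. A small sketch that cuts at the first comma instead, using %%; the function name branch_name and the shortened sample lines are invented for illustration:

```shell
# Hypothetical helper: strip through "origin/", then cut at the FIRST comma.
# Note: tmp is a plain (global) variable to keep the sketch POSIX-compatible.
branch_name() {
    tmp=${1#*origin/}           # remove the shortest prefix ending in "origin/"
    printf '%s\n' "${tmp%%,*}"  # remove the longest suffix starting at a comma
}

branch_name 'commit 2bea9e0 (HEAD, origin/MGB-322, refs/pipelines/36877)'
branch_name 'commit 2bea9e0 (HEAD, origin/MGB-322, refs/pipelines/36877, tag: v1)'
```

Both calls print the same branch name, even though the second line has an extra comma after it.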

Related

Removing newlines in a txt file

I have a txt file in a format like this:
test1
test2
test3
How can I bring it into a format like this using bash?
test1,test2,test3
Assuming that “using Bash” means “without any external processes”:
if IFS= read -r line; then
printf '%s' "$line"
while IFS= read -r line; do
printf ',%s' "$line"
done
echo
fi
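The loop above reads standard input and skips a final line that has no trailing newline, because read returns non-zero at end of file. A sketch that wraps the same idea in a function taking a file name and keeps such a last line; join_lines is a name made up here:

```shell
# Hypothetical helper: join the lines of file "$1" with commas, using only shell built-ins.
join_lines() {
    first=1
    # '|| [ -n "$line" ]' keeps a last line that lacks a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        if [ "$first" -eq 1 ]; then
            printf '%s' "$line"
            first=0
        else
            printf ',%s' "$line"
        fi
    done < "$1"
    echo
}

# Demo on a throwaway file whose last line has no newline
demo=$(mktemp)
printf 'test1\ntest2\ntest3' > "$demo"
join_lines "$demo"
rm -f "$demo"
```

For an empty file this prints just an empty line.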
Old answer here
TL;DR:
paste -sd "," "export.txt"
Another pure bash implementation that avoids explicit loops:
#!/usr/bin/env bash
file2csv() {
local -a lines
readarray -t lines <"$1"
local IFS=,
printf "%s\n" "${lines[*]}"
}
file2csv input.txt
You can use awk. If the file name is test.txt then
awk '{print $1}' ORS=',' test.txt | awk '{print substr($1, 1, length($1)-1)}'
The first awk command joins the three lines with commas (test1,test2,test3,).
The second awk command just deletes the last comma from the string.
Use tr (translate) to turn the newlines into commas, and sed to remove the last comma:
tr '\n' , < "$source_file" | sed 's/,$//'
If you want to save the output into a variable:
var="$( tr '\n' , < "$source_file" | sed 's/,$//' )"
Using sed:
$ sed ':a;N;$!ba;s/\n/,/g' file
Output:
test1,test2,test3
I think this is where I originally picked it up.
If you don't want a terminating newline:
$ awk '{printf "%s%s", sep, $0; sep=","}' file
test1,test2,test3
or if you do:
awk '{printf "%s%s", sep, $0; sep=","} END{print ""}' file
test1,test2,test3
Another loopless pure Bash solution:
contents=$(< input.txt)
printf '%s\n' "${contents//$'\n'/,}"
contents=$(< input.txt) is equivalent to contents=$(cat input.txt). It puts the contents of the input.txt file (with trailing newlines automatically removed) into the variable contents.
"${contents//$'\n'/,}" replaces all occurrences of the newline character ($'\n') in contents with the comma character. See Parameter expansion [Bash Hackers Wiki].
See the accepted, and excellent, answer to Why is printf better than echo? for an explanation of why printf '%s\n' is used instead of echo.

How to print from the first period to the second period using awk?

20200601.title.info.event.txt is the file name. I want to use awk to print whatever is in between the first two periods. So in this case "title." Appreciate any help.
The file is a $1 variable. All files are formatted with the same info.
I'm using
FILE=$1
DATE=echo $FILE | awk '{printf "%", substr ($1,1,8)' -
TITLE=
I need to grab the TITLE in between the first two periods to use as a variable elsewhere. The TITLE will be different for every file. Thank you all for the help.
Like this:
cut -d'.' -f2 <<< 20200601.title.info.event.txt
This is the straightforward way to cut a string.
And with awk, as @Mihir wrote in the comments:
awk -F. '{print $2}' <<< 20200601.title.info.event.txt
Using awk's split() instead of FS, in case FS is used for something else:
$ awk '
BEGIN {
split("20200601.title.info.event.txt",a,".")
print a[2]
}'
title
Using bash's =~ operator:
$ [[ "20200601.title.info.event.txt" =~ \.[^.]*\. ]] && echo ${BASH_REMATCH[0]:1:-1}
title
Using sed:
$ sed 's/^[^.]*\.\|\..*//g' <<< "20200601.title.info.event.txt"
title
In all cases you could set the string to a variable first and use the variable instead of the string.
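For the script in the question, both values can also be taken from $1 with parameter expansion alone. A sketch, assuming the name always has the form DATE.TITLE.rest; the function split_name and the lowercase variable names are illustrative, not from the question:

```shell
# Hypothetical helper: set $date and $title from a name like DATE.TITLE.rest
split_name() {
    date=${1%%.*}      # everything before the first period
    rest=${1#*.}       # drop everything through the first period
    title=${rest%%.*}  # everything before the next period
}

split_name 20200601.title.info.event.txt
printf '%s %s\n' "$date" "$title"
```

Inside the original script, the call would be split_name "$1".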

Split String in Unix Shell Script

I have a String like this
//ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf
and want to get last part of
00000000957481f9-08d035805a5c94bf
Let's say you have
text="//ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf"
If you know the position, i.e. in this case the 9th, you can go with
echo "$text" | cut -d'/' -f9
However, if this is dynamic and you want to split at "/", it's safer to go with:
echo "${text##*/}"
This removes everything from the beginning to the last occurrence of "/" and should be the shortest form to do it.
For more information on this see: Bash Reference manual
For more information on cut see: cut man page
The tool basename does exactly that:
$ basename //ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf
00000000957481f9-08d035805a5c94bf
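One difference between the two approaches shows up with a trailing slash: basename ignores it, while ${text##*/} yields an empty string. A sketch of a helper that strips one trailing slash first; last_part is a made-up name and /ABC/REC/ is an invented input:

```shell
# Hypothetical helper: last path component without calling an external command.
last_part() {
    p=${1%/}                  # remove a single trailing slash, if present
    printf '%s\n' "${p##*/}"  # remove everything through the last remaining slash
}

last_part //ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf
last_part /ABC/REC/
```

The second call prints REC, the same answer basename gives for that input.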
I would use bash string function:
$ string="//ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf"
$ echo "${string##*/}"
00000000957481f9-08d035805a5c94bf
But following are some other options:
$ awk -F'/' '$0=$NF' <<< "$string"
00000000957481f9-08d035805a5c94bf
$ sed 's#.*/##g' <<< "$string"
00000000957481f9-08d035805a5c94bf
Note: <<< is here-string notation. Here-strings do not create a subshell; however, they are NOT portable to POSIX sh (as implemented by shells such as ash or dash).
In case you want more than just the last part of the path,
you could do something like this:
echo $PWD | rev | cut -d'/' -f1-2 | rev
You can use this BASH regex:
s='//ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf'
[[ "$s" =~ [^/]+$ ]] && echo "${BASH_REMATCH[0]}"
00000000957481f9-08d035805a5c94bf
This can be done easily in awk:
string="//ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf"
echo "${string}" | awk -v FS="/" '{ print $NF }'
Use "/" as field separator and print the last field.
You can try this...
echo //ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf |awk -F "/" '{print $NF}'

Use Awk to extract substring

Given a hostname in format of aaa0.bbb.ccc, I want to extract the first substring before ., that is, aaa0 in this case. I use following awk script to do so,
echo aaa0.bbb.ccc | awk '{if (match($0, /\./)) {print substr($0, 0, RSTART - 1)}}'
Running the script on machine A produces aaa0, but running it on machine B produces only aaa, without the 0 at the end. Both machines run Ubuntu/Linaro, but A runs a newer awk (gawk version 3.1.8) while B runs an older awk (mawk version 1.2).
I am asking in general, how to write a compatible awk script that performs the same functionality ...
You just want to set the field separator as . using the -F option and print the first field:
$ echo aaa0.bbb.ccc | awk -F'.' '{print $1}'
aaa0
Same thing but using cut:
$ echo aaa0.bbb.ccc | cut -d'.' -f1
aaa0
Or with sed:
$ echo aaa0.bbb.ccc | sed 's/[.].*//'
aaa0
Even grep:
$ echo aaa0.bbb.ccc | grep -o '^[^.]*'
aaa0
Or just use cut:
echo aaa0.bbb.ccc | cut -d'.' -f1
I am asking in general, how to write a compatible awk script that
performs the same functionality ...
Solving the problem in your question is easy (see the other answers).
Writing an awk script that is portable to every awk implementation and version (gawk/nawk/mawk...) is really hard, even with gawk's --posix option.
For example:
some awks work on strings in terms of characters, some in terms of bytes
some support the \x escape, some do not
FS is interpreted differently
keyword/reserved word abbreviation restrictions
some operators are restricted, e.g. **
even within the same implementation (gawk, for example), versions 4.0 and 3.x differ too
the implementations of certain functions also differ (your problem is one example, see below)
All the points above are general. Back to your problem: it only involves a fundamental feature of awk. A line like awk '{print $x}' will work with all awks.
There are two reasons why your awk line behaves differently on gawk and mawk:
you used the substr() function incorrectly. This is the main cause. You have substr($0, 0, RSTART - 1); the 0 should be 1, no matter which awk you use. awk arrays, string indexes etc. are 1-based.
gawk and mawk implement substr() differently.
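With the start index set to 1, the result no longer depends on how an implementation treats position 0. A sketch of the same extraction using index() rather than match(), since only a literal period is searched for; host_prefix is a hypothetical name, and printing the whole name when there is no period is a choice made here, not behavior from the question:

```shell
# Hypothetical helper: print the part of a hostname before the first period.
host_prefix() {
    printf '%s\n' "$1" |
        awk '{
            dot = index($0, ".")              # 1-based position of the first period, 0 if none
            if (dot > 0)
                print substr($0, 1, dot - 1)  # substr positions start at 1
            else
                print                         # no period: print the whole name
        }'
}

host_prefix aaa0.bbb.ccc
```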
You don't need awk for this...
echo aaa0.bbb.ccc | cut -d. -f1
cut -d. -f1 <<< aaa0.bbb.ccc
echo aaa0.bbb.ccc | { IFS=. read a _ ; echo $a ; }
{ IFS=. read a _ ; echo $a ; } <<< aaa0.bbb.ccc
x=aaa0.bbb.ccc; echo ${x/.*/}
Heavier options:
sed:
echo aaa0.bbb.ccc | sed 's/\..*//'
sed 's/\..*//' <<< aaa0.bbb.ccc
awk:
echo aaa0.bbb.ccc | awk -F. '{print $1}'
awk -F. '{print $1}' <<< aaa0.bbb.ccc
You do not need any external command at all, just use Parameter Expansion in bash:
hostname=aaa0.bbb.ccc
echo ${hostname%%.*}
if you don't want to change the input field separator, then it's possible to use split function:
echo "some aaa0.bbb.ccc text" | awk '{split($2, a, "."); print a[1]}'
documentation:
split(string, array [, fieldsep [, seps ] ])
Divide string into pieces separated by fieldsep
and store the pieces in array and the separator
strings in the seps array.
awk is still the cleanest approach :
mawk NF=1 FS='[.]' <<< aaa0.bbb.ccc
aaa0
If there's stuff before or after :
mawk ++NF FS='[.].+$|^[^ ]* ' OFS= <<< 'some aaa0.bbb.ccc text'
mawk '$!NF=$2' FS='[ .]' <<< 'some aaa0.bbb.ccc text'
aaa0

Length of a specific field, and showing the record in much easier way

My goal is to find out the length of the second field and if the length is more than five characters, then I need to show the entire record using shell scripts/command.
echo "From the csv file"
cat latency.csv |
while read line
do
latency=`echo $line | cut -d"," -f2 | tr -d " "`
length=$(echo ${#latency})
if [ $length -gt 5 ]
then
echo $line
fi
done
There is nothing wrong with my code, but being UNIX/Linux, I thought there should be a simpler way of doing such things.
Is there one such simpler method?
awk -F, 'length($2)>5' file
this should work
updated
awk -F, '{a=$0;gsub(/ /,"",$2);if(length($2)>5)print a}' file
awk -F, '{
t = $2
gsub(/ /, x, t)
if (length(t) > 5)
print
}' latency.csv
Or:
perl -F, -ane'
print if
$F[1] =~ tr/ //dc > 5
' latency.csv
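To try the space-stripping awk filter without the real data, here is a sketch that wraps it in a function and runs it on a small sample file. The sample contents and the name long_latency are invented for illustration:

```shell
# Hypothetical helper: print records of CSV file "$1" whose second field,
# with spaces removed, is longer than five characters.
long_latency() {
    awk -F, '{
        t = $2
        gsub(/ /, "", t)      # ignore spaces when measuring the field
        if (length(t) > 5)
            print             # print the record unchanged
    }' "$1"
}

# Demo on made-up data
sample=$(mktemp)
printf '%s\n' 'a, 12.5' 'b, 1234.567' 'c,  99' > "$sample"
long_latency "$sample"
rm -f "$sample"
```

Only the record with the 8-character second field is printed; the copy in t is modified, so the record itself keeps its original spacing.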

Resources