merge lines until end-of-record marker is seen - shell

Here is part of my data. I need the newline characters removed from every line except those ending with a sequence matching the format |HH:MM:SS.
Here are the first two records. The first record starts with "Reset system" and the second with "Collaborator informs".
Reset system password SISMED WE|Collaborator requests password reset of SISMED WEB system.
Login: John Doe
Nome: Jackie
Locat: D. XYZ – UA ABC Al
Setor/Depto: Administration
Floor: 1st
Tel./Ramal: 358-108|14/01/2015 |11:23:22
Collaborator informs that he can not open archiv ... |Collaborator informs you that you can not open files
Path: \\abc\def\ghi\jkl\mno
File: ESCALAS.xls
Name: Hutch cock
Locat: D. Al Mo
Setor/Depto: Hos
Floor: 2nd
Tel./Ramal: 1521
IP: 1.5.2.14|14/01/2015 |11:26:21
I need output something like the following:
Reset system password SISMED WE|Collaborator requests password reset of SISMED WEB system.Login: John Doe Nome: Jackie Locat: D. XYZ – UA ABC Al Setor/Depto: Administration Floor: 1st Tel./Ramal: 358-108|14/01/2015 |11:23:22
Collaborator informs that he can not open archiv ... |Collaborator informs you that you can not open files Path: \\abc\def\ghi\jkl\mno File: ESCALAS.xls Name: Hutch cock Locat: D. Al Mo Setor/Depto: Hos Floor: 2nd Tel./Ramal: 1521 IP: 1.5.2.14|14/01/2015 |11:26:21
Can somebody please help me with UNIX commands?
Thank you.

In native bash, aiming for readability over terseness:
#!/usr/bin/env bash
# if we were passed a filename as an argument, read from that file
# otherwise, this script reads from stdin
[[ $1 ]] && exec <"$1"
# ERE-syntax regex matching end-of-record marker
end_of_record_re='[|][[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2}[[:space:]]*$'
buffer='' # start out with an empty buffer
while IFS= read -r line; do # while we can, read a line.
  buffer+=" $line" # add it to our buffer, preceded by a space
  if [[ $line =~ $end_of_record_re ]]; then # if the line has an end marker...
    printf '%s\n' "${buffer# }" # ...print buffer except for first space
    buffer= # ...and reset the buffer to be empty
  fi
done
# finally, if we have trailing content, print it out.
if [[ $buffer ]]; then printf '%s\n' "${buffer# }"; fi

Related

bash command to check if line has certain pattern

I have a file in which I have to find a line that begins with a certain pattern, for example "- id: 34". I wrote a bash script, but it does not seem to detect the line:
#!/bin/bash
id=34
# Read the file line by line
while read line; do
# Check if the line starts with pattern
if [[ $line =~ ^[[:space:]]-[[:space:]]id:[[:space:]]$id ]]; then
in_section=true
echo "$line"
fi
done < file.txt
Sample file:
$ cat file.txt
apiVersion: v1
data:
  topologydata: |
    config:
      topology:
        spspan:
        - id: 1
          name: hyudcda1-
          siteids:
          - 34
        spssite:
        - id: 34
          location: PCW
          matesite: tesan
You were close, but it is better to use grep:
grep -E "^[[:space:]]+-[[:space:]]+id:[[:space:]]+$id" file
You should also give a YAML parser such as yq a try.
Your regular expression matches exactly one single space before the - character while read removes the leading and trailing spaces, so your $line variable value has zero leading spaces. Try:
^[[:space:]]*-[[:space:]]id:[[:space:]]$id
It will match with zero or any number of leading spaces. If you can also have zero or more than one space between - and id and between id and the integer, try:
^[[:space:]]*-[[:space:]]*id:[[:space:]]*$id
And if you want read to keep the leading spaces try:
while IFS= read line; do
Finally, if you want to match one or more spaces instead of zero or more, replace * with +.
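The two fixes above (keeping leading whitespace with IFS= and loosening the regex) can be combined into a minimal sketch. The function name find_id_lines is hypothetical, and the trailing anchor that stops id 34 from also matching 340 is an extra assumption not stated in the answers:

```shell
#!/usr/bin/env bash
# find_id_lines ID: print every stdin line of the form "<spaces>- id: ID".
# (find_id_lines is a hypothetical helper name, not from the question.)
find_id_lines() {
  local id=$1 line
  # Keep the regex in a variable so the shell does not re-parse it.
  # The trailing [[:space:]]*$ is an added assumption: it rejects "340" for id 34.
  local re="^[[:space:]]*-[[:space:]]+id:[[:space:]]+${id}[[:space:]]*\$"
  # IFS= preserves leading whitespace; -r keeps backslashes literal.
  while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
      printf '%s\n' "$line"
    fi
  done
}
```

It would be called as find_id_lines 34 < file.txt.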

How to read commented line in a file and copy the same ..as it is to other file in shell script

I have a file (named test.func) with comments as below:
#--------------------
# DOG $ CAT NAMES
#--------------------
Brownie
Blacky
Vicky
Pammy
#--------------
# MOBILE & LAPTOP NAMES
#--------------
Lenovo
Oppo
Realme
The code I have written is as below:
TestFile=$(cat /usr/test.func)
for line in $TestFile
echo "line is $line"
if [[ "$line" == *"#"* ]]; then
echo "$line is commented"
echo "$line" >>test_copy.func
echo " "
fi
if ...
#Some other logic here
fi
done
The output is as below (in test_copy.func):
line is #----------
#-------- is commented
line is #
# is commented
line is DOG
line is &
line is CAT
line is NAMES
*Some logic is performed*
line is #----------
#-------- is commented
line is #
# is commented
line is MOBILE
line is &
line is LAPTOP
line is NAMES
*Some logic is performed*
Expected output in test_copy.func should be as below
#--------------------
# DOG $ CAT NAMES
#--------------------
*Output as per the logic*
#--------------
# MOBILE & LAPTOP NAMES
#--------------
*Output as per the logic*
The commented lines are split in the actual output, but the expected result should match the source file.
Can anyone help me resolve this issue?
The code:
TestFile=$(cat /usr/test.func)
for line in $TestFile
does not loop over the lines of the file, but over the "words" (contiguous strings of non-whitespace characters). The variable TestFile contains the contents of the file, but the for loop is subject to field splitting. In other words, if the file contains "foo bar baz", the loop is equivalent to for line in foo bar baz; do .... This is a very fragile construction, as it is also subject to glob expansion, etc. For example, if the file contains wildcards (eg foo * bar), those wildcards will be expanded (and foo * bar expands to a string that contains all the names in the current directory).
The standard way to iterate over the lines of a file is
while IFS= read -r line; do ... done < /usr/test.func
But this is terribly slow and should generally be avoided; tools like sed and awk are far more appropriate. It is normally a bad idea to make multiple passes over a file, but while read is so slow that you could read the file 50 times with other tools before you would be likely to notice. You probably don't want to copy lines that merely contain a # (as the *"#"* expression does) but only lines that begin with #, though that is a different question. I would recommend either:
sed -n -e '/^\s*#/p' /usr/test.func > test_copy.func
while read -r line; do some_other_logic "$line"; done < /usr/test.func
or:
awk '/^\s*#/{print > "test_copy.func"}
{ some other logic here }' /usr/test.func
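If the per-line logic really has to stay in the shell, the line-by-line loop described above might look like the following sketch. The names split_comments and handle_line are hypothetical, and handle_line is only a placeholder for the asker's "some other logic":

```shell
#!/usr/bin/env bash
# handle_line LINE: placeholder for the asker's "some other logic" (hypothetical).
handle_line() {
  printf 'processing: %s\n' "$1"
}

# split_comments OUTFILE: copy comment lines from stdin to OUTFILE unchanged
# and hand every other line to handle_line.
split_comments() {
  local out=$1 line
  local comment_re='^[[:space:]]*#'   # line whose first non-blank char is '#'
  : > "$out"                          # start with an empty output file
  # IFS= and -r keep each line intact: no word splitting, no globbing.
  while IFS= read -r line; do
    if [[ $line =~ $comment_re ]]; then
      printf '%s\n' "$line" >> "$out"
    else
      handle_line "$line"
    fi
  done
}
```

Called as split_comments test_copy.func < /usr/test.func, this keeps each comment line whole because the loop never expands an unquoted variable.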

How to split a string by a defined string with multiple characters in bash?

Following output consisting of several devices needs to be parsed:
0 interface=ether1 address=172.16.127.2 address4=172.16.127.2
address6=fe80::ce2d:e0ff:fe00:05 mac-address=CC:2D:E0:00:00:08
identity="myrouter1" platform="MikroTik" version="6.43.8 (stable)"
1 interface=ether2 address=10.5.44.100 address4=10.5.44.100
address6=fe80::ce2d:e0ff:fe00:07 mac-address=CC:2D:E0:00:00:05
identity="myrouter4" platform="MikroTik" version="6.43.8 (stable)"
3 interface=ether4 address=fe80::ba69:f4ff:fe00:0017
address6=fe80::ba69:f4ff:fe00:0017 mac-address=B8:69:F4:00:00:07
identity="myrouter2" platform="MikroTik" version="6.43.8 (stable)"
...
10 interface=ether5 address=10.26.51.24 address4=10.26.51.24
address6=fe80::ba69:f4ff:fe00:0039 mac-address=B8:69:F4:00:00:04
identity="myrouter3" platform="MikroTik" version="6.43.8 (stable)"
11 interface=ether3 address=10.26.51.100 address4=10.26.51.100
address6=fe80::ce2d:e0ff:fe00:f00 mac-address=CC:2D:E0:00:00:09
identity="myrouter5" platform="MikroTik" version="6.43.8 (stable)"
edit: for ease of things I shortened and anonymized the output, first block has 7 lines, second block has 5 lines, third block has 7 lines, fourth block 4 lines, so the number of lines is inconsistent.
Basically it's the output from a Mikrotik device: "/ip neighbor print detail"
Ideally I would access every device (= number) on its own, then access each setting=value pair of one device separately, so that I can finally reach settings like $device[0][identity] or similar.
I tried to set IFS='\d{1,2} ' but it seems IFS only works for single-character separators.
Looking on the web I didn't find a way to accomplish this. Am I going about it the wrong way, and is there another way to solve this?
Thanks in advance!
edit: Found this solution Split file by multiple line breaks which helped me to get:
devices=()
COUNT=0;
while read LINE
do
[ "$LINE" ] && devices[$COUNT]+="$LINE " || { (( ++COUNT )); }
done < devices.txt
Then I could use @Kamil's solution to easily access values.
While your precise output format is a bit unclear, bash offers an efficient way to parse the data making use of process substitution. Similar to command substitution, process substitution allows redirecting the output of commands to stdin. This allows you to read the result of a set of commands that reformat your mikrotik file into a single line for each device.
While there are a number of ways to do it, one of the ways to handle the multiple gymnastics needed to reformat the multi-line information for each device into a single line is by using tr and sed. tr to first replace each '\n' with an '_' (or pick your favorite character not used elsewhere), and then again to "squeeze" the leading spaces to a single space (technically not required, but for completeness). After replacing the '\n' with '_' and squeezing spaces, you simply use two sed expressions to change the "__" (resulting from the blank line) back into a '\n' and then to remove all '_'.
With that you can read your device number n and the remainder of the line holding your setting=value pairs. To ease locating your "identity=" entry, convert the line into an array and loop over it using parameter expansions (for substring removal); that way you can store the "identity" value as id (trimming the double quotes is left to you).
Now it is simply a matter of outputting the values (or doing whatever you wish with them). While you can loop again and output the array values, it is just as easy to pass the intentionally unquoted line to printf and let the printf trick handle separating the setting=value pairs for output. Lastly, you form your $device[0][identity] identifier and output it as the final line in the device block.
Putting it all together, you could do something like the following:
#!/bin/bash
id=
while read n line; do ## read each line from process substitution
    a=( $line ) ## split line into array
    for i in ${a[@]}; do ## search array, set id
        [ "${i%=*}" = "identity" ] && id="${i##*=}"
    done
    echo "device=$n" ## output device=
    printf " %s\n" ${line[@]} ## output setting=value (unquoted on purpose)
    printf " \$device[%s][%s]\n" "$n" "$id" ## $device[0][identity]
done < <(tr '\n' '_' < "$1" | tr -s ' ' | sed -e 's/__/\n/g' -e 's/_//g')
Example Use/Output
Note, the script takes the filename to parse as the first input.
$ bash mikrotik_parse.sh mikrotik
device=0
interface=ether1
address=172.16.127.2
address4=172.16.127.2
address6=fe80::ce2d:e0ff:fe00:05
mac-address=CC:2D:E0:00:00:08
identity="myrouter1"
platform="MikroTik"
version="6.43.8
(stable)"
$device[0]["myrouter1"]
device=1
interface=ether2
address=10.5.44.100
address4=10.5.44.100
address6=fe80::ce2d:e0ff:fe00:07
mac-address=CC:2D:E0:00:00:05
identity="myrouter4"
platform="MikroTik"
version="6.43.8
(stable)"
$device[1]["myrouter4"]
device=3
interface=ether4
address=fe80::ba69:f4ff:fe00:0017
address6=fe80::ba69:f4ff:fe00:0017
mac-address=B8:69:F4:00:00:07
identity="myrouter2"
platform="MikroTik"
version="6.43.8
(stable)"
$device[3]["myrouter2"]
Look things over and let me know if you have further questions. As mentioned at the beginning, you haven't defined an explicit output format you are looking for, but gleaning what information was in the question, this should be close.
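For the $device[0][identity] style of access the question asks about, one possible approach is a bash associative array keyed by "number,setting". The following is only a sketch under stated assumptions: bash 4 or newer, every record starts on a line whose first token is the device number, and values are either bare words or double-quoted strings. The names device, store_record and parse_neighbors are hypothetical.

```shell
#!/usr/bin/env bash
# Requires bash 4+ (associative arrays). All names here are hypothetical.
declare -A device                 # keys look like "0,identity"

header_re='^[[:space:]]*([0-9]+)[[:space:]]+(.*)$'   # "<number> rest..."
pair_re='^[[:space:]]*([A-Za-z0-9_-]+)=(.*)$'        # "key=rest..."
new_re='^[[:space:]]*[0-9]+[[:space:]]'              # first line of a record

# store_record RECORD: split one joined record into device["N,key"] entries.
store_record() {
  local rest=$1 num key value
  [[ $rest =~ $header_re ]] || return 0
  num=${BASH_REMATCH[1]}
  rest=${BASH_REMATCH[2]}
  while [[ $rest =~ $pair_re ]]; do
    key=${BASH_REMATCH[1]}
    rest=${BASH_REMATCH[2]}
    if [[ $rest == \"* ]]; then   # quoted value: take text up to closing quote
      rest=${rest#\"}
      value=${rest%%\"*}
      rest=${rest#*\"}
    else                          # bare value: take text up to next whitespace
      value=${rest%%[[:space:]]*}
      rest=${rest#"$value"}
    fi
    device["$num,$key"]=$value
  done
}

# parse_neighbors: read the listing from stdin, joining continuation lines.
parse_neighbors() {
  local line record=''
  while IFS= read -r line; do
    if [[ $line =~ $new_re ]]; then
      if [[ -n $record ]]; then store_record "$record"; fi
      record=$line
    else
      record+=" $line"
    fi
  done
  if [[ -n $record ]]; then store_record "$record"; fi
}
```

After parse_neighbors < devices.txt, a value can be read as "${device[0,identity]}". Redirect the file into the function rather than piping into it, since a pipeline would fill the array in a subshell. Unlike the eval approach, the values are never executed as shell code.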
I think you're on the right track with IFS.
Try piping IFS=$'\n\n' (to break apart the line groups by interface) through cut (to extract the specific field(s) you want for each interface).
Bash likes single long rows with delimiter-separated values, so first we need to convert your file to such a format.
Below I read 4 lines at a time from the input. I noticed that the output for each device spans 4 lines only, so I just concatenate the 4 lines and treat them as a single line.
while
    IFS= read -r line1 &&
    IFS= read -r line2 &&
    IFS= read -r line3 &&
    IFS= read -r line4 &&
    line="$line1 $line2 $line3 $line4"
do
    if [ -n "$line4" ]; then
        echo "ERR: 4th line should be empty - $line4 !" >&2
        exit 4
    fi
    if ! num=$(printf "%d" ${line:0:3}); then
        echo "ERR: reading number" >&2
        exit 1
    fi
    line=${line:3}
    # bash variables can't have `-`
    line=${line/mac-address=/mac_address=}
    # unsafe magic
    vars=(interface address address4
        address6 mac_address identity platform version)
    for v in "${vars[@]}"; do
        unset "$v"
        if ! <<<"$line" grep -q "$v="; then
            echo "ERR: line does not have $v= part!" >&2
            exit 1
        fi
    done
    # eval call
    if ! eval "$line"; then
        echo "ERR: eval line=$line" >&2
        exit 1
    fi
    for v in "${vars[@]}"; do
        if [ -z "${!v}" ]; then
            echo "ERR: variable $v was not set in eval!" >&2
            exit 1;
        fi
    done
    echo "$num: $interface $address $address4 $address6 $mac_address $identity $platform $version"
done < file
Then I retrieve the leading number from the line, which I suspect was printed with printf "%3d", so I just slice the line with ${line:0:3}.
For the rest of the line I intend to use eval. In this case I trust upstream, but I try to assert some cases (a variable not defined in the line, a syntax error and similar).
Then the magic eval "$line" happens, which assigns all the variables in my shell.
After that I can use the variables from the line like normal variables.
live example at tutorialspoint
Eval command and security issues

Beginner bash scripting

I want to break down a file that has multiple lines that follow this style:
e-mail;year/month/date;groups;sharedFolder
An example line from file:
alan.turing@cam.ac.uk;1912/06/23;visitor;/visitorData
Essentially I want to break each line up into four arrays that can be accessed later on in a loop to create a new user for each line.
I have already declared the arrays and have the file name saved in the variable 'filename'.
Usernames need to be the first three letters of the surname and the first three letters of the first name.
Passwords need to be the user's birthdate as day/month/year.
So far this is what I have. Am I on the right track? Are there places I have gone wrong or could improve on?
#reads file and saves into appropriate arrays
while read -r line
do
IFS = $';' read -r -a array <<< "$line"
mailArray += "$(array[0])"
dateArray += "$(array[1])"
groupArray += "$(array[2])"
folderArray += "$(array[3])"
done < $filename
#create usernames from emails
for i in "$(mailArray[@])"
do
IFS=$'.' read -r -a array <<< "$i"
part1 = ${array[0]:0:3}
part2 = ${array[1]:0:3}
user = $part2
user .= $part1
userArray += ("$user")
done
#create passwords from birthdates
for i in "$(dateArray[@])"
do
IFS=$'/' read -r -a array <<< "$i"
password = $part3
password .= $part2
password .= $part1
passArray += ("$password")
done
Not sure if arrays are required here, and if you just want to create username, password from the lines in the desired format, please see below:
# Sample Input Data:
bash$> cat d
alan.turing@cam.ac.uk;1912/06/23;visitor;/visitorData
rob.zombie@cam.ac.uk;1966/06/23;metalhead;/metaldata
donald.trump@stupid.com;1900/00/00;idiot;/idiotique
bash$>
# Sample Output from script:
bash$> ./dank.sh "d"
After processing the line [alan.turing@cam.ac.uk;1912/06/23;visitor;/visitorData], we have the following parameters extracted:
Name: alan
Surname: turing
Birthdate: 1912/06/23
username: alatur
Password: 1912/06/23
After processing the line [rob.zombie@cam.ac.uk;1966/06/23;metalhead;/metaldata], we have the following parameters extracted:
Name: rob
Surname: zombie
Birthdate: 1966/06/23
username: robzom
Password: 1966/06/23
After processing the line [donald.trump@stupid.com;1900/00/00;idiot;/idiotique], we have the following parameters extracted:
Name: donald
Surname: trump
Birthdate: 1900/00/00
username: dontru
Password: 1900/00/00
Now the script, which does this operation.
# Script.
bash$> cat dank.sh
#!/bin/bash
cat "$1" | while read line
do
    name=`echo $line | sed 's/^\(.*\)\..*@.*$/\1/g'`
    surname=`echo $line | sed 's/^.*\.\(.*\)@.*$/\1/g'`
    bdate=`echo $line | sed 's/^.*;\(.*\);.*;.*$/\1/g'`
    temp1=`echo $name | sed 's/^\(...\).*$/\1/g'`
    temp2=`echo $surname | sed 's/^\(...\).*$/\1/g'`
    uname="${temp1}${temp2}"
    echo "
After processing the line [$line], we have the following parameters extracted:
Name: $name
Surname: $surname
Birthdate: $bdate
username: $uname
Password: $bdate
"
done
bash$>
Basically, I am just running a couple of sed commands to extract what is useful and required, and storing the results in variables so that you can use them any way you want. You could redirect them to a file or print pipe-separated output; it is up to you.
Let me know..
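For comparison, here is a sketch that follows the rules stated in the question (username from the first three letters of the surname followed by the first three of the first name, password as day/month/year) using only bash builtins. The function name make_credentials and its one-pair-per-line output format are assumptions made for illustration:

```shell
#!/usr/bin/env bash
# make_credentials: read "e-mail;year/month/date;groups;sharedFolder" lines
# from stdin and print "<username> <password>" for each one.
# (make_credentials and the output layout are hypothetical.)
make_credentials() {
  local mail date group folder name_part first last year month day
  # IFS=';' applies to this read only and splits the four fields.
  while IFS=';' read -r mail date group folder; do
    [[ -n $mail ]] || continue        # skip empty lines
    name_part=${mail%%@*}             # alan.turing
    first=${name_part%%.*}            # alan
    last=${name_part#*.}              # turing
    IFS='/' read -r year month day <<< "$date"
    # surname prefix + first-name prefix, then day/month/year
    printf '%s %s\n' "${last:0:3}${first:0:3}" "$day/$month/$year"
  done
}
```

It would be called as make_credentials < "$filename". The pairs could then be read back in a second loop to create the users, which avoids maintaining four parallel arrays.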

Grep moving spaces?

{
while read -r line ; do
if grep -q '2015' $line
then
echo "inside then"
echo "$line"
echo "yes"
fi
done
} < testinput
Once I execute the above code, the output is:
inside then
(nothing is printed on this line; it contains only spaces)
yes
Why is the input line not printed on the second output line?
Your help is appreciated. I am asking because I actually have to perform a few operations on the input line after the grep match succeeds.
Input file Sample :
2015-07-18-00.07.28.991321-240 I84033A497 LEVEL: Info
PID : 21233902 TID : 9510 PROC : db2sysc 0
INSTANCE: xxxxxxxx NODE : 000 DB : XXXXXXX
APPHDL : 0-8 APPID: *LOCAL.xxxxxxx.150718040727
AUTHID : XXXXXXXX
EDUID : 9510 EDUNAME: db2agent (XXXXXXXX) 0
FUNCTION: DB2 Common, SQLHA APIs for DB2 HA Infrastructure
I need to capture the time when SQLHA shows up in the input file or log file. To do that first I find the match for time in the input file and then I save that time in the variables. Once I find SQLHA I will write the time saved in the variables into an output file. So for every occurrence of SQLHA in the log, I will write the time to the output file.
After the update about what is really wanted, it is fairly clear that you should probably use awk, though sed would also be an option (but harder). You can do it in shell too, though that's messier.
awk '/^2015-/ { datetime = $1 } / SQLHA / { print datetime }' testinput
Using sed:
sed -n -e '/^2015-/ {s/ .*//; h; n; }' -e '/ SQLHA / { x; p; x; }' testinput
(If you find 2015- at the start of a line, remove the stuff after a space and save it in the hold space. If you find SQLHA with spaces on either side, swap the hold and pattern space (thus placing the saved date/time in the pattern space), then print it, then switch it back. The switch back means that if two lines contain SQLHA between occurrences of the date line, you'll get the same date printed twice, rather than a date and then the first of the SQLHA lines. You end up having to think about what can go wrong, as well as what to do when everything goes right — but that may be more for later than right now.)
Using just sh:
while read -r line
do
case "$line" in
(2015-*) datetime=$(set -- $line; echo $1);; # Absence of quotes is deliberate
(*SQLHA*) echo "$datetime";;
esac
done < testinput
There are many other ways to do that in shell, and some of them are safer than this. It works safely on the data shown, but you might run into trouble with maliciously crafted data.
while read -r line
do
case "$line" in
(2015-*) datetime=$(echo "$line" | sed 's/ .*//');;
(*SQLHA*) echo "$datetime";;
esac
done < testinput
This invokes sed once per date line. Using Bash, I guess you can use:
while read -r line
do
case "$line" in
(2015-*) datetime="${line/ */}";; # Replace blank and everything after with nothing
(*SQLHA*) echo "$datetime";;
esac
done < testinput
This is the least likely to go wrong and avoids executing an external command for each line. You could also avoid case…esac using if and probably [[ so as to get pattern matching. Etc.
Running your script on a Mac, I get error output such as:
grep: 2015-07-18-00.07.28.991321-240: No such file or directory
grep: I84033A497: No such file or directory
grep: LEVEL:: No such file or directory
Are you not seeing that? If you're not, then either you've sent errors to /dev/null (or some other location than the terminal) or you've not shown us exactly the code you're using — or there's a blank line at the top of your testinput file.
This will do what your script is trying to do:
#!/usr/bin/awk -f
/2015/ {
print "inside then"
print
print "yes"
}
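The "No such file or directory" errors quoted above arise because grep -q '2015' $line hands the words of the line to grep as file names. If grep has to stay inside the loop, the line can be fed to it on standard input instead. A minimal sketch, with the hypothetical function name print_matching:

```shell
#!/usr/bin/env bash
# print_matching PATTERN: echo each stdin line that contains PATTERN.
# (print_matching is a hypothetical name used only for this sketch.)
print_matching() {
  local pattern=$1 line
  while IFS= read -r line; do
    # Feed the line to grep on its standard input via a here-string,
    # rather than passing it as arguments (which grep reads as file names).
    if grep -q -- "$pattern" <<< "$line"; then
      printf '%s\n' "$line"
    fi
  done
}
```

Called as print_matching 2015 < testinput, it prints only the lines containing 2015. It still starts one grep process per line, so the awk version remains the faster choice for large logs.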
This is what I have written (very basic). I will try to run the same program with grep and post why I am getting the blank space soon.
{
while read -r line
do
    if [[ $line == *2015* ]]
    then
        dtime=$(echo "$line" | cut -c1-26)
    fi
    if [[ $line == *SQLHA* ]]
    then
        echo "$dtime"
    fi
done
} < testinput
Input Used:
2015-07-18-00.07.28.991321-240 I84033A497 LEVEL: Info
EDUID : 9510 EDUNAME: db2agent (SIEB_RPT) 0
FUNCTION: DB2 Common,APIs for DB2 HA Infrastructure, sqlhaAmIin
2015-07-18-00.07.29.991321-240 I84033A497 LEVEL: Info
FUNCTION: DB2 Common, SQLHA APIs for DB2 HA Infrastructure, sqlha
2015-07-18-00.07.48.991321-240 I84033A497 LEVEL: Info
EDUID : 9510 EDUNAME: db2agent (SIEB_RPT) 0
FUNCTION: DB2 Common, SQLHA APIs for DB2 HA Infrastructure, sqlha
Output:
2015-07-18-00.07.29.991321
2015-07-18-00.07.48.991321

Resources