How to split a string by a defined string with multiple characters in bash? - bash

Following output consisting of several devices needs to be parsed:
0 interface=ether1 address=172.16.127.2 address4=172.16.127.2
address6=fe80::ce2d:e0ff:fe00:05 mac-address=CC:2D:E0:00:00:08
identity="myrouter1" platform="MikroTik" version="6.43.8 (stable)"
1 interface=ether2 address=10.5.44.100 address4=10.5.44.100
address6=fe80::ce2d:e0ff:fe00:07 mac-address=CC:2D:E0:00:00:05
identity="myrouter4" platform="MikroTik" version="6.43.8 (stable)"
3 interface=ether4 address=fe80::ba69:f4ff:fe00:0017
address6=fe80::ba69:f4ff:fe00:0017 mac-address=B8:69:F4:00:00:07
identity="myrouter2" platform="MikroTik" version="6.43.8 (stable)"
...
10 interface=ether5 address=10.26.51.24 address4=10.26.51.24
address6=fe80::ba69:f4ff:fe00:0039 mac-address=B8:69:F4:00:00:04
identity="myrouter3" platform="MikroTik" version="6.43.8 (stable)"
11 interface=ether3 address=10.26.51.100 address4=10.26.51.100
address6=fe80::ce2d:e0ff:fe00:f00 mac-address=CC:2D:E0:00:00:09
identity="myrouter5" platform="MikroTik" version="6.43.8 (stable)"
edit: for ease of things I shortened and anonymized the output, first block has 7 lines, second block has 5 lines, third block has 7 lines, fourth block 4 lines, so the number of lines is inconsistent.
Basically its the output from a Mikrotik device: "/ip neighbor print detail"
Optimal would be to access every device(=number) on its own, then further access all setting=value (of one device) seperately to finally access settings like $device[0][identity] or similar.
I tried to set IFS='\d{1,2} ' but seems IFS only works for single character seperation.
Looking on the web I didn't find a way to accomplish this, am I looking for the wrong way and there is another way to solve this?
Thanks in advance!
edit: Found this solution Split file by multiple line breaks which helped me to get:
devices=()
COUNT=0;
while read LINE
do
[ "$LINE" ] && devices[$COUNT]+="$LINE " || { (( ++COUNT )); }
done < devices.txt
then i could use #Kamil's solution to easily access values.

While your precise output format is a bit unclear, bash offers an efficient way to parse the data making use of process substitution. Similar to command substitution, process substitution allows redirecting the output of commands to stdin. This allows you to read the result of a set of commands that reformat your mikrotik file into a single line for each device.
While there are a number of ways to do it, one of the ways to handle the multiple gymnastics needed to reformat the multi-line information for each device into a single line is by using tr and sed. tr to first replace each '\n' with an '_' (or pick your favorite character not used elsewhere), and then again to "squeeze" the leading spaces to a single space (technically not required, but for completeness). After replacing the '\n' with '_' and squeezing spaces, you simply use two sed expressions to change the "__" (resulting from the blank line) back into a '\n' and then to remove all '_'.
With that you can read your device number n and the remainder of the line holing your setting=value pairs. To ease locating your "identity=" line, simply converting the line into an array and looping using parameter expansions (for substring removal), you can save and store the "identity" value as id (trimming the double-quotes is left to you)
Now it is simply a matter of outputting the value (or doing whatever you wish with them). While you can loop again and output the array values, it is just a easy to pass the intentionally unquoted line to printf and let the printf-trick handle separating the setting=value pairs for output. Lastly, you form your $device[0][identity] identifier and output as the final line in the device block.
Putting it altogether, you could do something like the following:
#!/bin/bash
id=
while read n line; do ## read each line from process substitution
a=( $line ) ## split line into array
for i in ${a[#]}; do ## search array, set id
[ "${i%=*}" = "identity" ] && id="${i##*=}"
done
echo "device=$n" ## output device=
printf " %s\n" ${line[#]} ## output setting=value (unquoted on purpose)
printf " \$device[%s][%s]\n" "$n" "$id" ## $device[0][identity]
done < <(tr '\n' '_' < "$1" | tr -s ' ' | sed -e 's/__/\n/g' -e 's/_//g')
Example Use/Output
Note, the script takes the filename to parse as the first input.
$ bash mikrotik_parse.sh mikrotik
device=0
interface=ether1
address=172.16.127.2
address4=172.16.127.2
address6=fe80::ce2d:e0ff:fe00:05
mac-address=CC:2D:E0:00:00:08
identity="myrouter1"
platform="MikroTik"
version="6.43.8
(stable)"
$device[0]["myrouter1"]
device=1
interface=ether2
address=10.5.44.100
address4=10.5.44.100
address6=fe80::ce2d:e0ff:fe00:07
mac-address=CC:2D:E0:00:00:05
identity="myrouter4"
platform="MikroTik"
version="6.43.8
(stable)"
$device[1]["myrouter4"]
device=3
interface=ether4
address=fe80::ba69:f4ff:fe00:0017
address6=fe80::ba69:f4ff:fe00:0017
mac-address=B8:69:F4:00:00:07
identity="myrouter2"
platform="MikroTik"
version="6.43.8
(stable)"
$device[3]["myrouter2"]
Look things over and let me know if you have further questions. As mentioned at the beginning, you haven't defined an explicit output format you are looking for, but gleaning what information was in the question, this should be close.

I think you're on the right track with IFS.
Try piping IFS=$'\n\n' (to break apart the line groups by interface) through cut (to extract the specific field(s) you want for each interface).

Bash likes single long rows with delimter separated values. So first we need to convert your file to such format.
Below I read 4 lines at a time from input. I notices that the output spans over 4 lines only - I just concatenate the 4 lines and act as if it is a single line.
while
IFS= read -r line1 &&
IFS= read -r line2 &&
IFS= read -r line3 &&
IFS= read -r line4 &&
line="$line1 $line2 $line3 $line4"
do
if [ -n "$line4" ]; then
echo "ERR: 4th line should be empt - $line4 !" >&2
exit 4
fi
if ! num=$(printf "%d" ${line:0:3}); then
echo "ERR: reading number" >&2
exit 1
fi
line=${line:3}
# bash variables can't have `-`
line=${line/mac-address=/mac_address=}
# unsafe magic
vars=(interface address address4
address6 mac_address identity platform version)
for v in "${vars[#]}"; do
unset "$v"
if ! <<<"$line" grep -q "$v="; then
echo "ERR: line does not have $v= part!" >&2
exit 1
fi
done
# eval call
if ! eval "$line"; then
echo "ERR: eval line=$line" >&2
exit 1
fi
for v in "${vars[#]}"; do
if [ -z "${!v}" ]; then
echo "ERR: variable $v was not set in eval!" >&2
exit 1;
fi
done
echo "$num: $interface $address $address4 $address6 $mac_address $identity $platform $version"
done < file
then I retrieve the leading number from the line, which I suspect was printed with printf "%3d" so I just slice the line ${line:0:3}
for the rest of the line I indent to use eval. In this case I trust upstream, but I try to assert some cases (variable not defined in the line, some syntax error and similar)
then the magic eval "$line" happens, which assigns all the variables in my shell
after that I can use variables from the line like normal variables
live example at tutorialspoint
Eval command and security issues

Related

Using sed in order to change a specific character in a specific line

I'm a beginner in bash and here is my problem. I have a file just like this one:
Azzzezzzezzzezzz...
Bzzzezzzezzzezzz...
Czzzezzzezzzezzz...
I try in a script to edit this file.ABC letters are unique in all this file and there is only one per line.
I want to replace the first e of each line by a number who can be :
1 in line beginning with an A,
2 in line beginning with a B,
3 in line beginning with a C,
and I'd like to loop this in order to have this type of result
Azzz1zzz5zzz1zzz...
Bzzz2zzz4zzz5zzz...
Czzz3zzz6zzz3zzz...
All the numbers here are random int variables between 0 and 9. I really need to start by replacing 1,2,3 in first exec of my loop, then 5,4,6 then 1,5,3 and so on.
I tried this
sed "0,/e/s/e/$1/;0,/e/s/e/$2/;0,/e/s/e/$3/" /tmp/myfile
But the result was this (because I didn't specify the line)
Azzz1zzz2zzz3zzz...
Bzzzezzzezzzezzz...
Czzzezzzezzzezzz...
I noticed that doing sed -i "/A/ s/$/ezzz/" /tmp/myfile will add ezzz at the end of A line so I tried this
sed -i "/A/ 0,/e/s/e/$1/;/B/ 0,/e/s/e/$2/;/C/ 0,/e/s/e/$3/" /tmp/myfile
but it failed
sed: -e expression #1, char 5: unknown command: `0'
Here I'm lost.
I have in a variable (let's call it number_of_e_per_line) the number of e in either A, B or C line.
Thank you for the time you take for me.
Just apply s command on the line that matches A.
sed '
/^A/{ s/e/$1/; }
/^B/{ s/e/$2/; }
# or shorter
/^C/s/e/$3/
'
s command by default replaces the first occurrence. You can do for example s/s/$1/2 to replace the second occurrence, s/e/$1/g (like "Global") replaces all occurrences.
0,/e/ specifies a range of lines - it filters lines from the first up until a line that matches /e/.
sed is not part of Bash. It is a separate (crude) programming language and is a very standard command. See https://www.grymoire.com/Unix/Sed.html .
Continuing from the comment. sed is a poor choice here unless all your files can only have 3 lines. The reason is sed processes each line and has no way to keep a separate count for the occurrences of 'e'.
Instead, wrapping sed in a script and keeping track of the replacements allows you to handle any file no matter the number of lines. You just loop and handle the lines one at a time, e.g.
#!/bin/bash
[ -z "$1" ] && { ## valiate one argument for filename provided
printf "error: filename argument required.\nusage: %s filename\n" "./$1" >&2
exit 1
}
[ -s "$1" ] || { ## validate file exists and non-empty
printf "error: file not found or empty '%s'.\n" "$1"
exit 1
}
declare -i n=1 ## occurrence counter initialized 1
## loop reading each line
while read -r line || [ -n "$line" ]; do
[[ $line =~ ^.*e.*$ ]] || continue ## line has 'e' or get next
sed "s/e/1/$n" <<< "$line" ## substitute the 'n' occurence of 'e'
((n++)) ## increment counter
done < "$1"
Your data file having "..." at the end of each line suggests your files is larger than the snippet posted. If you have lines beginning 'A' - 'Z', you don't want to have to write 26 separate /match/s/find/replace/ substitutions. And if you have somewhere between 3 and 26 (or more), you don't want to have to rewrite a different sed expression for every new file you are faced with.
That's why I say sed is a poor choice. You really have no way to make the task a generic task with sed. The downside to using a script is it will become a poor choice as the number of records you need to process increase (over 100000 or so just due to efficiency)
Example Use/Output
With the script in replace-e-incremental.sh and your data in file, you would do:
$ bash replace-e-incremental.sh file
Azzz1zzzezzzezzz...
Bzzzezzz1zzzezzz...
Czzzezzzezzz1zzz...
To Modify file In-Place
Since you make multiple calls to sed here, you need to redirect the output of the file to a temporary file and then replace the original by overwriting it with the temp file, e.g.
$ bash replace-e-incremental.sh file > mytempfile && mv -f mytempfile file
$ cat file
Azzz1zzzezzzezzz...
Bzzzezzz1zzzezzz...
Czzzezzzezzz1zzz...

Extracting file content using a for loop [duplicate]

I'm working on a long Bash script. I want to read cells from a CSV file into Bash variables. I can parse lines and the first column, but not any other column. Here's my code so far:
cat myfile.csv|while read line
do
read -d, col1 col2 < <(echo $line)
echo "I got:$col1|$col2"
done
It's only printing the first column. As an additional test, I tried the following:
read -d, x y < <(echo a,b,)
And $y is empty. So I tried:
read x y < <(echo a b)
And $y is b. Why?
You need to use IFS instead of -d:
while IFS=, read -r col1 col2
do
echo "I got:$col1|$col2"
done < myfile.csv
To skip a given number of header lines:
skip_headers=3
while IFS=, read -r col1 col2
do
if ((skip_headers))
then
((skip_headers--))
else
echo "I got:$col1|$col2"
fi
done < myfile.csv
Note that for general purpose CSV parsing you should use a specialized tool which can handle quoted fields with internal commas, among other issues that Bash can't handle by itself. Examples of such tools are cvstool and csvkit.
How to parse a CSV file in Bash?
Coming late to this question and as bash do offer new features, because this question stand about bash and because none of already posted answer show this powerful and compliant way of doing precisely this.
Parsing CSV files under bash, using loadable module
Conforming to RFC 4180, a string like this sample CSV row:
12,22.45,"Hello, ""man"".","A, b.",42
should be splitted as
1 12
2 22.45
3 Hello, "man".
4 A, b.
5 42
bash loadable .C compiled modules.
Under bash, you could create, edit, and use loadable c compiled modules. Once loaded, they work like any other builtin!! ( You may find more information at source tree. ;)
Current source tree (Oct 15 2021, bash V5.1-rc3) do contain a bunch of samples:
accept listen for and accept a remote network connection on a given port
asort Sort arrays in-place
basename Return non-directory portion of pathname.
cat cat(1) replacement with no options - the way cat was intended.
csv process one line of csv data and populate an indexed array.
dirname Return directory portion of pathname.
fdflags Change the flag associated with one of bash's open file descriptors.
finfo Print file info.
head Copy first part of files.
hello Obligatory "Hello World" / sample loadable.
...
tee Duplicate standard input.
template Example template for loadable builtin.
truefalse True and false builtins.
tty Return terminal name.
uname Print system information.
unlink Remove a directory entry.
whoami Print out username of current user.
There is an full working cvs parser ready to use in examples/loadables directory: csv.c!!
Under Debian GNU/Linux based system, you may have to install bash-builtins package by
apt install bash-builtins
Using loadable bash-builtins:
Then:
enable -f /usr/lib/bash/csv csv
From there, you could use csv as a bash builtin.
With my sample: 12,22.45,"Hello, ""man"".","A, b.",42
csv -a myArray '12,22.45,"Hello, ""man"".","A, b.",42'
printf "%s\n" "${myArray[#]}" | cat -n
1 12
2 22.45
3 Hello, "man".
4 A, b.
5 42
Then in a loop, processing a file.
while IFS= read -r line;do
csv -a aVar "$line"
printf "First two columns are: [ '%s' - '%s' ]\n" "${aVar[0]}" "${aVar[1]}"
done <myfile.csv
This way is clearly the quickest and strongest than using any other combination of bash builtins or fork to any binary.
Unfortunely, depending on your system implementation, if your version of bash was compiled without loadable, this may not work...
Complete sample with multiline CSV fields.
Conforming to RFC 4180, a string like this single CSV row:
12,22.45,"Hello ""man"",
This is a good day, today!","A, b.",42
should be splitted as
1 12
2 22.45
3 Hello "man",
This is a good day, today!
4 A, b.
5 42
Full sample script for parsing CSV containing multilines fields
Here is a small sample file with 1 headline, 4 columns and 3 rows. Because two fields do contain newline, the file are 6 lines length.
Id,Name,Desc,Value
1234,Cpt1023,"Energy counter",34213
2343,Sns2123,"Temperatur sensor
to trigg for alarm",48.4
42,Eye1412,"Solar sensor ""Day /
Night""",12199.21
And a small script able to parse this file correctly:
#!/bin/bash
enable -f /usr/lib/bash/csv csv
file="sample.csv"
exec {FD}<"$file"
read -ru $FD line
csv -a headline "$line"
printf -v fieldfmt '%-8s: "%%q"\\n' "${headline[#]}"
numcols=${#headline[#]}
while read -ru $FD line;do
while csv -a row "$line" ; (( ${#row[#]} < numcols )) ;do
read -ru $FD sline || break
line+=$'\n'"$sline"
done
printf "$fieldfmt\\n" "${row[#]}"
done
This may render: (I've used printf "%q" to represent non-printables characters like newlines as $'\n')
Id : "1234"
Name : "Cpt1023"
Desc : "Energy\ counter"
Value : "34213"
Id : "2343"
Name : "Sns2123"
Desc : "$'Temperatur sensor\nto trigg for alarm'"
Value : "48.4"
Id : "42"
Name : "Eye1412"
Desc : "$'Solar sensor "Day /\nNight"'"
Value : "12199.21"
You could find a full working sample there: csvsample.sh.txt or
csvsample.sh.
Note:
In this sample, I use head line to determine row width (number of columns). If you're head line could hold newlines, (or if your CSV use more than 1 head line). You will have to pass number or columns as argument to your script (and the number of head lines).
Warning:
Of course, parsing CSV using this is not perfect! This work for many simple CSV files, but care about encoding and security!! For sample, this module won't be able to handle binary fields!
Read carefully csv.c source code comments and RFC 4180!
From the man page:
-d delim
The first character of delim is used to terminate the input line,
rather than newline.
You are using -d, which will terminate the input line on the comma. It will not read the rest of the line. That's why $y is empty.
We can parse csv files with quoted strings and delimited by say | with following code
while read -r line
do
field1=$(echo "$line" | awk -F'|' '{printf "%s", $1}' | tr -d '"')
field2=$(echo "$line" | awk -F'|' '{printf "%s", $2}' | tr -d '"')
echo "$field1 $field2"
done < "$csvFile"
awk parses the string fields to variables and tr removes the quote.
Slightly slower as awk is executed for each field.
In addition to the answer from #Dennis Williamson, it may be helpful to skip the first line when it contains the header of the CSV:
{
read
while IFS=, read -r col1 col2
do
echo "I got:$col1|$col2"
done
} < myfile.csv
If you want to read CSV file with some lines, so this the solution.
while IFS=, read -ra line
do
test $i -eq 1 && ((i=i+1)) && continue
for col_val in ${line[#]}
do
echo -n "$col_val|"
done
echo
done < "$csvFile"

Bash script to add double quotes in .CSV comma delimited file

I need to add double quotes to the csv file. My sample data is like this..
378478,COMPLETED,Tracfone,,,"2020/03/29 09:39:22",,2787,,356074101197544,89148000005748235454,75176540
378328,COMPLETED,"Total Wireless","Unlimited Talk, Text, & Data (First 25GB High Speed, then unlimited 2GB)",50,"2020/03/29 06:10:01",200890899011202395,0899,0279395,356058102052972,89148000005117597971,67756296
I have tried some code available online with awk and sed, it is resulting as below , Error - **First digit in the number is being trimmed like for ex. in '378478' it is only displaying '78478'.
Also it is adding double quotes to already existing double quotes too!** nothing seems to be perfectly working. Please guide me!
"78478","COMPLETED","Tracfone","","",""2020/03/29 09:39:22"","","2787","","356074101197544","89148000005748235454","75176540"
"78328","COMPLETED",""Total Wireless"",""Unlimited Talk"," Text"," & Data (First 25GB High Speed"," then unlimited 2GB)"","50",""2020/03/29 06:10:01"","200890899011202395","0899","0279395","356058102052972","89148000005117597971","67756296"
"78329","COMPLETED",""Cricket Wireless"",""Unlimited Talk"," Text"," & 4G LTE Data w/ 15GB Hotspot"","60",""2020/03/29""
This is the code I am using:
awk -F"'?,'?" -v OFS='","' '{$1=$1; gsub(/^.|$/,"\"")} 1' file # or
sed -E 's/([^,]*) , (.*)/"\1" , "\2"/' file
My total code is the below one. my Intention was to first convert all .xlsx to .csv and then add double quotes to same csv and save it in the same file.i know the $file.csv part is wrong, hence i need some help
find "$Src_Dir" -type f -iname "*.xlsx" -print>path/temp
cat path/temp | while IFS="" read -r -d $'\0' file;
do
echo $file
ssconvert "${file}" --export-type=Gnumeric_stf:stf_csv
awk -F"'?,'?" -v OFS='","' '{$1=$1; gsub(/^.|$/,"\"")} 1' $file > $file.csv
done
If you want to handle anything other than the simplest CSV files, you should probably move away from sed and awk. There are much better tools available.
For example, if you sudo apt install csvtool (or equivalent) on your favourite distro, you can use its call-per-line functionality to process each line in the input file. See the following script for an example:
#!/bin/bash
function quotify {
# Start empty line, process every field.
line=""
while [[ $# -ne 0 ]] ; do
# Append comma for all but first field, then quoted field.
[[ -n "${line}" ]] && line="${line},"
line="${line}\"$1\""
shift
done
# Output the fully quoted line.
echo "${line}"
}
# Needed to call functions. Also, ensure link: /bin/sh -> /bin/bash.
export -f quotify
# Pretty-print input and output.
echo "Input file:"
sed 's/^/ /' inputFile.csv
echo "Output file:"
csvtool call quotify inputFile.csv | sed 's/^/ /'
Note the quotify function which is called for each line in the CSV file, with the arguments set to each field within that line (sans quotes, whether the original fields had quotes or not).
It basically constructs a string of all the fields in the line, with quotes around them, then writes that to standard output, as shown below in the output from that script:
Input file:
378478,COMPLETED,Tracfone,,,"2020/03/29 09:39:22",,2787,,356074101197544,89148000005748235454,75176540
378328,COMPLETED,"Total Wireless","Unlimited Talk, Text, & Data (First 25GB High Speed, then unlimited 2GB)",50,"2020/03/29"
Output file:
"378478","COMPLETED","Tracfone","","","2020/03/29 09:39:22","","2787","","356074101197544","89148000005748235454","75176540"
"378328","COMPLETED","Total Wireless","Unlimited Talk, Text, & Data (First 25GB High Speed, then unlimited 2GB)","50","2020/03/29"
Even though using a separate tool is probably the easiest way to go, if you absolutely cannot install other packages, then you're going to have to code up something in a package you already have. The following bash script is a good place to start, as it uses no other tools to achieve its goal.
At the moment, it's tied to a very specific set of rules, as follows:
White space matters. Anything between the commas is considered part of the field. This especially matters when detecting a quoted field, it must have the quote as the first character, no abc, "d,e,f",ghi stuff since the "d,e,f" won't be handled correctly.
Quoted fields are allowed to contain commas, and "" sequences within them are turned into ".
It's probably not a good idea to supply ill-formatted CSV files :-)
But, with that in mind, here we go. I'll offer a brief textual description of each section but hopefully the comments in the code will be enough to figure out what's going on.
First, a function for finding the position if some string within another string, useful for working out the field bounds:
function findPos {
haystack="$1"
needle="$2"
# Remove everything past the needle.
prefix="${haystack%%${needle}*}"
# If nothing was removed, it wasn't found, so supply massive number.
# Otherwise, it was found at the length of the string with removed stuff.
position=999999
[[ ${#prefix} -ne ${#haystack} ]] && position=${#prefix}
echo ${position}
}
Then we can use that in the function that works out the length of the next field. This basically just looks for the next comma for unquoted fields, and does special handling for quoted fields by building up the field from segments (it has to handle quotes within quotes and commas):
function getNextFieldLen {
line="$1"
# Empty line means all work done.
[[ -z "${line}" ]] && echo -1 && return
# Handle unquoted first, this is easy.
[[ "${line:0:1}" != '"' ]] && { echo $(findPos "${line}" ","); return; }
# Now handle quoted. Loop over all segments where a segment is defined as
# the text up to the next <"">, assuming it's before the next <",>.
field=""
nextQuoteComma=$(findPos "${line}" '",')
nextDoubleQuote=$(findPos "${line}" '""')
while [[ ${nextDoubleQuote} -lt ${nextQuoteComma} ]]; do
# Append segment to the field and go back for next segment.
field="${field}${line:0:${nextDoubleQuote}}\"\""
line="${line:${nextDoubleQuote}}"
line="${line:2}"
nextQuoteComma=$(findPos "${line}" '",')
nextDoubleQuote=$(findPos "${line}" '""')
done
# Add final segment (up to the comma) and output entire field.
field="${field}${line:0:${nextQuoteComma}}\""
echo "${#field}"
}
Finally, there's the top-level function which will quotify whatever comes in via standard input:
function quotifyStdIn {
# Process file line by line.
while read -r line; do
# Start with empty output line and non-comma separator.
outLine="" ; sep=""
# Place terminator to make processing easier, start field loop.
line="${line},"
fieldLen=$(getNextFieldLen "${line}")
while [[ ${fieldLen} -ge 0 ]]; do
# Get field and quotify if needed, adjust line (remove field and comma).
field="${line:0:${fieldLen}}"
[[ "${field:0:1}" = '"' ]] || field="\"${field}\""
line="${line:$((fieldLen+1))}"
#line="${line:${fieldLen}}"
#line="${line:1}"
# Append to output line and prepare for next field.
outLine="${outLine}${sep}${field}"; sep=","
fieldLen=$(getNextFieldLen "${line}")
done
# Output built line.
echo "${outLine}"
done
}
And, on the off-chance you want to read directly from a file (though providing a file name that's empty or "-" will use standard input so you can probably just use the file-based function for everything):
function quotifyFile {
file="$1"
# Empty file or "-" means standard input, otherwise take input from real file.
[[ ${#file} -eq 0 ]] && { quotifyStdIn; return; }
[[ "${file}" = "-" ]] && { quotifyStdIn; return; }
quotifyStdIn < "${file}"
}
And, finally, because every program that's not a "Hello, world" one deserves some form of test harness, this is what you can use to test the various capabilities:
(
echo 'paxdiablo,was here'
echo 'and,"then, strangely,",he,was,not'
echo '50,"My name is ""Pax"", and yours is ""Bob""",42'
echo '17,"""Love"" is grand",19'
) > harness.csv
echo "Before:"
sed "s/^/ /" harness.csv
echo "After:"
quotifyFile harness.csv | sed "s/^/ /"
rm -rf harness.csv
And, since a test harness is of little use unless you run the tests, here's the results of the first run:
Before:
paxdiablo,was here
and,"then, strangely,",he,was,not
50,"My name is ""Pax"", and yours is ""Bob""",42
17,"""Love"" is grand",19
After:
"paxdiablo","was here"
"and","then, strangely,","he","was","not"
"50","My name is ""Pax"", and yours is ""Bob""","42"
"17","""Love"" is grand","19"
Hopefully, that will be enough to get you going in the absence of being able to install packages. Of course, if one of the packages you can't install in bash itself, then you have problems that I can't help you with :-)
Your starting CSV is not a good CSV: the 2 rows have different number of columns
+--------+-----------+----------------+--------------------------------------------------------------------------+----+---------------------+---+------+---+-----------------+----------------------+----------+
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
+--------+-----------+----------------+--------------------------------------------------------------------------+----+---------------------+---+------+---+-----------------+----------------------+----------+
| 378478 | COMPLETED | Tracfone | - | - | 2020/03/29 09:39:22 | - | 2787 | - | 356074101197544 | 89148000005748235454 | 75176540 |
| 378328 | COMPLETED | Total Wireless | Unlimited Talk, Text, & Data (First 25GB High Speed, then unlimited 2GB) | 50 | 2020/03/29 | - | - | - | - | - | - |
+--------+-----------+----------------+--------------------------------------------------------------------------+----+---------------------+---+------+---+-----------------+----------------------+----------+
Using Miller (https://github.com/johnkerl/miller) you could run
mlr --csv --quote-all -N unsparsify input >output
to have
"378478","COMPLETED","Tracfone","","","2020/03/29 09:39:22","","2787","","356074101197544","89148000005748235454","75176540"
"378328","COMPLETED","Total Wireless","Unlimited Talk, Text, & Data (First 25GB High Speed, then unlimited 2GB)","50","2020/03/29","","","","","",""
You can use it downloading the executable https://github.com/johnkerl/miller/releases/tag/v5.7.0

Unix one-liner to swap/transpose two lines in multiple text files?

I wish to swap or transpose pairs of lines according to their line-numbers (e.g., switching the positions of lines 10 and 15) in multiple text files using a UNIX tool such as sed or awk.
For example, I believe this sed command should swap lines 14 and 26 in a single file:
sed -n '14p' infile_name > outfile_name
sed -n '26p' infile_name >> outfile_name
How can this be extended to work on multiple files? Any one-liner solutions welcome.
If you want to edit a file, you can use ed, the standard editor. Your task is rather easy in ed:
printf '%s\n' 14m26 26-m14- w q | ed -s file
How does it work?
14m26 tells ed to take line #14 and move it after line #26
26-m14- tells ed to take the line before line #26 (which is your original line #26) and move it after line preceding line #14 (which is where your line #14 originally was)
w tells ed to write the file
q tells ed to quit.
If your numbers are in a variable, you can do:
linea=14
lineb=26
{
printf '%dm%d\n' "$linea" "$lineb"
printf '%d-m%d-\n' "$lineb" "$linea"
printf '%s\n' w q
} | ed -s file
or something similar. Make sure that linea<lineb.
If you want robust in-place updating of your input files, use gniourf_gniourf's excellent ed-based answer
If you have GNU sed and want to in-place updating with multiple files at once, use
#potong's excellent GNU sed-based answer (see below for a portable alternative, and the bottom for an explanation)
Note: ed truly updates the existing file, whereas sed's -i option creates a temporary file behind the scenes, which then replaces the original - while typically not an issue, this can have undesired side effects, most notably, replacing a symlink with a regular file (by contrast, file permissions are correctly preserved).
Below are POSIX-compliant shell functions that wrap both answers.
Stdin/stdout processing, based on #potong's excellent answer:
POSIX sed doesn't support -i for in-place updating.
It also doesn't support using \n inside a character class, so [^\n] must be replaced with a cumbersome workaround that positively defines all character except \n that can occur on a line - this is a achieved with a character class combining printable characters with all (ASCII) control characters other than \n included as literals (via a command substitution using printf).
Also note the need to split the sed script into two -e options, because POSIX sed requires that a branching command (b, in this case) be terminated with either an actual newline or continuation in a separate -e option.
# SYNOPSIS
# swapLines lineNum1 lineNum2
swapLines() {
[ "$1" -ge 1 ] || { printf "ARGUMENT ERROR: Line numbers must be decimal integers >= 1.\n" >&2; return 2; }
[ "$1" -le "$2" ] || { printf "ARGUMENT ERROR: The first line number ($1) must be <= the second ($2).\n" >&2; return 2; }
sed -e "$1"','"$2"'!b' -e ''"$1"'h;'"$1"'!H;'"$2"'!d;x;s/^\([[:print:]'"$(printf '\001\002\003\004\005\006\007\010\011\013\014\015\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037\177')"']*\)\(.*\n\)\(.*\)/\3\2\1/'
}
Example:
$ printf 'line 1\nline 2\nline 3\n' | swapLines 1 3
line 3
line 2
line 1
In-place updating, based on gniourf_gniourf's excellent answer:
Small caveats:
While ed is a POSIX utility, it doesn't come preinstalled on all platforms, notably not on Debian and the Cygwin and MSYS Unix-emulation environments for Windows.
ed always reads the input file as a whole into memory.
# SYNOPSIS
# swapFileLines lineNum1 lineNum2 file
swapFileLines() {
[ "$1" -ge 1 ] || { printf "ARGUMENT ERROR: Line numbers must be decimal integers >= 1.\n" >&2; return 2; }
[ "$1" -le "$2" ] || { printf "ARGUMENT ERROR: The first line number ($1) must be <= the second ($2).\n" >&2; return 2; }
ed -s "$3" <<EOF
H
$1m$2
$2-m$1-
w
EOF
}
Example:
$ printf 'line 1\nline 2\nline 3\n' > file
$ swapFileLines 1 3 file
$ cat file
line 3
line 2
line 1
An explanation of #potong's GNU sed-based answer:
His command swaps lines 10 and 15:
sed -ri '10,15!b;10h;10!H;15!d;x;s/^([^\n]*)(.*\n)(.*)/\3\2\1/' f1 f2 fn
-r activates support for extended regular expressions; here, notably, it allows use of unescaped parentheses to form capture groups.
-i specifies that the files specified as operands (f1, f2, fn) be updated in place, without backup, since no optional suffix for a backup file is adjoined to the -i option.
10,15!b means that all lines that do not (!) fall into the range of lines 10 through 15 should branch (b) implicitly to the end of the script (given that no target-label name follows b), which means that the following commands are skipped for these lines. Effectively, they are simply printed as is.
10h copies (h) line number 10 (the start of the range) to the so-called hold space, which is an auxiliary buffer.
10!H appends (H) every line that is not line 10 - which in this case implies lines 11 through 15 - to the hold space.
15!d deletes (d) every line that is not line 15 (here, lines 10 through 14) and branches to the end of the script (skips remaining commands). By deleting these lines, they are not printed.
x, which is executed only for line 15 (the end of the range), replaces the so-called pattern space with the contents of the hold space, which at that point holds all lines in the range (10 through 15); the pattern space is the buffer on which sed commands operate, and whose contents are printed by default (unless -n was specified).
s/^([^\n]*)(.*\n)(.*)/\3\2\1/ then uses capture groups (parenthesized subexpressions of the regular expression that forms the first argument passed to function s) to partition the contents of the pattern space into the 1st line (^([^\n]*)), the middle lines ((.*\n)), and the last line ((.*)), and then, in the replacement string (the second argument passed to function s), uses backreferences to place the last line (\3) before the middle lines (\2), followed by the first line (\1), effectively swapping the first and last lines in the range. Finally, the modified pattern space is printed.
As you can see, only the range of lines spanning the two lines to swap is held in memory, whereas all other lines are passed through individually, which makes this approach memory-efficient.
This might work for you (GNU sed):
sed -ri '10,15!b;10h;10!H;15!d;x;s/^([^\n]*)(.*\n)(.*)/\3\2\1/' f1 f2 fn
This stores a range of lines in the hold space and then swaps the first and last lines following the completion of the range.
The i flag edits each file (f1,f2 ... fn) in place.
With GNU awk:
awk '
FNR==NR {if(FNR==14) x=$0;if(FNR==26) y=$0;next}
FNR==14 {$0=y} FNR==26 {$0=x} {print}
' file file > file_with_swap
The use of the following helper script allows using the power of find ... -exec ./script '{}' l1 l2 \; to locate the target files and to swap lines l1 & l2 in each file in place. (it requires that there are no identical duplicate lines within the file that fall within the search range) The script uses sed to read the two swap lines from each file into an indexed array and passes the lines to sed to complete the swap by matching. The sed call uses its "matched first address" state to limit the second expression swap to the first occurrence. An example use of the helper script below to swap lines 5 & 15 in all matching files is:
find . -maxdepth 1 -type f -name "lnum*" -exec ../swaplines.sh '{}' 5 15 \;
For example, the find call above found files lnumorig.txt and lnumfile.txt in the present directory originally containing:
$ head -n20 lnumfile.txt.bak
1 A simple line of test in a text file.
2 A simple line of test in a text file.
3 A simple line of test in a text file.
4 A simple line of test in a text file.
5 A simple line of test in a text file.
6 A simple line of test in a text file.
<snip>
14 A simple line of test in a text file.
15 A simple line of test in a text file.
16 A simple line of test in a text file.
17 A simple line of test in a text file.
18 A simple line of test in a text file.
19 A simple line of test in a text file.
20 A simple line of test in a text file.
And swapped the lines 5 & 15 as intended:
$ head -n20 lnumfile.txt
1 A simple line of test in a text file.
2 A simple line of test in a text file.
3 A simple line of test in a text file.
4 A simple line of test in a text file.
15 A simple line of test in a text file.
6 A simple line of test in a text file.
<snip>
14 A simple line of test in a text file.
5 A simple line of test in a text file.
16 A simple line of test in a text file.
17 A simple line of test in a text file.
18 A simple line of test in a text file.
19 A simple line of test in a text file.
20 A simple line of test in a text file.
The helper script itself is:
#!/bin/bash
[ -z $1 ] && { # validate requierd input (defaults set below)
printf "error: insufficient input calling '%s'. usage: file [line1 line2]\n" "${0//*\//}" 1>&2
exit 1
}
l1=${2:-10} # default/initialize line numbers to swap
l2=${3:-15}
while IFS=$'\n' read -r line; do # read lines to swap into indexed array
a+=( "$line" );
done <<<"$(sed -n $((l1))p "$1" && sed -n $((l2))p "$1")"
((${#a[#]} < 2)) && { # validate 2 lines read
printf "error: requested lines '%d & %d' not found in file '%s'\n" $l1 $l2 "$1"
exit 1
}
# swap lines in place with sed (remove .bak for no backups)
sed -i.bak -e "s/${a[1]}/${a[0]}/" -e "0,/${a[0]}/s/${a[0]}/${a[1]}/" "$1"
exit 0
Even though I didn't manage to get it all done in a one-liner I decided it was worth posting in case you can make some use of it or take ideas from it. Note: if you do make use of it, test to your satisfaction before turning it loose on your system. The script currently uses sed -i.bak ... to create backups of the files changed for testing purposes. You can remove the .bak when you are satisfied it meets your needs.
If you have no use for setting default lines to swap in the helper script itself, then I would change the first validation check to [ -z $1 -o -z $2 -o $3 ] to insure all required arguments are given when the script is called.
While it does identify the lines to be swapped by number, it relies on the direct match of each line to accomplish the swap. This means that any identical duplicate lines up to the end of the swap range will cause an unintended match and failue to swap the intended lines. This is part of the limitation imposed by not storing each line within the range of lines to be swapped as discussed in the comments. It's a tradeoff. There are many, many ways to approach this, all will have their benefits and drawbacks. Let me know if you have any questions.
Brute Force Method
Per your comment, I revised the helper script to use the brute forth copy/swap method that would eliminate the problem of any duplicate lines in the search range. This helper obtains the lines via sed as in the original, but then reads all lines from file to tmpfile swapping the appropriately numbered lines when encountered. After the tmpfile is filled, it is copied to the original file and tmpfile is removed.
#!/bin/bash
[ -z $1 ] && { # validate requierd input (defaults set below)
printf "error: insufficient input calling '%s'. usage: file [line1 line2]\n" "${0//*\//}" 1>&2
exit 1
}
l1=${2:-10} # default/initialize line numbers to swap
l2=${3:-15}
while IFS=$'\n' read -r line; do # read lines to swap into indexed array
a+=( "$line" );
done <<<"$(sed -n $((l1))p "$1" && sed -n $((l2))p "$1")"
((${#a[#]} < 2)) && { # validate 2 lines read
printf "error: requested lines '%d & %d' not found in file '%s'\n" $l1 $l2 "$1"
exit 1
}
# create tmpfile, set trap, truncate
fn="$1"
rmtemp () { cp "$tmpfn" "$fn"; rm -f "$tmpfn"; }
trap rmtemp SIGTERM SIGINT EXIT
declare -i n=1
tmpfn="$(mktemp swap_XXX)"
:> "$tmpfn"
# swap lines in place with a tmpfile
while IFS=$'\n' read -r line; do
if ((n == l1)); then
printf "%s\n" "${a[1]}" >> "$tmpfn"
elif ((n == l2)); then
printf "%s\n" "${a[0]}" >> "$tmpfn"
else
printf "%s\n" "$line" >> "$tmpfn"
fi
((n++))
done < "$fn"
exit 0
If the line numbers to be swapped are fixed then you might want to try something like the sed command in the following example to have lines swapped in multiple files in-place:
#!/bin/bash
# prep test files
for f in a b c ; do
( for i in {1..30} ; do echo $f$i ; done ) > /tmp/$f
done
sed -i -s -e '14 {h;d}' -e '15 {N;N;N;N;N;N;N;N;N;N;G;x;d}' -e '26 G' /tmp/{a,b,c}
# -i: inplace editing
# -s: treat each input file separately
# 14 {h;d} # first swap line: hold ; suppress
# 15 {N;N;...;G;x;d} # lines between: collect, append held line; hold result; suppress
# 26 G # second swap line: append held lines (and output them all)
# dump test files
cat /tmp/{a,b,c}
(This is according to Etan Reisner's comment.)
If you want to swap two lines, you can send it through twice, you could make it loop in one sed script if you really wanted, but this works:
e.g.
test.txt: for a in {1..10}; do echo "this is line $a"; done >> test.txt
this is line 1
this is line 2
this is line 3
this is line 4
this is line 5
this is line 6
this is line 7
this is line 8
this is line 9
this is line 10
Then to swap lines 6 and 9:
sed ':a;6,8{6h;6!H;d;ba};9{p;x};' test.txt | sed '7{h;d};9{p;x}'
this is line 1
this is line 2
this is line 3
this is line 4
this is line 5
this is line 9
this is line 7
this is line 8
this is line 6
this is line 10
In the first sed it builds up the hold space with lines 6 through 8.
At line 9 it prints line 9 then prints the hold space (lines 6 through 8) this accomplishes the first move of 9 to place 6. Note: 6h; 6!H avoids a new line at the top of the pattern space.
The second move occurs in the second sed script it saves line 7 to the hold space, then deletes it and prints it after line 9.
To make it quasi-generic you can use variables like this:
A=3 && B=7 && sed ':a;'${A}','$((${B}-1))'{'${A}'h;'${A}'!H;d;ba};'${B}'{p;x};' test.txt | sed $(($A+1))'{h;d};'${B}'{p;x}'
Where A and B are the lines you want to swap, in this case lines 3 and 7.
if, you want swap two lines, to create script "swap.sh"
#!/bin/sh
sed -n "1,$((${2}-1))p" "$1"
sed -n "${3}p" "$1"
sed -n "$((${2}+1)),$((${3}-1))p" "$1"
sed -n "${2}p" "$1"
sed -n "$((${3}+1)),\$p" "$1"
next
sh swap.sh infile_name 14 26 > outfile_name

stopping 'sed' after match found on a line; don't let sed keep checking all lines to EOF

I have a text file in which each a first block of text on each line is separated by a tab from a second block of text like so:
VERBS, AUXILIARY. "Be," subjunctive and quasi-subjunctive Be, Beest, &c., was used in A.-S. (beon) generally in a future sense.
In case it is hard to tell, tab is long space between "quasi-subjunctive" and "Be".
So I am thinking off the top of my head a 'for' loop in which a var is set using 'sed' to read the first block of text of a line, upto and including the tab (or not, doesn't really matter) and then the 'var' is used to find subsequent matches adding a "(x)" right before the tab to make the line unique. The 'x' of course would be a running counter numbering the first instance '1' incrementing and then each subsequent match one number higher.
One problem I see is stopping 'sed' after each subsequent match so the counter can be incremented. Is there a way to do this, since it is "sed's" normal behaviour to continue on thru without stop (as far as I know) until all lines are processed.
You can set the IFS to TAB character and read the line into variables. Something like:
$ while IFS=$'\t' read block1 block2;do
echo "block1 is $block1"
echo "block2 is $block2"
done < file
block1 is VERBS, AUXILIARY. "Be," subjunctive and quasi-subjunctive
block2 is Be, Beest, &c., was used in A.-S. (beon) generally in a future sense.
Ok so I got the job done with this little (or perhaps big if too much overkill?) script I whipped up:
#!/bin/bash
sedLnCnt=1
while [[ "$sedLnCnt" -lt 521 ]] ; do
lN=$(sed -n "${sedLnCnt} p" sGNoSecNums.html|sed -r 's/^([^\t]*\t).*$/\1/') #; echo "\$lN: $lN"
lnNum=($(grep -n "$lN" sGNoSecNums.html|sed -r 's/^([0-9]+):.*$/\1/')) #; echo "num of matches: ${#lnNum[#]}"
if [[ "${#lnNum[#]}" -gt 1 ]] ; then #'if'
lCnt="${#lnNum[#]}"
((eleN = $lCnt-1)) #; echo "\$eleN: ${eleN}" # var $eleN needs to be 1 less than total line count of zero-based array
while [[ "$lCnt" -gt 0 ]] ; do
sed -ri "${lnNum[$eleN]}s/^([^\t]*)\t/\1 \(${lCnt}\)\t/" sGNoSecNums.html
((lCnt--))
((eleN--))
done
fi
((sedLnCnt++))
done
Grep was the perfect way to find line numbers of matches, jamming them into an array and then editing each line appending the unique identifier.

Resources