I need to add double quotes to the csv file. My sample data is like this..
378478,COMPLETED,Tracfone,,,"2020/03/29 09:39:22",,2787,,356074101197544,89148000005748235454,75176540
378328,COMPLETED,"Total Wireless","Unlimited Talk, Text, & Data (First 25GB High Speed, then unlimited 2GB)",50,"2020/03/29 06:10:01",200890899011202395,0899,0279395,356058102052972,89148000005117597971,67756296
I have tried some code available online with awk and sed, but it produces the output below. **The first digit of the number is being trimmed: for example, '378478' is displayed as only '78478'. It is also adding double quotes around already existing double quotes!** Nothing seems to work perfectly. Please guide me!
"78478","COMPLETED","Tracfone","","",""2020/03/29 09:39:22"","","2787","","356074101197544","89148000005748235454","75176540"
"78328","COMPLETED",""Total Wireless"",""Unlimited Talk"," Text"," & Data (First 25GB High Speed"," then unlimited 2GB)"","50",""2020/03/29 06:10:01"","200890899011202395","0899","0279395","356058102052972","89148000005117597971","67756296"
"78329","COMPLETED",""Cricket Wireless"",""Unlimited Talk"," Text"," & 4G LTE Data w/ 15GB Hotspot"","60",""2020/03/29""
This is the code I am using:
awk -F"'?,'?" -v OFS='","' '{$1=$1; gsub(/^.|$/,"\"")} 1' file # or
sed -E 's/([^,]*) , (.*)/"\1" , "\2"/' file
My full code is below. My intention was to first convert every .xlsx file to .csv, then add double quotes to that same .csv and save the result in the same file. I know the $file.csv part is wrong, hence I need some help:
find "$Src_Dir" -type f -iname "*.xlsx" -print>path/temp
cat path/temp | while IFS="" read -r -d $'\0' file;
do
echo $file
ssconvert "${file}" --export-type=Gnumeric_stf:stf_csv
awk -F"'?,'?" -v OFS='","' '{$1=$1; gsub(/^.|$/,"\"")} 1' $file > $file.csv
done
If you want to handle anything other than the simplest CSV files, you should probably move away from sed and awk. There are much better tools available.
For example, if you sudo apt install csvtool (or equivalent) on your favourite distro, you can use its call-per-line functionality to process each line in the input file. See the following script for an example:
#!/bin/bash

function quotify {
    # Start empty line, process every field.
    line=""
    while [[ $# -ne 0 ]] ; do
        # Append comma for all but first field, then quoted field.
        [[ -n "${line}" ]] && line="${line},"
        line="${line}\"$1\""
        shift
    done

    # Output the fully quoted line.
    echo "${line}"
}

# Needed to call functions. Also, ensure link: /bin/sh -> /bin/bash.
export -f quotify

# Pretty-print input and output.
echo "Input file:"
sed 's/^/ /' inputFile.csv
echo "Output file:"
csvtool call quotify inputFile.csv | sed 's/^/ /'
Note the quotify function which is called for each line in the CSV file, with the arguments set to each field within that line (sans quotes, whether the original fields had quotes or not).
It basically constructs a string of all the fields in the line, with quotes around them, then writes that to standard output, as shown below in the output from that script:
Input file:
 378478,COMPLETED,Tracfone,,,"2020/03/29 09:39:22",,2787,,356074101197544,89148000005748235454,75176540
 378328,COMPLETED,"Total Wireless","Unlimited Talk, Text, & Data (First 25GB High Speed, then unlimited 2GB)",50,"2020/03/29"
Output file:
 "378478","COMPLETED","Tracfone","","","2020/03/29 09:39:22","","2787","","356074101197544","89148000005748235454","75176540"
 "378328","COMPLETED","Total Wireless","Unlimited Talk, Text, & Data (First 25GB High Speed, then unlimited 2GB)","50","2020/03/29"
Even though using a separate tool is probably the easiest way to go, if you absolutely cannot install other packages, then you're going to have to code something up with the tools you already have. The following bash script is a good place to start, as it uses no other tools to achieve its goal.
At the moment, it's tied to a very specific set of rules, as follows:
White space matters: anything between the commas is considered part of the field. This especially matters when detecting a quoted field; the quote must be the first character of the field. No abc, "d,e,f",ghi constructs, since the "d,e,f" won't be handled correctly.
Quoted fields are allowed to contain commas, and "" sequences within them are turned into ".
It's probably not a good idea to supply ill-formatted CSV files :-)
But, with that in mind, here we go. I'll offer a brief textual description of each section but hopefully the comments in the code will be enough to figure out what's going on.
First, a function for finding the position of some string within another string, useful for working out the field bounds:
function findPos {
    haystack="$1"
    needle="$2"

    # Remove everything past the needle.
    prefix="${haystack%%${needle}*}"

    # If nothing was removed, it wasn't found, so supply massive number.
    # Otherwise, it was found at the length of the string with removed stuff.
    position=999999
    [[ ${#prefix} -ne ${#haystack} ]] && position=${#prefix}

    echo ${position}
}
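A quick sanity check of its behaviour (not part of the final script):

findPos "abc,def" ","    # 3      - comma found at index 3
findPos "abcdef" ","     # 999999 - not found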
Then we can use that in the function that works out the length of the next field. This basically just looks for the next comma for unquoted fields, and does special handling for quoted fields by building up the field from segments (it has to handle quotes within quotes and commas):
function getNextFieldLen {
    line="$1"

    # Empty line means all work done.
    [[ -z "${line}" ]] && echo -1 && return

    # Handle unquoted first, this is easy.
    [[ "${line:0:1}" != '"' ]] && { echo $(findPos "${line}" ","); return; }

    # Now handle quoted. Loop over all segments where a segment is defined as
    # the text up to the next <"">, assuming it's before the next <",>.
    field=""
    nextQuoteComma=$(findPos "${line}" '",')
    nextDoubleQuote=$(findPos "${line}" '""')
    while [[ ${nextDoubleQuote} -lt ${nextQuoteComma} ]]; do
        # Append segment to the field and go back for next segment.
        field="${field}${line:0:${nextDoubleQuote}}\"\""
        line="${line:${nextDoubleQuote}}"
        line="${line:2}"
        nextQuoteComma=$(findPos "${line}" '",')
        nextDoubleQuote=$(findPos "${line}" '""')
    done

    # Add final segment (up to the comma) and output entire field.
    field="${field}${line:0:${nextQuoteComma}}\""
    echo "${#field}"
}
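A few examples of what it reports, given the trailing comma the caller appends (see quotifyStdIn below):

getNextFieldLen 'abc,def,'     # 3 - plain field "abc"
getNextFieldLen '"a,b",c,'     # 5 - quoted field keeps its comma
getNextFieldLen '"x""y",z,'    # 6 - embedded "" is retained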
Finally, there's the top-level function which will quotify whatever comes in via standard input:
function quotifyStdIn {
    # Process file line by line.
    while read -r line; do
        # Start with empty output line and non-comma separator.
        outLine="" ; sep=""

        # Place terminator to make processing easier, start field loop.
        line="${line},"
        fieldLen=$(getNextFieldLen "${line}")
        while [[ ${fieldLen} -ge 0 ]]; do
            # Get field and quotify if needed, adjust line (remove field and comma).
            field="${line:0:${fieldLen}}"
            [[ "${field:0:1}" = '"' ]] || field="\"${field}\""
            line="${line:$((fieldLen+1))}"

            # Append to output line and prepare for next field.
            outLine="${outLine}${sep}${field}"; sep=","
            fieldLen=$(getNextFieldLen "${line}")
        done

        # Output built line.
        echo "${outLine}"
    done
}
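A one-liner check of the function in action:

echo 'abc,"d,e",f' | quotifyStdIn    # "abc","d,e","f"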
And, on the off-chance you want to read directly from a file (though providing a file name that's empty or "-" will use standard input so you can probably just use the file-based function for everything):
function quotifyFile {
    file="$1"

    # Empty file or "-" means standard input, otherwise take input from real file.
    [[ ${#file} -eq 0 ]] && { quotifyStdIn; return; }
    [[ "${file}" = "-" ]] && { quotifyStdIn; return; }
    quotifyStdIn < "${file}"
}
And, finally, because every program that's not a "Hello, world" one deserves some form of test harness, this is what you can use to test the various capabilities:
(
    echo 'paxdiablo,was here'
    echo 'and,"then, strangely,",he,was,not'
    echo '50,"My name is ""Pax"", and yours is ""Bob""",42'
    echo '17,"""Love"" is grand",19'
) > harness.csv

echo "Before:"
sed "s/^/ /" harness.csv
echo "After:"
quotifyFile harness.csv | sed "s/^/ /"

rm -f harness.csv
And, since a test harness is of little use unless you run the tests, here's the results of the first run:
Before:
 paxdiablo,was here
 and,"then, strangely,",he,was,not
 50,"My name is ""Pax"", and yours is ""Bob""",42
 17,"""Love"" is grand",19
After:
 "paxdiablo","was here"
 "and","then, strangely,","he","was","not"
 "50","My name is ""Pax"", and yours is ""Bob""","42"
 "17","""Love"" is grand","19"
Hopefully, that will be enough to get you going in the absence of being able to install packages. Of course, if one of the packages you can't install is bash itself, then you have problems that I can't help you with :-)
Your starting CSV is not a good CSV: the two rows have different numbers of columns
+--------+-----------+----------------+--------------------------------------------------------------------------+----+---------------------+---+------+---+-----------------+----------------------+----------+
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
+--------+-----------+----------------+--------------------------------------------------------------------------+----+---------------------+---+------+---+-----------------+----------------------+----------+
| 378478 | COMPLETED | Tracfone | - | - | 2020/03/29 09:39:22 | - | 2787 | - | 356074101197544 | 89148000005748235454 | 75176540 |
| 378328 | COMPLETED | Total Wireless | Unlimited Talk, Text, & Data (First 25GB High Speed, then unlimited 2GB) | 50 | 2020/03/29 | - | - | - | - | - | - |
+--------+-----------+----------------+--------------------------------------------------------------------------+----+---------------------+---+------+---+-----------------+----------------------+----------+
Using Miller (https://github.com/johnkerl/miller) you could run
mlr --csv --quote-all -N unsparsify input >output
to have
"378478","COMPLETED","Tracfone","","","2020/03/29 09:39:22","","2787","","356074101197544","89148000005748235454","75176540"
"378328","COMPLETED","Total Wireless","Unlimited Talk, Text, & Data (First 25GB High Speed, then unlimited 2GB)","50","2020/03/29","","","","","",""
You can use it by downloading the executable from https://github.com/johnkerl/miller/releases/tag/v5.7.0
The following output, consisting of several devices, needs to be parsed:
0 interface=ether1 address=172.16.127.2 address4=172.16.127.2
address6=fe80::ce2d:e0ff:fe00:05 mac-address=CC:2D:E0:00:00:08
identity="myrouter1" platform="MikroTik" version="6.43.8 (stable)"
1 interface=ether2 address=10.5.44.100 address4=10.5.44.100
address6=fe80::ce2d:e0ff:fe00:07 mac-address=CC:2D:E0:00:00:05
identity="myrouter4" platform="MikroTik" version="6.43.8 (stable)"
3 interface=ether4 address=fe80::ba69:f4ff:fe00:0017
address6=fe80::ba69:f4ff:fe00:0017 mac-address=B8:69:F4:00:00:07
identity="myrouter2" platform="MikroTik" version="6.43.8 (stable)"
...
10 interface=ether5 address=10.26.51.24 address4=10.26.51.24
address6=fe80::ba69:f4ff:fe00:0039 mac-address=B8:69:F4:00:00:04
identity="myrouter3" platform="MikroTik" version="6.43.8 (stable)"
11 interface=ether3 address=10.26.51.100 address4=10.26.51.100
address6=fe80::ce2d:e0ff:fe00:f00 mac-address=CC:2D:E0:00:00:09
identity="myrouter5" platform="MikroTik" version="6.43.8 (stable)"
edit: for ease of reading I shortened and anonymized the output; the number of lines per block is inconsistent (in the real data the first block has 7 lines, the second 5, the third 7, the fourth 4).
Basically it's the output of "/ip neighbor print detail" on a MikroTik device.
Optimally I would access every device (= number) on its own, then access each setting=value pair (of one device) separately, to finally read settings like $device[0][identity] or similar.
I tried to set IFS='\d{1,2} ', but it seems IFS only works for single-character separation.
Looking on the web I didn't find a way to accomplish this. Am I looking at this the wrong way, or is there another way to solve it?
Thanks in advance!
edit: Found this solution ("Split file by multiple line breaks"), which helped me to get:
devices=()
COUNT=0
while read LINE
do
    [ "$LINE" ] && devices[$COUNT]+="$LINE " || { (( ++COUNT )); }
done < devices.txt
Then I could use @Kamil's solution to easily access the values.
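For reference, a lookup over those records might be sketched like this (get_setting is a hypothetical helper; its naive word splitting will break values containing spaces, such as version="6.43.8 (stable)"):

get_setting() {                  # usage: get_setting "<record>" <key>
    local field
    for field in $1; do          # intentional word splitting on the record
        [[ ${field%%=*} == "$2" ]] && printf '%s\n' "${field#*=}"
    done
}
get_setting "${devices[0]}" identity    # -> "myrouter1"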
While your precise output format is a bit unclear, bash offers an efficient way to parse the data making use of process substitution. Similar to command substitution, process substitution allows redirecting the output of commands to stdin. This allows you to read the result of a set of commands that reformat your mikrotik file into a single line for each device.
While there are a number of ways to do it, one way to handle the multiple gymnastics needed to reformat the multi-line information for each device into a single line is by using tr and sed: tr to first replace each '\n' with a '_' (or pick your favorite character not used elsewhere), and then again to "squeeze" runs of spaces to a single space (technically not required, but included for completeness). After replacing the '\n' with '_' and squeezing spaces, you simply use two sed expressions to change the "__" (resulting from the blank line between blocks) back into a '\n' and then to remove all remaining '_' characters.
With that you can read your device number n and the remainder of the line holding your setting=value pairs. To ease locating the "identity=" entry, simply convert the line into an array and loop over it using parameter expansion (for substring removal), saving the "identity" value as id (trimming the double quotes is left to you).
Now it is simply a matter of outputting the values (or doing whatever else you wish with them). While you could loop again and output the array values, it is just as easy to pass the intentionally unquoted line to printf and let the printf trick handle separating the setting=value pairs for output. Lastly, you form your $device[0][identity] identifier and output it as the final line in the device block.
Putting it all together, you could do something like the following:
#!/bin/bash

id=
while read n line; do                            ## read each line from process substitution
    a=( $line )                                  ## split line into array
    for i in "${a[@]}"; do                       ## search array, set id
        [ "${i%=*}" = "identity" ] && id="${i##*=}"
    done
    echo "device=$n"                             ## output device=
    printf "  %s\n" ${line[@]}                   ## output setting=value (unquoted on purpose)
    printf "  \$device[%s][%s]\n" "$n" "$id"     ## $device[0][identity]
done < <(tr '\n' '_' < "$1" | tr -s ' ' | sed -e 's/__/\n/g' -e 's/_//g')
Example Use/Output
Note: the script takes the filename to parse as its first argument.
$ bash mikrotik_parse.sh mikrotik
device=0
  interface=ether1
  address=172.16.127.2
  address4=172.16.127.2
  address6=fe80::ce2d:e0ff:fe00:05
  mac-address=CC:2D:E0:00:00:08
  identity="myrouter1"
  platform="MikroTik"
  version="6.43.8
  (stable)"
  $device[0]["myrouter1"]
device=1
  interface=ether2
  address=10.5.44.100
  address4=10.5.44.100
  address6=fe80::ce2d:e0ff:fe00:07
  mac-address=CC:2D:E0:00:00:05
  identity="myrouter4"
  platform="MikroTik"
  version="6.43.8
  (stable)"
  $device[1]["myrouter4"]
device=3
  interface=ether4
  address=fe80::ba69:f4ff:fe00:0017
  address6=fe80::ba69:f4ff:fe00:0017
  mac-address=B8:69:F4:00:00:07
  identity="myrouter2"
  platform="MikroTik"
  version="6.43.8
  (stable)"
  $device[3]["myrouter2"]
Look things over and let me know if you have further questions. As mentioned at the beginning, you haven't defined an explicit output format, but based on the information in the question, this should be close.
I think you're on the right track with IFS.
Try piping IFS=$'\n\n' (to break apart the line groups by interface) through cut (to extract the specific field(s) you want for each interface).
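For what it's worth, a minimal sketch of that grouping step (assuming the data is in devices.txt): awk's paragraph mode treats the blank lines between groups as record separators, which gets around IFS being limited to single characters.

# RS="" = paragraph mode: each blank-line-separated block is one record;
# rebuilding the record with OFS joins its lines into a single row.
awk 'BEGIN { RS=""; FS="\n"; OFS=" " } { $1 = $1; print }' devices.txt

Each device then comes out as one long line that cut (or read) can slice for the fields you want.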
Bash likes single long rows with delimiter-separated values, so first we need to convert your file to that format.
Below I read 4 lines at a time from the input. I noticed that each block of output spans only 4 lines, so I just concatenate the 4 lines and act as if they were a single line.
while
    IFS= read -r line1 &&
    IFS= read -r line2 &&
    IFS= read -r line3 &&
    IFS= read -r line4 &&
    line="$line1 $line2 $line3 $line4"
do
    if [ -n "$line4" ]; then
        echo "ERR: 4th line should be empty - $line4 !" >&2
        exit 4
    fi
    if ! num=$(printf "%d" ${line:0:3}); then
        echo "ERR: reading number" >&2
        exit 1
    fi
    line=${line:3}

    # bash variables can't have `-`
    line=${line/mac-address=/mac_address=}

    # unsafe magic
    vars=(interface address address4
          address6 mac_address identity platform version)
    for v in "${vars[@]}"; do
        unset "$v"
        if ! <<<"$line" grep -q "$v="; then
            echo "ERR: line does not have $v= part!" >&2
            exit 1
        fi
    done

    # eval call
    if ! eval "$line"; then
        echo "ERR: eval line=$line" >&2
        exit 1
    fi
    for v in "${vars[@]}"; do
        if [ -z "${!v}" ]; then
            echo "ERR: variable $v was not set in eval!" >&2
            exit 1
        fi
    done

    echo "$num: $interface $address $address4 $address6 $mac_address $identity $platform $version"
done < file
Then I retrieve the leading number from the line, which I suspect was printed with printf "%3d", so I just slice it off with ${line:0:3}.
For the rest of the line I intend to use eval. In this case I trust the upstream source, but I try to assert some error cases first (a variable not defined in the line, syntax errors and similar).
Then the magic eval "$line" happens, which assigns all the variables in my shell.
After that I can use the variables from the line like normal variables.
Live example at tutorialspoint.
See also: Eval command and security issues.
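If the input can't be fully trusted, one eval-free alternative is to collect the pairs into an associative array instead; a minimal sketch, not a drop-in replacement for the loop above:

declare -A cfg=()
for kv in $line; do               # same word-splitting caveats as above
    [[ $kv == *=* ]] || continue  # skip tokens that aren't key=value
    cfg[${kv%%=*}]=${kv#*=}       # keys like mac-address are fine as array keys
done
echo "${cfg[identity]}"           # -> "myrouter1"

This also sidesteps the mac-address rename, since associative array keys (unlike shell variable names) may contain `-`.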
In my script I need to get the highest number from a set of backup files two times, so I wanted to create a function. These are the commands in the script:
First time:
highest=$( ls $path.bak.* | sort -t"." -k2 -n | tail -n1 | sed -r 's/.*\.(.*)/\1/')
Second time:
newhighest=$(ls $path.bak.* | sort -t"." -k2 -n | tail -n1 | sed -r 's/.*\.(.*)/\1/')
Now my question:
How can I shorten this with a function?
Here are my input files:
test.bak.1
test.bak.2
test.bak.3
test.bak.4
test.bak.5
test.bak.6
test.bak.7
test.bak.8
test.bak.9
test.bak.10
test.bak.11
Expected return: 11
Written out for readability:
#!/usr/bin/env bash
#    ^^^^ - Ensure that this script is run with bash, not /bin/sh

# Enable "extended globs", so we can exclude names that don't end with digits.
shopt -s extglob

# Since your files are test.bak.*
path=test

get_highest() {
    # Set the function's argument list to the matching file names.
    set -- "$path".bak.+([[:digit:]])
    # If we have just one valid filename, we know the glob expanded successfully.
    # Otherwise, no such files exist, so exit the function immediately.
    [[ -e $1 || -L $1 ]] || return 1
    # Stream our list of extensions into awk, and let it find the highest number.
    printf '%s\n' "${@##*.}" | awk '$0>last{last=$0}END{print last}'
}

highest=$(get_highest)     || { echo "No backup files found" >&2; exit 1; }
new_highest=$(get_highest) || { echo "No backup files on 2nd pass" >&2; exit 1; }
Note:
Expansions need to be quoted; "$path"/*, not $path/*, or else path="Directory With Spaces/test" would look for files in Spaces/test, after emitting Directory and With as results.
ls should never be used programmatically.
extglob syntax allows regex-like capabilities for matching groups of files, letting us assert here that we only consider filenames that end in .bak. followed by one or more digits.
In general, you should write your scripts to be easy to read and understand as a higher priority than writing them to be short. Your future self (and others who need to maintain code in the future) will thank you.
Because filenames can contain newlines, newlines are unsafe to use to separate filenames in a stream; only the NUL character is safe for this use when names are not otherwise quoted or escaped. Thus, when emitting a stream of arbitrary names, they should be formatted with the string %s\0 and sorted with the -z argument. However, we're only printing the numeric extensions here, making newlines safe.
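For arbitrary names, a NUL-safe version of that pattern might look like the sketch below (it assumes GNU sort for -z, and that $path itself contains no extra dots, since the sort key is the third dot-separated field):

# Emit NUL-terminated names, sort them NUL-aware, read them back safely.
# Requires the extglob option enabled by the script above.
printf '%s\0' "$path".bak.+([[:digit:]]) |
    sort -z -t . -k 3 -n |
    while IFS= read -r -d '' name; do
        printf '%s\n' "$name"    # prints names lowest to highest
    done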