I need to add double quotes around the fields of a CSV file. My sample data looks like this:
378478,COMPLETED,Tracfone,,,"2020/03/29 09:39:22",,2787,,356074101197544,89148000005748235454,75176540
378328,COMPLETED,"Total Wireless","Unlimited Talk, Text, & Data (First 25GB High Speed, then unlimited 2GB)",50,"2020/03/29 06:10:01",200890899011202395,0899,0279395,356058102052972,89148000005117597971,67756296
I have tried some code available online with awk and sed, but it produces the result below. Errors: **the first digit of each line is being trimmed, e.g. '378478' is displayed as just '78478'. Also it is adding double quotes around already existing double quotes!** Nothing seems to work perfectly. Please guide me!
"78478","COMPLETED","Tracfone","","",""2020/03/29 09:39:22"","","2787","","356074101197544","89148000005748235454","75176540"
"78328","COMPLETED",""Total Wireless"",""Unlimited Talk"," Text"," & Data (First 25GB High Speed"," then unlimited 2GB)"","50",""2020/03/29 06:10:01"","200890899011202395","0899","0279395","356058102052972","89148000005117597971","67756296"
"78329","COMPLETED",""Cricket Wireless"",""Unlimited Talk"," Text"," & 4G LTE Data w/ 15GB Hotspot"","60",""2020/03/29""
This is the code I am using:
awk -F"'?,'?" -v OFS='","' '{$1=$1; gsub(/^.|$/,"\"")} 1' file # or
sed -E 's/([^,]*) , (.*)/"\1" , "\2"/' file
My full script is below. My intention was to first convert every .xlsx file to .csv, then add double quotes to that same .csv and save it under the same name. I know the $file.csv part is wrong, hence I need some help:
find "$Src_Dir" -type f -iname "*.xlsx" -print>path/temp
cat path/temp | while IFS="" read -r -d $'\0' file;
do
echo $file
ssconvert "${file}" --export-type=Gnumeric_stf:stf_csv
awk -F"'?,'?" -v OFS='","' '{$1=$1; gsub(/^.|$/,"\"")} 1' $file > $file.csv
done
If you want to handle anything other than the simplest CSV files, you should probably move away from sed and awk. There are much better tools available.
For example, if you sudo apt install csvtool (or equivalent) on your favourite distro, you can use its call-per-line functionality to process each line in the input file. See the following script for an example:
#!/bin/bash
function quotify {
    # Start empty line, process every field.
    line=""
    while [[ $# -ne 0 ]] ; do
        # Append comma for all but first field, then quoted field.
        [[ -n "${line}" ]] && line="${line},"
        line="${line}\"$1\""
        shift
    done

    # Output the fully quoted line.
    echo "${line}"
}
# Needed to call functions. Also, ensure link: /bin/sh -> /bin/bash.
export -f quotify
# Pretty-print input and output.
echo "Input file:"
sed 's/^/ /' inputFile.csv
echo "Output file:"
csvtool call quotify inputFile.csv | sed 's/^/ /'
Note the quotify function which is called for each line in the CSV file, with the arguments set to each field within that line (sans quotes, whether the original fields had quotes or not).
It basically constructs a string of all the fields in the line, with quotes around them, then writes that to standard output, as shown below in the output from that script:
Input file:
378478,COMPLETED,Tracfone,,,"2020/03/29 09:39:22",,2787,,356074101197544,89148000005748235454,75176540
378328,COMPLETED,"Total Wireless","Unlimited Talk, Text, & Data (First 25GB High Speed, then unlimited 2GB)",50,"2020/03/29"
Output file:
"378478","COMPLETED","Tracfone","","","2020/03/29 09:39:22","","2787","","356074101197544","89148000005748235454","75176540"
"378328","COMPLETED","Total Wireless","Unlimited Talk, Text, & Data (First 25GB High Speed, then unlimited 2GB)","50","2020/03/29"
Even though using a separate tool is probably the easiest way to go, if you absolutely cannot install other packages, then you're going to have to code up something in a package you already have. The following bash script is a good place to start, as it uses no other tools to achieve its goal.
At the moment, it's tied to a very specific set of rules, as follows:
White space matters. Anything between the commas is considered part of the field. This especially matters when detecting a quoted field: the quote must be the very first character of the field, so no abc, "d,e,f",ghi stuff, since the "d,e,f" won't be handled correctly.
Quoted fields are allowed to contain commas, and "" sequences within them are preserved as-is.
It's probably not a good idea to supply ill-formatted CSV files :-)
But, with that in mind, here we go. I'll offer a brief textual description of each section but hopefully the comments in the code will be enough to figure out what's going on.
First, a function for finding the position of some string within another string, useful for working out the field bounds:
function findPos {
    haystack="$1"
    needle="$2"

    # Remove everything past the needle (quoted so it isn't treated as a glob).
    prefix="${haystack%%"${needle}"*}"

    # If nothing was removed, it wasn't found, so supply massive number.
    # Otherwise, it was found at the length of the string with removed stuff.
    position=999999
    [[ ${#prefix} -ne ${#haystack} ]] && position=${#prefix}

    echo ${position}
}
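A couple of quick sanity checks of that function, using values I've made up:
findPos 'ab,cd' ','    # prints 2 (the comma follows a two-character prefix)
findPos 'abcd' ','     # prints 999999 (needle not found)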
Then we can use that in the function that works out the length of the next field. This basically just looks for the next comma for unquoted fields, and does special handling for quoted fields by building up the field from segments (it has to handle quotes within quotes and commas):
function getNextFieldLen {
    line="$1"

    # Empty line means all work done.
    [[ -z "${line}" ]] && echo -1 && return

    # Handle unquoted first, this is easy.
    [[ "${line:0:1}" != '"' ]] && { echo $(findPos "${line}" ","); return; }

    # Now handle quoted. Loop over all segments where a segment is defined as
    # the text up to the next <"">, assuming it's before the next <",>.
    field=""
    nextQuoteComma=$(findPos "${line}" '",')
    nextDoubleQuote=$(findPos "${line}" '""')
    while [[ ${nextDoubleQuote} -lt ${nextQuoteComma} ]]; do
        # Append segment to the field and go back for next segment.
        field="${field}${line:0:${nextDoubleQuote}}\"\""
        line="${line:${nextDoubleQuote}}"
        line="${line:2}"
        nextQuoteComma=$(findPos "${line}" '",')
        nextDoubleQuote=$(findPos "${line}" '""')
    done

    # Add final segment (up to the comma) and output entire field.
    field="${field}${line:0:${nextQuoteComma}}\""
    echo "${#field}"
}
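For example (remembering that the caller appends a trailing comma to each line before the first call, as you'll see below):
getNextFieldLen '"a,b",c,'    # prints 5: the quoted field "a,b", including its quotes
getNextFieldLen 'c,'          # prints 1: the unquoted field c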
Finally, there's the top-level function which will quotify whatever comes in via standard input:
function quotifyStdIn {
    # Process file line by line.
    while read -r line; do
        # Start with empty output line and non-comma separator.
        outLine="" ; sep=""

        # Place terminator to make processing easier, start field loop.
        line="${line},"
        fieldLen=$(getNextFieldLen "${line}")
        while [[ ${fieldLen} -ge 0 ]]; do
            # Get field and quotify if needed, adjust line (remove field and comma).
            field="${line:0:${fieldLen}}"
            [[ "${field:0:1}" = '"' ]] || field="\"${field}\""
            line="${line:$((fieldLen+1))}"

            # Append to output line and prepare for next field.
            outLine="${outLine}${sep}${field}"; sep=","
            fieldLen=$(getNextFieldLen "${line}")
        done

        # Output built line.
        echo "${outLine}"
    done
}
And, on the off-chance you want to read directly from a file (though providing a file name that's empty or "-" will use standard input so you can probably just use the file-based function for everything):
function quotifyFile {
    file="$1"

    # Empty file or "-" means standard input, otherwise take input from real file.
    [[ ${#file} -eq 0 ]] && { quotifyStdIn; return; }
    [[ "${file}" = "-" ]] && { quotifyStdIn; return; }
    quotifyStdIn < "${file}"
}
And, finally, because every program that's not a "Hello, world" one deserves some form of test harness, this is what you can use to test the various capabilities:
(
    echo 'paxdiablo,was here'
    echo 'and,"then, strangely,",he,was,not'
    echo '50,"My name is ""Pax"", and yours is ""Bob""",42'
    echo '17,"""Love"" is grand",19'
) > harness.csv

echo "Before:"
sed "s/^/ /" harness.csv
echo "After:"
quotifyFile harness.csv | sed "s/^/ /"
rm -rf harness.csv
And, since a test harness is of little use unless you run the tests, here's the results of the first run:
Before:
paxdiablo,was here
and,"then, strangely,",he,was,not
50,"My name is ""Pax"", and yours is ""Bob""",42
17,"""Love"" is grand",19
After:
"paxdiablo","was here"
"and","then, strangely,","he","was","not"
"50","My name is ""Pax"", and yours is ""Bob""","42"
"17","""Love"" is grand","19"
Hopefully, that will be enough to get you going in the absence of being able to install packages. Of course, if one of the packages you can't install is bash itself, then you have problems that I can't help you with :-)
Your starting CSV is not a good CSV: its two rows have different numbers of columns.
+--------+-----------+----------------+--------------------------------------------------------------------------+----+---------------------+---+------+---+-----------------+----------------------+----------+
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
+--------+-----------+----------------+--------------------------------------------------------------------------+----+---------------------+---+------+---+-----------------+----------------------+----------+
| 378478 | COMPLETED | Tracfone | - | - | 2020/03/29 09:39:22 | - | 2787 | - | 356074101197544 | 89148000005748235454 | 75176540 |
| 378328 | COMPLETED | Total Wireless | Unlimited Talk, Text, & Data (First 25GB High Speed, then unlimited 2GB) | 50 | 2020/03/29 | - | - | - | - | - | - |
+--------+-----------+----------------+--------------------------------------------------------------------------+----+---------------------+---+------+---+-----------------+----------------------+----------+
Using Miller (https://github.com/johnkerl/miller) you could run
mlr --csv --quote-all -N unsparsify input >output
to get the output below (unsparsify pads the short records with empty fields so every row has the same columns, -N runs without a CSV header, and --quote-all wraps every field in double quotes):
"378478","COMPLETED","Tracfone","","","2020/03/29 09:39:22","","2787","","356074101197544","89148000005748235454","75176540"
"378328","COMPLETED","Total Wireless","Unlimited Talk, Text, & Data (First 25GB High Speed, then unlimited 2GB)","50","2020/03/29","","","","","",""
You can use it by downloading the executable from https://github.com/johnkerl/miller/releases/tag/v5.7.0.
In my script I need to get the highest backup number twice, so I wanted to create a function. These are the commands in the script:
First time:
highest=$( ls $path.bak.* | sort -t"." -k2 -n | tail -n1 | sed -r 's/.*\.(.*)/\1/')
Second time:
newhighest=$(ls $path.bak.* | sort -t"." -k2 -n | tail -n1 | sed -r 's/.*\.(.*)/\1/')
Now my question:
How can I shorten this with a function?
Here are my input files:
test.bak.1
test.bak.2
test.bak.3
test.bak.4
test.bak.5
test.bak.6
test.bak.7
test.bak.8
test.bak.9
test.bak.10
test.bak.11
Expected return: 11
Written out for readability:
#!/usr/bin/env bash
# ^^^^ - Ensure that this script is run with bash, not /bin/sh
# Enable "extended globs", so we can exclude names that don't end with digits
shopt -s extglob
# since your files are test.bak.*
path=test
get_highest() {
    # set the function's argument list to all matching backup files
    set -- "$path".bak.+([[:digit:]])
    # if we have just one valid filename, we know the glob expanded successfully.
    # otherwise, no such files exist, so exit the function immediately
    [[ -e $1 || -L $1 ]] || return 1
    # stream our list of numeric extensions into awk, and let it find the highest
    printf '%s\n' "${@##*.}" | awk '$0>last{last=$0}END{print last}'
}
highest=$(get_highest) || { echo "No backup files found" >&2; exit 1; }
new_highest=$(get_highest) || { echo "No backup files on 2nd pass" >&2; exit 1; }
Note:
Expansions need to be quoted; "$path"/*, not $path/*, or else path="Directory With Spaces/test" would look for files in Spaces/test, after emitting Directory and With as results.
ls should never be used programmatically.
extglob syntax allows regex-like capabilities for matching groups of files, letting us assert here that we only consider filenames that end in .bak. followed by one or more digits.
In general, you should write your scripts to be easy to read and understand as a higher priority than writing them to be short. Your future self (and others who need to maintain code in the future) will thank you.
Because filenames can contain newlines, newlines are unsafe to use to separate filenames in a stream; only the NUL character is safe for this use when names are not otherwise quoted or escaped. Thus, when emitting a stream of arbitrary names, they should be formatted with the string %s\0 and sorted with the -z argument. However, we're only printing the numeric extensions here, making newlines safe.
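As a quick sanity check, assuming the script above has been sourced (so extglob is on, path=test, and get_highest is defined) in a scratch directory:
touch test.bak.{1..11}    # recreate the sample files from the question
get_highest               # prints 11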
Good day.
Every day I receive a list of numbers like the example below:
11986542586
34988745236
2274563215
4532146587
11987455478
3652147859
As you can see, some of them have a 9 as the third digit (11 digits total) and some don't (10 digits total). That's because the ones with an extra 9 are in the new Brazilian mobile number format, and the ones without it are in the old format.
The thing is that I have to use the numbers in both formats as a parameter for another script, and I usually do this by hand.
I am trying to create a script that reads the length of each mobile number, adds or removes the "9" in the third position depending on which length condition is met, and saves the result to a separate file.
So far I am only able to check the length, but I don't know how to add or remove the "9" at the third digit.
#!/bin/bash
Numbers_file="/FILES/dir/dir2/Numbers_File.txt"
while read Numbers
do
    LEN=${#Numbers}
    if [ $LEN -eq "11" ]; then
        echo "length = $LEN"
    elif [ $LEN -eq "10" ]; then
        echo "length = $LEN"
    else
        echo "error"
    fi
done < "$Numbers_file"
You can delete the third character of any string with sed as follows:
sed 's/.//3'
Example:
echo "11986542586" | sed 's/.//3'
1186542586
To insert a 9 as the third character:
echo "2274563215" | sed 's/./9&/3'
22974563215
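Putting the two together with the length check you already have, a minimal sketch (it reuses the Numbers_file path from your script; the output name converted_numbers.txt is just an example):
#!/bin/bash
Numbers_file="/FILES/dir/dir2/Numbers_File.txt"
while read -r number; do
    if [ "${#number}" -eq 11 ]; then
        # New format: drop the 9 in the third position.
        echo "$number" | sed 's/.//3'
    elif [ "${#number}" -eq 10 ]; then
        # Old format: insert a 9 as the third character.
        echo "$number" | sed 's/./9&/3'
    else
        echo "error: unexpected length for $number" >&2
    fi
done < "$Numbers_file" > converted_numbers.txt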
If you are absolutely sure the 9 only ever occurs at the third position, you can use an awk statement as below:
awk 'substr($0,3,1)=="9"{$0=substr($0,1,2)substr($0,4,length($0))}1' file
1186542586
3488745236
2274563215
4532146587
1187455478
3652147859
Using the POSIX-compliant substr() function, this processes only the lines having a 9 at the 3rd position and rebuilds the record without that one digit.
substr(s, m[, n ])
Return the at most n-character substring of s that begins at position m, numbering from 1. If n is omitted, or if n specifies more characters than are left in the string, the length of the substring shall be limited by the length of the string s
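For completeness, a sketch that converts each number to the opposite format in one pass, guarding on the overall length so a 10-digit number whose third digit happens to be 9 is not mangled (this assumes length alone distinguishes the two formats):
awk '{
    if (length($0) == 11 && substr($0, 3, 1) == "9")
        $0 = substr($0, 1, 2) substr($0, 4)       # new format: drop the 9
    else if (length($0) == 10)
        $0 = substr($0, 1, 2) "9" substr($0, 3)   # old format: insert the 9
    print
}' file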
There are lots of text manipulation tools that will do this, but the lightest-weight is probably cut, because this is all it does.
cut selects characters by position, so cut -c4 gives you just the 4th character; add GNU cut's --complement option and you get everything but character 4.
echo 1234567890 | cut -c4 --complement
123567890
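Since cut operates on every line of its input, you would still filter by length first. One way to strip the 9 from just the 11-digit numbers (numbers.txt stands in for your file; --complement is a GNU extension):
grep -E '^[0-9]{11}$' numbers.txt | cut -c3 --complement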