Bash/Shell -- Inserting line breaks between dates - bash

I've got data that comes in a file with multiple dates/times, etc.
Example:
12/15/19,23:30,80.2
12/15/19,23:45,80.6
12/16/19,00:00,80.5
12/16/19,00:15,80.2
I would like to use some command that will automatically go through the whole file and, any time the date changes, insert 2 blank lines so that I'm able to see more clearly when the date changes.
Example of what I'm looking for the file to look like after said command:
12/15/19,23:30,80.2
12/15/19,23:45,80.6


12/16/19,00:00,80.5
12/16/19,00:15,80.2
What is the best way to do this through bash/shell command line commands?

Using awk:
awk -F',' 'NR>1 && prev!=$1{ print ORS }
{ prev=$1; print }' file
Use , as the field separator.
If this is not the first line and prev differs from field 1, print two blank lines (print outputs the newline stored in ORS, then terminates its output with another ORS).
For each line, save the value of field 1 in the variable prev and print the line.
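Run against the sample data above, the one-liner groups the records by date (the file name here is just an example):

```shell
# Recreate the sample data from the question (hypothetical file name)
printf '%s\n' '12/15/19,23:30,80.2' '12/15/19,23:45,80.6' \
    '12/16/19,00:00,80.5' '12/16/19,00:15,80.2' > dates.csv

# Print two blank lines whenever the first field (the date) changes
awk -F',' 'NR>1 && prev!=$1{ print ORS } { prev=$1; print }' dates.csv
```

The output keeps the two 12/15/19 records together, then prints two blank lines before the 12/16/19 records.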

Since you're detecting patterns over multiple lines, you'll want to use bash builtins instead of programs like grep or sed.
# initialize variable
last_date=''
# loop over file lines (IFS= to keep leading whitespace, -r to keep backslashes)
while IFS= read -r line; do
    # extract date (up to first comma)
    this_date="${line%%,*}"
    # print two blank lines unless dates are equal (skip before the first line)
    if [[ -n "$last_date" && "$this_date" != "$last_date" ]]; then
        echo; echo
    fi
    # remember date for next line
    last_date="$this_date"
    # print
    printf '%s\n' "$line"
# feed loop with file
done < my_file.txt
Here's the shorter copy/paste version:
b='';while IFS= read -r l;do a="${l%%,*}";[[ -z "$b" || "$a" = "$b" ]]||{ echo;echo;};b="$a";printf '%s\n' "$l";done < my_file.txt
And you can also make it a function:
function add_spaces {
    # initialize variable
    local last_date=''
    # loop over file lines (IFS= to keep leading whitespace, -r to keep backslashes)
    while IFS= read -r line; do
        # extract date (up to first comma)
        this_date="${line%%,*}"
        # print two blank lines unless dates are equal (skip before the first line)
        if [[ -n "$last_date" && "$this_date" != "$last_date" ]]; then
            echo; echo
        fi
        # remember date for next line
        last_date="$this_date"
        # print
        printf '%s\n' "$line"
    # feed loop with file
    done < "$1" # $1 is the first argument to the function
}
So that you can call it whenever you want:
add_spaces my_file.txt

Related

Copy number of line composed by special character in bash

I have an exercise where I have a file and at the begin of it I have something like
#!usr/bin/bash
# tototata
#tititutu
#ttta
Hello world
Hi
Test test
#zabdazj
#this is it
And I have to take each first line starting with a # until the line where I don't have one, and store it in a variable. If there's a shebang, it has to be skipped, and blank lines in between have to be skipped too. We just want the comments between the shebang and the next non-comment content.
I'm new to bash and I would like to know if there's a way to do it, please?
Expected output:
# tototata
#tititutu
#ttta
Try this easy way to better understand.
#!/bin/bash
sed 1d your_input_file | while read -r line;
do
    check=$( echo "$line" | grep '^[#;]' )
    if [ -n "$check" ] || [ -z "$line" ]
    then
        echo "$line"
    else
        exit 1
    fi
done
This may be more correct, although your question was unclear about whether the input file had a script shebang, whether the shebang had to be skipped to match your sample output, or whether the input file's shebang was just bogus.
It is also unclear what to do if the first lines of the input file do not start with #.
You should really post your assignment's text as a reference.
Anyway, here is a script that collects the first set of consecutive lines starting with a sharp # into the arr array variable.
It may not be an exact solution to your assignment (which you should be able to solve with what your previous lessons taught you), but it will give you some clues and keys for iterating over the lines of a file and testing whether a line starts with a #.
#!/usr/bin/env bash

# Our variable to store parsed lines
# is an array of strings with an entry per line
declare -a arr=()

# Iterate reading lines from the file
# while each line matches the regex ^[#]
# (i.e., while the line starts with a sharp #)
while IFS=$'\n' read -r line && [[ "$line" =~ ^[#] ]]; do
    # Add the line to the arr array variable
    arr+=("$line")
done <a.txt

# Print each array entry followed by a newline
printf '%s\n' "${arr[@]}"
How about this (not tested, so you may have to debug it a bit, but my comments in the code should explain what is going on):
while read -r line
do
    # initial is 1 on the first line, and 0 after that. When the script starts,
    # the variable is undefined.
    : ${initial:=1}
    # Test for lines starting with #. Need to quote the hash
    # so that it is not taken as a comment.
    if [[ $line == '#'* ]]
    then
        # Test for initial #!
        if (( initial == 1 )) && [[ $line == '#!'* ]]
        then
            : # ignore it
        else
            echo "$line" # or do whatever you want to do with it
        fi
    # stop on the first non-blank, non-comment line
    elif [[ $line == *[^[:space:]]* ]]
    then
        break
    fi
    initial=0 # Next line won't be an initial line
done < your_file

testing last column header of text file for equality

I am trying to check if columns of a user file called testfile.txt are correctly named (they should be named var1, var2, var3). The file looks like
var1 var2 var3
5 6 7
I was thinking the following should work:
read i j k < testfile.txt
echo "${k}"
if [[ "${i}" != "var1" || "${j}" != "var2" || "${k}" != "var3" ]]; then
echo "input incorrect"
else
echo "input correct"
fi
but this returns
var3
input incorrect
So although the last column seems to be correctly named, the test fails. If I only test for the names of the first two columns, it works, but the test for the last column is always deemed false somehow.
How can I correct the script so that it can also test correctly for the value of the last column header?
If you just want to strip the CRs from the header line:
read i j k <<< $( sed '1 s/\r//g; 2q;' testfile.txt )
If you want to clean the whole file:
tr -d "\r" <testfile.txt>x && mv x testfile.txt
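The underlying cause is easy to reproduce. With Windows-style CRLF line endings, read leaves an invisible carriage return stuck to the last field; a minimal sketch, assuming a freshly created two-line file:

```shell
# Simulate a file saved with CRLF (Windows) line endings
printf 'var1 var2 var3\r\n5 6 7\r\n' > testfile.txt

read -r i j k < testfile.txt
# k is 'var3' plus a trailing carriage return, so the comparison fails
[ "$k" = "var3" ] && echo "input correct" || echo "input incorrect"

# After stripping the carriage returns, the comparison succeeds
tr -d '\r' < testfile.txt > x && mv x testfile.txt
read -r i j k < testfile.txt
[ "$k" = "var3" ] && echo "input correct" || echo "input incorrect"
```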
One in GNU awk (or any awk that supports multichar RS like mawk or Busybox awk):
$ awk 'BEGIN {
    RS="\r?\n"                   # allow an optional \r before the newline
    headers="var1, var2, var3"   # header names
    n=split(headers,h,/, */)     # split them into an array
}
NR==1 {                          # only process the first record
    for(i=1;i<=NF;i++)           # and every field of it
        if($i!=h[i] || n!=NF) {  # if a header differs or the count is wrong
            print "input incorrect"  # complain
            exit                     # and leave
        }
    exit                         # leave without complaining
}' testfile
Output could be:
input incorrect
or not.
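The effect of RS="\r?\n" can be checked in isolation (assuming gawk, or another awk with regex RS support such as recent mawk or busybox awk):

```shell
# With a single-character RS, $3 on a CRLF file would carry a trailing \r;
# the regex RS consumes the \r as part of the record separator instead
printf 'var1 var2 var3\r\n5 6 7\r\n' |
    awk 'BEGIN{RS="\r?\n"} NR==1{print ($3=="var3" ? "clean" : "still has CR")}'
```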

envsubst based on string

I have the following two lines in my text file
$MyEnv
someText$MyEnv
I want to use envsubst to only replace the second occurrence of the MyEnv variable. How can I use the string "someText" to distinguish between the first and second occurrence of the variable and substitute in the env variable?
So with envsubst < file1 > file2, file2 should be:
$MyEnv
someTextValueofMyEnv
How is this possible?
The following code will substitute all environment variables on the second line of the input. The requested envsubst command is the only non-builtin that is used.
L=0
while read -r line; do
    L=$((L+1))
    if [ $L = 2 ]; then
        echo "$line" | envsubst
    else
        echo "$line"
    fi
done < file1 > file2
Start reading with the last line, since it dictates the inputs and outputs: the contents of file1 are read line by line, populating $line for each iteration of the while loop, and the echoed lines are written to file2.
We have a line counter $L which increments at the beginning of the loop. If we're on line 2, we send the line through envsubst. Otherwise, we just report it.
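envsubst needs GNU gettext and only substitutes exported variables, so to see just the line-targeting mechanics of the loop, here is the same skeleton with tr standing in for envsubst (the stand-in filter is my substitution, not part of the original answer):

```shell
# Two-line sample input; only line 2 goes through the filter
printf '%s\n' 'first line' 'second line' > file1

L=0
while read -r line; do
    L=$((L+1))
    if [ $L = 2 ]; then
        echo "$line" | tr 'a-z' 'A-Z'   # stand-in for: echo "$line" | envsubst
    else
        echo "$line"
    fi
done < file1 > file2

cat file2   # first line / SECOND LINE
```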
You also asked how you could use the string "someText" to distinguish between occurrences. I'm not exactly sure what you mean by this, but consider this:
while read -r line; do
    # $line contains the string 'someText$MyEnv'
    # (literally: $line does not match itself when removing that string)
    if [ "$line" != "${line#*someText\$MyEnv}" ]; then
        echo "$line" | envsubst
    else
        echo "$line"
    fi
done < file1 > file2
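The ${line#...} trick in that test stands on its own: stripping the shortest prefix matching *someText$MyEnv only changes the string when it actually contains someText$MyEnv, so comparing the result against the original is a pure-shell containment check (no envsubst needed):

```shell
line='prefix someText$MyEnv suffix'
# The expansion strips everything up to and including someText$MyEnv;
# the result differs from $line exactly when the string contains the marker
if [ "$line" != "${line#*someText\$MyEnv}" ]; then
    echo "line contains the marker"
fi
```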
Note: envsubst will only substitute exported variables. The envsubst command is not portable; it's part of GNU gettext and is not among either the POSIX standard utilities or the Linux Standard Base (LSB) commands.
To be fully portable (and fully using sh builtins!), you'd need to use eval, which is unsafe without lots of extra checks.

Reformatting a csv file, script is confused by ' %." '

I'm using bash on cygwin.
I have to take a .csv file that is a subset of a much larger set of settings and shuffle the new csv settings (same keys, different values) into the 1000-plus-line original, making a new .json file.
I have put together a script to automate this. The first step in the process is to "clean up" the csv file by extracting lines that start with "mme " and "sms ". Everything else is to pass through cleanly to the "clean" .csv file.
This routine is as follows:
# clean up the settings, throwing out mme and sms entries
cat extract.csv | while read -r LINE; do
if [[ $LINE == "mme "* ]]
then
printf "$LINE\n" >> mme_settings.csv
elif [[ $LINE == "sms "* ]]
then
printf "$LINE\n" >> sms_settings.csv
else
printf "$LINE\n" >> extract_clean.csv
fi
done
My problem is that this thing stubs its toe on the following string at the end of one entry: 100%." When it's done with the line, it simply elides the %." and the new-line marker following it, and smears the two lines together:
... 100next.entry.keyname...
I would love to reach in and simply manually delimit the % sign, but it's not a realistic option for my use case. Clearly I'm missing something. My suspicion is that I am in some wise abusing cat or read in the first line.
If there is some place I should have looked to find the answer before bugging you all, by all means point me in that direction and I'll sod off.
The syntax for printf is:
printf format [argument]...
In the printf format string, anything following a % is a format specifier, as described in the link above. What you would like to do is:
while read -r line; do                 # Replaced LINE with line; fully uppercase variable names are reserved for the system
    if [[ "$line" = "mme "* ]]         # Here * globs for anything that comes next
    then
        printf "%s\n" "$line" >> mme_settings.csv
    elif [[ "$line" = "sms "* ]]
    then
        printf "%s\n" "$line" >> sms_settings.csv
    else
        printf "%s\n" "$line" >> extract_clean.csv
    fi
done < extract.csv                     # Avoided the useless use of cat
As pointed out, your problem is expanding a parameter containing a formatting instruction in the formatting argument of printf, which can be solved by using echo instead or moving the parameter to be expanded out of the formatting string, as demonstrated in other answers.
I recommend not looping over your whole file with Bash in the first place, as it's notoriously slow; you're extracting lines starting with certain patterns, which is a job at which grep excels:
grep '^mme ' extract.csv > mme_settings.csv
grep '^sms ' extract.csv > sms_settings.csv
grep -v '^mme \|^sms ' extract.csv > extract_clean.csv
The third command uses the -v option (select lines that don't match) together with alternation to exclude lines starting with either mme or sms.
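On a small made-up sample (the settings lines below are hypothetical), the three grep commands split the file as described. Note that \| alternation in a basic regex is a GNU grep extension; grep -E '^(mme|sms) ' is the portable spelling:

```shell
# Hypothetical sample settings file
printf '%s\n' 'mme timeout 30' 'sms retries 3' 'net mtu 1500' > extract.csv

grep '^mme ' extract.csv > mme_settings.csv              # mme timeout 30
grep '^sms ' extract.csv > sms_settings.csv              # sms retries 3
grep -v '^mme \|^sms ' extract.csv > extract_clean.csv   # net mtu 1500
```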

IFS separate a string like "Hello","World","this","is, a boring", "line"

I'm trying to parse a .csv file and I have some problems with IFS.
The file contains lines like this:
"Hello","World","this","is, a boring","line"
The columns are separated with a comma, so I tried to explode the line with this code:
IFS=, read -r -a tempArr <<< "$line"
But I get this output:
"Hello"
"World"
"this"
"is
a boring"
"line"
I understand why, so I tried some other commands but I don't get my expected output.
IFS=\",\"
IFS=\",
IFS=',\"'
IFS=,\"
Every time the "is, a boring" element is separated into 2 parts.
How can I use IFS to separate the string into 5 parts like this?
"Hello"
"World"
"this"
"is, a boring"
"line"
give this a try:
sed 's/","/"\n"/g' <<<"${line}"
sed has a search-and-replace command, s, which uses a regex to find the pattern.
Here it replaces the , in "," with a newline character.
As a consequence, each element ends up on a separate line.
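For example (note that \n in the replacement text is a GNU sed feature; BSD sed would need a literal, escaped newline instead):

```shell
line='"Hello","World","this","is, a boring","line"'
# Replace each "," field boundary with a quote, newline, quote
sed 's/","/"\n"/g' <<< "$line"
```

This prints the five quoted fields, one per line, with the comma inside "is, a boring" left intact.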
You may wish to use gawk with FPAT to define what constitutes a valid field -
Input :
"hello","world","this,is"
Script :
gawk 'BEGIN{FS=",";OFS="\n";FPAT="([^,]+)|(\"[^\"]+\")"}{$1=$1;print $0}' somefile.csv
Output :
"hello"
"world"
"this,is"
bashlib provides a csvline function. Assuming you've installed it somewhere in your PATH:
line='"Hello","World","this","is, a boring","line"'
source bashlib
csvline <<<"$line"
printf '%s\n' "${CSVLINE[@]}"
...output from the above being:
Hello
World
this
is, a boring
line
To quote the implementation (which is copyright lhunath, the below text being taken from this specific revision of the relevant git repo):
# _______________________________________________________________________
# |__ csvline ____________________________________________________________|
#
# csvline [-d delimiter] [-D line-delimiter]
#
# Parse a CSV record from standard input, storing the fields in the CSVLINE array.
#
# By default, a single line of input is read and parsed into comma-delimited fields.
# Fields can optionally contain double-quoted data, including field delimiters.
#
# A different field delimiter can be specified using -d. You can use -D
# to change the definition of a "record" (eg. to support NULL-delimited records).
#
csvline() {
    CSVLINE=()
    local line field quoted=0 delimiter=, lineDelimiter=$'\n' c
    local OPTIND=1 arg
    while getopts :d: arg; do
        case $arg in
            d) delimiter=$OPTARG ;;
        esac
    done

    IFS= read -d "$lineDelimiter" -r line || return
    while IFS= read -rn1 c; do
        case $c in
            \")
                (( quoted = !quoted ))
                continue ;;
            $delimiter)
                if (( ! quoted )); then
                    CSVLINE+=( "$field" ) field=
                    continue
                fi ;;
        esac
        field+=$c
    done <<< "$line"

    [[ $field ]] && CSVLINE+=( "$field" ) ||:
} # _____________________________________________________________________
