Format a file's content using a shell script [duplicate] - bash

This question already has answers here:
Escaping separator within double quotes, in awk
(3 answers)
Closed 2 years ago.
Hello everyone, I'm a beginner in shell scripting. On a daily basis I need to convert a file's data to another format. I usually do it manually in a text editor, but I often make mistakes, so I decided to write a simple script that can do the work for me. The file's content looks like this:
/release201209
a1,a2,"a3",a4,a5
b1,b2,"b3",b4,b5
c1,c2,"c3",c4,c5
and I want to convert it to this:
a2>a3
b2>b3
c2>c3
The script should ignore the first line and print the second and third values separated by '>'.
I'm halfway there, and here is my code:
#!/bin/bash
sed '1d' "$1" | cut -d, -f2-3 --output-delimiter='>' | tr -d '"' > "$2"
It was working well until I found out that it doesn't work for data containing a comma in a3, like this one:
data,VERSION,"FUNDS.TRANSFER,ASS.VERS.TIERS.BOP",,
Which returns
VERSION>FUNDS.TRANSFER
instead of
VERSION>FUNDS.TRANSFER,ASS.VERS.TIERS.BOP
Can you help me update it, please? Thanks.

Consider using a proper CSV parsing tool like csvtool to extract the relevant columns (it's much easier and more reliable than rolling your own parsing). Then use tr/sed to do the necessary transformations:
sed '1d' file.txt | csvtool -t ',' col 2,3 - | tr -d '"' | sed 's/,/>/'
Steps:
Remove the header line using sed
Use csvtool to extract the 2nd and 3rd columns
Use tr to remove the double quotes
Use sed to map the first , to a > (you can't use tr for this since that does a global translation)
You can install csvtool with your package manager, e.g. on a Debian-based system: sudo apt-get install csvtool. On other systems, replace apt-get with your package manager, e.g. yum, brew, ...
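If csvtool can't be installed, a minimal sketch in plain POSIX awk can split each line itself, treating commas inside double quotes as part of the field. This is a stand-in under stated assumptions rather than a full CSV parser: it assumes quoted fields never contain escaped quotes ("") or embedded newlines, and the function name csv_to_arrows is hypothetical.

```bash
# Hypothetical helper; assumes no escaped quotes ("") and no newlines inside quoted fields.
csv_to_arrows() {
    awk 'NR > 1 {                           # skip the first (header) line
        split("", f); n = 0; field = ""; inq = 0
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == "\"") {
                inq = !inq                  # toggle the in-quotes flag; the quote itself is dropped
            } else if (c == "," && !inq) {
                f[++n] = field; field = ""  # an unquoted comma ends the current field
            } else {
                field = field c
            }
        }
        f[++n] = field                      # last field on the line
        print f[2] ">" f[3]
    }' "$@"
}
```

Usage: csv_to_arrows input.txt > output.txt, or pipe the file into it. Walking the line character by character is slower than cut, but it keeps the quote tracking explicit and needs nothing beyond POSIX awk.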

Ruby has a CSV module included:
ruby -rcsv -e '
CSV.read(ARGV.shift).map {|row|
printf "%s>%s\n", row[1], row[2]
}
' file | sed 1d

Hello, I'm also learning shell scripting. Is this what you want?
This is my code:
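function klir(){
    # Column 2 of the comma-separated line -> first temp file
    cut -d, -f2 "$1" > "$1.temp.filterline2"
    # Text between the first pair of double quotes -> second temp file
    cut -d'"' -f2 "$1" > "$1.temp.filterline3"
    # Join both columns with a tab, turn the tab into '>', and drop the header line
    paste "$1.temp.filterline2" "$1.temp.filterline3" | sed "s/\t/>/g ; 1d"
    rm -f "$1.temp.filterline2" "$1.temp.filterline3"
}
klir "$1"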
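The same two-cut idea can also be sketched without temporary files, assuming bash is available for process substitution. Like klir, it assumes column 2 contains no comma and that the wanted text is the first quoted string on each line; the name klir2 is hypothetical.

```bash
# Hypothetical variant of klir; requires bash for <( ... ) process substitution.
klir2() {
    # paste -d'>' joins the two column streams with '>' directly; sed 1d drops the header line
    paste -d'>' <(cut -d, -f2 "$1") <(cut -d'"' -f2 "$1") | sed 1d
}
```

Usage: klir2 input.txt > output.txt.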

Related

Extract specific string from line with standard grep,egrep or awk

I'm trying to extract a specific string from grep output.
uci show minidlna
produces a large list
.
.
.
minidlna.config.enabled='1'
minidlna.config.db_dir='/mnt/sda1/usb/db'
minidlna.config.enable_tivo='1'
minidlna.config.wide_links='1'
.
.
.
So I tried to narrow down what I wanted by running
uci show minidlna | grep -oE '\bdb_dir=\S+'
This narrows the output to
db_dir='/mnt/sda1/usb/db'
What I want is to output only
/mnt/sda1/usb/db
without the quotes and without the leading "db_dir=", so I can run rm /mnt/sda1/usb/db/file.db
I've used the answers found here:
How to extract string following a pattern with grep, regex or perl
and that's as close as I got.
EDIT: after using Ed Morton's awk command, I needed to pass the output to the rm command.
I used:
| ( read DB; rm "$DB/files.db" )
read DB stores the output in the variable DB.
( ... ) groups the commands in a subshell.
rm "$DB/files.db" deletes the file files.db.
Is this what you're trying to do?
$ awk -F"'" '/db_dir/{print $2}' file
/mnt/sda1/usb/db
That will work in any awk in any shell on every UNIX box.
If that's not what you want then edit your question to clarify your requirements and post more truly representative sample input/output.
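To use that path in the rm step from the question's edit, the awk output can also be captured with command substitution instead of read in a subshell. A minimal sketch; since uci is not available everywhere, a hypothetical variable uci_output stands in for the output of uci show minidlna.

```bash
# Hypothetical stand-in for the output of: uci show minidlna
uci_output="minidlna.config.enabled='1'
minidlna.config.db_dir='/mnt/sda1/usb/db'
minidlna.config.enable_tivo='1'"

# Split each line on single quotes; on the db_dir line, field 2 is the path.
db_dir=$(printf '%s\n' "$uci_output" | awk -F"'" '/db_dir/{print $2}')
echo "$db_dir"   # /mnt/sda1/usb/db

# On the real device (not run here), the same idea would be:
#   db_dir=$(uci show minidlna | awk -F"'" '/db_dir/{print $2}')
#   rm -- "$db_dir/file.db"
```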
Using sed with some effort to avoid single quotes:
sed -n 's/^minidlna.config.db_dir=\s*\S\(\S*\)\S\s*$/\1/p' input
So you end up with a string like db_dir='/mnt/sda1/usb/db'.
I would first remove the quotes by piping this to
.... | tr -d "'"
Now you end up with a string like db_dir=/mnt/sda1/usb/db.
Say you have this string stored in a variable named confstr; then
${confstr##*=}
gives you just /mnt/sda1/usb/db: the pattern *= matches everything from the start of the string up to an equal sign, and ## removes the longest prefix that matches it.
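A short sketch of that parameter expansion, starting from the string grep -o produced in the question; the extra a=b=c lines only illustrate how a single # (shortest matching prefix) differs from ## (longest matching prefix).

```bash
confstr="db_dir='/mnt/sda1/usb/db'"                # the string grep -o produced
confstr=$(printf '%s\n' "$confstr" | tr -d "'")   # -> db_dir=/mnt/sda1/usb/db
path=${confstr##*=}   # ## deletes the longest prefix matching *=
echo "$path"          # /mnt/sda1/usb/db

# How # and ## differ when the pattern can match more than once:
s='a=b=c'
echo "${s#*=}"        # shortest matching prefix removed: b=c
echo "${s##*=}"       # longest matching prefix removed:  c
```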
I would do this:
Once you have extracted the line above into file.txt (or while piping it into this command), split the fields on the single-quote character, use printf to generate the rm command, and pass that to bash to execute:
$ awk -F"'" '{printf "rm %s/file.db\n", $2}' file.txt | bash
With your original command:
$ uci show minidlna | grep -oE '\bdb_dir=\S+' | \
awk -F"'" '{printf "rm %s/file.db\n", $2}' | bash

Getting rid of some special symbol while reading from a file

I am writing a small script which is getting some configuration options from a settings file with a certain format (option=value or option=value1 value2 ...).
settings-file:
SomeOption=asdf
IFS=HDMI1 HDMI2 VGA1 DP1
SomeOtherOption=ghjk
Script:
for VALUE in $(cat settings | grep IFS | sed 's/.*=\(.*\)/\1/'); do
echo "$VALUE"x
done
Now I get the following output:
HDMI1x
HDMI2x
VGA1x
xP1
Expected output:
HDMI1x
HDMI2x
VGA1x
DP1x
I obviously can't use the data like this since the last read entry is mangled up somehow. What is going on and how do I stop this from happening?
Regards
Generally you can use awk like this:
awk -F'[= ]' '$1=="IFS"{for(i=2;i<=NF;i++)print $i"x"}' settings
-F'[= ]' splits the line on = or space. The awk program checks whether the first field (the variable name) equals IFS, and if so iterates through fields 2 to the end and prints each one with an x appended.
However, in comments you said that the file is using Windows line endings. In this case you need to pre-process the file before using awk. You can use tr to remove the carriage return symbols:
tr -d '\r' < settings | awk -F'[= ]' '$1=="IFS"{for(i=2;i<=NF;i++)print $i"x"}'
The reason is likely that your settings file uses DOS line endings.
Once you've fixed that (with dos2unix for example), your loop can also be modified to the following, removing two utility invocations:
for value in $( sed -n -e 's/^IFS.*=\(.*\)/\1/p' settings ); do
echo "$value"x
done
Or you can do it all in one go, removing the need to modify the settings file at all:
tr -d '\r' <settings |
for value in $( sed -n -e 's/^IFS.*=\(.*\)/\1/p' ); do
echo "$value"x
done
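To see the problem and check the fix, a minimal sketch that writes a throwaway settings file with CRLF line endings (hypothetical contents mirroring the question) and runs the loop with the carriage returns stripped. The helper name print_values is hypothetical.

```bash
settings=$(mktemp)
# Hypothetical settings file written with DOS (CRLF) line endings
printf 'SomeOption=asdf\r\nIFS=HDMI1 HDMI2 VGA1 DP1\r\nSomeOtherOption=ghjk\r\n' > "$settings"

# Without tr, the last word would be DP1 followed by \r; the \r returns the cursor
# to the start of the line, so the appended x overwrites the D (the xP1 in the question).
print_values() {
    for value in $(tr -d '\r' < "$1" | sed -n -e 's/^IFS.*=\(.*\)/\1/p'); do
        echo "${value}x"
    done
}

values=$(print_values "$settings")
printf '%s\n' "$values"   # HDMI1x, HDMI2x, VGA1x, DP1x, one per line
rm -f "$settings"
```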

Counting commas in a line in bash

Sometimes I receive a CSV file which has a carriage return inside a cell. This is not an acceptable format to a program that will use it as input.
In order to detect if an input line is split, I determined that a bad line would not have the expected number of commas in it. Is there a bash or other common unix command line tool that would allow me to count the commas in the line? If necessary, I can write a Python or Perl program to do it, but if possible, I'd like to add a line or two to an existing bash script to cause it to fail if the comma count is wrong. Any ideas?
Strip everything but the commas, and then count number of characters left:
$ echo foo,bar,baz | tr -cd , | wc -c
2
To count the number of times a comma appears, you can use something like awk:
string='foo,bar,baz'   # a line of input from the CSV file
echo "$string" | awk -F "," '{print NF-1}'
But this really isn't sufficient to determine whether a field has carriage returns in it. Fields can have commas inside as long as they're surrounded by quotes.
What worked for me better than the other solutions was this. If test.txt has:
foo,bar,baz
baz,foo,foobar,bar
Then cat test.txt | xargs -I % sh -c 'echo % | tr -cd , | wc -c' produces
2
3
This works very well for streaming sources, or tailing logs, etc.
In pure Bash:
while IFS=, read -ra array
do
echo "$((${#array[@]} - 1))"
done < inputfile
or
while read -r line
do
count=${line//[^,]}
echo "${#count}"
done < inputfile
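Since the question asks for a line or two that makes an existing script fail, the pure-Bash count above could be wrapped like this. A sketch only: check_commas is a hypothetical name, the expected count is passed as an argument, and, as noted earlier, commas inside quoted fields are still counted.

```bash
# Hypothetical helper: return 1 as soon as a line has the wrong number of commas.
# Input comes on stdin; $1 is the expected comma count per line.
check_commas() {
    local expected=$1 line_no=0 line commas
    while IFS= read -r line || [ -n "$line" ]; do   # also handles a final line without newline
        line_no=$((line_no + 1))
        commas=${line//[^,]}                        # keep only the commas
        if [ "${#commas}" -ne "$expected" ]; then
            echo "line $line_no: expected $expected commas, found ${#commas}" >&2
            return 1
        fi
    done
    return 0
}

# Usage inside an existing script, e.g. for a 3-column CSV:
#   check_commas 2 < inputfile || exit 1
```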
Try Perl:
$ perl -ne 'print 0+@{[/,/g]},"\n"'
a
0
a,a
1
a,a,a,a,a
4
Depending on what you are trying to do with the CSV data, it may be helpful to use a wrapper script like csvquote to temporarily replace the problematic newlines (and commas) inside quoted fields, then restore them. For instance:
csvquote inputfile.csv | wc -l
and
csvquote inputfile.csv | cut -d, -f1 | csvquote -u
may be the sort of thing you're looking for. See https://github.com/dbro/csvquote for the code and more information.
An example Python command you could run (since Python is installed on most modern systems) is:
python -c "import pathlib; print({l.count(',') for l in pathlib.Path('my_file.csv').read_text().splitlines()})"
This counts the number of commas per line, then makes a set from them (so if your lines all have the same number of commas in, you'll get a set with just that number in).
Just remove all of the carriage returns:
tr -d '\r' < old_file > new_file

Get substring from file using "sed"

Can anyone help me to get substring using sed program?
I have a file with this line:
....
define("BASE", "empty"); # there can be any string (not only "empty").
....
And I need to get "empty" as string variable to my bash script.
At this moment I have:
sed -n '/define(\"BASE\"/p' path/to/file.ext
# returns this line:
# define("BASE", "empty");
# but I need 'empty'
UPD: Thanks to @Jaypal
For now I have bash script:
DBNAME=`sed -n '/define(\"BASE\"/p' path/to/file.ext`
echo $DBNAME | sed -r 's/.*"([a-zA-Z]+)".*/\1/'
It works OK, but is there any way to do the same manipulation with one line of code?
You should use this:
sed -n 's/.*\".*\", \"\(.*\)\".*/\1/p' yourFile.txt
which means something (.*) followed by something in quotes (\".*\"), then a comma and a space (, ), and then again something within quotes (\"\(.*\)\").
The escaped parentheses \( \) capture the part you can reuse later, i.e. the string within the second pair of quotes, which is referenced with \1.
I put -n in front (together with the p flag at the end) to answer the updated question: only the line that was manipulated gets printed.
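For the updated question (getting the value into a variable in one line), a sketch combining the -n/p idea with an address that selects only the BASE line. The sample file and its second define line are hypothetical, and [^"]* assumes the value itself contains no double quotes.

```bash
conf=$(mktemp)
# Hypothetical sample file: the line from the question plus another define() call
printf '%s\n' 'define("BASE", "empty");' 'define("HOST", "localhost");' > "$conf"

# -n: print nothing by default; the /define("BASE"/ address limits the edit to that line;
# s/.../\1/p keeps only the quoted string just before "); and prints the result.
DBNAME=$(sed -n '/define("BASE"/s/.*"\([^"]*\)");.*/\1/p' "$conf")
echo "$DBNAME"   # empty
rm -f "$conf"
```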
This should help -
sed -r 's/.*"([a-zA-Z]+)"\);/\1/' path/to/file.ext
If you are ok with using awk then you can try the following -
awk -F\" '/define\(/{print $(NF-1)}' path/to/file.ext
Update:
DBNAME=$(sed -nr '/define\("BASE"/s/.*"([a-zA-Z]+)"\);/\1/p' path/to/file.ext)
sed -nr '/^define.*"(.*)".*$/{s//\1/;p}' path/to/file.ext
If your file doesn't change over time (i.e. the line number will always be the same), you can take the line and use delimiters to cut your part out:
sed -n 'Xp' your.file | cut -d ' ' -f 2 | cut -d '"' -f 2
assuming X is the line number of your required line.

How to remove the last character from a bash grep output

COMPANY_NAME=`cat file.txt | grep "company_name" | cut -d '=' -f 2`
outputs something like this
"Abc Inc";
What I want to do is remove the trailing ";" as well. How can I do that? I'm a beginner in bash. Any thoughts or suggestions would be helpful.
This will remove the last character contained in your COMPANY_NAME variable, regardless of whether it is a semicolon:
echo "$COMPANY_NAME" | rev | cut -c 2- | rev
I'd use sed 's/;$//', e.g.:
COMPANY_NAME=`cat file.txt | grep "company_name" | cut -d '=' -f 2 | sed 's/;$//'`
foo="hello world"
echo ${foo%?}
hello worl
I'd use head --bytes -2, or head -c-2 for short (negative counts are a GNU head extension).
COMPANY_NAME=`cat file.txt | grep "company_name" | cut -d '=' -f 2 | head --bytes -2`
head outputs only the beginning of a stream or file. Typically it counts lines, but it can be made to count characters/bytes instead. head --bytes 10 will output the first ten characters, but head --bytes -10 will output everything except the last ten. The count here is 2 rather than 1 because the last byte of cut's output is the trailing newline, and the semicolon sits just before it.
NB: you may have issues if the final character is multi-byte, but a semicolon isn't.
I'd recommend this solution over sed or cut because:
It's exactly what head was designed to do, so it needs fewer command-line options and the command is easier to read
It saves you having to think about regular expressions, which are cool/powerful but often overkill
It saves your machine having to think about regular expressions, so it will be imperceptibly faster
I believe the cleanest way to strip a single character from a string with bash is:
echo ${COMPANY_NAME:: -1}
but I haven't been able to embed the grep piece within the curly braces, so your particular task becomes a two-liner:
COMPANY_NAME=$(grep "company_name" file.txt); COMPANY_NAME=${COMPANY_NAME:: -1}
This strips the last character whether or not it is a semicolon, but you can also target the semicolon specifically.
To remove ALL semicolons, wherever they may fall:
echo ${COMPANY_NAME//;/}
To remove only a semicolon at the end:
echo ${COMPANY_NAME%;}
Or, to remove a run of several semicolons from the end (this needs bash's extglob option):
shopt -s extglob
echo ${COMPANY_NAME%%+(;)}
For great detail and more on this approach, The Linux Documentation Project covers a lot of ground at http://tldp.org/LDP/abs/html/string-manipulation.html
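A quick sketch contrasting these expansions on a hypothetical value with two trailing semicolons; removing a whole run of trailing semicolons relies on bash's extglob option, which is enabled on its own line first.

```bash
shopt -s extglob             # enables the +(...) pattern used for trail below

COMPANY_NAME='"Abc Inc";;'   # hypothetical value with two trailing semicolons

one=${COMPANY_NAME%;}        # drop one trailing ';'          -> "Abc Inc";
all=${COMPANY_NAME//;/}      # drop every ';' anywhere        -> "Abc Inc"
trail=${COMPANY_NAME%%+(;)}  # drop any run of trailing ';'   -> "Abc Inc"
last=${COMPANY_NAME%?}       # drop the last character, whatever it is -> "Abc Inc";

printf '%s\n' "$one" "$all" "$trail" "$last"
```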
Using sed, if you don't know what the last character actually is:
$ grep company_name file.txt | cut -d '=' -f2 | sed 's/.$//'
"Abc Inc"
Don't abuse cats. Did you know that grep can read files, too?
The canonical approach would be this:
grep "company_name" file.txt | cut -d '=' -f 2 | sed -e 's/;$//'
The smarter approach would use a single perl or awk statement, which can do the filtering and the different transformations at once. For example, something like this:
COMPANY_NAME=$( perl -ne '/company_name=(.*);/ && print $1' file.txt )
You don't have to chain so many tools; just one awk command does the job:
COMPANY_NAME=$(awk -F"=" '/company_name/{gsub(/;$/,"",$2) ;print $2}' file.txt)
In Bash using only one external utility:
IFS='= ' read -r discard COMPANY_NAME <<< $(grep "company_name" file.txt)
COMPANY_NAME=${COMPANY_NAME/%?}
Assuming the quotation marks are actually part of the output, couldn't you just use the -o switch to return everything between the quote marks?
COMPANY_NAME="\"ABC Inc\";"; echo "$COMPANY_NAME" | grep -o '".*"'
You can strip N characters from the beginning and end of a string using this bash construct, as someone said already:
$ fred=abcdefg.rpm
$ echo ${fred:1:-4}
bcdefg
HOWEVER, this is not supported in older versions of bash (a negative length needs bash 4.2 or later), as I discovered just now while writing a script for a Red Hat EL6 install process. This is the sole reason for posting here.
A hacky way to achieve this is to use sed with extended regex like this:
$ fred=abcdefg.rpm
$ echo $fred | sed -re 's/^.(.*)....$/\1/g'
bcdefg
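On a bash too old for negative lengths (such as the RHEL 6 case above), the POSIX # and % operators used elsewhere in this thread can do the same trimming without sed. A minimal sketch: each ? matches exactly one character.

```bash
fred=abcdefg.rpm
tmp=${fred#?}          # remove the first character      -> bcdefg.rpm
trimmed=${tmp%????}    # remove the last four characters -> bcdefg
echo "$trimmed"
```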
A refinement to the answer above: to remove more than one character, add more question marks. For example, to remove the last two characters from the variable $SRC_IP_MSG, you can use:
SRC_IP_MSG=${SRC_IP_MSG%??}
cat file.txt | grep "company_name" | cut -d '=' -f 2 | cut -d ';' -f 1
I am not finding that sed 's/;$//' works. It doesn't trim anything, though I'm wondering whether it's because the character I'm trying to trim off happens to be a "$". What does work for me is sed 's/.\{1\}$//'.
