bash: remove/change values from one field with a loop

I have a file where the 10th column (as viewed in Excel) contains prices:
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"5000",19.50,justin,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"75,000",19.50,bieber,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"100,000",19.50,selena,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"5500",19.50,gomez,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"50,000",19.50,gomez,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"350,000",19.50,bieber,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"50000",19.50,bieber,20160506,0,,N,E,,,,,,
When it goes to CSV, the quotes and the commas stay.
I need to pick out the column that is surrounded by quotes (I use grep -o), clear the commas, and then get rid of the quotes.
I can't use a quote or a comma as the delimiter in awk, because the prices get broken up into different fields:
cat /tmp/wowmom | awk -F ',' '{print $10}'
"5000"
"75
"100
"5500"
"50
"350
"50000"
while read line
do
    clean_price=$(grep -o '".*"' $line)
    echo "$clean_price" | tr -d',' > cleanprice1
    echo "cleanprice1" | tr -d'"' > clearnprice2
done </tmp/wowmom
I get "No such file or directory" errors on the grep, though:
grep:CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"5000",19.50,justin,20160506,0,,N,E,,,,,,:No such file or directory
grep:CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"75,000",19.50,bieber,20160506,0,,N,E,,,,,,:No such file or directory
grep:CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"100,000",19.50,selena,20160506,0,,N,E,,,,,,:No such file or directory
grep:CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"50,000",19.50,gomez,20160506,0,,N,E,,,,,,:No such file or directory
grep:CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"350,000",19.50,bieber,20160506,0,,N,E,,,,,,:No such file or directory
I want to somehow isolate the value within the quotes with grep -o, take the commas out of the number, then use awk to take the quotes out of field 10.
I am doing this manually right now. It is a surprisingly long job - there are thousands of lines in this file.

You can use FPAT with gnu-awk for this. FPAT describes what a field looks like (here: either a quoted field together with its trailing comma, or a run of non-commas) rather than what separates fields:
awk -v FPAT='"[^"]+",|[^,]*' '{gsub(/[",]+/, "", $10)} 1' OFS=, file
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,5000,19.50,justin,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,75000,19.50,bieber,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,100000,19.50,selena,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,5500,19.50,gomez,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,50000,19.50,gomez,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,350000,19.50,bieber,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,50000,19.50,bieber,20160506,0,,N,E,,,,,,

You are using the wrong tool here.
sed -r 's/^(([^,]+,){9})"([^,]+),?([^,]+)"/\1\3\4/' file.csv > newfile.csv
The regular expression captures the first nine fields into the first back reference (the second group is the repetition and ends up holding the last of the nine fields), the digits before the thousands separator into the third, and the rest of the number into the fourth; the substitution then glues them back together without the skipped quote and comma characters.
If you have numbers with more than one thousands separator (i.e. above one million), you will need a slightly more complex script.
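A sketch of that more complex variant (assuming GNU sed): a loop first deletes the commas inside the quoted tenth field one at a time, then the quotes are stripped:
sed -r ':a; s/^(([^,]+,){9}"[^",]*),/\1/; ta; s/^(([^,]+,){9})"([^"]*)"/\1\3/' file.csv > newfile.csv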
In terms of what's wrong with your original script, the second argument to grep is the name of the file to grep, not the string to grep. You can use a here string (in Bash) or pipe the string to grep, but again, this is not how you do it properly.
grep -o '"[^"]*"' <<<"$line"
or
printf '%s' "$line" | grep -o '"[^"]*"'
Notice also the quotes -- omitting quotes is a common newbie error; you can get away with it for a while, and then it bites you.
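Putting that together, a corrected sketch of your loop (writing all cleaned prices to one output file instead of overwriting it on every iteration):
while read -r line; do
    clean_price=$(grep -o '"[^"]*"' <<<"$line" | tr -d '",')
    echo "$clean_price"
done </tmp/wowmom > cleanprices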

A pure Bash solution:
while IFS=\" read -r l n r; do
printf '%s\n' "$l${n//,/}$r"
done < input_file.txt
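Run against the sample data (assuming it is saved as input_file.txt), this prints, for instance:
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,75000,19.50,bieber,20160506,0,,N,E,,,,,,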

If you're looking for perl:
#!perl
use strict;
use warnings;
use Text::CSV;
use autodie;
my $csv = Text::CSV->new({binary=>1, eol=>"\n"});
my $filename = shift @ARGV;
open my $fh, "<", $filename;
while (my $row = $csv->getline($fh)) {
    $row->[9] =~ s/,//g;
    $csv->print(*STDOUT, $row);
}
close $fh;
demo:
$ perl csv.pl file
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,5000,19.50,justin,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,75000,19.50,bieber,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,100000,19.50,selena,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,5500,19.50,gomez,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,50000,19.50,gomez,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,350000,19.50,bieber,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,50000,19.50,bieber,20160506,0,,N,E,,,,,,

Related

How to get a number with a variable number of digits from a string in a file using a bash script?

I have the following file:
APP_VERSION.ts
export const APP_VERSION = 1;
This is the only content of that file, and the APP_VERSION variable will be incremented as needed.
So, the APP_VERSION could be a single digit number or multiple digit number, like 15 or 999, etc.
I need to use that value in one of my bash scripts.
use-app-version.sh
APP_VERSION=`cat src/constants/APP_VERSION.ts`
echo $APP_VERSION
I know I can read it with cat. But how can I parse that string to get exactly the APP_VERSION value, whether it's 1 or 999?
sed -En 's/(^.*APP_VERSION[^[:digit:]]*)([[:digit:]]+)(;.*$)/\2/p' src/constants/APP_VERSION.ts
Using sed, split the line into three capture groups: everything up to the version number, the digits themselves, and the trailing ';'. Substitute the whole line with the second group (the version value) and print.
You may use this awk:
app_ver=$(awk -F '[[:blank:];=]+' '$(NF-2) == "APP_VERSION" {print $(NF-1)}' src/constants/APP_VERSION.ts)
echo "$app_ver"
1
You can concatenate some commands to remove everything else:
APP_VERSION=`cat src/constants/APP_VERSION.ts | awk -F '=' '{print $2}' | tr -d ' ' | tr -d ';'`
1 - cat prints the whole file content
2 - awk keeps everything after the '='
3 - tr removes the spaces
4 - tr removes the ';'
A simple
APP_VERSION=$(grep --text -Eo '[0-9]+' src/constants/APP_VERSION.ts)
should be enough
With bash only:
APP_VERSION=$(cat src/constants/APP_VERSION.ts)
APP_VERSION=${APP_VERSION%;}
APP_VERSION=${APP_VERSION/*= }
Line 2 removes the trailing ';'; line 3 removes everything up to and including "= ".
Alternatively, you could read the line into an array, take the 5th element, and remove the trailing ';', as sketched below.
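A sketch of that array variant (assuming the file contains exactly that one line):
read -r -a words < src/constants/APP_VERSION.ts
APP_VERSION=${words[4]%;}    # 5th word is "1;"; strip the trailing ';'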
Or, another solution, using IFS:
IFS='=;' read a APP_VERSION < src/constants/APP_VERSION.ts
In this version, a space will remain before the version number.
Assuming that the task can be rephrased to "extract the digits from a file", there are a few options:
Delete all characters that aren't digits with tr:
version=$(tr -cd '[:digit:]' < infile)
Use grep to match all digits and retain nothing but the match:
version=$(grep -Eo '[[:digit:]]+' infile)
Read file into string and delete all non-digits with just Bash:
contents=$(< infile)
version=${contents//[![:digit:]]}
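All three approaches yield the same result on a sample file (hypothetical content shown):
$ cat infile
export const APP_VERSION = 42;
$ tr -cd '[:digit:]' < infile
42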

convert a file's content using a shell script

Hello everyone, I'm a beginner in shell coding. On a daily basis I need to convert a file's data to another format; I usually do it manually with a text editor, but I often make mistakes, so I decided to write an easy script that can do the work for me.
The file's content like this
/release201209
a1,a2,"a3",a4,a5
b1,b2,"b3",b4,b5
c1,c2,"c3",c4,c5
to this:
a2>a3
b2>b3
c2>c3
The script should ignore the first line and print the second and third values separated by '>'.
I'm halfway there; here is my code:
#!/bin/bash
# while loop
i=1
while IFS=\" read t1 t2 t3
do
    test $i -eq 1 && ((i=i+1)) && continue
    echo $t1|cut -d\, -f2 | { tr -d '\n'; echo \>$t2; }
done < $1
The problem with my code is that the last line isn't printed unless the file ends with a newline.
Also, I want the output to go into a new CSV file (I tried redirecting standard output to my new file, but only the last echo is printed there).
Can someone please help me out? Thanks in advance.
Rather than treating the double quotes as a field separator, it seems cleaner to just delete them (assuming that is valid). Eg:
$ < input tr -d '"' | awk 'NR>1{print $2,$3}' FS=, OFS=\>
a2>a3
b2>b3
c2>c3
If you cannot just strip the quotes as in your sample input but those quotes are escaping commas, you could hack together a solution but you would be better off using a proper CSV parsing tool. (eg perl's Text::CSV)
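For reference, a minimal sketch of that Text::CSV approach (file and script names are placeholders; it reads the file given on the command line and skips the first line):
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new({ binary => 1 });
open my $fh, "<", shift @ARGV or die $!;
<$fh>;    # skip the /release201209 header line
while (my $row = $csv->getline($fh)) {
    print "$row->[1]>$row->[2]\n";
}
close $fh;
You could then redirect the output into your new CSV file, e.g. perl convert.pl input.txt > new.csv.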
Here's a simple pipeline that will do the trick:
sed '1d' data.txt | cut -d, -f2-3 | tr -d '"' | tr ',' '>'
Here, we're just removing the first line (as desired), selecting fields 2 & 3 (based on a comma field separator), removing the double quotes and mapping the remaining , to >.
Use this Perl one-liner:
perl -F',' -lane 'next if $. == 1; print join ">", map { tr/"//d; $_ } @F[1,2]' in_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
-F',' : Split into @F on comma, rather than on whitespace.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches

Is there a way to format the width of a substring within a string in a bash/sh script?

I have to format the width of a substring within a string using a bash script, but without using tokens or loops. Each single character between two colons should have a 0 prepended so that every field matches the standard width of 2.
For e.g
from:
6:0:36:35:30:30:72:6c:73:0:c:52:4c:30:31:30:31:30:30:30:31:36:39:0:1:3
to
06:00:36:35:30:30:72:6c:73:00:0c:52:4c:30:31:30:31:30:30:30:31:36:39:00:01:03
How can I do this?
sed -r 's/\<([0-9a-f])\>/0\1/g'
Search and replace with a regex. Use \< and \> to match word boundaries so [0-9a-f] only matches single hex digits.
$ sed -r 's/\<([0-9a-f])\>/0\1/g' <<< "6:0:36:35:30:30:72:6c:73:0:c:52:4c:30:31:30:31:30:30:30:31:36:39:0:1:3"
06:00:36:35:30:30:72:6c:73:00:0c:52:4c:30:31:30:31:30:30:30:31:36:39:00:01:03
awk -F: -v OFS=: '{for(i=1;i<=NF;i++) if(length($i)==1)gsub($i,"0&",$i)}1' file
Output:
06:00:36:35:30:30:72:6c:73:00:0c:52:4c:30:31:30:31:30:30:30:31:36:39:00:01:03
This divides the whole line into fields separated by ':'; if the length of a field is 1, it replaces that field with a 0 prepended.
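A slightly more direct variant of the same idea, assigning the padded value instead of substituting:
awk -F: -v OFS=: '{for(i=1;i<=NF;i++) if(length($i)==1) $i="0"$i} 1' file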
Bash solution:
IFS=:; for i in $string; do echo -n 0$i: | tail -c 3; done
With
str="06:00:36:35:30:30:72:6c:73:00:0c:52:4c:30:31:30:31:30:30:30:31:36:39:00:01:03"
you can add a '0' to all tokens and remove those that are unwanted:
sed -r 's/0([0-9a-f]{2})/\1/g' <<< "0${str//:/:0}"
That doesn't feel right, making errors and repairing them.
A better alternative is
echo $(IFS=:; printf "%2s:" ${str} | tr " " "0")
Here printf reuses its format string for each argument, so every colon-separated token is right-aligned to a width of 2 with spaces, which tr then turns into zeros. (Note the result keeps a trailing colon.)

how to iterate over a file by keyword in bash

In some file there is content like this:
scenario1{
user_range:="1..100"
ip_low:="192.168.1.1"
ip_high:=192.168.1.100
...
}
scenario2{
user_range:="101..200"
ip_low:="192.168.2.1"
ip_high:=192.168.2.100"
...
}
...
I want to replace some values using sed -i, but I can't figure out how to iterate over the keyword "scenario" in order to change the user_ranges and IPs for the whole file.
awk to the rescue!
$ awk -v RS='\n}' 'BEGIN{OFS="\n"}
{from=250*c+1; to=250*(++c);
sub(/:=.*/,":=\""from".."to"\"",$2)}
{print $0 RT}' file
scenario1{
user_range:="1..250"
ip_low:="192.168.1.1"
ip_high:=192.168.1.100
...
}
scenario2{
user_range:="251..500"
ip_low:="192.168.2.1"
ip_high:=192.168.2.100"
...
}
IP addresses can be done similarly if there is a regular pattern. (This relies on gawk: the multi-character RS='\n}' makes each scenario block one record, and RT holds the matched record terminator so it can be printed back.)
If you insist on using sed, you may find it easier to convert your file to a CSV-like format first:
tr '\n' ',' <testfile | tr '}' '\n' | tr -d '{' | sed 's/^,*//;s/,*$//' >csvfile
Since this results in one scenario per line, it will be much easier to use sed; see the sketch below.
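For instance, a hypothetical substitution on the flattened file, moving every ip_low to another subnet:
sed -i 's/ip_low:="192\.168\./ip_low:="10.10./' csvfile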
It is quite easy with plain bash to separate the values. I assume that the order of the key-value pairs and the number of lines per stanza stay the same (just for demonstration purposes):
while read line
do
    scenario=${line//\{/}
    read line; user_range=${line}
    read line; ip_low=${line}
    read line; ip_high=${line}
    read line; endchar=${line}
    # here you can insert every piece of code you need
    # to change your variables
    cat <<-EOF
$scenario{
$user_range
$ip_low
$ip_high
}
EOF
done <file_like_your_example >new_file

Bash command to extract characters in a string

I want to write a small script to generate the location of a file in an NGINX cache directory.
The format of the path is:
/path/to/nginx/cache/d8/40/32/13febd65d65112badd0aa90a15d84032
Note how the last 6 characters (d8 40 32) are represented in the path.
As an input I give the md5 hash (13febd65d65112badd0aa90a15d84032) and I want to generate the output: d8/40/32/13febd65d65112badd0aa90a15d84032
I'm sure sed or awk will be handy, but I don't know yet how...
This awk can do it:
awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}'
Explanation
BEGIN{FS=""; OFS="/"}. FS="" sets the input field separator to be "", so that every char will be a different field. OFS="/" sets the output field separator as /, for print matters.
print ... $(NF-1)$NF, $0 prints the penultimate field and the last one all together; then, the whole string. The comma is "filled" with the OFS, which is /.
Test
$ awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}' <<< "13febd65d65112badd0aa90a15d84032"
d8/40/32/13febd65d65112badd0aa90a15d84032
Or with a file:
$ cat a
13febd65d65112badd0aa90a15d84032
13febd65d65112badd0aa90a15f1f2f3
$ awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}' a
d8/40/32/13febd65d65112badd0aa90a15d84032
f1/f2/f3/13febd65d65112badd0aa90a15f1f2f3
With sed:
echo '13febd65d65112badd0aa90a15d84032' | \
sed -n 's/\(.*\([0-9a-f]\{2\}\)\([0-9a-f]\{2\}\)\([0-9a-f]\{2\}\)\)$/\2\/\3\/\4\/\1/p;'
Having GNU sed you can even simplify the pattern using the -r option. Now you won't need to escape {} and () any more. Using ~ as the regex delimiter allows to use the path separator / without need to escape it:
sed -nr 's~(.*([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2}))$~\2/\3/\4/\1~p;'
Output:
d8/40/32/13febd65d65112badd0aa90a15d84032
Put simply: the pattern captures the whole hash as group 1 and its last three byte pairs as groups 2, 3 and 4; the substitution then rewrites the line as
\2/\3/\4/\1
You can use a regular expression to separate each of the last 3 bytes from the rest of the hash.
hash=13febd65d65112badd0aa90a15d84032
[[ $hash =~ (..)(..)(..)$ ]]
new_path="/path/to/nginx/cache/${BASH_REMATCH[1]}/${BASH_REMATCH[2]}/${BASH_REMATCH[3]}/$hash"
Base="/path/to/nginx/cache/"
echo '13febd65d65112badd0aa90a15d84032' | \
sed "s|\(.*\(..\)\(..\)\(..\)\)|${Base}\2/\3/\4/\1|"
# or
# sed "s|.*\(..\)\(..\)\(..\)$|${Base}\1/\2/\3/&|"
Assuming the input is a correct MD5 string (and nothing else). Note that both variants need double quotes around the sed program so that ${Base} expands.
First of all - thanks to all of the responders - this was extremely quick!
I also did some scripting of my own in the meantime, and came up with this solution.
Run this script with the URL you're looking for as a parameter (www.example.com/article/76232?q=hello, for example):
#!/bin/bash
path=$1
md5=$(echo -n "$path" | md5sum | cut -f1 -d' ')
p3=${md5:0-2:2}    # last two hex chars
p2=${md5:0-4:2}    # the two before those
p1=${md5:0-6:2}    # the two before those
echo "/path/to/nginx/cache/$p1/$p2/$p3/$md5"
This assumes the NGINX cache has a key structure of 2:2:2.
