How to read a file using delimiter and write in a different file line by line based on delimiter using shell script

I have a requirement: say a file foo.txt contains key-value pairs like below:
vi foo.txt
a^1.0^b^2^cc^30^d^4^e^55^fg^67.0^h^8^i^99
and so on.
I have to write a shell script that puts each key-value pair on its own line in a different file, say goo.txt, which should contain:
a^1.0^
b^2^
cc^30^
d^4^
e^55^
fg^67.0^
h^8^
i^99^
I have a similar Perl script for it, but I want a shell script. The Perl script is:
#! /usr/bin/perl -w
#
use strict;
use Getopt::Long;
my $filename = "";
my $Tags = "";
GetOptions ('file=s' => \$filename);
if (defined $filename and $filename ne "")
{
open (my $DATA,$filename) or die $!;
$Tags = <$DATA>;
while ($Tags =~ m/(.*?\^.*?\^)/g)
{
print "$1\n";
}
close($DATA)
}
I want equivalent code as a shell script. I need help writing a shell script (ksh) for this requirement.

awk -F'^' '{for (i = 1; i < NF; i += 2) print $i "^" $(i+1) "^"}' foo.txt

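You want an extra newline after every second ^.
A field contains no ^; a character other than ^ can be written as [^^].
Every two fields (any number of characters other than ^, followed by a ^, twice) should be replaced by the match followed by a newline:
sed 's/[^^]*^[^^]*^/&\n/g' foo.txt
Note that \n in the replacement is supported by GNU sed but not specified by POSIX. Also, since the input has no ^ after the last value, the final line comes out as i^99 without a trailing ^.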
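If you would rather stay inside the shell without awk or sed, here is a minimal bash sketch. It is bash rather than ksh because it relies on `read -a` (ksh93 spells the array option `read -A` and uses `typeset` in place of `local`). It assumes keys and values never contain `^`, and the function name `split_pairs` is purely illustrative.

```bash
#!/usr/bin/env bash
# split_pairs (hypothetical name): read '^'-delimited key^value lines on stdin
# and print each key^value pair on its own line, terminated by '^'.
split_pairs() {
  local line i
  local -a fields
  # "|| [[ -n $line ]]" also handles a last line without a trailing newline
  while IFS= read -r line || [[ -n $line ]]; do
    IFS='^' read -r -a fields <<< "$line"   # a^1.0^b^2 -> (a 1.0 b 2)
    for (( i = 0; i + 1 < ${#fields[@]}; i += 2 )); do
      printf '%s^%s^\n' "${fields[i]}" "${fields[i+1]}"
    done
  done
}
```

Usage: `split_pairs < foo.txt > goo.txt`. Unlike the sed version, this appends the trailing ^ to the last pair as well.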

Related

convert a file content using shell script

Hello everyone, I'm a beginner in shell coding. On a daily basis I need to convert a file's data to another format. I usually do it manually in a text editor, but I often make mistakes, so I decided to write a simple script that can do the work for me.
The file's content looks like this:
/release201209
a1,a2,"a3",a4,a5
b1,b2,"b3",b4,b5
c1,c2,"c3",c4,c5
to this:
a2>a3
b2>b3
c2>c3
The script should ignore the first line and print the second and third values separated by '>'.
I'm halfway there, and here is my code:
#!/bin/bash
#while Loops
i=1
while IFS=\" read t1 t2 t3
do
test $i -eq 1 && ((i=i+1)) && continue
echo $t1|cut -d\, -f2 | { tr -d '\n'; echo \>$t2; }
done < $1
The problem with my code is that the last line isn't printed unless the file ends with a newline (\n).
I also want the output to be written to a new CSV file (I tried redirecting standard output to the new file, but only the last echo ends up there).
Can someone please help me out? Thanks in advance.
Rather than treating the double quotes as a field separator, it seems cleaner to just delete them (assuming that is valid). E.g.:
$ < input tr -d '"' | awk 'NR>1{print $2,$3}' FS=, OFS=\>
a2>a3
b2>b3
c2>c3
If you cannot just strip the quotes as in your sample input, because those quotes are escaping commas, you could hack together a solution, but you would be better off using a proper CSV parsing tool (e.g. Perl's Text::CSV).
Here's a simple pipeline that will do the trick:
sed '1d' data.txt | cut -d, -f2-3 | tr -d '"' | tr ',' '>'
Here, we're just removing the first line (as desired), selecting fields 2 & 3 (based on a comma field separator), removing the double quotes and mapping the remaining , to >.
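For reference, the asker's own read-loop approach can also be fixed in pure bash. This is a sketch, and the function name `convert` is hypothetical. `read` returns non-zero on a final line that lacks a newline even though it has filled the variable, so `|| [[ -n $line ]]` keeps that line; and redirecting the whole function call, rather than a single `echo` inside the loop, sends every line to the new file.

```bash
#!/usr/bin/env bash
# convert (hypothetical name): skip the header line, then print "field2>field3"
# for each comma-separated line, with double quotes removed from field 3.
convert() {
  local line f1 f2 f3 rest
  IFS= read -r line || return 0          # discard the header (e.g. /release201209)
  while IFS= read -r line || [[ -n $line ]]; do
    IFS=, read -r f1 f2 f3 rest <<< "$line"
    f3=${f3//\"/}                        # strip the quotes around the third value
    printf '%s>%s\n' "$f2" "$f3"
  done
}
```

Usage: `convert < input.txt > output.csv`. Like the tr-based answers, this assumes the quoted value never contains a comma.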
Use this Perl one-liner:
perl -F',' -lane 'next if $. == 1; print join ">", map { tr/"//d; $_ } @F[1,2]' in_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
-F',' : Split into @F on comma, rather than on whitespace.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches

bash script to modify and extract information

I am creating a bash script to modify and summarize information with grep and sed, but it gets stuck.
#!/bin/bash
# This script extracts some basic information
# from text files and prints it to screen.
#
# Usage: ./myscript.sh </path/to/text-file>
#Extract lines starting with ">#HWI"
ONLY=`grep -v ^\>#HWI`
#replaces A and G with R in lines
ONLYR=`sed -e s/A/R/g -e s/G/R/g $ONLY`
grep R $ONLYR | wc -l
The correct way to write a shell script to do what you seem to be trying to do is:
awk '
!/^>#HWI/ {
gsub(/[AG]/,"R")
if (/R/) {
++cnt
}
}
END { print cnt+0 }
' "$@"
Just put that in the file myscript.sh and execute it as you do today.
To be clear - the bulk of the above code is an awk script, the shell script part is the first and last lines where the shell just calls awk and passes it the input file names.
If you WANT to have intermediate variables then you can create/print them with:
awk '
!/^>#HWI/ {
only = $0
onlyR = only
gsub(/[AG]/,"R",onlyR)
print "only:", only
print "onlyR:", onlyR
if (onlyR ~ /R/) {
++cnt
}
}
END { print cnt+0 }
' "$@"
The above will work robustly, portably, and efficiently on all UNIX systems.
First of all, as @fedorqui commented, you're not providing grep with any input source against which to match lines.
Second, there are some problems in your script, which will result in unwanted behavior in the future, when you decide to manipulate some data:
Store matching lines in an array, or a file from which you'll later read values. The variable ONLY is not the right data structure for the task.
By convention, environment variables (PATH, EDITOR, SHELL, ...) and internal shell variables (BASH_VERSION, RANDOM, ...) are fully capitalized. All other variable names should be lowercase. Since variable names are case-sensitive, this convention avoids accidentally overriding environment and internal variables.
Here's a better version of your script that takes these points into account, with one open question about what you were trying to do in the last line (grep R $ONLYR | wc -l):
#!/bin/bash
# This script extracts some basic information
# from text files and prints it to screen.
#
# Usage: ./myscript.sh </path/to/text-file>
input_file=$1
# Read lines not matching the provided regex, from $input_file
mapfile -t only < <(grep -v '^>#HWI' "$input_file")
#replaces A and G with R in lines
for((i=0;i<${#only[@]};i++)); do
only[i]="${only[i]//[AG]/R}"
done
# DEBUG
printf '%s\n' "Here are the lines, after replace:"
printf '%s\n' "${only[@]}"
# I'm not sure what you were trying to do here. Am I guessing right that you wanted
# to count the number of R's in ALL lines?
# grep R $ONLYR | wc -l
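To cover both readings of that open question, here are two hypothetical helpers (the names are illustrative, not from either answer). Both take the already-replaced lines as arguments, e.g. `count_lines_with_r "${only[@]}"`; the first matches what `grep R ... | wc -l` would have counted (lines), the second counts individual R characters.

```bash
#!/usr/bin/env bash
# count_lines_with_r: how many of the given lines contain at least one R
count_lines_with_r() {
  local line n=0
  for line in "$@"; do
    [[ $line == *R* ]] && n=$((n + 1))
  done
  printf '%s\n' "$n"
}

# count_r_chars: total number of R characters across all given lines
count_r_chars() {
  local line without_r n=0
  for line in "$@"; do
    without_r=${line//R/}                  # delete every R
    n=$((n + ${#line} - ${#without_r}))    # length difference = number of R's
  done
  printf '%s\n' "$n"
}
```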

Extracting the first two characters from a file in perl into another file

I'm having a little bit of trouble with my code below -- I'm trying to figure out how to open up all these text files (.csv files that end in DIS that all have one line in them) and get the first two characters (these are all numbers) from them and print them into another file of the same name, with a ".number" suffix. Some of these .DIS files don't have anything in them, in which case I want to print "0".
Lastly, I would like to go through each original .DIS file and delete the first 3 characters -- I did this through bash.
my @DIS = <*.DIS>;
foreach my $file (@DIS){
my $name = $file;
my $output = "$name.number";
open(INHANDLE, "< $file") || die("Could not open file");
while(<INHANDLE>){
open(OUT_FILE,">$output") || die;
my $line = $_;
chomp ($line);
my $string = $line;
if ($string eq ""){
print "0";
} else {
print substr($string,0,2);
}
}
system("sed -i 's/\(.\{3\}\)//' $file");
}
When I run this code, I get a list of numbers concatenated together and empty .DIS.number files. I'm rather new to Perl, so any help would be appreciated!
When I run this code, I get a list of numbers concatenated together and empty .DIS.number files.
This is because of this line.
print substr($string,0,2);
print defaults to printing to STDOUT (i.e. the screen). You need to give it the filehandle to print to.
print OUT_FILE substr($string,0,2);
They're being concatenated because print just prints what you tell it to; it won't add newlines for you (there are some global variables that can change this, but don't mess with them). You have to add the newline yourself.
print OUT_FILE substr($string,0,2), "\n";
As a final note, when working with files in Perl I would suggest using lexical filehandles, Path::Tiny, and autodie. They will avoid a great number of classic problems working with files in Perl.
I suggest you do it like this:
Each *.DIS file is opened and its contents are read into $text. Then a regex substitution removes the first three characters from the string and captures the first two in $1.
If the substitution succeeded, the contents of $1 are written to the number file; otherwise the original file is empty (or shorter than two characters) and a zero is written instead. The remaining contents of $text are then written back to the *.DIS file.
use strict;
use warnings;
use v5.10.1;
use autodie;
for my $dis_file ( glob '*.DIS' ) {
my $text = do {
open my $fh, '<', $dis_file;
<$fh>;
};
my $num_file = "$dis_file.number";
open my $dis_fh, '>', $dis_file;
open my $num_fh, '>', $num_file;
if ( defined $text and $text =~ s/^(..).?// ) {
print $num_fh "$1\n";
print $dis_fh $text;
}
else {
print $num_fh "0\n";
print $dis_fh "-\n";
}
}
This awk script extracts the first two chars of each file into its own file. Empty files are expected to contain one empty line, based on the spec.
awk 'FNR==1{pre=substr($0,1,2);pre=length(pre)==2?pre:0; print pre > (FILENAME ".number")}' *.DIS
This will remove the first 3 chars:
cut -c 4-
A bash for loop is better for doing both, though we need to modify the awk script a little:
for f in *.DIS;
do awk 'NR==1{pre=substr($0,1,2);$0=length(pre)==2?pre:0; print}' $f > $f.number;
cut -c 4- $f > $f.cut;
done
Explanation: loop through all files matching *.DIS. For the first line of each file, try to get the first two chars (1,2) of the line ($0) and assign them to pre. If the length of pre is not two (the line is empty or has only 1 char), set the line to 0, otherwise use pre; then print the line. The output file name is the input file name with a .number suffix appended. The $0 assignment is a trick to save a couple of keystrokes, since print without arguments prints $0; otherwise you can pass the argument explicitly.
Ideally you should quote "$f", since file names may contain spaces.
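Putting that advice into practice, here is a quoted variant as a sketch (the function name `process_dis_files` is hypothetical). Unlike the awk versions, it also writes 0 for a truly empty (0-byte) file, because `read` simply leaves the variable empty there. Like the answer above, it writes the trimmed text to `$f.cut` rather than editing the file in place.

```bash
#!/usr/bin/env bash
# process_dis_files (hypothetical name): for each *.DIS file in the current
# directory, write its first two characters (or 0) to FILE.number and the
# content minus the first three characters of each line to FILE.cut.
process_dis_files() {
  local f first
  for f in *.DIS; do
    [[ -e $f ]] || continue               # no matches: the glob stays literal
    first=
    IFS= read -r first < "$f"             # first line; empty for 0-byte files
    if (( ${#first} >= 2 )); then
      printf '%s\n' "${first:0:2}" > "$f.number"
    else
      printf '0\n' > "$f.number"          # empty, or shorter than two characters
    fi
    cut -c 4- < "$f" > "$f.cut"           # drop the first three characters
  done
}
```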

Need to pick Latest File From a Dir Using Shell Script

I am new to shell scripting, and I have a requirement to pick the latest files from a directory using a shell script.
Directory Name : FTPDIR
Files in this directory look like:
APC5502015VP072020121826.csv
APC5502015VP082020122314.csv
APC5502015VP092020121451.csv
CBC5502015VP092020122045.csv
CBC5502015VP102020122045.csv
S5502015VP072020121620.csv
S5502015VP072020122314.csv
S5502015VP092020122045.csv
Note: I need to pick the latest file from each group. Below is the output I need to get after executing the shell script:
APC5502015VP092020121451.csv
CBC5502015VP102020122045.csv
S5502015VP092020122045.csv
For example, in the latest file APC5502015VP092020121451.csv, the number 092020121451 is the date part in the format MMDDYYYYHHMM, and the string part is APC5502015VP (the length of the string part is not fixed).
I need to pick those three files from the directory using a shell script.
Can you help me resolve this?
It's going to be really problematic to do this safely in just bash. As Jonathan mentioned, "special" characters like spaces or newlines may bung up your script.
If we can assume that there won't be any of those, then we can do most of job in bash, without involving other tools.
# Make an associative array to record types, in the second loop...
declare -A a
for file in *.csv; do
# First, we convert the filenames into something that can be sorted.
# The next three lines account for your "unknown length" in the first part
# of the filename. We assume the date+time is the 12 chars before ".csv".
new="$(rev <<<"$file")"
new="${new:4:12}"
new="$(rev <<<"$new")"
new="${new:4:4}${new:0:2}${new:2:2}${new:8:4}"
len=$(( ${#file} - 16 ))
echo "$new ${file:0:$len} $file"
done | sort -r | while read -r date type file; do
# Next, newest first after "sort -r", we print only the first of each "type"...
if [[ ${a[$type]} -eq 0 ]]; then
a[$type]=1
echo "$file"
fi
# And stop once we have collected three types.
if [[ ${#a[*]} -ge 3 ]]; then
break
fi
done
As I say, this doesn't handle newlines in filenames.
Note also that this uses rev and sort, which are not built in to bash. The rev parts could be done internally, using more code, which might make them execute faster, but you'd only see a difference in very extreme cases. There's not much we can do about sort, since there isn't a built-in within bash.
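As a sketch of doing this without rev, sort-order tricks, or a fixed group count: parameter expansion can take the 12 characters before `.csv` directly, and reordering MMDDYYYYHHMM into YYYYMMDDHHMM gives a key that compares chronologically. The function name `latest_per_group` is hypothetical, and it needs bash 4 or later for associative arrays.

```bash
#!/usr/bin/env bash
# latest_per_group (hypothetical name): print the newest file of each prefix group.
# Assumes names of the form <prefix><MMDDYYYYHHMM>.csv; other names are skipped.
latest_per_group() {
  local -A best_key best_file
  local f base stamp prefix key
  local re='^[0-9]{12}$'
  for f in "$@"; do
    [[ $f == *.csv ]] || continue
    base=${f%.csv}
    (( ${#base} > 12 )) || continue          # need a prefix plus a 12-digit stamp
    stamp=${base: -12}                       # MMDDYYYYHHMM
    [[ $stamp =~ $re ]] || continue
    prefix=${base:0:${#base}-12}
    key=${stamp:4:4}${stamp:0:4}${stamp:8:4} # YYYYMMDDHHMM sorts chronologically
    if [[ -z ${best_key[$prefix]} || $key > ${best_key[$prefix]} ]]; then
      best_key[$prefix]=$key
      best_file[$prefix]=$f
    fi
  done
  for prefix in "${!best_file[@]}"; do
    printf '%s\n' "${best_file[$prefix]}"
  done | sort
}
```

Usage: `cd FTPDIR && latest_per_group *.csv`. Like the answer above, it assumes file names contain no newlines.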
This Perl script works on the given data. No doubt it could be improved.
#!/usr/bin/env perl
use strict;
use warnings;
my %bases;
while (<>)
{
chomp;
my $name = $_;
my($prefix, $mmdd, $yyyy, $hhmm) = ($name =~ m/(.*)(\d{4})(\d{4})(\d{4})\.csv/);
#print "$name = $prefix $yyyy $mmdd $hhmm\n";
my $stamp = "$yyyy$mmdd$hhmm";
if (!exists($bases{$prefix}) || ($stamp > $bases{$prefix}->{stamp}))
{
$bases{$prefix} = { name => $name, stamp => $stamp };
}
}
foreach my $prefix (sort keys %bases)
{
print "$bases{$prefix}->{name}\n";
}
Output:
APC5502015VP092020121451.csv
CBC5502015VP102020122045.csv
S5502015VP092020122045.csv
This is the awk solution:
cd FTPDIR
ls -1|awk -F"VP" '{split($2,a,".");if(a[1]>b[$1]){b[$1]=$2}}END{for(i in b)print i"VP"b[i]}'
Tested below:
> cat temp
APC5502015VP072020121826.csv
APC5502015VP082020122314.csv
APC5502015VP092020121451.csv
CBC5502015VP092020122045.csv
CBC5502015VP102020122045.csv
S5502015VP072020121620.csv
S5502015VP072020122314.csv
S5502015VP092020122045.csv
> awk -F"VP" '{split($2,a,".");if(a[1]>b[$1]){b[$1]=$2}}END{for(i in b)print i"VP"b[i]}' temp
CBC5502015VP102020122045.csv
S5502015VP092020122045.csv
APC5502015VP092020121451.csv
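One caveat with that awk one-liner: it compares the stamp as written (MMDDYYYYHHMM), which orders by month before year, so it is only correct when all files come from the same year; it also assumes "VP" appears exactly once in each name. A sketch of the same one-pass idea with a year-first key (the function name `latest_per_group_awk` is hypothetical):

```bash
#!/usr/bin/env bash
# latest_per_group_awk (hypothetical name): one filename per line on stdin,
# prints the newest file of each prefix group, e.g.: ls -1 FTPDIR | latest_per_group_awk
latest_per_group_awk() {
  awk '
    # 12 digits (MMDDYYYYHHMM) right before ".csv"; spelled out because not every awk supports {12}
    match($0, /[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]\.csv$/) {
      prefix = substr($0, 1, RSTART - 1)
      stamp  = substr($0, RSTART, 12)
      key = substr(stamp, 5, 4) substr(stamp, 1, 4) substr(stamp, 9, 4)   # YYYYMMDDHHMM
      if (!(prefix in best) || key > best[prefix]) {
        best[prefix] = key
        file[prefix] = $0
      }
    }
    END { for (p in file) print file[p] }
  ' | sort
}
```

The final `sort` is there because `for (p in file)` visits keys in an unspecified order.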

Reading java .properties file from bash

I am thinking of using sed for reading a .properties file, but I was wondering if there is a smarter way to do that from a bash script?
This would probably be the easiest way: grep + cut
# Usage: get_property FILE KEY
function get_property
{
grep "^$2=" "$1" | cut -d'=' -f2-
}
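Two caveats with the grep version: the key is used as a regex, so the `.` in `some.key` matches any character, and spaces around `=` are not tolerated. A sketch that compares keys literally and accepts `=` or `:` with optional surrounding spaces (the name `get_property_literal` is hypothetical; it still ignores line continuations and escapes):

```bash
#!/usr/bin/env bash
# Usage: get_property_literal FILE KEY   (hypothetical helper)
# Prints the value of the first "KEY=value" or "KEY: value" line; returns 1 if absent.
get_property_literal() {
  local file=$1 want=$2 line
  local re='^([^=:[:space:]]+)[[:space:]]*[=:][[:space:]]*(.*)$'
  while IFS= read -r line || [[ -n $line ]]; do
    line=${line#"${line%%[![:space:]]*}"}          # strip leading whitespace
    [[ -z $line || $line == [#!]* ]] && continue     # skip blank lines and comments
    [[ $line =~ $re ]] || continue
    if [[ ${BASH_REMATCH[1]} == "$want" ]]; then     # literal, not regex, comparison
      printf '%s\n' "${BASH_REMATCH[2]}"
      return 0
    fi
  done < "$file"
  return 1
}
```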
The solutions mentioned above will work for the basics. I don't think they cover multi-line values though. Here is an awk program that will parse Java properties from stdin and produce shell environment variables to stdout:
BEGIN {
FS="=";
print "# BEGIN";
n="";
v="";
c=0; # Not a line continuation.
}
/^\#/ { # The line is a comment. Breaks line continuation.
c=0;
next;
}
/\\$/ && (c==0) && (NF>=2) { # Name value pair with a line continuation...
e=index($0,"=");
n=substr($0,1,e-1);
v=substr($0,e+1,length($0) - e - 1); # Trim off the backslash.
c=1; # Line continuation mode.
next;
}
/^[^\\]+\\$/ && (c==1) { # Line continuation. Accumulate the value.
v= "" v substr($0,1,length($0)-1);
next;
}
((c==1) || (NF>=2)) && !/^[^\\]+\\$/ { # End of line continuation, or a single line name/value pair
if (c==0) { # Single line name/value pair
e=index($0,"=");
n=substr($0,1,e-1);
v=substr($0,e+1,length($0) - e);
} else { # Line continuation mode - last line of the value.
c=0; # Turn off line continuation mode.
v= "" v $0;
}
# Make sure the name is a legal shell variable name
gsub(/[^A-Za-z0-9_]/,"_",n);
# Remove newlines from the value.
gsub(/[\n\r]/,"",v);
print n "=\"" v "\"";
n = "";
v = "";
}
END {
print "# END";
}
As you can see, multi-line values make things more complex. To see the values of the properties in shell, just source in the output:
cat myproperties.properties | awk -f readproperties.awk > temp.sh
source temp.sh
The variables will have '_' in the place of '.', so the property some.property will be some_property in shell.
If you have ANT properties files that have property interpolation (e.g. '${foo.bar}') then I recommend using Groovy with AntBuilder.
I wrote a script to solve the problem and put it on my GitHub.
See properties-parser
One option is to write a simple Java program to do it for you, then run the Java program in your script. That might seem silly if you're just reading properties from a single properties file. However, it becomes very useful when you're trying to get a configuration value from something like a Commons Configuration CompositeConfiguration backed by properties files. For a time, we went the route of implementing what we needed in our shell scripts to get the same behavior we were getting from CompositeConfiguration. Then we wised up and realized we should just let CompositeConfiguration do the work for us! I don't expect this to be a popular answer, but hopefully you find it useful.
If you want to use sed to parse any .properties file, you may end up with a quite complex solution, since the format allows line breaks, unquoted strings, unicode, etc.: http://en.wikipedia.org/wiki/.properties
One possible workaround would be using Java itself to preprocess the .properties file into something bash-friendly, then source it. E.g.:
.properties file:
line_a : "ABC"
line_b = Line\
With\
Breaks!
line_c = I'm unquoted :(
would be turned into:
line_a="ABC"
line_b=`echo -e "Line\nWith\nBreaks!"`
line_c="I'm unquoted :("
Of course, that would yield worse performance, but the implementation would be simpler/clearer.
In Perl:
while (<STDIN>) {
chomp;
my ($prop, $val) = split(/[=: ]/, $_, 2);
# and do stuff for each prop/val
}
Not tested, and should be more tolerant of leading/trailing spaces, comments etc., but you get the idea. Whether you use Perl (or another language) over sed is really dependent upon what you want to do with the properties once you've parsed them out of the file.
Note that (as highlighted in the comments) Java properties files can have multiple forms of delimiters (although I've not seen anything used in practice other than colons). Hence the split uses a choice of characters to split upon.
Ultimately, you may be better off using the Config::Properties module in Perl, which is built to solve this specific problem.
I have some shell scripts that need to look up some .properties and use them as arguments to programs I didn't write. The heart of the script is a line like this:
dbUrlFile=$(grep database.url.file etc/zocalo.conf | sed -e "s/.*: //" -e "s/#.*//")
Effectively, that's grep for the key and filter out the stuff before the colon and after any hash.
If you want to use "shell", the best tool to parse files with proper programming control is (g)awk. Use sed only for simple substitutions.
I have sometimes just sourced the properties file into the bash script. This will lead to environment variables being set in the script with the names and contents from the file. Maybe that is enough for you, too. If you have to do some "real" parsing, this is not the way to go, of course.
Hmm, I just ran into the same problem today. This is a poor man's solution, admittedly more straightforward than clever ;)
decl=`ruby -ne 'puts chomp.sub(/=(.*)/,%q{="\1";}).gsub(".","_")' my.properties`
eval $decl
then, a property 'my.java.prop' can be accessed as $my_java_prop.
This can be done with sed or whatever, but I finally went with ruby for its 'irb' which was handy for experimenting.
It's quite limited (dots should be replaced only before '=', no comment handling), but it could be a starting point.
@Daniel, I tried to source it, but Bash didn't like dots in variable names.
I have had some success with
PROPERTIES_FILE=project.properties
function source_property {
local name=$1
eval "$name=\"$(sed -n '/^'"$name"'=/,/^[A-Z]\+_*[A-Z]*=/p' $PROPERTIES_FILE|sed -e 's/^'"$name"'=//g' -e 's/"/\\"/g'|head -n -1)\""
}
source_property 'SOME_PROPERTY'
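Because Bash variable names can't contain dots, one workaround (a sketch, not taken from any answer here) is to load simple one-line entries into a bash 4 associative array, whose keys may contain dots, with no eval involved. `load_properties` and `props` are hypothetical names; continuations and escapes are not handled.

```bash
#!/usr/bin/env bash
# load_properties FILE (hypothetical): fill the associative array props
# from simple one-line "key=value" or "key: value" entries.
declare -A props
load_properties() {
  local line
  local re='^([^=:[:space:]]+)[[:space:]]*[=:][[:space:]]*(.*)$'
  props=()                                        # forget earlier contents
  while IFS= read -r line || [[ -n $line ]]; do
    line=${line#"${line%%[![:space:]]*}"}         # strip leading whitespace
    [[ -z $line || $line == [#!]* ]] && continue    # skip blank lines and comments
    [[ $line =~ $re ]] && props[${BASH_REMATCH[1]}]=${BASH_REMATCH[2]}
  done < "$1"
}
```

Usage: `load_properties app.properties; echo "${props[db.url]}"`.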
This is a solution that properly parses quotes and terminates at a space when not given quotes. It is safe: no eval is used.
I use this code in my .bashrc and .zshrc for importing variables from shell scripts:
# Usage: _getvar VARIABLE_NAME [sourcefile...]
# Echos the value that would be assigned to VARIABLE_NAME
_getvar() {
local VAR="$1"
shift
awk -v Q="'" -v QQ='"' -v VAR="$VAR" '
function loc(text) { return index($0, text) }
function unquote(d) { $0 = substr($0, eq+2) d; print substr($0, 1, loc(d)-1) }
{ sub(/^[ \t]+/, ""); eq = loc("=") }
substr($0, 1, eq-1) != VAR { next } # assignment is not for VAR: skip
loc("=" QQ) == eq { unquote(QQ); exit }
loc("=" Q) == eq { unquote( Q); exit }
{ print substr($1, eq + 1); exit }
' "$@"
}
This saves the desired variable name and then shifts the argument array so the rest can be passed as files to awk.
Because it's so hard to call shell variables and refer to quote characters inside awk, I'm defining them as awk variables on the command line. Q is a single quote (apostrophe) character, QQ is a double quote, and VAR is that first argument we saved earlier.
For further convenience, there are two helper functions. The first returns the location of the given text in the current line, and the second prints the content between the first two quotes in the line using quote character d (for "delimiter"). There's a stray d concatenated to the first substr as a safety against multi-line strings (see "Caveats" below).
While I wrote the code for POSIX shell syntax parsing, that appears to differ from your format only in whether there is white space around the assignment. You can add that functionality to the above code by adding sub(/[ \t]*=[ \t]*/, "="); before the sub(…) on awk's line 4 (note: line 1 is blank).
The fourth line strips off leading white space and saves the location of the first equals sign. Please verify that your awk supports \t as tab, this is not guaranteed on ancient UNIX systems.
The substr line compares the text before the equals sign to VAR. If that doesn't match, the line is assigning a different variable, so we skip it and move to the next line.
Now we know we've got the requested variable assignment, so it's just a matter of unraveling the quotes. We do this by searching for the first location of =" (line 6) or =' (line 7) or no quotes (line 8). Each of those lines prints the assigned value.
Caveats: If there is an escaped quote character, we'll return a value truncated to it. Detecting this is a bit nontrivial and I decided not to implement it. There's also a problem of multi-line quotes, which get truncated at the first line break (this is the purpose of the "stray d" mentioned above). Most solutions on this page suffer from these issues.
In order to let Java do the tricky parsing, here's a solution using jrunscript to print the keys and values in a bash read-friendly (key, tab character, value, null character) way:
#!/usr/bin/env bash
jrunscript -e '
p = new java.util.Properties();
p.load(java.lang.System.in);
p.forEach(function(k,v) { out.format("%s\t%s\000", k, v); });
' < /tmp/test.properties \
| while IFS=$'\t' read -d $'\0' -r key value; do
key=${key//./_}
printf -v "$key" %s "$value"
printf '=> %s = "%s"\n' "$key" "$value"
done
I found printf -v in this answer by @david-foerster.
To quote jrunscript: Warning: Nashorn engine is planned to be removed from a future JDK release
