AIX script for file information - shell

I have a file on an AIX server with multiple record entries in the format below:
Name(ABC XYZ) Gender(Male)
AGE(26) BDay(1990-12-09)
I want to extract the name and the birthday from the file for all records and list them like this:
ABC XYZ 1990-12-09
Can someone please help me with the scripting?

Something like this maybe:
awk -F"[()]" '/Name/ && /Gender/{name=$2} /BDay/{print name,$4}' file.txt
That says... "treat opening and closing parentheses as field separators. If you see a line go by that contains Name and Gender, save the second field in the variable name. If you see a line go by that contains BDay, print the most recently saved name and also the fourth field on the current line."
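To try the one-liner on the sample record from the question, here is a minimal sketch. The scratch directory and the `result` variable are only there for illustration and are not part of the original answer.

```shell
# Hypothetical scratch file holding the two-line record from the question
workdir=$(mktemp -d)
printf 'Name(ABC XYZ) Gender(Male)\nAGE(26) BDay(1990-12-09)\n' > "$workdir/file.txt"

# Split on "(" and ")": $2 is the name on the first line,
# $4 is the date on the second line
result=$(awk -F"[()]" '/Name/ && /Gender/{name=$2} /BDay/{print name,$4}' "$workdir/file.txt")
echo "$result"
```

With more records in the file, each BDay line prints one output line, paired with the name saved from the Name/Gender line before it.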

Related

Populate a value in a particular column in csv

I have a folder where there are 50 excel sheets in CSV format. I have to populate a particular value say "XYZ" in the column I of all the sheets in that folder.
I am new to unix and have looked at a couple of pages. Can anyone please provide a sample script to begin with?
For example :
Let's say column C in this case:
A B C
ASFD 2535
BDFG 64486
DFGC 336846
I want to update column C to value "XYZ".
Thanks.
I would export those files to CSV format
- with a semicolon as the field separator
- possibly leaving out the column headers (otherwise see the comment below)
Then the following combination of shell and sed could more or less do the trick:
#! /bin/sh
for i in *.csv
do
sed -i -e "s/$/;XYZ/" "$i"
done
-i means to edit the file in place, so the value is appended to every line (note that -i is not part of POSIX sed, so not every sed supports it)
-e specifies the sed expression to run, here a substitution
If the CSV files also contain column headers, you might want a similar script that changes "XYZ" to "C" in the first line only.
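A rough sketch of that header-aware variant, under some assumptions: the files are semicolon-separated as suggested above, the first line holds the column headers, and the sample file `sheet1.csv` is made up for the demonstration. It writes to a temporary file instead of using -i, so it should also work with a sed that lacks in-place editing.

```shell
# Hypothetical sample sheet: header line plus two data lines
workdir=$(mktemp -d)
printf 'A;B\nASFD;2535\nBDFG;64486\n' > "$workdir/sheet1.csv"

for f in "$workdir"/*.csv
do
    # Line 1 gets the new column name, every later line gets the value
    sed -e '1s/$/;C/' -e '2,$s/$/;XYZ/' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

result=$(cat "$workdir/sheet1.csv")
echo "$result"
```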

remove/ replace unprintable characters from txt file using shell script

I am trying to remove newline characters from within quotes in a file.
I am able to achieve that using the code below:
awk -F"\"" '!length($NF){print;next}{printf("%s ", $0)}' filename.txt>filenamenew.txt
Note that I am creating a new file, filenamenew.txt. Is this avoidable? Can I run the command in place? I ask because the files are huge.
my file is pipe delimited
sample input file
"id"|"name"
"1"|"john
doe"
"2"|"second
name
in the list"
using the above code I get the following output
"id"|"name"
"1"|"john doe"
"2"|"second name in the list"
But I have huge files, and some of the lines have a ^M character between quotes, for example:
second sample input file
"id"|"name"
"1"|"john
doe"
"^M2"|"second^M^M
name
in the list"
Output using the above code:
"id"|"name"
"1"|"john doe"
name in the list"
So basically, if there is a ^M in the line, that string is not being printed. I read online that ^M is equal to \r, so I used
tr -d '\r' < filename.txt
I also tried
awk -F"|" '{sub(/^M/,"")}1'
but it did not remove those characters (^M).
A little background on why i am doing this
I am extracting data from a relational table, loading it into a flat file, and checking whether the counts between table and file match. Since there is \n in some columns, count(*) in the table vs wc -l on the file does not match.
final resolution:
I don't want to delete these unprintable characters in the long run. I want to replace them with some character or value (so that the counts between table and file match), and then, when I load the file back into a table, replace that placeholder with the \n or ^M that was originally present, so that I don't tamper with the data.
Any suggestions are appreciated.
thanks.
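One possible way to sketch the placeholder idea, under loud assumptions: the placeholder strings `<CR>` and `<LF>` are made up and must never occur in the real data, every field is wrapped in double quotes as in the samples, and physical lines end in a plain \n. The awk script counts double quotes to decide whether a line break falls inside a quoted field. The file names are hypothetical.

```shell
workdir=$(mktemp -d)
# Second sample input from the question, with real carriage returns for ^M
printf '"id"|"name"\n"1"|"john\ndoe"\n"\r2"|"second\r\r\nname\nin the list"\n' > "$workdir/input.txt"

# Flatten: one record per line, ^M and embedded newlines become placeholders
awk -v cr='<CR>' -v lf='<LF>' '
{
    gsub(/\r/, cr)                      # carriage returns become the cr placeholder
    line = $0
    quotes += gsub(/"/, "&", line)      # running count of double quotes in this record
    record = (inside ? record lf : "") $0
    inside = (quotes % 2)               # odd count: still inside a quoted field
    if (!inside) { print record; record = ""; quotes = 0 }
}' "$workdir/input.txt" > "$workdir/flat.txt"

# Restore: turn the placeholders back into the original characters
awk '{ gsub(/<CR>/, "\r"); gsub(/<LF>/, "\n"); print }' "$workdir/flat.txt" > "$workdir/restored.txt"

wc -l < "$workdir/flat.txt"
```

On the flattened file, wc -l counts one line per record (header included), and the restore step is meant to give back the original bytes. This still writes new files; awk reads and writes streams, so a truly in-place edit is not what this sketch does.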

How to get date and string separately in a given file name using shell script

Hi, I am trying to get the date and a string separately from a given file name, but I can't work out exactly how to do it.
This is the file name "95FILRDF01PUBLI20170823XEURC0V41000.XML"
I want to extract date "20170823" and string "XEUR" from this file name.
I was going through lots of posts in Stackexchange/Stackoverflow, but didn't understand the regular expression they are using.
https://unix.stackexchange.com/questions/182563/how-to-extract-a-part-of-file-name-in-unix-linux-shell-script
Extract part of filename
To extract date and name:
$ name="95FILRDF01PUBLI20170823XEURC0V41000.XML"
$ echo "$name" | sed -E 's/.*([[:digit:]]{8})([[:alpha:]]{4}).*/date=\1 name=\2/'
date=20170823 name=XEUR
The key part of the regex is ([[:digit:]]{8})([[:alpha:]]{4}). The first part of that, ([[:digit:]]{8}) matches 8 digits and saves them as group 1. The second part of that, ([[:alpha:]]{4}) matches four letters that follow the date and saves them as group 2.
The key part is surrounded by .* before and .* after, which match whatever is left over.
The replacement text is date=\1 name=\2 which formats the output.
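If the two pieces are needed in separate shell variables rather than printed, the same regex can be reused. This is only a sketch: the variable names `parts`, `file_date` and `code` are made up, and it assumes the file name contains exactly one run of 8 digits followed by 4 letters.

```shell
name="95FILRDF01PUBLI20170823XEURC0V41000.XML"

# Same regex as above, but emit "date string" separated by one space
parts=$(printf '%s\n' "$name" | sed -E 's/.*([[:digit:]]{8})([[:alpha:]]{4}).*/\1 \2/')

file_date=${parts% *}   # everything before the space
code=${parts#* }        # everything after the space

echo "date=$file_date name=$code"
```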

Create CSV from specific columns in another CSV using shell scripting

I have a CSV file with several thousand lines, and I need to take some of the columns in that file to create another CSV file to use for import to a database.
I'm not in shape with shell scripting anymore, is there anyone who can help with pointing me in the correct direction?
I have a bash script to read the source file but when I try to print the columns I want to a new file it just doesn't work.
while IFS=, read symbol tr_ven tr_date sec_type sec_name name
do
echo "$name,$name,$symbol" >> output.csv
done < test.csv
Above is the code I have. Out of the 6 columns in the original file, I want to build a CSV with "column6, column6, column1"
The test CSV file is like this:
Symbol,Trading Venue,Trading Date,Security Type,Security Name,Company Name
AAAIF,Grey Market,22/01/2015,Fund,,Alternative Investment Trust
AAALF,Grey Market,22/01/2015,Ordinary Shares,,Aareal Bank AG
AAARF,Grey Market,22/01/2015,Ordinary Shares,,Aluar Aluminio Argentino S.A.I.C.
What am I doing wrong with my script? Or, is there an easier - and faster - way of doing this?
Edit
These are the real headers:
Symbol,US Trading Venue,Trading Date,OTC Tier,Caveat Emptor,Security Type,Security Class,Security Name,REG_SHO,Rule_3210,Country of Domicile,Company Name
I'm trying to get the last column, which is number 12, but it always comes up empty.
The snippet looks and works fine to me, maybe you have some weird characters in the file or it is coming from a DOS environment (use dos2unix to "clean" it!). Also, you can make use of read -r to prevent strange behaviours with backslashes.
But let's see how awk can solve this even faster:
awk 'BEGIN{FS=OFS=","} {print $6,$6,$1}' test.csv >> output.csv
Explanation
BEGIN{FS=OFS=","} sets the input and output field separators to the comma. Alternatively, you can say -F",", -F, or pass it as a variable with -v FS=",". OFS can be set the same way with -v OFS=",".
{print $6,$6,$1} prints the 6th field twice and then the 1st one. Note that using print, every comma-separated parameter that you give will be printed with the OFS that was previously set. Here, with a comma.
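If the file really does come from a DOS environment, each line ends in a carriage return that sticks to the last field, which can make the last column look empty or garbled. A sketch, assuming that is the cause: strip the carriage return inside awk and use $NF, so the last column is picked whether the file has 6 columns or 12. The scratch directory and sample rows are only for illustration.

```shell
workdir=$(mktemp -d)
# Two lines of the sample CSV, written with DOS (CRLF) line endings
printf 'Symbol,Trading Venue,Trading Date,Security Type,Security Name,Company Name\r\n' > "$workdir/test.csv"
printf 'AAAIF,Grey Market,22/01/2015,Fund,,Alternative Investment Trust\r\n' >> "$workdir/test.csv"

# sub() removes a trailing carriage return; changing $0 makes awk re-split the fields
awk 'BEGIN{FS=OFS=","} {sub(/\r$/, ""); print $NF, $NF, $1}' "$workdir/test.csv" > "$workdir/output.csv"

result=$(cat "$workdir/output.csv")
echo "$result"
```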

Remove all lines from a given text file based on a given list of IDs

I have a list of IDs like so:
11002
10995
48981
And a tab delimited file like so:
11002 Bacteria;
10995 Metazoa
I am trying to delete all lines in the tab delimited file containing one of the IDs from the ID list file. For some reason the following won't work and just returns the same complete tab delimited file without any line removed whatsoever:
grep -v -f ID_file.txt tabdelimited_file.txt > New_tabdelimited_file.txt
I also tried numerous other combinations with grep, but I'm drawing a blank here.
Any idea why this is failing?
Any help would be greatly appreciated
Since you tagged this with awk, here is one way of doing it:
awk 'BEGIN{FS=OFS="\t"}NR==FNR{ids[$1]++;next}!($1 in ids)' idFile tabFile > new_tabFile
BTW your grep command is correct. Just double-check that your files are not formatted for Windows (CRLF line endings).
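A sketch of that check and of the awk approach, assuming the ID file is the one with Windows line endings. The third data row (55555 Fungi) and the intermediate file `ID_file.unix.txt` are made up for the demonstration.

```shell
workdir=$(mktemp -d)
# ID list saved with CRLF line endings, tab-delimited file with plain LF
printf '11002\r\n10995\r\n48981\r\n' > "$workdir/ID_file.txt"
printf '11002\tBacteria;\n10995\tMetazoa\n55555\tFungi\n' > "$workdir/tabdelimited_file.txt"

# Strip the carriage returns so "11002\r" becomes "11002"
tr -d '\r' < "$workdir/ID_file.txt" > "$workdir/ID_file.unix.txt"

# First file: remember each ID. Second file: print only lines whose first field is not a known ID
awk 'BEGIN{FS=OFS="\t"} NR==FNR{ids[$1]; next} !($1 in ids)' \
    "$workdir/ID_file.unix.txt" "$workdir/tabdelimited_file.txt" > "$workdir/New_tabdelimited_file.txt"

result=$(cat "$workdir/New_tabdelimited_file.txt")
echo "$result"
```

Without the tr step, the patterns would carry a trailing carriage return and match nothing, which fits the symptom of getting the complete file back.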
