How to handle improper Data Coming from CSV in Informatica - informatica-powercenter

I have a source file (CSV) that I need to load into a target (Oracle), but I get this error:
FR_3065 ROW[4],Filed [Student_rollnumber]:Invalid Number:[.].The row will be skipped
CSV table:
Student_rollnumber,Studnet_Name,Marks,Subjects
10,'Revanth',70,"Maths",
11,'Satish',85,Science
12,'Anil',75,"Java
",
13,'Surya',90,"C++",
14,'Ramana',85,"python",
15,'Sudheer'70,"Informatica
",
16,'Prakash',85,"SQL"
I found that in line number 4 the closing quote and comma (",) are on the next line. How can I join the two pieces back together ("Java",) so that the value loads as a single column (Subjects)?

MatchQuotesPastEndOfLine mentioned by Koushik should work.
Alternatively, you can use sed with the pattern below to replace a newline followed by " with just ", which removes the line break in front of the closing quote of a quoted string.
sed ':a;N;$!ba;s/\n"/"/g'
Feel free to test this gist.
This, however, only removes a line break that sits directly before the closing quote; it will not help if the break is anywhere else inside the quoted string. As said, the MatchQuotesPastEndOfLine option mentioned by Koushik is the best possible solution.
Above has been based on this question.
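To see what the sed command does, here is a small sketch that runs it on a few rows from the question. It assumes GNU sed (the one-line `:a;N;$!ba` form is not portable to every sed), and the helper name join_closing_quotes is made up for this example.

```shell
#!/bin/sh
# A few rows from the question; row 12 has its closing quote on the next line.
sample=$(cat <<'EOF'
10,'Revanth',70,"Maths",
11,'Satish',85,Science
12,'Anil',75,"Java
",
13,'Surya',90,"C++",
EOF
)

# Hypothetical helper: read the whole input into the pattern space,
# then replace every newline that is directly followed by " with just ".
# Assumes GNU sed.
join_closing_quotes() {
    sed ':a;N;$!ba;s/\n"/"/g'
}

fixed=$(printf '%s\n' "$sample" | join_closing_quotes)
printf '%s\n' "$fixed"
```

Row 12 comes out as a single line ending in "Java", while a line break in the middle of a quoted value would be left untouched, which is the limitation described above.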

Related

Using awk to get lines between two patterns

I'm a newbie with awk, trying to write a bash script that uses it to print the lines between two patterns in a log file, and for the life of me I cannot make it work.
I am thinking I need to escape some of the characters.
Here's an example of the section of log I am trying to get lines from:
Processing... AP710 (/var/opt/testsys/rptprint/AP710)
sidjosajdois
sokds3488sds
doskdoskdoskdo
sodk229929
sending entire report to Job Mgr (spool) for user
I want the four lines between the "Processing..." line (first pattern) and the "sending" line (second pattern), and there is only one section of the log that has this above section with both the first pattern line and second pattern line.
I've tried using awk with the following command using a portion of the first pattern, and escaping the "/" characters as needed:
awk '/\/var\/opt\/testsys\/rptprint\/AP710/{flag=1;next}/sending entire report to Job Mgr/{flag=0}flag' log
But it gives me some other different section of the log that also happens to have the path "/var/opt/testsys/rptprint/AP710", so then I tried changing it to have more of the line (first pattern) by adding "Processing..." and it doesn't return anything....
awk '/Processing\.\.\. AP710 \(\/var\/opt\/testsys\/rptprint\/AP710/{flag=1;next}/sending entire report to Job Mgr/{flag=0}flag' log
Can someone give some guidance about awk so I can get the lines between the 2 patterns? After spending a few hours I am going a little bonkers trying to figure it out, I think my being new to awk is causing me to miss something obvious.
Cheers.
Whenever you find yourself escaping characters in a regexp to make them literal, really consider whether or not you should be using a regexp or if instead you should be doing a string comparison. In fact, always start out with a string comparison and switch to regexp if you need to.
$ awk '
$0=="sending entire report to Job Mgr (spool) for user" { inSection=0 }
inSection;
$0=="Processing... AP710 (/var/opt/testsys/rptprint/AP710)" { inSection=1 }
' file
sidjosajdois
sokds3488sds
doskdoskdoskdo
sodk229929
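If the real log lines carry extra text after the part you know (for example a user name after "for user"), an exact `$0==` comparison will not match. A possible variation, still without regexps, is to test for a literal prefix with awk's index() function. The sketch below assumes that; the function name between_markers, the variable names and the two decoy lines in the sample log are made up for illustration.

```shell
#!/bin/sh
# Sample log: the first and last lines are invented decoys, the rest is from the question.
log=$(cat <<'EOF'
Printing /var/opt/testsys/rptprint/AP710
Processing... AP710 (/var/opt/testsys/rptprint/AP710)
sidjosajdois
sokds3488sds
doskdoskdoskdo
sodk229929
sending entire report to Job Mgr (spool) for user
some later line
EOF
)

# Hypothetical helper: print the lines between a line that begins with $1
# and the next line that begins with $2, using literal string tests only.
# Note: awk -v interprets backslash escapes, so this assumes the markers
# contain no backslashes.
between_markers() {
    awk -v start="$1" -v stop="$2" '
        index($0, stop) == 1  { inSection = 0 }   # stop line: leave the section
        inSection                                  # print while inside the section
        index($0, start) == 1 { inSection = 1 }   # start line: enter the section
    '
}

section=$(printf '%s\n' "$log" |
    between_markers "Processing... AP710 (/var/opt/testsys/rptprint/AP710)" \
                    "sending entire report to Job Mgr")
printf '%s\n' "$section"
```

Because index() compares plain strings, the dots, parentheses and slashes need no escaping, and the decoy line that merely contains the path does not open the section.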

Add <br> to end of each lines in a file via bash

I am trying to add "<br>" to the end of each line in a .log file, and create a HTML file of the results.
I have tried
sed 's/$/<br><br>/' latest.log >> latest.html
After 395 lines, it cuts out. I would just make the .log file a .html file, but the line breaks don't cross over. Sorry if any of this seems weird, I'm fairly new to this.
Well, it's hard to say, because something might be wrong with your input file (for example, some unwanted whitespace characters).
But you can insert it in a million ways; the simplest one:
sed 's/.*/&<br><br>/'
Do you need it explained?
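For what it's worth, in that command `.*` matches the whole line (including an empty one) and `&` in the replacement stands for whatever was matched, so the tags land at the end of every line. A minimal sketch, with a made-up helper name add_br and made-up sample text:

```shell
#!/bin/sh
# Hypothetical helper: append <br><br> to every input line.
# .* matches the entire line; & re-inserts the matched text.
add_br() {
    sed 's/.*/&<br><br>/'
}

html=$(printf 'first line\n\nthird line\n' | add_br)
printf '%s\n' "$html"
```

When writing the result to a file, note that `>>` as used in the question appends, so running the command again adds a second copy to latest.html; `>` overwrites the file instead.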
I'll just use tags at the beginning of the first line and the ending. Thank you, Walter A.

Replacing Middle Part of String Occurring Multiple Times

I have a file, that has variations of this line multiple times:
source = "git::https://github.com/ORGNAME/REPONAME.git?ref=develop"
I am passing through a tag name in a variable. I want to find every line that starts with source and update that line in the file to be
source = "git::https://github.com/ORGNAME/REPONAME.git?ref=$TAG"
This should be doable with awk or sed, but I am having some difficulty making it work. Any help would be much appreciated!
Best,
Keren
Edit: In this scenario it says "develop", but it could also be set to "feature/test1" or "0.0.1".
Edit2: The line with "source" is also indented by three or four spaces.
This should do:
sed 's/^\([[:blank:]]*source.*[?]ref=\)[^"]*\("\)/\1'"$TAG"'\2/' file
with sed
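$ sed '/^[[:blank:]]*source/s/ref=[^"]*"$/ref='"$TAG"'"/' file
For lines starting with source (after optional indentation), this replaces the ref=..." at the end of the line with ref=$TAG". The variable sits outside the single quotes so that the shell expands it, and [^"]* accepts any current value, not just develop.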
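One thing to watch when a shell variable is spliced into a sed replacement: a tag such as feature/test1 contains the `/` delimiter, and `&` or `\` are special in the replacement text. A sketch of one way to guard against that is below; it switches the delimiter to `|` and escapes the three special characters first. The helper names escape_replacement and set_ref are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: escape \, & and the | delimiter so the value can be
# used safely as the replacement part of s|...|...|.
escape_replacement() {
    printf '%s' "$1" | sed 's/[\\&|]/\\&/g'
}

# Hypothetical helper: set the ref= value on (possibly indented) source lines.
set_ref() {
    esc=$(escape_replacement "$1")
    sed 's|^\([[:blank:]]*source.*[?]ref=\)[^"]*"|\1'"$esc"'"|'
}

TAG='feature/test1'
line='    source = "git::https://github.com/ORGNAME/REPONAME.git?ref=develop"'
updated=$(printf '%s\n' "$line" | set_ref "$TAG")
printf '%s\n' "$updated"
```

To change the file itself rather than print the result, redirect the output to a temporary file and move it into place, or use your sed's in-place option if it has one.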

Using sed to find and replace recursively

I am using a Chef recipe to update a configuration file on my node. The contents of the file look something like this:
server server1.domain.com
server server2.domain.com
I have a ruby array defined in my attribute file as follows:
default['servers'] = %w(xyz.domain.com abc.domain.com)
I want to use sed recursively to replace the server values in the file, such that my file is updated as such:
server xyz.domain.com
server abc.domain.com
I tried the following ruby loop in my recipe:
(node['servers']).each_with_index do |ntserver,index|
  bash "server set" do
    code <<-EOH
      sed -i 's|server .*|server #{node['servers'].at(index)}|' /etc/ntp.conf
    EOH
  end
end
But after chef-client has run and the changes have been applied, the contents of the configuration file are as follows:
server abc.domain.com
server abc.domain.com
I am new to the sed command, so I can't figure out where I'm going wrong.
Any help will be appreciated.
By design you should not modify files with Chef. Instead you overwrite the whole file with cookbook_file resource or, if you need to insert some dynamic values into the file, with template resource.
The sed command (the way you use it) is quite simple; it only performs (inplace in the given file due to the -i option) a substitution of each string matching the pattern server .* by the string server #{node['servers'].at(index)}. It does this throughout the whole file, so each loop changes all occurrences in the whole file.
Note that the pattern server .* (meaning server, followed by a space, and any amount of other characters .*) matches every line of your file, because each line consists of the keyword server, a space, and a host name. Each iteration therefore rewrites both lines, and after the last iteration both of them read server abc.domain.com, which is exactly the observed phenomenon.
So, to change only one line at a time, you should use the loop index in the search pattern, so that it is server server1.* for the first iteration, server server2.* for the second and so on. Then each iteration will change exactly one line and you should get your required result. This relies on the existing host names being numbered like that.
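The effect can be reproduced outside Chef. The sketch below first runs the two substitutions from the question one after the other, then shows a different technique, a single awk pass that gives the n-th server line the n-th host name. It is only an illustration of the sed/awk behaviour (inside Chef, the template resource mentioned above remains the cleaner route), and the helper name set_servers is made up.

```shell
#!/bin/sh
conf=$(cat <<'EOF'
server server1.domain.com
server server2.domain.com
EOF
)

# What the loop in the question effectively does: each substitution
# rewrites every "server ..." line, so the last host name wins.
naive=$(printf '%s\n' "$conf" |
    sed 's|server .*|server xyz.domain.com|' |
    sed 's|server .*|server abc.domain.com|')

# Hypothetical helper: replace the n-th "server " line with the n-th
# host name from a space-separated list, in one pass.
set_servers() {
    awk -v list="$1" '
        BEGIN { n = split(list, host, " ") }
        /^server / {
            i++                                   # count server lines
            if (i <= n) $0 = "server " host[i]    # extra server lines stay as they are
        }
        { print }
    '
}

new_conf=$(printf '%s\n' "$conf" | set_servers "xyz.domain.com abc.domain.com")
printf '%s\n' "$new_conf"
```

Unlike the numbered-pattern approach, this does not depend on what the existing host names look like, so running it a second time gives the same result.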

Inserting new line when joining files in VBScript

I have two text files that I want to combine, and I am using the code below to do that. The issue is that at the start of the second file this code inserts some weird characters, like spaces. Is there a way to insert a new line without using WriteLine?
Set txsOutput = FSO.CreateTextFile(strOutputPath)
Set txsInput = FSO.OpenTextFile(strInputPath,1)
txsOutput.Writeline txsInput.ReadAll
Thanks
.ReadAll() reads the trailing EOL(s) of the file. .Writeline will add a further EOL. Use .Write instead to get an exact copy of the first input file as the head of the output file.
If the "weird characters like spaces" are - unwanted - parts of the first file, you'll have to use string ops (Instr, Left, Replace, ...) or a RegExp to clean the data.
If they come from the second file (assuming you used .ReadAll for that too), you should check the encoding of that file and/or clean the data using the methods above.
