How to remove invisible newline characters in Perl - shell

I am writing a Perl script (run from the shell) that takes values from two databases and compares them. When the script finishes, it outputs a report that is supposed to be formatted this way:
Table Name Date value from Database 1 value from Database 2 Difference
The output is printed to a report file, but even when it is printed to the command console it looks like this:
tablename 2017-06-20 7629628
7629628
0
Here's my code that makes the string then outputs it to the file:
$outputstring="$tablelist[0] $DATErowcount0[$output_iteration] $rowcount0[$output_iteration] $TDrowcount0[$output_iteration] $count_dif\n";
print FILE $outputstring;
There seems to be a newline character hidden after $rowcount0[$output_iteration] and before $count_dif. What do I need to do to fix this and print it all on one line?
The arrays are filled with values read from files created by SQL commands.
Here's some of the code:
$num_from_TDfile = substr $r2, 16;
$date_from_TDfile = substr $r2, 0, 12;
$TDrowcount0[$TDnum_rows0] = $num_from_TDfile;
$DATETDrowcount0[$TDnum_rows0] = $date_from_TDfile;
$TDnum_rows0 = $TDnum_rows0 + 1;

Adding chomp to each of the strings read from the files, as suggested by tadman, fixed the output so that it all appeared on one line rather than on three lines as in the question's example.
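For reference, a minimal sketch of that fix applied to the question's own variables (lines read from the SQL output files carry a trailing newline, which chomp removes):
$num_from_TDfile = substr $r2, 16;
$date_from_TDfile = substr $r2, 0, 12;
chomp($num_from_TDfile);    # strip the trailing newline read in from the file
chomp($date_from_TDfile);
$TDrowcount0[$TDnum_rows0] = $num_from_TDfile;
$DATETDrowcount0[$TDnum_rows0] = $date_from_TDfile;
$TDnum_rows0 = $TDnum_rows0 + 1;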

Related

Error in using datamash in Bash script on txt file with multiple rows and columns

I have a .txt file containing ~1 million rows and 11 columns. Only the first column has alphabetical values; columns 2-11 have numeric values. An example is given below:
[Image of the input table]
I used datamash to group by column 1, taking the mean of each value in columns 2 to 11 column-wise. So the result must look like this:
[Image of the output table]
I tried this code in bash script:
datamash --sort --whitespace --headers groupby 1 mean 2-11 < infilename.txt > outfilename.txt
It is giving me this error:
'atamash: invalid numeric value in line 2 field 11: '3.510344247
I have checked multiple times that there is no single quotation mark in my data at line 2, field 11. I even removed all the special characters, kept only the alphanumeric values with whitespace as the delimiter, and ran the same command again, and I get the same error. It only targets column 11, because when I run the above command on columns 2-10 it gives me the correct result.
What am I doing wrong? Any help would be appreciated.
I searched for the same error on the internet and found that the error (atamash: invalid numeric value in ...) occurs because the DOS .txt file has to be converted into a Unix file. So, using the dos2unix command, I converted the file and ran the above code. It worked! So I am posting this answer for future reference.
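In other words, the fix amounts to the following. The DOS file ends each line with CR LF; the stray carriage return attaches itself to the last field on each line (which is why only column 11 fails), and datamash rejects that field as non-numeric:
dos2unix infilename.txt
datamash --sort --whitespace --headers groupby 1 mean 2-11 < infilename.txt > outfilename.txt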

Slight error when using awk to remove spaces from a CSV column

I have used the following awk command in my bash script to delete spaces in the 26th column of my CSV:
awk 'BEGIN{FS=OFS="|"} {gsub(/ /,"",$26)}1' original.csv > final.csv
Out of 400 rows, there are about 5 random rows that this doesn't work on, even if I rerun the script on final.csv. Can anyone suggest a way to take care of this? Thank you in advance.
EDIT: Here is a sample of the 26th column in original.csv vs. final.csv, respectively:
2212026837 2212026837
2256 41688 6 2256416886
2076113566 2076113566
2009 84517 7 2009845177
2067950476 2067950476
2057 90531 5 2057 90531 5
2085271676 2085271676
2095183426 2095183426
2347366235 2347366235
2200160434 2200160434
2229359595 2229359595
2045373466 2045373466
2053849895 2053849895
2300 81552 3 2300 81552 3
I see two possibilities.
The simplest is that you have some whitespace other than a space. You can fix that by using a more general regex in your gsub: instead of / /, use /[[:space:]]/.
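Applied to the command from the question, that would look like:
awk 'BEGIN{FS=OFS="|"} {gsub(/[[:space:]]/,"",$26)}1' original.csv > final.csv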
If that solves your problem, great! You got lucky, move on. :)
The other possible problem is trickier. The CSV (or, in this case, pipe-SV) format is not as simple as it appears, since you can have quoted delimiters inside fields. This, for instance, is a perfectly valid 4-field line in a pipe-delimited file:
field 1|"field 2 contains some |pipe| characters"|field 3|field 4
If the first 4 fields on a line in your file looked like that, your gsub on $26 would actually operate on $24 instead, leaving $26 alone. If you have data like that, the only real solution is to use a scripting language with an actual CSV parsing library. Perl has Text::CSV, but it's not installed by default; Python's csv module is, so you could use a program like so:
import csv, fileinput as fi, re

for row in csv.reader(fi.input(), delimiter='|'):
    row[25] = re.sub(r'\s+', '', row[25])  # fields start at 0 instead of 1
    print('|'.join(row))
Save the above in a file like colfixer.py and run it with python colfixer.py original.csv > final.csv.
(If you tried hard enough, you could get that shoved into a -c option string and run it from the command line without creating a script file, but Python's not really built for that and it gets ugly fast.)
You can use the string function split and iterate over the resulting array to reassign the 26th field:
awk 'BEGIN{FS=OFS="|"} {
    n = split($26, a, /[[:space:]]+/)
    $26 = a[1]
    for (i = 2; i <= n; i++)
        $26 = $26 "" a[i]
}1' original.csv > final.csv

Delete lines in a file based on first row

I am trying to work on a whole series of .txt files (actually .out files, but they behave like space-delimited .txt files). I want to delete certain lines in the text based on their value in a column named in the first row.
So for example:
ID VAR1 VAR2
1 8 9
2 4 1
3 3 2
I want to delete all the lines with VAR1 < 0.5.
I found a way to do this manually in Excel, but with 350+ files this is going to be a long night; there are surely more effective ways to do this. I have already worked on this set of files in the terminal (OS X).
This is a typical job for awk, the venerable language for file manipulation.
What awk does is match each line in a file against a condition and perform an action for it. It also allows for easy elementary parsing of line columns. In this case, you want to test whether the second column is less than 0.5 and, if so, not print that line; otherwise, print the line (in effect, this removes the lines for which the variable is less than 0.5).
Your variable is in column 2, which in awk is referred to as $2. Each full line is referred to by the variable $0.
So you would do something like this:
{
    if ($2 < 0.5) {
        # do nothing: the line is skipped
    } else {
        print $0
    }
}
Or something like that; I haven't used awk for a while. The above code is an awk script. Apply it to your file and redirect the output to a new file (which will have all the lines with a second column below 0.5 removed).
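As a sketch of how to apply this to all 350+ files in one go (assuming they all end in .out; the output suffix here is just an example). The NR == 1 test keeps the header row, which a plain numeric test would otherwise drop, since awk treats the header text as 0 in a numeric comparison:
# keep the header line, drop rows whose 2nd column is below 0.5
for f in *.out; do
    awk 'NR == 1 || $2 >= 0.5' "$f" > "${f%.out}.filtered.out"
done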

characters of each row of csv file to be passed to an argument in shell script

I have a csv file with 1 column and 150 rows. I want to pass the contents of the csv file to a shell script as an argument, like
a=(1 2 3)
I have used the code a="$(<files.csv)", but it treats the entire csv file as one string, rather than making each row a separate element of "a". Please let me know how to do this.
Thanks.
Since a=(1 2 3) creates an array in bash (in some other shells it's just a syntax error), I assume you are trying to create an array. If that is the case, try:
a=($(cat files.csv))
That is, just replace your double quotes with surrounding parentheses.
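A quick check that the array came out as intended (this relies on word splitting, so it assumes each row holds a single whitespace-free value):
a=($(cat files.csv))
echo "${#a[@]}"   # number of elements: should be 150
echo "${a[0]}"    # the first row's value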

What changes when a file is saved in Kedit for windows that the unix2dos command doesn't do?

So I have a strange question. I have written a script that reformats data files: I basically create new files with the right column order, spacing, and such. I then run unix2dos on these files (the program I am formatting the files for is DIPS for Windows, and I assume the files should be ANSI). When I go to open the files in the DIPS program, however, an error occurs and the file won't open.
When I create the same kind of data file through the DIPS program and open it in Notepad, it matches exactly the data files I have created with my script.
On the other hand, if I open the data files I have created with my script in Kedit first, save them, and then open them in the DIPS program, everything works.
My question is what could saving in Kedit possibly do that unix2dos does not?
(Also, if I try using Notepad or WordPad to save instead of Kedit, the file doesn't open in DIPS.)
Here is the output generated by the diff command on Unix:
"
1,16c1,16
* This file is generated by Dips for Windows.
* The following 2 lines are the Title of this file.
Cobre Panama
Drill Hole B11106-GT
Number of Traverses: 0
Global Orientation is:
DIP/DIPDIRECTION
0.000000 (Declination)
NO QUANTITY
Number of extra columns are: 0
--
* This file is generated by Dips for Windows.
* The following 2 lines are the Title of this file.
Cobre Panama
Drill Hole B11106-GT
Number of Traverses: 0
Global Orientation is:
DIP/DIPDIRECTION
0.000000 (Declination)
NO QUANTITY
Number of extra columns are: 0
18c18
--
440c440
--
442c442
-1
-1
"
Any help would be appreciated! Thanks!
Okay! Figured it out.
Simply put: when you unix2dos your file, the space characters between the last letter on a line and the line-break character are not stripped. When you save in Kedit, those trailing spaces are stripped.
In my script I had a poor programming practice in which I was writing strings like this:
echo "This is an example string " >> outfile.txt
The character count is 32, and if you could see the line-break character (chr(10)) the line would read:
This is an example string
If you unix2dos outfile.txt, the line looks the same as above but with a different line-break character. However, when you open the file in Kedit and save it, the character count is now 25 and the line looks like this:
This is an example string
This occurs because Kedit does not preserve spaces at the end of a line: it places the return (line-break) character immediately after the last non-space character on a line.
So programs that read literal input, like DIPS (I'm guessing) or the more widely used AutoCAD scripting, will have a real problem with extra spaces before the return character. In AutoCAD scripting, a space in a line is treated as a return character, so ten extra spaces at the end of a line are treated the same as ten returns instead of the one you probably intended.
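To reproduce that cleanup from the shell instead of going through Kedit, a minimal sketch (GNU sed syntax; the in-place flag differs on other systems):
# strip trailing spaces/tabs from every line, then convert the line endings
sed -i 's/[[:space:]]*$//' outfile.txt
unix2dos outfile.txt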
unix2dos converts the line-break character at the end of each line from the Unix line break (LF, 10) to the DOS line break (CR LF, 13 10).
Kedit could possibly change the encoding of the file (for example, from ANSI to UTF-8).
You can change the encoding of a file with the iconv utility (on a Linux box).
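For example (the encoding names here are illustrative; substitute the ones that match your files):
# convert from ISO-8859-1 ("ANSI") to UTF-8
iconv -f ISO-8859-1 -t UTF-8 infile.txt > outfile.txt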
