Replacing newlines with a comma, but I need each file's data on its own line - bash

I have 9 different files; I'm looping over each one and taking the first, second, and third lines.
This is my code:
if [ -f displayStudentsInfo.txt ]; then
    rm displayStudentsInfo.txt
fi
for f in 20*.txt
do
    sed -n '1p' "$f" | cut -d' ' -f3 > anyfile.txt
    sed -n '2p' "$f" | cut -d' ' -f2 >> anyfile.txt
    sed -n '3p' "$f" | cut -d' ' -f2- >> anyfile.txt
    sed -E '$!s/\r?$/, /' anyfile.txt | tr -d \\r\\n >> displayStudentsInfo.txt
done
cat displayStudentsInfo.txt
rm anyfile.txt
I have used this command to put each file on its own line, but unfortunately all the files end up on the same line.
sed -E '$!s/\r?$/, /' anyfile.txt | tr -d \\r\\n
Output:
201664003, 2.8, Mathematics201700128, 3.2, Pharmacy201703451, 2.2, Political Science201759284, 3.4, Marketing201800082, 3.3, Information Technology Management201800461, 2.7, Information Technology Management201800571, 2.7, Information Technology Management201804959, 3.4, Computer Science201806050, 3.5, Computer Science201806715, 3, Computer Science201942365, 3.6, Computer Science

One idea using awk and printf (sans a '\n', so all output is appended to a single line):
awk '
{ printf "%s%s", pfx, $0 # print prefix and current line; prefix initially = ""
pfx=", " # set prefix to ", " for subsequent lines
}
END { printf "\n" } # add a linefeed at the end
' display.txt
This generates:
201664003, GPA: 3.6, Major: Computer Science
NOTE: This may not work if, as indicated in other comments, there are some undesirable non-printing characters in the input file.

Seems like you have Windows line endings (CR LF) instead of Linux line endings (just LF).
The whole file is still printed, but because of the CRs the console overwrites already-printed characters. You can confirm this by looking at the hexdump: tr '\n' ', ' < display.txt | hexdump -c.
To fix this, remove the CRs. Also, tr can only replace single characters with single characters. To replace the single character \n with the two characters ", " (comma and space), insert those two characters with sed instead.
With sed you can also make sure, that , is only inserted between the lines, but not at the end.
sed -E '$!s/\r?$/, /' display.txt | tr -d \\r\\n; echo
The tr also deletes the \n at the end of the file. This breaks the convention that every output/file should end with a linebreak. Therefore, we add that linebreak again by executing echo afterwards.
sed command explained:
$! for every line except the last one
s/.../.../ replace
\r? an optional CR
and the empty string just before the end of the line (the \n)
with ", " (comma and space)
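To see the whole pipeline in action, here is a minimal, self-contained sketch; the file name matches the answer above, but the sample data is made up:

```shell
# Build a sample file with Windows (CRLF) line endings.
printf '201664003\r\n3.6\r\nComputer Science\r\n' > display.txt

# sed appends ", " to every line but the last (also eating an optional CR),
# tr deletes all remaining CRs and LFs, and echo restores the final newline.
sed -E '$!s/\r?$/, /' display.txt | tr -d \\r\\n; echo
# prints: 201664003, 3.6, Computer Science
```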

Getting last X fields from a specific line in a CSV file using bash

I'm trying to get, as a bash variable, the list of users which are in my csv file. The problem is that the number of users varies and can be from 1 to 5.
Example CSV file:
"record1_data1","record1_data2","record1_data3","user1","user2"
"record2_data1","record2_data2","record2_data3","user1","user2","user3","user4"
"record3_data1","record3_data2","record3_data3","user1"
I would like to get something like
list_of_users="cat file.csv | grep "record2_data2" | <something> "
echo $list_of_users
user1,user2,user3,user4
I'm trying this:
cat file.csv | grep "record2_data2" | awk -F, -v OFS=',' '{print $4,$5,$6,$7,$8 }' | sed 's/"//g'
My result is:
user2,user3,user4,,
Question:
How can I remove all the "," from the end of my result? Sometimes it is just one, but sometimes it can be user1,,,,
Can I do it in a better way? Users always start after the 3rd column in my file.
This will do what your code seems to be trying to do (print the users for a given string record2_data2 which only exists in the 2nd field):
$ awk -F',' '{gsub(/"/,"")} $2=="record2_data2"{sub(/([^,]*,){3}/,""); print}' file.csv
user1,user2,user3,user4
but I don't see how that's related to your question's subject, Getting last X fields from a specific line in a CSV file using bash, so I don't know if it's what you really want or not.
Better to use a bash array, and join it into a CSV string when needed:
#!/usr/bin/env bash
readarray -t listofusers < <(cut -d, -f4- file.csv | tr -d '"' | tr ',' $'\n' | sort -u)
IFS=,
printf "%s\n" "${listofusers[*]}"
cut -d, -f4- file.csv | tr -d '"' | tr ',' $'\n' | sort -u is the important bit - it prints only the fourth and following fields of the CSV input file, removes the quotes, turns commas into newlines, and then sorts the resulting usernames, removing duplicates. That output is read into an array with the readarray builtin, and you can then manipulate the array and its individual elements however you need.
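As a quick check, the same pipeline can be run end to end on a two-line sample file (the file name matches the question; the record and user names here are invented):

```shell
# Hypothetical sample data.
printf '%s\n' '"r1","r1b","r1c","user1","user2"' \
              '"r2","r2b","r2c","user2","user3"' > file.csv

# Fields 4+, quotes stripped, one name per line, deduplicated...
readarray -t listofusers < <(cut -d, -f4- file.csv | tr -d '"' | tr ',' $'\n' | sort -u)

# ...then joined back into a CSV string via IFS.
IFS=,
printf "%s\n" "${listofusers[*]}"
# prints: user1,user2,user3
```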
A GNU sed solution. Let file.csv content be:
"record1_data1","record1_data2","record1_data3","user1","user2"
"record2_data1","record2_data2","record2_data3","user1","user2","user3","user4"
"record3_data1","record3_data2","record3_data3","user1"
then
sed -n -e 's/"//g' -e '/record2_data/ s/[^,]*,[^,]*,[^,]*,// p' file.csv
gives output
user1,user2,user3,user4
Explanation: -n turns off automatic printing. The expressions work as follows: the 1st globally substitutes " with the empty string, i.e. deletes all double quotes; the 2nd, for lines containing record2_data, substitutes (s) everything up to and including the 3rd , with the empty string, i.e. deletes it, and prints (p) the changed line.
(tested in GNU sed 4.2.2)
awk -F',' '
/record2_data2/{
for(i=4;i<=NF;i++) o=sprintf("%s%s,",o,$i);
gsub(/"|,$/,"",o);
print o
}' file.csv
user1,user2,user3,user4
This might work for you (GNU sed):
sed -E '/record2_data/!d;s/"([^"]*)"(,)?/\1\2/4g;s///g' file
Delete all records except for that containing record2_data.
Remove double quotes from the fourth field onward.
Remove any double quoted fields.
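The empty regex in the final s///g reuses the previous pattern, which is what deletes the still-quoted leading fields. A self-contained check with GNU sed (the sample line is made up, but keeps the record2_data marker in the 2nd field):

```shell
# Fields 4+ lose their quotes on the first pass; the empty-regex second
# pass then deletes the fields that are still quoted (fields 1-3).
printf '%s\n' '"a","record2_data2","c","user1","user2","user3","user4"' |
sed -E '/record2_data/!d;s/"([^"]*)"(,)?/\1\2/4g;s///g'
# prints: user1,user2,user3,user4
```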

Replacing/removing excess white space between columns in a file

I am trying to parse a file with similar contents:
I am a string 12831928
I am another string 41327318
A set of strings 39842938
Another string 3242342
I want the output file to be tab-delimited:
I am a string\t12831928
I am another string\t41327318
A set of strings\t39842938
Another string\t3242342
I have tried the following:
sed 's/\s+/\t/g' filename > outfile
I have also tried cut, and awk.
Just use awk:
$ awk -F'  +' -v OFS='\t' '{sub(/ +$/,""); $1=$1}1' file
I am a string 12831928
I am another string 41327318
A set of strings 39842938
Another string 3242342
Breakdown:
-F'  +' # tell awk that input fields (FS) are separated by 2 or more blanks
-v OFS='\t' # tell awk that output fields are separated by tabs
'{sub(/ +$/,""); # remove all trailing blank spaces from the current record (line)
$1=$1} # recompile the current record (line) replacing FSs by OFSs
1' # idiomatic: any true condition invokes the default action of "print"
I highly recommend the book Effective Awk Programming, 4th Edition, by Arnold Robbins.
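A quick way to verify the behavior is to pipe a sample line through the command and then through cat -A, which renders tabs as ^I (the sample line here is invented):

```shell
# The run of spaces between "string" and the number becomes a single tab;
# trailing blanks are stripped by the sub() call.
printf 'I am a string    12831928   \n' |
awk -F'  +' -v OFS='\t' '{sub(/ +$/,""); $1=$1}1' |
cat -A
# prints: I am a string^I12831928$
```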
The difficulty comes in the varying number of words per-line. While you can handle this with awk, a simple script reading each word in a line into an array and then tab-delimiting the last word in each line will work as well:
#!/bin/bash

fn="${1:-/dev/stdin}"

while read -r line || test -n "$line"; do
    arr=( $(echo "$line") )         # split the line into an array of words
    nword=${#arr[@]}                # number of words on the line
    for ((i = 0; i < nword - 1; i++)); do
        test "$i" -eq '0' && word="${arr[i]}" || word=" ${arr[i]}"
        printf "%s" "$word"
    done
    printf "\t%s\n" "${arr[i]}"     # tab-delimit the last word
done < "$fn"
Example Use/Output
(using your input file)
$ bash rfmttab.sh < dat/tabfile.txt
I am a string 12831928
I am another string 41327318
A set of strings 39842938
Another string 3242342
Each number is tab-delimited from the rest of the string. Look it over and let me know if you have any questions.
sed -E 's/[ ][ ]+/\\t/g' filename > outfile
NOTE: the [ ] is openBracket Space closeBracket
-E for extended regular expression support.
The two bracket expressions [ ][ ]+ make the substitution apply only to runs of more than one consecutive space.
Tested on MacOS and Ubuntu versions of sed.
Your input has spaces at the end of each line, which makes things a little more difficult than without. This sed command would replace the spaces before that last column with a tab:
$ sed 's/[[:blank:]]*\([^[:blank:]]*[[:blank:]]*\)$/\t\1/' infile | cat -A
I am a string^I12831928 $
I am another string^I41327318 $
A set of strings^I39842938 $
Another string^I3242342 $
This matches – anchored at the end of the line – blanks, non-blanks and again blanks, zero or more of each. The last column and the optional blanks after it are captured.
The blanks before the last column are then replaced by a single tab, and the rest stays the same – see output piped to cat -A to show explicit line endings and ^I for tab characters.
If there are no blanks at the end of each line, this simplifies to
sed 's/[[:blank:]]*\([^[:blank:]]*\)$/\t\1/' infile
Notice that some seds, notably BSD sed as found in MacOS, can't use \t for tab in a substitution. In that case, you have to use either '$'\t'' or '"$(printf '\t')"' instead.
Another approach, with GNU sed and rev:
$ rev file | sed -r 's/ +/\t/1' | rev
You have trailing spaces on each line. So you can do two sed expressions in one go like so:
$ sed -E -e 's/ +$//' -e $'s/ +/\t/' /tmp/file
I am a string 12831928
I am another string 41327318
A set of strings 39842938
Another string 3242342
Note the $'s/ +/\t/': This tells bash to replace \t with an actual tab character prior to invoking sed.
To show that these deletions and \t insertions are in the right place you can do:
$ sed -E -e 's/ +$/X/' -e $'s/ +/Y/' /tmp/file
I am a stringY12831928X
I am another stringY41327318X
A set of stringsY39842938X
Another stringY3242342X
Simple and without invisible semantic characters in the code:
perl -lpe 's/\s+$//; s/\s\s+/\t/' filename
Explanation:
Options:
-l: remove LF during processing (in this case)
-p: loop over records (like awk) and print
-e: code follows
Code:
remove trailing whitespace
change two or more whitespace to tab
Tested on OP data. The trailing spaces are removed for consistency.

modify distribution of data inside a file

I need help with bash in order to modify a file.txt. I have names, one name per line,
for example:
Peter
John
Markus
and I need them all in one row, with " before and after each element of the vector.
"Peter" "John" "Markus"
Well, I can insert " when I already have all elements in a row, but I don't know how to change the shape, i.e. get all lines into one row.
array=( Peter John Markus )
number=${#array[@]}
for ((i=0;i<number;i++)); do
    array[i]="\"${array[i]}\""
    echo "${array[i]}"
done
With awk
$ awk '{printf "\""$0"\" "} END{print""}' file
"Peter" "John" "Markus"
How it works:
printf "\""$0"\" "
With every new line of input, $0, this prints out a quote, the line itself, a quote and a space.
END{print""}
(optional) After we have read the last line of the file, this prints out a newline.
With sed and tr
$ sed 's/.*/"&"/' file | tr '\n' ' '
"Peter" "John" "Markus"
How it works:
s/.*/"&"/
This puts a quote before and after every line
tr '\n' ' '
This replaces newline characters with spaces so that all names appear on the same line.
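Put together, a self-contained run of the sed-and-tr variant (using the names from the question, fed in via printf instead of a file):

```shell
# Quote each line, then flatten newlines into spaces;
# echo adds the final newline that tr swallowed.
printf '%s\n' Peter John Markus | sed 's/.*/"&"/' | tr '\n' ' '; echo
# prints: "Peter" "John" "Markus"  (note the trailing space left by tr)
```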
With sed alone
$ sed ':a;$!{N;ba};s/^/"/; s/$/"/; s/\n/" "/g' file
"Peter" "John" "Markus"
How it works:
:a;$!{N;ba}
This reads the whole file in to the pattern space.
s/^/"/
This adds a quote at the beginning of the file
s/$/"/
This adds a quote to the end of the file.
s/\n/" "/g
This replaces every newline with the three characters: quote-space-quote.
With bash
To make the bash script in the question print on one line, one can use echo -n in place of echo. In other words, replace:
echo "${array[i]}"
With:
echo -n "${array[i]} "
Quoting all words on one line
From the comments, suppose that our file has all the names on one line and we want to quote each individually. Use:
$ cat file2
Peter John Markus
$ sed -r 's/[[:alnum:]]+/"&"/g' file2
"Peter" "John" "Markus"
The above is for GNU sed. On OSX or other BSD system, try:
sed -E 's/[[:alnum:]]+/"&"/g' file2
Perl to the rescue:
perl -pe 'chomp; $_ = qq("$_" );chop if eof' < input
Explanation:
-p reads the input line by line and prints what's in $_
chomp removes a newline
$_ = qq("$_" ) puts a " before the string, and a " plus a space after it.
chop if eof removes the trailing space.

Remove space between 2 columns and insert commas - bash

I am using:
cut -f1-2 input.txt|sed 1d
The data is outputting like this:
/mnt/Hector/Data/benign/binary/benign-pete/ fd0977d5855d1295bd57383b17981a09
/mnt/Hector/Data/benign/binary/benign-pete/ fd34c32786aadab513f506c30c2cba33
/mnt/Hector/Data/benign/binary/benign-pete/ fe7d03512e0731e40be628524efbf317
I am trying to get it to output without the space, and to insert a comma between the file path and the md5 checksum so Excel can separate the columns properly:
/mnt/Hector/Data/benign/binary/benign-pete/,fd0977d5855d1295bd57383b17981a09
/mnt/Hector/Data/benign/binary/benign-pete/,fd34c32786aadab513f506c30c2cba33
/mnt/Hector/Data/benign/binary/benign-pete/,fe7d03512e0731e40be628524efbf317
I didn't see your input.txt, but try this line; it does the job in one shot:
awk -v OFS="," 'NR>1{print $1,$2}' input.txt
This can do it:
$ tr -s " " < your_file | sed 's/ /,/g'
/mnt/Hector/Data/benign/binary/benign-pete/,fd0977d5855d1295bd57383b17981a09
/mnt/Hector/Data/benign/binary/benign-pete/,fd34c32786aadab513f506c30c2cba33
/mnt/Hector/Data/benign/binary/benign-pete/,fe7d03512e0731e40be628524efbf317
tr -s " " < your_file removes extra spaces. sed 's/ /,/g' replaces spaces with commas.
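For example, on a single line with several spaces between the two columns (path and checksum shortened here for readability):

```shell
# tr -s squeezes the run of spaces to one space; sed turns it into a comma.
printf '/some/path/    fd0977d5\n' | tr -s ' ' | sed 's/ /,/g'
# prints: /some/path/,fd0977d5
```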

Insert with sed n repeated characters

Creating a printout file from a mysql query, I insert a separation line after every TOTAL string with:
sed -i /^TOTAL/i'-------------------------------------------------- ' file.txt
Is there a more elegant way to repeat n "-" characters instead of typing them?
For instance, if I had to simply generate a line without finding/inserting, I would use:
echo -$_{1..50} | tr -d ' '
but don't know how to do something similar with sed into a file.
Thanks!
Just combine the two:
sed -i /^TOTAL/i"$(echo -$___{1..50} | tr -d ' ')" file.txt
With perl, you can repeat a character N times:
perl -pe 's/^TOTAL.*/"-"x50 . "\n$&"/e' file.txt
or :
perl -pe 's/^TOTAL.*/sprintf("%s\n%s", "-"x50, $&)/e' file.txt
and you keep a syntax close to sed.
Another way, using the printf builtin and bash brace expansion:
sed -i "/^TOTAL/i $(printf '%.0s-' {0..50})" file.txt
