Get Field Separator from a File - ksh

I have several files that differ from each other, and they use ; and | as separators.
Is there a way to have my program read the file, detect whether the separator is ; or |, save it in a variable, and use it later on in my script?

Generally speaking the answer is 'yes': you can dynamically determine the delimiter and use it later in your code.
You haven't mentioned how you'll determine which delimiter is in use (e.g., what happens if both characters exist in your data file?), so for the sake of discussion we'll assume the delimiter has been determined and stored in the del variable.
You also haven't stated how you'll use this delimiter in your code so, again for the sake of discussion, we'll look at examples using cut and awk.
Let's assume our datafile (mydata) contains the following single line of data:
$ cat mydata
abc;def|ghi;jkl|mno;pqr|stu
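One simple way to make that determination automatically (a sketch; it assumes the first line is representative, and if both characters appear it prefers ';'):
# sketch: pick whichever of ';' or '|' shows up in the first line
case "$(head -1 mydata)" in
  *";"*) del=";" ;;   # note: if both appear, ';' wins
  *"|"*) del="|" ;;
  *)     echo "no known delimiter found" >&2 ;;
esac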
We'll now switch our delimiter between ; and | and look at some simple cut and awk examples ...
############################
# set our delimiter to ';'
$ del=";"
##### display 1st field
$ cut -d"${del}" -f1 mydata
abc
$ awk -F"${del}" '{print $1}' mydata
abc
##### display 4th field
$ cut -d"${del}" -f4 mydata
pqr|stu
$ awk -F"${del}" '{print $4}' mydata
pqr|stu
############################
# set our delimiter to '|'
#
$ del="|"
##### display 1st field
$ cut -d"${del}" -f1 mydata
abc;def
$ awk -F"${del}" '{print $1}' mydata
abc;def
##### display 4th field
$ cut -d"${del}" -f4 mydata
stu
$ awk -F"${del}" '{print $4}' mydata
stu
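Once del is set you can also use it to split lines directly in ksh, e.g. with read and a per-command IFS (a sketch; the names f1, f2 and rest are made up for illustration):
while IFS="${del}" read -r f1 f2 rest
do
    print "first field = ${f1}, second field = ${f2}"
done < mydata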

Related

Getting last X fields from a specific line in a CSV file using bash

I'm trying to get, as a bash variable, the list of users from my CSV file. The problem is that the number of users is random and can be anywhere from 1 to 5.
Example CSV file:
"record1_data1","record1_data2","record1_data3","user1","user2"
"record2_data1","record2_data2","record2_data3","user1","user2","user3","user4"
"record3_data1","record3_data2","record3_data3","user1"
I would like to get something like
list_of_users="cat file.csv | grep "record2_data2" | <something> "
echo $list_of_users
user1,user2,user3,user4
I'm trying this:
cat file.csv | grep "record2_data2" | awk -F, -v OFS=',' '{print $4,$5,$6,$7,$8 }' | sed 's/"//g'
My result is:
user2,user3,user4,,
Question:
How do I remove all the "," from the end of my result? Sometimes it is just one, but sometimes it can be user1,,,,
Can I do it in a better way? The users always start after the 3rd column in my file.
This will do what your code seems to be trying to do (print the users for a given string record2_data2 which only exists in the 2nd field):
$ awk -F',' '{gsub(/"/,"")} $2=="record2_data2"{sub(/([^,]*,){3}/,""); print}' file.csv
user1,user2,user3,user4
but I don't see how that's related to your question subject of Getting last X fields from a specific line in a CSV file using bash, so I don't know if it's what you really want or not.
Better to use a bash array, and join it into a CSV string when needed:
#!/usr/bin/env bash
readarray -t listofusers < <(cut -d, -f4- file.csv | tr -d '"' | tr ',' $'\n' | sort -u)
IFS=,
printf "%s\n" "${listofusers[*]}"
cut -d, -f4- file.csv | tr -d '"' | tr ',' $'\n' | sort -u is the important bit: it prints only the fourth and following fields of the CSV input file, removes the quotes, turns commas into newlines, and then sorts the resulting usernames, removing duplicates. That output is then read into an array with the readarray builtin, and you can manipulate the array and its individual elements however you need.
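If you only want the users from the line containing record2_data2, as in your example, you could filter that line out first with grep (a sketch reusing the same pipeline; sort -u is dropped here to preserve the original order):
readarray -t listofusers < <(grep 'record2_data2' file.csv | cut -d, -f4- | tr -d '"' | tr ',' $'\n')
IFS=,
printf "%s\n" "${listofusers[*]}"
which prints user1,user2,user3,user4.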
A GNU sed solution. Let the file.csv content be
"record1_data1","record1_data2","record1_data3","user1","user2"
"record2_data1","record2_data2","record2_data3","user1","user2","user3","user4"
"record3_data1","record3_data2","record3_data3","user1"
then
sed -n -e 's/"//g' -e '/record2_data/ s/[^,]*,[^,]*,[^,]*,// p' file.csv
gives output
user1,user2,user3,user4
Explanation: -n turns off automatic printing. The expressions work as follows: the 1st substitution globally replaces " with an empty string, i.e. deletes all the double quotes; the 2nd, for a line containing record2_data, substitutes (s) everything up to and including the 3rd , with an empty string, i.e. deletes it, and prints (p) the changed line.
(tested in GNU sed 4.2.2)
awk -F',' '
/record2_data2/{
    o=""                                  # reset the accumulator for each matching line
    for(i=4;i<=NF;i++) o=sprintf("%s%s,",o,$i)
    gsub(/"|,$/,"",o)                     # strip quotes and the trailing comma
    print o
}' file.csv
user1,user2,user3,user4
This might work for you (GNU sed):
sed -E '/record2_data/!d;s/"([^"]*)"(,)?/\1\2/4g;s///g' file
Delete all records except for that containing record2_data.
Remove double quotes from the fourth field onward.
Remove any double quoted fields.

how to use cut command -f flag as reverse

This is a text file called a.txt
ok.google.com
abc.google.com
I want to select every subdomain separately.
cat a.txt | cut -d "." -f1 (this selects ok from the left side)
cat a.txt | cut -d "." -f2 (this selects google from the left side)
Is there any way I can get the result from the right side instead?
cat a.txt | cut (so it can select com from the right side)
There could be a few ways to do this; one way I can think of right now is a rev + cut + rev solution. It reverses the input with the rev command, then sets the field separator to . and prints the fields as if counting from the left (though they are actually counted from the right, because of rev), and finally passes that output through rev again to restore the original order.
rev Input_file | cut -d'.' -f 1 | rev
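The same rev + cut + rev pattern works for any position counted from the right; for example, the second field from the right:
rev Input_file | cut -d'.' -f2 | rev
which prints google for both example lines.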
You can use awk to print the last field:
awk -F. '{print $NF}' a.txt
-F. sets the field separator to "."
$NF is the last field
And you can give your file directly as an argument, so you can avoid the famous "Useless use of cat"
For other fields, counting from the last, you can use expressions as suggested in the comment by @sundeep or described in the user's guide under
4.3 Nonconstant Field Numbers. For example, to get the domain before the TLD, you can subtract 1 from the number of fields NF:
awk -F. '{ print $(NF-1) }' a.txt
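If the offset from the right varies, you can pass it in as an awk variable (a small sketch; n=1 means the last field):
awk -F. -v n=2 '{ print $(NF-n+1) }' a.txt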
You might use sed with a quantifier for the grouped value repeated till the end of the string.
( Start group
\.[^[:space:].]+ Match 1 dot and 1+ occurrences of any char except a space or dot
){1} Close the group followed by a quantifier
$ End of string
Example
sed -E 's/(\.[^[:space:].]+){1}$//' file
Output
ok.google
abc.google
If the quantifier is {2} the output will be
ok
abc
Depending on what you want to do with the values afterwards, you could use bash to split your domain into an array of its components:
#!/bin/bash
IFS=. read -ra comps <<< "ok.google.com"
echo "${comps[-2]}"
# or for bash < 4.2
echo "${comps[${#comps[#]}-2]}"
google

how to add zero to single digit date in a date field in shell

I have data like the below
1213231421312131|USER|21121|1231412|XEM|NAME|NAME|||5072020|2313||||NY|2131|||99|E|||ver.01
6454242352352352|USER|13131|7342422|XEM|NAME|NAME|||13032001|1231||||TX|7312|||11|E|||ver.01
5131242515111233|USER|21212|2314413|XEM|NAME|NAME|||2101979|1231||||TX|7312|||11|E|||ver.01
2341313412341123|USER|62422|1124242|XEM|NAME|NAME|||23111979|1231||||TX|7312|||11|E|||ver.01
I need data as below
1213231421312131|USER|21121|1231412|XEM|NAME|NAME|||05072020|2313||||NY|2131|||99|E|||ver.01
6454242352352352|USER|13131|7342422|XEM|NAME|NAME|||13032001|1231||||TX|7312|||11|E|||ver.01
5131242515111233|USER|21212|2314413|XEM|NAME|NAME|||02101979|1231||||TX|7312|||11|E|||ver.01
2341313412341123|USER|62422|1124242|XEM|NAME|NAME|||23111979|1231||||TX|7312|||11|E|||ver.01
So field 10 is a date column; I would like to zero-pad the date from 7 digits to 8 digits. I have used the below command, but it replaces the pipe symbols with spaces.
awk -F "|" '{$10 = sprintf("%08d", $10); print}' <fileName>
Please help me with this request
thank you
Yum
awk has an input field separator and an output field separator. The print command uses the latter. So to keep the | symbols, set the output field separator OFS too:
awk -F\| -v OFS=\| '{$10 = sprintf("%08d", $10); print}' yourFile
Or with numfmt from GNU coreutils (preinstalled on most Linux systems)
numfmt -d\| --field 10 --format %08f < yourFile
You can set the input (FS) and output (OFS) separator within a BEGIN clause:
awk 'BEGIN{FS=OFS="|"} {$10 = sprintf("%08d", $10); print}' <filename>
IMHO, it is the most convenient way to define identical input and output field separators other than space.
See the demo:
s='1213231421312131|USER|21121|1231412|XEM|NAME|NAME|||5072020|2313||||NY|2131|||99|E|||ver.01
6454242352352352|USER|13131|7342422|XEM|NAME|NAME|||13032001|1231||||TX|7312|||11|E|||ver.01
5131242515111233|USER|21212|2314413|XEM|NAME|NAME|||2101979|1231||||TX|7312|||11|E|||ver.01
2341313412341123|USER|62422|1124242|XEM|NAME|NAME|||23111979|1231||||TX|7312|||11|E|||ver.01'
awk 'BEGIN{FS=OFS="|"} {$10 = sprintf("%08d", $10); print}' <<< "$s"
Output:
1213231421312131|USER|21121|1231412|XEM|NAME|NAME|||05072020|2313||||NY|2131|||99|E|||ver.01
6454242352352352|USER|13131|7342422|XEM|NAME|NAME|||13032001|1231||||TX|7312|||11|E|||ver.01
5131242515111233|USER|21212|2314413|XEM|NAME|NAME|||02101979|1231||||TX|7312|||11|E|||ver.01
2341313412341123|USER|62422|1124242|XEM|NAME|NAME|||23111979|1231||||TX|7312|||11|E|||ver.01
You should accept the awk solution before you get answers like
sed -r 's/(([^|]*\|){9})([^|]{7}\|)/\10\3/' filename
or worse
paste -d "|" <(cut -d"|" -f1-9 filename) \
<(cut -d"|" -f10 filename |sed -r 's/^.{7}$/0&/') \
<(cut -d"|" -f11- filename)

Explode to Array

I put together this shell script to do two things:
Change the delimiters in a data file ('::' to ',' in this case)
Select the columns I want and append them to a new file
It works but I want a better way to do this. I specifically want to find an alternative method for exploding each line into an array. Using command line arguments doesn't seem like the way to go. ANY COMMENTS ARE WELCOME.
# Takes :: separated file as 1st parameters
SOURCE=$1
# create csv target file
TARGET=${SOURCE/dat/csv}
touch $TARGET
echo "#userId,itemId" > $TARGET
IFS=","
while read LINE
do
# Replaces all matches of :: with a ,
CSV_LINE=${LINE//::/,}
set -- $CSV_LINE
echo "$1,$2" >> $TARGET
done < $SOURCE
Instead of set, you can use an array:
arr=($CSV_LINE)
echo "${arr[0]},${arr[1]}"
The following would print columns 1 and 2 from infile.dat. Replace $1, $2 with
a comma-separated list of the numbered columns you do want.
awk 'BEGIN { FS="::"; OFS="," } { print $1, $2 }' infile.dat > infile.csv
Perl probably has a one-liner to do it.
Awk can probably do it easily too.
My first reaction is a combination of awk and sed:
Sed to convert the delimiters
Awk to process specific columns
cat inputfile | sed -e 's/::/,/g' | awk -F, '{print $1, $2}'
# Or, to avoid a UUOC award (and prolong the life of your keyboard by 3 characters)
sed -e 's/::/,/g' inputfile | awk -F, '{print $1, $2}'
awk is indeed the right tool for the job here, it's a simple one-liner.
$ cat test.in
a::b::c
d::e::f
g::h::i
$ awk -F:: -v OFS=, '{$1=$1;print;print $2,$3 >> "altfile"}' test.in
a,b,c
d,e,f
g,h,i
$ cat altfile
b,c
e,f
h,i
$

split a file into segments?

I have a file containing text data separated by semicolons ";". I want to separate the data, in other words split it where ; occurs, and write the data to an output file. Is there any way to do this with a bash script?
You most likely want awk with the FS (field separator variable) set to ';'.
Awk is the tool of choice for column-based data (some prefer Perl, but not me).
echo '1;2;3;4;5
6;7;8;9;10' | awk -F\; '{print $3" "$5}'
outputs:
3 5
8 10
If you just want to turn semicolons into newlines:
echo '1;2;3;4;5
6;7;8;9;10' | sed 's/;/\n/g'
outputs the numbers 1 through 10 on separate lines.
Obviously those commands are just using my test data. If you want to use them on your own file, use something like:
sed 's/;/\n/g' <input_file >output_file
#!/bin/bash
# Read ';'-delimited items one at a time from the input file.
# Note: any text after the last ';' is not delivered by read -d.
while read -d ';' ITEM; do
    echo "$ITEM"
done < input_file
Try:
cat original_file.txt | cut -d";" -f1 > new_file.txt
This will split each line into fields delimited by ";" and select the first field (-f1).
You can access other fields with -f1, -f2, ... or multiple fields with -f1-2, -f2-.
You can translate one character to another with the tr command.
cat input.txt | tr ';' '\n' > output.txt
Here \n is a newline; if you want a tab instead, replace it with \t.
