How to extract a column by splitting on a delimiter in a bash script - bash

I have a huge file in which columns are separated by the |~| delimiter.
How can I extract the required columns using a shell command?
Let's say the file looks like:
column1|~|column2|~|column3|~|column4|~|column5|~|column6|~|column7
and we want to extract columns 4 and 5.

awk -F "(|~|)" '{ print $4,$5 }' file
Set the field delimiter as "|~|" with -F and then print the 4th and 5th fields ($4,$5)

In plain bash:
#!/bin/bash
while IFS= read -r line; do
  readarray -t fields <<< "${line//'|~|'/$'\n'}"
  printf '%s %s\n' "${fields[3]}" "${fields[4]}"
done < file
or, with awk
awk -F '\\|~\\|' '{ print $4, $5 }' file
or, with GNU sed:
sed -E 's/\|~\|/\n/g; s/([^\n]*\n){3}(([^\n]*\n){2}).*/\2/; s/\n/ /g' file
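If you prefer cut, which only accepts single-character delimiters, one workaround (a sketch of my own, not from the answers above) is to normalize the multi-character delimiter first with GNU sed:
sed 's/|~|/\t/g' file | cut -f4,5
This rewrites every |~| as a tab and lets cut pick the fields (output is tab-separated); it assumes the data itself contains no tabs.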

Related

how to change words with the same words but with a number at the back - bash

I have a file, for example named file.csv, with the content
adult,REZ
man,BRB
women,SYO
animal,HIJ
and a command line that is neither a directory nor a file:
file.csv BRB1 REZ3 SYO2
What I want to do is match each of those capitalized words against the file's second column, and then take the nth letter of the matching first-column word, where n is the number at the back of the capitalized word.
and the output should then be
umo
I know that I can loop over the arguments with
for i in "${@:2}"
do
  words+=$(echo "$i ")
done
and then the output is
REZ3 BRB1 SYO2
Using awk:
Pass the string of values as an awk variable and then split it into an array a. For each record in file.csv, iterate over this array; if the second field of the current record matches the first three characters of the current array value, extract the character at the indicated position from the first field and append it to a variable. Finally, print the aggregated variable.
awk -v arr="BRB1 REZ3 SYO2" -F, 'BEGIN{split(arr,a," ")} {for (v in a) { if ($2 == substr(a[v],1,3)) {n=substr(a[v],length(a[v]),1); w=w""substr($1,n,1) }}} END{print w}' file.csv
umo
You can also put this into a script:
#!/bin/bash
words="${2}"
src_file="${1}"
awk -v arr="$words" -F, '
BEGIN { split(arr, a, " ") }
{
  for (v in a) {
    if ($2 == substr(a[v], 1, 3)) {
      n = substr(a[v], length(a[v]), 1)
      w = w substr($1, n, 1)
    }
  }
}
END { print w }' "$src_file"
Script execution:
./script file.csv "BRB1 REZ3 SYO2"
umo
This is a way using sed.
Create a pattern string from command arguments and convert lines with sed.
#!/bin/bash
file="$1"
pat='s/^/ /;Te;'
for i in "${@:2}"; do
  pat+=$(echo "$i" | sed 's#^\([^0-9]*\)\([0-9]*\)$#s/.\\{\2\\}\\(.\\).*,\1$/\\1/;#')
done
pat+='Te;H;:e;${x;s/\n//g;p}'
eval "sed -n '$pat' $file"
Try this code:
#!/bin/bash
declare -A idx_dic
filename="$1"
pattern_string=""
for i in "${@:2}"; do
  pattern_words=$(echo "$i" | grep -oE '[A-Z]+')
  index=$(echo "$i" | grep -oE '[0-9]+')
  pattern_string+="$pattern_words|"
  idx_dic["$pattern_words"]="$index"
done
pattern_string=${pattern_string%|*}
while IFS= read -r line; do
  line_pattern=$(echo "$line" | grep -oE "$pattern_string")
  [[ -n $line_pattern ]] && line_index="${idx_dic[$line_pattern]}" &&
    echo "$line" | awk -v i="$line_index" '{split($0, chars, ""); printf("%s", chars[i]);}'
done < "$filename"
First, find each capital-letter word and record its corresponding index; then construct the whole pattern string, joining the words with |; finally, iterate over every line, match it against the pattern string, and pick out the letter at the recorded index.
Execute this script.sh like:
bash script.sh file.csv BRB1 REZ3 SYO2
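For completeness, a pure-bash sketch of the same idea (my own addition, not from the thread; assumes bash 4+ for associative arrays, and that each argument is capital letters followed by one number):
#!/bin/bash
# Usage: ./letters.sh file.csv BRB1 REZ3 SYO2   (script name hypothetical)
declare -A pos
for arg in "${@:2}"; do
  word=${arg%%[0-9]*}            # letters part, e.g. BRB
  pos[$word]=${arg##*[!0-9]}     # trailing number, e.g. 1
done
out=""
while IFS=, read -r first second; do
  # if column 2 matches one of the words, take the nth letter of column 1
  [[ -n ${pos[$second]:-} ]] && out+=${first:${pos[$second]}-1:1}
done < "$1"
echo "$out"
With the sample file.csv and the arguments BRB1 REZ3 SYO2 this prints umo.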

xargs and cut: getting `cut` fields of a CSV into bash variables

I am using xargs in conjunction with cut, but I am unsure how to get the output of cut into a variable that I can use for further processing.
So, I have a text file like so:
test.txt:
/some/path/to/dir,filename.jpg
/some/path/to/dir2,filename2.jpg
...
I do this:
cat test.txt | xargs -L1 | cut -d, -f 1,2
/some/path/to/dir,filename.jpg
but what I'd like to do is something like:
cat test.txt | xargs -L1 | cut -d, -f 1,2 | echo $1 $2
where $1 and $2 would be /some/path/to/dir and filename.jpg.
I am stumped that I cannot seem to be able to achieve this.
You may want to say something like:
#!/bin/bash
while IFS=, read -r f1 f2; do
  echo ./mypgm -i "$f1" -o "$f2"
done < test.txt
IFS=, read -r f1 f2 reads lines from test.txt one at a time, splits each line on the comma, and assigns the fields to the variables f1 and f2.
The echo line is just for demonstration. Replace it with your desired command using $f1 and $f2.
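If a line might contain more commas than fields, a third variable soaks up the remainder (a variation of my own on the same loop, not from the original answer):
while IFS=, read -r f1 f2 rest; do
  echo ./mypgm -i "$f1" -o "$f2"
done < test.txt
Any extra comma-separated text lands in rest, so $f1 and $f2 stay clean.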
Try this:
cat test.txt | awk -F, '{print $1, $2}'
From man xargs:
xargs [-L number] [utility [argument ...]]
-L number
Call utility for every number non-empty lines read.
From man awk:
Awk scans each input file for lines that match any of a set of patterns specified literally in prog or in one or more files specified as -f progfile.
So you don't need xargs -L1 here, since you aren't passing a utility to call.
Also from man awk:
The -F fs option defines the input field separator to be the regular expression fs.
So awk -F, can replace the cut -d, part.
The fields are denoted $1, $2, ..., while $0 refers to the entire line.
So $1 is for the first column, $2 is for the second one.
An action is a sequence of statements. A statement can be one of the following:
print [ expression-list ] [ > expression ]
An empty expression-list stands for $0.
The print statement prints its argument on the standard output (or on a file if > file or >> file is present or on a pipe if | cmd is present), separated by the current output field separator, and terminated by the output record separator.
Put all these together, and cat test.txt | awk -F, '{print $1, $2}' achieves what you want.
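If the goal is to land the fields in bash variables rather than just print them (my own sketch, not part of the quoted answer), read can consume the pipeline via process substitution:
while IFS=, read -r dir file; do
  echo "$dir" "$file"
done < <(cut -d, -f 1,2 test.txt)
Unlike cut ... | while read ..., the < <(...) form keeps the loop in the current shell, so variables assigned inside the loop body are still visible after it.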

Removing newlines in a txt file

I have a txt file in a format like this:
test1
test2
test3
How can I bring it into a format like this using bash?
test1,test2,test3
Assuming that “using Bash” means “without any external processes”:
if IFS= read -r line; then
  printf '%s' "$line"
  while IFS= read -r line; do
    printf ',%s' "$line"
  done
  echo
fi
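Saved as join.sh (the name is only for illustration) and fed the sample file on standard input, it prints the joined line:
bash join.sh < file
test1,test2,test3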
TL;DR:
paste -sd ',' "export.txt"
Another pure bash implementation that avoids explicit loops:
#!/usr/bin/env bash
file2csv() {
  local -a lines
  readarray -t lines <"$1"
  local IFS=,
  printf "%s\n" "${lines[*]}"
}
file2csv input.txt
You can use awk. If the file name is test.txt, then:
awk '{print $1}' ORS=',' test.txt | awk '{print substr($1, 1, length($1)-1)}'
The first awk command joins the three lines with commas (test1,test2,test3,).
The second awk command just deletes the last comma from the string.
Use the 'tr' (translate) tool, plus sed to remove the last comma:
tr '\n' , < "$source_file" | sed 's/,$//'
If you want to save the output into a variable:
var="$( tr '\n' , < "$source_file" | sed 's/,$//' )"
Using sed:
$ sed ':a;N;$!ba;s/\n/,/g' file
Output:
test1,test2,test3
If you don't want a terminating newline:
$ awk '{printf "%s%s", sep, $0; sep=","}' file
test1,test2,test3
or if you do:
awk '{printf "%s%s", sep, $0; sep=","} END{print ""}' file
test1,test2,test3
Another loopless pure Bash solution:
contents=$(< input.txt)
printf '%s\n' "${contents//$'\n'/,}"
contents=$(< input.txt) is equivalent to contents=$(cat input.txt). It puts the contents of the input.txt file (with trailing newlines automatically removed) into the variable contents.
"${contents//$'\n'/,}" replaces all occurrences of the newline character ($'\n') in contents with the comma character. See Parameter expansion [Bash Hackers Wiki].
See the accepted, and excellent, answer to Why is printf better than echo? for an explanation of why printf '%s\n' is used instead of echo.

Search in CSV file and split each matching line using command-line tools

I'm using the following command:
grep -F "searchterm" source.csv >> output.csv
to search for matching terms in source.csv. Each line in the source file is like so:
value1,value2,value3|value4,value5
How do I insert only the fields value1,value2,value3 into the output file?
You can simply use awk, which goes through the file line by line; set the separator and print the part of the string you want:
awk -F"|" '{print $1}' input.csv > output.csv
You can do it with a simple while read loop:
while read -r line; do echo "${line%|*}"; done < file.csv >> newfile.csv
or in a subshell, so you truncate the newfile each time:
( while read -r line; do echo "${line%|*}"; done < file.csv ) > newfile.csv
or with sed:
sed -e 's/[|].*$//' file.csv > newfile.csv
This perl solution is similar to the awk solution:
perl -F'\|' -lane 'print $F[0]' input.csv > output.csv
The | field separator character needs to be escaped with a \
-a puts perl into autosplit mode, which populates the fields array @F
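Since the delimiter here is a single character, plain cut works too; a sketch of mine that combines it with the question's grep:
grep -F "searchterm" source.csv | cut -d'|' -f1 >> output.csv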

printing results in one line separated by commas in bash

How can I print all the text file locations separated by commas on one line? Can I do this in a for loop?
Here is an example of the file paths (the two groups are separated by a blank line):
/data/home/files/txt_files_1/file1.txt
/data/home/files/txt_files_1/file2.txt
/data/home/files/txt_files_1/file3.txt

/data/home/files/txt_files_2/file1.txt
/data/home/files/txt_files_2/file2.txt
/data/home/files/txt_files_2/file3.txt
output would look like
/data/home/files/txt_files_1/file1.txt,/data/home/files/txt_files_1/file2.txt,/data/home/files/txt_files_1/file3.txt \
/data/home/files/txt_files_2/file1.txt,/data/home/files/txt_files_2/file2.txt,/data/home/files/txt_files_2/file3.txt
Thanks
Here is the correct code:
#!/bin/bash
delim=""
for i in /data/home/files/txt_files_1/file*
do
  printf "%s%s" "$delim" "$i"
  delim=","
done
printf ' \\\n'
delim=""
for i in /data/home/files/txt_files_2/file*
do
  printf "%s%s" "$delim" "$i"
  delim=","
done
printf '\n'
For a single input file, awk's paragraph mode does it: setting RS= (empty) makes each blank-line-separated block one record, and the $1 = $1 assignment forces awk to rebuild the record with the comma OFS:
awk -v OFS=, -v RS= 'NF { $1 = $1; print }' file
Output:
/data/home/files/txt_files_1/file1.txt,/data/home/files/txt_files_1/file2.txt,/data/home/files/txt_files_1/file3.txt
/data/home/files/txt_files_2/file1.txt,/data/home/files/txt_files_2/file2.txt,/data/home/files/txt_files_2/file3.txt
Or, keeping a blank line between the groups in the output:
awk -v OFS=, -v RS= -v ORS='\n\n' 'NF { $1 = $1; print }' file
Output:
/data/home/files/txt_files_1/file1.txt,/data/home/files/txt_files_1/file2.txt,/data/home/files/txt_files_1/file3.txt

/data/home/files/txt_files_2/file1.txt,/data/home/files/txt_files_2/file2.txt,/data/home/files/txt_files_2/file3.txt
You can use printf "%s," "$file" to print several names into a single line. To get the delimiters right, I use this trick:
delim=""
...loop...
printf "%s%s" "$delim" "$file"
delim=","
printf "\n"
<command to generate lines of paths> | tr '\n' ','
example:
echo "/data/home/files/txt_files_1/file1.txt
/data/home/files/txt_files_1/file2.txt
/data/home/files/txt_files_1/file3.txt
/data/home/files/txt_files_2/file1.txt
/data/home/files/txt_files_2/file2.txt" | tr '\n' ','
outputs:
/data/home/files/txt_files_1/file1.txt,/data/home/files/txt_files_1/file2.txt,/data/home/files/txt_files_1/file3.txt,,/data/home/files/txt_files_2/file1.txt,/data/home/files/txt_files_2/file2.txt
Assuming your input is in a file called list, this Perl one-liner does the job:
perl -F'\n' -00 -ane 'push @a, join(",", @F) }{ print(join(" \\\n\n", @a), "\n")' list
Explanation:
-00, in combination with -n, reads the file one block (paragraph) at a time.
The -a switch in combination with -F'\n' auto-splits the text on each newline. The result goes into the array @F.
An array @a is built, each element containing the comma-separated list of the elements in @F.
Once the file has been processed, all the elements of the array @a are printed, joined together as you specified. The additional "\n" on the end is optional.
Output:
/data/home/files/txt_files_1/file1.txt,/data/home/files/txt_files_1/file2.txt,/data/home/files/txt_files_1/file3.txt \
/data/home/files/txt_files_2/file1.txt,/data/home/files/txt_files_2/file2.txt,/data/home/files/txt_files_2/file3.txt
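For a single group of files, paste can also do the joining in one step (a sketch of mine, not from the answers above):
printf '%s\n' /data/home/files/txt_files_1/file* | paste -sd, -
printf emits one path per line, and paste -s joins them with commas on a single line.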
