xargs and cut: getting `cut` fields of a csv to bash variable - bash

I am using xargs in conjunction with cut, but I am unsure how to get the output of cut into a variable which I can use for further processing.
So, I have a text file like so:
test.txt:
/some/path/to/dir,filename.jpg
/some/path/to/dir2,filename2.jpg
...
I do this:
cat test.txt | xargs -L1 | cut -d, -f 1,2
/some/path/to/dir,filename.jpg
but what I'd like to do is:
cat test.txt | xargs -L1 | cut -d, -f 1,2 | echo $1 $2
where $1 and $2 are /some/path/to/dir and filename.jpg
I am stumped that I cannot seem to be able to achieve this.

You may want to say something like:
#!/bin/bash
while IFS=, read -r f1 f2; do
    echo ./mypgm -i "$f1" -o "$f2"
done < test.txt
IFS=, read -r f1 f2 reads the lines of test.txt one by one,
splits each line on the comma, then assigns the fields to the
variables f1 and f2.
The echo line is for demonstration purposes. Replace it with your
desired command using $f1 and $f2.
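For the sample test.txt above, the demonstration line prints:
./mypgm -i /some/path/to/dir -o filename.jpg
./mypgm -i /some/path/to/dir2 -o filename2.jpg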

Try this:
cat test.txt | awk -F, '{print $1, $2}'
From man xargs:
xargs [-L number] [utility [argument ...]]
-L number
Call utility for every number non-empty lines read.
From man awk:
Awk scans each input file for lines that match any of a set of patterns specified literally in prog or in one or more files specified as -f progfile.
So you don't need xargs -L1 here, since you are not passing xargs a utility to call.
Also from man awk:
The -F fs option defines the input field separator to be the regular expression fs.
So awk -F, can replace the cut -d, part.
The fields are denoted $1, $2, ..., while $0 refers to the entire line.
So $1 is for the first column, $2 is for the second one.
An action is a sequence of statements. A statement can be one of the following:
print [ expression-list ] [ > expression ]
An empty expression-list stands for $0.
The print statement prints its argument on the standard output (or on a file if > file or >> file is present or on a pipe if | cmd is present), separated by the current output field separator, and terminated by the output record separator.
Putting all these together, cat test.txt | awk -F, '{print $1, $2}' achieves what you want.
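For the sample test.txt, this gives the two fields separated by a space (awk's default output field separator):
/some/path/to/dir filename.jpg
/some/path/to/dir2 filename2.jpg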

Related

Print line based on 2nd field value, without using a loop

I am trying to retrieve a line from a file without using a loop.
myFile.txt
val1;a;b;c
val2;b;d;e
val3;c;r;f
I would like to get the line where the second column is b.
If I do grep "b" myFile.txt then both the first and second lines will be output.
If I do cat myFile.txt | cut -d ';' -f2 | grep "b" then the output will just be b, whereas I'd like to get the full line val2;b;d;e.
Is there a way of reaching the desired result without using a loop like the one below? My file is huge, so it wouldn't be nice to loop through it again and again.
while read line; do
    if [ `echo $line | cut -d ';' -f2` = "b" ]; then
        echo $line
    fi
done < myFile.txt
Given your input file, the below one-liner should work:
awk -F";" '$2 == "b" {print}' myFile.txt
Explanation:
awk -F";"    ## field separator is ";"
'$2 == "b"   ## matches lines whose second column ($2) is "b"
{print}'     ## prints the matched line
Using grep:
grep '^[^;]*;b;' myFile.txt
Using sed:
sed '/^[^;]*;b;/!d' myFile.txt
The pattern ^[^;]*;b; matches a first field containing no ;, followed by ;b;.
Output is the same for both:
val2;b;d;e
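If the value to match is not fixed, the same awk idea works with a variable (a sketch; the name val is just an illustration):
awk -F";" -v val="b" '$2 == val' myFile.txt
A pattern with no action prints the whole matching line.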

How can I print the first matched line using sed or grep?

I have a config file where each line is in a format say UniqueOption = SomeValue:
$ cat somefile
option1sub1 = yes
option1sub2 = 1234
...
option1subn = xxxx
option2 = 2345
option3 = no
...
I want to deal with each value of "option1" in a loop, but sed and grep give me all of the option1 lines at once.
How could I achieve that using sed or grep, getting a single option1 line at a time?
Pipe the output of grep to a while loop:
grep 'option1' somefile | while read line
do
    echo "single option is in var $line"
done
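If you also want the name and value separated inside the loop, a variant (a sketch, assuming the Option = Value layout from the question) is:
grep '^option1' somefile | while IFS=' =' read -r key value
do
    echo "option $key has value $value"
done
Note that, because of the pipe, the loop runs in a subshell, so variables set inside it are not visible after done.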
Solution 1: The following awk will get the value (the last = -separated field) of every option1 line:
awk -F" = " '/^option1/{print $NF}' Input_file
Solution 2: The above prints all values of option1; if you need only the very first one, use the following:
awk -F" = " '/^option1/{print $NF;exit}' Input_file
The following will parse out all sub-options for option1 in the file file.conf and save them in a bash array. The options are then easily accessed from that array.
#!/bin/bash
while IFS= read -r data; do
    opt1+=( "$data" )
done < <( awk -F ' *= *' '$1 ~ /^option1/ { print $2 }' file.conf )
printf 'Option 1, sub-option 1 is "%s"\n' "${opt1[0]}"
Output:
Option 1, sub-option 1 is "yes"
The awk script returns everything after the = (the field separator ' *= *' also swallows the spaces around it), which allows you to store data that contains multiple words. Only the lines starting with option1 in the configuration file are processed.
This could be adapted to parse the whole configuration file into a single structure, possibly using an associative array in a sufficiently recent version of bash (4.0 or later).
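A rough sketch of that idea (assuming bash 4+ for declare -A; the names conf and file.conf are just illustrative):
#!/bin/bash
declare -A conf                                   # associative array, bash 4+
while IFS= read -r line; do
    [[ $line != *=* ]] && continue                # skip lines without an assignment
    key=${line%%=*}; value=${line#*=}
    key=${key%"${key##*[![:space:]]}"}            # trim trailing spaces from the key
    value=${value#"${value%%[![:space:]]*}"}      # trim leading spaces from the value
    conf[$key]=$value
done < file.conf
printf 'option2 is "%s"\n' "${conf[option2]}"     # -> option2 is "2345"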
There are already a few awesome answers, but since you asked about grep, you can use one of the following if you want.
For all values
grep option1 somefile | cut -d "=" -f2 | awk '{$1=$1};1'
For first value
grep option1 somefile | cut -d "=" -f2 | awk '{$1=$1};1' | head -1
Here cut extracts the second field using the delimiter =, awk trims the surrounding spaces, and head prints only the first occurrence.
With sed
sed '/^option1.* = /!d;s///' somefile
With GNU grep 2.20 (which supports PCRE):
grep -oP '^option1.* = \K.*' somefile
\K discards everything matched so far, so only the part after the = is printed.
If you want to get only the first match
sed '/^option1.* = /!d;s///;q' somefile
grep -m1 -oP '^option1.* = \K.*' somefile

One line command with variable, word count and zcat

I have many files on a server, each containing many lines:
201701010530.contentState.csv.gz
201701020530.contentState.csv.gz
201701030530.contentState.csv.gz
201701040530.contentState.csv.gz
With a one-line command, I would like this result:
170033|20170101
169865|20170102
170010|20170103
170715|20170104
The goal is to have the number of lines of each file, just by keeping the date which is already in the filename of the file.
I tried this, but the result is not on one line but two...
for f in $(ls -1 2017*gz);do zcat $f | wc -l;echo $f | awk '{print substr($0,1,8)}';done
Thanks in advance guys.
Just use zcat file | wc -l to get the number of lines.
For the name, I understand it is enough to extract the first 8 characters:
$ t="201701030530.contentState.csv.gz"
$ echo "${t:0:8}"
20170103
All together:
for file in 2017*gz; do
    lines=$(zcat "$file" | wc -l)
    printf "%s|%s\n" "$lines" "${file:0:8}"
done > myresult.csv
Note the use of for file in 2017*gz to go through the files matching the 2017*gz pattern: this suffices; there is no need to parse ls!
Use zgrep -c ^ file to count the lines, here encapsulated in awk:
$ awk 'FNR==1{ "zgrep -c ^ " FILENAME | getline s; print s "|" substr(FILENAME,1,8) }' *.gz
12|20170101
The whole "zgrep -c ^ " FILENAME should probably be in a var (s) and then s | getline s.

Bash Shell: Infinite Loop

The problem is the following: I have a file where each line has this form:
id|lastName|firstName|gender|birthday|joinDate|IP|browser
I want to sort all the first names in the file alphabetically and print them one per line, but each name only once.
I have created the following program, but for some reason it creates an infinite loop:
array1=()
while read LINE
do
    if [ ${LINE:0:1} != '#' ]
    then
        IFS="|"
        array=($LINE)
        if [[ "${array1[@]}" != "${array[2]}" ]]
        then
            array1+=("${array[2]}")
        fi
    fi
done < $3
echo ${array1[@]} | awk 'BEGIN{RS=" ";} {print $1}' | sort
NOTES
if [ ${LINE:0:1} != '#' ] : this test is used because there are comments in the file that I don't want to print
$3 : the filename
array1 : is used to collect all the separate names
Wow, there's a MUCH simpler and cleaner way to achieve this, without having to mess with the IFS variable or using arrays. You can use "for" to do this:
First I created a file with the same structure as yours:
$ cat file
id|lastName|Douglas|gender|birthday|joinDate|IP|browser
id|lastName|Tim|gender|birthday|joinDate|IP|browser
id|lastName|Andrew|gender|birthday|joinDate|IP|browser
id|lastName|Sasha|gender|birthday|joinDate|IP|browser
#id|lastName|Carly|gender|birthday|joinDate|IP|browser
id|lastName|Madson|gender|birthday|joinDate|IP|browser
Here's the script I wrote using "for":
#!/bin/bash
for LINE in `cat file | grep -v "^#" | awk -F'|' '{print$3}' | sort -u`
do
    echo $LINE
done
And here's the output of this script:
$ ./script.sh
Andrew
Douglas
Madson
Sasha
Tim
Explanation:
for LINE in `cat file`
Creates a loop that reads each line of "file". The commands between backticks are run by the shell; for example, to store the date in a variable you could use VARDATE=`date`.
grep -v "^#"
The option -v is used to exclude results matching the pattern, in this case the pattern is "^#". The "^" character means "line begins with". So grep -v "^#" means "exclude lines beginning with #".
awk -F'|' '{print$3}'
The -F option switches the field delimiter from the default (whitespace) to whatever you put between the quotes after it, in this case the "|" character.
The '{print$3}' prints the 3rd column.
sort -u
And the "sort -u" command sorts the names alphabetically and removes the duplicates (-u for unique).

Awk: Drop last record separator in one-liner

I have a simple command (part of a bash script) that I'm piping through awk but can't seem to suppress the final record separator without then piping to sed. (Yes, I have many choices and mine is sed.) Is there a simpler way without needing the last pipe?
dolls=$(egrep -o 'alpha|echo|november|sierra|victor|whiskey' /etc/passwd \
| uniq | awk '{IRS="\n"; ORS=","; print}' | sed s/,$//);
Without the sed, this produces output like echo,sierra,victor, (with a trailing comma); I'm just trying to drop that last comma.
You don't need awk, try:
egrep -o ....uniq|paste -d, -s
Here is another example:
kent$ echo "a
b
c"|paste -d, -s
a,b,c
Also, I think your chained command could be simplified: awk can do all of it in a one-liner.
Instead of egrep, uniq, awk, sed etc., all this can be done in one single awk command:
awk -F":" '!($1 in a){l=l $1 ","; a[$1]} END{sub(/,$/, "", l); print l}' /etc/passwd
Here is a small and quite straightforward one-liner in awk that suppresses the final record separator:
echo -e "alpha\necho\nnovember" | awk 'y {print s} {s=$0;y=1} END {ORS=""; print s}' ORS=","
Gives:
alpha,echo,november
So, your example becomes:
dolls=$(egrep -o 'alpha|echo|november|sierra|victor|whiskey' /etc/passwd | uniq | awk 'y {print s} {s=$0;y=1} END {ORS=""; print s}' ORS=",");
The benefit of using awk over paste or tr is that this also works with a multi-character ORS.
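For example, with a hypothetical two-character separator:
echo -e "alpha\necho\nnovember" | awk 'y {print s} {s=$0;y=1} END {ORS=""; print s}' ORS=", "
alpha, echo, november
paste -d cannot produce this, since it only accepts single-character delimiters.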
Since you tagged it bash, here is one way of doing it:
#!/bin/bash
# Read the /etc/passwd file in to an array called names
while IFS=':' read -r name _; do
    names+=("$name")
done < /etc/passwd
# Assign the content of the array to a variable
dolls=$( IFS=, ; echo "${names[*]}")
# Display the value of the variable
echo "$dolls"
echo "a
b
c" |
mawk 'NF-= _==$NF' FS='\n' OFS=, RS=
a,b,c
