This really has me stumped. Here is what I am trying to do:
I pipe an article from newsboat to a script. The script should then extract the Title and URL from the article.
Here is an example article:
Feed: NYT > Home Page
Title: Hit Pause on Brett Kavanaugh
Author: THE EDITORIAL BOARD
Link: https://www.nytimes.com/2018/09/26/opinion/kavanaugh-supreme-court-hearing-delay.html?partner=rss&emc=rss
Date: Thu, 27 Sep 2018 01:58:11 +0200
The integrity of the Supreme Court is at stake.
The article gets piped with a macro from newsboat:
macro R pipe-to "cat | ~/.scripts/newsboat_extract"
Here is the working script:
#!/bin/bash
cat > ~/newsboat # I don't really need this file, so if I can avoid saving to a file, I'd prefer to
title="$(awk -F: '/^Title:/{for(i=2;i<=NF;++i)print $i}' ~/newsboat)"
url="$(awk -F: '/^Link:/{print $2 ":" $3}' ~/newsboat)"
printf '%s\n' "$title" "$url" >> newsboat_result
This delivers the expected output:
Hit Pause on Brett Kavanaugh
https://www.nytimes.com/2018/09/26/opinion/kavanaugh-supreme-court-hearing-delay.html?partner=rss&emc=rss
I would like to avoid saving to a file. However, saving to a variable does not work, for whatever reason. This is the script that is not working:
#!/bin/bash
article=$(cat)
title="$(awk -F: '/^Title:/{for(i=2;i<=NF;++i)print $i}' "$article")"
url="$(awk -F: '/^Link:/{print $2 ":" $3}' "$article")"
printf '%s\n' "$title" "$url" >> newsboat_result
the output turns into this:
#empty line
#empty line
I have no idea why the script behaves like this. It must have something to do with how the variable is stored, right?
Any ideas? I am pretty new to bash scripting and awk, so I'm also thankful for any comments on how to solve this problem more efficiently.
""""""""""""
" SOLUTION "
""""""""""""
This did it, thank you!
#!/bin/bash
article=$(cat "${1:--}")
title="$(awk -F: '/^Title:/{for(i=2;i<=NF;++i)print $i}' <<< "$article")"
url="$(awk -F: '/^Link:/{print $2 ":" $3}' <<< "$article")"
printf '%s\n' "$title" "$url" >> newsboat_result
In your script you make several passes over the input: first you read it with cat and store the content in ~/newsboat, then awk reads that copy to extract the title, then reads it a third time to extract the URL. That only works because ~/newsboat is a plain file; standard input itself can only be read once. (Your non-working variant has a second problem: awk "$article" passes the stored content to awk as a file name, so awk tries to open a file whose name is the entire article text.)
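A quick way to convince yourself of the read-once behavior of stdin (just a demo; any commands would do):
$ printf 'line1\nline2\n' | { cat; echo '--- second read:'; cat; }
line1
line2
--- second read:
The first cat consumes everything, so the second one sees end-of-file immediately.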
A quick fix is to work on the copy of it you made in the first operation:
#!/bin/bash
article=$1
feed_copy=~/newsboat
cat "${article:--}" > "$feed_copy" # Use stdin if parameter is not provided
title="$(awk -F: '/^Title:/ { for(i=2; i<=NF; ++i) print $i }' "$feed_copy")"
url="$(awk -F: '/^Link:/ { print $2 ":" $3 }' "$feed_copy")"
printf '%s\n' "$title" "$url" >> "$feed_copy"
Not tested, obviously, but that should work.
Notes:
reserve uppercase variable names for environment variables (this is a mere convention)
you should almost always quote your variables (cat "$article", not cat $article) unless you know what you are doing
avoid echo, use printf
There are other enhancements that could be made to this script but sorry, I lack the time.
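To make the quoting and printf notes concrete, a small demo (the file name is made up):
f='my article.txt'      # a name containing a space
cat $f                  # unquoted: word-splits into two arguments, "my" and "article.txt"
cat "$f"                # quoted: passed as one argument, as intended
line='-n'
echo "$line"            # bash's echo swallows -n as an option and prints nothing
printf '%s\n' "$line"   # printf prints the literal text: -n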
[edit] Since you don't actually need the ~/newsboat file, here is an updated version that follows Charles Duffy's suggestion:
#!/bin/bash
feed_copy=$(cat "${1:--}")
title="$(awk -F: '/^Title:/ { for(i=2; i<=NF; ++i) print $i }' <<< "$feed_copy")"
url="$(awk -F: '/^Link:/ {print $2 ":" $3}' <<< "$feed_copy")"
printf '%s\n' "$title" "$url"
I am trying to get the name out of /etc/passwd using awk, searching only the 5th field of every row, and then cut some part of that line and print it out.
This is what I wrote, but it doesn't seem to work:
for iter in "$#";
do cat /etc/passwd | awk -F ":" '$5==$iter' | cut -d":" -f6;
done;
Concerning the delimiter syntax, everything should be fine, I guess?
So my problem is in the $5==$iter, I assume.
How can I change that $5==$iter so that if the 5th field of a row contains my $iter var, the line gets cut and printed as above?
Sorry for the ignorance, I am a beginner :)
Thanks in advance.
See How do I use shell variables in an awk script?
-v should be used to pass shell variables into awk. Also, there's no reason to use either cat or cut here:
for iter in "$#"; do
awk -F: -v iter="$iter" '$5==iter { print $6 }' </etc/passwd
done
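Assuming the script is saved as lookup.sh (a made-up name) and that root's GECOS field is "root", as on many systems, a run might look like:
$ ./lookup.sh root
/root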
As Charles Duffy commented, your code would be more efficient if it didn't have to read /etc/passwd on every pass. And while this particular loop probably doesn't need optimizing (/etc/passwd is typically not that long, and most OSes cache the file after the first read anyway), it would be interesting to see an awk script that reads the file only once.
That said, here's another implementation where awk is only invoked once:
printf "%s\n" "$#" | awk -F: '
NR == FNR { etc_passwd[ $5 ] = $6; next }
{ print $0 , etc_passwd[ $0 ] }
' /etc/passwd /dev/stdin
The NR == FNR condition is an idiom whose associated action runs only while the first file in awk's file list is being read (here, /etc/passwd).
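A toy illustration of the idiom, using two hypothetical files:
$ printf 'a\nb\n' > file1; printf 'x\ny\n' > file2
$ awk 'NR == FNR { print "first:", $0; next } { print "second:", $0 }' file1 file2
first: a
first: b
second: x
second: y
NR counts records across all inputs while FNR restarts at 1 for each file, so they are only equal during the first file.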
You can also do everything in bash. For example:
#!/bin/bash
declare -A passwd # declare an associative array
# build the associative array "passwd" with the
# 5th field as a "key" and 6th field as "value"
while IFS=$':\n' read -a line; do # emulate awk to extract fields
[[ -n "${line[4]}" ]] || continue # avoid blank "keys"
passwd["${line[4]}"]=${line[5]} # in bash, arrays starting in "0"
done < /etc/passwd
for iter in "$#"; do
if [ "${passwd[$iter]+x}" ]; then # expands to x only when the key exists
echo "${passwd[$iter]}"
fi
done
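The ${passwd[$iter]+x} expansion is worth a note: it expands to x only when the key exists, which lets the test distinguish an unset key from a key set to an empty string. A quick illustration with a toy array:
$ declare -A m=([a]='')
$ [ "${m[a]+x}" ] && echo 'key a is set (even though its value is empty)'
key a is set (even though its value is empty)
$ [ "${m[b]+x}" ] || echo 'key b is unset'
key b is unset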
(This version doesn't take into account multiple entries with the same 5th field.)
Here is a better version that can handle blank values as well, like ./script.sh '':
while IFS=$':\n' read -a line; do
for iter in "$#"; do
if [ "$iter" == "${line[4]}" ]; then
echo "${line[5]}"
continue
fi
done
done < /etc/passwd
A pure awk solution could be:
#!/usr/bin/awk -f
BEGIN {
FS = ":"
for ( i = 1; i < ARGC; i++ ) {
args[ARGV[i]] = 1
delete ARGV[i]
}
ARGV[1] = "/etc/passwd"
}
($5 in args) { print $6 }
and you could call it as ./script.awk 'param1' 'param2' (no -f needed on the command line; it is already in the shebang).
I can't seem to locate an SO question that matches this exact problem.
I have a text file that has one text token per line, without any commas, tabs, or quotes. I want to create a comma delimited string based on the file content.
Input:
one
two
three
Output:
one,two,three
I am using this command:
csv_string=$(tr '\n' ',' < file | sed 's/,$//')
Is there a more efficient way to do this?
The usual command to do this is paste:
csv_string=$(paste -sd, file.txt)
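Here -s joins all lines of a single file serially and -d, sets the delimiter; a quick demo on stdin:
$ printf '%s\n' one two three | paste -sd, -
one,two,three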
You can do it entirely with bash parameter expansion operators instead of using tr and sed.
csv_string=$(<file) # read file into variable
csv_string=${csv_string//$'\n'/,} # replace \n with ,
csv_string=${csv_string%,} # remove trailing comma
One way with Awk would be to reset the RS and treat the records as separated by blank lines. This would handle words with spaces and format them in CSV format as expected.
awk '{$1=$1}1' FS='\n' OFS=',' RS= file
The {$1=$1} is a way to get awk to reconstruct each record ($0) using the current field and record separators (FS/OFS and RS/ORS). The trailing 1 prints every record with the modifications made inside {..}.
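For example, with a hypothetical input whose lines contain spaces:
$ printf 'one fish\ntwo fish\nred fish\n' > file
$ awk '{$1=$1}1' FS='\n' OFS=',' RS= file
one fish,two fish,red fish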
With Perl one-liner:
$ cat csv_2_text
one
two
three
$ perl -ne '{ chomp; push(@lines,$_) } END { $x=join(",",@lines); print "$x" }' csv_2_text
one,two,three
$ perl -ne ' { chomp; $_="$_," if not eof ;printf("%s",$_) } ' csv_2_text
one,two,three
$
From @codeforester:
$ perl -ne 'BEGIN { my $delim = "" } { chomp; printf("%s%s", $delim, $_); $delim="," } END { printf("\n") }' csv_2_text
one,two,three
$
Tested the four approaches on a Linux box (pure Bash, paste, Awk, and Perl) as well as the tr | sed approach shown in the question:
#!/bin/bash
# generate test data
seq 1 10000 > test.file
times=${1:-50}
printf '%s\n' "Testing paste solution"
time {
for ((i=0; i < times; i++)); do
csv_string=$(paste -sd, test.file)
done
}
printf -- '----\n%s\n' "Testing pure Bash solution"
time {
for ((i=0; i < times; i++)); do
csv_string=$(<test.file) # read file into variable
csv_string=${csv_string//$'\n'/,} # replace \n with ,
csv_string=${csv_string%,} # remove trailing comma
done
}
printf -- '----\n%s\n' "Testing Awk solution"
time {
for ((i=0; i < times; i++)); do
csv_string=$(awk '{$1=$1}1' FS='\n' OFS=',' RS= test.file)
done
}
printf -- '----\n%s\n' "Testing Perl solution"
time {
for ((i=0; i < times; i++)); do
csv_string=$(perl -ne '{ chomp; $_="$_," if not eof; printf("%s",$_) }' test.file)
done
}
printf -- '----\n%s\n' "Testing tr | sed solution"
time {
for ((i=0; i < times; i++)); do
csv_string=$(tr '\n' ',' < test.file | sed 's/,$//')
done
}
Surprisingly, the Bash-only solution does quite poorly. paste comes out on top, followed by tr | sed, Awk, and Perl:
Testing paste solution
real 0m0.109s
user 0m0.052s
sys 0m0.075s
----
Testing pure Bash solution
real 1m57.777s
user 1m57.113s
sys 0m0.341s
----
Testing Awk solution
real 0m0.221s
user 0m0.152s
sys 0m0.077s
----
Testing Perl solution
real 0m0.424s
user 0m0.388s
sys 0m0.080s
----
Testing tr | sed solution
real 0m0.162s
user 0m0.092s
sys 0m0.141s
For some reason, csv_string=${csv_string//$'\n'/,} hangs on macOS Mojave running Bash 4.4.23.
Related posts:
How to join multiple lines of file names into one with custom delimiter?
Concise and portable “join” on the Unix command-line
Turning multi-line string into single comma-separated
Here is the part of my script that uses awk.
ids=`cut -d ',' -f1 $file | sed ':a;N;$!ba;s/\n/,/g'`
awk -vdata="$ids" -F',' 'NR > 1 {if(index(data,$2)>0){print $0",true"}else{print $0",false"}}' $input_file >> $output_file
This works perfectly, but then I tried to get the data from two or more files, like this:
ids=`cut -d ',' -f1 $file1 $file2 $file3 | sed ':a;N;$!ba;s/\n/,/g'`
It returned this error.
/usr/bin/awk: Argument list too long
From my research, the error is caused not by the number of files but by the number of ids fetched.
Does anybody have an idea on how to solve this? Thanks.
You could use an environment variable to pass the data to awk. In awk, environment variables are accessible via the ENVIRON array.
So try something like this:
export ids=`cut -d ',' -f1 $file | sed ':a;N;$!ba;s/\n/,/g'`
awk -F',' 'NR > 1 {if(index(ENVIRON["ids"],$2)>0){print $0",true"}else{print $0",false"}}' $input_file >> $output_file
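A minimal check that an exported variable really is visible inside awk (any exported name works):
$ export ids="2,3,9"
$ awk 'BEGIN { print ENVIRON["ids"] }' </dev/null
2,3,9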
Change the way you generate your ids so they come out one per line, like this, which I use as a very simple way to generate ids 2,3 and 9:
echo 2; echo 3; echo 9
2
3
9
Now pass that as the first file to awk and your $input_file as the second file to awk:
awk '...' <(echo 2; echo 3; echo 9) "$input_file"
In bash you can generate a pseudo-file with the output of a process using <(some commands), and that is what I am using.
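You can see the pseudo-file nature directly; the exact /dev/fd path varies by system:
$ echo <(echo 2; echo 3; echo 9)
/dev/fd/63
$ cat <(echo 2; echo 3; echo 9)
2
3
9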
Now, in your awk, pick up the ids from the first file like this:
awk 'FNR==NR{ids[$1]++;next}' <(echo 2; echo 3; echo 9)
which will set ids[2]=1, ids[3]=1 and ids[9]=1.
Then pass both your files and add in your original processing:
awk 'FNR==NR{ids[$1]++;next} {if($2 in ids) print $0",true"; else print $0",false"}' <(echo 2; echo 3; echo 9) "$input_file"
So, for my final answer, your entire code will look like:
awk 'FNR==NR{ids[$1]++;next} {if($2 in ids) print $0",true"; else print $0",false"}' <(cut ... file1 file2 file3 | sed ...) "$input_file"
As @hek2mgl alludes to in the comments, you can likely just pass the files which include the ids to awk "as is" and let awk find the ids itself rather than using cut and sed. If there are many, you can make them all come to awk as the first file with:
awk '...' <(cat file1 file2 file3) "$input_file"
There are two problems in your script:
awk -vdata="$ids" -F',' 'NR > 1 {if(index(data,$2)>0){print $0",true"}else{print $0",false"}}' $input_file >> $output_file
that could be causing that error:
-vdata=.. - that is gawk-specific; in other awks you need to leave a space between -v and data=. If you aren't running gawk, it's hard to say what your awk will make of that statement, but it might treat it as multiple args.
$input_file - you MUST quote shell variables unless you have a specific purpose in mind in leaving them unquoted. If $input_file contains globbing chars or spaces, then leaving it unquoted will cause it to be expanded into potentially multiple files/args.
So try this:
awk -v data="$ids" -F',' 'NR > 1 {if(index(data,$2)>0){print $0",true"}else{print $0",false"}}' "$input_file" >> "$output_file"
and see if you still have the problem. Your script does have other unrelated issues of course, some of which have already been pointed out, and you can post a followup question if you want help with those, but just FYI that awk script could be written more concisely as:
awk -v data="$ids" 'BEGIN{FS=OFS=","} NR > 1{print $0, (index(data,$2) ? "true" : "false")}'
I was basically trying to compare two files, and as part of that I assigned the cksum of a file to a variable. But when I tried to compare it, it did not work; I realized that when I tried to read the variable, nothing gets printed out.
The commands below worked just fine:
s.joseph@VA-S-JOSEPH-900 /cygdrive/c/users/Anuprita
$ test=`cksum interface2 | awk -F" " '{ print $1 }'`
s.joseph@VA-S-JOSEPH-900 /cygdrive/c/users/Anuprita
$ echo "$test"
3021988741
But when these are part of a script and I try to echo the variable, nothing gets printed:
$ for i in `ls interface*`;
do chksum1=`cksum $i | awk -F" " '{ print "'$1'" }'`;
echo "$chksum1";
done
s.joseph@VA-S-JOSEPH-900 /cygdrive/c/users/Anuprita
$
I am using bash shell
Without assigning it to any variable, the output is as shown below
for i in interface*; do echo "interface=\"$i\""; cksum "$i"; done
interface="interface11"
4113442291 111 interface11
interface="interface17"
1275738681 111 interface17
interface="interface2"
3021988741 186 interface2
Looks like it is an issue only with bash on Cygwin. The script seems to work just fine on Unix:
for i in `ls interface*`; do chksum1=`cksum $i | awk -F" " '{ print $1 }'`; echo $i, $chksum1; done
interface1, 4294967295
interface2, 4294967295
Try this:
for i in `ls interface*`; do echo "interface=$i"; chksum1=$(cksum $i | awk -F" " '{ print "'$1'" }'); echo "$chksum1"; done
I like adding the echo statement to verify you're getting what you think from the ls statement, and the variable assignment should use $(cmd) or `cmd`.
Cheers
What you have in your 2nd script:
print "'$1'"
is a completely different statement from what you have in your first one:
print $1
Think about it and ask yourself why you changed it and what it is you're trying to achieve. Also see man awk, and http://cfajohnson.com/shell/cus-faq-2.html#Q24 for what print "'$1'" does.
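A quick way to see the difference (the shell's positional parameter is set by hand here, purely for illustration):
$ set -- SHELLARG
$ echo 'field1 field2' | awk '{ print $1 }'        # awk's own $1
field1
$ echo 'field1 field2' | awk '{ print "'$1'" }'    # the shell pastes its $1 into the program
SHELLARG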
Best I can tell without any provided sample input, your script should be written:
for i in interface*; do chksum1=$(cksum "$i" | awk '{ print $1 }'); echo "$chksum1"; done
I have a problem creating a script that reads specific values from all the files in a folder.
I have a number of email files in a directory, and I need to extract two specific values from each file.
After that I have to put them into a new file that looks like that:
--------------
To: value1
value2
--------------
This is what I want to do, but I don't know how to create the script:
# I am putting the name of the files into a temp file
`ls -l | awk '{print $9 }' >tmpfile`
# used for the name of the output file
date=`date +"%T"`
# The first specific value from file (phone number)
var1=`cat tmpfile | grep "To: 0" | awk '{print $2 }' | cut -b -10 `
# The second specific value from the file (subject)
var2=`cat file | grep Subject | awk '{print $2$3$4$5$6$7$8$9$10 }'`
# Put the first value in a new file on the first row
echo "To: 4"$var1"" > sms-$date
# Put the second value in the same file on the second row
echo ""$var2"" >>sms-$date
.......
and do the same for every file in the directory
I tried using while and for loops, but I couldn't finish the script.
Thank You
I've made a few changes to your script, hopefully they will be useful to you:
#!/bin/bash
for file in *; do
var1=$(awk '/To: 0/ {print substr($2,1,10)}' "$file")
var2=$(awk '/Subject/ {for (i=2; i<=10; ++i) s=s$i; print s}' "$file")
outfile="sms-"$(date +"%T")
i=0
while [ -f "$outfile" ]; do outfile="sms-$date-"$((i++)); done
echo "To: 4$var1" > "$outfile"
echo "$var2" >> "$outfile"
done
The for loop just goes through every file in the folder that you run the script from.
I have added an additional suffix $i to the end of the file name. If no file with the same timestamp already exists, the file is created without the suffix. Otherwise the value of $i keeps increasing until there is no file with the same name.
I'm using $( ) rather than backticks, this is just a personal preference but it can be clearer in my opinion, especially when there are other quotes about.
There's not usually any need to pipe the output of grep to awk. You can do the search in awk using the / / syntax.
I have removed the cut -b -10 and replaced it with substr($2, 1, 10), which prints the first 10 characters of column 2. (Note the start position of 1: a start of 0 makes gawk return one character fewer, as the short demo after these notes shows.)
It's not much shorter, but I used a loop rather than the $2$3...; I think it looks a bit neater.
There's no need for all the extra " in the two output lines.
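Regarding the substr note above, a quick comparison with a made-up number (gawk assumed):
$ echo 'To: 01234567890123' | awk '{ print $2 }' | cut -b -10
0123456789
$ echo 'To: 01234567890123' | awk '{ print substr($2, 1, 10) }'
0123456789
$ echo 'To: 01234567890123' | awk '{ print substr($2, 0, 10) }'
012345678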
I suggest trying the following:
#!/bin/sh
RESULT_FILE=sms-`date +"%T"`
DIR=.
fgrep -l 'To: 0' "$DIR"/* | while read FILE; do
var1=`fgrep 'To: 0' "$FILE" | awk '{print $2 }' | cut -b -10`
var2=`fgrep 'Subject' "$FILE" | awk '{print $2$3$4$5$6$7$8$9$10 }'`
echo "To: 4$var1" >>"$RESULT_FIL"
echo "$var2" >>"$RESULT_FIL"
done