sed doesn't exit, even though it is done

I am trying to manipulate a text file. I've got most of it figured out myself, but I'm stumped with why sed seems to go into infinite loop mode. The text file can be downloaded from census.gov.
At the moment, I just want a list of states that I can throw into a for loop to do some state-specific processing. So far, I've got this. (I'm not a bash expert, so suggestions are welcome.)
sed 1d tables/ansi.csv | awk -F "," '{print $1}' | uniq | tr \n : | sed s/:/" "/g
I want to put this into $() to use the output in a for loop, but for some reason, sed is hanging up and not exiting. I actually need to add a couple of things to the final sed command, to properly format things, but I want to get this running correctly before I go any further.
In the end - I want something that looks like (just showing the first few):
"AL" "AK" "AZ" "AR" "CA" "CO" ....
Right now, sed returns more or less what I expect (just showing the last few):
...."MP" "PR" "UM" "VI" "
But, rather than exiting, sed hangs and I have to Ctrl-C out of the script. If I remove the final sed statement, the little script runs as I would expect, without hanging.
So, why on earth is this hanging?

I would suggest putting the sed script inside quotes:
sed 1d tables/ansi.csv | awk -F "," '{print $1}' | uniq | tr '\n' : | sed 's/:/" "/g'
The reason that sed seems to "hang" may be that tr has removed the final newline which sed requires. By the way, the newline argument to tr needs to be quoted.
However, the whole thing can be done in AWK:
awk -F, 'NR > 1 {a[$1]=$1} END { delim=":"; num=asort(a); for (i=1;i<=num;i++) printf "\"%s\" ",a[i]; printf "\n"}' tables/ansi.csv
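To see why the quoting of the tr argument matters, here is a small sketch using made-up input (the words nine and ten are just sample data, not from the census file). Without quotes the shell removes the backslash, so tr is asked to translate the letter n rather than the newline.

```shell
#!/bin/sh
# Unquoted: the shell strips the backslash, so tr receives the letter "n"
# and replaces every n with a colon; the newlines are left alone.
printf 'nine\nten\n' | tr \n :
# prints ":i:e" and ":te:" on separate lines

# Quoted: tr receives the two characters \n and treats them as a newline,
# so the lines are joined with colons and no trailing newline remains.
printf 'nine\nten\n' | tr '\n' :
echo    # add the newline that tr removed
```

The same rule applies to the sed script: quoting it keeps the shell from interpreting the embedded double quotes and space.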

awk -F"," 'NR>1 && (!($1 in a)){print $1;a[$1]}' file|sort|awk '{printf "\"%s\" ",$1}'

Related

How to remove the username/hostname line from an output on Korn Shell?

I run the command
df -gP /data1 /data2 | grep -v File | awk '{print $1}' |
awk -F/dev/ '$0=$2' | tr '\n' '
on the AIX shell (ksh) and it prints the output below:
lv_data01 lv_data02 root@testhost:/
However, I would like the output to be printed this way. Could someone help?
lv_data01 lv_data02

Using grep … | awk … | awk … is not necessary; a single awk could do the whole job. So could sed and it might even be easier. I'd be tempted to deal with the spacing by using:
x=$(df … | sed …); echo $x
The tr command, once corrected, replaces newlines with spaces, so the prompt follows without a newline before it. The ; echo suggestion adds the missing newline; the echo $x suggestion (note no double quotes) does too.
As for the sed command:
sed -n '/File/!{ s/[[:space:]].*//; s%^.*/dev/%%p; }'
Don't print anything by default
If the line doesn't match File (doing the work of grep -v):
remove the first space (blank or tab) and everything after it (doing the work of awk '{print $1}')
replace everything up to /dev/ with nothing and print (doing the work of awk -F/dev/ '{$0=$2}')
The command substitution and capture, followed by echo, deals with spaces and newlines.
So, my suggested solution is:
x=$(df -gP /data1 /data2 | sed -n '/File/!{ s/[[:space:]].*//; s%^.*/dev/%%p; }'); echo $x
You could add unset x after the echo if you are going to be using this directly in the shell and not in a shell script. If it'll be encapsulated in a shell script, you don't have to worry about it.
I'm blithely assuming the output from df -gP won't contain a path such as this, with two occurrences of /dev:
/who/knows/dev/lv_data01/dev/bin
If that's a real problem, you can fix the sed script, but I don't think it will be. It's one thing the second awk script in the question handles differently.
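As a rough check, the sed command can be run against simulated input. The sample_df function below is a hypothetical stand-in for df -gP, and its numbers are invented; only the first column and the header line matter for this sketch.

```shell
#!/bin/sh
# Hypothetical stand-in for `df -gP /data1 /data2`; the values are made up.
sample_df() {
cat <<'EOF'
Filesystem    GB blocks      Used Available Capacity Mounted on
/dev/lv_data01    10.00      2.00      8.00      20% /data1
/dev/lv_data02    10.00      3.00      7.00      30% /data2
EOF
}

# Skip the header, keep the first column, strip everything up to /dev/.
x=$(sample_df | sed -n '/File/!{ s/[[:space:]].*//; s%^.*/dev/%%p; }')

# Unquoted $x lets the shell turn the embedded newline into a single space,
# and echo supplies the final newline.
echo $x
# prints: lv_data01 lv_data02
```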

Sed failing with escape characters in variables

sed terminating early because of escape characters in variables. Hoping awk can do what I need but can't see how!
# Main section ==========================================╕
LASTIP=`grep -E '[0-2]{0,1}[0-9]{0,9}[0-9]{0,1}\.[0-2]{0,1}[0-9]{0,9}[0-9]{0,1}\.[0-2]{0,1}[0-9]{0,9}[0-9]{0,1}\.[0-2]{0,1}[0-9]{0,9}[0-9]{0,1}' $SRCDIR/$IPLOGFILE | tail -1|awk -F'\t' '{print$3}'`
if [ "$CURRENTIP" == "$LASTIP" ]; then
# Still using old IP===================================╕
FIRSTDETECTED=`grep $LASTIP $SRCDIR/$IPLOGFILE | tail -1|awk -F'\t' '{print$1}'`
LASTDETECTED=`grep $LASTIP $SRCDIR/$IPLOGFILE | tail -1|awk -F'\t' '{print$2}'`
OLDLINE=$(printf "$FORMAT" "$FIRSTDETECTED" "$LASTDETECTED" "$LASTIP")
AMENDEDLINE=$(printf "$FORMAT" "$FIRSTDETECTED" "$TIMESTAMP" "$LASTIP")
sed -i "s/'$OLDLINE'/'$AMENDEDLINE'/g" $SRCDIR/$IPLOGFILE
This works fine apart from the last sed, which terminates because $OLDLINE and/or $AMENDEDLINE contains escape chars. I thought I could do a direct substitution for awk to solve the issue but the more I thought about it the more I thought the whole section could be done much more efficiently with awk - maybe in one line of awk? Trouble is I don't know where to start. Am I fooling myself about simplifying it or is there a way? If there is, you may have to help me out, as I find this stuff 'warps my fragile little mind'*
*courtesy of Cartman ;P
I've snipped out the section but can supply the rest of the script if that helps?

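You could try quoting it like this:
sed -i 's/'"$OLDLINE"'/'"$AMENDEDLINE"'/g' "$SRCDIR"/"$IPLOGFILE"
In your version the whole script is inside double quotes, so the single quotes around $OLDLINE and $AMENDEDLINE are not shell quoting; they are passed to sed as literal characters and become part of the pattern and the replacement. Note that sed will still fail if either variable contains a /, and may behave unexpectedly if it contains regex metacharacters or &.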
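If the variables can contain characters that sed treats specially, one common approach is to escape them before building the s command. This is only a sketch: escape_pattern and escape_replacement are hypothetical helper names, the sample values are made up, and it assumes single-line values and a basic (not extended) regex.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original script.

# Escape characters that are special in a sed basic-regex pattern,
# plus the / delimiter.
escape_pattern() {
    printf '%s\n' "$1" | sed 's/[]\/$*.^[]/\\&/g'
}

# Escape characters that are special in a sed replacement:
# backslash, the / delimiter, and &.
escape_replacement() {
    printf '%s\n' "$1" | sed 's/[\/&]/\\&/g'
}

old='01/02/2020 10.0.0.1'            # made-up sample values
new='03/04/2020 10.0.0.1 & more'

printf '%s\n' 'a 01/02/2020 10.0.0.1 b' |
    sed "s/$(escape_pattern "$old")/$(escape_replacement "$new")/g"
# prints: a 03/04/2020 10.0.0.1 & more b
```

With helpers like these, the command in the question would become something like sed -i "s/$(escape_pattern "$OLDLINE")/$(escape_replacement "$AMENDEDLINE")/g" "$SRCDIR/$IPLOGFILE".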

reverse a file in Unix shell

I have a file parse.txt
parse.txt contains the following
remo/hello/1.0,remo/hello2/2.0,remo/hello3/3.0,whitney/hello/1.0,julie/hello/2.0,julie/hello/3.0
and I want output.txt to contain the fields of parse.txt in reverse order (from last to first):
julie/hello/3.0,julie/hello/2.0,whitney/hello/1.0,remo/hello3/3.0,remo/hello2/2.0,remo/hello/1.0
I have tried the following code:
tail -r parse.txt

You can use the surprisingly helpful tac from GNU Coreutils.
tac -s "," parse.txt > newparse.txt
tac by default will "cat" the file to standard out, reversing the lines. By specifying the separator using the -s flag, you can simply reverse your fields as desired.
(You may need to do a post-processing step to get the commas to work out correctly, which can be another step in your pipeline.)

I like the tac solution; it's tight and elegant, but as Micah pointed out, tac is part of GNU Coreutils, which means that it's not available by default in FreeBSD, OSX, Solaris, etc.
This can be done in pure bash, no external tools required.
#!/usr/bin/env bash
unset comma
read foo < parse.txt
bar=(${foo//,/ })
for (( count="${#bar[@]}"; --count >= 0; )); do
    printf "%s%s" "$comma" "${bar[$count]}"
    comma=","
done
This obviously only handles one line, per your sample input. You can wrap it in something if you need to handle multiple lines of input.
The logic here is that we can convert the input into an array by replacing commas with spaces. Of course, if our input data included spaces, this would have to be adjusted. Once we have the array, we simply step backwards through it, printing each record.
Note that this does not include a terminating newline. If you want one, you can add it with:
printf '\n'
as a final line.
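For multiple lines of input, one possible wrapper is a while read loop around the same idea. This is a sketch rather than the script above: reverse_fields is a hypothetical function name, it splits on commas with IFS instead of replacing them with spaces (so fields containing spaces survive), and it prints a newline after each line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reverse the comma-separated fields of every input line.
reverse_fields() {
    local line fields i out
    # The "|| [ -n ... ]" part also handles a last line with no newline.
    while IFS= read -r line || [ -n "$line" ]; do
        IFS=, read -r -a fields <<< "$line"   # split on commas only
        out=""
        for (( i = ${#fields[@]} - 1; i >= 0; i-- )); do
            out+="${fields[i]}"
            (( i > 0 )) && out+=","
        done
        printf '%s\n' "$out"
    done
}

printf 'a,b,c\nd,e\n' | reverse_fields
# prints:
# c,b,a
# e,d
```

It could be used as reverse_fields < parse.txt > output.txt.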

perl -F, -lane 'print join ",", reverse @F' parse.txt > output.txt

You can use this awk command:
awk -v RS=, '{a[++i]=$1} END{for (k=i; k>=1; k--) printf a[k] (k>1?RS:ORS)}' parse.txt
julie/hello/3.0,julie/hello/2.0,whitney/hello/1.0,remo/hello3/3.0,remo/hello2/2.0,remo/hello/1.0

The question is tagged unix and you have mentioned tail -r which suggests you might not be using Linux (with full GNU toolchain), but instead some "real" Unix (BSD variant), e.g. osx.
As such, the tac command is not available, but as mentioned in the question, tail -r is. So you can use the following:
$ tr ',' '\n' < parse.txt | tail -r | tr '\n' ',' | sed 's/,$//'
julie/hello/3.0,julie/hello/2.0,whitney/hello/1.0,remo/hello3/3.0,remo/hello2/2.0,remo/hello/1.0
$
Notes:
This only works for files that have one line, as we are relying on converting commas to newlines and back. If there is more than one line, then the newlines in between will get converted to commas by the second tr.
The final sed removes a trailing comma, which the second tr produced from the trailing newline emitted by tail.

Emulating tac with sed:
tr , '\n' <parse.txt | sed '1!G; h; $!d' | paste -sd , -
Alternatively, if you don't have paste:
tr , '\n' <parse.txt | sed '1!G; h; $!d' | tr '\n' , | sed 's/,$//'
Output:
julie/hello/3.0,julie/hello/2.0,whitney/hello/1.0,remo/hello3/3.0,remo/hello2/2.0,remo/hello/1.0

You can use any language to do that
xargs ruby -e "puts ARGV[0].split(',').reverse.join(',')" < parse.txt

Reversing can be done with tac (cat spelled backwards). As pointed out in the comments, this reverses the lines, which is not what the OP asked for.
tac filename
You can still use tac if you make the field separator (here ,) the record separator instead of the linefeed.
echo "a,b,c" | tr '\n' ',' | tac -s "," | sed 's/,$/\n/'

How do i count 1 or more items in comma separated input in Shell

Here's my issue: I know how to count the files using the following two strategies, but I have a problem with each one.
I am using the '.sh' extension.
First:
count=`echo $2 | awk -F, {'print NF'}`
causes my program to throw an error at me: awk: cannot execute - No such file or directory
Secondly:
count=`echo $2 | tr -cd , | wc -c`
This works if you have multiple values separated by commas; however, it does not work if the input is a single item with no commas.
Like I said, this was previously working with awk, but for some reason when I ran it on the physical device instead of the virtual machine it gave me that error.
Any ideas?
Things I know are NOT the issue:
Version of shell is the same.

Try count=$(echo ${2} | awk -F, '{print NF}') instead - you have your braces and quotes inside-out.
Although, it seems your bigger problem is that awk appears to not be executable... You might try which awk and ls -l $(which awk) to see what's up with that...
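If awk really is unavailable on the device, the tr approach from the question can be made to work for a single item by counting the commas and adding one. A minimal sketch, where count_items is a hypothetical function name and an empty argument is assumed to mean zero items:

```shell
#!/bin/sh
# Hypothetical helper: count comma-separated items without awk.
count_items() {
    # Assumption: an empty argument means zero items.
    if [ -z "$1" ]; then
        echo 0
        return
    fi
    # Delete everything except commas, then count what is left.
    commas=$(printf '%s' "$1" | tr -cd ',' | wc -c)
    # N commas separate N+1 items; the arithmetic also absorbs any
    # leading spaces that some wc implementations print.
    echo $((commas + 1))
}

count_items "a"        # prints 1
count_items "a,b,c"    # prints 3
```

In the script from the question this would be used as count=$(count_items "$2").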

Awk replace a column with its hash value

How can I replace a column with its hash value (like MD5) in awk or sed?
The original file is super huge, so I need this to be really efficient.

So, you don't really want to be doing this with awk. Any of the popular high-level scripting languages -- Perl, Python, Ruby, etc. -- would do this in a way that was simpler and more robust. Having said that, something like this will work.
Given input like this:
this is a test
(E.g., a row with four columns), we can replace a given column with its md5 checksum like this:
awk '{
    tmp="echo " $2 " | openssl md5 | cut -f2 -d\" \""
    tmp | getline cksum
    $2=cksum
    print
}' < sample
This relies on GNU awk (you'll probably have this by default on a Linux system), and it uses openssl to generate the md5 checksum. We first build a shell command line in tmp to pass the selected column to the md5 command. Then we pipe the output into the cksum variable, and replace column 2 with the checksum. Given the sample input above, the output of this awk script would be:
this 7e1b6dbfa824d5d114e96981cededd00 a test

I copy pasted larsks's response, but I have added the close line, to avoid the problem indicated in this post: gawk / awk: piping date to getline *sometimes* won't work
awk '{
    tmp="echo " $2 " | openssl md5 | cut -f2 -d\" \""
    tmp | getline cksum
    close(tmp)
    $2=cksum
    print
}' < sample

This might work using Bash/GNU sed:
<<<"this is a test" sed -r 's/(\S+\s)(\S+)(.*)/echo "\1 $(md5sum <<<"\2") \3"/e;s/ - //'
this 7e1b6dbfa824d5d114e96981cededd00 a test
or a mostly sed solution:
<<<"this is a test" sed -r 'h;s/^\S+\s(\S+).*/md5sum <<<"\1"/e;G;s/^(\S+).*\n(\S+)\s\S+\s(.*)/\2 \1 \3/'
this 7e1b6dbfa824d5d114e96981cededd00 a test
Both commands replace the second field (is) of this is a test with its md5sum.
Explanation:
In the first: identify the columns and use back references as parameters in a Bash command, which is substituted and evaluated (the e flag); then make a cosmetic change to drop the file description (in this case - for standard input) generated by the md5sum command.
In the second: similar to the first, but first save the input string in the hold space (h); then, after evaluating the md5sum command, append the saved string to the pattern space (which now holds the md5sum result) with G, and rearrange the pieces with a substitution.

You can also do that with perl :
echo "aze qsd wxc" | perl -MDigest::MD5 -ne 'print "$1 ".Digest::MD5::md5_hex($2)." $3" if /([^ ]+) ([^ ]+) ([^ ]+)/'
aze 511e33b4b0fe4bf75aa3bbac63311e5a wxc
If you want to obfuscate a large amount of data, this might be faster than sed and awk, which need to fork an md5sum process for each line.

You might have a better time with read than awk, though I haven't done any benchmarking.
the input (scratch001.txt):
foo|bar|foobar|baz|bang|bazbang
baz|bang|bazbang|foo|bar|foobar
transformed using read:
while IFS="|" read -r one fish twofish red fishy bluefishy; do
    # Hash the third field without a trailing newline; strip the "  -" that md5sum appends
    twofish=$(printf '%s' "$twofish" | md5sum | tr -d " -")
    echo "$one|$fish|$twofish|$red|$fishy|$bluefishy"
done < scratch001.txt
produces the output:
foo|bar|3858f62230ac3c915f300c664312c63f|baz|bang|bazbang
baz|bang|19e737ea1f14d36fc0a85fbe0c3e76f9|foo|bar|foobar
