bash - find the differing string in a column of a file

I have a file input.txt. In bash, using sed, awk or a shell script, how can I pick out only the strings in a column that differ from all the rest?
For example:
# cat input.txt
878933fa4965c31c88ee8696a1a5838f abc xyz
878933fa4965c31c88ee8696a1a5838f abc xyz
878933fa4965c31c88ee8696a1a5838f abc xyz
878933fa4965c31c88ee8696a1a5838f abc xyz
878933fa4965c31c88ee8696a1a5838f abc xyz
878933fa4965c31c88ee8696a1a5838f abc xyz
878933fa4965c31c88ee8696a1axxxxx abc xyz
878933fa4965c31c88ee8696a1a5838f abc xyz
878933fa4965c31c88ee8696a1a5838f abc xyz
878933fayyyyyy1c88ee8696a1a5838f abc xyz
878933fa4965c31c88ee8696a1a5838f abc xyz
878933fa4965c31c88ee8696a1a5838f abc xyz
I want to pick and display only "878933fa4965c31c88ee8696a1axxxxx" and "878933fayyyyyy1c88ee8696a1a5838f"

In pure Bash:
declare -A lines
while read -r col1 line ; do lines["$col1"]="$col1 $line" ; done < input.txt
for i in "${!lines[@]}" ; do echo "$i" ; done
First we declare lines as an associative array. Then we read the file in a while loop, keying each line by its first column. Finally we loop over the keys (the distinct first-column values) and print each one.
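If the goal is to print only the first-column values that occur exactly once, the same associative-array idea can count occurrences instead; here is a minimal sketch in pure Bash, assuming the input file is named input.txt as in the question:
declare -A count
while read -r col1 _ ; do
  count[$col1]=$(( ${count[$col1]:-0} + 1 ))   # tally each first-column value
done < input.txt
for key in "${!count[@]}" ; do
  [ "${count[$key]}" -eq 1 ] && echo "$key"    # print only the values seen once
done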

Your question is kinda vague, but maybe you're trying to print the $1 values that appear only once; if so, this would do that:
$ awk '{cnt[$1]++} END{for (i in cnt) if (cnt[i]==1) print i}' file
878933fayyyyyy1c88ee8696a1a5838f
878933fa4965c31c88ee8696a1axxxxx

awk '{print $1}' file | uniq -u
(uniq -u only removes adjacent duplicates, so insert a sort before it if the repeated values are not grouped together; the same approach works for any other column by changing $1.)

uniq -c will give you a count, so if you mean only the entries that occur exactly once you can do:
cut -d " " -f 1 file | sort | uniq -c | awk '$1==1{print $2}'
Or in perl:
perl -lane '$seen{$F[0]}++; END{for (keys %seen){ print if $seen{$_}==1 }}' file

Try this (note that uniq -u only drops lines whose duplicates are adjacent, so sort the input first if they are not):
uniq -u input.txt | awk '{print $1}'

Related

How to get the line number of a string in another string in Shell

Given
str="hij";
sourceStr="abc\nefg\nhij\nlmn\nhij";
I'd like to get the line number of the first occurrence of $str in $sourceStr, which should be 3.
I don't know how to do it.
I have tried:
awk 'match($0, v) { print NR; exit }' v=$str <<<$sourceStr
grep -n $str <<< $sourceStr | grep -Eo '^[^:]+';
grep -n $str <<< $sourceStr | cut -f1 -d: | sort -ug
grep -n $str <<< $sourceStr | awk -F: '{ print $1 }' | sort -u
All output 1, not 3.
How can I get the line number of $str in $sourceStr?
Thanks!
You may use this awk + printf in bash:
awk -v s="$str" '$0 == s {print NR; exit}' <(printf "%b\n" "$sourceStr")
3
Or even this awk without any bash support:
awk -v s="$str" -v source="$sourceStr" 'BEGIN {
split(source, a); for (i=1; i in a; ++i) if (a[i] == s) {print i; exit}}'
3
You may use this sed as well:
sed -n "/^$str$/{=;q;}" <(printf "%b\n" "$sourceStr")
3
Or this grep + cut:
printf "%b\n" "$sourceStr" | grep -nxF -m 1 "$str" | cut -d: -f1
3
It's not clear if you've just made a cut-n-paste error, but your sourceStr is not a multiline string (as demonstrated below). Also, you really need to quote your herestring (also demonstrated below). Perhaps you just want:
$ sourceStr="abc\nefg\nhij\nlmn\nhij"
$ echo "$sourceStr"
abc\nefg\nhij\nlmn\nhij
$ sourceStr=$'abc\nefg\nhij\nlmn\nhij'
$ echo "$sourceStr"
abc
efg
hij
lmn
hij
$ cat <<< $sourceStr
abc efg hij lmn hij
$ cat <<< "$sourceStr"
abc
efg
hij
lmn
hij
$ str=hij
$ awk "/${str}/ {print NR; exit}" <<< "$sourceStr"
3
Just use sed!
printf 'abc\nefg\nhij\nlmn\nhij\n' \
| sed -n '/hij/ { =; q; }'
Explanation: if sed meets a line that contains "hij" (regex /hij/), it prints the line number (the = command) and exits (the q command). Else it doesn't print anything (the -n switch) and goes on with the next line.
[update] Hmmm, sorry, I just noticed your "All output 1, not 3".
The primary reason why your commands don't output 3 is that sourceStr="abc\nefg\nhij\nlmn\nhij" doesn't automagically change your \n into new lines, so it ends up being one single line and that's why your commands always display 1.
If you want a multiline string, here are two solutions with bash:
printf -v sourceStr "abc\nefg\nhij\nlmn\nhij"
sourceStr=$'abc\nefg\nhij\nlmn\nhij'
And now that your variable contains whitespace characters (newlines), as stated by William Pursell, you must enclose $sourceStr in double quotes in order to preserve them:
grep -n "$str" <<< "$sourceStr" | ...
There's always a hard way to do it:
str="hij";
sourceStr="abc\nefg\nhij\nlmn\nhij";
echo -e $sourceStr | nl | grep $str | head -1 | gawk '{ print $1 }'
or, a bit more efficient:
str="hij";
sourceStr="abc\nefg\nhij\nlmn\nhij";
echo -e $sourceStr | gawk '/'$str/'{ print NR; exit }'
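For completeness, the line number can also be found with a plain bash loop and no external tools; a minimal sketch, assuming sourceStr already contains real newlines (built with printf -v or $'...' as shown above):
n=0
while IFS= read -r line ; do
  n=$(( n + 1 ))                    # count lines as we read them
  if [ "$line" = "$str" ] ; then    # exact match against $str
    echo "$n"
    break
  fi
done <<< "$sourceStr"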

Efficient way to get unique value from log file

There is a large 10 GB log file, formatted as follows:
node123`1493000000`POST /api/info`app_id=123&token=123&sign=abc
node456`1493000000`POST /api/info`app_id=456&token=456&sign=abc
node456`1493000000`POST /api/info`token=456&app_id=456&sign=abc
node456`1493000000`POST /api/info`token=456&sign=abc&app_id=456
Now I want to get unique app_ids from the log file. For example, the expected result of the log file above should be:
123
456
I currently do that with the shell pipeline awk -F 'app_id=' '{print $2}' $filename | awk -F '&' '{print $1}' | sort | uniq. Is there a more efficient way?
If your log file's name is log_file.txt, you can extract the ids with grep and deduplicate them with sort -u:
grep -Po 'app_id=\K[0-9]+' log_file.txt | sort -u
(A plain field split such as awk -F '[&=]' '{print $4}' log_file.txt is fragile here, because the query-string parameters do not always appear in the same position.)
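Since the file is large, a single awk pass that deduplicates in memory avoids sorting 10 GB worth of extracted ids; this is a sketch, assuming the app_id values are always numeric:
awk 'match($0, /app_id=[0-9]+/) {
  id = substr($0, RSTART + 7, RLENGTH - 7)   # strip the "app_id=" prefix (7 characters)
  if (!seen[id]++) print id                  # print each id only the first time it is seen
}' log_file.txt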
Change the log file name to match yours. This writes the unique field combinations to a temporary file z, then counts how often each occurs (the $17..$20 field numbers are for an Apache-style log and would need to be adapted to the format above):
awk '{print $17" "$18" "$19" "$20}' log.txt | sort -k1 | uniq >> z   # apache-style fields
while read -r x
do
  echo "$x"
  grep -c "$x" log.txt
done < z

To get first and last occurrence when cut using a delimiter

I have a text file a.txt with following data
abc/def/ghi
jkl/mno/pqr/stu
I need to cut them so that I get the first and last fields, with "/" as the delimiter.
Output expected is
abc ghi
jkl stu
cat a.txt | cut -d "/" -f1              # gives me the first cell
cat a.txt | rev | cut -d "/" -f1 | rev  # gives me the last cell
I want both cells from a single command. Kindly help.
You could use awk for this,
$ awk -F/ '{print $1,$NF}' file
abc ghi
jkl stu
Through sed,
$ sed 's~^\([^/]*\).*\/\(.*\)$~\1 \2~g' file
abc ghi
jkl stu
Through perl,
$ perl -pe 's;^([^/]*).*\/(.*)$;\1 \2;g' file
abc ghi
jkl stu
Ugly hack through grep and paste,
$ grep -oP '^[^/]*|\w+(?=$)' file | paste -d' ' - -
abc ghi
jkl stu
Another sed ( without capture ),
sed 's#/.*/# #g' yourfile
Test:
$ sed 's#/.*/# #g' yourfile
abc ghi
jkl stu
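If you prefer to stay in the shell, bash parameter expansion can give both pieces directly: ${line%%/*} strips everything from the first "/" onward and ${line##*/} strips everything up to the last "/". A minimal sketch, assuming the file is named a.txt as above:
while IFS= read -r line ; do
  echo "${line%%/*} ${line##*/}"   # first field, then last field
done < a.txt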

Unix command to convert multiple line data in a single line along with delimiter

Here is the actual file data:
abc
def
ghi
jkl
mno
And the required output should be in this format:
'abc','def','ghi','jkl','mno'
The command I used to do this gives this output:
abc,def,ghi,jkl,mno
The command is as follows:
sed -n 's/[0-3]//;s/ //;p' Split_22_05_2013 | \
awk -v ORS= '{print $0" ";if(NR%4==0){print "\n"}}'
In response to sudo_O's comment, here is an awk-less solution in pure bash. It does not exec any external program at all. Of course, instead of the <<XXX ... XXX here-document, one could redirect from a file with <filename.
c=""
while read -r w; do
echo -e "$c'$w'\c"
c=,
done <<XXX
abc
def
ghi
jkl
mno
XXX
Output:
'abc','def','ghi','jkl','mno'
An even shorter version:
printf -v out ",'%s'" $(<infile)
echo ${out:1}
Without the horrifying pipe snakes, you can try something like this:
awk 'NR>1{printf ","}{printf "\x27%s\x27",$0}' <<XXX
abc
def
ghi
jkl
mno
XXX
Output:
'abc','def','ghi','jkl','mno'
Or another version, which reads the whole input as a single record:
awk -vRS="" '{gsub("\n","\x27,\x27");print"\x27"$0"\x27"}'
Or a version which makes more use of awk's built-in variables:
awk -vRS="" -F"\n" -vOFS="','" -vORS="'" '{$1=$1;print ORS $0}'
The $1=$1; is needed to tell awk to rebuild $0 using the new output field separator (OFS); print then appends the output record separator (ORS).
$ cat test.txt
abc
def
ghi
jkl
mno
$ cat test.txt | tr '\n' ','
abc,def,ghi,jkl,mno,
$ cat test.txt | awk '{print "\x27" $1 "\x27"}' | tr '\n' ','
'abc','def','ghi','jkl','mno',
$ cat test.txt | awk '{print "\x27" $1 "\x27"}' | tr '\n' ',' | sed 's/,$//'
'abc','def','ghi','jkl','mno'
The last command can be shortened to avoid UUOC:
$ awk '{print "\x27" $1 "\x27"}' test.txt | tr '\n' ',' | sed 's/,$//'
'abc','def','ghi','jkl','mno'
Using sed alone:
sed -n "/./{s/^\|\$/'/g;H}; \${x;s/\n//;s/\n/,/gp};" test.txt
Edit: Fixed, it should also work with or without empty lines now.
$ cat file
abc
def
ghi
jkl
mno
$ cat file | tr '\n' ' ' | awk -v q="'" -v OFS="','" '$1=$1 { print q $0 q }'
'abc','def','ghi','jkl','mno'
Replace '\n' with ' ' -> (tr '\n' ' ')
Replace each separator (' ' space) with ',' (quote-comma-quote) -> (-v OFS="','")
Add quotes to the beginning and end of the line -> (print q $0 q)
This can be done pretty briefly with sed and paste:
<infile sed "s/^\|\$/'/g" | paste -sd,
Or more portably (I think, cannot test right now):
sed "s/^\|\$/'/g" infile | paste -s -d , -
$ sed "s/[^ ][^ ]*/'&',/g" input.txt | tr -d '\n'
'abc','def','ghi','jkl','mno',
To clean the last ,, throw in a
| sed 's/,$//'
awk 'seen == 1 { printf("'"','"'%s", $1);} seen == 0 {seen = 1; printf("'"'"'%s", $1);} END { printf("'"'"'\n"); }'
In slightly more readable format (suitable for awk -f):
# Print quote-terminator, separator, quote-start, thing
seen == 1 { printf("','%s", $1); }
# Set the "print separator" flag, print quote-start thing
seen == 0 { seen = 1; printf("'%s", $1); }
END { printf("'\n"); } # Print quote-end
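If those three rules are saved to a file (quote.awk is just a hypothetical name here), the script can be run as:
awk -f quote.awk test.txt
'abc','def','ghi','jkl','mno'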
perl -l54pe 's/.*/\x27$&\x27/' file

Ignore empty fields

Given this file
$ cat foo.txt
,,,,dog,,,,,111,,,,222,,,333,,,444,,,
,,,,cat,,,,,555,,,,666,,,777,,,888,,,
,,,,mouse,,,,,999,,,,122,,,133,,,144,,,
I can print the first field like so
$ awk -F, '{print $5}' foo.txt
dog
cat
mouse
However, I would like to ignore the empty fields so that I can call it like this:
$ awk -F, '{print $1}' foo.txt
You can use it like this:
$ awk -F',+' '{print $2}' file
dog
cat
mouse
Similarly, you can use $3, $4, $5 and so on. $1 cannot be used in this case because the records begin with the delimiter, so the first field is empty.
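To see why $1 cannot be used with this field separator, print the first two fields in brackets (a quick check on the same foo.txt):
$ awk -F',+' '{print "[" $1 "]", "[" $2 "]"}' foo.txt
[] [dog]
[] [cat]
[] [mouse]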
$ awk '{print $1}' FPAT='[^,]+' foo.txt
dog
cat
mouse
You can squeeze runs of the delimiter down to a single one with tr -s ',':
$ tr -s ',' < your_file
,dog,111,222,333,444,
,cat,555,666,777,888,
,mouse,999,122,133,144,
And then you can access dog, etc. with:
$ tr -s ',' < your_file | awk -F, '{print $2}'
dog
cat
mouse
perl -anF,+ -e 'print "$F[1]\n"' foo.txt
dog
cat
mouse
This is not awk, but you get to use index 1 instead of 2.
awk -F, '{gsub(/^,*|,*$/,"");gsub(/,+/,",");print $1}' your_file
tested below:
> cat temp
,,,,dog,,,,,111,,,,222,,,333,,,444,,,
,,,,cat,,,,,555,,,,666,,,777,,,888,,,
,,,,mouse,,,,,999,,,,122,,,133,,,144,,,
execution:
> awk -F, '{gsub(/^,*|,*$/,"");gsub(/,+/,",");print $1}' temp
dog
cat
mouse
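Another option, if you cannot rely on gawk-only features like FPAT, is to loop over the fields and print the first non-empty one; a minimal sketch in plain awk, using the same foo.txt:
awk -F, '{ for (i = 1; i <= NF; i++)               # scan the fields left to right
             if ($i != "") { print $i; break } }' foo.txt
dog
cat
mouse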
