Split numbers and store them in different files using a Unix shell script - bash

I have a file called "list.txt" which contains the following rows of numbers.
31056780
31909020
31092320
61093190
61094592
45090280
45902902
I now need to take all the rows starting with "31" and store them in another file called file31.txt, take all the rows starting with "61" and store them in file61.txt, and take all the rows starting with "45" and store them in file45.txt.
file31.txt will contain:
31056780
31909020
31092320
file61.txt will contain:
61093190
61094592
file45.txt will contain:
45090280
45902902
I tried these commands for all 3 prefixes, but they do not do what I want:
awk -F\" '/31*/ {print $0}' list.txt > file31
awk -F\" '/61*/ {print $0}' list.txt > file61
awk -F\" '/45*/ {print $0}' list.txt > file45

The problem with your attempts: -F\" merely sets the field separator to a double quote (irrelevant here), and a pattern like /31*/ matches a 3 followed by zero or more 1s anywhere in the line, so 61093190 matches it too. Anchoring with ^ fixes that, but you can also use output redirection inside a single awk script, which constructs the filename by concatenating the first two characters of each line:
awk '{ fn = "file" substr($0, 1, 2) ".txt"; print > fn }' list.txt
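One caveat: with many distinct prefixes, some awk implementations limit the number of files open at once. A variant that closes each file after every write avoids that (slower, but portable; note it appends, so remove old output files first):
awk '{ fn = "file" substr($0, 1, 2) ".txt"; print >> fn; close(fn) }' list.txt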

You could use grep or sed to filter the lines with a matching pattern, for example:
sed '/^31/!d' list.txt > file31.txt
Or in a for loop for every number you want:
for n in "31" "45" "61"; do sed '/^'"$n"'/!d' list.txt > list$n.txt; done
Hope it helps.

You can use:
awk '/^31/{print > "file31.txt"} /^45/{print > "file45.txt"} /^61/{print > "file61.txt"}' list.txt

for i in $(cut -c1-2 list.txt | sort -u); do grep "^${i}" list.txt > "file${i}.txt"; done
This command works and is generic: it derives the prefixes from the data itself, so it handles any two-digit prefix, not just 31, 45 and 61.
Now let's understand how it works.
cut -c1-2 list.txt | sort -u
31
45
61
Next we loop over these unique prefixes and create the new files using
grep "^${i}" list.txt
The ^ anchor means the prefix must appear at the beginning of the line, so for example ^31 does not match 61093190 even though that line contains 31 in the middle. (cut -c1-2 extracts the first two characters of each line, and sort -u sorts them and removes duplicates; plain uniq would only collapse adjacent duplicates, which is why the sort matters.)

Related

Extracting unique values between 2 files with awk

I need to get the unique lines when comparing 2 files. The files contain the field separator ":", which should be treated as the end of line when comparing strings (only the part before ":" is compared).
The file1 contains these lines
apple:tasty
apple:red
orange:nice
kiwi:awesome
kiwi:expensive
banana:big
grape:green
orange:oval
banana:long
The file2 contains these lines
orange:nice
banana:long
The output file should be (2 occurrences of orange and 2 occurrences of banana deleted)
apple:tasty
apple:red
kiwi:awesome
kiwi:expensive
grape:green
So only the strings before : should be compared.
Is it possible to complete this task in 1 command?
I tried to complete the task this way, but the field separator does not work in that situation:
awk -F: 'FNR==NR {a[$0]++; next} !a[$0]' file1 file2 > outputfile
You basically had it, but $0 refers to the whole line when you want to deal with only the first field, which is $1.
Also you need to take care with the order of the input files. To use the values from file2 for deciding which lines to include from file1, process file2 first:
$ awk -F: 'FNR==NR {a[$1]++; next} !a[$1]' file2 file1
apple:tasty
apple:red
kiwi:awesome
kiwi:expensive
grape:green
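For clarity, here is the same one-liner laid out with comments (identical logic):
awk -F: '
    FNR==NR { a[$1]++; next }   # first file on the command line (file2): remember its keys
    !a[$1]                      # second file (file1): print lines whose key was never seen
' file2 file1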
One comment: awk keeps the whole array in memory, which can be inefficient for really big files. In that case, better use something like:
comm -3 <(cut -d : -f 1 f1 | sort -u) <(cut -d : -f 1 f2 | sort -u) | grep -h -f /dev/stdin f1 f2

Count the number of special characters in Unix shell

I have a delimited file that is separated by octal \036, or hexadecimal value 1e.
I need to count the number of delimiters on each line using a bash shell script.
I was trying to use awk, not sure if this is the best way.
Sample Input (| is a representation of \036)
Example|Running|123|
Expected output:
3
awk -F'|' '{print NF-1}' file
Change | to whatever separator you like. If your file can have empty lines then you need to tweak it to:
awk -F'|' '{print (NF ? NF-1 : 0)}' file
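The same guard works with the real \036 delimiter instead of the | stand-in, assuming your awk interprets escape sequences in -F (as the -F '\036' answer further down shows):
awk -F'\036' '{print (NF ? NF-1 : 0)}' file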
You can try
awk '{print gsub(/\|/,"")}'
Simply try
awk -F"|" '{print substr($3,length($3))}' OFS="|" Input_file
Explanation: the field separator -F is set to |, so $3 is the third field; substr($3,length($3)) then prints the last character of that field. OFS (the output field separator) is set to |, and Input_file is the input file name. (Note that this matches the expected output only for this particular sample line; it is not a general delimiter count.)
This will work as far as I know
echo "Example|Running|123|" | tr -cd '|' | wc -c
Output
3
This should work for you:
awk -F '\036' '{print NF-1}' file
3
-F '\036' sets the input field delimiter to the octal value 036
Awk may not be the best tool for this. GNU grep has a cool -o option that prints each matching pattern on a separate line. You can then count how many matching lines are generated for each input line, and that's the count of your delimiters. E.g. (where ^^ in the file is actually hex 1e):
$ cat -v i
a^^b^^c
d^^e^^f^^g
$ grep -n -o $'\x1e' i | uniq -c
2 1:
3 2:
If you remove the uniq -c you can see how it's working. You'll get "1" printed twice because there are two matching patterns on the first line. Or try it with some regular ascii characters and it becomes clearer what the -o and -n options are doing.
If you want to print the line number followed by the field count for that line, I'd do something like:
$ grep -n -o $'\x1e' i | tr -d ':' | uniq -c | awk '{print $2 " " $1}'
1 2
2 3
This assumes that every line in the file contains at least one delimiter. If that's not the case, here's another approach that's probably faster too (shown here on a version of the file with three extra lines that contain no delimiters):
$ tr -d -c $'\x1e\n' < i | awk '{print length}'
2
3
0
0
0
This uses tr to delete (-d) all characters that are not (-c) 1e or \n. It then pipes that stream of data to awk which just counts how many characters are left on each line. If you want the line number, add " | cat -n" to the end.
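For example, the line-number variant suggested above would be (same sample file i):
tr -d -c $'\x1e\n' < i | awk '{print length}' | cat -n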

Find unique words

Suppose there is a file.txt in which the text below is written:
ABC/xyz
ABC/xyz/rst
EFG/ghi
I need to write a shell script that can extract the unique words before the first /.
So as output, I want ABC and EFG to be written in one file.
You can extract the first word with cut (slash as delimiter), then pipe to sort with the -u (for "unique") option:
$ cut -d '/' -f 1 file.txt | sort -u
ABC
EFG
To get the output into a file, just redirect by appending > filename to the command. (Or pipe to tee filename to see the output and get it in a file.)
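For instance, to see the result and write it at the same time (outfile.txt is just an example name):
cut -d '/' -f 1 file.txt | sort -u | tee outfile.txt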
Try this:
cat file.txt | tr -s "/" ' ' | awk -F " " '{print $1}' | sort | uniq > outfile.txt
Another interesting variation:
awk -F'/' '{print $1 |" sort -u" }' file.txt > outfile.txt
Not that it matters here, but being able to pipe and redirect within awk can be very handy.
Another easy way:
cut -d"/" -f1 file.txt | uniq > out.txt
(Note that uniq only collapses adjacent duplicates, so this relies on identical prefixes being grouped together, as they are in the sample.)
You can use a mix of cut and sort like so:
cut -d '/' -f 1 file.txt | sort -u > newfile.txt
cut grabs everything before the first slash / on each line; sort orders the result and, with -u, removes any duplicate strings you might have. The redirection sends the output into newfile.txt.

How to get a word from a text file in Bash

I want to get only one word from this txt file: http://pastebin.com/jFDu0Le5 . The word is in the last row: WER: 45.67% Correct: 65.87% Acc: 54.33%
I want to get only the value 45.67 and save it to the file value.txt. I want to create a Bash script to get this value. Can you give me an example of how to do it? I am new to Bash and I need it for school. The whole .txt file is saved on my server as file.txt.
Try this:
grep WER file.txt | awk '{print $2}' | uniq | sed -e 's/%//' > value.txt
Note that this will overwrite value.txt each time you run the command.
You want grep "WER:" file.txt | cut -???
I have ??? because I do not know the structure of the file. Tab delimited? Fixed width?
Do man cut and you can get the arguments you need.
There are many ways and tools to do the task:
sed
tac file.txt | sed -n '/^WER: /{s///;s/%.*//;p;q}' > value.txt
awk
tac file.txt | awk -F'[ %]' '/^WER:/{print $2;exit}' > value.txt
bash
while read a b c
do
    if [ "$a" = "WER:" ]
    then
        b=${b%\%*}
        echo "${b#* }"
        break
    fi
done < <(tac file.txt) > value.txt
If the format is as you said, then this also works
awk -F'[: %]' '/^WER/{print $3}' file.txt > value.txt
Explanation
-F specifies the field separator as one of [: %]
/<PATTERN>/ {<ACTION>} refers to: if a line matches some PATTERN, then do some ACTION
in my case,
the PATTERN is: the line starts with (^) the string WER
the ACTION is: print field $3 (as split by the -F field separators)
> sends the output to value.txt
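A quick way to check the field splitting is to feed the sample line straight in:
echo 'WER: 45.67% Correct: 65.87% Acc: 54.33%' | awk -F'[: %]' '/^WER/{print $3}'
45.67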

Explode to Array

I put together this shell script to do two things:
Change the delimiters in a data file ('::' to ',' in this case)
Select the columns I want and append them to a new file
It works, but I want a better way to do this. I specifically want to find an alternative method for exploding each line into an array; using the positional parameters via set -- doesn't seem like the way to go. Any comments are welcome.
# Takes a :: separated file as the 1st parameter
SOURCE=$1
# create csv target file
TARGET=${SOURCE/dat/csv}
touch "$TARGET"
echo "#userId,itemId" > "$TARGET"
IFS=","
while read LINE
do
    # Replace all matches of :: with a ,
    CSV_LINE=${LINE//::/,}
    set -- $CSV_LINE
    echo "$1,$2" >> "$TARGET"
done < "$SOURCE"
Instead of set, you can use an array:
arr=($CSV_LINE)
echo "${arr[0]},${arr[1]}"
The following would print columns 1 and 2 from infile.dat. Replace $1, $2 with a comma-separated list of the numbered columns you do want.
awk 'BEGIN { FS = "::"; OFS = "," } { print $1, $2 }' infile.dat > infile.csv
Perl probably has a one-liner to do it.
Awk can probably do it easily too.
My first reaction is a combination of awk and sed:
Sed to convert the delimiters
Awk to process specific columns
cat inputfile | sed -e 's/::/,/g' | awk -F, '{print $1, $2}'
# Or to avoid a UUOC award (and prolong the life of your keyboard by 3 characters)
sed -e 's/::/,/g' inputfile | awk -F, '{print $1, $2}'
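If the final output itself has to be comma separated, set OFS as well, e.g. (outfile.csv as an example target):
sed -e 's/::/,/g' inputfile | awk -F, -v OFS=, '{print $1, $2}' > outfile.csv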
awk is indeed the right tool for the job here, it's a simple one-liner.
$ cat test.in
a::b::c
d::e::f
g::h::i
$ awk -F:: -v OFS=, '{$1=$1;print;print $2,$3 >> "altfile"}' test.in
a,b,c
d,e,f
g,h,i
$ cat altfile
b,c
e,f
h,i
$
The $1=$1 assignment looks like a no-op, but it forces awk to rebuild the record with the new OFS, which is what turns :: into , in the lines printed to standard output.
