Extract first word in colon separated text file - bash

How do I iterate through a file and print only the first word of each line? The lines are colon-separated. Example:
The file contains several lines. This is what I've done so far, but it doesn't work:
while read line; do
echo $1 | awk -F ':'
done < $FILE
I'm not good at Bash scripting at all, so this is probably trivial for one of you.
edit: variable k is to count the lines.

Use cut:
cut -d: -f1 filename
-d specifies the delimiter
-f specifies the field(s) to keep
If you need to count the lines, just
count=$( wc -l < filename )
-l tells wc to count lines
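As a rough sketch of the two commands side by side, wrapped in small helper functions: the function names and the sample file contents below are made up for illustration and are not part of the question.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; any colon-separated file works the same way.

first_fields() {
    # Print the first colon-separated field of every line in file $1.
    cut -d: -f1 "$1"
}

count_lines() {
    # Count the lines in file $1. Reading from stdin keeps the file
    # name out of wc's output.
    wc -l < "$1"
}

# Made-up sample data in a temporary file.
sample=$(mktemp)
printf '%s\n' 'alice:x:1000' 'bob:x:1001' > "$sample"

first_fields "$sample"
# The arithmetic expansion strips any padding some wc versions add.
echo "lines: $(( $(count_lines "$sample") ))"

rm -f "$sample"
```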

awk -F: '{print $1}' FILENAME
That will print the first word when separated by colon. Is this what you are looking for?
To use a loop, you can do something like this:
$ cat test.txt
while IFS=':' read -r line || [[ -n $line ]]; do
echo "$line" | awk -F: '{print $1}'
done < test.txt
Example of reading line by line in bash: Read a file line by line assigning the value to a variable
$ ./test.sh

A solution using perl
%> perl -F: -ane 'print "$F[0]\n";' [file(s)]
change the "\n" to " " if you don't want a new line printed.

You can get the first word without any external commands in bash like so:
printf '%s' "${line%%:*}"
This expands the variable named line with the longest suffix matching the glob :* removed. The %% makes the match greedy (a single % would remove the shortest match), so everything from the first colon to the end of the line is deleted.
With this solution you have to write the loop yourself, though. If extracting the first field is all you need, the cut solution is better, because cut iterates over the file for you.
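For completeness, here is a minimal sketch of what that loop could look like. The function name is hypothetical, the input is made up, and the counter k is included only because the question mentions one.

```shell
#!/usr/bin/env bash
# Print the first colon-separated field of each line read from stdin,
# then report how many lines were seen.
print_first_fields() {
    local line k=0
    # The "|| [[ -n $line ]]" part also handles a last line that has
    # no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        printf '%s\n' "${line%%:*}"   # drop everything from the first colon on
        k=$((k + 1))
    done
    printf 'lines: %d\n' "$k"
}

# Made-up example input.
printf '%s\n' 'root:x:0:0' 'daemon:x:1:1' | print_first_fields
```

A line without any colon is printed unchanged, because the pattern then matches nothing.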


Evaluating a log file using a sh script

I have a log file with a lot of lines with the following format:
IP - - [Timestamp Zone] 'Command Weblink Format' - size
I want to write a script.sh that gives me the number of times each website has been clicked.
The command:
awk '{print $7}' server.log | sort -u
should give me a list which puts each unique weblink in a separate line. The command
grep 'Weblink1' server.log | wc -l
should give me the number of times the Weblink1 has been clicked. I want a command that converts each line created by the Awk command above to a variable and then create a loop that runs the grep command on the extracted weblink. I could use
while IFS='' read -r line || [[ -n "$line" ]]; do
echo "Text read from file: $line"
(source: Read a file line by line assigning the value to a variable) but I don't want to save the output of the Awk script in a .txt file.
My guess would be:
while IFS='' read -r line || [[ -n "$line" ]]; do
grep '$line' server.log | wc -l | ='$variabel' |
echo " $line was clicked $variable times "
But I'm not really familiar with connecting commands in a loop, as this is my first time. Would this loop work and how do I connect my loop and the Awk script?
Shell commands in a loop connect the same way they do without a loop, and you aren't very close. But yes, this can be done in a loop if you want the horribly inefficient way for some reason such as a learning experience:
awk '{print $7}' server.log |
sort -u |
while IFS= read -r line; do
n=$(grep -c "$line" server.log)
echo "$line" clicked $n times
done
# you only need the read || [ -n ] idiom if the input can end with an
# unterminated partial line (is illformed); awk print output can't.
# you don't really need the IFS= and -r because the data here is URLs
# which cannot contain whitespace and shouldn't contain backslash,
# but I left them in as good-habit-forming.
# in general variable expansions should be doublequoted
# to prevent wordsplitting and/or globbing, although in this case
# $line is a URL which cannot contain whitespace and practically
# cannot be a glob. $n is a number and definitely safe.
# grep -c does the count so you don't need wc -l
or more simply
awk '{print $7}' server.log |
sort -u |
while IFS= read -r line; do
echo "$line" clicked $(grep -c "$line" server.log) times
done
However if you just want the correct results, it is much more efficient and somewhat simpler to do it in one pass in awk:
awk '{n[$7]++}
END{for(i in n){
print i,"clicked",n[i],"times"}}' server.log |
sort
# or GNU awk 4+ can do the sort itself, see the doc:
awk '{n[$7]++}
END{PROCINFO["sorted_in"]="@ind_str_asc"
for(i in n){
print i,"clicked",n[i],"times"}}' server.log
The associative array n collects the values from the seventh field as keys, and on each line, the value for the extracted key is incremented. Thus, at the end, the keys in n are all the URLs in the file, and the value for each is the number of times it occurred.
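To see the counting in action without a real log, here is a small sketch that feeds a few invented log lines through the same one-pass awk idea. The function name and the sample lines are assumptions for illustration; only the field position ($7) comes from the question.

```shell
#!/usr/bin/env bash
# Count how often each value of field 7 occurs in the log read from stdin.
count_clicks() {
    awk '{ n[$7]++ }
         END { for (url in n) print url, "clicked", n[url], "times" }' |
        sort   # awk's "for (url in n)" order is unspecified, so sort the output
}

# Made-up log lines in the format described in the question.
printf '%s\n' \
    '10.0.0.1 - - [01/Jan/2020:10:00:00 +0000] "GET /index.html HTTP/1.1" - 512' \
    '10.0.0.2 - - [01/Jan/2020:10:00:05 +0000] "GET /about.html HTTP/1.1" - 128' \
    '10.0.0.1 - - [01/Jan/2020:10:00:09 +0000] "GET /index.html HTTP/1.1" - 512' |
    count_clicks
```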

Remove everything in a pipe delimited file after second-to-last pipe

How can I remove everything in a pipe-delimited file after the second-to-last pipe? Like for the line
the result should be
Replace |(string without pipe)|(string without pipe) at the end of each line:
sed 's/|[^|]*|[^|]*$//' inputfile
Using awk, something like
awk -F'|' 'BEGIN{OFS="|"}{NF=NF-2; print}' inputfile
(or) use cut if you know the total number of columns, e.g. to go from 6 columns to 4:
cut -d'|' -f -4 inputfile
The command I would use is
sed -r 's/(.*)\|.*\|.*/\1/' input.txt > output.txt
A pure Bash solution:
while IFS= read -r line || [[ -n $line ]] ; do
printf '%s\n' "${line%|*|*}"
done <inputfile
See Reading input files by line using read command in shell scripting skips last line (particularly the answer by Jahid) for details of how the while loop works.
See pattern matching in Bash for information about ${line%|*|*}.
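A quick way to convince yourself what ${line%|*|*} does is to try it on a single string. This is only a sketch; the function name and the sample value are made up.

```shell
#!/usr/bin/env bash
# Remove the last two pipe-delimited fields from the string in $1.
# A single % removes the SHORTEST suffix matching the pattern |*|*,
# which is everything from the second-to-last pipe onward.
strip_last_two() {
    printf '%s\n' "${1%|*|*}"
}

strip_last_two 'a|b|c|d|e'    # made-up sample line
```

With %% instead of % the longest matching suffix would be removed, leaving only the first field. A string with fewer than two pipes does not match the pattern and is printed unchanged.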

shell script : remove first column from txt files

Hi everyone,
I would like to remove the first column from a lot of .txt files stored in a folder.
So far I've tried this :
# loop on all .txt files
for i in $(ls *.txt); do
# remove first column
cut -d' ' -f2- < $i
# remove temporary file
rm $i.bak
This only prints the result of the cut in the shell window, but it doesn't modify the files. I'm missing something really easy here, but I can't figure out where I should indicate that I want to write the result of the cut.
#!/usr/bin/env bash
set -eu # stop on error
# loop on all .txt files
for i in *.txt; do
# remove first column
cut -d' ' -f2- < "$i" > "$i.new"
# replace old file
mv "$i.new" "$i"
done
Redirect STDOUT to $i.bak:
cut -d' ' -f2- < "$i" > "$i.bak"
mv "$i.bak" "$i"
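If you want this redirect-then-rename pattern for many files, one possible sketch is a small function like the one below. The function name, the use of mktemp, and the sample file are my own assumptions, not something from the question.

```shell
#!/usr/bin/env bash
# Remove the first space-separated column from each file given as an argument.
# The result goes to a temporary file first and only replaces the original
# if cut succeeded.
drop_first_column() {
    local f tmp
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        if cut -d' ' -f2- < "$f" > "$tmp"; then
            mv -- "$tmp" "$f"
        else
            rm -f -- "$tmp"
            return 1
        fi
    done
}

# Demonstration in a throwaway directory with made-up content.
dir=$(mktemp -d)
printf '%s\n' '1 foo bar' '2 baz qux' > "$dir/example.txt"
drop_first_column "$dir"/*.txt
cat "$dir/example.txt"
rm -rf -- "$dir"
```

Note that the replaced file takes on the permissions of the temporary file, so restore them afterwards if that matters for your files.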
Here is the awk approach to printing everything but the first column:
awk '{$1=""; print $0}'
Note that this only empties the first field, so each output line still starts with a separator. You can set the field separator with FS=; it defaults to whitespace.
Use loop control as per normal, e.g. here's how to remove the UID and GID columns from a collection of passwd files (stored as passwd-hostid_number i.e. passwd-01 ... passwd-99):
for pwdfile in passwd-[0-9][0-9]; do
awk 'BEGIN{FS=OFS=":"} {$3=""; $4=""; print $0}' "$pwdfile" > "$pwdfile-no-uidgid"
done
I would recommend to edit your files in place using sed:
sed -i -e 's/^[^ ]* //' *.txt
This removes the leading run of non-space characters on each line, together with the first space that follows it.
Open the file in the vi editor and, in command mode (press Esc), type
:%! awk '{$1=""; print $0}'
and press enter and save.
