printing first word in every line of a txt file unix bash

So I'm trying to print the first word in each line of a txt file. The words are separated by one blank.
cut -c 1 txt file
That's the code I have so far, but it only prints the first character of each line.
Thanks

To print a whole word, you want -f 1, not -c 1. And since the default field delimiter is TAB rather than SPACE, you need to use the -d option.
cut -d' ' -f1 filename
Printing the last two words is not possible with cut, AFAIK, because it can only count fields from the beginning of the line. Use awk instead:
awk '{print $(NF-1), $NF;}' filename
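A quick way to check both commands is to run them on a small sample. This is only a sketch: the temporary file and its contents are made up for illustration, and it assumes words separated by exactly one space, as in the question.

```shell
# Hypothetical sample data: words separated by exactly one space.
sample=$(mktemp)
printf '%s\n' 'alpha beta gamma' 'one two three four' > "$sample"

# First word of every line: field 1, with a space as the delimiter.
first_words=$(cut -d' ' -f1 "$sample")

# Last two words of every line: awk can count fields from the end via NF.
last_two=$(awk '{print $(NF-1), $NF}' "$sample")

printf '%s\n' "$first_words" "$last_two"
rm -f "$sample"
```

The cut call prints alpha and one; the awk call prints "beta gamma" and "three four".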

you can try
awk '{print $1}' your_file

read word _ < file
echo "$word"
What's nice about this solution is it doesn't read beyond the first line of the file. Even awk, which has some very clean, terse syntax, has to be explicitly told to stop reading past the first line. read just reads one line at a time. Plus it's a bash builtin (and a builtin in many shells), so you don't need a new process to run.
If you want to print the first word in each line:
while read word _; do printf '%s\n' "$word"; done < file
But if the file is large then awk or cut will win out for reading every line.
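To see how read splits a line, here is a small sketch on made-up input (the temporary file and its contents are assumptions). It adds -r so backslashes are taken literally, and it shows that with the default IFS the leading whitespace is dropped before the first word is assigned.

```shell
# Hypothetical input: the second line starts with spaces.
input=$(mktemp)
printf '%s\n' 'foo bar baz' '   indented line here' > "$input"

# First line only: read consumes a single line and stops.
read -r first _ < "$input"

# Every line: word gets the first field, _ swallows the remainder.
all_first=$(while read -r word _; do printf '%s\n' "$word"; done < "$input")

printf '%s\n' "$first" "$all_first"
rm -f "$input"
```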

You can use:
cut -d\  -f1 file
Where:
-d is the delimiter (here a space, escaped with a backslash)
-f is the field selector
Notice that there are two spaces after the \: the first is the escaped delimiter, the second separates the two options.

-c is for characters, you want -f for fields, and -d to indicate your separator of space instead of the default tab:
cut -d " " -f 1 file

Related

Duplicate first column of multiple text files in bash

I have multiple text files each containing two columns and I would like to duplicate the first column in each file in bash to have three columns in the end.
File:
sP100227 1
sP100267 1
sP100291 1
sP100493 1
Output file:
sP100227 sP100227 1
sP100267 sP100267 1
sP100291 sP100291 1
sP100493 sP100493 1
I tried:
txt=path/to/*.txt
echo "$(paste <(cut -f1-2 $txt) > "$txt"
Could you please try the following? Written and tested with the shown samples in GNU awk. This adds the extra field only to lines that have exactly 2 fields.
awk 'NF==2{$1=$1 OFS $1} 1' Input_file
If you don't care about the number of fields and simply want the value of the 1st field twice, try the following.
awk '{$1=$1 OFS $1} 1' Input_file
Or, if you only have 2 fields in your Input_file, there is no need to rewrite the whole line; you can simply print the fields as follows.
awk '{print $1,$1,$2}' Input_file
To save the output back into Input_file itself, append > temp && mv temp Input_file to the above solutions (after testing).
Use a temp file, with cut -f1 and paste, like so:
paste <(cut -f1 in_file) in_file > tmp_file
mv tmp_file in_file
Alternatively, use a Perl one-liner, like so:
perl -i.bak -lane 'print join "\t", $F[0], $_;' in_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
The default delimiter in cut and paste is TAB, but your file looks to be space-separated.
You can't use the same file as input and output redirection, because when the shell opens the file for output it truncates it, so there's nothing for the program to read. Write to a new file and then rename it.
Your paste command is only being given one input file. And there's no need to use echo.
paste -d' ' <(cut -d' ' -f1 "$txt") "$txt" > "$txt.new" && mv "$txt.new" "$txt"
You can do this more easily using awk.
awk '{print $1, $0}' "$txt" > "$txt.new" && mv "$txt.new" "$txt"
GNU awk has an in-place extension, so you can use that if you like. See Save modifications in place with awk
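Since the question is about several files, the write-then-rename step can be wrapped in a loop. A minimal sketch, assuming space-separated files; the temporary directory and file names below are hypothetical stand-ins for path/to/*.txt.

```shell
# Hypothetical stand-in for path/to/*.txt.
dir=$(mktemp -d)
printf '%s\n' 'sP100227 1' 'sP100267 1' > "$dir/a.txt"
printf '%s\n' 'sP100291 1' > "$dir/b.txt"

for f in "$dir"/*.txt; do
    # Write to a new file first: redirecting straight to "$f" would
    # truncate it before awk has read anything.
    awk '{print $1, $0}' "$f" > "$f.new" && mv "$f.new" "$f"
done

result_a=$(cat "$dir/a.txt")
result_b=$(cat "$dir/b.txt")
printf '%s\n' "$result_a" "$result_b"
rm -rf "$dir"
```

Using && means the original is only replaced when awk succeeded.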
Try sed -Ei 's/\s*(\S+)\s+/\1 \1 /1' $txt if your fields are separated by runs of one or more whitespace characters. This uses the stream editor (sed) to replace (s///1) the first string of non-space characters (\S+) followed by a string of whitespace characters (\s+) with the same string repeated, separated by a space (\1 \1 ). The rest of the line is kept. The -E option tells sed to use extended regular expressions (+, and ( rather than \(). The -i option edits the file in place, replacing it with the output.
You could use awk and do awk '{ printf "%s %s\n",$1,$0 }'. This takes the first whitespace-delimited field ($1) and follows it with a space and the whole line ($0) followed by a newline. This is a little clearer than sed but it doesn't have the advantage of being in-place.
If you can guarantee they are delimited by only one space, with no leading spaces, you can use paste -d' ' <(cut -d' ' -f1 ${txt}) ${txt} > ${txt}.new; mv ${txt}.new ${txt}. The -d' ' sets the delimiter to space for both cut and paste. You know this already, but for other readers: -f1 means extract the first -d-delimited field. The mv command replaces the input with the output.

how to grep everything between single quotes?

I am having trouble figuring out how to grep the characters between two single quotes.
I have this in a file
version: '8.x-1.0-alpha1'
and I'd like to have the output like this (the version numbers can vary):
8.x-1.0-alpha1
I wrote the following but it does not work:
cat myfile.txt | grep -e 'version' | sed 's/.*\?'\(.*?\)'.*//g'
Thank you for your help.
Addition:
I used the sed command sed -n "s#version:\s*'\(.*\)'#\1#p"
I also like to remove 8.x- which I edited to sed -n "s#version:\s*'8.x-\(.*\)'#\1#p".
This command only works on Linux; it does not work on a Mac. How can I change it so that it works on a Mac?
sed -n "s#version:\s*'8.x-\(.*\)'#\1#p"
If you just want that information from the file, and only that, you can quickly do:
awk -F"'" '/version/{print $2}' file
Example:
$ echo "version: '8.x-1.0-alpha1'" | awk -F"'" '/version/{print $2}'
8.x-1.0-alpha1
How does this work?
An awk program is a series of pattern-action pairs, written as:
condition { action }
condition { action }
...
where condition is typically an expression and action a series of commands.
-F"'": Here we tell awk to set the field separator FS to a single quote ('). This means every line is split into fields $1, $2, ..., $NF, with a ' between each pair of adjacent fields. We can then reference these fields as $1 for the first field, $2 for the second, and so on up to $NF, where NF is the total number of fields in the line.
/version/{print $2}: This is the condition-action pair.
condition: /version/: The condition reads: if a substring of the current record/line matches the regular expression /version/, then do the action. Here this simply means: if the current line contains the substring version.
action: {print $2}: If the condition is satisfied, print the second field. In this case, the second field is what the OP requests.
There are now several things that can be done.
Improve the condition to /^version:/ && NF==3, which reads: if the current line starts with the substring version: and has 3 fields, then do the action.
If you only want the first occurrence, you can tell awk to exit immediately after the match by changing the action to {print $2; exit}.
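The effect of tightening the condition can be seen on a small made-up file. This is a sketch only: the file contents are hypothetical, and the anchored pattern is written without a space before the colon so that it matches the sample line from the question.

```shell
# Hypothetical file: a noise line that mentions "version", then two real entries.
info=$(mktemp)
printf '%s\n' '# version history below' \
    "version: '8.x-1.0-alpha1'" \
    "version: '8.x-2.0'" > "$info"

# Loose condition: any line containing "version". The comment line has no
# quotes, so its $2 is empty and an empty line is printed for it.
loose=$(awk -F"'" '/version/{print $2}' "$info")

# Stricter condition plus exit: anchored match, exactly 3 fields, first hit only.
strict=$(awk -F"'" '/^version:/ && NF==3 {print $2; exit}' "$info")

printf 'loose: [%s]\nstrict: [%s]\n' "$loose" "$strict"
rm -f "$info"
```

NF is 3 for a well-formed line because the text after the closing quote counts as a third (empty) field.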
I'd use GNU grep with pcre regexes:
grep -oP "version: '\\K.*(?=')" file
where we are looking for "version: '" and then the \K directive will forget what it just saw, leaving .*(?=') to match up to the last single quote.
Try something like this: sed -n "s#version:\s*'\(.*\)'#\1#p" myfile.txt. This avoids the redundant cat and grep by finding the "version" line and extracting the contents between the single quotes.
Explanation:
the -n flag tells sed not to print lines automatically. We then use the p command at the end of our sed pattern to explicitly print when we've found the version line.
Search for pattern: version:\s*'\(.*\)'
version:\s* Match "version:" followed by any amount of whitespace
'\(.*\)' Match a single ', then capture everything up to the last ' on the line (.* is greedy)
Replace with: \1; This is the first (and only) capture group above, containing contents between single quotes.
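Regarding the Mac follow-up in the question: \s is a GNU extension, so a sed that sticks to POSIX regular expressions may not accept it. Assuming that is the cause of the failure, a sketch of the same command using the POSIX character class [[:space:]] instead:

```shell
# Sample line from the question.
line="version: '8.x-1.0-alpha1'"

# [[:space:]] is the POSIX spelling of the GNU-only \s.
full=$(printf '%s\n' "$line" | sed -n "s#version:[[:space:]]*'\(.*\)'#\1#p")

# Same idea, also dropping a literal "8.x-" prefix (the dot is escaped
# so that it only matches a real dot).
short=$(printf '%s\n' "$line" | sed -n "s#version:[[:space:]]*'8\.x-\(.*\)'#\1#p")

printf '%s\n' "$full" "$short"
```

This prints 8.x-1.0-alpha1 followed by 1.0-alpha1.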
When you only want to look at the quotes, you can use cut.
grep -e 'version' myfile.txt | cut -d "'" -f2
grep can almost do this alone:
grep -o "'.*'" file.txt
But this may also print things you don't want: it prints a match from every line that has 2 single quotes (') in it. And the output still has the single quotes (') around it:
'8.x-1.0-alpha1'
But sed alone can do it properly:
sed -rn "s/^version: +'([^']+)'.*/\1/p" file.txt

shell: prefixing output with spaces with paste

Often one needs to prefix some shell output with 4 spaces to turn it into a valid markdown code block, e.g. when posting a question or answer here on Stack Overflow.
It's actually quite easy to do with sed:
some_command | sed -e 's/^/    /'
But I'd like to do it with paste if possible. Because paste takes 2 files as input, all I came up with was this:
some_command | paste 4_space_file -
where 4_space_file is actually a file whose whole content was 4 spaces.
Is there a neater way to achieve this with paste without having an actual file on the hard drive?
Literal Answers Using Paste
First, to answer your literal question:
some_command | paste <(printf '    \n') -
...yields the same output as passing paste the name of a file with a single line having four spaces and a newline as its content. However, the output from paste in this case is not four-character indents for each line; the first line has four spaces and a tab prepended, subsequent lines are prefixed with only a tab.
If you wanted to generate an input of the appropriate length while still using paste, then you'd end up with something uglier. Say (with bash 4.0 or newer):
ls | {
  mapfile -t lines  # read output from ls into an array
  # our answer, here, is to move to three spaces in the input, and use paste -d' ' to
  # ...add a fourth space during processing.
  paste -d' ' \
    <(yes '   ' | head -n "${#lines[@]}") \
    <(printf '%s\n' "${lines[@]}")
}
<() is process substitution syntax, which expands to a filename which, when read from, will yield the output from the code contained.
Better Answers
For a native bash approach, you might also consider defining a function:
indent4() { while IFS= read -r line; do printf '    %s\n' "$line"; done; }
...for later use:
some_command | indent4
Unlike paste, this actually inserts exactly four spaces (with no intervening tab) on every line, for the exact number of lines in your input (no need to synthesize the correct length).
Also consider awk:
awk '{ print "    " $0; }'
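To check that the function and the awk variant agree, here is a small sketch on made-up input. The name indent4_awk is a hypothetical wrapper added only for the comparison.

```shell
# Prefix every input line with exactly four spaces, using only shell builtins.
indent4() {
    while IFS= read -r line; do
        printf '    %s\n' "$line"
    done
}

# The same transformation with awk (hypothetical wrapper name).
indent4_awk() {
    awk '{ print "    " $0 }'
}

out_shell=$(printf '%s\n' 'first' '  nested' | indent4)
out_awk=$(printf '%s\n' 'first' '  nested' | indent4_awk)
printf '%s\n' "$out_shell"
```

IFS= together with read -r keeps existing indentation and backslashes intact, so already-indented lines simply gain four more spaces.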

Extract first word in colon separated text file

How do I iterate through a file and print only the first word? Each line is colon-separated. Example:
root:01:02:toor
The file contains several lines. This is what I've done so far, but it doesn't work.
FILE=$1
k=1
while read line; do
echo $1 | awk -F ':'
((k++))
done < $FILE
I'm not good with bash scripting at all, so this is probably very trivial for one of you.
edit: variable k is to count the lines.
Use cut:
cut -d: -f1 filename
-d specifies the delimiter
-f specifies the field(s) to keep
If you need to count the lines, just
count=$( wc -l < filename )
-l tells wc to count lines
awk -F: '{print $1}' FILENAME
That will print the first word when separated by colon. Is this what you are looking for?
To use a loop, you can do something like this:
$ cat test.txt
root:hello:1
user:bye:2
test.sh
#!/bin/bash
while IFS=':' read -r line || [[ -n $line ]]; do
echo "$line" | awk -F: '{print $1}'
done < test.txt
Example of reading line by line in bash: Read a file line by line assigning the value to a variable
Result:
$ ./test.sh
root
user
A solution using perl
%> perl -F: -ane 'print "$F[0]\n";' [file(s)]
change the "\n" to " " if you don't want a new line printed.
You can get the first word without any external commands in bash like so:
printf '%s' "${line%%:*}"
This expands the variable named line and removes the longest suffix matching the glob :* (that's what %% does, as opposed to a single %, which removes the shortest), so everything from the first colon onward is deleted.
With this solution you do need to write the loop yourself, though. If this is the only thing you want to do with the variable, the cut solution is better, because you don't have to iterate over the file yourself.
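Put together with a line counter like the k in the question, the loop could look like the sketch below. The temporary file and its second line are made up for illustration; names and k are just illustrative variables.

```shell
# Hypothetical colon-separated input, modelled on the question's example.
file=$(mktemp)
printf '%s\n' 'root:01:02:toor' 'user:03:04:resu' > "$file"

k=0
names=''
# The || test also handles a last line that lacks a trailing newline.
while IFS= read -r line || [ -n "$line" ]; do
    # %%:* removes the longest suffix starting at a colon,
    # i.e. everything from the first colon onward.
    names="${names:+$names }${line%%:*}"
    k=$((k + 1))
done < "$file"

printf 'names: %s\nlines: %d\n' "$names" "$k"
rm -f "$file"
```

Because the file is redirected into the loop rather than piped, the loop runs in the current shell and names and k are still set afterwards.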

Delete first characters off of a line in a file with awk or grep

I'm attempting to remove a certain pattern from a line, but not the entire line itself. An example would be:
Original:
user=dannyBoy
Desired:
dannyBoy
I have a file that is full of lines like that, so I was wondering how I would be able to cut a specific part of the text off, whether that be just removing the first five characters from the list or searching for the pattern "user=" and removing it.
There are many ways to do this:
cut -d'=' -f2- file
sed 's/^[^=]*=//' file
awk -F= '{print $2}' file #if just one = is present
cut sets a delimiter (-d'=') and then prints all the fields starting from the 2nd one (-f2-).
sed matches everything from the beginning of the line up to and including the first = and removes it.
awk sets = as field separator and prints the second field.
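The three commands only differ when the value itself contains an =. A short sketch with a made-up line shows it; note that the sed expression here includes the = in the match so that the delimiter itself is removed too.

```shell
# Hypothetical line whose value itself contains an "=".
line='user=danny=Boy'

with_cut=$(printf '%s\n' "$line" | cut -d'=' -f2-)
with_sed=$(printf '%s\n' "$line" | sed 's/^[^=]*=//')
with_awk=$(printf '%s\n' "$line" | awk -F= '{print $2}')

printf 'cut: %s\nsed: %s\nawk: %s\n' "$with_cut" "$with_sed" "$with_awk"
```

cut and sed keep danny=Boy, while awk prints only danny, which is why the awk form is marked as suitable only when a single = is present.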
Using ex:
echo user=dannyBoy | ex -s +"norm df=" +%p -cq! /dev/stdin
where ex is equivalent to vi -e/vim -e, which basically executes the vi command df= (delete until it finds =) and then prints the buffer (%p).
If you have multiple lines like that, it is simpler to use substitution:
ex -s +"%s/^.*=//g" +%p -cq! foo.txt
To edit the file in place, change -cq! to -cwq.
The command below deletes the first 5 characters:
$ echo "user=dannyboy" | cut -c 6-
You can use it on a file with cut -c 6- inputfilename as well.
