Get only part of file using sed or awk

I have a file which contains text as follows:
Directory /home/user/ "test_user"
bunch of code
another bunch of code
How can I get from this file only the /home/user/ part?
I've managed to use awk -F '"' 'NR==1{print $1}' file.txt to get rid of the rest of the file, and I'm getting output like this:
Directory /home/user/
How can I change this command to get only /home/user/ part? I'd like to make it as simple as possible. Unfortunately, I can't modify this file to add/change the content.

This should be the fastest option, which becomes noticeable if your file is large:
awk '{print $2; exit}' file
It prints the second field of the first line and then stops, so the rest of the file is never processed.

With awk it should be:
awk 'NR==1{print $2}' file.txt
Setting the field delimiter to " was the mistake, since it splits the line into these fields:
$1 = 'Directory /home/user/'
$2 = 'test_user'
$3 = '' (empty)
The default field separator, a single space (which awk treats as any run of blanks and tabs), splits like this:
$1 = 'Directory'
$2 = '/home/user/'
$3 = '"test_user"'
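The difference between the two splitting modes can be checked directly. The following is a minimal sketch, assuming a scratch file created with mktemp that holds the sample lines from the question; the variable names are only illustrative.

```shell
#!/bin/sh
# Build a throwaway copy of the sample file from the question.
sample=$(mktemp)
printf '%s\n' 'Directory /home/user/ "test_user"' 'bunch of code' 'another bunch of code' > "$sample"

# Split on the double quote: field 1 still carries "Directory " and a trailing space.
with_quote_fs=$(awk -F '"' 'NR==1{print $1}' "$sample")

# Split on the default separator (runs of blanks): field 2 is the path.
with_default_fs=$(awk 'NR==1{print $2}' "$sample")

# Same result, but awk stops reading after the first line.
with_exit=$(awk '{print $2; exit}' "$sample")

# Brackets make the trailing space in the first result visible.
printf '[%s]\n' "$with_quote_fs" "$with_default_fs" "$with_exit"
rm -f "$sample"
```

The bracketed output makes it easy to see that only the default separator isolates the path on its own.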

As an alternative, you can use head and cut:
$ head -n 1 file | cut -d' ' -f2

Not sure why you are using -F '"', as that changes the field delimiter. If you remove it, then $2 will get you what you want.
awk 'NR==1{print $2}' file.txt
You can also use awk to execute the print when the line contains /home/user instead of counting records:
awk '/\/home\/user\//{print $2}' file.txt
In this case, if the line were buried in the file, or if you had multiple instances, you would get the name for every occurrence wherever it was.

Adding some grep
grep Directory file.txt|awk '{print $2}'
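The grep and awk steps can usually be folded into a single awk call, because awk can select lines itself. A small sketch, assuming the Directory line is not necessarily the first line of the file (the extra header line is made up for the demonstration):

```shell
#!/bin/sh
# Sample data with the interesting line in second position (hypothetical layout).
sample=$(mktemp)
printf '%s\n' 'some header' 'Directory /home/user/ "test_user"' 'bunch of code' > "$sample"

# Select by keyword in the first field and stop at the first hit.
by_keyword=$(awk '$1 == "Directory" { print $2; exit }' "$sample")

# Select by a regular expression on the whole line; every match is printed.
by_path=$(awk '/\/home\/user\// { print $2 }' "$sample")

printf '%s\n' "$by_keyword" "$by_path"
rm -f "$sample"
```

Comparing $1 to a string is stricter than a regex match on the whole line, which may matter if the word Directory also appears inside other lines.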

Related

Command to remove all but select columns for each file in unix directory

I have a directory with many files in it and want to edit each file to only contain a select few columns.
I have the following code which will only print the first column
for i in /directory_path/*.txt; do awk -F "\t" '{ print $1 }' "$i"; done
but if I try to edit each file by adding > "$i" as below, then I lose all the information in my files
for i in /directory_path/*.txt; do awk -F "\t" '{ print $1 }' "$i" > "$i"; done
However I want to be able to remove all but a select few columns in each file for example 1 and 3.

Given:
cat file
1 2 3
4 5 6
You can do in place editing with sed:
sed -i.bak -E 's/^([^[:space:]]*).*/\1/' file
cat file
1
4
If you want freedom to work with multiple columns and have in place editing, use GNU awk that also supports in place editing:
gawk -i inplace '{print $1, $3}' file
cat file
1 3
4 6
If you only have POSIX awk or wanted to use cut you generally do this:
Modify the file with awk, cut, sed, etc
Redirect the output to a temp file
Rename the temp file back to the original file name.
Like so:
awk '{print $1, $3}' file >tmp_file; mv tmp_file file
Or with cut:
cut -d ' ' -f 1,3 file >tmp_file; mv tmp_file file
To do a loop on files in a directory, you would do:
for fn in /directory_path/*.txt; do
awk -F '\t' '{ print $1 }' "$fn" >tmp_file
mv tmp_file "$fn"
done
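The loop above can be hardened a little. The sketch below is one possible variant, not the only way to do it: it keeps columns 1 and 3 as the question asks, uses mktemp instead of a fixed tmp_file name, and replaces the original only if awk succeeded. The scratch directory and its two files are stand-ins for /directory_path.

```shell
#!/bin/sh
# Scratch directory standing in for /directory_path (hypothetical data).
workdir=$(mktemp -d)
printf 'a\tb\tc\nd\te\tf\n' > "$workdir/one.txt"
printf '1\t2\t3\n' > "$workdir/two.txt"

for fn in "$workdir"/*.txt; do
    # Unique temp file next to the original, so mv stays on one filesystem.
    tmp=$(mktemp "$fn.XXXXXX") || exit 1
    # Keep columns 1 and 3; replace the original only if awk succeeded.
    if awk -F '\t' -v OFS='\t' '{ print $1, $3 }' "$fn" > "$tmp"; then
        mv "$tmp" "$fn"
    else
        rm -f "$tmp"
    fi
done

first_result=$(cat "$workdir/one.txt")
second_result=$(cat "$workdir/two.txt")
printf '%s\n' "$first_result" "$second_result"
rm -rf "$workdir"
```

Redirecting straight to "$fn" fails because the shell truncates the file before awk reads it, which is why a temporary file is needed at all. One caveat of this sketch: the replaced file takes on the restrictive permissions that mktemp gives the temporary file.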

Just to add a little more to @dawg's perfectly good answer, based on my own use case.
I was dealing with CSVs, and a standard CSV value may contain a comma as long as the value is enclosed in double quotes. For example, the second row below is a valid CSV row:
col1,col2,col2
1,abc,"abc, inc"
But the command above was treating the , between the double quotes as a delimiter too.
Also, the output file delimiter wasn't specified in the command.
These are the modifications I had to make for it to handle the above two problems. FPAT is a gawk feature, and it is set in a BEGIN block so that it already applies to the first line:
for fn in /home/ubuntu/dir/*.csv; do
gawk 'BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")"; OFS = "," } { print $1, $2 }' "$fn" >tmp_file
mv tmp_file "$fn"
done
OFS sets the delimiter used in the output file.
FPAT handles the case of a , between quotation marks.
The regex and its explanation are given in the gawk manual, in section 4.7 Defining Fields by Content.
I was led to that solution through this answer.
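The effect of FPAT can be seen on the sample row. This is a sketch under the assumption that gawk is installed (FPAT is not available in other awk implementations); when gawk is missing, the script only records that it was skipped. The header line and the choice of columns 1 and 3 are for illustration only.

```shell
#!/bin/sh
csv=$(mktemp)
printf '%s\n' 'col1,col2,col3' '1,abc,"abc, inc"' > "$csv"

if command -v gawk >/dev/null 2>&1; then
    # FPAT describes what a field looks like: either a run of non-commas
    # or a double-quoted string. Setting it in BEGIN makes it apply from line 1.
    csv_out=$(gawk 'BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")"; OFS = "," }
                    NR > 1 { print $1, $3 }' "$csv")
else
    csv_out="skipped"   # FPAT is specific to gawk
fi

printf '%s\n' "$csv_out"
rm -f "$csv"
```

The quoted value keeps its embedded comma because it is matched as one field rather than being split at the delimiter. This pattern does not cover every CSV feature, such as escaped quotes inside quoted values.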

Save command output at filename

I've got this problem, where I want to save an output of a command as a filename and stream output from a different command (within the same script) to that file. I wasn't able to find a solution online, so here goes. Below is the code I have:
zgrep --no-filename 'some-patter\|other-pattern' /var/something/something/$1/* | awk -F '\t' '{printf $8; printf "scriptLINEbreakerPARSE"; print $27}' | while read -r line ; do
awk -F 'scriptLINEbreakerPARSE' '{print $1}' -> save output of this as a filename
awk -F 'scriptLINEbreakerPARSE' '{print $2}' >> the_filename_from_above
done
So basically I want to use the first awk in the loop to save the output as a filename and then the second awk output will save to the file with that filename.
Any help would be appreciated guys.

You're doing too much work. Just output to the desired file in the first awk command:
zgrep --no-filename 'some-patter\|other-pattern' /var/something/something/"$1"/* |
awk -F '\t' '{print $27 > $8}'
See https://www.gnu.org/software/gawk/manual/html_node/Redirection.html
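awk's output redirection can be tried on a small scale before running it against real logs. In this sketch the two-column input (file name, then payload) and the scratch directory are made up; they stand in for fields $8 and $27 of the real data.

```shell
#!/bin/sh
# Scratch directory so the generated files are easy to clean up.
outdir=$(mktemp -d)

# Hypothetical two-column input: target file name, then the payload.
printf 'alpha\tfirst line\nbeta\tonly line\nalpha\tsecond line\n' |
awk -F '\t' -v dir="$outdir" '{
    target = dir "/" $1
    # awk truncates the file when it first opens it, then keeps
    # appending to it for the rest of the run.
    print $2 > target
}'

alpha_contents=$(cat "$outdir/alpha")
beta_contents=$(cat "$outdir/beta")
printf '%s\n' "$alpha_contents" "$beta_contents"
rm -rf "$outdir"
```

Building the file name in a variable first avoids any ambiguity about how awk parses a concatenation after the > operator. Using >> instead of > would also preserve content that was in the files before awk started.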

Shell command to retrieve specific value using pattern

I have a file which contains data like below.
appid=TestApp
version=1.0.1
We want to parse the file and capture the value assigned to appid field.
I have tried with awk command as below
awk '/appid=/{print $1}' filename.txt
However it outputs the whole line
appid=TestApp
but we required only
TestApp
Please let me know how I can achieve this using awk/grep/sed shell commands.

You need to change the field separator:
awk -F'=' '$1 ~ /appid/ {print $2}' filename.txt
or with an exact match
awk -F'=' '$1 == "appid" {print $2}' filename.txt
outputs
TestApp

There are about 20 different ways to do this, but when you have name = value statements in a file it's usually a good idea to simply build an array of those assignments and then print whatever you care about using its name, e.g.:
$ cat file
appid=TestApp
version=1.0.1
$
$ awk -F= '{a[$1]=$2} END{print a["appid"]}' file
TestApp
$ awk -F= '{a[$1]=$2} END{print a["version"]}' file
1.0.1
$ awk -F= '{a[$1]=$2} END{for (i in a) print i,"=",a[i]}' file
appid = TestApp
version = 1.0.1
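With -F= a value that itself contains an = sign is cut off at the second =. A possible workaround, sketched here with a hypothetical get_value helper and a made-up url entry, is to split each line at the first = only:

```shell
#!/bin/sh
# Print the value for one key; only the first "=" on a line separates key from value.
get_value() {
    awk -v key="$1" '
        {
            pos = index($0, "=")
            if (pos > 0 && substr($0, 1, pos - 1) == key) {
                print substr($0, pos + 1)
                exit
            }
        }' "$2"
}

conf=$(mktemp)
printf '%s\n' 'appid=TestApp' 'version=1.0.1' 'url=http://example.test/?q=1' > "$conf"

appid=$(get_value appid "$conf")
version=$(get_value version "$conf")
url=$(get_value url "$conf")
printf '%s\n' "$appid" "$version" "$url"
rm -f "$conf"
```

The helper stops at the first matching key, so it also avoids reading the rest of a large file.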

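If you are in the shell already and you trust the file's contents, then simply sourcing the file will get you what you want, since each name=value line is a valid shell assignment. Keep in mind that sourcing executes the file as shell code.
. ./filename.txt
echo "$appid"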
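Where executing the file as shell code is not acceptable, the value can be read in the shell without sourcing anything. A minimal sketch, assuming simple key=value lines with no quoting or comments:

```shell
#!/bin/sh
conf=$(mktemp)
printf '%s\n' 'appid=TestApp' 'version=1.0.1' > "$conf"

appid_value=
# IFS='=' splits at the first "="; the remainder of the line lands in "value".
while IFS='=' read -r key value; do
    if [ "$key" = "appid" ]; then
        appid_value=$value
        break
    fi
done < "$conf"

printf '%s\n' "$appid_value"
rm -f "$conf"
```

Because the loop reads from a redirection rather than a pipe, it runs in the current shell and the assignment to appid_value is still visible afterwards.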

shell sed get file path

I have a file path.
/ifshk5/BC_IP/PROJECT/T11073/T11073_RICekkR/Fq/AS34_59329/111220_I631_FCC0E5EACXX_L4_RICwdsRSYHSD11-2-IPAAPEK-93_1.fq.gz
I want to get this part
/T11073_RICekkR/Fq/AS34_59329
as $location.
How can I use sed to get those?

When working with file paths, I find it easier using awk - but you can make your own mind up. Here's what I'd do:
location=$(echo "$path" | awk -F "/" '{ print "", $6, $7, $8 }' OFS="/")
If you're trying to match on a pattern, then sed would be a good option. But you haven't mentioned any specifications.

Use cut or awk (as suggested).
But to do it with sed you do something like this:
locationpath=/ifshk5/BC_IP/PROJECT/T11073/T11073_RICekkR/Fq/AS34_59329/111220_I631_FCC0E5EACXX_L4_RICwdsRSYHSD11-2-IPAAPEK-93_1.fq.gz
location=$(echo "$locationpath" | sed 's%\(/[^/]*\)\{4\}\(/[^/]*/[^/]*/[^/]*\).*%\2%')
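Both answers count components from the start of the path. If the wanted part is better described as "the last three directories before the file name", counting from the end may be more robust. This is only a sketch under that assumption, since the question does not say which rule applies:

```shell
#!/bin/sh
path=/ifshk5/BC_IP/PROJECT/T11073/T11073_RICekkR/Fq/AS34_59329/111220_I631_FCC0E5EACXX_L4_RICwdsRSYHSD11-2-IPAAPEK-93_1.fq.gz

# Drop the file name, then keep the last three directory components.
location=$(dirname "$path" | awk -F/ '{ print "/" $(NF-2) "/" $(NF-1) "/" $NF }')

printf '%s\n' "$location"
```

This variant keeps working if the number of leading directories changes, as long as the depth below the wanted part stays the same.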

How do I print a field from a pipe-separated file?

I have a file with fields separated by pipe characters and I want to print only the second field. This attempt fails:
$ cat file | awk -F| '{print $2}'
awk: syntax error near line 1
awk: bailing out near line 1
bash: {print $2}: command not found
Is there a way to do this?

Or just use one command:
cut -d '|' -f FIELDNUMBER

The key point here is that the pipe character (|) must be escaped from the shell. Use "\|" or "'|'" to protect it from shell interpretation and allow it to be passed to awk on the command line.
Reading the comments, I see that the original poster presents a simplified version of the original problem, which involved filtering the file before selecting and printing the fields. A pass through grep was used and the result piped into awk for field selection. That accounts for the wholly unnecessary cat file that appears in the question (it replaces the grep <pattern> file).
Fine, that will work. However, awk is largely a pattern matching tool on its own, and can be trusted to find and work on the matching lines without needing to invoke grep. Use something like:
awk -F\| '/<pattern>/{print $2;}{next;}' file
The /<pattern>/ bit tells awk to perform the action that follows on lines that match <pattern>.
The lost-looking {next;} is a default action skipping to the next line in the input. It does not seem to be necessary, but I have this habit from long ago...

The pipe character needs to be escaped so that the shell doesn't interpret it. A simple solution:
$ awk -F\| '{print $2}' file
Another choice would be to quote the character:
$ awk -F'|' '{print $2}' file

Another way using awk
awk 'BEGIN { FS = "|" } ; { print $2 }'
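The variants from the answers above can be compared side by side. A short sketch, assuming a scratch file with two made-up pipe-separated rows:

```shell
#!/bin/sh
data=$(mktemp)
printf '%s\n' 'one|two|three' 'four|five|six' > "$data"

# Backslash-escaped separator on the command line.
escaped=$(awk -F\| '{print $2}' "$data")
# Quoted separator on the command line.
quoted=$(awk -F'|' '{print $2}' "$data")
# Separator set inside the program, so the shell never sees a bare pipe.
with_begin=$(awk 'BEGIN { FS = "|" } { print $2 }' "$data")
# cut with a quoted delimiter.
with_cut=$(cut -d '|' -f 2 "$data")

printf '%s\n' "$escaped"
rm -f "$data"
```

All four commands select the same field; they differ only in how the | is kept away from the shell.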

And 'file' contains no pipe symbols, so it prints nothing. You should either use 'cat file' or simply list the file after the awk program.
