shell: how to cut the first n characters from the second column - bash

Please help me delete the first 9 characters from column number 2 (delimiter: space) in a file like the one below. I tried using cut and awk but couldn't get the required output.
My input file:
appu 11062017-10:00
ammu 11062017-11:00
anna 11062017-12:00
Required output:
appu 10:00
ammu 11:00
anna 12:00
Please note that the 11062017- (date) part will not always be the same, but its length (9 characters) will be.
Please help me with the command.

Using awk and sub to replace the first 9 characters of $2 with an empty string:
$ awk '{sub(/.{9}/,"",$2)}1' file
appu 10:00
ammu 11:00
anna 12:00

As per @anubhava's comments, it can be implemented via awk with substr:
awk -F " " '{print $1,substr($2,10)}' my_input_file.txt

An alternate awk solution
awk '{ split($2,arry,"-");print $1" "arry[2] }' filename
Here we make use of the "-" delimiter to split out the part we need.

Try using
sed -i 's/ .*-/ /' file
This edits the file in place (-i): sed removes everything from the first space to the last - on each line (the .* is greedy).

Another way with sed:
sed 's/[[:digit:]]\{8\}-//' filename
The regex [[:digit:]]\{8\}- matches 8 digits followed by a hyphen and deletes them.
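Since the date part is fixed-width, plain bash parameter expansion can also do it; a minimal sketch with no external tools, assuming the two-column layout shown above:
while read -r name stamp; do
    printf '%s %s\n' "$name" "${stamp:9}"    # ${stamp:9} drops the first 9 characters
done < file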

Related

How to use awk to read part of a line, including a number of spaces?

I want to extract a value using awk's substr in a way that also counts spaces, rather than treating them as separators.
For example, below is the input, and I want to extract the "29611", including the space:
201903011232101029 2961104E3021 223 0 12113 5 15 8288 298233 0 45 0 39 4
I used this method, but it used space as a separator:
more abbas.dat | awk '{print substr($1,1,16),substr($1,17,25)}'
Expected output should be:
201903011232101029 2961
But it prints only
201903011232101029
My question is: how can I print with substr so that it counts spaces?
I know I can use this command to get the desired output, but it is not helpful for my objective:
more abbas.dat | awk '{print substr($1,1,16),substr($2,1,5)}'
1st solution: with the samples shown, please try the following awk code, written and tested in GNU awk. It uses awk's match function to get the required output.
To print the 1st field, followed by the varying spaces, followed by 5 digits of the 2nd field, use:
awk 'match($0,/^[0-9]+[[:space:]]+[0-9]{5}/){print substr($0,RSTART,RLENGTH)}' Input_file
OR, to print 16 characters of the 1st field and 5 from the 2nd field, including the varying-length run of spaces between them:
awk 'match($0,/^([0-9]{16})[^[:space:]]+([[:space:]]+)([0-9]{5})/,arr){print arr[1] arr[2] arr[3]}' Input_file
2nd solution: using GNU grep, try the following, which lets the first 5 characters of the 2nd column be anything (digits, letters, etc.):
grep -oP '^\S+\s+.{5}' Input_file
OR, to match only digits in the 2nd field, make a minor change to the above grep:
grep -oP '^\S+\s+\d{5}' Input_file
If there is always exactly one space, you can use the following command, which prints the first field plus the first 5 characters of the second field.
N.B. It's not clear in the question whether you want 4 or 5 characters, but that can be adjusted easily.
more abbas.dat | awk '{print $1" "substr($2,1,5) }'
I think the simplest way is to include a field separator that never occurs in the data, here -Fs (the letter s), so that $1 becomes the whole line, spaces included; -F '\n' would be a more robust choice.
more abbas.dat | awk -Fs '{print substr($1,1,16),substr($1,17,25)}'
$ awk '{print substr($0,1,24)}' file
201903011232101029 29611
If that's not all you need then edit your question to clarify your requirements.
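If the offsets are truly fixed, the same slice can be taken with cut, whose -c option counts every character, spaces included; a sketch assuming the 24-character layout above:
$ cut -c1-24 abbas.dat
201903011232101029 29611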

extract words matching a pattern and print character length

I have a test file which looks like this:
file.txt
this is a smart boy "abc.smartxyz" is the name
what you in life doesn;t matter
abc.smartabc is here to help you.
where is the joy of life
life is joyous at "https://abc.smart/strings"
grep 'abc.smart' file.txt
this is a smart boy "abc.smartxyz" is the name
abc.smartabc is here to help you.
life is joyous at "https://abc.smart/strings"
Now I want to extract all words containing the string abc.smart from this grepped file and also print how many characters long they are. The output I am after is something like:
"abc.smartxyz" 14
abc.smartabc 12
"https://abc.smart/strings" 27
Please can someone help with this.
With awk:
awk '{for (i=1;i<=NF;i++) if ($i~/abc.smart/) print $i,length($i)}' file
You can run it directly on the original file, skipping the grep step. Output:
"abc.smartxyz" 14
abc.smartabc 12
"https://abc.smart/strings" 27
This might work for you (GNU grep and sed):
grep -o '\S*abc\.smart\S*' file | sed 's/"/\\"/g;s/.*/echo "& $(expr length &)"/e'
Use grep to output the words containing abc.smart, then sed's e flag to evaluate, for each word, an echo command that appends the word's length as computed by expr.
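If you prefer to keep the length calculation in awk, the same grep can feed it; a sketch assuming the matched words never contain whitespace:
grep -o '\S*abc\.smart\S*' file | awk '{print $0, length($0)}'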

How to extract multiple fields with specific character lengths in Bash?

I have a file (test.csv) with a few fields, and what I want is the Title and Path: 10 characters for the title, and a few levels removed from the path. What I have done is use the awk command to pick the two fields:
$ awk -F "," '{print substr($4, 1, 10)","$6}' test.csv [1]
The three levels in the path that need to be removed are not always the same. It can be /article/17/1/ or /open-organization/17/1/, so I can't use substr for field $6.
Here is the result I have:
Title,Path
Be the ope,/article/17/1/be-open-source-supply-chain
Developing,/open-organization/17/1/developing-open-leaders
Wanted result would be:
Title,Path
Be the ope,be-open-source-supply-chain
Developing,developing-open-leaders
The title is OK with 10 characters, but I still need to remove 3 levels from the path.
I could use the cut command:
cut -d'/' -f5- to remove the "/.../17/1/"
but I'm not sure how this can be piped into [1].
I tried to use a for loop to get the title and the path one by one, but I had difficulty getting the awk command to run one line at a time.
I have spent hours on this with no luck. Any help would be appreciated.
Dummy Data for testing:
test.csv
Post date,Content type,Author,Title,Comment count,Path,Tags,Word count
31 Jan 2017,Article,Scott Nesbitt,Book review: Ours to Hack and to Own,0,/article/17/1/review-book-ours-to-hack-and-own,Books,660
31 Jan 2017,Article,Jason Baker,5 new guides for working with OpenStack,2,/article/17/1/openstack-tutorials,"OpenStack, How-tos and tutorials",419
You can remove the path prefix by using a regex substitution:
stringZ="Be the ope,/article/17/1/be-open-source-supply-chain"
sed -E "s/((\\/\\w+){3}\\/)//" <<< $stringZ
Note that you need -i if you give sed a file as input and want it edited in place.
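Both steps can also be combined into one awk command; a sketch assuming fields 4 and 6 never contain quoted commas (true for the columns used here), where sub strips the first three /level/ components of the path:
awk -F, -v OFS=, '{sub("^/[^/]+/[^/]+/[^/]+/", "", $6); print substr($4,1,10), $6}' test.csv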

Adding a new line to a text file after 5 occurrences of a comma in Bash

I have a text file that is basically one giant Excel sheet on a single line. An example would be like this:
Name,Age,Year,Michael,27,2018,Carl,19,2018
I need to change every third occurrence of a comma into a newline so that I get:
Name,Age,Year
Michael,27,2018
Carl,19,2018
Please let me know if that is too ambiguous and as always thank you in advance for all the help!
With GNU sed:
sed -E 's/(([^,]*,){2}[^,]*),/\1\n/g'
To change the number of fields per line, change {2} to one less than the number of fields. For example, to change every fifth comma (as in the title of your question), you would use:
sed -E 's/(([^,]*,){4}[^,]*),/\1\n/g'
In the regular expression, [^,]*, is "zero or more characters other than , followed by a ,"; in other words, it is a single comma-delimited field. This won't work if the fields are quoted strings with internal commas or newlines.
Regardless of what Linux's man sed says, the -E flag is an extension to POSIX sed, which causes sed to use extended regular expressions (EREs) rather than basic regular expressions (see man 7 regex). -E also works in BSD sed, used by default on Mac OS X. (Thanks to @EdMorton for the note.)
With GNU awk for multi-char RS:
$ awk -v RS='[,\n]' '{ORS=(NR%3 ? "," : "\n")} 1' file
Name,Age,Year
Michael,27,2018
Carl,19,2018
With any awk:
$ awk -v RS=',' '{sub(/\n$/,""); ORS=(NR%3 ? "," : "\n")} 1' file
Name,Age,Year
Michael,27,2018
Carl,19,2018
Try this:
$ cat /tmp/22.txt
Name,Age,Year,Michael,27,2018,Carl,19,2018,Nooka,35,1945,Name1,11,19811
$ echo "Name,Age,Year"; grep -o "[a-zA-Z][a-zA-Z0-9]*,[1-9][0-9]*,[1-9][0-9]\{3\}" /tmp/22.txt
Michael,27,2018
Carl,19,2018
Nooka,35,1945
Name1,11,1981
Or use ,[1-9][0-9]\{3\} if you don't want to write [0-9] three more times for the YYYY part.
PS: this solution will give you only YYYY (four digits) for the year; even if the data has a typo like 19811, you'll still get 1981.
You are looking for 3 fragments, each without a comma, separated by commas.
The last fields can give problems (not ending with a comma, and maybe only two fields).
The next command handles this:
grep -Eo "([^,]*[,]{0,1}){0,3}" inputfile
This might work for you (GNU sed):
sed 's/,/\n/3;P;D' file
Replace the third , with a newline, print up to the first newline (P), delete up to it (D), and repeat.
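A tr and paste pipeline is another option: turn every comma into a newline, then stitch the lines back together three at a time; a sketch assuming exactly three fields per record:
tr ',' '\n' < file | paste -d, - - -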

Replacing newlines with commas at every third occurrence using AWK?

For example: a given file has the following lines:
1
alpha
beta
2
charlie
delta
10
text
test
I'm trying to get the following output using awk:
1,alpha,beta
2,charlie,delta
10,text,test
Fairly simple. Use the output record separator: a comma when the line number is not divisible by 3, and a newline otherwise:
awk 'ORS=NR%3?",":"\n"' file
awk can handle this easily by manipulating ORS:
awk '{ORS=","} !(NR%3){ORS="\n"} 1' file
1,alpha,beta
2,charlie,delta
10,text,test
There is a tool for this kind of text processing: pr
$ pr -3ats, file
1,alpha,beta
2,charlie,delta
10,text,test
You can also use xargs with sed to coalesce multiple lines into single lines, which is useful to know:
cat file | xargs -n3 | sed 's/ /,/g'
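paste can do the same grouping without sed; a minimal sketch assuming exactly three lines per record:
$ paste -d, - - - < file
1,alpha,beta
2,charlie,delta
10,text,test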
