I would like to print the strings/text after every ! (exclamation mark) in bash.
Example string/text:
check_queue!TEST_IN!Queue!400!750
I want this output:
TEST_IN Queue 400 750
I have tried this:
cat filename | cut -d "!" -f2
You were almost there:
cut -d! -f2- filename | tr '!' ' '
-f2- means field 2 and all following fields
No need for cat, just work on file
tr '!' ' ' translates each exclamation mark ! to a space.
Or if your version of cut has an --output-delimiter= option:
cut --delimiter=! --fields=2- --output-delimiter=' ' filename
Or using awk:
awk -F! '{$1=""; print substr($0,2)}' filename
-F!: Sets the field delimiter to !
$1="": Erase first field
print substr($0,2): Print the whole record starting at the 2nd character, since the first character is the blank delimiter left over from the erased first field.
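If the text is already in a shell variable, the same result can be had without cut or awk. Here is a minimal sketch using Bash parameter expansion, meant to be run as a script (history expansion of ! is off there). The function name after_first_bang is made up for this illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print everything after the first "!" with the
# remaining "!" characters turned into spaces.
after_first_bang() {
    local s=$1
    local rest=${s#*!}            # drop the shortest prefix ending in "!"
    printf '%s\n' "${rest//!/ }"  # replace every remaining "!" with a space
}

after_first_bang 'check_queue!TEST_IN!Queue!400!750'
# prints: TEST_IN Queue 400 750
```

To process a file this way, the function could be called once per line from a while read loop, though cut or awk will be faster on large inputs.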
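First loop over the lines of the file, then walk through each line one character at a time:
while IFS= read -r s; do
    out=""
    seen=0    # set to 1 once the first ! has been passed
    for (( i=0; i<${#s}; i++ )); do
        c=${s:i:1}
        if [ "$c" != "!" ]; then
            # keep the character only after the first !
            if (( seen )); then out+=$c; fi
        else
            # every ! after the first one becomes a space
            if (( seen )); then out+=" "; fi
            seen=1
        fi
    done
    # print the rebuilt line
    printf '%s\n' "$out"
done < filename.txt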
I have data in below format
ABC-ERW 12344 ZYX 12345
FFANKN 2345 QW [123457, 89053]
FAFDJ-ER 1234 MNO [6532, 789, 234578]
I want to create the data in below format using sed or awk.
ABC-ERW 12344 ZYX 12345
FFANKN 2345 QW 123457
FFANKN 2345 QW 89053
FAFDJ-ER 1234 MNO 6532
FAFDJ-ER 1234 MNO 789
FAFDJ-ER 1234 MNO 234578
I can extract the data before bracket but I don't know how to concatenate the same with data from bracket repeatedly.
My effort:
#!/bin/bash
while IFS= read -r line
do
echo "$line"
cnt=`echo $line | grep -o "\[" | wc -l`
if [ $cnt -gt 0 ]
then
startstr=`echo $line | awk -F[ '{print $1}'`
echo $startstr
intrstr=`echo $line | cut -d "[" -f2 | cut -d "]" -f1`
echo $intrstr
else
echo "$line" >> newfile.txt
fi
done < 1.txt
I am able to get the first part and also keep the rows without "[" in the new file, but I don't know how to get the values inside "[" and append each of them at the end, as the number of values inside "[" varies from line to line.
Regards
With your shown samples, please try the following awk code.
awk '
match($0,/\[[^]]*\]$/){
num=split(substr($0,RSTART+1,RLENGTH-2),arr,", ")
for(i=1;i<=num;i++){
print substr($0,1,RSTART-1) arr[i]
}
next
}
1
' Input_file
Explanation: Adding detailed explanation for above code.
awk ' ##Starting awk program from here.
match($0,/\[[^]]*\]$/){ ##Using match function to match from [ till ] at the end of line.
num=split(substr($0,RSTART+1,RLENGTH-2),arr,", ") ##Splitting matched values by regex above and passing into array named arr with delimiters comma and space.
for(i=1;i<=num;i++){ ##Running for loop till value of num.
print substr($0,1,RSTART-1) arr[i] ##printing sub string before matched along with element of arr with index of i.
}
next ##next will skip all further statements from here.
}
1 ##1 will print current line.
' Input_file ##Mentioning Input_file name here.
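The same idea (cut the line at the bracket, split the list, repeat the prefix) can also be sketched in plain Bash, which may help when reading the awk version. The function name expand_brackets is only illustrative, and the sketch assumes the bracketed list is always at the end of the line and separated by a comma and a space, as in the samples:

```shell
#!/usr/bin/env bash
# Illustrative sketch: repeat the text before "[...]" once per list item.
expand_brackets() {
    local line prefix list item
    local -a items
    while IFS= read -r line; do
        if [[ $line == *\[*\] ]]; then
            prefix=${line%%\[*}        # text up to the first "["
            list=${line#*\[}           # text after the first "["
            list=${list%\]}            # drop the closing "]"
            IFS=', ' read -r -a items <<< "$list"
            for item in "${items[@]}"; do
                printf '%s%s\n' "$prefix" "$item"
            done
        else
            printf '%s\n' "$line"      # no list: print unchanged
        fi
    done
}

expand_brackets <<'EOF'
ABC-ERW 12344 ZYX 12345
FFANKN 2345 QW [123457, 89053]
EOF
# prints:
# ABC-ERW 12344 ZYX 12345
# FFANKN 2345 QW 123457
# FFANKN 2345 QW 89053
```

For large files the awk version is likely to be much faster; the loop is shown only to make the individual steps visible.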
Suggesting simple awk script:
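awk 'NF==1{print}{for (i=2;i<NF;i++)print $1, $i}' FS="( \\\[)|(, )|(\\\]$)" input.1.txt
Explanation:
FS="( \\\[)|(, )|(\\\]$)" sets the awk field separator to any of: a space followed by [, a comma followed by a space, or ] at the end of the line.
This turns the list items into fields $2 to $(NF-1), each of which is appended to $1. The closing ] leaves an empty last field, which is why the loop stops before NF.
NF==1{print} prints a line that has no bracketed list (a single field) as it is. Testing NF instead of NR means such lines are kept wherever they occur, not only on the first line.
{for (i=2;i<NF;i++)print $1, $i} for a line with a list, prints field $1 followed by the current field, once per list item.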
This might work for you (GNU sed):
sed -E '/(.*)\[([^,]*), /{s//\1\2\n\1[/;P;D};s/[][]//g' file
Match the string up to the opening square bracket, and then the string after it up to the first comma and space.
Replace the entire match with the leading and trailing matched strings, followed by a newline, the leading matched string, and an opening square bracket.
Print and delete the first line, then repeat.
The last line of any such repeat will not match because there is no trailing comma and space, in which case the remaining opening and closing square brackets are removed.
Alternative:
sed -E ':a;s/([^\n]*)\[([^,]*), /\1\2\n\1[/;ta;s/[][]//g' file
I am trying to extract two pieces of data from a string and I am having a bit of trouble. The string is formatted like this:
11111111-2222:3333:4444:555555555555 aaaaaaaa:bbbbbbbb:cccccccc:dddddddd
What I am trying to achieve is to print the first column (11111111-2222:3333:4444:555555555555) and the third section of the colon string (cccccccc), on the same line with a space between the two, as the first column is an identifier. Ideally in a way that can just be run as one-line from the terminal.
I have tried using cut and awk but I have yet to find a good way to make this work.
How about a sed expression like this?
echo "11111111-2222:3333:4444:555555555555 aaaaaaaa:bbbbbbbb:cccccccc:dddddddd" |
sed -e "s/\(.*\) .*:.*:\(.*\):.*/\1 \2/"
Result:
11111111-2222:3333:4444:555555555555 cccccccc
The following awk script does the job without relying on the format of the first column.
awk -F: 'BEGIN {RS=ORS=" "} NR==1; NR==2 {print $3}'
Use it in a pipe or pass the string as a file (simply append the filename as an argument) or as a here-string (append <<< "your string").
Explanation:
Instead of lines, this awk script splits the input into space-separated records (RS=ORS=" "). Each record is subdivided into :-separated fields (-F:). The first record is printed as is (NR==1;, which is the same as NR==1 {print $0}). In the second record, only the 3rd field is printed (NR==2 {print $3}); for the record aaa:bbb:ccc:ddd the 3rd field is ccc.
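The two-level split described above (first on the space, then on the colons) can also be sketched with Bash's read builtin. This is an alternative illustration rather than part of the answer, and the function name id_and_third is made up:

```shell
#!/usr/bin/env bash
# Illustrative sketch: split on the space first, then on ":".
id_and_third() {
    local id rest third
    read -r id rest <<< "$1"               # id = first column, rest = colon string
    IFS=: read -r _ _ third _ <<< "$rest"  # keep only the 3rd colon-separated field
    printf '%s %s\n' "$id" "$third"
}

id_and_third '11111111-2222:3333:4444:555555555555 aaaaaaaa:bbbbbbbb:cccccccc:dddddddd'
# prints: 11111111-2222:3333:4444:555555555555 cccccccc
```

Like the awk script, this does not depend on the format of the first column, only on the single space that separates the two columns.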
I think the answer from user803422 is better but here's another option. Maybe it'll help you use cut in the future.
str='11111111-2222:3333:4444:555555555555 aaaaaaaa:bbbbbbbb:cccccccc:dddddddd'
first=$(echo "$str" | cut -d ' ' -f1)
second=$(echo "$str" | cut -d ':' -f6)
echo "$first $second"
With pure Bash regex:
str='11111111-2222:3333:4444:555555555555 aaaaaaaa:bbbbbbbb:cccccccc:dddddddd'
re='^([^ ]+ )[^:]*:[^:]*:([^:]*)'
[[ $str =~ $re ]] && echo "${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
Explanations:
[[ $str =~ $re ]]: Match $str against the POSIX extended regex stored in $re, which contains two capture groups. Group 1, ([^ ]+ ), is the first column followed by its space. [^:]*:[^:]*: then skips the first two colon-separated sections, and group 2, ([^:]*), captures the third section, which contains any number of characters that are not :.
The match has to run in the current shell, not inside $(...): a command substitution runs in a subshell, and the BASH_REMATCH array set there is not visible to the calling shell.
${BASH_REMATCH[1]}${BASH_REMATCH[2]}: expand the content of the captured groups, which Bash keeps in the dedicated BASH_REMATCH array.
I want to add a new line character '\n' at the end of each line in my file.
Here is my code:
while read -r line
do
echo "$line"|awk -F'\t' '{
print($1);
for (i=2; i<=NF; i++){
split($i,arr,":")
print(arr[1])
};
}' | tr '\n' '\t' | tr '|' ' ' | tr '/' ' ' | {END OF LINE, WANNA ADD NEW LINE}
>> genotype_processed.txt
done < file_in
Also, is there any way that I can combine the 3 tr commands into one? They just look too redundant.
Many thanks!
EDIT:
The input looks like this:
id123 0|1:a:b:c 0/0:i:j:k ...
id456 1/1:j:f:z 1|0:.:j:v ...
...
The desired output:
id123 0 1 0 0 ...
id456 1 1 1 0 ...
...
You could open the output file only once, at the end of the while loop, and then do whatever you want to output within the loop.
while read -r line
do
echo "$line"|awk -F'\t' '{
print($1);
for (i=2; i<=NF; i++){
split($i,arr,":")
print(arr[1])
};
}' | tr '\n' '\t' | tr '|' ' ' | tr '/' ' '
# This will be the new "newline"
echo
done < file_in >genotype_processed.txt
And by the way, your loop could probably be improved by using a single command to do the replacements; sed could be a good choice.
Please provide more examples of input and expected output.
EDIT
After your input/output description, I think you could improve this part a lot.
You do while read line; do echo "$line" | awk '...'; done <input, which is basically what you would get by running a single awk '...' input.
I don't get exactly what you want to achieve, and I think you misunderstand some things, but if my guess is right, then this is what you want:
sed -r 's/:[^[:blank:]]+//g; s/[|/]/ /g' input
Here, I first remove everything from the first : to the end of each column, and then I replace the characters | and / with a space.
Does that meet your needs?
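How about:
while IFS= read -r line
do
newline=$'\n'
printf '%s%s' "$line" "$newline" >> genotype_processed.txt
done < file_in
Keep "$line" quoted to retain the original formatting of the line; an unquoted $line is word-split, so its tabs would collapse into single spaces. printf is used because Bash's echo prints a literal \n unless it is given -e.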
Maybe you're overcomplicating this. (XY Problem?)
$: sed 's,[|/], ,g; s/:[^[:blank:]]*//g;' file_in > genotype_processed.txt
I sub'd all | and / with spaces, and any : followed by any number of non-blank characters (so the match stops at the tab that ends the column) with nothing.
I used a single truncating output redirection since I did it all in one step. If there was stuff in the file you wanted to keep, then go back to the append.
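As for the side question about merging the three tr calls: tr maps the n-th character of its first set to the n-th character of its second set, so a single call can handle all three replacements. A minimal sketch, with a made-up function name:

```shell
#!/usr/bin/env bash
# Illustrative sketch: one tr call instead of three.
# newline -> tab, "|" -> space, "/" -> space
join_fields() {
    tr '\n|/' '\t  '
}

printf '%s\n' 'id123' '0|1' '0/0' | join_fields
echo    # the joined fields end in a tab, so finish the line by hand
```

This only replaces the tr pipeline from the question; the sed one-liner above makes the whole loop unnecessary.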
I have a file with each line in below format
KeyA=ValA1,ValA2,ValA3...ValAn
KeyB=ValB1,ValB2,ValB3....ValBn
I have multiples lines in that file with varying number of values for each line.
My task is to append Val Key for each line. Expected sample output:
ValA1 KeyA
ValA2 KeyA
ValA3 KeyA
ValB1 KeyB
ValB2 KeyB
ValB3 KeyB
What I tried is :
while read -r line; do
KEY=$(echo $line | cut -d '=' -f 1)
VALUES=$(echo $line | cut -d '=' -f 2)
for VAL in $VALUES;do
echo $VAL $KEY
done
done < file.txt
I am able to achieve the expected output, but I am supposed to complete this without using the for loop.
Can someone suggest another solution?
One should not parse line-based text files with shell loops: the shell interprets the loop body again on every iteration and often starts several external processes per line, which is extremely inefficient for bulk jobs. Please use dedicated text processors like awk or perl.
awk -F'[=,]' '{k=$1; for(f=2;f<=NF;f++) print $f, k}' file
-F'[=,]' - Fields are delimited by a single comma/equals
{...} - with no condition, this action will be performed on every line
k=$1 - set k to Field 1
for(f=2;f<=NF;f++) - iterate over all remaining fields (NF = Number of Fields)
print $f, k - print the field, a space, and the value of k
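To see the effect of the two-character field separator, here is the command applied to a small sample fed through a here-document. The wrapper function val_key and the sample data are assumptions made for the illustration:

```shell
#!/usr/bin/env bash
# Illustrative wrapper around the awk one-liner; reads from standard input.
val_key() {
    awk -F'[=,]' '{k=$1; for(f=2;f<=NF;f++) print $f, k}'
}

val_key <<'EOF'
KeyA=ValA1,ValA2
KeyB=ValB1
EOF
# prints:
# ValA1 KeyA
# ValA2 KeyA
# ValB1 KeyB
```

Note that a value containing = would be split as well, since = is one of the two separator characters.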
I got this solution. First substitute = and , with a space. Then read each line with xargs and run a small script that stores the first argument (i.e. the key) and prints it after each of the remaining arguments:
<inputfile tr '=,' '  ' |
xargs -L 1 sh -c 't="$1"; shift; printf "%s $t\n" "$@"' --
On my second try I did the following, where I don't substitute = with a space, so values that contain = don't get split up.
while IFS== read -r key vals; do
printf "%s" "$vals" |
xargs -d, printf "%s $key\n"
done <inputfile
I'm trying to write a script that reads the file content below and extract the value in the 6th column of each line, then print each line without the 6th column. The comma is used as the delimiter.
Input:
123,456,789,101,145,5671,hello world,goodbye for now
223,456,789,101,145,5672,hello world,goodbye for now
323,456,789,101,145,5673,hello world,goodbye for now
What I did was
#!/bin/bash
for i in `cat test_input.txt`
do
COLUMN=`echo $i | cut -f6 -d','`
echo $i | cut -f1-5,7- -d',' >> test_$COLUMN.txt
done
The output I got was
test_5671.txt:
123,456,789,101,145,hello
test_5672.txt:
223,456,789,101,145,hello
test_5673.txt:
323,456,789,101,145,hello
The rest of "world, goodbye for now" was not written into the output files, because it seems like the space between "hello" and "world" was used as a delimiter?
How do I get the correct output
123,456,789,101,145,hello world,goodbye for now
It's not a problem with the cut command but with the for loop you're using. For the first loop run the variable i will only contain 123,456,789,101,145,5671,hello.
If you insist on reading the input file line by line (not very efficient), you'd better use a read loop like this:
while IFS= read -r i
do
...
done < test_input.txt
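The difference is easy to reproduce. In this sketch (sample data and variable names are made up), the unquoted command substitution in the for loop is split on every space, tab, and newline, whereas read hands over one whole line at a time:

```shell
#!/usr/bin/env bash
# Illustrative comparison of word splitting vs. line-by-line reading.
sample=$(mktemp)
printf '%s\n' '1,2,hello world' '3,4,goodbye for now' > "$sample"

for_count=0
for i in $(cat "$sample"); do      # splits on whitespace: 5 words
    for_count=$((for_count + 1))
done

read_count=0
while IFS= read -r i; do           # one iteration per line: 2 lines
    read_count=$((read_count + 1))
done < "$sample"

echo "for: $for_count iterations, while read: $read_count iterations"
rm -f "$sample"
```

With the two sample lines, the for loop runs five times and the read loop twice, which matches the truncated output seen in the question.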
echo '123,456,789,101,145,5671,hello world,goodbye for now' | while IFS=, read -r one two three four five six seven eight rest
do
echo "$six"
echo "$one,$two,$three,$four,$five,$seven,$eight${rest:+,$rest}"
done
Prints:
5671
123,456,789,101,145,hello world,goodbye for now
See the man bash Parameter Expansion section for the :+ syntax (essentially it outputs a comma and the $rest if $rest is defined and non-empty).
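A minimal sketch of that expansion, using made-up variable values:

```shell
#!/usr/bin/env bash
# ${var:+word} expands to word only when var is set and non-empty.
rest='extra,fields'
with_rest="a,b${rest:+,$rest}"

rest=''
without_rest="a,b${rest:+,$rest}"

echo "$with_rest"      # a,b,extra,fields
echo "$without_rest"   # a,b
```

This is why the read loop above does not print a stray trailing comma when a line has exactly eight fields.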
Also, you shouldn't use for to loop over file contents.
As ktf mentioned, your problem is not with cut but with the way you're passing the lines into cut. The solution he/she has provided should work.
Alternatively, you could achieve the same behaviour with a line of awk:
awk -F, '{for(i=1;i<=NF;i++) {if(i!=6) printf "%s%s",$i,(i==NF)?"\n":"," > ("test_"$6".txt")}}' test_input.txt
For clarity, here's a verbose version:
awk -F, ' # "-F,": using comma as field separator
{ # for each line in file
for(i=1;i<=NF;i++) { # for each column
sep = (i == NF) ? "\n" : "," # column separator
outfile = "test_"$6".txt" # output file
if (i != 6) { # skip sixth column
printf "%s%s", $i, sep > outfile
}
}
}' test_input.txt
An easy method is to use the tr command to convert the space character into #, and, after the loop over the cat output has run, to translate it back into a space.