interactive shell or bash script to manipulate a text file - bash

I have a text file that contains 2 columns (example below):
Account_name Device_name
12345 1a3T567890f2
The values of the Device_name column then need to be changed to:
Uppercase letters if letters exist (example 1A3T567890F2)
awk '{ print toupper($0) }' file.txt > file2.txt
Colons need to be inserted to separate the value into 2-character chunks (example 1A:3T:56:78:90:F2)
sed 's/\(\w\w\)\(\w\w\)\(\w\w\)\(\w\w\)\(\w\w\)\(\w\w\)/\1:\2:\3:\4:\5:\6/g' file2.txt > file3.txt
I would like to create a script that does those two functions at once.

You can just add \U at the start of your sed replace expression to switch the following group references to uppercase (note the -r for extended regex, as in the test run below):
sed -r 's/(\w\w)(\w\w)(\w\w)(\w\w)(\w\w)(\w\w)/\U\1:\2:\3:\4:\5:\6/g' file2.txt > file3.txt
Test run:
$ echo "1a3T567890f2" | sed -r 's/(\w\w)(\w\w)(\w\w)(\w\w)(\w\w)(\w\w)/\U\1:\2:\3:\4:\5:\6/g'
1A:3T:56:78:90:F2
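If you want those two steps wrapped up as a single reusable script, a minimal sketch could look like this (the script name is just a placeholder; \U, -r and \w are GNU sed features):
#!/usr/bin/env bash
# fix_devices.sh - uppercase the 12-character device values and insert
# colons every two characters, in one pass over the input file.
# Usage: ./fix_devices.sh file.txt > file3.txt
sed -r 's/(\w\w)(\w\w)(\w\w)(\w\w)(\w\w)(\w\w)/\U\1:\2:\3:\4:\5:\6/g' "$1"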

You can do everything in awk:
awk '{$2=toupper($2);gsub(/[[:alnum:]]{2}/,"&:", $2);sub(/:[[:space:]]*$/,"",$2)}1' file
That's a bit more intuitive, and it works for varying numbers of characters.
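For example, testing the one-liner on a sample row (GNU awk; older gawk versions may need --re-interval for the {2} interval expression):
$ echo "12345 1a3T567890f2" | awk '{$2=toupper($2);gsub(/[[:alnum:]]{2}/,"&:", $2);sub(/:[[:space:]]*$/,"",$2)}1'
12345 1A:3T:56:78:90:F2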

Related

Not getting the values of columns in shell script

I have a text file abc.txt with n columns. I want to extract only column number 5, whose heading is senddata. I am using the below command for that:
awk -F "(|~|)" '{ print $5 }' /opt/var/abc.txt
I am using the above command to extract column 5 from file abc.txt, and I am getting the column values but not the heading of the column.
The file abc.txt has data as follows:
prodid|~|prodtype|~|creationtime|~|affirmcode|~|senddata|~|city|~|country
334|~|T|~|4:09|~|BC334|~|Y|~|KG|~|ABC
443|~|F|~|4:44|~|RT548|~|Y|~|FR|~|FR
How can I achieve that?
The pipe character is special for regular expressions. A multi-character FS is treated as a regular expression. Try
awk -F '[|]~[|]' ...
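For the sample file shown above, that would look something like this (file name as in the question):
$ awk -F '[|]~[|]' '{ print $5 }' abc.txt
senddata
Y
Y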
As | has special meaning in regular expressions, you have to escape it if you want a literal |. Let file.txt content be
prodid|~|prodtype|~|creationtime|~|affirmcode|~|senddata|~|city|~|country
334|~|T|~|4:09|~|BC334|~|Y|~|KG|~|ABC
443|~|F|~|4:44|~|RT548|~|Y|~|FR|~|FR
then
awk 'BEGIN{FS="\\|~\\|"}{print $5}' file.txt
output
senddata
Y
Y
(tested in gawk 4.2.1)

How to merge 2 files just if the first field is a line with a date

I have 2 files,
file1.txt    file2.txt
---------    ---------
2-           14/07/2020 00:00:00 some text
3-           15/07/2020 00:00:01 some text
1-           some text
5-           some text
             24/07/2020 00:10:01 some text
             some text
             30/07/2020 00:20:01 some text
I am looking to create the following file:
finalResult.txt
---------------
2-14/07/2020 00:00:00 some text
3-15/07/2020 00:00:01 some text
some text
some text
1-24/07/2020 00:10:01 some text
some text
5-30/07/2020 00:20:01 some text
I tried to use the paste command:
paste file1.txt file2.txt > finalResult.txt
But it gives me wrong results
Thanks for all your help
The following perl one-liner replaces each newline that is not followed by the pattern \d\d/\d\d/\d{4} with the non-printable character ASCII 1.
perl -0777pe 's~\n(?!\d\d/\d\d/\d{4})~\x1~g' file2.txt
So, assuming the files contain no ASCII 1 character, the command can be
paste -d '' file1.txt <(perl -0777pe 's~\n(?!\d\d/\d\d/\d{4})~\x1~g' file2.txt) | tr '\1' '\n'
or it can also be done with GNU sed, changing each newline whose following line doesn't start with a digit
paste -d '' file1.txt <(sed -zr 's~\n([^0-9])~\x1\1~g' file2.txt) | tr '\1' '\n'
This question is easily answered with a quick awk:
awk '(NR==FNR){a[FNR]=$0;next}/^[0-9]{2}[/][0-9]{2}[/][0-9]{4}/{$0=a[++c] $0}1' file1.txt file2.txt
The answer consists of 3 parts:
(NR==FNR){a[FNR]=$0;next}: while we read the first file (NR==FNR), store the record/line in an array indexed by the record number FNR, and move on to the next record (next).
/^[0-9]{2}[/][0-9]{2}[/][0-9]{4}/{$0=a[++c] $0}: when a record starts with a string of the form xx/yy/zzzz, where x, y and z are decimal digits, prepend the corresponding record of file1. We keep track of this using a counter c which we increment every time we find such a match ($0=a[++c] $0). Note, we could improve the regex to properly match the date-time format, but that seems overkill here.
1: perform the default action, i.e. print $0
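With the sample files above, the one-liner should reproduce the requested finalResult.txt (GNU awk enables the {2}/{4} interval expressions by default):
$ awk '(NR==FNR){a[FNR]=$0;next}/^[0-9]{2}[/][0-9]{2}[/][0-9]{4}/{$0=a[++c] $0}1' file1.txt file2.txt
2-14/07/2020 00:00:00 some text
3-15/07/2020 00:00:01 some text
some text
some text
1-24/07/2020 00:10:01 some text
some text
5-30/07/2020 00:20:01 some text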
If, for whatever reason, your input file could contain strings which accidentally look like a date but are not valid (e.g. replace some text by 29/02/2021), then you have to do something smarter and actually validate the date-time format. With GNU awk you can do this in the following way:
awk 'function is_date(d,t,    b,s,ts) {
  split(d,b,/[^0-9]/); s=t; gsub(/:/," ",s)
  ts = mktime(b[3]" "b[2]" "b[1]" "s)
  return (ts != -1) && (d" "t)==strftime("%d/%m/%Y %T",ts)
}
(NR==FNR){a[FNR]=$0;next}(is_date($1,$2)){$0=a[++c] $0}1' file1.txt file2.txt
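As a quick sanity check of the validation idea (a sketch using the same helper; gawk's mktime normalizes the impossible 29/02/2021 to 01/03/2021, so the round trip fails):
$ printf '%s\n' "14/07/2020 00:00:00 some text" "29/02/2021 00:00:00 some text" |
  gawk 'function is_date(d,t,  b,s,ts){split(d,b,/[^0-9]/); s=t; gsub(/:/," ",s)
        ts=mktime(b[3]" "b[2]" "b[1]" "s)
        return (ts!=-1) && (d" "t)==strftime("%d/%m/%Y %T",ts)}
        {print $1, (is_date($1,$2) ? "valid" : "invalid")}'
14/07/2020 valid
29/02/2021 invalid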
General comment: always use the ISO date format of the form YYYY-mm-ddTHH:MM:SS; it makes your life really easy!

Merge multiple files into a single row file with a delimiter

UPDATED QUESTION:
I have been working on a bash script that will merge multiple text files with numerical values into a single-row text file, using a delimiter for each file's values while merging.
Example:
File1.txt has the following contents:
168321099
File2.txt has:
151304
151555
File3.txt has:
16980925
File4.txt has:
154292
149092
Now I want an output.txt file like below:
, 168321099 151304 151555 16980925 , 154292 149092
Basically, each file's values are separated by spaces, all on a single row, with a comma as the 1st and 6th field of the output row.
I tried:
cat * > out.txt
but the result is not as expected.
I am not very sure if I understood your question correctly, but I interpreted it as follows:
The set of files file1,...,filen contains a set of words which you want printed on one single line.
Each word is space-separated.
In addition to the string of words, you want the first field to be a , and between word 4 and word 5 you want another ,.
The cat+tr+awk solution:
$ cat <file1> ... <filen> | tr '\n' ' ' | awk '{$1=", "$1; $4=$4" ,"; print}'
The awk solution:
$ awk 'NR==1||NR==5{printf s",";s=" "}{printf " "$1}' <file1> ... <filen>
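For example, with the four sample files from the question, either command should print the requested line (the trailing echo just supplies the final newline that printf omits):
$ awk 'NR==1||NR==5{printf s",";s=" "}{printf " "$1}' File1.txt File2.txt File3.txt File4.txt; echo
, 168321099 151304 151555 16980925 , 154292 149092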
If tr is available on your system, you can do the following: cat * | tr "\n" " " > out.txt
tr "\n" " " translates all line breaks to spaces
If the number of lines per file is constant, then the easiest way is tr, as @Littlefinix suggested, with a couple of anonymous files to supply the commas, and an echo at the end to add an explicit newline to the output line:
cat <(echo ",") File1.txt File2.txt File3.txt <(echo ",") File4.txt | tr "\n" " " > out.txt; echo >> out.txt
out.txt is exactly what you specified:
, 168321099 151304 151555 16980925 , 154292 149092
If the number of lines per input file might vary (e.g., File2.txt has 3 or 4 lines, etc.), then placing the commas always in the 1st and 6th field will be more involved, and you'd probably need a script and not a one-liner.
The following single awk command could help you with the same.
awk 'FNR==1{count++;} {printf("%s%s",count==1||(count==(ARGC-1)&&FNR==1)?", ":" ",$0)} END{print ""}' *.txt
Adding a non-one-liner form of the solution too:
awk '
FNR==1 { count++ }
{ printf("%s%s",count==1||(count==(ARGC-1)&&FNR==1)?", ":" ",$0) }
END { print "" }
' *.txt

bash: using 2 variables from same file and sed

I have 2 files:
file1.txt
rs142159069:45000079:TACTTCTTGGACATTTCC:T 45000079
rs111285978:45000103:A:AT 45000103
rs190363568:45000168:C:T 45000168
file2.txt
rs142159069:45000079:TACTTCTTGGACATTTCC:T rs142159069
rs111285978:45000103:A:AT rs111285978
rs190363568:45000168:C:T rs190363568
Using file2.txt, I want to replace the long names (column 1 of file1.txt, which is also column 1 of file2.txt) with the short name in column 2 of file2.txt. The output file would then be:
rs142159069 45000079
rs111285978 45000103
rs190363568 45000168
I have tried reading in the columns of file2.txt, but without success:
while read -r a b
do
cat file1.txt | sed s'/$a/$b/'
done < file2.txt
I am quite new to bash. Also, I am not sure how to write an output file with my command. Any help would be deeply appreciated.
In your case, using awk or perl would be easier, if you are willing to accept an answer without sed:
awk '(NR==FNR){out[$1]=$2;next}{out[$1]=out[$1]" "$2}END{for (i in out){print out[i]} }' file2.txt file1.txt > output.txt
output.txt :
rs142159069 45000079
rs111285978 45000103
rs190363568 45000168
Note: this assumes all names in column 1 are unique, and that they are all present in both files.
explanation:
(NR==FNR){out[$1]=$2;next} : while you are parsing the first file, create a map with the name from the first column as key
{out[$1]=out[$1]" "$2} : append the value from the second column
END{for (i in out){print out[i]} } : print all the values in the map
Apparently $2 of file2 is part of $1 of file1, so you could use awk and redefine FS:
$ awk -F"[: ]" '{print $1,$NF}' file1
rs142159069 45000079
rs111285978 45000103
rs190363568 45000168
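For completeness, the while/sed loop from the question can also be made to work; the main fixes are using double quotes so the shell expands $a and $b, and editing a copy of file1.txt rather than re-printing it on every iteration. This is only a sketch: it assumes GNU sed for -i and that the names contain no characters that are special to sed.
cp file1.txt output.txt
while read -r a b
do
  # replace the long name ($a) with the short rs ID ($b) in place
  sed -i "s/$a/$b/" output.txt
done < file2.txt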

Unix shell scripting - how can I remove particular characters inside a text file?

I have one text file. This file has 5 rows and 5 columns. All the columns are separated by the "|" symbol. The 2nd column's content should be at most 7 characters long.
If the 2nd column is longer than 7 characters, I want to remove the extra characters without opening the file.
For example:
cat file1
ff|hahaha1|kjbsb|122344|jbjbnjuinnv|
df|hadb123_udcvb|sbfuisdbvdkh|122344|jbjbnjuinnv|
gf|harayhe_jnbsnjv|sdbvdkh|12234|jbjbnj|
qq|kkksks2|datetag|7777|jbjbnj|
jj|harisha|hagte|090900|hags|
In the above case, the 2nd and 3rd rows have a 2nd column longer than 7 characters. Now I want to remove those extra characters without opening the input file, using an awk or sed command.
I'm waiting for your responses guys.
Thanks in advance!!
Take a substring of length 7 from the second column with awk:
awk -F'|' -v OFS='|' '{ $2 = substr($2, 1, 7) }1' file
Now any strings longer than 7 characters will be made shorter. Any strings that were shorter will be left as they are.
The 1 at the end is the shortest true condition to trigger the default action, { print }.
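Running it on the sample rows shown in the question should give:
$ awk -F'|' -v OFS='|' '{ $2 = substr($2, 1, 7) }1' file
ff|hahaha1|kjbsb|122344|jbjbnjuinnv|
df|hadb123|sbfuisdbvdkh|122344|jbjbnjuinnv|
gf|harayhe|sdbvdkh|12234|jbjbnj|
qq|kkksks2|datetag|7777|jbjbnj|
jj|harisha|hagte|090900|hags|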
If you're happy with the changes, then you can overwrite the original file like this:
awk -F'|' -v OFS='|' '{ $2 = substr($2, 1, 7) }1' file > tmp && mv tmp file
i.e. redirect to a temporary file and then overwrite the original.
First, try
sed 's/\(^[^|]*|[^|]\{7\}\)[^|]*/\1/' file1
What is happening here? We construct the command step-by-step:
# Replace something
sed 's/hadb123_udcvb/replaced/' file1
# Remember the matched string (will be used in a later command)
sed 's/\(hadb123_udcvb\)/replaced/' file1
# Replace exactly 7 characters that are not '|' (once per line)
sed 's/\([^|]\{7\}\)/replaced/' file1
# Remove the additional characters up to the next '|'
sed 's/\([^|]\{7\}\)[^|]*/replaced/' file1
# Put back the string you remembered
sed 's/\([^|]\{7\}\)[^|]*/\1/' file1
# Extend the matched string with start-of-line (^), an any-length first field, and '|'
sed 's/\(^[^|]*|[^|]\{7\}\)[^|]*/\1/' file1
When this shows the desired output, you can add the option -i for changing the input file:
sed -i 's/\(^[^|]*|[^|]\{7\}\)[^|]*/\1/' file1
