Unix Shell Programming - shell

Have an input file with 200 lines, each line just one field which is a number.
E.g.
89970060122507635800
I need to create one output file in a way that it will look like for every input line like following:
INSERT,89970060122507635800,425062250763580,,0000,29514215,0000,29514215,,,,NORMAL,425062260621583,Blank,sim,9877
where:
All the fields have constant value (including empty values within commas) except the Second and the Third one
Second field is filled by input file, the third one is obtained by removing last digit from the second field and replace at the beginning 899700601 with 42506 (as in the example).
I'm sure I can find ways to do that (and I will try before reading the answers), but I'm more interested in knowing which would be the most efficient in your opinion: awk, sed, or a shell script using both?

This keeps the input untouched as the second field and, for the third field, trims the last digit and replaces the leading "123" with "AAA" (stand-ins for your real prefixes):
awk -v OFS="," '{$2=substr($1,1,length($1)-1); sub(/^123/,"AAA",$2); print "bla bla bla",$1,$2,"bla bla bla"}'
Replace the magic values and add the proper template to the print statement.
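Filled in with the actual values from the question, it might look like this (a sketch; the constant fields are copied verbatim from the sample output line, so check them against your real template, and input.txt/output.txt are placeholder file names):
awk -v OFS="," '{
    third = substr($1, 1, length($1) - 1)       # drop the last digit
    sub(/^899700601/, "42506", third)           # swap the prefix, as in the example
    print "INSERT", $1, third, "", "0000", "29514215", "0000", "29514215", "", "", "", "NORMAL", "425062260621583", "Blank", "sim", "9877"
}' input.txt > output.txt
For the sample input 89970060122507635800 this prints exactly the line shown above. As for efficiency: with only 200 lines any of the tools will be effectively instant, but a single awk (or sed) pass like this beats a shell read loop, which pays for external processes on every line.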

Bash: Sort file numerically, but only where the first field matches a pattern

Due to poor past naming practices, I'm left with a list of names that is proving to be a challenge to work with. The bottom line is that I want the most current name (by date) to be placed in a variable. All the names are listed (unsorted) in a file called bar.txt.
In this case I can't rename, and there's no way to get the actual dates of the images; these names are all I have to go on. The names can follow one of several patterns:
foo
YYYYMMDD-foo
YYYYMMDD##-foo
foo can be anything from a single character to a long string of letters/numbers/symbols. I am interested only in the names matching the second use case, YYYYMMDD-foo, as those are from after we started tagging consistently.
I would like to end up with a variable containing the most recent date that follows the pattern YYYYMMDD-foo.
I know sort -k1 -n < bar.txt, but then I'm not sure how to isolate the second pattern's results to extract what I need.
How do I sort the file to ignore anything but the second pattern, and return the most current date?
Sample
Given that bar.txt looks like this:
test
2017120901-develop-BUILD-31
20170326-TEST-1.2.0
20170406-BUILD-40-1.2.0-test
2010818_001
I would want to extract 20170406-BUILD-40-1.2.0-test
Since your requirement involves 1) keeping only the names of a certain format and 2) sorting them and taking only the latest one, I am using awk and GNU sort together to achieve it:
awk -F'-' 'length($1) == 8' file | sort -nrk1 | head -1
20170406-BUILD-40-1.2.0-test
The solution works by keeping only those lines whose first '-'-delimited column is exactly 8 characters long, matching the YYYYMMDD layout. Once those are filtered, a reverse numeric sort is applied on the first field and the top line is taken with head.
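Since the goal is to end up with the most recent name in a variable, the same pipeline can be wrapped in a command substitution (a sketch; the extra $1 ~ /^[0-9]+$/ test just guards against 8-character prefixes that are not all digits):
latest=$(awk -F'-' 'length($1) == 8 && $1 ~ /^[0-9]+$/' bar.txt | sort -nrk1 | head -1)
echo "$latest"    # 20170406-BUILD-40-1.2.0-test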

How to extract lines containing unique text in a column

I have a text file similar to
"3"|"0001"
"1"|"0003"
"1"|"0001"
"2"|"0001"
"1"|"0002"
i.e. a pipe-delimited text file containing quoted strings.
What I need to do is:
First, extract the first line which contains each value in the first column, producing
"3"|"0001"
"1"|"0003"
"2"|"0001"
Then, sort by the values in the first column, producing
"1"|"0003"
"2"|"0001"
"3"|"0001"
Performing the sort is easy - sort -k 1,1 -t \| - but I'm stuck on extracting the first line in the file which contains each value in the first column. I thought of using uniq but it doesn't do what I want, and its "column-handling" abilities are limited to ignoring the first 'x' columns of space-or-tab delimited text.
Using the POSIX shell (/usr/bin/sh) under HP-UX.
I'm kind of drawing a blank here. Any suggestions welcomed.
You can do:
awk -F'|' '!a[$1]++' file | sort...
The awk part removes the duplicated lines, keeping only the first occurrence of each first-column value (!a[$1]++ is true only the first time a given $1 is seen).
I don't have an HP-UX box, so I cannot run a real test, but I think it should work...
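Putting the two parts together with the sort command from the question (the quoted values sort lexically, which is fine here), the sample input produces the desired output:
awk -F'|' '!a[$1]++' file | sort -t '|' -k 1,1
"1"|"0003"
"2"|"0001"
"3"|"0001"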

AWK - I need to write a one line shell command that will count all lines that

I need to write this solution as an AWK command. I am stuck on the last question:
Write a one line shell command that will count all lines in a file called "file.txt" that begin with a decimal number in parenthesis, containing a mix of both upper and lower case letters, and end with a period.
Example(s):
This is the format of lines we want to print. Lines that do not match this format should be skipped:
(10) This is a sample line from file.txt that your script should
count.
(117) And this is another line your script should count.
Lines like this, as well as other non-matching lines, should be skipped:
15 this line should not be printed
and this line should not be printed
Thanks in advance, I'm not really sure how to tackle this in one line.
This is not a homework solution service. But I think I can give a few pointers.
One idea would be to create a counter, and then print the result at the end:
awk '<COND> {c++} END {print c}'
I'm getting a bit confused by the terminology. First you claim that the lines should be counted, but in the examples, it says that those lines should be printed.
Now of course you could do something like this:
awk '<COND>' file.txt | wc -l
The first part will print out all lines that match the condition, and the output is piped to wc -l, a separate program that counts the number of lines.
Now as to what the condition <COND> should be, I leave to you. I strongly suggest that you google regular expressions and awk, it shouldn't be too hard.
I think the requirement is very clear
Write a one line shell command that will count all lines in a file called "file.txt" that begin with a decimal number in parenthesis, containing a mix of both upper and lower case letters, and end with a period.
1. begin with a decimal number in parenthesis
2. containing a mix of both upper and lower case letters
3. end with a period
Check all three conditions. Note that 2. doesn't say "only", so other classes of characters may appear, but the line must contain at least one uppercase and one lowercase letter.
The example mixes the concepts of printing and counting; if that is part of the exercise, it is very poorly worded, or perhaps it assumes the counting will be done by wc on the piped output of a filtering script. Regardless, more attention should have been paid, especially for a student exercise.
Please comment if anything not clear and I'll add more details...
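For reference, one way to express all three conditions in awk, combined with the counter idea above (a sketch; tighten the first regex if, say, a space must follow the closing parenthesis):
awk '/^\([0-9]+\)/ && /[A-Z]/ && /[a-z]/ && /\.$/ {c++} END {print c+0}' file.txt
The first pattern anchors the parenthesized decimal number at the start of the line, the next two require at least one uppercase and one lowercase letter somewhere in it, the last requires the line to end with a period, and c+0 prints 0 rather than an empty line when nothing matches.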

Nested for each to compare two arrays doesn't match final item

I have two files of user-guids that I need to compare.
FileA contains a list sent from a client and contains duplicates and FileB is a list of user-guids from our system.
My first task is to make sure that our system has all the unique user-guids from the client's system (ie FileB contains all the user-guids that are in FileA). After that I need to determine how many of the user-guids in our system are NOT in the client's list but that's another task and is unrelated.
The files contain one guid per line so I'm reading them into arrays and using a nested for each to compare them.
Here is my code:
# Open each file of users
FileA = File.open("file_a.txt")
FileB = File.open("file_b.txt")
# Turn file_a into an array with only unique values and close the file
file_a_array = IO.readlines(FileA).uniq
FileA.close
# Turn the local file into an array, we already know each line is unique
file_b_array = IO.readlines(FileB)
FileB.close
file_a_array.each do |i|
file_b_array.each do |j|
if i == j
puts i
end
end
end
This code again is meant to return all the matches, but in reality I was seeing all the matches except one, incidentally the last one on the list of FileB.
In trying to guess at why I was not seeing the last match I noticed that the FileA had an empty line at the end of the file but FileB did not.
Here's an example:
FileA Contents:
guid_a
guid_b
guid_c
guid_d
[empty line]
FileB Contents:
guid_a
guid_aa
guid_b
guid_bb
guid_c
guid_cc
guid_d
Notice each file contains guid_d, but running my code returned only the following as matches:
guid_a
guid_b
guid_c
When I added an extra line to the end of FileB suddenly I was getting the full set.
So the question is why?
I'm adding my own answer because the two answers that are here, while technically correct, aren't very descriptive and didn't lead me to my solution. Only after I figured it out on my own did I finally understand what they were saying.
When I was loading my files into arrays using IO.readlines the contents of each array item contained a newline character \n.
So going off the example in my original question, the reason guid_d wasn't being matched is because in file_a_array, the value being used for comparison was guid_d\n and the value in file_b_array was guid_d. The line of FileB with guid_d did not contain a newline character until I added it by adding the empty last line.
Use the chomp function.
This function removes trailing line breaks from a string and is meant to be used to sanitize input read from files. As you mention in your answer, Ruby reads lines including the line break:
the reason guid_d wasn't being matched is because in file_a_array, the value being used for comparison was guid_d\n and the value in file_b_array was guid_d.
Use chomp to fix this
"guid_d\n".chomp # => "guid_d"
"guid_d".chomp # => "guid_d"
Change your program to use
IO.readlines(...).map(&:chomp)...
You have left out some important information about the file format and how you were reading in the contents, so I am going to make an educated guess that your comparisons are including a newline or return character. Therefore the last item in the list was different until you added the newline character.
So my question is why did I need to have an empty line in order for the last item to show up in my results of matched items?
Because then the last item will also end with a newline like all the others.

Using sed to modify line not containing string

I am trying to write a bash script that uses sed to modify lines in a config file not containing a specific string. To illustrate by example, I could have ...
/some/file/path1 ipAddress1/subnetMask(rw,sync,no_root_squash)
/some/file/path2 ipAddress1/subnetMask(rw,sync,no_root_squash,anonuid=-1)
/some/file/path3 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=0)
/some/file/path4 ipAddress2/subnetMask(rw,sync,no_root_squash,anongid=-1)
/some/file/path5 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
And I want every line's parenthetical list to be changed such that it contains strings anonuid=-1 and anongid=-1 within its parentheses ...
/some/file/path1 ipAddress1/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
/some/file/path2 ipAddress1/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
/some/file/path3 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
/some/file/path4 ipAddress2/subnetMask(rw,sync,no_root_squash,anongid=-1,anonuid=-1)
/some/file/path5 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
As can be seen from the example, both anonuid and anongid may already exist within the parentheses, but it is possible that the original parenthetical list has one string but not the other (lines 2, 3, and 4), the list has neither (line 1), the list has both already set properly (line 5), or even one or both of them are set incorrectly (line 3). When either anonuid or anongid is set to a value other than -1, it must be changed to the proper value of -1 (line 3).
What would be the best way to edit my config file using sed such that anonuid=-1 and anongid=-1 is contained in each line's parenthetical list, separated by a comma delimiter of course?
I think this does what you want:
sed -e '/anonuid/{s/anonuid=[-0-9]*/anonuid=-1/;b gid;};s/)$/,anonuid=-1)/;:gid;/anongid/{s/anongid=[-0-9]*/anongid=-1/;b;};s/)$/,anongid=-1)/'
Basically, it has two nearly identical parts with the first dealing with anonuid and the second anongid, each with a bit of logic to decide if it needs to replace or add the appropriate values. (It doesn't bother to check if the value is already correct, that would just complicate things while not changing the results.)
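The same logic can also be written without the branch commands, as four separate expressions (a sketch; each pair first fixes an existing value, then appends the key only on lines that still lack it):
sed -e '/anonuid/ s/anonuid=[-0-9]*/anonuid=-1/' \
    -e '/anonuid/! s/)$/,anonuid=-1)/' \
    -e '/anongid/ s/anongid=[-0-9]*/anongid=-1/' \
    -e '/anongid/! s/)$/,anongid=-1)/' $file
Run over the five sample lines above, this produces the desired output, including the swapped order on the line that originally had only anongid.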
You can use sed to specify the lines you are interested in:
$ sed -n '/anonuid=..*,anongid=..*)$/!p' $file
The above will print (p) all lines that don't match the regular expression between the two slashes (the -n suppresses the default printing of every line). I negated the expression by using the !. This way, you're not matching lines that already have both anonuid and anongid in them.
Now, you can work on the non-matching lines and editing those with the sed s command:
$ sed '/anonuid=..*,anongid=..*)$/!s/from/to/' $file
The manipulation might be fairly complex, and you might be passing multiple sed commands to get everything just right.
However, if the string no_root_squash appear in each line you want to change, why not take the simple way out:
$ sed 's/no_root_squash.*$/no_root_squash,anonuid=-1,anongid=-1)/' $file
This is looking for that no_root_squash string, and replacing everything from that string to the end of the line with the text you want. Are there lines you are touching that don't need to be edited? Yes, but you're not really changing those lines. You're basically substituting /no_root_squash,anonuid=-1,anongid=-1) with the same /no_root_squash,anonuid=-1,anongid=-1).
This may be faster even though it's replacing text that doesn't need replacing because there's less processing going on. Plus, it's easier to understand and support in the future.
Response
Thanks David! Yeah I was considering going that route, but I didn't want to rely 100% on every line containing no_root_squash. My current config file only ends in that string, but I'm just not 100% sure that won't potentially be different in the field. Do you think there would be a way to change that so it just overwrites from the end of the last string not containing anonuid=-1 or anongid=-1 onward?
What can you guarantee will be in each line?
You might be able to do a capture group:
sed 's/\(sync,[^,)]*\).*/\1,anonuid=-1,anongid=-1)/' $file
The \(..\) is a capture group. It basically captures that portion of the matching regular expression, and then allows you to reuse it via the \1. I'm capturing from the word sync through a run of characters that contains neither a comma nor a closing parenthesis. Then, I'm appending the capture group, a comma, and your anonuid and anongid settings.
Will that work?
Maybe I am oversimplifying:
sed 's/anonuid=[-0-9]*[^)]//g;s/anongid=[-0-9]*[^)]//g;s/,*)/,anonuid=-1,anongid=-1)/g' test.txt > test3.txt
This just drops any current instance of anonuid or anongid and then rewrites the closing parenthesis (together with any leftover comma) as
",anonuid=-1,anongid=-1)", so both settings appear exactly once in each parenthetical list.
