Ruby execute shell command and get array

I'm getting a string of a few lines from the shell. Is it possible to get an Array with each line being its own element?

Sure, depending on the output you could just split it. For example:
lines = `ls`.split
This solution is independent of the method you're using to execute the program. As long as you get the complete string you can split it.
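For instance, a minimal sketch of the same idea when the command is run through Open3 instead of backticks (ls is only a placeholder command):
require 'open3'
stdout, _status = Open3.capture2('ls')  # run the command and capture its output as one string
lines = stdout.split                    # then split that string, exactly as above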

The original question was about splitting on lines, and the split function, by default, splits on whitespace. While that may be sufficient, you may want to pass in a regular expression, as in:
`ls -l`.split(/$/)
which returns each line as a separate element of the array. However, it doesn't get rid of the leading carriage return or line feed. For that, you will want to use the map function to iterate over the array and apply strip to each element, as in:
`ls -l`.split(/$/).map(&:strip)

Related

Bash string manipulation, extracting/removing parts

I'm modifying an old bash file and am having some trouble manipulating strings. The problem is that the strings can be anything random to the left of _<date>.<num>. For example, from ThisIsAString-Sub_tag_150827.1, I need to extract _150827.1. In bash, this seems very difficult to do. In any other language, I would split on _, and just grab the last element of the list. How do I do this in bash? I've tried a few different ways (including with awk), but cannot seem to get it right.
With bash's Parameter Expansion:
a="ThisIsAString-Sub_tag_150827.1"
echo "${a##*_}"
Output:
150827.1
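If you end up doing this from Ruby rather than bash, the split-and-take-the-last-element approach the question mentions is a one-liner (a sketch only):
a = "ThisIsAString-Sub_tag_150827.1"
a.split('_').last   # split on "_" and grab the last piece
# => "150827.1"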

Line count in csv doesn't match

I have a large CSV with a large number of columns. I am trying to count the number of lines using
File.open(file).readlines.to_a.compact.count.to_i
It displays 57 although there are only 56 rows. Upon close examination I found that a part of one line is wrapped to form the next line. How to get the correct count?
You need to show an example of the incoming data if you want us to help beyond generic answers.
To fix the problem, you have to be able to identify the line. We can't help you there because it could look like anything. Making a wild guess, I'd say that one of the columns had an embedded new-line in it, which forces the line to wrap.
If the file is a true CSV file, that column should be wrapped in double-quotes, so you could search the file for lines that do NOT end with whatever data type should be in the last column, then read the next line, join them, then rewrite the file. But, again, we have nothing to work with, because your file's format could be a huge number of different things.
Your best bet is to use the CSV class that comes with Ruby, and let it read the file, instead of trying to treat it like a text file. CSV files are text, but they are formatted to maintain the columns and rows, so using the CSV class will give you a better chance of getting at the data.
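For example, here's a minimal sketch using the bundled CSV class to count logical rows rather than physical lines (the filename is just a placeholder); a quoted field containing an embedded newline stays part of its row:
require 'csv'
row_count = 0
CSV.foreach('data.csv') { row_count += 1 }  # each iteration is one parsed CSV row
puts row_count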
Looking at your code:
There are a number of ways to count the number of lines in a file, including the easiest which is:
`wc -l /path/to/file`.to_i
if you're using *nix.
Using File.open(file).readlines.to_a is horribly redundant and not fast or scalable if your file is big.
readlines returns an array.
to_a returns an array.
Why turn the array into an array?
readlines loads an entire file into memory, then splits it on line ends into an array. That process can be a lot slower than simply reading the file line-by-line and incrementing a counter, plus "slurping" can make your program crawl if the file is larger than available memory.
See "Why is "slurping" a file not a good practice?" for more information.
compact removes nils from an array. readlines should never return any nils so compact will iterate over the array looking for something that shouldn't exist.
count returns an integer.
to_i converts the receiver to an integer.
In other words, to_i is turning an integer into an integer. Why?
If you want to do it in Ruby instead of using wc -l, do something simple and fast:
lines_in_file = 0
File.foreach(some_file) { lines_in_file += 1 }
After running that, lines_in_file will contain the number of lines read. Memory won't be impacted and it'll run like blue blazes on huge files.

How do I filter file names out of a SQLite dump?

I'm trying to filter out all file names from an SQLite text dump using Ruby. I'm not very handy/familiar with regex and need a way to read, and write to a file, another dump of image files that are within the SQLite dump. I can filter out everything except stuff like this:
VALUES(3,5,1,43,'/images/e/e5/Folder%2FOrders%2FFinding_Orders%2FView_orders3.JPG','1415',NULL);
and this:
src="/images/9/94/folder%2FGraph.JPG"
I can't figure out the easiest way to filter through this. I've tried using split and other functions, but instead of splitting the string into an array by the character specified, it just removed the character.
You should be able to use .gsub('%2', ' ') to replace the %2 with a space; while it's quoted, it should be fine.
split does remove the character it splits on, though. So you may not want to do that, or if you do, you may want to use Array#join with the character you split on as its argument to put it back in.
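For example, just to illustrate the round trip, using a value from your dump:
'folder%2FGraph.JPG'.split('%2F')    # => ["folder", "Graph.JPG"]   (the %2F is gone)
['folder', 'Graph.JPG'].join('%2F')  # => "folder%2FGraph.JPG"      (join puts it back)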
I want to 'extract' the file name from the statements above. Say I have src="/images/9/94/folder%2FGraph.JPG", I want folder%2FGraph.JPG to be extracted out.
If you want to extract what is inside the src parameter:
foo = 'src="/images/9/94/folder%2FGraph.JPG"'
foo[/^src="(.+)"/, 1]
=> "/images/9/94/folder%2FGraph.JPG"
That returns the string without the surrounding double quotes.
Here's how to do the first one:
bar = "VALUES(3,5,1,43,'/images/e/e5/Folder%2FOrders%2FFinding_Orders%2FView_orders3.JPG','1415',NULL);"
bar.split(',')[4][1..-2]
=> "/images/e/e5/Folder%2FOrders%2FFinding_Orders%2FView_orders3.JPG"
Not everything in programming is a regex problem. Some things, actually, in my opinion most things, are not candidates for a pattern. For instance, the first example could be written:
foo.split('=')[1][1..-2]
and the second:
bar[/'(.+?)'/, 1]
The idea is to use whichever is most clean and clear and understandable.
If all you want is the filename, then use a method designed to return only the filename.
Use one of the above and pass its output to File.basename; File.basename returns only the filename and extension.
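For example, using the string extracted above:
File.basename("/images/9/94/folder%2FGraph.JPG")
# => "folder%2FGraph.JPG"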

How can I split a string into an array in one operation, but only when the line contains a given pattern?

I have to match a line in a file and capture the line's contents.
The line is as follows:
key:value key:value abc:123
I have a block of code processing different lines in the file based on the line content.
The above line can be identified by the key "abc" being present in the line.
I need one regex which does the following
Check if "abc" is present in the line
if "abc" is present get the contents in the form of an array
I am able to do these separately
#gives me an array of the key,value pairs
array = line.scan(/\w+:\d+/)
#matches "abc:value" but does not give me the other keys
/.*(abc:\d+)/.match(line)
I am looking for a way to do this in one operation.
Don't Complicate Things
A regular expression, especially a single monolithic one, isn't the solution for everything. Even when it's possible, overly complex expressions don't make your code more readable or more maintainable. Unless your employer is charging you for each line of code, don't be afraid to use multiple lines of code to express a concept.
Use a Conditional Expression
You can use a conditional expression in your statement to match within a single line. For example:
line = 'key:value key:value abc:123'
line.scan /(\S+:\S+)/ if line =~ /abc:/
# => [["key:value"], ["key:value"], ["abc:123"]]
This will only split the line into an array of matches if it first matches the condition in the if statement. However, note that you're still fundamentally doing two regular expression matches.
If you're trying to avoid performing two regular expression matches, perhaps for performance reasons inside a tight loop, you can do something similar with a string pattern match as your condition. For example:
line = 'key:value key:value abc:123'
line.scan /(\S+:\S+)/ if line.include? 'abc:'
# => [["key:value"], ["key:value"], ["abc:123"]]
The results are the same, but String#scan uses a regular expression match while the conditional uses String#include?. The latter may be faster.
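If the difference matters in your case, you can measure it yourself; here's a rough sketch with the standard Benchmark module (the iteration count is arbitrary):
require 'benchmark'
line = 'key:value key:value abc:123'
n = 500_000
Benchmark.bm(10) do |x|
  x.report('=~ regex') { n.times { line =~ /abc:/ } }          # regular expression match
  x.report('include?') { n.times { line.include?('abc:') } }   # plain string search
end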
How about:
array = line.scan(/\w+:\d+/) if line[/abc:\d+/]

Using sed to modify line not containing string

I am trying to write a bash script that uses sed to modify lines in a config file not containing a specific string. To illustrate by example, I could have ...
/some/file/path1 ipAddress1/subnetMask(rw,sync,no_root_squash)
/some/file/path2 ipAddress1/subnetMask(rw,sync,no_root_squash,anonuid=-1)
/some/file/path3 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=0)
/some/file/path4 ipAddress2/subnetMask(rw,sync,no_root_squash,anongid=-1)
/some/file/path5 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
And I want every line's parenthetical list to be changed such that it contains the strings anonuid=-1 and anongid=-1 within its parentheses ...
/some/file/path1 ipAddress1/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
/some/file/path2 ipAddress1/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
/some/file/path3 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
/some/file/path4 ipAddress2/subnetMask(rw,sync,no_root_squash,anongid=-1,anonuid=-1)
/some/file/path5 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
As can be seen from the example, both anonuid and anongid may already exist within the parentheses, but it is possible that the original parenthetical list has one string but not the other (lines 2, 3, and 4), the list has neither (line 1), the list has both already set properly (line 5), or even one or both of them are set incorrectly (line 3). When either anonuid or anongid is set to a value other than -1, it must be changed to the proper value of -1 (line 3).
What would be the best way to edit my config file using sed such that anonuid=-1 and anongid=-1 is contained in each line's parenthetical list, separated by a comma delimiter of course?
I think this does what you want:
sed -e '/anonuid/{s/anonuid=[-0-9]*/anonuid=-1/;b gid;};s/)$/,anonuid=-1)/;:gid;/anongid/{s/anongid=[-0-9]*/anongid=-1/;b;};s/)$/,anongid=-1)/'
Basically, it has two nearly identical parts with the first dealing with anonuid and the second anongid, each with a bit of logic to decide if it needs to replace or add the appropriate values. (It doesn't bother to check if the value is already correct, that would just complicate things while not changing the results.)
You can use sed to specify the lines you are interested in:
$ sed -n '/anonuid=..*,anongid=..*)$/!p' $file
The above will print (p) all lines that don't match the regular expression between the two slashes (the -n suppresses sed's default printing). I negated the expression by using the !. This way, you're not matching lines with both anonuid and anongid in them.
Now, you can work on the non-matching lines and editing those with the sed s command:
$ sed '/anonuid=..*,anongid=..*)$/!s/from/to/' $file
The manipulation might be fairly complex, and you might be passing multiple sed commands to get everything just right.
However, if the string no_root_squash appears in each line you want to change, why not take the simple way out:
$ sed 's/no_root_squash.*$/no_root_squash,anonuid=-1,anongid=-1)/' $file
This is looking for that no_root_squash string, and replacing everything from that string to the end of the line with the text you want. Are there lines you are touching that don't need to be edited? Yes, but you're not really changing those lines. You're basically substituting /no_root_squash,anonuid=-1,anongid=-1) with the same /no_root_squash,anonuid=-1,anongid=-1).
This may be faster even though it's replacing text that doesn't need replacing because there's less processing going on. Plus, it's easier to understand and support in the future.
Response
Thanks David! Yeah I was considering going that route, but I didn't want to rely 100% on every line containing no_root_squash. My current config file only ends in that string, but I'm just not 100% sure that won't potentially be different in the field. Do you think there would be a way to change that so it just overwrites from the end of the last string not containing anonuid=-1 or anongid=-1 onward?
What can you guarantee will be in each line?
You might be able to do a capture group:
sed 's/\(sync,[^,)]*\).*/\1,anonuid=-1,anongid=-1)/' $file
The \(..\) is a capture group. It basically captures that portion of the matching regular expression, and then allows you to reuse it via the \1. I'm capturing from the word sync to a group of characters not including a comma or a closing parenthesis. Then, I'm appending the capture group, a comma, and your anon uid and gid.
Will that work?
Maybe I am oversimplifying:
sed 's/,anonuid=[-0-9]*//g;s/,anongid=[-0-9]*//g;s/[)]/,anonuid=-1,anongid=-1)/g' test.txt > test3.txt
This just drops any current instance of anonuid or anongid and adds the string "anonuid=-1,anongid=-1" into the parentheses.

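If you ever do this from Ruby instead of sed, the same normalization can be sketched with gsub and sub; this is just an illustration based on the sample lines above, not part of the sed answer:
line = '/some/file/path3 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=0)'
stripped = line.gsub(/,anon[ug]id=[-0-9]+/, '')           # drop any existing anonuid/anongid entries
fixed = stripped.sub(/\)$/, ',anonuid=-1,anongid=-1)')    # add the canonical pair before the ")"
# fixed => "/some/file/path3 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)"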