Are there no end of line characters when using ARGF?


I have a csv file formatted like this:
Wed Mar 07 00:00:00 CET 2012,78.0
Thu Mar 08 00:00:00 CET 2012,46.0
...
I read it using standard input, and tried to match parts of each line using:
ARGF.each_line do |line|
time << line.scan(/^\w{3} (\w{3}) (\d{2}) (\d{2}:\d{2}:\d{2}) \w+ (\d{4}),([.\d]+)$/)
end
For some reason, this only returns the last line in the file.
If I copy the CSV file to a string variable, it starts to match each line correctly. If I remove the dollar sign from the regex it matches correctly as well, but I don't understand why $ doesn't work when reading from ARGF. Any ideas?

Is there a reason you have to use ARGF? You can check out the CSV class in the standard library, which gives tools to make the processing easier.
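Here's an example in which CSV.foreach yields one row at a time to the block. Because CSV detects the row separator itself (row_sep defaults to :auto), I would guess that this frees you from worrying about where lines begin or end. Note that each row is an Array of fields, so the regex is applied to the date field only:
require "csv"
time = []
CSV.foreach("path/to/file.csv") do |row|
  date, value = row  # e.g. ["Wed Mar 07 00:00:00 CET 2012", "78.0"]
  m = date.match(/\A\w{3} (\w{3}) (\d{2}) (\d{2}:\d{2}:\d{2}) \w+ (\d{4})\z/)
  time << (m.captures + [value]) if m
end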
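A likely explanation for the original symptom assumes the file has Windows-style CRLF ("\r\n") line endings. In Ruby, $ matches only in front of "\n" or at the end of the string. On a line such as "...,78.0\r\n", the [.\d]+ group is therefore followed by "\r", which $ does not accept. Only a last line with no trailing line break can match, which fits what you describe.

String#chomp strips "\r\n", "\n" or "\r", so chomping each line first should fix it. A minimal sketch follows; parse_lines and LINE_RE are hypothetical names. The helper takes any enumerable of lines, so it works with ARGF.each_line as well as a plain array.

```ruby
# Pattern from the question: month, day, time, year and value are captured.
LINE_RE = /^\w{3} (\w{3}) (\d{2}) (\d{2}:\d{2}:\d{2}) \w+ (\d{4}),([.\d]+)$/

# Hypothetical helper: collects the captures from every matching line.
# String#chomp removes a trailing "\r\n", "\n" or "\r", so $ no longer
# has to match in front of a stray carriage return.
def parse_lines(lines)
  lines.each_with_object([]) do |line, time|
    time.concat(line.chomp.scan(LINE_RE))
  end
end

sample = [
  "Wed Mar 07 00:00:00 CET 2012,78.0\r\n",
  "Thu Mar 08 00:00:00 CET 2012,46.0"   # last line, no line break
]

# Without chomp only the last line matches, as in the question.
p sample.flat_map { |l| l.scan(LINE_RE) }.size  # prints 1
p parse_lines(sample).size                      # prints 2

# Reading from standard input would look like:
#   time = parse_lines(ARGF.each_line)
```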

Related

Bash shell: How to reformat string date from a variable value

I understand how to reformat a date using the date command, and I am fine with that. However, there is a wrinkle I am struggling with: the date I want to reformat is the output of another command, so I am storing it in a variable. I can't work out the syntax for taking the output of one command, running it through date -d, and storing the result in another variable. Here is what I tried:
expdate=`get_expire_date.sh`
echo $expdate
Mon 23 Mar 2022 05:05:05 PM UTC
expdval=`date -d'($expdate)'`
echo $expdval
I get today's date, not the converted expiry date from the script output. If I leave the parentheses out, it treats $expdate as the literal text to translate and gives an error. If I leave the single quotes off, it uses the spaces in the date string as delimiters and only grabs the first token.
What am I doing wrong?

First, parameter expansion doesn't occur inside single quotes. You would need to change the single quotes
expdval=`date -d'($expdate)'`
to double quotes
expdval=`date -d"($expdate)"`
Second, GNU date treats text inside parentheses as a comment. That makes "($expdate)" effectively an empty date string, which date interprets as midnight at the start of the current day. (You'll get the same result with date -d "".)
Drop the parentheses, and you'll get the same date back (because the input format matches the default output format).
$ date -d "$expdate"
Wed Mar 23 13:05:05 EDT 2022
To actually manipulate it, you'll need an explicit output format:
$ date -d "$expdate" +%Y-%m-%d
2022-03-23
or some form of date "arithmetic":
$ date -d "$expdate + 2 days"
Fri Mar 25 13:05:05 EDT 2022
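For comparison, here is the same parse-then-reformat idea in Ruby (the language of the main question above), using Time.parse from the standard time library. It is only a sketch: Time.parse guesses the input format much as date -d does, and reformat_date is a hypothetical helper name. Unlike the date output above, the result stays in UTC rather than being converted to local time.

```ruby
require "time"

# Hypothetical helper mirroring `date -d "$expdate" +FORMAT`:
# parse a free-form date string, then format it explicitly.
def reformat_date(text, format)
  Time.parse(text).strftime(format)
end

expdate = "Mon 23 Mar 2022 05:05:05 PM UTC"  # output shown in the question

puts reformat_date(expdate, "%Y-%m-%d")      # prints 2022-03-23

# Date arithmetic: Time + Numeric adds seconds, so two days is 2 * 86_400.
later = Time.parse(expdate) + 2 * 86_400
puts later.strftime("%a %b %d %H:%M:%S %Z %Y") # prints Fri Mar 25 17:05:05 UTC 2022
```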

I found I had to use double quotes instead, like this (written with the newer $(...) command substitution instead of backticks):
expdval=$(date -d"$(get_expire_date.sh)")

how to extract date only from the creation/modified date of a file?

Let's say the file name is filetest.txt
and it is in the directory inventory/data.
How can I get the date on which this file was last modified or created?
For example, if the file was created on 3 June 2007, it should return just 3.
In other words, how do I extract only the day of the month from the file's timestamp?

You can use the date command to show the modification time of a file by passing the file with -r (--reference).
date -r filename prints the last-modified timestamp of the file.
Sat Jul 3 16:03:53 IST 2021
date +%d -r filename prints just the day you asked for, but with a leading zero.
03
date +%-d -r filename prints the day, without a leading zero.
3
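In Ruby, File.mtime returns the same last-modification time that date -r reports, as a Time object. The day of the month is then just #day, and strftime accepts the same %d and %-d directives. In the sketch below, modified_day is a hypothetical helper, and a temporary file with a forced timestamp stands in for inventory/data/filetest.txt.

```ruby
require "tempfile"

# Hypothetical helper: day of month of a file's last modification time,
# the Ruby counterpart of `date +%-d -r filename`.
def modified_day(path)
  File.mtime(path).day
end

Tempfile.create("filetest") do |file|
  # Pretend the file was last modified on 3 June 2007 (local time).
  stamp = Time.local(2007, 6, 3, 12, 0, 0)
  File.utime(stamp, stamp, file.path)

  puts File.mtime(file.path)                 # full timestamp
  puts File.mtime(file.path).strftime("%d")  # prints 03
  puts File.mtime(file.path).strftime("%-d") # prints 3
  puts modified_day(file.path)               # prints 3
end
```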

RobotFramework: Looking for xpath with spaces in search item

I am not sure how to format search strings that contain spaces. I am looking for a string in the format 21 Jun 2017:
${date_to_search_for}= Convert Date ${completion_date} date_format=%Y-%m-%d %H:%M:%S.%f result_format=%d %b %Y exclude_millis=True
Wait Until Element Is visible xpath=//*[contains(text(),\"${date_to_search_for}\")]
It gives error
Element 'xpath=//*[contains(text(),"21 Jun 2017")]' was not visible in 30 seconds
Arguments: [ 'xpath=//*[contains(text(),"21 Jun 2017")]' ]
I've tried escaping the quotes as above, and the locator ends up with ".
When I don't escape them, it ends up with \".
Any ideas?

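There's no need to trim spaces when searching with XPath. Check the original HTML element: does it contain any child elements besides the text you are searching for? If so, note that contains(text(), ...) only looks at the element's own text node, while contains(., ...) checks the element's full string value, including the text of its descendants. Change the locator to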
//*[contains(. ,'21 Jun 2017')]

Is there a regex to parse the values from an FTP directory listing?

I'm retrieving a list of files but the parser is breaking on one of them. So I'm going in to modify it:
RegEx:
/ +/g
Here is a listing it is having trouble with:
drw-rw---- 2 joeUser dhapache 7 May 18 2011 Hello World.html
Is there a RegEx people use to parse file lists? Also, is it OK to assume that anything after the 9th position is the filename (when splitting on space characters)?

I'm using this regex:
^([\-ld])([\-rwxs]{9})\s+(\d+)\s+(\w+)\s+(\w+)\s+(\d+)\s+(\w{3}\s+\d{1,2}\s+(?:\d{1,2}:\d{1,2}|\d{4}))\s+(.+)$
I tested it in both Dart (prefix the pattern with r to make it a raw string) and JavaScript, and it works well.
Here is the result:
https://regex101.com/r/8osTeQ/1
You can iterate over the matches like this:
//code in Dart
void main() {
  final regExp = RegExp(
      r'^([\-ld])([\-rwxs]{9})\s+(\d+)\s+(\w+)\s+(\w+)\s+(\d+)\s+(\w{3}\s+\d{1,2}\s+(?:\d{1,2}:\d{1,2}|\d{4}))\s+(.+)$');
  String s = "drw-rw---- 2 owner group 7 May 12 11:30 index.html";
  Iterable<RegExpMatch> matches = regExp.allMatches(s);
  for (RegExpMatch match in matches) {
    print("${match.group(1)}\n"); // type: file (-), directory (d) or link (l)
    print("${match.group(2)}\n"); // permissions
    print("${match.group(3)}\n"); // number of links
    print("${match.group(4)}\n"); // owner
    print("${match.group(5)}\n"); // group
    print("${match.group(6)}\n"); // size
    print("${match.group(7)}\n"); // date
    print("${match.group(8)}\n"); // file/dir name
  }
}
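The same regex, unchanged, also works as a Ruby literal. The sketch below runs it against the listing line from the question; parse_list_line and the field names are hypothetical labels. The trailing (.+) group keeps the space in "Hello World.html", and an IIS-style line simply returns nil instead of being mis-parsed.

```ruby
# The regex from the answer above, as a Ruby literal.
LIST_RE = /^([\-ld])([\-rwxs]{9})\s+(\d+)\s+(\w+)\s+(\w+)\s+(\d+)\s+(\w{3}\s+\d{1,2}\s+(?:\d{1,2}:\d{1,2}|\d{4}))\s+(.+)$/

# Hypothetical names for the eight capture groups, in order.
FIELDS = %i[type permissions links owner group size date name]

# Returns a Hash of named fields, or nil when the line is not in
# this Unix-style layout.
def parse_list_line(line)
  m = LIST_RE.match(line)
  m && FIELDS.zip(m.captures).to_h
end

p parse_list_line("drw-rw---- 2 joeUser dhapache 7 May 18 2011 Hello World.html")[:name]
# prints "Hello World.html"
p parse_list_line("02-11-15 03:05PM 1410887680 movie.avi")  # IIS style, prints nil
```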

If possible, use the MLSD FTP command, which has a fixed format like:
type=file;modify=20150803062903;size=1410887680; filename.avi
See RFC 3659.
Fall back to the LIST command only as a last resort for servers that do not support it (like IIS).
If you need file names only, use the NLST command. All servers support it, and it produces file names only, so no parsing is needed.
There's no defined format for the LIST command, so it's not safe to assume anything.
In particular, IIS uses a completely different format:
02-11-15 03:05PM 1410887680 movie.avi
Even for servers that use a Unix-like format, the format will differ with age of the file. If the file is new enough, its time format will be like:
-rw-r--r-- 1 ftp ftp 11 Sep 09 16:00 new_file.txt
It's also pretty common for the user and group columns to blend together if the username is too long.
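Because MLSD output has a fixed "fact=value;...; pathname" layout (RFC 3659), parsing a line like the MLSD example above needs no guesswork. A minimal Ruby sketch follows, assuming the pathname follows the first space and that the modify fact is a UTC timestamp in YYYYMMDDHHMMSS form. parse_mlsd_line is a hypothetical helper; a real client would also handle the optional fractional seconds in modify.

```ruby
# Hypothetical helper for one MLSD line: "fact=value;fact=value; pathname".
# The pathname follows the first space, so names may contain spaces.
def parse_mlsd_line(line)
  facts_part, _, name = line.chomp.partition(" ")
  facts = facts_part.split(";").map { |fact|
    key, value = fact.split("=", 2)
    [key.downcase, value]   # fact names are case-insensitive
  }.to_h
  [name, facts]
end

name, facts = parse_mlsd_line("type=file;modify=20150803062903;size=1410887680; filename.avi")
p name                 # prints "filename.avi"
p facts["size"].to_i   # prints 1410887680

# The modify fact is YYYYMMDDHHMMSS in UTC.
y, mo, d, h, mi, s = facts["modify"].unpack("a4a2a2a2a2a2").map(&:to_i)
p Time.utc(y, mo, d, h, mi, s)  # prints 2015-08-03 06:29:03 UTC
```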

If it were me, I'd ensure that your return data is formatted to fill a given number of characters per field, padded where needed with spaces. Then my RegEx would be something like:
^(.{12})(.{3})(.{20})(.{20})(.{3})(.{12})(.{30})
Where:
\1 = permissions
[...]
\6 = date
\7 = filename
Once you have the values for each match/group, trim the resulting data.
Make sense?
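Written with plain .{n} groups, the fixed-width approach can be sketched in Ruby as follows. The widths come from the answer. The padded record and the parse_fixed helper are hypothetical, and the approach only holds if you control the padding: a filename with leading spaces, for instance, would lose them in the final strip.

```ruby
# Fixed-width pattern: one .{n} group per column, widths from the answer
# (\1 permissions, ..., \6 date, \7 filename).
FIXED_RE = /^(.{12})(.{3})(.{20})(.{20})(.{3})(.{12})(.{30})/

# Hypothetical helper: match one padded record, then trim every group.
def parse_fixed(record)
  m = FIXED_RE.match(record)
  m && m.captures.map(&:strip)
end

# Hypothetical record, padded with spaces to exactly those widths.
values = ["drw-rw----", "2", "joeUser", "dhapache", "7", "May 18 2011", "Hello World.html"]
widths = [12, 3, 20, 20, 3, 12, 30]
record = values.zip(widths).map { |v, w| v.ljust(w) }.join

p parse_fixed(record) == values  # prints true
```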

Regex, how to match multiple lines?

I'm trying to match the From line all the way to the end of the Subject line in the following:
....
From: XXXXXX
Date: Tue, 8 Mar 2011 10:52:42 -0800
To: XXXXXXX
Subject: XXXXXXX
....
So far I have:
/From:.*Date:.*To:.*Subject/m
But that doesn't match to the end of the subject line. I tried adding $ but that had no effect.

You can use the /m modifier to enable multiline mode (i.e. to allow . to match newlines), and you can use ? to perform non-greedy matching:
message = <<-MSG
Random Line 1
Random Line 2
From: person#example.com
Date: 01-01-2011
To: friend#example.com
Subject: This is the subject line
Random Line 3
Random Line 4
MSG
message.match(/(From:.*Subject.*?)\n/m)[1]
=> "From: person#example.com\nDate: 01-01-2011\nTo: friend#example.com\nSubject: This is the subject line"
See http://ruby-doc.org/core/Regexp.html and search for "multiline mode" and "greedy by default".

If you are using Ruby, you can try:
Regexp.new("some reg", Regexp::MULTILINE)
If you are not using Ruby, here is a workaround:
Replace every "\n" with SOME_SPECIAL_TOKEN.
Run the regexp search and any other operations.
Restore the text: replace SOME_SPECIAL_TOKEN with "\n".
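In Ruby, Regexp.new with Regexp::MULTILINE builds the same kind of pattern as the /m literal suffix, so . is allowed to match "\n". The short comparison below uses made-up header values. Ending the pattern with [^\n]* keeps the match from running past the end of the Subject line.

```ruby
text = "From: a\nDate: b\nTo: c\nSubject: d\nBody\n"

# Single quotes keep \n as a regex escape rather than a string newline.
pattern   = 'From:.*?Subject:[^\n]*'
with_m    = Regexp.new(pattern, Regexp::MULTILINE)  # same option as /m
without_m = Regexp.new(pattern)

p without_m.match?(text)  # prints false: . stops at the first newline
p with_m.match(text)[0]   # prints "From: a\nDate: b\nTo: c\nSubject: d"
```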

If you want to match across linebreaks, one possibility is to first replace all newline characters with some other character (or character sequence) that wouldn't otherwise appear in the text. For example, if you have all of the text in one string variable you can do something like aString.split("\n").join("|") to replace all newlines in the string with pipe characters.
Also, look at Alan Moore's answer to your previous question regarding how to match the newline character in a regular expression.

Try:
/...^Subject:[^\n]*/m

Using the following data:
From: XXXXXX
Date: Tue, 8 Mar 2011 10:52:42 -0800
To: XXXXXXX
Subject: XXXXXXX
The following regex will do the magic:
From:([^\r\n]+)[\r\n]+Date:([^\r\n]+)[\r\n]+To:([^\r\n]+)[\r\n]+Subject:([^\r\n]+)[\r\n]+
But I would recommend not doing this in one regex. Instead, apply "^(\w+):(.+)$" to each line separately, unless you are sure that the order of the From/Date/To/Subject lines is never going to change ;)
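A minimal Ruby sketch of that line-by-line approach follows. parse_headers is a hypothetical helper. Lines that don't look like "Name: value" are skipped, and the header order doesn't matter. It does not handle folded (multi-line) header values.

```ruby
HEADER_RE = /^(\w+):(.+)$/

# Hypothetical helper: apply the one-line header regex to each line
# and collect name => value pairs, whatever order they come in.
def parse_headers(text)
  text.each_line.with_object({}) do |line, headers|
    m = HEADER_RE.match(line.chomp)
    headers[m[1]] = m[2].strip if m
  end
end

message = "From: XXXXXX\nDate: Tue, 8 Mar 2011 10:52:42 -0800\nTo: XXXXXXX\nSubject: XXXXXXX\n"
headers = parse_headers(message)
p headers["Date"]     # prints "Tue, 8 Mar 2011 10:52:42 -0800"
p headers["Subject"]  # prints "XXXXXXX"
```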
