Slow Rust Performance for a SSH Log Parsing project - performance

I'm a student who is interested in learning rust. For a class project I wrote a rust script that parses a SSH log file, which specifically captured dates and IP addresses in the log.
When I first finished the project, the script took 3 minutes to run through a log file with 655147 entries. After major optimizations I got the processing down to 30 seconds. This is fine, but other students' python programs did it in 3 seconds. So I know it's definitely my fault and I want to know how to write it better. Could someone show me where I went wrong?
Here are the structs I made for reference:
struct DateLogins {
date: NaiveDate,
success: i32,
failure: i32,
}
struct IpAuth {
success: i32,
failure: i32,
first_attempt: NaiveDateTime,
successful_attempt: NaiveDateTime,
failed_reverse: bool,
break_in_attempt: bool,
ip_addr: String,
}
struct MinedReport {
start_date: NaiveDateTime,
end_date: NaiveDateTime,
total_success: i32,
total_failure: i32,
total_addrs: i32,
login_attempts: HashMap<String, DateLogins>,
unique_addrs: HashMap<String, IpAuth>,
}
And here is the main processing logic:
lazy_static! {
static ref IP_RGX: Regex = Regex::new(r"(\d{1,3}\.\d{1,3}\.\d{1,3}\d\.\d{1,3})").unwrap(); // regex for capturing the IP address of a message.
static ref LOGIN_GOOD_RGX: Regex = Regex::new(r"Accepted password").unwrap(); // regex for successful login attempts -- for total, date, and IP address.
static ref LOGIN_FAIL_RGX: Regex = Regex::new(r"Failed password").unwrap(); // regex for failed login attempts -- for total, date, and IP address
static ref REVERSE_LOOK_RGX: Regex = Regex::new(r"reverse mapping checking getaddrinfo").unwrap(); // regex for a failed reverse lookup
static ref BREAK_IN_RGX: Regex = Regex::new(r"POSSIBLE BREAK-IN ATTEMPT!").unwrap(); // regex for a break in attempt -- for IP address
}
fn main() {
let start: std::time::Instant;
let duration: std::time::Duration;
let file: File = File::open("./SSH.log").expect("Could not open log file!");
let reader: BufReader<File> = BufReader::new(file);
let origin_date: NaiveDateTime = NaiveDateTime::parse_from_str("0000 01 01 00:00:00", "%Y %m %d %H:%M:%S").expect("Could not parse start time!");
let mut report: MinedReport = MinedReport::new(origin_date.clone(), origin_date.clone());
start = Instant::now();
for line in reader.lines() {
let l = line.expect("Could not read a line!");
parse_line(l, &mut report, origin_date);
}
duration = start.elapsed();
store_log(report, "./report.txt");
println!("Total time elapsed: {:?}\n", duration);
}
// parses the values in each line
fn parse_line(line: String, report: &mut MinedReport, origin_date: NaiveDateTime) {
let time: &str = &line[..15];
let message: &str = &line[15..];
// parse the time to DateTime. Store things in the report.
let date_time: NaiveDateTime = parse_time(time, report, origin_date).unwrap();
// parse the IP address of the line. Store things in the report.
parse_ip(message, report, origin_date, date_time);
}
// parses the time value for each line.
fn parse_time(time_cap: &str, report: &mut MinedReport, origin_date: NaiveDateTime) -> Result<NaiveDateTime, ParseError> {
// add a random year just to have a string.
let time_str: String = time_cap.to_string();
let full_date: String;
let date_time: NaiveDateTime;
let date: NaiveDate;
let d: String;
// add a year. Move to the next year if it's january.
// (I know this is a bad solution, but the only months in the log are Dec and Jan, with no year)
if &time_str[..3] == "Dec" {
full_date = format!("{}{}", "0000 ", time_cap); // No year given, set it to 0000
} else {
full_date = format!("{}{}", "0001 ", time_cap); // No year given, set it to 0001
}
// get date time and date only.
date_time = NaiveDateTime::parse_from_str(&full_date, "%Y %b %d %H:%M:%S").unwrap();
date = date_time.date();
d = date.to_string();
if report.start_date == origin_date {
report.start_date = date_time;
}
report.end_date = date_time;
if !report.login_attempts.contains_key(&d) {
report.login_attempts.insert(d, DateLogins::new(date));
}
Ok(date_time)
}
fn parse_ip(message: &str, report: &mut MinedReport, origin_date: NaiveDateTime, current_date: NaiveDateTime) {
for ip in IP_RGX.captures_iter(message) {
let new_ip: String = String::from(&ip[1]);
let ip_clone: String = new_ip.clone();
let date: NaiveDate = current_date.date();
let d: String = date.to_string();
report.total_addrs += 1;
if !report.unique_addrs.contains_key(&new_ip) {
report.unique_addrs.insert(new_ip, IpAuth::new(ip_clone.clone(), origin_date.clone(), current_date.clone()));
}
let login_date = report.login_attempts.get_mut(&d).unwrap();
let unique_ip = report.unique_addrs.get_mut(&ip_clone).unwrap();
if LOGIN_FAIL_RGX.is_match(message) {
report.total_failure += 1;
login_date.failure += 1;
unique_ip.failure += 1;
} else if LOGIN_GOOD_RGX.is_match(message) {
report.total_success += 1;
login_date.success += 1;
unique_ip.success += 1;
unique_ip.successful_attempt = current_date.clone();
} else {
if REVERSE_LOOK_RGX.is_match(message) {
unique_ip.failed_reverse = true;
}
if BREAK_IN_RGX.is_match(message) {
unique_ip.break_in_attempt = true;
}
}
}
}
Like I said I'm new to rust, and programming in general, so there may be something I just don't know about. I already switched to using a hash map from a vector, but maybe there's something better I can use? I don't know. I have also wondered if the chrono or regex crates are my issue here and maybe there's a faster alternative. Either way, thanks to anyone who tries to understand and correct my code!

Related

Trying to read MacOS clipboard contents

On my adventure to learn Rust I decided to try and print to the cli contents of the clipboard. I've done this before in Swift so thought I would have much issues in Rust.
However I'm having a hard time printing the contents of the returned NSArray. I've spent a few hours playing around with different functions but haven't made much progress.
The Swift code I have that works:
import Foundation
import AppKit
let pasteboard = NSPasteboard.general
func reload() -> [String]{
var clipboardItems: [String] = []
for element in pasteboard.pasteboardItems! {
if let str = element.string(forType: NSPasteboard.PasteboardType(rawValue: "public.utf8-plain-text")) {
clipboardItems.append(str)
}
}
return clipboardItems;
}
// Access the item in the clipboard
while true {
let firstClipboardItem = reload()
print(firstClipboardItem);
sleep(1);
}
Here is the Rust code:
use cocoa::appkit::{NSApp, NSPasteboard, NSPasteboardReading, NSPasteboardTypeString};
use cocoa::foundation::NSArray;
fn main() {
unsafe {
let app = NSApp();
let pid = NSPasteboard::generalPasteboard(app);
let changec = pid.changeCount();
let pid_item = pid.pasteboardItems();
if pid_item.count() != 0 {
let items = &*pid_item.objectAtIndex(0);
println!("{:?}", items);
}
println!("{:?}", *pid.stringForType(NSPasteboardTypeString));
}
}
The code above produces: *<NSPasteboardItem: 0x6000021a3de0>*
EDIT:
I've made a little progress but stuck on one last bit. I've managed to get the first UTF8 char out of the clipboard.
The issue I have is if I copy the text: World the system will loop the correct amount of times for the word length but will only print the first letter, in this case W. Output below:
TEXT 'W'
TEXT 'W'
TEXT 'W'
TEXT 'W'
TEXT 'W'
The bit I'm trying to get my head around is how to move to the next i8. I can't seem to find a way to point to the next i8.
The NSString function UTF8String() returns *const i8. I'm scratching my head with how one would walk the text.
use cocoa::appkit::{NSApp, NSPasteboard, NSPasteboardTypeString};
use cocoa::foundation::{NSArray, NSString};
fn main() {
unsafe {
let app = NSApp();
let pid = NSPasteboard::generalPasteboard(app);
let changec = pid.changeCount();
let nsarray_ptr = pid.pasteboardItems();
if nsarray_ptr.count() != 0 {
for i in 0..NSArray::count(nsarray_ptr) {
let raw_item_ptr = NSArray::objectAtIndex(nsarray_ptr, i);
let itm = raw_item_ptr.stringForType(NSPasteboardTypeString);
for u in 0..itm.len() {
let stri = itm.UTF8String();
println!("TEXT {:?}", *stri as u8 as char);
}
}
}
}
}
To everyone who's looked/commented on this so far thank you.
After reading some tests provided by cocoa I figured out what I needed to do.
The code below prints the contents of the clipboard. Thanks to those who pointed me in the right direction.
use cocoa::appkit::{NSApp, NSPasteboard, NSPasteboardTypeString};
use cocoa::foundation::{NSArray, NSString};
use std::{str, slice};
fn main() {
unsafe {
let app = NSApp();
let pid = NSPasteboard::generalPasteboard(app);
let nsarray_ptr = pid.pasteboardItems();
if nsarray_ptr.count() != 0 {
for i in 0..NSArray::count(nsarray_ptr) {
let raw_item_ptr = NSArray::objectAtIndex(nsarray_ptr, i);
let itm = raw_item_ptr.stringForType(NSPasteboardTypeString);
let stri = itm.UTF8String() as *const u8;
let clipboard = str::from_utf8(slice::from_raw_parts(stri, itm.len()))
.unwrap();
println!("{}", clipboard);
}
}
}
}

Ownership question (case with immutable and mutable borrow)

I have a newbie question about ownership, I'm trying to update (+= 1) on the last bytes and print out the UTF-8 characters.
But I have mutable borrow to the String s in order to change the last byte thus I can't print it (using immutable borrow).
What would be the Rustacean way to do so?
Note: I'm aware I'm not doing it properly, I'm at learning stage, thanks.
fn main() {
let s = vec![240, 159, 140, 145];
let mut s = unsafe {
String::from_utf8_unchecked(s)
};
unsafe {
let bytes = s.as_bytes_mut(); // mutable borrow occurs here
for _ in 0..7 {
println!("{}", s); // Crash here as immutable borrow occurs here
bytes[3] += 1;
}
}
println!("{}", s);
}
You can use std::str::from_utf8 to make a &str from bytes to print it as a string.

Insert into hashmap in a loop

I'm opening a CSV file and reading it using BufReader and splitting each line into a vector. Then I try to insert or update the count in a HashMap using a specific column as key.
let mut map: HashMap<&str, i32> = HashMap::new();
let reader = BufReader::new(input_file);
for line in reader.lines() {
let s = line.unwrap().to_string();
let tokens: Vec<&str> = s.split(&d).collect(); // <-- `s` does not live long enough
if tokens.len() > c {
println!("{}", tokens[c]);
let count = map.entry(tokens[c].to_string()).or_insert(0);
*count += 1;
}
}
The compiler kindly tells me s is shortlived. Storing from inside a loop a borrowed value to container in outer scope? suggests "owning" the string, so I tried to change
let count = map.entry(tokens[c]).or_insert(0);
to
let count = map.entry(tokens[c].to_string()).or_insert(0);
but I get the error
expected `&str`, found struct `std::string::String`
help: consider borrowing here: `&tokens[c].to_string()`
When I prepend ampersand (&) the error is
creates a temporary which is freed while still in use
note: consider using a `let` binding to create a longer lived
There is some deficiency in my Rust knowledge about borrowing. How can I make the hashmap own the string passed as key?
The easiest way for this to work is for your map to own the keys. This means that you must change its type from HasMap<&str, i32> (which borrows the keys) to HashMap<String, i32>. At which point you can call to_string to convert your tokens into owned strings:
let mut map: HashMap<String, i32> = HashMap::new();
let reader = BufReader::new(input_file);
for line in reader.lines() {
let s = line.unwrap().to_string();
let tokens:Vec<&str> = s.split(&d).collect();
if tokens.len() > c {
println!("{}", tokens[c]);
let count = map.entry(tokens[c].to_string()).or_insert(0);
*count += 1;
}
}
Note however that this means that tokens[c] will be duplicated even if it was already present in the map. You can avoid the extra duplication by trying to modify the counter with get_mut first, but this requires two lookups when the key is missing:
let mut map: HashMap<String, i32> = HashMap::new();
let reader = BufReader::new(input_file);
for line in reader.lines() {
let s = line.unwrap().to_string();
let tokens:Vec<&str> = s.split(&d).collect();
if tokens.len() > c {
println!("{}", tokens[c]);
if let Some (count) = map.get_mut (tokens[c]) {
*count += 1;
} else {
map.insert (tokens[c].to_string(), 1);
}
}
}
I don't know of a solution that would only copy the key when there was no previous entry but still do a single lookup.

std::process::Command cannot run hdiutil on macOS (mount failed - No such file or directory) but the command works fine when run in the terminal

hdiutils, when fed a correct path to a valid file, returns error 2, no such file or directory. When I join the indices of the command array with " ", print them, copy them and run the exact string in a terminal, it works fine.
This is the function edited to contain only the relevant bits. In order to reproduce my error, you will need a disk image located at ~/Downloads/StarUML.dmg.
use std::env;
use std::fs;
use std::process::Command;
fn setup_downloads(download_name: &str) {
let downloads_path: String = {
if cfg!(unix) {
//these both yield options to unwrap
let path = env::home_dir().unwrap();
let mut downloads_path = path.to_str().unwrap().to_owned();
downloads_path += "/Downloads/";
downloads_path
} else {
"we currently only support Mac OS".to_string()
}
};
let files_in_downloads =
fs::read_dir(&downloads_path).expect("the read_dir that sets files_in_downloads broke");
let mut file_path: String = "None".to_string();
for file_name in files_in_downloads {
let file_name: String = file_name
.expect("the pre string result which sets file_name has broken")
.file_name()
.into_string()
.expect("the post string result which sets file_name has broken")
.to_owned();
if file_name.contains(&download_name) {
file_path = format!("'{}{}'", &downloads_path, &file_name);
}
}
let len = file_path.len();
if file_path[len - 4..len - 1] == "dmg".to_string() {
let mount_command = ["hdiutil", "mount"];
let output = Command::new(&mount_command[0])
.arg(&mount_command[1])
.arg(&file_path)
.output()
.expect("failed to execute mount cmd");
if output.status.success() {
println!(
"command successful, returns: {}",
String::from_utf8_lossy(&output.stderr).into_owned()
);
} else {
println!(
"command failed, returns: {}",
String::from_utf8_lossy(&output.stderr).into_owned()
);
}
}
}
fn main() {
setup_downloads(&"StarUML".to_string());
}
Split your Command into a variable and print it using the debugging formatter after you have specified the arguments:
let mut c = Command::new(&mount_command[0]);
c
.arg(&mount_command[1])
.arg(&file_path);
println!("{:?}", c);
This outputs
"hdiutil" "mount" "\'/Users/shep/Downloads/StarUML.dmg\'"
Note that Command automatically provides quoting for each argument, but you have added your own set of single quotes:
format!("'{}{}'", &downloads_path, &file_name);
// ^ ^
Remove these single quotes.

Rust: explicit type definition syntax in for loop [duplicate]

C++ example:
for (long i = 0; i < 101; i++) {
//...
}
In Rust I tried:
for i: i64 in 1..100 {
// ...
}
I could easily just declare a let i: i64 = var before the for loop
but I'd rather learn the correct way to doing this, but this resulted in
error: expected one of `#` or `in`, found `:`
--> src/main.rs:2:10
|
2 | for i: i64 in 1..100 {
| ^ expected one of `#` or `in` here
You can use an integer suffix on one of the literals you've used in the range. Type inference will do the rest:
for i in 1i64..101 {
println!("{}", i);
}
No, it is not possible to declare the type of the variable in a for loop.
Instead, a more general approach (e.g. applicable also to enumerate()) is to introduce a let binding by destructuring the item inside the body of the loop.
Example:
for e in bytes.iter().enumerate() {
let (i, &item): (usize, &u8) = e; // here
if item == b' ' {
return i;
}
}
If your loop variable happens to be the result of a function call that returns a generic type:
let input = ["1", "two", "3"];
for v in input.iter().map(|x| x.parse()) {
println!("{:?}", v);
}
error[E0284]: type annotations required: cannot resolve `<_ as std::str::FromStr>::Err == _`
--> src/main.rs:3:37
|
3 | for v in input.iter().map(|x| x.parse()) {
| ^^^^^
You can use a turbofish to specify the types:
for v in input.iter().map(|x| x.parse::<i32>()) {
// ^^^^^^^
println!("{:?}", v);
}
Or you can use the fully-qualified syntax:
for v in input.iter().map(|x| <i32 as std::str::FromStr>::from_str(x)) {
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
println!("{:?}", v);
}
See also:
How do I imply the type of the value when there are no type parameters or ascriptions?
There used to be a discussion where this was requested, which was followed up by an actual RFC.
It seems like the discussion was postponed, though, because not enough people really cared about the topic.
Currently, if you absolutely want to annotate, it seems like the best option you have is:
fn main() {
let my_vec: Vec<i32> = vec![-1, 22, -333];
for i in my_vec.iter() {
let _: &i32 = i;
println!("{}", i);
}
}
As you can see, this fails if the type doesn't match:
fn main() {
let my_vec: Vec<i32> = vec![-1, 22, -333];
for i in my_vec.iter() {
let _: &i16 = i;
println!("{}", i);
}
}
--> src/main.rs:4:23
|
4 | let _: &i16 = i;
| ---- ^ expected `i16`, found `i32`
| |
| expected due to this
|
= note: expected reference `&i16`
found reference `&i32`
Of course due to automatic dereferencing this method cannot differentiate between &i32 and &&i32, which might be a problem in some cases:
fn main() {
let my_vec: Vec<i32> = vec![-1, 22, -333];
for i in my_vec.iter() {
let _: &i32 = &i; // Compiles, but the right side is &&i32
println!("{}", i);
}
}
But in general this should bring enough confidence to potential reviewers, in my opinion.
Try casting with as:
for i in 1..100 as i64 {
// ...
}

Resources