Regex issue with building a file system crawler

I am building a crawler to search my file system for documents containing specific information, but the regex part has me perplexed. I have a test file on my desktop containing 'teststring' and a test credit card number, '4060324066583245'. The code below runs properly and finds the file containing teststring:
require 'find'
count = 0
Find.find('/') do |f| # '/' for root directory on OS X
  if f.match(/\.doc\Z/) # check if filename ends in desired format
    contents = File.read(f)
    if /teststring/.match(contents)
      puts f
      count += 1
    end
  end
end
puts "#{count} sensitive files were found"
Running this confirms that the crawler works and finds matches. However, when I run it to find the test credit card number, it fails to find a match:
require 'find'
count = 0
Find.find('/') do |f| # '/' for root directory on OS X
  if f.match(/\.doc\Z/) # check if filename ends in desired format
    contents = File.read(f)
    if /^4[0-9]{12}(?:[0-9]{3})?$/.match(contents)
      puts f
      count += 1
    end
  end
end
puts "#{count} sensitive files were found"
I checked the regex on rubular.com with 4060324066583245 as a piece of test data, which is contained in my test document, and Rubular verifies that the number is a match for the regex. To sum things up:
The crawler works on the first case using teststring - verifying that the crawler is properly scanning my file system and reading contents of the desired file type
Rubular verifies that my regex successfully matches my test credit card number 4060324066583245
The crawler fails to find the test credit card number.
Any suggestions? I'm at a loss why Rubular shows the regex as working but the script won't work when run on my machine.

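In Ruby, ^ and $ are anchors that tie the match to the start and end of a line, respectively (\A and \z anchor to the start and end of the whole string).
Therefore, ^[0-9]{4}$ will match "1234" when it sits on a line by itself, but not "12345" or " 1234 " etc. Your regex matches on Rubular because the number is the only thing on the line; in a real document it is surrounded by other content.
You should be using word boundaries instead: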
if contents =~ /\b4[0-9]{12}(?:[0-9]{3})?\b/
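
A small sketch of the difference, using the test number from the question. The constant names and the two sample strings are illustrative assumptions, not part of the original script:

```ruby
# Pattern from the question: ^ and $ tie the match to line boundaries.
ANCHORED = /^4[0-9]{12}(?:[0-9]{3})?$/
# Pattern from the answer: \b only requires a word boundary on each side.
BOUNDED = /\b4[0-9]{12}(?:[0-9]{3})?\b/

alone_on_line = "header\n4060324066583245\nfooter" # number fills a whole line
inside_text   = "card: 4060324066583245 (test)"    # other text shares the line

puts ANCHORED.match?(alone_on_line) # => true
puts ANCHORED.match?(inside_text)   # => false
puts BOUNDED.match?(inside_text)    # => true
```

Note that \b still rejects a 16-digit run embedded in a longer digit string, which is usually what you want for card numbers.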

Related

My gem cannot see a dictionary text file, I cannot get the path

I wrote an implementation of the hangman game that is played in the terminal. The game works fine, but I have problems getting my program to open the dictionary text file from which the word is generated. Below is my code for generating the word:
def word_generator(min, max)
  words = File.open("../dictionary.txt", "r").readlines.map!(&:chomp)
  level_words = words.select { |i| i.length >= min && i.length <= max }
  random_index = rand(level_words.length)
  @game_word = level_words[random_index]
end
This approach works fine when I play my game locally and the dictionary text file is just one directory level away from my Ruby file. Here is the problem:
When I package the project as a gem and install it, it throws this error: in `initialize': No such file or directory @ rb_sysopen /Users/andeladev/Desktop/paitin_hangman/bin/dictionary.txt (Errno::ENOENT). It only runs fine when I put the text file in the present working directory of the terminal.
How do I write the path passed to File.open so that the program looks for the file in the gem path rather than in the present working directory?

Try this:
file_name = File.join(File.dirname(File.expand_path(__FILE__)), '../dictionary.txt')
words = File.open(file_name, "r").readlines.map!(&:chomp)
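
A variant of the same idea, as a sketch: build the path from the source file's own directory with __dir__ (available since Ruby 2.0), so the current working directory no longer matters. The helper names dictionary_path, load_words and pick_word are hypothetical, and the '../dictionary.txt' layout is taken from the question:

```ruby
# Absolute path to dictionary.txt, one level above this source file.
def dictionary_path
  File.expand_path('../dictionary.txt', __dir__)
end

# Read the dictionary into an array of words without trailing newlines.
# The chomp: keyword needs Ruby 2.4 or newer.
def load_words(path = dictionary_path)
  File.readlines(path, chomp: true)
end

# Pick a random word whose length is between min and max (inclusive).
def pick_word(words, min, max)
  words.select { |w| w.length.between?(min, max) }.sample
end
```

Inside the gem, pick_word(load_words, 4, 8) would then read the file that ships with the gem, provided dictionary.txt is also listed in the gemspec's files.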

in `initialize': string contains null byte Ruby

I wrote the following code on my desktop and it worked fine. I downloaded it on my laptop, downloaded ruby (v1.9.3), and tried to run it but got the following error. I'm pretty sure it has to do with Ruby being used for the first time but never got this problem on my desktop when I first ran Ruby.
C:/Users/Downloads/vscript.rb:18:in 'initialize': string contains null byte (ArgumentError)
from C:/Users/Downloads/vscript.rb:18:in 'open'
from C:/Users/Downloads/vscript.rb:18:in 'main'
Line 18 is the File.open line:
File.open("filename", "r") do |f|
  # Do while there are characters in the text file
  f.each do |line|
    # Checks to see if any parts in file match the regex and inform the user
    if x = line.match(/\d\.\d\.\d{4}\.\d/)
      puts "#{x} was found in the file."
    end
  end
end

Figured it out. When I originally wrote the code, the filename used /'s to separate the folders. When I downloaded the file to my laptop, I copied its new directory from the address bar, which uses \'s. I changed those back to /'s and it works fine now.
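
The likely mechanism, sketched with a made-up path: in a double-quoted Ruby string a backslash starts an escape sequence, so \0 becomes a NUL byte and \t a tab, while single quotes or forward slashes leave the path intact. The path below is a hypothetical example, not the one from the question:

```ruby
double_quoted = "C:\temp\0report.txt" # \t becomes a tab, \0 becomes a NUL byte
single_quoted = 'C:\temp\0report.txt' # backslashes are kept literally
forward       = "C:/temp/0report.txt" # forward slashes need no escaping

puts double_quoted.include?("\0") # => true
puts single_quoted.include?("\0") # => false
puts forward.include?("\0")       # => false

begin
  File.open(double_quoted, "r") { |f| f.read }
rescue ArgumentError => e
  # Ruby rejects the path before it ever reaches the operating system.
  puts "ArgumentError: #{e.message}"
end
```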

Moving image to another folder

I'm currently reading Learn to Program. I'm at pages 91-92, where you create a program that moves images from your USB drive to a desired location and renames each image. But I get the following error when running the program. I'm using Ubuntu, as you can tell, and get "Invalid cross-device link". Any ideas on how to solve this?
pierre@ubuntu:~/ruby$ ruby move.rb
What would you like to call this batch?
IMG
Downloading 1 files: .move.rb:36:in `rename': Invalid cross-device link - (/media/SanDisk Cruzer Blade/pictures/UMG.jpg, IMG01.jpg) (Errno::EXDEV)
from move.rb:36:in `block in <main>'
from move.rb:17:in `each'
from move.rb:17:in `<main>'
This is the code
# Heres where the pictures are stored
Dir.chdir '/home/pierre/Skrivbord'
# First we find all of the pictures to be moved
pic_names = Dir['/media/SanDisk Cruzer Blade/pictures/**/*.{JPG,jpg}']
puts 'What would you like to call this batch?'
batch_name = gets.chomp
puts
print "Downloading #{pic_names.length} files: "
# This will be our counter. We'll start at 1 today,
# though normally I like to count from 0.
pic_number = 1
pic_names.each do |name|
  print '.' # This is our "progress bar".
  new_name = if pic_number < 10
    "#{batch_name}0#{pic_number}.jpg"
  else
    "#{batch_name}#{pic_number}.jpg"
  end
  # This renames the picture, but since "name" has a big long
  # path on it, and "new_name" doesn't, it also moves the file to the current
  # working directory, which is now Katy's PictureInbox folder. Since it's a
  # *move*, this effectively downloads and deletes the originals. And since this
  # is a memory card, not a hard drive, each of these takes a second or so; hence,
  # the little dots let her know that my program didn't hose her machine.
  # (Some marriage advice from your favourite author/programmer: it's all about
  # the little things.)
  # Now where were we? Oh, yeah...
  File.rename name, new_name
  # Finally, we increment the counter.
  pic_number = pic_number + 1
end
puts # This is so we aren't on progress bar line.
puts 'Done, cutie!'

You are trying to use File.rename to physically move a file from one device to another, and the system is objecting.
File.rename can rename a file, or move it within a single filesystem, but it cannot
move it across storage devices or volumes. To move a file across devices, copy it and then delete the original:
require 'fileutils'
include FileUtils
cp(old, new)
rm(old)
http://www.ruby-forum.com/topic/78627
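
FileUtils.mv wraps both steps: when source and destination are on different devices, it copies the file and then removes the original. A minimal sketch of the renaming loop built on it; move_batch and its parameters are hypothetical names, not from the book:

```ruby
require 'fileutils'

# Move each picture into dest_dir as <batch_name>01.jpg, <batch_name>02.jpg, ...
# Returns the list of new paths.
def move_batch(pic_names, dest_dir, batch_name)
  pic_names.each_with_index.map do |name, index|
    new_name = File.join(dest_dir, format('%s%02d.jpg', batch_name, index + 1))
    FileUtils.mv(name, new_name) # falls back to copy + delete across devices
    new_name
  end
end
```

With the paths from the question, the call would look like move_batch(Dir['/media/SanDisk Cruzer Blade/pictures/**/*.{JPG,jpg}'], '/home/pierre/Skrivbord', 'IMG').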

Regex works in textwrangler but something isn't right in my ruby script

Could I get someone to punch holes in my script? My regex works fine to find urls in textwrangler but when I run my script the parseducc.txt file is putting bits and pieces of things on different lines.
export = File.new("parseducc.txt" , "w+")
File.open("uccdata.txt").each_line do |line|
  line.scan(/(([a-zA-Z0-9-])+\.)+([a-zA-Z]){3,4}/) do |x|
    export.puts x
  end
end
sample output
dhl-usa.
a
m
upsfreight.
t
m
fedex.
x
m
myyellow.
w
m
My goal with this script is to scan through a file line by line, pull out the URLs, and dump them one per line into a new output file. I have tried several variations of this script, but clearly I am missing something. I'm guessing it is in my regex, but I've used different variations that I found on regexlib.com and they displayed very similar problems.

Try this one:
export = File.new("parseducc.txt" , "w+")
File.open("uccdata.txt").each_line do |line|
  line.scan(/(https?:\/\/\S+)/) do |x|
    export.puts x
  end
end
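
The odd output in the question comes from how String#scan treats capture groups: when the pattern contains groups, scan yields an array of each group's last capture instead of the whole match, and puts prints every array element on its own line. A sketch with a made-up input line; the non-capturing pattern is one possible rewrite of the question's regex, not the one from the answer above:

```ruby
line = "ship via dhl-usa.com today"

# Pattern from the question: three capture groups, so scan returns the groups.
grouped = line.scan(/(([a-zA-Z0-9-])+\.)+([a-zA-Z]){3,4}/)
p grouped # => [["dhl-usa.", "a", "m"]]

# Same pattern with non-capturing groups: scan returns the full matches.
whole = line.scan(/(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{3,4}/)
p whole # => ["dhl-usa.com"]
```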

Recursive directory listing using Ruby with Chinese characters in file names

I would like to generate a list of files within a directory. Some of the filenames contain Chinese characters.
eg: [试验].Test.txt
I am using the following code:
require 'find'
dirs = ["TestDir"]
for dir in dirs
  Find.find(dir) do |path|
    if FileTest.directory?(path)
    else
      p path
    end
  end
end
Running the script produces a list of files but the Chinese characters are escaped (replaced with backslashes followed by numbers). Using the example filename above would produce:
"TestDir/[\312\324\321\351]Test.txt" instead of "TestDir/[试验].Test.txt".
How can the script be altered to output the Chinese characters?

Ruby needs to know that you are dealing with Unicode in your code. On Ruby 1.8, set the character encoding with $KCODE, as below:
$KCODE = 'utf-8'
UTF-8 covers Chinese characters. Note that Ruby 1.9 and later ignore $KCODE, because strings there carry their own encoding.

The following code is more elegant and doesn't require 'find'. It lists the files (but not directories) in the working directory (or whatever directory you pass in).
Dir.entries(Dir.pwd).each do |x|
  p x.encode('UTF-8') unless FileTest.directory?(x)
end
And to dig down one level, use:
Dir.glob('*/*').each do |x|
  p x.encode('UTF-8') unless FileTest.directory?(x)
end
To go all the way down, use Dir.glob('**/*'). It recurses through every subdirectory of the working directory, so it only walks the whole file system if you start at the root.
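
A sketch of a fully recursive listing limited to one directory tree. list_files is a hypothetical helper; whether non-ASCII names display correctly also depends on the Ruby version and the terminal's encoding, so treat the output step as an assumption:

```ruby
# Return every regular file below root, at any depth (directories are skipped).
def list_files(root)
  Dir.glob(File.join(root, '**', '*')).select { |path| File.file?(path) }
end

# puts writes the name as-is, whereas p prints the inspect form,
# which is where escaped bytes such as \312\324 can come from on older Rubies.
list_files('TestDir').each { |path| puts path }
```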
