using variables in gsub - ruby

I have a variable address which for now is a long string containing some unnecessary info, e.g.: "Aboriginal Relations 11th Floor Commerce Place 10155 102 Street Edmonton AB T5J 4G8 Phone 780 427-9658 Fax 780 644-4939 Email gerry.kushlyk#gov.ab.ca"
Aboriginal Relations is in a variable called title, and I'm trying to call address.gsub!(title,''), but it's returning the original string.
I've also tried address.gsub!(/#{title}/,'') and address.gsub!("#{title}",''), but those don't work either. Any ideas?
Sorry, the typo occurred when I typed it into Stack Overflow. Here's the code and the output, copied and pasted:
(this is within a loop, so there will be multiple outputs)
p title
address.gsub!(title,'')
p address
output
"Aboriginal Relations "
"Aboriginal Relations 11th Floor Commerce Place 10155 102 Street Edmonton AB T5J 4G8 Phone 780 427-9658 Fax 780 644-4939 Email gerry.kushlyk#gov.ab.ca"
"Aboriginal Tourism Advisory Council "
"Aboriginal Tourism Advisory Council 5th Floor Terrace Building 9515 107 Street Edmonton AB T5K 2C3 Phone 780 427-9687 Fax 780 422-7235 Email foip.fintprccs#gov.ab.ca"
"Acadia Foundation "
"Acadia Foundation PO Box 96 Oyen AB T0J 2J0 Phone 403 664-3384 Fax 403 664-3316 Email acadiafoundation#telus.net"
"Access Advisory Council "
"Access Advisory Council 12th Floor Centre West Building 10035 108 Street Edmonton AB T5J 3E1 Phone 780 427-2805 Fax 780 422-3204 Email barb.joyner#gov.ab.ca"
"ACCM Benevolent Association "
"ACCM Benevolent Association Suite 100 9403 95 Avenue Edmonton AB T6C 4M7 Phone 780 468-4648 Fax 780 468-4648 Email accmmanor#shaw.ca"
"Acme Municipal Library "
"Acme Municipal Library PO Box 326 Acme AB T0M 0A0 Phone 403 546-3845 Fax 403 546-2248 Email aamlibrary#marigold.ab.ca"
Likewise, if I try address.match(/#{title}/) I get nil.

I'm assuming you're using Ruby 1.9 or higher.
It's possible that the trailing whitespace is a non-breaking space:
p "Relations\u00a0" # looks like a trailing space, but strip won't remove it
To get rid of it:
"Relations\u00a0".gsub!(/^\u00a0|\u00a0$/, '') # => "Relations"
A more generic solution for all unicode whitespace:
"Relations\u00a0".gsub!(/^[[:space:]]|[[:space:]]$/, '') # => "Relations"
To see what the character is in your case:
title[-1].ord # => 160 (example only)
'%x' % title[-1].ord # => "a0" (hex equivalent; example only)
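
Putting the pieces above together, here is a minimal sketch. It assumes the title ends in a non-breaking space while the address uses ordinary spaces; the sample values are shortened stand-ins for the data in the question, and the variable names clean_title and remainder are only illustrative.

```ruby
# Assumed sample data: the title ends in U+00A0 (non-breaking space).
title   = "Aboriginal Relations\u00a0"
address = "Aboriginal Relations 11th Floor Commerce Place"

# Strip any Unicode whitespace from both ends. \A and \z anchor to the
# whole string rather than to individual lines.
clean_title = title.gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')

# Remove the cleaned title once, then trim the leftover leading space.
remainder = address.sub(clean_title, '').strip

p clean_title
p remainder
```

Passing the title to sub as a plain string avoids regex metacharacter surprises; if it has to be interpolated into a pattern, Regexp.escape(clean_title) is the usual precaution.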

title = title[0..-2] seemed to solve it. For some reason strip and chomp wouldn't work.


How to convert image to text in CodeIgniter

Hi, I just want to ask how I can convert an image to text using OCR.
if(isset($_FILES['image'])){
    $file_name = $_FILES['image']['name'];
    $file_tmp = $_FILES['image']['tmp_name'];
    move_uploaded_file($file_tmp,"image/".$file_name);
    echo "<h3>Image Upload Success</h3>";
    echo '<img src="'.$file_name.'" style="width:70%">';
    shell_exec('"C:\\Program Files\\Tesseract-OCR\\tesseract" "D:\\xampp\\htdocs\\ci3\\image\\'.$file_name.'" out');
    echo "<br><h3>OCR after reading</h3><br><pre>";
    $myfile = fopen("out.txt", "r") or die("Unable to open file!");
    echo fread($myfile,4045);
    fclose($myfile);
    echo "</pre>";
}
I wrote this code, but it does not convert the text in the image properly. Is there any solution? Please let me know.
I expected it to read the text of a vertical image correctly, but my output reads like this:
The Registration Directorate at the Ministry of Industry, Commerce and Tourism
certifies that the merchant's below details have been registered in accordance with
Decree law No. (27) for the year 2015 of the Commercial Registration.
22/04/2023 GliaiuY! ~,6 Registration 22/04/2007
Date
eroup HORIZON TELECOM SERVICES COMPANY WLL
Name
Commercial
Name
Registration
Type
CR
Status
HORIZON TELECOM SERVICES COMPANY WLL
With Limited Liability Company
ACTIVE
Area 4élaicl!
ABU SAYBA/ a2 5!
P.O.BOX #.ye Road & »b
7325
Block asx
473
Commercial
Address
Activities
Sale and installation of telecommunications equipment and parts
ere! Bylo!
Registration Directorate
QF. 409 Issue 0
* This CR does not permit its holder to practice investment activities on behalf of others.
igiwltzads
(alka,
KINGDOM OF BAHRAIN pues.
Ministry of Industry, ©
Commerce and Tourism R
Solenitl J) 15 bol gd
Commercial Registration Certificate
ell ad oi hath dala yb yleill s Ae licall 8 51} 52 apsaill 6 pla) agts
GSM Dasa GLE; 2015 Aid (27) aby cy silds p pes pall Cady alld g oLisi atltly Aba uell
Judll 6 Registration == 4908 - 1
aad CYL) Glesal 6 5 jl st 4S 8 Ac gore! pul
aed VLA) Las) Oy jolt ASS ole usd
Ba gdare Aud gious IS AS ph andl ¢ 93
dads aud Le
Flat/Shop No. J=«/4a4
11
Building +
608 Gola ol gual
Woke abby SYLSIYI Glace af jlo
wsdl Ul gal pletion! LU 49) jo: 4ualial jin Y all lhe *
Issued Date: 20/04/2022 Page 1 of 1
}
Z/
Please post this certificate at a visible place.
Tel: +973 80001700 - www. sijilat.bh - www.moic.gov.bh
boat! S12 Sol GIS Bolg S! che 5 oe
But I need the text to be read in separate columns, in a proper text format.

Extract a portion of text from a file name

I have files with the following names. Each file name contains an area code and a house number. I'm new to scripting. How do I write a bash script to extract the area code and house number?
ID-Final_RDX_301_002-14_33_1992
Area code is 301
house number is 002
ID-Final_RDX-311-004-14_28_1992
Area code is 311
house number is 004
ID-Final_RDX311021-14_28_1992
Area code is 311
house number is 021
ID-Final_RDX-XT-Se3-14_28_1992
Area code is XT
house number is Se3
ID-Final_RDX-XT-Se11-14_28_1992
Area code is XT
house number is Se11
Your filenames don't follow a consistent pattern, as mentioned in [this] comment, but I hope that is a typo. If that is the case, you could do something along these lines:
find . -type f -name "ID-Final*" -exec awk -vfile={} 'BEGIN{
split(file,res,"-|_");
printf "Area Code : %s House Number : %s%s",res[4],res[5],ORS
}' \;
Area Code : 311 House Number : 004
Area Code : 14 House Number : 28
Area Code : XT House Number : Se3
Area Code : XT House Number : Se11
Area Code : 301 House Number : 002
Well, if the missing - and _ separators are intentional, then you need much more than this simple awk to solve this.
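
If the missing separators are intentional, one possible approach is a regular expression with optional separators. The sketch below is in Ruby rather than bash, and rests on assumptions that go beyond the question: the area code is either exactly three digits or letters only, the house number is optional letters followed by digits, and every name ends in a -NN_NN_NNNN suffix. The names FILENAME_PATTERN and area_and_house are hypothetical.

```ruby
# Assumed layout: ID-Final_RDX, optional separator, area code,
# optional separator, house number, then a -NN_NN_NNNN suffix.
FILENAME_PATTERN =
  /\AID-Final_RDX[-_]?(\d{3}|[A-Za-z]+)[-_]?([A-Za-z]*\d+)-\d+_\d+_\d+\z/

# Returns [area_code, house_number], or nil if the name does not match.
def area_and_house(name)
  match = FILENAME_PATTERN.match(name)
  match && [match[1], match[2]]
end

%w[
  ID-Final_RDX_301_002-14_33_1992
  ID-Final_RDX-311-004-14_28_1992
  ID-Final_RDX311021-14_28_1992
  ID-Final_RDX-XT-Se3-14_28_1992
  ID-Final_RDX-XT-Se11-14_28_1992
].each do |name|
  area, house = area_and_house(name)
  puts "Area Code : #{area} House Number : #{house}"
end
```

Names that fit none of these assumptions return nil instead of a guess, so unexpected formats are easy to spot.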

Different Code 128 barcode symbols representing the same data

I'm currently using software called LineView. It generates downtime reason codes for our factory lines. An operator scans the barcodes with an RS232 scanner and it goes into our XL board system.
The software itself generates the barcodes within an internet browser, but I am trying to make it so our own labeling machine can also print out the barcodes. However, the barcodes that are produced by the labeler (and the many online barcode generators I've tried) look longer and do not work.
The data for the example 128 barcode that I am trying to replicate is [SOH]1[STX]65;1067[ETX].
According to the manual:
[SOH] - The Start of Header character (ASCII 0x01) starts the XL Command packet.
1 - The Serial Address of the XL device (the default is 1).
[STX] - The Start of Transmission character (ASCII 0x02) marks the start of the actual command.
65; - The ID of the Production State > Set Reason Code command.
1067 - The Reason Code ID (which can range from 1 to 999 for system reasons or 1000 to 1999 for user defined reasons). In my case it is 1067.
[ETX] - The End of Transmission character (ASCII 0x03) ends the XL Command packet.
I have attached pictures of what LineView produces (which is what I want it to look like) and what our labeller currently prints.
When I scan them they both come up with the [SOH]1[STX]65;1067[ETX] code despite them looking different.
Any help with this would be very much appreciated.
Your intended barcode is constructed internally using the following series of Code 128 codewords which correctly represent the ASCII control characters:
103 Start-in-Mode-A (Upper-case and control characters)
65 [SOH] (ASCII 1)
17 1
66 [STX] (ASCII 2)
22 6
21 5
27 ;
99 Switch-to-Mode-C (Double-density numeric)
10 10
67 67
101 Switch-to-Mode-A
67 [ETX] (ASCII 3)
67 Check-digit
106 Stop
Your label printer is printing a barcode representing the literal string [SOH]1[STX]65;1067[ETX] with no ASCII control characters (i.e. left-bracket, S, O, H, right-bracket, ...) using the following internal codewords:
104 Start-in-Mode-B (Mixed-case)
59 [
51 S
47 O
40 H
61 ]
17 1
59 [
51 S
52 T
56 X
61 ]
22 6
21 5
27 ;
99 Switch-to-Mode-C (Double-density numeric)
10 10
67 67
100 Switch-to-Mode-B
59 [
37 E
52 T
56 X
61 ]
57 Check-digit
106 Stop
So you need to work out how to correctly specify ASCII control characters in the input to your labelling machine.
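
The check digits in the two codeword listings above can be reproduced with the standard Code 128 rule: the start codeword plus each following codeword multiplied by its 1-based position, modulo 103. The Ruby sketch below only verifies the listings; the method name code128_check is illustrative, and how your labelling software accepts control characters remains something to look up in its own documentation.

```ruby
# Code 128 check digit: start value plus position-weighted data
# codewords (weights 1, 2, 3, ...), reduced modulo 103.
def code128_check(codewords)
  start, *data = codewords
  weighted = data.each_with_index.sum { |cw, i| cw * (i + 1) }
  (start + weighted) % 103
end

# Codewords for the intended barcode (real control characters, Mode A).
intended = [103, 65, 17, 66, 22, 21, 27, 99, 10, 67, 101, 67]

# Codewords for the literal text "[SOH]1[STX]65;1067[ETX]" (Mode B).
literal = [104, 59, 51, 47, 40, 61, 17, 59, 51, 52, 56, 61,
           22, 21, 27, 99, 10, 67, 100, 59, 37, 52, 56, 61]

puts code128_check(intended) # 67
puts code128_check(literal)  # 57
```

The literal version needs 24 codewords before the check digit against 12 for the intended one, which is why the printed barcode looks longer.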

bash awk get numbers in two digits

I want to correct wrong metadata or add missing metadata for the 75 CDs I have ripped from disc.
I got the track info from AllMusic and stripped it down to almost usable "CSV" data.
Number";"1";"Piece";"Nocturne for piano No. 2 in E flat major, Op. 9/2, CT. 109";"Componist";"Frédéric Chopin
MainPiece";"";"Piece";"Symphony No. 9 in E minor ("From the New World"), B. 178 (Op. 95) (first published as No. 5)
Number";"2";"Piece";"Largo";"Componist";"Antonin Dvorák
Number";"3";"Piece";"La plus que lente, waltz for piano (or orchestra), L. 121";"Componist";"Claude Debussy
Number";"4";"Piece";"Waldesrauschen (Forest Murmurs), for piano (Zwei Konzertetuden No. 1), S. 145/1 (LW A218/1)";"Componist";"Franz Liszt
MainPiece";"";"Piece";"Oboe Concerto, for oboe, strings & continuo in D minor, Op. 8/9, RV 454
Number";"5";"Piece";"Allegro";"Componist";"Antonio Vivaldi
Number";"6";"Piece";"Largo";"Componist";"Antonio Vivaldi
Number";"7";"Piece";"Allegro";"Componist";"Antonio Vivaldi
MainPiece";"";"Piece";"Cello Concerto in A major, G. 475
Number";"8";"Piece";"1. Allegro";"Componist";"Luigi Boccherini
Number";"9";"Piece";"2. Adagio";"Componist";"Luigi Boccherini
Number";"10";"Piece";"3. Rondò - Allegro";"Componist";"Luigi Boccherini
MainPiece";"";"Piece";"Serenade No. 12 for winds in C minor ("Nacht Musique"), K. 388 (K. 384a)
Number";"11";"Piece";"Allegro";"Componist";"Wolfgang Amadeus Mozart
Number";"12";"Piece";"Liebesträume, notturno for piano No. 3 in A flat major ("O Lieb, so lang du lieben kannst"), S. 541/3 (LW A103/3)";"Componist";"Franz Liszt
MainPiece";"";"Piece";"Phantasiestücke (4) for violin, cello & piano in A minor, Op. 88
Number";"13";"Piece";"Romanze";"Componist";"Robert Schumann
MainPiece";"";"Piece";"Sinfonia Concertante for violin, cello, oboe, bassoon & orchestra, H. 1/105
Number";"14";"Piece";"Andante";"Componist";"Franz Joseph Haydn
I would like to use awk to turn this into a script that sets the metadata:
eyeD3 -n 01 -a composer -t mainpiece piece 01*.mp3
And use awk to rename the files:
mv 01*.mp3 01 [composer] mainpiece piece.mp3
The mainpiece / piece is a manual part, but I would like to rewrite 1 to 01.
I found something with printf ("%2d" ,$1,$2), but this complains about .mp3.
Does anyone have suggestions for me?
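
On the zero-padding part only: in printf-style formats, %2d pads with spaces, while %02d pads with zeros. The sketch below shows this in Ruby rather than awk, on two of the sample lines above. It assumes the fields are always separated by ";" including the quotes, and the method name padded_track is hypothetical.

```ruby
# Returns the zero-padded track number for a "Number" line, or nil for
# any other line (such as a "MainPiece" line).
def padded_track(line)
  fields = line.chomp.split('";"')
  return nil unless fields[0] == 'Number'

  # %02d pads to two digits with a leading zero; %2d would pad with a space.
  format('%02d', Integer(fields[1], 10))
end

puts padded_track('Number";"2";"Piece";"Largo";"Componist";"Antonin Dvorák')
puts padded_track('Number";"10";"Piece";"3. Rondò - Allegro";"Componist";"Luigi Boccherini')
```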

In Ruby, how to UTF-8 encode this weird character?

I'm importing content from an outside database that is infected with a variety of odd characters, e.g.
> str
=> "Natureâ€™s Variety, Best Friends Animal Society team up"
From context it seems that â€™ represents a right single-quote. In cp1252 encoding:
> str.encode('cp1252')
=> "Nature\xE2\x80\x99s Variety, Best Friends Animal Society team up"
So how do I convert it to the correct UTF-8 character? Here's what I've tried:
> str.encode('UTF-8')
=> "Natureâ€™s Variety, Best Friends Animal Society team up"
> str.encode('cp1252').encode('UTF-8')
=> "Natureâ€™s Variety, Best Friends Animal Society team up"
> str.encode('UTF-8', invalid: :replace, replace: '?', undef: :replace)
=> "Natureâ€™s Variety, Best Friends Animal Society team up"
> str.encode('cp1252').encode('UTF-8', invalid: :replace, replace: '?', undef: :replace)
=> "Natureâ€™s Variety, Best Friends Animal Society team up"
I'd rather find a way to do a generic re-encoding so that it will handle all such mis-encoded characters. If I have to, I'll do individual search and replace, but I'm not able to make that work either:
> str.encode('cp1252').gsub('\xE2/x80/x99', "'")
=> "Nature\xE2\x80\x99s Variety, Best Friends Animal Society team up"
> str.encode('cp1252').gsub(%r{\xE2\x80\x99}, "'")
SyntaxError: unexpected tIDENTIFIER, expecting $end
> str.encode('cp1252').gsub(Regexp.escape('\xE2\x80\x99'), "'")
=> "Nature\xE2\x80\x99s Variety, Best Friends Animal Society team up"
I'd like to do this, but I can't even paste these characters into my REPL:
> str.gsub('â€™', "'")
When I try I get:
> str.gsub('C"b,b,b
* "', ",")
=> "Natureâ€™s Variety, Best Friends Animal Society team up"
Frustrating. Any suggestions on how to encode this properly into UTF-8?
Edit: At the request for the actual bytes in the string:
> str.bytes.to_a.join(' ')
=> "78 97 116 117 114 101 195 162 226 130 172 226 132 162 115 32 86 97 114 105 101 116 121 44 32 66 101 115 116 32 70 114 105 101 110 100 115 32 65 110 105 109 97 108 32 83 111 99 105 101 116 121 32 116 101 97 109 32 117 112"
I had this problem with Fixing Incorrect String Encoding From MySQL. You need to set the proper encoding and then force it back.
fallback = {
  "\u0081" => "\x81".force_encoding("CP1252"),
  "\u008D" => "\x8D".force_encoding("CP1252"),
  "\u008F" => "\x8F".force_encoding("CP1252"),
  "\u0090" => "\x90".force_encoding("CP1252"),
  "\u009D" => "\x9D".force_encoding("CP1252")
}
str.encode('CP1252', fallback: fallback).force_encoding('UTF-8')
The fallback may not be necessary depending on your data, but it ensures that it won't raise an error by handling the five bytes which are undefined in CP1252.
Once Ruby has got the encoding wrong, the characters will stay incorrect, according to the original mistake. Conversions simply convert the now wrong characters into the new encoding.
To correct Ruby's mistake on input, you need to use the force_encoding method, which does not do a conversion, it just corrects Ruby's note of what encoding a String has.
In your case the fault has occurred before you read the values from the DB. If you pick out the problem bytes: bytes = %w(195 162 226 130 172 226 132 162).map(&:to_i) they look to be in UTF-8 encoding, and already in the database double-encoded. You can probably assume a problem with whatever has written these into the DB (note if it is a live process, this is a bug that needs sorting, you will continue to get these bad values in).
What has happened is your DB (or code that writes to it) received some UTF-8 bytes representing the correct character, but assumed they were CP1252 to be converted to UTF-8. It made that conversion and wrote valid UTF-8 (but wrong characters) into the DB.
If I do the following in Ruby console using UTF-8 encoding in my terminal and as the default Ruby encoding, I can replicate your problem:
str = "Nature’s Variety, Best Friends Animal Society team up"
=> "Nature’s Variety, Best Friends Animal Society team up"
str = str.force_encoding('CP1252').encode('UTF-8')
=> "Natureâ€™s Variety, Best Friends Animal Society team up"
The fault is reversible, as shown here:
str = str.encode('CP1252').force_encoding('UTF-8')
=> "Nature’s Variety, Best Friends Animal Society team up"
The encode('CP1252') undoes the original mistaken conversion.
The force_encoding('UTF-8') sets the encoding back to what the system most likely received in the first place.
You will want to find where in your system an assumption of CP1252 input is being made, and instead assume UTF-8 (it may get more complicated than that if you have multiple sources in different encodings).
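
The round trip can be checked against the bytes posted in the question. This is a minimal sketch that uses only the first fifteen of those bytes (the part spelling "Nature...s") and assumes they came out of the database exactly as listed; the variable names are illustrative.

```ruby
# First 15 bytes from the question: "Nature", the 8 problem bytes, "s".
bytes = [78, 97, 116, 117, 114, 101,
         195, 162, 226, 130, 172, 226, 132, 162, 115]

# What the application sees: valid UTF-8, but the wrong characters.
garbled = bytes.pack('C*').force_encoding('UTF-8')

# Undo the mistaken CP1252 -> UTF-8 conversion, then relabel the
# resulting bytes as the UTF-8 they were in the first place.
repaired = garbled.encode('CP1252').force_encoding('UTF-8')

p repaired.bytes              # the three bytes 226 128 153 encode U+2019
p repaired == "Nature\u2019s" # true

# Applying the original mistake again reproduces the garbled string.
p repaired.dup.force_encoding('CP1252').encode('UTF-8') == garbled
```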
