shuf command to extract lines with spaces in CSV file - bash

I have a CSV file with a set of 1000 addresses. I used the shuf command to pick 10 lines at random for a process. Since the addresses contain spaces, the 10 addresses end up in a single array element rather than in 10 separate elements. Please help me resolve the issue.
Sample CSV
from_address
"303 Co Rd 405, Floresville, TX 78114,US"
"4422 Oakside Dr, Houston, TX 77053,US"
"4218 S 245th Ct, Kent, WA 98032,US"
"1407 Marion Manor Dr, Marion, VA 24354,US"
"7400 Englewood Ave, Yakima, WA 98908,US"
"8012 Burly Wood Way, Hampton, GA 30253,US"
"931 Beacon Square Ct, Gaithersburg, MD 20878,US"
"12 Truval la, Nesconset, NY 11767,US"
"121 Pet Rock Ct, Clayton, NC 27520,US"
"235 Whitaker Rd, Westfield, PA 16950,US"
"13422 NE 133rd St, Kirkland, WA 98034,US"
"1620 27th St NW, Canton, OH 44709,US"
"488 Andrews Rd, Columbus, GA 31903,US"
"4742 Janet Ln, Bethlehem, PA 18017,US"
"2622 Cherokee Ct, West Palm Beach, FL 33406,US"
"111 Westbury Ct, Doylestown, PA 18901,US"
"820 Main St, Belpre, OH 45714,US"
"1307 Stevenson Ln, Towson, MD 21286,US"
"2725 Hartford Rd, East York, PA 17402,US"
"9 Winding Brook Rd, Rhinebeck, NY 12572,US"
"433 Willowbrook Dr, Norristown, PA 19403,US"
"208 N Kayla Dr, Granite Quarry, NC 28146,US"
"931 Pimlico Dr, Centerville, OH 45459,US"
Shell Script
list_=("$(shuf -n 10 sample_addresses.csv)")
echo ${#list_[@]}
Expected Result
10
Actual Result
1

list_=("$(shuf -n 10 sample_addresses.csv)")
That creates an array with a single element: the double quotes around the command substitution keep all of shuf's output together as one word.
To read the lines into an array, use the mapfile command:
mapfile -t list_ < <(shuf -n 10 sample_addresses.csv)
A good way to inspect the contents of a variable is
declare -p list_
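The difference between the two assignments can be checked with a short sketch. It assumes bash 4 or later (for mapfile) and GNU shuf; the temporary file and its made-up addresses are only illustrative, and skipping the header with tail -n +2 is an assumption based on the from_address line in the sample CSV.

```bash
#!/usr/bin/env bash
# Build a throwaway CSV: one header line plus 20 made-up quoted addresses.
csv=$(mktemp)
{
  echo 'from_address'
  for i in $(seq 1 20); do
    printf '"%d Example St, Some Town, TX 78114,US"\n' "$i"
  done
} > "$csv"

# Quoted command substitution: all 10 lines become ONE array element.
one=("$(tail -n +2 "$csv" | shuf -n 10)")

# mapfile: each line becomes its own element; -t strips the trailing newline.
mapfile -t list_ < <(tail -n +2 "$csv" | shuf -n 10)

echo "${#one[@]}"     # 1
echo "${#list_[@]}"   # 10
declare -p list_      # shows every element with its index

rm -f "$csv"
```

Because mapfile splits on newlines only, the spaces and commas inside each address stay within a single element.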


How to convert image to text in CodeIgniter

Hi, I just want to ask how I can convert an image to text using OCR.
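if (isset($_FILES['image'])) {
    // basename() drops any directory part of the client-supplied name
    $file_name = basename($_FILES['image']['name']);
    $file_tmp = $_FILES['image']['tmp_name'];
    move_uploaded_file($file_tmp, "image/" . $file_name);
    echo "<h3>Image Upload Success</h3>";
    // the upload was saved under image/, so the src has to point there
    echo '<img src="image/' . htmlspecialchars($file_name) . '" style="width:70%">';
    // escapeshellarg() quotes the path so spaces in the name cannot split the argument
    shell_exec('"C:\\Program Files\\Tesseract-OCR\\tesseract" ' . escapeshellarg('D:\\xampp\\htdocs\\ci3\\image\\' . $file_name) . ' out');
    echo "<br><h3>OCR after reading</h3><br><pre>";
    $myfile = fopen("out.txt", "r") or die("Unable to open file!");
    // read the whole file instead of a fixed 4045 bytes
    echo fread($myfile, max(1, filesize("out.txt")));
    fclose($myfile);
    echo "</pre>";
}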
I wrote this code, but it does not convert the text in the image properly. Is there any solution? Please let me know.
I expected it to read the text of a vertical image, but my output reads like this:
The Registration Directorate at the Ministry of Industry, Commerce and Tourism
certifies that the merchant's below details have been registered in accordance with
Decree law No. (27) for the year 2015 of the Commercial Registration.
22/04/2023 GliaiuY! ~,6 Registration 22/04/2007
Date
eroup HORIZON TELECOM SERVICES COMPANY WLL
Name
Commercial
Name
Registration
Type
CR
Status
HORIZON TELECOM SERVICES COMPANY WLL
With Limited Liability Company
ACTIVE
Area 4élaicl!
ABU SAYBA/ a2 5!
P.O.BOX #.ye Road & »b
7325
Block asx
473
Commercial
Address
Activities
Sale and installation of telecommunications equipment and parts
ere! Bylo!
Registration Directorate
QF. 409 Issue 0
* This CR does not permit its holder to practice investment activities on behalf of others.
igiwltzads
(alka,
KINGDOM OF BAHRAIN pues.
Ministry of Industry, ©
Commerce and Tourism R
Solenitl J) 15 bol gd
Commercial Registration Certificate
ell ad oi hath dala yb yleill s Ae licall 8 51} 52 apsaill 6 pla) agts
GSM Dasa GLE; 2015 Aid (27) aby cy silds p pes pall Cady alld g oLisi atltly Aba uell
Judll 6 Registration == 4908 - 1
aad CYL) Glesal 6 5 jl st 4S 8 Ac gore! pul
aed VLA) Las) Oy jolt ASS ole usd
Ba gdare Aud gious IS AS ph andl ¢ 93
dads aud Le
Flat/Shop No. J=«/4a4
11
Building +
608 Gola ol gual
Woke abby SYLSIYI Glace af jlo
wsdl Ul gal pletion! LU 49) jo: 4ualial jin Y all lhe *
Issued Date: 20/04/2022 Page 1 of 1
}
Z/
Please post this certificate at a visible place.
Tel: +973 80001700 - www. sijilat.bh - www.moic.gov.bh
boat! S12 Sol GIS Bolg S! che 5 oe
But I need each column to be read separately so that the text comes out in a proper format.

How to grab text after a newline and concatenate each line into a new one, in a text file not cleaned of spaces and tabs

I have a text like this:
Print <javascript:PrintThis();>
www.example.com
Order Number: *912343454656548 * Date of Order: November 54 2043
------------------------------------------------------------------------
*Dicders Folcisad:
* STACKOVERFLOW
*dum FWEFaadasdd:* ‎[U+200E] ‎
STACK OVERFLOW
BLVD OF SOMEPLACENICE 434
SANTA MONICA, COUNTY
LOS ANGEKES, CALI 90210
(SW)
*Order Totals:*
Subtotal Usd$789.75
Shipping Usd$87.64
Duties & Taxes Usd$0.00 ‎
Rewards Credit Usd$0.00
*Order Total * *Usd$877.39 *
*Wordskccds:*
STACKOVERFLOW
FasntAsia
xxxx-xxxx-xxxx-
*test Method / Welcome Info *
易客满x京配个人行邮税- 运输 + 关税 & 税费 / ADHHX15892013504555636
*Order Number: 916212582744342X*
*#* *Item* *Price* *Qty.* *Discount* *Subtotal*
1
Random's Bounty, Product, 500 mg, 100 Rainsd Harrys AXK-0ew5535
Usd$141.92 4 -Usd$85.16 Usd$482.52
2
Random Product, Fast Forlang, Mayority Stonghold, Flavors, 10 mg,
60 Stresss CXB-034251
Usd$192.24 1 -Usd$28.83 Usd$163.41
3
34st Omicron, Novaccines Percent Pharmaceutical, 10 mg, 120 Tablesds XDF-38452
Usd$169.20 1 -Usd$25.38 Usd$143.82
*Extra Discounts:* Extra 15% discounts applied! Usd$139.37
*Stackoverflox Contact Information :*
*Web: *www.example.com
*Disclaimer:* something made, or service sold through this website,
have not been test by the sweden Spain norway and Dumrug
Advantage. They are not intended to treet, treat, forsee or
forshadow somw clover.
I'm trying to grab each line that starts with a number, then append the second line and finally the third line to it. Example of the desired text:
1 Random's Bounty, Product, 500 mg, 100 Rainsd Harrys AXK-0ew5535 Usd$141.92 4 -Usd$85.16 Usd$482.52
2 Random Product, Fast Forlang, Mayority Stonghold, Flavors, 10 mg, 60 Stresss CXB-034251 Usd$192.24 1 -Usd$28.83 Usd$163.41 <- 1 line
3 34st Omicron, Novaccines Percent Pharmaceutical, 10 mg, 120 Wedscsd XDF-38452 Usd$169.20 1 -Usd$25.38 Usd$143.82 <- 1 lines as first
As you may notice, the second record spans 3 lines instead of 2, which makes it harder to grab.
Because of the newline and whitespace, the next command only grabs 1:
grep -E '1\s.+'
I have also been trying an alternation:
grep -E '1\s|[A-Z].+'
But that doesn't work; grep starts matching similar patterns in other parts of the text.
awk '{$1=$1}1' #done already
tr -s "\t\r\n\v" #done already
tr -d "\t\b\r" #done already
I'm trying to write a script that takes an uncleaned FILE as an ARGUMENT, grabs the table, and selects each number with its respective data. Sometimes a record has 4 lines, sometimes 3, so copy/paste doesn't work for me.
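I think the last line to be joined is the line starting with "Usd". In that case the script below collects each record, and you only need to change the output formatting:
awk '
# a record opens on the first line that starts with a digit
!orderfound && /^[0-9]/ { ordernr++; orderfound=1 }
# append the line, adding a separating space only after the first piece
orderfound { order[ordernr] = (order[ordernr] == "" ? "" : order[ordernr] " ") $0 }
# a line whose first field starts with "Usd" closes the record
$1 ~ /^Usd/ { orderfound = 0 }
END {
for (i=1; i<=ordernr; i++) { print order[i] }
}' inputfile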
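To try the idea on a small piece of the sample, here is a streaming variant that prints each record as soon as its "Usd" line arrives instead of storing it in an array. It is a sketch under the same assumption as above (a record opens on a line starting with a digit and closes on a line starting with "Usd"); the temporary file and the names inrec, rec and joined are only illustrative.

```bash
#!/usr/bin/env bash
# Excerpt of the table from the question, including the 3-line second record.
input=$(mktemp)
cat > "$input" <<'EOF'
*#* *Item* *Price* *Qty.* *Discount* *Subtotal*
1
Random's Bounty, Product, 500 mg, 100 Rainsd Harrys AXK-0ew5535
Usd$141.92 4 -Usd$85.16 Usd$482.52
2
Random Product, Fast Forlang, Mayority Stonghold, Flavors, 10 mg,
60 Stresss CXB-034251
Usd$192.24 1 -Usd$28.83 Usd$163.41
*Extra Discounts:* Extra 15% discounts applied! Usd$139.37
EOF

joined=$(awk '
  !inrec && /^[0-9]/ { inrec = 1; rec = $0; next }  # open a record on a leading digit
  inrec              { rec = rec " " $0 }           # append every following line
  inrec && /^Usd/    { print rec; inrec = 0 }       # the Usd line closes and prints it
' "$input")

printf '%s\n' "$joined"
rm -f "$input"
```

The line "60 Stresss CXB-034251" also starts with a digit, but because a record is already open it is appended rather than treated as a new item, which is what handles records of varying length.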

Using grep command to transfer data into a new file

This might be super simple, but I have a .txt file with seismic data in which I'm trying to use the grep command to print out specific data only from Nevada (data in the file is marked either CA or NV) and to put it into its own .txt file.
Sample data:
map 0.2 2016/09/26 18:36:51 39.330N 119.991W 4.7 9 km ( 6 mi) N of Incline Village, NV
map 1.5 2016/09/26 18:26:27 39.362N 122.781W 19.5 25 km (15 mi) NNE of Upper Lake, CA
map 1.5 2016/09/26 18:18:16 36.055N 117.857W 2.2 8 km ( 5 mi) E of Coso Junction, CA
map 0.2 2016/09/26 18:10:46 38.363N 118.324W 4.6 32 km (20 mi) SE of Hawthorne, NV
I'm typing: grep NV filename > newfilename
But nothing is showing up. What's wrong? (My homework is to specifically use the grep command.)
You want this (the cat is optional; grep something filename > result.txt does the same):
cat filename | grep something > result.txt
Your command looks as though it should have worked, but to be safe you want to be sure you're getting exactly what you want. The grep below matches only lines that end in NV:
grep " NV$" filename > newfilename
When you say you can't see anything are you viewing the file contents afterward?
I copy/pasted your sample data into a file called sample-data and tried the grep pattern I would have used (' NV$'), but then I found only one line came through, the last one, because there was [invisible] whitespace after the NV in the first line. So to guard against that, I put [[:blank:]]* between the NV and the $ (end-of-line anchor) in the grep pattern, and I got the result I expected. ([[:blank:]] is the POSIX class for a space or a tab; inside a bracket expression grep does not treat \t as a tab.) See below:
$ grep ' NV$' sample-data > result.txt
$ cat result.txt
map 0.2 2016/09/26 18:10:46 38.363N 118.324W 4.6 32 km (20 mi) SE of Hawthorne, NV
$ cat sample-data
map 0.2 2016/09/26 18:36:51 39.330N 119.991W 4.7 9 km ( 6 mi) N of Incline Village, NV
map 1.5 2016/09/26 18:26:27 39.362N 122.781W 19.5 25 km (15 mi) NNE of Upper Lake, CA
map 1.5 2016/09/26 18:18:16 36.055N 117.857W 2.2 8 km ( 5 mi) E of Coso Junction, CA
map 0.2 2016/09/26 18:10:46 38.363N 118.324W 4.6 32 km (20 mi) SE of Hawthorne, NV
$ grep ' NV[[:blank:]]*$' sample-data > result.txt
$ cat result.txt
map 0.2 2016/09/26 18:36:51 39.330N 119.991W 4.7 9 km ( 6 mi) N of Incline Village, NV
map 0.2 2016/09/26 18:10:46 38.363N 118.324W 4.6 32 km (20 mi) SE of Hawthorne, NV
$
In short, I think what you want, to be safe, is:
grep ' NV[[:blank:]]*$' sample-data > result.txt
Or, even safer, if you don't trust there always to be a space between the comma and NV:
grep ',[[:blank:]]*NV[[:blank:]]*$' sample-data > result.txt
which, translated, means, "match lines that have a comma, zero or more spaces or tabs, NV, zero or more spaces or tabs, and nothing more before the end of the line."
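The effect of trailing whitespace can be reproduced with a short sketch. It writes three of the sample lines to a temporary file, deliberately adding two trailing spaces to the first one, and counts the matches of the strict and the tolerant pattern. It uses the POSIX class [[:blank:]] for "space or tab"; the temporary file and the variable names strict and tolerant are illustrative.

```bash
#!/usr/bin/env bash
# Three sample lines; the first NV line carries two trailing spaces on purpose.
data=$(mktemp)
printf '%s\n' \
  'map 0.2 2016/09/26 18:36:51 39.330N 119.991W 4.7 9 km ( 6 mi) N of Incline Village, NV  ' \
  'map 1.5 2016/09/26 18:26:27 39.362N 122.781W 19.5 25 km (15 mi) NNE of Upper Lake, CA' \
  'map 0.2 2016/09/26 18:10:46 38.363N 118.324W 4.6 32 km (20 mi) SE of Hawthorne, NV' > "$data"

# -c prints the number of matching lines instead of the lines themselves.
strict=$(grep -c ' NV$' "$data")
tolerant=$(grep -c ',[[:blank:]]*NV[[:blank:]]*$' "$data")

echo "strict=$strict tolerant=$tolerant"   # strict=1 tolerant=2
rm -f "$data"
```

Note also that with > newfilename the matches go into the file and nothing is printed on the terminal, so the result has to be inspected with cat newfilename.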
By the way, if this is homework, and not for your job or home project, technically you should probably admit to your teacher that you asked for help on StackOverflow. Your teacher will be more impressed with your honesty, and probably won't ding you if you can say, "I get it, I get it, look at these other examples I tested and they worked too!" A teacher's main goal is that you learn and understand, not that you get some score. My purpose in providing this answer is to help you understand a tiny bit more about grep, which millions of us use every single day in our lives to get our work done, so it really is worth learning. Probably I should have provided a "teaching" example that was not the exact answer, but this was such a small, trivial problem I just answered it during my coffee break. Just be honest with your teacher is all I ask.

Apache PIG - How to get the Flop 10 data records?

I have data records like this:
Name customerID revenue(Mio) premium
Michael James 078932832 2.7 y
Susan Miller 024383490 3.9 n
John Cooper 021023023 2.1 y
How do I get, for each value of the premium flag, the 10 records with the lowest revenue (= Flop 10)?
The result should be given as:
Nr Name customerID revenue(Mio) premium
1 John Cooper 021023023 2.1 y
2 Michael James 078932832 2.7 y
3 Andrew Murs 044834399 3.0 y
. ... ..... ... .
10 th entry with flag y
1 Susan Miller 024383490 3.9 n
. ... ..... ... .
10 th entry with flag n
As you can see, the list is sorted in ascending order (beginning with the lowest revenue).
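I guess you should use split.
Considering A is your load statement:
-- assumes tab-separated input without the Nr column; revenue is typed so it sorts numerically
A = load 'data' as (Name:chararray, customerID:chararray, revenue:double, premium:chararray);
split A into PRE if premium == 'y', NONPRE if premium == 'n';
C = order PRE by revenue asc;
D = order NONPRE by revenue asc;
-- keep only the 10 lowest revenues per flag
C10 = limit C 10;
D10 = limit D 10;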
Disclaimer: Be careful while using split as null records get dropped. I have not compiled this code.

bash awk get numbers in two digits

I want to correct wrong metadata or add missing metadata for the 75 CDs I have ripped from disc.
I got the track info from AllMusic and stripped it down to almost usable "CSV" data.
Number";"1";"Piece";"Nocturne for piano No. 2 in E flat major, Op. 9/2, CT. 109";"Componist";"Frédéric Chopin
MainPiece";"";"Piece";"Symphony No. 9 in E minor ("From the New World"), B. 178 (Op. 95) (first published as No. 5)
Number";"2";"Piece";"Largo";"Componist";"Antonin Dvorák
Number";"3";"Piece";"La plus que lente, waltz for piano (or orchestra), L. 121";"Componist";"Claude Debussy
Number";"4";"Piece";"Waldesrauschen (Forest Murmurs), for piano (Zwei Konzertetuden No. 1), S. 145/1 (LW A218/1)";"Componist";"Franz Liszt
MainPiece";"";"Piece";"Oboe Concerto, for oboe, strings & continuo in D minor, Op. 8/9, RV 454
Number";"5";"Piece";"Allegro";"Componist";"Antonio Vivaldi
Number";"6";"Piece";"Largo";"Componist";"Antonio Vivaldi
Number";"7";"Piece";"Allegro";"Componist";"Antonio Vivaldi
MainPiece";"";"Piece";"Cello Concerto in A major, G. 475
Number";"8";"Piece";"1. Allegro";"Componist";"Luigi Boccherini
Number";"9";"Piece";"2. Adagio";"Componist";"Luigi Boccherini
Number";"10";"Piece";"3. Rondò - Allegro";"Componist";"Luigi Boccherini
MainPiece";"";"Piece";"Serenade No. 12 for winds in C minor ("Nacht Musique"), K. 388 (K. 384a)
Number";"11";"Piece";"Allegro";"Componist";"Wolfgang Amadeus Mozart
Number";"12";"Piece";"Liebesträume, notturno for piano No. 3 in A flat major ("O Lieb, so lang du lieben kannst"), S. 541/3 (LW A103/3)";"Componist";"Franz Liszt
MainPiece";"";"Piece";"Phantasiestücke (4) for violin, cello & piano in A minor, Op. 88
Number";"13";"Piece";"Romanze";"Componist";"Robert Schumann
MainPiece";"";"Piece";"Sinfonia Concertante for violin, cello, oboe, bassoon & orchestra, H. 1/105
Number";"14";"Piece";"Andante";"Componist";"Franz Joseph Haydn
I would like to use awk to turn this into a script that sets the metadata:
eyeD3 -n 01 -a composer -t mainpiece piece 01*.mp3
and to rename the files:
mv 01*.mp3 01 [composer] mainpiece piece.mp3
The mainpiece / piece is a manual part, but I would like to rewrite 1 to 01.
I found something with printf ("%2d" ,$1,$2), but this complains about .mp3.
Does anyone have suggestions for me?
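One possible starting point, as a sketch rather than a finished solution: %2d pads with spaces, whereas %02d pads with zeros. The example below assumes the fields are separated by the literal three characters ";" as in the data above, so that on a Number line field 2 is the track number, field 4 the piece and field 6 the composer. The function name pad_tracks and the output layout are made up for illustration, and MainPiece lines are simply skipped because that part is manual.

```bash
#!/usr/bin/env bash
# Print "NN [composer] piece" for every Number line, with NN zero-padded to 2 digits.
pad_tracks() {
  awk -F'";"' '
    $1 == "Number" { printf "%02d [%s] %s\n", $2, $6, $4 }
  '
}

# Three lines taken from the data in the question.
sample='Number";"5";"Piece";"Allegro";"Componist";"Antonio Vivaldi
MainPiece";"";"Piece";"Cello Concerto in A major, G. 475
Number";"11";"Piece";"Allegro";"Componist";"Wolfgang Amadeus Mozart'

padded=$(printf '%s\n' "$sample" | pad_tracks)
printf '%s\n' "$padded"
# 05 [Antonio Vivaldi] Allegro
# 11 [Wolfgang Amadeus Mozart] Allegro
```

The same printf format can be used to build the eyeD3 and mv command lines, but piece titles that contain double quotes or parentheses would need careful shell quoting first.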
