I have a TSV file with a single column in it. That column contains a bunch of numbers and has a header.
What is the most efficient way to get all of the numbers in that column (say, 2,000,000 of them) into one array?
Example data:
income
2000\n
80000\n
50000\n
30000\n
I have tried:
File.readlines(path)[1..-1].collect{|salary| salary.gsub("\n",'')}
I want to have the following output:
[2000,80000,50000,30000]
What I have works, but I'm not sure it's the most efficient, because I would be reading millions of rows into memory.
You can use CSV to do this, and it's really easy because you only have one column:
require 'csv'
CSV.read("/path/to/file.tsv")[1..-1].flatten.map(&:to_i)
The [1..-1] drops the header row, and map(&:to_i) converts the strings to integers.
You can do something like:
array = []
File.foreach('test.txt') do |line|
  next if $. == 1               # $. is the current line number, so this skips the header
  line.chomp!                   # remove the trailing newline
  array << line if line > ''    # ignore empty/blank lines
end
p array
Which returns array as:
["2000", "80000", "50000", "30000"]
However, that is hardly a scalable solution. Depending on your machine, you could run out of memory and slow the app to a crawl. Instead, I'd strongly suggest using a simple database to store the values, and then operating on it. Databases are designed for this sort of purpose and can be extremely fast. I recommend the Sequel gem for that.
$. is a special variable that tracks the line number of the last line read, so, as foreach passes lines into the block, $. increments. That makes skipping a particular line easy.
array << line if line > ''
is used to avoid appending empty/blank lines if the input file contains a trailing/terminating line-end.
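Since the question asks for integers rather than strings, here is a minimal variation of the same streaming approach (assuming the file layout shown in the question) that converts each value as it is read, so only the final array is held in memory:
array = []
File.foreach('test.txt') do |line|
  next if $. == 1                     # skip the header row
  value = line.chomp
  array << value.to_i unless value.empty?
end
p array    # => [2000, 80000, 50000, 30000]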
I have a JSON file like the one below, and I need to split this flowfile into a number of flowfiles, one per line.
Input flowfile:
{a:122, b: 12, c: dev}
{b: 19, c: dev}
{a:111, b: 12, c: roman,d: 2.3}
The output should be three flowfiles, one per row.
SplitJson is only splitting the first line; please suggest a fix.
Do you have downstream processors that expect one JSON object per flowfile? Otherwise you may be able to skip the split entirely and just use the record processors (ConvertRecord or PutDatabaseRecord, for example). The JsonTreeReader (in later versions of NiFi) accepts the one-JSON-per-line format (even though that's not valid JSON per se). If you do need one JSON object per flowfile, Bryan's suggestion of SplitText with a Line Count of 1 is spot-on.
SplitText with line count of 1
I need to add a - sign before any phone number ending with }, and remove that }.
For example, consider this sample file:
*My phone number is 9999999999}<br>
Ram is calling to 88888888}<br>
653426} Rohan is trying to call 777777777*
Expected output:
*My phone number is -9999999999<br>
Ram is calling to -88888888<br>
-653426 Rohan is trying to call 777777777*
This will do the trick:
sed -i 's/\s\([0-9]*\)\}/ -\1/g' vv.txt
Description:
vv.txt is the file from your sample.
\s\([0-9]*\)\} matches a whitespace character, then captures a run of digits, then a literal }.
 -\1 replaces the whole match with a space, a minus sign, and the captured digits, which drops the }.
The output on the sample file is:
*My phone number is -9999999999<br>
Ram is calling to -88888888<br>
653426} Rohan is trying to call 777777777*
(The last line is untouched because 653426} sits at the start of the line, with no whitespace before it.)
To take care of the additional condition mentioned in comments you can do two things:
1.
sed -i 's/\([0-9]*\)\}/ -\1/g' vv.txt
but this will insert a space before the -, so a number at the start of a line gains an unwanted leading space (and mid-line matches end up with a double space, since the original space is not consumed).
Alternatively, if you don't want the space:
2.
sed -i 's/\s\([0-9]*\)\}/ -\1/g;s/^\([0-9]*\)\}/-\1/g' vv.txt
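Not part of the original answer, but for comparison, a hypothetical Ruby equivalent that handles both the mid-line and line-leading cases in one pattern (Ruby's ^ anchors at each line start by default):
text = File.read('vv.txt')
# (^|\s) preserves the boundary before the number; (\d+)\} captures the digits before a literal }
File.write('vv.txt', text.gsub(/(^|\s)(\d+)\}/) { "#{$1}-#{$2}" })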
I have two files FILE1 & FILE2, and lets say both are fixed length of 30 characters. I need to find the records from FILE1 & FILE2 which contain the string 'COBOL', where the position of this key-word is unknown and changes for every record. To be more clear below is as sample layout.
FILE1 :
NVWGNLVKAKOIVCOBOLLKASVOIWSNVS
SOSIVSNAVIS7780HLSVHSKSCOBOL56
ZXCVBNMASDFGHJJKKLIIUYYTRREEWQ
1234567890COBOL1234556FCVHJJHH
COBOL1231231231231231341234334
FILE2 :
123456789012345678901234567890
COBOL1231231231231231341234334
GYKCHYYIIHHFTIUIHGJUGTUHGFUYHG
Can anyone explain how to do this using SORT or JOINKEYS, and also using a COBOL program?
I need two output files.
Output FILE-OP1 (which contains all the records having the COBOL keyword from FILE1 & FILE2):
NVWGNLVKAKOIVCOBOLLKASVOIWSNVS
SOSIVSNAVIS7780HLSVHSKSCOBOL56
1234567890COBOL1234556FCVHJJHH
COBOL1231231231231231341234334
COBOL1231231231231231341234334
Output FILE-OP2 (which contains only the matching records with the COBOL keyword from FILE1 & FILE2):
COBOL1231231231231231341234334
An example, pseudo-codeish, Cobol:
Open File1
Read File1 into The-Record
Perform until End-Of-File
    Perform varying II from 1 by 1
        until II > length of The-Record
        If The-Record (II:5) = 'COBOL'
            Display "Found COBOL at position " II
        End-If
    End-Perform
    Read File1 into The-Record
End-Perform
Repeat for file2 with the same program pointed at your other file.
As this sounds homework-y, I've left several little quirks that you will need to fix in that code, but you should see where it blows up or fails and be able to resolve those reasonably easily.
If you need to do some sort of matching and dropping between the two files, that is a different animal and you need to get your rules for it. Are you trying to match the files that have "COBOL" located in the same position or something? What behavior do you expect?
For your FILE1, SORT it on the entire input data, including only records which contain COBOL and appending a sequence number (you show your output in the original order). If there can be duplicate records, SORT on the attached sequence number as well.
Similar for FILE2.
The SORT for each program can be stand-alone (DFSORT or SyncSORT) or within a COBOL program.
You then "match" the files, here's a useful bit of pseudo-code from Bruce Martin: https://stackoverflow.com/a/22950005/1927206
Logically after the match, you then need to SORT both outputs on the sequence-number alone, and after that remove the sequence-numbers.
Remember that you only need to know whether COBOL is present in the data, not where it is or how many times it appears. If using COBOL for the first two SORTs, you have a variety of ways to locate the word COBOL: as Joe Zitzelberger showed, you can use a one-byte reference-modification (but be careful not to run beyond the end of the data with your PERFORM VARYING; use compiler option SSRANGE if you are unclear what I mean); you can use INSPECT; UNSTRING; STRING; define your data with an OCCURS, for a length of five, and use an index for a one-byte table; use OCCURS DEPENDING ON; do it "byte at a time"; etc.
This is a little bit like free-format number handling.
You can use "SS" in DFSORT to find records containing cobol.
Step 1. read both infiles, produce one outfile OP-1
INCLUDE COND=(1,30,SS,EQ,C'COBOL')
Step2. produce a work file in the same way as step 1. using only File 1.
Step3. produce a work file in the same way as step 1. using only File 2.
Run joinkeys on these two to find matches. ==> outfile OP-2
Essentially, this strategy serves to eliminate non-qualifying rows from the join.
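Purely as an illustration of that filter-then-match strategy (a sketch, not mainframe code, with hypothetical local file names), the same logic in Ruby:
# Keep only records containing 'COBOL' from each file
file1 = File.readlines('FILE1', chomp: true).select { |r| r.include?('COBOL') }
file2 = File.readlines('FILE2', chomp: true).select { |r| r.include?('COBOL') }
op1 = file1 + file2   # all qualifying records, FILE1's first (OP-1)
op2 = file1 & file2   # only records present in both files (OP-2)
File.write('OP1', op1.join("\n") + "\n")
File.write('OP2', op2.join("\n") + "\n")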
I have a list of strings. I am trying to append those string values to a text file.
Here is my code:
java_location = "#{second}#{first}"
The output of java_location is:
1.6.0_43/opt/oracle/agent12c/core/12.1.0.4.0/jdk/bin/java
1.6.0_43/opt/oracle/agent12c/core/12.1.0.4.0/jdk/jre/bin/java
1.5.0/opt/itm/v6.2.2/JRE/lx8266/bin/java
1.6.0_35/u01/app/oracle/product/Middleware/Oracle_BI1/jdk/jre/bin/java
I want this output written into a text file. How can I do that?
File.write('file.txt', java_location)
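An aside not in the original answer: File.write truncates the file each time by default, so if it is called once per location only the last line survives. Passing mode: 'a' (supported on modern Rubies) makes it append instead:
File.write('file.txt', java_location + "\n", mode: 'a')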
You want to open the file in append mode ('a') rather than read-write ('w+'), which truncates the existing file to zero length before writing:
http://alvinalexander.com/blog/post/ruby/example-how-append-text-to-file-ruby
if first && second
java_location = "#{second}#{first}"
a << java_location
File.open("/home/weblogic/javafoundmodified.txt", 'a') do |file|
a.each {
|item|
file.puts item
}
end
end
I am trying to retrieve key-value pairs defined in two different .yml files. Is it possible to do so in a single Ruby file?
Sure. Try this:
require 'yaml'
file1 = YAML.load_file("/home/abdo/settings.yml")
file2 = YAML.load_file("/home/abdo/database.yml")
This is an example I'm using in Rails to load a settings file:
SETTINGS = YAML.load_file("#{Dir.pwd}/config/settings.yml")[Rails.env]
If you want to load multiple files in 1 hash, you can do the following:
files = %w(database.yml settings.yml)
yamls = files.map { |f| YAML.load_file("#{Dir.pwd}/config/#{f}") }
H = files.each_with_object({}).with_index { |(e, hash), i| hash[e] = yamls[i] }
You can access H["database.yml"] to get the Hash representing the file with name database.yml
If you want to load a list of files following a certain pattern in a directory, you can use Dir.glob as mentioned in Iterate through every file in one directory
EDIT If your YAML files have non-conflicting data (data that does not get overridden when merged) and you'd like to merge all of them into a single Hash, you can do:
yamls.inject({}) { |hash, yaml| hash.merge(yaml) }
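To make that concrete, a small example (the file contents here are invented purely for illustration):
# Suppose, hypothetically:
#   settings.yml contains:  app_name: demo
#   database.yml contains:  adapter: sqlite3
merged = yamls.inject({}) { |hash, yaml| hash.merge(yaml) }
merged["app_name"]  # => "demo"
merged["adapter"]   # => "sqlite3"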