What could be Regex for reading first line of file with first few bytes and then rest of file content except last 8 bytes of last line of file? - ruby

I am processing a binary file from which I want to retrieve the first 4 bytes, the next 4 bytes, the 4 bytes after that, and then the rest of the file contents except the last 8 bytes of the last line.
I have tried file.read.scan(/(.{4})(.{4})(.{4})(.*\w)(.{8})/).each do |a,b,c,d,e|, but after some iterations the regex starts matching again from some line in the middle of the file, treating it as a new "first 4 bytes, next 4 bytes, next 4 bytes" pattern. Because of this, my condition check fails.
I want to do the following:
Read the first 4 bytes of the first line of the file, then bytes 5 to 7, then bytes 8 to 11, then the rest of the file content except the last 8 bytes of the last line of the file.
What could be the regex for this in Ruby?

Use #read instead of a regexp:
f = File.open(file_name,"rb")
chunk1 = f.read(4)
chunk2 = f.read(3)
chunk3 = f.read(4)
chunk4 = f.read(f.size - (4 + 3 + 4 + 8))

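How about the following? The m flag lets . match newlines as well, and the greedy (.*) backtracks just far enough to leave the final 8 bytes outside the fourth group:
/(.{4})(.{3})(.{4})(.*).{8}/m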
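
As a minimal sketch of how that pattern behaves, the snippet below runs it against a small in-memory string. The sample data is made up for illustration, and the \A and \z anchors are an addition that is not part of the answer above; they are assumed here only to make it explicit that the whole input is matched exactly once.

```ruby
# Hypothetical sample: 4 + 3 + 4 header bytes, a two-line body, 8 trailer bytes.
data = "AAAABBBCCCCline one\nline two12345678".b

# /m lets "." match newlines; \A and \z (an assumption, not in the original
# answer) pin the match to the start and end of the whole input.
pattern = /\A(.{4})(.{3})(.{4})(.*).{8}\z/m

if (m = pattern.match(data))
  chunk1, chunk2, chunk3, body = m.captures
end
```

With a real file, File.binread(file_name) could supply the string. For large files the #read approach avoids the backtracking that the greedy (.*) has to do.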

Related

Sort a text file and then extract the last 5 lines [duplicate]

I have an issue that I just can't get working, and I have to use VBScript for this.
I have data that is being extracted from a log file, but unfortunately it is not sorted after the extract (I'm just parsing the log and using a regex to grab specific lines). My problem is that after this extraction I am looking at the last 5 lines, but the extracted data isn't sorted:
Sample Text Extraction:
2022-02-08,16:46:46.390,4,3812,Tag 7145
2022-02-08,16:46:46.390,4,3812,Tag 7145
2022-02-09,14:48:41.609,4,3460,Tag 22860
2022-02-09,14:48:50.609,4,872,Tag 22863
2022-02-09,14:48:59.156,4,3576,Tag 22866
2022-02-09,14:49:15.015,4,3932,Tag 22871
2022-02-09,14:49:39.218,4,4060,Tag 22877
2022-02-09,14:56:06.953,4,2440,Tag 22952
2022-02-08,08:34:55.703,4,3580,Tag 1347
2022-02-08,08:35:04.656,4,2124,Tag 1350
2022-02-08,08:35:14.609,4,2300,Tag 1353
2022-02-08,08:35:29.500,4,3996,Tag 1357
2022-02-08,08:35:35.296,4,2484,Tag 1359
2022-02-08,16:46:25.593,4,1648,Tag 7139
2022-02-09,14:49:06.968,4,2992,Tag 22868
2022-02-08,16:46:55.328,4,3236,Tag 7148
2022-02-08,16:47:04.953,4,3964,Tag 7151
I am using the following to get the last 5 lines:
ReDim buf(numlines-1)
For n = 0 To UBound(buf)
buf(n) = Null
Next
i = 0
Set f = oFSO.OpenTextFile(filename)
Do Until f.AtEndOfStream
buf(i) = f.ReadLine
i = (i+1) Mod numlines
Loop
f.Close
When I run this code it gives me the last 5 lines of the text file, but not the correct ones. How can I sort the data in the array so that the extraction is correct? (In this case I should get the 9th Feb data and not the 8th.)

How to split a string by amount of characters in a batch file?

I have about 6GB of various text files, the files have many lines but each record is missing its commas so all the data is in 1 record. I want to create a batch file where I can add commas at the appropriate places in each "record". I'm hoping to add commas so I can then import this into a database.
For example the file would be structured like this.
IDnameADDRESSphoneEMAILetc
IDnameADDRESSphoneEMAILetc
IDnameADDRESSphoneEMAILetc
Each field has a unique length which I know, and it's static between all files.
For example
ID - 10 characters
NAME - 40 characters
ADDRESS - 30 characters
etc
This will need to be run on an ongoing basis as new files come in so I'm hoping for something I can give a non technical person they can just run.
Any quick way to do this in a bat file?
Using your example above. Note we count the characters starting from 0, then tell the set to use letters starting at a certain count, counting the word length from there. See bottom for layout.
@echo off
setlocal enabledelayedexpansion
for /F "tokens=* delims=" %%a in (filename.txt) do (
set str=%%a
set id=!str:~0,2!
set na=!str:~2,4!
set add=!str:~6,7!
set ph=!str:~13,5!
set em=!str:~18,5!
set etc=!str:~23,3!
echo !id!,!na!,!add!,!ph!,!em!,!etc!
)
Characters assigned in a string as:
I D n a m e A D D R E S S p h o n e E M A I L e t c
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
ID starts at Character 0 and is 2 characters, including itself :~0,2
name starts at character 2 and is 4 characters long :~2,4
etc..
For many files just add another loop as a main loop or give a list of files.
Based on your provided example, here is a quick powershell command, (despite no tag):
(GC 'Report.txt' | Select -First 1).Insert(10,',').Insert(51,',').Insert(82,',') > 'Fixed.txt'
It takes the first line of Report.txt…
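After 10 characters, insert a comma at index 10 (0 + 10 = 10); the comma itself moves the start of the next field to index 11.
After another 40 characters, insert a comma at index 51 (11 + 40 = 51); the next field then starts at index 52.
After another 30 characters, insert a comma at index 82 (52 + 30 = 82); the next field then starts at index 83.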
etc.
…then outputs the line complete with insertions to Fixed.txt
Just continue the .Insert(<number>,',') sequence for your other fixed width column sizes and ensure you've changed the filenames to suit your circumstances.
Edit
The following as an update to your comment and subsequent edit should work for all lines in the file.
GC 'Report.txt' | % {($_).Insert(10,',').Insert(51,',').Insert(82,',')} | Out-File 'Fixed.txt'
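
For comparison, the same fixed-width split can be sketched in Ruby rather than batch or PowerShell. This is only an illustration under the widths given in the question (10, 40, 30); the names WIDTHS and add_commas are hypothetical, and the remaining columns would need their own widths appended.

```ruby
# Field widths from the question (ID 10, NAME 40, ADDRESS 30); extend as needed.
WIDTHS = [10, 40, 30].freeze

# Cut the line at each fixed width and join the pieces with commas.
# Whatever follows the last known field is kept as one final column.
def add_commas(line, widths = WIDTHS)
  fields = []
  pos = 0
  widths.each do |w|
    fields << line[pos, w].to_s   # to_s guards against lines that are too short
    pos += w
  end
  rest = line[pos..-1]
  fields << rest if rest && !rest.empty?
  fields.join(",")
end

# Made-up record: 10-char ID, 40-char name, 30-char address, then the rest.
sample = "0123456789" + ("N" * 40) + ("A" * 30) + "5551234"
fixed = add_commas(sample)
```

The commas land at indices 10, 51 and 82 of the result, the same positions the .Insert calls use.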

Wireshark: read 8 bytes of timestamp

I'm new to writing dissectors in 'C' and I came across the need to read 8 bytes timestamp from a packet.
I'm trying the following code:
g_print("offset=%d, starttime=0x%08x\n", offset, tvb_get_letoh64(tvb, offset));
and I get:
offset=8, starttime=0x0362ea14
which is only 4 bytes out of the 8 I was expecting.
How can I read it so the output would be:
offset=8, starttime=0x14ea620305779840
I also tried reading it using:
g_print("offset=%d, starttime=0x%08x\n", offset, tvb_get_bits64(tvb, 64, 32, ENC_LITTLE_ENDIAN));
g_print("offset=%d, starttime=0x%08x\n", offset, tvb_get_bits64(tvb, 64, 64, ENC_LITTLE_ENDIAN));
and it printed the 4 first bytes of the timestamp and the 2nd call printed the last 4 bytes. I'm missing something very basic...
2nd question, ok, let's assume I get the value right and convert it into nstime_t, How can I format this into a Date\time format, something like:
YYYY-MM-DDZHH:MM:SS:MMMM
Thank you so much!
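The %x conversion expects a 32-bit unsigned int, but tvb_get_letoh64() returns a 64-bit value, so the format string needs a length modifier. What output do you get with this?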
g_print("offset=%d, starttime=0x%08lx\n", offset, tvb_get_letoh64(tvb, offset));
As for your 2nd question, what is the meaning of these 8 bytes? Maybe you can declare your hf variable using FT_ABSOLUTE_TIME and use something like proto_tree_add_time(), proto_tree_add_time_item(), proto_tree_add_time_format_value() or proto_tree_add_time_format()?

Weird output characters (Chinese characters) when using Ruby to read / write CSV

I'm trying to print the first 5 lines from a set of large (>500MB) csv files into small headers in order to inspect the content more easily.
I'm using Ruby code to do this but am getting each line padded out with extra Chinese characters, like this:
week_num type ID location total_qty A_qty B_qty count਍㌀㐀ऀ猀漀爀琀愀戀氀攀ऀ㄀㤀㜀ऀ䐀䔀开伀渀氀礀ऀ㔀㐀㜀㈀ ㌀ऀ㔀㐀㜀㈀ ㌀ऀ ऀ㤀㄀㈀㔀㌀ഀ
44 small 14 A 907859 907859 0 550360਍㐀㄀ऀ猀漀爀琀愀戀氀攀ऀ㐀㈀㄀ऀ䐀䔀开伀渀氀礀ऀ㌀ ㈀㄀㜀㐀ऀ㌀ ㈀㄀
The first few lines of input file are like so:
week_num type ID location total_qty A_qty B_qty count
34 small 197 A 547203 547203 0 91253
44 small 14 A 907859 907859 0 550360
41 small 421 A 302174 302174 0 18198
The strange characters appear to be Line 1 and Line 3 of the data.
Here's my Ruby code:
num_lines=ARGV[0].to_i
fh = File.open(file_in,"r")
fw = File.open(file_out,"w")
until (line=fh.gets).nil? or num_lines==0
fw.puts line if outflag
num_lines = num_lines-1
end
Any idea what's going on and what I can do to simply stop at the line end character?
Looking at input/output files in hex (useful suggestion by @user1934428)
Input file - each character looks to be two bytes.
Output file - notice the NULL (00) between each single byte character...
Ruby version 1.9.1
The problem is an encoding mismatch, which happens because the encoding is not specified explicitly when the files are opened for reading and writing. Open the input CSV in binary mode ("rb") with UTF-16LE encoding, and write the output the same way.
num_lines=ARGV[0].to_i # ARGV values are strings, so convert before comparing with 0
# ****** Specifying the right encodings <<<< this is the key
fh = File.open(file_in,"rb:utf-16le")
fw = File.open(file_out,"wb:utf-16le")
until (line=fh.gets).nil? or num_lines==0
fw.puts line
num_lines = num_lines-1
end
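
To see why the mismatch produces those stray characters, here is a small sketch on an in-memory string. The two-row sample is made up; it only assumes, as the hex dumps suggest, that the input is UTF-16LE, where a newline is the two bytes 0A 00.

```ruby
# Two short rows encoded as UTF-16LE (every character takes two bytes).
utf16 = "ab\ncd\n".encode("UTF-16LE")
raw = utf16.b   # the same bytes, treated as plain binary data

# Splitting on the single byte 0A cuts each two-byte newline in half,
# so every line after the first begins with a leftover 00 byte.
byte_lines = raw.lines
first_bytes = byte_lines[1].bytes.first(3)   # leading 0, then "c" (99), then 0

# Decoding with the real encoding first keeps the two-byte units together.
text_lines = utf16.encode("UTF-8").lines
```

That leftover byte shifts every following pair by one, which is why alternate lines show up as unrelated (here, CJK-looking) characters.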
Useful references:
Working with encodings in Ruby 1.9
CSV encodings
Determining the encoding of a CSV file

Replace the n-th byte in a file with another byte

In Ruby, how do I replace, say, the 7th byte of a file with another byte?
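Use the binwrite method of the IO class. Its third argument is a zero-based byte offset, so the 7th byte is at offset 6. When an offset is given, the file is not truncated.
IO.binwrite("testfile", [0x0D].pack("C"), 6) # => 1 (number of bytes written)
0x0D is 13, i.e. a carriage return ("\r"); [0x0D].pack("C") turns that integer into a one-byte string.
You may also want to read up on the Array#pack method.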
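
A minimal sketch of the same call against a throwaway file, so the effect can be checked. The Tempfile and its ten-letter content are made up for illustration. Offsets count from zero, so offset 6 is used to hit the 7th byte.

```ruby
require "tempfile"

# Hypothetical scratch file so the example does not touch real data.
tmp = Tempfile.new("binwrite-demo")
tmp.binmode
tmp.write("ABCDEFGHIJ")
tmp.close

# Offsets are zero-based, so the 7th byte ("G") lives at offset 6.
written = IO.binwrite(tmp.path, [0x0D].pack("C"), 6)

# Read the file back: only that one byte changed, nothing was truncated.
result = File.binread(tmp.path)
tmp.unlink
```

Because an offset is passed, the rest of the file is left in place and only the single byte is overwritten.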
