Convert pcap text file to csv in Bash

The content of the text files has the following format:
|1=X1|2=Y1|3=K1|4=J1|5=S1|
|1=X2|3=K2|4=J2|5=S2|
|1=X3|2=Y3|4=J3|5=S3|
...
So some lines have missing fields, and what we want is a CSV file like the following:
1,2,3,4,5
X1,Y1,K1,J1,S1
X2,,K2,J2,S2
X3,Y3,,J3,S3
...
I really have no clue how to do this in Bash, given the missing data.
There are around 5 million lines with 30+ columns. My idea was that we might need 30-odd "if clauses" to check each column and fill in ",," for any missing data. That sounds impractical, and there should clearly be a better method.

You can either use tshark, or use this program:
pip3 install scapy pyshark
python3 pcap2csv --pcap inp.pcap --csv op.csv
This should work.
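If the pipe-delimited text has already been extracted from the pcap, the reshaping itself does not need 30 if-clauses: a single awk pass can drop each key=value pair into its column and leave the missing ones empty. A minimal sketch, assuming the keys are numeric column indices, the maximum index is known up front (5 here, 30+ in the real data), and values contain no "=" or "|"; file names are placeholders:
awk -F'|' '
BEGIN {
    max = 5                                  # set to the real column count
    for (i = 1; i <= max; i++)               # header row: 1,2,...,max
        printf "%s%s", i, (i < max ? "," : "\n")
}
{
    split("", row)                           # clear the row between input lines
    for (i = 1; i <= NF; i++)
        if (split($i, kv, "=") == 2)         # non-empty fields look like key=value
            row[kv[1]] = kv[2]
    for (i = 1; i <= max; i++)               # empty cell when a key is absent
        printf "%s%s", row[i], (i < max ? "," : "\n")
}' input.txt > output.csv
One pass, no per-column branching, so it should scale fine to 5 million lines.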

Exiftool: batch-write metadata to JPEGs from text file

I'd like to use ExifTool to batch-write metadata that have been previously saved in a text file.
Say I have a directory containing the following JPEG files:
001.jpg 002.jpg 003.jpg 004.jpg 005.jpg
I then create the file metadata.txt, which contains the file names each followed by a colon, and hand it out to a coworker, who will fill it with the needed metadata (in this case comma-separated IPTC keywords). The file would look like this once finished:
001.jpg: Keyword, Keyword, Keyword
002.jpg: Keyword, Keyword, Keyword
003.jpg: Keyword, Keyword, Keyword
004.jpg: Keyword, Keyword, Keyword
005.jpg: Keyword, Keyword, Keyword
How would I go about feeding this file to ExifTool and making sure that the right keywords get saved to the right file? I'm also open to changing the structure of the file if that helps, for example by formatting it as CSV, JSON or YAML.
If you can change the format to a CSV file, then exiftool can directly read it with the -csv option.
You would have to reformat it in this way: the first row must have the header "SourceFile" above the filenames and "Keywords" above the keywords. If the filenames don't include the path to the files, then the command has to be run from the same directory as the files. The whole keyword string needs to be enclosed in quotes so the keywords aren't read as separate columns. The result would look like this:
SourceFile,Keywords
001.jpg,"KeywordA, KeywordB, KeywordC"
002.jpg,"KeywordD, KeywordE, KeywordF"
003.jpg,"KeywordG, KeywordH, KeywordI"
004.jpg,"KeywordJ, KeywordK, KeywordL"
005.jpg,"KeywordM, KeywordN, KeywordO"
At that point, your command would be
exiftool -csv=/path/to/file.csv -sep ", " /path/to/files
The -sep option is needed to make sure the keywords are treated as separate keywords rather than a single, long keyword.
This has an advantage over a script that loops over the file contents and runs exiftool once for each line. ExifTool's biggest performance hit is its startup time, and running it in a loop will be very slow, especially on a large number of files (see Common Mistake #3).
See ExifTool FAQ #26 for more details on reading from a csv file.
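If you'd rather not reformat the file by hand, a small awk pass can build that CSV from the colon-separated list. A sketch, assuming list.txt looks exactly like the example above (the output file name is a placeholder):
awk -F': ' 'BEGIN { print "SourceFile,Keywords" }
            { printf "%s,\"%s\"\n", $1, $2 }' list.txt > metadata.csv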
I believe the answer by @StarGeek is superior to mine, but I will leave mine for completeness and reference as a more basic, Luddite approach :-)
I think you want this:
#!/bin/bash
# read "file: keywords" pairs from list.txt and tag each file in turn
while IFS=': ' read -r file keywords; do
    exiftool -sep ", " -iptc:Keywords="$keywords" "$file"
done < list.txt
Here is the list.txt:
001.jpg: KeywordA, KeywordB, KeywordC
002.jpg: KeywordD, KeywordE, KeywordF
003.jpg: KeywordG, KeywordH, KeywordI
And here is the result:
exiftool -b -keywords 002.jpg
KeywordD
KeywordE
KeywordF
Many thanks to StarGeek for his corrections and explanations.

Finding a newline in the csv file

I know there are a lot of questions about this (latest one here), but almost all of them ask how to join those broken lines back into one, or how to remove them from the CSV file. I don't want to remove them; I just want to display/find the offending lines (or, ideally, their line numbers).
Example data:
22224,across,some,text,0,,,4 etc
33448,more,text,1,,3,,,4 etc
abcde,text,number,444444,0,1,,,, etc
358890,more
,text,here,44,,,, etc
abcdefg,textds3,numberss,413,0,,,,, etc
985678,93838,text,,,,
,text,continuing,from,previous,line,,, etc
After more searching on this, I know I shouldn't use bash to accomplish this but rather Perl. I tried snippets from various websites (I don't know Perl), but apparently I don't have the Text::CSV package, and I don't have permission to install it.
As I said, I have no idea how to even start on this, so I don't have any script. This is not a Windows file; it is very much a Unix file, so we can ignore the CR problem.
Desired output:
358890,more
,text,here,44,,,, etc
985678,93838,text,,,,
,text,continuing,from,previous,line,,, etc
or
Line 4: 358890,more
,text,here,44,,,, etc
Line 7: 985678,93838,text,,,,
,text,continuing,from,previous,line,,, etc
Much appreciated.
You can use Perl to count the number of fields (commas) and append the next line until the count reaches the expected number (28 commas here; adjust to your file's real column count):
perl -ne 'if(tr/,/,/<28){$line=$.;while(tr/,/,/<28){$_.=<>}print "Line $line: $_\n"}' file
I do love Perl, but I don't think it is the best tool for this job.
If you want a report of all lines that do NOT have exactly the correct number of commas/delimiters, you can use awk.
For example, this command:
/usr/bin/awk -F , 'NF != 8' < csv_file.txt
will print all lines that do NOT have exactly 7 commas, i.e. 8 fields. The field separator is specified with -F, and NF holds the number of fields in the current record.
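If you also want the line number in the report, as in the second desired output, awk's NR variable provides it. A sketch, assuming 8 fields per record as above:
awk -F, 'NF != 8 { print "Line " NR ": " $0 }' csv_file.txt
Note that each broken fragment is reported with its own line number; the Perl one-liner above is the variant that joins fragments back together first.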

Opposite of Linux Split

I have a huge file that I split into several small chunks to divide and conquer. Now I have a folder containing a list of files like below:
output_aa    # (produced with: cat input_aa | python parse.py > output_aa)
output_ab
output_ac
output_ad
...
I am wondering whether there is a way to merge those files back together following the index order.
I know I could do it by using
cat * > output.all
but I am more curious whether some magical counterpart command that comes with split already exists.
The magic command would be:
cat output_* > output.all
There is no need to sort the file names as the shell already does it (*).
As its name suggests, cat's original design was precisely to conCATenate files, which is basically the opposite of split.
(*) Edit:
Should you use a (hypothetical?) locale whose collating order for a-z is not abcdefghijklmnopqrstuvwxyz, here is one way to overcome the issue:
LC_ALL=C sh -c 'cat output_* > output.all'
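To see the round trip in action, split a file, reassemble it, and compare (file names here are illustrative):
split -b 1M big_file chunk_      # produces chunk_aa, chunk_ab, ...
cat chunk_* > rebuilt            # glob expansion sorts the names
cmp big_file rebuilt && echo "files are identical"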
There are other ways to concatenate files together, but there is no magical "opposite of split" in Linux.
Of course, talking about "Linux" in general is a bit far-fetched, as distributions ship different tools (and many use a different default shell, like sh, bash, csh, zsh, ksh, ...), but speaking at least of Debian-based distributions, I don't know of any that provides such a tool.
For sorting you can use the sort command.
Also be aware that using ">" to redirect stdout will overwrite any existing contents, while ">>" will append to an existing file.
I don't want to copycat, but to make this answer complete: what jlliagre said about the cat command should also be considered, namely that cat was made to con-"cat" files, effectively making it possible to reverse the split command. That only holds provided you use the same ordering of files, so it's not exactly the "opposite of split", but it will work that way in close to 100% of cases (see the comments under jlliagre's answer for specifics).

how to use mongoexport to get a particular format in a .csv file?

I am new to MongoDB and excited about using it at my workplace. However, I have come across a situation where one of our clients has sent the data in a .bson file. I have got everything working on my machine. I want to use the mongoexport facility to export my data in CSV format. When I use the following command
./mongoexport --db <dbname> --collection <collectionname> --csv --fields _id,field1,field2
I get the result in the following format:
ObjectID(4f6b42eb5e724242f60002ce),"[ { ""$oid"" : ""4f6b31295e72422cc5000001"" } ]",369008
However, I just want the values of the fields as comma-separated output like below:
4f6b42eb5e724242f60002ce,4f6b31295e72422cc5000001,369008
My question is: is there anything I can do in mongoexport to ignore certain characters?
Any pointer will be helpful.
No, mongoexport has no features like this. You'll need to use tools like sed and awk to post-process the file, or read the file and munge it in a scripting language like Python.
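For example, a sed pass can strip the wrappers shown in the question. This is only a sketch matched against the sample line above, assuming the quoting is exactly what mongoexport produced (file names are placeholders):
sed -E -e 's/ObjectID\(([0-9a-f]+)\)/\1/g' \
       -e 's/"\[ [{] ""\$oid"" : ""([0-9a-f]+)"" [}] \]"/\1/g' op.csv > clean.csv
which turns the sample line into the desired 4f6b42eb5e724242f60002ce,4f6b31295e72422cc5000001,369008.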
You should be able to add the following to your list of arguments:
--csv
You may also want to supply a path:
-o something.csv
...Though I don't think you could do this in 2012 when you first posted your question :-)

ruby mechanize: how to read a downloaded binary csv file

I'm not very familiar with using Ruby on binary data. I'm using mechanize to download a large number of CSV files to my local disk. I then need to search these files for specific strings.
I use the save_as method in mechanize to save the file (which saves the file as binary). The content type of the file (according to mechanize) is:
application/vnd.ms-excel;charset=x-UTF-16LE-BOM
From here, I'm not sure how to read the file. I've tried reading it in as a normal file in Ruby, but I just get the binary data. I've also tried standard Unix tools (strings/grep) to search it, without any luck.
When I run the 'file' command on one of the files, I get:
foo.csv: Little-endian UTF-16 Unicode Pascal program text, with very long lines, with CRLF, CR, LF line terminators
I can see the data just fine with cat or vi. With vi I also see some control characters.
I've also tried both the csv and fastercsv Ruby libraries, but I get an 'IllegalFormatError' exception from both. I've also tried this solution, without any luck.
Any help would be greatly appreciated. Thanks.
You can use the iconv command to convert to UTF-8:
# iconv -f 'UTF-16LE' -t 'UTF-8' bad_file.csv > good_file.csv
There is also a wrapper for iconv in the standard library; you could use that to convert the file after reading it into your program.
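Note that the Iconv wrapper was removed from the standard library in later Rubies; on a modern Ruby you can instead let the IO layer transcode while reading. A minimal sketch, assuming the file really is UTF-16LE with a BOM (the file name and search string are placeholders):
require "csv"

# read as UTF-16LE, drop the BOM, and transcode to UTF-8 on the fly
text = File.read("foo.csv", mode: "rb:BOM|UTF-16LE:UTF-8")

CSV.parse(text) do |row|
  puts row.inspect if row.any? { |field| field.to_s.include?("needle") }
end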
