What is the difference between Digital Forensics and Reverse Engineering? - debugging

I am not able to understand the exact difference between Digital Forensics and Reverse Engineering. Does Digital Forensics have anything to do with decompilation, assembly code reading, or debugging?
Thanks

Digital Forensic practice usually involves:
looking at logfiles
doing recovery of unlinked filesystem objects (e.g. deleted files)
recovering browsing history through cache, etc.
looking at timestamps of files (a small timeline sketch follows at the end of this answer)
(usually for the purpose of law enforcement)
Reverse Engineering usually involves determining how something works by:
looking at binary file formats of multiple files (or executables) to determine patterns
decompilation of binary executables to determine intent of the code
black-boxing and/or debugging of known-good applications to determine nominal behaviour with respect to data.
(usually for the purpose of interoperability)
They're completely different activities.
EDIT: so many typos.
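To make the timestamp point above concrete, here is a minimal sketch of the "timeline" side in Python: walk a directory tree and list files by modification time. Real forensic tools such as The Sleuth Kit do this against disk images and can recover timestamps for deleted entries as well:
    import os
    from datetime import datetime, timezone
    def timeline(root):
        entries = []
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.stat(path)
                except OSError:
                    continue  # unreadable entry; skip it
                entries.append((st.st_mtime, path))
        # newest first, the way most timeline reports are read
        for mtime, path in sorted(entries, reverse=True):
            print(datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat(), path)
    timeline("/var/log")  # any directory of interest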

I think the lines are a little more blurred than most realize. Digital forensics goes after the artifacts to prove that certain activity has taken place. Very few software packages offer documentation on the files created by the application. Basically, reverse engineering is required to figure out what the artifacts are, but not all forensic examiners are required to do the actual reverse engineering themselves.

Both are very, very different.
Reverse Engineering is the process of deconstructing how a system behaves without access to its engineering documents.
It has many purposes: replicating or exploiting a system, or merely making a compatible product that works with it. It may involve software tools (IDA Pro), in-circuit emulators, soldering irons, etc. One neat example is that it's possible to de-pot a chip using nitric acid https://www.youtube.com/watch?v=mT1FStxAVz4 and then place the chip under a microscope to determine some of its structure and behavior. (IANAL, IANAC: don't attempt this without chemistry knowledge and lab safety.)
Digital Forensics is looking to see what people or systems may have done by examining compute, network and storage devices for evidence.
It is mostly used by people defending systems, such as system administrators, or by law enforcement, to determine the who, what, and how of a potential crime. This can be automated (Snort, Tripwire) or manual (searching logs, say in Splunk or Loggly, or searching raw disk snapshots for particular strings).
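As a tiny illustration of the manual end of that, here is a rough sketch of searching a raw disk snapshot for a particular string, chunk by chunk; the image path and the search term are made up:
    def grep_image(image_path, needle, chunk_size=1 << 20):
        """Report byte offsets where `needle` occurs in a raw disk image."""
        offsets, tail, pos = [], b"", 0
        overlap = len(needle) - 1
        with open(image_path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                buf = tail + chunk
                start = 0
                while True:
                    idx = buf.find(needle, start)
                    if idx == -1:
                        break
                    offsets.append(pos - len(tail) + idx)
                    start = idx + 1
                tail = buf[-overlap:] if overlap else b""
                pos += len(chunk)
        return offsets
    print(grep_image("disk.dd", b"secret@example.com"))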

They are very, very different things!
Digital Forensics is used to retrieve deleted artifacts, examine logs, or analyse a dd image; you can see it as viewing the big picture.
Reversing is the opposite: digging into code, down to its binaries, and understanding 100% of what it does.
If you'd like to enter this field I recommend reading the book Practical Malware Analysis.

Digital forensics is the practice of retrieving information from digital media (computers, phones & tablets, networks) via a number of means. It is normally done for law enforcement, though it can be for private organisations and other parties, especially in the rising field of e-discovery.
Reverse engineering is looking at the code or binary of a file/system and determining how it is structured and how it works.
These are two completely different sciences, but if you think about it, they go hand in hand. Digital forensics needs reverse engineering to determine what information is available in the files being analysed and how that information is stored. Any good digital forensics company will have an R&D department that allows them to do this in house.
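As a small illustration of that overlap: suppose reverse engineering had worked out an entirely hypothetical record layout for some application's history file; the forensic side could then pull artifacts out of it with a few lines of Python:
    import struct
    from datetime import datetime, timezone
    RECORD = struct.Struct("<II64s")  # assumed layout: record id, unix time, path
    def read_history(path):
        with open(path, "rb") as f:
            while True:
                blob = f.read(RECORD.size)
                if len(blob) < RECORD.size:
                    break
                rec_id, ts, raw = RECORD.unpack(blob)
                visited = raw.rstrip(b"\x00").decode("utf-8", "replace")
                when = datetime.fromtimestamp(ts, tz=timezone.utc)
                print(rec_id, when.isoformat(), visited)
    read_history("history.dat")  # hypothetical artifact file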

Related

Successful FPGA application for HPC, e.g. on a cluster with InfiniBand backbone?

Assume there is a task (e.g. an image processing method with a lot of math) which it is reasonable to implement on an FPGA, in the sense of the answer https://stackoverflow.com/a/8695228/544463.
Is there any known successful application or practice (one that you can actually name) of combining it with a "dedicated" (custom-designed) supercomputing cluster (HPC), e.g. with an InfiniBand stack? I wonder whether that has already been done and to what extent it was successful.
My main motivation for the question is that http://en.wikipedia.org/wiki/Reconfigurable_computing is a long-term (academic) perspective for the future development of cluster computing, as a distinctive alternative to cloud computing (the latter concentrates more on flexibility at the software (higher) level, but also through possible "reconfiguration"). Is it already practical?
I would also expect somebody to be doing research on this... It would be nice to learn about the results.
Well, it's not FPGA, but D.E. Shaw's Anton computer for molecular dynamics is famously ASICs connected with a custom high-speed network; J. P. Morgan uses clusters of FPGAs in its risk-analysis calculations (recent Forbes article here). Convey computers has been pushing FPGA+x86+high speed networking fairly hard for the past couple of years, so presumably there's some sort of market there...
http://www.maxeler.com/ - they build racks of Intel PCs hosting custom boards stuffed with FPGAs (and - critically - the associated software and FPGA code) to speed up seismic processing, financial analysis and the like.
I think they could be regarded as successful (I gather they turn a profit) and have big customers from finance and oil companies amongst their clientele.
Is there any known successful application or practice (one that you can actually name) of combining it with a "dedicated" (custom-designed) supercomputing cluster (HPC), e.g. with an InfiniBand stack? I wonder whether that has already been done and to what extent it was successful.
It's being attempted academically with Novo-G.
You might be interested in Maxwell.
I know that Cray used to have a series of supercomputers some years ago that combined AMD Opterons with Xilinx FPGAs (iirc) through a HyperTransport bus, basically allowing you to create your own specialized processor for custom workloads. According to their website though, they now seem to have dropped FPGAs in favor of GPUs.
For the current research, there's always Google Scholar...
Update: After a bit of searching, it appears to have been the Cray XT5h, which had the possibility of using FPGA coprocessors...
Some have already been mentioned (convey, cray), some not (e.g. beecube).
But one of the biggest FPGA clusters I have ever heard of is missing:
The Large Hadron Collider at CERN. It produces enormous amounts of data in seconds (2.7 Terabit/s). They use more than 100 FPGAs to filter and reduce the data and make it manageable.
It does not fit your request of being connected to a dedicated HPC cluster, but they are an HPC cluster of their own (on the higher hierarchy levels the FPGAs used are FX parts, which include two PowerPCs, so they are also a kind of "normal" cluster).
There is quite a lot of published work in reconfigurable computing applications.
Here's a list of links to SRC Computers-centric published papers.
There's the Center for High-Performance Reconfigurable Computing.
Google search "FPGA" or "reconfigurable" along with these academic institution names and you'll find many published papers. Some of the papers you'll find go back to 2004.
Jackson State University
Clemson University
Catholic University
George Washington University
George Mason University
National Center for Supercomputing Applications (NCSA)
University of Illinois (UIUC)
Naval Postgraduate School (NPS)
Air Force Research Lab (AFRL)
University of Dayton Research Institute (UDRI)
University of Florida
University of Arkansas
There also was a reconfigurable-centric conference hosted by NCSA, the Reconfigurable Systems Summer Institute (RSSI).
This list is certainly not exhaustive, but it will get you started.
Disclosures: I currently work for SRC Computers, LLC, I worked at NCSA/UIUC and I chaired the RSSI conference its first two years.
Yet another great use case is being developed by Adapteva, called Parallella (they have a Kickstarter project).
They are developing an Epiphany series of processors controlled by a dual-core ARM processor that shares the board.
I am very much looking forward to getting this toy into my hands!
PS
Since it was largely inspired by Arduino (and similar ARM-based) systems, this project is still limited to 1 Gbps networking.

Choosing a strategy for BI module

The company I work for produces a content management system (CMS) with various add-ons for publishing, e-commerce, online printing, etc. We are now in the process of adding a "reporting module" and I need to investigate which strategy to follow. The "reporting module" is otherwise known as Business Intelligence, or BI.
The module is supposed to be able to track item downloads, executed searches and produce various reports out of it. Actually, it is not that important what kind of data is being churned as in the long term we might want to be able to push whatever we think is needed and get a report out of it.
Roughly speaking, we have two options.
Option 1 is to write a solution based on Apache Solr (specifically, using https://issues.apache.org/jira/browse/SOLR-236); a rough sketch of how this route could look follows the cons list below. Pros of this approach:
free / open source / good quality
we use Solr/Lucene elsewhere so we know the domain quite well
total flexibility over what is being indexed as we could take incoming data (in XML format), push it through XSLT and feed it to Solr
total flexibility of how to show search results. Similar to step above, we could have custom XSLT search template and show results back in any format we think is necessary
our frontend developers are proficient in XSLT so fitting this mechanism for a different customer should be relatively easy
Solr offers realtime / full text / faceted search, which are absolutely necessary for us. A quick prototype (based on Solr, 1M records) was able to deliver search results in 55ms. Our estimated maximum is about 1bn rows (which isn't a lot for a typical BI app), and if worse comes to worst, we can always look at SolrCloud, etc.
there are companies doing very similar things using Solr (Honeycomb Lexicon, for example)
Cons of this approach:
SOLR-236 might or might not be stable; moreover, it's not yet clear when/if it will be included in an official release
there would possibly be some stuff we'd have to write to get some BI-specific features working. This sounds a bit like reinventing the wheel
the biggest problem is that we don't know what we might need in the future (such as integration with some piece of BI software, export to Excel, etc.)
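For what it's worth, here is a rough sketch of how the Option 1 (Solr) route could look in Python, assuming a reasonably recent Solr with a core named "reports"; the core name and field names are invented and would need to exist in your schema:
    import requests
    SOLR = "http://localhost:8983/solr/reports"
    # index a couple of tracked events
    events = [
        {"id": "1", "type": "download", "item": "brochure.pdf", "when": "2011-05-01T10:00:00Z"},
        {"id": "2", "type": "search", "query_text": "summer catalogue", "when": "2011-05-01T10:05:00Z"},
    ]
    requests.post(SOLR + "/update?commit=true", json=events)
    # faceted query: how many events of each type?
    resp = requests.get(SOLR + "/select", params={
        "q": "*:*", "rows": 0, "facet": "true", "facet.field": "type", "wt": "json",
    })
    print(resp.json()["facet_counts"]["facet_fields"]["type"])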
Option 2 is to do an integration with some free or commercial piece of BI software. So far I have looked at Wabit and will have a look at QlikView, possibly others. Pros of this approach:
no need to reinvent the wheel, software is (hopefully) tried and tested
would save us time we could spend solving problems we specialize in
Cons:
as we are a Java shop and our solution is cross-platform, we'd have to eliminate a lot of options which are in the market
I am not sure how flexible BI software can be. It would take time to go through some BI offerings to see if they can do flexible indexing, real time / full text search, fully customizable results, etc.
I was told that open source BI offerings are not mature enough, whereas commercial BIs (SAP, others) cost a fortune; their licenses start at tens of thousands of pounds/dollars. While I am not against a commercial choice per se, it adds to the overall price, which can easily become just too big
not sure how well BI is made to work with schema-less data
I am definitely not the best candidate to find the most appropriate integration option on the market (mainly because of my lack of knowledge in the BI area); however, a decision needs to be made fast.
Has anybody been in a similar situation and could advise on which route to take, or even better - advise on possible pros/cons of the option #2? The biggest problem here is that I don't know what I don't know ;)
I have spent some time playing with both QlikView and Wabit, and, I have to say, I am quite disappointed.
I expected the whole BI industry to actually have some science under it, but from what I found it is just a buzzword. This MSDN article was actually an eye opener. The whole business of BI consists of taking data from well-normalized schemas (they call it OLTP), putting it into less-normalized schemas (OLAP, snowflake- or star-type) and creating indices for every aspect you want (the industry jargon for this is a data cube). The rest is just some scripting to get the pretty graphs.
OK, I know I am oversimplifying things here. I know I might have missed many different aspects (nice reports? export to Excel? predictions?), but from a computer science point of view I simply cannot see anything beyond a database index here.
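To make that point with a toy example: the "data cube" boils down to pre-aggregating a fact table along the dimensions you care about, which in Python/pandas terms (column names invented) is just a pivot/group-by:
    import pandas as pd
    facts = pd.DataFrame({
        "date":      ["2011-05-01", "2011-05-01", "2011-05-02", "2011-05-02"],
        "country":   ["UK", "UK", "DE", "UK"],
        "item":      ["brochure.pdf", "price-list.xls", "brochure.pdf", "brochure.pdf"],
        "downloads": [10, 3, 7, 5],
    })
    # one "slice" of the cube: downloads by date and country
    cube = facts.pivot_table(index="date", columns="country",
                             values="downloads", aggfunc="sum", fill_value=0)
    print(cube)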
I was told that some BI tools support compression. Lucene supports that, too. I was told that some BI tools are capable of keeping the entire index in memory. For that, there is the Lucene cache.
Speaking of the two candidates (Wabit and QlikView) - the first is simply immature (I got dozens of exceptions when trying to step outside of what was suggested in their demo), whereas the other only works under Windows (not very nice, but I could live with that) and the integration would likely require me to write some VBScript (yuck!). I had to spend a couple of hours on the QlikView forums just to get a simple date range control working, and failed because the Personal Edition I had did not support the downloadable demo projects available on their site. Don't get me wrong, they're both good tools for what they have been built for, but I simply don't see the point of integrating with them, as I wouldn't gain much.
To address the (arguable) immaturity of Solr I will define an abstract API, so I can move all the data to a database which supports full-text queries if anything goes wrong. And if worse comes to worst, I can always write stuff on top of Solr/Lucene if I need to.
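Something along these lines is what I mean by an abstract API (a rough sketch with illustrative names only): the reporting module codes against the interface, and the Solr-backed implementation can later be swapped for a database that supports full-text queries:
    from abc import ABC, abstractmethod
    class ReportIndex(ABC):
        @abstractmethod
        def add(self, doc):
            """Store one event/record for later reporting."""
        @abstractmethod
        def search(self, query, facets=()):
            """Full-text search plus facet counts over the stored records."""
    class SolrReportIndex(ReportIndex):
        def __init__(self, url):
            self.url = url
        def add(self, doc):
            pass  # POST the doc to <url>/update
        def search(self, query, facets=()):
            pass  # GET <url>/select with q=query and facet parameters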
If you're truly in a scenario where you're not sure what you don't know, I think it's best to explore an open-source tool and evaluate its usefulness before diving into your own implementation. It could very well be that using the open-source solution will help you further crystallise your own understanding and required features.
I have previously worked with an open-source solution called Pentaho. I seriously felt that I understood a whole lot more by learning to use Pentaho's features for my own ends. Of course, as is the case with most open-source solutions, Pentaho seemed a bit intimidating at first, but I managed to get a good grip on it in a month's time. We also worked with the Kettle ETL tool and Mondrian cubes - which I think most of the serious BI tools these days build on top of.
Earlier, all these components were independent, but of late I believe Pentaho has taken ownership of all these projects.
But once you're confident about what you need and what you don't, I'd suggest building a basic reporting tool of your own on top of a Mondrian implementation. Customising a sophisticated open-source tool can indeed be a big issue. Besides, there are licenses to be wary of. I believe Pentaho is GPL, though you might want to check on that.
First you should make clear what your reports should show. Which reporting features do you need? Which output formats do you want? Do you want to show it in the browser (HTML), as PDF, or with an interactive viewer (Java/Flash)? Where is the data (database, Java, etc.)? Do you need ad-hoc reporting or only some hard-coded reports? These are only some of the questions.
Without answers to these questions it is difficult to give a real recommendation, but my general recommendation would be i-net Clear Reports (it used to be called i-net Crystal-Clear). It is a Java tool. It is commercial, but the cost is lower than SAP and co.

FPGA Place & Route

For programming FPGAs, is it possible to write my own place & route routines? [The point is not that mine would be better; the point is whether I have the freedom to do so] -- or does the place & route stage output undocumented bitfiles, essentially forcing me to use proprietary tools?
Thanks!
There's been some discussion of this on comp.arch.fpga in the past. The conclusion is generally that unless you want to attract intense legal action from the FPGA companies, you probably don't want to do something like this. Bitfile formats are closely guarded secrets of the FPGA companies, and you would likely have to understand the file format in order to do what you want. That implies you would need to reverse engineer the format, and that (if you made your tool public in any way) would get you a lawsuit in short order.
I will add that there probably are intermediate files and that you likely wouldn't read or write the bitfile itself to do what you want to do, but those intermediate files tend to be undocumented as well. Read the EULA for your FPGA synthesis tool (ISE from Xilinx, for example) - any kind of reverse engineering is strictly forbidden. It seems that the only way we'll ever have open source alternatives in this space is for an open source FPGA architecture to emerge.
I agree with annccodeal, but to amplify a little bit, on Xilinx, there may be a few ways to do this. The XDL file format allows (or used to allow) explicit placement and routing. In addition, it should be possible to script the FPGA Editor to implement custom routing.
As regards placement, there is a rich infrastructure to constrain technology mapping of logic to primitives and to control placement of those primitives. For example LUT_MAP constraints can control technology mapping and LOC and RLOC constraints can determine placement. In practice, these allow the experienced designer great control over how a design is implemented without requiring them to duplicate man-centuries of software development to generate a bitstream directly.
You may also find interesting the current state-of-the-art FPGA CAD research software such as VPR. In my opinion these are challenged to keep up with the vendors' own tools, which must cope with modern heterogeneous FPGAs with splittable 6-LUTs, DSP blocks, etc.
Happy hacking.

Extracting a specific melody/beat/rhythm of a specific instrument from a mixed wave (or other music format) file

Is it possible to write a program that can extract the melody/beat/rhythm provided by a specific instrument from a wave (or other music format) file made up of multiple instruments?
Which algorithms could be used for this and what programming language would be best suited to it?
This is a fascinating area. The basic mathematical tool here is the Fourier Transform. To get an idea of how it works, and how challenging it can be, take a look at the analysis of the opening chord to A Hard Day's Night.
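To get a first feel for it in code, here is a minimal Python/NumPy sketch that reads a (hypothetical) WAV file and prints the strongest frequency components of a short window; it only shows which pitches are present, and separating the instruments is the genuinely hard part:
    import numpy as np
    from scipy.io import wavfile
    rate, data = wavfile.read("mix.wav")      # hypothetical input file
    if data.ndim > 1:
        data = data.mean(axis=1)              # fold stereo down to mono
    window = data[:rate]                      # first second of audio
    spectrum = np.abs(np.fft.rfft(window * np.hanning(len(window))))
    freqs = np.fft.rfftfreq(len(window), d=1.0 / rate)
    for i in spectrum.argsort()[-5:][::-1]:   # five strongest bins
        print("%7.1f Hz  magnitude %.0f" % (freqs[i], spectrum[i]))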
An instrument produces a sound signature, just the same way our voices do. There are algorithms out there that can pick a single voice out of a crowd and identify that voice from its signature in a database which is used in forensics. In the exact same way, the sound signature of a single instrument can be picked out of a soundscape (such as your mixed wave) and be used to pick out a beat, or make a copy of that instrument on its own track.
Obviously if you're thinking about making copies of tracks, i.e. to break down the mixed wave into a single track per instrument you're going to be looking at a lot of work. My understanding is that because of the frequency overlaps of instruments, this isn't going to be straightforward by any means... not impossible though as you've already been told.
There's quite an interesting blog post by Comparisonics about sound matching technologies which might be useful as a start for your quest for information: http://www.comparisonics.com/SearchingForSounds.html
To extract the beat or rhythm, you might not need perfect isolation of the instrument you're targeting. A general solution may be hard, but if you're trying to solve it for a particular piece, it may be possible. Try implementing a band-pass filter and see if you can tune it so that it selects the instrument you're after.
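A rough sketch of that band-pass idea in Python with SciPy, where the 200-1000 Hz band is an arbitrary guess you would tune to your instrument and "mix.wav" is a made-up input file:
    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import butter, filtfilt
    rate, data = wavfile.read("mix.wav")      # hypothetical input file
    if data.ndim > 1:
        data = data.mean(axis=1)              # work on a mono signal
    data = data.astype(np.float64)
    low, high = 200.0, 1000.0                 # tune this band to the instrument
    b, a = butter(4, [low / (rate / 2.0), high / (rate / 2.0)], btype="band")
    filtered = filtfilt(b, a, data)           # zero-phase band-pass
    wavfile.write("band.wav", rate, np.clip(filtered, -32768, 32767).astype(np.int16))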
Also, I just found this Mac product called PhotoSounder. They have a blog showing different ways it can be used, including isolating an individual instrument (with manual intervention).
Look into Karaoke machine algorithms. If they can remove voice from a song, I'm sure the same principles can be applied to extract a single instrument.
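The classic karaoke trick is centre-channel cancellation: lead vocals are usually mixed dead centre, so subtracting one stereo channel from the other cancels them while leaving off-centre instruments audible. The same subtraction only helps with an instrument if it happens to be panned that way, but as a sketch of the principle (file names made up):
    import numpy as np
    from scipy.io import wavfile
    rate, data = wavfile.read("song.wav")     # hypothetical stereo input
    left = data[:, 0].astype(np.float64)
    right = data[:, 1].astype(np.float64)
    karaoke = (left - right) / 2.0            # centre-panned content cancels out
    wavfile.write("no_centre.wav", rate, karaoke.astype(np.int16))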
Most instruments make sound within certain frequency ranges.
If you write a tunable bandpass filter - a filter that only lets a certain frequency range through - it'll be about as close as you're likely to get. It will not be anywhere near perfect; you're asking for black magic. The only way to perfectly extract a single instrument from a track is to have an audio sample of the track without that instrument, and do a difference of the two waveforms.
C, C++, Java, C#, Python, Perl should all be able to do all of this with the right libraries. Which one is "best" depends on what you already know.
It's possible in principle, but very difficult - an open area of research, even. You may be interested in the project paper for Dancing Monkeys, a step generation program for StepMania. It does some fairly sophisticated beat detection and music analysis, which is detailed in the paper (linked near the bottom of that page).

Digital Circuit understanding

In my quest for getting some basics down before I start going into programming I am looking for essential knowledge about how the computer works down at the core level.
I have a theory that actually understanding what, for instance, a stack overflow (let alone a stack) is, instead of relying on my sporadic knowledge of computer systems, will help me in the longer term.
Are there any books or sites that take you through how processors are structured, give a holistic overview, and somehow relate that to what is good to know about digital logic?
Am I making sense?
Yes, you should read some topics of
John L. Hennessy & David A. Patterson, "Computer Architecture: A quantitative Approach"
It covers microprocessor history and theory (starting with RISC architectures - MIPS), pipelining, memory, storage, etc.
David Patterson is a Professor of Computer Science in the EECS Department at UC Berkeley. http://www.eecs.berkeley.edu/~pattrsn/
Hope it helps, here's the link
Tanenbaum's Structured Computer Organization is a good book about how computers work. You might find it hard to get through the book, but that's mostly due to the subject, not the author.
However, I'm not sure I would recommend taking this approach. Understanding how the computer works can certainly be useful, but if you don't really have any programming knowledge, you can't really put your knowledge to good use - and you probably don't need that knowledge yet anyway. You would be better off learning about topics like object-oriented programming and data structures to learn about program design, because unless you're looking at doing embedded programming on very limited systems, you'll find those skills far more useful than knowledge of a computer's inner workings.
In my opinion, 20 years ago it was possible to understand the whole spectrum from BASIC all the way through operating system, hardware, down to the transistor or even quantum level. I don't know that it's possible for one person to understand that whole spectrum with today's technology. (Years ago, everyone serviced their own car. Today it's too hard.)
Some of the "layers" that you might be interested in (a toy example of one of them follows this list of links):
http://en.wikipedia.org/wiki/Boolean_logic (this will be helpful for programming)
http://en.wikipedia.org/wiki/Flip-flop_%28electronics%29
http://en.wikipedia.org/wiki/Finite-state_machine
http://en.wikipedia.org/wiki/Static_random_access_memory
http://en.wikipedia.org/wiki/Bus_%28computing%29
http://en.wikipedia.org/wiki/Microprocessor
http://en.wikipedia.org/wiki/Computer_architecture
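As a toy example of one of those layers, here is a tiny finite-state machine in Python that recognises the bit pattern "101" in a stream of bits, which is roughly the kind of job hardware FSMs are built for:
    # state transitions for recognising "101" in a bit string
    TRANSITIONS = {
        ("start", "1"): "got1",  ("start", "0"): "start",
        ("got1",  "0"): "got10", ("got1",  "1"): "got1",
        ("got10", "1"): "found", ("got10", "0"): "start",
    }
    def find_101(bits):
        state, hits = "start", []
        for i, bit in enumerate(bits):
            state = TRANSITIONS.get((state, bit), "start")
            if state == "found":
                hits.append(i - 2)   # index where this "101" began
                state = "got1"       # the trailing 1 may start the next match
        return hits
    print(find_101("0101101"))       # -> [1, 4]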
It's pretty simple, really - the CPU loads instructions and executes them; most of those instructions revolve around loading values into registers or memory locations, and then manipulating those values. Certain memory ranges are set aside for communicating with the peripherals attached to the machine, such as the screen or hard drive.
Back in the days of the Apple ][ and Commodore 64 you could put a value directly into a memory location and that would directly change a pixel on the screen - those days are long gone; it is all abstracted away from you (the programmer) by several layers of code, such as drivers and the operating system.
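Just to make that loop concrete, here is a toy, completely made-up machine in Python: a couple of registers, a tiny instruction set and a memory-mapped "screen" at address 100, in the Apple ][ / C64 spirit:
    memory = [0] * 128
    registers = {"A": 0, "B": 0, "PC": 0}
    program = [
        ("LOAD", "A", 2),     # A = 2
        ("LOAD", "B", 3),     # B = 3
        ("ADD", "A", "B"),    # A = A + B
        ("STORE", "A", 100),  # poke the result into "screen" memory
        ("HALT",),
    ]
    while True:                            # the fetch/decode/execute loop
        op = program[registers["PC"]]
        registers["PC"] += 1
        if op[0] == "LOAD":
            registers[op[1]] = op[2]
        elif op[0] == "ADD":
            registers[op[1]] += registers[op[2]]
        elif op[0] == "STORE":
            memory[op[2]] = registers[op[1]]
        elif op[0] == "HALT":
            break
    print("screen byte at address 100 =", memory[100])   # -> 5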
You can learn about this sort of stuff, or assembly language (which I am a huge fan of), or AND/NAND gates at the hardware level, but knowing it is not going to help you code up a web application in ASP.NET MVC, or write a quick and dirty Python or PowerShell script.
There are lots of resources sprinkled around the net that will give you insight into how the CPU and the rest of the hardware work, but if you want to get down and dirty I honestly think you should buy one of those older machines off eBay or somewhere and learn its particular flavour of assembly language (I understand there are also a lot of programmable PIC controllers out there that might be good to learn on). Picking up an older machine eliminates the software abstractions and makes things much easier to learn. You learn far better when you get instant gratification, like making sprites move around a screen or generating sounds by directly toggling the speaker (or using a PIC controller to control a small robot). With those older machines, the schematics for an Apple ][ motherboard fit onto a roughly A2-sized sheet of paper that was folded into the back of one of the Apple manuals - I would hate to imagine what they look like these days.
While I agree with the previous answers insofar as it is incredibly difficult to understand the entire process, we can at least break it down into categories, from lowest (closest to electrons) to highest (closest to what you actually see).
Lowest
Solid State Device Physics (How transistors work physically)
Circuit Theory (How transistors are combined to create logic gates)
Digital Logic (How logic gates are put together to create digital functions or digital structures, i.e. multiplexers, full adders, etc.; a tiny example follows this list)
Hardware Organization (How the data path is laid out in the CPU, the components of a von Neumann machine -> memory, processor, Arithmetic Logic Unit, fetch/decode/execute)
Microinstructions (Bit level programming)
Assembly (Programming with words, but directly specifying registers; it takes forever to program even simple things)
Interpreted/Compiled Languages (Programming languages that get compiled or interpreted to assembly; the operating system may be in one of these)
Operating System (Process scheduling, hardware interfaces, abstracts lower levels)
Higher level languages (these kind of appear twice; it depends on the language. Java is done at a very high level, but C goes straight to assembly, and the C compiler is probably written in C)
User Interfaces/Applications/Gui (Last step, making it look pretty)
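As a tiny taste of the digital logic layer mentioned in that list, here is a full adder expressed with nothing but gate-level operations and chained into a 4-bit ripple-carry adder, with Python used purely as a simulator:
    def full_adder(a, b, cin):
        s = a ^ b ^ cin                          # sum bit (XOR gates)
        cout = (a & b) | (a & cin) | (b & cin)   # carry out (AND/OR gates)
        return s, cout
    def add4(x, y):
        carry, result = 0, 0
        for i in range(4):                       # least significant bit first
            s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
            result |= s << i
        return result, carry
    print(add4(0b0101, 0b0110))                  # -> (11, 0), i.e. 5 + 6 = 11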
You can find out a lot about each of these. I'm only somewhat expert in the digital logic side of things. If you want a thorough tutorial on digital logic from the ground up, go to the electrical engineering menu of my website:
affablyevil.wordpress.com
I'm teaching the class, and adding online lessons as I go.
