Recognizing file formats from binary (C++) - c++11

I am a beginner C++ programmer.
I wrote a simple program that creates a char array (the size is the user's choice) and reads whatever was previously stored in that memory. Often you can find something that makes sense (I always seem to find the alphabet?), but most of it is just strange characters. I made it output into a binary file.
However, how do I:
Recognize the different chunks of data?
Recognize which chunks are in which file format (i.e. which chunk is an image, audio, text, etc.)?
My Code:
// main.cpp
#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <string>
#include <fstream>
using namespace std;
int main() {
    int memory_size = 4000;
    string data = "";
    bool inFile = false;
    cout << "How many bytes do you want to retrieve? (-1 to exit)\n";
    cin >> memory_size;
    string y_n;
    cout << "Would you like to write output into a file? (Y/N)\n";
    cin >> y_n;
    if (y_n.compare("Y") == 0 || y_n.compare("y") == 0)
        inFile = true;
    else
        inFile = false;
    char memory_chunk[memory_size]; // note: VLAs are a compiler extension, not standard C++
    for (int i = 0; i < memory_size; i++) {
        cout << memory_chunk[i];
        data += memory_chunk[i]; // was `memory_chunk[i] + ""`, which is pointer arithmetic, not concatenation
    }
    if (inFile) {
        ofstream file("output.binary", ios::out | ios::binary);
        file.write(memory_chunk, sizeof memory_chunk);
        file.close();
    }
    cin >> data;
    return 0;
}
Example of the retrieved data: (This is A LOT smaller than what it usually can retrieve)
dû( L)   àýtú( ¯1Œw ÐýDú( #ú( Lú( dû( ¼û(   L) º
 ‰v8û( 7Œw û(  ú( 0ý( k7Œwdû( # 5 À ü( ¨›w ó˜wÞ¯ › Ø› 0ý( Hû( À › `›  À Dû( LŒw › #› `› › lû( ÷Œw ›  › ˜› › û( 3YŒw › ~Œw › €› › à› Dü( › €› Dü( ßWŒwXŒwDÞ¯ › ›  €› ˆ› À › ¦› › !› : À › `›   À ü( › ˆ› V €›
Œw ˆ› ¬û( Äÿ( ‘Q‡w€ôçþÿÿÿXŒwµTŒw ‚› xü(  È6‹w › À×F  fÍñt"ãŠvEA #ÒF  ¸ü(
 þÿÿÿ#ÒF Ã~“v Øü( O¯‰vØÞ¯øü( œ›‰v ›   ˆý( ‡ÌE  #ÒF
8|“v ý(  ‰v#M“v,ý( wî‰v hý( ¬_‘v8|“v˜_‘vݧY‘ ÀwF
<ý( Äÿ( e‹vàçþÿÿÿ˜_‘v"A
8|“v#ÒF ÀwF ïÀE ÕF ”› ÓºA ”› ÕF lF €F  F 2 àýàý( ð #

Some file formats start with magic numbers that help identify them, though this is not always the case. Wikipedia has a list here:
http://en.wikipedia.org/wiki/List_of_file_signatures. The Unix command 'file' tries to guess file formats based on magic numbers in the data. Its source code is most likely available somewhere (the Apple Darwin sources, if nowhere else).
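To illustrate the magic-number idea, here is a minimal sketch (my own, not the actual `file` implementation) that compares the leading bytes of a data chunk against a few well-known signatures:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Return a rough guess at the format of a data chunk by comparing its
// leading bytes against a few well-known magic numbers.
std::string guess_format(const unsigned char* data, std::size_t len) {
    struct Magic { const char* name; const unsigned char* sig; std::size_t sig_len; };
    static const unsigned char png[]  = {0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    static const unsigned char jpeg[] = {0xFF, 0xD8, 0xFF};
    static const unsigned char pdf[]  = {'%', 'P', 'D', 'F'};
    static const unsigned char zip[]  = {'P', 'K', 0x03, 0x04};
    static const Magic table[] = {
        {"png", png, sizeof png}, {"jpeg", jpeg, sizeof jpeg},
        {"pdf", pdf, sizeof pdf}, {"zip", zip, sizeof zip},
    };
    for (const Magic& m : table)
        if (len >= m.sig_len && std::memcmp(data, m.sig, m.sig_len) == 0)
            return m.name;
    return "unknown";
}
```

Note that the absence of a signature proves nothing: plain text, for instance, has no magic number, so "unknown" does not mean "not a file".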

Related

boost::asio::async_read_until with custom match_char to accept only JSON format

I've been trying to change the match_char function to accept only JSON messages when reading data from a socket.
I have two implementations (one does not work, and the other works but I don't think it's efficient).
1- First approach (working)
typedef boost::asio::buffers_iterator<boost::asio::streambuf::const_buffers_type> buffer_iterator;

static std::pair<buffer_iterator, bool> match_json2(const buffer_iterator begin,
                                                    const buffer_iterator end) {
    buffer_iterator i = begin;
    while (i != end) {
        if ((*i == ']') || (*i == '}')) {
            return std::make_pair(i, true);
        }
        ++i;
    }
    return std::make_pair(i, false);
}
With this definition, I read in a loop and reconstruct the JSON. This is a working version, but if I receive a message that is not valid JSON, I stay in the loop, can't clear tmp_response, and never recover from it...
std::string read_buffer_string() {
    std::string response;
    bool keepReading = true;
    while (keepReading) {
        std::string tmp_response;
        async_read_until(s, ba::dynamic_buffer(tmp_response), match_json2, yc);
        if (!tmp_response.empty()) {
            response += tmp_response;
            if (nlohmann::json::accept(response)) {
                keepReading = false;
            }
        }
    }
    return response;
}
2- Second approach (not working). Ideally I would like something like this one. This implementation doesn't work because the begin iterator doesn't always point to the start of the message (I guess some data has already been transferred to the buffer), and therefore match_json returns invalid values.
static std::pair<buffer_iterator, bool> match_json(const buffer_iterator begin,
                                                   const buffer_iterator end) {
    buffer_iterator i = begin;
    while (i != end) {
        if ((*i == ']') || (*i == '}')) {
            std::string _message(begin, i);
            std::cout << _message << std::endl;
            if (nlohmann::json::accept(_message)) {
                return std::make_pair(i, true);
            }
        }
        ++i;
    }
    return std::make_pair(i, false);
}
And then call it like this:
std::string read_buffer_string() {
    std::string response;
    async_read_until(s, ba::dynamic_buffer(response), match_json, yc);
    return response;
}
Does anybody know a more efficient way to do it?
Thanks in advance! :)
Of course, right after posting my other answer I remembered that Boost has accepted Boost JSON in 1.75.0.
It does stream parsing way more gracefully: https://www.boost.org/doc/libs/1_75_0/libs/json/doc/html/json/ref/boost__json__stream_parser.html#json.ref.boost__json__stream_parser.usage
It actually deals with trailing data as well!
stream_parser p;                   // construct a parser
std::size_t n;                     // number of characters used
n = p.write_some( "[1,2" );        // parse some of a JSON
assert( n == 4 );                  // all characters consumed
n = p.write_some( ",3,4] null" );  // parse the remainder of the JSON
assert( n == 6 );                  // only some characters consumed
assert( p.done() );                // we have a complete JSON
value jv = p.release();            // take ownership of the value
I would also submit that this could be a better match for a CompletionCondition: see https://www.boost.org/doc/libs/1_75_0/doc/html/boost_asio/reference/read/overload3.html
Here's an implementation that I tested with:
template <typename Buffer, typename SyncReadStream>
static size_t read_json(SyncReadStream& s, Buffer buf,
                        boost::json::value& message, boost::json::parse_options options = {})
{
    boost::json::stream_parser p{{}, options};
    size_t total_parsed = 0;
    boost::asio::read(s, buf, [&](boost::system::error_code ec, size_t /*n*/) {
        size_t parsed = 0;
        for (auto& contiguous : buf.data()) {
            parsed += p.write_some(
                boost::asio::buffer_cast<char const*>(contiguous),
                contiguous.size(), ec);
        }
        buf.consume(parsed);
        total_parsed += parsed;
        return ec || p.done(); // true means done
    });
    message = p.release(); // throws if incomplete
    return total_parsed;
}
Adding a delegating overload for streambufs:
template <typename SyncReadStream, typename Alloc>
static size_t read_json(SyncReadStream& s,
                        boost::asio::basic_streambuf<Alloc>& buf,
                        boost::json::value& message,
                        boost::json::parse_options options = {})
{
    return read_json(s, boost::asio::basic_streambuf_ref<Alloc>(buf), message, options);
}
Demo Program
This demo program adds the test-cases from earlier as well as a socket client with some benchmark statistics added. Arguments:
test to run the tests instead of the socket client
streambuf to use the streambuf overload instead of std::string dynamic buffer
comments to allow comments in the JSON
trailing_commas to allow trailing commas in the JSON
invalid_utf8 to allow invalid utf8 in the JSON
Live On Compiler Explorer¹
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/adapted.hpp>
#include <iomanip>
#include <iostream>

namespace x3 = boost::spirit::x3;

int main() {
    std::string const s =
        "? 8==2 : true ! false"
        "? 9==3 : 'book' ! 'library'";

    using expression = std::string;
    using ternary = std::tuple<expression, expression, expression>;
    std::vector<ternary> parsed;

    auto expr_ = x3::lexeme [+~x3::char_("?:!")];
    auto ternary_ = "?" >> expr_ >> ":" >> expr_ >> "!" >> expr_;

    std::cout << "=== parser approach:\n";
    if (x3::phrase_parse(begin(s), end(s), *x3::seek[ ternary_ ], x3::space, parsed)) {
        for (auto [cond, e1, e2] : parsed) {
            std::cout
                << " condition " << std::quoted(cond) << "\n"
                << " true expression " << std::quoted(e1) << "\n"
                << " else expression " << std::quoted(e2) << "\n"
                << "\n";
        }
    } else {
        std::cout << "non matching" << '\n';
    }
}
With test prints:
----- valid test cases
Testing {} -> Success {}
Testing {"a":4, "b":5} -> Success {"a":4,"b":5}
Testing [] -> Success []
Testing [4, "b"] -> Success [4,"b"]
----- incomplete test cases
Testing { -> (incomplete...)
Testing {"a":4, "b" -> (incomplete...)
Testing [ -> (incomplete...)
Testing [4, " -> (incomplete...)
----- invalid test cases
Testing } -> syntax error
Testing "a":4 } -> Success "a" -- remaining `:4 }`
Testing ] -> syntax error
----- excess input test cases
Testing {}{"a":4, "b":5} -> Success {} -- remaining `{"a":4, "b":5}`
Testing []["a", "b"] -> Success [] -- remaining `["a", "b"]`
Testing {} bogus trailing data -> Success {} -- remaining `bogus trailing data`
Some demos with the socket client:
Mean packet size: 16 in 2 packets
Request: 28 bytes
Request: {"a":4,"b":"5"} bytes
Remaining data: "bye
"
took 0.000124839s, ~0.213899MiB/s
With a large (448MiB) location_history.json:
Mean packet size: 511.999 in 917791 packets
Request: 469908167 bytes
(large request output suppressed)
took 3.30509s, ~135.59MiB/s
¹ linking non-header-only libraries is not supported on Compiler Explorer
TL;DR
Seriously, just add framing to your wire protocol. Even HTTP responses do this (e.g. via the Content-Length header, or chunked encoding).
UPDATE:
Instead of handrolling you can go with Boost JSON as I added in another answer
The first approach is flawed because you use async_read_until yet treat the operation as if it were synchronous.
The second problem is that neither json::parse nor json::accept can report the location of a complete/broken parse. This means that you really do need framing in your wire protocol, because you CANNOT detect message boundaries.
The rest of this answer will first dive in to show how the limitations of the nlohmann::json library make your task impossible¹.
So even though it's commendable for you to use an existing library, we will look for alternatives.
Making It Work(?)
You could use the approach that Beast uses (http::read(s, buf, m) with an http::message). That is: have a reference to the entire buffer.
flat_buffer buf;
http::request<http::empty_body> m;
read(s, buf, m); // s is a SyncStream like socket
Here, read is a composed operation over the message as well as the buffer. This makes it easy to check the completion criteria. In our case, let's make a reader that also serves as a match-condition:
template <typename DynamicBuffer_v1>
struct JsonReader {
    DynamicBuffer_v1 _buf;
    nlohmann::json message;

    JsonReader(DynamicBuffer_v1 buf) : _buf(buf) {}

    template <typename It>
    auto operator()(It dummy, It) {
        using namespace nlohmann;
        auto f = buffers_begin(_buf.data());
        auto l = buffers_end(_buf.data());
        bool ok = json::accept(f, l);
        if (ok) {
            auto n = [&] {
                std::istringstream iss(std::string(f, l));
                message = json::parse(iss);
                return iss.tellg(); // detect consumed
            }();
            _buf.consume(n);
            assert(n);
            std::advance(dummy, n);
            return std::pair(dummy, ok);
        } else {
            return std::pair(dummy, ok);
        }
    }
};

namespace boost::asio {
    template <typename T>
    struct is_match_condition<JsonReader<T>> : public boost::true_type { };
}
This is peachy and works on the happy path. But you run into big trouble on edge/error cases:
you can't distinguish incomplete data from invalid data, so you MUST assume that unaccepted input is just incomplete (otherwise you would never wait for data to be complete)
you will wait forever for data to become "valid" if it's actually just invalid, or
worse still: keep reading indefinitely, possibly running out of memory (unless you limit the buffer size; that could lead to a DoS)
perhaps worst of all: if you read more data than the single JSON message (which you cannot in general prevent with stream sockets), the original message will be rejected due to "excess input". Oops
Testing It
Here are the test cases that confirm the predicted analysis conclusions:
Live On Compiler Explorer
#include <boost/asio.hpp>
#include <nlohmann/json.hpp>
#include <iostream>
#include <iomanip>

template <typename Buffer>
struct JsonReader {
    static_assert(boost::asio::is_dynamic_buffer_v1<Buffer>::value);
    Buffer _buf;
    nlohmann::json message;

    JsonReader() = default;
    JsonReader(Buffer buf) : _buf(buf) {}

    template <typename It>
    auto operator()(It dummy, It) {
        using namespace nlohmann;
        auto f = buffers_begin(_buf.data());
        auto l = buffers_end(_buf.data());
        bool ok = json::accept(f, l);
        if (ok) {
            auto n = [&] {
                std::istringstream iss(std::string(f, l));
                message = json::parse(iss);
                return iss.tellg(); // detect consumed
            }();
            _buf.consume(n);
            assert(n);
            //std::advance(dummy, n);
            return std::pair(dummy, ok);
        } else {
            return std::pair(dummy, ok);
        }
    }
};

namespace boost::asio {
    template <typename T>
    struct is_match_condition<JsonReader<T>> : public boost::true_type { };
}

static inline void run_tests() {
    std::vector<std::string> valid {
        R"({})",
        R"({"a":4, "b":5})",
        R"([])",
        R"([4, "b"])",
    },
    incomplete {
        R"({)",
        R"({"a":4, "b")",
        R"([)",
        R"([4, ")",
    },
    invalid {
        R"(})",
        R"("a":4 })",
        R"(])",
    },
    excess {
        R"({}{"a":4, "b":5})",
        R"([]["a", "b"])",
        R"({} bogus trailing data)",
    };

    auto run_tests = [&](auto& cases) {
        for (std::string buf : cases) {
            std::cout << "Testing " << std::left << std::setw(22) << buf;
            bool ok = JsonReader { boost::asio::dynamic_buffer(buf) }
                (buf.begin(), buf.end())
                .second;
            std::cout << " -> " << std::boolalpha << ok << std::endl;
            if (ok && !buf.empty()) {
                std::cout << " -- remaining buffer " << std::quoted(buf) << "\n";
            }
        }
    };

    std::cout << " ----- valid test cases \n";
    run_tests(valid);
    std::cout << " ----- incomplete test cases \n";
    run_tests(incomplete);
    std::cout << " ----- invalid test cases \n";
    run_tests(invalid);
    std::cout << " ----- excess input test cases \n";
    run_tests(excess);
}

template <typename SyncReadStream, typename Buffer>
static void read(SyncReadStream& s, Buffer bufarg, nlohmann::json& message) {
    using boost::asio::buffers_begin;
    using boost::asio::buffers_end;
    JsonReader reader{bufarg};
    read_until(s, bufarg, reader);
    message = reader.message;
}

int main() {
    run_tests();
}
Prints
----- valid test cases
Testing {} -> true
Testing {"a":4, "b":5} -> true
Testing [] -> true
Testing [4, "b"] -> true
----- incomplete test cases
Testing { -> false
Testing {"a":4, "b" -> false
Testing [ -> false
Testing [4, " -> false
----- invalid test cases
Testing } -> false
Testing "a":4 } -> false
Testing ] -> false
----- excess input test cases
Testing {}{"a":4, "b":5} -> false
Testing []["a", "b"] -> false
Testing {} bogus trailing data -> false
Looking For Alternatives
You could roll your own as I did in the past:
Parse a substring as JSON using QJsonDocument
Or we can look at another library that DOES allow us to either detect boundaries of valid JSON fragments OR detect and leave trailing input.
Hand-Rolled Approach
Here's a simplistic translation to more modern Spirit X3 of that linked answer:
// Note: first iterator gets updated
// throws on known invalid input (like starting with `]' or '%')
template <typename It>
bool tryParseAsJson(It& f, It l)
{
    try {
        return detail::x3::parse(f, l, detail::json);
    } catch (detail::x3::expectation_failure<It> const& ef) {
        throw std::runtime_error("invalid JSON data");
    }
}
The crucial point is that, in addition to returning true/false, this updates the start iterator according to how far it consumed the input.
namespace JsonDetect {
    namespace detail {
        namespace x3 = boost::spirit::x3;

        static const x3::rule<struct value_> value{"value"};

        static auto primitive_token
            = x3::lexeme[ x3::lit("false") | "null" | "true" ];

        static auto expect_value
            = x3::rule<struct expect_value_> { "expect_value" }
            // array, object, string, number or other primitive_token
            = x3::expect[&(x3::char_("[{\"0-9.+-") | primitive_token | x3::eoi)]
            >> value
            ;

        // 2.4. Numbers
        // Note our spirit grammar takes a shortcut, as the RFC specification is more restrictive.
        //
        // However, none of the above affects any structure characters (:,{}[] and double quotes),
        // so it doesn't matter for the current purpose. For full compliance, this remains TODO:
        //
        // Numeric values that cannot be represented as sequences of digits
        // (such as Infinity and NaN) are not permitted.
        //  number        = [ minus ] int [ frac ] [ exp ]
        //  decimal-point = %x2E        ; .
        //  digit1-9      = %x31-39     ; 1-9
        //  e             = %x65 / %x45 ; e E
        //  exp           = e [ minus / plus ] 1*DIGIT
        //  frac          = decimal-point 1*DIGIT
        //  int           = zero / ( digit1-9 *DIGIT )
        //  minus         = %x2D        ; -
        //  plus          = %x2B        ; +
        //  zero          = %x30        ; 0
        static auto number = x3::double_; // shortcut :)

        // 2.5 Strings
        static const x3::uint_parser<uint32_t, 16, 4, 4> _4HEXDIG;
        static auto char_ = ~x3::char_("\"\\") |
            x3::char_(R"(\)") >> (              // \ (reverse solidus)
                x3::char_(R"(")") |             // " quotation mark U+0022
                x3::char_(R"(\)") |             // \ reverse solidus U+005C
                x3::char_(R"(/)") |             // / solidus U+002F
                x3::char_(R"(b)") |             // b backspace U+0008
                x3::char_(R"(f)") |             // f form feed U+000C
                x3::char_(R"(n)") |             // n line feed U+000A
                x3::char_(R"(r)") |             // r carriage return U+000D
                x3::char_(R"(t)") |             // t tab U+0009
                x3::char_(R"(u)") >> _4HEXDIG ) // uXXXX U+XXXX
            ;
        static auto string = x3::lexeme [ '"' >> *char_ >> '"' ];

        // 2.2 objects
        static auto member
            = x3::expect [ &(x3::eoi | '"') ]
            >> string
            >> x3::expect [ x3::eoi | ':' ]
            >> expect_value;
        static auto object
            = '{' >> ('}' | (member % ',') >> '}');

        // 2.3 Arrays
        static auto array
            = '[' >> (']' | (expect_value % ',') >> ']');

        // 2.1 values
        static auto value_def = primitive_token | object | array | number | string;
        BOOST_SPIRIT_DEFINE(value)

        // entry point
        static auto json = x3::skip(x3::space)[expect_value];
    } // namespace detail
} // namespace JsonDetect
Obviously you would put the implementation in a separate TU, but on Compiler Explorer we can't. Live On Compiler Explorer; using an adjusted JsonReader, it prints:
SeheX3Detector
==============
----- valid test cases
Testing {} -> true
Testing {"a":4, "b":5} -> true
Testing [] -> true
Testing [4, "b"] -> true
----- incomplete test cases
Testing { -> false
Testing {"a":4, "b" -> false
Testing [ -> false
Testing [4, " -> false
----- invalid test cases
Testing } -> invalid JSON data
Testing "a":4 } -> true -- remaining `:4 }`
Testing ] -> invalid JSON data
----- excess input test cases
Testing {}{"a":4, "b":5} -> true -- remaining `{"a":4, "b":5}`
Testing []["a", "b"] -> true -- remaining `["a", "b"]`
Testing {} bogus trailing data -> true -- remaining ` bogus trailing data`
NlohmannDetector
================
----- valid test cases
Testing {} -> true
Testing {"a":4, "b":5} -> true
Testing [] -> true
Testing [4, "b"] -> true
----- incomplete test cases
Testing { -> false
Testing {"a":4, "b" -> false
Testing [ -> false
Testing [4, " -> false
----- invalid test cases
Testing } -> false
Testing "a":4 } -> false
Testing ] -> false
----- excess input test cases
Testing {}{"a":4, "b":5} -> false
Testing []["a", "b"] -> false
Testing {} bogus trailing data -> false
Note how we have now achieved some of the goals:
accepting trailing data, so we don't clobber any data after our message
failing early on some inputs that cannot possibly become valid JSON
However, we can't fix the problem of waiting indefinitely on possibly incomplete valid data.
Interestingly, one of our "invalid" test cases was wrong (!). (It is always a good sign when test cases fail.) This is because "a" is actually a valid JSON value on its own.
Conclusion
In the general case it is impossible to make such a "complete message" detection work without at least limiting buffer size. E.g. a valid input could start with a million spaces. You don't want to wait for that.
Also, a valid input could open a string, object or array², and not terminate it within a few gigabytes. If you stop parsing beforehand, you'll never know whether it was ultimately a valid message.
Though you'll inevitably have to deal with network timeouts anyway, you will prefer to be proactive about knowing what to expect. E.g. send the size of the payload ahead of time, so you can use boost::asio::transfer_exactly and validate precisely what you expected to get.
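To make that framing suggestion concrete, here is a minimal sketch (the frame/unframe names and the 4-byte big-endian header are my own choices, not any established protocol): the sender prepends the payload length, and the receiver can then read exactly that many bytes before handing a complete message to the JSON parser.

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

// Prepend a 4-byte big-endian length header to a payload.
std::string frame(const std::string& payload) {
    if (payload.size() > UINT32_MAX)
        throw std::length_error("payload too large");
    auto n = static_cast<std::uint32_t>(payload.size());
    std::string out;
    out.push_back(static_cast<char>((n >> 24) & 0xFF));
    out.push_back(static_cast<char>((n >> 16) & 0xFF));
    out.push_back(static_cast<char>((n >> 8) & 0xFF));
    out.push_back(static_cast<char>(n & 0xFF));
    out += payload;
    return out;
}

// Extract one complete message from the front of `buf`, consuming it.
// Returns std::nullopt if the buffer doesn't hold a full frame yet.
std::optional<std::string> unframe(std::string& buf) {
    if (buf.size() < 4)
        return std::nullopt;
    std::uint32_t n = (std::uint32_t(static_cast<unsigned char>(buf[0])) << 24) |
                      (std::uint32_t(static_cast<unsigned char>(buf[1])) << 16) |
                      (std::uint32_t(static_cast<unsigned char>(buf[2])) << 8) |
                      std::uint32_t(static_cast<unsigned char>(buf[3]));
    if (buf.size() < 4u + n)
        return std::nullopt;
    std::string msg = buf.substr(4, n);
    buf.erase(0, 4 + n);
    return msg;
}
```

With framing in place message boundaries are explicit, so none of the parse-based heuristics above are needed.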
¹ practically. If you don't care about performance, you could iteratively run accept on increasing lengths of buffer
² god forbid, a number like 0000....00001 though that's subject to parser implementation differences

Extract trailing int from string containing other characters

I have a problem in regards of extracting signed int from string in c++.
Assuming that I have a string like images1234, how can I extract the 1234 from the string without knowing the position of the last non-numeric character in C++?
FYI, I have tried stringstream as well as lexical_cast as suggested in other posts, but stringstream returns 0 while lexical_cast stops working.
int main()
{
    string virtuallive("Images1234");
    //stringstream output(virtuallive.c_str());
    //int i = stoi(virtuallive);
    //stringstream output(virtuallive);
    int i;
    i = boost::lexical_cast<int>(virtuallive.c_str());
    //output >> i;
    cout << i << endl;
    return 0;
}
How can i extract the 1234 from the string without knowing the position of the last non numeric character in C++?
You can't. But the position is not hard to find:
auto last_non_numeric = input.find_last_not_of("1234567890");
char* endp = &input[0];
if (last_non_numeric != std::string::npos)
    endp += last_non_numeric + 1;
if (!*endp) { /* FAILURE, no number on the end */ }
auto i = strtol(endp, &endp, 10);
if (*endp) { /* weird FAILURE, maybe the number was really HUGE and couldn't convert */ }
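As an alternative sketch (my own, not from the answers above), std::regex can pull the trailing digits off in one step:

```cpp
#include <optional>
#include <regex>
#include <string>

// Extract a trailing run of digits from a string, if present.
std::optional<int> trailing_int(const std::string& s) {
    static const std::regex re(R"((\d+)$)"); // digits anchored at the end of the string
    std::smatch m;
    if (std::regex_search(s, m, re))
        return std::stoi(m[1].str());
    return std::nullopt;
}
```

For a signed value, the pattern could be extended to `(-?\d+)$`.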
Another possibility would be to put the string into a stringstream, then read the number from the stream (after imbuing the stream with a locale that classifies everything except digits as white space).
// First the desired facet:
struct digits_only: std::ctype<char> {
    digits_only(): std::ctype<char>(get_table()) {}
    static std::ctype_base::mask const* get_table() {
        // everything is white-space:
        static std::vector<std::ctype_base::mask>
            rc(std::ctype<char>::table_size, std::ctype_base::space);
        // except digits, which are digits (fill up to and including '9'):
        std::fill(&rc['0'], &rc['9'] + 1, std::ctype_base::digit);
        // and '.', which we'll call punctuation:
        rc['.'] = std::ctype_base::punct;
        return &rc[0];
    }
};
Then the code to read the data:
std::istringstream virtuallive("Images1234");
virtuallive.imbue(std::locale(std::locale(), new digits_only));
int number;
// Since we classify the letters as white space, the stream will ignore them.
// We can just read the number as if nothing else were there:
virtuallive >> number;
This technique is useful primarily when the stream contains a substantial amount of data, and you want all the data in that stream to be interpreted in the same way (e.g., only read numbers, regardless of what else it might contain).

UTF-16 to UTF-8 using ICU library

I wanted to convert UTF-16 strings to UTF-8. I came across the ICU library from Unicode. I am having problems doing the conversion, as ICU's default is UTF-16.
I have tried using a converter:
UErrorCode myError = U_ZERO_ERROR;
UConverter *conv = ucnv_open("UTF-8", &myError);
int32_t bytes = ucnv_fromUChars(conv, target, 0, (UChar*)source, numread, &myError);
char *targetLimit = target + reqdLen;
const UChar *sourceLimit = mySrc + numread;
ucnv_fromUnicode(conv, &target, targetLimit, &mySrc, sourceLimit, NULL, TRUE, &myError);
I get bytes as -(big random number) and garbage at the original target location.
What am I missing?
It's a best practice to check for errors after calls that take a UErrorCode parameter. I would start there.
Something like...
if (U_FAILURE(myError))
{
    std::cout << "error: " << myError << ":" << u_errorName(myError) << std::endl;
}
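For reference, the transformation itself is small enough to hand-roll. This sketch (my own, independent of ICU) converts a std::u16string to UTF-8, including surrogate-pair handling, and can help sanity-check what the ICU calls should produce:

```cpp
#include <stdexcept>
#include <string>

// Convert UTF-16 (as std::u16string) to UTF-8, handling surrogate pairs.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) { // high surrogate: combine with the next unit
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("lone high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[++i] - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("lone low surrogate");
        }
        // emit 1-4 UTF-8 bytes depending on the code point's magnitude
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

In production code, ICU (or another vetted library) is still preferable, since it also handles validation policies and error callbacks.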

Trouble trying to output file using vtkOBJWriter

I am trying to use vtkOBJWriter from David Doria to convert a .vtk file to a .obj file. I git-cloned https://github.com/daviddoria/vtkOBJWriter, added a build directory for the CMake build and make, and altered the file vtkOBJWriterExample.cxx to:
#include <vtkSmartPointer.h>
#include <vtkPolyData.h>
#include <vtkSphereSource.h>
#include <vtkPolyDataReader.h>
#include "vtkOBJWriter.h"
int main (int argc, char *argv[])
{
    vtkSmartPointer<vtkPolyData> input;
    std::string outputFilename;

    // Verify command line arguments
    if(argc > 1) // Use the command line arguments
    {
        if(argc != 3)
        {
            std::cout << "Required arguments: InputFilename.vtp OutputFilename.obj" << std::endl;
            return EXIT_FAILURE;
        }
        vtkSmartPointer<vtkPolyDataReader> reader =
            vtkSmartPointer<vtkPolyDataReader>::New();
        reader->SetFileName(argv[1]);
        reader->Update();
        input = reader->GetOutput();
        outputFilename = argv[2];
    }
    else
    {
        outputFilename = "output.obj";
        vtkSmartPointer<vtkSphereSource> sphereSource =
            vtkSmartPointer<vtkSphereSource>::New();
        sphereSource->Update();
        input->ShallowCopy(sphereSource->GetOutput());
    }

    vtkSmartPointer<vtkOBJWriter> writer =
        vtkSmartPointer<vtkOBJWriter>::New();
    writer->SetInput(input);
    writer->SetFileName(outputFilename.c_str());
    writer->Update();
    return EXIT_SUCCESS;
}
to reflect that I am using VTK 5.8.0. When I try sudo ./vtkOBJWriterExample trytry1.vtk Documents/comeOn.obj, no output file is made (I don't see it in the appropriate directory). I also tried it with trytry1.vtp, and it didn't seem to work. My .vtk file format is:
# vtk DataFile Version 3.0
vtk output
ASCII
DATASET POLYDATA
FIELD FieldData 3
group_id 1 1 int
0
base_index 1 3 int
0 0 0
avtOriginalBounds 1 6 double
-10 10 -10 10 -10 10
POINTS 14387 float
-5.10204 -2.65306 -9.69246 -5.10204 -2.75294 -9.59184 -5.37199 -2.65306 -9.59184
...
POLYGONS 28256 113024
3 0 1 2
...
POINT_DATA 14387
SCALARS hardyglobal float
LOOKUP_TABLE default
3.4926 3.4926 3.4926 3.4926 3.4926 3.4926 3.4926 3.4926 3.4926
...
which doesn't seem to match the formatting of car.vtp in the data directory, but I thought I made the appropriate changes (using the formatting of vtkPolyDataReader.h instead of vtkXMLPolyDataReader.h). I am not sure why no file is being output.
I do not receive any error messages.
It was a directory problem (my command line arguments were pointing to the wrong directory). It should have been just ./vtkOBJWriterExample trytry1.vtk comeOn.obj
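A quick way to catch this class of problem early is to verify that the output path is actually writable before handing it to the writer. This is a generic sketch (my own, not part of vtkOBJWriter):

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Return true if `path` can be opened for writing.
// Opening in append mode avoids clobbering an existing file during the check.
bool check_writable(const std::string& path) {
    std::ofstream probe(path, std::ios::out | std::ios::app);
    if (!probe) {
        std::cerr << "Cannot open for writing: " << path << "\n";
        return false;
    }
    return true;
}
```

Calling this on the output filename before writer->Update() would have flagged the bad Documents/comeOn.obj path immediately.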

How to check whether a system is big endian or little endian?

How to check whether a system is big endian or little endian?
In C, C++
int n = 1;
// little endian if true
if(*(char *)&n == 1) {...}
See also: Perl version
In Python:
from sys import byteorder
print(byteorder)
# will print 'little' if little endian
Another C approach, using a union:
union {
    int i;
    char c[sizeof(int)];
} x;
x.i = 1;
if(x.c[0] == 1)
    printf("little-endian\n");
else
    printf("big-endian\n");
It is the same logic that belwood used.
A one-liner with Perl (which should be installed by default on almost all systems):
perl -e 'use Config; print $Config{byteorder}'
If the output starts with a 1 (least-significant byte), it's a little-endian system. If the output starts with a higher digit (most-significant byte), it's a big-endian system. See documentation of the Config module.
In C++20 use std::endian:
#include <bit>
#include <iostream>

int main() {
    if constexpr (std::endian::native == std::endian::little)
        std::cout << "little-endian";
    else if constexpr (std::endian::native == std::endian::big)
        std::cout << "big-endian";
    else
        std::cout << "mixed-endian";
}
If you are using .NET: Check the value of BitConverter.IsLittleEndian.
In Rust (no crates or use statements required)
In a function body:
if cfg!(target_endian = "big") {
    println!("Big endian");
} else {
    println!("Little endian");
}
Outside a function body:
#[cfg(target_endian = "big")]
fn print_endian() {
    println!("Big endian")
}

#[cfg(target_endian = "little")]
fn print_endian() {
    println!("Little endian")
}
This is what the byteorder crate does internally: https://docs.rs/byteorder/1.3.2/src/byteorder/lib.rs.html#1877
In Powershell
[System.BitConverter]::IsLittleEndian
In Linux,
static union { char c[4]; unsigned long mylong; } endian_test = { { 'l', '?', '?', 'b' } };
#define ENDIANNESS ((char)endian_test.mylong)
if (ENDIANNESS == 'l') /* little endian */
if (ENDIANNESS == 'b') /* big endian */
A C++ solution:
namespace sys {
    const unsigned one = 1U;

    inline bool little_endian()
    {
        // the lowest-addressed byte of `one` holds the 1 only on little-endian machines
        return *reinterpret_cast<const char*>(&one) == 1;
    }

    inline bool big_endian()
    {
        return !little_endian();
    }
} // sys

int main()
{
    if(sys::little_endian())
        std::cout << "little";
}
In Rust (byteorder crate required):
use std::any::TypeId;
let is_little_endian = TypeId::of::<byteorder::NativeEndian>() == TypeId::of::<byteorder::LittleEndian>();
Using a macro:
const int isBigEnd = 1;
#define is_bigendian() ((*(char*)&isBigEnd) == 0)
In C
#include <stdio.h>

/* function to show bytes in memory, from location start to start+n */
void show_mem_rep(char *start, int n)
{
    int i;
    for (i = 0; i < n; i++)
        printf("%.2x ", (unsigned char)start[i]); /* cast so bytes >= 0x80 don't sign-extend */
    printf("\n");
}

/* Main function to call above function for 0x01234567 */
int main()
{
    int i = 0x01234567;
    show_mem_rep((char *)&i, sizeof(i));
    return 0;
}
When the above program is run on a little endian machine, it gives "67 45 23 01" as output, while on a big endian machine it gives "01 23 45 67".
A compilable version of the top answer for n00bs:
#include <stdio.h>

int main() {
    int n = 1;
    // little endian if true
    if(*(char *)&n == 1) {
        printf("Little endian\n");
    } else {
        printf("Big endian\n");
    }
}
Stick that in check-endianness.c and compile and run:
$ gcc -o check-endianness check-endianness.c
$ ./check-endianness
This whole command is a copy/pasteable bash script you can paste into your terminal:
cat << EOF > check-endianness.c
#include <stdio.h>

int main() {
    int n = 1;
    // little endian if true
    if(*(char *)&n == 1) {
        printf("Little endian\n");
    } else {
        printf("Big endian\n");
    }
}
EOF
gcc -o check-endianness check-endianness.c \
  && ./check-endianness \
  && rm check-endianness check-endianness.c
The code is in a gist here if you prefer. There is also a bash command that you can run that will generate, compile, and clean up after itself.
In Nim,
echo cpuEndian
It is exported from the system module.
In bash (from How to tell if a Linux system is big endian or little endian?):
endian=`echo -n "I" | od -to2 | head -n1 | cut -f2 -d" " | cut -c6`
if [ "$endian" == "1" ]; then
    echo "little-endian"
else
    echo "big-endian"
fi
C logic to check whether your processor follows little endian or big endian:
unsigned int i = 12345;
char *c = (char *)&i; // typecast int* to char* so that it points to the first byte of the int
if(*c != 0){ // if *c is 0, the high-order byte comes first: big endian; otherwise little endian
    printf("Little endian");
}
else{
    printf("Big endian");
}
Hope this helps. This was one of the questions asked in my interview for an embedded software engineer role.
All the answers using a program to find endianness at runtime are wrong! Whether a machine is big endian or little endian is hidden from the programmer by the compiler. On a big-endian machine the typecast will again return 1, because the compiler knows the machine is big endian and the cast will fetch the higher memory address. The only way to find the endianness is to fetch the system's configuration or an environment variable, similar to some of the answers above, like the one-liner Perl answer.
