How to upgrade PHP function parameters work with new Zend API? - php-extension

I am working on a php extension to upgrade it to PHP7, my question is about INTERNAL_FUNCTION_PARAMETERS. In the previous version it is defined as:
INTERNAL_FUNCTION_PARAMETERS int ht, zval *return_value, zval **return_value_ptr, zval *this_ptr, int return_value_used TSRMLS_DC
and in the new zend engine it is defined as:
INTERNAL_FUNCTION_PARAMETERS zend_execute_data *execute_data, zval *return_value
I have php function which returns an array and it looks like this:
`
PHP_FUNCTION( myFunc ){ zval* myArray;
array_init(myArray);
/////
zval_ptr_dtor( &return_value );
*return_value_ptr = myArray;
}
How should I get similar functionalities without hanvig the return_value_ptr?
Should I use #define RETURN_ARR(r)?, If so how does this affect the performance?

In PHP 7, most pointers to zvals (zval*) in PHP 5 have become plain zval structs (zval) - instead of passing around pointers to heap-allocated (emalloc) zvals, zvals themselves are copied. Because of this, in a sense, return_value is the new return_value_ptr, because there's one less level of indirection everywhere.
So, to go through line-by-line:
Line 1:
zval* myArray;
In PHP 7, you don't hold a pointer to a zval, you put it directly on the stack. No external allocation. So the first line of your function should be instead:
zval myArray;
Line 2:
array_init(myArray);
array_init needs a pointer to a zval (zval*), so this should be:
array_init(&myArray);
Line 4:
zval_ptr_dtor( &return_value );
Again, PHP 7 removes a level of indirection here. It would just be this now:
zval_dtor(return_value);
However, you don't need this line in PHP 7. The zval doesn't need deallocating (in fact you can't deallocate it), you can simply overwrite it. You would need to use zval_dtor(); if the zval contained a pointer to a string, array, or some other heap-allocated object, though. But in this case it's just a null, so you don't need to run it. Carrying on:
Line 5:
*return_value_ptr = myArray;
This should now be:
*return_value = myArray;
However, while you can directly overwrite return_value here, it is recommended to use the ZVAL_COPY_VALUE macro for this:
ZVAL_COPY_VALUE(return_value, &myArray);
Even better, you could use RETVAL_ZVAL which is a shortcut for setting the return value:
RETVAL_ZVAL(&myArray);
I should point out that you probably don't need the myArray zval in this case, as you can store the array in the return_value directly and save you having to copy it later. Another thing to bear in mind is that you should probably handle parameters. If you don't take any, zend_parse_parameters_none(); is sufficient.
I suggest reading the phpng upgrading guide and the internals upgrading guide.

Related

Casting to rvalue reference to "force" a move in a return value - clarification

Ok, I am starting to get the jist of rvalue references (I think). I have this code snippet that I was writing:
#include <iostream>
using namespace std;
std::string get_string()
{
std::string str{"here is your string\n"};
return std::move(str); // <----- cast here?
}
int main ()
{
std::string my_string = std::move(get_string()); // <----- or cast here?
std::cout << my_string;
return 0;
}
So I have a simple example where I have a function that returns a copy of a string. I have read that its bad (and got the core-dumps to prove it!) to return any reference to a local temp variable so I have discounted trying that.
In the assignment in main() I don't want to copy-construct that string I want to move-construct/assign the string to avoid copying the string too much.
Q1: I return a "copy" of the temp var in get_string() - but I have cast the return value to rvalue-red. Is that pointless or is that doing anything useful?
Q2: Assuming Q1's answer is I don't need to do that. Then am I moving the fresh copy of the temp variable into my_string, or am I moving directly the temp variable str into my_string.
Q3: what is the minimum number of copies that you need in order to get a string return value stored into an "external" (in my case in main()) variable, and how do you do that (if I am not already achieving it)?
I return a "copy" of the temp var in get_string() - but I have cast the return value to rvalue-red. Is that pointless or is that doing anything useful?
You don't have to use std::move in that situation, as local variables returned by value are "implicitly moved" for you. There's a special rule in the Standard for this. In this case, your move is pessimizing as it can prevent RVO (clang warns on this).
Q2: Assuming Q1's answer is I don't need to do that. Then am I moving the fresh copy of the temp variable into my_string, or am I moving directly the temp variable str into my_string.
You don't need to std::move the result of calling get_string(). get_string() is a prvalue, which means that the move constructor of my_string will automatically be called (pre-C++17). In C++17 and above, mandatory copy elision will ensure that no moves/copies happen (with prvalues).
Q3: what is the minimum number of copies that you need in order to get a string return value stored into an "external" (in my case in main()) variable, and how do you do that (if I am not already achieving it)?
Depends on the Standard and on whether or not RVO takes place. If RVO takes place, you will have 0 copies and 0 moves. If you're targeting C++17 and initializing from a prvalue, you are guaranteed to have 0 copies and 0 moves. If neither take place, you'll probably have a single move - I don't see why any copy should occur here.
You do not need to use std::move on the return value which is a local variable. The compiler does that for you:
If expression is an lvalue expression that is the (possibly parenthesized) name of an automatic storage duration object declared in the body or as a parameter of the innermost enclosing function or lambda expression, then overload resolution to select the constructor to use for initialization of the returned value is performed twice: first as if expression were an rvalue expression (thus it may select the move constructor), and if no suitable conversion is available, or if the type of the first parameter of the selected constructor is not an rvalue reference to the object's type (possibly cv-qualified), overload resolution is performed a second time, with expression considered as an lvalue (so it may select the copy constructor taking a reference to non-const).

How can flex return multiple terminals at one time

In order to make my question easy to understand I want to use the following example:
The following code is called nonblock do-loop in fortran language
DO 20 I=1, N ! line 1
DO 20 J=1, N ! line 2
! more codes
20 CONTINUE ! line 4
Pay attention that the label 20 at line 4 means the end of both the inner do-loop and the outer do-loop.
I want my flex program to parse the feature correctly: when flex reads the label 20, it will return ENDDO terminal twice.
Firstly, because I also use bison, so every time bison calls yylex() to get one terminal. If I can ask bison to get terminals from yylex() in some cases, and from another function in other cases, maybe I could solve this problem, however, I got no idea here then.
Of course there are some workarounds, for eample, I can use flex's start condition but I don't think it is a good solution. So I ask if there's any way to solve my question without a workaround?
It is easy enough to modify the lexical scanner produced by (f)lex to implement a token queue, but that is not necessarily the optimal solution. (See below for a better solution.) (Also, it is really not clear to me that for your particular problem, fabricating the extra token in the lexer is truly appropriate.)
The general approach is to insert code at the top of the yylex function, which you can do by placing the code immediately after the %% line and before the first rule. (The code must be indented so that it is not interpreted as a rule.) For non-reentrant scanners, this will typically involve the use of a local static variable to hold the queue. For a simple but dumb example, using the C API but compiling with C++ so as to have access to the C++ standard library:
%%
/* This code will be executed each time `yylex` is called, before
* any generated code. It may include declarations, even if compiled
* with C89.
*/
static std::deque<int> tokenq;
if (!tokenq.empty()) {
int token = tokenq.front();
tokenq.pop_front();
return token;
}
[[:digit:]]+ { /* match a number and return that many HELLO tokens */
int n = atoi(yytext);
for (int i = 0; i < n; ++i)
tokenq.push_back(HELLO);
}
The above code makes no attempt to provide a semantic value for the queued tokens; you could achieve that using something like a std::queue<std::pair<int, YYSTYPE>> for the token queue, but the fact that YYSTYPE is typically a union will make for some complications. Also, if that were the only reason to use the token queue, it is obvious that it could be replaced with a simple counter, which would be much more efficient. See, for example, this answer which does something vaguely similar to your question (and take note of the suggestions in Note 1 of that answer).
Better alternative: Use a push parser
Although the token queue solution is attractive and simple, it is rarely the best solution. In most cases, code will be clearer and easier to write if you request bison to produce a "push parser". With a push parser, the parser is called by the lexer every time a token is available. This makes it trivial to return multiple tokens from a lexer action; you just call the parser for each token. Similarly, if a rule doesn't produce any tokens, it simply fails to call the parser. In this model, the only lexer action which actually returns is the <<EOF>> rule, and it only does so after calling the parser with the END token to indicate that parsing is complete.
Unfortunately, the interface for push parsers is not only subject to change, as that manual link indicates; it is also very badly documented. So here is a simple but complete example which shows how it is done.
The push parser keeps its state in a yypstate structure, which needs to be passed to the parser on each call. Since the lexer is called only once for each input file, it is reasonable for the lexer to own that structure, which can be done as above with a local static variable [Note 1]: the parser state is initialized when yylex is called, and the EOF rule deletes the parser state in order to reclaim whatever memory it is using.
It is usually most convenient to build a reentrant push parser, which means that the parser does not rely on the global yylval variable [Note 2]. Instead, a pointer to the semantic value must be provided as an additional argument to yypush_parse. If your parser doesn't refer to the semantic value for the particular token type, you can provide NULL for this argument. Or, as in the code below, you can use a local semantic value variable in the lexer. It is not necessary that every call to the push parser provide the same pointer. In all, the changes to the scanner definition are minimal:
%%
/* Initialize a parser state object */
yypstate* pstate = yypstate_new();
/* A semantic value which can be sent to the parser on each call */
YYSTYPE yylval;
/* Some example scanner actions */
"keyword" { /* Simple keyword which just sends a value-less token */
yypush_parse(pstate, TK_KEYWORD, NULL); /* See Note 3 */
}
[[:digit:]]+ { /* Token with a semantic value */
yylval.num = atoi(yytext);
yypush_parse(pstate, TK_NUMBER, &yylval);
}
"dice-roll" { /* sends three random numbers */
for (int i = 0; i < 2; ++i) {
yylval.num = rand() % 6;
yypush_parse(pstate, TK_NUMBER, &yylval);
}
<<EOF>> { /* Obligatory EOF rule */
/* Send the parser the end token (0) */
int status = yypush_parse(pstate, 0, NULL);
/* Free the pstate */
yypstate_delete(pstate);
/* return the parser status; 0 is success */
return status;
}
In the parser, not much needs to be changed at all, other than adding the necessary declarations: [Note 4]
%define api.pure full
%define api.push-pull push
Notes
If you were building a reentrant lexer as well, you would use the extra data section of the lexer state object instead of static variables.
If you are using location objects in your parser to track source code locations, this also applies to yylloc.
The example code does not do a good job of detecting errors, since it doesn't check return codes from the calls to yypush_parse. One solution I commonly use is some variant on the macro SEND:
#define SEND(token) do { \
int status = yypush_parse(pstate, token, &yylval); \
if (status != YYPUSH_MORE) { \
yypstate_delete(pstate); \
return status; \
} \
} while (0)
It's also possible to use a goto to avoid the multiple instances of the yypstate_delete and return. YMMV.
You may have to modify the prototype of yyerror. If you are using locations and/or providing extra parameters to the push_parser, the location object and/or the extra parameters will also be present in the yyerror call. (The error string is always the last parameter.) For whatever reason, the parser state object is not provided to yyerror, which means that the yyerror function no longer has access to variables such as yych, which are now members of the yypstate structure rather than being global variables, so if you use these variables in your error reporting (which is not really recommended practice), then you will have to find an alternative solution.
Thanks to one of my friends, he provide a way to achieve
If I can ask bison to get terminals from yylex() in some cases, and from another function in other cases
In flex generated flex.cpp code, there is a macro
/* Default declaration of generated scanner - a define so the user can
* easily add parameters.
*/
#ifndef YY_DECL
#define YY_DECL_IS_OURS 1
extern int yylex (void);
#define YY_DECL int yylex (void)
#endif /* !YY_DECL */
so I can "rename" flex's yylex() function to another function like pure_yylex().
So my problem is solved by:
push all terminals I want to give bison to a global vector<int>
implement a yylex() function by myself, when bison call yylex(), this function will firstly try to get terminals from a that global vector<int>
if vector<int> is empty, yylex() calls pure_yylex(), and flex starts to work

Synonym function call for ::new((void*)p)T(value);

I am trying to write my own Allocator which can be used in STL. That far I could almost successfully finish but with one function I have my problem:
The Allocator used in STL should provide the function construct [example from a standard allocator using new & delete]:
// initialize elements of allocated storage p with value value
void construct (T* p, const T& value)
{
::new((void*)p)T(value);
}
I am stuck how to rewrite this using my own function which replaces the new keyword initializing it with the value.
This function construct is for example used in this code: fileLines.push_back( fileLine );
where
MyVector<MyString> fileLines;
MyString fileLine;
These are my typedefs where I use my own Allocator:
template <typename T> using MyVector = std::vector<T, Allocator<T>>;
using MyString = std::basic_string<char, std::char_traits<char>, Allocator<char>>;
I am confused because here is allocated a pointer to T which can be for example [when I understood it correctly] MySstring.
Do I understand it correctly that the pointer - allocated by new - will have 10 bytes, when value is 123456789 and then the provided value is copied to the new pointer?
My question:
How to rewrite the one line of code using my own function? For me the difficult point is how to get the length of value [which can have any type] in order I can correctly determinate the length of the allocated block and how to copy it in order it works for all possible types T?
The new operator in the construct function does not allocate anything at all, it's a placement new call, which takes an already allocated chunk of memory (which needs to have been previously allocated some way, and at least as large as sizeof(T)) and initializes it as a T object by calling the constructor of T, pretending that the memory pointed to by p is a T object.
::new int(7) calls the default new operator, gets some memory big enough for an int, and constructs an int with the value 7 in it.
::new(ptr) int(7) takes a void* called ptr, and constructs an int with the value 7 in it. This is called "placement new".
The important part is what is missing from the second paragraph. It does not create space, but rather constructs an object in some existing space.
It is like saying ptr->T::T() ptr->int::int(7) where we "call the constructorofinton the memory location ofptr, exceptptr->T::T()orptr->int::int(7)are not valid syntaxes. Placementnew` is the way to explicitly call a constructor in C++.
Similarly, ptr->~T() or ptr->~int() will call the destructor on the object located at ptr (however, there is no ~int so that is an error, unless int is a template or dependent type, in which case it is a pseudo-destructor and the call it ignored instead of generating an error).
It is very rare that you want to change construct from the default implementation in an allocator. You might do this if your allocator wants to track creation of objects and attach the information about their arguments, without intrusively modifying the constructor. But this is a corner case.

How to convert between shared_ptr<FILE> to FILE* in C++?

I am trying to use a FILE pointer multiple times through out my application
for this I though I create a function and pass the pointer through that. Basically I have this bit of code
FILE* fp;
_wfopen_s (&fp, L"ftest.txt", L"r");
_setmode (_fileno(fp), _O_U8TEXT);
wifstream file(fp);
which is repeated and now instead I want to have something like this:
wifstream file(SetFilePointer(L"ftest.txt",L"r"));
....
wofstream output(SetFilePointer(L"flist.txt",L"w"));
and for the function :
FILE* SetFilePointer(const wchar_t* filePath, const wchar_t * openMode)
{
shared_ptr<FILE> fp = make_shared<FILE>();
_wfopen_s (fp.get(), L"ftest.txt", L"r");
_setmode (_fileno(fp.get()), _O_U8TEXT);
return fp.get();
}
this doesn't simply work. I tried using &*fp instead of fp.get() but still no luck.
You aren't supposed to create FILE instances with new and destroy them with delete, like make_shared does. Instead, FILEs are created with fopen (or in this case, _wfopen_s) and destroyed with fclose. These functions do the allocating and deallocating internally using some unspecified means.
Note that _wfopen_s does not take a pointer but a pointer to pointer - it changes the pointer you gave it to point to the new FILE object it allocates. You cannot get the address of the pointer contained in shared_ptr to form a pointer-to-pointer to it, and this is a very good thing - it would horribly break the ownership semantics of shared_ptr and lead to memory leaks or worse.
However, you can use shared_ptr to manage arbitrary "handle"-like types, as it can take a custom deleter object or function:
FILE* tmp;
shared_ptr<FILE> fp;
if(_wfopen_s(&tmp, L"ftest.txt", L"r") == 0) {
// Note that we use the shared_ptr constructor, not make_shared
fp = shared_ptr<FILE>(tmp, std::fclose);
} else {
// Remember to handle errors somehow!
}
Please do take a look at the link #KerrekSB gave, it covers this same idea with more detail.

XercesDOMParser* and DOMDocument* going out of scope before a DOMElement*

Short Version: Is it safe for a XercesDOMParser* and DOMDocument* to go out of scope before a DOMElement* that they were used to create does?
Long version:
In the code snippet below I create a local XercesDOMParser* and DOMDocument* in order to get the root element of the document and store it in a member DOMElement* variable. The XercesDOMParser* and DOMDocument* both go out of scope at the end of the constructor, but the DOMElement* lives on as a member variable. Is this ok? So far it seems to work but I am nervous that I may have problems later.
JNIRequest::JNIRequest(JNIEnv *env, jobject obj, jstring input)
{
char *szInputXML = (char*) env->GetStringUTFChars(input, NULL);
XMLPlatformUtils::Initialize();
XercesDOMParser* pParser = new XercesDOMParser();
XMLByte* xmlByteInput = (XMLByte*) szInputXML;
xercesc::MemBufInputSource source(xmlByteInput, strlen(szInputXML), "BufferID");
pParser->parse(source);
DOMDocument* pDocument = pParser->getDocument();
/* This next variable is a DOMElement* */
this->_pRootElement = pDocument->getDocumentElement();
}
Your code snippet looks like it is creating some memory leaks. I'm afraid this is also the reason why the code seems to "work" for the moment.
In general the Xerces parser owns the document tree. Please take a look at AbstractDOMParser::adoptDocument() to transfer the ownership away from the parser. This means for your code example, if you would correctly release the parser, the document is also deleted making your pointer to the DOMElement invalid.
The solution would be to call adoptDocument() and save the pointer to the Document Element afterwards. Please note that you need to release the node tree (on closing the application?) and that the tree could consume a lot of memory depending on the size of the XML ...
Hope this helps

Resources