REDCap programming language for Data Quality checks - redcap

What is the programming language in REDCap for data quality checks and branching logic.
So far it's a bit hard to find the correct operators by trial and error.
It would be nice to know where to look for.
yesterday I programmed a data quality check which reads
([variable1] <> "" && [variable2] = "1")
It says that variable 1 should not be empty and variable 2 is also marked.
The code works, but it is though to find out how if I don't know the underlying language and the valid operators

Information on the syntax and available operators and functions for data quality rule logic, branching logic, report filter logic, calculation expressions within REDCap is documented within REDCap itself on the "Help & FAQ" page. You can access that page either using the link towards the bottom of the left-hand menu column of a project page, or from the top nav bar of a non-project page (like My Projects).
In your check you should use and rather than &&:
[variable1]<>'' and [variable2]='1'

Related

Internationalisation - displaying gendered adjectives

I'm currently working on an internationalisation project for a large web application - initially we're just implementing French but more languages will follow in time. One of the issues we've come across is how to display adjectives.
Let's take "Active" as an example. When we received translations back from the company we're using, they returned "Actif(ve)", as English "Active" translates to masculine "Actif" or feminine "Active". We're unsure of how to display this, and wondered if there are any well established conventions in the web development world.
As far as I see it there are three possible scenarios:
We know at development time which noun a given adjective is referring to. In this case we can determine and use the correct gender.
We're referring to a user, either directly ("you") or in the third person. Short of making every user have a gender, I don't see a better approach than displaying both, i.e. "Actif(ve)"
We are displaying the adjective in isolation, not knowing which noun it's referring to. For example in a table of data, some rows might be dealing with a masculine entity, some feminine.
Scenarios 2 and 3 seem to be the toughest ones. Does anyone have any experience handling these issues? Any tips would be appreciated!
This is complex, because we cannot imagine all the cases, and there is risk to go in "opinion based" answer, so I keep it short and generic.
Usually I prefer to give context in translation (for translator), e.g. providing template: _("active {user_name}" (so also the ordering will be correct if languages want different ordering).
Then you may need to change code and template into _("active {first_name_feminine}") and _("active {first_name_masculine}") (and possibly more for duals, trials, plurals, collectives, honorific, etc.). Note: check that the translator will not mangle the {} and the string inside. Usually you need specific export/import scripts. Or I add a note inside the string, and I quickly translate into English removing the note to the translator). Also this can be automated (be creative on using special Unicode characters which should not be used in normal text, to delimit such text).
But if you cannot know the gender, the Actif(ve) may be the polite version used in such language. You need a native speaker test, and changes back and forth.

How do I balance script-oriented OpenType features with other OpenType features using DirectWrite?

Full disclosure: I'm working on my libui GUI framework's text API. This wraps DirectWrite on Windows, Core Text on OS X, and Pango (which uses HarfBuzz for OpenType shaping) on other Unixes. One of the text formatting attributes I want to specify is a collection of OpenType features to use, which all three provide; DirectWrite's is IDWriteTypography.
Now, when you draw some text with these libraries, by default you'll get a few useful OpenType features enabled, such as the standard ligatures (liga) like the f+i ligature. I thought this was font-specific, but it turns out this is specific to the script of the text being shaped. Microsoft provides guidelines for all the scripts supported by OpenType (under "Script-specific Development"), and I can see rather complex logic for doing it all in HarfBuzz itself to confirm it.
On Core Text and Pango, if I enable other attributes, they'll be added on top of these defaults. But with DirectWrite, in particular IDWriteTextLayout::SetTypography(), doing so removes the defaults:
The program that produces this output is can be found here.
Obviously my first option would be to ask how to get the default features on DirectWrite. Someone did so already on this site, though, and the answer seems to be "no".
I am guessing that DirectWrite is allowing me to be in complete control of the list of features to apply to some text. This is nice, except that I can't do this with the other APIs unless I explicitly disable the default features somehow! Of course, I don't know if this list will ever change, so hardcoding it might not be the best idea.
Even if hardcoding is an option, I could just grab HarfBuzz's list for each script, but a) it's rather complicated b) there are multiple possible shapers for a script, depending on (I think) version compatibility (for instance, Myanmar).
So why not use HarfBuzz's lists to recreate the default list of features for DirectWrite anyway? It seems to want to be accurate to other shapers anyway, so this should work, right? Well I would need to do two things: figure out what script to use, and figure out which attributes to use on which characters for script where the position of a character in the word matters.
DirectWrite provides an interface IDWriteTextAnalyzer that provides facilities to perform shaping. I could use this, but it seems the script data is returned in a DWRITE_SCRIPT_ANALYSIS structure, and the description for the script ID says "The zero-based index representation of writing system script.".
This doesn't help, so I wrote a program to just dump the script numbers for text I type in. Running it on the input string
لللللللللللللاااااااااالا abcd محمد ابن بطوطة‎‎ Отложения датского яруса
yields the output
0 - 26 script 3 shapes 0
26 - 5 script 49 shapes 0
31 - 14 script 3 shapes 0
45 - 2 script 1 shapes 1
47 - 25 script 22 shapes 0
I cannot match these script numbers to anything in any of the Windows headers: if there is a defined number for Arabic, Latin, or Cyrillic in any API, they don't match these. And even if I did get a mapping between script and script number, that still doesn't give me the data to apply intra-word features.
What about Uniscribe? Well, the documentation for the equivalent SCRIPT_ANALYSIS type says that its script ID is an "[opaque] value" whose "value for this member is undefined and applications should not rely on its value being the same from one release to the next". And while I can get a language code to identify the script by, there's still no defined value other than LANG_ENGLISH for "Western" (Latin?) scripts. Are the DirectWrite values the same as the Uniscribe ones? And it seems like I can at least figure the initial and final states of words by looking at the fLinkBefore and fLinkAfter fields, but is this enough to properly apply attributes per-script?
HarfBuzz does have an experimental DirectWrite backend that isn't intended to be used by real programs; I'm not yet sure whether it has the same feature-clobbering I specified above. If I find out, I'll update this part here.
Finally, if I enter the following equivalent test case to the first one above in something like kaxaml:
<Page
xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml">
<Grid>
<FlowDocumentPageViewer>
<FlowDocument FontFamily="Constantia" FontSize="48">
<Paragraph>
afford afire aflight 1/4<LineBreak/>
<Run Typography.Fraction="1">afford afire aflight 1/4</Run>
</Paragraph>
</FlowDocument>
</FlowDocumentPageViewer>
</Grid>
</Page>
I see the ligatures being applied properly, even in the latter case:
(The fraction at the end is just to prove that that attribute is being applied.) If I assume XAML uses DirectWrite, then that proves my first option (simply overlaying my custom attributes on top of the defaults) should be possible... (I make this assumption based on the idea that XAML provides a strikingly similar API to Direct2D for drawing 2D graphics, and has a lot of holes filled in where I had to manually write a lot of glue code to do the same things with vanilla Direct2D, so I assume whatever is possible in XAML is possible with Direct2D, and by extension DirectWrite since they were technically introduced together...)
At this point I'm completely lost. I want to at least be predictable across platforms, and I'm not sure how programs are even supposed to, let alone going to, use OpenType features directly or not anyway. Am I making bad expectations of text layout APIs? Will I have to drop IDWriteTextLayout and do all the text shaping and layout myself if I want this?
Or do I have to drop vanilla Windows 7 support and upgrade to the Platform Update DirectWrite feature set? Or even Windows 7 entirely?
After some discussions with Peter Sikking and Ebrahim Byagowi, I went and debugged a more general-purpose program I built quickly to test things, and I figured out what's going on internally.
First, however, I will say this applies to Uniscribe and DirectWrite equally.
As it turns out, DirectWrite is always providing a set of default OpenType features, regardless of what feature set I use! The situation is that the list of default features provided differs depending on whether I load my own features or not, and depending on the shaping engine. For the latn script in horizontal writing mode and for English, this is done with the "generic engine".
If I don't provide any features, the generic engine will load script-specific features. For horizontal latn, this list is
locl
ccmp
rlig
rclt
calt
liga
clig
If I do provide features, the generic engine will use the same default list for all scripts:
locl
ccmp
rclt
rlig
mark
mkmk
dist
So I don't know what to do about this. I could probably just provide liga and a few others myself in libui code (marked as a HACK of course), but this is still weird. I'm not sure what the motivation is either. Either way, this explains the behavior I'm seeing.
Supposing your question in general is about programming or at least concerns programming, I will try and give answers to some of your interrogative sentences.
would I have to drop the use of IDWriteTextLayout entirely in my code if I want to be able to add typographical features on top of the defaults?
It depends. If an IDWriteTextLayout interface suits well your project tasks in all ways except ease of variation of DirectWrite default typographic features, learn what you should about typography and create an IDWriteTypography instance suitable for your needs. Developing a custom text layout for the program may require substantial time and effort, especially if the program is supposed to render bidirectional texts, complex scripts, inline objects, etc.
It may happen that the tasks of your project require to develop a text layout engine for reasons other than just controlling typographic features used in rendered text. For example, your manager/customer may ask for implementation of customized linebreaking opportunities or a glyph advance justification algorithm. In this scenario, you will implement an IDWriteTextAnalizer::GetGlyphs method. This method has parameters DWRITE_TYPOGRAPHIC_FEATURES ** features, const UINT32 * featureRangeLengths, UINT32 featureRanges, and this parameters enable you to supersede a set of "default" typography features for a range of the text to be rendered (see my answer to the other question What are the default typography settings used by IDWriteTextLayout?). Only affected features will be altered; the other features has their "default" values. Morever, if you omit this parameters in a GetGlyphs call for the next text range (for example, use values of NULL, NULL, 0), the features altered in the previous GetGlyphs call will not be altered by the call for this next range.
the documentation for the equivalent SCRIPT_ANALYSIS type says that its script ID is an "[opaque] value" whose "value for this member is undefined and applications should not rely on its value being the same from one release to the next". And while I can get a language code to identify the script by, there's still no defined value other than LANG_ENGLISH for "Western" (Latin?) scripts.
Strictly speaking, this is not an interrogative statement, but I guess you are dissatisfied with how these Unicode script IDs are defined and how one can use the API with so vaguely defined structures and constants.
It may be off topic, but I risk to hypothesize on the origin of the "Unicode script ID" values. As of 2010-07-17, the Unicode, Inc. published The Unicode 6.0 version. The standard contained the document
http://www.unicode.org/Public/6.0.0/ucd/PropertyValueAliases.txt, with a section containing a list of scripts. The list went so:
# Script (sc)
sc ; Arab ; Arabic
sc ; Armi ; Imperial_Aramaic
etc.
The Arabic script is #1, the Cyrillic script is #20, the Latin script is #47 in this list. Furthermore, elsewhere I saw this list starting with scripts Common and Inherited. It places the Arabic script to the 3rd, the Cyrillic to the 22nd, and the Latin to the 49th place. These ordinals are familiar to you, aren't they?
Fortunately, we need not rely on the "Unicode script ID" values; we need script properties, not script IDs or abbreviations. The API is self-consistent in that it gives actual script properties for the text range, when we pass to a GetScriptProperties method the number derived from an AnalyzeScript call.

Internalizating content heavy pages into messagefiles seem cumbersome in play2?

Internalization in Play2 can be done with Message.get("home.title") and language files. What about when you internalizate a page full of textual content and not just one specific header or link?
For example doing Messagefile for a long page representing e.g. product info:
_First header_
Some paragraphs of text
...
_Tenth header_
Tenth paragraph and more text*
Messagefile
a)
product.info = "<many paragraphs of text including headers>"
or splitting one page into html elements
b)
product.info.h1 = "<first header>"
product.info.p1 = "<first para>"
product.info.p2 = "<2nd para>"
For me both solutions doesn't sound right. In first having a vast value for a single key seems bad convention and in latter separating a single page into dozens of keys doesn't sound good either.
Big websites often follow the convention www.site.com/en-us/product/1 of having the language in the URL. So the question is, how do i do in this way and is doing in this way a better way at all? I could easily end up not just translating to dozen languages but doing also dozen times layout changes.
I could use global codesnippets using Messagefile for elements that have a little text and doesn't change often e.g. navigation /view/global/header/somenavbar.scala.html but then i end up only having a complex folder structure.
Another way, a best practise, in Play 2 for internalization than messagefile?
Take a look to the Joscha Feth's solution in play_authenticate Java sample.
There are templates for emails in 3 languages for email confirmation, password reseting etc.
Template for each 'type' of email && each language is kept in single file ie:
_password_reset_en.scala.html
_password_reset_de.scala.html
_password_reset_pl.scala.html
_verify_email_en... etc
And for each 'type' there is an 'parent' template, which contains a condition (common Scala's match check the Tags section of template doc) which returns rendered view depending on detected language:
password_reset.scala.html
Finally, yes, at the beginning I also thought that some kind of madness, but believe me, that technique can be useful. There's field for further improvements I think. Maybe it would be better to move the language conditioning to the controller, hm I think that depends on many factors and it will be great if you'll find a time to investigate this topic.

Abstract testing of GUIs

In general how does one test a various parts of a GUI? What are good practices? (Yes I am being overly general here).
Let take for Notepad's Find dialog box:
Notepad's Find dialog box http://img697.imageshack.us/img697/5483/imgp.png
What are some things that can be tested? How does one know its working correctly? What are edge cases to look out for? Stress tests?
Here.
I doubt any good generalization can be made about this - it always depends on the situation.
When someone asks for tests for GUI I always assume that that mean 'this part of application that is accessible via this GUI'. Otherwise it would mean testing the only the GUI without any logic hooked. Dunno why no one never actually asked for testing if the events are fired when button is pressed or is displayed window acquiring focus.
Anyway back to the question. First of all find out about equivalence classes, boundary conditions other testing techniques. Than try to apply it for given problem. Than try to be creative.
All those should be applied when creating following tests:
1) happy path tests - application acts right when given input is good
2) negative tests - application acts right when given input is bad
3) psychotic user behavior (I saw someone use this term, and I find it to be great) - that one user that has nothing better to do than break your application or is to stupid to actually know how bad and horrible things he is doing with your app.
After all this if all tests are passing and you can't figure out other, than you don't know is it working properly, but you can say that it passed all tests and it seems to be working correctly.
As for given GUI example.
1)
Is the application finding string that is in opened file?
Is the application finding character that is in opened file?
How is it reacting to reaching end of file during search?
Is it finding other appearances of given string/character or just one, when there are many of those appearances ?
Is it handling special search characters like * or ? correctly?
Is it searching in desired direction?
Is it 'Mach case ' option working properly?
When opening find setting some criteria, canceling search and launching it again - are search criteria back to default values? Or are they set as you left them when clicking Cancel?
2)
Is it informing user that no mach was found when trying to search for data that is not in opened file?
Is it reacting properly when trying to search down form end of file?
Is it reacting properly when trying to search up form beginning of file?
How search feature is reacting when no file is loaded? (in MS notepad it can be done, but in other editors you can launch editor without opening a file hence this test)
Can I mark both Up and Downs search direction?
3)
Is it working properly on 4GB file?
Can I load 4 GB string in 'Find What:' field and search for it?
Can I provide as input special characters by providing ASCII codes? (it was done like pressing Alt and number of character... or something like that)
Can I search for empty character (there was something like that in character table).
Can I search for characters like end of line or CarretReturn?
Will it search for characters form different languages? (Chinese, or other non-english alphabet characters)
Can I inject something like ') DROP ALL TABLES; (if that would be web based search).
Will I be able to launch proper event twice by really fast double click on search button? (easier on web apps)
With reasonable test suite you know it seems to work correctly.
I think it is better to separate out functional aspects and the usability aspects for the GUI testing.
Let us say in the above example take the use case of user entering some text and hitting the Find button. From the functional aspect I would say your tests should check whether this user action (event) calls the appropriate event handler methods. These can be automated if your code has good separation between the GUI display code and the
functional part.
Testing of usability aspect would involve checking things like whether the display occurs correctly in multiple platforms. I think this needs to be verified manually. But I think there are some tools that automate this kind of GUI testing as well but I've no experience with them.
It's difficult and error-prone to test finished UIs.
But if you are more interested form the programmer's perspective, please have a read of the paper The Humble Dialog. It presents an architecture for creating UIs whose functionality can be tested in code using standard testing frameworks.

IntelliSense rules for "get best match" during member selection

First of all, I finally made this a wiki, but I believe a "simple," straightforward answer is highly desirable, especially since the result would define a unified IDE behavior for everyone. More on the simple later.
In the past, I've blogged about what it means to have a well-behaved member selection drop down. If you haven't read it, do so now. :)
Visual Studio 2010 adds new features to the IntelliSense selection process that makes things ... not so easy. I believe there's great power we can harness from these features, but without a clean set of governing rules it's going to be very difficult.
Before you answer, remember this: The rules should allow someone "in tune" with the system to take advantage of the IntelliSense power with fewer keystrokes and less time than other solutions provide. This is not just about what you're used to - If you use a system as long and frequently as I do, relearning the patterns is trivial next to the time it saves to have a great algorithm behind it.
Here are the controllable axes:
Filtering: A "full" list contains every identifier or keyword allowed at the current location, without regard for the partially typed text the cursor is within.
Sorting: We (at least Visual Studio users) are used to the member selection drop down being sorted in alphabetical order. Other possibilities are partial sorting by some notion "relevance," etc.
Selection: Based on the currently typed text, we have the ability to select one item as the "best match." Selection states are:
No item selected
Passive selection: one item outlined, but pressing ., <space> or similar won't fill it in without using an arrow key to make it:
Active selection: one item selected, and unless Esc or an arrow key is pressed, a . or <space>, etc will auto-complete the item.
My previous set of rules restricted the manipulation to the selection axis. They took into account:
Characters typed as matched against list items with a StartsWith operation (prefix matching), with variants for whether the match was case-sensitive.
Previous completions that started with the same set of characters.
The following are additionally available and potentially useful, but not all have to be used:
CamelCase matches or underscore_separation ("us"): Long, expressive identifiers? Not a problem.
Substring matches: long prefixes hindering your selection speed? Not a problem.
Information available in the summary text, where available: I lean against this but I must admit it's come in handy in the Firefox address bar, so you never know.
The rules you write should address the axes (in bold above) in order. In my previous posts on the subject, the rules were very simple, but the additional considerations will likely make this a bit more complicated.
Filtering
Sorting
Selection
Just one addition or remark ...
IntelliSense should adapt to the context. In the case of Visual Studio, places where only subtypes of a specific type or interface may be used, the dropdown list should filter by these.
IList list = new (Drop down all the types implementing IList) - not all possible types!

Resources