Monday, May 21, 2012

Principles of Parse Trees

One of the more frequent topics that comes up when people start moving beyond beginner's q is functional queries, which are required to parameterize a query by column name, and are often useful for various other reasons as well.

One of trickiest parts of functional queries is the parse trees that appear in their second, third, and fourth arguments (henceforth c, b, and a). Here are a few rules that should summarize most of what's important to know about them:

  • c is a general list, each element of which is a parse tree. Its degenerate form (no conditions) is (), the empty general list.
  • c appears in the results of parse with an extra level of enlist applied to it; this is only necessary when sending the parse tree back to eval (as opposed to using it to write a direct invocation of ?).
  • b and a are both dictionaries with symbol keys and parse tree values. The degenerate form of b (no grouping) is 0b; the degenerate form of a (select all columns as-is) is ().
  • Symbol literals are enlisted.
  • Names (parameters, local or global variables, table columns, functions, etc.) become symbols.
  • Functions become sexps, like in LISP.
  • Adverbs are (technically) monadic functions which take dyadic verbs as arguments and yield dyadic verbs as their return values.
  • Global constants can be referenced by name or by value.
  • Anything which can be computed outside the query framework, may be.

Some elucidation on the last few points:

Adverbs will show up in the output of parse as ((adverb;`verb);`x;`y):

q)unshow parse"select f'[a;b]from t"
(?;`t;();0b;(,`b)!,((';`f);`a;`b))

While this will work fine, simply including the modified verb directly in the tree by value is also legal:

q)?[t;();0b;(enlist`b)!enlist(f';`a;`b)]

Built-in functions from the .q namespace will be included by value in parse output:

q)unshow parse"select dev a from t"
(?;`t;();0b;(,`a)!,(k){sqrt var x};`a))

These should always be replaced by their names in code:

q)?[t;();0b;(enlist`a)!enlist(var;`a)]

Finally, fully parsed code may be mixed freely with expressions which generate results which are otherwise acceptable:

q)unshow parse"select from t1 uj t2 where a in(exec a from t3)"
(?;(k){$[()~x;y;98h=@x;x,(!+x:.Q.ff[x;y])#.Q.ff[y;x];lj[(?(!x),!y)#x]y]};`t1;`t2);,,(in;`a;(?;`t3;();();,`a));0b;())

can be written as

q)?[t1 uj t2;enlist(in;`a;exec a from t3);0b;()]

Next time, I'll discuss how to write utility functions to make advanced constructs like multi-column fby easy to write.

Labels:

0 Comments:

Post a Comment

Subscribe to Post Comments [Atom]

<< Home