<h1>q-ist</h1>
<p>Aaron Davies</p>
<h2>Parsing Binary Files in q</h2>
<p>Posted 2023-05-16</p>
<p>
I gave a presentation on parsing binary files in q at the March 16, 2023 KX
Meetup; here are some relevant links:
</p>
<ul>
<li>
The presentation:
<a href="https://kx.com/videos/parsing-binary-files-in-q-with-aaron-davies/"
>https://kx.com/videos/parsing-binary-files-in-q-with-aaron-davies/</a
>
</li>
<li>
The slide deck:
<a
href="https://www.dropbox.com/s/r18c7j7zo58lfh9/Parsing%20Binary%20Files%20in%20q.pdf?dl=0"
>Parsing Binary Files in q.pdf</a
>
</li>
<li>
The code:
<a href="https://github.com/adavies42/qist/blob/master/lib/png.q"
>https://github.com/adavies42/qist/blob/master/lib/png.q</a
>
</li>
<li>
An example using the code:
<a href="https://github.com/adavies42/qist/blob/master/lib/mandel.k"
>https://github.com/adavies42/qist/blob/master/lib/mandel.k</a
>
</li>
</ul>
<h2>Riddle 4 Answer</h2>
<p>Posted 2016-12-30</p>
So apparently this is an annual series now…<br />
<br />
Anyway, <a href="http://www.q-ist.com/2016/05/riddle-4-whats-going-on-here.html#c3516173806306165868">Dan Nugent</a> and <a href="http://www.q-ist.com/2016/05/riddle-4-whats-going-on-here.html#c3872998969562467461">Akash</a> were both on the right track, but neither had it exactly right: when a <code>qsql</code> query inside a function is executed, relative names used in it resolve against globals in the <em>current</em> namespace, <em>not</em> the namespace that was in effect <em>when the function was created</em> (i.e. the one returned by running <code>{(get x). 3 0}</code> on the function).<br />
<br />
(Note that this actually applies to any type of global (e.g. an atom, a vector, etc.) referenced from a query inside a function, but for convenience, I’ll be writing this post assuming a function is what’s being referenced.)<br />
<br />
I will admit to making this a bit of a trick question, as the way I constructed the example was designed around the most common case, which is entirely consistent with Dan’s and Akash’s answers. Here's a snippet showing the full behavior:<br />
<blockquote>
<pre>% q
KDB+ 3.3 2015.09.02 Copyright (C) 1993-2015 Kx Systems
m32/ 16()core 8192MB adavies aaron-daviess-mac-pro.local 192.168.1.151 NONEXPIRE
q)f:{x+1}
q)\d .foo
q.foo)g:{select f a from x}
q.foo)\d .bar
q.bar).foo.g([]a:1 2 3)
{select f a from x}
'f
q.foo))\
q.bar)f:{x+2}
q.bar).foo.g([]a:1 2 3)
a
-
3
4
5
q.bar)</pre>
</blockquote>
Compare to this snippet, where I reference the function from <em>outside</em> the <code>qsql</code> query:<br />
<blockquote>
<pre>% q
KDB+ 3.3 2015.09.02 Copyright (C) 1993-2015 Kx Systems
m32/ 16()core 8192MB adavies aaron-daviess-mac-pro.local 192.168.1.151 NONEXPIRE
q)\d .foo
q.foo)f:{x+1}
q.foo)g:{f select a from x}
q.foo)\d .
q).foo.g([]a:1 2 3)
a
-
2
3
4
q)</pre>
</blockquote>
This can become problematic if you try to create a group of functions in a namespace, some of which reference each other in queries, and then call those functions from outside that namespace. I’ve found two solutions for this, both unfortunately rather inelegant.<br />
<ol>
<li>You can “copy” the function you need from the global space to a local variable by referencing it from outside any <code>qsql</code> queries:
<blockquote>
<pre>% q
KDB+ 3.3 2015.09.02 Copyright (C) 1993-2015 Kx Systems
m32/ 16()core 8192MB adavies aaron-daviess-mac-pro.local 192.168.1.151 NONEXPIRE
q)\d .foo
q.foo)f:{x+1}
q.foo)g:{f0:f;select f0 a from x}
q.foo)\d .
q).foo.g([]a:1 2 3)
a
-
2
3
4
q)</pre>
</blockquote>
</li>
<li>You can do the name resolution yourself, by leveraging the <code>get</code> results to automatically reference the correct namespace:
<blockquote>
<pre>% q
KDB+ 3.3 2015.09.02 Copyright (C) 1993-2015 Kx Systems
m32/ 16()core 8192MB adavies aaron-daviess-mac-pro.local 192.168.1.151 NONEXPIRE
q)\d .foo
q.foo)f:{x+1}
q.foo)g:{select((` sv`,((get .z.s). 3 0))`f)a from x}
q.foo)\d .
q).foo.g([]a:1 2 3)
a
-
2
3
4
q)</pre>
</blockquote>
This can be encapsulated in a utility function, but note that it must then itself be referenced by an absolute name, or the same problem will apply to it:<br />
<blockquote>
<pre>% q
KDB+ 3.3 2015.09.02 Copyright (C) 1993-2015 Kx Systems
m32/ 16()core 8192MB adavies aaron-daviess-mac-pro.local 192.168.1.151 NONEXPIRE
q)\d .util
q.util)r:{(` sv`,((get x). 3 0))y}
q.util)\d .foo
q.foo)f:{x+1}
q.foo)g:{select((.util.r .z.s)`f)a from x}
q.foo)\d .
q).foo.g([]a:1 2 3)
a
-
2
3
4
q)</pre>
</blockquote>
</li>
</ol>
<h2>Riddle 4: What’s Going On Here?</h2>
<p>Posted 2016-05-20</p>
What’s happening in this snippet, and why is it interesting?<br />
<blockquote>
<pre>% q
KDB+ 3.3 2016.03.14 Copyright (C) 1993-2016 Kx Systems
m32/ 2()core 2048MB adavies air.local 10.37.129.2 NONEXPIRE
q)f:{x+1}
q)\d .foo
q.foo)g:{select f a from x}
q.foo)\d .
q).foo.g([]a:1 2 3)
a
-
2
3
4
q)</pre>
</blockquote>
<h2>Riddle 3 Answer</h2>
<p>Posted 2016-05-20</p>
And it looks like I did it again—<a href="http://www.q-ist.com/2015/06/riddle-3-all-xand-not-any-x.html">Riddle 3</a> has been lacking an official answer for almost a year!<br/>
<a href="http://www.q-ist.com/2015/06/riddle-3-all-xand-not-any-x.html#c8128914030894500956">Ciaran Gorman’s answer</a> was correct—strings with leading and/or trailing space are exactly what I was thinking of, and the serialization technique he showed is how to deal with them.<br/>
The following function should work as a general solution:<br/>
<blockquote>
<pre>{0x01,($[.z.o like"s*";reverse;::]0x0 vs"i"$10+count x),0x000000f5,("x"$x),0x00}</pre>
</blockquote>
<h2>Hello from Montauk!</h2>
<p>Posted 2016-05-20</p>
I’m at <a href="https://kxcon2016.com/">KxCon 2016</a> in Montauk, and having lots of fun so far. Look me up if you’re here!
<h2>github.com/adavies42/qist</h2>
<p>Posted 2016-05-19</p>
q-ist is now on GitHub! I’ve started a new repository, <a href="https://github.com/adavies42/qist">https://github.com/adavies42/qist</a>. My old code from <code>contrib</code> is now available there, but more importantly, I received permission from work to release a collection of utilities and example code from my personal library. This includes a much fuller-featured version of my <code>wtf</code> function, my <code>awq</code> tool for using <code>q</code> as a text filter, dozens of miscellaneous utility functions, and more. I don’t have a <code>README</code> for the whole repository written yet, so to get you started, most of the interesting stuff is in <code><a href="https://github.com/adavies42/qist/tree/master/lib">lib</a></code>, except for <code>awq</code>, which is in <code><a href="https://github.com/adavies42/qist/tree/master/bin">bin</a></code>. Have fun exploring the repository, and feel free to comment or email me with any questions about the code.
<h2>A Combinatoric Combinator</h2>
<p>Posted 2015-10-16</p>
I wrote this while I was playing around with using q on a hobby project, and I thought I’d share it in case anyone else might find it useful.<br />
It takes a number <code>k</code>, a function <code>f</code>, and a list or dictionary <code>y</code> of count <code>n</code>, and runs <code>f</code> once for each of the <a href="http://en.wikipedia.org/wiki/Combination"><i>k</i>-combinations</a> of <code>y</code>. The result is returned as a dictionary with the function outputs as its values and its keys determined by the type of <code>y</code>: if <code>y</code> is a list, its keys are the subsets of <code>y</code> that produced the outputs; if <code>y</code> is a dictionary, its keys are the subsets of the keys of <code>y</code> that index the subsets of <code>y</code> that produced the outputs.<br />
<br />
<pre><code>eachc:{
c:(where reverse 0b vs)each c@:where((first x)=sum 0b vs)each c:til"j"$2 xexp count y;
(last x)peach$[99h=type y;(key each y)!y:y{((key x)y)#x}/:c;y!y:y@/:c]}
</code></pre>
<br />
Examples:<br />
<blockquote>
<pre>q)eachc[(3;sum)]til 5
0 1 2| 3
0 1 3| 4
0 2 3| 5
1 2 3| 6
0 1 4| 5
0 2 4| 6
1 2 4| 7
0 3 4| 7
1 3 4| 8
2 3 4| 9
q)eachc[(3;sum)]`a`b`c`d`e!til 5
a b c| 3
a b d| 4
a c d| 5
b c d| 6
a b e| 5
a c e| 6
b c e| 7
a d e| 7
b d e| 8
c d e| 9
q)
</pre>
</blockquote>
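<p>The bitmask idea behind <code>eachc</code> can be sketched on its own. This is only an illustration, not the function above: it uses base-2 <code>vs</code> on a fixed width of <code>n</code> digits instead of <code>0b vs</code>, and the names <code>n</code>, <code>k</code>, <code>m</code>, and <code>c</code> are made up for the example. Each integer below <code>2 xexp n</code> is read as a mask over <code>n</code> positions, and the masks with exactly <code>k</code> bits set are the <i>k</i>-combinations:</p>
<blockquote>
<pre>q)n:4;k:2
q)m:(n#2)vs/:til"j"$2 xexp n        / one n-digit binary mask per integer
q)c:where each m where k=sum each m / keep masks with k ones, convert to index lists
q)count c
6
q)all k=count each c
1b
q)</pre>
</blockquote>
<p>Indexing a list with each element of <code>c</code> (e.g. <code>y@/:c</code>) then yields the subsets themselves.</p>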
Notes and caveats:<br />
On big-endian machines (e.g. Sparc), the <code>reverse</code> will probably need to be removed.<br />
The size of the result set gets very big very quickly—an <code>n</code> of thirty is probably infeasible for most machines.<br />
I’ve written it to execute <code>f</code> on the combinations with <code>peach</code>, rather than <code>each</code>; this may or may not be appropriate, depending on the nature of any given <code>f</code> and <code>y</code>.
<h2>Riddle 3: not x~string`$x</h2>
<p>Posted 2015-06-08</p>
<p>Here’s an easy one:</p>
<p>When will <code>{(10h=type x)&amp;not x~string`$x}</code> return true (<code>1b</code>)?</p>
<p>Slightly harder:</p>
<p>In cases where this is problematic, what can be done about it?</p>
<h2>Name! That! Function! (Pivot Table Edition)</h2>
<p>Posted 2015-04-24</p>
What time is it, kids? That’s right, it’s time to play Name! That! Function!<br />
Seriously though, I have a small, useful (IMAO) function I’ve been entering freehand in the console for something like two years now.<br />
Normally, I’d put it in my personal library, but there’s a problem—<a href="https://twitter.com/secretGeek/status/7269997868">I can’t think of a good name for it</a>.<br />
Here’s the function: <code>{((union)over key each x)#/:x}</code>.<br />
And here it is in context, showing what it’s good for:<br />
<blockquote>
<pre>q)t:([]id:1 1 2 2 3;k:`a`b`b`a`b;v:1 2 3 4 5)
q)t
id k v
------
1 a 1
1 b 2
2 b 3
2 a 4
3 b 5
q)exec k!v by id:id from t
id|
--| --------
1 | `a`b!1 2
2 | `b`a!3 4
3 | (,`b)!,5
q){((union)over key each x)#/:x}exec k!v by id:id from t
id| a b
--| ---
1 | 1 2
2 | 4 3
3 | 5
q)</pre>
</blockquote>
So, there it is—an easy way to fix up ad-hoc pivots<a href="#fn1" id="fnr1"><sup>1</sup></a> when your data doesn’t have all keys present (and in the same order) on all ids.<br />
Anyone have any ideas what to call it?<br />
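<p>For anyone weighing up names, here is a sketch of the two steps the function performs, run by hand on a plain list of dictionaries (<code>d</code> is a made-up example, not data from above). The union of the keys fixes a common column order, and taking those keys from each dictionary reorders existing entries and fills missing ones with nulls:</p>
<blockquote>
<pre>q)d:(`a`b!1 2;`b`a!3 4;(enlist`b)!enlist 5)
q)(union)over key each d
`a`b
q)(`a`b#/:d)~([]a:1 4 0N;b:2 3 5)
1b
q)</pre>
</blockquote>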
<div class="footnotes">
<ol>
<li id="fn1">Note that for production pivots, particularly if they involve significant amounts of data, you should be using <a href="http://code.kx.com/wiki/Pivot#A_very_general_pivot_function.2C_and_an_example_usage">an optimized pivot function</a>.<a href="#fnr1">↩</a></li>
</ol>
</div>
<h2>Riddle 2 Answer</h2>
<p>Posted 2015-03-09</p>
<p>This is somewhat esoteric, and I wouldn’t be surprised if very few people had any idea what I was even asking. I discovered this largely by chance, while fiddling around with function bytecode, though I think it could be deduced from observation without that.</p>
<p>So, the answer:</p>
<p><code>::</code> is usually taught as a <i>sui generis</i> operator, called “global amend”, which has the specific behavior (when used as a verb inside a function) of setting a global variable (instead of the local one that <code>:</code> would set in the same place). No connection is typically drawn between it and any other operator (other than <code>:</code>).</p>
<p>However, I’m pretty sure this is inaccurate. While obviously I don’t know for certain, I strongly suspect that there is no code anywhere in the q binary saying that <code>::</code> is defined as “global amend”. Rather, it is a specific case of the dyadic “<code><i>f</i>:</code>” pattern, where <code>f</code> is some dyadic function—e.g. dyadic <code>+:</code>, <code>-:</code>, <code>*:</code>, etc.</p>
<p>These all have the same behavior—<code>x <i>f</i>:y</code> is defined as <code>x:x <i>f</i> y</code>.</p>
<p>Additionally, when used inside functions on variables that have not been identified by the compiler as locals, they modify (and if necessary, create), global variables.</p>
<blockquote><pre>q){a:1;a+:1;a}[]
2
q)a
'a
q){a+:1}[]
q)a
1
q)</pre></blockquote>
<p>It follows that if <code><i>f</i></code> is <code>:</code>, then the operation involved is assignment, and so <code>x</code> gets <code>y</code> assigned to it, <em>as a global variable if not identified as a local variable</em>.</p>
<p>In fact, this can be seen in the same way:</p>
<blockquote><pre>q){a:1;a::2;a}[]
2
q)a
'a
q){a::2}[]
q)a
2
q)</pre></blockquote>
<p>Thus arises “global amend”.</p>
<p>If anything, the “create view” sense of <code>::</code> must be the special case, as ordinarily, dyadic <code><i>f</i>:</code> verbs behave identically inside and outside functions.</p>
<h2>Riddle 2: “:: is not a special case”</h2>
<p>Posted 2015-01-19</p>
<p>(I suppose it’s a good thing I only said I’d be posting riddles “occasionally”, as it’s now almost two years since I last posted one.)</p>
<p>Referring specifically to the “<a href="http://code.kx.com/wiki/Reference/ColonColon#global_amend">global amend</a>” sense, I make the claim that <code>::</code> is not a special case.</p>
<p>What do I mean by this?</p>
<h2>Riddle 1 Answer</h2>
<p>Posted 2013-10-09</p>
<p>I appear to have left <a href="http://www.q-ist.com/2013/03/riddle-1-all-x-any-x.html">Riddle 1</a> sitting out there without an official answer for almost seven months now. Sorry about that.</p>
<p><a href="http://www.q-ist.com/2013/03/riddle-1-all-x-any-x.html#c7857547623827540463">The answer given by Peter Byrne</a> was valid, and essentially the one I was thinking of: while his example dealt with the untyped empty list <code>()</code>, I had the typed empty list <code>`boolean$()</code> in mind.</p>
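<p>Both empty-list cases are easy to check at the console. This is a quick sketch using only built-ins:</p>
<blockquote>
<pre>q){(all x)and not any x}()
1b
q){(all x)and not any x}`boolean$()
1b
q)all`boolean$()
1b
q)any`boolean$()
0b
q)</pre>
</blockquote>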
<p>The insight here is that <code>all</code> and <code>any</code> are forms of <code>min</code> and <code>max</code>, respectively; and that <code>min x,y</code>, the min of the concatenation of two lists, is equal to <code>min(min x;min y)</code>, the min of their separate <code>min</code>s (and <i>mutatis mutandis</i> for <code>max</code>). For this to work consistently for empty lists, the <code>min</code> of an empty list must be the maximum possible value for that data type (and <i>mutatis mutandis</i> for <code>max</code>).</p>
<h2>Quick Tip: Intra-Statement Breakpoints</h2>
<p>Posted 2013-10-09</p>
A statement referencing a non-existent variable is a common way to add a breakpoint to a <code>q</code> function.
<pre>q)f:{x:x+1;break;x+2}</pre>
While this is handy, it only lets you break between statements. A simple extension lets you break <em>within</em> statements, inspecting values at arbitrary points in the code, and continuing with execution once you’re done:
<pre>q)f:{x:x+1;{break;x}x+2}</pre>
Now, the break will occur after the portion of the statement to the right of the break function has executed, and <code>x</code> within the break function will have the value returned by that code. Since the break statement itself has no effects, and <code>x</code> contains the value returned by the code executed so far, typing <code>:</code> to continue will allow the rest of the break function to execute, returning <code>x</code> leftwards and allowing the rest of the statement to execute as if nothing had interrupted it.
<br/><br/>
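<p>As a rough sketch of the same wrapper sitting inside a column expression (the table <code>t</code> and column names are invented for illustration, and <code>break</code> is assumed to be undefined in the session), the query below should suspend with <code>x</code> bound to the value of <code>a*2</code>, which can then be inspected at the debug prompt before continuing:</p>
<blockquote>
<pre>q)t:([]a:1 2 3)
q)select b:{break;x}a*2 from t</pre>
</blockquote>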
Note that this is entirely legal inside <code>qsql</code> queries; I’ve often found it of particular utility there, since you can’t create new local variables inside a query. Unfortunately no specific examples come to mind at the moment; I’ll try to post one later to make it clearer how this technique works.
<h2>Parallel xasc</h2>
<p>Posted 2013-05-30</p>
Sorting a table can be divided into two parts: determining the new order for the rows, and applying that ordering to the columns. While the former can’t be parallelized in q, the latter can. I don’t have any hard numbers handy at the moment, but with large tables and under the right conditions, I’ve seen noticeable speedups.<br />
<br />
Note, BTW, that you can’t (and shouldn’t) write to disk from inside a <code>peach</code>, so this is only applicable to an ordinary in-memory table sort, not the <a href="http://code.kx.com/wiki/Reference/xasc#Sorting_data_on_disk">on-disk variety</a> (<code>`c xasc`<span style="color: red; font-weight: bold;">:</span>t</code>).<br />
<br />
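<p>Before the general versions, here is a minimal sketch of the two-step split on a single sort column (the table <code>t</code> and the name <code>i</code> are illustrative only). Step one computes the permutation; step two applies it to each column independently, which is the part where <code>each</code> can be swapped for <code>peach</code> (which only runs in parallel when q is started with secondary threads via <code>-s</code>):</p>
<blockquote>
<pre>q)t:([]c:3 1 2;v:`x`y`z)
q)i:iasc t`c                 / step 1: the new row order
q)i
1 2 0
q)flip{y x}[i]each flip t    / step 2: reindex every column
c v
---
1 y
2 z
3 x
q)</pre>
</blockquote>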
<blockquote>
<pre>q)pxasc :{(count keys y)!flip{y x}[ iasc(raze x)#0!y]peach flip 0!y}
q)pxdesc:{(count keys y)!flip{y x}[idesc(raze x)#0!y]peach flip 0!y}
</pre>
</blockquote>
<h2>My kdb+ User Meeting Presentation</h2>
<p>Posted 2013-03-11</p>
Here are the slides from the presentation I gave today at the kdb+ user meeting at BAML, in <a href="http://www.apple.com/iwork/keynote/">keynote</a>, <a href="http://www.foxitsoftware.com/Secure_PDF_Reader">pdf</a>, and <a href="http://www.google.com/search?q=powerpoint+considered+harmful">powerpoint</a> formats:<br />
<ul>
<li><a href="http://db.tt/NXplXjDM">(dis)functional select and de-queueing bugs.key</a></li>
<li><a href="http://db.tt/iqE3t9QI">(dis)functional select and de-queueing bugs.pdf</a></li>
<li><a href="http://db.tt/FUbm8tYY">(dis)functional select and de-queueing bugs.ppt</a></li>
</ul>
UPDATE: and here's the code as a loadable <code>.q</code> file:<br />
<ul>
<li><a href="https://www.dropbox.com/s/d3a7x5ux3e2diee/pres.q?dl=1">pres.q</a></li>
</ul>
<h2>Riddle 1: (all x)and not any x</h2>
<p>Posted 2013-03-11</p>
<p>When will <code>{(all x)and not any x}</code> return true (<code>1b</code>)?</p>
<h2>Riddle 0 Answer</h2>
<p>Posted 2012-12-11</p>
<p>As several people guessed in the comments, the answer I had in mind was what I think of as “compressed” matrices—general lists where some entries are lists of the same length and others are atoms.</p>
<blockquote><pre>q)(1f;`a`b)
1f
`a`b
q)flip flip(1f;`a`b)
1 1
a b
q)
</pre></blockquote>
<p>This is, as far as I can tell, directly related to atomic extension (<code>x f'y</code> behaves identically for vector/vector, atom/vector, and vector/atom) and similar concepts, such as the ability to use an atom for a constant column in a table literal (<code>([]x:1 2 3;y:4)</code>).</p>
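<p>Both behaviours mentioned above can be seen directly; this is just a quick console sketch:</p>
<blockquote><pre>
q)1 2 3+'10
11 12 13
q)10+'1 2 3
11 12 13
q)([]x:1 2 3;y:4)
x y
---
1 4
2 4
3 4
q)
</pre></blockquote>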
<p>(My original intention was to post these more or less weekly; hopefully I’ll be able to stick a bit closer to that in future.)</p>
<h2>Riddle 0: not x~flip flip x</h2>
<p>Posted 2012-11-04</p>
<p>I’m going to start occasionally posting <code>q</code> riddles; this is the first.</p>
<p>When will <code>{x~flip flip x}</code> return false (<code>0b</code>)?</p>
<h2>Functional Query Functions</h2>
<p>Posted 2012-10-15</p>
So, <a href="http://www.q-ist.com/2012/05/principles-of-parse-trees.html">five months ago</a> I said I’d cover functional query utility functions “next time”. Well, next time is finally here. Sorry for the wait; hope it was worth it.<br />
<br />
Before I begin, though, two quick notes:<br />
<ol>
<li>Functional queries have <strong>no performance advantage</strong>. They are identical to <code>qsql</code> queries in speed and memory usage.</li>
<li>Functional queries are <strong>hard to maintain</strong>. Even using the techniques described here, functional queries are still harder to read and understand than <code>qsql</code> queries.</li>
</ol>
Therefore, functional queries should only be used when <code>qsql</code> is incapable of accomplishing the task at hand. (Typically, this involves dynamically choosing columns.)<br />
<br />
As mentioned previously, getting the parse trees of the <code>c</code>, <code>b</code>, and <code>a</code> arguments to functional queries right is quite tricky. One technique I frequently use to make this easier is to use utility functions to abstract away as much of that complexity as possible. In their most basic form, they simply generate (composable) elements of the relevant structures:<br />
<blockquote>
<pre>q)c:{parse["select from t",$[count x;" where ",x;""]]. 2 0}
q)b:{parse["select",$[count x;" by ",x;""]," from t"]3}
q)a:{parse["select ",x," from t"]4}
</pre>
</blockquote>
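<p>To see what these helpers hand to <code>?</code>, their results can be compared against hand-written parse trees. This is a small check, assuming <code>b</code> and <code>a</code> are defined exactly as above:</p>
<blockquote>
<pre>q)(b"y")~(enlist`y)!enlist`y
1b
q)(a"sum x")~(enlist`x)!enlist(sum;`x)
1b
q)</pre>
</blockquote>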
For example:<br />
<blockquote>
<pre>q)t:([]x:1 2 3 4;y:1 1 2 3;z:7 8 9 10)
q)select sum x by y from t where x<>1
y| x
-| -
1| 2
2| 3
3| 4
q)?[t;c"x<>1";b"y";a"sum x"]
y| x
-| -
1| 2
2| 3
3| 4
q)
</pre>
</blockquote>
In practice, of course, these functions should be used to express in <code>qsql</code> whatever parts of a query are so expressible, while reserving the actual parse trees for the parts of the query that need them:<br />
<blockquote>
<pre>q)show each t{?[x;c"x<>1";b"y";(enlist y)!enlist(sum y)]}/:`x`z;
y| x
-| -
1| 2
2| 3
3| 4
y| z
-| --
1| 8
2| 9
3| 10
q)
</pre>
</blockquote>
More specialized utilities can also be written, once their need becomes apparent. One particularly difficult thing to express in functional form is a multi-column <a href="http://code.kx.com/wiki/Reference/fby">fby</a>:
<br />
<blockquote>
<pre>q)t2::update k:`a`a`b`c from t
q)select from t2 where i=(last;i)fby([]y;k)
x y z k
--------
2 1 8 a
3 2 9 b
4 3 10 c
q)parse"select from t2 where i=(last;i)fby([]y;k)"
?
`t2
,,(=;`i;(k){@[(#y)#x[0]0#x 1;g;:;x[0]'x[1]g:.=y]};(enlist;last;`i);(+:;(!;,`y`k;(enlist;`y;`k)))))
0b
()
q)
</pre>
</blockquote>
The primary problem is that the parse of this query contains the instructions for assembling a table literal out of its individual pieces. Here’s the functional equivalent, written all the way out:<br />
<blockquote>
<pre>q)?[t2;enlist(=;`i;(fby;(enlist;last;`i);(flip;(!;enlist`y`k;(enlist;`y;`k)))));0b;()]
x y z k
--------
2 1 8 a
3 2 9 b
4 3 10 c
q)
</pre>
</blockquote>
Using a utility function, it can instead be written much more simply:
<br />
<blockquote>
<pre>q)fbyx:{.[parse["select from t where ",x,"fby c"]. 2 0 0;2 2;:;(flip;(!;enlist(),y;(enlist,y)))]}
q)?[t2;enlist fbyx["i=(last;i)";`y`k];0b;()]
x y z k
--------
2 1 8 a
3 2 9 b
4 3 10 c
q)
</pre>
</blockquote>
Judicious application of these techniques (not to mention judicious application of functional querying in the first place) can make code much more readable.
<h2>Dictionaries and Vectors as Functions</h2>
<p>Posted 2012-07-12</p>
<p>IMAO this is one of the more interesting bits of theory embedded in <code>q</code>:</p>
<p>Considering a dictionary as a <a href="http://en.wikipedia.org/wiki/Partial_function">(partial) function</a> from its <code>key</code> (domain) to its <code>value</code> (range), then two dictionaries <code>f</code> and <code>g</code> such that <code>f</code>'s <code>value</code> and <code>g</code>'s <code>key</code> are of the same type can be composed:</p>
<blockquote><pre>
q)f:`a`b`c!1 2 3
q)g:1 2 3!("foo";"bar";"quux")
q)g f
a| "foo"
b| "bar"
c| "quux"
q)(g f)`b
"bar"
q)
</pre></blockquote>
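<p>The “partial” part shows up when the first dictionary maps to something outside the second one’s domain: the composition yields a null there. A small sketch, redefining <code>f</code> and <code>g</code> for illustration with symbol values so the null is easy to see:</p>
<blockquote><pre>
q)f:`a`b`c!1 2 4
q)g:1 2 3!`foo`bar`quux
q)null g f
a| 0
b| 0
c| 1
q)
</pre></blockquote>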
<p>Considering a vector <code>v</code> as a dictionary with a <code>key</code> of the vector of integers from <code>0</code> to <code>count[v]-1</code>, then <code>v</code> can be composed with a dictionary <code>h</code> of integer <code>value</code>:</p>
<blockquote><pre>
q)v:42 137 23
q)h:`a`b`c!0 1 2
q)v h
a| 42
b| 137
c| 23
q)(v h)`b
137
q)
</pre></blockquote>
<h2>Quick Tip: Statistics on Booleans</h2>
<p>Posted 2012-06-13</p>
<p>Due to <code>q</code>’s type promotion rules, it’s entirely legal to use statistical functions on boolean vectors. <code>avg</code> tends to be the most useful, but all of them should work as expected.</p>
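<p>In the simplest case, the average of a boolean vector is just the fraction of entries that are true (a minimal sketch):</p>
<blockquote><pre>
q)avg 1011b
0.75
q)avg null 1 0N 3 0N
0.5
q)
</pre></blockquote>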
<p>A typical use case: average <code>null</code>ness (handy while developing <a href="http://en.wikipedia.org/wiki/Extract,_transform,_load"><abbr title="Extract, transform, load">ETL</abbr></a> code). We will simulate with a vector of <code>float</code>s mistakenly parsed as <code>int</code>s (due to mostly looking like <code>int</code>s):</p>
<blockquote><pre>
q)v:@[string 1000?1000;-10?1000;,[;".123"]]
q)t:flip(enlist`v)!(enlist"I";" ")0:v
q)t
v
---
468
959
221
694
934
865
344
997
314
580
45
745
898
935
64
177
238
361
850
241
..
q)avg null t
v| 0.01
q)
</pre></blockquote>
<p>So 1<strike>0</strike>% of <code>t.v</code> is null, which (unless this is expected) should cause us to ask ourselves whether <code>"I"</code> was really the right parse.</p>
<h2>What the Function?!?</h2>
<p>Posted 2012-06-04</p>
<p><code>q</code> is notorious for its limited debugging features. One prominent aspect of this is the way runtime errors in the interactive shell are handled: you enter the debug shell and are presented with the body of the function that failed and the error that occurred. What’s missing? The name of the function!</p>
<p>So, to fill that gap, I’ve written a function <code>wtf[]</code> which will search the workspace for its argument, returning a fully-qualified name if it finds anything. <code>wtf</code>, along with brief documentation and a couple examples, can now be found in <a href="http://code.kx.com/wsvn/code/contrib/adavies/lib/">the <code>lib</code> subdirectory of my contrib</a> (a section I hope to expand in the future).</p>
<h2>And Back</h2>
<p>Posted 2012-05-28</p>
Well, I’m back from Ireland and Kx2012. I met a lot of interesting people, heard some great talks, had a lot of fun, saw a beautiful country, and came back with a bunch of ideas for more posts. All in all, a good time. Hope to see some of you more often!
<h2>Off to Ireland!</h2>
<p>Posted 2012-05-22</p>
Looking forward to meeting some of you at the conference!
<h2>Principles of Parse Trees</h2>
<p>Posted 2012-05-21</p>
<p>One of the more frequent topics that come up when people start moving beyond beginner's <code>q</code> is <a href="http://code.kx.com/wiki/JB:QforMortals2/queries_q_sql#Functional_select">functional queries</a>, which are required to parameterize a query by column name, and are often useful for various other reasons as well.</p>
<p>One of the trickiest parts of functional queries is the parse trees that appear in their second, third, and fourth arguments (henceforth <code>c</code>, <code>b</code>, and <code>a</code>). Here are a few rules that should summarize most of what's important to know about them:</p>
<ul>
<li><code>c</code> is a general list, each element of which is a parse tree. Its degenerate form (no conditions) is <code>()</code>, the empty general list.</li>
<li><code>c</code> appears in the results of <code>parse</code> with an <em>extra</em> level of <code>enlist</code> applied to it; this is only necessary when sending the parse tree back to <code>eval</code> (as opposed to using it to write a direct invocation of <code>?</code>).</li>
<li><code>b</code> and <code>a</code> are both dictionaries with symbol keys and parse tree values. The degenerate form of <code>b</code> (no grouping) is <code>0b</code>; the degenerate form of <code>a</code> (select all columns as-is) is <code>()</code>.</li>
<li>Symbol literals are enlisted.</li>
<li>Names (parameters, local or global variables, table columns, functions, etc.) become symbols.</li>
<li>Functions become <a href="http://en.wikipedia.org/wiki/S-expression">sexps</a>, like in LISP.</li>
<li>Adverbs are (technically) monadic functions which take dyadic verbs as arguments and yield dyadic verbs as their return values.</li>
<li>Global constants can be referenced by name or by value.</li>
<li>Anything which can be computed outside the query framework, may be.</li>
</ul>
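<p>As a concrete case of the first few rules (the extra <code>enlist</code> on <code>c</code>, names becoming symbols, and symbol literals being enlisted), here is a sketch using a made-up table <code>t</code> with a symbol column <code>s</code>:</p>
<blockquote><pre>
q)parse"select from t where s=`a"
?
`t
,,(=;`s;,`a)
0b
()
q)t:([]s:`a`b`a;v:1 2 3)
q)?[t;enlist(=;`s;enlist`a);0b;()]
s v
---
a 1
a 3
q)
</pre></blockquote>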
<p>Some elucidation on the last few points:</p>
<p>Adverbs will show up in the output of <code>parse</code> as <code>((adverb;`verb);`x;`y)</code>:</p>
<blockquote><pre>
q)unshow parse"select <font color=red>f'[a;b]</font>from t"
(?;`t;();0b;(,`b)!,<font color=red>((';`f);`a;`b)</font>)
</pre></blockquote>
<p>While this will work fine, simply including the modified verb directly in the tree by value is also legal:</p>
<blockquote><pre>
q)?[t;();0b;(enlist`b)!enlist<font color=red>(f';`a;`b)</font>]
</pre></blockquote>
<p>Built-in functions from the <code>.q</code> namespace will be included by value in <code>parse</code> output:</p>
<blockquote><pre>
q)unshow parse"select <font color=red>dev a</font> from t"
(?;`t;();0b;(,`a)!,<font color=red>(k){sqrt var x};`a)</font>)
</pre></blockquote>
<p>These should always be replaced by their names in code:</p>
<blockquote><pre>
q)?[t;();0b;(enlist`a)!enlist<font color=red>(dev;`a)</font>]
</pre></blockquote>
<p>Finally, fully parsed code may be mixed freely with expressions which generate results which are otherwise acceptable:</p>
<blockquote><pre>
q)unshow parse"select from <font color=red>t1 uj t2</font> where <font color=blue>a in(exec a from t3)</font>"
(?;<font color=red>(k){$[()~x;y;98h=@x;x,(!+x:.Q.ff[x;y])#.Q.ff[y;x];lj[(?(!x),!y)#x]y]};`t1;`t2)</font>;,,<font color=blue>(in;`a;(?;`t3;();();,`a))</font>;0b;())
</pre></blockquote>
<p>can be written as</p>
<blockquote><pre>
q)?[<font color=red>t1 uj t2</font>;enlist<font color=blue>(in;`a;exec a from t3)</font>;0b;()]
</pre></blockquote>
<p>Next time, I'll discuss how to write utility functions to make advanced constructs like multi-column <a href="http://code.kx.com/wiki/Reference/fby"><code>fby</code></a> easy to write.</p>