Wednesday, June 13, 2012

Quick Tip: Statistics on Booleans

Due to q’s type promotion rules, it’s entirely legal to use statistical functions on boolean vectors. avg tends to be the most useful, but all of them should work as expected.

A typical use case: average nullness (handy while developing ETL code). We will simulate with a vector of floats mistakenly parsed as ints (due to mostly looking like ints):

q)v:@[string 1000?1000;-10?1000;,[;".123"]]
q)t:flip(enlist`v)!(enlist"I";" ")0:v
q)t
v  
---
468
959
221
694
934
865
344
997
314
580
45 
745
898
935
64 
177
238
361
850
241
..
q)avg null t
v| 0.01
q)

So 10% of t.v is null, which (unless this is expected) should cause us to ask ourselves whether "I" was really the right parse.

Labels:

2 Comments:

At March 13, 2013 at 7:55 PM , Blogger David Demner said...

Don't you mean 1% of t.v is null? But yeah, avg null is very useful (just used it to see how many nulls in a table's columns)

 
At March 13, 2013 at 8:22 PM , Blogger Aaron Davies said...

Oops, yeah, thanks, I'll fix that.

Another useful one is "sum null", which is the same thing as "count where null" but shorter, and much shorter, and simpler, when used in a query.

 

Post a Comment

Subscribe to Post Comments [Atom]

<< Home