Wednesday, June 13, 2012

Quick Tip: Statistics on Booleans

Due to q’s type promotion rules, it’s entirely legal to use statistical functions on boolean vectors. avg tends to be the most useful, but all of them should work as expected.

A typical use case: average nullness (handy while developing ETL code). We will simulate with a vector of floats mistakenly parsed as ints (due to mostly looking like ints):

q)v:@[string 1000?1000;-10?1000;,[;".123"]]
q)t:flip(enlist`v)!(enlist"I";" ")0:v
q)t
v  
---
468
959
221
694
934
865
344
997
314
580
45 
745
898
935
64 
177
238
361
850
241
..
q)avg null t
v| 0.01
q)

So 10% of t.v is null, which (unless this is expected) should cause us to ask ourselves whether "I" was really the right parse.

Labels:

3 Comments:

At March 13, 2013 at 7:55 PM , Blogger David Demner said...

Don't you mean 1% of t.v is null? But yeah, avg null is very useful (just used it to see how many nulls in a table's columns)

 
At March 13, 2013 at 8:22 PM , Blogger Aaron Davies said...

Oops, yeah, thanks, I'll fix that.

Another useful one is "sum null", which is the same thing as "count where null" but shorter, and much shorter, and simpler, when used in a query.

 
At May 29, 2019 at 6:38 PM , Blogger madin said...

The store has been solving problems for me, the service is very patient, this is a perfect shopping experience.replica moncler jackets When I received the clothes, I thought it was a very good copy of the clothes. replica moncler gilet men I like this online store, this dress is simple and generous. Very satisfied with this purchase.

 

Post a Comment

Subscribe to Post Comments [Atom]

<< Home