Custom column cheatsheet

The Mighty Cheatsheet 

Important: certain formulas / expressions can be a little tricky to write. If you have a question or need help, just message us via our chat in the bottom right!

Jump to the Glossary

Not familiar with your database, including data types? Check this post

For specific methods (e.g. how to count unique values), head to the How do I...? section.

--------------

  • Intro: Data types

Each column has a data type, which defines the kind of values the column can contain:

 

  • String = text (e.g. "Hello to you") 
  • Int = a whole number (e.g. 10, 20, 30)
  • Timestamp = date and time of day (e.g. 25/01/2019 14:25:03)
  • Float = a number with a fractional part (e.g. 4.7)
  • Bool = a value of either true or false.
  • Interval  = a time interval (e.g. 5 days)
  • JSON = a data type used for storing and transporting data. JSON is made up of key-value pairs between curly brackets (e.g. {"Book Title": "Ready Player One}).
  • Array - a data type used for storing multiple values, enclosed in square brackets ([]). 



--------------

Numbers

You can use Trevor as a calculator to perform arithmetic across columns

Examples

(column1 / column2) * 100

column1 - column2

( 5 * ( column1 * 8 ) ) - 10

Convert a negative number to a positive one: number * (0-1)

Text strings

You can add to, manipulate, and compare text columns.

Examples

Remove spaces at the start or end of a string: column.trim()

Add a word to a column: column1 + "hello"

Replace a character: “hello world".replace(" ", "_")

Extract the domain from an email address: 

column1.extract("@(.*)")

column1.substring(column1.position("@")+1)

Extract everything before a space: 

column1.extract("(.*) ")

column1.truncate(column1.position(" "))

Extract everything after a space: 

column1.extract(" (.*)")

column1.substring(column1.position(" ")+1)

Extract the first word

column1.split(“ “).at(0) 

Extract the characters after the 5th character: column1.substring(4)

Extract 4 characters after the 5th character: column1.substring(4, 4) (The second parameter is the number of characters to extract)

Extract only the 5th character: 

column1.substring(4).truncate(1)

column1.substring(4,1)

Extract only the 1st character: 

column1.substring(0).truncate(1)

column1.substring(0, 1)

Extract the last 2 characters: column1.substring(column1.length() -2)

Extract the first 5 characters after a space: column1.substring(column1.position(" ")+1, 5)

Extract everything before a number: column1.extract("^([^0-9]*)")

(Note: Not supported by SQL Server databases)

Extract the last word in a sentence: column1.extract("\S+\$")

Calculate length of text: column1.length()

Convert to lowercase: column1.lowercase()

Convert to uppercase: column1.uppercase()

Return true if the value in one column equals that of another: column1 == column2 

Return true if the value in one column contains a specific value: column1.contains("hello")

Return true if the value in one column contains the value of another column: column1.contains(other_column)

 

Advanced

u.email + " " + u.status

"one two three".split(" ")

(Note: Not supported by SQL Server databases)

r.my_column == r.my_other_column

r.my_column != r.my_other_column

 

Redshift databases

my_column.regexSubstring(".*,") extracts everything before a comma

(you can replace the comma with any other character) 

my_column.regexSubstring(",.*") extracts everything after a comma

 

If then else

Trevor supports if then else statements. Check out an example of using these in a query here.

Examples

If the value in a column is 2, return “two!”. If it isn’t, return ‘not two!”: 

if (my.column == 2) then "two!" else "not two"

-------------------------------------------

If the value in a column is “hello” AND the value of another column is “to you”, return 1, otherwise return 0.

if (my.column == "hello" and other.column == "to you") then 1 else 0

Common next step:

Using the calculate button: Sum the values of the new column. This will give us the total number of rows where mycolumn is “hello” AND other.column is “to you”

-------------------------------------------

If the value in a column is “hello” AND the value of another column is “to you” OR “to me”, return 1, otherwise return 0.

if (my.column == "hello" and (other.column == "to you" or other.column == "to me")) then 1 else 0

 

If the value of the middlename column is empty, return “EMPTY!”, otherwise return the existing column value.

if ( middlename.isEmpty() ) then "EMPTY" else middlename

 

If the value in a column is less than 0, return “negative”. If it’s greater than 0, return “positive”. If it’s 0, return “zero”.

if (my.column < 0) then "negative" else if (my.column > 0) then "positive" else "zero"

 

If the value in a column is true, return “woo”, otherwise return “boo”

if (my.column) then "woo!" else "boo" 

 

If the value in a column is false, return “oh look, it’s false”, otherwise return “nothing to see here...”

if (my.column.isFalse()) then "oh look, it’s false!" else "nothing to see here..." 

 

If the value in a column is true, return 1, otherwise return 0

if (my.column) then 1 else

 

If the value in a column is null, return 0, otherwise just return the existing value.

if (my.column.isNull() ) then 0 else my.column

 

If the value in a column is not null return true else false

my.column.isNull().isFalse()

 

If the value in a column is null or the value is not “something”, AND the value in another column is less than or equal to 5, return 1, otherwise return 0.

if (my.column.isNull() or my.column != "something" and other.column <= 5) then 1 else 2

 

If the value in a column is something, say “hello!”, otherwise set the value as null

if (my.column == “something”) then “hello!” else my.column.null()

Aggregate functions 

This is the equivalent of using the Summarize button, but adding the result to a new column.

Count rows

count_rows()

 

Count rows, grouped by a column

count_rows_by(column)

 

Sum values across rows

This is useful if you want to calculate the percentage and need to sum the values in a particular column. 

sum(my_number_column)

 

Sum values across rows, grouped by a column

sum_by(my_number_column, other_column)

 

Can also do:

max(my_date_column)

max_by(my_date_column, other_column)

min(my_date_column)

min_by(my_date_column, other_column)

Timestamps

Note: Some of the following functions (relating to intervals e.g. 3 days / 2 months) won’t work for SQL Server and Big Query databases. Pleaser scroll to the bottom of this section for Big Query and SQL Server specific functions, or chat with us for a workaround.

Examples

Convert to datetime

my_date_column.toDateTime()

 

Convert to a given timezone

u.created_at.toTimeZone("America/Los_Angeles")

Support timezones: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones

 

Take one away from another 

diff_in_seconds(date1,  date2)
Check out a sample query using this here

 

Convert a date to datetime:

date.toDateTime()

 

Convert a datetime to just date (i.e. remove the time from the column):

my_date_column.toDate()

 

Subtract the created_at date from the current time

now() - u.created_at

 

Add 1 hour to a datetime

column + 1.hour()

 

Add 1 day to a datetime

Column + 1.days()

 

*If the date column is not datetime and you want to add an hour / date to it, just convert it to datetime first using .toDateTime()

 

For SQL Server and Big Query databases, please use the following functions:

add_years(1, created_at)

add_months(5, created_at)

add_weeks(2, created_at)

add_days(3, created_at)

add_hours(1, created_at)

add_minutes(5, created_at)

add_seconds(20, created_at)

subtract_year(3, created_at)

---

Check if a date is in the future. If it is, return “the future”, otherwise return “the past”

if (u.created_at > now()) then "the future" else "the past"

 

Formatting timestamps as text strings

r.created_at.day()

r.created_at.week()

r.created_at.month()

r.created_at.year()

 

Rounding down to closest day/week/month/year

r.created_at.startOfDay()

r.created_at.startOfWeek()

r.created_at.startOfMonth()

r.created_at.startOfYear()

Nullable values (any data type)

Examples

r.customer_id.isNull()

r.customer_id.isNull().isFalse()

Assign a value if a column has a null value.

my_column.ifNull(“hello”)

Assign the value of another column if a column has a null value. 

my_column.ifNull(other_column)

If the other column has a NULL value, assign the value “unknown”

my_column.ifNull(other_column).ifNull("unknown")

JSON

Access key values

my_column.key("key-name").key("key-name")

Access array values

my_json_array.at(0)

Convert to other column types

To convert to a string (from any column type):  

my_column + “”

To convert from an Int to a float: 

my_column + 0.0

Common errors

Expected type FLOAT, but found INT

It’s expecting a floating point number e.g. (5.0) but finding an integer (e.g. 5). 

Solution: if your number is 5, change it to 5.0. If your number is 10, change it to 10.0 etc. 

Glossary

STRING

.truncate(INT) returns STRING

.substring(INT) returns STRING

.substring(INT, INT) returns STRING

.position(STRING) returns INT

.length() returns INT

.isEmpty() returns BOOL

.trim() returns STRING

.ascii() returns INT

.uppercase() returns STRING

.lowercase() returns STRING

.contains(STRING) returns BOOL

.startsWith(STRING) returns BOOL

.endsWith(STRING) returns BOOL

.toInt() returns INT

.toFloat() returns FLOAT

.toDate() returns DATE

.extract(STRING) returns STRING [postgres only] using regular expression

.regexSubstring(STRING) returns STRING [redshift only]

.split(STRING) returns ARRAY

.replace(STRING,STRING) returns STRING

.properCaps() returns String

.toJson() returns JSON

.isOneOf(STRING, STRING, STRING, …) BOOL

Example:

"abracadabra".substring(2).truncate(3) 

returns STRING

 

INT

sum(INT) returns INT

sum_by(INT, OTHER) returns INT

count_rows() returns INT

count_rows_by(OTHER) returns INT

max(INT) returns INT

max_by(INT, OTHER) returns INT

min(INT) returns INT

min_by(INT, OTHER) returns INT

.pretty() returns STRING

.toString() returns STRING

.seconds() returns INTERVAL

.minutes() returns INTERVAL

.hours() returns INTERVAL

.days() returns INTERVAL

.weeks() returns INTERVAL

.months() returns INTERVAL

.years() returns INTERVAL

.isOneOf(INT, INT, INT, …) returns BOOL

.abs() returns INT

(The above don’t work for SQL Server databases)

Example:

now() - 5.days()

returns DATETIME

 

Note: if you get the error 'integer out of range' you can do a little hack and convert the column to a float. 

 

FLOAT

sum(FLOAT) returns FLOAT

sum_by(FLOAT, OTHER) returns FLOAT

count_rows() returns INT

count_rows_by(OTHER) returns INT

max(FLOAT) returns FLOAT

max_by(FLOAT, OTHER) returns FLOAT

min(FLOAT) returns FLOAT

min_by(FLOAT, OTHER) returns FLOAT

.pretty() returns STRING

.round() returns INT

.round(INT) returns FLOAT

.floor() returns INT

.ceil() returns INT

.abs() returns FLOAT

Example:

(53.98).round()

returns INT

Convert a float to an Int

Round up using .ceil() or down using .floor()

 

BOOL

.isTrue() returns BOOL

.isFalse() returns BOOL

Example:

t.created_at < now() and t.id < 5

returns BOOL

 

NULLABLE

.isNull() returns BOOL

.ifNull(NULLABLE) returns NULLABLE

Example:

t.email.ifNull( "no-email" )

returns STRING

 

DATE and DATETIME

.format(STRING) returns STRING:

Postgres: see official TO_CHAR documentation for format options

MySQL: see official DATE_FORMAT documentation for format options

Redshift: see official TO_CHAR documentation for format options

Snowflake: see official TO_CHAR documentation for format options

SQLServer: see official FORMAT documentation for format options

BigQuery: see official FORMAT_DATETIME documentation for format options

diff_in_seconds(DATE, DATE) return INT

diff_in_days(DATE, DATE) return INT

diff_in_weeks(DATE, DATE) return INT

diff_in_months(DATE, DATE) return INT

diff_in_years(DATE, DATE) return INT

now() returns DATETIME

date(STRING) returns DATE

datetime(STRING) returns DATETIME

max(DATETIME) returns DATETIME

max_by(DATETIME, OTHER) returns DATETIME

min(TIMETIME) returns DATETIME

min_by(DATETIME, OTHER) returns DATETIME

.secondsSinceEpoch() returns INT

.second() returns INT

.minute() returns INT

.hour() returns INT

.date() returns INT

.month() returns INT

.weekOfYear() returns INT

.day() returns STRING

.dayOfWeek() returns INT

.weekday() returns STRING

.week() returns STRING

.year() returns STRING

.startOfDay() returns DATETIME

.startOfWeek() returns DATETIME

.startOfMonth() returns DATETIME

.startOfYear() returns DATETIME

.toDate() returns DATE

.toDateTime() returns DATETIME

.toString() returns STRING

.toTimeZone(STRING) returns DATETIME

Example:

now() - date("2018-01-25").toDateTime()

returns INTERVAL

Input your own date

(“25/02/2018”).toDate()

 

INTERVAL

.toSeconds() returns INT

.getMonths() returns INT

.getYears() returns INT

Example:

((now() - t.created_at).toSeconds() / 60 / 60 / 24).round()

returns INT

 

JSON

.at(INT) returns JSON

.key(STRING) returns JSON

.toString() returns STRING

.length() returns INT

Example:

t.json_data.key("items").at(0).key("name").toString()

returns STRING

 

UUID

.toString() returns STRING

 

ARRAY

.toString() returns STRING

.length() return INT

.at(INT) returns STRING

Example:

if t.items.length() > 0 then t.items.at(0) else "missing"

Use cases

Find the average of multiple dates 

(E.g. average of 1/06/18 and 30/06/18 = 15/06/18).

This is currently a fun one :)

 - Click on your date column header, select more options, and select seconds since 1970.

 - This will add a new column with the number of seconds between 1970 and each date. 

 - Calculate the average of that column. 

 - Click the new column button and add this big ol’ custom expression:

date("1970-01-01").toDateTime() + your_column.round().seconds()

(where your_column = the average column we created in step 2)

 - Done! You’ve found the average of multiple dates.

Was this article helpful?

0 out of 0 found this helpful

Have more questions? Submit a request

Comments

0 comments

Please sign in to leave a comment.