Cloud9QL is Knowi's proprietary version of SQL, designed to simplify data transformation and manipulation directly within the platform. By leveraging Cloud9QL, users can effortlessly clean, filter, and manipulate their data in multiple stages of the analysis process without needing to depend on native query languages of the underlying datastore.
Paste your own data below and write your own queries to experiment (or use the default data and queries):
Select All Data
select *
Keyword 'select' is optional.
Select Specific Fields
select Sent, Date
Select All Except Specific Fields
Use the ~ symbol before a field name to exclude that field
select *, ~Message_Type, ~Customer_Name
Aliasing Fields
You can rename fields on the fly to make the output more meaningful
select Sent as Sent Messages, Date
Applying Conditions (Filters)
Cloud9QL supports the following comparison and logical operators: =, >, >=, <, <=, !=, like, not like, and, or
select * where opened > 100000
select * where campaign_name like 'artists'
select * where Message_Type = 'Transactional' and Sent > 200000
Sorting Data
Order results easily with asc or desc
select * where opened > 100000 order by opened desc
Limiting Results
select * where opened > 100000 order by opened desc limit 1
Unique Records
select distinct *
select distinct customer
Counting Rows
Use count to see how many rows are in your dataset
select count(*)
Or count the number of rows when a condition is met. For example, how many people in the employees table work in HR?
select count(*) where department = 'HR'
Cloud9QL supports the following aggregation functions: count, sum, avg, distinct, max, min, sd, median
Accessing Nested Values
You can access nested values, including elements within arrays, from non-relational databases like MongoDB or JSON files using dot notation and array indexing. Learn more about working with nested arrays and objects here
select
nestedObj.a, nestedArr[0],
nestedObj.secondLevel.x as SecondLevelObject,
nestedObj.secondLevel.y[1] as SecondLevelArray
Unwinding Nested Arrays
Expanding a Single Nested Field:
select customer, nestedObj.secondLevel.y as Nested;
select expand(Nested);
For multiple arrays or fields, use expand_arrays:
select expand_arrays(nestedArr1, nestedArr2);
For filling in missing values while expanding, use expand_arrays_with_defaults:
select expand_arrays_with_defaults(nestedArr1, null, nestedArr2, 0);
Learn more about Cloud9QL's expand functions here
Multiple Statements
You can run multiple SQL statements in a row to create a more complex, multi-step transformation. Each statement uses the output of the previous one as its starting point, and every statement must end with a semicolon.
select department, avg(salary) as average_salary
group by department
having count(*) > 5;
select * where average_salary > 50000;
Comments
Single-Line Comments
Use -- to comment out a single line of text. Everything following -- on that line is ignored.
SELECT * FROM employees; -- This retrieves all employees
Multi-Line or Block Comments
Use /* */ to comment out multiple lines or a block of text. Everything between /* and */ is ignored:
/* This block of code retrieves employee names and departments.
We are ignoring other fields for now. */
select name, department;
No FROM Statement Required: In Cloud9QL, queries are always associated with a specific dataset, so there's no need to explicitly include a FROM clause.
Case Insensitivity: SQL keywords (e.g., SELECT, WHERE) and field names are not case-sensitive. However, string values in conditions remain case-sensitive.
Example:
select * where region = 'North America'
Spaces Allowed in Field Names: You can include spaces in field names without needing to surround them with quotes.
Example:
select Messages Sent
Optional SELECT Statement: The SELECT keyword is optional and can be omitted for simplicity.
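As a minimal illustration using the sample Sent and Date fields, the query below is equivalent to writing select Sent, Date:

```sql
-- equivalent to "select Sent, Date"; the select keyword is omitted
Sent, Date
```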
Subqueries Not Supported: Cloud9QL doesn't support subqueries, but you can write multiple statements by separating them with a semicolon. You can also use the append function to combine different aggregations from the same dataset.
You can use Cloud9QL transformations on datasets both before and after joins, directly within the Queries section.
Select the Editor tab.
You can use Cloud9QL to transform data either alongside or instead of the native query language of the underlying datastore. In this example, the Cloud9QL transformations will be applied after the Mongo Query is executed.
After the data is joined, you can apply a Cloud9QL Post Query to your dataset.
Once data is presented in a widget, further transformations can be made through the Filters button.
Find the filters button at the top right corner of any widget.
At the bottom of the tab you will find Cloud9QL Transformation.
Here you can apply your Cloud9QL transformations. Press the preview button to see the changes applied to the dataset.
Aggregation functions compute summary values, such as totals and averages, across rows of data.
select sum(sent)
select sum(sent), avg(sent), count(*), median(sent), max(sent), min(sent)
Supported: sum, count, avg, median, max, min
Enables aggregations based on one or more groups/dimensions.
select sum(sent) as Total Sent, Customer group by Customer
Combines values on multiple rows of a given field into an array based on group by field(s).
ARRAY(<field>)
ARRAY(<field>, <remove-duplicates-flag>)
ARRAY(<field>, <remove-duplicates-flag>, <filter-out-null-flag>)
select Stock, array(Price) as Trends group by Stock
select Stock, array(Price, true, true) as Trends group by Stock
Arithmetic operations can be used within the query
select (opened/sent)*100 as Open Rate, Customer
Supported operators and functions: +, -, *, /, ^, %, abs, acos, asin, atan, cbrt, ceil, cos, cosh, floor, sqrt, tan
Useful to determine the standard deviation of a set of values.
select sd(opened) as Std Deviation, Customer group by customer
Cloud9QL will automatically attempt to parse various date formats.
Use str_to_date(<date>,<format>) for unsupported formats.
Distinguishes between epoch timestamps with milliseconds and without milliseconds. This is specifically for REST queries.
For example: date={$c9today,epochsecs} will format today into epoch seconds format
Converts an Epoch number of seconds to a readable date format.
select epoch_to_date(date) as datetime
Truncates a date to midnight. When used within group by performs aggregation by date.
select date(date), sent
select date(date) as Sent Date, sum(sent) as Total Sent group by date(date)
Day name of the week (Sunday, Monday etc)
select day_of_week(date), sum(sent) as Total Sent group by day_of_week(date)
Day of the month (1, 2, 3 etc)
select day_of_month(date), sum(sent) as Total Sent group by day_of_month(date)
Length of the month in days (28, 29, 30, 31 etc)
select days_in_month(date), sum(sent) as Total Sent group by days_in_month(date)
Truncates a date to the beginning of the week (Sunday). When used within group by, performs aggregation by week.
select week(date) as Sent Week, sum(sent) as Total Sent group by week(date)
By default, WEEK(<date-field>) returns the Sunday date at midnight of a given week, but with WEEK(<date-field>, <offset>) you can offset the days and alter the day of the week returned.
Offset Reference:
1 - Monday
2 - Tuesday
3 - Wednesday
4 - Thursday
5 - Friday
6 - Saturday
Example: WEEK(<date-field>, 1) returns the Monday date at midnight of a given week.
Returns the week number (an integer) for the input date.
select week_of_year(date) as Sent Week, sum(sent) as Total Sent group by week_of_year(date)
NOTES:
A week is defined from Monday-Sunday regardless of month or year.
All weeks are 7 days long.
Weeks are not dependent on the month, but it is possible for days in a single month to span 5 weeks depending on how the days align. For example, if the 1st of a month falls on a Saturday, that day belongs to the week starting on the Monday that falls in the previous month.
The first week of a year always follows the last week in December of the previous year.
Truncates to the 1st of the month. Aggregates data on a monthly basis when used within group by.
select month(date) as Sent Month, sum(sent) as Total Sent group by month(date)
Month of the year (1, 2, 7, 12 etc)
select month_of_year(date), sum(sent) as Total Sent group by month_of_year(date)
Truncates to the beginning of the quarter. Aggregates data on a quarterly basis when used within group by.
select quarter(date) as Sent Quarter, sum(sent) as Total Sent group by quarter(date)
Truncates to the 1st of the year. Aggregates data on a yearly basis when used within group by.
select year(date) as Sent Year, sum(sent) as Total Sent group by year(date)
Truncates to the hour for dates with timestamps
select HOUR(date) as Sent Hour, sum(sent) as Total Sent group by hour(date)
Truncates/Groups to the minute for dates with timestamps
select MINUTE(timestamp) as Sent Minute, sum(sent) as Total Sent group by MINUTE(timestamp)
Current date/time
select now()
Converts a date into another format
DATE_FORMAT(<date>,<format>)
select date_format(date,dd-MMM) as Display Format
Options:
Symbol | Meaning |
---|---|
y | Year |
M | Month |
w | Week of year |
W | Week in month |
D | Day in year |
d | Day in month |
F | Day of week in month |
E | Day name in week. Example: Tuesday, Tue |
a | AM/PM marker |
H | Hour in day (0-23) |
h | Hour in AM/PM (1-12) |
m | Minute in hour |
s | Second in minute |
S | Millisecond |
z | Time zone |
Z | Time zone |
Adds a datetime amount to a date.
DATE_ADD(<date>,<amount>)
select date_add(date,+1y) as Date
Converts a string into a date using a provided format.
STR_TO_DATE(<date>,<format>)
select str_to_date(date,dd-MMM-yy HH:mm) as Converted Date
The following reserved tokens enable date queries based on the current date/time:
Token | Meaning |
---|---|
$c9_now | Current time |
$c9_thishour | 00:00 of the current hour |
$c9_today | Midnight of the current date |
$c9_yesterday | Midnight, yesterday |
$c9_thisweek | Start of the current week (Sunday midnight) |
$c9_lastweek | Start of last week (Sunday midnight) |
$c9_thismonth | Midnight of the 1st of the current month |
$c9_lastmonth | Midnight of the 1st of last month |
$c9_thisquarter | Midnight of the 1st of the current quarter (Jan, April, July, Oct) |
$c9_lastquarter | Midnight of the 1st of the last quarter (Jan, April, July, Oct) |
$c9_thisyear | Midnight, Jan 1 of the current year |
$c9_lastyear | Midnight, Jan 1 of last year |
select * where date > $c9_thisyear
In addition, these can be further manipulated with +/- operators along with time unit identifiers. For example:
select * where date > $c9_thisyear+2m
Gets data from March onwards
select * where date > $c9_yesterday+2h
Data from 2:00 AM yesterday
The following time units are supported:
Unit | Meaning |
---|---|
min | Minutes |
h | Hours |
d | Days |
w | Weeks |
m | Months |
q | Quarters |
y | Years |
Default timezone is US/Pacific for date display within Knowi. On-premise agents inherit the server timezone.
Custom Timezones can be set in the query using:
set time_zone=US/Eastern;
Full list of Timezones here.
Example:
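As a sketch combining the timezone override with a daily grouping (field names taken from the sample data used throughout this document):

```sql
set time_zone=US/Eastern;
select date(date) as Sent Date, sum(sent) as Total Sent group by date(date)
```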
This calculates the amount of time between two date-time values in a given date/time unit. The result is a positive whole number, even if the end is before the start.
For example, number of minutes between two date-times:
MINUTES_DELTA(<date field>,<date field>)
select minutes_delta('02/28/2015 22:25:34', '01/28/2015 16:28:34') as minutes_delta;
select minutes_delta(now(), date) as minutes_delta;
Number of hours:
HOURS_DELTA(<date field>,<date field>)
select hours_delta('02/28/2015 22:25:34', '01/28/2015 16:28:34') as hours_delta;
select hours_delta(now(), date) as hours_delta;
Days:
DAYS_DELTA(<date field>,<date field>)
select days_delta('02/28/2015 22:25:34', '01/28/2015 16:28:34') as days_delta;
select days_delta(now(), date) as days_delta;
Months:
MONTHS_DELTA(<date field>,<date field>)
select months_delta('02/28/2015 22:25:34', '01/28/2015 16:28:34') as months_delta;
select months_delta(now(), date) as months_delta;
Cloud9QL provides a set of operations for calculating rolling and cumulative values, such as accumulate, growth, delta, simple moving average, cumulative moving average, and time moving average.
The standard use case for these operations is to compute an <operation> of a <value field> across a set of <dimension field(s)>, optionally grouping by a set of <grouping field(s)>.
<operation>(<value field>[, <grouping field(s)>]);
For example, compute the DELTA of Sent across Week grouping by Customer. In this example:
<operation>: DELTA
<value field>: Sent
<dimension field(s)>: Week
<grouping field(s)>: Customer
Example:
select Customer, delta(Sent, Customer) as SentDelta
There is one important restriction when using these Cloud9QL functions: the input data needs to be ordered by the <grouping field(s)> and <dimension field(s)>, in that order.
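As a sketch of that ordering requirement, using the sample fields, sort by the grouping field and then the dimension field in a preceding chained statement:

```sql
select * order by Customer, Date;
select Customer, Date, delta(Sent, Customer) as Sent Delta
```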
Creates cumulative totals for a field between records, given a sorted dataset.
accumulate(<value field>[, <grouping field(s)>]);
select accumulate(sent), date
The above example returns a cumulative sum of sent count for a pre-sorted date order.
Calculates a growth percentage for a field between records, for a sorted dataset.
growth(<value field>[, <grouping field(s)>]);
select growth(sent), date
Calculates a difference for a field between records, for a sorted dataset.
delta(<value field>[, <grouping field(s)>]);
select delta(sent), date
Simple moving average based on a field and a window size for it. Assumes a sorted dataset.
SMA(<value field>, <window size>[, <grouping field(s)>]);
select sma(sent, 10)
Cumulative moving average returns the moving average of all data up to the current data point.
CMA(<value field>[, <grouping field(s)>]);
select cma(sent)
Time moving average based on a field, date field, and a window time unit size for it. See Time Units for all available time units. Assumes a sorted dataset
TMA(<value field>, <date field>, <time unit window>[, <grouping field(s)>]);
select tma(sent, date, 1w)
For more details on moving average definitions, see http://en.wikipedia.org/wiki/Moving_average
Time moving sum based on a field, date field, and window time unit size.
TMS(<value field>, <date field>, <time unit window>[, <grouping field(s)>]);
select tms(sent, date, 1w)
Rank of records, given a sorted dataset.
rank([<grouping field(s)>]);
select rank(), date
The above example returns the rank (increment by 1) of each row for a pre-sorted date order.
Specifies the number of decimal places to display.
ROUND(<field>, <decimal points>)
select round(sent,1)
Returns a substring starting at an index, with a given length.
SUBSTRING(<field to check against>, <start index>, <length>)
select substring(Message Type,0,10)
Substring before the first occurrence of a delimiter for a field value.
SUBSTRING_BEFORE(<field to check against>, <delimiter>)
select substring_before(Message Type,someDelim)
Substring after the first occurrence of a delimiter for a field value.
SUBSTRING_AFTER(<field to check against>, <delimiter>)
select substring_after(Message Type,someDelim)
Concatenates multiple columns together. When a field name does not exist in the current dataset, a fixed string is used.
CONCAT(<field name>, <another field>, <yet another field>, ...)
select concat(Customer, for Week of, Week)
Converts a number from one radix to another, for example hex strings into integers.
CONV (<field name>, <field radix>, <desired radix>)
select conv(hexfield, 16, 10)
Split a string of elements separated by separator into an array. If separator is not specified, comma will be used.
SPLIT(<field name>, <separator>)
select split(Customer, ",")
Join elements of an array value together separated by separator. When a field name does not exist in the current dataset, a fixed string is used.
ARRAY_TO_STRING(<field name>, <separator>)
select array_to_string(Customer, ", ")
Upper cases a string
UPPER(<field name>)
select upper(Customer)
Lower cases a string
LOWER(<field name>)
select lower(Customer)
Removes leading and trailing spaces from a string
TRIM(<field name>)
select trim(address)
Returns the length of a string.
LENGTH(<field name>)
select length(address)
Formats a number to a locale specific currency format. Defaults to US currency format (en_US) if locale is not specified.
CURRENCY_FORMAT(<field name>, <locale>)
CURRENCY_FORMAT(<field name>, <decimal points>)
CURRENCY_FORMAT(<field name>, <locale>, <decimal points>)
select currency_format(revenue)
Example with Locale:
select currency_format(revenue,en-GBP)
This function allows you to control the display of leading and trailing zeros, prefixes and suffixes, grouping (thousands) separators, and the decimal separator.
NUMBER_FORMAT(<number>,<format>)
select number_format(clicks,##,###.00) as Number of clicks
The following table describes the output for the preceding lines of code. The value is the number (a double) to be formatted. The pattern is the string that specifies the formatting properties. The output, a string, is the formatted number:
value | pattern | output | explanation |
---|---|---|---|
123456.789 | ###,###.### | 123,456.789 | The pound sign (#) denotes a digit, the comma is a placeholder for the grouping separator, and the period is a placeholder for the decimal separator. |
123456.789 | ###.## | 123456.79 | The value has three digits to the right of the decimal point, but the pattern has only two. The format method handles this by rounding up. |
123.78 | 000000.000 | 000123.780 | The pattern specifies leading and trailing zeros, because the 0 character is used instead of the pound sign (#). |
12345.67 | $###,###.### | $12,345.67 | The first character in the pattern is the dollar sign ($). Note that it immediately precedes the leftmost digit in the formatted output. |
12345.67 | \u00A5###,###.### | ¥12,345.67 | The pattern specifies the currency sign for Japanese yen (¥) with the Unicode value 00A5. |
Replaces each substring of this string that matches the given regular expression with the given replacement.
If the replacement parameter is not provided, an empty string ("") is used as the default replacement.
REGEX_REPLACE(<field name>, <regex>)
REGEX_REPLACE(<field name>, <regex>, <replacement>)
For example, to replace all occurrences of white spaces in a string
select regex_replace('Morris Park Bake Shop', '\s') as regex_replaced;
==> MorrisParkBakeShop
Extracts and returns all (non-overlapping) matches for the regular expression from the given input string field.
If there is no match, NULL is returned.
REGEX_EXTRACT(<field name>, <regex>, [<extract groups>])
For example, to extract all string occurrences between (and including) '|' characters:
select regex_extract("|Morris Park| |Bake Shop|", "\|([^|]*)\|");
==> ["|Morris Park|","|Bake Shop|"]
select regex_extract("|Morris Park| |Bake Shop|", "\|(([^|]*))\|", true);
==> [ ["|Morris Park|","Morris Park"],["|Bake Shop|","Bake Shop"] ]
Returns an alternate value to be used in case the specified field does not exist or the value is NULL.
IFNULL(<field name>, <alternate value>)
select IFNULL(CustomerName, "N/A")
You can also specify an alternate column in place of an alternate value
IFNULL(<field name>, <another field name>)
select IFNULL(CustomerName, CustomerId)
Useful to access data from a previous row, with an optional row offset.
select LAG(<field>[, <offset>[, <default>]])
select LAG(customer, 3)
Users can also group data in the LAG function and compare the grouped results with the previous values. Use the keyword Group by for grouping.
select LAG(<field>[, <offset>[, <default>]]) Group by <field name>
select LAG(cuisine, 3) Group by region
CASE WHEN statements provide great flexibility when bucketing results or when you need to filter out certain results. Another way to think of it: it is conditional logic, similar to IF-THEN statements in other programming languages.
When using a CASE WHEN statement, it's important to remember you need a condition, what to do when that condition is met, and an END clause. A simple example is below:
CASE
WHEN condition
THEN result
ELSE other_result
END
For example,
SELECT
CASE
WHEN country = 'USA'
THEN 'North America'
WHEN country = 'Mexico'
THEN 'North America'
ELSE country
END AS Region
Returns the value of the field for the specified percentile rank.
PERCENTILE(<field>, <percentile>)
select percentile(sent,75)
Pivots row values for a field to columns
TRANSPOSE(<field to transpose>, <current column name>)
select transpose(Message Type,Sent)
To collapse based on a key field, use:
TRANSPOSE(<field to transpose>, <current column name>, <key column name>)
select transpose(Message Type, Sent, Customer)
REVERSE TRANSPOSE
Opposite of Transpose, folds columns into rows
Syntax:
REVERSE_TRANSPOSE(<New ID column>, <New Value column>, <Value column 1>, ..., <Value column N>)
Example: http://recordit.co/8HRx7aJtmB
For example, starting with initial data containing 5 columns, executing select reverse_transpose(NEW_ID, NEW_V, V1, V2, V3) folds the value columns into two new columns: NEW_ID holds the old value column names (V1, V2, V3, one for each new row) and NEW_V contains the corresponding value.
When you have multiple columns that you want to keep fixed while folding in the rest, specify the "pin" columns at the front, followed by the new ID and value columns, followed by a *.
Syntax:
REVERSE_TRANSPOSE(Customer, Campaign, State, <New ID column>, <New Value column>, *)
This will fold in all the columns except for the columns in the first section.
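A hypothetical sketch of the pinned form (Metric and Value are made-up names for the new ID and value columns; every column other than Customer, Campaign, and State is folded in by the *):

```sql
select reverse_transpose(Customer, Campaign, State, Metric, Value, *)
```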
Injects last value records in for a date range when the values are not present for that date.
For example, if a sensor emits data point 100 on 01/01/2016 and the next change of value is to 200 ten days later, you can use the inject function to inject 100 into all dates in between that range.
INJECT(<Date Field>, <Start Date for Injecting>, <End Date for Injecting>, <Injection Frequency> [, <Select Fields>])
[group by <Dimension 1>[, ..., <Dimension N>]]
The optional <Select Field> can either be * (for all fields) or a comma separated list of selected fields from input data.
select inject(date, start_range_date, end_range_date, 1d, Name, Division, Score)
group by Name, Division
Combines the results of two queries into the same dataset
(select sum(sent) as Facebook where customer=Facebook) append (select sum(sent) as LinkedIn Sent where customer=Linkedin)
Use the dot notation to query nested elements and the array notation for selecting items in an array. The example below uses a JSON nested string and uses Cloud9QL to parse it.
To unwind/expand an array, use the expand syntax.
Example:
select customer, nestedObj.secondLevel.y as Nested;
select expand(Nested);
Note that expand must be specified on its own, without any other elements within the select.
Multiple statements can be chained one after the other using a semi-colon delimiter, where the results of the first statement is passed in to the second and so on.
Example:
select sum(sent) as Sent, sum(opened) as opened, customer, month(date) as Month group by customer, month(date);
select (opened/sent)*100 as Open Rate, Opened, Sent, Customer, Month;
select round(Open Rate,1) as Open Rate, Opened, Sent, Customer, Month where Month > $c9_thisyear+2m and Open Rate > 20 order by Open Rate asc
This:
a) gets the Total Sent and Opens on a monthly basis for each customer,
b) then calculates the open rate based on the data from previous step since March,
c) adds additional criteria along with rounding the open rate.
IP to GEO function enables city level geo location from IP addresses.
Note: This uses GeoLite2 data created by MaxMind.
Example:
select ip_to_geo(IP), Some Field
Retrieves lat/longs from addresses using geocod.io.
Note: This requires your own API key from geocod.io.
Example:
select geocode(<fullAddress>,<apiKey>)
Cloud9QL supports the ability to forecast/predict future data points based on previous data history for any dataset.
To predict 1 point in the future:
select PREDICT(<field to predict>, <date field>, <start date for prediction>[, <prediction model>])
To predict multiple data points in the future:
select PREDICT(<Field to Predict>, <Date Field>, <Start Date for Prediction>,
<Prediction Frequency>, <Prediction Data Points>[, <prediction model>])
To predict multiple data points in the future based on secondary dimension(s) (i.e., grouping(s)):
select PREDICT(<Field to Predict>, <Date Field>, <Start Date for Prediction>,
<Prediction Frequency>, <Prediction Data Points>[, <prediction model>])
group by <Dimension 1>, ..., <Dimension N>
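For example, a sketch (field names assumed from the sample data) predicting 30 daily data points per customer, starting from today:

```sql
select predict(Sent, Date, $c9_today, 1d, 30)
group by Customer
```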
Cohorts are useful to determine a grouping of data across a date dimension. For example, how many users remain active n months after initial sign up, etc.
We currently support 2 types of input data.
select COHORT(<Date Column>, <Cohort Date Definition>, <Cohort Period Definition>), <Cohort Operation>
group by <Cohort Date>, <Cohort Period>
Note:
Input data needs to be sorted by Date ascending order.
Cohort Period returns a number (ie: the period) or a date. Example:
a. 1m: Cohort Period as number
b. (1m): Cohort Period as Date
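As a hypothetical sketch of the syntax above (signup_date, user_id, and the monthly definitions are assumed names for illustration, not taken from the source):

```sql
select cohort(signup_date, 1m, 1m), count(distinct user_id) as Active Users
group by Cohort Date, Cohort Period
```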
Example 1: If we already have the cohort date populated
Example 2: If we only have transactional events like the following example:
Example 3: Cohorts can be used in combination with transpose to flatten the result based on date
Example 4: A common cohort is retention in percentage format which can be computed as follows:
Unwinds nested field values: This function expands an array, map, or array of maps data structure into rows.
select EXPAND(<Column with Nested Value>)
Example: In this example, Name field's value is an array of map of Last Name and First Name.
Note that this function must be used in isolation, i.e., cannot be used in combination with others. Use query chaining to manipulate the results:
select expand(Name);
select * where First Name like John
Unwinds multiple nested field values: This function can expand multiple arrays, maps, or arrays of maps data structures into rows.
select EXPAND_ARRAYS(<Column with Nested Value 1>, ..., <Column with Nested Value N>)
Example: In this example, there are two nested objects Grade and Address. Grade field's value is an array of map of three fields date, grade, and score. Address field's value is a map of four fields building, coord, street, zipcode.
Note that this function must be used in isolation, i.e., cannot be used in combination with others. Use query chaining to manipulate the results:
select nestedObj1 as Nested1, nestedObj2.secondLevel.y as Nested2;
select expand_arrays(Nested1, Nested2);
Nested fields before unwinding with EXPAND_ARRAYS() function:
To unwind/expand multiple arrays, use the expand_arrays syntax.
Example:
select nestedObj1 as Nested1, nestedObj2.secondLevel.y as Nested2;
select expand_arrays(Nested1, Nested2);
Nested fields after unwinding with EXPAND_ARRAYS() function:
Note that expand_arrays must be specified on its own, without any other elements within the select.
EXPAND_ARRAYS_WITH_DEFAULTS
A more powerful version of EXPAND_ARRAYS. This function can expand multiple arrays, maps, or arrays of maps data structures into rows. It also allows you to fill in blank fields of expanded arrays with a default value like nulls or a chosen value if the arrays are different in size.
EXPAND_ARRAYS_WITH_DEFAULTS(<field 1>, 0, <field 2>, now(), <field 3>, LAST, <field 4>, <value from another field on the same row>, ..., <field N>, NULL)
Example: In this example, there are two nested objects Grade and Address. Grade field's value is an array of map of three fields date, grade, and score. Address field's value is a map of four fields building, coord, street, zipcode.
When you unwind grade and coord with EXPAND_ARRAYS(), you find that grade has more rows than coord.
With this function, you can choose a default value to fill in the blank spaces. This value will follow that object in the function. Below, we chose to fill in the blank spaces for the coord field with 0.
Note that a default value is required for all fields. If there is no particular default value you wish to add, simply enter null after the field as seen below for the field grade.
select grade, address.coord as coord;
select expand_arrays_with_defaults(grade, null, coord, 0);
The Result:
To unwind/expand multiple arrays and fill in values in the function, use the expand_arrays_with_defaults syntax.
Example:
select nestedObj1 as Nested1, nestedObj2.secondLevel.y as Nested2;
select expand_arrays_with_defaults(Nested1, null, Nested2, 0);
Note that expand_arrays_with_defaults must be specified on its own, without any other elements within the select.
If your data is a JSON string, the PARSE function can be used to convert it to an object, which can then be further manipulated and processed.
select PARSE(<JSON String Column>)
Example: In this example, Name field's value is an array of map of Last Name and First Name.
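A minimal sketch, assuming a column named payload holds a JSON string (the payload name and the customer field inside it are hypothetical); a chained statement can then work with the parsed object:

```sql
select parse(payload) as Data;
select Data.customer
```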
If your data needs to be encrypted/obfuscated for storage, you can use the ENCRYPT and DECRYPT functions.
select ENCRYPT(<Column>)
select DECRYPT(<Column>)
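A minimal sketch, assuming a column named Email; chaining DECRYPT over the encrypted output recovers the original value:

```sql
select encrypt(Email) as Encrypted;
select decrypt(Encrypted) as Original
```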