Phd Christine - W11

profileRockstar1
Chapter11b-AdvancedAnalytics-TechnologyandTools-AdvancedSQLandMADlib.pdf

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Advanced Analytics - Technology and Tools

1Module 5: Advanced Analytics - Technology and Tools

1Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Advanced Analytics - Technology and Tools

During this lesson the following topics are covered:

Advanced SQL and MADlib:

• Window functions

• User defined functions and aggregates

• Ordered Aggregates

• MADlib

Advanced SQL and MADlib

2Module 5: Advanced Analytics - Technology and Tools

This lesson covers advanced SQL and MADlib functions

2Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Window Functions

• About Window Functions

 Returns a value per row, unlike aggregate functions

 Has its results interpreted in terms of the current row and its corresponding window partition or frame

 Is characterized by the use of the OVER clause

 Defines the window partitions, or groups of rows to apply the function

 Defines ordering of data within a window

 Defines the positional or logical framing of a row with respect to its window

3Module 5: Advanced Analytics - Technology and Tools

A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. But unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result.

Window functions are a new class of functions introduced in Greenplum.

Window functions allow application developers to more easily compose complex OLAP queries using standard SQL commands. For example:

• Moving averages or sums can be calculated over various intervals.

• Aggregations and ranks can be reset as selected column values change.

• Complex ratios can be expressed in simple terms.

Window functions can only be used in the SELECT list, between the SELECT and FROM keywords of a query.

<Continued>

3Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Defining Window Specifications (OVER Clause)

When defining the window function:

 Include an OVER() clause

 Specify the window of data to which the function applies

• Define:  Window partitions, using the PARTITION BY clause

 Ordering within a window partition, using the ORDER BY clause

 Framing within a window partition, using ROWS and RANGE clauses

 The ORDER BY clause also defines a frame of unbounded preceding to current in the partition

5Module 5: Advanced Analytics - Technology and Tools

All window functions must have an OVER() clause. The window function specifies the window of data to which the function applies

It defines:

• Window partitions using the PARTITION BY clause.

• Ordering within a window partition using the ORDER BY clause.

• Framing within a window partition (ROWS/RANGE clauses).

• The ORDER BY clause also defines a frame of unbounded preceding to current in the partition.

5Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

About the PARTITION BY Clause

The PARTITION BY clause:

• Can be used by all window functions

• Organizes result sets into groupings based on unique values

• Allows the function to be applied to each partition independently

A fly and a flea in a flue Were imprisoned, so what could they do

Said the fly, let us flee. Let us fly said the flee So they flew through a flaw in the flue. A canner exceedingly canny One morning remarked to his granny A canner can can anything that he can But a canner can’t can a can can he?

Note: If the PARTITION BY clause is omitted, the entire result set is treated as a single window partition.

6Module 5: Advanced Analytics - Technology and Tools

About the PARTITION BY Clause

The PARTITION BY clause:

• Can be used by all window functions. However, it is not a required clause. Windows that do not use the PARTITION BY clause present the entire result set as a single window partition.

• Organizes the result set into groupings based on the unique values of the specified expression or column.

• Allows the function to be applied to each partition independently.

6Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Window Partition Example

SELECT * ,

row_number()

OVER()

FROM sale

ORDER BY cn;

row_number | cn | vn | pn | dt | qty | prc

------------+----+----+-----+------------+------+----

1 | 1 | 10 | 200 | 1401-03-01 | 1 | 0

2 | 1 | 30 | 300 | 1401-05-02 | 1 | 0

3 | 1 | 50 | 400 | 1401-06-01 | 1 | 0

4 | 1 | 30 | 500 | 1401-06-01 | 12 | 5

5 | 1 | 20 | 100 | 1401-05-01 | 1 | 0

6 | 2 | 50 | 400 | 1401-06-01 | 1 | 0

7 | 2 | 40 | 100 | 1401-01-01 | 1100 | 2400

8 | 3 | 40 | 200 | 1401-04-01 | 1 | 0

(8 rows)

SELECT * ,

row_number()

OVER(PARTITION

BY cn)

FROM sale

ORDER BY cn;

row_number | cn | vn | pn | dt | qty | prc

------------+----+----+-----+------------+------+----

1 | 1 | 10 | 200 | 1401-03-01 | 1 | 0

2 | 1 | 30 | 300 | 1401-05-02 | 1 | 0

3 | 1 | 50 | 400 | 1401-06-01 | 1 | 0

4 | 1 | 30 | 500 | 1401-06-01 | 12 | 5

5 | 1 | 20 | 100 | 1401-05-01 | 1 | 0

1 | 2 | 50 | 400 | 1401-06-01 | 1 | 0

2 | 2 | 40 | 100 | 1401-01-01 | 1100 | 2400

1 | 3 | 40 | 200 | 1401-04-01 | 1 | 0

(8 rows)

7Module 5: Advanced Analytics - Technology and Tools

Window Partition Example

The example on the slide uses the row_number window function. This function returns a row number for each unique row in the result set.

In the first example, the OVER clause does not have a PARTITION BY. The entire result set is treated as one window partition.

In the second example, the window is partitioned by the customer number. Note that the result of row number is calculated within each window partition.

7Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

RANK and ORDER BY

The ORDER BY clause:

• Can always be used by window functions

• Is required by some window functions such as RANK

• Specifies ordering within a window partition

The RANK built-in function:

• Calculates the rank of a row

• Gives rows with equal values for the specified criteria the same rank

8Module 5: Advanced Analytics - Technology and Tools

The ORDER BY clause is used to order the resulting data set based on an expression or column. It is always allowed in windows functions and is required by some window functions, including RANK. The ORDER BY clause specifies ordering within a window partition.

The RANK function is a built-in function that calculates the rank of a row in an ordered group of values. Rows with equal values for the ranking criteria receive the same rank. The number of tied rows are added to the rank number to calculate the next rank value. In this case, ranks may not be consecutive numbers.

8Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Using the OVER (ORDER BY…) Clause

SELECT vn, sum(prc*qty)

FROM sale

GROUP BY vn

ORDER BY 2 DESC;

vn | sum

----+---------

40 | 2640002

30 | 180

50 | 0

20 | 0

10 | 0

(5 rows)

SELECT vn, sum(prc*qty), rank()

OVER (ORDER BY sum(prc*qty)

DESC)

FROM sale

GROUP BY vn

ORDER BY 2 DESC;

vn | sum | rank

----+---------+------

40 | 2640002 | 1

30 | 180 | 2

50 | 0 | 3

20 | 0 | 3

10 | 0 | 3

(5 rows)

9Module 5: Advanced Analytics - Technology and Tools

Using the OVER (ORDER BY…) Clause

The slide shows an example of two queries that rank vendors by sales totals.

The first query shows a window function grouped on the vendor column, vn.

The second query uses the RANK function to output a ranking number for each row. Note that the PARTITION BY clause is not used in this query. The entire result is one window partition. Also, do not confuse ORDER BY of a window specification with the ORDER BY of a query.

9Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Designating a Sliding (Moving) Window

A moving window:

• Defines a set or rows in a window partition

• Allows you to define the first row and last row

• Uses the current row as the reference point • Can be expressed in rows with the ROWS clause

• Can be expressed as a range with the RANGE clause

10Module 5: Advanced Analytics - Technology and Tools

A moving or rolling window defines a set of rows within a window partition. When you define a window frame, the window function is computed with respect to the contents of this moving frame, rather than against the fixed content of the entire window partition. Window frames can be row-based, represented by the ROWS clause, or value based, represented by a RANGE.

When the window frame is row-based, you define the number of rows offset from the current row. If the window frame is range-based, you define the bounds of the window frame in terms of data values offset from the value in the current row.

If you specify only a starting row for the window, the current row is used as the last row in the window.

The window frame can be defined as:

• UNBOUNDED or expression PRECEDING – This clause defines the first row of the window using the current row as a reference point. The starting row is expressed in terms of the number of rows preceding the current row. If you define a ROWS window frame as 5 PRECEDING, the window frame starts at the fifth row preceding the current row. If the definition is for a RANGE window frame, the window starts with the first row whose ordering column value precedes that of the current row by 5. If the term UNBOUNDED is used, the first row of the partition acts as the first row of the window.

10Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Window Framing Example

A rolling window moves through a partition of data, one row at a time.

12Module 5: Advanced Analytics - Technology and Tools

SELECT vn, dt, prc * qty

ma = AVG(prc*qty)

OVER (PARTITION BY vn

ORDER BY dt

ROWS BETWEEN

2 PRECEDING AND

2 FOLLOWING)

FROM sale;

Window Framing Example

While window framing clauses require an ORDER BY clause, not all window functions allow framing.

The ROWS and RANGE clauses specify a positional or logical rolling window that moves through a window partition of data.

In the example shown on the slide, the rolling frame applies to its partition, in this case, vendor, and ordering within that partition, date.

The example shows positional framing using the ROWS BETWEEN clause where the result is interpreted with respect to the CURRENT ROW position in the partition. The focus of the window frame moves from row to row within its partition only.

12Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved. 13Module 5: Advanced Analytics - Technology and Tools

Window Framing Example (Continued)

Window Framing Example (Continued)

The focus of the frame moves from the first selectable row in the window partition using the criteria, 2 preceding and 2 following, for the rolling window.

13Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

General Syntax of Window Function

A moving window:

• Is defined as part of a window with the ORDER BY clause as follows:

WINDOW window_name AS (window_specification)

where window_specification can be: [window_name]

[PARTITION BY expression [, ...]]

[ORDER BY expression [ASC | DESC | USING operator] [, ...]

[{RANGE | ROWS}

{ UNBOUNDED PRECEDING

| expression PRECEDING

| CURRENT ROW

| BETWEEN window_frame_bound AND window_frame_bound }]]

14Module 5: Advanced Analytics - Technology and Tools

Syntax of a moving window is specified here.

14Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

A fly and a flea in a flue Were imprisoned, so what could they do

Said the fly, let us flee. Let us fly said the flee So they flew through a flaw in the flue. A canner exceedingly canny One morning remarked to his granny A canner can can anything that he can But a canner can’t can a can can he?

Note: Any aggregate function used with the OVER clause can also be used as a window function.

Built-In Window Functions Built-In Function Description

dist() Calculates the cumulative distribution of a value in a group of values. Rows with equal values always evaluate to the same cumulative distribution value.

dense_rank() Computes the rank of a row in an ordered group of rows without skipping rank values. Rows with equal values are given the same rank value.

first_value(expr) Returns the first value in an ordered set of values.

lag(expr [,offset]

[,default])

Provides access to more than one row of the same table without doing a self join. Given a series of rows returned from a query and a position of the cursor, LAG provides access to a row at a given physical offset prior to that position. If offset is not specified, the default offset is 1. default sets the value that is returned if the offset goes beyond the scope of the window. If default is not specified, the default value is null.

15Module 5: Advanced Analytics - Technology and Tools

Built-In Window Functions

The slide shows built-in window functions supported within Greenplum. These built-in functions require an OVER clause.

For more detailed information on the functions, refer to the Greenplum Database Administrator Guide.

15Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Built-In Window Functions (Continued)

Built-In Function Description

last_value(expr) Returns the last value in an ordered set of values.

lead() Provides access to more than one row of the same table without doing a self join. Given a series of rows returned from a query and a position of the cursor, LEAD provides access to a row at a given physical offset after that position. If offset is not specified, the default offset is 1. default sets the value that is returned if the offset goes beyond the scope of the window. If default is not specified, the default value is null.

ntile(expr) Divides an ordered dataset into a number of buckets (as defined by expr) and assigns a bucket number to each row.

percent_rank() Calculates the rank of a hypothetical row R minus 1, divided by 1 less than the number of rows being evaluated (within a window partition).

row_number() Assigns a unique number to each row to which it is applied (either each row in a window partition or each row of the query).

16Module 5: Advanced Analytics - Technology and Tools

Built-In Window Functions (Continued)

16Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Check Your Knowledge

• Describe how this code will work:  SELECT dt, region, revenue,

count(*) OVER (twdw) AS moving_count, avg(revenue) OVER (twdw) AS moving_average

FROM moving_average_data mad WINDOW twdw AS (PARTITION BY region

ORDER BY dt RANGE BETWEEN '7 days'::interval PRECEDING AND '0 days'::interval FOLLOWING)

ORDER BY region, dt

Your Thoughts?

17Module 5: Advanced Analytics - Technology and Tools

Let us understand the code above. Describe below how this code will work:

17Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

User Defined Functions and Aggregates

Greenplum supports several function types, including:

• Query language functions where the functions are written in SQL

• Procedural language functions where the functions are written in:

 PL/pgSQL

 PL/TcL

 Perl

 Python

 R

• Internal functions

• C-language functions

• Use Case examples:  Second largest element in a column?

 Online auction: Who is the second highest bidder?

18Module 5: Advanced Analytics - Technology and Tools

Greenplum supports a variety of methods for developing functions, including:

• Query language support for functions developed in SQL.

• Procedural language support for functions written in languages such as PL/PGSQL, which is a subset of PL/SQL, PL/TcL, Perl, Python, and R, a programming.

• Language for statistical computing and graphics.

• Internal functions.

• C-language functions.

The Data Scientist may need to create a function that could be used in the downstream analysis. Some use case examples are shown in the slide.

Note: Greenplum supports PL/pgSQL, PL/Perl, and PL/Python out of the box. Other languages can be added with the createlang utility.

18Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Anatomy of a User-Defined Function

• Example:  CREATE FUNCTION times2(INT)

RETURNS INT AS $$

SELECT 2 * $1 $$ LANGUAGE sql

 SELECT times2(1); times2

-------- 2

(1 row)

Start function body

Function body

End function body

19Module 5: Advanced Analytics - Technology and Tools

Here we present a simple function that you can create.

Whenever you pass in a parameter, you can identify it as:

• A base or primitive type, such as integer, char, or varchar

In this example, when you call this function you pass the parameter INT. The value is multiplied by 2 and returned as numeric. $1 indicates the first parameter.

Creating, Modifying, and Dropping Functions

Functions that operate on tables must be created in the same schema. If you modify a table, you must have access to a schema. You:

• Create a function with the CREATE FUNCTION command. You must have CREATE access to the schema to create a function. A function can be created with or without parameters.

• Replace an existing function with the CREATE OR REPLACE FUNCTION command. This command either creates a function if one did not exist before, or replaces an existing function. If you are replacing an existing function, you must specify the same number of parameters and the same data types found in the original function. If not, you are actually creating a new function.

• Change a function with the ALTER FUNCTION command. You must own the function before you can modify it. If the function is to be created in another schema, you must have CREATE privilege on that schema.

• Drop or remove a function with the DROP FUNCTION command. Because you can have multiple functions with the same name but different number of parameters and/or parameter types, you must include the appropriate number of parameters and parameter types as part of the command. You must also be the owner of the function to remove the function from the schema.

19Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

User-Defined Aggregates

• Perform a single table scan

• Example: Second largest number  Keep a state: maximum 2 numbers

 New number can displace the smaller one in the state

 Greenplum extension: Merge two states

• Example :Create a sum of cubes aggregate: CREATE FUNCTION scube_accum(numeric, numeric) RETURNS numeric

AS 'select $1 + $2 * $2 * $2'

LANGUAGE SQL

IMMUTABLE

RETURNS NULL ON NULL INPUT;

CREATE AGGREGATE scube(numeric) (

SFUNC = scube_accum,

STYPE = numeric,

INITCOND = 0 );

20Module 5: Advanced Analytics - Technology and Tools

User defined aggregates performs a single table scan and it keeps state. A state is a maximum of two numbers.

In the example we create a user defined aggregate that returns a maximum of two numbers. We will learn more about it in the Lab.

CREATE AGGREGATE defines a new aggregate function. Some basic and commonly-used aggregate functions such as count, min, max, sum, avg and so on are already provided in the Greenplum Database.

If one defines new types or needs an aggregate function not already provided, then CREATE AGGREGATE can be used to provide the desired features.

An aggregate function is made from one, two or three ordinary functions (all of which must be IMMUTABLE functions): a state transition function sfunc, an optional preliminary segment-level calculation function prefunc, and an optional final calculation function ffunc. These are used as follows:

sfunc( internal-state, next-data-values ) ---> next-internal-state

prefunc( internal-state, internal-state ) ---> next-internal-state

ffunc( internal-state ) ---> aggregate-value

In the example shown above we only have the sfunc.

To test this aggregate you can try the following code:

CREATE TABLE x(a INT);

INSERT INTO x VALUES (1),(2),(3);

SELECT scube(a) FROM x;

Correct answer for reference:

SELECT sum(a*a*a) FROM x;

20Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Ordered Aggregates

• Output of aggregates may depend on order

 Example: SELECT array_agg(letter) FROM alphabet

 SQL does not guarantee a particular order

 Output could be {a,b,c} or {b,c,d} or ... depending on query optimizer, distribution of data, ...

• Sample Use Case:  Maximum value of discrete derivative? For example:

Largest single-day stock increase during last year?

• Greenplum 4.1 introduces ordered aggregates:

 SELECT array_agg(column ORDER BY expression [ASC|DESC]) FROM table

• Median can be implemented using an ordered call of array_agg()

 This will be covered in the Lab

21Module 5: Advanced Analytics - Technology and Tools

Support has been added for ordered aggregate functions in Greenplum, providing a method for controlling the order in which values are fed to an aggregate function.

In a Greenplum Database, only aggregate functions defined as ORDERED can be called with an ORDER BY clause. This can be followed by other arguments to specify a system-defined ordering.

The three built-in ordered aggregates and optional ORDER BY clauses that have been implemented in 4.1, are shown in the following table :

Aggregate Function Description

array_agg(any element) Concatenates any element into an array. Example: SELECT array_agg(anyelement ORDER BY anyelement) FROM table;

string_agg(text) Concatenates text into a string. Example: SELECT string_agg(text ORDER BY text) FROM table;

string_agg(text, delimiter) Concatenates text into a string delimited by delimiter. Example: SELECT string_agg(text, ',' ORDER BY text) FROM table;

The columns in an ORDER BY clause are not necessarily the same as the aggregated column, as shown in the following statement that references a table named product with columns store_id, product_name, and quantity.

SELECT store_id, array_agg(product_name ORDER BY quantity desc) FROM product GROUP BY store_id;

Note: There can only be one aggregated column. Multiple columns can be specified in the ORDER BY clause.

21Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

MADlib: Definition

• MAD stands for:

• lib stands for library of: • advanced (mathematical, statistical, machine learning)

• parallel & scalable

• in-database functions

• Mission: to foster widespread development of scalable analytic skills, by harnessing efforts from commercial practice,

academic research, and open-source development.

22Module 5: Advanced Analytics - Technology and Tools

MADlib was first reported at VLDB 2009 in which MAD Skills: New Analysis Practices for Big Data was presented

• http://db.cs.berkeley.edu/papers/vldb09-madskills.pdf

MADlib: Getting started

1) Visit http://MADlib.net

2) Download the latest release:

3) Click the MADlib Wiki link and…

4) Follow the installation guide for PostgreSQL or Greenplum

We have installed MADlib for you in the lab environment and you used MADlib functions in Module 4 labs.

Many functions are implemented as User defined functions and User defined aggregates and some examples are as follows:

• testdb=# \da madlib.mreg* • List of aggregate functions • Schema | Name | Argument data types | • -----------+---------------------+--------------------------------------------------+----- • madlib | mregr_coef | double precision, double precision[] | • madlib | mregr_pvalues | double precision, double precision[] | • madlib | mregr_r2 | double precision, double precision[] | • madlib | mregr_tstats | double precision, double precision[] |

22Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

MADlib: Getting Help…

• Check out the user guide with examples at: http://doc.madlib.net

• Need more help?

Try: http://groups.google.com/group/madlib-user-forum

23Module 5: Advanced Analytics - Technology and Tools

The slide provides additional details on obtaining help with MADlib functions.

23Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Greenplum In-database Analytical Functions

Descriptive Statistics Modeling

Quantile Association Rule Mining

Profile K-Means Clustering

CountMin (Cormode-Muthukrishnan) Sketch- based Estimator

Naïve Bayes Classification

FM (Flajolet-Martin) Sketch-based Estimator Linear Regression

MFV (Most Frequent Values) Sketch-based Estimator

Logistic Regression

Frequency Support Vector Machines

Histogram SVD Matrix Factorization

Bar Chart Decision Trees/CART

Box Plot Chart Neural Networks

Correlation Matrix Parallel Latent Dirichlet Allocation

Module 5: Advanced Analytics - Technology and Tools 24

Listed are the in-database analytic functions available natively in Greenplum and as Madlib functions (MADlib functions in bold). This list keeps expanding with every update and as the user community contributes to the MADlib.

24Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Advanced Analytics - Technology and Tools

During this lesson the following advanced functions were covered:

• Window functions

• User defined functions and aggregates

• Ordered Aggregates

• MADlib

Summary

25Module 5: Advanced Analytics - Technology and Tools

Advanced SQL and MADlib functions were covered in this lesson.

25Module 5: Advanced Analytics - Technology and Tools

Copyright © 2014 EMC Corporation. All rights reserved.

Copyright © 2014 EMC Corporation. All Rights Reserved.

Advanced Analytics - Technology and Tools Summary

The Key Points covered in this module were:

• MapReduce , Hadoop and Hadoop ecosystems

• In-database analytics with advanced SQL functions and MADlib

26Module 5: Advanced Analytics - Technology and Tools

These are the key topics covered in this module.

26Module 5: Advanced Analytics - Technology and Tools