Phd Christine - W11
Rockstar1
Copyright © 2014 EMC Corporation. All rights reserved.
Copyright © 2014 EMC Corporation. All Rights Reserved.
Advanced Analytics - Technology and Tools
1Module 5: Advanced Analytics - Technology and Tools
1Module 5: Advanced Analytics - Technology and Tools
Copyright © 2014 EMC Corporation. All rights reserved.
Copyright © 2014 EMC Corporation. All Rights Reserved.
Advanced Analytics - Technology and Tools
During this lesson the following topics are covered:
Advanced SQL and MADlib:
• Window functions
• User defined functions and aggregates
• Ordered Aggregates
• MADlib
Advanced SQL and MADlib
2Module 5: Advanced Analytics - Technology and Tools
This lesson covers advanced SQL and MADlib functions
2Module 5: Advanced Analytics - Technology and Tools
Copyright © 2014 EMC Corporation. All rights reserved.
Copyright © 2014 EMC Corporation. All Rights Reserved.
Window Functions
• About Window Functions
Returns a value per row, unlike aggregate functions
Has its results interpreted in terms of the current row and its corresponding window partition or frame
Is characterized by the use of the OVER clause
Defines the window partitions, or groups of rows to apply the function
Defines ordering of data within a window
Defines the positional or logical framing of a row with respect to its window
3Module 5: Advanced Analytics - Technology and Tools
A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. But unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result.
Window functions are a new class of functions introduced in Greenplum.
Window functions allow application developers to more easily compose complex OLAP queries using standard SQL commands. For example:
• Moving averages or sums can be calculated over various intervals.
• Aggregations and ranks can be reset as selected column values change.
• Complex ratios can be expressed in simple terms.
Window functions can only be used in the SELECT list, between the SELECT and FROM keywords of a query.
<Continued>
3Module 5: Advanced Analytics - Technology and Tools
Copyright © 2014 EMC Corporation. All rights reserved.
Copyright © 2014 EMC Corporation. All Rights Reserved.
Defining Window Specifications (OVER Clause)
When defining the window function:
Include an OVER() clause
Specify the window of data to which the function applies
• Define: Window partitions, using the PARTITION BY clause
Ordering within a window partition, using the ORDER BY clause
Framing within a window partition, using ROWS and RANGE clauses
The ORDER BY clause also defines a frame of unbounded preceding to current in the partition
5Module 5: Advanced Analytics - Technology and Tools
All window functions must have an OVER() clause. The window function specifies the window of data to which the function applies
It defines:
• Window partitions using the PARTITION BY clause.
• Ordering within a window partition using the ORDER BY clause.
• Framing within a window partition (ROWS/RANGE clauses).
• The ORDER BY clause also defines a frame of unbounded preceding to current in the partition.
5Module 5: Advanced Analytics - Technology and Tools
Copyright © 2014 EMC Corporation. All rights reserved.
Copyright © 2014 EMC Corporation. All Rights Reserved.
About the PARTITION BY Clause
The PARTITION BY clause:
• Can be used by all window functions
• Organizes result sets into groupings based on unique values
• Allows the function to be applied to each partition independently
A fly and a flea in a flue Were imprisoned, so what could they do
Said the fly, let us flee. Let us fly said the flee So they flew through a flaw in the flue. A canner exceedingly canny One morning remarked to his granny A canner can can anything that he can But a canner can’t can a can can he?
Note: If the PARTITION BY clause is omitted, the entire result set is treated as a single window partition.
6Module 5: Advanced Analytics - Technology and Tools
About the PARTITION BY Clause
The PARTITION BY clause:
• Can be used by all window functions. However, it is not a required clause. Windows that do not use the PARTITION BY clause present the entire result set as a single window partition.
• Organizes the result set into groupings based on the unique values of the specified expression or column.
• Allows the function to be applied to each partition independently.
6Module 5: Advanced Analytics - Technology and Tools
Copyright © 2014 EMC Corporation. All rights reserved.
Copyright © 2014 EMC Corporation. All Rights Reserved.
Window Partition Example
SELECT * ,
row_number()
OVER()
FROM sale
ORDER BY cn;
row_number | cn | vn | pn | dt | qty | prc
------------+----+----+-----+------------+------+----
1 | 1 | 10 | 200 | 1401-03-01 | 1 | 0
2 | 1 | 30 | 300 | 1401-05-02 | 1 | 0
3 | 1 | 50 | 400 | 1401-06-01 | 1 | 0
4 | 1 | 30 | 500 | 1401-06-01 | 12 | 5
5 | 1 | 20 | 100 | 1401-05-01 | 1 | 0
6 | 2 | 50 | 400 | 1401-06-01 | 1 | 0
7 | 2 | 40 | 100 | 1401-01-01 | 1100 | 2400
8 | 3 | 40 | 200 | 1401-04-01 | 1 | 0
(8 rows)
SELECT * ,
row_number()
OVER(PARTITION
BY cn)
FROM sale
ORDER BY cn;
row_number | cn | vn | pn | dt | qty | prc
------------+----+----+-----+------------+------+----
1 | 1 | 10 | 200 | 1401-03-01 | 1 | 0
2 | 1 | 30 | 300 | 1401-05-02 | 1 | 0
3 | 1 | 50 | 400 | 1401-06-01 | 1 | 0
4 | 1 | 30 | 500 | 1401-06-01 | 12 | 5
5 | 1 | 20 | 100 | 1401-05-01 | 1 | 0
1 | 2 | 50 | 400 | 1401-06-01 | 1 | 0
2 | 2 | 40 | 100 | 1401-01-01 | 1100 | 2400
1 | 3 | 40 | 200 | 1401-04-01 | 1 | 0
(8 rows)
7Module 5: Advanced Analytics - Technology and Tools
Window Partition Example
The example on the slide uses the row_number window function. This function returns a row number for each unique row in the result set.
In the first example, the OVER clause does not have a PARTITION BY. The entire result set is treated as one window partition.
In the second example, the window is partitioned by the customer number. Note that the result of row number is calculated within each window partition.
7Module 5: Advanced Analytics - Technology and Tools
Copyright © 2014 EMC Corporation. All rights reserved.
Copyright © 2014 EMC Corporation. All Rights Reserved.
RANK and ORDER BY
The ORDER BY clause:
• Can always be used by window functions
• Is required by some window functions such as RANK
• Specifies ordering within a window partition
The RANK built-in function:
• Calculates the rank of a row
• Gives rows with equal values for the specified criteria the same rank
8Module 5: Advanced Analytics - Technology and Tools
The ORDER BY clause is used to order the resulting data set based on an expression or column. It is always allowed in windows functions and is required by some window functions, including RANK. The ORDER BY clause specifies ordering within a window partition.
The RANK function is a built-in function that calculates the rank of a row in an ordered group of values. Rows with equal values for the ranking criteria receive the same rank. The number of tied rows are added to the rank number to calculate the next rank value. In this case, ranks may not be consecutive numbers.
8Module 5: Advanced Analytics - Technology and Tools
Copyright © 2014 EMC Corporation. All rights reserved.
Copyright © 2014 EMC Corporation. All Rights Reserved.
Using the OVER (ORDER BY…) Clause
SELECT vn, sum(prc*qty)
FROM sale
GROUP BY vn
ORDER BY 2 DESC;
vn | sum
----+---------
40 | 2640002
30 | 180
50 | 0
20 | 0
10 | 0
(5 rows)
SELECT vn, sum(prc*qty), rank()
OVER (ORDER BY sum(prc*qty)
DESC)
FROM sale
GROUP BY vn
ORDER BY 2 DESC;
vn | sum | rank
----+---------+------
40 | 2640002 | 1
30 | 180 | 2
50 | 0 | 3
20 | 0 | 3
10 | 0 | 3
(5 rows)
9Module 5: Advanced Analytics - Technology and Tools
Using the OVER (ORDER BY…) Clause
The slide shows an example of two queries that rank vendors by sales totals.
The first query shows a window function grouped on the vendor column, vn.
The second query uses the RANK function to output a ranking number for each row. Note that the PARTITION BY clause is not used in this query. The entire result is one window partition. Also, do not confuse ORDER BY of a window specification with the ORDER BY of a query.
9Module 5: Advanced Analytics - Technology and Tools
Copyright © 2014 EMC Corporation. All rights reserved.
Copyright © 2014 EMC Corporation. All Rights Reserved.
Designating a Sliding (Moving) Window
A moving window:
• Defines a set or rows in a window partition
• Allows you to define the first row and last row
• Uses the current row as the reference point • Can be expressed in rows with the ROWS clause
• Can be expressed as a range with the RANGE clause
10Module 5: Advanced Analytics - Technology and Tools
A moving or rolling window defines a set of rows within a window partition. When you define a window frame, the window function is computed with respect to the contents of this moving frame, rather than against the fixed content of the entire window partition. Window frames can be row-based, represented by the ROWS clause, or value based, represented by a RANGE.
When the window frame is row-based, you define the number of rows offset from the current row. If the window frame is range-based, you define the bounds of the window frame in terms of data values offset from the value in the current row.
If you specify only a starting row for the window, the current row is used as the last row in the window.
The window frame can be defined as:
• UNBOUNDED or expression PRECEDING – This clause defines the first row of the window using the current row as a reference point. The starting row is expressed in terms of the number of rows preceding the current row. If you define a ROWS window frame as 5 PRECEDING, the window frame starts at the fifth row preceding the current row. If the definition is for a RANGE window frame, the window starts with the first row whose ordering column value precedes that of the current row by 5. If the term UNBOUNDED is used, the first row of the partition acts as the first row of the window.
10Module 5: Advanced Analytics - Technology and Tools
Copyright © 2014 EMC Corporation. All rights reserved.
Copyright © 2014 EMC Corporation. All Rights Reserved.
Window Framing Example
A rolling window moves through a partition of data, one row at a time.
12Module 5: Advanced Analytics - Technology and Tools
SELECT vn, dt, prc * qty
ma = AVG(prc*qty)
OVER (PARTITION BY vn
ORDER BY dt
ROWS BETWEEN
2 PRECEDING AND
2 FOLLOWING)
FROM sale;
Window Framing Example
While window framing clauses require an ORDER BY clause, not all window functions allow framing.
The ROWS and RANGE clauses specify a positional or logical rolling window that moves through a window partition of data.
In the example shown on the slide, the rolling frame applies to its partition, in this case, vendor, and ordering within that partition, date.
The example shows positional framing using the ROWS BETWEEN clause where the result is interpreted with respect to the CURRENT ROW position in the partition. The focus of the window frame moves from row to row within its partition only.
12Module 5: Advanced Analytics - Technology and Tools
Copyright © 2014 EMC Corporation. All rights reserved.
Copyright © 2014 EMC Corporation. All Rights Reserved. 13Module 5: Advanced Analytics - Technology and Tools
Window Framing Example (Continued)
Window Framing Example (Continued)
The focus of the frame moves from the first selectable row in the window partition using the criteria, 2 preceding and 2 following, for the rolling window.
13Module 5: Advanced Analytics - Technology and Tools
Copyright © 2014 EMC Corporation. All rights reserved.
Copyright © 2014 EMC Corporation. All Rights Reserved.
General Syntax of Window Function
A moving window:
• Is defined as part of a window with the ORDER BY clause as follows:
WINDOW window_name AS (window_specification)
where window_specification can be: [window_name]
[PARTITION BY expression [, ...]]
[ORDER BY expression [ASC | DESC | USING operator] [, ...]
[{RANGE | ROWS}
{ UNBOUNDED PRECEDING
| expression PRECEDING
| CURRENT ROW
| BETWEEN window_frame_bound AND window_frame_bound }]]
14Module 5: Advanced Analytics - Technology and Tools
Syntax of a moving window is specified here.
14Module 5: Advanced Analytics - Technology and Tools
Copyright © 2014 EMC Corporation. All rights reserved.
Copyright © 2014 EMC Corporation. All Rights Reserved.
A fly and a flea in a flue Were imprisoned, so what could they do
Said the fly, let us flee. Let us fly said the flee So they flew through a flaw in the flue. A canner exceedingly canny One morning remarked to his granny A canner can can anything that he can But a canner can’t can a can can he?
Note: Any aggregate function used with the OVER clause can also be used as a window function.
Built-In Window Functions Built-In Function Description
dist() Calculates the cumulative distribution of a value in a group of values. Rows with equal values always evaluate to the same cumulative distribution value.
dense_rank() Computes the rank of a row in an ordered group of rows without skipping rank values. Rows with equal values are given the same rank value.
first_value(expr) Returns the first value in an ordered set of values.
lag(expr [,offset]
[,default])
Provides access to more than one row of the same table without doing a self join. Given a series of rows returned from a query and a position of the cursor, LAG provides access to a row at a given physical offset prior to that position. If offset is not specified, the default offset is 1. default sets the value that is returned if the offset goes beyond the scope of the window. If default is not specified, the default value is null.
15Module 5: Advanced Analytics - Technology and Tools
Built-In Window Functions
The slide shows built-in window functions supported within Greenplum. These built-in functions require an OVER clause.
For more detailed information on the functions, refer to the Greenplum Database Administrator Guide.
15Module 5: Advanced Analytics - Technology and Tools
Copyright © 2014 EMC Corporation. All rights reserved.
Copyright © 2014 EMC Corporation. All Rights Reserved.
Built-In Window Functions (Continued)
Built-In Function Description
last_value(expr) Returns the last value in an ordered set of values.
lead() Provides access to more than one row of the same table without doing a self join. Given a series of rows returned from a query and a position of the cursor, LEAD provides access to a row at a given physical offset after that position. If offset is not specified, the default offset is 1. default sets the value that is returned if the offset goes beyond the scope of the window. If default is not specified, the default value is null.
ntile(expr) Divides an ordered dataset into a number of buckets (as defined by expr) and assigns a bucket number to each row.
percent_rank() Calculates the rank of a hypothetical row R minus 1, divided by 1 less than the number of rows being evaluated (within a window partition).
row_number() Assigns a unique number to each row to which it is applied (either each row in a window partition or each row of the query).
16Module 5: Advanced Analytics - Technology and Tools
Built-In Window Functions (Continued)
16Module 5: Advanced Analytics - Technology and Tools
Copyright © 2014 EMC Corporation. All rights reserved.
Copyright © 2014 EMC Corporation. All Rights Reserved.
Check Your Knowledge
• Describe how this code will work: SELECT dt, region, revenue,
count(*) OVER (twdw) AS moving_count, avg(revenue) OVER (twdw) AS moving_average
FROM moving_average_data mad WINDOW twdw AS (PARTITION BY region
ORDER BY dt RANGE BETWEEN '7 days'::interval PRECEDING AND '0 days'::interval FOLLOWING)
ORDER BY region, dt
Your Thoughts?
17Module 5: Advanced Analytics - Technology and Tools
Let us understand the code above. Describe below how this code will work:
17Module 5: Advanced Analytics - Technology and Tools
Copyright © 2014 EMC Corporation. All rights reserved.
Copyright © 2014 EMC Corporation. All Rights Reserved.
User Defined Functions and Aggregates
Greenplum supports several function types, including:
• Query language functions where the functions are written in SQL
• Procedural language functions where the functions are written in:
PL/pgSQL
PL/TcL
Perl
Python
R
• Internal functions
• C-language functions
• Use Case examples: Second largest element in a column?
Online auction: Who is the second highest bidder?
18Module 5: Advanced Analytics - Technology and Tools
Greenplum supports a variety of methods for developing functions, including:
• Query language support for functions developed in SQL.
• Procedural language support for functions written in languages such as PL/PGSQL, which is a subset of PL/SQL, PL/TcL, Perl, Python, and R, a programming.
• Language for statistical computing and graphics.
• Internal functions.
• C-language functions.
The Data Scientist may need to create a function that could be used in the downstream analysis. Some use case examples are shown in the slide.
Note: Greenplum supports PL/pgSQL, PL/Perl, and PL/Python out of the box. Other languages can be added with the createlang utility.
18Module 5: Advanced Analytics - Technology and Tools
Copyright © 2014 EMC Corporation. All rights reserved.
Copyright © 2014 EMC Corporation. All Rights Reserved.
Anatomy of a User-Defined Function
• Example: CREATE FUNCTION times2(INT)
RETURNS INT AS $$
SELECT 2 * $1 $$ LANGUAGE sql
SELECT times2(1); times2
-------- 2
(1 row)
Start function body
Function body
End function body
19Module 5: Advanced Analytics - Technology and Tools
Here we present a simple function that you can create.
Whenever you pass in a parameter, you can identify it as:
• A base or primitive type, such as integer, char, or varchar
In this example, when you call this function you pass the parameter INT. The value is multiplied by 2 and returned as numeric. $1 indicates the first parameter.
Creating, Modifying, and Dropping Functions
Functions that operate on tables must be created in the same schema. If you modify a table, you must have access to a schema. You:
• Create a function with the CREATE FUNCTION command. You must have CREATE access to the schema to create a function. A function can be created with or without parameters.
• Replace an existing function with the CREATE OR REPLACE FUNCTION command. This command either creates a function if one did not exist before, or replaces an existing function. If you are replacing an existing function, you must specify the same number of parameters and the same data types found in the original function. If not, you are actually creating a new function.
• Change a function with the ALTER FUNCTION command. You must own the function before you can modify it. If the function is to be created in another schema, you must have CREATE privilege on that schema.
• Drop or remove a function with the DROP FUNCTION command. Because you can have multiple functions with the same name but different number of parameters and/or parameter types, you must include the appropriate number of parameters and parameter types as part of the command. You must also be the owner of the function to remove the function from the schema.
19Module 5: Advanced Analytics - Technology and Tools
Copyright © 2014 EMC Corporation. All rights reserved.
Copyright © 2014 EMC Corporation. All Rights Reserved.
User-Defined Aggregates
• Perform a single table scan
• Example: Second largest number Keep a state: maximum 2 numbers
New number can displace the smaller one in the state
Greenplum extension: Merge two states
• Example :Create a sum of cubes aggregate: CREATE FUNCTION scube_accum(numeric, numeric) RETURNS numeric
AS 'select $1 + $2 * $2 * $2'
LANGUAGE SQL
IMMUTABLE
RETURNS NULL ON NULL INPUT;
CREATE AGGREGATE scube(numeric) (
SFUNC = scube_accum,
STYPE = numeric,
INITCOND = 0 );
20Module 5: Advanced Analytics - Technology and Tools
User defined aggregates performs a single table scan and it keeps state. A state is a maximum of two numbers.
In the example we create a user defined aggregate that returns a maximum of two numbers. We will learn more about it in the Lab.
CREATE AGGREGATE defines a new aggregate function. Some basic and commonly-used aggregate functions such as count, min, max, sum, avg and so on are already provided in the Greenplum Database.
If one defines new types or needs an aggregate function not already provided, then CREATE AGGREGATE can be used to provide the desired features.
An aggregate function is made from one, two or three ordinary functions (all of which must be IMMUTABLE functions): a state transition function sfunc, an optional preliminary segment-level calculation function prefunc, and an optional final calculation function ffunc. These are used as follows:
sfunc( internal-state, next-data-values ) ---> next-internal-state
prefunc( internal-state, internal-state ) ---> next-internal-state
ffunc( internal-state ) ---> aggregate-value
In the example shown above we only have the sfunc.
To test this aggregate you can try the following code:
CREATE TABLE x(a INT);
INSERT INTO x VALUES (1),(2),(3);
SELECT scube(a) FROM x;
Correct answer for reference:
SELECT sum(a*a*a) FROM x;
20Module 5: Advanced Analytics - Technology and Tools
Copyright © 2014 EMC Corporation. All rights reserved.
Copyright © 2014 EMC Corporation. All Rights Reserved.
Ordered Aggregates
• Output of aggregates may depend on order
Example: SELECT array_agg(letter) FROM alphabet
SQL does not guarantee a particular order
Output could be {a,b,c} or {b,c,d} or ... depending on query optimizer, distribution of data, ...
• Sample Use Case: Maximum value of discrete derivative? For example:
Largest single-day stock increase during last year?
• Greenplum 4.1 introduces ordered aggregates:
SELECT array_agg(column ORDER BY expression [ASC|DESC]) FROM table
• Median can be implemented using an ordered call of array_agg()
This will be covered in the Lab
21Module 5: Advanced Analytics - Technology and Tools
Support has been added for ordered aggregate functions in Greenplum, providing a method for controlling the order in which values are fed to an aggregate function.
In a Greenplum Database, only aggregate functions defined as ORDERED can be called with an ORDER BY clause. This can be followed by other arguments to specify a system-defined ordering.
The three built-in ordered aggregates and optional ORDER BY clauses that have been implemented in 4.1, are shown in the following table :
Aggregate Function Description
array_agg(any element) Concatenates any element into an array. Example: SELECT array_agg(anyelement ORDER BY anyelement) FROM table;
string_agg(text) Concatenates text into a string. Example: SELECT string_agg(text ORDER BY text) FROM table;
string_agg(text, delimiter) Concatenates text into a string delimited by delimiter. Example: SELECT string_agg(text, ',' ORDER BY text) FROM table;
The columns in an ORDER BY clause are not necessarily the same as the aggregated column, as shown in the following statement that references a table named product with columns store_id, product_name, and quantity.
SELECT store_id, array_agg(product_name ORDER BY quantity desc) FROM product GROUP BY store_id;
Note: There can only be one aggregated column. Multiple columns can be specified in the ORDER BY clause.
21Module 5: Advanced Analytics - Technology and Tools
Copyright © 2014 EMC Corporation. All rights reserved.
Copyright © 2014 EMC Corporation. All Rights Reserved.
MADlib: Definition
• MAD stands for:
• lib stands for library of: • advanced (mathematical, statistical, machine learning)
• parallel & scalable
• in-database functions
• Mission: to foster widespread development of scalable analytic skills, by harnessing efforts from commercial practice,
academic research, and open-source development.
22Module 5: Advanced Analytics - Technology and Tools
MADlib was first reported at VLDB 2009 in which MAD Skills: New Analysis Practices for Big Data was presented
• http://db.cs.berkeley.edu/papers/vldb09-madskills.pdf
MADlib: Getting started
1) Visit http://MADlib.net
2) Download the latest release:
3) Click the MADlib Wiki link and…
4) Follow the installation guide for PostgreSQL or Greenplum
We have installed MADlib for you in the lab environment and you used MADlib functions in Module 4 labs.
Many functions are implemented as User defined functions and User defined aggregates and some examples are as follows:
• testdb=# \da madlib.mreg* • List of aggregate functions • Schema | Name | Argument data types | • -----------+---------------------+--------------------------------------------------+----- • madlib | mregr_coef | double precision, double precision[] | • madlib | mregr_pvalues | double precision, double precision[] | • madlib | mregr_r2 | double precision, double precision[] | • madlib | mregr_tstats | double precision, double precision[] |
22Module 5: Advanced Analytics - Technology and Tools
Copyright © 2014 EMC Corporation. All rights reserved.
Copyright © 2014 EMC Corporation. All Rights Reserved.
MADlib: Getting Help…
• Check out the user guide with examples at: http://doc.madlib.net
• Need more help?
Try: http://groups.google.com/group/madlib-user-forum
23Module 5: Advanced Analytics - Technology and Tools
The slide provides additional details on obtaining help with MADlib functions.
23Module 5: Advanced Analytics - Technology and Tools
Copyright © 2014 EMC Corporation. All rights reserved.
Copyright © 2014 EMC Corporation. All Rights Reserved.
Greenplum In-database Analytical Functions
Descriptive Statistics Modeling
Quantile Association Rule Mining
Profile K-Means Clustering
CountMin (Cormode-Muthukrishnan) Sketch- based Estimator
Naïve Bayes Classification
FM (Flajolet-Martin) Sketch-based Estimator Linear Regression
MFV (Most Frequent Values) Sketch-based Estimator
Logistic Regression
Frequency Support Vector Machines
Histogram SVD Matrix Factorization
Bar Chart Decision Trees/CART
Box Plot Chart Neural Networks
Correlation Matrix Parallel Latent Dirichlet Allocation
Module 5: Advanced Analytics - Technology and Tools 24
Listed are the in-database analytic functions available natively in Greenplum and as Madlib functions (MADlib functions in bold). This list keeps expanding with every update and as the user community contributes to the MADlib.
24Module 5: Advanced Analytics - Technology and Tools
Copyright © 2014 EMC Corporation. All rights reserved.
Copyright © 2014 EMC Corporation. All Rights Reserved.
Advanced Analytics - Technology and Tools
During this lesson the following advanced functions were covered:
• Window functions
• User defined functions and aggregates
• Ordered Aggregates
• MADlib
Summary
25Module 5: Advanced Analytics - Technology and Tools
Advanced SQL and MADlib functions were covered in this lesson.
25Module 5: Advanced Analytics - Technology and Tools
Copyright © 2014 EMC Corporation. All rights reserved.
Copyright © 2014 EMC Corporation. All Rights Reserved.
Advanced Analytics - Technology and Tools Summary
The Key Points covered in this module were:
• MapReduce , Hadoop and Hadoop ecosystems
• In-database analytics with advanced SQL functions and MADlib
26Module 5: Advanced Analytics - Technology and Tools
These are the key topics covered in this module.
26Module 5: Advanced Analytics - Technology and Tools