HW #2: Data Mining by Evolutionary Computation and Genetic Learning

Teams are allowed to complete this homework. Due date: Nov. 2

Total Score: 100
--------------------------------------------------------------------------------

Problem: Problem solving by evolutionary algorithm
--------------------------------------------------------------------------------

Inductive learning is one of the most commonly used learning approaches that simulate the human learning process, e.g., learning by examples or from mistakes. Inductive learning generally requires two steps: training and testing (or verification). During the training period, examples are provided to the learning system so that it can build a model (or patterns) for the given examples. During the testing period, the model (or patterns) built during training is tested or verified for accuracy. These training and testing steps can be repeated until the model reaches a satisfactory accuracy level before real application. During those steps, parameters and models can be adjusted or modified as necessary.

In order to build an accurate learning system with predictive power, researchers have proposed many different approaches. In this assignment, we will use an evolutionary approach as a learning system that can learn a valid mathematical expression for a given data set. This is also known as the symbolic regression problem, which was briefly discussed in class.
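As an illustration of what "learning a mathematical expression" means, the sketch below represents one candidate expression as a small tree, evaluates it at a given x, and prints it in ordinary infix notation. The nested-tuple representation, the names evaluate and to_infix, the example tree, and the protected division (returning 1.0 when the divisor is near zero) are all assumptions made for this illustration; they are not prescribed by the assignment or by any particular system.

```python
import math

# Binary operators, looked up by symbol.
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    # Assumption: protected division, a common genetic programming convention.
    "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,
}

def evaluate(expr, x):
    """Evaluate an expression tree at the input value x."""
    if expr == "x":
        return x
    if isinstance(expr, (int, float)):
        return float(expr)
    op, *args = expr
    if op == "sin":
        return math.sin(evaluate(args[0], x))
    left, right = args
    return BINARY[op](evaluate(left, x), evaluate(right, x))

def to_infix(expr):
    """Render an expression tree as a fully parenthesized infix string."""
    if expr == "x":
        return "x"
    if isinstance(expr, (int, float)):
        return str(expr)
    op, *args = expr
    if op == "sin":
        return f"sin({to_infix(args[0])})"
    left, right = args
    return f"({to_infix(left)} {op} {to_infix(right)})"

# Hypothetical candidate: 0.5 * x + sin(x)
example = ("+", ("*", 0.5, "x"), ("sin", "x"))
print(to_infix(example))  # ((0.5 * x) + sin(x))
print(evaluate(example, 2.0))
```

An evolutionary system searches over many such trees, keeping and recombining the ones that fit the data best.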

To successfully complete this assignment, perform the following activities:
  (a)  Research existing systems that use evolutionary computation approaches, such as genetic programming, genetic algorithms, or others, and that have learning or symbolic regression capability. Select one and learn how to use it, or write your own system if you wish.

  (b)  For a given data set that consists of value pairs (x_i, y_i) in a text file called “train.txt”, perform a symbolic regression, utilizing the system’s learning capability to learn a model in the form of a mathematical function f(x) that represents the data set. The function set may consist of Fset = {+, −, ×, /, sin(x)}. All constants are in the range [0, 1], and x is in the range [0, 100]. Use the following error function for evaluating a function f with respect to a particular training case p_i:

    Error(p_i) = Σ |p_i − o_i|, where p_i is the output from a learned program p on the ith case and o_i is the output of the ith case in the test data set.

  (c)  Once your system is running and has produced a model for the training data set, test the model against the test data set, “test.txt”, for accuracy using the error function specified in (b). Both data sets, “train.txt” and “test.txt”, will be posted later when you are ready.

  (d)  If you did not find a perfect model during the training process for the given test data set, “test.txt”, improve its performance by modifying system parameters such as crossover rates, mutation rates, and population sizes, by changing how new individuals are created in the initial population, or by making other modifications that you believe to be useful.

  (e)  Write a brief report that summarizes your activities and results, including at least: (1) the name and source of the evolutionary learning system used and a brief description of the system; (2) the parameter settings for the system; (3) your strategies to reduce errors; (4) the best function learned, written as a standard math expression, with error information (e.g., (+ x 1 (* y 3)) is NOT considered a standard math expression); (5) a brief justification of why you think this is the best function; and, optionally, (6) retrospective comments about the system used, evolutionary approaches in general, etc.
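The error function from item (b), applied to the training and test files as in item (c), might be computed along the following lines. This is only a sketch: the file layout (one x y pair per line, separated by whitespace or a comma) is an assumption, since the data files are not described further, and load_pairs, total_error, and candidate are hypothetical names. The candidate function is a made-up stand-in for whatever model your system learns.

```python
import math

def load_pairs(path):
    """Read (x, y) pairs from a text file.

    Assumption: one pair per line, separated by whitespace or a comma.
    Lines with fewer than two fields (e.g., blank lines) are skipped.
    """
    pairs = []
    with open(path) as fh:
        for line in fh:
            fields = line.replace(",", " ").split()
            if len(fields) >= 2:
                pairs.append((float(fields[0]), float(fields[1])))
    return pairs

def total_error(model, pairs):
    """Sum of absolute differences |p_i - o_i| over all cases."""
    return sum(abs(model(x) - y) for x, y in pairs)

def candidate(x):
    # Hypothetical learned model, for illustration only.
    return 0.5 * x + math.sin(x)
```

With the data files in the working directory, total_error(candidate, load_pairs("train.txt")) gives the training error and total_error(candidate, load_pairs("test.txt")) gives the test error. Reporting both makes it easier to see whether a model that fits the training data also generalizes.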

*The grade will depend on the quality of work represented in the report, which should clearly demonstrate your research results.

//////////////////////////// How to Submit this Homework ////////////////////////////

Upload your report in Word format to Titanium. In the report, include your name and email address. Unless the source code is your own implementation, DO NOT include the source code. Instead, clearly specify the source of the program. If this is team work, give each member's percentage contribution to this homework. If your team cannot reach a consensus on individual contributions, include each individual’s claimed contribution, briefly stating the tasks performed.

