This project is read-only.

Problem with NSGA

Dec 1, 2011 at 2:44 PM

Hi,
I try to use your product to achieve optimization of NSGA. My application would be choose 10 items from 1,000, which will be the best. But, I have a problem with setting the parameters to optimize the NSGAI mean if I get two genomes. One of them as a vector of indices, and second as the values. Can I realize this in such a way or in some other way example as two-dimensional array?
Please help me
Przemek

Coordinator
Dec 3, 2011 at 8:21 PM
Edited Dec 3, 2011 at 8:43 PM

Hi Przemek,

In short, the answer is "Yes". The ECJ framework (including my fork converted to C#) is totally customizable. One of the things you can do is create a single genome defined with a two-dimensional array. But I'm not exactly sure I understand the problem enough to provide guidance. Without more detail, all I can do is suggest that you look at the examples, to understand how abstract types are extended to concrete types that are specialized for a particular evolutionary strategy. Almost all of the algorithms can be overridden to do just about anything that you want. But some of them do nothing at all UNLESS you override them.

The best way to learn how to do something in ECJ (including this converted fork) is to visit the home site that is maintained at George Mason University by Sean Luke and his associates. 

http://cs.gmu.edu/~eclab/projects/ecj/

I would recommend reading the documentation you find there.

Regards,

ben

Dec 5, 2011 at 10:09 AM
Edited Dec 5, 2011 at 12:04 PM

Hi, 

I can't find information about set parameters of genomes in NSGA optimalization, which create two two-dimensional array of Integers. I forgot to add that I'm using the version done in. NET 4.0

Regards,

Przemek 

Coordinator
Dec 5, 2011 at 1:57 PM
Przemek,
If you can describe the specific problem, or provide me with the source code you are trying to use, then maybe I can help you out with this.
I have limited time because in my primary occupation I am CTO of a hedge fund. And I have a lot of other development projects that I am juggling at the moment.
Please be as specific as you can, without revealing any "trade secrets" that you might not want to share.
;-)
Thanks,
Ben
Dec 5, 2011 at 3:08 PM

I implementing a project related to the master's thesis at the university. I want to use to create the NSGA to optimize the portfolio of shares. Of course I want to create a portfolio that contains less than the available number of companiesso the genome should provide two vectorsOne is the indexes of selected companies (I prefer integer vector), and the other with their participation in the portfolio. I can determine two functions of equipment to maximize return and minimize risk.

Regards,

Przemek

Coordinator
Dec 5, 2011 at 3:50 PM

Przemek,

I'm afraid I still don't understand what it is you are trying to do. It sounds to me as though you are trying to optimize a singe result that is a function of TWO variables. The evaluation function takes TWO variables and returns a single RESULT. The result is achieved using, say, a standard SHARPE RATIO, which is defined as the maximization of portfolio return versus the risk that is taken to achieve that reward (i.e., the variance in equity). Using a standard evolutionary strategy (a genetic algorithm that can adjust ANY NUMBER of variables), what you are trying to do is optimize a SINGE evaluation result.

Are you confusing multi-objective optimization with multi-variable optimization?

Almost all forms of evolutionary computation involves the adjustment of multiple variables. I routinely optimize TENS of THOUSANDS of variables (weights in neural network architectures) to achieve a single evaluation result, such as portfolio performance.

;-)

Regards.

Ben

Coordinator
Dec 5, 2011 at 5:06 PM

Przemek,

Another way to look at the problem is to consider only a single vector of values that includes elements for all candidate stocks. Each element can be evalutated as the proportion of each stock within the portfolio.  Some number of these elements (presumably most) will tend to drop to zero weighting if they do not contribute to an optimal risk-reward scenario.

Ben

Coordinator
Dec 5, 2011 at 5:30 PM

Przemek,

In case my point isn't entirely clear yet, you might be much better off spending your time thinking about how to EVALUATE a portfolio, rather than on how to optimize the variables that define it. To create a portfolio that generalizes well on FUTURE data is where the problem lies. Backtesting it on PAST data is, unfortunately, not the answer that poses any difficulty. Asking which stocks will "be the best", in other words, is not nearly as easy we would like. You will probably only find out which WERE the best if you don't devote considerable thought to the deeper questions concerning the feasability of "prediction".

Ben

Dec 5, 2011 at 6:07 PM

Ben,

Currently, my optimization works as follows:
Starting MultiObjetive of NSGA, which returns me to the genome, for example, 10 values ​​corresponding to the shares in the portfolio of 10 companies. I use this to set rate of return of the portfolio. Then, according to Markowitz's model, I set the risk of the portfolio on the basis of covariance. And so I turn to the two objectives he sets me Parreto collection, for which I use Sharpe ratio to set best portfolio.
However, I would do this for a portfolio of 10 companies selected from a larger set, and therefore needs a vectorof indexes selected companies.

I hope that I will bring you better my problem.

Coordinator
Dec 5, 2011 at 7:10 PM

Przemek,

The best advice I can give you at this point is to look at the "MooSuite" example. That stands for "Multi-Objective Optimization". That has a set of benchmark tests that will hopefully show you how to achieve what you want. Notice how the parameters are set up in the various tests. For example, in the main "MooSuite.params" file you will find the following:

# Different problems has different default settings, so we suggest 
# using the appropriate params file for each benchmark.
 
pop.subpop.0.species = ec.vector.FloatVectorSpecies
pop.subpop.0.species.ind = ec.vector.DoubleVectorIndividual
pop.subpop.0.species.fitness.num-objectives = 2
pop.subpop.0.species.fitness.maximize = false
seed.0 = time
If you want to change the types to ones that you have customized for your particular experiment, then you need to override the defaults in there. 
For example, you can create a "DualVectorSpecies", and a "DualVectorIndividual" (with one array for the integer "gene" and another for a double or float "gene").
You will also need to create a custom breeder type, that inherits from "NSGABreeder2" (which in turn inherits from "SimpleBreeder").
You need to override the methods in the "Setup" and "BreedPopulation" in there. Look at the standard "Vector" breeding pipeline types.
I think you will see that it is very easy to handle multiple vectors, just overriding a method here and there.
Everything higher "up" in the framework, is just going to call the overridden methods at the approapriate times.
Does that help?
Ben
Dec 5, 2011 at 8:49 PM

Ben,

Thank you for your help. I think this is what I was looking for.

Regards,

Przemek

Coordinator
Dec 5, 2011 at 9:01 PM

Przemek,

Good! Let me know how it works out for you.

 If I had more free time I could probably be more helpful. But since you are doing this as part of a thesis, it's probably best that you hack it out for yourself. Also, after you walk through it once, you'll be more comfortable doing it for other problems.

It's hard to appreciate the full potential of ECJ until you learn how to adapt it for your own specific projects.

;-)

Ben 

Jan 19, 2012 at 12:34 PM

 

Hi,
I finish my project and I have one problem. Which of the defined parameters is responsible for the uniqueness of the genome, because I would like to returnthe value of the genome were unique?

Regards,

Przemek

Coordinator
Jan 29, 2012 at 8:11 PM

rogala,

I'm sorry it has taken me so long to get back to on this. But I'm not sure I understand the question. You could create your own "Statistics" class, which could return any information you want. I'm not sure how you would make that a parameter, however, without knowing more about the intent. One way to compare genomes would be to compare their string representation.

Here is a quote from MSDN that explains how string comparisons are efficiently handled in .NET:

"The common language runtime conserves string storage by maintaining a table, called the intern pool, that contains a single reference to each unique literal string declared or created programmatically in your program. Consequently, an instance of a literal string with a particular value only exists once in the system.

For example, if you assign the same literal string to several variables, the runtime retrieves the same reference to the literal string from the intern pool and assigns it to each variable.

The Intern method uses the intern pool to search for a string equal to the value of str. If such a string exists, its reference in the intern pool is returned. If the string does not exist, a reference to str is added to the intern pool, then that reference is returned."

Ben

Coordinator
Jan 29, 2012 at 8:24 PM

rogala,

Also note that it is NOT GOOD ENOUGH to compare string hash codes. Different strings can return the same hash code. Therefore, you will probably run into memory difficulties for a very large number of genomes using the string comparison technique. It is possible to use certain compression techniques to alleviate this problem. You might want to examine the notion of "Suffix Sorting", or more specifically something like the "Burrows-Wheeler Transform" compression strategy to compactly represent uniqueness.

I have to warn you though, this gets into the realm of Information or "Knowledge" Theory. There may be other tree comparison techniques with which I am not (off the top of my head) familiar.

Ben

Coordinator
Jan 29, 2012 at 9:44 PM

rogala,

ALSO note that memory usage will probably NOT be a problem unless the scale of your problem is quite large. There could also be situations where one might be doing, say, "Island Exhange" for speciation. If you needed to ensure GLOBAL uniqueness, then that could put some pressure on you to use very compact representations. You would probably NOT want to serialize the actual trees across the network as .NET object graphs, but would instead rely on each Island to report on uniqueness for genomes that you pass over in string form. The built-in ability to represent trees as compact LISP-like "S" expressions, is very useful for such purposes.

In the extreme case, if you wanted GLOBAL uniqueness across all generations, and islands, and even runs... then you would have to think carefully about efficiency. This is also heavily dependent on what types of evolution you wish to support. If you are bringing the top 10 percent of your genomes into a new population, and you want them all to be cloned uniquely (with respect to all of the other genomes that have ever been evaluated anywhere), then you have some analysis and work to do. 

In some cases, such as when I am trying to analyze the evolutionary process itself, I use a relational database to store all kinds of statistics, along with the representation of all genomes that were produced by the system in many cases. This is a lot of overhead for a non-trivial Problem. Using compression on a single S-Expression is probably not going to be very helpful. But compressing thousands of them in a single stream provides significant benefit.

Ben

Feb 6, 2012 at 2:59 PM

Hi,

Thank you for your reply, but my problem is different. I would like the genome, which is represented in an array of integer, contains diffrent value.

Przemek

Coordinator
Feb 12, 2012 at 6:37 AM

Sorry Rogala,

I still do not understand the specific nature of your problem.

It is standard practice to let the problem in standard GA architectures consider "diversity". But this can be very specific to particular "domains".  I cannot really help you without additional information.

There are downsides to enforcement related to "convergence" under some circumstances. You might want to look at the C++ implementation of "pure" GA algorithms in something like Mathew Wall's GALib (http://lancet.mit.edu/ga/). This is a very "traditional' treatment that is used in many commercial and research projects that may help you decide which way to proceed. (It includes, by the way, tree-based representation, as well as standard array-based representation, so you are not limited to a particular representation when you analyze the algorithms).

ECJ, and my C# variant of that, is capable of doing anything you want (provided you fully understand what you are trying to accomplish!).

I hope this general pointer helps.

As I originally stated: For a Masters Thesis, I don't feel as though I should be explicitly wrinting any custom code for you. This is something that you should strive to solve on your own, given the flexibility of the framework provided. \;-)

In all likelihood, you are smarter than I am! I am just a hack with some "Idiot Savant" talents that suit my proprietary needs quite nicely!

Best Regards,

Ben