2011-06-01

Astrostatistics and Data Mining, day 3

In the morning I wrote a Gaussian-process code to model radial-velocity data, just for fun. I am definitely reinventing the wheel, but I am learning a lot. I am using Python classes to cache all the expensive matrix operations; this should make things as fast as they can be without serious engineering.
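For concreteness, here is a minimal sketch of what that kind of caching can look like. It is not the code from that morning: the class name `GaussianProcess`, the squared-exponential kernel, and the hyperparameters `amp` and `scale` are all assumptions made for illustration. The idea is only that the Cholesky factor of the covariance matrix, the one O(n^3) step, gets computed once and then reused by both the likelihood and the predictions.

```python
import numpy as np
from functools import cached_property


class GaussianProcess:
    """Toy zero-mean GP; the squared-exponential kernel is an assumed choice."""

    def __init__(self, t, y, yerr, amp=1.0, scale=1.0):
        self.t = np.asarray(t, dtype=float)        # observation times
        self.y = np.asarray(y, dtype=float)        # radial velocities
        self.yerr = np.asarray(yerr, dtype=float)  # per-point uncertainties
        self.amp = amp      # hypothetical kernel amplitude
        self.scale = scale  # hypothetical kernel time scale

    def kernel(self, t1, t2):
        # Squared-exponential covariance between two sets of times.
        dt = t1[:, None] - t2[None, :]
        return self.amp ** 2 * np.exp(-0.5 * (dt / self.scale) ** 2)

    @cached_property
    def chol(self):
        # The expensive O(n^3) step: computed on first access, then cached.
        K = self.kernel(self.t, self.t) + np.diag(self.yerr ** 2)
        return np.linalg.cholesky(K)

    @cached_property
    def alpha(self):
        # K^{-1} y from the cached factor. np.linalg.solve is a general
        # solver here, not a dedicated triangular one.
        L = self.chol
        return np.linalg.solve(L.T, np.linalg.solve(L, self.y))

    def log_likelihood(self):
        # ln p(y) = -y^T K^{-1} y / 2 - ln|K| / 2 - n ln(2 pi) / 2,
        # with ln|K| / 2 equal to the sum of ln(diag(L)).
        n = len(self.y)
        return (-0.5 * self.y @ self.alpha
                - np.sum(np.log(np.diag(self.chol)))
                - 0.5 * n * np.log(2.0 * np.pi))

    def predict(self, t_new):
        # Posterior mean at new times; reuses the cached alpha.
        t_new = np.asarray(t_new, dtype=float)
        return self.kernel(t_new, self.t) @ self.alpha
```

One caveat with this design: the cache is tied to the hyperparameters given at construction, so changing `amp` or `scale` on an existing object would leave stale cached matrices. Building a new object per hyperparameter setting is the simplest way around that.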

In the afternoon, Lupton (Princeton) talked about the SDSS and other large surveys. He said that not all users would agree with every decision they made in building the catalog, but those decisions were science-driven and aimed at particular goals. Then, when asked how we could re-make those decisions and re-analyze the data, he essentially said you can't. But he followed that by saying that he wants LSST to be different, with re-analysis possible through smart APIs or equivalent. This meshes nicely with things Anthony Brown said on day 1.

There were a bunch of talks on classifying variables, all using the Random Forest method. I have to learn more about that. A discussion following these talks got a little bit into the issues of generative modeling vs black-box classification. I far, far prefer the former, of course, because it advances the science (and does a better job, I hope) while performing the classification.
