## 2008-03-31

### Spitzer Observing

Schiminovich, Wu, Ben Johnson, and I spent parts of the weekend and today getting ready our Astronomical Observation Requests (AORs) for our large statistical *Spitzer* (no, not *this* Spitzer) mid-infrared spectroscopy program of line-emitting galaxies in the SDSS. The idea is to get a statistical sample of visual-selected galaxies, and we use a clever trick of combining many nearby galaxies into tiles that share a peak-up (astrometric) star to get more spectra per hour than we would get with a more straightforward program. What we don't know is *which* tiles will be observed, because our program is statistical, which means we will get what we get, when we get it, as it is convenient for the *Spitzer* observation planners.

## 2008-03-28

### statistical counterpart association works

I got the code running, and I now have a justifiable probability, for each proposed match of a GALEX source to an SDSS source, that that match is true. The joint distribution of GALEX and SDSS properties for the true matches is highly informative, and the joint distribution of false matches is just the cross of two independent distributions, one for GALEX and one for SDSS. The nice thing is that I can plot these distributions exactly, even though each source only gets a probability of being in one category or the other. Now to write up the method!
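A minimal sketch of how soft assignments can still yield distributions: each proposed match adds its probability `p` to the true-match histogram and `1 - p` to the false-match histogram. This is an illustration, not my actual code; the function name, the bin edges, and the toy numbers are all hypothetical.

```python
def soft_histograms(values, probs, edges):
    """Histogram `values` twice: weighted by p (true) and by 1 - p (false)."""
    nbins = len(edges) - 1
    true_hist = [0.0] * nbins
    false_hist = [0.0] * nbins
    for v, p in zip(values, probs):
        for b in range(nbins):
            if edges[b] <= v < edges[b + 1]:
                true_hist[b] += p          # expected number of true matches
                false_hist[b] += 1.0 - p   # expected number of chance matches
                break
    return true_hist, false_hist


# Toy example: a color-like quantity for five proposed matches (made-up values).
colors = [0.2, 0.4, 1.1, 1.6, 1.9]
probs = [0.9, 0.8, 0.5, 0.1, 0.2]
true_hist, false_hist = soft_histograms(colors, probs, [0.0, 1.0, 2.0])
```

No source is ever forced into one category; the two histograms simply share each source according to its probability.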

## 2008-03-27

### degeneracies

I got my statistical counterpart association system working for GALEX and SDSS, but found that the results are somewhat wacky because there exists in my current formulation of the problem a perfect degeneracy. I have a few ways to break it, but I want to do so in a responsible way.

## 2008-03-26

### image analysis; discretization; Spitzer AORs

I spent the morning discussing general issues in image analysis with Fergus and Barron; we came up with some very specific ideas for Fergus's and my joint NSF proposal on building generative models of universal image data sets.

In the afternoon, I switched back and forth between setting up the discrete version of the statistical counterpart association formalism we worked out for associating GALEX and SDSS sources, and building (with Wu) the AORs (observation descriptions) for Schiminovich and company's large Spitzer program.

## 2008-03-25

### binning, trust

I spent the morning working on a relatively careful binning of the data for Schiminovich's and my statistical counterpart association work with GALEX and SDSS.

I spent the afternoon working with Lang. Among other things, we discussed the trust model for the Virtual Observatory: There is none. This seems like a significant problem, but one we could address, in principle.

## 2008-03-24

### source association

After I described my expectation-maximization approach to statistical counterpart association, Schiminovich insisted I read the paper on probabilistic cross-identification of astronomical sources by Budavari and Szalay. They have come very close to solving one of astronomy's fundamental problems.

All of astronomy and astrophysics is built on the observation and reobservation of sources on the sky. In each new observation, especially when that new observation is taken at a wavelength not previously observed, the sources detected in the observation must be matched or associated with the sources detected in the previous observations. This source association across observations is an ill-posed statistics problem, because you don't know, *a priori*, how the sources might move or vary or appear different from observation to observation, and all the observations are noisy to boot.

It seems trivial (most astronomers have never thought much about the step of source association), but in fact there is, to my knowledge, no *well-posed* form for this association problem; every well-posed problem that people have solved (such as take the closest, or the closest within some error radius, or something like that) is some kind of approximation (usually a very uncontrolled approximation!). But Budavari and Szalay do a nice job of building a well-posed problem that is a controlled approximation to the ill-posed problem, with clear assumptions and a somewhat scalable form. They don't solve the fully general problem, in which sources can move and vary radically, and in which the observer doesn't know all sources of noise, but they present a nice Bayesian formulation for cross-identification among surveys of a static sky, with known Gaussian error properties. I suspect it is not hard to generalize at least somewhat further.

## 2008-03-23

### statistical counterpart association

Partly with the help of Roweis, I devised a scheme by which Schiminovich and I can determine the joint distribution of GALEX and SDSS properties of stars and galaxies, without being sure which positional coincidences are true associations across the two surveys, and which are by chance. This falls into the realm of statistical counterpart association, and it involves jointly modeling the true-match and chance-match distributions and a probability (of being a true match) for each coincidence. I started to code it up on the plane home from San Francisco.
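A rough sketch of one expectation-maximization iteration for a binned version of this scheme. It is not the code I wrote on the plane, and it makes assumptions I should flag: the properties are discretized into bins, the chance-match distribution is the product of fixed GALEX and SDSS marginals, and all names (`e_step`, `m_step`, `frac_true`) are hypothetical. It says nothing about convergence or identifiability.

```python
def e_step(pairs, true_joint, galex_marg, sdss_marg, frac_true):
    """Probability that each coincidence (g_bin, s_bin) is a true match."""
    probs = []
    for g, s in pairs:
        p_true = frac_true * true_joint[g][s]
        # Assumption: chance matches pair independent GALEX and SDSS draws.
        p_chance = (1.0 - frac_true) * galex_marg[g] * sdss_marg[s]
        total = p_true + p_chance
        probs.append(p_true / total if total > 0.0 else 0.0)
    return probs


def m_step(pairs, probs, n_g, n_s):
    """Re-estimate the true-match joint distribution and the true fraction."""
    joint = [[0.0] * n_s for _ in range(n_g)]
    for (g, s), p in zip(pairs, probs):
        joint[g][s] += p
    norm = sum(probs)
    if norm > 0.0:
        joint = [[x / norm for x in row] for row in joint]
    return joint, norm / len(pairs)


# Toy data: four positional coincidences in a 2 x 2 grid of bins (made up).
pairs = [(0, 0), (0, 0), (1, 1), (0, 1)]
start_joint = [[0.5, 0.0], [0.0, 0.5]]
probs = e_step(pairs, start_joint, [0.5, 0.5], [0.5, 0.5], 0.5)
new_joint, new_frac = m_step(pairs, probs, 2, 2)
```

The E-step gives the per-coincidence probability; the M-step rebuilds the true-match distribution from those probabilities rather than from hard assignments.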

## 2008-03-20

### Berkeley

Spent the day at UC Berkeley, where I gave my automated calibration talk and met with SDSS-3 people, DEEP people, and GRB people. In the latter category, Josh Bloom showed me some new technology he is testing for making super-cheap robotic telescopes, and also data his group obtained by very fast follow-up of the magnitude 5 (naked-eye) gamma-ray burst.

## 2008-03-19

### archetypes

Roweis and I started working on the problem of finding *N* galaxy spectra that cover all SDSS galaxy spectra. We have to find all of the pairs *ij* where the spectrum of galaxy *i* is a good model for the spectrum of galaxy *j* and then Roweis has an algorithm that finds the smallest subset of the covering spectra *i* such that every spectrum *j* in the sample is covered. This subset is a discrete sampling of the entire space of galaxy spectra, or you can think of it as a non-parametric description of that space; we will also identify outliers.
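To illustrate the covering step, here is a sketch that uses the standard greedy set-cover heuristic. To be clear, this is a stand-in and not Roweis's algorithm (which I have not described here), and greedy selection is only an approximation that need not find the smallest subset. The input format, and the assumption that every spectrum is a good model of itself, are mine for illustration.

```python
def greedy_archetypes(covers):
    """Pick covering spectra greedily.

    covers[i] is the set of spectra j for which spectrum i is a good model.
    Assumption: every spectrum has an entry and covers at least itself.
    """
    uncovered = set(covers)
    chosen = []
    while uncovered:
        # Take the spectrum that covers the most still-uncovered spectra;
        # ties go to the lowest index because max keeps the first maximum.
        best = max(sorted(covers), key=lambda i: len(covers[i] & uncovered))
        gained = covers[best] & uncovered
        if not gained:
            break  # nothing left can be covered; the assumption was violated
        chosen.append(best)
        uncovered -= gained
    return chosen


# Toy example with five spectra (made-up coverage sets).
covers = {0: {0, 1, 2}, 1: {1}, 2: {2, 3}, 3: {3}, 4: {4}}
archetypes = greedy_archetypes(covers)
```

In the toy example spectrum 4 is covered by nothing but itself, which is the kind of object one might flag as a candidate outlier.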

## 2008-03-18

### blind date objective function

One of my polemics (not yet written) is that every measurement made with data must involve the optimization of some scalar objective. I wrote words about this, and made other changes, in Barron's paper on the blind assignment of date-of-observation to arbitrary astronomical imaging.

## 2008-03-17

### faint-source parallaxes

Lang and I began the process of clean-rooming my IDL (yuck) code for measuring the proper motions of very faint sources in multi-epoch imaging into Python. We used a technique pioneered by Mierle and me: Connect by skype (tm) and talk while both manipulating the same `emacs` window via old-school unix `screen` (which can be attached by multiple parties simultaneously). This is very effective for communicating about the code and allowing each party to type while the other watches, even when the parties are geographically separated. We didn't finish the clean-room, but we made more progress than is common in a typical 7 person-hours, with two of us working for 3.5 hours. As my loyal reader knows, pair coding is a cornerstone of the extreme programming model (which is badly named but very effective).

The reason for the clean-room is mainly to be fully open-source, but also to take advantage of the scientific equipment available in Python libraries; in particular, my dumb IDL implementation is not going to scale as we *(a)* add parallax estimation, and *(b)* run on millions of sources.

## 2008-03-15

### work for others

I worked entirely for others today. I made jpeg images from individual SDSS fields for Lang, who is asking whether he can understand the bandpasses and photometric zeropoints of arbitrary astronomical images. This involved writing some code, because my generic SDSS jpeg code mosaics together multiple fields onto an exact tangent-plane coordinate system, but Lang wanted distorted real fields.

I re-worked one of Zolotov's plots of the accretion histories of stars in different kinematic components as a function of height above the disk to this:

In a later post I will say what the plot means; right now I am just trying to make it look good.

## 2008-03-14

### minor planets, hacking

I hacked a script-run query to the minor planet center so I can automatically check whether or not my GALEX-only (no SDSS) sources are known minor planets. This required some of Lang's web foo (much appreciated). Some of them are clearly bright, known minor planets, and some of them aren't, but I don't yet have any evidence that I have discovered any new Solar System objects. As a commenter on an earlier post noted, the point is to measure the ultraviolet albedos of the known minor planets we have caught with GALEX; this is much easier than discovery, and more likely to produce something I can publish!

## 2008-03-13

### calibration

I spent a great morning at the CfA visiting the plate scanning project DASCH. The plate scanner is beautiful, and I saw it in action. I also saw the 100 tons of plates in the plate stacks, and was suitably impressed with the care with which they are maintaining and carrying forward all the meta-data they have. Then they apply some good automatic calibration and are building an archive. I learned from Josh Grindlay that there are enough plates in the Harvard archive alone to have *fully imaged the sky 500 times over*.

Earlier in the morning I met with Chris Stubbs and discussed many issues related to performing precise calibration of astronomical data sets and providing enough information back to users that the data set will play well with others. We put in some good hours on truly fundamental things such as: What does a telescope really measure? (integrals of the photon phase-space density, in my view) and All precise observations are necessarily relative to astronomical sources with (fundamentally) unknown spectral properties.

Stubbs is a deep thinker, and obviously I would say that because he thinks about these things much as I do! Now here's to him taking over the world and bending it to his will.

## 2008-03-12

### innovative computing

I spent the day at Harvard between the CfA and the Institute for Innovative Computing, where I gave my automated astrometry and open-source sky survey talk. I also spent some time chatting with Willman about our projects with Zolotov, and with the IIC staff about computation in science.

## 2008-03-11

### SDSS history

I spent a good fraction of the day speaking with Ann Finkbeiner (no relation to Doug), a science writer who is writing up a history of the Sloan Digital Sky Survey. She reminded me of a whole lot of great stuff. What a fun project it has been, and productive to boot.

## 2008-03-10

### figures

Zolotov and I talked figures for her current paper, which consists of observations of the stellar components of some simulated galaxies (from within). Once again I was reminded of the issue that it really is non-trivial to usefully and informatively plot distributions of millions of points. I have many techniques, but many of them fail in the situation that you want to distinguish overlapping or intermixed but distinct populations of hundreds of thousands of points in the same two-dimensional plot window.

## 2008-03-09

### testing CDM

On Wednesday I gave the Astronomy Colloquium at Columbia; I spoke about the opportunities (and responsibility) to test CDM at non-linear scales. This all followed an extremely enjoyable lunch with the graduate students and some prospectives. Then for the last few days I have been on vacation (building furniture on a NYC balcony).

## 2008-03-04

### minor planets, black-hole orbits

In our work on sources that show up in GALEX imaging but not SDSS imaging, Schiminovich and I have found large numbers of minor planets. This surprised me, but GALEX has huge coverage! Interestingly, GALEX has time-tagged photons, so you can get (minimal) proper motion information for fast-moving sources straight out of the GALEX time stream. My next job is to figure out which of these minor planets are already known. It *should* be all of them, but there is no reason not to check!

At pizza lunch, Gabe Perez-Giz (Columbia) gave a very nice talk about test-particle orbits around black holes. He (with Janna Levin) has found that periodic orbits are very much easier to analyze in many ways than non-periodic orbits; this would be sophistry except that he showed that for any non-periodic orbit there is an *arbitrarily similar* periodic orbit. This is completely obvious once someone (Gabe in this case) does an enormous amount of completely non-obvious work.

## 2008-03-03

### candidacy

Wu passed her candidacy exam today. Her thesis topic is star formation in galaxies: indicators and truncation (cessation) mechanisms.

## 2008-03-02

### proper motions and parallaxes

This weekend I had a terrible (though obvious) realization: The Earth moves around the Sun at about 30 km/s. The disk velocity dispersion is also on the order of 30 km/s. Therefore, if you have a survey (like the SDSS Southern Stripe) that has a time span measured in a small number of years, *the parallax will be detected at comparable significance to the proper motion*. That is, you cannot measure proper motions without also simultaneously fitting parallaxes. Argh—and *duh!*
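To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes a transverse velocity of 30 km/s and uses the small-angle relation that 1 AU at 1 pc subtends 1 arcsecond; the function names and the 1 kpc example distance are hypothetical.

```python
AU_KM = 1.495978707e8   # kilometers per astronomical unit
YEAR_S = 3.15576e7      # seconds per Julian year


def parallax_mas(distance_pc):
    """Parallax amplitude in milliarcseconds for a source at distance_pc."""
    return 1000.0 / distance_pc


def proper_motion_mas_per_yr(v_kms, distance_pc):
    """Proper motion in mas/yr for transverse velocity v_kms at distance_pc."""
    au_per_yr = v_kms * YEAR_S / AU_KM  # 1 AU/yr at 1 pc is 1 arcsec/yr
    return 1000.0 * au_per_yr / distance_pc


# A disk star at 1 kpc moving transversely at 30 km/s (illustrative values).
ratio = proper_motion_mas_per_yr(30.0, 1000.0) / parallax_mas(1000.0)
```

The ratio is about 6.3 per year and does not depend on distance: 30 km/s is roughly 6.3 AU per year. So over a baseline of a few years the proper-motion displacement and the parallactic wobble are within an order of magnitude of each other, and a fit that includes one but not the other is asking for trouble.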