{"title": "On the Separation of Signals from Neighboring Cells in Tetrode Recordings", "book": "Advances in Neural Information Processing Systems", "page_first": 222, "page_last": 228, "abstract": null, "full_text": "On the Separation of Signals from \n\nNeighboring Cells in Tetrode Recordings \n\nManeesh Sahani, John S. Pezaris and Richard A. Andersen \n\nmaneesh@caltech.edu, pz@caltech.edu, andersen@vis.caltech.edu \n\nComputation and Neural Systems \nCalifornia Institute of Technology \n\n216-76 Caltech, Pasadena, CA 91125 USA \n\nAbstract \n\nWe discuss a solution to the problem of separating waveforms pro(cid:173)\nduced by multiple cells in an extracellular neural recording. We \ntake an explicitly probabilistic approach, using latent-variable mod(cid:173)\nels of varying sophistication to describe the distribution of wave(cid:173)\nforms produced by a single cell. The models range from a single \nGaussian distribution of waveforms for each cell to a mixture of \nhidden Markov models. We stress the overall statistical structure \nof the approach, allowing the details of the generative model chosen \nto depend on the specific neural preparation. \n\n1 \n\nINTRODUCTION \n\nMuch of our empirical understanding of the systems-level functioning of the brain \nhas come from a procedure called extracellular recording. The electrophysiologist \ninserts an insulated electrode with exposed tip into the extracellular space near \none or more neuron cell bodies. Transient currents due to action potentials across \nnearby cell membranes are then recorded as deflections in potential, spikes, at the \nelectrode tip. At an arbitrary location in gray matter, an extracellular probe is likely \nto see pertubations due to firing in many nearby cells, each cell exhibiting a distinct \nwaveform due to the differences in current path between the cells and the electrode \ntip. 
Commonly, the electrode is maneuvered until all the recorded deflections have almost the same shape; the spikes are then all presumed to have arisen from a single isolated cell. This process of cell isolation is time-consuming, and it permits recording from only one cell at a time. If differences in spike waveform can be exploited to sort recorded events by cell, the experimental cost of extracellular recording can be reduced, and data on interactions between simultaneously recorded cells can be obtained. \n\nSeparation of Signals from Neighboring Cells \n223 \n\nMany ad hoc solutions to spike sorting have been proposed and implemented, but thus far an explicit statistical foundation, with its accompanying benefits, has mostly been lacking. Lewicki (1994) is the exception to this rule and provides a well-founded probabilistic approach, but uses assumptions (such as isotropic Gaussian variability) that are not well supported in many data sets (see Fee et al. (1996)). A first step in the construction of a solution to the spike-sorting problem is the specification of a model by which the data are taken to be generated. The model has to be powerful enough to account for most of the variability observed in the data, while being simple enough to allow tractable and robust inference. In this paper we will discuss a number of models, of varying sophistication, that fall into a general framework. We will focus on the assumptions and inferential components that are common to these models and consider the specific models only briefly. In particular, we will state the inference algorithms for each model without derivation or proof; the derivations, as well as measures of performance, will appear elsewhere. \n\n2 DATA COLLECTION \n\nThe algorithms that appear in this paper are likely to be of general applicability. 
They have been developed, however, with reference to data collected from the parietal cortex of adult rhesus macaques using tetrodes (Pezaris et al. 1997). \n\nThe tetrode is a bundle of four individually insulated 13 µm diameter wires twisted together and cut so that the exposed ends lie close together. The potential on each wire is amplified (custom electronics), low-pass filtered (9-pole Bessel filter, fc = 6.4 kHz) to prevent aliasing, and digitized (fs between 12.8 and 20 kHz) (filters and A/D converter from Tucker Davis Technologies). This data stream is recorded to digital media; subsequent operations are currently performed off-line. \n\nIn preparation for inference, candidate events (where at least one cell fired) are identified in the data stream. The signal is digitally high-pass filtered (fc = 0.05 fs) and the root-mean-square (RMS) amplitude on each channel is calculated. This value is an upper bound on the noise power, and approaches the actual value when the firing rates of resolvable cells are low. Epochs where the signal rises above three times the RMS amplitude for two consecutive samples are taken to be spike events. The signal is upsampled in the region of each such threshold crossing, and the time of the maximal subsequent peak across all channels is determined to within one-tenth of a sample. A short section is then extracted at the original fs such that this peak time falls at a fixed position in the extracted segment. One such waveform is extracted for each threshold crossing. \n\n3 GENERATIVE FRAMEWORK \n\nOur basic model is as follows. The recorded potential trace V(t) is the sum of influences that are due to resolvable foreground cells (which have a relatively large effect) and a background noise process. We write \n\nV(t) = Σ_m Σ_τ c_m^τ S_m^τ(t − τ) + η(t) \n\n(1) \n\nHere, c_m^τ is an indicator variable that takes the value 1 if the mth cell fires at time τ and 0 otherwise. 
If cell m fires at τ it adds a deflection of shape S_m^τ(t − τ) to the recorded potential. The effect of all background neural sources, and any electrical noise, is gathered into a single term η(t). For a multichannel probe, such as a tetrode, all of V(t), η(t) and S_m^τ(t) are vector-valued. Note that we have indexed the spike shapes from the mth cell by time; this allows us to model changes in the spike waveform due to intrinsic biophysical processes (such as sodium inactivation during a burst of spikes) as separate from the additive background process. We will discuss models where the choice of S_m^τ is purely stochastic, as well as models in which both the probability of firing and the shape of the action potential depend on the recent history of the cell. \n\nFigure 1: Schematic graph of the general framework. \n\nIt will be useful to rewrite (1) in terms of the event waveforms described in section 2. At times τ when no foreground cell fires all the c_m^τ are zero. We index the remaining times (when at least one cell fired) by i and write c_m^i for c_m^τ at τ_i (similarly for S_m^i) to obtain \n\nV(t) = Σ_i Σ_m c_m^i S_m^i(t − τ_i) + η(t) \n\n(2) \n\nThis basic model is sketched, for the case of two cells, in figure 1. Circles represent stochastic variables and squares deterministic functions, while arrows indicate conditional or functional dependence. We have not drawn nodes for θ_η and θ. The representation chosen is similar to, and motivated by, a directed acyclic graph (DAG) model of the generative distribution. For clarity, we have not drawn edges that represent dependencies across time steps; the measurement V(t) depends on many nearby values of S_m^τ and c_m^τ, and η(t) may be autocorrelated. We will continue to omit these edges, even when we later show connections in time between c_m^τ and S_m^τ. \n\n4 INFERENCE \n\nWe have two statistical objectives. 
The first is model selection, which includes the choice of the number of cells in the foreground. The second is inference: finding good estimates for the c_m^i given the measured V(t). We will have little to say on the subject of model selection in this paper, besides making the observation that standard techniques such as cross-validation, penalized likelihood or approximation of the marginal likelihood (or \"evidence\") are all plausible approaches. We will instead focus on the inference of the spike times. \n\nRather than calculating the marginalized posterior for the c_m^i we will find the distribution conditioned on the most probable values of the other variables. This is a common approximation to the true posterior (compare Lewicki (1994)). \n\nA simple property of the data allows us to estimate the most probable values of the parameters in stages; times at which at least one foreground cell fires can be identified by a threshold, as described in section 2. We can then estimate the noise parameters θ_η by looking at segments of the signal with no foreground spikes, the waveform-distribution and firing-time parameters θ from the collection of spike events, and finally the spike times c_m^i and the waveforms S_m^i by a filtering process applied to the complete data V(t) given these model parameters. \n\n4.1 NOISE \n\nWe study the noise distribution as follows. We extract 1 ms segments from a band-passed recording sampled at 16 kHz from a four-channel electrode, avoiding the foreground spikes identified as in section 2. Each segment is thus a 64-dimensional object. We find the principal components of the ensemble of such vectors, and construct histograms of the projections of the vectors in these directions. 
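The noise characterization just described (collect off-spike 1 ms segments, diagonalize their covariance, project onto the principal directions) can be sketched as follows. This is an illustrative NumPy fragment, not the authors' code; the segment array is assumed to have been extracted and flattened already.

```python
import numpy as np

def noise_pca(noise_segments):
    """Principal components of an ensemble of noise segments.

    noise_segments: array of shape (n_segments, n_dims), e.g. 1 ms
    four-channel windows at 16 kHz flattened to 64-dimensional vectors.
    Returns (eigenvalues, eigenvectors, projections), with eigenvalues
    sorted in decreasing order of explained variance.
    """
    X = noise_segments - noise_segments.mean(axis=0)   # zero-mean ensemble
    cov = X.T @ X / (len(X) - 1)                       # sample covariance
    evals, evecs = np.linalg.eigh(cov)                 # symmetric eigendecomposition
    order = np.argsort(evals)[::-1]                    # reorder: largest variance first
    evals, evecs = evals[order], evecs[:, order]
    projections = X @ evecs                            # coordinates along each PC
    return evals, evecs, projections
```

Histogramming each column of `projections` and overlaying a zero-mean Gaussian with the matching eigenvalue as variance reproduces the kind of comparison shown in figure 2.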
A few of these histograms are shown on a log scale in figure 2 (points), as well as a zero-mean Gaussian fit to the distribution projected along the same axes (lines). It is clear that the Gaussian is a reasonable description, although a slight excess in kurtosis is visible in the higher principal components. \n\nFigure 2: Distribution of background noise (histograms, on a log scale, of projections in µV onto principal components 1, 12, 24, 36 and 48). \n\nThe noise parameters are now seen to be the covariance of the noise, Σ_η (we represent it as a covariance matrix taken over the length of a spike). In general, we can fit an autoregressive-process description to the background and apply a filter that will whiten the noise. This will prove to be quite useful during the filtering stages. \n\n4.2 WAVEFORM PARAMETERS \n\nWe can make some general remarks about the process of inferring the parameters of the models for S_m^i and c_m^i. Specific models and their inference algorithms will appear in section 5. \n\nThe models will, in general, be fit to the collection of segments extracted and aligned as described in section 2; at other times the cells have no influence on the recorded waveform. We will represent these segments by V^i, implying a connection to the firing events τ_i used in (2). It should be borne in mind that the threshold-based trigger scheme will not identify all of the true τ_i exactly. \n\nWe will assume that each segment represents a single S_m, that is, that no two cells fire at times close enough for their spike waveforms to overlap. This is an unreasonable assumption; we can shore it up partially by eliminating from our collection those V^i segments that appear heuristically to contain overlaps (for example, double-peaked waveforms). 
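One possible form of the double-peak screen mentioned above is sketched below. It is purely hypothetical: the paper does not specify its heuristic, and the threshold values here are illustrative only.

```python
import numpy as np

def looks_like_overlap(waveform, min_separation=8, prominence=0.5):
    """Flag a candidate overlap event: more than one local maximum on the
    dominant channel whose height exceeds `prominence` times the overall
    maximum, with maxima at least `min_separation` samples apart.
    All parameter values are illustrative, not from the paper.

    waveform: (n_samples, n_channels) array for one extracted event."""
    # channel with the largest absolute deflection
    chan = waveform[:, np.argmax(np.abs(waveform).max(axis=0))]
    x = np.abs(chan)
    # interior local maxima above the prominence threshold
    peaks = [i for i in range(1, len(x) - 1)
             if x[i] >= x[i - 1] and x[i] > x[i + 1] and x[i] > prominence * x.max()]
    # keep only maxima separated by at least min_separation samples
    kept = []
    for i in peaks:
        if not kept or i - kept[-1] >= min_separation:
            kept.append(i)
    return len(kept) > 1
```

Segments flagged by such a screen would simply be dropped from the ensemble used for fitting, as the text describes.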
Ultimately, however, we will need to make our inference procedure robust enough that the parameters describing the model are well estimated despite the errors in the data. \n\nFigure 3: The mixture model for V^i. \n\nThe advantage to making this assumption is that the overall model for the distribution of the V^i becomes a mixture: a single control variable c^i sets exactly one of the c_m^i to 1. V^i is then drawn from the distribution of waveforms for the selected cell, convolved with the noise. This is a formal statement of the \"clustering\" approach to spike-sorting. Mixture models such as these are easy to fit using the Expectation-Maximization (EM) algorithm (Dempster et al. 1977). We will also consider models with additional latent state variables, which are used to describe the distributions of the S_m and c_m, where again EM will be of considerable utility. \n\nThe measured ensemble V^i will be incorrect on a number of counts. The threshold may make either false-positive or false-negative errors in selecting τ_i, and some of the identified V^i will represent overlaps. We can use heuristics to minimize such errors, but need to account for any remaining outliers in our models. We do so by introducing additional mixture components. Segments of noise that are incorrectly identified as foreground events are handled by an explicit zero mixture component whose variability is entirely due to the background noise. Overlaps are handled by providing very broad low-probability components spanning large areas in waveform space; clusters of overlap waveforms are likely to be diffuse and sparse. \n\nThe mixture model is sketched in figure 3. In the basic model the variables are chosen independently for each cross-threshold event. The dynamic models discussed below will introduce dependencies in time. 
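The EM fit mentioned above reduces, for the simplest (constant-waveform) case treated in section 5.1, to alternating a responsibility step and a responsibility-weighted reestimation step. The sketch below is illustrative only: it assumes a shared, fixed noise covariance Σ_η (so the Gaussian normalizers cancel in the responsibilities) and uses the standard weighted-mean update for the spike shapes; the function and variable names are ours, not the authors'.

```python
import numpy as np

def em_step(V, means, p, cov_noise):
    """One EM iteration for a mixture of Gaussians with shared, fixed
    covariance (the measured noise covariance Sigma_eta).

    V: (N, d) event waveforms; means: (M, d) spike shapes S_m;
    p: (M,) mixture fractions p_m; cov_noise: (d, d).
    Returns updated (means, p) and the responsibilities r of shape (N, M)."""
    N, _ = V.shape
    cov_inv = np.linalg.inv(cov_noise)
    # E step: log responsibilities up to a per-event constant; the shared
    # covariance means the Gaussian normalizer cancels on normalization
    log_r = np.empty((N, len(means)))
    for m, (pm, Sm) in enumerate(zip(p, means)):
        diff = V - Sm
        log_r[:, m] = np.log(pm) - 0.5 * np.einsum('nd,de,ne->n', diff, cov_inv, diff)
    log_r -= log_r.max(axis=1, keepdims=True)          # numerical stabilization
    r = np.exp(log_r)
    r /= r.sum(axis=1, keepdims=True)                  # normalize over cells
    # M step: responsibility-weighted reestimates
    p_new = r.sum(axis=0) / N                          # mixture fractions
    means_new = (r.T @ V) / r.sum(axis=0)[:, None]     # weighted mean waveforms
    return means_new, p_new, r
```

The zero and broad overlap components described in the text would enter simply as additional rows of `means` and entries of `p` (with their own covariances, at the cost of restoring the normalizer terms).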
\n\n4.3 SPIKE TIMES \n\nIn our final stage of inference, we make estimates of the c_m^i given the V(t) and the most probable parameters fit in the previous two stages. This is exactly the signal-detection problem of identifying pulses (perhaps with random or else adapting parameters) in Gaussian noise of known covariance. Solutions to this are well known (McDonough and Whalen 1995) and easily adapted to the problem at hand (Sahani et al. 1998). \n\n5 SPECIFIC MODELS \n\nFinally, we describe examples of models that may be used within this framework. As stated before, in this brief catalog we summarize the motivation for each, and state without derivation or proof the algorithms for inference. The details of these algorithms, as well as tests of performance, will appear elsewhere. \n\n5.1 CONSTANT WAVEFORM \n\nThe simplest model is one in which we take the waveform of the mth cell to remain unchanged and the firing probability of each cell to be constant. In this case we drop the index τ or i on the waveform shape and just write S_m(t − τ_i). We write p_m for the probability that a given event is due to the mth cell firing. The mixture model is then a mixture of multivariate Gaussian distributions, each with covariance Σ_η, mean S_m and mixture fraction p_m. The EM algorithm for such a mixture is well known (Nowlan 1990). Given parameters θ^(n) = {S_m^(n), p_m^(n)} from the nth iteration, we find the expected values of the c_m^i (called the responsibilities), \n\nr_m^i = E[c_m^i | {V^i}, θ^(n)] = p_m^(n) N(V^i; S_m^(n), Σ_η) / Σ_m′ p_m′^(n) N(V^i; S_m′^(n), Σ_η), \n\n(3) \n\nand then reestimate the parameters from the data weighted by the responsibilities, \n\np_m^(n+1) = (Σ_i r_m^i) / N. \n\n(4) \n\n5.2 REFRACTORY FIRING \n\nA simple modification to this scheme can be used to account for the refractory period between spikes from the same cell (Sahani et al. 1998). The model is similar to the Gaussian mixture above, except that the choice of mixture component is no longer independent for each waveform. If the waveforms arrive within a refractory period they cannot have come from the same cell. This leads to the altered responsibilities \n\ns_m^i = (1 / Z^i) r_m^i Π_{j : (i,j) refractory} (1 − r_m^j), \n\n(5) \n\nwhere Z^i is a normalizing constant. The M step here is identical to (4), with the responsibilities s_m^i replacing the r_m^i. \n\n5.3 STATIC MIXTURE \n\nAs we have suggested above, the waveform of the mth cell is not, in fact, unchanged each time the cell fires. Variability in excess of the additive background noise is introduced by changes in the biophysical properties of the cell (due to recent firing patterns, or external modulators) as well as by background activity that may be correlated with foreground events. We can attempt to model this variability as giving rise to a discrete set of distinct waveforms, which are then convolved with the previously measured noise covariance to obtain the distribution of measurements. In effect, we are tiling an irregularly shaped distribution with a mixture of Gaussians of fixed shape, Σ_η. We obtain a hierarchical mixture distribution in which each component corresponding to a cell is itself a mixture of Gaussians. Given a particular hierarchical arrangement the parameters can be fit exactly as above. While this approach seems attractive, it suffers from the flaw that model selection is not well defined. In particular, the hierarchical mixture is equivalent in terms of likelihood and parameters to a single-layer, flat, mixture. 
To avoid this problem we may introduce a prior requiring that the Gaussian components from a single cell overlap, or otherwise lie close together. It is, however, difficult to avoid excessive sensitivity to such a prior. \n\n5.4 DYNAMICAL MIXTURE \n\nAn alternative approach is to replace the independent transitions between the components of the mixture distribution of a single cell with a dynamical process that reflects the manner in which both firing probability and waveform shape depend on the recent history of the cell. In this view we may construct a mixture of hidden Markov models (HMMs), one for each cell. Our earlier mixture assumption now means that the models must be coupled so that on any one time step at most one makes a transition to a state corresponding to firing. This structure may be thought of as a special case of the factorial HMM discussed by Ghahramani and Jordan (1997). \n\nThe general model is known to be intractable. In this special case, however, the standard forward-backward procedure for a single HMM can be modified to operate on responsibility-weighted data, where the responsibilities are themselves calculated during the forward phase. This is empirically found to provide an effective E step. The M step is then straightforward. \n\nAcknowledgements \n\nThis work has benefited considerably from important discussions with both Bill Bialek and Sam Roweis. John Hopfield has provided invaluable advice and mentoring to MS. We thank Jennifer Linden and Philip Sabes for useful comments on an earlier version of the manuscript. Funding for various components of the work has been provided by the Keck Foundation, the Sloan Center for Theoretical Neuroscience at Caltech, the Center for Neuromorphic Systems Engineering at Caltech, and the National Institutes of Health. \n\nReferences \n\nDempster, A. P., N. M. Laird, and D. B. Rubin (1977). J. Royal Stat. Soc. B 39, 1-38. 
\n\nFee, M. S., P. P. Mitra, and D. Kleinfeld (1996). J. Neurophys. 76(3),3823-3833. \nGharamani, Z. and M. I. Jordan (1997). Machine Learning 29, 245-275. \nLewicki, M. S. (1994). Neural Compo 6(5), 1005-1030. \nMcDonough, R. N. and A. D. Whalen (1995). Detection of Signals in Noise (2nd \n\ned.). San Diego: Academic Press. \n\nNowlan, S. J. (1990). In D. S. Touretzky (Ed.), Advances in Neural Information \n\nProcessing Systems 2, San Mateo, CA. Morgan Kaufmann. \n\nPezaris, J. S., M. Sahani, and R. A. Andersen (1997). In J. M. Bower (Ed.), \n\nComputational Neuroscience: Trends in Research, 1997. \n\nSahani, M., J. S. Pezaris, and R. A. Andersen (1998). In J. M. Bower (Ed.), \n\nComputational Neuroscience: Trends in Research, 1998. \n\n\f", "award": [], "sourceid": 1456, "authors": [{"given_name": "Maneesh", "family_name": "Sahani", "institution": null}, {"given_name": "John", "family_name": "Pezaris", "institution": null}, {"given_name": "Richard", "family_name": "Andersen", "institution": null}]}