Add Data Explanation #15

Open
syncrostone opened this issue Jul 6, 2016 · 3 comments

@syncrostone (Collaborator)

Add a clear description of each test dataset's labelling preferences: 00 includes inactive neurons, while 01-04 favor active neurons.

Add definitions of recall/precision/inclusion/exclusion, along with predictions that follow from the above differences in the data: algorithms that prefer active neurons should do best on 01-04, with low recall but high precision on 00; algorithms that prefer inactive neurons should do best on 00, with high recall but low precision on 01-04.
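(A minimal sketch of how precision and recall could be reported for a submission, assuming detected and ground-truth ROIs are compared by greedy centroid matching; the `max_dist` threshold and the matching rule are illustrative placeholders, not the project's actual scoring code.)

```python
import numpy as np

def match_rois(detected, ground_truth, max_dist=5.0):
    """Greedily match detected ROI centroids to ground-truth centroids.

    detected, ground_truth: arrays of shape (n, 2) holding (row, col) centroids.
    Returns the number of matched pairs (true positives).
    """
    unused = list(range(len(ground_truth)))
    matches = 0
    for d in detected:
        if not unused:
            break
        dists = [np.hypot(*(d - ground_truth[g])) for g in unused]
        best = int(np.argmin(dists))
        if dists[best] <= max_dist:
            matches += 1
            unused.pop(best)
    return matches

def precision_recall(detected, ground_truth, max_dist=5.0):
    tp = match_rois(detected, ground_truth, max_dist)
    precision = tp / len(detected) if len(detected) else 0.0       # fraction of detections that are real cells
    recall = tp / len(ground_truth) if len(ground_truth) else 0.0  # fraction of real cells that were found
    return precision, recall

# Example: an algorithm that only finds active cells, scored against a
# labelling that also includes inactive cells -> high precision, low recall.
detected = np.array([[10.0, 12.0], [40.0, 41.0]])
ground_truth = np.array([[10.5, 12.2], [40.2, 40.8], [70.0, 15.0], [90.0, 90.0]])
print(precision_recall(detected, ground_truth))  # -> (1.0, 0.5)
```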

This will hopefully encourage labs to submit their algorithms even when they are not the top performers, since no algorithm is ideal across all of the data sets provided. Additionally, labs should be able to post an explanation of their results so they can present them in the best light (and make sense of them).

@marius10p

Are you sure 01-04 actually prefer active neurons, or is this just relative to 00? In my experience, most people would manually choose their cells on the mean image, and make sure they have some activity. However, due to neuropil contamination, everything selected on the mean image has activity anyway.

@Selmaan commented Aug 8, 2016

The methods for 01-04 vary, as far as I understand it, but all include time series information in one form or another. The Harvey lab datasets, for example, use no information from the mean image in detecting cells, so it is impossible for them to select completely inactive cells. We use an optimized spectral clustering approach on very small spatial windows, and manual annotators adjust clustering parameters in real time while viewing the resulting traces and the neuropil subtraction from an individually fit linear model. Thus the manual annotator never 'draws' ROIs on any image, and we do not falsely attribute a neuropil signal present at inactive cells to cellular activity. However, the manual annotator knows where cells are likely to be and what they should look like, so this guides their cell selection compared to an unsupervised approach. We think we are essentially looking at the same information as the factorization-based approaches, but doing an inordinate amount of manual adjustment and fine-tuning in real time for every cell, which would be fantastic to automate away!
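(For readers unfamiliar with the per-cell linear neuropil correction mentioned above, here is a minimal sketch under simplifying assumptions: the cell and neuropil traces are already extracted, and an ordinary least-squares fit stands in for however the coefficient is actually chosen in the Harvey lab pipeline.)

```python
import numpy as np

def subtract_neuropil(f_cell, f_neuropil):
    """Fit a per-cell linear model f_cell ~ a * f_neuropil + b and return the residual.

    f_cell, f_neuropil: 1-D fluorescence traces for one ROI and its surrounding neuropil.
    The residual keeps only the signal not explained by the local neuropil, so an ROI
    whose apparent activity is pure contamination ends up close to flat.
    """
    A = np.vstack([f_neuropil, np.ones_like(f_neuropil)]).T
    coeffs, *_ = np.linalg.lstsq(A, f_cell, rcond=None)
    a, b = coeffs
    return f_cell - (a * f_neuropil + b)

# Example: a "cell" whose trace is 70% neuropil plus noise is mostly flattened.
rng = np.random.default_rng(0)
neuropil = rng.normal(size=1000).cumsum()              # slow contaminating signal
cell = 0.7 * neuropil + rng.normal(scale=0.1, size=1000)
corrected = subtract_neuropil(cell, neuropil)
print(corrected.std(), cell.std())                      # corrected trace has far less variance
```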

I think some datasets are more like what you're describing, though, involving manual 'circling' of cells on some kind of image that may include pixel-to-pixel correlations. A good explanation of the 'truth' for each dataset would be very useful as we see how algorithms match up against these various truth definitions. Glad to provide more specific info if it'd be useful for the project.

@marius10p

Great, it sounds like all datasets should be annotated with the Harvey/Selmaan method!

Take a look at the 03 series (image attached). I would be very surprised if they used any time series information. The temptation to go after all donuts is high.

[image attached: losonczy30]
