Add Data Explanation #15

Open
syncrostone opened this issue Jul 6, 2016 · 3 comments

@syncrostone (Collaborator)

Add a clear description of each test dataset's labelling preferences: 00 includes inactive neurons, while 01-04 favor active neurons.

Add definitions of recall/precision/inclusion/exclusion, along with predictions that follow from the above differences in the data: algorithms that prefer active neurons should do best on 01-04, with low recall but high precision on 00; algorithms that prefer inactive neurons should do best on 00, with high recall but low precision on 01-04.
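(A minimal sketch of how precision and recall could be reported for a submission, assuming detected and ground-truth ROIs are compared by greedy centroid matching; the `max_dist` threshold and the matching rule are illustrative placeholders, not the project's actual scoring code.)

```python
import numpy as np

def match_rois(detected, ground_truth, max_dist=5.0):
    """Greedily match detected ROI centroids to ground-truth centroids.

    detected, ground_truth: arrays of shape (n, 2) holding (row, col) centroids.
    Returns the number of matched pairs (true positives).
    """
    unused = list(range(len(ground_truth)))
    matches = 0
    for d in detected:
        if not unused:
            break
        dists = [np.hypot(*(d - ground_truth[g])) for g in unused]
        best = int(np.argmin(dists))
        if dists[best] <= max_dist:
            matches += 1
            unused.pop(best)
    return matches

def precision_recall(detected, ground_truth, max_dist=5.0):
    tp = match_rois(detected, ground_truth, max_dist)
    precision = tp / len(detected) if len(detected) else 0.0       # fraction of detections that are real cells
    recall = tp / len(ground_truth) if len(ground_truth) else 0.0  # fraction of real cells that were found
    return precision, recall

# Example: an algorithm that only finds active cells, scored against a
# labelling that also includes inactive cells -> high precision, low recall.
detected = np.array([[10.0, 12.0], [40.0, 41.0]])
ground_truth = np.array([[10.5, 12.2], [40.2, 40.8], [70.0, 15.0], [90.0, 90.0]])
print(precision_recall(detected, ground_truth))  # -> (1.0, 0.5)
```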

This will hopefully encourage labs to submit their algorithms even when they are not the top performers, since no algorithm is ideal across all of the data sets provided. Additionally, labs should be able to post an explanation of their results so they can present them in the best light (and make sense of them).

@marius10p

Are you sure 01-04 actually prefer active neurons, or is this just relative to 00? In my experience, most people would manually choose their cells on the mean image, and make sure they have some activity. However, due to neuropil contamination, everything selected on the mean image has activity anyway.

@Selmaan commented Aug 8, 2016

The methods for 01-04 vary, as far as I understand it, but all include time series information in one form or another. The Harvey lab datasets, for example, use no information from the mean image in detecting cells, so it is impossible for them to select completely inactive cells. We use an optimized spectral clustering approach on very small spatial windows, and manual annotators adjust clustering parameters in real time while viewing the resulting traces and the neuropil subtraction from an individually fit linear model. Thus the manual annotator never 'draws' ROIs on any image, and we do not falsely attribute a neuropil signal present at inactive cells to cellular activity. However, the manual annotator knows where cells are likely to be and what they should look like, so this guides their cell selection compared to an unsupervised approach. We think we are essentially looking at the same information as the factorization-based approaches, but doing an inordinate amount of manual adjustment and fine-tuning in real time for every cell, which would be fantastic to automate away!
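(For readers unfamiliar with the per-cell linear neuropil correction mentioned above, here is a minimal sketch under simplifying assumptions: the cell and neuropil traces are already extracted, and an ordinary least-squares fit stands in for however the coefficient is actually chosen in the Harvey lab pipeline.)

```python
import numpy as np

def subtract_neuropil(f_cell, f_neuropil):
    """Fit a per-cell linear model f_cell ~ a * f_neuropil + b and return the residual.

    f_cell, f_neuropil: 1-D fluorescence traces for one ROI and its surrounding neuropil.
    The residual keeps only the signal not explained by the local neuropil, so an ROI
    whose apparent activity is pure contamination ends up close to flat.
    """
    A = np.vstack([f_neuropil, np.ones_like(f_neuropil)]).T
    coeffs, *_ = np.linalg.lstsq(A, f_cell, rcond=None)
    a, b = coeffs
    return f_cell - (a * f_neuropil + b)

# Example: a "cell" whose trace is 70% neuropil plus noise is mostly flattened.
rng = np.random.default_rng(0)
neuropil = rng.normal(size=1000).cumsum()              # slow contaminating signal
cell = 0.7 * neuropil + rng.normal(scale=0.1, size=1000)
corrected = subtract_neuropil(cell, neuropil)
print(corrected.std(), cell.std())                      # corrected trace has far less variance
```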

I think some datasets are more like what you're describing, though, involving manual 'circling' of cells on some kind of image that may include pixel-to-pixel correlations. A good explanation of the 'truth' for each dataset would be very useful as we see how algorithms match up against these various truth definitions. Glad to provide more specific info if it'd be useful for the project.

@marius10p

Great, it sounds like all datasets should be annotated with the Harvey/Selmaan method!

Take a look at the 03 series (image attached). I would be very surprised if they used any time series information. The temptation to go after all donuts is high.

[image attached: losonczy30]
