Index problem in `run_enact()` #13

Nick-Eagles · 2024-11-27T15:17:31Z

Hello,

Thanks for developing this useful software. I'm attempting to run the complete ENACT pipeline from within Python. When invoking the run_enact() method of an ENACT object, many steps seem to complete, but the analysis ultimately halts with a complaint that appears to reference a mismatch in cell IDs. I'll attach the full configuration and traceback below.

Best,
-Nick

Configuration as printed when running run_enact():

analysis_name: H1-W369TJK_D1_9090
 run_synthetic: False
 cache_dir: /dcs04/lieber/lcolladotor/Habenula_R01_LIBD4270/Habenula_Visium/processed-data/09_HD_cell_level/enact/H1-W369TJK_D1_9090
 wsi_path: /dcs04/lieber/lcolladotor/Habenula_R01_LIBD4270/Habenula_Visium/raw-data/images/vis-hd/H1-W369TJK_D1_9090.tif
 visiumhd_h5_path: /dcs04/lieber/lcolladotor/Habenula_R01_LIBD4270/Habenula_Visium/processed-data/01_spaceranger/H1-W369TJK_D1_9090/outs/binned_outputs/square_002um/filtered_feature_bc_matrix.h5
 tissue_positions_path: /dcs04/lieber/lcolladotor/Habenula_R01_LIBD4270/Habenula_Visium/processed-data/01_spaceranger/H1-W369TJK_D1_9090/outs/binned_outputs/square_002um/spatial/tissue_positions.parquet
 segmentation: True
 bin_to_geodataframes: True
 bin_to_cell_assignment: True
 cell_type_annotation: True
 seg_method: stardist
 patch_size: 4000
 bin_representation: polygon
 bin_to_cell_method: weighted_by_area
 cell_annotation_method: celltypist
 cell_typist_model: Developing_Human_Brain.pkl
 use_hvg: True
 n_hvg: 1000
 n_clusters: 4
 chunks_to_run: []
 cell_markers: {}

Traceback:

Traceback (most recent call last):
  File "/dcs04/lieber/lcolladotor/Habenula_R01_LIBD4270/Habenula_Visium/code/09_HD_cell_level/enact/01_run_enact.py", line 28, in <module>
    so_hd.run_enact()
  File "/jhpce/shared/libd/core/visium_hd/1.0/hd_env/lib/python3.9/site-packages/enact/pipeline.py", line 1040, in run_enact
    self.package_results()
  File "/jhpce/shared/libd/core/visium_hd/1.0/hd_env/lib/python3.9/site-packages/enact/pipeline.py", line 981, in package_results
    adata = pack_obj.df_to_adata(results_df, cell_by_gene_df)
  File "/jhpce/shared/libd/core/visium_hd/1.0/hd_env/lib/python3.9/site-packages/enact/package_results.py", line 111, in df_to_adata
    adata.obsm["spatial"] = results_df[spatial_cols].astype(int)
  File "/jhpce/shared/libd/core/visium_hd/1.0/hd_env/lib/python3.9/site-packages/anndata/_core/aligned_mapping.py", line 199, in __setitem__
    value = self._validate_value(value, key)
  File "/jhpce/shared/libd/core/visium_hd/1.0/hd_env/lib/python3.9/site-packages/anndata/_core/aligned_mapping.py", line 264, in _validate_value
    raise ValueError(msg) from None
ValueError: value.index does not match parent’s obs names:
Index are different

Index values are different (100.0 %)
[left]:  Index(['ID_10822', 'ID_10867', 'ID_10934', 'ID_11145', 'ID_11195', 'ID_11507',
       'ID_11785', 'ID_12211', 'ID_4692', 'ID_4693',
       ...
       'ID_9269', 'ID_9270', 'ID_9272', 'ID_9273', 'ID_9275', 'ID_9276',
       'ID_9277', 'ID_9278', 'ID_9279', 'ID_9282'],
      dtype='object', name='id', length=59071)
[right]: Index(['ID_1', 'ID_10', 'ID_100', 'ID_1000', 'ID_1002', 'ID_1003', 'ID_1004',
       'ID_1006', 'ID_1007', 'ID_1008',
       ...
       'ID_60482', 'ID_60485', 'ID_60493', 'ID_60497', 'ID_60498', 'ID_60499',
       'ID_60501', 'ID_60502', 'ID_60507', 'ID_60508'],
      dtype='object', name='id', length=59071)
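
Both indexes in the error have the same length (59071), which might mean they hold the same cell IDs in a different order, although the traceback alone does not confirm this. A minimal sketch of how one could check, using made-up IDs and a hypothetical helper `compare_indexes` (not part of ENACT or anndata):

```python
import pandas as pd


def compare_indexes(left: pd.Index, right: pd.Index) -> dict:
    """Summarise how two ID indexes differ (hypothetical helper)."""
    left_ids, right_ids = set(left), set(right)
    return {
        # True only if both indexes hold the same IDs in the same order
        "same_order": bool(left.equals(right)),
        # True if the IDs match once order is ignored
        "same_ids": left_ids == right_ids,
        # IDs present on one side only
        "only_left": sorted(left_ids - right_ids),
        "only_right": sorted(right_ids - left_ids),
    }


# Toy stand-ins for results_df.index and adata.obs_names (IDs are made up)
results_index = pd.Index(["ID_10", "ID_2", "ID_1"], name="id")
obs_names = pd.Index(["ID_1", "ID_10", "ID_2"], name="id")

report = compare_indexes(results_index, obs_names)
print(report)
```

If `same_ids` is true while `same_order` is false, the mismatch is purely one of ordering and reordering one side should be enough; otherwise some cells are missing from one of the two tables.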

XinchaoWu99 · 2024-12-03T18:27:37Z

Same problem and possible solution
Hi, I encountered the same problem:

ValueError Traceback (most recent call last)

Cell In[2], line 20
2 sample = "Visium_HD_060424_5X" # Visium_HD_060424_5X Visium_HD_060424-WT
4 so_hd = ENACT(
5 cache_dir=f"{data_path}/test_cache",
6 wsi_path=f"{data_path}/2024_05_22_04_5xfad.tif", # "2024_05_22_04_5xfad.tif" "2024_05_22_02_ctrl.tif"
(...)
17 cell_type_annotation=True,
18 )
---> 20 so_hd.run_enact()

File ~/.local/lib/python3.9/site-packages/enact/pipeline.py:1040, in ENACT.run_enact(self)
1038 if self.cell_type_annotation:
1039 self.run_cell_type_annotation()
-> 1040 self.package_results()
1042 else:
1043 # Generating synthetic data
1044 if self.analysis_name in ["xenium", "xenium_nuclei"]:

File ~/.local/lib/python3.9/site-packages/enact/pipeline.py:981, in ENACT.package_results(self)
977 cell_by_gene_df = pack_obj.merge_cellassign_output_files()
978 results_df = pd.read_csv(
979 os.path.join(self.cellannotation_results_dir, "merged_results.csv")
980 )
--> 981 adata = pack_obj.df_to_adata(results_df, cell_by_gene_df)
982 pack_obj.save_adata(adata)
983 pack_obj.create_tmap_file()

File ~/.local/lib/python3.9/site-packages/enact/package_results.py:111, in PackageResults.df_to_adata(self, results_df, cell_by_gene_df)
109 adata = anndata.AnnData(cell_by_gene_df.set_index("id"))
110 # adata = anndata.AnnData(results_df[stat_columns].astype(int))
--> 111 adata.obsm["spatial"] = results_df[spatial_cols].astype(int)
112 adata.obsm["stats"] = results_df[stat_columns].astype(int)
113 # This column is the output of cell type inference pipeline

File ~/.local/lib/python3.9/site-packages/anndata/_core/aligned_mapping.py:199, in AlignedActualMixin.__setitem__(self, key, value)
198 def __setitem__(self, key: str, value: V):
--> 199 value = self._validate_value(value, key)
200 self._data[key] = value

File ~/.local/lib/python3.9/site-packages/anndata/_core/aligned_mapping.py:264, in AxisArraysBase._validate_value(self, val, key)
262 except AssertionError as e:
263 msg = f"value.index does not match parent’s {self.dim} names:\n{e}"
--> 264 raise ValueError(msg) from None
265 else:
266 msg = "Index.equals and pd.testing.assert_index_equal disagree"

ValueError: value.index does not match parent’s obs names:
Index are different

Index values are different (100.0 %)
[left]: Index(['ID_1', 'ID_10', 'ID_100', 'ID_1000', 'ID_1001', 'ID_1002', 'ID_1003',
'ID_1004', 'ID_1005', 'ID_1006',
...
'ID_50670', 'ID_50671', 'ID_50672', 'ID_50673', 'ID_50674', 'ID_50675',
'ID_50676', 'ID_50677', 'ID_50678', 'ID_50679'],
dtype='object', name='id', length=21796)
[right]: Index(['ID_10003', 'ID_10005', 'ID_10007', 'ID_10010', 'ID_10012', 'ID_10021',
'ID_10023', 'ID_10026', 'ID_10035', 'ID_10037',
...
'ID_49667', 'ID_49668', 'ID_49670', 'ID_49671', 'ID_49672', 'ID_49675',
'ID_49676', 'ID_49678', 'ID_49684', 'ID_49685'],
dtype='object', name='id', length=21796)

I think there is a bug in PackageResults.df_to_adata(): it sets adata.obsm directly from results_df, whose index may not match the obs names of adata, and that causes the index error. I revised the function as follows to get it working:

import anndata


def df_to_adata(self, results_df, cell_by_gene_df):
    """Converts pd.DataFrame object with pipeline results to AnnData

    Args:
        results_df (pd.DataFrame): per-cell results with an "id" column
        cell_by_gene_df (pd.DataFrame): cell-by-gene counts with an "id" column

    Returns:
        anndata.AnnData: Anndata with pipeline outputs
    """
    spatial_cols = ["cell_x", "cell_y"]
    stat_columns = ["num_shared_bins", "num_unique_bins", "num_transcripts"]
    results_df.loc[:, "id"] = results_df["id"].astype(str)
    results_df = results_df.set_index("id")
    results_df["num_transcripts"] = results_df["num_transcripts"].fillna(0)
    results_df["cell_type"] = results_df["cell_type"].str.lower()
    # adata = anndata.AnnData(cell_by_gene_df.set_index("id").astype(int))
    adata = anndata.AnnData(cell_by_gene_df.set_index("id"))
    # Join the results onto adata.obs by "id" so rows follow adata's cell order
    adata.obs = adata.obs.merge(results_df, on="id").drop_duplicates(keep="first")
    # adata = anndata.AnnData(results_df[stat_columns].astype(int))
    # adata.obsm["spatial"] = results_df[spatial_cols].astype(int)
    adata.obsm["spatial"] = adata.obs[spatial_cols].astype(int)
    # adata.obsm["stats"] = results_df[stat_columns].astype(int)
    adata.obsm["stats"] = adata.obs[stat_columns].astype(int)
    # This column is the output of cell type inference pipeline
    # adata.obs["cell_type"] = results_df[["cell_type"]].astype("category")
    adata.obs["cell_type"] = adata.obs["cell_type"].astype("category")
    # adata.obs["patch_id"] = results_df[["chunk_name"]]
    adata.obs["patch_id"] = adata.obs["chunk_name"]
    adata.obs = adata.obs[["cell_type", "patch_id"]]
    return adata

This works for me.
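
An alternative way to get the same effect, sketched here with plain pandas, is to reorder the results table to the obs names before assigning anything to adata.obsm. The helper name `align_to_obs` and the toy data are assumptions for illustration, not ENACT code, and the sketch assumes the cell IDs in the results are unique:

```python
import pandas as pd


def align_to_obs(results_df: pd.DataFrame, obs_names: pd.Index) -> pd.DataFrame:
    """Reorder results_df rows to follow obs_names (hypothetical helper)."""
    if not results_df.index.is_unique:
        raise ValueError("results_df has duplicate cell IDs")
    missing = obs_names.difference(results_df.index)
    if len(missing) > 0:
        raise KeyError(f"{len(missing)} cell IDs missing from results_df")
    # reindex returns the rows in exactly the order given by obs_names
    return results_df.reindex(obs_names)


# Toy stand-ins: per-cell results in one order, obs names in another
results_df = pd.DataFrame(
    {"cell_x": [30, 10, 20], "cell_y": [3, 1, 2]},
    index=pd.Index(["ID_3", "ID_1", "ID_2"], name="id"),
)
obs_names = pd.Index(["ID_1", "ID_2", "ID_3"], name="id")

aligned = align_to_obs(results_df, obs_names)
print(aligned)
```

The aligned frame has the same index as `obs_names`, so a slice such as `aligned[["cell_x", "cell_y"]]` should pass the index check that raised the ValueError above. Raising on missing or duplicate IDs is a design choice here: it surfaces a genuine mismatch instead of silently filling rows with NaN.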

stc120121 · 2024-12-05T14:33:05Z

Thanks for catching this and suggesting a fix! We’ve implemented a very similar solution, and it’ll be rolled out soon with some other updates

inofechm mentioned this issue Dec 9, 2024

ValueError: could not broadcast input array from shape #17

Open
