Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add custom node to do pseudobulk #116

Open
jessegmeyerlab opened this issue Apr 22, 2024 · 0 comments
Open

add custom node to do pseudobulk #116

jessegmeyerlab opened this issue Apr 22, 2024 · 0 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@jessegmeyerlab
Copy link
Member

The following works in my python environment, return of pseudobulk data not required but we need some mechanism to output tables that are created through such code

`
import scanpy as sc
import pandas as pd
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests

def pseudobulk_differential_expression(adata, leiden_group, group1, group2):
# Subset the data to include only the cells from the specified Leiden cluster
adata_subset = adata[adata.obs['leiden'].isin(leiden_group)].copy()

# Create a new column combining 'group' and 'rep' for aggregation  
adata_subset.obs['group_rep'] = adata_subset.obs['group'].astype(str) + '_' + adata_subset.obs['rep'].astype(str)  
  
# Aggregate to create pseudobulk profiles  
pseudobulk = adata_subset.to_df().groupby(adata_subset.obs['group_rep']).mean()  
  
# Get unique combinations of group and rep  
group_reps = adata_subset.obs['group_rep'].unique()  
  
# Split the pseudobulk profiles by the biological groups  
pseudobulk_group1 = pseudobulk.loc[[gr for gr in group_reps if group1 in gr]]  
pseudobulk_group2 = pseudobulk.loc[[gr for gr in group_reps if group2 in gr]]  

# Perform t-tests for each gene  
p_values = []  
fold_changes = []  
for gene in adata.var_names:  
    # Calculate fold change  
    fold_change = pseudobulk_group1[gene].mean() - pseudobulk_group2[gene].mean()  
    fold_changes.append(fold_change)  
      
    # Perform t-test  
    t_stat, p_val = ttest_ind(pseudobulk_group1[gene], pseudobulk_group2[gene], equal_var=False)  
    p_values.append(p_val)  
  
# Adjust p-values for multiple testing using BH correction  
bh_corrected_pvals = multipletests(p_values, method='fdr_bh')[1]  # what is a less conservative correction?

# Create a DataFrame with the results  
results_df = pd.DataFrame({  
    'gene': adata.var_names,  
    'fold_change': fold_changes,  
    'p_value': p_values,  
    'bh_corrected_p_value': bh_corrected_pvals  
})  
  
# Return the results DataFrame  
return results_df.sort_values('bh_corrected_p_value'), pd.concat([pseudobulk_group1, pseudobulk_group2] )

`

@AlexandreHutton AlexandreHutton added enhancement New feature or request good first issue Good for newcomers labels Jul 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

2 participants