You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The following works in my python environment, return of pseudobulk data not required but we need some mechanism to output tables that are created through such code
`
import scanpy as sc
import pandas as pd
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests
def pseudobulk_differential_expression(adata, leiden_group, group1, group2):
# Subset the data to include only the cells from the specified Leiden cluster
adata_subset = adata[adata.obs['leiden'].isin(leiden_group)].copy()
# Create a new column combining 'group' and 'rep' for aggregation
adata_subset.obs['group_rep'] = adata_subset.obs['group'].astype(str) + '_' + adata_subset.obs['rep'].astype(str)
# Aggregate to create pseudobulk profiles
pseudobulk = adata_subset.to_df().groupby(adata_subset.obs['group_rep']).mean()
# Get unique combinations of group and rep
group_reps = adata_subset.obs['group_rep'].unique()
# Split the pseudobulk profiles by the biological groups
pseudobulk_group1 = pseudobulk.loc[[gr for gr in group_reps if group1 in gr]]
pseudobulk_group2 = pseudobulk.loc[[gr for gr in group_reps if group2 in gr]]
# Perform t-tests for each gene
p_values = []
fold_changes = []
for gene in adata.var_names:
# Calculate fold change
fold_change = pseudobulk_group1[gene].mean() - pseudobulk_group2[gene].mean()
fold_changes.append(fold_change)
# Perform t-test
t_stat, p_val = ttest_ind(pseudobulk_group1[gene], pseudobulk_group2[gene], equal_var=False)
p_values.append(p_val)
# Adjust p-values for multiple testing using BH correction
bh_corrected_pvals = multipletests(p_values, method='fdr_bh')[1] # what is a less conservative correction?
# Create a DataFrame with the results
results_df = pd.DataFrame({
'gene': adata.var_names,
'fold_change': fold_changes,
'p_value': p_values,
'bh_corrected_p_value': bh_corrected_pvals
})
# Return the results DataFrame
return results_df.sort_values('bh_corrected_p_value'), pd.concat([pseudobulk_group1, pseudobulk_group2] )
`
The text was updated successfully, but these errors were encountered:
The following works in my python environment, return of pseudobulk data not required but we need some mechanism to output tables that are created through such code
`
import scanpy as sc
import pandas as pd
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests
def pseudobulk_differential_expression(adata, leiden_group, group1, group2):
# Subset the data to include only the cells from the specified Leiden cluster
adata_subset = adata[adata.obs['leiden'].isin(leiden_group)].copy()
`
The text was updated successfully, but these errors were encountered: