GImap Documentation

Analysis of the primary screen

Example command line:

In [1]: run crop_reads.py ../sequencing/Sample.fastq ../cropped/Sample.fa 22

Example command line:

In [2]: run align_primary.py ../cropped/Sample.fa Sample LibraryName

The script generates several output files. The most important is a table of the frequencies with which each individual shRNA was detected in the sequencing data. In the above example, this file will be named SampleLibraryName.counts and located in the subdirectory counts. Once this counts file has been generated successfully, the files in the directories cropped and maps can be removed.
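Conceptually, the counts table is a tally of how many cropped reads correspond to each shRNA in the library. The sketch below illustrates that idea only. It uses exact sequence matching in place of the alignment step the script performs, and it assumes the library is available as a hypothetical dict mapping shRNA sequence to shRNA identifier. The function name `count_shrnas` is hypothetical.

```python
from collections import Counter


def count_shrnas(reads, library):
    """Tally reads per shRNA by exact sequence match (illustrative only).

    reads:   iterable of cropped read sequences (strings)
    library: hypothetical dict mapping shRNA sequence -> shRNA identifier
    Returns (Counter of reads per shRNA identifier, number of unmatched reads).
    """
    counts = Counter()
    unmatched = 0
    for seq in reads:
        shrna_id = library.get(seq)
        if shrna_id is None:
            unmatched += 1  # read matches no library sequence
        else:
            counts[shrna_id] += 1
    return counts, unmatched
```

Reads that match no library entry are reported separately rather than dropped silently, which makes it easy to check what fraction of the sequencing data was assigned to shRNAs.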


Example command line:

In [3]: run analyze_primary_screen.py ../results/Example ../counts/Example_Primary_t0.counts ../counts/Example_Primary_untreated.counts ../counts/Example_Primary_treated.counts ../growth/Example_Primary_GK.txt

The script generates output on the terminal screen to summarize statistics of the input data. Example screen output:

Total T0 counts: 15709968
Number of different shRNAs in the T0 sample: 54530
Total untreated counts: 19386703
Number of different shRNAs in the untreated sample: 54630
Total Treated counts: 19907742
Number of different shRNAs in the treated sample: 54252
221 shRNA species found in the untreated sample but not the t0 sample.
121 shRNA species found in the t0 sample but not the untreated sample.
151 shRNA species found in the treated sample but not the t0 sample.
429 shRNA species found in the t0 sample but not the treated sample.
34 shRNA species found in the treated but not the untreated sample.
104 shRNA species found in the untreated but not the treated sample.
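The summary statistics above follow directly from the per-shRNA counts: a total is the sum of all counts in a sample, the number of different shRNAs is the number with a non-zero count, and the "found in one sample but not the other" lines are set differences. A minimal sketch, assuming each sample has already been loaded into a dict mapping shRNA identifier to read count (the loading step is omitted, and the function names are hypothetical):

```python
def summarize(name, counts):
    """Print total reads and the number of distinct shRNAs detected (count > 0)."""
    total = sum(counts.values())
    detected = sum(1 for n in counts.values() if n > 0)
    print(f"Total {name} counts: {total}")
    print(f"Number of different shRNAs in the {name} sample: {detected}")
    return total, detected


def only_in(counts_a, counts_b):
    """Number of shRNA species detected in sample A but not in sample B."""
    detected_a = {k for k, n in counts_a.items() if n > 0}
    detected_b = {k for k, n in counts_b.items() if n > 0}
    return len(detected_a - detected_b)
```

For example, `only_in(untreated, t0)` corresponds to the line "shRNA species found in the untreated sample but not the t0 sample".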

The script generates several output files. The table of gene-level results contains the following columns:
•    Entrez Gene ID
•    Official Gene Symbol
•    Gene information
•    Number of shRNAs for which gammas could be calculated
•    Gene scored as (P)rotective (i.e. shRNAs were enriched) or (S)ensitizing (i.e. shRNAs were depleted) based on gamma phenotype
•    P value for gene based on gamma phenotype, calculated using Mann-Whitney U test
•    P value for gene based on gamma phenotype, calculated using Kolmogorov-Smirnov test
•    Number of shRNAs for which taus could be calculated
•    Gene scored as (P)rotective (i.e. shRNAs were enriched) or (S)ensitizing (i.e. shRNAs were depleted) based on tau phenotype
•    P value for gene based on tau phenotype, calculated using Mann-Whitney U test
•    P value for gene based on tau phenotype, calculated using Kolmogorov-Smirnov test
•    Number of shRNAs for which rhos could be calculated
•    Gene scored as (P)rotective (i.e. shRNAs were enriched) or (S)ensitizing (i.e. shRNAs were depleted) based on rho phenotype
•    P value for gene based on rho phenotype, calculated using Mann-Whitney U test
•    P value for gene based on rho phenotype, calculated using Kolmogorov-Smirnov test

This file is best opened in a program like Microsoft Excel to allow sorting of genes by P value.
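The per-gene columns listed above can be illustrated with SciPy's implementations of the two tests. This is a sketch under stated assumptions, not the script's code: it assumes the phenotypes of a gene's shRNAs are compared against those of the negative control shRNAs, and it assumes the P/S call is made from the sign of the difference between the gene's median phenotype and the control median. The function name `score_gene` is hypothetical.

```python
import statistics

from scipy.stats import ks_2samp, mannwhitneyu


def score_gene(gene_values, control_values):
    """Score one gene from the phenotypes (gamma, tau or rho) of its shRNAs.

    gene_values:    phenotypes of the shRNAs targeting the gene
    control_values: phenotypes of the negative control shRNAs
    Returns (number of shRNAs, "P" or "S", Mann-Whitney P, Kolmogorov-Smirnov P).
    """
    # Assumption: enriched (positive shift) -> Protective, depleted -> Sensitizing.
    shift = statistics.median(gene_values) - statistics.median(control_values)
    call = "P" if shift > 0 else "S"
    p_mw = mannwhitneyu(gene_values, control_values,
                        alternative="two-sided").pvalue
    p_ks = ks_2samp(gene_values, control_values).pvalue
    return len(gene_values), call, p_mw, p_ks
```

If each row is the tuple returned by `score_gene`, `sorted(rows, key=lambda row: row[2])` ranks genes by their Mann-Whitney P value, which mirrors the spreadsheet sorting described above.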

We have provided mathematical definitions of the quantitative phenotypes (gamma, tau and rho) and of the approach for calculating P values in the following publication: Kampmann M, Bassik MC, Weissman JS (2013) Integrated platform for genome-wide screening and construction of high-density genetic interaction maps in mammalian cells. PNAS 110(25):E2317-26.
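The publication above gives the exact definitions. Purely to convey the general idea of a growth-normalized enrichment phenotype, the sketch below computes a simplified stand-in. It takes the log2 ratio of an shRNA's read frequency in an end-point sample versus a reference sample, centres it on the median of the negative controls, and divides by a number of population doublings. The pseudocount, the centring step and the function names are assumptions of this sketch and are not taken from the script.

```python
import math
import statistics


def log2_ratios(counts_end, counts_ref, pseudocount=1.0):
    """Per-shRNA log2 ratio of read frequencies (end sample vs reference).

    counts_end, counts_ref: dicts mapping shRNA id -> read count.
    The pseudocount avoids taking the log of zero (assumption of this sketch).
    """
    total_end = sum(counts_end.values())
    total_ref = sum(counts_ref.values())
    ratios = {}
    for shrna, ref in counts_ref.items():
        freq_end = (counts_end.get(shrna, 0) + pseudocount) / total_end
        freq_ref = (ref + pseudocount) / total_ref
        ratios[shrna] = math.log2(freq_end / freq_ref)
    return ratios


def phenotype(counts_end, counts_ref, controls, doublings):
    """Log2 ratio centred on the median negative control, divided by the
    number of population doublings (hypothetical simplification)."""
    ratios = log2_ratios(counts_end, counts_ref)
    centre = statistics.median(ratios[c] for c in controls)
    return {s: (r - centre) / doublings for s, r in ratios.items()}
```

Under this simplification, a gamma-like value would compare the untreated end point with t0, and a rho-like value would compare the treated end point with the untreated end point. The number of doublings is simply an input here.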


The script also generates graphical output, displayed in individual pop-up windows.

These graphs are saved as graphics files named with the prefix specified on the command line. They can also be explored interactively, e.g. by zooming into areas of the P value scatter plots.

After running the script, the following functions can be called on the command line to explore the data interactively: gene_gamma(), gene_tau() and gene_rho(). Each function is called with the symbol of the gene of interest. For example, the following command:

In [4]: gene_rho('VPS54')

generates the following graph:

Each dot represents an shRNA. X-axis: log2-transformed deep-sequencing counts of cells expressing this shRNA in the untreated population at the endpoint of the experiment. Y-axis: log2-transformed deep-sequencing counts of cells expressing this shRNA in the treated population at the endpoint of the experiment. Grey semi-transparent dots represent negative control shRNAs. Solidly colored dots represent shRNAs targeting the gene VPS54; they are colored according to a heatmap representing the resistance phenotype (rho).
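A plot of this kind can be sketched with matplotlib. The function below is hypothetical and is not the script's gene_rho(). It assumes the end-point counts and rho values have already been loaded into dicts keyed by shRNA identifier, and that all counts are positive so the log2 is defined. It builds a `Figure` directly rather than opening a pop-up window; the result can be written to disk with `fig.savefig(...)`.

```python
import math

from matplotlib.figure import Figure


def plot_gene(untreated, treated, controls, gene_shrnas, rho, gene_name):
    """Scatter log2 end-point counts: untreated (x) vs treated (y).

    untreated, treated: dicts mapping shRNA id -> read count (assumed > 0)
    controls:           ids of negative control shRNAs
    gene_shrnas:        ids of shRNAs targeting the gene of interest
    rho:                dict mapping shRNA id -> rho phenotype
    """
    fig = Figure()
    ax = fig.subplots()
    # Negative control shRNAs: grey, semi-transparent.
    ax.scatter([math.log2(untreated[s]) for s in controls],
               [math.log2(treated[s]) for s in controls],
               color="grey", alpha=0.3, label="negative controls")
    # shRNAs targeting the gene: solid dots coloured by rho on a heatmap scale.
    points = ax.scatter([math.log2(untreated[s]) for s in gene_shrnas],
                        [math.log2(treated[s]) for s in gene_shrnas],
                        c=[rho[s] for s in gene_shrnas], cmap="coolwarm",
                        edgecolors="black", label=gene_name)
    fig.colorbar(points, ax=ax, label="rho")
    ax.set_xlabel("log2 counts, untreated")
    ax.set_ylabel("log2 counts, treated")
    ax.legend()
    return fig, ax
```

Drawing the controls first keeps the gene's shRNAs on top, so they remain visible even when they overlap the dense cloud of control dots.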