GImap Documentation

Analysis of genetic interactions

Example command line:

In [1]: run crop_reads.py ../sequencing/DoubleshRNASample.fastq ../cropped/ DoubleshRNASample.fa 26
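As a rough illustration of what cropping reads can involve, the sketch below keeps the first few bases of every read in a FASTQ stream and formats the result as FASTA. It is not the code of crop_reads.py. The helper names crop_fastq_records and to_fasta are hypothetical, and both the four-line FASTQ layout and the reading of the final argument (26) as the number of bases to keep are assumptions.

```python
from typing import Iterable, Iterator, Tuple


def crop_fastq_records(lines: Iterable[str], length: int) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, cropped_sequence) from four-line FASTQ records.

    Hypothetical helper for illustration; not part of the GImap scripts.
    """
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:  # header, sequence, '+', qualities
            header, sequence = record[0], record[1]
            yield header[1:], sequence[:length]  # drop the leading '@'
            record = []


def to_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Format (read_id, sequence) pairs as FASTA text."""
    return "".join(f">{read_id}\n{seq}\n" for read_id, seq in records)


# Made-up reads, cropped to 4 bases to keep the example short.
fastq = [
    "@read1\n", "ACGTACGTACGT\n", "+\n", "IIIIIIIIIIII\n",
    "@read2\n", "TTTTGGGGCCCC\n", "+\n", "IIIIIIIIIIII\n",
]
print(to_fasta(crop_fastq_records(fastq, 4)))
```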

Example command line:

In [2]: run align_double_shRNAs.py ../cropped/DoubleshRNASample.fa DoubleshRNASample

The script generates several output files, most importantly a table of the frequencies with which each double-shRNA was detected in the sequencing data. In the above example, this file is named DoubleshRNASample.counts and is located in the subdirectory counts. Once the counts file has been generated successfully, the files in the directory cropped can be removed.
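Conceptually, such a table records how often each ordered pair of shRNAs was observed. The sketch below shows one way to tally pairs, assuming that alignment has already assigned each read a (first position, second position) pair of shRNA names. The function count_double_shrnas is hypothetical, the pairs are made up, and the actual layout of the .counts file is not reproduced here.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_double_shrnas(aligned_pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Tally ordered (first position, second position) shRNA pairs.

    Hypothetical helper; AB and BA are counted as distinct double-shRNAs.
    """
    return Counter(aligned_pairs)


# Made-up alignment results for four reads.
pairs = [
    ("SEC23B_4", "CCT7_2"),
    ("SEC23B_4", "CCT7_2"),
    ("CCT7_2", "SEC23B_4"),
    ("SEC23B_4", "SEC23B_4"),
]
counts = count_double_shrnas(pairs)
total = sum(counts.values())
for (first, second), n in counts.most_common():
    print(first, second, n, n / total)
```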


Example:

In [3]: run double_shRNA_phenotypes.py ../results/Rep1
File with list of plasmids in the first position (XbaI/KpnI backbone): ../LUT/Double_shRNA_plasmids.txt
File with list of plasmids in the second position (AvrII/KpnI insert): ../LUT/Double_shRNA_plasmids.txt
Use t0 counts? (y/n): y
Name of the t0 counts file: ../counts/Rep1_t0.counts
Name of the untreated counts file: ../counts/Rep1_untreated.counts
Use treated counts? (y/n): y
Name of the treated counts file: ../counts/Rep1_treated.counts
Name of GK file: ../growth/Rep1_GK.txt
Minimal fraction of combinations required to include shRNA (default: 0.5): 0.5
Count ratio 0.869394877872
WT Log2E -1.1717825642
Expected in position 1 but not sufficiently frequent in rhos:
5861_RAB1A_16
Expected in position 2 but not sufficiently frequent in rhos:

Count ratio 1.21154456258
WT Log2E 0.428613912177
Expected in position 1 but not sufficiently frequent in gammas:
5861_RAB1A_16
Expected in position 2 but not sufficiently frequent in gammas:

Count ratio 1.05331063702
WT Log2E -0.742951508028
Expected in position 1 but not sufficiently frequent in taus:
5861_RAB1A_16
Expected in position 2 but not sufficiently frequent in taus:

The script generates several graphs. These graphs analyze the phenotypes of pairs of shRNAs, A and B, in the AB and BA orientations. AB and BA phenotypes should be very similar; large differences between AB and BA therefore mostly reflect experimental noise. The distribution of AB–BA differences in the entire dataset is visualized by a histogram:
double_shRNA_phenotypes.py Screenshot
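The AB–BA comparison described above can be sketched as follows. The code assumes that phenotypes are available as a dictionary keyed by (first position, second position); the function name orientation_differences and the example values are hypothetical, and the script itself may organize its data differently.

```python
from typing import Dict, Tuple


def orientation_differences(
    phenotypes: Dict[Tuple[str, str], float]
) -> Dict[Tuple[str, str], float]:
    """Return {(A, B): AB - BA} for every pair measured in both orientations.

    Each unordered pair is reported once, keyed with A < B alphabetically.
    """
    diffs = {}
    for (first, second), value in phenotypes.items():
        if first < second and (second, first) in phenotypes:
            diffs[(first, second)] = value - phenotypes[(second, first)]
    return diffs


# Made-up phenotypes; the A/C pair was only measured in one orientation.
phenotypes = {
    ("shA", "shB"): 0.5,
    ("shB", "shA"): 0.25,
    ("shA", "shC"): -1.0,
}
diffs = orientation_differences(phenotypes)
print(diffs)
```

The values of the returned dictionary are what a histogram of AB–BA differences would be drawn from.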

Two-dimensional histograms (based on hexagonally binned heatmaps) explore the dependence of noise on phenotypic strength and abundance of double shRNAs at time t0:
double_shRNA_phenotypes.py Screenshot
double_shRNA_phenotypes.py Screenshot
Less abundant double-shRNAs typically give noisier phenotypes, indicating that data quality can be further improved by scaling up the cell population for the pooled screen.

A further internal quality control is the comparison of phenotypes of double-shRNAs containing one shRNA targeting a hit gene and one negative control shRNA in the two possible orientations:
double_shRNA_phenotypes.py Screenshot
A small systematic deviation is observed since shRNAs in the second position of the double-shRNAs are slightly more effective.

The script also generates several output files, which are used by scripts in subsequent steps.


Example:

In [4]: run calculate_GIs.py ../results/Rep1
Use rhos? (y/n): y
Use gammas? (y/n): n
Use taus? (y/n): n

The script generates several graphs in pop-up windows:
calculate_GIs.py Screenshot
Comparison of different definitions for the expected double-mutant phenotype. Orange line: sum definition; green line: product definition; blue: empirical linear fit for each shRNA (dark blue dots indicate the average phenotypes and slopes for shRNAs targeting a given gene; light blue error bars indicate the standard deviation).
calculate_GIs.py Screenshot

Comparison of slopes for empirical linear fits for shRNAs obtained either along “rows” or “columns” of the double-shRNA phenotype table – both should be identical. This graph can reveal noise or systematic biases.

After the script has run, several functions can be called on the command line to explore the data interactively.
For example, the following command:

In [5]: rho_sd('SEC23B_4', LabelAboveZ=2.5,Link=True)

generates the following graph:
calculate_GIs.py Screenshot
Each dot represents an shRNA. X axis: average phenotype (rho) of double-shRNAs containing this shRNA and one of 12 negative control shRNAs. Y axis: phenotype (rho) of the double-shRNA containing the same shRNA in combination with shRNA SEC23B_4. Blue dots: SEC23B_4 is in the first position of the double-shRNA; red dots: SEC23B_4 is in the second position of the double-shRNA. Dotted blue and red lines indicate phenotypes of SEC23B_4 in combination with negative control shRNAs. If the shRNA combined with SEC23B_4 had no effect, dots should lie on the dotted lines. Solid blue and red lines represent linear fits of the data points. These lines were forced to cross the dotted lines at X = 0. Permuted pairs of shRNAs are linked by lines (for double-shRNAs deviating from the expected double-shRNA phenotype as defined by the solid red and blue lines).
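A linear fit that is forced through a known value at X = 0 has only one free parameter, the slope. The sketch below shows an ordinary least-squares version of such a fit and the resulting deviations from the fitted line. It is an illustration under stated assumptions, not the code of calculate_GIs.py: the function names are hypothetical, the numbers are made up, and treating "observed minus fitted" as the deviation of interest is an assumption.

```python
from typing import List, Sequence


def fit_slope_fixed_intercept(
    xs: Sequence[float], ys: Sequence[float], intercept: float
) -> float:
    """Least-squares slope m of y = intercept + m * x with the intercept held fixed."""
    denominator = sum(x * x for x in xs)
    if denominator == 0:
        raise ValueError("all x values are zero; the slope is undefined")
    return sum(x * (y - intercept) for x, y in zip(xs, ys)) / denominator


def deviations_from_fit(
    xs: Sequence[float], ys: Sequence[float], intercept: float, slope: float
) -> List[float]:
    """Observed y minus the value predicted by the fitted line."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Made-up data: x = phenotype with negative controls, y = phenotype with the query shRNA.
query_with_controls = -0.5  # plays the role of the dotted line
xs = [-1.0, 0.0, 1.0, 2.0]
ys = [-1.0, -0.5, 0.0, 0.5]

slope = fit_slope_fixed_intercept(xs, ys, query_with_controls)
residuals = deviations_from_fit(xs, ys, query_with_controls, slope)
print(slope, residuals)
```

Fixing the intercept encodes the expectation stated above: an shRNA with no phenotype of its own (X = 0) should leave the query shRNA's phenotype unchanged.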

For example, the following command:

In [6]: gene_rho_sd('SEC23B', LabelAboveZ = 1.5, Link = True, PlotSD = True)

generates the following graph:
calculate_GIs.py Screenshot
Same as the previous graph, except that values for shRNAs targeting the same gene are averaged. Error bars represent the standard deviation.


For example, the following command:
   
In [7]: ab_rho_sd_yellow_blue('SEC23B_4',ShowLabels=True)

generates the following graph:
calculate_GIs.py Screenshot
Each dot represents an shRNA. X axis: average phenotype (rho) of double-shRNAs containing this shRNA and one of 12 negative control shRNAs. Y axis: phenotype (rho) of the double-shRNA containing the same shRNA in combination with shRNA SEC23B_4. Values for shRNAs with SEC23B_4 in the first and second position of the double-shRNA were averaged. Additionally, expected double-shRNA phenotypes according to three different definitions are indicated by solid lines: red, linear fit; orange, sum definition; pink, product definition.

The script also generates several output files, which are used by scripts in subsequent steps. These include tables of genetic interactions calculated according to the different definitions we described previously in Kampmann M, Bassik MC, Weissman JS (2013) Integrated platform for genome-wide screening and construction of high-density genetic interaction maps in mammalian cells. PNAS 110(25):E2317-26.

If none of these definitions is a good fit for a given experiment, users can implement additional custom definitions by editing the script calculate_GIs.py.

Repeat calculate_GIs.py for the second replicate of the screen.

The phenotypes (rhos, gammas, taus) for which genetic interactions should be compared are specified interactively. Example:

In [8]: run compare_GIs.py ../results/Rep1 ../results/Rep2
Use rhos? (y/n): y
Use gammas? (y/n): n
Use taus? (y/n): n

The script generates several output files that tabulate genetic interactions averaged between the two replicates. The method used to calculate the genetic interactions is encoded in each file name as PhenotypeExpectedFlip:
Phenotype can be Rho, Gamma or Tau.
Expected can be p (product definition of expected double-shRNA phenotypes), s (sum definition), or f (empirical fit definition).
Flip can be absent for raw genetic interactions, flip1 for Definition 1 of buffering/synergistic genetic interactions, or flip2 for Definition 2 of buffering/synergistic genetic interactions.
We have previously defined these methods for calculating genetic interactions quantitatively in Kampmann M, Bassik MC, Weissman JS (2013) Integrated platform for genome-wide screening and construction of high-density genetic interaction maps in mammalian cells. PNAS 110(25):E2317-26.

For example, the above command line will create, among others, the following output file: ../results/Rep1Rep2_Rhofflip2GI.txt.
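The naming code can be made concrete with a small sketch that enumerates the possible combinations. The template prefix_PhenotypeExpectedFlipGI.txt is inferred from the single example file name above and is an assumption, as is the helper name gi_file_name. Which of these files are actually written depends on the phenotypes selected at the prompts.

```python
from itertools import product

PHENOTYPES = ("Rho", "Gamma", "Tau")
EXPECTED = ("p", "s", "f")        # product, sum, empirical fit
FLIPS = ("", "flip1", "flip2")    # raw, Definition 1, Definition 2


def gi_file_name(prefix: str, phenotype: str, expected: str, flip: str = "") -> str:
    """Build a file name from the method code (template is an assumption)."""
    return f"{prefix}_{phenotype}{expected}{flip}GI.txt"


all_names = [
    gi_file_name("../results/Rep1Rep2", phenotype, expected, flip)
    for phenotype, expected, flip in product(PHENOTYPES, EXPECTED, FLIPS)
]
print(len(all_names))
print(gi_file_name("../results/Rep1Rep2", "Rho", "f", "flip2"))
```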

The script also generates several graphs as pop-up windows:

Two-dimensional histograms (based on hexagonally binned heatmaps) visualize the correlation of genetic interactions between shRNA pairs across the two replicates:
compare_GIs.py Screenshot
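The agreement between replicates can be summarized by a correlation coefficient computed over the shRNA pairs present in both. The sketch below uses the Pearson correlation and assumes that each replicate is available as a dictionary from shRNA pair to genetic interaction value. The function names and the numbers are hypothetical, and compare_GIs.py may compute this differently.

```python
from math import sqrt
from typing import Dict, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def replicate_correlation(
    rep1: Dict[Tuple[str, str], float], rep2: Dict[Tuple[str, str], float]
) -> float:
    """Correlate genetic interactions over pairs present in both replicates."""
    shared = sorted(set(rep1) & set(rep2))
    return pearson([rep1[k] for k in shared], [rep2[k] for k in shared])


# Made-up genetic interaction values for two replicates.
rep1 = {("shA", "shB"): 1.0, ("shA", "shC"): 2.0, ("shB", "shC"): 3.0, ("shC", "shD"): 0.0}
rep2 = {("shA", "shB"): 2.0, ("shA", "shC"): 4.0, ("shB", "shC"): 6.0}
r = replicate_correlation(rep1, rep2)
print(r)
```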

Correlation coefficients for genetic interactions across the two replicates are summarized in a bar graph for the different methods for calculating genetic interactions:
compare_GIs.py Screenshot
Mean genetic interaction values are summarized in a bar graph for genetic interactions calculated using the different methods and averaged between replicates:
compare_GIs.py Screenshot

Comparison of the mean Pearson correlation between the genetic interaction patterns of shRNAs targeting the same gene (“Intra-gene correlation”) or different genes (“Inter-Gene correlation”), for genetic interactions calculated using the different methods and averaged between replicates:
compare_GIs.py Screenshot

For an appropriate definition of genetic interactions, the mean intra-gene correlation should be much larger than the mean inter-gene correlation.



Example command line:

In [9]: run filter_GIs.py ../results/Rep1Rep2_RhofGI 0.8

The script lists the shRNAs that were removed from the dataset as screen output, for example:

Rejected shRNAs: ['C17orf75_3', 'CCT7_2', 'CCT7_3', 'COPB1_1', 'COPB1_2', 'HMGCR_4', 'HMGCR_7', 'HMGCR_8', 'IGF2R_1', 'IGF2R_2', 'PDIA3_1', 'PDIA3_3', 'SRRM1_2', 'SRRM1_3']

Two filtered genetic interaction files are generated, one with the shRNA-based genetic interactions (e.g. ../results/Rep1Rep2_KfGI_minZ0.8.txt), and one with gene-averaged genetic interactions, for which genetic interactions of shRNAs targeting the same gene are averaged (e.g. ../results/Rep1Rep2_KfGI_minZ0.8_GeneGI.txt).
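Gene-level averaging can be sketched as grouping shRNA pairs by the genes they target and taking the mean of each group. The code below is a simplified illustration, not the code of filter_GIs.py: it does not perform the filtering step, it assumes the gene name is the part of the shRNA name before the last underscore, and the function name, shRNA names and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Tuple


def gene_average(
    shrna_gis: Dict[Tuple[str, str], float]
) -> Dict[Tuple[str, str], float]:
    """Average shRNA-level genetic interactions per unordered gene pair."""
    grouped = defaultdict(list)
    for (shrna_a, shrna_b), value in shrna_gis.items():
        gene_a = shrna_a.rsplit("_", 1)[0]
        gene_b = shrna_b.rsplit("_", 1)[0]
        grouped[tuple(sorted((gene_a, gene_b)))].append(value)
    return {genes: mean(values) for genes, values in grouped.items()}


# Made-up shRNA-level genetic interactions.
shrna_gis = {
    ("GENEA_1", "GENEB_1"): 1.0,
    ("GENEA_2", "GENEB_1"): 3.0,
    ("GENEB_2", "GENEA_1"): 2.0,
    ("GENEA_1", "GENEC_1"): -1.0,
}
gene_gis = gene_average(shrna_gis)
print(gene_gis)
```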

A heatmap in a pop-up window visualizes the frequency of different outcomes for the shRNAs targeting a given gene:
filter_GIs.py Screenshot
Only genes targeted by at least two correlating shRNAs are included in the gene-based genetic interaction map.