- Welcome!
- Data Setup
- Counts Overview
- Extract Results
- Summary Plots
- Gene Finder
- Functional Analysis
- Signatures Explorer
- Report Editor
- About
Welcome to ideal!
ideal is a Bioconductor package containing a Shiny application for analyzing RNA-Seq data in the context of differential expression. This enables an analysis that is interactive and at the same time reproducible, keeping the functionality accessible while providing a comprehensive selection of graphs and tables to mine the dataset at hand.
ideal is an R package that fully leverages the infrastructure of the Bioconductor project to deliver an interactive yet reproducible analysis for the detection of differentially expressed genes in RNA-Seq datasets. Graphs, tables, and interactive HTML reports can be readily exported and shared with collaborators. The dynamic user interface displays a broad range of content and information, subdivided by thematic tasks. All in all, it aims to enforce a proper analysis by reaching out to both life scientists and experienced bioinformaticians, and it fosters communication between the two sides, offering robust statistical methods and a high standard of accessible documentation.
It is structured similarly to pcaExplorer, which is also designed as an interactive companion tool for RNA-seq analysis but focuses instead on exploratory data analysis, e.g. using principal component analysis as its main tool.
The interactive/reactive design of the app, with a dynamically generated user interface, makes it easy and immediate to apply the gold standard methods (in the current implementation, based on DESeq2) in a way that is information-rich and accessible also to the bench biologist, while providing additional insight for the experienced data analyst. Reproducibility is supported via state saving and automated report generation.
ideal 101: quick start for effective usage
If you see a grey box like this one open below, you can click on the Help text to collapse it or open it.
Its content is a short introduction to each section/tab.
You can close this help box now and take a quick tour of the app: click the button below to start a tour based on introJS.
Set up your data for the analysis
Data Setup
Upload a dataset
Use the widgets below in Step 1 to upload text data files delimited by tabs, commas, semicolons, or spaces. You will need two pieces of information, namely:
- the count matrix (as output e.g. from featureCounts or HTSeq-count)
- the experimental design matrix/data frame
In both cases, the first row is assumed to contain headers and the first column contains the feature names (i.e. gene IDs/names).
You can check what the data look like in the collapsible elements.
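As a minimal base-R sketch of the input shape described above (not ideal's own import code), the snippet reads a tab-delimited count matrix and a design table whose first row holds headers and whose first column holds the identifiers, then checks that the sample names agree. The toy values and the column name `condition` are made up for illustration.

```r
# Toy tab-delimited inputs (hypothetical values): first row = header,
# first column = gene IDs (counts) or sample names (design)
counts_txt <- "gene_id\tctrl_1\tctrl_2\ttreat_1\ttreat_2
geneA\t10\t12\t30\t28
geneB\t0\t1\t0\t2
geneC\t150\t140\t160\t155"

design_txt <- "sample\tcondition
ctrl_1\tctrl
ctrl_2\tctrl
treat_1\ttreat
treat_2\ttreat"

# row.names = 1 turns the first column into row names
count_matrix <- as.matrix(read.delim(text = counts_txt, row.names = 1))
exp_design <- read.delim(text = design_txt, row.names = 1,
                         stringsAsFactors = TRUE)

# Each count-matrix column must correspond to one row of the design
stopifnot(identical(colnames(count_matrix), rownames(exp_design)))
```

For comma- or semicolon-delimited files, `read.csv()` / `read.csv2()` or an explicit `sep` argument would play the same role.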
Select the DE design
Select the experimental factors that you want to account for when setting up your comparison of interest.
In the simplest case, this will be one of the columns you provided in the experimental design matrix.
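In DESeq2 the selected factors end up in a design formula such as `~ batch + condition`; by default, results are reported for the last variable in the formula, so the factor of interest conventionally goes last. The base-R sketch below, using a toy sample table with hypothetical columns `batch` and `condition`, builds such a formula from a character vector of selected columns and shows the model matrix it implies.

```r
# Hypothetical sample table with two experimental factors
exp_design <- data.frame(
  batch = factor(c("b1", "b2", "b1", "b2")),
  condition = factor(c("ctrl", "ctrl", "treat", "treat")),
  row.names = c("s1", "s2", "s3", "s4")
)

# Factors selected by the user; the factor of interest is placed last
selected <- c("batch", "condition")
design_formula <- reformulate(selected)  # equivalent to ~ batch + condition

# Design matrix implied by the formula (treatment contrasts:
# the first factor level acts as the reference)
mm <- model.matrix(design_formula, data = exp_design)
print(mm)
```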
Optional steps
You can additionally:
- create an annotation object - recommended, as it will enhance your experience throughout the app
- remove samples - if you think some of them should be treated as outliers (please consider using pcaExplorer for this purpose)
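Removing samples means dropping them consistently from both inputs. A minimal base-R sketch, with made-up counts and a hypothetical outlier `s4`: the same logical index subsets the count-matrix columns and the design rows, so the two stay aligned.

```r
# Toy counts and design (hypothetical values)
count_matrix <- matrix(c(10, 12, 30, 500,
                         0, 1, 0, 300,
                         150, 140, 160, 2),
                       nrow = 3, byrow = TRUE,
                       dimnames = list(c("geneA", "geneB", "geneC"),
                                       c("s1", "s2", "s3", "s4")))
exp_design <- data.frame(condition = factor(c("ctrl", "ctrl", "treat", "treat")),
                         row.names = colnames(count_matrix))

# Samples flagged as outliers (e.g. after inspecting a PCA plot)
outliers <- c("s4")
keep <- !(colnames(count_matrix) %in% outliers)

# Subset both objects with the same index so they stay aligned
count_matrix <- count_matrix[, keep, drop = FALSE]
exp_design <- exp_design[keep, , drop = FALSE]
# Drop factor levels that no longer have any sample
exp_design$condition <- droplevels(exp_design$condition)
```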
Run the DESeq2 pipeline
Once everything is set, just click on the run button. This operation might take a while depending on the dataset size, but it computes most of the components required for the subsequent analyses.
You can also inspect a diagnostic mean-dispersion plot for the current dataset in the collapsible element.
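One of the components computed in this step is the set of per-sample size factors, which DESeq2 estimates by default with the median-of-ratios method (`DESeq2::estimateSizeFactors()`). The base-R function below reimplements that idea for illustration only, on a made-up 4×3 count matrix; the function name `size_factors_mor` is hypothetical.

```r
# Median-of-ratios size factors, re-implemented in base R for illustration
size_factors_mor <- function(counts) {
  log_counts <- log(counts)
  # per-gene geometric mean on the log scale; genes with any zero give -Inf
  log_geo_means <- rowMeans(log_counts)
  usable <- is.finite(log_geo_means)
  # per sample: median ratio of its counts to the per-gene geometric means
  apply(log_counts[usable, , drop = FALSE], 2, function(col) {
    exp(median(col - log_geo_means[usable]))
  })
}

counts <- matrix(c(10, 20, 30,
                   100, 200, 300,
                   5, 10, 15,
                   0, 3, 1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3")))
sf <- size_factors_mor(counts)
print(round(sf, 3))
```

Here gene4 is ignored because of its zero count, and the size factors come out in the ratio 1:2:3, matching the sequencing-depth differences built into the toy data.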
Once you are done with this help box, you can close it by clicking on the Help text.
Step 1
Upload your count matrix and the information on the experimental design
... or you can also
Count matrix preview
Experimental design preview
Get an overview of your data
Counts Overview
Here you can view and explore your expression data, displayed as:
- raw counts
- normalized counts
- log2-transformed normalized counts
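The three views differ only by simple transformations. In DESeq2, normalized counts (`counts(dds, normalized = TRUE)`) are the raw counts divided by each sample's size factor; the log2 view below assumes a pseudocount of 1 to avoid `log2(0)`. The counts and size factors in this base-R sketch are made up.

```r
# Toy raw counts and size factors (hypothetical values)
raw <- matrix(c(10, 20, 30,
                0, 6, 3),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3")))
size_factors <- c(s1 = 0.5, s2 = 1, s3 = 1.5)

# Normalized counts: divide each column by its sample's size factor
normalized <- sweep(raw, 2, size_factors, "/")

# log2-transformed normalized counts; the pseudocount of 1 avoids log2(0)
log2_norm <- log2(normalized + 1)
print(normalized)
print(round(log2_norm, 3))
```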
Powered by the DT package, the data tables are all interactive and searchable.
To get a quick look at how many genes are (robustly) detected, you can change the threshold levels in the criteria below and, if desired, subset the initial dataset (then go back to the Data Setup panel if you want to rerun the DESeq2 analysis).
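A detection filter of this kind can be sketched in base R. The criterion used here, "at least `min_count` counts in at least `min_samples` samples", is a hypothetical example and not necessarily the exact rule offered in the app; the counts are made up.

```r
# Toy normalized counts (hypothetical values)
norm_counts <- matrix(c(0, 0, 1, 0,
                        5, 12, 9, 20,
                        50, 0, 0, 0,
                        100, 90, 120, 110),
                      nrow = 4, byrow = TRUE,
                      dimnames = list(paste0("gene", 1:4), paste0("s", 1:4)))

# Hypothetical thresholds: a gene counts as "detected" if it has
# at least min_count counts in at least min_samples samples
min_count <- 10
min_samples <- 2

detected <- rowSums(norm_counts >= min_count) >= min_samples
cat("Detected genes:", sum(detected), "of", nrow(norm_counts), "\n")

# Optional subset of the dataset, keeping only detected genes
filtered <- norm_counts[detected, , drop = FALSE]
```

Requiring a minimum number of samples, rather than a total count, keeps a single outlying sample (gene3 above) from marking a gene as detected.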
You can also display a pairwise scatter plot for all samples and all genes. If you are interested in deeper exploration of your data, please consider pcaExplorer.
Once you are done with this help box, you can close it by clicking on the Help text.
Basic summary for the counts
Number of uniquely aligned reads assigned to each sample
According to the selected filtering criteria, this is an overview of the provided count data
Sample to sample scatter plots
Compute sample-to-sample correlations on the normalized counts. Warning: plotting all points can take a while, depending mostly on the number of samples you provided.
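The numbers behind such a panel can be sketched in base R on simulated data. This sketch assumes correlations on log2(normalized counts + 1), so that a few highly expressed genes do not dominate; whether the app uses exactly this transformation is an assumption. Samples `s1` and `s2` share an underlying expression profile, while `s3` is independent.

```r
set.seed(42)
# Simulated normalized counts for 3 samples (hypothetical data)
base_expr <- rexp(200, rate = 1 / 100)
norm_counts <- cbind(s1 = base_expr * runif(200, 0.8, 1.2),
                     s2 = base_expr * runif(200, 0.8, 1.2),
                     s3 = rexp(200, rate = 1 / 100))

# Correlate on the log scale; the pseudocount of 1 avoids log2(0)
log_counts <- log2(norm_counts + 1)
cor_mat <- cor(log_counts, method = "pearson")
print(round(cor_mat, 2))
# pairs(log_counts) draws the corresponding pairwise scatter plot matrix
```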
You did not create the dds object yet. Please go to the main tab and generate it.
Extract and inspect the DE results
Extract Results
Here you can generate and quickly inspect the results with respect to the experimental factor of interest.
You can perform:
- classical contrast comparisons between two groups
- ANOVA-like comparisons, if you have three or more levels in your experimental factor
Results are sorted by ascending adjusted p-value. You can also sort them by logFC if you are interested. Still, please rely on the adjusted p-value to call a gene differentially expressed: the (shrunken) log fold change is rather an indicator of the effect size.
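The adjusted p-values come from a multiple-testing correction; DESeq2's default (`pAdjustMethod = "BH"`) is the Benjamini-Hochberg procedure. The base-R sketch below applies it to a made-up results table, sorts by the adjusted p-value, and calls DE genes at a hypothetical FDR threshold of 0.05, independently of the fold change size.

```r
# Hypothetical per-gene results: log2 fold changes and raw p-values
res <- data.frame(
  gene = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  log2FoldChange = c(2.1, -0.3, 1.2, -1.8, 0.1),
  pvalue = c(0.001, 0.40, 0.02, 0.004, 0.90),
  stringsAsFactors = FALSE
)

# Benjamini-Hochberg adjustment
res$padj <- p.adjust(res$pvalue, method = "BH")

# Sort by ascending adjusted p-value, as in the results table
res <- res[order(res$padj), ]

# Call DE genes on padj, not on the size of the fold change
alpha <- 0.05
de_genes <- res$gene[res$padj < alpha]
print(res)
```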
Click on the gene names and/or IDs to reach their page in the NCBI/ENSEMBL databases. Histograms of p-values and log fold changes are displayed for diagnostic purposes.
Once you are done with this help box, you can close it by clicking on the Help text.
Diagnostic plots
You did not create the dds object yet. Please go to the main tab and generate it.
Interactive graphical exploration of the results
Summary Plots
Here you can get a bird's-eye view of all your genes with
- MA plots (logFC versus mean expression)
- volcano plots (-log10(pvalue) versus logFC)
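Both plots are scatter plots of columns that a DESeq2 results table already contains (`baseMean`, `log2FoldChange`, `pvalue`, `padj`). The base-R sketch below, on a made-up table, computes the coordinates of each plot and flags genes passing a hypothetical FDR cutoff of 0.05; the log10 scale for mean expression mirrors the usual MA plot layout.

```r
# Hypothetical results table
res <- data.frame(
  baseMean = c(1500, 20, 300, 80),
  log2FoldChange = c(2.0, -0.5, -1.5, 0.2),
  pvalue = c(1e-6, 0.3, 1e-3, 0.7),
  padj = c(4e-6, 0.4, 2e-3, 0.7),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# MA plot: x = mean expression (log10 scale), y = log fold change
ma_coords <- data.frame(A = log10(res$baseMean), M = res$log2FoldChange,
                        row.names = rownames(res))

# Volcano plot: x = log fold change, y = -log10(p-value)
volcano_coords <- data.frame(logFC = res$log2FoldChange,
                             minus_log10_p = -log10(res$pvalue),
                             row.names = rownames(res))

# Genes passing the hypothetical FDR cutoff, e.g. to color them
is_de <- res$padj < 0.05
```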
You can click on the MA plot and, when the zoomed section appears, click on any gene to obtain a boxplot/dot plot of this feature. Groups will be built according to the factor(s) selected in the Group/color by input, which is automatically populated using the contrast selected when building the results. An infobox for that gene is also shown, with information retrieved from the NCBI website.
For the genes you selected by brushing, a static and a dynamic heatmap (plus the underlying data) are available below.
Once you are done with this help box, you can close it by clicking on the Help text.
MA plot - Interactive!
Gene infobox
Volcano plot
Brushed table
You did not create the result object yet. Please go to the dedicated tab and generate it.
Find your gene(s) of interest
Gene Finder
Do you have a gene or a handful of them you would like to inspect in detail? The Gene Finder does it for you.
Select the genes of interest in the widget in the sidebar on the left. Up to four will be displayed in boxplots/dot plots.
All of them will be annotated on the MA plot below, and their subset of data regarding normalized counts and DE testing results are merged and shown in a DataTable.
If you have a list of genes and prefer uploading it instead of typing it, you can also do that: scroll down the page.
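The merged table for the selected genes can be sketched in base R. All values are made up, and the "uploaded" list is simulated with `scan(text = ...)`, assuming a plain one-gene-per-line text file (the exact upload format is an assumption).

```r
# Hypothetical normalized counts and DE results, keyed by gene ID
norm_counts <- data.frame(s1 = c(10, 200, 35), s2 = c(12, 180, 40),
                          row.names = c("geneA", "geneB", "geneC"))
de_res <- data.frame(log2FoldChange = c(0.2, -1.1, 2.3),
                     padj = c(0.8, 0.01, 1e-4),
                     row.names = c("geneA", "geneB", "geneC"))

# Genes of interest, read as if from a one-gene-per-line text file
genes_of_interest <- scan(text = "geneC\ngeneB", what = character(),
                          quiet = TRUE)

# Merge both tables by row name, then keep only the selected genes
merged <- merge(norm_counts, de_res, by = "row.names")
rownames(merged) <- merged$Row.names
merged$Row.names <- NULL
selected <- merged[genes_of_interest, ]
print(selected)
```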
Once you are done with this help box, you can close it by clicking on the Help text.
You did not create the dds object yet. Please go to the main tab and generate it.
Find functions enriched in gene sets
Functional Analysis
Do you need something more than just a list of genes? Looking for insight on the affected biological pathways? Then you can use the modules in the Functional Analysis tab. You can perform Gene Ontology overrepresentation analysis based on:
- limma::goana
- topGO (recommended for the somewhat clearer presentation of results, thanks to the algorithms in the topGO package)
- goseq
You can analyse:
- all regulated genes, both up- and down-regulated
- up-regulated genes alone
- down-regulated genes only
- two custom lists, which can be uploaded as text files
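At the core of overrepresentation analysis is a one-sided hypergeometric test for each GO term; the packages listed above add refinements on top of it (e.g. goseq corrects for gene-length bias, topGO offers algorithms that exploit the GO graph structure). The base-R sketch below tests a single term with made-up counts and checks that Fisher's exact test on the equivalent 2×2 table gives the same p-value.

```r
# All numbers are hypothetical
N <- 10000  # genes in the background (all genes tested)
K <- 200    # background genes annotated to the GO term
n <- 500    # DE genes (the gene set of interest)
k <- 25     # DE genes annotated to the GO term

expected <- n * K / N  # overlap expected by chance
# P(X >= k) for X ~ Hypergeometric(K annotated, N - K not annotated, n drawn)
p_value <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)
cat("expected:", expected, "observed:", k, "p-value:", signif(p_value, 3), "\n")

# Equivalent 2x2 table: rows = DE yes/no, columns = in GO term yes/no
tab <- matrix(c(k, K - k, n - k, N - K - n + k), nrow = 2,
              dimnames = list(DE = c("yes", "no"), inTerm = c("yes", "no")))
fisher_p <- fisher.test(tab, alternative = "greater")$p.value
```

With thousands of GO terms tested, these per-term p-values still need a multiple-testing correction before interpretation.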
Once you obtain the interactive table of functions enriched in the gene set of interest, you can click on any row of the DataTable to display a heatmap with the expression values for the genes annotated to that GO term, as a kind of signature for that function in your data.
You can explore the overlap of the lists as Venn diagrams as well as UpSet plots.
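Both visualizations are built from simple set operations. The base-R sketch below, with two hypothetical gene lists, computes the regions of a two-set Venn diagram and the equivalent 0/1 membership table, a format commonly used as input for UpSet-style plots.

```r
# Two hypothetical gene lists (e.g. up-regulated genes from two contrasts)
list_a <- c("geneA", "geneB", "geneC", "geneD")
list_b <- c("geneC", "geneD", "geneE")

# The regions a two-set Venn diagram displays
only_a <- setdiff(list_a, list_b)
only_b <- setdiff(list_b, list_a)
both <- intersect(list_a, list_b)

# The same information as a membership matrix (1 = gene is in the list)
all_genes <- union(list_a, list_b)
membership <- data.frame(A = as.integer(all_genes %in% list_a),
                         B = as.integer(all_genes %in% list_b),
                         row.names = all_genes)
print(membership)
```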
Once you are done with this help box, you can close it by clicking on the Help text.
Intersection of gene sets
You did not create the result object yet. Please go to the dedicated tab and generate it.
Signatures Explorer
This tab allows you to check the behavior of a number of provided gene signatures in your data at hand, displaying this as a heatmap.
This panel is composed of different well panels:
- in the Setup options, you can select and upload a gene signature file in gmt format (e.g. like the ones provided in the MSigDB database, or from WikiPathways), and quickly compute the variance stabilized transformed version of your data, which is more amenable to visualization than raw or normalized counts
- in the Conversion options tab, you can create an annotation vector, used to match the IDs in your data with the IDs the gmt file uses for encoding the signature elements. This works based on the org.XX.eg.db packages.
- the lower well panels control the appearance of the heatmap, also with an option to display all genes annotated in that pathway, or only the ones detected as differentially expressed (for this you need to provide or compute the result object)
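In the gmt format, each line describes one gene set as tab-separated fields: the set name, a description, then the member gene IDs. The base-R sketch below parses such lines into a named list, converts the IDs with a hypothetical annotation vector (here Entrez IDs to gene symbols, standing in for the mapping the app builds from an org.XX.eg.db package), and keeps only the genes present in a hypothetical expression dataset. The helper name `read_gmt_lines` is illustrative, not part of ideal.

```r
# Toy gmt content: <set name> <description> <gene 1> <gene 2> ...
gmt_txt <- "SET_ONE\tfirst toy set\t7157\t1956\t4609
SET_TWO\tsecond toy set\t5290\t207"

read_gmt_lines <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  sets <- lapply(fields, function(f) f[-(1:2)])  # drop name and description
  names(sets) <- vapply(fields, `[`, character(1), 1)
  sets
}
signatures <- read_gmt_lines(strsplit(gmt_txt, "\n", fixed = TRUE)[[1]])

# Hypothetical annotation vector: gmt IDs (Entrez) -> IDs in the data (symbols)
anno_vec <- c("7157" = "TP53", "1956" = "EGFR", "4609" = "MYC",
              "5290" = "PIK3CA", "207" = "AKT1")
signatures_sym <- lapply(signatures, function(ids) unname(anno_vec[ids]))

# Keep only signature genes present in the (hypothetical) expression data
expr_genes <- c("TP53", "MYC", "AKT1", "GAPDH")
in_data <- lapply(signatures_sym, intersect, expr_genes)
print(in_data)
```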
Setup options
Conversion options
You did not create the dds object yet. Please go to the main tab and generate it.
Create, view and export a report of your analysis
Report Editor
Here is where you can generate, preview, and export your interactive HTML report on the dataset you submitted.
If you are not familiar with R, you can leave the editor unchanged: the template report can already do a good job for you!
But if you prefer to edit any code chunk or add new ones, you are free to do it. The shinyAce-powered text editor supports autocompletion to make this easier.
General options for markdown are found in the collapsible box on the left; these also control the appearance of the code chunks and their output in the report.
Once you are done with this help box, you can close it by clicking on the Help text.
Markdown options
Editor options
About ideal
ideal is a Bioconductor package containing a Shiny application for analyzing RNA-seq expression data through the interactive exploration of the results of a differential expression analysis.
Thanks to its interactive/reactive design, it aims to be a practical companion to any RNA-seq dataset analysis, making downstream and exploratory data analysis accessible also to the bench biologist, while providing quick additional insight for the experienced data analyst.
ideal was developed by Federico Marini in the Bioinformatics Division led by Harald Binder at the IMBEI (Institut für Medizinische Biometrie, Epidemiologie und Informatik) in the University Medical Center of the Johannes Gutenberg University Mainz.
Developers
Code
All code for ideal is available on GitHub.
Citation info
If you use ideal for your analysis, please cite it as follows:
citation("ideal")
##
## To cite package 'ideal' in publications use:
##
## Federico Marini (2017). ideal: Interactive Differential
## Expression AnaLysis. R package version 0.6.2.
## https://github.com/federicomarini/ideal
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {ideal: Interactive Differential Expression AnaLysis},
## author = {Federico Marini},
## year = {2017},
## note = {R package version 0.6.2},
## url = {https://github.com/federicomarini/ideal},
## }