Modified items

All recently modified items, latest first.
Graduate Courses
Academic activities
Courses taught by members of GRBIO:
Bioinformatics and “omics” data analysis
Alex Sánchez
One of the main goals of GRBIO is to enhance the research of its members by assisting researchers with their statistical analyses. We provide statistical expertise to the scientific community through consulting, teaching and contract services. Beyond the usual statistical consultancy, which identifies and applies the most appropriate statistical tools to solve a research problem, we can extend the analysis service to problems that lie beyond the boundaries of current statistical knowledge, whether because of the complexity of the data or the need for new statistical models. We also provide training for staff at any level, from beginner to expert, covering any statistical topic from applied work through to cutting-edge methodological research.
Multivariate Analysis
Conxita Arenas
Exome Variant Analysis (EVA) Pipeline for NGS Data. Developed in R, it calls external programs for specific tasks. It can run in parallel on multiple processors of a single machine and can be easily extended to run on computer clusters (HPC). Although still in its early stages and a work in progress, it is already used in production at some research centers.
Statistical tests for label-free LC-MS/MS data by spectral counts, to discover differentially expressed proteins between two biological conditions. Three tests are available: Poisson GLM regression, quasi-likelihood GLM regression, and the negative binomial test of the edgeR package. All three models admit blocking factors to control for nuisance variables. To ensure a good level of reproducibility, a post-test filter is available in which we may set the minimum effect size considered biologically relevant and the minimum expression of the most abundant condition.
Exploratory data analysis to assess the quality of a set of LC-MS/MS experiments and to visualize the influence of the factors involved.
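The post-test filter described above can be illustrated with a minimal Python sketch. It is not the package's implementation: the class name, function name, default thresholds and the small offset used to avoid taking the logarithm of zero are all assumptions made for illustration. A protein is kept only if it is statistically significant, its absolute log2 fold change reaches the minimum effect size, and the more abundant condition reaches a minimum mean spectral count.

```python
import math
from dataclasses import dataclass


@dataclass
class ProteinResult:
    """Hypothetical per-protein summary of a two-condition test."""
    name: str
    mean_a: float  # mean spectral count in condition A
    mean_b: float  # mean spectral count in condition B
    adj_p: float   # multiplicity-adjusted p-value from the chosen test


def passes_post_test_filter(result: ProteinResult,
                            alpha: float = 0.05,
                            min_abs_log2fc: float = 1.0,
                            min_counts: float = 2.0,
                            offset: float = 0.5) -> bool:
    """Return True if the protein survives all three criteria.

    All thresholds are illustrative assumptions, not package defaults.
    The offset only guards against log2(0) when a condition has no counts.
    """
    log2fc = math.log2((result.mean_b + offset) / (result.mean_a + offset))
    significant = result.adj_p <= alpha
    large_effect = abs(log2fc) >= min_abs_log2fc
    abundant_enough = max(result.mean_a, result.mean_b) >= min_counts
    return significant and large_effect and abundant_enough


if __name__ == "__main__":
    results = [
        ProteinResult("P1", 2.0, 10.0, 0.01),   # significant, large effect, abundant
        ProteinResult("P2", 0.2, 1.0, 0.001),   # significant but too few counts
        ProteinResult("P3", 10.0, 12.0, 0.01),  # significant but small effect
    ]
    for r in results:
        print(r.name, passes_post_test_filter(r))
```

Filtering on abundance and effect size after testing discards proteins whose significance rests on very low counts or on differences too small to matter biologically, which is the reproducibility argument made in the description.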
R / Bioconductor Packages
Easy HeatMaps
A web tool to produce simple heatmaps for microarray data.
Web tools
The package implements methods to compare lists of genes by comparing their corresponding 'functional profiles'.
ICGE is a user-friendly R package that provides functions to: identify the number of clusters in data with mixed variables, as commonly encountered by applied biomedical researchers; detect whether the data have a cluster structure; determine whether a new unit belongs to one of the pre-identified clusters or to a novel group; and classify new units into the corresponding cluster. The functions in the ICGE package are accompanied by help files and simple examples to facilitate their use.
Tests for Right and Interval-Censored Survival Data Based on the Fleming-Harrington Class.
The library bwsurvival is designed for situations in which there are two consecutive events of interest, E1 and E2, and the scientific goal is inference on the time T2 until E2 given the time T1 until E1. The methodology is based on non-parametric estimation of the conditional survival function of T2 given T1 on a user-supplied partition into time intervals of scientific interest (one week, one quarter, one year, two years, ...). The proposed estimator takes into account the selection bias and the heterogeneity due to dependent censoring by weighting the observations of T1. The library also allows user-defined weights, as well as stratification of the survival function by a categorical variable. (Gómez, G. and Serrat, C. (2014) Correcting the bias due to dependent censoring of the survival estimator by conditioning. Statistics, 48 (2), 295-314).
dcens is a library for the estimation of the survival function for doubly censored data. A double censoring scheme arises when left censoring is present in addition to the usual right censoring. If T denotes the time of interest, its exact value is only observed when it lies in the time window [L, R], where L < R are positive random variables. The observable sample therefore consists of the pairs (U, d), where U = min{R, max{T, L}} and d is an indicator variable (d = 0 for exact values, d = 1 for right-censored values and d = -1 for left-censored values). The library allows the nonparametric and simultaneous estimation of the marginal survival functions ST, SL and SR, for T, L and R respectively, using an inverse probability of censoring weighted procedure. (Julià, O. and Gómez, G. (2011) Simultaneous marginal survival estimators when doubly censored data is present. Lifetime Data Analysis, 17, 347-372).
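The observation scheme for doubly censored data can be made concrete with a short Python sketch. It only shows how the observable pair (U, d) follows from the latent triple (T, L, R) using the formula quoted above; the function name is hypothetical, and the sketch does not reproduce the dcens estimator or its interface.

```python
from typing import Tuple


def observe_doubly_censored(t: float, left: float, right: float) -> Tuple[float, int]:
    """Map latent times (T, L, R) to the observable pair (U, d).

    U = min{R, max{T, L}}
    d =  0 if T is observed exactly (L <= T <= R)
    d =  1 if T is right censored   (T > R)
    d = -1 if T is left censored    (T < L)
    """
    if not (0 < left < right):
        raise ValueError("the window must satisfy 0 < L < R")
    u = min(right, max(t, left))
    if t < left:
        d = -1
    elif t > right:
        d = 1
    else:
        d = 0
    return u, d


if __name__ == "__main__":
    # Exact, left-censored and right-censored cases for the window [2, 8].
    for t in (5.0, 1.0, 10.0):
        print(t, observe_doubly_censored(t, 2.0, 8.0))
```

With the window [2, 8], a true time of 5 is observed exactly, a true time of 1 is reported as 2 with d = -1, and a true time of 10 is reported as 8 with d = 1.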
A Statistical Web-Enabled Tool for the Identification of Significant Protein Coding Regions. WISCOD is a novel web-enabled software tool that tackles the exon prediction problem in eukaryotic genomes. WISCOD can detect real exons from large lists of potential exons, and it provides easy-to-use global p-values, called the expected probability of being a false exon (EPFE), which are useful for ranking potential exons in a probabilistic framework without additional computational cost. The advantage of this approach is that it significantly increases specificity and sensitivity (both between 80% and 90%) compared with other ab initio methods (where they are in the range of 70–75%). WISCOD is written in Java and R and can be downloaded and run locally on Linux and Windows platforms. (BioMed Research International Volume 2014, Article ID 282343, 10 pages).
A Package for Computation of Confidence Intervals for Variance Components of Mixed Models in R. varcompci computes these confidence intervals (Burdick and Graybill, 1992) for any balanced mixed effects saturated (main effect and interaction terms of all orders) analysis of variance (ANOVA) model (type III), involving five or fewer factors. The methods in this paper can also be applied to data with unbalanced design, but we have not evaluated the performance of these methods for unbalanced data.