Saturday, January 21, 2012

MVP's Vast Amount of Data

The amount of data to be collected through the Department of Veterans Affairs’ “Million Veteran Program” (MVP) is enormous. MVP, launched earlier this year with 15,000 veterans enrolled, is on target to build the world’s largest database of health and genetic information, according to the VA publication “VA Research Currents”.

A team at the Boston VA Healthcare System has designed an ambitious project called the “Genomic Information System for Integrated Science” (GenISIS) to maintain the data. Backed by huge clusters of servers housed in two locations, the system links de-identified patient DNA samples and health information with a multitude of VA and non-VA databases, and connects to a call and mail center that manages MVP enrollment and appointments and gathers information.

Genetically speaking, each person’s cells carry within them some 3.2 billion bits of data, since that is how many pairs of nucleotides, or chemical bases, are in the human genome. This figure represents tens of thousands of protein-coding genes, plus large amounts of other DNA. Scientists are still trying to determine the precise role of one stretch of DNA versus another. There are countless possible variants that could affect health, and scientists have yet to learn about most of them.

According to Leonard D’Avolio, PhD, Associate Director for Biomedical Informatics at VA’s Massachusetts Veterans Epidemiology Research and Information Center and Manager of the MVP Project, “Each patient has hundreds if not thousands of relevant pieces of information such as current and past medical conditions, laboratory data, information on prescriptions, family histories, lifestyle issues, plus environmental exposures.”

Some veterans who take part in MVP may have a VA electronic health record (EHR) going back two decades. Multiply the billions of data points for each person by the million veterans expected to take part in MVP, and the total is in the quadrillions.
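The multiplication above can be checked with a quick back-of-the-envelope sketch. It assumes the 3.2 billion base pairs per genome cited earlier as the per-person count and one million enrolled veterans; the constant and function names are made up for illustration, and health-record data points are left out.

```python
# Back-of-the-envelope check of the "quadrillions" figure.
# Assumptions: ~3.2 billion base pairs per person (genome only, no
# health-record data) and the full target of one million veterans.
BASE_PAIRS_PER_GENOME = 3_200_000_000
EXPECTED_VETERANS = 1_000_000
QUADRILLION = 10**15


def total_data_points(per_person: int, people: int) -> int:
    """Return the total number of data points across all participants."""
    return per_person * people


total = total_data_points(BASE_PAIRS_PER_GENOME, EXPECTED_VETERANS)
print(f"{total:,} data points, about {total / QUADRILLION:.1f} quadrillion")
```

Under those assumptions the total comes to about 3.2 quadrillion genetic data points before any medical-record information is counted.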

However, the larger the data set, the easier it is for meaningful patterns to emerge. With a study of only 500 or 1,000 people, the association between a gene variant and a certain trait would have to be quite striking to catch a researcher’s attention.

“To detect meaningful patterns, researchers will need to analyze samples numbering in the tens of thousands. These connections are going to be discovered only by looking across many data points to make that possible”, said D’Avolio.
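Why sample size matters can be illustrated with a rough sketch. This is not the method MVP researchers use. It is a textbook normal approximation for comparing a trait's frequency between two equal-sized groups, and the 5% significance level, 80% power, and 50% baseline frequency are all assumptions chosen for illustration. Real genetic studies apply far stricter significance thresholds.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_difference(n_per_group: int,
                              baseline: float = 0.5,
                              alpha: float = 0.05,
                              power: float = 0.80) -> float:
    """Approximate smallest difference in proportions a two-group study
    can detect, using the standard normal approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    std_error = sqrt(2 * baseline * (1 - baseline) / n_per_group)
    return (z_alpha + z_power) * std_error


for n in (500, 1_000, 10_000):
    diff = min_detectable_difference(n)
    print(f"{n:>6} per group: detectable difference of about {diff:.1%}")
```

Under these assumptions, a study with 500 people per group can only pick up a difference of about 9 percentage points, while one with 10,000 per group can detect a difference of about 2 points. The detectable effect shrinks with the square root of the sample size.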

In the future, researchers will be able to access GenISIS remotely, ask questions, and then, with appropriate permissions, move the data into a secure environment built to handle the huge amounts of storage required. Through its links with various VA and non-VA databases, the system could also gather specific data relevant to a researcher’s question, even if that data is not routinely retrieved and brought into MVP.

D’Avolio reports that the first scientific contribution of MVP will likely deal with mental illnesses, which today affect some 170,000 veterans using VA care. The study is recruiting thousands of veterans who have schizophrenia or bipolar disorder. “It’s no small thing to get up to 10,000 patients with schizophrenia or bipolar disorder but then you have to match that with another 10,000 who don’t have either disease”, says D’Avolio.

For more details, go to www.research.va.gov/currents/dec11-jan12/dec-jan12-01.cfm.