Friday, January 28, 2011

Edit check - a critical step to ensure data quality during clinical trials

In a clinical trial, one critical task is to ensure that the data collected or entered into the system / database are valid, correct, and logically sound. This task requires a data quality plan that runs from designing a good study protocol -> developing efficient case report forms -> providing clear instructions for completing the case report forms -> implementing electronic edit checks -> monitoring the study data / source data verification -> the data clarification process -> the data review process. One of these steps is implementing the electronic edit checks.
An edit check is a program instruction or subroutine that tests the validity of input in a data entry program. According to the CDISC clinical research glossary published in Applied Clinical Trials, an edit check is defined as:

An auditable process, usually automated, of assessing the content of a data field against its expected logical, format, range, or other properties that is intended to reduce error. NOTE: Time-of-entry edit checks are a type of edit check that is run (executed) at the time data are first captured or transcribed to an electronic device at the time entry is completed of each field or group of fields on a form. Back-end edit checks are a type that is run against data that has been entered or captured electronically and has also been received by a centralized data store.

Electronic edit checks allow us to use the power of the computer to check for illogical, incomplete, or inconsistent data. In a clinical trial, one of the most important tasks facing clinical data management personnel is to produce the electronic edit check specifications for a study. Developing the electronic edit check specifications -- and processing the queries that result from them -- is arguably the most vital and time-consuming data cleaning activity that data management personnel undertake. The study statistician should always participate in developing the electronic edit checks to ensure that the critical checks are included. Effectively implemented edit checks prevent illogical, incomplete, or inconsistent data from entering the data capture system or data set, which makes the downstream data analyses much easier.

There are two types of edit checks:

Univariate edit checks (including range checks): these edit checks apply to a single field or variable. For example, for subject weight we can set up a range check so that extreme or unlikely values are flagged, say any entry smaller than 90 lb or greater than 300 lb. For a lung function test, we may set up an edit check requiring percent predicted FEV1 to be no less than 20%, because it is unlikely for a subject to have a percent predicted FEV1 below 20%. Univariate edit checks are usually run instantly at the time of data entry. A back-end version of such a check is sketched below.
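In an EDC system this type of check is usually configured in the data capture tool itself, but the same logic can also be written as a back-end listing. Here is a minimal SAS sketch; the data set DEMO and the variables SUBJID and WEIGHT_LB are hypothetical names used only for illustration:

  data weight_check;
    set demo;                                   * hypothetical subject-level data set;
    if not missing(weight_lb) and (weight_lb < 90 or weight_lb > 300) then do;
      check_id = 'WT001';
      message  = 'Weight outside the expected range of 90-300 lb - please verify';
      output;
    end;
    keep subjid weight_lb check_id message;
  run;

The resulting listing of flagged records can then be turned into data queries for the site.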

Multivariate edit checks (also called aggregate edit checks): these edit checks involve more than one field or variable. They cross-check entries across multiple fields / variables to ensure the data are logical and consistent. For example, if the entry in the Gender field is 'Male', there should be no data in the pregnancy test result field. If the reason for a subject dropping out of the study is entered as 'adverse event', there should be a corresponding entry in the AE data set. The statistician can provide valuable input in identifying the multivariate edit checks. Some multivariate edit checks involve complicated algorithms and take considerable time to run; in this situation, they can be run at the back end at a specified interval (for example, at 2 am each night). A sketch of such a cross-field check follows.
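As a hypothetical sketch of the gender / pregnancy test cross-check (the data set LABS and the variables SUBJID, GENDER, and PREG_RESULT are made-up names):

  data preg_check;
    set labs;                                   * hypothetical lab / subject data set;
    if upcase(gender) = 'MALE' and not missing(preg_result) then do;
      check_id = 'PRG001';
      message  = 'Pregnancy test result recorded for a male subject - please verify';
      output;
    end;
    keep subjid gender preg_result check_id message;
  run;

The same pattern extends to cross-panel checks, for example matching a discontinuation reason of 'adverse event' against the AE data set with a merge or a PROC SQL join.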

One misunderstanding is to think that all data issues can be resolved by implementing edit checks. Edit checks are only one step in the data cleaning process. There should also be a balance in the number of edit checks: too many edit checks on non-critical fields can be very annoying for the people who enter the data. This is especially true for clinical trials using electronic data capture (EDC), where data entry responsibility is delegated to the investigators and study coordinators, who may lose patience if too many pop-up messages appear during data entry. For example, if a telephone number needs to be entered, an edit check that forces the entry to follow the xxx-xxx-xxxx format is unnecessary (xxxxxxxxxx and 1xxxxxxxxxx should also be accepted); this is an example I have seen in some web forms, and it is very annoying. A more lenient version of such a check is sketched below.
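If a format check on a non-critical field is really wanted, it can at least be written to accept all reasonable formats. A hypothetical SAS sketch using a regular expression (the data set CONTACTS and the variables SUBJID and PHONE are made up for illustration):

  data phone_check;
    set contacts;                               * hypothetical contact information data set;
    * flag only entries matching none of the accepted formats: xxx-xxx-xxxx, xxxxxxxxxx, 1xxxxxxxxxx;
    if not missing(phone) and
       prxmatch('/^(\d{3}-\d{3}-\d{4}|1?\d{10})$/', strip(phone)) = 0 then output;
    keep subjid phone;
  run;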

Sunday, January 23, 2011

Regulatory Guidance on Source Data in EDC Trials

As we move toward clinical studies using electronic data capture, 'source data' or 'source documents' have become an issue. Unlike a paper-CRF (case report form) based study, the source data in an EDC study can be confusing and sometimes vague. If the data are entered directly into the EDC system, the EDC system itself is the source and there is no other source to verify against. This can be worrisome to some people. In a 2008 article, I discussed this issue.
Recently, both FDA and EMEA published guidance on this issue. FDA's guidance "Electronic Source Documentation in Clinical Investigations" was issued in December 2010. EMEA issued its guidance last June, titled "Reflection paper on expectations for electronic source data and data transcribed to electronic data collection tools in clinical trials".

The guidance titles seem to suggest that they are written for the data management functions; however, the discussions in these two documents are more relevant to the clinical sites and study monitors. Switching a clinical study from paper CRF to EDC is not just about shifting data entry from the data management group to the clinical sites; it has an impact on how the entire study is operated.

Tuesday, January 11, 2011

FDA's New Website for Industry

Have you noticed the recent changes in the design of the FDA website (http://www.fda.gov/)? Last August, I mentioned FDA's initiatives on transparency. As part of FDA's continued push to increase transparency in an agency once notorious for making decisions behind closed doors, the FDA has launched a new web-based resource that industry can use to keep abreast of the regulatory status of drugs, devices, food, and cosmetics. The new website is under http://www.fda.gov/ForIndustry/ and is intended to provide a repository for industry to understand FDA's detailed processes for submission, review, approval, and surveillance of regulated products, and even the processes for complaints (dispute resolution). The website includes sections that are very pertinent to those of us working in the pharmaceutical industry:
  • Developing products for rare diseases and conditions
  • Dispute resolution
  • Guidance documents
  • FDA eSubmitter
  • Data standards
  • FDA basics for industry
FDA basics for industry includes the kind of basic information about the regulatory process that is often requested by drug, device, and biologic companies; it aims to improve communication between FDA and industry by making this basic information more accessible in a user-friendly format.

The new website reflects a great improvement in transparency and is a valuable resource for professionals working in the drug development industry.

Sunday, January 02, 2011

Agreement Statistics and Kappa

In clinical trials and medical research, we often have a situation where two different measures or assessments are performed on the same sample, the same patient, or the same image, and the agreement needs to be calculated as a summary statistic. Depending on whether the measurement is continuous or categorical, the appropriate agreement statistics differ. Lin L gave a very nice overview of agreement statistics.

Specifically for categorical assessments, there are many situations where agreement statistics are needed. In a clinical trial with imaging assessments, the same image (for example, a CT scan or an arteriogram) can be read by different readers. For disease diagnosis, a new diagnostic tool (with the advantage of being less invasive or easier to implement) may be compared to an established diagnostic tool. Typically, the outcome measure is dichotomous (e.g., disease vs. no disease, positive vs. negative).

The choice of comparison method is influenced by the existence and/or practical applicability of a reference standard (gold standard). If a reference standard is available, we can estimate sensitivity and specificity and perform ROC (receiver operating characteristic) analysis. If a reference standard is not available, or there is no gold standard for comparison, we cannot perform ROC analysis. Instead, we assess the agreement and calculate Kappa. This is discussed in detail in FDA's Guidance for Industry and FDA Staff "Statistical Guidance on Reporting Results from Studies Evaluating Diagnostic Tests". For example, when comparing the assessments from two different readers, we would calculate Kappa, overall percent agreement, positive percent agreement, and negative percent agreement; we would not use ROC statistics and would not calculate sensitivity and specificity.
If we would like to assess the agreement between a urine pregnancy test and a serum pregnancy test, we could use ROC analysis and calculate sensitivity, specificity, positive predictive value, and negative predictive value, since the serum pregnancy test can be considered a reference standard (gold standard) for pregnancy testing. The two sets of measures are contrasted in the sketch below.
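To make the distinction concrete, consider a 2x2 table with cell counts a (both methods positive), b (new test positive / comparator negative), c (new test negative / comparator positive), and d (both negative). The counts below are made up for illustration; the SAS step simply evaluates the usual formulas for the two situations:

  data diag_2x2;
    a = 40; b = 5; c = 10; d = 45;              * hypothetical cell counts;
    n = a + b + c + d;
    * when the comparator is a reference (gold) standard;
    sensitivity = a / (a + c);
    specificity = d / (b + d);
    ppv         = a / (a + b);
    npv         = d / (c + d);
    * when there is no reference standard, report agreement instead;
    overall_pct_agreement  = (a + d) / n;
    positive_pct_agreement = a / (a + c);
    negative_pct_agreement = d / (b + d);
  run;

  proc print data=diag_2x2;
  run;

Note that positive and negative percent agreement have the same form as sensitivity and specificity; the different names emphasize that agreement with a non-reference comparator does not tell us which method is correct.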

The Kappa statistic (K) is a measure of agreement between two sources measured on a binary scale (i.e., condition present/absent). The K statistic typically takes values between 0 and 1 (negative values are possible when observed agreement is worse than chance expectation). A common interpretation is:
  • Poor agreement : K < 0.20
  • Fair agreement : K = 0.20 to 0.39
  • Moderate agreement : K = 0.40 to 0.59
  • Good agreement : K = 0.60 to 0.79
  • Very good agreement : K = 0.80 to 1.00
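In SAS, the simple Kappa coefficient can be obtained from PROC FREQ with the AGREE option. A minimal sketch, assuming two readers' dichotomous ratings summarized as cell counts (the data set and variable names below are hypothetical):

  data reads;
    length reader1 reader2 $8;
    reader1 = 'Positive'; reader2 = 'Positive'; count = 40; output;
    reader1 = 'Positive'; reader2 = 'Negative'; count = 5;  output;
    reader1 = 'Negative'; reader2 = 'Positive'; count = 10; output;
    reader1 = 'Negative'; reader2 = 'Negative'; count = 45; output;
  run;

  proc freq data=reads order=data;
    weight count;                               * cell counts instead of one record per subject;
    tables reader1*reader2 / agree;             * AGREE requests Kappa (and McNemar) statistics;
  run;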
A good review article about Kappa statistics is the one written by Kraemer et al, "Kappa Statistics in Medical Research".

SAS procedures can calculate Kappa statistics easily (see the sketch above). Here is a list of papers: