In a clinical trial, one critical task is to ensure that the data collected or entered into the system / database is valid, correct, and logically sound. This task requires a data quality plan that starts with designing a good study protocol -> developing efficient case report forms -> providing clear instructions for completing the case report forms -> implementing electronic edit checks -> monitoring the study data / source data verification -> data clarification process -> data review process. One of these steps is implementing the electronic edit checks.
An edit check is a program instruction or subroutine that tests the validity of input in a data entry program. According to the CDISC clinical research glossary from Applied Clinical Trials, an edit check is defined as:
An auditable process, usually automated, of assessing the content of a data field against its expected logical, format, range, or other properties that is intended to reduce error. NOTE: Time-of-entry edit checks are a type of edit check that is run (executed) at the time data are first captured or transcribed to an electronic device at the time entry is completed of each field or group of fields on a form. Back-end edit checks are a type that is run against data that has been entered or captured electronically and has also been received by a centralized data store.
Electronic edit checks allow us to use the power of the computer to check for illogical, incomplete, or inconsistent data. In a clinical trial, one of the most important tasks facing clinical data management personnel is to produce the electronic edit check specifications for a study. Developing the edit check specifications, and processing the queries that result from them, is arguably the most vital and time-consuming data cleaning activity that data management personnel undertake. The study statistician should always participate in developing the electronic edit checks to ensure that the critical ones are included. Effectively implemented edit checks can prevent illogical, incomplete, or inconsistent data from entering the data capture system or data set, which makes the downstream data analyses much easier.
There are two types of edit checks:
Univariate edit checks (including range checks): these are edit checks that apply to a single field or variable. For example, for subject weight, we can set up an edit check to ensure that extreme or unlikely values are not entered. Let’s say we set up a range check that fires if an entry is less than 90 lb or greater than 300 lb. For a lung function test, we may set up an edit check requiring percent predicted FEV1 to be no less than 20%, because it is unlikely for someone to have a percent predicted FEV1 <20%. Univariate edit checks are usually run instantly at the time of data entry.
Multivariate edit checks (also called aggregate edit checks): these are edit checks that involve more than one field or variable. They cross-check the entries across multiple fields / variables to ensure the data is logical and consistent. For example, if the entry in the Gender field is ‘Male’, there should not be data in the pregnancy test result field. If the reason for a subject dropping out of the study is entered as ‘adverse events’, there should be a corresponding entry in the AE data set. Statisticians can provide valuable input in identifying the multivariate edit checks. Some multivariate edit checks involve complicated algorithms and take considerable time to run. In this situation, they can be run at the back end at a specified interval (for example, at 2 a.m. every night).
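To make the two types concrete, here is a minimal Python sketch of the examples above. It is only an illustration, not how any particular EDC system implements edit checks: the SubjectRecord layout, the field names, and the function names are all made up for this post, and the thresholds are the ones mentioned in the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class SubjectRecord:
    # Hypothetical record layout; field names are illustrative only.
    subject_id: str
    gender: str
    weight_lb: Optional[float] = None
    fev1_pct_predicted: Optional[float] = None
    pregnancy_test_result: Optional[str] = None
    dropout_reason: Optional[str] = None


def univariate_checks(rec: SubjectRecord) -> List[str]:
    """Range checks: each one looks at a single field."""
    queries = []
    if rec.weight_lb is not None and not (90 <= rec.weight_lb <= 300):
        queries.append(f"{rec.subject_id}: weight {rec.weight_lb} lb is outside 90-300 lb")
    if rec.fev1_pct_predicted is not None and rec.fev1_pct_predicted < 20:
        queries.append(f"{rec.subject_id}: percent predicted FEV1 is below 20%")
    return queries


def multivariate_checks(rec: SubjectRecord, subjects_with_ae: Set[str]) -> List[str]:
    """Cross-field checks: each one compares two or more fields or data sets.

    subjects_with_ae is an assumed input: the set of subject IDs that have
    at least one entry in the AE data set.
    """
    queries = []
    if rec.gender == "Male" and rec.pregnancy_test_result is not None:
        queries.append(f"{rec.subject_id}: pregnancy test result entered for a male subject")
    if rec.dropout_reason == "adverse events" and rec.subject_id not in subjects_with_ae:
        queries.append(f"{rec.subject_id}: dropout due to adverse events but no AE record")
    return queries
```

In this sketch each check returns a query message rather than rejecting the entry, which mirrors the idea that edit checks feed the data clarification process. The univariate function could be called as each field is saved, while the multivariate function, which needs the AE data set, is the kind of check that could be deferred to a scheduled back-end run.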
One misunderstanding is to think that all data issues can be resolved by implementing edit checks. Edit checks are only one of the steps in the data cleaning process. Also, there should be a balance in the number of edit checks. Too many edit checks for non-critical fields can be very annoying for the people who enter the data. This is especially true for clinical trials using electronic data capture (EDC), where the data entry responsibility is delegated to the investigator and study coordinators, who may lose patience if there are too many pop-up messages during data entry. For example, if a telephone number needs to be entered, an edit check that forces the entry to follow xxx-xxx-xxxx would be unnecessary (xxxxxxxxxx and 1xxxxxxxxxx should also be accepted). This is an example I see in some web forms, and it is very annoying.
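As a rough sketch of the more forgiving approach, the check can normalize what was typed instead of rejecting it. The function name below is hypothetical, and the code assumes US-style 10-digit numbers with an optional leading 1, as the xxx-xxx-xxxx pattern suggests.

```python
import re
from typing import Optional


def normalize_us_phone(raw: str) -> Optional[str]:
    """Return the number as xxx-xxx-xxxx, or None if it cannot be a US number.

    Assumption for this sketch: any punctuation or spacing is acceptable,
    so everything except digits is stripped before the length is checked.
    """
    digits = re.sub(r"\D", "", raw)
    # Drop an optional leading country code of 1.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) == 10:
        return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
    return None
```

With this approach, 555-123-4567, 5551234567, and 15551234567 are all stored the same way, and a pop-up is needed only when the entry cannot be interpreted at all.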