Exploratory data analysis.
Anomalies of the field: VAERSDATA.NUMDAYS
Most of the fields in the VAERS database have pretty clean data. For instance, many fields will have either ‘Y’, or NULL in them.
You won’t find odd data such as ‘N’, numbers or other letters. The
data looks quite reliable.
However, analysis of the field NUMDAYS reveals some oddities.
In the VAERS documentation, NUMDAYS is defined as:
Number of days (ONSET_DATE – VAX_DATE)
Does this hold true, and can it be relied upon?
In how many rows is NUMDAYS populated?
use vaers_2019; go Select count (numdays ) cnt_numdays , count(*) Count , 100.0 * count (numdays ) / count(*) as Pct_NUMDAYS_Not_Null from dbo.[vaersdata]; go
Cnt_numdays | Count | Pct_NUMDAYS_Not_Null |
---|---|---|
593,718 | 759,483 | 78.17 |
Approximately 78% of the rows in VAERSDATA have the column NUMDAYS, populated.
Interestingly, after loading the VAERSDATA MDB file into SQL SERVER, the NUMDAYS column is loaded as type Nvarchar.
Are there any rows where NUMDAYS differs from the difference in days between VAX_DATE and ONSET_DATE?
select DATEDIFF(DAY, VAX_DATE, ONSET_DATE ) DateDiff , CONVERT(INT, numdays) NUMDAYS from dbo.[vaersdata] where numdays is not null and DATEDIFF(DAY, VAX_DATE, ONSET_DATE ) <> CONVERT(INT, numdays)
Zero rows. Reliable.
However, looking at the fields, VAX_DATE and ONSET_DATE, there are a number of rows where the ONSET_DATE precedes the VAX_DATE. This does not make sense. VAX_DATE needs to precedes the ONSET_DATE.
Select count(*) Count from dbo.[vaersdata] where VAX_DATE > ONSET_DATE
Count:
2,530
Does this affect NUMDAYS?
Select NUMDAYS , count(*) Count from dbo.[vaersdata] where ONSET_DATE < VAX_DATE group by NUMDAYS;
NUMDAYS | Count |
---|---|
NULL | 2530 |
If the ONSET_DATE precedes the VAX_DATE, then NUMDAYS remains NULL.
...
When ONSET_DATE < VAX_DATE, what values are found?
select DATEDIFF(DAY, VAX_DATE, ONSET_DATE ) DateDiff , CONVERT(INT, numdays) NUMDAYS , count(*) Count from dbo.[vaersdata] where numdays is null and VAX_DATE is not null and ONSET_DATE is not null Group by DATEDIFF(DAY, VAX_DATE, ONSET_DATE ) , CONVERT(INT, numdays) order by 1 desc
Row Number | DateDiff | NUMDAYS | Count |
---|---|---|---|
1 | -1 | NULL | 135 |
2 | -2 | NULL | 123 |
3 | -3 | NULL | 105 |
4 | -4 | NULL | 98 |
5 | -5 | NULL | 98 |
6 | -6 | NULL | 86 |
7 | -7 | NULL | 97 |
8 | -8 | NULL | 94 |
9 | -9 | NULL | 101 |
10 | -10 | NULL | 99 |
11 | -11 | NULL | 80 |
12 | -12 | NULL | 109 |
13 | -13 | NULL | 77 |
14 | -14 | NULL | 90 |
15 | -15 | NULL | 92 |
… | |||
95 | -815 | NULL | 1 |
96 | -1,271 | NULL | 1 |
97 | -1,511 | NULL | 1 |
98 | -1,854 | NULL | 1 |
99 | -2,554 | NULL | 1 |
100 | -2,701 | NULL | 1 |
101 | -3,287 | NULL | 1 |
102 | -3,652 | NULL | 3 |
103 | -23,010 | NULL | 1 |
104 | -35,667 | NULL | 1 |
104 rows.
The majority of the odd results are within 30 days.
At the same time, there are some very odd results. The maximum, 35667 days would be over 97 years.
—
Some Excerpts Where ONSET_DATE < VAX_DATE
select VAERS_ID , VAX_DATE , ONSET_DATE , CONCAT (symptom_text, symptom_text2) symptom_text from dbo.[vaersdata] where numdays is null and DATEDIFF(DAY, VAX_DATE, ONSET_DATE ) = -2 order by 1;
VAERS_ID | VAX_DATE | ONSET_DATE |
---|---|---|
791383 | 2018-12-03 | 2018-12-01 |
Symptom_text:
This case was reported by a consumer via call center representative and described the occurrence of dizziness in a 50-year-old female patient who received Shingles vaccine for prophylaxis. On 3rd December 2018, the patient received Shingles vaccine. In December 2018, less than a week after receiving Shingles vaccine, the patient experienced dizziness. On an unknown date, the outcome of the dizziness was unknown. It was unknown if the reporter considered the dizziness to be related to Shingles vaccine. Add
Comment:
Looks like ONSET_DATE should be entered as: 2018-12-10, one week after the VAX_DATE.
…
VAERS_ID | VAX_DATE | ONSET_DATE |
---|---|---|
833737 | 2019-09-03 | 2019-09-01 |
Symptom_text:
The edema increased until the elbow and on 05 Sep2019 [the arm of the patient) was swollen until the elbow with spots, very warm and painful at touch; The edema increased until the elbow and on 05 Sep2019 [the arm of the patient) was swollen until the elbow with spots, very warm and painful at touch; The edema increased until the elbow and on 05 Sep2019 [the arm of the patient) was swollen until the elbow with spots, very warm and painful at touch; The edema increased until the elbow and on 05 Sep2019 [th
Comment:
Onset date should be: 05 Sep2019, as it says in the field SYMPTOM_TEXT. Not 2019-09-01
Onset_Date should be two days higher, not two days less.
…
VAERS_ID | VAX_DATE | ONSET_DATE |
---|---|---|
835136 | 2019-08-03 | 2019-08-01 |
Symptom_Text:
Diarrhoea; Pyrexia; This case was reported by a consumer via regulatory authority and described the occurrence of diarrhoea in a 2-year-old female patient who received Rabies NVS (Rabipur) (batch number ARBA341A, expiry date unknown) for prophylaxis. Co-suspect products included TYPHOID VACCINE (TYPHIM VI) (batch number P1E191V, expiry date unknown) for prophylaxis. On 3rd August 2019, the patient received Rabipur (parenteral) and TYPHIM VI (parenteral). On 5th August 2019, 2 days after receiving Rabi
Comment:
Vaccination date: 2019-08-03
Onset date should be: 5th August 2019, as specified in the text.
Not 2019-08-01
Onset_Date should be two days higher, not two days less.
….
These all look like data entry errors.
—-
Date anomalies in thousands of days:
select VAERS_ID , VAX_DATE , ONSET_DATE , Age_yrs , DATEDIFF(DAY, VAX_DATE, ONSET_DATE ) as theDateDiff , CONCAT (symptom_text, symptom_text2) Symptom_text from dbo.[vaersdata] where numdays is null and DATEDIFF(DAY, VAX_DATE, ONSET_DATE ) < -10000 order by 1
VAERS_ID | VAX_DATE | ONSET_DATE | Age_yrs | theDateDiff |
---|---|---|---|---|
816765 | 2019-01-08 | 1921-05-15 | 31 | -35,667 |
Symptom_text:
Arthralgia
Comment:
Entered as 1921-05-15.
Probably should have been entered as: 2019-05-15
....
VAERS_ID | VAX_DATE | ONSET_DATE | Age_yrs | theDateDiff |
---|---|---|---|---|
835910 | 2019-09-22 | 1956-09-22 | 63 | -23,010 |
Symptom_text:
Gave HD dose to 63 year old patient inadvertedly
Comment:
1956-09-22 to 2019-09-22 is exactly 63 years, the patient's age.
Some kind of data entry error.
---
Does It Affect The Results?
Select count(*) NUMDAYS_Count from dbo.[vaersdata] where numdays is not null
NUMDAYS_Count:
593,718
...
Select count(*) Vax_Onset_Date_Count from dbo.[vaersdata] where VAX_DATE is not null and ONSET_DATE is not null
Vax_Onset_Date_Count:
596,248
...
Select count(*) Count from dbo.[vaersdata]
Count:
759,483
...
NUMDAYS percent populated:
100 * 593718/759483 = 78.17%
VAX_DATE and ONSET_DATE populated:
100 * 596248/759483 = 78.5%
Not even a 1% difference