People v H.K.
2020 NY Slip Op 20232 [69 Misc 3d 774]
May 15, 2020
Busching, J.
Criminal Court of the City of New York, Bronx County
Published by New York State Law Reporting Bureau pursuant to Judiciary Law § 431.
As corrected through Wednesday, December 9, 2020


[*1]
The People of the State of New York, Plaintiff,
v
H.K., Defendant.

Criminal Court of the City of New York, Bronx County, May 15, 2020

APPEARANCES OF COUNSEL

The Bronx Defenders (Dany Greene and Hannah Rosenthal of counsel) for defendant.

Darcel D. Clark, District Attorney (Tiffany Ould and Felicity Lung of counsel), for plaintiff.

{**69 Misc 3d at 775} OPINION OF THE COURT
Laurence E. Busching, J.

In People v Wakefield (175 AD3d 158 [3d Dept 2019], lv denied 34 NY3d 1083 [2019]), the Court considered whether the prosecution's introduction of testimony concerning analysis of deoxyribonucleic acid (DNA) evidence conducted using the TrueAllele Casework System (hereinafter, TrueAllele), a software program, violated the defendant's right to confront his accusers. The Court found that "TrueAllele, by running at the source code's direction, compared DNA found at the crime scene to that of defendant's DNA and generated the report containing the likelihood ratios, which, in effect, implicates defendant in the murder" (id. at 168). Applying the primary purpose test outlined in People v Pealer (20 NY3d 447 [2013], cert denied 571 US 846 [2013]), the Court determined that the TrueAllele report was testimonial (Wakefield at 168-169). Yet, despite the fact that the report was generated through "a synergy and distributed cognition continuum between human and machine," the Court did not find the distribution allotted to the program to be sufficient to "transform the source code into a declarant" (id. at [*2]169 [citations omitted]). In Wakefield, Mark Perlin, "the founder, chief scientist and chief executive officer of Cybergenetics," the company which developed and marketed TrueAllele, testified at trial (id. at 161). Perlin specified which functions were performed by a human analyst and which were done by the program. Since Perlin, "the declarant in the epistemological, existential and legal sense," testified at trial, the Court held that the defendant's right to confrontation was not violated (id. at 169-170).

In this case, the defendant stands charged with two counts of forcible touching (Penal Law § 130.52 [1]), one count of endangering the welfare of a child (Penal Law § 260.10 [1]), two counts of sexual abuse in the third degree (Penal Law § 130.55) and two counts of harassment in the second degree (Penal Law § 240.26 [1]). It is alleged that on October 28, 2018, the defendant sexually assaulted two teenagers by, inter alia, entering their bedroom and touching them on their breasts and vaginas. The complainants reported the incident immediately to their mother and other family members who confronted the defendant and contacted the police. The defendant, a tenant of the complainants' grandmother, made statements to the family{**69 Misc 3d at 776} and the police acknowledging interacting with the girls but denying that any sexual contact took place. The complainants were transported to a hospital where they were examined by a sexual assault nurse examiner and a sexual assault evidence collection kit was prepared that included swabs taken of relevant areas of both complainants. The defendant gave a consensual DNA sample and was later ordered to provide an additional sample for comparison. DNA samples were also taken from both complainants.

Relevant findings from the analysis conducted by the Office of the Chief Medical Examiner (OCME) include:[FN1]

1. Dried secretion swab from "left breast" of complainant 1 contained a DNA mixture that is "approximately 12.2 quadrillion (1.22 x 10^16) times more probable if the sample originated from H.K., [complainant 1], and one unknown person than if it originated from [complainant 1] and two unknown persons. Therefore, this supports that H.K. is included as a contributor to this sample."

2. Dried secretion swab from "right breast" of (complainant 1) contained a DNA mixture that is "approximately 4.81 quadrillion (4.81 x 10^15) times more probable if the sample originated from H.K., [complainant 1], and one unknown person than if it originated from [complainant 1] and two unknown persons. Therefore, this supports that H.K. is included as a contributor to this sample."

3. "A likelihood ratio was calculated for the comparison of H.K. to the DNA mixture found on the dried secretion swab from 'left breast' [of complainant 2] . . . H.K. is excluded as a contributor to this sample."
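The likelihood ratios quoted in these findings compare how probable the DNA mixture is under two competing propositions about who contributed to it. The following is a minimal Python sketch of that ratio. The function name and the two input probabilities are hypothetical, chosen only for illustration, and nothing here is meant to reflect how OCME or STRmix actually arrives at the reported figures.

```python
import math

def likelihood_ratio(p_evidence_given_h1: float, p_evidence_given_h2: float) -> float:
    """Probability of the evidence under H1 divided by its probability under H2."""
    if p_evidence_given_h2 <= 0:
        raise ValueError("probability under H2 must be positive")
    return p_evidence_given_h1 / p_evidence_given_h2

# Hypothetical probabilities for illustration only; not taken from the case record.
lr = likelihood_ratio(1e-4, 1e-20)

# A ratio above 1 favors H1 (the named person is a contributor);
# a ratio below 1 favors H2 (only unknown persons contributed).
supports_inclusion = lr > 1

# "12.2 quadrillion" expressed in scientific notation is 1.22 x 10^16.
QUADRILLION = 10**15
reported_lr = 12.2 * QUADRILLION
log10_reported = math.log10(reported_lr)  # a little above 16
```

Reporting the base-10 logarithm alongside the ratio is one way to make figures of this size easier to compare.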

Additionally, male DNA was recovered from other swabbed areas of complainant 2, but in insufficient concentrations to permit DNA typing.

The People have sought to admit the first two findings at trial through the testimony of OCME Criminalist Level II Alison Eychner. The defense objected, arguing that in order to calculate the probability ratios described above, the criminalist used a software program called [*3]STRmix that is similar, but not identical, to TrueAllele. Citing Wakefield, they sought preclusion{**69 Misc 3d at 777} of the criminalist's testimony on this topic as violative of the Confrontation Clause. In order to resolve the question of whether such testimony would run afoul of the principles set forth in Wakefield, the court ordered a hearing.

The issue at the hearing, as agreed to by both parties, was whether the testimony of the criminalist provides a sufficient opportunity for cross-examination to satisfy the Confrontation Clause as defined by Crawford v Washington (541 US 36 [2004]) and its progeny.

Findings of Fact

The People called Tiffany Vasquez, a Criminalist IV and Assistant Technical Leader in the Department of Forensic Biology at the New York City OCME. The court finds her to be a credible witness. The defense did not present any witnesses.

Ms. Vasquez has worked at the OCME for 15 years. Criminalist IV is the highest-level criminalist designation. Her responsibilities include supervising Criminalists I, II and III, reviewing case files and technical data, and triaging evidence. As an Assistant Technical Leader, Ms. Vasquez reviews validation data for new technologies, assists in developing standard operating procedures, assists analysts with technical issues with their casework and assists in training analysts.[FN2] Ms. Vasquez holds a bachelor's degree in molecular and cell biology from the University of California and a master's degree in forensic science from the University of Illinois at Chicago.

Ms. Vasquez assists with training on STRmix by giving lectures and assisting in exercises. She also conducts oral examinations for the analysts who are completing their training. She assists analysts who report to her and other supervisees while they are being trained and with their first cases. She has conducted or reviewed thousands of statistical analyses.

There are four stages of DNA testing. These are: (1) extraction, in which the sample is heated in order to release the DNA and isolate it; (2) quantitation, where the amount of DNA is determined; (3) amplification, where the DNA is copied; and (4) detection, where a genetic analyzer is used to determine whether there is any usable DNA in the sample.

Once the detection phase is completed, the analyst enters the raw data into a software program called Genemarker. The{**69 Misc 3d at 778} analyst uses Genemarker to perform an evaluation of the data to account for any artifacts of the process or other "background noise" that would interfere with the results (I at 11, line 14).[FN3] Then, "they will look at the sample and the case as a whole to determine what samples can be interpreted or compared" (I at 13, lines 4-6).

When this process is complete, a criminalist is assigned to the case and goes through each sample presented to see if it can be compared to another sample or interpreted. Each sample can contain DNA from a single source or a mixture from two or three people.[FN4]

The criminalist will look at any reference samples provided by a victim to see if that can give them information about the sample they are comparing. If it is a single-source sample, they may be able to assign a genotype or DNA profile then.

[*4]

If the sample contains a mixture of two or three people, the analyst will employ STRmix in determining probabilities. STRmix is "a probabilistic genotyping software program, which means that it's an assistive tool that helps DNA analysts with their interpretation of data" (I at 15, lines 11-14).[FN5] Ms. Vasquez further explained:

"For a mixture, sometimes if you have a lot of DNA from one person in the mixture and very little DNA from another person in the mixture, it can also be straightforward to determine the genotype of that major contributor who contributed more DNA or the minor contributor that contributed less DNA. But many mixtures are less straightforward, and they have similar amounts from both people in the mixture or three people. So, an analyst can determine about how much DNA came from each person and possible genotypes for each of the individual people, but STRmix can assist them in assigning some probability to each of those genotypes that are possible in that mixture. That is the probabilistic{**69 Misc 3d at 779} portion of the software . . . it is assigning probabilities to each of the genotypes" (I at 16, lines 3-11).

After the analyst designates whether it is a two-person or three-person mixture, "STRmix gives an estimate about how much DNA came from each individual, and . . . possible genotypes for each contributor and assigns a probability for each of those genotypes" (I at 17, lines 17-20). The analyst then compares those results with what they would expect to see from the sample:

"So, are those genotype[ ] possibilities that it came up with reasonable based on the DNA results that they are interpreting themselves? Do the mixture proportions meet their expectation of what they can calculate as a mixture proportion for that particular sample? And they're trained in all of that interpretation before they are trained in STRmix. They are trained in how to determine mixture proportion; they are trained how to assign genotypes to individual contributors within a mixture. They are just making sure STRmix conforms to their expectations" (I at 18, lines 3-12).

If the results produced by STRmix differ from the interpretations of the analyst, the analyst will reexamine the inputs to determine if there were mistakes, such as an incorrect number of contributors or a missed artifact from processing. The analyst will then rerun the data based on the corrected parameters.

When comparing DNA results to a known reference sample, STRmix generates a likelihood ratio using biological and mathematical modeling. Thus, when it is looking at different loci, "it assigns a probability for each genotype combination" (I at 42, lines 15-16). "It compares millions of possibilities, looks at how well they fit the data based on its own biological modeling and then generates an individual statistic for every locus" (I at 45, lines 12-16). OCME analysts then "are looking at the STRmix outputs and comparing it to their interpretation of the data to see that it aligns with their interpretation" (II at 34, lines 5-7).
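The testimony describes an individual statistic generated for every locus. One common way to turn per-locus likelihood ratios into a single figure is to multiply them, which assumes the loci are statistically independent. That assumption, the function name, and the sample values in this Python sketch are illustrative only and are not drawn from the hearing record or from the STRmix software.

```python
import math

def combined_log10_lr(locus_lrs):
    """Sum the log10 of each per-locus likelihood ratio.

    Adding logarithms is equivalent to multiplying the ratios themselves,
    and it keeps very large products manageable.
    """
    if any(lr <= 0 for lr in locus_lrs):
        raise ValueError("each likelihood ratio must be positive")
    return sum(math.log10(lr) for lr in locus_lrs)

# Hypothetical per-locus ratios, not values from this case.
hypothetical_lrs = [10.0, 100.0, 1000.0]

log10_total = combined_log10_lr(hypothetical_lrs)
combined_lr = 10 ** log10_total  # same as 10.0 * 100.0 * 1000.0
```

Under this assumption a ratio below 1 at any single locus pulls the combined figure down, so every locus contributes to the final statistic.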

[*5]

An analyst does not examine each of the millions of proposals or guesses the program makes, but "they are looking at [the] summary table that shows what happened over time" (II at 34, lines 13-15). "If given the proposal, they could calculate why it's accepted or rejected on a particular guess" (II at 34, lines 23-24).

{**69 Misc 3d at 780}When asked if the analyst would be able to perform these tasks without using the STRmix software, Ms. Vasquez explained that they would. The calculations would not be as exact, but they would be "in the ballpark" (I at 16, lines 18-19). On cross-examination, she further explained, "I don't know that they will be able to do it in exactly the way that STRmix does it, but they can still recognize the output of STRmix and see that it matches their interpretation or expectation" (I at 46, lines 2-5). Due to the increasing sensitivity of DNA testing and the number of locations that analysts consider, "[t]o just go through all of that manual interpretation in a reasonable amount of time will be extremely challenging for an analyst to do. We wouldn't get any cases done" (I at 16, lines 21-24). If an analyst were given an unlimited amount of time, they could produce the same output as STRmix. STRmix is not an "expert system" that functions "without any human intervention by the analyst" (II at 52, lines 4-16).

Once they are confident in the results, the analyst will then report their interpretations of the sample in accordance with OCME standard operating procedures. "So STRmix is assisting them with those interpretations, but in the end, they are putting their interpretation into their reports based on a combination of what they are looking at in that DNA data along with the information they obtained from STRmix" (I at 20, lines 4-8).

On cross-examination, Ms. Vasquez testified that there are various issues that the STRmix program models for, including stutter, drop-in, drop-out, random walk standard deviation, effective sample size thinning, degradation, saturation, burn-in accepts, highest posterior density iterations and duplicate runs. She testified that changes to most of these factors could affect the eventual likelihood ratio produced. On redirect, she testified that in order to control for these effects, OCME provides assumptions and parameters to the software. While the analyst sets the number of contributors, "[m]ost of the other settings that are inputted into the software were determined through internal validation [by OCME]" (II at 50, lines 21-22).

In terms of training on STRmix, criminalists are first trained on interpretation of DNA data, followed by three days of training on the theories behind the STRmix software, followed by exercises that demonstrate how STRmix works. The training includes "how STRmix uses the assumptions and parameters" (II at 51, lines 11-13). They also do hands-on exercises with{**69 Misc 3d at 781} known samples. Finally, they complete oral and written examinations.

Ms. Vasquez has some limited familiarity with TrueAllele based on having seen presentations on the product, watched webinars and read papers about it. When asked about similarities and differences between the programs, she replied:

"So from what I know of TrueAllele it is similar to STRmix in that it uses some of the same computing processes to look at DNA data, and it generates probabilities and it generates what is called a likelihood ratio for a comparison as a statistic. One of the ways it is different is that analysis step . . . where an analyst intervenes, applies a threshold to separate signal from noise and unlabels artifacts. With TrueAllele, that is not necessary . . . the raw data from the genetic analyzer can be put directly into TrueAllele, and it will make some number of contributor determinations for you, as well as differentiate between artifacts and true peaks" (I at 21, line 20 through 22, line 7).
[*6]

In order to assist in explaining the role STRmix plays in performing an analysis, Ms. Vasquez drew an analogy to Google Maps. A driver could use a map to get to their destination, calculate the mileage and make an approximation about travel time. With Google Maps, the driver could see all that information mapped out in front of them, but they would still have to check that they are going to the correct place and verify that they had gotten there.

Finally, Ms. Vasquez testified that the analyst does not need or use the source code for the program to come to their conclusions.

Conclusions of Law

Both the Sixth Amendment of the Federal Constitution and article I, § 6 of the New York Constitution provide that every defendant has the right to confront witnesses against them. The factfinder at trial may not consider "testimonial statements of a witness who did not appear at trial unless he was unavailable to testify, and the defendant had . . . a prior opportunity for cross-examination" (Crawford v Washington, 541 US 36, 53-54 [2004]). There is no "forensic evidence" exception—the prosecution may not introduce a testimonial report prepared by an analyst unless they present a witness capable of testifying to its truth (Melendez-Diaz v Massachusetts, 557 US 305 [2009]).{**69 Misc 3d at 782}

The US Supreme Court held that an affidavit from a state laboratory analyst that stated that a particular substance was tested and found to contain cocaine was "functionally identical to live, in-court testimony, doing 'precisely what a witness does on direct examination' " (id. at 310-311, citing Davis v Washington, 547 US 813, 830 [2006]). Similarly, an affidavit prepared by an analyst who reported the result of a gas chromatograph machine was considered testimonial since the affidavit also certified that the machine was in proper working order, that the sample entered was the correct one and that nothing that occurred during testing affected the integrity of the test (Bullcoming v New Mexico, 564 US 647 [2011]).

The New York Court of Appeals has identified "two factors that are 'especially important' in resolving whether to designate a statement as testimonial—'first, whether the statement was prepared in a manner resembling ex parte examination and second, whether the statement accuses defendant of criminal wrongdoing' " (People v Pealer, 20 NY3d 447, 453 [2013], quoting People v Rawlins, 10 NY3d 136, 156 [2008]).

Raw data describing a DNA profile without linking it to the accused was found not to be testimonial as it "shed no light on the guilt of the accused in the absence of an expert's opinion that the results genetically match a known sample" (Rawlins at 159). An autopsy report that was redacted to eliminate any opinions by the medical examiner was held to be non-testimonial because it was "very largely a contemporaneous, objective account of observable facts" (People v Freycinet, 11 NY3d 38, 42 [2008]). Breathalyzer calibration and related maintenance records are not testimonial as the purpose of these tasks was "to ensure the reliability of [the] machines—not to secure evidence for use in any particular criminal proceeding" (Pealer at 455 [emphasis added]).

The testimony of a single criminalist may not, under certain circumstances, be sufficient to satisfy the Confrontation Clause where the testimony concerns conclusions drawn by other criminalists involved in interpreting DNA evidence. In People v John (27 NY3d 294 [2016]), multiple OCME analysts participated in performing DNA testing on a sample from a gun the defendant was accused of possessing. Their final report found that "[t]he combination of the DNA alleles found in the sample would be expected to be found in approximately '1 in greater [*7]than 1 trillion people' " (id. at 298). To make such a calculation, "[e]xperienced analysts convert . . . numeric identifiers into a {**69 Misc 3d at 783}DNA profile using machine-generated raw data analyzed by a software program and the analyst's independent manual examination which involves an editing process" (id., citing John M. Butler, Fundamentals of Forensic DNA Typing 213 [2010]). The results were compared to a sample of the defendant's DNA in a "table [of numbers] resembling a box score" and "[t]he series of numbers were identical" (id. at 299). In that case, at trial, a Criminalist II testified as to the processes and findings of the OCME and laid the foundation for admission of the various analysts' reports as business records. The witness had not performed any of the tests on either sample herself, nor did she observe or supervise any of the tests. The reports were held to be testimonial. Yet, the prosecution "did not produce the analyst who generated the DNA profile from either the gun or the exemplar in this case" (id. at 309). The Court found that "these critical analysts . . . were effectively insulated from cross-examination" and that the testifying witness provided "nothing more than surrogate testimony to prove a required fact" (id.).

Recently, the Court of Appeals reiterated that "when confronted with testimonial DNA evidence at trial, a defendant is entitled to cross-examine 'an analyst who witnessed, performed or supervised the generation of defendant's DNA profile, or who used his or her independent analysis on the raw data' " (People v Tsintzelis, 35 NY3d 925, 926 [Mar. 24, 2020], quoting John at 315; see also People v Austin, 30 NY3d 98 [2017]).

In Wakefield, the Court considered whether the testifying analyst improperly served as a conduit for the analysis and conclusions of a software program, rather than an actual analyst. The concern about the implications for the right to confrontation was heightened in Wakefield since the source code for the program, TrueAllele, is proprietary and was not disclosed to the defense.

Perlin, the developer, "explained that TrueAllele is what is known as an 'expert system,' describing how, beyond the calculations made . . . , the program is designed to have a certain degree of artificial intelligence in order to make additional inferences as more information becomes available" (Wakefield at 167). The Court recognized that due process issues can arise when decisions are made by a software program, rather than by, or at the direction of, the analyst. "Given the exponential growth of technologies such as artificial intelligence,{**69 Misc 3d at 784} to embrace the future we must assess, and perhaps reassess, the constitutional requirements of due process that arise where law and modern science collide" (id. at 165-166, citing Christian Chessman, A "Source" of Error: Computer Code, Criminal Defendants, and the Constitution, 105 Cal L Rev 179 [2017], Katherine Kwong, The Algorithm Says You Did It: The Use of Black Box Algorithms to Analyze Complex DNA Evidence, 31 Harv JL & Tech 275 [2017], Andrea Roth, Machine Testimony, 126 Yale LJ 1972 [2017], and Edward J. Imwinkelried, Computer Source Code: A Source of the Growing Controversy over the Reliability of Automated Forensic Techniques, 66 DePaul L Rev 97 [2016]).

These concerns were alleviated in Wakefield by the fact that Perlin himself testified, both at the Frye hearing and at trial. The Court reviewed the many functions that the analyst performs in directing, setting parameters and reviewing the results of the program. It also considered Perlin's testimony "as to genetic science, the TrueAllele program and the formulation of the TrueAllele report through the computer processors and algorithms, including the MCMC [*8]algorithm"[FN6] (id. at 169 [citations omitted]). Thus, any Confrontation Clause concerns were met because the defendant had the opportunity to confront his "true accuser" (id. at 170).

While the Court found that the report produced by TrueAllele was testimonial, it did not find the source code to be a declarant, explaining:

"This is not to say that an artificial intelligence-type system could never be a declarant, nor is there little doubt that the report and likelihood ratios at issue were derived through distributed cognition between technology and humans (see Itiel E. Dror & Jennifer L. Mnookin, The Use of Technology in Human Expert Domains: Challenges and Risks Arising from the Use of Automated Fingerprint Identification Systems in Forensic Science, 9 L, Probability & Risk 47, 48-49 [2010]). Indeed, similar to many expert reports, the testimonial aspects{**69 Misc 3d at 785} of the TrueAllele report are formulated through a synergy and distributed cognition continuum between human and machine (see Itiel E. Dror & Jennifer L. Mnookin, The Use of Technology in Human Expert Domains: Challenges and Risks Arising from the Use of Automated Fingerprint Identification Systems in Forensic Science, 9 L, Probability & Risk at 48), but this fact alone does not tip the scale so far as to transform the source code into a declarant" (id. at 169).

In this case, the result reached using the STRmix software forcefully advances the assertion that the defendant had illegal contact with the minor complainant. On the other hand, the calculations prepared using the STRmix software are not equivalent to the types of affidavits and other testimonial substitutes that have resulted in findings of Confrontation Clause violations.

Here, STRmix was used as a tool to assist the analyst in her interpretation of the data. It was not working independently. The analyst determined whether the program considered the sample to be a two-person or three-person mixture. STRmix then performed an analysis, using known mathematical and biological models, to give an estimate about the quantity of DNA from each individual and possible genotypes. It then assigned probabilities to each of the possible genotypes. The analyst compared those results with her expectations in order to see if the data made sense based on her training and experience. If the data did not conform with her expectations, the analyst would reexamine the inputs to make sure there were no errors. If there were any, she would correct them and rerun the data.

The program is a tool that performs these analyses much faster than an analyst could. If, however, the analyst were given an unlimited amount of time, she could produce the same output as STRmix. In essence, the software is acting as a highly sophisticated calculator. In contrast to TrueAllele, STRmix is not an "expert system" that relies on artificial intelligence. In this way, STRmix is more akin to a program like Genemarker, which was used in this case to remove [*9]artifacts and "background noise" and was not the subject of an objection.

Under these circumstances, the analyst who utilized STRmix can be meaningfully cross-examined. She has been trained on the underlying principles of biology and biological and mathematical{**69 Misc 3d at 786} modeling as well as the operation of the STRmix software and its underlying principles. She personally inputted the raw data used by the program and designated whether it was a two- or three-person mix. She has knowledge about the assumptions and parameters that have been set by OCME to direct the program's efforts. She can examine particular results produced by the program and assess their accuracy. The results are not the product of "artificial intelligence" for which the analyst does not have responsibility. The analyst is the declarant.

Additionally, unlike in Wakefield, the defense here has not argued that the source code is not available for their inspection (see Access to STRmix™ Software by Defence Legal Teams, available at https://www.strmix.com/assets/Uploads/Defence-Access-to-STRmix-April-2016.pdf [accessed Apr. 16, 2020], cached at http://www.nycourts.gov/reporter/webdocs/Defence-Access-to-STRmix-April-2016[1].pdf).

The court finds that the defendant's right to confrontation will be properly preserved. The analyst will not be serving as a mere conduit for the conclusions of witnesses who will not be called to testify. Rather, the analyst will be testifying as one who performed the analyses at issue, using STRmix as an assistive tool.

Accordingly, the motion to preclude the testimony of Criminalist Alison Eychner is denied.



Footnotes


Footnote 1:Names of the complainants are omitted from this decision pursuant to section 50-b of the Civil Rights Law. The defendant's name has been abbreviated since the case has been resolved with a noncriminal disposition and is otherwise sealed, pursuant to CPL 160.55 (1).

Footnote 2:Throughout the hearing, the witness and the parties used the terms "analyst" and "criminalist" interchangeably.

Footnote 3:References to the testimony refer to the day (I or II), page, and line(s) in the transcript.

Footnote 4:OCME does not interpret samples determined to be from four or more individuals.

Footnote 5:DNA test results interpreted using STRmix have been found admissible since the software was unanimously recommended for use by the DNA Subcommittee of the New York State Commission on Forensic Science and found to be generally accepted within the relevant scientific community. (People v Bullard-Daniel, 54 Misc 3d 177 [Niagara County Ct 2016].)

Footnote 6:The MCMC algorithm "is used to solve high dimension calculus problems that would be impossible or impractical without a computer so as to identify all possibilities, not just the maximum possibility" (Wakefield at 167, citing Ben Shaver, A Zero-Math Introduction to Markov Chain Monte Carlo Methods, Towards Data Science, available at https://towardsdatascience.com/a-zero-math-introduction-to-markov-chain-monte-carlo-methods-dcba889e0c50).