A new peer-reviewed study published in Science Advances confirms the success of a new COVID-19 pooling test that identifies all positive subjects, including asymptomatic carriers, in a single round of testing.

P-BEST, an algorithmic method for pooling-based efficient SARS-CoV-2 testing, was developed by a group of researchers from Ben-Gurion University of the Negev (BGU), the National Institute for Biotechnology in the Negev (NIBN), The Open University of Israel (OUI), and Soroka University Medical Center.

“Approximately 10 to 30% of COVID-19-infected patients are asymptomatic and significant viral spread can occur days before symptom onset,” says Prof. Angel Porgador, BGU deputy vice president of research and development and member of the NIBN.

“Until there is a vaccine, there will be an urgent need to increase diagnostic testing capabilities to allow for screening of asymptomatic and pre-symptomatic populations.

This new single-stage diagnostic test will help prevent the spread of the disease by identifying these patients sooner and at a lower cost using significantly fewer tests.”

In the current study, 384 samples were divided into only 48 pools providing an eightfold increase in testing efficiency and similar reduction in testing costs for reagents. Each pool comprises a unique set of 48 samples, where each sample appears in exactly six pools using a specific combinatorial design.

These 48 pools were then tested at the Soroka virology laboratory using a COVID-19 PCR-based diagnostic protocol that included an RNA extraction stage.

After testing each of the 48 pools, the researchers successfully identified up to five positive carriers within the 384 samples, without having to retest the samples in the positive pools individually.

“P-BEST can be configured on the basis of the carrier rate,” says Dr. Noam Shental, head of the OUI Computer Science Division. “The lower the carrier rate, the higher efficiency.

Our pooling method has been tested using an advanced liquid-handling robotic system that can perform the task in an hour and can be performed in a typical clinical diagnostic laboratory anywhere in the world.”

The researchers also tested the performance of P-BEST in a clinical study aimed at screening asymptomatic and mildly symptomatic health care workers. In the study, they screened 1,115 health care personnel at Soroka using P-BEST.

Subjects were recruited across all Soroka staff and included physicians, nurses, nurse assistants, as well as clinical and administrative staff.

A total of 296 (26.5%) subjects worked in direct contact with patients with COVID-19.

Within the cohort, 926 (93.1%) subjects reported themselves as totally asymptomatic, 71 (6.3%) reported a mild cough, and 70 (6.3%) reported rhinorrhea.

The 1,115 participants were tested using only 144 tests. All of the pools tested were negative. Because of the decreasing carrier rates in Israel during April 2020, the third batch was blindly spiked with a sample from a patient with COVID-19, which was positively identified.

“P-BEST is ideal for conducting carrier screening when infection rates are very low, less than one percent,” says Prof. Tomer Hertz from BGU’s Shraga Segal Department of Microbiology, Immunology and Genetics. “This will provide significant savings in reagents and other diagnostic testing resources while significantly increasing testing capacity.”

BGU and OUI have established a company, Poold Diagnostics, to pursue large scale COVID-19 testing, following the clinical study. In mid-August, the Israel Ministry of Health approved the use of P-BEST in clinical laboratories in the State of Israel.

“The BGU research team has validated a pooling method that will allow sizeable populations in Israel and ultimately other countries to be tested accurately and at low cost for COVID-19,” says Doug Seserman, chief executive officer of the New York City-based AABGU.

“The BGU COVID-19 Response Effort is moving forward on commercializing a number of technologies and innovations to mitigate effects of this pandemic.”

One key to containing and mitigating the COVID-19 pandemic is suggested to be rapid testing on a massive scale (Huang et al., 2019, Siegenfeld and Bar-Yam, YYYY). It would be beneficial to develop the ability to routinely, and in particular rapidly, test groups such as frontline healthcare workers, police officers, and international travellers.

Testing for SARS-CoV-2 is currently performed via the polymerase chain reaction (PCR) on nasopharyngeal swabs (Kai-Wang To et al., 2019).

Typically, the population size significantly exceeds the capacity for testing, with the number of available PCR machines and reagents an important bottleneck in this process.

There are two basic approaches to PCR testing in populations: (1) individual tests, where every single sample is examined, and (2) pooled tests, where larger sets of samples are mixed and tested en bloc. Pooled testing was pioneered by Dorfman in 1943 (Dorfman, December 1943) in the context of blood tests and led to a host of research activity, both on the lab side and on the theoretical side (Aldridge et al., 2019, Du and Hwang, December 1999, Du et al., 2006).

If the disease is rare in the population, pooled testing may be advisable. In this case it can assist in optimizing precious testing capacity since most individual results would be negative.

Pooling relies on the fact that the PCR is reasonably reliable under the combination of samples: the preprint (Yelin et al., 2020) suggests that a detection of SARS-CoV-2 in pools of size 32 and possibly 64 is feasible.

While a classic pooling strategy has the advantage that fewer PCR tests are required overall, there are disadvantages in terms of lab organisation and, more crucially, time: pooling only indicates whether a pool contains at least one infected individual.

If samples are tested in pools of size n and the incidence ρ is small (more precisely, if ρ⋅n is small), then roughly a ρn portion of the samples will be in pools that test positive and hence must undergo a second round of testing.

In other words, pooled testing with individual verification of positive pools is an adaptive testing strategy, and organising the lab for it is labour-, management-, and resource-intensive. It has several drawbacks, since it requires keeping multiple lab samples and re-running the time-intensive PCR process.
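To make the cost of the second round concrete, the following minimal sketch computes the expected number of tests per sample for classic two-stage pooling. It assumes independent infections with incidence ρ and an error-free PCR; the function name `dorfman_tests_per_sample` is illustrative and does not come from the source.

```python
def dorfman_tests_per_sample(rho: float, n: int) -> float:
    """Expected tests per sample for two-stage pooling with pool size n.

    Assumes independent infections with probability rho and a perfect test.
    """
    # One pooled test is shared by n samples.
    first_round = 1.0 / n
    # The pool is positive with probability 1 - (1 - rho)**n; in that case
    # every one of its n samples is retested, i.e. one extra test per sample.
    second_round = 1.0 - (1.0 - rho) ** n
    return first_round + second_round


if __name__ == "__main__":
    for pool_size in (8, 16, 32):
        print(pool_size, round(dorfman_tests_per_sample(0.01, pool_size), 3))
```

Under these assumptions the second-round term is approximately ρn when ρn is small, which is the portion of samples that has to be retested.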

The lab feedback loop makes the entire workflow more susceptible to delays (see Fig. 2). This may result in delays in individual results, a particular problem when the objective is to rapidly identify infected individuals, who may infect others while waiting for the test outcome.

Furthermore, since the number of samples undergoing a second round of testing is an unknown quantity, some reserve capacity is required to prevent further delays. This makes it more challenging for the lab to operate near its maximal capacity (see Fig. 3).

In the theoretical research on testing strategies the distinction is made between adaptive testing, for example when all samples in a positive pool undergo a second round of testing, and non-adaptive strategies, where all tests can be run simultaneously (Du and Hwang, December 1999).

Testing every sample individually can be considered as a trivial non-adaptive strategy, but there exist non-adaptive strategies which combine the benefit of pooling with the advantages of non-adaptive testing.

In this note, we propose a non-adaptive pooling strategy for rapid and large-scale screening for SARS-CoV-2 or other scenarios where detection time is critical.

This allows for significant streamlining of the testing process and reductions in detection time: firstly, because only one round of PCR is required, and secondly, because it eliminates actions in the lab workflow that require input from results determined in the lab, i.e. the testing infrastructure can be organized completely linearly, cf. Fig. 2 for an illustration.

The strategy will systematically overestimate the number of positives, but we can provide error bounds on the number of false positives which scale favourably with large numbers and will be small in realistic scenarios.

**Definition of the non-adaptive testing strategy: multipools**

Our testing strategy is as follows: every individual’s sample is broken up into k samples and distributed over k different pools of size n such that no two individuals share more than one pool.

An individual is declared positive if all the pools containing its sample test positive or, equivalently in our case, an individual is declared negative if its sample appears in at least one negative pool.

This decoding algorithm is also known as COMP (Combinatorial Orthogonal Matching Pursuit), an algorithm easily implementable in practice with low run-time and storage (Johnson et al., February 2019).
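The decoding rule just described can be sketched in a few lines. This is a minimal illustration that assumes error-free pool results; the names `pool_outcomes` and `comp_decode` are hypothetical helpers, not part of any library or of the source.

```python
def pool_outcomes(pools, infected):
    """Simulate perfect pool tests: a pool is positive iff it contains an infected sample."""
    infected = set(infected)
    return [any(sample in infected for sample in members) for members in pools]


def comp_decode(pools, pool_results, num_samples):
    """Return the set of sample indices declared positive by the COMP rule.

    pools: list of lists of sample indices (0 .. num_samples - 1)
    pool_results: list of booleans, True if the corresponding pool tested positive
    """
    declared_positive = set(range(num_samples))
    for members, is_positive in zip(pools, pool_results):
        if not is_positive:
            # Every sample in a negative pool is declared negative.
            declared_positive.difference_update(members)
    return declared_positive


if __name__ == "__main__":
    # 2 x 2 grid: two row pools and two column pools.
    grid_pools = [[0, 1], [2, 3], [0, 2], [1, 3]]
    print(comp_decode(grid_pools, pool_outcomes(grid_pools, {0}), 4))
    print(comp_decode(grid_pools, pool_outcomes(grid_pools, {0, 3}), 4))
```

In the second call all four pools are positive, so the rule cannot clear samples 1 and 2. This is the kind of false positive that the strategy accepts and bounds. Note that a sample contained in no pool would also be declared positive by this sketch.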

Let us make our definition more formal:

## Definition 1

**Multipools.** Let a *population* (X1,…,XN) of size *N*, a *pool size n*, and a *multiplicity k* be given, and assume that Nk is a multiple of *n*. We call a collection of subsets/pools of {X1,…,XN} an (N,n,k)*-multipool*, or briefly *multipool*, if all of the following three conditions hold:

- (M1) Every pool consists of exactly *n* elements.
- (M2) Every sample Xi is contained in exactly *k* pools.
- (M3) For any two different samples Xi, Xj there exists at most one pool which contains both Xi and Xj.
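The three conditions can be checked mechanically for any candidate collection of pools. The sketch below is one straightforward way to do so; the function name `is_multipool` and the representation of pools as lists of integer sample indices are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def is_multipool(pools, num_samples, n, k):
    """Check conditions (M1)-(M3) for pools given as lists of sample indices."""
    # (M1) Every pool consists of exactly n distinct elements.
    if any(len(pool) != n or len(set(pool)) != n for pool in pools):
        return False
    # (M2) Every sample is contained in exactly k pools.
    membership = Counter(sample for pool in pools for sample in pool)
    if any(membership[sample] != k for sample in range(num_samples)):
        return False
    # (M3) Any two different samples share at most one pool.
    shared = Counter(
        pair for pool in pools for pair in combinations(sorted(pool), 2)
    )
    return all(count <= 1 for count in shared.values())


if __name__ == "__main__":
    rows_and_columns = [[0, 1], [2, 3], [0, 2], [1, 3]]  # 2 x 2 grid
    print(is_multipool(rows_and_columns, num_samples=4, n=2, k=2))
```

The pairwise check in (M3) enumerates all pairs within each pool, which is fine for the pool sizes discussed here but grows quadratically in *n*.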

In the context of non-adaptive testing, designs as in Definition 1 are called (k−1)*-disjunct matrices* and it is known that such matrices correctly identify up to k−1 infected samples (Mazumdar, 2012). However, we will be interested in scenarios where the number of infected samples can exceed the multiplicity *k*. If N=n² and k=2, the construction of an (N,n,2)-multipool is quite straightforward, see Fig. 1: arrange the *N* samples in a square grid and then pool along every row and column, cf. (Sint et al., August 2016, Fargion, YYYY, Zuzarte et al., April 2014). However, as we shall see below, k=2 is in many realistic scenarios insufficient for the desired precision.

Some recent contributions (Fargion, YYYY, Mutesa et al., 2020) propose to arrange samples in a (3 or higher dimensional) hypercube and to pool along all hyperplanes.

This makes every individual sample appear in three or more pools, but it is not a multipool in the sense of Definition 1 above, since in dimension three and higher, two non-parallel hyperplanes will intersect in more than one point, in violation of Property (M3).

This creates unnecessary correlations between different pools and impairs performance.

If k=3, systems as in Definition 1 are also called Steiner triples and have been recently used in non-adaptive group testing for SARS-CoV-2 (Ghosh et al., 2020).

A flexible way to construct multipools of various multiplicities k is given by the Shifted Transversal Design (Thierry-Mieg, 2006, Erlich et al., 0353) which we explain in Section 4.

**Controlling the number of false positives**

We always assume that the incidence ρ of the disease is small compared to the inverse pool size 1/n. This is a reasonable requirement, also in classical pooling strategies (a ρn portion of samples will have to undergo second testing, thus a large ρn would attenuate the benefit of pooling).

Assuming perfect performance of the PCR, also under pooling (see Section 6 on how to deal with uncertainty here), multipooling will identify all infected individuals, since all their pools will be positive.

However, a sample might falsely be declared positive if all pools in which it is contained happen to contain an infected sample.

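The expected portion of false positives in a multipool strategy is

$$\mathbb{E}\left[\frac{\#\,\text{false positives}}{N}\right] = \mathbb{P}\big(\text{sample negative and all its } k \text{ pools positive}\big) = (1-\rho)\,\mathbb{P}\big(\text{all } k \text{ pools contain an infected poolmate}\big) = (1-\rho)\left(1-(1-\rho)^{n-1}\right)^{k}.$$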

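Here, the third identity crucially uses the property (M3), which guarantees independence between the poolmates in the different pools of a sample. By Bayes’ rule, the probability to actually be negative when tested positive by the multipool (i.e. the portion of subjects falsely declared positive among all subjects declared positive) is

$$\mathbb{P}\big(\text{negative} \mid \text{tested positive}\big) = \frac{(1-\rho)\left(1-(1-\rho)^{n-1}\right)^{k}}{\rho + (1-\rho)\left(1-(1-\rho)^{n-1}\right)^{k}}. \tag{6}$$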

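Let us calculate for which *k* the probability of a positive test result being a false positive does not exceed *∊*fp>0. Writing q := 1−(1−ρ)^(n−1) for the probability that a pool contains an infected poolmate, we have

$$\frac{(1-\rho)\,q^{k}}{\rho + (1-\rho)\,q^{k}} \leq \epsilon_{\mathrm{fp}} \;\Longleftrightarrow\; q^{k} \leq \frac{\epsilon_{\mathrm{fp}}\,\rho}{(1-\epsilon_{\mathrm{fp}})(1-\rho)} \;\Longleftrightarrow\; k \geq \frac{\ln\left(\dfrac{\epsilon_{\mathrm{fp}}\,\rho}{(1-\epsilon_{\mathrm{fp}})(1-\rho)}\right)}{\ln\left(1-(1-\rho)^{n-1}\right)}. \tag{10}$$

The last step reverses the inequality because ln q < 0.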

This provides a lower bound on the necessary multiplicity *k* in terms of the pool size *n*, the knowledge on the incidence ρ, and the acceptable portion *∊*fp of false positive results among all positives. Assuming *∊*fp<1 and ρ⩽1/2 (which are both reasonable assumptions, recall that ρn is small), the lower bound in (10) is monotone increasing in ρ. Hence, if the exact incidence is unknown but we have an upper bound on it, we can work with the largest/worst case ρ. Let us summarize these findings in the following theorem.

## Theorem 1

Let the incidence be at most ρ⩽1/2, and let 0<*∊*fp<1. If

$$k \geq \frac{\ln\left(\dfrac{\epsilon_{\mathrm{fp}}\,\rho}{(1-\epsilon_{\mathrm{fp}})(1-\rho)}\right)}{\ln\left(1-(1-\rho)^{n-1}\right)}, \tag{11}$$

then in any multipooling strategy with pool size *n* and multiplicity *k*, the probability of a positive test being a false positive does not exceed *∊*fp.
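The bound can be explored numerically. The sketch below assumes independent infections with incidence ρ (with 0<ρ<1) and an error-free PCR, so that a negative sample is falsely flagged with probability (1−ρ)(1−(1−ρ)^(n−1))^k. The function names are illustrative. For ρ=0.01 and n=8 it gives about 31.4% for k=2 and about 3.01% for k=3, the figures quoted in the text for the (64,8,2) and (64,8,3) cases.

```python
def false_positive_share(rho: float, n: int, k: int) -> float:
    """Share of false positives among all samples declared positive.

    Assumes 0 < rho < 1, independent infections and a perfect test.
    """
    # Probability that a given pool contains at least one infected poolmate.
    q = 1.0 - (1.0 - rho) ** (n - 1)
    # Probability that a sample is negative but all k of its pools are positive.
    falsely_flagged = (1.0 - rho) * q ** k
    # Bayes' rule: declared positives are true positives plus false positives.
    return falsely_flagged / (rho + falsely_flagged)


def min_multiplicity(rho: float, n: int, eps_fp: float) -> int:
    """Smallest k whose false positive share does not exceed eps_fp (0 < eps_fp < 1)."""
    k = 1
    while false_positive_share(rho, n, k) > eps_fp:
        k += 1
    return k


if __name__ == "__main__":
    print(round(false_positive_share(0.01, 8, 2), 4))
    print(round(false_positive_share(0.01, 8, 3), 4))
    print(min_multiplicity(0.01, 8, 0.05))
```

The search in `min_multiplicity` simply increments *k* instead of evaluating the logarithmic bound, which sidesteps rounding issues at the boundary.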

The number of tests required in a multipool strategy is Nk/n, an improvement compared to individual testing by a factor n/k. A key observation is that the lower bound on *k* in Inequality (11) scales favourably with large pool sizes *n*.

Indeed, recall that in an adaptive pooling strategy one wants on the one hand large pool sizes *n*, but on the other hand nρ should be small. It is therefore reasonable to have *n* proportional to the inverse of ρ, i.e. nρ≈C.

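Using that 1−ρ≈1 and 1−(1−ρ)^(n−1)≈(n−1)ρ≈nρ, the lower bound in (11) behaves, for C<1, approximately as

$$k \gtrsim \frac{\ln\left(\dfrac{\epsilon_{\mathrm{fp}}\,\rho}{1-\epsilon_{\mathrm{fp}}}\right)}{\ln(n\rho)} \approx \frac{\ln n + \ln\left(\dfrac{1-\epsilon_{\mathrm{fp}}}{C\,\epsilon_{\mathrm{fp}}}\right)}{\ln(1/C)},$$

that is, k grows only logarithmically with the pool size n. An analogous analysis shows that k also grows logarithmically with the inverse of ∊fp when the error probability ∊fp is sent to zero.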

## Generating multipools

The question for which combinations (N,n,k) a multipool exists seems to be, in general, a non-trivial combinatorial problem. We focus here on the case N=n² and on constructions based on the Shifted Transversal Design (Thierry-Mieg, 2006).

It is useful to imagine all *N* samples arranged in an n×n square and to label samples by their *x*- and *y*-coordinates, i.e. denoting the sample at position (i,j)∈ℕ₀² by Xij, where we define the sample in the lower left (south-west) corner to be X00. For multiplicity k=2, an (N,n,k)-multipool can be constructed by pooling along rows and columns, as in Fig. 1.

Unfortunately, for reasonable parameter choices, a multiplicity of k=2 turns out to lead to large false positive rates: for instance, arranging N=64 samples from a population with incidence ρ=0.01 in a square grid and pooling along all rows and columns (in our notation this is a (64,8,2)-multipool), Identity (6) implies that on average 31.4% of positive results will actually be false positives. To improve on this and pass to multiplicity k=3, one can pool along diagonals, where the diagonals are continued periodically, see Fig. 4. This works for any pool size n⩾2 and leads to the following theorem.

## Theorem 2

Let N=n² and n⩾2. Then there exists an (N,n,3)-multipool, obtained by pooling along rows, columns, and all periodically continued south-west-to-north-east diagonals.
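The construction in Theorem 2 can be written down and checked directly. The following sketch builds the row, column, and diagonal pools for an n×n grid and verifies Property (M3) by enumeration. The indexing of sample (i,j) as `i + n*j` and the function names are choices made here for illustration.

```python
from itertools import combinations


def rows_cols_diagonals(n: int):
    """Pools along rows, columns and periodic south-west-to-north-east diagonals."""

    def index(i: int, j: int) -> int:
        # Sample at grid position (i, j), with coordinates wrapped modulo n.
        return (i % n) + n * (j % n)

    rows = [[index(i, l) for i in range(n)] for l in range(n)]
    columns = [[index(l, j) for j in range(n)] for l in range(n)]
    diagonals = [[index(j, j + l) for j in range(n)] for l in range(n)]
    return rows + columns + diagonals


def shares_at_most_one_pool(pools) -> bool:
    """Property (M3): no pair of samples appears together in two pools."""
    seen_pairs = set()
    for pool in pools:
        for pair in combinations(sorted(pool), 2):
            if pair in seen_pairs:
                return False
            seen_pairs.add(pair)
    return True


if __name__ == "__main__":
    pools = rows_cols_diagonals(8)
    print(len(pools), shares_at_most_one_pool(pools))
```

For n=8 this yields 24 pools of size 8, so 64 samples are covered by 24 tests.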

In the situation of N=64 and n=8, this allows for the construction of a (64,8,3)-multipool in which, by (6), the probability of a positive result being erroneous is reduced to 3.01%. In such a scenario, one would test 64 individuals with 24 tests, a compression by a factor 0.375. A higher compression rate would require larger pool sizes *n*. Since the lower bound (11) on *k* in Theorem 1 is monotone increasing in *n*, this will in turn also require higher multiplicities *k* in order to achieve comparable false positive error probabilities. To pass to k=4, one might now be tempted to pool along the other (north-west-to-south-east) diagonals, but this is not going to yield a multipool in general, see for instance Fig. 5 where, in the case n=8, two diagonals intersect in more than one point, in violation of Property (M3) in Definition 1.

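This is due to the fact that n=8 has non-trivial divisors, i.e. it is not a prime number. South-west-to-north-east and north-west-to-south-east diagonals are of the form

$$D^{+}_{l} = \{X_{j,\,(j+l) \bmod n} : j = 0,\dots,n-1\} \quad\text{and}\quad D^{-}_{l} = \{X_{j,\,(-j+l) \bmod n} : j = 0,\dots,n-1\}, \qquad l \in \{0,\dots,n-1\},$$

where (mod n) means that we use arithmetic modulo *n*, that is, as soon as we exceed n−1, we start counting from 0 again. These diagonals are lines of slope +1 and −1, respectively, and the difference of these slopes is 2, which divides 8. Since intersections of two such lines, one of each slope and with offsets l₁ and l₂, are given by solutions to the equation

$$j + l_1 = -j + l_2 \pmod 8, \quad\text{that is,}\quad 2j = l_2 - l_1 \pmod 8, \tag{16}$$

there can be more than one *j* solving (16): indeed, if some j0∈{0,…,7} solves (16), then j’≔j0+4 (mod 8) is a solution as well, since 2j’=2j0 (mod 8).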
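The failure for n=8 is easy to reproduce by brute force. The sketch below represents a diagonal of slope +1 and one of slope −1 as sets of grid positions and reports the largest intersection over all offsets. The helper names are illustrative.

```python
def diagonal(n: int, offset: int) -> set:
    """Periodic south-west-to-north-east diagonal: positions (j, j + offset mod n)."""
    return {(j, (j + offset) % n) for j in range(n)}


def antidiagonal(n: int, offset: int) -> set:
    """Periodic north-west-to-south-east diagonal: positions (j, -j + offset mod n)."""
    return {(j, (-j + offset) % n) for j in range(n)}


def max_intersection(n: int) -> int:
    """Largest number of grid positions shared by a diagonal and an antidiagonal."""
    return max(
        len(diagonal(n, a) & antidiagonal(n, b))
        for a in range(n)
        for b in range(n)
    )


if __name__ == "__main__":
    for size in (5, 7, 8):
        print(size, max_intersection(size))
```

A result larger than 1 means that some pair of samples would share two pools, so adding the second family of diagonals would violate Property (M3).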

More generally, it is well-known that for m∈{1,…,n−1} and l∈{0,…,n−1}, the equation

$$m j = l \pmod n \tag{17}$$

has a unique solution j∈{0,…,n−1} if and only if the greatest common divisor of *m* and *n* is 1. Since this must hold for all m∈{1,…,n−1}, *n* must be a *prime number*. In this case, the integers modulo *n* form an algebraic structure called a *field*, in which every non-zero element has a well-defined multiplicative inverse.

For prime *n*, the unique solution of (17) is therefore given by j=m⁻¹l (mod n), where m⁻¹ denotes the multiplicative inverse of *m* in arithmetic modulo *n*.
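As a small worked illustration, the congruence m·j = l (mod n) can be solved with Python's built-in three-argument `pow`, which accepts a negative exponent for modular inverses from Python 3.8 onwards. The function name `solve_linear_congruence` is an illustrative choice.

```python
def solve_linear_congruence(m: int, l: int, n: int) -> int:
    """Return the unique j in {0, ..., n-1} with m * j = l (mod n).

    Requires gcd(m, n) == 1, which always holds for prime n and m not
    divisible by n; otherwise pow raises ValueError.
    """
    m_inverse = pow(m, -1, n)  # multiplicative inverse of m modulo n
    return (m_inverse * l) % n


if __name__ == "__main__":
    j = solve_linear_congruence(3, 5, 7)
    print(j, (3 * j) % 7)
```

For a non-invertible case such as m=2 and n=8, `pow(2, -1, 8)` raises a `ValueError`, mirroring the fact that no unique solution exists.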

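This suggests using a prime pool size *n* and pooling along lines of different slopes, that is, using pools of the form

$$P^{(m)}_{l} = \{X_{j,\,(mj+l) \bmod n} : j = 0,\dots,n-1\}, \qquad m, l \in \{0,\dots,n-1\}.$$

We can add one more type of pool by pooling along all vertical lines (their slope can be considered as “infinity”), which we denote by

$$P^{(\infty)}_{l} = \{X_{l,\,j} : j = 0,\dots,n-1\}, \qquad l \in \{0,\dots,n-1\}.$$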

Such ensembles of pools are sketched in Fig. 6 for the case n=5.

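This construction is also referred to as the *Shifted Transversal Design* in (Thierry-Mieg, 2006). We summarise our findings in the following theorem.

## Theorem 3

Let *n* be a prime number and let N=n². Then there exists an (N,n,k)-multipool for k=n+1, and consequently also for every smaller *k*. This multipool is given by pooling along all sloped lines, that is, by the pools

$$\{X_{j,\,(mj+l) \bmod n} : j = 0,\dots,n-1\} \ \text{ for } m, l \in \{0,\dots,n-1\}, \quad\text{together with}\quad \{X_{l,\,j} : j = 0,\dots,n-1\} \ \text{ for } l \in \{0,\dots,n-1\}.$$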

Fig. 6 contains an illustration of elements of such a multipool in the case n=5 with multiplicity k=6. Theorem 3 allows for multiplicities up to k=n+1, but in practice, one will want to work with much lower multiplicities *k* since a high multiplicity would require many tests and defeat the purpose of pooling.
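A construction along these lines can be sketched as follows. The code pools along lines of slopes 0, 1, …, and adds the vertical lines only when k=n+1. Which slopes to use for a smaller *k* is a choice made here for illustration, and the function names are hypothetical.

```python
from itertools import combinations
from math import isqrt


def sloped_line_multipool(n: int, k: int):
    """Pools of grid positions (x, y) along lines over the integers modulo a prime n.

    Uses slopes 0 .. min(k, n) - 1 and, if k == n + 1, the vertical lines as well.
    """
    if n < 2 or any(n % d == 0 for d in range(2, isqrt(n) + 1)):
        raise ValueError("n must be a prime number")
    if not 1 <= k <= n + 1:
        raise ValueError("k must satisfy 1 <= k <= n + 1")
    pools = []
    for slope in range(min(k, n)):
        for offset in range(n):
            pools.append([(j, (slope * j + offset) % n) for j in range(n)])
    if k == n + 1:
        for offset in range(n):
            pools.append([(offset, j) for j in range(n)])  # vertical line
    return pools


def max_shared_pools(pools) -> int:
    """Largest number of pools shared by any pair of samples (Property (M3) needs <= 1)."""
    shared = {}
    for pool in pools:
        for pair in combinations(sorted(pool), 2):
            shared[pair] = shared.get(pair, 0) + 1
    return max(shared.values())


if __name__ == "__main__":
    pools = sloped_line_multipool(5, 6)
    print(len(pools), max_shared_pools(pools))
```

For n=5 and k=6 this gives 30 pools of size 5 in which every pair of the 25 samples shares exactly one pool.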

From a practical perspective it seems reasonable to generate large pools by a sequence of unions of two equally diluted pools.

This leads to pool sizes which are a power of 2, certainly not a prime number (except for 2 itself).

One approach to accommodate this would be to use population sizes N=n², where *n* is a prime just below a power of 2, e.g. n=31, which is just below 32, or n=61, which is just below 64.

Then pools of size *n* can be mixed by adding a small number of negative dummy samples and proceeding as if *n* was a power of 2.
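The choice of prime and the number of dummy samples per pool can be computed with a short helper. This is a minimal sketch using trial division, which is adequate for the small pool sizes considered here. The names `largest_prime_at_most` and `padding_plan` are illustrative.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test, sufficient for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def largest_prime_at_most(m: int) -> int:
    """Largest prime p with p <= m."""
    for candidate in range(m, 1, -1):
        if is_prime(candidate):
            return candidate
    raise ValueError("there is no prime <= m")


def padding_plan(exponent: int):
    """Return (prime pool size n, negative dummy samples per pool) for target size 2**exponent."""
    target = 2 ** exponent
    n = largest_prime_at_most(target)
    return n, target - n


if __name__ == "__main__":
    for e in (3, 4, 5, 6):
        print(2 ** e, padding_plan(e))
```

For target sizes 32 and 64 this selects n=31 and n=61, matching the examples in the text, with 1 and 3 dummy samples per pool respectively.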

reference link : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7428746/

**More information:** Noam Shental et al. Efficient high-throughput SARS-CoV-2 testing to detect asymptomatic carriers, *Science Advances* (2020). DOI: 10.1126/sciadv.abc5961