TECHNIQUE OF THE TESTING OF PSEUDORANDOM SEQUENCES

The article systematizes scientific approaches to the statistical testing of sequences that are widely used in cryptographic information protection systems to produce key material and auxiliary information (random numbers, initialization vectors, etc.). Existing approaches to testing pseudorandom sequences, together with their advantages and disadvantages, are considered. It is shown that existing statistical packages are inadequate for sequences of length up to 100 bits. A promising direction of research is considered: statistical testing of sequences using n-dimensional statistics. The joint distributions of 2-chains and 3-chains of a fixed type in random (0, 1)-sequences allow statistical analysis of local sections of such a sequence. Examples, tables, and diagrams are given that can be used to test the randomness of the arrangement of zeros and ones in a bit section of length 16. The paper proposes a methodology for testing pseudorandom sequences and obtains an explicit form of the joint distribution of the numbers of 2-chains and 3-chains of various types in a random bit sequence of a given small length. Implementing this technique will yield an information system that makes it possible to analyze a pseudorandom sequence of small length and to choose a quality pseudorandom sequence for use in a particular subject area.


INTRODUCTION
Random sequences are used very widely, from the computer game industry to mathematical modeling and cryptology.
We list some areas of their use:
1. Modeling. Random numbers are used in computer simulation of physical phenomena. In addition, mathematical modeling uses random numbers as one of the tools of numerical analysis.
2. Cryptography and information security. Random numbers can be used to test the correctness or effectiveness of algorithms and programs. Many algorithms generate pseudo-random numbers to solve applied problems (for example, cryptographic encryption algorithms, the generation of unique identifiers, etc.).
3. Decision making in automated expert systems. The use of random numbers is part of decision-making strategies, for example, to ensure an impartial choice of examination paper by a student in an exam. Randomness is also used in the theory of matrix games.
4. Optimization of functional dependencies. Some mathematical optimization methods use stochastic techniques to search for extrema of functions.
5. Fun and games. Chance plays a significant role in games. In computer or board games, it helps to diversify the gameplay.
There are various approaches to the formal definition of the term "randomness" based on the concepts of computability and algorithmic complexity [1].
Software generators implement some algorithm and produce numbers that depend, although not in an obvious way, on the set of previous values, so the resulting numerical sequences are not truly random and are called pseudo-random sequences (PRS). At present, more than a thousand software PRS generators are known, differing in their algorithms and parameter values. The statistical properties of the number sequences they generate differ significantly.

REVIEW OF EXISTING SETS OF PRS TESTS AND THEIR APPLICATION
The "Diehard" selection of 14 tests by G. Marsaglia was the first battery for comprehensive testing of PRS generators. It is considered one of the most rigorous test suites; it is implemented in software and available on the Internet. However, the "Diehard" tests have several disadvantages.
- There is no detailed description of the tests or of methods for interpreting the results.
... x_{N-3} x_{N-2} x_{N-1} x_N; and also checking the frequency of various types of fives (poker test).
The NIST Statistical Test Suite (NIST STS, SP 800-22) of the National Institute of Standards and Technology [14] includes 15 tests and is focused on testing bit sequences used in cryptographic information protection tasks.
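To make the kind of test included in the NIST suite concrete, the following minimal Python sketch implements the frequency (monobit) test as it is described in SP 800-22. The function name and the string input format are assumptions made for this illustration; this is not the reference implementation of the suite.

```python
import math


def monobit_p_value(bits):
    """P-value of the frequency (monobit) test for a string of '0'/'1'."""
    if not bits or any(b not in "01" for b in bits):
        raise ValueError("expected a non-empty string of '0' and '1'")
    n = len(bits)
    # Map 0 -> -1 and 1 -> +1 and sum the mapped values.
    s_n = sum(1 if b == "1" else -1 for b in bits)
    # Normalised test statistic and its two-sided tail probability.
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


# Worked example from SP 800-22: six ones and four zeros in 10 bits.
print(round(monobit_p_value("1011010101"), 6))  # 0.527089
```

The example sequence has only 10 bits and serves purely as an arithmetic check; the suite itself recommends much longer inputs for this test.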
A typical application of tests (in particular, Diehard) is given, for example, in the report.
As the length of the tested PRS increases (beyond 100 thousand elements), many statistical tests begin to detect statistically significant patterns that were not found in samples of smaller size. For example, the Wilcoxon signed-rank test, which is quite powerful, rejects such well-known, high-quality generators as Blum-Blum-Shub (BBS), Shamir (RSA), George Marsaglia's "Marsaglia Multicarry" and "Xorshift", and the Mersenne Twister (MT19937), as well as a "truly random sequence" of 1.5-2 thousand elements.
Dimensionality reduction without loss of essential information is the goal of any approach designed to cope with high-dimensional time sequences. In this connection, [28][29] should be mentioned first of all: the approach described there makes it possible to evaluate the distance between any two sample series from a sequence of observations.
The results obtained in the paper [30] are applied to estimate the probability that a nonhomogeneous system of Boolean random linear equations is consistent.
A review of popular methods for testing bit sequences for randomness showed that, despite the large number of statistical tests, all of them give reliable results only when the sample size is sufficiently large. A correct answer about the randomness of a sequence cannot be obtained if its length is less than 100 elements. In this situation, we propose to test the sequence for randomness using two- and/or three-dimensional statistics.

PROBLEM STATEMENT
Before being used in critical applications of mathematical modeling and cryptology, a PRS should be tested. Unfortunately, many PRS tests have limitations:
- they check only one of the possible properties that characterize a PRS;
- they do not fix a family of alternatives;
- they have no theoretical estimates of their power;
- they do not give a correct assessment of the randomness of a sequence when the sample is small.
Problems of small and large samples are among the main problems that arise in the practical application of data analysis methods. We use the following classification of samples by size [31], based on the requirements of software implementations of the criteria: very small, from 5 to 12; small, from 13 to 40; medium, from 41 to 100; large, 101 and more. The minimum sample size is limited not so much by the algorithm for calculating the criterion as by the distribution of its statistic. Thus, for a number of algorithms, when the sample is too small the normal approximation of the distribution of the test statistic becomes questionable.
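The size classes quoted above from [31] can be written as a small helper. The sketch below is illustrative only: the function name is hypothetical, and the label returned for fewer than 5 elements is an assumption, since the classification in the text starts at 5.

```python
def classify_sample_size(n):
    """Return the size class of a sample of n elements (classes from [31])."""
    if n < 5:
        # Assumption: the quoted classification does not cover n < 5.
        return "below classification range"
    if n <= 12:
        return "very small"
    if n <= 40:
        return "small"
    if n <= 100:
        return "medium"  # called "average" in the source classification
    return "large"


print(classify_sample_size(16))  # small
```

A 16-bit section, the length used in the tables of this paper, therefore falls into the "small" class.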
In the course of the research, local sections of the bit sequence were analyzed in order to detect dependencies in the arrangement of its elements, using the exact distributions of the corresponding statistics. An explicit form of the joint distribution of the numbers of 2-chains and the numbers of 3-chains of various types in a random sequence was obtained. Compared with the use of one-dimensional statistics, this joint distribution allows a more accurate analysis of the randomness of a bit sequence of small length [32][33][34].
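For short sequences, a joint distribution of this kind can be checked numerically by enumerating all 2^n equiprobable bit strings. The sketch below is not the explicit formula obtained in the paper; it is a brute-force illustration under stated assumptions: a "chain" is taken to be an overlapping occurrence of a fixed pattern, and the patterns "11" and "111" are chosen only as examples. All function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def count_chain(bits, pattern):
    """Count overlapping occurrences of `pattern` in the string `bits`."""
    m = len(pattern)
    return sum(1 for i in range(len(bits) - m + 1) if bits[i:i + m] == pattern)


def joint_distribution(n, pattern_a="11", pattern_b="111"):
    """Exact joint distribution of two chain counts over all n-bit strings.

    Assumption: every one of the 2**n strings is equally likely.
    Returns a dict {(count_a, count_b): Fraction probability}.
    """
    counts = Counter()
    for seq in product("01", repeat=n):
        bits = "".join(seq)
        key = (count_chain(bits, pattern_a), count_chain(bits, pattern_b))
        counts[key] += 1
    total = 2 ** n
    return {key: Fraction(c, total) for key, c in counts.items()}


# Small demonstration with n = 4 (16 strings); n = 16 needs 65536 strings
# and is still feasible by enumeration.
dist = joint_distribution(4)
print(dist[(0, 0)])  # 1/2: half of the 4-bit strings contain no "11"
```

Exact fractions are used instead of floating-point numbers so that the probabilities sum to exactly 1 and can be compared with tabulated values without rounding concerns.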

SCHEME FOR VERIFICATION OF STATISTICAL TESTS OF RANDOMNESS SEQUENCES
If users of mathematical and statistical algorithms and their software implementations are interested in high-quality research, the following steps should be performed before conducting any research.
2. Develop a clear understanding of measurement scales. The measurement scale of the original data determines which methods can be used to process them; the names of the software modules and their descriptions help in deciding which method to use. Before applying a method, one should become familiar with its prerequisites and constraints and plan the necessary sample size based on the power of the criterion.
3. Start collecting data. The processing method already selected dictates the form in which the experimental results should be presented, so that the data can be adequately processed by the chosen method.
4. Mathematical and statistical processing is the penultimate, technical stage. Its content should be completely clear after the 2nd stage has been carried out, while no significant costs have yet been incurred for the experimental study. This stage has no relation to the subject area.
5. The last stage is an objective, scientifically justified conclusion based on the results of the study, taking into account the subject area, together with recommendations and a forecast.
Using diagram notation methods, we construct a context diagram (IDEF0) for the random sequence testing system (Fig. 2).
Mathematical and statistical analysis of sequences, as a rule, takes place in two stages. The process of sequence analysis is depicted schematically in Fig. 3.

Figure 3 - Scheme of statistical analysis of sequences
Description of the main steps:
1. The first stage is preparatory. It is the most labor-intensive step, and the bulk of the calculations is performed here. The confidence probability is needed to calculate a number of sample statistics; unlike a number of other parameters, it is not calculated from the sample but is set by the user of the program. It is selected from the following standard series:
- the zero threshold, 0.90, applies to work with low responsibility, at the first acquaintance with a phenomenon;
- the first threshold, 0.95, is applied in most studies (e.g., biological research);
- the second threshold, 0.99, is used for work with higher responsibility (e.g., medical research);
- the third threshold, 0.999, is used for work with the highest responsibility (e.g., research on the efficacy of medicines).
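The standard series of confidence probabilities can be kept in a small lookup table from which the corresponding significance level, alpha = 1 - confidence probability, is derived. The following sketch is only an illustration: the dictionary keys, the function names and the decision rule (keep the randomness hypothesis when the p-value is at least alpha) are assumptions, not part of the described system.

```python
from decimal import Decimal

# Confidence probabilities listed in the text, keyed by threshold name.
CONFIDENCE_THRESHOLDS = {
    "zero": Decimal("0.90"),
    "first": Decimal("0.95"),
    "second": Decimal("0.99"),
    "third": Decimal("0.999"),
}


def significance_level(threshold):
    """Return alpha = 1 - confidence probability for a named threshold."""
    return Decimal(1) - CONFIDENCE_THRESHOLDS[threshold]


def is_consistent_with_randomness(p_value, threshold="first"):
    """Hypothetical rule: keep the randomness hypothesis if p >= alpha."""
    return Decimal(str(p_value)) >= significance_level(threshold)


print(significance_level("third"))                    # 0.001
print(is_consistent_with_randomness(0.03, "first"))   # False (0.03 < 0.05)
print(is_consistent_with_randomness(0.03, "second"))  # True (0.03 >= 0.01)
```

Decimal arithmetic is used so that 1 - 0.999 is exactly 0.001 rather than a binary floating-point approximation.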

EXPERIMENT
As a result of applying this technique for testing pseudo-random sequences, tables were constructed from which one can obtain the probability of a given arrangement of zeros and ones in a sequence. As practice shows, the use of ready-made tables for analyzing the randomness of a sequence gives an answer much faster than the classical testing method.
Consider an example of tables for a bit sequence of small length n, n = 16. Table 1 and Fig. 4 show the use of relation (2) for a small sample of length n = 16 and some values of k1 and k2. Table 2 and Fig. 5 show the use of relation (3) for n = 16 and some values of k1 and k2. Table 2 is formed of columns whose contents are similar to those of the Table 1 columns. Table 3 and Fig. 6 show the use of relation (4) for n = 16 and some values of k1 and k2. Table 3 is likewise formed of columns whose contents are similar to those of the Table 1 columns. Table 4 shows the use of relation (5) for n = 16 and some values of k1, k2 and k3.

ILLUSTRATION OF THE USE OF EQUALITY (5)
The first, second and third columns of Table 4 contain all values of k1, k2 and k3 for which the probability of the joint event of relation (5), in which the three chain statistics (the first being that of chains of type 11*) take the values k1, k2 and k3, is at least 0.01. The contents of the fourth and fifth columns are similar to the contents of the third and fourth columns of Table 1.

RESULTS AND DISCUSSION
By applying this technique for testing pseudo-random sequences with two-dimensional statistics (relations (2)-(4)), one can build a bubble diagram from which the probability of a given arrangement of zeros and ones in a sequence can be read off.
Consider examples of bubble diagrams for a bit sequence of small length n, n = 16. Fig. 4 gives a bubble chart in which the first parameter (horizontal axis) is the value k1, the second parameter (vertical axis) is the value k2, and the third parameter (bubble size) is the probability, in percent, of the joint event of relation (2), in which the first statistic (that of chains of type 11*) equals k1 and the second equals k2. Analysis of Fig. 4 shows that for sequences of small and medium length (from 13 to 100 elements), one-dimensional statistics do not always give the correct result. For example, if we consider only the parameter k1 = 4, we would conclude with a high degree of probability that a sequence with this characteristic is random. However, if we consider k1 = 4 together with k2 = 0, it can be argued that the sequence is non-random, since, as shown in Fig. 4, the probability of this joint event is only 1.30%. This also demonstrates the inadequacy of one-dimensional statistics for the analysis of short and medium bit sequences.
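The contrast between a one-dimensional (marginal) and a two-dimensional (joint) assessment can be sketched in a few lines of Python. The pair of statistics used here, the numbers of overlapping "11" and "00" pairs, is an assumption chosen for illustration and is not necessarily the pair used in relation (2); the function names and the default threshold of 0.01 are likewise hypothetical. The distribution is obtained by enumerating all equiprobable bit strings rather than from the explicit formulas of the paper.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def pair_count(bits, pair):
    """Number of overlapping occurrences of a 2-character pattern."""
    return sum(1 for i in range(len(bits) - 1) if bits[i:i + 2] == pair)


def statistics(bits):
    # Assumed statistics: k1 = count of "11", k2 = count of "00".
    return pair_count(bits, "11"), pair_count(bits, "00")


def joint_pmf(n):
    """Exact joint distribution of (k1, k2) over all 2**n bit strings."""
    counts = Counter(statistics("".join(s)) for s in product("01", repeat=n))
    return {k: Fraction(c, 2 ** n) for k, c in counts.items()}


def marginal_k1(pmf, k1):
    """One-dimensional probability P(k1), summed over all values of k2."""
    return sum((p for (a, _), p in pmf.items() if a == k1), Fraction(0))


def assess(bits, pmf, alpha=Fraction(1, 100)):
    """Compare the marginal and joint probabilities of the observed (k1, k2)."""
    k = statistics(bits)
    joint = pmf.get(k, Fraction(0))
    return {
        "k": k,
        "marginal_k1": marginal_k1(pmf, k[0]),
        "joint": joint,
        "suspicious": joint < alpha,  # rare joint event despite a common k1
    }


pmf = joint_pmf(4)  # tiny demonstration; n = 16 is also feasible
print(assess("0011", pmf))
```

For the 4-bit demonstration the value k1 = 1 has marginal probability 5/16, whereas the joint event (k1, k2) = (1, 1) has probability 1/8, which shows in miniature how a value that looks unremarkable on its own can belong to a less likely joint outcome.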

GRAPHIC ILLUSTRATION OF THE USE OF EQUALITY (2)
An approach to testing using n-dimensional statistics allows us to rely on a deeper justification of the randomness of generated sequences.

GRAPHIC ILLUSTRATION OF THE USE OF EQUALITY (3)
Fig. 5 gives a bubble chart in which the first parameter (horizontal axis) is the value k1, the second parameter (vertical axis) is the value k2, and the third parameter (bubble size) is the probability, expressed as a percentage, of the joint event of relation (3). Fig. 6 shows the use of relation (4) for a small sample of length n = 16 and some values of k1 and k2. It is a bubble chart with the same axes, in which the bubble size is the probability, expressed as a percentage, of the joint event of relation (4).

GRAPHIC ILLUSTRATION OF THE USE OF EQUALITY (4)
In this paper, the exact joint distributions of some statistics of (0, 1)-sequences of length n, 1 < n < ∞, are given. For a bit sequence of small length n, n = 16, tables containing the numerical values of the corresponding distributions are given. These tables, as well as the proposed graphic representations, can be used to test the hypothesis of the randomness of the arrangement of zeros and ones.

THE RESULTS OF THE COMPARISON OF THE NIST STATISTICAL TEST SUITE AND THE TEST OF PRS OF SMALL LENGTH USING MULTIDIMENSIONAL STATISTICS
Consider the well-known examples given in [14]. Let us analyze the sequences presented there with the corresponding tests, where:
• P is the probability of sequence randomness according to the selected criterion from the first column,
• P1 is the probability obtained using relation (2),
• P2 is the probability obtained using relation (3),
• P3 is the probability obtained using relation (4).
As can be seen from the table, the use of two-dimensional statistics gives a more accurate result for short sequences. Note also that, according to [14], the recommended minimum sequence length n is greater than 100 bits.

CONCLUSION
An approach to testing based on multidimensional statistics provides a deeper justification of the randomness of the bit sequences being analyzed. This area is promising for scientific research. Thus, the paper proposes a new technique for PRS testing and considers several criteria for testing bit sequences of small length which, in comparison with one-dimensional statistics, give a more accurate result. An explicit form of the joint distribution of the numbers of 2-chains and the numbers of 3-chains of various types in a random bit sequence of a given small length has been obtained.
To implement the proposed approach, the author is developing a PRS testing software package that will include tests based on multidimensional statistics, which are well suited to testing PRS of short length. The package is based on software for analyzing PRS developed in C++ and Python, with the user-facing part built on the Microsoft Excel spreadsheet processor. Microsoft Excel was chosen because of its wide user base, its large number of built-in mathematical and statistical functions, the possibility of programming in VBA, the transparency of implementing and testing programs, and the absence of any need to install additional software or train users. Currently, more than 20 PRS tests have been implemented, and the test database is being updated.
An analysis of the effectiveness of pseudorandom sequence generators is an urgent issue of cybersecurity in the use of more advanced methods of encryption and information security. The available techniques show low flexibility and versatility in the means of finding hidden patterns in the data. To solve this problem, it is suggested to use algorithms based on multidimensional statistics. These algorithms combine all the advantages of statistical methods and are the only alternative for the analysis of sequences of small and medium length.
As a result of the implementation of this technique, an information system will be created that allows analyzing the PRS of a small length and choosing a quality PRS for use in a particular subject area.