Geostatistics has gained popularity as a quantitative tool to generate multiple geological models, or realizations, that honor a given statistical structure and various types of measured and interpreted data. Two key approaches have traditionally been followed: a pixel based approach such as sequential indicator simulation based on correlation variograms, and an object based (Boolean) approach in which large objects representing geobodies are inserted into the reservoir grid. Both of these approaches, although widely used in the practice have limitations for reproducing geologically realistic volumes, especially when trying to incorporate a user-defined, conceptual geological model.
Multiple-point statistics (MPS) simulation is one of the most recent spatial modeling techniques because of its ability to reproduce complex geological patterns (e.g. sinuous channels) that cannot be modeled by two-point statistics moments (i.e. variograms). Instead of using mere 2-point statistics from a variogram model , as seen in algorithms such as sequential indicator simulation, the MPS approach borrows multiple-point statistics from a conceptual geological model called training image.

Initial algorithms generate geologically realistic realizations by using these training images to obtain conditional probabilities needed in a stochastic simulation framework. More recent pattern-based geostatistical algorithms attempt to improve the accuracy of the training image pattern reproduction. In these approaches, the training image is used to construct a pattern database. Consequently, sequential simulation will be carried out by selecting a pattern from the database and pasting it onto the simulation grid. One of the shortcomings of the present algorithms is the lack of a unifying framework for classifying and modeling the patterns from the training image. In this research an entirely different approach will be taken towards geostatistical modeling. A novel, principled and unified technique for pattern analysis and generation that ensures computational efficiency and enables a straightforward incorporation of domain knowledge is obtained. The outlook of the workflow is shown below:

In the developed methodology, patterns scanned from the training image are represented as points in a Cartesian space using multi-dimensional scaling. The idea behind this mapping is to use distance functions as a tool for analyzing variability between all the patterns in a training image. These distance functions can be tailored to the application at hand. Next, by significantly reducing the dimensionality of the problem and using kernel space mapping, an improved pattern classification algorithm is obtained. An improved pattern continuity and data-conditioning capability is observed in the generated realizations for both continuous and categorical variables. I show how the proposed methodology is much less sensitive to the user-provided parameters, and at the same time has the potential to reduce computational time significantly.

Stochastic Simulation of Patterns Using Distance-based Pattern Modeling

The details of the algorithm and many specific algorithmic issues will not be discussed here. However, two different results are shown to illustrate the effectiveness of the proposed methodology in multiple-point statistics. They all show better pattern reproductivity within the results. The patterns and the features that was inherently provided within a training image are almost perfectly reproduced. The connectivity of the channels or sinusoidal features within the generated realization are well above expectation. This rise to the new era of possibilities in geomodeling and reservoir characterization within petroleum industry.

MPS algorithms needs to be programmed so that companies and interested engineers could be able to easily use them. Here, a C++ software implementation of the MPS algorithm, innovated during the PhD, is done. The name of the algorithm is called DisPAT. This algorithm is a pattern-based multiple-point geostatistical routine, where not only is superior to its predecessors in terms of pattern reproductivity and more geologically realistic results, but also, is orders of magnitude faster than them. The goals of DisPAT algorithm is summarized below:

The Stanford Geostatistical Modeling Software (SGeMS) is an open-source computer package for solving problems involving spatially related variables. It provides geostatistics practitioners with a user-friendly interface, an interactive 3-D visualization, and a wide selection of algorithms. Users can perform complex tasks using the embedded Python scripting language, and new algorithms can be developed using the SGeMS plug-in mechanism. SGeMS is the first software to provide algorithms for multiple-point statistics. The SGeMS package provides a versatile toolkit for Earth Sciences graduates and researchers, as well as practitioners of environmental, mining and petroleum engineering.
DisPAT algorithm has been coded in C++ for SGeMS software. The interface shown in front shows the inputs for the algorithm. This method, has the minimal amount of input parameters, but on the other hand, has provided the capability of manual change in the inputs. This will ensure that, first of all, the experienced user would feel comfortable by changing the inputs, and the non-so-technical engineer would easily understand and apply the geostatistical routine. This program also has data conditioning of well data (hard) and seismic data (soft) within it. This ensures that the generated models would honor all the available data at hand. Therefore, a more realistic models will be genarated and better and more accurate uncertainty assessment can be made.

Better and more realistic models should not require an increase in user-set parameters (a disadvantage of filtersim) and a much simpler method should work just as well. I believe the same to be true for pattern analysis. Therefore, in this DisPAT algorithm, a new mathematical framework for modeling patterns of natural images, representing subsurface heterogeneities, was obtained. Distance-based methods for mapping the space of patterns, through the mathematical theory of multi-dimensional scaling, was introduced. In distance-based modeling, many of the tasks usually performed in multiple-point geostatistical algorithms can be carried out in a surprisingly simple yet powerful way.

A speed comparison with two different training images is provided below. The first one is simple 2D training image and the second one is a more complex 3D training image. All different MPS algorithms have been tested on the same machine with 2.66 GHz CPU and 2 GB of RAM. These clearly show the superiority of DisPAT over other MPS routines.

Better and more realistic models should not require an increase in user-set parameters (a disadvantage of filtersim) and a much simpler method should work just as well. I believe the same to be true for pattern analysis. Therefore, in this DisPAT algorithm, a new mathematical framework for modeling patterns of natural images, representing subsurface heterogeneities, was obtained. Distance-based methods for mapping the space of patterns, through the mathematical theory of multi-dimensional scaling, was introduced. In distance-based modeling, many of the tasks usually performed in multiple-point geostatistical algorithms can be carried out in a surprisingly simple yet powerful way.

DisPAT: Distance-based Pattern Simulator

Feature Identification and Template Size Selection in Training Images

In order to generate realizations in any geostatistical simulation, some form of statistical prior model is required. For example, in sequential Gaussian simulation, a mean and a variogram alone are sufficient to generate realizations. Similarly, in multiple-point geostatistics, one acquires the statistics from a conceptual training image. However in two-point statistics, the variogram range is implicitly taken into account by the formulation. But in multiple-point modeling, the relevant statistics are implicitly defined through the choice of the size of the template. In probabilistic approaches, such as snesim, this corresponds to the search neighborhood for the calculation of probabilities, and in pattern-based approaches, such as simpat and filtersim, it corresponds to the size of the patterns that are used to generate realizations. So far, selection of an optimal template size, associated with the training image, has been made with cumbersome trial-and-errors by analyzing the reproduction of patterns and large-scale structures in the final simulated realization. In the next section, an algorithm that automatically selects the optimal template size is provided.
Some criterion of optimality needs to be defined to select a template. The template size should depend on the small-scale structures as well as on the training image resolution and the actual features themselves. If the template size is chosen very small with respect to the actual features (i.e. the actual features are insufficient represented within the template), the statistics will differ strongly for each template window. Therefore, the template size should be chosen as small as possible to improve small-scale structures but it should be chosen as large as necessary to represent the actual features occurring in a training image. The algorithm is shown here and a simple explanation is provided afterwards.

A general concept of entropy is used to find an optimal template size. Entropy is a statistical measure of randomness that can be used to characterize the texture of the training image.
High entropy relates to more randomness.Generally speaking and despite some exception, as the template size grows, the entropy of the patterns of a training image should increase. The reason behind that lies in the Shannon’s source coding theorem. In information theory, Shannon’s entropy measures the information contained in a message as opposed to the portion of the message that is determined (or predictable). This concept, when applied to patterns, can determine the minimum information required in order to reliably represent the pattern as encoded binary digits. Therefore, by increasing the template size in a stationary training image, two different behaviors will be observed. In the first stage, the entropy will sharply increase since the average number of bits of information needed to encode or compress the patterns is increasing. At a later stage, where the template size has increased above the optimal window that represents the stationary features of the training image, Shannon’s entropy would increase at a much slower pace; since the information needed for encoding a large pattern that has some stationarity features is approximately the stationary feature of the training image itself.
Two different examples is shown in here on how the methodolgy works. There are proofs on the exact correctness of the results, which is omitted from here, but can be provided upon request.

Modeling Upscaling Errors of Ensembles Using Distance-Based Methods

In order to now learn the model, we also incorporate the permeability fields of both fine-scale and coarse-scale models. However, we will only look at the principal components. For all the models, we will apply PCA on the permeability fields. We assume that this capture significant variability between them. By differencing the important PCAs of fine and coatse models, one is able to define the error in upscaling with respect to the feature seen within the differences, i.e. the high-permeability streaks existing within one part of the model could be causing the errors.
Therefore, we first try to optimize the weights associated to each principal component in such a way that the upscaled and fine scale MDS space representations would look as close to each other as possible, deformably. Next, we map both the upscale responses + PCA component (with their associated weights), and the fine scale responses of the selected models to MDS space, and then, to remove non-linearities, map them to the higher-dimensional space using Kernel transformation. This will ensure a linear relationship between the models in both spaces. Here, we would then be able to learn the mapping using linear algorithms such as linear regression. Applying back the regression on all other models that fine-scale simulation was not run on, we would obtain their location within high-dimensional kernel space of fine-scale runs. Finally applying a pre-image problem, we would obtain the original fine-scale results and flow responses. This ensures that we the minimal fine-scale runs we can obtain and model the upscaling errors, and use that to fine the fine-scale runs of the ensemble of the models. Figure below shows the original upscaled MDS space and the fine-scale runs on the same plot, but with different regression methods. The results are satisfactory and shows improvment over previous methodologies.

Upscaling techniques are widely used to render flow simulation possible at an affordable cost. There exist a variety of upscaling techniques. Upscaling techniques in general can be classified according to the size of the region on which the flow simulation is performed in order to determine the coarse-scale properties. Purely local methods solve local fine-scale flow problems only on the target coarse-grid block, and local boundary conditions need to be assumed. Extended local methods include some neighboring fine-scale cells to reduce the impact of local boundary conditions. Global upscaling considers global fine-scale flow simulation on the entire global domain and computes the upscaled quantities from the global fine-scale solutions. It eliminates the need of local boundary conditions, but requires flow simulation on the highresolution grid. In between, local-global upscaling procedures incorporates global coarse-scale flow into local upscaling calculations, providing more accurate coarsescale models than local upscaling methods while reducing the computational cost in global upscaling.
Typically, flow simulations on coarse-scale models contain some error, and may be very different from flow simulations on the high resolution models from which the coarse models are derived. In this work, I propose to evaluate the potential error in the upscaling procedure. The ultimate goal is to approximate the fine-scale responses by running coarsescale simulations only, and a few fine-scale models, and accounting for the error due to upscaling.
Our approach is based upon the intuitive assumption that models which have similar flow responses at the coarse-scale have similar errors in the upscaling. In other words, we assume that the upscaling error is not random, but has structure which is correlated with the coarse-grid flow response. As a consequence, if we group models with similar coarse-grid responses, we can estimate the error from a prototype model in that group (e.g. the cluster center), and associate the same error to other models in the group.

Figure above shows the MDS space of all fine-scale and coarse-scale models. The red ones show the fine-scale and the yellow coarse-scale. As can be seen there is a systematic error in upscaling. The purpose here is to leanr this error model. However, unlike other algorithms, we also learn exactly where upscaling error is emenating from. In other words, depending on the permeability of the field, or different features within the reservoir model, upscaling technique causes systematic errors within. In order to model this, we use distance-based methods over coarse-scale runs, and then select the most representative set of models that can capture the differences within the responses as widely as possible. This will ensure that the models selected are as diverse as possible. Next, we run fine-scale flow simulation on those mosels and obtain their results. Here is a representation of the selected models responses with respect to the coarse-scale results.


Permeability Anisotropy from Flow Simulation (Well & Interpretation tests)

This project was to obtain the permeability field of the Cerereide field in Montpollier, France. It was an aquifer and the whole process was two-fold: to obtain the permeability field for furthur studies, and to find the systematic methodology on how to accomplish this.
we fist defined the geometry of the aquifer and after mapping of the outcrops, the continuity and thickness of the layer has been estimated. Generally, the transmissivities need to be estimated. Data has come from pumping tests on 12 wells and interference results on 14 wells. Next, the pumping data has been interpreted by Jacob-Theis method, and also, by using PIE software. They showed a permeability of about 950 Darcy. Consequently, in order to understand the results of this method a research has been conducted and it showed that late-time Jacob method slope shows geometric mean weighted transmissivity of the areas where the cone of depression is situated. However, for saving time in setting up a model, the head values in its mesh form should be estimated from local measurements. Hence, counter maps where drawn manually, by hand.
gOcad was used with the purpose of geomodeling the aquifer. Choice of parameters here concerned the size of the mesh. They are related to the required precision in the solution of a discretized partial differential equation. Consequently Eclipse software is used for flow simulations and the choice of time steps in a transient state followed roughly the same rules.
The most important phase in the construction of the model is calibration. Most of the time, the procedure used is called “trial and error”. One tries to calculate the head for a period with available observed heads. Finally one compares the two. If the comparison is not favorable, the parameters are modified in the appropriate direction in order to improve the fitting and decrease the differences. The only parameter that we changed was transmissivity (permeability). We tried to simulate the steady state condition because it helped to drop out one of the unknown parameters, storativity. Next we figured out that skin is affecting the head values. Therefore, keeping that in mind we continued by automatic fitting of the model, which is called “the inverse problem. For inverse modeling we used “Differential System Method” and wrote the FORTRAN code for calibrating all 12 well head data using “Best Couple” approach. After preparing the inputs by “Discrete Smooth Interpolation” method on counter maps, we did two inverse modeling steps to find the permeability in our field.
The permeabilities showed an increasing trend towards west side of the field, and large scale anisotropy is related to the depositional azimuth. In fact, sedimentation of the reservoir shows an orientation quite similar to present day, from N-NE to S-SW. This matched the orientation seen in our final predicted model. Also, the existence of small barriers in the east side of the field was noticeable, which explained the irregularities in head value data observed during interference tests.


Computer Programming of a Water-Flooding Simulator with Visual Basic

In this project a water-flooding simulator was coded using visual basic. This Bachelor thesis was oriented towards calculating the differential equations governing thw fuild flow and coding them in visual basic. the results were compared with commercial simulators and an excellent match were obtained, showing accuracy and speed of the coded algorithm.

Journal Paper:
Honarkhah, M and Caers, J, 2010, Stochastic Simulation of Patterns Using Distance-Based Pattern Modeling, Mathematical Geosciences, 42: 487 - 517 (best paper award IAMG 09)

MPS Algorithm:
DispAT algorithm will be available online shortly. The algorithm comes as a plugin for SGeMS software. Improved functionality, parameter-free learning, extremely fast multiple-point geostatistics simulations are some of the advantages of the DisPAT plugin.