Reference

to be completed

For more information, see the glossary.

Analytical attacks

chi2 attack

sealwatch.chi2.attack(spatial)

Measures the “distance” between the observed histogram and a typical histogram after LSB replacement.

Full-capacity LSB replacement (embedding rate = 1) evens out each pair of neighboring histogram bins (2k, 2k+1), so that both bins approach their average.

Parameters:

spatial (np.ndarray) – image pixels, of arbitrary shape

Returns:

distance and p-value. The distance is the chi2 test statistic between the observed histogram and the stego model. A small distance means that the image matches the model (e.g., because it was embedded with LSB replacement). The p-value turns the distance into a probability. A p-value close to 0 indicates that the histogram does not match the LSB replacement model. A p-value close to 1 indicates that the image likely contains LSBR steganography.

Return type:

Tuple[float, float]
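The distance described above can be sketched in a few lines of NumPy. This is a minimal illustration of the pairs-of-values idea, not sealwatch's implementation: the function name chi2_statistic is hypothetical, 8-bit pixels are assumed, and the p-value is left out because it requires a chi-square distribution function.

```python
import numpy as np

def chi2_statistic(spatial):
    """Chi-square distance between the even histogram bins and the
    pair averages expected after full LSB replacement (illustrative only)."""
    values = np.asarray(spatial).astype(np.int64).ravel()
    hist = np.bincount(values, minlength=256).astype(float)
    even, odd = hist[0::2], hist[1::2]
    expected = (even + odd) / 2      # stego model: both bins of a pair are equal
    used = expected > 0              # skip empty pairs to avoid division by zero
    return float(np.sum((even[used] - expected[used]) ** 2 / expected[used]))

rng = np.random.default_rng(0)
base = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
all_even = base & 0xFE               # every LSB cleared: far from the stego model
random_lsb = rng.integers(0, 2, size=base.shape, dtype=np.uint8)
lsbr = all_even | random_lsb         # LSBs replaced by random bits

print(chi2_statistic(all_even), chi2_statistic(lsbr))
```

The image with randomized LSBs yields a much smaller distance than the image whose LSBs were all cleared, which is the behavior the return value description refers to.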

SPA

sealwatch.spa.attack(x0)

Run sample-pair analysis.

Parameters:
Returns:

embedding rate estimate

Return type:

float

Example:

>>> import numpy as np
>>> from PIL import Image
>>> import sealwatch as sw
>>> spatial = np.array(Image.open('suspicious.png'))
>>> alpha_hat = sw.spa.attack(spatial)
>>> assert alpha_hat == 0

WS

sealwatch.ws.attack(x1, pixel_predictor='KB', correct_bias=False, weighted=True)

Runs weighted stego-image (WS) steganalysis on a given image.

The goal of WS steganalysis is to estimate the embedding rate of uniform LSB replacement embedding.

Parameters:
Returns:

change rate estimate

Return type:

float
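To make the estimator more concrete, here is a minimal sketch of the unweighted WS idea. It is an assumption-laden illustration, not the sealwatch code: the names neighbor_mean and ws_change_rate are hypothetical, the predictor is a plain 4-neighbor mean rather than the 'KB' predictor, and no weighting or bias correction is applied.

```python
import numpy as np

def neighbor_mean(x):
    """Predict each pixel as the mean of its 4 neighbors (edges replicated)."""
    p = np.pad(np.asarray(x, dtype=float), 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4

def ws_change_rate(stego, predicted_cover):
    """Unweighted WS estimate: mean of (s - s_flipped) * (s - predicted cover)."""
    s = np.asarray(stego, dtype=float)
    s_flipped = s + 1 - 2 * (s % 2)   # LSB flipped: even -> +1, odd -> -1
    return float(np.mean((s - s_flipped) * (s - predicted_cover)))

rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
changed = rng.random(cover.shape) < 0.2          # flip the LSB of about 20% of pixels
stego = np.where(changed, cover ^ 1, cover)

# With a perfect predictor (the true cover), the estimate equals the change rate.
oracle = ws_change_rate(stego, cover.astype(float))
# With a real predictor, the estimate is only as good as the prediction.
estimate = ws_change_rate(stego, neighbor_mean(stego))
print(oracle, changed.mean(), estimate)
```

The synthetic cover here is pure noise, so the neighbor-mean estimate is not expected to be accurate; the predictor only becomes useful on natural images with local smoothness. For LSB replacement with a random message, the embedding rate is roughly twice the change rate.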

sealwatch.ws.unet_estimator(*args, **kw)

Histogram attack

sealwatch.F5.attack(y1, qt, **kw)

Runs a histogram attack with Cartesian calibration, targeted against F5.

Pools the estimates for the DCT AC modes 01, 10, and 11.

Parameters:
Returns:

change rate estimate

Return type:

float

Example:

>>> jpeg1 = jpeglib.read_dct('suspicious.jpeg')
>>> beta_hat = sw.F5.attack(jpeg1.Y, jpeg1.qt[0])

RJCA

sealwatch.rjca.attack(y1, qt)

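Performs the reverse JPEG compatibility attack (RJCA) and returns the variance of the rounding error.

For cover images, the variance of the rounding error should be around 0.04-0.07. For stego images, it grows towards 1/12 ≈ 0.08333, the variance of a uniform distribution over an interval of length 1.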

Parameters:
  • y1 (np.ndarray) – quantized cover DCT coefficients of shape [num_vertical_blocks, num_horizontal_blocks, 8, 8]

  • qt (np.ndarray) – quantization table of shape [8, 8]

Returns:

variance of the rounding error

Return type:

float

Example:

>>> import jpeglib
>>> import numpy as np
>>> import sealwatch as sw
>>> jpeg = jpeglib.read_dct('suspicious.jpeg')
>>> var = sw.rjca.attack(
...     y1=jpeg.Y,
...     qt=jpeg.qt[0],
... )
>>> assert np.abs(var - 1/12.) > .005
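The statistic itself can be reproduced without the library. The sketch below is an illustration under stated assumptions, not sealwatch's implementation: the names dct_matrix and rounding_error_variance are hypothetical, the inverse DCT is a plain orthonormal 8x8 transform, and the "JPEG" is simulated with an all-ones quantization table.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix D, so that coefficients = D @ block @ D.T."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    D = np.sqrt(2 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    D[0] /= np.sqrt(2)
    return D

def rounding_error_variance(y, qt):
    """Variance of the spatial-domain rounding error after decompression.

    y has shape [num_vertical_blocks, num_horizontal_blocks, 8, 8], qt has shape [8, 8].
    """
    D = dct_matrix()
    spatial = D.T @ (y * qt) @ D + 128        # decompress without rounding
    error = spatial - np.round(spatial)       # distance to the nearest integer
    return float(np.var(error))

# Simulate a cover: random pixel blocks compressed with an all-ones table.
rng = np.random.default_rng(0)
D = dct_matrix()
pixels = rng.integers(0, 256, size=(16, 16, 8, 8)).astype(float)
qt = np.ones((8, 8))
y = np.round(D @ (pixels - 128) @ D.T)
var = rounding_error_variance(y, qt)
print(var)
```

For this simulated cover, the variance stays below 1/12, in line with the description above.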

Handcrafted features

HCF-COM

sealwatch.hcfcom.extract(x1, *, order=1)
Parameters:
Returns:

Return type:

OrderedDict

Example:

>>> # TODO

sealwatch.hcfcom.extract_from_file(path, **kw)
Parameters:

path (Union[str, Path]) –

Return type:

Dict[str, ndarray]

SPAM

sealwatch.spam.extract(x1, *, T=3, rounded=False)

Extract 2nd-order subtractive pixel adjacency matrix (SPAM) features. The implementation merges the features over image directions.

With the default T=3, the final feature set has 2 x (2T + 1)^3 = 686 dimensions.

Parameters:
Returns:

ordered dict containing 686 feature dimensions in total.

Return type:

collections.OrderedDict
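The dimensionality follows from the construction. As a minimal sketch (not the sealwatch implementation, and with the hypothetical name spam_horizontal), the following computes the second-order transition probabilities for a single direction only. One direction gives (2T + 1)^3 = 343 values; the full SPAM set merges the directions into two such groups.

```python
import numpy as np

def spam_horizontal(x, T=3):
    """Second-order transition probabilities of truncated left-to-right differences."""
    d = np.clip(np.diff(np.asarray(x, dtype=np.int64), axis=1), -T, T) + T  # 0..2T
    a, b, c = d[:, :-2], d[:, 1:-1], d[:, 2:]       # three consecutive differences
    n = 2 * T + 1
    counts = np.zeros((n, n, n))
    np.add.at(counts, (a.ravel(), b.ravel(), c.ravel()), 1)
    totals = counts.sum(axis=2, keepdims=True)
    # P(c | a, b); contexts that never occur stay at zero
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32))
M = spam_horizontal(image)
print(M.shape, 2 * M.size)
```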

Examples:

>>> features = sw.spam.extract(x1)

By default, this function uses the Rust-accelerated backend. To use the (substantially slower) Python implementation instead, run

>>> with sw.BACKEND_PYTHON:
...     features = sw.spam.extract(x1)

sealwatch.spam.extract_from_file(path, *, rounded=True, **kw)

Extract SPAM features from the luminance channel of the given JPEG image.

Parameters:
Returns:

ordered dict with the feature values

Return type:

collections.OrderedDict

Example:

>>> # TODO

This function works only with the Python backend.

SRM

sealwatch.srm.extract(x, *, qs=[[1, 2], [1, 1.5, 2], [1, 1.5, 2], [1, 1.5, 2], [1, 1.5, 2]], directional=True)

Extracts spatial rich model (SRM) features for steganalysis.

Parameters:
  • x (np.ndarray) – 2D input image

  • qs (List[List[float]]) –

  • directional (bool) –

Returns:

structured SRM features

Return type:

OrderedDict

sealwatch.srm.extract_from_file(path, **kw)
Parameters:

path (Union[str, Path]) –

Return type:

Dict[str, ndarray]

sealwatch.srmq1.extract(x, **kw)

Extracts SRMQ1 spatial rich model features for steganalysis.

Parameters:

x (np.ndarray) – 2D input image

Returns:

structured SRMQ1 features

Return type:

collections.OrderedDict

sealwatch.srmq1.extract_from_file(path, **kw)
Parameters:

path (Union[str, Path]) –

Return type:

Dict[str, ndarray]

CRM

sealwatch.crm.extract(x, *, q=1, Tc=2, implementation=Implementation.CRM_FIX_MIN24)

Extracts color rich model (CRM) features for steganalysis.

Parameters:
  • x (np.ndarray) – input color image

  • q (int) –

  • Tc (int) –

  • implementation (Implementation) –

Returns:

structured CRM features

Return type:

collections.OrderedDict

sealwatch.crm.extract_from_file(path, **kw)
Parameters:

path (Union[str, Path]) –

Return type:

Dict[str, ndarray]

JRM

sealwatch.jrm.extract(y1, *, calibrated=False, qt=None)

Extracts JPEG rich models (JRM) for the given DCT coefficients.

Parameters:
  • y1 (np.ndarray) – DCT coefficients, of shape [num_vertical_blocks, num_horizontal_blocks, 8, 8]

  • calibrated (bool) – If True, extract Cartesian-calibrated features (cc-JRM). If False, extract JRM.

  • qt (np.ndarray) – quantization table

Returns:

JRM features as ordered dict, where the keys are the names of the submodels. All submodels together have dimensionality 11255

Return type:

collections.OrderedDict

sealwatch.jrm.extract_from_file(path, calibrated=False)

Compute the JPEG rich models (JRM) feature descriptor from the given image’s luminance channel.

The mode-specific submodels give the rich model a fine “granularity” at the price of utilizing only a small portion of the DCT plane. To cover a larger range of DCT coefficients, the mode-specific submodels are complemented by co-occurrence matrices integrated over all DCT modes.

J. Kodovsky, J. Fridrich, Steganalysis of JPEG Images Using Rich Models, Proc. SPIE, Electronic Imaging, Media Watermarking, Security, and Forensics XIV, San Francisco, CA, January 23–25, 2012. http://dde.binghamton.edu/kodovsky/pdf/SPIE2012_Kodovsky_Steganalysis_of_JPEG_Images_Using_Rich_Models_paper.pdf

Parameters:
  • path – path to JPEG image

  • calibrated (bool) – If True, extract Cartesian-calibrated features (cc-JRM). If False, extract JRM.

Returns:

JRM features as ordered dict, where the keys are the names of the submodels. All submodels together have dimensionality 11255

Return type:

collections.OrderedDict

sealwatch.ccjrm.extract(y1, *, qt=None)

Extracts calibrated JPEG rich models (cc-JRM) for the given DCT coefficients.

Parameters:
  • y1 (np.ndarray) – DCT coefficients, of shape [num_vertical_blocks, num_horizontal_blocks, 8, 8]

  • qt (np.ndarray) – quantization table

Returns:

cc-JRM features as ordered dict, where the keys are the names of the submodels. All submodels together have dimensionality 22510

Return type:

collections.OrderedDict

sealwatch.ccjrm.extract_from_file(path)

Compute the calibrated JPEG rich models (cc-JRM) feature descriptor from the given image’s luminance channel.

The mode-specific submodels give the rich model a fine “granularity” at the price of utilizing only a small portion of the DCT plane. To cover a larger range of DCT coefficients, the mode-specific submodels are complemented by co-occurrence matrices integrated over all DCT modes.

J. Kodovsky, J. Fridrich, Steganalysis of JPEG Images Using Rich Models, Proc. SPIE, Electronic Imaging, Media Watermarking, Security, and Forensics XIV, San Francisco, CA, January 23–25, 2012. http://dde.binghamton.edu/kodovsky/pdf/SPIE2012_Kodovsky_Steganalysis_of_JPEG_Images_Using_Rich_Models_paper.pdf

Parameters:

path – path to JPEG image

Returns:

cc-JRM features as ordered dict, where the keys are the names of the submodels. All submodels together have dimensionality 22510

Return type:

collections.OrderedDict

DCTR

sealwatch.dctr.extract(x1, q, *, T=4)

Extracts DCTR features from the provided image.

Note that there can be minor differences during quantization, which is why the Matlab and Python results do not match perfectly.

Parameters:
  • x1 – grayscale image with intensities in range [-128, 127]

  • q (float) – quantization step

  • T – truncation threshold. The number of histogram bins is T + 1.

Returns:

DCTR features of shape [64x25, 5]
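The role of q and T can be illustrated for a single DCT kernel. This is a simplified sketch, not the sealwatch implementation: dct_kernel and dctr_histogram are hypothetical names, the filtering is a plain sliding-window correlation, and the per-phase split of the histograms (the 25 merged positions per kernel) is omitted.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def dct_kernel(k, l, n=8):
    """8x8 DCT basis pattern for mode (k, l)."""
    i = np.arange(n)
    def basis(f):
        scale = np.sqrt(1 / n) if f == 0 else np.sqrt(2 / n)
        return scale * np.cos(np.pi * (2 * i + 1) * f / (2 * n))
    return np.outer(basis(k), basis(l))

def dctr_histogram(x, k, l, q, T=4):
    """Histogram of quantized, truncated absolute residuals for one kernel."""
    windows = sliding_window_view(np.asarray(x, dtype=float), (8, 8))
    residual = np.einsum("ijab,ab->ij", windows, dct_kernel(k, l))
    quantized = np.minimum(np.round(np.abs(residual) / q), T).astype(int)
    return np.bincount(quantized.ravel(), minlength=T + 1)   # T + 1 bins

rng = np.random.default_rng(0)
x = rng.integers(-128, 128, size=(32, 32))
hist = dctr_histogram(x, k=0, l=1, q=4.0)
print(hist, 64 * 25 * (4 + 1))
```

With 64 kernels, 25 merged positions per kernel, and T + 1 = 5 bins, the descriptor has 8000 values in total.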

Return type:

Example:

>>> # TODO

sealwatch.dctr.extract_from_file(path, qf)

Extract DCTR features from the luminance channel of the JPEG image at the given file path.

Parameters:
  • path (str) – path to JPEG image

  • qf – JPEG quality factor used to determine the quantization step

Returns:

DCTR features of shape [64x25, 5]

Return type:

PHARM

sealwatch.pharm.extract(x1, *, implementation=Implementation.PHARM_REVISITED, q=5, T=2, num_projections=100, maximum_projection_size=8, first_order_residuals=True, second_order_residuals=True, third_order_residuals=True, symmetrize=True, normalize=False, seed=1)

Extracts the PHARM features from a given decompressed image.

The PHARM features were introduced in V. Holub and J. Fridrich, Phase-Aware Projection Model for Steganalysis of JPEG Images. SPIE Electronic Imaging, Media Watermarking, Security, and Forensics XVII, vol. 9409, 2015. http://dde.binghamton.edu/vholub/pdf/SPIE15_Phase-Aware_Projection_Model_for_Steganalysis_of_JPEG_Images.pdf

Parameters:
  • x1 (np.ndarray) – decompressed JPEG image of shape [height, width]

  • implementation (Implementation) – implementation of PHARM to use

  • q (int) – quantization step

  • T (int) – truncation threshold

  • num_projections (int) – number of random projection matrices. The original implementation defaults to 900, but we use 100 for speed reasons.

  • maximum_projection_size (int) – maximum spatial size of each projection matrix

  • first_order_residuals (bool) – If True, include first order residuals. If False, skip first order residuals.

  • second_order_residuals (bool) – If True, include second order residuals. If False, skip second order residuals.

  • third_order_residuals (bool) – If True, include third order residuals. If False, skip third order residuals.

  • symmetrize (bool) – If True, merge histograms with horizontally and vertically flipped versions of the image. If False, skip symmetrization.

  • normalize (bool) – If True, normalize the histogram counts.

  • seed (int) – seed for random number generator for the projection matrices

Returns:

features as ordered dictionary, where the keys are the submodel names and the values are the features of shape [num_projections, T]. Note that the features are not normalized by default (normalize=False).

Return type:

OrderedDict

Example:

>>> # TODO

sealwatch.pharm.extract_from_file(path, *, implementation=Implementation.PHARM_REVISITED, q=5, T=2, num_projections=100, maximum_projection_size=8, first_order_residuals=True, second_order_residuals=True, third_order_residuals=True, symmetrize=True, normalize=False, seed=1)

Extracts the PHARM features from a given JPEG image.

The PHARM features were introduced in V. Holub and J. Fridrich, Phase-Aware Projection Model for Steganalysis of JPEG Images. SPIE Electronic Imaging, Media Watermarking, Security, and Forensics XVII, vol. 9409, 2015. http://dde.binghamton.edu/vholub/pdf/SPIE15_Phase-Aware_Projection_Model_for_Steganalysis_of_JPEG_Images.pdf

Parameters:
  • path (str or Path) – path to JPEG image

  • implementation (Implementation) – implementation of PHARM to use

  • q (int) – quantization step

  • T (int) – truncation threshold

  • num_projections (int) – number of random projection matrices. The original implementation defaults to 900, but we use 100 for speed reasons.

  • maximum_projection_size (int) – maximum spatial size of each projection matrix

  • first_order_residuals (bool) – whether to include first order residuals

  • second_order_residuals (bool) – whether to include second order residuals

  • third_order_residuals (bool) – whether to include third order residuals

  • symmetrize (bool) – whether to merge histograms with horizontally and vertically flipped image. If False, skip symmetrization.

  • normalize (bool) – whether to normalize the histogram counts, by default False

  • seed (int) – seed for random number generator for the projection matrices, by default 1

Returns:

features as ordered dictionary, where the keys are the submodel names and the values are the features of shape [num_projections, T]. Note that the features are not normalized by default (normalize=False).

Return type:

OrderedDict

Example:

>>> # TODO

class sealwatch.pharm.Implementation(value)

PHARM implementation to choose from.

PHARM_ORIGINAL = 1

Original PHARM implementation by DDE.

PHARM_REVISITED = 2

PHARM implementation with fixes.

GFR

sealwatch.gfr.extract(img, *, num_rotations=32, quantization_steps=75, T=4, implementation=Implementation.GFR_ORIGINAL)

Extract the Gabor filter residual features from a given image.

Parameters:
  • img – grayscale image with values in range [0, 255]

  • num_rotations (int) – number of rotations for Gabor kernel

  • quantization_steps (int) – quantization step for each of the four scales

  • T (int) – the highest histogram bin value after quantization. The histogram contains T + 1 bins corresponding to the values [0, …, T]. Quantized values exceeding T will be clamped to T.

  • implementation (Implementation) –

Returns:

extracted Gabor features as 5D ndarray. The five dimensions denote:
  • Dimension 0: Phase shifts

  • Dimension 1: Scales

  • Dimension 2: Rotations/Orientations

  • Dimension 3: Number of histograms

  • Dimension 4: Co-occurrences

Flatten the 5D array to obtain a 1D feature descriptor.

Will be changed in the future to OrderedDict to match the common interface.
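Flattening can be done with a single reshape. In the sketch below, the array shape is hypothetical and chosen only for illustration; the real sizes depend on the arguments passed to the extractor.

```python
import numpy as np

# Placeholder standing in for the 5D output; the shape is an assumption.
features = np.zeros((2, 4, 32, 3, 5))

# Collapse all five dimensions into a 1D feature descriptor.
descriptor = features.reshape(-1)
print(descriptor.shape)
```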

Return type:

Example:

>>> # TODO

sealwatch.gfr.extract_from_file(path, num_rotations=32, qf=None, quantization_steps=None, T=4, implementation=Implementation.GFR_ORIGINAL)

Extract the Gabor filter residual features from a given JPEG image file.

Parameters:
  • path (Union[Path, str]) – filepath to JPEG image

  • num_rotations (int) – number of rotations for Gabor kernel

  • qf (Optional[int]) – JPEG quality factor; used to select the quantization steps

  • quantization_steps (Optional[int]) – list of four quantization steps, one for each scale

  • T (int) – truncation threshold

  • implementation (Implementation) –

Returns:

extracted Gabor features as 5D ndarray. The five dimensions denote:
  • Dimension 0: Phase shifts

  • Dimension 1: Scales

  • Dimension 2: Rotations/Orientations

  • Dimension 3: Number of histograms

  • Dimension 4: Co-occurrences

Flatten the 5D array to obtain a 1D feature descriptor.

Return type:

class sealwatch.gfr.Implementation(value)

GFR implementation to choose from.

GFR_FIX = 2

GFR implementation with fixes.

GFR_ORIGINAL = 1

Original GFR implementation by DDE.

Detectors

class sealwatch.ensemble_classifier.EnsembleClassifier(base_learners, d_sub=None)

predict(X)

Calculate predictions based on (unweighted) majority voting. Ties are resolved randomly.

Parameters:

X – samples of shape [num_samples, num_features]

Returns:

predictions of shape [num_samples], where -1 stands for the negative and +1 for the positive class

predict_confidence(X)

Calculate confidence score based on majority voting.

Parameters:

X – samples of shape [num_samples, num_features]

Returns:

confidence score of predictions of shape [num_samples], in the range of -1 for the negative and +1 for the positive class

score(X, y_true)

Calculate accuracy.

Parameters:
  • X – samples of shape [num_samples, num_features]

  • y_true – labels of shape [num_samples], where -1 indicates a cover image and +1 indicates a stego image

Returns:

accuracy
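The three methods can be pictured as simple operations on a matrix of base-learner votes. The following is a minimal sketch, not the class's actual code: the function names are hypothetical, and taking the confidence as the mean vote is an assumption that merely matches the documented range of -1 to +1.

```python
import numpy as np

def majority_vote(votes, rng):
    """votes has shape [num_learners, num_samples] with entries -1 or +1."""
    total = votes.sum(axis=0)
    predictions = np.sign(total).astype(int)
    ties = total == 0
    predictions[ties] = rng.choice([-1, 1], size=int(ties.sum()))  # random tie-break
    return predictions

def vote_confidence(votes):
    """Mean vote per sample, between -1 (all negative) and +1 (all positive)."""
    return votes.mean(axis=0)

def accuracy(predictions, y_true):
    return float(np.mean(predictions == y_true))

votes = np.array([[+1, +1, -1, +1],
                  [+1, -1, -1, -1],
                  [+1, -1, +1, +1],
                  [+1, +1, -1, -1]])
rng = np.random.default_rng(0)
predictions = majority_vote(votes, rng)
confidence = vote_confidence(votes)
print(predictions, confidence)
```

Samples 1 and 3 receive two votes for each class, so their predictions depend on the random tie-break.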

class sealwatch.ensemble_classifier.FldEnsembleTrainer(Xc, Xs, seed=None, seed_subspaces=None, seed_bootstrap=None, L='automatic', d_sub='automatic', verbose=1, max_num_base_learners=500)

train()

Train an ensemble of Fisher linear discriminant classifiers.

Returns:

(ensemble_classifier, training_records) as 2-tuple, where ensemble_classifier is an instance of EnsembleClassifier and training_records is a list of dicts
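The general idea of such an ensemble can be sketched as follows. This is an illustration under assumptions, not the trainer's actual algorithm: the function names are hypothetical, L and d_sub are fixed instead of chosen automatically, the same bootstrap rows are used for covers and stegos, and a small ridge term is added for numerical stability.

```python
import numpy as np

def train_fld(Xc, Xs, ridge=1e-6):
    """Fisher linear discriminant: w = Sw^-1 (mu_s - mu_c), threshold at the midpoint."""
    mu_c, mu_s = Xc.mean(axis=0), Xs.mean(axis=0)
    Sw = np.cov(Xc, rowvar=False) + np.cov(Xs, rowvar=False)
    Sw = Sw + ridge * np.eye(Xc.shape[1])
    w = np.linalg.solve(Sw, mu_s - mu_c)
    b = -w @ (mu_c + mu_s) / 2
    return w, b

def train_ensemble(Xc, Xs, L=11, d_sub=4, seed=0):
    """Train L base learners, each on a random feature subspace and a bootstrap sample."""
    rng = np.random.default_rng(seed)
    learners = []
    for _ in range(L):
        dims = rng.choice(Xc.shape[1], size=d_sub, replace=False)
        rows = rng.integers(0, len(Xc), size=len(Xc))
        w, b = train_fld(Xc[rows][:, dims], Xs[rows][:, dims])
        learners.append((dims, w, b))
    return learners

def predict(learners, X):
    """Unweighted majority vote; L is odd here, so ties cannot occur."""
    votes = np.array([np.where(X[:, dims] @ w + b > 0, 1, -1) for dims, w, b in learners])
    return np.where(votes.sum(axis=0) > 0, 1, -1)

rng = np.random.default_rng(1)
Xc, Xs = rng.normal(0, 1, (200, 10)), rng.normal(3, 1, (200, 10))   # synthetic features
learners = train_ensemble(Xc, Xs)
X_test = np.vstack([rng.normal(0, 1, (100, 10)), rng.normal(3, 1, (100, 10))])
y_test = np.concatenate([-np.ones(100), np.ones(100)])
acc = float(np.mean(predict(learners, X_test) == y_test))
print(acc)
```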

class sealwatch.xunet.XuNet

XuNet: A convolutional neural network for steganalysis.

Architecture:
  • Preprocessing with high pass filter

  • 5 convolutional groups with batch normalization, activation, and pooling

  • Fully connected layers with softmax output

Intended for binary classification of stego and cover images.

forward(x)

Forward pass

Parameters:

x (Tensor) –

Return type:

Tensor

sealwatch.xunet.pretrained(model_path=None, model_name='XuNet-LSBM_0.4_lsb-250714133836.pt', *, device=device(type='cpu'), strict=True)

Loads pretrained model. Downloads if missing.

Parameters:
  • model_path (str) – local path to the model

  • model_name (str) – filename of the model

  • device (torch.device) – torch device

  • strict (bool) –

Returns:

loaded XuNet Model

Return type:

torch.nn.Module

sealwatch.xunet.infere_single(x, model=None, *, device=device(type='cpu'))

Runs inference for a single image.

Parameters:
  • x – image

  • model

  • device

Returns:

Return type:

Helper functions

sealwatch.ensemble_classifier.helpers.load_hdf5(features_filename, max_num_samples=None)

Retrieve features and filenames from an HDF5 file.

When the origin attribute is “matlab”, the feature array is transposed.

Parameters:
  • features_filename – path to HDF5 file

  • max_num_samples – Only load the first n features and filenames

Returns:

(features, filenames) as 2-tuple

sealwatch.ensemble_classifier.helpers.load_features(cover_features_filename, stego_features_filename, max_num_samples=None)

Load cover and stego features.

On the way, drop images where the feature extraction failed. Also drop images where we have no matching cover-stego pairs.

Parameters:
  • cover_features_filename – path to HDF5 file containing the cover features

  • stego_features_filename – path to HDF5 file containing the stego features

  • max_num_samples – take only the first n samples from each dataset. Useful for quick prototyping.

Returns:

(cover_features, stego_features, cover_filenames, stego_filenames), where cover_features and stego_features are ndarrays of shape [num_samples, num_features], and cover_filenames and stego_filenames are lists of strings
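The filtering step can be pictured with plain NumPy. The sketch below is hypothetical and does not read HDF5 files: the name match_pairs is invented, a failed extraction is assumed to show up as non-finite feature values, and covers and stegos are assumed to be matched by filename without extension.

```python
import os
import numpy as np

def match_pairs(cover_features, cover_filenames, stego_features, stego_filenames):
    """Keep only images with valid features in both the cover and the stego set."""
    def valid_rows(features, filenames):
        ok = np.all(np.isfinite(features), axis=1)        # drop failed extractions
        return {os.path.splitext(f)[0]: row
                for f, row, keep in zip(filenames, features, ok) if keep}

    covers = valid_rows(cover_features, cover_filenames)
    stegos = valid_rows(stego_features, stego_filenames)
    common = sorted(covers.keys() & stegos.keys())        # matching cover-stego pairs
    return (np.array([covers[k] for k in common]),
            np.array([stegos[k] for k in common]),
            common)

cover_features = np.array([[1.0, 2.0], [np.nan, 0.0], [3.0, 4.0]])
stego_features = np.array([[5.0, 6.0], [7.0, 8.0]])
Xc, Xs, names = match_pairs(cover_features, ["a.png", "b.png", "c.png"],
                            stego_features, ["a.png", "b.png"])
print(names, Xc, Xs)
```

Here only image a survives: b has an invalid cover feature vector, and c has no stego counterpart.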

sealwatch.ensemble_classifier.helpers.remove_file_extension(f)

sealwatch.ensemble_classifier.helpers.load_and_split_features(cover_features_filename, stego_features_filename, train_csv, test_csv, max_num_samples=None)

Load cover and stego features and split them into training and test sets.

On the way, drop images where the feature extraction failed. Also drop images where we have no matching cover-stego pairs.

Parameters:
  • cover_features_filename – path to HDF5 file containing the cover features

  • stego_features_filename – path to HDF5 file containing the stego features

  • train_csv – csv file containing the filenames to use for training

  • test_csv – csv file containing the filenames to use for testing

  • max_num_samples – take only the first n samples from each dataset. Useful for quick prototyping.

Returns:

6-tuple:
  • 0: cover_features_train

  • 1: stego_features_train

  • 2: cover_features_test

  • 3: stego_features_test

  • 4: train_filenames (same length as the training covers; the covers and stegos have the same filenames)

  • 5: test_filenames (same length as the test covers; the covers and stegos have the same filenames)

sealwatch.ensemble_classifier.helpers.load_features_subset(cover_features_filename, stego_features_filename, test_csv)