Reference¶
to be completed
For more information, see the glossary.
Analytical attacks¶
chi2 attack¶
- sealwatch.chi2.attack(spatial)¶
Measures the “distance” between the observed histogram and a typical histogram after LSB replacement.
LSB replacement (embedding rate = 1) averages the neighboring histogram bins.
- Parameters:
spatial (np.ndarray) – image pixels, of arbitrary shape
- Returns:
distance and p-value. The distance is the chi2 test statistic between the observed histogram and the stego model. A small distance means that the image matches the model (e.g., because it was embedded with LSB replacement). The p-value turns the distance into a probability: a p-value near 0 suggests that the image does not contain LSB replacement (LSBR) steganography, while a p-value near 1 indicates that it does.
- Return type:
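As a rough illustration of the distance described above, the following dependency-free sketch compares each even histogram bin with the average of its bin pair (2k, 2k+1), which is the expected value after full LSB replacement. The function name `chi2_statistic` is hypothetical and this is not the sealwatch implementation; it assumes 8-bit pixel values given as a flat list and leaves out the p-value.

```python
from collections import Counter


def chi2_statistic(pixels, num_levels=256):
    """Chi-square distance between the observed histogram and the
    LSB-replacement model (hypothetical helper, not sealwatch's code).

    Returns the statistic and the degrees of freedom.
    """
    hist = Counter(pixels)  # missing bins count as 0
    stat = 0.0
    num_pairs = 0
    for k in range(0, num_levels, 2):
        # Under full LSB replacement, bins 2k and 2k+1 are expected to be equal.
        expected = (hist[k] + hist[k + 1]) / 2
        if expected == 0:
            continue  # skip empty bin pairs
        stat += (hist[k] - expected) ** 2 / expected
        num_pairs += 1
    return stat, max(num_pairs - 1, 0)
```

A perfectly balanced bin pair gives a distance of 0, i.e., the histogram matches the stego model; the more the pair is skewed, the larger the distance.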
SPA¶
- sealwatch.spa.attack(x0)¶
Run sample-pair analysis.
- Parameters:
x0 (np.ndarray) – image pixels in the spatial domain
- Returns:
embedding rate estimate
- Return type:
float
- Example:
>>> spatial = np.array(Image.open('suspicious.png'))
>>> alpha_hat = sw.spa.attack(spatial)
>>> assert alpha_hat == 0
WS¶
- sealwatch.ws.attack(x1, pixel_predictor='KB', correct_bias=False, weighted=True)¶
Runs weighted stego-image (WS) steganalysis on a given image.
The goal of WS steganalysis is to estimate the embedding rate of uniform LSB replacement embedding.
- Parameters:
x1 (np.ndarray) –
pixel_predictor –
correct_bias (bool) –
weighted (bool) –
- Returns:
change rate estimate
- Return type:
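The idea behind WS can be sketched in a few lines. If s̄ denotes a stego pixel with its LSB flipped and ĉ a prediction of the corresponding cover pixel, the unweighted estimate of the change rate is the mean of (s - ĉ)(s - s̄). The helper below is hypothetical and is not the sealwatch implementation: it takes the cover predictions as an argument instead of computing them with a pixel predictor, and it applies neither weighting nor bias correction.

```python
def ws_change_rate(stego, predicted_cover):
    """Unweighted WS estimate of the LSB replacement change rate.

    Hypothetical helper for illustration only.
    stego: list of integer pixel values
    predicted_cover: list of cover pixel predictions (same length)
    """
    if len(stego) != len(predicted_cover):
        raise ValueError("inputs must have the same length")
    total = 0.0
    for s, c_hat in zip(stego, predicted_cover):
        s_flipped = s ^ 1  # same pixel with its LSB flipped
        total += (s - c_hat) * (s - s_flipped)
    return total / len(stego)
```

With a perfect predictor, changed pixels contribute 1 and unchanged pixels contribute 0, so the estimate equals the fraction of changed pixels. For LSB replacement with random message bits, the embedding rate is about twice the change rate.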
- sealwatch.ws.unet_estimator(*args, **kw)¶
Histogram attack¶
- sealwatch.F5.attack(y1, qt, **kw)¶
Runs a histogram attack with Cartesian calibration, targeted at F5.
Pools the estimates for the DCT AC modes 01, 10, and 11.
- Parameters:
y1 (np.ndarray) – Stego DCT coefficients.
qt (np.ndarray) – quantization table
- Returns:
change rate estimate
- Return type:
- Example:
>>> beta_hat = sw.F5.attack(jpeg1.Y, jpeg1.qt[0])
RJCA¶
- sealwatch.rjca.attack(y1, qt)¶
Performs the reverse JPEG compatibility attack (RJCA) and returns the variance of the rounding error.
For cover images, the variance should be around 0.04-0.07. For stego images, it grows towards 1/12 (≈ 0.08333).
- Parameters:
y1 (np.ndarray) – quantized cover DCT coefficients of shape [num_vertical_blocks, num_horizontal_blocks, 8, 8]
qt (np.ndarray) – quantization table of shape [8, 8]
- Returns:
variance of the rounding error
- Return type:
- Example:
>>> jpeg = jpeglib.read_dct('suspicious.jpeg')
>>> var = sw.rjca.attack(
...     y1=jpeg.Y,
...     qt=jpeg.qt[0],
... )
>>> assert np.abs(var - 1/12.) > .005
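The limit of 1/12 quoted above is the variance of a uniform distribution on [-0.5, 0.5), which is what rounding errors follow when the fractional parts of the unrounded values are spread evenly. The sketch below checks this numerically with synthetic values. It only illustrates the limiting value and does not reproduce the attack itself; the function name is hypothetical.

```python
import random


def rounding_error_variance(values):
    """Variance of the error introduced by rounding values to integers."""
    errors = [v - round(v) for v in values]
    mean = sum(errors) / len(errors)
    return sum((e - mean) ** 2 for e in errors) / len(errors)


rng = random.Random(0)
# Synthetic values with uniformly distributed fractional parts.
values = [rng.uniform(0, 255) for _ in range(100_000)]
var = rounding_error_variance(values)
```

Here `var` lands close to 1/12, whereas the example above asserts that the analyzed image stays clearly below that limit.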
Handcrafted features¶
HCF-COM¶
- sealwatch.hcfcom.extract(x1, *, order=1)¶
- Parameters:
x1 (np.ndarray) –
order –
- Returns:
- Return type:
OrderedDict
- Example:
>>> # TODO
SPAM¶
- sealwatch.spam.extract(x1, *, T=3, rounded=False)¶
Extract 2nd-order spatial adjacency model (SPAM) features. The implementation merges over image directions.
The final feature set has 686 dimensions.
- Parameters:
x1 (np.ndarray) – 2D ndarray
T (int) – truncation threshold
rounded (bool) –
- Returns:
ordered dict containing 686 feature dimensions in total.
- Return type:
- Examples:
>>> features = sw.spam.extract(x1)
By default, this function uses the Rust-accelerated backend. To use the (substantially slower) Python implementation, run
>>> with sw.BACKEND_PYTHON:
...     features = sw.spam.extract(x1)
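The dimensionality of 686 follows from the truncation threshold. Assuming the standard SPAM construction, pixel differences are truncated to [-T, T], a second-order Markov model has (2T + 1)^3 transition probabilities, and the eight directions are merged into two groups (horizontal/vertical and diagonal). The helper name below is hypothetical.

```python
def spam_num_features(T=3, order=2):
    """Number of SPAM features for truncation threshold T.

    Differences are truncated to [-T, T], i.e., 2T + 1 distinct values.
    An order-n Markov model has (2T + 1) ** (n + 1) transition
    probabilities, and directions are merged into two groups.
    """
    return 2 * (2 * T + 1) ** (order + 1)


num_features = spam_num_features(T=3)  # 2 * 7**3
```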
- sealwatch.spam.extract_from_file(path, *, rounded=True, **kw)¶
Extract SPAM features from the luminance channel of the given JPEG image.
- Parameters:
path (str or pathlib.Path) – JPEG image to be analyzed
rounded (bool) –
- Returns:
ordered dict with the feature values
- Return type:
- Example:
>>> # TODO
This function only works with the Python backend.
SRM¶
- sealwatch.srm.extract(x, *, qs=[[1, 2], [1, 1.5, 2], [1, 1.5, 2], [1, 1.5, 2], [1, 1.5, 2]], directional=True)¶
Extracts spatial rich model (SRM) features for steganalysis.
- sealwatch.srm.extract_from_file(path, **kw)¶
- sealwatch.srmq1.extract(x, **kw)¶
Extracts SRMQ1 spatial rich model features for steganalysis.
- Parameters:
x (np.ndarray) – 2D input image
- Returns:
structured SRMQ1 features
- Return type:
CRM¶
- sealwatch.crm.extract(x, *, q=1, Tc=2, implementation=Implementation.CRM_FIX_MIN24)¶
Extracts color rich model (CRM) features for steganalysis.
- Parameters:
- Returns:
structured CRM features
- Return type:
JRM¶
- sealwatch.jrm.extract(y1, *, calibrated=False, qt=None)¶
Extracts JPEG rich models (JRM) for the given DCT coefficients.
- Parameters:
y1 (np.ndarray) – DCT coefficients, of shape [num_vertical_blocks, num_horizontal_blocks, 8, 8]
calibrated (bool) – Choose JRM or cc-JRM.
qt (np.ndarray) – quantization table
- Returns:
JRM features as ordered dict, where the keys are the names of the submodels. All submodels together have dimensionality 11255
- Return type:
- sealwatch.jrm.extract_from_file(path, calibrated=False)¶
Compute the JPEG rich models (JRM) feature descriptor from the given image’s luminance channel.
The mode-specific submodels give the rich model a fine “granularity” at the price of utilizing only a small portion of the DCT plane. To cover a larger range of DCT coefficients, the mode-specific submodels are complemented by co-occurrence matrices integrated over all DCT modes.
J. Kodovsky, J. Fridrich, Steganalysis of JPEG Images Using Rich Models, Proc. SPIE, Electronic Imaging, Media Watermarking, Security, and Forensics XIV, San Francisco, CA, January 23–25, 2012. http://dde.binghamton.edu/kodovsky/pdf/SPIE2012_Kodovsky_Steganalysis_of_JPEG_Images_Using_Rich_Models_paper.pdf
- Parameters:
path – path to JPEG image
calibrated (bool) – Choose JRM or cc-JRM.
- Returns:
JRM features as ordered dict, where the keys are the names of the submodels. All submodels together have dimensionality 11255
- Return type:
- sealwatch.ccjrm.extract(y1, *, qt=None)¶
Extracts Cartesian-calibrated JPEG rich model (cc-JRM) features for the given DCT coefficients.
- Parameters:
y1 (np.ndarray) – DCT coefficients, of shape [num_vertical_blocks, num_horizontal_blocks, 8, 8]
qt (np.ndarray) – quantization table
- Returns:
cc-JRM features as ordered dict, where the keys are the names of the submodels. All submodels together have dimensionality 22510
- Return type:
- sealwatch.ccjrm.extract_from_file(path)¶
Compute the calibrated JPEG rich models (cc-JRM) feature descriptor from the given image’s luminance channel.
The mode-specific submodels give the rich model a fine “granularity” at the price of utilizing only a small portion of the DCT plane. To cover a larger range of DCT coefficients, the mode-specific submodels are complemented by co-occurrence matrices integrated over all DCT modes.
J. Kodovsky, J. Fridrich, Steganalysis of JPEG Images Using Rich Models, Proc. SPIE, Electronic Imaging, Media Watermarking, Security, and Forensics XIV, San Francisco, CA, January 23–25, 2012. http://dde.binghamton.edu/kodovsky/pdf/SPIE2012_Kodovsky_Steganalysis_of_JPEG_Images_Using_Rich_Models_paper.pdf
- Parameters:
path – path to JPEG image
- Returns:
cc-JRM features as ordered dict, where the keys are the names of the submodels. All submodels together have dimensionality 22510
- Return type:
DCTR¶
- sealwatch.dctr.extract(x1, q, *, T=4)¶
Extracts DCTR features from the provided image.
Note that there can be minor differences during quantization, which is why the Matlab and Python results do not match perfectly.
- Parameters:
x1 – grayscale image with intensities in range [-128, 127]
q (float) – quantization step
T – truncation threshold. The number of histogram bins is T + 1.
- Returns:
DCTR features of shape [64x25, 5]
- Return type:
- Example:
>>> # TODO
PHARM¶
- sealwatch.pharm.extract(x1, *, implementation=Implementation.PHARM_REVISITED, q=5, T=2, num_projections=100, maximum_projection_size=8, first_order_residuals=True, second_order_residuals=True, third_order_residuals=True, symmetrize=True, normalize=False, seed=1)¶
Extracts the PHARM features from a given decompressed image.
The PHARM features were introduced in V. Holub and J. Fridrich, Phase-Aware Projection Model for Steganalysis of JPEG Images. SPIE Electronic Imaging, Media Watermarking, Security, and Forensics XVII, vol. 9409, 2015. http://dde.binghamton.edu/vholub/pdf/SPIE15_Phase-Aware_Projection_Model_for_Steganalysis_of_JPEG_Images.pdf
- Parameters:
x1 (np.ndarray) – decompressed JPEG image of shape [height, width]
implementation (Implementation) – implementation of PHARM to use
q (int) – quantization step
T (int) – truncation threshold
num_projections (int) – number of random projection matrices. The original implementation defaults to 900, but we use 100 for speed reasons.
maximum_projection_size (int) – maximum spatial size of each projection matrix
first_order_residuals (bool) – If True, include first order residuals. If False, skip first order residuals.
second_order_residuals (bool) – If True, include second order residuals. If False, skip second order residuals.
third_order_residuals (bool) – If True, include third order residuals. If False, skip third order residuals.
symmetrize (bool) – If True, merge histograms with horizontally and vertically flipped versions of the image. If False, skip symmetrization.
normalize (bool) – If True, normalize the histogram counts.
seed (int) – seed for random number generator for the projection matrices
- Returns:
features as ordered dictionary, where the keys are the submodel names and the values are the features of shape [num_projections, T]. Note that the features are not normalized.
- Return type:
OrderedDict
- Example:
>>> # TODO
- sealwatch.pharm.extract_from_file(path, *, implementation=Implementation.PHARM_REVISITED, q=5, T=2, num_projections=100, maximum_projection_size=8, first_order_residuals=True, second_order_residuals=True, third_order_residuals=True, symmetrize=True, normalize=False, seed=1)¶
Extracts the PHARM features from a given JPEG image.
The PHARM features were introduced in V. Holub and J. Fridrich, Phase-Aware Projection Model for Steganalysis of JPEG Images. SPIE Electronic Imaging, Media Watermarking, Security, and Forensics XVII, vol. 9409, 2015. http://dde.binghamton.edu/vholub/pdf/SPIE15_Phase-Aware_Projection_Model_for_Steganalysis_of_JPEG_Images.pdf
- Parameters:
path (str or Path) – path to JPEG image
implementation (Implementation) – implementation of PHARM to use
q (int) – quantization step
T (int) – truncation threshold
num_projections (int) – number of random projection matrices. The original implementation defaults to 900, but we use 100 for speed reasons.
maximum_projection_size (int) – maximum spatial size of each projection matrix
first_order_residuals (bool) – whether to include first order residuals
second_order_residuals (bool) – whether to include second order residuals
third_order_residuals (bool) – whether to include third order residuals
symmetrize (bool) – whether to merge histograms with horizontally and vertically flipped versions of the image. If False, skip symmetrization.
normalize (bool) – whether to normalize the histogram counts, by default False
seed (int) – seed for random number generator for the projection matrices, by default 1
- Returns:
features as ordered dictionary, where the keys are the submodel names and the values are the features of shape [num_projections, T]. Note that the features are not normalized.
- Return type:
OrderedDict
- Example:
>>> # TODO
GFR¶
- sealwatch.gfr.extract(img, *, num_rotations=32, quantization_steps=75, T=4, implementation=Implementation.GFR_ORIGINAL)¶
Extract the Gabor filter residual features from a given image.
- Parameters:
img – grayscale image with values in range [0, 255]
num_rotations (int) – number of rotations for Gabor kernel
quantization_steps (int) – quantization step for each of the four scales
T (int) – the highest histogram bin value after quantization. The histogram contains T + 1 bins corresponding to the values [0, …, T]. Quantized values exceeding T will be clamped to T.
implementation (Implementation) –
- Returns:
extracted Gabor features as 5D ndarray. The five dimensions denote:
Dimension 0: Phase shifts
Dimension 1: Scales
Dimension 2: Rotations/Orientations
Dimension 3: Number of histograms
Dimension 4: Co-occurrences
Flatten the 5D array to obtain a 1D feature descriptor.
The return type will change to OrderedDict in the future to match the common interface.
- Return type:
- Example:
>>> # TODO
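The role of T can be illustrated with a small sketch: values are quantized, clamped to T, and counted in T + 1 bins. The helper below is hypothetical and not taken from sealwatch; taking absolute values and rounding to the nearest integer are assumptions made for illustration.

```python
def clamped_histogram(values, step, T=4):
    """Quantize values with the given step, clamp them to T, and count
    occurrences in T + 1 bins (hypothetical helper).

    Assumptions: magnitudes are used, and quantization rounds to the
    nearest integer.
    """
    hist = [0] * (T + 1)
    for v in values:
        q = min(int(round(abs(v) / step)), T)  # values above T land in bin T
        hist[q] += 1
    return hist


hist = clamped_histogram([0, 30, 80, 160, 1000], step=75, T=4)
```

The value 1000 quantizes to 13 and is therefore counted in the last bin, T = 4.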
- sealwatch.gfr.extract_from_file(path, num_rotations=32, qf=None, quantization_steps=None, T=4, implementation=Implementation.GFR_ORIGINAL)¶
Extract the Gabor filter residual features from a given JPEG image file.
- Parameters:
- Returns:
extracted Gabor features as 5D ndarray. The five dimensions denote:
Dimension 0: Phase shifts
Dimension 1: Scales
Dimension 2: Rotations/Orientations
Dimension 3: Number of histograms
Dimension 4: Co-occurrences
Flatten the 5D array to obtain a 1D feature descriptor.
- Return type:
Detectors¶
- class sealwatch.ensemble_classifier.EnsembleClassifier(base_learners, d_sub=None)¶
- predict(X)¶
Calculate predictions based on (unweighted) majority voting. Ties are resolved randomly.
- Parameters:
X – samples of shape [num_samples, num_features]
- Returns:
predictions of shape [num_samples], where -1 stands for the negative and +1 for the positive class
- predict_confidence(X)¶
Calculate confidence score based on majority voting.
- Parameters:
X – samples of shape [num_samples, num_features]
- Returns:
confidence score of predictions of shape [num_samples], in the range of -1 for the negative and +1 for the positive class
- score(X, y_true)¶
Calculate accuracy.
- Parameters:
X – samples of shape [num_samples, num_features]
y_true – labels of shape [num_samples], where -1 indicates a cover image and +1 indicates a stego image
- Returns:
accuracy
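The three methods above can be illustrated with a dependency-free sketch that operates directly on the base learners' votes. All function names are hypothetical, and defining the confidence as the mean vote is an assumption; the actual EnsembleClassifier may compute it differently.

```python
import random


def majority_vote(votes, rng=None):
    """votes: one list of predictions (-1 or +1) per base learner.
    Returns one prediction per sample; ties are resolved randomly."""
    rng = rng or random.Random()
    predictions = []
    for i in range(len(votes[0])):
        total = sum(learner[i] for learner in votes)
        if total == 0:
            predictions.append(rng.choice([-1, 1]))  # tie
        else:
            predictions.append(1 if total > 0 else -1)
    return predictions


def vote_confidence(votes):
    """Mean vote per sample, in [-1, +1] (assumed definition)."""
    return [sum(learner[i] for learner in votes) / len(votes)
            for i in range(len(votes[0]))]


def accuracy(y_pred, y_true):
    """Fraction of predictions that match the labels."""
    return sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)


votes = [[1, -1, 1], [1, -1, -1], [-1, -1, 1]]  # 3 learners, 3 samples
y_pred = majority_vote(votes)
```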
- class sealwatch.ensemble_classifier.FldEnsembleTrainer(Xc, Xs, seed=None, seed_subspaces=None, seed_bootstrap=None, L='automatic', d_sub='automatic', verbose=1, max_num_base_learners=500)¶
- train()¶
Train an ensemble of Fisher linear discriminant classifiers.
- Returns:
(ensemble_classifier, training_records) as 2-tuple, where ensemble_classifier is an instance of EnsembleClassifier and training_records is a list of dicts
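The parameters `d_sub`, `seed_subspaces`, and `seed_bootstrap` suggest that each base learner is trained on a random feature subspace and a bootstrap sample, as in the ensemble classifier of Kodovský et al. The sketch below illustrates these two sampling steps under that assumption. The helpers are hypothetical, are not part of sealwatch, and do not train any classifier.

```python
import random


def draw_subspaces(num_features, d_sub, num_learners, seed=None):
    """Draw one random feature subset of size d_sub per base learner."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(num_features), d_sub))
            for _ in range(num_learners)]


def draw_bootstrap(num_samples, seed=None):
    """Draw sample indices with replacement (one bootstrap sample)."""
    rng = random.Random(seed)
    return [rng.randrange(num_samples) for _ in range(num_samples)]


subspaces = draw_subspaces(num_features=686, d_sub=10, num_learners=5, seed=1)
bootstrap = draw_bootstrap(num_samples=100, seed=1)
```

Fixing the seeds makes the drawn subspaces and bootstrap samples reproducible across runs.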
- class sealwatch.xunet.XuNet¶
XuNet: A convolutional neural network for steganalysis.
Architecture:
- Preprocessing with high pass filter
- 5 convolutional groups with batch normalization, activation, and pooling
- Fully connected layers with softmax output
Intended for binary classification of stego and cover images.
- forward(x)¶
Forward pass
- Parameters:
x (Tensor) –
- Return type:
Tensor
- sealwatch.xunet.pretrained(model_path=None, model_name='XuNet-LSBM_0.4_lsb-250714133836.pt', *, device=device(type='cpu'), strict=True)¶
Loads a pretrained model. The model file is downloaded if it is missing.
- sealwatch.xunet.infere_single(x, model=None, *, device=device(type='cpu'))¶
Runs inference for a single image.
- Parameters:
x – image
model –
device –
- Returns:
- Return type:
Helper functions¶
- sealwatch.ensemble_classifier.helpers.load_hdf5(features_filename, max_num_samples=None)¶
Retrieve features and filenames from an HDF5 file.
When the origin attribute is “matlab”, the feature array is transposed.
- Parameters:
features_filename – path to HDF5 file
max_num_samples – Only load the first n features and filenames
- Returns:
(features, filenames) as 2-tuple
- sealwatch.ensemble_classifier.helpers.load_features(cover_features_filename, stego_features_filename, max_num_samples=None)¶
Load cover and stego features.
Along the way, images for which the feature extraction failed are dropped, as are images without a matching cover-stego pair.
- Parameters:
cover_features_filename – path to HDF5 file containing the cover features
stego_features_filename – path to HDF5 file containing the stego features
max_num_samples – take only the first n samples from each dataset. Useful for quick prototyping.
- Returns:
(cover_features, stego_features, cover_filenames, stego_filenames) as 4-tuple, where cover_features and stego_features are ndarrays of shape [num_samples, num_features], and cover_filenames and stego_filenames are lists of strings
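Dropping images without a matching cover-stego pair can be sketched as follows. The helpers are hypothetical and not the sealwatch code; comparing filenames with the file extension removed is an assumption made for illustration.

```python
import os


def strip_extension(filename):
    """Drop the file extension, e.g. 'a/0001.jpg' -> 'a/0001'."""
    return os.path.splitext(filename)[0]


def match_pairs(cover_filenames, stego_filenames):
    """Return (cover_idx, stego_idx) for images present in both lists,
    compared without file extension (assumed matching rule)."""
    stego_index = {strip_extension(f): j
                   for j, f in enumerate(stego_filenames)}
    pairs = []
    for i, f in enumerate(cover_filenames):
        j = stego_index.get(strip_extension(f))
        if j is not None:
            pairs.append((i, j))
    return pairs


pairs = match_pairs(["1.png", "2.png", "3.png"], ["3.jpg", "1.jpg"])
```

The resulting index pairs can then be used to select the matching rows from the cover and stego feature arrays.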
- sealwatch.ensemble_classifier.helpers.remove_file_extension(f)¶
- sealwatch.ensemble_classifier.helpers.load_and_split_features(cover_features_filename, stego_features_filename, train_csv, test_csv, max_num_samples=None)¶
Load cover and stego features and split them into training and test sets.
Along the way, images for which the feature extraction failed are dropped, as are images without a matching cover-stego pair.
- Parameters:
cover_features_filename – path to HDF5 file containing the cover features
stego_features_filename – path to HDF5 file containing the stego features
train_csv – csv file containing the filenames to use for training
test_csv – csv file containing the filenames to use for testing
max_num_samples – take only the first n samples from each dataset. Useful for quick prototyping.
- Returns:
6-tuple:
0: cover_features_train
1: stego_features_train
2: cover_features_test
3: stego_features_test
4: train_filenames (same length as covers; the covers and stegos have the same filenames)
5: test_filenames (same length as covers; the covers and stegos have the same filenames)
- sealwatch.ensemble_classifier.helpers.load_features_subset(cover_features_filename, stego_features_filename, test_csv)¶