realutils.tagging.idolsankaku
- Overview:
This module provides utilities for image tagging using IdolSankaku taggers. It includes functions for loading models, processing images, and extracting tags.
The module is inspired by the SmilingWolf/wd-tagger project on Hugging Face.
Overview of IdolSankaku (NSFW warning)
(Figure: an overall benchmark of all the idolsankaku models.)
convert_idolsankaku_emb_to_prediction
- realutils.tagging.idolsankaku.convert_idolsankaku_emb_to_prediction(emb: ndarray, model_name: str = 'SwinV2', general_threshold: float = 0.35, general_mcut_enabled: bool = False, character_threshold: float = 0.85, character_mcut_enabled: bool = False, no_underline: bool = False, drop_overlap: bool = False, fmt: Any = ('rating', 'general', 'character'))[source]
Convert an idolsankaku embedding to an understandable prediction result. This function can process both single embeddings (1-dimensional arrays) and batches of embeddings (2-dimensional arrays).
- Parameters:
emb (numpy.ndarray) – The extracted embedding(s). Can be either a 1-dim array for a single image or a 2-dim array for batch processing
model_name (str) – Name of the idolsankaku model to use for prediction
general_threshold (float) – Confidence threshold for general tags (0.0 to 1.0)
general_mcut_enabled (bool) – Enable MCut thresholding for general tags to improve prediction quality
character_threshold (float) – Confidence threshold for character tags (0.0 to 1.0)
character_mcut_enabled (bool) – Enable MCut thresholding for character tags to improve prediction quality
no_underline (bool) – Replace underscores with spaces in tag names for better readability
drop_overlap (bool) – Remove overlapping tags to reduce redundancy
fmt (Any) – Specify the return format structure for predictions; default is ('rating', 'general', 'character').
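The MCut options above refer to Maximum Cut thresholding: instead of a fixed cutoff, the scores are sorted in descending order and the threshold is placed in the middle of the largest gap between adjacent scores. A minimal sketch of the idea (the function name `mcut_threshold` here is illustrative, not the library's internal API):

```python
import numpy as np

def mcut_threshold(probs: np.ndarray) -> float:
    """Pick a threshold at the largest gap between sorted scores."""
    sorted_probs = np.sort(probs)[::-1]          # descending order
    gaps = sorted_probs[:-1] - sorted_probs[1:]  # gaps between neighbors
    t = int(gaps.argmax())                       # index of the largest gap
    # threshold sits halfway inside that gap
    return float((sorted_probs[t] + sorted_probs[t + 1]) / 2)

# scores with a clear gap between the confident and unconfident tags
scores = np.array([0.9, 0.85, 0.1, 0.05])
print(mcut_threshold(scores))  # 0.475
```

This adapts the cutoff per image, which is why enabling it can improve prediction quality when confidence distributions vary a lot between images.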
- Returns:
For single embeddings (1-dim input): a prediction result structured according to fmt. For batch processing (2-dim input): a list where each element holds one embedding's predictions in the same format as the single-embedding output.
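The single-vs-batch dispatch described above can be sketched as follows. The classification head (`weights`, `bias`) and the sigmoid scoring are placeholders standing in for the model's real head, not the library's actual internals:

```python
import numpy as np

def _sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def convert_emb(emb: np.ndarray, weights: np.ndarray, bias: np.ndarray,
                threshold: float = 0.35):
    """Dispatch on ndim: single embedding -> dict, batch -> list of dicts."""
    if emb.ndim == 1:
        # single embedding: score every tag, keep those above the threshold
        scores = _sigmoid(emb @ weights + bias)
        return {i: float(s) for i, s in enumerate(scores) if s >= threshold}
    # batch: apply the single-embedding path to each row
    return [convert_emb(e, weights, bias, threshold) for e in emb]
```

Each batch element therefore gets exactly the same treatment as a lone embedding, which is why the batch output is simply a list of single-embedding results.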
- Example:
>>> import numpy as np
>>> from realutils.tagging import get_idolsankaku_tags, convert_idolsankaku_emb_to_prediction
>>>
>>> # extract the feature embedding, shape: (W, )
>>> embedding = get_idolsankaku_tags('skadi.jpg', fmt='embedding')
>>>
>>> # convert to understandable result
>>> rating, general, character = convert_idolsankaku_emb_to_prediction(embedding)
>>> # these 3 dicts will be the same as those returned by `get_idolsankaku_tags('skadi.jpg')`
>>>
>>> # batch processing, shape: (B, W)
>>> embeddings = np.stack([
...     get_idolsankaku_tags('img1.jpg', fmt='embedding'),
...     get_idolsankaku_tags('img2.jpg', fmt='embedding'),
... ])
>>> # results will be a list of (rating, general, character) tuples
>>> results = convert_idolsankaku_emb_to_prediction(embeddings)
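The `drop_overlap` option removes tags that are implied by more specific ones. The idea can be sketched with a hypothetical implication map (`parents` below is illustrative; the real overlap table ships with the library):

```python
def drop_overlap_tags(tags: dict, parents: dict) -> dict:
    """Drop any tag that a more specific tag in the result already implies.

    parents: hypothetical map from a tag to the set of broader tags it implies.
    """
    implied = set()
    for tag in tags:
        implied |= parents.get(tag, set())
    return {tag: score for tag, score in tags.items() if tag not in implied}

tags = {'long_hair': 0.9, 'hair': 0.8, 'smile': 0.7}
parents = {'long_hair': {'hair'}}
print(drop_overlap_tags(tags, parents))  # {'long_hair': 0.9, 'smile': 0.7}
```

Here 'hair' is dropped because 'long_hair' already conveys it, which is the redundancy reduction the parameter description refers to.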