This guide covers the available nodes and requirements for creating real-time video pipelines using ComfyUI with Livepeer.

Video Input/Output Nodes (Required for All Pipelines)

ComfyStream

  • GitHub Link
  • Input:
    • Video stream URL or device ID
    • Optional configuration parameters
  • Output:
    • RGB frame tensor (3, H, W)
    • Frame metadata (timestamp, index)
  • Performance Requirements:
    • Frame processing time: < 5ms
    • VRAM usage: < 500MB
    • Buffer size: ≤ 2 frames
    • Supported formats: RTMP, WebRTC, V4L2
  • Best Practices:
    • Set a fixed frame rate to keep buffering predictable (see the sketch below)
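
ComfyStream's internals are not reproduced here, but the ingest contract above can be sketched in a few lines: convert an incoming (H, W, 3) uint8 frame into a (3, H, W) float tensor and keep a bounded two-frame buffer so a slow consumer drops stale frames instead of accumulating latency. The `Frame` dataclass and `push` helper are illustrative names, not part of the ComfyStream API.

```python
from collections import deque
from dataclasses import dataclass
import time

import numpy as np
import torch


@dataclass
class Frame:
    tensor: torch.Tensor  # (3, H, W) float32 in [0, 1]
    timestamp: float
    index: int


def to_chw_tensor(frame_hwc: np.ndarray) -> torch.Tensor:
    """Convert an (H, W, 3) uint8 RGB frame to a (3, H, W) float tensor."""
    t = torch.from_numpy(frame_hwc).permute(2, 0, 1).contiguous()
    return t.float().div_(255.0)


# maxlen=2 matches the buffer requirement: appending to a full deque
# silently drops the oldest frame instead of letting latency build up.
buffer: deque = deque(maxlen=2)


def push(frame_hwc: np.ndarray, index: int) -> None:
    buffer.append(Frame(to_chw_tensor(frame_hwc), time.time(), index))
```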

Analysis Nodes

Depth Anything TensorRT

  • GitHub Link
  • Input: RGB frame (3, H, W)
  • Output: Depth map (1, H, W)
  • Performance Requirements:
    • Inference time: < 20ms
    • VRAM usage: 2GB
    • Batch size: 1
  • Best Practices:
    • Place early in workflow
    • Cache results for static scenes (see the sketch below)
    • Use lowest viable resolution
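
One way to cache results for static scenes is to reuse the previous depth map while the mean absolute frame difference stays small. The wrapper below is a sketch; the 0.01 threshold and the `infer` callable standing in for the TensorRT engine are assumptions to tune.

```python
import torch


class DepthCache:
    """Reuse the last depth map while the input frame is nearly unchanged."""

    def __init__(self, threshold: float = 0.01):
        self.threshold = threshold
        self.last_frame: torch.Tensor | None = None
        self.last_depth: torch.Tensor | None = None

    def __call__(self, frame: torch.Tensor, infer) -> torch.Tensor:
        # frame: (3, H, W) in [0, 1]; infer: callable running the engine
        if self.last_frame is not None:
            if (frame - self.last_frame).abs().mean() < self.threshold:
                return self.last_depth  # static scene: skip inference
        depth = infer(frame)            # (1, H, W)
        self.last_frame, self.last_depth = frame, depth
        return depth
```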

Segment Anything 2

  • [GitHub Link](https://github.com/kijai/ComfyUI-segment-anything-2)
  • Input: RGB frame (3, H, W)
  • Output: Segmentation mask (1, H, W)
  • Performance Requirements:
    • Inference time: < 30ms
    • VRAM usage: 3GB
    • Batch size: 1
  • Best Practices:
    • Cache static masks
    • Use mask erosion for stability
    • Implement confidence thresholding (see the sketch below)
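
The erosion and thresholding practices combine naturally into one post-processing step. A sketch in PyTorch, with an illustrative confidence value and erosion radius:

```python
import torch
import torch.nn.functional as F


def stabilize_mask(logits: torch.Tensor, conf: float = 0.7,
                   erode_px: int = 2) -> torch.Tensor:
    """Threshold mask scores, then erode to suppress flickering edges.

    logits: (1, H, W) raw mask scores; returns a binary (1, H, W) mask.
    """
    probs = logits.sigmoid()
    mask = (probs > conf).float()          # confidence thresholding
    k = 2 * erode_px + 1
    # Binary erosion as a min-pool: a pixel survives only if its whole
    # k x k neighborhood is foreground.
    eroded = -F.max_pool2d(-mask.unsqueeze(0), k, stride=1, padding=erode_px)
    return eroded.squeeze(0)
```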

Florence2

  • GitHub Link
  • Input: RGB frame (3, H, W)
  • Output: Feature vector (1, 512)
  • Performance Requirements:
    • Inference time: < 15ms
    • VRAM usage: 1GB
    • Batch size: 1
  • Best Practices:
    • Cache embeddings for known references
    • Use cosine similarity for matching
    • Normalize feature vectors before comparing them (see the sketch below)
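
A minimal sketch of reference matching: L2-normalize both sides so a plain dot product equals cosine similarity. The `best_match` helper and the (N, 512) reference layout are assumptions, not part of the node.

```python
import torch
import torch.nn.functional as F


def best_match(query: torch.Tensor,
               references: torch.Tensor) -> tuple[int, float]:
    """Find the closest cached reference to a (1, 512) feature vector.

    references: (N, 512) matrix of cached embeddings.
    """
    q = F.normalize(query, dim=-1).squeeze(0)  # (512,)
    refs = F.normalize(references, dim=-1)     # (N, 512)
    sims = refs @ q                            # (N,) cosine similarities
    idx = int(sims.argmax())
    return idx, float(sims[idx])
```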

Generation and Control Nodes

LivePortraitKJ

  • GitHub Link
  • Input:
    • Source image (3, H, W)
    • Driving frame (3, H, W)
  • Output: Animated frame (3, H, W)
  • Performance Requirements:
    • Inference time: < 50ms
    • VRAM usage: 4GB
    • Batch size: 1
  • Best Practices:
    • Pre-process source images
    • Implement motion smoothing (see the sketch below)
    • Cache facial landmarks
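
Motion smoothing can be as simple as an exponential moving average over the driving frame's landmarks. A sketch, with an illustrative smoothing factor:

```python
import torch


class LandmarkSmoother:
    """Exponential moving average over facial landmarks to damp jitter."""

    def __init__(self, alpha: float = 0.6):
        self.alpha = alpha  # higher = more responsive, less smooth
        self.state: torch.Tensor | None = None

    def __call__(self, landmarks: torch.Tensor) -> torch.Tensor:
        # landmarks: (N, 2) pixel coordinates for the current driving frame
        if self.state is None:
            self.state = landmarks.clone()
        else:
            self.state = self.alpha * landmarks + (1 - self.alpha) * self.state
        return self.state
```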

ComfyUI Diffusers

  • [GitHub Link](https://github.com/Limitex/ComfyUI-Diffusers)
  • Input:
    • Conditioning tensor
    • Latent tensor
  • Output: Generated frame (3, H, W)
  • Performance Requirements:
    • Inference time: < 50ms
    • VRAM usage: 4GB
    • Maximum steps: 20
  • Best Practices:
    • Use TensorRT optimization
    • Implement denoising strength control (see the sketch below)
    • Cache conditioning tensors
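
Denoising strength control usually follows the img2img convention: strength decides how many of the scheduler's steps actually run, with lower values preserving more of the input frame. A sketch against the 20-step cap above; the helper name is illustrative:

```python
def steps_for_strength(strength: float, max_steps: int = 20) -> tuple[int, int]:
    """Map a denoising strength in [0, 1] to a (start, total) step pair.

    Strength 1.0 re-noises fully and runs all steps; lower strengths skip
    the earliest (noisiest) steps, preserving more of the input frame.
    """
    strength = min(max(strength, 0.0), 1.0)
    run = max(1, round(strength * max_steps))
    return max_steps - run, max_steps


# e.g. strength 0.4 with a 20-step budget -> start at step 12, run 8 steps
start, total = steps_for_strength(0.4)
```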

Supporting Nodes

KSampler

  • Input:
    • Latent tensor
    • Conditioning
  • Output: Sampled latent
  • Performance Requirements:
    • Maximum steps: 20
    • VRAM usage: 2GB
    • Scheduler: euler_ancestral
  • Best Practices:
    • Use adaptive step sizing (see the sketch below)
    • Cache conditioning tensors
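
Adaptive step sizing can be driven by the measured frame time: shed steps when frames run over budget and claw them back when there is headroom. A sketch with illustrative bounds:

```python
class AdaptiveSteps:
    """Lower the step count when frames run over budget, raise it when idle."""

    def __init__(self, budget_ms: float = 50.0, lo: int = 4, hi: int = 20):
        self.budget_ms, self.lo, self.hi = budget_ms, lo, hi
        self.steps = hi

    def record(self, frame_ms: float) -> int:
        if frame_ms > self.budget_ms and self.steps > self.lo:
            self.steps -= 1  # over budget: trade quality for latency
        elif frame_ms < 0.8 * self.budget_ms and self.steps < self.hi:
            self.steps += 1  # headroom: claw quality back slowly
        return self.steps
```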

Prompt Control

  • Input: Text prompts
  • Output: Conditioning tensors
  • Performance Requirements:
    • Processing time: < 5ms
    • VRAM usage: minimal
  • Best Practices:
    • Cache common prompts (see the sketch below)
    • Use consistent style tokens
    • Implement prompt weighting
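
Caching common prompts amounts to memoizing the encoder output keyed by the prompt string. A sketch, where `encode` stands in for whatever text-encoder call produces the conditioning tensor:

```python
import torch

_cond_cache: dict[str, torch.Tensor] = {}


def encode_cached(prompt: str, encode) -> torch.Tensor:
    """Encode a prompt once and reuse the conditioning tensor afterwards."""
    if prompt not in _cond_cache:
        _cond_cache[prompt] = encode(prompt)
    return _cond_cache[prompt]
```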

VAE

  • Input: Latent tensor
  • Output: RGB frame
  • Performance Requirements:
    • Inference time: < 10ms
    • VRAM usage: 1GB
    • Tile size: 512
  • Best Practices:
    • Use tiling for large frames
    • Decode in half precision (see the sketch below)
    • Cache common latents
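
A sketch of half-precision tiled decoding under the requirements above: decode 64-cell latent tiles (512 output pixels) one at a time and stitch the results. The `vae_decode` callable is an assumption standing in for the actual decoder.

```python
import torch


def decode_tiled(vae_decode, latent: torch.Tensor, tile: int = 64) -> torch.Tensor:
    """Decode a latent in tiles so peak VRAM stays bounded.

    vae_decode: callable mapping (1, C, h, w) latents to (1, 3, 8h, 8w) pixels.
    tile: tile edge in latent cells; 64 cells decode to 512 output pixels.
    A naive grid like this can show seams; production tiling overlaps tiles
    and blends the borders.
    """
    _, _, h, w = latent.shape
    rows = []
    # Half precision roughly halves decoder VRAM; assumes a CUDA device.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        for y in range(0, h, tile):
            cols = [vae_decode(latent[:, :, y:y + tile, x:x + tile])
                    for x in range(0, w, tile)]
            rows.append(torch.cat(cols, dim=-1))  # stitch a row of tiles
    return torch.cat(rows, dim=-2)                # stack rows vertically
```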

IPAdapter

  • Input:
    • Reference image
    • Target tensor
  • Output: Conditioned tensor
  • Performance Requirements:
    • Inference time: < 20ms
    • VRAM usage: 2GB
    • Reference resolution: ≤ 512x512
  • Best Practices:
    • Cache reference embeddings
    • Use consistent weights
    • Implement cross-attention

Cache Nodes

  • Input: Any tensor
  • Output: Cached tensor
  • Performance Requirements:
    • Access time: < 1ms
    • Maximum size: 2GB
    • Cache type: GPU
  • Best Practices:
    • Implement LRU eviction (see the sketch below)
    • Monitor cache pressure
    • Clear on scene changes
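
An LRU cache with a byte-size ceiling is a few lines over an OrderedDict. A sketch matching the 2GB cap above; the class and method names are illustrative:

```python
from collections import OrderedDict

import torch


class TensorLRU:
    """GPU tensor cache with LRU eviction and a byte-size ceiling."""

    def __init__(self, max_bytes: int = 2 * 1024**3):  # 2GB cap
        self.max_bytes = max_bytes
        self.used = 0
        self.items: OrderedDict[str, torch.Tensor] = OrderedDict()

    def put(self, key: str, value: torch.Tensor) -> None:
        if key in self.items:  # replacing: release the old entry first
            old = self.items.pop(key)
            self.used -= old.element_size() * old.nelement()
        size = value.element_size() * value.nelement()
        while self.items and self.used + size > self.max_bytes:
            _, evicted = self.items.popitem(last=False)  # drop least recent
            self.used -= evicted.element_size() * evicted.nelement()
        self.items[key] = value
        self.used += size

    def get(self, key: str) -> torch.Tensor | None:
        if key in self.items:
            self.items.move_to_end(key)  # mark as recently used
            return self.items[key]
        return None

    def clear(self) -> None:  # call on scene changes
        self.items.clear()
        self.used = 0
```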

ControlNet

  • Input:
    • Control signal
    • Target tensor
  • Output: Controlled tensor
  • Performance Requirements:
    • Inference time: < 30ms
    • VRAM usage: 2GB
    • Resolution: ≤ 512
  • Best Practices:
    • Use adaptive conditioning
    • Implement strength scheduling (see the sketch below)
    • Cache control signals
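
Strength scheduling commonly ramps the control strength down across the denoising schedule: strong guidance early fixes composition, and relaxing it later lets the model refine texture. A sketch with illustrative endpoints:

```python
def control_strength(step: int, total_steps: int,
                     start: float = 1.0, end: float = 0.2) -> float:
    """Linearly decay ControlNet strength across the denoising steps."""
    t = step / max(total_steps - 1, 1)
    return start + (end - start) * t
```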

Default Nodes

All default nodes that ship with ComfyUI are available. The list below is subject to change.

  • AlignYourStepsScheduler
  • BasicGuider
  • BasicScheduler
  • BetaSamplingScheduler
  • Canny
  • CFGGuider
  • CheckpointLoader
  • CheckpointLoaderSimple
  • CheckpointSave
  • CLIPAdd
  • CLIPAttentionMultiply
  • CLIPLoader
  • CLIPMergeSimple
  • CLIPSave
  • CLIPSetLastLayer
  • CLIPSubtract
  • CLIPTextEncode
  • CLIPTextEncodeControlnet
  • CLIPTextEncodeFlux
  • CLIPTextEncodeHunyuanDiT
  • CLIPTextEncodeSD3
  • CLIPTextEncodeSDXL
  • CLIPTextEncodeSDXLRefiner
  • CLIPVisionEncode
  • CLIPVisionLoader
  • ConditioningAverage
  • ConditioningCombine
  • ConditioningConcat
  • ConditioningSetArea
  • ConditioningSetAreaPercentage
  • ConditioningSetAreaStrength
  • ConditioningSetMask
  • ConditioningSetTimestepRange
  • ConditioningZeroOut
  • ControlNetApply
  • ControlNetApplyAdvanced
  • ControlNetApplySD3
  • ControlNetInpaintingAliMamaApply
  • ControlNetLoader
  • CropMask
  • DifferentialDiffusion
  • DiffControlNetLoader
  • DiffusersLoader
  • DisableNoise
  • DualCFGGuider
  • DualCLIPLoader
  • EmptyImage
  • EmptyLatentAudio
  • EmptyLatentImage
  • EmptyMochiLatentVideo
  • EmptySD3LatentImage
  • ExponentialScheduler
  • FeatherMask
  • FlipSigmas
  • FluxGuidance
  • FreeU
  • FreeU_V2
  • GITSScheduler
  • GLIGENLoader
  • GLIGENTextBoxApply
  • GrowMask
  • HypernetworkLoader
  • HyperTile
  • ImageBatch
  • ImageBlend
  • ImageBlur
  • ImageCompositeMasked
  • ImageColorToMask
  • ImageCrop
  • ImageFromBatch
  • ImageInvert
  • ImageOnlyCheckpointLoader
  • ImageOnlyCheckpointSave
  • ImagePadForOutpaint
  • ImageQuantize
  • ImageScale
  • ImageScaleBy
  • ImageScaleToTotalPixels
  • ImageSharpen
  • ImageToMask
  • ImageUpscaleWithModel
  • InpaintModelConditioning
  • InstructPixToPixConditioning
  • InvertMask
  • JoinImageWithAlpha
  • KarrasScheduler
  • KSampler
  • KSamplerAdvanced
  • KSamplerSelect
  • LaplaceScheduler
  • LatentAdd
  • LatentApplyOperation
  • LatentApplyOperationCFG
  • LatentBatch
  • LatentBatchSeedBehavior
  • LatentBlend
  • LatentComposite
  • LatentCompositeMasked
  • LatentCrop
  • LatentFlip
  • LatentFromBatch
  • LatentInterpolate
  • LatentMultiply
  • LatentOperationSharpen
  • LatentOperationTonemapReinhard
  • LatentRotate
  • LatentSubtract
  • LatentUpscale
  • LatentUpscaleBy
  • LoadAudio
  • LoadImage
  • LoadImageMask
  • LoadLatent
  • LoraLoader
  • LoraLoaderModelOnly
  • LoraSave
  • MaskComposite
  • MaskToImage
  • ModelAdd
  • ModelMergeBlocks
  • ModelMergeFlux1
  • ModelMergeSD1
  • ModelMergeSD2
  • ModelMergeSD35_Large
  • ModelMergeSD3_2B
  • ModelMergeSDXL
  • ModelMergeSimple
  • ModelSamplingAuraFlow
  • ModelSamplingContinuousEDM
  • ModelSamplingContinuousV
  • ModelSamplingDiscrete
  • ModelSamplingFlux
  • ModelSamplingSD3
  • ModelSamplingStableCascade
  • ModelSave
  • ModelSubtract
  • Morphology
  • PatchModelAddDownscale
  • PerpNeg
  • PerpNegGuider
  • PerturbedAttentionGuidance
  • PhotoMakerEncode
  • PhotoMakerLoader
  • PolyexponentialScheduler
  • PorterDuffImageComposite
  • PreviewAudio
  • PreviewImage
  • RandomNoise
  • RebatchImages
  • RebatchLatents
  • RepeatImageBatch
  • RepeatLatentBatch
  • RescaleCFG
  • SamplerCustom
  • SamplerCustomAdvanced
  • SamplerDPMAdaptative
  • SamplerDPMPP_2M_SDE
  • SamplerDPMPP_2S_Ancestral
  • SamplerDPMPP_3M_SDE
  • SamplerDPMPP_SDE
  • SamplerEulerAncestral
  • SamplerEulerAncestralCFGPP
  • SamplerEulerCFGpp
  • SamplerLCMUpscale
  • SamplerLMS
  • SaveAnimatedPNG
  • SaveAnimatedWEBP
  • SaveAudio
  • SaveImage
  • SaveLatent
  • SD_4XUpscale_Conditioning
  • SDTurboScheduler
  • SelfAttentionGuidance
  • SetLatentNoiseMask
  • SetUnionControlNetType
  • SkipLayerGuidanceSD3
  • SolidMask
  • SplitImageWithAlpha
  • SplitSigmas
  • SplitSigmasDenoise
  • StableCascade_EmptyLatentImage
  • StableCascade_StageB_Conditioning
  • StableCascade_StageC_VAEEncode
  • StableCascade_SuperResolutionControlnet
  • StableZero123_Conditioning
  • StableZero123_Conditioning_Batched
  • StyleModelApply
  • StyleModelLoader
  • SV3D_Conditioning
  • SVD_img2vid_Conditioning
  • ThresholdMask
  • TomePatchModel
  • TorchCompileModel
  • TripleCLIPLoader
  • unCLIPCheckpointLoader
  • unCLIPConditioning
  • UNETLoader
  • UNetCrossAttentionMultiply
  • UNetSelfAttentionMultiply
  • UNetTemporalAttentionMultiply
  • UpscaleModelLoader
  • VAEDecode
  • VAEDecodeAudio
  • VAEDecodeTiled
  • VAEEncode
  • VAEEncodeAudio
  • VAEEncodeForInpaint
  • VAEEncodeTiled
  • VAESave
  • VideoLinearCFGGuidance
  • VideoTriangleCFGGuidance
  • VPScheduler
  • WebcamCapture