AI Pose Detection — Find 17 Body Keypoints in a Photo

Detect 17 body-pose keypoints in any photo with the MoveNet Lightning model. Exports the original image with a skeleton overlay plus a JSON file with every keypoint and its confidence score.

About AI Pose Detection

Detecting body-pose keypoints in an image used to mean either renting GPU compute on a cloud service that charges per image, or running a heavyweight Python script with a 100 MB model file. The AI Pose Detection tool gives you the same 17-keypoint output from a roughly 4 MB MoveNet Lightning ONNX model running on ONNX Runtime Web inside your browser. Drop a photo in, get a skeleton-overlay PNG plus a JSON file with every keypoint's pixel coordinates and confidence score.

MoveNet Lightning is Google's distilled fast-pose-estimation network: accurate enough for sports and fitness analytics, light enough to run client-side at sub-second latency on a phone. The 17 keypoints it returns (nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles) are the standard COCO-style keypoint set that many downstream consumers already expect (sports tagging tools, animation rigs, fitness form-checkers). Each keypoint comes with a confidence score between 0 and 1, so downstream code can filter low-confidence detections before consuming the data. The skeleton overlay drawn on the output image follows the standard MoveNet edge connections, with line thickness scaled to the source image size so the result looks proportionate at any resolution.
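As a rough illustration of the skeleton overlay, the sketch below lists the keypoint order and edge pairs commonly published for MoveNet and keeps only the edges whose two endpoints clear a confidence threshold. The type names, the `visibleSegments` helper and the `lineWidthFor` thickness rule are assumptions made for this example; the tool's own scaling formula is not documented here.

```typescript
// Keypoint order commonly documented for MoveNet (COCO-style, 17 entries).
const KEYPOINT_NAMES = [
  "nose", "left_eye", "right_eye", "left_ear", "right_ear",
  "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
  "left_wrist", "right_wrist", "left_hip", "right_hip",
  "left_knee", "right_knee", "left_ankle", "right_ankle",
] as const;

// Edges as index pairs into KEYPOINT_NAMES, following the widely used MoveNet skeleton.
const EDGES: ReadonlyArray<readonly [number, number]> = [
  [0, 1], [0, 2], [1, 3], [2, 4],
  [0, 5], [0, 6], [5, 7], [7, 9], [6, 8], [8, 10],
  [5, 6], [5, 11], [6, 12], [11, 12],
  [11, 13], [13, 15], [12, 14], [14, 16],
];

interface OverlayPoint { x: number; y: number; score: number; }
interface BoneSegment { label: string; from: OverlayPoint; to: OverlayPoint; }

// Return the segments whose two endpoints both meet the confidence threshold.
function visibleSegments(points: OverlayPoint[], minScore: number): BoneSegment[] {
  const segments: BoneSegment[] = [];
  for (const [a, b] of EDGES) {
    const from = points[a];
    const to = points[b];
    if (from && to && from.score >= minScore && to.score >= minScore) {
      segments.push({ label: `${KEYPOINT_NAMES[a]}-${KEYPOINT_NAMES[b]}`, from, to });
    }
  }
  return segments;
}

// Hypothetical thickness rule: grow with the longer image side, never below 2 px.
function lineWidthFor(width: number, height: number): number {
  return Math.max(2, Math.round(Math.max(width, height) / 200));
}
```

Each returned segment can then be stroked on a canvas with the computed line width; an edge is dropped as soon as either endpoint falls below the threshold.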

The pre-processing step is a letterbox resize: the source image is centred onto a 192×192 canvas with black padding so the model input is square (MoveNet's required shape) without changing the photo's aspect ratio. The keypoints come back in normalised model-input coordinates, and the tool remaps them to the source image's pixel coordinates using the recorded scale and offset. This means a person in the bottom-right corner of a wide landscape photo gets the right pixel position in the output; a naive stretch to 192×192 would distort body proportions and risk misplacing keypoints.
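The letterbox arithmetic described above can be sketched as follows. This is a minimal illustration, assuming the keypoints are normalised to the full padded 192×192 square and that the scale is not rounded to whole pixels; `computeLetterbox` and `toSourcePixels` are hypothetical names, not the tool's actual functions.

```typescript
const MODEL_SIZE = 192; // model input side length, from the text above

interface Letterbox { scale: number; offsetX: number; offsetY: number; }

// Scale the longer side to fit the square and centre the shorter side with padding.
function computeLetterbox(srcWidth: number, srcHeight: number, size: number = MODEL_SIZE): Letterbox {
  const scale = size / Math.max(srcWidth, srcHeight);
  return {
    scale,
    offsetX: (size - srcWidth * scale) / 2,
    offsetY: (size - srcHeight * scale) / 2,
  };
}

// Map normalised model-input coordinates (0..1 across the padded square) to source pixels.
function toSourcePixels(
  nx: number,
  ny: number,
  box: Letterbox,
  size: number = MODEL_SIZE,
): { x: number; y: number } {
  return {
    x: (nx * size - box.offsetX) / box.scale,
    y: (ny * size - box.offsetY) / box.scale,
  };
}

// A 1920×1080 landscape photo gets 42 px of padding above and below at model scale.
const landscapeBox = computeLetterbox(1920, 1080);
const landscapeCentre = toSourcePixels(0.5, 0.5, landscapeBox); // approximately (960, 540)
```

Subtracting the offset before dividing by the scale is what keeps a subject near the edge of a wide photo at the correct pixel position.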

The single-pose variant ships first because it covers the most common use case: a sports / fitness / dance photo with one main subject. The Lightning variant prioritises speed at a small accuracy cost compared with the heavier Thunder variant — clearly visible keypoints land cleanly, but partially occluded limbs (hands behind the back, ankles cropped out of frame) sometimes return low-confidence positions. The confidence threshold is configurable; raising it from the default 0.3 to 0.5 hides the occluded keypoints from the overlay if you want a cleaner result. The model is fetched once on first use and cached in your browser, so subsequent images run in milliseconds without another network hit.
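The effect of raising the threshold can be illustrated with a small filter. The `Keypoint` shape and the sample values below are invented for the example; only the 0.3 and 0.5 thresholds come from the text above.

```typescript
// Hypothetical keypoint shape; field names are assumed for illustration.
interface Keypoint {
  name: string;
  x: number;     // pixel x in the source image
  y: number;     // pixel y in the source image
  score: number; // confidence between 0 and 1
}

// Keep only keypoints whose confidence meets the threshold.
function filterByConfidence(keypoints: Keypoint[], minScore: number): Keypoint[] {
  return keypoints.filter((kp) => kp.score >= minScore);
}

// Made-up sample: one clear keypoint, one borderline, one occluded.
const sampleKeypoints: Keypoint[] = [
  { name: "nose", x: 320, y: 110, score: 0.92 },
  { name: "left_wrist", x: 410, y: 300, score: 0.41 },
  { name: "right_ankle", x: 298, y: 640, score: 0.12 },
];

const shownAtDefault = filterByConfidence(sampleKeypoints, 0.3); // nose and left_wrist
const shownWhenStrict = filterByConfidence(sampleKeypoints, 0.5); // nose only
```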

How it works

  1. Drop a JPG, PNG or WebP photo onto the upload area. Files up to 25 MB are accepted.
  2. On first use, the MoveNet Lightning ONNX model (~4 MB) downloads from the Favtoo CDN and is cached in your browser. ONNX Runtime Web WASM bytes are shared with the AI Image Upscaler.
  3. The source image is letterbox-resized onto a 192×192 model-input canvas: square aspect with black padding so the photo is not squashed.
  4. MoveNet runs inference and returns 17 keypoints with confidence scores in normalised coordinates.
  5. The tool remaps the keypoints back to source pixel coordinates and draws a skeleton overlay using standard MoveNet edge connections.
  6. Download a ZIP containing the skeleton-overlay PNG and a JSON sidecar with the raw keypoint data so downstream tools can consume them.
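Step 4 can be sketched as a small decoder for the raw model output. This assumes the output arrives as a flat array of 17 × 3 values laid out as [y, x, score] per keypoint, which is how MoveNet's output is commonly documented; the ONNX export used by the tool may differ, and `decodeKeypoints` is a hypothetical helper.

```typescript
// Keypoint order commonly documented for MoveNet.
const MOVENET_KEYPOINT_ORDER = [
  "nose", "left_eye", "right_eye", "left_ear", "right_ear",
  "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
  "left_wrist", "right_wrist", "left_hip", "right_hip",
  "left_knee", "right_knee", "left_ankle", "right_ankle",
] as const;

interface NormalisedKeypoint {
  name: string;
  x: number;     // 0..1 across the model input
  y: number;     // 0..1 down the model input
  score: number; // confidence between 0 and 1
}

// Assumed layout: [y0, x0, score0, y1, x1, score1, ...] for 17 keypoints.
function decodeKeypoints(output: ArrayLike<number>): NormalisedKeypoint[] {
  const expected = MOVENET_KEYPOINT_ORDER.length * 3;
  if (output.length !== expected) {
    throw new Error(`expected ${expected} values, got ${output.length}`);
  }
  return MOVENET_KEYPOINT_ORDER.map((name, i) => ({
    name,
    y: Number(output[i * 3]),
    x: Number(output[i * 3 + 1]),
    score: Number(output[i * 3 + 2]),
  }));
}
```

The decoded values are still in normalised model-input coordinates, so they need the remapping from step 5 before they can be drawn on the source image.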

Common use cases

  • Analyse a yoga pose photo to check the alignment of shoulders and hips relative to the centre line
  • Generate body-keypoint overlays on dance photography for an instructional blog post
  • Extract keypoints from action photos to build a sports-form-correction tool
  • Drive a 2D character rig in a custom animation pipeline by feeding keypoint JSONs through a stylised renderer
  • Validate a fitness app’s pose-recognition behaviour with offline ground-truth from a known-good model
  • Highlight body landmarks in a physiotherapy report photo for post-op rehab tracking
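For the alignment-style use cases above, one simple measure is how far the shoulder line or hip line tilts from horizontal. The sketch below is one possible approach, assuming keypoints in source pixel coordinates and names such as `left_shoulder`; the names, the `pairTiltDegrees` helper and the sample values are illustrative assumptions.

```typescript
// Hypothetical keypoint shape; field names are assumed for illustration.
interface Keypoint { name: string; x: number; y: number; score: number; }

// Tilt of the line joining two keypoints, in degrees from horizontal (-90..90).
// Positive means the end further right in the image sits lower (image y grows downward).
// Returns null when either keypoint is missing or below the confidence threshold.
function pairTiltDegrees(
  keypoints: Keypoint[],
  leftName: string,
  rightName: string,
  minScore: number = 0.3,
): number | null {
  const left = keypoints.find((kp) => kp.name === leftName);
  const right = keypoints.find((kp) => kp.name === rightName);
  if (!left || !right || left.score < minScore || right.score < minScore) {
    return null;
  }
  // Order the endpoints by image x so the result does not depend on which way the subject faces.
  const a = left.x <= right.x ? left : right;
  const b = a === left ? right : left;
  return (Math.atan2(b.y - a.y, b.x - a.x) * 180) / Math.PI;
}

// Made-up pose: level shoulders, hips slightly tilted.
const yogaPose: Keypoint[] = [
  { name: "left_shoulder", x: 400, y: 200, score: 0.9 },
  { name: "right_shoulder", x: 200, y: 200, score: 0.9 },
  { name: "left_hip", x: 380, y: 400, score: 0.8 },
  { name: "right_hip", x: 220, y: 420, score: 0.8 },
];

const shoulderTilt = pairTiltDegrees(yogaPose, "left_shoulder", "right_shoulder"); // 0
const hipTilt = pairTiltDegrees(yogaPose, "left_hip", "right_hip"); // roughly -7.1
```

Comparing the two angles gives a quick indication of whether the shoulders and hips are level with each other; a null result signals that a keypoint was too uncertain to use.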

FAQ

Which keypoints are detected?

The MoveNet 17-keypoint set: nose, left/right eye, left/right ear, left/right shoulder, left/right elbow, left/right wrist, left/right hip, left/right knee, left/right ankle. Each keypoint comes with an x/y position in image pixels plus a confidence score 0..1.

What outputs do I get?

Two files: the original image with a skeleton drawn over the detected keypoints (PNG), and a separate JSON file with the raw keypoint coordinates and scores so downstream tools (sports analytics, fitness apps, animation rigs) can consume them.

Does it find every body part?

It finds the 17 keypoints in the COCO standard set: nose, left/right eye, left/right ear, left/right shoulder, left/right elbow, left/right wrist, left/right hip, left/right knee, left/right ankle. Hands (individual fingers), feet (individual toes), and facial landmarks beyond eyes/ears/nose are out of scope for this model — those need separate hand-tracking and face-landmark networks.

How accurate is it?

Strong on clear single-person photos at any angle. The Lightning variant is optimised for speed at a small accuracy cost compared with the heavier Thunder variant — every clearly visible keypoint lands cleanly, but partially occluded limbs (hands behind back, ankles cropped out of frame) sometimes return low-confidence positions. Raise the confidence threshold to 0.5 to hide low-confidence detections from the overlay.

Does it handle multiple people?

The single-pose variant ships first — it returns one set of keypoints for the most prominent subject. A multi-pose variant covering up to six people will follow once the larger model is uploaded to the asset CDN. For now, crop the source to one person at a time with Crop Image and run each cropped subject separately.

Will my photo be uploaded?

No. MoveNet runs entirely on ONNX Runtime Web inside your browser. The model file (~4 MB) downloads once from our CDN and is cached locally; everything else stays in your tab.

What if no person is in the photo?

MoveNet always returns 17 keypoints, so a photo with no person in it still produces output, just with low confidence scores. The default threshold of 0.3 filters most spurious detections from the skeleton overlay. The JSON output retains every keypoint with its score so downstream code can decide what to do with low-confidence data.

How fast is it on a phone?

Sub-second on a recent iPhone or flagship Android after the initial model load. The Lightning variant was designed for real-time inference on mobile devices, so running the model on a single still photo typically takes less time than decoding the image and drawing the result.

Can I use the keypoint JSON in my own tool?

Yes. The JSON has a simple structure: an object with source dimensions, model input size, and a keypoints array where each entry has name, x, y, and score. It maps readily onto downstream pipelines that consume COCO-style keypoint data.
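A consumer might validate the sidecar before using it. The sketch below assumes hypothetical key names (`sourceWidth`, `sourceHeight`, `modelInputSize`, `keypoints`); the description above names the contents but not the exact keys, so check a real export and adjust.

```typescript
// Hypothetical key names; the real export may spell them differently.
interface PoseKeypoint { name: string; x: number; y: number; score: number; }
interface PoseResult {
  sourceWidth: number;
  sourceHeight: number;
  modelInputSize: number;
  keypoints: PoseKeypoint[];
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function isPoseKeypoint(value: unknown): value is PoseKeypoint {
  return (
    isRecord(value) &&
    typeof value["name"] === "string" &&
    typeof value["x"] === "number" &&
    typeof value["y"] === "number" &&
    typeof value["score"] === "number"
  );
}

// Parse the JSON text and throw if it does not match the assumed shape.
function parsePoseResult(json: string): PoseResult {
  const data: unknown = JSON.parse(json);
  if (!isRecord(data)) throw new Error("expected a JSON object");
  const sourceWidth = data["sourceWidth"];
  const sourceHeight = data["sourceHeight"];
  const modelInputSize = data["modelInputSize"];
  const keypoints = data["keypoints"];
  if (
    typeof sourceWidth !== "number" ||
    typeof sourceHeight !== "number" ||
    typeof modelInputSize !== "number"
  ) {
    throw new Error("missing or non-numeric dimension fields");
  }
  if (!Array.isArray(keypoints) || !keypoints.every(isPoseKeypoint)) {
    throw new Error("keypoints must be an array of {name, x, y, score}");
  }
  return { sourceWidth, sourceHeight, modelInputSize, keypoints };
}
```

Validating up front means a renamed or missing field fails loudly at load time rather than surfacing later as undefined coordinates.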
