Capstone

Robotic-Arm Visual Grasping System

A six-axis robotic-arm grasping system that combines YOLOv7 and OpenCV to obtain each object's class, position, and angle simultaneously.

Year: 2023

Type: Undergraduate capstone

Advisor: Dr. Chun-Hsu Ko

Award: Department project showcase, Distinction

Overview

Standard YOLO classifies and localizes objects but cannot infer their angle; OpenCV recovers the angle from contours but cannot classify. This project combines the two with depth imaging and six-axis arm control to achieve real-time dynamic tracking and automated grasping. YOLOv7 detects each object; OpenCV then crops the depth image to the bounding box and binarizes it to obtain the object's precise center point and rotation angle (θ). Finally, inverse kinematics computes the joint angles that guide the arm through the grasp.
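The crop, binarize, and measure step can be illustrated without OpenCV. The sketch below is not the project's code: it substitutes image moments for the findContours plus minimum-area-rectangle approach the project uses, and the function name center_and_angle, the threshold, and the toy depth values are all assumptions. It also assumes a top-down camera, so object pixels are closer (smaller depth) than the table.

```python
import math

def center_and_angle(depth, box, max_depth):
    """Return (cx, cy, theta_deg) of the object inside a bounding box.

    depth     -- 2D list of depth values (rows of pixels)
    box       -- (x0, y0, x1, y1) pixel box, x1/y1 exclusive (e.g. from a detector)
    max_depth -- pixels closer than this count as object (binarization threshold)
    """
    x0, y0, x1, y1 = box
    # Crop and binarize in one pass: keep the coordinates of object pixels.
    pts = [(x, y)
           for y in range(y0, y1)
           for x in range(x0, x1)
           if depth[y][x] < max_depth]
    if not pts:
        return None
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    # Second-order central moments describe how the blob is spread out.
    mu20 = sum((x - cx) ** 2 for x, _ in pts) / n
    mu02 = sum((y - cy) ** 2 for _, y in pts) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in pts) / n
    # Principal-axis angle relative to the image x-axis (image y points down).
    theta = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    return cx, cy, math.degrees(theta)

# Toy 8x8 depth map (mm): table at 500, a diagonal bar of object pixels at 450.
TABLE, OBJ = 500, 450
depth = [[TABLE] * 8 for _ in range(8)]
for i in range(1, 7):
    depth[i][i] = OBJ

result = center_and_angle(depth, (0, 0, 8, 8), max_depth=480)
```

For the diagonal bar, the center lands at the middle of the bar and the angle is 45 degrees from the image x-axis. A minimum-area rectangle reports the orientation of the tightest enclosing box rather than the principal axis, so the two methods can differ on irregular shapes.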

Background

As demand for automation grows, industrial-robot installations keep setting records: according to the International Federation of Robotics (IFR), 531,060 new units were installed worldwide in 2022, and about 3.5 million cumulatively over 2012–2022. With population aging also causing labor shortages, deploying robotic arms in smart factories has become a trend, with applications spanning sorting, production, and medical rehabilitation. For arms to take over tedious and delicate work, image recognition is key, but conventional RGB vision is easily disturbed by lighting intensity and refraction. This project therefore uses depth imaging together with deep learning, which allows stable recognition even in dim environments, and adds an intuitive human-machine interface to lower the operating barrier.

Training images: 549

Object classes: 6

mAP@.5: ~0.9

Arm: 6-axis AR3

Pipeline

Capture: An Intel RealSense D435, driven through pyrealsense2, aligns the RGB and depth streams at 640×480 and 90 FPS (the frame size is kept a multiple of YOLO's stride of 32).
Detection: YOLOv7 was trained on 549 images (358 training, 191 validation; 6 classes of everyday objects) for 1000 epochs, about 5 hours, reaching mAP@.5 ≈ 0.9. It outputs each object's class and bounding box.
Pose estimation: The depth image is cropped to each bounding box and binarized. OpenCV's findContours followed by a minimum-area bounding rectangle yields the rotation angle θ used to orient the gripper.
Coordinate transform: Camera intrinsics and the measured depth d are used to compute the horizontal and vertical fields of view (HFOV, VFOV) dynamically, mapping pixel coordinates (nx, ny) to real-world coordinates (rx, ry).
Motion control: Proportional (P) control is applied to x, y, z, and θ; the grasp is triggered once the vertical distance falls below 15 cm.
Inverse kinematics: A numerical solver computes the six AR3 joint angles so that the end effector reaches the object's position accurately.
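The motion-control step in the list above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the project's controller: the gains in KP, the Error type, and the idealized arm that moves exactly by the commanded amount are all hypothetical. Only the four controlled axes and the 15 cm grasp threshold come from the text.

```python
from dataclasses import dataclass

GRASP_DISTANCE_CM = 15.0  # threshold stated in the text

@dataclass
class Error:
    x: float      # cm, horizontal offset to the object
    y: float      # cm, horizontal offset to the object
    z: float      # cm, vertical distance to the object
    theta: float  # degrees, rotation between gripper and object

# Hypothetical gains; the project's actual values are not given.
KP = Error(x=0.5, y=0.5, z=0.3, theta=0.5)

def p_command(err: Error) -> Error:
    """Proportional law: the command on each axis is gain * error."""
    return Error(KP.x * err.x, KP.y * err.y, KP.z * err.z, KP.theta * err.theta)

def track_until_grasp(err: Error, max_cycles: int = 100):
    """Run control cycles until the vertical distance allows a grasp.

    Assumes an idealized arm that moves exactly by the commanded amount,
    so each cycle reduces the error by the command.
    Returns (cycles_used, remaining_error), or (None, err) on timeout.
    """
    for cycle in range(max_cycles):
        if err.z < GRASP_DISTANCE_CM:
            return cycle, err  # close enough: the grasp would be triggered here
        cmd = p_command(err)
        err = Error(err.x - cmd.x, err.y - cmd.y,
                    err.z - cmd.z, err.theta - cmd.theta)
    return None, err

cycles, final = track_until_grasp(Error(x=10.0, y=-5.0, z=40.0, theta=30.0))
```

In a real system the error would be re-measured from the camera on every cycle instead of being simulated, which is what lets the same loop follow a moving object.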

Hardware & tools

Arm: AR3 six-axis robotic arm (stepper-driven, G-code control)
Camera: Intel RealSense D435 (RGB + depth)
Platform: Python · PyTorch (YOLOv7) · OpenCV · pyrealsense2 · pyserial · multi-threaded Tkinter GUI (Anaconda / VSCode environment)
Gripper control: Custom control box (drivers, encoders, power supply) + Swaytail servo gripper

Figure: AR3 six-axis arm and gripper

Figure: Inside the custom control box (stepper drivers, relays, power supply)

Results

Tested repeatedly on objects of different types, sizes, and shapes, the system tracks and grasps them in real time efficiently, accurately, and stably. It also handles sequential two-object grasping correctly: it picks one object, returns to its home position, then tracks and grasps the next. The table below gives the average recognition time per image:

Objects in frame	YOLO + OpenCV	OpenCV only
1	0.093 s	0.022 s
2	0.087 s	0.023 s
3	0.089 s	0.026 s

Pure OpenCV runs on a C backend and only extracts contours, so it is faster, but it merges adjacent or overlapping objects into one. Adding YOLO raises the per-image time from about 0.02 s to about 0.09 s, yet it correctly separates individual objects and identifies their classes, balancing speed and reliability. The system could be applied to scenarios such as pharmacy drug pick-and-place (reducing dispensing errors) and factory defect inspection.
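The merging behavior is easy to reproduce on a toy mask. The sketch below is an illustration only: count_blobs is a plain flood fill standing in for contour extraction, and the two bounding boxes are hypothetical detector output, not results from the project.

```python
def count_blobs(mask):
    """Count 4-connected foreground regions in a binary mask (list of lists)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = 0
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            blobs += 1  # found an unvisited foreground pixel: a new region
            seen[sy][sx] = True
            stack = [(sx, sy)]
            while stack:  # flood-fill the whole region
                x, y = stack.pop()
                for px, py in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= px < w and 0 <= py < h
                            and mask[py][px] and not seen[py][px]):
                        seen[py][px] = True
                        stack.append((px, py))
    return blobs

def crop(mask, box):
    """Cut out (x0, y0, x1, y1), with x1/y1 exclusive."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in mask[y0:y1]]

# Two 3x3 objects touching side by side inside a 5x8 binary mask.
mask = [[0] * 8 for _ in range(5)]
for y in range(1, 4):
    for x in range(1, 7):
        mask[y][x] = 1

boxes = [(1, 1, 4, 4), (4, 1, 7, 4)]  # hypothetical detector boxes, one per object

contour_only = count_blobs(mask)                               # whole image
with_boxes = sum(count_blobs(crop(mask, b)) for b in boxes)    # per detection
```

On the whole mask the two touching objects form a single region, while processing each detection's crop separately yields one region per object.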

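Technical Details

Dynamic Coordinate Mapping

Instead of a fixed pixel-to-world ratio, the system computes the mapping dynamically from the camera intrinsics and the live depth value d:

$$\mathrm{HFOV} = 2\tan^{-1}\!\left(\frac{w}{2f}\right), \qquad \mathrm{VFOV} = 2\tan^{-1}\!\left(\frac{h}{2f}\right)$$

$$r_x = \frac{d \cdot \tan(\mathrm{HFOV}/2)}{n_w / 2}$$

$$x_o = \Delta u \cdot r_x, \qquad y_o = \Delta v \cdot r_y$$

Here Δu and Δv are the pixel offsets of the object center from the image center. The source gives only r_x; r_y is presumably defined the same way, with VFOV and the image height in place of HFOV and n_w.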
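The mapping can be written out as a short function. This is a sketch under assumptions: the function name pixel_to_world is hypothetical, n_h (image height in pixels) and the r_y formula are inferred by symmetry with r_x, w, h, and f are taken to be in pixel units, and the focal length of 600 is an illustrative value, not a calibrated D435 intrinsic.

```python
import math

def pixel_to_world(u, v, d, w, h, f, nw, nh):
    """Map pixel (u, v) at depth d to an offset from the optical axis.

    w, h   -- sensor/image extent used in the FOV formulas (same units as f)
    f      -- focal length
    nw, nh -- image width and height in pixels
    The result is in the same units as d.
    """
    hfov = 2 * math.atan(w / (2 * f))
    vfov = 2 * math.atan(h / (2 * f))
    # Real-world size of one pixel at depth d, per axis.
    rx = d * math.tan(hfov / 2) / (nw / 2)
    ry = d * math.tan(vfov / 2) / (nh / 2)  # assumed analogous to rx
    # Pixel offsets of the object center from the image center.
    du = u - nw / 2
    dv = v - nh / 2
    return du * rx, dv * ry

# Object center 100 px right of the image center, seen at a depth of 300 mm.
xo, yo = pixel_to_world(u=420, v=240, d=300.0,
                        w=640, h=480, f=600.0, nw=640, nh=480)
```

With these numbers one pixel covers 0.5 mm at that depth, so the object sits about 50 mm to the right of the optical axis and on the axis vertically. Because the scale factor grows linearly with d, the same pixel offset maps to a larger physical offset when the camera is farther from the object, which is why a fixed ratio would not work as the arm descends.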

Live Demo

Future Work

Expand the object dataset to cover more categories and improve YOLO's recognition and discrimination of complex shapes and small objects, and adopt more advanced control strategies, such as fuzzy control and neural-network control, to further improve grasp stability and motion smoothness.

Tech

YOLOv7
OpenCV
RealSense D435
AR3 6-Axis
Python
Inverse Kinematics
Real-Time Tracking

View the full capstone paper (PDF)

View the project showcase Distinction award certificate (PDF)