Saudi Cultural Missions Theses & Dissertations

Permanent URI for this community: https://drepo.sdl.edu.sa/handle/20.500.14154/10

Search Results

Now showing 1 - 3 of 3
  • Efficient Processing of Convolutional Neural Networks on the Edge: A Hybrid Approach Using Hardware Acceleration and Dual-Teacher Compression
    (University of Central Florida, 2024-07-05) Alhussain, Azzam; Lin, Mingjie
    This dissertation addresses the challenge of accelerating Convolutional Neural Networks (CNNs) for edge computing in computer vision applications by developing specialized hardware solutions that maintain high accuracy and perform real-time inference. Driven by open-source hardware design frameworks such as FINN and HLS4ML, this research focuses on hardware acceleration, model compression, and efficient implementation of CNN algorithms on AMD SoC-FPGAs using High-Level Synthesis (HLS) to optimize resource utilization and improve the throughput/watt of FPGA-based AI accelerators compared to traditional fixed-logic chips, such as CPUs, GPUs, and other edge accelerators. The dissertation introduces a novel CNN compression technique, "Two-Teachers Net," which utilizes PyTorch FX-graph mode to train an 8-bit quantized student model using knowledge distillation from two teacher models, improving the accuracy of the compressed model by 1%-2% compared to existing solutions for edge platforms. This method can be applied to any CNN model and dataset for image classification and seamlessly integrated into existing AI hardware and software optimization toolchains, including Vitis-AI, OpenVINO, TensorRT, and ONNX, without architectural adjustments. This provides a scalable solution for deploying high-accuracy CNNs on low-power edge devices across various applications, such as autonomous vehicles, surveillance systems, robotics, healthcare, and smart cities.
  • THERMAL ANALYSIS OF HIGH-PERFORMANCE FPGA-BASED MULTI-CHANNEL TIME-TO-DIGITAL CONVERTERS BASED ON TAPPED DELAY LINES ARCHITECTURE
    (University of Dayton, 2024-03-27) Alshehry, Awwad; Chodavarapu, Vamsy
    We describe a study of the effect of temperature variations on multi-channel Time-to-Digital Converters (TDCs). The objective is to study the impact of ambient thermal variations on the performance of Field Programmable Gate Array (FPGA)-based Tapped Delay Line (TDL) TDC systems, while simultaneously meeting the requirements of high-precision time measurement, low-cost implementation, small size, and low power consumption. For our study we chose two devices, the Xilinx Artix-7 and the Microsemi ProASIC3L. The radiation-tolerant ProASIC3L device offers better stability in terms of thermal sensitivity and power consumption than the Artix-7. To assess the performance of the TDCs under varying thermal conditions, a laboratory thermal chamber was used to maintain ambient temperatures ranging from -75 to 80 °C. This analysis ensured a comprehensive evaluation of the TDCs' performance across a wide operational range. Using the Artix-7 and ProASIC3L devices, we achieved Root Mean Square (RMS) resolutions of 24.7 and 554.59 picoseconds, respectively. We determined the temperature sensitivity of both FPGA devices, observing a significantly low temperature coefficient with the Artix-7 and temperature-insensitive, stable performance with the ProASIC3L device. Total on-chip power was 0.968 W for the Artix-7, while power consumption was less than 1.988 mW for the ProASIC3L device. The results and analysis presented in this study indicate that the proposed design, using the new generations of FPGAs, would help in the design and optimization of FPGA-based TDCs for many applications.
  • Design and Implementation of a RISC Microprocessor
    (Saudi Digital Library, 2023-11-21) Aljishi, Hadi Fadel A; Khursheed, Saqib
    The demand for compact, high-speed, and energy-efficient computing systems has made the innovation and advancement of microprocessor designs increasingly vital. This project concerns the development of a fully featured Reduced Instruction Set Computer (RISC) microprocessor on an FPGA. A practical instruction set was chosen and used as the basis for a datapath design. Implementation was done on the Cyclone II featured on the Altera DE2 board. Two basic implementations were created based on internal and external memory. The maximum achievable clock frequency was determined to be 63.32 MHz for the internal memory implementation and 44.32 MHz for the external memory implementation. A third implementation featuring a multiplier and a floating-point unit was then developed which achieves a maximum clock frequency of 26.16 MHz and a total power consumption of 41.06 mW. Several programs were written using the new instruction set to test the three implementations, and all produced the expected outputs. However, some areas of the design and testing methodology could be improved.
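The first abstract above describes training a quantized student model by knowledge distillation from two teacher models. The listing does not give the loss formulation, so the following is only a minimal sketch of one common way such a loss could be built, not the dissertation's method. The function names, the weighted averaging of the two teachers' softened outputs, and the `temperature`, `alpha`, and `teacher_a_weight` parameters are all assumptions made for illustration. Quantization-aware training (PyTorch FX-graph mode in the abstract) is not shown.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional softening temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def two_teacher_loss(student_logits, teacher_a_logits, teacher_b_logits, label,
                     temperature=4.0, alpha=0.5, teacher_a_weight=0.5):
    """Hypothetical dual-teacher distillation loss for one sample.

    Assumption: the soft target is a weighted average of the two teachers'
    temperature-softened distributions. The source does not state how the
    teachers are combined.
    """
    soft_student = softmax(student_logits, temperature)
    soft_a = softmax(teacher_a_logits, temperature)
    soft_b = softmax(teacher_b_logits, temperature)
    soft_target = [teacher_a_weight * a + (1.0 - teacher_a_weight) * b
                   for a, b in zip(soft_a, soft_b)]

    # Distillation term, scaled by T^2 as is conventional so that its
    # gradient magnitude stays comparable to the hard-label term.
    distill = kl_divergence(soft_target, soft_student) * temperature ** 2

    # Hard-label cross-entropy on the unsoftened student output.
    hard = -math.log(softmax(student_logits)[label] + 1e-12)

    return alpha * distill + (1.0 - alpha) * hard


if __name__ == "__main__":
    teacher_a = [2.0, 0.5, -1.0]
    teacher_b = [1.8, 0.7, -0.9]
    student = [1.5, 0.6, -0.5]
    print(two_teacher_loss(student, teacher_a, teacher_b, label=0))
```

In this sketch, `alpha` trades off imitation of the teachers against fitting the ground-truth label, and `teacher_a_weight` controls how much each teacher contributes to the soft target. Both would need tuning for any real model.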

Copyright owned by the Saudi Digital Library (SDL) © 2024