Saudi Cultural Missions Theses & Dissertations

Permanent URI for this community: https://drepo.sdl.edu.sa/handle/20.500.14154/10


Search Results

Now showing 1 - 2 of 2
  • Item (Restricted)
    Automatic Sketch-to-Image Synthesis and Recognition
    (University of Dayton, 2024-05) Baraheem, Samah; Nguyen, Tam
    Images are used everywhere since they convey a story, a fact, or an imagination without any words; an image can substitute for sentences because the human brain extracts knowledge from images faster than from words. However, creating an image from scratch is not only time-consuming but also a tedious task that requires skill, since an image contains rich features and fine-grained details such as colors, brightness, saturation, luminance, texture, and shadow. Sketch-to-image synthesis offers a way to generate an image in less time and without artistic skill: hand sketches are much easier to produce, contain only the key structural information, and can be drawn quickly without training. Yet because sketches are often simple, rough, black-and-white, and sometimes imperfect, converting a sketch into an image is itself a non-trivial problem. It has therefore attracted much research attention aimed at generating photorealistic images. The generated images, however, still suffer from issues such as unnaturalness, ambiguity, distortion, and, most importantly, difficulty in handling complex input with multiple objects. Most of these problems stem from converting a sketch into an image directly in one shot.
    To this end, in this dissertation we propose a new framework that divides the problem into sub-problems, leading to high-quality photorealistic images even for complicated sketches. Instead of directly mapping the input sketch into an image, we map the sketch into an intermediate result, namely a mask map, through instance segmentation and semantic segmentation at two levels: background segmentation and foreground segmentation. The background segmentation is formed based on the context of the existing foreground objects; various natural scenes, both indoor and outdoor, are supported. A foreground segmentation process then commences, in which each detected object is sequentially and semantically added into the constructed segmented background. Next, the mask map is converted into an image through an image-to-image translation model. Finally, a post-processing stage further enhances the synthetic image via background improvement and human face refinement. This not only produces better results but also enables generating images from complicated sketches with multiple objects. (An illustrative code sketch of this staged pipeline appears after the results list below.)
    We further improve our framework with scene and size sensing. For the size-awareness feature, in the instance segmentation stage the objects' sizes may be modified based on the surrounding environment and their respective size priors, to reflect reality and produce more realistic, natural-looking images. For the scene-awareness feature in the background improvement step, a scene image is first selected after the scene is initially defined from context and then classified by a scene classifier; the generated objects are then placed on the chosen scene image at a pre-defined snapping point, keeping them in their proper locations and maintaining realism. Furthermore, since generated images have improved over time regardless of the input modality, it has become hard at times to distinguish synthetic images from genuine ones.
While this improvement benefits content and media, it also poses a serious threat to legitimacy, authenticity, and security. Thus, an automatic detection system for AI-generated images is a legitimate need; such a system can also serve as an evaluation tool for image synthesis models regardless of input modality. Indeed, AI-generated images usually bear explicit or implicit artifacts introduced during the generation process. Prior research focused on detecting synthetic images generated by one specific model, or by similar models with similar architectures; hence, a generalization problem arises. To tackle this problem, we propose to fine-tune a pre-trained Convolutional Neural Network (CNN) on a specially collected dataset consisting of AI-generated images from different image synthesis architectures and different input modalities, i.e., text, sketch, and other sources (another image or a mask), to aid generalization across various tasks and architectures. (A minimal fine-tuning sketch appears after the results list below.) Our contribution is two-fold. First, we generate high-quality realistic images from simple, rough, black-and-white sketches, compiling a new dataset of sketch-like images for training. Second, since artificial images carry both advantages and risks in the real world, we create an automated system able to detect and localize synthetic images among genuine ones, collecting a large dataset of generated and real images to train a CNN model.
  • Item (Restricted)
    Using Deep Learning Techniques for an Early Detection of Oral Epithelial Dysplasia
    (2023) Aljuaid, Abeer; Anwar, Mohd
    Oral cancer is ranked as the sixth most common type of cancer worldwide, with 90% of cases being oral squamous cell carcinoma (OSCC). OSCC has a high mortality rate, and early diagnosis can increase the survival rate. About 80% of OSCC develops from Oral Epithelial Dysplasia (OED); thus, OED detection is critical for diagnosing OSCC at an early stage. Traditionally, OED is identified under the microscope by expert oral pathologists using sixteen criteria covering architectural and cytological features. This manual detection is a time-consuming and tedious task, so there is a need for precise automated diagnostic and classification techniques. However, developing a Computer-Aided Diagnosis (CAD) system for OED is challenging because each OED criterion requires a particular medical image processing task for its detection. We therefore proposed a novel multi-task learning network that combines semantic segmentation and classification to detect OED using architectural and cytological characteristics; ours is the first study to jointly train semantic segmentation and classification on a single network for automated OED detection. We developed four new frameworks: VGG16-UNet, InceptionV3-UNet, Dysp-VGG16, and Dysp-InceptionV3. VGG16-UNet and InceptionV3-UNet follow the classic U-Net with ImageNet-pre-trained VGG16 and InceptionV3 encoders and a traditional classifier, while Dysp-VGG16 and Dysp-InceptionV3 use our novel modified U-Net and novel classifier network. The modified U-Net incorporates dilated convolutions, channel attention, spatial attention, and residual blocks for performance enhancement. The proposed models' effectiveness and robustness were verified in three experiments using quantitative metrics and visualization results for comparison. Our modified U-Net and classifier network show superior performance on the segmentation and classification tasks: the novel classifier improved the quantitative metrics and reduced the traditional classifier's false positive and false negative rates, while the modified U-Net improved semantic segmentation performance by 5% in Jaccard index and produced accurate predicted masks. (A toy sketch of the shared-encoder multi-task design appears after the results list below.)
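
Illustrative Code Sketches

The first abstract describes a staged pipeline: instance and semantic segmentation build a mask map (background first, then foreground objects added sequentially), an image-to-image translation model renders the mask map, and a post-processing step refines the result. The skeleton below is a minimal sketch of how those stages compose; every function is a placeholder with stand-in logic and hypothetical names, not the dissertation's actual models or code.

```python
"""Minimal sketch of the staged sketch-to-image pipeline described in the
abstract above. Every stage below is a placeholder with stand-in logic,
not the dissertation's code; it only shows how the stages compose."""

import numpy as np


def segment_foreground(sketch: np.ndarray) -> list:
    """Stage 1a (placeholder): instance-segment the foreground objects.
    A real system would run an instance-segmentation model on the sketch;
    here we fabricate a single dummy object."""
    h, w = sketch.shape[:2]
    mask = np.zeros((h, w), dtype=bool)
    mask[h // 4: h // 2, w // 4: w // 2] = True
    return [{"label": "object", "mask": mask}]


def build_background(objects: list, shape: tuple) -> np.ndarray:
    """Stage 1b (placeholder): pick a background segmentation from the
    context of the detected foreground objects (indoor or outdoor)."""
    return np.zeros(shape, dtype=np.uint8)  # class id 0 = background


def compose_mask_map(background: np.ndarray, objects: list) -> np.ndarray:
    """Stage 1c: sequentially add each object's mask into the segmented
    background; size awareness would rescale masks against the scene here."""
    mask_map = background.copy()
    for class_id, obj in enumerate(objects, start=1):
        mask_map[obj["mask"]] = class_id
    return mask_map


def mask_to_image(mask_map: np.ndarray) -> np.ndarray:
    """Stage 2 (placeholder): image-to-image translation from the mask map
    to a photorealistic RGB image (a learned generator in the real system)."""
    return np.stack([mask_map * 60] * 3, axis=-1).astype(np.uint8)


def post_process(image: np.ndarray) -> np.ndarray:
    """Stage 3 (placeholder): background improvement and face refinement."""
    return image


def sketch_to_image(sketch: np.ndarray) -> np.ndarray:
    objects = segment_foreground(sketch)
    background = build_background(objects, sketch.shape[:2])
    mask_map = compose_mask_map(background, objects)
    return post_process(mask_to_image(mask_map))


if __name__ == "__main__":
    dummy_sketch = np.zeros((256, 256), dtype=np.uint8)
    print(sketch_to_image(dummy_sketch).shape)  # (256, 256, 3)
```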
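The same abstract proposes fine-tuning a pre-trained CNN on a mixed dataset of real and AI-generated images so the detector generalizes across synthesis architectures and input modalities. Below is a minimal PyTorch/torchvision fine-tuning sketch under assumed choices: a ResNet-50 backbone, a `data/train` ImageFolder layout with `real/` and `generated/` classes, and illustrative hyperparameters. The dissertation's actual backbone, dataset, and training recipe may differ.

```python
"""Minimal sketch of fine-tuning a pre-trained CNN to separate real from
AI-generated images, as the abstract proposes. The ResNet-50 backbone, the
data/train ImageFolder layout, and all hyperparameters are assumptions for
illustration, not the dissertation's recipe."""

import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pre-trained backbone with its classifier head replaced by a 2-way output.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, 2)  # 0 = real, 1 = generated
model = model.to(device)

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Assumed layout: data/train/real/*.jpg and data/train/generated/*.jpg,
# with the generated images pooled from many synthesis architectures and
# input modalities (text, sketch, image, mask) to aid generalization.
train_set = datasets.ImageFolder("data/train", transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for epoch in range(5):
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```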
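The second abstract's frameworks jointly train a segmentation decoder and a classifier on one shared encoder (e.g., VGG16-UNet). The toy model below illustrates that shared-encoder multi-task layout in PyTorch; the stage boundaries, channel widths, heads, and the weighted joint loss are assumptions for illustration, not the thesis architecture, whose modified variants additionally use dilated convolutions, channel/spatial attention, and residual blocks.

```python
"""Toy multi-task network in the spirit of the VGG16-UNet variant described
above: one ImageNet-pre-trained VGG16 encoder shared by a U-Net-style
segmentation decoder and a classification head. All sizes are illustrative."""

import torch
import torch.nn as nn
from torchvision import models


class UpBlock(nn.Module):
    """Upsample, concatenate the encoder skip feature, then convolve."""

    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        return self.conv(torch.cat([self.up(x), skip], dim=1))


class MultiTaskVGG16UNet(nn.Module):
    def __init__(self, n_seg_classes: int = 2, n_cls_classes: int = 2):
        super().__init__()
        feats = list(models.vgg16(
            weights=models.VGG16_Weights.IMAGENET1K_V1).features)
        # Shared encoder, split so skip features are taken before each pool.
        self.enc1 = nn.Sequential(*feats[:4])     # 64 ch, full resolution
        self.enc2 = nn.Sequential(*feats[4:9])    # 128 ch, 1/2
        self.enc3 = nn.Sequential(*feats[9:16])   # 256 ch, 1/4
        self.enc4 = nn.Sequential(*feats[16:23])  # 512 ch, 1/8
        # Segmentation decoder with skip connections (U-Net style).
        self.dec3 = UpBlock(512, 256, 256)
        self.dec2 = UpBlock(256, 128, 128)
        self.dec1 = UpBlock(128, 64, 64)
        self.seg_head = nn.Conv2d(64, n_seg_classes, kernel_size=1)
        # Classification head on the deepest shared feature.
        self.cls_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(512, n_cls_classes),
        )

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        e3 = self.enc3(e2)
        e4 = self.enc4(e3)
        d = self.dec1(self.dec2(self.dec3(e4, e3), e2), e1)
        return self.seg_head(d), self.cls_head(e4)


# Joint training would minimize a weighted sum of the two task losses, e.g.
# loss = seg_loss(seg_out, mask) + lam * cls_loss(cls_out, label).
model = MultiTaskVGG16UNet()
seg_out, cls_out = model(torch.randn(1, 3, 224, 224))
print(seg_out.shape, cls_out.shape)  # (1, 2, 224, 224) and (1, 2)
```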

Copyright owned by the Saudi Digital Library (SDL) © 2024