145 Recommended Topics

Member Avatar for usmanmalik57

As a data scientist, I have extensively used the Hugging Face library for processing unstructured data such as images, text, and audio. My previous blogs have covered various transformer models for these types of data. Lately, however, I discovered that Hugging Face also provides transformer models for tabular data. One …

Computer Science machine-learning python
1
1
Member Avatar for usmanmalik57

# Comparison Between Fine-tuned and Default GPT-3 Turbo for Text Classification In one of my previous articles, I showed you how to perform [zero-shot text classification using OpenAI GPT-4o and Meta Llama 3 models](https://www.daniweb.com/programming/computer-science/tutorials/542001/openai-gpt-4o-vs-meta-llama-3-for-zero-shot-text-classifiation). I used the default models for predicting sentiments of airline tweets. The default models perform substantially …

0
51
Member Avatar for usmanmalik57

OpenAI announced the [GPT-4o (omni)](https://community.openai.com/t/announcing-gpt-4o-in-the-api/744700) model on May 13, 2024. The GPT-4o model, as the name suggests, can process multimodal inputs, such as text, image, and speech. As per OpenAI, GPT-4o is the state-of-the-art and best-performing large language model. Among GPT-4o's many capabilities, I found its ability to analyze images …

1
216
Member Avatar for usmanmalik57

On April 18, 2024, Meta AI released [Llama 3](https://ai.meta.com/blog/meta-llama-3/), which they claimed to be the most capable openly available LLM to date. Shortly afterward, on May 13, 2024, OpenAI announced [GPT-4o (omni)](https://community.openai.com/t/announcing-gpt-4o-in-the-api/744700), which is touted as the state-of-the-art proprietary model on various NLP benchmarks. As a guy who loves to compare …

2
155
Member Avatar for usmanmalik57

In this tutorial, you will see how to generate stunning AI-generated images from text inputs using state-of-the-art diffusion models from [Hugging Face](https://huggingface.co/). You'll learn about base diffusion models and how combining them with a refiner creates even more detailed, refined results. Diffusion models are powerful because they iteratively refine an …

Member Avatar for rproffitt
1
57
Member Avatar for usmanmalik57

## Introduction Text-to-speech (TTS) technology has revolutionized how we interact with devices, making it easier to access content through auditory means. TTS is vital in various applications such as virtual assistants, audiobooks, accessibility tools for the visually impaired, and language learning platforms. This tutorial will explore how to convert text to speech using Hugging …

1
73
Member Avatar for usmanmalik57

In a previous article, I explained [how to extract tabular data from PDF image documents using Multimodal Google Gemini Pro](https://www.daniweb.com/programming/computer-science/tutorials/541449/pdf-image-table-extractor-web-app-with-google-gemini-pro-and-streamlit#post2296083). However, there are a couple of disadvantages with Google Gemini Pro. First, Google Gemini Pro is not free, and second, it needs complex prompt engineering to retrieve table, columns, and …

Member Avatar for Harini sri
2
691
Member Avatar for usmanmalik57

The advent of large language models (LLMs) has replaced complex scripts with natural language for automating various tasks. You can now use LLMs to interact with your databases using natural language, which makes life easier for people who do not have sufficient SQL knowledge. In this article, you will learn …

Member Avatar for aishamushtaq
2
110
Member Avatar for usmanmalik57

In this tutorial, you will see how to summarize YouTube video transcriptions using [Distil Whisper large V3](https://huggingface.co/distil-whisper/distil-large-v3) and [Mistral-7B-Instruct](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2). Both the Distil Whisper large V3 and Mistral-7B-Instruct models are open-source and free to use. The Distil Whisper large V3 model is a faster and smaller variant of the [Whisper large V3 model](https://huggingface.co/openai/whisper-large-v3), …

1
51
Member Avatar for usmanmalik57

In my [previous articles](https://www.daniweb.com/programming/computer-science/tutorials/541732/paris-olympics-ticket-information-chatbot-with-memory-using-langchain), I explained how to develop customized chatbots using the Retrieval Augmented Generation (RAG) approach in [LangChain](https://www.langchain.com/). However, I used proprietary models such as those from OpenAI, which can be expensive when you try to scale. In this article, I will show you how to use the open-source and free-of-cost …

Computer Science artificial-intelligence-llm
1
175
Member Avatar for usmanmalik57

In previous articles, I explained how to use natural language to interact with [PDF documents](https://www.daniweb.com/programming/computer-science/tutorials/541732/paris-olympics-ticket-information-chatbot-with-memory-using-langchain) and [SQL databases](https://www.daniweb.com/programming/computer-science/tutorials/541771/using-natural-language-to-query-sql-databases-with-python-langchain-module), using the Python [LangChain module](https://python.langchain.com/docs/get_started/introduction) and [OpenAI API](https://openai.com/blog/openai-api). In this article, you will learn how to use LangChain and OpenAI API to create a question-answering application that allows you to retrieve information …

2
54
Member Avatar for usmanmalik57

In my previous article, I explained how I developed a simple chatbot using LangChain and ChatGPT that can answer queries related to Paris Olympics ticket prices. However, one major drawback of that chatbot is that it can only generate a single response based on user queries. It cannot answer …

2
85
Member Avatar for usmanmalik57

I was searching for Paris Olympics ticket prices for tennis games recently. The official website directs you to a [PDF document](https://tickets.paris2024.org/obj/media/FR-Paris2024/ticket-prices.pdf) containing ticket prices and venues for all the games. However, I found the PDF document to be very hard to navigate. To make things easier, I developed a chatbot …

4
41
Member Avatar for usmanmalik57

On March 4, 2024, [Anthropic](https://www.anthropic.com/) launched the [Claude 3 family of large language models](https://www.anthropic.com/news/claude-3-family). Anthropic claimed that its Claude 3 Opus model outperforms GPT-4 on various benchmarks. Intrigued by Anthropic's claim, I performed a simple test to compare the performance of Claude 3 Opus, [Google Gemini Pro](https://deepmind.google/technologies/gemini/#introduction), and [OpenAI's GPT-4](https://openai.com/research/gpt-4) …

2
104
Member Avatar for usmanmalik57

In the rapidly evolving field of Natural Language Processing (NLP), open-source large language models (LLMs) are becoming increasingly popular as they are free to use. Among these, the [Mistral](https://docs.mistral.ai/models/) family of models stands out as a state-of-the-art model that is freely accessible to the public. Comparable in performance to the …

3
58
Member Avatar for usmanmalik57

In a previous article, I explained [how to fine-tune Google's Gemma model for text classification](https://www.daniweb.com/programming/computer-science/tutorials/541544/fine-tuning-google-gemma-model-for-text-classification-in-python). In this article, I will explain how you can improve the performance of a pretrained large language model (LLM) using the retrieval augmented generation (RAG) technique. So, let's begin without further ado. ## What is Retrieval Augmented Generation …

2
550
Member Avatar for learnerya

I am a first-year university student from China. My major is Computer Science and Technology. I have been self-learning C++ and data structures and algorithms recently. May I ask how I can learn them well? Is anyone interested in being my teacher or learning together with me? (Machine translation, my English is …

Computer Science c c++
Member Avatar for tinstaafl
1
63
Member Avatar for usmanmalik57

On February 21, 2024, Google released [Gemma](https://ai.google.dev/gemma), a family of state-of-the-art open-source large language models (LLMs). As per initial results, its 7b (seven billion parameter) version is known to perform better than Meta's [Llama 2](https://llama.meta.com/), the previous state-of-the-art open-source LLM. As always, my first test with any new open-source LLM …

2
658
Member Avatar for usmanmalik57

Integrating language models like ChatGPT into third-party applications has become increasingly popular due to their ability to comprehend and generate human-like text. However, it's crucial to acknowledge the limitations of ChatGPT, such as its knowledge cut-off date in September 2021 and its inability to access external sources like Wikipedia or …

Member Avatar for catherine_11
3
1K
Member Avatar for usmanmalik57

In my previous article, I explained [how to convert PDF image to CSV using Multimodal Google Gemini Pro](https://www.daniweb.com/programming/computer-science/tutorials/541365/converting-pdf-image-to-csv-using-multimodal-google-gemini-pro). To do so, I wrote a Python script that passes a text command to [Google Gemini Pro](https://blog.google/technology/ai/google-gemini-ai/) for extracting tables from PDF images and storing them in a CSV file. In this article, …

1
219
Member Avatar for usmanmalik57

In this article, you will learn how to track faces within a video using the Python DeepFace library. Additionally, you'll discover how to include portions of the video background in face tracking by implementing custom methods that utilize the DeepFace library's `extract_faces()` method for face extraction. I explained how to …

Computer Science python
Member Avatar for EdwardMatthew
1
449
Member Avatar for usmanmalik57

## Introduction ## In a previous article, I explained [how to fine-tune the vision transformer model for image classification in PyTorch](https://www.daniweb.com/programming/computer-science/tutorials/540749/fine-tuning-vision-transformer-for-image-classification-in-pytorch). In this article, I will explain how to fine-tune the pre-trained OpenAI Whisper model for audio classification in PyTorch. Audio classification is an important task that can be applied …

Computer Science audio python
Member Avatar for habi_2
2
1K
Member Avatar for usmanmalik57

In this article, you will learn to use [Google Gemini Pro](https://blog.google/technology/ai/google-gemini-ai/), a state-of-the-art multimodal generative model, to extract information from PDFs and convert it to CSV files. You will use a simple text prompt to tell Google Gemini Pro about the information you want to extract. This is a valuable …

2
266
Member Avatar for usmanmalik57

I recently tackled a challenging research task involving multimodal data for a classification problem using [TensorFlow Keras](https://www.tensorflow.org/guide/keras). One of the trickiest aspects was figuring out how to load multimodal data in batches from storage efficiently. While TensorFlow Keras offers helpful functions for batch-loading images from various sources, the documentation and …

Computer Science python tensorflow
2
56
Member Avatar for usmanmalik57

## Introduction ## This tutorial explains how to perform multi-label text classification using the [Hugging Face](https://huggingface.co/) transformers library. The Hugging Face library implements advanced transformer architectures, proven to be state-of-the-art for various natural language processing tasks, including text classification. The Hugging Face library provides trainable transformer models in three flavors: 1. Via …

Member Avatar for Aravind_11
1
952
Member Avatar for usmanmalik57

In this article, we will compare two state-of-the-art large language models for zero-shot text classification: [Google Gemini Pro](https://deepmind.google/technologies/gemini/#introduction) and [OpenAI GPT-4](https://openai.com/research/gpt-4). Zero-shot text classification is a task where a model is trained on a set of labeled examples but can then classify new examples from previously unseen classes. This is …

1
96
Member Avatar for usmanmalik57

Sentiment analysis, a subfield of Natural Language Processing (NLP), aims to discern and classify the underlying sentiment or emotion expressed in textual data. Whether it is understanding customers' opinions about a product, analyzing social media posts, or gauging public sentiment towards a political event, sentiment analysis plays a vital role …

Member Avatar for Abdul_116
6
2K
Member Avatar for usmanmalik57

In a [previous tutorial](https://www.daniweb.com/programming/computer-science/tutorials/541123/stock-price-prediction-using-1d-cnn-in-tensorflow-keras), I covered how to predict future stock prices using a deep learning model with 1D CNN layers. This method is effective for basic time series forecasting. Recently, I've enhanced this model by not just considering past closing prices but also factors like Open, High, Low, Volume, …

Computer Science python tensorflow
0
72
Member Avatar for usmanmalik57

A video is a series of images, or frames, shown in rapid succession. Its frame rate, measured in frames per second (FPS), dictates the display speed. For instance, a 30 FPS video shows 30 frames each second. The frame count and frame rate determine a video's detail, smoothness, file size, …

Computer Science python
Member Avatar for usmanmalik57
2
313
Member Avatar for usmanmalik57

## Introduction ## Loss functions are the driving force behind all machine learning algorithms. They quantify how well our models are performing by calculating the difference between the predicted and actual outcomes. The goal of every machine learning algorithm is to minimize this loss function, thereby improving the model’s accuracy. …

Computer Science python
Member Avatar for AndreRet
3
278
