Get some info from YouTube

Question

AnooooPower 3 Newbie Poster

10 Years Ago

Is there a way to get info from a YouTube video link?
I want to get the title, description, video duration, thumbnail link, and video ID, e.g. v=gJ_3BN0m7S8
(just the ID part of the link, after "v=").

And yes, without using a WebBrowser, since they seem to be really slow.

No code is written yet. I'll work this into a project I'm working on later; this is the only part I need. Thanks!

vb.net

Edited 10 Years Ago by AnooooPower because: added last sentence

cgeier 187 Junior Poster

10 Years Ago

You could use WebClient or HttpWebRequest to get the URL source. Then look for the desired info.

Add the following Imports statements:

Imports System.IO
Imports System.Net

Version 1 (using WebClient):
getUrlSource:

Public Function getUrlSource() As String
    'download the page source as a string;
    '_url is assumed to be a class-level String that holds the video URL.
    'Using disposes the WebClient even if DownloadString throws.
    Using client As New System.Net.WebClient
        Return client.DownloadString(_url)
    End Using
End Function

Version 2 (using HttpWebRequest):
getUrlSource:

Public Function getUrlSource() As String
    Dim errMsg As String = String.Empty
    Dim urlSource As String = String.Empty

    'create request to web server (url);
    '_url is assumed to be a class-level String that holds the video URL
    Dim request As HttpWebRequest = DirectCast(WebRequest.Create(_url), HttpWebRequest)

    'set value for AutoRedirect
    request.AllowAutoRedirect = True

    'receives response from request sent to web server;
    'Using closes the response even if an exception is thrown
    Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)

        If Not response.StatusCode = HttpStatusCode.OK Then
            errMsg = "Response status code: " & response.StatusCode
            MessageBox.Show(errMsg, "Error - Response", MessageBoxButtons.OK, MessageBoxIcon.Error)
            Return errMsg
        End If

        'get encoding; fall back to UTF-8 if the server reports no character set
        Dim encoding As System.Text.Encoding = System.Text.Encoding.UTF8
        If Not String.IsNullOrEmpty(response.CharacterSet) Then
            encoding = System.Text.Encoding.GetEncoding(response.CharacterSet)
        End If

        'use StreamReader to read urlSource
        Using sr As New StreamReader(response.GetResponseStream(), encoding)
            urlSource = sr.ReadToEnd()
        End Using
    End Using

    Return urlSource
End Function

Then get the information from the html source code:

First I create a class to hold the desired data. I will call it "YTInfo.vb".

YTInfo.vb

Public Class YTInfo
    Public Property Author As String
    Public Property ChannelId As String
    Public Property Description As String
    Public Property Duration As String
    Public Property ThumbnailUrl As String
    Public Property Title As String
    Public Property Url As String
    Public Property VideoId As String

End Class

You can use a different method to get your desired data from the html source, but I use a WebBrowser and HtmlDocument. The WebBrowser is only used to parse the html that was already downloaded, not to navigate to the video page.

extractDataFromHtml:

Private Function extractDataFromHtml(ByVal urlSource As String) As YTInfo

    Dim elemContent As String = String.Empty
    Dim elemHref As String = String.Empty
    Dim elemName As String = String.Empty
    Dim elemItemProp As String = String.Empty
    Dim elemProperty As String = String.Empty

    Dim htmlDoc As System.Windows.Forms.HtmlDocument
    Dim metaElements As System.Windows.Forms.HtmlElementCollection
    Dim linkElements As System.Windows.Forms.HtmlElementCollection
    Dim spanElements As System.Windows.Forms.HtmlElementCollection
    Dim webBrowser1 As New System.Windows.Forms.WebBrowser

    Dim info As New YTInfo


    If Not String.IsNullOrEmpty(urlSource) Then
        Console.WriteLine("getting tags...")

        'suppress scripting errors
        webBrowser1.ScriptErrorsSuppressed = True

        'open a blank web page
        webBrowser1.Navigate("about:blank")

        'set htmlDoc = to a new WebBrowser document
        htmlDoc = webBrowser1.Document.OpenNew(True)

        'write html to the blank
        'HTML document
        htmlDoc.Write(urlSource)

        'get "meta" elements
        metaElements = htmlDoc.GetElementsByTagName("meta")

        For Each element As System.Windows.Forms.HtmlElement In metaElements

            elemContent = element.GetAttribute("content")
            elemHref = element.GetAttribute("href")
            elemName = element.GetAttribute("name")
            elemItemProp = element.GetAttribute("itemprop")
            elemProperty = element.GetAttribute("property")

            If elemName = "title" Then
                info.Title = elemContent
            End If

            If elemItemProp = "channelId" Then
                info.ChannelId = elemContent
            ElseIf elemItemProp = "description" Then
                info.Description = elemContent
            ElseIf elemItemProp = "duration" Then
                info.Duration = elemContent
            ElseIf elemItemProp = "videoId" Then
                info.VideoId = elemContent
            End If

            If elemProperty = "og:title" Then
                info.Title = elemContent
            ElseIf elemProperty = "og:url" Then
                info.Url = elemContent
            End If
        Next

        'get "link" elements
        linkElements = htmlDoc.GetElementsByTagName("link")

        For Each element As System.Windows.Forms.HtmlElement In linkElements

            elemContent = String.Empty
            elemHref = String.Empty
            elemName = String.Empty
            elemItemProp = String.Empty
            elemProperty = String.Empty

            elemHref = element.GetAttribute("href")
            elemItemProp = element.GetAttribute("itemprop")

            If elemItemProp = "thumbnailUrl" Then
                info.ThumbnailUrl = elemHref
            ElseIf elemItemProp = "url" Then
                Console.WriteLine("URL: " & elemHref)
            End If
        Next

        'get "span" elements - to get Author
        spanElements = htmlDoc.GetElementsByTagName("span")

        For Each element As System.Windows.Forms.HtmlElement In spanElements

            elemHref = element.GetAttribute("href")
            elemItemProp = element.GetAttribute("itemprop")

            If elemItemProp = "author" Then

                'get "link" elements inside of the "span" element
                linkElements = element.GetElementsByTagName("link")

                For Each spanLinkElement As System.Windows.Forms.HtmlElement In linkElements

                    If spanLinkElement.GetAttribute("itemprop") = "url" Then
                        If spanLinkElement.GetAttribute("href").StartsWith("http://www.youtube.com/user") Then
                            info.Author = spanLinkElement.GetAttribute("href")
                        End If
                    End If
                Next
            End If
        Next
    End If

    Return info
End Function

Another option would be to use regex. You can see how to use regex in my post here.

Look at "extractData".

A regex pattern for "Title" would look like:

'[^""]* stops at the closing quote; a greedy .* could run on past it
Dim patternTitle As String = "<meta name=""title"" content=""(?<title>[^""]*)"">"

Or

Dim patternTitle As String = "<meta property=""og:title"" content=""(?<title>[^""]*)"">"

However, I've read that one should not use regex to parse html.
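If you go the regex route anyway, here is a minimal sketch of how a title pattern could be applied. The names TitleRegexSketch and ExtractTitle are made up for this example, and the sample string is copied from the og:title tag listed in this post; the real page source may format the tag differently, so treat the pattern as an assumption.

```vbnet
Imports System
Imports System.Net
Imports System.Text.RegularExpressions

Module TitleRegexSketch

    'Hypothetical helper: returns the content of the og:title meta tag,
    'or an empty string if the tag is not found.
    Function ExtractTitle(ByVal urlSource As String) As String
        '[^""]* matches everything up to the closing quote of the attribute
        Dim patternTitle As String = "<meta property=""og:title"" content=""(?<title>[^""]*)"">"

        Dim m As Match = Regex.Match(urlSource, patternTitle, RegexOptions.IgnoreCase)

        If m.Success Then
            'attribute values are html-encoded (e.g. &amp;), so decode them
            Return WebUtility.HtmlDecode(m.Groups("title").Value)
        End If

        Return String.Empty
    End Function

    Sub Main()
        'sample tag taken from the info listed in this post
        Dim sample As String = "<meta property=""og:title"" content=""Crow rescue"">"
        Console.WriteLine(ExtractTitle(sample)) 'prints: Crow rescue
    End Sub

End Module
```

In a real project you would pass the string returned by getUrlSource instead of the sample string.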

Here is some of the info you might look for:

Author:

<span itemprop="author" itemscope itemtype="http://schema.org/Person">
  <link itemprop="url" href="http://www.youtube.com/user/amedve1">
</span>

ChannelId:

<meta itemprop="channelId" content="UC8Tvbkn57bAs_yOjaI1XiCg">

Description:

<meta name="description" content="Bear to the rescue. Filmed at Budapest ZOO (Hungary), 19. 6. 2014 Camera: Panasonic Lumix DMC-FZ72">

Duration:

<meta itemprop="duration" content="PT2M13S">

Title:

<meta property="og:title" content="Crow rescue">

<meta name="title" content="Crow rescue">

Thumbnail URL:

<meta property="og:image" content="http://i.ytimg.com/vi/gJ_3BN0m7S8/maxresdefault.jpg">

<link itemprop="thumbnailUrl" href="http://i.ytimg.com/vi/gJ_3BN0m7S8/maxresdefault.jpg">

Url:

<meta property="og:url" content="http://www.youtube.com/watch?v=gJ_3BN0m7S8">

<meta name="twitter:url" content="http://www.youtube.com/watch?v=gJ_3BN0m7S8">

VideoId:

<meta itemprop="videoId" content="gJ_3BN0m7S8">
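
If you only need the video id, it can also be read straight from the link without downloading the page. The following is a rough sketch, assuming the link has the watch?v=... form shown above; VideoIdSketch and GetVideoId are names made up for this example, and other link forms are not handled.

```vbnet
Imports System

Module VideoIdSketch

    'Hypothetical helper: returns the value of the "v" query parameter,
    'or an empty string if the link has no such parameter.
    Function GetVideoId(ByVal videoUrl As String) As String
        'Uri.Query returns the query part including the leading "?"
        Dim query As String = New Uri(videoUrl).Query.TrimStart("?"c)

        For Each pair As String In query.Split("&"c)
            'split each "name=value" pair at the first "=" only
            Dim parts As String() = pair.Split(New Char() {"="c}, 2)

            If parts.Length = 2 AndAlso parts(0) = "v" Then
                Return Uri.UnescapeDataString(parts(1))
            End If
        Next

        Return String.Empty
    End Function

    Sub Main()
        Console.WriteLine(GetVideoId("http://www.youtube.com/watch?v=gJ_3BN0m7S8")) 'prints: gJ_3BN0m7S8
    End Sub

End Module
```

The query string is split by hand here so that no reference to System.Web is needed.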

Resources:
WebClient.DownloadString Method (String)

HtmlDocument.Write Method

How to obtain HtmlDocument from a string of a webpage?

Read text from response

Edited 10 Years Ago by cgeier

AnooooPower commented: thank you very much for this, well detailed! +0

Hiroshe 499 Posting Whiz in Training · Answer 1 · 2014-08-02T19:50:20+00:00

https://developers.google.com/youtube/

cgeier 187 Junior Poster · Answer 2 · 2014-08-03T01:27:27+00:00

For duration, you see the following:

<meta itemprop="duration" content="PT2M13S">

This is an ISO 8601 duration. "P" marks the start of the duration and "T" marks the start of the time part of the value. "M" is for minute(s). "S" is for seconds.

So in the above: 2 minutes and 13 seconds.
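
If you want the duration as a TimeSpan rather than a string, one option is XmlConvert.ToTimeSpan, which parses this duration format. A small sketch follows; DurationSketch is a name made up for this example and the input is the sample value from above.

```vbnet
Imports System
Imports System.Xml

Module DurationSketch

    Sub Main()
        'parse the duration value from the "duration" meta tag
        Dim duration As TimeSpan = XmlConvert.ToTimeSpan("PT2M13S")

        Console.WriteLine(duration.ToString())   'prints: 00:02:13
        Console.WriteLine(duration.TotalSeconds) 'prints: 133
    End Sub

End Module
```

XmlConvert.ToTimeSpan throws a FormatException if the string is not a valid duration, so you may want a Try/Catch around it when the value comes from a downloaded page.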

AnooooPower 3 Newbie Poster · Answer 3 · 2014-08-03T06:28:58+00:00

Thank you very much, cgeier. I will try this tomorrow after work and come back with feedback. Much appreciated for the detailed answer!
