How to retrieve data from a feed using PHP

If you want to get data from a feed and show it on your website, there are a few steps to follow.

Requirements:


  1. Localhost (XAMPP/WAMP) or an online host
  2. A feed URL, for example http://feeds.feedburner.com/ours2ptech (yours may be different)
  3. Basic knowledge of PHP
  4. Some time (about 30 minutes)


  • Once you have met all the requirements, let's build a feed parser. As an example, I will retrieve BBC News from its feed.
  • If you are working on localhost, start XAMPP or WAMP.
  • The BBC News feed is at http://feeds.bbci.co.uk/news/rss.xml. When you open this link in a browser, you will not see the raw XML, because the browser shows it as a formatted page. To see the XML, view the page source.
  • Press Ctrl+U on the keyboard to open the page source and see the XML.
  • Now you are ready to start. Follow these steps:


  1. Create a folder in htdocs. You can give it any name, for example s2pfeed.
  2. Create a page named index.php. All the PHP code goes in this page.
  3. We are retrieving data from another website's feed, so I am using cURL, which is a reliable way to fetch data from other websites. To extract the XML data, we can use PHP's built-in SimpleXMLElement class.
  4. Flow chart: Feed URL ----- cURL data ----- Data array

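try {
    // Fetch the raw feed XML with cURL.
    $ch = curl_init("http://feeds.bbci.co.uk/news/rss.xml");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_HEADER, 0);            // leave HTTP headers out of the result
    $data = curl_exec($ch);
    curl_close($ch);
    if ($data === false) {
        // curl_exec() returns false when the download fails.
        throw new Exception('Feed could not be downloaded');
    }
    // Parse the XML; LIBXML_NOCDATA merges CDATA sections into plain text nodes.
    $doc = new SimpleXMLElement($data, LIBXML_NOCDATA);
} catch (Exception $e) {
    echo 'Invalid Feed';
    exit; // stop here, because $doc is not usable
}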
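The try/catch above only tells you that something went wrong. As a minimal sketch, assuming you also want to know why the XML was rejected, the snippet below parses a string with simplexml_load_string() and collects the messages reported by libxml. The helper name parseFeedXml and the two sample strings are hypothetical and only for illustration; in the real script you would pass in the $data returned by curl_exec().

```php
<?php
// Hypothetical helper: returns a SimpleXMLElement, or null when the string is not valid XML.
// Any libxml error messages are written into $errors.
function parseFeedXml(string $xml, array &$errors = []): ?SimpleXMLElement
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);

    $errors = [];
    foreach (libxml_get_errors() as $error) {
        $errors[] = trim($error->message);
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the previous setting

    // Compare strictly with false: an empty but valid element is also "falsy".
    return $doc === false ? null : $doc;
}

// Made-up sample input, standing in for the downloaded feed.
$good = '<rss version="2.0"><channel><title>Demo</title></channel></rss>';
$bad  = '<rss><channel><title>Demo</channel></rss>';

$errors = [];
$doc = parseFeedXml($good, $errors);
echo $doc === null ? "invalid\n" : "valid: " . $doc->channel->title . "\n";

$doc = parseFeedXml($bad, $errors);
echo $doc === null ? "invalid: " . count($errors) . " error(s)\n" : "valid\n";
```

The strict comparison with false matters here, because a SimpleXMLElement without children evaluates to false in a plain if test.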
Before moving to the next step, look carefully at the source of the feed URL. You will see something like the following:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet title="XSL_formatting" type="text/xsl" href="/shared/bsp/xsl/rss/nolsol.xsl"?>
<rss xmlns:media="http://search.yahoo.com/mrss/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
<channel>
<title>BBC News - Home</title>
<link>http://www.bbc.co.uk/news/#sa-ns_mchannel=rss&amp;ns_source=PublicRSS20-sa</link>
<description>The latest stories from the Home section of the BBC News web site.</description>
<language>en-gb</language>
<lastBuildDate>Wed, 19 Feb 2014 20:57:35 GMT</lastBuildDate>
<copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/2/hi/help/rss/4498287.stm for terms and conditions of reuse.</copyright>
<image>
<url>http://news.bbcimg.co.uk/nol/shared/img/bbc_news_120x60.gif</url>
<title>BBC News - Home</title>
<link>http://www.bbc.co.uk/news/#sa-ns_mchannel=rss&amp;ns_source=PublicRSS20-sa</link>
<width>120</width>
<height>60</height>
</image>
<ttl>15</ttl>
<atom:link href="http://feeds.bbci.co.uk/news/rss.xml" rel="self" type="application/rss+xml"/>
<item>
<title>Blair 'advised Brooks before arrest'</title>
<description>Former Prime Minister Tony Blair gave advice to News International boss Rebekah Brooks on handling the developing phone-hacking scandal days before her arrest, a court hears.</description>
<link>http://www.bbc.co.uk/news/uk-26259956#sa-ns_mchannel=rss&amp;ns_source=PublicRSS20-sa</link>
<guid isPermaLink="false">http://www.bbc.co.uk/news/uk-26259956</guid>
<pubDate>Wed, 19 Feb 2014 16:24:51 GMT</pubDate>
<media:thumbnail width="66" height="49" url="http://news.bbcimg.co.uk/media/images/73083000/jpg/_73083312_021157970-1.jpg"/>
<media:thumbnail width="144" height="81" url="http://news.bbcimg.co.uk/media/images/73086000/jpg/_73086069_021157970-1.jpg"/>
</item>
<item><title>Ukraine president sacks army chief</title>
<description>Ukrainian President Viktor Yanukovych sacks the head of the armed forces, amid protests that have turned Kiev into a battle zone.</description>
<link>http://www.bbc.co.uk/news/world-europe-26265808#sa-ns_mchannel=rss&amp;ns_source=PublicRSS20-sa</link>
<guid isPermaLink="false">http://www.bbc.co.uk/news/world-europe-26265808</guid>
<pubDate>Wed, 19 Feb 2014 20:09:37 GMT</pubDate>
<media:thumbnail width="66" height="49" url="http://news.bbcimg.co.uk/media/images/73097000/jpg/_73097274_73097252.jpg"/> 
<media:thumbnail width="144" height="81" url="http://news.bbcimg.co.uk/media/images/73097000/jpg/_73097275_73097252.jpg"/>
</item>
.
.
.
<item> ... </item>
This forms a hierarchy like this:
rss
   channel
       title
       link
       description
       ...
       item
          title
          description
          link
          guid --- isPermaLink
          pubDate
          media:thumbnail --- width, height, url
Each item node represents an individual post in the feed. Now let's come back to the PHP code. Look at the cURL script shown earlier:
$doc = new SimpleXmlElement($data, LIBXML_NOCDATA);
Here $doc is an object that holds all the data from the XML.

To get channel data such as the title, link, and description, use the following code:
$title = $doc->channel->title;
$link = $doc->channel->link;
$description = $doc->channel->description;
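One detail worth knowing is that these properties are SimpleXMLElement objects rather than plain strings. The sketch below, which uses a made-up miniature feed so that it runs without a network connection, shows casting the values to strings before storing or comparing them.

```php
<?php
// Made-up miniature feed, standing in for the real BBC feed.
$xml = <<<XML
<rss version="2.0">
  <channel>
    <title>BBC News - Home</title>
    <link>http://www.bbc.co.uk/news/</link>
    <description>The latest stories from the Home section.</description>
  </channel>
</rss>
XML;

$doc = new SimpleXMLElement($xml, LIBXML_NOCDATA);

// Each property is a SimpleXMLElement object, not a plain string.
$node = $doc->channel->title;
var_dump($node instanceof SimpleXMLElement);

// Cast to string before storing the value, comparing it, or using it as an array key.
$title = (string) $doc->channel->title;
$link = (string) $doc->channel->link;
$description = (string) $doc->channel->description;

echo $title, "\n", $link, "\n", $description, "\n";
```

echo performs this conversion automatically, which is why printing works even without the cast.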

To retrieve the data of the single most recent item (post), use the following code:
$title = $doc->channel->item[0]->title;
$description = $doc->channel->item[0]->description;
$link = $doc->channel->item[0]->link;

You can retrieve every field of the item this way.

To retrieve the data of multiple recent posts, use the following code:
$cnt = count($doc->channel->item); // total number of posts in this feed
for ($i = 0; $i < $cnt; $i++) {
    $url     = $doc->channel->item[$i]->link;
    $title   = $doc->channel->item[$i]->title;
    $desc    = $doc->channel->item[$i]->description;
    $pubDate = $doc->channel->item[$i]->pubDate;
}
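The loop above overwrites its variables on every pass, so only the last post survives. As a sketch of the "Data array" end of the flow chart, the snippet below collects every post into a plain PHP array with foreach. The two-item feed and the example.com links are made up for illustration, and converting pubDate with strtotime() is an assumption about how you might want to store the date.

```php
<?php
// Made-up two-item feed so the example runs without a network connection.
$xml = <<<XML
<rss version="2.0">
  <channel>
    <title>Demo feed</title>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <description>First description</description>
      <pubDate>Wed, 19 Feb 2014 16:24:51 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
      <description>Second description</description>
      <pubDate>Wed, 19 Feb 2014 20:09:37 GMT</pubDate>
    </item>
  </channel>
</rss>
XML;

$doc = new SimpleXMLElement($xml, LIBXML_NOCDATA);

$posts = [];
foreach ($doc->channel->item as $item) {
    $posts[] = [
        'title'   => (string) $item->title,
        'url'     => (string) $item->link,
        'desc'    => (string) $item->description,
        // strtotime() turns the feed's date string into a Unix timestamp (false if unparseable).
        'pubDate' => strtotime((string) $item->pubDate),
    ];
}

echo count($posts), " posts\n";
echo $posts[0]['title'], " | ", gmdate('Y-m-d H:i', $posts[0]['pubDate']), "\n";
```

Iterating with foreach avoids repeating $doc->channel->item[$i] on every line, and the resulting array no longer depends on the SimpleXMLElement object.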
But you cannot retrieve a node like this one in the same way:
<media:thumbnail width="66" height="49" url="http://news.bbcimg.co.uk/media/images/73083000/jpg/_73083312_021157970-1.jpg"/>

What can we do? It is simple, with one extra step.

This node has two parts: a namespace prefix (media), separated from the element name by a colon (:), and some attributes.

To retrieve data from the media namespace, look at the opening rss tag near the top of the feed source:
<rss xmlns:media="http://search.yahoo.com/mrss/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
Take the URL assigned to xmlns:media, which is http://search.yahoo.com/mrss/ .
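Instead of copying the namespace URL by hand, you can also ask SimpleXML for it. The sketch below, again using a made-up miniature feed, reads the namespaces declared on the root element with getDocNamespaces() and then fetches the thumbnail both by URL and by prefix.

```php
<?php
// Made-up miniature feed with the same namespace declarations as the BBC feed.
$xml = <<<XML
<rss xmlns:media="http://search.yahoo.com/mrss/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
  <channel>
    <title>Demo feed</title>
    <item>
      <title>First post</title>
      <media:thumbnail width="66" height="49" url="http://example.com/small.jpg"/>
    </item>
  </channel>
</rss>
XML;

$doc = new SimpleXMLElement($xml, LIBXML_NOCDATA);

// getDocNamespaces() returns the namespaces declared on the root element
// as an array of prefix => URL.
$namespaces = $doc->getDocNamespaces();
foreach ($namespaces as $prefix => $uri) {
    echo $prefix, " => ", $uri, "\n";
}

$mediaUri = $namespaces['media'] ?? null;

// children() accepts the namespace URL, or the prefix when the second argument is true.
$byUri    = $doc->channel->item[0]->children($mediaUri)->thumbnail;
$byPrefix = $doc->channel->item[0]->children('media', true)->thumbnail;

echo count($byUri), " ", count($byPrefix), "\n";
```

Looking up by URL is usually the safer choice, since another feed may bind the same namespace to a different prefix.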

Now your PHP code:

$media = $doc->channel->item[$i]->children('http://search.yahoo.com/mrss/')->thumbnail;
Does printing $media show any value? No, because the node has no text content. The values are stored as attributes.
So,
$thumb = $media->attributes();
$thumburl = $thumb['url'];
$thumbwidth = $thumb['width'];
In this feed each item has multiple media thumbnails, so you can use the following code to retrieve all of them:
$thumbcount = count($media);
// Loop with $j (not $i) and stop before $thumbcount, the index past the last thumbnail.
for ($j = 0; $j < $thumbcount; $j++)
{
    $thumb = $media[$j]->attributes();
    $thumburl = $thumb['url'];
    $thumbwidth = $thumb['width'];
}
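Since each item carries two thumbnail sizes, you will often want just one of them. The following sketch, using a made-up item with example.com image URLs, collects the thumbnails into an array and picks the widest one. Choosing by width is only one possible rule, and is an assumption made for this example.

```php
<?php
// Made-up item with two thumbnails, mirroring the structure of the BBC feed.
$xml = <<<XML
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <item>
      <title>First post</title>
      <media:thumbnail width="66" height="49" url="http://example.com/small.jpg"/>
      <media:thumbnail width="144" height="81" url="http://example.com/large.jpg"/>
    </item>
  </channel>
</rss>
XML;

$doc = new SimpleXMLElement($xml, LIBXML_NOCDATA);
$item = $doc->channel->item[0];

// Collect every media:thumbnail of the item as a plain array.
$thumbs = [];
foreach ($item->children('http://search.yahoo.com/mrss/')->thumbnail as $thumbnail) {
    $attrs = $thumbnail->attributes();
    $thumbs[] = [
        'url'    => (string) $attrs['url'],
        'width'  => (int) $attrs['width'],
        'height' => (int) $attrs['height'],
    ];
}

// Pick the widest thumbnail, or null when the item has none.
$best = null;
foreach ($thumbs as $thumb) {
    if ($best === null || $thumb['width'] > $best['width']) {
        $best = $thumb;
    }
}

echo $best === null ? "no thumbnail\n" : $best['url'] . "\n";
```

Because foreach simply runs zero times when an item has no thumbnail, this version needs no separate count check.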
That's it.
Here is the complete code:
<?php
try {
    // Fetch the raw feed XML with cURL.
    $ch = curl_init("http://feeds.bbci.co.uk/news/rss.xml");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    $data = curl_exec($ch);
    curl_close($ch);
    if ($data === false) {
        // curl_exec() returns false when the download fails.
        throw new Exception('Feed could not be downloaded');
    }
    $doc = new SimpleXMLElement($data, LIBXML_NOCDATA);
} catch (Exception $e) {
    echo 'Invalid Feed';
    exit; // stop here, because $doc is not usable
}

$cnt = count($doc->channel->item); // total number of posts in this feed
for ($i = 0; $i < $cnt; $i++) {
    $url     = $doc->channel->item[$i]->link;
    $title   = $doc->channel->item[$i]->title;
    $desc    = $doc->channel->item[$i]->description;
    $pubDate = $doc->channel->item[$i]->pubDate;

    // media:thumbnail nodes belong to the media namespace.
    $media = $doc->channel->item[$i]->children('http://search.yahoo.com/mrss/')->thumbnail;

    $thumbcount = count($media);
    for ($j = 0; $j < $thumbcount; $j++) {
        $thumb = $media[$j]->attributes();
        $thumburl = $thumb['url'];
        $thumbwidth = $thumb['width'];
    }
}

?>
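The complete script stores each value in a variable but does not print anything yet. To show the posts on your website, one option is to build an HTML list, as in the sketch below. The helper name renderPosts and the array keys title and url are hypothetical. The important part is escaping every feed value with htmlspecialchars(), because feed content comes from another site and should not be trusted as HTML.

```php
<?php
// Hypothetical helper: turns an array of posts into an HTML list.
// Each post is assumed to be an array with 'title' and 'url' keys.
function renderPosts(array $posts): string
{
    $html = "<ul>\n";
    foreach ($posts as $post) {
        // Escape feed values so they cannot inject markup into the page.
        $url   = htmlspecialchars($post['url'], ENT_QUOTES, 'UTF-8');
        $title = htmlspecialchars($post['title'], ENT_QUOTES, 'UTF-8');
        $html .= "  <li><a href=\"{$url}\">{$title}</a></li>\n";
    }
    return $html . "</ul>\n";
}

// Sample data taken from the two items shown in the feed source earlier.
$posts = [
    ['title' => "Blair 'advised Brooks before arrest'", 'url' => 'http://www.bbc.co.uk/news/uk-26259956'],
    ['title' => 'Ukraine president sacks army chief', 'url' => 'http://www.bbc.co.uk/news/world-europe-26265808'],
];

echo renderPosts($posts);
```

In the real script you would fill $posts inside the item loop, casting each SimpleXMLElement value to a string first.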
I think this is a complete feed parser for you.
If you have any questions, please comment. I will try to respond as soon as possible.