Marcin Probola

Sunday, September 1, 2013

Crawling and parsing web pages in javascript directly from your web browser

Introduction

The developer tools built into all modern browsers are powerful in skillful hands. In this post I will show you how to use them (essentially the javascript console) to parse web pages. If you are not familiar with any browser developer tools, please read an introduction first. You should also have basic knowledge of html, javascript and jquery. I'll use Google Chrome as the web browser.

Idea

In the browser's javascript console we can execute javascript code in the context of the current web page.
Using ajax (XMLHttpRequest) we can also fetch html from linked urls and parse it as well (like crawlers do). It isn't complicated or innovative, but two things are worth mentioning.

First, I'll use jquery to keep the code shorter and simpler, thanks to its selectors and built-in ajax method. When a page doesn't already use that library, we need to inject it. "Live example" shows how to do that.

Second, on ajax-based pages it's better to disable origin policy checking in the web browser, because ajax requests will sometimes trigger origin errors like "Origin http://www.example.com is not allowed by Access-Control-Allow-Origin". In Google Chrome we can do that by starting it with the --args --disable-web-security parameters.

Basic example

I prepared a really basic, static web page to demonstrate the idea. The url is http://cinu.pl/research/jsparsing/

The source code of this web page is:

index.html:

    <html>
    <head>
    <script src="//ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js"></script>
    </head>
    <body>
    <a href="a.html">link 1</a>
    <a href="b.html">link 2</a>
    <a href="c.html">link 3</a>
    </body>
    </html>

a.html, b.html and c.html each contain a div with the value we want to read:

    <html>
    <body>
    <div class="container">
    <div class="data">VALUE WE WANT TO FETCH</div>
    </div>
    </body>
    </html>

As you can see, index.html already includes the jquery library, so there is no need to inject it.
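The links in index.html are relative, so the browser resolves them against the address of the current page before each ajax request goes out. A minimal sketch of that resolution using the standard URL constructor; resolveLinks is a hypothetical helper, not part of the parser used in this post:

```javascript
// Hypothetical helper: resolve relative hrefs against the page address,
// the same way the browser does before an ajax request is sent.
function resolveLinks(hrefs, baseUrl) {
  return hrefs.map(function (href) {
    return new URL(href, baseUrl).href;
  });
}

// hrefs as they appear in index.html, base is the page url
var links = resolveLinks(
  ['a.html', 'b.html', 'c.html'],
  'http://cinu.pl/research/jsparsing/'
);
console.log(links.join('\n'));
```

This is only useful for seeing which addresses will actually be requested; jquery's ajax method accepts the relative href as it is.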
The parser code is:

    var out = ''; // container for fetched values

    function parse() {
        // go through each anchor on the page and fetch the html it links to
        $('a').each(function (idx, item) {
            var url = $(item).attr('href'); // get url
            console.log('Fetching: ' + url); // debug note
            // make ajax request (http://api.jquery.com/jQuery.ajax/)
            $.ajax({
                url: url,
                async: false, // do it synchronously, so 'out' is complete when the loop ends
                // jQuery 1.8+ deprecates .done() together with async: false, so use the success option
                success: function (data) { // data contains the fetched html
                    var dataRetrieved = $(data).find('div.data').html(); // get value we're looking for
                    console.log('Retrieved ' + dataRetrieved); // debug note
                    out += dataRetrieved + "\n"; // save retrieved value (+ separator)
                }
            });
        });
        console.log("-----------------\nParsing done, output:\n" + out); // print out parsed values
    }

Go to http://cinu.pl/research/jsparsing/, paste the code above into the Developer Tools console and hit enter. To execute it, type "parse()" and hit enter.

Result: [screenshot of the console output]

I guess this code is well documented, so there is no need to describe what it does. Let's try a more complicated example.

Live example - parsing aliexpress.com

The main goal is to fetch the first 5 items from a product category (I'll use wireless routers as an example) and check whether there is any "feedback" from Poland on the first page of feedback. The task seems silly and the parsed data is rather useless, but it is only an example that lets me put to use the things described above.

Step 1. Injecting JQuery

Since aliexpress doesn't use jquery, we need to inject it. Injection code:

    var $jq; // jquery handle to avoid $ conflicts

    function injectJquery() {
        var script = document.createElement('script');
        script.setAttribute('type', 'text/javascript');
        script.setAttribute('src', '//ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js'); // fetch it from Google's CDN
        // Give $ back to whatever took it before; create new alias to jQuery.
        script.onload = function () { $jq = jQuery.noConflict(); };
        document.body.insertBefore(script, document.body.firstChild);
    }

    injectJquery(); // call it automatically when pasted into the console

Apart from the simple injection, we also call jQuery.noConflict() and assign jquery to $jq instead of $. We need to do that because other scripts may also use $ (prototype.js for instance). If we don't give the $ variable back to them, parts of the javascript code on the target page might break.

Step 2. Get urls of products we want to parse "feedback" on

We need to remember that when we fetch pages through ajax, their javascript won't be parsed and executed, so we need to do that manually. Because the "Feedback" tab is loaded dynamically with javascript, we won't get the "Feedback" data in the html when we fetch a product page. We will handle that in the next step. For now the parser code is:

    var productsNum = 5;

    function parse() {
        var urls = $jq('a.product');
        for (var i = 0; i < productsNum && i < urls.length; i++) {
            var url = $jq(urls[i]).attr('href'); // get url
            console.log('Fetching: ' + url); // debug note
            // make ajax request
            $jq.ajax({
                url: url,
                async: false, // do it synchronously
                success: function (data) { // data contains the fetched html
                    var parsedDom = $jq(data);
                    // check if it works
                    console.log('[TEST] item price: ' + $jq('#sku-price', parsedDom).html());
                }
            });
        }
    }

Step 3. Find a way to fetch feedback (because it's dynamically fetched through ajax)

First of all we need to find the url that the http requests for feedback data go to. To do that, open the Network tab of Developer Tools, check the "Documents" and "XHR" checkboxes (we don't need scripts, images, fonts etc.) and click the "Feedback" tab on the web page.
We can see a couple of interesting urls like:

    http://www.aliexpress.com/store/productGroupsAjax.htm?storeId=413596 [with JSON response]
    http://www.aliexpress.com/findRelatedProducts.htm?productId=733919144&type=new [with JSON response]

But what we are looking for is:

    http://feedback.aliexpress.com/display/productEvaluation.htm?productId=733919144&ownerMemberId=201779865&companyId=214347019&memberType=seller&startValidDate=&i18n=true

It returns a raw HTML response. When we look into "response" we see that this is exactly what we are looking for. Now we need to take a closer look at the parameters in the url:

    productId=733919144
    ownerMemberId=201779865
    companyId=214347019
    memberType=seller
    startValidDate=
    i18n=true

We can extract productId from the product url. For example, in http://www.aliexpress.com/item/Hot-Sale-Wireless-N-Networking-Device-Wifi-Wi-Fi-Repeater-Booster-Router-Range-Expander-300Mbps-2dBi/733919144.html the product id is 733919144.

Only two of them are unknown: ownerMemberId and companyId. However, if we look in the product page source code we will find them inside a script tag (ownerMemberId is the adminSeq value):

    ...
    window.runParams.adminSeq="201779865";
    window.runParams.companyId="214347019";
    ...

We need to get them directly from the html code. I'll use regular expressions:

    ...
    var rx = /window\.runParams\.adminSeq="(\d+)"/;
    var arr = rx.exec(data); // data contains product page html
    var adminSeq = arr[1];
    rx = /window\.runParams\.companyId="(\d+)"/;
    arr = rx.exec(data);
    var companyId = arr[1];
    console.log('Parsed runParams: ' + adminSeq + ' ' + companyId);
    ...

If you look closer you can see that productId is also in the source code in window.runParams, so we will get it the same way as adminSeq and companyId.
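The three lookups differ only in the parameter name, and exec() returns null when a page does not contain the value, so the extraction can be wrapped in one small function. A sketch under that assumption; extractRunParam is a hypothetical helper, not code from the product page or from jquery:

```javascript
// Hypothetical helper (not in the original post): read one numeric
// window.runParams value out of raw html, or return null when it is absent.
function extractRunParam(html, name) {
  // Escape regex metacharacters in the name before building the pattern.
  var safeName = name.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  var rx = new RegExp('window\\.runParams\\.' + safeName + '="(\\d+)"');
  var match = rx.exec(html);
  return match ? match[1] : null;
}

// Sample text copied from the script tag fragment of the product page.
var sample = 'window.runParams.adminSeq="201779865"; window.runParams.companyId="214347019";';
console.log(extractRunParam(sample, 'adminSeq'));
console.log(extractRunParam(sample, 'companyId'));
console.log(extractRunParam(sample, 'productId')); // not present in the sample
```

Returning null instead of reading arr[1] directly lets the caller skip a product whose page layout differs, rather than stopping the whole loop with a TypeError.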
The parse() function now looks like this:

    var productsNum = 5;

    function parse() {
        var urls = $jq('a.product');
        for (var i = 0; i < productsNum && i < urls.length; i++) {
            var url = $jq(urls[i]).attr('href'); // get url
            console.log('Fetching: ' + url); // debug note
            // make ajax request
            $jq.ajax({
                url: url,
                async: false, // do it synchronously
                success: function (data) { // data contains the fetched product page html
                    // we don't need a parsed DOM since we run regular expressions on the raw html
                    // construct feedbackUrl:
                    var rx = /window\.runParams\.adminSeq="(\d+)"/;
                    var arr = rx.exec(data);
                    var adminSeq = arr[1];
                    rx = /window\.runParams\.companyId="(\d+)"/;
                    arr = rx.exec(data);
                    var companyId = arr[1];
                    rx = /window\.runParams\.productId="(\d+)"/;
                    arr = rx.exec(data);
                    var productId = arr[1];
                    var feedbackUrl = 'http://feedback.aliexpress.com/display/productEvaluation.htm?productId=' + productId + '&ownerMemberId=' + adminSeq + '&companyId=' + companyId + '&memberType=seller&startValidDate=&i18n=true';
                    console.log('Feedback url: ' + feedbackUrl);
                    // here we'll make another ajax call to fetch feedback data
                }
            });
        }
    }

Step 4. Final step: avoiding origin policy checking, parsing the feedback html and checking for the searched country

If we try to make an ajax call to the prepared feedbackUrl in our parse() function, we will see the browser error "Origin http://www.aliexpress.com is not allowed by Access-Control-Allow-Origin" in the console. In Google Chrome we can bypass it by adding --args --disable-web-security when we launch the binary.
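The feedbackUrl concatenated in parse() above can also be assembled with the standard URLSearchParams class, which keeps the parameter list readable and encodes the values. A sketch, assuming the parameter names observed in the Network tab; buildFeedbackUrl is a hypothetical name:

```javascript
// Hypothetical helper: build the feedback url from the three parsed ids.
// The fixed values (memberType, startValidDate, i18n) are copied from the
// request observed in the Network tab.
function buildFeedbackUrl(productId, adminSeq, companyId) {
  var params = new URLSearchParams({
    productId: productId,
    ownerMemberId: adminSeq, // ownerMemberId is the adminSeq value
    companyId: companyId,
    memberType: 'seller',
    startValidDate: '',
    i18n: 'true'
  });
  return 'http://feedback.aliexpress.com/display/productEvaluation.htm?' + params.toString();
}

var exampleUrl = buildFeedbackUrl('733919144', '201779865', '214347019');
console.log(exampleUrl);
```

With the ids from the example product, this produces the same address as the one found in the Network tab.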
Looking into the feedback html, we can see that the flag indicating the user's country is marked up as follows:

    <span class="state"><b class="css_flag css_br"></b></span>

A simple jquery selector will do the job:

    $jq('b.css_'+countryCode);

The final code is:

    // jquery injection
    var $jq; // jquery handle to avoid $ conflicts

    function injectJquery() {
        var script = document.createElement('script');
        script.setAttribute('type', 'text/javascript');
        script.setAttribute('src', '//ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js'); // fetch it from Google's CDN
        // Give $ back to whatever took it before; create new alias to jQuery.
        script.onload = function () { $jq = jQuery.noConflict(); };
        document.body.insertBefore(script, document.body.firstChild);
    }

    injectJquery();

    // parsing
    var productsNum = 5;

    function parse(country) {
        var urls = $jq('a.product');
        for (var i = 0; i < productsNum && i < urls.length; i++) {
            var url = $jq(urls[i]).attr('href'); // get url
            console.log('Fetching: ' + url); // debug note
            // make ajax request
            $jq.ajax({
                url: url,
                async: false, // do it synchronously
                success: function (data) { // data contains the fetched product page html
                    // we don't need a parsed DOM since we run regular expressions on the raw html
                    // construct feedbackUrl:
                    var rx = /window\.runParams\.adminSeq="(\d+)"/;
                    var arr = rx.exec(data);
                    var adminSeq = arr[1];
                    rx = /window\.runParams\.companyId="(\d+)"/;
                    arr = rx.exec(data);
                    var companyId = arr[1];
                    rx = /window\.runParams\.productId="(\d+)"/;
                    arr = rx.exec(data);
                    var productId = arr[1];
                    var feedbackUrl = 'http://feedback.aliexpress.com/display/productEvaluation.htm?productId=' + productId + '&ownerMemberId=' + adminSeq + '&companyId=' + companyId + '&memberType=seller&startValidDate=&i18n=true';
                    // get the feedback page and check for the searched country
                    // (this request needs web security disabled in Google Chrome)
                    $jq.ajax({
                        url: feedbackUrl,
                        async: false,
                        success: function (feedbackHtml) {
                            // check if an element with the css_<country> class exists
                            if ($jq('b.css_' + country, $jq(feedbackHtml)).length) {
                                console.log('[FOUND] item: ' + url);
                            }
                        }
                    });
                }
            });
        }
    }

We execute it with parse('pl') when we want to check whether there is feedback from Poland.

Some thoughts

In the example above we operated on raw html. Working with json is a lot easier, because we don't need regular expressions, jquery selectors, etc. to fetch the data. Another thing is that we don't have to write the results to the console log. We can inject a div into the web page and store the results in it.

Labels: crawling, javascript, parsing