Wednesday, October 26, 2011

Google maps load markers into grid

A problem I recently encountered was displaying more than 400 markers on a single map. Reading through the Google Maps API documentation, I found the following post regarding this problem: http://code.google.com/apis/maps/.... The first solution mentioned is grid-based clustering, which fits my needs. However, the docs do not explain how to create such a grid. Here I explain what data is needed to create a grid and how to load the markers into it.

First we need the boundaries of the current map:

var max_lng = map.getBounds().getNorthEast().lng();
var max_lat = map.getBounds().getNorthEast().lat();
var min_lng = map.getBounds().getSouthWest().lng();
var min_lat = map.getBounds().getSouthWest().lat();

These variables must be sent to the page that returns the markers; this is probably already the case. That page selects the markers inside the bounds and sets up the variables for the grid:

<!--- min_lat, max_lat, min_lng and max_lng must be validated as numeric before they reach this query --->
<cfquery name="GetMarkers" datasource="source">
SELECT id, lat, lng
FROM markers
WHERE
(<cfif min_lat lt max_lat>lat <= #max_lat# AND lat >= #min_lat#<cfelse>lat <= #max_lat# OR lat >= #min_lat#</cfif>) AND
(<cfif min_lng lt max_lng>lng <= #max_lng# AND lng >= #min_lng#<cfelse>lng <= #max_lng# OR lng >= #min_lng#</cfif>)
</cfquery>

<!--- Root of the number of grids: for example 10 means 100 grids --->
<cfset grid_count_root = 30 />
<cfset grid_min_lng = min_lng />
<cfset grid_min_lat = min_lat />
<!--- Array that contains the grid --->
<cfset grid_arr = arrayNew(2) />
<cfset grid_arr_lng = arrayNew(2) />
<cfset grid_arr_lat = arrayNew(2) />
<cfset grid_with_data = arrayNew(1) />
<cfset grid_lng_div = abs((min_lng-max_lng)/grid_count_root) />
<cfset grid_lat_div = abs((min_lat-max_lat)/grid_count_root) />
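The bounds test in the query has two cases: normally a coordinate must lie between the minimum and the maximum, but when the minimum is larger than the maximum (the map spans the 180° meridian) the range wraps around and the test becomes an OR. As a rough sketch, the same check can be written in JavaScript like this. The names inRange and inBounds and the shape of the bounds object are my own, chosen for illustration only.

```javascript
// Hypothetical helper: true when value lies in [min, max]. When min is not
// smaller than max the range is treated as wrapping around, as in the query.
function inRange(value, min, max) {
  return min < max
    ? value >= min && value <= max
    : value <= max || value >= min;
}

// A marker is visible when both its latitude and its longitude are in range.
function inBounds(marker, bounds) {
  return inRange(marker.lat, bounds.minLat, bounds.maxLat) &&
         inRange(marker.lng, bounds.minLng, bounds.maxLng);
}

// Example: a map that crosses the 180° meridian (minLng > maxLng).
const wrappedBounds = { minLat: 50, maxLat: 54, minLng: 170, maxLng: -170 };
console.log(inBounds({ lat: 52, lng: 175 }, wrappedBounds));
console.log(inBounds({ lat: 52, lng: 0 }, wrappedBounds));
```

Note that the grid code in this post computes the cell width from abs(min_lng - max_lng), so it assumes the map does not wrap in this way.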

Now we are ready to create the grid:

<cfloop from="1" to="#grid_count_root#" index="i">
 <!--- Lower and upper edge of column i (lng) and of row i (lat) --->
 <cfset grid_arr_lng[i][1] = grid_min_lng />
 <cfset grid_arr_lng[i][2] = grid_min_lng + grid_lng_div />
 <cfset grid_arr_lat[i][1] = grid_min_lat />
 <cfset grid_arr_lat[i][2] = grid_min_lat + grid_lat_div />
 <cfset grid_min_lng = grid_min_lng + grid_lng_div />
 <cfset grid_min_lat = grid_min_lat + grid_lat_div />
</cfloop>

<!--- Combine every lng column with every lat row into one flat list of cells.
      Per cell: 1-2 = lng min/max, 3-4 = lat min/max,
      5-7 = id/lat/lng of the first marker, 8 = number of markers --->
<cfloop from="1" to="#arraylen(grid_arr_lng)#" index="i">
 <cfloop from="1" to="#arraylen(grid_arr_lng)#" index="j">
  <cfset cur_pos = arrayLen(grid_arr)+1 />
  <cfset grid_arr[cur_pos][1] = grid_arr_lng[i][1] />
  <cfset grid_arr[cur_pos][2] = grid_arr_lng[i][2] />
  <cfset grid_arr[cur_pos][3] = grid_arr_lat[j][1] />
  <cfset grid_arr[cur_pos][4] = grid_arr_lat[j][2] />
  <cfset grid_arr[cur_pos][5] = '' />
  <cfset grid_arr[cur_pos][6] = '' />
  <cfset grid_arr[cur_pos][7] = '' />
  <cfset grid_arr[cur_pos][8] = 0 />
 </cfloop>
</cfloop>

Almost there. Now we load the markers into the grid:

<cfloop from="1" to="#GetMarkers.recordcount#" index="i">
 <!--- Column of the marker, converted to the position of that column's first cell --->
 <cfset lng_grid_pos = (GetMarkers.lng[i] - min_lng) / grid_lng_div />

 <cfif lng_grid_pos gt 0><cfset lng_grid_pos = (ceiling(lng_grid_pos - 1) * grid_count_root) + 1 />
 <cfelse><cfset lng_grid_pos = 1 /></cfif>

 <!--- Zero-based row of the marker inside that column --->
 <cfset lat_grid_pos = (GetMarkers.lat[i] - min_lat) / grid_lat_div />

 <cfif lat_grid_pos gt 0><cfset lat_grid_pos = ceiling(lat_grid_pos - 1) />
 <cfelse><cfset lat_grid_pos = 0 /></cfif>

 <cfset grid_pos = lng_grid_pos + lat_grid_pos />
 <!--- The first marker in a cell represents the cell; later ones only raise the count --->
 <cfif grid_arr[grid_pos][5] eq ''>
  <cfset grid_arr[grid_pos][5] = GetMarkers.id[i] />
  <cfset grid_arr[grid_pos][6] = GetMarkers.lat[i] />
  <cfset grid_arr[grid_pos][7] = GetMarkers.lng[i] />
  <cfset grid_with_data[arraylen(grid_with_data)+1] = grid_pos />
 </cfif>
 <cfset grid_arr[grid_pos][8] = grid_arr[grid_pos][8]+1 />
</cfloop>
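The whole idea boils down to turning each marker's offset from the south-west corner into a column and a row, and keeping one representative marker plus a count per cell. Below is a minimal sketch of that calculation in JavaScript, for example for clustering on the client instead of on the server. The function names, the bounds object and the zero-based cell key are my own assumptions; clamping the index to the last cell is also an addition that the ColdFusion code above does not have.

```javascript
// Zero-based index of the cell that position pos (offset / cell size) falls in,
// clamped to 0..n-1 so markers exactly on an edge still land in a valid cell.
function cellIndex(pos, n) {
  return Math.min(n - 1, Math.max(0, Math.ceil(pos) - 1));
}

// Groups markers into a gridCountRoot x gridCountRoot grid over the bounds.
// Returns a Map from cell key to { id, lat, lng, count }, where id/lat/lng
// belong to the first marker that landed in the cell.
function clusterIntoGrid(markers, bounds, gridCountRoot) {
  const lngDiv = Math.abs(bounds.maxLng - bounds.minLng) / gridCountRoot;
  const latDiv = Math.abs(bounds.maxLat - bounds.minLat) / gridCountRoot;
  const cells = new Map();
  for (const m of markers) {
    const col = cellIndex((m.lng - bounds.minLng) / lngDiv, gridCountRoot);
    const row = cellIndex((m.lat - bounds.minLat) / latDiv, gridCountRoot);
    const key = col * gridCountRoot + row;
    const cell = cells.get(key);
    if (cell) {
      cell.count += 1;
    } else {
      cells.set(key, { id: m.id, lat: m.lat, lng: m.lng, count: 1 });
    }
  }
  return cells;
}

const demoCells = clusterIntoGrid(
  [{ id: 1, lat: 0.5, lng: 0.5 }, { id: 2, lat: 0.7, lng: 0.2 }, { id: 3, lat: 9.5, lng: 9.5 }],
  { minLat: 0, maxLat: 10, minLng: 0, maxLng: 10 },
  10
);
console.log(demoCells.size); // 2: markers 1 and 2 share a cell
```

Only the cells that actually contain markers end up in the Map, which plays the same role as the grid_with_data array: you draw one marker per entry and can show the count as a label.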

Wednesday, June 22, 2011

Extracting text / reading text from a secured pdf in coldfusion

So you want to extract text from a PDF that has some form of security on it. For example, the PDF has the following properties:

Author someone
CenterWindowOnScreen [empty string]
ChangingDocument Not Allowed
Commenting Not Allowed
ContentExtraction Not Allowed
CopyContent Not Allowed

You have probably already figured out that this is not possible with the standard cfpdf tag introduced in ColdFusion 8. You will get the following error:

An error occurred during EXTRACTTEXT operation in .
Error: The password provided is either wrong or does not have sufficient permission to perform this action.

However, it is possible. A ColdFusion programmer called Raymond Camden has written a ColdFusion component that bypasses the standard security restrictions on a PDF. This component is called pdfutils, and you can find its source code and more about the author online. The pdfutils component uses DDX to bypass the security restrictions. The installation is very straightforward, so it should not take you long before you can see the results yourself.

Wednesday, March 9, 2011

Building a coldfusion based spider - Crash course

So you are interested in building a spider in ColdFusion. I will try to explain the basic principles of building your own spider. First we have to access data on the web. Below is a simple example of downloading data through the cfhttp tag.

<cfhttp method="get" timeout="30" redirect="no" getasbinary="yes" url="http://coldfusion9.blogger.com" charset="utf-8" userAgent="Mozilla/5.0 (Windows; U; Windows NT 6.1; nl; rv: Gecko/20110303 Firefox/3.6.15">
 <cfhttpparam type="header" name="Accept-Encoding" value="no-compression">
 <cfhttpparam type="header" name="Cache-Control" value="no-cache">
</cfhttp>

You might have noticed that I set the getasbinary attribute to yes. The reason for this is to be able to convert ISO-8859-1 (or any other character set) to UTF-8. One of the many problems you will encounter is determining which character set the spidered page is using. In many cases the cfhttp tag won't return the proper character set. The code below finds the proper character set and, if necessary, converts the content to UTF-8.

<cfif cfhttp.errorDetail eq ''>
 <cfset mycharset = cfhttp.Charset>
 <cfset filec = ToString(cfhttp.fileContent)>
 <!--- No charset in the response header: look for one in a META tag, default to ISO-8859-1 --->
 <cfif len(mycharset) eq 0>
  <cfset mycharset = "ISO-8859-1">
  <cfset pat = "(?i)<META(.*)charset=\s*([^\s|^""|^']*)">
  <cfif refindnocase(pat,filec) gt 0>
   <cfset local_re = refindnocase(pat,filec,1,true)>
   <cfif local_re.len[3] gt 0>
    <cfset mycharset = Mid(filec,local_re.pos[3],local_re.len[3])>
   <cfelse>
    <cfset mycharset = Mid(filec,local_re.pos[2],local_re.len[2])>
    <cfset mycharset = trim(rereplacenocase(mycharset,'.*charset=([^>^' & "'" & '^"]+).*','\1','one'))>
   </cfif>
  </cfif>
 </cfif>
 <!--- Decode the raw bytes with the detected charset; fall back to the default conversion --->
 <cfif mycharset neq 'utf-8'>
  <cftry>
   <cfset filec = CharsetEncode(cfhttp.fileContent, mycharset)>
   <cfset filec = ToString(filec)>
   <cfcatch type="any">
    <cfset filec = ToString(cfhttp.fileContent)>
   </cfcatch>
  </cftry>
 </cfif>
</cfif>
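The same three steps (trust the response header if it names a charset, otherwise look for a META tag, otherwise assume ISO-8859-1) are not specific to ColdFusion. Here is a small sketch of them in JavaScript for Node.js. The function names detectCharset and decodeBody are made up for this example, and the regular expression is a simplified one, not the pattern used above.

```javascript
// Picks a charset for a downloaded page: the response header wins, then a
// META tag in the first 2048 bytes of the document, then ISO-8859-1.
function detectCharset(bytes, headerCharset) {
  if (headerCharset) return headerCharset.toLowerCase();
  // latin1 maps every byte to one character, so the ASCII markup stays searchable.
  const head = Buffer.from(bytes).toString('latin1', 0, 2048);
  const m = /<meta[^>]*charset\s*=\s*["']?([\w-]+)/i.exec(head);
  return m ? m[1].toLowerCase() : 'iso-8859-1';
}

// Decodes the raw bytes to a JavaScript string using the detected charset.
// If the runtime does not know the charset, fall back to latin1.
function decodeBody(bytes, headerCharset) {
  const charset = detectCharset(bytes, headerCharset);
  try {
    return new TextDecoder(charset).decode(bytes);
  } catch (e) {
    return Buffer.from(bytes).toString('latin1');
  }
}

// Example: a page served without a charset header, encoded as ISO-8859-1.
const rawPage = Buffer.from(
  '<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"></head>' +
  '<body>caf\u00e9</body></html>', 'latin1');
console.log(detectCharset(rawPage, ''));
console.log(decodeBody(rawPage, '').includes('caf\u00e9'));
```

Once the body is a proper string, writing it out again as UTF-8 is a separate and easy step; the hard part is picking the right charset for reading it.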

Okay, we have downloaded content from a URL and converted it to UTF-8. The next step is applying patterns to the content. This is where the fun begins. As of ColdFusion 8 you can use the REMatch function to find and extract patterns. However, I advise you not to use this function because it has some limitations (for example, it returns only the full matches, not the captured groups). Use the Java regex classes instead:

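<cfset objPattern = CreateObject("java","java.util.regex.Pattern").compile(yourpattern) />
<cfset objMatcher = objPattern.matcher(filec) />
<cfset matches = arrayNew(1) />
<cfloop condition="objMatcher.find()">
 <!--- group() is the full match; group(1), group(2), ... are the captured groups --->
 <cfset arrayAppend(matches, objMatcher.group()) />
</cfloop>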
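To make the find-in-a-loop idea concrete, here is a sketch of a typical spider task, pulling the link targets out of a page, written in JavaScript. The function name extractLinks and the pattern are just an illustration; a regular expression like this is good enough for a crash course but will miss unusual markup such as unquoted attributes.

```javascript
// Returns the href value of every <a> tag found in the HTML string.
// matchAll walks over all matches, like calling find() in a loop, and
// m[1] is the first captured group of each match.
function extractLinks(html) {
  const pattern = /<a\s[^>]*href\s*=\s*["']([^"']+)["']/gi;
  return Array.from(html.matchAll(pattern), (m) => m[1]);
}

const sampleHtml = '<a href="/a.html">A</a> <A class="x" HREF=\'b.html\'>B</A>';
console.log(extractLinks(sampleHtml));
```

The captured group is what makes this useful for a spider: you get the URL itself instead of the whole tag.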

Now you have learned some basic principles of spidering and extracting data from a web page. If you like this information and would like to know more about building a web spider in ColdFusion, please leave a comment.

Saturday, January 15, 2011

Convert a relative url to an absolute url in coldfusion

To my surprise there is no built-in function in ColdFusion that converts relative URLs to absolute ones, so I created one. This function relies heavily on regular expressions. If you are not familiar with regular expressions, I suggest you do not change any of the calls that use them (like rereplace and refind). To use the function, simply pass in the relative URL and the absolute URL of the page the relative URL comes from. E.g.
Relative url = "/example.html"
Absolute url = "http://www.example.com/a/b/c/index.html"
Result = http://www.example.com/example.html

Example call:
<cfset absoluteUrl = regex_UrlRelToAbs("/example.html", "http://www.example.com/a/b/c/index.html")>

Complete function
<cffunction name="regex_UrlRelToAbs" returntype="string" output="false">
 <cfargument name="RelativePath" type="string" required="true">
 <cfargument name="AbsolutePath" type="string" required="true">
 <cfset var tempstruct = structnew()>
 <cfset var AbsolutePath_arr = ''>
 <cfset var regex_UrlProperties_oc = 0>
 <cfset var regex_UrlProperties_newF = ''>
 <cfset var i = 0>

 <!--- Decode the HTML entities that can appear in an href attribute --->
 <cfset RelativePath=ReplaceNoCase(RelativePath,"&apos;","'","ALL")>
 <cfset RelativePath=ReplaceNoCase(RelativePath,"&quot;","""","ALL")>
 <cfset RelativePath=ReplaceNoCase(RelativePath,"&lt;","<","ALL")>
 <cfset RelativePath=ReplaceNoCase(RelativePath,"&gt;",">","ALL")>
 <cfset RelativePath=ReplaceNoCase(RelativePath,"&amp;","&","ALL")>

 <!--- Already absolute, or protocol-relative (//host/path) --->
 <cfif isvalid('url',RelativePath)><cfreturn RelativePath></cfif>
 <cfif refind('^\/\/',RelativePath)>
  <cfset RelativePath = rereplace(RelativePath,'^\/\/','','one')>
  <cfreturn "http://#RelativePath#">
 </cfif>

 <!--- Scheme plus host of the page, e.g. http://www.example.com --->
 <cfset tempstruct.domain =  reReplace(AbsolutePath,"(^\w+://)([^\/:]+)[\w\W]*$","\1\2","one") />

 <!--- Folder part of the page, e.g. /a/b/c/ --->
 <cftry>
  <cfset AbsolutePath_arr = refind("^((http[s]?|ftp):\/)?\/?([^:\/\s]+)((\/\w+)*\/)([\w\-\.]+[^##?\s]+)(.+)?(##[\w\-]+)?$",AbsolutePath,1,true)>
  <cfif AbsolutePath_arr.len[5] gt 0>
   <cfset tempstruct.folders =  trim(mid(AbsolutePath,AbsolutePath_arr.pos[5],AbsolutePath_arr.len[5]))>
  <cfelse>
   <cfset tempstruct.folders =  ''>
  </cfif>
  <cfcatch type="any">
   <cfset tempstruct.folders =  ''>
  </cfcatch>
 </cftry>

 <cfif left(RelativePath,1) eq '/'>
  <cfreturn "#tempstruct.domain##RelativePath#">
 <cfelseif left(RelativePath,1) neq '.'>
  <cfreturn "#tempstruct.domain##tempstruct.folders##RelativePath#">
 <cfelse>
  <!--- Only ../ moves up a folder; ./ stays in the current one --->
  <cfset regex_UrlProperties_oc = arraylen(rematch('\.\./',RelativePath))>
  <cfloop from="1" to="#listlen(tempstruct.folders,'/')-regex_UrlProperties_oc#" index="i"><cfset regex_UrlProperties_newF = "#regex_UrlProperties_newF##listgetAt(tempstruct.folders,i,'/')#/"></cfloop>
  <cfreturn "#tempstruct.domain#/#regex_UrlProperties_newF##rereplace(RelativePath,'(\.\./|\./)','','all')#">
 </cfif>
</cffunction>
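If you want to check the results of the function, it helps to compare them with a platform that does have this built in. In JavaScript the standard URL class resolves a relative URL against a base URL, so a few lines are enough. This is a different technique from the regex approach above (no regular expressions involved), and the wrapper name relToAbs is my own.

```javascript
// Resolves relativeUrl against baseUrl with the built-in URL class.
// Returns null when the combination cannot be parsed as a URL.
function relToAbs(relativeUrl, baseUrl) {
  try {
    return new URL(relativeUrl, baseUrl).href;
  } catch (e) {
    return null;
  }
}

const basePage = 'http://www.example.com/a/b/c/index.html';
console.log(relToAbs('/example.html', basePage)); // http://www.example.com/example.html
console.log(relToAbs('../x.html', basePage));     // http://www.example.com/a/b/x.html
console.log(relToAbs('page.html', basePage));     // http://www.example.com/a/b/c/page.html
```

Running the same inputs through both versions is a quick way to find edge cases, such as ports in the host name or relative URLs that start with ./, where a regex-based function can behave differently.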