Problem

If you are working on a service that aggregates place names from different sources, for example Google Places, Foursquare and a local database, you always face the problem of not displaying the same place more than once. Imagine you have Paolo Pizza at Baker Street 43 and all three data sources return it in their search results: you need to show it only once, from the preferred data source.

Possible solution

The solution for me was a two-step algorithm.

If

  1. The distance between the places is less than 200 meters
  2. Their names are similar

Then

       it is the same place

Condition 1 is straightforward: I have lat/long for each place and can calculate the distance with the simple Haversine formula:

    double EARTH_RADIUS = 6371 // mean Earth radius in kilometers

    /**
     * Calculate distance based on the Haversine formula
     * (thanks to http://www.movable-type.co.uk/scripts/latlong.html)
     *
     * @param latitudeFrom
     * @param longitudeFrom
     * @param latitudeTo
     * @param longitudeTo
     * @return distance in kilometers
     */
    BigDecimal calculateDistance(BigDecimal latitudeFrom, BigDecimal longitudeFrom,
                                 BigDecimal latitudeTo, BigDecimal longitudeTo) {

        def dLat = Math.toRadians(latitudeFrom - latitudeTo)
        def dLon = Math.toRadians(longitudeFrom - longitudeTo)

        // a = sin²(Δlat/2) + cos(lat1)·cos(lat2)·sin²(Δlong/2)
        // distance = 2·EARTH_RADIUS·atan2(√a, √(1−a))
        def a = Math.pow(Math.sin(dLat / 2), 2) +
                Math.cos(Math.toRadians(latitudeFrom)) *
                Math.cos(Math.toRadians(latitudeTo)) * Math.pow(Math.sin(dLon / 2), 2)
        return 2 * EARTH_RADIUS * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a))
    }

The most interesting part was finding the best way to compare place names. After some googling I settled on three algorithms, JaroWinkler, SmithWatermanGotoh and Soundex, and a library that supports all of them: SimMetrics. I wrote a test to find the best algorithm for my needs:

import uk.ac.shef.wit.simmetrics.similaritymetrics.JaroWinkler
import uk.ac.shef.wit.simmetrics.similaritymetrics.SmithWatermanGotoh
import uk.ac.shef.wit.simmetrics.similaritymetrics.Soundex

    /**
     * Tests three string comparison algorithms:
     * JaroWinkler, SmithWatermanGotoh and Soundex.
     *
     * Based on the results, Soundex was chosen for the venue name merging algorithm:
     * it is the most usable, since I can say that if similarity > 0.98, it is the same name.
     * (assertTrue comes from the enclosing GroovyTestCase / JUnit test class)
     */
    void testMergeAlgorithms() {
        def cafeName1 = "Antonio Cafe"
        def cafeName2 = "Antonio's Cafe"
        def cafeName3 = "Antonio Kafe"
        def cafeName4 = "Antonio hotel"
        def cafeName5 = "Mary Coffee"
        def algorithm = new JaroWinkler()
        assertTrue algorithm.getSimilarity(cafeName1, cafeName2) == 0.9809524f
        assertTrue algorithm.getSimilarity(cafeName1, cafeName3) == 0.9777778f
        assertTrue algorithm.getSimilarity(cafeName3, cafeName2) == 0.96031743f
        assertTrue algorithm.getSimilarity(cafeName1, cafeName4) == 0.92564106f
        assertTrue algorithm.getSimilarity(cafeName1, cafeName5) == 0.55707073f

        algorithm = new SmithWatermanGotoh()
        assertTrue algorithm.getSimilarity(cafeName1, cafeName2) == 0.9f
        assertTrue algorithm.getSimilarity(cafeName1, cafeName3) == 0.8666667f
        assertTrue algorithm.getSimilarity(cafeName3, cafeName2) == 0.76666665f
        assertTrue algorithm.getSimilarity(cafeName1, cafeName4) == 0.7f
        assertTrue algorithm.getSimilarity(cafeName1, cafeName5) == 0.3272727f

        algorithm = new Soundex()
        assertTrue algorithm.getSimilarity(cafeName1, cafeName2) == 1.0f
        assertTrue algorithm.getSimilarity(cafeName1, cafeName3) == 1.0f
        assertTrue algorithm.getSimilarity(cafeName3, cafeName2) == 1.0f
        assertTrue algorithm.getSimilarity(cafeName1, cafeName4) == 0.9444444f
        assertTrue algorithm.getSimilarity(cafeName1, cafeName5) == 0.5555556f
        assertTrue algorithm.getSimilarity(cafeName2, cafeName5) == 0.5555556f
    }

Just to explain: I want an easy way to see that Antonio Cafe is the same as Antonio's Cafe and Antonio Kafe, but differs from Antonio hotel and Mary Coffee.
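For readers who have not met Soundex: it encodes a word by how it sounds, so different spellings of the same sound get the same code. Below is a minimal Java sketch of the classic four-character American Soundex (Java because the Groovy code here runs on the JVM and can call it directly). `ClassicSoundex` is a hypothetical name, and I have not checked how SimMetrics turns its Soundex codes into a similarity score, so treat this only as an illustration of the encoding.

```java
import java.util.Locale;

/** Classic American Soundex: first letter plus three digits, e.g. "Robert" -> "R163". */
final class ClassicSoundex {
    private ClassicSoundex() {}

    static String encode(String input) {
        // Keep letters only, so "Antonio's Cafe" is treated as "ANTONIOSCAFE".
        String s = input.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) {
            return "";
        }
        StringBuilder code = new StringBuilder().append(s.charAt(0));
        char previous = digit(s.charAt(0));
        for (int i = 1; i < s.length() && code.length() < 4; i++) {
            char c = s.charAt(i);
            if (c == 'H' || c == 'W') {
                continue; // H and W do not separate letters with the same code
            }
            char d = digit(c);
            if (d != '0' && d != previous) {
                code.append(d);
            }
            previous = d; // a vowel ('0') allows the next equal code to be written again
        }
        while (code.length() < 4) {
            code.append('0'); // pad short codes, e.g. "Lee" -> "L000"
        }
        return code.toString();
    }

    private static char digit(char c) {
        switch (c) {
            case 'B': case 'F': case 'P': case 'V': return '1';
            case 'C': case 'G': case 'J': case 'K': case 'Q': case 'S': case 'X': case 'Z': return '2';
            case 'D': case 'T': return '3';
            case 'L': return '4';
            case 'M': case 'N': return '5';
            case 'R': return '6';
            default: return '0'; // vowels, Y (and H/W, which encode() skips)
        }
    }
}
```

Note that the classic code of a whole name is too coarse on its own: "Antonio Cafe", "Antonio Kafe" and "Antonio hotel" all encode to A535. A graded similarity score, like the one SimMetrics returns, is presumably what lets a threshold such as 0.98 tell them apart.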

The Soundex algorithm turned out to be the most appropriate for my needs, and in code I use a threshold of 0.98 when comparing two names:

import uk.ac.shef.wit.simmetrics.similaritymetrics.Soundex

boolean isSamePlace = false
Soundex soundexAlgorithm = new Soundex()
// distanceBetweenVenues() is expected to return meters here (calculateDistance() above returns kilometers)
if (distanceBetweenVenues(place1, place2) < 200 && soundexAlgorithm.getSimilarity(placeName1, placeName2) > 0.98) {
    isSamePlace = true
}
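Putting both conditions together, here is a hedged Java sketch of the whole check. `Place` and `PlaceMatcher` are hypothetical; the distance is computed in meters (unlike `calculateDistance()` above, which returns kilometers), and the name comparison is passed in as a function so that, for example, a SimMetrics metric's `getSimilarity()` can be plugged in.

```java
import java.util.function.ToDoubleBiFunction;

/** Hypothetical place: a name plus coordinates in degrees. */
final class Place {
    final String name;
    final double lat;
    final double lon;

    Place(String name, double lat, double lon) {
        this.name = name;
        this.lat = lat;
        this.lon = lon;
    }
}

/** Hypothetical matcher: "same place" = closer than maxDistanceMeters AND similar names. */
final class PlaceMatcher {
    static final double EARTH_RADIUS_METERS = 6_371_000.0;

    private final double maxDistanceMeters;
    private final double minNameSimilarity;
    private final ToDoubleBiFunction<String, String> nameSimilarity;

    PlaceMatcher(double maxDistanceMeters, double minNameSimilarity,
                 ToDoubleBiFunction<String, String> nameSimilarity) {
        this.maxDistanceMeters = maxDistanceMeters;
        this.minNameSimilarity = minNameSimilarity;
        this.nameSimilarity = nameSimilarity;
    }

    /** Haversine distance in meters. */
    static double distanceMeters(Place a, Place b) {
        double dLat = Math.toRadians(b.lat - a.lat);
        double dLon = Math.toRadians(b.lon - a.lon);
        double h = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(a.lat)) * Math.cos(Math.toRadians(b.lat))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_METERS * Math.atan2(Math.sqrt(h), Math.sqrt(1 - h));
    }

    boolean isSamePlace(Place a, Place b) {
        // Cheap geometric check first; names are compared only for nearby places.
        return distanceMeters(a, b) < maxDistanceMeters
                && nameSimilarity.applyAsDouble(a.name, b.name) > minNameSimilarity;
    }
}
```

Checking the distance first keeps the name comparison for candidates that are already close to each other.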

That’s our way of solving this problem. If you have experience with similar problems, please share it in the comments.

 

Task

We need to show the user two select boxes:

  • Country
  • City

The City select box should change its contents based on the selected Country.
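Whatever the data source, the City box needs a lookup from a country code to a short list of cities. Here is a minimal Java sketch of such a lookup; `CityRow` and `CityIndex` are hypothetical, and using population to pick the most important cities is an assumption (the data set chosen below does include population).

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical row of the imported cities table. */
final class CityRow {
    final String countryCode; // two-letter ISO country code, e.g. "US"
    final String name;
    final long population;

    CityRow(String countryCode, String name, long population) {
        this.countryCode = countryCode;
        this.name = name;
        this.population = population;
    }
}

/** Hypothetical lookup behind the City select box: country code -> biggest cities first. */
final class CityIndex {
    private final Map<String, List<CityRow>> byCountry;

    CityIndex(List<CityRow> rows) {
        this.byCountry = rows.stream().collect(Collectors.groupingBy(r -> r.countryCode));
    }

    List<String> topCities(String countryCode, int limit) {
        return byCountry.getOrDefault(countryCode, Collections.emptyList()).stream()
                .sorted(Comparator.comparingLong((CityRow r) -> r.population).reversed())
                .limit(limit)
                .map(r -> r.name)
                .collect(Collectors.toList());
    }
}
```

The action behind the Country select box could then return `topCities(countryCode, limit)` as JSON to refill the City box.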

Possible solutions

Online Webservice

That was my first idea: just find a suitable (preferably free!) API service and use it from the Grails project. But after a lot of googling I did not find any service that returns all cities of a country in one API call; in fact, I did not find any API doing this at all.

The only API that looked close was GeoNames, but it does not return the cities of a country. Instead it has parent/child associations, like USA -> States -> Counties -> Cities. You need to:

  1. call http://ws.geonames.org/countryInfoJSON?country=US&formatted=true and take the geonameId
  2. call http://api.geonames.org/childrenJSON?geonameId=6252001&username=demo&formatted=true
  3. repeat step 2 for each child until you reach cities
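To make the cost concrete, here is a minimal Java sketch of that traversal. The HTTP call is hidden behind a hypothetical `ChildrenSource` interface (one call per `childrenJSON` request), and the walk stops at children whose GeoNames feature class is "P" (populated places). The class names and the stop rule are my assumptions, not part of the GeoNames API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical node of the GeoNames hierarchy. */
final class GeoNode {
    final long geonameId;
    final String name;
    final String featureClass; // GeoNames feature class, "P" = populated place

    GeoNode(long geonameId, String name, String featureClass) {
        this.geonameId = geonameId;
        this.name = name;
        this.featureClass = featureClass;
    }
}

/** Hypothetical abstraction over one childrenJSON request. */
interface ChildrenSource {
    List<GeoNode> children(long geonameId);
}

/** Walks country -> state -> county -> ... and counts how many requests it needed. */
final class CityCollector {
    private int calls;

    List<String> citiesUnder(ChildrenSource source, long rootId) {
        List<String> cities = new ArrayList<>();
        walk(source, rootId, cities);
        return cities;
    }

    private void walk(ChildrenSource source, long id, List<String> cities) {
        calls++; // one HTTP round trip per visited non-city node
        for (GeoNode child : source.children(id)) {
            if ("P".equals(child.featureClass)) {
                cities.add(child.name); // reached a city: stop descending
            } else {
                walk(source, child.geonameId, cities); // state, county...: one more request
            }
        }
    }

    int calls() {
        return calls;
    }
}
```

Even a small tree with two states and two counties needs five requests; for a real country it means one request per state and per county.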

This is not acceptable when all I want is two select boxes, Country and City.

Database of Cities

The next solution was to take an existing dataset from anywhere, in any format, and load it into the database.

The most interesting free databases are:

  1. http://www.geodatasource.com/world-cities-database/free - contains 2 million cities in text format, but the file maps CC_FIPS -> City, so it has to be transformed into CountryCode (ISO 3166-1 alpha-2) -> City using a CC_FIPS -> CountryCode mapping
  2. http://www.maxmind.com/app/worldcities - a huge database of cities in CSV format with lat/long for each city. I dismissed it because it is really huge: 3 million cities. I did not want to display all of a country's cities, as that is too many; only the 10–200 most important ones per country. Really cool data, but not usable for me
  3. http://mydatamaster.com/free-downloads/  (World Cities and Languages) – contains about 4000 cities with district/region and population info in SQL format. That was my case! A little reprocessing of the SQL to change ISO 3166-1 alpha-3 country codes into alpha-2 codes, and that was it!
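On the JVM that code conversion does not need a hand-made table, because the JDK knows both ISO 3166-1 code forms. A minimal Java sketch (the class name is hypothetical; which SQL column holds the code depends on the dump):

```java
import java.util.HashMap;
import java.util.IllformedLocaleException;
import java.util.Locale;
import java.util.Map;
import java.util.MissingResourceException;

final class CountryCodes {
    private CountryCodes() {}

    /** Builds an ISO 3166-1 alpha-3 -> alpha-2 map (e.g. "USA" -> "US") from the JDK's locale data. */
    static Map<String, String> alpha3ToAlpha2() {
        Map<String, String> map = new HashMap<>();
        for (String alpha2 : Locale.getISOCountries()) {
            try {
                String alpha3 = new Locale.Builder().setRegion(alpha2).build().getISO3Country();
                if (!alpha3.isEmpty()) {
                    map.put(alpha3, alpha2);
                }
            } catch (MissingResourceException | IllformedLocaleException e) {
                // this JDK has no alpha-3 code for the region: skip it
            }
        }
        return map;
    }
}
```

Each alpha-3 value in the dump can then be replaced by its looked-up alpha-2 value before the import.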

Of course, this way we do not show all cities, and we risk the data becoming outdated in a few years. But honestly, I do not think that will be a problem: in our changing world, sites are rewritten at least every 3 years, and the data will be updated then.

 

After migrating to a new PROD server, we started getting this error after periods of inactivity:

2012-04-11 14:27:29,415 [TP-Processor3] ERROR (org.hibernate.util.JDBCExceptionReporter) - Communications link failure due to underlying exception: 

** BEGIN NESTED EXCEPTION ** 

java.io.EOFException
MESSAGE: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.

STACKTRACE:

java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
	at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1997)
	at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2411)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2916)
	at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1631)
	at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1723)
	at com.mysql.jdbc.Connection.execSQL(Connection.java:3256)
	at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1313)
	at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1448)
	at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
	at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
	at org.hibernate.jdbc.AbstractBatcher.getResultSet(AbstractBatcher.java:208)
	at org.hibernate.loader.Loader.getResultSet(Loader.java:1808)
	at org.hibernate.loader.Loader.doQuery(Loader.java:697)

On the previous server I had already hit a Broken Pipe error with MySQL and solved it with http://sacharya.com/grails-dbcp-stale-connections/
But on the new server (Red Hat Enterprise Linux) that fix was not enough.

After a lot of digging I found http://stackoverflow.com/questions/2983248/com-mysql-jdbc-exceptions-jdbc4-communicationsexception-communications-link-fai and some other topics that helped.

The resulting config that fixes the problem is:

  • On the server, added ‘mysqld : ALL : ACCEPT’ to /etc/hosts.allow
  • On the server, added ‘port=3306’ to /etc/my.cnf
  • In DataSource.groovy, added properties
  • In DataSource.groovy, changed the database URL to use the IP address (127.0.0.1) instead of localhost: ‘jdbc:mysql://127.0.0.1:3306/db_name’
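For background: MySQL closes connections that stay idle longer than its `wait_timeout`, and a pool that later hands out such a connection typically fails with exactly this kind of EOFException. Pools like DBCP can check a connection before handing it out (its `testOnBorrow` and `validationQuery` settings). The following Java sketch only illustrates that idea with plain JDBC's `Connection.isValid()`; `ValidatingPool` is hypothetical and is not DBCP code.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Hypothetical minimal pool illustrating "test on borrow" for idle connections. */
final class ValidatingPool {
    private final Deque<Connection> idle = new ArrayDeque<>();
    private final Supplier<Connection> factory;
    private final int validationTimeoutSeconds;

    ValidatingPool(Supplier<Connection> factory, int validationTimeoutSeconds) {
        this.factory = factory;
        this.validationTimeoutSeconds = validationTimeoutSeconds;
    }

    Connection borrow() {
        while (!idle.isEmpty()) {
            Connection candidate = idle.pop();
            if (isAlive(candidate)) {
                return candidate; // still usable: reuse it
            }
            closeQuietly(candidate); // e.g. dropped by the server after an idle timeout
        }
        return factory.get(); // nothing usable left: open a fresh connection
    }

    void release(Connection connection) {
        idle.push(connection);
    }

    private boolean isAlive(Connection c) {
        try {
            return c.isValid(validationTimeoutSeconds);
        } catch (SQLException e) {
            return false;
        }
    }

    private static void closeQuietly(Connection c) {
        try {
            c.close();
        } catch (SQLException ignored) {
            // already broken; nothing more to do
        }
    }
}
```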

That’s all!

 

Problem

Often we need to create a RESTful API to READ data for existing domain objects, with easy manipulation of their fields. For example, we do not want fields like ‘id’, ‘lastUpdated’, ‘createdBy’, collections, references and other internal fields in the API result. We also want an easy way to rename a field: for example, ‘imageUrl’ instead of ‘image’, because we put a URL there. And of course we want to transform fields in different ways, like building a full URL or calculating whether a shop is open right now.

Solution

Of course, we can do this with simple code like

But when an object has more than 20 fields, you need to send complex structures (for example, send the ‘address’ and ‘contact’ references but not ‘currentLocation’), and you have more than 10 such objects, it becomes a nightmare! Maintaining it is even worse!

So, to solve this problem we created a rather simple (at least looking at it now) solution.
First, define something like this in UrlMappings:

So all READ API calls will look like
‘/appName/api/users’ or ‘/appName/api/venues?format=json&max=10&offset=10’
In ‘ApiJsonRestController’ we need to turn domainInPlural into the real domain class name (through a simple mapping or a more complicated rule, up to you) and just render it to the response:

As a result you get a list of domain instances with all their fields: very ugly.
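The domainInPlural lookup can start as simple as this Java sketch. `DomainNameResolver` is hypothetical: explicit entries handle irregular plurals, and a naive English rule ("-ies" to "-y", otherwise drop a trailing "s") covers the rest.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical resolver: "venues" -> "Venue", "categories" -> "Category". */
final class DomainNameResolver {
    private final Map<String, String> explicit; // lower-case plural -> class name

    DomainNameResolver(Map<String, String> explicit) {
        this.explicit = explicit;
    }

    Optional<String> resolve(String domainInPlural) {
        if (domainInPlural == null || domainInPlural.isEmpty()) {
            return Optional.empty();
        }
        String key = domainInPlural.toLowerCase(Locale.ROOT);
        if (explicit.containsKey(key)) {
            return Optional.of(explicit.get(key)); // irregular plurals win
        }
        String singular;
        if (key.endsWith("ies") && key.length() > 3) {
            singular = key.substring(0, key.length() - 3) + "y";
        } else if (key.endsWith("s") && key.length() > 1) {
            singular = key.substring(0, key.length() - 1);
        } else {
            return Optional.empty(); // not a plural this naive rule understands
        }
        return Optional.of(Character.toUpperCase(singular.charAt(0)) + singular.substring(1));
    }
}
```

Using only the explicit map (a whitelist) would also keep domains you never meant to publish out of the API.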

Extend DomainClassMarshaller

Now comes the most interesting part.
We need to override the default Grails DomainClassMarshaller, so that ours is used instead during object rendering.

It is a copy of DomainClassMarshaller with several modifications:

  • We control whether to output the id of an object via the protected needToDefineId()
  • We control the object id itself via the protected getObjectIdentifier()
  • processAdditionalFields() lets you add any fields you want that are not in the domain
  • We control which fields to skip and their alternative names via the protected methods getSkippedFields() and getAlternativeName(String originalName)
  • We control how references to other domains and collections are displayed via processSpecificFields()
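Stripped of Grails, the idea is a template method: the base class walks the properties and asks protected hooks what to skip, how to rename and what to add. A framework-free Java sketch of just that pattern follows; `ConfigurableMarshaller`, `VenueMarshaller` and the example image URL are hypothetical, the hook names mirror the ones listed above, and reference/collection handling (processSpecificFields()) is left out.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical base: walks properties, subclasses only override the hooks. */
abstract class ConfigurableMarshaller {
    final Map<String, Object> marshal(Object id, Map<String, Object> properties) {
        Map<String, Object> out = new LinkedHashMap<>();
        if (needToDefineId()) {
            out.put("id", getObjectIdentifier(id));
        }
        for (Map.Entry<String, Object> e : properties.entrySet()) {
            if (getSkippedFields().contains(e.getKey())) {
                continue;
            }
            out.put(getAlternativeName(e.getKey()), e.getValue());
        }
        processAdditionalFields(properties, out);
        return out;
    }

    protected boolean needToDefineId() { return true; }
    protected Object getObjectIdentifier(Object id) { return id; }
    protected Set<String> getSkippedFields() { return Collections.emptySet(); }
    protected String getAlternativeName(String originalName) { return originalName; }
    protected void processAdditionalFields(Map<String, Object> source, Map<String, Object> out) { }
}

/** Hypothetical Venue marshaller: hides internals, renames image, builds a full URL. */
final class VenueMarshaller extends ConfigurableMarshaller {
    private static final Set<String> SKIPPED = Set.of("lastUpdated", "createdBy");

    @Override protected boolean needToDefineId() { return false; }

    @Override protected Set<String> getSkippedFields() { return SKIPPED; }

    @Override protected String getAlternativeName(String originalName) {
        return "image".equals(originalName) ? "imageUrl" : originalName;
    }

    @Override protected void processAdditionalFields(Map<String, Object> source, Map<String, Object> out) {
        Object image = source.get("image");
        if (image != null) {
            out.put("imageUrl", "https://example.com/images/" + image); // placeholder base URL
        }
    }
}
```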

So for each object that has specific fields to skip, alternative names or special display rules, we create an extended class:

It may take some playing around, but once it is done, maintenance is no problem at all. New fields are added automatically, and if you rename a field, the change applies everywhere: both where the domain object is rendered on its own and where it is embedded in another object.

And do not forget to register your marshallers! We do it in BootStrap.groovy:

 

Problem

We need to generate a QR code and as many types of barcode as possible from a specific code.

Solution

After looking at several plugins, like qrcode and barcode4j, and failing to use them successfully, I decided to do it myself and found out that it is easy enough.

Step 1 – Add necessary libs

After looking at several free frameworks for generating barcodes, I settled on zxing, as it supports QR codes and several of the most widely used barcodes.
At the time I used version 1.7, which supported these barcodes:

  • EAN_8
  • EAN_13
  • QR_CODE
  • UPC_A
  • CODE_39
  • CODE_128
  • ITF

zxing 2.0 is already available, so the list of supported barcodes has probably grown.
The libs are available in the Sonatype repository. I used two modules: core and javase.
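EAN_8, EAN_13 and UPC_A codes end with a check digit. If the specific code you start from does not include one, it has to be appended before encoding; I have not checked how each zxing version treats a missing or wrong digit. A minimal Java sketch of the standard GS1 check digit calculation (`GtinCheckDigit` is a hypothetical name):

```java
/** Check digit for EAN-8 (7 data digits), UPC-A (11) and EAN-13 (12). */
final class GtinCheckDigit {
    private GtinCheckDigit() {}

    static int compute(String data) {
        if (!data.matches("\\d+")) {
            throw new IllegalArgumentException("digits only: " + data);
        }
        int sum = 0;
        boolean weightThree = true; // the rightmost data digit gets weight 3, then 1, 3, 1...
        for (int i = data.length() - 1; i >= 0; i--) {
            int digit = data.charAt(i) - '0';
            sum += weightThree ? 3 * digit : digit;
            weightThree = !weightThree;
        }
        return (10 - sum % 10) % 10; // distance up to the next multiple of 10
    }

    static String withCheckDigit(String data) {
        return data + compute(data);
    }
}
```

For example, the EAN-13 data 400638133393 gets check digit 1, giving 4006381333931.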

Step 2 – Add Service to generate BarCodes

Once the libs are in place, just create a very simple service like the following:

Now you only need to specify the barcode format, the data (number or text) and the image size (width and height), and you can easily render a barcode from any taglib or controller like:

Add some AJAX if you want to reload the barcode on the fly…
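The zxing writers produce a matrix of dark and light modules, and turning that into an image is a few lines of plain Java; as far as I know, the javase module's MatrixToImageWriter does this step for you. This dependency-free sketch shows the same step on a plain `boolean[][]` (an assumption standing in for zxing's BitMatrix); `ModulePainter` is hypothetical.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

final class ModulePainter {
    private ModulePainter() {}

    /** Paints modules[row][col] (true = dark module) as scale x scale black squares on white. */
    static BufferedImage render(boolean[][] modules, int scale) {
        if (modules.length == 0 || modules[0].length == 0 || scale <= 0) {
            throw new IllegalArgumentException("empty matrix or non-positive scale");
        }
        int rows = modules.length;
        int cols = modules[0].length;
        BufferedImage image = new BufferedImage(cols * scale, rows * scale, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < rows * scale; y++) {
            for (int x = 0; x < cols * scale; x++) {
                boolean dark = modules[y / scale][x / scale];
                image.setRGB(x, y, dark ? 0x000000 : 0xFFFFFF);
            }
        }
        return image;
    }

    /** Encodes the image as PNG bytes. */
    static byte[] toPng(BufferedImage image) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            if (!ImageIO.write(image, "png", out)) {
                throw new IllegalStateException("no PNG writer available");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

In a controller, the PNG bytes would then be written to the response with content type image/png.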

 

We have multiple domain classes connected with each other, and we need deep search functionality, i.e. searching in the Master domain's 3rd- and 4th-level children. For example, we have a structure like:

class Person {
  static hasMany = [clothes:Cloth]
...
}

class Cloth {
  Color color
...
}

class Color {
  String name
  ...
}

And we have several Persons with different sets of Clothes in different Colors.
We want to find all people who have a red Cloth. Quite often the same Person has several red Clothes.
So a query like:

// named colorName so it does not shadow the 'color' association used inside the criteria
def colorName = params.color
Person.createCriteria().list {
  clothes {
    color {
      ilike('name', "%${colorName}%")
    }
  }
  maxResults(params.max as Integer)
  firstResult(params.offset as Integer)
  cache true
}

returns paged results, but the same person can appear several times. So we need a distinct list to get correct results.
BUT!
The Grails docs state:

The listDistinct() method does not work well with the pagination options maxResult and firstResult. If you need distinct results with pagination, we currently recommend that you use HQL.
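As I understand it, the reason is that listDistinct() removes duplicate root entities in memory, after the database has already applied LIMIT/OFFSET to the joined rows. This small Java simulation (hypothetical names, one row per matching person/cloth pair) shows why a page then comes back short, and why grouping by id before paging avoids the problem.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Joined rows are {personId, clothId}: one row per matching cloth. */
final class DistinctPaging {
    private DistinctPaging() {}

    /** listDistinct-style: LIMIT/OFFSET on the joined rows, then drop duplicate people in memory. */
    static List<Long> pageThenDistinct(List<long[]> joinedRows, int offset, int max) {
        Set<Long> people = new LinkedHashSet<>();
        for (long[] row : slice(joinedRows, offset, max)) {
            people.add(row[0]);
        }
        return new ArrayList<>(people);
    }

    /** groupProperty('id')-style: one row per person first, then LIMIT/OFFSET. */
    static List<Long> distinctThenPage(List<long[]> joinedRows, int offset, int max) {
        Set<Long> people = new LinkedHashSet<>();
        for (long[] row : joinedRows) {
            people.add(row[0]);
        }
        return new ArrayList<>(slice(new ArrayList<>(people), offset, max));
    }

    private static <T> List<T> slice(List<T> list, int offset, int max) {
        int from = Math.min(offset, list.size());
        int to = Math.min(offset + max, list.size());
        return list.subList(from, to);
    }
}
```

With one person owning two red clothes, a page of size 2 built the first way can contain a single person. Since GROUP BY alone does not guarantee an order, an explicit ordering is probably worth adding so pages stay stable between requests.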

And honestly, I do not like HQL very much.
So after some time playing with Google, Hibernate and the Grails criteria DSL, I found a solution:

def colorName = params.color
Person.createCriteria().list {
  projections {
    groupProperty('id')
  }
  clothes {
    color {
      ilike('name', "%${colorName}%")
    }
  }
  maxResults(params.max as Integer)
  firstResult(params.offset as Integer)
  cache true
}.collect { Person.get(it) } // with a single projection, each result row is the id itself

Maybe a little ugly, but it fully solves the problem of pagination over distinct elements.

 

For AJAX file uploading I’m using Andrew Valums’ AJAX file uploader.

On the UI I added a jQuery script like this:

jQuery(function() {
    var uploader = new qq.FileUploaderBasic({
        button: document.getElementById('addButton'),
        action: "${createLink(controller: 'upload', action: 'uploadImage')}",
        multiple: true,
        allowedExtensions: ['PNG', 'JPG', 'JPEG', 'GIF'],
        sizeLimit: 4194304, // max 4 MB (4 * 1024 * 1024 bytes) per file
        onSubmit: function() { jQuery('#spinner').show() },
        onComplete: function(id, fileName, responseJSON) {
            // the action responds with a one-element array holding the stored file name
            reloadImage("urlHolder", responseJSON[0])
        }
    });
});

Here reloadImage is just another AJAX call that shows the uploaded image based on the location it was uploaded to.
The uploadImage action of UploadController looks like this:

import grails.converters.JSON
import org.springframework.web.multipart.commons.CommonsMultipartFile

def uploadImage = {
    def fileName
    def inputStream
    if (params.qqfile instanceof CommonsMultipartFile) {
        // multipart upload (e.g. the form/iframe fallback): the file is a request part
        fileName = params.qqfile?.originalFilename
        inputStream = params.qqfile.getInputStream()
    } else {
        // XHR upload: qqfile only holds the name, the file is the raw request body
        fileName = params.qqfile
        inputStream = request.getInputStream()
    }
    // To avoid problems with spaces
    fileName = fileName.replaceAll(" ", "_")

    File storedFile = new File(_some_specfic_path_)

    storedFile.append(inputStream)

    def result = [fileName] as JSON
    render text: result.toString(), contentType: 'text/html'
}

The main points here are:

  • do not use request.getFile('qqfile'), as it does not work in Safari; you need to work with the input stream.
  • use contentType: ‘text/html’. With a JSON content type (‘text/json’ here; the standard one is ‘application/json’) it does not work in IE.
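If you need the same storage step outside Grails, here is a minimal Java sketch. `UploadStore` is hypothetical; besides replacing spaces like the action above, it keeps only the last path element of the client-supplied name, so a name like "../x.png" cannot escape the upload directory. Unlike `File.append`, it overwrites an existing file with the same name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical upload store: writes a request body into one directory under a sanitized name. */
final class UploadStore {
    private final Path directory;

    UploadStore(Path directory) {
        this.directory = directory;
    }

    /** Stores the body and returns the stored file name (what the action sends back as JSON). */
    String store(String rawFileName, InputStream body) {
        // Treat backslashes as separators too, in case a browser sends a Windows path.
        Path last = Paths.get(rawFileName.replace('\\', '/')).getFileName();
        String name = last == null ? "" : last.toString();
        if (name.isEmpty() || name.equals(".") || name.equals("..")) {
            throw new IllegalArgumentException("bad file name: " + rawFileName);
        }
        String safeName = name.replace(' ', '_'); // same space handling as the Groovy action
        try {
            Files.copy(body, directory.resolve(safeName), StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return safeName;
    }
}
```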
 

Problem

We have a Grails project that is already running in PROD without the database-migration/Liquibase plugins.
We need to change the database structure without destroying existing data.

Solution

Let’s name the database states:

  • state A is the current state of the database
  • state B is the new state we want to end up with

First of all, you need locally both the old version of the project, which works with database A, and the new version, which works with database B.
It can be the same project, just before the domain model changes.

First, we need to generate a changelog for database A.

Go to the old project and install the plugin first:

grails install-plugin database-migration

Then generate the changelog (I prefer a Groovy file):

grails dbm-generate-changelog changelog.groovy

and change Config.groovy to always apply the changelog on startup:

grails.plugin.databasemigration.updateOnStart = true
grails.plugin.databasemigration.updateOnStartFileNames = ['changelog.groovy']

But the database already exists, and we do not want to re-create it from scratch, either locally or in PROD. So we use

grails dbm-changelog-sync-sql

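This gives us SQL commands to execute on the local and PROD databases. They only record the changesets from changelog.groovy as already executed; the schema itself is not touched.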
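Conceptually, as I understand Liquibase, it keeps a tracking table (DATABASECHANGELOG) of the changesets already executed, and changelog sync just fills that table. This toy Java model (not the Liquibase API; all names are hypothetical) shows why the update on startup then runs only new changesets:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model: a changelog is an ordered list of changeset ids, the database remembers applied ids. */
final class ChangelogModel {
    private final List<String> changelog = new ArrayList<>();
    private final Set<String> appliedInDatabase = new HashSet<>();

    void add(String changeSetId) {
        changelog.add(changeSetId);
    }

    /** Like changelog sync: mark every changeset as applied without running it. */
    void sync() {
        appliedInDatabase.addAll(changelog);
    }

    /** Like update on startup: run only changesets not recorded yet; returns the ones that ran. */
    List<String> update() {
        List<String> ran = new ArrayList<>();
        for (String id : changelog) {
            if (appliedInDatabase.add(id)) {
                ran.add(id); // a real tool would execute this changeset's SQL here
            }
        }
        return ran;
    }
}
```

After the sync, an update runs nothing; once a diff adds a new changeset, only that one runs, and only once.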

Migrate database from state A to state B

Now we can make all the changes we need to our domain structure. After that, run

grails dbm-gorm-diff addition.groovy

Then move the contents of addition.groovy into changelog.groovy (I prefer to just copy/paste them to the end of the file and delete addition.groovy afterwards).
Then simply grails run-app your local app, and create/deploy the WAR for PROD (do not forget to execute the SQL from the previous step first!)

Hope this helps someone…

© 2012 REID Consulting