Tuesday, October 30, 2007

GoogleBot and AdSense MediaBot are now intertwingled?

I once made a silly mistake and coded a webpage (new location) to use HTTP GET instead of POST. This was silly for various reasons, and only sensible for two:

Silly

  1. Caused the URL to be ugly
  2. Caused the button id to go into the URI. Sigh. I'm sure I could have avoided that.
  3. Caused Google AdSense to want to check out every page just in case it is different, which can't be avoided like it can with other Google tools (analytics for example)

Sensible

  1. It allows examples to be encoded within links. Useful for wikipedia.
  2. It allows other people to mash-up the tool easily. Admittedly anyone who can code the parsing of the HTML but can't code a POST would be an oddity, but GET is easier and avoids any state problems.

Suffice it to say the sensible reasons have been found post factum :-) My page now defaults to using POST but still has to support GET.
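
For anyone in the same boat, supporting both methods can be sketched roughly as below. This is a minimal illustration in Python, not the code behind my page: the function name `extract_params`, the `IGNORED_FIELDS` set, and the `submit` and `expr` field names are all made up for the example. It also shows one way to keep the button id from leaking into the parameters.

```python
from urllib.parse import parse_qs

# Hypothetical form fields that carry no meaning (e.g. the submit button's id).
IGNORED_FIELDS = {"submit"}


def extract_params(method, query_string="", body=""):
    """Return form parameters from either a GET query string or a POST body.

    Both are assumed to be application/x-www-form-urlencoded. When a field
    is repeated, the last value wins.
    """
    raw = body if method.upper() == "POST" else query_string
    parsed = parse_qs(raw, keep_blank_values=True)
    return {
        name: values[-1]
        for name, values in parsed.items()
        if name not in IGNORED_FIELDS
    }


# The same logical request, once as a GET link and once as a POST body.
print(extract_params("GET", query_string="expr=1%2B2&submit=Go"))
print(extract_params("POST", body="expr=1%2B2&submit=Go"))
```

Both calls give the same dictionary, so everything after this point can ignore how the request arrived.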

The point of this post is that Google AdSense used to crawl the GET URI after every use. Recently, and possibly related to the Google PageRank changes, the plain GoogleBot has started crawling these URLs: about 20-30 in a batch (which may be all of them), between 1 and 10 seconds apart, very friendly like. The official AdSense MediaBot is still there, but much less frequent.

This makes sense: both were doing largely the same upfront job, and linking them together provides a variety of benefits to the search side, including knowing the URLs are actually in use and finding different points of entry into the same site.

My evidence for this assertion is the above referenced change in behaviour, and that the bot used to have an agent of "Mediapartners-Google/2.1" (MediaBot) and now is just the standard "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" (GoogleBot). I don't have any proof that the AdSense-induced crawls are being used in the indexing yet, but I can't see a good argument why they wouldn't be grist for that mill.
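
If you want to check your own logs, telling the two bots apart by agent string is easy enough. A rough sketch in Python follows; the function names and the simple substring matching are my own assumptions for illustration, and the only agent strings it knows about are the two quoted above.

```python
from collections import Counter


def classify_crawler(user_agent):
    """Label a user agent string as MediaBot, GoogleBot, or other.

    Matching is by substring and case-insensitive. MediaBot is checked
    first so its "Mediapartners-Google" token takes priority.
    """
    ua = user_agent.lower()
    if "mediapartners-google" in ua:
        return "MediaBot"
    if "googlebot" in ua:
        return "GoogleBot"
    return "other"


def tally_crawlers(user_agents):
    """Count how many requests came from each kind of crawler."""
    return Counter(classify_crawler(ua) for ua in user_agents)


agents = [
    "Mediapartners-Google/2.1",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
]
print(tally_crawlers(agents))
```

Bear in mind that an agent string can be faked by anyone, so this only tells you what the client claims to be.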

Anyone else noticed this yet?

Update: I am correct

(back to 1place in a later post, I see Ben has dug out some more info and I'm overdue responding to the comments)

Wednesday, October 24, 2007

SaaS and trust: a NZ experience

(or Why I Should Have Used Xero/Cashboard)

I recently discovered 1place, via Ben or David. It was cool, AJAXy, NZ-based, cheap ($10 a month) and almost exactly what my wife needed for her business. The owners were receptive to changes and there was a good probability that changes required for a quote-and-invoice services business, as opposed to a sell-products or bill-by-the-hour business would be implemented. They had screenshots of the changes and everything. A virtuous cycle of suggestions and responses ensued.

They even did GST returns and a cashbook.

I was suitably enthused so I set everything up, trained my wife, and started creating invoices. It worked well even without the changes. It was promising to simplify our accounting and invoicing significantly without any of the internationalisation issues from other products.

Then one day they disappeared. Completely and without warning. We hadn't paid any money yet, but this was clearly in violation of their agreements. SaaS also means that your data is no longer available, and notably there was no easy extraction/backup mechanism from 1place.

They were gone from the internet for about a month (I checked periodically, so give or take a few weeks). We'd only done two invoices and we'd saved the PDFs, so it was a lucky escape. It could have been a lot worse.

1place have now reappeared. They did not respond to my contact form email asking them where they'd been and why I should continue to use them. I was polite but I guess they don't want to answer. I've not bothered calling them; if I didn't have the invoices already I'd be annoying them daily though. Their blog is noticeably silent.

Looking at the Companies Office details it looks like the guy I was talking to, Bevan, is no longer a shareholder. That event and the website shutdown would appear to be related, but the lack of consistency, communication, and disclosure mean that I cannot trust them with our data ever again. Nor should you, IMO, without at least a way to get your data out at regular intervals.

I have waited a long time before writing this entry as I really wanted to give this product and company the benefit of the doubt and a chance to succeed; it fills the gap below Xero and is completely NZ focussed, and they were reacting to my requests. But this is a betrayal of trust, and this is something that I will look closely at before using any similar SaaS product (hey: it's a service that's a product) again.

Hopefully they will respond to this public review and you, dear reader, can make up your own mind. If nothing else this is a lesson to be learnt by SaaS providers and users. It can happen with off-the-shelf products too, but at least there is a lower probability that you'll be prevented from accessing your data on your own terms.

Bruce's new list of SaaS requirements:

  1. Decent price
  2. Decent service
  3. Decent uptime
  4. Decent way of getting the hell out of dodge (added).

I am disappointed to be in the position of chopping at a NZ poppy.

Sunday, October 14, 2007

Dear Lazyweb...

(in the style of jwz)

I want a new mouse and keyboard. A wireless mouse would be nice, but I don't want a wireless keyboard as I never move it and changing the batteries would be more of a hassle than the cable. The keyboard should be standard 10x keys, straight, and boring. Preferably quiet. Colour is unimportant.

There does not appear to be a wired keyboard with wireless mouse combo. Why not? They could put the wireless receiver in the keyboard and save space.

Am I the only one who thinks this way? Surely wireless keyboards have a limited market? Is this all a grand conspiracy by the AA battery manufacturers?

Thanks,
Bruce

Wednesday, October 3, 2007

3GPP CDR processing in Erlang

So I have made progress since my last post about this. I have a working 3GPP CDR decoder for Circuit Switched, Packet Switched and other miscellaneous CDRs.

In the process I broke my machine in the USA. Oops.

I left it compiling the ASN.1 one day and for some reason my definitions caused the Erlang ASN.1 compiler to loop. I left it like that for several days, and then my windows stopped responding.
It wouldn't remote reboot, respond to ping, or even prayers to . This is the point where I realised I'd not backed up anything on it for several weeks, including the CDR code. I really must subscribe to rsync.net.

The story, dear reader, has a happy ending. One support ticket and a day or so later it was back as good as new and no more was said on the matter. I shall not complain that 100% CPU utilisation killed my box, and they did not complain about me breaking my budget server. Still, I should be upset, but for US$29 a month I can't really complain.

Back to 3GPP CDRs. I also have a version of the TAP (Transferred Accounts Procedure) specification. TAP is the CDR format used for international roaming CDR exchange, and is also in ASN.1 from version 3 onwards. There are amusing comments through the documentation about limiting sizes of files to avoid systems blowing up. Natch.

To their credit the GSMA have a specification that compiles cleanly first time, 3GPP could learn from that. The dependency hierarchy of the 3GPP CDR spec is horrifying and extends all the way back (and loops around) the earliest, ugliest, definitions from the ITU-T.

Anyway, I only need to decide what to do now. I'm thinking that this could expand to be a free 3GPP -> TAP service, possibly even including the RAP (Returned Accounts Procedure). At least it would provide an alternative to all the people charging for even a CDR editing tool. Sheesh.

First things first, I'm off to whip up the first page which should allow the upload of a CDR file and the specification of its type (Circuit/Packet/etc) so it can be decoded. Not exciting but a prerequisite for the next AJAXy page that allows scrolling and drilldown into individual line items.
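
To give a flavour of what decoding involves at the lowest level, here is a small sketch of reading one tag-length-value element from an uploaded file. It assumes the file is BER-encoded ASN.1, which is an assumption on my part for this example, and it is written in Python purely for illustration rather than being the Erlang decoder described above. The function name `read_tlv` is made up, and indefinite-length encoding is deliberately not handled.

```python
def read_tlv(data, offset=0):
    """Read one BER tag-length-value element from data starting at offset.

    Returns (tag_class, constructed, tag_number, value, next_offset).
    tag_class is 0 universal, 1 application, 2 context-specific, 3 private.
    """
    first = data[offset]
    offset += 1
    tag_class = first >> 6
    constructed = bool(first & 0x20)
    tag_number = first & 0x1F

    # High-tag-number form: the number follows in base-128 octets,
    # with the top bit set on every octet except the last.
    if tag_number == 0x1F:
        tag_number = 0
        while True:
            octet = data[offset]
            offset += 1
            tag_number = (tag_number << 7) | (octet & 0x7F)
            if not octet & 0x80:
                break

    length = data[offset]
    offset += 1
    if length == 0x80:
        raise ValueError("indefinite length is not handled in this sketch")
    if length & 0x80:
        # Long form: the low 7 bits say how many length octets follow.
        count = length & 0x7F
        length = int.from_bytes(data[offset:offset + count], "big")
        offset += count

    value = data[offset:offset + length]
    if len(value) != length:
        raise ValueError("element is truncated")
    return tag_class, constructed, tag_number, value, offset + length


# A SEQUENCE (universal tag 16) containing the INTEGER 5 (universal tag 2).
outer = read_tlv(bytes([0x30, 0x03, 0x02, 0x01, 0x05]))
inner = read_tlv(outer[3])
print(outer)
print(inner)
```

A real decoder recurses into constructed elements and maps tag numbers to field names from the specification, which is exactly the part the ASN.1 compiler generates for you.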

Wish me luck.

Simple stuff: usernames and passwords

(I can't find any reference to this on the web, but there must be something. Username aka display name aka login aka screen name [yurk])

I recently signed up for (yet another) web forum, went through the dance of getting the temporary password via email, and I noticed it didn't even tell me what my username should be. Very un-Web2.0.

I went back to the login page and it wants my email address; and so this rant begins.

Think of your users. Are they corporate users? Are they private, mostly ISP-based, users? So email address sounds like a reasonable key?

STOP THAT.


Email addresses are useful for sending email to people. Most of the time. Email addresses nominally have the following characteristics:

  1. They uniquely identify a person
  2. They don't change
  3. They can be remembered by the user concerned

Let's pull these apart:
1. "They uniquely identify a person"
Well no they don't. It can be a mailing list, it can be for a family, it might identify a person most of the time but ask yourself if this is sufficient. Even if they do uniquely identify the person, that person might not want their activity to be identified; so you're going to need a username anyway.

The other perspective (thanks Jonny) is that you provide your email address on email correspondence and this starts to give information away that provides others access to websites; the access quiz changes from needing to know the username and password to only requiring the password.

2. "They don't change"
Well, yes they do. People want to be able to change ISP. Mergers and acquisitions happen. Domain names get accidentally lost. Most of the people reading this will have a static email address probably because they own their domain, but this does not apply to most of your users.

You can try and mitigate this problem, but the single principle is that you cannot easily prevent a user from claiming to be scott@randomcompanya.com [aside: currently unregistered, I checked!] if randomcompanya.com refuses their email/doesn't exist. So that account is dead, along with all the things that tie that customer to you, and you've just asked them to rescan the market to see if they want to reregister...

The inverse of this is when scott leaves randomcompanya and they hire another scott. Hurrah. Let's just leave the key under the mat too shall we?

3. "They can be remembered by the user concerned"
This may in fact be true for most people, but I'm betting most sensible people keep their business and personal email separate. Sensible people may figure in the minority. If you have more than one email address, or you own your domain, you have a snowball's chance in hell of remembering which one you used. Better request a new password and check the email headers :-)

So what should you do?

I'm glad you asked. Let me rant a little more.

User-definable usernames

Let the user select their own username. The best websites already do this, and don't constrain people to 8 characters, 12 characters, or anything less than half a page of text.
  • Do make it case insensitive but don't mash the case - InnOcenT should be permissible (think: Bobby is different from bobby? Do you really want that confusion?) and not normalised, otherwise they'll get offended that their StudlyCaps are lost.
  • Don't allow embedded spaces without thinking it through.
  • Do allow punctuation, at least to a point.
  • Possibly allow Unicode, but be careful of unnoticeable collisions (a lot of characters end up looking the same, allowing impersonation). This is a key point of i18n, which could well be the difference between you or your competitor getting market share in Asia.
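
The case and Unicode points above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the names `username_key` and `UsernameRegistry` are invented, and an in-memory dictionary stands in for whatever storage you really use. The idea is to compare on a normalised key while storing the name exactly as the user typed it.

```python
import unicodedata


def username_key(name):
    """Build the comparison key for a username.

    NFKC normalisation folds compatibility forms (such as fullwidth
    letters) together, and casefold() makes the comparison case
    insensitive. This does NOT catch look-alikes from different scripts,
    for example a Cyrillic letter that resembles a Latin one.
    """
    folded = unicodedata.normalize("NFKC", name).casefold()
    return unicodedata.normalize("NFKC", folded)


class UsernameRegistry:
    """Keeps usernames unique by key while preserving the typed form."""

    def __init__(self):
        self._by_key = {}

    def register(self, name):
        key = username_key(name)
        if key in self._by_key:
            raise ValueError("username already taken: " + self._by_key[key])
        self._by_key[key] = name
        return name

    def display_name(self, name):
        """Return the stored spelling for a name, or None if unknown."""
        return self._by_key.get(username_key(name))


registry = UsernameRegistry()
registry.register("InnOcenT")
print(registry.display_name("innocent"))  # prints InnOcenT
```

Catching cross-script look-alikes needs more than this, for instance restricting each name to a single script or checking against a confusables table; the sketch only covers the easy cases.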
A user might choose to enter their email address as their username, which is fine. To be clear I'm not saying don't collect and use email addresses, especially for lost password/lost username issues.

I recognise that lost username recovery typically does involve emailing the user, but it doesn't have to if you collect sufficient identifying information from the user. You have the flexibility.

I'm also not against allowing the use of email address as a synonym for your username on login forms, although it has some security implications.

OpenID
"OpenID is a decentralized single sign-on system" according to Wikipedia. I'm of two minds on this one, and I'm still chewing on this post. OpenID does allow user-selected usernames, but beyond that it doesn't provide trust, or really authentication (due to all the holes, OpenID 2.0 is better but not necessarily fixed), and I'd debate that you can provide identity without these things.

Okay, I'm finished. I think there are more perspectives on this one and I'm interested in hearing them. Paypal, for example, use email addresses. I think this is because that is where they started from; eBay sensibly changed I think, TradeMe changed post implementation as well.

Just wait until I get to passwords. It will be heretical.