Tuesday, July 29, 2008

Internet Explorer 7 URL limit

Here's the catch....

After much testing, it comes down to this: the tags aren't being inserted in IE7 because of the URL size limit. The limit in IE 7 is supposedly 2048 characters, which is weird because I'm getting stuck around 1600. This is for GET methods in IE.
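A rough sketch of how the length could be checked on the Java side before relying on a GET. The class name, the 1500-character threshold and the servlet path in the URL are my own assumptions for illustration (the threshold is just picked to sit under the ~1600 where IE7 got stuck); none of this is in ED.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class GetUrlLengthCheck {
    // Hypothetical safety threshold, chosen to sit below the ~1600 characters
    // where IE7 stalled in testing.
    static final int SAFE_GET_LENGTH = 1500;

    // Builds the full GET URL with the xml percent-encoded as one parameter.
    static String buildGetUrl(String servletUrl, String paramName, String xml)
            throws UnsupportedEncodingException {
        return servletUrl + "?" + paramName + "=" + URLEncoder.encode(xml, "UTF-8");
    }

    // True if the whole URL is short enough to risk sending as a GET.
    static boolean fitsInGet(String url) {
        return url.length() <= SAFE_GET_LENGTH;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<tags><tag name=\"protein\"/></tags>";
        // The servlet path below is a placeholder, not ED's real mapping.
        String url = buildGetUrl(
                "http://www.entitydescriber.local:8080/ED/SPARQLTaggingServlet", "xml", xml);
        System.out.println(url.length() + " chars, fits in GET: " + fitsInGet(url));
    }
}
```

Note that the encoded xml is longer than the raw xml, since characters like < and > each become three characters, so the limit gets hit sooner than the raw size suggests.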

Monday, July 21, 2008

Accessing your webapps on your Mac OS X from your Parallels

If you're using a Mac and running Parallels and want to test your web apps in IE without having to package them in a WAR and move it over to your Parallels side...

You will find this INCREDIBLY useful:

Andy Peatling's HOWTO access your webapps on Mac OS X from Parallels Windows XP

And if you're easily confused like I am...
DocumentRoot refers to your CATALINA_HOME/webapps/
ServerName is whatever you want it to be

With ED, I set it up locally as (for example) www.entitydescriber.local, so instead of using localhost I can now do http://www.entitydescriber.local:8080/ED/manualURI

Of course, all this configuration with IPs is only really useful if your IP doesn't change. If it does, like with me, then you need to reconfigure when you get to work for your work IP and when you get home for your home IP.

Friday, July 18, 2008

Error Handling

Getting 504 Errors from our Virtuoso OpenLink SPARQL Endpoint at biomoby.elmonline.ca/sparql
And the Tomcat Manager's down.

So I've taken this opportunity to test out some error-handling in ED. Mostly just refining the e-mail function I have set up to e-mail ed.developers@gmail.com any errors that may pop up in the SPARQLTaggingServlet. Usually it comes down to an error in trying to connect to the SPARQL Endpoint, in which case I fire off an e-mail to the developers' gmail with
-the Response Code
-the tagging XML
-the Time of Error
-Extra Information (the Insert statement, the HTML returned from the connection, which usually contains some clue as to what happened, and some information about where the error occurred)
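Roughly, the body of that e-mail could be put together like this. This is only a sketch: the class name, method name and field labels are made up for illustration, and formatting the time in GMT is an assumption to keep it in line with what gets recorded in the database.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class ErrorReport {
    // Formats the fields listed above into a plain-text e-mail body.
    static String build(int responseCode, String taggingXml, Date when, String extraInfo) {
        SimpleDateFormat gmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss 'GMT'");
        gmt.setTimeZone(TimeZone.getTimeZone("GMT"));

        StringBuilder body = new StringBuilder();
        body.append("Response Code: ").append(responseCode).append('\n');
        body.append("Time of Error: ").append(gmt.format(when)).append('\n');
        body.append("Tagging XML:\n").append(taggingXml).append('\n');
        body.append("Extra Information:\n").append(extraInfo).append('\n');
        return body.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(504, "<tags/>", new Date(),
                "Gateway timeout while connecting to the SPARQL Endpoint"));
    }
}
```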

Put in a variable and some if statements to activate and deactivate the use of the SPARQL Endpoint in case it goes down but we still want ED to work with just the ED Database on arch. In addition to this, I've decided that for any error codes returned from the SPARQLTaggingServlet, or timeouts while submitting, I'll just let the submission go through to Connotea anyway and hide the error from the user. My reasoning:

1) The tags were submitted to ED Database via Jena already
2) If there was an error in the SPARQLTaggingServlet - it has been e-mailed to the developer's email, recorded and I will know about it
3) People can still keep using ED even if the SPARQL Endpoint goes down

which seems to be happening a lot lately.
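The on/off switch plus the "let it go through anyway" behaviour could look something like the sketch below. Everything here is hypothetical naming (SparqlSubmitGuard, submitQuietly), the Callable stands in for the real post to the endpoint, and the error counter stands in for firing off the developer e-mail.

```java
import java.util.concurrent.Callable;

class SparqlSubmitGuard {
    // Hypothetical switch: set to false while the SPARQL Endpoint is known to be down.
    private final boolean sparqlEnabled;
    // Counts reported errors; a stand-in for e-mailing the developers.
    private int errorsReported = 0;

    SparqlSubmitGuard(boolean sparqlEnabled) {
        this.sparqlEnabled = sparqlEnabled;
    }

    // Attempts the SPARQL submission but never throws, so the caller can
    // always carry on and submit to Connotea.
    void submitQuietly(Callable<Integer> sparqlPost) {
        if (!sparqlEnabled) {
            return; // endpoint switched off: skip it entirely
        }
        try {
            int responseCode = sparqlPost.call();
            if (responseCode != 200) {
                report("HTTP " + responseCode);
            }
        } catch (Exception e) {
            report(e.toString());
        }
    }

    private void report(String what) {
        errorsReported++;
        System.err.println("SPARQL submission failed: " + what);
    }

    int getErrorsReported() {
        return errorsReported;
    }

    public static void main(String[] args) {
        SparqlSubmitGuard guard = new SparqlSubmitGuard(true);
        guard.submitQuietly(() -> 504); // simulated gateway error
        System.out.println("errors reported: " + guard.getErrorsReported());
    }
}
```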


The web methods for the ED API have been committed, but not yet deployed.

Interesting thing I ran into when trying to do an HTTP GET on Google Book Search using Java's URLConnection.

Google adheres to a slightly different API from other web resources when you try to connect to it.

Found out about it here:

The suggestion in this thread works for Google and SearchMash (which is a cool little Google API that returns results to you as JSON)

This suggestion works for Google Book Search:

URL size Limit

In an earlier post, I mentioned that one reason the ed_connotea javascript would not call the SPARQLTaggingServlet was the size of the parameters. The xml being attached as a parameter was too long (as it turns out, because the xml was being duplicated and appended to itself, effectively increasing the size).

Here are some specs I found for different browsers and their parameter size limits:
The site's a little old, but the main point is that there IS a size limit, and that it might be better to find a way of submitting the xml other than as a parameter, unless we're guaranteed the xml will always be under the size limit of all the browsers we intend to support and of the server we're using.

The site where I found the specs

Internet Explorer:
Firefox: at least 100,000
Safari: at least 80,000
Apache WebServer: 4,000

Will need to do some testing myself to see what the ACTUAL limits are.
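One way around the limit is to send the xml as the body of a POST instead of as a URL parameter. Below is a minimal sketch of that idea from the Java side using HttpURLConnection. The class and method names are hypothetical, and in ED the call actually comes from the ed_connotea javascript, so this only illustrates the shape of the request; the servlet would also have to read the xml from the request body rather than from a parameter.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class XmlPoster {
    // Sets up a POST connection for a body of the given length.
    // Nothing is sent over the wire until the body is written.
    static HttpURLConnection prepare(String servletUrl, int bodyLength) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(servletUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/xml; charset=UTF-8");
        conn.setFixedLengthStreamingMode(bodyLength);
        return conn;
    }

    // Sends the xml as the request body and returns the HTTP response code.
    static int post(String servletUrl, String xml) throws IOException {
        byte[] body = xml.getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = prepare(servletUrl, body.length);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling XmlPoster.post(url, xml) keeps the URL itself short no matter how many tags and types end up in the xml.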

Sunday, July 13, 2008

Why there was an unexpected limit on the number of tags you could add to ED

I didn't realize this until I started adding a progress bar to ED as per Ben's suggestion. Originally, I suspected ED was slower at submitting tags because I added that extra step of adding tags to the Virtuoso Server with SPARQL. As it turns out, some of the tags were being repeated, so the SPARQLTagManager was adding a tag 2 or 3 times!

I traced this problem back to the xml the ed_connotea javascript was feeding it. What's worse, if you have too many tags with too many types associated with them, the xml gets too big to be passed as a parameter when we call the SPARQLTaggingServlet.

Firefox doesn't state that there is a limit on the size of the parameter.
But there is one for IE.

The trouble with the xml was that every time I called save_tag_action() to make it, it appended the tags on to the existing xml (I didn't realize this), so I'd end up with duplicate tags. It's called once for the TagManagerSaveTaggingServlet and again for the SPARQLTaggingServlet. To fix this, I call save_tag_action() once, save the xml and reuse it for both servlets.
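The real fix is in the javascript, as described above. As an extra safety net, the servlet side could also drop repeated tags before building any Inserts. This is only a sketch of that idea, with a made-up class name and tags represented as plain strings, not something ED does today.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class TagDeduper {
    // Returns the tags with repeats dropped, keeping first-seen order.
    static List<String> dedupe(List<String> tags) {
        return new ArrayList<>(new LinkedHashSet<>(tags));
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("kinase", "protein", "kinase", "protein");
        System.out.println(dedupe(tags));
    }
}
```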

I wonder if it would be prudent to keep the xml in a cookie and destroy it once we're done.

Friday, July 11, 2008

Ajax and ED

It's time for more Ajax...

Here's the problem I'm running into with ED. When the user hits the submit button, the javascript calls a Servlet which composes the SPARQL/Update Insert statements and then submits them to the Virtuoso Server.

I have to break up the Insert statements into an Insert for Tagging information (taggedBy, taggedResource, taggedOn, etc) and an Insert for each Tag in case it doesn't already exist. The Tag Insert can be a small Insert statement or a big one depending on how many types it has. The more types it has, the bigger it is because I have to create the Type in case it doesn't already exist.

So for each Insert I'm doing a POST to the Virtuoso Server and awaiting the reply before I continue and send off the next one.

Thus, the more tags there are, the more POSTs I'm making and the longer it takes, especially if it takes a while for the information to go across the wire and back. (What if someone's tagging in Africa?) I COULD spawn several threads to submit the tags for me, to avoid the "stop and wait" protocol I'm using right now...
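For the record, the thread option might look something like this with a small thread pool. It's a sketch under assumptions: the class and method names are made up, and each Callable stands in for one POST of an Insert statement that returns its HTTP response code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelInserts {
    // Runs every insert on a fixed-size thread pool and returns the
    // response codes in the same order as the inserts were given.
    static List<Integer> submitAll(List<Callable<Integer>> inserts, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Integer> codes = new ArrayList<>();
            for (Future<Integer> result : pool.invokeAll(inserts)) {
                codes.add(result.get()); // rethrows a failure from that insert
            }
            return codes;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<Integer>> inserts = new ArrayList<>();
        inserts.add(() -> 200); // stand-ins for real POSTs to the endpoint
        inserts.add(() -> 200);
        System.out.println(submitAll(inserts, 2));
    }
}
```

The total wait then becomes roughly the slowest POST rather than the sum of all of them, at the cost of hitting the endpoint with several requests at once.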


Or, as Ben suggested, I can use Ajax!

Question is, how do I use Ajax? Well, here's what I'm thinking right now...
Ajax lets me talk to the Server and tell it to run a function without having to reload the page.

When someone adds a new tag, I'll use Ajax to submit that tag to Virtuoso (since a tag is independent of a user until you make a connection using the "associatedTag" property). I'll leave the associatedTag property for later, when the user ACTUALLY hits submit.

If the tag already exists, *shrug*... the user doesn't know.
If the user deletes the tag... the tag will still be added, but it won't be associated with the actual tagging, since it won't be included in the submit.

Trouble: What if a post to the Virtuoso Server fails? I guess I'd have to check for that...

Emailing Errors!

I can see this being advantageous to ED in the long run.
I've gone and implemented an EDemailer class that uses Javamail to send off emails.
Whenever I get an error in either my SPARQLTagManager or SPARQLTaggingServlet, the EDemailer's send function is called and an email is fired off to an account I set up, "ed.developers@gmail.com".

The information I pass on is the GMT time it happened (to conform with what is being recorded in the database), the xml that was passed into the SPARQLTaggingServlet, the response code (if it gets that far) and the SPARQL Insert statement (if it gets that far).

I'm using "smtp.gmail.com" as the SMTP host (Port 465). You have to authenticate with them with a gmail account in the code or else you can't use it. Since I'm using JavaMail I had to get the activation.jar and mail.jar. Supposedly J2EE comes with them, but that's a lie, so I had to get them from Sun's site. On top of that, if you put activation.jar and mail.jar in your lib under WEB-INF, it won't work! Those jars NEED to be in Tomcat's lib folder. AND ONLY in Tomcat's lib folder. If you don't do this, you will make your mailer program sad!
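For reference, the connection settings described above boil down to a handful of JavaMail properties. The sketch below only builds the java.util.Properties object, so it runs without mail.jar; the class name is made up, and using "mail.smtp.ssl.enable" for the SSL port is my assumption about how to express port 465, not necessarily what EDemailer does.

```java
import java.util.Properties;

class GmailSmtpConfig {
    // Builds the settings for an authenticated SSL connection to Gmail's SMTP host.
    static Properties build() {
        Properties props = new Properties();
        props.put("mail.smtp.host", "smtp.gmail.com");
        props.put("mail.smtp.port", "465");
        props.put("mail.smtp.auth", "true");       // Gmail requires a login
        props.put("mail.smtp.ssl.enable", "true"); // port 465 expects SSL from the start
        return props;
    }

    public static void main(String[] args) {
        // In the real mailer these would be handed to
        // javax.mail.Session.getInstance(props, authenticator),
        // where the authenticator supplies the gmail username and password.
        System.out.println(build());
    }
}
```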

If you're going to be using an SMTP host that's running on your localhost, you'll need some other configuration for that, involving Tomcat's server.xml and context.xml.

Haven't tried that out yet, so I won't make any claims.

But at least this way (assuming I'm checking the ed.developers gmail account; I'll forward it to my normal one later), I will be informed when things go wrong.

Still need to deploy this on arch.uwindsor of course.... Hopefully if I put the extra jars in its lib it won't complain....

Monday, July 7, 2008

Malformed/reserved Characters

Connotea doesn't like apostrophes (') and backslashes (\). Actually, I've successfully posted tags with apostrophes in them, but it just doesn't like it when ED does it.

As a fix, we've gone and modified the javascript to scan the Semantic Tags for these characters and remove them prior to submission to Connotea.
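The actual fix lives in the javascript, but the same stripping is easy to show in Java in case the servlets ever need it too. A minimal sketch, with a made-up class name:

```java
class TagSanitizer {
    // Removes apostrophes and backslashes, the two characters Connotea rejects from ED.
    static String strip(String tag) {
        return tag.replace("'", "").replace("\\", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("Parkinson's disease"));
    }
}
```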

Delay time in submitting to Connotea

Now that we've got ED submitting tagging information to the sandbox graph of the Virtuoso server, in addition to submitting it to the ED Database, I've had to put in a delay to prevent the Form from submitting until all the information has been submitted to the sandbox and confirmed as successful.

Trouble is... this can take a while, since I need to break up a submission into a SPARQL Insert statement for every tag. The number of types for a tag is variable, which makes a single combined Insert statement potentially very, very long.

And if you have A LOT of tags, the wait for a response can get even longer.

I set it at 20 seconds before it times out and an error page is presented to the user.

Before, I was able to add up to 6 or 7 tags with 2-4 ftypes each and it would take under 10 seconds, but just recently Ben showed me an error he got on Entity Describer where it went past the 20 second mark.

I'll have to test how many tags + ftypes I can submit before I have to say it's taking too long, but if need be I can analyze how long a tag Insert is going to be and start merging them together if they're short enough. Seems a waste to use a whole HTTP POST for a 1 tag, 1 ftype Insert.
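The merging idea could be done greedily: keep adding Insert statements to the current batch until the next one would push it past some size, then start a new batch. A sketch, with a made-up class name and size limit, and assuming the endpoint accepts several statements in one request (which I'd still need to check):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class InsertBatcher {
    // Greedily packs statements, in order, into batches whose combined length
    // (including the newline separators) stays within maxChars.
    // A single statement longer than maxChars still gets a batch of its own.
    static List<String> batch(List<String> statements, int maxChars) {
        List<String> batches = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String statement : statements) {
            boolean wouldOverflow = current.length() + 1 + statement.length() > maxChars;
            if (current.length() > 0 && wouldOverflow) {
                batches.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append('\n');
            }
            current.append(statement);
        }
        if (current.length() > 0) {
            batches.add(current.toString());
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> inserts = Arrays.asList("INSERT one", "INSERT two", "INSERT three");
        System.out.println(batch(inserts, 25).size() + " POSTs instead of " + inserts.size());
    }
}
```

Each batch then costs one POST, so a handful of short 1 tag, 1 ftype Inserts would share a single round trip.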