The web methods for the ED API have been committed, but not yet deployed.
Here's an interesting thing I ran into when trying to do an HTTP GET on Google Book Search using Java's URLConnection.
Google behaves slightly differently from other web resources when you try to connect to it this way.
I found out about it here:
The suggestion in this thread works for Google and searchmash (a cool little Google API that returns results to you as JSON):
http://forum.java.sun.com/thread.jspa?threadID=5140278&messageID=9515635
and
this suggestion works for Google Book Search:
http://forum.java.sun.com/thread.jspa?messageID=1033250
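The threads' contents aren't reproduced here, so this is an assumption about the kind of fix they describe: one commonly reported cause of Google rejecting `URLConnection` requests is Java's default `User-Agent` header (`Java/1.x`). A minimal sketch that sets a browser-like `User-Agent` before connecting; the header value and the query URL are placeholders.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class GoogleFetch {
    // Hypothetical browser-like User-Agent; the exact value is an assumption.
    static final String USER_AGENT = "Mozilla/5.0 (compatible; ED-client)";

    // Prepares (but does not yet send) a GET with a non-default User-Agent.
    static HttpURLConnection openGet(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("User-Agent", USER_AGENT);
        return conn;
    }

    // Reads the whole response body as UTF-8 text; this is where the network I/O happens.
    static String readBody(HttpURLConnection conn) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder query URL, not taken from the original post.
        HttpURLConnection conn = openGet("http://books.google.com/books?q=ontology");
        System.out.println("HTTP " + conn.getResponseCode());
        System.out.println(readBody(conn));
    }
}
```

`openGet` only prepares the connection; `getResponseCode()` is what actually sends the request, so headers must be set before that call.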
Friday, July 18, 2008
URL size Limit
In an earlier post, I mentioned that one reason the ed_connotea JavaScript would not call the SPARQLTaggingServlet was the size of the parameters. The XML attached as a parameter was too long (as it turns out, because the XML was being duplicated and appended to itself, which inflated its size).
Here are some specs I found on the URL size limits of different browsers and servers.
The site's a little old, but the main point is that there IS a size limit. It might be better to submit the XML some other way than as a URL parameter, unless we can guarantee the XML will always stay under the limits of every browser we intend to support and of the server we're using.
The site where I found the specs lists these maximum URL lengths (in characters):
Internet Explorer: 2,083
Firefox: at least 100,000
Safari: at least 80,000
Apache web server: 4,000
I'll need to do some testing myself to see what the ACTUAL limits are.
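One way to act on this is to measure the encoded URL before deciding how to send the XML. A minimal sketch, assuming the most restrictive figure above (Internet Explorer's 2,083 characters) is the one that matters; the servlet URL and the `xml` parameter name are hypothetical. Percent-encoding inflates XML noticeably, since every `<`, `>`, `/`, `=` and `"` becomes a three-character escape, so the raw XML length understates the URL length.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class ParamSize {
    // Most restrictive limit quoted in the post (Internet Explorer); still needs real testing.
    static final int MAX_URL_LENGTH = 2083;

    // Builds the GET URL that would carry the XML as a query parameter (hypothetical name "xml").
    static String buildGetUrl(String baseUrl, String xml) {
        return baseUrl + "?xml=" + URLEncoder.encode(xml, StandardCharsets.UTF_8);
    }

    // True if the XML can travel in the query string; otherwise send it in a POST body instead.
    static boolean fitsInGet(String baseUrl, String xml) {
        return buildGetUrl(baseUrl, xml).length() <= MAX_URL_LENGTH;
    }

    public static void main(String[] args) {
        String base = "http://localhost:8080/ed/SPARQLTaggingServlet"; // hypothetical URL
        String small = "<tagging><tag name=\"gene\"/></tagging>";
        StringBuilder big = new StringBuilder("<tagging>");
        for (int i = 0; i < 100; i++) {
            big.append("<tag name=\"tag").append(i).append("\"/>");
        }
        big.append("</tagging>");
        System.out.println(fitsInGet(base, small));          // true
        System.out.println(fitsInGet(base, big.toString())); // false: ~3,000 encoded characters
    }
}
```

When the check fails, the same encoded string can go in the body of a POST (`Content-Type: application/x-www-form-urlencoded`), where these URL-length limits don't apply; servers still cap POST bodies (e.g. Tomcat's `maxPostSize`), but usually far higher.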
Sunday, July 13, 2008
Why there was an unexpected limit on the number of tags you could add to ED
I didn't realize this until I started adding a progress bar to ED, as per Ben's suggestion. Originally, I suspected ED was slower at submitting tags because of the extra step I added of sending them to the Virtuoso server with SPARQL. As it turns out, some of the tags were being repeated, so the SPARQLTagManager was adding a tag 2 or 3 times!
I traced this problem back to the XML the ed_connotea JavaScript was feeding it. What's worse, with too many tags and too many types per tag, the XML got too big to be passed as a parameter when we call the SPARQLTaggingServlet.
Firefox doesn't state a limit on the size of the parameter, but IE has one.
The trouble was that every time I called save_tag_action() to build the XML, it appended the tags onto the existing XML (which I didn't realize), so I'd end up with duplicate tags. It's called once for the TagManagerSaveTaggingServlet and again for the SPARQLTaggingServlet. To fix this, I call save_tag_action() once, save the XML, and reuse it for both servlets.
I wonder if it would be prudent to keep the XML in a cookie and destroy it once we're done.
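The real `save_tag_action()` is JavaScript in ed_connotea, so this is only a Java analogue of the bug and the fix, with hypothetical names. The bug pattern is a builder that appends into state that outlives the call, so a second call re-appends every tag; the fix is to build into a fresh buffer, drop duplicates, and hand the same string to both servlets.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class TaggingXml {
    // Buggy pattern: the buffer outlives each call, so every call appends the tags again.
    private static final StringBuilder shared = new StringBuilder();

    static String buggySave(List<String> tags) {
        for (String t : tags) {
            shared.append("<tag>").append(escape(t)).append("</tag>");
        }
        return "<tagging>" + shared + "</tagging>";
    }

    // Fixed pattern: fresh buffer per call, duplicates dropped, insertion order kept.
    static String save(List<String> tags) {
        Set<String> unique = new LinkedHashSet<>(tags);
        StringBuilder sb = new StringBuilder("<tagging>");
        for (String t : unique) {
            sb.append("<tag>").append(escape(t)).append("</tag>");
        }
        return sb.append("</tagging>").toString();
    }

    // Minimal XML text escaping for tag names ('&' first so later escapes aren't re-escaped).
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        List<String> tags = List.of("gene", "protein", "gene");
        // Build once, then pass this same string to both servlet calls.
        String xml = save(tags);
        System.out.println(xml); // <tagging><tag>gene</tag><tag>protein</tag></tagging>
        // The buggy version grows on every call.
        buggySave(tags);
        System.out.println(buggySave(tags));
    }
}
```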
Friday, July 11, 2008
Ajax and ED
It's time for more Ajax...
Here's the problem I'm running into with ED. When the user hits the submit button, the JavaScript calls a servlet, which composes the SPARQL/Update INSERT statements and then submits them to the Virtuoso server.
I have to break the statements up into one INSERT for the tagging information (taggedBy, taggedResource, taggedOn, etc.) and one INSERT for each tag, in case the tag doesn't already exist. A tag's INSERT can be small or large depending on how many types the tag has: each type makes it bigger, because I also have to create the type in case it doesn't already exist.
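A sketch of that split, written in SPARQL 1.1 `INSERT DATA { GRAPH <g> { ... } }` syntax (the 2008-era Virtuoso dialect differed, e.g. `INSERT INTO GRAPH <g> { ... }`, so treat the syntax as illustrative). The graph IRI, the `ed:` namespace, the `ed:Tag` class and the `ed:hasType` property are placeholders, not ED's real vocabulary; only `taggedBy`, `taggedResource`, `taggedOn` and `associatedTag` come from the post. IRIs are assumed to be valid already.

```java
import java.util.List;

final class TagInserts {
    // Placeholder graph and namespace: NOT ED's real IRIs.
    static final String GRAPH = "http://example.org/ed/sandbox";
    static final String ED = "http://example.org/ed#";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class";

    // Formats one triple whose subject, predicate and object are all IRIs.
    private static String iriTriple(String s, String p, String o) {
        return "  <" + s + "> <" + p + "> <" + o + "> .\n";
    }

    private static StringBuilder open() {
        return new StringBuilder("INSERT DATA { GRAPH <" + GRAPH + "> {\n");
    }

    // One INSERT per tag. Each type adds two triples (declare it, link it to the tag),
    // so the statement grows with the number of types. ed:hasType is hypothetical.
    static String tagInsert(String tagIri, List<String> typeIris) {
        StringBuilder sb = open().append(iriTriple(tagIri, RDF_TYPE, ED + "Tag"));
        for (String type : typeIris) {
            sb.append(iriTriple(type, RDF_TYPE, RDFS_CLASS));
            sb.append(iriTriple(tagIri, ED + "hasType", type));
        }
        return sb.append("} }").toString();
    }

    // One INSERT for the tagging itself, including its associatedTag links.
    static String taggingInsert(String taggingIri, String userIri, String resourceIri,
                                String isoDateTime, List<String> tagIris) {
        StringBuilder sb = open();
        sb.append(iriTriple(taggingIri, ED + "taggedBy", userIri));
        sb.append(iriTriple(taggingIri, ED + "taggedResource", resourceIri));
        sb.append("  <").append(taggingIri).append("> <").append(ED).append("taggedOn> \"")
          .append(isoDateTime).append("\"^^<http://www.w3.org/2001/XMLSchema#dateTime> .\n");
        for (String tag : tagIris) {
            sb.append(iriTriple(taggingIri, ED + "associatedTag", tag));
        }
        return sb.append("} }").toString();
    }

    public static void main(String[] args) {
        String tag = "http://example.org/ed/tag/p53";
        System.out.println(tagInsert(tag, List.of("http://example.org/types/Protein",
                                                  "http://example.org/types/Gene")));
        System.out.println(taggingInsert("http://example.org/ed/tagging/1",
                "http://example.org/users/ben", "http://example.org/papers/42",
                "2008-07-11T12:00:00Z", List.of(tag)));
    }
}
```

Because a tag's INSERT never mentions the user, it can be sent on its own; only the tagging INSERT ties the pieces together.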
So for each INSERT I'm doing a POST to the Virtuoso server and waiting for the reply before sending off the next one.
Thus, the more tags there are, the more POSTs I make and the longer it takes, especially if the information takes a while to go across the wire and back (what if someone's tagging in Africa!). I COULD spawn several threads to submit the tags for me and avoid the "stop and wait" protocol I'm using right now...
OR
as Ben suggested, I can use Ajax!
The question is, how do I use Ajax? Well, here's what I'm thinking right now...
Ajax lets me talk to the server and tell it to run a function without having to reload the page.
When someone adds a new tag, I'll use Ajax to submit that tag to Virtuoso right away (a tag is independent of a user until you connect them with the "associatedTag" property). I'll leave the associatedTag property for later, when the user ACTUALLY hits submit.
If the tag already exists, *shrug*... the user doesn't know.
If the user deletes the tag, it will still have been added, but it won't be associated with the actual tagging, since it won't be included in the submit.
Trouble: what if a POST to the Virtuoso server fails? I guess I'd have to check for that...
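For comparison, the server-side alternative (spawning threads instead of stop-and-wait), together with the failure check from the last point, might look like this. A minimal sketch using `ExecutorService.invokeAll`; the `Poster` interface is a hypothetical stand-in for whatever performs the HTTP POST to Virtuoso, and any non-2xx status or thrown exception counts as a failure so the caller can retry or report it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelInserts {
    // Hypothetical stand-in for the HTTP POST to Virtuoso; returns the HTTP status code.
    interface Poster {
        int post(String sparqlUpdate) throws Exception;
    }

    // Sends all statements concurrently and returns the ones that failed.
    static List<String> submitAll(List<String> statements, Poster poster, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (String s : statements) {
                tasks.add(() -> poster.post(s));
            }
            // invokeAll blocks until every task finishes; futures come back in task order.
            List<Future<Integer>> results = pool.invokeAll(tasks);
            List<String> failed = new ArrayList<>();
            for (int i = 0; i < results.size(); i++) {
                try {
                    int status = results.get(i).get();
                    if (status < 200 || status >= 300) {
                        failed.add(statements.get(i)); // server answered with an error code
                    }
                } catch (ExecutionException e) {
                    failed.add(statements.get(i)); // the POST itself threw
                }
            }
            return failed;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Fake poster: pretends any statement mentioning "bad" gets a 500.
        Poster fake = s -> s.contains("bad") ? 500 : 200;
        List<String> failed = submitAll(List.of("INSERT ok1", "INSERT bad", "INSERT ok2"), fake, 3);
        System.out.println(failed); // [INSERT bad]
    }
}
```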
Emailing Errors!
I can see this being advantageous to ED in the long run.
I've gone and implemented an EDemailer class that uses JavaMail to send off emails.
Whenever I get an error in either my SPARQLTagManager or SPARQLTaggingServlet, the EDemailer's send function is called and an email is fired off to an account I set up: "ed.developers@gmail.com".
The information I pass on is the GMT time the error happened (to match what is recorded in the database), the XML that was passed into the SPARQLTaggingServlet, the response code (if it gets that far), and the SPARQL INSERT statement (if it gets that far).
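EDemailer's code isn't in the post, so this is a hypothetical helper showing one way to assemble that report body; fields that weren't reached are marked as such rather than left out. `Instant.toString()` always prints UTC in ISO-8601 form with a `Z` suffix, which lines up with recording times in GMT.

```java
import java.time.Instant;

final class ErrorReport {
    // Builds the plain-text body of the error email; null means "didn't get that far".
    static String build(Instant when, String xml, Integer responseCode, String sparqlInsert) {
        StringBuilder sb = new StringBuilder();
        sb.append("Time (UTC/GMT): ").append(when).append('\n');
        sb.append("Input XML: ").append(xml == null ? "(none)" : xml).append('\n');
        sb.append("Response code: ")
          .append(responseCode == null ? "(not reached)" : responseCode.toString()).append('\n');
        sb.append("SPARQL insert: ")
          .append(sparqlInsert == null ? "(not reached)" : sparqlInsert).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        // Example: the POST returned 500 before any INSERT text was recorded.
        System.out.print(build(Instant.now(), "<tagging/>", 500, null));
    }
}
```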
I'm using "smtp.gmail.com" as the SMTP host (port 465). You have to authenticate with a Gmail account in the code, or else you can't use it. Since I'm using JavaMail, I had to get activation.jar and mail.jar. Supposedly J2EE comes with them, but Tomcat doesn't provide them (it's a servlet container, not a full J2EE server), so I had to get them from Sun's site. On top of that, if you put activation.jar and mail.jar in the lib folder under WEB-INF, it won't work! Those jars NEED to be in Tomcat's lib folder, and ONLY in Tomcat's lib folder. If you don't do this, you will make your mailer program sad!
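The JavaMail settings for that host fit in a `java.util.Properties`. A sketch using the classic SSL-on-connect recipe for port 465 (`mail.smtp.socketFactory.*` with `javax.net.ssl.SSLSocketFactory`); newer JavaMail releases also accept `mail.smtp.ssl.enable=true`. Only the properties are built here, so the sketch runs without mail.jar; with JavaMail on the classpath they would be passed to `Session.getInstance(props, authenticator)` and the message sent with `Transport.send(message)`. The Gmail account and password for the authenticator are not shown.

```java
import java.util.Properties;

final class SmtpConfig {
    // Properties for Gmail's SMTP server over SSL on port 465.
    static Properties gmailSsl() {
        Properties props = new Properties();
        props.setProperty("mail.smtp.host", "smtp.gmail.com");
        props.setProperty("mail.smtp.port", "465");
        props.setProperty("mail.smtp.auth", "true"); // Gmail refuses unauthenticated sends
        // Wrap the connection in SSL from the start, and don't fall back to plain sockets.
        props.setProperty("mail.smtp.socketFactory.port", "465");
        props.setProperty("mail.smtp.socketFactory.class", "javax.net.ssl.SSLSocketFactory");
        props.setProperty("mail.smtp.socketFactory.fallback", "false");
        return props;
    }

    public static void main(String[] args) {
        gmailSsl().forEach((k, v) -> System.out.println(k + "=" + v));
        // With mail.jar available, these would go to
        // javax.mail.Session.getInstance(props, authenticator), then Transport.send(message).
    }
}
```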
If you're going to use an SMTP host running on localhost, you'll need some other configuration involving Tomcat's server.xml and context.xml.
Haven't tried that out yet, so I won't make any claims.
But at least this way, I'll be informed when things go wrong (assuming I'm checking the ed.developers Gmail account; I'll forward it to my normal account later).
I still need to deploy this on arch.uwindsor, of course... Hopefully if I put the extra jars in its lib folder it won't complain...
Monday, July 7, 2008
Malformed/reserved Characters
Connotea doesn't like apostrophes (') and backslashes (\). Actually, I've successfully posted tags with apostrophes in them myself, but Connotea doesn't like it when ED does it.
As a fix, we've modified the JavaScript to scan the Semantic Tags for these characters and remove them before submitting to Connotea.
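The fix itself lives in the ed_connotea JavaScript; for consistency with the other examples, here is the same filtering as a Java sketch with hypothetical method names. It removes the two characters rather than escaping them, as described above; dropping tags that end up empty (e.g. a tag that was just `'`) is an added assumption.

```java
import java.util.List;
import java.util.stream.Collectors;

final class TagSanitizer {
    // Removes the characters Connotea rejected from ED submissions: apostrophe and backslash.
    static String strip(String tag) {
        return tag.replace("'", "").replace("\\", "");
    }

    // Applies strip() to every semantic tag and drops any that become empty.
    static List<String> stripAll(List<String> tags) {
        return tags.stream()
                .map(TagSanitizer::strip)
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(stripAll(List.of("Alzheimer's disease", "C:\\path", "'")));
        // prints [Alzheimers disease, C:path]
    }
}
```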
Delay time in submitting to Connotea
Now that ED submits tagging information to the sandbox graph of the Virtuoso server, in addition to the ED database, I've had to put in a delay that prevents the form from submitting until all the information has been sent to the sandbox and confirmed as successful.
Trouble is... this can take a while, since I need to break a submission up into one SPARQL INSERT statement per tag: the number of types per tag is variable, so a single combined INSERT could get very, very long.
And if you have A LOT of tags, the wait for a response gets even longer.
I set the timeout at 20 seconds, after which an error page is presented to the user.
Previously I could add 6 or 7 tags with 2-4 ftypes each and stay under 10 seconds, but just recently Ben showed me an error he got on Entity Describer that went past the 20-second mark.
I'll have to test how many tags and ftypes I can submit before it takes too long. If need be, I can check how long each tag INSERT is going to be and merge the short ones together. It seems a waste to spend a whole HTTP POST on an INSERT with 1 tag and 1 ftype.
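The merging idea can be sketched as a greedy packer: keep appending per-tag statements to the current batch while the combined text stays within a size budget, and start a new batch otherwise. The budget is a hypothetical tuning parameter, and joining statements with `;` relies on SPARQL 1.1 Update's multiple-operation syntax; whether the Virtuoso version in use accepts that in one request is an assumption to check.

```java
import java.util.ArrayList;
import java.util.List;

final class InsertBatcher {
    private static final String SEPARATOR = " ;\n";

    // Greedily packs statements into batches whose joined length stays within maxChars.
    // A statement longer than maxChars still gets a batch of its own.
    static List<String> batch(List<String> statements, int maxChars) {
        List<String> batches = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String s : statements) {
            int needed = current.length() == 0
                    ? s.length()
                    : current.length() + SEPARATOR.length() + s.length();
            if (current.length() > 0 && needed > maxChars) {
                batches.add(current.toString()); // close the full batch, one POST each
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(SEPARATOR);
            }
            current.append(s);
        }
        if (current.length() > 0) {
            batches.add(current.toString());
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> stmts = List.of("A".repeat(10), "B".repeat(10), "C".repeat(30), "D".repeat(5));
        for (String b : batch(stmts, 25)) {
            System.out.println(b.length() + " chars"); // 23, then 30, then 5
        }
    }
}
```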
Labels:
delay,
ED,
Entity Describer,
HTTPPost,
Insert Statements,
SPARQL,
SPARQL/Update