Hi, first i hope this is the right place to ask this,
second: i have this problem:
Basically i'm storing a graph-datastructure in a database.
So there is Node-Table (tblNode) which just has an id and a discriminator-type column;
the Node-Table can be seen as a base class, then there are some derived classes of node, and finally there is a table for the edges edges.
here in pseudo code my tables:
...Inheritance(strategy = InheritanceType.JOINED)
...DiscriminatorColumn(name = "node_type_id",
discriminatorType = DiscriminatorType.INTEGER)
NodeBase { // the base class for all nodes
INT id
INT node_type_id
}
...DiscriminatorValue("2")
SomeNodeDerived {
INT node_id // references a [b]single[/b] id of table NodeBase (each derived class MUST have a corresponding base class)
VARCHAR(255) name
... possibly other fields
}
Edge { // edges just link nodes together
INT id
INT node_id_from // source node
INT node_id_to//target node
}
Now all i want to do is to find out all successors of a certain (set) of source nodes, identified by certain conditions (e.g. WHERE name = 'some_name')
here is the ejb-ql statment:
(target_class and source_class are entity-classes)
String stmt = "SELECT t FROM "
+target_class.getSimpleName() + " t "
+ "WHERE t.id IN ("
+"SELECT e.nodeIdTo FROM "
+ source_class.getSimpleName() + " s, "
+"IN(s.outgoingEdges) e "
+"WHERE "
+ condition_part
+ "e.nodeIdFrom = s.id "
+ ")"
Ok, this works concerning functionality - but it the performance is inacceptable (it takes more than 20 seconds to execute the query, while the sql statement takes less then a hundreth of a second)
I took a look at the generated SQL statement (BTW: how can one view the SQL statements that were generated from EJB-QL? i placed an error into the abstract schema (bad column name), so that i got an exception containing the sql statement)
Anyway, here is the generated statement, and the bold part causes all my troubles:
(the italic one is generated for the same reasons as the bold one, but it doesn't hurt the performance that bad)
tblNode_Source and tblNode_Target are derived "classes" of Node.
SELECT t0.id, t0.node_type_id, t1.node_id, t1.name
FROM tblNode t0, tblNode_Target t1
WHERE (t0.id IN
(SELECT DISTINCT t2.node_id_to
FROM tblNode_Source t4, tblNode t3, tblEdge t2
WHERE ((((t4.name = 'some_name') AND (t2.node_id_from = t3.id))
AND ((t4.node_id = t3.id))) AND (t2.node_id_from = t3.id) AND "[b](t3.node_type_id = 2)[/b]")))
AND (t1.node_id = t0.id) "[i]AND (t0.node_type_id = 15)[/i]"
Ooops, no bold and italic between code-tags - hope you can see it anyway.
without that condition, the statement exectues immediately (like 60ms), but with the part, it takes like 20 seconds. I tried creating indices in the database, but that didnt help much.
The point is: Since the query runs on the derived tables, and each id of a derived table corresponds to only a single id in the base table, which MUST have the discriminator value set to the discriminator-value of derived table from which we are retrieving (otherwise the data would be corrupted), these discriminator-column-conditions are redundant.
Is there a way to get rid of these?
(Ok, in fact, when ids of the base-table are just used to be joined with the ids of the derived classes, then this join is completely useless too, since of course for each id of a derived class there exists an entry with the same id in the base class - so if - as in my case - the base class just consists of a discriminator type and an id, then there are actually no queries to the base table neccessairy, since the id is the same as the id of the derived class, and the discriminator value can be retrievied by reflecting the derived entity class. - but still, those redundant conditions don't hurt my performance much)
Message was edited by:
raven_arkadon
Message was edited by:
raven_arkadon