Java/JDBC/PGSQL Mailing List Archiver
From:
Tim Perdue <tim_perdue@yahoo.com>
Date:
Sounds interesting. As a C++ programmer, you should have no trouble at all with Java.
I highly recommend IBM VisualAge for Java. It checks all syntax and forces you to be correct. And only $100.
I found an NNTP Java class, but I don't know whether it's any good. I guess Sun has a sun.net.nntp package out there too. (I went to Excite and did a search for "java nntp".)
I'm using JDBC to drop everything into postgres and it's really pretty slick.
My current table structure looks something like this:
fld_mailing_list text, /* use int and join to other table??? */
fld_date Char(14), /* 19990101010101 I'm afraid I don't understand Postgres's implementation of timestamps*/
fld_subject text,
fld_is_followup int, /* for threading purposes */
fld_from text,
fld_body text,
I didn't put in a unique key because I never plan on having to update the message (it's static). This might be a mistake.
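For the fld_date Char(14) column above, one way to produce and read back the 19990101010101 layout from Java is java.text.SimpleDateFormat with the pattern yyyyMMddHHmmss. This is only a sketch: the class name FldDate, its helper names, and the choice of UTC are assumptions made for illustration, not part of the archiver.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class FldDate {
    // Matches the 14-character layout from the table comment: 19990101010101
    private static final String PATTERN = "yyyyMMddHHmmss";

    // SimpleDateFormat is not thread-safe, so a fresh instance is built per call.
    private static SimpleDateFormat newFormat() {
        SimpleDateFormat f = new SimpleDateFormat(PATTERN);
        f.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: everything is stored in UTC
        f.setLenient(false);                          // reject impossible values such as month 13
        return f;
    }

    // Date -> 14-character string suitable for fld_date
    static String toFldDate(Date d) {
        return newFormat().format(d);
    }

    // 14-character string -> Date; malformed input becomes an unchecked exception
    static Date fromFldDate(String s) {
        try {
            return newFormat().parse(s);
        } catch (ParseException e) {
            throw new IllegalArgumentException("not a yyyyMMddHHmmss value: " + s, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toFldDate(new Date(0L)));              // prints 19700101000000
        System.out.println(fromFldDate("19990101010101").getTime());
    }
}
```

A side effect of the fixed-width layout is that plain string comparison orders the values chronologically, so ORDER BY on the column still sorts by date.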
Let me clean up what I have tonight and I'll send it your way to play around with tomorrow or Saturday. I have attached the nttp.java class that I found and you can take a gander at that.
I'm not a pgsql genius, so you may have good advice on the datestamps, lobs, etc. Your idea of using multiple records with "chunk number" is really interesting. I'm just concerned about the complexity of that (I was trying to slap this together in 1-2 days). My plan at this point is to truncate messages over 8K.
Keep in touch.
Tim
---Peter Garner wrote:
>
> Hi Tim,
>
> I might be interested in
> collaborating as I am presently writing
> an offline reader that gets NNTP news
> and puts it in postgres. (I have already
> done this in C++ but I want to port it
> to Java.) If nothing else, you are
> welcome to the C++ code. Word of
> warning, I have been doing C++ for 11
> years but this is my first attempt at
> Java! :-) Another word of warning, :-)
> I am an itinerant consultant and I tend
> to get called off to work in the middle
> of personal projects like this! :-)
>
> Right now my C++ program uses LOBs. I
> think Herouz is right, however. She
> suggested splitting >8K fields into
> multiple text fields. E.g. you would
> have a table (for nntp news) like :
>
> create table MsgBodies
> (
> MsgId Text ,
> ChunkNumber int ,
> MsgBody Text ,
>
> primary key (MsgId , ChunkNumber)
> ) ;
>
> The majority of nntp msgs would just
> have one entry, but a few would be split
> into two or more entries.
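The chunking scheme in the quoted MsgBodies table can be sketched in Java roughly as follows. The class name BodyChunker, the 8000-character limit, and numbering chunks from 0 are all assumptions for illustration; the sketch also counts Java chars rather than bytes, so a real limit would need some headroom for multi-byte text.

```java
import java.util.ArrayList;
import java.util.List;

class BodyChunker {
    // Assumed limit: 8000 chars per chunk, a little under the 8K mentioned in the thread
    static final int CHUNK_CHARS = 8000;

    // Splits a message body into pieces of at most chunkChars characters.
    // The list index of each piece is what would go into ChunkNumber.
    static List<String> split(String body, int chunkChars) {
        if (chunkChars <= 0) {
            throw new IllegalArgumentException("chunkChars must be positive");
        }
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < body.length(); start += chunkChars) {
            int end = Math.min(body.length(), start + chunkChars);
            chunks.add(body.substring(start, end));
        }
        if (chunks.isEmpty()) {
            chunks.add(""); // an empty body still gets one row, chunk 0
        }
        return chunks;
    }

    public static void main(String[] args) {
        StringBuilder big = new StringBuilder();
        for (int i = 0; i < 20000; i++) {
            big.append('x');
        }
        List<String> chunks = split(big.toString(), CHUNK_CHARS);
        for (int n = 0; n < chunks.size(); n++) {
            // Each iteration corresponds to one (MsgId, ChunkNumber, MsgBody) row
            System.out.println("chunk " + n + ": " + chunks.get(n).length() + " chars");
        }
    }
}
```

Most messages would produce a single chunk, matching the expectation in the quoted text that only a few messages need two or more rows.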
/***************
NNTP Java Class
by Charles Bloom
quite nice
***************/
import java.io.*;
import java.net.*;
import java.lang.*;
class ArticleHeader
{
int number;
String subject;
String author;
String date;
int bytes,lines;
int seqNum,seqTot;
public void getSeq() /* from subject */
{
int slashI,fdelimI,ldelimI,a,b;
if ( (slashI = subject.lastIndexOf('/')) == -1 )
{ seqNum=seqTot=0; return; }
a = subject.lastIndexOf('(' /*)*/,slashI);
b = subject.lastIndexOf('[',slashI);
if ( a > b ) fdelimI = a; else fdelimI = b;
if ( fdelimI == -1 ) { seqNum=seqTot=0; return; }
a = subject.indexOf(/*(*/ ')',slashI);
b = subject.indexOf(']',slashI);
if ( a == -1 ) a = 999; if ( b == -1 ) b = 999;
if ( a < b ) ldelimI = a; else ldelimI = b;
if ( ldelimI == 999 ) { seqNum=seqTot=0; return; }
try
{
seqNum = clib.atoi(subject.substring(fdelimI+1,slashI));
seqTot = clib.atoi(subject.substring(slashI+1,ldelimI));
}
catch( NumberFormatException e ) { seqNum=seqTot=0; return; }
}
public String fileName() /* from subject */
{
int dotidx,fspace,lspace;
subject = subject.replace('\t',' '); /* Strings are immutable: keep the result */
if ( (dotidx = subject.lastIndexOf('.')) == -1 ) { dotidx = 0; fspace = 0; }
else { fspace = subject.lastIndexOf(' ',dotidx); fspace++; }
if ( (lspace = subject.indexOf(' ',dotidx)) == -1 ) lspace = subject.length();
return ( subject.substring(fspace,lspace) );
};
};
class NNTP_Client
{
public SimpleClientConnection net;
public long group_low=0,group_high=0; /* only valid right after goGroup */
boolean nntpConnected;
DataInputStream nntp_in;
PrintStream nntp_out;
PrintStream log_out;
String host,group_name;
public boolean reset() throws IOException
{
ArticleHeader none = new ArticleHeader();
none.number = 0;
return ( reset(none) );
};
public boolean reset(ArticleHeader last) throws IOException
{
disconnect();
NNTP_Connect();
if ( group_name != null )
{
if ( ! goGroup(group_name) ) return(false);
if ( ! goArticle(last.number) ) return(false);
}
return(true);
};
/* these progress indicators should be in a separate module
so that they could be swapped out for GUI ones */
void progressUpdate(int cur,int tot)
{
System.err.print("nntp : " + cur + " / " + tot + "\r");
System.err.flush();
}
void progressDone() { System.err.println(); }
public byte[] getBody(ArticleHeader header) throws IOException
{
byte body[];
String instr;
nntp_out.println("body");
instr = nntp_in.readLine();
log_out.println(instr);
if ( instr.indexOf("body") == -1 )
throw new IOException("nntp.getBody:no article body");
{
/* a growable buffer: the old fixed lines*80 array overflowed on long lines */
ByteArrayOutputStream tbody = new ByteArrayOutputStream();
int lines=0,c;
progressUpdate(lines,header.lines);
while ( lines < header.lines )
{
if ( (c = nntp_in.read()) == -1 ) break;
tbody.write(c);
if ( c == '\n' )
{
lines++;
if ( lines % 50 == 0 )
{
progressUpdate(lines,header.lines);
System.err.flush();
}
}
}
progressUpdate(lines,header.lines);
progressDone();
body = tbody.toByteArray();
}
log_out.println( nntp_in.readLine() ); /* '.' line */
return( body );
};
public ArticleHeader getHeader() throws IOException
{
boolean goterror;
String instr,curstr;
ArticleHeader header = new ArticleHeader();
int next_pos;
do
{
goterror = false;
nntp_out.println("xover");
instr = nntp_in.readLine();
log_out.println(instr);
if ( instr.indexOf("data follows") == -1 )
throw new IOException("nntp.getHeader:no xover data");
instr = nntp_in.readLine();
log_out.println(instr);
/* now process instr and fill in header */
/* article number */
next_pos = instr.indexOf('\t');
curstr = instr.substring(0,next_pos); instr = instr.substring(next_pos + 1);
try { header.number = clib.atoi(curstr); }
catch ( NumberFormatException e ) { goterror = true; }
/* subject */
next_pos = instr.indexOf( '\t');
curstr = instr.substring(0,next_pos); instr = instr.substring(next_pos + 1);
header.subject = curstr;
/* author */
next_pos = instr.indexOf( '\t');
curstr = instr.substring(0,next_pos); instr = instr.substring(next_pos + 1);
header.author = curstr;
/* date */
next_pos = instr.indexOf( '\t');
curstr = instr.substring(0,next_pos); instr = instr.substring(next_pos + 1);
header.date = curstr;
/* message-id */
next_pos = instr.indexOf('\t');
curstr = instr.substring(0,next_pos); instr = instr.substring(next_pos + 1);
// ignore
/* references */
next_pos = instr.indexOf( '\t');
curstr = instr.substring(0,next_pos); instr = instr.substring(next_pos + 1);
// ignore
/* bytes */
next_pos = instr.indexOf( '\t');
curstr = instr.substring(0,next_pos); instr = instr.substring(next_pos + 1);
try { header.bytes = clib.atoi(curstr); }
catch ( NumberFormatException e ) { goterror = true; }
/* lines (last one) */
next_pos = instr.indexOf( '\t');
if ( next_pos == -1 ) next_pos = instr.length(); /* no further field after lines */
curstr = instr.substring(0,next_pos);
try { header.lines = clib.atoi(curstr); }
catch ( NumberFormatException e ) { goterror = true; }
log_out.println( nntp_in.readLine() ); /* a line of just "." */
} while( goterror );
return(header);
};
public boolean goArticle(int number)
{
String retStr;
nntp_out.println("stat " + number);
try { log_out.println( retStr = nntp_in.readLine() ); }
catch ( IOException e ) { return(false); }
if ( retStr.indexOf("Bad") != -1 ) return(false);
return(true);
};
public boolean goGroup(String in_group_name)
{
String reply;
String[] replyToks;
group_low = group_high = 0;
group_name = in_group_name;
nntp_out.println("group " + group_name);
try { log_out.println( reply = nntp_in.readLine() ); }
catch ( IOException e ) { return(false); }
if ( reply.indexOf("No such group") != -1 )
{
log_out.println("NNTP_Client:Got:No such group");
return(false);
}
replyToks = clib.stringSpaceTok(reply);
if ( replyToks.length < 5 )
{
log_out.println("NNTP_Client:Got: less than 5 tokens in group header");
return(false);
}
group_low = clib.atol(replyToks[2]);
group_high = clib.atol(replyToks[3]);
return(true);
};
/* next : returns false when no more messages */
public boolean next() throws IOException
{
String statstr;
nntp_out.println("next");
statstr = nntp_in.readLine();
log_out.println(statstr);
if ( statstr.indexOf("retrieved") == -1 ) return(false);
return(true);
};
public boolean isConnected()
{
return( nntpConnected );
};
public void disconnect()
{
if ( nntpConnected ) nntp_out.println("quit");
nntpConnected = false;
net.disconnect();
};
public NNTP_Client(String in_host,PrintStream in_log_out) throws IOException
{
host = in_host;
log_out = in_log_out;
group_name = null;
nntpConnected = false;
NNTP_Connect();
}
public void NNTP_Connect() throws IOException
{
net = new SimpleClientConnection(host, 119); /* 119 is NNTP */
nntpConnected = net.isConnected();
if ( nntpConnected )
{
nntp_in = net.inputStream();
nntp_out = net.outputStream();
}
else
{
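nntp_in = null; nntp_out= null;
/* fail here rather than with a NullPointerException in the read loop below */
throw new IOException("NNTP_Client:could not connect to " + host);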
}
/* init */
{
String incoming = null;
log_out.println("Waiting for 'ready'");
do
{
incoming = nntp_in.readLine();
log_out.println(incoming);
} while ( incoming.indexOf("ready") == -1 );
}
};
};
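The tab-by-tab parsing in getHeader() above can also be written with String.split. The following is a minimal sketch that assumes the same field order getHeader() relies on (article number, subject, author, date, message-id, references, bytes, lines, optionally followed by further fields). XoverLine is a hypothetical name and not part of Charles Bloom's class.

```java
class XoverLine {
    final int number;
    final String subject;
    final String author;
    final String date;
    final int bytes;
    final int lines;

    private XoverLine(int number, String subject, String author,
                      String date, int bytes, int lines) {
        this.number = number;
        this.subject = subject;
        this.author = author;
        this.date = date;
        this.bytes = bytes;
        this.lines = lines;
    }

    // Parses one overview line. Throws IllegalArgumentException when fields are
    // missing; NumberFormatException (a subclass of it) covers bad numbers.
    static XoverLine parse(String line) {
        // The -1 limit keeps empty fields, e.g. an empty references column
        String[] f = line.split("\t", -1);
        if (f.length < 8) {
            throw new IllegalArgumentException(
                "expected at least 8 tab-separated fields, got " + f.length);
        }
        return new XoverLine(
            Integer.parseInt(f[0].trim()),
            f[1], f[2], f[3],
            // f[4] is the message-id and f[5] the references; both are ignored here
            Integer.parseInt(f[6].trim()),
            Integer.parseInt(f[7].trim()));
    }

    public static void main(String[] args) {
        XoverLine h = parse("42\tRe: JDBC\tsomeone@example.com\t"
            + "22 Jan 1999 02:05:00 +0200\t<id@example.com>\t\t1234\t17");
        System.out.println(h.number + " " + h.subject + " (" + h.lines + " lines)");
    }
}
```

Because the last field is taken by index, it does not matter whether the server appends anything after the line count.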
Re: Java/JDBC/PGSQL Mailing List Archiver
From:
Peter Garner <peter_garner@yahoo.com>
Date:
Hi Tim! :-)
> Sounds interesting. As a C++ programmer, you should
> have no trouble at all with Java.
Actually, the trouble I am having is that they are so close that I keep tripping over the subtle differences, hehehe.
> I highly recommend IBM VisualAge for Java. It checks
> all syntax and forces you to be correct. And only
> $100.
I have never been a big fan of IDEs. I use Visual SlickEdit for Linux. Also, IBM drug tests their employees, so I try to boycott them! :-) Although I have been told they no longer do this. Does VA run under Linux?
> I'm not a pgsql genius, so you may have good advice on
> the datestamps, lobs, etc. Your idea of using multiple
> records with "chunk number" is really interesting. I'm
What is the trouble with pgsql dates? BTW, it was Herouth's idea, not mine! (Hey Herouth, sorry about misspelling your name in the previous message! ;-)
Thanks for the Java classes! What is the licensing on that source code? Can we LGPL it?
==
Peace, Peter
We are Microsoft of Borg, you will be assimilated!!!
Resistance is fut... ***BZZZRT*** THUD!!!
[General Protection Fault in MSBorg32.DLL]
Please contact the vendor for more information
Re: [SQL] Java/JDBC/PGSQL Mailing List Archiver
From:
Fabrice Scemama <fabrice.scemama@gesnet.net>
Date:
This could be the beginning of a very nice GPL project.
I'd personally advocate using Perl for this, since all the necessary modules already exist and are easy to use:
Net::NNTP
DBI
DBD::Pg
... plus everything related to email, MIME, MD5, etc.
I'd rather not split text messages into 8K parts! That's what BLOBs were designed for. And what would you do with attachments? Just store them as MIME-encoded text? That would mean decoding them again and again, whereas storing them both MIME-encoded *and* as binary files would improve server performance while wasting only a few megs.
We might consider setting up a small mailing list to talk about such a project.
Fabrice
French Philosophical Forums
http://www.gesnet.net/philo/
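The trade-off raised here between MIME-encoded and binary storage can be made concrete with a small Java sketch. It uses java.util.Base64 (Java 8 or later) as a stand-in for whatever MIME library an archiver would really use; the class name AttachmentSize and the 3000-byte dummy attachment are assumptions for illustration only.

```java
import java.util.Base64;

class AttachmentSize {
    // Binary attachment -> MIME-style base64 text (76-character lines)
    static String encodeMime(byte[] raw) {
        return Base64.getMimeEncoder().encodeToString(raw);
    }

    // MIME-style base64 text -> original bytes; this is the step that
    // would be repeated on every read if only the encoded form were stored
    static byte[] decodeMime(String encoded) {
        return Base64.getMimeDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        byte[] raw = new byte[3000];            // dummy attachment
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) i;
        }
        String encoded = encodeMime(raw);
        System.out.println("binary size : " + raw.length + " bytes");
        System.out.println("encoded size: " + encoded.length() + " chars");
        System.out.println("decoded size: " + decodeMime(encoded).length + " bytes");
    }
}
```

Base64 turns every 3 bytes into 4 characters, so the encoded copy is roughly a third larger than the binary one before line breaks are counted.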
Re: [SQL] Java/JDBC/PGSQL Mailing List Archiver
From:
Fabrice Scemama <fabrice.scemama@gesnet.net>
Date:
I broadly agree with your points. Thanks for taking the time to write them up for us.
Points 1, 2, and 3 are not really essential (we can still back up the BLOBs, delete them, etc. with a little programming), but point 4 is very important.
So, as you put it, as far as mailing list archives are concerned, bodies should be split and attachments BLOBed.
Chavouah Tov!
Fabrice Scemama
Internet Developer too, but in France ;)
Herouth Maoz wrote:
>
> At 2:05 +0200 on 22/1/99, Fabrice Scemama wrote:
>
> > I'd rather not split text messages in 8k parts!
> > That's what BLOBS were designed for
>
> Just to give you a few off points on BLOBs:
>
> (1) They are not dumped with pg_dump, and you have to design your own
> backup procedure for them, which you will be able to integrate later
> with a dumped database.
>
> (2) You can't display them in a casual psql query. I usually use psql
> when I want to check something in my database. Storing the text in
> split records would allow you to view the content in psql.
>
> (3) Sometimes one needs to delete some data which was entered incorrectly
> or as a result of a buggy implementation. If you want to delete a
> large object you have to write a program. If you want to delete a
> bunch of split text records, you only have to enter psql and issue
> a DELETE statement in SQL.
>
> (4) You can't search on large objects. Not even an unindexed search.
> You have to retrieve every large object, and perform the search in
> the front end. If you had it in split text records, you would be
> able to search with LIKE or regexp.
>
> Attachments are a different story. All of the above applies when we are
> dealing with text processing. If the content is something other than text,
> you don't need to search it, there is no point in viewing it in PSQL
> because you need a proper viewer anyway, etc.
>
> So if you want to treat your attachments as binary objects, binary objects
> they should be.
>
> Herouth
>
> --
> Herouth Maoz, Internet developer.
> Open University of Israel - Telem project
> http://telem.openu.ac.il/~herutma
Re: [SQL] Java/JDBC/PGSQL Mailing List Archiver
From:
Herouth Maoz <herouth@oumail.openu.ac.il>
Date:
At 2:05 +0200 on 22/1/99, Fabrice Scemama wrote:
> I'd rather not split text messages in 8k parts!
> That's what BLOBS were designed for
Just to give you a few off points on BLOBs:
(1) They are not dumped with pg_dump, and you have to design your own
backup procedure for them, which you will be able to integrate later
with a dumped database.
(2) You can't display them in a casual psql query. I usually use psql
when I want to check something in my database. Storing the text in
split records would allow you to view the content in psql.
(3) Sometimes one needs to delete some data which was entered incorrectly
or as a result of a buggy implementation. If you want to delete a
large object you have to write a program. If you want to delete a
bunch of split text records, you only have to enter psql and issue
a DELETE statement in SQL.
(4) You can't search on large objects. Not even an unindexed search.
You have to retrieve every large object, and perform the search in
the front end. If you had it in split text records, you would be
able to search with LIKE or regexp.
Attachments are a different story. All of the above applies when we are
dealing with text processing. If the content is something other than text,
you don't need to search it, there is no point in viewing it in PSQL
because you need a proper viewer anyway, etc.
So if you want to treat your attachments as binary objects, binary objects
they should be.
Herouth
--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma
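One caveat worth keeping in mind with point (4): a LIKE or regexp applied row by row only sees one chunk at a time, so a phrase that straddles a chunk boundary would be missed unless the chunks are put back together first. Here is a minimal Java sketch of that effect, using in-memory strings in place of database rows; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class ChunkSearch {
    // Concatenates chunks that are already ordered by chunk number
    static String reassemble(List<String> chunksInOrder) {
        StringBuilder sb = new StringBuilder();
        for (String chunk : chunksInOrder) {
            sb.append(chunk);
        }
        return sb.toString();
    }

    // Mimics a per-row search: each chunk is tested on its own
    static boolean anyChunkContains(List<String> chunks, String needle) {
        for (String chunk : chunks) {
            if (chunk.contains(needle)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The word "postgres" is cut in two by the chunk boundary
        List<String> chunks = Arrays.asList("I drop everything into post", "gres via JDBC");
        System.out.println(anyChunkContains(chunks, "postgres"));     // prints false
        System.out.println(reassemble(chunks).contains("postgres")); // prints true
    }
}
```

Splitting at line or word boundaries instead of at a fixed offset would make such misses rarer for single-word searches, at the cost of slightly uneven chunk sizes.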