Introduction
Search Maestro has been developed by Leo Galambos at Charles University, Prague. You can visit Leo's site for a free download, technical specifications, and source code, or to offer help if you would like to get involved in the project. Technical support is free. Leo is making the software available free of charge, showing that the Unix tradition of free software extends to other operating systems too. Search Maestro runs on a number of different versions of OS/2 Warp.
Summary
Search Maestro is a client/server search engine which can be used to index documents on PCs or network servers. It can be accessed from OS/2-based PCs via its own client software. However, its main implementation is as a WWW-based search engine. It features a robot and a full suite of CGI programs to enable queries over the Web. The CGI programs can be fully customized, as is the case on this site, or they can be used more or less as supplied.
Platforms
- OS/2 Warp Server version 4
- OS/2 Warp Server Advanced
- OS/2 Warp Server Advanced SMP
- Search Maestro can run on a PC with up to 16 processors
- OS/2 Warp version 4
- OS/2 Warp version 3, Connect, etc.
Instances & Threads
Search Maestro, like OS/2, is multithreaded. Each thread can handle one request. A request can be to index, to search, or to get statistical information. The operating system on which SOMIS runs, OS/2 Warp Server, still has around 3,000 to 4,000 threads free to be used; it remains to be seen what happens if Maestro takes off like a MiniVista. In the case of SOMIS, the instances are:
- Instance 001 - Index of SOMIS only, 20 threads allocated.
- Instance 002 - Index of University Minutes, 20 threads allocated [Intranet documents].
- Instance 003 - Index of all the WWW servers of the University, 20 threads allocated.
- Instance 004 - Index of The English Periodical, 5 threads allocated.
- Instance S001 - Index of all Scottish Universities, 20 threads allocated.
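The allocation above can be sketched in a few lines of Python. This is only an illustration of the "one thread per request" idea, not Search Maestro's own code: the names `INSTANCE_THREADS`, `handle_request` and `run_requests` are hypothetical, and a standard thread pool stands in for whatever scheduling Maestro actually uses.

```python
from concurrent.futures import ThreadPoolExecutor

# Thread allocations copied from the list above; the dictionary itself is illustrative.
INSTANCE_THREADS = {
    "001": 20,   # SOMIS only
    "002": 20,   # University Minutes (Intranet documents)
    "003": 20,   # all WWW servers of the University
    "004": 5,    # The English Periodical
    "S001": 20,  # all Scottish Universities
}

REQUEST_KINDS = ("index", "search", "stats")


def total_threads(allocations):
    """Sum the threads reserved across all instances."""
    return sum(allocations.values())


def handle_request(instance_id, kind, payload):
    """Hypothetical handler: each call runs on one worker thread."""
    if kind not in REQUEST_KINDS:
        raise ValueError(f"unknown request kind: {kind}")
    return f"instance {instance_id}: {kind} request for {payload!r} handled"


def run_requests(instance_id, requests):
    """Serve requests for one instance with at most its allocated threads."""
    workers = INSTANCE_THREADS[instance_id]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(handle_request, instance_id, kind, payload)
            for kind, payload in requests
        ]
        # Results are collected in submission order.
        return [future.result() for future in futures]
```

With these figures the five instances together reserve 85 threads, a small fraction of the free threads mentioned above.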
Maestro's Robot
Leo has developed a polite robot which obeys the rules specified in the robots.txt file. The rules it follows are explained by Martijn Koster. The robot runs on a single thread, so it does not flood a server with multiple concurrent requests. However, many instances of the robot can be run at the same time, so we can index many servers concurrently. The present version of the robot can update live indexes without any significant drop in the performance of Search Maestro; in effect, just one more thread is being used. After the first complete indexing, the robot downloads only the header information of a file. If that shows that the file has changed since the last index, it then downloads the complete file. For our purposes each instance is allowed to access the domain of only one server, so in order to index many servers we run many concurrent or sequential instances of the program.
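The two checks described here, consulting robots.txt and comparing header information before a full download, can be sketched with Python's standard library. This is a minimal illustration under assumptions, not the robot's actual logic: the function names and the agent name `maestro-sketch` are hypothetical, and the sketch assumes the header inspected is `Last-Modified`, which the text does not state.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from urllib.robotparser import RobotFileParser


def build_rules(robots_txt):
    """Parse the text of a robots.txt file into a rule set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(rules, agent, url):
    """Return True if robots.txt permits this agent to fetch the URL."""
    return rules.can_fetch(agent, url)


def needs_refetch(last_modified_header, last_indexed):
    """Decide from header information alone whether to download the full file.

    Assumption: the server sends a Last-Modified header. If it is missing
    or unparseable, the sketch errs on the side of downloading again.
    """
    if not last_modified_header:
        return True
    try:
        modified = parsedate_to_datetime(last_modified_header)
    except (TypeError, ValueError):
        return True
    if modified.tzinfo is None:
        # Treat a header without zone information as UTC.
        modified = modified.replace(tzinfo=timezone.utc)
    return modified > last_indexed
```

A crawler built this way would issue a header-only request for each known URL, call `needs_refetch` with the time of the last index, and fetch the body only when it returns true.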
The current setup of the robot does not allow indexing of documents larger than 200 KB. We set this limit because some of our larger documents approach that size. If the size is left unlimited, the robot can download very large and probably bad files. Further, a very large document is likely to have an extensive vocabulary, so it is likely to be hit by most queries.
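A size cap of this kind can be enforced while the document is being read, so that an oversized file is abandoned rather than downloaded in full. The sketch below is illustrative only: `read_capped` and `MAX_DOCUMENT_BYTES` are hypothetical names, and reading "200 KB" as 200 × 1024 bytes is an assumption.

```python
import io

# Assumption: "200 KB" is taken to mean 200 * 1024 bytes.
MAX_DOCUMENT_BYTES = 200 * 1024


def read_capped(stream, limit=MAX_DOCUMENT_BYTES, chunk_size=8192):
    """Read a document in chunks; return None as soon as it exceeds the limit."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > limit:
            # Stop reading: the document is too large to index.
            return None
        chunks.append(chunk)
    return b"".join(chunks)


# Usage with an in-memory stream standing in for a network response.
small = read_capped(io.BytesIO(b"x" * 1000))
too_big = read_capped(io.BytesIO(b"x" * (MAX_DOCUMENT_BYTES + 1)))
```

Here `small` holds the 1000 bytes that were read, while `too_big` is `None` because the stream exceeded the cap.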
Finally, the present release of the robot understands different codepages and can therefore index documents in different languages.
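Why codepage awareness matters for indexing can be shown with a short Python sketch: the same word is stored as different bytes under different codepages, so the indexer must decode each document with the right one before extracting terms. The function `decode_document` and the fallback to Latin-1 are hypothetical choices for illustration; how the robot actually determines a document's codepage is not described here.

```python
def decode_document(raw, codepage, fallback="latin-1"):
    """Decode raw bytes with the declared codepage.

    If the codepage name is unknown or the bytes do not fit it, fall back
    to a permissive decoding so the document can still be indexed.
    """
    try:
        return raw.decode(codepage)
    except (LookupError, UnicodeDecodeError):
        return raw.decode(fallback, errors="replace")


# The same Czech word stored under two different codepages.
word = "příliš"
as_cp852 = word.encode("cp852")          # OS/2 and DOS Central European codepage
as_latin2 = word.encode("iso8859_2")     # ISO 8859-2 (Latin-2)

# The byte sequences differ, but both decode to the same indexable text.
same_text = (
    decode_document(as_cp852, "cp852")
    == decode_document(as_latin2, "iso8859_2")
    == word
)
```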