The Egothor project was established in 1998 with a primary focus on developing a novel search engine infrastructure. Over time, it branched out into several subgoals. In 2012, the project shifted towards an experimental database technology aimed at addressing limitations of contemporary big data platforms; the result of that effort is known as Egothor3 (the Q!D project).
This website serves as a platform to showcase not only the original Egothor search engine but also several associated subprojects.
Egothor addressed the challenge of index maintenance fragmentation and devised a universal stemmer capable of processing multiple languages. The project's latest iteration (v2) was developed in collaboration with students from the Faculty of Mathematics and Physics at Charles University in Prague. Their contributions covered the implementation and testing of several noteworthy components:
- Development of a new dynamization algorithm to facilitate rapid index updating
- Integration of transactional support (ACID)
- Implementation of plagiarism detection capabilities
- Incorporation of incremental update functionality
- Recognition of popular file formats such as HTML, PDF, PS, Microsoft's DOC, and XLS
- Utilization of an extended Boolean model that can be configured to behave as either the Vector or the Boolean model (see the sketch after this list)
- Deployment of a universal stemmer capable of processing diverse languages
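The extended Boolean model mentioned in the list is commonly described through the p-norm formulation, in which a single parameter p moves the scoring between vector-like (p = 1) and strictly Boolean (p approaching infinity) behaviour. The Java sketch below only illustrates that idea under assumed term weights in [0, 1]; it is not Egothor's actual scoring code.

```java
/** Illustrative p-norm scoring: NOT Egothor's real code, just a sketch of the
 *  extended Boolean model it builds on. Term weights are assumed to lie in [0, 1]. */
public class PNormScorer {

    private final double p;   // p = 1 -> vector-like behaviour, large p -> strict Boolean

    public PNormScorer(double p) {
        this.p = p;
    }

    /** Score of an OR query over the term weights of one document. */
    public double scoreOr(double[] weights) {
        double sum = 0.0;
        for (double w : weights) {
            sum += Math.pow(w, p);
        }
        return Math.pow(sum / weights.length, 1.0 / p);
    }

    /** Score of an AND query over the term weights of one document. */
    public double scoreAnd(double[] weights) {
        double sum = 0.0;
        for (double w : weights) {
            sum += Math.pow(1.0 - w, p);
        }
        return 1.0 - Math.pow(sum / weights.length, 1.0 / p);
    }

    public static void main(String[] args) {
        double[] doc = {0.9, 0.0, 0.7};                          // per-term weights of one document
        System.out.println(new PNormScorer(1).scoreAnd(doc));    // behaves like the vector model
        System.out.println(new PNormScorer(50).scoreAnd(doc));   // approaches strict Boolean AND
    }
}
```

Raising p makes an AND query punish missing terms ever more harshly, which is exactly the knob that lets one engine cover both retrieval models.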
Would you like to find out the postal addresses of potential attackers or notable visitors to your website? Several solutions exist, but the intricate nature of routing makes this challenge appealing to anyone drawn to complex problems.
Galeo, an innovative solution, employs a novel algorithm to locate the machine behind a target IP address, yielding its most probable deployment areas.
The algorithm assumes a heterogeneous network, which distinguishes it from approaches that rely on a constant data-flow speed of approximately 100 km/msec. Users can follow the complete calculation and gain insight into how the algorithm works. In addition, Galeo can process various input datasets, including direct GPS coordinates of routers, potential router positions, and areas annotated by population density.
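To make the contrast concrete, the following sketch shows the classic constant-speed baseline that Galeo improves on: each landmark's round-trip time is converted into a maximum distance (using the fixed 100 km/msec figure) and the resulting circles are intersected on a coarse grid. All class and method names here are invented for illustration and are not Galeo's API; Galeo's heterogeneous-network algorithm would, among other things, replace the fixed kmPerMs value with per-path calibrated speeds.

```java
import java.util.Arrays;

/** Hypothetical constraint-based geolocation baseline, NOT Galeo's algorithm:
 *  every landmark's RTT becomes a distance bound, and a coarse grid search
 *  finds the point that violates those bounds the least. */
public class GeoBaseline {

    static class Landmark {
        final double lat, lon;      // known position of the probing host
        final double rttMs;         // measured round-trip time to the target
        final double kmPerMs;       // assumed propagation speed (constant 100 km/ms here)
        Landmark(double lat, double lon, double rttMs, double kmPerMs) {
            this.lat = lat; this.lon = lon; this.rttMs = rttMs; this.kmPerMs = kmPerMs;
        }
        double maxDistanceKm() { return rttMs / 2.0 * kmPerMs; }
    }

    /** Great-circle distance in kilometres (haversine formula). */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double r = 6371.0;
        double dLat = Math.toRadians(lat2 - lat1), dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * r * Math.asin(Math.sqrt(a));
    }

    /** Scan a coarse grid and report the point that best satisfies all distance bounds. */
    static double[] estimate(Landmark[] marks) {
        double bestLat = 0, bestLon = 0, bestPenalty = Double.MAX_VALUE;
        for (double lat = -60; lat <= 75; lat += 0.5) {
            for (double lon = -180; lon < 180; lon += 0.5) {
                double penalty = 0;
                for (Landmark m : marks) {
                    double excess = distanceKm(lat, lon, m.lat, m.lon) - m.maxDistanceKm();
                    if (excess > 0) penalty += excess;   // outside this landmark's circle
                }
                if (penalty < bestPenalty) { bestPenalty = penalty; bestLat = lat; bestLon = lon; }
            }
        }
        return new double[] { bestLat, bestLon };
    }

    public static void main(String[] args) {
        Landmark[] marks = {
            new Landmark(50.08, 14.42, 8.0, 100.0),   // Prague landmark, 8 ms RTT
            new Landmark(52.52, 13.40, 6.0, 100.0),   // Berlin landmark, 6 ms RTT
            new Landmark(48.21, 16.37, 7.0, 100.0)    // Vienna landmark, 7 ms RTT
        };
        System.out.println(Arrays.toString(estimate(marks)));
    }
}
```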
There is a multitude of crawlers in the digital landscape, ranging from high-performance systems to those equipped with intriguing features for deep-web exploration. Bobo, however, stands apart as a universal crawling architecture capable of accommodating any conceivable crawler type. Whether it is a deep-web crawler, a classic web crawler, a worm, or a swarm of virtual entities, Bobo can serve as its hosting platform. It essentially functions as a virtual army in operation, offering unparalleled versatility and adaptability.
The Bobo project originated as a classic distributed crawler designed for Egothor2. Through a novel implementation approach, it evolved into a versatile tool, earning the moniker 'devil's software'. Bobo can support an unlimited number of virtual entities, each capable of executing various tasks on behalf of the user. These entities are adept at collecting or disseminating a wide array of information within cyberspace.
The Bobo crawler is founded on a concept we call 'co-operating services'. This framework enables the exchange of any system component without shutdowns, restarts, or reloads. Moreover, compatible components can seamlessly take over the requests of another component in the event of its failure or overhaul. Not every application needs such features; the essence lies in our pursuit of 'something completely different' rather than simply 'another middleware'.
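As a rough illustration of the 'co-operating services' idea, the sketch below registers several interchangeable implementations of one hypothetical fetcher service: a component can be added or removed at runtime without a restart, and a compatible peer answers a request when the preferred component fails. This is a minimal sketch of the concept, not Bobo's real code.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Minimal sketch (not Bobo's code) of co-operating services: interchangeable
 *  components of one service are registered, can be swapped at runtime, and a
 *  compatible peer takes over when the preferred one fails. */
public class CooperatingServices {

    /** The contract shared by all interchangeable fetcher components (hypothetical). */
    interface Fetcher {
        String fetch(String url) throws Exception;
    }

    /** Registry of live components; swapping means adding/removing entries at runtime. */
    static class Registry {
        private final List<Fetcher> fetchers = new CopyOnWriteArrayList<>();

        void register(Fetcher f)   { fetchers.add(f); }
        void unregister(Fetcher f) { fetchers.remove(f); }   // exchange without shutdown

        /** Try each compatible component until one succeeds. */
        String fetch(String url) throws Exception {
            Exception last = null;
            for (Fetcher f : fetchers) {
                try {
                    return f.fetch(url);
                } catch (Exception e) {
                    last = e;                      // this component failed; try the next one
                }
            }
            throw last != null ? last : new IllegalStateException("no fetcher registered");
        }
    }

    public static void main(String[] args) throws Exception {
        Registry registry = new Registry();
        Fetcher flaky  = url -> { throw new Exception("component being overhauled"); };
        Fetcher backup = url -> "<html>stub content of " + url + "</html>";

        registry.register(flaky);
        registry.register(backup);
        System.out.println(registry.fetch("http://example.org/"));  // the backup answers

        registry.unregister(flaky);              // hot-swap: no shutdown, restart, or reload
        registry.register(url -> "new implementation for " + url);
    }
}
```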
J5m serves as an accessible replacement for JINI, offering tested reliability and features as demonstrated in its use as the underlying platform for distributed crawlers. It ensures production-ready stability and functionality by guaranteeing the following operations:
- Location of remote objects
- Communication with remote objects: RMI serves as the primary protocol for client-server communication, although it can be replaced by alternative protocols like JMS if necessary
- Optimization of remote calls: Profiling of client connections enables prioritized communication with faster server objects
- Loading and initialization of server objects: Programmers have the flexibility to define dependency trees among deployed server objects, facilitating orderly initialization with appropriate parameters
J5m remains a lightweight layer for distributed applications, distinct from a fully-fledged middleware with extensive features.
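One of the features listed above, dependency-ordered initialization of server objects, can be illustrated with a small topological sort. The names below are invented for the example and do not reflect J5m's real API; they merely show how declared dependencies translate into a safe start-up order.

```java
import java.util.*;

/** Hypothetical sketch of dependency-ordered start-up, as described for J5m's server
 *  objects. Each service declares what it depends on, and the resolver returns the
 *  services in an order where every dependency precedes its dependents. */
public class StartupOrder {

    static List<String> resolve(Map<String, List<String>> dependsOn) {
        List<String> order = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String service : dependsOn.keySet()) {
            visit(service, dependsOn, visited, inProgress, order);
        }
        return order;
    }

    private static void visit(String service, Map<String, List<String>> dependsOn,
                              Set<String> visited, Set<String> inProgress, List<String> order) {
        if (visited.contains(service)) return;
        if (!inProgress.add(service)) {
            throw new IllegalStateException("dependency cycle at " + service);
        }
        for (String dep : dependsOn.getOrDefault(service, List.of())) {
            visit(dep, dependsOn, visited, inProgress, order);
        }
        inProgress.remove(service);
        visited.add(service);
        order.add(service);     // all of its dependencies are already in the list
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("indexer", List.of("storage", "queue"));   // invented service names
        deps.put("crawler", List.of("queue"));
        deps.put("storage", List.of());
        deps.put("queue",   List.of());
        System.out.println(resolve(deps));   // prints [storage, queue, indexer, crawler]
    }
}
```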