Front-Running Detection #239

Closed
evgenydmitriev opened this issue Sep 2, 2021 · 9 comments

@evgenydmitriev
Contributor

evgenydmitriev commented Sep 2, 2021

Front-running is trading stock or any other financial asset by a broker who has inside knowledge of a future transaction that is about to affect its price substantially. A broker may also front-run based on insider knowledge that their firm is about to issue a buy or sell recommendation to clients that will almost certainly affect the price of an asset.

Investopedia

In the crypto industry, front-running is a common occurrence, with even the biggest trading venues involved. What often starts as an in-house bot providing additional liquidity to a centralized exchange sometimes turns into an illegal tool for earning money by trading ahead of the exchange's own customers.

Please comment below with ideas on detecting front-running trades based on a stream of executed trades. More specifically, describe an algorithm you would use to create a metric capable of automatically flagging suspicious trades. Feel free to support your ideas by adding references, datasets, graphs, and code. Comments with the best ideas will be hidden to allow others to participate. Multiple submission awards are available.

Many of the previous challenge participants focused on investigative approaches that involved manual analysis of specific cases of front-running. This, however, is an engineering challenge, requiring successful submissions to include an algorithm, supported by references, datasets, graphs, and/or code. We already have plenty of ingenious ideas on how it could be done, but no solid plans for implementing it using real-time streaming data.

@sherrisherry

This detection logic only applies to the front-running operations of market makers. Assuming that large trades are rare, front-running can be detected by tracking large orders. Two metrics, the mode of the hourly percentage of consecutive buy orders and the mode of the hourly percentage of consecutive sell orders, serve as thresholds.

With an hourly window, partition the stream by the positions the market maker takes, then aggregate the stream to obtain the following metrics: buy quantity, sell quantity, percentage of consecutive buy orders, and percentage of consecutive sell orders.

If, in one direction, the quantity is substantially higher than the quantity in the opposite direction and the percentage of consecutive orders in that direction is higher than normal, a front-running operation may be under way.

Once a stream is flagged for front-running, it is moved into a cache together with its aggregated metrics. Subsequent streams are aggregated and added to the cache until, in some stream, the percentages of consecutive orders in both directions decline back to the norm. When examining the cache, the earlier orders should be dominant in one direction and the later orders in the other.
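The hourly aggregation, mode thresholds and cache logic described in this comment could be sketched in Python roughly as follows. Several details are assumptions rather than part of the proposal: "percentage of consecutive buy orders" is read as the share of buy orders immediately preceded by another buy order; "substantially high" is a hypothetical quantity `ratio` (default 2.0); percentages are rounded to two decimals so a mode exists; and "earlier vs later" is checked by splitting the cache in half and comparing net buy-minus-sell quantity. All names (`Window`, `aggregate`, `mode_threshold`, `detect`) are hypothetical.

```python
import statistics
from dataclasses import dataclass

@dataclass
class Window:
    """Metrics for one hourly window of a single market maker's orders."""
    buy_qty: float
    sell_qty: float
    consec_buy: float   # share of buy orders immediately preceded by a buy order (assumption)
    consec_sell: float  # share of sell orders immediately preceded by a sell order (assumption)

def aggregate(orders):
    """orders: list of (side, qty) in execution order, side in {"buy", "sell"}."""
    buy_qty = sum(q for s, q in orders if s == "buy")
    sell_qty = sum(q for s, q in orders if s == "sell")
    n_buy = sum(1 for s, _ in orders if s == "buy")
    n_sell = len(orders) - n_buy
    pairs = list(zip(orders, orders[1:]))
    cb = sum(1 for (p, _), (c, _) in pairs if p == c == "buy")
    cs = sum(1 for (p, _), (c, _) in pairs if p == c == "sell")
    return Window(buy_qty, sell_qty,
                  cb / n_buy if n_buy else 0.0,
                  cs / n_sell if n_sell else 0.0)

def mode_threshold(history, attr):
    """Mode of a historical hourly percentage, rounded to 2 decimals so a mode exists."""
    return statistics.mode(round(getattr(w, attr), 2) for w in history)

def dominant_side(w, thr_buy, thr_sell, ratio):
    """Side whose quantity dominates AND whose consecutive-order share is above normal."""
    if w.buy_qty > ratio * w.sell_qty and w.consec_buy > thr_buy:
        return "buy"
    if w.sell_qty > ratio * w.buy_qty and w.consec_sell > thr_sell:
        return "sell"
    return None

def net(windows):
    """Net buy-minus-sell quantity over a list of windows."""
    return sum(w.buy_qty - w.sell_qty for w in windows)

def detect(windows, thr_buy, thr_sell, ratio=2.0):
    """Yield cached episodes whose earlier windows lean one way and later windows the other."""
    cache = []
    for w in windows:
        if not cache:
            if dominant_side(w, thr_buy, thr_sell, ratio):
                cache.append(w)  # flagged window: start caching
            continue
        cache.append(w)
        # Close the episode once both percentages are back at or below the norm.
        if w.consec_buy <= thr_buy and w.consec_sell <= thr_sell:
            half = len(cache) // 2
            if net(cache[:half]) * net(cache[half:]) < 0:  # direction flipped
                yield cache
            cache = []
```

In use, `thr_buy` and `thr_sell` would come from `mode_threshold(history, "consec_buy")` and `mode_threshold(history, "consec_sell")` over past hourly windows, and `detect` would consume windows produced by `aggregate` as each hour closes.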

@evgenydmitriev evgenydmitriev added $300 and removed $100 labels Jan 4, 2022
@evgenydmitriev
Contributor Author

In other words, are you asking the algorithm to determine, based on WebSocket streams, whether centralized exchanges such as Binance are front-running their clients, as BitMEX did with its 'desk' in the Medium article?

It takes different forms, but you gave a great example.

@DonR428

DonR428 commented Mar 3, 2022

From what I can tell, most prevention of front-running comes down to analyzing trades, including market-maker trades. I would analyze large trades: whether they go with or against the market, and how many of those orders come from the same account. You want code that first tags the accounts placing large trades with and against the market, then applies further filters for market makers and for trade frequency.

In the case of BitMEX, a clear example was provided of how data analytics can, at the very least, show the risk posed by a particular company. Building on that, I would create a dataset for these companies covering policies that allude to foul play, server issues during heavy trading, how often trading is halted due to server issues, when server issues occur relative to volatile trading, and so on. Once the key points are nailed down, you can begin to build a key of factors that add risk to a company. I would add one last category that counts how many risk items are flagged. With that in place, you could filter all the companies with a certain number of risk items. From there, if no improvements or changes are made, I would suggest not recommending companies at high risk of front-running, and continually adding companies to the list. This way the dataset grows with each company.

As a single person, I could compile the dataset and publish it somewhere like Tableau, where data can be verified and added to accordingly. This way more eyes would be on my work, with suggestions to improve the dataset. The more accurate the data and the more ideas are added, the more front-running trades we can avoid. Thank you for your time, and please provide feedback if possible. Thanks again!
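The risk-scoring step at the end of this comment (a key of risk factors, a final category counting the flagged items, and a filter on that count) could look roughly like the Python sketch below. The factor names simply mirror the examples given above; the `Company` type, `RISK_FACTORS` and the `high_risk` helper are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical key of risk factors, mirroring the examples in the comment above.
RISK_FACTORS = (
    "policies_alluding_to_foul_play",
    "server_issues_during_high_volume",
    "frequent_halts_due_to_server_issues",
    "server_issues_during_volatility",
)

@dataclass
class Company:
    name: str
    flags: dict = field(default_factory=dict)  # factor name -> True/False

    @property
    def risk_count(self) -> int:
        """The final category: how many risk items are flagged for this company."""
        return sum(bool(self.flags.get(f)) for f in RISK_FACTORS)

def high_risk(companies, min_items):
    """Names of companies with at least `min_items` flagged risk factors."""
    return [c.name for c in companies if c.risk_count >= min_items]
```

Unknown factors count as not flagged, so new companies can be added to the list incrementally as evidence is collected.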

@Nanobelka

Usually there is a spread between buy and sell orders: the best buy (bid) price is lower than the best sell (ask) price.

  1. If the buy price is higher than the sell price, this is the first signal.
  2. A large buy order above the sell price (or sell below the buy price) is a strengthening signal.
  3. If there is a large buy order and a large sell order at the same time (and the buy price is higher than the sell price), one of the orders is probably a distraction.
  4. Sometimes large orders are split into smaller ones to hide them.
  5. To hide the real bid, many bids can be placed at the minimum possible size (minimum lot).
  6. The order is placed as close as possible to the start of trading.
  7. Sometimes over-the-counter transactions are carried out this way, at a price very different from the market. The order is placed a second before trading starts.
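The signals in this list could be checked per batch of orders with a Python sketch like the one below. It assumes order-level data with the current best bid and ask is available, which a stream of executed trades alone does not provide. Items 4 and 5 are approximated by counting minimum-lot orders, and items 6 and 7 by large orders placed within `open_window` seconds of the session opening. The `Order` type, `signals` function and all thresholds are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Order:
    side: str     # "buy" or "sell"
    price: float
    qty: float
    ts: float     # seconds since the trading session opened (assumption)

def signals(orders, best_bid, best_ask, large_qty, min_lot, burst, open_window=1.0):
    """Return the heuristic signals from the list above raised by one batch of orders."""
    raised = []
    if best_bid > best_ask:
        raised.append("crossed_book")                     # item 1
    large_buys = [o for o in orders if o.side == "buy" and o.qty >= large_qty]
    large_sells = [o for o in orders if o.side == "sell" and o.qty >= large_qty]
    if any(o.price > best_ask for o in large_buys) or any(o.price < best_bid for o in large_sells):
        raised.append("large_crossing_order")             # item 2
    if any(b.price > s.price for b in large_buys for s in large_sells):
        raised.append("possible_decoy")                   # item 3
    if sum(1 for o in orders if o.qty <= min_lot) >= burst:
        raised.append("min_lot_burst")                    # items 4-5
    if any(o.ts <= open_window for o in large_buys + large_sells):
        raised.append("near_open")                        # items 6-7
    return raised
```

Each signal is reported separately so that a downstream score can weight them, e.g. treating item 2 as strengthening item 1, as the list suggests.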

@Kaesoron

Kaesoron commented Apr 7, 2022

Front-running relies on a basic condition: clients of the trading platform must receive information with a significant delay. This gives the broker time to process the information and place orders on the exchange.
Technically, this would most likely be implemented as a persistent time lag in delivering market information to the broker's clients. So front-running can be identified by the following signs:

  1. There is a significant (and constant) time lag in the delivery of market information. This can be identified by comparing the broker's trading terminal with that of another, reference broker.
  2. If we track the moments of mass entry into or exit from positions, we will see several accounts that are ahead of most orders by several seconds. These may be lucky traders, but with automated trading we will observe the same time intervals, amounts and accounts.
  3. If we have access to account data, we will observe an abnormally high percentage of successful trades, and trading with the maximum available leverage, from the same accounts.
  4. Such trading will be intraday scalping rather than a long-term strategy. When the trend reverses, the corresponding order will be placed instantly, since it makes no sense to set up such trading algorithms for long-term investment. We will then see the broker trading against its clients: a sell order is placed while other clients are still placing buy orders because of the time lag.
  5. In the most severe cases, we will observe manipulation of market information: abnormal candlesticks that trigger stops and close the positions of the broker's clients. This can be detected by comparing short-term charts with those of other brokers.
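Signs 1 and 2 lend themselves to automation. The Python sketch below assumes that the same events (for example, exchange trade IDs) can be matched between the broker's feed and a reference feed, and that the timestamps of mass entry/exit moments have already been identified. Function names and thresholds (`min_lag`, `max_jitter`, `window`, `min_hits`, `max_spread`) are hypothetical.

```python
import bisect
import statistics
from collections import defaultdict

def feed_lag(broker_ts, reference_ts):
    """Median and spread of the arrival delay for events seen on both feeds.

    broker_ts / reference_ts map an event id to its arrival time in seconds.
    """
    lags = [broker_ts[k] - reference_ts[k] for k in broker_ts.keys() & reference_ts.keys()]
    return statistics.median(lags), statistics.pstdev(lags)

def is_lagging(broker_ts, reference_ts, min_lag=0.5, max_jitter=0.1):
    """Sign 1: a delay that is both significant and nearly constant."""
    median, spread = feed_lag(broker_ts, reference_ts)
    return median >= min_lag and spread <= max_jitter

def leading_accounts(orders, mass_events, window=10.0, min_hits=3, max_spread=1.0):
    """Sign 2: accounts that repeatedly trade a similar number of seconds before mass moves.

    orders: iterable of (account, ts); mass_events: timestamps of mass entry/exit moments.
    """
    events = sorted(mass_events)
    leads = defaultdict(list)
    for account, ts in orders:
        i = bisect.bisect_right(events, ts)  # first mass event strictly after this order
        if i < len(events) and events[i] - ts <= window:
            leads[account].append(events[i] - ts)
    return sorted(acct for acct, lags in leads.items()
                  if len(lags) >= min_hits and statistics.pstdev(lags) <= max_spread)
```

The low-spread requirement in `leading_accounts` reflects the point that lucky traders lead irregularly, while an automated front-runner shows the same time intervals again and again.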

@arielchernyy

As I understand it, the essence of front-running is that a market participant learns, from some source, information that is likely to influence the market substantially (move market prices in some direction), and buys or sells the asset to benefit from the price movements that occur once the information becomes widely available.

The stream of executed trades provided by Binance contains only the price and volume of individual trades (the buyer/seller order IDs could probably be used to infer buyers and sellers, but I’m not sure), so we have to decide on potential front-running cases using only this information. In other words, we need to find anomalies in the flow of trades that can be considered front-running episodes.

New significant information becoming publicly available should lead to relatively large price movements (that’s one of the characteristics of “significance”). So front-running is something that happens shortly before relatively large price moves.

To truly benefit from yet-private information, front-runners’ deals should probably be larger than "normal" deals. So we are looking for “unusually large deals happening shortly before unusually large price moves”.

In the code, I define periods of unusually large price movements as the coincidence of two factors: a relatively large one-off price move, followed by a period of unchanged prices or prices moving in the same direction (to distinguish a sustained shift in prices from ordinary volatility). I use the following criteria: 1-period pct_change > some cutoff AND MA_pct_change over the next several periods > some cutoff.

I use one-second intervals for BTCUSD, but the code can easily be extended to any time resolution.

Then, to find relatively large trades, I look for trade volumes larger than some cutoff within the several seconds (2 in the code, but this can be changed) immediately before the unusual price move begins.

In the code (which is just an example using only 10 minutes of data), I use different percentiles of the relevant data as cutoffs (e.g. the 0.95 or 0.975 percentile).
If a much longer time series were available (say, weeks or months of data), and if we had a set of confirmed front-runners’ deals, all the hyper-parameters could be fine-tuned (via grid search or a genetic algorithm of some kind) to find the values that detect the largest share of front-runners’ deals.

https://github.com/arielchernyy/FrontRunning/blob/daff45d5a754c7037e19a3582819a64b3c1d93c3/front_running_ArielChernyy.ipynb
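A dependency-free Python sketch of the pipeline described in this comment follows. It is not the code from the linked notebook. Assumptions: trades arrive as `(ts_seconds, price, qty)` tuples; prices are resampled to the last trade in each second and forward-filled; percentiles use the nearest-rank method; jumps are detected in both directions via the absolute pct_change; `drift_cut=0.0` means "flat or same direction" for the forward moving average; and the pre-event window is `[event_second - lookback, event_second)`. All function names are hypothetical.

```python
import math

def percentile(values, q):
    """Nearest-rank percentile of a non-empty list, for 0 < q <= 1."""
    s = sorted(values)
    return s[max(0, math.ceil(q * len(s)) - 1)]

def per_second_prices(trades):
    """Last traded price in each second, forward-filled over seconds with no trades."""
    last = {}
    for ts, price, _ in sorted(trades):
        last[int(ts)] = price
    start, end = min(last), max(last)
    prices, p = [], None
    for sec in range(start, end + 1):
        p = last.get(sec, p)
        prices.append(p)
    return start, prices

def price_events(prices, horizon=3, q_price=0.95, drift_cut=0.0):
    """Indices t where a one-off jump is followed by flat or same-direction drift."""
    pct = [0.0] + [(b - a) / a for a, b in zip(prices, prices[1:])]
    jump_cut = percentile([abs(x) for x in pct[1:]], q_price)
    events = []
    for t in range(1, len(pct) - horizon):
        if abs(pct[t]) > jump_cut:  # 1-period pct_change above the cutoff
            direction = 1 if pct[t] > 0 else -1
            drift = sum(pct[t + 1 : t + 1 + horizon]) / horizon  # forward MA of pct_change
            if direction * drift >= drift_cut:
                events.append(t)
    return events

def suspicious_trades(trades, lookback=2, horizon=3, q_price=0.95, q_qty=0.95):
    """Trades larger than the q_qty percentile within `lookback` seconds before a price event."""
    trades = sorted(trades)
    start, prices = per_second_prices(trades)
    qty_cut = percentile([qty for _, _, qty in trades], q_qty)
    flagged = []
    for t in price_events(prices, horizon, q_price):
        event_sec = start + t
        flagged += [tr for tr in trades
                    if event_sec - lookback <= tr[0] < event_sec and tr[2] > qty_cut]
    return flagged
```

`lookback`, `horizon`, `q_price`, `q_qty` and `drift_cut` are exactly the hyper-parameters that the grid search mentioned above would tune once labelled front-running deals are available.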

@evgenydmitriev
Contributor Author

The challenge is still open. Many of the challenge participants focused on investigative approaches that involved manual analysis of specific cases of front-running. This, however, is an engineering challenge, requiring successful submissions to include an algorithm, supported by references, datasets, graphs, and/or code. We already have plenty of ingenious ideas on how it could be done, but no solid plans for implementing it using real-time streaming data.

@mdrbase

mdrbase commented Sep 25, 2022

I'm sure people are already doing this, but anyone so inclined could create a "honeypot" to identify and track, or potentially deceive, people who are attempting to front-run.

@evgenydmitriev evgenydmitriev transferred this issue from 1712n/challenge Sep 5, 2023
@marina-chibizova
Contributor

Big thanks to all challenge participants! We've honed our methodology & metrics with the help of a brilliant team member hired from the challenge program.
Now, we're on the lookout for any valuable contributions to our Market Manipulation Wiki (in the form of market manipulation reports, documentation, fixes, metric suggestions, etc.). Check out the latest bounty!


9 participants